Constrained Optimization using Bayesian Bandit Algorithm - {recursion}

Finding the optimal distribution of budget among different ad groups so that one can optimize their business objective is a challenging and time-consuming task. One will have to constantly monitor changes and make decisions accordingly or can leave it to an intelligent system which looks at multiple features, trends for the data points and then comes up with suggestions.

Constrained Optimization

The problem we are trying to solve revolves around the distribution of budget among competing ad groups in a way that we optimize a business objective. For this we have to deal with partial information at the time of allocation and over a period of time identify the trend basis of performance.

Challenges

Use of Uninformative Prior

Uninformative Prior

Consider the above example where the Advertiser has assigned equal proportion to different ad groups irrespective of the reach that they target and their past performance. This expresses vague or general information about the problem. This is called the principle of indifference. Depending on the duration of the campaign we may not be able to converge to an optimized state from our original assumptions.

Estimated Reach

As we can see the estimated reach for all of these ad groups are different and this needs to be taken into consideration for our initial guess.

Initial Distribution

We help guess the best performing distribution for budgets basis of the past performance and estimated reach of the ad groups involved. We also take into account performance of different placement and optimization settings to make sure the high performing settings are assigned higher weights.

Continuous Updates

Based on the Advertiser objective we consider different metrics. Example for an ROI based objective we will consider metrics like ROI ( Return on Investment higher the better ), CPC ( Cost per click lower the better ) and CVR ( Conversion Rate higher the better ).

Binning

To calculate the amount of change that needs to be made to the original distribution, we need to analyze how the different metrics have changed over a certain period of time. We analyze data over different time periods which we call bins and in our use-case would be look like change in ROI from yesterday, change in ROI over last x days and change in ROI over last x day of weeks.

Change In ROI Change In CPC Change in CVR

As you can see there is certainly a trend in the metrics that needs to be captured while calculating the change that needs to be made to the original distribution.

We consider data from different time periods to measure how the different metrics have changed over time but they have to be weighted differently. Recent data should be given more preference and as we move away the weights should be decreased accordingly so that the trends is captured accurately.

Exploration vs Exploitation (Bayesian Bandit)

The tradeoff that any Advertiser faces is to strike a balance between exploitation of the ad group that is performing well versus exploration. We are trying to acquire new knowledge ( “exploration” ) and optimize based on the knowledge already acquired (“exploitation”). The current algorithm tries to balance these tasks over a certain time period. We also allow the Advertiser to tweak this balance based on his preference by introducing different modes.

The reason to consider Bayesian Bandit is to embrace uncertainty that is ever so present in the process. A Bayesian Bandit rewards the ad group that is performing well but also gives some chance to the ones not performing so well right now hoping that they would bring reward in the future and helps us control the dynamic allocation of resources to different ad groups

Aggressive
Moderate
Conservative

They are different in the way they come up with distribution of budget for the upcoming day e.g. in conservative mode, even if one of our actions lead to a huge improvement we would still be a little conservative in investing more in that particular ad group and for aggressive mode we would aggressively make changes to the ones performing very well and ones performing poorly.

Further Improvements

Currently performance of placement is calculated at a global level, we can calculate the performance of particular placement at Advertiser and objective level so that we could analyze what placement works best for a particular Advertiser and objective.
We can also involve risk into the calculation, as currently we only aim to maximize our expected utility.
Add a notion of memory in the system which could keep track of different states of the system and thus help us change the momentum of learning.

Results of Simulation

Simulation Result

Budget Allocation as suggested by our algorithm on 26th June, 2016 versus the uniform distribution being used by the Advertiser