A/B Testing: Common Mistakes
Adjusting Traffic Proportions
Let’s say you are A/B testing a new feature, and you’ve given that feature 1% of the traffic and the control 99%. After a week, the feature looks promising, so you give it 10% of the traffic. As the evidence that the feature is working well grows, you keep giving it a larger portion of the traffic. After a month, your A/B test reaches statistical significance and you declare victory. Right? Not really.
What's the matter?
As the test went on, the proportion of traffic between control and treatment changed. Over the course of a test, the composition of the overall traffic also changes. As a result, the control average is weighted more towards the old traffic and the treatment average is weighted more towards the new traffic, so the two groups are no longer compared on the same mix of traffic.
Let's try an example in R. Suppose that, because of marketing efforts or other successful tests, our traffic suddenly changes and our metric is higher during the second half of the experiment. Now let's use a treatment that actually has no effect, but increase its share of traffic from 10% in the first half to 50% in the second half.
set.seed(2015)
# For the first half of the experiment
# create a list of 100 random draws from the specific Normal distribution with mean=1
first_half <- rnorm(100, mean=1, sd=1)
# For the second half of the experiment
# create a second list of 100 random draws from a normal distribution with mean=2
second_half <- rnorm(100, mean=2, sd=1)
# Define control as having 90% of traffic from the first half
# and 50% of traffic from the second half
control <- c(first_half[1:90], second_half[1:50])
# Define treatment as having 10% of traffic from the first half
# and 50% of traffic from the second half
treatment <- c(first_half[91:100], second_half[51:100])
# Perform t-test
t.test(control, treatment)$p.value
## [1] 0.002112838
Here, treatment has no effect over control by construction. We created a before and an after distribution that treatment and control both draw from. All we changed was the proportion sampled from before and after. As a result, even though treatment and control should show no significant difference, the test reports a p-value of 0.002, which would be read as significant at the 99.8% confidence level.
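To see where the spurious difference comes from, it can help to compute the average each group is expected to show, given only the traffic mix. The short sketch below reuses the means (1 and 2) and the sample counts from the example; the variable names are illustrative only.

```r
# Expected metric in each half of the experiment (from the example above)
period_means <- c(1, 2)

# Number of observations each group received in the first and second half
n_control   <- c(90, 50)
n_treatment <- c(10, 50)

# Each group's expected average is a weighted mean of the period means
expected_control   <- weighted.mean(period_means, n_control)
expected_treatment <- weighted.mean(period_means, n_treatment)

round(c(expected_control, expected_treatment), 3)
## [1] 1.357 1.833
```

The two groups are expected to differ by roughly 0.48 even though the treatment does nothing, simply because treatment draws a larger share of its data from the higher-metric second half.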
What can we do about it?
We can adjust our statistical test to handle the situation, but adjusting the estimated mean and variance for different proportions is outside the scope of this article.
However, there is a simple way to avoid the situation entirely. If you want to start with 10% of traffic on the new treatment, put only 10% of traffic on control. Then, if you want to increase traffic to treatment, add the same amount of traffic to control.
Let's continue our last example, but this time we use 10%/10% of traffic for control/treatment before the change and 50%/50% after the change.
# Define control to be 10% from the first half
# and 50% of traffic from the second half
control <- c(first_half[1:10], second_half[1:50])
# Define treatment to be 10% from the first half
# and 50% from the second half
treatment <- c(first_half[91:100], second_half[51:100])
# Perform t-test
t.test(control, treatment)$p.value
## [1] 0.3189567
We can see the p-value is 0.32, which translates into only 68% confidence. This non-significance is what we expected, since we used a treatment with no effect.
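A single simulated experiment can be lucky or unlucky, so one way to check the pattern is to repeat both designs many times and count how often a no-effect treatment is declared significant at the 5% level. The sketch below is a rough illustration under the same assumptions as the examples (normal data, means 1 and 2, standard deviation 1); the helper simulate_p and the number of repetitions are made up for this illustration.

```r
set.seed(2015)

# Simulate one experiment with a no-effect treatment and return the
# t-test p-value. Arguments are the sample sizes for control (c) and
# treatment (t) in the first (1) and second (2) half of the experiment.
simulate_p <- function(n_c1, n_t1, n_c2, n_t2) {
  control   <- c(rnorm(n_c1, mean = 1, sd = 1), rnorm(n_c2, mean = 2, sd = 1))
  treatment <- c(rnorm(n_t1, mean = 1, sd = 1), rnorm(n_t2, mean = 2, sd = 1))
  t.test(control, treatment)$p.value
}

n_sims <- 2000

# Ratio changes mid-experiment: 90/10 in the first half, 50/50 in the second
p_shifted <- replicate(n_sims, simulate_p(90, 10, 50, 50))

# Ratio stays the same: 10/10 in the first half, 50/50 in the second
p_fixed <- replicate(n_sims, simulate_p(10, 10, 50, 50))

# Share of simulated experiments that would be called significant
fpr_shifted <- mean(p_shifted < 0.05)
fpr_fixed   <- mean(p_fixed < 0.05)
c(fpr_shifted, fpr_fixed)
```

Under these assumptions, the design with a changing ratio should flag a large majority of the no-effect experiments as significant, while the constant-ratio design should stay near or below the nominal 5%.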
There is one last note. If we keep the proportions of control and treatment exactly equal, we can only ever have up to 50% of traffic on the new treatment. If you eventually want 80% of traffic on treatment, start with 8% of traffic on treatment and 2% on control, and increase both proportionally until you reach 80%/20%. What matters is that the ratio between the groups stays constant. However, it is worth mentioning that, for a fixed amount of total traffic, the fastest way to statistical significance is a 50/50 split.
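The remark about the 50/50 split follows from the standard error of a difference in means, which for a fixed total sample size is smallest when the two groups are the same size. The sketch below assumes both groups share the same standard deviation; the function name se_difference and the total of 1000 observations are chosen only for illustration.

```r
# Standard error of the difference in means for a fixed total sample size,
# assuming both groups have the same standard deviation (an assumption).
se_difference <- function(treatment_share, n_total = 1000, sd = 1) {
  n_treatment <- n_total * treatment_share
  n_control   <- n_total * (1 - treatment_share)
  sqrt(sd^2 / n_treatment + sd^2 / n_control)
}

# Compare a 50/50, an 80/20 and a 99/1 split of the same total traffic
shares <- c(0.5, 0.8, 0.99)
round(se_difference(shares), 3)
## [1] 0.063 0.079 0.318
```

A smaller standard error means a given true effect is detected with less data, so the 80/20 split costs some speed and the 99/1 split costs a great deal.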
Additional Information
If you absolutely must change your traffic proportions, then you need to adjust your mean and variance using oversampling and undersampling techniques. In this case, once the traffic proportions have changed, you have oversampled with respect to time, and this should be corrected before the groups are compared.
http://en.wikipedia.org/wiki/Oversampling_and_undersampling_in_data_analysis
http://www.data-mining-blog.com/tips-and-tutorials/overrepresentation-oversampling/
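As a rough illustration of correcting for time, the sketch below uses a different and simpler technique than the resampling approaches linked above: it records which half of the experiment each observation came from and includes that period as a covariate in a linear model, so the group effect is estimated within periods. This is only one possible adjustment under the assumptions of the earlier example, and the data frame and column names are invented for this illustration.

```r
set.seed(2015)
first_half  <- rnorm(100, mean = 1, sd = 1)
second_half <- rnorm(100, mean = 2, sd = 1)

# Same unbalanced allocation as the first example:
# 90/10 control/treatment in the first half, 50/50 in the second half.
experiment <- data.frame(
  metric = c(first_half, second_half),
  period = factor(rep(c("first", "second"), each = 100)),
  group  = factor(c(rep("control", 90), rep("treatment", 10),
                    rep("control", 50), rep("treatment", 50)))
)

# Naive comparison that ignores when each observation was collected
p_naive <- t.test(metric ~ group, data = experiment)$p.value

# Period-adjusted comparison: the model has a separate baseline per period,
# so the shift in the metric over time is not attributed to the treatment
fit <- lm(metric ~ group + period, data = experiment)
p_adjusted <- summary(fit)$coefficients["grouptreatment", "Pr(>|t|)"]

c(p_naive, p_adjusted)
```

The adjusted p-value tests the treatment effect after accounting for the period, which is the comparison the naive test was meant to make.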
Conclusion
Be careful when shifting more traffic to your treatment group. Make sure the ratio of traffic between your control and all treatment groups always stays the same.