27 Jan

When do Powerball or Mega Millions actually return you money?

With the Powerball lottery having recently reached $1.58 billion, there are lots of useful posts about calculating the expected value of your money. Everyone already knows how many lotto winners end up bankrupt. Some people have written about buying every ticket. And others have written about how the annuity is a better value than you might think.

Calculating the return on the lottery is mostly straightforward, and there are many (so many, oh so very many) places that calculate this. The one tricky part is estimating the number of tickets sold. Some articles only mention this and others try to estimate it. I present a new methodology that estimates this easily and, hopefully, more accurately.

What is Expected Value

Expected Value is the average result of something if it is repeated an infinite number of times. Since repeating something infinitely is hard to do, it can instead be calculated by multiplying each distinct result by its probability and adding up the products. In the context of the lottery, we multiply each type of payout by its probability and sum. After calculating the expected value, we can compare it to the price of the ticket. If the expected value is more than the price of the ticket, you are expected to win more money than you spend. All casino games (with the possible exception of table poker) have a lower expected value than the cost of playing. All lottery games do also, even this $1.58 billion Powerball lottery, as I will explain below.
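Written as a formula, with x_i standing for the payout of outcome i and p_i for its probability (symbols introduced here only for illustration), the definition reads:

```latex
E[X] = \sum_{i} p_i \, x_i
```

A ticket is a favorable bet only when E[X] is larger than its price.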

Odds of winning

The odds of winning Powerball and Mega Millions are well known. The odds of hitting the jackpot are easy to calculate. We take the number of combinations of the 5 numbers multiplied by the number of Powerball / Megaball numbers. Specifically, \binom{69}{5} \times 26 = 292,201,338 for Powerball and \binom{75}{5} \times 15 = 258,890,850 for Mega Millions.

Unfortunately, calculating the odds of the other prizes is more complicated. There is a prize for hitting only the special ball (Powerball or Megaball). So you might think the odds of this would be 1 in 26 for Powerball or 1 in 15 for Mega Millions, the range of the special ball. But this is incorrect because it also includes the possibility of hitting one of the other prizes (special ball and some additional numbers). So these other winning combinations need to be removed to find the odds of hitting the special ball alone. Calculating these can be tricky and is not the point of this article.
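For readers who do want the numbers, here is a minimal sketch in R of the counting involved. It assumes every combination is equally likely and that the games use 5 of 69 plus 1 of 26 (Powerball) and 5 of 75 plus 1 of 15 (Mega Millions), which is consistent with the combination counts quoted in this article. The function name `tier_probability` is made up for this example.

```r
# Probability of matching exactly k of the 5 white balls, with or without
# the special ball, assuming every combination is equally likely.
tier_probability <- function(k, special, n_white, n_special, n_drawn = 5) {
  white <- choose(n_drawn, k) * choose(n_white - n_drawn, n_drawn - k) /
    choose(n_white, n_drawn)
  ball <- if (special) 1 / n_special else (n_special - 1) / n_special
  white * ball
}

# Jackpot combinations: 5 white balls times the special ball
powerball_combinations     <- choose(69, 5) * 26
mega_millions_combinations <- choose(75, 5) * 15

# "1 in N" figure for hitting only the special ball (tier 0 + 1)
powerball_only <- 1 / tier_probability(0, TRUE, 69, 26)
megaball_only  <- 1 / tier_probability(0, TRUE, 75, 15)

c(powerball_only = powerball_only, megaball_only = megaball_only)
```

The special-ball-only figures come out near 38.32 and 21.39 rather than 26 and 15, because tickets that also match white balls are counted in the higher tiers.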

Strangely enough, both Powerball and Mega Millions use a non-standard definition of odds. Odds are usually defined as the ratio of the positive result count to the negative result count. So the jackpot odds stated on the Powerball and Mega Millions websites should actually use one less than the number of combinations (292,201,337 and 258,890,849), since one combination is a positive result and the remainder are negative results. As a specific example, the Mega Millions odds table includes all the possible results. If the odds were stated with the standard definition, the probability of a negative result would equal \frac{odds}{odds + 1} and the probability of a positive result would equal 1 - \frac{odds}{odds + 1}. So we can convert all the odds into positive probabilities and the sum should equal 100%, but it actually equals about 68.9%. The probabilities in these lottery tables are instead calculated as \frac{1}{odds}, and this sum now (almost) equals 100%.

| Match | Prize | Odds | Probability = 1 - (Odds) / (Odds + 1) | Probability = 1 / Odds |
|---|---|---|---|---|
| 5 + 0 | $1,000,000 | 18,492,203.5714 | 0.000000054 | 0.000000054 |
| 4 + 1 | $5,000 | 739,688.1429 | 0.000001352 | 0.000001352 |
| 4 + 0 | $500 | 52,834.8673 | 0.000018927 | 0.000018927 |
| 3 + 1 | $50 | 10,720.1180 | 0.000093274 | 0.000093283 |
| 3 + 0 | $5 | 765.7227 | 0.001304253 | 0.001305956 |
| 2 + 1 | $5 | 472.9464 | 0.002109943 | 0.002114405 |
| 2 + 0 | $0 | 33.7819 | 0.028750586 | 0.029601651 |
| 1 + 1 | $2 | 56.4712 | 0.017400019 | 0.017708141 |
| 1 + 0 | $0 | 4.0337 | 0.198661025 | 0.247911347 |
| 0 + 1 | $1 | 21.3906 | 0.044661599 | 0.046749507 |
| No Matches | $0 | 1.5279 | 0.395585268 | 0.654493095 |
| Total | | | 0.688586303 | 0.999997720 |

The Megaplier odds are similar. The standard \frac{odds}{odds + 1} definition of odds gives a total probability of 78.1% while the \frac{1}{odds} definition equals 100%. For the Powerball side of things, looking at the Powerplay odds also shows the same problem with odds. Using the table with the 10X multiplier, the sum of probabilities calculated from the standard \frac{odds}{odds + 1} definition of odds equals 72.2%. But the sum of probabilities calculated from the \frac{1}{odds} definition equals 100%. The table without the 10X multiplier has similar results.
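The comparison of the two definitions can be checked against the Mega Millions prize table with a few lines of R. This is only a sketch: the odds are the rounded "1 in N" figures from that table (jackpot row excluded), so the sums agree with the table's totals only approximately.

```r
# Mega Millions odds as published, from 5 + 0 down to "No Matches"
odds <- c(18492203.5714, 739688.1429, 52834.8673, 10720.1180, 765.7227,
          472.9464, 33.7819, 56.4712, 4.0337, 21.3906, 1.5279)

# Standard definition: odds = failures per success, so p = 1 / (odds + 1),
# which is the same as 1 - odds / (odds + 1)
p_standard <- 1 / (odds + 1)

# Lottery definition: "1 in N", so p = 1 / N
p_lottery <- 1 / odds

sum(p_standard)  # well short of 1
sum(p_lottery)   # very close to 1
```

Only the second sum behaves like a full set of probabilities, which is the evidence that the published figures mean "1 in N".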

Expected value without Multiplier or Jackpot

The non-jackpot prize amounts are fixed and the expected value is easy to calculate. As stated before, each prize amount is multiplied by its probability of occurring and the products are added together.
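As a minimal sketch of that sum in R, using the Mega Millions prizes and rounded odds quoted in this article and treating each probability as 1 / odds. The flat 39.6% rate in the last step is the federal rate this article applies; a real tax calculation would be more involved.

```r
# Mega Millions non-jackpot tiers: prize in dollars and published "1 in N" odds
prize <- c(1e6, 5000, 500, 50, 5, 5, 2, 1)
odds  <- c(18492203.57, 739688.14, 52834.87, 10720.12,
           765.72, 472.95, 56.47, 21.39)

# Expected value = sum over tiers of prize * probability, with p = 1 / odds
ev_pre_tax <- sum(prize / odds)

# Same calculation if every prize were taxed at a flat 39.6%
ev_post_tax <- ev_pre_tax * (1 - 0.396)

round(c(pre_tax = ev_pre_tax, post_tax = ev_post_tax), 4)
```

Both values are far below the $1 ticket price, which is the point of this section.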

However, there are some variations in how taxes are handled. I treat all lottery winnings as subject to the 39.6% top federal tax rate. Most places, like New York, have a state income tax. Some cities, like New York City, also have a city tax (sucks to be me). Some other states, like California, have no income tax or no tax on lottery winnings (lucky to be you). Expected values for these two situations are included below. In addition, I include a third situation. When claiming your winnings, large prizes are paid directly by the state lottery office and taxes are partially deducted automatically. Small amounts are paid by any lottery retailer without any taxes deducted. I imagine some people might forget to declare these smaller winnings on their tax returns, so I included two more columns (partial-tax) for this situation.

Looking at the expected value, unsurprisingly, you can see that each dollar spent will return you less than one dollar. Interestingly, you can see that Mega Millions is a better value. Don't forget that Powerball is $2 per ticket, so we need to halve its expected value to make the two values comparable.

Mega Millions

| Tier | Odds | Prize | Pre-tax Expected Value | CA Post-tax Expected Value | NYC Post-tax Expected Value | CA Partial-tax Expected Value | NYC Partial-tax Expected Value |
|---|---|---|---|---|---|---|---|
| 5 + 0 | 18,492,203.57 | $1,000,000.00 | $0.0541 | $0.0327 | $0.0258 | $0.0327 | $0.0258 |
| 4 + 1 | 739,688.14 | $5,000.00 | $0.0068 | $0.0041 | $0.0032 | $0.0041 | $0.0032 |
| 4 + 0 | 52,834.87 | $500.00 | $0.0095 | $0.0057 | $0.0045 | $0.0095 | $0.0057 |
| 3 + 1 | 10,720.12 | $50.00 | $0.0047 | $0.0028 | $0.0022 | $0.0047 | $0.0028 |
| 3 + 0 | 765.72 | $5.00 | $0.0065 | $0.0039 | $0.0031 | $0.0065 | $0.0039 |
| 2 + 1 | 472.95 | $5.00 | $0.0106 | $0.0064 | $0.0050 | $0.0106 | $0.0064 |
| 1 + 1 | 56.47 | $2.00 | $0.0354 | $0.0214 | $0.0169 | $0.0354 | $0.0214 |
| 0 + 1 | 21.39 | $1.00 | $0.0467 | $0.0282 | $0.0223 | $0.0467 | $0.0282 |
| Total Expected Value | | | $0.1742 | $0.1052 | $0.0831 | $0.1501 | $0.0975 |

Powerball

| Tier | Odds | Prize | Pre-tax Expected Value | CA Post-tax Expected Value | NYC Post-tax Expected Value | CA Partial-tax Expected Value | NYC Partial-tax Expected Value |
|---|---|---|---|---|---|---|---|
| 5 + 0 | 11,688,053.52 | $1,000,000.00 | $0.0856 | $0.0517 | $0.0408 | $0.0517 | $0.0408 |
| 4 + 1 | 913,129.18 | $50,000.00 | $0.0548 | $0.0331 | $0.0261 | $0.0331 | $0.0261 |
| 4 + 0 | 36,525.17 | $100.00 | $0.0027 | $0.0017 | $0.0013 | $0.0027 | $0.0017 |
| 3 + 1 | 14,494.11 | $100.00 | $0.0069 | $0.0042 | $0.0033 | $0.0069 | $0.0042 |
| 3 + 0 | 579.76 | $7.00 | $0.0121 | $0.0073 | $0.0058 | $0.0121 | $0.0073 |
| 2 + 1 | 701.33 | $7.00 | $0.0100 | $0.0060 | $0.0048 | $0.0100 | $0.0060 |
| 1 + 1 | 91.98 | $4.00 | $0.0435 | $0.0263 | $0.0207 | $0.0435 | $0.0263 |
| 0 + 1 | 38.32 | $4.00 | $0.1044 | $0.0630 | $0.0498 | $0.1044 | $0.0630 |
| Total Expected Value | | | $0.3199 | $0.1932 | $0.1526 | $0.2643 | $0.1754 |
| Expected Value per Dollar | | | $0.1599 | $0.0966 | $0.0763 | $0.1322 | $0.0877 |

Expected Value with multiplier without Jackpot

Both lotteries have a random prize multiplier that costs an extra $1 called the Powerplay and Megaplier. Intuitively, by spending only $1, you can increase your non-jackpot winnings by at least 2 times. So we should see increased expected value.

Each multiplier has some odds of being selected for a given drawing. So, using the same method as above, we first calculate the expected value of the prizes under each multiplier. The result of this calculation is given in the first column below. Then we calculate the expected value of these expected values across the multipliers, which is the second expected value column below. To make things a little more complicated, Powerplay includes a 10X multiplier when the jackpot is below $150M.

As you can see, the multipliers will increase your expected value but, unsurprisingly, not enough to make money from the lottery. Again, Mega Millions remains a better value after adding the lotteries' multipliers. As before, since we are spending an extra dollar for the multiplier, we need to divide by the total cost of the ticket so all expected values are comparable.
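The two-step calculation can be sketched in R for the Megaplier, where every non-jackpot prize is simply scaled by the multiplier. The base value of $0.1742 and the multiplier odds are the figures quoted in this article. The sketch does not carry over to Powerplay as-is, since the Powerplay column in the table below is not an exact multiple of the base value.

```r
# Megaplier values and their published "1 in N" odds
multiplier <- c(2, 3, 4, 5)
odds       <- c(7.5, 3.75, 5, 2.5)

# Pre-tax expected value of the non-jackpot prizes on a plain $1 ticket
base_ev <- 0.1742

# Step 1: expected value of the prizes under each multiplier
ev_given_multiplier <- multiplier * base_ev

# Step 2: weight each by the probability (1 / odds) that the multiplier is drawn
ev_total <- sum(ev_given_multiplier / odds)

# A ticket with the Megaplier costs $2
ev_per_dollar <- ev_total / 2

round(c(total = ev_total, per_dollar = ev_per_dollar), 4)
```

Dividing by the full $2 price is what makes the result comparable with the $1 ticket.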

| Megaplier | Pre-Tax Expected Value (similar to table above) | Odds | Pre-Tax Expected Value for this table |
|---|---|---|---|
| 2 | $0.3485 | 7.50 | $0.0465 |
| 3 | $0.5227 | 3.75 | $0.1394 |
| 4 | $0.6969 | 5 | $0.1394 |
| 5 | $0.8712 | 2.5 | $0.3485 |
| Total Expected Value | | | $0.6737 |
| Expected Value per Dollar | | | $0.3368 |

| Powerplay | Pre-Tax Expected Value (similar to table above) | Odds with 10X multiplier | Pre-Tax Expected Value with 10X | Odds without 10X multiplier | Pre-Tax Expected Value without 10X |
|---|---|---|---|---|---|
| 2 | $0.6398 | 1.7917 | $0.3571 | 1.7500 | $0.3656 |
| 3 | $0.8741 | 3.3077 | $0.2643 | 3.2308 | $0.2705 |
| 4 | $1.1084 | 14.3333 | $0.0773 | 14.0000 | $0.0792 |
| 5 | $1.3427 | 21.5000 | $0.0625 | 21.0000 | $0.0639 |
| 10 | $2.5143 | 43.0000 | $0.0585 | | |
| Total Expected Value | | | $0.8196 | | $0.7792 |
| Expected Value per Dollar | | | $0.2732 | | $0.2597 |

Finally, the results for the same post-tax situations as in the earlier section are given for completeness. These are all expected values per dollar.

| Tax Situation | Mega Millions w/o Megaplier | Mega Millions w/ Megaplier | Powerball w/o Powerplay | Powerplay w/ 10X multiplier | Powerplay w/o 10X multiplier |
|---|---|---|---|---|---|
| Pre-tax | $0.1742 | $0.3368 | $0.1599 | $0.2732 | $0.2597 |
| Post-tax CA | $0.1052 | $0.2035 | $0.0966 | $0.1650 | $0.1569 |
| Post-tax NYC | $0.0831 | $0.1607 | $0.0763 | $0.1303 | $0.1239 |
| Partial-tax CA | $0.1501 | $0.2903 | $0.1322 | $0.2306 | $0.2184 |
| Partial-tax NYC | $0.0975 | $0.2753 | $0.0877 | $0.2170 | $0.2051 |

Estimating Tickets Sold

We now have to estimate the expected value of the jackpot and simply add it to the numbers above. This can be tricky because, unlike before, having multiple winners will split our winnings. And to calculate the number of winners, we need to first estimate the number of tickets sold. Some places claim the number of tickets sold is half of the sales. I believe it is true that half of sales goes into the prize pool, but not all of that goes into the jackpot. We know how much the jackpot increases, but I don't know where to find overall sales. Florida, Texas, Colorado, New Mexico, and Virginia have FAQs which state that a maximum 34.0066% of sales are contributed towards the jackpot. In addition to not reaching that maximum, other states might have different maximums.

But there is a better way. Both Powerball and Mega Millions give the details of how many winning tickets were sold for each prize in each drawing. They even give the breakdown of winners with and without the $1 multiplier. So to estimate the number of tickets sold, we just multiply the number of winners in each tier by the odds of winning that tier (that is, divide by the probability of winning). Next, since we know the share of tickets sold with the $1 multiplier, we can estimate the total sales and then calculate the percent of sales that goes towards the jackpot. Once we know that percentage, it is easy to calculate the number of tickets sold for any jackpot from the increase in its cash value.

For a specific example, let's take the $1.58B Powerball drawing. If we take the count of winners in each prize tier multiplied by the odds of winning that tier, we get nine estimates of the number of tickets sold, one for each tier of winner. But the estimate based on the number of jackpot winners is unreliable (high variance) since there are so few winners. Similarly, looking at earlier drawings, there are sometimes very few (5+0) and (4+1) ticket winners. So I decided not to use the top 3 tiers when looking at the estimates. This leaves us with six estimates of the number of tickets sold.

Powerball 1/13/2016

| Tier | Powerball winning tickets | Powerplay winning tickets | Odds of Winning | Approximate tickets sold |
|---|---|---|---|---|
| 5 + 1 | 3 | n/a | 292,201,338 | 876,604,014 |
| 5 + 0 | 73 | 8 | 11,688,053.52 | 946,732,335 |
| 4 + 1 | 827 | 107 | 913,129.18 | 852,862,654 |
| 4 + 0 | 20,544 | 2,834 | 36,525.17 | 853,885,424 |
| 3 + 1 | 47,685 | 6,597 | 14,494.11 | 786,769,279 |
| 3 + 0 | 1,164,124 | 157,552 | 579.76 | 766,254,878 |
| 2 + 1 | 895,097 | 120,695 | 701.33 | 712,405,403 |
| 1 + 1 | 6,343,237 | 840,981 | 91.98 | 660,804,372 |
| 0 + 1 | 14,595,721 | 1,914,561 | 38.32 | 632,674,006 |
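
The six lower-tier estimates can be reproduced with a short R sketch. The winner counts and odds are the ones listed for the 1/13/2016 drawing, and the variable names are chosen here for illustration.

```r
# 1/13/2016 Powerball drawing, six lowest tiers (4 + 0 down to 0 + 1)
powerball_winners <- c(20544, 47685, 1164124, 895097, 6343237, 14595721)
powerplay_winners <- c(2834, 6597, 157552, 120695, 840981, 1914561)
odds <- c(36525.17, 14494.11, 579.76, 701.33, 91.98, 38.32)

# winners is roughly tickets / odds, so tickets is roughly winners * odds
estimated_tickets <- (powerball_winners + powerplay_winners) * odds

round(estimated_tickets)
mean(estimated_tickets)  # the average used as the ticket estimate
```

Averaging the six values gives the figure of roughly 735 million tickets used in the rest of the article.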

You can see that the estimates are not the same for each tier, and they clearly increase as the probabilities get lower. However, looking at 11/14/2015, we can see that the estimates also vary but they decrease as the probabilities get lower. For 12/9/2015, there doesn't seem to be a trend in the estimates. In addition to this strange behavior, these six estimates should be much closer together since we have so many winners. Specifically, they are outside a 99% confidence interval based on binomial proportions. One possible reason for these anomalies is the assumption that all numbers are equally likely to be chosen. This is almost certainly not true, since many people use dates to select numbers and others avoid the number 13. In the 1/13/2016 Powerball drawing, there are many low numbers, which caused the estimates to increase as the probabilities get lower. So to handle this, we can look over more dates, and the estimate of the percent of revenue going to the jackpot should average out.

| Powerball | 1/13/2016 Powerball tickets won | 1/13/2016 Powerplay tickets won | 1/13/2016 estimated tickets sold | 12/9/2015 Powerball tickets won | 12/9/2015 Powerplay tickets won | 12/9/2015 estimated tickets sold | 11/14/2015 Powerball tickets won | 11/14/2015 Powerplay tickets won | 11/14/2015 estimated tickets sold |
|---|---|---|---|---|---|---|---|---|---|
| 5 + 1 | 3 | N/A | 876,604,014 | 0 | N/A | 0 | 0 | N/A | 0 |
| 5 + 0 | 73 | 8 | 946,732,335 | 1 | 0 | 11,688,054 | 0 | 0 | 0 |
| 4 + 1 | 827 | 107 | 852,862,654 | 16 | 4 | 18,262,584 | 13 | 2 | 13,696,938 |
| 4 + 0 | 20,544 | 2,834 | 853,885,424 | 369 | 78 | 16,326,751 | 232 | 65 | 10,847,975 |
| 3 + 1 | 47,685 | 6,597 | 786,769,279 | 899 | 215 | 16,146,439 | 584 | 146 | 10,580,700 |
| 3 + 0 | 1,164,124 | 157,552 | 766,254,878 | 25,499 | 5,804 | 18,148,227 | 14,667 | 3,868 | 10,745,852 |
| 2 + 1 | 895,097 | 120,695 | 712,405,403 | 19,331 | 4,173 | 16,484,060 | 13,178 | 3,624 | 11,783,747 |
| 1 + 1 | 6,343,237 | 840,981 | 660,804,372 | 141,406 | 30,052 | 15,770,707 | 106,298 | 28,557 | 12,403,963 |
| 0 + 1 | 14,595,721 | 1,914,561 | 632,674,006 | 322,035 | 67,339 | 14,920,812 | 263,030 | 71,231 | 12,808,882 |

| Summary | 1/13/2016 | 12/9/2015 | 11/14/2015 |
|---|---|---|---|
| Avg Tickets Sold | 735,465,560 | 16,299,499 | 11,528,520 |
| Percent Powerplay tickets sold | 11.66% | 17.44% | 21.26% |
| Total Revenue | $1,556,653,562.80 | $35,442,203.15 | $25,508,567.54 |
| Increase in Cash Value of Jackpot | $394,692,000 | $10,664,000 | Ignored |
| % of Revenue into Jackpot | 25.36% | 30.09% | Ignored |

% of Revenue into Jackpot

If we average the bottom six estimates of the 1/13/2016 drawing, this gives us an average of 735,465,560 tickets sold. From the winners web page, we also know the ratio of Powerball to Powerplay winning tickets, which gives us an estimate of the ratio of tickets sold. Since Powerball tickets are $2 each and Powerplay tickets are $3 each, we know there was approximately $1,556,653,563 in revenue for this drawing. Since the cash value of the jackpot increased from $588M to $983M, this gives us about 25.36% of revenue going into the jackpot for the 1/13/2016 drawing.
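The same arithmetic as a small R sketch, using the figures from the 1/13/2016 column of the table above. The Powerplay share is rounded to 11.66% here, so the revenue differs slightly from the article's $1,556,653,563.

```r
tickets         <- 735465560  # average of the six lower-tier estimates
powerplay_share <- 0.1166     # fraction of winning tickets that had Powerplay

# $2 per ticket, plus $1 more for the fraction that added Powerplay
revenue <- tickets * (2 + powerplay_share * 1)

# Growth in the cash value of the jackpot for this drawing
jackpot_increase <- 394692000

pct_to_jackpot <- jackpot_increase / revenue
round(pct_to_jackpot, 4)
```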

So let's go over more dates and average for a better estimate. To keep my life simple, I estimated from 10/7/2015 forward since the Powerball odds changed on that date. Unfortunately, there is still one last wrinkle. The minimum increase in the annuity jackpot is $10M. So when the jackpot increases by only the minimum amount, there is no relation between the jackpot increase and tickets sold. For this reason I used only the drawing of 11/4/2015 and all the drawings from 12/9/2015 until 1/13/2016. Averaging all of the percent revenue to jackpot values gives a result of 28.90%. This is a bit lower than the maximum of 34.0066% listed on a few lottery FAQs. So most states probably don't reach their maximum or have a lower maximum than 34.0066%. Let's take a look at Mega Millions. There is a minimum increase of $5M for Mega Millions. Using all the other drawings since 9/15/2015, the percent of revenue going into the jackpot for Mega Millions is 26.63%.

There are a few assumptions made here. We assume the states don't change how they contribute to the jackpot. Even if they don't change, we can still run into problems if each state contributes a different percentage and each state's share of total sales changes as the jackpot increases.

Calculating Expected Value with Current Jackpot

First, we need to estimate the number of tickets sold for a specific drawing, which takes a few steps. To estimate the total revenue, take the increase in cash value from the previous drawing. If you only have the previous week's annuity value, you can convert by multiplying the annuity value by 0.62 to get the cash value. Next, divide by 28.9% for Powerball or 26.63% for Mega Millions. This gives us the total sales since the last drawing. Finally, we know that each Powerball ticket is $2 and each Powerplay ticket is $3, while each Mega Millions ticket is $1 and each Megaplier ticket is $2. We just need to estimate the ratio of ticket types and we can find out how many tickets were sold in total.

For the 1/13/2016 drawing, 11.6% of the tickets won were Powerplay tickets. However, for smaller jackpots, as much as 21.4% of the tickets won were Powerplay tickets. For the recent drawings, the percent of Megaplier tickets only ranged from 10% to 12.5%, but this didn't include any of the large Mega Millions jackpots. Choose an appropriate percent of Powerplay/Megaplier tickets based on the jackpot size, using the Powerplay and Megaplier section below as guidance.

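Finally, the number of tickets can be calculated as \text{tickets} = \frac{\Delta \text{cash value}}{r \times (\text{price} + m)}, where \Delta \text{cash value} is the increase in the jackpot's cash value, r is the share of revenue going into the jackpot, price is the base ticket price ($2 for Powerball, $1 for Mega Millions), and m is the fraction of tickets bought with the $1 multiplier.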
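
A sketch of that calculation in R, following the steps described above: divide the increase in cash value by the share of revenue that feeds the jackpot, then divide by the average ticket price. The function and argument names are invented for this example.

```r
# cash_increase:    growth in the jackpot's cash value since the last drawing
# pct_to_jackpot:   share of revenue that feeds the jackpot (e.g. 0.289)
# base_price:       price of a ticket without the multiplier
# multiplier_share: fraction of tickets bought with the $1 multiplier
estimate_tickets <- function(cash_increase, pct_to_jackpot,
                             base_price, multiplier_share) {
  revenue <- cash_increase / pct_to_jackpot
  average_price <- base_price + multiplier_share * 1
  revenue / average_price
}

# 1/13/2016 Powerball, using that drawing's own 25.36% rather than the average
tickets_2016_01_13 <- estimate_tickets(394692000, 0.2536, 2, 0.1166)
round(tickets_2016_01_13)
```

With that drawing's own percentage the result lands near 735 million tickets. For a future drawing, the averaged 28.9% (or 26.63% for Mega Millions) would be used instead.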

And once we estimate the number of tickets sold, we can calculate the probability of each possible number of winners and then calculate the expected value of the jackpot. For the big 1/13/2016 Powerball drawing, the calculation assumes 735,465,560 tickets sold and a $594,075,072 cash value jackpot after applying only the 39.6% federal income tax. For this jackpot, a $2 ticket had an expected value of $0.3713 per dollar. Combining this with the non-jackpot prizes gives a total of $0.4679 per dollar, less than the dollar that we spent. For a $3 Powerplay ticket, the jackpot has an expected value of $0.2475 per dollar. After combining with the non-jackpot prizes, we get an expected value of $0.4044 per dollar, also less than the dollar we spent. This says that, for a large jackpot, we are better off not spending the extra dollar on Powerplay. This is in contrast to when we didn't consider any jackpot at all. The table below works through the same steps. Note that its amounts start from $469,201,278.72 rather than $594,075,072, which matches the NYC post-tax rate used in the earlier tables, so it ends at $0.2932 per dollar instead of $0.3713.

| Number of Other Winners | Probability | Assuming I win, how much will I get after splitting with other winners? | Assuming I win, what is my expected value? |
|---|---|---|---|
| 0 | 8.07% | 469,201,278.72 | $37,865,851.28 |
| 1 | 20.31% | 234,600,639.36 | $47,653,836.57 |
| 2 | 25.56% | 156,400,426.24 | $39,981,285.86 |
| 3 | 21.45% | 117,300,319.68 | $25,158,046.01 |
| 4 | 13.50% | 93,840,255.74 | $12,664,470.67 |
| 5 | 6.79% | 78,200,213.12 | $5,312,707.82 |
| 6 | 2.85% | 67,028,754.10 | $1,910,284.41 |
| 7 | 1.02% | 58,650,159.84 | $601,018.97 |
| 8 | 0.32% | 52,133,475.41 | $168,083.78 |
| 9 | 0.09% | 46,920,127.87 | $42,306.39 |
| Total Expected Value | | | $171,357,891.76 |
| Expected Value of $2 ticket (not assuming I win) | | | $0.5864 |
| Expected Value per Dollar | | | $0.2932 |
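
The jackpot calculation can be sketched in R. The sketch assumes the number of other winning tickets follows a Poisson distribution, which is my assumption: the article does not say which distribution it used, although the probabilities in the table above are consistent with it. It uses the federal-tax-only jackpot of $594,075,072 quoted in the text.

```r
tickets      <- 735465560  # estimated tickets sold
combinations <- 292201338  # possible Powerball combinations
jackpot      <- 594075072  # cash value after 39.6% federal tax

# Expected number of winning tickets among everyone else
lambda <- tickets / combinations

# Poisson probability of k other winners, and my share if I also win
k <- 0:50
p_other_winners <- dpois(k, lambda)
my_share <- jackpot / (k + 1)

# Expected payout given that I win, then weighted by my own chance of winning
ev_if_i_win   <- sum(p_other_winners * my_share)
ev_per_ticket <- ev_if_i_win / combinations
ev_per_dollar <- ev_per_ticket / 2  # a Powerball ticket costs $2

round(ev_per_dollar, 4)
```

Under the Poisson assumption the sum has the closed form jackpot * (1 - exp(-lambda)) / lambda, so the loop over k is only there to mirror the table. Replacing the jackpot with 469,201,278.72 and k with 0:9 gives a total close to the table's $171,357,891.76.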

On 1/8/2016, there was a winner of a $165M Mega Millions annuity jackpot. After converting to cash value and taking out only federal income tax, we are left with $61,789,200.00. Using an estimated 41,075,760 tickets sold, we get an expected value of $0.2207 per dollar from the jackpot, for a combined total of $0.3259 for a $1 Mega Millions ticket. For a $2 ticket with the Megaplier, we get $0.3138 per dollar. Here, we are also better off not spending the additional $1 on the multiplier for larger jackpots and instead spending the extra money on more tickets.

Below are plots for the most recent jackpot Powerball and Mega Millions wins. The plots contain the growth of the jackpot along with the growth of expected value. We can see that expected value continued to climb even as the jackpot climbs. Hopefully, one day we will see a jackpot that would give a higher expected value than the cost.

Figure: Powerball Jackpot and Expected Value

Figure: Mega Millions Jackpot and Expected Value

Powerplay and Megaplier

One more important detail is how many people buy the $1 multipliers as the Jackpot increases. It looks like the regular players like to buy the $1 multipliers for both lotteries. But as the jackpot increases, people would rather buy more tickets for a better chance at the jackpot.

While the Powerball curve below includes a very large jackpot, the Mega Millions plot only has medium-sized jackpots. For Mega Millions, I looked at some of the recent large jackpots. The largest jackpot on 3/30/2012 had 5.3% Megaplier purchased. The second largest jackpot on 12/18/2013 had 7.4%, and the most recent large jackpot on 3/18/2014 had 7.7%.

Figure: Powerball Jackpot and Powerplay

Figure: Mega Millions Jackpot and Megaplier

27 Jun

A/B Testing – Nonparametric tests

A/B Testing

Nonparametric Statistics

Most A/B testing platforms use Student’s t-test to test for statistical significance. However, this test has assumptions that need to be met. It also has some known shortcomings. This is where the Mann-Whitney U test comes in handy. It has fewer assumptions and a different set of shortcomings. Instead of using the data directly, this test converts all data points into ranks by combining all test groups into one group and computing the combined rank. It then analyzes the groups separately using the combined rank.
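
To make the ranking step concrete, here is a small sketch that computes the U statistic by hand and compares it with what `wilcox.test()` reports. The two groups are made-up Normal samples used only for illustration.

```r
set.seed(1)
group_a <- rnorm(20, mean = 0)
group_b <- rnorm(25, mean = 1)

# Rank every observation within the combined sample
combined_rank <- rank(c(group_a, group_b))
rank_a <- combined_rank[seq_along(group_a)]

# U statistic for group A: its rank sum minus the smallest possible rank sum
n_a <- length(group_a)
u_a <- sum(rank_a) - n_a * (n_a + 1) / 2

# wilcox.test() reports the same quantity as its W statistic
w <- unname(wilcox.test(group_a, group_b)$statistic)
c(by_hand = u_a, wilcox_test = w)
```

Only the ordering of the values enters the statistic, which is why the actual distances between points do not matter.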

Normality of Mean Assumption

This is not the same as a normality of data assumption. This assumption says that if we hypothetically repeated this test many times and computed the mean each time, then the distribution of the mean is Normal. The justification is the Central Limit Theorem. It states that as you get more and more data points, the distribution of the mean becomes more and more Normal. This is true for almost any set of data (as long as its variance is finite), even if the data itself is not Normally distributed. However, the further your data is from Normal, the more data it takes before the Central Limit Theorem approximation becomes accurate. More details about this are given in the links below.

This assumption is not required of the Mann-Whitney U test. Since this test uses ranks, it removes almost all details of the specific distribution of the data, so its own assumptions are much easier to meet.

So if the data is not Normally distributed, the Mann-Whitney U test is actually more “efficient” than the t-test, and it is almost as “efficient” when the data is Normally distributed. (Reference)

Let’s look at an example. We will use the Gamma distribution with shape=0.2 and rate=10.

x <- seq(0, 3, length=100)
y <- dgamma(x, shape=0.2, rate=10)
plot(x, y, type="n", main="Gamma Density Function (shape=0.2, rate=10)")
lines(x, y)

Figure: Gamma density function (shape=0.2, rate=10)

We can see this distribution is not Normally distributed. Let’s draw a sample of 1000 from this distribution, and also from a slightly different distribution. We will perform both a t-test and a Mann-Whitney U test and get the p-values. Let’s repeat this 100 times and find the mean of the p-values for each test.
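
The helper that runs this experiment is not shown in the post. The sketch below is a hypothetical reconstruction, based on the later call `run_once(1000, 10, 10.4, 10)` and on the shapes 0.2 and 0.24 used in the ties example further down. The argument names and the seed are assumptions.

```r
# Hypothetical reconstruction: draw n points from two Gamma distributions
# that differ only in shape, and return both p-values.
run_once <- function(n, shape1, shape2, rate) {
  data1 <- rgamma(n, shape = shape1, rate = rate)
  data2 <- rgamma(n, shape = shape2, rate = rate)
  c(t.test(data1, data2)$p.value,       # row 1: t-test
    wilcox.test(data1, data2)$p.value)  # row 2: Mann-Whitney U test
}

set.seed(2015)
p.value <- sapply(1:100, function(x) run_once(1000, 0.2, 0.24, 10))
mean(p.value[1, ])  # average t-test p-value
mean(p.value[2, ])  # average Mann-Whitney U p-value
```

`sapply()` returns a matrix with one column per repetition, so row 1 holds the t-test p-values and row 2 the Mann-Whitney U p-values.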

## [1] 0.1681351
## [1] 0.004257654

We can see that the t-test doesn’t have significance with an average p-value of 0.17. The Mann-Whitney U test has significance with an average of 0.004.

Let’s try this again, but with something that looks more Normally distributed. We will use a Gamma distribution with shape parameter=10 and rate parameter=10.

x <- seq(0, 3, length=100)
y <- dgamma(x, shape=10, rate=10)
plot(x, y, type="n", main="Gamma Density Function (shape=10, rate=10)")
lines(x, y)

Figure: Gamma density function (shape=10, rate=10)

Let’s run our experiment again.

p.value <- sapply(1:100, function(x) run_once(1000, 10, 10.4, 10))
mean(p.value[1,])
## [1] 0.04503111
mean(p.value[2,])
## [1] 0.05102875

We can see both tests have about the same significance. The t-test p-value is about 0.045 and the Mann-Whitney U test p-value is about 0.051.

Outliers

The t-test has problems dealing with outliers (Link). Mann-Whitney U test doesn’t suffer from this problem since everything is converted into ranks. Outliers are no different than any slightly large value.

Let’s look at the previous example in the other post, but use the Mann-Whitney U test instead. We will create two data sets with different means. Then we add a single outlier point to one of them.

set.seed(2015)

# Create a list of 100 random draws from a normal distribution 
# with mean 1 and standard deviation 2
data1 <- rnorm(100, mean=1, sd=2)

# Create a second list of 100 random draws from a normal distribution 
# with mean 2 and standard deviation 2
data2 <- rnorm(100, mean=2, sd=2)

# Perform a t-test on these two data sets
# and get the p-value
t.test(data1, data2)$p.value
## [1] 0.0005304826
# Perform a Mann-Whitney-U test on these two data sets
# and get the p-value
wilcox.test(data1, data2)$p.value
## [1] 0.0002706636
# append 1000 to the first data set only
data1 <- c(data1, 1000)

# Perform a t-test on these two data sets
# and get the p-value
t.test(data1, data2)$p.value
## [1] 0.369525
# Perform a Mann-Whitney-U test on these two data sets
# and get the p-value
wilcox.test(data1, data2)$p.value
## [1] 0.0004766358

We can see the Mann-Whitney U Test finds statistical significance before adding the outlier. After we add the outlier, the p-value increases slightly, but the result is still significant. The t-test has significance before the outlier, but after the outlier, the t-test loses significance.

Ties in the values

The Mann-Whitney U test works best if every value is unique. This is normally not a problem for continuous data. If you have many zeros in your data, or you have count data, this will result in many ties. There are various ways to resolve ties, but the results are then approximate rather than exact. Approximate isn’t necessarily bad since the t-test is also approximate if the data is not Normally distributed.

Let’s repeat our first test, which found statistical significance with the Mann-Whitney U test (wilcox.test in R). This time, we will round off our values to create ties and see how the test performs.

set.seed(2015)

run_once <- function() {
    # Create a list of 1000 random draws from a Gamma distribution
    # with shape=0.2 and rate=10
    data1 <- rgamma(1000, shape=0.2, rate=10)
    
    # Create a second list of 1000 random draws from a Gamma
    # distribution with shape=0.24 and rate=10
    data2 <- rgamma(1000, shape=0.24, rate=10)
    
    # Perform a Mann-Whitney U test on the unrounded data
    p_exact <- wilcox.test(data1, data2)$p.value
    
    # Repeat after rounding both data sets to the nearest 1/500
    p_500 <- wilcox.test(round(data1*500)/500, round(data2*500)/500)$p.value
    
    # Repeat after rounding both data sets to the nearest 1/100
    p_100 <- wilcox.test(round(data1*100)/100, round(data2*100)/100)$p.value
    
    # Repeat after rounding both data sets to the nearest 1/25
    p_25 <- wilcox.test(round(data1*25)/25, round(data2*25)/25)$p.value
        
    # Repeat after rounding both data sets to the nearest 1/5
    p_5 <- wilcox.test(round(data1*5)/5, round(data2*5)/5)$p.value

    # One p-value per rounding level, from finest to coarsest
    c(p_exact, p_500, p_100, p_25, p_5)
}

p.value <- sapply(1:100, function(x) run_once())
# mean p-value, continuous (unrounded) data
mean(p.value[1,])
## [1] 0.004257654
# mean p-value, data rounded to the nearest 1/500
mean(p.value[2,])
## [1] 0.01127112
# mean p-value, data rounded to the nearest 1/100
mean(p.value[3,])
## [1] 0.02937118
# mean p-value, data rounded to the nearest 1/25
mean(p.value[4,])
## [1] 0.08112481
# mean p-value, data rounded to the nearest 1/5
mean(p.value[5,])
## [1] 0.3527484

We can see here that the average p-value is 0.004 before we start rounding off the data. As we round off the data more and more, we create more and more ties and we can see that we lose significance.

Additional Information

http://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test

http://www.statisticalengineering.com/central_limit_theorem.htm

http://spin.atomicobject.com/2015/02/12/central-limit-theorem-intro/

Conclusion

If your data is not Normally distributed or contains outliers, consider using the Mann-Whitney U test.

13 Jun

A/B Testing - Common Mistakes - Outliers

A/B Testing: Common Mistakes

Exploratory Data Analysis: Outliers

Outliers can easily cause problems with your A/B test. You may have seen strange anomalies with your data metrics, a particular metric being too high or too low compared to the others. You may have seen your statistical test first start significant and then become not significant. These problems may be coming from outliers in your data.

Let’s look at an example in R. We will create two data sets, the first with a smaller mean than the second.

set.seed(2015)
# Create a list of 100 random draws from a normal distribution 
# with mean 1 and standard deviation 2
data1 <- rnorm(100, mean=1, sd=2)
# Create a second list of 100 random draws from a normal distribution 
# with mean 2 and standard deviation 2
data2 <- rnorm(100, mean=2, sd=2)
# Perform a t-test on these two data sets and get the p-value
t.test(data1, data2)$p.value
## [1] 0.0005304826

We can see this t-test gives a p-value of 0.0005, which is a significance level of 99.95%. Now let’s add a single outlier to the first data set.

# append 1000 to the first data set only
data1 <- c(data1, 1000)
# Perform a t-test on these two data sets and get the p-value
t.test(data1, data2)$p.value
## [1] 0.369525

Now you can see that, even though we had 100 points in each data set, a single large outlier caused our result to become non-significant, with a significance level of only 63%.

How do we fix this?

There are multiple ways to fix this problem. Student’s t-test is not robust against outliers and we can run Mann-Whitney U test instead. A deeper discussion of this approach is outside the scope of this article.

We can also detect the outliers and consider removing them. This approach needs to be taken very carefully. We should only remove data that does not come from our target population. If we see one data point that is an outlier, it may be that we were unlucky to see such a strange data point. However, it may also be that we were unlucky to see only one such data point. Therefore, outliers need to be removed only after careful examination. For A/B testing, this usually means removing data that is coming from bots and not humans. This can be difficult because not all bots and scripts report their User Agent properly.

Let’s look at some diagnostic tools with R. We will create 100 points from the same distribution. Then, we will add 2 outliers to our data set.

set.seed(2015)
# Create a list of 100 random draws from a normal distribution 
# with mean 1 and standard deviation 2
data <- rnorm(100, mean=1, sd=2)

# let's add two outliers
data <- c(data, 20)
data <- c(data, 40)

# Create an image with two plots side by side
par(mfrow=c(1, 2))
hist(data)
boxplot(data, main="Box plot of data")

Figure: histogram (left) and box plot (right) of data

On the left is a histogram. We can see the outlier at 40. It is more questionable whether 20 is an outlier. For the boxplot on the right, the box itself spans the 25th to 75th percentiles of the data. The thick line in the middle of the box is the median. The “whiskers” at the top and bottom of the plot are the min and max of the data, excluding “outliers”. A good explanation of outliers in box plots in R can be found at the bottom of this page: http://msenux.redwoods.edu/math/R/boxplot.php
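
If you want the numbers behind the box plot rather than the picture, `boxplot.stats()` returns them. A short sketch on the same data follows. By default R marks a point as an “outlier” when it lies more than 1.5 times the box length beyond the box.

```r
set.seed(2015)
data <- c(rnorm(100, mean=1, sd=2), 20, 40)

# boxplot.stats() returns the numbers behind the plot without drawing it
stats <- boxplot.stats(data)

stats$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
stats$out    # points drawn individually as "outliers"
```

Both 20 and 40 appear in `stats$out`, possibly alongside a few extreme draws from the Normal sample itself.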

So now we have found a few outliers in our data. Remember, it is important to carefully consider each point before removing it, since we could easily have seen more data at that point rather than only one. One technique to try is to perform the test again with the point removed. If the test gives the same result, then we might as well leave the data point in.

Outliers in more than one dimension

If your data contains two variables, there is another type of outlier to look for. Let’s look at this plot. It has 100 points again, but with two correlated variables. Then, we add a single outlier.

set.seed(2015)
# Create a list of 100 random draws from a normal distribution 
# with mean 1 and standard deviation 2
data1 <- rnorm(100, mean=1, sd=2)

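# Let's create a second correlated variable.
# Note: these weights give a population correlation close to, but not
# exactly, 0.95; sqrt(1 - correlation^2) would be needed for exactly 0.95.
correlation <- 0.95
data2 <- correlation * data1 + sqrt(1-correlation) * rnorm(100, mean=1, sd=2)

# Let's add our outlier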
data1 <- c(data1, -3)
data2 <- c(data2, 4)

par(mfrow=c(1, 1))
plot(data1, data2)

Figure: scatter plot of data1 versus data2

Here most of the data lies close to the lower-left to upper-right diagonal. We have a single point on the upper left of the plot. In any single dimension that particular point is right in the range of the data. But combined in two dimensions, it becomes an outlier.

We can find this point by computing something called Leverage. Though this is generally used to find outliers during linear regression, we can use it here to help detect some outliers.

# create a matrix with our two data sets
data_matrix <- matrix(c(data1, data2), nrow=101, ncol=2)
tail(data_matrix)
##              [,1]        [,2]
## [96,]   0.1888523  0.37660990
## [97,]  -2.3504425 -2.08643945
## [98,]   0.9110532  0.06500559
## [99,]  -1.0946785 -1.09385952
## [100,] -2.4602479 -2.40095114
## [101,] -3.0000000  4.00000000
# leverage is also known as hat values
leverage <- hat(data_matrix)
tail(leverage)
## [1] 0.01121853 0.03700946 0.02901691 0.02292416 0.04241248 0.71791422

Above are the last 6 rows of the matrix and you can see our outlier as the last point. The corresponding leverage values are also given. You can see the very high leverage value for the last point. As a rule of thumb, leverage values that exceed twice the average leverage value should be examined more closely. However, for an A/B test, we have many observations and a wide range of leverage values. In this case, I would start by examining the highest leverage points and work my way down.
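
The rule of thumb can be applied with a few lines of R. The sketch below rebuilds the same data as above (using `cbind()` instead of `matrix()` for brevity) and lists the flagged points from highest leverage to lowest. The names `threshold` and `flagged` are chosen for this example.

```r
set.seed(2015)
data1 <- rnorm(100, mean=1, sd=2)
correlation <- 0.95
data2 <- correlation * data1 + sqrt(1-correlation) * rnorm(100, mean=1, sd=2)
data_matrix <- cbind(c(data1, -3), c(data2, 4))

leverage <- hat(data_matrix)

# Rule of thumb: flag points whose leverage exceeds twice the average
threshold <- 2 * mean(leverage)
flagged <- which(leverage > threshold)

# Review the flagged points from highest leverage to lowest
flagged[order(leverage[flagged], decreasing = TRUE)]
```

Point 101, the outlier we added, heads the list. Any other flagged points are ordinary draws that happen to sit at the edges of the data.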

Alternatives

As mentioned above, Student’s t-test is not very robust to outliers. There are other tests that are more robust to outliers and are based on each observation’s rank instead of its actual value. You can start by looking at the Mann-Whitney U test and enter the world of non-parametric statistics.

http://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test

http://en.wikipedia.org/wiki/Nonparametric_statistics

http://www.originlab.com/index.aspx?go=Products/Origin/Statistics/NonparametricTests

http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/nonparametrics-tests/understanding-nonparametric-tests/

http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Nonparametric/BS704_Nonparametric2.html

Additional Information

Here are some other links about leverage and other types of outlier detection

http://en.wikipedia.org/wiki/Leverage_(statistics)

http://en.wikipedia.org/wiki/Outlier

http://onlinestatbook.com/2/regression/influential.html

http://pages.stern.nyu.edu/~churvich/Undergrad/Handouts2/31-Reg6.pdf

Conclusion

Bots and scripts can cause problems in your A/B tests. It is important to try to detect these users in your data and remove them since they do not represent your target population.