A/B Testing - Basics - 95% Significance
What does 95% significance level mean anyway?
Please read this other article for a technical explanation of 95% significance. For this article, I’d like to give a more intuitive explanation that was once given to me.
To develop this intuition, we’re going to start flipping coins. However, we're unsure if this coin is a fair coin or not. This is an imperfect analogy since we rarely see fake coins, but this should help develop the intuition. So let’s start flipping the coin and suppose the first coin flip is heads. At this point, you would have no reason to believe the coin is a fake. The chance of getting heads on the first coin flip is 50%, which is also 50% significance.
Suppose the second coin flip is also heads. You’d probably still think the coin is real. The chance of getting two heads in a row is 25%, which is 75% significance.
Suppose the third and fourth coin flips are also heads. You might start to get suspicious, but you would probably have a hard time saying for certain that the coin is fake. The chance of getting four heads in a row is 6.25%, which is 93.75% significance.
Suppose the fifth and sixth coin flips are also heads. Now you’re probably wanting to take a good look at that coin. The chance of getting 5 heads in a row is 3.125% (96.875% significance) and the chance of getting 6 heads in a row is 1.5625% (98.4375% significance). Five heads in a row is the first point at which we pass the 95% significance level that is commonly used for tests. I suggest sticking with 95% significance or higher.
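The numbers above can be reproduced with a few lines of Python. This is only a minimal sketch of the coin analogy: the function names are made up for illustration, and "significance" is used in the same informal sense as in this article, meaning one minus the chance of seeing the streak from a fair coin.

```python
def chance_of_heads_streak(n_flips: int) -> float:
    """Chance that a fair coin lands heads n_flips times in a row."""
    return 0.5 ** n_flips


def streak_significance(n_flips: int) -> float:
    """Informal 'significance' used in this article: 1 - chance of the streak."""
    return 1.0 - chance_of_heads_streak(n_flips)


# Print the chance and the matching significance for streaks of 1 to 6 heads.
for n in range(1, 7):
    chance = chance_of_heads_streak(n)
    significance = streak_significance(n)
    print(f"{n} heads in a row: chance {chance:.4%}, significance {significance:.4%}")
```

Each extra head halves the chance of the streak, which is why the significance climbs quickly at first and then more and more slowly.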
However, if you tell yourself that you’d be happy to decide after seeing only 3 or 4 heads, consider this situation. Your company is running an A/B test and everyone is unsure whether it is successful. It may or may not be losing money each day. But once you decide, the feature may or may not be profitable for the remainder of the life of the product. How many heads would you wait for before declaring an answer? Are you actually willing to take that risk after seeing only 4 heads in a row?
Why 95%?
This is the significance level that is commonly used for experiments across many areas of science. However, it is not the only one. The experiments hunting for the Higgs boson, for example, look for a significance of 99.99997133%. But getting to this level of significance takes lots and lots of data. So while 95% has become a standard, the right level depends on how much certainty you’re looking for and how long you’re willing to wait before declaring the test over.
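To get a rough feel for how much more evidence a stricter level demands, the coin analogy can be run in reverse: given a significance level, how many heads in a row would it take to reach it? The sketch below assumes the same informal definition used in this article (one minus the chance of the streak from a fair coin), and the function name is hypothetical. A real A/B test needs more than a streak count, so treat this purely as intuition.

```python
def heads_needed(significance_level: float) -> int:
    """Smallest number of heads in a row whose chance under a fair coin
    is at most 1 - significance_level (coin analogy only)."""
    if not 0.0 < significance_level < 1.0:
        raise ValueError("significance_level must be strictly between 0 and 1")
    allowed_chance = 1.0 - significance_level
    n_flips = 1
    # Keep adding flips until the streak is rare enough.
    while 0.5 ** n_flips > allowed_chance:
        n_flips += 1
    return n_flips


# Compare the levels mentioned in this article.
for level in (0.90, 0.95, 0.99, 0.9999997133):
    print(f"{level:.8%} significance: {heads_needed(level)} heads in a row")
```

In this analogy, 90% takes 4 heads, 95% takes 5, 99% takes 7, and the Higgs-style level takes 22 heads in a row.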
Lastly, consider this. You don’t need to defend using a 95% significance level since it is standard. It’ll be only slightly harder to defend a 99% significance level. It’ll be hardest to defend a 90% significance level.
Conclusion
So the intuition for 95% significance is that it is approximately equal to seeing 5 heads in a row.