3 Big Reasons Why A/B Tests Fail

A/B tests are increasingly popular and can provide great value to an organization. An A/B test is conducted to understand the effect of a change to a system or environment. This type of testing has been done for many decades; in academia and research organizations it is known as experimental design. Good A/B tests should therefore follow sound research methodology while remaining cognizant of practical limitations, such as constraints on sample size and sample construction and the lack of a controlled environment like a laboratory. There are three big reasons why many A/B tests fail to yield the expected results.

1. Failure to create hypotheses

A good design starts with a premise that will be tested. Teams shouldn’t conduct a test without knowing what the constructs of the test are, i.e. the parameters by which one concludes that a manipulation is the actual cause of the predicted change. A well-designed test states the expected results up front, formulates a hypothesis around those predicted results, and then collects the appropriate data for testing it.

For example, if you are changing a feature on a website and want to see whether that change increases navigation clicks through your site, you need to identify which data points are traceable to the change and assess whether the data you collect actually bears on the hypothesis. You must make sure that the data you collect can be attributed to the actual change; otherwise you increase the likelihood that some other factor is affecting your results.

2. Sample collection not representative of the population

It is extremely important in any test that your sample be representative of the population. When identifying a sample, you must make sure that the sample points represent the whole set of your customers or the general population you care about. To do this, examine your sample to ensure that the proportions of the groups that matter to your test, such as political affiliation, gender, or computer operating system, are similar to the proportions expected in the population.

One way to achieve this is to use generally available data. For example, if you are examining political affiliation, the percentage of voters affiliated with each political party is generally known. Those percentages give you the expected values against which to compare your sample. Using a Chi-Squared Test, you can check whether the group proportions in your sample differ significantly from those of the population. Here is a quick video on the Chi-Squared Test and an example in R.
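As a rough illustration of that comparison, here is a minimal Python sketch of a chi-squared goodness-of-fit check using only the standard library. The function names (`chi2_sf`, `goodness_of_fit`), the three group labels implied by the lists, and every number in the example are hypothetical and chosen purely for illustration; they are not real affiliation figures.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable
    with integer degrees of freedom df >= 1 (closed-form series)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term = math.exp(-half)
        total = term
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return total
    # Odd df: two-sided normal tail plus a finite correction series
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total

def goodness_of_fit(observed, population_shares):
    """Compare observed group counts in a sample with known population shares.
    Returns the chi-squared statistic and its p-value."""
    if abs(sum(population_shares) - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    n = sum(observed)
    expected = [n * share for share in population_shares]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, chi2_sf(stat, df)

# Hypothetical numbers: sample counts for three groups, and the
# shares of those same groups assumed to hold in the population.
sample_counts = [420, 330, 250]
population_shares = [0.40, 0.35, 0.25]

stat, p_value = goodness_of_fit(sample_counts, population_shares)
print(f"chi-squared = {stat:.3f}, p-value = {p_value:.3f}")
if p_value < 0.05:
    print("Sample proportions differ from the population; revisit sampling.")
else:
    print("No evidence that the sample differs from the population.")
```

A large p-value does not prove the sample is representative; it only means this test found no significant mismatch on the groups you chose to compare.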

3. Using the wrong methodology

There are a number of techniques available for use in A/B testing. Software packages such as Optimizely, SiteGainer and VWO offer different methods that make it easy to run your A/B tests, but the tester must still be able to identify which methodology should be employed.

A/B testers should not rely on just one method for their testing; they should be knowledgeable in both Bayesian methods and frequentist inference.

Bayesian analysis tests a manipulation against a control and keeps updating the analysis as the sample size grows, until a winner (between the control and the manipulation) is chosen. In this case, the probability of the event occurring is calculated and the magnitude of the effect can be estimated. For business applications this is a very practical approach. Other methods, such as frequentist inference, are commonly used in experimental design. In these methods, a distribution for the outcome is assumed, and inferences are made about whether the manipulation differs from the control. These tests take the form of Analysis of Variance (ANOVA), t-tests, or, when the distributional assumptions do not hold, non-parametric tests such as the Kruskal-Wallis test.
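To make the contrast concrete, here is a small Python sketch that analyzes the same hypothetical conversion data both ways. Several things here are assumptions rather than anything prescribed above: the Bayesian side uses a Beta-Binomial model with a uniform Beta(1, 1) prior and Monte Carlo sampling, and the frequentist side uses a two-proportion z-test (substituted for the t-test or ANOVA because the example metric is a conversion rate). The function names and visitor counts are invented for illustration.

```python
import math
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Bayesian view: estimate P(rate_B > rate_A) by sampling from the
    Beta posteriors of each arm (uniform Beta(1, 1) prior assumed)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Frequentist view: two-sided z-test for a difference in proportions.
    Returns the z statistic and the p-value."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical results: control (A) and manipulation (B),
# each shown to 2000 visitors.
conv_a, n_a = 200, 2000
conv_b, n_b = 250, 2000

posterior_prob = prob_b_beats_a(conv_a, n_a, conv_b, n_b)
z, p_value = two_proportion_z_test(conv_a, n_a, conv_b, n_b)

print(f"Bayesian: P(B beats A) is about {posterior_prob:.3f}")
print(f"Frequentist: z = {z:.2f}, p-value = {p_value:.4f}")
```

The two outputs answer different questions: the first estimates how likely it is that the manipulation is better, while the second says how surprising the observed difference would be if there were no real difference at all.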

This article is just a quick overview of A/B testing and some of the ways in which A/B testing fails at organizations. Testers should understand the available methods and become familiar with experimental design in order to create tests that are meaningful for their organization.