
7 Unexpected A/B Testing Mistakes That Could Skew Your Results

A/B testing is one of the most widely adopted forms of testing for websites, software, and apps. It's ubiquitous because the tests are relatively simple to execute and can provide in-depth insights into customer behavior.

In turn, conversion rates often rise. Rather than relying merely on a hunch, the tangible data that’s generated allows you to fine-tune your product to better meet your customers’ needs.

But for A/B testing to truly be effective, you must follow the right set of procedures and avoid making some critical mistakes.

Here are seven mistakes, in particular, that could skew your results and how to prevent them from happening.

1. Using an inadequate sample size

This is perhaps the biggest mistake testers make, pound for pound. Failing to run your test on a large enough sample size undermines your results and can give you false confidence.

In fact, an inadequate sample size is one of the main factors of “sample pollution,” which invalidates your data. Some people may feel that a sample size of 100 per variation is sufficient for getting solid results, but that’s seldom the case. Generally speaking, you’ll want a sample size of at least 1,000.

So just how big of a sample size should you aim for when conducting your specific A/B test?

This Sample Size Calculator is a helpful tool for finding out. Simply enter your baseline conversion rate, minimum detectable effect and statistical significance (95 percent or higher is ideal), and it will tell you what your minimum sample size should be.

This should ensure that you don’t run into any bad data or fuzzy conclusions. That way you can be confident in the results and know whether a variant is having a positive, negative or neutral impact.
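To make the calculation less of a black box, here is a minimal sketch of the kind of arithmetic such a calculator might perform. It uses the standard normal-approximation formula for comparing two conversion rates; the function name `min_sample_size`, its parameters, and the 80 percent default for statistical power are assumptions made for illustration, not the formula of any particular tool.

```python
import math
from statistics import NormalDist


def min_sample_size(baseline_rate, relative_mde, significance=0.95, power=0.80):
    """Approximate visitors needed per variation (two-sided test).

    baseline_rate: current conversion rate, e.g. 0.10 for 10 percent
    relative_mde:  smallest relative lift worth detecting, e.g. 0.20
    significance:  desired confidence level (assumed default: 95 percent)
    power:         chance of detecting a real effect (assumed default: 80 percent)
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)
    alpha = 1 - significance

    # Critical values from the standard normal distribution.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)

    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_power * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


if __name__ == "__main__":
    # 10 percent baseline, looking for a 20 percent relative lift.
    print(min_sample_size(0.10, 0.20))
```

With a 10 percent baseline and a 20 percent relative lift, this approximation lands in the thousands of visitors per variation, which is why 100 per variation is seldom enough. Different calculators make different assumptions, so expect their numbers to differ somewhat.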

2. Cutting a test short

Here’s the scenario. You’ve got a hypothesis that you’re nearly certain is correct. After running an A/B test for a little while, the results are validating your hypothesis.

So what do you do? You go ahead and cut the test short and make changes based on your hypothesis.

But after making those changes, you come to find that you’re not getting the results you anticipated and the conversion rate is virtually the same. Or even worse, the conversion rate has actually dropped.

This is a common rookie mistake that’s made all of the time. And it’s understandable, especially when you’re chomping at the bit to increase conversions.

But it’s vital that you don’t fall into this trap because it’s likely to skew your data. This brings up an important question.

What should the duration of your test be?

It depends on the volume of traffic/users you’re getting, but it really boils down to one thing: statistical confidence. Most CRO experts agree that you’ll want to achieve at least 95 percent statistical confidence before ending a test. In other words, if there were truly no difference between the variations, a result this strong would show up by chance only about five percent of the time.

ConversionXL offers some solid advice for determining testing duration and that’s to aim for at least 350 – 400 conversions per variation. At that point, you can be pretty confident that your data is valid.
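As a rough illustration of how the two stopping criteria fit together, the sketch below checks both the confidence level and the conversion count before calling a test finished. It uses a pooled two-proportion z-test, which is one common way to estimate confidence and not necessarily what your testing tool does; the names `ab_confidence` and `ready_to_stop` are hypothetical.

```python
import math
from statistics import NormalDist


def ab_confidence(conv_a, n_a, conv_b, n_b):
    """Two-sided confidence (1 - p-value) that A and B really differ."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0  # no conversions at all (or all converted): nothing to compare
    z = (conv_b / n_b - conv_a / n_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return 1 - p_value


def ready_to_stop(conv_a, n_a, conv_b, n_b,
                  min_conversions=350, min_confidence=0.95):
    """True only when BOTH thresholds from the text are met."""
    enough_conversions = min(conv_a, conv_b) >= min_conversions
    confident = ab_confidence(conv_a, n_a, conv_b, n_b) >= min_confidence
    return enough_conversions and confident


if __name__ == "__main__":
    # Early peek: the lift looks promising but the data is thin.
    print(ready_to_stop(40, 400, 48, 400))
    # Same rates after ten times the traffic.
    print(ready_to_stop(400, 4000, 480, 4000))
```

The same 10 percent versus 12 percent conversion rates fail the check on 400 visitors per variation and pass it on 4,000, which is the practical argument against cutting a test short.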

3. Running too many tests at once

A/B testing can be addicting. Once you discover the impact it can have, it’s tempting to go overboard with your tests and run multiple ones at the same time.

But you can quickly run into problems with this approach. Why?

There are two reasons. First, it’s easy to get overwhelmed with all of the data, and the results of each individual test may be undermined.

Second, it requires a larger sample size. The more variations you run, the higher the volume of traffic/users you need to generate reliable data.

This is why most seasoned CRO experts recommend running an absolute maximum of four tests at any given time. But if you’re just starting out and getting your feet wet, that number should likely be lower.

In fact, if you’re completely new to the process, it’s wise to stick with just one test until you get the hang of things.

4. Not taking seasonality into account

Say you’re an online retailer looking to A/B test a product page. If you run one test in December, when sales are often at an annual high, and another test in January, when they’re often sluggish, the results could be dramatically different.

It’s like comparing apples to oranges, and you simply can’t trust the validity of your results. So you need to always be aware of the time periods in which you test.

To prevent this issue, make sure you’re always comparing results from comparable time periods, so that you know a difference comes from the change you made and not from the calendar.

5. Biased sampling

Random sampling is absolutely essential for attaining valid results. What exactly do we mean by random sampling?

Business Dictionary provides a great definition. “A sampling method in which all members of a group have an equal and independent chance of being selected.”

This simply means that all of the individuals included in your test are chosen at random, which should ultimately provide you with accurate data.

The opposite of this is biased sampling where you choose a sample from the population in a manner where some of the representatives of the population are less likely to be included than others. Here’s an example.

Say that you’re an online retailer and run your test only on weekdays, skipping the weekend. This will likely skew your results because the people who prefer to shop during the week have a higher probability of being included in your sample than those who shop on the weekend.

The same is true if you only run your test for a portion of a business cycle. This is problematic because you’re not getting the full picture and may come to false conclusions.

Fortunately, there’s an easy solution and that’s to run your A/B tests during every day of the week and during an entire business cycle. Doing so helps ensure that your sample is an accurate representation of your visitors as a whole.
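One way to put this into practice is to round the planned duration up to whole business cycles. The sketch below assumes a seven-day cycle and steady daily traffic; `planned_duration_days` and its parameters are hypothetical names used only for illustration.

```python
import math


def planned_duration_days(sample_per_variation, variations,
                          daily_visitors, cycle_days=7):
    """Days to run a test, rounded UP to whole business cycles.

    cycle_days defaults to 7 (one week); use a longer value if your
    business cycle is longer.
    """
    total_needed = sample_per_variation * variations
    raw_days = math.ceil(total_needed / daily_visitors)
    # Always cover at least one full cycle, even with heavy traffic.
    cycles = max(1, math.ceil(raw_days / cycle_days))
    return cycles * cycle_days


if __name__ == "__main__":
    # 3,841 visitors per variation, two variations, 1,000 visitors a day.
    print(planned_duration_days(3841, 2, 1000))  # 8 raw days -> 14
```

Rounding up rather than stopping on the day the sample target is hit means every day of the week is represented the same number of times.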

6. Not taking behavioral momentum into account

Behavioral momentum is defined as, “the general relation between resistance to change (persistence of behavior) and the rate of reinforcement obtained in a given situation.”

So how does this relate to A/B testing? Behavioral momentum can lead to visitor pollution, which can quickly skew your results. Here’s an example.

Say that you’re experimenting with a new layout that’s more intuitive and less complicated than the original one. Even though most objective visitors would view it as an improvement (which should increase conversions), you may find that many of your returning visitors will still prefer the original simply because humans are naturally resistant to change.

This means that you can’t necessarily trust the results on account of behavioral momentum.

The solution?

Make sure you’re only A/B testing new visitors and exclude returning visitors. This ensures a level of objectivity and more reliable results.
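If your tool doesn’t offer this as a targeting option, the same idea can be applied when analyzing the results. The sketch below assumes you can export visits with a flag marking returning visitors; the `Visit` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    visitor_id: str
    is_returning: bool  # assumed to come from your analytics export
    variant: str        # e.g. "A" or "B"
    converted: bool


def new_visitor_conversion_rates(visits):
    """Conversion rate per variant, counting new visitors only."""
    totals = {}
    conversions = {}
    for visit in visits:
        if visit.is_returning:
            continue  # skip visitors who already know the old layout
        totals[visit.variant] = totals.get(visit.variant, 0) + 1
        conversions[visit.variant] = (
            conversions.get(visit.variant, 0) + int(visit.converted)
        )
    return {variant: conversions[variant] / totals[variant] for variant in totals}


if __name__ == "__main__":
    sample = [
        Visit("u1", False, "A", True),
        Visit("u2", False, "A", False),
        Visit("u3", True, "A", True),   # returning visitor, ignored
        Visit("u4", False, "B", True),
    ]
    print(new_visitor_conversion_rates(sample))
```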

7. Cross-device pollution

The use of mobile devices has spiked dramatically in recent years. As a result, it’s increasingly common for people to use multiple devices to complete a single task over time.

A study by Google even found that 90 percent of people do this in some capacity.

For instance, a customer may initially come across your product while using their smartphone. At that point, they may want to see the full scope of features and move to their laptop for a larger viewing screen.

Unfortunately, this presents a problem for A/B testing because it can lead to cross-device pollution, where a single visitor is counted as two or more different visitors. In turn, it’s hard to identify which variation was responsible for the conversion.

While cross-device pollution may only have a minimal impact on the micro scale, it can completely ruin your results and negate your overall efforts when it happens on a large scale.

Now there are two ways to prevent this from happening. One is to simply perform each A/B test on a single device (e.g. one for desktop, one for tablets and one for smartphones).

Another option is to use a platform that assigns users a user ID so that you can follow their path from device to device throughout the entire duration of their journey. Google Analytics actually has a feature called Cross-Device reports that allows you to do this with ease.
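To show why a user ID helps, here is a minimal sketch of ID-based assignment. It is not how Google Analytics or any specific platform works internally; it simply hashes a logged-in user’s ID so the same person sees the same variation on every device, and counts converters by user instead of by device. The function names and the event format are assumptions.

```python
import hashlib


def assign_variant(user_id, experiment, variants=("A", "B")):
    """Deterministically map a user ID to a variant.

    The same user_id and experiment name always give the same variant,
    no matter which device the request comes from.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(variants)
    return variants[bucket]


def unique_converters(events):
    """Set of user IDs that converted on any device.

    events: iterable of (user_id, device, converted) tuples.
    """
    return {user_id for user_id, _device, converted in events if converted}


if __name__ == "__main__":
    # One person browsing on a phone, then buying on a laptop.
    print(assign_variant("user-42", "new-layout"))
    print(assign_variant("user-42", "new-layout"))  # same variant again
    events = [
        ("user-42", "phone", False),
        ("user-42", "laptop", True),
    ]
    print(len(unique_converters(events)))  # one visitor, not two
```

This only works for visitors you can identify, typically because they are logged in; anonymous visitors still have to be handled per device.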

Maximizing the quality of your data

When done correctly, A/B testing can be incredibly beneficial and increase your conversion rate considerably. But you need to be aware of some common testing mistakes that can thwart your progress.

The seven issues listed here should definitely be on your radar, as they can seriously skew your results and make your A/B testing all for naught. But as long as you’re aware of these mistakes and alter your game plan accordingly, you should be in good shape.

What other factors concern you with A/B testing and your ability to generate accurate reports? Please let us know in the comments below.

