Highlights

I sometimes ask this question in interviews, and many data scientists name cookie deletion as the main downside of running a test for too long. If the test runs long enough, you may start serving the variant to control users and vice versa, which can change the test outcome.

An A/B test is an experiment, and any experiment is designed to be short or time-bounded because its results are highly context-dependent.

If we have already invested in a particular test by doing all the steps and checks mentioned above, why ignore one of its most important benefits: a short timeline?

What may happen if the test runs for too long, and why it is bad:

• Overlapping tests: The experiment will likely overlap with other tests, so its impact compounds with their lift or decline. This overlap makes the test outcomes difficult to interpret accurately.
• Sample pollution: The longer your test runs, the more likely your sample becomes non-representative due to factors like new promotions and campaigns, holidays, seasonal effects, and site outages. These influences can lead to inconclusive tests or, worse, incorrect test outcomes.
• Cookie deletion: As mentioned earlier, users tend to delete or clear cookies. For website A/B tests, cookies identify user attributes and place users in the appropriate variants based on the targeting rules. When cookies are deleted, a user may be shown two different variants, which can (a) confuse the user and (b) make the test unreliable (the sketch after this list simulates the effect).
• Users changing their attributes: New users become returning users, trial users convert into subscribers, churned customers re-subscribe, and so on. As users lose or gain attributes, they move between different tests. The longer the test runs, the higher the probability of unqualified users entering it, increasing variance and leading to inconclusive test outcomes.
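To make the cookie-deletion point concrete, here is a minimal simulation (all rates, sample sizes, and names are made-up illustrations, not from the article): when some share of users is cross-exposed, each arm's observed conversion rate becomes a mixture of the two experiences, pulling the measured lift toward zero.

```python
import numpy as np

rng = np.random.default_rng(42)

def measured_lift(contamination, n=200_000, p_control=0.10, p_variant=0.11):
    """Simulate a two-arm conversion test where `contamination` is the
    share of users in each arm actually exposed to the other arm's
    experience (e.g. after clearing cookies)."""
    # Each arm's effective rate is a mixture of the two experiences.
    p_c = (1 - contamination) * p_control + contamination * p_variant
    p_v = (1 - contamination) * p_variant + contamination * p_control
    control = rng.binomial(1, p_c, n).mean()
    variant = rng.binomial(1, p_v, n).mean()
    return (variant - control) / control

for c in [0.0, 0.1, 0.3, 0.5]:
    print(f"contamination={c:.0%}  measured lift ~ {measured_lift(c):+.2%}")
```

At 50% contamination the two arms are statistically identical, so the true 10% lift becomes unmeasurable no matter how long the test runs.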

Avoid running experiments on complex user groups with many properties. The more attributes you introduce, the more complex the experiment becomes.
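A rough sketch of why this also stretches the timeline (the traffic numbers and effect sizes below are hypothetical): the standard two-proportion sample-size formula fixes how many users you need, and every extra attribute filter shrinks the daily pool of eligible users, so the same test takes longer to finish.

```python
from scipy.stats import norm

def n_per_arm(p_base, lift, alpha=0.05, power=0.8):
    """Standard two-proportion sample size (normal approximation)."""
    p1, p2 = p_base, p_base * (1 + lift)
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_a + z_b) ** 2 * var / (p2 - p1) ** 2

n = n_per_arm(p_base=0.10, lift=0.05)  # detect a 5% relative lift
daily_users = 20_000                   # hypothetical total daily traffic
for share in [1.0, 0.3, 0.05]:         # each attribute filter shrinks this
    days = 2 * n / (daily_users * share)
    print(f"eligible share {share:>5.0%} -> ~{days:.0f} days to finish")
```

Narrowing the audience from all traffic to a 5% slice turns a roughly one-week test into a multi-month one, with all the duration risks listed above.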

Avoid slow and disproportionate rollouts. For example, once the test has been released to 10% of traffic, do not reduce its size (see the bucketing sketch below).
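One common reason for this rule is deterministic hash-based bucketing; the sketch below (function name and salt are hypothetical, not from the article) shows that ramping traffic up preserves existing assignments, while ramping it down ejects users who have already been exposed.

```python
import hashlib

def in_test(user_id: str, traffic_pct: int, salt: str = "exp-42") -> bool:
    """Deterministically map a user to a bucket in [0, 100)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < traffic_pct

users = [f"user-{i}" for i in range(10_000)]
at_10 = {u for u in users if in_test(u, 10)}
at_50 = {u for u in users if in_test(u, 50)}

# Ramping up is safe: everyone assigned at 10% stays assigned at 50%.
print(at_10 <= at_50)       # True
# Ramping back down to 10% would eject most of the 50% cohort,
# mixing already-exposed users back into the untested population.
print(len(at_50 - at_10))   # thousands of users would drop out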

You will most likely need to re-run the same test multiple times to confirm its outcome.
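If the reruns draw independent samples, their p-values can be combined formally, for example with Fisher's method. This is a sketch with made-up p-values; note that reruns on overlapping user populations are not independent, so treat the combined result as an approximation.

```python
from scipy.stats import combine_pvalues

# p-values from three hypothetical reruns of the same test
reruns = [0.04, 0.09, 0.06]

# Fisher's method: individually borderline results can still be
# jointly significant when they consistently point the same way.
stat, p_combined = combine_pvalues(reruns, method="fisher")
print(f"combined p-value: {p_combined:.4f}")
```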