A really interesting paper on A/B testing and experiments in online environments just got accepted to KDD 2012:
- Don’t make changes to your application if your average customer’s lifetime value will decline. Understand the change, consider alternative hypotheses, and watch several metrics. Make sure your findings align with the long-term strategy, so that long-term growth is not sacrificed for short-term financial gain. Example: Bing once had a bug that served poor search results, and as a result distinct queries went up 10% and CTR on advertisements went up 30%. Both short-term metrics improved even though users were worse off.
- Ensure that your statistical results are trustworthy. Incorrect results may cause bad ideas to be deployed and good ideas to be ruled out by mistake.
- An upward trend for a newly launched feature does not imply that users like the feature more (delayed effect and primacy effect).
- Running an experiment longer often does not provide extra statistical power. Pick a duration and stick to it. Do not stop tests early, unless you use an algorithm that tells you when you have enough statistical confidence to stop.
- Rerun your experiment if you get surprising results. Investigating the underlying reasons is often worth it.
- Watch for carryover effects… Run A/A experiments. If you use bucketing techniques to assign participants to experiments, rerun the experiment with a larger test group and with local randomization.
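
To make the points about A/A experiments and early stopping concrete, here is a minimal simulation sketch. It is not taken from the paper: the function names, the 10% conversion rate, the sample sizes, and the choice of a pooled two-proportion z-test are all assumptions made for illustration. Both arms are given the same true conversion rate, so every "significant" result is a false positive. The sketch compares testing once at a fixed duration with naively checking at several interim points and stopping at the first significant result.

```python
import math
import random


def z_test_p_value(successes_a, n_a, successes_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation observed, so no evidence of a difference
    z = (successes_a / n_a - successes_b / n_b) / se
    # erfc(|z| / sqrt(2)) equals the two-sided normal tail probability.
    return math.erfc(abs(z) / math.sqrt(2))


def run_aa_test(rng, rate, users_per_arm, checkpoints, alpha=0.05):
    """Simulate one A/A test in which both arms share the same true rate.

    Returns (significant_at_end, significant_at_any_checkpoint).
    """
    conv_a = conv_b = 0
    peeked_significant = False
    checkpoint_set = set(checkpoints)
    for n in range(1, users_per_arm + 1):
        conv_a += rng.random() < rate
        conv_b += rng.random() < rate
        if n in checkpoint_set and z_test_p_value(conv_a, n, conv_b, n) < alpha:
            peeked_significant = True
    final_p = z_test_p_value(conv_a, users_per_arm, conv_b, users_per_arm)
    return final_p < alpha, peeked_significant


def false_positive_rates(trials=500, rate=0.1, users_per_arm=2000, peeks=10, seed=0):
    """Estimate false positive rates for a fixed duration and for naive peeking."""
    rng = random.Random(seed)
    step = users_per_arm // peeks
    checkpoints = [step * i for i in range(1, peeks + 1)]
    fixed = peeking = 0
    for _ in range(trials):
        at_end, at_any = run_aa_test(rng, rate, users_per_arm, checkpoints)
        fixed += at_end
        peeking += at_any
    return fixed / trials, peeking / trials


if __name__ == "__main__":
    fixed_rate, peeking_rate = false_positive_rates()
    print(f"fixed duration: {fixed_rate:.3f}  naive peeking: {peeking_rate:.3f}")
```

With a fixed duration, the false positive rate should land near the chosen `alpha`. With repeated naive checks it can only be equal or higher, because the last checkpoint coincides with the final test. If an A/A run on a real system shows far more significant results than `alpha` predicts, that suggests a problem in the experimentation setup itself, such as a carryover effect.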