Experimentation Tools

Educational tools to help you plan, analyze, and present your test results.

Special thanks to Merritt Aho for his work creating and updating these tools.

The Standard Frequentist Sample Size Calculator is an educational tool that helps you design and analyze an A/B test.

Standard Frequentist Sample Size Calculator

Calculate the estimated sample size and runtime based on the estimated lift percentage, i.e., the lift required to make a decision.
Baseline conversion rate
Traffic volume to your experience over 30 days
Number of variations or treatments
Number of tails (1 or 2)
Confidence level
Power level
Sample size
Runtime (if less than 30 days)
Error risk visualization

Use it
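As a rough illustration of the kind of calculation behind a tool like this, the sketch below uses the common normal-approximation formula for comparing two proportions. It is not the calculator's actual implementation: the function names, the default confidence and power values, and the treatment of the number of variations as the total number of arms (control included) are all assumptions, and no multiple-comparison adjustment is applied.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_lift,
                            confidence=0.95, power=0.80, tails=2):
    """Approximate visitors needed per variant (hypothetical helper).

    Uses the unpooled normal-approximation formula for two proportions.
    relative_lift is a fraction, e.g. 0.10 for a 10% lift over baseline.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    alpha = 1 - confidence
    z_alpha = NormalDist().inv_cdf(1 - alpha / tails)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def runtime_days(n_per_variant, n_variants, traffic_30_days):
    """Days needed to collect the sample, assuming traffic is spread
    evenly over the 30 days and split equally across all variants."""
    daily_traffic = traffic_30_days / 30
    return math.ceil(n_per_variant * n_variants / daily_traffic)


if __name__ == "__main__":
    n = sample_size_per_variant(baseline_rate=0.05, relative_lift=0.10)
    print("Sample size per variant:", n)
    print("Runtime in days:", runtime_days(n, n_variants=2, traffic_30_days=100_000))
```

Smaller lifts or lower baseline rates push the required sample size up quickly, which is why the lift you enter should be the smallest one that would change your decision.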

Results Analysis Tool: Binomial & Continuous Metrics

Analyzes the results of your A/B tests. Use it for binomial metrics or for continuous metrics* such as revenue. Lets you customize the visuals with your own screenshots and hypothesis statement.
Metric Type (Binomial or Continuous)
*For continuous metrics, you'll need to enter standard deviations calculated separately
Number of Conversions for Control & Treatment
Amount of Traffic for Control & Treatment
Statistical Significance Threshold
Number of Tails (1 or 2)
Total p-values being calculated (i.e., how many metrics and treatments you have, how many times you have peeked to make a decision, etc.)
SRM check
Revenue projection inputs - approximate $ value of a conversion
Estimated conversions per month the audience provides
Report customization inputs
Visualizations of the Key Results
Confidence Intervals of the Differences
Calculations Table
Confidence Intervals of Variants Chart
6-Month Revenue Projections

Use it
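For a binomial metric, the inputs listed above map onto a few textbook calculations. The sketch below is one possible version and not the tool's actual method: it assumes a pooled two-proportion z-test, a Bonferroni adjustment for the total number of p-values, and a one-degree-of-freedom chi-square test for the SRM check. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_test(conv_c, n_c, conv_t, n_t, tails=2):
    """Pooled z-test for control (c) vs treatment (t); returns (z, p_value).

    With tails=1 the alternative is 'treatment converts better than control'.
    """
    p_c = conv_c / n_c
    p_t = conv_t / n_t
    pooled = (conv_c + conv_t) / (n_c + n_t)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    z = (p_t - p_c) / se
    if tails == 2:
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    else:
        p_value = 1 - NormalDist().cdf(z)
    return z, p_value


def bonferroni_threshold(alpha, total_p_values):
    """Significance threshold per comparison (assumed Bonferroni correction)."""
    return alpha / total_p_values


def srm_p_value(n_c, n_t, expected_share_t=0.5):
    """Chi-square (1 df) p-value for a sample ratio mismatch check."""
    total = n_c + n_t
    expected_t = total * expected_share_t
    expected_c = total - expected_t
    chi2 = ((n_c - expected_c) ** 2 / expected_c
            + (n_t - expected_t) ** 2 / expected_t)
    # For 1 degree of freedom, P(X > chi2) = 2 * (1 - Phi(sqrt(chi2))).
    return 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))


if __name__ == "__main__":
    z, p = two_proportion_test(500, 10_000, 600, 10_000)
    print(f"z = {z:.2f}, p = {p:.4f}")
    print("Significant:", p < bonferroni_threshold(0.05, total_p_values=3))
    print(f"SRM p-value: {srm_p_value(10_000, 10_000):.3f}")
```

A very small SRM p-value suggests the traffic split differs from what was configured, in which case the conversion comparison should not be trusted until the cause is found.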

The Sequential Planning & Analysis Tool is an educational tool to help with planning and analyzing A/B tests.

Sequential Planning & Analysis Tool

Lets you both plan and analyze a frequentist, sequentially designed experiment. Use it after the runtime-based calculator.
Minimum detectable effect (required lift)
Base conversion rate
Confidence level
Power
1 or 2 tails
Number of conversions in the test area
Amount of traffic to the test area
Number of days covered by those conversion and traffic numbers
Number of planned analyses (checkpoints)
Fixed-horizon sample size & runtime
Sequential maximum sample size & runtime
Maximum increase in runtime
Power analysis chart plot
Expected duration based on potential effect sizes and sample sizes
Decision Boundaries table showing when you can safely “peek”, with an associated chart that plots the results you’ve entered at each checkpoint and shows when they cross a binding or non-binding decision boundary
For more information on how to plan and analyze frequentist sequential tests, see this Q&A on the Analytics Toolkit written by Georgi Georgiev, in which Lucia Van de Brink interviews him about the process.

Use it
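To give a feel for why planned checkpoints make peeking safe, the sketch below shows how an alpha-spending function hands out the overall error budget across analyses. Which spending function this tool uses is not stated here, so the O'Brien-Fleming-type function of Lan and DeMets, the equally spaced checkpoints, and the function names are all assumptions. The sketch reports only the cumulative alpha spent, not the decision boundaries themselves, which require further numerical work.

```python
import math
from statistics import NormalDist


def obf_alpha_spent(information_fraction, alpha=0.05):
    """Cumulative alpha spent at a given fraction of the maximum sample size,
    using the O'Brien-Fleming-type spending function (assumed, see lead-in)."""
    if not 0 < information_fraction <= 1:
        raise ValueError("information_fraction must be in (0, 1]")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / math.sqrt(information_fraction)))


def spending_schedule(n_analyses, alpha=0.05):
    """Rows of (checkpoint, information fraction, cumulative alpha, increment)
    for equally spaced checkpoints."""
    rows = []
    previous = 0.0
    for k in range(1, n_analyses + 1):
        fraction = k / n_analyses
        cumulative = obf_alpha_spent(fraction, alpha)
        rows.append((k, fraction, cumulative, cumulative - previous))
        previous = cumulative
    return rows


if __name__ == "__main__":
    for k, fraction, cumulative, increment in spending_schedule(4):
        print(f"checkpoint {k}: {fraction:.0%} of data, "
              f"cumulative alpha {cumulative:.5f}, spent now {increment:.5f}")
```

With this family of functions almost no alpha is spent at early checkpoints, so early stopping requires very strong evidence and most of the error budget is kept for the final analysis.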

Bayesian A/B Test Analysis Tool

Bayesian tool for analyzing the results of your A/B tests. Uses Monte Carlo simulations with estimated priors to give the probability that your treatment outperforms the control. Includes a section about Bayesian A/B testing.
Historical traffic & conversions
$ value of one conversion
% change in the conversion rate that is negligible
$ amount the test would need to earn to justify implementation
Traffic & conversions for each variant
SRM check - % of traffic allocated to test variant
Variant A & B Conversion Rate
Observed Difference
Traffic Split
Variant Posteriors Chart
Effect Posterior Chart
Probabilities Explained
Probabilities Chart

Use it
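The probability that the treatment outperforms the control can be estimated with a short Monte Carlo simulation. The sketch below assumes a Beta-Binomial model with the same Beta prior on both variants; how the tool turns historical traffic and conversions into priors is not described here, so the `prior_alpha` and `prior_beta` parameters (defaulting to a flat prior) and the function name are hypothetical.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b,
                   prior_alpha=1.0, prior_beta=1.0,
                   draws=100_000, seed=0):
    """Estimate P(rate_B > rate_A) by sampling from Beta posteriors.

    Each variant's posterior is Beta(prior_alpha + conversions,
    prior_beta + non-conversions).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(prior_alpha + conv_a,
                                 prior_beta + n_a - conv_a)
        rate_b = rng.betavariate(prior_alpha + conv_b,
                                 prior_beta + n_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws


if __name__ == "__main__":
    probability = prob_b_beats_a(conv_a=500, n_a=10_000,
                                 conv_b=560, n_b=10_000)
    print(f"P(B beats A) is roughly {probability:.3f}")
```

The same posterior draws could also be used to estimate the probability that the lift exceeds a negligible-change threshold, by comparing the sampled difference against that threshold instead of zero.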

The Frequentist Runtime-Based Calculator is an educational tool to help design and analyze an A/B test.

Frequentist Runtime-Based Calculator

Calculate the minimum lift you’d need for a given runtime. So, if you know how much runtime you have, this helps you determine the lift you need. Perfect for decision-makers.
Baseline conversion rate
Traffic volume to your experience over 30 days
Number of variations or treatments
Number of tails (1 or 2)
Confidence level
Power level
Minimum lift
Runtime (if less than 30 days)
Error risk visualization
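The runtime-based calculation runs the sample size logic in reverse: fix the runtime, derive the sample per variant, and solve for the smallest lift that could be detected. The sketch below is an approximation and not the calculator's actual implementation. It assumes evenly spread traffic split equally across all variants (control included), uses the baseline variance for both arms, and the function name and defaults are hypothetical.

```python
import math
from statistics import NormalDist


def minimum_detectable_lift(baseline_rate, traffic_30_days, runtime_days,
                            n_variants=2, confidence=0.95, power=0.80,
                            tails=2):
    """Approximate minimum relative lift detectable in the given runtime.

    Returns a fraction, e.g. 0.08 for an 8% lift over the baseline rate.
    """
    n_per_variant = traffic_30_days / 30 * runtime_days / n_variants
    alpha = 1 - confidence
    z_alpha = NormalDist().inv_cdf(1 - alpha / tails)
    z_beta = NormalDist().inv_cdf(power)
    # Approximation: both arms are given the baseline variance p * (1 - p).
    standard_error = math.sqrt(
        2 * baseline_rate * (1 - baseline_rate) / n_per_variant)
    absolute_effect = (z_alpha + z_beta) * standard_error
    return absolute_effect / baseline_rate


if __name__ == "__main__":
    for days in (7, 14, 30):
        lift = minimum_detectable_lift(0.05, 100_000, days)
        print(f"{days} days: minimum lift of about {lift:.1%}")
```

Because the detectable lift shrinks with the square root of the sample size, quadrupling the runtime only halves the minimum lift.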