Experimentation Tools

Educational tools to help you plan, analyze, and present your test results.

Special thanks to Merritt Aho for his work creating and updating these tools.

Standard Frequentist Sample Size Calculator

  • Calculates the estimated sample size and runtime from the expected lift percentage, i.e., the lift required to make a decision.

  • Baseline conversion rate

    Traffic volume to your experience over 30 days

    Number of variations or treatments

    Number of tails (1 or 2)

    Confidence level

    Power level

  • Sample size

    Runtime (if less than 30 days)

    Error risk visualization
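
As a rough illustration of how these inputs can map to a sample size, here is a minimal sketch using the textbook normal-approximation formula for comparing two proportions. It is not necessarily the formula the calculator uses. The function names, the `num_arms` parameter (control plus treatments), and the lack of any multiple-comparison correction are all assumptions made for this example.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, relative_lift,
                        confidence=0.95, power=0.80, tails=2):
    """Approximate visitors needed per arm (normal approximation, unpooled variance)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    alpha = 1 - confidence
    # Critical value for the chosen number of tails, plus the power quantile.
    z_alpha = NormalDist().inv_cdf(1 - alpha / tails)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def runtime_days(n_per_arm, num_arms, traffic_30_days):
    """Days needed to collect the full sample, assuming steady daily traffic."""
    daily_traffic = traffic_30_days / 30
    return math.ceil(n_per_arm * num_arms / daily_traffic)


# Hypothetical inputs: 5% baseline, 10% relative lift, 60,000 visitors per 30 days.
n = sample_size_per_arm(0.05, 0.10)
print(n, runtime_days(n, num_arms=2, traffic_30_days=60_000))
```

A one-tailed test or a larger lift lowers the required sample, which is why the lift you choose as "required to make a decision" drives the runtime so strongly.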

Results Analysis Tool: Binomial & Continuous Metrics

  • Analyzes the results of your A/B tests. Use it for binomial metrics or for continuous metrics* such as revenue. You can customize the visuals with your own screenshots and hypothesis statement.

  • Metric Type (Binomial or Continuous)

    *For continuous metrics, you'll need to enter standard deviations calculated separately

    Number of Conversions for Control & Treatment

    Amount of Traffic for Control & Treatment

    Statistical Significance Threshold

    Number of Tails (1 or 2)

    Total p-values being calculated (i.e., how many metrics and treatments you are testing, how many times you have peeked to make a decision, etc.)

    SRM check

    Revenue projection inputs - approximate $ value of a conversion

    Estimated conversions per month the audience provides

    Report customization inputs

  • Visualizations of the Key Results

    Confidence Intervals of the Differences

    Calculations Table

    Confidence Intervals of Variants Chart

    6 Months Revenue Projections
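
To make the binomial inputs above concrete, the following sketch runs a pooled two-proportion z-test, applies a Bonferroni correction for the total number of p-values, and performs an SRM check with a one-degree-of-freedom chi-square test. These are standard techniques chosen for illustration. The tool's own calculations may differ, and all function and parameter names here are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_test(conv_c, n_c, conv_t, n_t,
                        alpha=0.05, tails=2, num_p_values=1):
    """Pooled z-test for treatment vs. control with a Bonferroni-adjusted threshold."""
    norm = NormalDist()
    p_c, p_t = conv_c / n_c, conv_t / n_t
    diff = p_t - p_c
    pooled = (conv_c + conv_t) / (n_c + n_t)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    z = diff / se_pooled
    if tails == 2:
        p_value = 2 * (1 - norm.cdf(abs(z)))
    else:
        # One-tailed: tests only whether the treatment is better than control.
        p_value = 1 - norm.cdf(z)
    # Bonferroni: split the threshold across every p-value being calculated.
    adjusted_alpha = alpha / num_p_values
    # Two-sided confidence interval of the difference (unpooled standard error).
    se_diff = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    z_crit = norm.inv_cdf(1 - adjusted_alpha / 2)
    return {
        "difference": diff,
        "p_value": p_value,
        "significant": p_value < adjusted_alpha,
        "ci": (diff - z_crit * se_diff, diff + z_crit * se_diff),
    }


def srm_p_value(n_c, n_t, expected_share_t=0.5):
    """Chi-square (1 df) p-value for a sample ratio mismatch between two arms."""
    total = n_c + n_t
    exp_t = total * expected_share_t
    exp_c = total - exp_t
    chi2 = (n_c - exp_c) ** 2 / exp_c + (n_t - exp_t) ** 2 / exp_t
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical data: 500/10,000 conversions in control, 560/10,000 in treatment.
print(two_proportion_test(500, 10_000, 560, 10_000))
print(srm_p_value(10_000, 10_000))
```

A very small SRM p-value suggests the traffic split itself is off, in which case the test results should not be trusted until the cause is found.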

Sequential Planning & Analysis Tool

  • Allows you to both plan for and analyze a frequentist-based, sequentially-designed experiment. Use after the runtime-based calculator.

  • Minimum detectable effect (required lift)

    Base conversion rate

    Confidence level

    Power

    1 or 2 tails

    Number of conversions in the test area

    Amount of traffic to the test area

    Number of days of traffic for those conversions & traffic numbers

    Number of planned analyses (checkpoints)

  • Fixed-horizon sample size & runtime

    Sequential maximum sample size & runtime

    Maximum increase in runtime

    Power analysis chart plot

    Expected duration based on potential effect sizes and sample sizes

    Decision boundaries table showing when you can safely “peek”, with an associated chart that plots the results you’ve entered at each checkpoint and shows when they cross a binding or non-binding decision boundary

  • For more information on how to plan and analyze frequentist sequential tests, see this Q&A on the Analytics Toolkit, written by Georgi Georgiev after Lucia Van de Brink interviewed him to help explain the process.
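
The idea behind "safe peeking" can be sketched with an alpha-spending function, which limits how much of the total error budget is available at each planned checkpoint. The example below uses the Lan-DeMets O'Brien-Fleming-type spending function as one common choice. Which spending function the tool uses is not stated here, so treat this as an assumption. It also reports only the cumulative alpha spent, not the decision boundaries themselves, which require numerical integration.

```python
import math
from statistics import NormalDist


def obf_alpha_spent(alpha, num_analyses):
    """Cumulative alpha available at each equally spaced checkpoint.

    Uses the O'Brien-Fleming-type spending function
    alpha(t) = 2 * (1 - Phi(z_{alpha/2} / sqrt(t))), where t is the
    fraction of the maximum sample size collected so far.
    """
    norm = NormalDist()
    z = norm.inv_cdf(1 - alpha / 2)
    spent = []
    for k in range(1, num_analyses + 1):
        t = k / num_analyses  # information fraction at checkpoint k
        spent.append(2 * (1 - norm.cdf(z / math.sqrt(t))))
    return spent


# Hypothetical plan: 5 checkpoints with a total alpha of 0.05.
for checkpoint, a in enumerate(obf_alpha_spent(0.05, 5), start=1):
    print(checkpoint, a)
```

Very little alpha is spent at early checkpoints, so an early stop requires a strong result, and the full alpha is only reached at the final analysis.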

Bayesian A/B Test Analysis Tool

  • Analyzes the results of your A/B tests using Bayesian methods. Uses Monte Carlo simulations to estimate, based on estimated priors, the probability that your treatment outperforms the control. (Includes an “About Bayesian A/B testing” section.)

  • Historical traffic & conversions

    $ value of one conversion

    % change in the conversion rate that is negligible

    How much $ the test would need to make you to justify implementation

    Traffic & conversions for each variant

    SRM check - % of traffic allocated to test variant

  • Variant A & B Conversion Rate

    Observed Difference

    Traffic Split

    Variant Posteriors Chart

    Effect Posterior Chart

    Probabilities Explained

    Probabilities Chart
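
A minimal sketch of the Monte Carlo approach, assuming a Beta-Binomial model: each variant's conversion rate gets a Beta posterior, and the probability that B beats A is the share of random draws in which B's rate is higher. How the tool derives its priors from historical traffic and conversions is not described here, so the `prior_alpha` and `prior_beta` parameters (defaulting to a uniform prior) are assumptions, as are the function name and the number of draws.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b,
                   prior_alpha=1.0, prior_beta=1.0,
                   draws=20_000, seed=0):
    """Estimate P(rate_B > rate_A) by sampling from Beta posteriors."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    wins = 0
    for _ in range(draws):
        # Posterior is Beta(prior_alpha + conversions, prior_beta + non-conversions).
        rate_a = rng.betavariate(prior_alpha + conv_a, prior_beta + n_a - conv_a)
        rate_b = rng.betavariate(prior_alpha + conv_b, prior_beta + n_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws


# Hypothetical data: 500/10,000 conversions for A, 560/10,000 for B.
print(prob_b_beats_a(500, 10_000, 560, 10_000))
```

To account for a negligible change threshold, the comparison `rate_b > rate_a` could be replaced with a check that the relative difference exceeds that threshold.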

Frequentist Runtime-Based Calculator

  • Calculates the minimum lift you’d need for a given runtime: if you know how long you can run the test, this tells you the lift you need. Perfect for decision-makers.

  • Baseline conversion rate

    Traffic volume to your experience over 30 days

    Number of variations or treatments

    Number of tails (1 or 2)

    Confidence level

    Power level

  • Minimum lift

    Runtime (if less than 30 days)

    Error risk visualization
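
The runtime-based calculation is the sample size calculation run in reverse. A minimal sketch, assuming the normal approximation and assuming both arms share the baseline variance (a simplification that gives a closed-form answer), could look like this. The function name and the `num_arms` parameter (control plus treatments) are hypothetical, and the calculator itself may use a different method.

```python
import math
from statistics import NormalDist


def minimum_detectable_lift(baseline_rate, traffic_30_days, runtime_days,
                            num_arms=2, confidence=0.95, power=0.80, tails=2):
    """Approximate relative lift detectable within the given runtime."""
    # Visitors each arm receives over the runtime, assuming steady traffic.
    n_per_arm = traffic_30_days / 30 * runtime_days / num_arms
    alpha = 1 - confidence
    z_alpha = NormalDist().inv_cdf(1 - alpha / tails)
    z_beta = NormalDist().inv_cdf(power)
    # Standard error of the difference, using the baseline variance for both arms.
    se = math.sqrt(2 * baseline_rate * (1 - baseline_rate) / n_per_arm)
    absolute_lift = (z_alpha + z_beta) * se
    return absolute_lift / baseline_rate


# Hypothetical inputs: 5% baseline, 60,000 visitors per 30 days, 14-day runtime.
print(minimum_detectable_lift(0.05, 60_000, 14))
```

Because the detectable lift shrinks with the square root of the sample size, quadrupling the runtime only halves the minimum lift.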