
Minimum Detectable Effect: The Number That Makes or Breaks Your A/B Test

Set your MDE too high, you'll miss real wins. Set it too low, you'll wait forever. Here's how to find the right threshold — backed by data from thousands of e-commerce experiments.

Fabian Gmeindl, Co-Founder, DRIP Agency · March 13, 2026
📖 This article is part of our guide: The Complete Guide to A/B Testing for E-Commerce

The Minimum Detectable Effect (MDE) is the smallest improvement an A/B test is designed to detect with statistical confidence. Across DRIP's experiment database, the mean conversion rate uplift for winners is +2.91% and mean RPV uplift is +4.15% — but the distribution is heavily right-skewed, meaning most wins are small. Setting your MDE at 5% or higher means you'll miss the majority of real improvements.

Contents
  1. What Is Minimum Detectable Effect?
  2. Why MDE Matters More Than You Think
  3. What Effect Sizes Are Realistic in E-Commerce?
  4. How to Calculate the Right MDE for Your Store
  5. MDE and Metric Selection
  6. Common MDE Mistakes

What Is Minimum Detectable Effect?

MDE is the smallest relative improvement your test is designed to detect with a specified level of confidence and power.

Minimum Detectable Effect is the sensitivity threshold of your A/B test. It answers a precise question: what is the smallest real difference between control and variant that this test will reliably pick up? If the true effect is smaller than your MDE, the test does not have enough statistical power to distinguish it from noise at the planned sample size.

Think of MDE as the resolution of a microscope. A 10x lens can see large cells but misses bacteria. A 1000x lens sees everything but requires a fundamentally different setup. In A/B testing, that setup is sample size. The smaller the effect you want to detect, the more data you need — and the relationship is not linear.

+2.91% mean CR uplift (winners) · DRIP experiment database, 90+ e-commerce brands
+4.15% mean RPV uplift (winners) · Revenue per visitor shows larger effects than CR alone

These numbers from DRIP's data across thousands of experiments tell a clear story: most winning A/B tests produce modest uplifts. If your test is designed to detect only 10% relative improvements, you are blind to the majority of real value your testing program could capture.

DRIP Insight
MDE is a design parameter, not a result. You choose it before the test starts. It determines how long you need to run and how much traffic you need. Changing your MDE after seeing results is p-hacking in disguise.

Why MDE Matters More Than You Think

MDE is the single biggest driver of required sample size. Halving your MDE quadruples your sample requirement, which directly determines test duration, testing velocity, and the number of experiments you can run per year.

The relationship between MDE and sample size follows an inverse-square law. This means small changes in your MDE target create massive differences in how long tests need to run. Most teams underestimate this relationship, leading to one of two failure modes: tests that drag on for months (MDE too small) or tests that miss real winners (MDE too large).

Required sample size per variant at 80% power, 5% significance, 3% baseline conversion rate. Duration assumes 50,000 eligible visitors per week split across two variants.

| MDE (relative) | Sample per variant | ~Duration at 50k visitors/week |
|---|---|---|
| 1% | ~3,500,000 | 140 weeks |
| 2% | ~875,000 | 35 weeks |
| 5% | ~140,000 | 5.6 weeks |
| 10% | ~35,000 | 1.4 weeks |
Counterintuitive Finding
Halving your MDE quadruples the required sample size. This is why setting MDE isn't a minor detail — it's the single biggest driver of test duration. A team targeting 1% MDE will run roughly 25x fewer experiments per year than one targeting 5%.
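
To make the inverse-square relationship concrete, here is a minimal Python sketch of the underlying power calculation, using the standard two-proportion normal approximation. Exact outputs depend on assumptions (one- vs. two-sided test, variance pooling), so they will not match the table above to the digit, but the quadrupling pattern holds regardless:

```python
from scipy.stats import norm

def sample_size_per_variant(baseline_cr, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant for a two-proportion z-test."""
    p1 = baseline_cr
    p2 = baseline_cr * (1 + relative_mde)           # variant rate at the MDE
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)   # two-sided alpha, 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)        # unpooled variance sum
    return z ** 2 * variance / (p2 - p1) ** 2

for mde in (0.01, 0.02, 0.05, 0.10):
    n = sample_size_per_variant(0.03, mde)
    print(f"MDE {mde:.0%}: ~{n:,.0f} visitors per variant")
# Each halving of the MDE roughly quadruples the required sample.
```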

The cost of getting MDE wrong flows in both directions. An MDE that is too ambitious for your traffic level means tests run for months, your testing velocity collapses, and stakeholders lose patience with the program. An MDE that is too lenient means you are only detecting large effects — the obvious wins — while systematically missing the compounding small improvements that drive long-term growth.

This is the central trade-off in experimentation program design. You are balancing sensitivity (ability to detect small effects) against velocity (number of experiments per year). The optimal MDE sits at the intersection of what your traffic can support and what your business needs to detect.

What Effect Sizes Are Realistic in E-Commerce?

Most e-commerce A/B test wins produce small uplifts. The median winner lifts conversion rate by about 2% relative, and the distribution is heavily right-skewed — a few big wins pull the mean above the median.

Before you set your MDE, you need to understand the distribution of real effect sizes in e-commerce. This is where most teams go wrong — they anchor on case studies showing 30% or 50% uplifts and assume those outcomes are normal. They are not. They are survivorship-biased outliers.

~2% median CR uplift (winners) · Half of all winning experiments produce less than 2% relative lift
~3.5% median RPV uplift (winners) · Revenue effects tend to be somewhat larger than CR effects
42 days median test duration · Most properly powered tests run 4-6 weeks
Distribution of winning effect sizes — conversion rate (DRIP experiment data, 90+ brands)

| Effect size bucket | % of winners | Cumulative % |
|---|---|---|
| 0 – 1% relative | 18% | 18% |
| 1 – 3% relative | 37% | 55% |
| 3 – 5% relative | 20% | 75% |
| 5 – 10% relative | 16% | 91% |
| 10%+ relative | 9% | 100% |

The data is unambiguous: 55% of winning experiments produce less than 3% relative uplift, and 75% produce less than 5%. The winners above 10% exist, but they are fewer than one in ten. If your testing program can only detect effects of 5% or larger, you are systematically discarding three-quarters of your real wins.

Common Mistake
If your MDE is 5% relative, you're only detecting the top ~25% of real effects. The other 75% — still profitable — get classified as inconclusive. Over a year of testing, that's dozens of missed revenue improvements.

How to Calculate the Right MDE for Your Store

The right MDE balances statistical sensitivity with practical constraints. It is determined by the intersection of four variables: baseline conversion rate, desired power (typically 80%), significance level (typically 5%), and available sample size — which is itself a function of traffic and maximum acceptable test duration.

There is no universal correct MDE. The right value depends on your traffic, your baseline metrics, and your business context. But there are two practical frameworks for arriving at the right number — one works backward from your traffic, the other works backward from your revenue.

The Business-First Approach

Start with your monthly unique visitors and determine the maximum test duration your organization will tolerate — typically 4 to 6 weeks. Multiply your weekly traffic by your maximum weeks to get the total available sample. Then use a standard power calculation to determine the smallest effect size detectable at 80% power and 5% significance, given that sample and your baseline conversion rate.

For example, a store with 200,000 monthly visitors and a 3% baseline conversion rate can detect a relative MDE of roughly 3.5% in a 4-week test at 80% power. That is your floor — you cannot reliably detect anything smaller without extending the test or increasing traffic.
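
A sketch of that backward calculation in Python, inverting the same normal approximation used earlier. Note that this plain two-sided formula is conservative: dedicated calculators that assume one-sided tests or variance-reduced metrics will report smaller achievable MDEs, which is one reason published figures (like the ~3.5% example above) can differ from this baseline.

```python
import math
from scipy.stats import norm

def achievable_mde(baseline_cr, visitors_per_variant, alpha=0.05, power=0.80):
    """Smallest relative lift detectable with a fixed sample per variant
    (two-sided normal approximation, variance taken at the baseline rate)."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    abs_delta = z * math.sqrt(2 * baseline_cr * (1 - baseline_cr) / visitors_per_variant)
    return abs_delta / baseline_cr

# 200k monthly visitors, 4-week test, 2 variants -> ~100k visitors per variant
print(f"Achievable MDE: {achievable_mde(0.03, 100_000):.1%}")
```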

The Revenue-First Approach

Start with the smallest uplift that would be worth implementing. Calculate the annualized revenue impact of a 1%, 2%, and 5% relative conversion rate improvement. If a 2% uplift adds EUR 50,000 in annual revenue and the cost of implementation is EUR 5,000, the ROI is clear — you should set your MDE to detect that 2% improvement. If your traffic cannot support it, that is a constraint to solve, not a reason to raise your MDE.
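
The arithmetic behind this approach is simple enough to script. A sketch with hypothetical store numbers (200,000 visitors per month, 3% conversion rate, EUR 70 average order value; swap in your own):

```python
def annual_revenue_impact(monthly_visitors, baseline_cr, aov, relative_uplift):
    """Annualized incremental revenue from a relative CR uplift,
    holding traffic and average order value constant."""
    baseline_annual_revenue = monthly_visitors * 12 * baseline_cr * aov
    return baseline_annual_revenue * relative_uplift

# Hypothetical store: 200,000 visitors/month, 3% CR, EUR 70 AOV
for uplift in (0.01, 0.02, 0.05):
    impact = annual_revenue_impact(200_000, 0.03, 70, uplift)
    print(f"{uplift:.0%} relative uplift: ~EUR {impact:,.0f} per year")
```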

  1. Determine your baseline conversion rate — use a full 4-week period to account for weekly cycles. Use the specific page or funnel step you are testing, not the site-wide average.
  2. Set your maximum test duration — 4 weeks is the sweet spot for most stores. Going beyond 6 weeks introduces cookie expiration, seasonality, and stakeholder fatigue.
  3. Calculate available sample per variant — divide your eligible traffic by the number of variants (typically 2). Only count visitors who actually see the test — not total site traffic.
  4. Run the power calculation backward — input your baseline rate, sample per variant, 80% power, and 5% significance. The output is your achievable MDE.
  5. Compare to your revenue threshold — if the achievable MDE is larger than the smallest effect worth detecting, you have a gap. Close it with more traffic, longer duration, or variance reduction techniques like CUPED.
Pro Tip
Run your MDE calculation backward: given your traffic and a 4-week window, what's the smallest effect you can detect at 80% power? If that number is above 5%, you need a different testing strategy — higher-impact tests, composite metrics, or variance reduction.

MDE and Metric Selection

Your choice of primary metric directly affects your achievable MDE. Metrics with lower variance (like conversion rate) allow smaller MDEs at the same sample size, while higher-variance metrics (like revenue per visitor) require more data but capture effects that conversion rate misses.

MDE is not just about traffic — it is about the variance of your chosen metric. A metric with high variance (large spread around its mean) requires more observations to distinguish a signal from noise. This is why revenue per visitor, despite being a more complete measure of business impact, is harder to move with statistical confidence than conversion rate.

Revenue per visitor is affected by both conversion probability and order value. A single high-value order can swing RPV dramatically, inflating variance. Conversion rate, by contrast, is binary (0 or 1 per visitor) and has inherently lower variance at typical e-commerce conversion rates. The practical implication: the same test that can detect a 3% relative MDE on CR may only detect a 6-8% relative MDE on RPV.
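
One way to see the variance penalty is to parameterize the sample-size formula by the metric's coefficient of variation (standard deviation divided by mean). The sketch below uses a hypothetical RPV coefficient of variation of 9, a plausible value for a zero-inflated, heavy-tailed revenue distribution, chosen purely for illustration:

```python
from scipy.stats import norm

def n_per_variant(cv, relative_mde, alpha=0.05, power=0.80):
    """Sample per variant for a difference-in-means test on a metric
    with coefficient of variation cv (normal approximation)."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 2 * (z * cv / relative_mde) ** 2

p = 0.03                          # baseline conversion rate
cv_cr = ((1 - p) / p) ** 0.5      # Bernoulli CV: sqrt((1-p)/p), ~5.7 at p=0.03
cv_rpv = 9.0                      # hypothetical CV for revenue per visitor

for name, cv in [("conversion rate", cv_cr), ("revenue per visitor", cv_rpv)]:
    print(f"{name}: ~{n_per_variant(cv, 0.05):,.0f} per variant at 5% relative MDE")
```

Under these assumptions, RPV needs roughly (9 / 5.7)² ≈ 2.5x the sample of conversion rate at the same MDE; equivalently, the same traffic buys an MDE about 1.6x larger, consistent in direction with the ranges in the table below.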

Metric comparison: typical variance and achievable MDE at 200k monthly visitors, 4-week test, 80% power

| Metric | Relative variance | Typical achievable MDE | Trade-off |
|---|---|---|---|
| Conversion Rate | Low | 3 – 5% | Misses AOV effects |
| Revenue per Visitor | High | 6 – 10% | Captures full revenue impact |
| Average Order Value | Medium-High | 5 – 8% | Only measures buyers, smaller sample |
| Add-to-Cart Rate | Low-Medium | 2 – 4% | Higher-funnel, more observations |
DRIP Insight
DRIP's data shows mean RPV uplift (+4.15%) exceeds mean CR uplift (+2.91%) for the same experiments — suggesting revenue effects are often larger than conversion effects. This is consistent with the hypothesis that many optimizations shift both conversion probability and basket composition.

The practical recommendation: use conversion rate as your primary metric for statistical power, but always monitor RPV as a guardrail. If a variant lifts CR but tanks RPV, you are converting more low-value buyers — a net negative. If a variant lifts RPV but not CR, you may be missing a real revenue improvement because CR was not sensitive enough to capture it.

Common MDE Mistakes

The most frequent MDE mistakes are setting it arbitrarily without a power calculation, confusing MDE with expected effect size, ignoring metric variance, and using one-size-fits-all thresholds across different test types.
  1. Setting MDE too ambitiously for your traffic
  2. Confusing MDE with expected or observed effect size
  3. Ignoring metric variance when choosing MDE
  4. Using the same MDE for every test regardless of context
  5. Not recalculating MDE when traffic or baseline rates change

Setting MDE too ambitiously. Targeting a 1% MDE when your site gets 50,000 monthly visitors means tests that would take over a year to complete. No organization sustains that cadence. An MDE of 1% is achievable only for sites with millions of monthly visitors. For everyone else, it is a theoretical aspiration that kills testing velocity in practice.

Confusing MDE with expected effect size. MDE is not a prediction — it is a sensitivity threshold. If you set your MDE at 5%, you are not saying you expect a 5% lift. You are saying your test is designed to detect effects of 5% or larger. If the true effect is 3%, a test with a 5% MDE will likely call it inconclusive — even though the effect is real and profitable.

Ignoring metric variance. Two metrics can have the same mean but wildly different variances. A 3% MDE on conversion rate is not equivalent to a 3% MDE on revenue per visitor. If you do not account for the variance of your chosen metric, your power calculation will be wrong and your test will be either overpowered (wasting time) or underpowered (missing effects).

Using one MDE across all tests. A radical checkout redesign should be held to a different standard than a button-color test. High-effort, high-risk changes warrant lower MDEs (higher sensitivity) because the cost of missing a real degradation is greater. Low-effort tweaks can tolerate higher MDEs because the cost of a false negative is limited to the small implementation effort.

Not recalculating when conditions change. Your traffic shifts seasonally. Your baseline conversion rate changes as you deploy winners. Your metric variance shifts as your product mix evolves. An MDE that was appropriate six months ago may no longer be achievable — or may now be unnecessarily conservative. Recalculate at least quarterly.

For a deeper understanding of how power and MDE interact, see our guide on statistical power in A/B testing.


Frequently Asked Questions

What is a realistic MDE for an e-commerce store?
For most e-commerce stores, an MDE of 2-5% relative is realistic. High-traffic sites (1M+ monthly visitors) can target 1-2%. Low-traffic sites may need to accept 5-10% or switch to higher-impact test strategies.

How is MDE different from statistical significance?
MDE is set before the test; significance is measured after. Your test is designed to detect effects of MDE size or larger with your chosen significance level (typically 5%) and power (typically 80%).

Should MDE be expressed in relative or absolute terms?
Relative MDE (percentage change from baseline) is standard practice because it's comparable across different baseline rates. A 10% relative uplift on a 2% baseline (0.2pp) is very different from 10% relative on a 10% baseline (1pp).

What if my traffic can't support the MDE I need?
You have three options: target larger changes (redesigns vs tweaks), use variance reduction techniques like CUPED, or test on higher-funnel metrics with more observations (e.g., add-to-cart vs purchase).
