Methodology · 11 min read

Sample Ratio Mismatch: The First Thing to Check in Every A/B Test

If your 50/50 split shows 51.2% vs 48.8%, is that normal variance or a broken experiment? SRM is the most reliable diagnostic for A/B test integrity — and it takes 30 seconds to check.

Fabian Gmeindl, Co-Founder, DRIP Agency · March 13, 2026
📖 This article is part of The Complete Guide to A/B Testing for E-Commerce.

Sample Ratio Mismatch (SRM) occurs when the observed traffic split in an A/B test deviates significantly from the intended allocation. A 50/50 test showing 52/48 at scale is a red flag — it means the randomization or tracking is broken, and your results are unreliable regardless of what the p-value says. SRM should be the first check you run on every experiment.

Contents
  1. What Is Sample Ratio Mismatch?
  2. Why SRM Invalidates Your Results
  3. Common Causes of SRM
  4. How to Detect SRM
  5. What to Do When You Find SRM
  6. SRM in Practice: How Often Does It Happen?

What Is Sample Ratio Mismatch?

SRM is a statistically significant deviation between the observed and expected traffic split in an A/B test. It signals that something in the experiment infrastructure — randomization, tracking, or delivery — is broken.

In a 50/50 A/B test with 100,000 visitors, you would expect roughly 50,000 in each group. Due to random variation, a split of 50,200 vs 49,800 is perfectly normal — the binomial distribution guarantees some fluctuation. But a split of 51,500 vs 48,500 is a different story entirely. At that sample size, a deviation of 1,500 is astronomically unlikely under random assignment, and it means something systematic is pushing users disproportionately into one group.

SRM is detected using a chi-squared goodness-of-fit test that compares observed frequencies to expected frequencies. The test is simple: compute the chi-squared statistic from the deviation between observed and expected counts, then check the resulting p-value. A p-value below 0.001 is strong evidence that the split is not the result of chance — it is the result of a bug.

  • Expected split: 50 / 50 (intended allocation ratio)
  • Observed split: 51.5 / 48.5 (a 1,500-visitor deviation at 100K total)
  • Chi-squared result: p < 0.001 (definitive SRM, not random variation)
Common Mistake
SRM doesn't just suggest a problem — it proves one. When detected, the experiment's results are invalid. No amount of statistical significance in the metric analysis can compensate for broken randomization.

Why SRM Invalidates Your Results

If the groups aren't randomly assigned, the treatment comparison is biased. SRM means systematic differences exist between groups before the treatment even takes effect, making any observed metric difference uninterpretable.

A/B testing rests on a single foundational assumption: random assignment creates statistically equivalent groups. When randomization works, the only systematic difference between control and variant is the treatment itself. Every other characteristic — purchase intent, device type, time of day, prior behavior — is balanced across groups by the law of large numbers. SRM breaks this assumption at the root.

Consider a concrete example: your variant includes a new hero image that is 400KB larger than the control. The variant page loads 200ms slower, causing some users to bounce before the tracking pixel fires. Those bounced users are never counted in the variant group. The result: the variant group is smaller than expected (SRM detected) and is systematically enriched with more patient, higher-intent users — who would have converted at higher rates regardless of the treatment. Your test shows a 'win,' but the lift is an artifact of survivorship bias, not a genuine treatment effect.
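The arithmetic behind this artifact is worth making explicit. A minimal sketch with hypothetical numbers (the 4% baseline conversion rate and the 5% bounce share are illustrative assumptions, not figures from the test above):

```python
# Illustrative arithmetic: a slow variant silently drops low-intent users
# before tracking fires, producing both SRM and a fake 'win'.
# All numbers are hypothetical.

users_per_arm = 50_000
true_cvr = 0.04                      # identical in both arms: zero real effect

# Control: everyone is tracked.
control_tracked = users_per_arm
control_conversions = int(users_per_arm * true_cvr)             # 2,000

# Variant: 5% of NON-converting (low-intent) users bounce before
# the tracking pixel fires, so they never enter the variant group.
non_converters = users_per_arm - control_conversions            # 48,000
lost_to_bounce = int(non_converters * 0.05)                     # 2,400
variant_tracked = users_per_arm - lost_to_bounce                # 47,600
variant_conversions = control_conversions                       # converters all stayed

control_cvr = control_conversions / control_tracked             # 4.00%
variant_cvr = variant_conversions / variant_tracked             # ~4.20%

print(f"tracked split: {control_tracked} vs {variant_tracked}")  # SRM is visible here
print(f"observed lift: {variant_cvr / control_cvr - 1:+.1%}")    # ~+5% pure artifact
```

The true treatment effect is zero by construction, yet the tracked data shows roughly a 5% lift, and the 50,000 vs 47,600 split is exactly the SRM signal that would expose it.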

Counterintuitive Finding
An SRM-affected test showing a 'win' might actually be a loss. If slower-loading variant pages cause low-intent users to bounce before tracking, the remaining users convert higher — but overall performance (including bounced users) could be worse.

Common Causes of SRM

SRM is usually caused by tracking implementation issues, variant performance differences, or inconsistent bot filtering. The root cause determines the direction and magnitude of the bias.
Common SRM causes, mechanisms, and bias direction:
  • Slow variant loading: users bounce before tracking fires. Bias: variant group smaller, skewed toward patient users.
  • JavaScript errors in variant: tracking code fails to execute. Bias: variant group smaller, missing error-affected users.
  • Bot filtering differences: bots blocked differently per variant. Bias: unpredictable direction.
  • Redirect tests: server-side redirects lose users in transit. Bias: variant group smaller.
  • Cookie-based assignment with deletion: users re-randomized on return visits. Bias: groups drift over time.
  • Cache differences: CDN serves stale pages to some users. Bias: depends on implementation.
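The cookie-deletion cause in particular has a well-known mitigation: derive the assignment deterministically from a stable user ID rather than storing it in a cookie, so a returning visitor always lands in the same arm. A minimal sketch (function name and hashing scheme are illustrative, not any specific tool's API):

```python
import hashlib

def assign_variant(user_id: str, experiment_id: str, split: float = 0.5) -> str:
    """Deterministically bucket a user: the same (user_id, experiment_id)
    pair always yields the same arm, so cookie loss cannot re-randomize."""
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode()).hexdigest()
    # Map the first 8 hex chars to a uniform value in [0, 1).
    bucket = int(digest[:8], 16) / 16**8
    return "control" if bucket < split else "variant"

# Stable across calls: a returning visitor keeps their assignment.
assert assign_variant("user-123", "hero-test") == assign_variant("user-123", "hero-test")
```

Salting the hash with the experiment ID also prevents users from landing in the same arm across every experiment, which would correlate otherwise independent tests.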
DRIP Insight
In DRIP's experience across thousands of experiments, the #1 cause of SRM in e-commerce tests is variant JavaScript errors that prevent the tracking pixel from firing. Always check your browser console in both control and variant before analyzing results.

How to Detect SRM

Use a chi-squared test or binomial test on the observed traffic counts versus the expected allocation. SRM detection is computationally trivial and should be automated for every experiment.

The Chi-Squared Test

The chi-squared goodness-of-fit test is the standard method for SRM detection. You take the observed visitor counts per group, compute the expected counts from the intended allocation ratio and total sample, then calculate the chi-squared statistic: X² = Σ (observed - expected)² / expected. With one degree of freedom (two groups), a chi-squared value above 10.83 corresponds to p < 0.001 — definitive evidence of SRM.
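Because the check needs only the observed counts and the intended ratio, it fits in a few lines of standard-library Python; with one degree of freedom, the p-value of a chi-squared statistic x is erfc(sqrt(x / 2)), so no stats package is required. A minimal sketch using the article's example numbers:

```python
from math import erfc, sqrt

def srm_pvalue(observed_a: int, observed_b: int, ratio_a: float = 0.5) -> float:
    """Chi-squared goodness-of-fit p-value for a two-arm traffic split.
    With 1 degree of freedom, P(X^2 > x) = erfc(sqrt(x / 2))."""
    total = observed_a + observed_b
    expected = [total * ratio_a, total * (1 - ratio_a)]
    chi2 = sum((o - e) ** 2 / e
               for o, e in zip([observed_a, observed_b], expected))
    return erfc(sqrt(chi2 / 2))

# The article's two examples at 100K total visitors:
print(srm_pvalue(50_200, 49_800))  # ~0.21: normal random variation
print(srm_pvalue(51_500, 48_500))  # astronomically small: definitive SRM
```

The second split yields a chi-squared statistic of 90, far beyond the 10.83 threshold for p < 0.001.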

When to Check

Check SRM continuously throughout the test. This is one area where peeking is not only acceptable but encouraged. Unlike peeking at metric results, checking SRM does not inflate false positive rates — it is a data quality diagnostic, not a hypothesis test. An SRM check asks whether the experiment is running correctly, not whether the treatment is working.

Pro Tip
Add an automated SRM check to your experimentation pipeline. At DRIP, every experiment triggers an SRM alert if the chi-squared p-value drops below 0.001 at any point during the test.
  • p < 0.001 — Definite SRM. Investigate immediately. Do not report results until root cause is identified and resolved.
  • p < 0.01 — Likely SRM. Monitor closely over the next 24-48 hours. If the p-value continues to drop, treat as confirmed SRM.
  • p < 0.05 — Possible SRM. Flag for review at the end of the test. May resolve as sample size grows, but warrants scrutiny.
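The tiered policy above maps directly onto a small classifier that a pipeline can run on every experiment (the function name is illustrative; the thresholds are the ones listed):

```python
def srm_alert_level(p_value: float) -> str:
    """Map a chi-squared SRM p-value to the tiered alert policy."""
    if p_value < 0.001:
        return "definite"   # stop: investigate before reporting any results
    if p_value < 0.01:
        return "likely"     # monitor closely over the next 24-48 hours
    if p_value < 0.05:
        return "possible"   # flag for review at the end of the test
    return "ok"

assert srm_alert_level(2e-21) == "definite"
assert srm_alert_level(0.03) == "possible"
```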

What to Do When You Find SRM

Stop the test, investigate the root cause, fix the issue, and restart with a clean population. Do not attempt to salvage data from an SRM-affected experiment.
  1. Pause the test immediately. Stop accumulating corrupted data. Every additional day of a broken experiment is wasted traffic.
  2. Check variant code for JavaScript errors. Open both control and variant in an incognito browser, inspect the console, and look for errors that fire only in one group.
  3. Compare page load times between control and variant. Use your analytics or a tool like WebPageTest to measure real-user performance for both experiences.
  4. Check for bot traffic differences. Segment traffic by user agent and verify that bot filtering is consistent across groups.
  5. Verify tracking implementation. Confirm that the experiment tracking fires at the same point in the page lifecycle for both groups, and that no race conditions exist.
  6. Fix the root cause. Address the specific issue — whether it is a slow asset, a broken script, or a misconfigured redirect.
  7. Restart with a clean population. Do not resume the existing test. Clear the experiment state and begin fresh so that historical bias does not carry over.
Common Mistake
Do NOT try to 'adjust' for SRM by reweighting the groups or trimming data. If randomization is broken, the bias is unknown and cannot be corrected statistically. The only valid response is to fix the cause and rerun.

The temptation to salvage weeks of test data is understandable — no team wants to restart an experiment that has been running for two weeks. But corrupted data produces corrupted decisions. Shipping a false positive (or false negative) based on biased data will cost far more in lost revenue and misallocated engineering effort than the cost of restarting a test. The math is unambiguous: restart.

SRM in Practice: How Often Does It Happen?

SRM is more common than most teams realize. Industry data suggests 5-10% of A/B tests have detectable SRM, with rates exceeding 20% in poorly instrumented setups.

Published research from Microsoft and other large-scale experimentation platforms indicates that 5-10% of all A/B tests exhibit detectable SRM. That figure reflects well-instrumented platforms with dedicated engineering teams. In less mature setups — particularly e-commerce teams using client-side testing tools without server-side validation — the rate can climb above 20%. Redirect tests are especially prone to SRM because every redirect introduces an opportunity for user loss.

  • Industry SRM rate: 5-10% on well-instrumented platforms
  • Weak setups: 20%+ (client-side tools without server-side validation)
  • Highest risk: redirect tests, where every redirect introduces user-loss opportunities

At DRIP, every experiment is automatically checked for SRM at multiple checkpoints throughout its lifecycle. Tests with confirmed SRM are flagged, investigated, and resolved before any results are reported to the client. This is not optional — it is a non-negotiable quality gate in our experimentation process. We treat SRM the same way a lab would treat a contaminated sample: the data is discarded and the experiment is rerun under controlled conditions.

See how DRIP runs reliable A/B tests at scale →

Recommended Next Step

View the CRO License

How DRIP uses parallel experimentation for predictable revenue growth.

Read the KoRo Case Study

€2.5M in additional revenue in 6 months with structured CRO.

Frequently Asked Questions

How much traffic deviation counts as SRM?

For a 50/50 split with large samples (10K+ per group), a chi-squared test with p < 0.01 indicates problematic SRM. Small deviations (50.1% vs 49.9%) are expected from random variation. The threshold depends on sample size — the chi-squared test handles this automatically.

Does SRM detection apply to tests with unequal splits, such as 70/30?

Yes. SRM applies to any intended split ratio. If you designed a 70/30 test and observe 73/27, the SRM check compares against the intended 70/30 ratio, not 50/50.
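The chi-squared check generalizes to any ratio by plugging the intended allocation into the expected counts. A short sketch for a hypothetical 70/30 test at 100K visitors:

```python
from math import erfc, sqrt

def srm_pvalue(observed: list[int], ratios: list[float]) -> float:
    """Chi-squared goodness-of-fit p-value for a two-arm split (1 df)."""
    total = sum(observed)
    chi2 = sum((o - total * r) ** 2 / (total * r)
               for o, r in zip(observed, ratios))
    return erfc(sqrt(chi2 / 2))

# Intended 70/30; observed 73/27 at 100K visitors, tested against 70/30:
p = srm_pvalue([73_000, 27_000], [0.70, 0.30])
print(p < 0.001)  # True: definitive SRM
```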

Does SRM matter if my primary metric shows no effect?

SRM affects all metric comparisons because it breaks the assumption of comparable groups. Even if your primary metric shows no significant effect, other metrics may be biased. The entire experiment is compromised.

Should every experiment be checked for SRM?

Yes. SRM checking should be automated and run on every experiment. It is computationally trivial (a single chi-squared test), detects fundamental problems, and costs nothing. There is no reason not to check.

Related Articles

Methodology · 12 min read

The Peeking Problem: Why Checking Your A/B Test Early Destroys Results

Checking A/B test results early inflates false positives from 5% to over 20%. Learn why the peeking problem is so dangerous and what frameworks actually solve it.

Read Article →
Methodology · 14 min read

Statistical Power in A/B Testing: Why Most Tests Are Under-Powered

Statistical power determines whether your A/B test can detect real effects. Learn why 80% isn't always enough and how to properly power e-commerce experiments.

Read Article →
A/B Testing · 9 min read

What E-Commerce Brands Get Wrong About A/B Testing

Six expensive A/B testing mistakes — with real test data from SNOCKS and Blackroll proving why best practices and cosmetic tests destroy ROI.

Read Article →

Every test. Every time. Checked.

DRIP automatically monitors every experiment for SRM and other data quality issues — so you never ship a decision based on broken data.

See how we test
