What Is Sample Ratio Mismatch?
In a 50/50 A/B test with 100,000 visitors, you would expect roughly 50,000 in each group. Due to random variation, a split of 50,200 vs 49,800 is perfectly normal — the binomial standard deviation at this sample size is about 158 visitors, so a deviation of 200 is well within ordinary fluctuation. But a split of 51,500 vs 48,500 is a different story entirely. That deviation of 1,500 is roughly nine and a half standard deviations from the mean, astronomically unlikely under random assignment, and it means something systematic is pushing users disproportionately into one group.
SRM is detected using a chi-squared goodness-of-fit test that compares observed frequencies to expected frequencies. The test is simple: compute the chi-squared statistic from the deviation between observed and expected counts, then check the resulting p-value. A p-value below 0.001 is strong evidence that the split is not the result of chance — it is the result of a bug.
Why SRM Invalidates Your Results
A/B testing rests on a single foundational assumption: random assignment creates statistically equivalent groups. When randomization works, the only systematic difference between control and variant is the treatment itself. Every other characteristic — purchase intent, device type, time of day, prior behavior — is balanced across groups by the law of large numbers. SRM breaks this assumption at the root.
Consider a concrete example: your variant includes a new hero image that is 400KB larger than the control. The variant page loads 200ms slower, causing some users to bounce before the tracking pixel fires. Those bounced users are never counted in the variant group. The result: the variant group is smaller than expected (SRM detected) and is systematically enriched with more patient, higher-intent users — who would have converted at higher rates regardless of the treatment. Your test shows a 'win,' but the lift is an artifact of survivorship bias, not a genuine treatment effect.
Common Causes of SRM
| Cause | Mechanism | Direction of Bias |
|---|---|---|
| Slow variant loading | Users bounce before tracking fires | Variant group smaller, biased toward patient users |
| JavaScript errors in variant | Tracking code fails to execute | Variant group smaller, missing error-affected users |
| Bot filtering differences | Bots blocked differently per variant | Unpredictable direction |
| Redirect tests | Server-side redirects lose users in transit | Variant group smaller |
| Cookie-based assignment with deletion | Users re-randomized on return visits | Groups drift over time |
| Cache differences | CDN serves stale pages to some users | Depends on implementation |
How to Detect SRM
The Chi-Squared Test
The chi-squared goodness-of-fit test is the standard method for SRM detection. You take the observed visitor counts per group, compute the expected counts from the intended allocation ratio and total sample, then calculate the chi-squared statistic: X² = Σ (observed - expected)² / expected. With one degree of freedom (two groups), a chi-squared value above 10.83 corresponds to p < 0.001 — definitive evidence of SRM.
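The same calculation extends to unequal intended splits (say, a 90/10 rollout). A minimal two-group sketch, where `ratio_a` is the intended fraction of traffic in the first group (the helper name and signature are illustrative assumptions):

```python
from math import erfc, sqrt

def srm_check(observed_a: int, observed_b: int, ratio_a: float = 0.5):
    """Return (chi-squared statistic, p-value) for a two-group test.

    Expected counts come from the intended allocation ratio and the
    total sample; with two groups (one degree of freedom) the p-value
    is erfc(sqrt(X^2 / 2)).
    """
    total = observed_a + observed_b
    expected_a = total * ratio_a
    expected_b = total * (1 - ratio_a)
    chi2 = ((observed_a - expected_a) ** 2 / expected_a
            + (observed_b - expected_b) ** 2 / expected_b)
    return chi2, erfc(sqrt(chi2 / 2))

chi2, p = srm_check(51_500, 48_500)
print(f"X² = {chi2:.1f}, p = {p:.2e}")  # X² = 90.0, well past the 10.83 cutoff
```

For the 51,500 vs 48,500 split, X² = 90, far beyond the 10.83 threshold for p < 0.001.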
When to Check
Check SRM continuously throughout the test. This is one area where peeking is not only acceptable but encouraged. Unlike peeking at metric results, checking SRM does not inflate false positive rates — it is a data quality diagnostic, not a hypothesis test. An SRM check asks whether the experiment is running correctly, not whether the treatment is working.
- p < 0.001 — Definite SRM. Investigate immediately. Do not report results until root cause is identified and resolved.
- p < 0.01 — Likely SRM. Monitor closely over the next 24-48 hours. If the p-value continues to drop, treat as confirmed SRM.
- p < 0.05 — Possible SRM. Flag for review at the end of the test. May resolve as sample size grows, but warrants scrutiny.
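The three tiers above can be encoded directly in a monitoring job; `classify_srm` is a hypothetical helper for illustration, not a standard API:

```python
def classify_srm(p_value: float) -> str:
    """Map an SRM chi-squared p-value to a monitoring tier."""
    if p_value < 0.001:
        return "definite: pause and investigate before reporting"
    if p_value < 0.01:
        return "likely: monitor closely over the next 24-48 hours"
    if p_value < 0.05:
        return "possible: flag for review at the end of the test"
    return "no evidence of SRM"

print(classify_srm(2.4e-21))  # definite: pause and investigate before reporting
```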
What to Do When You Find SRM
- Pause the test immediately. Stop accumulating corrupted data. Every additional day of a broken experiment is wasted traffic.
- Check variant code for JavaScript errors. Open both control and variant in an incognito browser, inspect the console, and look for errors that fire only in one group.
- Compare page load times between control and variant. Use your analytics or a tool like WebPageTest to measure real-user performance for both experiences.
- Check for bot traffic differences. Segment traffic by user agent and verify that bot filtering is consistent across groups.
- Verify tracking implementation. Confirm that the experiment tracking fires at the same point in the page lifecycle for both groups, and that no race conditions exist.
- Fix the root cause. Address the specific issue — whether it is a slow asset, a broken script, or a misconfigured redirect.
- Restart with a clean population. Do not resume the existing test. Clear the experiment state and begin fresh so that historical bias does not carry over.
The temptation to salvage weeks of test data is understandable — no team wants to restart an experiment that has been running for two weeks. But corrupted data produces corrupted decisions. Shipping a false positive (or false negative) based on biased data will cost far more in lost revenue and misallocated engineering effort than the cost of restarting a test. The math is unambiguous: restart.
SRM in Practice: How Often Does It Happen?
Published research from Microsoft and other large-scale experimentation platforms indicates that 5-10% of all A/B tests exhibit detectable SRM. That figure reflects well-instrumented platforms with dedicated engineering teams. In less mature setups — particularly e-commerce teams using client-side testing tools without server-side validation — the rate can climb above 20%. Redirect tests are especially prone to SRM because every redirect introduces an opportunity for user loss.
At DRIP, every experiment is automatically checked for SRM at multiple checkpoints throughout its lifecycle. Tests with confirmed SRM are flagged, investigated, and resolved before any results are reported to the client. This is not optional — it is a non-negotiable quality gate in our experimentation process. We treat SRM the same way a lab would treat a contaminated sample: the data is discarded and the experiment is rerun under controlled conditions.
See how DRIP runs reliable A/B tests at scale →