MDE (Minimum Detectable Effect)

You launched an A/B test on a question's wording. A week later: 200 responses, a 3-percentage-point difference in conversion, p = 0.31, not significant. You switch back. But wait: could your sample even have detected a 3 pp difference in the first place?

With 200 responses, the minimum detectable effect is on the order of 10 percentage points or more, depending on the baseline conversion. You found nothing not because nothing is there, but because the test lacked the power to find it. MDE is the boundary below which your test is effectively blind.

Definition

MDE (Minimum Detectable Effect) is the smallest effect size a statistical test can detect at given significance (α) and power (1-β) levels for a fixed sample size. It is a key parameter when planning A/B tests and studies: if the expected effect is smaller than the MDE, the test will most likely fail to show a significant result even when the effect genuinely exists.

Why MDE matters

MDE solves a fundamental problem of study design: you have to decide in advance which effect is practically important for you to detect, and make sure the sample is large enough to do so.

Without MDE, one of two scenarios plays out:

Scenario 1: too little data. You stop the test with an insufficient sample. A real 5 pp effect most likely goes undetected because the MDE for your sample is 12 pp. The conclusion "no difference" is wrong. The correct conclusion is: "we didn't have enough data to detect an effect smaller than 12 pp".

Scenario 2: too much data. You collect 50,000 responses for a test where 500 would be enough. Resources are wasted, and a sample that large will flag differences of a percentage point or less as statistically significant even when they are practically meaningless.

MDE helps you strike a balance: collect exactly enough data to detect precisely the effect that matters for the decision.

How MDE relates to test power and significance level

MDE depends on three test-design parameters:

α (significance level) — the probability of a false positive (finding an effect where there is none). Standard: α = 0.05. With a stricter threshold (α = 0.01), the same MDE requires a larger sample.

β (Type II error probability) — the probability of failing to detect an effect that genuinely exists. Test power = 1 - β. Standard: 80% power (β = 0.20). At 90% power you need roughly a third more data (about 34%) for the same MDE and α.

n (sample size) — the number of observations per group. The larger the n, the smaller the effect you can detect. MDE ∝ 1/√n: to halve the MDE (double the sensitivity), increase the sample fourfold.

Together with the MDE itself, that makes four linked quantities: fix any three and the fourth is determined. In practice you fix α and power, then either compute the n needed for a given MDE or compute the MDE for an existing n.
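The scaling rules above can be checked numerically. The following is a minimal sketch, assuming a two-sided test and the normal approximation used throughout this article; the helper name z_sum is hypothetical and introduced only for illustration.

```python
from statistics import NormalDist


def z_sum(alpha: float = 0.05, power: float = 0.80) -> float:
    """Return z_{alpha/2} + z_beta for a two-sided test (normal approximation)."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)


# For a fixed MDE, the required n is proportional to z_sum squared.
base = z_sum(0.05, 0.80) ** 2
extra_for_90_power = z_sum(0.05, 0.90) ** 2 / base - 1
extra_for_alpha_01 = z_sum(0.01, 0.80) ** 2 / base - 1

# For fixed alpha and power, MDE is proportional to 1 / sqrt(n).
mde_factor_for_4x_sample = (1 / 4) ** 0.5

print(f"90% power instead of 80%: {extra_for_90_power:.0%} more data")
print(f"alpha = 0.01 instead of 0.05: {extra_for_alpha_01:.0%} more data")
print(f"4x the sample multiplies the MDE by {mde_factor_for_4x_sample}")
```

The percentages are relative to the standard design (α = 0.05, 80% power) and hold only for a fixed MDE.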

MDE for proportions: calculation and interpretation

The most common case in A/B tests is comparing proportions (conversions, response percentages, promoter shares).

An approximate MDE formula for comparing two proportions:

MDE = (z_α/2 + z_β) × √(2 × p̄ × (1-p̄) / n)

Where p̄ is the average proportion between groups, z_α/2 = 1.96 (for α=0.05), z_β = 0.84 (for 80% power), and n is the size of each group.

Example: a baseline conversion of 30%, with 200 people per group:

MDE = (1.96 + 0.84) × √(2 × 0.3 × 0.7 / 200) = 2.8 × √(0.0021) = 2.8 × 0.0458 ≈ 0.128

MDE ≈ 12.8 percentage points. With this sample, the test can reliably detect only a shift from 30% to at least 42.8% (or down to 17.2% or lower). If the real effect is 5 pp, this test will most likely miss it.
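The same calculation can be sketched in a few lines of Python. This is an illustration of the approximate formula above, not a reference implementation; the function name mde_two_proportions is hypothetical, and the z-values are computed exactly rather than rounded to 1.96 and 0.84.

```python
from math import sqrt
from statistics import NormalDist


def mde_two_proportions(p_bar: float, n_per_group: int,
                        alpha: float = 0.05, power: float = 0.80) -> float:
    """Approximate absolute MDE (as a fraction) for comparing two proportions."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)  # z_{alpha/2} + z_beta
    return z * sqrt(2 * p_bar * (1 - p_bar) / n_per_group)


# Example from the text: baseline 30%, 200 people per group.
mde = mde_two_proportions(p_bar=0.30, n_per_group=200)
print(f"MDE = {mde:.3f} ({mde * 100:.1f} pp)")  # about 0.128, i.e. 12.8 pp
```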

Example: planning an A/B test of a survey format

A team is testing two versions of a survey's welcome screen. The metric is the share of people who start the survey (click-through rate). The current CTR = 45%.

Question: "How many impressions do we need to detect a CTR improvement of at least 5 pp — from 45% to 50%?"

We use the inverse calculation (from MDE → n):

n = 2 × p̄ × (1-p̄) × ((z_α/2 + z_β) / MDE)²

n = 2 × 0.475 × 0.525 × (2.8 / 0.05)² = 0.4988 × 3136 ≈ 1564

We need roughly 1,564 impressions per variant, about 3,128 impressions in total. At a traffic of 500 impressions per day, the test would take about a week. That's realistic. If we wanted to detect a 2 pp difference, we'd need ~9,800 impressions per variant, about 19,600 in total, which at the same traffic takes roughly 39 days, well over a month. Such an effect may be too small to justify the cost.
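The inverse calculation and the duration estimate can be sketched as follows. The function names n_per_variant and days_needed are hypothetical, and the duration assumes traffic is split evenly between two variants. Because the z-values are not rounded here, the result differs from the hand calculation by a couple of impressions.

```python
from math import ceil
from statistics import NormalDist


def n_per_variant(p_base: float, p_target: float,
                  alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per variant to detect p_base -> p_target."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    p_bar = (p_base + p_target) / 2      # average proportion between groups
    mde = abs(p_target - p_base)         # absolute MDE as a fraction
    return ceil(2 * p_bar * (1 - p_bar) * (z / mde) ** 2)


def days_needed(n: int, daily_impressions: int, variants: int = 2) -> int:
    """Whole days needed to collect n impressions for each variant."""
    return ceil(variants * n / daily_impressions)


n_5pp = n_per_variant(0.45, 0.50)   # 5 pp improvement
n_2pp = n_per_variant(0.45, 0.47)   # 2 pp improvement
print(n_5pp, days_needed(n_5pp, 500))
print(n_2pp, days_needed(n_2pp, 500))
```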

Relative vs absolute MDE

MDE can be expressed in two ways, and it's important not to confuse them:

Absolute MDE — the difference in units of measurement. "We'll detect a conversion change of at least 5 percentage points" (from 30% to 35%).

Relative MDE — the change as a percentage of the baseline value. "We'll detect a change of at least 10% of the current value". At a baseline conversion of 30% that is 3 percentage points (30% × 10% = 3 pp).

Relative MDE is convenient for comparing tests with different baselines, but it can be misleading. A 10% relative improvement on a 0.5% conversion moves it to 0.55%, an absolute change of only 0.05 pp, which is extremely hard to detect. Always clarify which MDE is meant.
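Converting between the two forms is a single multiplication or division, but it is worth making explicit. A small sketch, with hypothetical function names:

```python
def relative_to_absolute(baseline: float, relative_mde: float) -> float:
    """Absolute MDE (as a fraction) from a baseline and a relative MDE."""
    return baseline * relative_mde


def absolute_to_relative(baseline: float, absolute_mde: float) -> float:
    """Relative MDE from a baseline and an absolute MDE (both as fractions)."""
    return absolute_mde / baseline


# 10% relative MDE at a 30% baseline: about 3 pp.
abs_high_baseline = relative_to_absolute(0.30, 0.10)
# The same 10% relative MDE at a 0.5% baseline: about 0.05 pp.
abs_low_baseline = relative_to_absolute(0.005, 0.10)

print(f"{abs_high_baseline * 100:.2f} pp vs {abs_low_baseline * 100:.2f} pp")
```

The same relative MDE corresponds to very different absolute differences, and it is the absolute difference that enters the sample-size formula.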

MDE when measuring means

For numeric scores (average satisfaction score, average NPS), MDE is expressed in scale units and depends on the data's standard deviation. The greater the spread of responses, the harder it is to detect a small effect.

A reference point via Cohen's d: MDE = d × SD. If SD = 2.0 and you want to detect a medium effect (d = 0.5), then MDE = 0.5 × 2.0 = 1.0 point. For a small effect (d = 0.2), MDE = 0.4 points, which requires a significantly larger sample.
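To see how much larger the sample becomes, here is a sketch under one assumption that is not stated in the text: the standard normal-approximation formula for comparing two means with equal group sizes and a common SD, n = 2 × ((z_α/2 + z_β) / d)² per group. The function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def mde_from_cohens_d(d: float, sd: float) -> float:
    """MDE in scale units: MDE = d * SD."""
    return d * sd


def n_per_group_for_d(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n per group for effect size d (normal approximation,
    equal group sizes and a common SD are assumed)."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    return ceil(2 * (z / d) ** 2)


for d in (0.5, 0.2):
    print(f"d = {d}: MDE = {mde_from_cohens_d(d, 2.0):.1f} points, "
          f"n per group = {n_per_group_for_d(d)}")
```

Because n grows with 1/d², going from d = 0.5 to d = 0.2 multiplies the required sample by about 6.25.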

Common mistakes when working with MDE

Not calculating MDE before launching the test. The most common mistake. The test is launched "by feel", data is collected until the first significant result or the end of the week — and you end up with either insufficient power or an inflated point estimate of the effect (winner's bias from early stopping).

Confusing MDE with the expected effect. MDE is the test's sensitivity threshold; the real effect may be larger or smaller. If the real effect equals the MDE, the test's power is exactly the planned 80%, so in 20% of cases the test will miss it. For more reliable detection, plan an MDE somewhat below the expected effect.

Stopping the test early at the first significant result. If you look at the p-value several times while data accumulates, the probability of a false positive grows. Fix the sample size and test duration in advance — and stick to the plan regardless of interim results.

Ignoring practical significance when choosing the MDE. Setting an MDE of 0.1% because "the more precise, the better" is a mistake. The sample will grow hundreds of times over, while a detected 0.1% effect won't influence a single real decision. The MDE should match the practical significance threshold: what minimum improvement justifies the change?
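The early-stopping mistake described above can be illustrated with a small simulation of A/A tests, where both variants share the same true conversion and every "significant" result is therefore a false positive. This is a sketch under stated assumptions: a pooled two-proportion z-test, five equally spaced looks, a 30% conversion rate, and a fixed random seed. All names and parameter values are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided critical value for alpha = 0.05


def is_significant(x_a: int, x_b: int, n: int) -> bool:
    """Pooled two-proportion z-test on x_a and x_b successes out of n each."""
    p_pool = (x_a + x_b) / (2 * n)
    se = sqrt(2 * p_pool * (1 - p_pool) / n)
    return se > 0 and abs(x_a - x_b) / n / se > Z_CRIT


def simulate_aa(n_sims=1000, looks=(100, 200, 300, 400, 500), p=0.30, seed=1):
    """Return (false-positive rate with peeking, rate with one final look)."""
    rng = random.Random(seed)
    peeking_hits = fixed_hits = 0
    for _sim in range(n_sims):
        x_a = x_b = seen = 0
        any_sig = final_sig = False
        for n in looks:
            # Collect responses up to the next checkpoint.
            for _obs in range(n - seen):
                x_a += rng.random() < p
                x_b += rng.random() < p
            seen = n
            final_sig = is_significant(x_a, x_b, n)
            any_sig = any_sig or final_sig  # "stop at first significant result"
        peeking_hits += any_sig
        fixed_hits += final_sig
    return peeking_hits / n_sims, fixed_hits / n_sims


peeking_rate, fixed_rate = simulate_aa()
print(f"false positives with peeking: {peeking_rate:.1%}")
print(f"false positives with a single final look: {fixed_rate:.1%}")
```

The single-look rate should sit near the nominal 5%, while stopping at the first significant checkpoint pushes the rate well above it.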

MDE in A/B testing of surveys

In A/B tests of survey formats, MDE is used to: compare the completion rate of two questionnaire versions, assess the difference in response rate for different invitation wordings, and measure the impact of question order or design on conversion.

Before launching the test: determine what minimum improvement justifies the change → calculate the required sample → make sure traffic allows you to collect it within a reasonable time. For the calculation, use the SurveyNinja A/B test significance calculator — it computes the required sample size from a given MDE and power parameters.

MDE is not a limitation of the test but a tool for honest planning. Knowing the MDE in advance, you make an informed decision: this test can detect the effect that matters to me, or it can't. Running a test without calculating the MDE is like driving to a meeting without knowing whether you have enough fuel.

Frequently asked questions

How do I choose the right MDE for a test?

Start from business logic: what minimum metric improvement justifies the cost of the change? If rolling out a new variant costs 200,000, and each percentage point of conversion brings in 50,000, the minimum justified effect is 4 pp. That is your MDE. The math will tell you how much data you need to detect it.
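The break-even logic is simple arithmetic. A minimal sketch with a hypothetical function name, assuming the value of a change scales linearly with percentage points of conversion:

```python
def break_even_mde_pp(rollout_cost: float, value_per_pp: float) -> float:
    """Smallest improvement (in percentage points) that pays for the rollout."""
    return rollout_cost / value_per_pp


mde_pp = break_even_mde_pp(rollout_cost=200_000, value_per_pp=50_000)
print(f"Minimum justified effect: {mde_pp} pp")  # 4.0 pp
```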

What if you can't collect the required sample in a reasonable time?

Three options: increase the MDE (accept that only a stronger effect is of interest), lower power from 80% to 70% (riskier, but requires less data), or abandon the test and decide based on expert judgment. Any of these explicit compromises is better than running a test you know is underpowered, which creates a false sense of a well-founded decision.
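The first two options can be quantified before choosing. The sketch below uses the approximate proportions formula from earlier in the article; the sample cap of 1,000 per group and the 45% baseline are made-up illustration values, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mde_two_proportions(p_bar, n_per_group, alpha=0.05, power=0.80):
    """Approximate absolute MDE (as a fraction) for two proportions."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    return z * sqrt(2 * p_bar * (1 - p_bar) / n_per_group)


N_CAP = 1000      # hypothetical: the most we can collect per group
BASELINE = 0.45   # hypothetical baseline proportion

# Option 1: keep 80% power and accept whatever MDE the capped sample allows.
mde_80 = mde_two_proportions(BASELINE, N_CAP, power=0.80)
# Option 2: lower power to 70%, which shrinks the MDE for the same sample.
mde_70 = mde_two_proportions(BASELINE, N_CAP, power=0.70)
# Equivalent view: for a fixed MDE, 70% power needs this much less data.
data_saving = 1 - (mde_70 / mde_80) ** 2

print(f"MDE at 80% power: {mde_80 * 100:.1f} pp")
print(f"MDE at 70% power: {mde_70 * 100:.1f} pp")
print(f"Data saved by 70% power at a fixed MDE: {data_saving:.0%}")
```

Lowering power buys a fairly modest reduction in data, at the price of missing a real effect of MDE size in 30% of cases instead of 20%.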

Can I change the MDE after launching the test?

No. Changing the MDE or sample size after looking at the data is a form of p-hacking (it is related to, but not the same as, HARKing, which means formulating hypotheses after the results are known), and it breaks the significance-level guarantees. If you want to reconsider the design, stop the current test and launch a new one with new parameters from scratch.

How is MDE related to test power?

MDE and power are two sides of the same trade-off. With a fixed sample and α, targeting a smaller MDE means lower power at that effect size. Keeping the same power for a smaller MDE, or raising power for the same MDE, requires a larger sample. The standard is 80% power at α = 0.05: a test with a correctly calculated sample will detect a real effect equal to the MDE in 80% of cases.
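The relationship can be made concrete by computing the power of a fixed design at different true effects. This sketch uses the same normal approximation as the MDE formula and ignores the negligible chance of a significant result in the wrong direction; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(true_diff: float, p_bar: float, n_per_group: int,
                          alpha: float = 0.05) -> float:
    """Approximate power to detect true_diff (absolute, as a fraction)."""
    nd = NormalDist()
    se = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    return nd.cdf(abs(true_diff) / se - nd.inv_cdf(1 - alpha / 2))


# Design from the earlier example: 30% baseline, 200 people per group.
nd = NormalDist()
se = sqrt(2 * 0.30 * 0.70 / 200)
mde = (nd.inv_cdf(0.975) + nd.inv_cdf(0.80)) * se  # about 12.8 pp

power_at_mde = power_two_proportions(mde, 0.30, 200)
power_at_5pp = power_two_proportions(0.05, 0.30, 200)

print(f"power at the MDE: {power_at_mde:.0%}")
print(f"power for a real 5 pp effect: {power_at_5pp:.0%}")
```

At the MDE the power equals the planned 80% by construction; for a real effect well below the MDE it falls to roughly one chance in five of detection.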

Does MDE apply only to A/B tests?

No. MDE is a universal concept for any statistical test: comparing groups in a survey, measuring the change in a metric between study waves, estimating a correlation. Anywhere you need to decide in advance which minimum effect is important to detect and how much data is required for that.
