ANOVA (Analysis of Variance)
May 31, 2026
You're comparing customer satisfaction across three regions: North at 7.4, Central at 6.9, South at 7.1. Running three separate t-tests would be a mistake: each test adds its own risk of a false positive, and across three comparisons the chance of finding at least one spurious "significant" difference climbs to about 14% instead of 5%.
ANOVA (analysis of variance) solves this problem: it tests all groups at once with a single test, correctly holding the error rate in check.
Definition
ANOVA (Analysis of Variance) is a parametric statistical method for comparing the means of three or more groups. It analyzes the ratio of variability between groups to variability within groups. The result is an F-statistic and a p-value, which show whether at least one group mean differs significantly from the others. When the result is significant, post-hoc tests are needed to determine which specific groups differ.
Why you can't just run several t-tests
With each t-test, the probability of a false positive is 5% (at a threshold of p < 0.05). This means that when no real difference exists, about 5% of tests will still mistake random variation for a real effect. With multiple comparisons, these errors accumulate.
Comparing 3 groups pairwise (A-B, A-C, B-C) takes three t-tests. Treating the tests as independent, the probability of at least one false positive is 1 - 0.95³ ≈ 14%. With 5 groups there are 10 pairs, and the probability is already about 40%. ANOVA tests all groups at once, keeping the overall error probability at the 5% level.
This is called the multiple comparisons problem, and it's exactly what ANOVA solves at the initial testing stage.
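The arithmetic above can be reproduced with a short Python sketch. The function name `familywise_error_rate` is made up for this illustration, and the formula treats the pairwise tests as independent, which is a simplification.

```python
from math import comb


def familywise_error_rate(n_groups: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across all pairwise tests.

    Assumes the tests are independent, which is only an approximation
    for pairwise comparisons that share groups.
    """
    n_pairs = comb(n_groups, 2)  # number of distinct pairs of groups
    return 1 - (1 - alpha) ** n_pairs


for k in (3, 4, 5):
    print(f"{k} groups, {comb(k, 2)} pairs: {familywise_error_rate(k):.3f}")
# 3 groups, 3 pairs: 0.143
# 4 groups, 6 pairs: 0.265
# 5 groups, 10 pairs: 0.401
```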
How ANOVA works
The idea of ANOVA is to break the total variability of the data into two parts:
Between-group variance. How much the group means differ from the overall mean. If the groups really are different — this variance is large.
Within-group variance. How much individual observations differ from the mean of their own group. This is the "noise" — the natural spread within each group.
The F-statistic — the ratio of these two quantities:
F = Between-group variance / Within-group variance
If F is close to 1, the differences between groups don't exceed the usual noise. If F is substantially greater than 1, the group means differ by more than random spread alone would explain. The p-value is computed from the F-statistic and its two degrees of freedom (between groups and within groups). If p < 0.05, at least one group mean differs significantly from the others.
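As a minimal sketch of this decomposition, the Python function below computes the F-statistic by hand with the standard library. The name `one_way_f` and the three small samples are hypothetical, invented for illustration; they are not the regional data from the introduction. Each "variance" here is a sum of squares divided by its degrees of freedom.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n_total = len(groups), len(all_values)

    # Between-group sum of squares: group means vs the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations vs their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores, five respondents per region.
north = [8, 7, 9, 7, 8]
central = [6, 7, 7, 6, 8]
south = [7, 7, 6, 8, 7]

f_stat, df1, df2 = one_way_f([north, central, south])
print(f"F({df1}, {df2}) = {f_stat:.2f}")  # F(2, 12) = 2.21
```

Turning F into a p-value requires the F distribution with these two degrees of freedom, which a statistics package or a p-value calculator provides.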
One-way and two-way ANOVA
One-way ANOVA — comparing groups by a single factor. Example: a satisfaction score across three regions. There's one factor — region. We check whether region affects the score.
Two-way ANOVA — simultaneous analysis of two factors and their interaction. Example: a satisfaction score by region AND by customer type (B2B vs B2C). You can check: does region matter? Does customer type matter? Is there an interaction — that is, does the effect of region differ for B2B and B2C customers?
Factor interaction is an important insight that can't be obtained from two separate one-way ANOVAs. For example: in the North, B2B customers are more satisfied than B2C, while in the South it's the opposite. This is an interaction pattern, and a two-way ANOVA reveals it.
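One way to see what an interaction means is a "difference of differences" on the cell means. The sketch below uses hypothetical cell means, invented to match the North/South pattern just described. It is only a descriptive check: a two-way ANOVA is what tests whether such a contrast is larger than noise.

```python
# Hypothetical mean satisfaction per (region, customer type) cell.
cell_means = {
    ("North", "B2B"): 7.9, ("North", "B2C"): 7.1,
    ("South", "B2B"): 6.8, ("South", "B2C"): 7.6,
}


def type_gap(region: str) -> float:
    """B2B minus B2C mean score within one region."""
    return cell_means[(region, "B2B")] - cell_means[(region, "B2C")]


# With no interaction, the B2B-B2C gap would be the same in every region,
# so this contrast would be close to zero.
interaction_contrast = type_gap("North") - type_gap("South")

print(f"North gap: {type_gap('North'):+.1f}")                # +0.8
print(f"South gap: {type_gap('South'):+.1f}")                # -0.8
print(f"Interaction contrast: {interaction_contrast:+.1f}")  # +1.6
```

The gap changes sign between regions, which is the crossing pattern that two separate one-way ANOVAs would average away.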
Post-hoc tests: who exactly differs
A significant ANOVA only answers the question "are there differences among these groups?". It doesn't say which groups specifically differ from one another. For that you need post-hoc tests — pairwise comparisons with a correction for multiplicity.
The most common ones:
- Tukey HSD — the standard choice with equal group sizes and equal variances. Controls the error rate across all pairs of comparisons.
- Bonferroni — a conservative method that divides the significance threshold by the number of comparisons. Simple to compute, but less powerful.
- Games-Howell — used with unequal variances or unequal group sizes.
A typical sequence: ANOVA gives p = 0.012 (significant), so you run a post-hoc Tukey test. It shows that North differs significantly from South (p = 0.009), while North vs Central and Central vs South do not differ (p > 0.05).
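Of the three methods, Bonferroni is simple enough to sketch in a few lines of Python. The raw (uncorrected) pairwise p-values below are hypothetical and are not the Tukey-adjusted values from the sequence above; the function name `bonferroni` is likewise just an illustration.

```python
def bonferroni(p_values: dict, alpha: float = 0.05) -> dict:
    """Map each comparison to (raw p, adjusted p, significant?).

    The threshold is alpha divided by the number of comparisons;
    equivalently, each raw p is multiplied by that number (capped at 1).
    """
    m = len(p_values)
    threshold = alpha / m
    return {
        pair: (p, min(p * m, 1.0), p < threshold)
        for pair, p in p_values.items()
    }


# Hypothetical uncorrected p-values from three pairwise t-tests.
raw_p = {
    "North vs South": 0.004,
    "North vs Central": 0.030,
    "Central vs South": 0.400,
}

results = bonferroni(raw_p)
for pair, (p, p_adj, significant) in results.items():
    print(f"{pair}: raw p = {p:.3f}, adjusted p = {p_adj:.3f}, significant = {significant}")
```

North vs Central would pass an uncorrected 0.05 threshold (p = 0.030) but fails the corrected one of 0.05 / 3 ≈ 0.0167, which illustrates why the method is described as conservative.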
Example: ANOVA for comparing ratings across four support channels
A company evaluates support quality across four channels: chat, phone, email, and self-service. Each channel was rated on a 1-10 scale by its own group of 40 customers (160 respondents in total), so the groups are independent.
- Chat: mean 8.1, SD 1.3
- Phone: mean 7.4, SD 1.8
- Email: mean 6.8, SD 2.1
- Self-service: mean 7.0, SD 1.9
One-way ANOVA: F(3, 156) = 4.87, p = 0.003. The result is significant — there are differences among the channels.
Post-hoc Tukey:
- Chat vs Email: p = 0.002 ✓ significant
- Chat vs Self-service: p = 0.018 ✓ significant
- Chat vs Phone: p = 0.091 — not significant
- The remaining pairs: p > 0.05 — not significant
Conclusion: chat is significantly better than email and self-service, but not statistically better than phone. The remaining channels don't differ from one another. This is a concrete, operational conclusion for decision-making — where to direct investment in improving support.
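With equal group sizes, the F-statistic can be approximated from summary statistics alone. The sketch below is a rough cross-check rather than the analysis behind the example. The function name `f_from_summary` is hypothetical, and because the published means and SDs are rounded to one decimal, the recomputed value can differ noticeably from the reported F = 4.87.

```python
def f_from_summary(means, sds, n_per_group):
    """One-way F from group means and SDs, assuming equal group sizes."""
    k = len(means)
    # With equal n, the grand mean is the plain average of group means.
    grand_mean = sum(means) / k
    ms_between = n_per_group * sum((m - grand_mean) ** 2 for m in means) / (k - 1)
    # With equal n, the pooled within-group variance is the mean of the variances.
    ms_within = sum(sd ** 2 for sd in sds) / k
    return ms_between / ms_within, k - 1, k * (n_per_group - 1)


# Chat, phone, email, self-service (rounded values from the example).
means = [8.1, 7.4, 6.8, 7.0]
sds = [1.3, 1.8, 2.1, 1.9]

f_stat, df1, df2 = f_from_summary(means, sds, n_per_group=40)
print(f"F({df1}, {df2}) = {f_stat:.2f}")  # F(3, 156) = 4.07
```

The degrees of freedom match the reported F(3, 156): 4 groups minus 1, and 160 observations minus 4 groups.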
ANOVA assumptions
ANOVA gives reliable results when assumptions similar to those of the t-test hold:
Normality of the distribution in each group. With n > 30 per group, a violation of normality is not critical. With small groups, check with the Shapiro-Wilk test.
Homogeneity of variances (homoscedasticity). The spread of the data should be roughly the same across all groups. Check it with Levene's test. If the assumption is violated, use Welch ANOVA, which doesn't require equal variances.
Independence of observations. Each participant is in one group, responses are independent of one another. If one person responds under several conditions — you need a repeated measures ANOVA.
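To make the variance check less of a black box: Levene's statistic is itself a one-way F, computed on each observation's absolute deviation from its group mean. The sketch below shows that idea with hypothetical data and a made-up function name, `levene_w`. It uses the mean-centered form; statistical packages often default to a median-centered variant and also return the p-value, which this sketch omits.

```python
from statistics import mean


def levene_w(groups):
    """Levene's W: a one-way F on absolute deviations from group means."""
    # Replace each value by its distance from its own group's mean.
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    all_z = [v for g in z for v in g]
    grand = mean(all_z)
    k, n_total = len(z), len(all_z)

    between = sum(len(g) * (mean(g) - grand) ** 2 for g in z) / (k - 1)
    within = sum((v - mean(g)) ** 2 for g in z for v in g) / (n_total - k)
    return between / within


# Hypothetical samples: same spread, different means -> W is zero.
same_spread = [[1, 2, 3], [11, 12, 13], [21, 22, 23]]
# Hypothetical samples: clearly different spreads -> W is well above zero.
different_spread = [[4, 5, 6], [1, 5, 9], [5, 5, 5]]

print(f"W (same spread) = {levene_w(same_spread):.2f}")
print(f"W (different spread) = {levene_w(different_spread):.2f}")
```

A large W relative to the F distribution with (k - 1, N - k) degrees of freedom points to unequal variances, which is the case where Welch ANOVA is the safer choice.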
Common mistakes when using ANOVA
Not running post-hoc tests after a significant ANOVA. A significant F says "there's something there," but not "what exactly." Without post-hoc tests the conclusion is incomplete. Sometimes the significance comes from just one pair out of six, while the other five show no significant difference.
Confusing statistical and practical significance. With a large sample, ANOVA can flag even a 0.2-point difference between groups as significant. Such a result can be statistically significant and practically meaningless. Always calculate the effect size (eta-squared or omega-squared for ANOVA).
Using a one-way ANOVA when a two-way one is needed. If you have two factors and ignore one — you lose information about the interaction. The interaction can be the most interesting finding in the data.
Ignoring a violation of independence. If the same respondents rate several conditions (for example, three design variants), a standard ANOVA is incorrect because it treats dependent observations as independent. You need a repeated measures ANOVA; otherwise the p-values will be unreliable.
ANOVA in the analysis of survey data
In research based on surveys, ANOVA is used to compare scores between several demographic groups (age cohorts, regions, job roles), to analyze the results of multivariate testing with three or more variants, and to compare satisfaction metrics across product lines or channels.
A two-way ANOVA is especially useful when you need to understand the interaction of two variables: for example, whether the effect of device type (mobile vs desktop) on the UX score depends on user type (new vs experienced). You can calculate the p-value for the F-statistic with the p-value calculator from SurveyNinja.
ANOVA is the right tool when you need to compare three or more groups. A significant result says "there are differences"; post-hoc tests say which groups differ. The effect size (eta-squared) translates the statistics into practical meaning.
Frequently asked questions
How does ANOVA differ from several t-tests?
A t-test compares only two groups. Several t-tests for three or more groups accumulate the probability of a false positive: with three pairwise comparisons the risk of at least one random significance climbs from 5% to ~14%. ANOVA tests all groups in a single test, holding the overall error at the 5% level.
What to do after a significant ANOVA?
Run a post-hoc test, that is, pairwise comparisons with a correction for multiplicity. The standard choice is Tukey HSD when group sizes and variances are equal, and Games-Howell when variances are unequal. The post-hoc test will show which specific pairs of groups differ from one another.
When to use Repeated Measures ANOVA?
When the same participants are measured under several conditions or at different points in time. For example: the same employees rated satisfaction before, immediately after, and a month after changes. A standard ANOVA is incorrect in this case, because the observations are dependent.
What is eta-squared and why is it needed?
Eta-squared (η²) — a measure of effect size for ANOVA: the proportion of total data variability explained by the factor. Values: 0.01 — small effect, 0.06 — medium, 0.14 and above — large. It's needed to understand the practical significance of the result: ANOVA can be significant but explain only 2% of variability — which is practically unimportant.
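For a one-way ANOVA, both effect sizes can be recovered from the F-statistic and its degrees of freedom. The sketch below applies those identities to the support-channel result reported earlier, F(3, 156) = 4.87. The function names are hypothetical, and the "below small" label for values under 0.01 is an assumption added for completeness.

```python
def eta_squared(f_stat: float, df_between: int, df_within: int) -> float:
    """Eta-squared from a one-way F: SS_between / SS_total."""
    return (f_stat * df_between) / (f_stat * df_between + df_within)


def omega_squared(f_stat: float, df_between: int, df_within: int) -> float:
    """Omega-squared from a one-way F (a less biased estimate)."""
    n_total = df_between + df_within + 1
    numerator = df_between * (f_stat - 1)
    return numerator / (numerator + n_total)


def effect_label(eta_sq: float) -> str:
    """Benchmarks quoted above: 0.01 small, 0.06 medium, 0.14 large."""
    if eta_sq >= 0.14:
        return "large"
    if eta_sq >= 0.06:
        return "medium"
    if eta_sq >= 0.01:
        return "small"
    return "below small"  # label chosen for this sketch


eta = eta_squared(4.87, 3, 156)
omega = omega_squared(4.87, 3, 156)
print(f"eta-squared = {eta:.3f} ({effect_label(eta)})")  # eta-squared = 0.086 (medium)
print(f"omega-squared = {omega:.3f}")                    # omega-squared = 0.068
```

In this example the channel explains roughly 9% of the variability in ratings, a medium effect by the benchmarks above.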
What to do if the data violates ANOVA assumptions?
If normality is violated with a small sample, use a non-parametric alternative: the Kruskal-Wallis test (a replacement for one-way ANOVA). With unequal variances, use Welch ANOVA. With dependent observations, use repeated measures ANOVA or the non-parametric Friedman test. A violation of normality with n > 30 in each group is not critical: ANOVA is robust thanks to the central limit theorem.
Published: May 31, 2026
Mike Taylor