T-test (Student's t-test)
May 31, 2026
Group A rated the product 7.2 points, group B — 6.8. There is a difference. But is it meaningful? Maybe it is just random noise, and with another sample the numbers would swap places?
The t-test (Student's t-test) is a statistical tool that answers exactly this question: is the difference between two means real, or does it fall within the range of random fluctuation?
Definition
The t-test (Student's t-test) is a parametric statistical test of hypotheses about means: that the mean of one group equals a given value, or that the means of two groups are equal. It uses the t-statistic, which shows how large the observed difference between means is relative to the variability of the data. From the t-statistic and the degrees of freedom, a p-value is computed: the probability of obtaining a difference at least as large as the observed one, assuming there is none in the population. The test is intended for approximately normally distributed data with unknown population variance, and it matters most for small samples (usually n < 30), where the normal approximation is unreliable.
Three types of t-test
One-sample t-test. Checks whether the mean of a sample differs from a given number. Example: the average NPS in a sample equals 42. Does this differ from the historical benchmark of 38 in a statistically significant way? We compare one sample against a constant.
Independent samples t-test. Compares the means of two independent groups. Example: men rated interface usability at 7.4, women — at 6.9. Is the difference significant? The groups are independent — different people, not connected to one another. This is the most common type in survey research.
Paired t-test. Compares the means of the same people under two conditions or at two points in time. Example: the same employees rated their satisfaction before and after rolling out a new tool. The groups are dependent, since these are the same respondents. The paired test is usually more powerful than the independent one at the same sample size, because it removes between-person variability.
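The gain from pairing can be illustrated with a small sketch. The scores below are hypothetical, invented only for illustration, and the code uses nothing beyond the Python standard library. It computes the t-statistic twice on the same numbers: once treating the two lists as unrelated groups, once using per-person differences.

```python
import math
import statistics as st

# Hypothetical satisfaction scores for the same six employees (illustration only).
before = [5, 6, 4, 7, 5, 6]
after = [6, 7, 5, 7, 6, 8]
n = len(before)

# Independent-samples t: treats the lists as two unrelated groups.
# With equal group sizes the pooled variance is the average of the two variances.
pooled_var = (st.variance(before) + st.variance(after)) / 2
t_independent = (st.mean(after) - st.mean(before)) / math.sqrt(pooled_var * 2 / n)

# Paired t: works on per-person differences, so stable differences
# between people cancel out before the spread is measured.
diffs = [a - b for a, b in zip(after, before)]
t_paired = st.mean(diffs) / (st.stdev(diffs) / math.sqrt(n))

print(f"independent t = {t_independent:.2f}, df = {2 * n - 2}")
print(f"paired t      = {t_paired:.2f}, df = {n - 1}")
```

On this made-up data the mean change is identical in both calculations, but the paired statistic is noticeably larger, because the differences vary much less than the raw scores do.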
How the t-test works
The t-statistic is calculated as the ratio of the difference in means to the standard error of that difference:
t = (M1 - M2) / SE
Here SE (the standard error of the difference) accounts for the spread of the data in both groups and the sample size. The larger the difference between means and the smaller the spread within groups, the larger t becomes. A large t means the difference is hard to explain by chance.
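For two independent samples, SE is commonly written in one of two forms. The notation is an assumption introduced here for illustration: s_1 and s_2 are the sample standard deviations, n_1 and n_2 the group sizes, and s_p the pooled standard deviation.

```latex
% Unpooled form (used by Welch's t-test)
SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}

% Pooled form (classic Student's t-test, assumes equal variances)
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
SE = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}
```

When the group sizes are equal, the two forms give the same SE; they differ only in the degrees of freedom used afterwards.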
From the value of t and the number of degrees of freedom (which depends on sample size), the p-value is determined — the probability of getting such a difference, or a more extreme one, if there is really no difference. The standard threshold: if p < 0.05, the difference is considered statistically significant.
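Example: t = 2.34, degrees of freedom = 58, p = 0.023. Conclusion: if there were really no difference, a difference this large or larger would appear by chance in only 2.3% of samples. That is less than 5%, so we reject the hypothesis of no difference and treat the result as statistically significant.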
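To make the step from t and degrees of freedom to a p-value concrete, here is a minimal sketch that uses only the Python standard library. The helper names `t_pdf` and `two_sided_p` are hypothetical, and the numerical integration is chosen for transparency rather than speed; a statistics package would normally supply this directly (in SciPy, for example, through `scipy.stats.t.sf`).

```python
import math

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t: float, df: float, steps: int = 2000) -> float:
    """Two-sided p-value: the area in both tails beyond |t|.

    Integrates the density from 0 to |t| with Simpson's rule
    (steps must be even) and subtracts twice that area from 1.
    """
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3
    return 1 - 2 * central_area

p = two_sided_p(2.34, 58)
print(f"p = {p:.3f}")
```

Because the function takes the absolute value of t, swapping the order of the groups changes nothing in the two-sided result.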
T-test vs Z-test
The t-test and the Z-test solve a similar task — comparing means — but they apply under different conditions.
The t-test is used when:
- The sample is small (usually n < 30 in each group)
- The population variance is unknown (in most practical tasks)
The Z-test is used when:
- The sample is large (n > 30)
- The population variance is known
In practice, with n > 30 the results of the t-test and the Z-test almost coincide, because the t-distribution approaches the normal distribution as the sample grows. The t-test is the more universal tool: it works correctly on both small and large samples. That is why it is used by default in most cases of analyzing survey data.
Example: a t-test for comparing two onboarding versions
A company is testing two onboarding variants. 35 users went through version A, 35 — version B. After onboarding, each rated how easy it was to get started on a scale of 1-10.
- Version A: mean 6.8, standard deviation 1.9
- Version B: mean 7.6, standard deviation 1.7
At a glance, the 0.8-point difference looks substantial. We run an independent samples t-test:
- t = 1.98, degrees of freedom = 68
- p = 0.051
p = 0.051 is just above the 0.05 threshold. Formally, the difference is not statistically significant. What to do? Don't rush to conclude "there is no difference." This is a borderline result, and the sample may simply not be large enough. It makes sense to compute the effect size: if it is moderate or large, it is worth repeating the test on a larger sample before making a decision.
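The same kind of calculation can be sketched from summary statistics alone. The snippet below assumes the classic pooled-variance (equal variances) form of the test and takes the rounded means and standard deviations listed in the example as inputs. Recomputed this way, the statistic comes out somewhat lower than the reported one; a gap of that size can arise when published means and standard deviations are rounded, so this is a rough check rather than a reproduction of the reported figures.

```python
import math

# Rounded summary statistics from the onboarding example.
n_a, mean_a, sd_a = 35, 6.8, 1.9
n_b, mean_b, sd_b = 35, 7.6, 1.7

# Pooled variance (assumes equal variances in both groups).
df = n_a + n_b - 2
pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df
se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

t_stat = (mean_b - mean_a) / se

# Cohen's d: the difference in means expressed in pooled standard deviations.
cohens_d = (mean_b - mean_a) / math.sqrt(pooled_var)

print(f"t = {t_stat:.2f}, df = {df}, Cohen's d = {cohens_d:.2f}")
```

The effect size lands between Cohen's conventional benchmarks for a small effect (0.2) and a medium one (0.5), which supports the advice to retest on a larger sample instead of declaring that there is no difference.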
Assumptions of the t-test
The t-test works correctly when several conditions are met:
Normality of distribution. The data in each group should be approximately normally distributed. With n > 30 this condition becomes less critical thanks to the central limit theorem. For small samples, a violation of normality is a reason to consider non-parametric alternatives (the Mann-Whitney test).
Independence of observations. Each respondent answers on their own, without influencing others. It is violated, for example, if members of one family end up in the same group.
Homogeneity of variances (for the independent t-test). The spread of the data in the two groups should be approximately equal. This is checked with Levene's test. If the variances differ significantly, you use the Welch variant (Welch's t-test), which does not require this condition and is available in most statistical packages.
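As an illustration of the Welch variant, the sketch below computes its t-statistic and the Welch-Satterthwaite degrees of freedom from summary statistics. The function name `welch_t` and the input numbers are hypothetical. Statistical packages expose this directly; in SciPy, for instance, `scipy.stats.ttest_ind(..., equal_var=False)` selects it.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2  # squared standard error of each mean
    t_stat = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t_stat, df

# Hypothetical groups with clearly unequal spread and size (illustration only).
t_stat, df = welch_t(7.4, 1.0, 40, 6.9, 2.5, 15)
print(f"t = {t_stat:.2f}, df = {df:.1f} (pooled df would be {40 + 15 - 2})")
```

The degrees of freedom drop well below n1 + n2 - 2 here, because the small, noisy group dominates the uncertainty. That is how the Welch test stays cautious when variances are unequal.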
Common mistakes when interpreting the t-test
Confusing statistical significance with practical significance. p < 0.05 means the difference is unlikely to be explained by chance alone. But it does not mean it is important. A difference of 0.3 points in average NPS may be statistically significant with a large sample and at the same time have no practical meaning at all. Always look at the effect size (Cohen's d) together with the p-value.
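A short sketch shows how the two notions come apart. The numbers are hypothetical: two groups of 5,000 respondents, a 0.3-point gap, and a standard deviation of 3 in each group.

```python
import math

# Hypothetical large survey (illustration only).
n, diff, sd = 5000, 0.3, 3.0

# With equal n and equal SD in both groups, SE = sd * sqrt(2 / n).
t_stat = diff / (sd * math.sqrt(2 / n))

# Cohen's d: the same difference expressed in standard deviations.
cohens_d = diff / sd

print(f"t = {t_stat:.1f}, Cohen's d = {cohens_d:.2f}")
```

A t-statistic of this size corresponds to a p-value far below 0.05, yet the effect is only a tenth of a standard deviation. The sample size drives the significance; the effect size does not change with it.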
Applying the t-test to ordinal scales without caution. Formally, the t-test requires numeric data with equal intervals. A 1-5 Likert scale is ordinal. In practice, researchers often apply the t-test to Likert data, and this is acceptable when n > 30 and the distribution is not strongly skewed. But for small samples and pronounced skew, it is better to use non-parametric tests.
Multiple comparisons without correction. If you compare 10 pairs of groups with a p < 0.05 threshold and there are no real differences, at least one "significant" result will appear by chance with a probability of about 40% (assuming the tests are independent). Multiple t-tests require a Bonferroni correction or a switch to analysis of variance (ANOVA).
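The arithmetic behind the 40% figure, and the effect of a Bonferroni correction, can be sketched in a few lines. The calculation assumes the ten tests are independent and that no real differences exist.

```python
alpha = 0.05
n_tests = 10

# Chance of at least one false positive across independent tests
# when there are no real differences.
familywise_error = 1 - (1 - alpha) ** n_tests

# Bonferroni: divide the significance threshold by the number of tests.
bonferroni_alpha = alpha / n_tests
familywise_after = 1 - (1 - bonferroni_alpha) ** n_tests

print(f"uncorrected family-wise error: {familywise_error:.3f}")
print(f"Bonferroni threshold per test: {bonferroni_alpha:.3f}")
print(f"corrected family-wise error:   {familywise_after:.3f}")
```

With the corrected per-test threshold, the overall chance of a false positive falls back below 5%, at the cost of making each individual comparison harder to pass.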
Ignoring sample size. With n = 10 per group, the t-test will have low statistical power: real differences may go undetected. Calculate the required sample size in advance through the minimum detectable effect.
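A rough sample size estimate can be sketched with the standard library alone. The function name `n_per_group` is hypothetical, and the formula is the common normal approximation for a two-sided comparison of two equal-sized groups; an exact calculation based on the t-distribution gives slightly larger numbers. The effect size is expressed as Cohen's d, and the defaults (alpha of 0.05, power of 0.80) are conventional choices, not requirements.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate respondents needed per group.

    Normal approximation: n = 2 * ((z_alpha + z_power) / d) ** 2,
    rounded up to a whole respondent.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} per group")
```

Under these assumptions, even a large effect needs more than 10 respondents per group, and a small one needs several hundred.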
The t-test in survey data analysis
In research based on surveys, the t-test is applied in several standard scenarios: comparing scores between demographic groups (men vs women, new vs experienced users), comparing results before and after a change (the paired test), comparing two versions of a product or communication in an A/B test.
You can compute a t-test and p-value in the SurveyNinja p-value calculator — without needing statistical packages. After exporting the survey data, it is enough to enter the means, standard deviations and group sizes.
The t-test is the basic tool for checking differences between two means. The key output metrics are the t-statistic, the p-value and the effect size. The p-value shows whether the difference could plausibly be explained by chance. The effect size shows whether it is practically meaningful. Only together do they give the full picture.
Frequently asked questions
When to use a t-test and when to use ANOVA?
The t-test compares at most two groups. If there are three or more groups, you need ANOVA. Using several uncorrected t-tests to compare three groups pairwise is a mistake: it accumulates the probability of a false-positive result. ANOVA checks all groups at once and keeps this risk at the chosen level.
What should you do if the p-value is just above 0.05?
Don't rush to conclude "there are no differences." Check: is the sample size sufficient? Compute the effect size — if it is moderate or large, the sample may simply be too small to detect a real difference. p = 0.07 with a small sample and a large effect is a signal to repeat the study with a larger sample, not a conclusion that there is no difference.
Can the t-test be applied to Likert scale data?
Formally, the Likert scale is ordinal, and strictly speaking the t-test is not applicable to it. In practice, with n > 30 and a symmetric distribution of answers, most researchers use the t-test — it is an accepted norm. With small samples or strong skew, it is better to use the non-parametric Mann-Whitney test.
What are degrees of freedom in a t-test?
Degrees of freedom (df) determine the shape of the t-distribution from which the p-value is computed. For an independent t-test with equal variances, df = n1 + n2 - 2; Welch's variant uses a smaller, usually fractional value. The larger the sample, the larger the df and the more closely the t-distribution approximates the normal one. In practice you don't need to compute it by hand: calculators and statistical packages do this automatically.
Does the order of the groups affect the t-test result?
It affects the sign of the t-statistic (positive or negative). In a two-sided test (the standard variant), which checks for a difference in either direction, it does not affect the p-value or the conclusion about significance: the result is the same whether you compute group A minus group B or vice versa. Only in a one-sided test does the order matter, because the direction is part of the hypothesis.
Published: May 31, 2026
Mike Taylor