Stratified sampling

Picture this: a supermarket chain wants to measure how satisfied customers are with its service. If you simply survey 800 people at random from the overall database, there is a risk that most respondents will come from large cities and central districts. Small towns, suburban stores and "night-time" shoppers will end up underrepresented in the sample — and that is often exactly where the sharpest problems hide.

To avoid "diluting" important segments across the sample and losing their voice, researchers divide the audience into homogeneous groups in advance and plan how many people they need to survey in each one. This approach is called stratified sampling. When the groups really do differ on the metric being measured, it usually gives more precise estimates than a simple random sample of the same size.

What stratified sampling is

Stratified sampling is a method of building a sample in which the population is first divided into homogeneous groups (strata) by an important characteristic — gender, age, region, customer type and so on — and then a set number of respondents is randomly selected within each stratum.

The idea is simple: instead of hoping that chance will "correctly" distribute respondents across groups, you define the structure of the sample in advance and make sure that every meaningful segment of the audience is represented in the survey.

How stratification differs from quota and simple random sampling

Simple random sampling. Every element of the population has an equal chance of being selected, but the structure by gender, age, region and other characteristics comes out "as luck would have it". With large enough samples, chance usually works in your favour, but with small and medium samples, skews are possible.

Quota sampling. You also set target shares for key characteristics, but selection within each quota is usually based on convenience: interviewers or channels pick the people who are easiest to survey. Formally this is a non-probability scheme, described in detail in the article on quota sampling.

Stratified sampling. It differs from quota sampling in that selection within each stratum must be random (or as close to random as possible). This makes it possible to classify it as a probability method and to correctly estimate the statistical margin of error at the level of the whole sample.

How to plan a stratified sample

1. Describe the population. First, you need to clearly understand exactly who you want to study: all customers over the past 12 months, only new customers, city residents over 18, users on a particular plan. There is no point moving on without this.

2. Choose a characteristic (or several) for stratification. These should be characteristics that strongly affect the metric you are studying: region, type of point of sale, company size, age group. For large-scale studies it helps to draw on best practices for designing public-opinion surveys and on official population statistics.

3. Define the structure of the strata. CRM data, business statistics or public sources help here. For example: "New York — 20% of customers, other large cities — 30%, other towns — 50%". Or: "micro-business — 40%, small — 35%, medium — 25%".

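4. Allocate the total sample size across the strata. Most often, proportional allocation is used: if 60% of customers in the population are women and 40% are men, then the same shares are set in the sample. For small but strategically important segments, disproportionate stratification is sometimes applied: you recruit more respondents from the segment than its population share would give, so that the analytics for it are more robust. In that case, totals for the whole sample should be weighted back to the population shares, otherwise the oversampled segment will distort the overall figures.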
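The proportional allocation from step 4 can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function name, the total of 800 interviews and the use of the largest-remainder rule for rounding are assumptions made for the example; the percentage shares are the ones from step 3.

```python
from math import floor

def proportional_allocation(total, shares):
    """Split `total` interviews across strata in proportion to `shares`.

    Uses the largest-remainder rule so that the per-stratum counts are
    whole numbers that add up exactly to `total`.
    """
    share_sum = sum(shares.values())
    exact = {name: total * share / share_sum for name, share in shares.items()}
    counts = {name: floor(value) for name, value in exact.items()}
    leftover = total - sum(counts.values())
    # Give the remaining interviews to the strata with the largest fractions.
    by_remainder = sorted(exact, key=lambda name: exact[name] - counts[name], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts

# Shares in percent, taken from the example in step 3.
# The total of 800 interviews is an assumed figure.
shares = {"New York": 20, "other large cities": 30, "other towns": 50}
allocation = proportional_allocation(800, shares)
print(allocation)
```

With these inputs the plan is 160, 240 and 400 interviews. For disproportionate stratification you would set the counts by hand for the segments you want to boost and keep the population shares for weighting later.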

Stratified sampling in online surveys

In classic offline research, stratification is built around lists and random selection. In an online setting the mechanics change, but the basic principles remain.

Working with several databases. Often different strata physically live in different sources: separate customer lists by region, product type or sales channel. In this case, each database acts as its own stratum, within which you randomly select contacts and send out invitations.

Using respondent panels. If you recruit your audience through a panel, a similar effect is achieved through targeting settings. The approach and what to expect from such respondents are explained well in the article "Who is a respondent in a survey": it shows how panels help to top up the segments you need and to avoid skews toward active users.

Hidden variables and filters. Even if you cannot select respondents within each stratum in a perfectly random way, you can at least label them correctly. Using hidden variables and filters in reports, you control the structure of the data and, if necessary, additionally top up underrepresented groups.
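As a rough sketch of how such labels can be used, the snippet below counts collected responses per stratum and shows how many more are needed to reach each target. The export format (a list of records with a "stratum" field), the stratum names and the target numbers are all hypothetical; adapt them to whatever your survey tool actually exports.

```python
from collections import Counter

def remaining_by_stratum(responses, targets, key="stratum"):
    """Return how many more completed responses each stratum still needs."""
    collected = Counter(row[key] for row in responses)
    # Counter returns 0 for strata that have no responses yet.
    return {name: max(target - collected[name], 0) for name, target in targets.items()}

# Hypothetical export: every response carries a hidden "stratum" label.
responses = [
    {"id": 1, "stratum": "large cities"},
    {"id": 2, "stratum": "large cities"},
    {"id": 3, "stratum": "large cities"},
    {"id": 4, "stratum": "small towns"},
]
# Hypothetical targets, kept tiny so the example is easy to follow.
targets = {"large cities": 2, "small towns": 3, "suburbs": 2}
shortfall = remaining_by_stratum(responses, targets)
print(shortfall)
```

Here "large cities" is already full, while "small towns" and "suburbs" each need two more responses, so those are the groups to top up.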

Example: surveying customers of a gym chain

Suppose a chain has 30 clubs across three types of locations: city centre, residential districts and shopping malls. According to CRM data, 25% of customers visit central clubs, 50% visit clubs in residential districts, and 25% visit clubs in shopping malls. Analytics show that satisfaction strongly depends on the type of location.

If you simply take a random sample from the overall database, you might end up with an accidental skew toward one type of club. With a stratified scheme you define three strata ("centre", "districts", "malls") and allocate, say, 600 customers in proportion to those shares: 150/300/150. Within each stratum, people are randomly selected from the CRM to be invited, and the SurveyNinja mailing is then built from these lists.
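The random selection within each stratum might look roughly like this. It is a sketch under stated assumptions: the CRM is represented as a list of dictionaries with a hypothetical "location_type" field, the 8,000 customer records are synthetic, and the function name is invented for the example.

```python
import random

def draw_stratified_sample(customers, allocation, key="location_type", seed=None):
    """Randomly pick the planned number of customers from each stratum."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    by_stratum = {}
    for customer in customers:
        by_stratum.setdefault(customer[key], []).append(customer)
    sample = {}
    for name, size in allocation.items():
        pool = by_stratum.get(name, [])
        if size > len(pool):
            raise ValueError(f"stratum {name!r} has {len(pool)} customers, need {size}")
        # random.sample draws without replacement, so nobody is picked twice.
        sample[name] = rng.sample(pool, size)
    return sample

# Synthetic CRM with the 25/50/25 split from the example.
crm = (
    [{"id": i, "location_type": "centre"} for i in range(2000)]
    + [{"id": i, "location_type": "districts"} for i in range(2000, 6000)]
    + [{"id": i, "location_type": "malls"} for i in range(6000, 8000)]
)
invited = draw_stratified_sample(crm, {"centre": 150, "districts": 300, "malls": 150}, seed=42)
print({name: len(rows) for name, rows in invited.items()})
```

Since not everyone who is invited will respond, in practice the number of invitations per stratum usually has to be larger than the number of completed responses you are aiming for.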

As a result, the report gives you not only an overall satisfaction score but also robust comparisons for each type of club — without the fear that one of the groups has only 20 responses and the conclusions about it are unreliable.
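One common way to get the overall score from the per-stratum results is a weighted average, with each stratum weighted by its share of the population. The sketch below assumes this approach; the satisfaction scores are invented purely for illustration, and only the 25/50/25 shares come from the example.

```python
def stratified_mean(stratum_means, population_shares):
    """Combine per-stratum means into one overall mean, weighted by population share."""
    share_sum = sum(population_shares.values())
    return sum(
        stratum_means[name] * share / share_sum
        for name, share in population_shares.items()
    )

# Population shares in percent, from the CRM figures in the example.
shares = {"centre": 25, "districts": 50, "malls": 25}
# Invented average satisfaction scores on a 10-point scale.
means = {"centre": 8.0, "districts": 7.0, "malls": 9.0}
overall = stratified_mean(means, shares)
print(round(overall, 2))
```

With proportional allocation this gives roughly the same figure as a plain average over all responses. The weighting matters most when some strata were deliberately oversampled.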

Common mistakes in stratification

Too many strata. The desire to "account for everything" leads you to divide the audience by many criteria at once and end up with dozens of cells, in each of which you need to gather a minimum number of observations. In practice this is almost unachievable and makes the project much more expensive.
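A quick calculation shows how fast the number of cells grows. The characteristics, the number of levels in each and the minimum of 30 responses per cell are assumed values chosen for illustration.

```python
from math import prod

def required_sample(levels_per_characteristic, min_per_cell):
    """Return the number of cells and the minimum total sample they imply."""
    cells = prod(levels_per_characteristic.values())
    return cells, cells * min_per_cell

# Assumed stratification scheme: four characteristics crossed with each other.
levels = {"region": 5, "age group": 4, "gender": 2, "customer type": 3}
cells, needed = required_sample(levels, min_per_cell=30)
print(f"{cells} cells, at least {needed} responses")
```

Four fairly modest characteristics already produce 120 cells and a minimum of 3,600 responses, before allowing for cells that are rare in the population and hard to fill.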

Using unstable characteristics. Stratifying by characteristics that change quickly (for example, "active/inactive customer over the past week") is risky: by the time you finish recruiting the sample, the structure of the population may already have changed.

Mixing stratification and quota logic without understanding it. In real online surveys, quota or convenience selection is often used within strata rather than purely random selection. This is not forbidden, but it is important to honestly acknowledge that in such a case your design sits somewhere between strict stratification and quota sampling, and to interpret the statistical conclusions carefully.

Practical recommendations

Stratify by a few genuinely meaningful characteristics. To begin with, 1–2 dimensions are enough: region + customer type, type of point of sale + company size. Other characteristics can be accounted for later, at the analysis stage, by applying filters and cuts in the reports.

Plan the sample size with the strata in mind. It is important to consider the sample size not only "overall" but also within each group. If you have 600 respondents and three strata, the analytics for each of them are built on subsamples of about 200 people on average, and on fewer for the smaller strata under proportional allocation (150 in the gym example above). It is worth checking in advance whether that is enough for your tasks.
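One way to run that check is to estimate the margin of error for each subsample size. The sketch below uses the standard approximation for a proportion under simple random sampling within a stratum, at 95% confidence and with the worst case p = 0.5. It ignores the finite population correction and non-response, so treat the results as a rough guide.

```python
from math import sqrt

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion estimated from n responses."""
    return z * sqrt(p * (1 - p) / n)

# Subsample sizes that can arise when 600 responses are split across three strata.
for n in (150, 200, 300):
    print(f"n={n}: about ±{margin_of_error(n) * 100:.1f} percentage points")
```

For 150, 200 and 300 responses this gives roughly ±8.0, ±6.9 and ±5.7 percentage points. If differences between groups are expected to be smaller than that, the strata need more respondents.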

Record the scheme in the methodology section. In reports on survey results, it is worth describing separately which strata were used and how respondents were selected within them. This increases trust in the results and helps colleagues correctly assess how reliable the conclusions are.

Use the capabilities of the survey builder, not just Excel. In SurveyNinja, stratification is easy to "support" through several links to the same survey (for different strata) and hidden variables that mark which group each response belongs to. This simplifies building cuts and comparisons in reports and reduces the risk of mixing up data during manual processing.

Stratified sampling is a way to make sure in advance that every important segment of the audience is heard in a survey, not just the most visible and vocal ones. A little extra planning at the start saves hours of argument at the analysis stage, when you have to explain why "there are almost no customers from the regions in the sample".
