Sample (sampling)

Picture this: a coffee shop chain wants to know whether customers are happy with the new menu. The chain has 200,000 customers a month. Surveying all of them is impractical: it would take months and cost a fortune. Surveying the founder's 15 friends is pointless, because their opinion doesn't reflect the audience.

Somewhere between "everyone" and "fifteen acquaintances" lies the sweet spot — a sample large enough to give reliable data, yet compact enough to keep the study feasible. The ability to build that sample correctly is the skill that separates professional research from reading tea leaves.

What a sample is

A sample is a subset of people (or objects) selected from a wider group, the population, to take part in a study. The results obtained from the sample are extrapolated to the entire population. The quality of that extrapolation is determined by how representative the sample is, that is, how accurately it reflects the structure and characteristics of the target group.

A cooking analogy: to find out whether a pot of soup has enough salt, you don't need to drink the whole pot — it's enough to stir it and taste a single spoonful. The spoon is the sample, the pot is the population, and stirring is the way to ensure representativeness (so the spoon catches not just the broth on top, but a mix of all the ingredients).

Sample and population

These two concepts always work as a pair.

The population is the entire group you want to draw conclusions about. It can be anything: all customers of an online store, all residents of a city over 18, all employees of a company, all users of a mobile app.

The sample is the part of that group you actually survey.

The goal is for the conclusions drawn from the sample to hold true for the entire population. This is only possible if the sample is large enough and properly constructed. If 70% of the coffee chain's customers are women aged 25–35, but 90% of your sample are men over 50, the data will be useless even if you survey a thousand people.

Types of samples

All sampling methods fall into two broad groups: probability and non-probability.

Probability sampling

Every element of the population has a known, non-zero chance of being included in the sample. This is the gold standard: the results can be generalized to the whole population with measurable accuracy.

Simple random sampling. A fixed number of people, n, is selected at random from a complete list of all members of the population. Everyone has an equal chance of being chosen. Example: a random-number generator picks 500 from a base of 10,000 customers. The requirement is a complete list of the population (a sampling frame), which isn't always realistic.
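A minimal sketch of the example above, assuming the customer base is available as a list of IDs. The names `customer_ids` and `simple_random_sample` are illustrative, not part of any survey tool:

```python
import random

# Hypothetical sampling frame: the complete list of 10,000 customer IDs.
customer_ids = list(range(1, 10_001))

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements; every element has the same chance of selection."""
    rng = random.Random(seed)    # a fixed seed makes the draw reproducible
    return rng.sample(frame, n)  # sampling without replacement

srs = simple_random_sample(customer_ids, 500, seed=42)
print(len(srs), "customers selected")
```

Recording the seed alongside the results lets a colleague reproduce exactly the same draw later.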

Stratified sampling. The population is divided into subgroups (strata) by an important attribute — gender, age, region, income level. A random subsample is drawn from each stratum in proportion to its share of the population. This ensures that all key groups are represented.

Example. A company surveys its employees. Of the staff, 60% work in the office and 40% work remotely. With a simple random sample of 100 people you might happen to get 70 office workers and 30 remote ones, which is a skew. Stratified sampling guarantees exactly 60 office and 40 remote respondents.
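The office/remote example can be sketched as follows. This is an illustration under assumptions: the staff list of 600 office and 400 remote employees is invented to match the 60/40 split, and `stratified_sample` is a hypothetical helper using proportional allocation:

```python
import random

def stratified_sample(strata, n, seed=None):
    """Proportional allocation: each stratum contributes in proportion to its size."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    result = {}
    for name, members in strata.items():
        # Stratum quota; with awkward shares the rounded quotas may not sum to n exactly.
        k = round(n * len(members) / total)
        result[name] = rng.sample(members, k)  # random draw inside the stratum
    return result

# Hypothetical staff list matching the 60% / 40% split from the example.
staff = {
    "office": [f"office-{i}" for i in range(600)],
    "remote": [f"remote-{i}" for i in range(400)],
}
stratified = stratified_sample(staff, 100, seed=1)
print({name: len(members) for name, members in stratified.items()})
```

The split is fixed by design, while the choice of individuals inside each stratum stays random.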

Cluster sampling. The population is divided into clusters (usually on a geographic or organizational basis), several clusters are chosen at random, and within them all or a randomly selected set of participants are surveyed. It's handy for large, geographically distributed populations.
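A rough sketch of the two stages described above, assuming the clusters are coffee shops. The shop and customer data are invented for illustration, and `cluster_sample` is a hypothetical helper:

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster=None, seed=None):
    """Pick clusters at random, then take everyone (or a random subset) in each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # stage 1: random clusters
    result = {}
    for name in chosen:
        members = clusters[name]
        if per_cluster is None or per_cluster >= len(members):
            result[name] = list(members)                     # survey the whole cluster
        else:
            result[name] = rng.sample(members, per_cluster)  # stage 2: random subset
    return result

# Hypothetical data: 20 coffee shops with 50 customers each.
shops = {
    f"shop-{s:02d}": [f"shop-{s:02d}/customer-{c}" for c in range(50)]
    for s in range(20)
}
clustered = cluster_sample(shops, n_clusters=4, per_cluster=10, seed=7)
print({name: len(members) for name, members in clustered.items()})
```

Leaving `per_cluster` unset surveys every member of the chosen clusters, which is the "all participants" variant mentioned above.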

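Systematic sampling. Every k-th element is selected from a list (for example, every tenth customer), ideally starting from a randomly chosen position. A simple and fast method, but it can introduce distortions if the list contains a hidden periodicity: if a list of daily orders repeats in a weekly cycle, taking every seventh entry would always land on the same day of the week.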
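Selecting every tenth customer could look like the sketch below. The random starting position is a common refinement that is assumed here, and the customer list is hypothetical:

```python
import random

def systematic_sample(frame, step, seed=None):
    """Take every step-th element, starting from a random position in the first interval."""
    rng = random.Random(seed)
    start = rng.randrange(step)  # random start between 0 and step - 1
    return frame[start::step]

customers = list(range(1, 1001))  # hypothetical ordered list of 1,000 customers
every_tenth = systematic_sample(customers, step=10, seed=3)
print(len(every_tenth), "customers selected")
```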

Non-probability sampling

The probability of being included in the sample is unknown, and some members of the population may have no chance of being selected at all. The results cannot be strictly generalized to the whole population, but the method is cheaper, faster, and often the only one available.

Convenience sampling. You survey whoever is "close at hand": website visitors, newsletter subscribers, passers-by on the street. It's the most common method in online surveys — and the riskiest in terms of representativeness. If you placed a survey on your website, only those who visited the site will take it — and that isn't your entire audience.

Quota sampling. The interviewer recruits respondents according to set quotas: "we need 50 women and 50 men," "30% aged 18–25, 40% aged 26–40, 30% over 40." It looks similar to stratified sampling on the outside, but within each quota the selection is non-random — the interviewer decides who to include.

Snowball sampling. A respondent invites other respondents: "Filled out the form? Forward it to your colleagues." The method is used when the target group is hard to reach by ordinary means — for example, when researching rare professions, closed communities, or stigmatized groups.

Purposive (expert) sampling. The researcher deliberately selects respondents who meet certain criteria — for example, only those who made a purchase in the past month. It yields in-depth information from a relevant audience but makes no claim to generalizability.

How to calculate the sample size

One of the most frequent questions is: "How many people do I need to survey?" The answer depends on four parameters.

Population size (N). How many people are in the group you're studying overall? For large populations (over 100,000) this parameter has little effect on the calculation — which is surprising, but mathematically proven.

Margin of error (often loosely called the confidence interval). How much error is acceptable? If the margin of error is ±5% and 60% of respondents answered "yes," then the true value in the population most likely lies in the 55–65% range; that range is the confidence interval. The smaller the margin of error, the larger the sample needs to be.

Confidence level. How sure can you be that the interval captures the true value? The standard is 95%: if you repeated the study 100 times, each time drawing a new sample, about 95 of the resulting intervals would contain the true population value. For critical decisions people use 99%; for screening, 90% is sometimes enough.

Expected proportion (p). If you don't know in advance how the answers will be distributed, use p = 50% — this gives the maximum sample size (the worst-case scenario).
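What a 95% confidence level means in practice can be checked with a small simulation. This is an illustrative sketch, not a formal proof: it assumes a true "yes" share of 60%, simple random samples of 385 people, and the usual normal-approximation interval. The function name `coverage_rate` is made up for the example:

```python
import math
import random

def coverage_rate(true_p=0.6, n=385, studies=1000, z=1.96, seed=0):
    """Share of simulated studies whose interval contains the true proportion."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(studies):
        yes = sum(rng.random() < true_p for _ in range(n))  # one simulated survey
        p_hat = yes / n                                     # observed share of "yes"
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)     # normal-approximation margin
        if p_hat - margin <= true_p <= p_hat + margin:
            hits += 1
    return hits / studies

rate = coverage_rate()
print(f"{rate:.1%} of the simulated intervals contain the true value")
```

The printed share should come out close to 95%: most repetitions capture the true value, and a few miss it purely by chance.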

For a typical marketing survey with a margin of error of ±5%, a confidence level of 95%, and p = 50%, you need about 385 respondents, almost regardless of whether you're studying 10,000 customers (about 370 after the finite population correction) or 10 million. This counterintuitive fact follows from the principle that, once the population is much larger than the sample, statistical accuracy is determined by the absolute size of the sample, not by its share of the population.
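The figure of about 385 can be reproduced with Cochran's formula, n0 = z² · p · (1 − p) / e², where e is the margin of error and z is the normal quantile for the chosen confidence level (1.96 for 95%). The sketch below also applies the standard finite population correction. The function name and parameter names are illustrative:

```python
import math

def cochran_sample_size(margin, z=1.96, p=0.5, population=None):
    """Required sample size for estimating a proportion."""
    n0 = z ** 2 * p * (1 - p) / margin ** 2     # size for a very large population
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)   # finite population correction
    return math.ceil(n0)                        # round up to a whole respondent

print(cochran_sample_size(0.05))                         # very large population
print(cochran_sample_size(0.05, population=10_000))      # 10,000 customers
print(cochran_sample_size(0.05, population=10_000_000))  # 10 million customers
```

The correction only matters when the sample is a noticeable fraction of the population, which is why the result barely moves between ten thousand and ten million.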

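For a quick calculation, use Cochran's formula or an online calculator. More important than the formula is grasping the principle: doubling the sample does not double the accuracy, because the margin of error shrinks with the square root of the sample size. At 95% confidence and p = 50%, going from 100 to 400 respondents cuts the margin of error from about ±10 to about ±5 percentage points. Going from 1,000 to 4,000 costs 3,000 more responses and only moves it from about ±3 to about ±1.5.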
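The diminishing returns are easy to see by computing the margin of error for several sample sizes. A small sketch, assuming a 95% confidence level (z = 1.96), the worst case p = 50%, and a population much larger than the sample; `margin_of_error` is an illustrative helper:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the interval for a proportion (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (20, 100, 400, 1000, 4000):
    print(f"n = {n:>4}: ±{margin_of_error(n) * 100:.1f} percentage points")
```

Each fourfold increase in the sample halves the margin of error, so every additional point of precision costs more responses than the previous one.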

Sampling errors: what can go wrong

Coverage error. Your list of the population is incomplete. Example: you survey customers by email, but 30% of customers never left an email — they're beyond your reach. The result will reflect only the opinion of those who left an email, and that may be a systematically different group (more loyal, younger).

Self-selection bias. The survey is open to everyone, but only those who can be bothered take it — and that's a particular type of person. Extremely satisfied and extremely dissatisfied customers fill out forms more often than "average" ones. As a result, the data is skewed toward the extremes.

Non-response bias. Some of the selected respondents don't answer. If the non-responders systematically differ from the responders, the data is distorted. Example: a job-satisfaction survey — those who are burned out and thinking of quitting are less likely to spend time on a corporate form.

Too small a sample. Twenty answers don't support reliable conclusions, even if it seems to you that "the trend is obvious": at a 95% confidence level, the margin of error for 20 respondents is roughly ±22 percentage points. With a small sample, random fluctuations are large, and a single atypical answer can shift the whole picture.

Practical recommendations

Define the population before the study begins. "Our customers" is too vague. "Customers who made at least one purchase in the past 6 months" is precise and operationalizable. Without a clear definition of the population, it's impossible to assess whether the sample is representative.

Use screening questions. If the survey is open to everyone but you only need certain people, add a filter at the start: "Did you buy our product in the past 3 months?" No — thanks, the survey is over. This helps clean the sample of irrelevant respondents.

Compare the sample's structure with known data. If you know that 55% of your customers are women, but in the sample they make up 80%, that's a sign of skew. You can correct the results with statistical weighting, but it's better to control the distribution from the start.
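Statistical weighting for the 55% versus 80% case above can be sketched like this. The sample of 400 people and the per-group satisfaction rates are invented for illustration, and the helper names are hypothetical. Each group's weight is its population share divided by its sample share:

```python
def poststratification_weights(population_shares, sample_counts):
    """Weight per group = population share / sample share."""
    total = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / total)
        for group, count in sample_counts.items()
    }

def weighted_rate(rates, sample_counts, weights):
    """Overall rate after re-weighting each group's respondents."""
    numerator = sum(weights[g] * sample_counts[g] * rates[g] for g in sample_counts)
    denominator = sum(weights[g] * sample_counts[g] for g in sample_counts)
    return numerator / denominator

population_shares = {"women": 0.55, "men": 0.45}  # known customer structure
sample_counts = {"women": 320, "men": 80}         # hypothetical sample: 80% women
satisfaction = {"women": 0.70, "men": 0.50}       # hypothetical share of satisfied respondents

weights = poststratification_weights(population_shares, sample_counts)
unweighted = sum(sample_counts[g] * satisfaction[g] for g in sample_counts) / 400
weighted = weighted_rate(satisfaction, sample_counts, weights)
print(f"unweighted: {unweighted:.0%}, weighted: {weighted:.0%}")
```

With these invented numbers the unweighted result overstates satisfaction because the over-represented group happens to be the happier one. Weighting corrects the proportions, but it cannot fix a group that is missing from the sample entirely.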

Fight for the response rate. The more people from the originally selected sample actually answer, the smaller the non-response error. Short forms, clear invitations, reminders, incentives (a gift, a discount) — all of this works. In SurveyNinja you can set up automatic reminders and an email campaign to reach non-responders again.

Document your sampling method. Who was included in the sample, how they were selected, what percentage responded — this information is needed to interpret the data and so that colleagues (or you yourself six months later) can assess the reliability of the results.

The sample is the foundation of any study. A mistake at this stage cannot be offset by great questions or advanced analytics. If the sample is unrepresentative, you'll get a precise answer to the wrong question. Spend time planning the sample before launching the survey: it's the most profitable investment in the entire project.
