Representativeness

Picture this: before an election, a TV channel runs an online poll on its website asking "Who will you vote for?", and 100,000 people take part. The result: candidate A gets 72%, candidate B gets 28%.

In the actual election, candidate B wins with 54%. How could a poll with a hundred thousand votes be so badly wrong? Simple: the channel's audience is not a cross-section of society. It is a specific age, social and political group. 100,000 answers from an unrepresentative audience are worse than 1,000 answers from a representative one. Sample size does not save you if the sample is skewed. That is exactly what this article is about — representativeness.

What representativeness is

Representativeness is a sample's ability to reflect the key characteristics of the population about which conclusions are drawn. If the structure of the sample matches the structure of the population on important parameters (gender, age, region, behavior), the sample is representative and the results can be generalized. If it does not, the conclusions hold only for those who were surveyed, not for the entire target group.

Representativeness is not a binary property ("yes" or "no"). It is a degree: a sample can be more or less representative on different parameters at the same time. It may perfectly reflect the age structure but be skewed by geography. It may be balanced by demographics but systematically miss people with a certain experience (for example, former customers).

Why representativeness is critical

Without representativeness, your research answers the wrong question. Instead of "What do our customers think?" you get "What do the customers who don't mind filling out forms think?". Those are fundamentally different groups.

Biased data leads to wrong decisions. If a satisfaction survey overrepresents loyal customers, the average score will be inflated. Management decides everything is fine and does not invest in improvements. Meanwhile the "silent majority" — those who didn't respond — drifts off to competitors.

Unrepresentative data does not scale. You tested a new feature on 50 beta testers — enthusiasts who asked to take part themselves. Everyone loves it. You roll the feature out to your entire audience and discover that ordinary users find it confusing. The opinion of the enthusiasts did not represent the opinion of the majority.

Without representativeness, all further calculations lose their meaning. The confidence interval, statistical significance and sample-size formulas all assume a random sample from the population. If that assumption fails, the formulas still return numbers, but those numbers describe only sampling noise and say nothing about the bias, so any interpretation built on them is wrong.

What representativeness depends on

The sampling method

Probability methods (simple random, stratified, cluster sampling) support representativeness by design: every element of the population has a known, non-zero chance of being included in the sample. Non-probability methods (convenience sampling, snowball) offer no such guarantee, although with careful control they can approach representativeness. More on the methods in the article "Sample".

The distribution channel

The channel determines who you can physically reach. An email campaign reaches only those who left an address. A survey on the website — only website visitors. A pop-up form in an app — only active users. Each channel applies its own "filter", and if that filter systematically cuts off certain groups, the sample is skewed.

Example. A company runs a satisfaction survey by email. But 35% of customers placed their order by phone and did not leave an email address. Those 35% are most likely a different demographic group (older, less digitally oriented). Their opinion is absent from the data, so average satisfaction may be systematically inflated or deflated.
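To see how large the distortion can get, here is a minimal sketch of the arithmetic. The 65% / 35% split comes from the example; the two satisfaction scores are invented for illustration only, and the direction of the error would flip if phone customers happened to be happier.

```python
def blended_average(segments):
    """segments: list of (population_share, mean_score); shares should sum to 1."""
    return sum(share * score for share, score in segments)

email_share, phone_share = 0.65, 0.35   # shares from the example above
email_score, phone_score = 8.2, 6.5     # hypothetical mean scores on a 0-10 scale

survey_result = email_score             # an email-only survey sees only this group
true_average = blended_average(
    [(email_share, email_score), (phone_share, phone_score)]
)

print(f"email-only survey: {survey_result:.2f}")
print(f"whole customer base: {true_average:.2f}")
```

Under these assumed scores the email-only survey overstates satisfaction, and no number of extra email responses would close the gap.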

Respondent self-selection

Even if you send the invitation to a perfectly balanced group, not everyone will respond, and those who do are not a random subset. More likely to respond: the extremely satisfied (who want to praise you), the extremely dissatisfied (who want to complain), people with higher education, and people with more free time. Less likely to respond: those with no strong opinion, the busy, and the indifferent. This is self-selection bias, one of the most insidious threats to representativeness.

Sample size

A large sample does not guarantee representativeness, as the TV-channel example shows. But a small sample cannot be relied on either: with 30 responses, random fluctuation alone can shift a percentage by many points, so the result may not reflect the real picture. An adequate size is a necessary but not a sufficient condition.
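The two halves of this argument can be put into numbers. The sketch below uses the standard normal-approximation margin of error for a proportion, which applies only to a simple random sample; the function name is our own. It then compares that margin with how far the TV-channel poll from the introduction actually missed.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of an approximate 95% confidence interval for a proportion.
    Assumes a simple random sample; it says nothing about bias."""
    return z * math.sqrt(p * (1 - p) / n)

# Random error shrinks as the sample grows (worst case p = 0.5).
for n in (30, 1_000, 100_000):
    print(f"n = {n:>7,}: +/- {margin_of_error(0.5, n):.1%}")

# TV-channel poll: 28% for candidate B from 100,000 votes, election result 54%.
poll_share, true_share, n = 0.28, 0.54, 100_000
sampling_error = margin_of_error(poll_share, n)
bias = true_share - poll_share

print(f"formula margin: {sampling_error:.2%}, actual miss: {bias:.0%}")
```

The formula reports a margin of well under one percentage point for the big poll, yet the real miss is 26 points. Size reduces only the random part of the error.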

How to check representativeness

It is impossible to fully prove representativeness — to do so you would need to know everything about the population, and then the research would not be needed at all. But you can check it against key parameters.

Compare the structure of the sample with known data about the population. If the customer base is 55% women and 45% men, but the responses are 70% and 30%, the sample is biased by gender. If 40% of employees work in regional offices but only 15% of respondents do, the regions are underrepresented.
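A comparison like this is easy to automate. The following is a rough sketch, not a full statistical workflow: it reports the gap per group in percentage points and a chi-square goodness-of-fit statistic. The function name and the response counts are assumptions chosen to match the 70% / 30% example.

```python
def structure_check(population_share, sample_counts):
    """Compare sample structure with known population shares.

    Returns (gaps, chi_square): gaps are sample share minus population share
    in percentage points (positive = overrepresented)."""
    total = sum(sample_counts.values())
    gaps, chi_square = {}, 0.0
    for group, share in population_share.items():
        observed = sample_counts.get(group, 0)
        expected = share * total          # count we would see in a matching sample
        gaps[group] = round(100 * (observed - expected) / total, 1)
        chi_square += (observed - expected) ** 2 / expected
    return gaps, chi_square

population = {"women": 0.55, "men": 0.45}   # customer base, from the text
responses = {"women": 700, "men": 300}      # hypothetical counts giving 70% / 30%

gaps, chi_square = structure_check(population, responses)
print(gaps)
print(f"chi-square = {chi_square:.1f}")
```

With two groups there is one degree of freedom, so a statistic above roughly 3.84 means a gap this large would be unlikely (at the 5% level) if the sample really matched the population.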

Analyze the profile of non-respondents. If you have data on those who were invited, compare respondents and non-respondents. Do they differ by age, tenure, activity, or average order value? If they do, the answers do not represent the whole group.
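One simple way to run this comparison is to compute the response rate per segment of the invitation list. The sketch below assumes each invited person is a record with a segment field and a "responded" flag; the field names and the data are hypothetical.

```python
from collections import defaultdict

def response_rate_by_group(invited, key):
    """invited: list of dicts holding the grouping field `key`
    and a boolean 'responded'. Returns response rate per group."""
    totals = defaultdict(int)
    answered = defaultdict(int)
    for person in invited:
        totals[person[key]] += 1
        answered[person[key]] += person["responded"]   # True counts as 1
    return {group: answered[group] / totals[group] for group in totals}

# Hypothetical invitation list: 50 newer and 50 long-standing customers.
invited = (
    [{"tenure": "under 1 year", "responded": True}] * 10
    + [{"tenure": "under 1 year", "responded": False}] * 40
    + [{"tenure": "over 1 year", "responded": True}] * 30
    + [{"tenure": "over 1 year", "responded": False}] * 20
)

rates = response_rate_by_group(invited, "tenure")
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} responded")
```

If the rates differ this much between segments, the raw results lean toward the segment that answers more readily.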

Run a sensitivity test. Remove the 10% most active respondents (those who answered first) from the data — did the results change? If they did, the data is sensitive to the composition of the sample, which means representativeness is in question.
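The sensitivity test can be sketched in a few lines. Here responses are assumed to be (timestamp, score) pairs, and "most active" is taken to mean "earliest", as in the paragraph above; the data is invented.

```python
from statistics import mean

def early_responder_sensitivity(responses, drop_share=0.10):
    """responses: list of (timestamp, score) pairs.
    Returns (mean of all scores, mean after dropping the earliest drop_share)."""
    ordered = sorted(responses, key=lambda r: r[0])   # earliest first
    n_drop = round(len(ordered) * drop_share)
    all_scores = [score for _, score in ordered]
    return mean(all_scores), mean(all_scores[n_drop:])

# Hypothetical data: the first two people answer 10, the other eighteen answer 7.
responses = [(1, 10), (2, 10)] + [(t, 7) for t in range(3, 21)]

full, trimmed = early_responder_sensitivity(responses)
print(f"all responses: {full}, without earliest 10%: {trimmed}")
```

How big a shift counts as "changed" is a judgment call that depends on the scale and on the decision the survey feeds.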

How to improve representativeness

Use stratified sampling. Divide the population into subgroups and control that each group is represented proportionally. If 30% of customers are from the capital, make sure they make up about 30% of the sample too.
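A minimal sketch of proportional stratified sampling, assuming you hold a list of the whole population with the stratum recorded for each person. The function name, field names and customer list are illustrative.

```python
import random
from collections import Counter

def stratified_sample(population, key, sample_size, seed=0):
    """Draw a proportionally allocated stratified random sample.

    population: list of dicts; key: field that defines the stratum.
    Rounding may make the total differ from sample_size by a few units."""
    rng = random.Random(seed)              # fixed seed for a repeatable draw
    strata = {}
    for person in population:
        strata.setdefault(person[key], []).append(person)

    sample = []
    for members in strata.values():
        quota = round(sample_size * len(members) / len(population))
        sample.extend(rng.sample(members, quota))   # random draw inside the stratum
    return sample

# Hypothetical customer base: 30% in the capital, as in the text.
customers = (
    [{"id": i, "region": "capital"} for i in range(300)]
    + [{"id": i, "region": "other"} for i in range(300, 1000)]
)

sample = stratified_sample(customers, "region", sample_size=100)
counts = Counter(person["region"] for person in sample)
print(counts)
```

This controls who is invited. Who actually answers is a separate question, so the structure of the responses still needs checking afterwards.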

Combine channels. Email for the digital audience, a QR code for offline points of sale, Telegram for younger people, phone calls for those who do not use the internet actively. One channel = one filter. Several channels = more complete coverage.

Fight non-response. Reminders increase the response rate by 15–25%. Short questionnaires are filled out more willingly. Personalized invitations work better than impersonal ones. The more people from the original sample actually respond, the smaller the non-response error.

Use statistical weighting. If the responses overrepresent young women and underrepresent older men, you can assign weights to the answers to compensate for the imbalance: an older man's answer counts for more, a young woman's for less. This is not a perfect solution (it assumes that the people in a group who responded are similar to those in the same group who did not), but it is better than nothing. More on this in the article "Weighted Survey".
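The simplest form of this is post-stratification: each group's weight is its population share divided by its share of the responses. The sketch below uses invented shares, counts and group means purely to show the mechanics.

```python
def poststratification_weights(population_share, sample_counts):
    """Weight per group = population share / sample share."""
    total = sum(sample_counts.values())
    return {
        group: population_share[group] / (count / total)
        for group, count in sample_counts.items()
    }

# Hypothetical figures for illustration.
population = {"young women": 0.25, "older men": 0.25, "others": 0.50}
responses = {"young women": 400, "older men": 100, "others": 500}
mean_score = {"young women": 8.0, "older men": 5.0, "others": 7.0}

weights = poststratification_weights(population, responses)

n_total = sum(responses.values())
unweighted = sum(responses[g] * mean_score[g] for g in responses) / n_total
weighted = (
    sum(weights[g] * responses[g] * mean_score[g] for g in responses)
    / sum(weights[g] * responses[g] for g in responses)
)

print(weights)
print(f"unweighted mean: {unweighted:.2f}, weighted mean: {weighted:.2f}")
```

Note the weight of 2.5 for older men in this example: each of their 100 answers stands in for two and a half people, so a small or unusual subgroup can move the result a lot.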

Add screening and quotas. Screening questions filter out irrelevant respondents, while quotas limit the number of answers from each subgroup: "We already have 100 answers from men aged 18–30, so we stop accepting more from them and wait for other segments".
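The logic behind screening and quotas is small enough to sketch directly. This is an illustrative stand-alone class, not the API of any survey tool; the segment labels and limits are made up.

```python
class QuotaTracker:
    """Accept a respondent only if their segment is targeted and not yet full."""

    def __init__(self, quotas):
        self.quotas = dict(quotas)                    # segment -> max answers
        self.counts = {segment: 0 for segment in quotas}

    def accept(self, segment):
        if segment not in self.quotas:
            return False                              # screened out: not in the target group
        if self.counts[segment] >= self.quotas[segment]:
            return False                              # quota for this segment is full
        self.counts[segment] += 1
        return True

# Hypothetical quotas, kept tiny so the behaviour is easy to follow.
tracker = QuotaTracker({"men 18-30": 2, "women 18-30": 2})

results = [
    tracker.accept("men 18-30"),
    tracker.accept("men 18-30"),
    tracker.accept("men 18-30"),      # third one exceeds the quota
    tracker.accept("women 18-30"),
    tracker.accept("men 60+"),        # not a target segment
]
print(results)
```

Quotas fix the proportions between segments, but within each segment the first people to answer still fill the slots, so self-selection inside a segment remains.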

Representativeness and online surveys

Online surveys by definition cover only people with internet access — and that is not 100% of the population. For consumer marketing research this is usually not critical (the target audience is online anyway). For sociological research it is a significant limitation.

Another feature of online surveys is that self-selection is more pronounced than in telephone or in-person interviews. When an interviewer calls on the phone, it is harder for a person to refuse. When an email with a link arrives, it is easier to ignore. That is why the response rate of online surveys is lower, and the non-response error is potentially higher.

This does not mean that online surveys cannot be representative. They can — provided you control the sample, combine channels and analyze the profile of non-respondents. The SurveyNinja builder helps with this: response filtering by parameters, hidden variables for passing information about the segment, built-in analytics for comparing subgroups.

Common mistakes

"We have a lot of responses, so it's representative". No. 10,000 responses from your Instagram followers is the opinion of Instagram followers, not the "opinion of customers". Volume does not compensate for bias.

Ignoring non-respondents. If 20% of those invited responded, who are the other 80%? If you don't ask this question, you don't know how well your data represents the whole.
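A back-of-the-envelope calculation shows why this matters. If nothing at all is known about the non-respondents, the true share in the whole invited group can lie anywhere between two extremes. In the sketch below the 20% response rate comes from the text, while the 70% "satisfied" figure is a hypothetical survey result.

```python
def nonresponse_bounds(response_rate, share_among_respondents):
    """Worst-case range for a share in the whole invited group
    when nothing is known about those who did not respond."""
    answered_yes = response_rate * share_among_respondents
    lower = answered_yes                          # every non-respondent would say "no"
    upper = answered_yes + (1 - response_rate)    # every non-respondent would say "yes"
    return lower, upper

low, high = nonresponse_bounds(response_rate=0.20, share_among_respondents=0.70)
print(f"true share is somewhere between {low:.0%} and {high:.0%}")
```

The extremes are unrealistic, but the width of the range is the point: it narrows only as you learn more about the people who stayed silent or get more of them to answer.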

Representativeness on a single parameter. A sample can be perfectly balanced by gender — and at the same time radically skewed by age, geography or behavior. Check several parameters at once.

Extrapolation beyond the population. You survey your customers and then draw conclusions about "the market as a whole". But your customers are already a filtered group (those who chose you specifically). Their opinion is not equal to the opinion of "all consumers".

Representativeness is not a formality or an academic requirement. It is the answer to the question "Can this data be trusted?". If the sample is representative, the conclusions hold for the entire audience. If it is not, you risk making a decision based on the opinion of a minority while thinking it is the opinion of the majority. Check representativeness before you start drawing conclusions, not after.
