Selection bias

A survey showed that 80% of customers are satisfied with the service. But if the survey was only sent to active customers who recently made a purchase, the result may not reflect the opinion of all customers — dissatisfied ones may have left and never received the survey. This is selection bias: the sample is not representative because the wrong people, or the wrong proportions of people, ended up in it compared with the general population. Selection bias is one of the main sources of bias in research.

Selection bias arises at the sampling stage, before respondents start answering questions. It differs from response bias, which is related to the distortion of the answers themselves. Both types of bias are dangerous, but they require different methods of mitigation.

What selection bias means in plain terms

Selection bias is a systematic deviation of the sample from the general population due to errors in the process of selecting respondents. It occurs when some groups of people have a greater or smaller chance of ending up in the sample than others, leading to overrepresentation or underrepresentation of certain groups. In surveys, selection bias can arise from non-random selection of respondents, a low response rate, self-selection of participants, or other factors that make the sample unrepresentative.

Put simply: selection bias is when you survey the wrong people, or not in the proportions you need. If the general population is 50% men and 50% women, but your sample is 80% men and 20% women, that is selection bias. The results will be skewed toward the opinion of men.
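The arithmetic behind this example can be sketched in a few lines of Python. The 50/50 and 80/20 shares come from the example above; the satisfaction rates per group are hypothetical numbers chosen only for illustration.

```python
# Hypothetical satisfaction rates per group (illustrative, not real survey data).
satisfaction = {"men": 0.70, "women": 0.40}

# Shares from the example: the population is 50/50, the sample is 80/20.
population_share = {"men": 0.50, "women": 0.50}
sample_share = {"men": 0.80, "women": 0.20}


def overall_rate(shares, rates):
    """Average of the group rates, weighted by each group's share."""
    return sum(shares[group] * rates[group] for group in shares)


true_rate = overall_rate(population_share, satisfaction)
sample_rate = overall_rate(sample_share, satisfaction)
bias = sample_rate - true_rate

print(f"population: {true_rate:.2f}, sample: {sample_rate:.2f}, bias: {bias:+.2f}")
```

Under these assumed rates the skewed sample reports 64% satisfaction while the population value is 55%, an overstatement of 9 percentage points caused only by the composition of the sample.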

How selection bias arises

Non-probability sampling. If respondents are selected non-randomly (for example, only those who agreed to participate, or only those available at a certain time), the sample may be unrepresentative. Probability sampling minimizes selection bias but does not guarantee its absence.

Low response rate. If only 20% of those invited respond to the survey, there is a risk of selection bias: those who responded may differ systematically from those who did not. This is related to non-response bias.

Self-selection of participants. If participation in the survey is voluntary and respondents decide for themselves whether to take part, self-selection occurs: the survey attracts those who are more motivated, interested, or have certain characteristics. For example, in satisfaction surveys, either very satisfied or very dissatisfied customers are more likely to respond.

Limitations of the distribution method. Different distribution methods reach different groups. Online surveys may overrepresent younger and tech-savvy people, phone surveys may overrepresent older people, and social media surveys may overrepresent active users of those platforms.

Time limitations. If the survey is available only at a certain time (for example, only during working hours), it may miss people with a different work schedule or from other time zones.

Language barriers. If the survey is available in only one language, it excludes speakers of other languages, which can create selection bias in multilingual populations.

Types of selection bias

Coverage bias. Some groups of people have no chance of ending up in the sample at all because of limitations in the distribution method. For example, online surveys do not reach people without internet access, and phone surveys do not reach people without a phone.

Non-response bias. Those who did not respond to the survey differ systematically from those who did. For example, dissatisfied customers may more often ignore satisfaction surveys, which inflates the average scores.

Self-selection bias. Participation in the survey is voluntary, and respondents decide for themselves whether to take part. The survey attracts those who are more motivated or have certain characteristics, which makes the sample unrepresentative.

Survivorship bias. Analyzing only the "successful" cases and ignoring those who dropped out of the process. In surveys, this can mean analyzing only completed surveys without accounting for those who started but did not finish.

Time bias. The survey is conducted at a particular time that may be unrepresentative. For example, a survey on a weekday may exclude people who are working at that time, or a survey in a particular season may not reflect opinions during other periods.

When selection bias is especially dangerous

Small samples. With a small number of respondents, even minor selection bias can strongly distort the results. But it is important to remember: increasing the sample size does not solve the problem of selection bias if the bias is systematic.
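A small simulation can illustrate why a larger sample does not remove systematic bias. This is a sketch under invented assumptions: the true satisfaction rate (60%) and the response probabilities for satisfied and dissatisfied customers (50% and 20%) are hypothetical, and `simulate_survey` is an illustrative name, not part of any survey tool.

```python
import random


def simulate_survey(n_invited, true_rate=0.60, p_respond_satisfied=0.50,
                    p_respond_dissatisfied=0.20, seed=0):
    """Return the share of satisfied customers among those who responded."""
    rng = random.Random(seed)
    responses = []
    for _ in range(n_invited):
        satisfied = rng.random() < true_rate
        # Assumption: satisfied customers are more likely to answer.
        p_respond = p_respond_satisfied if satisfied else p_respond_dissatisfied
        if rng.random() < p_respond:
            responses.append(satisfied)
    return sum(responses) / len(responses) if responses else float("nan")


for n in (100, 1_000, 100_000):
    print(n, round(simulate_survey(n), 3))
```

With these assumptions the estimate settles near 0.30 / 0.38, or about 0.79, rather than the true 0.60. Inviting more people makes the estimate more stable, but it stabilizes around the wrong value.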

Heterogeneous general population. If the general population varies greatly in important characteristics (age, income, region), selection bias can lead to overrepresentation of some groups and underrepresentation of others.

Low response rate. If fewer than 30-40% of those invited respond to the survey, the risk of selection bias is high. It is important to analyze who did not respond and how they might differ from those who did.

Voluntary participation. If participation in the survey is completely voluntary and there are no incentives or reminders, self-selection occurs: only the most motivated respondents take part.

Examples of selection bias

Online customer survey. The survey is sent by email only to active customers who recently made a purchase. Inactive customers or those who switched to competitors do not receive the survey. The result: an inflated satisfaction score, because dissatisfied customers are underrepresented.

Employee survey during working hours. The survey is conducted only among those who are in the office at a certain time. Remote employees, employees on business trips, or those with a different work schedule do not end up in the sample. The result: the opinion of only part of the staff, which may not reflect the overall picture.

Social media survey. The survey is posted only on one social network. Users of other platforms or those who do not use social media do not end up in the sample. The result: overrepresentation of that network's users and their characteristics.

Survey in only one language. In a multi-ethnic country, the survey is available only in one language. Speakers of other languages are excluded from the sample. The result: underrepresentation of certain ethnic or linguistic groups.

Analyzing only completed surveys. Only those surveys that respondents fully completed are analyzed. Those who started but abandoned it midway are excluded. The result: overrepresentation of motivated respondents and possible survivorship bias.
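One way to check for this effect is to compare an early question among respondents who finished with the same question among everyone who started. The sketch below uses made-up records and hypothetical field names (`q1`, `completed`).

```python
from statistics import mean

# Hypothetical records: score on the first question (1-5) and whether
# the respondent reached the end of the survey.
records = [
    {"q1": 5, "completed": True},
    {"q1": 4, "completed": True},
    {"q1": 5, "completed": True},
    {"q1": 4, "completed": True},
    {"q1": 2, "completed": False},
    {"q1": 3, "completed": False},
    {"q1": 1, "completed": False},
]

# Average of the first question among completed surveys only.
completed_only = mean(r["q1"] for r in records if r["completed"])
# Average of the same question among everyone who started.
all_starters = mean(r["q1"] for r in records)
# Share of starters who finished the survey.
completion_rate = sum(r["completed"] for r in records) / len(records)

print(f"completed only: {completed_only:.2f}")
print(f"all starters:   {all_starters:.2f}")
print(f"completion rate: {completion_rate:.0%}")
```

A large gap between the two averages suggests that those who dropped out differ from those who finished, so results based only on completed surveys should be read with caution.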

How to minimize selection bias

Probability sampling. Use probability sampling methods (simple random sampling, systematic sampling, stratified sampling), where each element of the general population has a known probability of ending up in the sample. This minimizes selection bias but does not guarantee its absence.
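As a minimal sketch, simple random sampling and systematic sampling can be drawn from a sampling frame with Python's standard library. The frame of customer IDs and the function names are hypothetical; a real frame would come from your customer list.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units; every unit has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


def systematic_sample(frame, n, seed=None):
    """Take every k-th unit after a random start, with k = len(frame) // n."""
    if n <= 0 or n > len(frame):
        raise ValueError("n must be between 1 and len(frame)")
    rng = random.Random(seed)
    k = len(frame) // n
    start = rng.randrange(k)
    return [frame[start + i * k] for i in range(n)]


# Hypothetical sampling frame: customer IDs 1..1000.
frame = list(range(1, 1001))
srs = simple_random_sample(frame, 50, seed=42)
sys_sample = systematic_sample(frame, 50, seed=42)
print(len(srs), len(sys_sample))
```

Systematic sampling assumes the frame is not ordered in a way that repeats with the sampling interval; if it is, shuffling the frame first or using simple random sampling is the safer choice.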

Stratified sampling. If the general population is heterogeneous, use stratified sampling: divide the population into groups (strata) by important characteristics and select respondents from each stratum in proportion to its share of the population.
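Proportional allocation across strata can be sketched as follows. The regions, the frame, and the function name `stratified_sample` are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(frame, stratum_of, n, seed=None):
    """Sample from each stratum in proportion to its share of the frame."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        # Rounding can make the total differ slightly from n.
        quota = round(n * len(members) / len(frame))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample


# Hypothetical frame: (customer_id, region), 60% north, 30% south, 10% west.
frame = ([(i, "north") for i in range(600)]
         + [(i, "south") for i in range(600, 900)]
         + [(i, "west") for i in range(900, 1000)])

sample = stratified_sample(frame, stratum_of=lambda unit: unit[1], n=100, seed=1)
region_counts = Counter(region for _, region in sample)
print(region_counts)
```

Here the sample of 100 contains 60, 30, and 10 units from the three regions, matching their shares of the frame.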

Multiple distribution channels. Use different ways of distributing the survey (email, SMS, social media, website) to reach different groups of respondents and minimize coverage bias.

Increasing the response rate. Use reminders, participation incentives, short surveys, and a convenient interface to raise the response rate and reduce non-response bias. A higher response rate generally lowers the risk of selection bias, but it does not remove it if the remaining non-respondents still differ systematically from those who responded.

Analyzing non-respondents. Track who did not respond to the survey and, where possible, collect basic information about non-respondents (demographic data, customer status) to assess selection bias and adjust the results if necessary.
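A simple comparison of respondents with the full invitation list could look like the sketch below. The segments ("active", "inactive"), the counts, and the function name `response_profile` are hypothetical.

```python
from collections import Counter


def response_profile(invited, responded_ids):
    """Compare each segment's share of the frame with its share of respondents.

    invited: list of (respondent_id, segment)
    responded_ids: set of ids that answered the survey
    """
    frame_counts = Counter(segment for _, segment in invited)
    resp_counts = Counter(segment for rid, segment in invited if rid in responded_ids)
    n_frame = len(invited)
    n_resp = sum(resp_counts.values())
    profile = {}
    for segment, count in frame_counts.items():
        answered = resp_counts[segment]
        profile[segment] = {
            "frame_share": count / n_frame,
            "respondent_share": answered / n_resp if n_resp else 0.0,
            "response_rate": answered / count,
        }
    return profile


# Hypothetical data: 700 active and 300 inactive customers were invited;
# 280 active and 30 inactive customers responded.
invited = ([(i, "active") for i in range(700)]
           + [(i, "inactive") for i in range(700, 1000)])
responded_ids = set(range(280)) | set(range(700, 730))

profile = response_profile(invited, responded_ids)
for segment, stats in profile.items():
    print(segment, {name: round(value, 3) for name, value in stats.items()})
```

In this made-up case active customers are 70% of the invitation list but about 90% of respondents, which signals that inactive customers are underrepresented in the results.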

Data weighting. If the sample is unrepresentative, you can use weighting: assign respondents weights inversely proportional to their probability of being selected into the sample, in order to correct for selection bias.
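One common form of weighting, sketched here as an assumption rather than a prescribed method, is post-stratification: each group's weight is its population share divided by its sample share, which is proportional to the inverse of that group's rate of ending up in the sample. The data reuse the 80/20 sample from the earlier example, with hypothetical answers (1 = satisfied, 0 = not satisfied).

```python
from collections import Counter


def poststratification_weights(sample_groups, population_share):
    """Weight per group = population share / sample share."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return {g: population_share[g] / (counts[g] / n) for g in counts}


def weighted_mean(values, groups, weights):
    """Mean of values where each respondent counts with their group's weight."""
    total_weight = sum(weights[g] for g in groups)
    return sum(v * weights[g] for v, g in zip(values, groups)) / total_weight


# Hypothetical sample: 80 men (56 satisfied) and 20 women (8 satisfied).
groups = ["men"] * 80 + ["women"] * 20
values = [1] * 56 + [0] * 24 + [1] * 8 + [0] * 12

weights = poststratification_weights(groups, {"men": 0.5, "women": 0.5})
unweighted = sum(values) / len(values)
weighted = weighted_mean(values, groups, weights)
print(weights, round(unweighted, 2), round(weighted, 2))
```

The unweighted estimate is 0.64 and the weighted one is 0.55. Weighting corrects only for the characteristics used to build the weights; if respondents differ from non-respondents within a group, that bias remains.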

Control groups. In experimental research, form the control and experimental groups by the same method, ideally random assignment, so that differences between the groups cannot be attributed to selection bias.

Explicitly stating limitations. If the sample is non-random or has limitations, state this explicitly in the methodology and describe the possible selection bias. This helps readers interpret the results correctly.

Relationship with representativeness

Selection bias undermines the representativeness of the sample, that is, its ability to reflect the characteristics of the general population. A sample design aimed at representativeness reduces selection bias but does not guarantee its absence: even a sample that is representative at the design stage can be distorted by non-response bias or other types of bias.

It is important to distinguish representativeness by demographic characteristics (age, gender, region) from representativeness by characteristics relevant to the research (satisfaction, behavior, opinions). A sample may be representative by demographics but unrepresentative by opinions if there is self-selection bias or non-response bias.

Common mistakes

Ignoring selection bias. Assuming that a large sample or statistical significance guarantees the reliability of the results, without accounting for possible selection bias. This can lead to incorrect conclusions.

Believing that online surveys are always unrepresentative. Online surveys can be representative if the general population consists of internet users, or if methods to compensate for coverage bias are used (multiple channels, weighting).

Confusing selection bias and response bias. Selection bias arises at the sampling stage, while response bias arises at the answer-collection stage. It is important to distinguish them and to apply different methods of mitigation.

Not analyzing non-respondents. Ignoring those who did not respond to the survey and not analyzing how they might differ from those who did. This can hide selection bias.

How this works in SurveyNinja

In SurveyNinja you can use different ways of distributing surveys (email, link, QR code, embedding on a website), which helps minimize coverage bias. You can set up reminders for non-respondents to raise the response rate and reduce non-response bias. When analyzing the results, it is important to take into account the distribution method and possible selection bias: if the survey was distributed only by email to active customers, the results may not reflect the opinion of all customers. In the report, it is worth noting the distribution method and the possible limitations of the sample.

Practical recommendations

Always account for selection bias when planning. At the survey design stage, think through how the sample will be formed, which groups may be overrepresented or underrepresented, and take measures to minimize the bias: probability sampling, multiple channels, raising the response rate.

Analyze non-respondents. Track who did not respond to the survey and, where possible, collect basic information about non-respondents to assess selection bias and adjust the results if necessary.

Use multiple channels. Distribute the survey through different channels (email, SMS, social media, website) to reach different groups of respondents and minimize coverage bias.

Raise the response rate. Use reminders, incentives, short surveys, and a convenient interface to raise the response rate and reduce non-response bias.

State limitations in the report. In the methodology, explicitly state the sampling method, the response rate, the possible selection bias, and the measures taken to minimize it. This increases transparency and helps readers interpret the results correctly.

What to write in the report. In the methodology section, state: "The sample was formed using [simple random sampling / stratified sampling / convenience sampling]. The survey was distributed via [email / SMS / social media]. The response rate was [X]%. Possible limitations: coverage bias due to online-only distribution (reduced by using several online channels) and non-response bias (assessed as low based on a comparison of respondents and non-respondents)."

In short, selection bias arises at the sampling stage, before any answers are collected, and can distort the results of the research. Minimizing it requires attention to the selection methods, the ways surveys are distributed, the response rate, and the analysis of non-respondents. These steps do not guarantee a representative sample, but without them reliable results are unlikely.
