Population (general population)

Mike Taylor May 31, 2026 Reading time ≈ 9 min

Picture this: a company’s HR department runs an employee satisfaction survey. The questionnaire is posted in the corporate chat, and 84 people fill it out. The average score is 4.2 out of 5.

Management is pleased: “Employees are happy!” But the company employs 600 people, 200 of whom work in production without regular access to the corporate chat. The 84 who responded are office workers with comfortable conditions.

The opinion of the production teams never made it into the data at all. The conclusion “employees are happy” turned out to be “office employees who don’t mind filling out questionnaires are, on the whole, satisfied.” And that is a completely different story. The mistake happened at the very first stage: no one defined the population—that is, no one asked the question “who exactly do we want to draw conclusions about?”

What a population is

A population is the complete set of all objects (people, organizations, events, transactions) that meet the criteria of a study and about which the researcher wants to draw well-founded conclusions. The population is “the whole pie,” from which a sample bites off a piece for study.

In statistics, the term “population” does not necessarily mean “the population of a country.” A population can be anything: all customers of an online store, all third-year students of an economics faculty, all visits to a website over the past month, all manufactured batches of a product. The researcher sets the boundaries—and everything else depends on how clearly they do so.

Why you should define the population

Because without it you cannot assess data quality. There are three key consequences.

You cannot build a correct sample. A sample is a subset of the population. If the population is not defined, it is unclear exactly whom to select respondents from. “Survey the customers” is not an instruction—it’s a direction. Which customers? Current or former? Over what period? From which region? Without answers to these questions, the sample is formed haphazardly, and the results cannot be extrapolated to anything.

You cannot assess representativeness. To understand whether your sample reflects the real picture, you need a benchmark to compare against—the structure of the population. If you know that 60% of your customers are women and 40% are men, you can check whether this proportion holds up in the responses. If you don’t know it, there is nothing to check against.

You cannot interpret the results correctly. The very same metric means different things depending on the population. “85% of customers recommend us” is excellent if it refers to all customers. And it is practically useless if it refers only to those who left a review on the website (where satisfied customers predominate).
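
The representativeness check described above can be sketched in a few lines of Python. This is only an illustration: the 60/40 split comes from the example in the text, while the response counts and the function name share_gaps are made up.

```python
from collections import Counter

def share_gaps(population_shares, sample_labels):
    """Sample share minus population share per group, in percentage points."""
    counts = Counter(sample_labels)
    total = len(sample_labels)
    return {
        group: round(100 * counts.get(group, 0) / total - 100 * share, 1)
        for group, share in population_shares.items()
    }

# Known structure of the population (from the example in the text).
population_shares = {"women": 0.60, "men": 0.40}

# Hypothetical responses: 150 women and 50 men.
responses = ["women"] * 150 + ["men"] * 50

gaps = share_gaps(population_shares, responses)
print(gaps)  # {'women': 15.0, 'men': -15.0}
```

A gap of 15 points means women are noticeably overrepresented among respondents. How large a gap is acceptable depends on the study, so no threshold is built in here.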

How to define a population

The definition is built on four parameters: who, where, when, and which inclusion and exclusion criteria apply.

Who (the element of the population)

The unit of analysis is the specific object that data is collected about. Most often this is a person (a customer, an employee, a respondent), but it can also be a transaction, an order, a visit, a support request, or a company.

Example. A study of service quality. The element of the population could be: a) customers (each person is counted once), b) requests (one customer may have made five requests—and each request is evaluated separately). The choice of the unit of analysis changes the size of the population, the methodology, and the conclusions.
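
To see how the choice of unit changes the size of the population, here is a small sketch on a hypothetical support log. The customer and request IDs are invented for illustration.

```python
# Hypothetical support log: one row per request, as (customer_id, request_id).
requests = [
    ("c1", "r1"), ("c1", "r2"), ("c1", "r3"), ("c1", "r4"), ("c1", "r5"),
    ("c2", "r6"),
    ("c3", "r7"), ("c3", "r8"),
]

# Unit of analysis = request: every row is an element of the population.
population_of_requests = len(requests)

# Unit of analysis = customer: each person is counted once.
population_of_customers = len({customer_id for customer_id, _ in requests})

print(population_of_requests, population_of_customers)  # 8 3
```

The same data yields a population of 8 or of 3, and in the request-based version customer c1 carries five times the weight of customer c2.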

Where (geographic and organizational boundaries)

One city or the whole country? One branch or all of them? Only the online store or including retail? The broader the geography, the more resources are needed for coverage—and the harder it is to ensure that all subgroups are represented.

When (time boundaries)

Customers over the past month, quarter, year? Employees as of the time of the survey, or including those who have left? The time frame is critical: “all buyers over the past 3 years” and “buyers over the past 30 days” are fundamentally different populations, with different experiences and different expectations.

Inclusion and exclusion criteria

Additional filters that refine the boundaries. Should test accounts be included? Employees on a probationary period? Customers who made only one purchase? Each criterion narrows or widens the population—and must be justified by the goal of the study.

A good definition of a population sounds like this: “All individual customers (private persons) who made at least one purchase in the CompanyX online store during the period from January 1 to March 31, 2026, excluding test accounts and internal employee orders.” A bad one: “Our customers.”
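
One way to test whether a definition is precise enough is to try writing it as a filter. The sketch below encodes the “good” definition above. The record fields (customer_type, is_test_account, and so on) and the helper make_order are assumptions about how order data might be stored, not a real schema.

```python
from datetime import date

PERIOD_START = date(2026, 1, 1)
PERIOD_END = date(2026, 3, 31)

def in_population(order):
    """Apply the who / when / exclusion criteria to one order record."""
    return (
        order["customer_type"] == "individual"            # who: private persons
        and PERIOD_START <= order["date"] <= PERIOD_END   # when: Jan 1 to Mar 31, 2026
        and not order["is_test_account"]                  # exclusion criterion
        and not order["is_employee_order"]                # exclusion criterion
    )

def make_order(customer_id, customer_type, order_date, test=False, employee=False):
    """Hypothetical helper that builds one order record."""
    return {
        "customer_id": customer_id,
        "customer_type": customer_type,
        "date": order_date,
        "is_test_account": test,
        "is_employee_order": employee,
    }

orders = [
    make_order("c1", "individual", date(2026, 2, 10)),                 # qualifies
    make_order("c1", "individual", date(2026, 3, 5)),                  # same customer again
    make_order("c2", "business", date(2026, 1, 20)),                   # not a private person
    make_order("c3", "individual", date(2026, 4, 2)),                  # outside the period
    make_order("c4", "individual", date(2026, 1, 15), test=True),      # test account
    make_order("c5", "individual", date(2026, 3, 31), employee=True),  # employee order
]

# The element of the population is the customer, so deduplicate by ID.
population = {o["customer_id"] for o in orders if in_population(o)}
print(population)  # {'c1'}
```

“Our customers” cannot be turned into such a filter without first answering exactly the questions the good definition already answers.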

Population and sample

These two concepts work in tandem. The population is “who we want to learn about.” The sample is “who we actually ask.” The ideal scenario is a census, in which every element of the population is surveyed. But this is practical only when the population is small: 50 employees, 200 conference attendees, 30 corporate clients.

When the population is large (thousands, tens of thousands, millions), a sample study is conducted. From the entire population, a subset—a sample—is selected according to certain rules. The results obtained from the sample are extended to the population with a certain margin of error. For more on how to build a sample and calculate its size, see the article “Sample.”

The main rule: the sample must resemble the population in its key characteristics. If 30% of customers are from one city, 25% from another, and 45% from the regions, the proportions in the sample should be roughly the same. Otherwise the data will be skewed toward the overrepresented group.
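
As an illustration of this rule, the sketch below splits a planned sample across groups in proportion to their share of the population. The 30/25/45 split is taken from the example above. The group names, the sample size of 400, and the function name are assumptions, and the shares are expected to be whole percentages that add up to 100.

```python
def proportional_allocation(percent_shares, sample_size):
    """Split sample_size across groups in proportion to their population shares."""
    # Integer arithmetic: (whole respondents, remainder) for each group.
    exact = {g: divmod(p * sample_size, 100) for g, p in percent_shares.items()}
    quotas = {g: whole for g, (whole, _) in exact.items()}

    # Largest-remainder rounding so the quotas add up to sample_size exactly.
    leftover = sample_size - sum(quotas.values())
    by_remainder = sorted(exact, key=lambda g: exact[g][1], reverse=True)
    for g in by_remainder[:leftover]:
        quotas[g] += 1
    return quotas

percent_shares = {"city_a": 30, "city_b": 25, "regions": 45}
quotas = proportional_allocation(percent_shares, 400)
print(quotas)  # {'city_a': 120, 'city_b': 100, 'regions': 180}
```

Comparing the actual number of responses in each group with these quotas shows which group is over- or underrepresented.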

Common mistakes when defining a population

The population is not defined at all. The most common mistake. The team “just launches a survey”—the link is spread across channels, responses are collected, conclusions are drawn. But conclusions about whom? If the population is not specified, the results describe only those who happened to take the survey—not any meaningful group.

The population is too broad. “All consumers in the country” sounds impressive, but surveying a representative sample of tens of millions of people is a task for a federal statistical agency, not for a marketing department. The broader the population, the more resources are needed to ensure representativeness. It is often enough to narrow it down: “women aged 25–45 in major cities who buy cosmetics online.”

The population does not match the goal. A company wants to understand why customers leave—and surveys current customers. But the population for that question is the customers who have left. Current ones can guess at the reasons, but the real answer is known only to those who have already made the decision to leave.

Substituting the available audience for the population. The survey is placed on the website, so only site visitors see it. But if the population is all customers, including those who buy offline and never visit the site, the data is inherently incomplete. The method of distributing the survey must match the structure of the population, not the other way around.

Ignoring “invisible” subgroups. In every population there are groups that are hard to reach: former customers (who deleted the app), dissatisfied employees (who won’t open a corporate survey), older users (who don’t use email). If these groups are systematically dropped, the conclusions will be more optimistic than reality. Use screening questions and various distribution channels to reach as many subgroups as possible.

Finite and infinite populations

From a statistical standpoint, populations are divided into two types.

Finite population. The number of elements can be counted: 5,000 employees, 12,340 subscribers, 87 corporate clients. For finite populations, a finite population correction is applied when calculating the sample size, which reduces the required number of respondents. The smaller the population, the larger the share of it that must be surveyed.

Infinite (effectively infinite) population. The number of elements is so large that it can be treated as infinite: all potential buyers, all future site visitors, all possible transactions. In such cases, the size of the population has practically no effect on the sample calculation, and the formula simplifies. This is precisely why a survey of 10 thousand customers and a survey of 10 million require roughly the same sample (~385 people at a 95% confidence level with a ±5% margin of error), which is discussed in detail in the article on the confidence interval.
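
The difference between the two cases can be sketched with the common textbook sample-size formula for a proportion. The parameters are assumptions, not something this article prescribes: a 95% confidence level (z = 1.96), a ±5% margin of error, and maximum variability (p = 0.5). The function name sample_size is likewise illustrative.

```python
import math

def sample_size(population_size=None, z=1.96, margin=0.05, p=0.5):
    """Required sample size for estimating a proportion.

    population_size=None treats the population as effectively infinite.
    """
    # Base formula for an infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    if population_size is None:
        return math.ceil(n0)
    # Finite population correction.
    n = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n)

print(sample_size())            # 385  (effectively infinite population)
print(sample_size(10_000_000))  # 385
print(sample_size(10_000))      # 370
print(sample_size(600))         # 235
print(sample_size(87))          # 72
```

Under these assumptions the correction barely matters at 10 thousand elements and above, but for 87 corporate clients the required sample is 72, or more than 80% of the population.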

Practical recommendations

Formulate the population before creating the questionnaire. This is the first step of a research design. Until the population is defined, it is unclear whom to survey, through which channel, and what questions to ask. Drafting a questionnaire without a defined population is like writing ad copy without understanding the audience.

Write the definition down. A formalized definition imposes discipline and protects against scope creep. If a month later a colleague asks “why didn’t we include former customers?”—you’ll have a documented answer.

Match the population against the available data. Defining a population is one thing; having access to a list of its elements is another. If your population is “all mobile app users,” but you have no contact details for 40% of them (they registered without an email), you need to either adjust the population or find an alternative access channel: push notifications, an in-app survey.

Use several channels for coverage. If the population is heterogeneous, a single channel is not enough. Email will reach those who left their address. A QR code reaches the offline audience. A pop-up form on the site reaches those who are currently active. The more channels, the closer the actual sample is to the population.

Document the discrepancies. In reality, it is rarely possible to cover the entire population perfectly. That is normal—but the discrepancies need to be recorded. “The population is all customers for Q1 (N = 8,200). The survey was distributed by email. Only customers with a listed email were reached (N = 6,100, 74%). Not reached: customers without an email, predominantly offline buyers”—a note like this makes it possible to interpret the results correctly and to honestly state the limitations.
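
The figures in a note like this are easy to generate straight from the counts. A minimal sketch, using the numbers from the example above and a hypothetical helper name:

```python
def coverage_note(population_size, reached):
    """Summarize how much of the population the distribution channel reached."""
    share = reached / population_size
    return (
        f"Population N = {population_size:,}; reached N = {reached:,} "
        f"({share:.0%}); not reached N = {population_size - reached:,}"
    )

note = coverage_note(8200, 6100)
print(note)  # Population N = 8,200; reached N = 6,100 (74%); not reached N = 2,100
```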

A population is the answer to the question “Who do we want to draw conclusions about?” Without a clear answer, any study turns into collecting data about no one and about nothing. Spend 10 minutes defining the population before you start working—and you’ll save weeks of trying to explain what the collected data means.

Published: May 31, 2026
