Baseline study (benchmark measurement)

A year after a website redesign you run a satisfaction survey — NPS 42. Is that good or bad? Up or down?

You don't know, because nobody measured the "before." Every comparison after that becomes speculative: "it kind of felt worse earlier." A baseline study is research that captures the initial state before any intervention. Without a baseline, every later metric lacks a reference point; with one, it becomes a measurable change.

Definition

Baseline study (benchmark measurement) — research conducted to capture the current state of a measured variable before launching an intervention, program, or change. It serves as a reference point for later measurements and lets you quantify the effect of changes. Used in product, marketing, HR, and social research. It is the foundation of any sound "before and after" assessment and of comparative tests.

Why you need a baseline

Without a baseline study you can't tell change from chance or from "how it always was." The three critical functions of a baseline:

Measuring the effect. "Satisfaction of 42" by itself means little. "Satisfaction rose from 35 to 42 after rolling out the new training program" contains a concrete effect of 7 points. Without the first number, the second statement is impossible.

Prioritizing changes. A baseline shows where your problems are right now. Without it, decisions about what to improve are driven by intuition or complaints — which may not reflect the real picture.

Communicating results. "We improved NPS by 8 points" is a measurable result you can show stakeholders. Without a baseline you can only talk about "qualitative improvements," which is far weaker when you need to justify investment.

When to run a baseline

The key rule: the baseline must be completed before the intervention starts. Once changes are launched, it's too late to capture the initial state — it has already blended with the effect of the changes.

Typical scenarios:

Before product development. Take a baseline of satisfaction, NPS, and usability metrics ahead of a major release or redesign. After the release, repeat the measurement with the same methodology on the same or a comparable sample.

Before a marketing campaign. Measure brand awareness, recognition, and brand associations before launch. After the campaign ends, repeat the measurement to assess the effect.

Before an HR program. Take a baseline of engagement, satisfaction, and intent to stay before rolling out a new program (training, benefits, process changes). Afterwards, repeat the measurement and analyze the difference.

Before regular monitoring. The first wave of any recurring survey (quarterly NPS, an annual engagement survey) automatically becomes the baseline for later comparisons.

How to build a baseline study

1. Define the key variables. Which metrics are critical for assessing the effect? The main metric (NPS, CSAT, conversion) plus contextual variables (demographics, segment, channel). Important: the list of metrics must not change between the baseline and the repeat waves, or the comparison is invalid.

2. Lock down the methodology. Scales, question wording, distribution method, sample size, and collection period must all be documented and reproduced in later waves. If the methodology changes between the baseline and the repeat, the comparison is no longer valid.

3. Ensure a representative sample. The baseline must describe the same population you'll work with afterwards. If the baseline is collected only from active users while the repeat measurement also includes new ones, the results aren't comparable.

4. Ensure a sufficient sample size. The baseline sample must be large enough to give a narrow confidence interval. Otherwise later comparisons will be "blurry": a 5-point difference may fall within the margin of error.

5. Document the context. What was happening at the time of the baseline? What were the external conditions, seasonality, and events? This lets you interpret future changes correctly and tell them apart from the noise of external factors.
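The point about sample size in step 4 can be checked with a rough margin-of-error calculation. The sketch below assumes the metric is reported as a mean, that both waves are independent samples, and that a normal approximation is acceptable. The standard deviation of 15, the sample sizes, and the helper names are illustrative assumptions, not values from a real study.

```python
import math
from statistics import NormalDist


def margin_of_error(std_dev: float, n: int, confidence: float = 0.95) -> float:
    """Half-width of the confidence interval for one wave's mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * std_dev / math.sqrt(n)


def diff_margin_of_error(std_dev: float, n_baseline: int, n_repeat: int,
                         confidence: float = 0.95) -> float:
    """Half-width of the interval for the difference between two wave means."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * std_dev * math.sqrt(1 / n_baseline + 1 / n_repeat)


# Illustrative only: how the "blur" shrinks as both waves grow.
for n in (30, 100, 400):
    print(n,
          round(margin_of_error(15, n), 1),
          round(diff_margin_of_error(15, n, n), 1))
```

Under these assumptions, a 5-point difference between two waves of 30 respondents each sits inside the margin of error of the difference, while with 400 respondents per wave it is clearly outside it.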

Example: a baseline for rolling out a new CRM

A company is planning to roll out a new CRM and wants to assess whether it will improve the sales team's productivity. Beforehand, they run a baseline study:

  • A survey of 40 sales managers: current satisfaction with their tools (1-10 scale), time spent on administrative tasks (hours per week), and perceived support from their systems
  • Data from the current CRM: average number of days per deal, number of tasks per manager, percentage of manual entry
  • The department's eNPS

Baseline results:

  • Satisfaction with tools: 5.2/10
  • Time on administrative tasks: 14 hrs/week
  • Average deal time: 18 days
  • Department eNPS: 12

Six months after rolling out the new CRM, they repeat the measurement on the same metrics and sample:

  • Satisfaction: 7.4/10 (+2.2)
  • Time on administrative tasks: 9 hrs/week (−5 hrs)
  • Average deal time: 15 days (−3 days)
  • Department eNPS: 28 (+16)

The quantitative effect is visible. Without a baseline, the claim "the CRM improved the work" would be unprovable — you could only say "managers say it got better." Now there are concrete numbers for the report and for decisions about scaling.
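The before-and-after arithmetic for this example can be written down as a small script. This is a minimal sketch: the metric names and the compare_waves helper are made up for illustration, and the values are the ones listed above.

```python
# Values from the CRM example; the dictionary keys are illustrative names.
baseline = {"satisfaction": 5.2, "admin_hours_per_week": 14,
            "deal_days": 18, "enps": 12}
repeat = {"satisfaction": 7.4, "admin_hours_per_week": 9,
          "deal_days": 15, "enps": 28}


def compare_waves(before: dict, after: dict) -> dict:
    """Return the change per metric, refusing to compare mismatched metric sets."""
    if before.keys() != after.keys():
        raise ValueError("both waves must report exactly the same metrics")
    return {name: round(after[name] - before[name], 2) for name in before}


changes = compare_waves(baseline, repeat)
for name, delta in changes.items():
    print(f"{name}: {delta:+}")
```

The check on the metric names mirrors the rule that the list of metrics must not change between waves. With only 40 respondents, the survey-based differences should still be read against their margin of error.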

Baseline vs benchmarking

These concepts are sometimes confused, but they're different:

  • Baseline — your own initial state at a fixed point in time
  • Benchmarking — comparison against external references: the industry, competitors, best practices

They complement each other. The baseline tells you how far you have moved from your own starting point ("we grew from 42 to 50"). Benchmarking tells you where you stand relative to the market ("the industry benchmark is 58, we're still behind"). Strategic planning needs both.

Common mistakes with baselines

Using "old data" as a baseline. A survey run a year ago with a different methodology is not a baseline. Different wording, different scales, and a different audience make the comparison invalid. It's better to run a proper baseline right now, even if it delays launch by 2-3 weeks.

Changing the questions between the baseline and the repeat measurement. "We tweaked the wording of question 5 a little" sounds harmless, but it can shift the averages by 5-10%. The rule is strict: the key questions used for comparison must stay identical.

Comparing different samples. If the baseline is run on active users and the repeat on the whole base, including inactive ones, the difference in the average may be due not to the change but to a different sample composition. The survey distribution methodology must be the same.
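A short calculation shows how a change in sample composition alone can move the average. All numbers here are hypothetical: active users are assumed to score 8.0 on average and inactive users 5.0 in both waves, so nothing changes within either segment.

```python
def overall_mean(segments: dict) -> float:
    """Weighted mean over {segment: (share_of_sample, segment_mean)}."""
    total_share = sum(share for share, _ in segments.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("segment shares must sum to 1")
    return sum(share * mean for share, mean in segments.values())


# Hypothetical waves: same segment means, different sample mix.
baseline_sample = {"active": (1.0, 8.0)}
repeat_sample = {"active": (0.6, 8.0), "inactive": (0.4, 5.0)}

apparent_change = overall_mean(repeat_sample) - overall_mean(baseline_sample)
print(round(apparent_change, 2))
```

The overall score drops by more than a point even though neither group's opinion moved, which is why the two waves need the same sampling frame.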

Ignoring seasonality. If the baseline is run in December (peak season for e-commerce) and the repeat in February, the change may reflect seasonal dynamics rather than the effect of your work. Compare comparable periods or use control groups.

Baseline and research design

A baseline is part of a broader research design. To assess causal effects, it's advisable to complement the baseline with a control group (one that didn't receive the intervention) — this is the classic pre-post-control design. Without a control group you can't rule out that the change would have happened even without your intervention — because of external factors, the natural evolution of the product, or a shift in the audience.
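One common way to summarize a pre-post-control design is a difference-in-differences estimate: the change in the group that received the intervention minus the change in the control group. The sketch below uses invented numbers and a hypothetical function name. It assumes both groups would have followed the same trend without the intervention.

```python
def difference_in_differences(treated_before: float, treated_after: float,
                              control_before: float, control_after: float) -> float:
    """Change in the treated group net of the change in the control group."""
    treated_change = treated_after - treated_before
    control_change = control_after - control_before
    return treated_change - control_change


# Hypothetical scores: the treated group rose by 7, the control group by 3.
effect = difference_in_differences(
    treated_before=35, treated_after=42,
    control_before=36, control_after=39,
)
print(effect)
```

In this illustration only 4 of the 7 points would be attributed to the intervention; the rest appeared in the control group too.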

For large-scale changes (a redesign, a strategy shift), it's better to complement the baseline with A/B testing in the early stages — this gives a cleaner estimate of the effect than a comparison against a historical point.

Baseline in SurveyNinja

For baseline studies in SurveyNinja you create a survey with a clearly fixed methodology and save it as a template. Later waves are created by copying the template — this guarantees identical questions, scales, and structure. During analysis it's convenient to use hidden variables to tag the wave of the study: "baseline," "wave 1," "wave 2" — this lets you separate the measurements right at export for comparative analysis.
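Once the export carries a wave tag, splitting the measurements is straightforward. The sketch below assumes a CSV export with a "wave" column holding the hidden variable and a numeric "score" column. These column names and the sample rows are assumptions for illustration, not the documented SurveyNinja export format.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Stand-in for an exported file; column names are assumed, not documented.
EXPORT = """wave,score
baseline,6
baseline,7
baseline,5
wave 1,8
wave 1,7
wave 1,9
"""


def mean_score_by_wave(csv_text: str, wave_column: str = "wave",
                       score_column: str = "score") -> dict:
    """Group responses by wave tag and return the mean score for each wave."""
    scores = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        scores[row[wave_column]].append(float(row[score_column]))
    return {wave: mean(values) for wave, values in scores.items()}


by_wave = mean_score_by_wave(EXPORT)
print(by_wave)
```

For a real export, the inline string would be replaced by the file contents, and the column names adjusted to match the actual headers.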

A baseline study is an investment in the ability to measure the effect of your actions. Without it, any claims about improvements stay in the realm of feelings; with it, they become measurable numbers for reports, decisions, and investments. The key conditions for a sound baseline: a clear methodology, a sufficient sample, documented context, and an unchanged methodology in later waves.

Frequently asked questions

Can you run a baseline after launching changes?

No — that's no longer a baseline. Once the intervention is launched, the current state already reflects the effect of that intervention. The only valid baseline is a measurement taken before any changes. If there was no chance to run a proper baseline, you can use historical data as an approximation, but the quality of such a comparison is significantly lower.

What sample size do you need for a baseline?

It depends on the expected effect. If you expect a 5-point improvement with a standard deviation of 15, you need roughly 150-200 respondents in each wave. Small effects (1-2 points) require substantially more. For a precise calculation, start from the minimum detectable effect (MDE) and the required statistical power.
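The calculation can be sketched with the standard formula for comparing two means. It assumes two independent waves of equal size, the same standard deviation in both, a two-sided test, and a normal approximation. The function name and the default significance level of 0.05 are assumptions for this example.

```python
import math
from statistics import NormalDist


def sample_size_per_wave(mde: float, std_dev: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Respondents needed in each wave to detect a difference of `mde`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * std_dev / mde) ** 2)


n_80 = sample_size_per_wave(mde=5, std_dev=15, power=0.80)
n_90 = sample_size_per_wave(mde=5, std_dev=15, power=0.90)
n_small = sample_size_per_wave(mde=2, std_dev=15, power=0.80)
print(n_80, n_90, n_small)
```

For a 5-point effect with a standard deviation of 15, this gives roughly 140 to 190 respondents per wave depending on the power chosen, in line with the range above. A 2-point effect needs several times as many.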

How often should you update the baseline?

A new baseline is needed when the audience or the product changes significantly. Typical cases: a strategy shift, entering new markets, a product redesign, a change of target audience. For regular monitoring, the baseline is set once when tracking starts and is used as the reference point for all later waves.

Do you need a baseline for short campaigns?

Yes, if you want to measure their effect. For a campaign lasting 2-4 weeks, you can run the baseline a week before the start — that's enough to capture the initial state. For long programs (6+ months) the baseline should be more thorough.

What should you do if the baseline results are worse than expected?

This is a normal situation — a baseline often shows the real picture, which diverges from the team's intuitive expectations. It's important to document the results and not "adjust" them after the fact. An honest baseline is the foundation for an honest assessment of the work: it's better to see the true state and move from there than to measure the effect from an inflated starting point.
