Duplicate responses
May 31, 2026 Reading time ≈ 8 min
A prize-draw survey offering a gift certificate collects 4,000 responses in three days. You analyze them and notice 47 responses from a single IP address, all with different, random-looking demographics but identical email endings.
One person was trying to boost their chances. Or wanted to spoil the statistics. Either way, that's 47 units of garbage that skew every conclusion. Duplicate responses are one of the most common, and one of the most solvable, data-quality problems in surveys.
Definition
Duplicate responses occur when the same respondent submits several records in a single survey, intentionally or by accident. The causes are technical glitches (the form is resubmitted), incentives (prize draws, rewards), or deliberate manipulation of the results. They are detected by analyzing technical metadata (IP, cookie, device fingerprint), content signals (identical or overly similar answers), and timing patterns. Duplicate detection belongs to the broader class of fraud detection problems.
Where duplicates come from
Accidental duplicates. Technical glitches: a person filled out the survey, clicked "submit," the page froze, they refreshed and submitted again. Or an autosave combined with a manual submission. Such duplicates are rare and are cleaned up automatically with minimal settings.
Motivated duplicates. Surveys with prize draws, gifts, or discounts are the classic source. People go through several times to increase their chances. This shows up in marketing campaigns, contests, and promotions. These duplicates usually come from a single device or IP, but with different answers so that the data "looks real."
Panel fraud. In research conducted through respondent panels, professional "participants" try to take a single survey many times in order to collect more rewards. This is the hardest case: the duplicates are created deliberately from different devices and VPNs.
Malicious distortion. A deliberate effort to ruin the statistics: a competitor, a dissatisfied customer, an organized group. The scale is small, but it can significantly affect small samples.
Detection methods
Deduplication by IP address. The simplest approach: one IP = one response. The upside is that it is easy to set up. The downside is that it also blocks legitimate participants from the same household or corporate network who share an IP.
Browser fingerprinting. A set of device characteristics: screen resolution, installed fonts, time zone, user agent. The combination produces a fingerprint that is usually distinctive, although not guaranteed to be unique. Unlike a cookie, it survives clearing cookies; it can be bypassed by switching browsers or devices, but it catches most "simple" attempts.
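As an illustration, the step of combining the characteristics into one fingerprint can be sketched as a hash over the collected attributes. This is a minimal sketch, not how any particular survey tool does it: the function name device_fingerprint, the attribute names, and the sample values are assumptions, and collecting the attributes in the browser is left out.

```python
import hashlib
import json


def device_fingerprint(attrs: dict) -> str:
    """Hash a dict of device attributes into a stable hex digest."""
    # sort_keys makes the digest independent of the order of attributes.
    canonical = json.dumps(attrs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute values, for illustration only.
first = device_fingerprint({
    "screen": "1920x1080",
    "fonts": ["Arial", "Helvetica"],
    "timezone": "Europe/Berlin",
    "user_agent": "ExampleBrowser/1.0",
})
same_device = device_fingerprint({
    "user_agent": "ExampleBrowser/1.0",
    "timezone": "Europe/Berlin",
    "fonts": ["Arial", "Helvetica"],
    "screen": "1920x1080",
})
other_device = device_fingerprint({
    "screen": "1366x768",
    "fonts": ["Arial"],
    "timezone": "Europe/Berlin",
    "user_agent": "ExampleBrowser/1.0",
})
```

Here first and same_device are equal because the attributes match, while other_device differs. Two different people with identical setups would also get the same digest, which is why a fingerprint is a signal rather than proof.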
Unique token in the link. Each respondent receives a personal link such as /survey?token=abc123. A repeat visit using the same link is blocked. This works for email campaigns with a known contact base.
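A minimal sketch of the token mechanics, assuming the tokens are kept in memory. The class TokenRegistry and its methods are hypothetical names for illustration; a real system would keep tokens in a database alongside the contact list.

```python
import secrets


class TokenRegistry:
    """Issues one-time survey tokens and tracks whether each was used."""

    def __init__(self):
        self._used = {}  # token -> True once the survey was completed

    def issue(self) -> str:
        # token_urlsafe gives a random string that is safe to put in a URL.
        token = secrets.token_urlsafe(16)
        self._used[token] = False
        return token

    def redeem(self, token: str) -> bool:
        """Return True on first use, False for unknown or reused tokens."""
        if self._used.get(token) is not False:
            return False
        self._used[token] = True
        return True


registry = TokenRegistry()
token = registry.issue()
link = f"/survey?token={token}"

first_visit = registry.redeem(token)    # accepted
second_visit = registry.redeem(token)   # blocked: token already used
unknown = registry.redeem("abc123")     # blocked: token was never issued
```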
Cookies. After the first completion, a marker is written to the browser. On a repeat attempt the system reads it and does not let the person through again. Clearing cookies bypasses the protection, but most people do not do that.
Content analysis. Identical or nearly identical sequences of answers from the same source are a sign of a duplicate. Matches in open-ended fields are especially suspicious.
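One way to sketch this check is to compare pairs of responses from the same source. The field names (source, answers, comment), the 0.9 thresholds, and the sample data below are assumptions for illustration, not recommended values.

```python
from difflib import SequenceMatcher
from itertools import combinations


def answer_overlap(a, b):
    """Share of questions on which two responses give the same answer."""
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / max(len(a), len(b), 1)


def text_similarity(a, b):
    """Similarity of two open-ended answers, from 0.0 to 1.0."""
    a, b = a.lower().strip(), b.lower().strip()
    if not a or not b:
        return 0.0  # empty comments carry no signal
    return SequenceMatcher(None, a, b).ratio()


def suspicious_pairs(responses, overlap_min=0.9, text_min=0.9):
    """Return id pairs from the same source that look like copies."""
    pairs = []
    for r1, r2 in combinations(responses, 2):
        if r1["source"] != r2["source"]:
            continue
        if (answer_overlap(r1["answers"], r2["answers"]) >= overlap_min
                or text_similarity(r1["comment"], r2["comment"]) >= text_min):
            pairs.append((r1["id"], r2["id"]))
    return pairs


# Made-up sample data.
responses = [
    {"id": 1, "source": "A", "answers": [5, 4, 3, 5, 2],
     "comment": "Great service, fast delivery"},
    {"id": 2, "source": "A", "answers": [5, 4, 3, 5, 2],
     "comment": "great service, fast delivery "},
    {"id": 3, "source": "A", "answers": [1, 2, 5, 1, 4],
     "comment": "Too expensive for what it is"},
    {"id": 4, "source": "B", "answers": [5, 4, 3, 5, 2],
     "comment": "Great service, fast delivery"},
]
pairs = suspicious_pairs(responses)
```

Only responses 1 and 2 are paired: response 4 has the same answers but comes from a different source, so this check alone does not flag it.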
Timing pattern. Several responses from the same source within a short interval (10-30 seconds between attempts) are almost certainly duplicates.
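The timing check can be sketched as follows: group responses by source, sort them by submission time, and flag any response that arrives within the interval after the previous one. The field names, the function rapid_repeats, and the 30-second default are assumptions taken from the range mentioned above.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def rapid_repeats(responses, max_gap=timedelta(seconds=30)):
    """Return ids of responses submitted shortly after another response
    from the same source."""
    by_source = defaultdict(list)
    for r in responses:
        by_source[r["source"]].append(r)

    flagged = []
    for items in by_source.values():
        items.sort(key=lambda r: r["submitted_at"])
        # Compare each response with the one just before it.
        for prev, cur in zip(items, items[1:]):
            if cur["submitted_at"] - prev["submitted_at"] <= max_gap:
                flagged.append(cur["id"])
    return flagged


# Made-up sample data.
responses = [
    {"id": 1, "source": "A", "submitted_at": datetime(2026, 5, 1, 12, 0, 0)},
    {"id": 2, "source": "A", "submitted_at": datetime(2026, 5, 1, 12, 0, 20)},
    {"id": 3, "source": "A", "submitted_at": datetime(2026, 5, 1, 12, 10, 0)},
    {"id": 4, "source": "B", "submitted_at": datetime(2026, 5, 1, 12, 0, 5)},
]
flagged = rapid_repeats(responses)
```

The first response in each burst is kept and only the follow-ups are flagged, which matches the idea that one of the attempts may be genuine.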
Example: cleaning data in a marketing survey
A company launched a survey with a promo-code draw. It received 3,200 responses in 5 days. Before analysis, they ran a check:
- Repeats by IP: 340 duplicates from 78 unique IPs
- Same device fingerprint, different answers: another 65 cases
- Matching email with different IPs (an attempt to bypass the block): 22 cases
- Suspiciously fast completions (< 45 sec): 180 cases
Assuming the four groups do not overlap, that adds up to 607 suspicious responses, or 19% of the dataset. After cleaning, 2,593 responses remained. The NPS was 34 for the "raw" sample and 41 for the cleaned one. The 7-point difference arises because the cheaters gave predominantly neutral or low ratings in order to "blend in" with ordinary respondents.
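The arithmetic of this example can be checked with a short sketch. The counts come from the list above; the nps helper uses the standard definition (share of scores 9-10 minus share of scores 0-6), and the two score lists are made-up toy data, not the company's results.

```python
# Counts from the example; the groups are assumed not to overlap.
flags = {
    "ip_repeat": 340,
    "same_fingerprint": 65,
    "email_match": 22,
    "too_fast": 180,
}
total = 3200
suspicious = sum(flags.values())      # 607
share = suspicious / total            # about 0.19
remaining = total - suspicious        # 2593


def nps(scores):
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


# Toy data: adding neutral and low ratings pulls the score down.
clean_scores = [10, 9, 9, 8, 7, 6, 10, 9, 5, 10]
raw_scores = clean_scores + [7, 6, 5, 8]
clean_nps = nps(clean_scores)
raw_nps = nps(raw_scores)
```

In real data the same response is often caught by more than one check, so it is safer to collect flagged response ids in a set and count each one once.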
How to prevent duplicates in advance
Prevention is better than cleanup. A few practices:
Do not give an explicit incentive to duplicate. If a survey offers a prize, the mechanics should imply one response per person, not "the more responses, the more chances." A promo code for participating instead of a lottery reduces the motivation to game the system.
Personal links. When sending invitations, give everyone their own token. A repeat visit is blocked automatically.
Combined protection. IP + cookie + fingerprint: three layers that together cover 95%+ of accidental duplicates. Professional fraud still gets through, but its scale in business surveys is usually not critical.
Explicit rules at the start of the survey. Text like "Please complete the survey only once — repeat submissions are not counted" works on conscientious respondents: they will not try to go through a second time.
When duplicates are normal
There are scenarios where multiple responses from one person are acceptable:
- Longitudinal studies. The same person takes the survey once a quarter — these are not duplicates, they are measurement points over time. A unique identifier is needed to link them.
- Pulse surveys. Employees regularly answer short surveys — each wave is separate.
- Repeated interactions with a product. A survey after every order from a returning customer is a correct metric, not a duplicate.
In these cases it is important to distinguish a "duplicate within one wave" (a problem) from "several waves from one person" (normal). The first requires blocking; the second requires a correct identifier for analysis.
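The distinction can be sketched by making the wave part of the deduplication key. The field names respondent_id and wave, and the function name, are assumptions for illustration.

```python
def split_waves_and_duplicates(rows):
    """Keep the first response per (respondent, wave); set aside the rest."""
    seen = set()
    kept, duplicates = [], []
    for row in rows:
        key = (row["respondent_id"], row["wave"])
        if key in seen:
            duplicates.append(row)   # second response within one wave
        else:
            seen.add(key)
            kept.append(row)         # first response in this wave
    return kept, duplicates


# Made-up sample data.
rows = [
    {"respondent_id": "u1", "wave": "2026-Q1", "score": 8},
    {"respondent_id": "u1", "wave": "2026-Q2", "score": 9},
    {"respondent_id": "u1", "wave": "2026-Q2", "score": 3},
    {"respondent_id": "u2", "wave": "2026-Q1", "score": 7},
]
kept, duplicates = split_waves_and_duplicates(rows)
```

Respondent u1 keeps one response per quarter, and only the extra Q2 submission is treated as a duplicate.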
Duplicates in SurveyNinja
In SurveyNinja, limiting repeat responses is configured through completion limits: blocking by IP, cookie, or device. The settings are available in the survey parameters. For users who run into a repeat-completion block, there is a help article explaining the reasons for the block.
Duplicates are one type of problematic response that gets filtered out as part of the overall fraud detection process. A combination of measures — technical (limits, tokens) and analytical (checking patterns before analysis) — provides the most complete protection of data quality.
Duplicate responses are not just "extra rows." They are a systematic distortion of the sample in favor of those who try hardest to take the survey multiple times. Protection against duplicates is built before launch (limits, tokens), and cleanup happens before analysis (checking IP, fingerprint, timing patterns). One IP can be a family, but 47 responses from a single IP are almost certainly fraud.
Frequently asked questions
Do duplicates always need to be blocked?
In most cases — yes. The exceptions: anonymous surveys with a broad audience, where maximizing reach matters and the risk of gaming is low. But even there it is worth keeping basic protection (cookie + timing pattern) to filter out obvious technical duplicates.
Doesn't IP blocking filter out legitimate participants?
Yes, in corporate networks and households this is possible. For such cases, combined protection is used: IP + browser fingerprint. If the device fingerprints differ, the responses are let through even with a matching IP. For exceptionally sensitive surveys, you can disable IP blocking and rely on fingerprint and cookies.
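A minimal sketch of that rule, assuming the fingerprint is already available as a string: a submission is blocked only when both the IP and the fingerprint have been seen together. The function name allow_submission and the sample values are hypothetical.

```python
def allow_submission(seen, ip, fingerprint):
    """Allow a response unless this exact IP + fingerprint pair was seen."""
    key = (ip, fingerprint)
    if key in seen:
        return False
    seen.add(key)
    return True


seen = set()
# Two colleagues behind the same office IP, on different devices.
first = allow_submission(seen, "203.0.113.5", "fp-laptop-a")
colleague = allow_submission(seen, "203.0.113.5", "fp-laptop-b")
# The first device tries again.
repeat = allow_submission(seen, "203.0.113.5", "fp-laptop-a")
```

The trade-off is that someone who switches browsers on the same network gets through, so this layer is usually paired with cookies and timing checks.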
Can duplicates be identified after the data has been collected?
Yes, through post-hoc analysis: checking IP, fingerprint (if it was stored), content signals (identical answer patterns), and timing regularities. Export the data together with the technical metadata and filter out duplicates before analysis.
What should I do if the survey is anonymous but I need protection against duplicates?
Technical methods (cookie, fingerprint, IP) work in anonymous surveys too — they do not reveal identity, they only identify a repeat device. Personal tokens are ruled out — they break anonymity. This level of protection is sufficient for mass surveys without targeted gaming.
How do I know whether the protection is sufficient?
After collecting the data, analyze the distribution of responses by IP: if no IP produces more than 2-3 responses, the protection is working. If you see concentration (dozens of responses from a single source), the protection is letting duplicates through and needs strengthening. Also look at consistency with the expected demographics: a strong skew can be a sign of gaming from specific devices.
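The distribution check can be sketched with a simple counter over the exported IP column. The function name concentrated_ips, the threshold of 3, and the sample addresses are assumptions for illustration.

```python
from collections import Counter


def concentrated_ips(ips, max_per_ip=3):
    """Return IPs that produced more responses than the threshold."""
    counts = Counter(ips)
    return {ip: n for ip, n in counts.items() if n > max_per_ip}


# Made-up export: one source dominates, the others look normal.
ips = ["203.0.113.5"] * 47 + ["198.51.100.7"] * 2 + ["192.0.2.1"]
hotspots = concentrated_ips(ips)
```

An empty result suggests the protection is holding; any entry is a starting point for a closer look, not an automatic verdict, since a shared office or household IP can legitimately exceed the threshold.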
Published: May 31, 2026
Mike Taylor