Correlation analysis

Imagine the following situation: you have run a large customer survey. It contains dozens of questions: satisfaction with the service, a rating of how fast support responds, the convenience of the interface, the likelihood of recommending you, the frequency of purchases. The report has plenty of nice charts for each question on its own. But what you cannot see is which metrics move together. Does improving support help NPS grow? Is the frequency of purchases related to how people rate the convenience of the interface? Or are these completely separate stories?

To answer these questions, looking at individual metrics is not enough: you need to analyze the relationships between them. Correlation analysis quantifies how strongly, and in which direction, respondents' answers to different questions are related to one another.

What correlation analysis is in plain language

Correlation analysis is a set of statistical methods that let you assess how strongly the changes in two (or more) variables are related: whether they grow together, whether one decreases as the other rises, or whether there is almost no relationship between them.

Put simply, correlation answers the question: "When metric A is higher, what usually happens to metric B?" It is important to remember that correlation does not prove a cause-and-effect relationship: it only describes how consistently the two metrics "move" together.

Strength and direction of the relationship

Direction. A relationship can be positive (the higher one metric, the higher the other), negative (the higher one, the lower the other) or absent (changes in one are almost unrelated to changes in the other).

Strength. It is usually expressed as a coefficient from -1 to +1. Values close to +1 mean a strong positive relationship, values close to -1 a strong negative one, and values around 0 a weak or non-existent linear relationship (a curved, non-linear dependence can still produce a coefficient near 0). For quantitative studies in general and surveys in particular, this way of describing a relationship is examined in detail in the term Quantitative Research.

Example. If the relationship between the "satisfaction with support" rating and "willingness to recommend" is close to +0.7, it means that respondents who give high scores to support tend to rate the likelihood of recommending higher as well. But this still does not prove that support is precisely what "causes" the willingness to recommend — both metrics may be influenced by other factors too.
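To make the coefficient itself less abstract, here is a minimal sketch of how the Pearson coefficient can be computed by hand. The ratings and the variable names `support` and `recommend` are invented for illustration and do not come from a real survey.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # How the two variables move together around their means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # How much each variable varies around its own mean.
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical answers from eight respondents on 0-10 scales.
support = [9, 7, 8, 3, 10, 5, 6, 2]
recommend = [8, 9, 6, 5, 9, 3, 7, 4]

r = pearson(support, recommend)
print(f"r = {r:.2f}")
```

For these made-up numbers the coefficient comes out at roughly 0.77: respondents with high support scores tend to give high recommendation scores, but the match is far from perfect.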

Types of correlation coefficients and answer scales

Surveys most often use scale questions: ratings from 1 to 5, from 0 to 10, Likert scales from "strongly disagree" to "strongly agree" and so on. Formally such data are not always "pure" numeric values, but in practice they are often treated as interval data. The term Likert Scale examines in detail what these scales are and how to work with them.

Pearson coefficient. It measures the strength of a linear relationship and is suitable when the variables can be treated as quantitative (for example, ratings from 0 to 10) and their distributions are approximately normal. It is sensitive to outliers and non-linearity: a few "anomalous" observations can strongly distort the magnitude of the correlation.

Spearman coefficient. It works with the ranks of the values and is less sensitive to outliers and to the shape of the distribution. It is often applied to agreement scales, ordinal ratings and data where the relationship is most likely monotonic but not necessarily linear.

Correlation between dichotomous and scale variables. When one variable is binary (for example, "purchased / did not purchase", "recommends / does not recommend") and the other is a scale variable, special variants of the coefficients are used, most commonly the point-biserial coefficient, which is numerically the Pearson coefficient computed with the binary variable coded as 0 and 1. In most analytical packages they are implemented "under the hood", but it is important to understand the limits of interpretation: a strong relationship between a binary and a scale metric does not always mean that the scale explains the behavior well.

Where correlation is useful in surveys

Finding drivers of satisfaction and loyalty. By comparing the answers to questions about individual aspects of the experience (speed, quality, convenience, price) with summary metrics such as NPS, CSI or an overall rating, you can understand which factors are most strongly related to customer loyalty. This is the first step toward building a "driver tree" and prioritizing improvements.

Analyzing internal surveys and engagement. In HR research it is useful to look at how the ratings of different aspects of work are related: interest in the tasks, the quality of management, a sense of fairness, the willingness to recommend the company as a place to work. Here it is especially helpful to test hypotheses about the drivers of behavior: correlation analysis lets you move from a long list of metrics to a focused set of assumptions.

Diagnosing "strange" results. Sometimes the aggregated statistics look fine, but unexpected relationships emerge between individual questions: for example, high satisfaction but low willingness to recommend, or vice versa. Correlation analysis helps you notice such inconsistencies and dig deeper with the qualitative research described in the term Qualitative Analysis.

How to visualize correlations

Correlation matrix. The most illustrative approach is to build a matrix where the survey metrics are laid out along the rows and columns and the cells contain the relationship coefficients. Such a "heat map" quickly shows which pairs of metrics are especially closely related and where there is almost no relationship.

Scatter plots. For individual pairs of metrics it is useful to look not only at the number but also at the cloud of points: it may turn out that a high correlation arises because of a few clusters or outliers. Visual analysis helps you notice such features in time and avoid overestimating the stability of the relationship.

Separate charts by segment. If you build scatter plots and matrices for different groups of customers (new / existing, regions, plan types), you immediately see where the relationship "holds" across all segments and where it is characteristic only of certain sub-samples. This is an important step before drawing general conclusions about the entire audience.

How to run a correlation analysis on survey data

Collecting and preparing the data. To begin with, the answers need to be in a convenient tabular form: rows are respondents, columns are numeric metrics (scale ratings, indices, numbers of purchases and so on). In SurveyNinja such tables can be obtained through export and then analyzed in Excel, Python, R or BI systems.
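The preparation step might look like the sketch below. The CSV content, the column names and the choice to drop incomplete respondents are assumptions made for illustration; the layout of a real export will differ.

```python
import csv
import io

# Hypothetical export: real files will have different columns and layout.
raw = """respondent_id,support,interface,nps
1,9,6,8
2,7,,9
3,8,7,6
4,3,4,5
"""

columns = ["support", "interface", "nps"]
table = []
for row in csv.DictReader(io.StringIO(raw)):
    # Skip respondents who left any of the selected questions unanswered.
    if any(row[c] == "" for c in columns):
        continue
    # Convert text answers to numbers so they can be used in calculations.
    table.append({c: float(row[c]) for c in columns})

print(f"{len(table)} complete respondents")
```

Dropping every incomplete respondent is the simplest option, but it can shrink the sample quickly when many questions are optional. An alternative is to drop missing answers separately for each pair of metrics.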

Choosing the metrics. Do not throw every column into the analysis indiscriminately; select meaningful pairs and groups instead. For example, compare satisfaction ratings for different aspects with the summary indices, rather than mixing them with demographics or technical parameters.

Splitting by segment. Relationships may differ across groups: between new and existing customers, between regions, between users of different plans. Accordingly, it is useful to look at correlations not only for the sample as a whole, but also for the key segments discussed in the term Market Segmentation.

Common mistakes when interpreting correlations

"Correlation means causation". A classic mistake: if two metrics are related, many people automatically conclude that one "causes" the other. In reality the relationship may be explained by a third factor (for example, seasonality, the type of customers, market conditions), and sometimes by pure chance.

Ignoring distributions. If the data contain many outliers, the scales are heavily "compressed" or the distributions are very skewed, the standard correlation coefficients can be misleading. Such situations call for more careful methods, some of which are described in the term Factor Analysis and other materials on multivariate methods.

Hunting for pretty numbers. With a large number of metrics you can almost always find a pair where the correlation looks impressive — simply because of the number of attempts. Without hypotheses formulated in advance and a correction for multiple testing, there is a risk of stumbling onto a chance pattern and mistaking it for an important discovery.
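A little arithmetic shows how quickly the number of attempts grows. The sketch below assumes a survey with 20 metrics and the conventional 0.05 significance level (both are illustrative choices) and applies the Bonferroni correction, one common way to correct for multiple testing.

```python
from math import comb

n_metrics = 20   # hypothetical number of survey metrics
alpha = 0.05     # conventional significance level for a single test

# Number of distinct pairs of metrics that could be correlated.
n_pairs = comb(n_metrics, 2)

# Expected number of "significant" pairs if no real relationships exist at all.
expected_false_hits = n_pairs * alpha

# Bonferroni correction: a stricter threshold applied to each individual pair.
bonferroni_alpha = alpha / n_pairs

print(f"pairs: {n_pairs}")
print(f"expected chance findings: {expected_false_hits:.1f}")
print(f"Bonferroni threshold per pair: {bonferroni_alpha:.5f}")
```

With 20 metrics there are 190 pairs, so around 9 or 10 of them would be expected to look "significant" by chance alone. The Bonferroni threshold is conservative; testing a short list of hypotheses formulated in advance keeps the number of comparisons small to begin with.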

Ignoring time. The same relationship may look different at different stages of a product's or a market's development. That is why it is useful to additionally watch how correlations change over time and connect them with time series analysis (read more in the term Time Series Analysis).

How this looks in reports and in SurveyNinja

The basic SurveyNinja reports already include tools that help you sense the relationships between metrics without diving deep into statistics: cross-tabulations, filters by segment, comparison of distributions across groups. The help section Viewing the survey report describes in more detail how to work with such reports.

If a more detailed correlation analysis is required, survey data can be exported and processed in external tools: BI systems, statistical packages, Python or R. The most advanced approaches — such as building models of how factors influence summary indices — often become part of broader projects on brand health tracking and empirical marketing.

Practical recommendations

Start with hypotheses rather than going through every possible pair. Formulate in advance which relationships you expect to see: for example, "the speed of support responses is related to the willingness to recommend" or "a sense of fairness in evaluation influences employee engagement". This will protect you from chance "findings".

Always check what the raw data look like. Before trusting the coefficients, look at the distributions of the ratings and the points on the scatter plot: are there clusters, outliers or strongly non-linear dependencies that could distort the conclusions?

Combine correlation with qualitative methods. Correlation answers the question "what goes together with what", but it does not answer the question "why". To understand the reasons behind the relationships you have found, it is useful to complement the quantitative analysis with in-depth interviews, focus groups and other methods described in the term Qualitative Analysis.

Correlation analysis is not a magic button that tells you exactly what to change in your product or service. It is a way to bring order to a multitude of metrics, to see which of them "move" together, and to narrow down the range of hypotheses. The more carefully you handle correlations and the more honestly you acknowledge their limitations, the more useful surveys become as a tool for making decisions.
