Pearson's correlation coefficient: what it is and how to use it

Research in psychology frequently relies on descriptive statistics, which offers ways to present and evaluate the main characteristics of the data through tables, graphs, and summary measures.

In this article we will look at the Pearson correlation coefficient, a measure used in descriptive statistics. It measures the linear association between two quantitative random variables and tells us the intensity and direction of the relationship between them.

Descriptive statistics

The Pearson correlation coefficient is a type of coefficient used in descriptive statistics. Specifically, it is used in descriptive statistics applied to the study of two variables.

Descriptive statistics (closely related to exploratory data analysis) brings together a set of mathematical techniques designed to obtain, organize, present, and describe a set of data so that it is easier to use. In general, it relies on tables, numerical measures, or graphs.

Pearson's correlation coefficient: what is it for?

The Pearson correlation coefficient is used to study the relationship (or correlation) between two quantitative random variables measured on at least an interval scale; for example, the relationship between weight and height.

It is a measure that gives us information about the intensity and direction of the relationship. In other words, it is an index that measures the degree of covariation between two variables that are linearly related.
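As a minimal sketch of how this index can be computed, the following Python snippet applies the usual definition (the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The function name pearson_r and the height and weight figures are hypothetical, chosen only for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations (numerator of the covariance).
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (numerators of the two variances).
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: heights in cm and weights in kg for five people.
heights = [150, 160, 170, 180, 190]
weights = [50, 60, 65, 60, 65]

r = pearson_r(heights, weights)
print(round(r, 3))  # 0.775
```

The sign of the result gives the direction of the relationship and its absolute value gives the intensity.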

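We must be clear about the difference between relationship, correlation, or covariation between two variables (that is, their joint variation) and causation, since they are different concepts. A correlation, however strong, does not by itself show that one variable causes the other. Forecasting or prediction (regression) is also a distinct idea: one variable can help predict another without causing it.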

How is it interpreted?

Pearson's correlation coefficient takes values between -1 and +1. Its meaning depends on the value it takes.

If the Pearson correlation coefficient is equal to 1 or -1, the linear correlation between the variables studied is perfect.

If the coefficient is greater than 0, the correlation is positive ("the more, the more; the less, the less"). If it is less than 0, the correlation is negative ("the more, the less; the less, the more"). Finally, if the coefficient is equal to 0, we can only say that there is no linear relationship between the variables; there may still be some other type of relationship.
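The three cases can be illustrated with a small Python sketch. The data are made up for the example, and pearson_r is a hypothetical helper written from the standard definition. Note the last case: Y depends entirely on X (it is X squared), yet the coefficient is 0 because the relationship is not linear.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]

y_positive = [2 * a + 1 for a in x]   # rises with X along a straight line
y_negative = [-3 * a for a in x]      # falls with X along a straight line
y_quadratic = [a ** 2 for a in x]     # depends on X, but not linearly

r_positive = pearson_r(x, y_positive)
r_negative = pearson_r(x, y_negative)
r_quadratic = pearson_r(x, y_quadratic)

print(r_positive, r_negative, r_quadratic)
```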

Considerations

The absolute value of the Pearson correlation coefficient tends to increase when the variability of X and/or Y (the variables) increases, and to decrease when that variability is restricted. In addition, to judge whether a value is high or low, we must compare our result with other studies that used the same variables in similar circumstances.
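The effect of variability can be seen in a rough simulation. The sketch below is only an illustration under assumed conditions: the data are randomly generated (with a fixed seed), the cut-off of 0.5 is arbitrary, and pearson_r is a hypothetical helper. It compares the coefficient in the full sample with the coefficient in a subsample where X is restricted to a narrow band.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)  # fixed seed so the illustration is repeatable

# Simulated data: Y is X plus independent noise.
x = [random.gauss(0, 1) for _ in range(2000)]
y = [a + random.gauss(0, 1) for a in x]

r_full = pearson_r(x, y)

# Keep only the cases whose X value lies in a narrow band around the mean,
# which reduces the variability of X.
restricted = [(a, b) for a, b in zip(x, y) if abs(a) < 0.5]
x_r = [a for a, _ in restricted]
y_r = [b for _, b in restricted]

r_restricted = pearson_r(x_r, y_r)

print(round(r_full, 2), round(r_restricted, 2))
```

In this simulation the underlying relationship is the same in both samples; only the spread of X differs, and the restricted sample yields the smaller coefficient.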

To represent the relationships among several variables that are linearly related, we can use the variance-covariance matrix or the correlation matrix. The diagonal of the first contains the variances, and the diagonal of the second contains ones (the correlation of a variable with itself is perfect, equal to 1).
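A short Python sketch of both matrices follows. The three variables and their scores are hypothetical, the covariance helper is written for this example, and the sample covariance (dividing by n - 1) is an assumption; dividing by n would change the covariance matrix but not the correlation matrix.

```python
import math

def covariance(x, y):
    """Sample covariance (n - 1 in the denominator)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Hypothetical scores on three variables for the same five people.
data = {
    "X": [2, 4, 6, 8, 10],
    "Y": [1, 3, 2, 5, 4],
    "Z": [9, 7, 8, 4, 2],
}
names = list(data)
k = len(names)

# Variance-covariance matrix: entry (i, j) is the covariance of variables i and j.
cov_matrix = [[covariance(data[a], data[b]) for b in names] for a in names]

# Correlation matrix: each covariance divided by the two standard deviations.
corr_matrix = [
    [cov_matrix[i][j] / math.sqrt(cov_matrix[i][i] * cov_matrix[j][j])
     for j in range(k)]
    for i in range(k)
]

cov_diagonal = [cov_matrix[i][i] for i in range(k)]    # the variances
corr_diagonal = [corr_matrix[i][i] for i in range(k)]  # all equal to 1

print(cov_diagonal)
print(corr_diagonal)
```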

Squared coefficient

When we square the Pearson correlation coefficient, its meaning changes: we interpret the value (often called the coefficient of determination) in terms of forecasting Y from X. This does not imply that the relationship is causal. In this case, it can have four interpretations or meanings:

1. Associated variance

It indicates the proportion of the variance of Y (one variable) associated with the variation of X (the other variable). Therefore, "1 - squared Pearson coefficient" = proportion of the variance of Y that is not associated with the variation of X.

2. Individual differences

If we multiply the squared Pearson correlation coefficient by 100, it indicates the percentage of the individual differences in Y that are associated with (or explained by) individual differences in X. Therefore, "(1 - squared Pearson coefficient) x 100" = percentage of the individual differences in Y that is not associated with or explained by individual differences in X.

3. Error reduction rate

The squared Pearson correlation coefficient can also be interpreted as an index of the reduction of error in the forecasts; that is, it is the proportion of the mean squared error eliminated by using Y' (the value predicted by the regression line fitted to the data) instead of the mean of Y as the forecast. Multiplied by 100, it gives this reduction as a percentage.

Therefore, "1 - squared Pearson coefficient" = proportion of the error that is still made when using the regression line instead of the mean (multiplied by 100, it gives the percentage).

4. Points approximation index

Finally, the squared Pearson correlation coefficient indicates how closely the points approach the regression line mentioned above. The higher its value (the closer to 1), the closer the points lie to Y' (the line).
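The equivalence between these interpretations can be checked numerically. The sketch below uses hypothetical data and an ordinary least-squares line (an assumption about how Y' is obtained); it computes the squared coefficient, the share of the variance of Y associated with X, and the share of squared forecasting error removed by using the line instead of the mean of Y.

```python
import math

def mean(values):
    return sum(values) / len(values)

# Hypothetical paired scores.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x, mean_y = mean(x), mean(y)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

# Least-squares regression line: Y' = intercept + slope * X.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
y_pred = [intercept + slope * a for a in x]

# Interpretation 1: share of the variation of Y associated with X.
explained = sum((p - mean_y) ** 2 for p in y_pred) / syy

# Interpretation 3: share of squared error removed by forecasting
# with Y' instead of with the mean of Y.
error_mean = syy
error_line = sum((b - p) ** 2 for b, p in zip(y, y_pred))
error_reduction = 1 - error_line / error_mean

print(round(r_squared, 3), round(explained, 3), round(error_reduction, 3))
# 0.6 0.6 0.6
```

With these figures, 60% of the variation in Y is associated with X, and 40% of the error remains when forecasting with the regression line.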
