Correlation and Dependence
Correlation and dependence are statistical concepts that describe relationships between two random variables. Dependence indicates that the outcome of one variable can influence the outcome of another, while independence means that the occurrence of one does not affect the other. A common example used to illustrate independence is the rolling of two dice, where the result of one die does not alter the potential result of the other.
Correlation, popularized by Francis Galton in the late nineteenth century, quantifies the degree to which two variables are related. The Pearson product moment correlation coefficient is a widely used measure that ranges from −1 (perfect negative correlation) to 1 (perfect positive correlation). Visual representations of data can illustrate varying levels of correlation: points that align perfectly on a line indicate a correlation of 1, while widely scattered points suggest weak or no correlation. Alternative correlation coefficients, such as Spearman's rank correlation coefficient and Kendall's tau, are designed to capture relationships in other contexts, particularly when a relationship is monotonic but non-linear. Understanding these concepts is essential for statistical analysis and interpreting data in various fields.
Dependence describes a statistical relationship between two random variables. Two random variables are independent if the realization of one has no influence on the probability of any realization of the other. The standard illustration is throwing two dice, where the random variables are the numbers of dots showing on each die. The value of the first random variable (1 to 6) has no influence on the value of the second random variable (also 1 to 6).
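A quick simulation makes this concrete. The sketch below (written in Python purely for illustration; it is not part of the original article) compares the distribution of the second die with its distribution conditioned on the first die showing a six. Under independence the two should match.

```python
import random
from collections import Counter

random.seed(0)

# Simulate many throws of two fair dice.
throws = [(random.randint(1, 6), random.randint(1, 6)) for _ in range(100_000)]

# Unconditional distribution of the second die.
second = Counter(d2 for _, d2 in throws)
# Distribution of the second die given that the first die shows a 6.
given_six = Counter(d2 for d1, d2 in throws if d1 == 6)

n_all = sum(second.values())
n_six = sum(given_six.values())
for face in range(1, 7):
    print(f"P(d2={face}) = {second[face] / n_all:.3f}   "
          f"P(d2={face} | d1=6) = {given_six[face] / n_six:.3f}")
# Both columns hover around 1/6 ≈ 0.167: knowing the first die
# tells us nothing about the second, which is what independence means.
```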
Correlation is a related idea developed by Francis Galton in the late nineteenth century. Statistical definitions of correlation vary, but they are quite specific in the way they describe the relationship between two variables. The Pearson product moment correlation coefficient is one common definition. It states that for two random variables X and Y:
\[
\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \, \sigma_Y} = \frac{E\left[(X-\mu_X)(Y-\mu_Y)\right]}{\sigma_X \, \sigma_Y}
\]

Here \(\operatorname{cov}(X,Y)\) denotes the covariance of X and Y, \(\mu_X\) and \(\mu_Y\) their means, \(\sigma_X\) and \(\sigma_Y\) their standard deviations, and \(E\) the expectation operator. For a sample of n paired observations \((x_i, y_i)\), the corresponding sample coefficient is

\[
r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\;\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}
\]

where \(\bar{x}\) and \(\bar{y}\) are the sample means.

This formulation directly shows the meaning of the Pearson product moment correlation coefficient. If we have data where large values of X tend to occur with large values of Y (and small with small), most of the products in the numerator are positive, so the coefficient is positive; if large values of X tend to occur with small values of Y, most of the products are negative, and so is the coefficient. The denominator rescales the result so that it always lies between −1 and 1.
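As an illustrative sketch (our own, not drawn from the original article), the sample formula above can be implemented directly and checked against NumPy's built-in np.corrcoef:

```python
import numpy as np

def pearson(x, y):
    """Sample Pearson correlation coefficient, computed straight
    from the formula above."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return (dx * dy).sum() / np.sqrt((dx**2).sum() * (dy**2).sum())

rng = np.random.default_rng(0)
x = rng.normal(size=1_000)
y = 2.0 * x + rng.normal(scale=0.5, size=1_000)  # strong positive relationship

print(pearson(x, y))            # close to +1
print(np.corrcoef(x, y)[0, 1])  # NumPy's version agrees
```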
[Figure 1: four scatter plots, (a)–(d), illustrating different values of the correlation coefficient.]

Figure 1(a) shows data that align perfectly on a straight line with a correlation of 1. Figure 1(b) shows a lower value of correlation, implying variation around some underlying line. Figure 1(c) shows no correlation. Figure 1(d) shows variation around a line with a negative slope, a negative correlation. There is less variation around the line than in Figure 1(b) because the absolute value of the correlation coefficient is higher.
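The pattern in the four panels can be mimicked numerically. In the sketch below the data are invented for illustration (they are not the figure's actual data): points are generated around lines with differing noise and slope, and the resulting coefficients are printed.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 200)

# Four data sets loosely mirroring Figure 1's panels.
panels = {
    "(a) exact line":          x,                                         # r = 1
    "(b) noisy positive line": x + rng.normal(scale=0.30, size=x.size),   # 0 < r < 1
    "(c) pure noise":          rng.normal(size=x.size),                   # r near 0
    "(d) noisy negative line": -x + rng.normal(scale=0.15, size=x.size),  # r near -1
}

for label, y in panels.items():
    r = np.corrcoef(x, y)[0, 1]
    print(f"{label}: r = {r:+.2f}")
```

Panel (d) uses less noise than panel (b), so its coefficient has the larger absolute value, matching the description above.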

Because the Pearson coefficient is built around the mean and assumes a linear relationship, other correlation coefficients exist that may be more appropriate in other circumstances. Some of these, such as Spearman's rank correlation coefficient or Kendall's tau, are also calibrated to lie in the range −1 to 1.
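As a sketch of when the rank-based measures differ from Pearson's, consider a monotonic but non-linear relationship. The example below uses SciPy's scipy.stats.spearmanr and scipy.stats.kendalltau; the choice of y = x⁵ is ours, purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, size=500)
y = x**5  # monotonic in x, but strongly non-linear

print(np.corrcoef(x, y)[0, 1])  # Pearson: roughly 0.8, well below 1

rho, _ = stats.spearmanr(x, y)   # compares ranks rather than raw values
tau, _ = stats.kendalltau(x, y)  # counts concordant vs. discordant pairs
print(rho)  # 1.0: the ranks of x and y agree perfectly
print(tau)  # 1.0: every pair of points is concordant
```

Because x⁵ preserves ordering, the rank-based coefficients report a perfect relationship, while Pearson's coefficient, which measures only linear association, does not.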