
Covariance and correlation

Covariance and correlation are statistical measures used to describe the joint variability and linear relationship between two random variables. Covariance quantifies the extent to which the variables deviate from their means in the same direction, with positive values indicating that they tend to increase or decrease together, negative values showing an inverse relationship, and zero suggesting no linear association. Correlation, often referring to the Pearson correlation coefficient, standardizes covariance by dividing it by the product of the variables' standard deviations, yielding a dimensionless value between -1 and +1 that assesses both the strength and direction of the linear relationship. The concept of correlation originated in the late 19th century through the work of Francis Galton, who introduced the term in the context of biological inheritance and regression, and was formalized by Karl Pearson in his 1896 paper on mathematical contributions to the theory of evolution. Covariance, as a more general measure from probability theory, gained prominence alongside correlation in multivariate analysis; its population definition is \operatorname{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])], equivalent to E[XY] - E[X]E[Y], while the sample estimator uses division by n-1 for unbiasedness. Key properties include symmetry (\operatorname{Cov}(X, Y) = \operatorname{Cov}(Y, X)), bilinearity, and the fact that \operatorname{Cov}(X, X) = \operatorname{Var}(X); for correlation, \rho_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}}, it equals \pm 1 for perfect linear relationships and 0 for uncorrelated variables, though zero correlation does not imply independence. These measures are foundational in fields like finance, where they inform portfolio diversification through covariance matrices, and in machine learning for identifying patterns in datasets. Unlike covariance, which depends on the units of measurement and can range from -\infty to +\infty, correlation's bounded scale makes it more interpretable for comparing relationships across different scales. Extensions include partial correlation for controlling for additional variables and rank-based alternatives like Spearman's rho for non-linear monotonic relationships.

Fundamental Concepts

Definition of Covariance

Covariance is a statistical measure that quantifies the extent to which two random variables, X and Y, vary together, capturing the direction and degree of their linear relationship. For a pair of random variables defined on a common probability space, the population covariance, denoted \operatorname{Cov}(X, Y), is formally defined as the expected value of the product of their deviations from their respective means: \operatorname{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]. Expanding the product and using the linearity of the expectation operator yields E[XY - X E[Y] - Y E[X] + E[X] E[Y]] = E[XY] - E[X] E[Y], providing the equivalent formulation \operatorname{Cov}(X, Y) = E[XY] - E[X] E[Y]. The population covariance represents a theoretical parameter for the entire distribution of the variables, whereas the sample covariance serves as an empirical estimate derived from observed data points, with details on its computation addressed separately. The sign of the covariance indicates the nature of the linear co-movement: a positive value signifies that X and Y tend to increase or decrease in tandem, a negative value implies they move in opposite directions, and a value of zero suggests no linear association, though the variables may still be dependent in nonlinear ways. Covariance carries units equal to the product of the units of X and Y, rendering it scale-dependent; for instance, measuring one variable in different units alters the covariance's magnitude without changing the underlying relationship.
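As a minimal numerical sketch (assuming NumPy; the synthetic sample and seed are arbitrary illustrative choices, not anything specified above), the two equivalent formulations of covariance can be checked against each other:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 2.0 * x + rng.normal(size=100_000)  # linearly related, so the covariance is positive

# Population-style estimates (divide by n) of the two equivalent expressions
cov_deviation_form = np.mean((x - x.mean()) * (y - y.mean()))
cov_product_form = np.mean(x * y) - x.mean() * y.mean()

print(cov_deviation_form, cov_product_form)  # nearly identical, both close to 2
```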

Definition of Correlation

The Pearson product-moment correlation coefficient, denoted \rho_{X,Y}, measures the strength and direction of the linear association between two random variables X and Y. It is defined as the covariance between X and Y divided by the product of their standard deviations: \rho_{X,Y} = \frac{\operatorname{Cov}(X,Y)}{\sigma_X \sigma_Y}, where \operatorname{Cov}(X,Y) is the covariance, and \sigma_X and \sigma_Y are the standard deviations of X and Y, respectively. This standardization normalizes the covariance, which serves as the numerator, to produce a bounded measure. The coefficient \rho_{X,Y} ranges from -1 to +1. A value of +1 indicates a perfect positive linear relationship, where increases in one variable correspond exactly to proportional increases in the other; -1 signifies a perfect negative linear relationship, with increases in one corresponding to proportional decreases in the other; and 0 implies no linear association between the variables. These interpretations hold specifically for linear dependencies, as the coefficient does not capture nonlinear relationships. Valid application of \rho_{X,Y} requires that the relationship between the variables is linear and that both variables have finite, nonzero variances, ensuring the standard deviations are well-defined. In contrast to covariance, which is scale-dependent and retains the units of the variables' product, the correlation coefficient is dimensionless and invariant to changes in scale or location of the variables. The term "correlation" was coined by Francis Galton in 1888 to describe interdependent relations.
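A short sketch (assuming NumPy; the simulated variables and seed are illustrative assumptions) computes the coefficient directly from its definition and compares it with NumPy's built-in estimate:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=10_000)
y = -3.0 * x + rng.normal(size=10_000)

# Correlation as covariance normalized by the standard deviations
r_manual = np.cov(x, y)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))
r_numpy = np.corrcoef(x, y)[0, 1]

print(r_manual, r_numpy)  # both close to -0.95 (strong negative linear relation)
```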

Mathematical Properties

Properties of Covariance

Covariance possesses several key algebraic properties that arise from the linearity of the expectation operator, making it a useful tool for deriving expressions involving sums and linear combinations of random variables. Specifically, covariance is bilinear: for scalar constants a and c, and random variables X, Y, Z, \operatorname{Cov}(aX + b, Y) = a \operatorname{Cov}(X, Y), where b is any constant (since adding a constant to the first argument does not affect the centered product in the covariance definition), and \operatorname{Cov}(X, Y + cZ) = \operatorname{Cov}(X, Y) + c \operatorname{Cov}(X, Z). These follow directly from the linearity of expectation: \mathbb{E}[(aX + b - \mathbb{E}[aX + b])(Y - \mathbb{E}[Y])] = a \mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])] for the first, and similarly for the second by expanding the expectation of the product. For a random vector \mathbf{X} = (X_1, \dots, X_n)^\top, the covariance matrix \Sigma has entries \Sigma_{ij} = \operatorname{Cov}(X_i, X_j). This matrix is symmetric because \operatorname{Cov}(X_i, X_j) = \operatorname{Cov}(X_j, X_i), and the diagonal entries are the variances \operatorname{Var}(X_i). Moreover, \Sigma is positive semi-definite: for any \mathbf{a} \in \mathbb{R}^n, \mathbf{a}^\top \Sigma \mathbf{a} = \operatorname{Var}(\mathbf{a}^\top \mathbf{X}) \geq 0, with equality if \mathbf{a}^\top \mathbf{X} is constant almost surely. A direct consequence of bilinearity is the decomposition of the variance of a sum: for random variables X and Y, \operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2 \operatorname{Cov}(X, Y). This extends to the general case of multiple variables, facilitating the analysis of aggregate variability in linear combinations. The Cauchy-Schwarz inequality provides a bound on the magnitude of covariance: for random variables X and Y with finite variances, |\operatorname{Cov}(X, Y)| \leq \sqrt{\operatorname{Var}(X)} \sqrt{\operatorname{Var}(Y)}, with equality if and only if X and Y are linearly dependent almost surely (i.e., one is an affine function of the other). This follows from applying the standard Cauchy-Schwarz inequality to the expectation inner product: \left( \mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])] \right)^2 \leq \mathbb{E}[(X - \mathbb{E}[X])^2] \, \mathbb{E}[(Y - \mathbb{E}[Y])^2]. If \operatorname{Cov}(X, Y) = 0, then X and Y are said to be uncorrelated, meaning their deviations from the means do not systematically co-vary. However, uncorrelated random variables are not necessarily independent. A classic counterexample is X \sim \operatorname{Uniform}[-1, 1] and Y = X^2: here, \mathbb{E}[X] = 0, \mathbb{E}[Y] = \int_{-1}^1 x^2 \cdot \frac{1}{2} \, dx = \frac{1}{3}, and \mathbb{E}[XY] = \mathbb{E}[X^3] = 0 by odd symmetry, so \operatorname{Cov}(X, Y) = 0. Yet X and Y are dependent, as the distribution of Y given X = x is degenerate at x^2, not matching the marginal distribution of Y.
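A short simulation (a sketch assuming NumPy; the sample size, coefficients, and seed are arbitrary) illustrates both the variance-of-a-sum identity and the uniform/square counterexample:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y), exact for sample quantities with matching ddof
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)
lhs = np.var(x + y, ddof=1)
rhs = np.var(x, ddof=1) + np.var(y, ddof=1) + 2 * np.cov(x, y)[0, 1]
print(lhs, rhs)  # agree up to floating-point error

# Uncorrelated but dependent: X ~ Uniform[-1, 1], Y = X^2
u = rng.uniform(-1.0, 1.0, size=n)
v = u ** 2
print(np.cov(u, v)[0, 1])  # close to 0, even though v is a deterministic function of u
```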

Properties of Correlation

The correlation coefficient, denoted \rho_{X,Y}, exhibits invariance under affine transformations of the variables. Specifically, for constants a \neq 0 and c \neq 0, the correlation satisfies \rho_{aX + b, cY + d} = \operatorname{sign}(a c) \, \rho_{X,Y}, meaning it remains unchanged in magnitude but may flip sign depending on the directions of the scalings. This property arises from the normalization by the standard deviations in its definition, distinguishing it from the scale-sensitive covariance. Another key property involves the product of correlations in multivariate settings. When the partial correlation between X and Y given Z is zero—indicating no direct association in a linear sense—the correlation \rho_{X,Y} equals the product \rho_{X,Z} \rho_{Z,Y}. This holds in general from the definition of partial correlation and reflects how linear dependencies propagate through an intermediary Z, such as in a simple chain model without direct links; for jointly normal variables, it further implies conditional independence of X and Y given Z. The correlation coefficient is bounded by |\rho_{X,Y}| \leq 1, a consequence of the Cauchy-Schwarz inequality applied to the covariance: |\operatorname{Cov}(X,Y)| \leq \sqrt{\operatorname{Var}(X) \operatorname{Var}(Y)}. Equality occurs if and only if Y = aX + b almost surely for some constants a and b, corresponding to perfect linear dependence. Covariance forms the unnormalized foundation for this bounded measure. A zero correlation \rho_{X,Y} = 0 implies that X and Y are uncorrelated, and for any pair of random variables with finite variances, independence entails uncorrelatedness. However, the converse does not generally hold; uncorrelated variables can still exhibit dependence, as in mixtures of bivariate normals with nonlinear relationships. In the special case of jointly bivariate normal distributions, uncorrelatedness does imply full independence. Regarding inference, the sampling distribution of the sample correlation r under the null hypothesis \rho = 0 is approximately normal for large sample sizes n, with mean 0 and variance 1/(n-1), so that \sqrt{n}\, r \approx \mathcal{N}(0, 1) asymptotically. This facilitates hypothesis testing for the absence of linear association.
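The sign-flip behavior under affine rescaling can be checked directly (a sketch assuming NumPy; the constants a, b, c, d below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=50_000)
y = x + rng.normal(size=50_000)

a, b, c, d = -2.0, 5.0, 3.0, -1.0  # a*c < 0, so the correlation should flip sign
r_original = np.corrcoef(x, y)[0, 1]
r_transformed = np.corrcoef(a * x + b, c * y + d)[0, 1]

print(r_original, r_transformed)  # same magnitude, opposite signs
```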

Estimation from Data

Sample Covariance

The sample covariance provides an estimate of the population covariance between two variables based on a sample of paired observations from a population. For a sample of size n consisting of paired values (x_1, y_1), \dots, (x_n, y_n), the sample covariance s_{XY} is computed as s_{XY} = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}), where \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i and \bar{y} = \frac{1}{n} \sum_{i=1}^n y_i are the sample means of the x_i and y_i, respectively. This formula measures the average product of deviations from the respective sample means, scaled by n-1 to account for the estimation of those means. A related but biased estimator uses division by n instead of n-1, analogous to the maximum likelihood estimator under the assumption of independent and identically distributed normal observations. However, the version with n-1 in the denominator yields an unbiased estimator of the population covariance, meaning its expected value equals the true population covariance \sigma_{XY} for any distribution with finite second moments. This unbiasedness holds generally for random samples, though the assumption of normality simplifies proofs of related distributional properties of the sample covariance matrix in multivariate cases. The adjustment to n-1, known as Bessel's correction, addresses the degree of freedom lost when estimating the means with sample means. Since the deviations (x_i - \bar{x}) and (y_i - \bar{y}) are calculated relative to values derived from the same sample, the average of the products of deviations tends to understate the true variability; dividing by n-1 rather than n corrects this downward bias by effectively increasing the scale factor. In the multivariate setting, the sample covariance extends to a symmetric positive semi-definite matrix S of order p \times p for p variables, where the diagonal elements are sample variances and off-diagonal elements are sample covariances between pairs of variables. The (j,k)-th entry of S is s_{jk} = \frac{1}{n-1} \sum_{i=1}^n (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k), with \bar{x}_j denoting the sample mean of the j-th variable; this serves as an unbiased estimator of the population covariance matrix \Sigma. For illustration, consider a sample of n=5 paired observations on heights (in inches) and weights (in pounds): (60, 120), (62, 125), (64, 130), (66, 135), (68, 140). The sample mean height is \bar{x} = 64 and the sample mean weight is \bar{y} = 130. The deviations for height are -4, -2, 0, 2, 4 and for weight are -10, -5, 0, 5, 10, yielding products of 40, 10, 0, 10, 40 with sum 100. Thus, the sample covariance is s_{XY} = 100 / 4 = 25, indicating a positive linear association on the scale of the variables' units.
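The worked example can be reproduced in a few lines (a sketch assuming NumPy; note that np.cov uses the n-1 denominator by default):

```python
import numpy as np

heights = np.array([60, 62, 64, 66, 68])       # inches
weights = np.array([120, 125, 130, 135, 140])  # pounds

# Sample covariance with Bessel's correction (divide by n - 1)
s_xy = np.sum((heights - heights.mean()) * (weights - weights.mean())) / (len(heights) - 1)
print(s_xy)                             # 25.0
print(np.cov(heights, weights)[0, 1])   # 25.0, matching the manual calculation
```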

Sample Correlation Coefficient

The sample correlation coefficient, denoted r, serves as the point estimate of the population correlation coefficient \rho. It is computed by normalizing the sample covariance by the product of the sample standard deviations:
r = \frac{s_{XY}}{s_X s_Y},
where s_{XY} is the sample covariance, and s_X and s_Y are the sample standard deviations of the variables X and Y, respectively. This yields a dimensionless measure bounded between -1 and 1, with values near 1 or -1 indicating strong positive or negative linear relationships, respectively.
An equivalent form, which avoids computing the standard deviations explicitly by working directly with deviations from the sample means, is
r = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2 \sum_{i=1}^n (y_i - \bar{y})^2}},
where \bar{x} and \bar{y} are the sample means. The sample correlation coefficient is slightly biased as an estimator of \rho, tending to underestimate its magnitude (i.e., biased toward zero for |\rho| > 0) in finite samples from bivariate normal populations, with the bias magnitude ranging from about 0.01 to 0.04 depending on the sample size n and \rho. To stabilize the variance of r for inference, particularly when |r| is close to 1, Fisher's z-transformation is applied:
z = \frac{1}{2} \ln \left( \frac{1 + r}{1 - r} \right),
which approximately follows a normal distribution with variance 1/(n-3).
As a consistent estimator, the sample correlation coefficient converges in probability to the population correlation \rho as the sample size n \to \infty, by the law of large numbers applied to the underlying sample moments. For illustration, consider a sample of heights (in cm) and pulmonary anatomical dead space volumes (in ml) for 15 children:
Height, x (cm)    Dead space, y (ml)
110               44
116               31
120               50
124               54
128               56
132               60
136               62
140               66
144               70
148               74
152               78
156               82
160               86
164               90
170               94
Using the formula above, the resulting r \approx 0.98 indicates a strong positive linear relationship between height and dead space volume.
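As a sketch (assuming NumPy and SciPy are available; the 95% interval below applies the Fisher z-transformation described earlier), the coefficient and an approximate confidence interval can be computed from the tabulated data:

```python
import numpy as np
from scipy import stats

height = np.array([110, 116, 120, 124, 128, 132, 136, 140, 144, 148, 152, 156, 160, 164, 170])
dead_space = np.array([44, 31, 50, 54, 56, 60, 62, 66, 70, 74, 78, 82, 86, 90, 94])

r, p_value = stats.pearsonr(height, dead_space)
print(r)  # approximately 0.98 for these 15 observations

# Approximate 95% confidence interval via Fisher's z-transformation
n = len(height)
z = 0.5 * np.log((1 + r) / (1 - r))
half_width = 1.96 / np.sqrt(n - 3)
lower, upper = np.tanh(z - half_width), np.tanh(z + half_width)
print(lower, upper)
```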

Applications

In Multivariate Distributions

In the multivariate normal distribution, the covariance matrix \Sigma fully characterizes the joint distribution of a vector of random variables \mathbf{X} = (X_1, \dots, X_p)^T \sim \mathcal{N}_p(\boldsymbol{\mu}, \Sigma), where \boldsymbol{\mu} is the mean vector. The matrix \Sigma is symmetric and positive semi-definite, and it governs the shape and orientation of the elliptical contours of equal probability density, with the eigenvalues determining the spread along the principal axes and the eigenvectors indicating the directions of these axes. For instance, the probability density function is given by f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right), where the quadratic form (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) encapsulates the dependencies via \Sigma. The correlation matrix \mathbf{R} is the standardized counterpart to \Sigma, obtained by dividing each element \sigma_{ij} by \sqrt{\sigma_{ii} \sigma_{jj}}, yielding 1s along the diagonal and the pairwise correlation coefficients \rho_{ij} as off-diagonal entries. This matrix provides a scale-free measure of linear dependencies among the variables, facilitating comparisons across different units of measurement. Partial correlations extend this framework by quantifying the linear association between two variables conditional on the remaining variables, equivalent to the correlation in the conditional distribution under multivariate normality. These are derived from the inverse of the correlation matrix, where the partial correlation \rho_{ij \cdot \mathbf{k}} (conditioning on the other variables indexed by \mathbf{k}) is -\Omega_{ij} / \sqrt{\Omega_{ii} \Omega_{jj}}, with \Omega = \mathbf{R}^{-1}. A key application of the covariance matrix arises in principal component analysis (PCA), which performs the eigen-decomposition \Sigma = \mathbf{V} \boldsymbol{\Lambda} \mathbf{V}^T, where \boldsymbol{\Lambda} is the diagonal matrix of eigenvalues (variances of the principal components) and \mathbf{V} contains the eigenvectors (loadings). This decomposition identifies orthogonal directions of maximum variance, enabling dimensionality reduction by retaining components with the largest eigenvalues while discarding those with small ones to approximate the data with minimal loss of information. For a concrete illustration in the bivariate case (p=2), consider \mathbf{X} = (X, Y)^T \sim \mathcal{N}_2(\boldsymbol{\mu}, \Sigma) with \Sigma = \begin{pmatrix} \sigma_X^2 & \rho \sigma_X \sigma_Y \\ \rho \sigma_X \sigma_Y & \sigma_Y^2 \end{pmatrix}. The density simplifies to f(x,y) = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1 - \rho^2}} \exp\left( -\frac{1}{2(1 - \rho^2)} \left[ \frac{(x - \mu_X)^2}{\sigma_X^2} + \frac{(y - \mu_Y)^2}{\sigma_Y^2} - \frac{2\rho (x - \mu_X)(y - \mu_Y)}{\sigma_X \sigma_Y} \right] \right), where the covariance term \rho \sigma_X \sigma_Y tilts the elliptical contours away from axis alignment when \rho \neq 0. In the multivariate normal case, a diagonal covariance matrix (i.e., \operatorname{Cov}(X_i, X_j) = 0 for all i \neq j) implies that the components are pairwise uncorrelated and, moreover, fully independent, as uncorrelatedness suffices for independence in this distribution.
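A brief sketch (assuming NumPy; the 3×3 correlation matrix below is an arbitrary illustrative example) shows both the eigen-decomposition used in PCA and the partial correlations obtained from the inverse correlation matrix:

```python
import numpy as np

# Illustrative correlation matrix for three variables
R = np.array([
    [1.0, 0.8, 0.6],
    [0.8, 1.0, 0.5],
    [0.6, 0.5, 1.0],
])

# PCA: eigen-decomposition of the (here standardized) covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(R)
print(eigenvalues)  # variances of the principal components, in ascending order

# Partial correlations from the precision matrix Omega = R^{-1}
Omega = np.linalg.inv(R)
d = np.sqrt(np.diag(Omega))
partial = -Omega / np.outer(d, d)
np.fill_diagonal(partial, 1.0)
print(partial[0, 1])  # partial correlation of variables 1 and 2 given variable 3
```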

In Time Series Analysis

In time series analysis, covariance and correlation play a central role in characterizing the dependence structure of stationary processes, whose statistical properties such as the mean and variance remain constant over time. For a weakly stationary process \{X_t\}, the autocovariance function at lag k is defined as \gamma(k) = \operatorname{Cov}(X_t, X_{t+k}), which measures the linear dependence between observations separated by k time units. This function is symmetric, satisfying \gamma(k) = \gamma(-k) for all integers k, and at lag zero, \gamma(0) equals the variance of the process, \operatorname{Var}(X_t). The autocorrelation function normalizes the autocovariance to produce values between -1 and 1, given by \rho(k) = \gamma(k) / \gamma(0). This function is widely used in autocorrelation function (ACF) plots, which visualize \rho(k) against lags k to assess stationarity and identify patterns such as trends or seasonal components in the series. For stationary processes, \rho(k) typically decays to zero as |k| increases, providing insight into the memory or persistence of the series. For two jointly stationary time series \{X_t\} and \{Y_t\}, the cross-covariance function is \gamma_{XY}(k) = \operatorname{Cov}(X_t, Y_{t+k}), capturing the covariance between observations from different series at temporal offset k. The corresponding cross-correlation function \rho_{XY}(k) = \gamma_{XY}(k) / \sqrt{\gamma_X(0) \gamma_Y(0)} normalizes this measure, aiding in the analysis of lead-lag relationships, such as in multivariate time series modeling. In practice, the autocovariance and autocorrelation functions are estimated from finite samples. The sample autocovariance at lag k is \hat{\gamma}(k) = n^{-1} \sum_{t=1}^{n-|k|} (X_t - \bar{X})(X_{t+|k|} - \bar{X}), where n is the sample size and \bar{X} is the sample mean, leading to the sample autocorrelation \hat{\rho}(k) = \hat{\gamma}(k) / \hat{\gamma}(0). Under stationarity, the asymptotic variance of \hat{\rho}(k) for k \geq 1 is approximated by Bartlett's formula: \operatorname{Var}(\hat{\rho}(k)) \approx n^{-1} \sum_{j=-\infty}^{\infty} \left[ \rho(j)^2 + \rho(j+k)\rho(j-k) - 4\rho(k)\rho(j)\rho(j-k) + 2\rho(j)^2\rho(k)^2 \right], which simplifies to n^{-1} for white noise processes and guides confidence intervals in ACF plots. A representative example is the autoregressive process of order 1 (AR(1)), defined as X_t = \phi X_{t-1} + Z_t where |\phi| < 1 and \{Z_t\} is white noise with variance \sigma^2. The autocorrelation function decays exponentially: \rho(k) = \phi^{|k|}, illustrating how dependence diminishes geometrically with lag, a pattern commonly observed in economic and climatic time series.
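As an illustrative sketch (assuming NumPy; the AR(1) coefficient, sample size, and seed are arbitrary), the sample autocorrelations of a simulated AR(1) process can be compared with the theoretical values \phi^{|k|}:

```python
import numpy as np

rng = np.random.default_rng(4)
phi, n = 0.7, 20_000

# Simulate an AR(1) process: X_t = phi * X_{t-1} + Z_t, with white-noise Z_t
x = np.zeros(n)
noise = rng.normal(size=n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + noise[t]

def sample_acf(series, max_lag):
    """Sample autocorrelation rho_hat(k) = gamma_hat(k) / gamma_hat(0), with 1/n scaling."""
    dev = series - series.mean()
    gamma0 = np.sum(dev * dev) / len(series)
    return [np.sum(dev[:-k] * dev[k:]) / len(series) / gamma0 for k in range(1, max_lag + 1)]

for k, r_hat in enumerate(sample_acf(x, 5), start=1):
    print(k, round(r_hat, 3), round(phi ** k, 3))  # sample ACF tracks the theoretical phi**k
```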

In Practical Fields

In finance, covariance plays a central role in modern portfolio theory, where the variance of a portfolio's returns is given by \sigma_p^2 = \mathbf{w}^T \Sigma \mathbf{w}, with \mathbf{w} as the vector of asset weights and \Sigma as the covariance matrix of asset returns, enabling investors to optimize risk-return trade-offs through diversification. Correlation coefficients further inform diversification strategies by quantifying the degree to which asset returns move together, with low or negative correlations reducing overall portfolio risk. In genetics and neuroscience, correlation measures, particularly intraclass correlations, are used in twin studies to estimate heritability, which represents the proportion of phenotypic variance attributable to genetic factors; for instance, monozygotic twins exhibit higher intraclass correlations than dizygotic twins for traits like neural activity patterns in fMRI tasks, allowing researchers to partition variance into genetic and environmental components. In machine learning, correlation matrices of features help detect multicollinearity in regression models, where high correlations between predictors can inflate variance estimates and destabilize coefficient interpretations, prompting techniques such as regularization or feature selection to improve model reliability. In psychology, correlation coefficients assess relationships in psychometric testing, such as the moderate positive correlation between IQ scores and job performance ratings, with meta-analyses reporting corrected values around 0.51 (uncorrected around 0.2–0.3), indicating that cognitive ability explains a substantial portion of performance variance while other factors also contribute. A key caveat in interpreting correlations across fields is the risk of spurious associations, where variables appear related due to confounding factors rather than causation; for example, ice cream sales and shark attacks both rise in summer due to increased beach activity and warmer weather, not a direct link between the two.
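A minimal sketch of the portfolio-variance formula \sigma_p^2 = \mathbf{w}^T \Sigma \mathbf{w} (assuming NumPy; the weights and covariance matrix are invented purely for illustration):

```python
import numpy as np

# Hypothetical annualized covariance matrix of three asset returns
Sigma = np.array([
    [0.040, 0.006, 0.004],
    [0.006, 0.090, 0.010],
    [0.004, 0.010, 0.025],
])
w = np.array([0.5, 0.3, 0.2])  # portfolio weights summing to 1

portfolio_variance = w @ Sigma @ w  # sigma_p^2 = w^T Sigma w
portfolio_volatility = np.sqrt(portfolio_variance)
print(portfolio_variance, portfolio_volatility)
```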

Extensions and Generalizations

For More Than Two Variables

When extending covariance and correlation to more than two variables, the covariance matrix provides a comprehensive representation of the pairwise covariances among a set of random variables. For an n-dimensional random vector \mathbf{X} = (X_1, \dots, X_n)^T, the covariance matrix \Sigma is an n \times n symmetric matrix whose diagonal elements are the variances \operatorname{Var}(X_i) and whose off-diagonal elements are the covariances \operatorname{Cov}(X_i, X_j) for i \neq j. The determinant of \Sigma, known as the generalized variance, quantifies the overall variability of the multivariate distribution, with larger values indicating greater spread in multiple dimensions. The multiple correlation coefficient measures the strength of the linear relationship between one variable and a set of other variables in a multivariate context, such as multiple regression. Denoted R, it is the correlation between the observed values of the dependent variable Y and the predicted values \hat{Y} from regressing Y on predictors X_1, \dots, X_k, and its square R^2 = 1 - \frac{\operatorname{RSS}}{\operatorname{TSS}} represents the proportion of the total sum of squares (TSS) explained by the regression, where RSS is the residual sum of squares, indicating the model's fit. This extends the bivariate Pearson correlation, with R reducing to the absolute value of the simple correlation coefficient when k=1. Partial correlation extends the concept to assess the linear association between two variables while controlling for the effects of one or more additional variables. For three variables X, Y, and Z, the partial correlation coefficient \rho_{XY \cdot Z} is given by \rho_{XY \cdot Z} = \frac{\rho_{XY} - \rho_{XZ} \rho_{YZ}}{\sqrt{(1 - \rho_{XZ}^2)(1 - \rho_{YZ}^2)}}, where \rho_{ij} denotes the Pearson correlation between variables i and j. This formula isolates the direct association between X and Y by removing the influence of Z. For example, in a trivariate case with correlations \rho_{XY} = 0.8, \rho_{XZ} = 0.6, and \rho_{YZ} = 0.5, the partial correlation is \rho_{XY \cdot Z} = \frac{0.8 - 0.6 \times 0.5}{\sqrt{(1 - 0.6^2)(1 - 0.5^2)}} \approx 0.72, revealing a substantial direct relationship after adjustment. The covariance matrix is positive semi-definite by construction, meaning \mathbf{z}^T \Sigma \mathbf{z} \geq 0 for any vector \mathbf{z}, with strict positive definiteness (all eigenvalues positive) ensuring the existence of the matrix inverse. This property is crucial for defining valid distances in multivariate space, such as the Mahalanobis distance d(\mathbf{x}, \boldsymbol{\mu}) = \sqrt{(\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})}, which accounts for variable correlations and scales.
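The trivariate partial-correlation example and a Mahalanobis distance can be reproduced directly (a sketch assuming NumPy and SciPy; the 2x2 covariance matrix and test point are illustrative assumptions):

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

# Partial correlation of X and Y given Z from the pairwise correlations
rho_xy, rho_xz, rho_yz = 0.8, 0.6, 0.5
rho_xy_given_z = (rho_xy - rho_xz * rho_yz) / np.sqrt((1 - rho_xz**2) * (1 - rho_yz**2))
print(round(rho_xy_given_z, 2))  # 0.72

# Mahalanobis distance from a point to a mean under an illustrative covariance matrix
Sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])
mu = np.array([0.0, 0.0])
x = np.array([2.0, 1.0])
print(mahalanobis(x, mu, np.linalg.inv(Sigma)))  # distance accounting for correlation and scale
```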

Non-Linear and Rank-Based Measures

While the Pearson correlation coefficient effectively measures linear relationships between variables, it fails to detect non-linear dependencies, such as in the case where Y = X^2 with X symmetric about zero, yielding a correlation of zero despite an evident quadratic association. This limitation underscores the need for alternative measures that capture monotonic or more general forms of dependence without assuming linearity. Spearman's rank correlation coefficient, denoted \rho_s, addresses this by assessing the strength and direction of a monotonic relationship between two variables after converting their values to ranks. Introduced by Charles Spearman in 1904, it is particularly useful for ordinal data or when the relationship is non-linear but consistently increasing or decreasing. In the absence of tied ranks, the formula is \rho_s = 1 - \frac{6 \sum_{i=1}^n d_i^2}{n(n^2 - 1)}, where d_i is the difference between the ranks of corresponding values of the two variables, and n is the number of observations; this yields values between -1 and 1, with 1 indicating perfect monotonic agreement. Another rank-based measure, Kendall's tau (\tau), evaluates the ordinal association between two variables by counting concordant and discordant pairs in their rankings. Developed by Maurice Kendall in 1938, it is more robust to outliers than Spearman's rho because it does not square rank differences, instead focusing on pairwise agreements. The coefficient is calculated as \tau = \frac{C - D}{\binom{n}{2}} = \frac{C - D}{n(n-1)/2}, where C is the number of concordant pairs, D is the number of discordant pairs, and \binom{n}{2} is the total number of pairs; like Spearman's, it ranges from -1 (perfect disagreement) to 1 (perfect agreement). For detecting any form of dependence, including non-monotonic non-linear relationships, distance correlation provides a more general approach. Proposed by Gábor J. Székely, Maria L. Rizzo, and Nail K. Bakirov in 2007, the sample distance correlation \operatorname{dCor}(X, Y) is defined as \operatorname{dCor}(X, Y) = \sqrt{ \frac{ V^2(X, Y) }{ \sqrt{ V^2(X, X) \, V^2(Y, Y) } } }, where V^2 denotes the squared distance covariance, computed from double-centered pairwise Euclidean distances between observations; it ranges from 0 to 1, with the population version equal to zero if and only if the variables are independent and equal to 1 only for exact linear relationships. A classic illustration of these measures' differences appears in a scatterplot of points where Y = X^2 for X ranging from -3 to 3: the Pearson correlation is 0 due to the symmetric non-linearity, Spearman's \rho_s \approx 0 as the relation is not monotonic, while the distance correlation is strictly positive, correctly signaling that the variables are dependent.
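A sketch comparing the measures on the Y = X^2 example (assuming NumPy and SciPy; the small hand-rolled distance-correlation helper follows the double-centering recipe above and is included only because SciPy does not provide one):

```python
import numpy as np
from scipy import stats

x = np.linspace(-3, 3, 61)
y = x ** 2  # deterministic, but non-monotonic and non-linear

def distance_correlation(u, v):
    """Sample distance correlation via double-centered pairwise distance matrices."""
    a = np.abs(u[:, None] - u[None, :])
    b = np.abs(v[:, None] - v[None, :])
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    return np.sqrt(dcov2 / np.sqrt((A * A).mean() * (B * B).mean()))

print(stats.pearsonr(x, y)[0])     # ~0: no linear association
print(stats.spearmanr(x, y)[0])    # ~0: no monotonic association
print(stats.kendalltau(x, y)[0])   # ~0: no ordinal association
print(distance_correlation(x, y))  # clearly positive: dependence is detected
```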