
Cross-covariance matrix

In probability theory and statistics, the cross-covariance matrix between two random vectors \mathbf{X} (of dimension k \times 1) and \mathbf{Y} (of dimension l \times 1) is a k \times l matrix whose (i,j)-th entry is the covariance between the i-th component of \mathbf{X} and the j-th component of \mathbf{Y}, formally defined as \operatorname{Cov}(\mathbf{X}, \mathbf{Y}) = E[(\mathbf{X} - E[\mathbf{X}])(\mathbf{Y} - E[\mathbf{Y}])^\top]. This matrix extends the scalar covariance to multivariate settings, capturing linear dependencies between elements of distinct random vectors, and reduces to the standard covariance matrix when \mathbf{X} = \mathbf{Y}.

Key properties of the cross-covariance matrix include its lack of symmetry in general (i.e., \operatorname{Cov}(\mathbf{X}, \mathbf{Y}) \neq \operatorname{Cov}(\mathbf{X}, \mathbf{Y})^\top unless \mathbf{X} = \mathbf{Y}), although it satisfies \operatorname{Cov}(\mathbf{X}, \mathbf{Y})^\top = \operatorname{Cov}(\mathbf{Y}, \mathbf{X}), and bilinearity under linear transformations: for matrices A and B and vectors \mathbf{a} and \mathbf{b}, \operatorname{Cov}(A\mathbf{X} + \mathbf{a}, B\mathbf{Y} + \mathbf{b}) = A \operatorname{Cov}(\mathbf{X}, \mathbf{Y}) B^\top. It can also be expressed in terms of cross-moments as \operatorname{Cov}(\mathbf{X}, \mathbf{Y}) = E[\mathbf{X}\mathbf{Y}^\top] - E[\mathbf{X}] E[\mathbf{Y}]^\top, highlighting its connection to expected values. These properties make the cross-covariance additive and homogeneous in each argument, facilitating computations in vector spaces.

The cross-covariance matrix plays a central role in multivariate analysis, enabling the assessment of joint variability between datasets, such as in canonical correlation analysis, where it informs relationships between two sets of variables. In time-series analysis, it defines covariance stationarity for vector processes, where the cross-covariance depends only on the time lag rather than absolute time, aiding in modeling dependencies in multivariate sequences. Applications extend to signal processing, where lagged cross-covariance measures similarity between shifted signals for tasks like filtering and detection, and to geostatistics for modeling spatial cross-dependencies in multivariate random fields. In neuroimaging, such as fMRI studies, empirical cross-covariance matrices quantify functional connectivity between brain regions across large-scale data.

Fundamentals

Definition

The cross-covariance matrix between two real-valued random vectors \mathbf{X} \in \mathbb{R}^n and \mathbf{Y} \in \mathbb{R}^m is defined as the n \times m matrix whose entries capture the pairwise covariances between the components of \mathbf{X} and \mathbf{Y}. Specifically, the (i,j)-th entry of this matrix is given by \operatorname{Cov}(X_i, Y_j) = \mathbb{E}[(X_i - \mathbb{E}[X_i])(Y_j - \mathbb{E}[Y_j])], where X_i is the i-th component of \mathbf{X} and Y_j is the j-th component of \mathbf{Y}. This covariance measures the expected value of the product of the centered components, quantifying their joint variability.

In matrix notation, the cross-covariance matrix \mathbf{K}_{XY} can be expressed compactly as \mathbf{K}_{XY} = \mathbb{E}[(\mathbf{X} - \boldsymbol{\mu}_X)(\mathbf{Y} - \boldsymbol{\mu}_Y)^T], where \boldsymbol{\mu}_X = \mathbb{E}[\mathbf{X}] and \boldsymbol{\mu}_Y = \mathbb{E}[\mathbf{Y}] are the mean vectors of \mathbf{X} and \mathbf{Y}, respectively. This formulation exhibits the matrix as the expected outer product of the centered vectors, providing a structured representation of their linear dependencies. The definition presupposes that \mathbf{X} and \mathbf{Y} possess finite second moments, ensuring the expectations exist and the covariances are well-defined. When \mathbf{X} = \mathbf{Y}, the cross-covariance matrix reduces to the standard covariance matrix. As a generalization of the scalar cross-covariance to multivariate settings, this concept emerged in the early twentieth century within the framework of multivariate statistics, particularly through Harold Hotelling's development of canonical correlation analysis in 1936.
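Written out entrywise, this definition arranges the pairwise covariances into the n \times m array

\mathbf{K}_{XY} = \begin{pmatrix} \operatorname{Cov}(X_1, Y_1) & \operatorname{Cov}(X_1, Y_2) & \cdots & \operatorname{Cov}(X_1, Y_m) \\ \operatorname{Cov}(X_2, Y_1) & \operatorname{Cov}(X_2, Y_2) & \cdots & \operatorname{Cov}(X_2, Y_m) \\ \vdots & \vdots & \ddots & \vdots \\ \operatorname{Cov}(X_n, Y_1) & \operatorname{Cov}(X_n, Y_2) & \cdots & \operatorname{Cov}(X_n, Y_m) \end{pmatrix}.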

Example

To illustrate the cross-covariance in the scalar case, consider two random variables X \sim \mathcal{N}(0,1) and Y = 2X + Z, where Z \sim \mathcal{N}(0,1) is independent of X. Since both have zero mean, the covariance is \operatorname{Cov}(X,Y) = \mathbb{E}[XY] = \mathbb{E}[X(2X + Z)] = 2\mathbb{E}[X^2] + \mathbb{E}[XZ] = 2 \cdot 1 + 0 = 2.

For a vector example, let \mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} where X_1 \sim \mathcal{N}(0,1) and X_2 \sim \mathcal{N}(0,1) are independent, and let Y_1 = 2X_1 + Z with Z \sim \mathcal{N}(0,1) independent of \mathbf{X}. The joint distribution is multivariate normal with zero means. The cross-covariance matrix is the 2 \times 1 matrix \mathbf{K}_{XY} = \begin{pmatrix} \operatorname{Cov}(X_1, Y_1) \\ \operatorname{Cov}(X_2, Y_1) \end{pmatrix} = \begin{pmatrix} \mathbb{E}[X_1 Y_1] \\ \mathbb{E}[X_2 Y_1] \end{pmatrix} = \begin{pmatrix} 2 \\ 0 \end{pmatrix}, using the linearity of covariance and independence: \operatorname{Cov}(X_1, Y_1) = 2 \operatorname{Var}(X_1) = 2 and \operatorname{Cov}(X_2, Y_1) = 0. The non-zero entry in \mathbf{K}_{XY} indicates linear dependence between the first component of \mathbf{X} and Y_1, while the zero entry reflects the absence of linear dependence between the second component of \mathbf{X} and Y_1.
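A brief simulation sketch of the vector example above, assuming NumPy is available (the sample size and seed are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# X = (X1, X2) with independent standard normal components; Y1 = 2*X1 + Z.
X1 = rng.standard_normal(n)
X2 = rng.standard_normal(n)
Z = rng.standard_normal(n)              # independent of X
Y1 = 2 * X1 + Z

X = np.column_stack([X1, X2])           # samples of the 2-vector X, shape (n, 2)
Xc = X - X.mean(axis=0)
Y1c = Y1 - Y1.mean()

# Empirical 2 x 1 cross-covariance matrix K_XY.
K_XY = (Xc.T @ Y1c / n).reshape(2, 1)
print(np.round(K_XY, 2))                # approximately [[2.], [0.]]
```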

Properties

General Properties

The cross-covariance matrix K_{XY} of two real-valued random vectors \mathbf{X} \in \mathbb{R}^p and \mathbf{Y} \in \mathbb{R}^q satisfies several fundamental algebraic properties arising from the linearity of expectation. Specifically, it is linear in the first argument: for a scalar a and random vector \mathbf{U} \in \mathbb{R}^p, K_{a\mathbf{X} + \mathbf{U}, \mathbf{Y}} = a K_{XY} + K_{UY}. Similarly, it is linear in the second argument: K_{\mathbf{X}, b\mathbf{Y} + \mathbf{V}} = b K_{XY} + K_{XV} for a scalar b and random vector \mathbf{V} \in \mathbb{R}^q. These properties follow directly from the bilinearity of the covariance, which extends the linearity of expectation to centered products of random variables.

The cross-covariance matrix is also intimately related to second-moment expectations. In general, K_{XY} = \mathbb{E}[(\mathbf{X} - \boldsymbol{\mu}_X)(\mathbf{Y} - \boldsymbol{\mu}_Y)^T], where \boldsymbol{\mu}_X = \mathbb{E}[\mathbf{X}] and \boldsymbol{\mu}_Y = \mathbb{E}[\mathbf{Y}] are the mean vectors. This centered form expands to the uncentered relation K_{XY} = \mathbb{E}[\mathbf{X} \mathbf{Y}^T] - \boldsymbol{\mu}_X \boldsymbol{\mu}_Y^T, highlighting its connection to the second mixed moment matrix minus the outer product of the means. The equivalence holds because the cross terms involving the means vanish under expectation when the vectors are centered.

A key transpose property is that the cross-covariance matrix between \mathbf{Y} and \mathbf{X} is the transpose of the original: K_{YX} = K_{XY}^T. This follows from the symmetry of the scalar covariance, since the (i,j)-th entry of K_{YX} is \mathrm{Cov}(Y_i, X_j) = \mathrm{Cov}(X_j, Y_i), which is the (j,i)-th entry of K_{XY} and hence the (i,j)-th entry of K_{XY}^T.

Regarding positive semi-definiteness, while K_{XY} itself is generally rectangular and neither symmetric nor positive semi-definite, it contributes to the structure of the joint covariance matrix for the stacked vector (\mathbf{X}^T, \mathbf{Y}^T)^T. This joint matrix takes the block form \begin{pmatrix} K_{XX} & K_{XY} \\ K_{YX} & K_{YY} \end{pmatrix}, which is symmetric and positive semi-definite. That is, for any real vector \mathbf{z} = (\mathbf{u}^T, \mathbf{v}^T)^T with \mathbf{u} \in \mathbb{R}^p and \mathbf{v} \in \mathbb{R}^q, \mathbf{z}^T \begin{pmatrix} K_{XX} & K_{XY} \\ K_{YX} & K_{YY} \end{pmatrix} \mathbf{z} = \mathrm{Var}(\mathbf{u}^T \mathbf{X} + \mathbf{v}^T \mathbf{Y}) \geq 0. This property ensures that the cross-covariance is compatible with valid joint distributions, as the joint covariance matrix must always be positive semi-definite.
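A minimal numerical sketch, assuming NumPy, of the transpose property and the positive semi-definiteness of the joint block matrix; the construction of correlated X and Y from a common latent vector is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q = 100_000, 3, 2

# Build correlated X and Y from a common latent vector W via a random mixing matrix.
W = rng.standard_normal((n, p + q))
M = rng.standard_normal((p + q, p + q))
J = W @ M.T                              # jointly distributed samples of (X, Y)
X, Y = J[:, :p], J[:, p:]

def cross_cov(A, B):
    """Empirical cross-covariance matrix K_AB = E[(A - mu_A)(B - mu_B)^T]."""
    Ac = A - A.mean(axis=0)
    Bc = B - B.mean(axis=0)
    return Ac.T @ Bc / len(A)

K_XY = cross_cov(X, Y)
K_YX = cross_cov(Y, X)
print(np.allclose(K_YX, K_XY.T))         # transpose property: K_YX = K_XY^T

# The joint covariance matrix [[K_XX, K_XY], [K_YX, K_YY]] is positive semi-definite.
joint = np.block([[cross_cov(X, X), K_XY],
                  [K_YX, cross_cov(Y, Y)]])
print(np.linalg.eigvalsh(joint).min() >= -1e-9)
```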

Relation to Covariance and Correlation

The cross-covariance matrix K_{XY} between two random vectors X and Y generalizes the concept of covariance to pairs of distinct vectors. A key special case occurs when X = Y, in which K_{XX} reduces to the standard covariance matrix \Sigma_X = \mathbb{E}[(X - \mathbb{E}[X])(X - \mathbb{E}[X])^\top], capturing the joint variability within a single vector.

The cross-correlation matrix extends the normalization applied to covariance matrices, providing a standardized measure of linear dependence between the components of X and Y whose entries are bounded in [-1, 1]. It is defined as R_{XY} = \operatorname{diag}(\Sigma_{XX})^{-1/2} K_{XY} \operatorname{diag}(\Sigma_{YY})^{-1/2}, where \Sigma_{XX} and \Sigma_{YY} are the covariance matrices of X and Y, respectively, and the diagonal matrices contain the square roots of the component variances (i.e., the standard deviations). This normalization divides each cross-covariance entry by the product of the corresponding standard deviations, yielding a scale-invariant dependence metric analogous to Pearson's correlation for univariate pairs.

In contrast to the cross-covariance matrix, which quantifies absolute covariation influenced by the magnitudes and units of X and Y, the cross-correlation matrix focuses on the relative strength of their linear relationships, facilitating comparisons across different vector scales or datasets. These distinctions mirror those between the covariance matrix (absolute joint variation) and the correlation matrix (standardized association) for a single vector, but apply to inter-vector dependencies. The cross-covariance matrix also supports analyses such as partial correlation, where it informs tests of association between variable subsets by examining residual dependencies after accounting for confounding factors via the precision matrix.
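A short sketch, assuming NumPy, of the normalization R_{XY} = \operatorname{diag}(\Sigma_{XX})^{-1/2} K_{XY} \operatorname{diag}(\Sigma_{YY})^{-1/2}; the helper name cross_corr and the test data are illustrative:

```python
import numpy as np

def cross_corr(X, Y):
    """Cross-correlation matrix: each entry of K_XY divided by the product of
    the corresponding component standard deviations, so entries lie in [-1, 1]."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    K_XY = Xc.T @ Yc / len(X)
    sx = Xc.std(axis=0)                  # standard deviations of X's components
    sy = Yc.std(axis=0)                  # standard deviations of Y's components
    return K_XY / np.outer(sx, sy)

rng = np.random.default_rng(3)
X = rng.standard_normal((50_000, 3))
# Y depends on the first two components of X with very different scales.
Y = X[:, :2] * np.array([5.0, -0.5]) + rng.standard_normal((50_000, 2))
print(np.round(cross_corr(X, Y), 2))     # scale-free measure of linear dependence
```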

Extensions

Definition for Complex Random Vectors

The cross-covariance matrix between two complex-valued random vectors \mathbf{X} \in \mathbb{C}^n and \mathbf{Y} \in \mathbb{C}^m with finite second moments is defined as \mathbf{K}_{XY} = E\left[ (\mathbf{X} - \mu_X) (\mathbf{Y} - \mu_Y)^H \right], where \mu_X = E[\mathbf{X}] and \mu_Y = E[\mathbf{Y}] are the mean vectors, and ^H denotes the conjugate (Hermitian) transpose. This adaptation of the real-valued case, which employs the ordinary transpose, incorporates conjugation to ensure the matrix captures meaningful linear dependencies in the complex domain, as the inner product on complex spaces is sesquilinear.

For the auto-covariance case where \mathbf{X} = \mathbf{Y}, the resulting \mathbf{K}_{XX} is a Hermitian positive semi-definite matrix satisfying \mathbf{K}_{XX}^H = \mathbf{K}_{XX}, which guarantees that its eigenvalues are real and non-negative under the finite second-moment assumption. Additionally, the cross-covariance satisfies the relation \mathbf{K}_{YX} = \mathbf{K}_{XY}^H, reflecting the Hermitian symmetry between the two vectors. The definition assumes that the second moments E[\|\mathbf{X}\|^2] and E[\|\mathbf{Y}\|^2] are finite so that the expectation exists; in many signal-processing applications, the vectors are further assumed to be circularly symmetric (proper complex), meaning the pseudo-covariance E[(\mathbf{X} - \mu_X)(\mathbf{Y} - \mu_Y)^T] = \mathbf{0}, which simplifies statistical properties without altering the core definition.
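A brief numerical sketch of the complex-valued definition, assuming NumPy; the circularly symmetric generator c_gauss and the linear relation between Y and X are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

def c_gauss(shape):
    """Circularly symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

X = c_gauss((n, 2))
Y = np.column_stack([3 * X[:, 0] + c_gauss(n), c_gauss(n)])   # Y_1 depends on X_1 only

Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)

# K_XY = E[(X - mu_X)(Y - mu_Y)^H]: note the conjugate applied to the Y factor.
K_XY = Xc.T @ np.conj(Yc) / n
K_YX = Yc.T @ np.conj(Xc) / n

print(np.round(K_XY, 2))                              # approximately [[3, 0], [0, 0]]
print(np.allclose(K_YX, K_XY.conj().T, atol=0.05))    # Hermitian relation K_YX = K_XY^H
```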

Uncorrelatedness

Two random vectors \mathbf{X} and \mathbf{Y} in \mathbb{R}^p and \mathbb{R}^q, respectively, are uncorrelated if their cross-covariance matrix \mathbf{K}_{\mathbf{XY}} = \mathbf{0}, the zero matrix of dimensions p \times q. This condition is equivalent to \mathbb{E}[X_i Y_j] = \mathbb{E}[X_i] \mathbb{E}[Y_j] for all components i = 1, \dots, p and j = 1, \dots, q, assuming the relevant second moments exist.

If two random vectors are independent, then they are uncorrelated, provided the second moments are finite; this follows from the property that independence implies \mathbb{E}[\mathbf{X} \mathbf{Y}^T] = \mathbb{E}[\mathbf{X}] \mathbb{E}[\mathbf{Y}^T]. The converse does not generally hold, as uncorrelated vectors can exhibit nonlinear dependence. However, for jointly Gaussian random vectors, uncorrelatedness is equivalent to statistical independence, a key property arising from the fact that the joint density factors into the marginals when the cross-covariance is zero.

For complex-valued random vectors \mathbf{X} and \mathbf{Y} in \mathbb{C}^p and \mathbb{C}^q, uncorrelatedness requires \mathbf{K}_{\mathbf{XY}} = \mathbf{0}, where \mathbf{K}_{\mathbf{XY}} = \mathbb{E}[(\mathbf{X} - \boldsymbol{\mu}_{\mathbf{X}})(\mathbf{Y} - \boldsymbol{\mu}_{\mathbf{Y}})^H] and ^H denotes the Hermitian transpose. For proper complex random vectors (those with zero pseudo-covariance), this condition suffices; more generally, full second-order uncorrelatedness also demands that the pseudo cross-covariance vanish, i.e., \mathbb{E}[X_i Y_j] = \mathbb{E}[X_i] \mathbb{E}[Y_j] (without conjugation) for all i, j.

The cross-covariance matrix underlies hypothesis tests for independence between random vectors, particularly in multivariate normal settings where the null hypothesis of independence corresponds to \mathbf{K}_{\mathbf{XY}} = \mathbf{0}; the likelihood ratio test, for instance, assesses this by comparing determinants involving the joint and marginal covariance structures.
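A small sketch, assuming NumPy, of the caveat that uncorrelatedness does not imply independence; the classical construction Y = X^2 is an illustrative choice, not drawn from the article's examples:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500_000

# X ~ N(0, 1) and Y = X^2 are uncorrelated (Cov(X, Y) = E[X^3] = 0),
# yet they are clearly dependent: knowing X determines Y exactly.
X = rng.standard_normal(n)
Y = X**2

cov_XY = np.mean((X - X.mean()) * (Y - Y.mean()))
print(round(cov_XY, 3))                                  # approximately 0

# The (nonlinear) dependence appears in higher moments, e.g. Cov(X^2, Y) = Var(X^2) = 2.
cov_X2Y = np.mean((X**2 - (X**2).mean()) * (Y - Y.mean()))
print(round(cov_X2Y, 3))                                 # approximately 2
```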

Estimation and Applications

Sample Cross-Covariance Matrix

The sample cross-covariance matrix provides an empirical estimate of the cross-covariance between two random vectors based on a finite set of paired observations. Given N independent and identically distributed (i.i.d.) samples \{ \mathbf{X}^{(k)}, \mathbf{Y}^{(k)} \}_{k=1}^N, where each \mathbf{X}^{(k)} is a p \times 1 vector and each \mathbf{Y}^{(k)} is a q \times 1 vector, the empirical estimator is computed as \hat{\mathbf{K}}_{XY} = \frac{1}{N} \sum_{k=1}^N (\mathbf{X}^{(k)} - \bar{\mathbf{X}}) (\mathbf{Y}^{(k)} - \bar{\mathbf{Y}})^T, where \bar{\mathbf{X}} = \frac{1}{N} \sum_{k=1}^N \mathbf{X}^{(k)} and \bar{\mathbf{Y}} = \frac{1}{N} \sum_{k=1}^N \mathbf{Y}^{(k)} are the sample means. This formula targets the population cross-covariance matrix through the centering of the vectors to remove mean effects.

An unbiased version of the estimator adjusts the scaling to account for degrees of freedom, yielding \hat{\mathbf{K}}_{XY}^{\text{unb}} = \frac{1}{N-1} \sum_{k=1}^N (\mathbf{X}^{(k)} - \bar{\mathbf{X}}) (\mathbf{Y}^{(k)} - \bar{\mathbf{Y}})^T, analogous to the unbiased sample covariance matrix. This adjustment ensures the estimator has zero bias for the population parameter under the i.i.d. assumption.

Computationally, the process begins with centering the data by subtracting the sample means from each paired observation, followed by forming the outer product (\mathbf{X}^{(k)} - \bar{\mathbf{X}}) (\mathbf{Y}^{(k)} - \bar{\mathbf{Y}})^T for each k and averaging these p \times q matrices across all samples. The paired nature of the data requires that each \mathbf{X}^{(k)} corresponds directly to \mathbf{Y}^{(k)}, ensuring alignment in the observations. These steps assume i.i.d. paired samples from the underlying joint distribution.
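A compact sketch of this estimator, assuming NumPy; the function name sample_cross_cov, the unbiased flag, and the test data are illustrative choices:

```python
import numpy as np

def sample_cross_cov(X, Y, unbiased=True):
    """Sample cross-covariance matrix of paired observations.

    X has shape (N, p) and Y has shape (N, q), with row k holding the paired
    samples X^(k), Y^(k). Returns a p x q matrix; divides by N-1 when unbiased,
    otherwise by N."""
    if len(X) != len(Y):
        raise ValueError("X and Y must contain the same number of paired samples")
    N = len(X)
    Xc = X - X.mean(axis=0)              # center by the sample means
    Yc = Y - Y.mean(axis=0)
    denom = N - 1 if unbiased else N
    return Xc.T @ Yc / denom

# Usage: paired draws with a known linear relation Y = X B + noise.
rng = np.random.default_rng(6)
X = rng.standard_normal((10_000, 3))
B = np.array([[1.0, 0.0], [0.0, 2.0], [0.5, -1.0]])
Y = X @ B + 0.1 * rng.standard_normal((10_000, 2))
print(np.round(sample_cross_cov(X, Y), 2))   # close to B, since Cov(X) = I
```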

Applications in Statistics and Signal Processing

In statistics, the cross-covariance matrix is integral to multivariate linear regression, where the population regression coefficient matrix \beta for the model \mathbf{Y} = \mathbf{X}\beta + \boldsymbol{\epsilon} is derived as \beta = \Sigma_{\mathbf{X}\mathbf{X}}^{-1} \Sigma_{\mathbf{X}\mathbf{Y}}, with \Sigma_{\mathbf{X}\mathbf{Y}} representing the cross-covariance between the predictors \mathbf{X} and responses \mathbf{Y}. The ordinary least squares (OLS) estimator employs the sample cross-covariance analog, \hat{\Sigma}_{\mathbf{X}\mathbf{Y}} = \frac{1}{n} \mathbf{X}^T \mathbf{Y} (after centering), to compute \hat{\beta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{Y}, enabling prediction of multiple responses from shared predictors while accounting for inter-response correlations. Similarly, in canonical correlation analysis (CCA), the method maximizes the correlations between linear combinations of two multivariate sets \mathbf{X} and \mathbf{Y} by computing the singular value decomposition of the normalized cross-covariance matrix \Sigma_{\mathbf{X}\mathbf{X}}^{-1/2} \Sigma_{\mathbf{X}\mathbf{Y}} \Sigma_{\mathbf{Y}\mathbf{Y}}^{-1/2}, or equivalently, by solving an eigenvalue problem for a product of matrices involving \Sigma_{\mathbf{X}\mathbf{Y}}, to identify paired canonical variates.

In signal processing, the matrix facilitates frequency-domain analysis through the cross-spectral density, defined as the Fourier transform of the cross-covariance function R_{xy}(\tau), which quantifies coherent power transfer between two signals at different frequencies and supports applications like coherence estimation. It is also employed in adaptive beamforming algorithms, where the array covariance matrix (incorporating cross-covariance terms between sensor signals) is inverted to derive optimal weights that steer nulls toward interferers and enhance the signal-of-interest direction in uniform linear arrays. For noise cancellation, multichannel adaptive filters utilize the cross-covariance between reference noise signals and the primary contaminated signal to minimize residual noise in the output, as seen in generalized sidelobe canceller structures that subtract correlated noise components.

A practical example arises in time series analysis, where the lagged cross-covariance function \gamma_{XY}(h) = \mathrm{Cov}(X_t, Y_{t+h}) identifies lead-lag relationships; the lag h maximizing |\gamma_{XY}(h)| reveals the temporal offset, such as one economic indicator preceding another, aiding forecasting with prewhitened models. In modern machine learning, particularly within kernel methods during the 2020s, the cross-covariance operator \mathcal{C}_{YX} in reproducing kernel Hilbert spaces enables feature extraction by capturing dependencies between input and output views, as in kernel CCA for dimensionality reduction and in transfer learning tasks that align distributions across domains.
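A compact sketch, assuming NumPy, of how the cross-covariance matrix enters CCA through the SVD of the whitened matrix \Sigma_{\mathbf{X}\mathbf{X}}^{-1/2} \Sigma_{\mathbf{X}\mathbf{Y}} \Sigma_{\mathbf{Y}\mathbf{Y}}^{-1/2}; the function name cca_via_cross_cov, the ridge term reg, and the synthetic latent-variable data are illustrative assumptions, not a reference implementation:

```python
import numpy as np

def cca_via_cross_cov(X, Y, reg=1e-8):
    """Canonical correlations via the SVD of S_XX^{-1/2} S_XY S_YY^{-1/2};
    reg adds a small ridge to the marginal covariances for numerical stability."""
    n = len(X)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    S_XX = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    S_YY = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    S_XY = Xc.T @ Yc / n                      # sample cross-covariance matrix

    def inv_sqrt(S):
        w, V = np.linalg.eigh(S)              # S is symmetric positive definite
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    M = inv_sqrt(S_XX) @ S_XY @ inv_sqrt(S_YY)
    U, s, Vt = np.linalg.svd(M)
    # Singular values s are the canonical correlations; un-whitening the singular
    # vectors recovers the canonical weight vectors for X and Y.
    a = inv_sqrt(S_XX) @ U
    b = inv_sqrt(S_YY) @ Vt.T
    return s, a, b

# Usage: two views sharing one latent factor, so only one strong canonical pair exists.
rng = np.random.default_rng(7)
latent = rng.standard_normal(5_000)
X = np.column_stack([latent + 0.5 * rng.standard_normal(5_000),
                     rng.standard_normal(5_000)])
Y = np.column_stack([2 * latent + 0.5 * rng.standard_normal(5_000),
                     rng.standard_normal(5_000)])
corrs, _, _ = cca_via_cross_cov(X, Y)
print(np.round(corrs, 2))                     # first canonical correlation large, second near 0
```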