
Cross-covariance matrix

In probability theory and statistics, the cross-covariance matrix between two random vectors \mathbf{X} (of dimension k \times 1) and \mathbf{Y} (of dimension l \times 1) is a k \times l matrix whose (i,j)-th entry is the covariance between the i-th component of \mathbf{X} and the j-th component of \mathbf{Y}, formally defined as \operatorname{Cov}(\mathbf{X}, \mathbf{Y}) = E[(\mathbf{X} - E[\mathbf{X}])(\mathbf{Y} - E[\mathbf{Y}])^\top]. This matrix extends the scalar covariance to multivariate settings, capturing linear dependencies between elements of distinct random vectors, and reduces to the standard covariance matrix when \mathbf{X} = \mathbf{Y}.

Key properties of the cross-covariance matrix include its lack of symmetry in general (i.e., \operatorname{Cov}(\mathbf{X}, \mathbf{Y}) \neq \operatorname{Cov}(\mathbf{X}, \mathbf{Y})^\top unless \mathbf{X} = \mathbf{Y}), although it satisfies \operatorname{Cov}(\mathbf{X}, \mathbf{Y})^\top = \operatorname{Cov}(\mathbf{Y}, \mathbf{X}), and bilinearity under linear transformations: for matrices A and B and vectors \mathbf{a} and \mathbf{b}, \operatorname{Cov}(A\mathbf{X} + \mathbf{a}, B\mathbf{Y} + \mathbf{b}) = A \operatorname{Cov}(\mathbf{X}, \mathbf{Y}) B^\top. It can also be expressed in terms of cross-moments as \operatorname{Cov}(\mathbf{X}, \mathbf{Y}) = E[\mathbf{X}\mathbf{Y}^\top] - E[\mathbf{X}] E[\mathbf{Y}]^\top, highlighting its connection to expected values. These properties make the cross-covariance additive and homogeneous in each argument, facilitating computations in vector spaces.

The cross-covariance matrix plays a central role in multivariate analysis, enabling the assessment of joint variability between datasets, such as in canonical correlation analysis, where it informs relationships between two sets of variables. In time-series analysis, it defines covariance stationarity for vector processes, where the cross-covariance depends only on the time lag rather than absolute time, aiding in modeling dependencies in multivariate sequences. Applications extend to signal processing, where lagged cross-covariance measures similarity between shifted signals for tasks like filtering and detection, and to geostatistics for modeling spatial cross-dependencies in multivariate random fields. In neuroimaging, such as fMRI studies, empirical cross-covariance matrices quantify functional connectivity between brain regions across large-scale data.

Fundamentals

Definition

The cross-covariance matrix between two real-valued random vectors \mathbf{X} \in \mathbb{R}^n and \mathbf{Y} \in \mathbb{R}^m is defined as the n \times m matrix whose entries capture the pairwise covariances between the components of \mathbf{X} and \mathbf{Y}. Specifically, the (i,j)-th entry of this matrix is given by \operatorname{Cov}(X_i, Y_j) = \mathbb{E}[(X_i - \mathbb{E}[X_i])(Y_j - \mathbb{E}[Y_j])], where X_i is the i-th component of \mathbf{X} and Y_j is the j-th component of \mathbf{Y}. This covariance measures the expected value of the product of the centered components, quantifying their joint variability.

In matrix notation, the cross-covariance matrix \mathbf{K}_{XY} can be expressed compactly as \mathbf{K}_{XY} = \mathbb{E}[(\mathbf{X} - \boldsymbol{\mu}_X)(\mathbf{Y} - \boldsymbol{\mu}_Y)^T], where \boldsymbol{\mu}_X = \mathbb{E}[\mathbf{X}] and \boldsymbol{\mu}_Y = \mathbb{E}[\mathbf{Y}] are the mean vectors of \mathbf{X} and \mathbf{Y}, respectively. This formulation exhibits the matrix as the expected outer product of the centered vectors, providing a structured representation of their linear dependencies. The definition presupposes that \mathbf{X} and \mathbf{Y} possess finite second moments, ensuring the expectations exist and the covariances are well-defined. When \mathbf{X} = \mathbf{Y}, the cross-covariance matrix reduces to the standard covariance matrix. As a generalization of the scalar cross-covariance to multivariate settings, this concept emerged in the early twentieth century within the framework of multivariate statistics, particularly through Harold Hotelling's development of canonical correlation analysis in 1936.
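Written out entrywise, this definition arranges the pairwise covariances into the n \times m array

\mathbf{K}_{XY} = \begin{pmatrix} \operatorname{Cov}(X_1, Y_1) & \operatorname{Cov}(X_1, Y_2) & \cdots & \operatorname{Cov}(X_1, Y_m) \\ \operatorname{Cov}(X_2, Y_1) & \operatorname{Cov}(X_2, Y_2) & \cdots & \operatorname{Cov}(X_2, Y_m) \\ \vdots & \vdots & \ddots & \vdots \\ \operatorname{Cov}(X_n, Y_1) & \operatorname{Cov}(X_n, Y_2) & \cdots & \operatorname{Cov}(X_n, Y_m) \end{pmatrix}.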

Example

To illustrate the cross-covariance in the scalar case, consider two random variables X \sim \mathcal{N}(0,1) and Y = 2X + Z, where Z \sim \mathcal{N}(0,1) is independent of X. Since both have zero mean, the covariance is \operatorname{Cov}(X,Y) = \mathbb{E}[XY] = \mathbb{E}[X(2X + Z)] = 2\mathbb{E}[X^2] + \mathbb{E}[XZ] = 2 \cdot 1 + 0 = 2.

For a vector example, let \mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} where X_1 \sim \mathcal{N}(0,1) and X_2 \sim \mathcal{N}(0,1) are independent, and let Y_1 = 2X_1 + Z with Z \sim \mathcal{N}(0,1) independent of \mathbf{X}. The joint distribution is multivariate normal with zero means. The cross-covariance matrix is the 2 \times 1 matrix \mathbf{K}_{XY} = \begin{pmatrix} \operatorname{Cov}(X_1, Y_1) \\ \operatorname{Cov}(X_2, Y_1) \end{pmatrix} = \begin{pmatrix} \mathbb{E}[X_1 Y_1] \\ \mathbb{E}[X_2 Y_1] \end{pmatrix} = \begin{pmatrix} 2 \\ 0 \end{pmatrix}, using the linearity of covariance and independence: \operatorname{Cov}(X_1, Y_1) = 2 \operatorname{Var}(X_1) = 2 and \operatorname{Cov}(X_2, Y_1) = 0. The non-zero entry in \mathbf{K}_{XY} indicates linear dependence between the first component of \mathbf{X} and Y_1, while the zero entry reflects the absence of linear dependence between the second component of \mathbf{X} and Y_1.
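A brief simulation sketch of the vector example above, assuming NumPy is available (the sample size and seed are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# X = (X1, X2) with independent standard normal components; Y1 = 2*X1 + Z.
X1 = rng.standard_normal(n)
X2 = rng.standard_normal(n)
Z = rng.standard_normal(n)              # independent of X
Y1 = 2 * X1 + Z

X = np.column_stack([X1, X2])           # samples of the 2-vector X, shape (n, 2)
Xc = X - X.mean(axis=0)
Y1c = Y1 - Y1.mean()

# Empirical 2 x 1 cross-covariance matrix K_XY.
K_XY = (Xc.T @ Y1c / n).reshape(2, 1)
print(np.round(K_XY, 2))                # approximately [[2.], [0.]]
```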

Properties

General Properties

The cross-covariance matrix K_{XY} of two real-valued random vectors \mathbf{X} \in \mathbb{R}^p and \mathbf{Y} \in \mathbb{R}^q satisfies several fundamental algebraic properties arising from the linearity of expectation. Specifically, it is linear in the first argument: for a scalar a and random vector \mathbf{U} \in \mathbb{R}^p, K_{a\mathbf{X} + \mathbf{U}, \mathbf{Y}} = a K_{XY} + K_{UY}. Similarly, it is linear in the second argument: K_{\mathbf{X}, b\mathbf{Y} + \mathbf{V}} = b K_{XY} + K_{XV} for a scalar b and random vector \mathbf{V} \in \mathbb{R}^q. These properties follow directly from the bilinearity of the covariance, which extends the linearity of expectation to centered products of random variables.

The cross-covariance matrix is also intimately related to second-moment expectations. In general, K_{XY} = \mathbb{E}[(\mathbf{X} - \boldsymbol{\mu}_X)(\mathbf{Y} - \boldsymbol{\mu}_Y)^T], where \boldsymbol{\mu}_X = \mathbb{E}[\mathbf{X}] and \boldsymbol{\mu}_Y = \mathbb{E}[\mathbf{Y}] are the mean vectors. This centered form expands to the uncentered relation K_{XY} = \mathbb{E}[\mathbf{X} \mathbf{Y}^T] - \boldsymbol{\mu}_X \boldsymbol{\mu}_Y^T, highlighting its connection to the second mixed moment matrix minus the outer product of the means. The equivalence holds because the cross terms involving the means vanish under expectation when the vectors are centered.

A key transpose property is that the cross-covariance matrix between \mathbf{Y} and \mathbf{X} is the transpose of the original: K_{YX} = K_{XY}^T. This follows from the symmetry of the scalar covariance, since the (i,j)-th entry of K_{YX} is \mathrm{Cov}(Y_i, X_j) = \mathrm{Cov}(X_j, Y_i), which is the (j,i)-th entry of K_{XY} and hence the (i,j)-th entry of K_{XY}^T.

Regarding positive semi-definiteness, while K_{XY} itself is generally rectangular and neither symmetric nor positive semi-definite, it contributes to the structure of the joint covariance matrix for the stacked vector (\mathbf{X}^T, \mathbf{Y}^T)^T. This joint matrix takes the block form \begin{pmatrix} K_{XX} & K_{XY} \\ K_{YX} & K_{YY} \end{pmatrix}, which is symmetric and positive semi-definite. That is, for any real vector \mathbf{z} = (\mathbf{u}^T, \mathbf{v}^T)^T with \mathbf{u} \in \mathbb{R}^p and \mathbf{v} \in \mathbb{R}^q, \mathbf{z}^T \begin{pmatrix} K_{XX} & K_{XY} \\ K_{YX} & K_{YY} \end{pmatrix} \mathbf{z} = \mathrm{Var}(\mathbf{u}^T \mathbf{X} + \mathbf{v}^T \mathbf{Y}) \geq 0. This property ensures that the cross-covariance is compatible with valid joint distributions, as the joint covariance matrix must always be positive semi-definite.
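A minimal numerical sketch, assuming NumPy, of the transpose property and the positive semi-definiteness of the joint block matrix; the construction of correlated X and Y from a common latent vector is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q = 100_000, 3, 2

# Build correlated X and Y from a common latent vector W via a random mixing matrix.
W = rng.standard_normal((n, p + q))
M = rng.standard_normal((p + q, p + q))
J = W @ M.T                              # jointly distributed samples of (X, Y)
X, Y = J[:, :p], J[:, p:]

def cross_cov(A, B):
    """Empirical cross-covariance matrix K_AB = E[(A - mu_A)(B - mu_B)^T]."""
    Ac = A - A.mean(axis=0)
    Bc = B - B.mean(axis=0)
    return Ac.T @ Bc / len(A)

K_XY = cross_cov(X, Y)
K_YX = cross_cov(Y, X)
print(np.allclose(K_YX, K_XY.T))         # transpose property: K_YX = K_XY^T

# The joint covariance matrix [[K_XX, K_XY], [K_YX, K_YY]] is positive semi-definite.
joint = np.block([[cross_cov(X, X), K_XY],
                  [K_YX, cross_cov(Y, Y)]])
print(np.linalg.eigvalsh(joint).min() >= -1e-9)
```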

Relation to Covariance and Correlation

The cross-covariance matrix K_{XY} between two random vectors X and Y generalizes the concept of covariance to pairs of distinct vectors. A key special case occurs when X = Y, in which K_{XX} reduces to the standard covariance matrix \Sigma_X = \mathbb{E}[(X - \mathbb{E}[X])(X - \mathbb{E}[X])^\top], capturing the joint variability within a single vector.

The cross-correlation matrix extends the normalization applied to covariance matrices, providing a standardized measure of linear dependence between the components of X and Y whose entries are bounded in [-1, 1]. It is defined as R_{XY} = \operatorname{diag}(\Sigma_{XX})^{-1/2} K_{XY} \operatorname{diag}(\Sigma_{YY})^{-1/2}, where \Sigma_{XX} and \Sigma_{YY} are the covariance matrices of X and Y, respectively, and the diagonal matrices contain the square roots of the component variances (i.e., the standard deviations). This normalization divides each cross-covariance entry by the product of the corresponding standard deviations, yielding a scale-invariant dependence metric analogous to Pearson's correlation for univariate pairs.

In contrast to the cross-covariance matrix, which quantifies absolute covariation influenced by the magnitudes and units of X and Y, the cross-correlation matrix focuses on the relative strength of their linear relationships, facilitating comparisons across different vector scales or datasets. These distinctions mirror those between the covariance matrix (absolute joint variation) and the correlation matrix (standardized association) for a single vector, but apply to inter-vector dependencies. The cross-covariance matrix also supports analyses such as partial correlation, where it informs tests of association between variable subsets by examining residual dependencies after accounting for confounding factors via the precision matrix.
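A short sketch, assuming NumPy, of the normalization R_{XY} = \operatorname{diag}(\Sigma_{XX})^{-1/2} K_{XY} \operatorname{diag}(\Sigma_{YY})^{-1/2}; the helper name cross_corr and the test data are illustrative:

```python
import numpy as np

def cross_corr(X, Y):
    """Cross-correlation matrix: each entry of K_XY divided by the product of
    the corresponding component standard deviations, so entries lie in [-1, 1]."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    K_XY = Xc.T @ Yc / len(X)
    sx = Xc.std(axis=0)                  # standard deviations of X's components
    sy = Yc.std(axis=0)                  # standard deviations of Y's components
    return K_XY / np.outer(sx, sy)

rng = np.random.default_rng(3)
X = rng.standard_normal((50_000, 3))
# Y depends on the first two components of X with very different scales.
Y = X[:, :2] * np.array([5.0, -0.5]) + rng.standard_normal((50_000, 2))
print(np.round(cross_corr(X, Y), 2))     # scale-free measure of linear dependence
```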

Extensions

Definition for Complex Random Vectors

The cross-covariance matrix between two complex-valued random vectors \mathbf{X} \in \mathbb{C}^n and \mathbf{Y} \in \mathbb{C}^m with finite second moments is defined as \mathbf{K}_{XY} = E\left[ (\mathbf{X} - \mu_X) (\mathbf{Y} - \mu_Y)^H \right], where \mu_X = E[\mathbf{X}] and \mu_Y = E[\mathbf{Y}] are the mean vectors, and ^H denotes the conjugate (Hermitian) transpose. This adaptation of the real-valued case, which employs the ordinary transpose, incorporates conjugation to ensure the matrix captures meaningful linear dependencies in the complex domain, as the inner product on complex spaces is sesquilinear.

For the auto-covariance case where \mathbf{X} = \mathbf{Y}, the resulting \mathbf{K}_{XX} is a Hermitian positive semi-definite matrix satisfying \mathbf{K}_{XX}^H = \mathbf{K}_{XX}, which guarantees that its eigenvalues are real and non-negative under the finite second-moment assumption. Additionally, the cross-covariance satisfies the relation \mathbf{K}_{YX} = \mathbf{K}_{XY}^H, reflecting the Hermitian symmetry between the two vectors. The definition assumes that the second moments E[\|\mathbf{X}\|^2] and E[\|\mathbf{Y}\|^2] are finite so that the expectation exists; in many signal-processing applications, the vectors are further assumed to be circularly symmetric (proper complex), meaning the pseudo-covariance E[(\mathbf{X} - \mu_X)(\mathbf{Y} - \mu_Y)^T] = \mathbf{0}, which simplifies statistical properties without altering the core definition.
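A brief numerical sketch of the complex-valued definition, assuming NumPy; the circularly symmetric generator c_gauss and the linear relation between Y and X are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

def c_gauss(shape):
    """Circularly symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

X = c_gauss((n, 2))
Y = np.column_stack([3 * X[:, 0] + c_gauss(n), c_gauss(n)])   # Y_1 depends on X_1 only

Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)

# K_XY = E[(X - mu_X)(Y - mu_Y)^H]: note the conjugate applied to the Y factor.
K_XY = Xc.T @ np.conj(Yc) / n
K_YX = Yc.T @ np.conj(Xc) / n

print(np.round(K_XY, 2))                              # approximately [[3, 0], [0, 0]]
print(np.allclose(K_YX, K_XY.conj().T, atol=0.05))    # Hermitian relation K_YX = K_XY^H
```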

Uncorrelatedness

Two random vectors \mathbf{X} and \mathbf{Y} in \mathbb{R}^p and \mathbb{R}^q, respectively, are uncorrelated if their cross-covariance matrix \mathbf{K}_{\mathbf{XY}} = \mathbf{0}, the zero matrix of dimensions p \times q. This condition is equivalent to \mathbb{E}[X_i Y_j] = \mathbb{E}[X_i] \mathbb{E}[Y_j] for all components i = 1, \dots, p and j = 1, \dots, q, assuming the relevant second moments exist.

If two random vectors are independent, then they are uncorrelated, provided the second moments are finite; this follows from the property that independence implies \mathbb{E}[\mathbf{X} \mathbf{Y}^T] = \mathbb{E}[\mathbf{X}] \mathbb{E}[\mathbf{Y}^T]. The converse does not generally hold, as uncorrelated vectors can exhibit nonlinear dependence. However, for jointly Gaussian random vectors, uncorrelatedness is equivalent to statistical independence, a key property arising from the fact that the joint density factors into the marginals when the cross-covariance is zero.

For complex-valued random vectors \mathbf{X} and \mathbf{Y} in \mathbb{C}^p and \mathbb{C}^q, uncorrelatedness requires \mathbf{K}_{\mathbf{XY}} = \mathbf{0}, where \mathbf{K}_{\mathbf{XY}} = \mathbb{E}[(\mathbf{X} - \boldsymbol{\mu}_{\mathbf{X}})(\mathbf{Y} - \boldsymbol{\mu}_{\mathbf{Y}})^H] and ^H denotes the Hermitian transpose. For proper complex random vectors (those with zero pseudo-covariance), this condition suffices; more generally, full second-order uncorrelatedness also demands that the pseudo cross-covariance vanish, i.e., \mathbb{E}[X_i Y_j] = \mathbb{E}[X_i] \mathbb{E}[Y_j] (without conjugation) for all i, j.

The cross-covariance matrix underlies hypothesis tests for independence between random vectors, particularly in multivariate normal settings where the null hypothesis of independence corresponds to \mathbf{K}_{\mathbf{XY}} = \mathbf{0}; the likelihood ratio test, for instance, assesses this by comparing determinants involving the joint and marginal covariance structures.
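A small sketch, assuming NumPy, of the caveat that uncorrelatedness does not imply independence; the classical construction Y = X^2 is an illustrative choice, not drawn from the article's examples:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500_000

# X ~ N(0, 1) and Y = X^2 are uncorrelated (Cov(X, Y) = E[X^3] = 0),
# yet they are clearly dependent: knowing X determines Y exactly.
X = rng.standard_normal(n)
Y = X**2

cov_XY = np.mean((X - X.mean()) * (Y - Y.mean()))
print(round(cov_XY, 3))                                  # approximately 0

# The (nonlinear) dependence appears in higher moments, e.g. Cov(X^2, Y) = Var(X^2) = 2.
cov_X2Y = np.mean((X**2 - (X**2).mean()) * (Y - Y.mean()))
print(round(cov_X2Y, 3))                                 # approximately 2
```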

Estimation and Applications

Sample Cross-Covariance Matrix

The sample cross-covariance matrix provides an empirical estimate of the cross-covariance between two random vectors based on a finite set of paired observations. Given N independent and identically distributed (i.i.d.) samples \{ \mathbf{X}^{(k)}, \mathbf{Y}^{(k)} \}_{k=1}^N, where each \mathbf{X}^{(k)} is a p \times 1 vector and each \mathbf{Y}^{(k)} is a q \times 1 vector, the empirical estimator is computed as \hat{\mathbf{K}}_{XY} = \frac{1}{N} \sum_{k=1}^N (\mathbf{X}^{(k)} - \bar{\mathbf{X}}) (\mathbf{Y}^{(k)} - \bar{\mathbf{Y}})^T, where \bar{\mathbf{X}} = \frac{1}{N} \sum_{k=1}^N \mathbf{X}^{(k)} and \bar{\mathbf{Y}} = \frac{1}{N} \sum_{k=1}^N \mathbf{Y}^{(k)} are the sample means. This formula targets the population cross-covariance matrix through the centering of the vectors to remove mean effects.

An unbiased version of the estimator adjusts the scaling to account for degrees of freedom, yielding \hat{\mathbf{K}}_{XY}^{\text{unb}} = \frac{1}{N-1} \sum_{k=1}^N (\mathbf{X}^{(k)} - \bar{\mathbf{X}}) (\mathbf{Y}^{(k)} - \bar{\mathbf{Y}})^T, analogous to the unbiased sample covariance matrix. This adjustment ensures the estimator has zero bias for the population parameter under the i.i.d. assumption.

Computationally, the process begins with centering the data by subtracting the sample means from each paired observation, followed by forming the outer product (\mathbf{X}^{(k)} - \bar{\mathbf{X}}) (\mathbf{Y}^{(k)} - \bar{\mathbf{Y}})^T for each k and averaging these p \times q matrices across all samples. The paired nature of the data requires that each \mathbf{X}^{(k)} corresponds directly to \mathbf{Y}^{(k)}, ensuring alignment in the observations. These steps assume i.i.d. paired samples from the underlying joint distribution.
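A compact sketch of this estimator, assuming NumPy; the function name sample_cross_cov, the unbiased flag, and the test data are illustrative choices:

```python
import numpy as np

def sample_cross_cov(X, Y, unbiased=True):
    """Sample cross-covariance matrix of paired observations.

    X has shape (N, p) and Y has shape (N, q), with row k holding the paired
    samples X^(k), Y^(k). Returns a p x q matrix; divides by N-1 when unbiased,
    otherwise by N."""
    if len(X) != len(Y):
        raise ValueError("X and Y must contain the same number of paired samples")
    N = len(X)
    Xc = X - X.mean(axis=0)              # center by the sample means
    Yc = Y - Y.mean(axis=0)
    denom = N - 1 if unbiased else N
    return Xc.T @ Yc / denom

# Usage: paired draws with a known linear relation Y = X B + noise.
rng = np.random.default_rng(6)
X = rng.standard_normal((10_000, 3))
B = np.array([[1.0, 0.0], [0.0, 2.0], [0.5, -1.0]])
Y = X @ B + 0.1 * rng.standard_normal((10_000, 2))
print(np.round(sample_cross_cov(X, Y), 2))   # close to B, since Cov(X) = I
```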

Applications in Statistics and Signal Processing

In statistics, the cross-covariance matrix is integral to multivariate linear regression, where the population regression coefficient matrix \beta for the model \mathbf{Y} = \mathbf{X}\beta + \boldsymbol{\epsilon} is derived as \beta = \Sigma_{\mathbf{X}\mathbf{X}}^{-1} \Sigma_{\mathbf{X}\mathbf{Y}}, with \Sigma_{\mathbf{X}\mathbf{Y}} representing the cross-covariance between the predictors \mathbf{X} and responses \mathbf{Y}. The ordinary least squares (OLS) estimator employs the sample cross-covariance analog, \hat{\Sigma}_{\mathbf{X}\mathbf{Y}} = \frac{1}{n} \mathbf{X}^T \mathbf{Y} (after centering), to compute \hat{\beta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{Y}, enabling prediction of multiple responses from shared predictors while accounting for inter-response correlations. Similarly, in canonical correlation analysis (CCA), the method maximizes the correlations between linear combinations of two multivariate sets \mathbf{X} and \mathbf{Y} by computing the singular value decomposition of the normalized cross-covariance matrix \Sigma_{\mathbf{X}\mathbf{X}}^{-1/2} \Sigma_{\mathbf{X}\mathbf{Y}} \Sigma_{\mathbf{Y}\mathbf{Y}}^{-1/2}, or equivalently, by solving an eigenvalue problem for a product of matrices involving \Sigma_{\mathbf{X}\mathbf{Y}}, to identify paired canonical variates.

In signal processing, the matrix facilitates frequency-domain analysis through the cross-spectral density, defined as the Fourier transform of the cross-covariance function R_{xy}(\tau), which quantifies coherent power transfer between two signals at different frequencies and supports applications like coherence estimation. It is also employed in adaptive beamforming algorithms, where the array covariance matrix (incorporating cross-covariance terms between sensor signals) is inverted to derive optimal weights that steer nulls toward interferers and enhance the signal-of-interest direction in uniform linear arrays. For noise cancellation, multichannel adaptive filters utilize the cross-covariance between reference noise signals and the primary contaminated signal to minimize residual noise in the output, as seen in generalized sidelobe canceller structures that subtract correlated noise components.

A practical example arises in time series analysis, where the lagged cross-covariance function \gamma_{XY}(h) = \mathrm{Cov}(X_t, Y_{t+h}) identifies lead-lag relationships; the lag h maximizing |\gamma_{XY}(h)| reveals the temporal offset, such as one economic indicator preceding another, aiding forecasting with prewhitened models. In modern machine learning, particularly within kernel methods during the 2020s, the cross-covariance operator \mathcal{C}_{YX} in reproducing kernel Hilbert spaces enables feature extraction by capturing dependencies between input and output views, as in kernel CCA for dimensionality reduction and in transfer learning tasks that align distributions across domains.
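A compact sketch, assuming NumPy, of how the cross-covariance matrix enters CCA through the SVD of the whitened matrix \Sigma_{\mathbf{X}\mathbf{X}}^{-1/2} \Sigma_{\mathbf{X}\mathbf{Y}} \Sigma_{\mathbf{Y}\mathbf{Y}}^{-1/2}; the function name cca_via_cross_cov, the ridge term reg, and the synthetic latent-variable data are illustrative assumptions, not a reference implementation:

```python
import numpy as np

def cca_via_cross_cov(X, Y, reg=1e-8):
    """Canonical correlations via the SVD of S_XX^{-1/2} S_XY S_YY^{-1/2};
    reg adds a small ridge to the marginal covariances for numerical stability."""
    n = len(X)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    S_XX = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    S_YY = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    S_XY = Xc.T @ Yc / n                      # sample cross-covariance matrix

    def inv_sqrt(S):
        w, V = np.linalg.eigh(S)              # S is symmetric positive definite
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    M = inv_sqrt(S_XX) @ S_XY @ inv_sqrt(S_YY)
    U, s, Vt = np.linalg.svd(M)
    # Singular values s are the canonical correlations; un-whitening the singular
    # vectors recovers the canonical weight vectors for X and Y.
    a = inv_sqrt(S_XX) @ U
    b = inv_sqrt(S_YY) @ Vt.T
    return s, a, b

# Usage: two views sharing one latent factor, so only one strong canonical pair exists.
rng = np.random.default_rng(7)
latent = rng.standard_normal(5_000)
X = np.column_stack([latent + 0.5 * rng.standard_normal(5_000),
                     rng.standard_normal(5_000)])
Y = np.column_stack([2 * latent + 0.5 * rng.standard_normal(5_000),
                     rng.standard_normal(5_000)])
corrs, _, _ = cca_via_cross_cov(X, Y)
print(np.round(corrs, 2))                     # first canonical correlation large, second near 0
```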