The cross-spectrum, also known as the cross-spectral density, is a complex-valued function in signal processing that describes the frequency-domain relationship between two time series or signals, obtained as the Fourier transform of their cross-correlation function.[1][2] It quantifies how the power and phase of one signal relate to another across different frequencies, with the real part (cospectrum) capturing in-phase components and the imaginary part (quadrature spectrum) capturing out-of-phase components.[2] Mathematically, for signals x(t) and y(t), the cross-spectrum S_{xy}(f) is given by S_{xy}(f) = \int_{-\infty}^{\infty} R_{xy}(\tau) e^{-j2\pi f \tau} d\tau, where R_{xy}(\tau) is the cross-correlation function.[1]

In practice, the cross-spectrum is often estimated using methods such as Welch's averaged periodogram, which divides the signals into overlapping segments, applies windowing, and computes discrete Fourier transforms to yield the cross power spectral density (CPSD), a measure of the power distribution per unit frequency shared between the signals.[3] The squared magnitude of the cross-spectrum, normalized by the auto-spectra S_{xx}(f) and S_{yy}(f) as \gamma^2(f) = \frac{|S_{xy}(f)|^2}{S_{xx}(f) S_{yy}(f)}, produces the coherence function, which ranges from 0 (no linear relationship) to 1 (perfect linear correlation) at each frequency and helps assess the strength of linear dependencies while accounting for noise.[2] The two-sided complex representation can be converted to single-sided magnitude and phase forms for analysis, where the phase indicates lags or delays between signals.[4]

Cross-spectral analysis is widely applied in fields such as vibration testing, acoustics, and geophysical time series to identify coherent modes, detect phase relationships in periodic phenomena (e.g., waves in ocean or atmospheric data), and evaluate system linearity or causality in black-box models.[2][1] For instance, it is used to compare input-output responses in passive systems, filter uncorrelated noise, and model signal dependencies in engineering designs such as EMI analysis or neuromuscular signal processing.[5][1] Significance testing for coherence, often at 95% confidence levels, ensures robust interpretations by distinguishing true relationships from random variability.[2]
Fundamentals
Definition
The cross-spectrum, also known as the cross-spectral density, is a fundamental tool in signal processing that measures the frequency-dependent correlation and interaction between two distinct time-domain signals. It provides insight into how the signals covary at specific frequencies, capturing both the strength and phase relationship of their joint oscillations, thereby enabling analysis of linear relationships or influences between them in the frequency domain.[6]

The concept emerged in the 1960s amid advances in random data analysis within signal processing and statistics, with key formalization appearing in the pioneering 1971 text Random Data: Analysis and Measurement Procedures by Julius S. Bendat and Allan G. Piersol, which established foundational methods for computing and interpreting cross-spectra from measured data. This work built on earlier Fourier-based techniques to address practical challenges in analyzing stationary random processes, such as those encountered in engineering and physical measurements.[7]

At its core, the cross-spectrum relates to the time-domain cross-correlation function, which assesses the similarity between two signals as one is shifted relative to the other; the cross-spectrum extends this by revealing how such similarities manifest across frequencies.[8] For instance, when the two signals are identical, the cross-spectrum reduces to the power spectral density, representing the signal's energy distribution at each frequency.[6]

A practical illustration involves environmental signals such as wind speed and ocean wave height: at frequencies where wind drives wave formation, the cross-spectrum shows strong in-phase components, indicating constructive interaction, whereas out-of-phase components at other frequencies might highlight dissipative or unrelated dynamics.
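The special case noted above, in which the cross-spectrum of a signal with itself reduces to its power spectral density, is easy to check numerically. The following minimal sketch uses SciPy's csd and welch estimators; the sampling rate, segment length, and test signal are illustrative choices, not prescribed values:

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
fs = 1000.0                                   # sampling rate in Hz (illustrative)
t = np.arange(0, 10, 1 / fs)
x = np.sin(2 * np.pi * 50 * t) + rng.normal(scale=0.5, size=t.size)

# Cross-spectral density of a signal with itself...
f, Sxx_csd = signal.csd(x, x, fs=fs, nperseg=1024)
# ...equals its Welch power spectral density estimate.
f, Sxx_psd = signal.welch(x, fs=fs, nperseg=1024)

assert np.allclose(Sxx_csd.real, Sxx_psd)     # imaginary part is zero here
```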
Mathematical Formulation
The cross-spectral density, also known as the cross-power spectral density, quantifies the frequency-domain relationship between two signals through the Fourier transform of their cross-correlation function. For two jointly wide-sense stationary random processes x(t) and y(t), the cross-correlation function is defined as R_{xy}(\tau) = E[x(t) y(t + \tau)], where E[\cdot] denotes the expectation operator.[10][8]

The cross-spectral density S_{xy}(f) is then given by the Fourier transform of this cross-correlation:

S_{xy}(f) = \int_{-\infty}^{\infty} R_{xy}(\tau) e^{-j 2\pi f \tau} \, d\tau

This formulation follows from the Wiener–Khinchin theorem extended to cross-correlations, which relates the spectral densities of stationary processes to their time-domain correlations.[10][11]

This definition assumes that the signals are wide-sense stationary, meaning their means and autocorrelations are time-invariant, and ergodic, allowing ensemble averages to be estimated from time averages; additionally, the processes must have finite average power to ensure the integrals converge.[8][11]

For discrete-time signals x[n] and y[n], sampled at rate f_s, the cross-spectral density is the discrete-time Fourier transform of the discrete cross-correlation R_{xy}[k] = E[x[n] y[n + k]]:

S_{xy}(\omega) = \sum_{k=-\infty}^{\infty} R_{xy}[k] e^{-j \omega k}

where \omega = 2\pi f / f_s is the normalized angular frequency.[11][12]

The units of S_{xy}(f) are typically those of signal power density, such as (volts)^2 / Hz if x(t) and y(t) share the same units, reflecting the covariance per unit frequency.[1][3]
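The discrete Wiener–Khinchin relation can be verified directly on sample data: the DFT of the circular sample cross-correlation equals the cross-periodogram formed from the signals' DFTs. A minimal NumPy sketch, assuming the convention R_{xy}[k] = E[x[n] y[n+k]] used above (the test signal, delay, and length are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1024
x = rng.normal(size=N)
y = np.roll(x, 5) + 0.1 * rng.normal(size=N)   # y is x delayed by 5 samples, plus noise

# Circular sample cross-correlation R_xy[k] = (1/N) sum_n x[n] y[(n+k) mod N],
# computed by brute force to mirror the definition.
R_xy = np.array([np.dot(x, np.roll(y, -k)) for k in range(N)]) / N

# Its DFT is the cross-spectrum estimate (discrete Wiener-Khinchin relation)...
S_from_R = np.fft.fft(R_xy)

# ...which matches the cross-periodogram X*(f) Y(f) / N computed directly.
S_direct = np.conj(np.fft.fft(x)) * np.fft.fft(y) / N
assert np.allclose(S_from_R, S_direct)
```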
Properties
Basic Properties
The cross-spectrum S_{xy}(f), defined as the Fourier transform of the cross-correlation function between two stationary signals x(t) and y(t), is generally a complex-valued function of frequency f. Its real part, known as the co-spectrum, captures the in-phase correlation between the signals, while the imaginary part, termed the quadrature spectrum, represents the out-of-phase correlation.[13]

A key symmetry arises from the Hermitian property of the cross-correlation for real-valued signals, leading to S_{yx}(f) = S_{xy}^*(f), where ^* denotes the complex conjugate; this implies that the co-spectrum is even (c_{xy}(f) = c_{yx}(f) = c_{xy}(-f)) and the quadrature spectrum is odd (q_{xy}(f) = -q_{yx}(f) = -q_{xy}(-f)). For real signals, an additional relation holds: S_{xy}(-f) = S_{yx}(f). These symmetries stem directly from the properties of the Fourier transform applied to the cross-correlation.[13]

The cross-spectrum exhibits linearity with respect to linear combinations of the input signals. Specifically, for real constants a and b, S_{ax + by, z}(f) = a S_{xz}(f) + b S_{yz}(f). This property follows from the bilinearity of the cross-correlation function and the linearity of the Fourier transform.[14]

The magnitude of the cross-spectrum is bounded by the geometric mean of the auto-spectra: |S_{xy}(f)| \leq \sqrt{S_{xx}(f) S_{yy}(f)}. This inequality arises from the Cauchy-Schwarz inequality in the frequency domain or, equivalently, from the fact that the magnitude-squared coherence \gamma_{xy}^2(f) = \frac{|S_{xy}(f)|^2}{S_{xx}(f) S_{yy}(f)} satisfies 0 \leq \gamma_{xy}^2(f) \leq 1, with equality to 1 indicating perfect linear correlation at frequency f.[13]

For uncorrelated signals, whose cross-correlation function is zero at all lags, the cross-spectrum is identically zero across all frequencies; sample estimates of S_{xy}(f) vanish only asymptotically as the amount of averaging grows, reflecting the absence of frequency-specific dependencies between the signals.[13]
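The Hermitian symmetry and the Cauchy-Schwarz bound also hold exactly for Welch-type estimates, which makes them useful sanity checks. A sketch using SciPy; the filter coefficients and noise levels are arbitrary illustrative choices:

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(2)
fs, n = 500.0, 8192
x = rng.normal(size=n)
y = signal.lfilter([1.0, 0.5], [1.0], x) + rng.normal(size=n)  # y linearly related to x

f, Sxy = signal.csd(x, y, fs=fs, nperseg=512)
_, Syx = signal.csd(y, x, fs=fs, nperseg=512)
_, Sxx = signal.welch(x, fs=fs, nperseg=512)
_, Syy = signal.welch(y, fs=fs, nperseg=512)

# Hermitian symmetry: S_yx(f) = S_xy*(f)
assert np.allclose(Syx, np.conj(Sxy))

# Cauchy-Schwarz bound: |S_xy(f)| <= sqrt(S_xx(f) S_yy(f)),
# equivalent to 0 <= coherence <= 1
assert np.all(np.abs(Sxy) <= np.sqrt(Sxx * Syy) + 1e-12)
```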
Magnitude and Phase Interpretation
The magnitude of the cross-spectrum, denoted |S_{xy}(f)|, quantifies the strength of the linear correlation between two signals x(t) and y(t) at frequency f, with higher values indicating stronger frequency-specific coupling or energy transfer between the signals.[2] This measure arises from the Fourier transform of the cross-correlation function and reflects the amplitude of shared oscillatory components, though it is sensitive to the individual signal powers.[15]

The phase of the cross-spectrum, \arg(S_{xy}(f)), represents the phase difference between the signals at frequency f, which can indicate a time delay \tau such that \phi(f) = \arg(S_{xy}(f)) \approx 2\pi f \tau for a constant delay.[16] This phase angle reveals lead-lag relationships, where positive values suggest y(t) lags x(t), and it is most reliable when the magnitude is significant, as low coherence can introduce ambiguity.[15]

The real part of the cross-spectrum, known as the cospectrum, measures the in-phase components between the signals, corresponding to covariance or energy transfer when their oscillations align.[2] In contrast, the imaginary part, or quadrature spectrum, captures the out-of-phase contributions, quantifying components shifted by approximately \pi/2 radians, which are indicative of quadrature relationships or additional delays.[2] Together, these parts decompose the cross-spectrum into orthogonal contributions that facilitate detailed analysis of signal synchronization.[16]

Unlike the magnitude-squared coherence function, which normalizes the cross-spectrum by the auto-spectra to yield a bounded measure of correlation independent of signal amplitudes, the raw magnitude |S_{xy}(f)| scales with the signals' overall energies and thus requires careful scaling for comparative analysis.[15]
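For a constant delay, the cross-spectrum phase is linear in frequency, so the delay can be recovered from the slope of the unwrapped phase. A sketch of this idea follows; the sampling rate, delay, fitting band, and segment length are illustrative, and the sign of the slope depends on the estimator's conjugation convention (SciPy's csd conjugates its first argument):

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(3)
fs, n = 1000.0, 20000
delay = 10                           # samples, i.e. 10 ms at fs = 1 kHz
x = rng.normal(size=n)
y = np.roll(x, delay) + 0.2 * rng.normal(size=n)   # y(t) = x(t - tau) + noise

f, Sxy = signal.csd(x, y, fs=fs, nperseg=1024)

# For a pure delay, arg S_xy(f) = -2*pi*f*tau under SciPy's convention;
# fit the slope of the unwrapped phase over a band with good coherence.
phase = np.unwrap(np.angle(Sxy))
band = (f > 10) & (f < 400)
slope = np.polyfit(f[band], phase[band], 1)[0]
tau_est = -slope / (2 * np.pi)       # delay in seconds
print(f"estimated delay: {tau_est * fs:.1f} samples (true: {delay})")
```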
Estimation Methods
Welch's Overlapped Segment Averaging
Welch's overlapped segment averaging is a widely used non-parametric method for estimating the cross-spectral density (CSD) of two signals from finite-duration data, extending the periodogram averaging approach to reduce estimation variance while controlling bias through windowing and overlap. Originally developed for power spectral density estimation, the technique applies analogously to CSD by averaging cross-periodograms computed from overlapping segments of the input signals.[17] This method balances spectral resolution and statistical reliability, making it suitable for applications in signal processing where data length is limited.[17]

The procedure begins by dividing each input signal into K overlapping segments, each of length L, with a typical overlap of 50% between consecutive segments to maximize data utilization.[17] A window function, such as the Hanning or Hamming window, is then applied to each segment to taper the data and mitigate spectral leakage from finite segment edges.[17] The discrete Fourier transform (DFT) is computed for the windowed segments of both signals, yielding frequency-domain representations X^{(m)}(f) and Y^{(m)}(f) for the m-th segment.[3] The cross-periodogram for each segment is formed as the product X^{(m)}(f) [Y^{(m)}(f)]^*, and these are averaged across all K segments to obtain the CSD estimate.[17]

The mathematical formulation of the estimator is given by

\hat{S}_{xy}(f) = \frac{1}{K U} \sum_{m=1}^{K} X^{(m)}(f) \left[ Y^{(m)}(f) \right]^*,

where U = \sum_{n=0}^{L-1} |w(n)|^2 is the window energy normalization factor ensuring unbiased scaling, and w(n) is the window function applied to the time-domain segments.[17] This averaging over K segments reduces the variance of the estimate by a factor approximately proportional to 1/K, with 50% overlap roughly doubling the number of segments (which remain only weakly correlated when tapered windows are used), thereby yielding close to a further twofold variance reduction for the same data length.[17] Windowing introduces some bias due to smoothing but effectively suppresses leakage, trading minimal resolution loss for improved overall estimate quality.[17]

Key parameters include the segment length L, which trades off frequency resolution (higher L yields finer resolution, approximately f_s / L where f_s is the sampling frequency) against the number of averages K (shorter L allows more segments and lower variance); the overlap percentage (commonly 50%, increasing effective K without additional data); and the window choice (e.g., Hanning for reduced sidelobes).[17] Longer total data records permit larger K, further lowering variance, but L must be selected based on the expected spectral features to avoid excessive smoothing.[17]

In practice, this method is implemented in tools like MATLAB's cpsd function, which applies Welch's overlapped segment averaging to estimate the CSD between two audio signals, for instance, to analyze phase relationships in stereo recordings by specifying segment length, overlap, and window type.[3]
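The procedure above can be written out in a few lines and checked against a library implementation. The sketch below assumes a Hann window, 50% overlap, per-segment mean removal, and one-sided density scaling, which match SciPy's defaults; note that SciPy's csd conjugates the first signal rather than the second, so the sign of the imaginary part follows the opposite convention from the formula above:

```python
import numpy as np
from scipy import signal

def welch_csd(x, y, fs, nperseg=256):
    """Cross power spectral density via Welch's method: Hann window,
    50% overlap, one-sided density scaling. A sketch mirroring scipy.signal.csd."""
    win = signal.get_window("hann", nperseg)
    step = nperseg // 2                        # 50% overlap
    nseg = (len(x) - nperseg) // step + 1      # number of segments K
    scale = 1.0 / (fs * np.sum(win**2))        # density normalization 1/(fs*U)
    acc = np.zeros(nperseg // 2 + 1, dtype=complex)
    for m in range(nseg):
        sl = slice(m * step, m * step + nperseg)
        X = np.fft.rfft(win * (x[sl] - x[sl].mean()))   # detrend, window, DFT
        Y = np.fft.rfft(win * (y[sl] - y[sl].mean()))
        acc += np.conj(X) * Y                  # cross-periodogram of segment m
    Pxy = acc * scale / nseg                   # average over the K segments
    Pxy[1:-1] *= 2                             # one-sided: fold negative frequencies
    return np.fft.rfftfreq(nperseg, 1 / fs), Pxy

rng = np.random.default_rng(4)
fs, n = 200.0, 4096
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(size=n)

f1, P1 = welch_csd(x, y, fs)
f2, P2 = signal.csd(x, y, fs=fs, nperseg=256)  # library reference
assert np.allclose(P1, P2)
```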
Parametric Estimation Techniques
Parametric estimation techniques for the cross-spectral density assume an underlying parametric model for the joint process of the two signals, typically a multivariate autoregressive (AR) or related model, to derive the cross-spectrum from estimated model parameters. This approach contrasts with non-parametric methods like Welch's overlapped segment averaging by imposing a structural model that extrapolates beyond observed data, enabling higher resolution estimates even with limited samples. In autoregressive modeling, a multivariate AR (VAR) model of order p is fitted to the vector time series [x_t, y_t]^T, with the model parameters obtained via least squares, maximum likelihood estimation, or the Yule-Walker equations. The cross-spectral density is then computed from the model's transfer function matrix H(f), which incorporates the AR coefficients, and the innovation covariance matrix \Sigma, yielding S_{xy}(f) = H_x(f) \Sigma H_y^H(f), where H_x(f) and H_y(f) are the rows of H(f) corresponding to x and y, and ^H denotes the Hermitian transpose. This formulation captures the linear dependencies between signals in the frequency domain.[18]

The maximum entropy method (MEM) extends AR-based estimation by maximizing the spectral entropy subject to autocorrelation constraints, effectively fitting an all-pole AR model of infinite order that is truncated in practice. For cross-spectra, MEM computes the cross-power from the MEM auto-power spectra of the individual series using extensions of the Yule-Walker equations to ensure consistency with observed correlations, providing a high-resolution estimate without explicit multivariate fitting.[19]

These techniques offer advantages such as superior frequency resolution for short time series compared to non-parametric approaches, whose effective resolution is limited by data length, and smoother estimates due to the smoothing inherent in the parametric model. However, they require careful model order selection, often using criteria like the Akaike Information Criterion (AIC) to balance fit and complexity, and are sensitive to violations of stationarity assumptions, potentially leading to biased estimates if the signals exhibit time-varying dynamics.[20]

In applications like electroencephalogram (EEG) analysis for studying brain signal interactions, multivariate AR models are fitted to multichannel EEG data to estimate cross-spectra, revealing interactions between brain regions, for example during cognitive tasks.[21]
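The VAR route can be sketched in plain NumPy: fit the coefficient blocks A_k by least squares, form A(f) = I - \sum_{k=1}^{p} A_k e^{-j 2\pi f k / f_s}, invert to obtain H(f), and assemble S(f) = H(f) \Sigma H(f)^H. The function below is a bare-bones illustration; the name var_cross_spectrum, the simulated VAR(1) system, and all parameter choices are hypothetical rather than taken from the cited sources:

```python
import numpy as np

def var_cross_spectrum(data, p, freqs, fs=1.0):
    """Cross-spectral matrix from a least-squares VAR(p) fit.
    data: (n_samples, n_channels). A sketch, not a production estimator."""
    n, d = data.shape
    # Lagged regressor matrix: row for time t holds [x_{t-1}, ..., x_{t-p}]
    Z = np.hstack([data[p - k - 1 : n - k - 1] for k in range(p)])
    X = data[p:]
    B, *_ = np.linalg.lstsq(Z, X, rcond=None)         # stacked coefficient blocks
    A = [B[k * d : (k + 1) * d].T for k in range(p)]  # A_k, each (d, d)
    resid = X - Z @ B
    Sigma = resid.T @ resid / (n - p)                 # innovation covariance

    S = np.empty((len(freqs), d, d), dtype=complex)
    for i, f in enumerate(freqs):
        Af = np.eye(d, dtype=complex)
        for k, Ak in enumerate(A, start=1):
            Af -= Ak * np.exp(-2j * np.pi * f * k / fs)
        H = np.linalg.inv(Af)                         # transfer function matrix H(f)
        S[i] = H @ Sigma @ H.conj().T / fs            # S(f) = H Sigma H^H / fs
    return S

# Simulate a VAR(1) pair in which channel 0 drives channel 1
rng = np.random.default_rng(5)
n = 5000
e = rng.normal(size=(n, 2))
data = np.zeros((n, 2))
for t in range(1, n):
    data[t, 0] = 0.5 * data[t - 1, 0] + e[t, 0]
    data[t, 1] = 0.4 * data[t - 1, 0] + 0.3 * data[t - 1, 1] + e[t, 1]

freqs = np.linspace(0, 0.5, 128)        # normalized frequency (fs = 1)
S = var_cross_spectrum(data, p=1, freqs=freqs)
S_xy = S[:, 0, 1]                       # parametric cross-spectrum estimate
```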
Coherence Function
In signal processing, the squared coherence function provides a normalized measure of the linear relationship between two signals as a function of frequency. It is defined as

\gamma_{xy}^2(f) = \frac{|S_{xy}(f)|^2}{S_{xx}(f) S_{yy}(f)},

where S_{xy}(f) denotes the cross-spectral density between signals x(t) and y(t), and S_{xx}(f) and S_{yy}(f) are the respective auto-spectral densities. This formulation arises from normalizing the magnitude-squared cross-spectrum by the product of the auto-spectra, ensuring the result is a dimensionless quantity bounded between 0 and 1, akin to the squared correlation coefficient in the time domain but applied spectrally. The derivation follows from the expectation that, for linearly related stationary random processes, the cross-spectrum captures shared power, while the auto-spectra represent total power; the ratio thus quantifies the proportion of shared variance at each frequency, with the absolute value ensuring non-negativity and the squaring emphasizing strength.[22][23]

The interpretation of \gamma_{xy}^2(f) is central to assessing signal relationships: a value approaching 1 at a given frequency f signifies a strong linear deterministic relation, where variations in one signal nearly perfectly predict the other after accounting for scaling and phase; conversely, a value near 0 indicates the signals are uncorrelated at that frequency, typically due to additive noise or independent processes. This metric thus distinguishes deterministic components from stochastic ones, with intermediate values reflecting partial linearity influenced by measurement noise or non-linearities.[22][23]

For scenarios involving more than two signals, the concept extends to multiple coherence, which generalizes the pairwise measure by incorporating partial cross-spectra to evaluate the collective linear influence of multiple inputs on an output while isolating individual contributions. The multiple squared coherence \gamma_{y;x_1,\dots,x_n}^2(f) represents the fraction of output power at frequency f explained by the set of inputs, derived through inversion of the cross-spectral density matrix, with partial spectra removing correlations among the inputs. This extension is particularly useful in multi-channel environments, where it quantifies overall predictability beyond pairwise analysis.[23][24]

In practical applications, coherence leverages the cross-spectrum to enable noise reduction in sensor systems by pinpointing frequency bands exhibiting high coherence, where correlated signal components can be preserved while uncorrelated noise is suppressed through adaptive filtering or beamforming.[25] For instance, in audio processing for echo cancellation, the cross-spectrum identifies delay-induced correlations at specific frequencies between the reference and microphone signals, allowing coherence-guided algorithms to attenuate echoes while maintaining speech clarity, as demonstrated in double-talk detection schemes that use coherence thresholds to avoid filter divergence during simultaneous speech.[26] The cross-spectrum magnitude underpins these methods, supplying the raw correlation strength that the coherence normalization renders comparable across frequencies.
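SciPy exposes this measure directly as scipy.signal.coherence. A quick sketch (the 60 Hz shared component, noise levels, and segment length are illustrative) showing coherence near 1 at a frequency where the signals share a component and near 0 elsewhere:

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(6)
fs, n = 1000.0, 60000
t = np.arange(n) / fs
shared = np.sin(2 * np.pi * 60 * t)        # component common to both signals
x = shared + rng.normal(size=n)
y = 0.5 * shared + rng.normal(size=n)      # same 60 Hz drive, independent noise

f, Cxy = signal.coherence(x, y, fs=fs, nperseg=1024)
print(f"coherence near 60 Hz: {Cxy[np.argmin(np.abs(f - 60))]:.2f}")  # close to 1
print(f"median elsewhere:     {np.median(Cxy):.2f}")                  # near 0
```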
System Identification and Transfer Functions
In system identification, the cross-spectrum plays a central role in estimating the transfer function of linear time-invariant systems from measured input-output data. For a system where an input signal x(t) excites an output y(t), the frequency response function H_{xy}(f) is derived as the ratio of the cross-spectral density S_{xy}(f) to the auto-spectral density of the input S_{xx}(f), assuming a unidirectional causal relationship without feedback:

H_{xy}(f) = \frac{S_{xy}(f)}{S_{xx}(f)}.

This nonparametric approach leverages the Fourier transform properties to characterize the system's gain and phase across frequencies, providing insights into dynamic behavior without requiring parametric model assumptions.[27][28]

To validate the reliability of the estimated transfer function, the magnitude-squared coherence \gamma^2_{xy}(f) is examined, with values greater than 0.5 typically indicating acceptable linear correlation and minimal noise influence at those frequencies. The phase of H_{xy}(f) further reveals the system's temporal lag, aiding in causality assessment. In multivariate scenarios involving multiple inputs and outputs, partial transfer functions extend this framework by isolating the effect of one input on an output while accounting for correlations among other inputs, thus mitigating bias from confounding variables through conditional spectral densities.[28]

A practical application arises in structural engineering for vibration analysis, where accelerometers measure input forces (e.g., from shakers) and output responses on buildings or bridges to estimate modal frequencies and damping ratios via cross-spectral methods, enabling predictive maintenance and design validation.[29] However, these techniques assume system linearity and stationarity of signals; violations, such as nonlinearities or non-stationary noise, can introduce bias, particularly when output-correlated noise affects the input spectrum.[27]
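This estimator, often called H1 in the modal analysis literature, is straightforward to sketch with SciPy: excite a known system with broadband noise, form S_{xy}(f)/S_{xx}(f), and compare against the true frequency response. The FIR filter, noise level, and segment length below are illustrative choices:

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(7)
fs, n = 1000.0, 100000
b = signal.firwin(31, 0.3)                 # known FIR "system" under test
x = rng.normal(size=n)                     # broadband random excitation
y = signal.lfilter(b, 1.0, x) + 0.05 * rng.normal(size=n)  # output plus sensor noise

f, Sxy = signal.csd(x, y, fs=fs, nperseg=1024)
_, Sxx = signal.welch(x, fs=fs, nperseg=1024)
H1 = Sxy / Sxx                             # H1 estimate: S_xy(f) / S_xx(f)

# Compare against the filter's true frequency response
_, H_true = signal.freqz(b, 1.0, worN=f, fs=fs)
print(np.max(np.abs(np.abs(H1) - np.abs(H_true))))   # small estimation error

# Check reliability: coherence should be near 1 in the passband
_, Cxy = signal.coherence(x, y, fs=fs, nperseg=1024)
```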