
Canonical correlation

Canonical correlation analysis (CCA) is a multivariate statistical method used to explore and quantify the relationships between two sets of variables measured on the same observations by identifying pairs of linear combinations, one from each set, that exhibit the maximum possible correlation. These linear combinations, known as canonical variates, are derived such that the first pair maximizes the correlation, while subsequent pairs are uncorrelated with prior ones and maximize the remaining correlation. The correlations between these variates, termed canonical correlations, provide a measure of the strength of association between the two variable sets, generalizing the bivariate Pearson correlation coefficient to multidimensional settings. Introduced by statistician Harold Hotelling in his 1936 paper "Relations Between Two Sets of Variates," CCA builds on earlier multivariate techniques such as principal component analysis and has since been extended by researchers such as M. S. Bartlett in the 1940s to include probabilistic interpretations and tests of significance. Mathematically, for two random vectors \mathbf{X} (dimension p) and \mathbf{Y} (dimension q), CCA solves for coefficient vectors \mathbf{a} and \mathbf{b} that maximize \rho = \frac{\mathbf{a}^\top \Sigma_{XY} \mathbf{b}}{\sqrt{\mathbf{a}^\top \Sigma_{XX} \mathbf{a}} \sqrt{\mathbf{b}^\top \Sigma_{YY} \mathbf{b}}}, subject to unit variance constraints, where \Sigma denotes covariance matrices; the solutions correspond to the singular values of the matrix \Sigma_{XX}^{-1/2} \Sigma_{XY} \Sigma_{YY}^{-1/2}. In practice, sample covariances replace population ones, and the number of meaningful pairs is limited by the minimum of p and q. CCA finds applications across diverse fields, including psychology for relating cognitive and behavioral measures, economics for linking macroeconomic indicators to firm performance, neuroimaging for associating brain activity patterns with stimuli, and marketing for connecting demographics with purchase behaviors. It serves purposes such as data reduction, by summarizing covariation in fewer dimensions, and interpretation, through canonical loadings that reveal each variable's contribution to the variates. Modern extensions, such as regularized CCA for high-dimensional data, address challenges like multicollinearity and small sample sizes, enhancing its utility in contemporary contexts.

Introduction

Definition and Motivation

Canonical correlation analysis (CCA) is a multivariate statistical technique that identifies and measures the associations between two random vectors, denoted as \mathbf{X} (a p-dimensional vector) and \mathbf{Y} (a q-dimensional vector), by finding linear combinations of the variables in each set that achieve the maximum possible correlation while constraining each combination to have unit variance. These linear combinations, known as canonical variates, are expressed as u = \mathbf{a}^T \mathbf{X} for the first set and v = \mathbf{b}^T \mathbf{Y} for the second set, where \mathbf{a} and \mathbf{b} are coefficient vectors chosen to maximize the correlation \rho = \text{Corr}(u, v). The resulting correlations, termed canonical correlations \rho_k (where k = 1, 2, \dots, \min(p, q)), are ordered from largest to smallest, providing a sequence of paired variates that successively maximize correlation under orthogonality constraints to prior pairs. The primary motivation for CCA arises in scenarios where researchers seek to bridge relationships between two distinct multivariate datasets, such as behavioral measures (e.g., test scores) and physiological outcomes (e.g., fitness metrics or health indicators), without relying solely on pairwise correlations that may overlook underlying structures. Unlike simple correlation, which handles single variables, or multiple regression, which predicts a single outcome from a set of predictors, CCA symmetrically explores mutual associations, serving as a dimension-reduction tool akin to principal component analysis but operating across datasets, thus summarizing complex inter-set dependencies into interpretable canonical dimensions. This approach is particularly valuable when direct variable pairings are insufficient, enabling the detection of latent patterns, such as linking exercise behaviors to cardiovascular markers. CCA operates under several basic assumptions to ensure reliable interpretation and inference. It assumes linear relationships between the canonical variates and the original variables in each set, meaning curvilinear patterns may require data transformations to avoid underestimating associations. Multivariate normality of the joint distribution of \mathbf{X} and \mathbf{Y} is assumed for optimal statistical properties and valid significance testing, though the method remains applicable to non-normal metric data with larger samples, albeit with potentially reduced efficiency. Additionally, absence of severe multicollinearity within each set is required to prevent unstable estimates and ensure distinct contributions from variables, as high inter-correlations can confound the isolation of unique effects.
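As a quick illustration of these definitions, the following sketch uses scikit-learn's CCA on simulated data (the data, dimensions, and shared latent signal are illustrative assumptions, not part of any study discussed here) and recovers the sample canonical correlations as the correlations between the paired variate scores.

```python
# Minimal sketch: canonical correlations between two simulated variable sets.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=(n, 1))                                            # shared latent signal
X = z @ rng.normal(size=(1, 3)) + rng.normal(scale=0.5, size=(n, 3))   # p = 3
Y = z @ rng.normal(size=(1, 2)) + rng.normal(scale=0.5, size=(n, 2))   # q = 2

cca = CCA(n_components=2)          # at most min(p, q) = 2 canonical pairs
U, V = cca.fit_transform(X, Y)     # canonical variate scores u_k, v_k

# Canonical correlations are the correlations between paired variate scores.
rhos = [np.corrcoef(U[:, k], V[:, k])[0, 1] for k in range(U.shape[1])]
print("sample canonical correlations:", np.round(rhos, 3))
```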

Historical Development

The concept of canonical correlation analysis has its roots in Camille Jordan's 1875 exploration of principal angles between linear subspaces of Euclidean space, where he introduced a framework for measuring the orientations and alignments between pairs of subspaces that later informed multivariate statistical relations. Harold Hotelling formally introduced canonical correlation analysis in 1936 through his seminal paper "Relations Between Two Sets of Variates," which generalized the notion of correlation to linear combinations across two multivariate sets, motivated by the study of relations between sets of correlated mental and physical traits. This work established the method as a tool for identifying maximal inter-set dependencies, drawing on Jordan's principal angles to define the canonical variates. During and after World War II, the method saw key extensions in the 1940s and 1950s. M. S. Bartlett developed approximations and tests for the statistical significance of canonical correlations in 1941, providing practical means to assess their reliability in finite samples. He further contributed in 1948 with work on internal and external factor analysis relating to CCA. Subsequent work in the 1950s integrated canonical correlations into multivariate hypothesis-testing frameworks, including tests of significance that linked inter-set relations to broader multivariate hypothesis testing. These contributions solidified CCA's role in psychometrics, where it was applied in the 1940s to analyze relationships between psychological test batteries and behavioral outcomes, such as in studies of learning and ability prediction. CCA experienced a revival in the 1970s and 1980s, driven by computational advances and its inclusion in major statistical software packages, which enabled widespread empirical applications in the social sciences and beyond. In the post-2010 era, the method has surged in machine learning for handling multi-view data, with seminal works like deep canonical correlation analysis adapting it to nonlinear, high-dimensional settings for tasks such as cross-modal learning and representation learning.

Mathematical Formulation

Population Parameters

In canonical correlation analysis (CCA), the population parameters are defined in terms of the true covariance structure between two random vectors, \mathbf{X} \in \mathbb{R}^p and \mathbf{Y} \in \mathbb{R}^q, assumed to be jointly distributed with finite second moments. The population covariance matrix of \mathbf{X} is \boldsymbol{\Sigma}_{XX} \in \mathbb{R}^{p \times p}, which is symmetric and positive semi-definite; similarly, \boldsymbol{\Sigma}_{YY} \in \mathbb{R}^{q \times q} is the covariance matrix of \mathbf{Y}; and \boldsymbol{\Sigma}_{XY} \in \mathbb{R}^{p \times q} is the cross-covariance matrix between \mathbf{X} and \mathbf{Y}, with \boldsymbol{\Sigma}_{YX} = \boldsymbol{\Sigma}_{XY}^\top. These matrices capture the underlying linear relationships in the infinite-sample limit, and \boldsymbol{\Sigma}_{XX} and \boldsymbol{\Sigma}_{YY} are assumed to be positive definite to ensure invertibility. The core objective of CCA is to identify linear combinations of the variables in each set that maximize their correlation, subject to unit variance constraints. Specifically, the first canonical correlation \rho_1 is given by \rho_1 = \max_{\mathbf{a} \in \mathbb{R}^p, \mathbf{b} \in \mathbb{R}^q} \frac{\mathbf{a}^\top \boldsymbol{\Sigma}_{XY} \mathbf{b}}{\sqrt{\mathbf{a}^\top \boldsymbol{\Sigma}_{XX} \mathbf{a} \cdot \mathbf{b}^\top \boldsymbol{\Sigma}_{YY} \mathbf{b}}}, where \mathbf{a} and \mathbf{b} are the canonical weight vectors for \mathbf{X} and \mathbf{Y}, respectively, and the denominator normalizes the variances of the canonical variates \mathbf{a}^\top \mathbf{X} and \mathbf{b}^\top \mathbf{Y} to 1. Subsequent canonical correlations \rho_k (for k = 2, \dots, m, where m = \min(p, q)) are obtained by maximizing the correlation under orthogonality constraints with respect to prior pairs, yielding \rho_1 \geq \rho_2 \geq \dots \geq \rho_m \geq 0. This maximization problem establishes the theoretical foundation for measuring multivariate associations without assuming a specific joint distribution beyond second moments. The canonical correlations and vectors satisfy a generalized eigenvalue problem derived from the maximization. For the k-th pair, the canonical vector \mathbf{a}_k solves \boldsymbol{\Sigma}_{XX}^{-1} \boldsymbol{\Sigma}_{XY} \boldsymbol{\Sigma}_{YY}^{-1} \boldsymbol{\Sigma}_{YX} \mathbf{a}_k = \rho_k^2 \mathbf{a}_k, with a corresponding equation for \mathbf{b}_k: \boldsymbol{\Sigma}_{YY}^{-1} \boldsymbol{\Sigma}_{YX} \boldsymbol{\Sigma}_{XX}^{-1} \boldsymbol{\Sigma}_{XY} \mathbf{b}_k = \rho_k^2 \mathbf{b}_k. Here, the \rho_k^2 are the eigenvalues of these matrices, and the full set consists of m such pairs (\mathbf{a}_k, \mathbf{b}_k), which are orthogonal within their respective sets: \mathbf{a}_i^\top \boldsymbol{\Sigma}_{XX} \mathbf{a}_j = \delta_{ij} and \mathbf{b}_i^\top \boldsymbol{\Sigma}_{YY} \mathbf{b}_j = \delta_{ij} for i, j = 1, \dots, m, where \delta_{ij} is the Kronecker delta. Typically, only pairs with \rho_k > 0 are considered informative, and the number of such non-zero correlations equals the rank of \boldsymbol{\Sigma}_{XY}. Key properties of these population parameters include the fact that the squared canonical correlations \rho_k^2 are the eigenvalues of the matrix \boldsymbol{\Sigma}_{XX}^{-1} \boldsymbol{\Sigma}_{XY} \boldsymbol{\Sigma}_{YY}^{-1} \boldsymbol{\Sigma}_{YX}, ordered decreasingly.
Moreover, the sum \sum_{k=1}^m \rho_k^2 equals the trace of \boldsymbol{\Sigma}_{XX}^{-1} \boldsymbol{\Sigma}_{XY} \boldsymbol{\Sigma}_{YY}^{-1} \boldsymbol{\Sigma}_{YX}, which serves as an overall summary of the linear association between the two sets captured by the canonical variates. These eigenvalues also diagonalize the joint structure, facilitating interpretation of the shared variability between the sets.
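The eigenvalue characterization above can be checked directly. The sketch below assumes small, illustrative covariance blocks and computes the canonical correlations as the square roots of the eigenvalues of \Sigma_{XX}^{-1} \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX}, along with the trace identity for the sum of the \rho_k^2.

```python
# Sketch: population canonical correlations from specified covariance blocks.
# The covariance values below are illustrative assumptions, not from the text.
import numpy as np

Sxx = np.array([[1.0, 0.4],
                [0.4, 1.0]])
Syy = np.array([[1.0, 0.2],
                [0.2, 1.0]])
Sxy = np.array([[0.5, 0.3],
                [0.1, 0.4]])

M = np.linalg.inv(Sxx) @ Sxy @ np.linalg.inv(Syy) @ Sxy.T
eigvals = np.sort(np.linalg.eigvals(M).real)[::-1]   # rho_k^2, descending
rho = np.sqrt(np.clip(eigvals, 0, None))
print("canonical correlations:", np.round(rho, 4))
print("sum of rho_k^2 equals trace:", np.isclose(eigvals.sum(), np.trace(M)))
```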

Sample Estimates

In practice, the population covariance matrices \Sigma_{XX}, \Sigma_{YY}, and \Sigma_{XY} are unknown and must be estimated from a sample of n observations on the two sets of variables, the p-dimensional \mathbf{X} and the q-dimensional \mathbf{Y}. The sample covariance matrices are computed after centering the data to remove the means, yielding S_{XX} = \frac{1}{n} \mathbf{X}^T (I_n - \frac{1_n 1_n^T}{n}) \mathbf{X}, S_{YY} = \frac{1}{n} \mathbf{Y}^T (I_n - \frac{1_n 1_n^T}{n}) \mathbf{Y}, and S_{XY} = \frac{1}{n} \mathbf{X}^T (I_n - \frac{1_n 1_n^T}{n}) \mathbf{Y}, where I_n is the n \times n identity matrix and 1_n is an n \times 1 vector of ones. These plug-in estimators are consistent under multivariate normality, though dividing by n rather than n-1 introduces a small finite-sample bias that is negligible asymptotically. The sample canonical correlations r_k (for k = 1, \dots, m, where m = \min(p, q)) are obtained by plug-in estimation, substituting the sample covariances S_{XX}, S_{YY}, and S_{XY} directly into the population eigenvalue equations that define the canonical correlations \rho_k. This approach yields the roots of the determinantal equation \det(S_{XY} S_{YY}^{-1} S_{YX} S_{XX}^{-1} - r^2 I_p) = 0, ordered as 1 \geq r_1 \geq \cdots \geq r_m \geq 0. Asymptotically, as the sample size n \to \infty, the sample canonical correlations converge in probability to their population counterparts, r_k \to \rho_k, assuming the observations are independent and identically distributed with finite moments. However, in small samples (e.g., n < 10(p + q)), the estimates are positively biased, with r_k > \rho_k on average, leading to inflated measures of association; this bias decreases with increasing n but can be mitigated using jackknife corrections or bootstrap resampling. An important interpretational aid is the redundancy index, which quantifies the average proportion of variance in the variables of one set that is explained by a canonical variate of the other set, providing a measure of predictive capacity beyond the canonical correlations themselves. For the k-th pair and the second set (of dimension q), the redundancy index is given by \delta_k = \rho_k^2 \times \frac{1}{q} \sum_{j=1}^q l_{jk}^2, where l_{jk} is the canonical loading (structure correlation) of the j-th variable in the second set on that set's k-th canonical variate. This index, introduced as a nonsymmetric measure of shared variance, helps assess how much one set explains the other. When the number of variables exceeds the sample size (i.e., p > n or q > n), the sample matrices S_{XX} or S_{YY} become singular, rendering standard inverses undefined and the plug-in estimates ill-posed. In such cases, the Moore-Penrose pseudo-inverse can be employed to compute generalized solutions for the canonical vectors and correlations, effectively projecting onto the column space of the data; alternatively, regularization techniques (such as ridge penalties, previewed in the extensions below) stabilize the estimates by adding small diagonal perturbations to the covariances.
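A minimal sketch of the plug-in procedure and the redundancy index follows; the simulated data, dimensions, and random seed are illustrative assumptions.

```python
# Sketch: plug-in sample estimates and the redundancy index for the first pair.
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 200, 3, 2
Z = rng.normal(size=(n, 1))
X = Z @ rng.normal(size=(1, p)) + rng.normal(size=(n, p))
Y = Z @ rng.normal(size=(1, q)) + rng.normal(size=(n, q))

Xc, Yc = X - X.mean(0), Y - Y.mean(0)              # centre the data
Sxx, Syy = Xc.T @ Xc / n, Yc.T @ Yc / n
Sxy = Xc.T @ Yc / n

# Plug-in estimate: eigenvalues of Sxx^{-1} Sxy Syy^{-1} Syx are r_k^2.
M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
r2, A = np.linalg.eig(M)
order = np.argsort(r2.real)[::-1]
r2, A = r2.real[order], A[:, order].real
r = np.sqrt(np.clip(r2, 0, None))

# Redundancy of the Y set for the first canonical pair.
a1 = A[:, 0] / np.sqrt(A[:, 0] @ Sxx @ A[:, 0])    # unit-variance canonical vector
b1 = np.linalg.solve(Syy, Sxy.T @ a1)
b1 /= np.sqrt(b1 @ Syy @ b1)
v1 = Yc @ b1                                       # first canonical variate of Y
loadings_Y = np.array([np.corrcoef(Yc[:, j], v1)[0, 1] for j in range(q)])
redundancy_1 = r[0] ** 2 * np.mean(loadings_Y ** 2)
print("sample canonical correlations:", np.round(r, 3))
print("redundancy index for Y, first pair:", round(redundancy_1, 3))
```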

Computation

Derivation

The derivation of canonical variates in canonical correlation analysis begins with the optimization problem of maximizing the correlation between linear combinations of two random vectors, \mathbf{X} and \mathbf{Y}, subject to normalization constraints on their variances. Specifically, the goal is to find coefficient vectors \mathbf{a} and \mathbf{b} that maximize \rho = \frac{\mathbf{a}^\top \Sigma_{XY} \mathbf{b}}{\sqrt{\mathbf{a}^\top \Sigma_{XX} \mathbf{a} \cdot \mathbf{b}^\top \Sigma_{YY} \mathbf{b}}}, where \Sigma_{XX}, \Sigma_{YY}, and \Sigma_{XY} are the covariance matrices, subject to \mathbf{a}^\top \Sigma_{XX} \mathbf{a} = 1 and \mathbf{b}^\top \Sigma_{YY} \mathbf{b} = 1. This formulation, originally proposed by Hotelling, identifies pairs of canonical variates U = \mathbf{a}^\top \mathbf{X} and V = \mathbf{b}^\top \mathbf{Y} with maximum correlation \rho. To solve this constrained optimization, introduce the Lagrangian \mathcal{L} = \mathbf{a}^\top \Sigma_{XY} \mathbf{b} - \frac{\lambda}{2} (\mathbf{a}^\top \Sigma_{XX} \mathbf{a} - 1) - \frac{\mu}{2} (\mathbf{b}^\top \Sigma_{YY} \mathbf{b} - 1), where \lambda and \mu are Lagrange multipliers. Taking partial derivatives and setting them to zero yields the stationarity conditions \frac{\partial \mathcal{L}}{\partial \mathbf{a}} = \Sigma_{XY} \mathbf{b} - \lambda \Sigma_{XX} \mathbf{a} = \mathbf{0} and \frac{\partial \mathcal{L}}{\partial \mathbf{b}} = \Sigma_{YX} \mathbf{a} - \mu \Sigma_{YY} \mathbf{b} = \mathbf{0}. These imply \Sigma_{XY} \mathbf{b} = \lambda \Sigma_{XX} \mathbf{a} and \Sigma_{YX} \mathbf{a} = \mu \Sigma_{YY} \mathbf{b}. Premultiplying the first condition by \mathbf{a}^\top and the second by \mathbf{b}^\top, and using the unit-variance constraints, shows \lambda = \mu = \mathbf{a}^\top \Sigma_{XY} \mathbf{b} = \rho. From the second equation, solve for \mathbf{b}: \mathbf{b} = \mu^{-1} \Sigma_{YY}^{-1} \Sigma_{YX} \mathbf{a}. Substituting this into the first equation gives \Sigma_{XY} (\mu^{-1} \Sigma_{YY}^{-1} \Sigma_{YX} \mathbf{a}) = \lambda \Sigma_{XX} \mathbf{a}, which simplifies to \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX} \mathbf{a} = \lambda \mu \, \Sigma_{XX} \mathbf{a} = \rho^2 \Sigma_{XX} \mathbf{a}. Premultiplying both sides by \Sigma_{XX}^{-1} yields the generalized eigenvalue problem \Sigma_{XX}^{-1} \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX} \mathbf{a} = \rho^2 \mathbf{a}. By symmetry, interchanging the roles of the sets produces \Sigma_{YY}^{-1} \Sigma_{YX} \Sigma_{XX}^{-1} \Sigma_{XY} \mathbf{b} = \rho^2 \mathbf{b}. The eigenvalues \rho_k^2 (with 1 \geq \rho_1^2 \geq \rho_2^2 \geq \cdots \geq 0) are the squared canonical correlations, and the corresponding eigenvectors \mathbf{a}_k, \mathbf{b}_k define the canonical variates. The solutions exhibit orthogonality properties due to the structure of the eigenvalue problem. The canonical vectors satisfy \mathbf{a}_i^\top \Sigma_{XX} \mathbf{a}_j = \delta_{ij} and \mathbf{b}_i^\top \Sigma_{YY} \mathbf{b}_j = \delta_{ij}, where \delta_{ij} is the Kronecker delta (1 if i = j, 0 otherwise). Additionally, the cross-covariances between variates are diagonal: \mathbf{a}_i^\top \Sigma_{XY} \mathbf{b}_j = \rho_i \delta_{ij}. These properties ensure that each subsequent pair maximizes correlation subject to being uncorrelated with prior pairs.
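The orthogonality and diagonality properties derived above can be verified numerically. The sketch below assumes illustrative covariance blocks, solves the generalized symmetric eigenproblem with SciPy, and then checks that the canonical vectors are \Sigma_{XX}- and \Sigma_{YY}-orthonormal and that the cross-covariances between paired variates are diagonal.

```python
# Numerical check of the derived properties, via the generalized eigenproblem
# (Sigma_XY Sigma_YY^{-1} Sigma_YX) a = rho^2 Sigma_XX a.
# Covariance blocks are illustrative assumptions.
import numpy as np
from scipy.linalg import eigh

Sxx = np.array([[1.0, 0.3, 0.1],
                [0.3, 1.0, 0.2],
                [0.1, 0.2, 1.0]])
Syy = np.array([[1.0, 0.25],
                [0.25, 1.0]])
Sxy = np.array([[0.5, 0.1],
                [0.2, 0.4],
                [0.1, 0.3]])

A_mat = Sxy @ np.linalg.solve(Syy, Sxy.T)
rho2, A = eigh(A_mat, Sxx)                 # eigenvectors are Sigma_XX-orthonormal
rho2, A = rho2[::-1], A[:, ::-1]           # sort descending
rho = np.sqrt(np.clip(rho2, 0, None))[:2]  # m = min(p, q) = 2 informative pairs

B = np.linalg.solve(Syy, Sxy.T @ A[:, :2]) / rho   # b_k = Syy^{-1} Syx a_k / rho_k
print(np.round(A[:, :2].T @ Sxx @ A[:, :2], 6))    # ~ identity (within-set orthonormality)
print(np.round(B.T @ Syy @ B, 6))                  # ~ identity
print(np.round(A[:, :2].T @ Sxy @ B, 6))           # ~ diag(rho_1, rho_2)
```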

Numerical Solutions

The canonical correlations \rho_k and corresponding canonical variates can be obtained by performing a singular value decomposition (SVD) of the matrix \Sigma_{XX}^{-1/2} \Sigma_{XY} \Sigma_{YY}^{-1/2}, whose singular values \sigma_k = \rho_k provide the correlations, assuming the within-set covariance matrices are positive definite. Equivalently, an eigenvalue decomposition (EVD) of the symmetric matrix \Sigma_{XX}^{-1/2} \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX} \Sigma_{XX}^{-1/2} yields eigenvalues \rho_k^2, from which the canonical weight vectors are recovered as linear combinations of the original variables. Writing the whitened cross-covariance as \Sigma_{XX}^{-1/2} \Sigma_{XY} \Sigma_{YY}^{-1/2} = U D V^\top, the diagonal elements of D are the canonical correlations \rho_k, and the columns of U and V determine the canonical directions after transformation back to the original space. The SVD formulation is often the more numerically stable choice, particularly when the matrices are not square or when partial decompositions suffice for the leading correlations, because the whitening steps can be carried out with Cholesky or QR factorizations rather than explicit matrix inversion. For high-dimensional settings where p and q (the dimensions of the two variable sets) are large, direct EVD or full SVD becomes prohibitive, prompting iterative methods such as alternating least squares and related schemes. These algorithms initialize candidate vectors and iteratively refine them by solving reduced-rank regressions or projecting onto deflated subspaces until convergence, often recovering the top-k correlations with fewer operations than a full decomposition. For instance, the iterative least squares approach approximates solutions via regression-style updates and randomized projections, scaling to datasets with millions of samples and features. The computational complexity of standard EVD or SVD for CCA is O(\max(p, q)^3), dominated by the inversion and decomposition of the covariance matrices, making it suitable for moderate dimensions but inefficient for very high-dimensional problems. Scalable alternatives, such as randomized SVD, reduce this to near-linear time O(np \log k + (p + q)k^2) for extracting the top-k components, where n is the sample size, by projecting onto random subspaces before decomposition. When covariance matrices like \Sigma_{XX} are ill-conditioned or near-singular, which is common in high dimensions or small samples, ridge regularization stabilizes the solution by adding a penalty term \lambda I to the diagonals, effectively shrinking the eigenvalues and preventing inversion failures. This approach, known as regularized CCA, replaces the whitened matrix with (\Sigma_{XX} + \lambda I)^{-1/2} \Sigma_{XY} (\Sigma_{YY} + \lambda I)^{-1/2} before the SVD, balancing correlation maximization with numerical robustness.
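The following sketch illustrates the whitening-plus-SVD route using Cholesky factors and triangular solves instead of explicit inverses; the simulated data are an illustrative assumption.

```python
# Sketch of the SVD route: whiten each block with a Cholesky factor, then take the
# singular values of the whitened cross-covariance.
import numpy as np

rng = np.random.default_rng(2)
n, p, q = 300, 4, 3
Z = rng.normal(size=(n, 2))
X = Z @ rng.normal(size=(2, p)) + rng.normal(size=(n, p))
Y = Z @ rng.normal(size=(2, q)) + rng.normal(size=(n, q))
Xc, Yc = X - X.mean(0), Y - Y.mean(0)

Sxx, Syy, Sxy = Xc.T @ Xc / n, Yc.T @ Yc / n, Xc.T @ Yc / n
Lx, Ly = np.linalg.cholesky(Sxx), np.linalg.cholesky(Syy)   # Sxx = Lx Lx^T

# K plays the role of Sxx^{-1/2} Sxy Syy^{-1/2}, built with triangular solves only.
K = np.linalg.solve(Lx, np.linalg.solve(Ly, Sxy.T).T)
U, s, Vt = np.linalg.svd(K)
print("canonical correlations:", np.round(s[: min(p, q)], 3))

# Canonical weight vectors mapped back to the original coordinates.
A = np.linalg.solve(Lx.T, U[:, : min(p, q)])
B = np.linalg.solve(Ly.T, Vt.T[:, : min(p, q)])
```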

Implementation Considerations

Implementing canonical correlation analysis (CCA) requires attention to practical aspects such as available software tools, data preparation, and computational challenges to ensure reliable results. Several established software libraries provide built-in functions for CCA. In R, the cancor() function from the base stats package computes canonical correlations and variates between two data matrices. In Python, the CCA class in scikit-learn's cross_decomposition module fits linear CCA models using an iterative algorithm and integrates with the library's preprocessing and pipeline utilities. MATLAB offers the canoncorr function in the Statistics and Machine Learning Toolbox, which returns canonical coefficients and correlations while handling centering internally. Preprocessing is essential for valid CCA application, as the method assumes multivariate normal distributions and linear relationships. Data should be centered by subtracting the mean from each variable to remove location effects, a step performed automatically in many implementations but worth verifying manually. Handling missing values typically involves listwise deletion to retain complete cases or imputation methods like mean substitution, though advanced techniques such as expectation-maximization may be used for structured missingness to avoid bias. When the total number of variables across both sets (p + q) exceeds the sample size n, direct CCA fails due to singular covariance matrices; dimensionality reduction via principal component analysis (PCA) on each set beforehand is a standard remedy to project the data into a lower-dimensional space. Numerical stability is critical, particularly with ill-conditioned covariance matrices arising from collinear variables or small samples. Employing QR decomposition for whitening (orthogonalizing and scaling the data matrices) enhances robustness by avoiding explicit computation of covariance inverses, which can amplify errors in near-singular cases. This approach, often integrated into SVD-based solutions, prevents numerical overflow and ensures accurate decompositions underlying CCA. For scalability with large datasets, kernel CCA extends linear CCA to nonlinear relationships using kernel functions, enabling approximate solutions via techniques like iterative least squares that reduce memory demands. Libraries such as pyrcca in Python implement regularized kernel CCA, suitable for high-dimensional data where full kernel matrices would be prohibitive. A common pitfall in high-dimensional settings is overfitting, where spurious high correlations emerge due to noise, especially when p + q is much larger than n. To mitigate this, cross-validation for selecting the number of canonical variates is recommended, balancing model complexity against generalization performance.
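As a sketch of these preprocessing considerations, the example below standardizes each block, applies PCA when the combined variable count approaches the sample size, and then fits scikit-learn's CCA; the dimensions and component counts are illustrative choices rather than recommendations.

```python
# Sketch: standardize, reduce each block with PCA, then fit CCA.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(3)
n = 60
X = rng.normal(size=(n, 40))                                          # p = 40
Y = X[:, :5] @ rng.normal(size=(5, 30)) + rng.normal(size=(n, 30))    # q = 30, p + q > n

# Reduce each block before CCA so the within-set covariances are well conditioned.
Xr = PCA(n_components=10).fit_transform(StandardScaler().fit_transform(X))
Yr = PCA(n_components=10).fit_transform(StandardScaler().fit_transform(Y))

cca = CCA(n_components=3).fit(Xr, Yr)
U, V = cca.transform(Xr, Yr)
r = [np.corrcoef(U[:, k], V[:, k])[0, 1] for k in range(3)]
print("canonical correlations after PCA reduction:", np.round(r, 3))
```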

Statistical Inference

Hypothesis Testing

In canonical correlation analysis, hypothesis testing primarily addresses whether there exists a significant linear relationship between two sets of multivariate variables. The null hypothesis H_0 states that all population canonical correlations are zero, i.e., \rho_k = 0 for k = 1, \dots, m, where m = \min(p, q) and p, q are the dimensions of the two variable sets, implying no linear association between the sets. A common approach for testing this overall null hypothesis uses Wilks' lambda, defined as \Lambda = \prod_{k=1}^m (1 - r_k^2), where r_k are the sample canonical correlations. Under H_0 and for large samples, the statistic -[n - 1 - (p + q + 1)/2] \ln \Lambda approximately follows a \chi^2 distribution with pq degrees of freedom, where n is the sample size; rejection of H_0 indicates at least one significant canonical correlation. Another overall test is the Pillai-Bartlett trace, given by V = \sum_{k=1}^m r_k^2, which measures the total shared variance between the sets. This statistic is comparatively robust to violations of multivariate normality and is approximated by an F distribution for significance testing, with degrees of freedom depending on p, q, m, and n; smaller p-values suggest significant multivariate association. For assessing the significance of the individual canonical correlations, conditional on the preceding ones being nonzero, Bartlett's sequential approximation is used: the statistic \chi_i^2 = -[n - 1 - (p + q + 1)/2] \ln \prod_{k=i}^{m} (1 - r_k^2) approximately follows a \chi^2 distribution with (p - i + 1)(q - i + 1) degrees of freedom, testing whether the i-th and all smaller canonical correlations are zero. This test, originally proposed by Bartlett in 1941, helps identify the number of meaningful dimensions in the association. These parametric tests assume multivariate normality of the observations in both variable sets. When this assumption is violated, permutation tests provide a robust alternative by resampling the data under H_0 to generate an empirical null distribution for statistics like Wilks' lambda or the Pillai-Bartlett trace, enhancing validity in non-normal scenarios.
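A small sketch of Bartlett's sequential chi-square test is given below; the sample canonical correlations, sample size, and dimensions are illustrative assumptions.

```python
# Sketch of Bartlett's sequential chi-square test on sample canonical correlations.
import numpy as np
from scipy.stats import chi2

def bartlett_tests(r, n, p, q):
    """Test H0: rho_i = ... = rho_m = 0 for each starting dimension i."""
    r = np.asarray(r, dtype=float)
    m = len(r)
    results = []
    for i in range(m):
        lam = np.prod(1.0 - r[i:] ** 2)                    # partial Wilks' lambda
        stat = -(n - 1 - (p + q + 1) / 2.0) * np.log(lam)  # Bartlett's approximation
        df = (p - i) * (q - i)                             # (p - i + 1)(q - i + 1), 1-indexed
        results.append((i + 1, stat, df, chi2.sf(stat, df)))
    return results

for k, stat, df, pval in bartlett_tests([0.72, 0.31, 0.08], n=120, p=5, q=3):
    print(f"dims {k}..m: chi2 = {stat:6.2f}, df = {df:2d}, p = {pval:.4f}")
```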

Confidence Intervals and Power Analysis

Confidence intervals for canonical correlations quantify the precision of the sample estimates r_k, complementing significance tests by characterizing uncertainty about the population parameters \rho_k. Under multivariate normality and large sample sizes, the asymptotic variance of the first sample canonical correlation r_1 is approximated as \operatorname{Var}(r_1) \approx \frac{(1 - \rho_1^2)^2}{n}, where n is the sample size and \rho_1 is the population value; this delta-method approximation allows construction of Wald-type confidence intervals via normal theory. For higher-order correlations r_k (k > 1), the variance depends on the preceding correlations and requires adjustments, but the first-order approximation remains useful for initial assessments. Bootstrap methods offer robust alternatives, especially in small samples or with non-normal data, by resampling pairs (X_i, Y_i) to generate empirical distributions of r_k. The percentile bootstrap computes intervals as the 2.5th and 97.5th percentiles of the bootstrapped r_k values over B resamples (typically B = 1000), while bias-corrected and accelerated (BCa) methods adjust for bias and skewness in the bootstrap distribution. These approaches perform well for the first canonical correlation but may require more resamples for higher orders due to increased variability. Fieller's method extends to canonical correlations for constructing confidence intervals on ratios or differences involving \rho_k, such as \rho_k / \rho_1 or \rho_k - \rho_{k+1}, by inverting a test statistic and accounting for the correlation between estimates to avoid unbounded intervals. This is particularly useful when comparing the relative strength of canonical correlations across dimensions. Power analysis for CCA tests, such as those based on Wilks' lambda approximated by a non-central chi-squared distribution, evaluates the probability of detecting true associations under alternative hypotheses with specified \rho_k > 0. Simulations generate data under the alternative, compute the test statistic, and estimate power as the proportion of rejections at a given significance level (e.g., \alpha = 0.05); the non-centrality parameter \lambda scales with n, p, q, and \rho_k, enabling power assessment for a range of scenarios. Tools like R's CCA package facilitate these simulations by fitting models to the generated data and aggregating results over replications. Sample size determination targets a desired power (e.g., 80%) for detecting \rho_k above a threshold, incorporating the dimensions p and q; formulas or iterative simulations solve for n using the non-central chi-squared distribution, often recommending n \geq 10(p + q) as a minimum but scaling higher for modest effects (e.g., \rho_1 = 0.3). For reliable interpretation of multiple canonical functions, some guidelines suggest samples of 40 to 60 times the total number of variables.
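The percentile bootstrap described above can be sketched as follows, resampling observation pairs jointly and refitting CCA on each resample; the simulated data and the choice of B = 1000 are illustrative.

```python
# Sketch: percentile bootstrap interval for the first canonical correlation.
import numpy as np
from sklearn.cross_decomposition import CCA

def first_canonical_corr(X, Y):
    cca = CCA(n_components=1).fit(X, Y)
    U, V = cca.transform(X, Y)
    return np.corrcoef(U[:, 0], V[:, 0])[0, 1]

rng = np.random.default_rng(4)
n = 150
Z = rng.normal(size=(n, 1))
X = Z @ rng.normal(size=(1, 4)) + rng.normal(size=(n, 4))
Y = Z @ rng.normal(size=(1, 3)) + rng.normal(size=(n, 3))

boot = []
for _ in range(1000):
    idx = rng.integers(0, n, size=n)        # resample (X_i, Y_i) pairs jointly
    boot.append(first_canonical_corr(X[idx], Y[idx]))

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"r1 = {first_canonical_corr(X, Y):.3f}, 95% percentile CI = ({lo:.3f}, {hi:.3f})")
```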

Applications

Practical Uses

In psychology, canonical correlation analysis (CCA) has been applied to explore multivariate relationships between scores from different personality test batteries, such as the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) and the Millon Clinical Multiaxial Inventory-III (MCMI-III), revealing significant canonical correlations between their scales that highlight shared underlying constructs in personality assessment. For instance, studies in the 1990s and early 2000s examined associations between MMPI scales and the NEO Personality Inventory (NEO-PI), which measures the Big Five traits, to understand construct overlap across instruments, with canonical variates often showing strong links between neuroticism-related MMPI scales and NEO Neuroticism. In genomics and multi-omics research, CCA facilitates the integration of multi-view data from different biological assays, such as gene expression profiles and methylation measurements, to identify correlated patterns across datasets and uncover shared biological mechanisms driving phenotypes. Post-2010 applications have demonstrated its utility in cross-cohort studies, where CCA extracts latent features from high-dimensional data, explaining 39-50% of variation in count outcomes using proteomic and methylomic views. In economics, CCA links sets of macroeconomic indicators, such as GDP growth, inflation rates, and interest rates, to measures of firm performance, including profitability ratios and stock returns, enabling the identification of how aggregate economic conditions influence corporate outcomes. For example, analyses of emerging markets have shown significant canonical correlations between macroeconomic variables and stock market indices, providing insights into systemic risks for portfolio management. In machine learning, particularly multi-view learning, CCA serves as a foundational method for aligning representations across modalities without labels, such as pairing image and text representations to maximize their correlation in a shared latent space. Deep variants of CCA have been employed for tasks like image-text retrieval, where nonlinear projections achieve higher alignment accuracy than traditional methods, with applications in search engines demonstrating improved cross-modal retrieval performance. Interpretability in CCA relies on canonical loadings, which indicate the contribution of the original variables to the canonical variates (defined by the weight vectors \mathbf{a}_k and \mathbf{b}_k), allowing researchers to plot these loadings to visualize the key variables driving the correlations and to identify cross-set influences. Redundancy analysis further quantifies the proportion of variance in one variable set explained by the opposite canonical variate, providing a measure of practical shared information beyond raw correlations, often visualized in bar plots to assess the utility of extracted dimensions. Despite its strengths, classical CCA is sensitive to outliers, which can distort canonical correlations and variates, necessitating robust variants for real-world data prone to noise or anomalies.

Illustrative Examples

A common illustrative toy example involves exploring the relationship between anthropometric measures and blood pressure components using data from a large-scale survey of adults. In one such analysis, physical constitution variables (height, weight, chest circumference, upper arm circumference, and sitting height) were examined against circulatory variables (pulse rate, systolic blood pressure, and diastolic blood pressure) in a sample of 8,909 females. The first canonical correlation was 0.381 (p < 0.001), indicating a moderate association between the first canonical variates, where the physical type factor (dominated by height and weight loadings) positively relates to the blood pressure factor (with higher loadings on systolic and diastolic pressures). The second canonical correlation was 0.108 (p < 0.001), capturing a weaker link involving pulse rate and a contrast between weight and chest circumference. These loadings suggest that larger body size is associated with elevated blood pressure levels, though the correlations are modest because of population-level variability. For a real-data case, the Iris dataset provides a classic demonstration, splitting the four measurements into sepal dimensions (sepal length and width as X) and petal dimensions (petal length and width as Y) across 150 observations. Canonical correlation analysis yields two correlations since min(p, q) = 2: the first ρ₁ = 0.864 and the second ρ₂ = 0.484. The canonical variates are linear combinations defined by the following coefficients:
Variate             Sepal Length   Sepal Width   Petal Length   Petal Width
First (u₁, v₁)          -0.223        -0.007         -0.258        -0.006
Second (u₂, v₂)         -0.119         0.498         -0.091         0.549
These coefficients indicate that the first pair of variates is dominated by the length variables in both sets (larger negative coefficients for length, near-zero coefficients for width), capturing the dominant shared variance between sepal and petal shapes, while the second pair emphasizes the width variables. This example highlights how CCA reveals strong morphological associations within the flower data, with ρ₁² ≈ 0.747 explaining a substantial portion of the cross-covariance. In the perfect correlation scenario, consider two multivariate sets where Y = A X exactly, with no noise or error term. Here, the first canonical correlation ρ₁ equals 1, as the canonical variates can be perfectly aligned to capture the full linear dependency between the sets. Adding noise, such as Y = A X + ε where ε is random error, reduces ρ₁ below 1 by an amount depending on the noise variance; if ε = 0, ρ₁ remains 1, demonstrating CCA's ability to detect deterministic linear relations. For an anticorrelation example, suppose Y = -X directly, without noise. The maximum canonical correlation is still ρ₁ = 1, as the signs of the canonical variates can be flipped during computation to yield a positive correlation while preserving the absolute strength of the linear opposition between the sets. This illustrates that CCA measures the magnitude of the relationship, not its direction, with |ρ| reaching its theoretical maximum of 1 for perfect linear (anti)dependence. Visualizations aid in interpreting CCA results, such as a scree plot of the squared canonical correlations ρ_k², which descends from near 1 for the first dimension to smaller values, analogous to PCA scree plots for assessing dimensionality. Additionally, biplots of the canonical scores (projections of observations onto the variates) overlay variable loadings as vectors, revealing how samples cluster and which original variables drive the correlations in the reduced space. For instance, in the Iris example, a biplot would show tight alignment between sepal and petal score vectors for the first dimension, emphasizing the dominance of the length variables.
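A sketch reproducing the sepal-versus-petal split with scikit-learn is shown below for comparison with the values quoted above; note that scikit-learn's CCA uses an iterative algorithm, so signs and exact coefficient scalings may differ across implementations.

```python
# Sketch: CCA on the Iris data, sepal block versus petal block.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.cross_decomposition import CCA

iris = load_iris()
X = iris.data[:, :2]   # sepal length, sepal width
Y = iris.data[:, 2:]   # petal length, petal width

cca = CCA(n_components=2).fit(X, Y)
U, V = cca.transform(X, Y)
r = [np.corrcoef(U[:, k], V[:, k])[0, 1] for k in range(2)]
print("canonical correlations:", np.round(r, 3))
print("X weights:\n", np.round(cca.x_weights_, 3))
print("Y weights:\n", np.round(cca.y_weights_, 3))
```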

Connections to Principal Angles

Canonical correlation analysis (CCA) admits a geometric interpretation in which the centred data matrices X \in \mathbb{R}^{n \times p} and Y \in \mathbb{R}^{n \times q} are viewed as spanning subspaces (their column spaces) of \mathbb{R}^n, and the canonical vectors align the principal directions between these subspaces. The canonical correlations \rho_k are the cosines of the principal angles \theta_k between the subspaces, ordered as \theta_1 \leq \cdots \leq \theta_m with 0 \leq \theta_k \leq \pi/2, such that \cos \theta_k = \rho_k for k = 1, \dots, m = \min(p, q). The concept of principal angles between two subspaces originates with Jordan's work of 1875; in modern terms, the angles are defined through the singular value decomposition (SVD) of the matrix formed from orthonormal bases of the subspaces. Specifically, if U and V are orthonormal bases for the subspaces, the cosines of the principal angles are the singular values of U^\top V. In CCA, this relation is established by applying a whitening transformation to X, yielding X \Sigma_{XX}^{-1/2}, which normalizes the corresponding subspace to unit covariance; an SVD of the cross-covariance with the similarly whitened Y then yields the principal angles as the canonical correlations. This geometric linkage highlights how CCA quantifies the alignment between subspaces. In geometric applications, CCA serves as a measure of subspace similarity, for example in computer vision tasks such as pose estimation, where it aligns feature subspaces across different views to infer object orientations. Canonical correlation analysis extends principal component analysis (PCA) by examining relationships between two distinct sets of variables rather than within a single set. In PCA, linear combinations are derived to maximize variance and achieve decorrelation within the data, whereas CCA identifies pairs of linear combinations, one from each set, that maximize the correlation between them. A notable mathematical connection is that CCA can be formulated as applying a PCA-style decomposition to the cross-covariance matrix after whitening both variable sets to unit covariance, effectively normalizing the within-set structures before focusing on between-set associations. Additionally, when the two sets are identical (Y = X), all canonical correlations equal 1 (up to the rank), and in the whitened formulations used by several algorithms the canonical directions reduce to the principal components, linking CCA back to standard PCA. Partial least squares (PLS) regression shares conceptual similarities with CCA as a method for linking two multivariate sets but differs in its optimization criterion. While CCA maximizes the correlation between the projected variates, PLS maximizes their covariance, which emphasizes shared variance rather than normalized association strength. This makes PLS particularly advantageous in predictive settings, such as regression tasks where one set serves as predictors and the other as responses, because it directly incorporates the covariance structure for better forecasting performance. In contrast, CCA prioritizes uncovering underlying associations without a primary focus on prediction. CCA also generalizes multivariate analysis of variance (MANOVA) and multivariate analysis of covariance (MANCOVA) by accommodating two full sets of variables rather than a single set of dependent variables and a grouping structure.
MANOVA tests differences in means across groups for multiple dependent variables, but CCA extends this framework to explore linear relationships between two full sets of variables, treating one as predictors and the other as responses without assuming a single outcome structure. This generalization allows CCA to capture more nuanced multivariate dependencies beyond group mean comparisons. Procrustes analysis, which aligns configurations by rotation, translation, or scaling to minimize their differences, bears a resemblance to CCA under specific conditions. In orthogonal Procrustes analysis, the goal is to find a rotation matrix that maximizes the trace of the product of the transformed matrices, akin to CCA when the within-set covariance matrices are the identity (i.e., after whitening to uncorrelated unit-variance variables). Thus, Procrustes analysis can be seen as a simplified form of CCA for comparing pre-aligned or standardized configurations, often used in shape analysis or ordination comparisons. The choice between CCA and related methods depends on the analytical objective: CCA is ideal for exploratory investigations of associations between two variable sets, revealing maximal correlations without assuming causality or prediction needs, whereas PLS is favored for regression-oriented tasks where prediction accuracy is paramount because of its covariance maximization. For group-based comparisons, MANOVA may suffice if the focus is on mean differences, but CCA offers broader relational insights; Procrustes analysis is appropriate when aligning existing low-dimensional representations rather than deriving new projections.
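The principal-angle connection discussed above can be demonstrated numerically: the sketch below (with illustrative simulated data) compares SciPy's subspace_angles on the centred data matrices with the canonical correlations obtained from the sample covariance eigenproblem.

```python
# Sketch: canonical correlations as cosines of principal angles between the
# column spaces of the centred data matrices.
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(5)
n = 100
Z = rng.normal(size=(n, 2))
X = Z @ rng.normal(size=(2, 4)) + 0.3 * rng.normal(size=(n, 4))
Y = Z @ rng.normal(size=(2, 3)) + 0.3 * rng.normal(size=(n, 3))
Xc, Yc = X - X.mean(0), Y - Y.mean(0)

# Principal angles between span(Xc) and span(Yc), subspaces of R^n.
theta = subspace_angles(Xc, Yc)
print("cos(principal angles):", np.round(np.sort(np.cos(theta))[::-1], 3))

# Same quantities via the CCA eigenproblem on the sample covariances.
Sxx, Syy, Sxy = Xc.T @ Xc / n, Yc.T @ Yc / n, Xc.T @ Yc / n
M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
rho = np.sqrt(np.sort(np.linalg.eigvals(M).real)[::-1][:3])
print("canonical correlations:", np.round(rho, 3))
```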

Extensions and Variants

Probabilistic CCA

Probabilistic canonical correlation analysis (PCCA) offers a Bayesian generative framework for canonical correlation analysis, interpreting each pair of observed vectors \mathbf{x} \in \mathbb{R}^{p} and \mathbf{y} \in \mathbb{R}^{q} (the rows of the data matrices \mathbf{X} \in \mathbb{R}^{n \times p} and \mathbf{Y} \in \mathbb{R}^{n \times q}) as noisy linear projections of a shared low-dimensional latent variable \mathbf{z} \in \mathbb{R}^{r}, where r \leq \min(p, q) represents the effective correlation dimension. This model explicitly accounts for measurement noise and latent structure, enabling robust inference in the presence of imperfect data. The core probabilistic model is specified as \mathbf{x} = \mathbf{W}_X \mathbf{z} + \boldsymbol{\epsilon}_X and \mathbf{y} = \mathbf{W}_Y \mathbf{z} + \boldsymbol{\epsilon}_Y, where \mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_r), \boldsymbol{\epsilon}_X \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Psi}_X), and \boldsymbol{\epsilon}_Y \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Psi}_Y), with \boldsymbol{\Psi}_X and \boldsymbol{\Psi}_Y typically diagonal to aid identifiability. The conditional distribution p(\mathbf{x}, \mathbf{y} \mid \mathbf{z}) is multivariate Gaussian, and the observed-data likelihood p(\mathbf{x}, \mathbf{y}) arises from marginalizing over \mathbf{z}. Maximum likelihood estimation of the loading matrices \mathbf{W}_X, \mathbf{W}_Y and noise covariances \boldsymbol{\Psi}_X, \boldsymbol{\Psi}_Y proceeds via the expectation-maximization (EM) algorithm, which iteratively computes expectations over the latent \mathbf{z} in the E-step and maximizes the expected complete-data log-likelihood in the M-step. This yields point estimates for the canonical directions while naturally incorporating the noise model. For a full Bayesian treatment, posterior inference over the parameters and latents is approximated using variational Bayes or Markov chain Monte Carlo (MCMC) methods, providing uncertainty quantification for the canonical correlations \rho_k. Variational approximations, in particular, offer a scalable mean-field solution that balances computational efficiency with posterior fidelity. PCCA's latent variable formulation confers advantages in handling noisy measurements and missing data, as marginalization over \mathbf{z} allows imputation and inference without complete observations across views. These properties have proven valuable in multi-omics applications, such as integrating gene expression and methylation datasets, where a whitening-enhanced PCCA variant facilitates dimensionality reduction and correlation discovery amid high noise levels. When the model is fit by maximum likelihood, it recovers the classical canonical correlations, and the posterior expectations of the latent variables lie in the span of the classical canonical variates, establishing a direct probabilistic link to the deterministic approach. Recent extensions include Deep Dynamic Probabilistic CCA (D2PCCA) for analyzing nonlinear latent dynamics in time-series data from multimodal sources.
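A generative sketch of the PCCA model is shown below: it draws data from the per-observation model with illustrative parameter values, computes the canonical correlations implied by the model's covariance blocks, and compares them with plug-in estimates from a large simulated sample.

```python
# Sketch: simulate from the PCCA generative model and compare model-implied
# canonical correlations with plug-in estimates. Parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(6)
p, q, r, n = 4, 3, 2, 20000
Wx = rng.normal(scale=0.8, size=(p, r))
Wy = rng.normal(scale=0.8, size=(q, r))
Psi_x = np.diag(rng.uniform(0.3, 0.7, size=p))
Psi_y = np.diag(rng.uniform(0.3, 0.7, size=q))

def canon_corrs(Sxx, Syy, Sxy):
    M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
    return np.sqrt(np.clip(np.sort(np.linalg.eigvals(M).real)[::-1], 0, None))

# Model-implied covariance blocks and canonical correlations.
Sxx, Syy, Sxy = Wx @ Wx.T + Psi_x, Wy @ Wy.T + Psi_y, Wx @ Wy.T
print("model-implied:", np.round(canon_corrs(Sxx, Syy, Sxy)[:r], 3))

# Simulate x = Wx z + eps_x, y = Wy z + eps_y and re-estimate from the sample.
Z = rng.normal(size=(n, r))
X = Z @ Wx.T + rng.multivariate_normal(np.zeros(p), Psi_x, size=n)
Y = Z @ Wy.T + rng.multivariate_normal(np.zeros(q), Psi_y, size=n)
Xc, Yc = X - X.mean(0), Y - Y.mean(0)
print("sample:       ", np.round(
    canon_corrs(Xc.T @ Xc / n, Yc.T @ Yc / n, Xc.T @ Yc / n)[:r], 3))
```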

Regularized and Deep CCA

Regularized canonical correlation analysis (CCA) extends classical CCA to high-dimensional settings by incorporating penalty terms that promote sparsity and prevent overfitting. A common formulation adds L1 penalties on the canonical vectors, such as \lambda \| \mathbf{a} \|_1 + \mu \| \mathbf{b} \|_1, where \mathbf{a} and \mathbf{b} are the weight vectors for the two data views and \lambda, \mu > 0 are tuning parameters. This sparsity-inducing regularization selects relevant features by shrinking irrelevant coefficients to zero, making it suitable for applications such as genomics where the number of variables exceeds the sample size. The optimization is typically solved using alternating optimization or iterative penalized regression, with implementations available in open-source statistical software. Kernel CCA generalizes further to capture nonlinear relationships by mapping the input views into a high-dimensional feature space via feature maps \phi(\mathbf{X}) and \psi(\mathbf{Y}), for example with radial basis function (RBF) kernels defined as k(\mathbf{x}_i, \mathbf{x}_j) = \exp(-\gamma \| \mathbf{x}_i - \mathbf{x}_j \|^2). The canonical correlations are then computed through an eigendecomposition of the kernel (Gram) matrices in this implicit feature space, avoiding explicit computation of the mappings. This approach has been shown to outperform linear CCA in extracting nonlinear dependencies, with theoretical guarantees on statistical consistency under mild conditions. Deep CCA (DCCA), introduced in 2013, uses neural networks to learn complex nonlinear transformations f(\mathbf{X}; \theta_f) and g(\mathbf{Y}; \theta_g) of the two views such that the correlation between f(\mathbf{X}) and g(\mathbf{Y}) is maximized, typically with multilayer perceptrons trained by gradient-based optimization. The objective is \max_{\theta_f, \theta_g} \rho(f(\mathbf{X}), g(\mathbf{Y})) subject to unit-variance constraints on the outputs, enabling the discovery of hierarchical representations. DCCA has demonstrated superior performance over kernel CCA in multi-view representation learning tasks, such as aligning audio and visual speech features to achieve high correlations on benchmark datasets. Sparse variants of these methods, such as lasso-penalized CCA, apply L1 regularization specifically for variable selection in high-dimensional genomics data, identifying correlated gene expressions and DNA markers by solving \max \mathbf{a}^T \mathbf{X}^T \mathbf{Y} \mathbf{b} - \lambda (\| \mathbf{a} \|_1 + \| \mathbf{b} \|_1) via alternating optimization. This has been effective in quantifying associations between thousands of genetic features, selecting subsets of 10-50 key variables per canonical pair in expression-marker studies. Recent advancements include DeepGeoCCA (2024), a geometric extension that handles symmetric positive definite (SPD) matrices in multi-view settings, such as covariance structures from neurophysiological recordings, by embedding SPD manifolds into tangent spaces and applying deep CCA on the flattened representations. It achieves up to 15% higher correlations than standard DCCA in multi-modal EEG-fMRI fusion tasks, with applications extending to geospatial and sensor-based multi-view data.
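A minimal sketch of ridge-regularized CCA in the spirit described above is given below; the penalty values, dimensions, and simulated data are illustrative assumptions rather than recommended settings.

```python
# Sketch of ridge-regularized CCA: add lambda * I to each within-set covariance
# before whitening, keeping the problem well posed when p or q is close to n.
import numpy as np

def regularized_cca(X, Y, lam_x=0.1, lam_y=0.1, k=2):
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / n + lam_x * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / n + lam_y * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / n
    Lx, Ly = np.linalg.cholesky(Sxx), np.linalg.cholesky(Syy)
    K = np.linalg.solve(Lx, np.linalg.solve(Ly, Sxy.T).T)   # whitened cross-covariance
    U, s, Vt = np.linalg.svd(K)
    A = np.linalg.solve(Lx.T, U[:, :k])     # canonical weights for X
    B = np.linalg.solve(Ly.T, Vt.T[:, :k])  # canonical weights for Y
    return s[:k], A, B

rng = np.random.default_rng(7)
n = 50
Z = rng.normal(size=(n, 2))
X = Z @ rng.normal(size=(2, 40)) + rng.normal(size=(n, 40))   # p close to n
Y = Z @ rng.normal(size=(2, 30)) + rng.normal(size=(n, 30))
s, A, B = regularized_cca(X, Y, lam_x=0.5, lam_y=0.5)
print("regularized canonical correlations:", np.round(s, 3))
```

Because the within-set covariances are shrunk toward the identity, the reported values are regularized (shrunken) correlations rather than exact sample canonical correlations, which is the usual trade-off accepted in high-dimensional settings.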
