
Tucker decomposition

Tucker decomposition, closely related to the higher-order singular value decomposition (HOSVD), is a method that generalizes the matrix singular value decomposition (SVD) to higher-order tensors, representing an N-way array as the mode-n product of a core tensor and factor matrices along each mode. For an Nth-order tensor \mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}, the decomposition approximates \mathcal{X} \approx \mathcal{G} \times_1 \mathbf{U}^{(1)} \times_2 \mathbf{U}^{(2)} \cdots \times_N \mathbf{U}^{(N)}, where \mathcal{G} is the core tensor of size R_1 \times \cdots \times R_N (with R_n \leq I_n as the multilinear rank) and each \mathbf{U}^{(n)} \in \mathbb{R}^{I_n \times R_n} is an orthogonal factor matrix capturing the principal components in mode n. This form allows for compression, dimensionality reduction, and extraction of latent structures in multidimensional data, with the core tensor \mathcal{G} encoding interactions among the modes while the factor matrices represent mode-specific variations.

Named after psychometrician Ledyard R. Tucker, who introduced the method in the context of three-mode factor analysis in 1966, the decomposition builds on earlier work in tensor algebra and has evolved into a foundational tool in tensor analysis. Tucker's original formulation focused on three-mode data common in psychometrics, but it was later generalized to arbitrary orders and refined through concepts like the HOSVD, proposed by De Lathauwer et al. in 2000, which ensures an all-orthogonal core by using the leading singular vectors of each mode-n unfolding. Unlike the canonical polyadic (CP) decomposition, which expresses a tensor as a sum of rank-1 terms with a superdiagonal core, Tucker allows a dense core tensor, providing greater flexibility but sacrificing uniqueness: solutions are defined only up to invertible transformations of the factor matrices, with the inverse transformations absorbed into the core. In chemometrics, variants such as Tucker1 (reducing one mode, equivalent to PCA on an unfolding) and Tucker2 (reducing two modes) address specific computational or interpretive needs.

The decomposition finds broad applications across fields, including psychometrics for analyzing multi-way individual differences, remote sensing for hyperspectral image analysis, computer vision for tasks like face recognition via TensorFaces, and signal processing for multidimensional array compression. In data mining, it enables pattern discovery in large-scale tensors, such as those arising in recommender systems, by revealing multilinear relationships that matrix-based methods overlook. Computationally, algorithms like alternating least squares (as in the higher-order orthogonal iteration) or the sequentially truncated HOSVD balance approximation quality with scalability, though challenges persist for very large tensors, with advances in randomized sketching and parallel methods developed since the 2010s.

Introduction

Overview

Tucker decomposition is a tensor factorization method that generalizes the singular value decomposition (SVD) of matrices to higher-order tensors, expressing the tensor as a multilinear product involving a smaller core tensor and a set of orthogonal factor matrices, one for each mode of the tensor. This approach enables the approximation and analysis of multi-dimensional data by capturing the essential structure through reduced-rank representations along multiple dimensions simultaneously. The primary motivation for Tucker decomposition lies in its ability to model multi-way interactions in data more effectively than traditional pairwise techniques like principal component analysis (PCA), which flatten multi-dimensional arrays into matrices and thereby lose inherent structural relationships. By preserving the multi-way nature of the data, it facilitates dimensionality reduction and feature extraction that respect the original tensor's modes, leading to more interpretable latent factors. For instance, consider a three-way tensor representing a set of RGB images, where the modes correspond to pixel rows, pixel columns, and color channels; Tucker decomposition can reveal underlying factors such as spatial patterns along rows and columns, and color interactions across slices, allowing for compact representations of image variations (a shape-level sketch of such a model appears below). Named after the psychometrician Ledyard R. Tucker, who introduced the method in the 1960s for analyzing multi-mode data in psychometrics, it has since become a foundational tool in multilinear algebra and tensor analysis.
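
The following minimal NumPy sketch illustrates only the shape bookkeeping of a Tucker model for the RGB example above; the array, the chosen ranks, and the variable names are illustrative assumptions, not part of any standard interface.

```python
import numpy as np

# Stand-in for a 128 x 128 RGB image: modes are (rows, columns, color channels).
X = np.random.rand(128, 128, 3)

# Hypothetical multilinear ranks (R1, R2, R3) chosen purely for illustration.
ranks = (20, 20, 3)

# Shapes implied by the Tucker model X ≈ G x1 U1 x2 U2 x3 U3.
core_shape = ranks                                         # G in R^{20 x 20 x 3}
factor_shapes = [(I, R) for I, R in zip(X.shape, ranks)]   # U^(n) in R^{I_n x R_n}

# Parameter count of the compressed form versus the raw tensor.
n_params = int(np.prod(ranks)) + sum(I * R for I, R in factor_shapes)
print(core_shape, factor_shapes)   # (20, 20, 3) [(128, 20), (128, 20), (3, 3)]
print(n_params, X.size)            # 6329 vs 49152
```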

History

The Tucker decomposition was first introduced by Ledyard R. Tucker in 1963 as a three-mode factor analysis technique specifically designed for analyzing psychological data involving multiple modes, such as subjects, variables, and occasions. This initial formulation addressed the challenges of measuring change in multidimensional datasets, extending traditional two-way factor analysis to higher dimensions. Tucker further refined and formalized the method in his 1966 paper, providing mathematical notes on three-mode factor analysis that established a clearer framework for its implementation. The method was soon generalized to tensors of arbitrary order N in subsequent theoretical works during the late 1960s and 1970s. Related works, including Levin's 1965 contribution, offered additional clarifications on notation and generalized treatments, enhancing the method's accessibility and theoretical foundations.

During the 1970s and 1980s, the decomposition saw early adoption in psychometrics for multi-way data analysis and in chemometrics as a higher-order analogue to principal component analysis, facilitating the analysis of complex experimental data. A pivotal milestone was the 2000 introduction of the higher-order singular value decomposition (HOSVD) by De Lathauwer et al., which provided an efficient, quasi-optimal approximation algorithm based on Tucker's model. In the 2000s, Tucker decomposition became integrated into the broader field of tensor analysis, with influential surveys like that of Kolda and Bader in 2009 highlighting its role in data compression and multilinear algebra. The 2010s marked expanded applications in machine learning, where it supported tasks like feature extraction and dimensionality reduction in high-dimensional datasets. More recent developments from 2023 to 2025 have focused on streaming and parallel algorithms, enabling scalable processing of big data in dynamic environments.

Mathematical Formulation

Tensor Notation and Preliminaries

An N-way tensor, also known as an Nth-order tensor, is defined as a multi-dimensional array \mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}, where N denotes the order or number of dimensions, and I_n represents the size of the nth mode for n = 1, \dots, N. This generalizes scalars (N=0), vectors (N=1), and matrices (N=2) to higher dimensions.

In tensor analysis, fibers and slices provide structured views of the data. A mode-n fiber is a vector obtained by fixing all indices of \mathcal{X} except the nth one, serving as the higher-order analogue to rows or columns in matrices. For instance, in a third-order tensor \mathcal{X} \in \mathbb{R}^{I \times J \times K}, the mode-1 fibers are the columns of the frontal slices. A slice, conversely, is a matrix formed by fixing all indices except two, yielding two-dimensional sections such as horizontal, lateral, or frontal slices in the third-order case.

The mode-n unfolding, or matricization, rearranges the elements of \mathcal{X} into a matrix \mathbf{X}_{(n)} \in \mathbb{R}^{I_n \times (\prod_{k \neq n} I_k)}, whose columns are the mode-n fibers. The mapping of elements follows the convention that the (i_n, j) entry of \mathbf{X}_{(n)} corresponds to the tensor element (i_1, \dots, i_N) with column index j = 1 + \sum_{k \neq n} (i_k - 1) J_k, where J_k = \prod_{m=1, m \neq n}^{k-1} I_m. This operation facilitates the application of linear algebra techniques to tensors by reducing them to matrix forms along specific modes.

The n-mode tensor-matrix product between a tensor \mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N} and a matrix \mathbf{U} \in \mathbb{R}^{J_n \times I_n} is denoted \mathcal{Y} = \mathcal{X} \times_n \mathbf{U}, resulting in a tensor \mathcal{Y} \in \mathbb{R}^{I_1 \times \cdots \times I_{n-1} \times J_n \times I_{n+1} \times \cdots \times I_N}. Element-wise, y_{i_1 \cdots i_{n-1} j i_{n+1} \cdots i_N} = \sum_{i_n=1}^{I_n} x_{i_1 \cdots i_n \cdots i_N} u_{j i_n}; under the Einstein summation convention, in which repeated indices imply summation, the explicit summation symbol over i_n may be omitted. Equivalently, in matrix form, \mathbf{Y}_{(n)} = \mathbf{U} \mathbf{X}_{(n)}. Both operations are illustrated in the code sketch at the end of this subsection.

Orthogonal matrices play a fundamental role in tensor factorizations, characterized by the property \mathbf{Q}^\top \mathbf{Q} = \mathbf{I}, where \mathbf{I} is the identity matrix, ensuring the columns are orthonormal. This orthonormality preserves norms and inner products in the transformed spaces, which is essential for decompositions that seek low-rank approximations without introducing scaling ambiguities.
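
As a concrete illustration of the definitions above, the following NumPy sketch implements the mode-n unfolding and the n-mode product. The helper names unfold and mode_n_product are ad hoc choices for this example rather than functions from any particular library, and the column ordering of this unfolding follows NumPy's row-major reshape, which may differ from the index formula above; the decompositions built on it are unaffected by that choice.

```python
import numpy as np

def unfold(X, n):
    """Mode-n unfolding: move mode n to the front, then flatten the rest.

    The mode-n fibers become the columns of the resulting
    I_n x (product of the remaining sizes) matrix.
    """
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_n_product(X, U, n):
    """n-mode product Y = X x_n U for U of shape (J_n, I_n), via Y_(n) = U X_(n)."""
    Y_unf = U @ unfold(X, n)
    rest = [s for i, s in enumerate(X.shape) if i != n]
    # Fold back: reshape with the new mode leading, then move it to position n.
    return np.moveaxis(Y_unf.reshape([U.shape[0]] + rest), 0, n)

# Small usage check on a random 3 x 4 x 5 tensor.
X = np.random.rand(3, 4, 5)
U = np.random.rand(2, 4)            # acts on mode 1 (size 4 -> 2)
Y = mode_n_product(X, U, 1)
print(unfold(X, 1).shape, Y.shape)  # (4, 15) (3, 2, 5)
```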

The Tucker Decomposition Model

The Tucker decomposition, originally proposed for three-mode factor analysis, provides a framework for representing an N-way tensor \mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N} as a product of a core tensor and factor matrices along each mode. The exact decomposition is given by \mathcal{X} = \mathcal{G} \times_1 \mathbf{U}^{(1)} \times_2 \mathbf{U}^{(2)} \cdots \times_N \mathbf{U}^{(N)}, where \mathcal{G} \in \mathbb{R}^{R_1 \times \cdots \times R_N} is the core tensor of multilinear rank (R_1, \dots, R_N), and each factor matrix \mathbf{U}^{(n)} \in \mathbb{R}^{I_n \times R_n} (for n = 1, \dots, N) has orthonormal columns that form a basis for the mode-n subspace. The core tensor \mathcal{G} encodes the multi-way interactions among the components across all modes, with its entries indicating the relative importance or strength of combinations of basis vectors from the factor matrices. Each matrix \mathbf{U}^{(n)} captures the principal directions or basis vectors specific to the nth mode, analogous to the loadings in principal component analysis for that mode. This structure allows the decomposition to generalize lower-order factorizations to higher dimensions while preserving multilinear relationships in the data.

For approximation purposes, a truncated Tucker decomposition is often used, where the core dimensions satisfy R_n < I_n for some or all modes, yielding \mathcal{X} \approx \mathcal{G} \times_1 \mathbf{U}^{(1)} \times_2 \mathbf{U}^{(2)} \cdots \times_N \mathbf{U}^{(N)}. This low-rank form enables dimensionality reduction by retaining only the most significant interactions and basis vectors, with the approximation error controlled by the choice of ranks. The reconstructed tensor can also be expressed through its mode-n unfolding as \mathbf{X}_{(n)} = \mathbf{U}^{(n)} \mathbf{G}_{(n)} (\mathbf{U}^{(N)} \otimes \cdots \otimes \mathbf{U}^{(n+1)} \otimes \mathbf{U}^{(n-1)} \otimes \cdots \otimes \mathbf{U}^{(1)})^\top, where \mathbf{X}_{(n)} and \mathbf{G}_{(n)} are the unfoldings of \mathcal{X} and \mathcal{G} along mode n, and \otimes denotes the Kronecker product. In the special case of an order-2 tensor (i.e., a matrix \mathbf{X} \in \mathbb{R}^{I_1 \times I_2}), the Tucker decomposition reduces to the singular value decomposition (SVD), \mathbf{X} \approx \mathbf{U}^{(1)} \mathbf{\Sigma} (\mathbf{U}^{(2)})^\top, where the core \mathbf{\Sigma} is diagonal and the factor matrices contain the singular vectors.
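
A minimal NumPy sketch of this reconstruction is shown below; it applies the n-mode products one mode at a time. The function name tucker_reconstruct and the toy sizes are illustrative assumptions.

```python
import numpy as np

def tucker_reconstruct(G, factors):
    """Rebuild X ≈ G x_1 U^(1) x_2 ... x_N U^(N) from the core G and factor list."""
    X_hat = G
    for n, U in enumerate(factors):
        # n-mode product: contract mode n of X_hat with the second axis of U,
        # then move the resulting axis back to position n.
        X_hat = np.moveaxis(np.tensordot(U, X_hat, axes=(1, n)), 0, n)
    return X_hat

# Toy example: a (2, 3, 2) core expanded to a (5, 6, 4) tensor.
G = np.random.rand(2, 3, 2)
factors = [np.linalg.qr(np.random.rand(I, R))[0]       # orthonormal columns
           for I, R in zip((5, 6, 4), (2, 3, 2))]
X_hat = tucker_reconstruct(G, factors)
print(X_hat.shape)  # (5, 6, 4)
```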

Properties

Multilinear Rank and Approximation

In tensor analysis, the mode-n rank of an N-way tensor \mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}, denoted \operatorname{rank}_n(\mathcal{X}), is defined as the rank of the mode-n unfolding \mathbf{X}_{(n)}, the matrix obtained by reshaping \mathcal{X} so that the mode-n fibers form its columns. This rank equals the dimension of the subspace spanned by the mode-n fibers of \mathcal{X}. The multilinear rank of \mathcal{X} is the tuple (R_1, \dots, R_N), where R_n = \operatorname{rank}_n(\mathcal{X}) for each n = 1, \dots, N. In the Tucker decomposition model, the core tensor \mathcal{G} has dimensions matching this multilinear rank, so \mathcal{G} \in \mathbb{R}^{R_1 \times \cdots \times R_N}; a desired multilinear rank (R_1, \dots, R_N) likewise specifies the reduced dimensions of the factor matrices, enabling a compact representation.

The best approximation of a given multilinear rank (R_1, \dots, R_N) is the truncated Tucker model \hat{\mathcal{X}} = \mathcal{G} \times_1 \mathbf{U}^{(1)} \cdots \times_N \mathbf{U}^{(N)} that minimizes the squared Frobenius norm error \|\mathcal{X} - \hat{\mathcal{X}}\|_F^2 over the core tensor \mathcal{G} and orthonormal factor matrices \mathbf{U}^{(n)} \in \mathbb{R}^{I_n \times R_n}; unlike the matrix case, this minimization has no closed-form solution that attains the global optimum. A smaller multilinear rank leads to a more compact tensor representation, as the total number of parameters is \sum_{n=1}^N I_n R_n + \prod_{n=1}^N R_n, which is substantially less than the original tensor size \prod_{n=1}^N I_n when the R_n are low relative to the I_n; this property is particularly useful for data storage and efficient computation (see the sketch following this subsection).
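
As an illustration of these definitions, the following NumPy sketch estimates the multilinear rank of a tensor from the matrix ranks of its unfoldings and compares the Tucker parameter count against the raw tensor size. The tensor is built synthetically so that its multilinear rank is known, and all names are local to this example.

```python
import numpy as np

def multilinear_rank(X, tol=None):
    """Estimate (rank of X_(1), ..., rank of X_(N)) via ranks of the unfoldings."""
    ranks = []
    for n in range(X.ndim):
        X_n = np.moveaxis(X, n, 0).reshape(X.shape[n], -1)  # mode-n unfolding
        ranks.append(np.linalg.matrix_rank(X_n, tol=tol))
    return tuple(ranks)

# Build a tensor with known multilinear rank (2, 3, 2) by expanding a small core.
G = np.random.rand(2, 3, 2)
U = [np.random.rand(I, R) for I, R in zip((10, 12, 8), (2, 3, 2))]
X = G
for n, Un in enumerate(U):
    X = np.moveaxis(np.tensordot(Un, X, axes=(1, n)), 0, n)

print(multilinear_rank(X))                                    # (2, 3, 2)
n_tucker = sum(I * R for I, R in zip(X.shape, (2, 3, 2))) + 2 * 3 * 2
print(n_tucker, X.size)                                       # 84 vs 960
```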

Uniqueness Conditions

The Tucker decomposition of a tensor is inherently non-unique because of rotation ambiguities: each factor matrix can be post-multiplied by an arbitrary nonsingular matrix, with the inverse transformation absorbed into the core tensor to yield an equivalent representation. This flexibility arises from the model's structure, allowing multiple sets of factors and cores to reconstruct the same tensor. To mitigate these ambiguities, the factor matrices are typically constrained to have orthonormal columns, as in the higher-order singular value decomposition (HOSVD), which fixes the subspaces spanned by the factors. Under this orthonormality condition, and assuming the core tensor has no additional symmetry (such as proportional slices), the decomposition becomes unique up to sign flips and, more generally, orthogonal rotations of the factor columns, with corresponding adjustments in the core entries.

Partial uniqueness can be achieved when the core tensor has full multilinear rank (the ranks of its mode-n unfoldings match the core dimensions) and the factor matrices are orthogonal with full column rank, ensuring that the mode-n subspaces are uniquely determined without further rotational indeterminacy. In such cases, the decomposition is essentially unique modulo permutations and scalings of the core elements, provided the factor matrices satisfy conditions such as sufficient spread in their entries to avoid degenerate configurations. However, limitations persist: ambiguities reemerge if the core tensor features proportional slices across modes, effectively reducing its rank and allowing non-trivial transformations that preserve the reconstructed tensor, or if the factor matrices lack full column rank, leading to underdetermined subspaces. These conditions highlight the importance of the core's structural properties in enforcing identifiability.

Algorithms

Higher-Order Singular Value Decomposition (HOSVD)

The higher-order singular value decomposition (HOSVD), also known as the multilinear singular value decomposition, provides a foundational algorithm for computing the Tucker decomposition of an Nth-order tensor \mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N} by generalizing the matrix singular value decomposition (SVD) to higher orders. Introduced as a direct, non-iterative method, HOSVD computes orthogonal factor matrices and a core tensor through SVDs of the tensor's mode-n unfoldings, yielding a quasi-optimal low-rank approximation without requiring iterative optimization. This approach ensures that the factor matrices capture the principal components along each mode independently, making it particularly suitable for initial approximations in tensor analysis.

The algorithm proceeds in three main steps. First, for each mode n = 1, \dots, N, compute the SVD of the mode-n unfolding \mathbf{X}_{(n)} \in \mathbb{R}^{I_n \times J_n} (where J_n = \prod_{k \neq n} I_k), given by \mathbf{X}_{(n)} = \mathbf{U}^{(n)} \mathbf{S}^{(n)} (\mathbf{V}^{(n)})^\top, where \mathbf{U}^{(n)} \in \mathbb{R}^{I_n \times I_n} contains the left singular vectors, \mathbf{S}^{(n)} is the diagonal matrix of singular values, and \mathbf{V}^{(n)} contains the right singular vectors. Second, select the first R_n columns of each \mathbf{U}^{(n)} to form the orthogonal factor matrix \mathbf{U}_n \in \mathbb{R}^{I_n \times R_n}, where R_n is the desired multilinear rank for mode n. Third, compute the core tensor \mathcal{G} \in \mathbb{R}^{R_1 \times \cdots \times R_N} via successive mode-n products: \mathcal{G} = \mathcal{X} \times_1 \mathbf{U}_1^\top \times_2 \mathbf{U}_2^\top \cdots \times_N \mathbf{U}_N^\top. The resulting Tucker decomposition is \mathcal{X} \approx \mathcal{G} \times_1 \mathbf{U}_1 \times_2 \mathbf{U}_2 \cdots \times_N \mathbf{U}_N, with all factor matrices orthonormal and the core tensor all-orthogonal (its subtensors obtained by fixing a single index in any one mode are mutually orthogonal). A NumPy sketch of these steps follows this subsection.

HOSVD possesses several key properties that align it closely with the matrix SVD. The factor matrices are orthonormal by construction, and the Frobenius norms of the core tensor's subtensors are non-increasing along each mode. For approximation, the truncated HOSVD provides a quasi-best low multilinear-rank representation, with squared approximation error bounded by \|\mathcal{X} - \hat{\mathcal{X}}\|_F^2 \leq \sum_{n=1}^N \sum_{i=R_n+1}^{I_n} (\sigma_i^{(n)})^2, where \sigma_i^{(n)} denotes the singular values of \mathbf{X}_{(n)}; this bound is at most N times the squared error of the optimal Tucker approximation of the same multilinear rank, so the truncated HOSVD error exceeds the optimal error by at most a factor of \sqrt{N}. The n-mode singular values are unique, though the singular vectors are unique only up to sign or, for repeated singular values, unitary transformations. Computationally, the cost is dominated by the N SVDs, each requiring O(I_n J_n \min(I_n, J_n)) operations; for large tensors, unfoldings combined with partial (rank-R_n) SVDs reduce the overall cost to roughly O(N \prod_k I_k \cdot \max_n R_n).

HOSVD offers notable advantages, including its simplicity and parallelizability, as the mode-wise SVDs can be computed independently, facilitating distributed implementation on modern hardware. It also serves as an effective initialization for more advanced Tucker algorithms, capturing mode-specific variations without iteration. However, drawbacks arise from its greedy, non-iterative nature: the core and factors are not jointly optimal, so the approximation error can be larger than necessary, in the worst case by the factor of \sqrt{N} noted above.
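
The following self-contained NumPy sketch implements the truncated HOSVD as described above: take the leading left singular vectors of each unfolding, then form the core by mode-n products with their transposes. The function names and test sizes are illustrative.

```python
import numpy as np

def unfold(X, n):
    """Mode-n unfolding with the mode-n fibers as columns."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_dot(X, U, n):
    """n-mode product X x_n U for U of shape (J_n, I_n)."""
    return np.moveaxis(np.tensordot(U, X, axes=(1, n)), 0, n)

def truncated_hosvd(X, ranks):
    """Truncated HOSVD: factors from leading singular vectors of each unfolding."""
    factors = []
    for n, R in enumerate(ranks):
        # Left singular vectors of the mode-n unfolding; keep the first R columns.
        U, _, _ = np.linalg.svd(unfold(X, n), full_matrices=False)
        factors.append(U[:, :R])
    # Core: project X onto the retained subspaces, G = X x_1 U1^T ... x_N UN^T.
    G = X
    for n, U in enumerate(factors):
        G = mode_dot(G, U.T, n)
    return G, factors

# Usage: approximate a random 20 x 30 x 10 tensor with multilinear rank (5, 5, 3).
X = np.random.rand(20, 30, 10)
G, factors = truncated_hosvd(X, (5, 5, 3))
X_hat = G
for n, U in enumerate(factors):
    X_hat = mode_dot(X_hat, U, n)
print(G.shape, np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```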

Higher-Order Orthogonal Iteration (HOOI)

The higher-order orthogonal iteration (HOOI) computes an orthogonal Tucker decomposition by iteratively refining the factor matrices to minimize the Frobenius norm error between the original tensor and its approximation. Introduced as an enhancement to the HOSVD, HOOI operates within an alternating least squares (ALS) framework, in which the core tensor and factor matrices are updated successively until convergence. Typically, the algorithm initializes the factor matrices \mathbf{U}^{(n)} with the leading singular vectors from the HOSVD, which provides a strong starting point close to the optimal solution.

In each iteration, for every mode n = 1, \dots, N, the factor matrix \mathbf{U}^{(n)} is updated by solving the least-squares problem \mathbf{U}^{(n)} = \arg\min_{\mathbf{U}} \|\mathbf{X}_{(n)} - \mathbf{U} \mathbf{G}_{(n)} \mathbf{W}^\top\|_F, where \mathbf{X}_{(n)} is the mode-n unfolding of the tensor \mathcal{X}, \mathbf{G}_{(n)} is the mode-n unfolding of the current core tensor \mathcal{G}, and \mathbf{W} is the Kronecker product of the other factor matrices \mathbf{U}^{(m)} for m \neq n. Because the factor matrices have orthonormal columns (so \mathbf{W}^\top \mathbf{W} = \mathbf{I}), this update reduces to taking the R_n leading left singular vectors of \mathbf{X}_{(n)} \mathbf{W}, that is, of the unfolding of \mathcal{X} projected onto the other modes' subspaces. After all factor matrices have been updated, the core tensor is recomputed as \mathcal{G} = \mathcal{X} \times_1 (\mathbf{U}^{(1)})^\top \times_2 (\mathbf{U}^{(2)})^\top \cdots \times_N (\mathbf{U}^{(N)})^\top. This process repeats, with the approximation error non-increasing at each step, a standard property of ALS. A NumPy sketch of this iteration loop follows this subsection.

HOOI converges to a local minimum in the ALS sense, typically requiring on the order of 5–20 iterations depending on the tensor's structure and rank choices. The stopping criteria are usually based on the relative change in the fit (defined as 1 - \|\mathcal{X} - \hat{\mathcal{X}}\|_F / \|\mathcal{X}\|_F) falling below a small threshold, such as 10^{-6}, or on reaching a maximum number of iterations to prevent excessive computation. Compared to the non-iterative HOSVD, HOOI's complexity is higher because of the repeated SVDs, with a per-iteration cost of roughly O\left( \left( \prod_k I_k \right) \sum_n R_n + \sum_n I_n \left( \prod_{k \neq n} R_k \right)^2 \right) in the dominant terms, where the I_k are the tensor dimensions and the R_k the multilinear ranks; this makes it suitable for moderate-sized tensors but challenging for very large ones without approximations. Variants of HOOI include the successive (mode-by-mode) update scheme, which is the standard approach updating one mode at a time within each full iteration, and less common full-update versions that optimize all factors simultaneously per cycle, though the successive method is preferred for its stability and lower memory requirements. HOOI improves upon the HOSVD initialization by capturing inter-mode dependencies more effectively.
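
Below is a compact NumPy sketch of the HOOI iteration as described above, reusing HOSVD-style initialization. The function name hooi, the rank tuple, and the convergence tolerance are illustrative choices, and the projection onto the other modes' subspaces is done with mode-n products rather than an explicit Kronecker product.

```python
import numpy as np

def unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_dot(X, U, n):
    return np.moveaxis(np.tensordot(U, X, axes=(1, n)), 0, n)

def hooi(X, ranks, n_iter=50, tol=1e-6):
    """Higher-order orthogonal iteration for a Tucker model of the given ranks."""
    # HOSVD-style initialization: leading left singular vectors of each unfolding.
    factors = [np.linalg.svd(unfold(X, n), full_matrices=False)[0][:, :R]
               for n, R in enumerate(ranks)]
    prev_fit = 0.0
    for _ in range(n_iter):
        for n in range(X.ndim):
            # Project X onto the current subspaces of all modes except n ...
            Y = X
            for m in range(X.ndim):
                if m != n:
                    Y = mode_dot(Y, factors[m].T, m)
            # ... and keep the leading left singular vectors of its mode-n unfolding.
            factors[n] = np.linalg.svd(unfold(Y, n),
                                       full_matrices=False)[0][:, :ranks[n]]
        # Recompute the core and check the change in fit.
        G = X
        for m in range(X.ndim):
            G = mode_dot(G, factors[m].T, m)
        # With orthonormal factors, ||X - X_hat||^2 = ||X||^2 - ||G||^2.
        err = np.sqrt(max(np.linalg.norm(X) ** 2 - np.linalg.norm(G) ** 2, 0.0))
        fit = 1.0 - err / np.linalg.norm(X)
        if abs(fit - prev_fit) < tol:
            break
        prev_fit = fit
    return G, factors

# Usage on a random tensor.
X = np.random.rand(20, 30, 10)
G, factors = hooi(X, (5, 5, 3))
print(G.shape, [U.shape for U in factors])
```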

Applications

Data Compression and Dimensionality Reduction

Tucker decomposition enables efficient data compression for multi-way arrays by approximating an Nth-order tensor \mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N} as \hat{\mathcal{X}} = \mathcal{G} \times_1 \mathbf{U}^{(1)} \times_2 \cdots \times_N \mathbf{U}^{(N)}, where \mathcal{G} \in \mathbb{R}^{R_1 \times \cdots \times R_N} is a compact core tensor and the \mathbf{U}^{(n)} \in \mathbb{R}^{I_n \times R_n} are factor matrices for each mode n. The storage cost of this representation is \prod_{n=1}^N R_n + \sum_{n=1}^N I_n R_n, which yields substantial savings compared to the original \prod_{n=1}^N I_n elements when the multilinear ranks satisfy R_n \ll I_n, as the core captures multilinear interactions while the factors encode mode-specific bases. This compression inherently performs dimensionality reduction by projecting the tensor onto lower-dimensional subspaces along each mode, generalizing principal component analysis (PCA) to higher-order data, where each mode undergoes its own orthogonal transformation rather than a single vectorized reduction. The resulting low-rank approximation retains dominant structural features, facilitating analysis of high-dimensional datasets such as those from multi-sensor arrays.

For instance, in hyperspectral imaging, a third-order tensor of size 100 \times 100 \times 200 (representing spatial-spatial-spectral dimensions) can be compressed to multilinear ranks (10, 10, 20), reducing storage by over 90% while maintaining high fidelity for subsequent processing. The effectiveness of such approximations is quantified by the explained variance, 1 - \frac{\|\mathcal{X} - \hat{\mathcal{X}}\|_F^2}{\|\mathcal{X}\|_F^2}, where \|\cdot\|_F is the Frobenius norm, which measures the proportion of the original tensor's energy captured by the model; the sketch after this subsection works through this example. Recent developments, such as block incremental Tucker decomposition, extend these benefits to streaming scenarios by updating the core and factors incrementally as new data blocks arrive, enabling real-time compression in resource-constrained environments like IoT sensor networks for applications including environmental monitoring.
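
The following short NumPy calculation works through the compression arithmetic for the hyperspectral example above and shows how the explained variance would be computed from an approximation; the explained-variance check uses a randomly perturbed copy purely to make the snippet runnable.

```python
import numpy as np

# Storage for the hyperspectral example: a 100 x 100 x 200 tensor compressed
# to multilinear ranks (10, 10, 20).
I = (100, 100, 200)
R = (10, 10, 20)
original = int(np.prod(I))                                   # 2,000,000 entries
compressed = int(np.prod(R)) + sum(i * r for i, r in zip(I, R))  # core + factors
print(original, compressed, 1 - compressed / original)       # ~99.6% fewer entries

def explained_variance(X, X_hat):
    """Proportion of the tensor's energy captured by the approximation X_hat."""
    return 1.0 - np.linalg.norm(X - X_hat) ** 2 / np.linalg.norm(X) ** 2

# Toy check: a mildly perturbed copy of a random tensor explains most of its energy.
X = np.random.rand(20, 20, 20)
X_hat = X + 0.01 * np.random.randn(*X.shape)
print(explained_variance(X, X_hat))   # close to 1
```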

Signal Processing and Biomedical Imaging

In signal processing, Tucker decomposition facilitates denoising of multi-way data by exploiting low-rank approximations, particularly for radar array signals represented as space-time tensors. For instance, an improved Tucker algorithm has been applied to suppress noise and interference in ground-penetrating radar data, enhancing underground target detection accuracy by reconstructing clean signals from noisy observations. Similarly, in side-scan sonar imaging, Tucker-based methods reduce speckle noise while preserving edge details in underwater acoustic images. For blind source separation in multi-sensor environments, multiway blind source separation techniques leveraging Tucker decomposition enable the extraction of unique underlying components from large-scale tensor data, outperforming traditional matrix-based independent component analysis in handling spatiotemporal correlations. Tucker decomposition offers advantages over matrix methods by capturing multi-dimensional interactions, such as space-frequency-time dependencies in signals, which matrix approaches cannot fully represent without unfolding the data and losing structural information.

In biomedical imaging, Tucker decomposition is used to analyze functional magnetic resonance imaging (fMRI) data structured as subject-by-time-by-voxel tensors to extract shared functional modes across individuals. For example, Tucker-2 decomposition identifies group-level activation patterns in multi-subject fMRI datasets during varying sleep stages, revealing intrinsic brain network dynamics. In electroencephalography (EEG), tensor decompositions including Tucker variants remove artifacts such as eye blinks from multi-channel recordings, improving signal quality for subsequent analysis without assuming independence of the sources. A prominent application is Tucker-based compression of MRI k-space tensors, which accelerates dynamic imaging by subsampling data and reconstructing high-resolution images, thereby significantly reducing scan times while maintaining diagnostic fidelity. Recent advancements include AI-integrated low-rank Tucker methods for brain tumor detection in MRI, where tensor decomposition enhances classification accuracy by isolating tumor-specific features from noisy scans. Additionally, regularized Tucker approaches have been employed for image restoration in biomedical contexts, mitigating artifacts in hyperspectral imaging through sparsity constraints on the core tensor.

Comparisons

With Canonical Polyadic (CP) Decomposition

The canonical polyadic (CP) decomposition, also known as CANDECOMP/PARAFAC, approximates an N-way tensor \mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N} as a sum of rank-1 tensors: \mathcal{X} \approx \sum_{r=1}^R \mathbf{u}_r^{(1)} \circ \mathbf{u}_r^{(2)} \circ \cdots \circ \mathbf{u}_r^{(N)}, where each \mathbf{u}_r^{(n)} \in \mathbb{R}^{I_n} is a column vector, \circ denotes the outer product, and R is the CP rank. This model is equivalent to a Tucker decomposition with a superdiagonal core tensor of size R \times \cdots \times R, whose entries are zero except on the superdiagonal, which holds the weights of the rank-1 terms (all equal to one when the weights are absorbed into the factors).

In contrast to the Tucker decomposition, which employs a dense core tensor \mathcal{G} \in \mathbb{R}^{R_1 \times \cdots \times R_N} to capture multilinear interactions among modes, the CP model restricts the core to be superdiagonal, enforcing a purely additive structure of rank-1 components. This makes CP more parsimonious in parameters: for an Nth-order tensor, CP requires R \sum_{n=1}^N I_n elements, compared to Tucker's \prod_{n=1}^N R_n + \sum_{n=1}^N I_n R_n, which grows rapidly with the core dimensions unless the ranks R_n are small. Consequently, Tucker offers greater flexibility for modeling complex dependencies but at the cost of higher storage and potential overfitting, while CP's constrained form promotes interpretability yet may underfit tensors with significant mode interactions. Computationally, both decompositions are often fitted using alternating least squares (ALS), but their implementations differ.

Tucker's ALS variant, known as higher-order orthogonal iteration (HOOI), iteratively optimizes orthogonal factor matrices and the core, leveraging multilinear products for efficiency. CP-ALS, however, updates each factor matrix while fixing the others and, because of the nonconvexity of the rank-1 sum model, can suffer from slow convergence or local minima without careful initialization. Uniqueness also differs: CP decompositions can be unique up to scaling and permutation under Kruskal's condition (for third-order tensors, k_A + k_B + k_C \geq 2R + 2, where k_n is the Kruskal rank of the mode-n factor matrix), enabling reliable factor recovery, whereas Tucker lacks inherent uniqueness due to rotational freedom in the core and factors (as detailed in the Uniqueness Conditions section). Tucker is typically preferred when orthogonal bases are desired for dimensionality reduction or when the tensor exhibits multilinear structure amenable to a non-trivial core, such as in signal compression where interactions across modes must be preserved. CP excels in scenarios requiring an exact low-rank factorization with minimal parameters and uniqueness guarantees, such as blind source separation, but it struggles when the true structure deviates from a sum of rank-1 terms. The parameter counts of the two models are compared in the sketch below.
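
As a worked comparison of the storage formulas quoted above, the following small Python calculation counts parameters for a CP model of rank R and a Tucker model of multilinear rank (R_1, \dots, R_N) on the same hypothetical tensor; the sizes are arbitrary illustrative choices.

```python
import numpy as np

def cp_params(I, R):
    """Parameter count of a rank-R CP model: R vectors per mode."""
    return R * sum(I)

def tucker_params(I, R):
    """Parameter count of a Tucker model: core plus one factor matrix per mode."""
    return int(np.prod(R)) + sum(i * r for i, r in zip(I, R))

I = (100, 80, 60)                        # tensor dimensions (illustrative)
print(cp_params(I, R=10))                # 10 * (100 + 80 + 60) = 2400
print(tucker_params(I, R=(10, 10, 10)))  # 1000 + (1000 + 800 + 600) = 3400
print(int(np.prod(I)))                   # 480000 entries in the raw tensor
```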

With Parallel Factor Analysis (PARAFAC)

Parallel factor analysis (PARAFAC) represents a constrained variant of the Tucker decomposition in which the core tensor is restricted to a superdiagonal form with all multilinear ranks equal to the component rank R, rendering it mathematically equivalent to the canonical polyadic (CP) decomposition. This formulation differs fundamentally from the general Tucker model, which employs a full or block-diagonal core tensor to model interactions among components across different modes; PARAFAC, by contrast, assumes no such interactions, decomposing the tensor solely as a sum of rank-1 terms. Although PARAFAC was formalized by Harshman in 1970, building on earlier ideas in factor analysis, it emerged shortly after Tucker's foundational work on three-mode factor analysis in 1966, with both methods rooted in extending classical factor analysis to multi-way data structures.

In practice, PARAFAC excels in chemometric applications such as fluorescence spectroscopy, where it uniquely resolves distinct fluorophore profiles from excitation-emission data without interference from scattering or overlapping signals. Tucker decomposition, however, is better suited for broader datasets exhibiting significant mode interactions, such as in sensor arrays or neuroimaging. A primary trade-off lies in their uniqueness properties: PARAFAC guarantees unique solutions under Kruskal's conditions (for a third-order tensor, the sum of the k-ranks of the factor matrices must satisfy k_A + k_B + k_C \geq 2R + 2), yielding highly interpretable, directly meaningful factors; Tucker, while computationally simpler through methods like higher-order orthogonal iteration, lacks such essential uniqueness, so its factors often require additional interpretation because of the flexibility of the core tensor. In multi-way chromatography, PARAFAC can extract pure analyte profiles from complex data.

References

  1. [1]
    Tensor Decompositions and Applications | SIAM Review
    The Tucker decomposition is covered in section 4, where we discuss its relationship to compression, the notion of \(n\)-rank, algorithms and computational ...
  2. [2]
    [PDF] Tensor Decompositions and Applications
    The Tucker decomposition is covered in section 4, where we discuss its relationship to compression, the notion of n-rank, algorithms and computational issues, ...
  3. [3]
    [PDF] Ledyard R Tucker's Affair With Psychometrics: The First 45 Years - ETS
    A third seminal paper published early in Tuck's career is modestly titled, Maximum Validity of a Test With Equivalent Items. This paper is in fact one of ...
  4. [4]
    Some Mathematical Notes on Three-Mode Factor Analysis
    Some Mathematical Notes on Three-Mode Factor Analysis. Published online by Cambridge University Press: 01 January 2025. Ledyard R Tucker ... Ill., 1963.
  5. [5]
    [PDF] Tensor Decomposition for Signal Processing and Machine Learning
    Mar 5, 2017 · Early developments: Psychometrics and Chemometrics. Begining from the 1990s: Signal processing communications, radar, speech and audio ...
  6. [6]
    [PDF] Introduction to Tensor Decompositions and their Applications ... - arXiv
    Nov 29, 2017 · The scope of this paper is to give a broad overview of tensors, their decompositions, and how they are used in machine learning. As part of this ...
  7. [7]
    Hybrid Parallel Tucker Decomposition of Streaming Data
    Jun 3, 2024 · This paper presents a hybrid parallel approach for Tucker decomposition of streaming data, using hybrid parallelism and in situ processing, ...
  8. [8]
    A novel recursive least-squares adaptive method for streaming ...
    ATT is a novel adaptive algorithm for decomposing incomplete streaming tensors using tensor-train format, handling missing values and time-variation.
  9. [9]
    [PDF] An Iterative Reweighted Method for Tucker Decomposition of ... - arXiv
    Nov 15, 2015 · Multilinear rank is closely related to the Tucker decomposition since the multilinear rank is equivalent to the dimension of the smallest.
  10. [10]
    VOLUME-REGULARIZED NONNEGATIVE TUCKER ... - NIH
    It is well-known that the Tucker decomposition of a multi-dimensional tensor is not unique, because its factors are subject to rotation ambiguities similar ...
  11. [11]
    A Multilinear Singular Value Decomposition | SIAM Journal on ...
    We discuss a multilinear generalization of the singular value decomposition. There is a strong analogy between several properties of the matrix and the ...
  12. [12]
    [PDF] Higher Order Orthogonal Iteration of Tensors (HOOI) and its Relation ...
    Mar 19, 2007 · Abstract. This paper presents a unified view of a number of dimension reduction techniques under the common framework of tensors.
  13. [13]
    [PDF] arXiv:2110.12564v1 [math.NA] 25 Oct 2021
    Oct 25, 2021 · Instead of relying on the t-HOSVD and st-HOSVD algorithms, we employ the HOOI method and propose a new strategy to adjust the truncation while ...
  14. [14]
  15. [15]
    Block Incremental Dense Tucker Decomposition with Application to ...
    We propose a Block Incremental Dense Tucker Decomposition (BID-Tucker) method for efficient storage and on-demand modeling of multidimensional spatiotemporal ...
  16. [16]
    [PDF] FOUNDATIONS OF THE PARAFAC PROCEDURE
    The original reference is. Harshman, R. A. (1970). Foundations of the PARAFAC ... called PARAFAC (for parallel factors) was developed to accomplish the required ...
  17. [17]
    Some mathematical notes on three-mode factor analysis
    Some mathematical notes on three-mode factor analysis. Published: September 1966. Volume 31, pages 279–311, (1966); Cite this ...
  18. [18]
    Fluorescence spectroscopy and multi-way techniques. PARAFAC
    PARAllel FACtor analysis (PARAFAC) is increasingly used to decompose fluorescence excitation emission matrices (EEMs) into their underlying chemical ...
  19. [19]
    PARAFAC. Tutorial and applications - ScienceDirect.com
    This paper explains the multi-way decomposition method PARAFAC and its use in chemometrics. PARAFAC is a generalization of PCA to higher order arrays.