Matrix decomposition
Matrix decomposition, also known as matrix factorization, refers to the process of factoring a matrix into a product of matrices that are simpler or more structured, such as triangular, diagonal, or orthogonal matrices, to enable efficient computation, analysis, and solution of linear systems.[1][2] This approach revolutionized numerical linear algebra in the mid-20th century; for example, the LU decomposition expresses a square matrix A as A = LU, where L is lower triangular and U is upper triangular.[3][2]

Key types of matrix decomposition include the following. LU decomposition is used for solving general linear systems and is closely related to Gaussian elimination. Cholesky decomposition factors a symmetric positive-definite matrix as A = LL^T with L lower triangular, offering computational efficiency in statistical applications. QR decomposition represents A = QR with orthogonal Q and upper triangular R, and is essential for least squares problems and orthogonalization. The singular value decomposition (SVD) writes A = U \Sigma V^T, where U and V are orthogonal and \Sigma is diagonal with non-negative singular values; it is crucial for principal component analysis (PCA), low-rank approximations, and pseudo-inverses. Eigendecomposition factors a diagonalizable matrix as A = B \Lambda B^{-1} with the eigenvalues on the diagonal of \Lambda, aiding stability analysis and spectral problems.[1][2]

These decompositions generally cost O(n^3) operations for an n \times n matrix but allow subsequent operations, such as solving multiple systems with the same coefficient matrix, to run in O(n^2), enhancing efficiency in large-scale computations such as machine learning backpropagation or the Netflix Prize matrix factorization challenges.[1][3][2] Beyond solving linear equations and eigenvalue problems, matrix decompositions underpin robust numerical software libraries like LAPACK and LINPACK, support updating techniques for dynamic data (e.g., rank-one modifications in O(n^2)), and reveal structural properties like rank, condition number, and subspace angles, with broad applications in physics, engineering, and data science.[2][1]
Fundamentals
Definition and Motivation
In linear algebra, matrix decomposition, also known as matrix factorization, refers to the process of expressing a given matrix A \in \mathbb{R}^{m \times n} as a product of two or more simpler matrices, such as A = BC, where B and C possess special structures like triangular, orthogonal, or diagonal forms.[4][5] This factorization transforms the original matrix into a canonical form that exploits inherent properties for easier manipulation.

The primary motivation for matrix decomposition arises from the need to simplify complex matrix operations in numerical computations. For instance, solving linear systems Ax = b, computing matrix inverses, and addressing eigenvalue problems become more tractable when the matrix is factored into components that allow for efficient algorithms like forward or backward substitution. These decompositions amortize computational cost: once a matrix has been factored at O(n^3) expense, subsequent tasks such as solving additional right-hand sides require only O(n^2) work with the triangular factors. They also enhance numerical stability by avoiding ill-conditioned intermediate steps, particularly for large or sparse matrices.[4]

Beyond computation, matrix decompositions reveal intrinsic properties of the matrix, such as its rank, condition number, and overall stability, which are crucial for understanding the behavior of linear systems. In applications, they enable data compression through low-rank approximations, where only the dominant components are retained to represent high-dimensional data efficiently, as seen in techniques like singular value decomposition. Additionally, they play a key role in optimization problems by facilitating gradient computations and constraint handling in fields like machine learning and signal processing.[5] For example, decompositions such as LU or SVD illustrate how structured factorizations support these tasks without requiring full matrix storage or inversion.
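To make the factor-once, reuse-many-times motivation concrete, the following minimal Python sketch (an illustration, not drawn from the cited sources) uses SciPy's LU routines: the O(n^3) factorization is computed a single time and then reused for several right-hand sides, each costing only O(n^2) triangular solves. The matrix, right-hand sides, and sizes are arbitrary illustrative choices.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
n = 500
A = rng.standard_normal((n, n))      # illustrative nonsingular matrix
B = rng.standard_normal((n, 10))     # ten right-hand sides sharing the same A

lu, piv = lu_factor(A)               # O(n^3) factorization, performed once
X = lu_solve((lu, piv), B)           # each column solved by O(n^2) triangular substitutions

print(np.allclose(A @ X, B))         # True: all ten systems solved from one factorization
```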
Historical Development
The foundations of matrix decomposition trace back to the 19th century, when mathematicians began exploring determinants and minors as tools for understanding linear systems and quadratic forms. Augustin-Louis Cauchy introduced key concepts in 1826, including the use of minors and adjoints in the context of quadratic forms, laying groundwork for notions of matrix rank through the analysis of non-vanishing minors.[6] Leopold Kronecker advanced these ideas in the 1880s, particularly in 1888, by applying determinant-based criteria to assess the solvability of linear systems, effectively formalizing early rank concepts that would underpin later decompositions.[6]

A pivotal early milestone in spectral decompositions occurred in 1873, when Eugenio Beltrami developed a diagonalization technique for symmetric matrices via orthogonal transformations, motivated by simplifying bilinear forms; this was independently extended by Camille Jordan in 1874 to rectangular matrices, establishing the singular value decomposition (SVD) in its basic form.[7] James Joseph Sylvester contributed a similar result for real square matrices in 1889.[7] Erhard Schmidt formalized the Gram-Schmidt orthogonalization process in 1907, building on Jørgen Pedersen Gram's earlier work from 1883, which provided a basis for QR decompositions by constructing orthogonal bases from linearly independent vectors.[8]

The early 20th century saw further progress in triangular decompositions. André-Louis Cholesky devised a method in 1910 for factoring positive definite matrices into lower triangular factors, originally for geodetic computations, though it was published posthumously in 1924.[9] In 1936, Carl Eckart and Gale Young demonstrated the utility of SVD for low-rank matrix approximation, showing that the best approximation in the Frobenius or spectral norm is obtained by truncating the SVD, which highlighted its practical value.

World War II accelerated developments due to the need for solving large-scale linear systems in ballistics and cryptography, with early electronic computers like ENIAC (1945) demanding efficient algorithms. Alan Turing formalized the general LU decomposition in 1948 as a stable factorization for Gaussian elimination, building on earlier work by Tadeusz Banachiewicz in 1938 and John von Neumann and Herman Goldstine in 1947.[10] The rise of digital computers in the 1950s spurred numerical advancements: Alston S. Householder introduced unitary transformations in 1958 for QR decompositions, enabling stable orthogonalization without the instability of classical Gram-Schmidt.

The 1960s marked a numerical renaissance, with Gene Golub and William Kahan developing an efficient bidiagonalization algorithm for computing the SVD in 1965, making it viable for practical applications on early computers. Concurrently, researchers addressed numerical stability for ill-conditioned matrices; partial pivoting in LU decompositions, refined by James Wilkinson and others around 1965, became a standard variant to mitigate growth in rounding errors during factorization.[10] These innovations, tied to the computational demands of scientific simulation and data analysis, solidified matrix decompositions as cornerstones of numerical linear algebra.
General Properties
Matrix decompositions exhibit varying conditions for uniqueness depending on the specific factorization and any imposed constraints. For instance, the LU decomposition of a nonsingular square matrix is unique if no row interchanges are required during Gaussian elimination, but partial pivoting generally introduces non-uniqueness due to permutation choices.[11] Similarly, for a matrix with full column rank, the thin QR factorization is unique once the diagonal entries of the upper triangular factor are required to be positive; without that normalization, the columns of the orthogonal factor are determined only up to sign.[11] In the singular value decomposition (SVD), the singular values are always uniquely determined once ordered; when they are distinct and nonzero, the singular vector pairs are unique up to simultaneous sign (or phase) changes, while repeated singular values allow additional unitary freedom within the associated subspaces.[11]

Existence theorems underpin the reliability of these decompositions across matrix classes. The LU decomposition exists for any nonsingular square matrix when partial pivoting is employed, ensuring no zero pivots disrupt the process, though without pivoting it can break down when a zero pivot is encountered and becomes numerically unreliable when pivots are small.[11] In contrast, the SVD exists for every real or complex matrix, regardless of size, rank, or conditioning, providing a universal factorization into orthogonal and diagonal components.[11] These guarantees make decompositions applicable to a broad range of problems, including the solution of linear systems where factorizations simplify subsequent computations.[11]

Most matrix decompositions for dense n \times n matrices incur a computational complexity of O(n^3) operations, typically derived from variants of Gaussian elimination. For example, the LU decomposition requires approximately 2n^3/3 floating-point operations, while the SVD demands around 4n^3 to 22n^3 depending on the algorithm, such as the Golub-Reinsch method.[11] This cubic scaling reflects the inherent cost of transforming matrices into structured forms like triangular or diagonal.

Stability and conditioning play crucial roles in the practical utility of decompositions, as they determine error propagation in finite-precision arithmetic. Decompositions like LU with partial pivoting and QR via Householder reflections are backward stable, meaning computed factors solve a slightly perturbed version of the original problem with small relative errors bounded by machine epsilon times the matrix norm.[11] The SVD is particularly robust: by Weyl's inequality, perturbing the matrix changes each singular value by at most the norm of the perturbation, while the sensitivity of the singular vectors depends on the gaps between neighboring singular values. Conditioning is quantified by the condition number \kappa(A) = \|A\| \cdot \|A^{-1}\|, which for the 2-norm equals the ratio of the largest to smallest singular value; well-conditioned matrices have \kappa(A) \approx 1, minimizing sensitivity to input perturbations, while ill-conditioned ones amplify errors.[11]

Matrix decompositions fundamentally equate to processes of triangularization or diagonalization, enabling efficient manipulation of linear transformations. LU and QR decompositions achieve triangularization, converting a general matrix into lower/upper triangular factors through elimination, which simplifies solving systems or computing determinants.[11] Spectral decompositions like the SVD and eigendecomposition pursue diagonalization, factoring the matrix into orthogonal/unitary matrices and a diagonal matrix of singular values or eigenvalues, revealing intrinsic geometric properties such as rank and norms.[11]
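As a small hedged illustration of conditioning, the sketch below estimates the 2-norm condition number as the ratio of extreme singular values and compares it with NumPy's built-in routine; the Hilbert matrix and its size are arbitrary choices of a classically ill-conditioned example, not taken from the sources above.

```python
import numpy as np
from scipy.linalg import hilbert

A = hilbert(8)                                 # classic ill-conditioned test matrix
sigma = np.linalg.svd(A, compute_uv=False)     # singular values, largest first

kappa = sigma[0] / sigma[-1]                   # 2-norm condition number sigma_max / sigma_min
print(kappa)                                   # on the order of 1e10: tiny input errors are greatly amplified
print(np.isclose(kappa, np.linalg.cond(A, 2))) # True: matches NumPy's built-in condition number
```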
Decompositions for Linear Systems
LU Decomposition
LU decomposition, also known as LU factorization, expresses a square matrix A as the product of a lower triangular matrix L and an upper triangular matrix U, such that A = LU. Typically, L is a unit lower triangular matrix with ones on the diagonal, while U has arbitrary diagonal entries, ensuring all entries above the diagonal in L are zero and below the diagonal in U are zero.[12] This factorization applies to nonsingular matrices A \in \mathbb{R}^{n \times n}. For numerical stability, especially when pivots are small or zero, partial pivoting is used, yielding PA = LU where P is a permutation matrix that reorders rows to select the largest pivot in each column.[12] The modern LU factorization emerged from Alan Turing's 1948 analysis of rounding errors in matrix processes, where he formalized Gaussian elimination as this decomposition.[13]

The algorithm to compute the LU decomposition relies on Gaussian elimination. Starting with the matrix A, elimination proceeds column by column: for each pivot position k, the entries below the pivot in column k are zeroed by subtracting multiples of row k from subsequent rows, with the multipliers l_{ik} = a_{ik}^{(k)} / a_{kk}^{(k)} (for i > k) stored in the lower part of L. The resulting upper triangular form after all eliminations is U, and L is completed with ones on the diagonal and zeros above. With partial pivoting, rows are swapped before each elimination step to maximize the pivot magnitude. Solving Ax = b then involves forward substitution to solve Ly = Pb for y, followed by back substitution to solve Ux = y.[14] The factorization requires O(n^3) operations, while each subsequent solve costs O(n^2).[12]

Applications of LU decomposition center on efficiently solving linear systems Ax = b, particularly when multiple right-hand sides b share the same A, as the costly factorization is performed once. It also facilitates matrix inversion by solving Ax = e_i for each standard basis vector e_i. The method demands that A be nonsingular and that no zero pivots occur during elimination without pivoting; partial pivoting mitigates this by ensuring numerical stability for ill-conditioned matrices. For symmetric positive definite matrices, LU decomposition connects to Cholesky decomposition, where A = LL^T with L lower triangular.[15]

Consider the 2×2 matrix A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}. In the first step of Gaussian elimination, the multiplier for the (2,1) entry is l_{21} = 3/1 = 3. Subtract 3 times row 1 from row 2 to get the updated row 2: [3-3\cdot1, 4-3\cdot2] = [0, -2]. Thus, L = \begin{pmatrix} 1 & 0 \\ 3 & 1 \end{pmatrix}, \quad U = \begin{pmatrix} 1 & 2 \\ 0 & -2 \end{pmatrix}, and A = LU. To solve Ax = b for b = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, first solve Ly = b by forward substitution: y_1 = 1, y_2 = (2 - 3\cdot1)/1 = -1. Then back-substitute Ux = y: x_2 = -1 / -2 = 0.5, x_1 = (1 - 2\cdot0.5)/1 = 0.[14]
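The worked example can be checked numerically; the sketch below (illustrative, using SciPy rather than any method prescribed by the sources) factors the same 2×2 matrix and solves the same system. Because SciPy applies partial pivoting, its factors correspond to PA = LU with the rows reordered, so L and U differ from the no-pivoting hand computation above even though the product reconstructs A and the solution agrees.

```python
import numpy as np
from scipy.linalg import lu, lu_factor, lu_solve

A = np.array([[1., 2.],
              [3., 4.]])
b = np.array([1., 2.])

P, L, U = lu(A)                     # SciPy pivots: rows are swapped, so L and U differ from the hand computation
print(np.allclose(P @ L @ U, A))    # True: the factors still reconstruct A

x = lu_solve(lu_factor(A), b)       # forward substitution for Ly = Pb, then back substitution
print(x)                            # approximately [0, 0.5], matching the hand computation above
```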
Cholesky Decomposition
The Cholesky decomposition, also known as Cholesky factorization, expresses a symmetric positive-definite matrix A as the product A = LL^T, where L is a lower triangular matrix with positive diagonal entries.[16] This factorization exploits the symmetry and positive definiteness of A to produce a unique L.[17] For complex Hermitian positive-definite matrices, the formula generalizes to A = LL^H, where ^H denotes the conjugate transpose.[16] The method was developed by French military engineer André-Louis Cholesky around 1907–1910 for geodetic computations during World War I, and published posthumously in 1924.[9]

The decomposition exists if and only if A is symmetric (or Hermitian) and positive definite, meaning all eigenvalues of A are positive.[18] During computation, if the quantity under any square root becomes zero or negative (so the corresponding diagonal entry of L would not be real and positive), this indicates that A is not positive definite.[16] The algorithm computes L row-by-row via a variant of Gaussian elimination that avoids pivoting, leveraging the symmetry to reduce operations. For the v-th row, the diagonal entry is l_{vv} = \sqrt{a_{vv} - \sum_{u=1}^{v-1} l_{vu}^2}, and for off-diagonal entries with t > v, l_{tv} = \frac{a_{tv} - \sum_{u=1}^{v-1} l_{tu} l_{vu}}{l_{vv}}.[16] This process requires approximately n^3/3 floating-point operations for an n \times n matrix, about half the cost of LU decomposition for the same task.[18]

Cholesky decomposition offers key advantages over general methods like LU factorization: it stores only the lower triangular factor L (roughly halving memory usage) and performs fewer operations by exploiting symmetry, making it particularly efficient for large-scale computations on positive-definite matrices.[17] It serves as a special case of LU decomposition for symmetric positive-definite matrices, where the upper triangular factor equals the transpose of the lower one.[19]

Applications include solving linear systems Ax = b efficiently via forward substitution to solve Ly = b followed by back substitution for L^T x = y, which is common in optimization problems involving quadratic forms.[18] In statistics, it decomposes covariance matrices of multivariate normal distributions to generate correlated random samples by transforming independent standard normals through L.[20] For example, consider the 2×2 covariance matrix A = \begin{pmatrix} 4 & 2 \\ 2 & 5 \end{pmatrix}, which is symmetric positive definite. Its Cholesky decomposition is L = \begin{pmatrix} 2 & 0 \\ 1 & 2 \end{pmatrix}, since LL^T = \begin{pmatrix} 4 & 2 \\ 2 & 5 \end{pmatrix}.[18]
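A brief numerical sketch of the covariance example follows, using NumPy (an illustrative choice): it recovers the factor L = [[2, 0], [1, 2]] and uses it to draw correlated Gaussian samples, with the seed and sample count chosen arbitrarily.

```python
import numpy as np

A = np.array([[4., 2.],
              [2., 5.]])               # symmetric positive-definite covariance matrix
L = np.linalg.cholesky(A)              # lower triangular factor, here [[2, 0], [1, 2]]
print(np.allclose(L @ L.T, A))         # True

# Draw correlated Gaussian samples by transforming independent standard normals with L.
rng = np.random.default_rng(0)
z = rng.standard_normal((2, 100_000))  # independent N(0, 1) samples
x = L @ z                              # samples whose covariance is approximately A
print(np.cov(x))                       # close to [[4, 2], [2, 5]]
```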
QR Decomposition
The QR decomposition, also known as QR factorization, expresses an m \times n matrix A (with m \geq n) as the product A = QR, where Q is an m \times m orthogonal matrix satisfying Q^T Q = I_m and R is an m \times n upper triangular matrix.[21] In this full factorization, the bottom (m - n) \times n block of R consists of zeros.[21] This decomposition exists for any real matrix A, regardless of rank.[22] When A has full column rank (rank n), the reduced or thin QR decomposition takes the form A = \hat{Q} \hat{R}, where \hat{Q} is an m \times n matrix with orthonormal columns (\hat{Q}^T \hat{Q} = I_n) and \hat{R} is an n \times n upper triangular matrix; this reduced form is unique if the diagonal entries of \hat{R} are required to be positive.[21][22] The economy-sized variant, often used for rectangular matrices to save computation and storage, corresponds to this reduced form by omitting the unnecessary columns of the full Q and rows of R.[23]

Several algorithms compute the QR decomposition, each with trade-offs in stability and efficiency. The classical Gram-Schmidt process orthogonalizes the columns of A sequentially but suffers from numerical instability due to subtractive cancellation in finite precision arithmetic.[21] The modified Gram-Schmidt variant improves stability by performing projections in a different order.[21] More robust methods include Householder reflections, which apply a sequence of orthogonal transformations to triangularize A while preserving its column space, achieving backward stability at an O(m n^2) cost for the thin form; this approach was introduced by Householder in 1958. Givens rotations offer an alternative, zeroing elements pairwise via 2-by-2 orthogonal rotations, which is useful for sparse or structured matrices but generally slower for dense cases.[21]

A primary application of QR decomposition is solving overdetermined linear least squares problems \min_x \| Ax - b \|_2 for m > n. Substituting A = QR yields \| QRx - b \|_2 = \| Rx - Q^T b \|_2 since Q preserves norms (\| Q y \|_2 = \| y \|_2); the upper triangular system R x = Q^T b can then be solved efficiently by back-substitution, avoiding the ill-conditioning of the normal equations A^T A x = A^T b.[21] For symmetric matrices, QR decomposition facilitates eigenvalue computations via iterations of the form A_{k+1} = R_k Q_k, where A_k = Q_k R_k, preserving symmetry and leading to triangular convergence without explicit eigenvector calculation in the basic QR algorithm.[24] The factorization can also be updated efficiently after rank-one modifications of A, which is useful in adaptive least squares contexts.[25]

As a representative example, consider orthogonalizing the columns of a 3×2 matrix A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} using the modified Gram-Schmidt process. Normalize the first column to obtain the first column of \hat{Q}: q_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}. Project the second column onto q_1 and normalize the orthogonal component: the projection coefficient is q_1^T a_2 = \frac{1}{\sqrt{2}}, so the residual is a_2 - (q_1^T a_2) q_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} - \frac{1}{2} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0.5 \\ -0.5 \\ 1 \end{pmatrix}, with norm \sqrt{1.5} = \sqrt{3/2}.
Thus, q_2 = \frac{1}{\sqrt{6}} \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix}, yielding \hat{Q} = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} \\ 0 & \frac{2}{\sqrt{6}} \end{pmatrix}, \quad \hat{R} = \begin{pmatrix} \sqrt{2} & \frac{1}{\sqrt{2}} \\ 0 & \sqrt{\frac{3}{2}} \end{pmatrix}, verifying A = \hat{Q} \hat{R}.
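The hand computation above can be reproduced numerically; the sketch below uses NumPy's QR routine (an illustrative choice; note that the column signs of the computed factors may be flipped relative to the Gram-Schmidt result, since NumPy does not force positive diagonal entries in R) and then applies the factorization to a small least squares problem with an arbitrary right-hand side.

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[1., 1.],
              [1., 0.],
              [0., 1.]])
Q, R = np.linalg.qr(A)              # thin QR; column signs may differ from the hand computation
print(np.allclose(Q @ R, A))        # True
print(np.round(R, 3))               # |diagonal| = [sqrt(2), sqrt(3/2)] up to sign

# Least squares min ||Ax - b||_2 via R x = Q^T b, avoiding the normal equations A^T A x = A^T b.
b = np.array([1., 2., 3.])          # illustrative right-hand side
x = solve_triangular(R, Q.T @ b, lower=False)
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # True
```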
Spectral Decompositions
Eigendecomposition
Eigendecomposition provides a way to factorize a square matrix into a form that reveals its spectral properties, assuming the matrix is diagonalizable. For an n \times n diagonalizable matrix A over the complex numbers, the eigendecomposition is given by A = V \Lambda V^{-1}, where V is an invertible matrix whose columns are the right eigenvectors of A, and \Lambda is a diagonal matrix containing the eigenvalues \lambda_i of A along its main diagonal.[26] This representation diagonalizes A, transforming it into a simpler form where operations become scalar multiplications on the eigenvalues.

A matrix A is diagonalizable if and only if it possesses n linearly independent eigenvectors, which occurs when the algebraic multiplicity of each eigenvalue equals its geometric multiplicity.[27] For real symmetric matrices, the spectral theorem guarantees diagonalizability over the reals, with all eigenvalues real and the eigenvectors forming an orthogonal basis, yielding A = Q \Lambda Q^T where Q is orthogonal.[28] More generally, for Hermitian matrices over the complex numbers, the spectral theorem states that A = Q \Lambda Q^*, where Q is unitary and \Lambda has real eigenvalues.

Computing the eigendecomposition typically involves iterative algorithms that approximate eigenvalues and eigenvectors. The power iteration method, a simple iterative technique, converges to the dominant eigenvalue and its eigenvector by repeatedly applying the matrix to an initial vector and normalizing, effective when one eigenvalue dominates in magnitude.[27] For more general cases, the QR algorithm, independently developed by John G. F. Francis in 1959–1961 and Vera Kublanovskaya in 1961, iteratively performs QR decompositions followed by reverse multiplications to drive the matrix toward a form revealing its eigenvalues, often after a Hessenberg reduction for efficiency.[29] This algorithm forms the basis of modern implementations in numerical libraries for dense matrices.

Eigendecomposition finds applications in analyzing dynamical systems and stochastic processes. For matrix powers, if A = V \Lambda V^{-1}, then A^k = V \Lambda^k V^{-1}, simplifying computations of iterates like transition probabilities in Markov chains, where the dominant eigenvalue (typically 1) determines the stationary distribution.[30] In systems of linear differential equations of the form \mathbf{x}' = A \mathbf{x}, the solution is \mathbf{x}(t) = V e^{\Lambda t} V^{-1} \mathbf{x}(0), decoupling the system into independent scalar equations along eigenvector directions.[31] Matrices that are not diagonalizable require alternative forms like the Jordan canonical form for similar analyses.

As an illustrative example, consider the 2×2 rotation matrix by angle \theta, R = \begin{pmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{pmatrix}. The characteristic equation \det(R - \lambda I) = 0 yields eigenvalues \lambda = e^{\pm i \theta}, which are complex unless \theta = 0 or \pi (modulo 2\pi), reflecting the lack of real eigenvectors for non-trivial rotations in the plane.[32] The corresponding eigenvectors involve complex entries, such as \begin{pmatrix} 1 \\ -i \end{pmatrix} for \lambda = e^{i \theta}.
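A short numerical sketch of the rotation-matrix example follows, using NumPy (an illustrative choice, with \theta = \pi/3 picked arbitrarily): it shows the complex conjugate eigenvalues and uses the decomposition to evaluate a matrix power, as described above.

```python
import numpy as np

theta = np.pi / 3                              # arbitrary non-trivial rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

w, V = np.linalg.eig(R)                        # eigenvalues and right eigenvectors
print(w)                                       # complex conjugate pair exp(+i*theta), exp(-i*theta)

# Reconstruct R = V Lambda V^{-1} and use the decomposition for a matrix power:
# R^6 rotates by 6*theta = 2*pi, so it should equal the identity.
Lam = np.diag(w)
print(np.allclose(V @ Lam @ np.linalg.inv(V), R))                                      # True
print(np.allclose(V @ np.linalg.matrix_power(Lam, 6) @ np.linalg.inv(V), np.eye(2)))   # True
```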
Schur Decomposition
The Schur decomposition, named after mathematician Issai Schur who established its existence in 1909, provides a fundamental factorization for square matrices over the complex numbers. For any complex square matrix A \in \mathbb{C}^{n \times n}, there exists a unitary matrix Q \in \mathbb{C}^{n \times n} (satisfying Q^* Q = I, where Q^* denotes the conjugate transpose) and an upper triangular matrix T \in \mathbb{C}^{n \times n} such that A = Q T Q^*. The diagonal entries of T are the eigenvalues of A, appearing in some order, which follows from the similarity transformation preserving the spectrum of the matrix. This decomposition triangularizes A via a unitary similarity, ensuring numerical stability in computations due to the orthogonality of Q.[33][34]

For real matrices, a variant known as the real Schur form addresses the presence of complex conjugate eigenvalue pairs while maintaining real arithmetic. Any real square matrix A \in \mathbb{R}^{n \times n} admits an orthogonal matrix U \in \mathbb{R}^{n \times n} (satisfying U^T U = I) and a block upper triangular matrix S \in \mathbb{R}^{n \times n} such that A = U S U^T, where the diagonal blocks of S are either 1×1 (for real eigenvalues) or 2×2 (for pairs of complex conjugate eigenvalues). The 2×2 blocks have the form of real companion matrices corresponding to quadratic factors of the characteristic polynomial, ensuring all entries remain real. This form is particularly useful in practical implementations to avoid complex numbers.[33][35]

The Schur decomposition is typically computed using the QR algorithm with shifts, applied after an initial reduction to upper Hessenberg form to enhance efficiency. The QR iteration iteratively decomposes the matrix into an orthogonal factor and an upper triangular factor, then recombines them via similarity; shifts accelerate convergence toward the triangular form by targeting individual eigenvalues. This process yields the desired Q and T (or S) factors, with the Hessenberg reduction handled separately via Householder transformations. The algorithm's backward stability stems from the unitary transformations, making it a cornerstone for reliable eigenvalue solvers in numerical libraries.[24][34]

In applications, the Schur decomposition underpins eigenvalue algorithms by reducing the problem to triangularization, from which eigenvalues are directly read off the diagonal and eigenvectors can be derived via back-substitution. Its stability ensures accurate computation even for ill-conditioned matrices, supporting tasks like dynamic mode decomposition in control theory and model reduction in simulations. For normal matrices, the decomposition simplifies to the eigendecomposition, where T becomes diagonal.[35][24]

The Schur decomposition is not unique; the eigenvalues on the diagonal can be ordered arbitrarily, and for repeated eigenvalues, the corresponding invariant subspaces allow multiple choices for the columns of Q. In the real case, the block structure for complex pairs is unique up to each 2×2 block's internal orthogonal freedom, but the overall partitioning depends on eigenvalue ordering.[36][33]

As an illustrative example, consider the 2×2 real matrix A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}. Its eigenvalues are \lambda_1 = \frac{5 + \sqrt{33}}{2} \approx 5.372 and \lambda_2 = \frac{5 - \sqrt{33}}{2} \approx -0.372.
A corresponding Schur decomposition is A = Q T Q^*, where T = \begin{pmatrix} \lambda_1 & * \\ 0 & \lambda_2 \end{pmatrix} (with the off-diagonal entry determined by the transformation), and Q is unitary with columns as Schur vectors; since both eigenvalues are real, Q may in fact be taken real orthogonal, confirming the triangular form with eigenvalues on the diagonal.
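The 2×2 example can be verified with SciPy's Schur routine, as in the hedged sketch below; the ordering of the eigenvalues along the diagonal of T is not fixed and may differ from the ordering written above.

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[1., 2.],
              [3., 4.]])

T, Z = schur(A, output='real')           # A = Z T Z^T with Z orthogonal, T (quasi-)triangular
print(np.allclose(Z @ T @ Z.T, A))       # True
print(np.diag(T))                        # eigenvalues (5 ± sqrt(33))/2 ≈ 5.372, -0.372 in some order
print(np.allclose(Z.T @ Z, np.eye(2)))   # True: Z is orthogonal
```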
Singular Value Decomposition
The singular value decomposition (SVD) provides a canonical factorization for any real matrix A \in \mathbb{R}^{m \times n} (with a similar form for complex matrices), given by A = U \Sigma V^T, where U \in \mathbb{R}^{m \times m} and V \in \mathbb{R}^{n \times n} are orthogonal matrices, and \Sigma \in \mathbb{R}^{m \times n} is a rectangular diagonal matrix containing the singular values \sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_{\min(m,n)} \geq 0 along its main diagonal, with all off-diagonal entries zero. This decomposition generalizes eigendecomposition to rectangular matrices by capturing their intrinsic geometric structure through orthogonal transformations and scaling.

The singular values \sigma_i satisfy \sigma_i = \sqrt{\lambda_i}, where \lambda_i are the eigenvalues of A^T A (or equivalently A A^T), ordered decreasingly; the associated right singular vectors (columns of V) are the eigenvectors of A^T A, and the left singular vectors (columns of U) are the eigenvectors of A A^T. The rank of A equals the number of strictly positive singular values, providing a direct measure of the matrix's linear independence. Furthermore, the SVD enables optimal low-rank approximations: for any k \leq \min(m,n), the matrix A_k = \sum_{i=1}^k \sigma_i u_i v_i^T minimizes \|A - B\|_F over all rank-k matrices B, where \| \cdot \|_F denotes the Frobenius norm, as established by the Eckart–Young theorem.

Computing the SVD typically involves the Golub–Kahan algorithm, which first applies Householder reflections to bidiagonalize A (reducing it to an upper bidiagonal form, with nonzeros only on the main diagonal and the first superdiagonal), followed by iterative QR decompositions on the bidiagonal matrix to deflate and compute the singular values and vectors accurately and efficiently. This method, with subsequent refinements like divide-and-conquer or Jacobi rotations, forms the basis for implementations in numerical libraries such as LAPACK.

Key applications of SVD include solving underdetermined or overdetermined linear systems via the Moore–Penrose pseudoinverse A^+ = V \Sigma^+ U^T, where \Sigma^+ reciprocates the nonzero \sigma_i on its diagonal and sets zeros elsewhere, enabling stable least-squares solutions even for rank-deficient matrices. In data analysis, SVD underpins principal component analysis (PCA): for a centered data matrix X \in \mathbb{R}^{m \times n} with m observations and n variables, the SVD X = U \Sigma V^T identifies the principal components as the columns of V, with the proportion of variance explained by the i-th component given by \sigma_i^2 / \sum_j \sigma_j^2, facilitating dimensionality reduction and noise filtering. SVD also supports image compression by representing an image as a matrix and retaining only the k largest singular values for a rank-k approximation, which preserves essential visual features while significantly reducing file size, as demonstrated in techniques achieving high compression ratios with minimal perceptual loss.

Variants of the SVD include the compact (or economy) form, which omits zero singular values for a rank-r matrix A (where r is the number of positive \sigma_i), yielding A = U_r \Sigma_r V_r^T with U_r \in \mathbb{R}^{m \times r}, \Sigma_r \in \mathbb{R}^{r \times r}, and V_r \in \mathbb{R}^{n \times r}, thus requiring less storage and computation. The truncated SVD goes further by retaining only the first k < r terms, specifically for approximation purposes.
As a concrete illustration, consider the 3×2 matrix A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}. The SVD computation begins with A^T A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, whose eigenvalues are \lambda_1 = 3 and \lambda_2 = 1, yielding singular values \sigma_1 = \sqrt{3} \approx 1.732 and \sigma_2 = 1. The right singular vectors v_1, v_2 are the normalized eigenvectors of A^T A: for \lambda_1 = 3, an unnormalized eigenvector is \begin{pmatrix} 1 \\ 1 \end{pmatrix}, normalized to \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}; for \lambda_2 = 1, it is \begin{pmatrix} 1 \\ -1 \end{pmatrix}, normalized to \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}. The corresponding left singular vectors are then u_i = \frac{1}{\sigma_i} A v_i for i=1,2, with a third orthogonal vector completing the full U if needed. This process highlights how SVD extracts the matrix's rank (here 2) and directional scalings.
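A minimal numerical sketch of this 3×2 example follows, using NumPy (an illustrative choice; the signs of the computed singular vectors may differ from the hand computation): it confirms the singular values, the rank, the Eckart–Young rank-1 approximation error, and the pseudoinverse formula.

```python
import numpy as np

A = np.array([[1., 1.],
              [1., 0.],
              [0., 1.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # thin SVD: U is 3x2, Vt is 2x2
print(s)                                           # [sqrt(3), 1] ≈ [1.732, 1.0]
print(np.allclose((U * s) @ Vt, A))                # True: U * s scales each column of U by sigma_i
print(int(np.sum(s > 1e-12)))                      # 2: rank = number of positive singular values

# Best rank-1 approximation (Eckart-Young) and the Moore-Penrose pseudoinverse.
A1 = s[0] * np.outer(U[:, 0], Vt[0, :])
print(np.linalg.norm(A - A1))                      # 1.0: equals sigma_2, the discarded singular value
print(np.allclose(np.linalg.pinv(A), Vt.T @ np.diag(1 / s) @ U.T))   # True
```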
Advanced and Specialized Decompositions
Polar Decomposition
In linear algebra, the polar decomposition provides a factorization of a square matrix A \in \mathbb{C}^{n \times n} (or \mathbb{R}^{n \times n}) into the product of a unitary matrix U (orthogonal for real matrices) and a positive semidefinite Hermitian matrix P, such that A = UP. This decomposition is unique when A is invertible, with P being the unique positive semidefinite square root of A^*A, where A^* denotes the conjugate transpose. The form generalizes the polar form of complex numbers, separating the "phase" (unitary part) from the "magnitude" (positive part), and exists for any square matrix, though uniqueness requires invertibility.

The right polar decomposition is A = UP, while the left form is A = PU, where now U is unitary and P = \sqrt{AA^*} is positive semidefinite. Explicitly, P = \sqrt{A^*A} and U = A (A^*A)^{-1/2} for the right form when A is invertible; the left form follows analogously with roles reversed. For noninvertible matrices, the decomposition still holds, but P may be singular, and uniqueness is guaranteed only for the positive semidefinite factor among decompositions with the same range. This structure highlights geometric properties, such as P representing a pure scaling and U a rigid rotation or isometry.

A practical algorithm computes the polar decomposition via the singular value decomposition (SVD) of A = V \Sigma W^*, where V and W are unitary and \Sigma is diagonal with nonnegative entries. For the right polar form, set U = V W^* and P = W \Sigma W^*; the left form uses P = V \Sigma V^* and U = V W^*. This method is numerically stable and leverages efficient SVD routines, making it suitable for computational implementations despite the cubic cost of SVD. For rectangular matrices A \in \mathbb{C}^{m \times n} with m \geq n, a generalized right polar decomposition A = UH exists, where U \in \mathbb{C}^{m \times n} has orthonormal columns and H \in \mathbb{C}^{n \times n} is positive semidefinite, again derivable from SVD.[37]

Applications of polar decomposition span several fields. In continuum mechanics, it decomposes the deformation gradient tensor F as F = RU, where R is the rotation tensor (orthogonal) and U is the right stretch tensor (positive definite symmetric), enabling separation of rigid body motion from pure strain for analysis of material deformation.[38] In optics, particularly polarimetry, the decomposition of Mueller matrices extracts intrinsic polarization properties: a diattenuator (linear dichroism), a retarder (phase shift), and a depolarizer, providing quantitative interpretation of light-scattering media such as biological tissues.[39] Another use is finding the closest unitary matrix to a given A, which is the unitary factor U in the decomposition, useful in optimization and approximation problems.[37]

For example, consider the real 2 \times 2 matrix A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}. First, compute A^T A = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}, whose eigenvalues are \frac{3 \pm \sqrt{5}}{2}; its unique positive definite square root is P = \sqrt{A^T A} = \frac{1}{\sqrt{5}} \begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix}, which can be checked by squaring. Then U = A P^{-1} = \frac{1}{\sqrt{5}} \begin{pmatrix} 2 & 1 \\ -1 & 2 \end{pmatrix}, which is orthogonal since U^T U = I, and indeed UP = A. This illustrates how A separates into a rotation factor and a symmetric stretch factor.
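The hand computation above can be cross-checked with SciPy's polar routine, as in the illustrative sketch below; since A is invertible, the factors are unique and should match the closed-form matrices given above.

```python
import numpy as np
from scipy.linalg import polar

A = np.array([[1., 1.],
              [0., 1.]])

U, P = polar(A, side='right')                   # right polar decomposition A = U P
print(np.allclose(U @ P, A))                    # True
print(np.allclose(U.T @ U, np.eye(2)))          # True: U is orthogonal
print(np.allclose(P, np.array([[2., 1.], [1., 3.]]) / np.sqrt(5)))    # True: matches sqrt(A^T A)
print(np.allclose(U, np.array([[2., 1.], [-1., 2.]]) / np.sqrt(5)))   # True: the rotation factor
```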
Jordan Canonical Form
The Jordan canonical form provides a canonical representation for square matrices over an algebraically closed field, such as the complex numbers, particularly useful for matrices that are not diagonalizable. Introduced by Camille Jordan in his 1870 treatise on substitutions and algebraic equations, it expresses any n \times n matrix A as similar to a unique (up to permutation of blocks) block-diagonal matrix J, via J = P^{-1} A P where P is invertible.[40][41] Each block in J is a Jordan block J_k(\lambda), corresponding to an eigenvalue \lambda, defined as J_k(\lambda) = \lambda I_k + N_k, where I_k is the k \times k identity matrix and N_k is the nilpotent matrix with 1's on the superdiagonal and zeros elsewhere.[41]

J_k(\lambda) = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda & 1 \\ 0 & 0 & \cdots & 0 & \lambda \end{pmatrix}

For an eigenvalue \lambda with algebraic multiplicity m (the multiplicity in the characteristic polynomial), the Jordan form decomposes the corresponding generalized eigenspace into a direct sum of cyclic subspaces, each associated with a Jordan block. The number of blocks equals the geometric multiplicity g = \dim \ker(A - \lambda I), which is at most m, and the sizes of the blocks (the lengths of the Jordan chains) are determined by the differences in dimensions of the kernels of the powers (A - \lambda I)^j for j = 1, \dots, m. The largest block size equals the index of \lambda in the minimal polynomial of A, which divides the characteristic polynomial and annihilates A. This structure extends the eigendecomposition to defective matrices, where g < m, by incorporating generalized eigenvectors satisfying (A - \lambda I)^k v = 0 but not for smaller k.[41]

The existence of the Jordan canonical form requires the base field to be algebraically closed, ensuring all eigenvalues lie in the field; over the reals, a real Jordan form with 2×2 blocks for complex conjugate pairs is used instead. The form is unique up to the ordering of blocks, and the minimal polynomial's degree for each \lambda governs the largest block size for that eigenvalue.

Computationally, one first finds the eigenvalues via the characteristic polynomial, then for each \lambda, computes the generalized eigenspace \ker((A - \lambda I)^m) and the Weyr characteristic (dimensions of \ker((A - \lambda I)^j)) to determine block sizes. Jordan chains are constructed by solving (A - \lambda I) v_{j} = v_{j-1} starting from eigenvectors, but this process is numerically unstable for large or ill-conditioned matrices due to sensitivity to perturbations in defective cases. Practical algorithms, such as those using Schur decomposition followed by rank-revealing decompositions, mitigate this but rarely compute the exact form; instead, they approximate the structure.[41]

Applications of the Jordan canonical form include simplifying the solution of linear systems of differential equations, where e^{At} is block-diagonalized, and analyzing nilpotency in powers of (A - \lambda I)^k. In control theory, it determines the controllability indices and minimal realizations of linear systems by revealing the chain lengths for uncontrollable or unobservable modes. For instance, consider the 3×3 matrix A = \begin{pmatrix} 3 & 1 & 0 \\ 0 & 3 & 2 \\ 0 & 0 & 4 \end{pmatrix}, which has eigenvalues \lambda = 3 (algebraic multiplicity 2, geometric multiplicity 1) and \lambda = 4 (multiplicity 1).
Its Jordan form is J = \begin{pmatrix} 3 & 1 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{pmatrix}, achieved via similarity with P whose columns are a generalized eigenvector chain for \lambda=3 (satisfying (A - 3I)^2 v_2 = 0, (A - 3I) v_2 = v_1, (A - 3I) v_1 = 0) and an eigenvector for \lambda=4. This reveals the single Jordan block of size 2 for the defective eigenvalue.[41]
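Floating-point libraries do not compute the Jordan form directly because of the numerical instability noted above, but the block structure of this example can still be inspected by computing the kernel dimensions of powers of (A - 3I). The sketch below does this with NumPy matrix ranks; it is an illustration that assumes the small, exactly representable matrix above, not a general-purpose algorithm.

```python
import numpy as np

A = np.array([[3., 1., 0.],
              [0., 3., 2.],
              [0., 0., 4.]])
lam, n = 3.0, 3
N = A - lam * np.eye(n)

# dim ker((A - 3I)^j) = n - rank((A - 3I)^j); this sequence of dimensions
# (the Weyr characteristic) determines the Jordan block sizes for lambda = 3.
dims = [n - np.linalg.matrix_rank(np.linalg.matrix_power(N, j)) for j in (1, 2, 3)]
print(dims)   # [1, 2, 2]: one Jordan chain, stabilizing at the algebraic multiplicity 2,
              # so lambda = 3 contributes a single 2x2 Jordan block, as in J above
```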
Hessenberg and Tridiagonal Forms
The Hessenberg form of a square matrix is an intermediate structure that facilitates efficient computation of eigenvalues and eigenvectors by reducing the fill-in that occurs during iterative algorithms. An upper Hessenberg matrix H is defined such that h_{ij} = 0 for all i > j+1, meaning all entries below the first subdiagonal are zero. Any n \times n matrix A can be reduced to upper Hessenberg form via a unitary similarity transformation A = Q H Q^*, where Q is unitary and H is upper Hessenberg. This decomposition preserves the eigenvalues of A because similarity transformations do not alter the spectrum. The transformation is reversible, allowing recovery of the original matrix from H and Q.

The reduction to Hessenberg form is typically achieved using Householder reflections or Givens rotations, with Householder methods being preferred for their efficiency in zeroing multiple elements simultaneously. The algorithm proceeds column by column: for each column k = 1 to n-2, a Householder reflector is constructed from the subvector A(k+1:n, k) to zero the entries below the subdiagonal in that column, and this reflector is applied from both the left and right to maintain similarity. The overall computational cost is \frac{10}{3} n^3 floating-point operations for an n \times n matrix, though the work per column is O(n^2), making it practical for large n. The process is backward stable, meaning the computed \tilde{H} satisfies \tilde{Q} \tilde{H} \tilde{Q}^* = A + \delta A with \|\delta A\| = O(\epsilon_{\text{mach}} \|A\|), where \epsilon_{\text{mach}} is machine epsilon.

For symmetric matrices, the Hessenberg reduction simplifies to a tridiagonal form, where A = Q T Q^T and T is tridiagonal (symmetric with zeros outside the main diagonal and the adjacent sub- and superdiagonals). This is because the symmetry of A implies the resulting Hessenberg matrix is also symmetric, forcing the superdiagonal structure to match the subdiagonal. The same Householder-based algorithm applies, but the right multiplications preserve symmetry, and the cost drops to \frac{4}{3} n^3 flops in the symmetric case.

These forms serve as preprocessing steps in eigenvalue algorithms. In the QR algorithm for computing the Schur decomposition, the initial reduction to Hessenberg or tridiagonal form ensures that subsequent iterations preserve the structure and avoid unnecessary fill-in, significantly improving efficiency. In Krylov subspace methods, such as the Arnoldi iteration, the process generates an upper Hessenberg projection of the matrix onto the subspace, which approximates eigenvalues and eigenvectors for large sparse systems.

Example
Consider the 4 \times 4 matrix A = \begin{pmatrix} 1 & 15 & -6 & 0 \\ 1 & 7 & 3 & 12 \\ 2 & -7 & -3 & 0 \\ 2 & -28 & 15 & 3 \end{pmatrix}. The reduction begins with the first column: the subvector from row 2 to 4 is [1, 2, 2]^T, with norm \sqrt{9} = 3. A Householder reflector is constructed to map this subvector to [\pm 3, 0, 0]^T (in practice the sign opposite to the leading entry is chosen for numerical stability), zeroing a_{31} and a_{41}. This reflector is applied to rows and columns 2 through 4 of A. The next step targets the second column (zeroing a_{42}); no further steps are needed for a 4 \times 4 matrix, since the third column has no entries below its first subdiagonal. The resulting H is upper Hessenberg, similar to A, with nonzeros only on and above the main diagonal and the subdiagonal.
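The example reduction can be reproduced with SciPy's Hessenberg routine, as in the hedged sketch below; the sign of the subdiagonal entry produced in the first step depends on the reflector convention, so only its magnitude (3) is checked.

```python
import numpy as np
from scipy.linalg import hessenberg

A = np.array([[ 1.,  15.,  -6.,  0.],
              [ 1.,   7.,   3., 12.],
              [ 2.,  -7.,  -3.,  0.],
              [ 2., -28.,  15.,  3.]])

H, Q = hessenberg(A, calc_q=True)        # similarity reduction A = Q H Q^T
print(np.allclose(Q @ H @ Q.T, A))       # True
print(np.allclose(np.tril(H, -2), 0))    # True: zeros below the first subdiagonal
print(abs(H[1, 0]))                      # 3.0: the norm of the eliminated subvector [1, 2, 2]^T
```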