Matrix decomposition
Matrix decomposition, also known as matrix factorization, refers to the process of factoring a matrix into a product of matrices that are simpler or more structured, such as triangular, diagonal, or orthogonal matrices, to enable efficient computation, analysis, and solution of linear systems.[1][2] This approach revolutionized numerical linear algebra in the mid-20th century; for example, the LU decomposition expresses a square matrix A as A = LU, where L is lower triangular and U is upper triangular.[3][2]

Key types of matrix decomposition include the following. LU decomposition is used for solving general linear systems and is closely related to Gaussian elimination. Cholesky decomposition factors a symmetric positive-definite matrix as A = LL^T with L lower triangular, offering computational efficiency in statistical applications. QR decomposition represents A = QR with orthogonal Q and upper triangular R, and is essential for least squares problems and orthogonalization. The singular value decomposition (SVD) writes A = U \Sigma V^T, where U and V are orthogonal and \Sigma is diagonal with non-negative singular values; it is crucial for principal component analysis (PCA), low-rank approximations, and pseudo-inverses. Eigendecomposition factors a diagonalizable matrix as A = B \Lambda B^{-1} with the eigenvalues on the diagonal of \Lambda, aiding stability analysis and spectral problems.[1][2]

These decompositions generally cost O(n^3) operations for an n \times n matrix but allow subsequent operations, such as solving multiple systems with the same coefficient matrix, to run in O(n^2), enhancing efficiency in large-scale computations such as machine learning backpropagation or the Netflix Prize matrix factorization challenges.[1][3][2] Beyond solving linear equations and eigenvalue problems, matrix decompositions underpin robust numerical software libraries like LAPACK and LINPACK, support updating techniques for dynamic data (e.g., rank-one modifications in O(n^2)), and reveal structural properties like rank, condition number, and subspace angles, with broad applications in physics, engineering, and data science.[2][1]
Fundamentals
Definition and Motivation
In linear algebra, matrix decomposition, also known as matrix factorization, refers to the process of expressing a given matrix A \in \mathbb{R}^{m \times n} as a product of two or more simpler matrices, such as A = BC, where B and C possess special structures like triangular, orthogonal, or diagonal forms.[4][5] This factorization transforms the original matrix into a canonical form that exploits inherent properties for easier manipulation.

The primary motivation for matrix decomposition arises from the need to simplify complex matrix operations in numerical computations. For instance, solving linear systems Ax = b, computing matrix inverses, and addressing eigenvalue problems become more tractable when the matrix is factored into components that allow for efficient algorithms like forward or backward substitution. These decompositions amortize computational cost: once a matrix has been factored at O(n^3) expense, subsequent tasks such as solving additional right-hand sides require only O(n^2) work with the triangular factors. They also enhance numerical stability by avoiding ill-conditioned intermediate steps, particularly for large or sparse matrices.[4]

Beyond computation, matrix decompositions reveal intrinsic properties of the matrix, such as its rank, condition number, and overall stability, which are crucial for understanding the behavior of linear systems. In applications, they enable data compression through low-rank approximations, where only the dominant components are retained to represent high-dimensional data efficiently, as seen in techniques like singular value decomposition. Additionally, they play a key role in optimization problems by facilitating gradient computations and constraint handling in fields like machine learning and signal processing.[5] For example, decompositions such as LU or SVD illustrate how structured factorizations support these tasks without requiring full matrix storage or inversion.
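To make the factor-once, reuse-many-times motivation concrete, the following minimal Python sketch (an illustration, not drawn from the cited sources) uses SciPy's LU routines: the O(n^3) factorization is computed a single time and then reused for several right-hand sides, each costing only O(n^2) triangular solves. The matrix, right-hand sides, and sizes are arbitrary illustrative choices.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
n = 500
A = rng.standard_normal((n, n))      # illustrative nonsingular matrix
B = rng.standard_normal((n, 10))     # ten right-hand sides sharing the same A

lu, piv = lu_factor(A)               # O(n^3) factorization, performed once
X = lu_solve((lu, piv), B)           # each column solved by O(n^2) triangular substitutions

print(np.allclose(A @ X, B))         # True: all ten systems solved from one factorization
```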
Historical Development
The foundations of matrix decomposition trace back to the 19th century, when mathematicians began exploring determinants and minors as tools for understanding linear systems and quadratic forms. Augustin-Louis Cauchy introduced key concepts in 1826, including the use of minors and adjoints in the context of quadratic forms, laying groundwork for notions of matrix rank through the analysis of non-vanishing minors.[6] Leopold Kronecker advanced these ideas in the 1880s, particularly in 1888, by applying determinant-based criteria to assess the solvability of linear systems, effectively formalizing early rank concepts that would underpin later decompositions.[6]

A pivotal early milestone in spectral decompositions occurred in 1873, when Eugenio Beltrami developed a diagonalization technique for symmetric matrices via orthogonal transformations, motivated by simplifying bilinear forms; this was independently extended by Camille Jordan in 1874 to rectangular matrices, establishing the singular value decomposition (SVD) in its basic form.[7] James Joseph Sylvester contributed a similar result for real square matrices in 1889.[7] Erhard Schmidt formalized the Gram-Schmidt orthogonalization process in 1907, building on Jørgen Pedersen Gram's earlier work from 1883, which provided a basis for QR decompositions by constructing orthogonal bases from linearly independent vectors.[8]

The early 20th century saw further progress in triangular decompositions. André-Louis Cholesky devised a method in 1910 for factoring positive definite matrices into lower triangular factors, originally for geodetic computations, though it was published posthumously in 1924.[9] In 1936, Carl Eckart and Gale Young demonstrated the utility of SVD for low-rank matrix approximation, showing that the best approximation in the Frobenius or spectral norm is obtained by truncating the SVD, which highlighted its practical value.

World War II accelerated developments due to the need for solving large-scale linear systems in ballistics and cryptography, with early electronic computers like ENIAC (1945) demanding efficient algorithms. Alan Turing formalized the general LU decomposition in 1948 as a stable factorization for Gaussian elimination, building on earlier work by Tadeusz Banachiewicz in 1938 and John von Neumann and Herman Goldstine in 1947.[10] The rise of digital computers in the 1950s spurred numerical advancements: Alston S. Householder introduced unitary transformations in 1958 for QR decompositions, enabling stable orthogonalization without the instability of classical Gram-Schmidt.

The 1960s marked a numerical renaissance, with Gene Golub and William Kahan developing an efficient bidiagonalization algorithm for computing the SVD in 1965, making it viable for practical applications on early computers. Concurrently, researchers addressed numerical stability for ill-conditioned matrices; partial pivoting in LU decompositions, refined by James Wilkinson and others around 1965, became a standard variant to mitigate growth in rounding errors during factorization.[10] These innovations, tied to the computational demands of scientific simulation and data analysis, solidified matrix decompositions as cornerstones of numerical linear algebra.
General Properties
Matrix decompositions exhibit varying conditions for uniqueness depending on the specific factorization and any imposed constraints. For instance, the LU decomposition of a nonsingular square matrix is unique if no row interchanges are required during Gaussian elimination, but partial pivoting generally introduces non-uniqueness due to permutation choices.[11] Similarly, for a matrix with full column rank, the thin QR factorization is unique once the diagonal entries of the upper triangular factor are required to be positive; without that normalization, the columns of the orthogonal factor are determined only up to sign.[11] In the singular value decomposition (SVD), the singular values are always uniquely determined once ordered; when they are distinct and nonzero, the singular vector pairs are unique up to simultaneous sign (or phase) changes, while repeated singular values allow additional unitary freedom within the associated subspaces.[11]

Existence theorems underpin the reliability of these decompositions across matrix classes. The LU decomposition exists for any nonsingular square matrix when partial pivoting is employed, ensuring no zero pivots disrupt the process, though without pivoting it can break down when a zero pivot is encountered and becomes numerically unreliable when pivots are small.[11] In contrast, the SVD exists for every real or complex matrix, regardless of size, rank, or conditioning, providing a universal factorization into orthogonal and diagonal components.[11] These guarantees make decompositions applicable to a broad range of problems, including the solution of linear systems where factorizations simplify subsequent computations.[11]

Most matrix decompositions for dense n \times n matrices incur a computational complexity of O(n^3) operations, typically derived from variants of Gaussian elimination. For example, the LU decomposition requires approximately 2n^3/3 floating-point operations, while the SVD demands around 4n^3 to 22n^3 depending on the algorithm, such as the Golub-Reinsch method.[11] This cubic scaling reflects the inherent cost of transforming matrices into structured forms like triangular or diagonal.

Stability and conditioning play crucial roles in the practical utility of decompositions, as they determine error propagation in finite-precision arithmetic. Decompositions like LU with partial pivoting and QR via Householder reflections are backward stable, meaning computed factors solve a slightly perturbed version of the original problem with small relative errors bounded by machine epsilon times the matrix norm.[11] The SVD is particularly robust: by Weyl's inequality, perturbing the matrix changes each singular value by at most the norm of the perturbation, while the sensitivity of the singular vectors depends on the gaps between neighboring singular values. Conditioning is quantified by the condition number \kappa(A) = \|A\| \cdot \|A^{-1}\|, which for the 2-norm equals the ratio of the largest to smallest singular value; well-conditioned matrices have \kappa(A) \approx 1, minimizing sensitivity to input perturbations, while ill-conditioned ones amplify errors.[11]

Matrix decompositions fundamentally equate to processes of triangularization or diagonalization, enabling efficient manipulation of linear transformations. LU and QR decompositions achieve triangularization, converting a general matrix into lower/upper triangular factors through elimination, which simplifies solving systems or computing determinants.[11] Spectral decompositions like the SVD and eigendecomposition pursue diagonalization, factoring the matrix into orthogonal/unitary matrices and a diagonal matrix of singular values or eigenvalues, revealing intrinsic geometric properties such as rank and norms.[11]
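As a small hedged illustration of conditioning, the sketch below estimates the 2-norm condition number as the ratio of extreme singular values and compares it with NumPy's built-in routine; the Hilbert matrix and its size are arbitrary choices of a classically ill-conditioned example, not taken from the sources above.

```python
import numpy as np
from scipy.linalg import hilbert

A = hilbert(8)                                 # classic ill-conditioned test matrix
sigma = np.linalg.svd(A, compute_uv=False)     # singular values, largest first

kappa = sigma[0] / sigma[-1]                   # 2-norm condition number sigma_max / sigma_min
print(kappa)                                   # on the order of 1e10: tiny input errors are greatly amplified
print(np.isclose(kappa, np.linalg.cond(A, 2))) # True: matches NumPy's built-in condition number
```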
Decompositions for Linear Systems
LU Decomposition
LU decomposition, also known as LU factorization, expresses a square matrix A as the product of a lower triangular matrix L and an upper triangular matrix U, such that A = LU. Typically, L is a unit lower triangular matrix with ones on the diagonal, while U has arbitrary diagonal entries, ensuring all entries above the diagonal in L are zero and below the diagonal in U are zero.[12] This factorization applies to nonsingular matrices A \in \mathbb{R}^{n \times n}. For numerical stability, especially when pivots are small or zero, partial pivoting is used, yielding PA = LU where P is a permutation matrix that reorders rows to select the largest pivot in each column.[12] The modern LU factorization emerged from Alan Turing's 1948 analysis of rounding errors in matrix processes, where he formalized Gaussian elimination as this decomposition.[13]

The algorithm to compute the LU decomposition relies on Gaussian elimination. Starting with the matrix A, elimination proceeds column by column: for each pivot position k, the entries below the pivot in column k are zeroed by subtracting multiples of row k from subsequent rows, with the multipliers l_{ik} = a_{ik}^{(k)} / a_{kk}^{(k)} (for i > k) stored in the lower part of L. The resulting upper triangular form after all eliminations is U, and L is completed with ones on the diagonal and zeros above. With partial pivoting, rows are swapped before each elimination step to maximize the pivot magnitude. Solving Ax = b then involves forward substitution to solve Ly = Pb for y, followed by back substitution to solve Ux = y.[14] The factorization requires O(n^3) operations, while each subsequent solve costs O(n^2).[12]

Applications of LU decomposition center on efficiently solving linear systems Ax = b, particularly when multiple right-hand sides b share the same A, as the costly factorization is performed once. It also facilitates matrix inversion by solving Ax = e_i for each standard basis vector e_i. The method demands that A be nonsingular and that no zero pivots occur during elimination without pivoting; partial pivoting mitigates this by ensuring numerical stability for ill-conditioned matrices. For symmetric positive definite matrices, LU decomposition connects to Cholesky decomposition, where A = LL^T with L lower triangular.[15]

Consider the 2×2 matrix A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}. In the first step of Gaussian elimination, the multiplier for the (2,1) entry is l_{21} = 3/1 = 3. Subtract 3 times row 1 from row 2 to get the updated row 2: [3-3\cdot1, 4-3\cdot2] = [0, -2]. Thus, L = \begin{pmatrix} 1 & 0 \\ 3 & 1 \end{pmatrix}, \quad U = \begin{pmatrix} 1 & 2 \\ 0 & -2 \end{pmatrix}, and A = LU. To solve Ax = b for b = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, first solve Ly = b by forward substitution: y_1 = 1, y_2 = (2 - 3\cdot1)/1 = -1. Then back-substitute Ux = y: x_2 = -1 / -2 = 0.5, x_1 = (1 - 2\cdot0.5)/1 = 0.[14]
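The worked example can be checked numerically; the sketch below (illustrative, using SciPy rather than any method prescribed by the sources) factors the same 2×2 matrix and solves the same system. Because SciPy applies partial pivoting, its factors correspond to PA = LU with the rows reordered, so L and U differ from the no-pivoting hand computation above even though the product reconstructs A and the solution agrees.

```python
import numpy as np
from scipy.linalg import lu, lu_factor, lu_solve

A = np.array([[1., 2.],
              [3., 4.]])
b = np.array([1., 2.])

P, L, U = lu(A)                     # SciPy pivots: rows are swapped, so L and U differ from the hand computation
print(np.allclose(P @ L @ U, A))    # True: the factors still reconstruct A

x = lu_solve(lu_factor(A), b)       # forward substitution for Ly = Pb, then back substitution
print(x)                            # approximately [0, 0.5], matching the hand computation above
```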
Cholesky Decomposition
The Cholesky decomposition, also known as Cholesky factorization, expresses a symmetric positive-definite matrix A as the product A = LL^T, where L is a lower triangular matrix with positive diagonal entries.[16] This factorization exploits the symmetry and positive definiteness of A to produce a unique L.[17] For complex Hermitian positive-definite matrices, the formula generalizes to A = LL^H, where ^H denotes the conjugate transpose.[16] The method was developed by French military engineer André-Louis Cholesky around 1907–1910 for geodetic computations during World War I, and published posthumously in 1924.[9]

The decomposition exists if and only if A is symmetric (or Hermitian) and positive definite, meaning all eigenvalues of A are positive.[18] During computation, if the quantity under any square root becomes zero or negative (so the corresponding diagonal entry of L would not be real and positive), this indicates that A is not positive definite.[16] The algorithm computes L row-by-row via a variant of Gaussian elimination that avoids pivoting, leveraging the symmetry to reduce operations. For the v-th row, the diagonal entry is l_{vv} = \sqrt{a_{vv} - \sum_{u=1}^{v-1} l_{vu}^2}, and for off-diagonal entries with t > v, l_{tv} = \frac{a_{tv} - \sum_{u=1}^{v-1} l_{tu} l_{vu}}{l_{vv}}.[16] This process requires approximately n^3/3 floating-point operations for an n \times n matrix, about half the cost of LU decomposition for the same task.[18]

Cholesky decomposition offers key advantages over general methods like LU factorization: it stores only the lower triangular factor L (roughly halving memory usage) and performs fewer operations by exploiting symmetry, making it particularly efficient for large-scale computations on positive-definite matrices.[17] It serves as a special case of LU decomposition for symmetric positive-definite matrices, where the upper triangular factor equals the transpose of the lower one.[19]

Applications include solving linear systems Ax = b efficiently via forward substitution to solve Ly = b followed by back substitution for L^T x = y, which is common in optimization problems involving quadratic forms.[18] In statistics, it decomposes covariance matrices of multivariate normal distributions to generate correlated random samples by transforming independent standard normals through L.[20] For example, consider the 2×2 covariance matrix A = \begin{pmatrix} 4 & 2 \\ 2 & 5 \end{pmatrix}, which is symmetric positive definite. Its Cholesky decomposition is L = \begin{pmatrix} 2 & 0 \\ 1 & 2 \end{pmatrix}, since LL^T = \begin{pmatrix} 4 & 2 \\ 2 & 5 \end{pmatrix}.[18]
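A brief numerical sketch of the covariance example follows, using NumPy (an illustrative choice): it recovers the factor L = [[2, 0], [1, 2]] and uses it to draw correlated Gaussian samples, with the seed and sample count chosen arbitrarily.

```python
import numpy as np

A = np.array([[4., 2.],
              [2., 5.]])               # symmetric positive-definite covariance matrix
L = np.linalg.cholesky(A)              # lower triangular factor, here [[2, 0], [1, 2]]
print(np.allclose(L @ L.T, A))         # True

# Draw correlated Gaussian samples by transforming independent standard normals with L.
rng = np.random.default_rng(0)
z = rng.standard_normal((2, 100_000))  # independent N(0, 1) samples
x = L @ z                              # samples whose covariance is approximately A
print(np.cov(x))                       # close to [[4, 2], [2, 5]]
```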
QR Decomposition
The QR decomposition, also known as QR factorization, expresses an m \times n matrix A (with m \geq n) as the product A = QR, where Q is an m \times m orthogonal matrix satisfying Q^T Q = I_m and R is an m \times n upper triangular matrix.[21] In this full factorization, the bottom (m - n) \times n block of R consists of zeros.[21] This decomposition exists for any real matrix A, regardless of rank.[22] When A has full column rank (rank n), the reduced or thin QR decomposition takes the form A = \hat{Q} \hat{R}, where \hat{Q} is an m \times n matrix with orthonormal columns (\hat{Q}^T \hat{Q} = I_n) and \hat{R} is an n \times n upper triangular matrix; this reduced form is unique if the diagonal entries of \hat{R} are required to be positive.[21][22] The economy-sized variant, often used for rectangular matrices to save computation and storage, corresponds to this reduced form by omitting the unnecessary columns of the full Q and rows of R.[23]

Several algorithms compute the QR decomposition, each with trade-offs in stability and efficiency. The classical Gram-Schmidt process orthogonalizes the columns of A sequentially but suffers from numerical instability due to subtractive cancellation in finite precision arithmetic.[21] The modified Gram-Schmidt variant improves stability by performing projections in a different order.[21] More robust methods include Householder reflections, which apply a sequence of orthogonal transformations to triangularize A while preserving its column space, achieving backward stability at an O(m n^2) cost for the thin form; this approach was introduced by Householder in 1958. Givens rotations offer an alternative, zeroing elements pairwise via 2-by-2 orthogonal rotations, which is useful for sparse or structured matrices but generally slower for dense cases.[21]

A primary application of QR decomposition is solving overdetermined linear least squares problems \min_x \| Ax - b \|_2 for m > n. Substituting A = QR yields \| QRx - b \|_2 = \| Rx - Q^T b \|_2 since Q preserves norms (\| Q y \|_2 = \| y \|_2); the upper triangular system R x = Q^T b can then be solved efficiently by back-substitution, avoiding the ill-conditioning of the normal equations A^T A x = A^T b.[21] For symmetric matrices, QR decomposition facilitates eigenvalue computations via iterations of the form A_{k+1} = R_k Q_k, where A_k = Q_k R_k, preserving symmetry and leading to triangular convergence without explicit eigenvector calculation in the basic QR algorithm.[24] The factorization can also be updated efficiently after rank-one modifications of A, which is useful in adaptive least squares contexts.[25]

As a representative example, consider orthogonalizing the columns of a 3×2 matrix A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} using the modified Gram-Schmidt process. Normalize the first column to obtain the first column of \hat{Q}: q_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}. Project the second column onto q_1 and normalize the orthogonal component: the projection coefficient is q_1^T a_2 = \frac{1}{\sqrt{2}}, so the residual is a_2 - (q_1^T a_2) q_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} - \frac{1}{2} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0.5 \\ -0.5 \\ 1 \end{pmatrix}, with norm \sqrt{1.5} = \sqrt{3/2}.
Thus, q_2 = \frac{1}{\sqrt{6}} \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix}, yielding \hat{Q} = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} \\ 0 & \frac{2}{\sqrt{6}} \end{pmatrix}, \quad \hat{R} = \begin{pmatrix} \sqrt{2} & \frac{1}{\sqrt{2}} \\ 0 & \sqrt{\frac{3}{2}} \end{pmatrix}, verifying A = \hat{Q} \hat{R}.
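The hand computation above can be reproduced numerically; the sketch below uses NumPy's QR routine (an illustrative choice; note that the column signs of the computed factors may be flipped relative to the Gram-Schmidt result, since NumPy does not force positive diagonal entries in R) and then applies the factorization to a small least squares problem with an arbitrary right-hand side.

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[1., 1.],
              [1., 0.],
              [0., 1.]])
Q, R = np.linalg.qr(A)              # thin QR; column signs may differ from the hand computation
print(np.allclose(Q @ R, A))        # True
print(np.round(R, 3))               # |diagonal| = [sqrt(2), sqrt(3/2)] up to sign

# Least squares min ||Ax - b||_2 via R x = Q^T b, avoiding the normal equations A^T A x = A^T b.
b = np.array([1., 2., 3.])          # illustrative right-hand side
x = solve_triangular(R, Q.T @ b, lower=False)
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # True
```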
Spectral Decompositions
Eigendecomposition
Eigendecomposition provides a way to factorize a square matrix into a form that reveals its spectral properties, assuming the matrix is diagonalizable. For an n \times n diagonalizable matrix A over the complex numbers, the eigendecomposition is given by A = V \Lambda V^{-1}, where V is an invertible matrix whose columns are the right eigenvectors of A, and \Lambda is a diagonal matrix containing the eigenvalues \lambda_i of A along its main diagonal.[26] This representation diagonalizes A, transforming it into a simpler form where operations become scalar multiplications on the eigenvalues.

A matrix A is diagonalizable if and only if it possesses n linearly independent eigenvectors, which occurs when the algebraic multiplicity of each eigenvalue equals its geometric multiplicity.[27] For real symmetric matrices, the spectral theorem guarantees diagonalizability over the reals, with all eigenvalues real and the eigenvectors forming an orthogonal basis, yielding A = Q \Lambda Q^T where Q is orthogonal.[28] More generally, for Hermitian matrices over the complex numbers, the spectral theorem states that A = Q \Lambda Q^*, where Q is unitary and \Lambda has real eigenvalues.

Computing the eigendecomposition typically involves iterative algorithms that approximate eigenvalues and eigenvectors. The power iteration method, a simple iterative technique, converges to the dominant eigenvalue and its eigenvector by repeatedly applying the matrix to an initial vector and normalizing, effective when one eigenvalue dominates in magnitude.[27] For more general cases, the QR algorithm, independently developed by John G. F. Francis in 1959–1961 and Vera Kublanovskaya in 1961, iteratively performs QR decompositions followed by reverse multiplications to drive the matrix toward a form revealing its eigenvalues, often after a Hessenberg reduction for efficiency.[29] This algorithm forms the basis of modern implementations in numerical libraries for dense matrices.

Eigendecomposition finds applications in analyzing dynamical systems and stochastic processes. For matrix powers, if A = V \Lambda V^{-1}, then A^k = V \Lambda^k V^{-1}, simplifying computations of iterates like transition probabilities in Markov chains, where the dominant eigenvalue (typically 1) determines the stationary distribution.[30] In systems of linear differential equations of the form \mathbf{x}' = A \mathbf{x}, the solution is \mathbf{x}(t) = V e^{\Lambda t} V^{-1} \mathbf{x}(0), decoupling the system into independent scalar equations along eigenvector directions.[31] Matrices that are not diagonalizable require alternative forms like the Jordan canonical form for similar analyses.

As an illustrative example, consider the 2×2 rotation matrix by angle \theta, R = \begin{pmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{pmatrix}. The characteristic equation \det(R - \lambda I) = 0 yields eigenvalues \lambda = e^{\pm i \theta}, which are complex unless \theta = 0 or \pi (modulo 2\pi), reflecting the lack of real eigenvectors for non-trivial rotations in the plane.[32] The corresponding eigenvectors involve complex entries, such as \begin{pmatrix} 1 \\ -i \end{pmatrix} for \lambda = e^{i \theta}.
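A short numerical sketch of the rotation-matrix example follows, using NumPy (an illustrative choice, with \theta = \pi/3 picked arbitrarily): it shows the complex conjugate eigenvalues and uses the decomposition to evaluate a matrix power, as described above.

```python
import numpy as np

theta = np.pi / 3                              # arbitrary non-trivial rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

w, V = np.linalg.eig(R)                        # eigenvalues and right eigenvectors
print(w)                                       # complex conjugate pair exp(+i*theta), exp(-i*theta)

# Reconstruct R = V Lambda V^{-1} and use the decomposition for a matrix power:
# R^6 rotates by 6*theta = 2*pi, so it should equal the identity.
Lam = np.diag(w)
print(np.allclose(V @ Lam @ np.linalg.inv(V), R))                                      # True
print(np.allclose(V @ np.linalg.matrix_power(Lam, 6) @ np.linalg.inv(V), np.eye(2)))   # True
```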
Schur Decomposition
The Schur decomposition, named after mathematician Issai Schur who established its existence in 1909, provides a fundamental factorization for square matrices over the complex numbers. For any complex square matrix A \in \mathbb{C}^{n \times n}, there exists a unitary matrix Q \in \mathbb{C}^{n \times n} (satisfying Q^* Q = I, where Q^* denotes the conjugate transpose) and an upper triangular matrix T \in \mathbb{C}^{n \times n} such that A = Q T Q^*. The diagonal entries of T are the eigenvalues of A, appearing in some order, which follows from the similarity transformation preserving the spectrum of the matrix. This decomposition triangularizes A via a unitary similarity, ensuring numerical stability in computations due to the orthogonality of Q.[33][34]

For real matrices, a variant known as the real Schur form addresses the presence of complex conjugate eigenvalue pairs while maintaining real arithmetic. Any real square matrix A \in \mathbb{R}^{n \times n} admits an orthogonal matrix U \in \mathbb{R}^{n \times n} (satisfying U^T U = I) and a block upper triangular matrix S \in \mathbb{R}^{n \times n} such that A = U S U^T, where the diagonal blocks of S are either 1×1 (for real eigenvalues) or 2×2 (for pairs of complex conjugate eigenvalues). The 2×2 blocks have the form of real companion matrices corresponding to quadratic factors of the characteristic polynomial, ensuring all entries remain real. This form is particularly useful in practical implementations to avoid complex numbers.[33][35]

The Schur decomposition is typically computed using the QR algorithm with shifts, applied after an initial reduction to upper Hessenberg form to enhance efficiency. The QR iteration iteratively decomposes the matrix into an orthogonal factor and an upper triangular factor, then recombines them via similarity; shifts accelerate convergence toward the triangular form by targeting individual eigenvalues. This process yields the desired Q and T (or S) factors, with the Hessenberg reduction handled separately via Householder transformations. The algorithm's backward stability stems from the unitary transformations, making it a cornerstone for reliable eigenvalue solvers in numerical libraries.[24][34]

In applications, the Schur decomposition underpins eigenvalue algorithms by reducing the problem to triangularization, from which eigenvalues are directly read off the diagonal and eigenvectors can be derived via back-substitution. Its stability ensures accurate computation even for ill-conditioned matrices, supporting tasks like dynamic mode decomposition in control theory and model reduction in simulations. For normal matrices, the decomposition simplifies to the eigendecomposition, where T becomes diagonal.[35][24]

The Schur decomposition is not unique; the eigenvalues on the diagonal can be ordered arbitrarily, and for repeated eigenvalues, the corresponding invariant subspaces allow multiple choices for the columns of Q. In the real case, the block structure for complex pairs is unique up to each 2×2 block's internal orthogonal freedom, but the overall partitioning depends on eigenvalue ordering.[36][33]

As an illustrative example, consider the 2×2 real matrix A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}. Its eigenvalues are \lambda_1 = \frac{5 + \sqrt{33}}{2} \approx 5.372 and \lambda_2 = \frac{5 - \sqrt{33}}{2} \approx -0.372.
A corresponding Schur decomposition is A = Q T Q^*, where T = \begin{pmatrix} \lambda_1 & * \\ 0 & \lambda_2 \end{pmatrix} (with the off-diagonal entry determined by the transformation), and Q is unitary with columns as Schur vectors; since both eigenvalues are real, Q may in fact be taken real orthogonal, confirming the triangular form with eigenvalues on the diagonal.
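The 2×2 example can be verified with SciPy's Schur routine, as in the hedged sketch below; the ordering of the eigenvalues along the diagonal of T is not fixed and may differ from the ordering written above.

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[1., 2.],
              [3., 4.]])

T, Z = schur(A, output='real')           # A = Z T Z^T with Z orthogonal, T (quasi-)triangular
print(np.allclose(Z @ T @ Z.T, A))       # True
print(np.diag(T))                        # eigenvalues (5 ± sqrt(33))/2 ≈ 5.372, -0.372 in some order
print(np.allclose(Z.T @ Z, np.eye(2)))   # True: Z is orthogonal
```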
Singular Value Decomposition
The singular value decomposition (SVD) provides a canonical factorization for any real matrix A \in \mathbb{R}^{m \times n} (with a similar form for complex matrices), given by A = U \Sigma V^T, where U \in \mathbb{R}^{m \times m} and V \in \mathbb{R}^{n \times n} are orthogonal matrices, and \Sigma \in \mathbb{R}^{m \times n} is a rectangular diagonal matrix containing the singular values \sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_{\min(m,n)} \geq 0 along its main diagonal, with all off-diagonal entries zero. This decomposition generalizes eigendecomposition to rectangular matrices by capturing their intrinsic geometric structure through orthogonal transformations and scaling.

The singular values \sigma_i satisfy \sigma_i = \sqrt{\lambda_i}, where \lambda_i are the eigenvalues of A^T A (or equivalently A A^T), ordered decreasingly; the associated right singular vectors (columns of V) are the eigenvectors of A^T A, and the left singular vectors (columns of U) are the eigenvectors of A A^T. The rank of A equals the number of strictly positive singular values, providing a direct measure of the matrix's linear independence. Furthermore, the SVD enables optimal low-rank approximations: for any k \leq \min(m,n), the matrix A_k = \sum_{i=1}^k \sigma_i u_i v_i^T minimizes \|A - B\|_F over all rank-k matrices B, where \| \cdot \|_F denotes the Frobenius norm, as established by the Eckart–Young theorem.

Computing the SVD typically involves the Golub–Kahan algorithm, which first applies Householder reflections to bidiagonalize A (reducing it to an upper bidiagonal form, with nonzeros only on the main diagonal and the first superdiagonal), followed by iterative QR decompositions on the bidiagonal matrix to deflate and compute the singular values and vectors accurately and efficiently. This method, with subsequent refinements like divide-and-conquer or Jacobi rotations, forms the basis for implementations in numerical libraries such as LAPACK.

Key applications of SVD include solving underdetermined or overdetermined linear systems via the Moore–Penrose pseudoinverse A^+ = V \Sigma^+ U^T, where \Sigma^+ reciprocates the nonzero \sigma_i on its diagonal and sets zeros elsewhere, enabling stable least-squares solutions even for rank-deficient matrices. In data analysis, SVD underpins principal component analysis (PCA): for a centered data matrix X \in \mathbb{R}^{m \times n} with m observations and n variables, the SVD X = U \Sigma V^T identifies the principal components as the columns of V, with the proportion of variance explained by the i-th component given by \sigma_i^2 / \sum_j \sigma_j^2, facilitating dimensionality reduction and noise filtering. SVD also supports image compression by representing an image as a matrix and retaining only the k largest singular values for a rank-k approximation, which preserves essential visual features while significantly reducing file size, as demonstrated in techniques achieving high compression ratios with minimal perceptual loss.

Variants of the SVD include the compact (or economy) form, which omits zero singular values for a rank-r matrix A (where r is the number of positive \sigma_i), yielding A = U_r \Sigma_r V_r^T with U_r \in \mathbb{R}^{m \times r}, \Sigma_r \in \mathbb{R}^{r \times r}, and V_r \in \mathbb{R}^{n \times r}, thus requiring less storage and computation. The truncated SVD goes further by retaining only the first k < r terms, specifically for approximation purposes.
As a concrete illustration, consider the 3×2 matrix A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}. The SVD computation begins with A^T A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, whose eigenvalues are \lambda_1 = 3 and \lambda_2 = 1, yielding singular values \sigma_1 = \sqrt{3} \approx 1.732 and \sigma_2 = 1. The right singular vectors v_1, v_2 are the normalized eigenvectors of A^T A: for \lambda_1 = 3, an unnormalized eigenvector is \begin{pmatrix} 1 \\ 1 \end{pmatrix}, normalized to \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}; for \lambda_2 = 1, it is \begin{pmatrix} 1 \\ -1 \end{pmatrix}, normalized to \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}. The corresponding left singular vectors are then u_i = \frac{1}{\sigma_i} A v_i for i=1,2, with a third orthogonal vector completing the full U if needed. This process highlights how SVD extracts the matrix's rank (here 2) and directional scalings.
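A minimal numerical sketch of this 3×2 example follows, using NumPy (an illustrative choice; the signs of the computed singular vectors may differ from the hand computation): it confirms the singular values, the rank, the Eckart–Young rank-1 approximation error, and the pseudoinverse formula.

```python
import numpy as np

A = np.array([[1., 1.],
              [1., 0.],
              [0., 1.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # thin SVD: U is 3x2, Vt is 2x2
print(s)                                           # [sqrt(3), 1] ≈ [1.732, 1.0]
print(np.allclose((U * s) @ Vt, A))                # True: U * s scales each column of U by sigma_i
print(int(np.sum(s > 1e-12)))                      # 2: rank = number of positive singular values

# Best rank-1 approximation (Eckart-Young) and the Moore-Penrose pseudoinverse.
A1 = s[0] * np.outer(U[:, 0], Vt[0, :])
print(np.linalg.norm(A - A1))                      # 1.0: equals sigma_2, the discarded singular value
print(np.allclose(np.linalg.pinv(A), Vt.T @ np.diag(1 / s) @ U.T))   # True
```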
Advanced and Specialized Decompositions
Polar Decomposition
In linear algebra, the polar decomposition provides a factorization of a square matrix A \in \mathbb{C}^{n \times n} (or \mathbb{R}^{n \times n}) into the product of a unitary matrix U (orthogonal for real matrices) and a positive semidefinite Hermitian matrix P, such that A = UP. This decomposition is unique when A is invertible, with P being the unique positive semidefinite square root of A^*A, where A^* denotes the conjugate transpose. The form generalizes the polar form of complex numbers, separating the "phase" (unitary part) from the "magnitude" (positive part), and exists for any square matrix, though uniqueness requires invertibility.

The right polar decomposition is A = UP, while the left form is A = PU, where now U is unitary and P = \sqrt{AA^*} is positive semidefinite. Explicitly, P = \sqrt{A^*A} and U = A (A^*A)^{-1/2} for the right form when A is invertible; the left form follows analogously with roles reversed. For noninvertible matrices, the decomposition still holds, but P may be singular, and uniqueness is guaranteed only for the positive semidefinite factor among decompositions with the same range. This structure highlights geometric properties, such as P representing a pure scaling and U a rigid rotation or isometry.

A practical algorithm computes the polar decomposition via the singular value decomposition (SVD) of A = V \Sigma W^*, where V and W are unitary and \Sigma is diagonal with nonnegative entries. For the right polar form, set U = V W^* and P = W \Sigma W^*; the left form uses P = V \Sigma V^* and U = V W^*. This method is numerically stable and leverages efficient SVD routines, making it suitable for computational implementations despite the cubic cost of SVD. For rectangular matrices A \in \mathbb{C}^{m \times n} with m \geq n, a generalized right polar decomposition A = UH exists, where U \in \mathbb{C}^{m \times n} has orthonormal columns and H \in \mathbb{C}^{n \times n} is positive semidefinite, again derivable from SVD.[37]

Applications of polar decomposition span several fields. In continuum mechanics, it decomposes the deformation gradient tensor F as F = RU, where R is the rotation tensor (orthogonal) and U is the right stretch tensor (positive definite symmetric), enabling separation of rigid body motion from pure strain for analysis of material deformation.[38] In optics, particularly polarimetry, the decomposition of Mueller matrices extracts intrinsic polarization properties: a diattenuator (linear dichroism), a retarder (phase shift), and a depolarizer, providing quantitative interpretation of light-scattering media such as biological tissues.[39] Another use is finding the closest unitary matrix to a given A, which is the unitary factor U in the decomposition, useful in optimization and approximation problems.[37]

For example, consider the real 2 \times 2 matrix A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}. First, compute A^T A = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}, whose eigenvalues are \frac{3 \pm \sqrt{5}}{2}; its unique positive definite square root is P = \sqrt{A^T A} = \frac{1}{\sqrt{5}} \begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix}, which can be checked by squaring. Then U = A P^{-1} = \frac{1}{\sqrt{5}} \begin{pmatrix} 2 & 1 \\ -1 & 2 \end{pmatrix}, which is orthogonal since U^T U = I, and indeed UP = A. This illustrates how A separates into a rotation factor and a symmetric stretch factor.
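The hand computation above can be cross-checked with SciPy's polar routine, as in the illustrative sketch below; since A is invertible, the factors are unique and should match the closed-form matrices given above.

```python
import numpy as np
from scipy.linalg import polar

A = np.array([[1., 1.],
              [0., 1.]])

U, P = polar(A, side='right')                   # right polar decomposition A = U P
print(np.allclose(U @ P, A))                    # True
print(np.allclose(U.T @ U, np.eye(2)))          # True: U is orthogonal
print(np.allclose(P, np.array([[2., 1.], [1., 3.]]) / np.sqrt(5)))    # True: matches sqrt(A^T A)
print(np.allclose(U, np.array([[2., 1.], [-1., 2.]]) / np.sqrt(5)))   # True: the rotation factor
```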
Jordan Canonical Form
The Jordan canonical form provides a canonical representation for square matrices over an algebraically closed field, such as the complex numbers, particularly useful for matrices that are not diagonalizable. Introduced by Camille Jordan in his 1870 treatise on substitutions and algebraic equations, it expresses any n \times n matrix A as similar to a unique (up to permutation of blocks) block-diagonal matrix J, via J = P^{-1} A P where P is invertible.[40][41] Each block in J is a Jordan block J_k(\lambda), corresponding to an eigenvalue \lambda, defined as J_k(\lambda) = \lambda I_k + N_k, where I_k is the k \times k identity matrix and N_k is the nilpotent matrix with 1's on the superdiagonal and zeros elsewhere.[41]

J_k(\lambda) = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda & 1 \\ 0 & 0 & \cdots & 0 & \lambda \end{pmatrix}

For an eigenvalue \lambda with algebraic multiplicity m (the multiplicity in the characteristic polynomial), the Jordan form decomposes the corresponding generalized eigenspace into a direct sum of cyclic subspaces, each associated with a Jordan block. The number of blocks equals the geometric multiplicity g = \dim \ker(A - \lambda I), which is at most m, and the sizes of the blocks (the lengths of the Jordan chains) are determined by the differences in dimensions of the kernels of the powers (A - \lambda I)^j for j = 1, \dots, m. The largest block size equals the index of \lambda in the minimal polynomial of A, which divides the characteristic polynomial and annihilates A. This structure extends the eigendecomposition to defective matrices, where g < m, by incorporating generalized eigenvectors satisfying (A - \lambda I)^k v = 0 but not for smaller k.[41]

The existence of the Jordan canonical form requires the base field to be algebraically closed, ensuring all eigenvalues lie in the field; over the reals, a real Jordan form with 2×2 blocks for complex conjugate pairs is used instead. The form is unique up to the ordering of blocks, and the minimal polynomial's degree for each \lambda governs the largest block size for that eigenvalue.

Computationally, one first finds the eigenvalues via the characteristic polynomial, then for each \lambda, computes the generalized eigenspace \ker((A - \lambda I)^m) and the Weyr characteristic (dimensions of \ker((A - \lambda I)^j)) to determine block sizes. Jordan chains are constructed by solving (A - \lambda I) v_{j} = v_{j-1} starting from eigenvectors, but this process is numerically unstable for large or ill-conditioned matrices due to sensitivity to perturbations in defective cases. Practical algorithms, such as those using Schur decomposition followed by rank-revealing decompositions, mitigate this but rarely compute the exact form; instead, they approximate the structure.[41]

Applications of the Jordan canonical form include simplifying the solution of linear systems of differential equations, where e^{At} is block-diagonalized, and analyzing nilpotency in powers of (A - \lambda I)^k. In control theory, it determines the controllability indices and minimal realizations of linear systems by revealing the chain lengths for uncontrollable or unobservable modes. For instance, consider the 3×3 matrix A = \begin{pmatrix} 3 & 1 & 0 \\ 0 & 3 & 2 \\ 0 & 0 & 4 \end{pmatrix}, which has eigenvalues \lambda = 3 (algebraic multiplicity 2, geometric multiplicity 1) and \lambda = 4 (multiplicity 1).
Its Jordan form is J = \begin{pmatrix} 3 & 1 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{pmatrix}, achieved via similarity with P whose columns are a generalized eigenvector chain for \lambda=3 (satisfying (A - 3I)^2 v_2 = 0, (A - 3I) v_2 = v_1, (A - 3I) v_1 = 0) and an eigenvector for \lambda=4. This reveals the single Jordan block of size 2 for the defective eigenvalue.[41]
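Floating-point libraries do not compute the Jordan form directly because of the numerical instability noted above, but the block structure of this example can still be inspected by computing the kernel dimensions of powers of (A - 3I). The sketch below does this with NumPy matrix ranks; it is an illustration that assumes the small, exactly representable matrix above, not a general-purpose algorithm.

```python
import numpy as np

A = np.array([[3., 1., 0.],
              [0., 3., 2.],
              [0., 0., 4.]])
lam, n = 3.0, 3
N = A - lam * np.eye(n)

# dim ker((A - 3I)^j) = n - rank((A - 3I)^j); this sequence of dimensions
# (the Weyr characteristic) determines the Jordan block sizes for lambda = 3.
dims = [n - np.linalg.matrix_rank(np.linalg.matrix_power(N, j)) for j in (1, 2, 3)]
print(dims)   # [1, 2, 2]: one Jordan chain, stabilizing at the algebraic multiplicity 2,
              # so lambda = 3 contributes a single 2x2 Jordan block, as in J above
```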
Hessenberg and Tridiagonal Forms
The Hessenberg form of a square matrix is an intermediate structure that facilitates efficient computation of eigenvalues and eigenvectors by reducing the fill-in that occurs during iterative algorithms. An upper Hessenberg matrix H is defined such that h_{ij} = 0 for all i > j+1, meaning all entries below the first subdiagonal are zero. Any n \times n matrix A can be reduced to upper Hessenberg form via a unitary similarity transformation A = Q H Q^*, where Q is unitary and H is upper Hessenberg. This decomposition preserves the eigenvalues of A because similarity transformations do not alter the spectrum. The transformation is reversible, allowing recovery of the original matrix from H and Q.

The reduction to Hessenberg form is typically achieved using Householder reflections or Givens rotations, with Householder methods being preferred for their efficiency in zeroing multiple elements simultaneously. The algorithm proceeds column by column: for each column k = 1 to n-2, a Householder reflector is constructed from the subvector A(k+1:n, k) to zero the entries below the subdiagonal in that column, and this reflector is applied from both the left and right to maintain similarity. The overall computational cost is \frac{10}{3} n^3 floating-point operations for an n \times n matrix, though the work per column is O(n^2), making it practical for large n. The process is backward stable, meaning the computed \tilde{H} satisfies \tilde{Q} \tilde{H} \tilde{Q}^* = A + \delta A with \|\delta A\| = O(\epsilon_{\text{mach}} \|A\|), where \epsilon_{\text{mach}} is machine epsilon.

For symmetric matrices, the Hessenberg reduction simplifies to a tridiagonal form, where A = Q T Q^T and T is tridiagonal (symmetric with zeros outside the main diagonal and the adjacent sub- and superdiagonals). This is because the symmetry of A implies the resulting Hessenberg matrix is also symmetric, forcing the superdiagonal structure to match the subdiagonal. The same Householder-based algorithm applies, but the right multiplications preserve symmetry, and the cost drops to \frac{4}{3} n^3 flops in the symmetric case.

These forms serve as preprocessing steps in eigenvalue algorithms. In the QR algorithm for computing the Schur decomposition, the initial reduction to Hessenberg or tridiagonal form ensures that subsequent iterations preserve the structure and avoid unnecessary fill-in, significantly improving efficiency. In Krylov subspace methods, such as the Arnoldi iteration, the process generates an upper Hessenberg projection of the matrix onto the subspace, which approximates eigenvalues and eigenvectors for large sparse systems.

Example
Consider the 4 \times 4 matrix A = \begin{pmatrix} 1 & 15 & -6 & 0 \\ 1 & 7 & 3 & 12 \\ 2 & -7 & -3 & 0 \\ 2 & -28 & 15 & 3 \end{pmatrix}. The reduction begins with the first column: the subvector from row 2 to 4 is [1, 2, 2]^T, with norm \sqrt{9} = 3. A Householder reflector is constructed to map this subvector to [\pm 3, 0, 0]^T (in practice the sign opposite to the leading entry is chosen for numerical stability), zeroing a_{31} and a_{41}. This reflector is applied to rows and columns 2 through 4 of A. The next step targets the second column (zeroing a_{42}); no further steps are needed for a 4 \times 4 matrix, since the third column has no entries below its first subdiagonal. The resulting H is upper Hessenberg, similar to A, with nonzeros only on and above the main diagonal and the subdiagonal.
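The example reduction can be reproduced with SciPy's Hessenberg routine, as in the hedged sketch below; the sign of the subdiagonal entry produced in the first step depends on the reflector convention, so only its magnitude (3) is checked.

```python
import numpy as np
from scipy.linalg import hessenberg

A = np.array([[ 1.,  15.,  -6.,  0.],
              [ 1.,   7.,   3., 12.],
              [ 2.,  -7.,  -3.,  0.],
              [ 2., -28.,  15.,  3.]])

H, Q = hessenberg(A, calc_q=True)        # similarity reduction A = Q H Q^T
print(np.allclose(Q @ H @ Q.T, A))       # True
print(np.allclose(np.tril(H, -2), 0))    # True: zeros below the first subdiagonal
print(abs(H[1, 0]))                      # 3.0: the norm of the eliminated subvector [1, 2, 2]^T
```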