Singular value decomposition

Singular value decomposition (SVD) is a fundamental matrix factorization in linear algebra that expresses any real or complex matrix A \in \mathbb{R}^{m \times n} (or \mathbb{C}^{m \times n}) as A = U \Sigma V^T, where U \in \mathbb{R}^{m \times m} and V \in \mathbb{R}^{n \times n} are orthogonal matrices (or unitary for complex cases), and \Sigma \in \mathbb{R}^{m \times n} is a rectangular diagonal matrix with non-negative real singular values \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_p \geq 0 on its diagonal, where p = \min(m, n). The singular values are the square roots of the eigenvalues of A^T A (or A^* A for complex matrices), and the columns of V are the corresponding eigenvectors, while the columns of U are derived from A times the columns of V normalized by the singular values. This decomposition extends the eigenvalue decomposition to arbitrary matrices, not just square symmetric ones, and reveals key structural properties such as the rank of A (the number of nonzero singular values) and the condition number (the ratio of the largest to smallest nonzero singular value).

The origins of SVD trace back to the 19th century, with independent discoveries by Eugenio Beltrami in 1873 and Camille Jordan in 1874, who developed it in the context of simplifying bilinear forms. James Joseph Sylvester contributed related canonical forms in 1889, and Erhard Schmidt introduced the term "singular values" in 1907 while studying integral equations. Hermann Weyl advanced the theoretical framework in 1912 and 1949 through work on eigenvalues of Hermitian matrices. The modern computational significance emerged in the mid-20th century, particularly with Carl Eckart and Gale Young's 1936 approximation theorem, which uses SVD for optimal low-rank approximation, and Gene Golub's development of efficient algorithms in the late 1960s, including the Golub-Reinsch method based on bidiagonalization and QR iterations.

SVD plays a central role in numerical linear algebra and data analysis due to its stability and versatility. It enables the computation of the Moore-Penrose pseudoinverse A^+ = V \Sigma^+ U^T, where \Sigma^+ inverts the nonzero singular values, which is essential for solving overdetermined least-squares problems and handling ill-conditioned systems. In statistics, SVD underpins principal component analysis (PCA) by decomposing the covariance matrix or centered data matrix to identify directions of maximum variance for dimensionality reduction. Practical applications include image compression, where retaining only the largest singular values approximates the original image with far fewer parameters while preserving essential features (e.g., reducing storage by keeping the top k components for k \ll \min(m,n)); noise removal in signals by truncating small singular values; and recommendation systems, such as those used by Netflix, where SVD factorizes user-item rating matrices to predict preferences based on latent factors like genres. Additionally, SVD facilitates handwritten digit classification by projecting digit images onto low-dimensional subspaces derived from training data, achieving high accuracy (e.g., ~92% on MNIST with limited samples), and supports geometric interpretations in teaching linear algebra, such as visualizing matrix transformations as rotations, scalings, and rotations.

Fundamentals

Definition

In linear algebra, the singular value decomposition (SVD) is a factorization that expresses any m \times n real or complex matrix A as the product of three matrices: A = U \Sigma V^*, where U is an m \times m unitary matrix, \Sigma is an m \times n rectangular diagonal matrix with non-negative real entries on the diagonal known as singular values, and V is an n \times n unitary matrix, with V^* denoting its conjugate transpose. This decomposition generalizes the eigenvalue decomposition to rectangular matrices and exists for every A, regardless of its dimensions or rank. A unitary matrix U satisfies U^* U = I_m, where I_m is the m \times m identity matrix, meaning its columns form an orthonormal basis for \mathbb{C}^m (or \mathbb{R}^m for real matrices); similarly for V with respect to \mathbb{C}^n. The matrix \Sigma has the form \Sigma = \begin{pmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{\min(m,n)} \end{pmatrix}, where the singular values satisfy \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0 and \sigma_{r+1} = \cdots = \sigma_{\min(m,n)} = 0, with r = \operatorname{rank}(A) being the number of positive singular values. The columns of U are the left singular vectors, and the columns of V are the right singular vectors. Key initial properties follow directly from the factorization: the product A A^* = U \Sigma \Sigma^* U^*, where \Sigma \Sigma^* is diagonal with entries \sigma_i^2 (padded by zeros when m > n), showing that the eigenvalues of A A^* are the squares of the singular values; analogously, A^* A = V \Sigma^* \Sigma V^*. These relations highlight how the SVD diagonalizes the related Gram matrices A A^* and A^* A.
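
As a quick numerical illustration of this definition, the following minimal NumPy sketch (illustrative only, using an arbitrary random test matrix) forms the full SVD and checks the stated properties:
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))

# Full SVD: U is 5 x 5, Vh = V^* is 3 x 3, s holds the singular values in decreasing order.
U, s, Vh = np.linalg.svd(A, full_matrices=True)

# Rebuild the 5 x 3 rectangular diagonal factor Sigma.
Sigma = np.zeros((5, 3), dtype=complex)
np.fill_diagonal(Sigma, s)

print(np.allclose(A, U @ Sigma @ Vh))            # A = U Sigma V^*
print(np.allclose(U.conj().T @ U, np.eye(5)))    # U is unitary
print(np.allclose(Vh @ Vh.conj().T, np.eye(3)))  # V is unitary
print(np.all(np.diff(s) <= 0))                   # sigma_1 >= sigma_2 >= ...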

Notation and basic properties

The singular value decomposition (SVD) of an m \times n matrix A is expressed as A = U \Sigma V^H, where U is an m \times m unitary matrix whose columns u_i are the left singular vectors, V is an n \times n unitary matrix whose columns v_i are the right singular vectors, and \Sigma is an m \times n rectangular diagonal matrix with nonnegative real entries \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_p \geq 0 on the diagonal, where p = \min(m, n); these \sigma_i are termed the singular values of A. For real matrices, the Hermitian transpose V^H reduces to the ordinary transpose V^T. The singular values \sigma_i are uniquely determined by A up to their conventional decreasing order. The corresponding left and right singular vectors u_i and v_i are unique up to multiplication by a unit-modulus scalar (or sign flip for real matrices) when the singular values are distinct; however, when singular values are repeated, the individual vectors are not unique, though the subspaces spanned by the associated singular vectors are. From the SVD, several fundamental properties follow directly. The rank of A, denoted \operatorname{rank}(A), equals the number of positive singular values, i.e., \operatorname{rank}(A) = r where \sigma_r > 0 = \sigma_{r+1}. For a square n \times n matrix A, invertibility holds if and only if all singular values are positive, ensuring \Sigma is nonsingular and thus A is invertible. Moreover, the SVD provides orthonormal bases for the four fundamental subspaces of A: the column space \operatorname{col}(A) is spanned by the first r left singular vectors \{u_1, \dots, u_r\}, the row space \operatorname{row}(A) by the first r right singular vectors \{v_1, \dots, v_r\}, the null space \operatorname{null}(A) by the remaining right singular vectors \{v_{r+1}, \dots, v_n\}, and the left null space \operatorname{null}(A^H) by the remaining left singular vectors \{u_{r+1}, \dots, u_m\}. The condition number of a square invertible A, defined as \kappa(A) = \sigma_1 / \sigma_n where \sigma_1 is the largest and \sigma_n the smallest singular value, quantifies the relative sensitivity of the matrix (and of linear systems involving it) to perturbations measured in norms such as the spectral or Frobenius norm; specifically, \kappa(A) \geq 1, with \kappa(A) = 1 if and only if A is a scalar multiple of a unitary matrix.
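
The rank, subspace dimensions, and condition number described above can all be read off the singular values; the snippet below is a hedged NumPy sketch using an arbitrary rank-deficient example and a simple illustrative threshold for deciding which singular values count as nonzero:
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.],   # dependent rows make A rank deficient
              [0., 1., 1.],
              [1., 0., 1.]])
U, s, Vh = np.linalg.svd(A)

tol = 1e-10 * s[0]                      # illustrative threshold for "numerically zero"
r = int(np.sum(s > tol))
print("singular values:", s)
print("rank(A) =", r)                   # number of singular values above the threshold
print("dim null(A)   =", A.shape[1] - r)
print("dim null(A^T) =", A.shape[0] - r)

# Condition number of a square nonsingular matrix: sigma_max / sigma_min.
B = np.array([[3., 0.], [4., 5.]])
sb = np.linalg.svd(B, compute_uv=False)
print("kappa(B) =", sb[0] / sb[-1], "vs numpy:", np.linalg.cond(B))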

Illustrative example

To illustrate the singular value decomposition (SVD), consider the 2×2 matrix A = \begin{pmatrix} 3 & 0 \\ 4 & 5 \end{pmatrix}. The SVD process begins by computing A^T A, whose eigenvalues yield the squares of the singular values. Here, the eigenvalues of A^T A are 45 and 5, so the singular values are \sigma_1 = \sqrt{45} = 3\sqrt{5} \approx 6.708 and \sigma_2 = \sqrt{5} \approx 2.236. The normalized eigenvectors of A^T A form the columns of the right singular matrix V: V = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}. The columns of the left singular matrix U are then found using u_i = A v_i / \sigma_i for i=1,2: U = \frac{1}{\sqrt{10}} \begin{pmatrix} 1 & -3 \\ 3 & 1 \end{pmatrix}, with the diagonal matrix of singular values \Sigma = \begin{pmatrix} 3\sqrt{5} & 0 \\ 0 & \sqrt{5} \end{pmatrix}. Thus, the SVD is A = U \Sigma V^T. To verify, multiplying U \Sigma V^T reconstructs the original matrix: U \Sigma V^T = \begin{pmatrix} 3 & 0 \\ 4 & 5 \end{pmatrix}. Additionally, U and V satisfy the orthogonality properties U^T U = I and V^T V = I, confirming they are orthogonal (unitary) matrices. This decomposition also expresses A as a sum of rank-1 matrices: A = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T. When singular values are equal and positive, the SVD is non-unique: the corresponding singular vectors are only determined up to an orthogonal transformation within the associated subspace. For instance, different eigenvector computations for A^T A (or A A^T) with a repeated eigenvalue may yield different orthonormal bases for the same singular value, leading to equivalent but distinct U and V that still satisfy the decomposition.
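
The worked example can be verified mechanically; this short NumPy check (a sketch, not part of the original derivation) reconstructs A from the factors given above and compares the singular values with a library computation:
import numpy as np

A = np.array([[3., 0.], [4., 5.]])
U = np.array([[1., -3.], [3., 1.]]) / np.sqrt(10)
S = np.diag([3 * np.sqrt(5), np.sqrt(5)])
V = np.array([[1., -1.], [1., 1.]]) / np.sqrt(2)

print(np.allclose(U @ S @ V.T, A))          # reconstruction of A
print(np.allclose(U.T @ U, np.eye(2)))      # orthogonality of U
print(np.allclose(V.T @ V, np.eye(2)))      # orthogonality of V
print(np.linalg.svd(A, compute_uv=False))   # approximately [6.708, 2.236]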

Interpretations

Geometric interpretation

The singular value decomposition of a matrix A geometrically interprets the associated linear transformation as a sequence of an orthogonal transformation V^T on the input space, followed by a diagonal scaling \Sigma, and then an orthogonal transformation U on the output space, expressed as A = U \Sigma V^T. This decomposition reveals how A acts by first aligning the input with principal directions via V^T, which rotates or reflects the input space to match the axes of \Sigma, then stretching or compressing along those axes by factors given by the diagonal entries of \Sigma, and finally reorienting the result in the output space through U. The columns of U and V furnish orthonormal bases aligned with these principal directions in their respective spaces. A key visual aspect arises from the action on the unit ball: the transformation A maps the unit sphere (or unit circle in two dimensions) in the input space to an ellipsoid (or ellipse) in the output space, where the singular values \sigma_i (the diagonal entries of \Sigma, ordered \sigma_1 \geq \sigma_2 \geq \cdots \geq 0) determine the lengths of the semiaxes of this ellipsoid. For example, in the plane, the unit circle is deformed into an ellipse whose longest and shortest axes correspond to the largest and smallest nonzero singular values, with directions pointing along the left singular vectors (columns of U). This reflects the varying "gain" of the transformation in different directions, with larger \sigma_i indicating greater stretching and smaller ones indicating compression toward zero. The orthogonal components U and V can include reflections when their determinants are -1, introducing an orientation reversal in the rotation-like action, such as flipping the input or output space across a hyperplane. Despite this, the overall decomposition preserves angles and distances up to the scalings imposed by \Sigma, maintaining the Euclidean structure through the isometry of the orthogonal transformations. This reflective possibility arises from sign ambiguities in the singular vectors but does not alter the fundamental scaling geometry.
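
A small numerical experiment makes the unit-circle picture concrete; the sketch below (assuming NumPy, with the same 2×2 example used earlier in this article) samples the unit circle and confirms that the longest and shortest image vectors have lengths close to sigma_1 and sigma_2:
import numpy as np

A = np.array([[3., 0.], [4., 5.]])
theta = np.linspace(0.0, 2.0 * np.pi, 2000)
circle = np.vstack([np.cos(theta), np.sin(theta)])   # points on the unit circle
images = A @ circle                                  # points on the image ellipse

lengths = np.linalg.norm(images, axis=0)
s = np.linalg.svd(A, compute_uv=False)
print(lengths.max(), "~", s[0])   # longest semiaxis is approximately sigma_1
print(lengths.min(), "~", s[1])   # shortest semiaxis is approximately sigma_2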

Relation to fundamental subspaces

The singular value decomposition (SVD) of an m \times n matrix A provides a direct connection to its four fundamental subspaces: the column space, row space, null space, and left null space. Specifically, if A = U \Sigma V^H where U is an m \times m unitary matrix, \Sigma is an m \times n rectangular diagonal matrix with nonnegative singular values \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0 = \sigma_{r+1} = \cdots = \sigma_{\min(m,n)} ordered decreasingly, and V is an n \times n unitary matrix, then the column space of A, denoted \operatorname{Col}(A), is the span of the first r columns of U, i.e., \operatorname{Col}(A) = \operatorname{span}\{u_1, \dots, u_r\}, where u_i are the left singular vectors corresponding to the nonzero singular values. This subspace has dimension r, matching the rank of A. The row space of A, denoted \operatorname{Row}(A), is equivalently the column space of A^H and is given by the span of the first r columns of V, i.e., \operatorname{Row}(A) = \operatorname{span}\{v_1, \dots, v_r\}, where v_i are the right singular vectors associated with the nonzero singular values. Meanwhile, the null space of A, or kernel \operatorname{Nul}(A), consists of all vectors x such that Ax = 0, and it is spanned by the remaining columns of V, specifically \operatorname{Nul}(A) = \operatorname{span}\{v_{r+1}, \dots, v_n\}, with dimension n - r. The left null space of A, denoted \operatorname{Nul}(A^H), comprises vectors y such that A^H y = 0 (or y^H A = 0), and it is spanned by the columns of U from r+1 to m, i.e., \operatorname{Nul}(A^H) = \operatorname{span}\{u_{r+1}, \dots, u_m\}, with dimension m - r. A distinctive feature of the SVD is that it furnishes explicit orthonormal bases for all four fundamental subspaces, with the dimensions of the range and null subspaces directly determined by the number and positions of the nonzero singular values. This orthonormal structure arises from the unitarity of U and V, ensuring the basis vectors are orthogonal and normalized.
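
The correspondence between singular vectors and the four fundamental subspaces can be checked directly; the following NumPy sketch (with an arbitrary rank-2 example and an illustrative zero threshold) extracts orthonormal bases for each subspace from the SVD factors:
import numpy as np

A = np.array([[1., 2., 0., 1.],
              [2., 4., 0., 2.],   # multiple of the first row
              [0., 0., 1., 1.]])
U, s, Vh = np.linalg.svd(A)       # U is 3 x 3, Vh is 4 x 4
tol = 1e-10 * s[0]
r = int(np.sum(s > tol))

col_space  = U[:, :r]             # orthonormal basis of Col(A)
row_space  = Vh[:r, :].T          # orthonormal basis of Row(A)
null_space = Vh[r:, :].T          # orthonormal basis of Nul(A)
left_null  = U[:, r:]             # orthonormal basis of Nul(A^T)

print(r, null_space.shape[1], left_null.shape[1])   # r, n - r, m - r
print(np.allclose(A @ null_space, 0))               # A x = 0 for x in Nul(A)
print(np.allclose(A.T @ left_null, 0))              # A^T y = 0 for y in Nul(A^T)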

Orthonormal bases and singular vectors

In the singular value decomposition of a matrix A \in \mathbb{C}^{m \times n}, the left singular vectors u_i (for i = 1, \dots, m) are defined as the eigenvectors of the Hermitian matrix AA^*, with corresponding eigenvalues \sigma_i^2, where \sigma_i \geq 0 are the singular values of A. Similarly, the right singular vectors v_i (for i = 1, \dots, n) are the eigenvectors of the Hermitian matrix A^*A, also with eigenvalues \sigma_i^2. These eigenvectors are typically ordered such that \sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_r > 0 = \sigma_{r+1} = \dots = \sigma_{\min(m,n)}, where r is the rank of A. The matrices U = [u_1 \ \dots \ u_m] \in \mathbb{C}^{m \times m} and V = [v_1 \ \dots \ v_n] \in \mathbb{C}^{n \times n} are unitary, meaning their columns form orthonormal bases for \mathbb{C}^m and \mathbb{C}^n, respectively. This orthonormality follows from the spectral properties of the Hermitian matrices AA^* and A^*A, which admit orthonormal eigenvector decompositions. Consequently, the sets \{u_1, \dots, u_m\} and \{v_1, \dots, v_n\} satisfy \langle u_i, u_j \rangle = \delta_{ij} and \langle v_i, v_j \rangle = \delta_{ij} for all i, j, where \delta_{ij} is the Kronecker delta. A key pair of coupling relations connects the singular vectors through the original matrix: for each i \leq r, A v_i = \sigma_i u_i and A^* u_i = \sigma_i v_i. These equations highlight how the action of A (and its adjoint A^*) maps each unit-length right singular vector to a multiple of the corresponding unit-length left singular vector, and vice versa, with the scaling given by \sigma_i. For indices i > r where \sigma_i = 0, the relations simplify to A v_i = 0 and A^* u_i = 0, implying that the corresponding singular vectors v_i (for i = r+1, \dots, n) span the orthogonal complement of the row space of A, while the u_i (for i = r+1, \dots, m) span the orthogonal complement of the column space of A. This structure ensures the singular vectors provide a complete, orthonormal framework for decomposing A.
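
The coupling relations A v_i = sigma_i u_i and A^* u_i = sigma_i v_i are easy to confirm numerically; the sketch below (NumPy, random real test matrix) checks them for each index and verifies that the extra columns of U annihilate A^T:
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
U, s, Vh = np.linalg.svd(A)
V = Vh.T

for i in range(len(s)):
    print(np.allclose(A @ V[:, i], s[i] * U[:, i]),     # A v_i = sigma_i u_i
          np.allclose(A.T @ U[:, i], s[i] * V[:, i]))   # A^T u_i = sigma_i v_i

# Columns of U beyond min(m, n) lie in the left null space: A^T u_i = 0.
print(np.allclose(A.T @ U[:, len(s):], 0))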

Connections to Spectral Theory

Relation to eigenvalue decomposition

The singular value decomposition (SVD) of a matrix A generalizes the eigenvalue decomposition by establishing a connection through the associated Hermitian matrices A^* A and A A^*, where A^* denotes the conjugate transpose. Specifically, the singular values \sigma_i of A are the nonnegative square roots of the eigenvalues of A^* A (or equivalently, of A A^*), and the right singular vectors (columns of V in the SVD A = U \Sigma V^*) are the corresponding eigenvectors of A^* A, while the left singular vectors (columns of U) are the eigenvectors of A A^*. This relation holds for any real or complex matrix A of size m \times n, allowing SVD to apply to rectangular matrices where eigenvalue decomposition is undefined. In contrast to eigenvalue decomposition, which applies only to square matrices and requires them to be diagonalizable (i.e., A = Q \Lambda Q^{-1} for some invertible Q and diagonal \Lambda), SVD exists and is unique (up to signs and ordering) for every matrix, regardless of shape or symmetry. For square matrices, the two decompositions coincide under specific conditions: if A is normal (satisfying A A^* = A^* A), then the SVD takes the form A = U \Sigma U^*, where the left and right singular vectors are the same (up to signs or phases for eigenvalues that are negative or complex), and the singular values are the absolute values of the eigenvalues. In this case, the eigenvalue decomposition aligns closely with the SVD, but with eigenvalues potentially complex or negative, whereas singular values are always real and nonnegative. SVD effectively provides an "eigen-decomposition" for rectangular matrices by facilitating generalizations like the polar decomposition and the Moore-Penrose pseudoinverse. In the polar form, any square A can be written as A = Q P, where Q is unitary (or orthogonal for real matrices) and P is positive semidefinite, with P derived from the SVD as V \Sigma V^* and Q = U V^*; this mirrors how eigenvalue decomposition diagonalizes symmetric matrices but extends to non-square cases. Similarly, the Moore-Penrose pseudoinverse A^+, which generalizes matrix inversion, is computed directly from the SVD as A^+ = V \Sigma^+ U^*, where \Sigma^+ inverts the nonzero singular values, offering a least-squares solution framework absent in standard eigenvalue decomposition.
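
As a hedged illustration of these generalizations (a NumPy sketch with a random square matrix), the polar factors and the normal-matrix relation can be built directly from the SVD factors:
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
U, s, Vh = np.linalg.svd(A)

Q = U @ Vh                       # orthogonal polar factor
P = Vh.T @ np.diag(s) @ Vh       # symmetric positive semidefinite polar factor

print(np.allclose(A, Q @ P))                     # A = Q P
print(np.allclose(Q.T @ Q, np.eye(3)))           # Q is orthogonal
print(np.all(np.linalg.eigvalsh(P) >= -1e-12))   # P is positive semidefinite

# For a normal (here symmetric) matrix, singular values are |eigenvalues|.
S = A + A.T
print(np.allclose(np.sort(np.linalg.svd(S, compute_uv=False)),
                  np.sort(np.abs(np.linalg.eigvalsh(S)))))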

Singular values and spectral decomposition

The singular value decomposition (SVD) of an m \times n complex matrix A admits a spectral form that expresses A as a sum of rank-one matrices weighted by its singular values. Specifically, A = \sum_{i=1}^{\min(m,n)} \sigma_i \mathbf{u}_i \mathbf{v}_i^*, where \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_{\min(m,n)} \geq 0 are the singular values, \mathbf{u}_i are the columns of the unitary matrix U \in \mathbb{C}^{m \times m}, and \mathbf{v}_i are the columns of the unitary matrix V \in \mathbb{C}^{n \times n}. If the rank r of A is less than \min(m,n), the sum can be restricted to the first r terms with \sigma_{r+1} = \cdots = \sigma_{\min(m,n)} = 0, while the full sum includes these zero contributions for completeness. This decomposition generalizes the spectral decomposition to arbitrary matrices, providing a canonical way to represent A through orthogonal transformations and diagonal scaling. The singular values \sigma_i can be interpreted as generalized eigenvalues, specifically the square roots of the eigenvalues of A^*A or AA^*, which capture the principal directions of variance or importance in the matrix's action. Larger \sigma_i indicate greater stretching or compression along the corresponding singular vector directions, quantifying the matrix's effect on the unit sphere in a manner analogous to eigenvalues but applicable to non-square matrices. For instance, in contexts like principal component analysis, the singular values order the components by their explained variance, with the ratio of cumulative squared singular values assessing reconstruction quality. Key properties of the singular values include their non-increasing order, which ensures a natural criterion for truncation, and their role in norms such as the nuclear norm \|A\|_* = \sum_{i=1}^{\min(m,n)} \sigma_i, equivalent to the trace of the diagonal factor \Sigma in the compact SVD form A = U \Sigma V^*. This nuclear norm serves as a convex surrogate for rank minimization, promoting sparsity in the singular value spectrum. The SVD achieves a unique canonical form under unitary transformations, where the diagonal entries are non-negative and ordered, distinguishing it as the optimal form for such factorizations up to phase ambiguities in the singular vectors. For Hermitian matrices, this reduces to the standard spectral decomposition expressed via eigenvalues (up to the signs of any negative eigenvalues).
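
Both the rank-one expansion and the nuclear norm identity can be checked in a few lines; the following is an illustrative NumPy sketch on a random matrix:
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))
U, s, Vh = np.linalg.svd(A, full_matrices=False)

# A as a sum of rank-one matrices sigma_i u_i v_i^*.
A_sum = sum(s[i] * np.outer(U[:, i], Vh[i, :]) for i in range(len(s)))
print(np.allclose(A, A_sum))

# Nuclear norm = sum of singular values = trace of Sigma in the compact form.
print(np.isclose(s.sum(), np.linalg.norm(A, ord='nuc')))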

Proofs of Existence

Spectral theorem approach

The singular value decomposition (SVD) of a matrix A \in \mathbb{C}^{m \times n} can be established through the spectral theorem for Hermitian matrices, which guarantees the unitary diagonalizability of such operators. This approach leverages the fact that the matrices A^*A and AA^* are Hermitian and positive semidefinite, allowing their eigendecompositions to yield the singular vectors and values of A. Consider the matrix B = A^*A \in \mathbb{C}^{n \times n}. This matrix is Hermitian because (A^*A)^* = A^* (A^*)^* = A^*A. Moreover, B is positive semidefinite since for any x \in \mathbb{C}^n, x^* B x = x^* A^* A x = \|A x\|^2 \geq 0. By the spectral theorem for Hermitian matrices, there exists a unitary matrix V \in \mathbb{C}^{n \times n} and a diagonal matrix \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n) with \lambda_i \geq 0 for all i such that B = V \Lambda V^*. The columns of V, denoted v_1, \dots, v_n, form an orthonormal basis for \mathbb{C}^n consisting of eigenvectors of B, and the \lambda_i are the corresponding eigenvalues ordered non-increasingly. The singular values of A are defined as \sigma_i = \sqrt{\lambda_i} for i = 1, \dots, n, yielding non-negative real numbers ordered \sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_n \geq 0. The columns of V serve as the right singular vectors of A, satisfying A^* A v_i = \lambda_i v_i = \sigma_i^2 v_i. To obtain the left singular vectors, define u_i = A v_i / \sigma_i for each i where \sigma_i > 0. These u_i are orthonormal because \langle u_i, u_j \rangle = (1/\sigma_i \sigma_j) \langle A v_i, A v_j \rangle = (1/\sigma_i \sigma_j) \langle v_i, A^* A v_j \rangle = (1/\sigma_i \sigma_j) \sigma_j^2 \langle v_i, v_j \rangle = \delta_{ij}. For indices where \sigma_i = 0, extend the basis orthonormally in the null space of A^*. Similarly, the eigendecomposition of C = AA^* \in \mathbb{C}^{m \times m} yields C = U \Lambda' U^*, where U \in \mathbb{C}^{m \times m} is unitary, \Lambda' is diagonal with entries \lambda_i' (matching the \lambda_i from B for the first \min(m,n) and zeros otherwise), and the columns of U provide the left singular vectors. This establishes A = U \Sigma V^*, where \Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_{\min(m,n)}, 0, \dots, 0) is the rectangular diagonal matrix of singular values. This construction naturally handles rectangular matrices, as A^*A is always n \times n and AA^* is m \times m, regardless of whether m > n or m < n; the zero singular values account for the difference in dimensions, leading to the reduced or compact SVD forms when restricting to nonzero singular values. The spectral theorem ensures completeness, as the eigenvectors span the full space, providing orthonormal bases for the domain and codomain. Regarding uniqueness, the singular values \sigma_i > 0 are uniquely determined, while singular vectors corresponding to distinct singular values are unique up to phase; for multiplicities (repeated \sigma_i), the corresponding eigenspaces admit any orthonormal basis therein, but the decomposition remains valid with appropriate choice.
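
The construction in this proof translates directly into code; the sketch below (NumPy, with the illustrative helper name svd_from_spectral and a simple tolerance for discarding zero singular values) builds a compact SVD from the eigendecomposition of A^* A and compares it against a library SVD:
import numpy as np

def svd_from_spectral(A, tol=1e-12):
    # Compact SVD assembled from the spectral decomposition of A^* A (illustrative sketch).
    B = A.conj().T @ A                      # Hermitian positive semidefinite
    lam, V = np.linalg.eigh(B)              # ascending eigenvalues, orthonormal eigenvectors
    order = np.argsort(lam)[::-1]           # reorder non-increasingly
    lam, V = lam[order], V[:, order]
    sigma = np.sqrt(np.clip(lam, 0.0, None))
    keep = sigma > tol * (sigma[0] if sigma.size else 1.0)
    V_r, sigma_r = V[:, keep], sigma[keep]
    U_r = (A @ V_r) / sigma_r               # u_i = A v_i / sigma_i
    return U_r, sigma_r, V_r

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3))
U_r, s_r, V_r = svd_from_spectral(A)
print(np.allclose(A, U_r @ np.diag(s_r) @ V_r.conj().T))
print(np.allclose(s_r, np.linalg.svd(A, compute_uv=False)))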

Variational characterization

The singular values of a matrix A \in \mathbb{R}^{m \times n} admit a variational characterization that provides an optimization-based perspective on their existence and properties. The largest singular value is defined as \sigma_1(A) = \max_{\|x\|_2 = 1} \|A x\|_2, where the maximum is attained at the leading right singular vector v_1, which serves as the corresponding argmax. Subsequent singular values are obtained by restricting the maximization to orthogonal complements of the preceding singular vectors: for k = 2, \dots, \min(m,n), \sigma_k(A) = \max_{\|x\|_2 = 1, \, x \perp \operatorname{span}\{v_1, \dots, v_{k-1}\}} \|A x\|_2, with v_k as the argmax in this subspace. This process yields a nonincreasing sequence \sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_{\min(m,n)} \geq 0, and the right singular vectors \{v_k\} form an orthonormal basis for \mathbb{R}^n. A symmetric characterization holds for the left singular vectors via maximization of \|A^T y\|_2 over unit vectors y in orthogonal subspaces of \mathbb{R}^m. To establish this characterization, consider the dyadic form of the SVD, A = \sum_{j=1}^r \sigma_j u_j v_j^T, where r = \operatorname{rank}(A). For unit vectors u \perp \operatorname{span}\{u_1, \dots, u_{k-1}\} and v \perp \operatorname{span}\{v_1, \dots, v_{k-1}\}, the inner product satisfies |u^T A v| \leq \sigma_k \|u\|_2 \|v\|_2 by the Cauchy–Schwarz inequality and the definition of the singular values, with equality when u = u_k and v = v_k. This implies \|A v\|_2 \leq \sigma_k in the restricted subspace, confirming the maximizers and the decreasing order without invoking the full spectral theorem. The variational characterization connects directly to the eigenvalues of the symmetric matrix A^T A \in \mathbb{R}^{n \times n}, whose eigenvalues are precisely \sigma_i^2(A). Specifically, the Rayleigh quotient characterization for the eigenvalues of A^T A yields \sigma_k^2(A) = \max_{\|x\|_2 = 1, \, x \perp \operatorname{span}\{v_1, \dots, v_{k-1}\}} \frac{x^T A^T A x}{x^T x} = \max_{\|x\|_2 = 1, \, x \perp \operatorname{span}\{v_1, \dots, v_{k-1}\}} \|A x\|_2^2, aligning with the singular value definition. This equivalence, rooted in the Courant-Fischer min-max theorem applied to A^T A, demonstrates the singular values as the square roots of the ordered eigenvalues of A^T A. This optimization framework offers algorithmic insight into iterative methods for computing the SVD, such as the power method applied to A^T A, by successively maximizing the Rayleigh quotient in deflated subspaces without requiring prior knowledge of the full spectrum.
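
The variational viewpoint suggests a simple iterative scheme; the sketch below (NumPy, with the illustrative helper name top_singular_triplets, deliberately naive) runs power iteration on A^T A with deflation to estimate the leading singular values, as described above:
import numpy as np

def top_singular_triplets(A, k, iters=500, seed=0):
    # Power iteration on A^T A with deflation; an illustrative sketch, not a production solver.
    rng = np.random.default_rng(seed)
    B = A.astype(float)
    triplets = []
    for _ in range(k):
        v = rng.standard_normal(B.shape[1])
        v /= np.linalg.norm(v)
        for _ in range(iters):
            v = B.T @ (B @ v)               # apply A^T A (of the deflated matrix)
            v /= np.linalg.norm(v)
        sigma = np.linalg.norm(B @ v)
        u = B @ v / sigma
        triplets.append((sigma, u, v))
        B = B - sigma * np.outer(u, v)      # deflate the component just found
    return triplets

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 4))
est = [t[0] for t in top_singular_triplets(A, 3)]
print("estimated:", np.round(est, 6))
print("exact:    ", np.round(np.linalg.svd(A, compute_uv=False)[:3], 6))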

Computation

Jacobi algorithms

Jacobi algorithms for computing the singular value decomposition (SVD) of a matrix A \in \mathbb{R}^{m \times n} are iterative methods that rely on successive applications of plane rotations to progressively diagonalize A. These methods extend the classical Jacobi eigenvalue algorithm for symmetric eigenvalue problems, originally developed by Carl Gustav Jacob Jacobi in 1846, to the nonsymmetric case of the SVD. Unlike direct methods such as QR-based bidiagonalization, Jacobi algorithms emphasize orthogonal transformations that target off-diagonal elements, making them particularly suitable for parallel implementation and high relative accuracy in singular values. The two-sided Jacobi algorithm, first proposed by Kogbetliantz in 1955, applies rotations simultaneously from both the left and right of A to zero out off-diagonal entries, akin to an RQ decomposition process that symmetrizes and diagonalizes 2×2 subblocks iteratively. In each step, a pair of indices (p, q) is selected, typically the one with the largest off-diagonal magnitude or in cyclic order, and two Givens rotations—one acting on rows and one on columns—are computed to eliminate the (p,q) and (q,p) entries in the current matrix. These rotations accumulate into the left singular vector matrix U and right singular vector matrix V, while the diagonal entries converge to the singular values \Sigma. The process continues over multiple sweeps until the Frobenius norm of the off-diagonal part falls below a tolerance. This approach handles multiple singular values effectively by reducing the off-diagonal sum of squares quadratically in each sweep. The one-sided Jacobi algorithm, introduced by Hestenes in 1958, simplifies the process by applying rotations only from the right to A, implicitly diagonalizing A^T A without forming it explicitly, which yields the right singular vectors V and a matrix with mutually orthogonal columns, from which \Sigma and U are derived. For selected column indices (i,j), a rotation J_{i,j} is determined by solving a 2×2 eigenvalue problem on the corresponding submatrix of A^T A, zeroing the off-diagonal entry (A^T A)_{i,j} while updating the columns of A and accumulating into V. Once the columns of the updated A are sufficiently orthogonal, the singular values are the Euclidean norms of these columns, and U is obtained by normalizing them. This method converges quadratically and is advantageous for rectangular matrices, as it avoids left-side rotations. Both variants follow a common iterative framework: initialize U = I_m and V = I_n (or just V for the one-sided variant); repeat sweeps selecting off-diagonal pairs (e.g., by maximum magnitude for faster convergence or cyclically for simplicity); apply Givens rotations to the targeted pair, updating the matrix and accumulators; terminate when the maximum off-diagonal magnitude falls below a small tolerance \epsilon, typically on the order of 10^{-12} times the matrix norm. A pseudocode outline for the one-sided variant, representative of the rotation-based structure, is as follows:
Input: A (m × n matrix, m ≥ n)
V ← eye(n)  // n × n identity
tolerance ← small value (e.g., machine epsilon * ||A||_F)
while max off-diagonal of A^T A > tolerance:
    for each pair (i, j) with i < j (cyclic or by largest |a_i^T a_j|):
        Compute rotation parameters c, s from 2×2 subproblem on [||a_i||^2, a_i^T a_j; a_j^T a_i, ||a_j||^2]
        Apply right rotation to columns i,j of A: A[:, [i,j]] ← A[:, [i,j]] * [c, -s; s, c]
        Update V: V[:, [i,j]] ← V[:, [i,j]] * [c, -s; s, c]
    // End sweep
Σ ← diag([||A[:,k]||_2 for k=1 to n])
U ← A * inv(Σ)  // Normalize columns
Output: U Σ V^T
For the case where A is square and symmetric, both algorithms converge to a decomposition that coincides with the spectral decomposition up to the signs of any negative eigenvalues, since the singular values of a symmetric matrix are the absolute values of its eigenvalues. The 2×2 rotations used in each step relate to the analytic 2×2 SVD, whose exact rotation parameters are available in closed form, but in the iterative setting approximate solutions of the 2×2 subproblems suffice for larger matrices. Modern implementations, such as the Jacobi SVD routines in LAPACK, enhance these methods with block processing for better performance on parallel architectures while preserving quadratic convergence, typically requiring 5–10 sweeps for n \leq 1000.
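
The pseudocode above can be turned into a small runnable routine; the following NumPy sketch of the one-sided (Hestenes) variant (illustrative helper name one_sided_jacobi_svd, written for clarity rather than performance, and assuming full column rank when normalizing) follows the same sweep-and-rotate structure:
import numpy as np

def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=30):
    # One-sided (Hestenes) Jacobi SVD sketch; assumes m >= n and full column rank.
    A = A.astype(float)
    m, n = A.shape
    V = np.eye(n)
    for _ in range(max_sweeps):
        converged = True
        for i in range(n - 1):
            for j in range(i + 1, n):
                alpha = A[:, i] @ A[:, i]
                beta = A[:, j] @ A[:, j]
                gamma = A[:, i] @ A[:, j]
                if abs(gamma) <= tol * np.sqrt(alpha * beta):
                    continue                         # columns i, j already orthogonal enough
                converged = False
                zeta = (beta - alpha) / (2.0 * gamma)
                sign = 1.0 if zeta >= 0 else -1.0
                t = sign / (abs(zeta) + np.sqrt(1.0 + zeta * zeta))
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = c * t
                R = np.array([[c, s], [-s, c]])      # Givens rotation for this pair
                A[:, [i, j]] = A[:, [i, j]] @ R      # rotate columns i, j of A
                V[:, [i, j]] = V[:, [i, j]] @ R      # accumulate the same rotation into V
        if converged:
            break
    sigma = np.linalg.norm(A, axis=0)
    order = np.argsort(sigma)[::-1]                  # enforce decreasing order
    sigma, A, V = sigma[order], A[:, order], V[:, order]
    U = A / sigma                                    # normalize columns to get U
    return U, sigma, V

M = np.random.default_rng(6).standard_normal((5, 3))
U, s, V = one_sided_jacobi_svd(M)
print(np.allclose(M, U @ np.diag(s) @ V.T))
print(np.allclose(s, np.linalg.svd(M, compute_uv=False)))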

Numerical methods

One prominent class of numerical methods for computing the singular value decomposition (SVD) of a matrix involves QR-based techniques, which proceed in two main phases. First, the matrix is reduced to upper bidiagonal form through a series of Householder reflections applied alternately from the left and right, a process known as Golub–Kahan bidiagonalization. This reduction preserves the singular values and is numerically stable, requiring O(m n^2) operations for an m × n matrix with m ≥ n. In the second phase, the singular values of the bidiagonal matrix are computed using a variant of the QR algorithm, often with shifts to accelerate convergence, followed by accumulation of the applied transformations to obtain the singular vectors if needed. For larger matrices, particularly those that are sparse or structured, divide-and-conquer algorithms offer improved efficiency over traditional iterative methods. These algorithms recursively split the bidiagonal matrix into smaller subproblems, compute the SVDs of the submatrices independently, and then merge the results using rank-revealing decompositions to handle interactions between singular values near crossings. Introduced by Gu and Eisenstat, this approach achieves O(n^2) complexity for the bidiagonal SVD phase and is particularly effective for parallel implementation on distributed systems, making it suitable for matrices with dimensions exceeding 10,000. SVD algorithms are designed to ensure backward stability, meaning the computed decomposition corresponds to the exact SVD of a slightly perturbed input matrix, with the perturbation bounded by machine epsilon times the matrix norm. The forward error in singular values is then controlled by the condition number κ = σ_max / σ_min, where small σ_min amplifies relative errors in ill-conditioned problems. This stability holds for both QR-based and divide-and-conquer methods under standard floating-point arithmetic, though near-zero singular values can pose challenges due to rounding errors that may distort the gap between them and the next larger value. Standard implementations of these methods are provided in the LAPACK library, with routines like DGESVD (QR-based) and DGESDD (divide-and-conquer) serving as de facto benchmarks for dense matrices up to moderate sizes. For near-zero singular values, LAPACK employs deflation techniques to isolate and refine them accurately, mitigating floating-point underflow issues. In the 2020s, advancements in parallel computing have extended these routines to machine learning-scale matrices, with libraries like PRIMME_SVDS enabling preconditioned, distributed SVD computations on clusters for problems involving billions of entries, achieving speedups of 10-50x over serial versions on GPU-accelerated nodes.
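
In SciPy these LAPACK drivers can be selected explicitly; the snippet below is a small usage sketch (assuming SciPy is installed) comparing the QR-based and divide-and-conquer routines on a random tall matrix:
import numpy as np
from scipy.linalg import svd

A = np.random.default_rng(7).standard_normal((2000, 300))

# QR-based bidiagonalization SVD (LAPACK xGESVD).
U1, s1, Vh1 = svd(A, lapack_driver='gesvd')

# Divide-and-conquer SVD (LAPACK xGESDD), typically faster for large dense matrices.
U2, s2, Vh2 = svd(A, lapack_driver='gesdd')

print(np.allclose(s1, s2))   # both drivers agree on the singular values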

Analytic 2×2 case

For a general 2×2 matrix A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, the singular values \sigma_1 \geq \sigma_2 \geq 0 are the square roots of the eigenvalues of A^* A, where A^* denotes the conjugate transpose. These eigenvalues \lambda_{1,2} solve the characteristic equation \lambda^2 - \operatorname{tr}(A^* A) \lambda + \det(A^* A) = 0, yielding the closed-form expression \lambda_{1,2} = \frac{\|A\|_F^2 \pm \sqrt{ \|A\|_F^4 - 4 \det(A^* A) }}{2}, with the Frobenius norm \|A\|_F^2 = |a|^2 + |b|^2 + |c|^2 + |d|^2 = \operatorname{tr}(A^* A) and \det(A^* A) = |\det(A)|^2 = |ad - bc|^2. Thus, \sigma_{1,2} = \sqrt{\lambda_{1,2}}. The unitary matrices U and V in the SVD A = U \Sigma V^*, with \Sigma = \operatorname{diag}(\sigma_1, \sigma_2), consist of the left and right singular vectors, respectively. For real matrices, U and V are orthogonal matrices parameterized by angles \theta_U and \theta_V, computed as \theta = \frac{1}{2} \operatorname{atan2}(2 \cdot \text{off-diagonal}, \text{diagonal difference}) applied to the symmetric matrices AA^T and A^T A, ensuring numerical stability via the two-argument arctangent function. For complex entries, additional phase factors e^{i \phi} are incorporated into the columns of U and V to render the singular values real and non-negative, providing insight into how complex phases distribute between the factors without affecting the magnitudes. Special cases simplify the expressions. If A is diagonal, then U = V = I_2 (up to signs/phases and a possible permutation to enforce the ordering) and \sigma_1 = \max(|a|, |d|), \sigma_2 = \min(|a|, |d|). For symmetric A (real or Hermitian), the SVD coincides with the eigenvalue decomposition if A is positive semidefinite, with U = V (up to phases in the complex case). Permutation matrices, such as A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, have singular values \sigma_1 = \sigma_2 = 1; because the singular values coincide, the factors are highly non-unique, with valid choices including U = A and V = I_2, or pairs in which one factor is a rotation by \pi/4 and the other the corresponding reflection. These closed-form solutions enable exact computation without iteration, offering efficiency in resource-limited embedded systems for tasks like signal processing, as demonstrated in FPGA implementations post-2010.
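
The closed-form expressions translate into a tiny routine; the sketch below (real 2×2 case only, illustrative helper name singular_values_2x2) evaluates the formulas above and compares against a library SVD on the example matrix used earlier:
import numpy as np

def singular_values_2x2(a, b, c, d):
    # Closed-form singular values of [[a, b], [c, d]] for real entries (illustrative sketch).
    fro2 = a * a + b * b + c * c + d * d      # ||A||_F^2 = trace(A^T A)
    det2 = (a * d - b * c) ** 2               # det(A^T A) = det(A)^2
    disc = np.sqrt(max(fro2 * fro2 - 4.0 * det2, 0.0))
    lam1 = (fro2 + disc) / 2.0
    lam2 = (fro2 - disc) / 2.0
    return np.sqrt(lam1), np.sqrt(max(lam2, 0.0))

print(singular_values_2x2(3.0, 0.0, 4.0, 5.0))                         # (3*sqrt(5), sqrt(5))
print(np.linalg.svd(np.array([[3., 0.], [4., 5.]]), compute_uv=False))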

Reduced Forms

Thin SVD

The thin singular value decomposition (SVD), also known as the economy SVD, of an m \times n matrix A (with m \geq n) is A = U \Sigma V^*, where U is an m \times n matrix whose columns are the left singular vectors, \Sigma is an n \times n diagonal matrix containing the n singular values \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n \geq 0 along its main diagonal (including zeros if \operatorname{rank}(A) < n), and V is an n \times n unitary matrix whose columns are the right singular vectors. This form retains all \min(m,n) singular values while using a rectangular U to minimize storage, with total elements m n + n + n^2, which is less than the full SVD's m^2 + m n + n^2 when m > n. The columns of U are orthonormal (U^* U = I_n) and span the leading n dimensions of the left singular subspace, while the columns of V satisfy V^* V = I_n. If \operatorname{rank}(A) = r < n, the last n - r singular values are zero, and the corresponding columns of V span the null space of A, while the corresponding columns of U contribute to a basis of the left null space. This preserves the structure of the range and null spaces up to the matrix dimensions, providing an efficient factorization that exactly reconstructs A without the full square U. For the case m < n, the roles are reversed, with V rectangular of size n \times m. Unlike the full SVD, the thin SVD omits the extraneous parts of the unitary matrices beyond \min(m,n), tailored for computational efficiency. This form is advantageous for rectangular matrices, reducing computation and storage of null space components beyond the smaller dimension, enhancing efficiency in large-scale problems or memory-limited settings. For example, when m \gg n, it avoids computing the full m \times m U, saving space proportional to m(m - n). It serves as a basis for further reductions like the compact SVD by truncating to the rank if known.
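
In NumPy the thin form corresponds to full_matrices=False; the sketch below (arbitrary 1000 × 50 random matrix) shows the storage difference and confirms exact reconstruction:
import numpy as np

A = np.random.default_rng(8).standard_normal((1000, 50))

# Full SVD: U is 1000 x 1000, with many columns that are unnecessary when m >> n.
U_full, s, Vh = np.linalg.svd(A, full_matrices=True)

# Thin (economy) SVD: U is 1000 x 50, Sigma is 50 x 50, V is 50 x 50.
U_thin, s_thin, Vh_thin = np.linalg.svd(A, full_matrices=False)

print(U_full.shape, U_thin.shape)                           # (1000, 1000) (1000, 50)
print(np.allclose(A, U_thin @ np.diag(s_thin) @ Vh_thin))   # exact reconstruction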

Compact SVD

The compact singular value decomposition (SVD) of an m \times n matrix A with rank r = \operatorname{rank}(A) \leq \min(m,n) is given by A = U_r \Sigma_r V_r^*, where U_r is an m \times r matrix whose columns are the left singular vectors corresponding to the r nonzero singular values, \Sigma_r is an r \times r diagonal matrix containing these positive singular values \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0 along its diagonal, and V_r is an n \times r matrix whose columns are the corresponding right singular vectors. This form focuses exclusively on the nonzero singular triplets, minimizing storage to m r + r + n r entries, significantly less when r \ll \min(m,n). The columns of U_r form an orthonormal basis for the column space of A (satisfying U_r^* U_r = I_r), and the columns of V_r form an orthonormal basis for the row space of A (V_r^* V_r = I_r). This minimal form exactly reconstructs A while discarding all null space contributions associated with zero singular values, providing the most reduced exact decomposition. Compared to the thin SVD, the compact SVD further reduces dimensions when the matrix is rank-deficient (r < \min(m,n)), excluding the zero singular values and their vectors; they coincide when r = \min(m,n) (full rank). It balances between the full SVD's completeness and the thin SVD's dimensional efficiency, useful for low-rank matrices where highlighting the effective rank is key, and serves as the starting point for truncated approximations.
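
For a rank-deficient matrix the compact form is obtained by keeping only the triplets above a zero threshold; the following NumPy sketch (illustrative threshold, random rank-3 example) still reconstructs the matrix exactly:
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 6))   # 8 x 6 matrix of rank 3

U, s, Vh = np.linalg.svd(A, full_matrices=False)
tol = 1e-10 * s[0]                                              # illustrative zero threshold
r = int(np.sum(s > tol))

U_r, s_r, Vh_r = U[:, :r], s[:r], Vh[:r, :]                     # compact SVD factors
print(r, U_r.shape, Vh_r.shape)                                 # 3 (8, 3) (3, 6)
print(np.allclose(A, U_r @ np.diag(s_r) @ Vh_r))                # exact despite discarding zeros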

Truncated SVD

The truncated singular value decomposition (TSVD) of an m \times n matrix A with rank r is a rank-k approximation A_k = \sum_{i=1}^k \sigma_i u_i v_i^*, where k \leq r, the \sigma_i (with \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0) are the singular values of A, and u_i, v_i are the corresponding left and right singular vectors. This form arises by retaining only the k largest singular triplets from the compact SVD A = U_r \Sigma_r V_r^*, effectively setting the remaining singular values to zero. Among all possible rank-k matrices, A_k provides the optimal approximation in both the spectral norm (operator 2-norm) and the Frobenius norm. The Eckart–Young theorem quantifies the approximation quality, stating that the minimal error in the 2-norm is \|A - A_k\|_2 = \sigma_{k+1}, while the squared Frobenius norm error is \|A - A_k\|_F^2 = \sum_{i=k+1}^r \sigma_i^2. These bounds highlight the direct connection between the discarded singular values and the incurred error, making TSVD a principled method for low-rank approximation and compression. A key property of TSVD is the monotonic decrease in approximation error as k increases: since the singular values are ordered non-increasingly, adding more terms reduces the residual sum \sum_{i=k+1}^r \sigma_i^2 by incorporating progressively smaller but positive contributions. This ensures that higher truncation levels yield strictly better approximations in the Frobenius norm, with the 2-norm error dropping to the next singular value threshold. TSVD is also well suited to denoising, as smaller singular values often capture noise rather than signal, allowing the approximation to filter perturbations while retaining dominant structure.
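
The Eckart–Young error formulas can be verified directly; the sketch below (NumPy, random matrix, k chosen arbitrarily) compares the measured errors of the truncated SVD with the predicted values:
import numpy as np

rng = np.random.default_rng(10)
A = rng.standard_normal((30, 20))
U, s, Vh = np.linalg.svd(A, full_matrices=False)

k = 5
A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]        # rank-k truncated SVD

print(np.isclose(np.linalg.norm(A - A_k, 2), s[k]))                          # 2-norm error = sigma_{k+1}
print(np.isclose(np.linalg.norm(A - A_k, 'fro') ** 2, np.sum(s[k:] ** 2)))   # Frobenius^2 = tail sum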

Applications

Pseudoinverse and least squares

The Moore-Penrose pseudoinverse of a matrix A \in \mathbb{C}^{m \times n}, denoted A^+, is uniquely defined as the matrix satisfying the four Penrose equations: A A^+ A = A, A^+ A A^+ = A^+, (A A^+)^H = A A^+, and (A^+ A)^H = A^+ A, where ^H denotes the conjugate transpose. These properties ensure that A^+ provides the unique minimal-norm solution to consistent linear systems and the minimal-norm least-squares solution to inconsistent ones. Given the singular value decomposition A = U \Sigma V^H, where U and V are unitary and \Sigma is diagonal with nonnegative singular values \sigma_i in decreasing order, the pseudoinverse is computed as A^+ = V \Sigma^+ U^H. Here, \Sigma^+ is the n \times m diagonal matrix obtained by transposing \Sigma and reciprocating the nonzero singular values (1/\sigma_i for \sigma_i > 0), with zeros for the remaining entries. This formulation leverages the SVD to obtain A^+ directly in a numerically stable manner, and the resulting matrix automatically satisfies the defining Penrose properties. In the context of least squares problems, consider the overdetermined system A \mathbf{x} = \mathbf{b} where A has full column rank. The solution \mathbf{x} = A^+ \mathbf{b} minimizes the Euclidean residual \|A \mathbf{x} - \mathbf{b}\|_2. For the underdetermined homogeneous case A \mathbf{x} = \mathbf{0}, the solutions span the null space of A, and A^+ \mathbf{0} = \mathbf{0} yields the unique minimal-norm solution among them. The SVD also facilitates total least squares (TLS), which addresses errors in both A and \mathbf{b} by minimizing perpendicular distances in the augmented system [A \mid \mathbf{b}]. The TLS solution is derived from the right singular vector corresponding to the smallest singular value of the augmented matrix, providing a robust alternative to ordinary least squares for noisy data. For ill-conditioned matrices, where small singular values \sigma_{\min} amplify errors, the pseudoinverse can be regularized by thresholding: set 1/\sigma_i = 0 for \sigma_i < \tau (a tolerance based on machine precision and matrix conditioning), yielding a rank-deficient approximation that stabilizes computations.
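
A thresholded pseudoinverse of this kind is a few lines of NumPy; the sketch below (illustrative helper name pinv_svd and an illustrative relative tolerance) applies it to an overdetermined least-squares problem and cross-checks against library routines:
import numpy as np

def pinv_svd(A, rtol=1e-12):
    # Moore-Penrose pseudoinverse via SVD, zeroing singular values below a relative threshold (sketch).
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    cutoff = rtol * (s[0] if s.size else 0.0)
    s_inv = np.zeros_like(s)
    np.divide(1.0, s, out=s_inv, where=s > cutoff)        # reciprocate only significant sigma_i
    return Vh.conj().T @ np.diag(s_inv) @ U.conj().T

rng = np.random.default_rng(11)
A = rng.standard_normal((50, 3))     # overdetermined system with full column rank
b = rng.standard_normal(50)

x = pinv_svd(A) @ b                  # least-squares solution x = A^+ b
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))
print(np.allclose(pinv_svd(A), np.linalg.pinv(A)))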

Low-rank approximation and compression

One of the primary applications of singular value decomposition (SVD) is in low-rank matrix approximation, where a matrix A \in \mathbb{R}^{m \times n} is approximated by a lower-rank matrix A_k of rank at most k, with k \ll \min(m, n). The truncated SVD achieves this by retaining only the k largest singular values and their corresponding left and right singular vectors, yielding A_k = U_k \Sigma_k V_k^T. This approximation is optimal in the sense that it minimizes the approximation error \|A - B\|_F over all matrices B of rank at most k, where \|\cdot\|_F denotes the Frobenius norm; the same holds for the spectral norm \|\cdot\|_2. This result is established by the Eckart–Young theorem, which proves that no other rank-k matrix can provide a smaller error in these norms. The singular value spectrum of A, given by the diagonal entries of \Sigma, quantifies the compressibility of the matrix and the potential loss in such approximations. The error \|A - A_k\|_F equals the square root of the sum of the squares of the discarded singular values, \sqrt{\sigma_{k+1}^2 + \cdots + \sigma_{\min(m,n)}^2}, providing a direct measure of the information discarded. This spectrum-based assessment has informed modern compression techniques, including hybrids that combine SVD with other transforms, such as wavelet difference reduction applied post-SVD for improved ratios in the 2010s. In data compression, the truncated SVD enables efficient storage by representing A via the factorized components U_k, \Sigma_k, and V_k^T instead of the full matrix. For an m \times n matrix, full storage requires mn entries, whereas the truncated form needs approximately k(m + n + 1) entries, yielding substantial savings when k is small relative to \min(m, n). This approach is particularly effective for sparse or low-intrinsic-rank data, as the rapid decay of singular values allows high-fidelity reconstruction with minimal storage. A classic example is grayscale image compression, where an image is represented as an m \times n pixel intensity matrix A. Computing the SVD of A and retaining the top k components reconstructs a compressed approximation A_k that preserves essential visual features while reducing file size. For instance, in a 512 × 512 image, using k = 20 can achieve over 90% compression while often preserving visual quality for natural images, as the dominant singular values capture the primary structural patterns like edges and contrasts.
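
The storage arithmetic is easy to demonstrate on a synthetic matrix; the sketch below (NumPy, a smooth low-rank surrogate plus noise standing in for a real photograph) reports the storage ratio k(m + n + 1)/(mn) and the relative reconstruction error for k = 20:
import numpy as np

# Synthetic "image": a smooth, approximately low-rank intensity matrix plus mild noise.
x = np.linspace(0.0, 1.0, 512)
A = np.outer(np.sin(4 * np.pi * x), np.cos(3 * np.pi * x)) + 0.1 * np.outer(x, x ** 2)
A = A + 0.01 * np.random.default_rng(12).standard_normal(A.shape)

U, s, Vh = np.linalg.svd(A, full_matrices=False)
k = 20
A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]

full_storage = A.size                                    # m * n entries
compressed_storage = k * (A.shape[0] + A.shape[1] + 1)   # k(m + n + 1) entries
rel_err = np.linalg.norm(A - A_k, 'fro') / np.linalg.norm(A, 'fro')
print(compressed_storage / full_storage, rel_err)        # storage ratio and relative error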

Principal component analysis

Singular value decomposition (SVD) provides an efficient and numerically stable method for computing principal component analysis (PCA), a technique for dimensionality reduction that identifies directions of maximum variance in a dataset. For a centered data matrix X \in \mathbb{R}^{m \times n}, where m represents the number of samples and n the number of features, the SVD is given by X = U \Sigma V^*, with U \in \mathbb{R}^{m \times m} and V \in \mathbb{R}^{n \times n} orthogonal matrices, and \Sigma a rectangular diagonal matrix containing the singular values \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_{\min(m,n)} \geq 0. The columns of V correspond to the principal components, which are the directions of maximum variance, while the singular values quantify the scale of variance along these directions. The sample variances associated with each principal component are \frac{\sigma_i^2}{m-1} for i = 1, \dots, \min(m,n), derived from the eigenvalues of the sample covariance matrix \frac{1}{m-1} X^T X. Centering the data by subtracting the mean from each feature column is essential prior to SVD, as it ensures the principal components capture variance about the mean; without centering, the first singular vector may align with the mean direction rather than a direction of maximum variance, distorting the results. To obtain a reduced representation, the data is projected onto the top k right singular vectors: Y = X V_k, where V_k consists of the first k columns of V, yielding a lower-dimensional matrix Y \in \mathbb{R}^{m \times k} that preserves the most significant variance. The singular values provide a direct measure of explained variance, with the proportion explained by the i-th component given by \frac{\sigma_i^2}{\sum_j \sigma_j^2}, allowing assessment of how much variance each principal component retains. A scree plot, which graphs the singular values (or their squares) against component index, aids in selecting k by identifying an "elbow" where additional components contribute diminishing variance. Computing PCA via the SVD is particularly advantageous for high-dimensional data, avoiding explicit formation of the covariance matrix, which can be ill-conditioned or infeasible when n \gg m. In modern machine learning, SVD's role in PCA extends to nonlinear methods like kernel PCA, where the eigenvalue problem in the feature space—analogous to SVD in the input space—enables capturing nonlinear structures via kernel functions. Similarly, in the 2020s, integrations with autoencoders treat linear autoencoders as essentially equivalent to PCA via SVD, while deep variants build on this for nonlinear dimensionality reduction, often outperforming PCA in complex tasks like image classification but at higher computational cost.
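
The PCA recipe above is compact in code; the following hedged NumPy sketch (synthetic data with one dominant latent direction) centers the data, computes the SVD, and reports explained-variance ratios and the reduced representation:
import numpy as np

rng = np.random.default_rng(13)
# 500 samples in 3 features, with most variance along one latent direction.
latent = rng.standard_normal((500, 1))
X = latent @ np.array([[2.0, 1.0, 0.5]]) + 0.1 * rng.standard_normal((500, 3))

Xc = X - X.mean(axis=0)                     # center each feature column
U, s, Vh = np.linalg.svd(Xc, full_matrices=False)

explained_var = s ** 2 / (X.shape[0] - 1)   # sample variances of the principal components
ratio = s ** 2 / np.sum(s ** 2)             # proportion of variance explained by each component
print(np.round(ratio, 3))                   # the first component dominates

k = 1
Y = Xc @ Vh[:k, :].T                        # scores: projection onto the top k components
print(Y.shape)                              # (500, 1)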

Additional applications

The Kabsch algorithm employs SVD to determine the optimal rotation that minimizes the root-mean-square deviation (RMSD) between two sets of atomic coordinates, facilitating the superposition of molecular structures in structural biology and computational chemistry. Given two centered point sets represented as matrices P and Q, the algorithm computes the cross-covariance matrix H = P Q^T, performs the SVD H = U \Sigma V^T, and constructs the rotation as R = V U^T, ensuring the alignment preserves chirality by checking the sign of the determinant of V U^T and flipping the last singular direction if necessary. This method has become standard for structure comparison, enabling precise RMSD calculations essential for validating simulations and conformational analysis. In signal processing, SVD enables noise filtering by decomposing time-series data into singular modes and truncating those with small singular values corresponding to noise, thereby reconstructing a denoised signal while preserving dominant components. For instance, a matrix of experimental signal measurements undergoes SVD, allowing selective retention of low-rank approximations that capture underlying patterns, with the truncation threshold often determined by estimating the noise level from the singular value spectrum. This approach outperforms traditional filters in non-stationary environments, as demonstrated in reweighted SVD applications for fault detection in machinery signals. The multilinear singular value decomposition (MLSVD), or higher-order SVD, extends matrix SVD to tensors, providing a low multilinear rank approximation for higher-dimensional data by factorizing a tensor into orthogonal mode matrices and a core tensor. This decomposition is particularly useful for separable models in multidimensional arrays, such as video or genomic data, where it reveals latent structures by unfolding the tensor along each mode and applying SVD sequentially. In practice, MLSVD minimizes the Frobenius norm error for multilinear approximations, enabling efficient storage and computation by reducing tensor dimensions while retaining essential variability. The orthogonal Procrustes problem seeks the nearest orthogonal matrix Q to a given matrix O in the Frobenius norm, solved via the SVD O = U \Sigma V^* yielding Q = U V^*, which projects O onto the orthogonal group without altering its singular vectors. This formulation arises in shape analysis and sensor calibration, where aligning datasets requires orthogonal transformations, and the solution guarantees minimality due to the Eckart-Young theorem's extension to unitary constraints. In recommender systems, SVD facilitates latent factor models by performing low-rank matrix factorization on user-item interaction matrices, extracting hidden features that predict preferences through truncated approximations. The factorization R \approx U \Sigma V^T identifies user and item embeddings in a shared latent space, with regularization on singular values preventing overfitting in sparse datasets, as evidenced by its role in the Netflix Prize, where it improved recommendation accuracy by capturing latent preference patterns. Singular value decomposition supports quantum state tomography by enabling low-rank approximations of density matrices reconstructed from partial measurements, aiding error mitigation in noisy quantum devices through truncation of insignificant singular values. Recent advances in 2023 integrate SVD-based reconstruction with fermionic reduced density matrices, improving scalability to multi-particle settings and filtering experimental noise for applications like error-corrected simulations. This approach reduces measurement overhead, achieving chemical accuracy in reconstructing two-particle correlations while suppressing decoherence effects in variational quantum eigensolvers.
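
As one concrete example from this list, the rotation-fitting step shared by the Kabsch algorithm and the orthogonal Procrustes problem fits in a few lines; the sketch below (NumPy, illustrative helper name kabsch_rotation, synthetic 3-D point clouds) recovers a known rotation:
import numpy as np

def kabsch_rotation(P, Q):
    # Optimal rotation aligning point set P onto Q (rows are 3-D points); illustrative sketch.
    P0 = P - P.mean(axis=0)                  # remove translation by centering both clouds
    Q0 = Q - Q.mean(axis=0)
    H = P0.T @ Q0                            # cross-covariance matrix
    U, _, Vh = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vh.T @ U.T))   # determinant check preserves chirality
    D = np.diag([1.0, 1.0, d])
    return Vh.T @ D @ U.T                    # proper rotation with det = +1

rng = np.random.default_rng(14)
P = rng.standard_normal((10, 3))
theta = 0.7
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
Q = P @ R_true.T                             # rotate every point by R_true

R = kabsch_rotation(P, Q)
print(np.allclose(R, R_true))
print(np.allclose((P - P.mean(axis=0)) @ R.T + Q.mean(axis=0), Q))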

Norms and Variations

Ky Fan and Hilbert–Schmidt norms

The Ky Fan k-norms form a family of unitarily invariant norms on matrices, defined directly from the singular values obtained via the singular value decomposition. For an m \times n matrix A with singular values \sigma_1(A) \geq \sigma_2(A) \geq \cdots \geq \sigma_{\min(m,n)}(A) \geq 0, the Ky Fan k-norm is \|A\|_{(k)} = \sum_{i=1}^k \sigma_i(A), where k ranges from 1 to \min(m,n). These norms satisfy the properties of a matrix norm, including submultiplicativity and the triangle inequality, and are symmetric in the sense that they depend only on the ordered singular values. When k=1, the Ky Fan 1-norm reduces to the spectral norm (or operator 2-norm), \|A\|_2 = \sigma_1(A), which measures the maximum stretch factor induced by A. The Ky Fan k-norms are unitarily invariant: for unitary matrices U \in \mathbb{C}^{m \times m} and V \in \mathbb{C}^{n \times n}, \|U A V\|_{(k)} = \|A\|_{(k)}, reflecting the fact that singular values remain unchanged under unitary transformations. They represent a specific case within the broader class of unitarily invariant norms, akin to a truncated variant of the Schatten p-norms at p = \infty; while the Schatten \infty-norm is simply the largest singular value, the Ky Fan k-norm extends this by summing the k largest, providing a measure of the combined effect of the dominant singular directions. A key property linking Ky Fan norms to SVD approximations is their role in low-rank optimization. The Eckart–Young–Mirsky theorem, generalized to unitarily invariant norms, states that the minimum Ky Fan k-norm distance from A to any rank-at-most-k matrix B is achieved precisely by the rank-k truncation of the SVD of A: \min_{\operatorname{rank}(B) \leq k} \|A - B\|_{(k)} = \sum_{j=1}^{\min(k, p-k)} \sigma_{k+j}(A), where p = \min(m,n). This optimality underscores the utility of SVD in bounding approximation errors via tail singular values. The Hilbert–Schmidt norm, equivalent to the Frobenius norm for finite matrices, is another unitarily invariant norm expressible through the SVD singular values. It is defined as \|A\|_{HS} = \sqrt{\sum_{i=1}^{\min(m,n)} \sigma_i(A)^2} = \sqrt{\operatorname{trace}(A^* A)}, where A^* denotes the conjugate transpose. This norm arises as the Schatten p-norm for p=2, capturing the Euclidean length of the singular value vector in the \ell_2 sense, and it equals the \ell_2 norm of the entries of A. Like the Ky Fan norms, it is unitarily invariant: \|U A V\|_{HS} = \|A\|_{HS}. The Hilbert–Schmidt norm quantifies the total "energy" of A across all singular directions, making it particularly useful for assessing overall magnitude in finite-dimensional settings.
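
These norms are one-liners once the singular values are known; the sketch below (NumPy, random matrix and random orthogonal factor) checks the k = 1 case against the spectral norm, the Hilbert–Schmidt norm against the Frobenius norm, and the unitary invariance of the singular values:
import numpy as np

rng = np.random.default_rng(15)
A = rng.standard_normal((6, 4))
s = np.linalg.svd(A, compute_uv=False)

ky_fan = lambda k: s[:k].sum()                        # Ky Fan k-norm: sum of k largest singular values
print(np.isclose(ky_fan(1), np.linalg.norm(A, 2)))    # k = 1 recovers the spectral norm
print(np.isclose(np.sqrt(np.sum(s ** 2)),             # Hilbert-Schmidt norm from singular values
                 np.linalg.norm(A, 'fro')))           # equals the Frobenius norm of the entries

# Unitary invariance: Q A has the same singular values as A.
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
print(np.allclose(np.linalg.svd(Q @ A, compute_uv=False), s))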

Generalizations to operators

The singular value decomposition (SVD) extends to bounded linear operators between Hilbert spaces. For a bounded operator T: H \to K where H and K are complex Hilbert spaces, the singular values \sigma_i(T) are defined as the square roots of the eigenvalues of the positive self-adjoint operator T^* T, arranged in decreasing order; these are also known as the s-numbers of T. The SVD in this setting arises from the polar decomposition T = U |T|, where U is a partial isometry with initial space the closure of the range of |T| = \sqrt{T^* T}, and the singular values capture the "stretching" factors along principal directions. For compact operators, a special class of bounded operators that map bounded sets to precompact sets, the SVD takes a more explicit series form. Specifically, T admits a decomposition T = \sum_{i=1}^\infty \sigma_i \, y_i \otimes x_i, where \{x_i\} and \{y_i\} are orthonormal sets in H and K, respectively, such that T z = \sum_{i=1}^\infty \sigma_i \langle z, x_i \rangle y_i for all z \in H, and the singular values satisfy \sigma_i \to 0 as i \to \infty. This representation allows compact operators to be approximated by finite-rank operators, with the error controlled by the tail of the singular values, enabling practical computations in infinite-dimensional settings. Key properties follow from the compactness: the singular values \sigma_i are positive, nonincreasing, and accumulate only at zero, distinguishing compact operators from general bounded ones. Among compact operators, nuclear operators—equivalently trace-class operators in Hilbert spaces—are those for which the singular values are \ell^1-summable, i.e., \sum_i \sigma_i < \infty, providing a nuclear norm \|T\|_1 = \sum_i \sigma_i that measures their "size" in a summability sense. The existence of the SVD for compact operators is implied by the spectral theorem for compact self-adjoint operators, which decomposes T^* T into a series of projections onto eigenspaces with eigenvalues \sigma_i^2, from which the left singular vectors are obtained by applying T to the corresponding eigenvectors and normalizing.

Scale-invariant and other variations

The scale-invariant singular value decomposition addresses the sensitivity of the standard SVD to row and column scaling by incorporating diagonal scaling matrices into the factorization. For a matrix A, this variation takes the form A = D U \Sigma V^T E, where D and E are diagonal matrices that normalize the rows and columns, respectively, ensuring the core decomposition U \Sigma V^T is invariant to multiplicative scaling factors. This adjustment is particularly useful for matrices with inherent scaling differences, such as those arising in data analysis where units or magnitudes vary across dimensions. In the context of positive matrices, scale-invariance can be achieved by setting the scaling factors based on geometric means of the singular values, as in the geometric mean decomposition (GMD), which factorizes A = Q R P^T where the diagonal entries of the upper triangular R are the geometric mean \bar{\sigma} = \left( \prod \sigma_i \right)^{1/r} of the positive singular values, preserving the relative structure while normalizing overall scale. This approach maintains the ratios of the singular values, allowing robust comparison of matrix structures independent of absolute magnitudes, and finds application in settings where scaling variations between datasets must be mitigated. The generalized singular value decomposition (GSVD) extends the SVD to pairs of matrices A and B sharing the same number of columns, decomposing them as A = U \Sigma_X X^{-1} and B = V \Sigma_Y X^{-1}, where X is nonsingular, and the generalized singular values are ratios \gamma_i = \sigma_i(A)/\sigma_i(B). This formulation provides a unified framework for comparing datasets of different dimensionalities, such as genome-scale expression profiles, by highlighting shared and dataset-specific components through the ratios, which remain invariant to common transformations. The GSVD was first formalized for this purpose in seminal work and has been widely adopted for multi-dataset analysis. For multi-way data represented as tensors, the higher-order SVD (HOSVD) generalizes the decomposition by computing a core tensor and orthogonal factor matrices via successive mode-wise SVDs on unfoldings of the tensor. Specifically, for an N-way tensor \mathcal{X}, HOSVD yields \mathcal{X} = \mathcal{S} \times_1 U^{(1)} \times_2 U^{(2)} \cdots \times_N U^{(N)}, where \mathcal{S} is the core tensor and each U^{(n)} contains the leading left singular vectors along mode n. This multilinear approach preserves the multi-dimensional structure and enables low-rank approximations for higher-order arrays, such as in multidimensional data compression. Other variations include the bidiagonal SVD, which exploits the bidiagonal structure of reduced matrices for efficient computation in structured problems like least-squares solutions, though modern alternatives like randomized SVD have largely supplanted it for large-scale settings. Randomized SVD variants, particularly those adapted for streaming data since 2014, use random projections to sketch the matrix and approximate the decomposition in a single pass, enabling processing of massive or continuously arriving data with controlled error bounds. These methods, such as sketching-based algorithms, achieve near-optimal low-rank approximations while supporting parallel and memory-efficient implementations for large-scale applications.
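
As an illustration of the randomized approach mentioned above, the following NumPy sketch (illustrative helper name randomized_svd, in the spirit of the Halko–Martinsson–Tropp range finder, without power iterations) approximates the leading singular triplets of a low-rank matrix; because the sketch size here exceeds the numerical rank, the result is essentially exact:
import numpy as np

def randomized_svd(A, k, oversample=10, seed=0):
    # Basic randomized range-finder SVD (sketch, not production code).
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, k + oversample))   # random test matrix
    Q, _ = np.linalg.qr(A @ Omega)                     # orthonormal basis for the sampled range
    B = Q.T @ A                                        # small (k + oversample) x n matrix
    Ub, s, Vh = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vh[:k, :]

rng = np.random.default_rng(16)
A = rng.standard_normal((2000, 30)) @ rng.standard_normal((30, 800))   # rank-30 matrix
U, s, Vh = randomized_svd(A, k=20)
print(np.allclose(s, np.linalg.svd(A, compute_uv=False)[:20]))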

Historical Development

Origins in the 19th century

The origins of the singular value decomposition trace back to the mid-19th century, when mathematicians began exploring the canonical forms of bilinear and quadratic forms in the context of linear algebra and geometry. In 1873, the Italian mathematician Eugenio Beltrami published a seminal paper on bilinear functions, where he demonstrated how to decompose a general bilinear form into a sum of products of linear forms using orthogonal transformations. This work effectively introduced the concept of singular values for real square matrices by reducing the associated quadratic forms to a diagonal structure, with the diagonal entries representing the squares of what we now recognize as singular values. Beltrami's approach focused on positive definite forms, assuming the matrices involved led to positive eigenvalues, and arose from efforts to simplify expressions tied to the geometry of conics and higher-dimensional quadrics. Beltrami's decomposition was motivated by problems in n-dimensional geometry, including the study of quadric surfaces and their polar relations, which connected to broader studies in elliptic integrals for computing areas and volumes in such spaces. Although not termed singular value decomposition at the time, his method laid the groundwork for understanding how linear transformations could be diagonalized under orthogonal changes of basis, particularly for nonsingular matrices with distinct singular values. This contribution emphasized the stability and geometric interpretation of the decomposition, influencing subsequent work on matrix factorizations. One year later, in 1874, the French mathematician Camille Jordan independently extended these ideas in his memoir on bilinear forms, applying them to general linear transformations between vector spaces. Jordan showed that any such transformation could be represented in a diagonal form via two orthogonal bases, maximizing and minimizing the bilinear form under unit norm constraints on the vectors. His variational technique provided a more complete framework, handling the full spectrum of singular values and incorporating a deflation process to iteratively find them. Like Beltrami, Jordan's focus remained on positive definite cases and was linked to the geometry of conics in higher dimensions, as well as transformations relevant to elliptic integrals. This work solidified the recognition of the diagonal form's universality for real matrices, without naming it as a singular value decomposition. In 1889, James Joseph Sylvester contributed further to the canonical forms of pairs of bilinear forms, providing a more general decomposition that encompassed non-symmetric cases and influenced the development of matrix theory relevant to later formulations of the SVD.

20th-century advancements and naming

In the early 20th century, Erhard Schmidt introduced the term "singular values" in 1907 while studying integral equations, formalizing their role in the decomposition of compact operators. Hermann Weyl advanced the theoretical foundations in 1912 with an existence argument for the SVD based on the spectral theory of Hermitian matrices, and further refined the framework in 1949, establishing inequalities for singular values.

In the 1930s, significant progress was made in extending the singular value decomposition to rectangular matrices and establishing its approximation properties. Carl Eckart and Gale Young provided a characterization of optimal low-rank approximation, demonstrating that the best rank-k approximation to a matrix in the Frobenius or spectral norm is obtained by truncating the singular value decomposition to the k largest singular values, a result now known as the Eckart-Young theorem; a brief numerical illustration of this truncation appears at the end of this subsection. This built on 19th-century foundations by formalizing the error bounds for such approximations.

Early computational approaches to the singular value decomposition emerged in the mid-20th century, facilitated by the post-World War II boom in electronic computing, which made iterative numerical methods practical for the first time. John von Neumann's work in the 1930s on unitarily invariant norms and operator decompositions laid theoretical groundwork for algorithmic development, suggesting connections to eigenvalue problems that could be adapted for computation. By the 1950s, methods based on Jacobi rotations, originally proposed for symmetric eigenvalue problems in the 19th century but revitalized with electronic computers, were extended to the two-sided Jacobi algorithm for the singular value decomposition, enabling iterative orthogonal transformations to diagonalize matrices efficiently. These techniques, including early proposals by Kogbetliantz in 1955, marked the shift from theoretical existence proofs to reliable numerical implementation.

In the late 1960s, Gene Golub developed efficient and stable algorithms for computing the SVD, notably the Golub-Reinsch algorithm, which uses bidiagonalization followed by QR iterations on the bidiagonal form. This method became a cornerstone of numerical linear algebra software. The name "singular value decomposition" was standardized in this period, for example through G. W. Stewart's 1973 textbook Introduction to Matrix Computations, replacing the variable descriptions used earlier, including characterizations tied to the fundamental theorem of linear algebra. This naming reflected growing recognition of the decomposition's utility in numerical analysis amid advancing computational capabilities.

Into the late 20th and early 21st centuries, the singular value decomposition played a pivotal role in handling large, sparse datasets through low-rank matrix factorization. A notable example is the Netflix Prize competition (2006-2009), where teams leveraged SVD-based models to predict user movie ratings from sparse matrices, achieving roughly a 10% improvement in prediction accuracy over the baseline system and demonstrating the scalability of the approach for recommender applications.
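As the brief illustration promised above, the following hedged sketch in NumPy (on a random matrix chosen purely for demonstration) checks the Eckart-Young statement numerically: the rank-k truncation A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T built from a computed SVD attains a Frobenius error of \sqrt{\sum_{i>k} \sigma_i^2}.

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 3
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]           # rank-k truncation of the SVD

frobenius_error = np.linalg.norm(A - A_k, "fro")
predicted_error = np.sqrt(np.sum(s[k:] ** 2))  # sqrt of the discarded sigma_i^2
print(np.isclose(frobenius_error, predicted_error))  # True
```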
