
Singular value

In linear algebra, a singular value of an m \times n matrix A is a non-negative real number \sigma_i = \sqrt{\lambda_i}, where \lambda_i are the eigenvalues of the symmetric positive semi-definite matrix A^T A, ordered in non-increasing sequence \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_{\min(m,n)} \geq 0. These values represent the scaling factors along principal directions in the singular value decomposition (SVD) of A, given by A = U \Sigma V^T, where U and V are orthogonal matrices, and \Sigma is a rectangular diagonal matrix with the singular values on its diagonal. The number of non-zero singular values equals the rank of A, providing a measure of its effective dimensionality. Singular values differ from eigenvalues in that they apply to rectangular matrices and are always non-negative, facilitating analysis of transformations between potentially different-dimensional spaces.

Geometrically, the largest singular value \sigma_1 is the spectral norm (operator 2-norm) of A, or the maximum stretch factor \|Ax\| over unit vectors x, while smaller ones indicate contractions along associated right singular vectors. For complex matrices, singular values are defined analogously using the conjugate transpose A^H instead of A^T. The singular value decomposition, central to singular values, generalizes the eigenvalue decomposition to non-square matrices and underpins numerous applications, including data compression, low-rank approximation, and principal component analysis in statistics and machine learning. Computationally, singular values are obtained via algorithms like the Golub-Reinsch method, which avoids forming A^T A explicitly by bidiagonalizing A and applying a QR-type iteration.

Fundamentals

Definition

In linear algebra, the singular values of an m \times n complex matrix A are defined as the non-negative square roots of the eigenvalues of A^* A, where A^* denotes the conjugate transpose of A, arranged in non-increasing order \sigma_1(A) \geq \sigma_2(A) \geq \cdots \geq \sigma_p(A) \geq 0 with p = \min(m,n). Equivalently, they can be obtained as the square roots of the eigenvalues of A A^*, whose non-zero eigenvalues coincide with those of A^* A, the larger matrix carrying additional zero eigenvalues when m \neq n. These singular values provide a measure of the matrix's "size" in different directions, independent of the choice of basis.

This concept extends to bounded linear operators on Hilbert spaces. For a compact operator T on a separable Hilbert space, the singular values \sigma_k(T) are the square roots of the eigenvalues of T^* T, again ordered decreasingly, where T^* is the adjoint operator. The singular values of a compact operator accumulate only at zero, reflecting the operator's approximability by finite-rank operators.

The largest singular value \sigma_1(A) coincides with the spectral norm (or operator 2-norm) of A, defined as \|A\|_2 = \sup_{\|x\|_2 = 1} \|Ax\|_2. This norm quantifies the maximum stretch induced by A on unit vectors in the 2-norm. For example, consider the 2 \times 2 matrix A = \operatorname{diag}(3, 2). The eigenvalues of A^* A = A^2 = \operatorname{diag}(9, 4) are 9 and 4, so the singular values are \sigma_1(A) = 3 and \sigma_2(A) = 2.
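To make the definition concrete, the following minimal sketch (not part of the original article; it assumes Python with NumPy) computes the singular values of the diagonal example above both from the eigenvalues of A^* A and directly from a library SVD routine:

    import numpy as np

    # Example matrix from the text: A = diag(3, 2).
    A = np.diag([3.0, 2.0])

    # Singular values as square roots of the eigenvalues of A^* A,
    # reordered to be non-increasing.
    eigvals = np.linalg.eigvalsh(A.conj().T @ A)   # eigenvalues of A^* A, ascending
    sigma_from_eig = np.sqrt(eigvals)[::-1]        # take square roots, reverse to descending

    # Singular values computed directly by an SVD routine.
    sigma_direct = np.linalg.svd(A, compute_uv=False)

    print(sigma_from_eig)   # [3. 2.]
    print(sigma_direct)     # [3. 2.]

Both routes yield \sigma_1 = 3 and \sigma_2 = 2, matching the eigenvalue computation described above.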

Geometric Interpretation

The singular values of a matrix A offer a geometric insight into the action of the corresponding linear transformation on the unit ball in the input space. The image of this unit ball under A is an ellipsoid in the output space, and the lengths of the semi-axes of the ellipsoid are precisely the singular values \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_{\min(m,n)} \geq 0. This distortion captures how the transformation stretches or compresses volumes along principal directions, with zero singular values corresponding to directions of complete collapse. These singular values quantify the extent of stretching applied to the unit sphere. The largest singular value \sigma_1 represents the maximum amplification factor, achieved along the direction that maximizes \|A\mathbf{v}\| over unit vectors \mathbf{v}, while the smallest non-zero singular value \sigma_r (where r is the rank of A) gives the minimum stretching over the row space of A, the subspace spanned by the first r right singular vectors. In this way, the singular values delineate the range of scaling behaviors induced by the transformation, from maximal expansion to minimal preservation of length. For a two-dimensional matrix A \in \mathbb{R}^{2 \times 2}, this interpretation simplifies to the mapping of a unit circle into an ellipse, where \sigma_1 scales the major axis and \sigma_2 scales the minor axis, visually demonstrating anisotropic stretching along orthogonal principal axes. For instance, if \sigma_1 = 8.17 and \sigma_2 = 2.31, the unit circle elongates strongly along one direction while stretching far less in the perpendicular one, highlighting the directional dependence of the transformation. The ratio \kappa(A) = \sigma_1 / \sigma_{\min}, termed the condition number, geometrically reflects the overall distortion of the unit ball into the ellipsoid, with large values indicating severe ill-conditioning where small input perturbations lead to disproportionately large output changes.
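As an illustrative sketch of this geometry (assuming Python with NumPy; the 2 \times 2 matrix below is a hypothetical example, not taken from the article), one can sample the unit circle and check that the largest and smallest stretch factors agree with \sigma_1 and \sigma_2:

    import numpy as np

    # Hypothetical 2x2 matrix used only for illustration.
    A = np.array([[3.0, 1.0],
                  [0.0, 2.0]])

    # Sample unit vectors on the circle and measure how much A stretches each one.
    theta = np.linspace(0.0, 2.0 * np.pi, 10000)
    unit_vectors = np.vstack([np.cos(theta), np.sin(theta)])   # shape (2, N)
    stretch = np.linalg.norm(A @ unit_vectors, axis=0)

    sigma = np.linalg.svd(A, compute_uv=False)
    print(stretch.max(), sigma[0])   # maximum stretch ~ sigma_1 (semi-major axis of the ellipse)
    print(stretch.min(), sigma[1])   # minimum stretch ~ sigma_2 (semi-minor axis of the ellipse)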

Singular Value Decomposition

Statement

The singular value decomposition (SVD) of an m \times n real or complex matrix A is a factorization A = U \Sigma V^*, where U is an m \times m unitary matrix, V is an n \times n unitary matrix, and \Sigma is an m \times n rectangular diagonal matrix with non-negative real numbers \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_p \geq 0 on its main diagonal (with p = \min(m, n)) and all other entries zero; these \sigma_i are the singular values of A. The columns of U are the left singular vectors, the columns of V are the right singular vectors, and the singular values appear explicitly as the diagonal entries of \Sigma, ordered in non-increasing fashion to reflect the decreasing "importance" or scaling factors in the decomposition. This form generalizes the eigenvalue decomposition to rectangular matrices, capturing the matrix's action as a composition of unitary transformations and diagonal scaling.

For matrices of rank r < p, a compact (or reduced, or economy) SVD truncates the factorization to A = U_r \Sigma_r V_r^*, where U_r is m \times r with orthonormal columns, \Sigma_r is r \times r diagonal with the r positive singular values, and V_r is n \times r with orthonormal columns; this form discards the null spaces and zero singular values for efficiency. The full SVD supplies complete unitary bases for the domain and codomain, while the reduced form focuses only on the range.

As an illustrative example, consider the 2 \times 2 matrix A = \begin{pmatrix} 3 & 0 \\ 4 & 5 \end{pmatrix}. Its full SVD is A = U \Sigma V^T, where U = \frac{1}{\sqrt{10}} \begin{pmatrix} 1 & -3 \\ 3 & 1 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \sqrt{45} & 0 \\ 0 & \sqrt{5} \end{pmatrix}, \quad V = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}, with singular values \sigma_1 = \sqrt{45} \approx 6.708 and \sigma_2 = \sqrt{5} \approx 2.236, both positive since A has full rank 2; multiplying U \Sigma V^T reconstructs A exactly.
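The worked example can be verified numerically; the sketch below (assuming Python with NumPy, not part of the original text) reconstructs A from its computed factors and checks the orthogonality of U and V. The computed factors may differ from those written above by sign choices, which do not affect the singular values.

    import numpy as np

    # Example from the text: A = [[3, 0], [4, 5]] with sigma_1 = sqrt(45), sigma_2 = sqrt(5).
    A = np.array([[3.0, 0.0],
                  [4.0, 5.0]])

    U, s, Vt = np.linalg.svd(A)       # full SVD: A = U @ diag(s) @ Vt
    print(s)                          # approximately [6.708, 2.236]
    print(np.allclose(U @ np.diag(s) @ Vt, A))   # True: the product reconstructs A exactly
    print(np.allclose(U.T @ U, np.eye(2)),
          np.allclose(Vt @ Vt.T, np.eye(2)))     # True True: U and V are orthogonal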

Existence and Construction

The existence of the singular value decomposition (SVD) for any m \times n matrix A (real or complex) follows from the spectral theorem applied to the Hermitian matrix A^H A, which is positive semi-definite. The eigenvalues \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \geq 0 of A^H A satisfy \lambda_i \geq 0 for all i, with the corresponding orthonormal eigenvectors forming the columns of a unitary matrix V \in \mathbb{C}^{n \times n}. The singular values are then defined as \sigma_i = \sqrt{\lambda_i} for i = 1, \dots, n, ordered non-increasingly.

To construct the left singular vectors, consider the rank r of A, where \sigma_1 \geq \cdots \geq \sigma_r > 0 and \sigma_{r+1} = \cdots = \sigma_n = 0. For i = 1, \dots, r, the columns u_i of U are given by u_i = A v_i / \sigma_i, where v_i are the columns of V; these u_i are orthonormal and span the column space of A. The remaining columns of U \in \mathbb{C}^{m \times m} form an orthonormal basis for the orthogonal complement of the column space (the null space of A^H), ensuring U is unitary. This yields A = U \Sigma V^*, where \Sigma is the m \times n rectangular diagonal matrix with diagonal entries \sigma_i. Zero singular values account for rank deficiency: the effective rank r determines the number of non-zero \sigma_i, with the nullity n - r reflected in the trailing zeros of \Sigma.

The singular values \sigma_i are uniquely determined by A, as they are the square roots of the eigenvalues of A^H A (or A A^H). For distinct \sigma_i, the corresponding singular vectors in U and V are unique up to sign changes for real matrices (up to unimodular scalar factors in the complex case). In cases of repeated singular values, the subspaces spanned by the associated singular vectors are uniquely determined, though individual vectors within those subspaces are not.

The SVD can be viewed as a refinement of the polar decomposition, in which any square matrix A admits A = Q P with Q unitary and P positive semi-definite Hermitian; in the SVD form A = U \Sigma V^*, setting Q = U V^* and P = V \Sigma V^* recovers the polar form, highlighting the SVD's role in explicitly revealing the scaling factors via \Sigma.
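The construction described above can be traced step by step in code. The following sketch (assuming Python with NumPy and a full-rank square matrix for simplicity; rank-deficient cases would additionally require completing the basis of U as discussed) builds V and \Sigma from the eigendecomposition of A^H A, forms U from u_i = A v_i / \sigma_i, and recovers the polar decomposition:

    import numpy as np

    A = np.array([[3.0, 0.0],
                  [4.0, 5.0]])

    # Step 1: eigendecomposition of A^H A gives V and the squared singular values.
    lam, V = np.linalg.eigh(A.conj().T @ A)   # eigenvalues ascending
    order = np.argsort(lam)[::-1]             # reorder to non-increasing
    lam, V = lam[order], V[:, order]
    sigma = np.sqrt(lam)

    # Step 2: left singular vectors u_i = A v_i / sigma_i (valid here because the rank is full).
    U = A @ V / sigma

    # Check the factorization A = U Sigma V^*.
    print(np.allclose(U @ np.diag(sigma) @ V.conj().T, A))   # True

    # Polar decomposition recovered from the SVD: A = Q P with Q unitary, P positive semi-definite.
    Q = U @ V.conj().T
    P = V @ np.diag(sigma) @ V.conj().T
    print(np.allclose(Q @ P, A))                              # True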

Properties

Basic Properties

The singular values of an m \times n matrix A, denoted \sigma_1(A), \sigma_2(A), \dots, \sigma_p(A) where p = \min(m,n), are always non-negative real numbers, satisfying \sigma_i(A) \geq 0 for all i. They are the square roots of the eigenvalues of A^H A (or A A^H), which are non-negative because these Hermitian matrices are positive semidefinite. By convention, the singular values are ordered in non-increasing sequence: \sigma_1(A) \geq \sigma_2(A) \geq \dots \geq \sigma_p(A) \geq 0. The number of singular values is exactly p = \min(m,n), and the number of positive (non-zero) singular values equals the rank of A; the remaining singular values are zero, accounting for any rank deficiency.

The singular values exhibit several invariances. Specifically, \sigma_i(A) = \sigma_i(A^T) = \sigma_i(\overline{A}) for all i, where A^T is the transpose and \overline{A} is the complex conjugate of A; this follows from the fact that the non-zero eigenvalues of A^H A and A A^H coincide, and conjugation preserves the spectrum of the positive semidefinite matrix A^H A. Additionally, the Frobenius norm of A satisfies \|A\|_F = \sqrt{\sum_{i=1}^p \sigma_i(A)^2}, since \|A\|_F^2 equals the trace of \Sigma^T \Sigma in the singular value decomposition A = U \Sigma V^H, leveraging the unitary invariance of the Frobenius norm under multiplication by U and V.

For normal matrices, where A^H A = A A^H, the singular values coincide with the absolute values of the eigenvalues: \sigma_i(A) = |\lambda_i(A)|, ordered appropriately. This equivalence arises because the spectral theorem diagonalizes normal matrices unitarily, aligning their eigenvalues directly with the singular values.
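These properties are easy to check numerically; the sketch below (assuming Python with NumPy; the random test matrix is arbitrary) verifies the invariance under transposition and conjugation, the Frobenius-norm identity, and the normal-matrix case:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

    s = np.linalg.svd(A, compute_uv=False)

    # Invariance under transpose and complex conjugation.
    print(np.allclose(s, np.linalg.svd(A.T, compute_uv=False)))        # True
    print(np.allclose(s, np.linalg.svd(A.conj(), compute_uv=False)))   # True

    # Frobenius norm equals the square root of the sum of squared singular values.
    print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2)))) # True

    # For a normal matrix (here a Hermitian one), singular values are |eigenvalues|.
    H = A[:3, :] + A[:3, :].conj().T
    print(np.allclose(np.linalg.svd(H, compute_uv=False),
                      np.sort(np.abs(np.linalg.eigvalsh(H)))[::-1]))   # True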

Min-max Characterization

The min-max characterization provides a variational principle for the singular values of a matrix A \in \mathbb{C}^{m \times n}, analogous to the Courant-Fischer theorem for eigenvalues of Hermitian matrices. This characterization arises from applying the Courant-Fischer principle to the eigenvalues of the Hermitian matrix A^* A, whose eigenvalues are the squares of the singular values of A. For the ordered singular values \sigma_1(A) \geq \sigma_2(A) \geq \cdots \geq \sigma_r(A) \geq 0, where r = \min\{m, n\}, the k-th singular value admits the following min-max and max-min representations over subspaces of the domain: \sigma_k(A) = \max_{\dim S = k} \min_{\|x\|=1, \, x \in S} \|A x\| = \min_{\dim T = n-k+1} \max_{\|x\|=1, \, x \in T} \|A x\|, where the maxima and minima are taken over all subspaces S \subseteq \mathbb{C}^n and T \subseteq \mathbb{C}^n of the indicated dimensions, and \|\cdot\| denotes the Euclidean norm. This theorem implies that the largest singular value \sigma_1(A) coincides with the spectral (operator) norm of A, \sigma_1(A) = \|A\|_2 = \sup_{\|x\|=1} \|A x\|, which measures the maximum stretch induced by A. For example, to compute \sigma_1(A), one evaluates the supremum of \|A x\| / \|x\| over unit vectors x, achieved when x aligns with the top right singular vector of A. The variational form also facilitates approximations of singular values and underpins error bounds in low-rank matrix approximations, where selecting subspaces that optimize the min or max terms yields near-optimal reductions in rank while preserving norms.
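The variational characterization can be probed numerically. In the sketch below (assuming Python with NumPy; the test matrix is arbitrary), the top right singular vector attains the supremum defining \sigma_1, and the span of the first k right singular vectors is a subspace on which the minimum of \|Ax\| over unit vectors is \sigma_k:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((5, 4))
    U, s, Vt = np.linalg.svd(A)

    # sigma_1 is the spectral norm, attained at the top right singular vector.
    x_top = Vt[0]
    print(np.isclose(np.linalg.norm(A @ x_top), s[0]))   # True
    print(np.isclose(np.linalg.norm(A, 2), s[0]))        # True

    # Max-min form: on span(v_1, ..., v_k), the minimum of ||Ax|| over unit x is sigma_k.
    k = 2
    S = Vt[:k].T                                  # n x k orthonormal basis of the subspace
    coeffs = rng.standard_normal((k, 20000))
    coeffs /= np.linalg.norm(coeffs, axis=0)      # random unit vectors in that subspace
    values = np.linalg.norm(A @ (S @ coeffs), axis=0)
    print(values.min(), s[k - 1])                 # the minimum sampled stretch is close to sigma_k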

Smallest Singular Value

The smallest singular value of a matrix A \in \mathbb{R}^{m \times n}, denoted \sigma_{\min}(A) or \sigma_p(A) where p = \min(m,n), is the smallest value among the singular values \sigma_1(A) \geq \sigma_2(A) \geq \dots \geq \sigma_p(A) \geq 0. For a square n \times n matrix, it is specifically \sigma_n(A). This value, which appears as the smallest diagonal entry of \Sigma in the SVD, quantifies the matrix's minimal "gain" in the Euclidean norm and serves as a measure of the distance to the nearest singular (rank-deficient) matrix. For a square matrix A, \sigma_{\min}(A) > 0 if and only if A is invertible, as zero singular values indicate rank deficiency. Moreover, when A is invertible, the spectral norm of the inverse satisfies \|A^{-1}\|_2 = 1 / \sigma_{\min}(A), linking the smallest singular value directly to the matrix's sensitivity under inversion. If \sigma_{\min}(A) = 0, the columns (or rows, for the appropriate orientation) of A are linearly dependent, signifying that A does not have full rank and maps onto a lower-dimensional subspace. This condition highlights structural limitations in the matrix's action as a linear map.

The condition number of A in the 2-norm is defined as \kappa_2(A) = \sigma_{\max}(A) / \sigma_{\min}(A); a small \sigma_{\min}(A) relative to the largest singular value amplifies errors in numerical computations involving A. Large values of \kappa_2(A) indicate ill-conditioning, making solutions to systems like Ax = b highly sensitive to perturbations in data or rounding errors. As an example of perturbation effects, consider the matrix A = \begin{pmatrix} 1 & 100 \\ 0 & 1 \end{pmatrix}, which has singular values approximately 100 and 0.01, yielding \kappa_2(A) \approx 10^4. A small perturbation \Delta A = \begin{pmatrix} 0 & 0 \\ 0.009 & 0 \end{pmatrix} (with \|\Delta A\|_2 = 0.009) produces a perturbed matrix whose inverse computation suffers significant relative error because the effective \sigma_{\min} shrinks by roughly an order of magnitude, illustrating how even tiny changes can destabilize ill-conditioned systems. For m \geq n, the smallest singular value admits the variational characterization \sigma_{\min}(A) = \min_{\|x\|_2 = 1} \|Ax\|_2.
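The numerical example above can be reproduced directly; the following sketch (assuming Python with NumPy) computes \sigma_{\min}, the condition number, the identity \|A^{-1}\|_2 = 1/\sigma_{\min}(A), and the effect of the small perturbation \Delta A:

    import numpy as np

    # Example from the text: an ill-conditioned 2x2 matrix.
    A = np.array([[1.0, 100.0],
                  [0.0,   1.0]])

    s = np.linalg.svd(A, compute_uv=False)
    sigma_max, sigma_min = s[0], s[-1]
    print(sigma_max, sigma_min)            # approximately 100.005 and 0.0099995
    print(sigma_max / sigma_min)           # condition number kappa_2(A), about 1e4
    print(np.isclose(np.linalg.norm(np.linalg.inv(A), 2), 1.0 / sigma_min))  # True

    # The small perturbation from the text shrinks the smallest singular value further.
    dA = np.array([[0.0,   0.0],
                   [0.009, 0.0]])
    print(np.linalg.svd(A + dA, compute_uv=False)[-1])   # ~1e-3, an order of magnitude smaller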

Inequalities

For Submatrices

One key inequality relating the singular values of a matrix to those of its submatrices is the Cauchy interlacing theorem adapted to singular values. For an m \times n matrix A with m \geq n and singular values \sigma_1(A) \geq \sigma_2(A) \geq \cdots \geq \sigma_n(A) \geq 0, if B consists of l of the columns of A (a submatrix obtained by deleting columns), then the singular values of B satisfy \sigma_i(B) \leq \sigma_i(A) for i = 1, \dots, l. This result follows from applying the classical Cauchy interlacing theorem for eigenvalues to the Hermitian matrix A^*A: the matrix B^*B is a principal submatrix of A^*A, so its eigenvalues (the squares of the singular values) interlace accordingly. For arbitrary k \times l submatrices B of A, obtained by deleting both rows and columns, the singular values are also bounded above by those of A: specifically, \sigma_i(B) \leq \sigma_i(A) for i = 1, \dots, \min(k,l). This follows from the min-max characterization, as the variational quotients over subspaces associated with the submatrix are dominated by those of the original matrix. These inequalities show that passing to a submatrix can never increase any singular value, though it may shrink them.

Horn's inequalities provide a complete set of conditions for the possible singular values of minors (submatrices obtained by selecting rows and columns), generalizing the interlacing bounds to recursive families of inequalities that characterize realizable spectra. These extend the additive and multiplicative forms originally developed for eigenvalues to the non-Hermitian setting of singular values, ensuring consistency with the geometry of subspaces.

Consider a simple example with the 3 \times 3 matrix A = \begin{pmatrix} 4 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{pmatrix}, which has singular values \sigma_1(A) = 4, \sigma_2(A) = 3, \sigma_3(A) = 1. The top-left 2 \times 2 principal submatrix B = \begin{pmatrix} 4 & 0 \\ 0 & 3 \end{pmatrix} has singular values 4 and 3, satisfying 4 \leq 4 and 3 \leq 3. In contrast, the non-principal 2 \times 2 submatrix formed from rows 1 and 3 and columns 1 and 2, C = \begin{pmatrix} 4 & 0 \\ 0 & 0 \end{pmatrix}, has singular values 4 and 0; the upper bounds 4 \leq 4 and 0 \leq 3 still hold, but a lower interlacing bound of the form \sigma_2(C) \geq \sigma_3(A) = 1 does not, illustrating that deleting both rows and columns weakens the interlacing from below.
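The example can be confirmed numerically; the sketch below (assuming Python with NumPy) extracts the two submatrices and compares their singular values with those of A:

    import numpy as np

    A = np.diag([4.0, 3.0, 1.0])
    print(np.linalg.svd(A, compute_uv=False))      # [4. 3. 1.]

    # Top-left 2x2 principal submatrix: each singular value is bounded by the corresponding one of A.
    B = A[:2, :2]
    print(np.linalg.svd(B, compute_uv=False))      # [4. 3.]

    # Submatrix from rows {1, 3} and columns {1, 2} (0-based indices {0, 2} and {0, 1}):
    # the upper bounds still hold, but the lower interlacing bound fails as discussed.
    C = A[np.ix_([0, 2], [0, 1])]
    print(np.linalg.svd(C, compute_uv=False))      # [4. 0.]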

For Sums and Products

Singular values exhibit submultiplicativity under matrix multiplication. For compatible matrices A \in \mathbb{C}^{m \times n} and B \in \mathbb{C}^{n \times p}, with singular values ordered nonincreasingly, the i-th singular value of the product satisfies \sigma_i(AB) \leq \sigma_1(A) \sigma_i(B), \quad i = 1, \dots, \min(m, p). A symmetric bound holds as \sigma_i(AB) \leq \sigma_i(A) \sigma_1(B), \quad i = 1, \dots, \min(m, p). These inequalities follow from the Courant-Fischer min-max theorem for singular values, combined with the fact that the largest singular value \sigma_1 coincides with the spectral (operator) norm, which is submultiplicative: \sigma_1(AB) \leq \sigma_1(A) \sigma_1(B).

A stronger relation is given by log-majorization: the vector of singular values s(AB) of the product is log-majorized by the Hadamard (componentwise) product of the singular value vectors s(A) \circ s(B), denoted s(AB) \prec_{\log} s(A) \circ s(B). This means that, assuming square matrices for simplicity, \sum_{j=1}^k \log \sigma_j(AB) \leq \sum_{j=1}^k \log(\sigma_j(A) \sigma_j(B)), \quad k = 1, \dots, n, with equality at k = n. Equivalently, in multiplicative form, \prod_{j=1}^k \sigma_j(AB) \leq \prod_{j=1}^k \sigma_j(A) \cdot \prod_{j=1}^k \sigma_j(B), \quad k = 1, \dots, n. This majorization relation, established through majorization theory and properties of unitarily invariant norms, implies a host of inequalities for traces, determinants, and other symmetric functions of singular values, and has been refined in subsequent works to incorporate intermediate matrices for monotonicity arguments.

For matrix sums, Weyl's perturbation inequality provides a key bound: for compatible matrices A and B, |\sigma_i(A + B) - \sigma_i(A)| \leq \sigma_1(B), \quad i = 1, \dots, \min(m, n). This result, analogous to Weyl's eigenvalue perturbation theorem but adapted via the singular value characterization, quantifies how adding B shifts each singular value of A by at most the spectral norm of B. It has been strengthened in various forms, such as majorization-based versions for the full spectrum.

The Fan-Hoffman inequalities extend these ideas, particularly through subadditivity of the Ky Fan k-norms K_k(A) = \sum_{i=1}^k \sigma_i(A): K_k(A + B) \leq K_k(A) + K_k(B), \quad k = 1, \dots, n. For products, the same framework yields bounds like K_k(AB) \leq K_k(A) \sigma_1(B), mirroring the individual singular value inequalities but for partial sums. These inequalities stem from metric properties of the space of matrices and majorization, enabling control over unitarily invariant norms under algebraic operations.
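These inequalities can be spot-checked on random matrices. The sketch below (assuming Python with NumPy; the random square matrices are arbitrary) verifies the Weyl-type bound for sums, the log-majorization for products, and the Ky Fan norm subadditivity:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 5
    A = rng.standard_normal((n, n))
    B = rng.standard_normal((n, n))

    sA = np.linalg.svd(A, compute_uv=False)
    sB = np.linalg.svd(B, compute_uv=False)
    sAB = np.linalg.svd(A @ B, compute_uv=False)
    sApB = np.linalg.svd(A + B, compute_uv=False)

    # Weyl-type bound for sums: |sigma_i(A+B) - sigma_i(A)| <= sigma_1(B) for every i.
    print(np.all(np.abs(sApB - sA) <= sB[0] + 1e-12))

    # Log-majorization for products: partial products of sigma(AB)
    # are bounded by partial products of sigma(A) * sigma(B).
    print(all(np.prod(sAB[:k]) <= np.prod(sA[:k]) * np.prod(sB[:k]) * (1 + 1e-10)
              for k in range(1, n + 1)))

    # Ky Fan k-norm subadditivity: the sum of the k largest singular values is subadditive.
    print(all(np.sum(sApB[:k]) <= np.sum(sA[:k]) + np.sum(sB[:k]) + 1e-10
              for k in range(1, n + 1)))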

Relation to Eigenvalues

For a square matrix A \in \mathbb{C}^{n \times n}, the singular values \sigma_1(A) \geq \sigma_2(A) \geq \cdots \geq \sigma_n(A) and the eigenvalues \lambda_1(A), \dots, \lambda_n(A) (ordered such that |\lambda_1(A)| \geq |\lambda_2(A)| \geq \cdots \geq |\lambda_n(A)|) satisfy \sigma_1(A) \geq |\lambda_1(A)|, i.e., the spectral norm bounds the spectral radius. More generally, Weyl's product inequalities state that \prod_{i=1}^k |\lambda_i(A)| \leq \prod_{i=1}^k \sigma_i(A) for each k = 1, \dots, n, with equality at k = n since both products equal |\det A|. (The per-index comparison \sigma_k(A) \geq |\lambda_k(A)| need not hold for k > 1 in general.) These relations reflect the geometric interpretation of singular values as the lengths of semi-axes in the image ellipsoid of the unit ball under A, which bound the scaling factors associated with eigenvalues. Equality \sigma_k(A) = |\lambda_k(A)| holds for all k when A is a normal matrix, i.e., A^* A = A A^*. A prominent example is the Hermitian case, where A = A^* (implying normality) and the eigenvalues are real, so the singular values coincide exactly with the absolute values of the eigenvalues ordered by decreasing magnitude. For instance, consider the symmetric matrix A = \begin{pmatrix} 3 & 1 \\ 1 & -2 \end{pmatrix}, whose eigenvalues are \frac{1 + \sqrt{29}}{2} and \frac{1 - \sqrt{29}}{2}, yielding singular values \frac{1 + \sqrt{29}}{2} and \frac{\sqrt{29} - 1}{2}. The product inequalities also imply the weak majorization \sum_{i=1}^k \sigma_i(A) \geq \sum_{i=1}^k |\lambda_i(A)| for each k = 1, \dots, n; for k = n the left-hand side equals the nuclear norm of A, and equality for all k occurs precisely when A is normal. These inequalities quantify the cumulative dominance of singular values over eigenvalue moduli. Note that the singular values of A relate directly to the eigenvalues of the Hermitian matrix A^* A, as \sigma_i(A) = \sqrt{\lambda_i(A^* A)} with \lambda_i(A^* A) ordered decreasingly, providing a bridge to eigenvalue problems on Hermitian matrices without altering the focus on A itself.
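Both the Hermitian example and the failure of the per-index comparison for non-normal matrices can be checked numerically; the sketch below (assuming Python with NumPy; the second matrix is a hypothetical non-normal example) illustrates this:

    import numpy as np

    # Hermitian example from the text: singular values equal |eigenvalues|.
    A = np.array([[3.0,  1.0],
                  [1.0, -2.0]])
    print(np.linalg.svd(A, compute_uv=False))           # [(1+sqrt(29))/2, (sqrt(29)-1)/2]
    print(np.sort(np.abs(np.linalg.eigvals(A)))[::-1])  # the same values

    # A non-normal matrix: sigma_1 >= |lambda_1| and the Weyl product inequality hold,
    # but sigma_k >= |lambda_k| can fail for k > 1.
    B = np.array([[1.0, 10.0],
                  [0.0,  1.0]])
    s = np.linalg.svd(B, compute_uv=False)
    lam = np.sort(np.abs(np.linalg.eigvals(B)))[::-1]
    print(s, lam)                      # s ~ [10.1, 0.099]; |lambda| = [1, 1]
    print(s[0] >= lam[0])              # True
    print(np.isclose(np.prod(s), np.prod(lam)))   # True: both products equal |det B| = 1
    print(s[1] >= lam[1])              # False: the per-index bound fails at k = 2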

Computation

Algorithms

The computation of singular values typically involves algorithms that first reduce the matrix to a simpler form and then iteratively refine the values. The Golub-Reinsch algorithm, introduced in 1970, is a foundational method for this purpose. It proceeds in two main stages: first, bidiagonalization of the m × n matrix A using Householder reflections to transform A into an upper bidiagonal matrix B, which preserves the singular values; second, application of an implicit QR iteration to the bidiagonal matrix to compute its singular values iteratively. This approach is implemented in standard numerical libraries such as LAPACK and forms the basis for many subsequent improvements.

For larger dense matrices, divide-and-conquer methods offer enhanced efficiency over iterative QR approaches. These algorithms, refined in a paper by Gu and Eisenstat, operate on the bidiagonal form produced by Householder reductions and recursively split the bidiagonal matrix into smaller subproblems, solve each recursively, and then merge the results using rank-one updates. Such methods are well suited to parallel execution and achieve higher performance on modern hardware, making them preferable for matrices where full accuracy is required but speed is a concern.

The Jacobi SVD method provides an alternative for small to medium-sized dense matrices, emphasizing successive pairwise rotations to diagonalize the matrix. Originating from adaptations of the classical Jacobi eigenvalue algorithm, the two-sided version applies plane rotations to both sides of the matrix to zero out off-diagonal elements progressively, converging quadratically in practice. This method is noted for its simplicity and inherent parallelism, as rotations can be performed independently, though it generally requires more iterations than QR-based techniques.

In scenarios involving very large or sparse matrices, especially in large-scale data applications, randomized algorithms enable efficient low-rank approximations of the singular values. The framework developed by Halko, Martinsson, and Tropp in 2011 projects the matrix onto a low-dimensional random subspace, computes a partial factorization there, and uses range-finding techniques like Gaussian projections to capture the dominant singular structure with high probability. These methods trade a small amount of accuracy for dramatic reductions in computational cost, scaling linearly with matrix size for fixed rank.

The overall computational complexity for the full singular value decomposition of an m × n matrix with m ≥ n is O(m n^2 + n^3) using standard dense algorithms like Golub-Reinsch or divide-and-conquer, dominated by the bidiagonalization and iterative phases. For the general case, it is O(min(m n^2, m^2 n)). Randomized variants reduce this to O(m n k) for a rank-k approximation, where k ≪ min(m, n).
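As a sketch of the randomized approach (assuming Python with NumPy; the function name, oversampling parameter, and test matrix are illustrative rather than taken from any particular library, and power iterations are omitted), a basic Gaussian range finder in the spirit of Halko, Martinsson, and Tropp looks as follows:

    import numpy as np

    def randomized_svd(A, k, oversample=10, rng=None):
        # Approximate the k largest singular values/vectors of A with a Gaussian range finder.
        rng = np.random.default_rng() if rng is None else rng
        m, n = A.shape
        # Stage 1: capture the range of A with a random Gaussian test matrix.
        Omega = rng.standard_normal((n, k + oversample))
        Q, _ = np.linalg.qr(A @ Omega)        # orthonormal basis for an approximate range of A
        # Stage 2: SVD of the small projected matrix B = Q^T A.
        B = Q.T @ A
        U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
        return (Q @ U_small)[:, :k], s[:k], Vt[:k]

    # Compare against the exact leading singular values of a low-rank-plus-noise matrix.
    rng = np.random.default_rng(3)
    A = rng.standard_normal((500, 10)) @ rng.standard_normal((10, 300)) \
        + 1e-3 * rng.standard_normal((500, 300))
    _, s_approx, _ = randomized_svd(A, k=10, rng=rng)
    print(s_approx)
    print(np.linalg.svd(A, compute_uv=False)[:10])   # leading exact singular values, nearly identical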

Numerical Stability

The Golub-Reinsch algorithm for computing the singular value decomposition (SVD) of a matrix A is backward stable, meaning the computed SVD \hat{U} \hat{\Sigma} \hat{V}^T is the exact SVD of a slightly perturbed matrix A + E, where \|E\|_2 \leq p(m,n) \epsilon \|A\|_2 and \epsilon is the machine precision, with p(m,n) a modest function of the matrix dimensions m and n. This stability property ensures that the algorithm does not amplify rounding errors beyond the inherent precision of floating-point arithmetic.

Perturbation theory provides bounds on how small changes in the matrix affect its singular values. For a perturbation E to A, the change in the i-th singular value satisfies |\delta \sigma_i| \leq \|E\|_2, with a first-order approximation \delta \sigma_i \approx u_i^T E v_i for a simple singular value \sigma_i > 0, where u_i and v_i are the corresponding left and right singular vectors. This indicates that perturbations primarily affect singular values through their projection onto the singular subspaces, with smaller singular values being more sensitive in relative terms.

The relative accuracy of computed singular values \hat{\sigma}_i is generally bounded by O(\epsilon \cdot \kappa(A)), where \kappa(A) = \sigma_1 / \sigma_{\min} is the condition number of A, reflecting that large singular values achieve near-machine-precision accuracy while small ones inherit errors scaled by the condition number. Specifically, |\hat{\sigma}_i - \sigma_i| \leq p(m,n) \epsilon \sigma_1, so for \sigma_i \ll \sigma_1 the absolute error bound can exceed the singular value itself, masking its true magnitude.

Computing nearly zero singular values therefore poses challenges for rank determination, as rounding errors can perturb them by up to O(\epsilon \sigma_1), potentially leading to inaccurate rank revelation for low-rank or ill-conditioned matrices. For instance, consider the ill-conditioned 4×4 matrix A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 1/60000 & 0 & 0 & 0 \end{pmatrix}, which has exact singular values 3, 2, 1, and 1/60000 \approx 1.67 \times 10^{-5}; in double-precision arithmetic (\epsilon \approx 2.22 \times 10^{-16}), rounding errors during computation may alter the smallest singular value by an amount comparable to \epsilon \sigma_1 \approx 6.66 \times 10^{-16}, which is harmless here, but for more severely ill-conditioned cases with \sigma_{\min} \lesssim \epsilon \sigma_1, the perturbation can exceed the true value, falsely indicating full rank.
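The 4×4 example can be examined directly in double precision; the sketch below (assuming Python with NumPy) computes its singular values and the scale \epsilon \sigma_1 of perturbations permitted by backward stability:

    import numpy as np

    # Ill-conditioned example from the text.
    A = np.array([[0.0,       1.0, 0.0, 0.0],
                  [0.0,       0.0, 2.0, 0.0],
                  [0.0,       0.0, 0.0, 3.0],
                  [1.0/60000, 0.0, 0.0, 0.0]])

    s = np.linalg.svd(A, compute_uv=False)
    print(s)                            # [3, 2, 1, 1.666...e-05]
    print(np.finfo(float).eps * s[0])   # ~6.7e-16: size of perturbation allowed by backward stability
    print(s[0] / s[-1])                 # condition number, about 1.8e5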

Historical Development

Origins

The singular value decomposition (SVD), from which singular values derive, has roots in the late 19th century. In 1873, Eugenio Beltrami provided the first proof of the SVD for real square matrices in the context of bilinear forms. Independently, Camille Jordan proved a similar result in 1874, and James Joseph Sylvester contributed related ideas around the same period. These works established the decomposition for real square matrices in finite dimensions.

The concept of singular values in the context of integral equations and infinite-dimensional spaces developed in the early 20th century. David Hilbert laid foundational groundwork around 1904–1906 by investigating the eigenvalues of operators associated with integral equations, treating them as infinite matrices to analyze their spectral properties. In his 1904 paper "Grundzüge einer allgemeinen Theorie der linearen Integralgleichungen," Hilbert explored the solvability of such equations and the role of eigenvalues in determining convergence and uniqueness. Erhard Schmidt, a student of Hilbert, formalized the notion of singular values in 1907 in his seminal paper "Zur Theorie der linearen und nichtlinearen Integralgleichungen," published in Mathematische Annalen. Schmidt introduced these values (initially termed "eigenvalues") as the positive square roots of the eigenvalues of the operator A^* A, where A represents a compact integral operator on a Hilbert space. This framework extended Hilbert's ideas to provide a canonical decomposition for solving linear integral equations.

The connection to Hilbert-Schmidt operators arose directly from this work, as these operators, named after both mathematicians, characterize compact operators with square-integrable kernels, whose size is measured by the Hilbert-Schmidt norm involving the sum of squared singular values. Schmidt's analysis highlighted how singular values quantify the "size" of such operators and facilitate norm estimates in functional analysis. Initial applications focused on resolving integral equations of the second kind, where singular values enabled the expansion of solutions in orthogonal series, ensuring convergence under compactness assumptions. This approach proved instrumental in addressing Fredholm-type problems, bridging finite matrix theory with infinite-dimensional settings.

Key Developments

In 1936, Carl Eckart and Gale Young extended the singular value decomposition to rectangular matrices, providing the first rigorous proof for both real and complex cases and establishing its utility in low-rank approximations. This work built on earlier theoretical foundations by demonstrating the decomposition's applicability beyond square matrices, which was crucial for practical computations in finite dimensions.

The term "singular values" was formalized by Frederick Smithies in 1937 within the context of integral equations, shifting from the earlier nomenclature of "characteristic values" to emphasize their distinct role in non-self-adjoint problems. Smithies' analysis clarified the relationship between the eigenvalues and singular values of such equations, influencing subsequent developments in operator theory.

In 1957, D. E. Allakhverdiev advanced the theory by proving that the approximation numbers of completely continuous (compact) operators coincide with their singular values, interpreted as the eigenvalues of the modulus (T^* T)^{1/2} of the operator T, thereby generalizing singular values to broader operator approximations. This result provided a foundational link between singular values and finite-dimensional approximations in Hilbert spaces.

The SVD gained widespread adoption in numerical analysis during the 1960s through the efforts of Gene Golub and William Kahan, who developed efficient algorithms for computing singular values and the pseudo-inverse, making the decomposition computationally feasible for large-scale problems. Their 1965 algorithm, based on bidiagonalization, marked a pivotal shift toward practical implementation. These mid-20th-century advancements facilitated the integration of singular values into statistics, particularly through the equivalence of the SVD of a centered data matrix to principal component analysis via the decomposition of covariance matrices, influencing data reduction techniques throughout the century.
