
Sylvester's law of inertia

Sylvester's law of inertia is a theorem in linear algebra that asserts the invariance of the inertia of a real symmetric matrix under congruence transformations. The inertia of an n \times n symmetric matrix A, denoted \operatorname{In}(A) = (i_+(A), i_-(A), i_0(A)), is the ordered triple giving the number of positive eigenvalues i_+(A), the number of negative eigenvalues i_-(A), and the multiplicity of the zero eigenvalue i_0(A). Formally, if A is symmetric and X is nonsingular, then \operatorname{In}(A) = \operatorname{In}(X^T A X). Named after the 19th-century English mathematician James Joseph Sylvester, who established the result in 1852 as part of his work on quadratic forms and invariant theory, the law provides a complete classification of real symmetric matrices up to congruence. It implies that any real quadratic form can be diagonalized by a change of basis to a diagonal form consisting of p entries of +1, q entries of -1, and r zeros, where p + q + r = n and the triple (p, q, r) is uniquely determined by the original form. The signature of the form, defined as p - q, and the index p are key invariants that remain unchanged. This invariance has profound implications across mathematics and its applications. In geometry, it classifies conic sections and quadric surfaces: for example, a nondegenerate conic in the plane is an ellipse if the signature is \pm 2 (both eigenvalues of the same sign), a hyperbola if the signature is 0 (eigenvalues of opposite signs), and a parabola if there is one zero eigenvalue (signature \pm 1). In optimization and control theory, the law helps determine positive definiteness (when i_+(A) = n) or indefiniteness of Hessian matrices at critical points, aiding the analysis of local minima, maxima, and saddle points. Extensions of the theorem apply to Hermitian matrices over the complex numbers and to more general bilinear forms, influencing fields such as numerical linear algebra and physics.
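
The invariance \operatorname{In}(A) = \operatorname{In}(X^T A X) is straightforward to check numerically. The following minimal sketch in Python with NumPy computes the inertia from eigenvalue signs and verifies it on a random congruence; the inertia helper and the tolerance threshold are illustrative choices, not a standard library API.

```python
import numpy as np

def inertia(A, tol=1e-10):
    """Return (n_plus, n_minus, n_zero) from the eigenvalues of a symmetric matrix."""
    eigs = np.linalg.eigvalsh(A)
    return (int(np.sum(eigs > tol)),
            int(np.sum(eigs < -tol)),
            int(np.sum(np.abs(eigs) <= tol)))

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
A = (A + A.T) / 2                      # make A symmetric
X = rng.standard_normal((4, 4))        # generically nonsingular
print(inertia(A), inertia(X.T @ A @ X))  # identical triples, by Sylvester's law
```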

Background Concepts

Quadratic Forms

A quadratic form on a finite-dimensional real vector space V is a map Q: V \to \mathbb{R} defined by Q(v) = B(v, v), where B is a symmetric bilinear form on V. A symmetric bilinear form B: V \times V \to \mathbb{R} is linear in each argument and satisfies B(u, v) = B(v, u) for all u, v \in V. This construction ensures that Q is homogeneous of degree two and captures the quadratic nature essential for applications in linear algebra. Common examples include the standard norm squared on \mathbb{R}^n, given by Q(x) = x_1^2 + x_2^2 + \dots + x_n^2, which arises from the standard inner product as the associated quadratic form. Another example is the indefinite form Q(x, y) = x^2 - y^2 on \mathbb{R}^2, derived from the bilinear form B((x_1, y_1), (x_2, y_2)) = x_1 x_2 - y_1 y_2. In coordinates with respect to a basis of V, any quadratic form Q can be represented as Q(x) = x^T A x, where x is the coordinate vector and A is a real symmetric matrix. Symmetric matrices provide the matrix-theoretic framework for quadratic forms and are detailed in the subsequent section on symmetric bilinear forms. The symmetric bilinear form B associated to Q can be recovered via the polarization identity: B(u, v) = \frac{1}{4} \bigl( Q(u + v) - Q(u - v) \bigr) for all u, v \in V. This identity uniquely determines B from Q over the reals, highlighting the close relationship between the two structures.
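
As a concrete illustration of the polarization identity, the short NumPy sketch below recovers B(u, v) from evaluations of Q alone and compares it with the direct value u^T A v; the specific matrix and vectors are arbitrary example values.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, -3.0]])            # symmetric matrix defining Q(x) = x^T A x

def Q(x):
    return x @ A @ x

def B_polarized(u, v):
    # Polarization identity: B(u, v) = (Q(u + v) - Q(u - v)) / 4
    return (Q(u + v) - Q(u - v)) / 4.0

u = np.array([1.0, 2.0])
v = np.array([-1.0, 0.5])
print(B_polarized(u, v), u @ A @ v)    # both evaluate the bilinear form B(u, v)
```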

Symmetric Bilinear Forms

A bilinear form B: V \times V \to \mathbb{R} on a real vector space V is a map that is linear in each argument separately. It is symmetric if B(u, v) = B(v, u) for all u, v \in V. In a fixed basis \{e_i\}_{i=1}^n for V, any such form admits a matrix representation B(u, v) = u^T A v, where u and v are the coordinate vectors of the respective inputs and A = (a_{ij}) is an n \times n matrix with entries a_{ij} = B(e_i, e_j). The symmetry of B ensures A = A^T. Under a change of basis given by an invertible matrix P, the matrix representation transforms via congruence: the new matrix is \tilde{A} = P^T A P. This operation preserves the symmetry of the matrix, as \tilde{A}^T = (P^T A P)^T = P^T A^T P = P^T A P = \tilde{A}. A symmetric bilinear form B is non-degenerate if its kernel \ker(B) = \{ v \in V \mid B(v, w) = 0 \ \forall w \in V \} = \{ 0 \}. In matrix terms, this is equivalent to \det(A) \neq 0. Symmetric bilinear forms generalize quadratic forms, which arise as the special case q(v) = B(v, v).
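
A small sketch of this construction, using the indefinite form from the previous section: the snippet builds A entrywise from a_{ij} = B(e_i, e_j), applies a congruence \tilde{A} = P^T A P with an illustrative change-of-basis matrix, and checks that symmetry is preserved.

```python
import numpy as np

# The indefinite form B((x1, y1), (x2, y2)) = x1*x2 - y1*y2
def B(u, v):
    return u[0] * v[0] - u[1] * v[1]

basis = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
A = np.array([[B(ei, ej) for ej in basis] for ei in basis])  # a_ij = B(e_i, e_j)

P = np.array([[1.0, 2.0],
              [0.0, 1.0]])             # invertible change-of-basis matrix
A_new = P.T @ A @ P                    # congruent representation in the new basis
print(A_new)
print(np.allclose(A_new, A_new.T))     # congruence preserves symmetry: True
```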

Formal Statement

Inertia Invariants

The inertia of a real symmetric matrix A \in \mathbb{R}^{n \times n} is defined as the triple (n_+, n_-, n_0), where n_+ denotes the number of positive eigenvalues of A (counted with algebraic multiplicities), n_- the number of negative eigenvalues, and n_0 the number of zero eigenvalues. Sylvester's law of inertia states that for any real symmetric matrix A, there exists an invertible real matrix P such that P^T A P = \begin{pmatrix} I_{n_+} & & \\ & -I_{n_-} & \\ & & 0_{n_0} \end{pmatrix}, where I_k is the k \times k identity matrix and 0_k the k \times k zero matrix; moreover, this triple is invariant under congruence, meaning that if B = Q^T A Q for some invertible real matrix Q, then B has the same inertia as A. An important consequence is the invariance of the signature \sigma(A) = n_+ - n_-, which provides a simpler scalar invariant characterizing the form up to congruence. For example, the matrix A = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} has eigenvalues +1 and -1, so its inertia is (1, 1, 0) and its signature is 0.
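
One way to realize the canonical form in practice is to combine the spectral decomposition with diagonal scaling: if A = Q \Lambda Q^T, then scaling each eigenvector column by 1/\sqrt{|\lambda|} (for nonzero \lambda) produces a congruence carrying A to a diagonal of \pm 1 and 0 entries. The sketch below assumes this approach; canonical_congruence is an illustrative name, not a library function.

```python
import numpy as np

def canonical_congruence(A, tol=1e-10):
    """Return P such that P.T @ A @ P is diagonal with entries +1, -1, 0 (sketch)."""
    eigvals, Q = np.linalg.eigh(A)           # A = Q diag(eigvals) Q^T
    scale = np.where(np.abs(eigvals) > tol,
                     1.0 / np.sqrt(np.abs(eigvals)), 1.0)
    return Q * scale                         # column j of Q scaled by 1/sqrt(|lambda_j|)

A = np.diag([3.0, -2.0, 0.0])
P = canonical_congruence(A)
print(np.round(P.T @ A @ P, 10))             # diag(-1, 0, +1), up to ordering
```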

Eigenvalue Perspective

The spectral theorem for real symmetric matrices provides a foundational perspective on Sylvester's theorem by linking the inertia of a quadratic form directly to the eigenvalues of its associated symmetric matrix. Specifically, for an n \times n real symmetric matrix A, there exists an orthogonal matrix Q such that A = Q D Q^T, where D is a diagonal matrix containing the real eigenvalues \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n of A along its diagonal. This ensures that the eigenvalues are real and that the matrix is orthogonally similar to its diagonal form, preserving the spectrum under similarity transformations. From this decomposition, the inertia of A—defined as the triple (n_+, n_-, n_0), where n_+ is the number of positive eigenvalues, n_- the number of negative eigenvalues, and n_0 the multiplicity of the zero eigenvalue—fully characterizes the sign structure of the quadratic form \mathbf{x}^T A \mathbf{x}. Sylvester's theorem asserts that this inertia is invariant under congruence transformations, meaning that if B = P^T A P for some invertible P, then B has the same inertia as A, even though the specific eigenvalue magnitudes may differ. Thus, while the spectral decomposition reveals the inertia through the signs of the eigenvalues, congruence preserves only the counts of these signs, not the eigenvalues themselves. This distinction between congruence and similarity is crucial: diagonalization via an orthogonal matrix is simultaneously a similarity and a congruence (since Q^T = Q^{-1}), and it preserves the entire set of eigenvalues, whereas general congruence transformations alter the eigenvalues but maintain their sign pattern. For instance, consider the matrix A = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, with eigenvalues 1 > 0 and -1 < 0, yielding inertia (1, 1, 0). Applying congruence with P = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} gives B = P^T A P = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}, whose eigenvalues are \frac{1 + \sqrt{5}}{2} > 0 and \frac{1 - \sqrt{5}}{2} < 0, preserving the inertia (1, 1, 0) but changing the eigenvalue values. This example illustrates how basis changes via non-orthogonal transformations, common in quadratic form analyses, retain the inertial properties while adapting the spectrum to the new coordinates.
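
The worked example above can be reproduced in a few lines of NumPy, confirming that the congruent matrix B = P^T A P has different eigenvalues but the same sign pattern:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, -1.0]])
P = np.array([[1.0, 1.0],
              [0.0, 1.0]])             # invertible but not orthogonal
B = P.T @ A @ P                        # equals [[1, 1], [1, 0]]
print(np.linalg.eigvalsh(A))           # [-1, 1]
print(np.linalg.eigvalsh(B))           # [(1 - sqrt(5))/2, (1 + sqrt(5))/2]
# The eigenvalues change, but each matrix has one positive and one negative eigenvalue.
```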

Proof Sketch

Congruence and Diagonalization

A central component of the proof of Sylvester's theorem is the diagonalization of real symmetric bilinear forms via congruence transformations, which explicitly constructs a basis in which the form assumes a diagonal representation. For a symmetric matrix A \in \mathbb{R}^{n \times n} representing the bilinear form \Phi(\mathbf{x}, \mathbf{y}) = \mathbf{y}^T A \mathbf{x}, there exists an invertible matrix P \in \mathbb{R}^{n \times n} such that P^T A P = D, where D is diagonal. This congruence preserves the bilinear form up to change of basis and allows the signs on the diagonal of D to determine the inertia invariants. One algorithmic approach to achieve this diagonalization proceeds by iteratively completing the square on the associated quadratic form Q(\mathbf{x}) = \mathbf{x}^T A \mathbf{x}. Assume without loss of generality that the (1,1) entry of A is nonzero (addressed below); then Q(\mathbf{x}) = a_{11} x_1^2 + 2 x_1 \sum_{k=2}^n a_{1k} x_k + \sum_{i=2}^n \sum_{j=2}^n a_{ij} x_i x_j. Rewriting yields Q(\mathbf{x}) = a_{11} \left( x_1 + \sum_{k=2}^n \frac{a_{1k}}{a_{11}} x_k \right)^2 + \sum_{i=2}^n \sum_{j=2}^n \left( a_{ij} - \frac{a_{1i} a_{1j}}{a_{11}} \right) x_i x_j. Substituting y_1 = x_1 + \sum_{k=2}^n \frac{a_{1k}}{a_{11}} x_k and y_k = x_k for k \geq 2 corresponds to a congruence transformation via an upper-triangular P with 1's on the diagonal, reducing the problem to an (n-1) \times (n-1) symmetric matrix for the remaining form. Repeating this process diagonalizes A, with the sign of each successive leading coefficient determining the diagonal sign. An alternative constructive method employs a Gram-Schmidt orthogonalization adapted to the bilinear form \Phi. Start with a basis and iteratively project subsequent vectors onto the orthogonal complement with respect to \Phi of the span of previous vectors, ensuring \Phi(\mathbf{e}_i, \mathbf{e}_j) = 0 for i \neq j. For the positive and negative parts, decompose the space into a direct sum of a maximal subspace where \Phi is positive definite and one where it is negative definite, using the orthogonalization to build bases for these subspaces separately. Scaling each basis vector \mathbf{e}_i by 1 / \sqrt{|\Phi(\mathbf{e}_i, \mathbf{e}_i)|} (for nonzero entries) yields a diagonal matrix with entries \pm 1 or 0, completing the congruence. The construction relies on identifying maximal positive and negative subspaces, whose dimensions equal the positive inertia index n_+ and the negative index n_-, respectively. A subspace U is positive (negative) if \Phi(u, u) > 0 (< 0) for all nonzero u \in U, and maximality ensures no larger such subspace exists; these dimensions are preserved under congruence and match the counts of positive (negative) diagonal entries in any diagonalization. A key lemma facilitating the inductive step, and addressing the case of a vanishing (1,1) entry, states that if there exists a nonzero v such that v^T A v > 0, then there exists an invertible P such that P^T A P has a positive (1,1) diagonal entry. This is established by choosing a basis whose first vector points in a direction where the quadratic form evaluates positively, achievable since the form represents positive values; if needed, a shear transformation (adding a multiple of one basis vector to another) adjusts off-diagonal terms while preserving the sign.
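
The completing-the-square procedure translates directly into symmetric Gaussian elimination by congruence. The sketch below implements one plausible version: at each step an elementary matrix E zeros the off-diagonal entries of the pivot row and column, and P accumulates the transformations so that P^T A P = D. It assumes a nonzero diagonal pivot can always be brought into position, which fails for matrices such as \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}; handling that case needs the extra transformation described in the lemma above.

```python
import numpy as np

def diagonalize_by_congruence(A, tol=1e-12):
    """Completing the square as congruence: returns P, D with P.T @ A @ P = D.

    Minimal sketch: skips a step if no nonzero diagonal pivot is available,
    in which case D may retain off-diagonal entries."""
    D = np.array(A, dtype=float)
    n = D.shape[0]
    P = np.eye(n)
    for i in range(n):
        if abs(D[i, i]) <= tol:                   # try to swap in a nonzero pivot
            for j in range(i + 1, n):
                if abs(D[j, j]) > tol:
                    perm = np.eye(n)
                    perm[[i, j]] = perm[[j, i]]
                    D = perm.T @ D @ perm
                    P = P @ perm
                    break
        if abs(D[i, i]) <= tol:
            continue
        E = np.eye(n)                             # one completing-the-square step
        E[i, i + 1:] = -D[i, i + 1:] / D[i, i]
        D = E.T @ D @ E
        P = P @ E
    return P, D

A = np.array([[2.0, 1.0, 0.0],
              [1.0, -1.0, 2.0],
              [0.0, 2.0, 3.0]])
P, D = diagonalize_by_congruence(A)
print(np.round(D, 10))                 # diagonal; the signs give the inertia
print(np.allclose(P.T @ A @ P, D))     # True
```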

Invariance Argument

The invariance of the inertia under congruence transformations is established through an inductive argument on the dimension of the matrix, combined with a characterization of the signature via maximal definite subspaces, ensuring that the numbers of positive, negative, and zero diagonal entries are uniquely determined. To prove uniqueness, consider the associated symmetric bilinear form \phi on a finite-dimensional real vector space V. The index p (number of positive eigenvalues) is defined as the maximum dimension of a subspace P \subseteq V on which \phi restricts to a positive definite form, i.e., \phi(u, u) > 0 for all 0 \neq u \in P. Similarly, the co-index q is the maximum dimension of a subspace where \phi is negative definite. These dimensions are basis-independent, as the existence of such a subspace implies the property holds regardless of the choice of basis for V. The nullity is then \dim V - (p + q), matching the dimension of the radical \{ v \in V \mid \phi(v, w) = 0 \ \forall w \in V \}, which is also invariant as it equals n - \operatorname{rank}(\phi). The proof proceeds by induction on n = \dim V. For the base case n = 1, the form is a scalar multiple of x^2 and the result is immediate. Assume the result holds for dimensions less than n. For the existence of a diagonal form (referencing the construction above), if \phi \not\equiv 0, select v \in V with \phi(v, v) \neq 0, and let U = \{ u \in V \mid \phi(v, u) = 0 \}, a subspace of dimension n-1. By the inductive hypothesis, \phi|_U diagonalizes to a form with inertia (p', q', r'). Extending a basis of U by v yields a bordered matrix \begin{pmatrix} \phi(v,v) & 0^\top \\ 0 & D \end{pmatrix}, where D is the diagonal matrix for \phi|_U, preserving the signs and thus adjusting the inertia counts by the sign of \phi(v,v). This reduction via bordering confirms the diagonal form's signs are consistent with the dimensions. For uniqueness under the inductive assumption, suppose two diagonal forms D_1 and D_2 for the same \phi have different indices, say p_1 > p_2. The subspace spanned by the p_1 basis vectors corresponding to the positive entries in D_1 is positive definite of dimension p_1, while the subspace spanned by the basis vectors of D_2 with nonpositive entries has dimension n - p_2 and satisfies \phi(u, u) \leq 0 on it. Since p_1 + (n - p_2) > n, these two subspaces intersect nontrivially, yielding a nonzero vector u with both \phi(u, u) > 0 and \phi(u, u) \leq 0, a contradiction, since the positive definite property is intrinsic to \phi and independent of basis. Thus p_1 = p_2; similarly for q. This contradiction argument ensures the inertia is unique, completing the invariance.

Applications

Positive Definiteness Testing

A quadratic form Q(\mathbf{v}) = \mathbf{v}^T A \mathbf{v}, where A is a real symmetric n \times n matrix and \mathbf{v} \in \mathbb{R}^n, is positive definite if Q(\mathbf{v}) > 0 for all nonzero \mathbf{v}. By Sylvester's law of inertia, this condition holds if and only if the inertia of A consists of n_+ = n positive eigenvalues, with n_- = n_0 = 0 negative or zero eigenvalues. This characterization links the sign properties of the quadratic form directly to the spectral structure preserved under congruence transformations. Sylvester's criterion provides a practical test for positive definiteness without computing all eigenvalues, stating that a real symmetric matrix A is positive definite if and only if all its leading principal minors are positive. That is, denoting the k \times k leading principal submatrix by A_k, we require \det(A_k) > 0 for each k = 1, \dots, n. This criterion derives from the law of inertia through the LDL^T decomposition A = L D L^T, where L is unit lower triangular and D is diagonal; the positive definiteness of A is equivalent to all diagonal entries of D being positive, which aligns with the signs of the leading principal minors via successive Schur complements. For illustration, consider a 2 \times 2 symmetric matrix A = \begin{pmatrix} a & b \\ b & c \end{pmatrix}. The leading principal minors are \det(A_1) = a and \det(A_2) = ac - b^2. Thus, A is positive definite if and only if a > 0 and ac - b^2 > 0. This simple check confirms the condition without eigenvalue solving. Computationally, this approach is advantageous for testing positive definiteness in large-scale applications: a Cholesky factorization, which succeeds exactly when the matrix is positive definite, requires only O(n^3) operations with a small constant, avoiding the asymptotically comparable but practically more expensive eigendecomposition needed to reveal the full spectrum. This efficiency is particularly useful in optimization and numerical analysis, where repeated definiteness checks arise.
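
Both tests are easy to express in NumPy; the helper names below are illustrative. The Cholesky route simply treats factorization failure as a verdict of "not positive definite", matching the criterion's characterization.

```python
import numpy as np

def is_posdef_minors(A, tol=1e-12):
    """Sylvester's criterion: all leading principal minors must be positive."""
    n = A.shape[0]
    return all(np.linalg.det(A[:k, :k]) > tol for k in range(1, n + 1))

def is_posdef_cholesky(A):
    """Cholesky succeeds exactly when a symmetric matrix is positive definite."""
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])             # a = 2 > 0 and ac - b^2 = 5 > 0
print(is_posdef_minors(A), is_posdef_cholesky(A))   # True True
```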

Stability Analysis in Dynamics

In mechanical systems, the potential energy near an equilibrium point is typically approximated by a quadratic form V(\mathbf{q}) = \frac{1}{2} \mathbf{q}^T K \mathbf{q}, where K is a symmetric matrix representing the stiffness, and \mathbf{q} denotes the displacement vector from equilibrium. The kinetic energy is similarly given by a quadratic form T(\dot{\mathbf{q}}) = \frac{1}{2} \dot{\mathbf{q}}^T M \dot{\mathbf{q}}, with M positive definite. Sylvester's law of inertia classifies the equilibrium's stability through the inertia of K: if K is positive definite (all eigenvalues positive, so the number of positive eigenvalues n_+ = n with no negative or zero eigenvalues), the equilibrium is stable, as small perturbations lead to restoring forces that bound the motion. For linearized dynamics around equilibrium, the equations of motion take the form M \ddot{\mathbf{q}} + K \mathbf{q} = 0. The natural frequencies and modes are found by solving the generalized eigenvalue problem K \mathbf{v} = \lambda M \mathbf{v}, where the eigenvalues \lambda are real due to the symmetry of K and positive definiteness of M. By Sylvester's law of inertia, the inertia of K determines the signs of these \lambda, as the problem can be transformed via the Cholesky factorization M = L L^T to a standard eigenvalue problem for the congruent matrix L^{-1} K L^{-T}, which has the same inertia as K. If K is positive semi-definite, all \lambda \geq 0; positive definiteness of K (i.e., n_+ = n) ensures all \lambda > 0, yielding purely oscillatory solutions and Lyapunov stability, as the total energy E = T + V serves as a Lyapunov function that remains constant and positive definite away from equilibrium. Negative eigenvalues of K ( n_- > 0 ) indicate instability, with exponential divergence in those directions. In vibration analysis, modal decomposition diagonalizes the generalized eigenvalue problem K \mathbf{v} = \lambda M \mathbf{v}, where the inertia of K classifies the modes: positive eigenvalues correspond to oscillatory modes with frequencies \sqrt{\lambda}, negative eigenvalues to unstable modes with exponential growth, and zero eigenvalues to neutral rigid-body modes. This classification is invariant under coordinate changes, directly applying Sylvester's theorem to assess structural stability without a full eigendecomposition. For instance, in multi-degree-of-freedom systems like coupled oscillators, identifying n_+ = n confirms that all modes are stable vibrations. In control theory, the inertia of the Hessian—analogous to K in optimization landscapes—determines the nature of critical points in optimal control problems, such as stabilizing mechanical systems via feedback. A positive definite Hessian ( n_+ = n ) indicates a local minimum of the cost function, ensuring stable convergence; an indefinite one reveals saddle points, guiding controller design to avoid unstable directions. This links Sylvester's theorem to inertia-controlling methods in optimization for stability analysis in dynamics.
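
As a numerical illustration (using SciPy's generalized symmetric eigensolver, with illustrative mass and stiffness matrices), the sketch below solves K \mathbf{v} = \lambda M \mathbf{v} and checks that the Cholesky-reduced matrix L^{-1} K L^{-T}, which is congruent to K, has eigenvalues of the same signs:

```python
import numpy as np
from scipy.linalg import eigh, cholesky

M = np.array([[2.0, 0.5],
              [0.5, 1.0]])             # mass matrix, positive definite
K = np.array([[4.0, -1.0],
              [-1.0, 1.0]])            # stiffness matrix

lam = eigh(K, M, eigvals_only=True)    # generalized problem K v = lambda M v
L = cholesky(M, lower=True)            # M = L L^T
Linv = np.linalg.inv(L)
K_tilde = Linv @ K @ Linv.T            # congruent to K, hence same inertia
print(lam)                             # all positive: stable oscillatory modes
print(np.linalg.eigvalsh(K_tilde))     # same values, and in particular same signs
```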

Historical Context

Sylvester's Contribution

James Joseph Sylvester (1814–1897) was an English mathematician whose pioneering work in invariant theory and the nascent field of matrix algebra laid foundational groundwork for modern linear algebra. Born in London to Jewish parents, Sylvester overcame significant barriers, including religious restrictions on academic degrees and appointments, to become a professor at institutions such as the Royal Military Academy, Woolwich, and Johns Hopkins University, where he influenced a generation of American mathematicians. His research emphasized the properties preserved under transformations of algebraic expressions, particularly those arising in quadratic forms. In 1852, Sylvester introduced what is now known as Sylvester's law of inertia in a paper published in the Philosophical Magazine, titled "A demonstration of the theorem that every homogeneous quadratic polynomial is reducible by real orthogonal substitutions to the form of a sum of positive and negative squares." This work marked the first explicit statement and proof of the invariance of the inertia (the counts of positive, negative, and zero diagonal entries) for the symmetric matrices associated with real quadratic forms under congruence by nonsingular real matrices. Sylvester drew an analogy to physical inertia, noting the "rigidity" of these counts in resisting changes under coordinate transformations, a perspective that underscored the theorem's physical interpretability in contexts like mechanics. Sylvester's motivation stemmed from his extensive investigations into invariants of binary forms and their transformations, building on earlier ideas in elimination theory and canonical forms. He sought to classify forms up to equivalence, recognizing that while the specific coefficients vary under linear changes, the inertia provides a complete, discrete invariant that fully characterizes the form's type over the reals. This insight, embedded within his proof of reducibility via real substitutions, established the theorem as a criterion for distinguishing definite, indefinite, and semi-definite forms.

Preceding and Subsequent Developments

The foundations of Sylvester's law of inertia were laid by earlier mathematicians working on the reduction and classification of quadratic forms. In the mid-18th century, Joseph-Louis Lagrange demonstrated that any real quadratic form can be transformed via linear substitutions into a diagonal form, establishing a key step toward classifying quadratic forms by their diagonal entries. This result built on prior investigations into the geometry of conic sections and optimization problems from the 1750s and 1760s. Complementing this, Carl Friedrich Gauss, in his 1801 Disquisitiones Arithmeticae, advanced the theory by introducing concepts of equivalence and reduction for binary quadratic forms in the context of number theory. These precursors provided the groundwork for understanding the invariance of diagonal signatures under congruence transformations, though they did not fully address the invariance of the number of positive, negative, and zero eigenvalues in higher dimensions. Following James Joseph Sylvester's formulation of the law of inertia in his 1852 paper, subsequent developments refined and generalized the theorem within linear algebra. In 1868, Karl Weierstrass established canonical forms for pairs of bilinear forms, extending the reduction techniques to coupled systems and influencing the study of generalized eigenvalue problems. During the 1870s, Camille Jordan contributed to the canonical reduction of matrices, including symmetric ones, developing an analysis that prototyped modern decompositions and reinforced the diagonalizability of symmetric matrices over the reals. These advancements built directly on Sylvester's invariance result, providing more explicit constructions for the diagonal forms central to the theorem. The theorem's influence extended beyond linear algebra into topology and optimization during the 20th century. In Morse theory, it underpins the Morse index theorem, where the inertia of the Hessian at a critical point determines the index as the number of negative eigenvalues, a tool pivotal to analyzing the topology of manifolds via Morse functions. In optimization, Sylvester's law facilitates the analysis of semidefinite programming problems by preserving the inertia under congruence, enabling checks for positive semidefiniteness and the formulation of relaxations for constraints, as seen in modern interior-point methods and stability models for linear systems. Early proofs of the theorem, including Sylvester's original from 1852–1853, were often non-constructive, relying on the existence of invariants or properties of quadratic forms without explicit transformations. In contrast, modern proofs typically employ induction on the dimension of the matrix or continuity arguments, such as perturbing the matrix to one with distinct eigenvalues and tracking the eigenvalues continuously, ensuring constructive paths to the inertial invariants.

Generalizations and Extensions

Complex Hermitian Forms

A Hermitian form on a finite-dimensional complex vector space V is a sesquilinear map B: V \times V \to \mathbb{C} that is linear in the first argument, conjugate-linear in the second, and satisfies B(v, u) = \overline{B(u, v)} for all u, v \in V, which ensures that B(v, v) \in \mathbb{R} for all v \in V. Choosing an ordered basis for V, the form B is represented by a Hermitian matrix H \in \mathbb{C}^{n \times n} with entries H_{ij} = B(e_i, e_j) and H = H^*, where H^* denotes the conjugate transpose. A Hermitian form is positive definite if B(v, v) > 0 for all nonzero v \in V, which is equivalent to all eigenvalues of H being positive. The generalization of Sylvester's theorem to the complex setting defines the inertia of a Hermitian form (or its matrix H) as the triple (n_+(H), n_-(H), n_0(H)), where n_+(H), n_-(H), and n_0(H) are the numbers of positive, negative, and zero eigenvalues of H, respectively; all eigenvalues are real due to the Hermitian property. Two Hermitian matrices H_1, H_2 \in \mathbb{C}^{n \times n} are *-congruent if there exists an invertible P \in \mathbb{C}^{n \times n} such that H_1 = P^* H_2 P. Sylvester's law of inertia states that H_1 and H_2 are *-congruent if and only if they have the same inertia. For a non-degenerate Hermitian form, corresponding to an invertible matrix H (full rank), n_0(H) = 0, so the theorem specifies that the numbers of positive and negative eigenvalues are invariants under *-congruence. This differs from the real symmetric case primarily in the sesquilinear nature of the form and the use of *-congruence involving the conjugate transpose, though the real case arises as a special instance when the imaginary parts vanish. In applications such as quantum mechanics, Hermitian operators on a Hilbert space represent physical observables, with their real eigenvalues corresponding to the possible outcomes of measurements; the inertia then characterizes the signature of the observable's spectrum under unitary transformations, which preserve the Hermitian structure.
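
The complex case can be checked numerically just like the real one. In the NumPy sketch below, with an arbitrary Hermitian matrix and a random (generically invertible) P, the *-congruent matrix P^* H P has the same inertia triple as H:

```python
import numpy as np

def inertia(H, tol=1e-10):
    eigs = np.linalg.eigvalsh(H)       # real, since H is Hermitian
    return (int(np.sum(eigs > tol)),
            int(np.sum(eigs < -tol)),
            int(np.sum(np.abs(eigs) <= tol)))

H = np.array([[2.0, 1.0 - 1.0j],
              [1.0 + 1.0j, -1.0]])     # Hermitian: H = H^*
rng = np.random.default_rng(1)
P = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
H2 = P.conj().T @ H @ P                # *-congruent to H
print(inertia(H), inertia(H2))         # identical triples
```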

Quantitative Bounds on Eigenvalues

Extensions of Sylvester's law of inertia provide quantitative bounds on the locations and perturbations of eigenvalues of symmetric or Hermitian matrices, leveraging the preservation of inertia under congruence to estimate changes due to structured modifications. These results address limitations of the classical theorem by offering numerical estimates of eigenvalue shifts, particularly for low-rank perturbations and submatrix relations, enabling precise error analysis in computational settings. Ostrowski's theorem gives a quantitative version of the law: for a Hermitian matrix A \in \mathbb{C}^{n \times n} and a nonsingular S \in \mathbb{C}^{n \times n}, each eigenvalue of S^* A S equals the corresponding eigenvalue of A scaled by a positive factor lying between the smallest and largest eigenvalues of S^* S, so the sign pattern is preserved with explicit bounds on the magnitudes. A closely related interlacing result covers rectangular compressions: for X \in \mathbb{C}^{n \times m} with orthonormal columns and m \leq n, the eigenvalues of B = X^* A X satisfy \lambda_k(A) \leq \lambda_k(B) \leq \lambda_{k + n - m}(A) for k = 1, \dots, m, where the eigenvalues are ordered increasingly. This generalizes the classical Cauchy interlacing theorem (for principal submatrices, corresponding to m = n - 1) and constrains the inertia, in the sense that the numbers of positive and negative eigenvalues of B are bounded by those of A. These results are particularly useful for analyzing stability in iterative methods where dimension reductions occur, such as in projection techniques during optimization algorithms. Extensions of interlacing properties relate the eigenvalues of a Hermitian matrix to those of its principal submatrices, with inertia information providing constraints on the number of eigenvalues in specified intervals and tighter bounds than standard Cauchy interlacing. This allows for refined estimates of eigenvalue distributions, especially where submatrices arise from deflation or projection techniques in eigenvalue solvers. The Haynsworth inertia additivity formula further refines these bounds by decomposing the inertia of a partitioned Hermitian matrix into the inertias of a leading block and its Schur complement. For a Hermitian block matrix \begin{pmatrix} A & B \\ B^* & D \end{pmatrix} with A invertible, \operatorname{In}\left( \begin{pmatrix} A & B \\ B^* & D \end{pmatrix} \right) = \operatorname{In}(A) + \operatorname{In}(D - B^* A^{-1} B). In the context of symmetric rank-one updates A + \sigma u u^*, the formula relates the inertia of the updated matrix to the roots of a scalar quadratic equation whose coefficients depend on the inertia of A and the update parameters, enabling exact computation of inertia changes without a full eigendecomposition. This additivity property facilitates efficient tracking of eigenvalue sign changes in sequential updates. These quantitative tools find application in numerical linear algebra, particularly for error analysis in eigenvalue computations and stability assessments of matrix factorizations, where the classical law of inertia alone is insufficient for bounding approximation errors. For instance, they underpin perturbation theories in algorithms like the QR method, ensuring reliable inertia preservation amid rounding errors.
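
The Haynsworth formula is simple to verify numerically. The sketch below partitions a random symmetric matrix, forms the Schur complement of the leading block (assumed invertible, which holds generically), and compares the inertia triples on both sides of the identity; the inertia helper is an illustrative construction, not a library routine.

```python
import numpy as np

def inertia(H, tol=1e-10):
    eigs = np.linalg.eigvalsh(H)
    return (int(np.sum(eigs > tol)),
            int(np.sum(eigs < -tol)),
            int(np.sum(np.abs(eigs) <= tol)))

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
M = (M + M.T) / 2                      # random symmetric matrix
A, B, D = M[:2, :2], M[:2, 2:], M[2:, 2:]
S = D - B.T @ np.linalg.inv(A) @ B     # Schur complement of A in M (A generically invertible)

lhs = inertia(M)
rhs = tuple(x + y for x, y in zip(inertia(A), inertia(S)))
print(lhs, rhs)                        # Haynsworth additivity: In(M) = In(A) + In(M/A)
```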