Characteristic polynomial

In linear algebra, the characteristic polynomial of an n \times n matrix A is defined as p_A(\lambda) = \det(\lambda I - A), where I is the n \times n identity matrix and \lambda is a scalar variable; this yields a monic polynomial of degree n whose roots are the eigenvalues of A. The characteristic polynomial is invariant under similarity transformations, meaning that if B = P^{-1} A P for some invertible matrix P, then p_B(\lambda) = p_A(\lambda), which underscores its role in capturing intrinsic spectral properties of the matrix independent of basis choice. The coefficients of the polynomial are related to the traces of powers of A via Newton's identities, providing connections to other matrix invariants such as the determinant (the constant term, up to sign) and the trace (the coefficient of \lambda^{n-1}, up to sign). A cornerstone result involving the characteristic polynomial is the Cayley-Hamilton theorem, which states that every square matrix satisfies its own characteristic equation, so p_A(A) = 0; this theorem, first appearing in Arthur Cayley's 1858 work on matrices, enables efficient computation of high powers of matrices and has broad implications in algebra and analysis. The characteristic polynomial plays a pivotal role in spectral theory, facilitating the computation of eigenvalues essential for diagonalization and Jordan canonical form, and extends to applications in control theory for system stability analysis, functional analysis for operator spectra, and numerical methods for solving differential equations.

Basic Concepts

Motivation

The concept of the characteristic polynomial emerges from the fundamental quest in linear algebra to identify the eigenvalues of a matrix, which reveal essential properties of the associated linear transformation. Consider a square matrix A representing a linear transformation on a vector space. An eigenvalue \lambda is a scalar for which there exists a non-zero vector v (an eigenvector) satisfying A v = \lambda v. Rearranging this equation yields (A - \lambda I) v = 0, where I is the identity matrix. For this homogeneous system to have a non-trivial solution, the matrix A - \lambda I must be singular, meaning its determinant vanishes: \det(A - \lambda I) = 0. To obtain a monic polynomial (leading coefficient 1), the characteristic polynomial is conventionally defined using \det(\lambda I - A), whose roots are precisely the eigenvalues of A. This connection provides a polynomial equation whose solutions characterize the scaling factors of the transformation along certain directions.

This polynomial arises naturally from expanding the determinant \det(\lambda I - A), which for an n \times n matrix A produces a polynomial of degree n in \lambda. The expansion begins with the matrix \lambda I - A, whose entries are linear in \lambda; the determinant, being a multilinear function of the rows (or columns), expands into a sum of terms, each contributing powers of \lambda up to n, with the constant term being (-1)^n \det(A). This structure encodes the condition for the kernel of \lambda I - A to be non-trivial, directly linking the polynomial's roots to the eigenvalues of A.

Historically, the characteristic polynomial was developed by Augustin-Louis Cauchy in his 1829 memoir "Sur l'équation à l'aide de laquelle on détermine les inégalités séculaires des mouvements des planètes," where he employed it in the context of celestial mechanics to analyze secular perturbations in planetary orbits, using linear substitutions and quadratic forms. In this work, Cauchy introduced the terms "characteristic equation" (équation caractéristique) and "characteristic root" (racine caractéristique), and demonstrated that the eigenvalues of symmetric matrices are real, marking a pivotal advancement in the spectral theory of matrices. This laid the groundwork for later developments in spectral theory and operator theory, where spectral properties underpin the decomposition of transformations.

Intuitively, the "characteristic" nature of the polynomial stems from its invariance under similarity transformations: if B = P^{-1} A P for an invertible matrix P, then \det(\lambda I - B) = \det(\lambda I - P^{-1} A P) = \det(P^{-1} (\lambda I - A) P) = \det(\lambda I - A), preserving the characteristic polynomial. This invariance ensures that the characteristic polynomial captures intrinsic behavioral traits of the linear transformation, independent of the basis chosen to represent it, making it a robust descriptor of the operator's spectrum.

Formal Definition

The characteristic polynomial of an n \times n matrix A with entries in a field F is defined as the polynomial p_A(\lambda) = \det(\lambda I_n - A) in the polynomial ring F[\lambda], where I_n denotes the n \times n identity matrix and \det is the determinant function over F. This polynomial has degree n and is monic, meaning its leading coefficient is 1, because the leading term arises from \det(\lambda I_n) = \lambda^n, with lower-degree terms contributed by the entries of -A. Common notations for the characteristic polynomial include p_A(\lambda) and \chi_A(\lambda). Some texts define it as \det(A - \lambda I_n), which equals (-1)^n \det(\lambda I_n - A), introducing a sign alternation depending on the parity of n; in such cases, the monic version is obtained by multiplying by (-1)^n to ensure the leading coefficient is 1. More generally, for an endomorphism T (a linear map from a finite-dimensional vector space V over F to itself), the characteristic polynomial p_T(\lambda) is defined using any matrix representation A of T with respect to a basis of V, yielding p_T(\lambda) = \det(\lambda I - A); this definition is independent of the choice of basis, as similar matrices share the same characteristic polynomial.
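As a minimal numerical sketch of this definition (an illustrative addition, not from the original text; the matrix values are arbitrary), NumPy's np.poly returns the coefficients of the monic polynomial \det(\lambda I - A) in descending powers of \lambda, and its roots coincide with the eigenvalues of A:

```python
import numpy as np

# Example matrix (arbitrary choice for illustration)
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# np.poly on a square matrix returns the coefficients of its monic
# characteristic polynomial, highest power first.
coeffs = np.poly(A)             # [1., -5., 5.]  ->  lambda^2 - 5*lambda + 5
eigenvalues = np.roots(coeffs)  # roots of p_A(lambda)

print(coeffs)
print(np.sort(eigenvalues))                 # (5 - sqrt(5))/2, (5 + sqrt(5))/2
print(np.sort(np.linalg.eigvals(A)))        # same values, computed directly
```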

Illustrative Examples

Low-Dimensional Matrices

For the simplest case of a 1 \times 1 matrix A = (a), the characteristic polynomial is computed as p(\lambda) = \lambda - a. This linear polynomial directly reflects the matrix's single entry, and its root \lambda = a is the eigenvalue of A. Consider a general 2 \times 2 matrix A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}. The characteristic polynomial is p(\lambda) = \det(\lambda I - A), which expands to the determinant of \begin{pmatrix} \lambda - a & -b \\ -c & \lambda - d \end{pmatrix}. Expanding this determinant gives (\lambda - a)(\lambda - d) - (-b)(-c) = \lambda^2 - (a + d)\lambda + (ad - bc). The roots of this quadratic polynomial are the eigenvalues of A, counted with multiplicity if repeated. For a 3 \times 3 diagonal matrix A = \operatorname{diag}(\lambda_1, \lambda_2, \lambda_3), the characteristic polynomial simplifies to p(\lambda) = (\lambda - \lambda_1)(\lambda - \lambda_2)(\lambda - \lambda_3). This product form arises because the off-diagonal entries are zero, making the determinant a straightforward product of the diagonal entries of \lambda I - A. The roots \lambda_1, \lambda_2, \lambda_3 are precisely the eigenvalues, each with algebraic multiplicity one unless values repeat. In all these low-dimensional cases, the roots of the characteristic polynomial correspond to the eigenvalues of the matrix, with multiplicities indicating how many times each eigenvalue appears. This connection holds generally, as the eigenvalues are exactly the solutions of \det(\lambda I - A) = 0.
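The 2 \times 2 formula can be checked symbolically; the following short sketch (an added illustration, not from the original text) uses SymPy with symbolic entries a, b, c, d and compares the explicit determinant expansion with Matrix.charpoly:

```python
import sympy as sp

a, b, c, d, lam = sp.symbols('a b c d lam')
A = sp.Matrix([[a, b], [c, d]])

# Expand det(lambda*I - A) directly; equals lam**2 - (a + d)*lam + (a*d - b*c)
p = (lam * sp.eye(2) - A).det().expand()
print(p)

# charpoly returns the same monic characteristic polynomial
print(A.charpoly(lam).as_expr().expand())
```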

Structured Matrices

Matrices with specific structures often admit simplified expressions for their characteristic polynomials, revealing direct connections between the matrix entries and the eigenvalues. For a diagonal matrix D = \operatorname{diag}(d_1, d_2, \dots, d_n), the characteristic polynomial is given by \det(\lambda I - D) = \prod_{i=1}^n (\lambda - d_i). In this case, the eigenvalues are precisely the diagonal entries d_i, and the polynomial factors completely into linear terms corresponding to these values.

Upper and lower triangular matrices exhibit a similar property. For an upper triangular matrix T with diagonal entries t_1, t_2, \dots, t_n, the characteristic polynomial is \det(\lambda I - T) = \prod_{i=1}^n (\lambda - t_i), as the determinant of \lambda I - T is the product of its diagonal entries due to the triangular form. The eigenvalues are thus the diagonal elements, independent of the entries above (or below, for lower triangular matrices) the diagonal. This holds analogously for lower triangular matrices.

The companion matrix provides a canonical construction linking a monic polynomial directly to a matrix whose characteristic polynomial matches it. For a monic polynomial p(\lambda) = \lambda^n + a_{n-1} \lambda^{n-1} + \cdots + a_1 \lambda + a_0, the companion matrix C is the n \times n matrix with 1's on the first subdiagonal, last column consisting of -a_0, -a_1, \dots, -a_{n-1} from top to bottom, and zeros elsewhere. Explicitly, C = \begin{pmatrix} 0 & 0 & \cdots & 0 & -a_0 \\ 1 & 0 & \cdots & 0 & -a_1 \\ 0 & 1 & \cdots & 0 & -a_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & -a_{n-1} \end{pmatrix}. The characteristic polynomial of C is exactly p(\lambda), that is, \det(\lambda I - C) = p(\lambda), which follows from expanding the determinant along the first row and using induction on the polynomial degree. This construction is fundamental for realizing any monic polynomial as the characteristic polynomial of some matrix.

A Jordan block J_k(\mu) of size k with eigenvalue \mu is an upper triangular matrix with \mu on the diagonal and 1's on the superdiagonal, zeros elsewhere: J_k(\mu) = \begin{pmatrix} \mu & 1 & 0 & \cdots & 0 \\ 0 & \mu & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & \mu & 1 \\ 0 & 0 & \cdots & 0 & \mu \end{pmatrix}. Its characteristic polynomial is (\lambda - \mu)^k, reflecting the algebraic multiplicity k of the eigenvalue \mu, with all eigenvalues equal to \mu. This follows because \lambda I - J_k(\mu) is upper triangular with \lambda - \mu on the diagonal.
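The companion-matrix construction is easy to verify numerically. The sketch below (an added illustration, not from the original text; the helper name companion and the cubic example are ad hoc) builds the companion matrix of a monic cubic and checks that its characteristic polynomial recovers the original coefficients:

```python
import numpy as np

def companion(coeffs_low_to_high):
    """Companion matrix of lambda^n + a_{n-1} lambda^{n-1} + ... + a_0,
    given the list [a_0, a_1, ..., a_{n-1}]."""
    a = np.asarray(coeffs_low_to_high, dtype=float)
    n = a.size
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)   # 1's on the first subdiagonal
    C[:, -1] = -a                # last column: -a_0, ..., -a_{n-1}, top to bottom
    return C

# p(lambda) = lambda^3 - 6 lambda^2 + 11 lambda - 6 = (l-1)(l-2)(l-3)
C = companion([-6.0, 11.0, -6.0])
print(np.poly(C))                      # [ 1. -6. 11. -6.]  -> recovers p
print(np.sort(np.linalg.eigvals(C)))   # [1. 2. 3.]
```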

Algebraic Properties

Invariance and Trace Relations

One fundamental property of the characteristic polynomial p_A(\lambda) = \det(\lambda I - A) of an n \times n matrix A is its invariance under similarity transformations. Specifically, if P is an invertible matrix, then the characteristic polynomial of P^{-1} A P equals that of A: p_{P^{-1} A P}(\lambda) = p_A(\lambda). This follows from the determinant identity \det(\lambda I - P^{-1} A P) = \det(P^{-1} (\lambda I - A) P) = \det(P^{-1}) \det(\lambda I - A) \det(P) = \det(\lambda I - A), since \det(P^{-1}) \det(P) = 1. This invariance underscores the characteristic polynomial's role as a similarity invariant, capturing essential spectral information independent of the basis chosen for the matrix representation.

The coefficients of the characteristic polynomial connect directly to key matrix invariants through Vieta's formulas, applied to its roots—the eigenvalues of A counted with algebraic multiplicity. For the monic polynomial p_A(\lambda) = \lambda^n + c_{n-1} \lambda^{n-1} + \cdots + c_1 \lambda + c_0, the sum of the roots (with multiplicity) is -c_{n-1}, so the trace of A satisfies \operatorname{tr}(A) = \sum \lambda_i = -c_{n-1}, the negative of the coefficient of \lambda^{n-1}. Similarly, the product of the roots (with multiplicity) determines the constant term: c_0 = p_A(0) = \det(-A) = (-1)^n \det(A) = (-1)^n \prod \lambda_i, so \det(A) = \prod \lambda_i. The leading coefficient is always 1, ensuring the polynomial is monic, while the constant term is precisely (-1)^n \det(A).

As the unique monic polynomial of degree n whose roots are exactly the eigenvalues of A with their algebraic multiplicities, the characteristic polynomial provides a complete algebraic encapsulation of the spectrum. This uniqueness stems from the fundamental theorem of algebra, which guarantees that the polynomial factors completely over the complex numbers with the eigenvalues as its roots with the specified multiplicities, while the monic normalization distinguishes it from scalar multiples.
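A quick numerical check of these relations (an added sketch using arbitrary random matrices, not part of the original text):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
P = rng.standard_normal((4, 4))          # generically invertible

coeffs_A   = np.poly(A)                  # monic: [1, c_{n-1}, ..., c_0]
coeffs_sim = np.poly(np.linalg.inv(P) @ A @ P)

print(np.allclose(coeffs_A, coeffs_sim))                      # similarity invariance
print(np.isclose(-coeffs_A[1], np.trace(A)))                  # tr(A) = -c_{n-1}
print(np.isclose(coeffs_A[-1], (-1)**4 * np.linalg.det(A)))   # c_0 = (-1)^n det(A)
```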

Cayley-Hamilton Theorem

The Cayley-Hamilton theorem states that if A is an n \times n matrix over a commutative ring, and p_A(\lambda) = \det(\lambda I - A) = \lambda^n + c_{n-1} \lambda^{n-1} + \cdots + c_1 \lambda + c_0 is its characteristic polynomial, then p_A(A) = A^n + c_{n-1} A^{n-1} + \cdots + c_1 A + c_0 I = 0, the zero matrix. The theorem was independently discovered by William Rowan Hamilton in 1853, who proved it in the context of inverses of linear functions of quaternions, and by Arthur Cayley in 1858, who stated it for general matrices in his seminal paper on matrix theory, verifying the low-dimensional cases.

A standard proof uses the adjugate matrix. Recall that for any square matrix B, B \cdot \adj(B) = \det(B) I. Consider the characteristic matrix \lambda I - A, whose adjugate is a matrix of polynomials in \lambda of degree at most n-1, say \adj(\lambda I - A) = \sum_{k=0}^{n-1} P_k \lambda^k, where each P_k is an n \times n matrix whose entries are polynomials in the entries of A. Then (\lambda I - A) \cdot \adj(\lambda I - A) = p_A(\lambda) I, an identity between polynomials in \lambda with matrix coefficients. Writing p_A(\lambda) = \sum_{k=0}^{n} c_k \lambda^k with c_n = 1 and comparing coefficients of \lambda^k on both sides gives -A P_0 = c_0 I, P_{k-1} - A P_k = c_k I for 1 \le k \le n-1, and P_{n-1} = c_n I = I. Multiplying the k-th relation on the left by A^k and summing over k, the left-hand side telescopes to the zero matrix, while the right-hand side equals \sum_{k=0}^{n} c_k A^k = p_A(A); hence p_A(A) = 0. (Naively substituting \lambda = A directly into the determinant identity is not a valid step, since \lambda is a scalar variable and the identity involves matrix coefficients; the coefficient comparison makes the argument rigorous.)

The theorem implies that every square matrix satisfies a monic polynomial equation of degree at most n, and that the minimal polynomial of the matrix—the monic polynomial of least degree annihilating the matrix—divides the characteristic polynomial.
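A numerical sanity check of the theorem (an added sketch with a random example matrix, not from the original text), evaluating p_A(A) by Horner's scheme:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
n = A.shape[0]

coeffs = np.poly(A)              # [1, c_{n-1}, ..., c_0], highest degree first

# Horner evaluation of p_A(A) = A^n + c_{n-1} A^{n-1} + ... + c_1 A + c_0 I
P = np.zeros_like(A)
for c in coeffs:
    P = P @ A + c * np.eye(n)

print(np.allclose(P, 0))         # True, up to floating-point rounding
```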

Special Cases

Products of Matrices

When two square matrices A and B of the same size commute, meaning AB = BA, they can be simultaneously upper triangularized over the complex numbers. In this common triangular basis, the diagonal entries of the triangular form of AB are the products of the corresponding diagonal entries of the triangular forms of A and B. Consequently, the eigenvalues of AB (counted with algebraic multiplicity) are products of eigenvalues of A with eigenvalues of B. Thus the roots of the characteristic polynomial p_{AB}(\lambda) are these pairwise products, which determines p_{AB}(\lambda) up to the specific pairing of eigenvalues induced by the simultaneous triangularization.

This multiplicative description of the eigenvalues relies on the commutativity assumption. Without commutativity, the eigenvalues of AB generally do not arise as products of individual eigenvalues of A and B. For instance, consider the non-commuting 2 \times 2 matrices A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, each with characteristic polynomial p_A(\lambda) = p_B(\lambda) = (\lambda - 1)^2 and eigenvalues 1, 1. Their product is AB = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}, with characteristic polynomial p_{AB}(\lambda) = \lambda^2 - 3\lambda + 1 and eigenvalues \frac{3 \pm \sqrt{5}}{2}, which are not products of the eigenvalues of A and B.

For rectangular matrices, the situation differs: AB and BA are both square but of potentially different sizes. Let A be an m \times n matrix and B an n \times m matrix, with m \geq n. The non-zero eigenvalues of AB and BA coincide (with matching algebraic multiplicities), while AB has m - n additional zero eigenvalues beyond those of BA. This implies p_{AB}(\lambda) = \lambda^{m-n} p_{BA}(\lambda), or equivalently \det(\lambda I_m - AB) = \lambda^{m-n} \det(\lambda I_n - BA). This relation holds regardless of commutativity, as it follows from block matrix determinant identities applied to augmented forms of A and B. If m < n, the roles of AB and BA reverse symmetrically.
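A short numerical check of the rectangular-case identity (an added sketch with arbitrary random matrices, not from the original text):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, m))

p_AB = np.poly(A @ B)                                    # degree m
p_BA = np.poly(B @ A)                                    # degree n
p_BA_shifted = np.concatenate([p_BA, np.zeros(m - n)])   # multiply by lambda^(m-n)

print(np.allclose(p_AB, p_BA_shifted))                   # True
```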

Powers of a Single Matrix

The eigenvalues of the kth power A^k of an n \times n matrix A are \mu_j^k, where \mu_1, \dots, \mu_n are the eigenvalues of A (counted with algebraic multiplicity). If Av = \mu v for a nonzero vector v, then A^k v = \mu^k v, so \mu^k is an eigenvalue of A^k with the same eigenvector; that the algebraic multiplicities carry over follows from triangularization: if A = Q T Q^{-1} with T upper triangular and diagonal entries \mu_1, \dots, \mu_n, then A^k = Q T^k Q^{-1}, and T^k is upper triangular with diagonal entries \mu_1^k, \dots, \mu_n^k.

Suppose the characteristic polynomial of A factors as p_A(\lambda) = \prod_{j=1}^s (\lambda - \mu_j)^{m_j}, where \mu_1, \dots, \mu_s are the distinct eigenvalues with algebraic multiplicities m_1, \dots, m_s satisfying \sum m_j = n. Then the characteristic polynomial of A^k is p_{A^k}(\lambda) = \prod_{j=1}^s (\lambda - \mu_j^k)^{m_j}, since the roots of p_{A^k}(\lambda) are precisely the eigenvalues \mu_j^k with the same multiplicities m_j.

For large k, the roots of p_{A^k}(\lambda) exhibit asymptotic behavior dominated by the spectral radius \rho(A) = \max_j |\mu_j|. Specifically, the eigenvalues of A^k with magnitude close to \rho(A)^k arise from those \mu_j satisfying |\mu_j| = \rho(A), while the others satisfy |\mu_j^k| = o(\rho(A)^k) and thus become negligible relative to the dominant scale. If there are multiple peripheral eigenvalues (those with |\mu_j| = \rho(A)), their kth powers lie on the circle of radius \rho(A)^k in the complex plane, determining the leading asymptotic growth of the entries of A^k.

This property finds application in solving linear recurrence relations via companion matrices. For the Fibonacci sequence defined by F_0 = 0, F_1 = 1, and F_n = F_{n-1} + F_{n-2} for n \geq 2, the companion matrix is C = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}, whose characteristic polynomial is p_C(\lambda) = \lambda^2 - \lambda - 1 with roots \phi = (1 + \sqrt{5})/2 and \hat{\phi} = (1 - \sqrt{5})/2. The powers C^k = \begin{pmatrix} F_{k-1} & F_k \\ F_k & F_{k+1} \end{pmatrix} yield the sequence terms as entries, and p_{C^k}(\lambda) = (\lambda - \phi^k)(\lambda - \hat{\phi}^k); since |\hat{\phi}| < 1 < \phi = \rho(C), the root \hat{\phi}^k approaches 0 as k increases, while \phi^k = \rho(C)^k dominates. This illustrates how powering amplifies the dominant eigenvalue in recurrence solutions.
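The Fibonacci example is easy to reproduce numerically (an added sketch, not from the original text; the power k = 10 is an arbitrary choice):

```python
import numpy as np

# Companion matrix of lambda^2 - lambda - 1
C = np.array([[0, 1],
              [1, 1]])

Ck = np.linalg.matrix_power(C, 10)
print(Ck)                     # [[34 55] [55 89]] = [[F_9, F_10], [F_10, F_11]]

phi = (1 + np.sqrt(5)) / 2    # dominant root
phi_hat = (1 - np.sqrt(5)) / 2
# Eigenvalues of C^10 are phi^10 and phi_hat^10; the latter is nearly 0.
print(np.sort(np.linalg.eigvals(Ck)))
print(phi_hat**10, phi**10)   # ~0.0081 and ~122.99
```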

Advanced Generalizations

Secular Function

In the context of perturbation theory, the secular function refers to the characteristic polynomial of a perturbed matrix, particularly when analyzing small deviations from an unperturbed system. For an unperturbed matrix A and a small perturbation \varepsilon B, the secular function is \det(\lambda I - A - \varepsilon B), which approximates the unperturbed characteristic polynomial p_A(\lambda) to first order in \varepsilon as p_A(\lambda) - \varepsilon \trace(\adj(\lambda I - A) B). This expansion arises from the Jacobi formula for the derivative of the determinant, providing insight into how eigenvalues shift under small changes.

The terminology gained prominence in solid-state physics, where Slater and Koster employed it in 1954 to describe the determinant arising in the tight-binding (LCAO) method for energy bands in periodic potentials with impurities. In this framework, the secular function encapsulates the effects of local perturbations on the electronic structure, facilitating solutions to otherwise intractable band structure problems.

In degenerate perturbation theory within quantum mechanics, the secular function determines the first-order corrections to degenerate eigenvalues by restricting the problem to the degenerate subspace. The corrected energies E satisfy the secular equation \det( \langle \phi_i | V | \phi_j \rangle - (E - E_0) \delta_{ij} ) = 0, where \{ \phi_i \} is an orthonormal basis for the degenerate subspace at unperturbed energy E_0, and V is the perturbation; this reduces to solving a low-dimensional eigenvalue problem for the matrix of perturbation matrix elements. This approach lifts the degeneracy and yields good approximations even beyond first order in many cases.

Numerically, the secular function is central to divide-and-conquer algorithms for the symmetric tridiagonal eigenvalue problem, as developed by Cuppen in 1981. These methods recursively partition the tridiagonal matrix into smaller blocks, solve them separately, and then merge the solutions by finding the roots of a secular equation arising from a rank-one update, such as \det(D + \rho v v^T - \lambda I) = 0, where D is diagonal with the known eigenvalues of the subproblems. This enables efficient, parallelizable computation of all eigenvalues and eigenvectors, with complexity approaching O(n^2) in practice for n \times n matrices.
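The merge step can be illustrated on a small diagonal-plus-rank-one example (an added sketch, not from the original text; the values, interval endpoints, and tolerances are ad hoc). For distinct d_i and nonzero v_i, \det(D + \rho v v^T - \lambda I) = 0 is equivalent to the scalar secular equation 1 + \rho \sum_i v_i^2 / (d_i - \lambda) = 0, whose roots interlace the d_i for \rho > 0:

```python
import numpy as np
from scipy.optimize import brentq

d = np.array([1.0, 2.0, 3.0, 4.0])   # diagonal of D (sorted, distinct)
v = np.array([0.5, 0.3, 0.2, 0.4])   # rank-one update vector (all nonzero)
rho = 1.0

def secular(lam):
    return 1.0 + rho * np.sum(v**2 / (d - lam))

# For rho > 0 there is one root in each interval (d_i, d_{i+1})
# and one root above d_n (bounded by d_n + rho * ||v||^2).
brackets = [(d[i] + 1e-10, d[i + 1] - 1e-10) for i in range(len(d) - 1)]
brackets.append((d[-1] + 1e-10, d[-1] + rho * np.dot(v, v) + 1.0))
roots = np.array([brentq(secular, a, b) for a, b in brackets])

print(roots)
print(np.linalg.eigvalsh(np.diag(d) + rho * np.outer(v, v)))   # agrees
```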

General Associative Algebras

In a finite-dimensional associative algebra over a field k, the characteristic polynomial of an element extends the matrix case through the regular representation. Let A be such an algebra of dimension n over k, equipped with a basis \{e_1, \dots, e_n\}. For any a \in A, left multiplication by a gives the linear map L_a: A \to A defined by L_a(x) = a x for all x \in A. Relative to the basis, L_a corresponds to an n \times n matrix with entries in k, and the characteristic polynomial of a is p_a(\lambda) = \det(\lambda I - L_a). This construction yields a monic polynomial of degree n.

The roots of p_a(\lambda) are the eigenvalues of L_a, which generalize the notion of eigenvalues to elements of A via the regular representation. By Vieta's formulas, the coefficients of p_a(\lambda) are symmetric functions of these eigenvalues; in particular, the coefficient of \lambda^{n-1} is -\operatorname{trace}(L_a), and the constant term is (-1)^n \det(L_a). The trace and norm functions on A defined in this way via the regular representation extend the classical matrix invariants of trace and determinant to the algebraic setting; the trace is linear and inherits the cyclic property \operatorname{trace}(L_{ab}) = \operatorname{trace}(L_{ba}) from the matrix trace.

A concrete illustration arises in the quaternion algebra \mathbb{H} over \mathbb{R}, a 4-dimensional non-commutative division algebra. The reduced characteristic polynomial of the basis element i (satisfying i^2 = -1) is \lambda^2 + 1, while the full characteristic polynomial of the left multiplication operator L_i is (\lambda^2 + 1)^2. In this case, the reduced polynomial reflects the structure of the irreducible representations of \mathbb{H}.

For non-commutative algebras, p_a(\lambda) remains a well-defined polynomial over k, independent of the choice of basis. However, the eigenvalues—lying in an algebraic closure of k rather than in A—need not correspond to elements commuting with a itself, distinguishing the non-commutative scenario from the commutative case, where eigenvalues lie in the center.
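As a worked check of the quaternion example (an added sketch, not from the original text), the matrix of L_i in the basis (1, i, j, k) follows from the multiplication rules i \cdot 1 = i, i \cdot i = -1, i \cdot j = k, i \cdot k = -j, and its characteristic polynomial can be computed numerically:

```python
import numpy as np

# Columns give the coordinates of i*1, i*i, i*j, i*k in the basis (1, i, j, k).
L_i = np.array([
    [0, -1,  0,  0],   # coefficient of 1
    [1,  0,  0,  0],   # coefficient of i
    [0,  0,  0, -1],   # coefficient of j
    [0,  0,  1,  0],   # coefficient of k
])

print(np.poly(L_i))    # approximately [1, 0, 2, 0, 1], i.e. (lambda^2 + 1)^2
```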