In linear algebra, the characteristic polynomial of an n \times n square matrix A is defined as p_A(\lambda) = \det(\lambda I - A), where I is the n \times n identity matrix and \lambda is a scalar variable; this yields a monic polynomial of degree n whose roots are the eigenvalues of A.[1][2]

The characteristic polynomial is invariant under similarity transformations, meaning that if B = P^{-1} A P for some invertible matrix P, then p_B(\lambda) = p_A(\lambda), which underscores its role in capturing intrinsic spectral properties of the matrix independent of basis choice.[2] The coefficients of the polynomial are related to the traces of powers of A via Newton identities, providing connections to other matrix invariants like the determinant (the constant term, up to sign) and the trace (the coefficient of \lambda^{n-1}, up to sign).[3]

A cornerstone result involving the characteristic polynomial is the Cayley–Hamilton theorem, which states that every square matrix satisfies its own characteristic equation, so p_A(A) = 0; this theorem, first appearing in Arthur Cayley's 1858 work on matrices, enables efficient computation of high powers of matrices and has broad implications in algebra and analysis.[2][4]

The characteristic polynomial plays a pivotal role in spectral theory, facilitating the computation of eigenvalues and eigenvectors essential for diagonalization and the Jordan canonical form, and extends to applications in control theory for system stability analysis, quantum mechanics for operator spectra, and numerical methods for solving differential equations.[1][5]
Basic Concepts
Motivation
The concept of the characteristic polynomial emerges from the fundamental quest in linear algebra to identify the eigenvalues of a matrix, which reveal essential properties of the associated linear transformation. Consider a square matrix A representing a linear operator on a vector space. An eigenvalue \lambda is a scalar for which there exists a non-zero vector v (an eigenvector) satisfying A v = \lambda v. Rearranging this equation yields (A - \lambda I) v = 0, where I is the identity matrix. For this homogeneous system to have a non-trivial solution, the matrix A - \lambda I must be singular, meaning its determinant vanishes: \det(A - \lambda I) = 0. To obtain a monic polynomial (leading coefficient 1), the characteristic polynomial is conventionally defined using \det(\lambda I - A), whose roots are precisely the eigenvalues of A. This connection provides a polynomial equation whose solutions characterize the scaling factors of the transformation along certain directions.[6]

This polynomial arises naturally from expanding the determinant \det(\lambda I - A), which for an n \times n matrix A produces a degree-n polynomial in \lambda. A brief derivation begins with the matrix \lambda I - A, whose entries are polynomials in \lambda of degree at most one; the determinant, being a multilinear function of the rows (or columns), expands into a sum of terms, each contributing powers of \lambda up to n, with the constant term being (-1)^n \det(A). This structure encodes the condition for the kernel of \lambda I - A to be non-trivial, directly linking the polynomial's roots to the spectrum of A.[6]

Historically, the characteristic polynomial was developed by Augustin-Louis Cauchy in his 1829 memoir "Sur l'équation à l'aide de laquelle on détermine les inégalités séculaires des mouvements des planètes," where he employed it in the context of celestial mechanics to analyze secular perturbations in planetary orbits, using linear substitutions and quadratic forms. In this work, Cauchy introduced the terms "characteristic equation" (équation caractéristique) and "characteristic root" (racine caractéristique), and demonstrated that the eigenvalues of symmetric matrices are real, marking a pivotal advancement in the spectral theory of matrices. This laid the groundwork for later developments in operator theory and quantum mechanics, where spectral properties underpin the decomposition of transformations.[7]

Intuitively, the "characteristic" nature of the polynomial stems from its invariance under similarity transformations: if B = P^{-1} A P for an invertible matrix P, then \det(\lambda I - B) = \det(\lambda I - P^{-1} A P) = \det(P^{-1} (\lambda I - A) P) = \det(\lambda I - A), preserving the polynomial. This invariance ensures that the characteristic polynomial captures intrinsic behavioral traits of the linear transformation, independent of the basis chosen to represent the matrix, making it a robust descriptor of the operator's spectrum.[8]
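As a small numerical illustration of the singularity condition (a sketch using NumPy; the 2 \times 2 matrix is an arbitrary example, not one discussed above), the determinant of A - \lambda I vanishes, up to floating-point error, exactly at the eigenvalues of A:

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])              # arbitrary example matrix

for lam in np.linalg.eigvals(A):         # roots of the characteristic polynomial
    # A - lam*I is singular at each eigenvalue, so its determinant is ~0
    print(lam, np.linalg.det(A - lam * np.eye(2)))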
Formal Definition
The characteristic polynomial of an n \times n matrix A with entries in a field F is defined as the polynomial p_A(\lambda) = \det(\lambda I_n - A) in the polynomial ring F[\lambda], where I_n denotes the n \times n identity matrix and \det is the determinant function over F.[8][9] This polynomial has degree n and is monic, meaning its leading coefficient is 1, because the leading term arises from \det(\lambda I_n) = \lambda^n, with lower-degree terms contributed by the entries of -A.[8]

Common notations for the characteristic polynomial include p_A(\lambda) or \chi_A(\lambda). Some texts define it as \det(A - \lambda I_n), which equals (-1)^n \det(\lambda I_n - A), introducing a sign alternation depending on the parity of n; in such cases, the monic version is obtained by multiplying by (-1)^n to ensure the leading coefficient is 1.[3][10]

More generally, for an endomorphism T (a linear map from a finite-dimensional vector space V over F to itself), the characteristic polynomial p_T(\lambda) is defined using any matrix representation A of T with respect to a basis of V, yielding p_T(\lambda) = \det(\lambda I - A); this is independent of the choice of basis, as similar matrices share the same characteristic polynomial.[9]
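The definition can be checked symbolically; the following sketch uses SymPy (the upper triangular 3 \times 3 matrix is an arbitrary illustration), whose charpoly method returns the monic polynomial \det(\lambda I - A):

import sympy as sp

lam = sp.symbols('lambda')
A = sp.Matrix([[2, 1, 0],
               [0, 3, 1],
               [0, 0, 5]])              # arbitrary upper triangular example

p = A.charpoly(lam)                      # monic, degree 3
print(p.as_expr())                       # lambda**3 - 10*lambda**2 + 31*lambda - 30
print(sp.factor(p.as_expr()))            # (lambda - 2)*(lambda - 3)*(lambda - 5)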
Illustrative Examples
Low-Dimensional Matrices
For the simplest case of a 1 \times 1 matrix A = \begin{pmatrix} a \end{pmatrix}, the characteristic polynomial is computed as p(\lambda) = \lambda - a.[3][11] This linear polynomial directly reflects the matrix's single entry, and its root \lambda = a is the eigenvalue of A.[3][11]

Consider a general 2 \times 2 matrix A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}. The characteristic polynomial is p(\lambda) = \det(\lambda I - A), which expands to the determinant of \begin{pmatrix} \lambda - a & -b \\ -c & \lambda - d \end{pmatrix}. Expanding this determinant gives (\lambda - a)(\lambda - d) - (-b)(-c) = \lambda^2 - (a + d)\lambda + (ad - bc).[3][11] The roots of this quadratic polynomial are the eigenvalues of A, counted with multiplicity if repeated.[3][11]

For a 3 \times 3 diagonal matrix A = \operatorname{diag}(\lambda_1, \lambda_2, \lambda_3), the characteristic polynomial simplifies to p(\lambda) = (\lambda - \lambda_1)(\lambda - \lambda_2)(\lambda - \lambda_3).[3][11] This product form arises because the off-diagonal entries are zero, so the determinant of \lambda I - A is simply the product of its diagonal entries. The roots \lambda_1, \lambda_2, \lambda_3 are precisely the eigenvalues, each with algebraic multiplicity one unless values repeat.[3][11]

In all these low-dimensional cases, the roots of the characteristic polynomial correspond to the eigenvalues of the matrix, with multiplicities indicating how many times each eigenvalue appears.[3][11] This connection holds generally, as the eigenvalues solve \det(\lambda I - A) = 0.[3]
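The 2 \times 2 formula can be verified with a short symbolic computation (a sketch using SymPy):

import sympy as sp

a, b, c, d, lam = sp.symbols('a b c d lambda')
A = sp.Matrix([[a, b],
               [c, d]])

# det(lambda*I - A) expands to lambda**2 - (a + d)*lambda + (a*d - b*c)
print(sp.expand((lam * sp.eye(2) - A).det()))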
Structured Matrices
Matrices with specific structures often admit simplified expressions for their characteristic polynomials, revealing direct connections between the matrix entries and the eigenvalues.

For a diagonal matrix D = \operatorname{diag}(d_1, d_2, \dots, d_n), the characteristic polynomial is given by \det(\lambda I - D) = \prod_{i=1}^n (\lambda - d_i). In this case, the eigenvalues are precisely the diagonal entries d_i, and the polynomial factors completely into linear terms corresponding to these values.[11]

Upper and lower triangular matrices exhibit a similar property. For an upper triangular matrix T with diagonal entries t_1, t_2, \dots, t_n, the characteristic polynomial is \det(\lambda I - T) = \prod_{i=1}^n (\lambda - t_i), as the determinant of \lambda I - T is the product of the diagonal entries due to its triangular form. The eigenvalues are thus the diagonal elements, independent of the entries above (or below, for lower triangular) the diagonal. This holds analogously for lower triangular matrices.[12]

The companion matrix provides a canonical construction linking a monic polynomial directly to a matrix whose characteristic polynomial matches it. For a monic polynomial p(\lambda) = \lambda^n + a_{n-1} \lambda^{n-1} + \cdots + a_1 \lambda + a_0, the companion matrix C is the n \times n matrix with 1's on the first subdiagonal, last column consisting of -a_0, -a_1, \dots, -a_{n-1} from top to bottom, and zeros elsewhere. Explicitly,

C = \begin{pmatrix}
0 & 0 & \cdots & 0 & -a_0 \\
1 & 0 & \cdots & 0 & -a_1 \\
0 & 1 & \cdots & 0 & -a_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & -a_{n-1}
\end{pmatrix}.

The characteristic polynomial of C is exactly p(\lambda), as \det(\lambda I - C) = p(\lambda), which follows from expanding the determinant along the first row and using induction on the polynomial degree. This construction is fundamental for realizing any monic polynomial as the characteristic polynomial of some matrix.[13]

A Jordan block J_k(\mu) of size k with eigenvalue \mu is an upper triangular matrix with \mu on the diagonal and 1's on the superdiagonal, zeros elsewhere:

J_k(\mu) = \begin{pmatrix}
\mu & 1 & 0 & \cdots & 0 \\
0 & \mu & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & \mu & 1 \\
0 & 0 & \cdots & 0 & \mu
\end{pmatrix}.

Its characteristic polynomial is (\lambda - \mu)^k, reflecting the algebraic multiplicity k of the eigenvalue \mu, with all eigenvalues equal to \mu. This arises because \lambda I - J_k(\mu) is upper triangular with \lambda - \mu on the diagonal.
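Both constructions can be checked symbolically; the following sketch uses SymPy with an arbitrary cubic p(\lambda) = \lambda^3 + 6\lambda^2 + 11\lambda + 6 and a 3 \times 3 Jordan block:

import sympy as sp

lam, mu = sp.symbols('lambda mu')

# Companion matrix of lambda**3 + 6*lambda**2 + 11*lambda + 6 (example coefficients)
C = sp.Matrix([[0, 0, -6],
               [1, 0, -11],
               [0, 1, -6]])
print(sp.expand((lam * sp.eye(3) - C).det()))   # lambda**3 + 6*lambda**2 + 11*lambda + 6

# Jordan block J_3(mu)
J = sp.Matrix([[mu, 1, 0],
               [0, mu, 1],
               [0, 0, mu]])
print(sp.factor((lam * sp.eye(3) - J).det()))   # (lambda - mu)**3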
Algebraic Properties
Invariance and Trace Relations
One fundamental property of the characteristic polynomial p_A(\lambda) = \det(\lambda I - A) of an n \times n matrix A is its invariance under similarity transformations. Specifically, if P is an invertible matrix, then the characteristic polynomial of P^{-1} A P equals that of A: p_{P^{-1} A P}(\lambda) = p_A(\lambda). This follows from the determinant identity

\det(\lambda I - P^{-1} A P) = \det(P^{-1} (\lambda I - A) P) = \det(P^{-1}) \det(\lambda I - A) \det(P) = \det(\lambda I - A),

since \det(P^{-1}) \det(P) = 1.[11] This invariance underscores the characteristic polynomial's role as a similarity invariant, capturing essential spectral information independent of the basis chosen for the matrix representation.[14]

The coefficients of the characteristic polynomial connect directly to key matrix invariants through Vieta's formulas, applied to its roots—the eigenvalues of A counted with algebraic multiplicity. For the monic polynomial p_A(\lambda) = \lambda^n + c_{n-1} \lambda^{n-1} + \cdots + c_1 \lambda + c_0, the sum of the roots is -c_{n-1}, so the trace of A equals the negative of the coefficient of \lambda^{n-1}: \operatorname{tr}(A) = \sum \lambda_i = -c_{n-1}. Similarly, the product of the roots determines the constant term, c_0 = (-1)^n \prod \lambda_i, yielding \det(A) = \prod \lambda_i and c_0 = (-1)^n \det(A) = \det(-A).[15] The leading coefficient is always 1, ensuring the polynomial is monic, while the constant term is precisely (-1)^n \det(A).[3]

As the unique monic polynomial of degree n whose roots are exactly the eigenvalues of A with their algebraic multiplicities, the characteristic polynomial provides a complete algebraic encapsulation of the spectrum. This uniqueness stems from the fundamental theorem of algebra, guaranteeing that the eigenvalues are the roots with the specified multiplicities in the complex numbers, and the monic normalization distinguishes it from scalar multiples.[3]
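These relations are easy to verify numerically; the sketch below uses NumPy (the matrices are random examples), where numpy.poly returns the monic characteristic polynomial coefficients [1, c_{n-1}, \dots, c_0]:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
P = rng.standard_normal((4, 4))               # almost surely invertible
n = A.shape[0]

coeffs = np.poly(A)                            # monic coefficients of det(lambda*I - A)
print(np.allclose(coeffs[1], -np.trace(A)))    # c_{n-1} = -tr(A)
print(np.allclose(coeffs[-1], (-1)**n * np.linalg.det(A)))   # c_0 = (-1)^n det(A)

B = np.linalg.inv(P) @ A @ P                   # similar matrix
print(np.allclose(np.poly(B), coeffs))         # same characteristic polynomial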
Cayley-Hamilton Theorem
The Cayley-Hamilton theorem states that if A is an n \times n matrix over a commutative ring, and p_A(\lambda) = \det(\lambda I - A) = \lambda^n + c_{n-1} \lambda^{n-1} + \cdots + c_1 \lambda + c_0 is its characteristic polynomial, then p_A(A) = A^n + c_{n-1} A^{n-1} + \cdots + c_1 A + c_0 I = 0, the zero matrix.[16]

The theorem was independently discovered by William Rowan Hamilton in 1853, who proved it in the context of inverses of linear functions of quaternions, and by Arthur Cayley in 1858, who stated the general result for matrices and verified it for small orders in his seminal paper on matrix theory.[17][18]

A standard proof uses the adjugate matrix. Recall that for any square matrix B, B \cdot \adj(B) = \det(B) I. Applied to the characteristic matrix \lambda I - A, whose adjugate has entries that are polynomials in \lambda of degree at most n-1, this gives

(\lambda I - A) \cdot \adj(\lambda I - A) = p_A(\lambda) I,

where \adj(\lambda I - A) = \sum_{k=0}^{n-1} P_k \lambda^k and each P_k is an n \times n matrix whose entries are polynomials in the entries of A. This is an identity between polynomials in \lambda with matrix coefficients, so the coefficients of each power of \lambda must agree on the two sides: P_{n-1} = I, P_{k-1} - A P_k = c_k I for 1 \le k \le n-1, and -A P_0 = c_0 I. Multiplying the equation for the coefficient of \lambda^k on the left by A^k and summing over k, the left-hand sides telescope to the zero matrix, while the right-hand sides sum to p_A(A); hence p_A(A) = 0. (Naively substituting the matrix A for the scalar \lambda in the identity is not a valid step on its own, which is why the coefficient comparison is needed.)[16][19]

The theorem implies that every square matrix satisfies a monic polynomial equation of degree at most n, and the minimal polynomial of the matrix, which is the monic polynomial of least degree annihilating the matrix, divides the characteristic polynomial.[8]
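The theorem can be illustrated numerically by evaluating the characteristic polynomial at the matrix itself; a sketch with NumPy on an arbitrary 3 \times 3 example:

import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])          # arbitrary example matrix

c = np.poly(A)                            # [1, c2, c1, c0] of det(lambda*I - A)
n = A.shape[0]

# Evaluate p_A(A) = A^3 + c2*A^2 + c1*A + c0*I by Horner's scheme
pA = np.zeros_like(A)
for coef in c:
    pA = pA @ A + coef * np.eye(n)

print(np.allclose(pA, 0))                 # True: p_A(A) is the zero matrix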
Special Cases
Products of Matrices
When two square matrices A and B of the same size commute, meaning AB = BA, they can be simultaneously upper triangularized over the complex numbers. In this common triangular basis, the diagonal entries of the triangular form of AB are the products of the corresponding diagonal entries of the triangular forms of A and B. Consequently, the eigenvalues of AB (counting algebraic multiplicities) are products of suitably paired eigenvalues of A and B, so the roots of the characteristic polynomial p_{AB}(\lambda) are products of the roots of p_A(\lambda) and p_B(\lambda), with the pairing induced by the simultaneous triangularization.

This multiplicative property of eigenvalues holds only under the commutativity assumption. Without commutativity, the eigenvalues of AB generally do not form the set of products of individual eigenvalues from A and B. For instance, consider the non-commuting 2 \times 2 matrices

A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix},

each with characteristic polynomial p_A(\lambda) = p_B(\lambda) = (\lambda - 1)^2 and eigenvalues 1, 1. Their product is

AB = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix},

with characteristic polynomial p_{AB}(\lambda) = \lambda^2 - 3\lambda + 1 and eigenvalues \frac{3 \pm \sqrt{5}}{2}, which are not products of the eigenvalues of A and B.

For rectangular matrices, the situation differs: AB and BA are both square but of potentially different sizes. Let A be an m \times n matrix and B an n \times m matrix, with m \geq n. The non-zero eigenvalues of AB and BA coincide (with matching algebraic multiplicities), while AB has m - n additional zero eigenvalues. This implies

p_{AB}(\lambda) = \lambda^{m-n} p_{BA}(\lambda).

Equivalently,

\det(\lambda I_m - AB) = \lambda^{m-n} \det(\lambda I_n - BA).

This relation holds regardless of commutativity, as it follows from block matrix determinant identities applied to augmented forms of A and B. If m < n, the roles reverse symmetrically.
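The rectangular-case identity can be checked symbolically; a sketch with SymPy using arbitrary 3 \times 2 and 2 \times 3 matrices (m = 3, n = 2):

import sympy as sp

lam = sp.symbols('lambda')
A = sp.Matrix([[1, 2], [3, 4], [5, 6]])        # 3 x 2 example
B = sp.Matrix([[1, 0, 2], [0, 1, 1]])          # 2 x 3 example

lhs = (lam * sp.eye(3) - A * B).det()          # det(lambda*I_3 - AB)
rhs = lam * (lam * sp.eye(2) - B * A).det()    # lambda^(3-2) * det(lambda*I_2 - BA)
print(sp.expand(lhs - rhs) == 0)               # True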
Powers of a Single Matrix
The eigenvalues of the kth power A^k of an n \times n matrix A are given by \mu_j^k, where \mu_1, \dots, \mu_n are the eigenvalues of A (counted with algebraic multiplicity). This follows from the fact that if Av = \mu v for a nonzero vector v, then A^k v = \mu^k v, so \mu^k is an eigenvalue of A^k with the same eigenvector; the algebraic multiplicities are preserved as well, as can be seen by triangularizing A = Q T Q^{-1} over the complex numbers, since then A^k = Q T^k Q^{-1} and the diagonal entries of T^k are the kth powers of those of T.

Suppose the characteristic polynomial of A factors as p_A(\lambda) = \prod_{j=1}^s (\lambda - \mu_j)^{m_j}, where \mu_1, \dots, \mu_s are the distinct eigenvalues with algebraic multiplicities m_1, \dots, m_s satisfying \sum m_j = n. Then the characteristic polynomial of A^k is p_{A^k}(\lambda) = \prod_{j=1}^s (\lambda - \mu_j^k)^{m_j}. This relation holds because the roots of p_{A^k}(\lambda) are precisely the eigenvalues \mu_j^k with the same multiplicities m_j.

For large k, the roots of p_{A^k}(\lambda) exhibit asymptotic behavior dominated by the spectral radius \rho(A) = \max_j |\mu_j|. Specifically, the eigenvalues of A^k with magnitude close to \rho(A)^k arise from those \mu_j satisfying |\mu_j| = \rho(A), while the others satisfy |\mu_j^k| = o(\rho(A)^k) and thus concentrate near the origin relative to the dominant scale. If there are multiple peripheral eigenvalues (those with |\mu_j| = \rho(A)), their kth powers lie on the circle of radius \rho(A)^k in the complex plane, determining the leading asymptotic growth of entries in A^k.

This property finds application in solving linear recurrence relations via companion matrices. For the Fibonacci sequence defined by F_0 = 0, F_1 = 1, and F_n = F_{n-1} + F_{n-2} for n \geq 2, the companion matrix is C = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}, whose characteristic polynomial is p_C(\lambda) = \lambda^2 - \lambda - 1 with roots \phi = (1 + \sqrt{5})/2 and \hat{\phi} = (1 - \sqrt{5})/2. The powers C^k = \begin{pmatrix} F_{k-1} & F_k \\ F_k & F_{k+1} \end{pmatrix} yield the sequence terms as entries, and p_{C^k}(\lambda) = (\lambda - \phi^k)(\lambda - \hat{\phi}^k); since |\hat{\phi}| < 1 < \phi = \rho(C), the root \hat{\phi}^k tends to 0 as k increases, while the dominant root \phi^k = \rho(C)^k governs the growth of F_k. This illustrates how powering amplifies the dominant eigenvalue in recurrence solutions.
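A small numerical sketch of the Fibonacci example with NumPy: powers of the companion matrix produce Fibonacci numbers as entries, and the eigenvalues of C^k are \phi^k and \hat{\phi}^k.

import numpy as np

C = np.array([[0.0, 1.0],
              [1.0, 1.0]])                     # Fibonacci companion matrix

Ck = np.linalg.matrix_power(C, 10)
print(Ck)                                      # [[34, 55], [55, 89]] = [[F_9, F_10], [F_10, F_11]]

phi = (1 + np.sqrt(5)) / 2
phihat = (1 - np.sqrt(5)) / 2
print(np.sort(np.linalg.eigvals(Ck)))          # approximately [phihat**10, phi**10]
print(phihat**10, phi**10)                     # the subdominant root is already near zero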
Advanced Generalizations
Secular Function
In the context of matrix perturbation theory, the secular function refers to the characteristic polynomial of a perturbed matrix, particularly when analyzing small deviations from an unperturbed system. For an unperturbed matrix A and a small perturbation \varepsilon B, the secular function is \det(\lambda I - A - \varepsilon B), which approximates the unperturbed characteristic polynomial p_A(\lambda) to first order as p_A(\lambda) - \varepsilon \operatorname{tr}(\adj(\lambda I - A) B).[20] This expansion arises from the Jacobi formula for the derivative of the determinant, providing insight into how eigenvalues shift under infinitesimal changes.[21]

The term originates in solid-state physics, where Slater and Koster introduced it in 1954 to describe the determinant arising in the linear combination of atomic orbitals method for energy bands in periodic potentials with impurities. In this framework, the secular function encapsulates the effects of local perturbations on the electronic structure, facilitating solutions to otherwise intractable band structure problems.[22]

In degenerate perturbation theory within quantum mechanics, the secular function determines the first-order corrections to degenerate eigenvalues by restricting the problem to the degenerate subspace. The corrected energies E satisfy the secular equation \det( \langle \phi_i | V | \phi_j \rangle - (E - E_0) \delta_{ij} ) = 0, where \{ \phi_i \} is an orthonormal basis for the degenerate subspace at unperturbed energy E_0, and V is the perturbation; this reduces to solving a low-dimensional eigenvalue problem for the perturbation matrix elements.[23] This approach lifts the degeneracy and yields good approximations even beyond first order in many cases.

Numerically, the secular function is central to divide-and-conquer algorithms for the symmetric tridiagonal eigenvalue problem, as developed by Cuppen in 1981. These methods recursively partition the tridiagonal matrix into smaller blocks, solve them separately, and then merge solutions by finding roots of a secular equation from a rank-one update, such as \det(D + \rho v v^T - \lambda I) = 0, where D is block-diagonal with known eigenvalues. This enables efficient, parallelizable computation of all eigenvalues and eigenvectors with O(n^2) complexity for n \times n matrices.[24]
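As a minimal sketch of the rank-one secular equation used in the merge step (assuming distinct diagonal entries d_i and nonzero components v_i; the data below are arbitrary), the eigenvalues of D + \rho v v^T are the zeros of f(\lambda) = 1 + \rho \sum_i v_i^2 / (d_i - \lambda), which for \rho > 0 interlace the d_i and can be bracketed one per interval:

import numpy as np
from scipy.optimize import brentq

d = np.array([1.0, 2.0, 4.0])          # eigenvalues of the diagonal part D (example)
v = np.array([0.5, 0.3, 0.8])          # rank-one update vector (example)
rho = 1.0

def secular(lam):
    # f(lambda) = 1 + rho * sum_i v_i^2 / (d_i - lambda)
    return 1.0 + rho * np.sum(v**2 / (d - lam))

eps = 1e-9
brackets = [(d[0] + eps, d[1] - eps),
            (d[1] + eps, d[2] - eps),
            (d[2] + eps, d[2] + rho * np.dot(v, v))]   # largest root <= d_n + rho*||v||^2
roots = [brentq(secular, a, b) for a, b in brackets]

# Cross-check against a direct symmetric eigensolver
print(np.sort(roots))
print(np.linalg.eigvalsh(np.diag(d) + rho * np.outer(v, v)))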
General Associative Algebras
In finite-dimensional associative algebras over a field k, the characteristic polynomial of an element extends the matrix case through the regular representation. Let A be such an algebra of dimension n over k, equipped with a basis \{e_1, \dots, e_n\}. For any a \in A, the left regular representation maps a to the endomorphism L_a: A \to A defined by L_a(x) = a x for all x \in A. Relative to the basis, L_a corresponds to an n \times n matrix with entries in k, and the characteristic polynomial of a is p_a(\lambda) = \det(\lambda I - L_a).[25] This construction yields a monic polynomial of degree n.[25]

The roots of p_a(\lambda) are the eigenvalues of L_a, which generalize the notion of eigenvalues for a within the regular representation of A. By Vieta's formulas, the coefficients of p_a(\lambda) express symmetric functions of these eigenvalues; in particular, the coefficient of \lambda^{n-1} is -\operatorname{trace}(L_a), and the constant term is (-1)^n \det(L_a). The trace and determinant functions on A defined via the regular representation extend the classical matrix invariants to the algebraic setting; since L_{ab} = L_a L_b, the trace satisfies the cyclic property \operatorname{trace}(L_{ab}) = \operatorname{trace}(L_{ba}) and the determinant is multiplicative.

A concrete illustration arises in the quaternion algebra \mathbb{H} over \mathbb{R}, a 4-dimensional non-commutative division algebra. The reduced characteristic polynomial of the basis element i (satisfying i^2 = -1) is \lambda^2 + 1, while the full characteristic polynomial of the left regular representation is (\lambda^2 + 1)^2.[26] In this case, the reduced polynomial reflects the structure of irreducible representations of \mathbb{H}.

For non-commutative algebras, p_a(\lambda) remains a well-defined monic polynomial over k, independent of the choice of basis. However, the eigenvalues—roots in an algebraic closure of k—do not necessarily commute with a itself, distinguishing the non-commutative scenario from the commutative case where eigenvalues lie in the center.
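The quaternion example can be made explicit; a sketch with SymPy, writing the matrix of left multiplication by i with respect to the ordered basis (1, i, j, k) (using i \cdot 1 = i, i \cdot i = -1, i \cdot j = k, i \cdot k = -j):

import sympy as sp

lam = sp.symbols('lambda')
# Columns are the images of the basis vectors 1, i, j, k under left multiplication by i
L_i = sp.Matrix([[0, -1, 0,  0],
                 [1,  0, 0,  0],
                 [0,  0, 0, -1],
                 [0,  0, 1,  0]])

p = (lam * sp.eye(4) - L_i).det()
print(sp.factor(p))                      # (lambda**2 + 1)**2, the square of the reduced polynomial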