In linear algebra, eigenvalues and eigenvectors are core concepts that describe the intrinsic scaling and directional properties of linear transformations represented by square matrices. Specifically, for an n \times n matrix A, an eigenvalue \lambda is a scalar such that there exists a nonzero vector v \in \mathbb{R}^n, called an eigenvector corresponding to \lambda, satisfying the equation A v = \lambda v.[1] This equation implies that the linear transformation defined by A scales the eigenvector v by the factor \lambda without altering its direction.[1] The eigenvalues of A are the roots of its characteristic equation \det(A - \lambda I_n) = 0, where I_n is the identity matrix; there are at most n distinct eigenvalues, and exactly n when counted with multiplicity over the complex numbers.[2]
The prefix "eigen" derives from the German word meaning "own" or "characteristic," reflecting how these vectors and scalars capture the self-similar behaviors inherent to the matrix.[1] Historically, the concepts emerged in the early 19th century, with Joseph Fourier popularizing their use in 1822 for analyzing heat conduction through separation of variables,[3] and Augustin-Louis Cauchy establishing key properties for symmetric matrices in 1829, including eigenvalue interlacing.[4] By the mid-20th century, computational methods for finding eigenvalues advanced significantly, driven by needs in physics and engineering, such as those surveyed in historical reviews of numerical linear algebra.[5]
Eigenvalues and eigenvectors underpin the spectral theory of matrices, enabling diagonalization when a full set of linearly independent eigenvectors exists, which simplifies matrix powers and exponentials via A^k = P D^k P^{-1}, where D is diagonal with eigenvalues on the diagonal and P has eigenvectors as columns.[2] This structure is crucial for understanding matrix stability, as the eigenvalues determine whether powers of A converge to zero (when all |\lambda| < 1) or diverge.[6] Beyond pure mathematics, they have broad applications: in differential equations, eigenvalues govern the stability and oscillatory behavior of solutions to systems like \dot{x} = A x;[7] in principal component analysis (PCA), the largest eigenvalues of the covariance matrix identify directions of maximum variance for dimensionality reduction in data science;[8] and in Google's PageRank algorithm, the dominant eigenvector (with eigenvalue 1) ranks web pages based on link structures modeled as Markov chains.[8] Other uses include spectral clustering for graph partitioning in biology and social networks,[8] image compression via singular value decomposition (closely related to eigenvalues),[9] and stability analysis in control systems for mechanical and electrical engineering.[10]
Basic Concepts
Definition
In linear algebra, eigenvalues and eigenvectors arise in the context of linear transformations on vector spaces. A linear transformation represented by a square matrix A maps vectors from a vector space to itself, preserving addition and scalar multiplication.
For an n \times n square matrix A over the real or complex numbers, a scalar \lambda is called an eigenvalue of A if there exists a non-zero vector v in \mathbb{R}^n or \mathbb{C}^n such that A v = \lambda v; in this case, v is an eigenvector corresponding to \lambda.[11] This equation can be expressed in component form as a system: for v = (v_1, \dots, v_n)^T, the i-th row gives \sum_{j=1}^n a_{ij} v_j = \lambda v_i for each i = 1, \dots, n, where a_{ij} are the entries of A.[12]
Intuitively, an eigenvector v represents a direction in the vector space that remains unchanged under the linear transformation defined by A, except for scaling by the factor \lambda; thus, eigenvalues quantify how the transformation stretches or compresses along those invariant directions.[1]
In two dimensions, this has a direct geometric reading: eigenvectors align with the invariant axes of the transformation, where an eigenvalue \lambda > 1 indicates expansion along that axis, 0 < \lambda < 1 indicates contraction, \lambda = 1 indicates no scaling, and \lambda < 0 indicates reflection combined with scaling. For example, a rotation matrix by an angle \theta has eigenvalues e^{\pm i\theta} (complex unless \theta = 0 or \pi), so it has no real invariant directions except in those degenerate cases.[13]
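The defining relation can be checked numerically; the following is a minimal sketch using NumPy, with a 2 \times 2 matrix chosen purely for illustration (it is not an example from the text), which computes the eigenpairs and verifies A v = \lambda v for each of them.

```python
import numpy as np

# Illustrative 2x2 matrix: characteristic polynomial
# lambda^2 - 7*lambda + 10 = (lambda - 5)(lambda - 2).
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# np.linalg.eig returns the eigenvalues and a matrix whose columns are eigenvectors.
eigenvalues, eigenvectors = np.linalg.eig(A)

for i, lam in enumerate(eigenvalues):
    v = eigenvectors[:, i]
    # Verify the defining relation A v = lambda v up to floating-point round-off.
    print(lam, np.allclose(A @ v, lam * v))   # eigenvalues 5 and 2, each printed with True
```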
Historical background
The origins of eigenvalues and eigenvectors trace back to the 18th century, rooted in efforts to solve systems of linear differential equations and analyze mechanical systems. Leonhard Euler laid early groundwork in 1755 through his investigations into linear differential equations with constant coefficients, where he employed methods that implicitly involved characteristic roots to find solutions, particularly in the context of vibrations and celestial mechanics.[14] These ideas prefigured the systematic treatment of the roots that determine the behavior of linear systems.
In the 19th century, advancements accelerated with contributions from Joseph-Louis Lagrange and Augustin-Louis Cauchy, who focused on secular equations arising in celestial mechanics to describe long-term perturbations in planetary orbits. Lagrange, in his work on the stability of the solar system during the 1770s and 1780s, formulated the secular equation as a determinant whose roots captured the principal modes of variation, influencing later geometric and analytic developments.[15] Cauchy extended this in his 1829 paper "Sur l'équation à l'aide de laquelle on détermine les inégalités séculaires des mouvements des planètes," introducing the term "valeur propre" for eigenvalues and proving that the roots of the characteristic equations of symmetric matrices are real; he elaborated further in his 1851 memoir on linear differential equations, where he explored their integration and properties.[4][16] Concurrently, Joseph Liouville and Charles-François Sturm developed the theory of second-order linear differential equations and boundary value problems now known as Sturm-Liouville theory.[17]
The formalization of these concepts occurred around 1900, with David Hilbert playing a pivotal role in spectral theory through his 1904 work "Grundzüge einer allgemeinen Theorie der linearen Integralgleichungen," where he coined the German term "Eigenwert" (proper value) for eigenvalues and "Eigenfunktion" for eigenfunctions in the context of integral operators.[4] This terminology, emphasizing intrinsic or "proper" values of operators, was later translated into English as "eigenvalue." In the late 1920s, John von Neumann advanced the framework through his early papers and collaborations, such as the 1928 work with Hilbert, on the mathematical foundations of quantum mechanics, applying matrix mechanics to demonstrate how eigenvalues represent observable quantities like energy levels, solidifying their role in physics.[4] These developments marked the transition from ad hoc solutions in mechanics to a unified abstract theory.
Matrix Eigenvalues and Eigenvectors
Characteristic polynomial and eigenvalues
The characteristic polynomial of an n \times n matrix A is defined as p_A(\lambda) = \det(\lambda I_n - A), where I_n denotes the n \times n identity matrix. The polynomial equation p_A(\lambda) = 0 is known as the characteristic equation, and the eigenvalues of A are its roots. By the Fundamental Theorem of Algebra, the characteristic polynomial has exactly n roots in the complex numbers, counted with algebraic multiplicity.[18]
The characteristic polynomial is monic of degree n, meaning the \lambda^n term has coefficient 1. It can be expressed in terms of the eigenvalues \lambda_1, \dots, \lambda_n of A as p_A(\lambda) = \prod_{i=1}^n (\lambda - \lambda_i). The coefficients of this polynomial are related, up to sign, to the elementary symmetric functions of the eigenvalues via Vieta's formulas. Specifically, the coefficient of \lambda^{n-1} is -\sum_{i=1}^n \lambda_i = -\operatorname{trace}(A), and the constant term is (-1)^n \prod_{i=1}^n \lambda_i = (-1)^n \det(A). To compute p_A(\lambda), the determinant \det(\lambda I_n - A) is expanded, for instance using the Leibniz formula, which sums signed products of matrix entries over all permutations of the indices.[18][19]
The set of all eigenvalues of A is called the spectrum of A, denoted \sigma(A). The eigenvalues of a real matrix may be complex, in which case they occur in conjugate pairs.[20]
As an example, consider the 2 \times 2 matrix A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}. The characteristic polynomial is p_A(\lambda) = \det\begin{pmatrix} \lambda - a & -b \\ -c & \lambda - d \end{pmatrix} = \lambda^2 - (a + d)\lambda + (ad - bc) = \lambda^2 - \operatorname{trace}(A) \lambda + \det(A). The eigenvalues are the roots of this quadratic equation, given explicitly by the quadratic formula: \lambda = \frac{\operatorname{trace}(A) \pm \sqrt{ [\operatorname{trace}(A)]^2 - 4 \det(A) }}{2}. This formula yields the eigenvalues of a 2 \times 2 matrix directly from its trace and determinant; for larger matrices the characteristic polynomial must be expanded in full, and numerical stability considerations apply in computations.[18][21]
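As a small sanity check of the trace/determinant formula for the 2 \times 2 case, the sketch below (NumPy; the matrix is an arbitrary illustration, not from the text) computes the roots of \lambda^2 - \operatorname{trace}(A)\lambda + \det(A) and compares them with a library eigenvalue routine.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# Coefficients of the characteristic polynomial lambda^2 - tr(A)*lambda + det(A).
tr = np.trace(A)
det = np.linalg.det(A)

# Quadratic formula; the +0j keeps the square root valid when the discriminant is negative.
disc = np.sqrt(tr**2 - 4.0 * det + 0j)
roots = np.array([(tr - disc) / 2.0, (tr + disc) / 2.0])

print(np.sort_complex(roots))          # approximately [2.+0.j, 5.+0.j]
print(np.sort(np.linalg.eigvals(A)))   # [2., 5.] -- agrees with the quadratic formula
```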
Eigenspaces and multiplicities
The eigenspace corresponding to an eigenvalue \lambda of a square matrix A is the null space of A - \lambda I, denoted \ker(A - \lambda I), consisting of all eigenvectors associated with \lambda together with the zero vector.[22] This subspace captures the directions in which A acts as scalar multiplication by \lambda. The dimension of this eigenspace, known as the geometric multiplicity m_g(\lambda), measures the number of linearly independent eigenvectors for \lambda.[23]
The algebraic multiplicity m_a(\lambda) of \lambda is defined as the multiplicity of \lambda as a root of the characteristic polynomial \det(A - \lambda I).[24] A fundamental property is that m_a(\lambda) \geq m_g(\lambda) for every eigenvalue \lambda, with equality holding if and only if the eigenspace coincides with the generalized eigenspace for \lambda.[22] In the Jordan canonical form of A, the algebraic multiplicity m_a(\lambda) equals the sum of the sizes of all Jordan blocks associated with \lambda, while the geometric multiplicity m_g(\lambda) equals the number of such blocks.[25] Additionally, the trace of A, \operatorname{tr}(A), equals the sum of all eigenvalues counted with their algebraic multiplicities: \operatorname{tr}(A) = \sum m_a(\lambda) \lambda.[26]
For matrices that admit a basis of eigenvectors—those that are diagonalizable—an eigenbasis can be formed by selecting bases from the eigenspaces of the distinct eigenvalues, ensuring the entire space is spanned by these eigenvectors.[23] This requires m_g(\lambda) = m_a(\lambda) for all \lambda.
Consider the 2 \times 2 matrix A = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}. The characteristic polynomial is (\lambda - 2)^2 = 0, so \lambda = 2 has algebraic multiplicity m_a(2) = 2. The eigenspace is \ker(A - 2I) = \ker\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, which has dimension m_g(2) = 1, spanned by \begin{pmatrix} 1 \\ 0 \end{pmatrix}. Here, m_a(2) > m_g(2), illustrating a non-diagonalizable case with a single Jordan block of size 2.[24]
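The gap between algebraic and geometric multiplicity in this example can be reproduced numerically; the sketch below (NumPy) counts repeated eigenvalues and measures \dim\ker(A - \lambda I) as n minus the rank.

```python
import numpy as np

# The Jordan-block example from the text: eigenvalue 2 with a one-dimensional eigenspace.
A = np.array([[2.0, 1.0],
              [0.0, 2.0]])
lam = 2.0
n = A.shape[0]

# Geometric multiplicity: dim ker(A - lambda*I) = n - rank(A - lambda*I).
geometric = n - np.linalg.matrix_rank(A - lam * np.eye(n))

# Algebraic multiplicity: how many computed eigenvalues coincide with lambda.
algebraic = int(np.sum(np.isclose(np.linalg.eigvals(A), lam)))

print(algebraic, geometric)   # 2 1  -> m_a(2) > m_g(2), so A is not diagonalizable
```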
Diagonalization and spectral decomposition
A square matrix A \in \mathbb{C}^{n \times n} is diagonalizable if it possesses a complete set of n linearly independent eigenvectors, allowing representation as A = P D P^{-1}, where D is a diagonal matrix containing the eigenvalues of A on its diagonal, and the columns of the invertible matrix P are the corresponding eigenvectors.[27][28] This condition holds if and only if the geometric multiplicity equals the algebraic multiplicity for each eigenvalue of A.[29] A sufficient but not necessary condition for diagonalizability is that all eigenvalues of A are distinct.[30]
The eigendecomposition A = P D P^{-1} simplifies powers of the matrix to A^k = P D^k P^{-1}, where D^k is obtained by raising each diagonal entry to the power k, facilitating analysis in applications such as dynamical systems and Markov chains.[28] For matrices over the real numbers that are not diagonalizable, the Jordan canonical form provides a generalized decomposition with Jordan blocks for eigenvalues having deficient eigenspaces, though details of its construction are addressed in computational methods.
The spectral theorem extends diagonalizability to real symmetric matrices, stating that every real symmetric matrix A is orthogonally diagonalizable: A = Q D Q^T, where Q is an orthogonal matrix (Q^T Q = I) whose columns are orthonormal eigenvectors, and D is diagonal with real eigenvalues.[31][32] This decomposition ensures the eigenvalues are real and the eigenvectors form an orthonormal basis.[33] For positive semi-definite symmetric matrices, all eigenvalues in D are non-negative.[34]
As an illustrative example, consider the real symmetric matrix A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}. The eigenvalues are \lambda_1 = 3 and \lambda_2 = 1, with corresponding orthonormal eigenvectors \mathbf{u}_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix} and \mathbf{u}_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}. The spectral decomposition is then A = Q D Q^T, \quad D = \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}, \quad Q = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix}, verifying Q^T Q = I and the orthogonality property.[31]
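The spectral decomposition of this symmetric example is easy to verify in code; a minimal sketch with NumPy's eigh routine, which is designed for symmetric/Hermitian matrices:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh returns eigenvalues in ascending order and orthonormal eigenvectors as columns of Q.
w, Q = np.linalg.eigh(A)      # w = [1., 3.]
D = np.diag(w)

print(np.allclose(Q @ D @ Q.T, A))          # True: A = Q D Q^T
print(np.allclose(Q.T @ Q, np.eye(2)))      # True: Q is orthogonal
# Powers via the decomposition: A^5 = Q D^5 Q^T.
print(np.allclose(Q @ np.diag(w**5) @ Q.T, np.linalg.matrix_power(A, 5)))   # True
```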
Properties and Characterizations
Variational characterization
The variational characterization provides a way to describe the eigenvalues of a Hermitian matrix through optimization principles involving quadratic forms. For a Hermitian matrix A \in \mathbb{C}^{n \times n}, the Rayleigh quotient is defined as R_A(x) = \frac{x^* A x}{x^* x} for any nonzero vector x \in \mathbb{C}^n, where x^* denotes the conjugate transpose. When restricted to unit vectors with \|x\| = 1, the maximum value of R_A(x) equals the largest eigenvalue \lambda_{\max}(A) of A, achieved at a corresponding eigenvector, while the minimum value equals the smallest eigenvalue \lambda_{\min}(A).
For real symmetric matrices, the eigenvalues can be viewed as the stationary points of the Rayleigh quotient under the constraint \|x\| = 1. Specifically, if x is an eigenvector corresponding to eigenvalue \lambda, then R_A(x) = \lambda, and the critical points of R_A on the unit sphere yield exactly the eigenvalues.[35]
The full set of eigenvalues admits a more refined characterization via the Courant-Fischer min-max theorem. Assuming the eigenvalues of the Hermitian matrix A are ordered as \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n, the k-th smallest eigenvalue is given by \lambda_k = \min_{\dim S = k} \max_{\substack{x \in S \\ \|x\| = 1}} x^* A x = \max_{\dim T = n-k+1} \min_{\substack{x \in T \\ \|x\| = 1}} x^* A x, where the minimum is over all subspaces S \subseteq \mathbb{C}^n of dimension k, and the maximum is over all subspaces T \subseteq \mathbb{C}^n of dimension n-k+1.[35][36]
A sketch of the proof relies on the spectral decomposition of A. Since A is Hermitian, it admits an orthonormal basis of eigenvectors \{v_1, \dots, v_n\} with A v_i = \lambda_i v_i. For any unit vector x = \sum_{i=1}^n c_i v_i with \sum |c_i|^2 = 1, the quadratic form expands as x^* A x = \sum_{i=1}^n \lambda_i |c_i|^2. To establish the min-max formula, consider the subspace S_k = \operatorname{span}\{v_1, \dots, v_k\}: the maximum of x^* A x over unit vectors in S_k is \lambda_k, since only the eigenvalues \lambda_1, \dots, \lambda_k contribute to the sum there, so the minimum over dimension-k subspaces is at most \lambda_k. Conversely, any subspace S of dimension k intersects \operatorname{span}\{v_k, \dots, v_n\} (of dimension n-k+1) nontrivially, so S contains a unit vector x with x^* A x \geq \lambda_k; hence the maximum over S is at least \lambda_k, yielding equality. The max-min form follows dually, for instance by applying the same argument to -A or by considering complementary subspaces.
This variational approach enables bounding eigenvalues without computing the full spectrum, for instance by evaluating the Rayleigh quotient on trial subspaces to obtain upper or lower estimates for specific \lambda_k.
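The bounds implied by the variational characterization can be probed numerically; the following sketch (NumPy, with a random symmetric matrix as an assumed test case) checks that Rayleigh quotients of arbitrary vectors stay between \lambda_{\min} and \lambda_{\max} and that the extremes are attained at eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# A random real symmetric matrix (the Hermitian case over the reals).
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2.0
w = np.linalg.eigvalsh(A)             # eigenvalues in ascending order

def rayleigh(A, x):
    return (x @ A @ x) / (x @ x)

# Rayleigh quotients of random vectors lie in [lambda_min, lambda_max] ...
samples = [rayleigh(A, rng.standard_normal(5)) for _ in range(1000)]
print(w[0] <= min(samples) and max(samples) <= w[-1])      # True

# ... and the extreme values are attained at the corresponding eigenvectors.
_, V = np.linalg.eigh(A)
print(np.isclose(rayleigh(A, V[:, 0]), w[0]), np.isclose(rayleigh(A, V[:, -1]), w[-1]))  # True True
```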
Left and right eigenvectors
In linear algebra, right eigenvectors of a square matrix A are defined by the equation A \mathbf{v} = \lambda \mathbf{v}, where \lambda is an eigenvalue and \mathbf{v} is a nonzero column vector.[37] This standard formulation captures the action of A on column vectors from the right.[38]
For a more complete description, particularly with non-symmetric matrices, left eigenvectors are introduced, satisfying \mathbf{w}^* A = \lambda \mathbf{w}^*, where \mathbf{w}^* denotes the row vector that is the conjugate transpose of a column vector \mathbf{w}.[37] Equivalently, in column vector form, this is A^* \mathbf{w} = \bar{\lambda} \mathbf{w}, where A^* is the adjoint (conjugate transpose) of A.[38] Left eigenvectors (as columns) thus correspond to the right eigenvectors of the adjoint matrix A^* with eigenvalue \bar{\lambda}.[37]
The eigenvalues of A^* are the complex conjugates of the eigenvalues of A.[38] However, for non-symmetric matrices, the left and right eigenvectors generally differ, unlike the case for symmetric or normal matrices where they coincide up to scaling.[37]
A key property is biorthogonality: if \{\mathbf{v}_i\} and \{\mathbf{w}_i\} are the right and left eigenvectors corresponding to distinct eigenvalues \{\lambda_i\}, then \mathbf{w}_i^* \mathbf{v}_j = \delta_{ij} after appropriate normalization, forming a biorthogonal basis.[37][38]
Consider the non-symmetric 2 \times 2 matrix A = \begin{pmatrix} 3 & 1 \\ 0 & 2 \end{pmatrix}. The eigenvalues are \lambda_1 = 3 and \lambda_2 = 2. The right eigenvectors are \mathbf{v}_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix} for \lambda_1 and \mathbf{v}_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix} for \lambda_2. The left eigenvectors (as rows) are \mathbf{w}_1^* = \begin{pmatrix} 1 & 1 \end{pmatrix} for \lambda_1 and \mathbf{w}_2^* = \begin{pmatrix} 0 & 1 \end{pmatrix} for \lambda_2, demonstrating the difference between left and right vectors and satisfying biorthogonality relations such as \mathbf{w}_1^* \mathbf{v}_2 = 0 and \mathbf{w}_2^* \mathbf{v}_1 = 0.[38]
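The left/right distinction and the biorthogonality relations for this example can be reproduced with SciPy, which can return both sets of eigenvectors; a brief sketch:

```python
import numpy as np
from scipy.linalg import eig

# The non-symmetric example from the text.
A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

# With left=True, eig also returns left eigenvectors as columns of vl, satisfying
# vl[:, i].conj().T @ A = w[i] * vl[:, i].conj().T.
w, vl, vr = eig(A, left=True, right=True)

print(w.real)                                       # eigenvalues 3 and 2 (order may vary)
# Biorthogonality: left and right eigenvectors for distinct eigenvalues are orthogonal.
print(np.isclose(vl[:, 0].conj() @ vr[:, 1], 0.0))  # True
print(np.isclose(vl[:, 1].conj() @ vr[:, 0], 0.0))  # True
```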
Additional matrix properties
The trace of a square matrix A, defined as the sum of its diagonal entries, equals the sum of its eigenvalues, counted with algebraic multiplicity.[39] Similarly, the determinant of A is the product of its eigenvalues, also counted with multiplicity.[39] These relations follow from the coefficients of the characteristic polynomial and hold for any complex square matrix.
Eigenvalues are invariant under similarity transformations; that is, if B = P^{-1} A P for an invertible matrix P, then A and B share the same eigenvalues with identical multiplicities.[40] This property underscores that eigenvalues capture intrinsic spectral information independent of the basis chosen to represent the linear transformation.
The Gershgorin circle theorem provides bounds on eigenvalue locations: every eigenvalue of an n \times n complex matrix A = [a_{ij}] lies in at least one of the disks in the complex plane centered at a_{ii} with radius \sum_{j \neq i} |a_{ij}|, for i = 1, \dots, n.[41] Named after Sergei Gershgorin, this result offers a simple way to estimate eigenvalue regions without computing the characteristic polynomial, particularly useful for matrices with dominant diagonal entries.[42]
For symmetric matrices, positive definiteness is equivalent to all eigenvalues being positive.[43] A real symmetric matrix A is positive definite if x^T A x > 0 for all nonzero x \in \mathbb{R}^n, and this condition aligns directly with its spectral properties, ensuring the quadratic form remains positive.[43]
As an illustrative example, consider a 2 \times 2 matrix A arising in a linear dynamical system \dot{x} = A x. The eigenvalues determine stability: the origin is asymptotically stable if both have negative real parts, which occurs precisely when \operatorname{tr}(A) < 0 and \det(A) > 0.[44] For instance, with A = \begin{pmatrix} -1 & 1 \\ 0 & -2 \end{pmatrix}, \operatorname{tr}(A) = -3 < 0 and \det(A) = 2 > 0, confirming stability without explicit eigenvalue computation.[44]
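Both the Gershgorin bound and the 2 \times 2 trace/determinant stability test are straightforward to check in code; a minimal sketch (NumPy) using the example matrix above:

```python
import numpy as np

A = np.array([[-1.0, 1.0],
              [ 0.0, -2.0]])
eigs = np.linalg.eigvals(A)

# Gershgorin disks: centers are the diagonal entries, radii the off-diagonal row sums.
centers = np.diag(A)
radii = np.sum(np.abs(A), axis=1) - np.abs(centers)
in_some_disk = [any(abs(lam - c) <= r for c, r in zip(centers, radii)) for lam in eigs]
print(all(in_some_disk))                              # True: every eigenvalue lies in a disk

# 2x2 stability test: tr(A) < 0 and det(A) > 0 iff both eigenvalues have negative real part.
print(np.trace(A) < 0 and np.linalg.det(A) > 0)       # True
print(all(lam.real < 0 for lam in eigs))              # True -- consistent with the criterion
```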
Generalizations
Linear operators and eigenfunctions
In infinite-dimensional settings, the notions of eigenvalues and eigenvectors from finite-dimensional linear algebra extend to linear operators on Banach or Hilbert spaces, where non-zero elements satisfying Tf = \lambda f are termed eigenfunctions corresponding to the eigenvalue \lambda.[45] The point spectrum of an operator T, consisting precisely of its eigenvalues, comprises those \lambda \in \mathbb{C} for which T - \lambda I is not injective, meaning there exists a non-trivial kernel.[45] This framework applies to spaces of functions, such as L^2 spaces on intervals, where eigenfunctions play a central role in solving differential equations.[46]
A simple example is the differentiation operator D = \frac{d}{dx} acting on the space of polynomials, where constant functions serve as eigenfunctions for the eigenvalue \lambda = 0, since their derivatives vanish identically.[47] More generally, the spectrum of a bounded linear operator T on a Banach space decomposes into the disjoint union of the point spectrum (eigenvalues), the continuous spectrum (where T - \lambda I is injective with dense but non-surjective range), and the residual spectrum (where T - \lambda I is injective but the range is not dense).[45]
For compact operators on infinite-dimensional Banach spaces, the non-zero elements of the spectrum are isolated eigenvalues of finite multiplicity, forming a discrete set that can only accumulate at 0; the point 0 itself belongs to the spectrum.[48] This discreteness contrasts with the potentially continuous spectrum of non-compact operators and facilitates spectral decompositions in applications like integral equations.[48]
In the context of boundary value problems, Sturm-Liouville theory addresses self-adjoint differential operators of the form L = \frac{d}{dx} \left( p(x) \frac{d}{dx} \right) + q(x) on finite intervals with appropriate boundary conditions, yielding real eigenvalues \lambda_n that are simple and countable with \lambda_1 < \lambda_2 < \cdots \to \infty.[49] The corresponding eigenfunctions \phi_n are orthogonal with respect to the weight function \sigma(x) in the inner product \int_a^b \phi_n(x) \phi_m(x) \sigma(x) \, dx = 0 for n \neq m, forming a complete orthogonal basis for the underlying Hilbert space.[49] This orthogonality, established via Green's identity, underpins expansions of arbitrary functions in series of eigenfunctions.[49]
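A standard way to see Sturm-Liouville eigenvalues concretely is to discretize a simple case; the sketch below (NumPy, with the grid size chosen arbitrarily) approximates -u'' = \lambda u on (0, 1) with Dirichlet boundary conditions, whose exact eigenvalues are (n\pi)^2.

```python
import numpy as np

# Finite-difference discretization of -u'' = lambda * u on (0, 1) with u(0) = u(1) = 0.
N = 200                              # number of interior grid points (arbitrary choice)
h = 1.0 / (N + 1)
L = (np.diag(2.0 * np.ones(N))
     + np.diag(-np.ones(N - 1), 1)
     + np.diag(-np.ones(N - 1), -1)) / h**2

w = np.linalg.eigvalsh(L)             # the discretized operator is symmetric
print(w[:3])                          # close to [9.87, 39.47, 88.81]
print((np.pi * np.arange(1, 4))**2)   # exact values (n*pi)^2: about [9.87, 39.48, 88.83]
```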
Spectral theory overview
Spectral theory provides a framework for understanding the structure of linear operators on Hilbert spaces through their eigenvalues and eigenvectors, generalizing finite-dimensional concepts to infinite dimensions. The spectral theorem for normal operators states that every bounded normal operator T on a separable Hilbert space \mathcal{H} admits a spectral decomposition of the form T = \int_{\sigma(T)} \lambda \, dE(\lambda), where \sigma(T) denotes the spectrum of T, and E is a unique projection-valued measure (spectral measure) supported on the Borel subsets of \mathbb{C} such that E(\mathbb{C}) = I, the identity operator on \mathcal{H}. This integral representation expresses T as a "continuous superposition" of scalar multiples of projections, with E(\Delta) projecting onto the subspace spanned by generalized eigenvectors associated with eigenvalues in the Borel set \Delta \subseteq \sigma(T).[50]
For self-adjoint operators, a subclass of normal operators, the spectral theorem yields particularly concrete consequences. The spectrum \sigma(T) is real and contained in \mathbb{R}, and the Hilbert space \mathcal{H} decomposes orthogonally as a direct sum of closed subspaces \mathcal{H} = \bigoplus_{\lambda \in \sigma_p(T)} \ker(T - \lambda I) \oplus \overline{\operatorname{ran} E(\sigma(T) \setminus \sigma_p(T))}, where \sigma_p(T) is the point spectrum, and the continuous part corresponds to the range of the spectral projections over the continuous spectrum. This orthogonal decomposition ensures that eigenvectors (or generalized eigenvectors) for distinct eigenvalues are orthogonal, facilitating the analysis of operator behavior through its spectral components.[51]
The spectral theorem underpins the development of functional calculus for normal operators, allowing the definition of f(T) for Borel measurable functions f: \sigma(T) \to \mathbb{C} via f(T) = \int_{\sigma(T)} f(\lambda) \, dE(\lambda). This construction preserves the algebraic structure, such that f(T) is normal if T is, and it enables the computation of functions like exponentials e^{i t T} or powers T^n, which are essential in applications like quantum mechanics. The resolution of the identity provided by E is a projection-valued measure satisfying E(\emptyset) = 0, E(\mathbb{C}) = I, and E(\Delta_1) E(\Delta_2) = E(\Delta_1 \cap \Delta_2) for Borel sets \Delta_1, \Delta_2, with the projections E(\{\lambda\}) isolating point masses corresponding to eigenvalues and their associated generalized eigenspaces.[50][52]
An illustrative example is the multiplication operator M_m on L^2(X, \mu), where m \in L^\infty(X, \mu) is a bounded measurable function and (M_m \phi)(x) = m(x) \phi(x) for \phi \in L^2(X, \mu). This operator is normal (self-adjoint if m is real-valued), and its spectrum \sigma(M_m) coincides with the essential range of m, defined as the set of \lambda \in \mathbb{C} such that \mu(\{ x : |m(x) - \lambda| < \epsilon \}) > 0 for every \epsilon > 0. The spectral measure E is given by E(\Delta) \phi = \chi_{\{ m \in \Delta \}} \phi, where \chi is the characteristic function, projecting onto the subspace of functions supported where m falls in \Delta.[53]
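In finite dimensions the functional calculus f(T) = \int f(\lambda) \, dE(\lambda) reduces to applying f to the eigenvalues in a spectral decomposition; a small sketch of that finite-dimensional analogue (NumPy/SciPy, with a symmetric matrix chosen for illustration):

```python
import numpy as np
from scipy.linalg import expm

# Finite-dimensional analogue of the functional calculus: f(A) = Q f(D) Q^T.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
w, Q = np.linalg.eigh(A)

def f_of_A(f):
    # Apply f to the eigenvalues and reassemble with the orthonormal eigenvectors.
    return Q @ np.diag(f(w)) @ Q.T

print(np.allclose(f_of_A(np.exp), expm(A)))                 # True: matches the matrix exponential
print(np.allclose(f_of_A(np.sqrt) @ f_of_A(np.sqrt), A))    # True: a square root of A (eigenvalues > 0)
```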
Associative algebras and representations
In unital associative algebras over the complex numbers, the spectrum of an element a is defined as the set \sigma(a) = \{\lambda \in \mathbb{C} \mid a - \lambda \cdot 1 is not invertible in the algebra\}. This generalizes the notion of eigenvalues from matrix algebras, where the spectrum coincides with the set of eigenvalues of the matrix. For finite-dimensional algebras, the spectrum of a matches the eigenvalues arising from its action via left multiplication in the regular representation.[54]
In commutative unital associative algebras over \mathbb{C}, the characters—algebra homomorphisms \chi: A \to \mathbb{C}—provide a concrete realization of the spectrum, with \sigma(a) = \{\chi(a) \mid \chi is a character of A\}. Each character has a kernel that is a maximal ideal of A, establishing a bijection between the set of characters and the maximal ideals when A is finitely generated. These characters correspond to one-dimensional representations, where \chi(a) is the eigenvalue of a in that representation.[55]
In the representation theory of associative algebras or finite groups, a representation \rho: A \to \mathrm{End}(V) or \rho: G \to \mathrm{GL}(V) maps elements a \in A or g \in G to endomorphisms whose eigenvalues capture the "spectral content" of a or g in the module V. Irreducible representations often allow triangularization over \mathbb{C}, with eigenvalues appearing on the diagonal in a suitable basis, though full diagonalization requires additional structure like abelianity. The character \chi_\rho(g) = \mathrm{tr}(\rho(g)) equals the sum of these eigenvalues, providing a conjugation-invariant summary of the spectrum across the representation. For finite groups, the eigenvalues of \rho(g) are roots of unity, reflecting the finite order of g.
Schur's lemma plays a key role in controlling eigenvalue multiplicities within irreducible representations: it states that any endomorphism commuting with all \rho(g) (or \rho(a)) must be a scalar multiple of the identity. Consequently, central elements of the algebra or group act as scalars in irreducible representations, yielding a single eigenvalue with multiplicity equal to the representation dimension. For non-central elements, the lemma implies that eigenspaces for distinct eigenvalues cannot be interchanged by representation operators without contradicting irreducibility, leading to balanced multiplicities determined by the representation's structure; in unitary irreducible representations of compact groups, conjugate eigenvalues occur with equal multiplicity.
A concrete illustration arises in the representation theory of the compact Lie group \mathrm{SO}(3). Its irreducible representations, labeled by non-negative integers l = 0, 1, 2, \dots, have dimension 2l + 1. The angular momentum operators, which generate the Lie algebra \mathfrak{so}(3), act on these representations; for instance, the operator corresponding to rotation around the z-axis has eigenvalues m = -l, -l+1, \dots, l-1, l, each with multiplicity one. The Casimir operator J^2, associated with the quadratic form on the Lie algebra, acts as the scalar l(l+1) times the identity, with multiplicity 2l + 1. These eigenvalues encode the decomposition of representations and underpin the theory's classification.
Computation
Analytical methods
The classical analytical approach to finding eigenvalues and eigenvectors of an n \times n square matrix A begins by forming the characteristic equation \det(A - \lambda I) = 0, where I is the identity matrix; the roots \lambda of this polynomial equation are the eigenvalues.[56] For each eigenvalue \lambda_k, the corresponding eigenvectors v_k are obtained by solving the homogeneous linear system (A - \lambda_k I) v = 0, which yields a basis for the eigenspace associated with \lambda_k. This method provides exact solutions when the characteristic polynomial can be solved symbolically.[57]
For 2×2 matrices, the characteristic polynomial is quadratic, \lambda^2 - \operatorname{tr}(A) \lambda + \det(A) = 0, and eigenvalues are given explicitly by the quadratic formula \lambda = \frac{\operatorname{tr}(A) \pm \sqrt{[\operatorname{tr}(A)]^2 - 4 \det(A)}}{2}.[56] Eigenvectors follow by substitution into the null space equation. For 3×3 matrices, the characteristic polynomial is cubic, solvable in closed form using Cardano's formula, though the expressions involve cube roots and may yield complex values even for real matrices.[58]
The Cayley-Hamilton theorem asserts that A satisfies its own characteristic polynomial, p(A) = 0, enabling symbolic reduction of matrix powers to lower-degree polynomials via the eigenvalues, which aids in verifying solutions or computing functions of A analytically without explicit eigenvalue extraction in some cases.[59]
Special cases simplify the process: for diagonal matrices, the eigenvalues are precisely the diagonal entries, and the eigenvectors are the standard basis vectors.[60] For upper or lower triangular matrices, the eigenvalues are also the diagonal entries, with eigenvectors found by solving the triangular system (A - \lambda_i I) v = 0 via forward or back substitution.[61]
Consider a 3×3 rotation matrix representing a rotation by angle \theta around the z-axis:
A = \begin{pmatrix}
\cos \theta & -\sin \theta & 0 \\
\sin \theta & \cos \theta & 0 \\
0 & 0 & 1
\end{pmatrix}. The characteristic polynomial is \det(A - \lambda I) = (\lambda - 1)(\lambda^2 - 2\lambda \cos \theta + 1) = 0, yielding eigenvalues \lambda_1 = 1, \lambda_{2,3} = \cos \theta \pm i \sin \theta = e^{\pm i \theta}.[62] The eigenvector for \lambda_1 = 1 is v_1 = (0, 0, 1)^T, along the rotation axis. For the complex eigenvalues, the eigenvectors are v_2 = (1, -i, 0)^T and v_3 = (1, i, 0)^T, spanning the rotation plane in the complex sense; real approximations can be formed from their linear combinations for practical use.[62]
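The symbolic computation in this example can be reproduced with a computer algebra system; the sketch below (SymPy, assuming only the rotation matrix above) checks the factored characteristic polynomial quoted in the text and lists the eigenvalues.

```python
import sympy as sp

theta = sp.symbols('theta', real=True)
lam = sp.symbols('lambda')

# Rotation by theta about the z-axis, as in the example above.
A = sp.Matrix([[sp.cos(theta), -sp.sin(theta), 0],
               [sp.sin(theta),  sp.cos(theta), 0],
               [0,              0,             1]])

# Characteristic polynomial det(lambda*I - A); its difference from the factored
# form (lambda - 1)(lambda^2 - 2*lambda*cos(theta) + 1) simplifies to zero.
p = (lam * sp.eye(3) - A).det()
print(sp.simplify(p - (lam - 1) * (lam**2 - 2 * lam * sp.cos(theta) + 1)))   # 0

# Eigenvalues with multiplicities: 1 and cos(theta) +/- i*sin(theta), i.e. e^{+/- i*theta}.
print(A.eigenvals())
```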
Iterative techniques
Iterative techniques provide efficient ways to approximate the dominant eigenvalues and eigenvectors of large matrices where exact eigendecomposition is computationally prohibitive. These methods rely on repeated matrix-vector multiplications to refine initial guesses, assuming the matrix has a dominant eigenvalue whose absolute value exceeds that of all others. Under this condition, the iterates converge linearly to the corresponding eigenvector.[57]
The power iteration, also known as the power method, is the simplest such algorithm. Starting from an initial unit vector \mathbf{v}_0, it generates the sequence \mathbf{v}_{k+1} = \frac{A \mathbf{v}_k}{\| A \mathbf{v}_k \|}, where \|\cdot\| denotes a vector norm, typically the Euclidean norm. If the eigenvalues are ordered by |\lambda_1| > |\lambda_2| \geq \cdots \geq |\lambda_n|, and \mathbf{v}_0 has a nonzero component in the direction of the dominant eigenvector \mathbf{u}_1, then \mathbf{v}_k converges to \mathbf{u}_1 as k \to \infty. The corresponding eigenvalue \lambda_1 can be approximated by the Rayleigh quotient \mathbf{v}_k^T A \mathbf{v}_k. This method originates from early 20th-century work on iterative approximations for linear operators and is widely used for its minimal storage requirements, needing only O(n) space beyond the matrix itself.[57]
The convergence of power iteration is linear, with the error typically reducing by a factor of \left| \frac{\lambda_2}{\lambda_1} \right| per iteration in the asymptotic regime. More precisely, the angle \theta_k between \mathbf{v}_k and \mathbf{u}_1 satisfies \tan \theta_{k+1} \approx \left| \frac{\lambda_2}{\lambda_1} \right| \tan \theta_k, implying that the number of iterations needed for a desired accuracy grows logarithmically with the tolerance and inversely with \log \left| \frac{\lambda_1}{\lambda_2} \right|. This rate makes the method effective when the spectral gap is large but slow otherwise.[57]
To illustrate, consider the symmetric matrix A = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}, which has eigenvalues \lambda_1 = 4 and \lambda_2 = 2, with eigenvectors \mathbf{u}_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix} and \mathbf{u}_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}. Starting with \mathbf{v}_0 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, the first few iterations yield normalized iterates approximately \begin{pmatrix} 0.949 \\ 0.316 \end{pmatrix}, \begin{pmatrix} 0.857 \\ 0.514 \end{pmatrix}, and \begin{pmatrix} 0.789 \\ 0.614 \end{pmatrix}, which approach \mathbf{u}_1.
The Rayleigh quotients are approximately 3.60, 3.88, and 3.97, approaching \lambda_1 = 4, demonstrating convergence dominated by the ratio |\lambda_2 / \lambda_1| = 0.5.[57]
Inverse iteration extends power iteration to target eigenvalues near a known shift \mu, by applying the method to the shifted matrix (A - \mu I)^{-1}. The update becomes \mathbf{v}_{k+1} = \frac{(A - \mu I)^{-1} \mathbf{v}_k}{\| (A - \mu I)^{-1} \mathbf{v}_k \|}, where the dominant eigenvalue of the inverse corresponds to the original eigenvalue closest to \mu. Each step requires solving a linear system (A - \mu I) \mathbf{w} = \mathbf{v}_k, which can be done efficiently via LU factorization if \mu is fixed. This technique accelerates convergence for interior eigenvalues when a good shift is available, with the rate determined by the ratio of distances to the nearest eigenvalues.[57]
Orthogonal iteration generalizes power iteration to compute multiple dominant eigenvectors simultaneously by operating on an initial orthonormal subspace. Starting with an n \times p matrix Q_0 whose columns form an orthonormal basis for a p-dimensional subspace (with p equal to the number of desired eigenvectors), the method computes Y_{k+1} = A Q_k and then orthogonalizes via QR decomposition to obtain Q_{k+1} R_{k+1} = Y_{k+1}, yielding Q_{k+1} as the updated orthonormal basis. The columns of Q_k converge to the dominant p eigenvectors, assuming a sufficient spectral gap |\lambda_p| > |\lambda_{p+1}|, with linear convergence governed by |\lambda_{p+1} / \lambda_p|. This approach is particularly useful for symmetric matrices and maintains numerical stability through orthogonalization.[57]
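A direct implementation of power iteration reproduces the numbers in the example above; a minimal sketch (NumPy):

```python
import numpy as np

def power_iteration(A, v0, steps):
    """Plain power iteration; returns the final iterate and the Rayleigh quotient history."""
    v = v0 / np.linalg.norm(v0)
    history = []
    for _ in range(steps):
        v = A @ v
        v = v / np.linalg.norm(v)          # renormalize to avoid overflow/underflow
        history.append(v @ A @ v)          # Rayleigh quotient estimate of lambda_1
    return v, history

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])
v, rq = power_iteration(A, np.array([1.0, 0.0]), steps=3)
print(np.round(rq, 2))    # [3.6  3.88 3.97] -- approaching lambda_1 = 4
print(np.round(v, 3))     # ~[0.789 0.614], converging toward (1, 1)/sqrt(2)
```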
Numerical algorithms
Numerical algorithms for computing eigenvalues and eigenvectors address the full spectrum of matrices, particularly through direct methods for dense problems and subspace projections for large-scale sparse cases. These approaches aim to deliver the complete set of eigenvalues (and optionally eigenvectors) with high accuracy and efficiency, scaling from moderate-sized dense matrices to massive sparse systems arising in scientific computing. Modern implementations balance stability, convergence speed, and computational cost, often building on orthogonal transformations to preserve matrix norms and avoid ill-conditioning.
The QR algorithm stands as a cornerstone for solving the general nonsymmetric eigenvalue problem, transforming the matrix via iterative orthogonal similarities to reveal its Schur form, where eigenvalues appear on the diagonal of an upper triangular matrix.[63] Prior to iterations, the matrix undergoes Hessenberg reduction using Householder reflections, which converts a dense n \times n matrix to unreduced Hessenberg form in approximately \frac{10}{3} n^3 floating-point operations, preserving eigenvalues while simplifying subsequent steps.[63] Each QR iteration then decomposes the Hessenberg matrix H_k = Q_k R_k and forms H_{k+1} = R_k Q_k, converging quadratically near the diagonal for well-separated eigenvalues; shifts, such as the Wilkinson shift, accelerate this process to ensure deflation of converged eigenvalues.[63] For a symmetric matrix, the algorithm yields a real tridiagonal form after reduction (in \frac{4}{3} n^3 operations), followed by iterations that produce a diagonal matrix of eigenvalues.[63]
For symmetric tridiagonal matrices, the divide-and-conquer algorithm offers an efficient alternative to QR iterations, particularly on parallel architectures, by recursively splitting the problem into smaller subproblems while exploiting the banded structure.[64] The method, as refined by Gu and Eisenstat, divides the tridiagonal matrix at a rank-one update point, computes eigensystems of the resulting blocks independently, and merges them using secular equations solved via bisection or interpolation, achieving high relative accuracy for eigenvectors.[64] This approach reduces complexity to O(n^2) per level in the recursion tree, outperforming standard QR for large n by factors of 2–4 on vector processors, though it requires careful handling of nearly singular merges.[65]
Subspace methods like Lanczos and Arnoldi excel for large sparse matrices, projecting the problem onto low-dimensional Krylov subspaces to approximate extremal eigenvalues without forming the full eigendecomposition. The Lanczos algorithm, tailored for symmetric sparse matrices, generates an orthonormal basis Q_k for the Krylov subspace \mathcal{K}_k(A, q_1) = \operatorname{span}\{q_1, A q_1, \dots, A^{k-1} q_1\} via three-term recurrence, yielding a tridiagonal projection T_k = Q_k^T A Q_k whose Ritz values (eigenvalues of T_k) converge rapidly to the largest or smallest eigenvalues of A.[66] For nonsymmetric cases, the Arnoldi iteration extends this by constructing a Hessenberg projection H_k = Q_k^T A Q_k through Gram-Schmidt orthogonalization, with Ritz values approximating a cluster of eigenvalues near the spectral edge, often requiring restarts for targeted spectra.[67] These methods leverage sparse matrix-vector products, making them suitable for matrices with O(n) nonzeros.
Practical implementations rely on libraries like LAPACK, which provide robust routines for dense problems.
The DGEEV routine computes all eigenvalues and optionally left/right eigenvectors of a general real matrix via the QR algorithm with Hessenberg reduction, handling complex conjugate pairs appropriately.[68] For symmetric matrices, DSYEV employs a divide-and-conquer or QR-based approach on tridiagonal form to deliver eigenvalues and orthonormal eigenvectors, with workspace optimized for stability.[69]
Computational complexity for dense matrices is O(n^3) overall, dominated by the initial reduction and iterations in QR-based methods, while sparse Krylov methods scale as O(k \cdot \mathrm{nnz}(A)) for k iterations and \mathrm{nnz}(A) nonzeros, enabling solutions for n > 10^6.[63][70]
As an illustrative example, consider the companion matrix for the polynomial t^3 - 6t^2 + 11t - 6 = 0,
A = \begin{pmatrix}
0 & 1 & 0 \\
0 & 0 & 1 \\
6 & -11 & 6
\end{pmatrix}, whose eigenvalues are the roots 1, 2, 3. The QR algorithm applied to this matrix converges after several iterations to an upper triangular form with diagonal entries 1, 2, 3.[63]
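The behavior described here can be illustrated with a deliberately bare-bones, unshifted QR iteration; this is only a sketch of the idea (no Hessenberg reduction, shifts, or deflation, which production routines such as LAPACK's DGEEV include), applied to the companion matrix above.

```python
import numpy as np

# Companion matrix of t^3 - 6 t^2 + 11 t - 6, as in the example above.
A = np.array([[0.0,   1.0, 0.0],
              [0.0,   0.0, 1.0],
              [6.0, -11.0, 6.0]])

# Unshifted QR iteration: H_{k+1} = R_k Q_k = Q_k^T H_k Q_k is similar to H_k,
# and for distinct eigenvalue magnitudes it converges to upper triangular form.
H = A.copy()
for _ in range(200):
    Q, R = np.linalg.qr(H)
    H = R @ Q

print(np.round(np.sort(np.diag(H)), 6))            # [1. 2. 3.]
print(np.round(np.sort(np.linalg.eigvals(A)), 6))  # [1. 2. 3.] -- the LAPACK-based routine agrees
```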
Applications
Geometric and dynamic systems
In the context of linear transformations, eigenvectors represent directions that remain invariant under the transformation, meaning they are mapped to scalar multiples of themselves, while eigenvalues quantify the scaling factor along those directions. For a linear transformation represented by a matrix A, if Av = \lambda v where v \neq 0, then v is an eigenvector and \lambda is the corresponding eigenvalue, indicating how the vector is stretched or compressed geometrically.[1]
In dynamical systems governed by the linear ordinary differential equation x' = A x, the stability of the equilibrium at the origin is determined by the eigenvalues of A: the system is asymptotically stable if all eigenvalues have negative real parts, as solutions decay exponentially toward zero over time. If any eigenvalue has a positive real part, the equilibrium is unstable, with trajectories diverging; zero real parts lead to marginal stability without guaranteed decay.[71]
Eigenvalues play a central role in analyzing vibration modes of mechanical systems, such as coupled mass-spring arrangements, where they correspond to the squares of the natural frequencies of oscillation. For a system with masses m_i and springs with stiffness k_j, the equations of motion form a matrix eigenvalue problem K \phi = \omega^2 M \phi, where the eigenvalues \omega^2 yield the squared frequencies, and the eigenvectors \phi describe the mode shapes of synchronized displacements.[72]
In continuous dynamical systems, such as those modeled by partial differential equations, eigenvalues of differential operators like the Laplacian dictate the decay rates of solutions; for the heat equation \partial_t u = \Delta u on a domain, separation of variables leads to solutions of the form u(x,t) = \sum c_n \phi_n(x) e^{-\lambda_n t}, where \lambda_n > 0 are the eigenvalues of -\Delta, and larger \lambda_n produce faster exponential decay, smoothing the temperature distribution over time.[73]
An illustrative example is the two-dimensional phase portrait of a saddle-point equilibrium, arising when the matrix A has one positive eigenvalue \lambda_2 > 0 and one negative eigenvalue \lambda_1 < 0: trajectories along the eigenvector for \lambda_1 approach the origin, while those along the eigenvector for \lambda_2 diverge, forming hyperbolic curves that separate stable and unstable manifolds.[74] This configuration highlights how opposing eigenvalue signs lead to unstable dynamics; the behavior becomes explicit after diagonalization, with solutions expressed as x(t) = e^{A t} x(0).[71]
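The eigenvalue-based stability classification is easy to demonstrate numerically; the sketch below (NumPy/SciPy, with a diagonal matrix chosen as a simple saddle for illustration) inspects the eigenvalues and propagates an initial condition with the matrix exponential.

```python
import numpy as np
from scipy.linalg import expm

# A simple saddle: one stable (negative) and one unstable (positive) eigenvalue.
A = np.array([[-1.0, 0.0],
              [ 0.0, 2.0]])
print(np.linalg.eigvals(A))        # [-1.  2.] -> opposite signs, so the origin is a saddle

# Solutions of x' = A x are x(t) = expm(A t) x(0): the component along the stable
# eigenvector decays while the component along the unstable eigenvector grows.
x0 = np.array([1.0, 1e-3])
for t in (0.0, 2.0, 4.0):
    print(t, expm(A * t) @ x0)
```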
Statistics and data analysis
In statistics and data analysis, eigenvalues and eigenvectors play a central role in dimensionality reduction techniques, enabling the identification of underlying patterns in high-dimensional datasets. Principal component analysis (PCA), introduced by Hotelling, decomposes the covariance matrix of the data into its eigenvalues and eigenvectors, where the eigenvectors represent the principal components—orthogonal directions of maximum variance—and the corresponding eigenvalues quantify the amount of variance explained by each component. By selecting the principal components with the largest eigenvalues, PCA reduces data dimensionality while preserving the most significant variability, facilitating visualization and preprocessing for machine learning models.[75]
Singular value decomposition (SVD) extends this framework to rectangular matrices, commonly encountered in data analysis, by factoring a matrix A as A = U \Sigma V^T, where the singular values in \Sigma are the square roots of the eigenvalues of A^T A (or A^* A for complex cases), and the columns of V are the corresponding eigenvectors.[76] In statistical applications, such as latent semantic analysis or recommender systems, SVD identifies low-rank approximations that capture essential data structure, with larger singular values indicating dominant features.[77]
Eigenvalues also underpin probabilistic models like Markov chains, where the transition matrix is row-stochastic, ensuring a dominant eigenvalue of 1 with an associated left eigenvector representing the stationary distribution.[78] The subdominant eigenvalues, with magnitudes less than 1 for irreducible and aperiodic chains, determine the convergence rate to the stationary distribution, as the powers of the matrix asymptotically approach the projector onto the eigenspace of eigenvalue 1.[79]
In epidemiology, the basic reproduction number R_0 is defined as the spectral radius—the largest eigenvalue—of the next-generation matrix, which models the expected number of secondary infections produced by a single infected individual in a susceptible population. If R_0 > 1, the disease-free equilibrium is unstable, indicating potential epidemic growth; conversely, R_0 < 1 implies stability and disease extinction.
For illustration, consider a 2D dataset with covariance matrix \Sigma = \begin{pmatrix} 4 & 2 \\ 2 & 3 \end{pmatrix}. The eigenvalues are approximately 5.56 and 1.44, with eigenvectors such as \begin{pmatrix} 0.8 \\ 0.6 \end{pmatrix} for the larger eigenvalue and \begin{pmatrix} -0.6 \\ 0.8 \end{pmatrix} for the smaller; data points projected onto the first eigenvector align along the axis of greatest spread, reducing the dataset to a 1D representation that explains approximately 79% of the variance.[75]
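The 2D PCA example can be reproduced in a few lines; a minimal sketch (NumPy) that recovers the eigenvalues, the first principal axis, and the explained-variance ratio quoted above:

```python
import numpy as np

# Covariance matrix from the 2D example in the text.
Sigma = np.array([[4.0, 2.0],
                  [2.0, 3.0]])

# PCA via eigendecomposition: eigenvectors are the principal axes,
# eigenvalues the variance captured along each axis.
w, V = np.linalg.eigh(Sigma)           # ascending order
order = np.argsort(w)[::-1]            # sort descending for PCA
w, V = w[order], V[:, order]

print(np.round(w, 2))                  # [5.56 1.44]
print(np.round(V[:, 0], 2))            # first principal axis ~ [0.79 0.62] (overall sign may flip)
print(round(w[0] / w.sum(), 2))        # ~0.79 of the total variance on the first component
```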
Physics and engineering
In quantum mechanics, the time-independent Schrödinger equation, H \psi = E \psi, where H is the Hamiltonian operator, identifies eigenvalues E as the discrete energy levels of a quantum system.[80] These eigenvalues represent the possible stationary states, with the corresponding eigenfunctions \psi describing the wavefunctions of the system. The spectral theorem ensures that for self-adjoint Hamiltonians, the eigenvalues are real, aligning with the physical requirement that energy measurements yield observable, real-valued outcomes.
In quantum chemistry, the eigenvalues of the Hamiltonian matrix determine the energies of molecular orbitals, which form the basis for understanding electronic structure and chemical bonding.[81] Approximations such as the Hartree-Fock method solve the eigenvalue problem for the Fock matrix, yielding orbital energies that predict molecular stability and reactivity.[82] These eigenvalues help quantify the distribution of electrons in bonding and antibonding orbitals, essential for computational simulations of molecular properties.
Vibration analysis in engineering and physics relies on solving the generalized eigenvalue problem M^{-1} K \mathbf{u} = \lambda \mathbf{u}, where M is the mass matrix, K is the stiffness matrix, \mathbf{u} are the mode shapes (eigenvectors), and \lambda are the eigenvalues corresponding to squared natural frequencies.[72] The natural frequencies are then given by \sqrt{\lambda}, enabling the identification of normal modes in structures like bridges or mechanical systems to predict resonance and stability.[83] This approach is fundamental in finite element analysis for designing vibration-resistant components.
The stress tensor in continuum mechanics is diagonalized to find principal stresses, which are the eigenvalues, with principal axes as the corresponding eigenvectors, simplifying the analysis of material failure under load.[84] Similarly, the inertia tensor for rigid bodies yields principal moments of inertia as eigenvalues and principal axes as eigenvectors, optimizing rotational dynamics calculations in aerospace and mechanical engineering.[85] These decompositions reduce complex tensor operations to scalar problems along orthogonal directions.
A representative example is the one-dimensional quantum harmonic oscillator, whose Hamiltonian H = \frac{p^2}{2m} + \frac{1}{2} m \omega^2 x^2 has eigenvalues E_n = \left(n + \frac{1}{2}\right) \hbar \omega, where n = 0, 1, 2, \dots, illustrating quantized energy levels in systems like molecular vibrations. This model underpins approximations for anharmonic potentials in real physical systems.[86]
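The generalized eigenvalue problem for vibration modes can be solved directly with SciPy's symmetric solver; the sketch below uses a small two-mass, three-spring chain with unit parameters as an assumed illustrative system (not taken from the cited references), written in the equivalent form K \mathbf{u} = \omega^2 M \mathbf{u}.

```python
import numpy as np
from scipy.linalg import eigh

# Two unit masses coupled by three unit-stiffness springs with both ends fixed.
m, k = 1.0, 1.0
M = m * np.eye(2)
K = k * np.array([[ 2.0, -1.0],
                  [-1.0,  2.0]])

# Generalized symmetric eigenproblem K u = omega^2 M u; eigh(K, M) solves it directly.
w, U = eigh(K, M)
omega = np.sqrt(w)
print(omega)           # ~[1.0, 1.732]: natural frequencies sqrt(k/m) and sqrt(3k/m)
print(np.round(U, 3))  # mode shapes: in-phase (1, 1) and out-of-phase (1, -1) motion, up to scaling
```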
Other domains
In graph theory, the eigenvalues of the adjacency matrix and Laplacian matrix provide key insights into graph connectivity and structural properties. The second-smallest eigenvalue of the Laplacian, known as the algebraic connectivity or Fiedler value, quantifies how well-connected the graph is, with larger values indicating stronger overall connectivity and resistance to disconnection.[87] Spectral graph partitioning leverages the Fiedler vector—the eigenvector corresponding to this eigenvalue—to identify balanced cuts that minimize the number of edges between partitions, enabling efficient division of large networks for parallel computing or community detection.[88]
In geology, eigenvalues derived from the stress tensor are used to analyze fault plane solutions and predict tectonic faulting patterns. By treating the stress tensor as a symmetric matrix, its eigenvalues represent the principal stresses, which determine the orientation and type of potential faults through criteria like Mohr-Coulomb failure envelopes, aiding in the reconstruction of paleostress fields from observed fault data.[89] In glaciology, eigenvalues of crystal orientation tensors model ice fabric anisotropy, influencing flow dynamics in polar ice sheets; for instance, the spread of eigenvalues reflects how crystal alignments affect viscosity and shear in ice cores, improving simulations of glacier movement under varying conditions.[90]
The eigenfaces method applies principal component analysis to face recognition, where eigenvalues of the covariance matrix rank the importance of facial features by capturing variance in image datasets. Larger eigenvalues correspond to principal components that explain the most variation across faces, allowing efficient dimensionality reduction for identifying individuals from grayscale images.[91] This builds on statistical techniques like PCA to project high-dimensional face data onto a low-dimensional eigenspace.
In wave transport, eigenvalues in scattering theory characterize propagation modes, particularly in waveguides or disordered media, where they determine resonant frequencies and transmission efficiencies for acoustic or electromagnetic waves. For example, the eigenvalues of the scattering operator help model how waves scatter off obstacles, revealing quasi-normal modes that predict energy flow and decay rates in open systems.[92][93]
In emerging machine learning applications, the eigenvalues of the Hessian matrix of a neural network's loss function illuminate the optimization landscape, with the spectral density indicating the presence of flat minima (small eigenvalues) that promote better generalization or sharp regions (large eigenvalues) that hinder convergence. Analysis of the top Hessian eigenvalues during training reveals how network depth affects curvature, guiding adaptive optimizers to navigate saddle points more effectively.[94][95]
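The Fiedler-vector partitioning idea can be seen on a toy graph; the sketch below (NumPy, with a small two-cluster graph invented for illustration) builds the Laplacian and splits the vertices by the sign of the second eigenvector.

```python
import numpy as np

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the single bridge edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A            # graph Laplacian L = D - A

w, V = np.linalg.eigh(L)
print(round(w[1], 3))                     # algebraic connectivity (Fiedler value), ~0.438
print(np.sign(V[:, 1]))                   # signs separate {0,1,2} from {3,4,5} (overall sign arbitrary)
```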