
Gershgorin circle theorem

The Gershgorin circle theorem provides a way to localize the eigenvalues of an n \times n complex square matrix A = (a_{ij}) within the complex plane. It states that every eigenvalue of A lies in at least one of the closed disks (called Gershgorin disks) centered at the diagonal entry a_{ii} for i = 1, \dots, n, with radius r_i(A) = \sum_{j \neq i} |a_{ij}|, the sum of the absolute values of the off-diagonal entries in the i-th row. Named after the Soviet mathematician Semyon Gershgorin, the theorem appeared in his 1931 paper "Über die Abgrenzung der Eigenwerte einer Matrix," published in the proceedings of the Academy of Sciences of the USSR. Although the result built on earlier ideas related to diagonal dominance (such as those by Lévy in 1881, Desplanques in 1887, and Hadamard in 1903), it was the first to explicitly bound all eigenvalues using row-sum disks. The proof relies on selecting an eigenvector and identifying the component with maximum modulus to show that the corresponding eigenvalue cannot lie outside the relevant disk. A key strengthening asserts that if k of the Gershgorin disks are disjoint from the rest, then those k disks together contain exactly k eigenvalues (counting algebraic multiplicities). The theorem applies to any complex square matrix and is particularly useful for rough estimates of eigenvalue locations, proving nonsingularity for strictly diagonally dominant matrices (where |a_{ii}| > r_i(A) for all i), and analyzing the convergence of iterative methods like the Jacobi or Gauss-Seidel algorithms. Extensions include versions for generalized eigenvalue problems, operator matrices, and refined bounds using column sums or scaled radii.

Core Theorem

Statement

The Gershgorin circle theorem provides a method to localize the eigenvalues of a complex matrix within specific regions of the complex plane. For an n \times n complex matrix A = (a_{ij}), every eigenvalue \lambda of A satisfies |\lambda - a_{ii}| \leq \sum_{\substack{j=1 \\ j \neq i}}^n |a_{ij}| for at least one i = 1, \dots, n. This inequality defines the Gershgorin disks D_i = \{ z \in \mathbb{C} \mid |z - a_{ii}| \leq r_i \}, where the center of D_i is the diagonal entry a_{ii} and the radius r_i = \sum_{\substack{j=1 \\ j \neq i}}^n |a_{ij}| is the sum of the absolute values of the off-diagonal entries in the i-th row. The Gershgorin set is the union \bigcup_{i=1}^n D_i, and all eigenvalues of A lie within this set. An equivalent formulation arises by applying the theorem to the transpose A^T, which has the same eigenvalues as A. This yields disks centered at a_{ii} with radii \sum_{\substack{j=1 \\ j \neq i}}^n |a_{ji}|, corresponding to the sums of absolute values in the i-th column. The theorem was discovered by the Soviet mathematician Semyon Aronovich Gershgorin and first published in 1931 in his paper "Über die Abgrenzung der Eigenwerte einer Matrix" in Izvestiya Akademii Nauk SSSR, Otdelenie Matematicheskikh i Estestvennykh Nauk.
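The disk computation is straightforward to carry out numerically. The following NumPy sketch (the helper name gershgorin_disks and the example matrix are illustrative choices, not from the original paper) computes the centers and radii and checks that every numerically computed eigenvalue falls in at least one disk:

```python
import numpy as np

def gershgorin_disks(A):
    """Return (centers, radii) of the row-wise Gershgorin disks:
    centers are the diagonal entries a_ii, radii the sums of
    absolute off-diagonal entries in each row."""
    A = np.asarray(A, dtype=complex)
    centers = np.diag(A)
    radii = np.abs(A).sum(axis=1) - np.abs(centers)
    return centers, radii

A = np.array([[ 4.0,  1.0, 0.5],
              [ 0.2, -3.0, 1.0],
              [ 0.1,  0.3, 1.0]])
centers, radii = gershgorin_disks(A)

# Every eigenvalue must lie in at least one disk.
for lam in np.linalg.eigvals(A):
    assert np.any(np.abs(lam - centers) <= radii + 1e-12)
```

Applying the same routine to A.T yields the column version of the disks described above.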

Proof

The proof of the Gershgorin circle theorem relies on basic properties of eigenvalues, eigenvectors, and the triangle inequality in the complex plane. Consider an n \times n complex matrix A = (a_{ij}) with eigenvalue \lambda and corresponding eigenvector x = (x_1, \dots, x_n)^T \neq 0. Without loss of generality, normalize x such that \|x\|_\infty = \max_k |x_k| = 1. Let i be an index where |x_i| = 1. The equation (A - \lambda I)x = 0 implies that the i-th component satisfies \sum_{j=1}^n (a_{ij} - \lambda \delta_{ij}) x_j = 0, where \delta_{ij} is the Kronecker delta. Rearranging gives (a_{ii} - \lambda) x_i + \sum_{j \neq i} a_{ij} x_j = 0, so \lambda - a_{ii} = \sum_{j \neq i} a_{ij} \frac{x_j}{x_i}. Taking absolute values yields |\lambda - a_{ii}| = \left| \sum_{j \neq i} a_{ij} \frac{x_j}{x_i} \right| \leq \sum_{j \neq i} |a_{ij}| \left| \frac{x_j}{x_i} \right|. Since |x_j| \leq 1 = |x_i| for all j, it follows that \left| \frac{x_j}{x_i} \right| \leq 1, and thus |\lambda - a_{ii}| \leq \sum_{j \neq i} |a_{ij}| = r_i, where r_i is the radius of the i-th Gershgorin disk. Therefore, \lambda lies in the disk centered at a_{ii} with radius r_i. The infinity norm is used here for its simplicity in bounding the ratios |x_j / x_i| directly by 1 without additional estimation. For the column version of the theorem, apply the row version to the transpose A^T. The eigenvalues of A^T coincide with those of A, and the row sums of absolute off-diagonal entries for A^T are precisely the column sums for A. Thus, every eigenvalue of A lies in at least one disk centered at a_{jj} with radius \sum_{i \neq j} |a_{ij}|.
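The key step, bounding \lambda by the disk belonging to the eigenvector's largest component, can be observed numerically. In this sketch (the random complex test matrix is an assumption for illustration), the disk selected by the argmax of |x_k| always contains the eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5))
lams, V = np.linalg.eig(A)      # columns of V are eigenvectors

for lam, x in zip(lams, V.T):
    i = int(np.argmax(np.abs(x)))              # index with |x_i| maximal
    r_i = np.abs(A[i]).sum() - abs(A[i, i])    # radius of the i-th disk
    # |lam - a_ii| <= r_i: exactly the inequality derived in the proof
    assert abs(lam - A[i, i]) <= r_i + 1e-9
```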

Interpretation

Disk Properties

Each Gershgorin disk D_i associated with an n \times n complex matrix A = (a_{ij}) is a closed disk in the complex plane, centered at the diagonal entry a_{ii} with radius r_i = \sum_{j \neq i} |a_{ij}|, the \ell_1-norm of the off-diagonal elements in the i-th row. The union G(A) = \bigcup_{i=1}^n D_i, known as the Gershgorin region, contains all eigenvalues of A. The disks exhibit specific invariance and scaling properties under diagonal similarity transformations. For a nonsingular diagonal matrix D = \operatorname{diag}(d_1, \dots, d_n), the transformed matrix B = D^{-1} A D has unchanged diagonal entries b_{ii} = a_{ii}, preserving the disk centers, while the new radii become r_i' = \sum_{j \neq i} |d_i^{-1} a_{ij} d_j|. Choosing the scaling factors d_k > 0 appropriately allows row and column scaling to minimize the radii, often tightening the overall bound provided by G(B), which still contains the eigenvalues of A since similarity preserves the spectrum. Disks frequently overlap, yielding conservative localization since multiple eigenvalues may lie within shared regions of G(A), potentially enlarging the effective bound beyond the precise locations. For an irreducible A, whose associated directed graph is strongly connected, the Gershgorin region G(A) is path-connected in the complex plane. If A is strictly diagonally dominant, meaning |a_{ii}| > r_i for each i, then A is nonsingular, and the Gershgorin disks bound the eigenvalues near the diagonal entries.
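The radius-scaling rule r_i' = \sum_{j \neq i} |a_{ij}| d_j / d_i can shrink the disks dramatically. In this sketch (the 2×2 matrix and the scaling vector d are hand-picked illustrations), the similarity leaves the centers fixed and even separates the disks:

```python
import numpy as np

A = np.array([[10.0, 4.0],
              [ 0.1, 1.0]])          # row radii: 4 and 0.1
d = np.array([1.0, 0.2])             # hand-picked positive scalings
B = np.diag(1 / d) @ A @ np.diag(d)  # B = D^{-1} A D

centers_A = np.diag(A); radii_A = np.abs(A).sum(1) - np.abs(centers_A)
centers_B = np.diag(B); radii_B = np.abs(B).sum(1) - np.abs(centers_B)

assert np.allclose(centers_A, centers_B)      # centers are preserved
assert np.allclose(radii_B, [0.8, 0.5])       # 4*(0.2/1) and 0.1*(1/0.2)

# The scaled disks [9.2, 10.8] and [0.5, 1.5] are disjoint, so each
# contains exactly one eigenvalue of A (similarity preserves the spectrum).
for lam in np.linalg.eigvals(A):
    assert np.sum(np.abs(lam - centers_B) <= radii_B) == 1
```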

Eigenvalue Localization

The Gershgorin circle theorem localizes all eigenvalues of a square matrix A = (a_{ij}) to the union of disks in the complex plane, each centered at a diagonal entry a_{ii} with radius r_i = \sum_{j \neq i} |a_{ij}|, thereby demonstrating that eigenvalues cluster near these diagonal entries, which often capture the primary behavior of the matrix. This clustering insight arises because the theorem exploits the row sums of absolute off-diagonal entries to bound deviations from the diagonal, providing a geometric enclosure that reflects the matrix's diagonal dominance or lack thereof. For Hermitian matrices, where all eigenvalues are real and the diagonal entries a_{ii} are real, the theorem confines the eigenvalues to the union of real line segments [a_{ii} - r_i, a_{ii} + r_i], effectively projecting the disks onto the real axis and yielding interval bounds that align directly with the spectrum's location. In general, for any matrix, the real parts of all eigenvalues lie between \min_i (\operatorname{Re}(a_{ii}) - r_i) and \max_i (\operatorname{Re}(a_{ii}) + r_i), since the horizontal extent of each disk is exactly the interval [\operatorname{Re}(a_{ii}) - r_i, \operatorname{Re}(a_{ii}) + r_i]. This has significant implications for stability analysis: if every disk lies entirely in the open left half-plane (i.e., \operatorname{Re}(a_{ii}) + r_i < 0 for all i), then all eigenvalues have negative real parts, implying that the matrix is Hurwitz stable and associated linear systems are asymptotically stable. Despite its utility, the theorem's bounds can be loose when off-diagonal entries are large relative to the diagonal, resulting in expansive radii that enclose a wide portion of the complex plane and reduce the precision of localization. Furthermore, while the disks contain the eigenvalues of the unperturbed matrix exactly, the pseudospectra, which characterize the eigenvalues of perturbed matrices and measure nonnormality, often extend beyond these boundaries, highlighting the theorem's focus on the unperturbed spectrum alone.
Compared to the spectral radius bound \rho(A) \leq \max_i (|a_{ii}| + r_i), which follows directly from the disk enclosures, the full theorem offers sharper localization by considering the union of disks rather than a single encompassing disk, particularly when diagonal entries vary substantially across rows.
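The left-half-plane condition \operatorname{Re}(a_{ii}) + r_i < 0 gives a sufficient (not necessary) Hurwitz test without computing eigenvalues. A small sketch (the function name and test matrix are illustrative choices):

```python
import numpy as np

def gershgorin_hurwitz(A):
    """Sufficient (not necessary) stability test: True when every
    Gershgorin disk lies in the open left half-plane."""
    A = np.asarray(A)
    centers = np.diag(A)
    radii = np.abs(A).sum(axis=1) - np.abs(centers)
    return bool(np.all(centers.real + radii < 0))

A = np.array([[-5.0,  1.0,  0.5],
              [ 0.5, -4.0,  1.0],
              [ 1.0,  0.5, -3.0]])
assert gershgorin_hurwitz(A)                      # all disks end at -1.5 < 0
assert np.all(np.linalg.eigvals(A).real < 0)      # eigenvalues agree
```

The converse fails: a Hurwitz matrix can still have disks that cross the imaginary axis, which is exactly the looseness discussed above.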

Refinements

Disjoint Disk Strengthening

The disjoint disk strengthening of the Gershgorin circle theorem, originally established by Sergei Gershgorin in his 1931 paper, provides a more precise localization of eigenvalues when the disks separate into disjoint groups. Specifically, suppose the n Gershgorin disks of an n \times n matrix A separate into k disjoint unions, where the j-th union consists of the disks corresponding to a subset S_j \subseteq \{1, \dots, n\} of row indices with |S_j| = m_j and \bigcup_{j=1}^k S_j = \{1, \dots, n\}. Then the union of the disks for each S_j contains precisely m_j eigenvalues (counting multiplicities) of A. This refinement enables exact multiplicity determination within each disjoint component, which is impossible in general without separation. A proof sketch relies on the continuity of eigenvalues with respect to the matrix entries. Consider a homotopy that scales the off-block entries (those connecting different S_j) by a parameter t \in [0,1], yielding a family of matrices A(t) with A(0) block diagonal (containing the m_j \times m_j principal submatrices along the diagonal blocks) and A(1) = A. The eigenvalues of A(t) vary continuously in t, and since the Gershgorin disks of A(1) are disjoint across blocks, no eigenvalue can cross between the block-localized regions during the deformation. Thus, each block's eigenvalues remain confined to its corresponding disk union. When disks overlap, this exact partitioning fails, preventing precise multiplicity assignment to subsets. For instance, the n \times n Jordan block with characteristic polynomial (z - \lambda)^n has all n eigenvalues at \lambda (with multiplicity n), but its Gershgorin disks overlap substantially, so the theorem cannot isolate individual eigenvalues or subgroups without additional structure.
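A numerical sketch of the counting statement (the matrix below, a 2×2 block weakly coupled to a third row, is an illustrative choice): the first two disks overlap each other but are disjoint from the third, so their union must contain exactly two eigenvalues and the third disk exactly one.

```python
import numpy as np

A = np.array([[ 1.00, 0.10, 0.05],
              [ 0.10, 1.20, 0.05],
              [ 0.02, 0.03, 10.0]])
centers = np.diag(A)
radii = np.abs(A).sum(1) - np.abs(centers)   # 0.15, 0.15, 0.05

# D1 and D2 cover [0.85, 1.35]; D3 covers [9.95, 10.05]: disjoint groups.
eigs = np.linalg.eigvals(A)
in_group12 = sum(np.abs(lam - centers[0]) <= radii[0] + 1e-12 or
                 np.abs(lam - centers[1]) <= radii[1] + 1e-12 for lam in eigs)
in_group3  = sum(np.abs(lam - centers[2]) <= radii[2] + 1e-12 for lam in eigs)
assert in_group12 == 2 and in_group3 == 1
```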

Taussky's Theorem

Taussky's refinement of the Gershgorin circle theorem, introduced in 1949, addresses the precise location of eigenvalues on the boundaries of the disks for irreducible matrices. For an irreducible complex matrix A, if an eigenvalue \lambda lies on the boundary of the union of the Gershgorin disks, then \lambda lies on the boundary of every Gershgorin disk. This result applies more broadly, including to real symmetric irreducible matrices, where the Gershgorin disks provide intervals on the real line [a_{ii} - r_i, a_{ii} + r_i] with r_i = \sum_{j \neq i} |a_{ij}|, and eigenvalues at the endpoints of these intervals (corresponding to the extreme values of the spectrum) must be simple or satisfy multiplicity conditions dictated by the structure of overlapping disks sharing that endpoint. For instance, in the case of a real symmetric irreducible matrix with positive off-diagonal entries, the largest eigenvalue (Perron root) is simple, ensuring no higher multiplicity at the rightmost endpoint unless multiple disks coincide exactly there. The proof for the general case employs the eigenvector with maximum modulus component, showing that boundary equality in one row propagates via irreducibility to all rows. For the symmetric case with positive off-diagonals, arguments analogous to those in the Perron-Frobenius theorem leverage the positive eigenvector: consider \lambda as an eigenvalue with normalized eigenvector x where \max_i |x_i| = 1. The Gershgorin theorem implies |\lambda - a_{ii}| \leq r_i for all i, with equality for some row k if \lambda is on the boundary of the k-th interval. This equality requires |x_j| = |x_k| for all j with a_{kj} \neq 0. Irreducibility ensures the associated graph is strongly connected, and positivity guarantees a positive eigenvector, propagating the equality condition across all rows via contradiction: if \lambda were interior to another disk, the strict inequality would violate the eigenvector's uniformity in magnitude.
This theorem strengthens the disjoint disks result by yielding more precise boundary placements, enabling tighter localization when eigenvalues touch the disk peripheries.

Examples

Basic Application

Consider the following 4×4 diagonally dominant matrix as a basic example to illustrate the application of the Gershgorin circle theorem: A = \begin{pmatrix} 3 & 1 & 0 & 0 \\ 1 & 4 & 1 & 0 \\ 0 & 1 & 4 & 1 \\ 0 & 0 & 1 & 3 \end{pmatrix} The Gershgorin disks for this matrix are determined by the diagonal entries as centers and the sums of the absolute values of the off-diagonal entries in each row as radii. For the first row, the center is 3 with radius |1| = 1, yielding the disk D_1 = \{ z \in \mathbb{C} : |z - 3| \leq 1 \}. For the second row, the center is 4 with radius |1| + |1| = 2, so D_2 = \{ z \in \mathbb{C} : |z - 4| \leq 2 \}. Similarly, the third row gives D_3 = \{ z \in \mathbb{C} : |z - 4| \leq 2 \}, and the fourth row gives D_4 = \{ z \in \mathbb{C} : |z - 3| \leq 1 \}. The union of these disks is the region in the complex plane covered by at least one D_i; since A is real symmetric, its eigenvalues are real and therefore confined to the intersection of this union with the real axis, the interval [2, 6], whose leftmost endpoint is \min(3-1, 4-2) = 2 and rightmost endpoint is \max(3+1, 4+2) = 6. By the Gershgorin circle theorem, all eigenvalues of A must lie within this union. The actual eigenvalues of A can be found by solving the characteristic equation \det(A - \lambda I) = 0, which yields the polynomial \lambda^4 - 14\lambda^3 + 70\lambda^2 - 148\lambda + 112 = 0. Factoring gives (\lambda - 2)(\lambda - 4)(\lambda^2 - 8\lambda + 14) = 0, so the eigenvalues are exactly 2, 4, and 4 \pm \sqrt{2} (approximately 2.586 and 5.414). All these values lie within the interval [2, 6], confirming the theorem's bound, with the lower bound achieved exactly at \lambda = 2. This example demonstrates the theorem's utility in providing a quick estimate for eigenvalue locations without full computation, particularly for diagonally dominant matrices where the disks are relatively tight.
However, the significant overlap among D_1, D_2, D_3, and D_4 results in a coarser upper bound (6 versus the actual maximum of about 5.414), highlighting how disk overlaps can reduce the precision of the localization.
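The computations above can be verified mechanically; this NumPy sketch reproduces the disks and the eigenvalues of the example matrix:

```python
import numpy as np

A = np.array([[3.0, 1.0, 0.0, 0.0],
              [1.0, 4.0, 1.0, 0.0],
              [0.0, 1.0, 4.0, 1.0],
              [0.0, 0.0, 1.0, 3.0]])
centers = np.diag(A)
radii = np.abs(A).sum(1) - np.abs(centers)          # 1, 2, 2, 1

# Real-axis extent of the Gershgorin set: [2, 6].
assert (centers - radii).min() == 2.0 and (centers + radii).max() == 6.0

eigs = np.linalg.eigvalsh(A)                        # symmetric: real spectrum
expected = np.sort([2.0, 4.0, 4.0 - np.sqrt(2), 4.0 + np.sqrt(2)])
assert np.allclose(eigs, expected)
assert eigs[0] >= 2.0 - 1e-12 and eigs[-1] <= 6.0   # inside the bound
```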

Taussky Illustration

Consider the symmetric matrix A = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 3 & 1 \\ 0 & 1 & 2 \end{pmatrix}, which is irreducible because its associated directed graph is strongly connected. The Gershgorin disks for A are centered at the diagonal entries with radii equal to the sums of the absolute values of the off-diagonal entries in each row: the first and third disks are both |z - 2| \leq 1 (corresponding to the real interval [1, 3]), and the second disk is |z - 3| \leq 2 (corresponding to [1, 5]). The union of these disks is [1, 5], so all eigenvalues of A lie in this interval. The characteristic equation of A is (\lambda - 2)(\lambda - 1)(\lambda - 4) = 0, yielding eigenvalues \lambda = 1, 2, 4. The eigenvalue \lambda = 1 lies on the boundary of the union [1, 5]. In accordance with Taussky's theorem for irreducible matrices, this boundary eigenvalue also lies on the boundary of every Gershgorin disk: |1 - 2| = 1 for the first and third disks, and |1 - 3| = 2 for the second disk. The eigenvalue \lambda = 1 is simple (multiplicity one), consistent with Taussky's observation that boundary eigenvalues of irreducible matrices cannot have full multiplicity unless the matrix is a multiple of the identity matrix. To illustrate the necessity of irreducibility in Taussky's theorem, consider the reducible symmetric block-diagonal matrix B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 2 \\ 0 & 2 & 2 \end{pmatrix}. The Gershgorin disks are the degenerate disk \{1\} for the first row, and |z - 2| \leq 2 (interval [0, 4]) for the second and third rows. The union is [0, 4]. The eigenvalues of B are 1 (from the first block) and 0 and 4 (from the second block, whose characteristic polynomial is \lambda^2 - 4\lambda = \lambda(\lambda - 4) = 0). Here, \lambda = 1 lies on the boundary of its own degenerate disk but in the strict interior of the other two disks containing it (|1 - 2| = 1 < 2), violating the all-disks boundary condition that Taussky's theorem guarantees for irreducible matrices.
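This boundary behavior is easy to confirm numerically (a NumPy sketch of the two matrices above):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
assert np.allclose(np.linalg.eigvalsh(A), [1.0, 2.0, 4.0])

# lambda = 1 sits on the boundary of EVERY disk, as Taussky requires:
centers = np.diag(A)
radii = np.abs(A).sum(1) - np.abs(centers)           # 1, 2, 1
assert np.allclose(np.abs(1.0 - centers), radii)

# Reducible counterexample: lambda = 1 is strictly interior to the
# second and third disks of B.
B = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 2.0],
              [0.0, 2.0, 2.0]])
cB = np.diag(B); rB = np.abs(B).sum(1) - np.abs(cB)  # 0, 2, 2
assert np.abs(1.0 - cB[1]) < rB[1]                   # 1 < 2: strict interior
```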

Applications

Numerical Preconditioning

In numerical analysis, the Gershgorin circle theorem plays a key role in preconditioning strategies for iterative solvers of linear systems Ax = b, particularly through row scaling techniques that enhance diagonal dominance. By applying a diagonal matrix D with positive entries, the matrix A can be equilibrated to D^{-1}AD, where the goal is to minimize the maximum row sum of the off-diagonal absolute values, thereby tightening the Gershgorin disks and improving the matrix's conditioning. Such scaling leverages the theorem's localization of eigenvalues near the diagonal, ensuring the scaled matrix is strictly diagonally dominant, a property that confines all eigenvalues to disks with radii less than the diagonal magnitudes. The preconditioning procedure typically involves selecting the diagonal entries of D to equalize row norms, often using the infinity norm, which directly applies to methods like Jacobi and Gauss-Seidel iterations for solving Ax = b. In the Jacobi method, the preconditioner is the diagonal part D of A, transforming the iteration matrix to I - D^{-1}A, while Gauss-Seidel uses the lower triangular part D - E. Convergence is guaranteed if the spectral radius \rho(I - D^{-1}A) < 1, a condition bounded by the Gershgorin theorem for diagonally dominant matrices, since the eigenvalues of D^{-1}A lie within disks centered at 1 with radii less than 1, implying those of the iteration matrix lie within the open unit disk. This approach is especially effective for sparse, non-symmetric systems where direct methods are prohibitive, and the theorem provides a posteriori verification of the preconditioner's quality without computing eigenvalues explicitly. Historically, this use of Gershgorin-based row scaling for preconditioning became a staple in 20th-century numerical analysis, appearing in foundational texts that analyzed iterative convergence for diagonally dominant matrices.
Early works established that such scaling not only accelerates Jacobi and Gauss-Seidel iterations but also extends to preconditioned methods, with the theorem serving as a theoretical basis for bounding error reduction rates.
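The convergence argument can be sketched numerically: for a strictly diagonally dominant A (the example matrix is an illustrative choice), the Gershgorin bound max_i r_i / |a_{ii}| dominates the spectral radius of the Jacobi iteration matrix I - D^{-1}A:

```python
import numpy as np

def jacobi_gershgorin_bound(A):
    """Gershgorin bound on rho(I - D^{-1}A): the iteration matrix has
    zero diagonal, so its disks are centered at 0 with radii
    sum_{j != i} |a_ij| / |a_ii|."""
    d = np.abs(np.diag(A))
    return ((np.abs(A).sum(1) - d) / d).max()

A = np.array([[5.0, 1.0, 1.0],
              [1.0, 6.0, 2.0],
              [0.5, 1.0, 4.0]])
bound = jacobi_gershgorin_bound(A)               # max(2/5, 3/6, 1.5/4) = 0.5
M = np.eye(3) - np.diag(1 / np.diag(A)) @ A      # Jacobi iteration matrix
rho = np.abs(np.linalg.eigvals(M)).max()

assert bound < 1                                 # strict diagonal dominance
assert rho <= bound + 1e-12                      # Gershgorin encloses rho
```

Since bound < 1 certifies \rho(I - D^{-1}A) < 1, the Jacobi iteration converges for this matrix without any eigenvalue computation.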

Eigenvalue Bounds

The Gershgorin circle theorem provides a computationally efficient method for obtaining rough bounds on the eigenvalues of large sparse matrices, where full eigendecomposition is prohibitive due to high dimensionality. In control theory, these bounds are particularly valuable for assessing system stability without solving the characteristic equation; for instance, if all Gershgorin disks lie in the open left half of the complex plane (Re(z) < 0), the matrix is Hurwitz stable, implying asymptotic stability of the linear system. This application is demonstrated in the analysis of coupled nonlinear oscillators, where the theorem confirms that eigenvalues of the Jacobian matrix have negative real parts, ensuring bounded trajectories and synchronization stability as validated by numerical simulations. In spectral graph theory, the theorem is applied to bound eigenvalues of the graph Laplacian, which encodes connectivity and degree properties relevant to diffusion processes and network analysis. For Laplacians of signed or unit gain graphs, Gershgorin disks yield upper bounds on the largest eigenvalue λ_n in terms of vertex degrees and gain magnitudes, such as λ_n(Φ) ≤ max_i {d_i + ∑_{j ∼ i} |c_j|/|c_i|}, where d_i is the degree of vertex i and the c_k are arbitrary positive scaling parameters; equality considerations hold under switching to unsigned graphs. These bounds facilitate quick assessments of spectral gaps, which influence convergence rates in consensus algorithms on graphs. For non-normal matrices, where eigenvalues alone do not capture sensitivity to perturbations, the theorem extends to bounding pseudospectra, the regions where eigenvalues of perturbed matrices can lie. Recent Gershgorin-type inclusion sets, derived via block decompositions of the matrix into perturbations of block-tridiagonal forms, enclose the ε-pseudospectrum within unions of disks or more refined regions like Γ_n^ε(A), offering sharper enclosures than classical disks for large Toeplitz matrices common in non-normal discretizations. Such bounds are crucial for understanding transient growth in non-normal dynamics, as in fluid flows.
Despite its simplicity, the theorem's bounds can be loose for matrices with overlapping disks or weak diagonal dominance, limiting precision in eigenvalue estimation. In the context of Markov chains, Gershgorin disks bound the eigenvalues of the transition matrix, constraining the spectral gap and thus the mixing time or relaxation rates; for continuous-time chains, the disks of the generator matrix are centered on the negative diagonal entries (the outflow rates) with radii given by the total transition rates, so the second-largest eigenvalue magnitude informs convergence to stationarity without detailed computation. These bounds complement localization results by providing algebraic enclosures that align with physical constraints like detailed balance in reversible chains.
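For a concrete Markov-chain sketch (the 3-state transition matrix is an illustrative choice), the disks of a row-stochastic matrix are |z - p_{ii}| ≤ 1 - p_{ii}; every such disk lies inside the closed unit disk, which recovers the bound |λ| ≤ 1 on the spectrum:

```python
import numpy as np

P = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.7, 0.1],
              [0.3, 0.3, 0.4]])
assert np.allclose(P.sum(1), 1.0)                 # row-stochastic

centers = np.diag(P)
radii = 1.0 - centers                             # off-diagonal row sums
# Each disk |z - p_ii| <= 1 - p_ii lies in the closed unit disk:
assert np.all(np.abs(centers) + radii <= 1.0 + 1e-12)

eigs = np.linalg.eigvals(P)
assert np.all(np.abs(eigs) <= 1.0 + 1e-12)        # spectrum in the unit disk
```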
