
Min-max theorem

The min-max theorem, also known as the Courant–Fischer theorem or the Courant–Fischer–Weyl min-max principle, is a fundamental result in linear algebra and functional analysis. It provides a variational characterization of the eigenvalues of a Hermitian matrix (or, more generally, a compact self-adjoint operator on a Hilbert space) using minima and maxima of the Rayleigh quotient over subspaces. For an n \times n Hermitian matrix A with eigenvalues \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n, the theorem states that the k-th largest eigenvalue is given by \lambda_k = \min_{\dim V = n-k+1} \max_{\substack{x \in V \\ \|x\|=1}} x^* A x = \max_{\dim W = k} \min_{\substack{x \in W \\ \|x\|=1}} x^* A x, where V and W are subspaces of \mathbb{C}^n, x^* denotes the conjugate transpose, and \| \cdot \| is the Euclidean norm. This formulation equates the k-th eigenvalue to an optimization problem over Rayleigh quotients R_A(x) = \frac{x^* A x}{x^* x} restricted to appropriate subspaces, highlighting the extremal properties of eigenvectors.

The min-max characterization originates from Ernst Fischer's 1905 work on extremal characterizations of the eigenvalues of quadratic forms, Hermann Weyl's 1912 extension to intermediate eigenvalues, and Richard Courant's 1920 unification into the full min-max principle. These developments built on Lord Rayleigh's earlier ideas (1870s) about variational methods for eigenvalues in physics, particularly vibration problems. The theorem's proof relies on the spectral theorem for Hermitian matrices and properties of orthogonal projections.

The result extends beyond finite dimensions to compact self-adjoint operators on Hilbert spaces, where eigenvalues are characterized similarly using closed subspaces, and to bounded self-adjoint operators under additional conditions. Applications include bounding eigenvalues without full computation, as in the Cauchy interlacing theorem for principal submatrices and Lidskii's inequality for eigenvalue perturbations. It also underpins the min-max principle for singular values of general matrices and numerical methods such as the Lanczos algorithm for approximating spectra. In quantum mechanics it justifies variational principles for energy levels, while in graph theory it applies to the spectrum of the graph Laplacian via Rayleigh quotients of functions on the vertices.
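As a quick numerical illustration of the max-min form, the following minimal sketch (assuming NumPy; the random matrix, the subspace dimension k, and the helper min_rayleigh_on_subspace are illustrative choices, not part of the theorem) checks that every k-dimensional trial subspace yields a minimum Rayleigh quotient at most \lambda_k, with equality attained on the span of the top k eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random Hermitian test matrix with eigenvalues sorted in decreasing order.
n, k = 6, 3
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (X + X.conj().T) / 2
eigvals, eigvecs = np.linalg.eigh(A)                 # ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending order

def min_rayleigh_on_subspace(A, Q):
    """Smallest Rayleigh quotient of A over the column span of Q (orthonormal columns)."""
    return np.linalg.eigvalsh(Q.conj().T @ A @ Q)[0].real

# Any k-dimensional subspace gives a minimum Rayleigh quotient <= lambda_k ...
for _ in range(1000):
    Q = np.linalg.qr(rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k)))[0]
    assert min_rayleigh_on_subspace(A, Q) <= eigvals[k - 1] + 1e-10

# ... and the span of the top-k eigenvectors attains equality (max-min form).
assert np.isclose(min_rayleigh_on_subspace(A, eigvecs[:, :k]), eigvals[k - 1])
```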

Formulation for matrices

The Courant–Fischer theorem

The Courant–Fischer theorem, named after Ernst Fischer, who proved a version of the result in 1905, and Richard Courant, who extended it in 1920, provides a variational characterization of the eigenvalues of a Hermitian matrix in finite dimensions. For a Hermitian matrix A \in \mathbb{C}^{n \times n}, the Rayleigh quotient is defined as R_A(x) = \frac{\langle Ax, x \rangle}{\langle x, x \rangle} for nonzero x \in \mathbb{C}^n, where \langle \cdot, \cdot \rangle denotes the standard inner product; the eigenvalues of A correspond to the critical values of this quotient under the constraint \|x\| = 1. Let the eigenvalues of A be ordered in decreasing order: \lambda_1(A) \geq \lambda_2(A) \geq \cdots \geq \lambda_n(A).

The max-min form of the principle states that the k-th largest eigenvalue satisfies \lambda_k(A) = \max_{\dim V = k} \min_{\substack{x \in V \\ \|x\| = 1}} \langle Ax, x \rangle, where the maximum is taken over all k-dimensional subspaces V of \mathbb{C}^n. Dually, the min-max form gives \lambda_k(A) = \min_{\dim W = n - k + 1} \max_{\substack{x \in W \\ \|x\| = 1}} \langle Ax, x \rangle, with the minimum over all subspaces W of dimension n - k + 1. These characterizations hold under the assumption that A is Hermitian and the underlying space is finite-dimensional over the complex numbers. The theorem arises from the Rayleigh–Ritz variational method, which identifies eigenvalues as extrema of the Rayleigh quotient over appropriate subspaces, providing both theoretical insight and a basis for the numerical approximation of eigensystems.

A related result, the Ky Fan maximum principle, extends the characterization to sums of the k largest eigenvalues: \sum_{i=1}^k \lambda_i(A) = \max \sum_{i=1}^k \langle A x_i, x_i \rangle, where the maximum is taken over all orthonormal sets \{x_1, \dots, x_k\} \subset \mathbb{C}^n; again A is assumed Hermitian and the space finite-dimensional.
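The Ky Fan maximum principle can also be checked numerically. The sketch below (assuming NumPy; the test matrix, the frame size k, and the helper frame_objective are illustrative) verifies that random orthonormal k-frames never exceed the sum of the k largest eigenvalues, while the top-k eigenvectors attain it.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 7, 3

# Random Hermitian test matrix.
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (X + X.conj().T) / 2
w, V = np.linalg.eigh(A)          # ascending eigenvalues
w, V = w[::-1], V[:, ::-1]        # descending
top_k_sum = w[:k].sum()

def frame_objective(A, Q):
    """Sum of <A q_i, q_i> over the orthonormal columns q_i of Q."""
    return np.trace(Q.conj().T @ A @ Q).real

# Random orthonormal k-frames never exceed the sum of the k largest eigenvalues ...
for _ in range(1000):
    Q = np.linalg.qr(rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k)))[0]
    assert frame_objective(A, Q) <= top_k_sum + 1e-10

# ... and the top-k eigenvectors attain it.
assert np.isclose(frame_objective(A, V[:, :k]), top_k_sum)
```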

Counterexample in the non-Hermitian case

The self-adjointness (Hermitian) property is essential to the min-max theorem because non-Hermitian matrices generally possess complex eigenvalues, and the associated Rayleigh quotient—defined for a matrix A and nonzero vector x as R_A(x) = \frac{x^* A x}{x^* x}, which in general takes complex values—fails to provide a variational characterization that aligns with the eigenvalues in the same manner as for Hermitian matrices. Instead, the real part of the Rayleigh quotient is often considered to analogize the Hermitian case, but even this does not bound the eigenvalues appropriately for non-self-adjoint operators, leading to values that lie outside the spectrum.

A concrete counterexample is the nilpotent Jordan block A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, which has a double eigenvalue 0 (algebraic multiplicity 2 but geometric multiplicity 1). For a unit vector x = \begin{pmatrix} a \\ b \end{pmatrix} with |a|^2 + |b|^2 = 1, the Rayleigh quotient simplifies to R_A(x) = \overline{a} b, so its real part is \operatorname{Re}(\overline{a} b). To maximize \operatorname{Re}(\overline{a} b), it suffices to consider real a and b (the maximum occurs in the real case); the expression becomes a b subject to a^2 + b^2 = 1. By Lagrange multipliers or the parametrization a = \cos \theta, b = \sin \theta, this is \frac{1}{2} \sin 2\theta, attaining a maximum of \frac{1}{2} at \theta = \frac{\pi}{4} (i.e., a = b = \frac{1}{\sqrt{2}}). Thus, the supremum of the real part of the Rayleigh quotient over unit vectors is \frac{1}{2}, exceeding the largest eigenvalue of 0. This demonstrates that the min-max procedure, when applied via the Rayleigh quotient, produces a value outside the eigenvalue spectrum for non-self-adjoint operators, underscoring the necessity of self-adjointness.

Extensions exist for normal matrices (which are unitarily diagonalizable like Hermitian ones) and more generally via the Jordan canonical form, though these require adjustments beyond the standard min-max formulation.
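A short numerical check of this counterexample (a minimal sketch assuming NumPy): since \operatorname{Re}(x^* A x) = x^* \tfrac{A + A^*}{2} x, the supremum of the real part of the Rayleigh quotient equals the largest eigenvalue of the Hermitian part of A, which is \frac{1}{2} here, exceeding the eigenvalue 0.

```python
import numpy as np

# Nilpotent Jordan block: both eigenvalues are 0.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])

# Re(x^* A x) = x^* ((A + A^*)/2) x, so the supremum over unit vectors
# is the largest eigenvalue of the Hermitian part of A.
H = (A + A.T) / 2
print(np.linalg.eigvalsh(H))          # [-0.5, 0.5]

# Direct check at x = (1, 1)/sqrt(2): Re(conj(a) b) = 1/2 > 0.
x = np.array([1.0, 1.0]) / np.sqrt(2)
print(x @ A @ x)                      # 0.5
```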

Generalizations to operators

Compact self-adjoint operators

In a separable infinite-dimensional Hilbert space \mathcal{H}, a positive compact self-adjoint operator A possesses a countable set of real eigenvalues \{\lambda_n\}_{n=1}^\infty, which can be arranged in decreasing order \lambda_1 \geq \lambda_2 \geq \cdots \geq 0 with \lim_{n \to \infty} \lambda_n = 0, and each non-zero eigenvalue has finite multiplicity. This spectral structure arises because compactness forces the spectrum away from zero to consist of isolated eigenvalues of finite multiplicity, accumulating only at zero.

The min-max theorem provides a variational characterization of these eigenvalues through finite-dimensional subspaces. Specifically, the k-th largest eigenvalue is given by \lambda_k = \max_{\substack{S \subset \mathcal{H} \\ \dim S = k}} \min_{\substack{x \in S \\ \|x\| = 1}} \langle Ax, x \rangle, where the maximum is taken over all k-dimensional subspaces S of \mathcal{H}. This formulation justifies the ordering of the eigenvalues by optimizing the Rayleigh quotient over subspaces of increasing dimension.

A dual max-min characterization complements this result. One equivalent form is \lambda_k = \min_{\substack{T \subset \mathcal{H} \\ \dim T = k-1}} \max_{\substack{x \in T^\perp \\ \|x\| = 1}} \langle Ax, x \rangle, where the minimum is over all (k-1)-dimensional subspaces T, and the maximum is over unit vectors orthogonal to T. Alternatively, it can be expressed in terms of codimension: \lambda_k = \min_{\substack{U \subset \mathcal{H} \\ \operatorname{codim} U = k-1}} \max_{\substack{x \in U \\ \|x\| = 1}} \langle Ax, x \rangle. These dual principles ensure that the eigenvalues are well-defined and capture the essential spectral properties without relying on explicit knowledge of the eigenvectors.

Unlike the finite-dimensional case, the infinite-dimensional setting requires restricting to finite-dimensional (or finite-codimensional) trial subspaces: minimizing the Rayleigh quotient over the entire space yields its infimum, which is zero for a positive compact operator and in general not an eigenvalue. Compactness facilitates approximation by finite-rank operators, whose spectra converge to that of A, thereby extending the finite-dimensional min-max theorem to this context while preserving the discreteness and decay of the eigenvalues.
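As a concrete sketch (an illustrative assumption, not an operator from the text), consider the compact diagonal operator A e_j = e_j / j on \ell^2, whose k-th largest eigenvalue is 1/k. Truncating to the first N coordinates and testing random k-dimensional trial subspaces illustrates the max-min characterization numerically; NumPy and the helper min_rayleigh are assumed.

```python
import numpy as np

rng = np.random.default_rng(2)

# Finite section of the compact operator A e_j = e_j / j on l^2
# (truncation to the first N coordinates as a toy stand-in).
N, k = 200, 4
A = np.diag(1.0 / np.arange(1, N + 1))
lam_k = 1.0 / k                              # k-th largest eigenvalue of the operator

def min_rayleigh(A, Q):
    """Smallest Rayleigh quotient of A over the column span of Q (orthonormal columns)."""
    return np.linalg.eigvalsh(Q.T @ A @ Q)[0]

# Random k-dimensional trial subspaces never beat lambda_k ...
for _ in range(500):
    Q = np.linalg.qr(rng.standard_normal((N, k)))[0]
    assert min_rayleigh(A, Q) <= lam_k + 1e-12

# ... and span{e_1, ..., e_k} attains it.
E = np.eye(N)[:, :k]
assert np.isclose(min_rayleigh(A, E), lam_k)
```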

Bounded self-adjoint operators

The min-max theorem extends naturally to bounded self-adjoint operators on a separable Hilbert space \mathcal{H}. Consider a bounded self-adjoint operator A: \mathcal{H} \to \mathcal{H}, which is defined everywhere on \mathcal{H} and satisfies \langle A x, y \rangle = \langle x, A y \rangle for all x, y \in \mathcal{H}. By the spectral theorem, A admits a spectral measure E_A, a projection-valued measure on \mathbb{R}, such that A = \int_{\sigma(A)} \lambda \, dE_A(\lambda), where \sigma(A) \subset \mathbb{R} is the spectrum, potentially comprising both point spectrum (eigenvalues) and continuous spectrum. This framework allows the theorem to characterize eigenvalues without requiring compactness, accommodating operators whose spectrum includes continuous components.

The min-max values of A, ordered increasingly as \lambda_1 \leq \lambda_2 \leq \cdots and counted with multiplicity, are defined via the spectral projections as \lambda_k = \min \left\{ \mu \in \mathbb{R} : \dim \operatorname{ran} E_A((-\infty, \mu]) \geq k \right\}. Here, \dim \operatorname{ran} E_A((-\infty, \mu]) denotes the dimension of the spectral subspace corresponding to the part of the spectrum up to \mu, which captures the total multiplicity of the spectrum below or equal to \mu. This formulation relies directly on the spectral theorem to measure multiplicity through the range of E_A, even when the spectrum contains continuous parts where no eigenvalues exist. If the essential spectrum \sigma_{\rm ess}(A) intervenes, only finitely many \lambda_k may lie below its infimum, with subsequent \lambda_k equal to \inf \sigma_{\rm ess}(A).

A variational principle provides an equivalent characterization without explicit reference to the spectral measure. Since A is bounded, it is in particular bounded below, and the n-th min-max value is \lambda_n = \inf_{\{\psi_1, \dots, \psi_n\}} \max_{\psi \in \operatorname{span}\{\psi_1, \dots, \psi_n\}, \|\psi\|=1} \langle A \psi, \psi \rangle, where the infimum is taken over all sets of n pairwise orthogonal nonzero vectors \psi_1, \dots, \psi_n \in \mathcal{H} (or equivalently, over all n-dimensional subspaces). This yields the eigenvalues below \inf \sigma_{\rm ess}(A), and equals \inf \sigma_{\rm ess}(A) thereafter, with the theorem applying to isolated eigenvalues via finite-dimensional approximations from the spectral subspaces. The spectral theorem justifies this equivalence, since the quadratic form \langle A \psi, \psi \rangle = \int_{\sigma(A)} \lambda \, d\langle E_A(\lambda) \psi, \psi \rangle integrates against the spectral measure. Unlike the compact case, no discreteness of the non-zero spectrum is assumed; the continuous spectrum is handled through the essential spectrum threshold.
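The saturation of the min-max values at \inf \sigma_{\rm ess}(A) can be illustrated with a toy finite-dimensional model (an illustrative assumption, not an operator from the text): two isolated eigenvalues -2 and -1 below an "essential spectrum" filling [0, 1], mimicked by N equally spaced points. NumPy and the helper min_max_values are assumed.

```python
import numpy as np

# Toy diagonal model of a self-adjoint operator: isolated eigenvalues -2 and -1
# below a band of N points filling [0, 1], standing in for continuous spectrum.
def min_max_values(N, how_many=5):
    band = np.linspace(0.0, 1.0, N)                     # mimics the continuous part
    A = np.diag(np.concatenate(([-2.0, -1.0], band)))   # diagonal self-adjoint model
    return np.linalg.eigvalsh(A)[:how_many]             # lowest min-max values

for N in (10, 100, 1000):
    print(N, min_max_values(N))
# The first two values remain at -2 and -1 (the isolated eigenvalues below the band),
# while the third and later values sit at or near 0, the infimum of the modeled
# essential spectrum, approaching it as N grows.
```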

Applications

Min-max principle for singular values

The singular values of a matrix M \in \mathbb{C}^{m \times n}, denoted \sigma_1^\downarrow(M) \geq \sigma_2^\downarrow(M) \geq \cdots \geq \sigma_{\min(m,n)}^\downarrow(M) \geq 0, are the eigenvalues (in decreasing order) of the matrix \sqrt{M^* M}, where M^* denotes the conjugate transpose of M, with multiplicities preserved and additional zero eigenvalues when m \neq n. This definition extends naturally to compact operators M on a Hilbert space, where the singular values form a sequence converging to zero.

The min-max principle provides a variational characterization of these singular values without requiring M to be square or Hermitian. Specifically, the k-th largest singular value satisfies \sigma_k(M) = \max_{\dim S = k} \min_{\substack{x \in S \\ \|x\| = 1}} \|M x\|, where the maximum is taken over all k-dimensional subspaces S of the domain of M. A dual min-max formulation is \sigma_k(M) = \min_{\dim T = n - k + 1} \max_{\substack{x \in T \\ \|x\| = 1}} \|M x\|, with the minimum over subspaces T of dimension n - k + 1, assuming n \leq m without loss of generality. These expressions follow from applying the Courant–Fischer theorem to the eigenvalues of the self-adjoint operator M^* M, since \|Mx\|^2 = \langle M^* M x, x \rangle.

This principle characterizes the extremal behavior of the norms \|Mx\| on restricted subspaces, offering insight into how the action of M concentrates on low-dimensional subspaces; it applies directly to non-square matrices, where the singular values quantify "effective rank" in settings where an eigenvalue decomposition is unavailable. The eigenvalue min-max theorem for Hermitian operators provides the foundational variational framework adapted here to singular values. The characterization holds more broadly for compact operators on separable Hilbert spaces, where the singular values \{\sigma_k\} form a countable sequence with \sigma_k \to 0 as k \to \infty, enabling approximations via finite-rank projections. In this form it is a direct analogue of the eigenvalue min-max principle, applicable without any Hermitian assumption on the operator itself.
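The following minimal sketch (assuming NumPy; the test matrix, the subspace dimension k, and the helper min_norm_on_subspace are illustrative) compares the max-min characterization of \sigma_k with the singular values returned by a standard SVD.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, k = 8, 5, 2

M = rng.standard_normal((m, n))
sigma = np.linalg.svd(M, compute_uv=False)       # descending singular values

def min_norm_on_subspace(M, Q):
    """Smallest value of ||Mx|| over unit vectors x in the column span of Q."""
    # Equals the square root of the smallest eigenvalue of Q^T (M^T M) Q.
    return np.sqrt(np.linalg.eigvalsh(Q.T @ (M.T @ M) @ Q)[0])

# Any k-dimensional subspace of the domain gives min ||Mx|| <= sigma_k ...
for _ in range(1000):
    Q = np.linalg.qr(rng.standard_normal((n, k)))[0]
    assert min_norm_on_subspace(M, Q) <= sigma[k - 1] + 1e-10

# ... and the span of the top-k right singular vectors attains it.
V = np.linalg.svd(M)[2].T                        # columns are right singular vectors
assert np.isclose(min_norm_on_subspace(M, V[:, :k]), sigma[k - 1])
```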

Cauchy interlacing theorem

The Cauchy interlacing theorem provides a fundamental relationship between the eigenvalues of a Hermitian matrix and those of its principal submatrices. Specifically, let A be an n \times n Hermitian matrix with eigenvalues \lambda_1(A) \geq \lambda_2(A) \geq \cdots \geq \lambda_n(A) in decreasing order, and let B be an m \times m principal submatrix of A with m < n and eigenvalues \lambda_1(B) \geq \lambda_2(B) \geq \cdots \geq \lambda_m(B), also in decreasing order. Then the eigenvalues interlace as follows: \lambda_{n-m+j}(A) \leq \lambda_j(B) \leq \lambda_j(A), \quad j = 1, 2, \dots, m. This inequality implies that each eigenvalue of the submatrix B lies between corresponding eigenvalues of the original matrix A, ensuring that the spectrum of B "interlaces" within the spectrum of A.

The proof of the theorem relies on the min-max principle (Courant–Fischer) for eigenvalues of Hermitian matrices. To establish the upper bound \lambda_j(B) \leq \lambda_j(A), note that \lambda_j(A) is the maximum over all j-dimensional subspaces of the minimum Rayleigh quotient x^* A x / x^* x for unit vectors x in that subspace. A j-dimensional subspace attaining \lambda_j(B) embeds naturally into \mathbb{C}^n via the coordinate subspace associated with the principal submatrix, yielding a j-dimensional subspace on which the minimum Rayleigh quotient of A is at least \lambda_j(B); thus the overall maximum for A satisfies \lambda_j(A) \geq \lambda_j(B). For the lower bound \lambda_{n-m+j}(A) \leq \lambda_j(B), a dual argument uses the min-max characterization: \lambda_{n-m+j}(A) is the minimum over subspaces of dimension n - (n-m+j) + 1 = m - j + 1 of the maximum Rayleigh quotient, and taking as trial subspace the embedded span of the eigenvectors of B associated with \lambda_j(B), \dots, \lambda_m(B), which has dimension m - j + 1 and on which the Rayleigh quotient of A is at most \lambda_j(B), yields the inequality.

The theorem applies to any Hermitian matrix, where a principal submatrix is formed by retaining a subset of m rows and the corresponding m columns. It extends naturally to bordered matrices, obtained by adding a single row and column to an (n-1) \times (n-1) Hermitian matrix B, in which case the eigenvalues of B interlace those of the bordered n \times n matrix A: \lambda_1(A) \geq \lambda_1(B) \geq \lambda_2(A) \geq \cdots \geq \lambda_{n-1}(B) \geq \lambda_n(A).

As an illustrative example, consider the 3 \times 3 diagonal Hermitian matrix A = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix}, with eigenvalues \lambda_1(A) = 3 > \lambda_2(A) = 2 > \lambda_3(A) = 1. The top-left 2 \times 2 principal submatrix is B = \operatorname{diag}(3, 2), with eigenvalues 3 > 2. For j=1, \lambda_{3-2+1}(A) = \lambda_2(A) = 2 \leq 3 = \lambda_1(B) \leq 3 = \lambda_1(A); for j=2, \lambda_{3-2+2}(A) = \lambda_3(A) = 1 \leq 2 = \lambda_2(B) \leq 2 = \lambda_2(A). Similarly, the submatrix obtained from rows and columns 1 and 3 has eigenvalues 3 > 1, satisfying 2 \leq 3 \leq 3 for j=1 and 1 \leq 1 \leq 2 for j=2.

The theorem is named after Augustin-Louis Cauchy, who established it in 1829 while analyzing secular inequalities in planetary motion via the characteristic equation of symmetric systems. Extensions of interlacing principles appear in spectral graph theory, where they bound eigenvalues of adjacency matrices of induced subgraphs.
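The interlacing inequalities are easy to verify numerically. The sketch below (assuming NumPy; the random Hermitian matrix and the index subset are illustrative) checks \lambda_{n-m+j}(A) \leq \lambda_j(B) \leq \lambda_j(A) for a randomly chosen principal submatrix.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 6, 4

# Random Hermitian matrix and a principal submatrix on a random index subset.
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (X + X.conj().T) / 2
idx = np.sort(rng.choice(n, size=m, replace=False))
B = A[np.ix_(idx, idx)]

lam_A = np.linalg.eigvalsh(A)[::-1]   # descending eigenvalues of A
lam_B = np.linalg.eigvalsh(B)[::-1]   # descending eigenvalues of B

for j in range(1, m + 1):             # 1-based indices as in the statement
    assert lam_A[n - m + j - 1] <= lam_B[j - 1] + 1e-12
    assert lam_B[j - 1] <= lam_A[j - 1] + 1e-12
```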

Lidskii's inequality

Lidskii's inequality provides a bound on the changes in the eigenvalues of a Hermitian matrix under additive Hermitian perturbations. Specifically, for Hermitian matrices A and B of size n \times n, with eigenvalues \lambda_1(A) \geq \cdots \geq \lambda_n(A) and \mu_1(B) \geq \cdots \geq \mu_n(B), the vector of differences \lambda_i(A + B) - \lambda_i(A), i = 1, \dots, n, is majorized by the vector of eigenvalues of B. Concretely, for any indices 1 \leq i_1 < \cdots < i_k \leq n, \sum_{j=1}^k \bigl( \lambda_{i_j}(A + B) - \lambda_{i_j}(A) \bigr) \leq \sum_{i=1}^k \mu_i(B), and in particular \sum_{i=1}^k \bigl( \lambda_i(A + B) - \lambda_i(A) \bigr) \leq \sum_{i=1}^k \mu_i(B) for all k = 1, \dots, n, with equality for k = n since the trace is additive.

The partial-sum bound admits a variational proof leveraging the min-max theorem (Courant–Fischer characterization) in the form of the Ky Fan maximum principle: \sum_{i=1}^k \lambda_i(A + B) = \max_{\dim V = k} \sum_{v \in \mathcal{O}(V)} v^* (A + B) v, where \mathcal{O}(V) is an orthonormal basis of V. By linearity, \sum_{v \in \mathcal{O}(V)} v^* (A + B) v = \sum_{v \in \mathcal{O}(V)} v^* A v + \sum_{v \in \mathcal{O}(V)} v^* B v \leq \sum_{i=1}^k \lambda_i(A) + \sum_{i=1}^k \mu_i(B), applying the same maximum principle separately to A and to B. Taking the maximum over V and rearranging yields the partial-sum inequality for the eigenvalue shifts.

Weyl's inequalities provide closely related pointwise bounds derived from the same min-max machinery: the upper bound \lambda_{i+j-1}(A + B) \leq \lambda_i(A) + \mu_j(B) and the lower bound \lambda_{i+j-n}(A + B) \geq \lambda_i(A) + \mu_j(B). These constrain individual eigenvalue movements, complementing the majorization bounds on partial sums.

In applications, Lidskii's inequality quantifies how the eigenvalues of a Hermitian matrix shift under perturbations A \to A + B, offering bounds on spectral stability that are essential in numerical analysis and perturbation theory; for example, it implies that each eigenvalue of A + B lies within distance \|B\| (the spectral norm of B) of the corresponding eigenvalue of A. Originally established by Victor Lidskii in 1950, the result remains in active use, with recent applications in quantum information theory, such as optimizing f-divergences over unitary orbits and analyzing induced metrics on density matrices.
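The majorization statement can be tested exhaustively for small matrices. The sketch below (assuming NumPy; the random Hermitian matrices and the helper rand_hermitian are illustrative) checks the Lidskii bound over every index set and verifies the trace equality.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(5)
n = 5

def rand_hermitian(n):
    """Random Hermitian test matrix."""
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (X + X.conj().T) / 2

A, B = rand_hermitian(n), rand_hermitian(n)
lam = lambda M: np.linalg.eigvalsh(M)[::-1]      # descending eigenvalues
dA, dAB, dB = lam(A), lam(A + B), lam(B)

# Lidskii: for any indices i_1 < ... < i_k, the eigenvalue shifts are bounded
# by the sum of the k largest eigenvalues of B.
for k in range(1, n + 1):
    for idx in combinations(range(n), k):
        shift = sum(dAB[i] - dA[i] for i in idx)
        assert shift <= dB[:k].sum() + 1e-10

# Equality of totals: trace(A + B) = trace(A) + trace(B).
assert np.isclose((dAB - dA).sum(), dB.sum())
```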
