Rayleigh quotient iteration
Rayleigh quotient iteration (RQI) is an iterative eigenvalue algorithm in numerical linear algebra designed to compute eigenvalues and corresponding eigenvectors of a symmetric matrix by combining the Rayleigh quotient for eigenvalue estimation with inverse iteration for eigenvector refinement, achieving rapid cubic convergence for nearly all initial vectors.[1]
The algorithm starts with an initial unit vector v^{(0)} and computes the Rayleigh quotient \lambda^{(0)} = r(v^{(0)}) = \frac{(v^{(0)})^T A v^{(0)}}{(v^{(0)})^T v^{(0)}} as an initial eigenvalue approximation, where A is the symmetric matrix.[1] In each subsequent iteration, it solves the shifted linear system (A - \lambda^{(k-1)} I) w = v^{(k-1)} for w, normalizes to obtain v^{(k)} = w / \|w\|, and updates the eigenvalue estimate \lambda^{(k)} = r(v^{(k)}).[1] This process leverages the stationarity of the Rayleigh quotient at eigenvectors, ensuring the residual norm decreases monotonically, which contributes to its global convergence properties for symmetric matrices.[2]
RQI demonstrates cubic convergence asymptotically, where the error in the eigenvalue approximation satisfies |\lambda^{(k)} - \lambda_j| = O(|\lambda^{(k-1)} - \lambda_j|^3) near the target eigenvalue \lambda_j, and similarly for the eigenvector, often tripling the number of accurate digits per iteration.[1] This makes it more efficient than standard inverse iteration with fixed shifts, though it requires solving a new linear system at each step, increasing computational cost unless factorizations are inexpensive.[3] While primarily effective for symmetric matrices, generalizations exist for nonsymmetric cases, though they may lose some convergence guarantees due to nonnormality.[2]
The Rayleigh quotient itself traces back to the 1870s work of Lord Rayleigh on vibration theory, where it was used to approximate natural frequencies from mode shapes.[2] The full iterative method emerged in the 1950s alongside digital computing, with early analyses by Temple and Crandall on convergence, followed by Ostrowski's 1958–1959 papers on local behavior and Kahan's 1968 proof of global convergence for symmetric matrices from almost all starting vectors.[2] RQI has since become a foundational technique, influencing algorithms like the QR method for eigensystems.[1]
Theoretical Foundations
Rayleigh Quotient
The Rayleigh quotient for a symmetric matrix A \in \mathbb{R}^{n \times n} and a nonzero vector x \in \mathbb{R}^n is defined as
R(x) = \frac{x^T A x}{x^T x}.
This expression represents a normalized quadratic form, where the numerator x^T A x measures the action of A on x in the direction of x, and the denominator ensures scale invariance.[4] For Hermitian matrices A \in \mathbb{C}^{n \times n}, the definition extends analogously using the conjugate transpose, R(x) = \frac{x^* A x}{x^* x}, preserving the real-valued nature of R(x) due to the self-adjoint property.[5]
The concept originated in the work of John William Strutt, third Baron Rayleigh, during the 1870s as part of his investigations into the theory of sound and vibrations, where it appeared in approximations for fundamental modes.[6] It was later formalized by Walther Ritz in 1909 as a key tool in variational methods for solving boundary value problems in mathematical physics, enabling systematic approximations through trial functions.[7]
As a Rayleigh-Ritz approximation, R(x) provides a variational estimate for eigenvalues of A. By the spectral theorem, if A has eigenvalues \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n with corresponding orthonormal eigenvectors v_1, \dots, v_n, then expanding x = \sum c_i v_i yields
R(x) = \sum_{i=1}^n \frac{|c_i|^2}{|c|^2} \lambda_i,
a weighted average of the eigenvalues, where |c|^2 = \sum |c_i|^2. This implies \lambda_1 \leq R(x) \leq \lambda_n, so R(x) is an upper bound for the smallest eigenvalue and a lower bound for the largest. Furthermore, R(x) is a second-order accurate eigenvalue estimate: if x approximates an eigenvector v_i to within an angle \theta, then |R(x) - \lambda_i| = O(\theta^2).
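The weighted-average identity and the resulting bounds are easy to verify numerically; a small NumPy check (with an arbitrary random symmetric matrix) is:

import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                                        # arbitrary symmetric test matrix
x = rng.standard_normal(5)
R = (x @ A @ x) / (x @ x)                                # Rayleigh quotient R(x)
lam, V = np.linalg.eigh(A)                               # eigenvalues (ascending) and orthonormal eigenvectors
c = V.T @ x                                              # expansion coefficients of x in the eigenbasis
print(np.isclose(R, np.sum(c**2 * lam) / np.sum(c**2)))  # weighted-average identity
print(lam[0] <= R <= lam[-1])                            # lambda_1 <= R(x) <= lambda_n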
Key properties include stationarity at eigenvectors: the eigenvectors v_i are critical points of R(x) on the unit sphere, satisfying \nabla R(v_i) = 0, as derived from the variational characterization \lambda_i = \min_{\|x\|=1} R(x) subject to orthogonality constraints to previous eigenvectors (via the Courant-Fischer min-max theorem).[8] This stationarity underscores its role in eigenvalue optimization, though in practice, it is often employed to refine approximate eigenvectors iteratively.
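Concretely, differentiating the quotient with respect to x gives
\nabla R(x) = \frac{2}{x^T x} \left( A x - R(x)\, x \right),
which vanishes precisely when A x = R(x) x, that is, when x is an eigenvector and R(x) the corresponding eigenvalue.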
Relation to Power and Inverse Iteration
Power iteration is a fundamental eigenvalue algorithm that generates a sequence of vectors by repeatedly multiplying an initial guess by the matrix A and normalizing the result, leading to convergence toward the dominant eigenvector corresponding to the eigenvalue of largest magnitude. The Rayleigh quotient, defined as \rho(v) = \frac{v^T A v}{v^T v} for a symmetric matrix A, serves as an estimate of this eigenvalue at each step, providing a simple yet effective approximation without additional computation. This method exhibits linear convergence, with the rate determined by the ratio of the dominant eigenvalue to the subdominant one, making it slow when eigenvalues are closely spaced.[1]
Inverse iteration extends power iteration to target interior or specific eigenvalues by applying the power method not to A, but to the shifted inverse (A - \mu I)^{-1}, where \mu is a fixed shift close to the desired eigenvalue. Each iteration solves a linear system (A - \mu I) w = v^{(k-1)} to obtain the next vector v^{(k)}, followed by normalization, which accelerates convergence for clustered spectra by making the target eigenvalue dominant in the inverse matrix. Like power iteration, it achieves only linear convergence, but the rate improves dramatically as \mu approaches the true eigenvalue, allowing selection of non-dominant eigenpairs.[1]
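For illustration, a minimal NumPy sketch of fixed-shift inverse iteration (the function name and iteration count are arbitrary) makes explicit that the adaptive-shift method described next differs only in how the shift is chosen:

import numpy as np

def inverse_iteration(A, mu, x0, num_iters=20):
    # Fixed-shift inverse iteration: power iteration applied to (A - mu I)^{-1}
    v = x0 / np.linalg.norm(x0)
    for _ in range(num_iters):
        y = np.linalg.solve(A - mu * np.eye(len(v)), v)  # one application of (A - mu I)^{-1}
        v = y / np.linalg.norm(y)                        # renormalize
    return (v @ A @ v) / (v @ v), v                      # Rayleigh quotient as eigenvalue estimate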
Rayleigh quotient iteration synthesizes these approaches by employing inverse iteration with dynamically updated shifts \mu_k chosen as the Rayleigh quotient from the previous iterate, \mu_k = \rho(v^{(k-1)}), thereby refining both eigenvector and eigenvalue estimates simultaneously. This adaptive strategy yields far faster asymptotic convergence than fixed-shift inverse iteration, as the shifts progressively align with the target eigenpair. The key innovation lies in these quotient-based shifts, which yield cubic local convergence for normal matrices by leveraging the stationarity of the Rayleigh quotient and, in the symmetric case, the monotonic decrease of the residual norms.[1]
Algorithm Description
Core Steps
The Rayleigh quotient iteration (RQI) begins with an initialization phase to establish starting approximations for the eigenvector and corresponding shift. An initial unit vector x_0 is selected as an approximate eigenvector, often chosen randomly or based on prior knowledge of the spectrum. The initial shift \mu_0 is typically set to the Rayleigh quotient of this vector, defined as \mu_0 = R(x_0) = \frac{x_0^T A x_0}{x_0^T x_0}, where A is the symmetric matrix whose eigenpair is sought.[9]
The core of the algorithm proceeds through an iterative loop that refines these approximations. At each iteration k, the process solves the linear system (A - \mu_k I) y_{k+1} = x_k for the correction vector y_{k+1}, or equivalently, computes y_{k+1} = (A - \mu_k I)^{-1} x_k. This step amplifies the component of x_k in the direction of the target eigenvector. The updated approximate eigenvector is then obtained by normalizing: x_{k+1} = \frac{y_{k+1}}{\| y_{k+1} \|}, ensuring unit length. Finally, the shift is updated using the Rayleigh quotient of the new vector: \mu_{k+1} = R(x_{k+1}) = x_{k+1}^T A x_{k+1}. This sequence of operations—solve, normalize, and update shift—constitutes one full iteration.[9]
The iteration continues until a suitable termination criterion is met, indicating sufficient convergence to the desired eigenpair. Common criteria include monitoring the change in the approximate eigenvector, such as halting when \| x_{k+1} - x_k \| < \epsilon for a small tolerance \epsilon, or checking the residual \| A x_k - \mu_k x_k \| < \delta, or convergence of the shift | \mu_{k+1} - \mu_k | < \eta. In exact arithmetic, the process terminates if A - \mu_k I becomes singular, yielding the exact eigenvector, though numerical implementations rely on these relative or absolute tolerances.[9]
Shift Strategies
In Rayleigh quotient iteration (RQI), the basic shift strategy employs the Rayleigh quotient from the previous iterate as the shift parameter for the subsequent linear solve. Specifically, given an approximate eigenvector x_k, the shift \mu_k = R(x_k) = \frac{x_k^T A x_k}{x_k^T x_k} is used to form the shifted system (A - \mu_k I) y_{k+1} = x_k, followed by normalization of y_{k+1}. This adaptive choice leverages the stationarity of the Rayleigh quotient at the true eigenvector, promoting rapid local convergence without requiring prior knowledge of the target eigenvalue.[10]
Advanced shift options extend this approach for improved efficiency or handling multiple eigenvalues. For deflation after computing an eigenpair, a subspace Rayleigh quotient—computed over a block of vectors spanning the invariant subspace—serves as the shift to target nearby eigenvalues, reducing sensitivity to clustering.[10]
Ill-conditioning arises when the shift \mu_k approaches an eigenvalue, making the shifted matrix nearly singular and producing very small pivots during LU factorization of (A - \mu_k I). Peters and Wilkinson addressed this with a scaled variant that normalizes the right-hand side vector in the 1-norm and interpreted the iteration as Newton's method applied to the eigenvector residual; their analysis shows that the near-singularity of the shifted system is largely benign, since the error it introduces lies mainly along the desired eigenvector, so at least quadratic convergence is retained even for ill-conditioned systems.
Comparisons reveal trade-offs between simple RQI and accelerated variants: the basic Rayleigh quotient shift yields cubic convergence in 3–5 iterations to machine precision for isolated eigenvalues, but incurs higher per-iteration costs due to full solves. Wilkinson or subspace shifts reduce computational overhead in reduced forms or multi-vector settings, trading minor slowdowns in iteration count for overall efficiency, particularly in clustered spectra where basic RQI may require safeguards against stagnation. The Wilkinson shift, used in the QR algorithm on tridiagonal matrices, selects the eigenvalue of the trailing 2×2 submatrix closest to the bottom-right diagonal entry to mimic RQI behavior.[1]
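For reference, if the trailing 2 \times 2 block of the tridiagonal matrix is \begin{pmatrix} a_{n-1} & b_{n-1} \\ b_{n-1} & a_n \end{pmatrix}, the Wilkinson shift can be written explicitly as
\mu = a_n - \frac{\operatorname{sign}(\delta)\, b_{n-1}^2}{|\delta| + \sqrt{\delta^2 + b_{n-1}^2}}, \qquad \delta = \frac{a_{n-1} - a_n}{2},
which is the eigenvalue of that block closer to a_n (with \operatorname{sign}(\delta) taken as 1 if \delta = 0).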
Convergence Analysis
Theoretical Guarantees
Rayleigh quotient iteration exhibits local cubic convergence near an eigenvector for symmetric or Hermitian matrices with simple eigenvalues. Specifically, if the initial vector is sufficiently close to the target eigenvector q_j, the error satisfies \|v^{(k)} - (\pm q_j)\| = O(\|v^{(k-1)} - (\pm q_j)\|^3), while the eigenvalue approximation error follows |\lambda^{(k)} - \lambda_j| = O(\|v^{(k-1)} - (\pm q_j)\|^2). This cubic rate arises from a Taylor expansion of the inverse iteration operator applied to the shifted matrix (A - \mu I)^{-1}, where the Rayleigh quotient shift \mu ensures higher-order terms in the expansion dominate the error reduction, effectively tripling the number of accurate digits per iteration once the approximation is close enough.[1] The proof relies on the stationarity of the Rayleigh quotient at eigenvectors and the self-correcting nature of the adaptive shifts for symmetric matrices, with the iteration interpretable as a normalized Newton's method on the eigenproblem.[10]
Globally, for symmetric or Hermitian matrices, the algorithm converges to an eigenpair from almost all starting vectors, excluding only a set of measure zero.[1] This global behavior rests on the monotonic decrease of the residual norms \|A v^{(k)} - \lambda^{(k)} v^{(k)}\|, which guarantees steady progress regardless of the starting point. Parlett's analysis shows that, with Rayleigh quotients as shifts, the iteration does not diverge, and the asymptotic rate is cubic whenever the limiting eigenvalue is simple; only exceptional starting vectors (for example, those attracted to the bisector of two eigenvectors) fail to converge to an eigenpair. For multiple eigenvalues or small spectral gaps, convergence may slow to a linear rate before the asymptotic regime is reached, or the iterates may first be attracted to an invariant subspace spanning several eigenspaces.[6]
Factors Affecting Speed
The convergence speed of Rayleigh quotient iteration is strongly influenced by the spectral properties of the matrix. A large spectral gap—the separation between the target eigenvalue \lambda_1 and the nearest other eigenvalue \lambda_2—accelerates convergence, since the ratio |\lambda_1 - \mu| / |\lambda_2 - \mu| (where \mu is the shift) governs the error reduction per step for Hermitian matrices.[11] In contrast, small spectral gaps, such as those in clustered spectra, slow convergence, which may be effectively linear until the approximation becomes accurate enough to enter the asymptotic regime.[6] Conditioning further modulates this behavior: well-conditioned problems with well-separated eigenvectors converge rapidly, while nearly multiple eigenvalues heighten sensitivity and prolong the iteration; for nonnormal matrices, the eigenvalue condition number (the secant of the angle between left and right eigenvectors) additionally limits the attainable residual reduction.[6]
The quality of the initial guess critically determines the number of iterations needed. Starting vectors close to the target eigenvector can achieve near-cubic convergence from the outset, minimizing computational effort, whereas random or poor initializations may require additional preconditioning or preliminary iterations to align with the eigenspace.[1] For problems with clustered eigenvalues, subspace-based initializations—such as those from Lanczos or Arnoldi methods—enhance effectiveness by projecting onto invariant subspaces, thereby avoiding slow initial transients and improving overall efficiency.[11]
Each iteration's computational cost is dominated by the shifted linear solve (A - \mu I) w = v. Because the shift changes at every step, a dense n \times n matrix requires a fresh O(n^3) factorization per iteration; in practice, A is therefore usually first reduced to tridiagonal form at a one-time O(n^3) cost, after which each shifted solve takes only O(n) operations, making the method practical for large problems.[11] Deflation techniques, which orthogonalize against previously computed eigenvectors to isolate the target, reduce the effective problem size in subsequent steps and prevent subspace contamination, though they introduce minor overhead from reorthogonalization.[11]
Numerical stability in Rayleigh quotient iteration can be affected by round-off errors accumulating in the shift computations and linear solves, particularly as \mu nears \lambda_1 and A - \mu I becomes ill-conditioned; as noted above, however, the error introduced by the near-singular solve lies mostly in the direction of the desired eigenvector and is therefore largely harmless.[6] For non-Hermitian matrices the risk increases because residual monotonicity can be lost, but for symmetric matrices the method remains robust with proper implementation. Recommendations include periodic reorthogonalization to preserve eigenvector accuracy.[11]
Practical Implementation
Pseudocode
The Rayleigh quotient iteration algorithm can be implemented in a straightforward manner for symmetric matrices, relying on iterative linear solves and Rayleigh quotient updates. The following pseudocode outlines the core logic in a language-agnostic form, assuming access to basic linear algebra operations such as matrix-vector products, linear system solves, and vector normalization.
Input: Symmetric matrix A ∈ ℝ^{n×n}, initial normalized vector x ∈ ℝ^n with ||x||₂ = 1, tolerance tol > 0
Output: Approximate eigenvalue λ, approximate eigenvector v
function rayleigh_quotient(A, x):
    return (xᵀ A x) / (xᵀ x)   // For normalized x, denominator is 1

v ← x
μ ← rayleigh_quotient(A, v)
residual ← ||A v - μ v||₂
while residual > tol:
    // Solve shifted linear system (A - μ I) y = v
    // Use LU factorization with partial pivoting for stability if A is dense
    y ← solve((A - μ I), v)
    // Normalize
    v ← y / ||y||₂
    // Update shift
    μ ← rayleigh_quotient(A, v)
    // Check residual
    residual ← ||A v - μ v||₂
λ ← μ
return λ, v
The auxiliary rayleigh_quotient function computes the Rayleigh quotient, which serves as the shift estimate μ at each step. The linear solve operation requires solving a shifted system, which can be performed using LU factorization with partial pivoting to handle potential near-singularity when μ is close to an eigenvalue.
This implementation assumes A is real and symmetric; the eigenpair reached depends on the starting vector, and the cubic convergence rate applies when the target eigenvalue is simple and well separated.
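For concreteness, the same logic can be expressed as a short NumPy routine; the following is an illustrative sketch (the function name, defaults, and iteration cap are arbitrary), not a library implementation:

import numpy as np

def rayleigh_quotient_iteration(A, x0, tol=1e-12, max_iter=50):
    # Illustrative RQI for a real symmetric matrix A
    v = x0 / np.linalg.norm(x0)                   # normalize the starting vector
    mu = v @ A @ v                                # initial Rayleigh quotient (shift)
    for _ in range(max_iter):
        try:
            # Shifted solve (A - mu I) y = v; near-singularity is expected as mu converges
            y = np.linalg.solve(A - mu * np.eye(len(v)), v)
        except np.linalg.LinAlgError:
            break                                 # exactly singular shift: v is (numerically) an eigenvector
        v = y / np.linalg.norm(y)                 # normalize the new iterate
        mu = v @ A @ v                            # update the shift
        if np.linalg.norm(A @ v - mu * v) < tol:  # residual-based stopping test
            break
    return mu, v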
Numerical Example
To illustrate the Rayleigh quotient iteration in action, consider the symmetric toy matrix
A = \begin{pmatrix}
2 & 1 & 1 \\
1 & 3 & 1 \\
1 & 1 & 4
\end{pmatrix},
with largest eigenvalue \lambda \approx 5.214319743377; its characteristic polynomial is \lambda^3 - 9\lambda^2 + 23\lambda - 17.[12]
Begin with the initial unit vector x_0 = \frac{1}{\sqrt{3}} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}. The Rayleigh quotient is \mu_0 = x_0^T A x_0 = 5. For the unnormalized counterpart \tilde{x}_0 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, the residual \|A \tilde{x}_0 - \mu_0 \tilde{x}_0\| = \sqrt{2} \approx 1.414; for the unit x_0, it is \sqrt{2/3} \approx 0.816.[12]
In the first iteration, solve (A - \mu_0 I) y_1 = x_0, yielding y_1 = \frac{1}{\sqrt{3}} \begin{pmatrix} 3 \\ 4 \\ 6 \end{pmatrix}. Normalize to obtain x_1 = \frac{1}{\sqrt{61}} \begin{pmatrix} 3 \\ 4 \\ 6 \end{pmatrix} \approx \begin{pmatrix} 0.384 \\ 0.512 \\ 0.768 \end{pmatrix}. The updated quotient is \mu_1 = x_1^T A x_1 = \frac{318}{61} \approx 5.213114754. The residual norm for x_1 is approximately 0.061.[12]
The second iteration produces \mu_2 \approx 5.214319743184, whose error relative to the true eigenvalue is only about 2 \times 10^{-10}. By the third iteration, \mu_3 \approx 5.214319743377 agrees with the true eigenvalue to all digits shown, and the eigenvector residual has dropped to the level of roundoff, showcasing the method's cubic convergence.[12]
The table below summarizes the iteration; the \mu_k values are as computed above, and the eigenvalue errors |\mu_k - \lambda| follow from the digits of the true eigenvalue:
| Iteration k | \mu_k | Eigenvalue error |
|---|---|---|
| 0 | 5.000000000000 | 2.1 \times 10^{-1} |
| 1 | 5.213114754098 | 1.2 \times 10^{-3} |
| 2 | 5.214319743184 | \approx 2 \times 10^{-10} |
| 3 | 5.214319743377 | < 10^{-12} |
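The iterates in the table can be reproduced with a short NumPy script (a sketch under the same assumptions as above; the exact trailing digits depend on floating-point roundoff):

import numpy as np

A = np.array([[2., 1., 1.],
              [1., 3., 1.],
              [1., 1., 4.]])
v = np.ones(3) / np.sqrt(3.0)                    # x_0 = (1, 1, 1) / sqrt(3)
mu = v @ A @ v                                   # mu_0 = 5
for k in range(3):
    y = np.linalg.solve(A - mu * np.eye(3), v)   # shifted solve
    v = y / np.linalg.norm(y)                    # normalize
    mu = v @ A @ v                               # updated Rayleigh quotient
    print(k + 1, mu, np.linalg.norm(A @ v - mu * v))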
Applications and Extensions
Standard Eigenvalue Problems
The Rayleigh quotient iteration is primarily applied to compute isolated eigenvalues and corresponding eigenvectors of large sparse symmetric matrices, where the goal is to target specific spectrum regions rather than the entire set. This makes it particularly suitable for problems arising in quantum mechanics, such as finding energy levels of Hamiltonians represented by large symmetric matrices, where the method leverages the variational principle to approximate bound states efficiently.[13] In structural vibration analysis, it is used to determine natural frequencies and mode shapes of mechanical systems modeled by symmetric stiffness and mass matrices, enabling precise isolation of critical modes without computing the full eigenspectrum.[14]
In practice, the method is often integrated with preprocessing techniques like the Lanczos algorithm, which reduces the original large sparse symmetric matrix to a much smaller symmetric tridiagonal form while preserving the relevant spectral subspace. Rayleigh quotient iteration is then applied locally to this tridiagonal matrix to refine the targeted eigenpairs, combining the global reduction efficiency of Lanczos with the rapid local convergence of the iteration.[15] This hybrid approach is effective for matrices where direct application of the iteration would be prohibitive due to factorization costs.
Compared to simultaneous iteration, which computes multiple eigenpairs in parallel but converges only linearly and struggles with non-dominant modes, Rayleigh quotient iteration offers cubic convergence for well-separated eigenvalues, making it well suited to isolating interior or non-extreme modes in symmetric problems. Related functionality is available in established libraries: the LAPACK routine DSYEVX, for example, computes selected eigenpairs of real symmetric matrices by bisection followed by inverse iteration, a fixed-shift relative of the Rayleigh quotient scheme.[16]
A representative case study in structural engineering involves applying Rayleigh quotient iteration to finite element models of building frames or bridges, where matrices of size n ≈ 10^4 arise from discretized stiffness operators. For instance, in vibration analysis of a multi-story structure under uncertain parameters, the method computes bounds on the lowest natural frequencies and modes, demonstrating high accuracy with minimal iterations even for sparse systems derived from finite elements.[14][17]
Variants for Multiple Eigenvalues
When the Rayleigh quotient iteration (RQI) is extended to handle multiple eigenvalues or to compute several nearby eigenvectors simultaneously, a subspace iteration variant is commonly employed. This approach maintains an orthonormal basis Q \in \mathbb{R}^{n \times k} spanning a k-dimensional subspace, where k is the number of desired eigenvectors. In each iteration, a block solve is performed, (A - \mu I) Y = Q, with the shift \mu often taken as the trace of the Rayleigh quotient block Q^T A Q divided by k. The resulting Y is orthonormalized via a QR factorization Y = \tilde{Q} R, and the basis is updated as Q \leftarrow \tilde{Q}. A Rayleigh-Ritz projection follows: form the small matrix H = Q^T A Q, compute its eigendecomposition H = F \Theta F^T, and refine the basis as Q \leftarrow Q F, with the Ritz values \Theta providing improved eigenvalue estimates. This variant converges faster for clustered eigenvalues than applying scalar RQI sequentially, since the subspace captures interactions among nearby modes.[18]
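The step just described can be sketched in NumPy as follows (an illustrative, dense-matrix sketch using a single mean-Ritz-value shift; the function name is arbitrary):

import numpy as np

def block_rqi_step(A, Q):
    # One block step: shifted block solve, re-orthonormalization, Rayleigh-Ritz refinement
    n, k = Q.shape
    H = Q.T @ A @ Q                               # k-by-k Rayleigh quotient block
    mu = np.trace(H) / k                          # single shift: mean of the current Ritz values
    Y = np.linalg.solve(A - mu * np.eye(n), Q)    # block solve (A - mu I) Y = Q
    Q, _ = np.linalg.qr(Y)                        # re-orthonormalize the block
    H = Q.T @ A @ Q                               # Rayleigh-Ritz projection
    theta, F = np.linalg.eigh(H)                  # Ritz values and primitive Ritz vectors
    return Q @ F, theta                           # refined basis and eigenvalue estimates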
Deflation techniques are essential in RQI variants to isolate converged eigenvalues and prevent recomputation when targeting multiple eigenvectors. One common method involves orthogonalizing the current iterate against previously converged eigenvectors before each update, effectively projecting onto the orthogonal complement to deflate the subspace spanned by known eigenpairs. For block formulations, deflation can incorporate multiple shifts simultaneously—such as using the eigenvalues of the Rayleigh quotient block Q^T A Q as shifts in a sequence of block solves—to accelerate isolation of a cluster of eigenvalues. Chasing procedures, adapted from QR iterations, can then be applied to bulge-like artifacts in the block to propagate and deflate converged pairs without disrupting the remaining spectrum. In hybrid approaches like the Jacobi-Davidson method augmented with inexact RQI, deflation is achieved through projectors that orthogonalize corrections against unwanted directions, enabling robust computation of interior eigenvalues by tuning the preconditioner to align with the target eigenpair and reducing inner iterations for multiple nearby modes. These techniques ensure numerical stability and efficiency, particularly for degenerate cases where eigenvalues have multiplicity greater than one.[19][20]
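As a minimal illustration of the first of these devices, orthogonalization-based deflation simply projects each iterate against the already-converged eigenvectors before renormalizing (V below is an n \times m matrix whose orthonormal columns are the converged eigenvectors; the function name is illustrative):

import numpy as np

def deflate(v, V):
    # Remove components of v along the converged orthonormal eigenvectors in V
    v = v - V @ (V.T @ v)
    return v / np.linalg.norm(v)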
For Hermitian generalized eigenvalue problems of the form A x = \lambda B x, where A and B are Hermitian and B is positive definite, RQI is adapted using the generalized Rayleigh quotient \rho(x) = \frac{x^T A x}{x^T B x} as the shift. The iteration proceeds by solving (A - \rho B) y = B x for the update direction y, followed by normalization x \leftarrow y / \| B^{1/2} y \| and recomputation of \rho. This preserves the cubic local convergence of the standard case under suitable separation conditions on the spectrum, as the shift minimizes the residual in the B-inner product. The method is particularly effective for problems arising in structural mechanics, where B represents a mass matrix.[21]
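A minimal sketch of this generalized iteration, assuming dense real symmetric A and symmetric positive definite B (names and defaults are illustrative), might read:

import numpy as np

def generalized_rqi(A, B, x0, tol=1e-12, max_iter=50):
    # Illustrative RQI for the pencil (A, B): A x = lambda B x, with B symmetric positive definite
    v = x0 / np.sqrt(x0 @ B @ x0)                 # B-normalize the starting vector
    rho = (v @ A @ v) / (v @ B @ v)               # generalized Rayleigh quotient (shift)
    for _ in range(max_iter):
        y = np.linalg.solve(A - rho * B, B @ v)   # shifted solve (A - rho B) y = B v
        v = y / np.sqrt(y @ B @ y)                # B-normalize the new iterate
        rho = (v @ A @ v) / (v @ B @ v)           # update the shift
        if np.linalg.norm(A @ v - rho * (B @ v)) < tol:  # residual-based stopping test
            break
    return rho, v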
Modern extensions from the 1990s and early 2000s, such as the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) method, build on RQI principles for parallel computation of multiple eigenvectors in large-scale symmetric problems. LOBPCG maintains a block of orthonormal vectors and iteratively minimizes the Rayleigh quotient over a locally optimal three-term recurrence, incorporating preconditioned updates and Rayleigh-Ritz projections to refine shifts dynamically. This allows efficient handling of degenerate or clustered eigenvalues on distributed architectures, often converging twice as fast as alternatives like Jacobi-Davidson while requiring fewer matrix-vector products. The approach is widely adopted in applications like quantum chemistry and principal component analysis due to its robustness and scalability.[22]
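For large sparse symmetric problems, a readily available implementation is scipy.sparse.linalg.lobpcg; a usage sketch on a toy diagonal matrix (the matrix, block size, and tolerances here are arbitrary) looks like this:

import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import lobpcg

n = 100
A = diags(np.arange(1.0, n + 1.0), 0)                  # sparse diagonal test matrix with eigenvalues 1..n
X = np.random.default_rng(0).standard_normal((n, 4))   # random initial block of 4 vectors
eigvals, eigvecs = lobpcg(A, X, largest=False, tol=1e-8, maxiter=200)
print(eigvals)                                          # approximations to the 4 smallest eigenvalues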