Power iteration
Power iteration, also known as the power method, is a fundamental iterative algorithm in numerical linear algebra designed to approximate the dominant eigenvalue (the one with the largest absolute value) and its corresponding eigenvector of a square, diagonalizable matrix A \in \mathbb{C}^{n \times n}, under the condition that the magnitude of this dominant eigenvalue strictly exceeds that of all others.[1] The method leverages repeated matrix-vector multiplications to amplify the component of the initial vector aligned with the dominant eigenvector, gradually suppressing contributions from less dominant modes.[2] Originating from early 20th-century work on eigenvalue approximation, it remains a cornerstone for solving large-scale eigenproblems due to its simplicity and low storage requirements.[3]
The algorithm begins with an arbitrary non-zero initial vector \mathbf{v}^{(0)}, typically chosen randomly to ensure a non-zero projection onto the dominant eigenspace.[1] At each iteration k, compute \mathbf{w}^{(k+1)} = A \mathbf{v}^{(k)}, followed by normalization \mathbf{v}^{(k+1)} = \mathbf{w}^{(k+1)} / \|\mathbf{w}^{(k+1)}\|, where the norm can be the Euclidean or infinity norm for numerical stability.[2] The approximate eigenvalue is then estimated via the Rayleigh quotient \mu^{(k+1)} = (\mathbf{v}^{(k+1)})^T A \mathbf{v}^{(k+1)} / (\mathbf{v}^{(k+1)})^T \mathbf{v}^{(k+1)} for symmetric matrices, or more generally from the scaling factor in the iteration.[3] Convergence is monitored by checking the residual \|A \mathbf{v}^{(k)} - \mu^{(k)} \mathbf{v}^{(k)}\| or changes in successive iterates until a tolerance is met.[1]
Under the assumptions of diagonalizability and a strictly dominant eigenvalue \lambda_1 with |\lambda_1| > |\lambda_2| \geq \cdots \geq |\lambda_n|, the method converges linearly to the dominant eigenvector, with the error reducing by a factor of approximately |\lambda_2 / \lambda_1| per step; convergence is faster when this ratio is small.[2] If the initial vector has no component in the dominant direction, convergence fails, but random initialization makes this failure unlikely.[3] For non-dominant eigenvalues, variants such as the inverse power method (applying (A - \sigma I)^{-1} for a shift \sigma near the target) or Rayleigh quotient iteration (using dynamically updated shifts for locally quadratic convergence, cubic in the symmetric case) extend the approach.[2]
Power iteration finds applications in principal component analysis, Google's PageRank algorithm for web ranking, and solving large sparse eigenproblems in physics and engineering, where only the dominant mode is needed.[1] Despite its slow convergence for closely spaced eigenvalues, its ease of parallelization and minimal memory use make it preferable over denser methods like QR algorithm for very large matrices.[3]
Overview
Definition and Purpose
Power iteration is an iterative algorithm designed to approximate the dominant eigenvalue, which has the largest absolute value, and its corresponding eigenvector for a square matrix A, under the assumption that A is diagonalizable and possesses a unique dominant eigenvalue.[4]
To understand this method, it is essential to recall the foundational concepts of eigenvalues and eigenvectors in linear algebra. An eigenvector of a matrix A is a nonzero vector v such that Av = \lambda v, where \lambda is a scalar known as the corresponding eigenvalue; this equation indicates that A scales v by \lambda without altering its direction.[5] The dominant eigenvalue is the one with the largest magnitude |\lambda|, and its eigenvector represents the primary direction of expansion or contraction induced by repeated applications of A.[4]
The primary purpose of power iteration is to compute the principal eigenvector efficiently, particularly for high-dimensional datasets or large sparse matrices where direct eigenvalue solvers, such as QR decomposition, incur prohibitive computational costs due to their O(n^3) complexity and high memory demands.[6] This makes it especially suitable for applications involving discretized partial differential equations or graph-based problems, where matrices are vast and sparse, allowing fast matrix-vector multiplications that exploit the limited non-zero entries.[6]
For illustration, consider the simple 2x2 matrix A = \begin{pmatrix} 1 & 1 \\ 2 & 0 \end{pmatrix}, which has eigenvalues 2 and -1, with the dominant eigenvalue \lambda = 2 and corresponding eigenvector \begin{pmatrix} 1 \\ 1 \end{pmatrix}. Geometrically, this dominant eigenvector points in the direction of maximum stretch under transformations by A, as vectors aligned with it are scaled by the largest factor, while others are pulled toward this axis over iterations.[7]
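The behavior can be checked numerically; the short NumPy sketch below (illustrative, not taken from the cited source) iterates on this matrix from an arbitrary starting vector and recovers the dominant eigenpair:

```python
import numpy as np

# Power iteration on the 2x2 example above; eigenvalues are 2 and -1.
A = np.array([[1.0, 1.0],
              [2.0, 0.0]])

b = np.array([1.0, 0.0])           # arbitrary nonzero starting vector
for _ in range(50):
    w = A @ b                      # apply the matrix
    b = w / np.linalg.norm(w)      # normalize to keep the iterate bounded

eigenvalue = b @ A @ b             # Rayleigh quotient (b is a unit vector)
print(eigenvalue)                  # approximately 2.0
print(b)                           # approximately [0.7071, 0.7071], i.e. (1, 1) normalized
```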
Historical Development
The power iteration method, also known as the von Mises iteration, originated in the early 20th century as a practical approach to approximating the dominant eigenvalue and eigenvector of a matrix. It was formalized in 1929 by Richard von Mises and Hilda Pollaczek-Geiringer in their seminal paper "Praktische Verfahren der Gleichungsauflösung," published in the Zeitschrift für Angewandte Mathematik und Mechanik, where they described iterative procedures for solving systems of linear equations that naturally extended to eigenvalue problems.[8][9] This work built upon earlier 19th-century contributions, particularly Lord Rayleigh's development of the Rayleigh quotient in the 1870s for estimating eigenvalues in the context of vibrational modes in physical systems.[10]
Following its introduction, power iteration gained traction in numerical linear algebra after World War II, coinciding with the advent of electronic computers that enabled iterative computations on larger matrices. The method's simplicity and effectiveness for dominant eigenvalues made it a staple in early computational routines. In the 1960s, James H. Wilkinson further popularized and refined it through his influential book The Algebraic Eigenvalue Problem (1965), which provided rigorous convergence analysis and implementation guidance, solidifying its role in standard matrix computation libraries.[10]
Key milestones in its adoption include integration into major numerical software packages during the late 20th century, such as MATLAB in the 1980s, where it became a core tool for eigenvalue approximation, and later Python's scientific computing ecosystem (the Numeric library of the 1990s, succeeded by NumPy). Around the turn of the century, power iteration regained prominence in large-scale data applications, notably underpinning the PageRank algorithm introduced by Sergey Brin and Larry Page in 1998 for web search ranking, demonstrating its enduring scalability for sparse, large-scale matrices.[10]
Algorithm Description
Iterative Steps
The power iteration algorithm proceeds by iteratively applying a matrix to an initial vector and normalizing the result to approximate the dominant eigenvector. It begins with the selection of an initial non-zero vector b_0, which is typically chosen randomly or using domain-specific heuristics to ensure it has a non-zero component in the direction of the dominant eigenvector.[11] The core recurrence relation is then applied for k = 0, 1, 2, \dots: compute the unnormalized vector w = A b_k, followed by normalization to obtain b_{k+1} = \frac{w}{\| w \|}, where A is the input matrix and \| \cdot \| denotes a vector norm.[12] This normalization is essential to prevent numerical overflow or underflow, as repeated matrix-vector multiplications without scaling can lead to vectors with magnitudes that exceed machine precision limits.[13] The choice of norm, such as the Euclidean (2-)norm, is common because it preserves compatibility with the Rayleigh quotient for eigenvalue estimation and contributes to numerical stability in floating-point arithmetic.[11]
Upon convergence of the eigenvector sequence, the corresponding dominant eigenvalue \lambda is approximated using the Rayleigh quotient:
\lambda \approx b_{k+1}^T A b_{k+1},
which provides a reliable estimate when b_{k+1} is close to the unit eigenvector.[11] The iterations continue until a stopping criterion is met, such as \| b_{k+1} - b_k \| < \epsilon for a small tolerance \epsilon > 0, or a predefined maximum number of iterations is reached to avoid infinite loops in non-convergent cases.[12]
The following pseudocode outlines the basic procedure:
```
Initialize: select an initial unit vector b_0 (||b_0|| = 1)
Set k = 0
While not converged and k < max_iterations:
    Compute w = A * b_k
    Set b_{k+1} = w / ||w||
    If ||b_{k+1} - b_k|| < ε:
        converged = true
    Set k = k + 1
Return b_k as the approximate eigenvector and b_k^T * A * b_k as the approximate eigenvalue
```
This structure ensures the process remains simple and focused on the mathematical iteration.[11]
Implementation Considerations
In implementing the power iteration algorithm, the choice of initial vector b_0 is crucial to ensure convergence to the dominant eigenvector. A common strategy is to select a uniform random vector, which has a high probability of containing a non-zero component in the direction of the dominant eigenvector, avoiding the risk of starting orthogonal to it and thus failing to converge.[14] Alternatively, an arbitrary non-zero vector can be used, but it should be normalized initially (e.g., to unit Euclidean norm) to prevent overflow or underflow in early iterations.[1]
Numerical stability is a key concern, as repeated matrix-vector multiplications can amplify floating-point errors, particularly for ill-conditioned matrices where eigenvalues are closely spaced. To mitigate this, double precision arithmetic is recommended throughout the computation to maintain accuracy in vector norms and Rayleigh quotients.[14] Normalization at each step—typically by dividing the updated vector by its Euclidean norm—prevents exponential growth or decay, stabilizing the iteration without significantly impacting convergence. For matrices with poor conditioning, additional safeguards like periodic reorthogonalization against previously computed vectors may be needed, though this increases computational cost.[14]
The efficiency of power iteration depends on the matrix representation and hardware capabilities. For dense n \times n matrices, each iteration requires a matrix-vector multiplication, incurring O(n^2) time complexity, making it suitable for moderate-sized problems but prohibitive for very large n.[14] In contrast, for sparse matrices with nnz non-zero entries, the complexity reduces to O(nnz) per iteration using sparse multiplication routines, enabling scalability to larger systems. Parallelization is feasible, especially for the matrix-vector product, which can be distributed across processors or GPUs to achieve near-linear speedup on suitable architectures.[14]
A practical implementation in Python using NumPy illustrates these steps concisely:
```python
import numpy as np

def power_iteration(A, num_iterations=100, tol=1e-6):
    n = A.shape[0]
    b = np.random.rand(n)                      # random initial vector
    b = b / np.linalg.norm(b)                  # normalize
    eigenvalue = None
    for i in range(num_iterations):
        b_new = A @ b                          # matrix-vector multiply
        eigenvalue = np.dot(b, b_new)          # Rayleigh quotient (b is a unit vector)
        b_new = b_new / np.linalg.norm(b_new)  # normalize
        if np.linalg.norm(b_new - b) < tol:
            b = b_new                          # keep the converged iterate
            break
        b = b_new
    return eigenvalue, b
```
This code initializes with a random unit vector, performs the core iteration via NumPy's efficient linear algebra operations, and uses the Rayleigh quotient for eigenvalue estimation, all in double precision by default.[12]
For very large matrices where explicit storage is infeasible due to memory constraints, the matrix A can be represented implicitly through a function handle that computes only the matrix-vector product A b on demand, avoiding full matrix storage while preserving the algorithm's structure. This approach is particularly effective for sparse or structured matrices arising in applications like graph analysis.[14]
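A minimal sketch of this matrix-free style is shown below; it assumes only a user-supplied function that returns the product A @ v (the names power_iteration_matfree and matvec are illustrative, not a standard API):

```python
import numpy as np

def power_iteration_matfree(matvec, n, num_iterations=1000, tol=1e-10):
    """Power iteration where A is available only as a function v -> A @ v."""
    b = np.random.rand(n)
    b /= np.linalg.norm(b)
    eigenvalue = 0.0
    for _ in range(num_iterations):
        w = matvec(b)                      # the only access to A is this product
        eigenvalue = b @ w                 # Rayleigh quotient estimate
        b_new = w / np.linalg.norm(w)
        if np.linalg.norm(b_new - b) < tol:
            b = b_new
            break
        b = b_new
    return eigenvalue, b

# Example: a diagonal-plus-rank-one operator applied without forming the dense matrix.
n = 5000
d = np.linspace(0.0, 1.0, n)               # diagonal entries
u = np.ones(n) / np.sqrt(n)                # rank-one direction

def matvec(v):
    return d * v + u * (u @ v)             # (diag(d) + u u^T) @ v

lam, vec = power_iteration_matfree(matvec, n)
```

SciPy offers a comparable abstraction (scipy.sparse.linalg.LinearOperator) for wrapping such product-only operators when interfacing with its iterative solvers.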
Convergence Properties
Assumptions and Conditions
Power iteration relies on several key mathematical assumptions about the matrix and initial vector to ensure convergence to the dominant eigenvector. Primarily, the matrix A must be diagonalizable, meaning it can be expressed as A = X \Lambda X^{-1} where \Lambda is a diagonal matrix of eigenvalues and X is an invertible matrix of corresponding eigenvectors.[15] Additionally, A must have a unique dominant eigenvalue \lambda_1 such that |\lambda_1| > |\lambda_i| for all other eigenvalues \lambda_i (with i \neq 1), ensuring that the component associated with \lambda_1 dominates after repeated iterations.[16] The initial vector b_0 must also have a non-zero projection onto the dominant eigenvector; otherwise, the iteration may converge to a different eigenspace or fail entirely.[17]
In the non-diagonalizable case, where A has a non-trivial Jordan canonical form with a Jordan block of size m > 1 for the dominant eigenvalue, power iteration still converges to the corresponding eigenvector. However, the iterates pick up contributions from the generalized eigenvectors, and these introduce polynomial factors in k that slow convergence from geometric to roughly O(1/k) in the angle error.[18] For instance, a 2×2 Jordan block yields convergence at rate O(1/k).
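This slowdown is easy to observe numerically; the sketch below (illustrative) applies powers of a 3×3 Jordan block, whose only eigenvector is e_1, and prints the error in the normalized iterate:

```python
import numpy as np

# 3x3 Jordan block with eigenvalue 1; the only true eigenvector is e_1.
J = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

b0 = np.ones(3) / np.sqrt(3)
for k in (10, 100, 1000):
    v = np.linalg.matrix_power(J, k) @ b0
    v /= np.linalg.norm(v)
    print(k, np.linalg.norm(v - np.array([1.0, 0.0, 0.0])))
    # the error shrinks roughly like 1/k, not geometrically
```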
Specific conditions on the matrix further refine applicability. For real symmetric matrices, diagonalizability is guaranteed by the spectral theorem and all eigenvalues are real, so dominance simply means the largest absolute value among real numbers.[19] For real non-symmetric matrices, complex eigenvalues occur in conjugate pairs of equal magnitude; if such a pair lies at the top of the spectrum, no single eigenvalue strictly dominates and the iterates oscillate rather than converge.[20]
Edge cases highlight potential failures when assumptions are violated. If multiple eigenvalues share the same maximum magnitude, such as \lambda_1 = 1 and \lambda_2 = -1 in a matrix like the adjacency matrix of a bipartite graph, the iterates oscillate and do not converge to a single eigenvector.[21] Similarly, rotation matrices in 2D have complex conjugate eigenvalues on the unit circle with equal magnitudes, causing the power iterates to rotate indefinitely without settling on a fixed direction.[16]
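The non-convergence is visible even in two dimensions (an illustrative sketch; the swap matrix below has eigenvalues +1 and -1 of equal magnitude):

```python
import numpy as np

# Swap matrix: eigenvalues +1 and -1, so no strictly dominant eigenvalue exists.
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])

b = np.array([0.9, 0.1])
b /= np.linalg.norm(b)
for k in range(6):
    b = A @ b / np.linalg.norm(A @ b)
    print(k, b)   # the two components swap back and forth; the iterates never settle
```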
Rate and Proof of Convergence
Under the assumption of a unique dominant eigenvalue \lambda_1 with |\lambda_1| > |\lambda_i| for all i \neq 1 and an initial vector with nonzero component in the direction of the corresponding eigenvector v_1, the power iteration converges geometrically to v_1 (up to sign).[22]
The rate of convergence is governed by the factor \rho = \max_{i \neq 1} \left| \frac{\lambda_i}{\lambda_1} \right| < 1, which measures the size of the spectral gap between \lambda_1 and the subdominant eigenvalues.[22]
The error between the normalized iterate b_k and the unit dominant eigenvector \hat{v}_1 = v_1 / \|v_1\| satisfies \|b_k - \hat{v}_1\| \sim C \rho^k as k \to \infty (up to a sign choice for b_k), where C > 0 is a constant depending on the initial vector b_0 and the condition number of the eigenvector matrix X.[22]
To establish this, let A = X \Lambda X^{-1} be the eigendecomposition, with columns of X the right eigenvectors v_j and \Lambda = \operatorname{diag}(\lambda_j). Express the initial vector as b_0 = X c, where c = X^{-1} b_0 with c_1 \neq 0.[22]
Applying the iteration k times yields
A^k b_0 = X \Lambda^k c = \lambda_1^k X \left( c_1 e_1 + \sum_{j=2}^n c_j \left( \frac{\lambda_j}{\lambda_1} \right)^k e_j \right),
and the normalized iterate is
b_k = \frac{A^k b_0}{\|A^k b_0\|} = \frac{ X d_k }{ \| X d_k \| },
where d_k = c_1 e_1 + \sum_{j=2}^n c_j \left( \frac{\lambda_j}{\lambda_1} \right)^k e_j.[22]
As k \to \infty, the terms \left( \frac{\lambda_j}{\lambda_1} \right)^k decay geometrically at rate at most \rho^k for j \geq 2, so d_k \to c_1 e_1, and b_k approaches v_1 / \| v_1 \| (up to sign).[22]
The angle \theta_k between b_k and v_1 satisfies \sin \theta_k = O(\rho^k) (or \tan \theta_k = O(\rho^k) for the symmetric case), confirming the linear convergence of the direction.[22]
The Rayleigh quotient R(b_k) = \frac{b_k^T A b_k}{b_k^T b_k} provides an approximation to \lambda_1 that converges at the same linear rate \rho, with |R(b_k) - \lambda_1| = O(\rho^k).[22]
For symmetric matrices, the eigenvalue error refines to |R(b_k) - \lambda_1| = O(\rho^{2k}) due to the variational characterization.[22]
Convergence slows when the spectral gap is small (\rho close to 1): the number of iterations needed to reach error \epsilon scales as \frac{\log(1/\epsilon)}{-\log \rho}, which grows roughly in proportion to 1/(1 - \rho) as \rho \to 1.[22]
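As a worked example of this estimate (illustrative numbers, not drawn from the cited source), reaching \epsilon = 10^{-6} with \rho = 0.9 takes about
k \approx \frac{\log(10^6)}{-\log 0.9} \approx \frac{13.8}{0.105} \approx 131 \text{ iterations},
while \rho = 0.99 raises this to roughly \frac{13.8}{0.01005} \approx 1374 iterations, illustrating the 1/(1 - \rho) growth.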
Applications
Principal Component Analysis
Power iteration serves a crucial role in principal component analysis (PCA) by approximating the direction of maximum variance in a dataset, achieved through computation of the dominant eigenvector of the sample covariance matrix.[23] This eigenvector represents the first principal component, capturing the largest amount of data variability along a single axis.[23]
The sample covariance matrix is constructed as S = \frac{1}{m} X^T X, where X is the centered data matrix with m samples and features in columns, ensuring the data has zero mean to focus on variance structure.[23] To obtain the principal eigenvector using power iteration on this centered data, begin with a random unit vector v_0. Then, iteratively update v_i = S v_{i-1} / \| S v_{i-1} \| until convergence, yielding the direction that maximizes the projected variance \mathbf{v}^T S \mathbf{v}.[23][24]
The method's simplicity renders it particularly advantageous for high-dimensional data, where it scales efficiently to millions of features by avoiding full eigendecomposition and requiring only matrix-vector multiplications per iteration.[23] In full PCA, power iteration typically computes the first principal component as the initial step, after which deflation—projecting the data orthogonal to this component—is applied to isolate subsequent components.[23]
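A compact sketch of the procedure (illustrative only; here deflation is applied to the covariance matrix, which matches projecting the data when the computed component is exact) computes the first k components:

```python
import numpy as np

def top_components(X, k=2, num_iterations=500):
    """First k principal components of X (rows = samples) via power iteration
    on the sample covariance matrix, with deflation between components."""
    Xc = X - X.mean(axis=0)               # center the data
    S = Xc.T @ Xc / Xc.shape[0]           # sample covariance matrix
    components = []
    for _ in range(k):
        v = np.random.rand(S.shape[0])
        v /= np.linalg.norm(v)
        for _ in range(num_iterations):
            v = S @ v
            v /= np.linalg.norm(v)
        lam = v @ S @ v                    # variance captured along v
        S = S - lam * np.outer(v, v)       # deflate: remove this component
        components.append(v)
    return np.array(components)

# Usage on synthetic data with one clearly dominant direction.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10)) @ np.diag([5.0] + [1.0] * 9)
V = top_components(X, k=2)
```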
Web Ranking Algorithms
Power iteration plays a central role in web ranking algorithms, particularly in the PageRank system developed by Google, which computes the importance of web pages by finding the principal eigenvector of the Google matrix representing the web's link structure. The web is modeled as a directed graph where nodes are pages and edges are hyperlinks. The transition matrix P is row-stochastic: P_{i,j} = \frac{1}{\deg^+(i)} if page i links to page j, where \deg^+(i) is the out-degree of i, and 0 otherwise; dangling nodes (zero out-degree) are handled by setting their row to the uniform value 1/n. To ensure irreducibility and aperiodicity for convergence, the Google matrix G is constructed as G = \alpha P + (1 - \alpha) \frac{1}{n} \mathbf{1} \mathbf{1}^T, where \alpha is the damping factor (typically 0.85), n is the number of pages, and \mathbf{1} is the all-ones vector.[25]
The PageRank vector \pi, representing the steady-state probability distribution of a random surfer, is the principal left eigenvector of G satisfying \pi^T G = \pi^T with \sum_i \pi_i = 1. Power iteration computes this by starting with the uniform vector \pi^{(0)} = \frac{1}{n} \mathbf{1} and iteratively updating \pi^{(k+1)T} = \pi^{(k)T} G, normalizing to sum to 1 at each step, until \|\pi^{(k+1)} - \pi^{(k)}\|_1 < \epsilon (typically \epsilon = 10^{-6}). This process finds the stationary distribution, as the dominant eigenvalue of G is 1. The damping factor \alpha \approx 0.85 models the surfer's 15% chance of jumping to a random page, ensuring convergence even on graphs with disconnected components or dangling nodes.[25]
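A toy-scale sketch of this computation (illustrative only, with an arbitrary four-page link structure; not Google's implementation) follows the construction above:

```python
import numpy as np

# Toy link structure: adjacency[i, j] = 1 if page i links to page j.
adjacency = np.array([[0, 1, 1, 0],
                      [0, 0, 1, 0],
                      [1, 0, 0, 1],
                      [0, 0, 0, 0]], dtype=float)   # page 3 is a dangling node

n = adjacency.shape[0]
out_degree = adjacency.sum(axis=1)
P = np.where(out_degree[:, None] > 0,
             adjacency / np.maximum(out_degree, 1)[:, None],
             1.0 / n)                                # dangling rows -> uniform

alpha = 0.85
G = alpha * P + (1 - alpha) / n                      # Google matrix

pi = np.full(n, 1.0 / n)                             # uniform start
for _ in range(100):
    pi_next = pi @ G                                  # left multiplication: pi^T G
    pi_next /= pi_next.sum()                          # keep a probability vector
    if np.abs(pi_next - pi).sum() < 1e-10:            # L1 stopping criterion
        pi = pi_next
        break
    pi = pi_next

print(pi)   # PageRank scores, summing to 1
```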
Google's original 1998 implementation of PageRank employed power iteration to rank hundreds of millions of pages and links, achieving convergence in approximately 52 iterations for a graph with 322 million links. In practice, for larger web-scale graphs (e.g., over 80 million pages), the method converges in about 50 iterations, with the number scaling logarithmically due to the spectral gap induced by damping.[25][26]
Extensions of this approach appear in social networks, such as Twitter's "Who to Follow" service, which uses personalized PageRank—computed via power iteration on the follower graph—to recommend accounts based on eigenvector centrality in a user's "circle of trust," prioritizing connections to influential users.[27]
Extensions and Limitations
Variations like Inverse Iteration
Inverse power iteration, also known as inverse iteration, is a modification of the power iteration method designed to compute eigenvectors corresponding to eigenvalues that are not dominant in magnitude. It applies the power method to the inverse of a shifted matrix, (A - \mu I)^{-1}, where \mu is a shift chosen close to a target eigenvalue; this matrix has the same eigenvectors as A, with eigenvalues \frac{1}{\lambda_i - \mu}, so the eigenvalue of A nearest to \mu becomes the dominant eigenvalue of the inverse. The iteration therefore converges to the eigenvector associated with the eigenvalue of A closest to \mu, making it particularly useful for finding interior or smaller eigenvalues that standard power iteration overlooks.[28]
The iterative step in inverse power iteration is given by
b_{k+1} = \frac{(A - \mu I)^{-1} b_k}{\|(A - \mu I)^{-1} b_k\|},
where b_k is the current iterate, and each step requires solving a linear system (A - \mu I) z = b_k for z, followed by normalization. Convergence is linear, with the rate determined by the ratio of the distances from \mu to the target eigenvalue and the nearest other eigenvalue; a well-chosen \mu accelerates this process significantly. When \mu = 0, the method targets the smallest eigenvalue in magnitude of A, as the eigenvalues of A^{-1} are the reciprocals, making the smallest |\lambda| the dominant one in the inverse.[29][30]
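A sketch of the fixed-shift variant is given below, assuming a dense matrix so that A - \mu I can be factored once and the factorization reused at every step (function and variable names are illustrative):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def inverse_iteration(A, mu, num_iterations=100, tol=1e-10):
    """Approximate the eigenpair of A whose eigenvalue is closest to the shift mu."""
    n = A.shape[0]
    lu, piv = lu_factor(A - mu * np.eye(n))    # factor once, reuse every step
    b = np.random.rand(n)
    b /= np.linalg.norm(b)
    for _ in range(num_iterations):
        z = lu_solve((lu, piv), b)             # solve (A - mu I) z = b
        b_new = z / np.linalg.norm(z)
        # compare directions up to sign, since the iterate may flip sign
        if np.linalg.norm(b_new - np.sign(b_new @ b) * b) < tol:
            b = b_new
            break
        b = b_new
    return b @ A @ b, b                        # Rayleigh quotient, eigenvector

# Example: target the eigenvalue of a symmetric matrix near 0.3.
A = np.diag([0.1, 0.35, 2.0, 5.0])
lam, v = inverse_iteration(A, mu=0.3)          # converges toward lambda = 0.35
```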
The origins of inverse power iteration trace back to Ernst Pohlhausen in 1921, who developed it for computing resonance frequencies in structural mechanics, though it was later formalized for general matrix eigenvalue problems by Helmut Wielandt in 1944 as a method for eigenfunctions of linear operators. James H. Wilkinson further refined and popularized it in the late 1950s, integrating it into practical numerical libraries and emphasizing its complementarity to the power method for non-dominant eigenvalues.[30][28]
Shifted inverse iteration extends this by allowing a fixed nonzero \mu, enabling the targeting of specific eigenvalues away from the extremes; it was suggested by Wielandt in 1944 and independently by Wilkinson in 1958. Unlike standard power iteration, which amplifies the component corresponding to the largest-magnitude eigenvalue, inverse variants with appropriate shifts allow access to the spectrum's interior, providing a versatile toolkit for eigenvalue computation.[29][28]
Rayleigh quotient iteration builds on shifted inverse iteration by adaptively updating the shift at each step using the Rayleigh quotient \rho(b_k) = \frac{b_k^T A b_k}{b_k^T b_k}, which serves as an improved eigenvalue estimate from the current iterate. The update becomes
b_{k+1} = \frac{(A - \rho(b_k) I)^{-1} b_k}{\|(A - \rho(b_k) I)^{-1} b_k\|},
yielding cubic convergence near the target eigenvalue under suitable conditions, though each iteration is more computationally expensive due to repeated linear solves with varying shifts. This method was analyzed and established by Alexander M. Ostrowski in 1957–1958, highlighting its rapid local convergence for symmetric matrices.[29]
Practical Challenges and Alternatives
Power iteration encounters several practical challenges that can hinder its efficiency and reliability in real-world applications. One primary issue is slow convergence when the spectral gap is small, meaning the ratio of the second-largest eigenvalue magnitude to the dominant one, |\lambda_2 / \lambda_1|, is close to 1; in such cases, the error reduction per iteration diminishes, potentially requiring thousands of steps for acceptable accuracy, as demonstrated in examples with Poisson matrices where over 1,900 iterations were needed due to a ratio of approximately 0.996.[18] Additionally, the method is sensitive to the choice of initial vector; if the initial guess has no component in the direction of the dominant eigenvector, convergence fails entirely, though random initialization mitigates this with high probability.[2] For non-normal matrices, numerical instability arises because eigenvectors can be ill-conditioned, leading to sensitivity in computed approximations under finite-precision arithmetic, where small perturbations amplify errors in the iterates.[28] Furthermore, handling dense matrices demands high memory for storage and O(n²) time per matrix-vector multiplication, making it impractical for very large dimensions without sparsity.[31]
Key limitations further restrict power iteration's applicability. The method fails to converge to a single eigenvector when multiple eigenvalues have equal magnitude, such as complex conjugate pairs with the same modulus, as the iterates oscillate or diverge from isolation.[2] It also does not directly compute multiple eigenvectors without additional techniques like deflation, which subtracts the contribution of the dominant mode to target subsequent ones but introduces accumulation of rounding errors over iterations.[4] For non-diagonalizable matrices with defective eigenvalues, while the dominant eigenvector can still be approximated if isolated, the method cannot resolve generalized eigenvectors without extensions, limiting its use in such cases.[18]
In practice, alternatives are preferred when power iteration's assumptions falter. Krylov subspace methods like the Arnoldi and Lanczos algorithms extend power iteration to compute a fuller spectrum of extremal eigenvalues for large sparse matrices by building orthogonal bases, offering faster convergence for clustered spectra.[14] For symmetric generalized eigenvalue problems, the locally optimal block preconditioned conjugate gradient (LOBPCG) method provides robust block computations with preconditioning, achieving efficiency comparable to ideal solvers in numerical tests on large systems. For small dense matrices (n < 10³), direct methods such as the QR algorithm implemented in routines like MATLAB's eig() are favored, as they compute the full eigensystem reliably without iteration.[32] Hybrids combining power iteration with deflation enable computation of the top-k eigenvectors, though they are prone to error buildup.[4]
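As a point of comparison, a Lanczos-based routine such as SciPy's eigsh is a common choice when several extremal eigenpairs of a large sparse symmetric matrix are needed; the snippet below is a usage sketch on a synthetic diagonal test matrix, not a benchmark:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

# Synthetic sparse symmetric test matrix with well-separated extremal eigenvalues.
n = 2000
A = sp.diags(np.linspace(1.0, 100.0, n), format="csr")

# Implicitly restarted Lanczos (ARPACK) returns several eigenpairs at once,
# unlike plain power iteration, which targets only the dominant one.
vals, vecs = eigsh(A, k=3, which="LA")   # three algebraically largest eigenvalues
print(np.sort(vals))                     # approximately [99.90, 99.95, 100.0]
```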
Power iteration remains valuable for sparse matrices exceeding 10⁶ dimensions with a clear dominant eigenvalue, such as in graph analytics, where matrix-vector products exploit the sparsity efficiently, but for small dense problems that can be solved directly it has largely been superseded.[31] It performs best on positive definite matrices with a well-separated dominant eigenvalue, ensuring monotonic convergence under the conditions outlined above.[18]