
Preconditioner

A preconditioner is a matrix or operator in numerical linear algebra that transforms a linear system Ax = b into an equivalent form, such as M^{-1}Ax = M^{-1}b for left preconditioning, to accelerate the convergence of iterative methods by improving the spectral properties of the system, such as reducing the condition number or clustering eigenvalues around unity. The preconditioner M must approximate the original matrix A closely enough to enhance efficiency while remaining computationally inexpensive to apply or invert, often through sparse approximations or factorizations. The term "preconditioning" originated in the late 1940s, when Alan Turing first used it in the context of direct methods to address round-off errors related to matrix conditioning, but the techniques gained widespread adoption in the 1970s alongside the rise of Krylov subspace methods like the conjugate gradient (CG) algorithm and incomplete Cholesky (IC) preconditioning. By the 1980s and 1990s, advancements included incomplete LU (ILU) factorizations and algebraic multigrid (AMG) methods, addressing challenges in solving large sparse systems arising from partial differential equations (PDEs) in scientific computing, engineering simulations, and optimization. These developments were driven by the need for robust solvers on increasingly large-scale problems, where direct methods become prohibitive due to memory and time constraints.

Common types of preconditioners include stationary methods like Jacobi (using the diagonal of A) and Gauss-Seidel (using the lower triangular part), which are simple but limited for ill-conditioned systems; incomplete factorizations such as ILU(0) or IC(0), which drop small elements during LU or Cholesky factorization to maintain sparsity; and advanced approaches like domain decomposition (e.g., Schwarz methods) or sparse approximate inverses (SPAI), which exploit problem structure for parallelism and scalability on modern hardware. The choice of preconditioner depends on matrix properties such as symmetry and definiteness, and it is crucial for Krylov methods like GMRES or BiCGSTAB, often reducing iteration counts from hundreds to a few dozen in practical applications. Ongoing research focuses on adaptive and machine-learning-enhanced preconditioners to handle the demands of exascale computing and multiphysics simulations.

Introduction

Definition and Purpose

In numerical linear algebra, the condition number of a matrix A, denoted \kappa(A), is defined as \kappa(A) = \|A\| \cdot \|A^{-1}\|, where \|\cdot\| denotes a consistent matrix norm. This scalar measures the sensitivity of the solution of a linear system Ax = b to small perturbations in the input data or to roundoff errors in computations, with higher values indicating greater amplification of errors, potential instability in direct solvers, and slower convergence rates in iterative algorithms. A preconditioner is a matrix or operator P (often approximating A or A^{-1}) that transforms the original system Ax = b into an equivalent form with more favorable spectral properties, such as the left-preconditioned system P^{-1}Ax = P^{-1}b or the right-preconditioned system AP^{-1}y = b with x = P^{-1}y. The goal is to reduce the condition number of the transformed system, either \kappa(P^{-1}A) for left preconditioning or \kappa(AP^{-1}) for right preconditioning, thereby improving numerical stability and efficiency. The primary purpose of a preconditioner is to accelerate the convergence of iterative methods, such as the conjugate gradient method, by clustering the eigenvalues of the preconditioned matrix near unity, which minimizes the effective condition number and reduces the number of iterations required for a desired accuracy. This benefit involves a fundamental trade-off: the computational cost of constructing and applying P must be outweighed by the savings from fewer iterations. When A is symmetric positive definite, a symmetric positive definite preconditioner M is normally used; left and right preconditioning then lead to equivalent methods, commonly written as the single preconditioned system M^{-1}Ax = M^{-1}b.
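
As a concrete illustration of these definitions, the sketch below (illustrative NumPy code, not taken from any particular library) compares the condition number of a badly scaled SPD test matrix with that of its Jacobi-preconditioned counterpart M^{-1}A, where M = diag(A); the matrix and scaling are arbitrary choices made for the example.

```python
import numpy as np

n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1-D Laplacian, SPD
scales = np.logspace(0, 4, n)                            # widely varying variable scales
A = np.diag(scales) @ A @ np.diag(scales)                 # still SPD, now badly conditioned

M_inv = np.diag(1.0 / np.diag(A))                         # Jacobi preconditioner: M^{-1} = D^{-1}

print("kappa(A)        =", np.linalg.cond(A))
print("kappa(M^{-1} A) =", np.linalg.cond(M_inv @ A))     # much smaller: scaling is equalized
```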

Historical Overview

The concept of preconditioning was first named by Alan Turing in 1948, who used the term to describe transformations that improve the conditioning of matrices in iterative and direct solvers. The origins of preconditioning techniques trace back to early iterative methods for solving linear systems, with foundational developments in the mid-20th century. In the 1950s, David M. Young introduced successive over-relaxation (SOR) in his doctoral thesis, establishing a framework for accelerating iterative methods like Jacobi and Gauss-Seidel, which implicitly preconditioned the system by adjusting relaxation parameters to improve convergence. Around the same time, Richardson iteration, originally proposed in 1910 but adapted for modern numerical contexts, served as a simple explicit preconditioner by incorporating dynamic scaling to reduce the spectral radius of the iteration matrix. These methods laid the groundwork for preconditioning as a means to cluster eigenvalues and enhance solver efficiency, particularly for elliptic partial differential equations discretized on grids.

The 1970s marked a pivotal advancement with the formalization of preconditioned Krylov subspace methods. In 1976, Paul Concus, Gene H. Golub, and Dianne P. O'Leary developed the preconditioned conjugate gradient (PCG) method, extending the classical conjugate gradient algorithm to incorporate left or right preconditioners and significantly accelerating convergence for symmetric positive definite systems arising in finite difference and finite element discretizations. The following year, J. A. Meijerink and H. A. van der Vorst introduced incomplete LU (ILU) factorization as an algebraic preconditioner, providing stable incomplete decompositions for sparse matrices, especially M-matrices, and enabling robust preconditioning for nonsymmetric problems. Gene Golub's contributions to PCG and related eigenvalue analyses further solidified these techniques' theoretical foundations.

During the 1980s and 1990s, preconditioning evolved toward scalable methods for large-scale problems, driven by advances in parallel computing and multilevel techniques. Multigrid methods, pioneered by Achi Brandt in the late 1970s and refined throughout the 1980s, emerged as powerful preconditioners by leveraging hierarchical grids to eliminate error components across scales, achieving mesh-independent convergence rates. Domain decomposition preconditioners, developed concurrently by several research groups, partitioned problems into subdomains for parallel solution, with overlapping variants like Schwarz methods enhancing robustness. In 1997, Marcus J. Grote and Thomas Huckle introduced sparse approximate inverse (SPAI) preconditioners, optimizing sparse matrices to approximate the inverse via least-squares minimization and offering explicit, parallelizable alternatives to factorization-based approaches. Anne Greenbaum's spectral analyses during this period provided critical insights into convergence bounds and preconditioner efficacy for Krylov methods.

In the 2000s, preconditioning extended beyond linear systems to nonlinear and time-dependent contexts, with variable preconditioners adapting to evolving problem structures. For nonlinear problems, Xiao-Chuan Cai and David E. Keyes proposed additive Schwarz preconditioned inexact Newton (ASPIN) methods in 2002, combining domain decomposition with nonlinear elimination to precondition Jacobian approximations in optimization and PDE solvers. This evolution facilitated applications in constrained optimization, where block preconditioners addressed indefinite KKT systems, and in time-dependent simulations, where variable metric updates maintain effectiveness amid changing coefficients.

Preconditioning for Linear Systems

Core Concepts and Description

Preconditioning addresses the challenge of solving large-scale sparse linear systems of the form Ax = b, where A is an n \times n sparse matrix with n potentially reaching millions, and direct methods such as Gaussian elimination become infeasible due to high computational cost and storage requirements, particularly when A is ill-conditioned. These systems arise frequently in scientific computing, such as discretizations of partial differential equations, where sparsity preserves structure but ill-conditioning amplifies numerical errors and slows convergence of iterative solvers. The core idea of preconditioning is to introduce a matrix P, typically nonsingular and easier to invert than A, that approximates A (so that P^{-1} approximates A^{-1}) such that the preconditioned matrix P^{-1}A has eigenvalues clustered near unity, effectively reducing the condition number \kappa(P^{-1}A) and improving solver efficiency. In the ideal case, P^{-1}A approaches the identity matrix, minimizing the eigenvalue spread and accelerating convergence.

Preconditioning can be applied in left, right, or split forms: left preconditioning transforms the system to P^{-1}Ax = P^{-1}b; right preconditioning solves AP^{-1}y = b followed by x = P^{-1}y; and split preconditioning uses P = P_1 P_2 to form P_1^{-1} A P_2^{-1} y = P_1^{-1} b with x = P_2^{-1} y, offering flexibility for symmetric or structured problems. For stationary iterative methods, such as those based on splittings like Jacobi or successive over-relaxation (SOR), preconditioning reduces the spectral radius of the iteration matrix, ensuring faster asymptotic convergence rates. In Krylov subspace methods, it clusters eigenvalues to diminish the effective dimension of the Krylov subspace needed for convergence. When A is symmetric positive definite (SPD), preconditioners are often chosen to be SPD as well, enabling methods like the preconditioned conjugate gradient (PCG) that exploit symmetry for efficiency. For general nonsymmetric matrices, preconditioners must handle potential indefiniteness, with methods like GMRES accommodating left, right, or split variants to maintain stability. The transformed system emphasizes minimization in a preconditioned inner product defined as \langle x, y \rangle_M = x^T M y, where M is SPD (often M = P^T P for nonsymmetric P), altering the geometry of the space to favor rapid convergence. This inner product induces a norm \| z \|_M = \sqrt{\langle z, z \rangle_M} under which the preconditioned residual or error is minimized, providing a theoretically grounded framework for convergence analysis.
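
The three application modes can be checked on a tiny dense system. The sketch below uses plain NumPy with a Jacobi-type diagonal standing in for a realistic P (an arbitrary illustrative choice); it verifies that left, right, and split preconditioning all recover the same solution, differing only in how the transformation is organized.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-posed nonsymmetric test matrix
b = rng.standard_normal(n)
x_ref = np.linalg.solve(A, b)

# Preconditioner P: simply the diagonal of A, split as P = P1 P2 with P1 = P2 = sqrt(D).
D = np.diag(np.diag(A))
P1 = np.sqrt(D)
P2 = np.sqrt(D)

# Left:  P^{-1} A x = P^{-1} b
x_left = np.linalg.solve(np.linalg.solve(D, A), np.linalg.solve(D, b))
# Right: A P^{-1} y = b,  x = P^{-1} y
y = np.linalg.solve(A @ np.linalg.inv(D), b)
x_right = np.linalg.inv(D) @ y
# Split: P1^{-1} A P2^{-1} z = P1^{-1} b,  x = P2^{-1} z
z = np.linalg.solve(np.linalg.inv(P1) @ A @ np.linalg.inv(P2), np.linalg.inv(P1) @ b)
x_split = np.linalg.inv(P2) @ z

for name, x in [("left", x_left), ("right", x_right), ("split", x_split)]:
    print(name, np.linalg.norm(x - x_ref))        # all differences ~ machine precision
```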

Preconditioned Iterative Methods

Iterative methods for solving large sparse linear systems Ax = b often rely on Krylov subspace techniques, which generate solution approximations within the Krylov subspace \mathcal{K}_k(A, r_0) = \operatorname{span}\{r_0, Ar_0, \dots, A^{k-1}r_0\}, where r_0 = b - Ax_0 is the initial residual. For symmetric positive definite (SPD) matrices A, the conjugate gradient (CG) method minimizes the A-norm of the error over the Krylov subspace at each step, ensuring orthogonality of residuals and conjugacy of search directions. For nonsymmetric systems, the generalized minimal residual (GMRES) method minimizes the Euclidean norm of the residual over the Krylov subspace, while the biconjugate gradient stabilized (BiCGSTAB) method provides a computationally efficient alternative with smoother convergence behavior by combining elements of biconjugate gradients with stabilizing polynomials.

Preconditioning integrates into these Krylov methods by transforming the system to improve its spectral properties, typically via left preconditioning P^{-1}Ax = P^{-1}b or right preconditioning A P^{-1}y = b with x = P^{-1}y, where P approximates A. In the preconditioned conjugate gradient (PCG) method for SPD systems, the search directions incorporate preconditioned residuals z_k = P^{-1}r_k, leading to an error bound in the A-norm (energy norm): \frac{\|e_k\|_{A}}{\|e_0\|_{A}} \leq 2 \left( \frac{\sqrt{\kappa(P^{-1}A)} - 1}{\sqrt{\kappa(P^{-1}A)} + 1} \right)^k, where e_k = x - x_k is the error and \kappa(M) = \lambda_{\max}(M)/\lambda_{\min}(M) is the spectral condition number. For symmetric indefinite systems, the preconditioned MINRES method extends this framework by minimizing the residual norm over the Krylov subspace while handling indefinite spectra through Lanczos tridiagonalization. These preconditioned variants maintain the short-recurrence properties of their unpreconditioned counterparts, enabling efficient implementation for large-scale problems.

A common approach to constructing preconditioners is matrix splitting, where A = P - Q with P chosen for easy inversion, serving as an approximation to A since P^{-1} \approx A^{-1} when Q is small relative to P in an appropriate norm. For incomplete LU factorization with no fill (ILU(0)), P = LU, where L and U are sparse lower and upper triangular factors computed by Gaussian elimination with all fill-in outside the sparsity pattern of A discarded, preserving sparsity at the cost of an inexact factorization. Similarly, successive over-relaxation (SOR) defines a splitting-based preconditioner, and its symmetric version (SSOR) combines forward and backward sweeps to give P = (D - \omega L)D^{-1}(D - \omega U) (up to a scalar factor) for relaxation parameter \omega, preserving symmetry for SPD matrices and acting as P^{-1} \approx A^{-1} through an inexpensive approximation of the inverse. In practice, the residual is computed as r_{k+1} = b - A x_{k+1}, and the search direction is updated using the preconditioned residual z_k = P^{-1} r_k to accelerate convergence within the Krylov subspace.

Convergence theory for these methods emphasizes the condition number \kappa(P^{-1}A); for PCG on SPD systems, the number of iterations needed to reduce the relative error below a tolerance \epsilon scales as O(\sqrt{\kappa(P^{-1}A)} \log(1/\epsilon)), reflecting slower convergence for ill-conditioned matrices due to eigenvalue spread. Effective preconditioning targets \kappa(P^{-1}A) \approx 1 by clustering eigenvalues near unity, ideally achieving mesh-independent convergence rates in discretized PDEs. For nonsymmetric cases like preconditioned GMRES or BiCGSTAB, convergence is often superlinear but lacks sharp worst-case bounds, though preconditioning typically reduces effective condition numbers and iteration counts significantly in practice.
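
The following sketch shows a preconditioned CG solve in SciPy on a 2-D Poisson matrix. SciPy's spilu (a thresholded incomplete LU) stands in for the ILU(0)/IC(0) factorizations discussed above, and the callback simply counts iterations so the preconditioned and unpreconditioned runs can be compared; the grid size and drop tolerance are arbitrary illustrative choices.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

nx = 40
I = sp.identity(nx)
T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(nx, nx))
A = (sp.kron(I, T) + sp.kron(T, I)).tocsc()          # 5-point Laplacian, SPD
b = np.ones(A.shape[0])

counts = {}
def make_counter(key):
    counts[key] = 0
    def cb(xk):                                      # CG calls this once per iteration
        counts[key] += 1
    return cb

x0, _ = spla.cg(A, b, callback=make_counter("plain CG"))

ilu = spla.spilu(A, drop_tol=1e-4, fill_factor=10)   # approximate sparse LU factors of A
M = spla.LinearOperator(A.shape, matvec=ilu.solve)   # action of P^{-1} on a vector
x1, _ = spla.cg(A, b, M=M, callback=make_counter("ILU-preconditioned CG"))

print(counts)   # the preconditioned run typically needs far fewer iterations
```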

Geometric and Spectral Interpretations

Preconditioning can be interpreted geometrically as a change of variables that alters the geometry of error propagation in iterative methods for solving linear systems Ax = b. In the original A-norm, defined as \|x\|_A = \sqrt{x^T A x} for a symmetric positive definite (SPD) matrix A, the set of errors with unit A-norm forms an ellipsoid aligned with the eigenvectors of A. Applying a preconditioner P introduces the preconditioned norm \|x\|_P = \sqrt{x^T P x}, under which the elongated error ellipsoid of the original system becomes close to a unit ball (a sphere in the ideal case), simplifying the convergence analysis of methods like the conjugate gradient algorithm. This transformation is often realized via a Cholesky factorization of the preconditioner, which scales and rotates the error contours to make them more isotropic, thereby reducing the distortion caused by the ill-conditioning of A.

From a spectral perspective, preconditioning aims to cluster the eigenvalues of the preconditioned matrix P^{-1}A tightly around 1, which promotes robust and rapid convergence of iterative solvers. For SPD matrices A, the effective condition number \kappa(P^{-1}A) is determined by the ratio of the largest to smallest eigenvalues, \kappa(P^{-1}A) = \lambda_{\max}(P^{-1}A) / \lambda_{\min}(P^{-1}A), and an ideal preconditioner minimizes this ratio by pushing eigenvalues toward unity and away from extremes. In structured problems, such as those arising from discretized differential operators, optimal preconditioners like circulant approximations achieve eigenvalue clustering near 1 (or near \{-1, 1\} for indefinite cases), ensuring the eigenvalue distribution remains bounded independently of the problem size. This clustering directly bounds the number of iterations required for convergence, as the error reduction per step depends on the eigenvalue spread.

For non-Hermitian matrices, the field of values (numerical range) \mathcal{W}(P^{-1}A) = \{ z \in \mathbb{C} : z = u^* (P^{-1}A) u, \, \|u\|=1 \} provides a convex set containing the eigenvalues, and preconditioning reduces its spread to improve the convergence of methods like GMRES. A well-chosen preconditioner confines \mathcal{W}(P^{-1}A) to a small disk or region around 1, keeping it well separated from the origin and thus bounding residual reduction rates independently of problem parameters. In nonsymmetric saddle-point systems, where zero may lie in the field of values, robust preconditioners must control the skew-symmetric part so that the numerical range remains well behaved and the resulting convergence estimates are independent of the discretization dimension.
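
A quick numerical illustration of this clustering view follows (illustrative NumPy only; the "preconditioner" is a slightly perturbed copy of A, a stand-in for an incomplete factorization). The symmetrically preconditioned matrix L^{-1} A L^{-T}, which shares its spectrum with M^{-1}A, has eigenvalues packed near 1 even though the spectrum of A spans four decades.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 80
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.logspace(0, 4, n)) @ Q.T          # ill-conditioned SPD matrix

M = A + 0.01 * np.diag(np.diag(A))                   # stand-in preconditioner, M ~ A
L = np.linalg.cholesky(M)
Linv = np.linalg.inv(L)
B = Linv @ A @ Linv.T                                # same spectrum as M^{-1} A

lam_A = np.sort(np.linalg.eigvalsh(A))
lam_B = np.sort(np.linalg.eigvalsh(B))
print("eigenvalues of A       span", lam_A[0], "to", lam_A[-1])
print("eigenvalues of M^{-1}A span", lam_B[0], "to", lam_B[-1])    # clustered near 1
print("kappa(A) =", lam_A[-1] / lam_A[0], "  kappa(M^{-1}A) =", lam_B[-1] / lam_B[0])
```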

Advanced Preconditioning Techniques

Variable preconditioning extends standard techniques by allowing the preconditioner P to vary across iterations, denoted P_k at the k-th step, which is particularly useful for solving sequences of linear systems arising in time-dependent or space-varying problems, such as those from discretized partial differential equations (PDEs). This flexibility accommodates evolving system matrices, as in unsteady simulations where the operator changes with time, enabling more accurate approximations tailored to each system without recomputing a fixed preconditioner from scratch. Seminal work introduced variable preconditioning through nested GMRES methods like GMRESR, where an inner GMRES iteration approximates the preconditioner action at each outer step, demonstrating reduced computational cost for unsymmetric systems in practical applications.

Nonlinear preconditioning addresses challenges in solving nonlinear equations by enhancing fixed-point iterations, such as Picard iterations, with techniques that accelerate convergence without linearizing the full system. Anderson acceleration, a prominent method, modifies the fixed-point update by minimizing a residual combination over a window of previous iterates, effectively acting as a nonlinear preconditioner that improves the robustness of outer solvers like Newton-Krylov methods for systems derived from nonlinear PDEs. In this context, left preconditioning integrates the nonlinear fixed-point step to precondition the linearized subproblems within Newton iterations, leading to faster convergence for stiff nonlinear systems in areas like non-isothermal flow simulations. The approach originates from Anderson's 1965 mixing technique but was formalized for fixed-point iterations in modern analysis, showing equivalence to GMRES on linear problems and quadratic convergence under suitable conditions.

Random preconditioning leverages randomized algorithms to construct approximate inverses or low-rank updates efficiently, particularly for high-dimensional linear systems where exact factorization is prohibitive. Randomized sketches, such as those based on Gaussian projections, reduce the matrix to a lower-dimensional subspace, enabling low-rank approximations that serve as preconditioners for methods like conjugate gradients. A key example is randomized Nyström preconditioning, which approximates the matrix via sketching and a truncated eigenvalue decomposition to form a low-rank preconditioner, achieving a bounded condition number independently of the system size when the sketch size matches the effective numerical rank. This technique is especially effective for kernel matrices in machine learning, where it accelerates iterative solvers by capturing the dominant spectral components with minimal overhead.

Spectrally equivalent preconditioning constructs a matrix P such that the preconditioned matrix P^{-1}A has eigenvalues confined to a small interval, ensuring robust convergence of iterative methods regardless of the mesh size in discretized PDEs. Formally, this requires bounds of the form \alpha\, v^T P v \leq v^T A v \leq \beta\, v^T P v \quad \text{for all } v \neq 0, with \beta / \alpha close to 1, which confines the spectrum of P^{-1}A to the interval [\alpha, \beta]. Construction typically involves approximate decompositions, such as those from multilevel methods or domain decomposition, where additive Schwarz preconditioners provide spectral equivalence with constants independent of mesh parameters. In additive Schwarz frameworks, the preconditioner aggregates local solvers over overlapping subdomains, yielding bounded spectra via partition-of-unity arguments and stable decomposition properties of finite element spaces. This approach underpins robust solvers for elliptic PDEs, with the condition number scaling at most logarithmically with subdomain count in optimized variants.
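
As a sketch of the randomized ideas above, the snippet below builds a rank-k Nyström approximation of a synthetic PSD matrix and uses it, in the spirit of Nyström-PCG, as a preconditioner for the regularized system (A + \mu I)x = b. All sizes, the regularization \mu, and the sketch rank are arbitrary illustrative choices, and the explicit condition-number check is only feasible because the example is small.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, mu = 300, 80, 1e-2

# Synthetic PSD matrix with rapid spectral decay (typical of kernel matrices).
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.logspace(0, -8, n)) @ Q.T

# Randomized Nystrom approximation A ~ B B^T from a Gaussian sketch.
Omega = rng.standard_normal((n, k))
Y = A @ Omega
C = Omega.T @ Y
L = np.linalg.cholesky(C + 1e-12 * np.eye(k))        # tiny shift for numerical safety
B = np.linalg.solve(L, Y.T).T                        # B = Y L^{-T}, so B B^T = Y C^{-1} Y^T
U, s_half, _ = np.linalg.svd(B, full_matrices=False)
s = s_half**2                                        # approximate top-k eigenvalues of A

def apply_Pinv(v):
    """Apply the Nystrom preconditioner inverse without forming an n x n matrix."""
    coeff = (s[-1] + mu) / (s + mu)
    Utv = U.T @ v
    return U @ (coeff * Utv) + (v - U @ Utv)

# Crude check: condition number of P^{-1}(A + mu I) versus that of A + mu I.
Areg = A + mu * np.eye(n)
PinvAreg = np.column_stack([apply_Pinv(Areg[:, j]) for j in range(n)])
print("kappa(A + mu I)         =", np.linalg.cond(Areg))
print("kappa(P^{-1}(A + mu I)) =", np.linalg.cond(PinvAreg))   # typically much smaller
```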

Specific Examples

One prominent example of a classical preconditioner is the Jacobi preconditioner, which uses the diagonal part of the matrix A to form P = D, where D = \operatorname{diag}(A). This approach is simple to implement and highly parallelizable, as it requires only local operations on each diagonal entry, making it suitable for distributed computing environments. It is particularly effective for matrices that are diagonally dominant, where the magnitude of each diagonal entry exceeds the sum of the absolute values of the off-diagonal entries in the corresponding row. The inverse preconditioner is then P^{-1} = D^{-1}, and in the context of preconditioned Richardson iteration (a basic form of preconditioned gradient descent), the update step is given by x_{k+1} = x_k + P^{-1} (b - A x_k), which clusters eigenvalues near unity and accelerates convergence for well-conditioned problems.

Incomplete factorizations represent another foundational class of preconditioners, approximating the LU or Cholesky factorization of A while controlling computational cost through sparsity patterns. The ILU(k) method, for instance, computes lower and upper triangular factors L and U such that P = LU \approx A, but discards (drops) elements beyond a fill level k, where fill refers to new nonzeros introduced during elimination. The algorithm proceeds by performing Gaussian elimination on A while dropping entries that are smaller than a threshold or that fall outside the sparsity pattern allowed by level k; for k=0, it preserves only the original nonzero structure (ILU(0)). This technique is robust for symmetric positive definite matrices and M-matrices, providing a good approximation to the exact factorization when drop tolerances are tuned appropriately.

Sparse approximate inverse (SPAI) preconditioners construct P directly as an approximation to A^{-1} by minimizing the Frobenius norm \|I - A P\|_F subject to sparsity constraints on P. The algorithm uses column-by-column projections: for each column j of P, it solves a small least-squares problem \min \|A p_j - e_j\|_2, where e_j is the j-th unit vector and p_j is the j-th column of P restricted to a predefined sparse pattern, often derived from the sparsity of A. This yields a sparse, explicit approximate inverse that is easily parallelizable, as columns can be computed independently, and it performs well for nonsymmetric matrices where factorization-based methods may fail. SPAI often requires fewer iterations than diagonal preconditioners in Krylov methods, especially for ill-conditioned systems from PDE discretizations.

Other notable examples include the symmetric successive over-relaxation (SSOR) preconditioner, which applies two sweeps in forward and backward directions with relaxation parameter \omega, yielding P = (D + \omega L) D^{-1} (D + \omega U) (up to a scalar factor), where L and U are the strict lower and upper triangular parts of A; it preserves symmetry and is effective for elliptic PDEs. Polynomial and block preconditioners provide further options: polynomial preconditioners approximate A^{-1} by low-degree polynomials \sum_{i=0}^m c_i A^i, with coefficients fitted, for example, via least-squares minimization of residuals, while block approaches such as the least-squares commutator (LSC) exploit approximate commutator relations between auxiliary operators to build preconditioners for coupled systems; both can reduce condition numbers by up to an order of magnitude in targeted applications.
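
The Jacobi-preconditioned Richardson update quoted above takes only a few lines of code. The sketch below (plain NumPy, with an artificially generated diagonally dominant matrix) is an illustration, not a production solver.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
# Diagonally dominant test matrix: the regime where the Jacobi choice works well.
A = rng.standard_normal((n, n))
A += np.diag(np.abs(A).sum(axis=1) + 1.0)
b = rng.standard_normal(n)

P_inv = 1.0 / np.diag(A)            # Jacobi: P^{-1} = D^{-1}, stored as a vector
x = np.zeros(n)
for k in range(200):
    r = b - A @ x                   # residual
    x = x + P_inv * r               # preconditioned Richardson update
    if np.linalg.norm(r) < 1e-10 * np.linalg.norm(b):
        break

print(k, np.linalg.norm(b - A @ x))   # converges in a modest number of sweeps
```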

Preconditioning for Eigenvalue Problems

Spectral Transformation Methods

Spectral transformation methods play a crucial role in solving large-scale eigenvalue problems of the form A \mathbf{x} = \lambda \mathbf{x}, where A is a large sparse matrix, by reshaping the spectrum to make target eigenvalues more accessible to iterative algorithms. These techniques shift or invert the eigenvalues, compressing unwanted parts of the spectrum while isolating those of interest, such as interior or extreme values, thereby accelerating the convergence of projection methods. Preconditioners are central to these transformations, approximating the inverses of shifted operators to mitigate ill-conditioning and reduce the cost of the linear solves performed within the iterations.

One prominent approach is the shift-and-invert transformation, particularly effective for targeting interior eigenvalues near a shift \sigma. In this method, the original problem is reformulated by solving linear systems of the form (A - \sigma I) \mathbf{y} = \mathbf{x}, which transforms the eigenvalues such that the new eigenvalues \mu satisfy \mu = 1/(\lambda - \sigma); thus, eigenvalues \lambda closest to \sigma become the largest in magnitude in the transformed spectrum. A preconditioner P approximates (A - \sigma I)^{-1}, enabling efficient solution of these systems via iterative solvers like GMRES, which is essential for sparse matrices where direct inversion is infeasible. This preconditioned shift-and-invert strategy enhances the clustering of interior eigenvalues, making them dominant and easy to extract.

For non-Hermitian problems, the Cayley transformation offers a robust alternative. The transformation is defined as (A - \sigma I)(A + \overline{\sigma} I)^{-1}, where \sigma is a complex shift and \overline{\sigma} its conjugate; it maps the original spectrum through a Möbius transformation that sends the imaginary axis to the unit circle, so eigenvalues on the targeted side of the axis are separated from the rest. Preconditioning targets the resolvent factor (A + \overline{\sigma} I), with P approximating its inverse to stabilize the solves, which improves robustness over the pure shift-invert transformation when the linear systems are solved inexactly.

In the preconditioned eigensolver framework, the solver works either with the shift-invert operator (A - \sigma I)^{-1}, whose eigenvalues \mu = 1/(\lambda - \sigma) make the targets extremal, or with a preconditioned operator such as P^{-1}(A - \sigma I), where P clusters the spectrum around unity to accelerate convergence. This setup is typically solved using preconditioned Lanczos iterations for symmetric A (targeting exterior or interior spectra via shifts) or Arnoldi iterations for nonsymmetric cases, where each matrix-vector product involves a preconditioned solve with the shifted system. These methods, such as implicitly restarted variants, efficiently compute a few eigenpairs by deflating converged modes and restarting to focus on the transformed extremal eigenvalues.
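
A minimal SciPy sketch of shift-and-invert for interior eigenvalues follows. For simplicity, a complete sparse LU factorization supplies the (A - \sigma I)^{-1} action through ARPACK's OPinv hook, standing in for the preconditioned iterative solve (e.g., ILU-preconditioned GMRES) that would replace it on very large problems; the shift and mesh size are arbitrary.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

nx = 40
I = sp.identity(nx)
T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(nx, nx))
A = (sp.kron(I, T) + sp.kron(T, I)).tocsc()            # 2-D Laplacian (symmetric)

sigma = 1.5                                            # target interior eigenvalues near 1.5
lu = spla.splu((A - sigma * sp.identity(A.shape[0])).tocsc())
OPinv = spla.LinearOperator(A.shape, matvec=lu.solve)  # action of (A - sigma I)^{-1}

# Shift-invert Lanczos: largest-magnitude eigenvalues of (A - sigma I)^{-1}
# correspond to the eigenvalues of A closest to sigma.
vals, vecs = spla.eigsh(A, k=4, sigma=sigma, which='LM', OPinv=OPinv)
print(np.sort(vals))                                   # the four eigenvalues of A nearest 1.5
```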

Ideal and Practical Preconditioning

In the context of eigenvalue problems, the ideal preconditioner is defined as P = (A - \sigma I)^{-1}, where A is the system matrix, I is the identity, and \sigma is a shift chosen near the target eigenvalues to enhance the convergence of iterative methods. This exact inverse transforms the original eigenvalues \lambda_i of A into \mu_i = 1 / (\lambda_i - \sigma) for the preconditioned operator, making those near \sigma extremal and well separated while compressing the rest, thereby improving spectral separation and accelerating methods like Arnoldi or Lanczos. However, computing this preconditioner exactly requires a full factorization of A - \sigma I, which is computationally prohibitive for large sparse matrices due to high setup costs and storage demands. The primary goals of such ideal preconditioning include deflation of known eigenpairs to avoid recomputation and targeted isolation of eigenvalue clusters, though these remain theoretical benchmarks rather than direct implementations.

Practical preconditioners approximate this ideal form to balance accuracy and efficiency, often tailoring approximations to the spectrum of A. For instance, subspace recycling reuses approximate invariant subspaces from prior iterations or related solves to augment Krylov bases, reducing the effective problem dimension and speeding up convergence in sequences of shifted systems. Coarse-grid corrections, drawn from multigrid techniques, project the problem onto lower-resolution grids to approximate low-frequency eigenmodes, providing robust preconditioning for symmetric problems without inner-outer iterations. Other approximations include polynomial-based inverses, such as Chebyshev or least-squares polynomials fitted to (A - \sigma I)^{-1}, which avoid explicit factorizations while preserving spectral tailoring.

A generalized ideal preconditioner P seeks to minimize the spectral spread of the preconditioned operator P^{-1} A, ideally clustering all relevant eigenvalues near 1 to ensure uniform convergence bounds across iterative eigensolvers. Robustness is particularly crucial for handling multiple shifts or clustered eigenvalues, where methods like shift-and-invert combined with deflation separate clusters by removing converged eigenpairs, preventing stagnation in block solvers. Preconditioned subspace iterations further enhance cluster robustness by analyzing convergence in terms of eigenvalue groups rather than individual eigenvalues, yielding tighter bounds for dense clusters.

Key challenges in practical preconditioning include trading off setup costs, such as building multigrid hierarchies or recycling bases, against per-solve speed, where overly complex approximations may not amortize over a small number of iterations. Adaptive preconditioners address varying shifts by dynamically updating approximations, for example through shift and tolerance adjustments in inexact iterations, though they require careful selection to maintain stability. These trade-offs underscore the need for problem-specific tuning to achieve reliable performance in large-scale eigenvalue computations.
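
The effect of the ideal choice P = (A - \sigma I)^{-1} is easy to see numerically. The toy computation below simply applies the map \mu = 1/(\lambda - \sigma) to a made-up spectrum and lists the transformed values by magnitude; the spectrum and shift are invented for illustration.

```python
import numpy as np

lam = np.linspace(0.1, 10.0, 12)      # stand-in spectrum of A
sigma = 4.05                          # shift placed near the interior targets
mu = 1.0 / (lam - sigma)              # spectrum of the ideal operator (A - sigma I)^{-1}

for l, m in sorted(zip(lam, mu), key=lambda t: -abs(t[1])):
    print(f"lambda = {l:6.2f}   ->   mu = {m:8.3f}")
# The two eigenvalues of A closest to sigma (about 3.7 and 4.6) map to the largest
# |mu|, so a Lanczos/Arnoldi run on the transformed operator isolates them first.
```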

Preconditioning in Optimization

Role and Description

In nonlinear optimization, preconditioning plays a crucial role in accelerating gradient-based methods for minimizing an objective function f(\mathbf{x}) by rescaling the search directions to mitigate issues arising from poor variable scaling or ill-conditioned Hessians. The standard preconditioned update takes the form \mathbf{x}_{k+1} = \mathbf{x}_k - \alpha P^{-1} \nabla f(\mathbf{x}_k), where P is a positive definite preconditioning matrix that approximates the Hessian (so that P^{-1} approximates its inverse) or simply rescales the coordinates to improve the conditioning of the problem. This approach transforms the optimization landscape, making the level sets more isotropic and enabling faster convergence than plain gradient descent, especially in high-dimensional settings where variables have disparate scales.

In second-order methods, preconditioning enhances the efficiency of Newton's method, which solves for the step \delta \mathbf{x} in the system H \delta \mathbf{x} = -\nabla f(\mathbf{x}_k), where H is the Hessian or an approximation to it. The preconditioned variant solves the modified system P^{-1} H \delta \mathbf{x} = -P^{-1} \nabla f(\mathbf{x}_k), effectively using P^{-1} as a scaling that clusters the eigenvalues of the preconditioned Hessian near unity, thus reducing the number of iterations needed by the inner linear solver. This is particularly valuable in quasi-Newton methods like BFGS, where limited-memory approximations serve as both the update and a form of adaptive preconditioning to handle large-scale problems without full Hessian storage.

For constrained problems, preconditioning extends to methods like sequential quadratic programming (SQP), where it targets the Jacobian of the constraints to improve the conditioning of the Karush-Kuhn-Tucker (KKT) systems solved at each iteration. In active-set SQP variants, preconditioners based on block factorizations of the Schur complement help stabilize the reduced solves, leading to robust performance in PDE-constrained optimization problems. Overall, these techniques yield faster convergence near the minimum by addressing ill-conditioning from mismatched variable scales, often reducing computational cost in practice without altering the problem's stationary points.
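
A small sketch of the preconditioned update above on a badly scaled convex quadratic follows; the matrix, step sizes, and iteration count are illustrative choices, with P taken as the diagonal of the Hessian.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
scales = np.logspace(0, 4, n)                    # curvatures spanning four decades
H = np.diag(scales)
off = 0.1 * np.sqrt(scales[:-1] * scales[1:])    # weak coupling between neighbours
H += np.diag(off, k=1) + np.diag(off, k=-1)      # H stays symmetric positive definite
b = rng.standard_normal(n)
x_star = np.linalg.solve(H, b)                   # minimizer of f(x) = 0.5 x^T H x - b^T x
grad = lambda x: H @ x - b

def descend(P_inv, alpha, iters=500):
    x = np.zeros(n)
    for _ in range(iters):
        x = x - alpha * (P_inv * grad(x))        # diagonal P^{-1} stored as a vector
    return np.linalg.norm(x - x_star) / np.linalg.norm(x_star)

lam_max = np.linalg.eigvalsh(H)[-1]
print("plain GD       rel. error:", descend(np.ones(n), alpha=1.0 / lam_max))
print("preconditioned rel. error:", descend(1.0 / np.diag(H), alpha=0.9))
```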

Connections to Linear and Eigenvalue Problems

In the context of optimization, preconditioning techniques draw heavily from methods developed for solving large-scale linear systems, particularly in Newton-type methods where each iteration requires solving the Newton system H \delta x = -\nabla f(x), with H denoting the Hessian of the objective function f. Preconditioners such as incomplete LU (ILU) factorization, originally designed for sparse linear systems, are directly applicable to these Hessian equations, improving the convergence of iterative solvers like GMRES by reducing the condition number without a full factorization. This reuse leverages the structural similarities between general linear systems and the quadratic models used in optimization, enabling efficient handling of ill-conditioned Hessians in large-scale problems.

Preconditioning in optimization also connects to eigenvalue problems through the spectral properties of the Hessian, whose eigenvalues characterize the local curvature of f. Large eigenvalue spreads indicate poor conditioning, leading to slow convergence; preconditioners shift or scale the spectrum to promote clustering, keeping the Hessian well-conditioned for subsequent steps. Preconditioning can also help manage indefinite Hessians by promoting positive definiteness of the system actually solved, aiding local convergence in nonconvex settings such as saddle-point problems. Such insights, derived from eigenvalue solvers, guide preconditioner selection to cluster eigenvalues around unity, mirroring techniques used in generalized eigenvalue problems.

Hybrid approaches further bridge these domains, with the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method acting as an approximate preconditioner that builds an implicit approximation to the inverse Hessian from gradient history, linking quasi-Newton updates to Krylov solvers. This limited-memory variant maintains efficiency for high-dimensional problems by storing only recent curvature information, effectively preconditioning Newton-like steps without explicit Hessian storage or factorization. In trust-region methods, preconditioning transforms the subproblem \min_{\delta x} \frac{1}{2} \delta x^T H \delta x + \nabla f^T \delta x subject to \|\delta x\| \leq \Delta, often via a scaling matrix P that yields the preconditioned Hessian P^{-1} H P^{-T}, which facilitates faster solution of the constrained quadratic using preconditioned conjugate gradients.

Techniques from eigenvalue problems also transfer to optimization preconditioner design, particularly through spectral clustering of eigenvalues, where the eigenvalues of graph Laplacians or Hessian-like matrices are grouped to inform partitioning and preconditioning strategies. In optimization, this aids in constructing preconditioners that cluster the spectrum of the preconditioned Hessian, accelerating iterative solvers by mimicking the ideal shifts observed in eigenvalue decompositions. For instance, scaled spectral preconditioners capture outlier eigenvalues and cluster the remainder, drawing on ideas from multigrid and deflation that are rooted in eigenvalue analysis.
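
The L-BFGS connection can be made concrete with the standard two-loop recursion, which applies the implicit inverse-Hessian approximation to a vector and can therefore serve as a matrix-free preconditioner inside a solver. The sketch below is a generic textbook version in NumPy, and the quadratic check at the end only illustrates that the result roughly approximates H^{-1} q for stored pairs with y_i = H s_i; it is not claimed to be exact for arbitrary pairs.

```python
import numpy as np

def lbfgs_apply(q, s_list, y_list):
    """Two-loop recursion: approximate H^{-1} @ q from curvature pairs
    s_i = x_{i+1} - x_i, y_i = grad_{i+1} - grad_i (most recent pair last)."""
    q = q.copy()
    alphas, rhos = [], []
    for s, y in zip(reversed(s_list), reversed(y_list)):    # first loop: newest to oldest
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        q -= a * y
        alphas.append(a)
        rhos.append(rho)
    s_new, y_new = s_list[-1], y_list[-1]
    q *= (s_new @ y_new) / (y_new @ y_new)                  # initial scaling gamma_k * I
    for (s, y), a, rho in zip(zip(s_list, y_list),          # second loop: oldest to newest
                              reversed(alphas), reversed(rhos)):
        b = rho * (y @ q)
        q += (a - b) * s
    return q

# Rough check on a quadratic f(x) = 0.5 x^T H x: with y_i = H s_i the recursion
# approximates H^{-1} q, and the approximation improves as more pairs are stored.
rng = np.random.default_rng(5)
n, m = 20, 15
M = rng.standard_normal((n, n))
H = M @ M.T + n * np.eye(n)
S = [rng.standard_normal(n) for _ in range(m)]
Y = [H @ s for s in S]
q = rng.standard_normal(n)
approx, exact = lbfgs_apply(q, S, Y), np.linalg.solve(H, q)
print("relative error:", np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```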

Variable and Nonlinear Approaches

In variable preconditioning for nonlinear optimization, the preconditioner P_k is updated at each iteration k to incorporate local information about the problem's curvature, such as approximations to the Hessian or its inverse, rather than using a fixed preconditioner throughout the process. This approach enhances convergence by adapting to the changing curvature of non-quadratic objectives, often through quasi-Newton updates that maintain spectral equivalence in the underlying space. For instance, diagonal preconditioners can be derived directly from the gradient \nabla f(x_k) to normalize step directions, reducing sensitivity to ill-conditioning in dynamic landscapes.

Nonlinear preconditioning techniques extend these ideas to handle the full nonlinearity of the objective, particularly in methods like the preconditioned inexact Newton algorithm, which solves an approximate Newton system P_k \delta x = -\nabla f(x_k) together with a globalization strategy (line search or trust region) to ensure global convergence. Here, the preconditioner P_k approximates the local Hessian or Jacobian, and inexact solves (e.g., via Krylov methods) balance computational cost with accuracy. Quasi-Newton updating of Broyden type provides an efficient way to approximate this operator by applying low-rank updates that satisfy the secant condition B_{k+1} (x_{k+1} - x_k) = \nabla f(x_{k+1}) - \nabla f(x_k), avoiding full recomputation at every step. The nonlinear preconditioned update takes the form \delta x_k = -P_k^{-1} \nabla f(x_k), where P_k is constructed to satisfy secant conditions from prior iterations, enabling superlinear convergence under suitable assumptions on the objective's smoothness.

In stochastic settings, such as those encountered in machine learning, preconditioners adapt to gradient noise through methods like Adam and RMSProp, which maintain exponential moving averages of gradient moments to form a diagonal preconditioner that scales updates inversely with the root-mean-square of recent gradients. Recent advances include nonlinear Schwarz preconditioning for bound-constrained nonlinear optimization, which decomposes the problem into subdomains for parallelizable updates with improved scalability (as of 2023), and stochastic preconditioning developed in 2025 to handle expensive nonlinear least-squares problems in neural field optimization. Despite these advances, challenges persist in nonconvex landscapes, where indefinite Hessians can lead to unstable preconditioned directions that fail to decrease the objective, necessitating inertia-revealing techniques to detect and correct negative curvature. Additionally, frequent updates to P_k risk amplifying noise or causing divergence, prompting adaptive freezing strategies that halt recomputation when preconditioner quality metrics (e.g., residual norms) indicate sufficient accuracy.
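
The diagonal stochastic preconditioners mentioned above amount to a few lines of code. The RMSProp-style sketch below (illustrative hyperparameters and a noisy quadratic objective invented for the example) scales each coordinate of the step by the inverse root-mean-square of recent gradients.

```python
import numpy as np

rng = np.random.default_rng(6)
n, lr, beta, eps = 20, 0.05, 0.9, 1e-8
scales = np.logspace(0, 3, n)
grad = lambda x: scales * x + 0.1 * rng.standard_normal(n)   # noisy, badly scaled gradient

x = np.ones(n)
v = np.zeros(n)                           # running estimate of the squared-gradient scale
for k in range(2000):
    g = grad(x)
    v = beta * v + (1 - beta) * g**2      # exponential moving average of g^2
    x = x - lr * g / (np.sqrt(v) + eps)   # diagonally preconditioned update
print("final ||x|| =", np.linalg.norm(x))  # driven near the minimizer x = 0 despite the scaling
```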

Applications and Recent Developments

Practical Applications

Preconditioners play a crucial role in computational fluid dynamics (CFD), particularly for solving the discretized Stokes equations in simulations of fluid flow. Multigrid preconditioners are widely employed to accelerate iterative solvers for these systems, enabling efficient handling of large-scale problems in engineering design and analysis. For instance, matrix-free monolithic multigrid methods precondition the discretized Stokes equations, supporting scalable solves for finite element discretizations.

In machine learning, preconditioners enhance the efficiency of training algorithms by addressing the ill-conditioned matrices that arise in the underlying optimization and regression problems. For Gaussian processes, preconditioning techniques combined with conjugate gradients, re-orthogonalization, and mixed precision accelerate the solution of large kernel (covariance) systems, which is essential for scalable probabilistic predictions. Similarly, in neural network training, preconditioners integrated into stochastic gradient and Hessian-free methods improve convergence by regulating the curvature of the loss landscape and stabilizing the inner conjugate gradient computations.

Engineering applications, such as structural analysis, rely on preconditioners to solve finite element systems arising from partial differential equations like the Poisson equation, which underlies simplified elasticity models. Low-order finite element preconditioners are effective for high-order discretizations in these contexts, providing scalable solutions for predictions in complex geometries. Domain decomposition preconditioners further enable parallel computing for large-scale finite element models, partitioning the domain into subdomains to facilitate distributed solves while maintaining robustness.

In medical imaging, preconditioners support reconstruction algorithms for computed tomography (CT) and magnetic resonance imaging (MRI). Ordered subset preconditioned methods, such as relaxed ordered subset preconditioned alternating projection, accelerate the expectation-maximization process in positron emission tomography (PET)/CT reconstruction, improving image quality and reducing computational time for clinical diagnostics.

Performance benefits of preconditioners are evident in case studies from porous media flow simulations, where they significantly reduce iteration counts in iterative solvers. For example, multi-stage preconditioners in compositional flow models achieve parallel speedups on GPUs compared to sequential approaches, enabling faster simulations of oil reservoir dynamics.

Modern and Emerging Techniques

In the 2020s, the integration of machine learning has revolutionized preconditioning by enabling data-driven approximations tailored to complex partial differential equations (PDEs). Neural network-based preconditioners leverage learned models, such as graph neural networks, to approximate matrix factorizations that accelerate methods like conjugate gradient for solving sparse linear systems arising from PDE discretizations. These approaches train on ensembles of similar systems to predict effective preconditioners, addressing limitations of traditional methods in high-dimensional or parametric settings.

Scalable preconditioning techniques have advanced through GPU acceleration and hybrid machine learning-iterative frameworks, enabling efficient handling of massive problems in modern computing environments. GPU-accelerated randomized preconditioners, like those constructing approximate Cholesky factors for Laplacian matrices, parallelize the sampling and factorization processes to achieve near-linear speedup on sparse systems with millions of degrees of freedom. Mixed-precision variants further optimize memory and computation by using lower precision for randomization steps in LSQR solvers, outperforming double-precision baselines by factors of 2-3 on dense least-squares problems. Hybrid methods combine deep operator networks, such as DeepONet, with iterative solvers like GMRES, where the neural component preconditions the system to reduce residual norms faster, particularly for parametric PDEs with up to 10^5 parameters. Another example is neural preconditioned conjugate gradient (neuralPCG), which learns subspace projections to cut solver iterations by 30-70% on time-dependent PDEs.

Quantum-inspired preconditioners draw from variational quantum algorithms to tackle linear systems on noisy intermediate-scale quantum (NISQ) devices, extending classical ideas to quantum linear algebra. Variational preconditioners enhance the variational quantum linear solver by incorporating classical preconditioning steps, such as incomplete factorizations, to condition the quantum phase estimation and improve solution accuracy on systems with condition numbers exceeding 10^4. Hybrid HHL variants integrate variational quantum eigensolvers with classical iterative refinement, reducing circuit complexity and enabling solutions on current hardware for sparse matrices of up to 2^10 dimensions. These methods precondition the input to cluster eigenvalues, achieving exponential speedup over classical solvers for well-conditioned subsystems while mitigating noise through error-corrected post-processing.

Emerging techniques address critical gaps in preconditioning for exascale computing and nonlinear problems through adaptive and physics-aware strategies. In exascale environments, libraries like hypre provide scalable algebraic multigrid preconditioners with adaptive coarsening, supporting simulations on 10^6+ cores while maintaining convergence rates below 1.5 iterations per time step for fluid dynamics. Adaptive deep learning for nonlinear preconditioning employs nonlinear elimination within Newton frameworks, bounding errors to 10^{-6} for transport equations by dynamically selecting subspaces based on residual growth. Post-2023 developments apply matrix preconditioning frameworks to physics-informed neural networks (PINNs), using adjoint-based preconditioning to stabilize training and improve convergence on ill-conditioned PDEs like 3D aerodynamics by factors of 10 in loss reduction.
As of 2025, further advances include neural-operator preconditioned Newton methods for parametric nonlinear systems and machine learning techniques like PCGBandit for one-shot acceleration of transient PDE solvers. A key recent innovation involves learning approximate inverses via autoencoders, where the preconditioner P is trained to satisfy P \approx A^{-1} by encoding matrix sparsity patterns and decoding low-rank approximations, achieving sparsity levels of 99% with relative errors under 10^{-3} for elliptic operators. Generative models further refine sparse approximate inverse (SPAI) preconditioners using diffusion processes on graph representations of A, enabling deployment on heterogeneous architectures for seismic imaging tasks. These learned inverses integrate seamlessly with iterative solvers, demonstrating robustness across out-of-distribution matrices in large-scale applications.
