Incomplete Cholesky factorization
Incomplete Cholesky factorization is a sparse approximate factorization method in numerical linear algebra that decomposes a symmetric positive definite (SPD) matrix A into A \approx LL^T, where L is a lower triangular matrix with a controlled sparsity pattern to limit computational fill-in during the decomposition process.[1] This approximation preserves the positive definiteness of A under certain conditions, such as diagonal dominance, and is particularly effective for large sparse matrices arising from discretizations of partial differential equations.[2] Unlike the full Cholesky factorization, which can introduce excessive nonzero elements and increase storage and time costs, the incomplete version drops elements beyond a predefined sparsity pattern, making it suitable for preconditioning iterative solvers.[1]
The method was originally developed by Meijerink and van der Vorst in 1977 as part of an iterative solution strategy for linear systems Ax = b where A is a symmetric M-matrix, a class of SPD matrices with nonpositive off-diagonal entries.[2] They proposed the factorization as a regular splitting A = LL^T - R, where R is the residual matrix with nonzeros only outside the allowed pattern, ensuring convergence when combined with methods like the conjugate gradient algorithm.[2] This approach demonstrated superior performance over earlier iterative techniques, such as successive line over-relaxation, in numerical tests on finite difference discretizations of elliptic problems.[2]
Computationally, the factorization is obtained by modifying the standard Cholesky algorithm to neglect certain off-diagonal entries during Gaussian elimination, typically using a level-of-fill parameter k (as in IC(k)) or a drop tolerance for threshold-based variants.[1] For the basic IC(0) case, the sparsity pattern of L matches that of A, avoiding any fill-in.[1] When the standard procedure fails to produce a valid factorization (e.g., due to pivoting issues or loss of positive definiteness), modified versions like ICNR(0) or ICNE(0) incorporate diagonal shifts to enforce existence while approximating the spectrum of A.[1]
As a preconditioner, incomplete Cholesky factorization clusters the eigenvalues of the preconditioned matrix, accelerating the convergence of Krylov subspace methods like preconditioned conjugate gradients, often achieving speedups of 4 to 5 times in practical implementations on parallel hardware.[1] It is widely applied in solving sparse linear systems from structural engineering, fluid dynamics, and optimization, though its effectiveness can depend on matrix ordering and conditioning.[1] Advanced variants, such as multilevel or threshold-dropping approaches, further enhance robustness for ill-conditioned problems.[1]
Introduction
Definition
Incomplete Cholesky factorization is an approximation to the complete Cholesky decomposition for symmetric positive definite (SPD) matrices, where a sparse lower triangular matrix L is computed such that A \approx LL^T.[3] This sparsity in L is enforced by dropping certain elements—either those outside a predefined sparsity pattern (such as the pattern of A itself) or those below a specified threshold—during the factorization process, thereby limiting fill-in compared to the exact factorization.[3] The method was originally developed for matrices arising from finite difference discretizations, particularly symmetric M-matrices, to ensure numerical stability.
In mathematical terms, the factorization often incorporates a permutation matrix P to minimize fill-in, satisfying P A P^T = L L^T + D, where L is a sparse lower triangular matrix and D represents the matrix of dropped terms, which is typically neglected in the approximation to yield P A P^T \approx L L^T.[3] This formulation preserves the positive definiteness of the original matrix A under certain conditions on the dropping strategy, as the approximate factor LL^T remains SPD.
The primary purpose of incomplete Cholesky factorization is to serve as a preconditioner in iterative solvers, such as the conjugate gradient method, for large sparse linear systems Ax = b. By approximating the exact Cholesky factors while significantly reducing computational and storage costs, it accelerates convergence without requiring the full dense factorization.[3] This makes it particularly valuable for applications in scientific computing, such as solving partial differential equations on sparse grids.
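As a concrete illustration of this use, the following minimal MATLAB sketch preconditions the conjugate gradient method with a zero-fill incomplete Cholesky factor computed by the built-in ichol function; the test matrix (a small 2D Poisson problem generated with delsq and numgrid), the right-hand side, and the tolerances are arbitrary choices for the example rather than recommendations.
A = delsq(numgrid('S', 50));       % sparse SPD matrix: 2D Poisson, about 2300 unknowns
b = ones(size(A, 1), 1);           % arbitrary right-hand side
L = ichol(A);                      % zero-fill incomplete Cholesky, A ~ L*L'
tol = 1e-8; maxit = 500;
[x0, flag0, relres0, iter0] = pcg(A, b, tol, maxit);         % unpreconditioned CG
[x1, flag1, relres1, iter1] = pcg(A, b, tol, maxit, L, L');  % CG preconditioned by L*L'
fprintf('CG iterations: %d without, %d with IC(0)\n', iter0, iter1);
On problems of this kind the preconditioned run typically needs noticeably fewer iterations, at the cost of one sparse factorization up front and two triangular solves per iteration.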
Historical Development
The development of incomplete Cholesky factorization emerged in the context of early iterative methods for solving large-scale linear systems arising in scientific computing, particularly following the introduction of the conjugate gradient method by Hestenes and Stiefel in 1952, which highlighted the need for effective preconditioners to handle sparse symmetric positive definite matrices. Initial ideas for incomplete factorizations appeared in the late 1950s and early 1960s as approximations to facilitate convergence in finite difference schemes for elliptic partial differential equations. Richard S. Varga formalized one of the earliest variants in 1960, proposing factorization-based splittings for normalized iterative methods applied to boundary value problems, motivated by the computational limitations of direct methods on emerging digital computers. Similar contributions from Buleev in 1960 and Oliphant in 1961 extended these techniques to improve the spectral properties of iteration matrices for difference operators.
The 1970s marked a pivotal formalization of incomplete Cholesky factorization, driven by applications in plasma physics and structural analysis where full factorizations were infeasible due to sparsity and scale. In 1977, Meijerink and van der Vorst introduced the IC(0) method, an incomplete Cholesky preconditioner that preserves the sparsity pattern of the lower triangular factor, proving its existence and positive definiteness for symmetric M-matrices and demonstrating its efficacy as a preconditioner for conjugate gradient iterations on five-point finite difference discretizations. Building on this, Kershaw proposed in 1978 a shifted incomplete Cholesky variant to mitigate breakdown issues in plasma simulation problems, where negative pivots could arise, achieving up to 200 times faster convergence than alternating direction implicit methods on challenging systems. These works established incomplete Cholesky as a robust algebraic preconditioner, emphasizing drop tolerance and pivot modifications to balance accuracy and sparsity.
Extensions in the 1980s focused on enhancing robustness for general sparse matrices beyond M-matrices, incorporating numerical dropping and ordering strategies to control fill-in in preconditioners for conjugate gradient solvers. Munksgaard's 1980 approach introduced threshold-based dropping during factorization, allowing controlled fill to improve conditioning while limiting storage, particularly for irregular sparse grids in fluid dynamics. Ajiz and Jennings further advanced breakdown prevention in 1984 with a modified incomplete Cholesky scheme that ensures positive definiteness through diagonal adjustments, widely adopted for elliptic PDEs on unstructured meshes. These innovations addressed the sensitivity of early methods to matrix ordering, paving the way for broader use in sparse linear algebra libraries.
By the 1990s and 2000s, incomplete Cholesky factorization was integrated into major scientific computing software packages, reflecting its maturity as a standard preconditioner for high-performance simulations. The Portable, Extensible Toolkit for Scientific Computation (PETSc), initiated in the early 1990s, incorporated IC variants including threshold dropping, enabling scalable implementations for parallel architectures in applications like geophysics and climate modeling. Similarly, the hypre library, developed from the late 1990s, included incomplete Cholesky options within its BoomerAMG framework for hybrid preconditioning in large-scale multiphysics problems. Post-2010 developments have emphasized acceleration on modern hardware, with GPU-optimized algorithms emerging around 2012 to parallelize the factorization and solves, achieving speedups of approximately 2 to 3 times over CPU versions for sparse systems in machine learning and computational fluid dynamics.[4] Hybrid variants combining incomplete Cholesky with multigrid or low-rank updates have further extended its applicability to exascale computing challenges. More recent advances, as of 2024, include robust implementations in half-precision arithmetic for improved energy efficiency in large-scale simulations.[5]
Mathematical Background
Complete Cholesky Factorization
The Cholesky factorization provides an exact decomposition for symmetric positive definite matrices. Specifically, for an n \times n symmetric positive definite matrix A, it expresses A = LL^T, where L is a lower triangular matrix with positive diagonal entries.[6] This decomposition leverages the symmetry and positive definiteness of A to halve the storage and computational requirements compared to general LU factorizations.
The algorithm derives from equating entries in the matrix equation A = LL^T. It proceeds column-by-column, computing the elements of L recursively. For the diagonal entry at position i,
l_{ii} = \sqrt{ a_{ii} - \sum_{k=1}^{i-1} l_{ik}^2 },
and for each off-diagonal entry l_{ji} with j > i,
l_{ji} = \frac{ a_{ji} - \sum_{k=1}^{i-1} l_{jk} l_{ik} }{ l_{ii} }.
These formulas ensure that the factorization matches A exactly, with the square root operation guaranteed to be real and positive due to the positive definiteness of A.[6]
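A minimal dense MATLAB sketch of these recurrences, checked against the built-in chol, may make the column-by-column structure concrete; the 3-by-3 test matrix is an arbitrary SPD example (the same one used in the numerical examples later in this article).
A = [4 1 0; 1 3 1; 0 1 2];         % small SPD test matrix
n = size(A, 1);
L = zeros(n);
for i = 1:n
    L(i, i) = sqrt(A(i, i) - sum(L(i, 1:i-1).^2));                    % diagonal formula
    for j = i+1:n
        L(j, i) = (A(j, i) - L(j, 1:i-1) * L(i, 1:i-1)') / L(i, i);   % off-diagonal formula
    end
end
norm(A - L*L', 'fro')              % zero up to roundoff: the factorization is exact
norm(L - chol(A, 'lower'), 'fro')  % agrees with the built-in Cholesky factor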
Key properties of the Cholesky factorization include its computational complexity of approximately \frac{1}{3} n^3 floating-point operations for dense matrices, making it efficient for moderate-sized problems.[7] Because L has positive diagonal entries, the product LL^T is itself symmetric positive definite, and the factorization reproduces A exactly; when applied to sparse matrices, however, it introduces fill-in (additional nonzero elements) in L. Numerically, the factorization is backward stable: the computed factor \hat{L} satisfies \hat{L} \hat{L}^T = A + \Delta A, where \| \Delta A \|_2 / \| A \|_2 \leq c n^2 u and u is the unit roundoff, ensuring reliable results in finite precision arithmetic without pivoting for positive definite matrices.[8]
Challenges with Sparse Matrices
One of the primary challenges in applying complete Cholesky factorization to sparse symmetric positive definite matrices arises from the fill-in problem, where the elimination process introduces new nonzero entries in the lower triangular factor L at positions that were zero in the original matrix A. This fill-in expands the sparsity pattern of L beyond that of A, potentially transforming a matrix with O(n) nonzeros into one requiring O(n^2) storage and operations in the worst case, thereby undermining the efficiency gains expected from sparsity.[9]
The extent of fill-in is highly sensitive to the ordering of the matrix rows and columns, as poor vertex ordering in the associated graph can lead to excessive new connections during elimination, amplifying storage and computational demands. Techniques such as nested dissection address this by recursively partitioning the graph into separators to minimize fill-in, achieving near-optimal reductions for structured problems like regular finite element meshes, where fill-in scales as O(n \log n) under suitable ordering. However, determining an optimal ordering remains computationally intensive, often approximating NP-hard minimization problems.[10][9]
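The effect of ordering on fill-in can be observed directly; the following MATLAB sketch compares the nonzero counts of complete Cholesky factors of a small 2D Laplacian under the natural grid ordering and under an approximate minimum degree permutation (the grid size is an arbitrary choice for the example).
A = delsq(numgrid('S', 60));           % sparse SPD matrix from a 5-point stencil
p = symamd(A);                         % approximate minimum degree ordering
nnz(A)                                 % nonzeros in A
nnz(chol(A, 'lower'))                  % fill-in under the natural (banded) ordering
nnz(chol(A(p, p), 'lower'))            % typically far fewer nonzeros after reordering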
For large-scale applications, such as finite element methods (FEM) in structural analysis or circuit simulations involving matrices with n > 10^5 unknowns, complete Cholesky factorization becomes prohibitive due to its superlinear growth in memory and floating-point operations; for two-dimensional FEM grids under nested dissection orderings, factorization requires O(n^{3/2}) operations and O(n \log n) storage, and for three-dimensional problems the costs escalate further, to O(n^2) operations and O(n^{4/3}) storage.[9]
Incomplete Factorization Variants
IC(0) Method
The IC(0) method, introduced as a basic form of incomplete Cholesky factorization, computes a sparse lower triangular matrix L such that the nonzero entries of L match exactly the sparsity pattern of the lower triangle of the symmetric positive definite matrix A, with all fill-in elements explicitly set to zero during the factorization process.[3] This approach approximates the complete Cholesky decomposition A = LL^T by preserving the original matrix sparsity, thereby reducing both storage requirements and computational complexity, particularly for large sparse systems arising in practical applications like partial differential equation discretizations.[3]
The computation follows a modified version of the standard Cholesky algorithm, where elimination steps are restricted to the predefined sparsity pattern, ignoring any contributions from positions outside this pattern. For the diagonal elements, at step i, the value is determined by
l_{ii}^2 = a_{ii} - \sum_{\substack{k=1 \\ (i,k) \in S}}^{i-1} l_{ik}^2,
where S denotes the sparsity pattern of A, ensuring only existing nonzero terms contribute to the sum; l_{ii} is then taken as the positive square root.[3] For off-diagonal elements l_{ji} with j > i and (j,i) \in S, the formula is
l_{ji} = \frac{1}{l_{ii}} \left( a_{ji} - \sum_{\substack{k=1 \\ (j,k) \in S, (i,k) \in S}}^{i-1} l_{jk} l_{ik} \right),
again limiting the summation to positions allowed by the pattern, which excludes fill-in interactions.[3] These restricted updates ensure the factorization remains within the original sparsity structure but may introduce approximation errors compared to the full decomposition.
A key property of the IC(0) method is its strict preservation of sparsity, which facilitates efficient implementation in sparse matrix formats and makes it suitable as a preconditioner for iterative solvers like the conjugate gradient method.[3] However, the dropping of fill-in can lead to a loss of positive definiteness in LL^T, particularly if a computed diagonal pivot l_{ii}^2 becomes negative due to aggressive sparsity enforcement; in such cases, robustness is often achieved by clamping the negative value to a small positive constant (e.g., machine epsilon) or applying a diagonal shift to A prior to factorization.[3] This potential breakdown highlights a trade-off between sparsity efficiency and numerical stability, with empirical studies showing that IC(0) performs well for well-structured matrices like those from finite difference discretizations but may require modifications for more irregular sparsity patterns.[3]
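A minimal MATLAB sketch of IC(0) along the lines described above is given below; it uses dense indexing purely for clarity (a practical code would work in a sparse format), and the clamp applied to a nonpositive pivot is one of the simple breakdown fixes mentioned above. The function name ic0_sketch is only illustrative.
function L = ic0_sketch(A)
    n = size(A, 1);
    S = full(A ~= 0);                  % sparsity pattern of A
    L = zeros(n);
    for i = 1:n
        s = A(i, i) - sum(L(i, 1:i-1).^2);     % diagonal update
        if s <= 0                      % breakdown: positivity lost due to dropping
            s = eps;                   % clamp to a small positive value
        end
        L(i, i) = sqrt(s);
        for j = i+1:n
            if S(j, i)                 % only positions inside the pattern of A
                L(j, i) = (A(j, i) - L(j, 1:i-1) * L(i, 1:i-1)') / L(i, i);
            end                        % fill-in outside the pattern is dropped
        end
    end
end
Because L has nonzeros only at pattern positions, the unrestricted sums in the sketch coincide with the restricted sums in the formulas above.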
Higher-Level Incomplete Methods
Higher-level incomplete methods, particularly the IC(k) variants for k \geq 1, extend the basic incomplete Cholesky factorization by allowing controlled fill-in in the lower triangular factor L to improve approximation quality while preserving sparsity. In these methods, a fill-in entry in L is retained if its generation during Gaussian elimination corresponds to a graph distance of at most k in the matrix's sparsity graph or, equivalently, a path length of at most k in the elimination tree.[11] This distance measures the minimum number of elimination steps required to produce the entry from the original nonzero pattern.[12]
From a graph-theoretic viewpoint, the symmetric matrix A is associated with an undirected graph G where vertices represent indices and edges indicate nonzero entries. During Cholesky elimination, fill-ins occur along paths connecting previously eliminated nodes; the level of a potential fill-in at position (i,j) with i > j is defined recursively as the smallest integer \ell such that there exists a sequence of indices linking i and j through at most \ell intermediate eliminations. IC(k) includes all such positions where \ell \leq k, with level-1 fill-ins limited to direct extensions of the original adjacency (e.g., neighbors of existing nonzeros).[11]
The retention rule is implemented symbolically prior to numerical factorization. The level \mathrm{lev}(i,j) for a fill-in position is computed iteratively via
\mathrm{lev}^{(m+1)}(i,j) = \min\left( \mathrm{lev}^{(m)}(i,j), \, \min_{p < \min(i,j)} \left( \mathrm{lev}^{(m)}(i,p) + \mathrm{lev}^{(m)}(p,j) + 1 \right) \right),
starting with \mathrm{lev}^{(0)}(i,j) = 0 if a_{ij} \neq 0 and \infty otherwise, converging after finitely many steps; entries with final \mathrm{lev}(i,j) \leq k are kept.[11] During the numerical phase, for row i and column j < i, the entry is
l_{ij} = \frac{1}{l_{jj}} \left( a_{ij} - \sum_{m < j, \, \mathrm{lev}(i,m) \leq k} l_{im} l_{jm} \right),
provided the denominator is positive; for k=1, the sum includes only immediate predecessors m adjacent to both i and j in the level-0 graph.[11]
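The symbolic phase can be sketched in a few lines; the following MATLAB fragment computes the level matrix by sweeping over the pivots in order (a dense O(n^3) formulation chosen for clarity only, and the function name ick_pattern_sketch is illustrative). Positions whose final level is at most k define the IC(k) pattern.
function keep = ick_pattern_sketch(A, k)
    n = size(A, 1);
    lev = inf(n);                      % Inf marks structurally zero positions
    lev(find(A)) = 0;                  % original nonzeros have level 0
    for p = 1:n                        % eliminate pivot p
        for i = p+1:n
            for j = p+1:n
                % fill generated through pivot p raises the level by one
                lev(i, j) = min(lev(i, j), lev(i, p) + lev(p, j) + 1);
            end
        end
    end
    keep = (lev <= k);                 % allowed sparsity pattern for IC(k)
end
For a symmetric matrix the resulting pattern is symmetric, and only its lower triangle is needed for L.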
These approaches yield superior preconditioners compared to the IC(0) baseline (where k=0 enforces no fill-in), especially for ill-conditioned systems from discretized PDEs, by clustering eigenvalues more effectively and reducing iteration counts in methods like preconditioned conjugate gradients—e.g., IC(1) often halves iterations relative to IC(0) on 2D Poisson problems at the expense of roughly doubling the nonzero count in L.[11] The trade-off balances enhanced accuracy with manageable storage growth, making IC(k) suitable for large-scale applications where full factorization is prohibitive.[12]
Threshold Incomplete Factorization
Threshold incomplete factorization, often denoted as IC(τ), is a magnitude-based variant of incomplete Cholesky factorization designed for sparse symmetric positive definite matrices. In this approach, the factorization proceeds by computing potential fill-in elements across the entire lower triangular factor L as in the complete Cholesky decomposition, but discards those whose absolute values fall below a user-specified threshold τ during the process. This dropping strategy allows fill-in in any position but controls sparsity numerically rather than structurally, making it suitable for matrices where fill patterns are not easily predictable from graph distance.[3][13]
The core computation for an off-diagonal element l_{ji} (with j > i) follows the standard Cholesky update formula:
l_{ji} = \frac{a_{ji} - \sum_{k=1}^{i-1} l_{jk} l_{ik}}{l_{ii}}
This value is then compared against the threshold: if |l_{ji}| ≥ τ, it is retained; otherwise, it is set to zero. Diagonal elements l_{ii} are always retained and are computed as the square root of the corresponding entry of the remaining Schur complement. To enhance robustness, particularly to prevent negative pivots that could cause breakdown, the pivot argument a_{ii} - \sum_{k} l_{ik}^2 can be clamped to a small positive value when dropping drives it nonpositive, or a diagonal shift can be applied to A before factorization.[3]
Variants of this method include absolute thresholding, where τ is a fixed small positive scalar (e.g., 10^{-3}), and relative thresholding, where the drop criterion is |l_{ji}| < τ ⋅ ‖row norm of the current Schur complement‖ or τ ⋅ |l_{ii}|, adapting the tolerance to local matrix scaling. These are frequently combined with diagonal scaling of the original matrix A to normalize row sums or diagonals, improving the preconditioner's effectiveness for ill-conditioned systems. A robust extension, known as the Ajiz-Jennings modification, compensates for dropped elements by adding their contributions to the diagonals, guaranteeing the existence of a positive definite factorization under mild assumptions.[3][14]
This factorization is particularly adaptive to matrix conditioning, as larger elements in poorly conditioned regions are more likely to be retained, yielding a better spectral approximation of A ≈ LL^T. Compared to level-based methods, it offers greater flexibility in controlling the trade-off between sparsity and accuracy via τ, but the resulting fill-in pattern is harder to predict a priori since it depends on numerical magnitudes rather than predefined sparsity levels. As τ decreases, the factor becomes denser and closer to the complete factorization, often reducing the number of iterations in preconditioned conjugate gradient methods by 20-50% for typical sparse problems, though at higher computational cost.[3][13]
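In MATLAB, this variant corresponds to the 'ict' option of the built-in ichol; a short usage sketch follows, where the test matrix, the drop tolerance, and the use of diagonal compensation (michol) are illustrative choices rather than recommendations.
A = delsq(numgrid('S', 50));           % sparse SPD test matrix
opts.type = 'ict';                     % threshold-based dropping
opts.droptol = 1e-3;                   % drop tolerance tau
opts.michol = 'on';                    % compensate dropped entries on the diagonal
L = ichol(A, opts);
nnz(L)                                 % denser than IC(0), sparser than chol(A, 'lower')
norm(A - L*L', 'fro') / norm(A, 'fro') % relative approximation error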
Algorithms and Computation
Basic Algorithm
The basic algorithm for incomplete Cholesky factorization of a symmetric positive definite matrix A proceeds in two distinct phases: a symbolic phase and a numeric phase. In the symbolic phase, the sparsity pattern of the lower triangular factor L is predetermined, often through graph analysis of A (such as identifying the nonzero structure via the adjacency graph) or by executing a dummy factorization that simulates fill-in while adhering to the chosen dropping criterion. This phase ensures that the positions where L may have nonzeros are fixed in advance, limiting computational overhead in the subsequent numeric phase.[3][15]
The numeric phase computes the values of the nonzero entries in L such that A \approx LL^T, by performing a sparse variant of Gaussian elimination where elements outside the symbolic pattern are dropped (set to zero). For the standard IC(0) variant, which retains the sparsity pattern of the lower triangle of A without additional fill-in, the process updates only existing positions. The following pseudocode outlines the core sequential steps for this phase, assuming row-major storage and that the sparsity pattern is precomputed:
for i = 1 to n
    sum_diag = 0
    for k < i with (i, k) in the pattern of L
        sum_diag += l_{i k}^2
    if a_{i i} - sum_diag <= 0   // breakdown handling; see below
        l_{i i} = sqrt(τ), with τ a small positive tolerance (e.g., machine epsilon)
        // optionally adjust subsequent updates for stability
    else
        l_{i i} = sqrt(a_{i i} - sum_diag)
    for j = i+1 to n, only if (j, i) is in the pattern
        sum_off = 0
        for k < i in the intersection of the patterns of rows i and j
            sum_off += l_{j k} * l_{i k}
        l_{j i} = (a_{j i} - sum_off) / l_{i i}
        // drop l_{j i} if |l_{j i}| < δ (threshold variants beyond IC(0))
This algorithm mimics the full Cholesky decomposition but restricts updates to the predefined pattern, ensuring sparsity.[3][15]
The overall computational complexity is O(\mathrm{nnz}(A) + f), where \mathrm{nnz}(A) denotes the number of nonzeros in A and f represents the fill-in (additional nonzeros in L); for IC(0), f is zero by construction, since no entries outside the pattern of A are retained. Pivoting is optional and generally omitted to maintain the sparsity pattern, though diagonal pivoting can be incorporated for enhanced numerical stability in ill-conditioned cases without significantly altering the pattern.[3]
To handle potential breakdowns where the computed diagonal l_{ii}^2 < 0 (violating positive definiteness assumptions due to dropping), the algorithm sets l_{ii} to a small positive value such as \sqrt{\tau} (with \tau near machine precision) and proceeds, optionally propagating adjustments to off-diagonal estimates for robustness; alternatively, a diagonal shift \sigma I (with \sigma > 0) can be added to A beforehand to ensure positivity.[16]
Advanced Algorithmic Modifications
Advanced algorithmic modifications to incomplete Cholesky factorization address limitations in scalability, numerical stability, and adaptability, particularly for large-scale sparse systems arising in scientific computing. Task-based parallelism leverages the elimination tree structure of the matrix to enable concurrent processing during factorization. For instance, a task-parallel approach using a 2D partitioned-block layout decomposes the sparse matrix into blocks, allowing independent tasks for computing Schur complements and updates, which can be scheduled dynamically to minimize idle time on multicore architectures. This method extends to distributed-memory settings through left-looking or right-looking supernodal techniques, where supernodes—groups of columns with identical sparsity patterns—are factored in parallel, reducing communication overhead by processing dense blocks within supernodes before scattering updates.[17]
To enhance robustness, especially in ensuring positive definiteness for preconditioning symmetric positive definite matrices, the modified incomplete Cholesky (MIC) variant adjusts dropped fill-in elements by incorporating them into the diagonal entries of the factor. In MIC, when a potential fill-in term is discarded during factorization due to a dropping criterion, its contribution is instead added to the pivot diagonal to preserve row sums and maintain the spectrum's positive definiteness, preventing factorization breakdown in ill-conditioned cases.[18] This diagonal compensation, originally proposed in schemes like Jennings-Malik, ensures the approximate factor remains positive definite even for matrices close to singularity, improving convergence in iterative solvers like conjugate gradients.[12]
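A minimal right-looking MATLAB sketch in the spirit of such diagonal compensation is shown below: updates that would create fill outside the retained pattern are not discarded outright but added, in absolute value, to the two corresponding diagonal entries (the Jennings-Malik style of compensation; published variants differ in the exact weights). Dense indexing and the name mic_sketch are for illustration only.
function L = mic_sketch(A)
    n = size(A, 1);
    S = full(A ~= 0);                          % retained pattern: that of A
    W = full(A);                               % working copy (Schur complements)
    L = zeros(n);
    for p = 1:n
        L(p, p) = sqrt(W(p, p));
        L(p+1:n, p) = (W(p+1:n, p) / L(p, p)) .* S(p+1:n, p);   % keep pattern only
        for i = p+1:n
            for j = p+1:i                      % lower triangle of the update
                u = L(i, p) * L(j, p);         % pivot p's contribution to (i, j)
                if S(i, j) || i == j
                    W(i, j) = W(i, j) - u;     % ordinary elimination update
                else
                    W(i, i) = W(i, i) + abs(u);    % compensate on the diagonals
                    W(j, j) = W(j, j) + abs(u);    % instead of creating fill-in
                end
            end
        end
    end
end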
Regarding computational complexity, parallel implementations of these modifications offer theoretical speedup up to O(p) on p processors by exploiting the elimination tree's height for task granularity, though sparsity-induced load imbalance often limits scalability, as irregular nonzero distributions lead to uneven work across processors. Techniques like dynamic scheduling and supernodal partitioning mitigate this by balancing computational loads, achieving near-linear speedups on up to hundreds of cores for matrices with favorable sparsity patterns, such as those from finite element discretizations.[19]
Implementations
Dense Matrix Implementation
For dense matrices, where no inherent sparsity pattern exists, the incomplete Cholesky factorization is typically implemented using full matrix storage in languages like MATLAB or Fortran, adapting the standard Cholesky algorithm with an artificial incompleteness mechanism such as threshold dropping. This approach treats the matrix as fully dense, looping over all elements without sparsity checks, and introduces dropping to control fill-in and approximate the factorization LL^T \approx A, where A is symmetric positive definite and L is lower triangular. The IC(0) variant, which retains only the original pattern, degenerates to the complete Cholesky for dense A, so threshold-based methods like ICT(\tau) are preferred, where elements below a tolerance \tau (often relative to the diagonal or column norm) are discarded during computation. Banded storage may be applied if the matrix exhibits a known bandwidth, reducing memory for semi-dense cases, but full two-dimensional arrays are standard for general dense implementations.
In practice, the algorithm modifies the classical Cholesky process by computing Schur complements and applying the dropping rule inline, ensuring numerical stability for positive definite matrices. This avoids the need for sparse data structures like compressed sparse row (CSR) formats, simplifying the code but increasing computational cost to O(n^3) in the worst case before dropping. For threshold dropping, the tolerance \tau is chosen empirically, often starting small (e.g., 10^{-3}) to balance approximation quality and efficiency; global or column-relative thresholds can be used, with the latter scaling drops per column to handle varying matrix densities. Such implementations are straightforward in high-level languages, leveraging built-in linear algebra routines for inner products, though custom loops are needed for dropping.
A representative MATLAB-like pseudocode for threshold incomplete Cholesky (ICT) on a dense matrix A of size n \times n illustrates the process:
L = zeros(n, n);
tau = 1e-3; % Drop tolerance
for i = 1:n
    % Compute diagonal
    sum_sq = 0;
    for k = 1:i-1
        sum_sq = sum_sq + L(i, k)^2;
    end
    L(i, i) = sqrt(A(i, i) - sum_sq);
    for j = i+1:n
        % Compute off-diagonal
        sum_prod = 0;
        for k = 1:i-1
            sum_prod = sum_prod + L(j, k) * L(i, k);
        end
        temp = (A(j, i) - sum_prod) / L(i, i);
        % Apply threshold dropping
        if abs(temp) > tau * abs(L(i, i))
            L(j, i) = temp;
        else
            L(j, i) = 0;
        end
    end
end
This code uses explicit loops for clarity and dropping, simulating incompleteness in a dense setting; optimizations like BLAS calls for the inner sums can improve performance for moderate n.
Despite its simplicity, dense matrix implementations of incomplete Cholesky are less prevalent than sparse variants, mainly serving as benchmarks for preconditioner testing on small-to-medium problems (e.g., n < 1000) or when matrices are generated without sparse structure, such as in kernel methods or dense kernel approximations. For larger dense systems, complete factorization or other preconditioners like diagonal scaling are often more practical due to storage demands, limiting incomplete approaches to scenarios where approximate factors suffice for iterative solvers.
Sparse Matrix Implementation
Implementing incomplete Cholesky factorization for sparse matrices requires specialized data structures to exploit the sparsity pattern and minimize storage overhead. The input matrix A and the resulting lower triangular factor L are typically stored in compressed sparse row (CSR) format, which efficiently represents the non-zero entries using three arrays: values, column indices, and row pointers. This format allows for fast access during the factorization process, as updates to the Schur complement can be performed by traversing only the relevant non-zeros.[20][21]
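As a concrete illustration of the CSR layout (using 1-based indexing), the three arrays for a small symmetric matrix look as follows; the 3-by-3 example is arbitrary.
A = [ 4  1  0            values  = [4 1 1 3 1 1 2]
      1  3  1     ==>    col_idx = [1 2 1 2 3 2 3]
      0  1  2 ]          row_ptr = [1 3 6 8]
Entries row_ptr(i) through row_ptr(i+1) - 1 of values and col_idx hold the nonzeros of row i, so row 2 occupies positions 3 to 5 with entries 1, 3, 1.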
In the symbolic phase, the sparsity pattern of L is determined prior to numerical computation to allocate storage and predict fill-in. This phase models the matrix as an undirected graph, where non-zero entries correspond to edges, and uses adjacency lists to represent the graph structure for efficient traversal and elimination ordering analysis. Adjacency lists facilitate the identification of potential fill positions without storing the full dense graph, enabling scalable symbolic factorization even for large matrices.[22]
Several established libraries provide robust implementations of incomplete Cholesky for sparse matrices, integrating seamlessly with high-performance computing environments. PETSc's PCICC preconditioner supports level-based incomplete Cholesky (ICC(k)) for sequential sparse matrices in AIJ format, allowing users to specify fill levels via options like -pc_factor_levels. HSL_MI28 offers an efficient limited-memory incomplete Cholesky in which the amount of fill retained in each column is bounded by user-set memory parameters, suitable for general sparse symmetric matrices and emphasizing robustness against breakdowns.[23][24]
To optimize performance and reduce fill-in during factorization, pre-ordering techniques are essential. Approximate minimum degree (AMD) ordering minimizes the number of non-zeros in L by selecting pivots that eliminate vertices with the fewest connections, significantly reducing computational cost for IC(0) and higher levels. Similarly, METIS can be used for multilevel ordering to balance load in parallel settings while controlling fill. For higher-level IC(k) methods, dynamic allocation strategies are employed, where the sparsity pattern of L is grown incrementally using linked lists or vector resizing to accommodate additional fill without excessive over-allocation.[25][26]
Key challenges in sparse implementations include memory management for variable fill patterns, particularly in higher-level or threshold-based methods where the exact number of non-zeros is unpredictable. Limited-memory approaches, such as those in HSL_MI28, bound storage by dropping small entries or limiting column counts, but require careful tuning to avoid excessive sparsity that degrades preconditioner quality. Additionally, debugging indefiniteness—arising from negative or zero pivots during factorization—can be addressed by monitoring diagonal entries in the Schur complement; if a pivot falls below a small threshold (e.g., machine epsilon), a small diagonal shift or regularization can be applied to restore positive definiteness.[24][16][27]
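A short MATLAB sketch combining these ingredients is given below: a fill-reducing ordering is applied before the factorization, and ichol's diagcomp option supplies a global diagonal shift as a guard against indefiniteness; the ordering routine, shift value, and test problem are illustrative assumptions.
A = delsq(numgrid('S', 40));           % sparse SPD test matrix
b = ones(size(A, 1), 1);
p = amd(A);                            % fill-reducing (approximate minimum degree) ordering
Ap = A(p, p);                          % symmetrically permuted matrix
opts.type = 'ict'; opts.droptol = 1e-3;
opts.diagcomp = 0.01;                  % factor Ap + 0.01*diag(diag(Ap)) to avoid breakdown
L = ichol(Ap, opts);
[xp, flag] = pcg(Ap, b(p), 1e-8, 400, L, L');   % solve the permuted system
x(p, 1) = xp;                          % recover the solution in the original ordering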
Applications and Examples
Use in Preconditioning
Incomplete Cholesky factorization serves as an effective preconditioner for iterative methods solving large sparse symmetric positive definite (SPD) linear systems Ax = b. In the left-preconditioned form, it transforms the system to M^{-1}Ax = M^{-1}b, where M = LL^T approximates A with L being a lower triangular sparse matrix from the incomplete factorization. Solving systems involving M requires only forward and backward substitutions on L and L^T, which are computationally efficient due to the sparsity of L. This approach is particularly useful in methods like the preconditioned conjugate gradient (PCG) for SPD matrices and preconditioned GMRES for more general cases, as introduced in early applications such as the incomplete Cholesky-CG method.[3]
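Inside the solver, applying M^{-1} = (LL^T)^{-1} to a residual therefore reduces to one forward and one backward sparse triangular solve, as in the following MATLAB fragment (the matrix and the residual vector are arbitrary stand-ins):
A = delsq(numgrid('S', 30));    % sparse SPD matrix
L = ichol(A);                   % sparse lower triangular incomplete factor
r = randn(size(A, 1), 1);       % stand-in for a residual produced by the iteration
y = L \ r;                      % forward substitution with L
z = L' \ y;                     % backward substitution with L', so z = (L*L') \ r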
The primary benefit of incomplete Cholesky preconditioning lies in its impact on the convergence rates of iterative solvers. By approximating A, it clusters the eigenvalues of the preconditioned matrix M^{-1}A, reducing the effective condition number and spectral radius, which accelerates convergence in PCG and GMRES. For instance, higher-level methods like IC(k) generally provide better conditioning than IC(0) by retaining more fill-in, leading to tighter eigenvalue clustering, especially for matrices whose eigenvalues are already clustered around unity. This improvement is evident in PCG, where the convergence bound depends on the extreme eigenvalues of M^{-1}A, and in GMRES, where tighter clustering typically translates into fewer iterations to reach a given residual norm.[3]
Theoretically, for an SPD matrix A, the incomplete Cholesky factor L ensures that M = LL^T remains SPD provided no negative pivots (diagonal entries) occur during factorization, preserving the positive definiteness essential for stable preconditioning. Approximation quality is quantified by the dropped terms forming a remainder D = A - LL^T, with error bounds derived from the norm \|D\|; a small \|D\| (controlled by the dropping strategy) keeps the condition number of M^{-1}A bounded, ensuring robust convergence. The factorization computation, typically via sparse elimination restricted to a pattern or governed by thresholds, supports this by keeping \|D\| under control.[3][28]
Selection of the incomplete Cholesky variant depends on the eigenvalue spectrum of A: IC(0) suffices for well-structured matrices with minimal fill, while IC(k) or threshold-based methods are preferred for spectra requiring more accurate approximation to avoid pivot issues. For added robustness, especially in cases prone to breakdown, it can be combined with symmetric successive over-relaxation (SSOR), yielding a hybrid preconditioner that enhances stability without excessive fill.[3][29]
Numerical and Sparse Examples
To illustrate the incomplete Cholesky factorization, consider a small dense symmetric positive definite (SPD) matrix A = \begin{pmatrix} 4 & 1 & 0 \\ 1 & 3 & 1 \\ 0 & 1 & 2 \end{pmatrix}.
The IC(0) factorization, which retains the full sparsity pattern without dropping fill-ins, coincides with the complete Cholesky factorization for this matrix since no additional fill-ins occur beyond the original pattern. Using the standard algorithm, the lower triangular factor is L = \begin{pmatrix} 2 & 0 & 0 \\ 0.5 & \sqrt{11}/2 & 0 \\ 0 & 2/\sqrt{11} & \sqrt{18/11} \end{pmatrix} \approx \begin{pmatrix} 2 & 0 & 0 \\ 0.5 & 1.658 & 0 \\ 0 & 0.603 & 1.279 \end{pmatrix}. The approximation error is \|A - LL^T\|_F = 0, indicating an exact factorization.
For the threshold-based variant with \tau = 0.1, entries in L below this tolerance in absolute value are dropped during computation. In this case, all computed entries exceed 0.1, yielding the same L as IC(0) and identical error \|A - LL^T\|_F = 0.
Now consider a sparse example with the 5x5 tridiagonal SPD matrix from the discrete 1D Laplacian A = \begin{pmatrix} 2 & -1 & 0 & 0 & 0 \\ -1 & 2 & -1 & 0 & 0 \\ 0 & -1 & 2 & -1 & 0 \\ 0 & 0 & -1 & 2 & -1 \\ 0 & 0 & 0 & -1 & 2 \end{pmatrix}, which has 13 nonzeros (nnz(A) = 13). The IC(0) factorization preserves the original tridiagonal pattern in L, dropping any potential fill-ins outside this structure, resulting in a bidiagonal L.
The step-by-step computation for IC(0) proceeds as follows, using the formula l_{ij} = \frac{a_{ij} - \sum_{k=1}^{j-1} l_{ik} l_{jk}}{l_{jj}} for i > j (dropped if outside pattern) and l_{ii} = \sqrt{a_{ii} - \sum_{k=1}^{i-1} l_{ik}^2}, with only subdiagonal and diagonal terms retained:
- Row 1: l_{11} = \sqrt{2} \approx 1.414.
- Row 2: l_{21} = -1 / l_{11} \approx -0.707, l_{22} = \sqrt{2 - (-0.707)^2} \approx 1.225.
- Row 3: l_{32} = -1 / l_{22} \approx -0.816, l_{33} = \sqrt{2 - (-0.816)^2} \approx 1.155.
- Row 4: l_{43} = -1 / l_{33} \approx -0.866, l_{44} = \sqrt{2 - (-0.866)^2} \approx 1.118.
- Row 5: l_{54} = -1 / l_{44} \approx -0.894, l_{55} = \sqrt{2 - (-0.894)^2} \approx 1.095.
Thus, L is bidiagonal with nnz(L) = 9 (diagonal plus subdiagonal), preserving the sparsity of the lower triangle of A. In contrast, a complete Cholesky factorization without sparsity control would introduce no additional fill-in for this well-structured tridiagonal matrix, but in general sparse cases with poorer orderings, significant fill-in can make the factor much denser (e.g., up to O(n^2) nonzeros in worst-case graphs).
The approximation LL^T exactly reproduces A, so \|A - LL^T\|_F = 0. Error analysis via condition number shows cond(A) ≈ 14 (as expected for the 1D Laplacian with n=5), and cond(LL^T) = cond(A) since the factorization is exact, highlighting the method's accuracy for band matrices.
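The worked example can be reproduced in a few lines of MATLAB; ichol with its default zero-fill option returns the bidiagonal factor computed above, and the exactness and conditioning claims can be checked directly.
A = gallery('tridiag', 5, -1, 2, -1);   % sparse 5-by-5 1D Laplacian
L = ichol(A);                           % IC(0): keeps the tridiagonal pattern
full(L)                                 % matches the bidiagonal factor computed above
norm(A - L*L', 'fro')                   % zero up to roundoff: the factorization is exact
cond(full(A))                           % approximately 14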
Sparsity patterns can be visualized as graphs: the original A corresponds to a path graph with 5 nodes and edges between consecutive nodes (tridiagonal bands), while the IC(0) L maintains the same lower bidiagonal structure without added edges from fill-in; a complete factorization on a fill-prone sparse matrix might add chords or denser connections, increasing the graph's edge count.