
Kaczmarz method

The Kaczmarz method is an iterative algorithm for solving consistent systems of linear equations Ax = b, where A is an m \times n matrix with full column rank and m \geq n, by performing successive orthogonal projections of the current iterate onto the hyperplanes defined by each individual equation in the system. In each step, starting from an initial guess x_0, the method updates the solution by projecting onto the i-th hyperplane H_i = \{x \in \mathbb{R}^n : \langle a_i, x \rangle = b_i \}, yielding x_{k+1} = x_k + \frac{b_i - \langle a_i, x_k \rangle}{\|a_i\|^2} a_i, where a_i is the i-th row of A and the rows are cycled through sequentially. This row-action approach is particularly efficient for large-scale, overdetermined systems where direct methods are computationally prohibitive.

Developed by Polish mathematician Stefan Kaczmarz in 1937 as a means to approximate solutions to linear systems, the method was initially published in German under the title Angenäherte Auflösung von Systemen linearer Gleichungen. It saw limited attention until its rediscovery and application in the late 1960s and early 1970s, notably in 1970 when Richard Gordon, Robert Bender, and Gabor T. Herman adapted it for image reconstruction in computed tomography as the Algebraic Reconstruction Technique (ART). The method's popularity surged in the late 2000s with the introduction of randomized variants, which select rows probabilistically rather than cyclically, leading to faster practical convergence.

Key applications of the Kaczmarz method span medical imaging, such as computed tomography (CT) and magnetic resonance imaging (MRI), where it reconstructs images from projection data, as well as signal processing, machine learning, and compressed sensing for solving sparse recovery problems. For consistent systems, the classical cyclic version converges linearly to the unique solution when A is square and nonsingular, with the rate depending on the matrix's condition number. Randomized extensions, such as those proposed by Strohmer and Vershynin in 2009, achieve exponential convergence in expectation, with the expected error bounded by a factor of \left(1 - \frac{1}{\kappa^2(A)}\right)^k, where \kappa(A) is the scaled condition number, making them robust for ill-conditioned or noisy data. Variants like block Kaczmarz, which project onto multiple rows simultaneously, and greedy selections further enhance speed for large-scale problems.

Overview

Definition and Motivation

The Kaczmarz method is an iterative algorithm for solving consistent systems of linear equations Ax = b, where A is an m \times n matrix with m \geq n, x \in \mathbb{R}^n, and b \in \mathbb{R}^m, assuming an exact solution exists. The method iteratively refines an initial guess by performing successive orthogonal projections onto the hyperplanes defined by the individual equations a_i^T x = b_i, with a_i denoting the i-th row of A. This projection-based approach geometrically drives the solution toward the intersection of all hyperplanes, leveraging the structure of the system without requiring full matrix operations in each step.

For large-scale overdetermined systems, the Kaczmarz method offers significant advantages over direct least-squares approaches, such as solving the normal equations A^T A x = A^T b. Forming A^T A demands substantial memory for the dense n \times n matrix, especially when n is large, and the subsequent factorization is computationally intensive. In contrast, the row-action nature of the Kaczmarz method processes only one equation per iteration, storing just the original A and facilitating parallel implementations across rows, which is ideal for sparse or distributed data environments.

A simple illustration arises in a consistent system where two equations represent intersecting lines as hyperplanes. Starting from an arbitrary point, the method projects it orthogonally onto the first line, then projects the result onto the second line, repeating until convergence to the intersection point. This geometric intuition underscores its efficacy for visualizing solution processes in low dimensions. The method originated in 1937 as an approximate solver for linear systems and was later adapted for computed tomography, where it enables image reconstruction from sparse projection data using row-action updates. Randomized variants extend its applicability to inconsistent cases with faster convergence.

Historical Background

The Kaczmarz method was introduced by Polish mathematician Stefan Kaczmarz in 1937 through his seminal paper "Angenäherte Auflösung von Systemen linearer Gleichungen," published in the Bulletin International de l'Académie Polonaise des Sciences et des Lettres. Kaczmarz died in 1939 amid the onset of World War II. In this work, Kaczmarz proposed an iterative approach based on successive orthogonal projections onto the hyperplanes defined by the equations of a linear system, establishing convergence for systems with nonsingular coefficient matrices. Despite its elegance, the method received limited initial attention.

The algorithm saw independent rediscoveries in the late 1940s and early 1950s, notably by E. Bodewig in 1948 and G.E. Forsythe in 1953, who explored its utility in numerical computation for engineering contexts. Its popularity surged in the 1970s with applications to image reconstruction in computerized tomography (CT) scans, where Richard Gordon, Robert Bender, and Gabor T. Herman independently revived it as the Algebraic Reconstruction Technique (ART) in their 1970 paper. This adoption integrated the method into medical imaging pipelines, emphasizing its efficiency for sparse, overdetermined systems arising from projection data. Throughout the 1970s and 1980s, researchers like Herman and others extended its use in algebraic reconstruction, solidifying its role in tomographic imaging despite hardware limitations of the time.

A pivotal review came in 1981 with Yair Censor's survey on row-action methods, which systematically analyzed the Kaczmarz algorithm alongside related iterative projection techniques for large-scale sparse systems, highlighting its theoretical foundations and practical implementations. The method's deterministic cycling was observed to exhibit slow convergence in ill-conditioned cases, motivating relaxations and ordering strategies to improve performance. Interest revived dramatically in 2009, when Thomas Strohmer and Roman Vershynin proposed a randomized variant, proving exponential expected convergence rates and demonstrating its efficacy on large-scale problems. This resurgence connected the classical method to modern signal processing and machine learning applications.

Classical Method

Algorithm Description

The classical Kaczmarz method is an iterative projection technique for solving consistent linear systems Ax = b, where A \in \mathbb{R}^{m \times n} is the coefficient matrix with m \geq n and full column rank, x \in \mathbb{R}^n is the unknown solution vector, and b \in \mathbb{R}^m is the right-hand side vector. The method proceeds by cyclically selecting rows of A and orthogonally projecting the current iterate onto the hyperplane defined by the corresponding equation. To implement the algorithm, initialize an arbitrary starting vector x^0 \in \mathbb{R}^n, often chosen as the zero vector for simplicity. For each k = 0, 1, 2, \dots, select the row index i = (k \mod m) + 1, and update the iterate using the projection formula: x^{k+1} = x^k + \frac{b_i - a_i^T x^k}{\|a_i\|^2} a_i, where a_i^T denotes the i-th row of A. This update enforces exact satisfaction of the i-th equation at each step. The full procedure can be expressed in pseudocode as follows, which cycles through all rows sequentially in a loop until a convergence criterion is met, such as \|A x^k - b\|_2 < \epsilon for a small tolerance \epsilon > 0:
Input: Matrix A ∈ ℝ^{m×n} (full column rank), vector b ∈ ℝ^m, initial x⁰ ∈ ℝⁿ, tolerance ε > 0
Set k = 0
While ||A xᵏ - b||₂ ≥ ε:
    For i = 1 to m:
        x^{k+1} = xᵏ + (b_i - a_iᵀ xᵏ) / ||a_i||₂² * a_i
        k = k + 1
    End For
End While
Output: Approximate solution xᵏ
This structure completes one full sweep over the m rows in O(mn) time for dense A, with each row update requiring O(n) operations for the inner product, scalar computation, and update; the method is particularly efficient when A is sparse, as operations scale with the number of nonzeros per row.
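
The pseudocode above maps directly to NumPy; the following is a minimal runnable sketch (the function name and example data are illustrative, not from the original sources):

import numpy as np

def kaczmarz(A, b, x0=None, tol=1e-10, max_sweeps=1000):
    """Cyclic Kaczmarz: sweep over rows, projecting onto each hyperplane."""
    m, n = A.shape
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    row_norms_sq = np.einsum('ij,ij->i', A, A)  # precomputed ||a_i||^2
    for _ in range(max_sweeps):
        for i in range(m):
            # Orthogonal projection onto the hyperplane {x : a_i^T x = b_i}
            x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
        if np.linalg.norm(A @ x - b) < tol:  # convergence check once per sweep
            return x
    return x

# Example: a consistent 3x2 system with solution (1, 2)
A = np.array([[1.0, 1.0], [1.0, -1.0], [2.0, 1.0]])
b = A @ np.array([1.0, 2.0])
print(kaczmarz(A, b))  # approximately [1. 2.]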

Geometric Interpretation

The Kaczmarz method admits a clear geometric interpretation as an iterative process of orthogonal projections onto the hyperplanes defined by the rows of the system Ax = b. Each equation a_i^T x = b_i, where a_i is the i-th row of A, corresponds to an affine hyperplane in \mathbb{R}^n, and the method seeks the point in the intersection of these hyperplanes (the solution set) by successively minimizing the distance from the current iterate to each hyperplane in a cyclic manner. This projection-based view underscores the method's role in feasibility problems, where the goal is to find a point in the intersection of convex sets, here specifically affine hyperplanes.

The orthogonal projection P_i(x) of a point x onto the hyperplane H_i = \{ z \in \mathbb{R}^n : a_i^T z = b_i \} is given by the formula P_i(x) = x + \frac{b_i - a_i^T x}{\|a_i\|^2} a_i, which adds to x the correction along the direction a_i that bridges the distance to the hyperplane, ensuring the minimal displacement. This update yields the closest point on H_i to x, and in the Kaczmarz iteration, the next estimate is obtained by applying such projections sequentially over the hyperplanes.

In two dimensions, where the hyperplanes reduce to lines, the iterative process manifests as a zig-zag path: starting from an initial point, each projection drops the current position perpendicularly onto the next line, gradually spiraling inward toward the intersection point that satisfies all equations. For instance, consider lines defined by x_1 + x_2 = 1 and -x_1 + 3x_2 = 2; successive projections from a starting point like (0,0) alternate between the lines, forming a polygonal path that converges to the intersection point (1/4, 3/4). In higher dimensions, the process similarly generates successive approximations that refine the position within the solution space, though the path is less visually intuitive, converging to the affine subspace defined by the consistent system. A conceptual illustration of this process typically depicts the initial point, the sequence of projection lines, and the orthogonal segments connecting iterates to their projections, illustrating the non-increasing distance to the solution and the eventual convergence at the intersection; arrows along the path highlight the iterative refinement. The method's projections are nonexpansive, preserving or reducing distances, which ensures Fejér monotonicity: each iterate is no farther from any point in the solution set than its predecessor, promoting steady progress toward the solution.
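
The two-line example can be traced numerically; this short Python sketch (illustrative) prints the zig-zag iterates converging toward (1/4, 3/4):

import numpy as np

# Lines (hyperplanes in R^2): x1 + x2 = 1 and -x1 + 3*x2 = 2
A = np.array([[1.0, 1.0], [-1.0, 3.0]])
b = np.array([1.0, 2.0])

x = np.zeros(2)  # start at the origin
for k in range(12):
    i = k % 2  # alternate between the two lines (cyclic selection)
    x = x + (b[i] - A[i] @ x) / (A[i] @ A[i]) * A[i]
    print(k, x)
# The printed iterates trace a polygonal path toward the intersection (0.25, 0.75)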

Basic Convergence

The classical Kaczmarz method converges to the unique solution x^* of the consistent system Ax = b, where A \in \mathbb{R}^{m \times n} has full column rank and m \geq n, starting from any initial point x^0. This convergence is linear, with the contraction factor depending on the angles between the row vectors of A; larger angles between successive rows lead to faster progress per iteration. In the worst case, the method requires O(\kappa^2) iterations to achieve a desired accuracy, where \kappa denotes the condition number of A. A proof sketch relies on the orthogonal projection property: each iteration projects the current iterate onto the hyperplane defined by the selected equation, yielding \|x^{k+1} - x^*\| \leq \|x^k - x^*\| in the Euclidean norm, with strict inequality unless x^k already lies on that hyperplane. Over a full sweep through all rows, the error decreases by a factor bounded in terms of the smallest singular value of the row-normalized matrix, ensuring monotonic decrease toward x^*. The method exhibits slow convergence for ill-conditioned systems (large \kappa) or when rows are nearly parallel (small angles), as progress per step becomes minimal. Empirically, multiple full sweeps through the rows are often needed for practical accuracy in such cases.

Randomized Variant

Algorithm Formulation

The randomized Kaczmarz method, introduced by Strohmer and Vershynin in 2009, extends the classical approach by incorporating probabilistic row selection to achieve faster expected convergence for solving linear systems Ax = b, where A \in \mathbb{R}^{m \times n} with m \geq n. This variant selects equations with probabilities proportional to the squared row norms rather than in a fixed sequence, making it particularly effective for large-scale overdetermined systems. At each iteration k, the row index i_k is chosen randomly with probability p_{i_k} = \frac{\|a_{i_k}\|^2}{\|A\|_F^2}. The iterate is then updated via orthogonal projection onto the hyperplane defined by the selected equation a_{i_k}^T x = b_{i_k}: x^{k+1} = x^k + \frac{b_{i_k} - a_{i_k}^T x^k}{\|a_{i_k}\|^2} a_{i_k}. This update rule mirrors the classical Kaczmarz projection but applies it in a random order weighted by row norms; the classical method is recovered as the deterministic special case when rows are cycled sequentially. The algorithm assumes an overdetermined system (m \geq n) that may be consistent or inconsistent due to noise; it also applies to underdetermined cases (m < n) but primarily targets overdetermined ones where multiple equations constrain the solution. Here is the pseudocode for the randomized Kaczmarz method:
Initialize x^0 ∈ ℝ^n arbitrarily (e.g., x^0 = 0)
Compute p_i = ||a_i||^2 / ||A||_F^2 for i = 1, ..., m
While not converged (e.g., until ||Ax^k - b|| < ε or max iterations reached):
    Select i_k randomly with probability p_{i_k}
    x^{k+1} = x^k + (b_{i_k} - a_{i_k}^T x^k) / ||a_{i_k}||^2 * a_{i_k}
    k ← k + 1
Return x^k
This formulation yields faster expected convergence than the deterministic version, with the weighted randomness promoting diverse projections that accelerate progress toward the solution. For inconsistent systems, the iterates approach a neighborhood of the least-squares solution \hat{x} = \arg\min_x \|Ax - b\|_2^2 up to an error determined by the noise level.
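
A minimal NumPy sketch of this pseudocode (names illustrative), sampling rows with probability proportional to their squared norms:

import numpy as np

def randomized_kaczmarz(A, b, num_iters=5000, seed=0):
    """Randomized Kaczmarz with p_i = ||a_i||^2 / ||A||_F^2."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    row_norms_sq = np.einsum('ij,ij->i', A, A)
    p = row_norms_sq / row_norms_sq.sum()  # equals ||a_i||^2 / ||A||_F^2
    for _ in range(num_iters):
        i = rng.choice(m, p=p)  # norm-weighted random row selection
        x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
    return x

# Example: consistent overdetermined system
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 10))
x_true = rng.standard_normal(10)
b = A @ x_true
print(np.linalg.norm(randomized_kaczmarz(A, b) - x_true))  # small error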

Convergence Proof

The convergence of the randomized Kaczmarz method is analyzed in expectation, yielding a linear rate that is exponentially fast under mild conditions on the system matrix. For a consistent overdetermined linear system Ax = b where A \in \mathbb{R}^{m \times n} has full column rank and rows a_i^\top for i = 1, \dots, m, the method generates iterates x^k by randomly selecting an index i_k with probability p_i = \|a_i\|^2 / \|A\|_F^2 and updating via the projection x^{k+1} = x^k + \frac{b_{i_k} - a_{i_k}^\top x^k}{\|a_{i_k}\|^2} a_{i_k}. The following theorem establishes the expected squared distance to the true solution x^*:
\mathbb{E} \left[ \|x^k - x^*\|^2 \right] \leq \left(1 - \frac{1}{\kappa_F^2}\right)^k \|x^0 - x^*\|^2,
where \kappa_F = \|A\|_F / \sigma_{\min}(A) is the scaled condition number of A, with \|A\|_F denoting the Frobenius norm and \sigma_{\min}(A) the smallest singular value of A.

To prove this, condition on the iterate x^k and take the expectation over the random row index i. By the Pythagorean theorem for the orthogonal projection, the update satisfies
\|x^{k+1} - x^*\|^2 = \|x^k - x^*\|^2 - \frac{(a_i^\top (x^k - x^*))^2}{\|a_i\|^2},
so the conditional expectation is
\mathbb{E} \left[ \|x^{k+1} - x^*\|^2 \mid x^k \right] = \|x^k - x^*\|^2 - \sum_{i=1}^m p_i \frac{(a_i^\top (x^k - x^*))^2}{\|a_i\|^2} = \|x^k - x^*\|^2 - \frac{\|A(x^k - x^*)\|^2}{\|A\|_F^2}.
The subtracted term is nonnegative, ensuring nonexpansion in expectation. To bound the contraction, note that \|A(x^k - x^*)\|^2 \geq \sigma_{\min}^2(A) \|x^k - x^*\|^2, yielding a uniform contraction factor 1 - 1/\kappa_F^2. Iterating the expectation then gives the theorem.

For inconsistent systems where no exact solution exists, the randomized Kaczmarz method converges in expectation to a neighborhood of the minimum Euclidean norm solution of the associated least-squares problem \min_x \|Ax - b\|_2^2, with the error bounded by a term proportional to the noise level, assuming the algorithm starts at the origin x^0 = 0. This follows from the method's behavior as a stochastic projection onto the row space, asymptotically approaching the least-squares solution up to noise-induced error. This expected convergence rate offers a key advantage over the deterministic variant, as it is independent of the number of equations m and avoids slowdowns due to near-parallel row directions, achieving exponential decay governed solely by the scaled condition number.
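
The theorem can be probed empirically by averaging the squared error over independent runs and comparing against the predicted factor; a minimal Python sketch (illustrative, assuming a small random test problem):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
x_star = rng.standard_normal(5)
b = A @ x_star

sigma_min = np.linalg.svd(A, compute_uv=False)[-1]
rho = 1 - (sigma_min / np.linalg.norm(A, 'fro')) ** 2  # 1 - 1/kappa_F^2

row_norms_sq = np.einsum('ij,ij->i', A, A)
p = row_norms_sq / row_norms_sq.sum()

k, trials, mean_err = 200, 500, 0.0
for _ in range(trials):
    x = np.zeros(5)
    for _ in range(k):
        i = rng.choice(50, p=p)
        x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
    mean_err += np.linalg.norm(x - x_star) ** 2 / trials

bound = rho ** k * np.linalg.norm(x_star) ** 2  # theorem with x^0 = 0
print(mean_err, bound, mean_err <= bound)  # empirical mean should respect the bound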

Insights and Equivalences

The randomized Kaczmarz method admits several theoretical equivalences that provide deeper insights into its behavior and unify it with broader classes of iterative algorithms. One key perspective views the uniform selection variant as a form of stochastic gradient descent (SGD) on the least-squares objective function \|Ax - b\|_2^2 / 2, where a row a_i of A is selected uniformly at random, and the update step employs a data-dependent step size of 1/\|a_i\|_2^2. This SGD interpretation, established by connecting randomized projections to mini-batch gradient steps, reveals the method's reliance on row-wise residuals as unbiased stochastic gradients and explains its empirical efficiency in high-dimensional settings. The weighted variant discussed here can be seen as an importance-sampled version of this SGD, achieving improved convergence rates.

Further equivalences arise within the sketch-and-project framework, where randomized Kaczmarz corresponds to sketching the rows of A with a random single-row subspace and then projecting onto the solution set of the sketched system. Equivalently, it can be seen as a constrain-and-approximate procedure: first projecting the current iterate onto a randomly selected constraint hyperplane defined by the i-th equation, followed by an approximation step that minimizes the distance to the original point within that constraint. From a random update viewpoint, the iteration aligns directly with the SGD formulation above, emphasizing the role of randomized coordinate selection in variance reduction. Additionally, as a random fixed-point iteration, the update takes the form x^{k+1} = P_{i_k}(x^k), where P_i denotes the orthogonal projection operator onto the i-th hyperplane \{x : a_i^\top x = b_i\}, and i_k is drawn randomly; this casts convergence as approaching the fixed point of the expected projection operator.

These perspectives extend to connections with expectation maximization (EM) algorithms, particularly in regularized variants for ill-posed inverse problems, where randomized Kaczmarz updates mimic EM steps by iteratively maximizing likelihoods over random subsets of constraints. The fixed-point and SGD views, in turn, link randomized Kaczmarz to proximal methods and expectation-based iterations in optimization. Such unifications facilitate comparisons across iterative solvers and motivate accelerations, as variance reduction techniques from SGD—such as importance sampling—directly improve convergence by reducing iteration-to-iteration fluctuations.

Building on these insights, Gower and Richtárik (2015) generalized the method to allow arbitrary sketching measures, including weighted row selection probabilities p_i proportional to \|a_i\|_2^2, enabling tailored convergence rates that adapt to matrix conditioning and further bridging randomized Kaczmarz with weighted stochastic approximations in machine learning. This weighting enhances the method's robustness, as seen in earlier analyses where probabilities scaled by squared row norms yield exponential expected convergence under minimal assumptions. Overall, these equivalences underscore randomized Kaczmarz's versatility, positioning it as a foundational primitive for randomized linear algebra and iterative optimization.
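
The SGD equivalence is easy to verify numerically: a gradient step on the single-row loss f_i(x) = (a_i^T x - b_i)^2 / 2 with step size 1/||a_i||^2 reproduces the Kaczmarz projection exactly. A minimal Python sketch (illustrative):

import numpy as np

rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 4)), rng.standard_normal(20)
x = rng.standard_normal(4)
i = 7  # an arbitrary row index

# Kaczmarz projection onto {x : a_i^T x = b_i}
x_kaczmarz = x + (b[i] - A[i] @ x) / (A[i] @ A[i]) * A[i]

# SGD step on f_i(x) = 0.5 * (a_i^T x - b_i)^2 with step size 1/||a_i||^2;
# the gradient is (a_i^T x - b_i) * a_i
grad = (A[i] @ x - b[i]) * A[i]
x_sgd = x - grad / (A[i] @ A[i])

print(np.allclose(x_kaczmarz, x_sgd))  # True: the two updates coincide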

Advanced Generalizations

Gower-Richtárik Framework

The Gower-Richtárik framework, introduced in 2015, provides a unified generalization of randomized iterative methods, including the randomized Kaczmarz method, to abstract sketch-and-project schemes for solving linear systems of the form Ax = b, where A \in \mathbb{R}^{m \times n} and B \succ 0 is a symmetric positive definite matrix defining the metric for projections. This framework abstracts the projection step by incorporating random sketching matrices S_k drawn from a distribution \mathcal{D}, allowing for broader applicability beyond row selections.

The core algorithm, often referred to as sketch-and-project, operates as follows: at each iteration k, select a random sketch S_k \sim \mathcal{D}; solve the subproblem \arg\min_z \|z - x^k\|_B subject to S_k^T A z = S_k^T b; and update x^{k+1} = x^k + (z - x^k), where z is the solution to the minimization. For the classical randomized Kaczmarz method, this recovers the standard form by setting B = I_n (the identity matrix) and S_k = e_i for a randomly chosen standard basis vector e_i (corresponding to selecting row i of A), yielding the orthogonal projection onto the selected hyperplane.

Under this framework, the method exhibits expected convergence in the B-norm: \mathbb{E}[\|x^{k+1} - x^*\|_B^2] \leq (1 - \mu_k) \|x^k - x^*\|_B^2, where x^* is the solution and \mu_k > 0 depends on the expected properties of the sketches, such as the trace of a certain projection matrix involving A, B, and the distribution \mathcal{D}. For consistent systems (b \in \operatorname{range}(A)), the framework guarantees linear convergence to the solution with rate 1 - \lambda_{\min}(B^{-1} \mathbb{E}[Z]), where Z = A^T S (S^T A B^{-1} A^T S)^\dagger S^T A. In the randomized Kaczmarz case, this specializes to 1 - \lambda_{\min}(A^T A) / \|A\|_F^2 = 1 - 1/\kappa^2, where \kappa = \|A\|_F / \sigma_{\min}(A) is the scaled condition number. The flexibility of the Gower-Richtárik framework enables extensions to weighted projections (via the choice of B), block-coordinate updates (through structured sketches S_k), and even nonlinear constraints by adapting the sketching distribution \mathcal{D}, making it a foundational tool for modern randomized optimization.
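
A generic sketch-and-project step can be written in closed form via a pseudoinverse; the following Python sketch (names illustrative) also confirms that the coordinate sketch S = e_i with B = I reproduces the Kaczmarz update:

import numpy as np

def sketch_and_project(x, A, b, S, B):
    """One step: argmin_z ||z - x||_B subject to S^T A z = S^T b."""
    SA = S.T @ A                         # sketched matrix S^T A
    r = S.T @ (A @ x - b)                # sketched residual
    Binv_SAT = np.linalg.solve(B, SA.T)  # B^{-1} (S^T A)^T
    # z = x - B^{-1}(S^T A)^T (S^T A B^{-1} (S^T A)^T)^+ (S^T A x - S^T b)
    return x - Binv_SAT @ (np.linalg.pinv(SA @ Binv_SAT) @ r)

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)
x = rng.standard_normal(3)
i = 2
S = np.zeros((8, 1)); S[i, 0] = 1.0      # coordinate sketch e_i selects row i
z = sketch_and_project(x, A, b, S, np.eye(3))
x_kaczmarz = x + (b[i] - A[i] @ x) / (A[i] @ A[i]) * A[i]
print(np.allclose(z, x_kaczmarz))  # True: Kaczmarz is the S = e_i, B = I case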

Equivalent Perspectives

The Gower-Richtárik method for solving linear systems admits six equivalent formulations, each providing a distinct viewpoint that unifies various iterative solvers under a common framework. These viewpoints—sketch-and-project, constrain-and-approximate, random intersect, random linear solve, random update, and random fixed-point—all generate the identical update rule for the iterate, as established by Theorem 2.1 in the original analysis, which demonstrates their equivalence through operator formulations and dual optimizations.

The sketch-and-project view interprets the method as projecting the current iterate onto the solution set of a randomly sketched subsystem, minimizing the distance in the norm induced by a positive definite matrix B. Specifically, given a linear system Ax = b with A \in \mathbb{R}^{m \times n}, a random sketching matrix S \in \mathbb{R}^{k \times m} (with k < m) yields the update x^{k+1} = \arg\min_{x \in \mathcal{K}(S)} \|x - x^k\|_B, where \mathcal{K}(S) = \{x : S A x = S b\} is the sketched solution space. This perspective highlights connections to randomized dimensionality reduction techniques.

In the constrain-and-approximate formulation, the method minimizes the distance to the true solution while restricting the search to a randomly chosen affine set. It solves x^{k+1} = \arg\min_x \|x - x^*\|_B \quad \text{subject to} \quad x = x^k + B^{-1} A^T S^T y for some y, which approximates the full system's constraints by restricting to an affine subspace parallel to the sketched directions. This view emphasizes duality.

The random intersect perspective views the update as finding the unique intersection point of two randomly chosen affine subspaces: one passing through the current iterate, and another containing the solution set of the full system. Convergence arises from the contraction properties of the associated projections onto these spaces.

Under the random linear solve interpretation, each step solves a smaller random linear subsystem S A x = S b and uses its solution to update via a pseudoinverse, reducing to x^{k+1} = x^k - B^{-1} A^T S^T (S A B^{-1} A^T S^T)^{-1} (S A x^k - S b). This framing connects to direct methods for subsystems.

The random update viewpoint treats the method as a stochastic gradient-like step in a randomized direction, with the update expressed as x^{k+1} = x^k + \alpha^k d^k, where d^k is a direction derived from the residual and the B-norm. It aligns with stochastic optimization by interpreting the step as minimizing a local model.

Finally, the random fixed-point formulation casts the iteration as a contraction mapping in the B-norm toward the solution, with the operator T_S(x) = x - B^{-1} A^T S^T (S A B^{-1} A^T S^T)^{-1} (S A x - S b) satisfying \|T_S(x) - x^*\|_B \leq \rho_S \|x - x^*\|_B for some contraction factor \rho_S < 1. This enables fixed-point theory for convergence analysis.

These equivalences are proven by showing that each formulation yields the same explicit update expression through algebraic manipulations, often leveraging properties of Moore-Penrose pseudoinverses to align the projections and solves. The multiplicity of views facilitates diverse proofs, such as those using fixed-point contractions or duality, allowing tailored analyses for different sketching distributions. In the context of the classical Kaczmarz method, which corresponds to uniform row sampling (S selects a single row of A) and B = I, each of these perspectives recovers the standard orthogonal projection onto the hyperplane defined by the selected equation a_i^T x = b_i, yielding the update x^{k+1} = x^k - \frac{a_i^T x^k - b_i}{\|a_i\|^2} a_i.
This unification underscores how the Gower-Richtárik framework generalizes Kaczmarz while preserving its geometric essence across all views.
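
As a concrete check of these equivalences in the case B = I (a sketch under that assumption), the explicit random-linear-solve formula and the direct orthogonal projection onto the sketched solution set produce identical iterates:

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((10, 4))
b = rng.standard_normal(10)
x = rng.standard_normal(4)
S = rng.standard_normal((2, 10))  # random two-row sketch, B = I

SA, Sb = S @ A, S @ b
# Random linear solve view: x+ = x - (SA)^T (SA (SA)^T)^{-1} (SA x - Sb)
x_solve = x - SA.T @ np.linalg.solve(SA @ SA.T, SA @ x - Sb)
# Sketch-and-project view: orthogonal projection of x onto {z : SA z = Sb}
x_proj = x + np.linalg.pinv(SA) @ (Sb - SA @ x)
print(np.allclose(x_solve, x_proj))  # True: the two formulations coincide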

Special Cases

One prominent special case within the Gower-Richtárik framework arises when the matrix A is square and the sketching operator selects single coordinates, which recovers the randomized coordinate descent (CD) method applied to the least-squares objective f(x) = \frac{1}{2} \|Ax - b\|^2. In this setup, the update minimizes f(x) along a randomly chosen coordinate direction, with probabilities uniform over the n coordinates.

Another specialization links the framework to the expectation-maximization (EM) algorithm in statistical models with missing data. The randomized Kaczmarz method, interpreted as a sketch-and-project update, aligns with EM iterations for imputing missing observations in linear models under Poisson likelihoods or related noise models, particularly in applications like tomographic reconstruction where subsets of equations represent partial data. This connection arises because EM's E-step estimates missing values via conditional expectations, while the M-step maximizes the complete-data likelihood, mirroring the cyclic projections in Kaczmarz-type methods for ill-posed systems.

Further cases include weighted variants of the Kaczmarz method, where row selection probabilities are set to p_i = \|a_i\|^2 / \|A\|_F^2, achieving optimal convergence rates by emphasizing rows with larger norms. Block versions emerge as multi-row sketches, where the sketching matrix selects a random subset of rows, leading to updates that solve smaller least-squares subproblems over those blocks.

Regarding convergence, the coordinate descent case exhibits a rate of 1 - 1/(n \kappa^2), with the expected decrease in the weighted error satisfying \mathbb{E}[\|x_{k+1} - x^*\|_{A^\top A}^2] \leq (1 - \lambda_{\min}(A^\top A)/\|A\|_F^2) \|x_k - x^*\|_{A^\top A}^2, where \kappa is the condition number of A and n is the dimension. For EM interpretations, convergence relies on probabilistic models, ensuring monotonicity in the likelihood under suitable regularity conditions for missing data setups. These specializations explain the acceleration of Kaczmarz methods in systems with sparse or well-conditioned rows, as the effective contraction factor \rho tightens when the sketches better capture the geometry of the solution set.
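
The coordinate descent special case admits the same one-line update; the following Python sketch (illustrative) performs randomized CD with exact line search on f(x) = ||Ax - b||^2 / 2, maintaining the residual incrementally:

import numpy as np

def randomized_cd_least_squares(A, b, num_iters=4000, seed=0):
    """Randomized coordinate descent on f(x) = 0.5 * ||Ax - b||^2."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    r = A @ x - b  # maintained residual Ax - b
    col_norms_sq = np.einsum('ij,ij->j', A, A)
    for _ in range(num_iters):
        j = rng.integers(n)  # uniform coordinate selection
        # Exact line search along e_j: minimize f(x + t e_j) over t
        t = -(A[:, j] @ r) / col_norms_sq[j]
        x[j] += t
        r += t * A[:, j]  # update the residual in O(m)
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 6))
b = rng.standard_normal(30)
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(randomized_cd_least_squares(A, b) - x_ls))  # small error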

Extensions and Applications

Block and Adaptive Methods

The block Kaczmarz method extends the classical approach by selecting multiple rows simultaneously in each iteration, projecting the iterate onto the intersection of the corresponding hyperplanes to accelerate convergence for large-scale systems. In this variant, a block of b rows, denoted A_B and b_B, is chosen, and the update is given by x^{k+1} = \arg\min_{z} \|z - x^k\|_2^2 \quad \text{subject to} \quad A_B z = b_B, which simplifies to x^{k+1} = x^k + A_B^+ (b_B - A_B x^k), where A_B^+ is the Moore-Penrose pseudoinverse of A_B. This formulation allows for parallel computation within blocks, making it suitable for distributed systems, though the choice of block size b influences the trade-off between computational cost and progress per step.

Block selection strategies vary between random and greedy approaches to balance exploration and exploitation of residuals. In the randomized block Kaczmarz method, blocks are selected uniformly at random or with probabilities proportional to the squared norms of block residuals, enabling expected linear rates that depend on the minimum singular value of A and the block partitioning overlap. For greedy selection, the block with the largest aggregate residual norm, \|A_B x^k - b_B\|_2, is chosen deterministically, which empirically speeds up convergence for ill-conditioned systems by prioritizing informative constraints. The following outlines a generic block Kaczmarz iteration with greedy selection:
Input: A ∈ ℝ^{m×n}, b ∈ ℝ^m, x^0 ∈ ℝ^n, block size b
For k = 0, 1, 2, ...
    Select block indices B_k ⊂ {1,...,m} with |B_k| = b maximizing ||A_{B_k} x^k - b_{B_k}||_2
    Compute r_B = b_{B_k} - A_{B_k} x^k
    x^{k+1} = x^k + A_{B_k}^+ r_B
Output: x^K after K iterations
This greedy variant avoids pseudoinverse computation in some implementations by using row averaging or relaxation, reducing overhead for sparse matrices.

Adaptive deterministic variants further refine this by dynamically adjusting block size or selection order based on current residual norms, incorporating momentum to dampen oscillations and enhance progress. For instance, the momentum-augmented adaptive deterministic block Kaczmarz (mADBK) method selects blocks via a residual-norm criterion and updates with a heavy-ball step: x^{k+1} = x^k + \alpha (A_{B_k}^+ r_{B_k}) + \beta (x^k - x^{k-1}), where \alpha and \beta are step and momentum parameters tuned adaptively. Such adaptations ensure linear convergence to the least-norm solution for consistent systems without fixed partitioning, with rates improved by up to 20-30% over static greedy blocks in numerical tests on large sparse matrices.

Convergence analysis for block methods yields sublinear rates for cyclic deterministic selections but linear rates in expectation for randomized versions, with the contraction factor 1 - c / \kappa^2 where c scales with block overlap and \kappa is the condition number; larger blocks reduce iterations but increase per-step cost. These rates outperform single-row Kaczmarz (the case b = 1) when rows are correlated, as blocks capture joint constraints more efficiently. In the extreme case of full blocks (b = m), the method solves the system in one iteration by computing the projection onto the full solution space, equivalent to solving the normal equations A^T A x = A^T b for consistent overdetermined systems.
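
The greedy block pseudocode above maps to NumPy with a least-squares solve standing in for the explicit pseudoinverse; a minimal sketch (illustrative):

import numpy as np

def greedy_block_kaczmarz(A, b, block_size=4, num_iters=300):
    """Greedy block Kaczmarz: project onto the rows with largest residuals."""
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(num_iters):
        r = b - A @ x
        # Greedy selection: the size-b block maximizing ||r_B||_2 consists of
        # the rows with the largest absolute residuals
        B_k = np.argsort(np.abs(r))[-block_size:]
        # x <- x + A_B^+ r_B, computed via a min-norm least-squares solve
        x += np.linalg.lstsq(A[B_k], r[B_k], rcond=None)[0]
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((200, 20))
x_true = rng.standard_normal(20)
b = A @ x_true
print(np.linalg.norm(greedy_block_kaczmarz(A, b) - x_true))  # small error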

Bregman and Weighted Variants

The Bregman-Kaczmarz method extends the classical Kaczmarz algorithm by replacing orthogonal projections with projections defined via Bregman distances, enabling the handling of non-Euclidean geometries in optimization problems. The Bregman distance associated with a strongly convex function F is given by D_F(x, y) = F(x) - F(y) - \langle \nabla F(y), x - y \rangle, which generalizes the squared Euclidean distance recovered when F(x) = \frac{1}{2} \|x\|^2. This formulation is particularly useful for problems involving probability distributions, where the Kullback-Leibler (KL) divergence arises by choosing F(x) = \sum_i x_i \log x_i, allowing projections onto simplices or other non-Euclidean constraint sets while preserving sparsity or positivity.

A recent accelerated variant, the block accelerated randomized Bregman-Kaczmarz (ARBK) method from 2024, employs block-randomized updates that minimize the Bregman distance D_F(z, x^k) subject to a subset of linear constraints, incorporating acceleration parameters and a restart mechanism to stabilize iterations. In each step, a random block of rows is selected with prescribed probabilities, and the update is computed in the dual space for efficiency, making it suitable for large-scale linearly constrained minimization of strongly convex functions. This approach builds on the Gower-Richtárik framework for weighted randomized iterations as a foundational tool for incorporating row norms in selection probabilities.

Complementing these divergence-based extensions, the weighted average fast block Kaczmarz (WAFBK) method, proposed in 2024, addresses overdetermined linear systems by integrating greedy row selection—based on residual distances—with weighted averages of multiple projections to accelerate convergence without pseudoinverses. The algorithm selects rows using one of four weighting schemes (uniform, non-uniform, residual-based, or distance-based) and updates via a weighted combination that balances speed and robustness, outperforming prior greedy block methods in numerical tests on large consistent systems.

Convergence analyses for these variants demonstrate significant improvements over standard rates. The ARBK method achieves accelerated sublinear convergence of O(1/k^2) in Bregman distance for general cases and linear rates under the Polyak-Łojasiewicz condition and regularity assumptions on the blocks, with the restart scheme ensuring last-iterate guarantees. Similarly, WAFBK exhibits linear convergence to the minimum-norm solution, with error bounds scaling as \|x_{k+1} - x^*\|_2^2 \leq \left(1 - \theta \frac{\sigma_{\min}^2(A)}{m \|A\|_F^2} \frac{\|A_{\mathcal{I}_k}\|_F^2}{\sigma_{\max}^2(A_{\mathcal{I}_k})}\right) \|x_k - x^*\|_2^2 for uniform weighting, where \theta captures the greedy selection efficacy. These results hold in smooth interpolation regimes, where the constraints tightly interpolate the objective.

Recent developments in 2025 further advance adaptive block variants within row-action frameworks, emphasizing applications in machine learning such as learning on infinite-dimensional spaces and non-orthogonal expansions. Surveys of row-action methods highlight how these adaptive Bregman and weighted extensions enhance scalability for ML tasks like sparse recovery and tensor regression, providing explicit regret bounds under noisy data.
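
As a concrete instance of the Bregman geometry, choosing F(x) = λ||x||_1 + ||x||^2/2 yields the sparse (Bregman) Kaczmarz method, which runs the usual Kaczmarz step on a dual variable and soft-thresholds to return to the primal; the following Python sketch illustrates that special case (it is not the ARBK or WAFBK algorithm itself):

import numpy as np

def soft_threshold(z, lam):
    """nabla F^* for F(x) = lam*||x||_1 + 0.5*||x||_2^2 (soft thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def sparse_kaczmarz(A, b, lam=0.5, num_iters=20000, seed=0):
    """Randomized sparse Kaczmarz: Kaczmarz step in the dual, then threshold."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    z = np.zeros(n)                 # dual variable
    x = soft_threshold(z, lam)      # primal iterate x = nabla F^*(z)
    row_norms_sq = np.einsum('ij,ij->i', A, A)
    p = row_norms_sq / row_norms_sq.sum()
    for _ in range(num_iters):
        i = rng.choice(m, p=p)
        z += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]  # dual Kaczmarz step
        x = soft_threshold(z, lam)                        # back to the primal
    return x

rng = np.random.default_rng(4)
A = rng.standard_normal((60, 100))  # underdetermined system
x_true = np.zeros(100)
x_true[:5] = rng.standard_normal(5)  # sparse generating signal
b = A @ x_true
# Iterates approach the solution of min lam*||x||_1 + 0.5*||x||^2 s.t. Ax = b,
# which is close to the sparse generator for suitably sparse problems
print(np.linalg.norm(sparse_kaczmarz(A, b) - x_true))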

Practical Applications

The Kaczmarz method, particularly through its implementation as the Algebraic Reconstruction Technique (ART), plays a central role in computed tomography (CT) imaging by iteratively solving systems of linear equations derived from sparse projections to reconstruct cross-sectional images. ART initializes a crude reconstruction and applies data-driven relaxation parameters to enhance convergence, enabling high-quality images from limited projections in sparse-view frameworks, which reduces radiation dose in medical scans. In magnetic resonance imaging (MRI), a phase-constrained variant of Kaczmarz's algebraic reconstruction algorithm processes oversampled k-space data to mitigate phase errors and Gibbs ringing, achieving accelerated scans (e.g., 2x undersampling) without explicit regularization while preserving image fidelity for Cartesian and spiral trajectories.

In signal processing, the randomized Kaczmarz method addresses underdetermined systems in compressed sensing by enabling fast sparse signal recovery through iterative projections onto randomly selected hyperplanes. Variants like Kaczmarz-based Iterative Hard Thresholding (KZIHT) combine projection steps with thresholding to achieve linear convergence for bounded orthonormal systems and sub-Gaussian matrices, recovering sparse signals with optimal statistical bias under minimal sampling requirements (e.g., m ≥ C s log⁴ N). The sparse randomized Kaczmarz algorithm further supports support recovery in multiple measurement vector (MMV) problems with corruptions, using binning for robust support estimation in streaming settings, as demonstrated in applications like hyperspectral imaging.

Within machine learning, the Kaczmarz method serves as a row-action optimizer for least-squares problems, equivalent to stochastic gradient descent (SGD) with stepwise-optimal stepsizes in the regime of overparameterized models. Recent analyses link it to continual learning, where block Kaczmarz iterations model sequential task training, yielding universal forgetting rates of O(1/√k) for random orderings independent of dimensionality, improving upon prior SGD bounds. In overparameterized settings, such as neural tangent kernels, last-iterate convergence reaches O(1/√T) with η = 1/β, explaining efficient training in high-dimensional linear models without explicit regularization.

Beyond these domains, the method applies to inverse problems in geophysics, such as seismic tomography for volcano monitoring, where distributed randomized Kaczmarz solves large-scale sparse least-squares systems from travel-time data across sensor networks, achieving accuracy comparable to centralized solvers with reduced communication overhead. Parallel implementations on graphics processing units (GPUs) accelerate Kaczmarz for large-scale dense systems (e.g., n up to 15,000), outperforming CPU versions by factors of nearly 2x per iteration through vectorized dot products, though benefits diminish for sparse matrices due to data transfer costs. Empirical studies highlight performance advantages in high-dimensional settings, where randomized Kaczmarz variants converge faster than conjugate gradient methods for consistent overdetermined sparse systems, often reducing computation time significantly in imaging and signal recovery benchmarks.

References

  [1] Kaczmarz algorithm [PDF]. Repozytorium PK.
  [2] A Review of the Kaczmarz Method [PDF].
  [3] Angenäherte Auflösung von Systemen linearer Gleichungen [PDF] (English translation of Kaczmarz's original article).
  [4] A randomized Kaczmarz algorithm with exponential convergence [PDF].
  [5] Survey of a class of iterative row-action methods: The Kaczmarz ... [PDF], April 2024.
  [6] Stefan Kaczmarz (1895–1939), Biography, MacTutor.
  [7] Algebraic Reconstruction Techniques (ART) for three-dimensional ...
  [8] Row-Action Methods for Huge and Sparse Systems and Their ...
  [9] A Randomized Kaczmarz Algorithm with Exponential Convergence, April 2008.
  [10] Topics in Numerical Analysis II: Computational Inverse Problems [PDF], October 2023.
  [11] Projection onto convex sets [PDF], angms.science, March 2025.
  [12] Remarks on Kaczmarz Algorithm for Solving Consistent and ..., NIH.
  [13] Kaczmarz Algorithm, row action methods, and statistical learning ... [PDF].
  [15] A randomized Kaczmarz algorithm with exponential convergence, February 2007.
  [16] Randomized Kaczmarz solver for noisy linear systems, April 2010.
  [17] Stochastic Gradient Descent, Weighted Sampling, and ..., arXiv:1310.5715, October 2013.
  [18] Randomized Iterative Methods for Linear Systems, arXiv:1506.03296, June 2015.
  [19] On regularization methods of EM-Kaczmarz type, arXiv:0810.3619, October 2008.
  [20] Randomized Iterative Methods for Linear Systems (journal version; cites Kaczmarz, Angenäherte Auflösung von Systemen linearer Gleichungen, 1937).
  [21] Analysis of a randomized block Kaczmarz method [PDF], Joel A. Tropp.
  [22] A greedy block Kaczmarz algorithm for solving large-scale linear ...
  [23] On the adaptive deterministic block Kaczmarz method with ..., March 2025.
  [28] Adaptive Adjustment of Relaxation Parameters for Algebraic ... [PDF], arXiv.
  [29] Model-driven reconstruction with phase-constrained highly ... [PDF], arXiv, December 2020.
  [30] arXiv:2304.10123 [stat.ML], April 2023 (accelerated IHT based on the Kaczmarz method, KZIHT).
  [31] Sparse Randomized Kaczmarz for Support Recovery of Jointly ... [PDF], June 2018.
  [32] Fast Last-Iterate Convergence of SGD in the Smooth Interpolation ... [PDF], July 2025.
  [34] Distributed Randomized Kaczmarz and Applications to Seismic ..., September 2015.
  [35] GPU computing with Kaczmarz's and other iterative algorithms ..., NIH.