Aitken's delta-squared process
Aitken's delta-squared process, also known as Aitken extrapolation, is a method in numerical analysis for accelerating the convergence of a linearly convergent sequence by extrapolating an improved approximation to its limit from three consecutive terms using second-order differences. It was introduced by the New Zealand mathematician Alexander Craig Aitken in his 1926 paper on Bernoulli's method for solving algebraic equations, although an early form was known to the Japanese mathematician Seki Takakazu in the late 17th century. The process assumes the sequence exhibits geometric convergence with an asymptotically constant error ratio, which allows the leading error term to be eliminated and produces a faster-converging sequence.[1][2]
The core formula of the process, applied to a sequence \{s_n\} converging to a limit q, computes the accelerated term as
\hat{s}_n = s_n - \frac{(s_{n+1} - s_n)^2}{s_{n+2} - 2s_{n+1} + s_n},
where the numerator is the square of the first forward difference \Delta s_n = s_{n+1} - s_n, and the denominator is the second difference \Delta^2 s_n = s_{n+2} - 2s_{n+1} + s_n. The construction relies on the asymptotic error model e_n = s_n - q \approx \lambda e_{n-1} for some constant |\lambda| < 1; under this condition the error in \hat{s}_n is asymptotically smaller than in s_n, in the sense that (\hat{s}_n - q)/(s_n - q) \to 0. For sequences whose error is exactly geometric, the transformation eliminates the error completely and returns the limit q.[3]
Aitken's process finds broad applications in numerical computations, including the acceleration of partial sums in slowly converging infinite series, iterative fixed-point solvers such as those for root-finding (e.g., enhancing the cosine fixed-point iteration for the Dottie number), and the refinement of approximations in expectation-maximization algorithms for statistical models. It serves as a foundational technique in sequence transformation theory, influencing later methods like the Shanks transformation and Wynn's epsilon algorithm, and remains valuable for its simplicity and effectiveness when higher-order information about the error is unavailable. Extensions address cases with alternating error signs or paired opposite signs, broadening its utility in practical iterative schemes while preserving the core acceleration principle.[4]
Introduction
Definition
Aitken's delta-squared process, also known as Aitken extrapolation, is an extrapolation technique designed to accelerate the convergence of linearly convergent sequences to their limit, provided the limit exists and the error terms behave asymptotically like a geometric series.[5]
For a given sequence \{s_n\}, the Aitken transform generates a new sequence \{A_n\} using the standard formula
A_n = s_n - \frac{(s_{n+1} - s_n)^2}{s_{n+2} - 2s_{n+1} + s_n},
which computes each term based on three consecutive elements of the original sequence.[3]
Intuitively, the method eliminates the leading-order error term by fitting a geometric error model to the three terms and extrapolating it to its limit, thereby approaching the limit more rapidly than the original sequence.[5]
The process applies to sequences converging to a limit L where the error e_n = s_n - L satisfies e_n \approx c r^n for some constants c \neq 0 and |r| < 1, ensuring linear convergence that can be improved upon.[3]
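A minimal sketch of the transform as defined above, written in Python for illustration (the helper name aitken and the geometric test sequence are illustrative choices, not part of any standard library):

```python
def aitken(s):
    """Apply A_n = s_n - (s_{n+1} - s_n)^2 / (s_{n+2} - 2 s_{n+1} + s_n) to a list of terms."""
    out = []
    for n in range(len(s) - 2):
        denom = s[n + 2] - 2 * s[n + 1] + s[n]
        if denom == 0:                      # formula undefined; keep the original term
            out.append(s[n])
        else:
            out.append(s[n] - (s[n + 1] - s[n]) ** 2 / denom)
    return out

# Example: s_n = 2 + 3 * 0.5**n converges linearly to L = 2 with ratio r = 0.5.
s = [2 + 3 * 0.5 ** n for n in range(10)]
print(aitken(s)[:3])   # each transformed term equals 2.0 up to rounding
```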
Historical development
Aitken's delta-squared process was invented by the New Zealand mathematician Alexander Aitken in 1926, during his doctoral research at the University of Edinburgh.[6] As part of his Ph.D. thesis on curve fitting that incorporated statistical errors, Aitken developed the method to accelerate the convergence of sequences generated by iterative numerical processes.[6] He introduced it in the paper "On Bernoulli's Numerical Solution of Algebraic Equations," published in the Proceedings of the Royal Society of Edinburgh, where it served as an extension of Daniel Bernoulli's 1732 approach to approximating the roots of algebraic equations through successive approximations.[1]
The process emerged from Aitken's broader work in numerical analysis and statistics, building on foundational techniques such as Newton's divided difference interpolation for handling progressive linear approximations in computational settings.[6] Although an early precursor to the method appeared in the work of the Japanese mathematician Seki Kōwa in the late 17th century, Aitken's formulation provided a systematic, modern framework for sequence acceleration, particularly suited to problems in statistical computation where rapid convergence was essential.[7]
In the 1930s, the method found early adoption in statistical applications, notably through Aitken's collaboration with R.A. Fisher at the Rothamsted Experimental Station.[6] Its utility in numerical analysis gained wider recognition among statisticians in subsequent decades.[8]
Mathematical foundations
Forward difference operator
The forward difference operator, denoted by \Delta, is a fundamental tool in discrete calculus for analyzing sequences \{s_n\}. It is defined for a sequence \{s_n\} as
\Delta s_n = s_{n+1} - s_n,
measuring the first-order discrete change between consecutive terms.[9]
Higher-order forward differences are defined recursively: The k-th order difference \Delta^k s_n is obtained by applying the operator \Delta to the (k-1)-th order difference, so \Delta^k s_n = \Delta (\Delta^{k-1} s_n), with \Delta^0 s_n = s_n as the base case.[9] This recursive structure allows computation of differences of any order, analogous to higher derivatives in continuous analysis.[10]
The second-order forward difference, particularly relevant in acceleration methods, expands explicitly as
\Delta^2 s_n = \Delta (\Delta s_n) = s_{n+2} - 2 s_{n+1} + s_n.
This can be computed directly from three consecutive sequence terms without intermediate steps, providing a discrete analog to the second derivative.[9] More generally, the k-th order difference follows the binomial form \Delta^k s_n = \sum_{m=0}^k (-1)^{k-m} \binom{k}{m} s_{n+m}.[9]
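As a brief illustration of these definitions, the sketch below computes first and second forward differences with NumPy's numpy.diff and checks the binomial form for k = 2 (the sample values are arbitrary):

```python
import numpy as np
from math import comb

s = np.array([1.0, 0.5, 0.833333, 0.583333, 0.783333])  # arbitrary sample terms

d1 = np.diff(s)        # first forward differences  s_{n+1} - s_n
d2 = np.diff(s, n=2)   # second forward differences s_{n+2} - 2*s_{n+1} + s_n

# Binomial form of the k-th difference, checked against np.diff for k = 2, n = 0:
k, n = 2, 0
binomial = sum((-1) ** (k - m) * comb(k, m) * s[n + m] for m in range(k + 1))
print(d2[0], binomial)  # the two values agree
```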
The forward difference operator exhibits several key properties that facilitate its use in numerical analysis. It is linear, satisfying \Delta (a s_n + b t_n) = a \Delta s_n + b \Delta t_n for scalars a, b and sequences \{s_n\}, \{t_n\}, which enables its application to linear combinations of sequences.[10] Additionally, it is shift-invariant in the sense that it commutes with the forward shift operator E, where E s_n = s_{n+1}, yielding \Delta E = E \Delta.[10] These properties mirror those of the differentiation operator in continuous calculus, where forward differences approximate derivatives via \Delta s_n / h \approx s'(n h) for small step size h.[9]
In the context of sequences, the forward difference operator quantifies discrete rates of change, helping to identify patterns such as linear convergence, where s_{n+1} - \sigma \approx \rho (s_n - \sigma) for limit \sigma and constant \rho with |\rho| < 1, so that \Delta s_n \approx (\rho - 1)(s_n - \sigma).[11] This makes it essential for detecting and exploiting convergence behaviors in iterative processes.
Notationally, \Delta specifically denotes the forward difference, distinguishing it from the backward difference \nabla s_n = s_n - s_{n-1} or the central difference \delta s_n = s_{n+1/2} - s_{n-1/2}, each suited to different approximation needs in discrete analysis.[9] The second-order forward difference plays a key role in Aitken's delta-squared process for accelerating such sequences.[11]
The Aitken transformation formula arises from the assumption that a slowly converging sequence \{s_n\} approaches its limit L with errors dominated by a geometric term, such that the error e_n = L - s_n \approx c r^n for some constants c \neq 0 and |r| < 1. Under this model, the first forward difference is \Delta s_n = s_{n+1} - s_n \approx c r^n (1 - r), and the second forward difference is \Delta^2 s_n = s_{n+2} - 2 s_{n+1} + s_n \approx -c r^n (1 - r)^2. Substituting these into the transformation yields the exact limit: A_n = s_n - \frac{(\Delta s_n)^2}{\Delta^2 s_n} = s_n - \frac{[c r^n (1 - r)]^2}{-c r^n (1 - r)^2} = s_n - (-c r^n) = s_n + c r^n = L.[12]
To derive the explicit formula without the model constants, substitute the differences directly:
\Delta s_n = s_{n+1} - s_n, \quad \Delta^2 s_n = (s_{n+2} - s_{n+1}) - (s_{n+1} - s_n) = s_{n+2} - 2 s_{n+1} + s_n.
Thus,
A_n = s_n - \frac{(s_{n+1} - s_n)^2}{s_{n+2} - 2 s_{n+1} + s_n} = \frac{s_n s_{n+2} - s_{n+1}^2}{s_{n+2} - 2 s_{n+1} + s_n},
where the equivalent fractional form follows by clearing the denominator:
A_n (s_{n+2} - 2 s_{n+1} + s_n) = s_n (s_{n+2} - 2 s_{n+1} + s_n) - (s_{n+1} - s_n)^2 = s_n s_{n+2} - s_{n+1}^2.
This formula computes an accelerated approximation A_n using three consecutive terms s_n, s_{n+1}, s_{n+2}.[12]
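As a quick symbolic check of the algebra above, the subtractive and fractional forms of A_n can be compared with SymPy (a sketch assuming SymPy is available; the symbol names are arbitrary):

```python
import sympy as sp

s0, s1, s2 = sp.symbols('s0 s1 s2')  # stand-ins for s_n, s_{n+1}, s_{n+2}

subtractive = s0 - (s1 - s0) ** 2 / (s2 - 2 * s1 + s0)
fractional = (s0 * s2 - s1 ** 2) / (s2 - 2 * s1 + s0)

print(sp.simplify(subtractive - fractional))  # prints 0: the two forms are identical
```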
An alternative derivation views the formula as the limit L that satisfies the geometric error ratio across the three terms: s_{n+1} - L = r (s_n - L) and s_{n+2} - L = r (s_{n+1} - L). Equating the ratios gives (s_{n+1} - L)^2 = (s_n - L)(s_{n+2} - L), which expands and solves to the same expression for L = A_n. This perspective aligns with the forward difference operator, where \Delta^2 s_n measures the deviation from linearity in the error.[12]
The formula can also be obtained by linear extrapolation in the differences. Under the geometric error model, s_k - L is proportional to \Delta s_k, so the points (\Delta s_n, s_n) and (\Delta s_{n+1}, s_{n+1}) lie approximately on a straight line whose value at \Delta s = 0 is the limit L. The secant through these two points has slope
\frac{s_{n+1} - s_n}{\Delta s_{n+1} - \Delta s_n} = \frac{\Delta s_n}{\Delta^2 s_n},
and evaluating the line at \Delta s = 0 yields
L \approx s_n - \frac{\Delta s_n}{\Delta^2 s_n}\,\Delta s_n = s_n - \frac{(\Delta s_n)^2}{\Delta^2 s_n} = A_n,
recovering the difference formula above.
If the denominator \Delta^2 s_n = 0, the three terms s_n, s_{n+1}, s_{n+2} are in arithmetic progression and the formula is undefined, so no acceleration is possible at that index; in practice, one sets A_n = s_{n+1} (or retains s_n) to avoid division by zero and continues with the original sequence.
For higher-order acceleration, apply the transformation recursively to the sequence of A_n. Define A_n^{(0)} = s_n, then compute A_n^{(1)} = A_n using the formula on \{A_k^{(0)}\}, and iterate to A_n^{(k)} on \{A_m^{(k-1)}\}. This iterated \Delta^2 process is closely related to the Shanks transformation table, with each level able to eliminate a further error component, though numerical stability decreases with deeper iterations.[13]
In terms of error analysis, if the original error satisfies e_n = c r^n + O(r^{2n}) with 0 < |r| < 1, the transformed error is e_n^{(1)} = A_n - L = O(r^{2n}), asymptotically squaring the convergence factor by removing the leading geometric term; further iterations can remove successive terms of the error expansion, assuming such an expansion exists.[13]
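A short sketch of this iterated application (the two-component error model below is chosen only to exhibit the removal of successive geometric terms; the helper is repeated so the snippet is self-contained):

```python
def aitken(s):
    """One delta-squared pass over a list of terms."""
    out = []
    for n in range(len(s) - 2):
        denom = s[n + 2] - 2 * s[n + 1] + s[n]
        out.append(s[n] if denom == 0 else s[n] - (s[n + 1] - s[n]) ** 2 / denom)
    return out

# Error with two geometric components: the first pass removes the 0.5**n term,
# a second pass attacks the remaining 0.2**n component.
L = 1.0
s = [L + 3 * 0.5 ** n + 2 * 0.2 ** n for n in range(20)]

level1 = aitken(s)        # A_n^(1)
level2 = aitken(level1)   # A_n^(2)

for name, seq in (("s", s), ("A^(1)", level1), ("A^(2)", level2)):
    print(name, abs(seq[10] - L))   # the error shrinks sharply at each level
```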
Theoretical properties
Acceleration mechanism
Aitken's delta-squared process accelerates the convergence of sequences by eliminating the dominant geometric error term through the use of second-order forward differences, effectively squaring the convergence factor for linearly convergent sequences with a constant error ratio. This mechanism assumes that the error in successive terms follows a geometric progression, allowing the process to approximate the limit more rapidly by correcting for the linear component of the error without requiring knowledge of the underlying iteration function. As a result, the asymptotic error is reduced from O(r^n) to O(r^{2n}), squaring the convergence factor, where r is the common ratio of the error terms, provided |r| < 1.[14]
To illustrate this theoretically, consider a sequence satisfying s_n = L + c r^n + O(r^{2n}), where L is the limit, c \neq 0, and |r| < 1. The second differences capture the ratio r implicitly, and applying the process yields an accelerated sequence A_n = L + O(r^{2n}). This derivation parallels a discrete Taylor expansion, in which the leading term is subtracted based on the ratio estimated from consecutive differences, leaving only higher-order residuals. The proof proceeds by solving the system of error equations for three consecutive terms to isolate the limit, confirming the squared error reduction under the geometric assumption.[14]
The process succeeds when the original sequence approaches the limit with an asymptotically constant error ratio r, ensuring the second differences remain stable and non-zero; the errors may alternate in sign (negative r), as in the partial sums of alternating series. Specifically, the sequence must exhibit linear convergence without rapid variations in the ratio, as such deviations can destabilize the extrapolation. Aitken's method is the order-one special case (e_1) of the Shanks transformation and corresponds to the \epsilon_2 column of Wynn's \epsilon-algorithm, which generalizes it to higher-order accelerations for more complex error structures. However, it fails when the error ratio itself varies erratically and for sequences with superlinear convergence, as the second differences may vanish or become inconsistent (detailed in the section on convergence conditions).[14]
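A small sketch that makes the squared convergence factor visible numerically (the error model L + c r^n + d r^{2n} and its constants are arbitrary choices satisfying the geometric assumption with a higher-order correction):

```python
# Observe that A_n - L behaves like r**(2n) when s_n - L = c*r**n + d*r**(2*n).
L, r, c, d = 1.0, 0.7, 2.0, 5.0
s = [L + c * r ** n + d * r ** (2 * n) for n in range(30)]

def aitken_term(s, n):
    """Single accelerated value from the triple s_n, s_{n+1}, s_{n+2}."""
    denom = s[n + 2] - 2 * s[n + 1] + s[n]
    return s[n] - (s[n + 1] - s[n]) ** 2 / denom

for n in (10, 15, 20):
    e_orig = abs(s[n] - L)               # decays like r**n
    e_acc = abs(aitken_term(s, n) - L)   # decays like r**(2n)
    print(n, e_orig, e_acc, e_acc / r ** (2 * n))  # last column is roughly constant
```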
Convergence conditions
Aitken's delta-squared process requires that the input sequence \{s_n\} converges to a limit L, with the asymptotic error satisfying e_n = s_n - L \sim c \rho^n for some constant c \neq 0 and 0 < |\rho| < 1.[15] This linear convergence assumption ensures the transformed sequence \{A_n\} also converges to L, typically at a faster rate.[8] If the error ratios e_{n+1}/e_n approach a limit \lambda with |\lambda| < 1, the process eliminates the leading error term, so that (A_n - L)/(s_n - L) \to 0 and the accelerated sequence converges faster than the original.[16]
Sufficient conditions for guaranteed acceleration include the positivity of second differences \Delta^2 s_n = s_{n+2} - 2s_{n+1} + s_n > 0 in monotonic sequences, which preserves the convergence order while reducing the error constant. For power series summation, the method succeeds under analytic continuation assumptions, where the generating function is analytic in a sector of the complex plane containing the convergence radius, allowing extrapolation beyond the disk of convergence.[17]
The process fails when |\rho| \geq 1, leading to divergence or stagnation, as the denominator \Delta^2 s_n may approach zero or change sign unpredictably.[8] It also underperforms when the error is not purely geometric, as for e_n \sim n r^n with |r| < 1, where the polynomial factor prevents full elimination of the leading error term.[8]
Under the standard linear error assumption, the error in the accelerated sequence satisfies |A_n - L| \leq K |\rho|^{2n} for some constant K > 0, demonstrating a squared reduction in the convergence factor.[15] For iterated applications, where the process is repeated on the output sequence, acceleration extends to errors with several geometric components, such as e_n \sim c \rho^n + d \sigma^n with |\sigma| < |\rho| < 1, with each pass removing a further component and improving the convergence factor.[8]
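The following sketch illustrates the underperformance described above for an error with a polynomial factor (the model e_n = n r^n and its constants are arbitrary):

```python
# When e_n = n * r**n, a single Aitken pass still reduces the error, but the
# transformed error keeps the geometric rate r**n (improved only by a
# polynomial factor) rather than dropping to r**(2n).
L, r = 0.0, 0.8
s = [L + n * r ** n for n in range(40)]

def aitken_term(s, n):
    denom = s[n + 2] - 2 * s[n + 1] + s[n]
    return s[n] - (s[n + 1] - s[n]) ** 2 / denom

for n in (15, 25, 35):
    print(n, abs(s[n] - L), abs(aitken_term(s, n) - L))
```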
Practical applications
Sequence convergence acceleration
Aitken's delta-squared process finds primary application in accelerating the convergence of sequences produced by fixed-point iterations, such as those solving equations of the form x = f(x) where the standard iteration x_{n+1} = f(x_n) exhibits linear convergence.[11] This method transforms three consecutive iterates into a single improved estimate that achieves quadratic convergence when the original sequence satisfies the necessary asymptotic error conditions.[11] It is particularly valuable in numerical methods where linear convergence limits efficiency, allowing practitioners to obtain accurate solutions with fewer iterations without altering the underlying iteration scheme.[11]
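As an illustrative sketch (not a production routine; the helper names are arbitrary), the cosine fixed-point iteration for the Dottie number mentioned in the lead can be accelerated by transforming triples of iterates:

```python
import math

def fixed_point_iterates(f, x0, count):
    """Generate count iterates of x_{k+1} = f(x_k), starting from x0."""
    xs = [x0]
    for _ in range(count - 1):
        xs.append(f(xs[-1]))
    return xs

def aitken(s):
    """Delta-squared transform of a list of iterates."""
    out = []
    for n in range(len(s) - 2):
        denom = s[n + 2] - 2 * s[n + 1] + s[n]
        out.append(s[n] if denom == 0 else s[n] - (s[n + 1] - s[n]) ** 2 / denom)
    return out

xs = fixed_point_iterates(math.cos, 1.0, 12)   # x_0, ..., x_11
accelerated = aitken(xs)                        # A_0, ..., A_9

dottie = 0.739085133215161                      # reference value of the Dottie number
print(abs(xs[9] - dottie))                      # error of the plain iterate x_9
print(abs(accelerated[9] - dottie))             # error of A_9, built from x_9, x_10, x_11
```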
The process integrates effectively with other acceleration techniques, such as Richardson extrapolation, to enhance convergence in broader iterative frameworks like solving systems from partial differential equations.[11] In optimization contexts, Aitken's method serves as a foundational component in advanced preconditioners, notably within Anderson acceleration, which generalizes the delta-squared extrapolation to vector-valued fixed-point problems and improves convergence rates in nonlinear solvers.[18] These combinations leverage Aitken's simplicity to boost performance in high-dimensional settings, such as quasi-Newton methods or proximal algorithms.[18]
Computationally, Aitken's delta-squared process incurs only O(1) additional arithmetic operations per step, relying on simple differences and divisions from three prior terms, which enables its use in online acceleration scenarios without requiring storage of the entire sequence.[11] This low overhead makes it suitable for resource-constrained environments, where it can be applied sequentially as new iterates become available. In practical examples, the method accelerates eigenvalue computations, such as enhancing the power iteration for the dominant eigenvalue of nonnegative tensors by extrapolating successive Rayleigh quotient estimates.[19] Similarly, in root-finding, Aitken extrapolation augments the secant method by applying the transformation to iterates, reducing the number of function evaluations needed for nonlinear equation solving compared to the unaccelerated secant approach.[20]
As of 2025, implementations of Aitken's delta-squared process are accessible in established numerical libraries, including the aitken function in the pracma package for R, which supports scalar and vector sequences for convergence acceleration. In MATLAB, dedicated functions are available through the File Exchange, such as those for root-finding and series summation, integrable into custom iterative solvers.[21] For Python environments, while not native to NumPy or SciPy core modules, the algorithm is straightforward to code using array operations and appears in specialized packages like mpmath for high-precision applications.
Summation of alternating series
Aitken's delta-squared process finds significant application in accelerating the partial sums of alternating infinite series of the form \sum_{k=0}^{\infty} (-1)^k a_k, where a_k > 0 decreases monotonically and the series converges conditionally or slowly due to the oscillatory nature of the terms. The partial sums s_n = \sum_{k=0}^{n} (-1)^k a_k typically exhibit linear convergence with error alternating in sign and magnitude comparable to the next term a_{n+1}, leading to sluggish progress toward the limit. By applying the transformation \hat{s}_n = s_n - \frac{(s_{n+1} - s_n)^2}{s_{n+2} - 2s_{n+1} + s_n}, where the forward difference operator is \Delta s_n = s_{n+1} - s_n and \Delta^2 s_n = s_{n+2} - 2s_{n+1} + s_n, the process effectively removes the dominant \pm a_n error component, yielding a new sequence \{\hat{s}_n\} that converges more rapidly to the series sum.[5][22]
The specific benefit of this acceleration lies in its ability to eliminate the leading-order error term: for alternating series whose terms decrease geometrically, it effectively squares the convergence factor from O(r^n) to O(r^{2n}), where |r| < 1 is the common ratio, while for more slowly decaying terms it still raises the polynomial order of the error. For instance, in the Leibniz formula for \pi/4 = \sum_{k=0}^{\infty} (-1)^k / (2k+1), the unaided partial sums converge at an O(1/n) rate due to the asymptotic error s_n - \pi/4 \sim (-1)^{n+1}/(4n); application of Aitken's process reduces this to O(1/n^3), allowing accurate approximation with far fewer terms, for example several decimal places of \pi from tens of terms rather than thousands. Similar improvements apply to other alternating series, such as those arising in Fourier expansions or probabilistic sums with oscillatory components.[23][22]
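A brief sketch of this acceleration applied to the Leibniz series (the helper is repeated so the snippet stands alone; two passes are shown):

```python
import math

# Partial sums of the Leibniz series pi/4 = 1 - 1/3 + 1/5 - ...
N = 50
partial, total = [], 0.0
for k in range(N):
    total += (-1) ** k / (2 * k + 1)
    partial.append(total)

def aitken(s):
    out = []
    for n in range(len(s) - 2):
        denom = s[n + 2] - 2 * s[n + 1] + s[n]
        out.append(s[n] if denom == 0 else s[n] - (s[n + 1] - s[n]) ** 2 / denom)
    return out

once = aitken(partial)
twice = aitken(once)                 # a second pass improves the estimate further

target = math.pi / 4
print(abs(partial[40] - target))     # O(1/n) error of the raw partial sums
print(abs(once[40] - target))        # roughly O(1/n^3) after one pass
print(abs(twice[40] - target))       # smaller still after a second pass
```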
Aitken originally developed the delta-squared process in 1926 as an enhancement to Daniel Bernoulli's iterative method for solving algebraic equations, where it accelerated the root-finding iteration; the same finite-difference machinery also appears in classical summation identities such as Faulhaber's formula for power sums, \sum_{k=1}^n k^m = \frac{1}{m+1} \sum_{j=0}^m (-1)^j \binom{m+1}{j} B_j n^{m+1-j}, expressed in terms of the Bernoulli numbers B_j.
Compared to Euler's transformation, which re-expresses an alternating series in terms of forward differences of its terms and excels when those terms decrease smoothly, Aitken's process is often preferable for linearly convergent alternating cases without additional analytic structure, as it relies solely on consecutive partial sums. However, it is less general than the Levin u-transform, a parameterized family of transformations that accommodates both convergent and divergent series with more complicated error behavior, offering broader applicability at the cost of additional parameters and computation.[9]
Implementation and examples
Numerical example
To illustrate the application of Aitken's delta-squared process, consider the partial sums s_n = \sum_{k=1}^n \frac{(-1)^{k+1}}{k} of the alternating harmonic series, which converges slowly to \ln 2 \approx 0.693147.[24] This sequence exhibits linear convergence with oscillation around the limit. The Aitken transformation, A_n = s_n - \frac{(\Delta s_n)^2}{\Delta^2 s_n}, where \Delta s_n = s_{n+1} - s_n and \Delta^2 s_n = s_{n+2} - 2s_{n+1} + s_n, is applied to accelerate convergence (as detailed in the Aitken transformation formula section).[2]
The following table lists the first eight partial sums s_n together with the first differences \Delta s_n, second differences \Delta^2 s_n, and transformed values A_n (which use terms up to s_{10}), rounded to 6 decimal places:
| n | s_n | \Delta s_n | \Delta^2 s_n | A_n |
|---|---|---|---|---|
| 1 | 1.000000 | -0.500000 | 0.833333 | 0.700000 |
| 2 | 0.500000 | 0.333333 | -0.583333 | 0.690476 |
| 3 | 0.833333 | -0.250000 | 0.450000 | 0.694444 |
| 4 | 0.583333 | 0.200000 | -0.366667 | 0.692424 |
| 5 | 0.783333 | -0.166667 | 0.309524 | 0.693590 |
| 6 | 0.616667 | 0.142857 | -0.267857 | 0.692857 |
| 7 | 0.759524 | -0.125000 | 0.236111 | 0.693347 |
| 8 | 0.634524 | 0.111111 | -0.211111 | 0.693003 |
The original sequence s_n oscillates with amplitude decreasing roughly as 1/(2n), leading to an error of approximately 4.75 \times 10^{-2} at n = 10. In contrast, the transformed sequence A_n oscillates far less and converges much faster, with the error at A_8 dropping to about 1.4 \times 10^{-4}, a reduction of more than two orders of magnitude, demonstrating the faster approach to \ln 2.[24]
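The table can be reproduced with a short Python script (a sketch; values are printed to six decimal places):

```python
import math

# Partial sums s_1 .. s_10 of the alternating harmonic series.
s, total = [], 0.0
for k in range(1, 11):
    total += (-1) ** (k + 1) / k
    s.append(total)

print(" n      s_n      ds_n     d2s_n      A_n")
for i in range(8):                       # table rows n = 1 .. 8
    d1 = s[i + 1] - s[i]
    d2 = s[i + 2] - 2 * s[i + 1] + s[i]
    A = s[i] - d1 ** 2 / d2
    print(f"{i + 1:2d} {s[i]:9.6f} {d1:9.6f} {d2:9.6f} {A:9.6f}")

print("ln 2 =", math.log(2))             # limit for comparison
```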
In finite-precision arithmetic, the process is sensitive to rounding errors, particularly when \Delta^2 s_n becomes small relative to the terms, as the division can magnify perturbations in the differences by factors up to the condition number of the transformation, potentially limiting accuracy to 10-12 decimal places even with double-precision inputs.[11]
Algorithmic pseudocode
The Aitken delta-squared process can be implemented as a straightforward algorithm that transforms a given sequence of partial sums or iterates into an accelerated sequence using the forward difference operator. The core computation applies the transformation T_n = s_n - \frac{(s_{n+1} - s_n)^2}{s_{n+2} - 2s_{n+1} + s_n} to each triplet of consecutive terms, where s denotes the input sequence. This produces a new sequence of length n-2 from an input of length n.
The following pseudocode outlines the basic algorithm for a single application of the process, assuming a zero-indexed array s of length n \geq 3 containing real-valued terms:
function aitken_delta_squared(s: array of real, n: integer) -> array of real:
if n < 3:
return s // Edge case: insufficient terms for transformation
A = new array of size n-2
for i = 0 to n-3:
delta1 = s[i+1] - s[i]
denom = s[i+2] - 2 * s[i+1] + s[i]
if abs(denom) < epsilon: // Edge case: small denominator to avoid division by near-zero
A[i] = s[i]
else:
A[i] = s[i] - (delta1 * delta1) / denom
return A
Here, \epsilon is a small tolerance threshold (e.g., 10^{-10}) guarding against numerical instability when the second difference \Delta^2 s_i \approx 0, in which case the original term s_i is retained to prevent erratic behavior. This guard improves robustness in floating-point arithmetic regardless of the implementation language, whether Python, Fortran, or another.
For iterative application, the process can be repeated on the output sequence until a convergence criterion is satisfied, such as the maximum change between successive transformed sequences falling below a tolerance. The following pseudocode demonstrates this, starting with an initial sequence and applying the transformation in a loop:
function iterative_aitken_delta_squared(s: array of real, n: integer, tol: real, max_iter: integer) -> real:
current = s
for iter = 1 to max_iter:
transformed = aitken_delta_squared(current, length(current))
if length(transformed) < 3 or max_abs_diff(transformed, current[0..length(transformed)-1]) < tol:
return mean(transformed) // Or last element as estimate
current = transformed
return mean(current) // Fallback after max iterations
This iterative version accelerates linearly convergent sequences toward quadratic convergence under suitable conditions, with each iteration requiring additional terms if the sequence is generated on-the-fly.[9]
The basic algorithm runs in O(n) time and O(n) space for a single pass, as it processes the sequence linearly without recursion. For streaming data or large n, optimizations include computing differences incrementally to reduce arithmetic operations or using a sliding window of three terms to achieve O(1) space per output point, though this forgoes storing the full transformed array; a sketch of this variant follows below. Floating-point caveats include potential overflow in the squared term for very large |s_i| and, more commonly, loss of significance from cancellation in the numerator and denominator when consecutive terms are nearly equal, mitigated by scaling or extended precision.[9]
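A sketch of the sliding-window variant just described, holding only the three most recent terms so that each accelerated value is produced in O(1) space (the class name and interface are illustrative, not from any standard library):

```python
import math

class StreamingAitken:
    """Online delta-squared acceleration keeping only a three-term window."""

    def __init__(self, eps=1e-12):
        self.window = []     # at most the three most recent terms
        self.eps = eps       # guard against a vanishing second difference

    def push(self, value):
        """Feed the next term; return an accelerated value once three terms are available."""
        self.window.append(value)
        if len(self.window) < 3:
            return None
        s0, s1, s2 = self.window
        self.window.pop(0)                    # slide the window forward
        denom = s2 - 2 * s1 + s0
        if abs(denom) < self.eps:
            return s0                         # fall back to the unaccelerated term
        return s0 - (s1 - s0) ** 2 / denom

# Usage with an on-the-fly fixed-point iteration x_{k+1} = cos(x_k):
acc = StreamingAitken()
x = 1.0
for _ in range(10):
    x = math.cos(x)
    estimate = acc.push(x)
    if estimate is not None:
        print(estimate)      # accelerated estimates of the fixed point
```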