
Kahan summation algorithm

The Kahan summation algorithm, also known as compensated summation, is a numerical technique for computing the sum of a sequence of floating-point numbers with significantly reduced rounding error compared to naive recursive summation. Developed by William Kahan in 1965, it employs an auxiliary variable to track and compensate for the lost low-order bits that would otherwise accumulate during additions on finite-precision arithmetic systems. The algorithm operates by maintaining two accumulators: the primary sum S (initially zero) and a correction term c (initially zero). For each term y in the sequence, it first computes an adjusted term y' = y - c, then a temporary value t = S + y', updates the correction as c = (t - S) - y' to recapture any loss, and sets S = t. This process effectively simulates higher-precision accumulation without requiring extended-precision hardware support, provided the floating-point arithmetic normalizes results before rounding, as was common on early machines such as the IBM 7090, and it works well under IEEE 754 arithmetic. The compensation is most effective when terms are added in order of decreasing magnitude, since the correction step assumes the running sum dominates each incoming term. In practice, the core loop is:
double sum = 0.0;  // running total
double c = 0.0;    // running compensation for lost low-order bits
for each y in the sequence:
    double y_ = y - c;    // c holds the negated error from the last step, so this re-adds the lost bits
    double t = sum + y_;  // low-order bits of y_ may be lost in this addition
    c = (t - sum) - y_;   // algebraically zero; in floating point it captures (negated) the bits just lost
    sum = t;
Kahan originally proposed this in the context of reducing truncation errors in large sums where terms are of comparable magnitude, demonstrating its efficacy through tests on quadrature integrals, solutions to differential equations, and infinite series. A key advantage of the Kahan algorithm is its error bound, which remains nearly independent of the number of terms n, bounded by approximately 2u \sum |x_i| + O(n u^2), where u is the unit roundoff (machine epsilon divided by 2). This contrasts sharply with the naive summation error, which grows as O(n u \sum |x_i|), potentially losing up to \log_{10} n decimal digits of accuracy for large n. The method is particularly valuable in scientific computing, where it has been used in numerical libraries and in some implementations of BLAS-level operations such as dot products; its accuracy is largely insensitive to the order of summation, though it assumes that individual additions do not overflow or underflow. Further refinements, such as the Kahan-Babuška-Neumaier variant, extend the approach by adjusting the correction term to handle cases where added values may exceed the current sum, improving robustness in such scenarios. Despite its simplicity and low overhead (adding only a few floating-point operations per term), the algorithm remains a cornerstone of accurate numerical summation and is implemented in libraries for languages such as Python, Julia, C++, and Rust.

Floating-Point Arithmetic Fundamentals

Representation of Floating-Point Numbers

Floating-point numbers in computing are typically represented in a binary format that approximates real numbers using a sign bit, an exponent, and a mantissa (also known as the significand or fraction). The general form is (-1)^s \times m \times 2^{e - b}, where s is the sign bit (0 for positive, 1 for negative), m is the mantissa normalized to lie between 1 and 2 (with an implicit leading 1 in binary representation), e is the stored exponent, and b is the bias applied to the true exponent for storage efficiency. The IEEE 754 standard, established by the Institute of Electrical and Electronics Engineers, defines the most widely used binary floating-point formats, including single precision (32 bits) and double precision (64 bits). In single precision, the allocation is 1 bit for the sign, 8 bits for the biased exponent (with a bias of 127, allowing exponents from -126 to +127 after adjustment), and 23 bits for the mantissa, providing about 7 decimal digits of precision. Double precision uses 1 sign bit, 11 bits for the biased exponent (bias of 1023, exponents from -1022 to +1023), and a 52-bit mantissa, offering approximately 15-16 decimal digits of precision. These formats ensure a normalized representation where most numbers have an implicit leading 1 in the mantissa, except for subnormal numbers near zero.

A key measure of precision in these formats is machine epsilon (\epsilon), defined as the gap between 1 and the next representable number greater than 1, that is, the smallest positive floating-point value \epsilon such that 1 + \epsilon > 1 in the arithmetic of the system. For IEEE 754 single precision, \epsilon = 2^{-23} \approx 1.192 \times 10^{-7}, while for double precision, \epsilon = 2^{-52} \approx 2.220 \times 10^{-16}. This value quantifies the relative rounding error inherent in representing and manipulating numbers.

IEEE 754 specifies four rounding modes for arithmetic operations to handle cases where the exact result cannot be represented precisely: round to nearest (with ties rounding to the even least significant bit, the default mode), round toward positive infinity, round toward negative infinity, and round toward zero (truncation). These modes affect the outcome of additions, multiplications, and other operations by determining how inexact results are approximated, potentially introducing small errors that accumulate in computations. For instance, round to nearest minimizes average error magnitude but can still lead to discrepancies in sequential operations. Such representational limitations form the basis for the errors observed in numerical tasks like summation.
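A short C program (a minimal sketch, assuming IEEE 754 binary64 evaluation, the default on modern 64-bit platforms) makes the machine-epsilon definition concrete: adding \epsilon to 1 reaches the next representable double, while adding \epsilon/2 creates a tie that rounds back to 1.

#include <stdio.h>
#include <float.h>

int main(void) {
    double eps = DBL_EPSILON;                                 /* 2^-52, about 2.22e-16 */
    printf("machine epsilon: %.17g\n", eps);
    printf("1 + eps   > 1 ?  %d\n", (1.0 + eps) > 1.0);       /* 1: lands on the next representable double */
    printf("1 + eps/2 > 1 ?  %d\n", (1.0 + eps / 2.0) > 1.0); /* 0: the tie rounds back to 1.0 (round to even) */
    return 0;
}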

Sources of Error in Computations

In floating-point arithmetic, addition and subtraction operations introduce rounding errors because the result must be represented within the finite precision of the format. Specifically, for two floating-point numbers a and b, the computed sum \mathrm{fl}(a + b) satisfies \mathrm{fl}(a + b) = (a + b)(1 + \delta) with |\delta| \leq u, where u is the unit roundoff (half the machine epsilon), because the operands' exponents must be aligned and the mantissa of the result rounded to fit the available bits, potentially discarding low-order bits that carry significant relative information.

A primary source of error occurs during addition when the operands differ greatly in magnitude, leading to loss of precision in the smaller number. For instance, adding a small value such as 1.25 \times 10^{-5} to a large one such as 2.15 \times 10^{12} produces a sum that rounds to 2.15 \times 10^{12}, effectively discarding the smaller term entirely due to exponent misalignment and mantissa truncation. In this process, the low-order bits of the smaller operand are shifted right during alignment and lost if they fall outside the representable precision, reducing the effective significant digits of the result. Such losses accumulate in repeated operations, as each addition can introduce an error bounded by half a unit in the last place (ulp).

Catastrophic cancellation represents another critical source of error, particularly in subtraction, where two nearly equal numbers are differenced, amplifying relative errors from prior roundings. This phenomenon exposes inaccuracies in the operands: the leading digits cancel, leaving the result dominated by the low-order bits, which may be unreliable. For example, evaluating b^2 - 4ac in the quadratic formula with approximate coefficients can yield a result in error by as much as 70 ulps when the true difference is small, far exceeding the typical rounding error of 0.5 ulp per operation. Although summation primarily involves addition, negative terms or reordered computations can induce similar cancellations, revealing hidden precision losses.

In naive sequential summation of n terms, these individual errors compound, leading to a cumulative error on the order of O(n u \sum |x_i|), where u is the unit roundoff (approximately 2^{-53} for double precision). Each addition discards low-order bits of the previous partial sum, especially when accumulating small terms into a growing total, causing the effective precision to degrade with n. This bound holds under the assumption of positive terms and correctly rounded arithmetic, with the actual error depending on the order of summation and the magnitude variations among the terms.
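Both effects are easy to reproduce; the following C sketch (assuming IEEE 754 double precision; the specific constants mirror the example above) shows a small addend being absorbed and a subtraction exposing prior rounding error:

#include <stdio.h>

int main(void) {
    /* 1. Absorption: a tiny addend vanishes next to a huge one. */
    double big   = 2.15e12;
    double small = 1.25e-5;
    printf("big + small == big ?  %d\n", (big + small) == big);   /* prints 1 */

    /* 2. Cancellation: subtracting nearly equal numbers exposes the
       rounding error already present in the operands. */
    double x = 1.0 + 1e-15;   /* already rounded; carries up to 0.5 ulp of error */
    double y = 1.0;
    printf("computed difference: %.17g\n", x - y);   /* noticeably off from the intended value */
    printf("intended difference: %.17g\n", 1e-15);
    return 0;
}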

Core Algorithm

Step-by-Step Procedure

The Kahan summation algorithm was invented by William Kahan in 1965 to enable more accurate summation of floating-point numbers by compensating for the rounding errors that accumulate during repeated additions. The procedure begins by initializing two variables: the running sum S to 0 and the compensation variable c to 0. For each number x_i in the sequence to be summed, compute y = x_i - c, which adjusts the current term by subtracting the previously accumulated error. Next, form the temporary sum t = S + y. Then, update the compensation as c = (t - S) - y, capturing the rounding error introduced in the addition of y to S. Finally, set S = t to advance the running sum. This process repeats for all terms in the sequence, yielding S as the compensated total. The compensation variable c serves to recycle the low-order bits of information that would otherwise be lost due to the limited precision of floating-point representation during each addition. Intuitively, c tracks the rounding error from the prior addition step and feeds it back into the computation of the next term, thereby preserving significant digits that naive summation discards and reducing overall error accumulation.

Pseudocode Implementation

The Kahan summation algorithm can be implemented in a straightforward manner using the following pseudocode, which translates directly to most programming languages supporting floating-point arithmetic. This formulation assumes the input is an array of floating-point numbers and uses variables of the same precision as the input elements to maintain consistency.
function kahan_sum(array x):
    if x is empty:
        return 0.0
    S = 0.0  // Running sum
    c = 0.0  // Compensation for lost low-order bits
    for each x_i in x:
        y = x_i - c
        t = S + y
        c = (t - S) - y
        S = t
    return S
The loop structure iterates sequentially over the array elements, performing the compensation at each step without requiring recursion or additional data structures. All variables (S, c, y, t, and x_i) should be declared in the floating-point type matching the input to avoid unnecessary precision loss during computations. For edge cases, such as an empty array, the function returns 0.0 by explicit check; sums consisting entirely of zeros naturally yield S = 0.0 through the initialization and loop logic. The algorithm runs in O(n) time, where n is the array length, but performs about four floating-point operations per input element (one addition plus three operations devoted to compensation) compared to one in naive summation.
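The pseudocode maps almost line for line onto C. The following self-contained sketch (function and variable names are illustrative, not taken from any particular library) sums 100,000 copies of 0.1, a value that is not exactly representable in binary, so the naive accumulation drifts while the compensated sum stays close to the expected total:

#include <stdio.h>
#include <stddef.h>

/* Compensated (Kahan) summation of n doubles; returns 0.0 when n == 0. */
double kahan_sum(const double *x, size_t n) {
    double s = 0.0;                /* running sum */
    double c = 0.0;                /* compensation for lost low-order bits */
    for (size_t i = 0; i < n; i++) {
        double y = x[i] - c;       /* apply the correction carried over from the previous step */
        double t = s + y;          /* low-order bits of y may be lost here */
        c = (t - s) - y;           /* capture the lost bits (negated) for the next step */
        s = t;
    }
    return s;
}

int main(void) {
    enum { N = 100000 };
    static double data[N];
    for (size_t i = 0; i < N; i++) data[i] = 0.1;     /* mathematical total: 10000 */

    double naive = 0.0;
    for (size_t i = 0; i < N; i++) naive += data[i];

    printf("naive: %.15f\n", naive);                  /* trailing digits drift */
    printf("kahan: %.15f\n", kahan_sum(data, N));     /* agrees with 10000 to near machine precision */
    return 0;
}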

Illustrative Examples

Basic Numerical Example

To illustrate the Kahan summation algorithm in double-precision floating-point arithmetic, consider the small sequence of numbers x = [1.0, 2^{-53}, 2^{-53}], whose exact mathematical sum is 1 + 2^{-52} \approx 1.0000000000000002. In the naive summation approach, the calculation proceeds as follows: start with S = 0, then S = 0 + 1.0 = 1.0, then S = 1.0 + 2^{-53} = 1.0 (the small term falls exactly half an ulp below the next representable value, and the tie rounds to even, so it is lost), and finally S = 1.0 + 2^{-53} = 1.0 (again lost), yielding a result of 1.0 and discarding the small contributions entirely. The Kahan algorithm, however, compensates for the lost precision by maintaining a running error term c (initialized to 0). The step-by-step computation for this sequence, tracking the variables S (sum), c (compensation), y (adjusted addend), and t (temporary sum), is shown in the table below. Each iteration follows the procedure: y = x_i - c, t = S + y, c = (t - S) - y, S = t.
| Iteration | x_i | c (before) | y = x_i - c | t = S + y | c (after) = (t - S) - y | S (after) |
|---|---|---|---|---|---|---|
| 1 | 1.0 | 0 | 1.0 | 1.0 | 0 | 1.0 |
| 2 | 2^{-53} | 0 | 2^{-53} | 1.0 | -2^{-53} | 1.0 |
| 3 | 2^{-53} | -2^{-53} | 2^{-52} | 1 + 2^{-52} | 0 | 1 + 2^{-52} |
The final Kahan sum is 1 + 2^{-52}, matching the exact value and demonstrating how the compensation term preserves the small additions that would otherwise be rounded away.
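This trace can be reproduced directly; the short C program below (a sketch assuming IEEE 754 doubles) prints the naive and compensated results for the same three-term sequence:

#include <stdio.h>
#include <math.h>

int main(void) {
    double x[3] = {1.0, ldexp(1.0, -53), ldexp(1.0, -53)};   /* [1, 2^-53, 2^-53] */

    /* Naive left-to-right summation: both small terms are rounded away. */
    double naive = 0.0;
    for (int i = 0; i < 3; i++) naive += x[i];

    /* Kahan compensated summation recovers the lost bits. */
    double s = 0.0, c = 0.0;
    for (int i = 0; i < 3; i++) {
        double y = x[i] - c;
        double t = s + y;
        c = (t - s) - y;
        s = t;
    }

    printf("naive: %.17g\n", naive);                      /* 1 */
    printf("kahan: %.17g\n", s);                          /* 1.0000000000000002, i.e. 1 + 2^-52 */
    printf("exact: %.17g\n", 1.0 + ldexp(1.0, -52));
    return 0;
}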

Error Visualization

To visualize the benefits of the Kahan summation algorithm, a conceptual diagram often depicts the magnitude of accumulated error as a function of the number of terms summed. In naive summation, the error grows roughly linearly with the number of terms due to successive rounding losses that are not compensated, potentially reaching O(n ε), where n is the number of terms and ε is the machine epsilon. In contrast, the Kahan method keeps the error bounded, typically O(ε), by recycling lost low-order bits into subsequent additions, resulting in a nearly flat error curve that does not degrade significantly with increasing n.

A representative plot illustrates this for the summation of 1000 random floating-point numbers drawn from a uniform distribution on [0, 1]. The naive approach shows accumulated error on the order of hundreds to thousands of ulps (units in the last place), reflecting progressive loss of precision, while the Kahan summation keeps the error near a few ulps, comparable to that of a single addition in double precision, demonstrating its ability to preserve accuracy even for long sequences.

The mechanism behind this preservation is "error recycling," in which the compensation variable captures and reincorporates the rounding error from each addition. This can be represented as a flowchart: start with the current sum S and compensation c (initially 0); for each term x, compute the adjusted term y = x - c and the temporary sum t = S + y; update c = (t - S) - y to salvage the bits lost in the addition; then set S = t. Arrows feed the low-order error back into the next iteration, preventing its discard and bounding the overall accumulation. Qualitatively, Kahan summation achieves relative accuracy comparable to performing the entire summation in higher-precision arithmetic, such as double precision when the inputs are single precision, making it particularly valuable for long-running summations where naive methods would otherwise amplify errors unacceptably.
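An experiment of this kind can be sketched as follows in C (a minimal version of the comparison described above; it assumes long double is wider than double, as with the x86 80-bit format, so that it can serve as a reference):

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    enum { N = 1000 };
    double x[N];
    srand(42);
    for (int i = 0; i < N; i++) x[i] = (double)rand() / RAND_MAX;   /* uniform on [0, 1] */

    double naive = 0.0, s = 0.0, c = 0.0;
    long double ref = 0.0L;
    for (int i = 0; i < N; i++) {
        naive += x[i];               /* naive recursive summation */
        double y = x[i] - c;         /* Kahan compensated summation */
        double t = s + y;
        c = (t - s) - y;
        s = t;
        ref += x[i];                 /* wider-precision reference */
    }

    printf("naive error: %Lg\n", (long double)naive - ref);
    printf("kahan error: %Lg\n", (long double)s - ref);
    return 0;
}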

Accuracy and Performance

Error Bound Derivation

The error analysis of the Kahan summation algorithm assumes adherence to the IEEE 754 standard for floating-point arithmetic, with rounding to nearest (ties to even) and a unit roundoff u = \frac{1}{2}\epsilon, where \epsilon is the machine epsilon (for example, u = 2^{-53} in double precision). In the algorithm, at each step i a temporary value t = \mathrm{fl}(S + y) is computed, where S is the accumulated sum and y = \mathrm{fl}(x_i - c) (with c the prior compensation term, initially 0). The local rounding error is then e_i = t - (S + y), satisfying |e_i| \leq u |S + y|. The compensation step updates c = \mathrm{fl}(\mathrm{fl}(t - S) - y), which recovers most of this error, so that |c| \leq 2u \max(|S|, |y|) up to terms of order u^2.

To derive the global error bound, write the computed sum as \tilde{S} = \sum_{i=1}^n x_i + E, where E is the total accumulated error. The compensation ensures that errors are not amplified by subsequent additions in the way they are in naive summation. Specifically, each compensation term bounds the propagation: the error contributed at step i is absorbed into c without exceeding O(u) relative to the local magnitudes, leading to a telescoping sum in which higher-order terms of size O(n u^2) arise from interactions, but the leading error is controlled by the input magnitudes rather than by the number of terms n. Thus the absolute error satisfies |E| \leq 2u \sum_{i=1}^n |x_i| + O(n u^2 \sum_{i=1}^n |x_i|). For the relative error, under the assumption of a well-conditioned sum (one whose condition number \kappa = \sum |x_i| / |\sum x_i| is modest), the bound simplifies to |\tilde{S} - \sum x_i| / |\sum x_i| \leq 2u\kappa + O(n u^2 \kappa), demonstrating superior accuracy over naive summation, whose leading term is O(n u \kappa), because the compensation prevents error growth linear in n.
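For comparison, the leading-order bounds for the two schemes can be placed side by side (a restatement of the bounds discussed above, in the same notation; \tilde{S} denotes the computed sum in each case):

|\tilde{S}_{\mathrm{naive}} - \sum_{i=1}^n x_i| \leq (n - 1)\,u \sum_{i=1}^n |x_i| + O(u^2)

|\tilde{S}_{\mathrm{Kahan}} - \sum_{i=1}^n x_i| \leq 2u \sum_{i=1}^n |x_i| + O(n u^2) \sum_{i=1}^n |x_i|

Dividing either bound by |\sum x_i| reproduces the relative-error statements in terms of the condition number \kappa.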

Performance Comparison

The performance of the Kahan summation algorithm is typically evaluated using metrics such as relative error (the absolute difference between the computed sum and the exact sum, normalized by the exact sum) and absolute error, applied to standard test cases including the harmonic series (summing 1 + 1/2 + ... + 1/n), sums of random positive numbers (uniformly distributed in [0, 1]), and sums of random signed numbers (uniformly distributed in [-1, 1]). These cases highlight the accumulation of errors in naive summation while exercising Kahan's compensation mechanism. Empirical benchmarks demonstrate that Kahan summation reduces the relative error by a factor of approximately n/2 compared to naive summation on such data, where n is the number of terms, because its error bound is essentially independent of sequence length, versus the linear growth of the naive bound. Runtime overhead arises from the additional floating-point operations and temporary storage: in double precision (IEEE 754 binary64 with machine epsilon ε ≈ 2.22 × 10^{-16}), Kahan summation is roughly 4-5 times slower than naive summation for small to medium array sizes, though the gap narrows for very large n, where memory bandwidth becomes the limiting factor. The following table summarizes representative relative-error results for summation of 10^6 random positive terms in double precision, based on benchmarks confirming the theoretical bounds:
| Method | Relative Error (approx.) |
|---|---|
| Naive | 10^{-10} |
| Kahan | 10^{-16} |
These values align with the naive error scaling as O(n ε) ≈ 10^6 × 10^{-16} = 10^{-10}, while Kahan maintains near-machine-epsilon accuracy. Kahan summation particularly excels in ill-conditioned sums involving terms of widely varying magnitudes, such as alternating positive and negative values or decreasing sequences like the harmonic series, where naive methods suffer cancellation and error amplification but Kahan's compensation term preserves significant digits effectively.
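A benchmark of this kind is straightforward to reproduce. The C sketch below (values and timings will vary by machine and compiler; for stable timings, repeat each call in a loop, and compile without reassociation flags such as -ffast-math so the compensation is not optimized away) compares the relative error and runtime of naive and Kahan summation over 10^6 random positive terms, using long double accumulation as the reference:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

#define N 1000000

static double naive_sum(const double *x, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++) s += x[i];
    return s;
}

static double kahan_sum(const double *x, int n) {
    double s = 0.0, c = 0.0;
    for (int i = 0; i < n; i++) {
        double y = x[i] - c;
        double t = s + y;
        c = (t - s) - y;
        s = t;
    }
    return s;
}

int main(void) {
    static double x[N];
    srand(1);
    for (int i = 0; i < N; i++) x[i] = (double)rand() / RAND_MAX;   /* random positive terms */

    long double ref = 0.0L;                                         /* wider-precision reference */
    for (int i = 0; i < N; i++) ref += x[i];

    clock_t t0 = clock();
    double sn = naive_sum(x, N);
    clock_t t1 = clock();
    double sk = kahan_sum(x, N);
    clock_t t2 = clock();

    printf("naive: rel. error %Lg, time %.3f s\n",
           fabsl((long double)sn - ref) / ref, (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("kahan: rel. error %Lg, time %.3f s\n",
           fabsl((long double)sk - ref) / ref, (double)(t2 - t1) / CLOCKS_PER_SEC);
    return 0;
}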

Extensions and Variants

Enhanced Compensated Summation

The Kahan summation algorithm can be enhanced to address limitations in the original formulation, particularly when added terms may exceed the magnitude of the running sum or when special floating-point values such as signed zeros and infinities are present. These improvements retain the core compensated approach while providing better error control and robustness. One class of enhancements uses floating-point operations that preserve the sign of zero according to IEEE 754 during the compensation computations, avoiding unintended sign flips for zeros. Similarly, the compensation can be adjusted so that infinities propagate as they would under naive summation, preventing cases where the compensated steps produce NaN where naive summation would produce an infinity, as observed in some implementations.

A prominent enhancement is iterative refinement, which applies multiple passes or inner loops of compensated summation to recover additional accuracy. In this approach the initial Kahan sum is computed, and the accumulated correction c is then used in a second pass over the data to refine the total, effectively cascading the compensation to capture higher-order errors. This is particularly useful for long sequences where single-pass compensation leaves residual errors, achieving accuracy close to that of doubled working precision with modest computational overhead.

For dot products, the Kahan algorithm is adapted by applying compensated summation to the running total of the pairwise products, compensating for the bits lost in each multiply-accumulate step. Automated accuracy-tuning tools inspire further tweaks by analyzing expressions and inserting compensated loops around dot-product computations, optimizing for specific input distributions to minimize rounding errors without manual intervention. This adaptation is common in numerical libraries, where it significantly improves accuracy for vector operations compared to naive loops.

A specific variant offering enhanced accuracy is the Kahan-Babuška-Neumaier (KBN) algorithm, which chooses the compensation formula based on the relative magnitudes of the incoming term and the running sum, preventing loss when a large term is added to a small accumulator. It builds on earlier work by I. Babuška and was refined by A. Neumaier in 1974, ensuring robustness in exactly the scenarios where plain Kahan summation can fail. The pseudocode for this variant is as follows:
sum = 0.0
c = 0.0                          // running compensation, applied once at the end
for each y in the sequence:
    t = sum + y
    if abs(sum) >= abs(y):
        c = c + (sum - t) + y    // low-order bits of y were lost in the addition
    else:
        c = c + (y - t) + sum    // low-order bits of sum were lost in the addition
    sum = t
return sum + c
This ensures an error bound of approximately O(ε ∑ |x_i|), providing accuracy nearly independent of the number of terms n, and the variant has been adopted in modern implementations, such as Python's built-in sum function since version 3.12.

The Kahan summation algorithm, introduced in 1965, inspired subsequent advances in compensated summation aimed at mitigating floating-point rounding errors in higher-precision contexts. In the decades that followed, Kahan and other researchers extended the compensation principle toward multi-precision arithmetic and hardware support, influencing algorithms for exact floating-point computation.

A foundational related algorithm is Dekker's fast two-sum, which performs an error-free transformation that computes both the rounded sum and the exact rounding error of two floating-point numbers using only three floating-point operations, assuming no underflow occurs and that one operand is at least as large in magnitude as the other. Developed by T.J. Dekker in 1971, this method simulates double-length precision from single-length floating-point arithmetic by isolating the error term, providing a building block for compensated summations of longer sequences without accumulating multiple rounding errors.

Extending these error-free ideas, David M. Priest's double-double arithmetic represents numbers as unevaluated sums of two double-precision floating-point values (high and low components), achieving effective quadruple precision with a relative error bound of O(u²), where u is the unit roundoff of double precision. Introduced in 1991, Priest's approach applies compensation during the addition and multiplication of these pairs to maintain exactness in intermediate results, enabling accurate summation of floating-point sequences by propagating rounding errors into the low-order component rather than discarding them. This technique has been widely adopted in libraries for extended-precision computation, such as QD, to handle operations beyond standard precision.

In a hardware-oriented direction, the Kulisch accumulator addresses summation accuracy through a dedicated fixed-point register wide enough to span the full exponent range of floating-point products, allowing exact accumulation of dot products without intermediate rounding. Proposed by Ulrich Kulisch beginning in the 1970s, this design contrasts with software compensation by spending silicon resources (a 4288-bit accumulator suffices for double-precision terms) to store the precise sum of up to thousands of terms before a single final rounding to floating-point format. Early implementations, such as the XPA 3233 coprocessor developed by Kulisch's team in 1993, demonstrated its feasibility for high-speed exact dot products in scientific computing, reducing the accumulation error to zero at the cost of additional hardware.
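Dekker's fast two-sum, described above, is small enough to show in full. The following C sketch (valid under round-to-nearest IEEE 754 arithmetic when |a| >= |b|) returns the rounded sum together with the exact error it incurred:

#include <stdio.h>
#include <math.h>

/* Dekker's Fast2Sum: requires |a| >= |b|. Afterwards, s + *err equals
   a + b exactly, where s is the rounded floating-point sum. */
static double fast_two_sum(double a, double b, double *err) {
    double s = a + b;    /* rounded sum */
    double z = s - a;    /* the part of b that actually made it into s */
    *err = b - z;        /* the part of b that was rounded away */
    return s;
}

int main(void) {
    double err;
    double s = fast_two_sum(1.0, ldexp(1.0, -60), &err);   /* 1 + 2^-60 */
    printf("rounded sum: %.17g\n", s);                     /* 1 */
    printf("exact error: %.17g\n", err);                   /* 2^-60, recovered exactly */
    return 0;
}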

Alternative Techniques

Recursive or Pairwise Summation

Pairwise summation and recursive summation are non-compensated techniques that improve the accuracy of floating-point sums by reorganizing the order of additions, rather than relying on sequential accumulation as in naive summation, whose error can grow as O(n ε), where n is the number of terms and ε is the machine epsilon. These methods exploit a divide-and-conquer strategy, pairing terms recursively and forming a reduction tree that halves the number of summands at each level until a single result remains. In pairwise summation, the elements are grouped into pairs, each pair is summed, and the results of those sums are then paired and summed recursively over approximately log₂(n) stages; this reduces the rounding-error growth to O(log n · ε), a significant improvement over the linear error growth of naive methods for large n. For instance, with six terms x₁ through x₆, the summation proceeds as ((x₁ + x₂) + (x₃ + x₄)) + (x₅ + x₆), with further pairing if needed. The error bound for this approach is |error| ≤ γ_{⌊log₂ n⌋} ∑ |x_i|, where γ_k ≈ k ε for small k ε, confirming the logarithmic dependence. Recursive summation operates similarly as a divide-and-conquer procedure, recursively splitting the array into halves, summing each half independently, and then adding the two partial sums; this mirrors pairwise summation but emphasizes balanced partitioning for efficiency. A pseudocode sketch for recursive summation is as follows:
function recursive_sum(arr):
    if length(arr) ≤ 1:
        return arr[0] if arr else 0
    mid = length(arr) // 2
    left_sum = recursive_sum(arr[0:mid])
    right_sum = recursive_sum(arr[mid:])
    return left_sum + right_sum
This implementation achieves the same O(log n · ε) error bound as pairwise summation, with the recursion depth determining the error factor. These techniques offer key advantages: they add no floating-point operations beyond the n - 1 additions required for any summation, making them computationally efficient and well suited to parallelization on architectures supporting tree reductions, and they perform particularly well when n is a power of two, avoiding uneven partitioning. Their limitations include error growth that, while only O(log n), still increases with n and remains worse than the effectively constant O(1) error growth of compensated methods like Kahan summation, as well as some inefficiency for non-power-of-two lengths due to the handling of odd-sized subarrays.
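A concrete C version of the same idea is sketched below; the base-case size of 8 is an arbitrary illustrative choice (practical implementations typically sum a larger block directly to amortize recursion overhead):

#include <stdio.h>
#include <stddef.h>

/* Pairwise (divide-and-conquer) summation: split, sum the halves, add. */
static double pairwise_sum(const double *x, size_t n) {
    if (n <= 8) {                       /* base case: plain sequential loop */
        double s = 0.0;
        for (size_t i = 0; i < n; i++) s += x[i];
        return s;
    }
    size_t mid = n / 2;
    return pairwise_sum(x, mid) + pairwise_sum(x + mid, n - mid);
}

int main(void) {
    double x[16];
    for (int i = 0; i < 16; i++) x[i] = 1.0 / (i + 1);   /* partial harmonic series */
    printf("pairwise sum: %.17g\n", pairwise_sum(x, 16));
    return 0;
}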

Block and Streaming Methods

Block summation techniques address the challenge of summing very large arrays, where the number of elements n far exceeds what a single accumulator can absorb without significant rounding error, by partitioning the data into manageable chunks that balance accuracy and computational efficiency. The input array is divided into blocks of size b, each block is summed using a high-accuracy method such as Kahan compensated summation to minimize rounding error within the block, and the resulting partial sums are then combined using a stable strategy, such as pairwise summation, to produce the final total. This approach achieves a backward error bound of approximately (b + 1)u + O(u²), where u is the unit roundoff, making the error largely independent of n to first order and yielding more correct digits than naive recursive summation for large datasets.

For continuous or effectively unbounded data streams, as in online analytics, streaming algorithms enable summation without requiring the entire dataset to be available up front. These algorithms process elements one at a time, maintaining a running total in constant memory while producing results that match the correctly rounded sum of the inputs. A notable example is the online exact-summation method published as ACM TOMS Algorithm 908, which improves on Kahan summation by using only about five floating-point operations per summand while guaranteeing correctness even in the presence of intermediate overflow. This approach is particularly suited to streaming applications where data arrive incrementally, delivering exact results roughly 2-3 times slower than basic recursive summation but faster than other accurate alternatives for streams longer than about 10,000 elements.

Compensated block methods extend these ideas into hybrid approaches for parallel and distributed environments, applying Kahan summation within each block and combining the block sums in a multi-level reduction. Error-free transformations and self-compensated summation make the results reproducible across SIMD, multi-core, and cluster-level parallelism, which matters in applications where the non-associativity of floating-point addition must be mitigated. For instance, multi-level block summation with Kahan compensation has achieved linear speedup on multi-core processors and up to 76% parallel efficiency on 32-node MPI clusters, making it viable for large-scale numerical tasks in distributed settings.

These methods involve trade-offs among memory usage, accuracy, and speed, particularly when n is very large. Block approaches reduce overhead compared to keeping all elements available at once but may sacrifice some accuracy if the block size b is too large, while streaming variants prioritize constant memory at the cost of slightly more work per element. Overall, they provide robust alternatives to pure Kahan summation for data-intensive scenarios, enhancing accuracy without excessive overhead.
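The block strategy from the first paragraph can be sketched in a few lines of C; the block size and the way the partial sums are combined (compensated accumulation here, pairwise combination in more careful implementations) are illustrative choices rather than fixed prescriptions:

#include <stdio.h>
#include <stddef.h>

/* Kahan-compensated sum of a single block. */
static double kahan_block(const double *x, size_t n) {
    double s = 0.0, c = 0.0;
    for (size_t i = 0; i < n; i++) {
        double y = x[i] - c;
        double t = s + y;
        c = (t - s) - y;
        s = t;
    }
    return s;
}

/* Blocked summation: compensate within blocks of size b, then combine the
   relatively few per-block partial sums, again with compensation. */
static double block_sum(const double *x, size_t n, size_t b) {
    double s = 0.0, c = 0.0;
    for (size_t start = 0; start < n; start += b) {
        size_t len = (n - start < b) ? (n - start) : b;
        double partial = kahan_block(x + start, len);
        double y = partial - c;
        double t = s + y;
        c = (t - s) - y;
        s = t;
    }
    return s;
}

int main(void) {
    enum { N = 100000 };
    static double x[N];
    for (size_t i = 0; i < N; i++) x[i] = 0.1;
    printf("blocked sum: %.15f\n", block_sum(x, N, 1024));   /* close to 10000 */
    return 0;
}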

Implementation Challenges

Compiler and Optimization Issues

Aggressive floating-point optimizations in modern compilers can undermine the Kahan summation algorithm by altering the precise order of operations required for error compensation. Flags that enable reassociation, such as GCC's -ffast-math, permit the compiler to reorder additions and subtractions, effectively nullifying the role of the compensation variable c, which relies on sequential evaluation to capture and reinject the lost precision. Such transformations can reduce the algorithm to a naive summation, amplifying rounding errors in sequences with varying magnitudes. The most direct hazard is algebraic simplification of the compensation step: since c = (t - S) - y is identically zero in exact arithmetic, an optimizer allowed to treat floating-point addition as associative may eliminate it entirely, yet it is precisely the intermediate rounding of the separate operations that isolates the lost bits for later recovery. Contraction into fused multiply-add (FMA) instructions can similarly change which intermediate results are rounded. While efficient, such transformations violate the algorithm's reliance on non-associative floating-point semantics and degrade accuracy in error-sensitive applications.

To mitigate these issues, developers must enforce floating-point models that prohibit reassociation. In GCC and Clang, the -fno-associative-math flag preserves the required operation order without broadly disabling optimizations. Similarly, MSVC's /fp:strict option ensures conforming behavior by respecting the IEEE 754 standard's non-associativity. Intrinsics or inline assembly can further protect sensitive computations, though they increase code complexity. For Fortran, the Intel compiler's -assume protect_parens option safeguards parenthesized expressions like those in Kahan summation.

Historically, such issues also arose at the language level: early Java's floating-point semantics drew criticism from Kahan himself for making carefully compensated code behave unpredictably across JVMs. In Fortran compilers of the era, similar reassociation problems meant that Kahan summation could only be relied upon in unoptimized builds (-O0) until protective options such as -assume protect_parens were introduced. Modern compilers offer configurable strict modes, but vigilance remains essential for reliable compensated summation.

Library and Language Support

The Julia programming language provides support for compensated summation through the KahanSummation.jl package, which implements the Kahan-Babuška-Neumaier (KBN) variant via functions such as sum_kbn and cumsum_kbn for enhanced accuracy in summing arrays. Julia's base sum function, however, defaults to pairwise summation rather than Kahan summation. Additionally, the AccurateArithmetic.jl package offers compensated summation algorithms, including Kahan-Babuška-Neumaier, as alternatives to naive summation for high-precision needs. In Python, the standard library's math.fsum function employs Shewchuk's algorithm to sum sequences of floating-point numbers accurately, mitigating rounding errors through a technique distinct from Kahan summation. NumPy's sum function uses pairwise summation for improved accuracy but does not natively implement Kahan summation; third-party libraries such as accupy provide Kahan-based methods for multidimensional arrays.

The Boost C++ Libraries include support for Kahan summation in the Accumulators framework, with the sum_kahan statistic reducing numerical errors in sequential sums through dedicated implementation classes. Among machine learning frameworks, TensorFlow Probability offers reduce_kahan_sum for reducing tensors along specified axes using Kahan summation, returning both the sum and the correction term. For PyTorch, the optimi library integrates Kahan summation into optimizers for low-precision training, such as with BFloat16, to maintain accuracy in gradient computations and reduce memory usage.

In Rust, the kahan crate provides a type for computing Kahan sums over iterators of floating-point numbers, enabling accurate accumulation. The nalgebra linear algebra library does not include built-in Kahan support, requiring users to integrate external crates for such operations. Java lacks built-in Kahan summation in its standard libraries, necessitating manual implementations or reliance on specialized frameworks like Apache Flink's CompensatedSum aggregator, which applies the algorithm for distributed computations. For GPU computing, CUDA supports Kahan summation through software implementations, such as atomic operations adapted for compensated accumulation in parallel reductions, though no native hardware instructions are provided. BLAS and LAPACK routines, commonly used in numerical libraries, typically do not employ Kahan summation internally, favoring plain or pairwise summation for performance.
