
Machine epsilon

Machine epsilon, denoted ε_mach or simply ε, is the smallest positive floating-point number such that 1 + ε is distinguishable from 1 in the floating-point arithmetic of a given computing system, serving as a measure of the relative precision inherent in floating-point representations. It quantifies the maximum relative error introduced when representing real numbers as floating-point approximations, typically arising from the finite number of bits allocated to the significand in formats like IEEE 754. This value is fundamental in numerical analysis, as it bounds the error in arithmetic operations and informs the design of stable algorithms.

In the IEEE 754 standard, which defines the most widely used binary floating-point formats, machine epsilon for single precision (32 bits, 24-bit significand) is 2^{-23} ≈ 1.19 × 10^{-7}, while for double precision (64 bits, 53-bit significand) it is 2^{-52} ≈ 2.22 × 10^{-16}. These values correspond to the distance from 1 to the next representable floating-point number greater than 1, also known as the unit in the last place (ulp) at 1. The unit roundoff u, defined as u = ε/2, represents the maximum relative error for rounding to nearest, so machine epsilon is twice this quantity in symmetric rounding modes. Machine epsilon can be computed programmatically by iteratively halving a value δ starting from 1 until 1 + δ equals 1, highlighting its dependence on the system's arithmetic implementation.

The concept is crucial for assessing the accuracy and stability of numerical computations, particularly in error propagation analysis, where floating-point operations introduce perturbations bounded by multiples of ε. For instance, the computed result fl(x + y) of an addition satisfies |fl(x + y) - (x + y)| ≤ u |x + y|, ensuring predictable error growth in algorithms. While machine epsilon provides an upper bound on representation errors near 1, the relative precision varies across the floating-point range because the spacing depends on the exponent, being ulp(x)/|x| for numbers x away from powers of the radix. Its role extends to validating software implementations of floating-point standards and guiding the choice of tolerance settings for iterative solvers and convergence tests in scientific computing.
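
As a quick illustration, these values can be checked from a standard CPython interpreter, whose float type is an IEEE 754 binary64; the lines below are a minimal sketch assuming Python 3.9 or later (for math.ulp):

import sys, math

eps = sys.float_info.epsilon       # interval machine epsilon of the float type
print(eps == 2.0 ** -52)           # True: binary64 has a 53-bit significand
print(math.ulp(1.0) == eps)        # True: eps is the ulp at 1.0
print(1.0 + eps > 1.0)             # True: 1 + eps is distinguishable from 1
print(1.0 + eps / 2 == 1.0)        # True: 1 + eps/2 rounds back to 1 (ties to even)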

Floating-Point Arithmetic Basics

Binary Floating-Point Representation

Binary floating-point numbers represent real numbers in computer systems using a fixed number of bits divided into three components: a sign bit, an exponent field, and a significand field (also known as the mantissa). The sign bit, a single bit, indicates whether the number is positive (0) or negative (1). The exponent field consists of several bits that encode the scale of the number as a power of 2, while the significand field stores the significant digits of the number in binary.

In normalized binary floating-point representation, as defined by the IEEE 754 standard, the value x of a finite non-zero number is given by the formula x = (-1)^s \times m \times 2^{e - \text{bias}}, where s is the sign bit (0 or 1), m is the significand interpreted as a value between 1 (inclusive) and 2 (exclusive) with an implied leading 1 followed by the stored fraction bits, e is the stored biased exponent, and bias is a constant specific to the format (e.g., 127 for single precision). This normalization ensures that the significand always starts with a leading 1 in binary, maximizing precision by avoiding leading zeros.

The unit in the last place (ulp) refers to the spacing or gap between two consecutive representable floating-point numbers in the same binade (range between powers of 2), equivalent to the value of the least significant bit of the significand for a given exponent. It provides a measure of the precision limits inherent in the representation, as the ulp varies with the magnitude of the number: larger exponents result in larger ulps, reflecting coarser spacing for bigger values.

For example, the number 1.0 in single-precision floating-point (32 bits total) is represented with sign bit 0, biased exponent 127 (actual exponent 0), and fraction bits all 0 (implied leading 1, so m = 1.0). In binary, this is 0 01111111 00000000000000000000000, or in hexadecimal 0x3F800000. The ulp at 1.0 is 2^{-23} \approx 1.192 \times 10^{-7}, the distance to the next representable number 1 + ulp.
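
The single-precision example can be inspected directly; the sketch below (plain Python with the standard struct module, no third-party dependencies) packs 1.0 as a binary32 value and checks the ulp at that point:

import struct

# View the IEEE 754 binary32 encoding of 1.0 as an unsigned 32-bit integer.
bits = struct.unpack(">I", struct.pack(">f", 1.0))[0]
print(hex(bits))                       # 0x3f800000
print(format(bits, "032b"))            # sign 0, exponent 01111111, fraction all zeros

# Incrementing the integer encoding by 1 gives the next representable binary32 value.
nxt = struct.unpack(">f", struct.pack(">I", bits + 1))[0]
print(nxt - 1.0 == 2.0 ** -23)         # True: the ulp at 1.0 in single precision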

Rounding in Arithmetic Operations

In floating-point arithmetic, basic operations such as addition, subtraction, multiplication, and division are performed by first computing an exact intermediate result, which is then rounded to the nearest representable value in the floating-point format to fit within the available precision. This rounding step is necessary because most real numbers and operation outcomes cannot be exactly represented due to the finite number of bits allocated for the significand. The IEEE 754 standard mandates correct rounding for these operations, ensuring the computed result is as close as possible to the mathematically exact value under the selected rounding mode.

The standard specifies five rounding modes to control how inexact results are approximated, with round-to-nearest (ties to even) as the default in most implementations. In round-to-nearest, the exact result is rounded to the closest representable floating-point number; if it lies exactly midway between two candidates, the tie is broken by rounding to the one with an even least significant bit in the significand. Alternative directed modes include round toward zero (truncation, which discards excess magnitude), round toward positive infinity (rounding up), and round toward negative infinity (rounding down); a fifth mode, round to nearest with ties away from zero, was added in the 2008 revision, primarily for decimal and statistical computations. These modes allow programmers to influence rounding behavior, for example in interval arithmetic or when a particular rounding direction must be preserved in financial calculations.

Rounding introduces an error whenever the exact result falls between two consecutive representable numbers; the spacing between those numbers at the given magnitude is the unit in the last place (ulp). In round-to-nearest mode, this rounding error is bounded by 0.5 ulp of the result, meaning the computed value deviates from the exact value by at most half the local spacing between representable points. The ulp itself varies with the exponent, being larger for bigger numbers, since the significand's fixed width makes the spacing scale with the magnitude.

A clear illustration of rounding error occurs in addition, where a small value added to a larger one may be effectively ignored if it falls below the resolution threshold. For instance, when a tiny positive number ε is added to 1.0, the exact sum is 1.0 + ε, but if ε is smaller than 0.5 ulp at 1.0, the sum rounds back to 1.0 in round-to-nearest mode, losing the small contribution entirely. This effect arises because the binary floating-point representation around 1.0 has a fixed ulp determined by the significand's width, as covered in the structure of binary floating-point numbers. Such losses highlight the limitations of finite precision and can propagate in chained operations, emphasizing the need for awareness in numerical algorithms.

The unit roundoff quantifies the maximum relative rounding error possible in these operations, providing a bound on how much the computed result can deviate from the exact one relative to the result's scale. In round-to-nearest, this relative error is at most 0.5 ulp normalized by the result's magnitude, serving as a fundamental measure of the arithmetic's faithfulness to real-number mathematics.
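
The absorption effect is easy to observe in double precision (p = 53), where 0.5 ulp at 1.0 is 2^{-53}; the following lines are a minimal Python sketch:

# 2^-53 is exactly half the ulp at 1.0 in binary64.
half_ulp = 2.0 ** -53
print(1.0 + half_ulp == 1.0)        # True: the halfway case rounds to the even neighbour 1.0
print(1.0 + 2.0 ** -52 == 1.0)      # False: a full ulp is preserved
print((1.0 + half_ulp) + half_ulp)  # 1.0: each small addend is absorbed in turn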

Core Definitions

Rounding Machine Epsilon

The rounding machine epsilon, also known as the unit roundoff and denoted u, represents the maximum relative error introduced when representing a real number in a floating-point system or performing operations that round to the nearest representable value. In binary floating-point arithmetic, it is formally defined as u = 2^{-p}, where p is the precision, i.e., the number of bits in the significand (including the implicit leading bit). This is half the machine epsilon \epsilon = 2^{1-p}.

This definition arises within the standard model of floating-point arithmetic, which assumes that every operation rounds the exact result to the nearest representable number, with ties broken in a specified manner (such as round-to-nearest-even). Under this model, the computed result \mathrm{fl}(x \oplus y) of an operation \oplus satisfies \mathrm{fl}(x \oplus y) = (x \oplus y)(1 + \delta), where |\delta| \leq u, bounding the relative error.

The value of u can be derived directly from the unit in the last place (ulp) at 1, which is \epsilon = 2^{1-p}; thus, u = \epsilon / 2. In binary floating-point representation, the representable numbers near 1 have a spacing of 2^{1-p}, so the maximum rounding error is half this spacing relative to the number. The concept of the rounding machine epsilon (unit roundoff) originated in early numerical analysis literature for bounding errors in computational processes, notably in Wilkinson's seminal work on rounding error analysis. Note that terminology varies: while some sources use "rounding machine epsilon" for u, the more common term is "unit roundoff", with "machine epsilon" referring to the full spacing \epsilon.
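
The bound |\delta| \leq u can be checked for a concrete operation by computing the exact result with rational arithmetic; the sketch below uses Python's fractions module and assumes binary64 floats (u = 2^{-53}):

from fractions import Fraction

U = Fraction(1, 2 ** 53)                  # unit roundoff u for binary64

x, y = 0.1, 0.2                           # already-rounded inputs; only the addition is tested
exact = Fraction(x) + Fraction(y)         # exact sum of the two stored doubles
computed = Fraction(x + y)                # fl(x + y), converted exactly to a rational
delta = abs(computed - exact) / exact     # relative error of the single addition
print(delta <= U)                         # True: |delta| <= u, as the standard model promises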

Interval Machine Epsilon

The interval machine epsilon, denoted \epsilon_{\text{mach}}, is defined as the smallest positive floating-point number \epsilon > 0 such that the floating-point representation \mathrm{fl}(1 + \epsilon) > 1, where \mathrm{fl} denotes the floating-point mapping function. This value captures the gap between 1 and the next larger representable number in the floating-point system, and is also known as the standard machine epsilon. In binary floating-point formats with a p-bit significand (including the implicit leading 1), the interval machine epsilon equals the unit in the last place (ulp) at 1.0, expressed as \epsilon = 2^{1-p}. This spacing arises because representable numbers near 1 are separated by increments of 2^{-(p-1)}, reflecting the granularity of the significand in the normalized representation.

It differs from the rounding machine epsilon (unit roundoff u), which bounds the maximum relative error in rounding operations and equals half the interval machine epsilon in round-to-nearest mode (u = \epsilon / 2). The interval variant emphasizes the representable gap between 1 and its successor, while the rounding variant focuses on error propagation in computations. Note that "machine epsilon" typically refers to this interval value in most literature and programming languages (e.g., DBL_EPSILON in C). One practical method to approximate the machine epsilon empirically involves iteratively halving a candidate value until its addition to 1 becomes indistinguishable:
eps = 1.0
while (1.0 + eps != 1.0):
    eps = eps / 2.0
eps = eps * 2.0  # adjust back to the last value distinguishable from 1
This exploits floating-point rounding to identify the threshold where 1 + \epsilon = 1 in the machine arithmetic.

Computation Techniques

Theoretical Derivation

In binary floating-point arithmetic, a normalized number is represented as x = \pm (1.f) \times 2^e, where f is the fraction field consisting of p-1 bits, e is the exponent, and the leading bit is implicitly 1, making the significand have p bits in total. For numbers in the interval [1, 2), the exponent e = 0, and the significand ranges from 1.000\ldots_2 (which is 1) to 1.111\ldots_2 = 2 - 2^{-(p-1)}. The representable values are thus 1 + k \times 2^{-(p-1)} for integer k = 0, 1, \dots, 2^{p-1} - 1. The spacing between consecutive representable numbers in this binade, known as the unit in the last place (ulp), is therefore 2^{-(p-1)}.

The interval machine epsilon, denoted \epsilon, is defined as this ulp value at 1, representing the smallest positive floating-point number such that 1 + \epsilon > 1 in the arithmetic. Thus, \epsilon = 2^{-(p-1)} = 2^{1-p}. This derivation assumes an IEEE 754-like model with binary base (\beta = 2), normalized representations (subnormal numbers lie far from 1 and do not affect the result), and no overflow or underflow near 1.0. The formula holds specifically for binary floating-point but generalizes to \epsilon = \beta^{1-p} for base-\beta systems with p-digit significands, such as decimal formats where the spacing differs accordingly.
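
The spacing argument can be verified numerically for binary64 (p = 53) with math.nextafter, available in Python 3.9 and later; this is only a spot check of the derivation:

import math

# The next double after 1.0 lies one ulp away: 2^{-(p-1)} = 2^-52.
print(math.nextafter(1.0, math.inf) - 1.0 == 2.0 ** -52)   # True

# Just below 1.0 we are in the binade [0.5, 1), so the spacing is half as large.
print(1.0 - math.nextafter(1.0, 0.0) == 2.0 ** -53)        # True

# Successive representable values in [1, 2) are exactly 1 + k * 2^-52.
x = 1.0
for k in range(4):
    assert x == 1.0 + k * 2.0 ** -52
    x = math.nextafter(x, math.inf)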

Practical Approximation

A practical method for approximating machine epsilon in software involves an iterative algorithm that empirically determines the smallest positive floating-point number \epsilon such that 1 + \epsilon > 1 in the given arithmetic environment. This approach begins with an initial value of \epsilon = 1.0 and repeatedly halves it until the condition 1 + \epsilon = 1 holds true in floating-point evaluation, at which point \epsilon is doubled to recover the boundary value. The algorithm can be expressed in pseudocode as follows:
eps = 1.0
while (1.0 + eps > 1.0):
    eps = eps / 2.0
eps = eps * 2.0
This yields the machine epsilon for the prevailing floating-point format. The method avoids underflow by starting from a value well above the smallest representable number and converging toward machine epsilon through division, ensuring the loop terminates long before reaching denormalized or zero values. It is portable and effective across common programming languages, including C, Python, and MATLAB, where it can be implemented directly without reliance on predefined constants.

However, results may vary slightly due to compiler optimizations that rearrange expressions or apply algebraic simplifications, potentially treating 1 + \epsilon > 1 as \epsilon > 0 and yielding an incorrect (too small) value. Additionally, on architectures with extended-precision registers, such as the x87 unit on x86, intermediate computations might use higher precision, leading to a larger apparent precision (and thus a smaller measured epsilon) unless strict floating-point modes or volatile qualifiers are employed to enforce the target precision.
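
The pseudocode translates directly into a small generic routine. The sketch below is one possible Python rendering; the NumPy types are an assumption used only to show that the measured value tracks the format of the starting value:

import numpy as np

def measure_epsilon(one):
    """Empirically find the interval machine epsilon for the format of `one`."""
    eps = one
    while one + eps != one:       # loop while 1 + eps is still distinguishable from 1
        eps = eps / 2
    return eps * 2                # undo the final halving that made the sum collapse

print(measure_epsilon(1.0))               # 2.220446049250313e-16  (2^-52, binary64)
print(measure_epsilon(np.float32(1.0)))   # ~1.1920929e-07          (2^-23, binary32)
print(measure_epsilon(np.float16(1.0)))   # ~0.000977               (2^-10, binary16)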

Standard Values and Formats

IEEE 754 Single and Double Precision

The IEEE 754 standard for binary floating-point arithmetic, initially published in 1985 and subsequently revised in 2008 and 2019, specifies formats that dominate modern computing hardware, including the x86 and ARM processor architectures. These formats ensure consistent representation and operations across systems, with single and double precision being the most prevalent for general-purpose computations.

In single precision (binary32), the 32-bit format allocates 1 bit for the sign, 8 bits for the biased exponent, and 23 bits for the fraction, yielding an effective precision of 24 bits due to the implicit leading 1 in normalized numbers. The machine epsilon, defined as the smallest positive value \epsilon such that 1 + \epsilon > 1 in floating-point representation, is \epsilon = 2^{-23} \approx 1.192 \times 10^{-7}. This value represents the unit in the last place (ulp) at 1.0, bounding the spacing between representable numbers in the vicinity of 1.

Double precision (binary64) employs 64 bits: 1 sign bit, 11 exponent bits, and 52 fraction bits, for an effective precision of 53 bits including the implicit 1. Here, the machine epsilon is \epsilon = 2^{-52} \approx 2.220 \times 10^{-16}, again corresponding to the ulp at 1.0. The unit roundoff u, which quantifies the maximum relative error in round-to-nearest mode, is half the machine epsilon, or u = \epsilon / 2. The following table summarizes these key parameters for comparison:
Format | Total Bits | Precision p | Machine Epsilon \epsilon | Unit Roundoff u | ULP at 1.0
Single (binary32) | 32 | 24 | 2^{-23} \approx 1.192 \times 10^{-7} | 2^{-24} \approx 5.960 \times 10^{-8} | 2^{-23}
Double (binary64) | 64 | 53 | 2^{-52} \approx 2.220 \times 10^{-16} | 2^{-53} \approx 1.110 \times 10^{-16} | 2^{-52}
These values are derived directly from the significand structure in the standard and are fundamental to assessing errors in numerical algorithms.
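
The table entries can be cross-checked against a library's format metadata; the snippet below assumes NumPy is installed and uses its finfo introspection:

import numpy as np

# nmant is the number of stored fraction bits; the precision p adds the implicit bit.
for dtype, p in [(np.float32, 24), (np.float64, 53)]:
    info = np.finfo(dtype)
    print(dtype.__name__,
          info.nmant + 1 == p,           # True: stored bits + implicit 1 = p
          info.eps == 2.0 ** (1 - p))    # True: machine epsilon = 2^(1-p)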

Other Common Formats

Half-precision floating-point, also known as binary16 in the IEEE 754-2008 standard, utilizes 11 bits for the significand (including an implicit leading 1), resulting in a machine epsilon of 2^{-10} \approx 9.766 \times 10^{-4}. This format is commonly employed in graphics processing and machine learning applications where reduced storage is beneficial, though it offers lower precision compared to single- or double-precision formats.

Quadruple-precision floating-point, or binary128, extends the significand to 113 bits under IEEE 754-2008, yielding a machine epsilon of 2^{-112} \approx 1.926 \times 10^{-34}. It provides exceptionally high accuracy for scientific simulations requiring minimal rounding errors.

Decimal floating-point formats, introduced in IEEE 754-2008, operate in base 10 rather than base 2, altering the nature of spacing and rounding. For decimal32, which supports 7 digits of precision, the machine epsilon is approximately 10^{-6}, representing the unit in the last place around 1.0. This base-10 structure avoids some binary-to-decimal conversion issues in financial computations, though it generally results in larger epsilons than equivalent binary formats due to the difference in radix.

Historical hexadecimal floating-point formats, used in IBM System/360 and later mainframes, employ base 16 with a significand of 6 hexadecimal digits (24 bits) in single precision. The ulp at 1.0 is 16^{-5} = 2^{-20} \approx 9.537 \times 10^{-7}, and the base-16 normalization produces uneven relative spacing. This variable relative precision across binades (arising from normalization to a non-zero leading hexadecimal digit rather than a leading binary 1) complicates error analysis compared to IEEE 754 binary formats, though the design was chosen for hardware efficiency in its era.
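
Two of these formats are easy to probe from Python: binary16 via NumPy (assumed installed) and a decimal32-like arithmetic via the standard decimal module with the precision set to 7 digits:

import numpy as np
from decimal import Decimal, getcontext

# Half precision: 11-bit significand, so the interval epsilon is 2^-10.
print(np.finfo(np.float16).eps == 2.0 ** -10)        # True

# Emulate decimal32's 7 significant digits; the gap above 1 is then about 1e-6.
getcontext().prec = 7
print(Decimal(1) + Decimal("1e-7") == Decimal(1))    # True: absorbed, below the ulp at 1
print(Decimal(1) + Decimal("1e-6") == Decimal(1))    # False: one ulp above 1 survives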

Error Analysis Connections

The relative rounding error in floating-point arithmetic quantifies the deviation between the exact result of an operation and its computed approximation, providing a measure of precision loss due to finite representation. For a generic arithmetic operation \oplus, the computed result satisfies \left| \mathrm{fl}(x \oplus y) - (x \oplus y) \right| / \left| x \oplus y \right| \leq \epsilon / 2, where \mathrm{fl}(\cdot) denotes the floating-point evaluation and \epsilon is the machine epsilon. This bound arises from the IEEE 754 standard's requirement for correctly rounded operations, ensuring the result is as if the exact value were computed and then rounded to the nearest representable number. Machine epsilon thus serves as an upper limit on the worst-case relative accuracy of basic operations, with \epsilon / 2 acting as the unit roundoff that scales the potential error.

In practice, this connection implies that each operation introduces a small, controlled error proportional to the input magnitudes, allowing analysts to predict and bound error propagation in more complex computations. An illustrative example occurs in the summation of n positive numbers, where naive recursive summation accumulates errors that grow proportionally to n \epsilon, potentially magnifying inaccuracies for large n. This highlights how machine epsilon informs error growth in sequential operations, as each addition contributes an error bounded by \epsilon / 2 times the current partial sum. Understanding this link guides the selection of numerical algorithms, favoring those that mitigate error accumulation, such as compensated summation or pairwise summation, and keep the total relative error close to \epsilon rather than allowing it to scale with problem size.
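
The contrast between naive and compensated summation shows up in a short experiment; the sketch below is plain Python, uses math.fsum as a correctly rounded reference, and the Kahan routine is a standard textbook formulation rather than anything prescribed by the text above:

import math

def kahan_sum(values):
    """Compensated (Kahan) summation: carries the rounding error of each addition forward."""
    total = 0.0
    comp = 0.0                      # running compensation for lost low-order bits
    for v in values:
        y = v - comp
        t = total + y
        comp = (t - total) - y      # what was rounded away when forming t
        total = t
    return total

values = [0.1] * 1_000_000
reference = math.fsum(values)       # exact sum of the stored doubles, rounded once
naive_err = abs(sum(values) - reference) / reference
kahan_err = abs(kahan_sum(values) - reference) / reference
print(naive_err)                    # orders of magnitude above machine epsilon
print(kahan_err)                    # typically 0, and at worst a few multiples of epsilon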

Mathematical Proof

To rigorously establish the connection between machine epsilon and the maximum relative rounding error in floating-point arithmetic, consider the standard model defined by the IEEE 754 standard, which uses binary representation with base \beta = 2, a significand of p bits (including the implicit leading 1 for normalized numbers), and rounding to nearest (ties to even).

Theorem. For any x in the normalized range of the floating-point system (i.e., x_{\min} \leq |x| \leq x_{\max}, excluding subnormals for simplicity), the relative error incurred by rounding x to the nearest floating-point number \mathrm{fl}(x) satisfies \frac{|\mathrm{fl}(x) - x|}{|x|} \leq u = 2^{-p}, where u is the unit roundoff. The machine epsilon is defined as \epsilon = 2u = 2^{1-p}, so the maximum relative error is bounded by \epsilon/2.

Proof. Assume rounding to nearest, so the absolute error satisfies |\delta| = |\mathrm{fl}(x) - x| \leq \frac{1}{2} \cdot \mathrm{ulp}(x), where \mathrm{ulp}(x) is the unit in the last place of x, representing the spacing between consecutive floating-point numbers at the magnitude of x. For a normalized x with exponent e (such that 2^e \leq |x| < 2^{e+1}), \mathrm{ulp}(x) = 2^{e - (p-1)}. Thus, |\delta| \leq 2^{e - p}. The relative error is then \frac{|\delta|}{|x|} \leq \frac{2^{e - p}}{2^e} = 2^{-p} = u, since |x| \geq 2^e. This bound holds uniformly because the relative spacing \mathrm{ulp}(x)/|x| is maximized when |x| is just above a power of two (i.e., at the lower end of a binade), where it equals 2^{1-p}, but the rounding error is at most half that spacing, yielding the factor of 1/2.

To derive \epsilon, note that near 1 (where e = 0), \mathrm{ulp}(1) = 2^{-(p-1)} = 2^{1-p}, so the smallest \epsilon > 0 such that \mathrm{fl}(1 + \epsilon) > 1 is this ulp value, confirming \epsilon = 2^{1-p} = 2u. The maximum relative error occurs at points midway between representable numbers, where the bound u is attained in the limit. This completes the proof, as the bound is tight and independent of the magnitude of x within the normalized range.
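
The theorem can be spot-checked numerically by rounding exact rationals to binary64 and measuring the relative error with exact arithmetic; the sketch below relies on CPython's correctly rounded conversion of a Fraction to a float:

from fractions import Fraction

U = Fraction(1, 2 ** 53)            # unit roundoff u = 2^-p for binary64 (p = 53)

def rounding_rel_error(x):
    """Relative error of rounding the exact rational x to the nearest double."""
    rounded = Fraction(float(x))    # float(x) rounds x correctly to nearest
    return abs(rounded - x) / abs(x)

samples = [Fraction(1, 3), Fraction(2, 3), Fraction(1, 10),
           Fraction(1) + Fraction(1, 2 ** 53)]    # last one sits midway, near the worst case
print(all(rounding_rel_error(x) <= U for x in samples))   # True: |fl(x) - x| / |x| <= u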
