
Round-off error

Round-off error refers to the discrepancy between the exact mathematical value of a quantity and its approximation in floating-point arithmetic, arising from the finite precision available in computer representations of real numbers. This error is inherent to digital computation, where real numbers are encoded using a fixed number of bits, leading to inexact representations and the need for rounding during operations. In systems adhering to the IEEE 754 standard, floating-point numbers typically use formats such as single precision (32 bits) or double precision (64 bits), with the latter providing about 15 decimal digits of accuracy.

The primary causes of round-off error include the inability to represent most real numbers exactly in binary (for example, the decimal fraction 0.1 has a repeating binary expansion) and the rounding that occurs when the exact result of an operation requires more digits than the available precision. For instance, basic operations like addition or multiplication introduce an error bounded by half a unit in the last place (ulp), often quantified relative to the machine epsilon (ε), the smallest positive value such that the computed 1 + ε exceeds 1, approximately 2^{-52} (or 2.22 × 10^{-16}) for double precision. These errors can accumulate over multiple operations, potentially magnifying in processes like iterative algorithms or summations, where naive sequential summation of n terms may incur up to nε relative error.

A notable implication of round-off error is catastrophic cancellation, where subtraction of two nearly equal numbers leads to significant loss of precision; for example, in solving the quadratic equation ax² + bx + c = 0, the expression -b + √(b² - 4ac) suffers severe cancellation when b² is much larger than 4ac, since √(b² - 4ac) ≈ |b|. Mitigation strategies include using higher-precision arithmetic, compensated algorithms like Kahan summation (which bounds summation error to roughly 2ε regardless of n), and careful ordering of operations to minimize cancellation. The IEEE 754 standard addresses consistency by mandating rounding modes (e.g., round to nearest) and features like guard digits to reduce subtraction errors. Overall, understanding and managing round-off error is crucial in numerical analysis, scientific computing, and engineering to ensure reliable results.
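To make the quadratic-formula cancellation concrete, here is a minimal Python sketch; the coefficient values and function names are illustrative assumptions, and the rearranged version is the standard cancellation-free variant, not a uniquely prescribed method:

```python
import math

# Naive quadratic formula: x = (-b ± sqrt(b^2 - 4ac)) / (2a).
# When b^2 >> 4ac, the "+" branch subtracts nearly equal quantities.
def roots_naive(a, b, c):
    d = math.sqrt(b * b - 4 * a * c)
    return (-b + d) / (2 * a), (-b - d) / (2 * a)  # (small root, large root)

# Cancellation-free variant: compute the larger-magnitude root first,
# then recover the small root from the product relation x1 * x2 = c / a.
def roots_stable(a, b, c):
    d = math.sqrt(b * b - 4 * a * c)
    q = -(b + math.copysign(d, b)) / 2   # adds quantities of the same sign
    return q / a, c / q                  # (large root, small root)

a, b, c = 1.0, 1e8, 1.0                  # b^2 is much larger than 4ac
print(roots_naive(a, b, c))              # small root off by roughly 25% here
print(roots_stable(a, b, c))             # both roots accurate to about 1 ulp
```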

Numerical Representation Basics

Representation Error

Representation error arises in numerical computing when a real number x cannot be exactly stored in a finite-precision number system, such as the floating-point format used in computers. This error is defined as the difference between the exact value x and its approximated floating-point representation \mathrm{fl}(x), where \mathrm{fl}(x) is the closest representable number in the system's finite set of values. In finite-precision systems, only a finite subset of the real numbers can be represented exactly, leading to inherent discrepancies for most irrational or non-dyadic rational numbers.

A classic example occurs in decimal systems, where the fraction \frac{1}{3} = 0.333\ldots_{10} (repeating infinitely) must be truncated or rounded to a finite number of digits, such as 0.333, resulting in a representation error. This mirrors the challenge in binary floating-point systems, where decimal fractions like 0.1 cannot be expressed exactly because 0.1 requires an infinite repeating binary expansion (approximately 0.0001100110011\ldots_2). In binary floating-point with 24-bit precision (as in single precision), 0.1 is approximated as 1.10011001100110011001101_2 \times 2^{-4}, introducing a persistent small error that affects subsequent computations.

The magnitude of representation error is quantified using absolute and relative measures. The absolute representation error is |x - \mathrm{fl}(x)|, which depends on the scale of x and the system's precision. The relative representation error, \frac{|x - \mathrm{fl}(x)|}{|x|}, normalizes this difference and, under round-to-nearest, is bounded by the unit roundoff u = \frac{1}{2}\beta^{1-p}, providing a scale-invariant measure of accuracy. These errors are foundational to round-off issues, as even exact mathematical values become inexact upon storage.
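The exact value actually stored for 0.1 can be inspected in Python, whose floats are binary64 doubles; a short sketch using the standard fractions module recovers the absolute and relative representation errors exactly:

```python
from fractions import Fraction

x = 0.1
exact = Fraction(1, 10)
stored = Fraction(x)            # exact rational value of the double nearest 0.1

abs_err = abs(exact - stored)   # absolute representation error |x - fl(x)|
rel_err = abs_err / exact       # relative representation error

print(stored)                   # 3602879701896397/36028797018963968 (denominator 2^55)
print(float(abs_err))           # ~5.55e-18
print(float(rel_err))           # ~5.55e-17, below u = 2^-53 ≈ 1.11e-16
```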

Floating-Point Number System

Floating-point number systems in computers approximate real numbers through a structured format consisting of three primary components: a sign, an exponent, and a significand (also referred to as the mantissa). The sign, typically a single bit, determines whether the number is positive or negative, with 0 indicating positive and 1 indicating negative. The exponent, represented by a fixed number of bits, encodes the scale or magnitude of the number. The significand captures the significant digits, providing the precision of the representation. The general form of a floating-point number \mathrm{fl}(x) for a real number x is expressed as \mathrm{fl}(x) = \pm s \times b^{e}, where b is the base or radix of the system (commonly 2 or 10), s is the significand with 1 \leq s < b for normalized numbers (in binary, often written as 1 + m with fractional part 0 \leq m < 1), and e is the integer exponent. This formulation assumes a normalized representation, where the significand is scaled such that its leading digit is nonzero, maximizing the use of available digits for precision. In practice, the significand is stored as a fixed-length sequence of digits in base b, and the exponent adjusts the position of the radix point.

Normalized floating-point numbers require the leading digit of the significand to be nonzero, ensuring a canonical form that avoids redundant representations and optimizes precision within the allocated bits. For instance, a number written as 0.d_1 d_2 \dots \times b^e with d_1 \neq 0 is shifted to d_1.d_2 \dots \times b^{e-1}. Denormalized forms, conversely, occur when the exponent reaches its minimum value and the leading digit is zero, allowing gradual underflow by representing subnormal numbers with reduced precision near zero. This distinction helps mitigate abrupt transitions in representable values around the smallest normalized magnitude.

The representable range is bounded by the minimum and maximum exponents, e_{\min} and e_{\max}, limiting normalized numbers in magnitude to approximately [b^{e_{\min}}, b^{e_{\max}}]. Precision is constrained by the length of the significand, typically measured in digits or bits, which determines the smallest distinguishable relative difference between numbers of similar magnitude. Overflow arises when a value exceeds the largest finite number near b^{e_{\max}}, often resulting in a special infinity representation, while underflow occurs for values below b^{e_{\min}}, potentially flushing tiny results to zero or using denormalized forms for gradual precision loss. These bounds impose inherent limitations on the system's ability to capture arbitrary real numbers exactly.

Binary floating-point systems with base b = 2 predominate in modern computers owing to their hardware efficiency: binary representations facilitate simple shifts for exponent adjustments and align seamlessly with binary logic gates, reducing complexity in arithmetic units compared to higher-radix systems.
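Assuming the binary64 layout discussed in the IEEE 754 section below (1 sign bit, 11 exponent bits, 52 stored fraction bits, bias 1023), a small Python sketch can pull the three components out of a stored double; the helper name decompose is our own:

```python
import struct

def decompose(x):
    # Reinterpret the 64-bit double as an unsigned integer to slice the fields.
    bits = struct.unpack('<Q', struct.pack('<d', x))[0]
    sign     = bits >> 63                  # 1 sign bit
    exponent = (bits >> 52) & 0x7FF        # 11 exponent bits, biased by 1023
    fraction = bits & ((1 << 52) - 1)      # 52 stored significand bits
    return sign, exponent - 1023, fraction # unbias the exponent

# 0.1 is stored as (approximately) 1.6 x 2^-4, so its true exponent is -4.
print(decompose(0.1))    # (0, -4, 2702159776422298)
print(decompose(-2.0))   # (1, 1, 0): sign 1, exponent 1, significand 1.0
```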

Floating-Point Standards and Notation

Notation of Floating-Point Systems

Floating-point numbers are typically represented in a standardized mathematical form to approximate real numbers within a finite precision system. The general notation for a floating-point number x is given by x = \pm (d_0 . d_1 d_2 \dots d_{p-1})_\beta \times \beta^e, where \beta is the base (radix, often 2 or 10), p denotes the precision (number of digits in the significand), each d_i (for i = 0, 1, \dots, p-1) is a digit satisfying 0 \leq d_i < \beta, and e is the exponent, an integer within a defined range. This form mirrors scientific notation but is constrained to discrete values, leading to round-off errors when exact representation is impossible.

In normalized floating-point systems, the significand (or mantissa) is adjusted such that the leading digit d_0 is nonzero, ensuring 1 \leq d_0 . d_1 d_2 \dots d_{p-1} < \beta. This normalization eliminates leading zeros, providing a unique representation for nonzero numbers and maximizing precision. The exponent e ranges from e_{\min} to e_{\max}, which bound the smallest and largest representable magnitudes; underflow occurs for exponents below e_{\min}, and overflow above e_{\max}.

The unit in the last place, denoted \operatorname{ulp}(x), quantifies the spacing between consecutive representable floating-point numbers near x. For a normalized number x with exponent e, \operatorname{ulp}(x) = \beta^{e - p + 1}, representing the value of the least significant digit in the significand. This measure is crucial for assessing representation granularity and potential rounding discrepancies.

Under rounding to nearest, the unit roundoff u defines the maximum relative rounding error, expressed as u = \frac{1}{2} \beta^{1-p}. This bounds the error in approximating any real number by the nearest floating-point representation: |\operatorname{fl}(y) - y| \leq u |y| for a real y in the representable range and its floating-point approximation \operatorname{fl}(y). This notation underpins standards like IEEE 754, which specify concrete parameters for \beta, p, e_{\min}, and e_{\max}.
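These quantities are directly observable in Python, which exposes the ulp as math.ulp (available in Python 3.9+); the sketch below checks u = \frac{1}{2}\beta^{1-p} for binary64 and shows how ulp spacing scales with magnitude:

```python
import math
import sys

# For binary64 (beta = 2, p = 53): u = 0.5 * beta**(1 - p) = 2**-53.
u = 0.5 * 2.0 ** (1 - 53)
print(u == sys.float_info.epsilon / 2)   # True: u is half of machine epsilon

# math.ulp(x) gives the spacing beta**(e - p + 1) of doubles near x.
for x in (1.0, 1e6, 1e-6):
    print(x, math.ulp(x))                # absolute spacing grows with |x|

# The relative spacing ulp(x) / |x| stays within [2**-53, 2**-52],
# which is why rounding error is naturally measured relatively.
```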

IEEE 754 Standard

The IEEE 754 standard, originally published in 1985 as IEEE Std 754-1985, defines a technical framework for binary floating-point arithmetic to promote consistent representation and computation across diverse computer systems. Its primary purpose is to ensure portability of floating-point data and reproducibility of arithmetic results, addressing inconsistencies in earlier proprietary formats that hindered software development and scientific computing. The standard was revised in 2008 (IEEE Std 754-2008) to incorporate decimal floating-point formats, refine operations, and clarify exception handling, while maintaining backward compatibility with the original binary specifications. The latest revision, IEEE Std 754-2019, further expanded support by adding new recommended operations and updating exception handling mechanisms to better accommodate modern hardware and application needs.

Central to the standard are its binary interchange formats, which encode floating-point numbers using a sign bit, an exponent field, and a significand (also called the fraction or mantissa). The single-precision format (binary32) occupies 32 bits total: 1 sign bit, 8 exponent bits, and 23 fraction bits, providing an effective significand precision of 24 bits including an implicit leading 1 for normalized numbers. The double-precision format (binary64) uses 64 bits: 1 sign bit, 11 exponent bits, and 52 fraction bits, yielding 53 bits of significand precision. For higher precision, the quad-precision format (binary128) employs 128 bits: 1 sign bit, 15 exponent bits, and 112 fraction bits, resulting in 113 bits of significand precision. Exponents in these formats are biased to allow representation of both positive and negative values; for example, in double precision, the 11-bit exponent field uses a bias of 1023, where the stored exponent value e represents the true exponent as e - 1023.

The standard also defines special values to handle exceptional conditions in computations. Infinities are represented by setting the exponent field to all ones (e.g., 2047 in double precision) and the significand to zero, with the sign bit indicating positive or negative infinity. Not a Number (NaN) values use the same all-ones exponent but with a non-zero significand, allowing distinction between quiet NaNs (propagating without signaling) and signaling NaNs (triggering exceptions); a NaN's sign bit carries no numerical meaning. Signed zeros are supported, where +0 and -0 are distinct representations (exponent and significand all zeros, differing only in the sign bit), preserving the sign of zero results from operations that underflow or cancel near zero. These features, introduced in the 1985 standard and refined in subsequent revisions, enable robust error detection and consistent behavior in floating-point environments.
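The special values behave as the standard prescribes in any IEEE 754 environment; a brief Python sketch demonstrates infinities, NaN comparison semantics, and signed zeros:

```python
import math

inf = float('inf')
nan = float('nan')

print(1e308 * 10)           # inf: overflow saturates to infinity
print(inf - inf)            # nan: invalid operation produces a quiet NaN
print(nan == nan)           # False: NaN compares unequal to everything
print(math.isnan(nan))      # True: the portable NaN test

# Signed zeros are distinct bit patterns but compare equal.
pz, nz = 0.0, -0.0
print(pz == nz)                                          # True
print(math.copysign(1.0, pz), math.copysign(1.0, nz))    # 1.0 -1.0
```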

Quantifying Round-off Error

Machine Epsilon

Machine epsilon, denoted \epsilon, is defined as the smallest positive real number such that the floating-point representation satisfies \mathrm{fl}(1 + \epsilon) > 1, meaning it is the smallest \epsilon > 0 whose addition to 1 remains distinguishable from 1 in floating-point arithmetic. This value quantifies the precision limit of the floating-point system, representing the gap between 1 and the next larger representable number. In a floating-point system with base b and precision p (the number of digits in the significand), machine epsilon derives from the spacing of representable numbers near 1, given by \epsilon = b^{1-p}. For the binary64 double-precision format, where b = 2 and p = 53, this yields \epsilon = 2^{-52} \approx 2.22 \times 10^{-16}. The unit roundoff u, which bounds the maximum relative error, relates to machine epsilon by u = \epsilon / 2.

Machine epsilon plays a central role in error analysis by providing an upper bound on relative errors in floating-point representations and basic operations. Specifically, for any x in the normal range of the floating-point system, the representation error satisfies |\mathrm{fl}(x) - x| \leq u |x|, or equivalently |\mathrm{fl}(x) - x| \leq (\epsilon / 2) |x|. This bound ensures that the relative error in representing x is at most u, facilitating the analysis of error propagation in more complex computations.
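Because the spacing of doubles between 1 and 2 is uniform, machine epsilon can be found by repeated halving; a minimal Python sketch confirms the binary64 value:

```python
import sys

# Halve a candidate until 1 + eps/2 is no longer distinguishable from 1;
# the last distinguishable value is the machine epsilon.
eps = 1.0
while 1.0 + eps / 2 > 1.0:
    eps /= 2

print(eps)                              # 2.220446049250313e-16
print(eps == sys.float_info.epsilon)    # True for binary64
print(eps == 2.0 ** (1 - 53))           # epsilon = b**(1-p) with b=2, p=53
```

The loop terminates at exactly 2^{-52}: adding 2^{-53} to 1 lands halfway between 1 and the next double, and round-to-nearest-even resolves the tie back to 1.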

Round-off Error Under Rounding Rules

In floating-point arithmetic, rounding rules determine how a non-representable real number is approximated by a nearby representable value, thereby introducing round-off error. The IEEE 754 standard specifies four primary rounding modes: round to nearest, where the result is the representable value closest to the exact result (with ties resolved to the even significand); round toward zero (truncation), which discards excess bits beyond the precision; round toward positive infinity, which rounds up for positive numbers and down for negative; and round toward negative infinity, which rounds down for positive and up for negative. Round to nearest serves as the default mode, promoting minimal error magnitude, while directed modes (toward zero or infinity) are used in applications like interval arithmetic for bounding computations. An additional common rule, round away from zero, rounds toward the infinity whose sign matches the operand (up for positive, down for negative); it is not one of these four directed IEEE modes but can be emulated by sign-dependent selection of directed rounding.

Under these rules, round-off error is quantified by the difference between the exact value x and its floating-point representation \mathrm{fl}(x). For round to nearest, the absolute error is bounded by half the unit in the last place (ulp) of x: |\mathrm{fl}(x) - x| \leq \frac{1}{2} \mathrm{ulp}(x). This bound arises because the representable numbers are spaced by \mathrm{ulp}(x) in the binade containing x, and rounding selects the nearest point, ensuring the maximum deviation is halfway between points. In directed modes like toward zero or infinity, the error can reach a full ulp, leading to larger potential discrepancies: |\mathrm{fl}(x) - x| \leq \mathrm{ulp}(x).

The relative round-off error provides a scale-invariant measure, especially useful for normalized floating-point systems. Machine epsilon \epsilon, defined as the smallest positive floating-point number such that 1 + \epsilon > 1, captures the spacing around unity and relates to relative precision. For operations on values within the normal range (away from underflow), the relative error under rounding satisfies |\mathrm{fl}(x) - x| / |x| \leq \epsilon / 2 in binary systems, or more generally \leq u where u = \frac{1}{2} \beta^{1-p} is the unit roundoff, with base \beta and precision p.

A key result in normalized floating-point systems with round to nearest is the theorem that the relative round-off error is bounded by the unit roundoff: |\mathrm{fl}(x) - x| / |x| \leq u. The proof relies on the structure of the significand and exponent. For a normalized x = m \cdot \beta^e with 1 \leq m < \beta, the ulp is \beta^{e - p + 1}, so \frac{1}{2} \mathrm{ulp}(x) = \frac{1}{2} \beta^{e - p + 1}. Dividing by |x| \geq \beta^e yields \frac{|\mathrm{fl}(x) - x|}{|x|} \leq \frac{\frac{1}{2} \beta^{e - p + 1}}{\beta^e} = \frac{1}{2} \beta^{1 - p} = u, confirming that \mathrm{fl}(x) is the closest representable number and that the error does not exceed this relative threshold. This bound holds assuming no overflow or underflow, emphasizing the role of normalization in maintaining computational accuracy.
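Binary rounding modes are not switchable from pure Python, but the standard decimal module exposes the analogous modes in base 10; the sketch below (with an arbitrary 7-digit precision and a test value of our choosing) shows the error staying within half an ulp for round-to-nearest and within one ulp for the directed modes:

```python
from decimal import (Decimal, Context, ROUND_HALF_EVEN,
                     ROUND_DOWN, ROUND_CEILING, ROUND_FLOOR)

x = Decimal('2.5000003')      # 8 significant digits: not representable at prec=7
for mode in (ROUND_HALF_EVEN, ROUND_DOWN, ROUND_CEILING, ROUND_FLOOR):
    ctx = Context(prec=7, rounding=mode)
    fl_x = ctx.plus(x)                 # round x into the 7-digit format
    print(mode, fl_x, abs(fl_x - x))   # ulp here is 1e-6; errors are 3e-7 or 7e-7
```

Round-half-even, round-down, and round-floor all give 2.500000 (error 3 × 10^{-7}, under half an ulp), while round-ceiling gives 2.500001 (error 7 × 10^{-7}, under one ulp), matching the bounds above.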

Errors in Floating-Point Arithmetic

Addition and Subtraction

In floating-point addition, the operands are first aligned by shifting the significand of the number with the smaller exponent to match the larger one, which may introduce error if bits are shifted beyond the precision limit. The aligned significands are then added, producing a sum that may exceed the representable significand range. This sum undergoes normalization, shifting the significand left or right to restore the normalized form while adjusting the exponent accordingly. Finally, the result is rounded to the nearest representable floating-point number according to the system's rounding mode, such as round-to-nearest in IEEE 754.

The round-off error introduced by this process satisfies the model \mathrm{fl}(a + b) = (a + b)(1 + \delta), where \mathrm{fl} denotes the floating-point result and the relative error satisfies |\delta| \leq u, with u being the unit roundoff (half the machine epsilon). This bound holds under the standard's exact rounding requirement, assuming no overflow or underflow. The same error model applies to subtraction, \mathrm{fl}(a - b) = (a - b)(1 + \delta) with |\delta| \leq u, as subtraction is implemented by negating one operand and performing addition.

Subtraction introduces a particular vulnerability known as catastrophic cancellation when the operands are nearly equal in magnitude (a \approx b), causing leading significant digits to cancel and resulting in a small difference with potentially amplified relative error. In such cases, the absolute round-off error remains bounded by u |a - b|, but the relative error in the result can greatly exceed u because the true difference is much smaller than the operands, effectively magnifying any prior errors in a or b as well as those from the subtraction itself. For instance, in a floating-point system with six significant digits, subtracting 9.99995 from 10.0000 yields 0.00005 exactly; but if the operands are themselves rounded approximations of the true values, the single surviving digit is inherited almost entirely from their rounding errors, so the result may contain no correct significant figures, a complete loss of significance.
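Catastrophic cancellation is easy to trigger in binary64; in this Python sketch, a single subtraction turns sub-ulp representation error into a relative error of roughly ten percent:

```python
# (1 + x) - 1 for tiny x: the addition rounds 1 + x to the nearest double,
# and the subtraction then exposes that rounding error at full scale.
x = 1e-15
y = (1.0 + x) - 1.0
print(y)                  # 1.1102230246251565e-15, not 1e-15
print(abs(y - x) / x)     # ~0.11: an 11% relative error from one subtraction
```

Each individual operation here is correctly rounded (relative error at most u), yet the final relative error is about 10^{14} times larger than u because the true difference is so small.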

Multiplication and Division

In floating-point arithmetic, multiplication of two numbers a and b, represented as a = m_a \cdot \beta^{e_a} and b = m_b \cdot \beta^{e_b} where m_a, m_b are significands and \beta is the base, involves multiplying the significands to form an intermediate product m_a \cdot m_b (which may require up to 2p digits for p-digit precision) and adding the exponents e_a + e_b. The result is then normalized if necessary and rounded to fit the destination format. The computed result satisfies \mathrm{fl}(a \times b) = a b (1 + \delta), where |\delta| \leq u and u = \frac{1}{2}\beta^{1-p} is the unit roundoff, ensuring a relative error bounded by half a unit in the last place (ulp). This rounding error arises because the exact product may not be representable exactly, particularly when the product exceeds p digits before rounding.

To achieve correctly rounded results as required by IEEE 754, implementations use extra bits beyond the significand: a guard bit to hold the most significant discarded bit, a round bit for the next, and a sticky bit that is set if any further discarded bits are nonzero, allowing precise decisions for modes like round-to-nearest (ties to even). These mechanisms reduce the effective rounding error to at most 0.5 ulp. For example, multiplying a large number near the overflow threshold, such as 10^{308} by 2 in double precision, produces an exponent that triggers overflow, resulting in infinity because the rounded value would exceed the maximum representable finite number.

Division follows a similar process: the significands are divided to yield m_a / m_b, the exponents are subtracted as e_a - e_b, and the quotient is normalized and rounded. The relative error is likewise bounded: \mathrm{fl}(a / b) = (a / b) (1 + \delta) with |\delta| \leq u. However, division introduces potential underflow for small quotients, where the result's magnitude falls below the smallest normalized number, leading to a subnormal or zero after rounding; the IEEE 754 standard mandates signaling underflow in such cases and provides gradual underflow via subnormals to minimize information loss. The absolute error in the quotient is |(a / b) \delta| \leq u \cdot |a / b|, which scales with the quotient's magnitude and remains small relative to the result even for tiny values. Guard, round, and sticky bits are again employed during significand division (often implemented via iterative algorithms such as SRT division) to ensure correct rounding.
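Overflow and gradual underflow at the edges of the binary64 exponent range can be observed directly in Python; the specific values below are illustrative:

```python
import sys

print(sys.float_info.max)    # 1.7976931348623157e+308, the largest finite double
print(1e308 * 2)             # inf: the product overflows the exponent range

tiny = sys.float_info.min    # smallest positive normalized double, ~2.2e-308
print(tiny / 2 ** 52)        # 5e-324: the smallest subnormal (gradual underflow)
print(tiny / 2 ** 53)        # 0.0: rounding past the subnormal range flushes to zero
```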

Propagation and Accumulation of Errors

Error Accumulation in Computations

In floating-point computations involving multiple operations, individual round-off errors from basic operations can propagate and accumulate, leading to a total error that grows with the number of steps. The forward error represents the overall discrepancy between the exact mathematical result and the computed floating-point result, which includes both the initial representation errors in the input data and the propagated round-off errors from subsequent operations. This accumulation arises because each operation introduces a relative error bounded by the unit roundoff u, and these errors can compound through multiplication by subsequent factors in the computation.

Backward error analysis provides a complementary perspective by interpreting the computed result as the exact solution to a slightly perturbed problem, where the input is modified by a small backward error \epsilon such that the algorithm applied to the perturbed input yields the observed output. In this framework, the computed result \hat{y} satisfies \hat{y} = f(x + \delta x) for some small \delta x with \|\delta x\| \leq \epsilon \|x\|, allowing the forward error to be bounded as approximately \epsilon times the condition number of the problem. This approach is particularly useful for assessing stability in algorithms where round-off errors mimic small input perturbations, and the total forward error then combines representation errors with the propagated effects of these perturbations.

A key example of error accumulation occurs in the summation of n floating-point numbers using recursive (naive) addition, where the error bound without significant cancellation is O(n u) times a measure of the input magnitude, such as \sum |x_i|. Specifically, for recursive summation s_1 = x_1, s_k = \mathrm{fl}(s_{k-1} + x_k) for k = 2, \dots, n, the absolute error satisfies |s_n - \sum_{i=1}^n x_i| \leq \gamma_n \sum_{i=1}^n |x_i|, where \gamma_n = \frac{n u}{1 - n u} (assuming n u < 1), reflecting the linear growth in error due to the sequential addition of bounded relative errors at each step. This bound arises from the standard model of floating-point addition, in which each operation incurs a relative error of at most u, and the (1 + \delta_k) factors propagate multiplicatively through the partial sums.

Under the random sign errors model, which assumes that the rounding errors at each step behave like independent random variables with random signs (mean zero and bounded by u), the accumulation resembles a random walk, leading to a probabilistic error growth of order \sqrt{n} u rather than O(n u). This model, originally proposed by Wilkinson, predicts that the errors partially cancel out on average, reducing the expected magnitude of the total error to about \sqrt{n} times the unit roundoff scaled by the input size, with high probability. In practice, this applies when the input data has mixed signs and magnitudes that avoid severe cancellation, providing a tighter bound for typical scenarios than the worst-case deterministic analysis.
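The contrast between the O(nu) naive bound and compensated summation can be checked empirically. This Python sketch (random uniform data of our choosing, with math.fsum as the correctly rounded reference) implements both recursive and Kahan summation; the printed error magnitudes are indicative rather than exact:

```python
import math
import random

def naive_sum(xs):
    s = 0.0
    for x in xs:
        s += x                 # each += contributes a relative error <= u
    return s

def kahan_sum(xs):
    s, c = 0.0, 0.0            # c accumulates the running compensation
    for x in xs:
        y = x - c              # subtract the error captured so far
        t = s + y
        c = (t - s) - y        # (t - s) recovers what was actually added
        s = t
    return s

random.seed(0)
xs = [random.uniform(0, 1) for _ in range(10**6)]
exact = math.fsum(xs)          # correctly rounded reference sum
print(abs(naive_sum(xs) - exact) / abs(exact))  # typically ~1e-13: grows with n
print(abs(kahan_sum(xs) - exact) / abs(exact))  # ~1e-16: essentially independent of n
```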

Unstable Algorithms and Ill-Conditioned Problems

In numerical computations, unstable algorithms amplify round-off errors, leading to forward error growth that exceeds the inherent precision limits of floating-point arithmetic. This occurs when the algorithm's structure causes small perturbations to propagate disproportionately, often due to subtractive cancellations or ill-suited recurrence relations. A classic example is the backward recurrence for Fibonacci numbers, defined by f_{n-1} = f_{n+1} - f_n starting from large indices and proceeding to smaller ones. While the forward recurrence f_{n+1} = f_n + f_{n-1} remains stable with relative errors accumulating as O(n \epsilon_{\text{mach}}), where \epsilon_{\text{mach}} is the machine epsilon, the backward version catastrophically amplifies round-off errors: each backward step magnifies existing errors by a factor of roughly the golden ratio \varphi \approx 1.618, so seed errors grow geometrically while the true values shrink. For instance, in double precision (\epsilon_{\text{mach}} \approx 10^{-16}), backward computation from n \approx 40 can yield errors exceeding unity when recovering initial conditions like f_0 and f_1.

Ill-conditioned problems, in contrast, are intrinsically sensitive to input perturbations: small changes in the problem data produce large variations in the solution, independent of the algorithm used. For solving linear systems Ax = b, the condition number \kappa(A) = \|A\| \|A^{-1}\|, defined using any consistent matrix norm \|\cdot\|, quantifies this sensitivity. A large \kappa(A) indicates ill-conditioning, as the relative error in the computed solution \hat{x} is bounded approximately by \|\hat{x} - x\| / \|x\| \leq \kappa(A) \, u, where u is the unit roundoff and the bound assumes a well-behaved (backward stable) algorithm. For the 2-norm, \kappa_2(A) = \sigma_{\max}(A) / \sigma_{\min}(A), emphasizing the role of the smallest singular value in vulnerability to perturbations.

The interplay between unstable algorithms and ill-conditioned problems can cause complete computational failure, as round-off errors are first magnified by the algorithm and then further amplified by the problem's sensitivity. The Hilbert matrix H_n, with entries (H_n)_{ij} = 1/(i+j-1), exemplifies this: its condition number grows exponentially as \kappa_2(H_n) \sim e^{3.5n}, rendering systems H_n x = b effectively unsolvable in IEEE double precision for n \geq 12, since round-off makes the matrix numerically singular. Similarly, Wilkinson's polynomial p(x) = \prod_{k=1}^{20} (x - k) has roots at the integers 1 through 20, but perturbing a single coefficient by a unit in the last place (ulp) causes some roots to become complex with imaginary parts up to several units, demonstrating extreme ill-conditioning that overwhelms even stable root-finding methods. These cases underscore the need for conditioning assessment and stable algorithmic alternatives to prevent total error from dominating the result.
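The Fibonacci instability can be reproduced in Python. Seeding the backward recurrence with Binet's formula introduces a tiny error (exact integer seeds would recurse exactly), and amplification by roughly the golden ratio per backward step turns it into a discrepancy of order ten or more at f_0; the printed magnitudes are indicative, not exact:

```python
import math

sqrt5 = math.sqrt(5)
phi = (1 + sqrt5) / 2

def fib(n):
    # Binet's formula: exact in real arithmetic, but as a float it carries
    # a small relative error that seeds the unstable backward recurrence.
    return (phi ** n - (1 - phi) ** n) / sqrt5

a, b = fib(41), fib(40)    # a ~ f_41, b ~ f_40, each off by roughly 1e-7 absolute
for _ in range(40):
    a, b = b, a - b        # backward step: f_{n-1} = f_{n+1} - f_n

print(b)   # exact answer is f_0 = 0, but the amplified seed error dominates
print(a)   # exact answer is f_1 = 1
```

The backward step map has eigenvalues 1/\varphi and -\varphi, so the error component along the growing mode is multiplied by about \varphi^{40} \approx 2 \times 10^8 over the forty steps, exactly the amplification the text describes.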

References

  1. [1]
    What Every Computer Scientist Should Know About Floating-Point ...
    For example rounding to the nearest floating-point number corresponds to an error of less than or equal to .5 ulp. However, when analyzing the rounding error ...
  2. [2]
    [PDF] Contents 1. Source of errors 1 1.1. Roundoff error 1 1.2. Truncation ...
    The four major sources of error in computations are: roundoff, truncation, termination, and statistical errors.
  3. [3]
    Sources of Error in Numerical Calculations - The Netlib
    The two main sources of error in numerical calculations are roundoff error, from rounding floating-point operations, and input error, from prior calculations ...<|control11|><|separator|>
  4. [4]
    What every computer scientist should know about floating-point ...
    What every computer scientist should know about floating-point arithmetic ... It begins with background on floating-point representation and rounding error ...
  5. [5]
    15. Floating-Point Arithmetic: Issues and Limitations — Python 3.14 ...
    Representation error refers to the fact that some (most, actually) decimal fractions cannot be represented exactly as binary (base 2) fractions. This is the ...
  6. [6]
    754-1985 - IEEE Standard for Binary Floating-Point Arithmetic
    This standard specifies basic and extended floating-point number formats; add, subtract, multiply, divide, square root, remainder, and compare operations.
  7. [7]
    IEEE 754-2019 - IEEE SA
    Jul 22, 2019 · IEEE 754-2019 specifies formats and methods for floating-point arithmetic, including interconversion, data exchange, and exception handling.
  8. [8]
    754-2008 - IEEE Standard for Floating-Point Arithmetic
    Aug 29, 2008 · This standard specifies interchange and arithmetic formats and methods for binary and decimal floating-point arithmetic in computer programming environments.
  9. [9]
    [PDF] What every computer scientist should know about floating-point ...
    There- fore, the result of a floating-point calcu- lation must often be rounded in order to fit back into its finite representation. The resulting rounding.
  10. [10]
    754-2008 - IEEE Standard for Floating-Point Arithmetic
    Aug 29, 2008 · This standard specifies interchange and arithmetic formats and methods for binary and decimal floating-point arithmetic in computer ...
  11. [11]
    Basic Issues in Floating Point Arithmetic and Error Analysis
    Floating point numbers are represented in the form +-significand * 2^(exponent), where the significand is a nonnegative number. A normalized significand lies in ...Missing: definition | Show results with:definition
  12. [12]
    [PDF] 2008 (Revision of IEEE Std 754-1985), IEEE Standard for Floating ...
    Aug 29, 2008 · Abstract: This standard specifies interchange and arithmetic formats and methods for binary and decimal floating-point arithmetic in ...
  13. [13]
    The Accuracy of Floating Point Summation - SIAM Publications Library
    Five summation methods and their variations are analyzed here. The accuracy of the methods is compared using rounding error analysis and numerical experiments.
  14. [14]
    [PDF] A New Approach to Probabilistic Rounding Error Analysis
    Traditional rounding error analysis in numerical linear algebra leads to backward error bounds involving the constant γn = nu/(1 − nu), for a problem size n and ...
  15. [15]
    [PDF] Probabilistic Rounding Error Analysis for Sums
    Higham (2002):. Whenever we write γn there is an implicit assumption that nu < 1, which is true in virtually any circumstance that might arise with IEEE.
  16. [16]
    [PDF] Week 1 1 About this Scientific Computing course - NYU Courant
    should start with f0 and f1 and use the Fibonacci recurrence to compute fn for n up to some N +1, then turn around and re-compute fn−1 from fn and fn+1 to ...
  17. [17]
    None
    Below is a merged and comprehensive summary of the condition number and relative error bound from "Matrix Computations" (4th Edition) by Golub and Van Loan. To retain all the detailed information from the provided segments, I will use a structured table format in CSV style for clarity and density, followed by a narrative summary that consolidates the key points. This approach ensures all page references, sections, formulas, and URLs are preserved while avoiding redundancy.
  18. [18]
    What Is the Hilbert Matrix? - Nick Higham
    Jun 30, 2020 · An underlying reason for the ill conditioning is that the Hilbert matrix is obtained when least squares polynomial approximation is done using ...Missing: round- off
  19. [19]
    Rounding Errors in Algebraic Processes - SIAM Publications Library
    Rounding Errors in Algebraic Processes was the first book to give systematic analyses of the effects of rounding errors on a variety of key computations.