Divided differences constitute a fundamental concept in numerical analysis, serving as a generalization of finite differences for interpolating functions at arbitrarily spaced points. They enable the construction of polynomial approximations to a function f(x) based on its values at distinct nodes x_0, x_1, \dots, x_n, and are particularly valuable when data points are unequally spaced, unlike traditional finite difference methods that assume uniform intervals.[1][2]

The origins of divided differences trace back to the 17th century, when Sir Isaac Newton developed them as part of his pioneering work on interpolation methods for approximating continuous functions from discrete observations. Historically, these techniques were instrumental in generating accurate tables of logarithms, trigonometric functions, and other mathematical quantities, facilitating computations in astronomy, navigation, and early scientific calculation before the advent of digital computers.[3][2]

Formally, the zeroth-order divided difference is simply the function value, f[x_i] = f(x_i), while higher-order divided differences are defined recursively. The first-order difference is

f[x_i, x_{i+1}] = \frac{f(x_{i+1}) - f(x_i)}{x_{i+1} - x_i},

and for the k-th order,

f[x_i, \dots, x_{i+k}] = \frac{f[x_{i+1}, \dots, x_{i+k}] - f[x_i, \dots, x_{i+k-1}]}{x_{i+k} - x_i}.

This recursive structure allows for systematic computation via a divided difference table, where each entry builds upon previous ones, promoting numerical stability and ease of implementation in algorithms.[1][2]

In practice, divided differences form the coefficients of Newton's divided difference interpolation polynomial,

P_n(x) = f[x_0] + f[x_0, x_1](x - x_0) + f[x_0, x_1, x_2](x - x_0)(x - x_1) + \dots + f[x_0, \dots, x_n] \prod_{i=0}^{n-1} (x - x_i),

which provides an exact interpolant of degree at most n through the given points. This formulation supports efficient evaluation, often via Horner-style nested multiplication in O(n) operations per point, and extends to more advanced applications such as Hermite interpolation (incorporating derivatives) and error estimation via the next-order divided difference, where the interpolation error is

f(x) - P_n(x) = f[x_0, \dots, x_n, x] \prod_{i=0}^n (x - x_i) = \frac{f^{(n+1)}(\xi)}{(n+1)!} \prod_{i=0}^n (x - x_i)

for some \xi in the interval spanned by x_0, \dots, x_n, x.[1][4][2]
Fundamentals
Definition
Divided differences provide a means to quantify the rate of change of a function at multiple distinct points through recursive quotients, extending the notion of finite differences to handle unequally spaced data. Unlike finite differences, which rely on uniform grid spacings, divided differences adapt to arbitrary point distributions, making them essential for flexible numerical approximations. They originate from the evaluation of the function at individual points and build higher-order measures iteratively, capturing successive levels of variation in the function's behavior.[1]

Formally, the zeroth-order divided difference is the function value itself:

f[x_0] = f(x_0).

Higher-order divided differences are then defined recursively as

f[x_0, \dots, x_k] = \frac{f[x_1, \dots, x_k] - f[x_0, \dots, x_{k-1}]}{x_k - x_0}

for k \geq 1 and distinct points x_0, x_1, \dots, x_k. This construction requires the function evaluations at the points before subsequent orders can be computed.[5]

In approximation theory, divided differences underpin the Newton interpolation formula, enabling the construction of polynomials that match a function's values at given points while accommodating irregular spacings that finite differences cannot. This generalization allows for robust error estimation and derivative approximation in non-uniform settings, with the k-th order divided difference relating to the k-th derivative via f[x_0, \dots, x_k] = \frac{f^{(k)}(\xi)}{k!} for some \xi in the interval spanning the points, assuming sufficient smoothness.[6]
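A minimal Python sketch of this recursive definition follows; the function name divided_diff and its list-based interface are illustrative choices, not an API from the cited sources, and the implementation assumes distinct nodes.

    def divided_diff(xs, ys):
        """Return f[x_0, ..., x_k] for distinct nodes xs and values ys = f(xs)."""
        if len(xs) == 1:                 # zeroth order: f[x_i] = f(x_i)
            return ys[0]
        # k-th order: (f[x_1,...,x_k] - f[x_0,...,x_{k-1}]) / (x_k - x_0)
        # (exponential in the order; fine for small illustrations)
        return (divided_diff(xs[1:], ys[1:])
                - divided_diff(xs[:-1], ys[:-1])) / (xs[-1] - xs[0])

    # Example: f(x) = x^2 at nodes 0, 1, 2; the second divided difference
    # equals the leading coefficient of x^2, namely 1.
    print(divided_diff([0.0, 1.0, 2.0], [0.0, 1.0, 4.0]))  # 1.0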
Notation
The standard notation for the k-th order divided difference of a function f at distinct points x_0, x_1, \dots, x_k is f[x_0, x_1, \dots, x_k], where the zeroth-order case simplifies to f[x_i] = f(x_i).[2] This bracket notation distinguishes divided differences from function evaluations and facilitates recursive computations.[2]

The concept of divided differences originates from Isaac Newton's development of interpolation methods in his works Philosophiæ Naturalis Principia Mathematica (1687) and Methodus differentialis (1711), where they served as a discrete analogue to derivatives for polynomial approximation. The modern bracket notation f[x_0, x_1, \dots, x_k] became standard in 20th-century numerical analysis literature.[1]

Variations exist in the literature; for instance, some texts denote the (j-1)-th divided difference as \Delta(t_1 : j)p or p[t_1, \dots, t_j], emphasizing its role as a linear functional on polynomials.[7]

When points coincide, divided differences connect directly to derivatives. For k+1 identical points x_0 = x_1 = \dots = x_k = x, the k-th order divided difference is f[x, x, \dots, x] = \frac{f^{(k)}(x)}{k!}.[2] More generally, if the points x_i approach a common value x_0, the limit yields \lim_{x_i \to x_0} f[x_0, x_1, \dots, x_k] = \frac{f^{(k)}(x_0)}{k!}.[2] This extends the definition to repeated points via Taylor expansion principles.[2]
Examples
First-Order Divided Differences
The first-order divided difference provides the foundational step in constructing higher-order differences for polynomial interpolation, representing the average rate of change of a function f between two distinct points x_i and x_{i+1}. It is computed using the formula

f[x_i, x_{i+1}] = \frac{f(x_{i+1}) - f(x_i)}{x_{i+1} - x_i},

assuming x_{i+1} \neq x_i.[2] This expression directly generalizes the concept of a finite difference quotient.[8]

To illustrate, consider the quadratic function f(x) = x^2 evaluated at points x_0 = 0 and x_1 = 1. Here, f(0) = 0 and f(1) = 1, so

f[0, 1] = \frac{1 - 0}{1 - 0} = 1.

This value equals the slope of the secant line connecting the points (0, 0) and (1, 1) on the parabola f(x) = x^2, highlighting the geometric interpretation of the first-order divided difference as a secant slope.[2]

For clarity, the following table shows first-order divided differences for f(x) = x^2 at the points x = 0, 0.5, and 1.0:
Points        f[x_i]    First-Order Divided Difference
x_0 = 0       0.00
                        f[0, 0.5] = 0.5
x_1 = 0.5     0.25
                        f[0.5, 1.0] = 1.5
x_2 = 1.0     1.00
These values approximate the function's behavior between the points.[8]

When the points x_i and x_{i+1} are closely spaced, the first-order divided difference approximates the derivative f'(x_i). Specifically, as x_{i+1} \to x_i,

\lim_{x_{i+1} \to x_i} f[x_i, x_{i+1}] = f'(x_i).

For f(x) = x^2, the derivative is f'(x) = 2x. Taking points x_i = 0 and x_{i+1} = h > 0, the first-order difference is f[0, h] = h, which approaches f'(0) = 0 as h \to 0. This limit relation underscores the connection between divided differences and differential calculus.[9]
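The limit above is easy to check numerically; the following Python sketch (with arbitrary illustrative step sizes) evaluates f[0, h] for shrinking h.

    # Numerical check that f[x_i, x_i + h] -> f'(x_i) for f(x) = x^2 at x_i = 0.
    f = lambda x: x**2
    for h in (0.1, 0.01, 0.001):
        dd = (f(0.0 + h) - f(0.0)) / h    # first-order divided difference f[0, h]
        print(h, dd)                      # dd equals h here, approaching f'(0) = 0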
Higher-Order Divided Differences
Higher-order divided differences extend the first-order case by recursively applying the divided difference operator to previous orders, allowing the construction of divided difference tables that capture increasingly refined measures of function variation across multiple points. The recurrence relation for the n-th order divided difference is

f[x_0, x_1, \dots, x_n] = \frac{f[x_1, x_2, \dots, x_n] - f[x_0, x_1, \dots, x_{n-1}]}{x_n - x_0},

where the points x_i are distinct.[10] This formula enables the systematic buildup from zeroth-order values (the function evaluations themselves) to higher orders, forming the basis for Newton's interpolation polynomial.[11]

To illustrate, consider the function f(x) = \sin(x) evaluated at the points x_0 = 0, x_1 = 0.5, x_2 = 1.0, and x_3 = 1.5 (in radians). The zeroth-order divided differences are simply the function values:

f[x_0] = \sin(0) = 0, \quad f[x_1] = \sin(0.5) \approx 0.4794, \quad f[x_2] = \sin(1.0) \approx 0.8415, \quad f[x_3] = \sin(1.5) \approx 0.9975.

The first-order divided differences are

f[x_0, x_1] = \frac{0.4794 - 0}{0.5 - 0} \approx 0.9589, \quad f[x_1, x_2] = \frac{0.8415 - 0.4794}{1.0 - 0.5} \approx 0.7241, \quad f[x_2, x_3] = \frac{0.9975 - 0.8415}{1.5 - 1.0} \approx 0.3120.

For the second-order divided differences,

f[x_0, x_1, x_2] = \frac{0.7241 - 0.9589}{1.0 - 0} \approx -0.2348, \quad f[x_1, x_2, x_3] = \frac{0.3120 - 0.7241}{1.5 - 0.5} \approx -0.4121.

Finally, the third-order divided difference is

f[x_0, x_1, x_2, x_3] = \frac{-0.4121 - (-0.2348)}{1.5 - 0} \approx -0.1182.

These computations are typically organized in a divided difference table, which displays the progression from lower to higher orders in a triangular array. The table for this example, rounded to four decimal places, is as follows:
x_i    f[x_i]     1st order    2nd order    3rd order
0.0    0.0000
                  0.9589
0.5    0.4794                  -0.2348
                  0.7241                    -0.1182
1.0    0.8415                  -0.4121
                  0.3120
1.5    0.9975
The entries along the upper diagonal (starting from f[x_0] at the top left) yield the coefficients for Newton's form of the interpolating polynomial.[2]

Higher-order divided differences admit an interpretation analogous to higher-order derivatives: the first order approximates the average slope (first derivative), the second order captures average curvature (second derivative), and subsequent orders reflect higher-order rates of change, scaled appropriately. When the points are closely spaced, the n-th order divided difference approximates f^{(n)}(\xi)/n! for some \xi in the interval spanning the points.[12][13] This perspective underscores their role in quantifying the function's local behavior beyond linear trends.
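The table above can be generated mechanically. Below is a Python sketch that builds the full triangular table column by column; newton_table is an illustrative name, and the printed values reproduce the sin(x) example (the table's -0.4121 differs in the last digit because it was built from 4-digit rounded entries).

    import math

    def newton_table(xs, ys):
        """Return table columns; table[k][i] holds f[x_i, ..., x_{i+k}]."""
        n = len(xs)
        table = [list(ys)]                     # zeroth order: the function values
        for k in range(1, n):
            prev = table[k - 1]
            # k-th order: (f[x_{i+1},...,x_{i+k}] - f[x_i,...,x_{i+k-1}]) / (x_{i+k} - x_i)
            table.append([(prev[i + 1] - prev[i]) / (xs[i + k] - xs[i])
                          for i in range(n - k)])
        return table

    xs = [0.0, 0.5, 1.0, 1.5]
    for k, col in enumerate(newton_table(xs, [math.sin(x) for x in xs])):
        print(f"order {k}:", [round(v, 4) for v in col])
    # order 0: [0.0, 0.4794, 0.8415, 0.9975]
    # order 1: [0.9589, 0.7241, 0.312]
    # order 2: [-0.2348, -0.412]
    # order 3: [-0.1182]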
Properties
Recurrence Relations
The Newton recurrence relation provides a fundamental recursive method for computing divided differences of order greater than zero. For a function f and distinct points x_0, x_1, \dots, x_k, the k-th divided difference is given by

f[x_0, \dots, x_k] = \frac{f[x_1, \dots, x_k] - f[x_0, \dots, x_{k-1}]}{x_k - x_0},

with the base case f[x_i] = f(x_i) for i = 0, \dots, k.[14] This relation enables the construction of a divided difference table, where each entry is computed from the differences of preceding entries divided by the span of the corresponding points.

To derive this recurrence, consider the interpolating polynomials associated with subsets of the points. Let p(x) be the unique polynomial of degree at most k-1 that interpolates f at x_0, \dots, x_{k-1}, and let q(x) be the unique polynomial of degree at most k-1 that interpolates f at x_1, \dots, x_k. Then the polynomial

r(x) = \frac{(x_k - x) p(x) + (x - x_0) q(x)}{x_k - x_0}

is of degree at most k and interpolates f at all points x_0, \dots, x_k: at x = x_i for i = 1, \dots, k-1 both p and q agree with f(x_i), and r matches f at the endpoints by construction. By uniqueness of the interpolating polynomial of degree at most k, r(x) is the interpolant at these points.[15]

The leading coefficient of r(x) (the coefficient of x^k) yields the divided difference f[x_0, \dots, x_k]. Since the leading coefficients of p(x) and q(x) are f[x_0, \dots, x_{k-1}] and f[x_1, \dots, x_k], respectively, expanding r(x) shows that its leading coefficient is \frac{f[x_1, \dots, x_k] - f[x_0, \dots, x_{k-1}]}{x_k - x_0}, confirming the recurrence.[14]

A proof of the recurrence's validity can also be established by induction on the order k. For the base case k = 1, f[x_0, x_1] = \frac{f(x_1) - f(x_0)}{x_1 - x_0}, which holds by the first-order definition. Assume the relation holds for all divided differences up to order k-1. For order k, the interpolating polynomial argument above relies on the uniqueness theorem for interpolation, itself proven by induction on the number of points, ensuring the leading coefficients satisfy the recursive form. Alternatively, the proof follows from the Hermite-Genocchi formula or from the explicit summation formula, but the polynomial interpolation approach verifies it without additional assumptions.[15][14]

This recurrence enhances computational stability, particularly for equally spaced points x_i = x_0 + i h with small step size h. In contrast to successive approximations that divide by h at each step (as in naive finite difference computations), the denominator x_k - x_0 = k h grows with the order k, reducing the magnification of roundoff errors from repeated small divisions. When the points are ordered monotonically (e.g., increasing), the algorithm is backward stable, with error bounds proportional to machine epsilon times the condition number of the points, avoiding severe cancellation in subtractions for clustered data. For equally spaced grids, this corresponds to scaled forward differences \Delta^k f(x_0) / (k! h^k), computed without intermediate underflow if h is not excessively small.[16][17]

Divided differences also satisfy a Leibniz rule analogous to the product rule for derivatives. For functions g and h, the k-th divided difference of the product f = g h is

f[x_0, \dots, x_k] = \sum_{j=0}^k g[x_0, \dots, x_j] \, h[x_j, \dots, x_k].

This expresses the result as a sum of products of divided differences, with "cross terms" in which the partition point x_j splits the arguments between g and h.
The formula arises from the product of Newton interpolants for g and h, where the leading coefficient of degree k in the product matches this sum, as lower-degree terms vanish at the interpolation points.[18]
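As a sanity check, the Leibniz rule can be verified numerically. The sketch below reuses the recursive divided_diff helper from the Definition section (repeated so the snippet is self-contained); the factors g, h and the nodes are arbitrary illustrative choices.

    def divided_diff(xs, ys):
        if len(xs) == 1:
            return ys[0]
        return (divided_diff(xs[1:], ys[1:])
                - divided_diff(xs[:-1], ys[:-1])) / (xs[-1] - xs[0])

    g = lambda x: x**2                 # illustrative factors: (g*h)(x) = x^3 + x^2
    h = lambda x: x + 1.0
    xs = [0.0, 0.7, 1.3, 2.0]          # arbitrary distinct nodes, k = 3
    k = len(xs) - 1

    lhs = divided_diff(xs, [g(x) * h(x) for x in xs])
    # Leibniz rule: sum over split points j of g[x_0,...,x_j] * h[x_j,...,x_k]
    rhs = sum(divided_diff(xs[:j + 1], [g(x) for x in xs[:j + 1]])
              * divided_diff(xs[j:], [h(x) for x in xs[j:]])
              for j in range(k + 1))
    print(lhs, rhs)    # both ~ 1.0, the leading coefficient of x^3 + x^2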
Symmetry and Uniqueness
One key property of divided differences is their symmetry: the value f[x_0, x_1, \dots, x_k] remains unchanged under any permutation of the distinct points x_0, x_1, \dots, x_k.[7] This ensures that the divided difference is well defined independent of the ordering of the evaluation points in the notation.[7]

The symmetry can be established by induction on the order k using the recurrence relation f[x_0, \dots, x_k] = \frac{f[x_1, \dots, x_k] - f[x_0, \dots, x_{k-1}]}{x_k - x_0}.[19] For the base case k = 0, f[x_0] = f(x_0), which is trivially symmetric. For k = 1, f[x_0, x_1] = \frac{f(x_1) - f(x_0)}{x_1 - x_0} = \frac{f(x_0) - f(x_1)}{x_0 - x_1} = f[x_1, x_0], confirming symmetry. Assuming the property holds for all divided differences of order less than k, the induction step follows by verifying invariance under adjacent transpositions (swapping two consecutive points x_i and x_{i+1}). For such a swap, the recurrence relation and the induction hypothesis applied to the relevant lower-order divided differences over the modified subsets ensure the overall value remains unchanged. Since any permutation of the points can be obtained as a composition of adjacent transpositions, the divided difference is symmetric under arbitrary permutations.[19][20]

A related uniqueness theorem states that, for distinct points x_0, \dots, x_k, the divided difference f[x_0, \dots, x_k] equals the leading coefficient of the unique polynomial p of degree at most k that interpolates f at these points.[7] This uniqueness follows from the fact that if two such polynomials p and q agree at k+1 distinct points, then p - q is a polynomial of degree at most k with k+1 roots, hence identically zero.[21] In Newton's divided difference interpolation formula, this coefficient appears explicitly as f[x_0, \dots, x_k], underscoring the well-defined nature of divided differences.[7]

Divided differences also admit an extension of the mean value theorem: if f is k-times continuously differentiable on an interval containing the points x_0 \leq x_1 \leq \dots \leq x_k, then there exists some \xi in [x_0, x_k] such that f[x_0, \dots, x_k] = \frac{f^{(k)}(\xi)}{k!}.[21] The proof applies Rolle's theorem repeatedly to the error function g(x) = f(x) - p(x), where p is the interpolating polynomial of degree at most k matching f at the k+1 points: g has k+1 zeros, so g' has at least k zeros, and continuing in this way g^{(k)} vanishes at some \xi, whence f^{(k)}(\xi) = p^{(k)}(\xi) = k! \, f[x_0, \dots, x_k].[21] This result generalizes the classical mean value theorem (the case k = 1) and provides a theoretical basis for error analysis in interpolation.[7]

When points are repeated, divided differences are defined via limits as coincident points approach each other, transitioning smoothly to Taylor series coefficients.[7] Specifically, for a point x repeated k+1 times, f[x, x, \dots, x] = \frac{f^{(k)}(x)}{k!}, assuming f is sufficiently differentiable.[21] This limit preserves the recursive structure and enables Hermite interpolation, where the divided differences incorporate both function values and derivatives at repeated nodes.[7]
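The permutation invariance is easy to observe computationally. The sketch below again repeats the recursive divided_diff helper from the Definition section for self-containment; the test function and nodes are illustrative.

    from itertools import permutations

    def divided_diff(xs, ys):
        if len(xs) == 1:
            return ys[0]
        return (divided_diff(xs[1:], ys[1:])
                - divided_diff(xs[:-1], ys[:-1])) / (xs[-1] - xs[0])

    f = lambda x: x**3 - 2.0 * x
    pts = (0.0, 0.5, 1.2, 2.0)          # arbitrary distinct nodes
    values = {round(divided_diff(list(p), [f(x) for x in p]), 10)
              for p in permutations(pts)}
    print(values)    # a single value, 1.0: the ordering of the nodes is irrelevant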
Matrix Form
Vandermonde Matrix Connection
Divided differences can be expressed in a matrix form that connects directly to the Vandermonde matrix through the lens of polynomial interpolation bases. Consider distinct points x_0, x_1, \dots, x_n and function values f(x_i) forming the vector \mathbf{f} = (f(x_0), f(x_1), \dots, f(x_n))^T. The Vandermonde matrix V has entries V_{ij} = x_i^{j} for i, j = 0, \dots, n, and it relates the monomial-basis coefficients \mathbf{a} of the interpolating polynomial to \mathbf{f} via V \mathbf{a} = \mathbf{f}, so \mathbf{a} = V^{-1} \mathbf{f}. In contrast, the vector of divided differences \mathbf{c} = (f[x_0], f[x_0, x_1], \dots, f[x_0, \dots, x_n])^T serves as the coefficient vector in the Newton basis, where the interpolating polynomial is p(x) = \sum_{k=0}^n c_k \prod_{j=0}^{k-1} (x - x_j). This Newton form arises from solving L \mathbf{c} = \mathbf{f}, where L is the lower triangular matrix with L_{ij} = \prod_{m=0}^{j-1} (x_i - x_m) for i \geq j and 0 otherwise, yielding \mathbf{c} = L^{-1} \mathbf{f}.[22][23]

The connection to the Vandermonde matrix emerges via the change of basis between the monomial and Newton forms. The Vandermonde matrix admits the triangular factorization V = L U, where the lower triangular factor L evaluates the Newton basis polynomials at the nodes (as above) and the upper triangular factor U expresses the monomials in the Newton basis. Consequently, V^{-1} = U^{-1} L^{-1}, so the monomial coefficients are \mathbf{a} = U^{-1} (L^{-1} \mathbf{f}) = U^{-1} \mathbf{c}. This factorization shows that the divided differences \mathbf{c} are obtained by premultiplying \mathbf{f} by L^{-1}, which computes the Newton coefficients directly, before transforming to the monomial basis via U^{-1}. The entries of L^{-1} can be computed recursively, exploiting the triangular structure to avoid explicit inversion of the full Vandermonde matrix, which is often ill-conditioned.[22][23]

An explicit formula for the k-th divided difference underscores this matrix perspective:

f[x_0, \dots, x_k] = \sum_{j=0}^k \frac{f(x_j)}{\prod_{\substack{m=0 \\ m \neq j}}^k (x_j - x_m)}.

This expression is the (k+1)-th entry of \mathbf{c} = L^{-1} \mathbf{f} and gives the leading coefficient of the interpolant over those points, linking back to the structure of V^{-1}. Combinatorial interpretations relate the full vector to sums over subsets of the nodes, but the sum form above suffices for computation; determinants of sub-Vandermonde matrices appear in proofs of uniqueness.[24]

Geometrically, this matrix connection interprets divided differences as a coordinate transformation in the vector space of polynomials of degree at most n. The monomial basis \{1, x, x^2, \dots, x^n\} and the Newton basis \{1, \pi_1(x), \pi_2(x), \dots, \pi_n(x)\}, with \pi_k(x) = \prod_{j=0}^{k-1} (x - x_j), span the same space, and the Vandermonde matrix encodes the evaluation map from monomial coefficients to values at the points. The change-of-basis matrix S in V = L S has entries that are symmetric functions of the x_i, facilitating stable conversions between bases for interpolation tasks. This view emphasizes the Newton basis's hierarchical structure, building the polynomial up incrementally, unlike the global nature of the monomial basis captured by V.[22]

When the points are equally spaced, say x_i = x_0 + i h for step size h > 0, divided differences link directly to finite differences, offering computational advantages.
Specifically, the k-th divided difference is f[x_0, \dots, x_k] = \frac{\Delta^k f(x_0)}{k! h^k}, where \Delta^k denotes the forward difference operator. This relation allows efficient computation of divided differences via difference tables, avoiding the instability of direct Vandermonde inversion, as finite differences require only O(n^2) additions and no divisions by small denominators unless h is tiny. Such equivalence underpins numerical differentiation and integration schemes, where the Vandermonde framework would otherwise demand solving ill-conditioned systems.[24]
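The two linear systems can be compared directly in code. The following NumPy sketch (illustrative variable names, arbitrary nodes, f(x) = x^3 - x) solves V a = f for the monomial coefficients and L c = f for the Newton coefficients.

    import numpy as np

    xs = np.array([0.0, 1.0, 2.0, 4.0])
    fs = xs**3 - xs                          # f(x) = x^3 - x sampled at the nodes
    n = len(xs)

    # Monomial coefficients: solve V a = f with V_{ij} = x_i**j
    V = np.vander(xs, increasing=True)
    a = np.linalg.solve(V, fs)

    # Newton coefficients: solve L c = f with L_{ij} = prod_{m<j} (x_i - x_m), i >= j
    L = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1):
            L[i, j] = np.prod([xs[i] - xs[m] for m in range(j)])  # empty product = 1
    c = np.linalg.solve(L, fs)

    print(a)   # [ 0. -1.  0.  1.]: monomial coefficients of x^3 - x
    print(c)   # [ 0.  0.  3.  1.]: divided differences; c[-1] = f[x_0,...,x_3] = 1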
Relation to Polynomials and Power Series
Divided differences provide the coefficients in the Newton form of the interpolating polynomial, which expresses a polynomial p(x) of degree at most n passing through given points (x_0, f(x_0)), \dots, (x_n, f(x_n)) as

p(x) = f[x_0] + f[x_0, x_1](x - x_0) + f[x_0, x_1, x_2](x - x_0)(x - x_1) + \dots + f[x_0, \dots, x_n] \prod_{i=0}^{n-1} (x - x_i),

where the coefficients a_j = f[x_0, \dots, x_j] are the divided differences of order up to j.[25] This form is analogous to the Taylor polynomial expansion but uses products of (x - x_i) instead of powers of (x - x_0), allowing efficient computation and the addition of new points without recomputing prior coefficients.[5]

For analytic functions, divided differences generalize the Taylor series coefficients. When all interpolation points coincide at x_0, the k-th order divided difference reduces exactly to f[x_0, \dots, x_0] = f^{(k)}(x_0)/k! (with x_0 repeated k+1 times), recovering the Taylor coefficients.[5] For distinct but closely spaced points near x_0, the divided difference f[x_0, \dots, x_k] approximates f^{(k)}(\xi)/k! for some \xi in the interval, providing a finite-difference estimate of the derivative that aligns with the Taylor expansion as the points converge.[9]

Divided differences can be extracted from the coefficients of a function's power series expansion using confluent Vandermonde matrices, which generalize the standard Vandermonde matrix to cases with repeated points. The confluent form incorporates derivatives via limits of divided differences, solving for interpolation coefficients in the power basis from function values and derivatives at the nodes.[26]

In Hermite interpolation, which matches both function values and derivatives at the nodes, divided differences are extended to repeated points by defining higher-order differences through successive derivatives: for a point x_0 repeated m times, f[x_0, \dots, x_0] = f^{(m-1)}(x_0)/(m-1)!. This yields the Newton form for Hermite polynomials, bridging polynomial interpolation with Taylor-like expansions at repeated nodes.[27]
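A sketch of this extension: the helper below (hermite_table is an illustrative name) doubles each node, fills first-order entries at coincident nodes with f'(z) per the limit rule, and otherwise applies the usual recurrence. Run on f(x) = e^x at nodes 0 and 1, the first entry of each column gives the Newton coefficients of the cubic Hermite interpolant.

    import math

    def hermite_table(xs, f, fprime):
        """Divided difference table with each node doubled; f'(z) fills f[x, x]."""
        z = [x for x in xs for _ in range(2)]       # nodes repeated twice each
        n = len(z)
        table = [[f(x) for x in z]]
        first = []
        for i in range(n - 1):
            if z[i + 1] == z[i]:
                first.append(fprime(z[i]))          # limit value: f[x, x] = f'(x)
            else:
                first.append((table[0][i + 1] - table[0][i]) / (z[i + 1] - z[i]))
        table.append(first)
        for k in range(2, n):
            prev = table[k - 1]
            table.append([(prev[i + 1] - prev[i]) / (z[i + k] - z[i])
                          for i in range(n - k)])
        return table

    table = hermite_table([0.0, 1.0], math.exp, math.exp)
    print([round(col[0], 5) for col in table])      # [1.0, 1.0, 0.71828, 0.28172]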
Alternative Characterizations
Expanded Form
The expanded form provides an explicit, non-recursive expression for the divided difference of order n at distinct points x_0, x_1, \dots, x_n, given by

f[x_0, x_1, \dots, x_n] = \sum_{j=0}^n \frac{f(x_j)}{\prod_{\substack{0 \leq i \leq n \\ i \neq j}} (x_j - x_i)}.

This formula expresses the divided difference as a sum over the function values f(x_j), each weighted by the reciprocal of the product of differences between x_j and the other points.

To establish equivalence with the recursive definition, the formula can be proved by mathematical induction on n. For the base case n = 0, it reduces to f[x_0] = f(x_0), which holds trivially. Assuming it holds for orders up to n-1, substituting into the recurrence relation f[x_0, \dots, x_n] = \frac{f[x_1, \dots, x_n] - f[x_0, \dots, x_{n-1}]}{x_n - x_0} yields the sum for order n after algebraic simplification, confirming the induction step.[9]

Computationally, this form interprets the divided difference as a weighted linear combination of the function values, where the weights w_j = 1 / \prod_{i \neq j} (x_j - x_i) reflect the influence of each point relative to the others; these weights are precisely the reciprocals of the derivative of the monic polynomial \omega(x) = \prod_{i=0}^n (x - x_i) evaluated at the points x_j.[28]

This explicit representation facilitates theoretical analysis, particularly in complex analysis, where it leads to a contour integral form: if f is analytic in a domain containing the points, then

f[x_0, \dots, x_n] = \frac{1}{2\pi i} \oint_C \frac{f(\zeta) \, d\zeta}{\prod_{k=0}^n (\zeta - x_k)},

with C a closed contour in the domain enclosing all the points.[29]
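A direct Python sketch of the expanded form follows (expanded_dd is an illustrative name); for f(x) = x^3 on four nodes, the result is the leading coefficient, 1.

    def expanded_dd(xs, ys):
        """Compute f[x_0,...,x_n] from the explicit sum with weights 1/prod(x_j - x_i)."""
        total = 0.0
        for j, (xj, yj) in enumerate(zip(xs, ys)):
            w = 1.0
            for i, xi in enumerate(xs):
                if i != j:
                    w *= xj - xi
            total += yj / w
        return total

    xs = [0.0, 0.5, 1.5, 3.0]                   # arbitrary distinct nodes
    print(expanded_dd(xs, [x**3 for x in xs]))  # 1.0 (leading coefficient of x^3)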
Peano Form
The Peano form provides an integral representation of the divided difference f[x_0, \dots, x_k] for a sufficiently smooth function f, expressing it in terms of the k-th derivative and a kernel function known as the Peano kernel. Specifically, if f is k-times continuously differentiable on an interval containing the points x_0, \dots, x_k, then

f[x_0, \dots, x_k] = \frac{1}{k!} \int_a^b f^{(k)}(t) \, K(t) \, dt,

where [a, b] is an interval enclosing the points (typically a = \min_i x_i, b = \max_i x_i) and K(t) is the Peano kernel determined by the points x_0, \dots, x_k.[7][30] This representation exhibits the divided difference as a weighted average of the k-th derivative, analogous to the mean value theorem for divided differences.

This form arises from the Taylor expansion of f with integral remainder. Written with the truncated power function (s - t)^{k-1}_+ = \max(s - t, 0)^{k-1}, the Taylor formula with base point u reads

f(s) = \sum_{j=0}^{k-1} \frac{f^{(j)}(u)}{j!} (s - u)^j + \frac{1}{(k-1)!} \int_u^b f^{(k)}(t) (s - t)^{k-1}_+ \, dt

for s \in [u, b]. Applying the divided difference operator [x_0, \dots, x_k] to both sides in the variable s (with u = a) annihilates the polynomial of degree less than k, leaving only the remainder term:

f[x_0, \dots, x_k] = \frac{1}{(k-1)!} \int_a^b f^{(k)}(t) \, [x_0, \dots, x_k](\cdot - t)^{k-1}_+ \, dt,

where the divided difference acts on the argument marked by the dot. Defining the Peano kernel as

K(t) = k \, [x_0, \dots, x_k] (\cdot - t)^{k-1}_+

and noting that \frac{1}{(k-1)! \, k} = \frac{1}{k!} recovers the stated form. The kernel K is a piecewise polynomial of degree k-1, supported on [a, b], and satisfies \int_a^b K(t) \, dt = 1; it is precisely the normalized B-spline of order k with knots x_0, \dots, x_k.[7][30]

For low-order cases, the kernel takes simple forms. For k = 1 with points x_0 < x_1,

K(t) = \begin{cases} \frac{1}{x_1 - x_0} & t \in [x_0, x_1], \\ 0 & \text{otherwise}, \end{cases}

so f[x_0, x_1] = \frac{1}{x_1 - x_0} \int_{x_0}^{x_1} f'(t) \, dt, recovering the fundamental theorem of calculus. For k = 2 with x_0 < x_1 < x_2, the kernel K(t) = 2 \, [x_0, x_1, x_2] (\cdot - t)_+ is piecewise linear: it rises from 0 at x_0 to a peak at x_1, then falls to 0 at x_2 (uniform spacing yields a symmetric tent function with unit integral). These examples illustrate how the kernel captures the local variation induced by the point distribution.[7]

The Peano form is particularly useful in error analysis for smooth functions, as it bounds the divided difference via the supremum norm of the derivative:

|f[x_0, \dots, x_k]| \leq \frac{\|f^{(k)}\|_\infty}{k!} \int_a^b |K(t)| \, dt.

Since K is nonnegative with \int_a^b K(t) \, dt = 1, the integral factor equals 1, yielding the sharp bound |f[x_0, \dots, x_k]| \leq \|f^{(k)}\|_\infty / k!, consistent with the mean value form f[x_0, \dots, x_k] = f^{(k)}(\xi)/k!. This continuous perspective enables precise estimates in interpolation error and stability analysis without relying on discrete sums, and facilitates asymptotic analysis and extensions to spline approximations.[30]
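A quick numerical check of the k = 1 case follows; it assumes SciPy is available for the quadrature, and the nodes are illustrative.

    import math
    from scipy.integrate import quad

    # For k = 1, the Peano form says f[x_0, x_1] equals the average of f'
    # over [x_0, x_1], since K = 1/(x_1 - x_0) there.
    f, fprime = math.sin, math.cos
    x0, x1 = 0.2, 1.1
    dd = (f(x1) - f(x0)) / (x1 - x0)
    integral, _ = quad(fprime, x0, x1)
    print(dd, integral / (x1 - x0))    # identical up to quadrature tolerance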
Connection to Forward and Backward Differences
When the interpolation points are equally spaced with spacing h, that is, x_i = x_0 + i h, the divided differences simplify to a scaled version of the forward finite differences. Specifically, the k-th order divided difference satisfies

f[x_0, x_1, \dots, x_k] = \frac{\Delta^k f(x_0)}{k! \, h^k},

where \Delta denotes the forward difference operator defined by \Delta f(x) = f(x + h) - f(x) and \Delta^k = \Delta (\Delta^{k-1}).[2] This relation arises because the recursive structure of divided differences aligns with the binomial expansion underlying finite differences for uniform grids.[2]

For backward differences, consider points in reverse order starting from x_n, so that x_{n-k} = x_n - k h. The corresponding divided difference is

f[x_{n-k}, x_{n-k+1}, \dots, x_n] = \frac{\nabla^k f(x_n)}{k! \, h^k},

where \nabla is the backward difference operator given by \nabla f(x) = f(x) - f(x - h) and \nabla^k = \nabla (\nabla^{k-1}).[2] This connection highlights the symmetry between forward and backward formulations in the equally spaced case.

These relations yield the Newton forward and backward interpolation formulas, which express the interpolating polynomial in terms of binomial coefficients and finite differences. The forward form, centered at x_0, is

P_n(x_0 + s h) = \sum_{k=0}^n \binom{s}{k} \Delta^k f(x_0),

where \binom{s}{k} = \frac{s(s-1) \cdots (s-k+1)}{k!}. The backward form, centered at x_n, is

P_n(x_n + s h) = \sum_{k=0}^n \binom{s + k - 1}{k} \nabla^k f(x_n),

where s is typically negative for evaluation points inside the table. The table below summarizes these formulas and their coefficient structures:
Formula Type      Polynomial Expression                                              Coefficient Link
Newton Forward    P_n(x_0 + s h) = \sum_{k=0}^n \binom{s}{k} \Delta^k f(x_0)         Binomial coefficients \binom{s}{k}
Newton Backward   P_n(x_n + s h) = \sum_{k=0}^n \binom{s+k-1}{k} \nabla^k f(x_n)     Generalized binomial coefficients \binom{s+k-1}{k}
These binomial forms facilitate efficient computation on uniform grids by avoiding direct divided difference tables.[2]

Although finite differences offer computational advantages for equal spacings, they are inapplicable to unequal spacings, where general divided differences must be used instead. However, even with equal spacings, higher-order divided differences (and thus finite differences) can become ill-conditioned due to error amplification in the recurrence relations, particularly when points are closely clustered or in finite-precision arithmetic.[31] For unequal spacings, this ill-conditioning worsens because small denominators in the divided difference recurrence lead to subtractive cancellation and magnified rounding errors, motivating robust algorithms such as scaling and squaring for accurate computation.[31]
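The scaling relation is easy to confirm numerically; the sketch below compares the recursive divided difference (helper repeated from the Definition section) with \Delta^k f(x_0)/(k! h^k) for f = e^x on a uniform grid of illustrative size.

    import math

    def divided_diff(xs, ys):
        if len(xs) == 1:
            return ys[0]
        return (divided_diff(xs[1:], ys[1:])
                - divided_diff(xs[:-1], ys[:-1])) / (xs[-1] - xs[0])

    f, x0, h, k = math.exp, 0.0, 0.1, 3        # illustrative function, grid, and order
    xs = [x0 + i * h for i in range(k + 1)]
    ys = [f(x) for x in xs]

    # Forward differences Delta^k f(x_0) via k rounds of first differences
    d = ys[:]
    for _ in range(k):
        d = [d[i + 1] - d[i] for i in range(len(d) - 1)]

    print(divided_diff(xs, ys))                 # k-th divided difference
    print(d[0] / (math.factorial(k) * h**k))    # same value via forward differences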
Applications
Newton's Divided Difference Interpolation
Newton's divided difference interpolation constructs the unique polynomial p_n(x) of degree at most n that passes through n+1 given points (x_i, f(x_i)) for i = 0, \dots, n, where the x_i are distinct. The polynomial takes the nested form

p_n(x) = f[x_0] + f[x_0, x_1](x - x_0) + f[x_0, x_1, x_2](x - x_0)(x - x_1) + \cdots + f[x_0, \dots, x_n] \prod_{j=0}^{n-1} (x - x_j),

or more compactly,

p_n(x) = \sum_{k=0}^n f[x_0, \dots, x_k] \prod_{j=0}^{k-1} (x - x_j),

with the empty product defined as 1. This form leverages the divided differences f[x_0, \dots, x_k] as coefficients, which approximate higher-order derivatives scaled by factorials when the points are close.[4][32]

To build the interpolant, first compute the divided differences using a table constructed via the recurrence relation. Initialize the zeroth-order column with f[x_i] = f(x_i) for i = 0, \dots, n. Then, for orders k = 1 to n, compute each k-th order divided difference as

f[x_i, \dots, x_{i+k}] = \frac{f[x_{i+1}, \dots, x_{i+k}] - f[x_i, \dots, x_{i+k-1}]}{x_{i+k} - x_i}

for each valid i. The entries along the upper diagonal, f[x_0, \dots, x_k] for k = 0, \dots, n, are the coefficients used in the Newton form. This tabular algorithm systematically generates all required differences in O(n^2) operations, facilitating efficient computation even for unequally spaced points.[33][34]

The Newton form offers key advantages over the Lagrange interpolation formula. It supports hierarchical construction: adding a new point x_{n+1} requires only extending the table to compute the additional divided differences, leaving prior coefficients unchanged, which is ideal for iterative refinement. Furthermore, evaluation uses nested multiplication akin to Horner's scheme, enhancing numerical stability by minimizing operations and reducing the propagation of rounding errors, particularly for high-degree polynomials or ill-conditioned data.[35][36]

For illustration, consider interpolating f(x) = e^x at points x_0 = 0, x_1 = 1, x_2 = 2. The divided difference table (using e \approx 2.71828, e^2 \approx 7.38906) is
x_i    f[x_i]      1st-order    2nd-order
0      1
                   1.71828
1      2.71828                  1.47625
                   4.67078
2      7.38906
The interpolating polynomial is p_2(x) = 1 + 1.71828x + 1.47625 x (x-1). This quadratic matches e^x exactly at the points and provides a local approximation elsewhere.[4][37]
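Evaluation of the Newton form by nested multiplication can be sketched as follows (newton_eval is an illustrative name); applied to the e^x example above at x = 1.5, it gives p_2(1.5) \approx 4.6846 versus e^{1.5} \approx 4.4817.

    def newton_eval(x, xs, coeffs):
        """Evaluate sum_k c_k * prod_{j<k} (x - x_j) by Horner-style nesting."""
        result = coeffs[-1]
        for c, xj in zip(coeffs[-2::-1], xs[-2::-1]):
            result = result * (x - xj) + c
        return result

    xs = [0.0, 1.0, 2.0]
    coeffs = [1.0, 1.71828, 1.47625]       # upper diagonal of the table above
    print(newton_eval(1.5, xs, coeffs))    # ~4.6846, vs e^1.5 ~ 4.4817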
The error in approximating a function f by its Newton divided difference interpolating polynomial p_n(x) of degree at most n, based on distinct points x_0, \dots, x_n, is

f(x) - p_n(x) = f[x_0, \dots, x_n, x] \prod_{j=0}^n (x - x_j)

for any x \notin \{x_0, \dots, x_n\}, where f[x_0, \dots, x_n, x] denotes the (n+1)-th divided difference extended to include x.[38]

Assuming f is (n+1)-times continuously differentiable on an interval containing the points, the divided difference in the error term satisfies

f[x_0, \dots, x_n, x] = \frac{f^{(n+1)}(\xi)}{(n+1)!}

for some \xi in the convex hull of \{x_0, \dots, x_n, x\}.[39] This yields the bound

|f(x) - p_n(x)| \leq \frac{M_{n+1}}{(n+1)!} \prod_{j=0}^n |x - x_j|,

where M_{n+1} = \max |f^{(n+1)}(\zeta)| over the relevant interval.[40] The factorial denominator drives the bound toward zero as n \to \infty when the higher derivatives remain uniformly bounded, though the product term grows with the spread of the nodes.

For analytic functions f (holomorphic in a complex domain), the interpolation sequence p_n converges uniformly to f on compact subsets of the domain when suitable node distributions are used, such as Chebyshev nodes scaled to the interval.[19] However, with equispaced nodes over large intervals, high-degree polynomials exhibit the Runge phenomenon, in which oscillations amplify near the endpoints and prevent convergence; this was first demonstrated for f(x) = 1/(1 + 25x^2) on [-1, 1]. To mitigate this and promote convergence, Chebyshev nodes (the projections onto the interval of equispaced points on the unit circle) distribute the error more evenly, reducing endpoint overshoot without altering the polynomial degree.[19]
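As a concrete check of the bound, the following sketch evaluates the actual error and the bound for the e^x example at x = 1.5 with M_3 = e^2 on [0, 2]; all numbers come from the worked example above.

    import math

    x = 1.5
    p = 1.0 + 1.71828 * x + 1.47625 * x * (x - 1.0)   # Newton form from the example
    err = abs(math.exp(x) - p)
    # Bound: M_3 / 3! * |x - 0| |x - 1| |x - 2| with M_3 = e^2 on [0, 2]
    bound = math.exp(2.0) / math.factorial(3) * abs(x * (x - 1.0) * (x - 2.0))
    print(err, bound)    # err ~ 0.203 <= bound ~ 0.462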