Interpolation
Interpolation is a fundamental method in numerical analysis and mathematics for estimating unknown values within a range of known discrete data points by constructing a continuous function that passes exactly through those points.[1] This process, often involving polynomials or piecewise functions, enables the approximation of smooth curves from sampled data, distinguishing it from broader approximation techniques, which need not fit the data exactly at the given points.[1]

The practice of interpolation has ancient origins, with Babylonian astronomers employing it around 300 BC for tabular computations, followed by the Greek scholar Hipparchus around 150 BC in astronomical predictions.[2] The term "interpolation" first appeared in 1655 in a Latin text by John Wallis, but systematic development began in the 17th century with Thomas Harriot's work in 1611 and Isaac Newton's foundational contributions starting in 1675, including finite difference methods for polynomial interpolation.[2] In the 18th century, Edward Waring discovered the Lagrange interpolation formula in 1779; it was independently published by Joseph-Louis Lagrange in 1795, providing an explicit basis for polynomial forms.[3]

Key methods include polynomial interpolation, in which a single polynomial of degree at most n is fitted to n+1 distinct points; the solution is unique because a nonzero polynomial of degree at most n cannot have n+1 distinct roots.[1] The Lagrange form expresses the interpolant as a weighted sum of basis polynomials, each associated with one data point, while the Newton form uses divided differences for efficient computation and extension to additional points.[1] To improve stability and avoid the oscillations of high-degree polynomials known as Runge's phenomenon, piecewise interpolation methods such as splines divide the domain into subintervals, fitting low-degree polynomials (e.g., linear or cubic) on each while enforcing continuity conditions.[1]

Interpolation finds wide application in data analysis, computer graphics for curve rendering, signal processing for resampling, and numerical methods such as quadrature (integration) and differentiation, where it derives approximation rules from fitted polynomials.[1] Error estimation is essential; the error is typically bounded in terms of a higher derivative of the underlying function and the distribution of the nodes, guiding the choice of method for accuracy and computational efficiency.[1]

Fundamentals
Definition
In numerical analysis, interpolation is the process of constructing a function that passes exactly through a given set of discrete data points in order to estimate values at intermediate locations.[1] Specifically, given n+1 distinct points (x_0, y_0), (x_1, y_1), \dots, (x_n, y_n), where the x_i are the nodes and y_i = f(x_i) are the values of some underlying function f, interpolation seeks a function p such that p(x_i) = y_i for i = 0, 1, \dots, n, allowing evaluation of p(x) at points x that do not coincide with any node.[1][4]

A key distinction from function approximation is that interpolation requires an exact fit at the specified nodes, whereas approximation methods, such as least-squares fitting, seek to minimize overall error without necessarily passing through every point, typically when the number of data points exceeds the degree of the approximating function.[1] For instance, in polynomial interpolation of degree at most n through n+1 points, the solution is unique, ensuring precise reproduction at the nodes.[1]

Common formulations include the Lagrange interpolating polynomial, expressed as p(x) = \sum_{i=0}^n y_i \ell_i(x), where \ell_i(x) = \prod_{j=0, j \neq i}^n \frac{x - x_j}{x_i - x_j} are the basis polynomials, and the Newton form, which builds the interpolant incrementally using divided differences. Both forms represent the same unique interpolating polynomial and satisfy the exact-fit requirement.[5]
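As a concrete illustration of the Lagrange form, the following minimal sketch evaluates p(x) = \sum_i y_i \ell_i(x) directly from the definition of the basis polynomials. The function name lagrange_eval and the sample data are illustrative assumptions, not drawn from the sources or from any particular library.

```python
# Minimal sketch of evaluating the Lagrange interpolating polynomial.
# The data points and the helper name are illustrative.

def lagrange_eval(x, xs, ys):
    """Evaluate the Lagrange interpolant at x for distinct nodes xs with values ys."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        # Basis polynomial l_i(x): equal to 1 at xs[i] and 0 at every other node.
        basis = 1.0
        for j in range(n):
            if j != i:
                basis *= (x - xs[j]) / (xs[i] - xs[j])
        total += ys[i] * basis
    return total

xs, ys = [0.0, 1.0, 2.0], [1.0, 3.0, 2.0]
print(lagrange_eval(1.0, xs, ys))   # 3.0: the interpolant reproduces y_1 exactly
print(lagrange_eval(0.5, xs, ys))   # 2.375: an estimate between the nodes
```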
Motivation and History

Interpolation serves as a fundamental tool in numerical analysis for estimating unknown values within discrete datasets, enabling the approximation of continuous functions from sampled points in fields such as scientific measurement, engineering design, and data visualization.[6] This need arises because real-world data is often collected at irregular or finite intervals, requiring interpolation to fill gaps, smooth curves, or predict intermediate values for practical applications like modeling physical phenomena or generating graphical representations.[7]

The practice of interpolation dates back to ancient astronomy, where Claudius Ptolemy employed linear interpolation techniques in the 2nd century AD to construct tables of chord functions in his Almagest, facilitating the prediction of planetary positions from discrete observations.[8] In the 17th century, Isaac Newton laid foundational work on finite differences and interpolation in a 1675 letter, establishing methods for polynomial approximation that influenced classical theory.[9] Joseph-Louis Lagrange advanced this in 1795 with his explicit formula for polynomial interpolation, providing a systematic way to construct unique polynomials passing through given points.[9]

By the early 20th century, Carl Runge highlighted limitations in 1901, demonstrating through examples that high-degree polynomial interpolation on equispaced points could lead to oscillatory errors, known as Runge's phenomenon, which underscored the need for careful node selection in approximations.[10] Key advancements in the mid-20th century included the mathematical formalization of splines by I. J. Schoenberg in 1946, inspired by flexible wooden or metal strips used in shipbuilding to draw smooth hull curves, leading to piecewise polynomial methods that avoid global oscillations.[11] In geostatistics, D. G. Krige developed early statistical interpolation techniques in 1951 for estimating ore grades in South African mining, which Georges Matheron formalized as kriging in 1960, introducing optimal unbiased prediction under spatial correlation assumptions.[12]

The advent of digital computing in the 1950s propelled interpolation into numerical methods for solving differential equations and data processing on early machines like ENIAC, enabling efficient implementation of algorithms for engineering simulations. In the modern era, interpolation has integrated with artificial intelligence, particularly post-2020, for data imputation in large-scale machine learning datasets, where methods like Gaussian process regression variants enhance missing value estimation while preserving statistical properties in high-dimensional data.[13]

Univariate Interpolation Methods
Nearest-Neighbor Interpolation
Nearest-neighbor interpolation, also known as piecewise constant interpolation or zero-order hold, is the simplest method for estimating values between known data points in univariate interpolation. It assigns to a query point x the value y_i of the closest data point x_i from a given set of pairs (x_j, y_j) for j = 1 to n, without any blending or smoothing.[14][15] This approach is particularly suited for categorical data or scenarios where smoothness is not required, producing a step-like function with constant values in intervals defined by midpoints between neighboring points.[16]

The mathematical formulation is

f(x) = y_i \quad \text{where} \quad i = \arg\min_{j=1,\dots,n} |x - x_j|.

In cases of ties (equal distances to multiple points), a convention such as selecting the leftmost or rightmost point, or rounding toward even indices, is typically applied to ensure determinism.[15]

The algorithm involves, for a query point x, computing the Euclidean distance (the absolute difference in one dimension) to each x_j and selecting the y_j with the minimum distance; this naive implementation runs in O(n) time per query. With preprocessing, such as sorting the x_j or using a search structure like a binary search tree, the time complexity can be reduced to O(\log n), or O(1) for uniform grids.[15][16]

Consider a simple example with data points (0, 1), (1, 3), and (2, 2). For a query at x = 0.6, the distances are |0.6 - 0| = 0.6, |0.6 - 1| = 0.4, and |0.6 - 2| = 1.4, so the closest point is (1, 3) and f(0.6) = 3. This results in a discontinuous step function: constant at 1 for x \in [-0.5, 0.5), 3 for x \in [0.5, 1.5), and 2 for x \in [1.5, 2.5), visualizing as a staircase with jumps at the decision boundaries.[16]

Nearest-neighbor interpolation offers significant computational efficiency, requiring minimal operations beyond distance comparisons, making it ideal for real-time applications or large datasets where speed trumps accuracy. It also preserves the exact range of input values, avoiding extrapolation beyond the data. However, it produces non-differentiable, discontinuous outputs that poorly approximate smooth underlying functions, leading to visual artifacts like blockiness in images or aliasing in signals.[14][15][16]
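A minimal sketch of one-dimensional nearest-neighbor lookup under these conventions is shown below; it assumes the nodes are pre-sorted, uses binary search for the O(\log n) lookup, and breaks ties toward the left node. The function name nearest_neighbor and the example data are illustrative assumptions.

```python
# Illustrative 1-D nearest-neighbor interpolation with an O(log n) lookup
# via binary search on sorted nodes; ties break toward the left node here.
from bisect import bisect_left

def nearest_neighbor(x, xs, ys):
    """Return the y value of the node closest to x; xs must be sorted ascending."""
    pos = bisect_left(xs, x)            # insertion point of x among the sorted nodes
    if pos == 0:
        return ys[0]
    if pos == len(xs):
        return ys[-1]
    left, right = xs[pos - 1], xs[pos]
    # A tie (x exactly midway) goes to the left node via the <= comparison.
    return ys[pos - 1] if x - left <= right - x else ys[pos]

xs, ys = [0.0, 1.0, 2.0], [1.0, 3.0, 2.0]
print(nearest_neighbor(0.6, xs, ys))   # 3.0: x = 0.6 is closest to the node at 1
print(nearest_neighbor(0.5, xs, ys))   # 1.0: tie broken toward the left node
```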
Linear Interpolation

Linear interpolation is a fundamental method in numerical analysis for estimating values between known data points by constructing a piecewise linear function that connects consecutive points with straight line segments, assuming the data points are ordered by their independent variable values x_i. This approach provides a simple, first-order approximation that is computationally efficient and preserves monotonicity within each interval.[17]

The interpolated value f(x) for a query point x lying within the interval [x_i, x_{i+1}], where the known points are (x_i, y_i) and (x_{i+1}, y_{i+1}) with x_i < x_{i+1}, is given by the formula

f(x) = y_i + (y_{i+1} - y_i) \frac{x - x_i}{x_{i+1} - x_i}.

This expression reproduces the data values exactly at the interval endpoints.[17]

The formula derives from the concept of a weighted average, where f(x) is a convex combination of y_i and y_{i+1}. The weight for y_{i+1} is the relative distance \frac{x - x_i}{x_{i+1} - x_i}, which ranges from 0 to 1 across the interval, and the weight for y_i is the complement 1 - \frac{x - x_i}{x_{i+1} - x_i}. This linear weighting follows directly from the equation of the straight line passing through the two points, parameterized by the slope m = \frac{y_{i+1} - y_i}{x_{i+1} - x_i}.[18]

To illustrate, consider the data points (0, 0), (1, 1), and (2, 0). The piecewise linear interpolant consists of two segments: one from x = 0 to x = 1 and one from x = 1 to x = 2. The following table shows interpolated values at selected points within these intervals:

| x | Interval | f(x) |
|---|---|---|
| 0.0 | [0,1] | 0.0 |
| 0.25 | [0,1] | 0.25 |
| 0.5 | [0,1] | 0.5 |
| 0.75 | [0,1] | 0.75 |
| 1.0 | [0,1] or [1,2] | 1.0 |
| 1.25 | [1,2] | 0.75 |
| 1.5 | [1,2] | 0.5 |
| 1.75 | [1,2] | 0.25 |
| 2.0 | [1,2] | 0.0 |
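Under the same assumptions (sorted nodes, with queries outside the data range clamped to the endpoint values), a minimal piecewise-linear interpolation sketch of the formula above looks as follows; the function name lerp and the example data mirror the table and are purely illustrative.

```python
# Minimal piecewise-linear interpolation over sorted nodes; out-of-range
# queries are clamped to the endpoint values as a simple convention.
from bisect import bisect_right

def lerp(x, xs, ys):
    """Piecewise-linear interpolation at x for sorted nodes xs and values ys."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1              # index of the interval [xs[i], xs[i+1]]
    t = (x - xs[i]) / (xs[i + 1] - xs[i])    # relative position in the interval, in [0, 1]
    return ys[i] + (ys[i + 1] - ys[i]) * t   # convex combination of the endpoint values

xs, ys = [0.0, 1.0, 2.0], [0.0, 1.0, 0.0]
print(lerp(0.25, xs, ys))   # 0.25, matching the table above
print(lerp(1.75, xs, ys))   # 0.25
```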
Polynomial Interpolation
Polynomial interpolation constructs a unique polynomial p(x) of degree at most n-1 that passes exactly through n given distinct points (x_i, y_i) for i = 0, 1, \dots, n-1, where y_i = f(x_i) for some underlying function f. This global method applies the same polynomial across the entire domain, making it suitable for exact fitting but prone to instability for high degrees or ill-conditioned points.[22][23]

One common representation is the Lagrange form, which expresses p(x) directly in terms of the data points without solving a system of equations:

p(x) = \sum_{i=0}^{n-1} y_i \ell_i(x), \qquad \ell_i(x) = \prod_{\substack{j=0 \\ j \neq i}}^{n-1} \frac{x - x_j}{x_i - x_j}.

Each \ell_i(x) is 1 at x = x_i and 0 at the other points x_j (j \neq i), ensuring the interpolation conditions are satisfied. This form is intuitive for theoretical analysis but computationally inefficient for large n due to the product evaluations.[22][23]

An alternative is the Newton form, which builds the polynomial incrementally using divided differences and is more efficient for adding points or evaluating at multiple locations:

p(x) = f[x_0] + f[x_0, x_1](x - x_0) + f[x_0, x_1, x_2](x - x_0)(x - x_1) + \cdots + f[x_0, \dots, x_{n-1}] \prod_{k=0}^{n-2} (x - x_k),

where the coefficients are the divided differences defined recursively: f[x_i] = y_i, and for k \geq 1,

f[x_i, \dots, x_{i+k}] = \frac{f[x_{i+1}, \dots, x_{i+k}] - f[x_i, \dots, x_{i+k-1}]}{x_{i+k} - x_i}.

These can be computed via a divided-difference table, facilitating numerical stability and error estimation. For equispaced points, forward differences simplify the process, but the general form handles arbitrary spacing.[24][23]

Example: Cubic Interpolation for Rocket Velocity

Consider interpolating the upward velocity v(t) of a rocket at times t = 0, 2, 4, 6 seconds, with the following data:

| t (s) | v(t) (m/s) |
|---|---|
| 0 | 0 |
| 2 | 227 |
| 4 | 362 |
| 6 | 517 |
The divided differences for this data, arranged in the standard staggered table with each entry placed between the rows it is computed from, are:

| t_i | f[t_i] | First-order | Second-order | Third-order |
|---|---|---|---|---|
| 0 | 0 | | | |
| | | 113.5 | | |
| 2 | 227 | | -11.5 | |
| | | 67.5 | | 2.333 |
| 4 | 362 | | 2.5 | |
| | | 77.5 | | |
| 6 | 517 | | | |

Reading the coefficients off the top diagonal gives the Newton-form interpolant v(t) \approx 0 + 113.5 t - 11.5 t(t - 2) + 2.333 t(t - 2)(t - 4), which reproduces all four tabulated velocities exactly and can be used to estimate the velocity at intermediate times.
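The coefficients on the top diagonal are exactly the Newton coefficients, and they can be computed and evaluated programmatically. The sketch below applies Newton's divided differences to the rocket data above; the function names divided_differences and newton_eval are illustrative assumptions rather than a standard API.

```python
# Sketch of Newton's divided-difference interpolation for the rocket data above.
# Function names are illustrative, not from a particular library.

def divided_differences(xs, ys):
    """Return the Newton coefficients f[x_0], f[x_0,x_1], ..., f[x_0,...,x_{n-1}]."""
    coeffs = list(ys)
    n = len(xs)
    for k in range(1, n):
        # After pass k, coeffs[i] holds the k-th order difference f[x_{i-k}, ..., x_i].
        for i in range(n - 1, k - 1, -1):
            coeffs[i] = (coeffs[i] - coeffs[i - 1]) / (xs[i] - xs[i - k])
    return coeffs

def newton_eval(x, xs, coeffs):
    """Evaluate the Newton-form polynomial at x using nested (Horner-like) multiplication."""
    result = coeffs[-1]
    for k in range(len(coeffs) - 2, -1, -1):
        result = result * (x - xs[k]) + coeffs[k]
    return result

ts = [0.0, 2.0, 4.0, 6.0]          # times (s)
vs = [0.0, 227.0, 362.0, 517.0]    # velocities (m/s)
c = divided_differences(ts, vs)
print(c)                            # [0.0, 113.5, -11.5, 2.333...], the top diagonal of the table
print(newton_eval(4.0, ts, c))      # 362.0: the interpolant reproduces the data exactly
print(newton_eval(3.0, ts, c))      # ~299: estimated velocity (m/s) at t = 3 s
```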