Remez algorithm
The Remez algorithm is an iterative numerical method for determining the unique polynomial of best uniform (minimax) approximation to a continuous function on a closed interval, minimizing the maximum deviation between the function and its polynomial approximation of a given degree.[1] Developed by Soviet mathematician Evgeny Yakovlevich Remez in 1934,[2][3] it leverages Chebyshev's equioscillation theorem, which states that the error in the optimal approximation equioscillates—reaching its maximum absolute value with alternating signs—at n+2 or more points for a polynomial of degree n.[2] This approach ensures the approximation is optimal in the L∞ norm, distinguishing it from least-squares methods that minimize average error.[4]
The algorithm proceeds through an exchange process: it begins with an initial set of n+2 reference points (often Chebyshev nodes for stability), solves a linear system to find polynomial coefficients that force the weighted error to equioscillate at these points, and then updates the points by locating the actual extrema of the error function via root-finding or optimization techniques.[5] Iterations continue until the reference points converge to the true equioscillation points, typically requiring few steps for rapid convergence, though careful initialization is crucial to avoid local minima.[1] For rational approximations, the method extends by linearizing the nonlinear denominator, solving iteratively for numerator and denominator coefficients while estimating the error bound.[4]
Remez's work built on earlier approximation theory, including Chebyshev's 1853 contributions, and has influenced fields beyond pure mathematics, such as digital signal processing where the Parks-McClellan algorithm—a discrete variant—designs optimal FIR filters with minimal passband/stopband ripple.[2] Implementations appear in numerical libraries like Boost.Math and Wolfram's tools, underscoring its practical utility in approximating special functions (e.g., Bessel functions) and engineering applications requiring high-fidelity low-degree approximations.[5] Despite its efficiency, the algorithm can be sensitive to function smoothness and interval choice, prompting variants for weighted or discrete data scenarios.[4]
Overview and History
Definition and Purpose
The Remez algorithm is an iterative, exchange-based procedure designed to construct the best uniform-norm polynomial approximation to a continuous function on a closed interval.[6] Given a continuous function f(x) defined on [a, b] and a specified degree n, it seeks a polynomial p_n(x) of degree at most n that minimizes the maximum deviation, formally \|f - p_n\|_\infty = \max_{x \in [a, b]} |f(x) - p_n(x)|.[7] This minimax approximation is unique and characterized by the property that the error f(x) - p_n(x) attains its maximum absolute value at n+2 or more points in [a, b] with alternating signs, known as equioscillation.[6]
The primary purpose of the Remez algorithm is to achieve optimal error control in the Chebyshev norm (uniform norm), ensuring the smallest possible maximum deviation across the entire interval rather than an average measure.[6] This makes it particularly valuable in numerical computations where bounding the worst-case error is critical, such as in filter design or function evaluation routines.[6] In contrast to least-squares methods, which minimize the integral of squared errors and may yield large localized deviations, the Remez approach provides superior uniform approximation by directly targeting the supremum norm.[6]
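As a concrete illustration of the objective, the uniform error of any candidate polynomial can be estimated by dense sampling. The following minimal NumPy sketch (the helper name and sampling density are illustrative choices, not part of any standard library) measures the uniform norm of a least-squares fit—the quantity a minimax polynomial of the same degree is guaranteed to reduce:

import numpy as np

def uniform_error(f, p, a, b, num=10001):
    # Estimate ||f - p||_inf on [a, b] by dense sampling.
    x = np.linspace(a, b, num)
    return np.max(np.abs(f(x) - p(x)))

# A degree-3 least-squares fit to sin on [0, pi/2]; its uniform error
# is what the Remez algorithm would improve in the worst case.
a, b = 0.0, np.pi / 2
x = np.linspace(a, b, 200)
p = np.poly1d(np.polyfit(x, np.sin(x), 3))   # least-squares, not minimax
print(uniform_error(np.sin, p, a, b))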
Historical Development
The Remez algorithm was developed by the Soviet mathematician Evgeny Yakovlevich Remez and first published in 1934 through a series of three papers in mathematical journals, including contributions in the Communications of the Kharkov Mathematical Society and Comptes Rendus de l'Académie des Sciences.[8] These publications presented an iterative procedure for computing minimax polynomial approximations, emerging from Remez's work at institutions in Kharkov and Kiev.[3] The algorithm built upon earlier foundational results in approximation theory, notably Pafnuty Chebyshev's 19th-century work on minimax polynomials and Charles-Jean de la Vallée Poussin's 1919 theorem providing bounds on approximation errors.[8]
Amid the flourishing of Soviet mathematics in the 1930s, characterized by advances in constructive function theory despite political challenges, Remez's contributions addressed practical numerical needs for uniform approximations of continuous functions.[3] Early manual computations using the method, such as approximations to the absolute value function up to degree 11, were performed by Remez's students at the University of Kiev, demonstrating its feasibility before widespread computing.[8]
Post-World War II, the algorithm received increasing attention in Western literature during the 1950s and 1960s, aligning with the growth of electronic computers and numerical analysis.[8] Implementations appeared in FORTRAN-based software libraries, including ACM Algorithm 318 in 1967 for polynomial approximation and subsequent routines in the 1970s.[8] By the mid-20th century, it influenced developments in related areas, such as algorithms for rational function approximations and digital filter design, exemplified by the Parks-McClellan method in 1972.[8]
The 75th anniversary of the algorithm's publication in 2009 prompted reflections on its lasting impact, with analyses emphasizing its role in modern numerical methods.[8] Contemporary adaptations, including barycentric interpolation forms integrated into systems like Chebfun, have enhanced its applicability to high-degree polynomials and complex domains.[8]
Mathematical Foundations
Chebyshev Approximation Theory
The uniform norm, also known as the supremum or infinity norm, measures the maximum deviation of an approximation from the target function over a given interval [a, b], defined as \|e\|_\infty = \max_{x \in [a,b]} |f(x) - p(x)|, where f is the continuous function to be approximated and p is a polynomial of degree at most n.[9] The minimax problem seeks the polynomial p_n that minimizes this norm, yielding the best uniform approximation, which ensures the smallest possible worst-case error across the interval.[9]
For a continuous function f on a compact interval [a, b], there exists a unique polynomial of degree at most n that achieves this minimal uniform norm, distinguishing uniform approximation from other settings where multiple solutions may exist.[10] This uniqueness guarantees a well-defined best approximation, facilitating theoretical analysis and practical computation in approximation theory.
Chebyshev polynomials of the first kind, T_n(x), play a central role: T_n has leading coefficient 2^{n-1} for n \geq 1 and equioscillates between -1 and 1 on [-1, 1], so the monic polynomial 2^{1-n} T_n(x) has the smallest maximum deviation from zero among all monic polynomials of degree n, namely 1/2^{n-1}.[11] To extend this to an arbitrary interval [a, b], Chebyshev polynomials are scaled and shifted via the affine transformation x = \frac{b-a}{2} t + \frac{a+b}{2}, where t \in [-1, 1], preserving the minimax properties while adapting to the new domain.[12]
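Both facts are easy to check numerically. A short sketch using NumPy's Chebyshev module (variable names are illustrative, and the printed values follow from the deviation bound above):

import numpy as np

n = 6
t = np.linspace(-1.0, 1.0, 10001)
# Monic version of T_n: divide out the leading coefficient 2^(n-1).
monic = np.polynomial.chebyshev.Chebyshev.basis(n) / 2**(n - 1)
print(np.max(np.abs(monic(t))), 1 / 2**(n - 1))   # both equal 1/32 for n = 6

# Affine transformation carrying t in [-1, 1] to x in [a, b].
a, b = 0.0, 2.0
x = 0.5 * (b - a) * t + 0.5 * (a + b)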
The uniform norm is preferred in numerical analysis over other norms, such as the L^2 norm, because it directly bounds the worst-case error, which is critical for applications requiring guaranteed performance across the entire interval rather than average behavior.[13]
Equioscillation Theorem
The equioscillation theorem, also known as Chebyshev's alternation theorem, provides a fundamental characterization of the best uniform (minimax) polynomial approximation to a continuous function on a closed interval. Specifically, for a continuous function f: [a, b] \to \mathbb{R} and polynomials of degree at most n, a polynomial p_n is the unique best uniform approximation to f if and only if the error e(x) = f(x) - p_n(x) equioscillates at n+2 (or more) points in [a, b]. That is, there exist points x_0 < x_1 < \cdots < x_{n+1} such that |e(x_i)| = \|e\|_\infty for all i = 0, \dots, n+1, and the signs alternate: e(x_{i+1}) = -e(x_i).[14][15]
A closely related result is the de la Vallée Poussin theorem: if there exist n+2 points t_0 < t_1 < \cdots < t_{n+1} in [a, b] at which the error of some polynomial p of degree at most n alternates in sign, i.e. f(t_i) - p(t_i) = (-1)^i \epsilon_i with all \epsilon_i nonzero and of the same sign, then the best approximation error satisfies E_n(f) \geq \min_i |\epsilon_i|.[16]
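This gives a cheap computable certificate: any trial polynomial whose error alternates at n+2 points yields a lower bound on the attainable minimax error. A minimal sketch (the function name is hypothetical; p is any callable polynomial):

import numpy as np

def vallee_poussin_bound(f, p, points):
    # Lower bound on E_n(f) from n+2 points where f - p alternates in sign.
    pts = np.asarray(points)
    e = f(pts) - p(pts)
    assert np.all(e[:-1] * e[1:] < 0), "error must alternate in sign"
    return float(np.min(np.abs(e)))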
The proof of the equioscillation theorem relies on a contradiction argument. Assume p_n is the best approximation but the error equioscillates at fewer than n+2 points. Let M(e) = \{x \in [a, b] : |e(x)| = \|e\|_\infty\} denote the set of extremal points. With fewer than n+2 alternating extrema, one can construct a nonzero polynomial r of degree at most n such that e(x) r(x) > 0 at all points in M(e). Then, for sufficiently small \delta > 0, the modified approximation p_n + \delta r yields a smaller uniform error, contradicting the optimality of p_n. The intermediate value theorem ensures that the perturbed error does not introduce new extrema that exceed the original norm, as the sign consistency at existing extrema forces the adjustment to reduce the maximum deviation.[14]
The theorem has key implications for uniqueness and optimality in uniform approximation. Equioscillation at n+2 (or more) points guarantees that p_n is the unique minimax polynomial, as any other approximation would violate the alternation property. Fewer than n+2 alternating points indicates a suboptimal approximation, where further refinement can reduce the error norm. This property underpins the theoretical foundation for algorithms seeking minimax solutions.[14][15]
Algorithm Procedure
Initialization Methods
A good initialization is crucial for the Remez algorithm's efficiency, as it influences convergence speed and helps avoid local minima or stagnation in the iterative process.[8] Poor starting points can lead to slower convergence or even failure to reach the global minimax solution, particularly in high-degree approximations or ill-conditioned problems.[2]
One simple method involves selecting an initial set of equioscillation points using a uniform grid. For a polynomial of degree n on the interval [a, b], this entails choosing n+2 equally spaced points across the interval, excluding regions where any weighting function is zero if applicable. This approach is straightforward to implement but may suffer from uneven error distribution due to the Runge phenomenon, potentially slowing initial iterations.[17][2]
A more robust strategy uses Chebyshev nodes, which provide a better-conditioned starting point by leveraging the properties of Chebyshev polynomials. The initial points are taken as the extrema of the Chebyshev polynomial of degree n+1, scaled and shifted to the target interval [a, b], such as via the mapping x_i = \frac{a+b}{2} + \frac{b-a}{2} \cos\left(\frac{i\pi}{n+1}\right) for i = 0, \dots, n+1. This distribution minimizes interpolation errors and promotes faster convergence, often requiring only 5–10 iterations total.[8][5][2]
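In code, this initialization is one line of trigonometry; the sketch below (helper name illustrative) implements exactly the mapping given above:

import numpy as np

def initial_reference(n, a, b):
    # Extrema of T_{n+1} on [-1, 1], mapped affinely to [a, b]: n+2 points.
    i = np.arange(n + 2)
    t = np.cos(i * np.pi / (n + 1))
    return np.sort(0.5 * (a + b) + 0.5 * (b - a) * t)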
Another effective technique starts with a least-squares approximation to obtain initial polynomial coefficients, followed by selecting the n+2 points where the residual error is largest as the reference set. This "don't care" least-squares method ensures the initial error exhibits the required number of alternating extrema, making it particularly robust for challenging designs like FIR filter approximation, where it can reduce computational time by 20–50% compared to uniform starts and prevent convergence failures.[18]
To handle ill-conditioned cases, it is advisable to scale the approximation interval to [-1, 1] via an affine transformation, as this aligns with the natural domain of Chebyshev polynomials and improves numerical stability. Additionally, normalizing the target function f(x) by dividing by its maximum value on the interval can prevent overflow and ensure balanced error scaling during computations.[8][5]
Potential pitfalls include choosing points too close to singularities or endpoints, which can cause ill-conditioning or non-equioscillation in the initial error; simple numerical checks, such as verifying that the initial maximum error is below a reasonable threshold (e.g., 10% of the function's range) and that the error alternates at least n+1 times, help detect such issues early.[8][2]
Iterative Steps
The iterative steps of the Remez algorithm form the core loop that refines the approximating polynomial toward the minimax solution by alternately updating the coefficients and the reference points where the error is evaluated. Given an initial set of n+2 reference points x_0 < x_1 < \cdots < x_{n+1} in the interval [a, b], the algorithm proceeds in two main phases per iteration: solving for the polynomial coefficients and the deviation, followed by exchanging the reference points based on the computed error function. This exchange-and-update mechanism leverages the equioscillation property to progressively reduce the maximum error, ensuring monotonic improvement in the approximation quality under suitable conditions.[8]
In the first step, the coefficients a_0, \dots, a_n of the approximating polynomial p_n(x) = \sum_{k=0}^n a_k \phi_k(x) (where \phi_k are the basis functions, such as monomials \phi_k(x) = x^k) and the current deviation \delta are determined by solving a linear system that enforces approximate equioscillation at the reference points. Specifically, the system requires p_n(x_i) = f(x_i) - (-1)^i \delta for i = 0, \dots, n+1, which can be rearranged to the following (n+2) \times (n+2) linear system:
\begin{bmatrix}
\phi_0(x_0) & \phi_1(x_0) & \cdots & \phi_n(x_0) & 1 \\
\phi_0(x_1) & \phi_1(x_1) & \cdots & \phi_n(x_1) & -1 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\phi_0(x_{n+1}) & \phi_1(x_{n+1}) & \cdots & \phi_n(x_{n+1}) & (-1)^{n+1}
\end{bmatrix}
\begin{bmatrix}
a_0 \\ a_1 \\ \vdots \\ a_n \\ \delta
\end{bmatrix}
=
\begin{bmatrix}
f(x_0) \\ f(x_1) \\ \vdots \\ f(x_{n+1})
\end{bmatrix}.
To enhance numerical stability, the deviation \delta is often eliminated first: it can be expressed in closed form as a weighted average of the values f(x_i) (with divided-difference or barycentric weights), after which the coefficients a_k follow by interpolating the adjusted values f(x_i) - (-1)^i \delta at n+1 of the reference points. For ill-conditioned Vandermonde matrices arising from monomial bases, orthogonal polynomials (such as shifted Chebyshev polynomials) or QR decomposition of the basis matrix can be employed to mitigate round-off errors and ensure reliable solutions.[8][5]
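For moderate degrees, solving the full system directly is adequate. A minimal NumPy sketch of this step, using the monomial basis φ_k(x) = x^k (the function name and conventions are illustrative):

import numpy as np

def solve_reference_system(f, xs, n):
    # Enforce p(x_i) + (-1)^i * delta = f(x_i) at the n+2 reference points xs.
    xs = np.asarray(xs)
    V = np.vander(xs, n + 1, increasing=True)      # columns 1, x, ..., x^n
    signs = (-1.0) ** np.arange(n + 2)
    M = np.column_stack([V, signs])                # (n+2) x (n+2) system
    sol = np.linalg.solve(M, f(xs))
    return sol[:n + 1], sol[n + 1]                 # coefficients a_k, deviation delta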
In the second step, the error function e(x) = f(x) - p_n(x) is evaluated over [a, b] to identify its local extrema, where the maximum absolute errors occur. The reference points are then updated by replacing the current set with a new set of n+2 points selected from these extrema locations, prioritizing those that maintain alternating error signs and exhibit the largest minimum deviation among candidate sets to promote progress toward the global minimax approximation. If multiple maxima exist with comparable magnitudes, the selection favors the consecutive set of n+2 extrema that maximizes the smallest absolute error in the set, ensuring the new \delta is at least as large as the previous one. This multiple-exchange strategy accelerates convergence compared to single-point exchanges.[8][19]
The overall iteration can be outlined in pseudocode as follows:
while not equioscillation_achieved(within_tolerance):
    solve the linear system for a_0, ..., a_n and δ using current reference points x_0, ..., x_{n+1}
    compute the error e(x) = f(x) - ∑_{k=0}^n a_k φ_k(x) over [a, b]
    find the local extrema of e(x), including their signs and magnitudes
    select new reference points as the n+2 extrema set with the largest min |e| and alternating signs
    update x_0, ..., x_{n+1} to the new points
This loop repeats until the reference points coincide with the error extrema within a specified tolerance, yielding the desired minimax polynomial.[8]
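The loop can be made concrete in a short NumPy sketch that searches for extrema on a dense grid; all names, tolerances, and the grid-based extremum search are illustrative simplifications rather than a reference implementation:

import numpy as np

def remez(f, n, a, b, max_iter=20, grid_size=4000, tol=1e-10):
    # Initialization: Chebyshev extrema of T_{n+1} mapped to [a, b].
    i = np.arange(n + 2)
    xs = np.sort(0.5 * (a + b) + 0.5 * (b - a) * np.cos(i * np.pi / (n + 1)))
    grid = np.linspace(a, b, grid_size)
    for _ in range(max_iter):
        # Step 1: solve the (n+2) x (n+2) system for coefficients and deviation.
        M = np.column_stack([np.vander(xs, n + 1, increasing=True),
                             (-1.0) ** np.arange(n + 2)])
        sol = np.linalg.solve(M, f(xs))
        coeffs, delta = sol[:n + 1], sol[n + 1]
        # Step 2: error on the grid; stop once it is levelled down to |delta|.
        e = f(grid) - np.polynomial.polynomial.polyval(grid, coeffs)
        if np.max(np.abs(e)) - abs(delta) <= tol * max(1.0, abs(delta)):
            break
        # Cut the grid where the error changes sign; adjacent pieces carry
        # opposite signs, so one max-|e| candidate per piece alternates.
        cuts = np.nonzero(np.diff(np.sign(e)) != 0)[0]
        pieces = np.split(np.arange(grid.size), cuts + 1)
        cand = np.array([p[np.argmax(np.abs(e[p]))] for p in pieces])
        if cand.size < n + 2:
            break                                  # degenerate configuration
        # Multiple exchange: consecutive window of n+2 candidates with largest min |e|.
        vals = np.abs(e[cand])
        j = int(np.argmax([vals[k:k + n + 2].min()
                           for k in range(cand.size - n - 1)]))
        xs = grid[cand[j:j + n + 2]]
    return coeffs, abs(delta)

The grid search is the crude part of this sketch: production implementations refine each extremum by local root-finding on e'(x) rather than trusting the grid resolution.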
Convergence and Termination
The Remez algorithm converges quadratically to the unique minimax polynomial approximation under suitable conditions, such as when the target function f is continuous on a compact interval and the initial reference points are appropriately chosen.[8][2] This quadratic convergence rate holds for twice-differentiable functions: once the reference points are near the solution, the number of correct digits in the deviation roughly doubles per iteration.[8]
The proof of convergence relies on the equioscillation theorem, which characterizes the minimax solution by the error attaining its maximum magnitude with alternating signs at n+2 points, where n is the polynomial degree. Each iteration of the algorithm enforces a levelled error on the current reference set, and a theorem by Remez guarantees that the deviation \delta—the maximum error level—increases monotonically or remains constant while the alternation improves, driving the process toward the unique equioscillation points.[20][17]
Convergence speed is generally rapid for low-degree polynomials, often requiring only 5-10 iterations to achieve high precision, as demonstrated in filter design examples where equiripple behavior stabilizes within 5-8 steps. However, for higher degrees, the process slows due to increasing ill-conditioning in the linear systems solved at each iteration, potentially necessitating more exchanges and higher computational effort.[17][8]
Termination occurs when the computed \delta aligns closely with the actual maximum error on the interval, typically within a small tolerance \epsilon such as 10^{-10}, or when the reference points shift by less than a predefined threshold between iterations, indicating stability.[17][8] In practice, implementations may cap iterations at 20 to avoid prolonged runs, monitoring the relative change in error.[8]
Post-convergence error estimation involves evaluating the approximation over the full interval to confirm equioscillation at exactly n+2 points with equal ripple magnitude; if the alternation is not precise due to numerical effects, additional dense sampling or refinement checks ensure the solution meets the minimax criterion within machine precision.[17][20]
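A simple numerical form of this check counts sign alternations among near-maximal local extrema of the error on a dense grid (names and the tolerance below are illustrative; n+2 alternating points correspond to n+1 sign changes):

import numpy as np

def count_alternations(f, coeffs, a, b, num=20001, rtol=1e-3):
    x = np.linspace(a, b, num)
    e = f(x) - np.polynomial.polynomial.polyval(x, coeffs)
    emax = np.max(np.abs(e))
    # Local extrema are where the discrete slope of e changes sign; endpoints count.
    interior = np.nonzero(np.diff(np.sign(np.diff(e))) != 0)[0] + 1
    idx = np.concatenate(([0], interior, [num - 1]))
    big = idx[np.abs(e[idx]) >= (1.0 - rtol) * emax]
    s = np.sign(e[big])
    return int(np.count_nonzero(s[:-1] != s[1:]))   # expect >= n + 1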
Despite its robustness, the algorithm may fail to converge for non-smooth target functions or with poor initial guesses that miss key extremal points, in which case restarting with adjusted initialization—such as denser grids or Chebyshev nodes—is required to recover progress.[8][2] Numerical round-off can also stall monotonicity in \delta for ill-conditioned high-degree cases, leading to premature termination.[20]
Variants and Extensions
Second Remez Algorithm
The second Remez algorithm, an extension of the original exchange method introduced by Evgeny Yakovlevich Remez, was developed to address scenarios where the first algorithm experiences slow convergence or instability due to inadequate initial reference points. Detailed in Remez's 1957 monograph on computational methods for Chebyshev approximation, it offers a more robust framework for nonlinear minimax problems by iteratively refining both approximation coefficients and error extremal points.[3]
A primary distinction from the standard Remez algorithm lies in its approach to minimizing the maximum deviation error: rather than performing single-point exchanges, it simultaneously optimizes coefficients and reference points through a nonlinear iterative process that aligns the error curve with the equioscillation theorem more directly. This is achieved by replacing the entire reference set at each step with the current error's local extrema, including the global maximum deviation.[21][22]
The procedure often leverages the output of the first Remez algorithm as an initial warm start to accelerate convergence. From there, it proceeds iteratively: a linear system is solved over the current n+2 reference points to compute trial coefficients and an error bound, after which the reference points are updated to the locations of the new error function's extrema (with signs alternating). Weights are then adjusted based on these maximum error positions, and the cycle repeats until the error bound stabilizes within a tolerance.[23][24]
This variant provides advantages in managing nonlinear constraints, particularly for weighted Chebyshev approximations or bases beyond polynomials, such as general Haar systems, where the standard method may falter.[22][24]
Regarding convergence, the second Remez algorithm demonstrates quadratic rates under conditions of twice continuous differentiability for the target function, rendering it more resilient to ill-conditioned problems than the first algorithm, whose single-point exchanges progress more slowly—on the order of n+2 single exchanges are needed to match one full-set update—so that in practice it requires fewer reference-set updates for challenging cases.[21][23] For instance, when approximating exponential functions like e^x on intervals such as [0,1], it attains the equioscillation property with greater efficiency, reducing the number of iterations required relative to the standard method.[2]
Approximations for Rational Functions
The Remez algorithm extends to rational function approximation by seeking polynomials p(x) of degree at most m and q(x) of degree at most n (with q(x) \not\equiv 0) to minimize the uniform norm \|f - p/q\|_\infty = \max_{x \in [a,b]} |f(x) - p(x)/q(x)| over a compact interval [a,b], where f is continuous and the approximation is sought without poles of r(x) = p(x)/q(x) in [a,b]. To address non-uniqueness arising from scaling (since \lambda p / \lambda q = p/q for \lambda \neq 0), the problem is typically normalized by fixing the leading coefficient of q(x) to 1 or setting q(c) = 1 for some point c \in [a,b].[2]
Unlike the linear polynomial case, the rational variant is nonlinear due to the quotient structure, requiring adaptations such as differential corrections or successive linearizations to solve the underlying optimization.[25] Initialization often begins with polynomial approximations: for instance, a degree-m polynomial for the numerator based on f, and a degree-n polynomial (or constant) for the denominator, potentially refined via least-squares fitting to ensure q(x) has no zeros in [a,b].[5]
The iterative process selects an initial set of m + n + 2 reference points in [a,b] and solves a nonlinear system enforcing equioscillation of the weighted error at these points, typically via Newton-like methods or by linearizing around a current estimate of q(x)—for example, approximating f(x) q(x) with p(x) while treating the error as \delta (-1)^i at each point x_i, yielding a linear system in the coefficients of p, q, and \delta. New reference points are then determined by locating the extrema of the current error function |f(x) - p(x)/q(x)|, exchanging those that violate the equioscillation pattern, and repeating until the maximum error stabilizes (e.g., within a tolerance like 1.05 times the previous minimum).[2]
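A sketch of one such linearized solve, freezing \delta at its previous value inside the product with the denominator coefficients so the system becomes linear (all names are illustrative; xs must contain m+n+2 points and q is normalized by its constant term):

import numpy as np

def linearized_rational_step(f, xs, m, n, delta_prev):
    # Enforce f(x_i) - p(x_i)/q(x_i) = (-1)^i * delta with q(x) = 1 + b_1 x + ... + b_n x^n.
    # Expanding p(x_i) = (f(x_i) - (-1)^i delta) q(x_i) and freezing delta -> delta_prev
    # in the delta*b_k cross terms yields a linear system in (a, b, delta).
    xs = np.asarray(xs)
    fx = f(xs)
    sgn = (-1.0) ** np.arange(xs.size)
    P = np.vander(xs, m + 1, increasing=True)            # numerator basis 1, x, ..., x^m
    Q = np.vander(xs, n + 1, increasing=True)[:, 1:]     # denominator basis x, ..., x^n
    M = np.column_stack([P, -Q * (fx - sgn * delta_prev)[:, None], sgn])
    sol = np.linalg.solve(M, fx)
    a, b, delta = sol[:m + 1], sol[m + 1:m + 1 + n], sol[-1]
    return a, np.concatenate(([1.0], b)), delta

Iterating with \delta updated until it stabilizes, then exchanging reference points against the extrema of f - p/q, constitutes one outer iteration of the rational variant.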
Key challenges include potential non-convergence due to the nonlinearity, sensitivity to initial guesses that may lead to denominator zeros or poles entering [a,b], and numerical instability from ill-conditioned systems when degrees m and n are high. These are often mitigated by constrained optimization techniques, such as restricting pole locations outside [a,b] or using regularization to bound the denominator.[5]
The error characterization generalizes the equioscillation property from polynomial approximation: the unique best rational approximation has no poles in [a,b] and attains its maximum error magnitude there with alternating signs at m + n + 2 points (fewer in degenerate cases, the count being reduced by the so-called defect of the approximation).[25]
Modern developments, such as the AAA (adaptive Antoulas–Anderson) algorithm, introduced in 2017, draw inspiration from Remez principles by adaptively selecting support points and poles for stable barycentric rational functions, followed by Lawson iterations to refine toward the minimax solution, enhancing robustness for high-degree cases.[26]
Applications and Examples
Digital Signal Processing
The Remez algorithm plays a pivotal role in digital signal processing, most notably in the design of finite impulse response (FIR) filters, where it computes coefficients that yield equiripple characteristics. This minimizes the maximum deviation—or ripple—between the desired and actual frequency responses across passbands and stopbands, ensuring uniform error distribution as per the equioscillation theorem. Such optimality in the minimax sense allows for efficient filters with the smallest possible transition bandwidth for a given order, which is crucial for resource-limited hardware implementations.[27]
A key implementation of the Remez algorithm in this domain is the Parks-McClellan algorithm, introduced in the early 1970s for designing optimal FIR filters tailored to frequency-selective tasks like low-pass, high-pass, and bandpass filtering. This method reformulates the approximation problem using a cosine basis, which exploits the symmetry of linear-phase FIR filters to reduce computational complexity and enable faster iterations. By specifying the desired frequency response H(\omega) over the normalized interval [0, \pi], the algorithm iteratively refines a polynomial that approximates the ideal response while adhering to user-defined band edges and ripple tolerances.[27]
The advantages of this Remez-based approach lie in its guaranteed optimality for minimax error, outperforming simpler methods like windowing in terms of filter sharpness and efficiency for bandwidth-constrained scenarios. It proves especially valuable in applications such as audio signal processing, where precise frequency control prevents distortion, and in communications systems, where tight stopband attenuation suppresses interference. Historically, the Parks-McClellan algorithm's availability as early software tools spurred the development of real-time DSP hardware in the 1980s, transitioning FIR filters from theoretical designs to practical embedded systems. Today, it remains embedded in industry-standard software, exemplified by MATLAB's firpm function, which automates the design process for engineers.[27][28]
Extensions of the algorithm further broaden its utility in DSP, supporting multiband filter designs that handle multiple pass and stop bands simultaneously for complex spectral shaping, such as in equalizers or channelizers. Additionally, weighted approximations enable unequal ripple control, allowing users to prioritize lower error in critical bands (e.g., passband) at the expense of others (e.g., stopband), thus customizing the trade-off between performance and filter length. These enhancements maintain the core Remez exchange procedure while adapting to diverse signal processing demands.[28][27]
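In practice such designs take only a few lines of code. The sketch below uses SciPy's scipy.signal.remez, its implementation of the Parks-McClellan method; the sample rate, band edges, and weights are arbitrary example values:

import numpy as np
from scipy.signal import remez, freqz

fs = 8000.0                                    # sample rate, Hz
taps = remez(65,                               # number of filter taps
             [0, 1500, 2000, fs / 2],          # passband 0-1500 Hz, stopband 2000 Hz-Nyquist
             [1, 0],                           # desired amplitude per band
             weight=[1, 10],                   # penalize stopband ripple 10x more
             fs=fs)
w, h = freqz(taps, worN=2048, fs=fs)
print("max passband deviation:", np.max(np.abs(np.abs(h[w <= 1500]) - 1.0)))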
Numerical Examples
A concrete illustration of the Remez algorithm involves approximating f(x) = \sin(x) on the interval [0, \pi/2] with a polynomial of degree 3. Initial reference points are selected uniformly as x_i = i \cdot (\pi/2)/4 for i = 0, 1, 2, 3, 4, yielding points 0, \pi/8, \pi/4, 3\pi/8, \pi/2. In the first iteration, the algorithm identifies the error maxima and exchanges reference points to those where the error alternates in sign, updating the set for the next linear solve.[8]
The computation proceeds by solving the linear system for polynomial coefficients at the current reference points, yielding updated approximations. The reference points evolve through iterations, with the final set converging to locations where the error equioscillates. The error function exhibits equioscillation at 5 points across the interval, confirming the minimax property.[8]
Another example applies the Remez algorithm to the Runge function f(x) = 1/(1 + 25x^2) on [-1, 1] using a degree-5 polynomial. Starting with scaled Chebyshev points for initialization, the iterative exchanges lead to a polynomial that minimizes the maximum deviation, avoiding the large endpoint oscillations (the Runge phenomenon) that arise when this function is interpolated at equally spaced points. The final approximation demonstrates uniform error distribution with equioscillation at 7 points.[8]
To highlight the superiority of the minimax criterion, the Remez algorithm typically achieves a smaller maximum error compared to the least-squares method for these examples. In practice, the Remez algorithm is initialized with scaled Chebyshev points to accelerate convergence, typically requiring 4-6 iterations for these cases, and results can be confirmed via tools such as MATLAB's firpm function (the successor to its earlier remez routine) or equivalent libraries; both examples are reproduced in the sketch below.[8]
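A minimal usage sketch, assuming the remez() function defined in the Iterative Steps section is in scope (the reported levels depend on grid resolution and are not authoritative):

import numpy as np

# Degree-3 minimax approximation of sin on [0, pi/2].
coeffs, err = remez(np.sin, 3, 0.0, np.pi / 2)
print("sin, degree 3: equiripple level =", err)

# Degree-5 minimax approximation of the Runge function on [-1, 1].
runge = lambda x: 1.0 / (1.0 + 25.0 * x**2)
coeffs, err = remez(runge, 5, -1.0, 1.0)
print("Runge, degree 5: equiripple level =", err)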