Halley's method
Halley's method is a numerical root-finding algorithm in mathematics used to approximate solutions to nonlinear equations of the form f(x) = 0, where f is a differentiable function with a continuous second derivative.[1] It achieves cubic (third-order) convergence for simple roots, surpassing the quadratic convergence of Newton's method, by incorporating both the first and second derivatives of f in its iterative formula.[1] Named after the English astronomer and mathematician Edmond Halley (1656–1742), the method was originally introduced in 1694 as a technique for solving polynomial equations without relying on calculus, generalizing earlier iterative approaches for specific degrees.[2]
The modern formulation of Halley's method derives from a second-order Taylor expansion of f around the current iterate x_n, leading to the update rule
x_{n+1} = x_n - \frac{2 f(x_n) f'(x_n)}{2 [f'(x_n)]^2 - f(x_n) f''(x_n)},
which can be interpreted geometrically as taking the next iterate to be the x-intercept of a hyperbola that osculates the graph of f at x_n, that is, matches its value, slope, and curvature there.[1] This formula ensures that the error satisfies e_{n+1} \approx C e_n^3 near a simple root, where C involves higher derivatives of f, providing rapid convergence when the initial guess is reasonable and the derivatives are computable.[1]
Halley's method builds directly on Newton's method (x_{n+1} = x_n - f(x_n)/f'(x_n)) by correcting for the quadratic term neglected in the first-order Taylor approximation, turning it into a higher-order scheme at the cost of only one additional evaluation per iteration, namely the second derivative.[1] Although Halley's original 1694 presentation in the Philosophical Transactions of the Royal Society focused on a rational iterative formula for polynomials that avoided explicit derivatives, it aligns closely with the derivative-based version rediscovered and generalized in the 20th century for broader analytic functions.[2] The method's efficiency makes it particularly valuable in applications such as nonlinear optimization, scientific computing, and solving Kepler's equation in orbital mechanics, though it demands more computational effort for the second derivative than derivative-free alternatives.[3]
History and Overview
Historical Development
Edmond Halley, an English astronomer and mathematician, introduced his iterative method for root-finding in 1694 through a paper published in the Philosophical Transactions of the Royal Society. Titled "Methodus nova accurata & facilis inveniendi radices æquationum quarumcumque generaliter, sine prævia reductione," the work described a general approach to extracting the roots of equations of any degree without preliminary algebraic reduction, framed as part of the analytical art of resolving problems posed as equations.[4] Halley's innovation lay in achieving cubic convergence, surpassing the quadratic rate of contemporary methods such as Newton's, which improved efficiency for lengthy computations.[5]
As a prominent astronomer known for his work on comets and planetary motion, Halley developed the method amid his broader efforts in numerical computation for astronomical purposes.
The method received early recognition within the mathematical community, with contemporary figures like Joseph Raphson referencing and building upon similar iterative strategies in the second edition of his Analysis aequationum universalis published in 1697, reflecting the rapid dissemination of such advances among Royal Society affiliates.[5] Over the subsequent centuries, Halley's approach remained somewhat overlooked until its rediscovery and formalization in modern numerical analysis during the 20th century.
In 1970, Alston S. Householder analyzed Halley's method as a special case within the broader class of Householder's higher-order iteration schemes in his seminal book The Numerical Treatment of a Single Nonlinear Equation, integrating it into systematic studies of convergence properties and applications in computational mathematics. This revival positioned Halley's method as a foundational tool in numerical root-finding, influencing subsequent developments in algorithms for solving nonlinear equations.
Basic Principles and Purpose
Halley's method is a root-finding algorithm designed to solve nonlinear equations of the form f(x) = 0, where f is a function of one real variable. It iteratively refines an initial approximation to the root by incorporating the function value, its first derivative, and its second derivative at each step, enabling a more accurate prediction of the root's location compared to methods relying solely on lower-order information.
The primary purpose of Halley's method is to achieve cubic convergence for simple roots, which is faster than the quadratic convergence of Newton's method, particularly beneficial for nonlinear equations where higher precision is needed with fewer iterations. This makes it valuable in numerical computations requiring efficient root approximation, such as in scientific simulations and engineering problems involving transcendental or polynomial equations. By leveraging a second-order approximation that captures the function's curvature, the method reduces error more rapidly once an initial guess is sufficiently close to the root.
For effective application, Halley's method assumes the function f is twice continuously differentiable in a neighborhood of the root, ensuring the second derivative exists and is well-behaved. Additionally, a good initial guess near the desired root is a prerequisite for guaranteed convergence, as the method may diverge or slow down otherwise. In practice, it proceeds by successively updating the approximation through a process that balances the linear and quadratic terms of the function's expansion, progressively homing in on the solution with increasing accuracy.
This technique was originally introduced by astronomer Edmond Halley in 1694 as a means to extract roots from algebraic and geometric equations efficiently.[4]
Halley's method updates an initial guess x_0 iteratively to approximate a root of a twice continuously differentiable function f(x) = 0, where f'(x) \neq 0 near the root. The standard iteration formula is given by
x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \cdot \frac{1}{1 - \frac{f(x_n) f''(x_n)}{2 [f'(x_n)]^2}},
provided the denominator is nonzero. This formula modifies Newton's method by incorporating the second derivative f''(x_n) to achieve cubic convergence under suitable conditions.
To compute the next iterate x_{n+1}, first evaluate f(x_n), f'(x_n), and f''(x_n) at the current approximation x_n. Then, calculate the correction term \delta = \frac{f(x_n)}{f'(x_n)} \cdot \frac{1}{1 - \frac{f(x_n) f''(x_n)}{2 [f'(x_n)]^2}}, ensuring the term \frac{f(x_n) f''(x_n)}{2 [f'(x_n)]^2} \neq 1 to avoid division by zero. Finally, update the approximation as x_{n+1} = x_n - \delta. The method requires explicit knowledge or computable expressions for both the first and second derivatives, which increases the per-iteration cost compared to derivative-free alternatives but enables higher-order accuracy.
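As a one-step illustration with f(x) = x^2 - 2 (the example used in the code below), starting from x_n = 1 one has f = -1, f' = 2, and f'' = 2, so
\delta = \frac{-1}{2} \cdot \frac{1}{1 - \frac{(-1)(2)}{2 \cdot 2^2}} = \frac{-0.5}{1.25} = -0.4,
and therefore x_{n+1} = 1 - (-0.4) = 1.4.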
The following pseudocode illustrates the implementation for finding a root of f(x) = x^2 - 2, where f'(x) = 2x and f''(x) = 2:
function halley_sqrt2(x0, tolerance, max_iter)
    x = x0
    for i = 1 to max_iter
        f = x^2 - 2
        fp = 2 * x
        fpp = 2
        denom = 1 - (f * fpp) / (2 * fp^2)
        if abs(denom) < tolerance    // Avoid division by near-zero
            break
        end
        delta = (f / fp) / denom
        x = x - delta
        if abs(delta) < tolerance
            return x
        end
    end
    return x    // Or handle non-convergence
end
This structure evaluates the necessary quantities at each step and applies the update, halting based on a tolerance for the change in x.
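For functions whose derivatives are available in closed form, the same update can be wrapped in a small general-purpose routine. The following Python sketch is illustrative rather than canonical; the function name, argument names, and numerical guards are choices made here, not taken from any particular library:

```python
def halley(f, fprime, fprime2, x0, tol=1e-12, max_iter=50):
    """Approximate a root of f(x) = 0 with Halley's method.

    f, fprime, fprime2 -- callables returning f(x), f'(x), f''(x)
    x0                 -- initial guess (assumed reasonably close to the root)
    Returns the last iterate, which may not have converged.
    """
    x = x0
    for _ in range(max_iter):
        fx, fpx, fppx = f(x), fprime(x), fprime2(x)
        if fpx == 0.0:                      # the method assumes f'(x) != 0 near the root
            break
        denom = 1.0 - fx * fppx / (2.0 * fpx**2)
        if abs(denom) < 1e-15:              # guard against a (near-)vanishing denominator
            break
        delta = (fx / fpx) / denom          # Halley correction
        x -= delta
        if abs(delta) < tol:                # change in x below tolerance: accept x
            break
    return x

# Example: the positive root of x^2 - 2, starting from x0 = 1
print(halley(lambda x: x * x - 2, lambda x: 2 * x, lambda x: 2.0, 1.0))
# ~1.4142135623730951
```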
Householder's methods constitute a family of root-finding algorithms for solving nonlinear equations f(x) = 0, where f is sufficiently differentiable, achieving convergence of order d+1 by employing rational approximations based on Padé approximants or, equivalently, higher-order Taylor expansions of the inverse function f^{-1}. These methods generalize Newton's method (which corresponds to d=1, order 2) and were introduced by Alston S. Householder in his 1970 monograph, where he derived them through inverse interpolation techniques that approximate the step to the root using polynomial or rational forms matched to the local Taylor series of f. For general d, the iteration constructs a Padé approximant of type [d-1, 1] to f around the current iterate x_n, and the next iterate x_{n+1} is the root of this approximant, ensuring the approximation error aligns with the desired order. This rational form outperforms pure polynomial approximations by better capturing the behavior near the root, particularly for functions with inflection points or varying curvature.
The case d=2, yielding Halley's method with cubic (order-3) convergence, uses a [1,1] Padé approximant (a linear numerator over a linear denominator) to f expanded around x_n. This approximant r(h) = \frac{p_0 + p_1 h}{1 + q_1 h}, where h = x - x_n, is determined by matching the Taylor series of f(x_n + h) through the h^2 term, ensuring f(x_n + h) - r(h) = O(h^3). An essentially equivalent derivation uses the second-order Taylor expansion of the inverse function g(y) = f^{-1}(y) around y_0 = f(x_n), which approximates the root step h = g(0) - x_n to the same (cubic) order.
To derive explicitly, begin with the Taylor expansion of f at x_n:
f(x_n + h) = f(x_n) + f'(x_n) h + \frac{f''(x_n)}{2} h^2 + \frac{f'''(x_n)}{6} h^3 + O(h^4),
where f = f(x_n) for brevity. The inverse g(y) satisfies g(y_0) = x_n and has derivatives g'(y) = 1 / f'(g(y)), g''(y) = -f''(g(y)) / [f'(g(y))]^3. Thus, at y_0,
g'(y_0) = \frac{1}{f'(x_n)}, \quad g''(y_0) = -\frac{f''(x_n)}{[f'(x_n)]^3}.
The second-order Taylor expansion of g around y_0 evaluated at 0 is
g(0) \approx x_n + g'(y_0) (0 - y_0) + \frac{g''(y_0)}{2} (0 - y_0)^2 = x_n - \frac{f}{f'} + \frac{1}{2} \left( -\frac{f''}{[f']^3} \right) f^2 = x_n - \frac{f}{f'} - \frac{f^2 f''}{2 [f']^3}.
Combining the two correction terms and rewriting,
h \approx -\frac{f}{f'} - \frac{f^2 f''}{2 [f']^3} = -\frac{f}{f'} \left(1 + \frac{f f''}{2 [f']^2}\right) \approx -\frac{f}{f'} \cdot \frac{1}{1 - \frac{f f''}{2 [f']^2}} = -\frac{2 f f'}{2 [f']^2 - f f''},
where the final step replaces the truncated series 1 + u by the rational expression 1/(1 - u) with u = \frac{f f''}{2 [f']^2}; the two forms agree up to terms of order f^3, which do not affect the cubic convergence. Thus, the update is x_{n+1} = x_n + h with the rational correction, which matches the form obtained by solving for the zero of the [1,1] Padé approximant to f. This identifies Halley's method as the d=2 special case within Householder's framework, where the second-derivative term arises naturally from the quadratic correction in the inverse approximation.
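The claim that the truncated inverse series and the rational Halley step agree up to terms of order f^3 can be checked symbolically. In the following SymPy sketch, the symbols f, fp, fpp are generic stand-ins for f(x_n), f'(x_n), f''(x_n); no specific function is assumed:

```python
import sympy as sp

f, fp, fpp = sp.symbols('f fp fpp')

halley_step = -2 * f * fp / (2 * fp**2 - f * fpp)   # rational ([1,1] Pade) step
series_step = -f / fp - f**2 * fpp / (2 * fp**3)    # second-order inverse-series step

# Expanding the rational step in powers of f reproduces the inverse-series terms,
# so the two expressions differ only at order f^3.
expansion = sp.series(halley_step, f, 0, 3)
print(expansion)                                       # expected: -f/fp - f**2*fpp/(2*fp**3) + O(f**3)
print(sp.simplify(expansion.removeO() - series_step))  # expected: 0
```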
Theoretical Analysis
Relation to Newton's Method
Newton's method is a fundamental root-finding algorithm that iteratively approximates roots of a function f(x) using the update formula
x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)},
achieving quadratic convergence under suitable conditions near a simple root.[6]
Halley's method extends Newton's method by incorporating the second derivative f''(x) to account for higher-order terms in the Taylor series expansion of f(x) around the current iterate, thereby achieving cubic convergence. Specifically, the iteration formula for Halley's method can be derived by applying a second-order Taylor approximation to f(x) and solving for the root of the resulting quadratic, yielding
x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n) \left(1 - \frac{f(x_n) f''(x_n)}{2 [f'(x_n)]^2}\right)},
which modifies the Newtonian step by a correction factor involving f''(x_n).[7][1]
Asymptotically, near a root where f(x_n) is small, the correction factor \frac{1}{1 - \frac{f(x_n) f''(x_n)}{2 [f'(x_n)]^2}} is approximately 1 + \frac{1}{2} \frac{f(x_n) f''(x_n)}{[f'(x_n)]^2}, representing a small perturbation to the Newtonian update that eliminates the quadratic error term.
While Halley's method requires evaluating both f'(x_n) and f''(x_n) per iteration—roughly twice the computational cost of Newton's method—it typically demands fewer iterations overall due to its cubic convergence order, making it advantageous for problems where derivative evaluations are not prohibitive.[1][7]
Cubic Convergence Proof
Under the standard assumptions that the function f is analytic in a neighborhood of the simple root \alpha, with f(\alpha) = 0, f'(\alpha) \neq 0, Halley's method achieves cubic convergence. (Note that f''(\alpha) \neq 0 is not required, though it affects the explicit constant.) Let e_n = x_n - \alpha denote the error at the nth iteration step. The proof proceeds by substituting Taylor series expansions of f, f', and f'' around \alpha into the iteration formula and analyzing the resulting error expression up to cubic order.
The Taylor expansions are:
f(x_n) = f'(\alpha) e_n + \frac{1}{2} f''(\alpha) e_n^2 + \frac{1}{6} f'''(\alpha) e_n^3 + O(e_n^4)
f'(x_n) = f'(\alpha) + f''(\alpha) e_n + \frac{1}{2} f'''(\alpha) e_n^2 + O(e_n^3)
f''(x_n) = f''(\alpha) + f'''(\alpha) e_n + O(e_n^2).
The Halley's iteration formula is
x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n) - \frac{1}{2} \frac{f(x_n) f''(x_n)}{f'(x_n)}}.
Substituting the expansions into this formula yields e_{n+1} = x_{n+1} - \alpha. The constant and linear terms in e_n vanish due to the structure of the method, as do the quadratic terms (in contrast to Newton's method, which retains a quadratic error). Collecting the cubic-order terms gives the error recursion
e_{n+1} = \left[ -\frac{1}{6} \frac{f'''(\alpha)}{f'(\alpha)} + \frac{1}{4} \frac{[f''(\alpha)]^2}{[f'(\alpha)]^2} \right] e_n^3 + O(e_n^4).
Thus, the asymptotic behavior is |e_{n+1}| \approx |C| |e_n|^3, where the explicit asymptotic constant is C = -\frac{1}{6} \frac{f'''(\alpha)}{f'(\alpha)} + \frac{1}{4} \frac{[f''(\alpha)]^2}{[f'(\alpha)]^2}. This confirms the cubic order of convergence under the stated assumptions.[1]
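The vanishing of the linear and quadratic error terms, together with the stated cubic constant, can be verified symbolically by modelling f near the root with generic Taylor coefficients. In the SymPy sketch below, a1, a2, a3 stand for f'(\alpha), f''(\alpha)/2, and f'''(\alpha)/6, so the expected constant is a2^2/a1^2 - a3/a1; the symbol names are chosen here purely for illustration:

```python
import sympy as sp

e, a1, a2, a3 = sp.symbols('e a1 a2 a3')    # e = x_n - alpha, a_k = f^(k)(alpha)/k!

# Local model of f near a simple root alpha (f(alpha) = 0, f'(alpha) = a1 != 0)
f   = a1 * e + a2 * e**2 + a3 * e**3
fp  = sp.diff(f, e)
fpp = sp.diff(fp, e)

# One Halley step applied to the model: e_{n+1} = e_n - 2 f f' / (2 f'^2 - f f'')
e_next = e - 2 * f * fp / (2 * fp**2 - f * fpp)

# Expanding in powers of e: the e and e^2 terms cancel, leaving the cubic leading term
print(sp.series(e_next, e, 0, 4))
# expected leading term: (a2**2/a1**2 - a3/a1) * e**3 + O(e**4)
```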
Variants and Extensions
Halley's Irrational Method
Halley's irrational method, also known as Halley's formula or Euler's method in some contexts, is an alternative formulation of Halley's method that achieves cubic convergence while still requiring the first and second derivatives.[8] Unlike the standard rational form of Halley's method, which results in a rational function iteration, this variant involves a square root, giving it the "irrational" designation. It is particularly noted in historical contexts as one of the two third-order methods developed by Edmond Halley.[8]
The iteration formula is given by
x_{n+1} = x_n - \frac{2 f(x_n)}{f'(x_n) + \operatorname{sign}(f'(x_n)) \sqrt{[f'(x_n)]^2 - 2 f(x_n) f''(x_n)}},
or equivalently,
x_{n+1} = x_n - \frac{1 - \sqrt{1 - 2 \frac{f(x_n) f''(x_n)}{[f'(x_n)]^2}}}{\frac{f''(x_n)}{f'(x_n)}}.
[8] This form arises from solving the quadratic equation given by the second-order Taylor expansion of f about x_n, retaining the square root from the quadratic formula; the sign of the square root is chosen to agree with that of f'(x_n), which selects the root of the quadratic model nearest x_n and avoids cancellation.[8]
The method preserves the cubic convergence order of Halley's method for simple roots, with the error satisfying e_{n+1} \propto e_n^3, under the assumption that the function is sufficiently smooth.[8] It requires the same computational components as the standard Halley's method (evaluations of f, f', and f'') but may be preferred in certain algebraic manipulations or when the square root form simplifies implementation for specific functions. However, the presence of the square root can introduce branch cut issues in complex domains or additional numerical considerations in floating-point arithmetic.[8]
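A minimal sketch of the irrational variant, written here in the rationalized form -2f / (f' + \operatorname{sign}(f') \sqrt{[f']^2 - 2 f f''}) to avoid cancellation, and applied to f(x) = x^2 - 2; the function, tolerance, and iteration cap are illustrative choices, not prescribed by the method:

```python
import math

def halley_irrational(f, fprime, fprime2, x0, tol=1e-12, max_iter=50):
    """Root finding with Halley's irrational (square-root) formula; a sketch."""
    x = x0
    for _ in range(max_iter):
        fx, fpx, fppx = f(x), fprime(x), fprime2(x)
        disc = fpx**2 - 2.0 * fx * fppx
        if disc < 0.0:                      # square root undefined in real arithmetic
            raise ValueError("negative discriminant; formula not applicable here")
        # pick the sign of the square root to match f'(x), avoiding cancellation
        denom = fpx + math.copysign(math.sqrt(disc), fpx)
        delta = 2.0 * fx / denom
        x -= delta
        if abs(delta) < tol:
            break
    return x

print(halley_irrational(lambda x: x * x - 2, lambda x: 2 * x, lambda x: 2.0, 1.0))
# ~1.4142135623730951
```

Because f is itself quadratic in this example, the second-order model is exact and the square-root form lands on the root in a single step; for general functions it behaves like the rational form, with cubic convergence near a simple root.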
Practical Modifications
In practical implementations of Halley's method, the higher computational cost due to second-derivative evaluations is a consideration; each iteration typically requires one evaluation each of f(x_n), f'(x_n), and f''(x_n), compared to two for Newton's method. This can be mitigated through analytical derivatives or automatic differentiation tools.[9]
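As one illustration of the automatic-differentiation route (assuming the JAX library is available; any comparable tool would serve), the first and second derivatives can be generated rather than coded by hand:

```python
import jax
import jax.numpy as jnp

f = lambda x: jnp.exp(x) + x - 2.0     # example function; derivatives below are generated
fp  = jax.grad(f)                      # f'(x) via automatic differentiation
fpp = jax.grad(fp)                     # f''(x) by differentiating again

x = 0.5
for _ in range(6):                     # a fixed, illustrative number of Halley steps
    fx, fpx, fppx = f(x), fp(x), fpp(x)
    x = x - 2 * fx * fpx / (2 * fpx**2 - fx * fppx)
print(x)                               # approaches the root near 0.4428544
```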
Hybrid approaches can enhance reliability by combining Halley's method with other techniques. For example, bracketing methods like bisection can isolate a root interval before applying Halley's for rapid refinement, ensuring global convergence properties alongside local cubic speed. Similarly, derivative-free methods such as the secant method can provide initial approximations before transitioning to Halley's when derivatives become available.
Software libraries support Halley's method with provisions for stability. In SciPy's optimize.newton function, providing the second derivative via the fprime2 argument enables Halley's method, with parameters like tol for tolerance, maxiter for maximum iterations, and rtol for relative precision.[10] MATLAB does not have a built-in function for Halley's method but features user-contributed implementations on the File Exchange, often incorporating hybrid options and derivative safeguards.[11]
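For instance, a call along the following lines (the function and tolerances are illustrative) makes scipy.optimize.newton take the Halley path by supplying both derivatives:

```python
import math
from scipy import optimize

root = optimize.newton(
    lambda x: x**2 - 2,            # f(x)
    x0=1.0,                        # initial guess
    fprime=lambda x: 2 * x,        # f'(x)
    fprime2=lambda x: 2.0,         # supplying f''(x) switches the routine to Halley's method
    tol=1e-12,
    maxiter=50,
)
print(root, math.sqrt(2))          # both ~1.4142135623730951
```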
Applications and Limitations
Halley's method finds applications in various fields requiring efficient root-finding for nonlinear equations. In orbital mechanics, it is used to solve Kepler's equation, which relates the mean anomaly to the eccentric anomaly in elliptical orbits, offering faster convergence than Newton's method for precise astronomical calculations.[3] It is also employed in nonlinear optimization and scientific computing, where cubic convergence accelerates solutions for differentiable functions with available second derivatives. Additionally, extensions appear in finite element analysis for solving systems of nonlinear equations. However, its reliance on second derivatives limits practicality in high-dimensional problems or when analytical derivatives are unavailable, often favoring derivative-free methods like the secant in such cases.[12]
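As a sketch of the Kepler's-equation use case, the eccentric anomaly E satisfying E - e \sin E = M can be found by applying Halley's update to f(E) = E - e \sin E - M, whose derivatives are f'(E) = 1 - e \cos E and f''(E) = e \sin E. The starting guess and iteration limits below are simple illustrative choices:

```python
import math

def kepler_halley(M, ecc, tol=1e-14, max_iter=20):
    """Solve Kepler's equation E - ecc*sin(E) = M for the eccentric anomaly E."""
    E = M if ecc < 0.8 else math.pi          # crude but common starting guess
    for _ in range(max_iter):
        f   = E - ecc * math.sin(E) - M
        fp  = 1.0 - ecc * math.cos(E)
        fpp = ecc * math.sin(E)
        delta = 2.0 * f * fp / (2.0 * fp**2 - f * fpp)   # Halley step
        E -= delta
        if abs(delta) < tol:
            break
    return E

E = kepler_halley(M=1.0, ecc=0.5)
print(E, E - 0.5 * math.sin(E))              # the second value reproduces M = 1.0
```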
Numerical Examples
To illustrate the application of Halley's method, consider the problem of finding the positive square root of 2 by solving f(x) = x^2 - 2 = 0, where f'(x) = 2x and f''(x) = 2. Starting with the initial guess x_0 = 1, the method applies the iteration formula to yield rapid convergence. The first iteration gives x_1 = 1.4, and the second iteration produces x_2 \approx 1.4142132. The true root is \sqrt{2} \approx 1.41421356237.
The errors demonstrate the method's efficiency:
| Iteration n | x_n | |e_n| (absolute error) |
|-----------------|---------------|---------------------------|
| 0 | 1.000000 | 0.414214 |
| 1 | 1.400000 | 0.014214 |
| 2 | 1.4142132 | 3.6 \times 10^{-7} |
Here, the error falls from |e_0| \approx 0.414 to |e_1| \approx 0.0142 (a factor of about 29), and then to |e_2| \approx 3.6 \times 10^{-7} (a further factor of about 39,000), highlighting the cubic error reduction characteristic of Halley's method.
For a nonlinear example, solve f(x) = e^x + x - 2 = 0, where f'(x) = e^x + 1 and f''(x) = e^x. The true root is approximately 0.4428544. Starting with x_0 = 0, Halley's method gives x_1 \approx 0.444444 and x_2 \approx 0.442854, the latter already accurate to roughly ten decimal places. In comparison, Newton's method from the same starting point gives x_1 = 0.5, x_2 \approx 0.443852, and x_3 \approx 0.442855, and still needs a fourth iteration to reach comparable accuracy.
The following table tracks the iterations and absolute errors for both methods:
| Method | Iteration n | x_n | |e_n| |
|---------|-----------------|-------------|--------------------|
| Halley | 0 | 0.000000 | 0.442854 |
| Halley | 1 | 0.444444 | 0.001590 |
| Halley | 2 | 0.442854 | 3.5 \times 10^{-11} |
| Newton | 0 | 0.000000 | 0.442854 |
| Newton | 1 | 0.500000 | 0.057146 |
| Newton | 2 | 0.443852 | 0.000997 |
| Newton | 3 | 0.442855 | 3.0 \times 10^{-7} |
This table shows Halley's cubic convergence: the error drops from about 1.6 \times 10^{-3} to roughly 3.5 \times 10^{-11} in a single step, consistent with |e_{n+1}| \approx |C| |e_n|^3, whereas Newton's errors follow the quadratic pattern |e_{n+1}| \approx C |e_n|^2 (0.057 \to 0.001 \to 3 \times 10^{-7}).
A convergence plot of \log_{10} |e_n| against the iteration number shows the number of correct digits roughly tripling per step for Halley's method, compared with doubling for Newton's method; equivalently, plotting \log_{10} |e_{n+1}| against \log_{10} |e_n| yields slopes of approximately 3 and 2, respectively, visually confirming the faster error decay.[13]
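The following short script (the function, starting points, and iteration counts are illustrative) reproduces the error sequences in the table above and prints the data from which such a plot could be drawn; the reference root is obtained simply by iterating well past convergence:

```python
import math

f   = lambda x: math.exp(x) + x - 2
fp  = lambda x: math.exp(x) + 1
fpp = lambda x: math.exp(x)

def newton_step(x):
    return x - f(x) / fp(x)

def halley_step(x):
    return x - 2 * f(x) * fp(x) / (2 * fp(x)**2 - f(x) * fpp(x))

# Reference root: iterate Halley well past convergence (to machine precision)
ref = 0.5
for _ in range(10):
    ref = halley_step(ref)

for name, step, n_steps in (("Halley", halley_step, 2), ("Newton", newton_step, 4)):
    x = 0.0
    print(name, "iter 0: |e| = %.2e" % abs(x - ref))
    for k in range(1, n_steps + 1):
        x = step(x)
        print("  iter %d: x = %.9f, |e| = %.2e" % (k, x, abs(x - ref)))
```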
Comparison with Other Methods
Halley's method exhibits cubic convergence, making it faster than Newton's method near roots, though each iteration is more computationally expensive due to the need for second derivatives. A key limitation is the potential for the denominator 2 [f'(x_n)]^2 - f(x_n) f''(x_n) to become zero or near-zero, causing iteration failure or instability, particularly if the initial guess is poor or near inflection points where f'' changes sign. In multi-dimensional systems, computing the Hessian (analogous to f'') adds significant overhead, often making quasi-Newton methods more practical.[1]
In terms of asymptotic efficiency, defined as the order of convergence raised to the power of one over the number of function evaluations per iteration, Halley's method achieves an index of approximately 1.442 (order 3 with 3 evaluations: f, f', and f''), slightly outperforming Newton's method's index of about 1.414 (order 2 with 2 evaluations). However, Newton's method often requires less total computational work when starting from initial guesses where both converge, as it avoids the second derivative computation. Additionally, Halley's method provides larger convergence basins, encounters fewer matrix singularities in systems of equations, and typically demands fewer iterations overall compared to Newton, particularly in one-dimensional problems with available derivatives and a good initial approximation.[14][15]
Compared to the secant method, which is derivative-free and relies solely on function evaluations, Halley's method offers superior cubic convergence versus the secant's superlinear order of approximately 1.618, leading to fewer iterations for smooth functions. The secant method's efficiency index is higher at about 1.618 (with 1 evaluation per iteration after initialization), making it preferable when derivatives are unavailable or costly to compute, but Halley's approach excels in scenarios requiring rapid local convergence, such as polynomial root-finding with known higher derivatives. Empirical studies confirm Halley's method outperforms the secant in both iteration count and computational time for various nonlinear equations, though the secant remains advantageous for global searches without derivative information.[16][17]
Against higher-order methods, such as fourth-order variants like the super-Halley method, Halley's cubic iteration is simpler to implement and sufficient for most practical purposes, but it yields lower efficiency for achieving very high precision, where methods with order 4 or above (efficiency indices up to approximately 1.587 with optimized evaluations) reduce the total number of steps more effectively. These higher-order alternatives often build on Halley's framework by minimizing second-derivative usage, yet Halley's balance of order and accessibility makes it ideal for one-dimensional problems rather than extending to multi-root or complex high-dimensional scenarios, where computational overhead can outweigh benefits.[18][17]