Fact-checked by Grok 2 weeks ago

Newton's method

Newton's method, also known as the Newton–Raphson method, is a root-finding algorithm that produces successively better approximations to the roots (or zeroes) of a real-valued differentiable function by leveraging the first few terms of the function's Taylor series expansion around an initial guess.^[1] The method iteratively refines an estimate x_n using the formula x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}, where f(x) is the function and f'(x) its derivative, effectively applying linear approximation via the function's tangent line at each step.^[1]^[2] Developed by Isaac Newton in his unpublished manuscript De analysi around 1669, where it was applied to polynomial equations via binomial expansions, the method was later refined and described by Joseph Raphson in 1690 as an iterative process for solving algebraic equations, emphasizing its quadratic convergence properties.^[3] Newton further utilized a version of the technique in his Principia Mathematica (1687) to solve Kepler's equation in orbital mechanics, while Raphson extended it to higher-degree polynomials in his Analysis aequationum universalis.^[3] Thomas Simpson generalized the approach in 1740 by incorporating fluxional calculus (early differential calculus) to handle nonlinear and systems of equations, marking its transition to a broader numerical tool.^[3] Under suitable conditions—such as starting sufficiently close to a simple root where f'(x^*) \neq 0—Newton's method exhibits quadratic convergence, meaning the number of correct digits roughly doubles with each iteration, making it highly efficient for numerical computations.^[1]^[2] However, it requires the derivative to be computable and nonzero at iteration points, and poor initial guesses can lead to divergence, oscillation, or convergence to unintended roots, particularly near local extrema or inflection points.^[2] The method's versatility extends beyond root-finding to optimization (via Newton's method for minimization),^[4] machine learning (e.g., in training neural networks),^[5] and graphics (e.g., generating fractal patterns like Newton fractals for polynomials of degree two or higher).^[1] Despite these strengths, modern variants address limitations, such as modified versions for derivative-free approximations or safeguards against division by near-zero derivatives.^[2]

Overview

Description

Newton's method is an iterative root-finding algorithm designed to solve equations of the form f(x) = 0, where f is a real-valued function, by generating a sequence of successively better approximations to the root.^[6] Starting from an initial guess x_0, the method refines the estimate at each step until it converges to a solution sufficiently close to the actual root.^[7] The core of the algorithm is the iterative update formula:

x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}

This step arises from the first-order Taylor expansion of f around the current approximation x_n, which provides a linear approximation f(x) \approx f(x_n) + f'(x_n)(x - x_n); setting this equal to zero and solving for x yields the next iterate x_{n+1}.^[8] The formula essentially corrects the current guess by subtracting the ratio of the function value to its derivative, leveraging local linearity to approach the root.^[7] For the method to apply, f must be continuously differentiable in a neighborhood of the root, and the derivative f'(x) must be nonzero near that point to ensure the tangent is well-defined and the iteration is meaningful.^[6] Geometrically, each iteration computes the point where the tangent line to the graph of y = f(x) at x_n intersects the x-axis, using this linear tangent as a proxy for the curve itself near x_n.^[9] Compared to simpler techniques like the bisection method, which relies only on sign changes, or the secant method, which approximates the derivative, Newton's method typically achieves faster convergence for smooth functions due to its use of exact derivative information.^[10]

History

Isaac Newton first devised the method in 1669 while composing his manuscript De analysi per aequationes numero terminorum infinitas, applying it iteratively to approximate roots of polynomial equations through linearization of successively reduced polynomials.^[11] This early version demonstrated quadratic convergence, as illustrated in Newton's example for solving x^3 - 2x - 5 = 0.^[12] The manuscript remained unpublished for decades, though Newton revisited and refined the approach in his 1671 work De methodis fluxionum et serierum infinitarum, which incorporated fluxions (derivatives) for broader applications.^[11] Publication of De analysi occurred in 1711 through William Jones's Analysis per quantitatum series, fluxiones, ac differentias cum enumeratione quarundam mathematicarum. The English translation of the 1671 manuscript, Method of Fluxions and Infinite Series, appeared in 1736.^[11] Newton applied the method practically in his Philosophiæ Naturalis Principia Mathematica (1687) to solve Kepler's equation x - e \sin x = M for astronomical computations.^[12] Independently, Joseph Raphson described a similar iterative procedure in 1690 in Analysis aequationum universalis, formulating it directly for polynomials without polynomial reduction, emphasizing decimal approximations.^[11] The method is sometimes associated with Thomas Simpson, who in 1740 extended it to general nonlinear equations using fluxional calculus in Essays on Several Curious and Useful Subjects in Speculative Mathematics.^[12] In the 19th century, Augustin-Louis Cauchy provided a rigorous modern presentation in 1847, generalizing it to systems and initiating formal error analysis.^[11] By the 1930s, amid growing interest in computational techniques, the method's role in numerical analysis solidified, with Leonid Kantorovich proving convergence properties in 1939 for functional spaces, paving the way for its widespread adoption in early digital computing.^[12]

Mathematical Formulation

One-Dimensional Case

In the one-dimensional case, Newton's method is an iterative technique for approximating a root \alpha of a scalar function f: \mathbb{R} \to \mathbb{R} satisfying f(\alpha) = 0, assuming such a root exists.^[13] The method derives from the first-order Taylor series expansion of f around the current approximation x_n:

f(x) \approx f(x_n) + f'(x_n)(x - x_n).

Setting the approximation equal to zero and solving for x yields the update formula for the next iterate:

x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.

This linear approximation replaces the nonlinear equation with a solvable linear one at each step.^[13] To apply the method, select an initial guess x_0 sufficiently close to \alpha, then generate the sequence \{x_n\} via the update formula until a convergence criterion is met, such as |x_{n+1} - x_n| < \epsilon for a small tolerance \epsilon > 0. The process terminates when the change between iterates is negligible or a maximum iteration count is reached to prevent non-convergence.^[13] The method requires that f is twice continuously differentiable in a neighborhood of \alpha, that f'(\alpha) \neq 0, and that \alpha is a simple root (i.e., the multiplicity is one). These conditions ensure the derivative is invertible at the root and support local convergence behavior.^[14] Define the error at step n as e_n = x_n - \alpha. Under the stated assumptions, the error satisfies the asymptotic relation

e_{n+1} \approx \frac{f''(\alpha)}{2 f'(\alpha)} e_n^2

as n \to \infty and e_n \to 0, which sets up the quadratic convergence rate analyzed further in the quadratic convergence section.^[13]

Multidimensional Case

Newton's method extends naturally to finding roots of systems of nonlinear equations in multiple variables, where the function \mathbf{F}: \mathbb{R}^n \to \mathbb{R}^n satisfies \mathbf{F}(\mathbf{x}) = \mathbf{0}.^[15] The method relies on a linear approximation derived from the multivariable Taylor expansion around the current iterate \mathbf{x}_k:

\mathbf{F}(\mathbf{x}_k + \boldsymbol{\delta}) \approx \mathbf{F}(\mathbf{x}_k) + \mathbf{J}(\mathbf{x}_k) \boldsymbol{\delta} = \mathbf{0},

where \mathbf{J}(\mathbf{x}_k) is the n \times n Jacobian matrix with entries J_{ij} = \frac{\partial F_i}{\partial x_j} evaluated at \mathbf{x}_k.^[15] Solving for the step \boldsymbol{\delta} yields \boldsymbol{\delta} = -\mathbf{J}(\mathbf{x}_k)^{-1} \mathbf{F}(\mathbf{x}_k), assuming \mathbf{J}(\mathbf{x}_k) is invertible, which requires it to be nonsingular near the root.^[16] The iteration then updates as \mathbf{x}_{k+1} = \mathbf{x}_k - \mathbf{J}(\mathbf{x}_k)^{-1} \mathbf{F}(\mathbf{x}_k).^[15] The Jacobian matrix captures the local sensitivity of each equation component to changes in the variables and is typically computed analytically if possible or approximated via finite differences otherwise.^[17] Invertibility of the Jacobian near the root ensures a unique local solution to the linearized system, facilitating convergence to the nonlinear root.^[16] For overdetermined systems where \mathbf{F}: \mathbb{R}^n \to \mathbb{R}^m with m > n, the method adapts via a least-squares approach, minimizing \|\mathbf{F}(\mathbf{x}_k + \boldsymbol{\delta})\|^2 \approx \|\mathbf{F}(\mathbf{x}_k) + \mathbf{J}(\mathbf{x}_k) \boldsymbol{\delta}\|^2.^[17] The solution is \boldsymbol{\delta} = -\left( \mathbf{J}(\mathbf{x}_k)^T \mathbf{J}(\mathbf{x}_k) \right)^{-1} \mathbf{J}(\mathbf{x}_k)^T \mathbf{F}(\mathbf{x}_k), or equivalently using the Moore-Penrose pseudoinverse \mathbf{J}(\mathbf{x}_k)^- when the normal matrix \mathbf{J}^T \mathbf{J} is ill-conditioned.^[17] Geometrically, the update in the n-dimensional case can be interpreted by considering the graph of \mathbf{F} in \mathbb{R}^{2n}, where the current point (\mathbf{x}_k, \mathbf{F}(\mathbf{x}_k)) lies on a hypersurface; the linear approximation defines the tangent hyperplane at this point, and the Newton step finds the intersection of this tangent hyperplane with the hyperplane \{\mathbf{y} = \mathbf{0}\} to obtain the next iterate \mathbf{x}_{k+1}.^[18] This intersection refines the approximation by aligning the local linear model with the zero level set of \mathbf{F}.^[18]

Convergence Analysis

Quadratic Convergence

Under standard assumptions, Newton's method exhibits quadratic convergence to a simple root. Specifically, if the function f is twice continuously differentiable, f'(\alpha) \neq 0 where \alpha is a simple root (i.e., f(\alpha) = 0), and the initial guess x_0 is sufficiently close to \alpha, then the error e_n = x_n - \alpha satisfies

\lim_{n \to \infty} \frac{|e_{n+1}|}{|e_n|^2} = \left| \frac{f''(\alpha)}{2 f'(\alpha)} \right| < \infty. \tag{1}

This limit establishes the quadratic rate, meaning the number of correct digits roughly doubles with each iteration once close to the root.^[6]^[19] The proof relies on Taylor expansions of f and f' around the root \alpha. Consider the iteration x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}, so the error is

e_{n+1} = e_n - \frac{f(\alpha + e_n)}{f'(\alpha + e_n)}.

Expanding f(\alpha + e_n) gives

f(\alpha + e_n) = f(\alpha) + f'(\alpha) e_n + \frac{1}{2} f''(\xi_n) e_n^2 = f'(\alpha) e_n + \frac{1}{2} f''(\xi_n) e_n^2,

for some \xi_n between \alpha and x_n, since f(\alpha) = 0. Similarly, expand f'(\alpha + e_n) = f'(\alpha) + f''(\eta_n) e_n for some \eta_n between \alpha and x_n. Substituting and simplifying, while neglecting higher-order terms as e_n \to 0,

e_{n+1} \approx -\frac{f''(\alpha)}{2 f'(\alpha)} e_n^2.

Taking absolute values yields |e_{n+1}| \approx \frac{|f''(\alpha)|}{2 |f'(\alpha)|} |e_n|^2, confirming the limit in (1).^[6]^[19] This quadratic convergence is local, occurring within a basin of attraction around \alpha: there exists a neighborhood such that if x_0 lies within it, the sequence converges to \alpha at the quadratic rate. The size of this basin depends on the function's properties, such as the ratio |f''(\alpha)/f'(\alpha)|, ensuring |e_{n+1}| < |e_n| when |e_n| is small enough (e.g., |e_n| < 2 |f'(\alpha)/f''(\alpha)|).^[20]^[21] The finite limit in (1) defines the asymptotic constant C = \left| \frac{f''(\alpha)}{2 f'(\alpha)} \right|, which quantifies the convergence speed near the root; smaller C implies faster reduction in error.^[6]^[19]

Error Bounds and Conditions

In the one-dimensional case, Newton's method admits an a priori error bound derived from Taylor's theorem with remainder, stating that if f is twice continuously differentiable on an interval containing the root \alpha and the iterates, with |f'(x)| \geq L > 0 and |f''(x)| \leq M for all x in that interval, then the error satisfies

|x_{n+1} - \alpha| \leq \frac{M}{2L} |x_n - \alpha|^2,

provided the initial error |x_0 - \alpha| is small enough to keep iterates within the interval (specifically, \frac{M}{2L} |x_0 - \alpha| < 1). This bound quantifies the quadratic reduction in error, with the constant K = \frac{M}{2L} depending on the supremum norm of f'' relative to the infimum norm of f'.^[22] Sufficient conditions for global convergence from any starting point in a suitable interval, often referred to as Fourier conditions in historical analyses, require properties like the boundedness of |f''(x)/f'(x)| and Lipschitz continuity of f'(x)/f(x), ensuring the iteration remains monotone and approaches the root without oscillation. These conditions, building on early work by Fourier, who in 1818 attempted to prove quadratic convergence in one dimension (though the proof was flawed), and more robustly in 1831, guarantee that if f has a simple root and satisfies such analytic constraints (e.g., |f''(x)/f'(x)| \leq \beta < 2 on an interval bracketing the root), the method converges globally for initial guesses within that interval. Later extensions, such as Cauchy's semilocal theorem, refine these by imposing bounds on |f''| and |f'| to ensure uniqueness and quadratic rates from sufficiently broad starting regions.^[23]^[24] In the multivariable case, the error analysis extends via the mean value theorem for vector functions, yielding an approximation

e_{k+1} \approx \frac{1}{2} J(\alpha)^{-1} H(\alpha) [e_k \otimes e_k],

where e_k = x_k - \alpha is the error vector, J(\alpha) is the Jacobian matrix of the system F(x) = 0 at the root \alpha, and H(\alpha) denotes the second derivative tensor (Hessian for scalar F, or bilinear form for vector F). This quadratic term arises from the second-order Taylor expansion of F, highlighting how the method's superlinear behavior depends on the nonsingularity of J(\alpha) and boundedness of higher derivatives near \alpha. Quantitative bounds follow similarly, with \|e_{k+1}\| \leq \tau \|e_k\|^2 for some \tau > 0 when iterates are sufficiently close.^[25] Newton's method demonstrates local convergence, where quadratic error reduction occurs provided the initial guess is sufficiently proximate to the root, as captured by the aforementioned bounds; however, global convergence—ensuring success from distant starting points—relies on additional structural assumptions on F, such as monotonicity or convexity in one dimension, or contractive mappings in higher dimensions, without which the method may diverge or converge slowly.^[23]

Practical Considerations

Derivative Computation Challenges

One significant challenge in applying Newton's method arises when the derivative f'(x) of the objective function is difficult or impossible to compute analytically, particularly for complex or black-box functions where symbolic differentiation yields cumbersome expressions. In such cases, numerical approximations become necessary, but they introduce errors that can compromise the method's quadratic convergence properties. For instance, in financial modeling, deriving exact derivatives for implied volatility calculations can be tedious, prompting the use of numerical alternatives.^[26] A common approach is the finite difference approximation, which estimates the derivative using discrete function evaluations, such as the forward difference formula:

f'(x) \approx \frac{f(x + h) - f(x)}{h},

where h > 0 is a small perturbation parameter. The truncation error in this first-order scheme is O(h), stemming from the Taylor expansion of f, while roundoff error arises from floating-point arithmetic and scales as O(\epsilon / h), with \epsilon denoting machine precision (typically $10^{-16} for double precision). To minimize the total error, h is optimally chosen around \sqrt{\epsilon} \cdot |x|, balancing these competing effects; for central differences, which achieve O(h^2) truncation error, the optimal h is approximately \sqrt{\epsilon / 2} \cdot |x|. However, poor h selection can amplify errors, especially in high dimensions or near singularities, leading to unreliable iteration steps in Newton's method.^[27]^[28] Automatic differentiation (AD) addresses these limitations by computing exact derivatives to machine precision without approximation, decomposing the function into elementary operations and applying chain rule propagation. In forward-mode AD, derivatives are computed alongside function evaluations in a single pass, suitable for low-dimensional problems, while reverse-mode (adjoint) AD efficiently handles high-dimensional cases by backpropagating sensitivities, often at a cost comparable to two function evaluations. This exactness eliminates truncation errors inherent in finite differences, improving accuracy by orders of magnitude (e.g., from $10^{-13} relative error in numerical methods to near-zero in AD for test functions) and enhancing performance in root-finding applications, where AD-derived Jacobians accelerate convergence in Newton's iterations for nonlinear equations in solid mechanics.^[29]^[30]^[31] When the derivative is entirely unavailable, such as in derivative-free optimization scenarios, Newton's method must be modified, motivating transitions to derivative-free alternatives like the secant method, which approximates f' via divided differences from prior iterates, or quasi-Newton methods that build low-rank updates to an initial Hessian estimate. These approaches maintain superlinear convergence under suitable conditions but sacrifice the quadratic rate of exact Newton steps.^[32] Inaccuracies in derivative computations propagate through the Newton update x_{k+1} = x_k - f(x_k)/f'(x_k), potentially destabilizing iterations by altering the step direction or length, which can cause divergence or slow convergence if the relative error exceeds a fraction of the residual (e.g., \delta \leq 0.1 \|f(x_k)\|). In inexact Newton frameworks, convergence guarantees hold if derivative approximations satisfy adaptive tolerances, such as \|F'(x_k) s_k + F(x_k)\| \leq \eta_k \|F(x_k)\| with forcing parameters \eta_k \to 0, ensuring local quadratic rates akin to exact methods while allowing controlled inexactness to reduce computational cost.^[33]

Convergence Failures and Slow Cases

Newton's method can fail to converge or exhibit pathological behavior in several scenarios, often due to the nature of the function or the choice of initial guess. One common failure occurs when the derivative f'(x_n) is approximately zero at an iterate, leading to large steps that cause divergence, or when the initial guess x_0 is sufficiently far from the root, resulting in iterates that move away from the solution region.^[21] Additionally, if f'(x_n) = 0 exactly, the iteration step becomes undefined due to division by zero, halting the process; for instance, starting at a critical point like x_0 = 0 for f(x) = x^3 - x^2 - 1 yields f'(0) = 0, preventing further computation.^[34] Oscillations represent another form of non-convergence, where iterates alternate around the root due to overshooting caused by steep tangents or specific function shapes. A classic example is f(x) = x^3 - 2x + 2, where starting at x_0 = 0 produces a period-2 cycle: the next iterate is x_1 = 1, followed by x_2 = 0, and so on, never approaching the actual root near -1.77.^[6] Such cycles trap the method in bounded but non-convergent paths, particularly near local extrema or for functions without real roots.^[21] For roots of multiplicity m > 1, where f(x) = (x - \alpha)^m g(x) with g(\alpha) \neq 0, Newton's method converges linearly with rate \lambda = 1 - 1/m, significantly slower than the quadratic rate for simple roots (m=1).^[35] To restore quadratic convergence, a modified iteration can be used if m is known:

x_{n+1} = x_n - m \frac{f(x_n)}{f'(x_n)}.

This adjustment accounts for the higher-order zero in the derivative at the root.^[35] Without modification, the method's efficiency drops, as seen in examples like f(x) = (x-1)^3, where only one significant digit may be gained after 10 iterations near the multiple root.^[21] Slow convergence also arises near inflection points or flat regions, where the second derivative changes sign or the first derivative is small, causing the linear tangent approximation to poorly capture the function's curvature. For f(x) = (x-1)^3, an inflection at the root x=1 leads to high relative errors persisting over iterations, such as 32.65% after four steps from x_0 = -1, due to the tangent line's inadequate fit.^[36] In flat regions, small |f'(x_n)| amplifies errors similarly to near-zero derivatives, prolonging the approach to the root.^[34] In the complex plane, Newton's method reveals chaotic basins of attraction, where the plane partitions into fractal regions drawing initial points to different roots, with intricate, interwoven boundaries. For polynomials like z^3 - 1 = 0, these basins exhibit self-similar fractal structures, highlighting sensitivity to initial conditions and potential non-convergence to the intended root.^[37]

Examples

Square Root Computation

One of the simplest applications of Newton's method is computing the square root of a positive number a, denoted \sqrt{a}, by finding the root of the equation f(x) = x^2 - a = 0.^[38] The derivative is f'(x) = 2x, leading to the iteration formula x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} = \frac{1}{2} \left( x_n + \frac{a}{x_n} \right).^[38] This is a special case of the one-dimensional Newton's method for solving f(x) = 0. A reasonable starting guess x_0 can be a/2 or simply 1, assuming a > 0; for small a, x_0 = 1 works well.^[38] To illustrate, consider computing \sqrt{2} \approx 1.414213562 with x_0 = 1:

Iteration n	x_n	Error (relative to true \sqrt{2})
0	1.000000	≈ 0.292893
1	1.500000	≈ 0.060660
2	1.416667	≈ 0.001794
3	1.414216	≈ 0.0000009
4	1.414214	< machine epsilon (≈ 2.22 × 10^{-16})

The values converge to machine precision in about 4 iterations using double-precision arithmetic.^[38] This method exhibits quadratic convergence, roughly doubling the number of correct digits with each iteration once close to the root, as the error satisfies \delta_{n+1} \approx \frac{\delta_n^2}{2 x^*} where x^* = \sqrt{a}.^[38] Historically, this iteration—known as the Babylonian method since the 19th century BCE—was used for square root approximations long before calculators and was later recognized by Isaac Newton as a case of his general root-finding technique.^[39] Its efficiency made it a standard tool in hand computations and early numerical analysis.^[39]

Transcendental Equation Solution

A representative application of Newton's method arises in solving transcendental equations that combine trigonometric and polynomial terms, such as finding the intersection points of y = \cos x and y = x^3. These equations lack closed-form algebraic solutions, making iterative numerical methods essential. Define the function f(x) = \cos x - x^3, so the roots satisfy f(x) = 0. The derivative is f'(x) = -\sin x - 3x^2, which is used in the iteration formula x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.^[40] Selecting an initial guess near the intersection, take x_0 = 0.8. This value is chosen based on graphical inspection, where \cos(0.8) \approx 0.697 > 0.8^3 = 0.512, placing it close to the root around 0.865. The iterations proceed as follows, with values computed to four decimal places for clarity:

Iteration n	x_n	f(x_n)	f'(x_n)	x_{n+1}
0	0.8000	0.1847	-2.6374	0.8701
1	0.8701	-0.0151	-3.0355	0.8651
2	0.8651	0.0003	-3.0199	0.8652
3	0.8652	-0.0000	-3.0200	0.8652
4	0.8652	0.0000	-3.0200	0.8652

The computations for each step are obtained by substituting x_n into f and f', then applying the update formula; for instance, at n=0, \frac{f(0.8000)}{f'(0.8000)} = \frac{0.1847}{-2.6374} \approx -0.0701, so x_1 = 0.8000 - (-0.0701) = 0.8701. Convergence occurs rapidly after the first few steps, yielding the root x \approx 0.8652 to four decimal places.^[41] Graphically, the functions y = \cos x and y = x^3 intersect at one real point near 0.865. The root is isolated in the interval (0.8, 0.9), where f(x) changes sign.^[40] The success of Newton's method here highlights its efficiency for such equations but also underscores its sensitivity to the initial guess; for example, starting at x_0 = 0 leads to division by zero since f'(0) = 0. This equation has only one real root, but in general, transcendental equations often admit multiple roots, necessitating graphical or analytical previews to select an appropriate x_0 for the target solution—details on potential failures are covered in the convergence analysis section.^[40]

Generalizations

Systems of Equations

Newton's method extends naturally to solving systems of nonlinear equations \mathbf{F}(\mathbf{x}) = \mathbf{0}, where \mathbf{F}: \mathbb{R}^n \to \mathbb{R}^n is a vector-valued function with continuously differentiable components. The method approximates the solution by iteratively solving a linear system involving the Jacobian matrix \mathbf{J}(\mathbf{x}), which contains the partial derivatives of \mathbf{F} (as defined in the multidimensional case). At each iteration k, given current estimate \mathbf{x}^{(k)}, the update is obtained by solving \mathbf{J}(\mathbf{x}^{(k)}) \boldsymbol{\delta} = -\mathbf{F}(\mathbf{x}^{(k)}) for the step \boldsymbol{\delta}, then setting \mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \boldsymbol{\delta}. This process continues until \|\mathbf{F}(\mathbf{x}^{(k)})\| falls below a specified tolerance.^[42] A representative example is the system

\begin{cases} x_1^2 + x_1 x_2 - 10 = 0, \\ x_2 + 3 x_1 x_2^2 - 57 = 0, \end{cases}

with exact solution (x_1, x_2) = (2, 3). The Jacobian is

\mathbf{J}(x_1, x_2) = \begin{bmatrix} 2x_1 + x_2 & x_1 \\ 3 x_2^2 & 1 + 6 x_1 x_2 \end{bmatrix}.

Starting from initial guess (x_1, x_2) = (1.5, 3.5), the method requires 4 iterations to converge to the solution within a relative error tolerance of $10^{-4}. At the first iteration, \mathbf{F}(1.5, 3.5) = (-2.5, 1.625) and \mathbf{J}(1.5, 3.5) = \begin{bmatrix} 6.5 & 1.5 \\ 36.75 & 32.5 \end{bmatrix}, yielding step \boldsymbol{\delta} \approx (0.536, -0.656) and update to (2.036, 2.844); subsequent steps refine this quadratically toward the root.^[42] For overdetermined systems where the number of equations m > n (the number of unknowns), an exact solution may not exist, so Newton's method is adapted via the Gauss-Newton algorithm to minimize the nonlinear least-squares objective \frac{1}{2} \|\mathbf{F}(\mathbf{x})\|_2^2. Here, the iteration approximates the Hessian of the objective using \mathbf{J}(\mathbf{x})^T \mathbf{J}(\mathbf{x}), solving the normal equations [\mathbf{J}(\mathbf{x}^{(k)})^T \mathbf{J}(\mathbf{x}^{(k)})] \boldsymbol{\delta} = -\mathbf{J}(\mathbf{x}^{(k)})^T \mathbf{F}(\mathbf{x}^{(k)}) for the step, which reduces to a linear least-squares problem per iteration. This approach achieves local quadratic convergence near a solution where \mathbf{J} has full rank.^[43] The primary computational cost per iteration for square systems arises from forming the Jacobian (requiring O(n^2) evaluations of partial derivatives) and solving the resulting n \times n linear system, typically via LU decomposition at O(n^3) operations. For large n, direct solvers dominate, though preconditioning or iterative methods can mitigate this for sparse Jacobians.^[44]

Complex and Banach Space Extensions

Newton's method extends naturally to the complex domain for analytic functions f: \mathbb{C} \to \mathbb{C}, where the iteration formula remains z_{n+1} = z_n - \frac{f(z_n)}{f'(z_n)}, mirroring the real case but operating in the complex plane.^[45] This extension preserves local quadratic convergence near simple roots, provided f'(z^*) \neq 0 at the root z^*, due to the analyticity ensuring the method's classical properties hold.^[45] A key advancement in the complex setting is Smale's \alpha-theory, which provides pointwise estimates for convergence using data at a single approximate root.^[45] Specifically, for an analytic function f, the quantity \alpha(f, z) = \beta(f, z) \cdot \gamma(f, z)^{1/2}, where \beta(f, z) = \frac{|f(z)|}{|f'(z)|} measures the Newton step size and \gamma(f, z) bounds higher-order terms via the inverse of the first-order Taylor remainder, certifies quadratic convergence if \alpha(f, z) < \alpha_0 \approx 0.1307.^[45] This theory quantifies the basin of attraction in the complex plane, enabling a posteriori verification of approximate roots without global knowledge of f.^[45] In Banach spaces, Newton's method generalizes to operators F: X \to Y between Banach spaces X and Y, using the Fréchet derivative DF(x) at iteration points.^[46] The update becomes x_{n+1} = x_n - [DF(x_n)]^{-1} F(x_n), where [DF(x_n)]^{-1} is a bounded inverse operator, assuming invertibility.^[46] This formulation applies to infinite-dimensional settings, such as function spaces, where local quadratic convergence holds under Lipschitz continuity of DF near a root x^* with DF(x^*) invertible.^[46] The Newton-Kantorovich theorem establishes sufficient conditions for quadratic convergence in this Banach space context.^[46] For F continuously Fréchet differentiable on a convex open set \Omega \subset X, with DF(x_0) invertible and \|DF(x) - DF(x_0)\| \leq K \|x - x_0\| for some K > 0, if the initial residual satisfies \eta = \| [DF(x_0)]^{-1} F(x_0) \| \leq \frac{1}{2K} and the ball condition holds, then the iterates converge quadratically to a unique root in a specified ball.^[46] This semi-local result, originating from Kantorovich's work on functional equations, bounds the convergence radius and error without requiring global Lipschitz assumptions.^[46] For low-regularity problems in infinite dimensions, such as nonlinear partial differential equations (PDEs), the Nash-Moser iteration provides a variant of Newton's method that handles loss of derivatives through smoothing operators.^[47] In tame Fréchet spaces equipped with scales of smoothing norms, the iteration applies a modified Newton step u_{n+1} = u_n - S_n [DF(u_n)]^{-1} F(u_n), where S_n is a smoothing operator gaining derivatives, followed by a tame loss-compensation step to control regularity.^[47] This yields an inverse function theorem for maps between tame Fréchet spaces, ensuring local surjectivity and invertibility even when classical Newton fails due to insufficient smoothness.^[47] The technique, building on Nash's embedding proofs and Moser's elliptic regularity work, is pivotal for proving local existence of solutions to quasilinear PDEs in high dimensions.^[47]

Modifications

Quasi-Newton Methods

Quasi-Newton methods represent a class of iterative algorithms that modify Newton's method by approximating the Jacobian matrix (for systems of equations) or the Hessian matrix (for optimization problems) through low-rank updates, thereby avoiding the costly full recomputation of derivatives at each iteration.^[48] These approximations are constructed to satisfy a secant condition, which ensures that the update aligns with the curvature information from successive function evaluations, providing a balance between accuracy and computational efficiency.^[48] By leveraging information from previous steps, quasi-Newton methods reduce the expense associated with derivative computation challenges, such as in high-dimensional settings.^[48] A foundational example is Broyden's method, introduced in 1965, which applies to solving systems of nonlinear equations \mathbf{f}(\mathbf{x}) = \mathbf{0}.^[49] It updates the Jacobian approximation \mathbf{J}_{k+1} from \mathbf{J}_k using a rank-1 correction that enforces the secant condition:

\mathbf{J}_{k+1} \mathbf{s}_k = \mathbf{y}_k,

where \mathbf{s}_k = \mathbf{x}_{k+1} - \mathbf{x}_k is the step vector and \mathbf{y}_k = \mathbf{f}(\mathbf{x}_{k+1}) - \mathbf{f}(\mathbf{x}_k) is the function difference.^[49] The explicit update formula is

\mathbf{J}_{k+1} = \mathbf{J}_k + \frac{(\mathbf{y}_k - \mathbf{J}_k \mathbf{s}_k) \mathbf{s}_k^\top}{\|\mathbf{s}_k\|^2},

which maintains a good approximation while requiring only matrix-vector products and avoiding direct derivative evaluations after initialization.^[49] In the context of unconstrained optimization, the BFGS method extends this idea to approximate the Hessian of the objective function, ensuring the update remains positive definite under standard assumptions like sufficient decrease in the line search.^[48] Independently proposed in 1970 by Broyden, Fletcher, Goldfarb, and Shanno, BFGS uses a rank-2 update for the inverse Hessian approximation \mathbf{B}_{k+1}^{-1}:

\mathbf{B}_{k+1}^{-1} = \left( \mathbf{I} - \frac{\mathbf{s}_k \mathbf{y}_k^\top}{\mathbf{y}_k^\top \mathbf{s}_k} \right) \mathbf{B}_k^{-1} \left( \mathbf{I} - \frac{\mathbf{y}_k \mathbf{s}_k^\top}{\mathbf{y}_k^\top \mathbf{s}_k} \right) + \frac{\mathbf{s}_k \mathbf{s}_k^\top}{\mathbf{y}_k^\top \mathbf{s}_k},

where \mathbf{y}_k = \nabla f(\mathbf{x}_{k+1}) - \nabla f(\mathbf{x}_k).^[50] This formulation preserves symmetry and positive definiteness if \mathbf{B}_k^{-1} is positive definite and \mathbf{y}_k^\top \mathbf{s}_k > 0, making it suitable for minimizing smooth functions.^[48] Under appropriate local conditions, such as Lipschitz continuity of the derivatives and starting sufficiently close to the solution, quasi-Newton methods achieve superlinear convergence, meaning the convergence order exceeds 1 but is typically less than the quadratic order of full Newton's method.^[48] This superlinear rate arises from the accumulating accuracy of the matrix approximations, while each iteration remains cheaper than Newton's due to the low-rank updates and avoidance of exact second derivatives.^[48] These methods are particularly advantageous in high-dimensional problems, where evaluating the full Jacobian or Hessian can dominate computational costs, as the updates scale favorably with dimension.^[48]

Higher-Order and Specialized Variants

Chebyshev's method is a third-order root-finding algorithm that extends Newton's method by incorporating the second derivative, achieving cubic convergence under suitable conditions near a simple root.^[51] The iteration is given by

x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \left(1 - \frac{1}{2} \frac{f(x_n) f''(x_n)}{[f'(x_n)]^2}\right),

where f'' denotes the second derivative.^[52] This method requires evaluating higher derivatives, increasing computational cost but offering faster convergence than the quadratic rate of standard Newton's method; explicit convergence regions have been derived for specific applications like principal pth roots.^[52] In the context of p-adic numbers, Newton's method adapts to the ultrametric topology of \mathbb{Q}_p, where the p-adic norm satisfies |x + y|_p \leq \max\{|x|_p, |y|_p\}, leading to unique convergence behaviors distinct from the real case.^[53] For a polynomial f \in \mathbb{Z}_p, if an initial approximation x_0 satisfies f(x_0) \equiv_p 0 and f'(x_0) \not\equiv_p 0, the iterates converge quadratically to a unique root \xi \in \mathbb{Z}_p with |x_{n+1} - \xi|_p \leq p^{-2^n}, leveraging Taylor expansions in the complete ultrametric field.^[53] This variant finds applications in number theory for computing algebraic roots in p-adic settings, such as higher-order equations, with Smale's \alpha-theory extended to provide sufficient conditions for convergence in ultrametric fields.^[54] The q-analogue of Newton's method arises in q-deformed calculus, replacing the standard derivative with the q-derivative D_{q,x} f(x_k) = \frac{f(x_k) - f(q x_k)}{x_k (1 - q)} for q \neq 1, yielding a deformed iteration for root finding.^[55] For simple roots, the update is x_{k+1} = x_k - \frac{f(x_k)}{D_{q,x} f(x_k)}, while for roots of multiplicity m, it becomes x_{k+1} = x_k - m \frac{f(x_k)}{D_{q,x} f(x_k)}.^[55] When q is close to 1, this method converges more rapidly than the classical version for algebraic and transcendental equations, as demonstrated in numerical examples like solving x e^x - 1 = 0, where q \approx 1 yields approximations nearer to the true root.^[55] Modifications to Newton's method address slow linear convergence at multiple roots by estimating multiplicity or adjusting the iteration. Maehly's procedure refines successive roots of polynomials by incorporating previously found roots to avoid reconvergence, using the update x_{k+1} = x_k - \frac{P(x_k)}{P'(x_k) - P(x_k) \sum_{i=1}^j (x_k - x_i)^{-1}}, where P is the polynomial and x_i are prior roots.^[56] This approach suppresses zeros effectively without explicit deflation, reducing rounding errors in closely spaced or multiple roots.^[56] Hirano's modification enhances global convergence for algebraic equations with multiple roots by altering the Newton step to ensure quadratic convergence even in challenging cases, successfully applied to otherwise difficult polynomial systems.^[57] Interval Newton's method employs interval arithmetic to compute guaranteed enclosures of roots, providing verified bounds for nonlinear equations.^[58] It combines a Gauss-Seidel-type iteration, real Newton steps, and a linearized solver subalgorithm, where intervals represent sets of possible values, and the method refines enclosures until a unique root is isolated within the interval.^[58] This variant ensures computational reliability by enclosing all possible solutions, converging in fewer iterations than alternatives like Krawczyk's method while bounding errors rigorously.^[58]

Applications

Optimization Problems

Newton's method is widely used to solve unconstrained optimization problems of the form \min_{x \in \mathbb{R}^n} f(x), where f: \mathbb{R}^n \to \mathbb{R} is a twice continuously differentiable function, by iteratively finding critical points where the gradient \nabla f(x) = 0.^[59] The method approximates the objective function locally by its second-order Taylor expansion, leading to an update rule that solves for the minimum of this quadratic model. Specifically, at each iteration k, the Newton step computes the search direction as the solution to H_k d = -\nabla f(x_k), where H_k is the Hessian matrix \nabla^2 f(x_k), and updates x_{k+1} = x_k + d_k = x_k - H_k^{-1} \nabla f(x_k), assuming H_k is invertible.^[59] This approach leverages the curvature information from the Hessian to achieve faster convergence compared to first-order methods like gradient descent.^[60] To determine the nature of a critical point found by the method, second-order conditions are examined using the Hessian: if \nabla f(x^*) = 0 and H(x^*) is positive definite, then x^* is a strict local minimum; if negative definite, a local maximum; and if indefinite, a saddle point.^[61] Positive definiteness of the Hessian ensures the quadratic model is convex, providing a sufficient condition for a local minimum, though it is not necessary—a positive semidefinite Hessian may still indicate a minimum but requires further analysis.^[62] Near such a stationary point where the Hessian is positive definite and Lipschitz continuous, Newton's method exhibits quadratic convergence, meaning the error decreases quadratically with each iteration once sufficiently close to the optimum.^[59] In practice, the pure Newton method may fail to make progress or diverge if the initial guess is far from the optimum or if the Hessian is not positive definite, prompting the use of damped or modified variants. The damped Newton method incorporates a line search to select a step size \alpha_k \in (0,1] that ensures sufficient decrease in the objective, typically via backtracking along the Newton direction: x_{k+1} = x_k - \alpha_k H_k^{-1} \nabla f(x_k), satisfying conditions like the Armijo rule for descent.^[60] This modification guarantees global convergence to a stationary point under suitable assumptions, such as self-concordance of f, while preserving local quadratic convergence.^[63] In machine learning, damped Newton's method is applied to optimize the negative log-likelihood in logistic regression, where the Hessian corresponds to the Fisher information matrix, enabling efficient parameter estimation for binary classification tasks.^[64] Newton's method is closely related to trust-region methods, which solve a constrained subproblem within a neighborhood where the quadratic model is trusted to be accurate, often yielding similar directions but with added robustness for nonconvex problems.^[65]

Inverse Computation and Verification

Newton's method can be applied to compute the multiplicative inverse of a nonzero number b, by solving the equation f(x) = \frac{1}{x} - b = 0. The derivative is f'(x) = -\frac{1}{x^2}, leading to the iteration formula x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} = 2x_n - b x_n^2.^[66] This approach converges quadratically for a suitable initial guess, such as x_0 obtained by approximating b with a power of 2 and using bit shifts for an initial reciprocal, and is analogous to the method for square roots but avoids explicit division in hardware implementations.^[67] For formal power series, Newton's method enables efficient reversion, which inverts a series y = f(x) to express x = g(y) as a power series, assuming f(0) = 0 and f'(0) \neq 0. The classical Newton iteration for series operates degree by degree, achieving quadratic convergence in the number of terms, and can be implemented to compute the first n terms in O(n (\log n)^{O(1)}) arithmetic operations using fast multiplication techniques.^[68] This is particularly useful in algebraic computations, such as solving differential equations or composing series in computer algebra systems.^[69] Numerical verification of solutions to nonlinear equations employs variants like the interval Newton method, which uses interval arithmetic to enclose roots and prove existence or uniqueness within a box [a, b]. A key tool is the Krawczyk operator, defined for a system F(x) = 0 with an approximate solution \tilde{x} and preconditioner C (often the inverse Jacobian approximation) as K(\tilde{x}, X) = \tilde{x} - C F(\tilde{x}) + (I - C J(\tilde{x})) (X - \tilde{x}), where X is an interval enclosure and J is the Jacobian; if K(\tilde{x}, X) \subseteq X and the spectral radius of I - C J(\tilde{x}) is less than 1, then there exists a unique root in X.^[70] This method guarantees certified enclosures without exhaustive search, extending classical Newton's method to validated numerics.^[71] In applications to transcendental equations, Newton's method solves Kepler's equation M = E - e \sin E for the eccentric anomaly E given mean anomaly M and eccentricity e < 1, crucial in orbital mechanics for determining satellite positions. Starting with a simple initial guess such as E_0 = M, the iteration converges rapidly, often in 3-4 steps, outperforming fixed-point methods for high precision.^[72]^[73] This has been optimized in astronomical computations since the 19th century.^[73] Error certification for computed roots uses backward error analysis, assessing how close the approximate root \tilde{x} is to an exact root of a perturbed equation f(x) + \delta(x) = 0, where the backward error is typically \|\delta\| \approx |f(\tilde{x})| in a suitable norm. For Newton's method, the residual |f(\tilde{x})| provides a bound on this perturbation, certifying the accuracy relative to input data uncertainties, especially in floating-point arithmetic where forward error may amplify ill-conditioning.^[74] This approach is fundamental in verified computing to ensure reliability of roots in scientific simulations.^[75]

Implementation

Algorithmic Steps

Newton's method, also known as the Newton-Raphson method, proceeds iteratively to approximate a root of the equation f(x) = 0 by refining an initial guess using the function's derivative.^[76] The algorithm begins with an initial approximation x_0, typically chosen based on graphical inspection or domain knowledge to ensure it is reasonably close to the root.^[77] At each iteration n, the method computes the function value f(x_n) and its derivative f'(x_n), then updates the approximation via the formula x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}, which corresponds to the x-intercept of the tangent line to f at x_n.^[78] This process repeats until a convergence criterion is met, producing a sequence \{x_n\} that ideally converges quadratically to the root under suitable conditions.^[77] The following pseudocode outlines the core steps for a one-dimensional implementation:

INPUT: initial guess x_0, tolerance ε > 0, maximum iterations N
SET n = 0
WHILE n < N DO
    COMPUTE f(x_n)
    COMPUTE f'(x_n)
    IF f'(x_n) = 0 THEN
        OUTPUT "Derivative is zero; method fails"
        STOP
    END IF
    SET x_{n+1} = x_n - f(x_n) / f'(x_n)
    IF |f(x_{n+1})| < ε OR |x_{n+1} - x_n| < ε THEN
        OUTPUT x_{n+1} as approximate root
        STOP
    END IF
    SET n = n + 1
END WHILE
OUTPUT "Maximum iterations reached; no convergence"
INPUT: initial guess x_0, tolerance ε > 0, maximum iterations N
SET n = 0
WHILE n < N DO
    COMPUTE f(x_n)
    COMPUTE f'(x_n)
    IF f'(x_n) = 0 THEN
        OUTPUT "Derivative is zero; method fails"
        STOP
    END IF
    SET x_{n+1} = x_n - f(x_n) / f'(x_n)
    IF |f(x_{n+1})| < ε OR |x_{n+1} - x_n| < ε THEN
        OUTPUT x_{n+1} as approximate root
        STOP
    END IF
    SET n = n + 1
END WHILE
OUTPUT "Maximum iterations reached; no convergence"

This structure ensures the iteration halts appropriately.^[78]^[76] To safeguard against infinite loops or numerical instability, implementations include a maximum iteration limit N, often set to 100 or based on expected convergence speed, beyond which the algorithm terminates with a failure message.^[78] Additionally, a check for f'(x_n) \neq 0 prevents division by zero, which would otherwise cause the method to fail at that step.^[76] Common convergence criteria involve absolute or relative tolerances, such as |f(x_{n+1})| < \epsilon for the function residual, |x_{n+1} - x_n| < \delta for the step size, or |x_{n+1} - x_n| / |x_{n+1}| < \delta for relative change, where \epsilon and \delta are small user-defined thresholds like $10^{-6}.^[78] These criteria balance accuracy with computational efficiency, as the method typically doubles the number of correct digits per iteration near the root.^[77] In cases where Newton's method stalls, such as when the derivative is near zero or the initial guess leads to oscillation, hybrid approaches incorporate a fallback to the bisection method, which guarantees convergence albeit more slowly.^[79] For instance, if the step x_{n+1} falls outside a bracketing interval [a, b] containing the root, the algorithm switches to bisection on that interval until progress resumes.^[79] Regarding computational complexity, each iteration in one dimension requires O(1) operations, assuming evaluations of f and f' are constant time.^[80] In n dimensions, where the method solves f(\mathbf{x}) = \mathbf{0} using the Jacobian matrix, each iteration involves inverting or factoring an n \times n system, leading to O(n^3) complexity per step.^[80]

Sample Code Snippets

Newton's method can be implemented straightforwardly in various programming languages, following the iterative update x_{n+1} = x_n - f(x_n)/f'(x_n) for univariate cases or its matrix generalization for multivariate systems. These examples include convergence checks, iteration limits, and error handling for issues like zero derivatives or singular Jacobians, drawing from standard numerical analysis practices.^[81] Python (Univariate)
The following Python function implements Newton's method for finding roots of a univariate function, using a for loop with exception handling for division by zero and a tolerance on the step size for convergence.

python
def newton(f, fprime, x0, tol=1e-6, maxiter=100):
    """
    Newton's method for univariate root finding.
    
    Parameters:
    f: function to solve f(x) = 0
    fprime: derivative of f
    x0: initial guess
    tol: tolerance for convergence
    maxiter: maximum iterations
    
    Returns:
    Approximate root and number of iterations
    """
    x = x0
    for i in range(maxiter):
        try:
            fx = f(x)
            fpx = fprime(x)
            dx = fx / fpx
            x -= dx
            if abs(dx) < tol:
                return x, i + 1
        except ZeroDivisionError:
            raise ValueError("Derivative is zero at current iterate")
    raise ValueError("Maximum iterations reached without convergence")
def newton(f, fprime, x0, tol=1e-6, maxiter=100):
    """
    Newton's method for univariate root finding.
    
    Parameters:
    f: function to solve f(x) = 0
    fprime: derivative of f
    x0: initial guess
    tol: tolerance for convergence
    maxiter: maximum iterations
    
    Returns:
    Approximate root and number of iterations
    """
    x = x0
    for i in range(maxiter):
        try:
            fx = f(x)
            fpx = fprime(x)
            dx = fx / fpx
            x -= dx
            if abs(dx) < tol:
                return x, i + 1
        except ZeroDivisionError:
            raise ValueError("Derivative is zero at current iterate")
    raise ValueError("Maximum iterations reached without convergence")

This code assumes the user provides the function f and its derivative f'; it returns the approximate root and iteration count upon convergence.^[82] MATLAB/Octave (Univariate)
A MATLAB or Octave equivalent uses a while loop to iterate until the step norm falls below tolerance, with a check for near-zero derivatives in one dimension.

matlab
function [root, iter] = newton(f, df, x0, tol, maxiter)
    % Newton's method for univariate root finding
    if nargin < 4, tol = 1e-6; end
    if nargin < 5, maxiter = 100; end
    x = x0;
    iter = 0;
    while iter < maxiter
        fx = f(x);
        dfx = df(x);
        if abs(dfx) < 1e-15
            error('Derivative is zero at current iterate');
        end
        dx = fx / dfx;
        x = x - dx;
        iter = iter + 1;
        if norm(dx) < tol
            root = x;
            return;
        end
    end
    error('Maximum iterations reached without convergence');
end
function [root, iter] = newton(f, df, x0, tol, maxiter)
    % Newton's method for univariate root finding
    if nargin < 4, tol = 1e-6; end
    if nargin < 5, maxiter = 100; end
    x = x0;
    iter = 0;
    while iter < maxiter
        fx = f(x);
        dfx = df(x);
        if abs(dfx) < 1e-15
            error('Derivative is zero at current iterate');
        end
        dx = fx / dfx;
        x = x - dx;
        iter = iter + 1;
        if norm(dx) < tol
            root = x;
            return;
        end
    end
    error('Maximum iterations reached without convergence');
end

For one-dimensional problems, the norm of the scalar step dx is equivalent to its absolute value; this script can be called directly from the command window.^[7] C++ (Square Root Example)
To compute the square root of a positive number a (e.g., \sqrt{2}), apply Newton's method to f(x) = x^2 - a = 0 with derivative f'(x) = 2x, using an inline loop for iteration.

cpp
#include <iostream>
#include <cmath>

double newton_sqrt(double a, double tol = 1e-6, int maxiter = 100) {
    // Newton's method for sqrt(a)
    if (a < 0.0) return std::nan("");
    double x = a > 0.0 ? a : 1.0;  // Initial guess
    for (int i = 0; i < maxiter; ++i) {
        double fx = x * x - a;
        double fpx = 2.0 * x;
        if (std::abs(fpx) < 1e-15) {
            return std::nan("");  // Singular case
        }
        double dx = fx / fpx;
        x -= dx;
        if (std::abs(dx) < tol) {
            return x;
        }
    }
    return std::nan("");  // No convergence
}

int main() {
    double root = newton_sqrt(2.0);
    std::cout << "Approximate sqrt(2): " << root << std::endl;
    return 0;
}
#include <iostream>
#include <cmath>

double newton_sqrt(double a, double tol = 1e-6, int maxiter = 100) {
    // Newton's method for sqrt(a)
    if (a < 0.0) return std::nan("");
    double x = a > 0.0 ? a : 1.0;  // Initial guess
    for (int i = 0; i < maxiter; ++i) {
        double fx = x * x - a;
        double fpx = 2.0 * x;
        if (std::abs(fpx) < 1e-15) {
            return std::nan("");  // Singular case
        }
        double dx = fx / fpx;
        x -= dx;
        if (std::abs(dx) < tol) {
            return x;
        }
    }
    return std::nan("");  // No convergence
}

int main() {
    double root = newton_sqrt(2.0);
    std::cout << "Approximate sqrt(2): " << root << std::endl;
    return 0;
}

This specialized version starts with x_0 = a and outputs NaN for invalid or non-convergent cases, achieving high precision in few iterations for a = 2.^[83] Python (Multivariate with NumPy)
For solving systems of nonlinear equations \mathbf{F}(\mathbf{x}) = \mathbf{0}, use NumPy to compute the Jacobian and solve the linear system at each step via matrix inversion.

python
import numpy as np

def newton_multivariate(F, J, x0, tol=1e-6, maxiter=100):
    """
    Multivariate Newton's method.
    
    Parameters:
    F: vector-valued function R^n -> R^n
    J: Jacobian matrix function
    x0: initial guess (array-like)
    tol: tolerance
    maxiter: maximum iterations
    
    Returns:
    Approximate root and iterations
    """
    x = np.array(x0, dtype=float)
    for i in range(maxiter):
        Fx = F(x)
        Jx = J(x)
        try:
            dx = np.linalg.solve(Jx, Fx)
            x -= dx
            if np.linalg.norm(dx) < tol:
                return x, i + 1
        except np.linalg.LinAlgError:
            raise ValueError("Singular Jacobian at current iterate")
    raise ValueError("Maximum iterations reached without convergence")
import numpy as np

def newton_multivariate(F, J, x0, tol=1e-6, maxiter=100):
    """
    Multivariate Newton's method.
    
    Parameters:
    F: vector-valued function R^n -> R^n
    J: Jacobian matrix function
    x0: initial guess (array-like)
    tol: tolerance
    maxiter: maximum iterations
    
    Returns:
    Approximate root and iterations
    """
    x = np.array(x0, dtype=float)
    for i in range(maxiter):
        Fx = F(x)
        Jx = J(x)
        try:
            dx = np.linalg.solve(Jx, Fx)
            x -= dx
            if np.linalg.norm(dx) < tol:
                return x, i + 1
        except np.linalg.LinAlgError:
            raise ValueError("Singular Jacobian at current iterate")
    raise ValueError("Maximum iterations reached without convergence")

The function \mathbf{F} returns a vector, and J a matrix; convergence is checked via the Euclidean norm of the step.^[84] Testing the Implementations
These codes can be tested on the classic problem of finding \sqrt{2}, the positive root of f(x) = x^2 - 2 = 0, using an initial guess x_0 = 1.0. The univariate Python version converges to approximately 1.41421356237 in 4 iterations, verifiable as follows:

python
def f(x): return x**2 - 2
def fprime(x): return 2 * x
root, iters = newton(f, fprime, 1.0)
assert abs(root**2 - 2) < 1e-5, "Test failed: inaccuracy exceeds tolerance"
print(f"Root: {root}, Iterations: {iters}")
def f(x): return x**2 - 2
def fprime(x): return 2 * x
root, iters = newton(f, fprime, 1.0)
assert abs(root**2 - 2) < 1e-5, "Test failed: inaccuracy exceeds tolerance"
print(f"Root: {root}, Iterations: {iters}")

Similar assertions apply to the other languages, confirming quadratic convergence near the root.^[82]

References

[1]
Newton's Method -- from Wolfram MathWorld
Newton's method, also called the Newton-Raphson method, is a root-finding algorithm that uses the first few terms of the Taylor series of a function f(x) in ...
[2]
[PDF] The Newton-Raphson Method - UBC Math
The Newton-Raphson method, or Newton Method, is a powerful technique for solving equations numerically. Like so much of the differential calculus,.
[3]
https://cedar.wwu.edu/math_facpubs/93
[4]
Newton's Method - Department of Mathematics at UTSA
### Theorem Statement
[5]
Newton's Method | Introduction To MATLAB Programming
Newton's method is an iterative method. This means that there is a basic mechanism for taking an approximation to the root, and finding a better one.
[6]
[PDF] Newton's Method
Newton's method is another technique for finding the zeros of an equation of the form ... side of this equation is just the first order Taylor expansion of f ...
[7]
2.13 Newton's Method - Dartmouth Mathematics
Tangent lines are used to find the root of an equation f(x) = 0, where f is a differentiable function. The procedure, called Newton's Method, defines a sequence ...
[8]
Solving Nonlinear Equations - CS 357 - Course Websites
As you can see, Newton's Method is already converging significantly faster than the Bisection Method. ... bisection, Newton's method, and secant method?
[9]
https://math.dartmouth.edu/~klbooksite/2.13/213.html
[10]
[PDF] A Short History of Newton's Method - FTP Server of the GWDG
• In 1664, Isaac Newton (1643–1727) got to know Vieta's method. Up to. 1669 he had improved it by linearizing the successively arising polyno- mials. As an ...
[11]
[PDF] Numerical Analysis, 9th ed.
... Numerical Analysis. NINTH EDITION. Richard L. Burden. Youngstown State University. J. Douglas Faires ... Method 48. 2.2 Fixed-Point Iteration 56. 2.3 Newton's ...
[12]
[PDF] Newton-type Methods - Stanford University
Jul 5, 2010 · Let f(x) be a univariate continuously differentiable real-valued function on an open convex set D ⊆ IR1. Assume that f(x∗)=0 for some x∗⊂ D and ...
[13]
[PDF] Multidimensional-Newton | MIT
Sep 7, 2017 · Clearly, J(v) is the analogue of f0(x) in the one-dimensional Newton method. We will look at the general formula for J(v) below, but to ...
[14]
[PDF] Multidimensional Newton Methods
– Basic Algorithm. – Description of the Jacobian. – Equation formulation. • Multidimensional Convergence Properties. – Prove local convergence. – Improving ...
[15]
Newton-Raphson Method (Multivariate)
The Newton-Raphson method discussed above for solving a single-variable equation $f(x)=0$ can be generalized to the case of multivariate equation systems.
[16]
A modified Newton method with cubic convergence: the multivariate ...
This may be interpreted geometrically in the following way: The tangent hyperplane going through the point P=(x,F(x)) that is used in the Newton method, is ...Missing: multidimensional | Show results with:multidimensional
[17]
[PDF] Quadratic Convergence of Newton's Method - NYU Computer Science
the proof of quadratic convergence (assuming convergence takes place) is fairly simple and may be found in many books. Here it is. Let f be a real-valued.
[18]
[PDF] Scientific Computing: Solving Nonlinear Equations - NYU Courant ...
One Dimensional Root Finding. Basin of attraction for Newton's method. For convergence we want ek+1. < ek so we want ek f //(α). 2f /(α). < 1 ⇒ ek. <. 2f /(α).
[19]
[PDF] NEWTON'S METHOD AND FRACTALS 1. Solving the equation f(x ...
Newton's method involves choosing an initial guess x0, and then, through an iterative process, finding a sequence of numbers x0, x1, x2, x3, ··· 1 that ...<|control11|><|separator|>
[20]
[PDF] Error Behaviour of Newton's Method - UBC Math
Each time you increase n by one, the number of zeroes after the decimal place roughly doubles. October 10, 2012. Error Behaviour of Newton's Method. 2.
[21]
Historical developments in convergence analysis for Newton's and ...
More than three hundred years have passed since a procedure for solving an algebraic equation was proposed by Newton in 1669 and later by Raphson in 1690 ...
[22]
[PDF] Nonlinear Systems
Oct 23, 2022 · error can dislodge the iterate off the ... The quadratic convergence of Newton's Method implies that, roughly, each new iterate doubles.
[23]
Using the Newton-Raphson Method with Automatic Differentiation to ...
Jul 19, 2022 · In this paper, we apply the Newton-Raphson method together with Automatic Differention to numerically approximate the implied volatility of an arbitrary stock ...
[24]
[PDF] Adaptive Finite-Difference Interval Estimation for Noisy Derivative ...
Oct 12, 2021 · With no noise, excluding round-off error, one would ideally choose h as small as possible, the common practical choice being h = max{1, |x|} √ ...
[25]
[PDF] Comparison of Derivative Estimation Methods in Solving Optimal ...
May 29, 2019 · Next, the hyper-dual derivative approximation does not suffer from either truncation error or roundoff error. As a result, an arbitrary step ...
[26]
[PDF] arXiv:2004.04435v1 [cs.MS] 9 Apr 2020
Apr 9, 2020 · We show that AD can drastically improve accuracy and performance of derivative evaluation, compared to ND.
[27]
[PDF] Automatic differentiation for solid mechanics - arXiv
Jan 21, 2020 · AD differs from finite differences, because it does not approximate the continuous derivatives with discrete differences, thus it does not ...
[28]
[PDF] A Review of Automatic Differentiation and its Efficient Implementation
Jan 14, 2019 · Automatic differentiation is a powerful tool to automate the calculation of derivatives and is preferable to more traditional methods, ...
[29]
[PDF] Newton Like Iterative Method without Derivative for Solving ... - arXiv
In order to avoid the derivative function in the iterative scheme, the difference quotient is used instead of the derivative.
[30]
[PDF] Inexact Newton-CG Algorithms With Complexity Guarantees - arXiv
Apr 11, 2022 · Abstract. We consider variants of a recently-developed Newton-CG algorithm for nonconvex problems [Royer et al.,. 2020] in which inexact ...
[31]
[PDF] NEWTON'S METHOD
has a division by zero. 2 : Near-Zero Slope. Even if x1 results in just near a zero slope, it can cause x2 to be a worse approximation than x1. For example ...
[32]
[PDF] MULTIPLE ROOTS We study two classes of functions for which there ...
Methods such as Newton's method and the se- cant method converge more slowly than for the case of a simple root. 2. There is a large interval of uncertainty in ...
[33]
[PDF] Slow convergence around inflection points--Newton-Raphson Method.
The following simulation illustrates the Newton-Raphson method converges slowly due to an inflection point occuring in the vicinity of the root. > restart;
[34]
Fractal Basins of Attraction Associated with a Damped Newton's ...
Newton's Method is based on a linear approximation of the function whose roots are to be determined taken at the current point, and the resulting algorithm is ...<|separator|>
[35]
[PDF] Square Roots via Newton's Method - MIT Mathematics
Feb 4, 2015 · Recall that Newton's method finds an approximate root of f(x)=0 from a guess xn by approximating f(x) as its tangent line f(xn) + f0(xn)(x − xn ...
[36]
[PDF] Babylonian Method of Computing the Square Root
Historically the first natural explanation of the efficiency of the Babylonian method was proposed by Isaac Newton. Newton showed that this method is a ...
[37]
4.9 Newton's Method - Calculus Volume 1 | OpenStax
Finding a Square Root Use Newton's method to approximate √2 (Figure 4.79). Let f(x)=x2−2, let x0=2, and calculate x1,x2,x3,x4,x5.
[38]
1. Using the Newton-Raphson Method find the root of equation cos(x)
Mar 14, 2023 · Using the Newton-Raphson Method find the root of equation cos(x)−x3 using as starting point xO=0.5 with εS=0.01%. Answer: x=0.8654 (20 points)
[39]
Calculus I - Newton's Method - Pauls Online Math Notes
Oct 26, 2023 · So, we can find the new approximation provided the derivative isn't zero at the original approximation. Now we repeat the whole process to find ...
[40]
Nonlinear Systems of Equations: Newton-Raphson Method
The Newton-Raphson method is the method of choice for solving nonlinear systems of equations. Many engineering software packages (especially finite element ...
[41]
[PDF] Nonlinear Least Squares
The method of Gauss-Newton updates the current approximation via the solution of a least squares system. Levenberg-Marquardt adds steepest descent to Gauss- ...
[42]
Computational complexity of Newton's method
Nov 17, 2018 · Newton's method requires the computation of the Jacobian matrix and its "inversion" at every step k. This could be too expensive (O(N3)) ...
[43]
Newton's Method Estimates from Data at One Point | SpringerLink
About this paper. Cite this paper. Smale, S. (1986). Newton's Method Estimates from Data at One Point. In: Ewing, R.E., Gross, K.I., Martin, C.F. (eds) The ...
[44]
Newton--Kantorovich method and its global convergence
Newton method in its basic form possesses just local convergence, its global behavior and modifications to achieve global convergence will be discussed in Secs.
[45]
The inverse function theorem of Nash and Moser - Project Euclid
The inverse function theorem of Nash and Moser. Richard S. Hamilton. DOWNLOAD PDF + SAVE TO MY LIBRARY. Bull. Amer. Math. Soc. (NS) 7(1): 65-222 (July 1982).Missing: original | Show results with:original
[46]
Quasi-Newton Methods, Motivation and Theory | SIAM Review
This paper is an attempt to motivate and justify quasi-Newton methods as useful modifications of Newton's method for general and gradient nonlinear systems ...
[47]
A Class of Methods for Solving Nonlinear Simultaneous Equations
Introduction. The solution of a set of nonlinear simultaneous equations is often the final step in the solution of practical problems arising in physics and ...
[48]
A new approach to variable metric algorithms - Oxford Academic
A new approach to variable metric algorithms. R. Fletcher. R. Fletcher. Atomic ... The Computer Journal, Volume 13, Issue 3, 1970, Pages 317–322, https ...
[49]
Dynamics of Newton-like root finding methods | Numerical Algorithms
Dec 15, 2022 · This family includes the Chebyshev's method when the parameter α is equal to 0, Halley's scheme for α = 1/2 and Newton's method when α tends to ...
[50]
Explicit convergence regions of Newton's method and Chebyshev's ...
Dec 15, 2019 · In this paper, we have obtained new explicit convergence regions for Newton's method and Chebyshev's method for finding the principal pth root ...
[51]
None
### Summary of Newton's Method Over p-Adics, Convergence Properties, and Ultrametric Aspects
[52]
https://www.sciencedirect.com/science/article/pii/S002437951930360X
[53]
[PDF] q-Iterative Methods - IOSR Journal
q-analogue of iterative methods for solving algebraic and transcendental equations gives the same result as classical methods do but it converges more rapidly ...
[54]
[PDF] 9.5 Roots of Polynomials
May 9, 2017 · Like one-dimensional Newton-Raphson, the method works well in the vicinity of a root ... An alternative is Maehly's procedure. Maehly ...
[55]
Global Convergence of a Modified Newton Iteration for Algebraic ...
The modification of Newton's method devised by S. Hirano has been successfully applied in the solution of otherwise formidable algebraic equations.
[56]
An interval Newton method - ScienceDirect.com
We introduce an interval Newton method for bounding solutions of systems of nonlinear equations. It entails three subalgorithms.
[57]
[PDF] Newton's Method for Unconstrained Optimization
Here ∇f(x) is the gradient of f(x) and H(x) is the Hessian of f(x). Notice that h(x) is a quadratic function, which is minimized by solving. ∇h(x) = 0 ...
[58]
[PDF] Newton's Method
x(k) = x(k−1) − tk · ∇f(x(k−1)), k = 1,2,3,... In comparison, Newton's method repeats x(k) = x(k−1) − ∇2f(x(k−1)). −1. ∇f(x(k−1)), k = 1,2,3,... Here ∇2f(x(k−1) ...
[59]
[PDF] Lec3p1, ORF363/COS323
This condition is necessary but not sufficient for local optimality ... (i.e., the Hessian at is positive definite), then is a strict local minimum ...
[60]
[PDF] Optimality Conditions
Apr 2, 2011 · The condition that H is positive definite is a sufficient condition for a local minimizer, but it is not a necessary condition. It can be ...
[61]
[PDF] CS257 Linear and Convex Optimization - Lecture 11
Nov 16, 2020 · To ensure f(xk+1) < f(xk), damped Newton's method does backtracking line search along the Newton direction. Recall pure Newton's method ...
[62]
[PDF] Logistic Regression and Newton's Method
Nov 18, 2009 · It makes stronger, more detailed predictions, and can be fit in a different way; but those strong predictions could be wrong. Using logistic ...
[63]
Newton's Method with a Model Trust Region Modification - SIAM.org
A modified Newton method for unconstrained minimization is presented and analyzed. The modification is based upon the model trust region approach.
[64]
[PDF] On Newton-Raphson iteration for multiplicative inverses modulo ...
May 15, 2018 · We study algorithms for the fast computation of modular inverses. Newton-Raphson iteration over p-adic numbers gives a recurrence relation ...
[65]
[PDF] Newton's Method without Division - Jeff Blanchard
Newtons's Method for root-finding is modified to avoid the division step while maintaining quadratic convergence. 1. INTRODUCTION. Newton's Method is the most ...
[66]
Fast Algorithms for Manipulating Formal Power Series
Hence the reversion of a power series of the form (3.3) can be done in REV(n) + O(M(n)) operations.
[67]
Fast algorithms for manipulating formal power series
The classical algorithms require order n 3 operations to compute the first n terms in the reversion of a power series or the composition of two series.
[68]
Krawczyk-Like Algorithms for the Solution of Systems of Nonlinear ...
The present paper contains a comparison of the iterates produced by the original method of Krawczyk and our methods. Download PDF · HomeSIAM Journal on ...
[69]
[PDF] Interval Krawczyk and Newton method
Feb 20, 2007 · The modified Newton operator is given by. Nm(x) = x − CF(x). (6) ... The Krawczyk operator is given by. K(x0, [x],F) := x0 − CF(x0)+(Id ...
[70]
[PDF] Optimized solution of kepler's equation
Kepler's equation, E = M + e sin E, is solved for E by use of a second-order Newton-Raphson iterative algorithm. The derivation of this algorithm is discussed ...
[71]
Kepler equation and accelerated Newton method - ScienceDirect.com
The more conventional form of this equation is as follows: f(x)=x−e sen x−M=0, with 0⩽e<1 and 0⩽M<π.
[72]
1.3. Newton's Method for Solving Equations
The backward error in x ~ as an approximation to a root of a function f is f ( x ~ ) . The absolute backward error is its absolute value, ...
[73]
Rootfinding, Newton's Method, and Dynamical Systems
So instead we use backward error: if the residual is r k = f ( x k ) then we have found the exact solution of, not f ( x ) = 0 , but f ( x ) − r k = 0 . This ...
[74]
The Idea of Newton's Method
Newton's method is a technique for solving equations of the form f(x)=0 by successive approximation. The idea is to pick an initial guess x0 such that f ...
[75]
[PDF] Newton's Method - MATH 375 Numerical Analysis
▷ The Secant Method is generally slower to converge to the root than Newton's Method. ▷ The Secant Method is faster to converge than the Bisection. Method ...
[76]
Combining Newton's method with Bisection
In this case, we fall back to the bisection method. and if m≥b or m≤a, we scrap this value of m and use the bisection method instead: m=a+b2.
[77]
[PDF] Newton's Method 14.1 Review of previous lecture 14.2 Introduction
Part (a) of the theorem says that Newton's method has super-linear local convergence. Note that this is stronger than linear convergence: x(k) → x? linearly ⇐⇒ ...
[78]
3. Newton's Method for Solving Equations
To explore some examples of this, here is a Python function implementing Newton's method.
[79]
Newton-Raphson Method - Python Numerical Methods
Newton-Raphson Method¶. Let f(x) be a smooth and continuous function and xr be an unknown root of f(x). Now assume that x0 is a guess for xr.
[80]
[PDF] CSCI 135 Software Design and Analysis, C++ Homework 1 ... - CUNY
int square(int n) { return n*n;. } Page 5. Problem 3: Square root. We have seen in class a function to compute the square root of a number x based on Newton's ...
[81]
Algorithms for Optimization and Root Finding for Multivariate Problems
Now, if x0 is 'close enough' to the minimizer x∗, the step size αk=1 gives quadratic convergence. The advantage of multiplying the gradient by the inverse of ...