Newton's method
Newton's method, also known as the Newton–Raphson method, is a root-finding algorithm that produces successively better approximations to the roots (or zeroes) of a real-valued differentiable function by leveraging the first-order truncation of the function's Taylor series expansion around an initial guess.[1] The method iteratively refines an estimate x_n using the formula x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}, where f(x) is the function and f'(x) its derivative, effectively applying linear approximation via the function's tangent line at each step.[1][2]

Developed by Isaac Newton in his unpublished manuscript De analysi around 1669, where it was applied to polynomial equations via binomial expansions, the method was later refined and described by Joseph Raphson in 1690 as an iterative process for solving algebraic equations.[3] Newton further utilized a version of the technique in his Principia Mathematica (1687) to solve Kepler's equation in orbital mechanics, while Raphson extended it to higher-degree polynomials in his Analysis aequationum universalis.[3] Thomas Simpson generalized the approach in 1740 by incorporating fluxional calculus (early differential calculus) to handle general nonlinear equations and systems of equations, marking its transition to a broader numerical tool.[3]

Under suitable conditions, such as starting sufficiently close to a simple root where f'(x^*) \neq 0, Newton's method exhibits quadratic convergence, meaning the number of correct digits roughly doubles with each iteration, making it highly efficient for numerical computations.[1][2] However, it requires the derivative to be computable and nonzero at iteration points, and poor initial guesses can lead to divergence, oscillation, or convergence to unintended roots, particularly near local extrema or inflection points.[2] The method's versatility extends beyond root-finding to optimization (via Newton's method for minimization),[4] machine learning (e.g., in training neural networks),[5] and graphics (e.g., generating fractal patterns like Newton fractals for polynomials of degree two or higher).[1] To address such limitations, modern variants incorporate derivative-free approximations or safeguards against division by near-zero derivatives.[2]
Overview
Description
Newton's method is an iterative root-finding algorithm designed to solve equations of the form f(x) = 0, where f is a real-valued function, by generating a sequence of successively better approximations to the root.[6] Starting from an initial guess x_0, the method refines the estimate at each step until it converges to a solution sufficiently close to the actual root.[7] The core of the algorithm is the iterative update formula x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}. This step arises from the first-order Taylor expansion of f around the current approximation x_n, which provides a linear approximation f(x) \approx f(x_n) + f'(x_n)(x - x_n); setting this equal to zero and solving for x yields the next iterate x_{n+1}.[8] The formula essentially corrects the current guess by subtracting the ratio of the function value to its derivative, leveraging local linearity to approach the root.[7]

For the method to apply, f must be continuously differentiable in a neighborhood of the root, and the derivative f'(x) must be nonzero near that point to ensure the tangent is well-defined and the iteration is meaningful.[6] Geometrically, each iteration computes the point where the tangent line to the graph of y = f(x) at x_n intersects the x-axis, using this linear tangent as a proxy for the curve itself near x_n.[9] Compared to simpler techniques like the bisection method, which relies only on sign changes, or the secant method, which approximates the derivative, Newton's method typically achieves faster convergence for smooth functions due to its use of exact derivative information.[10]
History
Isaac Newton first devised the method in 1669 while composing his manuscript De analysi per aequationes numero terminorum infinitas, applying it iteratively to approximate roots of polynomial equations through linearization of successively reduced polynomials.[11] This early version demonstrated quadratic convergence, as illustrated in Newton's example for solving x^3 - 2x - 5 = 0.[12] The manuscript remained unpublished for decades, though Newton revisited and refined the approach in his 1671 work De methodis fluxionum et serierum infinitarum, which incorporated fluxions (derivatives) for broader applications.[11] Publication of De analysi occurred in 1711 through William Jones's Analysis per quantitatum series, fluxiones, ac differentias; the English translation of the 1671 manuscript, Method of Fluxions and Infinite Series, appeared in 1736.[11] Newton applied the method practically in his Philosophiæ Naturalis Principia Mathematica (1687) to solve Kepler's equation x - e \sin x = M for astronomical computations.[12]

Independently, Joseph Raphson described a similar iterative procedure in 1690 in Analysis aequationum universalis, formulating it directly for polynomials without polynomial reduction and emphasizing decimal approximations.[11] The method is sometimes associated with Thomas Simpson, who in 1740 extended it to general nonlinear equations using fluxional calculus in Essays on Several Curious and Useful Subjects in Speculative Mathematics.[12]

In the 19th century, Augustin-Louis Cauchy provided a rigorous modern presentation in 1847, generalizing it to systems and initiating formal error analysis.[11] By the 1930s, amid growing interest in computational techniques, the method's role in numerical analysis solidified, with Leonid Kantorovich proving convergence properties in 1939 for functional spaces, paving the way for its widespread adoption in early digital computing.[12]
Mathematical Formulation
One-Dimensional Case
In the one-dimensional case, Newton's method is an iterative technique for approximating a root \alpha of a scalar function f: \mathbb{R} \to \mathbb{R} satisfying f(\alpha) = 0, assuming such a root exists.[13] The method derives from the first-order Taylor series expansion of f around the current approximation x_n: f(x) \approx f(x_n) + f'(x_n)(x - x_n). Setting the approximation equal to zero and solving for x yields the update formula for the next iterate: x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}. This linear approximation replaces the nonlinear equation with a solvable linear one at each step.[13]

To apply the method, select an initial guess x_0 sufficiently close to \alpha, then generate the sequence \{x_n\} via the update formula until a convergence criterion is met, such as |x_{n+1} - x_n| < \epsilon for a small tolerance \epsilon > 0. The process terminates when the change between iterates is negligible or a maximum iteration count is reached to prevent non-convergence.[13] The method requires that f is twice continuously differentiable in a neighborhood of \alpha, that f'(\alpha) \neq 0, and that \alpha is a simple root (i.e., the multiplicity is one). These conditions ensure the derivative is invertible at the root and support local convergence behavior.[14]

Define the error at step n as e_n = x_n - \alpha. Under the stated assumptions, the error satisfies the asymptotic relation e_{n+1} \approx \frac{f''(\alpha)}{2 f'(\alpha)} e_n^2 as n \to \infty and e_n \to 0, which sets up the quadratic convergence rate analyzed further in the quadratic convergence section.[13]
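A minimal Python sketch of this one-dimensional iteration, using the step-size stopping criterion above; the helper name newton_1d and the tolerances are illustrative, and the test equation x^3 - 2x - 5 = 0 is Newton's own example mentioned in the history section:
```python
def newton_1d(f, fprime, x0, eps=1e-12, max_iter=50):
    """Iterate x_{n+1} = x_n - f(x_n)/f'(x_n) until |x_{n+1} - x_n| < eps."""
    x = x0
    for n in range(max_iter):
        fx, fpx = f(x), fprime(x)
        if fpx == 0.0:
            raise ZeroDivisionError("f'(x_n) = 0; the iteration is undefined here")
        x_next = x - fx / fpx
        if abs(x_next - x) < eps:          # stopping criterion from the text
            return x_next, n + 1
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")

# Newton's historical example f(x) = x^3 - 2x - 5, simple root near 2.0946
root, iters = newton_1d(lambda x: x**3 - 2*x - 5,
                        lambda x: 3*x**2 - 2,
                        x0=2.0)
print(root, iters)    # about 2.0945514815423265 in a handful of iterations
```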
Multidimensional Case
Newton's method extends naturally to finding roots of systems of nonlinear equations in multiple variables, where the function \mathbf{F}: \mathbb{R}^n \to \mathbb{R}^n satisfies \mathbf{F}(\mathbf{x}) = \mathbf{0}.[15] The method relies on a linear approximation derived from the multivariable Taylor expansion around the current iterate \mathbf{x}_k: \mathbf{F}(\mathbf{x}_k + \boldsymbol{\delta}) \approx \mathbf{F}(\mathbf{x}_k) + \mathbf{J}(\mathbf{x}_k) \boldsymbol{\delta} = \mathbf{0}, where \mathbf{J}(\mathbf{x}_k) is the n \times n Jacobian matrix with entries J_{ij} = \frac{\partial F_i}{\partial x_j} evaluated at \mathbf{x}_k.[15] Solving for the step \boldsymbol{\delta} yields \boldsymbol{\delta} = -\mathbf{J}(\mathbf{x}_k)^{-1} \mathbf{F}(\mathbf{x}_k), assuming \mathbf{J}(\mathbf{x}_k) is invertible, which requires it to be nonsingular near the root.[16] The iteration then updates as \mathbf{x}_{k+1} = \mathbf{x}_k - \mathbf{J}(\mathbf{x}_k)^{-1} \mathbf{F}(\mathbf{x}_k).[15] The Jacobian matrix captures the local sensitivity of each equation component to changes in the variables and is typically computed analytically if possible or approximated via finite differences otherwise.[17] Invertibility of the Jacobian near the root ensures a unique local solution to the linearized system, facilitating convergence to the nonlinear root.[16]

For overdetermined systems where \mathbf{F}: \mathbb{R}^n \to \mathbb{R}^m with m > n, the method adapts via a least-squares approach, minimizing \|\mathbf{F}(\mathbf{x}_k + \boldsymbol{\delta})\|^2 \approx \|\mathbf{F}(\mathbf{x}_k) + \mathbf{J}(\mathbf{x}_k) \boldsymbol{\delta}\|^2.[17] The solution is \boldsymbol{\delta} = -\left( \mathbf{J}(\mathbf{x}_k)^T \mathbf{J}(\mathbf{x}_k) \right)^{-1} \mathbf{J}(\mathbf{x}_k)^T \mathbf{F}(\mathbf{x}_k), or equivalently using the Moore-Penrose pseudoinverse \mathbf{J}(\mathbf{x}_k)^- when the normal matrix \mathbf{J}^T \mathbf{J} is ill-conditioned.[17]

Geometrically, the update in the n-dimensional case can be interpreted by considering the graph of \mathbf{F} in \mathbb{R}^{2n}, where the current point (\mathbf{x}_k, \mathbf{F}(\mathbf{x}_k)) lies on a hypersurface; the linear approximation defines the tangent hyperplane at this point, and the Newton step finds the intersection of this tangent hyperplane with the hyperplane \{\mathbf{y} = \mathbf{0}\} to obtain the next iterate \mathbf{x}_{k+1}.[18] This intersection refines the approximation by aligning the local linear model with the zero level set of \mathbf{F}.[18]
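The least-squares adaptation for overdetermined systems can be sketched as a single Gauss-Newton-style step; the helper name and the small test system below are illustrative assumptions, not taken from the cited sources:
```python
import numpy as np

def overdetermined_newton_step(F, J, x):
    """One step for F: R^n -> R^m with m > n: minimize ||F(x) + J(x) delta||_2.

    Solving the least-squares problem directly (np.linalg.lstsq) is equivalent to
    delta = -(J^T J)^{-1} J^T F(x) when J has full column rank, but avoids
    forming the normal matrix J^T J explicitly.
    """
    Fx, Jx = F(x), J(x)
    delta, *_ = np.linalg.lstsq(Jx, -Fx, rcond=None)
    return x + delta

# Illustrative 3-equation, 2-unknown system (linear, so one step suffices):
# a + b = 3,  a - b = 1,  2a + b = 5  ->  least-squares solution (2, 1).
F = lambda x: np.array([x[0] + x[1] - 3.0, x[0] - x[1] - 1.0, 2.0*x[0] + x[1] - 5.0])
J = lambda x: np.array([[1.0, 1.0], [1.0, -1.0], [2.0, 1.0]])
x = overdetermined_newton_step(F, J, np.zeros(2))
print(x)    # [2. 1.]
```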
Convergence Analysis
Quadratic Convergence
Under standard assumptions, Newton's method exhibits quadratic convergence to a simple root. Specifically, if the function f is twice continuously differentiable, f'(\alpha) \neq 0 where \alpha is a simple root (i.e., f(\alpha) = 0), and the initial guess x_0 is sufficiently close to \alpha, then the error e_n = x_n - \alpha satisfies \lim_{n \to \infty} \frac{|e_{n+1}|}{|e_n|^2} = \left| \frac{f''(\alpha)}{2 f'(\alpha)} \right| < \infty. \tag{1} This limit establishes the quadratic rate, meaning the number of correct digits roughly doubles with each iteration once close to the root.[6][19]

The proof relies on Taylor expansions of f and f' around the root \alpha. Consider the iteration x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}, so the error is e_{n+1} = e_n - \frac{f(\alpha + e_n)}{f'(\alpha + e_n)}. Expanding f(\alpha + e_n) gives f(\alpha + e_n) = f(\alpha) + f'(\alpha) e_n + \frac{1}{2} f''(\xi_n) e_n^2 = f'(\alpha) e_n + \frac{1}{2} f''(\xi_n) e_n^2, for some \xi_n between \alpha and x_n, since f(\alpha) = 0. Similarly, expand f'(\alpha + e_n) = f'(\alpha) + f''(\eta_n) e_n for some \eta_n between \alpha and x_n. Substituting and simplifying, while neglecting higher-order terms as e_n \to 0, e_{n+1} \approx \frac{f''(\alpha)}{2 f'(\alpha)} e_n^2, consistent with the error relation stated in the one-dimensional formulation. Taking absolute values yields |e_{n+1}| \approx \frac{|f''(\alpha)|}{2 |f'(\alpha)|} |e_n|^2, confirming the limit in (1).[6][19]

This quadratic convergence is local, occurring within a basin of attraction around \alpha: there exists a neighborhood such that if x_0 lies within it, the sequence converges to \alpha at the quadratic rate. The size of this basin depends on the function's properties, such as the ratio |f''(\alpha)/f'(\alpha)|, ensuring |e_{n+1}| < |e_n| when |e_n| is small enough (e.g., |e_n| < 2 |f'(\alpha)/f''(\alpha)|).[20][21] The finite limit in (1) defines the asymptotic constant C = \left| \frac{f''(\alpha)}{2 f'(\alpha)} \right|, which quantifies the convergence speed near the root; smaller C implies faster reduction in error.[6][19]
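The limit in (1) is easy to observe numerically. A short illustrative sketch for f(x) = x^2 - 2, whose simple root \alpha = \sqrt{2} gives the asymptotic constant C = 1/(2\sqrt{2}) \approx 0.3536:
```python
import math

f = lambda x: x * x - 2.0
fp = lambda x: 2.0 * x
alpha = math.sqrt(2.0)
C = abs(2.0 / (2.0 * fp(alpha)))      # f''(alpha)/(2 f'(alpha)) = 1/(2*sqrt(2)) ~ 0.3536

x = 1.0
e_prev = x - alpha
for n in range(4):
    x = x - f(x) / fp(x)              # one Newton step
    e = x - alpha
    print(n + 1, abs(e) / e_prev**2)  # ratios tend toward C: 0.50, 0.33, 0.353, 0.354
    e_prev = e
```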
Error Bounds and Conditions
In the one-dimensional case, Newton's method admits an a priori error bound derived from Taylor's theorem with remainder, stating that if f is twice continuously differentiable on an interval containing the root \alpha and the iterates, with |f'(x)| \geq L > 0 and |f''(x)| \leq M for all x in that interval, then the error satisfies |x_{n+1} - \alpha| \leq \frac{M}{2L} |x_n - \alpha|^2, provided the initial error |x_0 - \alpha| is small enough to keep iterates within the interval (specifically, \frac{M}{2L} |x_0 - \alpha| < 1). This bound quantifies the quadratic reduction in error, with the constant K = \frac{M}{2L} depending on the supremum norm of f'' relative to the infimum norm of f'.[22]

Sufficient conditions for global convergence from any starting point in a suitable interval, often referred to as Fourier conditions in historical analyses, require properties such as f' and f'' keeping a constant sign on an interval bracketing the root and the initial guess x_0 being chosen so that f(x_0) f''(x_0) > 0, ensuring the iteration remains monotone and approaches the root without oscillation. These conditions build on early work by Fourier, whose 1818 attempt at proving quadratic convergence in one dimension was flawed and whose more careful treatment appeared in 1831; they guarantee that if f has a simple root and satisfies such sign and boundedness constraints (e.g., |f''(x)/f'(x)| \leq \beta on a bracketing interval of length less than 2/\beta), the method converges for initial guesses within that interval. Later extensions, such as Cauchy's semilocal theorem, refine these by imposing bounds on |f''| and |f'| to ensure uniqueness and quadratic rates from sufficiently broad starting regions.[23][24]

In the multivariable case, the error analysis extends via the mean value theorem for vector functions, yielding an approximation e_{k+1} \approx \frac{1}{2} J(\alpha)^{-1} H(\alpha) [e_k \otimes e_k], where e_k = x_k - \alpha is the error vector, J(\alpha) is the Jacobian matrix of the system F(x) = 0 at the root \alpha, and H(\alpha) denotes the second derivative tensor (Hessian for scalar F, or bilinear form for vector F). This quadratic term arises from the second-order Taylor expansion of F, highlighting how the method's superlinear behavior depends on the nonsingularity of J(\alpha) and boundedness of higher derivatives near \alpha. Quantitative bounds follow similarly, with \|e_{k+1}\| \leq \tau \|e_k\|^2 for some \tau > 0 when iterates are sufficiently close.[25]

Newton's method demonstrates local convergence, where quadratic error reduction occurs provided the initial guess is sufficiently proximate to the root, as captured by the aforementioned bounds; however, global convergence (success from distant starting points) relies on additional structural assumptions on F, such as monotonicity or convexity in one dimension, or contractive mappings in higher dimensions, without which the method may diverge or converge slowly.[23]
Practical Considerations
Derivative Computation Challenges
One significant challenge in applying Newton's method arises when the derivative f'(x) of the function is difficult or impossible to compute analytically, particularly for complex or black-box functions where symbolic differentiation yields cumbersome expressions. In such cases, numerical approximations become necessary, but they introduce errors that can compromise the method's quadratic convergence properties. For instance, in financial modeling, deriving exact derivatives for implied volatility calculations can be tedious, prompting the use of numerical alternatives.[26]

A common approach is the finite difference approximation, which estimates the derivative using discrete function evaluations, such as the forward difference formula: f'(x) \approx \frac{f(x + h) - f(x)}{h}, where h > 0 is a small perturbation parameter. The truncation error in this first-order scheme is O(h), stemming from the Taylor expansion of f, while roundoff error arises from floating-point arithmetic and scales as O(\epsilon / h), with \epsilon denoting machine precision (roughly 10^{-16} for double precision). To minimize the total error, h is optimally chosen around \sqrt{\epsilon} \cdot |x|, balancing these competing effects; for central differences, which achieve O(h^2) truncation error, the optimal h scales as \epsilon^{1/3} \cdot |x|. However, poor h selection can amplify errors, especially in high dimensions or near singularities, leading to unreliable iteration steps in Newton's method.[27][28]

Automatic differentiation (AD) addresses these limitations by computing exact derivatives to machine precision without approximation, decomposing the function into elementary operations and applying chain rule propagation. In forward-mode AD, derivatives are computed alongside function evaluations in a single pass, suitable for low-dimensional problems, while reverse-mode (adjoint) AD efficiently handles high-dimensional cases by backpropagating sensitivities, often at a cost comparable to two function evaluations. This exactness eliminates truncation errors inherent in finite differences, improving accuracy by orders of magnitude (e.g., from 10^{-13} relative error in numerical methods to near-zero in AD for test functions) and enhancing performance in root-finding applications, where AD-derived Jacobians accelerate convergence in Newton's iterations for nonlinear equations in solid mechanics.[29][30][31]

When the derivative is entirely unavailable, such as in derivative-free optimization scenarios, Newton's method must be modified, motivating transitions to derivative-free alternatives like the secant method, which approximates f' via divided differences from prior iterates, or quasi-Newton methods that build low-rank updates to an initial Hessian estimate. These approaches maintain superlinear convergence under suitable conditions but sacrifice the quadratic rate of exact Newton steps.[32]

Inaccuracies in derivative computations propagate through the Newton update x_{k+1} = x_k - f(x_k)/f'(x_k), potentially destabilizing iterations by altering the step direction or length, which can cause divergence or slow convergence if the relative error exceeds a fraction of the residual (e.g., \delta \leq 0.1 \|f(x_k)\|).
In inexact Newton frameworks, convergence guarantees hold if derivative approximations satisfy adaptive tolerances, such as \|F'(x_k) s_k + F(x_k)\| \leq \eta_k \|F(x_k)\| with forcing parameters \eta_k: driving \eta_k \to 0 yields local superlinear convergence, and choosing \eta_k = O(\|F(x_k)\|) recovers quadratic rates akin to exact methods, while allowing controlled inexactness to reduce computational cost.[33]
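A sketch of a Newton iteration that falls back on the forward-difference approximation with the step choice h \approx \sqrt{\epsilon}\,|x| discussed above; the function name and tolerances are illustrative:
```python
import math

def newton_fd(f, x0, tol=1e-8, max_iter=50):
    """Newton's method with a forward-difference derivative approximation.

    The perturbation h ~ sqrt(machine epsilon) * max(|x|, 1) follows the
    error-balancing argument above; accuracy is limited compared to exact f'.
    """
    eps = 2.2e-16                               # double-precision machine epsilon
    x = x0
    for n in range(max_iter):
        h = math.sqrt(eps) * max(abs(x), 1.0)
        dfdx = (f(x + h) - f(x)) / h            # O(h) truncation, O(eps/h) roundoff
        if dfdx == 0.0:
            raise ZeroDivisionError("approximate derivative vanished")
        x_new = x - f(x) / dfdx
        if abs(x_new - x) < tol:
            return x_new, n + 1
        x = x_new
    raise RuntimeError("no convergence")

root, iters = newton_fd(lambda x: math.cos(x) - x**3, 0.8)
print(root, iters)    # about 0.8654 in a handful of iterations
```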
Convergence Failures and Slow Cases
Newton's method can fail to converge or exhibit pathological behavior in several scenarios, often due to the nature of the function or the choice of initial guess. One common failure occurs when the derivative f'(x_n) is approximately zero at an iterate, leading to large steps that cause divergence, or when the initial guess x_0 is sufficiently far from the root, resulting in iterates that move away from the solution region.[21] Additionally, if f'(x_n) = 0 exactly, the iteration step becomes undefined due to division by zero, halting the process; for instance, starting at a critical point like x_0 = 0 for f(x) = x^3 - x^2 - 1 yields f'(0) = 0, preventing further computation.[34]

Oscillations represent another form of non-convergence, where iterates alternate around the root due to overshooting caused by steep tangents or specific function shapes. A classic example is f(x) = x^3 - 2x + 2, where starting at x_0 = 0 produces a period-2 cycle: the next iterate is x_1 = 1, followed by x_2 = 0, and so on, never approaching the actual root near -1.77.[6] Such cycles trap the method in bounded but non-convergent paths, particularly near local extrema or for functions without real roots.[21]

For roots of multiplicity m > 1, where f(x) = (x - \alpha)^m g(x) with g(\alpha) \neq 0, Newton's method converges linearly with rate \lambda = 1 - 1/m, significantly slower than the quadratic rate for simple roots (m=1).[35] To restore quadratic convergence, a modified iteration can be used if m is known: x_{n+1} = x_n - m \frac{f(x_n)}{f'(x_n)}. This adjustment accounts for the higher-order zero in the derivative at the root.[35] Without modification, the method's efficiency drops, as seen in examples like f(x) = (x-1)^3, where only one significant digit may be gained after 10 iterations near the multiple root.[21]

Slow convergence also arises near inflection points or flat regions, where the second derivative changes sign or the first derivative is small, causing the linear tangent approximation to poorly capture the function's curvature. For f(x) = (x-1)^3, an inflection at the root x=1 leads to high relative errors persisting over iterations, such as 32.65% after four steps from x_0 = -1, due to the tangent line's inadequate fit.[36] In flat regions, small |f'(x_n)| amplifies errors similarly to near-zero derivatives, prolonging the approach to the root.[34]

In the complex plane, Newton's method reveals chaotic basins of attraction, where the plane partitions into fractal regions drawing initial points to different roots, with intricate, interwoven boundaries. For polynomials like z^3 - 1 = 0, these basins exhibit self-similar fractal structures, highlighting sensitivity to initial conditions and potential non-convergence to the intended root.[37]
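The oscillation and multiple-root behaviors described above are simple to reproduce; the following illustrative sketch shows the period-2 cycle for x^3 - 2x + 2 and compares the plain and multiplicity-corrected iterations on the triple root of (x - 1)^3:
```python
def newton_step(f, fp, x):
    """One standard update x - f(x)/f'(x)."""
    return x - f(x) / fp(x)

# Period-2 oscillation: f(x) = x^3 - 2x + 2 starting at x_0 = 0 cycles 0 -> 1 -> 0 -> ...
f = lambda x: x**3 - 2*x + 2
fp = lambda x: 3*x**2 - 2
x, cycle = 0.0, []
for _ in range(6):
    x = newton_step(f, fp, x)
    cycle.append(x)
print(cycle)                          # [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]

# Triple root of g(x) = (x - 1)^3: the plain iteration shrinks the error by only
# 2/3 per step, while the corrected step x - m*g/g' with m = 3 lands on the root.
g = lambda x: (x - 1.0)**3
gp = lambda x: 3.0 * (x - 1.0)**2
x_plain = x_mod = 2.0
for _ in range(10):
    x_plain = newton_step(g, gp, x_plain)
    if gp(x_mod) != 0.0:              # skip once the corrected iterate hits the root exactly
        x_mod = x_mod - 3.0 * g(x_mod) / gp(x_mod)
print(x_plain - 1.0, x_mod - 1.0)     # roughly 0.017 versus 0.0
```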
Examples
Square Root Computation
One of the simplest applications of Newton's method is computing the square root of a positive number a, denoted \sqrt{a}, by finding the root of the equation f(x) = x^2 - a = 0.[38] The derivative is f'(x) = 2x, leading to the iteration formula x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} = \frac{1}{2} \left( x_n + \frac{a}{x_n} \right).[38] This is a special case of the one-dimensional Newton's method for solving f(x) = 0. A reasonable starting guess x_0 can be a/2 or simply 1, assuming a > 0; for small a, x_0 = 1 works well.[38] To illustrate, consider computing \sqrt{2} \approx 1.414213562 with x_0 = 1:

| Iteration n | x_n | Error (relative to true \sqrt{2}) |
|---|---|---|
| 0 | 1.000000 | ≈ 0.292893 |
| 1 | 1.500000 | ≈ 0.060660 |
| 2 | 1.416667 | ≈ 0.001794 |
| 3 | 1.414216 | ≈ 0.0000009 |
| 4 | 1.414214 | ≈ 1.1 × 10^{-12} |
Transcendental Equation Solution
A representative application of Newton's method arises in solving transcendental equations that combine trigonometric and polynomial terms, such as finding the intersection points of y = \cos x and y = x^3. These equations lack closed-form algebraic solutions, making iterative numerical methods essential. Define the function f(x) = \cos x - x^3, so the roots satisfy f(x) = 0. The derivative is f'(x) = -\sin x - 3x^2, which is used in the iteration formula x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.[40] Selecting an initial guess near the intersection, take x_0 = 0.8. This value is chosen based on graphical inspection, where \cos(0.8) \approx 0.697 > 0.8^3 = 0.512, placing it close to the root around 0.865. The iterations proceed as follows, with values computed to four decimal places for clarity (a short script reproducing these steps follows the table):

| Iteration n | x_n | f(x_n) | f'(x_n) | x_{n+1} |
|---|---|---|---|---|
| 0 | 0.8000 | 0.1847 | -2.6374 | 0.8701 |
| 1 | 0.8701 | -0.0151 | -3.0355 | 0.8651 |
| 2 | 0.8651 | 0.0003 | -3.0199 | 0.8652 |
| 3 | 0.8652 | -0.0000 | -3.0200 | 0.8652 |
| 4 | 0.8652 | 0.0000 | -3.0200 | 0.8652 |
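A short Python script reproducing the iteration (tabulated values may differ in the final displayed digit because of rounding):
```python
import math

f = lambda x: math.cos(x) - x**3
fp = lambda x: -math.sin(x) - 3*x**2

x = 0.8
for n in range(5):
    fx, fpx = f(x), fp(x)
    x_next = x - fx / fpx
    print(f"n={n}  x_n={x:.4f}  f(x_n)={fx:.4f}  f'(x_n)={fpx:.4f}  x_next={x_next:.4f}")
    x = x_next
# The iterates settle near 0.8652 within three or four steps, as in the table.
```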
Generalizations
Systems of Equations
Newton's method extends naturally to solving systems of nonlinear equations \mathbf{F}(\mathbf{x}) = \mathbf{0}, where \mathbf{F}: \mathbb{R}^n \to \mathbb{R}^n is a vector-valued function with continuously differentiable components. The method approximates the solution by iteratively solving a linear system involving the Jacobian matrix \mathbf{J}(\mathbf{x}), which contains the partial derivatives of \mathbf{F} (as defined in the multidimensional case). At each iteration k, given current estimate \mathbf{x}^{(k)}, the update is obtained by solving \mathbf{J}(\mathbf{x}^{(k)}) \boldsymbol{\delta} = -\mathbf{F}(\mathbf{x}^{(k)}) for the step \boldsymbol{\delta}, then setting \mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \boldsymbol{\delta}. This process continues until \|\mathbf{F}(\mathbf{x}^{(k)})\| falls below a specified tolerance.[42]

A representative example is the system \begin{cases} x_1^2 + x_1 x_2 - 10 = 0, \\ x_2 + 3 x_1 x_2^2 - 57 = 0, \end{cases} with exact solution (x_1, x_2) = (2, 3). The Jacobian is \mathbf{J}(x_1, x_2) = \begin{bmatrix} 2x_1 + x_2 & x_1 \\ 3 x_2^2 & 1 + 6 x_1 x_2 \end{bmatrix}. Starting from initial guess (x_1, x_2) = (1.5, 3.5), the method requires 4 iterations to converge to the solution within a relative error tolerance of 10^{-4}. At the first iteration, \mathbf{F}(1.5, 3.5) = (-2.5, 1.625) and \mathbf{J}(1.5, 3.5) = \begin{bmatrix} 6.5 & 1.5 \\ 36.75 & 32.5 \end{bmatrix}, yielding step \boldsymbol{\delta} \approx (0.536, -0.656) and update to (2.036, 2.844); subsequent steps refine this quadratically toward the root.[42]

For overdetermined systems where the number of equations m > n (the number of unknowns), an exact solution may not exist, so Newton's method is adapted via the Gauss-Newton algorithm to minimize the nonlinear least-squares objective \frac{1}{2} \|\mathbf{F}(\mathbf{x})\|_2^2. Here, the iteration approximates the Hessian of the objective using \mathbf{J}(\mathbf{x})^T \mathbf{J}(\mathbf{x}), solving the normal equations [\mathbf{J}(\mathbf{x}^{(k)})^T \mathbf{J}(\mathbf{x}^{(k)})] \boldsymbol{\delta} = -\mathbf{J}(\mathbf{x}^{(k)})^T \mathbf{F}(\mathbf{x}^{(k)}) for the step, which reduces to a linear least-squares problem per iteration. This approach achieves local quadratic convergence near a zero-residual solution where \mathbf{J} has full rank (and linear convergence when the residual at the solution is nonzero).[43]

The primary computational cost per iteration for square systems arises from forming the Jacobian (requiring O(n^2) evaluations of partial derivatives) and solving the resulting n \times n linear system, typically via LU decomposition at O(n^3) operations. For large n, direct solvers dominate, though preconditioning or iterative methods can mitigate this for sparse Jacobians.[44]
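A NumPy sketch of this worked example, reusing the linear-solve form \mathbf{J}\boldsymbol{\delta} = -\mathbf{F}; the script is illustrative, and a general-purpose multivariate routine appears in the implementation section:
```python
import numpy as np

def F(x):
    x1, x2 = x
    return np.array([x1**2 + x1*x2 - 10.0,
                     x2 + 3.0*x1*x2**2 - 57.0])

def J(x):
    x1, x2 = x
    return np.array([[2.0*x1 + x2, x1],
                     [3.0*x2**2, 1.0 + 6.0*x1*x2]])

x = np.array([1.5, 3.5])
for k in range(5):
    delta = np.linalg.solve(J(x), -F(x))   # solve J(x) delta = -F(x)
    x = x + delta
    print(k + 1, x, np.linalg.norm(F(x)))
# The first step gives roughly (2.036, 2.844); by the fourth iteration the
# iterates agree with the exact solution (2, 3) to well below 1e-4.
```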
Complex and Banach Space Extensions
Newton's method extends naturally to the complex domain for analytic functions f: \mathbb{C} \to \mathbb{C}, where the iteration formula remains z_{n+1} = z_n - \frac{f(z_n)}{f'(z_n)}, mirroring the real case but operating in the complex plane.[45] This extension preserves local quadratic convergence near simple roots, provided f'(z^*) \neq 0 at the root z^*, due to the analyticity ensuring the method's classical properties hold.[45] A key advancement in the complex setting is Smale's \alpha-theory, which provides pointwise estimates for convergence using data at a single approximate root.[45] Specifically, for an analytic function f, the quantity \alpha(f, z) = \beta(f, z) \cdot \gamma(f, z), where \beta(f, z) = \frac{|f(z)|}{|f'(z)|} measures the Newton step size and \gamma(f, z) = \sup_{k \geq 2} \left| \frac{f^{(k)}(z)}{k!\, f'(z)} \right|^{1/(k-1)} bounds the higher-order Taylor coefficients, certifies quadratic convergence if \alpha(f, z) < \alpha_0 \approx 0.1307.[45] This theory quantifies the basin of attraction in the complex plane, enabling a posteriori verification of approximate roots without global knowledge of f.[45]

In Banach spaces, Newton's method generalizes to operators F: X \to Y between Banach spaces X and Y, using the Fréchet derivative DF(x) at iteration points.[46] The update becomes x_{n+1} = x_n - [DF(x_n)]^{-1} F(x_n), where [DF(x_n)]^{-1} is a bounded inverse operator, assuming invertibility.[46] This formulation applies to infinite-dimensional settings, such as function spaces, where local quadratic convergence holds under Lipschitz continuity of DF near a root x^* with DF(x^*) invertible.[46] The Newton-Kantorovich theorem establishes sufficient conditions for quadratic convergence in this Banach space context.[46] For F continuously Fréchet differentiable on a convex open set \Omega \subset X, with DF(x_0) invertible and \|DF(x) - DF(x_0)\| \leq K \|x - x_0\| for some K > 0, if the initial residual satisfies \eta = \| [DF(x_0)]^{-1} F(x_0) \| \leq \frac{1}{2K} and the ball condition holds, then the iterates converge quadratically to a unique root in a specified ball.[46] This semi-local result, originating from Kantorovich's work on functional equations, bounds the convergence radius and error without requiring global Lipschitz assumptions.[46]

For low-regularity problems in infinite dimensions, such as nonlinear partial differential equations (PDEs), the Nash-Moser iteration provides a variant of Newton's method that handles loss of derivatives through smoothing operators.[47] In tame Fréchet spaces equipped with scales of smoothing norms, the iteration applies a modified Newton step u_{n+1} = u_n - S_n [DF(u_n)]^{-1} F(u_n), where S_n is a smoothing operator gaining derivatives, followed by a tame loss-compensation step to control regularity.[47] This yields an inverse function theorem for maps between tame Fréchet spaces, ensuring local surjectivity and invertibility even when classical Newton fails due to insufficient smoothness.[47] The technique, building on Nash's embedding proofs and Moser's elliptic regularity work, is pivotal for proving local existence of solutions to quasilinear PDEs in high dimensions.[47]
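Returning to the scalar complex case at the start of this subsection, the iteration runs unchanged in complex arithmetic; a small illustrative sketch for f(z) = z^3 - 1, whose fractal basins of attraction were mentioned earlier:
```python
# Newton's method for the analytic map f(z) = z^3 - 1; different starting points
# are attracted to different cube roots of unity.
f = lambda z: z**3 - 1
fp = lambda z: 3*z**2

def newton_complex(z, steps=30):
    for _ in range(steps):
        z = z - f(z) / fp(z)
    return z

for z0 in (1.0 + 0.5j, -1.0 + 1.0j, -1.0 - 1.0j):
    print(z0, "->", newton_complex(z0))
# Limits are numerically 1, -1/2 + i*sqrt(3)/2, and -1/2 - i*sqrt(3)/2.
```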
Modifications
Quasi-Newton Methods
Quasi-Newton methods represent a class of iterative algorithms that modify Newton's method by approximating the Jacobian matrix (for systems of equations) or the Hessian matrix (for optimization problems) through low-rank updates, thereby avoiding the costly full recomputation of derivatives at each iteration.[48] These approximations are constructed to satisfy a secant condition, which ensures that the update aligns with the curvature information from successive function evaluations, providing a balance between accuracy and computational efficiency.[48] By leveraging information from previous steps, quasi-Newton methods reduce the expense associated with derivative computation challenges, such as in high-dimensional settings.[48]

A foundational example is Broyden's method, introduced in 1965, which applies to solving systems of nonlinear equations \mathbf{f}(\mathbf{x}) = \mathbf{0}.[49] It updates the Jacobian approximation \mathbf{J}_{k+1} from \mathbf{J}_k using a rank-1 correction that enforces the secant condition \mathbf{J}_{k+1} \mathbf{s}_k = \mathbf{y}_k, where \mathbf{s}_k = \mathbf{x}_{k+1} - \mathbf{x}_k is the step vector and \mathbf{y}_k = \mathbf{f}(\mathbf{x}_{k+1}) - \mathbf{f}(\mathbf{x}_k) is the function difference.[49] The explicit update formula is \mathbf{J}_{k+1} = \mathbf{J}_k + \frac{(\mathbf{y}_k - \mathbf{J}_k \mathbf{s}_k) \mathbf{s}_k^\top}{\|\mathbf{s}_k\|^2}, which maintains a good approximation while requiring only matrix-vector products and avoiding direct derivative evaluations after initialization.[49]

In the context of unconstrained optimization, the BFGS method extends this idea to approximate the Hessian of the objective function, with the update remaining positive definite when the line search enforces the curvature (Wolfe) condition \mathbf{y}_k^\top \mathbf{s}_k > 0.[48] Independently proposed in 1970 by Broyden, Fletcher, Goldfarb, and Shanno, BFGS uses a rank-2 update for the inverse Hessian approximation \mathbf{B}_{k+1}^{-1}: \mathbf{B}_{k+1}^{-1} = \left( \mathbf{I} - \frac{\mathbf{s}_k \mathbf{y}_k^\top}{\mathbf{y}_k^\top \mathbf{s}_k} \right) \mathbf{B}_k^{-1} \left( \mathbf{I} - \frac{\mathbf{y}_k \mathbf{s}_k^\top}{\mathbf{y}_k^\top \mathbf{s}_k} \right) + \frac{\mathbf{s}_k \mathbf{s}_k^\top}{\mathbf{y}_k^\top \mathbf{s}_k}, where \mathbf{y}_k = \nabla f(\mathbf{x}_{k+1}) - \nabla f(\mathbf{x}_k).[50] This formulation preserves symmetry and positive definiteness if \mathbf{B}_k^{-1} is positive definite and \mathbf{y}_k^\top \mathbf{s}_k > 0, making it suitable for minimizing smooth functions.[48]

Under appropriate local conditions, such as Lipschitz continuity of the derivatives and starting sufficiently close to the solution, quasi-Newton methods achieve superlinear convergence, meaning the convergence order exceeds 1 but is typically less than the quadratic order of full Newton's method.[48] This superlinear rate arises from the accumulating accuracy of the matrix approximations, while each iteration remains cheaper than Newton's due to the low-rank updates and avoidance of exact second derivatives.[48] These methods are particularly advantageous in high-dimensional problems, where evaluating the full Jacobian or Hessian can dominate computational costs, as the updates scale favorably with dimension.[48]
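A compact sketch of Broyden's method as described above; the function name, tolerances, and the reuse of the 2x2 test system from the systems-of-equations example are illustrative assumptions:
```python
import numpy as np

def broyden(F, x0, J0, tol=1e-10, max_iter=50):
    """Broyden's method: Newton-like steps with rank-1 secant updates of J.

    J_{k+1} = J_k + (y_k - J_k s_k) s_k^T / ||s_k||^2 enforces the secant
    condition J_{k+1} s_k = y_k without re-evaluating derivatives.
    """
    x = np.asarray(x0, dtype=float)
    J = np.asarray(J0, dtype=float)
    Fx = F(x)
    for k in range(max_iter):
        s = np.linalg.solve(J, -Fx)                 # quasi-Newton step
        x_new = x + s
        F_new = F(x_new)
        if np.linalg.norm(s) < tol:
            return x_new, k + 1
        y = F_new - Fx
        J = J + np.outer(y - J @ s, s) / (s @ s)    # rank-1 secant update
        x, Fx = x_new, F_new
    raise RuntimeError("no convergence")

# Test system from the systems-of-equations example; J0 is the Jacobian at x0.
F = lambda x: np.array([x[0]**2 + x[0]*x[1] - 10.0,
                        x[1] + 3.0*x[0]*x[1]**2 - 57.0])
root, iters = broyden(F, [1.5, 3.5], J0=[[6.5, 1.5], [36.75, 32.5]])
print(root, iters)    # approaches (2, 3) superlinearly, more slowly than full Newton
```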
Higher-Order and Specialized Variants
Chebyshev's method is a third-order root-finding algorithm that extends Newton's method by incorporating the second derivative, achieving cubic convergence under suitable conditions near a simple root.[51] The iteration is given by x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \left(1 + \frac{1}{2} \frac{f(x_n) f''(x_n)}{[f'(x_n)]^2}\right), where f'' denotes the second derivative.[52] This method requires evaluating higher derivatives, increasing computational cost but offering faster convergence than the quadratic rate of standard Newton's method; explicit convergence regions have been derived for specific applications like principal pth roots.[52]

In the context of p-adic numbers, Newton's method adapts to the ultrametric topology of \mathbb{Q}_p, where the p-adic norm satisfies |x + y|_p \leq \max\{|x|_p, |y|_p\}, leading to unique convergence behaviors distinct from the real case.[53] For a polynomial f \in \mathbb{Z}_p, if an initial approximation x_0 satisfies f(x_0) \equiv_p 0 and f'(x_0) \not\equiv_p 0, the iterates converge quadratically to a unique root \xi \in \mathbb{Z}_p with |x_{n+1} - \xi|_p \leq p^{-2^n}, leveraging Taylor expansions in the complete ultrametric field.[53] This variant finds applications in number theory for computing algebraic roots in p-adic settings, such as higher-order equations, with Smale's \alpha-theory extended to provide sufficient conditions for convergence in ultrametric fields.[54]

The q-analogue of Newton's method arises in q-deformed calculus, replacing the standard derivative with the q-derivative D_{q,x} f(x_k) = \frac{f(x_k) - f(q x_k)}{x_k (1 - q)} for q \neq 1, yielding a deformed iteration for root finding.[55] For simple roots, the update is x_{k+1} = x_k - \frac{f(x_k)}{D_{q,x} f(x_k)}, while for roots of multiplicity m, it becomes x_{k+1} = x_k - m \frac{f(x_k)}{D_{q,x} f(x_k)}.[55] When q is close to 1, this method converges more rapidly than the classical version for algebraic and transcendental equations, as demonstrated in numerical examples like solving x e^x - 1 = 0, where q \approx 1 yields approximations nearer to the true root.[55]

Modifications to Newton's method address slow linear convergence at multiple roots by estimating multiplicity or adjusting the iteration. Maehly's procedure refines successive roots of polynomials by incorporating previously found roots to avoid reconvergence, using the update x_{k+1} = x_k - \frac{P(x_k)}{P'(x_k) - P(x_k) \sum_{i=1}^j (x_k - x_i)^{-1}}, where P is the polynomial and x_i are prior roots.[56] This approach suppresses zeros effectively without explicit deflation, reducing rounding errors in closely spaced or multiple roots.[56] Hirano's modification enhances global convergence for algebraic equations with multiple roots by altering the Newton step to ensure quadratic convergence even in challenging cases, successfully applied to otherwise difficult polynomial systems.[57]

Interval Newton's method employs interval arithmetic to compute guaranteed enclosures of roots, providing verified bounds for nonlinear equations.[58] It combines a Gauss-Seidel-type iteration, real Newton steps, and a linearized solver subalgorithm, where intervals represent sets of possible values, and the method refines enclosures until a unique root is isolated within the interval.[58] This variant ensures computational reliability by enclosing all possible solutions, converging in fewer iterations than alternatives like Krawczyk's method while bounding errors rigorously.[58]
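A sketch of Chebyshev's iteration from the start of this subsection, assuming a simple root and an available second derivative; the helper name and the cube-root test problem are illustrative:
```python
def chebyshev(f, fp, fpp, x0, tol=1e-12, max_iter=30):
    """Chebyshev's third-order iteration:
    x_{n+1} = x_n - (f/f') * (1 + 0.5 * f * f'' / f'^2)."""
    x = x0
    for n in range(max_iter):
        fx, fpx, fppx = f(x), fp(x), fpp(x)
        if fpx == 0.0:
            raise ZeroDivisionError("f'(x_n) = 0")
        x_new = x - (fx / fpx) * (1.0 + 0.5 * fx * fppx / fpx**2)
        if abs(x_new - x) < tol:
            return x_new, n + 1
        x = x_new
    raise RuntimeError("no convergence")

# Cube root of 2 via f(x) = x^3 - 2; cubic convergence needs very few steps.
root, iters = chebyshev(lambda x: x**3 - 2,
                        lambda x: 3*x**2,
                        lambda x: 6*x,
                        x0=1.5)
print(root, iters)    # about 1.2599210498948732 in roughly four iterations
```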
Applications
Optimization Problems
Newton's method is widely used to solve unconstrained optimization problems of the form \min_{x \in \mathbb{R}^n} f(x), where f: \mathbb{R}^n \to \mathbb{R} is a twice continuously differentiable function, by iteratively finding critical points where the gradient \nabla f(x) = 0.[59] The method approximates the objective function locally by its second-order Taylor expansion, leading to an update rule that solves for the minimum of this quadratic model. Specifically, at each iteration k, the Newton step computes the search direction d_k as the solution to H_k d_k = -\nabla f(x_k), where H_k is the Hessian matrix \nabla^2 f(x_k), and updates x_{k+1} = x_k + d_k = x_k - H_k^{-1} \nabla f(x_k), assuming H_k is invertible.[59] This approach leverages the curvature information from the Hessian to achieve faster convergence compared to first-order methods like gradient descent.[60]

To determine the nature of a critical point found by the method, second-order conditions are examined using the Hessian: if \nabla f(x^*) = 0 and H(x^*) is positive definite, then x^* is a strict local minimum; if negative definite, a local maximum; and if indefinite, a saddle point.[61] Positive definiteness of the Hessian ensures the quadratic model is convex, providing a sufficient condition for a local minimum, though it is not necessary: a positive semidefinite Hessian may still indicate a minimum but requires further analysis.[62] Near such a stationary point where the Hessian is positive definite and Lipschitz continuous, Newton's method exhibits quadratic convergence, meaning the error decreases quadratically with each iteration once sufficiently close to the optimum.[59]

In practice, the pure Newton method may fail to make progress or diverge if the initial guess is far from the optimum or if the Hessian is not positive definite, prompting the use of damped or modified variants. The damped Newton method incorporates a line search to select a step size \alpha_k \in (0,1] that ensures sufficient decrease in the objective, typically via backtracking along the Newton direction: x_{k+1} = x_k - \alpha_k H_k^{-1} \nabla f(x_k), satisfying conditions like the Armijo rule for descent.[60] This modification guarantees global convergence to a stationary point under suitable assumptions, such as self-concordance of f, while preserving local quadratic convergence.[63] In machine learning, damped Newton's method is applied to optimize the negative log-likelihood in logistic regression, where the Hessian corresponds to the Fisher information matrix, enabling efficient parameter estimation for binary classification tasks.[64] Newton's method is closely related to trust-region methods, which solve a constrained subproblem within a neighborhood where the quadratic model is trusted to be accurate, often yielding similar directions but with added robustness for nonconvex problems.[65]
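A sketch of the damped Newton iteration with Armijo backtracking described above, applied to an illustrative convex test function whose Hessian is positive definite everywhere; all names and constants are assumptions for the example, not taken from the cited sources:
```python
import numpy as np

def damped_newton_min(f, grad, hess, x0, tol=1e-10, max_iter=100, beta=0.5, c=1e-4):
    """Damped Newton for unconstrained minimization with Armijo backtracking."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            return x, k
        d = np.linalg.solve(hess(x), -g)            # Newton direction H_k d = -grad
        alpha = 1.0
        while f(x + alpha * d) > f(x) + c * alpha * (g @ d):
            alpha *= beta                           # backtracking (Armijo rule)
        x = x + alpha * d
    raise RuntimeError("no convergence")

# Convex test function: f(x, y) = (x - 1)^4 + (x - y)^2 + y^2, minimizer (0.5, 0.25).
f = lambda v: (v[0] - 1)**4 + (v[0] - v[1])**2 + v[1]**2
grad = lambda v: np.array([4*(v[0] - 1)**3 + 2*(v[0] - v[1]),
                           -2*(v[0] - v[1]) + 2*v[1]])
hess = lambda v: np.array([[12*(v[0] - 1)**2 + 2.0, -2.0],
                           [-2.0, 4.0]])
print(damped_newton_min(f, grad, hess, [3.0, -2.0]))   # converges to about (0.5, 0.25)
```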
Inverse Computation and Verification
Newton's method can be applied to compute the multiplicative inverse of a nonzero number b, by solving the equation f(x) = \frac{1}{x} - b = 0. The derivative is f'(x) = -\frac{1}{x^2}, leading to the iteration formula x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} = 2x_n - b x_n^2.[66] This approach converges quadratically for a suitable initial guess, such as x_0 obtained by approximating b with a power of 2 and using bit shifts for an initial reciprocal, and is analogous to the method for square roots but avoids explicit division in hardware implementations.[67]

For formal power series, Newton's method enables efficient reversion, which inverts a series y = f(x) to express x = g(y) as a power series, assuming f(0) = 0 and f'(0) \neq 0. The classical Newton iteration for series operates degree by degree, doubling the number of correct terms per step, and can be implemented to compute the first n terms in O(n (\log n)^{O(1)}) arithmetic operations using fast multiplication techniques.[68] This is particularly useful in algebraic computations, such as solving differential equations or composing series in computer algebra systems.[69]

Numerical verification of solutions to nonlinear equations employs variants like the interval Newton method, which uses interval arithmetic to enclose roots and prove existence or uniqueness within a box [a, b]. A key tool is the Krawczyk operator, defined for a system F(x) = 0 with an approximate solution \tilde{x} and preconditioner C (often the inverse Jacobian approximation) as K(\tilde{x}, X) = \tilde{x} - C F(\tilde{x}) + (I - C J(X)) (X - \tilde{x}), where X is an interval enclosure containing \tilde{x} and J(X) is an interval extension of the Jacobian over X; if K(\tilde{x}, X) \subseteq X and \|I - C J(X)\| < 1, then there exists a unique root in X.[70] This method guarantees certified enclosures without exhaustive search, extending classical Newton's method to validated numerics.[71]

In applications to transcendental equations, Newton's method solves Kepler's equation M = E - e \sin E for the eccentric anomaly E given mean anomaly M and eccentricity e < 1, crucial in orbital mechanics for determining satellite positions. Starting with a simple initial guess such as E_0 = M, the iteration converges rapidly, often in 3-4 steps, outperforming fixed-point methods for high precision.[72][73] This has been optimized in astronomical computations since the 19th century.[73]

Error certification for computed roots uses backward error analysis, assessing how close the approximate root \tilde{x} is to an exact root of a perturbed equation f(x) + \delta(x) = 0, where the backward error is typically \|\delta\| \approx |f(\tilde{x})| in a suitable norm. For Newton's method, the residual |f(\tilde{x})| provides a bound on this perturbation, certifying the accuracy relative to input data uncertainties, especially in floating-point arithmetic where forward error may amplify ill-conditioning.[74] This approach is fundamental in verified computing to ensure reliability of roots in scientific simulations.[75]
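A minimal sketch of the Kepler solver described above, with the simple starter E_0 = M; the function name and tolerance are illustrative:
```python
import math

def solve_kepler(M, e, tol=1e-14, max_iter=20):
    """Solve Kepler's equation M = E - e*sin(E) for E by Newton's method."""
    E = M                                    # simple initial guess E_0 = M
    for _ in range(max_iter):
        g = E - e * math.sin(E) - M          # g(E) = 0 at the solution
        gp = 1.0 - e * math.cos(E)           # g'(E) >= 1 - e > 0 for e < 1
        dE = g / gp
        E -= dE
        if abs(dE) < tol:
            return E
    raise RuntimeError("no convergence")

E = solve_kepler(M=1.0, e=0.3)
print(E, E - 0.3 * math.sin(E))   # the second value recovers M = 1.0
```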
Implementation
Algorithmic Steps
Newton's method, also known as the Newton-Raphson method, proceeds iteratively to approximate a root of the equation f(x) = 0 by refining an initial guess using the function's derivative.[76] The algorithm begins with an initial approximation x_0, typically chosen based on graphical inspection or domain knowledge to ensure it is reasonably close to the root.[77] At each iteration n, the method computes the function value f(x_n) and its derivative f'(x_n), then updates the approximation via the formula x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}, which corresponds to the x-intercept of the tangent line to f at x_n.[78] This process repeats until a convergence criterion is met, producing a sequence \{x_n\} that ideally converges quadratically to the root under suitable conditions.[77]

The following pseudocode outlines the core steps for a one-dimensional implementation:
```
INPUT: initial guess x_0, tolerance ε > 0, maximum iterations N
SET n = 0
WHILE n < N DO
    COMPUTE f(x_n)
    COMPUTE f'(x_n)
    IF f'(x_n) = 0 THEN
        OUTPUT "Derivative is zero; method fails"
        STOP
    END IF
    SET x_{n+1} = x_n - f(x_n) / f'(x_n)
    IF |f(x_{n+1})| < ε OR |x_{n+1} - x_n| < ε THEN
        OUTPUT x_{n+1} as approximate root
        STOP
    END IF
    SET n = n + 1
END WHILE
OUTPUT "Maximum iterations reached; no convergence"
```
This structure ensures the iteration halts appropriately.[78][76]

To safeguard against infinite loops or numerical instability, implementations include a maximum iteration limit N, often set to 100 or based on expected convergence speed, beyond which the algorithm terminates with a failure message.[78] Additionally, a check for f'(x_n) \neq 0 prevents division by zero, which would otherwise cause the method to fail at that step.[76]

Common convergence criteria involve absolute or relative tolerances, such as |f(x_{n+1})| < \epsilon for the function residual, |x_{n+1} - x_n| < \delta for the step size, or |x_{n+1} - x_n| / |x_{n+1}| < \delta for relative change, where \epsilon and \delta are small user-defined thresholds like 10^{-6}.[78] These criteria balance accuracy with computational efficiency, as the method typically doubles the number of correct digits per iteration near the root.[77]

In cases where Newton's method stalls, such as when the derivative is near zero or the initial guess leads to oscillation, hybrid approaches incorporate a fallback to the bisection method, which guarantees convergence albeit more slowly.[79] For instance, if the step x_{n+1} falls outside a bracketing interval [a, b] containing the root, the algorithm switches to bisection on that interval until progress resumes (a sketch of such a safeguard appears at the end of this subsection).[79]

Regarding computational complexity, each iteration in one dimension requires O(1) operations, assuming evaluations of f and f' are constant time.[80] In n dimensions, where the method solves f(\mathbf{x}) = \mathbf{0} using the Jacobian matrix, each iteration involves inverting or factoring an n \times n system, leading to O(n^3) complexity per step.[80]
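A sketch of such a hybrid safeguard in Python (illustrative only, not a production routine): it maintains a bracket [a, b] with a sign change and replaces any Newton step that escapes the bracket, or meets a vanishing derivative, with a bisection step.
```python
def newton_bisection(f, fprime, a, b, tol=1e-12, max_iter=200):
    """Safeguarded Newton: bisection fallback whenever the Newton step is unusable."""
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    x = 0.5 * (a + b)
    for _ in range(max_iter):
        fx = f(x)
        if fx == 0.0:
            return x
        # Shrink the bracket using the sign of f(x).
        if fa * fx < 0:
            b, fb = x, fx
        else:
            a, fa = x, fx
        # Propose a Newton step; fall back to bisection if it is unusable.
        fpx = fprime(x)
        if fpx != 0.0:
            x_new = x - fx / fpx
            if not (a < x_new < b):
                x_new = 0.5 * (a + b)        # bisection fallback
        else:
            x_new = 0.5 * (a + b)            # bisection fallback
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("no convergence")

# f(x) = x^3 - 2x + 2 oscillates under plain Newton from x_0 = 0, but the
# safeguarded version bracketed on [-2, 0] finds the root near -1.7693.
print(newton_bisection(lambda x: x**3 - 2*x + 2, lambda x: 3*x**2 - 2, -2.0, 0.0))
```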
Sample Code Snippets
Newton's method can be implemented straightforwardly in various programming languages, following the iterative update x_{n+1} = x_n - f(x_n)/f'(x_n) for univariate cases or its matrix generalization for multivariate systems. These examples include convergence checks, iteration limits, and error handling for issues like zero derivatives or singular Jacobians, drawing from standard numerical analysis practices.[81]
Python (Univariate)
The following Python function implements Newton's method for finding roots of a univariate function, using a for loop with exception handling for division by zero and a tolerance on the step size for convergence.
```python
def newton(f, fprime, x0, tol=1e-6, maxiter=100):
    """
    Newton's method for univariate root finding.

    Parameters:
        f: function to solve f(x) = 0
        fprime: derivative of f
        x0: initial guess
        tol: tolerance for convergence
        maxiter: maximum iterations

    Returns:
        Approximate root and number of iterations
    """
    x = x0
    for i in range(maxiter):
        try:
            fx = f(x)
            fpx = fprime(x)
            dx = fx / fpx
            x -= dx
            if abs(dx) < tol:
                return x, i + 1
        except ZeroDivisionError:
            raise ValueError("Derivative is zero at current iterate")
    raise ValueError("Maximum iterations reached without convergence")
```
This code assumes the user provides the function f and its derivative f'; it returns the approximate root and iteration count upon convergence.[82]
MATLAB/Octave (Univariate)
A MATLAB or Octave equivalent uses a while loop to iterate until the step norm falls below tolerance, with a check for near-zero derivatives in one dimension.
```matlab
function [root, iter] = newton(f, df, x0, tol, maxiter)
    % Newton's method for univariate root finding
    if nargin < 4, tol = 1e-6; end
    if nargin < 5, maxiter = 100; end
    x = x0;
    iter = 0;
    while iter < maxiter
        fx = f(x);
        dfx = df(x);
        if abs(dfx) < 1e-15
            error('Derivative is zero at current iterate');
        end
        dx = fx / dfx;
        x = x - dx;
        iter = iter + 1;
        if norm(dx) < tol
            root = x;
            return;
        end
    end
    error('Maximum iterations reached without convergence');
end
```
For one-dimensional problems, the norm of the scalar step dx is equivalent to its absolute value; this script can be called directly from the command window.[7]
C++ (Square Root Example)
To compute the square root of a positive number a (e.g., \sqrt{2}), apply Newton's method to f(x) = x^2 - a = 0 with derivative f'(x) = 2x, using an inline loop for iteration.
```cpp
#include <iostream>
#include <cmath>

double newton_sqrt(double a, double tol = 1e-6, int maxiter = 100) {
    // Newton's method for sqrt(a)
    if (a < 0.0) return std::nan("");
    double x = a > 0.0 ? a : 1.0;  // Initial guess
    for (int i = 0; i < maxiter; ++i) {
        double fx = x * x - a;
        double fpx = 2.0 * x;
        if (std::abs(fpx) < 1e-15) {
            return std::nan("");  // Singular case
        }
        double dx = fx / fpx;
        x -= dx;
        if (std::abs(dx) < tol) {
            return x;
        }
    }
    return std::nan("");  // No convergence
}

int main() {
    double root = newton_sqrt(2.0);
    std::cout << "Approximate sqrt(2): " << root << std::endl;
    return 0;
}
```
This specialized version starts with x_0 = a and outputs NaN for invalid or non-convergent cases, achieving high precision in few iterations for a = 2.[83]
Python (Multivariate with NumPy)
For solving systems of nonlinear equations \mathbf{F}(\mathbf{x}) = \mathbf{0}, use NumPy to evaluate the Jacobian and solve the Newton linear system at each step (with np.linalg.solve rather than an explicit matrix inverse).
```python
import numpy as np

def newton_multivariate(F, J, x0, tol=1e-6, maxiter=100):
    """
    Multivariate Newton's method.

    Parameters:
        F: vector-valued function R^n -> R^n
        J: Jacobian matrix function
        x0: initial guess (array-like)
        tol: tolerance
        maxiter: maximum iterations

    Returns:
        Approximate root and iterations
    """
    x = np.array(x0, dtype=float)
    for i in range(maxiter):
        Fx = F(x)
        Jx = J(x)
        try:
            dx = np.linalg.solve(Jx, Fx)
            x -= dx
            if np.linalg.norm(dx) < tol:
                return x, i + 1
        except np.linalg.LinAlgError:
            raise ValueError("Singular Jacobian at current iterate")
    raise ValueError("Maximum iterations reached without convergence")
```
The function \mathbf{F} returns a vector, and J a matrix; convergence is checked via the Euclidean norm of the step.[84]
Testing the Implementations
These codes can be tested on the classic problem of finding \sqrt{2}, the positive root of f(x) = x^2 - 2 = 0, using an initial guess x_0 = 1.0. The univariate Python version converges to approximately 1.41421356237, returning after five iterations with the step tolerance of 10^{-6} used above, verifiable as follows:
```python
def f(x):
    return x**2 - 2

def fprime(x):
    return 2 * x

root, iters = newton(f, fprime, 1.0)
assert abs(root**2 - 2) < 1e-5, "Test failed: inaccuracy exceeds tolerance"
print(f"Root: {root}, Iterations: {iters}")
```
Similar assertions apply to the other languages, confirming quadratic convergence near the root.[82]