Broyden's method
Broyden's method is a quasi-Newton algorithm in numerical analysis designed to solve systems of n nonlinear equations in n unknowns, F(x) = 0, where F: \mathbb{R}^n \to \mathbb{R}^n. Introduced by C. G. Broyden in 1965, it extends the secant method to multiple dimensions by iteratively approximating the Jacobian matrix (or its inverse) without requiring explicit derivative computations, thereby addressing the high computational expense of Newton's method.[1]
The method begins with an initial guess x_0 and an initial approximation B_0 to the Jacobian F'(x) (or its inverse H_0), then generates iterates x_{k+1} = x_k - B_k^{-1} F(x_k). The key innovation lies in the rank-one update to the approximation: for the "good" Broyden update (one of the variants in the original class), B_{k+1} = B_k + \frac{(y_k - B_k s_k) s_k^T}{s_k^T s_k}, where s_k = x_{k+1} - x_k and y_k = F(x_{k+1}) - F(x_k); this satisfies the secant condition B_{k+1} s_k = y_k while preserving computational efficiency at O(n^2) operations per iteration.[1][2] An alternative "bad" update, B_{k+1} = B_k + \frac{(y_k - B_k s_k) t_k^T}{t_k^T s_k} where t_k = B_k^{-1} y_k, is also part of Broyden's class but can exhibit poorer global behavior.
Under suitable conditions—such as the Jacobian being nonsingular and the initial approximation sufficiently close to the true Jacobian—Broyden's method achieves local superlinear convergence, often approaching the quadratic rate of Newton's method while requiring only a single new function evaluation per step, rather than the n additional evaluations needed to rebuild a finite-difference Jacobian.[2][3] This efficiency has made it a foundational tool in scientific computing, particularly for applications in physics, engineering, and optimization where systems arise from discretizing partial differential equations or setting gradients to zero.[2] Despite potential issues like non-preservation of positive definiteness or sensitivity to poor initial guesses, variants and modifications (e.g., limited-memory versions for large-scale problems) continue to enhance its robustness and applicability.[3]
Overview
Definition and purpose
Broyden's method is a quasi-Newton algorithm for solving systems of nonlinear equations defined by F(\mathbf{x}) = \mathbf{0}, where \mathbf{x} \in \mathbb{R}^n and F: \mathbb{R}^n \to \mathbb{R}^n is a continuously differentiable mapping.[4] The method iteratively approximates solutions by updating an estimate of the Jacobian matrix J_F(\mathbf{x}), enabling the computation of search directions without requiring the full Jacobian at each step.[5] This makes it particularly suitable for high-dimensional problems where F represents complex nonlinear relationships, such as those arising in engineering simulations.[4]
The core purpose of Broyden's method is to achieve efficient root-finding by employing rank-one updates to maintain an approximation of the Jacobian, which significantly lowers the computational cost relative to the full Newton method.[4] Unlike Newton's method, which demands an exact Jacobian evaluation or approximation via finite differences at every iteration (often requiring n additional evaluations of F, i.e., O(n^2) scalar function evaluations), Broyden's approach leverages previously computed function values to perform these updates, avoiding such overhead after an initial setup.[5] This motivation stems from the need to balance convergence speed with practicality in scenarios where Jacobian computation is prohibitively expensive due to the nonlinearity of F.[4]
A typical problem setup involves seeking roots of F(\mathbf{x}) = \mathbf{0} where direct analytical inversion is infeasible, such as in systems modeling physical processes with interdependent nonlinear components.[4] For example, in pneumatically damped hydraulic networks, the equations couple pressure and flow variables in a way that renders explicit solutions impractical, necessitating iterative methods like Broyden's to converge to equilibrium states.[4] By approximating the Jacobian iteratively, the method facilitates superlinear convergence while minimizing the total number of function evaluations required.[5]
Historical background
Broyden's method was first introduced by Charles George Broyden in 1965 in his seminal paper "A Class of Methods for Solving Nonlinear Simultaneous Equations," published in Mathematics of Computation. In this work, Broyden proposed a family of rank-one update formulas for approximating the Jacobian matrix in the context of solving systems of nonlinear equations, marking a significant advancement in iterative numerical techniques.[4]
The method emerged during a period of increasing interest in quasi-Newton approaches, which sought to balance the efficiency of the full Newton method with reduced computational demands for large-scale problems. It served as a multivariable generalization of the classical secant method, extending the idea of finite-difference approximations to higher dimensions without requiring explicit Jacobian evaluations at each step.[6] This innovation addressed the challenges posed by nonlinear systems in fields like engineering and scientific computing, where exact derivatives were often costly or impractical to compute.[7]
Subsequent developments in the 1970s included extensions of Broyden's ideas to unconstrained optimization, where Broyden himself contributed double-rank updates that influenced symmetric quasi-Newton methods for minimizing nonlinear functions. A key theoretical milestone came in 1979, when David M. Gay proved that Broyden's method terminates in at most 2n steps when applied to nonsingular linear systems of size n \times n, providing important guarantees on its finite convergence properties for such cases.[3]
While primarily designed for root-finding in nonlinear equations, Broyden's framework laid foundational groundwork for later quasi-Newton variants, notably influencing the development of the BFGS method in the early 1970s by providing a general parameterization for updates that preserved desirable properties like positive definiteness in optimization contexts.[8]
Mathematical Foundations
The problem of solving nonlinear equations involves finding values of x such that F(x) = 0, where F can be a scalar function for one-dimensional cases or a vector-valued function for systems in multiple variables.[6] Unlike linear equations, which admit closed-form solutions via matrix inversion or direct formulas like Cramer's rule, nonlinear equations generally lack such explicit solutions and require iterative numerical techniques.[9]
Key challenges in nonlinear equation solving include the absence of general analytical solutions, high sensitivity to the choice of initial guess—which can lead to convergence to different roots or divergence—and the potential existence of multiple roots or none at all, complicating uniqueness and reliability.[10] These issues arise because nonlinear functions can exhibit complex behaviors, such as discontinuities or rapid variations, making global solution finding computationally intensive.[11] Additionally, evaluating the function and its properties at various points often demands significant resources, especially for high-dimensional systems.[12]
Standard approaches for scalar nonlinear equations include the bisection method, which brackets roots using sign changes for guaranteed convergence albeit slowly, and fixed-point iteration, which reformulates the equation as x = g(x) and iterates until stabilization, though it may converge slowly or diverge depending on g.[10] For greater efficiency, particularly in multivariable settings, derivative-based methods like Newton's method serve as a baseline iterative technique by leveraging local approximations.[6]
In multivariable cases, the Jacobian matrix J(x), composed of first partial derivatives, plays a central role by providing the local linearization of the nonlinear system around a point near the root, enabling step computations in iterative solvers.[13] This linear approximation, F(x + \delta x) \approx F(x) + J(x) \delta x = 0, facilitates solving for updates \delta x that drive the iteration toward the solution.[14]
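For illustration, the following sketch (assuming NumPy; the two-equation system and the evaluation point are arbitrary choices, not part of any standard formulation) forms this linearization at a point and solves J(\mathbf{x})\,\delta \mathbf{x} = -F(\mathbf{x}) for a single Newton-type correction:

import numpy as np

def F(x):
    # Illustrative two-equation system: x0^2 + x1 - 1 = 0 and x0 + x1^2 - 1 = 0
    return np.array([x[0]**2 + x[1] - 1.0, x[0] + x[1]**2 - 1.0])

def J(x):
    # Analytical Jacobian of F, used here only to illustrate the linearization
    return np.array([[2.0 * x[0], 1.0], [1.0, 2.0 * x[1]]])

x = np.array([1.0, 1.0])
delta_x = np.linalg.solve(J(x), -F(x))   # solve J(x) δx = -F(x)
x_next = x + delta_x                     # one Newton-type correction step
print(x_next)                            # ≈ [0.6667, 0.6667]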
Quasi-Newton methods
Quasi-Newton methods constitute a family of iterative algorithms designed to solve systems of nonlinear equations F(x) = 0, where F: \mathbb{R}^n \to \mathbb{R}^n, by approximating the Jacobian matrix J(x) = \nabla F(x) without explicitly computing its derivatives at each iteration. These methods build and refine the Jacobian approximation using information from successive function evaluations and iterates, thereby avoiding the need for analytical or finite-difference derivative computations. This approach originated as an extension of the secant method to multivariable settings, providing a balance between the rapid local convergence of Newton's method and the derivative-free nature of simpler fixed-point iterations.
Central to quasi-Newton methods is the secant condition, which enforces that the updated Jacobian approximation J_{k+1} satisfies J_{k+1} \Delta x_k = \Delta F_k, where \Delta x_k = x_{k+1} - x_k and \Delta F_k = F(x_{k+1}) - F(x_k). This condition ensures the approximation accurately captures the first-order change in the function along the line connecting consecutive iterates, analogous to the univariate secant method's interpolation property. Various solutions to this underdetermined equation yield different update formulas, all of which maintain the approximation's consistency with observed function differences while requiring only matrix-vector products for implementation.[15]
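A quick numerical check (a sketch assuming NumPy, with arbitrary stand-in values for one step) confirms that a rank-one update built from \Delta x_k and \Delta F_k with the good-Broyden formula satisfies the secant condition exactly:

import numpy as np

J_k = np.array([[2.0, 0.3], [0.1, 1.5]])    # current Jacobian approximation (illustrative)
dx = np.array([0.2, -0.1])                  # Δx_k = x_{k+1} − x_k
dF = np.array([0.35, -0.12])                # ΔF_k = F(x_{k+1}) − F(x_k)

# Rank-one (good Broyden) update; by construction J_next Δx_k = ΔF_k
J_next = J_k + np.outer(dF - J_k @ dx, dx) / (dx @ dx)
print(np.allclose(J_next @ dx, dF))         # True: the secant condition holds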
A primary advantage of quasi-Newton methods over the classical Newton method lies in their reduced computational cost per iteration. Newton's method demands O(n^2) evaluations to form the Jacobian and O(n^3) operations to solve the resulting linear system, which becomes prohibitive for large n. In contrast, quasi-Newton updates typically involve O(n^2) operations, leveraging low-rank modifications to the prior approximation, making them scalable for high-dimensional nonlinear systems where derivative information is expensive or unavailable. This efficiency comes at the potential expense of slightly slower superlinear convergence compared to Newton's quadratic rate, though the overall savings often dominate in practice.[15]
Quasi-Newton methods differ in the structure of their updates, broadly categorized by rank. Rank-one updates alter the Jacobian by adding a single outer product, resulting in simple and stable formulas that preserve nonsingularity under mild conditions. Rank-two updates, which add two outer products, allow for additional constraints such as symmetry preservation and are common in variants adapted from optimization contexts. Broyden's method exemplifies a rank-one approach, highlighting the efficacy of minimal updates in multivariable nonlinear solving.
Scalar case
In the scalar case, Broyden's method addresses the problem of solving a single nonlinear equation f(x) = 0, where it reduces to the classical secant method by approximating the derivative via finite differences from two prior iterates.[4][16] This equivalence arises because the method updates a scalar approximation to the derivative f'(x_n) using the secant formula f'(x_n) \approx \frac{f(x_n) - f(x_{n-1})}{x_n - x_{n-1}}, avoiding explicit derivative computations.[17][18]
The algorithm begins with initial guesses x_0 and x_1 such that f(x_0) \neq f(x_1), then iterates according to the update
x_{n+1} = x_n - f(x_n) \frac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})}.
This step replaces the derivative in Newton's formula x_{n+1} = x_n - f(x_n)/f'(x_n) with the secant approximation.[16][17] Each iteration requires only a single function evaluation, making it efficient for scalar problems where derivatives are unavailable or costly.[18]
To illustrate, consider solving the quadratic equation f(x) = x^2 - 2 = 0 (with root \sqrt{2} \approx 1.4142) using initial points x_0 = 1 and x_1 = 1.5:
- f(x_0) = -1, f(x_1) = 0.25
- x_2 = 1.5 - 0.25 \cdot \frac{0.5}{1.25} = 1.4, f(x_2) = -0.04
- x_3 = 1.4 - \frac{(-0.04)(1.4 - 1.5)}{(-0.04) - 0.25} \approx 1.4138, f(x_3) \approx -0.0012
After only two secant steps the iterate already agrees with the root to within about 4 \times 10^{-4}, demonstrating the method's effectiveness for simple nonlinear equations; a short implementation reproducing these iterations is sketched below.[16][17] This one-dimensional approach provides the conceptual basis for generalizing Broyden's method to multivariable systems.
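The following is a minimal Python sketch of the scalar (secant) iteration; the function name secant, the tolerance, and the iteration cap are illustrative choices rather than part of the original formulation.

def secant(f, x0, x1, tol=1e-10, max_iter=50):
    """Secant iteration for f(x) = 0; Broyden's method reduced to one dimension."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if f1 == f0:                      # degenerate step; cannot form the secant
            break
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        x0, f0 = x1, f1
        x1, f1 = x2, f(x2)
        if abs(f1) < tol:
            break
    return x1

# Reproduces the worked example above: root of x^2 - 2 from x0 = 1, x1 = 1.5
print(secant(lambda x: x**2 - 2.0, 1.0, 1.5))     # ≈ 1.4142135623730951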
Multivariable case
Broyden's method in the multivariable case addresses systems of nonlinear equations \mathbf{F}(\mathbf{x}) = \mathbf{0}, where \mathbf{x} \in \mathbb{R}^n and \mathbf{F}: \mathbb{R}^n \to \mathbb{R}^n. The approach approximates the Jacobian matrix \mathbf{J}_k at each iterate \mathbf{x}_k, enabling Newton-like steps without recomputing the full Jacobian via derivatives at every iteration. This rank-one update scheme builds on secant approximations, satisfying the condition \mathbf{J}_{k+1} \mathbf{s}_k = \mathbf{y}_k in a least-squares sense to minimize the Frobenius norm perturbation from \mathbf{J}_k.
The algorithm initializes with a starting point \mathbf{x}_0 and an initial Jacobian approximation \mathbf{J}_0, often taken as the identity matrix for simplicity or estimated via finite differences for better accuracy. At iteration k, solve the linear system \mathbf{J}_k \Delta \mathbf{x}_k = -\mathbf{F}(\mathbf{x}_k) for the search direction \Delta \mathbf{x}_k. Then, update the iterate as \mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k \Delta \mathbf{x}_k, where the damping parameter \alpha_k \in (0,1] (typically starting at 1) ensures stability by reducing \|\mathbf{F}(\mathbf{x}_{k+1})\|, often via a simple line search like Armijo's condition. Compute the differences \mathbf{s}_k = \mathbf{x}_{k+1} - \mathbf{x}_k and \mathbf{y}_k = \mathbf{F}(\mathbf{x}_{k+1}) - \mathbf{F}(\mathbf{x}_k). The core update applies the "good" Broyden formula:
\mathbf{J}_{k+1} = \mathbf{J}_k + \frac{ (\mathbf{y}_k - \mathbf{J}_k \mathbf{s}_k) \mathbf{s}_k^T }{ \mathbf{s}_k^T \mathbf{s}_k }
This rank-one correction requires only vector operations and one function evaluation per iteration beyond the linear solve.
For implementation, the following pseudocode outlines the basic procedure without line search details:
Initialize: Choose x_0 ∈ ℝ^n, J_0 ≈ ∇F(x_0) (e.g., identity or finite differences)
Set tolerance ε > 0
While ||F(x)|| > ε:
    Solve J Δx = -F(x) for Δx (e.g., via LU factorization)
    α ← 1 (or compute via line search to ensure ||F(x + α Δx)|| < ||F(x)||)
    x_new ← x + α Δx
    s ← x_new - x
    y ← F(x_new) - F(x)
    If s^T s = 0: break (degenerate case)
    J ← J + [(y - J s) s^T] / (s^T s)
    x ← x_new
Return x
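A direct translation of this pseudocode into Python is sketched below (assuming NumPy; the function name broyden_good, the full-step choice α = 1, and the iteration cap are illustrative assumptions rather than part of the original method):

import numpy as np

def broyden_good(F, x0, J0=None, tol=1e-10, max_iter=100):
    """Solve F(x) = 0 with Broyden's 'good' rank-one Jacobian update.

    Full steps (alpha = 1) are taken; a line search or damping could be
    added for robustness on harder problems.
    """
    x = np.asarray(x0, dtype=float)
    J = np.eye(len(x)) if J0 is None else np.asarray(J0, dtype=float)
    Fx = F(x)
    for _ in range(max_iter):
        if np.linalg.norm(Fx) <= tol:
            break
        dx = np.linalg.solve(J, -Fx)               # solve J Δx = -F(x)
        x_new = x + dx                             # full step (α = 1)
        F_new = F(x_new)
        s, y = x_new - x, F_new - Fx
        denom = s @ s
        if denom == 0.0:                           # degenerate step
            break
        J = J + np.outer(y - J @ s, s) / denom     # good Broyden update
        x, Fx = x_new, F_new
    return x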
To illustrate, consider a 2D system \mathbf{F}(x,y) = \begin{pmatrix} x^2 + y - 1 \\ x + y^2 - 1 \end{pmatrix} = \mathbf{0} with initial \mathbf{x}_0 = (0.5, 0.5)^T and \mathbf{J}_0 = \mathbf{I}_2. First, \mathbf{F}(\mathbf{x}_0) = (-0.25, -0.25)^T, so solve \mathbf{I} \Delta \mathbf{x}_0 = -(-0.25, -0.25)^T yielding \Delta \mathbf{x}_0 = (0.25, 0.25)^T. Assuming full step \alpha_0 = 1, \mathbf{x}_1 = (0.75, 0.75)^T, \mathbf{s}_0 = (0.25, 0.25)^T, \mathbf{y}_0 = \mathbf{F}(\mathbf{x}_1) - \mathbf{F}(\mathbf{x}_0) = (0.5625, 0.5625)^T. The update gives \mathbf{J}_1 = \mathbf{I} + \frac{ (\mathbf{y}_0 - \mathbf{J}_0 \mathbf{s}_0 ) \mathbf{s}_0^T }{ \mathbf{s}_0^T \mathbf{s}_0 } = \begin{pmatrix} 1.625 & 0.625 \\ 0.625 & 1.625 \end{pmatrix}, reflecting the approximation improvement. Subsequent iterations refine toward the root near (0.618, 0.618).
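Continuing with the broyden_good sketch above (NumPy assumed; variable names are illustrative), the same two-dimensional system reproduces the first update \mathbf{J}_1 and then converges to the symmetric root x = y = (\sqrt{5} - 1)/2 \approx 0.6180:

import numpy as np

F = lambda v: np.array([v[0]**2 + v[1] - 1.0, v[0] + v[1]**2 - 1.0])

# First rank-one update, matching the numbers in the worked example (J_0 = I)
s0 = np.array([0.25, 0.25])
y0 = np.array([0.5625, 0.5625])
J1 = np.eye(2) + np.outer(y0 - s0, s0) / (s0 @ s0)
print(J1)                                  # [[1.625 0.625], [0.625 1.625]]

root = broyden_good(F, x0=[0.5, 0.5])      # uses the sketch defined above
print(root)                                # ≈ [0.6180, 0.6180]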
The Broyden Family
General parameterization
The Broyden class encompasses a parameterized family of rank-one updates for approximating the Jacobian matrix in quasi-Newton methods for solving nonlinear systems of equations. The general form of the update is given by
\mathbf{J}_{k+1} = \mathbf{J}_k + \frac{ (\mathbf{y}_k - \mathbf{J}_k \mathbf{s}_k) \mathbf{w}_k^T }{ \mathbf{w}_k^T \mathbf{s}_k },
where \mathbf{J}_k is the current Jacobian approximation, \mathbf{s}_k = \mathbf{x}_{k+1} - \mathbf{x}_k is the step vector, \mathbf{y}_k = \mathbf{F}(\mathbf{x}_{k+1}) - \mathbf{F}(\mathbf{x}_k) is the function difference, and \mathbf{w}_k = \theta \mathbf{s}_k + (1 - \theta) \mathbf{J}_k^{-1} \mathbf{y}_k for a scalar parameter \theta \in [0,1] that blends between the good Broyden update (\theta = 1) and the bad Broyden update (\theta = 0).[19]
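Written out in code, the parameterized update takes the following form (a minimal sketch assuming NumPy and following the parameterization given above; the function name broyden_family_update is an illustrative choice, and \mathbf{w}_k^T \mathbf{s}_k is assumed nonzero):

import numpy as np

def broyden_family_update(J, s, y, theta):
    """Rank-one Broyden-class update: theta = 1 gives the good update, theta = 0 the bad update."""
    w = theta * s + (1.0 - theta) * np.linalg.solve(J, y)   # w_k = θ s_k + (1 − θ) J_k^{-1} y_k
    return J + np.outer(y - J @ s, w) / (w @ s)              # satisfies J_{k+1} s_k = y_k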
All updates in the Broyden class satisfy the secant condition \mathbf{J}_{k+1} \mathbf{s}_k = \mathbf{y}_k, ensuring that the approximation exactly matches the true Jacobian along the current step, thereby providing a consistent local linear model of the nonlinear system.
This parameterization extends naturally to unconstrained optimization, where the class is adapted to approximate the Hessian matrix (or its inverse) using analogous secant conditions on gradient differences, unifying quasi-Newton methods like BFGS and DFP under a single framework for improved efficiency in derivative-free iterations.[20]
Key variants
The Broyden family includes several prominent variants that differ in their choice of update direction for the rank-one correction to the Jacobian approximation, offering flexibility for various nonlinear equation-solving scenarios. These variants maintain the secant condition J_{k+1} s_k = y_k while minimizing changes to the previous approximation under different norms or criteria.[1]
One key variant is the good Broyden update, which applies a direct correction to the Jacobian using the step vector s_k for direction adjustment. Its formula is
J_{k+1} = J_k + \frac{(y_k - J_k s_k) s_k^T}{s_k^T s_k},
equivalent to selecting \theta = 1 in the general rank-one form. This approach emphasizes minimal perturbation to J_k in the Frobenius norm, making it computationally straightforward without requiring additional solves beyond the step computation.[21]
The bad Broyden update uses the direction t_k = J_k^{-1} y_k, with formula
J_{k+1} = J_k + \frac{(y_k - J_k s_k) t_k^T}{t_k^T s_k},
corresponding to \theta = 0. This variant minimizes the change in the inverse approximation but requires an additional linear solve to compute t_k.[2]
The Davidon-Fletcher-Powell (DFP) method, traditionally a rank-two update for optimization, relates to the Broyden class through its parameterization in the optimization context, yielding the inverse update
H_{k+1} = H_k - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k} + \frac{s_k s_k^T}{s_k^T y_k}.
This positions DFP as a boundary case in the family for optimization problems, preserving positive definiteness when applicable and bridging to optimization contexts via the secant relation.[20]
Broyden's standard method produces nonsymmetric Jacobian approximations, ideal for general nonlinear root-finding where the true Jacobian lacks symmetry. In contrast, symmetric variants, such as the symmetric rank-one (SR1) update
J_{k+1} = J_k + \frac{(y_k - J_k s_k)(y_k - J_k s_k)^T}{(y_k - J_k s_k)^T s_k},
preserve symmetry and are suited to self-adjoint problems, like those arising in unconstrained optimization with symmetric Hessians.[22]
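As a brief numerical check (a sketch assuming NumPy, with arbitrary illustrative data), the SR1 update keeps a symmetric approximation symmetric while still satisfying the secant condition:

import numpy as np

J = np.array([[2.0, 0.5], [0.5, 1.0]])            # symmetric current approximation
s = np.array([0.1, -0.2])
y = np.array([0.15, -0.25])
r = y - J @ s                                     # residual of the secant condition
J_sr1 = J + np.outer(r, r) / (r @ s)              # symmetric rank-one update
print(np.allclose(J_sr1, J_sr1.T))                # True: symmetry is preserved
print(np.allclose(J_sr1 @ s, y))                  # True: secant condition holds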
| Variant | Pros | Cons |
|---|---|---|
| Good Broyden | Simple direct update without additional inversion for the update; invariant under linear scalings of variables. Often exhibits superior superlinear convergence in practice. | Poorer adherence to secant in inverse-related metrics; potential for slower overall convergence in some cases.[21] |
| Bad Broyden | Better preserves the secant condition in inverse-related norms; minimizes change in the inverse approximation. | Requires computation of t_k = J_k^{-1} y_k, increasing complexity per iteration. Less stable numerically; may accumulate errors without careful implementation.[23] |
Theoretical Analysis
Convergence properties
Broyden's method exhibits local superlinear convergence when applied to nonlinear systems of equations F(\mathbf{x}) = \mathbf{0}, where F: \mathbb{R}^n \to \mathbb{R}^n is continuously differentiable and its Jacobian J(\mathbf{x}) = F'(\mathbf{x}) satisfies a Lipschitz condition near a root \mathbf{x}^*, meaning there exists a constant L > 0 such that \| J(\mathbf{x}) - J(\mathbf{y}) \| \leq L \| \mathbf{x} - \mathbf{y} \| for \mathbf{x}, \mathbf{y} sufficiently close to \mathbf{x}^*. Under these assumptions, if the initial iterate \mathbf{x}_0 is sufficiently close to \mathbf{x}^* and the initial approximation B_0 is sufficiently close in norm to J(\mathbf{x}^*), the error \mathbf{e}_k = \mathbf{x}_k - \mathbf{x}^* satisfies \| \mathbf{e}_{k+1} \| = o( \| \mathbf{e}_k \| ) as k \to \infty, indicating that the convergence rate approaches that of Newton's method asymptotically.[24]
This superlinear behavior is Q-superlinear: the ratio \| \mathbf{e}_{k+1} \| / \| \mathbf{e}_k \| tends to zero, which is strictly stronger than Q-linear convergence, where the ratio is merely bounded by a constant less than 1. This property aligns with the broader class of quasi-Newton methods, where the secant condition preserves local superlinearity without exact Hessian or Jacobian evaluations.
For linear systems F(\mathbf{x}) = A \mathbf{x} - \mathbf{b} = \mathbf{0} with nonsingular n \times n matrix A, Broyden's method terminates in a finite number of steps, specifically at most 2n iterations, regardless of the initial approximation B_0, as the updates span the necessary subspace to recover the exact solution.[3]
Convergence requires the initial matrix B_0 to be sufficiently close to J(\mathbf{x}^*), often in the sense of \| B_0 - J(\mathbf{x}^*) \| being small relative to the Lipschitz constant and the conditioning of J(\mathbf{x}^*). In cases where J(\mathbf{x}^*) and B_0 are symmetric positive definite, symmetric variants of the Broyden update maintain these guarantees while keeping the approximations symmetric and avoiding indefiniteness.
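The superlinear rate can be observed numerically. The sketch below (assuming NumPy; the test system, starting point, and iteration count are illustrative) runs full-step good-Broyden iterations on the two-dimensional example used earlier and prints the error ratios \| \mathbf{e}_{k+1} \| / \| \mathbf{e}_k \|, which tend toward zero:

import numpy as np

F = lambda v: np.array([v[0]**2 + v[1] - 1.0, v[0] + v[1]**2 - 1.0])
x_star = (np.sqrt(5.0) - 1.0) / 2.0 * np.ones(2)      # known root, used only to measure errors
x, J, errs = np.array([0.5, 0.5]), np.eye(2), []
for _ in range(5):
    dx = np.linalg.solve(J, -F(x))
    x_new = x + dx                                    # full Broyden step
    s, y = x_new - x, F(x_new) - F(x)
    J = J + np.outer(y - J @ s, s) / (s @ s)          # good Broyden update
    x = x_new
    errs.append(np.linalg.norm(x - x_star))
print([e2 / e1 for e1, e2 in zip(errs, errs[1:])])    # ratios tend toward zero (superlinear)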
Computational considerations
Broyden's method offers significant computational advantages over the full Newton method for solving systems of nonlinear equations, primarily due to its low-rank update scheme that approximates the Jacobian without recomputing it from scratch at each iteration. The update step requires only O(n^2) operations, where n is the dimension of the system, compared to the O(n^3) cost of forming and factoring the exact Jacobian in Newton's method, making it particularly suitable for problems with dimensions up to several thousand.[25] This efficiency stems from the rank-one update formula, which leverages previously computed information to maintain a good approximation while avoiding expensive finite-difference or analytical Jacobian evaluations after the initial iteration.
For solving the linear system involving the approximate Jacobian at each step, direct methods like LU factorization can be employed when n is moderate, as the factorization cost is O(n^3) but amortized over multiple iterations if the approximation remains stable. However, for large-scale problems where n exceeds thousands, iterative solvers such as GMRES are recommended to approximate the solution to B_k \Delta x = -F(x_k); when B_k is stored implicitly as a simple initial matrix plus the accumulated low-rank corrections, each matrix–vector product costs on the order of n operations per stored update, and often only a few Krylov iterations are needed because the quasi-Newton approximation itself acts as an effective preconditioner. These approaches avoid the prohibitive storage and time demands of full factorizations, enabling scalability to high-dimensional systems in applications like optimization and circuit simulation.[26]
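In practice these choices are often packaged in library routines; for example, SciPy's scipy.optimize.broyden1 implements a "good Broyden" iteration with options for limited-memory storage and a built-in line search. A minimal usage sketch follows (the small test system and tolerance are illustrative choices):

import numpy as np
from scipy.optimize import broyden1

def F(x):
    x = np.asarray(x)
    return np.cos(x) + x[::-1] - np.array([1.0, 2.0, 3.0, 4.0])

sol = broyden1(F, [1.0, 1.0, 1.0, 1.0], f_tol=1e-12)
print(np.linalg.norm(F(sol)))      # small residual at the computed solution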
To ensure global convergence, especially with poor initial guesses, Broyden's method is often modified by incorporating a line search, such as the Armijo rule, which enforces sufficient decrease in a merit function like \|F(x)\|^2 along the search direction. Trust-region strategies can also be integrated, bounding the step size to maintain model accuracy and prevent divergence, thereby extending the method's reliability beyond the local superlinear regime. These modifications guarantee convergence to a root under mild assumptions on the Jacobian's nonsingularity, without sacrificing the method's efficiency.[27][28]
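A minimal backtracking safeguard of this kind can be sketched as follows (NumPy assumed; the function name armijo_step and the parameter values are conventional illustrative choices). It accepts the first step length satisfying an Armijo-type sufficient-decrease test on \|F\|^2 and would replace the fixed \alpha = 1 in the basic iteration sketched earlier:

import numpy as np

def armijo_step(F, x, dx, c=1e-4, shrink=0.5, max_backtracks=30):
    """Backtracking line search on ||F||^2 along the quasi-Newton direction dx.

    Assumes dx approximately solves B dx = -F(x), so the directional derivative
    of (1/2)||F||^2 is roughly -||F(x)||^2, giving the test below.
    """
    m0 = F(x) @ F(x)                       # merit function value at x
    alpha = 1.0
    for _ in range(max_backtracks):
        x_trial = x + alpha * dx
        if F(x_trial) @ F(x_trial) <= (1.0 - 2.0 * c * alpha) * m0:
            return alpha, x_trial          # sufficient decrease achieved
        alpha *= shrink                    # otherwise shrink the step
    return alpha, x + alpha * dx           # fall back to the last (small) step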
Numerical stability in Broyden's method can be compromised by ill-conditioning of the approximate Jacobian, leading to erratic updates or stagnation, particularly in problems with nearly singular Jacobians or noisy function evaluations. To mitigate this, periodic restarting—recomputing the exact Jacobian every few iterations—resets the approximation and restores accuracy, though at increased cost. Hybrid approaches, combining Broyden updates with finite-difference Jacobians for initial or corrective steps, further enhance robustness by blending low-cost approximations with reliable exact information when conditioning deteriorates.[29][30]