
Quasi-Newton method

The Quasi-Newton method refers to a class of iterative optimization algorithms designed to solve unconstrained nonlinear optimization problems by approximating the second-order information (the Hessian matrix) of the objective function using only gradient evaluations, thereby avoiding the computationally expensive direct computation or inversion of the exact Hessian required in the classical Newton's method. These methods update an approximate Hessian (or its inverse) at each iteration via low-rank modifications that satisfy the secant condition, which ensures the approximation correctly captures the change in the gradient between successive points. Originating in the mid-1950s with early work by W. C. Davidon at Argonne National Laboratory—though not formally published until 1991—these methods evolved as efficient alternatives to full Newton steps for large-scale problems where second derivatives are costly or unavailable. Key developments include the Davidon-Fletcher-Powell (DFP) update in the 1950s–1960s and the more robust Broyden-Fletcher-Goldfarb-Shanno (BFGS) method, proposed independently in 1970 by Charles George Broyden, Roger Fletcher, Donald Goldfarb, and David Shanno. Other notable variants encompass the symmetric rank-one (SR1) update and the broader Broyden class of methods, which generalize these updates through a parameter that interpolates between BFGS and DFP behaviors. Quasi-Newton methods offer significant advantages over first-order techniques like gradient descent, achieving superlinear convergence—meaning the error decreases faster than linearly near the optimum—while maintaining a per-iteration cost of only O(n²) for n-dimensional problems, compared to O(n³) for exact Newton steps. Under suitable conditions, such as a twice continuously differentiable objective function with a Lipschitz continuous Hessian and strict convexity ensuring positive-definiteness of updates (e.g., yᵀs > 0 where y and s are the gradient and step differences), BFGS achieves global convergence with Wolfe line searches and local superlinear rates. For high-dimensional or memory-constrained applications, limited-memory variants like L-BFGS further reduce storage to O(n m) using a fixed number m of past updates, making them widely applicable in machine learning and scientific computing.

Background and Fundamentals

Newton's Method Overview

Newton's method, also known as the Newton-Raphson method, is an iterative algorithm for finding roots of equations, originally developed by Isaac Newton in the mid-1660s and independently formulated by Joseph Raphson in 1690, though it gained widespread use in its modern form only much later. For a univariate function f(x), the method generates a sequence of approximations via the iteration x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}, where f'(x_k) is the derivative evaluated at the current iterate x_k, provided f'(x_k) \neq 0. This update approximates the root by intersecting the tangent line to the curve y = f(x) at x_k with the x-axis. Under suitable conditions, such as f being twice continuously differentiable near the root x^* with f'(x^*) \neq 0, the method exhibits local quadratic convergence, meaning the error e_{k+1} \approx C e_k^2 for some constant C > 0 and sufficiently small e_k = |x_k - x^*|.

In multivariate optimization, Newton's method seeks to minimize a twice continuously differentiable function f: \mathbb{R}^n \to \mathbb{R} by solving \nabla f(\mathbf{x}^*) = \mathbf{0}. The iteration becomes \mathbf{x}_{k+1} = \mathbf{x}_k - \mathbf{H}_k^{-1} \nabla f(\mathbf{x}_k), where \mathbf{H}_k = \nabla^2 f(\mathbf{x}_k) is the Hessian matrix at \mathbf{x}_k, assumed invertible. This step minimizes a local quadratic model m_k(\mathbf{p}) = f(\mathbf{x}_k) + \nabla f(\mathbf{x}_k)^T \mathbf{p} + \frac{1}{2} \mathbf{p}^T \mathbf{H}_k \mathbf{p} of f along the search direction \mathbf{p}. Local quadratic convergence holds if f is twice continuously differentiable with a Lipschitz continuous Hessian near a stationary point \mathbf{x}^* where \nabla f(\mathbf{x}^*) = \mathbf{0} and the Hessian \mathbf{H}(\mathbf{x}^*) is positive definite, ensuring the point is a strict local minimum.

The method extends naturally to solving systems of nonlinear equations \mathbf{F}(\mathbf{x}) = \mathbf{0} in \mathbb{R}^n, where \mathbf{F}: \mathbb{R}^n \to \mathbb{R}^n is continuously differentiable. The update is \mathbf{x}_{k+1} = \mathbf{x}_k - \mathbf{J}_k^{-1} \mathbf{F}(\mathbf{x}_k), with \mathbf{J}_k = \nabla \mathbf{F}(\mathbf{x}_k) the Jacobian matrix, provided it is nonsingular. Quadratic convergence occurs locally if \mathbf{F} has a Lipschitz continuous Jacobian near a solution \mathbf{x}^* with \mathbf{F}(\mathbf{x}^*) = \mathbf{0} and \mathbf{J}(\mathbf{x}^*) invertible. This formulation draws from the univariate case and is related to derivative-free precursors such as the secant method, but it leverages full first-order information.
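As a minimal illustration of the multivariate iteration (a sketch, not tied to any particular library; the helper name newton_system and the quadratic test problem are arbitrary choices), the following Python code solves \nabla f = \mathbf{0} for a simple function using NumPy:

import numpy as np

def newton_system(F, J, x0, tol=1e-10, max_iter=50):
    """Multivariate Newton iteration for F(x) = 0, given the Jacobian J."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            break
        dx = np.linalg.solve(J(x), -Fx)   # solve J(x) dx = -F(x) rather than forming the inverse
        x = x + dx
    return x

# Minimizing f(x) = x1^2 + 2*x2^2 amounts to solving grad f = 0.
grad = lambda x: np.array([2.0 * x[0], 4.0 * x[1]])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 4.0]])
print(newton_system(grad, hess, [3.0, -1.5]))   # converges to [0, 0] in one step for a quadratic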

Motivation and Advantages of Quasi-Newton Methods

Newton's method achieves quadratic convergence near a solution, serving as an ideal benchmark for iterative optimization methods. However, its practical application is hindered by the high computational cost of evaluating exact Jacobians or Hessians, particularly in high-dimensional problems or when functions lack analytical second derivatives, necessitating expensive finite-difference approximations that require O(n^2) additional function or gradient evaluations per iteration. Furthermore, storing these full n by n matrices demands O(n^2) space, and factorizing or inverting them costs O(n^3) time, which becomes prohibitive for large-scale problems where n exceeds thousands. Quasi-Newton methods address these limitations by constructing low-rank updates that approximate the Hessian or Jacobian using only function and gradient evaluations, thereby avoiding direct computation of second derivatives. This results in reduced per-iteration expense, as updates can be performed in O(n^2) time, comparable to a dense matrix-vector product, with the cost of derivative approximations amortized over multiple iterations. These methods retain superlinear convergence rates, typically with order greater than 1 but less than 2, providing a favorable trade-off between efficiency and accuracy. The development of quasi-Newton methods emerged in the 1950s and 1960s, driven by the need for efficient optimization techniques in engineering and scientific computing, with early contributions including Davidon's variable-metric approach in 1959 and the Davidon-Fletcher-Powell update formalized in 1963. Broyden's extensions in the mid-1960s further generalized these ideas to nonlinear systems, establishing a family of methods that balance computational feasibility with rapid convergence. Overall, quasi-Newton approaches enable scalable solutions to problems where full Hessian information is impractical, retaining much of Newton's desirable behavior while mitigating its drawbacks.

Core Update Mechanisms

The Secant Condition

The secant condition forms the cornerstone of quasi-Newton methods, providing a way to update an approximate Jacobian or Hessian without computing exact second-order derivatives. In the context of root-finding for nonlinear systems \mathbf{F}(\mathbf{x}) = \mathbf{0}, where \mathbf{F}: \mathbb{R}^n \to \mathbb{R}^n is a nonlinear mapping, the condition specifies that the updated approximation \mathbf{B}_{k+1} to the Jacobian \mathbf{J}(\mathbf{x}_k) satisfies \mathbf{B}_{k+1} (\mathbf{x}_{k+1} - \mathbf{x}_k) = \mathbf{F}(\mathbf{x}_{k+1}) - \mathbf{F}(\mathbf{x}_k). This equation ensures that the linear model defined by \mathbf{B}_{k+1} matches the actual change in \mathbf{F} along the step from \mathbf{x}_k to \mathbf{x}_{k+1}, mimicking the behavior of the true Jacobian along that direction.

For unconstrained optimization problems of minimizing a twice-differentiable function f: \mathbb{R}^n \to \mathbb{R}, the secant condition adapts to the Hessian approximation: \mathbf{B}_{k+1} (\mathbf{x}_{k+1} - \mathbf{x}_k) = \nabla f(\mathbf{x}_{k+1}) - \nabla f(\mathbf{x}_k). Here, \mathbf{B}_{k+1} approximates the Hessian \nabla^2 f, and the condition captures the change in the gradient over the step, denoted compactly as \mathbf{B}_{k+1} \mathbf{s}_k = \mathbf{y}_k where \mathbf{s}_k = \mathbf{x}_{k+1} - \mathbf{x}_k and \mathbf{y}_k = \nabla f(\mathbf{x}_{k+1}) - \nabla f(\mathbf{x}_k).

This condition derives from the first-order Taylor expansion of the underlying function. For root-finding, the Taylor series around \mathbf{x}_k gives \mathbf{F}(\mathbf{x}_{k+1}) \approx \mathbf{F}(\mathbf{x}_k) + \mathbf{J}(\mathbf{x}_k) (\mathbf{x}_{k+1} - \mathbf{x}_k); imposing the secant condition on \mathbf{B}_{k+1} ensures that the approximation agrees with this expansion exactly along the secant direction \mathbf{s}_k = \mathbf{x}_{k+1} - \mathbf{x}_k. Similarly, in optimization, the expansion \nabla f(\mathbf{x}_{k+1}) \approx \nabla f(\mathbf{x}_k) + \nabla^2 f(\mathbf{x}_k) (\mathbf{x}_{k+1} - \mathbf{x}_k) motivates the condition, guaranteeing that \mathbf{B}_{k+1} reproduces the first-order change in the gradient.

However, the secant condition does not uniquely determine \mathbf{B}_{k+1}, as infinitely many matrices in \mathbb{R}^{n \times n} satisfy the single equation \mathbf{B}_{k+1} \mathbf{s}_k = \mathbf{y}_k for given \mathbf{s}_k and \mathbf{y}_k. Practical updates therefore apply low-rank modifications, such as rank-one or rank-two corrections, chosen to alter the previous approximation \mathbf{B}_k as little as possible while enforcing the condition. Key properties of updates satisfying the secant condition include the potential for nonsingularity preservation: if \mathbf{B}_k is nonsingular and \mathbf{s}_k \neq \mathbf{0}, then under mild conditions \mathbf{B}_{k+1} remains nonsingular. In the optimization setting, if \mathbf{B}_k is symmetric and positive definite, certain updates (such as BFGS and DFP with \mathbf{y}_k^T \mathbf{s}_k > 0) preserve these properties, ensuring descent directions.
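To make the condition concrete, the short check below (a sketch assuming NumPy; the quadratic test function and variable names are arbitrary) computes \mathbf{s}_k and \mathbf{y}_k between two points and verifies that the exact Hessian of a quadratic satisfies \mathbf{B} \mathbf{s}_k = \mathbf{y}_k, the relation any secant update imposes on its approximation:

import numpy as np

# Quadratic test function f(x) = 0.5 x^T A x - b^T x, so grad f(x) = A x - b and the Hessian is A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
grad = lambda x: A @ x - b

x_k  = np.array([0.5, 0.7])
x_k1 = np.array([0.2, 1.0])
s = x_k1 - x_k               # step difference s_k
y = grad(x_k1) - grad(x_k)   # gradient difference y_k

# For a quadratic, the exact Hessian satisfies the secant condition A s = y.
print(np.allclose(A @ s, y))   # True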

Broyden's Family of Updates

In 1965, C. G. Broyden introduced quasi-Newton updates for approximating the Jacobian matrix in iterative methods for solving systems of nonlinear equations, providing a framework that satisfies the secant condition while allowing flexibility. This work was further explored in his 1967 paper on applications to function minimization, extending the ideas to optimization contexts.

For root-finding, the key update from Broyden's 1965 work is the rank-one modification: \mathbf{B}_{k+1} = \mathbf{B}_k + \frac{ (\mathbf{y}_k - \mathbf{B}_k \mathbf{s}_k) \mathbf{s}_k^T }{ \mathbf{s}_k^T \mathbf{s}_k }, where \mathbf{s}_k = \mathbf{x}_{k+1} - \mathbf{x}_k and \mathbf{y}_k = \mathbf{F}(\mathbf{x}_{k+1}) - \mathbf{F}(\mathbf{x}_k). This formula, often referred to as Broyden's "good" update (or Broyden's first method), enforces the secant condition through an outer product correction, achieving O(n²) computational cost per iteration. A companion variant, Broyden's "bad" update (or second method), applies the analogous least-change correction to the inverse approximation \mathbf{H}_k = \mathbf{B}_k^{-1}: \mathbf{H}_{k+1} = \mathbf{H}_k + \frac{ (\mathbf{s}_k - \mathbf{H}_k \mathbf{y}_k) \mathbf{y}_k^T }{ \mathbf{y}_k^T \mathbf{y}_k }.

In the optimization context, Broyden's later contributions led to the Broyden class of updates, which generalizes specific methods like BFGS and DFP through a parameter \phi_k \in [0,1] as a convex combination: \mathbf{B}_{k+1} = (1 - \phi_k) \mathbf{B}_{k+1}^{\mathrm{BFGS}} + \phi_k \mathbf{B}_{k+1}^{\mathrm{DFP}}. These updates satisfy the secant condition by construction and are detailed in subsequent sections on optimization applications. The choice of update influences numerical stability and convergence, with rank-one forms promoting simplicity and rank-two forms incorporating more curvature information.
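The rank-one formulas above translate directly into code; the following sketch (assuming NumPy; the function names are illustrative and not taken from any library) applies a good-Broyden step to the direct approximation B and a bad-Broyden step to the inverse approximation H, then verifies the corresponding secant conditions:

import numpy as np

def broyden_good_update(B, s, y):
    """Good Broyden: least-change rank-one update of the Jacobian approximation B."""
    return B + np.outer(y - B @ s, s) / (s @ s)

def broyden_bad_update(H, s, y):
    """Bad Broyden: least-change rank-one update of the inverse approximation H = B^{-1}."""
    return H + np.outer(s - H @ y, y) / (y @ y)

# Both updates satisfy their secant conditions: B_new s = y and H_new y = s.
B, H = np.eye(2), np.eye(2)
s, y = np.array([0.3, -0.1]), np.array([0.5, 0.2])
print(np.allclose(broyden_good_update(B, s, y) @ s, y))   # True
print(np.allclose(broyden_bad_update(H, s, y) @ y, s))    # True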

Applications in Optimization

BFGS and DFP Methods

The Davidon–Fletcher–Powell (DFP) method represents one of the earliest quasi-Newton approaches for unconstrained optimization, originally proposed by Davidon in an unpublished 1959 technical report and later formalized and analyzed by Fletcher and Powell in 1963. The method maintains an approximation \mathbf{H}_k to the inverse Hessian (\nabla^2 f(\mathbf{x}_k))^{-1}, updated using curvature information from successive iterates, specifically the displacement \mathbf{s}_k = \mathbf{x}_{k+1} - \mathbf{x}_k and the gradient change \mathbf{y}_k = \nabla f(\mathbf{x}_{k+1}) - \nabla f(\mathbf{x}_k). The DFP update formula is given by \mathbf{H}_{k+1} = \mathbf{H}_k - \frac{\mathbf{H}_k \mathbf{y}_k \mathbf{y}_k^T \mathbf{H}_k}{\mathbf{y}_k^T \mathbf{H}_k \mathbf{y}_k} + \frac{\mathbf{s}_k \mathbf{s}_k^T}{\mathbf{y}_k^T \mathbf{s}_k}, which satisfies the secant condition in its inverse form, \mathbf{H}_{k+1} \mathbf{y}_k = \mathbf{s}_k. If the initial approximation \mathbf{H}_0 is positive definite and the curvature condition \mathbf{y}_k^T \mathbf{s}_k > 0 holds (ensured by a suitable line search), the DFP update preserves positive definiteness of \mathbf{H}_{k+1}.

In practice, the DFP algorithm proceeds iteratively: at each step k, compute the search direction \mathbf{p}_k = -\mathbf{H}_k \nabla f(\mathbf{x}_k), perform a line search to select a step length \alpha_k > 0 satisfying the Wolfe conditions (which imply \mathbf{y}_k^T \mathbf{s}_k > 0), update the iterate \mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k \mathbf{p}_k, and apply the DFP update to obtain \mathbf{H}_{k+1}. This framework ensures descent in the objective function f while approximating curvature with low-rank modifications, avoiding costly exact Hessian computations.

The Broyden–Fletcher–Goldfarb–Shanno (BFGS) method, developed independently by Broyden, Fletcher, Goldfarb, and Shanno in 1970, is the dual of DFP (obtained by interchanging the roles of \mathbf{s}_k and \mathbf{y}_k) and often exhibits superior numerical performance. The BFGS update for the Hessian approximation \mathbf{B}_k is \mathbf{B}_{k+1} = \mathbf{B}_k + \frac{\mathbf{y}_k \mathbf{y}_k^T}{\mathbf{y}_k^T \mathbf{s}_k} - \frac{\mathbf{B}_k \mathbf{s}_k \mathbf{s}_k^T \mathbf{B}_k}{\mathbf{s}_k^T \mathbf{B}_k \mathbf{s}_k}, which satisfies the secant condition \mathbf{B}_{k+1} \mathbf{s}_k = \mathbf{y}_k; equivalently, the inverse approximation is updated as \mathbf{H}_{k+1} = (\mathbf{I} - \rho_k \mathbf{s}_k \mathbf{y}_k^T) \mathbf{H}_k (\mathbf{I} - \rho_k \mathbf{y}_k \mathbf{s}_k^T) + \rho_k \mathbf{s}_k \mathbf{s}_k^T with \rho_k = 1/(\mathbf{y}_k^T \mathbf{s}_k). Positive definiteness is preserved under the same assumptions on the initial matrix and \mathbf{y}_k^T \mathbf{s}_k > 0. The algorithm follows the same outline as DFP, using the same line-search-integrated update process, but the BFGS formula tends to yield better conditioned approximations, particularly when line searches are inexact. Empirical studies confirm that BFGS outperforms DFP on many unconstrained optimization problems, requiring fewer iterations and function evaluations due to its improved conditioning and robustness to scaling of the objective. Both methods belong to Broyden's broader family of quasi-Newton updates but are specifically designed for optimization, emphasizing symmetry and positive definiteness for reliable descent directions.
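The two updates are compact enough to implement directly; the sketch below (assuming NumPy, with H denoting the inverse Hessian approximation and the function names chosen for illustration) applies one DFP and one BFGS update and checks the inverse secant condition \mathbf{H}_{k+1} \mathbf{y}_k = \mathbf{s}_k:

import numpy as np

def dfp_update_inverse(H, s, y):
    """DFP update of the inverse Hessian approximation H (requires y^T s > 0)."""
    Hy = H @ y
    return H - np.outer(Hy, Hy) / (y @ Hy) + np.outer(s, s) / (y @ s)

def bfgs_update_inverse(H, s, y):
    """BFGS update of the inverse Hessian approximation H (requires y^T s > 0)."""
    rho = 1.0 / (y @ s)
    V = np.eye(len(s)) - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

# Both updates satisfy the inverse secant condition H_new y = s when y^T s > 0.
H0 = np.eye(2)
s, y = np.array([0.4, 0.1]), np.array([0.7, 0.3])   # here y^T s = 0.31 > 0
print(np.allclose(dfp_update_inverse(H0, s, y) @ y, s))    # True
print(np.allclose(bfgs_update_inverse(H0, s, y) @ y, s))   # True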

Limited-Memory Variants

Limited-memory variants of quasi-Newton methods address the scalability issues of full-matrix updates in high-dimensional optimization problems by storing only a limited number of update pairs rather than the entire approximate Hessian or its inverse. The most prominent example is the limited-memory BFGS (L-BFGS) algorithm, which approximates the inverse Hessian using the m most recent pairs of curvature information, denoted \mathbf{s}_k = \mathbf{x}_{k+1} - \mathbf{x}_k and \mathbf{y}_k = \nabla f(\mathbf{x}_{k+1}) - \nabla f(\mathbf{x}_k), where m is a small integer typically between 5 and 20.

In L-BFGS, the approximate inverse Hessian-vector product \mathbf{H}_{k+1} \mathbf{g}_{k+1} (where \mathbf{g}_{k+1} = \nabla f(\mathbf{x}_{k+1})) is computed implicitly through a two-loop recursion, avoiding explicit storage of the n \times n matrix. The recursion uses the m most recent pairs (\mathbf{s}_i, \mathbf{y}_i) for i = k-m+1, \dots, k, with \rho_i = 1 / (\mathbf{y}_i^T \mathbf{s}_i). It begins with the first loop (from most recent i = k to oldest i = k-m+1): initialize \mathbf{q} = \mathbf{g}_{k+1}; for each i, compute \alpha_i = \rho_i \mathbf{s}_i^T \mathbf{q} and update \mathbf{q} \leftarrow \mathbf{q} - \alpha_i \mathbf{y}_i. Then, apply an initial scaling \mathbf{H}_0 = \gamma_{k+1} \mathbf{I}, where \gamma_{k+1} = \frac{\mathbf{s}_k^T \mathbf{y}_k}{\mathbf{y}_k^T \mathbf{y}_k}, to obtain \mathbf{r} = \mathbf{H}_0 \mathbf{q}. This is followed by the second loop (from oldest i = k-m+1 to most recent i = k): for each i, compute \beta_i = \rho_i \mathbf{y}_i^T \mathbf{r} and update \mathbf{r} \leftarrow \mathbf{r} + (\alpha_i - \beta_i) \mathbf{s}_i. The result \mathbf{r} approximates \mathbf{H}_{k+1} \mathbf{g}_{k+1}.

This approach reduces the storage requirement from O(n^2) for the full BFGS method to O(m n), making it practical for problems where the dimension n is in the thousands or more. L-BFGS has found widespread application in machine learning, such as training models on large datasets, where its efficiency in handling high-dimensional parameter spaces often lets it outperform first-order methods like gradient descent in the number of iterations required. In these settings, the choice of m = 5 to 20 balances approximation quality and memory usage, often leading to robust performance on problems with millions of parameters. Some limited-memory variants also exploit a compact representation of the approximate (inverse) Hessian, typically of the form \mathbf{H}_k = \gamma_k \mathbf{I} + \boldsymbol{\Psi}_k \mathbf{M}_k \boldsymbol{\Psi}_k^T, where \boldsymbol{\Psi}_k collects the stored vectors \mathbf{S}_k and \mathbf{Y}_k, \mathbf{M}_k is a small matrix built from their inner products, and \gamma_k is a scaling factor; this structure enables efficient use as a preconditioner in conjugate gradient methods for solving the search direction subproblems.
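The two-loop recursion itself is only a few lines; the following sketch (assuming NumPy, with s_list and y_list holding the stored pairs ordered from oldest to most recent; the function name is illustrative) returns the product \mathbf{r} \approx \mathbf{H}_{k+1} \mathbf{g}_{k+1} described above:

import numpy as np

def lbfgs_two_loop(grad, s_list, y_list):
    """Two-loop recursion: approximates H * grad from the stored (s, y) pairs."""
    q = grad.copy()
    rho = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alpha = [0.0] * len(s_list)
    # First loop: from the most recent pair back to the oldest.
    for i in reversed(range(len(s_list))):
        alpha[i] = rho[i] * (s_list[i] @ q)
        q -= alpha[i] * y_list[i]
    # Initial scaling H_0 = gamma * I from the most recent pair (gamma = 1 if no pairs are stored yet).
    gamma = (s_list[-1] @ y_list[-1]) / (y_list[-1] @ y_list[-1]) if s_list else 1.0
    r = gamma * q
    # Second loop: from the oldest pair forward to the most recent.
    for i in range(len(s_list)):
        beta = rho[i] * (y_list[i] @ r)
        r += (alpha[i] - beta) * s_list[i]
    return r   # the search direction is then p = -r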

Applications in Root Finding

Broyden's Method for Systems

Broyden's method is a quasi-Newton technique specifically designed for solving systems of nonlinear equations \mathbf{F}(\mathbf{x}) = \mathbf{0}, where \mathbf{F}: \mathbb{R}^n \to \mathbb{R}^n is continuously differentiable with a nonsingular Jacobian at the solution \mathbf{x}^*. The method approximates the Jacobian \mathbf{J}(\mathbf{x}) iteratively without requiring explicit derivative computations after the initial step, updating the approximation using function values so as to satisfy the secant condition \mathbf{B}_{k+1} \mathbf{s}_k = \mathbf{y}_k, where \mathbf{s}_k = \mathbf{x}_{k+1} - \mathbf{x}_k and \mathbf{y}_k = \mathbf{F}(\mathbf{x}_{k+1}) - \mathbf{F}(\mathbf{x}_k). This approach extends the scalar secant method to higher dimensions and is particularly useful when Jacobian evaluations are costly or unavailable.

The algorithm begins with an initial guess \mathbf{x}_0 and an initial Jacobian approximation \mathbf{B}_0, often chosen as the identity matrix I for simplicity when no prior information is available. At each iteration k, the step \mathbf{s}_k is computed by solving the linear system \mathbf{B}_k \mathbf{s}_k = -\mathbf{F}(\mathbf{x}_k), typically via LU factorization or iterative solvers, and the next iterate is \mathbf{x}_{k+1} = \mathbf{x}_k + \mathbf{s}_k. The approximation is then updated using Broyden's rank-one formula: \mathbf{B}_{k+1} = \mathbf{B}_k + \frac{ (\mathbf{y}_k - \mathbf{B}_k \mathbf{s}_k) \mathbf{s}_k^T }{ \|\mathbf{s}_k\|^2 }, which ensures the secant condition is met while introducing a minimal perturbation to \mathbf{B}_k. This update requires only matrix-vector products and avoids full Jacobian recomputation, reducing the computational cost to O(n^2) per iteration after the first.

Within Broyden's framework, two primary update variants exist: the "good" Broyden update and the "bad" Broyden update. The good update minimizes the change in the Frobenius norm \|\mathbf{B}_{k+1} - \mathbf{B}_k\|_F subject to the secant condition, leading to a stable approximation with generally favorable numerical behavior. In contrast, the bad update applies the analogous minimal-change correction directly to the inverse approximation \mathbf{B}_k^{-1}; it can be less stable, particularly when the iterates are far from the solution, as it may amplify errors in the approximation. The good variant is generally preferred in practice for its robustness, though both satisfy the secant condition and are rank-one updates.

Under suitable conditions, Broyden's method exhibits local superlinear convergence. Specifically, if the initial approximation \mathbf{B}_0 is sufficiently close to \mathbf{J}(\mathbf{x}^*) and the Dennis-Moré condition holds—that is, \lim_{k \to \infty} \frac{\| (\mathbf{B}_k - \mathbf{J}(\mathbf{x}^*)) \mathbf{s}_k \|}{\|\mathbf{s}_k\|} = 0—then the error satisfies \|\mathbf{x}_{k+1} - \mathbf{x}^*\| = o(\|\mathbf{x}_k - \mathbf{x}^*\|) as k \to \infty. This superlinear rate (in the scalar case it reduces to the secant method's rate of about 1.618) is established for both good and bad updates, though the good variant often achieves it more reliably. Global convergence requires additional safeguards like line searches or trust regions, but locally, the method is highly efficient near the root.

To illustrate, consider the 2D system \mathbf{F}(\mathbf{x}) = \begin{pmatrix} \sin x_1 + x_2 \\ x_1^2 + x_2^2 - 1 \end{pmatrix} = \mathbf{0}, which describes the intersection of the curve x_2 = -\sin x_1 with the unit circle, and has a symmetric pair of roots at approximately \pm(0.739, -0.674).
Starting from \mathbf{x}_0 = (0.5, 0.5) and \mathbf{B}_0 = I, the method iteratively refines the solution: the first step solves I \mathbf{s}_0 = -\mathbf{F}(\mathbf{x}_0), yielding \mathbf{x}_1 \approx (-0.479, 1.0), and subsequent rank-one updates converge superlinearly to the nearby root at approximately (-0.739, 0.674) in roughly ten iterations, demonstrating the method's effectiveness on coupled transcendental equations. (Note: This example is adapted from standard numerical demonstrations in quasi-Newton literature.) In practice, selecting \mathbf{B}_0 as the identity matrix works well for well-scaled problems but may require scaling adjustments for ill-conditioned systems. If \mathbf{B}_k becomes singular during iterations—detectable via a small minimum singular value—regularization techniques such as adding a small multiple of the identity (\mathbf{B}_k + \epsilon I) or restarting with a fresh finite-difference Jacobian can restore stability, though these increase computational overhead. Line searches along \mathbf{s}_k to ensure \|\mathbf{F}(\mathbf{x}_{k+1})\| < \|\mathbf{F}(\mathbf{x}_k)\| further enhance reliability, especially for strongly nonlinear \mathbf{F}.
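The iteration just described can be reproduced with a compact implementation; the sketch below (assuming NumPy and omitting the line-search and regularization safeguards mentioned above) applies the good Broyden update to this system starting from \mathbf{x}_0 = (0.5, 0.5) with \mathbf{B}_0 = I:

import numpy as np

def F(x):
    return np.array([np.sin(x[0]) + x[1], x[0]**2 + x[1]**2 - 1.0])

def broyden(F, x0, tol=1e-10, max_iter=50):
    """Broyden's (good) method for F(x) = 0 with rank-one Jacobian updates."""
    x = np.asarray(x0, dtype=float)
    B = np.eye(len(x))                         # initial Jacobian approximation B_0 = I
    Fx = F(x)
    for k in range(max_iter):
        if np.linalg.norm(Fx) < tol:
            return x, k
        s = np.linalg.solve(B, -Fx)            # quasi-Newton step: solve B s = -F(x)
        x_new = x + s
        F_new = F(x_new)
        y = F_new - Fx
        B += np.outer(y - B @ s, s) / (s @ s)  # good Broyden rank-one update
        x, Fx = x_new, F_new
    return x, max_iter

x_star, iters = broyden(F, [0.5, 0.5])
print(x_star, iters)   # approaches the root near (-0.739, 0.674)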

Variants for Nonlinear Equations

While Broyden's method serves as a foundational quasi-Newton approach for solving systems of nonlinear equations, several variants have been developed to enhance its robustness, particularly in cases prone to stagnation or divergence. Powell's hybrid method combines quasi-Newton (Broyden) updates with steepest-descent information inside a dogleg trust-region strategy to promote global convergence, making it suitable for problems where the initial approximation is far from the solution. This approach alternates between full (typically finite-difference) Jacobian evaluations and approximate updates, reducing computational cost while ensuring decrease in the residual norm. Anderson mixing, originally proposed in 1965 and revived in the 2010s, accelerates convergence by extrapolating a linear combination of multiple previous iterates and residuals, effectively incorporating more historical information than single-secant methods like Broyden's; it is particularly effective for the fixed-point iterations underlying nonlinear systems, often achieving faster convergence in practice (see the sketch below). The Shamanskii modification improves stability by periodically recomputing the full Jacobian every m steps, using it unchanged for the intermediate quasi-Newton updates to mitigate error accumulation in the approximation; this periodic reset enhances reliability for ill-conditioned systems without the full expense of recomputation at every iteration. Globalized variants extend these methods with trust-region or line-search strategies to enforce sufficient decrease in a merit function, ensuring global convergence even from poor starting points: trust-region approaches constrain updates within a dynamic radius based on trial steps, while line searches along quasi-Newton directions accept steps satisfying Wolfe conditions. These variants find prominent applications in chemical engineering, such as modeling large-scale reactor systems where nonlinear equations arise from mass and energy balances, often involving ill-conditioned Jacobians due to stiff kinetics.
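As an illustration of the mixing idea, the sketch below (a standard form of Anderson acceleration for a fixed-point map g, assuming NumPy; the function name and test problem are illustrative) extrapolates over up to m previous iterates by solving a small least-squares problem for the mixing coefficients:

import numpy as np

def anderson(g, x0, m=5, tol=1e-10, max_iter=100):
    """Anderson acceleration for the fixed-point iteration x = g(x)."""
    x = np.asarray(x0, dtype=float)
    G_hist, F_hist = [], []                   # stored g(x_j) values and residuals f_j = g(x_j) - x_j
    for _ in range(max_iter):
        gx = g(x)
        f = gx - x
        if np.linalg.norm(f) < tol:
            return x
        G_hist.append(gx); F_hist.append(f)
        if len(G_hist) > m:                   # keep only the m most recent entries
            G_hist.pop(0); F_hist.pop(0)
        if len(F_hist) == 1:
            x = gx                            # plain fixed-point step until a history builds up
        else:
            Fm = np.column_stack(F_hist)
            Gm = np.column_stack(G_hist)
            dF = Fm[:, 1:] - Fm[:, :-1]       # residual differences
            dG = Gm[:, 1:] - Gm[:, :-1]
            gamma, *_ = np.linalg.lstsq(dF, Fm[:, -1], rcond=None)
            x = Gm[:, -1] - dG @ gamma        # mixed (extrapolated) iterate
    return x

print(anderson(np.cos, [1.0]))   # fixed point of cos near 0.739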

Theoretical Foundations

Convergence Properties

Quasi-Newton methods demonstrate local superlinear convergence, often with an effective order \alpha satisfying 1 < \alpha < 2, under appropriate assumptions on the objective function and update mechanism. This property arises when the approximation matrices capture the action of the true Hessian (or Jacobian) along the step directions in a manner that satisfies specific error conditions. For methods in Broyden's family, such as Broyden's method for nonlinear equations, superlinear convergence is characterized by the Dennis-Moré condition, which requires that \lim_{k \to \infty} \frac{ \| (\mathbf{B}_k - \mathbf{J}(\mathbf{x}^*)) \mathbf{s}_k \| }{ \| \mathbf{s}_k \| } = 0, where \mathbf{B}_k is the approximation matrix at iteration k, \mathbf{s}_k = \mathbf{x}_{k+1} - \mathbf{x}_k, and \mathbf{J}(\mathbf{x}^*) is the true Jacobian (or Hessian) at the solution. This condition ensures that the approximation error along the step directions diminishes relative to the step size, leading to q-superlinear convergence of the sequence of iterates.

In the context of unconstrained optimization, the BFGS method achieves q-superlinear convergence locally when paired with line searches satisfying the Wolfe conditions, which enforce sufficient decrease and curvature along the search direction. These conditions guarantee that the step length balances progress toward the minimum while preserving the quasi-Newton update's desirable properties, such as positive definiteness of the approximation. The theoretical foundation for this behavior in BFGS and related rank-two updates stems from analyses showing that the updates satisfy the Dennis-Moré condition asymptotically under Lipschitz continuity of the Hessian.

Global convergence of quasi-Newton methods, including those from Broyden's class, requires additional safeguards like exact or inexact line searches or trust-region strategies to ensure descent and prevent divergence on nonconvex problems. Early results from the 1970s demonstrated that, with appropriate line search techniques, these methods converge globally to a stationary point for continuously differentiable functions, provided the initial approximation is sufficiently close or the objective satisfies mild regularity conditions. For instance, modifications to Broyden's method ensure global convergence even for linear mappings, extending to nonlinear cases with bounded deterioration of the approximation.

The convergence rate of quasi-Newton methods is influenced by factors such as the quality of the initial Hessian or Jacobian approximation and the Lipschitz continuity of the function's derivatives, which control how quickly the secant information becomes accurate. Compared to full Newton's method, which achieves quadratic convergence but requires exact second derivatives, quasi-Newton methods offer a slower superlinear rate yet outperform first-order methods like steepest descent, which exhibit only linear convergence, especially near the solution where second-order information accelerates progress.

Connection to Matrix Inversion

Quasi-Newton methods approximate the inverse of the Hessian or Jacobian using secant conditions derived from successive function evaluations, providing an efficient alternative to exact matrix inversion in nonlinear optimization and root-finding problems. These updates construct low-rank modifications to an initial approximation, iteratively refining the inverse without recomputing it from scratch each time. This approach draws from iterative techniques for matrix inversion, where each update mimics a step toward solving for the inverse through successive corrections.

A key linear algebra tool underlying these updates is the Sherman–Morrison formula, which computes the inverse of a matrix after a rank-one perturbation: (\mathbf{A} + \mathbf{u} \mathbf{v}^T)^{-1} = \mathbf{A}^{-1} - \frac{\mathbf{A}^{-1} \mathbf{u} \mathbf{v}^T \mathbf{A}^{-1}}{1 + \mathbf{v}^T \mathbf{A}^{-1} \mathbf{u}}, assuming \mathbf{A} is invertible and the denominator is nonzero. This formula enables efficient rank-one updates in quasi-Newton methods, such as Broyden's method, by adjusting the approximate inverse with minimal computational cost. For rank-two updates, common in optimization methods like BFGS and DFP, the more general Sherman–Morrison–Woodbury formula extends this capability to handle paired vectors, preserving the structure of the approximation.

In Broyden's family, the update can be interpreted as the minimal correction to the current approximation that absorbs the residual in the secant equation: writing the correction as \Delta \mathbf{B} = \mathbf{B}_{k+1} - \mathbf{B}_k, the secant condition requires \Delta \mathbf{B} \, \mathbf{s}_k = \mathbf{y}_k - \mathbf{B}_k \mathbf{s}_k, and choosing the smallest such correction in the Frobenius norm yields the rank-one update \Delta \mathbf{B} = \frac{(\mathbf{y}_k - \mathbf{B}_k \mathbf{s}_k) \mathbf{s}_k^T}{\mathbf{s}_k^T \mathbf{s}_k}. This least-change perspective highlights the quasi-Newton update as a targeted correction that satisfies the new secant information exactly while disturbing the existing approximation as little as possible, effectively advancing the iterative inversion process. The development of these connections emerged in the 1960s with Broyden's foundational work in 1965, which introduced the class of updates satisfying the secant condition, linking them to efficient inversion techniques prevalent in numerical linear algebra at the time.

An important implication of these update mechanisms is the preservation of nonsingularity: if the current approximation is nonsingular, the low-rank update keeps it nonsingular under the Sherman–Morrison formula, provided the denominator term does not vanish. This stability ensures reliable step computations throughout the iteration, contributing to the robustness of quasi-Newton methods in practice.
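The connection can be checked numerically; the sketch below (assuming NumPy; the helper names are illustrative) applies the Sherman–Morrison formula to the good Broyden rank-one correction of \mathbf{B} and confirms that it reproduces the standard inverse form of the update, \mathbf{H}_{k+1} = \mathbf{H}_k + (\mathbf{s}_k - \mathbf{H}_k \mathbf{y}_k) \mathbf{s}_k^T \mathbf{H}_k / (\mathbf{s}_k^T \mathbf{H}_k \mathbf{y}_k):

import numpy as np

def sherman_morrison(Ainv, u, v):
    """Inverse of (A + u v^T), given Ainv = A^{-1}; the denominator must be nonzero."""
    Au = Ainv @ u
    return Ainv - np.outer(Au, v @ Ainv) / (1.0 + v @ Au)

def broyden_good_inverse_update(H, s, y):
    """Inverse form of the good Broyden update (H = B^{-1})."""
    Hy = H @ y
    return H + np.outer(s - Hy, s @ H) / (s @ Hy)

# Check: Sherman-Morrison applied to the rank-one correction of B agrees with
# updating the inverse H directly, and the result really inverts the updated B.
B = np.array([[2.0, 0.3, 0.0], [0.1, 1.5, 0.2], [0.0, 0.4, 1.8]])
H = np.linalg.inv(B)
s = np.array([0.5, -0.2, 0.3])
y = np.array([0.9, 0.1, 0.4])
u = (y - B @ s) / (s @ s)                     # good Broyden: B_new = B + u s^T
H_via_sm = sherman_morrison(H, u, s)
H_direct = broyden_good_inverse_update(H, s, y)
print(np.allclose(H_via_sm, H_direct))                            # True
print(np.allclose(H_direct @ (B + np.outer(u, s)), np.eye(3)))    # True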

Implementations and Extensions

Software and Algorithms

MINPACK, developed in the 1970s by researchers at Argonne National Laboratory, is an early Fortran library providing implementations of quasi-Newton methods for solving systems of nonlinear equations and nonlinear least squares problems. It includes subroutines such as HYBRD1 and HYBRJ1, which implement a modification of Powell's hybrid method, combining Broyden quasi-Newton updates of the Jacobian with a dogleg trust-region strategy for improved robustness in nonlinear root-finding. In modern Python environments, the SciPy library's optimize.minimize function supports quasi-Newton optimization through the 'BFGS' method for unconstrained problems and 'L-BFGS-B' for bound constraints, leveraging limited-memory approximations to handle large-scale objectives efficiently. MATLAB's Optimization Toolbox implements quasi-Newton methods in the fminunc solver, which defaults to this approach for unconstrained multivariable minimization and allows specification via the 'Algorithm' option. The NLopt library, written in C and C++ with bindings for multiple languages, provides a low-storage BFGS (L-BFGS) algorithm for unconstrained optimization suitable for high-dimensional problems, enabling seamless integration in applications requiring nonlinear optimization. BFGS serves as a common default in these libraries due to its balance of efficiency and superlinear convergence. A generic algorithmic framework for the BFGS method with Armijo line search, as outlined in standard numerical optimization references, proceeds as follows:
Initialize: Choose initial point x_0, initial inverse Hessian approximation H_0 (e.g., identity matrix), tolerance ε > 0
Set k = 0
While ||∇f(x_k)|| > ε:
    Compute search direction: p_k = -H_k ∇f(x_k)
    Perform Armijo line search: Find α_k > 0 such that f(x_k + α_k p_k) ≤ f(x_k) + c α_k ∇f(x_k)^T p_k for some c ∈ (0,1)
    Update: x_{k+1} = x_k + α_k p_k
    Compute s_k = x_{k+1} - x_k, y_k = ∇f(x_{k+1}) - ∇f(x_k)
    If s_k^T y_k > 0, update H_{k+1} using BFGS formula: H_{k+1} = (I - ρ_k s_k y_k^T) H_k (I - ρ_k y_k s_k^T) + ρ_k s_k s_k^T where ρ_k = 1 / (s_k^T y_k)
    Else: H_{k+1} = H_k (or reset to initial)
    k = k + 1
Return x_k
This pseudocode assumes exact gradient evaluations and focuses on the core update loop, with the Armijo condition ensuring sufficient decrease; note that H_k denotes the inverse Hessian approximation, so no linear solve is needed to obtain the search direction. Best practices for implementing quasi-Newton methods emphasize variable scaling to normalize gradients and function values, preventing ill-conditioning in high-dimensional spaces, and selecting the initial approximation as a scaled identity or diagonal matrix derived from gradient information or finite differences for faster early iterations. These techniques enhance robustness without altering the method's asymptotic properties.
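A runnable counterpart to this pseudocode, written as a minimal Python sketch with NumPy (a backtracking loop stands in for the Armijo search, and the Rosenbrock function is an arbitrary test problem), might look as follows:

import numpy as np

def bfgs(f, grad, x0, tol=1e-6, max_iter=500, c=1e-4):
    """BFGS with a backtracking Armijo line search; H is the inverse Hessian approximation."""
    x = np.asarray(x0, dtype=float)
    n = len(x)
    H = np.eye(n)                       # initial inverse Hessian approximation
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        p = -H @ g                      # quasi-Newton search direction
        alpha = 1.0                     # backtrack until the Armijo condition holds
        while f(x + alpha * p) > f(x) + c * alpha * (g @ p) and alpha > 1e-12:
            alpha *= 0.5
        s = alpha * p
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g
        if y @ s > 1e-12:               # curvature condition; skip the update otherwise
            rho = 1.0 / (y @ s)
            V = np.eye(n) - rho * np.outer(s, y)
            H = V @ H @ V.T + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

# Example: minimize the two-dimensional Rosenbrock function.
rosen = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
rosen_grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                                 200 * (x[1] - x[0]**2)])
print(bfgs(rosen, rosen_grad, [-1.2, 1.0]))   # approaches the minimizer (1, 1)

For comparison, the same problem is handled by off-the-shelf implementations such as scipy.optimize.minimize(rosen, [-1.2, 1.0], jac=rosen_grad, method='BFGS').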

Recent Advances

Recent developments in quasi-Newton methods have focused on enhancing efficiency and applicability in large-scale and distributed optimization settings. A notable advancement is the greedy quasi-Newton method introduced by Rodomanov and Nesterov in 2021, which selects updates from the Broyden family that minimize a specific error measure, leading to explicit superlinear convergence rates. This approach has been extended to distributed environments, where adaptive greedy updates achieve non-asymptotic guarantees, making it suitable for distributed optimization in high-dimensional problems. In 2024, incremental quasi-Newton methods incorporating symmetric rank-1 (SR1) updates were proposed to handle finite-sum objectives in incremental settings. These methods update approximations incrementally as new data arrives, achieving faster superlinear convergence compared to traditional variants, which is particularly beneficial for sequential optimization tasks with evolving objectives. For multi-variable nonlinear equations, full-rank Jacobian updates via the T-Secant strategy, developed around 2023, improve convergence and stability by allowing a full-rank update of the Jacobian approximation in each iteration through dual secant conditions, reducing sensitivity to ill-conditioning.

Applications in machine learning have seen quasi-Newton methods adapted for challenging problems like saddle-point optimization. The multiple greedy SR1 method for strongly-convex-strongly-concave saddle point problems, introduced in 2024, uses multiple greedy selections of SR1 updates to improve convergence on min-max formulations, demonstrating strong performance in training generative models. In multi-objective optimization, limited-memory quasi-Newton approaches from 2023 construct Pareto fronts by approximating Hessians with low-rank updates, enabling scalable solutions for conflicting objectives in engineering design. Additionally, a 2025 diagonal quasi-Newton method employing penalty decomposition addresses sparse nonconvex bound-constrained problems, promoting sparsity while maintaining convergence through diagonal approximations. These advances address longstanding gaps in classical quasi-Newton methods, such as handling noisy gradients and large-scale data in AI-driven applications, moving beyond the 1970s-era focus on small-scale unconstrained problems to support modern distributed and data-intensive frameworks.
