
Backtracking line search

Backtracking line search is an inexact line search technique used in numerical optimization to determine a suitable step length \alpha_k along a descent direction p_k in iterative algorithms. It starts with a trial value (often \alpha_k = 1, or more generally \bar{\alpha} > 0) and iteratively reduces it by a contraction factor \rho \in (0,1) until a sufficient decrease condition is satisfied for the objective function f, typically the Armijo rule f(x_k + \alpha_k p_k) \leq f(x_k) + c \alpha_k \nabla f(x_k)^T p_k with c \in (0,1). This ensures progress toward minimizing f without requiring the computationally intensive exact minimization of the one-dimensional function \phi(\alpha) = f(x_k + \alpha p_k), thereby balancing efficiency and reliability in algorithms such as gradient descent and Newton-type methods.

As formalized in standard optimization literature, the algorithm proceeds as follows: select parameters \bar{\alpha} > 0, \rho \in (0,1), and c \in (0,1); initialize \alpha \leftarrow \bar{\alpha}; while f(x_k + \alpha p_k) > f(x_k) + c \alpha \nabla f(x_k)^T p_k, update \alpha \leftarrow \rho \alpha; then accept the resulting \alpha_k = \alpha as the step length, setting x_{k+1} = x_k + \alpha_k p_k. Under the assumption that \nabla f is Lipschitz continuous, this process terminates in a finite number of iterations, guaranteeing a decrease in f along the descent direction p_k. Variants may incorporate additional Wolfe curvature conditions, such as |\nabla f(x_k + \alpha_k p_k)^T p_k| \leq -\beta \nabla f(x_k)^T p_k with \beta \in (c, 1), to further control the step and prevent issues like the Maratos effect in constrained settings.

Backtracking line search plays a crucial role in promoting global convergence for a wide range of optimization methods, including steepest descent, nonlinear conjugate gradient (e.g., Fletcher-Reeves), quasi-Newton updates like BFGS and L-BFGS, Newton methods, and sequential quadratic programming (SQP) for nonlinearly constrained problems.
When paired with a gradient-related descent direction, it ensures that the algorithm satisfies the Zoutendijk descent criterion, leading to convergence to stationary points under assumptions such as bounded level sets and twice continuous differentiability of f. In large-scale or noisy optimization, such as implicit filtering, backtracking adaptations maintain stability by enforcing feasibility and handling perturbations, often achieving superlinear rates when combined with nonmonotone strategies.

Fundamentals

Motivation

In iterative optimization algorithms like gradient descent, fixed step sizes pose significant challenges: excessively large steps can cause the iterates to diverge by overshooting the minimum, while overly conservative small steps lead to sluggish progress and an impractically large number of iterations. Line search techniques mitigate these issues by solving a one-dimensional subproblem to select an adaptive step size along a given descent direction, ensuring sufficient decrease in the objective function value and hence monotonic progress toward a minimum. A key mechanism is the Armijo rule, which requires that the step achieve at least a predefined fraction of the decrease predicted by the directional derivative, thereby balancing efficiency and stability without demanding full minimization.

This approach gained prominence in nonlinear optimization due to the prohibitive cost of exact line search, which requires solving a univariate optimization problem at every iteration, a burden that grows in high dimensions. Inexact variants like backtracking emerged as computationally affordable alternatives: starting from an initial guess (often 1), the step is repeatedly reduced until the sufficient decrease condition holds, approximating the ideal step with few function evaluations.

The development of backtracking line search was particularly motivated by the needs of quasi-Newton methods in the 1960s and 1970s, where rapid Hessian approximations demanded reliable yet inexpensive step-size adjustments to preserve desirable convergence rates in solving nonlinear systems and unconstrained problems. Foundational contributions, including Armijo's sufficient decrease condition in 1966 and subsequent integrations into variable-metric updates by Broyden, Fletcher, Goldfarb, and Shanno around 1970, established inexact line search as a cornerstone of practical implementations.

Core Concepts

Backtracking line search is a technique within the broader framework of line search methods in optimization, where the goal is to determine a step length \alpha > 0 that reduces the objective function f along a specified direction d from the current point x. Specifically, it seeks to approximately solve \min_{\alpha > 0} f(x + \alpha d), assuming f is differentiable and d satisfies \nabla f(x)^T d < 0 to ensure descent. This approach balances computational efficiency with progress toward minimizing f, particularly for smooth functions where exact minimization along the line would be costly.

Central to inexact line search methods, including backtracking, are stopping criteria that guarantee sufficient progress without requiring exact minimization. The Armijo condition, also known as the sufficient decrease condition, is the primary criterion used in backtracking: the step \alpha must satisfy f(x + \alpha d) \leq f(x) + c \alpha \nabla f(x)^T d, where 0 < c < 1 (typically c \approx 10^{-4}) ensures that a fraction of the first-order predicted decrease is achieved. This condition, introduced by Armijo in 1966, leverages the Lipschitz continuity of \nabla f to bound the function decrease and promote convergence. A complementary criterion is the curvature (Wolfe) condition, \nabla f(x + \alpha d)^T d \geq \sigma \nabla f(x)^T d with c < \sigma < 1, which prevents steps that are excessively small by enforcing approximate first-order optimality along the line; it was proposed by Wolfe to strengthen convergence guarantees for ascent and descent methods. The Goldstein conditions refine this differently by adding a lower bound that rules out overly conservative steps: f(x + \alpha d) \geq f(x) + (1 - c) \alpha \nabla f(x)^T d. While the full set of Armijo, Wolfe, and Goldstein conditions provides robust inexactness, backtracking primarily relies on the Armijo rule for its simplicity and effectiveness in practice.
In backtracking line search, the process begins with an initial step length \alpha_0 > 0, often chosen heuristically (e.g., 1), and iteratively reduces it via multiplication by a parameter \beta \in (0, 1) (commonly \beta = 0.5 or 0.8) until the Armijo condition holds: \alpha_{k+1} = \beta \alpha_k. This successive geometric reduction ensures termination in finitely many steps under standard assumptions on f: the reductions shrink \alpha geometrically, and the sufficient decrease criterion is eventually met because d is a descent direction. The direction d is typically the negative gradient d = -\nabla f(x) in gradient descent, though it can be adapted for other schemes such as Newton or quasi-Newton methods. Because the step size is chosen dynamically, the mechanism remains effective even when the local Lipschitz constant of \nabla f varies across the domain.
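As a minimal illustration, the Armijo check can be written directly from the definition above. The following sketch (Python with NumPy; the quadratic test function and trial step sizes are illustrative choices, not from the source) shows a modest step passing the test and an overlong step failing it:

```python
import numpy as np

def armijo_holds(f, grad_fx, x, d, alpha, c=1e-4):
    """True if f(x + alpha*d) <= f(x) + c*alpha*grad_fx^T d (Armijo rule)."""
    return f(x + alpha * d) <= f(x) + c * alpha * (grad_fx @ d)

# Example on f(x) = ||x||^2 with the steepest-descent direction d = -grad f(x).
f = lambda x: float(x @ x)
x = np.array([1.0, 2.0])
g = 2 * x                       # gradient of f at x
d = -g                          # descent direction: grad_fx^T d = -||g||^2 < 0
print(armijo_holds(f, g, x, d, alpha=0.1))   # modest step satisfies Armijo
print(armijo_holds(f, g, x, d, alpha=1.5))   # overlong step violates it
```

Here alpha = 0.1 moves x to (0.8, 1.6) and lowers f from 5.0 to 3.2, comfortably below the Armijo threshold, while alpha = 1.5 overshoots to (-2, -4) and raises f to 20.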

Algorithm Description

Step-by-Step Procedure

The backtracking line search algorithm initiates by selecting an initial step size \alpha_0 > 0, commonly set to 1 for simplicity or derived from an estimate of the Lipschitz constant of the gradient to promote larger accepted steps in the early iterations. Parameters such as the contraction factor \beta \in (0,1) (typically 0.5 to 0.8) and a small minimum value for \alpha are also chosen to control the reduction process.

The core iterative loop then commences: starting with \alpha = \alpha_0, the algorithm checks whether the Armijo condition, a sufficient decrease criterion based on the directional derivative along the search direction, is satisfied at the trial point. If it is violated, the step size is reduced by setting \alpha \leftarrow \beta \alpha and the condition is checked again at the new trial point; this backtracking repeats, geometrically shrinking \alpha, until the Armijo condition holds or \alpha drops below the predefined minimum threshold. This ensures the selected step reduces the objective function without being excessively conservative.

Upon exiting the loop, the algorithm accepts the current \alpha as the step size for the outer optimization iteration, balancing computational cost against progress toward a local minimum. To handle edge cases such as numerical instability or pathological functions that might otherwise cause prolonged or infinite backtracking, a maximum number of loop iterations is typically imposed, after which a minimal feasible step or a recovery strategy is adopted.

Pseudocode and Implementation Notes

The backtracking line search determines a suitable step \alpha along a descent direction d by starting with an initial guess and iteratively reducing it until the Armijo condition is satisfied, ensuring sufficient decrease in the objective function f. The following pseudocode outlines the procedure:
function BacktrackLineSearch(x, d, f, ∇f, α₀=1.0, ρ=0.5, c=1e-4, α_min=1e-10):
    α = α₀
    while f(x + α d) > f(x) + c α (∇f(x))^T d and α > α_min:
        α = ρ α
    if α <= α_min:
        # Safeguard: return minimum step or handle failure (e.g., raise warning)
        α = α_min
    return α
This implementation computes the step length by evaluating the function f in a loop, reducing \alpha multiplicatively until the condition holds or a minimum threshold is reached; the gradient \nabla f needs to be evaluated only once, at x. Key parameters include \rho, the backtracking factor, typically set between 0.5 and 0.8 to control the aggressiveness of step reduction; c, the Armijo parameter, often chosen as 10^{-4} to enforce a modest sufficient decrease; and \alpha_0, the initial step length, commonly initialized to 1.0 assuming normalized directions. An additional safeguard \alpha_{\min}, such as 10^{-10}, prevents excessive reductions that could lead to numerical underflow or infinite loops.

In practice, the algorithm performs O(k) function evaluations, where k is the number of backtracking iterations, which is usually small (e.g., 1-5) for well-conditioned problems. For efficiency in high-dimensional settings, computations should be vectorized using libraries like NumPy, computing x + \alpha d and the inner product (\nabla f(x))^T d with array operations to minimize overhead. Integration with optimizers such as gradient descent involves calling this function after computing the direction d = -\nabla f(x), then updating x \leftarrow x + \alpha d.

Common pitfalls include numerical instability from floating-point precision errors when \alpha becomes very small, potentially causing the Armijo condition to fail unpredictably due to rounding in f(x + \alpha d); this can be mitigated by using higher-precision arithmetic or clamping \alpha early. Additionally, a poor choice of initial \alpha_0 (e.g., too large for steep gradients) may require many backtracking iterations, increasing computational cost, while \rho too close to 1 yields only marginal reductions per trial and thus many function evaluations.
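Mirroring the pseudocode above, a runnable NumPy version might look as follows. This is a sketch; the quadratic test problem and parameter defaults are illustrative, and the function value and directional derivative at x are computed once and reused across the loop as recommended:

```python
import numpy as np

def backtrack_line_search(f, grad_f, x, d, alpha0=1.0, rho=0.5, c=1e-4,
                          alpha_min=1e-10):
    """Shrink alpha by rho until the Armijo condition holds or alpha_min is hit."""
    fx = f(x)                         # evaluated once, reused in every check
    gTd = grad_f(x) @ d               # directional derivative, also reused
    alpha = alpha0
    while f(x + alpha * d) > fx + c * alpha * gTd and alpha > alpha_min:
        alpha *= rho
    return max(alpha, alpha_min)      # safeguard against underflow

# One gradient-descent step on the quadratic f(x) = x1^2 + 10*x2^2.
f = lambda x: x[0]**2 + 10 * x[1]**2
grad_f = lambda x: np.array([2 * x[0], 20 * x[1]])
x = np.array([1.0, 1.0])
d = -grad_f(x)
alpha = backtrack_line_search(f, grad_f, x, d)
# The accepted step strictly decreases f; the outer update is x + alpha*d.
```

With these defaults the loop rejects alpha = 1, 0.5, 0.25, and 0.125 on this ill-conditioned quadratic before accepting alpha = 0.0625, illustrating the geometric reduction.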

Practical Applications

In Deterministic Gradient Descent

In deterministic gradient descent, backtracking line search adaptively selects the step size \alpha_k along the negative gradient direction, ensuring sufficient function decrease at each iteration for smooth, twice-differentiable objective functions f: \mathbb{R}^n \to \mathbb{R}. The update rule is \mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k \mathbf{p}_k with \mathbf{p}_k = -\nabla f(\mathbf{x}_k), where \alpha_k > 0 is determined via backtracking: starting from an initial guess (e.g., \alpha_k = 1), the step is repeatedly reduced by a factor \rho \in (0,1) until the Armijo condition f(\mathbf{x}_k + \alpha_k \mathbf{p}_k) \leq f(\mathbf{x}_k) + c \alpha_k \nabla f(\mathbf{x}_k)^\top \mathbf{p}_k holds, with c \in (0,1). This guarantees descent and promotes global convergence under mild assumptions on f, such as Lipschitz continuity of the gradient.

A practical example is minimizing quadratic functions, which model many deterministic optimization problems like least-squares fitting. Consider f(\mathbf{x}) = \frac{1}{2} \mathbf{x}^\top Q \mathbf{x} - \mathbf{b}^\top \mathbf{x}, where Q \succ 0 is the Hessian; backtracking keeps the step size within the sufficient decrease regime, producing steady progress toward the minimum \mathbf{x}^* = Q^{-1} \mathbf{b}. Similarly, in logistic regression for binary classification, where f(\mathbf{w}) = -\sum_{i=1}^m [y_i \log(\sigma(\mathbf{w}^\top \mathbf{x}_i)) + (1-y_i) \log(1 - \sigma(\mathbf{w}^\top \mathbf{x}_i))] and \sigma is the sigmoid function, backtracking adapts \alpha_k to the smooth but potentially ill-conditioned loss landscape, ensuring descent without manual tuning.

Compared to fixed-step gradient descent, backtracking offers key advantages on ill-conditioned problems, where a large condition number \kappa(Q) = \lambda_{\max}/\lambda_{\min} \gg 1 can cause fixed steps to overshoot the minimum or stall progress.
The adaptive \alpha_k dynamically scales the step to respect local curvature, reducing zigzagging and accelerating convergence: fixed steps require \alpha < 2/\lambda_{\max} for stability, whereas backtracking achieves larger effective steps on average without prior knowledge of \lambda_{\max}. An empirical illustration in 2D uses the ill-conditioned quadratic f(x_1, x_2) = \frac{1}{2}(10 x_1^2 + x_2^2), starting from an initial point away from the origin (e.g., with coordinates on the order of 10). Using backtracking with Armijo parameter c = 0.5 and contraction factor \beta = 0.5 (initial step 1), the algorithm performs 12 outer iterations (40 total line search steps) to converge near the minimum at (0,0), tracing a path that efficiently navigates the elongated valley along the x_2-axis without overshooting, as visualized in convergence plots.
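An experiment of this kind can be sketched as follows. The starting point (10, 10), the tolerance, and the iteration cap are assumptions for the sketch, so the iteration counts will differ from the figures quoted above; the point is only that backtracking navigates the elongated valley without parameter tuning:

```python
import numpy as np

# Gradient descent with Armijo backtracking on the ill-conditioned quadratic
# f(x1, x2) = 0.5*(10*x1^2 + x2^2); start point (10, 10) is an assumption.
f = lambda x: 0.5 * (10 * x[0]**2 + x[1]**2)
grad = lambda x: np.array([10 * x[0], x[1]])

def backtracking_gd(x0, c=0.5, rho=0.5, tol=1e-6, max_iter=2000):
    x = np.array(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:       # stationary-point test
            break
        alpha, d = 1.0, -g
        # Armijo backtracking: halve alpha until sufficient decrease holds.
        while f(x + alpha * d) > f(x) + c * alpha * (g @ d):
            alpha *= rho
        x = x + alpha * d
    return x, k

x_star, iters = backtracking_gd([10.0, 10.0])
print(np.round(x_star, 6))    # close to the minimum at (0, 0)
```

Despite the 10:1 eigenvalue ratio, the iterate reaches a gradient norm below the tolerance well within the cap, with no step-size schedule supplied by the user.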

In Stochastic Gradient Descent

In stochastic gradient descent (SGD), backtracking line search is adapted by incorporating stochastic gradient estimates g_k \approx \nabla f(x_k) directly into the Armijo condition, typically formulated as f(x_k + \alpha_k d_k) \leq f(x_k) + c \alpha_k g_k^\top d_k, where d_k is the search direction (often d_k = -g_k) and f is approximated on the same mini-batch used to compute g_k. This substitution accounts for the noisy nature of g_k.

In standard SGD settings, backtracking is performed per iteration to dynamically adjust the step size after each gradient update, though it can be applied once per epoch in large-scale scenarios to curb overhead. Initial step sizes \alpha_0 are commonly set via schedules such as a linear warm-up over the first iterations followed by adaptive scaling (e.g., doubling every fixed number of steps until backtracking engages), which helps stabilize early training phases amid high variance.

Practical implementation in SGD faces challenges from noise, including a higher number of function evaluations during backtracking loops, as stochastic fluctuations can cause repeated rejection of trial steps; this is often alleviated by adaptations that account for the variance of the gradient estimates. Research has explored backtracking line search in deep learning for tasks like multi-class classification, where it can yield faster convergence and better generalization than fixed learning rates in SGD, and recent studies as of 2024 have investigated improved line search methods for large-scale training. However, because of the computational cost of multiple function evaluations per step, it remains more experimental than standard in large-scale deep learning.
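One common adaptation, evaluating the Armijo condition on the same mini-batch used for the gradient, can be sketched as below. Everything here is illustrative: the synthetic noiseless least-squares problem, the batch size, and all parameters are assumptions chosen so the behavior is easy to inspect, not a recipe from the source:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic consistent least-squares problem: minimize mean squared residual.
n, dim = 200, 5
A = rng.normal(size=(n, dim))
w_true = rng.normal(size=dim)
b = A @ w_true

def batch_loss(w, idx):
    r = A[idx] @ w - b[idx]
    return 0.5 * np.mean(r**2)

def batch_grad(w, idx):
    return A[idx].T @ (A[idx] @ w - b[idx]) / len(idx)

def sgd_armijo(w, epochs=30, batch=20, c=0.1, rho=0.5, alpha0=1.0):
    for _ in range(epochs):
        for _ in range(n // batch):
            idx = rng.permutation(n)[:batch]       # random mini-batch
            g = batch_grad(w, idx)
            alpha, d = alpha0, -g
            # Stochastic Armijo: loss and gradient use the SAME mini-batch.
            while batch_loss(w + alpha * d, idx) > batch_loss(w, idx) + c * alpha * (g @ d):
                alpha *= rho
                if alpha < 1e-10:                  # safeguard against underflow
                    break
            w = w + alpha * d
    return w

w = sgd_armijo(np.zeros(dim))
print(np.linalg.norm(w - w_true))   # small residual on this noiseless problem
```

Because the problem is noiseless and consistent, the per-batch Armijo steps drive w toward w_true without any hand-tuned learning-rate schedule; on genuinely noisy objectives, variance-aware safeguards become important, as noted above.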

Step Size Determination

Establishing Lower Bounds

In backtracking line search algorithms, establishing a lower bound on the step size, denoted \alpha_{\min}, is essential to prevent stagnation, where repeated reductions lead to negligible progress and a potential halt in optimization. Without such a bound, the procedure may generate impractically small steps, slowing convergence or causing the algorithm to fail to reach a minimum. The bound ensures that each iteration maintains sufficient movement while respecting numerical stability constraints.

Typically, \alpha_{\min} is set to a small positive value such as 10^{-10}, chosen to be larger than machine epsilon (approximately 2.22 \times 10^{-16} for double-precision floating-point arithmetic) to avoid precision loss in computations. Theoretically, if the gradient \nabla f is L-Lipschitz continuous, step sizes \alpha \leq 2/L guarantee a decrease in the objective function along the negative gradient direction; backtracking may nonetheless select smaller values during reductions, so a bound such as \alpha_{\min} = \min\{\alpha_0, \beta / L\}, where \beta \in (0,1) is the contraction factor, prevents excessively small steps.

Practical heuristics mitigate excessive backtracking by adjusting the initial step size \alpha_0 (e.g., halving it) for subsequent line searches whenever the number of reductions exceeds a threshold such as 5-10 in a single search, preventing persistently small steps in future iterations. Another approach scales \alpha_{\min} proportionally to 1 / \|\nabla f(x_k)\|, so the bound adapts to the local gradient magnitude and avoids overly conservative steps when gradients are large.

Violating the lower bound risks numerical underflow, where step sizes approach the limits of floating-point representation and function evaluations become inaccurate, or infinite loops in the backtracking procedure if no cutoff is enforced. These issues can trap the algorithm near stationary points without verifiable progress, underscoring the need for robust bound enforcement in implementations.
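The 2/L threshold quoted above can be recovered from the descent lemma; in the notation of this section, for a steepest-descent trial step:

```latex
% L-Lipschitz continuity of \nabla f gives the quadratic upper bound
f(x_k - \alpha \nabla f(x_k))
  \le f(x_k) - \alpha \|\nabla f(x_k)\|^2
      + \tfrac{L \alpha^2}{2} \|\nabla f(x_k)\|^2
  = f(x_k) - \alpha \Bigl(1 - \tfrac{L\alpha}{2}\Bigr) \|\nabla f(x_k)\|^2 .
% The factor 1 - L\alpha/2 is positive precisely when \alpha < 2/L,
% so every such step strictly decreases f whenever \nabla f(x_k) \neq 0.
```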

Establishing Upper Bounds

In backtracking line search, the procedure initiates with a trial step size \alpha_0 that acts as an initial upper bound, from which the step is iteratively reduced until a sufficient decrease condition, such as the Armijo rule, is met. A standard choice for \alpha_0 is 1, which is particularly suitable when the search direction is the negative gradient, providing a simple and effective starting point that balances computational efficiency and adaptability across many optimization problems. When a Hessian approximation B is available, as in quasi-Newton methods, \alpha_0 can be estimated more precisely from a quadratic model along the search direction: \alpha_0 \approx \frac{\|\nabla f(x)\|^2}{\nabla f(x)^T B \nabla f(x)}. This estimate leverages curvature information from the approximate Hessian to suggest a step close to the model minimizer, often reducing the number of subsequent backtracking iterations.

Under smoothness assumptions with Lipschitz constant L, theoretical upper bounds ensure monotonic decrease: step sizes \alpha \leq 2/L guarantee, via the quadratic upper bound of the descent lemma, a net decrease along the negative gradient. In practice, however, backtracking line search rarely imposes explicit caps on \alpha_0 beyond the initial trial, as the iterative reduction mechanism inherently prevents instability without additional constraints. To avoid excessive function evaluations from prolonged backtracking, implementations commonly enforce an upper limit on the number of iterations per line search, typically 20 to 50 steps, beyond which a minimal acceptable step is accepted. Alternatively, incorporating Wolfe curvature conditions alongside the Armijo rule provides tighter control by enforcing curvature requirements, further bounding the step and promoting global convergence.

These strategies involve trade-offs: conservative choices for \alpha_0 or stricter iteration limits reduce the risk of computational overhead but may yield smaller accepted steps, potentially slowing overall convergence in ill-conditioned problems.
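The quadratic-model initial estimate above can be sketched in a few lines. This assumes a symmetric positive definite approximation B; the diagonal B, test point, and fallback value below are illustrative:

```python
import numpy as np

def initial_step_from_model(g, B):
    """Quadratic-model estimate alpha0 = ||g||^2 / (g^T B g) along d = -g.
    Falls back to 1.0 when the curvature along -g is not positive."""
    curv = g @ B @ g
    return float(g @ g / curv) if curv > 0 else 1.0

# For f(x) = 0.5 x^T B x, this alpha0 is the exact minimizer along -grad f(x),
# so with an exact Hessian no backtracking is needed at all.
B = np.diag([10.0, 1.0])
x = np.array([1.0, 1.0])
g = B @ x                      # gradient of the quadratic at x
alpha0 = initial_step_from_model(g, B)
f = lambda x: 0.5 * x @ B @ x
# The model step lands well below f(x), satisfying Armijo on the first trial.
```

Here alpha0 = 101/1001, about 0.101, compared with the naive default of 1, which this ill-conditioned quadratic would reject several times.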

Analysis and Guarantees

Computational Efficiency

The per-iteration computational cost of backtracking line search is dominated by the function and gradient evaluations needed to determine an acceptable step length. In a typical implementation, the gradient is computed once at the current iterate, at a cost that scales with the problem dimension n, while the backtracking loop requires multiple function evaluations, often an average of 1 to 3 per outer iteration, to satisfy a sufficient decrease condition such as the Armijo rule. This yields an overall per-iteration complexity of O(kn), where k denotes the average number of backtracks, assuming evaluation costs scale linearly with dimension.

Compared to exact line search, which minimizes a one-dimensional model along the search direction and can demand substantially more evaluations or even an iterative solver, backtracking is markedly cheaper per iteration due to its simple multiplicative reduction of the trial step size. The approximation may, however, necessitate additional outer iterations to achieve convergence, particularly in methods like nonlinear conjugate gradient where precise step lengths influence the quality of subsequent directions.

Several factors influence efficiency. For ill-conditioned objective functions, curvature mismatch often leads to more backtracks, as larger initial step sizes fail the sufficient decrease condition more frequently, increasing k. High-dimensional problems exacerbate this by amplifying the cost of each evaluation, since function and gradient computations typically grow at least linearly with n. Parameter choices, such as the contraction factor \rho \in (0,1) and the Armijo constant c \in (0, 0.5), also balance the number of backtracks against progress per step.

Optimizations can mitigate these costs without compromising reliability. For instance, quantities computed at the current point, namely the function value and the directional derivative, are reused across backtracks, avoiding redundant work, while safeguarded variants interpolate to estimate better initial step sizes and reduce the average k. In some contexts, approximate surrogate models of the objective further limit full evaluations during the search.
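The evaluation count is easy to measure directly by wrapping the objective in a counter. The sketch below (illustrative quadratic and parameters; the `Counted` helper is an assumption for the demonstration) reuses the cached function value and directional derivative across backtracks, so the only repeated cost is one evaluation per trial step:

```python
import numpy as np

class Counted:
    """Wrap a function and count how many times it is evaluated."""
    def __init__(self, fn):
        self.fn, self.calls = fn, 0
    def __call__(self, x):
        self.calls += 1
        return self.fn(x)

f = Counted(lambda x: 0.5 * (10 * x[0]**2 + x[1]**2))
grad = lambda x: np.array([10 * x[0], x[1]])

def backtrack(x, d, g, rho=0.5, c=1e-4, alpha=1.0):
    fx = f(x)                                  # one evaluation, reused below
    while f(x + alpha * d) > fx + c * alpha * (g @ d):
        alpha *= rho
    return alpha

x = np.array([1.0, 1.0])
g = grad(x)
before = f.calls
alpha = backtrack(x, -g, g)
print(f.calls - before)   # prints 5: one for f(x) plus four trial steps
```

On this ill-conditioned quadratic the trials alpha = 1, 0.5, and 0.25 are rejected before alpha = 0.125 is accepted, so the whole search costs five function evaluations and a single gradient.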

Theoretical Convergence Properties

Under the standard assumptions that the objective function f: \mathbb{R}^n \to \mathbb{R} is continuously differentiable and its gradient \nabla f is Lipschitz continuous with constant L > 0, backtracking line search provides robust convergence guarantees for optimization algorithms such as gradient descent. These guarantees require that the search direction be a descent direction, such as the negative gradient.

In gradient descent, where the search direction is p_k = -\nabla f(x_k), the Armijo condition holds for any \alpha \leq 2(1-c)/L, so backtracking starting from \bar{\alpha} terminates with \alpha_k \geq \min\{\bar{\alpha}, 2\beta(1-c)/L\}, where c \in (0, 1) is the Armijo parameter and \beta \in (0, 1) the contraction factor. Each iteration therefore decreases f by at least c \min\{\bar{\alpha}, 2\beta(1-c)/L\} \|\nabla f(x_k)\|^2, a bound that follows from the quadratic upper estimate implied by the Lipschitz assumption and that guarantees meaningful progress whenever \nabla f(x_k) \neq 0. Consequently, the algorithm exhibits global convergence, with \liminf_{k \to \infty} \|\nabla f(x_k)\| = 0, so stationary points are approached along limit points of the iterate sequence.

Regarding convergence rates, for general smooth convex functions, backtracking line search in gradient descent yields sublinear convergence of O(1/k) in function value to the optimum. For strongly convex functions (with strong convexity parameter \mu > 0), the method achieves linear convergence: the error decreases geometrically as \|x_k - x^*\| = O(\rho^k) for some \rho \in (0, 1), with a rate governed by the condition number L/\mu. Extensions to more advanced methods, such as quasi-Newton algorithms, use backtracking with weak Wolfe conditions (Armijo sufficient decrease plus a curvature condition) to attain superlinear rates near the solution, provided the Hessian approximations satisfy certain bounded deterioration properties.

References

  1. [1]
    [PDF] Numerical Optimization - UCI Mathematics
    ... Nocedal. Stephen J. Wright. EECS Department. Computer Sciences Department ... (Backtracking Line Search). Choose ¯α > 0, ρ ∈ (0, 1), c ∈ (0, 1); Set α ...
  2. [2]
    [PDF] Gradient descent revisited
    Same rate for a step size chosen by backtracking search. Theorem: Gradient descent with backtracking line search satis- fies f(x(k)) − f(x?) ≤ kx(0) − x ...
  3. [3]
    Minimization of functions having Lipschitz continuous first partial ...
    1966 Minimization of functions having Lipschitz continuous first partial derivatives. Larry Armijo · DOWNLOAD PDF + SAVE TO MY LIBRARY. Pacific J. Math.
  4. [4]
    Quasi-Newton Methods, Motivation and Theory | SIAM Review
    This paper is an attempt to motivate and justify quasi-Newton methods as useful modifications of Newton's method for general and gradient nonlinear systems ...
  5. [5]
    Convergence Conditions for Ascent Methods | SIAM Review
    Convergence Conditions for Ascent Methods. Author: Philip WolfeAuthors Info & Affiliations. https://doi ...
  6. [6]
    Constructive Real Analysis - Allen A. Goldstein - Google Books
    Title, Constructive Real Analysis Harper international editions · Harper's series in modern mathematics ; Author, Allen A. Goldstein ; Publisher, Harper & Row, ...
  7. [7]
    Line search methods - Optimization Wiki
    Dec 16, 2021 · The backtracking method is often used to find the appropriate step length and terminate line search based. The backtracking method starts with a ...Introduction · Generic Line Search Method · Exact Search · Inexact Search
  8. [8]
    [PDF] BACKTRACKING LINE SEARCH 1. Line Search Methods Let f
    Backtracking line search computes a step length t* by finding the largest γν where f(xc + γνd) ≤ f(xc) + cγνf0(xc; d) is satisfied, updating xn = xc + t*d.
  9. [9]
    [PDF] Line search methods
    We can just use a constant step size αk = α. This will work if the step size is small enough, but usually this results in using more iterations than necessary.Missing: motivation challenges
  10. [10]
    [PDF] Line Search Methods for Unconstrained Optimisation - People
    Line search involves calculating a search direction, then a steplength (αk) to reduce the objective function, and setting xk+1 = xk + αkpk.
  11. [11]
    NOX::LineSearch::Backtrack Class Reference - Index of /
    Generic backtracking line search. This line search starts with the step ... "Max Iters" - maximum number of iterations (i.e., RHS computations) ...
  12. [12]
    MINIMIZATION OF FUNCTIONS HAVING LIPSCHITZ CONTINUOUS ...
    1, 1966. MINIMIZATION OF FUNCTIONS HAVING LIPSCHITZ. CONTINUOUS FIRST PARTIAL DERIVATIVES. LARRY ARMIJO. A general convergence theorem for the gradient method.
  13. [13]
    [PDF] Gradient Descent
    One way to adaptively choose the step size is to use backtracking line search: • First fix parameters 0 <β< 1 and 0 < α ≤ 1/2.
  14. [14]
    A Stochastic Line Search Method with Convergence Rate Analysis
    Jul 20, 2018 · We adapt a classical backtracking Armijo line-search to the stochastic optimization setting. While traditional line-search relies on exact computations of the ...
  15. [15]
    [PDF] Approximately Exact Line Search - arXiv
    Apr 12, 2022 · ... backtracking line search ... As is evident in the bounds, the convergence rate of Armijo-based methods depends on the minimum step size they take.
  16. [16]
    [PDF] Optimization methods
    Feb 8, 2016 · the step size in the backtracking line search satisfies αk ≥ αmin ... Achieves lower bound, i.e. O. ( 1√. ) convergence. Uses momentum ...
  17. [17]
    [PDF] METRIC-AWARE OPTIMIZATION OF HIGH-ORDER MESHES FOR ...
    Section 3.3.1, we first review the standard backtracking line-search strategy (Nocedal ... Finally, the standard BLS strategy restarts the next initial ... too many ...
  18. [18]
    [PDF] Broadening the Basin All Downhill from Here
    infinite loop. We guard against this possibility in two ways. First, we will ... we use the line search algorithm sketched above (backtracking line search.
  19. [19]
    Line Search Methods Modelling and Scientific Computing
    Program the steepest descent and Newton algorithms using the backtracking line search. Use them to minimize the Rosenbrock function. Set the initial step length ...Missing: numerical | Show results with:numerical
  20. [20]
    [PDF] Optimization Randomized Hessian estimation and directional search
    Mar 29, 2010 · ... Hessian estimation technique to update a quadratic ... backtracking line search with initial step size equal to that suggested by the exact.
  21. [21]
    [PDF] CPSC 540: Machine Learning - Convergence of Gradient Descent
    So let's consider doing gradient descent with a step-size of αk = 1/L, wk+1 ... Same argument shows that any αk < 2/L will decrease f. Page 17. Gradient ...
  22. [22]
    line_search — SciPy v1.16.2 Manual
    Maximum number of iterations to perform. Returns: alphafloat or None. Alpha for which x_new = x0 + alpha * pk , or None if the line search algorithm did not ...
  23. [23]
    Gradient Descent - Strongly Convex
    Apr 10, 2013 · x)||22+α2L2||∇f(x)| ... Step Size The proof above relies on a constant step size, but ...