
Sequential quadratic programming

Sequential quadratic programming (SQP) is a class of iterative algorithms designed to solve nonlinear optimization problems subject to nonlinear constraints by approximating the original problem through a sequence of quadratic programming subproblems, one at each iterate. These methods leverage the Karush-Kuhn-Tucker (KKT) conditions to formulate the subproblems, combining elements of Newton's method for unconstrained optimization with active-set strategies for handling constraints. The origins of SQP trace back to Wilson's 1963 PhD thesis, which introduced the foundational idea of using quadratic approximations and Lagrange multipliers for constrained nonlinear programs. Significant advancements occurred in the 1970s with contributions from Han and Powell, who developed quasi-Newton variants incorporating merit functions for global convergence and techniques to ensure descent. Further refinements in the 1980s and 1990s, including trust-region approaches and filter methods, addressed issues like QP subproblem infeasibility and enhanced scalability for large-scale problems with sparse derivatives. In the SQP algorithm, at each iterate x_k, a quadratic subproblem is solved to determine a search direction d_k: minimize \nabla f(x_k)^T d + \frac{1}{2} d^T B_k d subject to the linearized equality and inequality constraints \nabla h(x_k)^T d + h(x_k) = 0 and \nabla g(x_k)^T d + g(x_k) \geq 0, where B_k approximates the Hessian of the Lagrangian. The next iterate is obtained via x_{k+1} = x_k + \alpha_k d_k, with step length \alpha_k selected using a merit function such as the augmented Lagrangian or the \ell_1 penalty to promote global convergence. Locally, SQP exhibits quadratic convergence when using the exact Hessian of the Lagrangian and superlinear convergence with quasi-Newton updates like BFGS, assuming regularity conditions such as linear independence of the active constraint gradients and strict complementarity.
SQP methods are widely implemented in commercial and open-source software, including SNOPT for large-scale sparse problems and NPSOL for dense cases, making them suitable for applications in engineering design, chemical process optimization, and optimal control. Their ability to handle general nonlinearities in objectives and constraints, while providing certificates of infeasibility, distinguishes them from interior-point or augmented Lagrangian methods, though they require efficient QP solvers and analytical derivatives for practical use.
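As a concrete point of reference, SciPy's `SLSQP` method implements an SQP-family algorithm; the sketch below (assuming NumPy and SciPy are available; the toy objective and constraints are illustrative, not taken from any cited benchmark) solves a small constrained problem of the kind described above.

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    return x[0]**2 + x[1]**2                                # convex toy objective

constraints = (
    {'type': 'eq',   'fun': lambda x: x[0] + x[1] - 1.0},   # x + y = 1
    {'type': 'ineq', 'fun': lambda x: x[0] - 0.2},          # x >= 0.2 (inactive at the optimum)
)

# SLSQP solves a QP subproblem at each iterate, as outlined in the text
res = minimize(f, x0=np.array([2.0, 0.0]), method='SLSQP', constraints=constraints)
print(res.x)   # close to [0.5, 0.5]
```

The solver returns the constrained minimizer (0.5, 0.5), where only the equality constraint is active.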

Background and Prerequisites

Nonlinear Programming Overview

Nonlinear programming (NLP) encompasses the optimization of a nonlinear objective function f(\mathbf{x}), where \mathbf{x} \in \mathbb{R}^n, subject to equality constraints \mathbf{c}(\mathbf{x}) = \mathbf{0} and inequality constraints \mathbf{g}(\mathbf{x}) \leq \mathbf{0}, with f, \mathbf{c}, and \mathbf{g} being nonlinear functions. This formulation generalizes linear programming by allowing curvature in the objective and constraints, enabling the modeling of complex real-world phenomena. NLPs are classified into unconstrained problems, where no constraints are present; equality-constrained problems, involving only \mathbf{c}(\mathbf{x}) = \mathbf{0}; and general constrained cases that incorporate both equality and inequality constraints. These categories reflect increasing difficulty in handling feasibility and optimality. Solving NLPs presents significant challenges, including nonconvexity that can lead to multiple local minima rather than a unique global optimum, as well as heightened computational demands, since nonlinearity disrupts the straightforward search procedures available in linear programming. The field emerged prominently in the post-World War II era, with foundational work in the early 1950s, but saw substantial growth and application in the 1960s and 1970s, driven by engineering design problems and economic modeling needs, such as resource allocation under nonlinear costs. For instance, consider minimizing f(x) = x^2 + \sin(x) subject to linear constraints like x \geq 0 and x \leq 2, where the oscillatory term can introduce local minima. Quadratic programming, with its quadratic objective and linear constraints, acts as a key building block for addressing broader NLPs.
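A quick numerical look at the example above (NumPy assumed): on the feasible interval [0, 2], the derivative f'(x) = 2x + \cos(x) stays positive, so despite the sinusoidal term the minimum sits on the boundary x = 0.

```python
import numpy as np

# Dense grid search for min x^2 + sin(x) on the feasible interval [0, 2]
x = np.linspace(0.0, 2.0, 2001)
fx = x**2 + np.sin(x)
print(x[np.argmin(fx)])   # 0.0 -- the boundary point x = 0
```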

Quadratic Programming Basics

Quadratic programming (QP) is a class of optimization problems characterized by a quadratic objective function and linear constraints. The standard formulation seeks to minimize \frac{1}{2} \mathbf{x}^T H \mathbf{x} + \mathbf{c}^T \mathbf{x}, where H is a symmetric matrix, subject to equality constraints A \mathbf{x} = \mathbf{b} and inequality constraints G \mathbf{x} \leq \mathbf{d}. This structure makes the problem convex when H is positive semidefinite, guaranteeing a global minimum under feasible conditions. Several methods address the solvability of QP problems, each leveraging this structure for efficient computation. Active-set strategies iteratively determine the binding constraints by solving reduced equality-constrained subproblems and adjusting the active set based on Lagrange multipliers. Interior-point methods navigate the feasible interior using logarithmic barrier functions to handle inequalities, solving a sequence of barrier subproblems via Newton's method. Dual formulations transform the primal QP into a maximization problem over dual variables, often yielding insights into sensitivity and enabling specialized solvers. In broader optimization contexts, QP serves as a convex approximation to nonlinear problems, where the quadratic term models the local curvature of a more complex objective, facilitating reliable and computationally tractable local solutions. Convex QP instances exhibit polynomial-time solvability, as demonstrated by the ellipsoid method for bounded feasibility checks and interior-point approaches using self-concordant barrier functions, which achieve complexity bounds polynomial in the problem dimension and data precision. A representative application is mean-variance portfolio optimization, where the quadratic objective minimizes the portfolio's variance \mathbf{w}^T \Sigma \mathbf{w} (with \Sigma as the covariance matrix and \mathbf{w} as asset weights) subject to linear constraints on expected return \mu^T \mathbf{w} \geq r and budget \mathbf{1}^T \mathbf{w} = 1, balancing risk and return efficiently.
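An equality-constrained QP can be solved directly from its KKT system; the sketch below (NumPy assumed; `solve_eq_qp` is an illustrative helper, not a library routine) handles min ½xᵀHx + cᵀx subject to Ax = b.

```python
import numpy as np

def solve_eq_qp(H, c, A, b):
    """Solve min 0.5 x^T H x + c^T x  s.t.  A x = b  via the KKT linear system."""
    n, m = H.shape[0], A.shape[0]
    # Stationarity: H x + A^T lam = -c; feasibility: A x = b
    K = np.block([[H, A.T], [A, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([-c, b]))
    return sol[:n], sol[n:]          # primal solution, equality multipliers

# Toy instance: min x^2 + y^2 subject to x + y = 1 (H = 2I, c = 0)
x, lam = solve_eq_qp(np.diag([2.0, 2.0]), np.zeros(2),
                     np.array([[1.0, 1.0]]), np.array([1.0]))
print(x, lam)   # [0.5 0.5] [-1.]
```

The KKT matrix is indefinite but nonsingular whenever H is positive definite on the null space of A and A has full row rank, which is why a single symmetric linear solve suffices here.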

Theoretical Foundations

Karush-Kuhn-Tucker Conditions

The Karush–Kuhn–Tucker (KKT) conditions are first-order necessary optimality criteria for local solutions to problems involving both equality and inequality constraints. These conditions extend the classical Lagrange multiplier theorem from equality-constrained optimization to handle inequalities by incorporating nonnegative dual variables and a complementarity requirement. They form a cornerstone of constrained optimization theory, providing the framework for verifying optimality and deriving algorithms that seek points satisfying them. The conditions were first derived by William Karush in his 1939 master's thesis, where he addressed minima of functions subject to inequality side constraints. Independently, Harold Kuhn and Albert Tucker formalized and published them in 1951 as part of their work on nonlinear programming, earning the conditions their common acronym. Karush's contribution remained largely unrecognized until the 1970s, after which the full name was adopted to honor all three originators. Consider the general nonlinear programming problem of minimizing a twice continuously differentiable objective function f: \mathbb{R}^n \to \mathbb{R} subject to m equality constraints c(x) = 0, where c: \mathbb{R}^n \to \mathbb{R}^m is continuously differentiable, and p inequality constraints g(x) \leq 0, where g: \mathbb{R}^n \to \mathbb{R}^p is continuously differentiable: \begin{align*} \min_{x \in \mathbb{R}^n} \quad & f(x) \\ \text{subject to} \quad & c(x) = 0, \\ & g(x) \leq 0. \end{align*} To derive the KKT conditions, form the Lagrangian \mathcal{L}(x, \lambda, \mu) = f(x) + \lambda^\top c(x) + \mu^\top g(x), where \lambda \in \mathbb{R}^m and \mu \in \mathbb{R}^p are Lagrange multipliers for the equalities and inequalities, respectively. Suppose x^* is a local minimizer. For the equality constraints, the method of Lagrange multipliers yields the stationarity condition \nabla_x \mathcal{L}(x^*, \lambda^*, \mu^*) = 0 for some \lambda^*, which gives \nabla f(x^*) + \sum_{i=1}^m \lambda_i^* \nabla c_i(x^*) + \sum_{j=1}^p \mu_j^* \nabla g_j(x^*) = 0.
For the inequalities, the derivation considers the active set \mathcal{A}(x^*) = \{ j \mid g_j(x^*) = 0 \}. Perturbations around x^* must not decrease the objective while satisfying the constraints, leading to the requirement that multipliers \mu_j^* for inactive constraints (j \notin \mathcal{A}(x^*)) can be zero, but for active ones, they must be nonnegative to balance the gradient. This results in the full set of KKT conditions, which hold if a constraint qualification is satisfied:
  1. Stationarity: There exist multipliers \lambda^* \in \mathbb{R}^m and \mu^* \in \mathbb{R}^p such that \nabla f(x^*) + \lambda^{*\top} \nabla c(x^*) + \mu^{*\top} \nabla g(x^*) = 0.
  2. Primal feasibility: c(x^*) = 0 and g(x^*) \leq 0.
  3. Dual feasibility: \mu^* \geq 0.
  4. Complementary slackness: \mu_j^* g_j(x^*) = 0 for all j = 1, \dots, p.
These conditions ensure that at x^*, the objective gradient is balanced by a combination of constraint gradients with nonnegative multipliers on the active inequalities, generalizing the unconstrained case \nabla f(x^*) = 0. The KKT conditions are necessary under appropriate constraint qualifications (CQs), which prevent degeneracy and ensure the existence and uniqueness of multipliers. The linear independence constraint qualification (LICQ) requires that the set \{ \nabla c_i(x^*) \mid i=1,\dots,m \} \cup \{ \nabla g_j(x^*) \mid j \in \mathcal{A}(x^*) \} consists of linearly independent vectors. A weaker condition is the Mangasarian–Fromovitz constraint qualification (MFCQ), which states that \{ \nabla c_i(x^*) \mid i=1,\dots,m \} are linearly independent and there exists a vector d \in \mathbb{R}^n such that \nabla c(x^*)^\top d = 0 and \nabla g_j(x^*)^\top d < 0 for all j \in \mathcal{A}(x^*). Both CQs guarantee that any local minimizer satisfies the KKT conditions, with LICQ also implying unique multipliers. In quadratic programming, the KKT conditions simplify to a linear complementarity problem, serving as exact optimality criteria that sequential methods approximate for general nonlinear cases. As an illustrative example, consider the simple problem \min_{x \in \mathbb{R}} x^2 subject to x \geq 1, or equivalently g(x) = 1 - x \leq 0. The unconstrained minimizer x = 0 violates the constraint, so check x^* = 1. Here, f(x) = x^2, \nabla f(1) = 2, and \nabla g(1) = -1. Stationarity requires 2 + \mu^* (-1) = 0, so \mu^* = 2 \geq 0. Primal feasibility holds as g(1) = 0 \leq 0, and complementary slackness gives \mu^* g(1) = 2 \cdot 0 = 0. Thus, x^* = 1 satisfies the KKT conditions under LICQ (single active gradient).
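The example's KKT verification can be mirrored in a few lines of Python, with all quantities taken directly from the derivation above:

```python
# Checking the KKT conditions for min x^2 s.t. g(x) = 1 - x <= 0
# at the candidate x* = 1 with multiplier mu* = 2, as derived in the text.
x_star, mu_star = 1.0, 2.0
grad_f = 2 * x_star          # gradient of f(x) = x^2 at x*
grad_g = -1.0                # gradient of g(x) = 1 - x
g_val = 1.0 - x_star

assert abs(grad_f + mu_star * grad_g) < 1e-12   # stationarity
assert g_val <= 0                               # primal feasibility
assert mu_star >= 0                             # dual feasibility
assert abs(mu_star * g_val) < 1e-12             # complementary slackness
print("KKT conditions hold at x* = 1")
```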

Newton-Type Methods in Optimization

Newton-type methods in optimization leverage second-order information, specifically the Hessian of the objective function or of the Lagrangian, to compute search directions that accelerate convergence compared to first-order methods like gradient descent. These methods form the foundation for sequential quadratic programming (SQP) by approximating the nonlinear problem locally with quadratic models. In unconstrained optimization, the Newton method seeks to minimize a twice-differentiable function f: \mathbb{R}^n \to \mathbb{R} by iteratively solving the system \nabla^2 f(x_k) d_k + \nabla f(x_k) = 0 for the step d_k, yielding the update x_{k+1} = x_k + d_k. Near a strict local minimum where the Hessian is positive definite, this method exhibits quadratic convergence, meaning the error reduces quadratically with each iteration. For equality-constrained optimization, the method extends to the Lagrangian \mathcal{L}(x, \lambda) = f(x) + \lambda^T c(x), where c(x) = 0 are the constraints. The Newton step solves the linear system \begin{bmatrix} \nabla_{xx}^2 \mathcal{L}(x_k, \lambda_k) & \nabla c(x_k) \\ \nabla c(x_k)^T & 0 \end{bmatrix} \begin{bmatrix} d_k \\ \Delta \lambda_k \end{bmatrix} = - \begin{bmatrix} \nabla f(x_k) + \nabla c(x_k) \lambda_k \\ c(x_k) \end{bmatrix}, providing a direction that reduces both the objective and constraint violation when the linearized constraints are consistent and the Hessian on the tangent space is positive definite. This approach amounts to Newton's method applied to the KKT conditions, iteratively solving the associated nonlinear system. When exact second derivatives are unavailable or computationally expensive, quasi-Newton methods approximate the Hessian (or its inverse) using gradient information from successive iterations.
Seminal updates include the Davidon-Fletcher-Powell (DFP) formula, which modifies the inverse Hessian approximation to satisfy the secant condition \tilde{H}_{k+1} y_k = s_k while preserving symmetry and positive definiteness under suitable line searches, and the Broyden-Fletcher-Goldfarb-Shanno (BFGS) update, which often performs better in practice due to its numerical stability. These methods offer superlinear or quadratic local convergence rates, exploiting curvature information for efficient progress toward optima, but they can fail globally without safeguards like line searches or trust regions, as pure Newton steps may increase the function value far from the solution. As an illustrative example, consider minimizing the one-dimensional function f(x) = (x-1)^2 + (x-2)^4, which has its minimum near x \approx 1.41. Starting at x_0 = 0, the gradient is \nabla f(0) = -34 and the Hessian is \nabla^2 f(0) = 50, so the Newton step is d_0 = -(\nabla^2 f(0))^{-1} \nabla f(0) = 34/50 = 0.68, yielding x_1 = 0.68. Subsequent steps converge rapidly, demonstrating the inversion of the scalar Hessian to compute the direction.
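The scalar Newton iteration of the example can be sketched as follows (plain Python; the iteration count of 8 is an arbitrary choice that is more than sufficient for convergence here):

```python
def grad(x): return 2*(x - 1) + 4*(x - 2)**3   # f'(x)
def hess(x): return 2 + 12*(x - 2)**2          # f''(x), positive everywhere

x = 0.0
for _ in range(8):
    x -= grad(x) / hess(x)    # scalar Newton step: d = -f'(x)/f''(x)
print(x)   # converges to the minimizer near 1.41
```

The first step reproduces x_1 = 0.68 from the text, and the stationarity residual f'(x) shrinks to machine precision within a handful of iterations.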

Core SQP Framework

Equality-Constrained Formulation

Sequential quadratic programming (SQP) addresses equality-constrained nonlinear optimization problems of the form \min_x f(x) subject to c(x) = 0, where f: \mathbb{R}^n \to \mathbb{R} is a nonlinear objective function and c: \mathbb{R}^n \to \mathbb{R}^m consists of m nonlinear equality constraints with m < n. At each iteration k, starting from a current estimate x_k, SQP approximates the nonlinear problem by solving an equality-constrained quadratic program (QP) that captures the local curvature of the objective and linearizes the constraints. The QP subproblem is formulated as \min_d \ \nabla f(x_k)^T d + \frac{1}{2} d^T B_k d \quad \text{subject to} \quad \nabla c(x_k)^T d + c(x_k) = 0, where B_k is a symmetric positive definite matrix approximating the Hessian of the Lagrangian \mathcal{L}(x, \lambda) = f(x) + \lambda^T c(x) with respect to x. This subproblem derives from applying Newton's method to the first-order Karush-Kuhn-Tucker (KKT) conditions of the original problem, yielding a local quadratic model of the objective and a linear approximation of the constraints. The solution to the QP subproblem provides both the search direction d_k and an updated estimate of the Lagrange multipliers \lambda_{k+1}. Specifically, the KKT conditions for the QP are B_k d_k + \nabla c(x_k) \lambda_{k+1} = -\nabla f(x_k), \quad \nabla c(x_k)^T d_k = -c(x_k), which form a linear system solved for d_k and \lambda_{k+1}. The next iterate is then computed as x_{k+1} = x_k + \alpha_k d_k, where the step length \alpha_k > 0 is chosen via a line search to ensure sufficient decrease in a merit function, as discussed in subsequent sections. This update maintains feasibility approximately while progressing toward the optimum. The matrix B_k may be taken as the exact second derivative \nabla_{xx}^2 \mathcal{L}(x_k, \lambda_k), leading to a Newton-type method with quadratic local convergence near the solution, or approximated using quasi-Newton updates to reduce computational cost.
Common quasi-Newton approximations, such as the BFGS formula, update B_k based on differences in gradient information from successive iterates, preserving positive definiteness while requiring only first-derivative information. For instance, the BFGS update is B_{k+1} = B_k + \frac{y_k y_k^T}{y_k^T s_k} - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k}, where s_k = x_{k+1} - x_k and y_k = \nabla_x \mathcal{L}(x_{k+1}, \lambda_{k+1}) - \nabla_x \mathcal{L}(x_k, \lambda_k). This maintains superlinear convergence under suitable conditions. To illustrate, consider the problem \min_{x,y} x^2 + y^2 subject to x + y = 1. At the initial point (x_k, y_k) = (0, 0), we have \nabla f = (0, 0), c_k = -1, and \nabla c = (1, 1). Assuming B_k = I_2 (the 2 \times 2 identity matrix) as a simple initial approximation, the QP subproblem becomes \min_{d_x, d_y} \ \frac{1}{2} (d_x^2 + d_y^2) \quad \text{subject to} \ d_x + d_y = 1. The solution yields d = (0.5, 0.5) and \lambda_{k+1} = -0.5, providing a direction that moves toward the optimum (0.5, 0.5). Subsequent iterations refine this using updated B_k and multiplier estimates. This example demonstrates how the QP linearizes the constraint and quadratically models the objective to generate feasible progress.
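A single SQP step of this worked example can be computed by assembling and solving the QP's KKT system (NumPy assumed; the sign conventions follow the KKT equations stated earlier):

```python
import numpy as np

# One SQP step on  min x^2 + y^2  s.t.  x + y = 1,  from x_k = (0, 0), B_k = I.
x_k = np.zeros(2)
grad_f = 2 * x_k                     # gradient of f at x_k: (0, 0)
c_k = np.array([x_k.sum() - 1.0])    # constraint value c(x_k) = -1
A = np.array([[1.0, 1.0]])           # constraint Jacobian (row vector)
B = np.eye(2)                        # initial Hessian approximation

# KKT system: B d + A^T lam = -grad_f,  A d = -c
K = np.block([[B, A.T], [A, np.zeros((1, 1))]])
sol = np.linalg.solve(K, -np.concatenate([grad_f, c_k]))
d, lam = sol[:2], sol[2:]
print(x_k + d, lam)   # step lands on the optimum (0.5, 0.5); lambda = -0.5
```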

Inequality-Constrained Extension

The inequality-constrained extension of sequential quadratic programming (SQP) addresses general nonlinear programs of the form \min_x f(x) subject to c(x) = 0 and g(x) \geq 0, where f is the objective, c are the equality constraints, and g are the inequality constraints. This builds on the equality-constrained case by incorporating inequality handling into the quadratic programming (QP) subproblem at each iteration k, approximating the nonlinear program locally around the current point x_k. The QP subproblem is formulated as \min_d \ \nabla f(x_k)^T d + \frac{1}{2} d^T B_k d subject to \nabla c(x_k)^T d + c(x_k) = 0 and \nabla g_i(x_k)^T d + g_i(x_k) \geq 0 for each inequality index i, where B_k approximates the Hessian of the Lagrangian. A common approach to solving this QP subproblem is the active-set method, which estimates a working set \mathcal{A}_k \subseteq \{1, \dots, p\} of potentially active inequalities at x_k. The inequalities in \mathcal{A}_k are treated as equalities in the QP, yielding a reduced problem: \min_d \ \nabla f(x_k)^T d + \frac{1}{2} d^T B_k d \quad \text{s.t.} \quad \nabla c(x_k)^T d + c(x_k) = 0, \ \nabla g_i(x_k)^T d + g_i(x_k) = 0 \ \forall i \in \mathcal{A}_k, with the remaining inequalities checked for feasibility post-solution. The QP multipliers \mu_{k,i} \geq 0 for i \in \mathcal{A}_k are computed; if any \mu_{k,i} < 0, the constraint is removed from the active set, or if a violated inequality exists outside \mathcal{A}_k, it is added. This process iterates until a suitable QP solution is obtained, ensuring complementary slackness \mu_{k,i} ( \nabla g_i(x_k)^T d + g_i(x_k) ) = 0. The active-set strategy was foundational in early SQP developments for inequalities. For a more integrated treatment, the full Karush-Kuhn-Tucker (KKT) system of the QP subproblem can be solved directly, incorporating updates to Lagrange multipliers \lambda for equalities and \mu for inequalities.
Assuming an active set \mathcal{A}_k, the system is \begin{bmatrix} B_k & \nabla c(x_k) & \nabla g_{\mathcal{A}_k}(x_k) \\ \nabla c(x_k)^T & 0 & 0 \\ \nabla g_{\mathcal{A}_k}(x_k)^T & 0 & 0 \end{bmatrix} \begin{bmatrix} d \\ \lambda_{k+1} \\ \mu_{k+1, \mathcal{A}_k} \end{bmatrix} = - \begin{bmatrix} \nabla f(x_k) \\ c(x_k) \\ g_{\mathcal{A}_k}(x_k) \end{bmatrix}, where complementarity is enforced via the active-set choice (\mu_{k+1,i} \geq 0 implies treated as active, \mu_{k+1,i} = 0 for inactive). This primal-dual formulation facilitates efficient computation in large-scale settings. Nonconvexity in the QP subproblem arises when B_k is indefinite, potentially leading to unbounded or multiple local solutions, which can cause numerical instability or failure to progress. To mitigate this, B_k is often modified to be positive semidefinite (e.g., via eigenvalue adjustment) or embedded in a trust-region constraint \|d\| \leq \Delta_k, ensuring a well-posed QP while preserving superlinear convergence near solutions. Such safeguards are essential for practical reliability. As an illustrative example, consider \min_x x^2 subject to x \geq 0. The global minimum is at x = 0. At an interior point x_k > 0, the inequality is inactive, so the QP reduces to \min_d 2x_k d + d^2 with no constraint on d, yielding d = -x_k. At the boundary point x_k = 0, the active set includes the inequality, transforming the QP to \min_d d^2 subject to d \geq 0, with solution d = 0 and \mu_k = 0, confirming stationarity since the unconstrained QP minimum already lies on the boundary. This demonstrates how active-set handling captures boundary optima.
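The active-set logic for a single linearized inequality can be sketched as follows (NumPy assumed; `solve_qp_1ineq` is an illustrative helper that first tries the inequality inactive and, if the trial step violates it, re-solves with the constraint active):

```python
import numpy as np

def solve_qp_1ineq(B, g, a, c):
    """min_d g^T d + 0.5 d^T B d  s.t.  a^T d + c >= 0, with B positive definite."""
    d = np.linalg.solve(B, -g)                 # try the inequality inactive
    if a @ d + c >= 0:
        return d, 0.0                          # feasible as-is: multiplier mu = 0
    # Otherwise treat the inequality as active and solve the equality KKT system:
    #   B d - mu a = -g,   a^T d = -c
    K = np.block([[B, -a[:, None]], [a[None, :], np.zeros((1, 1))]])
    sol = np.linalg.solve(K, np.concatenate([-g, [-c]]))
    return sol[:-1], sol[-1]                   # mu >= 0 confirms the active choice

# QP at x_k = (1, 0) for min x^2 + y^2 s.t. x + y - 1 >= 0 (B = 2I):
d, mu = solve_qp_1ineq(np.diag([2.0, 2.0]), np.array([2.0, 0.0]),
                       np.array([1.0, 1.0]), 0.0)
print(d, mu)   # direction (-0.5, 0.5) with multiplier mu = 1
```

A nonnegative multiplier at the solution confirms the constraint was correctly kept in the working set; a negative one would signal that it should be dropped.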

Iterative Algorithm Description

Sequential quadratic programming (SQP) operates as an iterative procedure to solve nonlinear optimization problems by successively approximating the original problem with a sequence of quadratic programming (QP) subproblems. At each iteration k, given a current estimate x_k of the solution and associated Lagrange multipliers \lambda_k, the method forms a QP subproblem that uses a linearization of the constraints and a quadratic model of the objective function. The solution to this QP provides a search direction d_k and updated multipliers \lambda_{k+1}, which are then used to compute a step length \alpha_k and advance to the next iterate x_{k+1} = x_k + \alpha_k d_k. This process continues until convergence criteria are met, generating a sequence of points that ideally approach a local optimum satisfying the Karush-Kuhn-Tucker (KKT) conditions. The core iterative algorithm can be outlined in pseudocode as follows:
Initialize: Choose initial point x_0, multipliers λ_0, and Hessian approximation B_0 (e.g., identity matrix); set k = 0; choose tolerance ε > 0.
While not converged:
    Form QP subproblem: Minimize ∇f(x_k)^T d + (1/2) d^T B_k d
                        subject to ∇c(x_k)^T d + c(x_k) = 0 (equalities)
                        and ∇g(x_k)^T d + g(x_k) ≥ 0 (inequalities),
                        where c and g are the constraint functions.
    Solve QP to obtain direction d_k and multipliers λ_{k+1}.
    Compute step length α_k (e.g., via line search to reduce a merit function).
    Set x_{k+1} = x_k + α_k d_k.
    Update B_{k+1} (e.g., using BFGS formula).
    Set k = k + 1.
End while
Return x_k, λ_k as approximate solution.
This pseudocode integrates the QP subproblem formation and solution as described in the equality- and inequality-constrained formulations. Termination occurs when the iterates satisfy approximate first-order optimality conditions, typically checked by ensuring the norm of the Lagrangian gradient \|\nabla L(x_k, \lambda_k)\| < \epsilon, the constraint violation \|c(x_k)\| < \epsilon, and the inequality violation and complementarity measures are sufficiently small. These criteria verify proximity to a KKT point, with \epsilon often set to 10^{-6} or smaller depending on problem scale and desired accuracy. Initialization strategies aim to provide a reasonable starting point to promote rapid convergence and avoid infeasible QP subproblems. For equality-constrained problems, a feasible x_0 can be found by solving a least-squares problem minimizing \|c(x)\|^2; for inequalities, slack variables may be introduced to render the initial point feasible, or a phase-I procedure minimizes constraint violations before the main loop. Multipliers \lambda_0 are often set to zero or estimated from a least-squares fit to \nabla c(x_0)^T \lambda \approx -\nabla f(x_0), while B_0 is typically a positive definite matrix like the identity to ensure QP convexity. The computational cost per iteration is dominated by solving the QP subproblem, which for a problem with n variables and m constraints generally requires O(n^3) operations in the dense case using methods like active-set or interior-point solvers, though sparsity can reduce this to O(n^2) or better with specialized factorizations. Updating the Hessian approximation B_k adds O(n^2) cost via quasi-Newton formulas. To illustrate, consider the toy nonlinear program: minimize f(x, y) = x^2 + y^2 subject to g(x, y) = x + y - 1 \geq 0. The optimum is at (0.5, 0.5) with f = 0.5, where the constraint is active.
Assume an initial feasible point x_0 = (1, 0), \lambda_0 = 0, and Hessian approximation B_0 = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} (matching the true Hessian of the Lagrangian). At iteration 0, \nabla f(x_0) = (2, 0)^T, \nabla g(x_0) = (1, 1)^T, and g(x_0) = 0. The QP subproblem is: minimize 2 d_x + d_x^2 + d_y^2 subject to d_x + d_y \geq 0. Assuming the constraint binds (verified post-solution), solving yields d_0 = (-0.5, 0.5)^T and \lambda_1 = 1. Taking a full step \alpha_0 = 1 (as it reduces the objective), x_1 = (0.5, 0.5). At iteration 1, \nabla L(x_1, \lambda_1) = (0, 0)^T and g(x_1) = 0, so convergence is achieved (e.g., \|\nabla L\| < 10^{-6}). No further iterations are needed, reflecting the rapid local convergence of the method. This example highlights how SQP rapidly identifies the active set and converges in few steps for low-dimensional problems.
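Putting the pseudocode and this toy example together, a minimal driver can be sketched as follows (NumPy assumed; the single-inequality QP is handled by an inline active-set test, and the full step α = 1 is hard-coded since it is acceptable for this problem):

```python
import numpy as np

def sqp_toy(x, eps=1e-8, max_iter=20):
    """Toy SQP with exact Hessian on  min x^2 + y^2  s.t.  x + y - 1 >= 0."""
    B = np.diag([2.0, 2.0])            # exact Hessian of the Lagrangian
    a = np.array([1.0, 1.0])           # gradient of g (constant here)
    mu = 0.0
    for k in range(max_iter):
        grad_f = 2.0 * x
        g_val = x.sum() - 1.0
        d = np.linalg.solve(B, -grad_f)        # try the inequality inactive
        mu = 0.0
        if a @ d + g_val < 0.0:                # linearized inequality violated
            K = np.block([[B, -a[:, None]], [a[None, :], np.zeros((1, 1))]])
            sol = np.linalg.solve(K, np.concatenate([-grad_f, [-g_val]]))
            d, mu = sol[:-1], sol[-1]          # active-constraint KKT solve
        x = x + d                              # full step alpha_k = 1
        if np.linalg.norm(2.0 * x - mu * a) <= eps and abs(min(0.0, x.sum() - 1.0)) <= eps:
            break                              # approximate KKT point reached
    return x, mu, k

x, mu, k = sqp_toy(np.array([1.0, 0.0]))
print(x, mu, k)   # reaches (0.5, 0.5) with mu = 1 in the first iteration
```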

Analysis and Properties

Local Convergence Theory

Local convergence theory for sequential quadratic programming (SQP) methods establishes that, under suitable regularity conditions at a solution, the iterates converge rapidly to a local optimum of the nonlinear program. Specifically, when the exact Hessian of the Lagrangian is used in the quadratic subproblems, SQP exhibits quadratic convergence provided the linear independence constraint qualification (LICQ) and strong second-order sufficient conditions (SSOSC) hold. These conditions ensure the solution is isolated and the method behaves like a Newton iteration on the Karush-Kuhn-Tucker (KKT) system. If a quasi-Newton approximation to the Hessian is employed instead, convergence is superlinear, assuming the approximation satisfies standard quality conditions such as the Dennis-Moré criterion. The key assumptions underpinning this local theory include LICQ, which requires the gradients of the active constraints to be linearly independent at the solution, guaranteeing unique Lagrange multipliers and well-posedness of the KKT conditions. Additionally, SSOSC demands that the Hessian of the Lagrangian be positive definite on the tangent space to the active constraints, confirming the point is a strict local minimizer. Strict complementarity slackness is also typically assumed, ensuring nonzero multipliers for active inequality constraints, which stabilizes the identification of the active set near the solution. These assumptions are standard in nonlinear optimization and are verified at the KKT point. A proof sketch for quadratic convergence with the exact Hessian interprets the SQP step as a Newton iteration applied to the nonlinear KKT system. Let e_k = (x_k - x^*, \lambda_k - \lambda^*) denote the error in the primal-dual pair at iteration k.
Near the solution, a second-order Taylor expansion of the KKT residual gives K^* e_{k+1} = O(\|e_k\|^2), \quad K^* = \begin{pmatrix} \nabla^2_{xx} L(x^*, \lambda^*) & A(x^*) \\ A(x^*)^T & 0 \end{pmatrix}, where A(x^*) is the Jacobian of the active constraints and the quadratic term arises from the second-order expansion of the nonlinear functions. Since K^* is nonsingular under LICQ and SSOSC, this yields \|e_{k+1}\| = O(\|e_k\|^2). For quasi-Newton variants, the approximation error in the Hessian introduces a linear term, reducing the rate to superlinear, \|e_{k+1}\| = o(\|e_k\|). Sensitivity analysis reveals that the Lagrange multipliers converge at the same rate as the primal variables: quadratically with the exact Hessian and superlinearly with quasi-Newton approximations. This follows from the nonsingularity of the KKT matrix under LICQ and the continuous dependence of multipliers on the primal solution. Despite these strong guarantees, the local convergence can fail if the assumptions are violated, such as near degenerate constraints where LICQ does not hold, leading to nonunique multipliers or ill-conditioned subproblems. Similarly, absence of strict complementarity may cause erratic active-set changes, disrupting the superlinear or quadratic behavior.
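The quadratic error decay can be observed empirically; the sketch below (plain Python, reusing the one-dimensional example f(x) = (x-1)^2 + (x-2)^4 from the Newton-type methods section; the reference minimizer is obtained by first running Newton to high accuracy) tracks the error sequence, whose magnitude roughly squares from one iteration to the next:

```python
def g(x): return 2*(x - 1) + 4*(x - 2)**3   # f'(x)
def h(x): return 2 + 12*(x - 2)**2          # f''(x), positive everywhere

x_star = 1.5                                # high-accuracy reference minimizer
for _ in range(60):
    x_star -= g(x_star) / h(x_star)

x, errs = 1.2, []
for _ in range(4):
    errs.append(abs(x - x_star))            # record the current error
    x -= g(x) / h(x)                        # Newton step
print(["%.1e" % e for e in errs])           # errors shrink roughly quadratically
```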

Global Convergence Enhancements

Sequential quadratic programming (SQP) methods, while exhibiting rapid local convergence near solutions, can fail to progress from arbitrary initial points due to the lack of descent guarantees in the nonlinear problem. To address this, global convergence enhancements incorporate mechanisms that ensure monotonic decrease in a suitable merit function, preventing divergence or cycling and promoting convergence to stationary points under mild assumptions. These techniques transform the purely local Newton-like updates into globally reliable algorithms, often at the cost of slightly slower local rates. A key component is the use of merit functions to balance objective reduction and constraint satisfaction. One common choice is the penalty merit function, defined as \phi(x, \rho) = f(x) + \frac{\rho}{2} \|c(x)\|^2 + \frac{\rho}{2} \|\min(0, g(x))\|^2, where f(x) is the objective, c(x) and g(x) are the equality and inequality constraint functions, respectively, and \rho > 0 is a penalty parameter that may be updated iteratively to enforce feasibility. This quadratic penalty form provides a continuously differentiable measure of progress, with the \min(0, g(x)) term penalizing violations of inequalities while ignoring satisfied ones. The search direction from the QP subproblem is then accepted only if it yields sufficient decrease in \phi, ensuring descent even when the linearized constraints are inaccurate far from the solution. To achieve this decrease, line search procedures are employed along the computed direction d_k. Standard Armijo conditions require that the step length \alpha_k satisfies \phi(x_k + \alpha_k d_k, \rho_k) \leq \phi(x_k, \rho_k) + \eta \alpha_k \nabla \phi(x_k, \rho_k)^T d_k for some 0 < \eta < 1, often combined with Wolfe curvature conditions to promote full steps near the solution. These backtracking searches safeguard against overshooting, maintaining bounded iterates and global convergence properties.
Alternatively, trust-region variants constrain the QP subproblem to \|d\| \leq \Delta_k, solving for the direction within a local model trust region of radius \Delta_k. The radius is then adjusted—expanded if the actual merit function decrease exceeds a predicted model decrease by a factor greater than \gamma \in (0,1), or reduced otherwise—providing a geometry-based globalization that handles ill-conditioning robustly. When the Hessian approximation B_k in the QP subproblem is indefinite, leading to potentially unbounded or multiple solutions, modifications are applied to convexify it. A common approach adds a shifting term, such as \sigma_k I with \sigma_k \geq 0 chosen minimally to render B_k + \sigma_k I positive semidefinite, preserving sparsity and ensuring the QP remains bounded below while approximating the true curvature. This modification maintains descent properties without altering the null space of the constraints. Under assumptions like Lipschitz continuity of the problem functions, boundedness of iterates and Hessian approximations, and nondegeneracy, these enhancements guarantee that the sequence \{x_k\} converges globally to a point satisfying the Karush-Kuhn-Tucker conditions.
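The minimal-shift convexification described above can be sketched via an eigenvalue computation (NumPy assumed; `convexify` is an illustrative helper, and the small margin keeps the shifted matrix safely positive semidefinite):

```python
import numpy as np

def convexify(B, margin=1e-8):
    """Return B + sigma*I with the smallest sigma >= 0 making it positive semidefinite."""
    eigmin = np.linalg.eigvalsh(B).min()   # smallest eigenvalue of symmetric B
    sigma = max(0.0, margin - eigmin)      # shift needed to lift it above zero
    return B + sigma * np.eye(B.shape[0]), sigma

B = np.array([[1.0, 0.0], [0.0, -2.0]])    # indefinite Hessian approximation
B_psd, sigma = convexify(B)
print(sigma)   # about 2: exactly the shift that lifts the -2 eigenvalue
```

In practice large-scale codes avoid a full eigendecomposition (e.g., by using modified Cholesky factorizations), but the effect is the same: the QP model is made bounded below without changing the constraint null space.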

Implementation Aspects

Merit Functions and Step Adjustment

In sequential quadratic programming (SQP), merit functions serve as auxiliary scalar functions that combine the objective and constraint violation terms to guide the line search and ensure descent toward feasible and optimal points. A common choice is the \ell_1 penalty merit function, defined for problems with equality constraints c(x) = 0 and inequality constraints g(x) \leq 0 as m(x; \rho) = f(x) + \rho \|c(x)\|_1 + \rho \| [g(x)]_+ \|_1, where \rho > 0 is a penalty parameter, \| \cdot \|_1 denotes the \ell_1-norm (sum of absolute values), and [g(x)]_+ = \max(0, g(x)) elementwise. This penalizes constraint violations while promoting reduction in the objective f(x). The \ell_1 merit is nonsmooth due to the absolute values but computationally efficient, as it requires only function evaluations without gradients of the penalty terms. Exact penalty functions, such as the \ell_1 merit with sufficiently large \rho > \|\pi^*\|_\infty (where \pi^* are the optimal Lagrange multipliers), ensure that local minimizers of the merit function coincide with those of the original constrained problem, avoiding infeasible solutions that might arise from smaller penalties. In contrast, approximate penalties with smaller \rho balance feasibility and optimality progress but may require careful updating to maintain descent properties. The penalty parameter \rho is typically initialized to a modest value and updated dynamically: it is increased (e.g., doubled) if the norm of the multipliers from the quadratic program subproblem exceeds the current \rho, indicating potential inconsistency between the step and the multiplier estimates. This strategy, originally proposed in early SQP frameworks, prevents stagnation when constraints are active or ill-conditioned. Line search procedures in SQP use the merit function to determine an acceptable step length \alpha_k > 0 along the search direction d_k from the QP subproblem, ensuring sufficient decrease.
A standard variant is backtracking with the Armijo condition, which accepts \alpha_k if m(x_k + \alpha_k d_k; \rho_k) \leq m(x_k; \rho_k) + \eta \alpha_k \nabla m(x_k; \rho_k)^T d_k, where 0 < \eta < 1 (often \eta = 10^{-4}) and \alpha_k is reduced from an initial value (e.g., 1) by a factor such as 0.5 until the condition holds; since the merit function is nonsmooth, \nabla m(x_k; \rho_k)^T d_k is interpreted as the directional derivative along d_k. This condition guarantees a fraction \eta of the predicted linear decrease, promoting monotonic reduction in the merit function and contributing to global convergence when combined with positive definiteness of the Hessian approximation. To address cases where strict monotonicity leads to overly conservative steps or the Maratos effect (slow convergence near solutions), non-monotonic techniques like the watchdog method are employed. In the watchdog approach, the line search tolerates temporary increases in the merit function by referencing a "watchdog value" updated every few iterations (e.g., after q successful steps), allowing \alpha_k = 1 more readily while bounding overall progress. This enhances efficiency in practice without compromising convergence guarantees. For illustration, consider one iteration for minimizing f(x) = x^2 subject to c(x) = x - 1 = 0, starting at x_0 = 2 with \rho_0 = 1. The QP yields direction d_0 = -1, and the directional derivative of the merit function is (2x_0 + \rho_0 \operatorname{sign}(c(x_0))) d_0 = (4 + 1)(-1) = -5. With \eta = 0.1, test \alpha = 1: m(1; 1) = 1 + 1 \cdot |0| = 1, while m(2; 1) + 0.1 \cdot 1 \cdot (-5) = 5 - 0.5 = 4.5; since 1 < 4.5, accept \alpha = 1. If \rho were increased to 10 because the multiplier norm exceeded 1, the directional derivative would become (4 + 10)(-1) = -14 and the right-hand side 14 + 0.1 \cdot 1 \cdot (-14) = 14 - 1.4 = 12.6, using m(2; 10) = 4 + 10 \cdot 1 = 14; since 1 < 12.6 still holds, the unit step remains acceptable while feasibility is weighted more heavily.
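The worked example above can be checked with a short backtracking sketch; the helper names are illustrative, and the merit and directional derivative are hard-coded for this one-dimensional problem (f(x) = x^2, c(x) = x - 1):

```python
def merit(x, rho):
    # l1 merit for f(x) = x^2 subject to c(x) = x - 1 = 0.
    return x**2 + rho * abs(x - 1)

def armijo(x, d, rho, eta=0.1, shrink=0.5, max_iter=30):
    # Backtracking line search: accept alpha when the merit decrease is at
    # least eta times the predicted linear decrease along d.
    sign_c = 1.0 if x - 1 > 0 else -1.0
    dderiv = 2 * x * d + rho * sign_c * d  # directional derivative of merit
    alpha = 1.0
    for _ in range(max_iter):
        if merit(x + alpha * d, rho) <= merit(x, rho) + eta * alpha * dderiv:
            return alpha
        alpha *= shrink
    return alpha

# x0 = 2, d0 = -1: with rho = 1 the directional derivative is -5 and the
# unit step is accepted (1 <= 4.5); with rho = 10 it is -14 and the unit
# step is still accepted (1 <= 12.6).
print(armijo(2.0, -1.0, 1.0))   # 1.0
print(armijo(2.0, -1.0, 10.0))  # 1.0
```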

Constraint Handling Techniques

In sequential quadratic programming (SQP) methods, feasibility restoration is invoked during iterations when the current point is infeasible and the quadratic programming (QP) subproblem lacks a feasible solution, addressing violations of equality and inequality constraints. This technique solves an auxiliary QP that minimizes the squared norm of linearized constraint violations, formulated as minimizing \| \nabla c(x_k)^T d + c(x_k) \|^2 + \| [ \nabla g(x_k)^T d + g(x_k) ]_+ \|^2, where c denotes equality constraints, g \leq 0 inequality constraints, and [ \cdot ]_+ = \max(0, \cdot) elementwise. This phase continues until feasibility is restored or a minimal violation is achieved, allowing the algorithm to return to normal SQP steps. Degeneracy in SQP arises when constraint gradients are linearly dependent or when strict complementarity fails, potentially leading to ill-conditioned QP subproblems and stalled convergence. To handle this, stabilized SQP methods introduce regularization via an augmented Lagrangian term in the QP, incorporating a penalty parameter \rho_k > 0 to penalize constraint violations and ensure the QP Hessian remains positive definite on the null space of the constraint Jacobian, with \rho_k bounded between constants \sigma_0 and \sigma_1 to maintain numerical stability. Perturbations, such as small shifts in the right-hand side of linearized constraints, can also be applied to resolve linear dependence, promoting full-rank Jacobians without altering the solution asymptotically. These approaches achieve local superlinear convergence even under violated Mangasarian-Fromovitz constraint qualifications. Slack variables enable SQP to initiate from infeasible points by reformulating constraints g(x) \leq 0 as equalities g(x) + z = 0 with z \geq 0, transforming the QP subproblem into an equality-constrained form solvable via standard techniques.
This allows the algorithm to generate steps that gradually reduce both the objective and the infeasibility without requiring an initial feasible point, which is particularly useful for problems where finding a strictly feasible start is computationally expensive. The slacks are updated iteratively, converging to zero at feasible solutions under appropriate conditions. Scaling and preconditioning normalize constraints to mitigate numerical imbalances between equalities and inequalities, ensuring equitable treatment in the QP solver. Constraints are scaled by estimating their magnitudes (e.g., via norms of gradients or function values), applying diagonal scaling matrices to the constraint Jacobian to balance condition numbers, often reducing sensitivity to ill-conditioning by orders of magnitude in large-scale problems. Preconditioning further stabilizes the reduced Hessian in the QP by incorporating limited-memory quasi-Newton approximations, enhancing iterative solver performance for sparse systems. For instance, in resolving QP infeasibility due to near-zero constraint gradients, indicative of degeneracy near a stationary point or a manifold of solutions, feasibility restoration with slacks can perturb the linearized inequalities slightly, yielding a direction d that moves toward the feasible set while avoiding singular systems, as demonstrated in stabilized variants where regularization ensures a unique solution with bounded multipliers.
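For the single-equality-constraint case, the restoration subproblem min_d \| \nabla c(x_k)^T d + c(x_k) \|^2 has a closed-form minimum-norm solution, the Gauss-Newton step d = -c \, \nabla c / \|\nabla c\|^2. The following sketch (an illustrative assumption, not code from any named solver) shows one such step shrinking the violation of c(x) = x_1^2 + x_2^2 - 1 = 0:

```python
def restoration_step(c_val, grad_c):
    # Minimum-norm d minimizing (grad_c . d + c)^2 for one equality
    # constraint: d = -c * grad_c / ||grad_c||^2 (Gauss-Newton step).
    norm_sq = sum(g * g for g in grad_c)
    return [-c_val * g / norm_sq for g in grad_c]

# From x = (2, 0): c = 2^2 + 0^2 - 1 = 3 and grad_c = (4, 0).
x = [2.0, 0.0]
d = restoration_step(3.0, [4.0, 0.0])
x_new = [xi + di for xi, di in zip(x, d)]
print(x_new)                          # [1.25, 0.0]
print(x_new[0]**2 + x_new[1]**2 - 1)  # 0.5625: violation drops from 3
```

Because the constraint is nonlinear, one linearized step does not reach feasibility exactly; repeating the step drives the violation toward zero, mirroring the iterative restoration phase described above.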

Applications and Alternatives

Practical Applications

Sequential quadratic programming (SQP) finds extensive use in engineering applications, particularly in trajectory optimization problems within aerospace engineering. For instance, it is employed for trajectory optimization of rockets and reentry vehicles, where nonlinear constraints on vehicle dynamics, fuel consumption, and thermal loads must be satisfied to minimize flight time or maximize payload. The SNOPT solver, based on SQP, has been applied to such large-scale problems in the aerospace industry, demonstrating robustness for sparse nonlinear programs involving thousands of variables. In chemical engineering, SQP facilitates parameter estimation in reactor design under nonlinear constraints, such as material balances and kinetic rate laws, to optimize process efficiency and yield. A hybrid genetic algorithm-SQP approach has been used to determine kinetic parameters in complex reacting systems, handling the nonlinearity inherent in parameter estimation problems from experimental data. This enables accurate modeling of reactor performance while respecting constraints on temperature, pressure, and concentrations. Financial applications of SQP include portfolio optimization incorporating transaction costs and risk measures, addressing nonlinear objectives like expected utility maximization subject to cardinality or diversification constraints. In models with credibility measures for uncertain returns, SQP solves the resulting nonlinear programs to adjust portfolios dynamically, balancing returns against costs and risks such as value-at-risk. Such methods support rebalancing strategies that account for market frictions, improving practical implementability. In machine learning, SQP aids hyperparameter tuning and constrained training, where optimization involves nonlinear loss functions with bounds on model complexity or fairness constraints. For physics-informed neural networks, trust-region SQP enforces hard constraints during training, enhancing accuracy in scientific simulations by integrating physical laws directly into the optimization. This approach mitigates issues like constraint violations in models applied to constrained domains.

Comparison with Other Methods

Sequential quadratic programming (SQP) differs from interior-point methods, such as those implemented in IPOPT, primarily in its handling of inequality constraints. While interior-point methods employ barrier functions to maintain feasibility by penalizing constraint violations through logarithmic terms, SQP solves subproblems that approximate the nonlinear constraints linearly, allowing temporary infeasibility. This makes SQP generally faster per iteration, as evidenced by benchmarks where SQP achieves higher success rates (91.7%) on the CUTEst test collection compared to IPOPT's 82.3% with second-order information, though interior-point methods scale better for very large-scale problems with many inequalities due to their ability to exploit sparse structures efficiently. In contrast to augmented Lagrangian methods like ALGENCAN, which augment the Lagrangian with quadratic penalty terms on constraints to enforce feasibility and handle nonconvexity through outer iterations updating multipliers, SQP integrates second-order information directly into its quadratic subproblems, often requiring fewer outer loops for convergence. Augmented Lagrangian approaches excel in robustness for highly nonconvex problems by decoupling constraint satisfaction from objective minimization, but they typically demand more iterations overall due to the need for multiple penalty parameter adjustments. Stabilized variants combining SQP with augmented Lagrangian elements can mitigate these issues, achieving superlinear convergence under second-order sufficient conditions while preserving global convergence guarantees. Compared to first-order gradient-based methods, such as gradient descent, SQP leverages second-order approximations for more accurate step directions, leading to faster local convergence near solutions but at higher computational cost per iteration. First-order methods, relying solely on gradients, are preferable for very large-scale problems where Hessian computation is prohibitive, offering scalability at the expense of slower overall convergence and lower accuracy on medium-scale nonlinear problems with sparse structures.
Empirical comparisons on the CUTEst test set highlight SQP's strengths in medium-scale structural optimization, where implementations like SNOPT solve over 90% of problems with fewer function evaluations than interior-point alternatives, particularly when the degrees of freedom are below 3000. For instance, SQP methods based on quasi-Newton Hessian updates demonstrate superior iteration efficiency on constrained problems, outperforming first-order approaches. SQP is particularly suited for problems featuring sparse Hessians and a moderate number of constraints, where its quadratic subproblems can be solved efficiently; in such scenarios, it outperforms interior-point methods on speed and augmented Lagrangian methods on outer-loop count, while first-order methods are better reserved for ultra-large, derivative-light applications.
