
Constrained least squares

Constrained least squares is an optimization method in linear algebra and statistics that extends the classical least squares approach by incorporating linear equality constraints, aiming to find the vector x that minimizes the squared Euclidean norm \|Ax - b\|^2 subject to the constraint Cx = d, where A is an m \times n design matrix, b is an m-dimensional observation vector, C is a p \times n constraint matrix with p < n, and d is a p-dimensional vector. This formulation ensures the solution fits the data as closely as possible while exactly satisfying specified linear conditions, such as continuity or derivative requirements in curve fitting. The problem can be solved using techniques like Lagrange multipliers, which transform it into an unconstrained system via the Karush-Kuhn-Tucker (KKT) conditions, or through a QR or null-space factorization of the constraint matrix to reduce the dimensionality and improve numerical stability. In the Lagrange approach, the solution satisfies the augmented equation \begin{bmatrix} A^T A & C^T \\ C & 0 \end{bmatrix} \begin{bmatrix} x \\ \lambda \end{bmatrix} = \begin{bmatrix} A^T b \\ d \end{bmatrix}, where \lambda is the vector of Lagrange multipliers, assuming the stacked matrix \begin{bmatrix} A \\ C \end{bmatrix} has full column rank so that the solution is unique. Alternative methods, such as null-space projection or variable elimination, eliminate constrained variables to solve a reduced unconstrained problem. Constrained least squares finds broad applications in fields requiring data fitting with restrictions, including spline fitting for smooth curve approximation with continuity constraints, signal processing under linearity assumptions, and control theory for optimizing inputs in linear quadratic regulators with state equality conditions. It also appears in robust estimation problems, such as determining centers of rotation without sensitivity to local minima.

Introduction

Definition

Constrained least squares is an optimization problem that involves minimizing the squared Euclidean norm of the residuals \|Ax - b\|_2^2, where A is the m \times n design matrix, x is the n-dimensional parameter vector to be estimated, and b is the m-dimensional observation vector, subject to additional linear constraints on x. This formulation extends the standard least squares approach by incorporating restrictions that ensure the solution adheres to specified conditions. Ordinary least squares represents the unconstrained special case of this problem. The general form for equality-constrained least squares is to minimize over x the objective function \frac{1}{2} x^T A^T A x - (A^T b)^T x subject to C x = d, where C is the p \times n constraint matrix with p < n and d is the p-dimensional constraint vector. The factor of \frac{1}{2} is conventional in optimization to simplify derivatives, and the constant term b^T b is omitted as it does not affect the minimizer. Constraints in this framework typically arise from domain knowledge requiring the parameters to satisfy physical, economic, or statistical conditions, such as non-negativity or fixed sums that enforce normalization or balance. For instance, in simple linear regression, one might fit a line y = \beta_0 + \beta_1 x subject to the constraint \beta_0 + \beta_1 = 1, which ensures the fitted line passes through the point (1, 1), reflecting a known physical or boundary condition at x = 1.
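
A minimal numerical sketch of this kind of constrained fit is given below, using hypothetical data points and solving the KKT system from the lead directly with NumPy; the data values and variable names are illustrative assumptions, not part of the original text.

```python
# Sketch: fit y = b0 + b1*x subject to b0 + b1 = 1 by solving the KKT system
# [[A^T A, C^T], [C, 0]] [beta; lam] = [A^T b; d]  (hypothetical data).
import numpy as np

xs = np.array([0.5, 1.5, 2.0, 3.0])          # hypothetical abscissae
ys = np.array([0.9, 1.8, 2.1, 3.2])          # hypothetical observations
A = np.column_stack([np.ones_like(xs), xs])  # design matrix [1, x]
b = ys
C = np.array([[1.0, 1.0]])                   # constraint b0 + b1 = 1
d = np.array([1.0])

n, p = A.shape[1], C.shape[0]
KKT = np.block([[A.T @ A, C.T],
                [C, np.zeros((p, p))]])
rhs = np.concatenate([A.T @ b, d])
sol = np.linalg.solve(KKT, rhs)
beta, lam = sol[:n], sol[n:]
print(beta, C @ beta)  # fitted coefficients; C @ beta equals 1 by construction
```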

Relation to unconstrained least squares

Constrained least squares extends the unconstrained problem by incorporating additional restrictions on the solution vector, modifying the optimization landscape while retaining the core objective of minimizing the squared residual norm. In the unconstrained case, the solution to the overdetermined system Ax \approx b is given by the ordinary least squares estimator \hat{x} = (A^T A)^{-1} A^T b, assuming A has full column rank so that the inverse exists. This estimator projects the response vector b orthogonally onto the column space of the design matrix A, yielding the point in that subspace closest to b in the Euclidean norm. The introduction of constraints alters the feasible solution set, restricting \hat{x} to lie within a specific region defined by the constraints, which can lead to solutions that are more interpretable or realistic but potentially biased relative to the unconstrained optimum. For equality constraints Cx = d, the feasible set forms an affine subspace of the original parameter space; for inequality constraints, it defines a convex polyhedron. Geometrically, the constrained solution corresponds to projecting b onto the image of the feasible set under A, a subset of the column space, rather than onto the full column space, which may shift the optimum away from the unconstrained solution when the constraints are active. Under the Gauss-Markov assumptions (linearity, strict exogeneity, homoscedasticity, and no perfect multicollinearity), the unconstrained estimator is the best linear unbiased estimator, with minimum variance among linear unbiased alternatives, but it risks producing solutions that violate domain-specific realism, such as non-negativity of coefficients. In contrast, constrained least squares trades this unbiasedness for enforced feasibility, often resulting in biased estimators with reduced variance, particularly when the constraints incorporate prior knowledge about the parameters. If the unconstrained solution already satisfies the constraints (such as when C \hat{x} = d holds in the equality-constrained case), the constrained and unconstrained solutions coincide exactly, preserving all properties of the least squares estimator. This highlights that constraints only modify the solution when they bind, so the constrained approach generalizes the unconstrained method without unnecessary alteration.

Mathematical Formulation

Equality constraints

In the equality constrained least squares problem, the objective is to minimize the squared Euclidean norm of the residual, \|Ax - b\|_2^2, subject to linear constraints Cx = d. Here, A is an m \times n matrix with m \geq n, b is an m-vector, C is a p \times n matrix typically with p < n, and d is a p-vector. This setup assumes the problem is feasible, meaning the constraints are consistent and there exists at least one x satisfying Cx = d. Feasibility requires that d lies in the range of C, ensuring the affine subspace defined by the constraints is non-empty. For the problem to yield a unique solution, rank conditions must hold: C should have full row rank (rank p) to avoid redundant constraints, and the stacked matrix [A; C] should have full column rank (rank n) to ensure uniqueness and invertibility of the associated KKT system. These conditions prevent underdetermined or inconsistent systems that could lead to multiple or no solutions. The problem can be reformulated as a quadratic program: \begin{align*} \min_x &\quad \frac{1}{2} x^\top Q x + c^\top x \\ \text{s.t.} &\quad Cx = d, \end{align*} where Q = A^\top A and c = -A^\top b. This quadratic objective arises directly from expanding the least squares term and ignoring the constant b^\top b. A common application appears in portfolio optimization, such as index tracking, where asset weights x are chosen to minimize the tracking error \|Rx - r_\text{index}\|_2^2 (with R the matrix of historical asset returns and r_\text{index} the benchmark returns) subject to the budget constraint \mathbf{1}^\top x = 1, ensuring the weights sum to one. This enforces full investment while approximating benchmark performance, as in the sketch below.
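
The following sketch illustrates the index-tracking formulation just described on synthetic returns; the dimensions, seed, and weights are assumptions made purely for illustration, and the budget constraint is handled through the same KKT system used for general equality constraints.

```python
# Sketch: index tracking as equality-constrained least squares,
# minimize ||R x - r_index||^2 subject to 1^T x = 1 (synthetic data).
import numpy as np

rng = np.random.default_rng(0)
T, n_assets = 250, 5                                  # hypothetical sample size and universe
R = rng.normal(0.0, 0.02, size=(T, n_assets))         # asset return matrix
r_index = R @ np.array([0.3, 0.3, 0.2, 0.1, 0.1]) + rng.normal(0, 1e-3, T)

C = np.ones((1, n_assets))                            # budget constraint 1^T x = 1
d = np.array([1.0])
KKT = np.block([[R.T @ R, C.T],
                [C, np.zeros((1, 1))]])
rhs = np.concatenate([R.T @ r_index, d])
w = np.linalg.solve(KKT, rhs)[:n_assets]              # tracking weights
print(w, w.sum())                                     # weights sum to one
```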

Inequality constraints

In constrained least squares problems incorporating inequality constraints, the objective is to minimize the squared Euclidean norm of the residual subject to linear inequalities, formulated as \min_{x} \|Ax - b\|_2^2 \quad \text{subject to} \quad Gx \leq h, where A \in \mathbb{R}^{m \times n} is the design matrix, b \in \mathbb{R}^m is the response vector, G \in \mathbb{R}^{p \times n} is the constraint matrix, and h \in \mathbb{R}^p is the bound vector. Equality constraints Cx = d may also be included alongside the inequalities, and the formulation reduces to the equality-only case when no inequalities are present. This setup yields a convex quadratic program, as the objective is convex quadratic and the feasible set is a polyhedron defined by the linear inequalities. Under standard assumptions, the feasible set \{x \mid Gx \leq h, Cx = d\} is a convex polyhedron, ensuring the problem is convex. The least-squares objective \|Ax - b\|_2^2 is strictly convex provided A has full column rank, guaranteeing a unique minimizer when the feasible set is nonempty. A prominent special case is non-negative least squares (NNLS), where the constraints simplify to x \geq 0, corresponding to G = -I_n and h = 0. This arises in applications requiring non-negative parameters, such as image processing, where negative values are infeasible. Optimality in this formulation is characterized by the Karush-Kuhn-Tucker (KKT) conditions, which extend Lagrange multipliers to handle inequalities via complementary slackness and non-negativity of the dual variables. For instance, in regression models for demand, positivity constraints on coefficients ensure that estimated elasticities or impacts remain non-negative, aligning with economic interpretations where negative effects are implausible.
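
A small sketch of the NNLS special case follows, using SciPy's Lawson-Hanson-based solver on synthetic data; the shapes, seed, and true coefficients are assumptions for illustration only.

```python
# Sketch: non-negative least squares, i.e. G = -I, h = 0 so that x >= 0.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)
A = rng.normal(size=(50, 4))
x_true = np.array([0.0, 1.5, 0.0, 2.0])      # hypothetical non-negative coefficients
b = A @ x_true + rng.normal(0, 0.05, 50)

x_hat, resid_norm = nnls(A, b)               # returns the solution and ||Ax - b||_2
print(x_hat)                                 # all entries are >= 0 by construction
```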

Solution Methods

Lagrange multipliers for equality constraints

To solve the equality-constrained least squares problem of minimizing \frac{1}{2} \|Ax - b\|_2^2 subject to Cx = d, where A \in \mathbb{R}^{m \times n}, C \in \mathbb{R}^{p \times n}, b \in \mathbb{R}^m, and d \in \mathbb{R}^p with p < n, the method of Lagrange multipliers incorporates the constraints into the objective via a Lagrangian function. The Lagrangian is constructed as L(x, \lambda) = \frac{1}{2} \|Ax - b\|_2^2 + \lambda^T (Cx - d), where \lambda \in \mathbb{R}^p is the vector of Lagrange multipliers. The stationarity conditions for a minimum are obtained by setting the partial derivatives to zero: \nabla_x L = A^T (Ax - b) + C^T \lambda = 0, \quad \nabla_\lambda L = Cx - d = 0. These equations enforce both the gradient of the objective adjusted by the constraints and the satisfaction of the equalities. Combining the stationarity conditions yields the augmented linear system \begin{bmatrix} A^T A & C^T \\ C & 0 \end{bmatrix} \begin{bmatrix} x \\ \lambda \end{bmatrix} = \begin{bmatrix} A^T b \\ d \end{bmatrix}, which can be solved for x and \lambda using block elimination or direct factorization of the KKT matrix, assuming it is invertible. An explicit solution for x expresses the constrained minimizer in terms of the unconstrained solution x_{\text{unc}} = (A^T A)^{-1} A^T b: x = x_{\text{unc}} - (A^T A)^{-1} C^T \left( C (A^T A)^{-1} C^T \right)^{-1} (C x_{\text{unc}} - d). This formula adjusts the unconstrained solution by a correction term that moves it onto the affine constraint set Cx = d. The solution is unique provided that the stacked matrix \begin{bmatrix} A \\ C \end{bmatrix} has full column rank, ensuring that the positive semidefinite matrix A^T A, together with the constraints, yields a unique minimizer. As an illustrative example, consider fitting a line y = mx + c to the data points (1,1), (2,2), (3,3) subject to the constraint m + c = 2. Here, A = \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{bmatrix}, b = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, C = \begin{bmatrix} 1 & 1 \end{bmatrix}, and d = 2. The unconstrained solution is x_{\text{unc}} = \begin{bmatrix} m \\ c \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, with C x_{\text{unc}} - d = -1. Then A^T A = \begin{bmatrix} 14 & 6 \\ 6 & 3 \end{bmatrix}, (A^T A)^{-1} = \frac{1}{6} \begin{bmatrix} 3 & -6 \\ -6 & 14 \end{bmatrix}, and C (A^T A)^{-1} C^T = \frac{5}{6}. The correction term is -(A^T A)^{-1} C^T \left( \frac{5}{6} \right)^{-1} (-1) = \begin{bmatrix} -0.6 \\ 1.6 \end{bmatrix}, yielding the constrained solution x = \begin{bmatrix} 0.4 \\ 1.6 \end{bmatrix}. This satisfies the constraint and minimizes the residual sum of squares under it.
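
The worked example above can be checked numerically with a few lines of NumPy; this is a verification sketch of the correction formula, not an additional method.

```python
# Check of the worked example: fit y = m*x + c to (1,1), (2,2), (3,3)
# subject to m + c = 2, using the explicit correction formula.
import numpy as np

A = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])
C = np.array([[1.0, 1.0]])
d = np.array([2.0])

AtA_inv = np.linalg.inv(A.T @ A)
x_unc = AtA_inv @ (A.T @ b)                       # unconstrained solution [1, 0]
S = C @ AtA_inv @ C.T                             # 1x1 matrix, equal to 5/6 here
x = x_unc - AtA_inv @ C.T @ np.linalg.solve(S, C @ x_unc - d)
print(x)                                          # [0.4, 1.6]; satisfies m + c = 2
```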

Quadratic programming for inequality constraints

Constrained least squares problems with inequality constraints can be reformulated as quadratic programs (QPs), which are optimization problems of the form \min_x \ \frac{1}{2} x^T (A^T A) x - (A^T b)^T x \quad \text{subject to} \quad Gx \leq h, \ Cx = d, where the objective function arises from the squared residual \|Ax - b\|^2/2, G and h define the inequality constraints, and C and d handle any equality constraints. This standard QP structure ensures the problem is solvable using established optimization techniques, as the Hessian A^T A is positive semidefinite. Active set methods address this QP by iteratively maintaining a working set of active inequality constraints, treating them as equalities and solving the resulting equality-constrained subproblem via Lagrange multipliers, then updating the set by adding violated constraints or removing non-binding ones based on dual feasibility checks. These methods are particularly efficient for sparse or structured problems, though they can require up to exponentially many iterations in the worst case because of the combinatorial number of possible active sets. A seminal example is the Lawson-Hanson algorithm for non-negative least squares (NNLS), which applies active set principles to minimize \|Ax - b\|^2 subject to x \geq 0 by starting with all variables at zero (all non-negativity constraints active) and moving variables into the passive set as needed while ensuring descent in the objective. In contrast, interior-point methods incorporate the inequality constraints through logarithmic barrier functions or primal-dual formulations, transforming the QP into a sequence of unconstrained (or equality-constrained) problems solved via Newton's method on the associated Karush-Kuhn-Tucker (KKT) system, with central path-following to approach feasibility and optimality. These approaches guarantee polynomial-time convergence, typically O(\sqrt{n} \log(1/\epsilon)) iterations for an \epsilon-optimal solution in convex QPs with n variables, making them suitable for large-scale instances despite higher per-iteration costs from solving linear systems. Active set methods, while potentially superlinear in practice for well-behaved problems, lack such worst-case guarantees but often outperform interior-point methods on smaller or degenerate cases. Practical implementations of these QP solvers for constrained least squares are available in numerical software, such as the quadprog function in MATLAB, which supports both active set and interior-point algorithms, and the CVXOPT library in Python, which provides solvers such as solvers.qp based on interior-point methods. A representative application is solving NNLS to estimate non-negative regression coefficients, such as in image processing where pixel intensities cannot be negative; for instance, given a data matrix A \in \mathbb{R}^{m \times n} and response b \in \mathbb{R}^m, the QP formulation with G = -I_n and h = 0 yields coefficients x \geq 0 minimizing the residual norm, often computed efficiently via the Lawson-Hanson algorithm for problems with up to thousands of variables.
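
As a concrete illustration of the QP formulation, the sketch below poses the NNLS special case to CVXOPT's interior-point solver solvers.qp (mentioned above); the synthetic data, seed, and coefficient values are assumptions for the example.

```python
# Sketch: NNLS as a QP in CVXOPT,
# minimize (1/2) x^T (A^T A) x - (A^T b)^T x  subject to  -x <= 0.
import numpy as np
from cvxopt import matrix, solvers

rng = np.random.default_rng(2)
A = rng.normal(size=(30, 3))
b = A @ np.array([1.0, 0.0, 2.0]) + rng.normal(0, 0.1, 30)

P = matrix(A.T @ A)                         # QP Hessian
q = matrix((-A.T @ b).reshape(-1, 1))       # linear term
G = matrix(-np.eye(3))                      # encodes x >= 0 as -x <= 0
h = matrix(np.zeros((3, 1)))

solvers.options['show_progress'] = False
sol = solvers.qp(P, q, G, h)
x = np.array(sol['x']).ravel()
print(x)                                    # non-negative least-squares coefficients
```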

Applications

Linear regression with constraints

Constrained linear regression seeks to estimate the parameter vector \beta by minimizing the squared Euclidean norm of the residuals \|y - X\beta\|_2^2 subject to linear constraints on \beta, such as non-negativity restrictions \beta_i \geq 0 for all i or monotonicity conditions \beta_1 \leq \beta_2 \leq \dots \leq \beta_p. These formulations arise in statistical modeling where domain-specific knowledge imposes feasible regions on the coefficients, transforming the standard ordinary least squares problem into a quadratic program. Imposing such constraints enhances model interpretability and fit by integrating prior information, for instance, requiring non-negative elasticities in demand models based on economic theory. In the presence of multicollinearity among predictors, the constraints reduce the impact of near-singular design matrices by biasing estimates toward the constraint boundary, leading to more stable predictions. Additionally, in high-dimensional regressions where the number of parameters approaches or exceeds the sample size, these restrictions serve as a form of regularization, mitigating overfitting and improving out-of-sample performance. Constrained estimators are typically biased relative to their unconstrained counterparts but often achieve lower mean squared error, especially when the true parameters lie within the constraint region or the model is misspecified. Under standard regularity conditions, including constraint qualification, these estimators exhibit asymptotic normality, enabling standard inference procedures with adjusted covariance matrices. A prominent special case is isotonic regression, which enforces ordered parameters \beta_1 \leq \beta_2 \leq \dots \leq \beta_p and can be solved efficiently using the pool-adjacent-violators algorithm, a linear-time procedure that iteratively merges adjacent violators of the order until consistency is achieved (see the sketch below). This approach is particularly useful in scenarios requiring monotonic relationships, such as estimating wage returns to successive years of education, where the ordering constraints ensure non-decreasing marginal benefits, aligning with theoretical expectations of positive and potentially increasing returns.
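
The following is a minimal, unweighted sketch of the pool-adjacent-violators idea for isotonic regression; the function name and the toy input are illustrative assumptions rather than a reference implementation.

```python
# Sketch: pool-adjacent-violators for the non-decreasing sequence
# closest to y in the least-squares sense (unweighted case).
import numpy as np

def pava(y):
    blocks_s, blocks_c = [], []          # running block sums and counts
    for v in y:
        blocks_s.append(float(v))
        blocks_c.append(1)
        # merge the last two blocks while their means violate monotonicity
        while len(blocks_s) > 1 and blocks_s[-2] / blocks_c[-2] > blocks_s[-1] / blocks_c[-1]:
            s, c = blocks_s.pop(), blocks_c.pop()
            blocks_s[-1] += s
            blocks_c[-1] += c
    # expand block means back to the original length
    return np.concatenate([np.full(c, s / c) for s, c in zip(blocks_s, blocks_c)])

print(pava([1.0, 3.0, 2.0, 4.0, 3.5]))   # [1.0, 2.5, 2.5, 3.75, 3.75]
```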

Signal processing and control systems

In signal processing, constrained least squares methods address inverse problems by enforcing physical constraints to ensure realistic solutions, such as sparsity in seismic data or positivity in image restoration. In seismic processing, sparseness-constrained least-squares inversion reconstructs wavefields from bandlimited recordings by minimizing the squared error between observed and modeled data while penalizing non-sparse reflectivity models, which reflects the sparse nature of subsurface reflectors. This approach enhances resolution and suppresses artifacts in migration imaging. Similarly, in image restoration, constrained least-squares techniques solve the Fredholm integral equation by minimizing the least-squares misfit subject to positivity constraints on pixel intensities, preventing unphysical negative values and improving edge preservation in blurred or noisy images. Extensions of Kalman filtering incorporate constraints for state estimation in dynamic systems, particularly where physical bounds like non-negativity must be respected, as in tracking applications. The constrained Kalman filter projects the unconstrained state estimate onto the feasible set after the update step, using methods like perfect measurements or inequality projections to enforce bounds without violating the filter's optimality properties. For example, in tracking non-negative quantities such as concentrations or populations, this projection ensures the posterior estimate lies within the constraint region, reducing estimation error in scenarios with noisy sensor data (a minimal projection sketch appears below). In control systems, constrained least squares formulations optimize the linear quadratic regulator (LQR) under input constraints, deriving optimal gains by solving a linearly constrained least-squares problem that minimizes the quadratic cost while respecting actuator limits. This approach extends the standard Riccati solution to handle equality or inequality constraints on states or inputs, enabling stable control in systems where actuator saturation occurs, and the resulting problems can be solved efficiently enough for real-time applications. Constrained total least squares enhances superresolution in array signal processing by estimating frequencies or directions of arrival under algebraic constraints on the signal parameters, outperforming unconstrained methods in resolving closely spaced sources amid noise. In uniform linear arrays, this technique minimizes errors in both the data and the model coefficients while enforcing constraints such as Vandermonde structure, achieving higher accuracy in direction-of-arrival estimation for applications such as radar and sonar. A representative example is beamforming in antenna arrays, where equality constraints on the weights steer nulls toward interferers via linearly constrained least squares. The Frost algorithm minimizes output power subject to linear constraints that preserve the desired signal direction and null specific interference angles, forming deep nulls narrower than the array's beamwidth for improved signal-to-interference ratio in adaptive arrays.
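
The sketch below shows the projection step used in equality-constrained Kalman filtering as described above: the unconstrained posterior estimate is moved onto the constraint set by a covariance-weighted constrained least-squares correction. The matrices D, e, the state values, and the covariance are illustrative assumptions.

```python
# Sketch: project an unconstrained Kalman estimate x_hat (covariance P)
# onto the set {x : D x = e} via x_proj = x_hat - P D^T (D P D^T)^{-1} (D x_hat - e).
import numpy as np

def project_onto_constraint(x_hat, P, D, e):
    S = D @ P @ D.T
    return x_hat - P @ D.T @ np.linalg.solve(S, D @ x_hat - e)

x_hat = np.array([1.2, -0.3])            # unconstrained estimate (second state went negative)
P = np.array([[0.5, 0.1],
              [0.1, 0.2]])               # posterior covariance
D = np.array([[0.0, 1.0]])               # enforce x[1] = 0 (active non-negativity bound)
e = np.array([0.0])

print(project_onto_constraint(x_hat, P, D, e))   # second component becomes exactly 0
```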
