Constrained least squares
Constrained least squares is an optimization method in linear algebra and statistics that extends the classical least squares approach by incorporating linear equality constraints, aiming to find the vector x that minimizes the squared Euclidean norm \|Ax - b\|^2 subject to the constraint Cx = d, where A is an m \times n design matrix, b is an m-dimensional observation vector, C is a p \times n constraint matrix with p < n, and d is a p-dimensional vector.[1] This formulation ensures the solution fits the data as closely as possible while exactly satisfying specified linear conditions, such as continuity or derivative requirements in curve fitting.[2]

The problem can be solved using techniques like Lagrange multipliers, which transform it into an unconstrained system via the Karush-Kuhn-Tucker (KKT) conditions,[3] or through QR factorization of the constraint matrix to reduce the dimensionality and improve numerical stability.[1] In the Lagrange approach, the solution satisfies the augmented equation \begin{bmatrix} A^T A & C^T \\ C & 0 \end{bmatrix} \begin{bmatrix} x \\ \lambda \end{bmatrix} = \begin{bmatrix} A^T b \\ d \end{bmatrix}, where \lambda is the vector of Lagrange multipliers, assuming the combined matrix \begin{bmatrix} A \\ C \end{bmatrix} has full column rank so that the solution is unique.[2] Alternative methods, such as null-space projection or elimination, remove the constrained degrees of freedom and solve a reduced unconstrained least squares problem.[1]

Constrained least squares finds broad applications in fields requiring data fitting with restrictions, including spline interpolation for smooth curve approximation with continuity constraints, signal processing for deconvolution and smoothing under linearity assumptions, and control theory for optimizing inputs in linear quadratic regulators with state equality conditions.[2] It also appears in robust estimation problems, such as determining centers of rotation in imaging without sensitivity to local minima.[4]
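As a minimal illustration of the augmented system above, the following Python sketch assembles and solves the block KKT equations with NumPy; the matrices, their sizes, and the variable names are synthetic assumptions chosen purely for demonstration, not taken from the cited sources.

```python
import numpy as np

# Minimal sketch: assemble and solve the augmented (KKT) system from the text.
# The matrices A, b, C, d and their sizes are synthetic, purely for illustration.
rng = np.random.default_rng(0)
m, n, p = 8, 4, 2
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
C = rng.standard_normal((p, n))
d = rng.standard_normal(p)

# [[A^T A, C^T], [C, 0]] [x; lambda] = [A^T b; d]
K = np.block([[A.T @ A, C.T],
              [C, np.zeros((p, p))]])
rhs = np.concatenate([A.T @ b, d])
sol = np.linalg.solve(K, rhs)
x, lam = sol[:n], sol[n:]

print(np.allclose(C @ x, d))  # True: the equality constraints hold at the solution
```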
Introduction
Definition
Constrained least squares is an optimization problem that involves minimizing the squared Euclidean norm of the residuals \|Ax - b\|_2^2, where A is the m \times n design matrix, x is the n-dimensional parameter vector to be estimated, and b is the m-dimensional observation vector, subject to additional linear constraints on x.[2] This formulation extends the standard least squares approach by incorporating restrictions that ensure the solution adheres to specified conditions. Ordinary least squares represents the unconstrained special case of this problem.[5]

The general form for equality-constrained least squares is to minimize over x the quadratic objective function \frac{1}{2} x^T A^T A x - (A^T b)^T x subject to C x = d, where C is the p \times n constraint matrix with p < n and d is the p-dimensional constraint vector.[2] The factor of \frac{1}{2} is conventional in quadratic programming to simplify derivatives, and the constant term \frac{1}{2} b^T b is omitted because it does not affect the minimizer.[3]

Constraints in this framework typically arise from domain knowledge requiring the parameters to satisfy physical, economic, or statistical conditions, such as non-negativity (for inequality cases) or fixed sums that enforce normalization or balance.[3] For instance, in linear regression, one might fit a line y = \beta_0 + \beta_1 x subject to the constraint \beta_0 + \beta_1 = 1, which forces the fitted line to pass through the point (1, 1), reflecting a known physical or boundary condition at x = 1.[3]
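The following short NumPy sketch, using made-up data points and a made-up trial vector, checks numerically that the quadratic objective above differs from \frac{1}{2}\|Ax - b\|_2^2 only by the omitted constant \frac{1}{2} b^T b.

```python
import numpy as np

# Numerical check that (1/2) x^T A^T A x - (A^T b)^T x equals (1/2)||Ax - b||^2
# minus the constant (1/2) b^T b.  A, b, and the trial x are made-up values.
A = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])   # hypothetical design matrix
b = np.array([1.2, 1.9, 3.1])                        # hypothetical observations
x = np.array([0.7, 0.4])                             # any trial parameter vector

quad = 0.5 * x @ (A.T @ A) @ x - (A.T @ b) @ x
half_sq_norm = 0.5 * np.linalg.norm(A @ x - b) ** 2

print(np.isclose(quad + 0.5 * b @ b, half_sq_norm))  # True
```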
Relation to unconstrained least squares
Constrained least squares extends the unconstrained least squares problem by incorporating additional restrictions on the solution vector, modifying the optimization landscape while retaining the core objective of minimizing the squared residual norm. In the unconstrained case, the solution to the linear system Ax \approx b is given by the ordinary least squares estimator \hat{x} = (A^T A)^{-1} A^T b, assuming A has full column rank so that A^T A is invertible.[1] This estimator projects the observation vector b orthogonally onto the column space of the design matrix A, yielding the point in that subspace closest to b in the Euclidean norm.[1]

The introduction of constraints alters the feasible solution set, restricting \hat{x} to lie within a specific region defined by the constraints, which can lead to solutions that are more interpretable or realistic but potentially biased relative to the unconstrained optimum. For equality constraints Cx = d, the feasible set is an affine subspace of the parameter space; for inequality constraints, it is a convex polyhedron.[1] Geometrically, the constrained fit A\hat{x} is the projection of b onto the image under A of the feasible set, rather than onto the full column space of A, so the optimum may shift away from the unconstrained projection whenever the constraints are active.[1]

Under the Gauss-Markov assumptions (linearity, strict exogeneity, homoscedasticity, and no perfect multicollinearity), the unconstrained estimator is the best linear unbiased estimator, with minimum variance among linear unbiased alternatives, but it risks producing solutions that violate domain-specific realism, such as non-negativity of coefficients.[6] In contrast, constrained least squares trades this unbiasedness for enforced feasibility, often yielding biased estimators with reduced variance, particularly when the constraints encode prior knowledge about the parameters.[7]

If the unconstrained solution already satisfies the constraints, for example when C\hat{x} = d holds for equality constraints, the constrained and unconstrained solutions coincide exactly, preserving all properties of the ordinary least squares estimator.[1] This highlights that constraints only modify the solution when they bind, so the constrained approach generalizes the unconstrained method without unnecessary alteration.
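As a rough numerical illustration of the projection property described above, the sketch below (with synthetic data chosen purely for demonstration) verifies that the ordinary least squares residual is orthogonal to the column space of A; a constraint that already held at \hat{x} would then be non-binding.

```python
import numpy as np

# Illustration of the projection view of ordinary least squares on synthetic data:
# the residual b - A x_hat is orthogonal to every column of A, so A x_hat is the
# orthogonal projection of b onto the column space of A.
rng = np.random.default_rng(1)
A = rng.standard_normal((10, 3))
b = rng.standard_normal(10)

x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)   # unconstrained (OLS) solution
residual = b - A @ x_hat

print(np.allclose(A.T @ residual, 0.0))          # True: orthogonality of the residual
```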
Mathematical Formulation
Equality constraints
In the equality constrained least squares problem, the objective is to minimize the squared Euclidean norm of the residual, \|Ax - b\|_2^2, subject to linear equality constraints Cx = d. Here, A is an m \times n matrix with m \geq n, b is an m-vector, C is a p \times n constraint matrix typically with p < n, and d is a p-vector.[1] This setup assumes the problem is feasible, meaning the constraints are consistent and there exists at least one x satisfying Cx = d.[1] Feasibility requires that d lies in the range of C, ensuring the affine subspace defined by the constraints is non-empty.[8]

For the problem to yield a unique solution, rank conditions must hold: C should have full row rank (rank p) to avoid redundant constraints, and the stacked matrix [A; C] should have full column rank (rank n) to ensure compatibility and invertibility of the associated KKT system.[1] These conditions prevent underdetermined or inconsistent systems that could lead to multiple or no solutions.[8]

The problem can be reformulated as a quadratic program: \begin{align*} \min_x &\quad \frac{1}{2} x^\top Q x + c^\top x \\ \text{s.t.} &\quad Cx = d, \end{align*} where Q = A^\top A and c = -A^\top b.[1] This quadratic objective arises directly from expanding \frac{1}{2}\|Ax - b\|_2^2 and dropping the constant term \frac{1}{2} b^\top b, which does not affect the minimizer.[8]

A common application appears in portfolio optimization, such as index tracking, where asset weights x are chosen to minimize the tracking error \|Rx - r_\text{index}\|_2^2 (with R as the matrix of historical asset returns and r_\text{index} the benchmark returns) subject to the budget constraint \mathbf{1}^\top x = 1, ensuring the weights sum to one.[9] This enforces full investment while approximating benchmark performance.[9]
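A toy version of the index-tracking example might look like the following NumPy sketch; the return matrix, benchmark weights, and problem sizes are synthetic assumptions, and the budget constraint is imposed by solving the associated KKT system.

```python
import numpy as np

# Toy index-tracking sketch: minimize ||R x - r_index||^2 subject to 1^T x = 1.
# The return data, benchmark weights, and sizes are synthetic assumptions.
rng = np.random.default_rng(2)
T, n_assets = 60, 5
R = 0.01 * rng.standard_normal((T, n_assets))                 # asset returns
r_index = R @ np.array([0.3, 0.3, 0.2, 0.1, 0.1]) \
          + 0.001 * rng.standard_normal(T)                    # benchmark returns

C = np.ones((1, n_assets))                                    # budget constraint
d = np.array([1.0])

# Solve the KKT system [[R^T R, C^T], [C, 0]] [x; lam] = [R^T r_index; d].
K = np.block([[R.T @ R, C.T],
              [C, np.zeros((1, 1))]])
rhs = np.concatenate([R.T @ r_index, d])
x = np.linalg.solve(K, rhs)[:n_assets]

print(round(float(x.sum()), 10))   # 1.0: full investment is enforced exactly
```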
Inequality constraints
In constrained least squares problems incorporating inequality constraints, the objective is to minimize the squared Euclidean norm of the residual subject to linear inequalities, formulated as \min_{x} \|Ax - b\|_2^2 \quad \text{subject to} \quad Gx \leq h, where A \in \mathbb{R}^{m \times n} is the design matrix, b \in \mathbb{R}^m is the response vector, G \in \mathbb{R}^{p \times n} is the inequality constraint matrix, and h \in \mathbb{R}^p is the bound vector. Equality constraints Cx = d may also be included alongside the inequalities; the problem reduces to the equality-only case when no inequalities are present. This setup is a convex quadratic program, since the objective is convex quadratic and the feasible set \{x \mid Gx \leq h, Cx = d\} is a convex polyhedron. The least-squares objective \|Ax - b\|_2^2 is strictly convex provided A has full column rank, guaranteeing a unique minimizer whenever the feasible set is nonempty.

A prominent special case is non-negative least squares (NNLS), where the constraints simplify to x \geq 0, corresponding to G = -I_n and h = 0. This arises in applications requiring non-negative parameters, such as factor analysis or image processing, where negative values are infeasible. Optimality in this formulation is characterized by the Karush-Kuhn-Tucker (KKT) conditions, which extend Lagrange multipliers to handle inequalities via complementary slackness and non-negativity of the dual variables. For instance, in linear regression for demand modeling, positivity constraints on coefficients ensure that estimated elasticities or impacts remain non-negative, aligning with economic interpretations where negative effects are implausible.
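As an illustration, the sketch below solves a small synthetic NNLS instance with SciPy's scipy.optimize.nnls and then checks the KKT conditions just described (stationarity on the positive components, a non-negative gradient on the components held at zero); the data and tolerances are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import nnls

# Small synthetic NNLS instance solved with SciPy, followed by a check of the
# KKT conditions: the gradient of (1/2)||Ax - b||^2 vanishes on the positive
# components and is non-negative on the components held at zero.
rng = np.random.default_rng(3)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

x, res_norm = nnls(A, b)                  # active-set NNLS solver

g = A.T @ (A @ x - b)                     # gradient at the solution
at_bound = x <= 1e-12                     # components where x_i >= 0 is active
print(np.allclose(g[~at_bound], 0.0, atol=1e-8))   # stationarity on free variables
print(bool(np.all(g[at_bound] >= -1e-8)))          # dual feasibility at the bound
```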
Solution Methods
Lagrange multipliers for equality constraints
To solve the equality-constrained least squares problem of minimizing \frac{1}{2} \|Ax - b\|_2^2 subject to Cx = d, where A \in \mathbb{R}^{m \times n}, C \in \mathbb{R}^{p \times n}, b \in \mathbb{R}^m, and d \in \mathbb{R}^p with p < n, the method of Lagrange multipliers incorporates the constraints into the objective via a Lagrangian function.[10][2] The Lagrangian is constructed as L(x, \lambda) = \frac{1}{2} \|Ax - b\|_2^2 + \lambda^T (Cx - d), where \lambda \in \mathbb{R}^p is the vector of Lagrange multipliers.[10][2] The stationarity conditions for a minimum are obtained by setting the partial derivatives to zero: \nabla_x L = A^T (Ax - b) + C^T \lambda = 0, \quad \nabla_\lambda L = Cx - d = 0. These equations enforce both stationarity of the constraint-adjusted objective and satisfaction of the equalities.[10][2]

Combining the stationarity conditions yields the augmented linear system \begin{bmatrix} A^T A & C^T \\ C & 0 \end{bmatrix} \begin{bmatrix} x \\ \lambda \end{bmatrix} = \begin{bmatrix} A^T b \\ d \end{bmatrix}, which can be solved for x and \lambda using block elimination or QR decomposition of the coefficient matrix, assuming it is invertible.[10][2] When A has full column rank, an explicit solution for x expresses the constrained minimizer in terms of the unconstrained least squares solution x_{\text{unc}} = (A^T A)^{-1} A^T b: x = x_{\text{unc}} - (A^T A)^{-1} C^T \left( C (A^T A)^{-1} C^T \right)^{-1} (C x_{\text{unc}} - d). This formula adjusts the unconstrained solution by a correction term that restores feasibility with respect to the constraints.[10][2] The solution is unique provided that the stacked matrix \begin{bmatrix} A \\ C \end{bmatrix} has full column rank, which guarantees a unique minimizer even when A^T A alone is only positive semidefinite; combined with C having full row rank, it also makes the coefficient matrix of the augmented system invertible.[10][2]

As an illustrative example, consider fitting a line y = mx + c to the data points (1,1), (2,2), (3,3) subject to the constraint m + c = 2. Here, A = \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{bmatrix}, b = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, C = \begin{bmatrix} 1 & 1 \end{bmatrix}, and d = 2. The unconstrained solution is x_{\text{unc}} = \begin{bmatrix} m \\ c \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, with C x_{\text{unc}} - d = -1. Then A^T A = \begin{bmatrix} 14 & 6 \\ 6 & 3 \end{bmatrix}, (A^T A)^{-1} = \frac{1}{6} \begin{bmatrix} 3 & -6 \\ -6 & 14 \end{bmatrix}, and C (A^T A)^{-1} C^T = \frac{5}{6}. The correction term is -(A^T A)^{-1} C^T \left( \frac{5}{6} \right)^{-1} (-1) = \begin{bmatrix} -0.6 \\ 1.6 \end{bmatrix}, yielding the constrained solution x = \begin{bmatrix} 0.4 \\ 1.6 \end{bmatrix}. This satisfies the constraint and minimizes the residual sum of squares under it.[10]
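The worked example can be reproduced in a few lines of NumPy using the explicit correction formula; the sketch below simply mirrors the numbers given in the text.

```python
import numpy as np

# Reproduce the worked example: fit y = m x + c to (1,1), (2,2), (3,3)
# subject to m + c = 2, using the explicit correction formula above.
A = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])
C = np.array([[1.0, 1.0]])
d = np.array([2.0])

AtA_inv = np.linalg.inv(A.T @ A)
x_unc = AtA_inv @ (A.T @ b)                        # [1, 0]: unconstrained fit

S = C @ AtA_inv @ C.T                              # the scalar 5/6
correction = -AtA_inv @ C.T @ np.linalg.solve(S, C @ x_unc - d)
x = x_unc + correction

print(x)       # [0.4 1.6], matching the text
print(C @ x)   # [2.]: the constraint m + c = 2 holds
```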
Quadratic programming for inequality constraints
Constrained least squares problems with inequality constraints can be reformulated as quadratic programs (QPs), which are convex optimization problems of the form \min_x \ \frac{1}{2} x^T (A^T A) x - (A^T b)^T x \quad \text{subject to} \quad Gx \leq h, \ Cx = d, where the objective function arises from the squared Euclidean norm \|Ax - b\|^2/2, G and h define the inequality constraints, and C and d handle any equality constraints.[11] This standard QP structure ensures the problem is solvable using established optimization techniques, as the Hessian A^T A is positive semidefinite.[12]

Active set methods address this QP by iteratively maintaining a working set of active inequality constraints, treating them as equalities and solving the resulting equality-constrained subproblem via Lagrange multipliers, then updating the set by adding violated constraints or removing non-binding ones based on dual feasibility checks.[13] These methods are particularly efficient for sparse or structured problems, though in the worst case they can require exponentially many iterations because of the combinatorial number of possible active sets.[14] A seminal example is the Lawson-Hanson algorithm for non-negative least squares (NNLS), which applies active set principles to minimize \|Ax - b\|^2 subject to x \geq 0; it starts with all variables at zero (all non-negativity constraints active) and moves variables into the passive set as needed while ensuring descent in the objective.

In contrast, interior-point methods incorporate the inequality constraints through logarithmic barrier functions or primal-dual formulations, transforming the QP into a sequence of unconstrained (or equality-constrained) problems solved via Newton's method on the associated Karush-Kuhn-Tucker (KKT) system, with central path-following to approach feasibility and optimality.[15] These approaches guarantee polynomial-time convergence, typically O(\sqrt{n} \log(1/\epsilon)) iterations for an \epsilon-optimal solution in convex QPs with n variables, making them suitable for large-scale instances despite higher per-iteration costs from solving linear systems.[14] Active set methods, while often fast in practice on well-behaved problems, lack such worst-case guarantees but frequently outperform interior-point methods on smaller or degenerate cases.[14]

Practical implementations of these QP solvers for constrained least squares are available in numerical software, such as the quadprog function in MATLAB, which supports both active set and interior-point algorithms, and the CVXOPT library in Python, whose solvers.qp routine is based on interior-point methods.[11][16]
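As a hedged sketch of this QP route, the code below builds P = A^T A and q = -A^T b from synthetic data and imposes simple box constraints 0 \leq x \leq 1 via Gx \leq h before calling CVXOPT's solvers.qp; the data, the particular bounds, and the option settings are illustrative assumptions rather than prescribed usage.

```python
import numpy as np
from cvxopt import matrix, solvers

# Sketch of the QP reformulation: P = A^T A, q = -A^T b from synthetic data,
# with box constraints 0 <= x <= 1 encoded as Gx <= h and solved by solvers.qp.
rng = np.random.default_rng(4)
A = rng.standard_normal((30, 6))
b = rng.standard_normal(30)
n = A.shape[1]

P = matrix(A.T @ A)
q = matrix(-(A.T @ b))
G = matrix(np.vstack([-np.eye(n), np.eye(n)]))       # rows for -x <= 0 and x <= 1
h = matrix(np.concatenate([np.zeros(n), np.ones(n)]))

solvers.options['show_progress'] = False
sol = solvers.qp(P, q, G, h)                          # interior-point QP solver
x = np.array(sol['x']).ravel()

print(bool(np.all(x >= -1e-6) and np.all(x <= 1 + 1e-6)))   # feasibility check
```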
A representative application is solving NNLS to estimate non-negative regression coefficients, such as in image processing where pixel intensities cannot be negative; for instance, given data matrix A \in \mathbb{R}^{m \times n} and response b \in \mathbb{R}^m, the QP formulation with G = -I_n and h = 0 yields coefficients x \geq 0 minimizing the residual sum of squares, often computed efficiently via the Lawson-Hanson active set method for problems up to thousands of variables.[17]