Constrained least squares
Constrained least squares is an optimization method in linear algebra and statistics that extends the classical least squares approach by incorporating linear equality constraints, aiming to find the vector x that minimizes the squared Euclidean norm \|Ax - b\|^2 subject to the constraint Cx = d, where A is an m \times n design matrix, b is an m-dimensional observation vector, C is a p \times n constraint matrix with p < n, and d is a p-dimensional vector.[1] This formulation ensures the solution fits the data as closely as possible while exactly satisfying specified linear conditions, such as continuity or derivative requirements in curve fitting.[2]

The problem can be solved using techniques like Lagrange multipliers, which transform it into an unconstrained system via the Karush-Kuhn-Tucker (KKT) conditions,[3] or through QR factorization of the constraint matrix to reduce the dimensionality and improve numerical stability.[1] In the Lagrange approach, the solution satisfies the augmented equation \begin{bmatrix} A^T A & C^T \\ C & 0 \end{bmatrix} \begin{bmatrix} x \\ \lambda \end{bmatrix} = \begin{bmatrix} A^T b \\ d \end{bmatrix}, where \lambda is the vector of Lagrange multipliers, assuming the combined matrix \begin{bmatrix} A \\ C \end{bmatrix} has full column rank so that the solution is unique.[2] Alternative methods, such as null-space projection or elimination, remove the constrained degrees of freedom and solve a reduced unconstrained least squares problem.[1]

Constrained least squares finds broad applications in fields requiring data fitting with restrictions, including spline interpolation for smooth curve approximation with continuity constraints, signal processing for deconvolution and smoothing under linearity assumptions, and control theory for optimizing inputs in linear quadratic regulators with state equality conditions.[2] It also appears in robust estimation problems, such as determining centers of rotation in imaging without sensitivity to local minima.[4]
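As a minimal illustration of the augmented system above, the following Python sketch assembles and solves the block KKT equations with NumPy; the matrices, their sizes, and the variable names are synthetic assumptions chosen purely for demonstration, not taken from the cited sources.

```python
import numpy as np

# Minimal sketch: assemble and solve the augmented (KKT) system from the text.
# The matrices A, b, C, d and their sizes are synthetic, purely for illustration.
rng = np.random.default_rng(0)
m, n, p = 8, 4, 2
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
C = rng.standard_normal((p, n))
d = rng.standard_normal(p)

# [[A^T A, C^T], [C, 0]] [x; lambda] = [A^T b; d]
K = np.block([[A.T @ A, C.T],
              [C, np.zeros((p, p))]])
rhs = np.concatenate([A.T @ b, d])
sol = np.linalg.solve(K, rhs)
x, lam = sol[:n], sol[n:]

print(np.allclose(C @ x, d))  # True: the equality constraints hold at the solution
```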
Introduction
Definition
Constrained least squares is an optimization problem that involves minimizing the squared Euclidean norm of the residuals \|Ax - b\|_2^2, where A is the m \times n design matrix, x is the n-dimensional parameter vector to be estimated, and b is the m-dimensional observation vector, subject to additional linear constraints on x.[2] This formulation extends the standard least squares approach by incorporating restrictions that ensure the solution adheres to specified conditions. Ordinary least squares represents the unconstrained special case of this problem.[5]

The general form for equality-constrained least squares is to minimize over x the quadratic objective function \frac{1}{2} x^T A^T A x - (A^T b)^T x subject to C x = d, where C is the p \times n constraint matrix with p < n and d is the p-dimensional constraint vector.[2] The factor of \frac{1}{2} is conventional in quadratic programming to simplify derivatives, and the constant term \frac{1}{2} b^T b is omitted because it does not affect the minimizer.[3]

Constraints in this framework typically arise from domain knowledge requiring the parameters to satisfy physical, economic, or statistical conditions, such as non-negativity (for inequality cases) or fixed sums that enforce normalization or balance.[3] For instance, in linear regression, one might fit a line y = \beta_0 + \beta_1 x subject to the constraint \beta_0 + \beta_1 = 1, which forces the fitted line to pass through the point (1, 1), reflecting a known physical or boundary condition at x = 1.[3]
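The following short NumPy sketch, using made-up data points and a made-up trial vector, checks numerically that the quadratic objective above differs from \frac{1}{2}\|Ax - b\|_2^2 only by the omitted constant \frac{1}{2} b^T b.

```python
import numpy as np

# Numerical check that (1/2) x^T A^T A x - (A^T b)^T x equals (1/2)||Ax - b||^2
# minus the constant (1/2) b^T b.  A, b, and the trial x are made-up values.
A = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])   # hypothetical design matrix
b = np.array([1.2, 1.9, 3.1])                        # hypothetical observations
x = np.array([0.7, 0.4])                             # any trial parameter vector

quad = 0.5 * x @ (A.T @ A) @ x - (A.T @ b) @ x
half_sq_norm = 0.5 * np.linalg.norm(A @ x - b) ** 2

print(np.isclose(quad + 0.5 * b @ b, half_sq_norm))  # True
```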
Relation to unconstrained least squares
Constrained least squares extends the unconstrained least squares problem by incorporating additional restrictions on the solution vector, modifying the optimization landscape while retaining the core objective of minimizing the squared residual norm. In the unconstrained case, the solution to the linear system Ax \approx b is given by the ordinary least squares estimator \hat{x} = (A^T A)^{-1} A^T b, assuming A has full column rank so that A^T A is invertible.[1] This estimator projects the observation vector b orthogonally onto the column space of the design matrix A, yielding the point in that subspace closest to b in the Euclidean norm.[1]

The introduction of constraints alters the feasible solution set, restricting \hat{x} to lie within a specific region defined by the constraints, which can lead to solutions that are more interpretable or realistic but potentially biased relative to the unconstrained optimum. For equality constraints Cx = d, the feasible set is an affine subspace of the parameter space; for inequality constraints, it is a convex polyhedron.[1] Geometrically, the constrained fit A\hat{x} is the projection of b onto the image under A of the feasible set, rather than onto the full column space of A, so the optimum may shift away from the unconstrained projection whenever the constraints are active.[1]

Under the Gauss-Markov assumptions (linearity, strict exogeneity, homoscedasticity, and no perfect multicollinearity), the unconstrained estimator is the best linear unbiased estimator, with minimum variance among linear unbiased alternatives, but it risks producing solutions that violate domain-specific realism, such as non-negativity of coefficients.[6] In contrast, constrained least squares trades this unbiasedness for enforced feasibility, often yielding biased estimators with reduced variance, particularly when the constraints encode prior knowledge about the parameters.[7]

If the unconstrained solution already satisfies the constraints, for example when C\hat{x} = d holds for equality constraints, the constrained and unconstrained solutions coincide exactly, preserving all properties of the ordinary least squares estimator.[1] This highlights that constraints only modify the solution when they bind, so the constrained approach generalizes the unconstrained method without unnecessary alteration.
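As a rough numerical illustration of the projection property described above, the sketch below (with synthetic data chosen purely for demonstration) verifies that the ordinary least squares residual is orthogonal to the column space of A; a constraint that already held at \hat{x} would then be non-binding.

```python
import numpy as np

# Illustration of the projection view of ordinary least squares on synthetic data:
# the residual b - A x_hat is orthogonal to every column of A, so A x_hat is the
# orthogonal projection of b onto the column space of A.
rng = np.random.default_rng(1)
A = rng.standard_normal((10, 3))
b = rng.standard_normal(10)

x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)   # unconstrained (OLS) solution
residual = b - A @ x_hat

print(np.allclose(A.T @ residual, 0.0))          # True: orthogonality of the residual
```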
Mathematical Formulation
Equality constraints
In the equality constrained least squares problem, the objective is to minimize the squared Euclidean norm of the residual, \|Ax - b\|_2^2, subject to linear equality constraints Cx = d. Here, A is an m \times n matrix with m \geq n, b is an m-vector, C is a p \times n constraint matrix typically with p < n, and d is a p-vector.[1] This setup assumes the problem is feasible, meaning the constraints are consistent and there exists at least one x satisfying Cx = d.[1] Feasibility requires that d lies in the range of C, ensuring the affine subspace defined by the constraints is non-empty.[8]

For the problem to yield a unique solution, rank conditions must hold: C should have full row rank (rank p) to avoid redundant constraints, and the stacked matrix [A; C] should have full column rank (rank n) to ensure compatibility and invertibility of the associated KKT system.[1] These conditions prevent underdetermined or inconsistent systems that could lead to multiple or no solutions.[8]

The problem can be reformulated as a quadratic program: \begin{align*} \min_x &\quad \frac{1}{2} x^\top Q x + c^\top x \\ \text{s.t.} &\quad Cx = d, \end{align*} where Q = A^\top A and c = -A^\top b.[1] This quadratic objective arises directly from expanding \frac{1}{2}\|Ax - b\|_2^2 and dropping the constant term \frac{1}{2} b^\top b, which does not affect the minimizer.[8]

A common application appears in portfolio optimization, such as index tracking, where asset weights x are chosen to minimize the tracking error \|Rx - r_\text{index}\|_2^2 (with R as the matrix of historical asset returns and r_\text{index} the benchmark returns) subject to the budget constraint \mathbf{1}^\top x = 1, ensuring the weights sum to one.[9] This enforces full investment while approximating benchmark performance.[9]
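A toy version of the index-tracking example might look like the following NumPy sketch; the return matrix, benchmark weights, and problem sizes are synthetic assumptions, and the budget constraint is imposed by solving the associated KKT system.

```python
import numpy as np

# Toy index-tracking sketch: minimize ||R x - r_index||^2 subject to 1^T x = 1.
# The return data, benchmark weights, and sizes are synthetic assumptions.
rng = np.random.default_rng(2)
T, n_assets = 60, 5
R = 0.01 * rng.standard_normal((T, n_assets))                 # asset returns
r_index = R @ np.array([0.3, 0.3, 0.2, 0.1, 0.1]) \
          + 0.001 * rng.standard_normal(T)                    # benchmark returns

C = np.ones((1, n_assets))                                    # budget constraint
d = np.array([1.0])

# Solve the KKT system [[R^T R, C^T], [C, 0]] [x; lam] = [R^T r_index; d].
K = np.block([[R.T @ R, C.T],
              [C, np.zeros((1, 1))]])
rhs = np.concatenate([R.T @ r_index, d])
x = np.linalg.solve(K, rhs)[:n_assets]

print(round(float(x.sum()), 10))   # 1.0: full investment is enforced exactly
```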
Inequality constraints
In constrained least squares problems incorporating inequality constraints, the objective is to minimize the squared Euclidean norm of the residual subject to linear inequalities, formulated as \min_{x} \|Ax - b\|_2^2 \quad \text{subject to} \quad Gx \leq h, where A \in \mathbb{R}^{m \times n} is the design matrix, b \in \mathbb{R}^m is the response vector, G \in \mathbb{R}^{p \times n} is the inequality constraint matrix, and h \in \mathbb{R}^p is the bound vector. Equality constraints Cx = d may also be included alongside the inequalities; the problem reduces to the equality-only case when no inequalities are present. This setup is a convex quadratic program, since the objective is convex quadratic and the feasible set \{x \mid Gx \leq h, Cx = d\} is a convex polyhedron. The least-squares objective \|Ax - b\|_2^2 is strictly convex provided A has full column rank, guaranteeing a unique minimizer whenever the feasible set is nonempty.

A prominent special case is non-negative least squares (NNLS), where the constraints simplify to x \geq 0, corresponding to G = -I_n and h = 0. This arises in applications requiring non-negative parameters, such as factor analysis or image processing, where negative values are infeasible. Optimality in this formulation is characterized by the Karush-Kuhn-Tucker (KKT) conditions, which extend Lagrange multipliers to handle inequalities via complementary slackness and non-negativity of the dual variables. For instance, in linear regression for demand modeling, positivity constraints on coefficients ensure that estimated elasticities or impacts remain non-negative, aligning with economic interpretations where negative effects are implausible.
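As an illustration, the sketch below solves a small synthetic NNLS instance with SciPy's scipy.optimize.nnls and then checks the KKT conditions just described (stationarity on the positive components, a non-negative gradient on the components held at zero); the data and tolerances are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import nnls

# Small synthetic NNLS instance solved with SciPy, followed by a check of the
# KKT conditions: the gradient of (1/2)||Ax - b||^2 vanishes on the positive
# components and is non-negative on the components held at zero.
rng = np.random.default_rng(3)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

x, res_norm = nnls(A, b)                  # active-set NNLS solver

g = A.T @ (A @ x - b)                     # gradient at the solution
at_bound = x <= 1e-12                     # components where x_i >= 0 is active
print(np.allclose(g[~at_bound], 0.0, atol=1e-8))   # stationarity on free variables
print(bool(np.all(g[at_bound] >= -1e-8)))          # dual feasibility at the bound
```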
Solution Methods
Lagrange multipliers for equality constraints
To solve the equality-constrained least squares problem of minimizing \frac{1}{2} \|Ax - b\|_2^2 subject to Cx = d, where A \in \mathbb{R}^{m \times n}, C \in \mathbb{R}^{p \times n}, b \in \mathbb{R}^m, and d \in \mathbb{R}^p with p < n, the method of Lagrange multipliers incorporates the constraints into the objective via a Lagrangian function.[10][2] The Lagrangian is constructed as L(x, \lambda) = \frac{1}{2} \|Ax - b\|_2^2 + \lambda^T (Cx - d), where \lambda \in \mathbb{R}^p is the vector of Lagrange multipliers.[10][2] The stationarity conditions for a minimum are obtained by setting the partial derivatives to zero: \nabla_x L = A^T (Ax - b) + C^T \lambda = 0, \quad \nabla_\lambda L = Cx - d = 0. These equations enforce both stationarity of the constraint-adjusted objective and satisfaction of the equalities.[10][2]

Combining the stationarity conditions yields the augmented linear system \begin{bmatrix} A^T A & C^T \\ C & 0 \end{bmatrix} \begin{bmatrix} x \\ \lambda \end{bmatrix} = \begin{bmatrix} A^T b \\ d \end{bmatrix}, which can be solved for x and \lambda using block elimination or QR decomposition of the coefficient matrix, assuming it is invertible.[10][2] When A has full column rank, an explicit solution for x expresses the constrained minimizer in terms of the unconstrained least squares solution x_{\text{unc}} = (A^T A)^{-1} A^T b: x = x_{\text{unc}} - (A^T A)^{-1} C^T \left( C (A^T A)^{-1} C^T \right)^{-1} (C x_{\text{unc}} - d). This formula adjusts the unconstrained solution by a correction term that restores feasibility with respect to the constraints.[10][2] The solution is unique provided that the stacked matrix \begin{bmatrix} A \\ C \end{bmatrix} has full column rank, which guarantees a unique minimizer even when A^T A alone is only positive semidefinite; combined with C having full row rank, it also makes the coefficient matrix of the augmented system invertible.[10][2]

As an illustrative example, consider fitting a line y = mx + c to the data points (1,1), (2,2), (3,3) subject to the constraint m + c = 2. Here, A = \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{bmatrix}, b = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, C = \begin{bmatrix} 1 & 1 \end{bmatrix}, and d = 2. The unconstrained solution is x_{\text{unc}} = \begin{bmatrix} m \\ c \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, with C x_{\text{unc}} - d = -1. Then A^T A = \begin{bmatrix} 14 & 6 \\ 6 & 3 \end{bmatrix}, (A^T A)^{-1} = \frac{1}{6} \begin{bmatrix} 3 & -6 \\ -6 & 14 \end{bmatrix}, and C (A^T A)^{-1} C^T = \frac{5}{6}. The correction term is -(A^T A)^{-1} C^T \left( \frac{5}{6} \right)^{-1} (-1) = \begin{bmatrix} -0.6 \\ 1.6 \end{bmatrix}, yielding the constrained solution x = \begin{bmatrix} 0.4 \\ 1.6 \end{bmatrix}. This satisfies the constraint and minimizes the residual sum of squares under it.[10]
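The worked example can be reproduced in a few lines of NumPy using the explicit correction formula; the sketch below simply mirrors the numbers given in the text.

```python
import numpy as np

# Reproduce the worked example: fit y = m x + c to (1,1), (2,2), (3,3)
# subject to m + c = 2, using the explicit correction formula above.
A = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])
C = np.array([[1.0, 1.0]])
d = np.array([2.0])

AtA_inv = np.linalg.inv(A.T @ A)
x_unc = AtA_inv @ (A.T @ b)                        # [1, 0]: unconstrained fit

S = C @ AtA_inv @ C.T                              # the scalar 5/6
correction = -AtA_inv @ C.T @ np.linalg.solve(S, C @ x_unc - d)
x = x_unc + correction

print(x)       # [0.4 1.6], matching the text
print(C @ x)   # [2.]: the constraint m + c = 2 holds
```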
Quadratic programming for inequality constraints
Constrained least squares problems with inequality constraints can be reformulated as quadratic programs (QPs), which are convex optimization problems of the form \min_x \ \frac{1}{2} x^T (A^T A) x - (A^T b)^T x \quad \text{subject to} \quad Gx \leq h, \ Cx = d, where the objective function arises from the squared Euclidean norm \|Ax - b\|^2/2, G and h define the inequality constraints, and C and d handle any equality constraints.[11] This standard QP structure ensures the problem is solvable using established optimization techniques, as the Hessian A^T A is positive semidefinite.[12]

Active set methods address this QP by iteratively maintaining a working set of active inequality constraints, treating them as equalities and solving the resulting equality-constrained subproblem via Lagrange multipliers, then updating the set by adding violated constraints or removing non-binding ones based on dual feasibility checks.[13] These methods are particularly efficient for sparse or structured problems, though in the worst case they can require exponentially many iterations because of the combinatorial number of possible active sets.[14] A seminal example is the Lawson-Hanson algorithm for non-negative least squares (NNLS), which applies active set principles to minimize \|Ax - b\|^2 subject to x \geq 0; it starts with all variables at zero (all non-negativity constraints active) and moves variables into the passive set as needed while ensuring descent in the objective.

In contrast, interior-point methods incorporate the inequality constraints through logarithmic barrier functions or primal-dual formulations, transforming the QP into a sequence of unconstrained (or equality-constrained) problems solved via Newton's method on the associated Karush-Kuhn-Tucker (KKT) system, with central path-following to approach feasibility and optimality.[15] These approaches guarantee polynomial-time convergence, typically O(\sqrt{n} \log(1/\epsilon)) iterations for an \epsilon-optimal solution in convex QPs with n variables, making them suitable for large-scale instances despite higher per-iteration costs from solving linear systems.[14] Active set methods, while often fast in practice on well-behaved problems, lack such worst-case guarantees but frequently outperform interior-point methods on smaller or degenerate cases.[14]

Practical implementations of these QP solvers for constrained least squares are available in numerical software, such as the quadprog function in MATLAB, which supports both active set and interior-point algorithms, and the CVXOPT library in Python, whose solvers.qp routine is based on interior-point methods.[11][16]
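As a hedged sketch of this QP route, the code below builds P = A^T A and q = -A^T b from synthetic data and imposes simple box constraints 0 \leq x \leq 1 via Gx \leq h before calling CVXOPT's solvers.qp; the data, the particular bounds, and the option settings are illustrative assumptions rather than prescribed usage.

```python
import numpy as np
from cvxopt import matrix, solvers

# Sketch of the QP reformulation: P = A^T A, q = -A^T b from synthetic data,
# with box constraints 0 <= x <= 1 encoded as Gx <= h and solved by solvers.qp.
rng = np.random.default_rng(4)
A = rng.standard_normal((30, 6))
b = rng.standard_normal(30)
n = A.shape[1]

P = matrix(A.T @ A)
q = matrix(-(A.T @ b))
G = matrix(np.vstack([-np.eye(n), np.eye(n)]))       # rows for -x <= 0 and x <= 1
h = matrix(np.concatenate([np.zeros(n), np.ones(n)]))

solvers.options['show_progress'] = False
sol = solvers.qp(P, q, G, h)                          # interior-point QP solver
x = np.array(sol['x']).ravel()

print(bool(np.all(x >= -1e-6) and np.all(x <= 1 + 1e-6)))   # feasibility check
```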
A representative application is solving NNLS to estimate non-negative regression coefficients, such as in image processing where pixel intensities cannot be negative; for instance, given data matrix A \in \mathbb{R}^{m \times n} and response b \in \mathbb{R}^m, the QP formulation with G = -I_n and h = 0 yields coefficients x \geq 0 minimizing the residual sum of squares, often computed efficiently via the Lawson-Hanson active set method for problems up to thousands of variables.[17]