Non-negative least squares
Non-negative least squares (NNLS) is a constrained optimization problem that seeks to minimize the squared Euclidean norm of the residual vector, \|Ax - b\|_2^2, where A is an m \times n matrix, b is an m-dimensional vector, and the solution vector x must satisfy x \geq 0 componentwise.[1] This formulation extends the classical linear least squares problem by imposing non-negativity constraints, ensuring that the estimated parameters cannot be negative, which is essential when modeling phenomena where negative values are physically or contextually impossible, such as concentrations in chemical mixtures or intensities in image processing.[2] As a convex quadratic program with linear inequality constraints, NNLS has the property that every local minimum is a global minimum, and it is computationally tractable for moderate-sized problems.

The foundational algorithm for solving NNLS is the active-set method developed by Charles L. Lawson and Richard J. Hanson, first detailed in their 1974 book Solving Least Squares Problems and later revised in the 1995 SIAM edition. This method iteratively adjusts the set of active constraints (indices where x_i = 0), solving a sequence of unconstrained least squares subproblems over the remaining, passive variables.[3] Modern implementations, such as those in SciPy and MATLAB, build on this approach; related methods use block principal pivoting for faster convergence in large-scale settings.[1][2] Variations include projected gradient methods and interior-point techniques for handling sparsity or additional regularization.[4]

NNLS finds broad applications across scientific and engineering domains where non-negativity is inherent. In machine learning, it serves as a core subroutine in non-negative matrix factorization (NMF), enabling the decomposition of data matrices into interpretable, parts-based representations for tasks like topic modeling and recommender systems.[5] In remote sensing and astronomy, NNLS is used for hyperspectral unmixing, estimating the abundances of endmember materials in mixed-pixel spectra while enforcing non-negative fractions.[6] Additional uses include portfolio optimization in finance, where asset weights must be non-negative, and signal processing for sparse recovery from non-negative measurements.[7] These applications highlight the role of NNLS in promoting sparsity and interpretability in high-dimensional data analysis.[8]
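Small problems of this kind can be solved directly with the library routines mentioned above, such as SciPy's scipy.optimize.nnls. The sketch below is illustrative; the matrix and vector are made-up values chosen only to show the call and its outputs.

```python
import numpy as np
from scipy.optimize import nnls

# Illustrative over-determined system (m = 4 observations, n = 2 unknowns).
A = np.array([[1.0, 0.5],
              [0.5, 1.0],
              [1.0, 1.0],
              [0.0, 1.0]])
b = np.array([1.0, 2.0, 1.5, -0.5])

# nnls returns the non-negative minimizer x and the residual norm ||Ax - b||_2.
x, rnorm = nnls(A, b)
print("x =", x)                  # every component is >= 0
print("residual norm =", rnorm)
```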
Formulation
Problem Statement
The non-negative least squares (NNLS) problem seeks a vector x \in \mathbb{R}^n with x \geq 0 that minimizes the squared Euclidean norm of the residual, formulated as \min_{x \geq 0} \| Ax - b \|_2^2, where A is an m \times n matrix (typically with m \geq n), b is an m-dimensional vector, and the objective function measures the fit of the linear model Ax to the data b. This constrained optimization arises in scenarios where the entries of x must remain non-negative due to their interpretation as physical or probabilistic quantities, such as chemical concentrations in mixture analysis, where negative values lack physical meaning.[9][10]

The NNLS problem extends classical unconstrained least squares, which solves \min_x \| Ax - b \|_2^2 via the normal equations A^T A x = A^T b (assuming A^T A is invertible, yielding x = (A^T A)^{-1} A^T b). When this unconstrained solution has negative components, NNLS instead enforces x \geq 0 directly within the minimization; note that the resulting solution generally differs from simply setting the negative components of the unconstrained solution to zero. Enforcing the constraint ensures feasibility for applications like spectroscopic quantification, where x represents relative concentrations that must be non-negative.[11]

For illustration, consider a one-dimensional example with m = n = 1, A = 1, and b = -1. The unconstrained solution is x = -1, which is infeasible under the non-negativity constraint; the NNLS solution therefore sets x = 0, yielding a residual of \| 0 - (-1) \|_2^2 = 1. This simple case highlights how the constraint overrides the unconstrained minimizer to maintain interpretability.
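The one-dimensional example can be reproduced numerically. The following sketch, assuming NumPy and SciPy are available, compares the unconstrained least squares solution with the non-negative one.

```python
import numpy as np
from scipy.optimize import nnls

A = np.array([[1.0]])    # 1 x 1 system with A = 1
b = np.array([-1.0])     # b = -1

# Unconstrained least squares: minimizer x = -1, residual 0.
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]

# Non-negative least squares: the constraint forces x = 0,
# with residual norm ||A*0 - b||_2 = 1.
x_nnls, rnorm = nnls(A, b)

print("unconstrained:", x_ls)      # [-1.]
print("nnls:", x_nnls, rnorm)      # [0.] 1.0
```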
Quadratic Programming Equivalence
The non-negative least squares (NNLS) problem, which seeks to minimize \|\mathbf{Ax} - \mathbf{b}\|^2_2 subject to \mathbf{x} \geq \mathbf{0}, can be equivalently reformulated as a quadratic programming (QP) problem. Expanding the squared norm and scaling by \frac{1}{2} (which does not change the minimizer) yields the objective function \frac{1}{2} \mathbf{x}^T (\mathbf{A}^T \mathbf{A}) \mathbf{x} - (\mathbf{A}^T \mathbf{b})^T \mathbf{x} + \frac{1}{2} \mathbf{b}^T \mathbf{b}, where the constant term \frac{1}{2} \mathbf{b}^T \mathbf{b} does not affect the optimization and can be omitted. Thus, the NNLS problem is equivalent to minimizing the quadratic function \frac{1}{2} \mathbf{x}^T \mathbf{Q} \mathbf{x} + \mathbf{c}^T \mathbf{x} subject to \mathbf{x} \geq \mathbf{0}, with \mathbf{Q} = \mathbf{A}^T \mathbf{A} and \mathbf{c} = -\mathbf{A}^T \mathbf{b}.

The Hessian matrix \mathbf{Q} = \mathbf{A}^T \mathbf{A} is symmetric and positive semidefinite, since \mathbf{x}^T (\mathbf{A}^T \mathbf{A}) \mathbf{x} = \|\mathbf{Ax}\|^2_2 \geq 0 for all \mathbf{x}, with equality if and only if \mathbf{Ax} = \mathbf{0}. This property guarantees that the QP objective is convex.

This formulation positions NNLS as a special case of bound-constrained quadratic programming, featuring simple non-negativity bounds \mathbf{x} \geq \mathbf{0} and no equality constraints. The recognition of NNLS as a QP traces back to early optimization literature, but its formalization in this context was established by Lawson and Hanson in their seminal 1974 work on solving least squares problems.
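The equivalence is easy to check numerically. In the sketch below (with randomly generated illustrative data), the QP objective \frac{1}{2} \mathbf{x}^T \mathbf{Q} \mathbf{x} + \mathbf{c}^T \mathbf{x} built from \mathbf{Q} = \mathbf{A}^T \mathbf{A} and \mathbf{c} = -\mathbf{A}^T \mathbf{b} agrees with \frac{1}{2}\|\mathbf{Ax} - \mathbf{b}\|^2_2 up to the constant \frac{1}{2} \mathbf{b}^T \mathbf{b}.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)

Q = A.T @ A          # symmetric positive semidefinite Hessian
c = -A.T @ b         # linear term of the QP

x = rng.random(3)    # any non-negative test point

qp_obj = 0.5 * x @ Q @ x + c @ x
ls_obj = 0.5 * np.linalg.norm(A @ x - b) ** 2
const = 0.5 * b @ b

# The two objectives differ only by the constant term 0.5 * b^T b.
assert np.isclose(qp_obj + const, ls_obj)
```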
Theoretical Properties
Convexity and Feasibility
The non-negative least squares (NNLS) problem minimizes the quadratic objective function \|Ax - b\|_2^2 subject to the constraint x \geq 0, where A is an m \times n matrix and b is an m-vector. This objective is convex because it is a quadratic function f(x) = x^\top (A^\top A) x - 2 b^\top A x + \|b\|_2^2 whose Hessian matrix H = 2 A^\top A is positive semidefinite, as all eigenvalues of A^\top A are non-negative.[12] The feasible set defined by x \geq 0 is the non-negative orthant \mathbb{R}^n_+, which is a convex polyhedral cone, ensuring that any convex combination of feasible points remains feasible.[12] Consequently, the NNLS problem is a convex optimization problem, as it involves minimizing a convex function over a convex set.[12]

The NNLS problem is always feasible, since the origin x = 0 satisfies the non-negativity constraints and yields the finite objective value \|b\|_2^2. In many applications, such as non-negative matrix factorization or signal processing with non-negative data, the matrix A has non-negative entries and b \geq 0; this does not affect feasibility, which holds regardless of A and b, but it aligns with the problem's physical interpretability.

The convexity of NNLS has key implications for optimization: any local minimum is also a global minimum, avoiding the local-optima traps common in non-convex problems and enabling efficient, convergent algorithms such as active set methods.[12] Geometrically, the objective function is an upward-opening paraboloid in \mathbb{R}^n (when A has full column rank), restricted to the first orthant by the feasible set, and the minimum occurs at the lowest point of this paraboloid within the orthant boundaries.[12]
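The positive semidefiniteness of the Hessian can be verified directly; in the short sketch below (with made-up data), the eigenvalues of H = 2 A^\top A are all non-negative up to round-off.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))

# Hessian of ||Ax - b||_2^2 is H = 2 A^T A; its eigenvalues are all >= 0,
# which is the positive semidefiniteness that makes the objective convex.
H = 2 * A.T @ A
eigvals = np.linalg.eigvalsh(H)
assert np.all(eigvals >= -1e-12)    # non-negative up to floating-point error
```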
Existence and Uniqueness of Solutions
The non-negative least squares (NNLS) problem always admits at least one optimal solution for any matrix A \in \mathbb{R}^{m \times n} and vector b \in \mathbb{R}^m. Existence is guaranteed because the problem is a convex quadratic program with a nonempty feasible set (the nonnegative orthant \mathbb{R}^n_+, which includes x = 0) and a continuous objective function \frac{1}{2} \|Ax - b\|_2^2 that attains its infimum over the feasible set.[12] The objective tends to infinity as \|x\|_2 \to \infty along directions where Ax grows, while being bounded below by zero (at x = 0, for example, its value is \frac{1}{2} \|b\|_2^2); in cases where the null space of A intersects the nonnegative orthant nontrivially, the objective is constant along those recession directions, but the minimum is still attained on a polyhedral subset of the feasible set.[12]

Uniqueness of the solution holds if A has full column rank, i.e., \operatorname{rank}(A) = n. In this case, the Hessian A^T A is positive definite, making the objective strictly convex over \mathbb{R}^n, and thus the minimizer over the convex feasible set \mathbb{R}^n_+ is unique.[12] If \operatorname{rank}(A) < n, the columns of A are linearly dependent and the solution set may be nonunique, consisting of all x \geq 0 that achieve the minimum residual; for example, if there exists a nonzero d \geq 0 in the null space of A and x^* is optimal, then every point x^* + t d with t \geq 0 is feasible and attains the same objective value.[12]

Any optimal solution x^* to the NNLS problem satisfies the Karush-Kuhn-Tucker (KKT) conditions, which are necessary and sufficient for optimality due to the problem's convexity.[12] With multipliers \lambda^* \in \mathbb{R}^n associated with the constraints x \geq 0, these conditions are (a numerical check follows the list):
- Primal feasibility: x^* \geq 0,
- Dual feasibility: \lambda^* \geq 0,
- Complementary slackness: x_i^* \lambda_i^* = 0 for all i = 1, \dots, n,
- Stationarity: A^T (Ax^* - b) - \lambda^* = 0.
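These conditions can be checked numerically for a solution returned by a standard solver. The sketch below, with randomly generated illustrative data, uses the stationarity condition to recover \lambda^* = A^T (Ax^* - b) and then tests the remaining conditions up to a small tolerance.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 4))
b = rng.standard_normal(8)

x, _ = nnls(A, b)

# Stationarity determines the multipliers: lambda = A^T (A x - b).
lam = A.T @ (A @ x - b)

tol = 1e-8
assert np.all(x >= -tol)                # primal feasibility
assert np.all(lam >= -tol)              # dual feasibility
assert np.all(np.abs(x * lam) <= tol)   # complementary slackness
```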
Solution Methods
Active Set Methods
Active set methods for non-negative least squares (NNLS) operate by iteratively partitioning the variables into an active set, whose components are constrained to zero, and a passive set, whose variables are free to take non-negative values. At each iteration, the method solves an unconstrained least squares problem on the passive set to obtain a candidate solution, then adjusts the sets depending on whether the candidate violates the non-negativity constraints. This approach is driven by the Karush-Kuhn-Tucker (KKT) conditions: at optimality, variables in the active set have value zero and non-positive multipliers w_i = [A^T(b - Ax)]_i, while passive variables are non-negative with zero multipliers.[13]

The seminal Lawson-Hanson algorithm, introduced in 1974, implements this strategy through a two-phase process: a forward phase that expands the passive set and a backward phase that contracts it when necessary. In the forward phase, the algorithm identifies variables in the active set with positive multipliers (indicating a potential decrease in the objective from inclusion) and adds the one with the largest multiplier to the passive set; it then solves the reduced least squares problem on the updated passive set. If any passive variable in the candidate solution becomes non-positive, the backward phase activates: the algorithm takes the largest step toward the candidate that keeps all variables non-negative, moves every variable that reaches zero back to the active set, and re-solves, restoring feasibility. The residual is updated as r = b - A x after each passive-set solve, where A is the design matrix, b the response vector, and x the current solution. The process repeats until no further adjustments are needed, and convergence is guaranteed in a finite number of steps because the objective decreases monotonically and there are only finitely many possible active sets.[14][13]

The following outline summarizes the core steps of the Lawson-Hanson algorithm:
- Initialization: Set the passive set P = \emptyset (all variables active, x = 0), and compute the initial residual r = b and multipliers w = A^T r.
- Main loop: While the active set is nonempty and w_j > 0 for some active index j:
  - Move the active index j with the largest w_j into the passive set P.
  - Solve the unconstrained least squares problem restricted to the columns in P, giving a candidate z (with z_i = 0 for i \notin P).
  - While some z_i \leq 0 for i \in P: take the largest step \alpha such that x + \alpha (z - x) \geq 0, move every index whose component reaches zero back to the active set, and re-solve on the reduced P.
  - Set x = z, then update r = b - A x and w = A^T r.
- Termination: When the active set is empty or \max(w_{\text{active}}) \leq 0, the current x is optimal.
The algorithm is implemented in MATLAB's lsqnonneg function, which follows this exact procedure;[15] a minimal Python sketch of the same active-set scheme is given below.
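The following Python sketch mirrors the outline above. It is a simplified illustration, assuming the convention w = A^T(b - Ax) used in this section, and omits the refinements found in production implementations such as scipy.optimize.nnls or MATLAB's lsqnonneg, which should be preferred in practice.

```python
import numpy as np

def nnls_lawson_hanson(A, b, tol=1e-10, max_iter=None):
    """Minimal sketch of the Lawson-Hanson active-set method for
    min ||Ax - b||_2^2 subject to x >= 0 (not a production implementation)."""
    m, n = A.shape
    max_iter = 10 * n if max_iter is None else max_iter
    x = np.zeros(n)
    passive = np.zeros(n, dtype=bool)       # False -> index is active (held at zero)
    w = A.T @ (b - A @ x)                   # multipliers w = A^T r with r = b - Ax

    for _ in range(max_iter):
        # Optimality test: stop when no active variable has a positive multiplier.
        if passive.all() or np.max(w[~passive]) <= tol:
            break

        # Forward phase: free the active variable with the largest multiplier.
        j = int(np.argmax(np.where(~passive, w, -np.inf)))
        passive[j] = True

        while True:
            # Unconstrained least squares over the passive columns only.
            z = np.zeros(n)
            z[passive] = np.linalg.lstsq(A[:, passive], b, rcond=None)[0]

            if np.all(z[passive] > 0):
                x = z
                break

            # Backward phase: step toward z as far as feasibility allows, then
            # return the variables that hit zero to the active set and re-solve.
            blocking = passive & (z <= 0)
            alpha = np.min(x[blocking] / (x[blocking] - z[blocking]))
            x = x + alpha * (z - x)
            passive = passive & (x > tol)
            x[~passive] = 0.0

        w = A.T @ (b - A @ x)
    return x

# Small illustrative check with made-up data.
A = np.array([[1.0, 0.5], [0.5, 1.0], [1.0, 1.0], [0.0, 1.0]])
b = np.array([1.0, 2.0, 1.5, -0.5])
print(nnls_lawson_hanson(A, b))
```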
As the first algorithm dedicated specifically to NNLS, the Lawson-Hanson method laid the foundation for subsequent developments in constrained quadratic optimization, influencing active set approaches in broader optimization contexts.[14]