
Least absolute deviations

Least absolute deviations (LAD), also known as median regression or L1 regression, is a statistical method for regression analysis that minimizes the sum of absolute residuals between observed and predicted values, formulated as \min_{\beta} \sum_{i=1}^n |y_i - x_i^T \beta|, where y_i are the responses, x_i the predictors, and \beta the parameters to estimate. Unlike ordinary least squares (OLS), which minimizes the sum of squared residuals and is sensitive to outliers, LAD provides a robust alternative by treating all deviations linearly, making it equivalent to the maximum likelihood estimator under a Laplace error distribution. The origins of LAD trace back to 1757, when Roger Joseph Boscovich proposed it for reconciling discrepancies in astronomical measurements of Earth's shape. Pierre-Simon Laplace further developed the approach in 1788 for similar observational adjustments. Although it predates OLS (introduced by Adrien-Marie Legendre in 1805 and used by Carl Friedrich Gauss around 1795), LAD saw limited adoption historically due to the lack of closed-form solutions and computational challenges, which favored the differentiable nature of least squares. Advances in linear programming and optimization software in the late 20th century revived its use, enabling efficient computation via reformulation as a linear program with auxiliary error variables. LAD excels in robust statistics by mitigating the impact of outliers and heavy-tailed errors, where OLS can produce badly distorted estimates; for instance, simulations demonstrate LAD's superior efficiency over OLS in non-normal error scenarios. Under Gaussian errors, LAD is less asymptotically efficient than OLS but outperforms it on contaminated data. Applications span financial modeling (e.g., GARCH processes for returns), chemometrics, economic surveys, and regression trees, where it minimizes the mean absolute deviation from the median within nodes for splits. Modern inference for LAD includes asymptotic normality of estimators, \sqrt{n} (\hat{\beta} - \beta_0) \to N\left(0, \frac{1}{(2f(0))^2} V^{-1}\right), where f(0) is the error density at zero and V the design matrix covariance, along with resampling-based tests for hypotheses that handle heterogeneous errors without density assumptions.

Overview and Background

Definition and Motivation

Least absolute deviations (LAD), also known as L1 estimation, is a statistical method that determines parameters by minimizing the sum of absolute residuals between observed and predicted values, applicable to both regression and location problems. This approach seeks to find the best-fitting model or central value that reduces the total absolute deviation across data points, offering a robust alternative to methods sensitive to error magnitude. The primary motivation for LAD stems from its robustness to outliers, as the L1 penalty treats large deviations linearly rather than quadratically, preventing extreme values from disproportionately influencing the estimate. In contrast, squared-error methods amplify outliers, leading to distorted fits in contaminated datasets, whereas LAD maintains stability by bounding the impact of such anomalies. This property makes LAD particularly valuable in real-world applications where data may include measurement errors or atypical observations, ensuring more reliable estimates under heteroscedasticity or heavy-tailed error distributions. In the univariate case, LAD estimation corresponds to the sample median, which for one-dimensional data is simply the value that minimizes the sum of absolute deviations from the data points. For example, given a sample of observations, the value that achieves this minimum balances the number of points on either side, providing a central location resistant to skewed or outlier-affected distributions. Historically, LAD emerged as an estimation technique in the 18th century, with Boscovich introducing the method in 1757 for fitting linear models by minimizing absolute deviations, predating the least squares approach and serving as an early robust alternative. Its use gained renewed attention in the 20th century through computational advancements and theoretical developments, solidifying its role in robust statistics.
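As a quick illustration of the location version of this criterion, the short Python sketch below evaluates the sum of absolute deviations over a grid of candidate centers for a small hypothetical sample (values chosen only for illustration) and confirms that the minimizer coincides with the sample median rather than the outlier-sensitive mean.

import numpy as np

# Illustrative sample (hypothetical values) with one large outlier at 9.8.
y = np.array([2.1, 2.4, 2.5, 2.7, 3.0, 3.2, 9.8])

def sum_abs_dev(theta, data):
    # LAD objective for a single location parameter: sum_i |y_i - theta|
    return np.sum(np.abs(data - theta))

# Brute-force search over a fine grid of candidate locations.
grid = np.linspace(y.min(), y.max(), 5001)
objective = np.array([sum_abs_dev(t, y) for t in grid])

print("grid minimizer:", round(grid[objective.argmin()], 2))   # ~2.7
print("sample median :", np.median(y))                          # 2.7
print("sample mean   :", round(y.mean(), 2))                    # pulled toward the outlier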

Relation to Other Regression Methods

Ordinary least squares (OLS) regression minimizes the sum of squared residuals between observed and predicted values, providing efficient estimates under the assumption of normally distributed errors with constant variance. This method relies on the L2 norm and is highly sensitive to outliers, as large residuals disproportionately influence the estimates due to squaring. In contrast, least absolute deviations (LAD) regression minimizes the sum of absolute residuals, employing the L1 norm, which treats all deviations more equally and reduces the impact of extreme values. The distinction between the L1 and L2 norms positions LAD as a robust alternative to OLS, particularly in datasets with non-normal errors or contamination. While OLS achieves maximum efficiency under Gaussian assumptions, LAD serves as the maximum likelihood estimator for Laplace-distributed errors, offering greater resistance to outliers without requiring normality. This makes LAD preferable in scenarios where data may include gross errors, though it sacrifices some efficiency relative to OLS in uncontaminated normal settings. LAD regression is equivalent to median regression, estimating the conditional median of the response variable given the predictors, analogous to how the sample median minimizes absolute deviations in univariate data. Under the assumption that the errors have a conditional median of zero, LAD identifies parameters that minimize expected absolute deviations, generalizing the sample median to linear models and providing a location estimate robust to heavy tails and outliers. Within the broader M-estimation framework for robust regression, LAD corresponds to a specific choice of loss function in which the objective is the absolute deviation, balancing efficiency and breakdown point. As a robust estimator, it offers a compromise between the outlier sensitivity of OLS and more complex methods, though modern variants like MM-estimators often surpass it in high-leverage scenarios.
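The claimed equivalence between LAD and maximum likelihood under Laplace errors can be checked numerically. The sketch below (synthetic data, assumed known scale parameter) compares the grid-search maximizer of the Laplace likelihood for the location parameter with the sample median; both should agree up to grid resolution.

import numpy as np
from scipy.stats import laplace

rng = np.random.default_rng(0)
y = rng.laplace(loc=1.0, scale=2.0, size=2001)   # synthetic Laplace-distributed sample

def neg_loglik(mu, data, scale=2.0):
    # Laplace negative log-likelihood; up to constants it equals sum(|y - mu|) / scale,
    # so maximizing the likelihood over mu is the same as minimizing absolute deviations.
    return -np.sum(laplace.logpdf(data, loc=mu, scale=scale))

grid = np.linspace(0.0, 2.0, 401)
mle = grid[np.argmin([neg_loglik(m, y) for m in grid])]
print("grid MLE of Laplace location:", round(mle, 3))
print("sample median               :", round(np.median(y), 3))
print("sample mean                 :", round(np.mean(y), 3))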

Mathematical Formulation

General Objective Function

The general objective function of least absolute deviations (LAD) estimation seeks to determine the parameter \theta that minimizes the sum of absolute deviations from a set of observed points y_i \in \mathbb{R}, i = 1, \dots, n. This is formulated as \min_{\theta \in \mathbb{R}} \sum_{i=1}^n |y_i - \theta|, which corresponds to the classical location-estimation problem in one dimension. The method traces its origins to early statistical work, where it was recognized as a robust alternative to squared-error criteria for fitting observations to a central value. In the univariate case, the solution to this objective is the sample median, \hat{\theta} = \operatorname{median}(y_1, \dots, y_n). To derive this, assume without loss of generality that the data are ordered such that y_1 \leq y_2 \leq \dots \leq y_n. For an odd sample size n = 2m + 1, the median is y_{m+1}. The objective function S(\theta) = \sum_{i=1}^n |y_i - \theta| is piecewise linear and convex, with subdifferential \partial S(\theta) = \#\{i: y_i < \theta\} - \#\{i: y_i > \theta\} + \sum_{i: y_i = \theta} [-1, 1]. At \theta < y_{m+1}, the subgradient is negative (more points lie above \theta, pulling the estimate rightward), so S(\theta) decreases as \theta increases; at \theta > y_{m+1}, the subgradient is positive (more points lie below \theta), so S(\theta) increases. Thus, \theta = y_{m+1} minimizes S(\theta), as its subdifferential contains zero. For even n = 2m, any \theta \in [y_m, y_{m+1}] minimizes the objective, with the conventional choice being the midpoint or either endpoint. This property was established in early probability theory as a key advantage of the median for absolute-error minimization. The objective extends naturally to multivariate location estimation in \mathbb{R}^d, where the goal is to minimize the sum of L1 norms: \min_{\theta \in \mathbb{R}^d} \sum_{i=1}^n \|y_i - \theta\|_1, \quad \|z\|_1 = \sum_{j=1}^d |z^j|. Because the L1 norm is separable across coordinates, the optimization decouples into d univariate problems, yielding the component-wise sample median as the minimizer: \hat{\theta}^j = \operatorname{median}(y_1^j, \dots, y_n^j) for each j. This multivariate form preserves the robustness of the univariate case while handling vector-valued data. The LAD objective function is convex but non-differentiable at points where y_i = \theta for any i (or component-wise in the multivariate case), as the absolute value function has a kink at zero. This non-smoothness means that standard gradient-based optimization techniques fail, necessitating specialized approaches such as subgradient descent or reformulation into equivalent smooth or linear programs for computation.
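The coordinate-wise decoupling of the multivariate L1 location problem can be verified directly. The following sketch (hypothetical 2-D data) compares the component-wise median with a crude grid search over the L1 objective.

import numpy as np

# Hypothetical 2-D observations; the L1 location problem decouples by coordinate.
Y = np.array([[1.0, 10.0],
              [2.0, 12.0],
              [3.0, 11.0],
              [4.0, 50.0],   # outlying second coordinate
              [5.0, 13.0]])

def l1_loss(theta, data):
    # Sum over observations of the L1 norm ||y_i - theta||_1
    return np.abs(data - theta).sum()

coord_median = np.median(Y, axis=0)               # component-wise sample median
print("coordinate-wise median:", coord_median)    # [3., 12.]

# Check against a crude grid search over the data range.
g1 = np.linspace(1, 5, 81)
g2 = np.linspace(10, 50, 81)
best = min((l1_loss(np.array([a, b]), Y), a, b) for a in g1 for b in g2)
print("grid-search minimizer :", round(best[1], 2), round(best[2], 2))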

Linear Model Specification

In the linear model specification for least absolute deviations (LAD) regression, the response variable y_i for each observation i = 1, \dots, n is modeled as a linear function of predictors plus an error term: y_i = \beta_0 + \sum_{j=1}^p \beta_j x_{ij} + \epsilon_i, where \beta_0 is the intercept parameter, \beta_j (for j = 1, \dots, p) are the slope parameters, x_{ij} are the values of the p predictor variables, and \epsilon_i are errors assumed to have median zero. The LAD estimator \hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_p)^\top is obtained by minimizing the sum of absolute residuals: \hat{\beta} = \arg\min_{\beta} \sum_{i=1}^n \left| y_i - \left( \beta_0 + \sum_{j=1}^p \beta_j x_{ij} \right) \right|. This objective directly adapts the general LAD criterion to the linear predictor form, focusing on median-based rather than mean-based fitting. The intercept parameter \beta_0 shifts the regression hyperplane vertically to center the fitted values around the median of the responses when all predictors are zero, while each slope parameter \beta_j measures the marginal effect of the corresponding predictor x_j on y, adjusted for the other predictors in the model. These parameters collectively define the linear predictor \hat{y}_i = \beta_0 + \sum_{j=1}^p \beta_j x_{ij}, which approximates the conditional median of y_i given the predictors. In simple linear regression, the specification reduces to a single predictor (p = 1), yielding the model y_i = \beta_0 + \beta_1 x_{i1} + \epsilon_i and the objective \min_{\beta_0, \beta_1} \sum_{i=1}^n |y_i - \beta_0 - \beta_1 x_{i1}|, often visualized as the line through the data that minimizes the total vertical absolute deviation. For multiple regression (p > 1), the formulation incorporates additional predictors, allowing more complex relationships while retaining the same absolute-deviation minimization over the multivariate parameter space. Residuals in the LAD linear model enter the criterion through their absolute values, r_i = |y_i - \hat{y}_i|, where \hat{y}_i = \hat{\beta}_0 + \sum_{j=1}^p \hat{\beta}_j x_{ij}; the sum \sum_{i=1}^n r_i is the minimized objective value, emphasizing pointwise deviations without squaring.
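The objective above is easy to evaluate for any candidate coefficient vector. The sketch below (hypothetical data with one outlying response) simply compares the LAD criterion for two candidate lines, illustrating that the criterion penalizes the outlier only linearly.

import numpy as np

# Small illustrative dataset (hypothetical values).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 30.0])   # last response is an outlier

def lad_objective(beta, x, y):
    # Sum of absolute residuals for the simple linear model y = b0 + b1 * x
    b0, b1 = beta
    return np.sum(np.abs(y - (b0 + b1 * x)))

# Compare two candidate coefficient vectors.
print(lad_objective((0.0, 2.0), x, y))   # line close to the bulk of the data, ~20.4
print(lad_objective((0.0, 5.0), x, y))   # line chasing the outlier, ~34.8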

Computation and Solution Methods

Linear Programming Formulation

The least absolute deviations (LAD) estimation problem for a linear model can be reformulated as a linear program by introducing auxiliary variables to handle the absolute values in the objective function. Specifically, for observations y_i = \mathbf{x}_i^T \boldsymbol{\beta} + e_i where i = 1, \dots, n, define non-negative auxiliary variables u_i \geq 0 and v_i \geq 0 for each i such that the residual e_i = u_i - v_i and |e_i| = u_i + v_i. This transformation linearizes the absolute deviation term while preserving the original minimization of \sum_{i=1}^n |e_i|. The full linear programming problem is then stated as: \begin{align*} \min_{\boldsymbol{\beta}, \mathbf{u}, \mathbf{v}} \quad & \sum_{i=1}^n (u_i + v_i) \\ \text{subject to} \quad & u_i - v_i = y_i - \mathbf{x}_i^T \boldsymbol{\beta}, \quad i = 1, \dots, n, \\ & u_i \geq 0, \quad v_i \geq 0, \quad i = 1, \dots, n, \end{align*} where \boldsymbol{\beta} is the vector of regression parameters. To ensure feasibility and boundedness in solvers, bounds on the parameters \boldsymbol{\beta} (e.g., L_j \leq \beta_j \leq U_j) can be added as additional constraints, converting the problem to a standard form suitable for software such as CPLEX or Gurobi. This bounded-variable formulation was first detailed for regression contexts in seminal work on applying linear programming to statistical estimation. For small datasets, where the number of observations n and parameters p is modest (e.g., n \leq 100), the resulting linear program can be solved exactly using the simplex method, which efficiently navigates the feasible region defined by the n equality constraints and non-negativity conditions. The simplex algorithm's pivot operations handle the bounded variables effectively, yielding the optimal LAD estimates upon termination. This approach demonstrates the computational tractability of LAD via linear programming for problems where exact solutions are preferred over approximations.
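A minimal sketch of this reformulation, using SciPy's general-purpose linprog solver on synthetic data (the sample size, noise scale, and injected outlier are all illustrative), is shown below; the variable vector stacks the free coefficients β with the non-negative u and v.

import numpy as np
from scipy.optimize import linprog

# Hypothetical data: y = 1 + 2x + noise, with one contaminated response.
rng = np.random.default_rng(42)
n = 30
x = np.linspace(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
y[5] += 25.0                                   # inject an outlier

X = np.column_stack([np.ones(n), x])           # design matrix with intercept
p = X.shape[1]

# Variables: [beta (free), u (n, >= 0), v (n, >= 0)]; minimize sum(u + v)
c = np.concatenate([np.zeros(p), np.ones(n), np.ones(n)])
A_eq = np.hstack([X, np.eye(n), -np.eye(n)])   # X beta + u - v = y
b_eq = y
bounds = [(None, None)] * p + [(0, None)] * (2 * n)

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
beta_lad = res.x[:p]
print("LAD estimates (intercept, slope):", np.round(beta_lad, 3))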

Iterative Approximation Algorithms

Iterative approximation algorithms for least absolute deviations (LAD) regression provide practical solutions when exact linear programming formulations become computationally prohibitive for large datasets, offering scalable alternatives that use iterative refinements to approximate the L1 minimizer. These methods typically start from an initial estimate, such as the ordinary least squares fit, and iteratively update parameters until convergence to a solution that minimizes the sum of absolute residuals. The Barrodale-Roberts algorithm is a seminal specialized variant of the simplex method tailored to L1 problems, exploiting the structure of the L1 objective to reduce storage and computational demands. Introduced in 1973, it performs row and column operations on a condensed tableau to navigate the feasible region efficiently, reaching exact solutions faster than general-purpose simplex implementations for moderate-sized problems with up to hundreds of observations. The algorithm is particularly effective for linear models, as it avoids unnecessary pivots by exploiting the median-based properties inherent to the L1 objective. Another widely adopted approach is the iteratively reweighted least squares (IRLS) adaptation for L1 minimization, which approximates the non-differentiable absolute deviation function through a sequence of weighted least squares subproblems. In each iteration, weights are assigned inversely proportional to the absolute residuals from the previous estimate (specifically, w_i = 1 / |\hat{e}_i| for nonzero residuals, with safeguards such as a small epsilon to avoid division by zero), transforming the LAD objective into a quadratic form solvable by standard least squares solvers. This method converges to the LAD solution under mild conditions, such as bounded residuals and a suitable initial guess, often requiring 10–20 iterations for practical accuracy in low-dimensional settings. Convergence properties of these iterative methods depend on the algorithm and problem characteristics; the Barrodale-Roberts variant guarantees finite termination at an exact optimum because the underlying linear program has finitely many basic solutions, typically in O(nm) operations where n is the number of observations and m the number of parameters. In contrast, IRLS exhibits a monotonic decrease in the objective function and converges globally when augmented with regularization, though it may slow near zero residuals or require stabilization techniques in high dimensions. Common stopping criteria include a relative change in parameter estimates below a tolerance (e.g., 10^{-6}), a fixed maximum number of iterations (e.g., 50), or negligible reduction in the sum of absolute deviations between successive steps. Handling non-uniqueness in LAD solutions, where multiple parameter sets achieve the same minimum because the objective is flat at the optimum (e.g., when an even number of residuals straddle zero), requires algorithms to detect and report alternative minimizers. The Barrodale-Roberts algorithm identifies non-uniqueness by checking for degenerate bases or multiple optimal vertices in the simplex tableau, allowing users to select, for instance, the solution with minimal L2 norm among equivalents. For IRLS, non-uniqueness manifests as convergence to one of several equivalent minimizers, mitigated by post-processing or multiple starts from perturbed initial values to verify robustness.
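The following is a minimal IRLS sketch for LAD using the weighting scheme described above, with an epsilon safeguard and a coefficient-change stopping rule; it is illustrative rather than a production solver, and the synthetic data are hypothetical.

import numpy as np

def lad_irls(X, y, n_iter=50, tol=1e-6, eps=1e-8):
    """Approximate LAD estimates by iteratively reweighted least squares.

    Weights 1 / max(|residual|, eps) turn the L1 objective into a sequence of
    weighted least-squares subproblems (a sketch, not a production solver).
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]    # OLS starting values
    for _ in range(n_iter):
        r = y - X @ beta
        w = 1.0 / np.maximum(np.abs(r), eps)       # safeguard against division by zero
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:  # stopping criterion on coefficients
            beta = beta_new
            break
        beta = beta_new
    return beta

# Tiny demonstration on synthetic data with an outlier (values are illustrative).
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 40)
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=40)
y[0] += 5.0
X = np.column_stack([np.ones_like(x), x])
print(np.round(lad_irls(X, y), 3))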

Statistical Properties

Robustness Characteristics

Least absolute deviations (LAD) estimation demonstrates strong robustness properties, particularly resistance to outliers and deviations from distributional assumptions. In the univariate setting, LAD corresponds to the sample median, which achieves a breakdown point of 50%, allowing it to withstand contamination of up to nearly half the data points by arbitrary outliers before the estimate can be driven to infinity or lose all resemblance to the true value; this contrasts sharply with the sample mean's breakdown point of 0%, where even a single outlier can arbitrarily distort the estimate. The influence function further underscores LAD's robustness, revealing bounded influence from individual observations. For the univariate median, the influence function is \text{IF}(x; F, T) = \frac{\operatorname{sign}(x - \mu)}{2 f(\mu)}, where \mu is the true median and f(\mu) is the density at \mu; the bounded sign function limits the effect of any single point to at most 1/(2 f(\mu)), preventing outliers from exerting unbounded leverage. In the regression context, LAD's sensitivity to residuals is governed by \psi(r) = \operatorname{sign}(r), which is bounded between -1 and 1, ensuring that outliers in the response variable contribute only a constant maximum influence regardless of their extremity, unlike ordinary least squares (OLS), where influence grows linearly with residual size. Under contamination models such as Huber's framework, where the data distribution is (1 - \epsilon) F + \epsilon G with F the nominal distribution and G arbitrary, the univariate LAD estimator remains consistent provided \epsilon < 0.5, as the median shifts continuously but stays within a bounded neighborhood of the true value. This property extends to LAD regression under similar conditions, maintaining consistency for moderate contamination levels when the design matrix satisfies standard regularity assumptions, such as bounded leverage. Empirical simulations illustrate these characteristics, showing that in scenarios with outliers in the response direction, LAD yields lower mean squared errors for slope estimates than OLS, demonstrating reduced bias and variance under such contamination; it remains sensitive, however, to high-leverage outliers in the predictors.
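The contrast between the bounded influence of the median and the unbounded influence of the mean can be seen with a toy experiment. In the sketch below (hypothetical values), a single increasingly extreme observation leaves the median unchanged while dragging the mean arbitrarily far.

import numpy as np

# Effect of a single, increasingly extreme outlier on the mean vs. the median.
base = np.array([3.1, 3.3, 3.4, 3.6, 3.8, 4.0, 4.2, 4.4, 4.6])

for outlier in [5.0, 50.0, 5000.0]:
    data = np.append(base, outlier)
    print(f"outlier={outlier:>7}: mean={data.mean():10.2f}  median={np.median(data):.2f}")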

Asymptotic Behavior

The least absolute deviations (LAD) estimator is consistent for the true regression parameters under mild conditions, including the design matrix having full column rank asymptotically and the error distribution possessing a unique median at zero. Under additional regularity conditions, such as the error density being positive and continuous at the median, the LAD estimator is asymptotically normal. Specifically, for independent and identically distributed errors with median zero, \sqrt{n} (\hat{\beta} - \beta) \xrightarrow{d} N\left(0, \frac{1}{4 f(0)^2} (E[X X^T])^{-1}\right), where f(0) denotes the error density evaluated at the median, \beta is the true parameter vector, and X represents the regressors. The asymptotic relative efficiency of the LAD estimator compared to ordinary least squares (OLS) depends on the error distribution. For Gaussian errors, LAD achieves approximately 64% of the efficiency of OLS because of the density term in the asymptotic variance. In heavy-tailed distributions such as the Laplace, however, where outliers are more prevalent, LAD outperforms OLS in efficiency, leveraging its robustness to achieve lower asymptotic variance. A Bahadur representation exists for the LAD regression coefficients, expressing \hat{\beta} as the true \beta plus a linear term involving the score function and a remainder that is asymptotically negligible relative to the leading O_p(n^{-1/2}) term, facilitating derivations of higher-order asymptotics and inference procedures.
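The asymptotic variance formula can be checked by simulation in the simplest setting, the sample median under standard normal errors, where f(0) = 1/sqrt(2*pi) and the limiting variance is pi/2. The Monte Carlo sketch below is illustrative; the sample size and replication count are arbitrary.

import numpy as np

# Monte Carlo check of Var(sqrt(n) * median) ~ 1 / (4 f(0)^2) for N(0,1) errors,
# where f(0) = 1/sqrt(2*pi), so the limit is pi/2 ~ 1.571.
rng = np.random.default_rng(1)
n, reps = 2001, 5000
medians = np.array([np.median(rng.normal(size=n)) for _ in range(reps)])

print("empirical variance of sqrt(n)*median:", round(n * medians.var(), 3))
print("theoretical 1/(4 f(0)^2) = pi/2     :", round(np.pi / 2, 3))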

Advantages and Disadvantages

Benefits in Outlier Resistance

Least absolute deviations (LAD) regression offers significant advantages in handling outliers, particularly in datasets prone to anomalies, by minimizing the sum of absolute residuals rather than squared ones, which prevents extreme values from disproportionately influencing the fit. This approach yields improved prediction accuracy in contaminated environments, such as financial time series where sudden market shocks introduce outliers; for instance, applications of LAD-estimated all-pass time series models to exchange-rate returns demonstrate that such models can capture seemingly nonlinear behavior while remaining stable against such disruptions. Unlike ordinary least squares (OLS), which can produce badly distorted estimates when outliers are present, LAD's robustness stems from its equivalence to estimating the conditional median, providing a straightforward interpretation as the value that minimizes absolute deviations and resisting vertical outliers up to nearly 50% contamination. This property enhances overall model reliability in real-world scenarios with irregular data structures. In simulations of contaminated regression settings, LAD consistently outperforms OLS by delivering lower mean squared errors and higher efficiency when up to 20% of observations are outliers, underscoring its practical value for robust inference. As noted in the discussion of robustness characteristics, LAD achieves a high breakdown point for response outliers, further solidifying its role in outlier-resistant estimation.
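A small simulation sketch along these lines is shown below; it contaminates 10% of the responses with large symmetric noise and compares the slope mean squared error of OLS and LAD (fitted via statsmodels' QuantReg at q = 0.5). The sample size, contamination level, and noise scales are arbitrary choices for illustration.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n, reps, true_slope = 100, 200, 2.0
err_ols, err_lad = [], []

for _ in range(reps):
    x = rng.uniform(0, 10, n)
    y = 1.0 + true_slope * x + rng.normal(scale=1.0, size=n)
    bad = rng.choice(n, size=n // 10, replace=False)
    y[bad] += rng.normal(scale=20.0, size=bad.size)      # contaminate 10% of responses
    X = sm.add_constant(x)
    err_ols.append(sm.OLS(y, X).fit().params[1] - true_slope)
    err_lad.append(sm.QuantReg(y, X).fit(q=0.5).params[1] - true_slope)

print("OLS slope MSE:", round(np.mean(np.square(err_ols)), 4))
print("LAD slope MSE:", round(np.mean(np.square(err_lad)), 4))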

Limitations in Efficiency and Computation

One significant limitation of the least absolute deviations (LAD) estimator is its lower asymptotic efficiency compared to ordinary least squares (OLS) when the errors follow a normal distribution. Specifically, the asymptotic relative efficiency of the LAD estimator relative to OLS is \frac{2}{\pi} \approx 0.637, implying that the asymptotic variance of the LAD estimator is approximately \frac{\pi}{2} \approx 1.57 times larger than that of the OLS estimator under normality. This reduced efficiency arises because LAD minimizes the sum of absolute residuals, which is less well matched to symmetric unimodal densities like the normal, for which squared residuals better capture the variance structure. Another challenge is the potential non-uniqueness of LAD solutions, particularly with collinear predictors or sparse data configurations. When the design matrix is singular due to collinearity, the linear programming formulation of LAD admits multiple optimal solutions, as the objective function remains constant along certain directions in the parameter space. Similarly, in sparse datasets (such as those with few observations relative to parameters or with many tied residuals), the median-regression nature of LAD can lead to non-unique minimizers, complicating interpretation and requiring additional regularization to select a unique estimate. LAD also carries higher computational demands, especially for large sample sizes n, because its linear programming formulation lacks the closed-form solution available for OLS. Solving the LAD problem typically requires numerical optimization via simplex or interior-point methods, with worst-case complexity scaling as O(n^3) in general-purpose solvers, making it substantially slower than the O(np^2) direct computation of OLS for a fixed number of predictors p. The absence of closed-form expressions necessitates iterative approximations, further increasing the practical burden for high-dimensional or big-data applications. While specialized algorithms such as iteratively reweighted least squares can mitigate some of these costs, they still demand more resources than closed-form alternatives.
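The 2/π efficiency figure can be illustrated in the simplest location setting by comparing the sampling variances of the mean and the median under normal errors, as in the sketch below (sample size and replication count are arbitrary).

import numpy as np

# Sketch: under N(0,1) errors the variance of the sample median is roughly
# pi/2 ~ 1.57 times that of the sample mean (the 2/pi relative efficiency).
rng = np.random.default_rng(3)
n, reps = 301, 20000
samples = rng.normal(size=(reps, n))
var_mean = samples.mean(axis=1).var()
var_median = np.median(samples, axis=1).var()
print("variance ratio median/mean:", round(var_median / var_mean, 3))   # ~1.57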

Applications and Examples

Real-World Use Cases

In econometrics, least absolute deviations (LAD) regression is applied to model income data, which often exhibits skewness and outliers due to factors like economic shocks or high-income extremes. For instance, Glahe and Hunt (1970) used LAD to estimate parameters in a simultaneous equation model incorporating personal and farm income variables from U.S. time series data (1960–1964), demonstrating its superior performance over least squares in small samples with non-normal error distributions. This robustness makes LAD particularly suitable for analyzing income distributions, where outliers can distort mean-based estimates, as highlighted in the broader econometric literature on thick-tailed economic indicators. In engineering, LAD regression supports signal processing tasks, especially the recovery of sparse signals contaminated by heavy-tailed noise, where traditional least squares methods fail. Markovic (2013) proposed an LAD-based algorithm that adapts orthogonal matching pursuit for sparse signal reconstruction, achieving higher recovery rates than least squares under t(2)-distributed noise, which is common in real-world sensor data. This approach enhances fault detection in systems by identifying deviations in signal patterns without undue influence from anomalous measurements, leveraging LAD's outlier resistance. In environmental science, LAD regression is employed to analyze skewed pollution measurements, such as contaminant concentrations in sediments or volatile compounds, where outliers from sampling variability or extreme events are prevalent. Mebarki et al. (2017) applied LAD to model retention indices of pyrazines (heterocyclic pollutants from industrial sources) using quantitative structure-retention relationships, yielding robust fits (R² > 98%) on gas chromatography data for 114 compounds. Similarly, Grant and Middleton (1998) utilized least absolute values regression for grain-size normalization of metal contaminants in Humber Estuary sediments, effectively handling outliers to distinguish true pollution signals from textural artifacts. Software implementations of LAD regression are widely available, facilitating its adoption across disciplines. In R, the quantreg package provides the rq() function, where setting tau=0.5 computes the LAD fit as a special case of quantile regression. Python's statsmodels library offers QuantReg from statsmodels.regression.quantile_regression, with q=0.5 in the fit method yielding LAD results for linear models. In SAS, the QUANTREG procedure supports LAD estimation via the quantile=0.5 option, including tools for robust analysis.
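For concreteness, a minimal Python sketch using the statsmodels route mentioned above is shown below; the data are synthetic with heavy-tailed errors, and the variable names and parameter values are illustrative.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 3.0 + 1.5 * x + rng.standard_t(df=2, size=200)      # heavy-tailed errors

X = sm.add_constant(x)
lad_fit = sm.QuantReg(y, X).fit(q=0.5)                   # q=0.5 gives the LAD / median fit
print(lad_fit.params)                                     # intercept and slope estimates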

Illustrative Numerical Example

To illustrate the least absolute deviations (LAD) method, begin with the univariate case, where it corresponds to selecting the median as the estimator that minimizes the sum of absolute deviations from the data points. Consider the following dataset of 11 observations, which includes an outlier: 22, 24, 26, 28, 29, 31, 35, 37, 41, 53, 64. Sorting the values yields the same order, and with an odd number of points, the median is the sixth value, 31. The absolute deviations are 9, 7, 5, 3, 2, 0, 4, 6, 10, 22, and 33, summing to 101. This sum is minimized at the median, providing robustness against the outlier at 64, which contributes only 33 to the total but would exert disproportionate influence under squared-error minimization. For comparison, the mean of this dataset is 390 / 11 ≈ 35.45. The absolute deviations from the mean are |22-35.45|=13.45, |24-35.45|=11.45, |26-35.45|=9.45, |28-35.45|=7.45, |29-35.45|=6.45, |31-35.45|=4.45, |35-35.45|=0.45, |37-35.45|=1.55, |41-35.45|=5.55, |53-35.45|=17.55, and |64-35.45|=28.55, summing to 106.35; the outlier contributes 28.55 and pulls the center higher, increasing the total deviation. Thus, the median yields a lower sum of absolute deviations (101 vs. 106.35), highlighting LAD's outlier resistance in the univariate setting. Now consider a simple linear regression example using the following dataset of 8 points, which contains departures from the trend such as the low response at x=3: (1,7), (2,14), (3,10), (4,17), (5,15), (6,21), (7,26), (8,23). To fit the intercept β₀ and slope β₁, the LAD objective minimizes ∑|y_i - (β₀ + β₁ x_i)|, which can be reformulated as a basic linear program: minimize ∑(u_i + v_i) subject to y_i = β₀ + β₁ x_i + u_i - v_i for i=1 to 8, with u_i ≥ 0 and v_i ≥ 0 representing the positive and negative parts of the residuals, respectively. This setup ensures the absolute residual |y_i - (β₀ + β₁ x_i)| = u_i + v_i is captured linearly. Solving this yields the fitted line ŷ = 4.2 + 2.8x, which passes through two of the data points ((1,7) and (6,21)), as is characteristic of simple LAD regression lines. The fitted values, residuals, and absolute residuals are shown in the table below:
x   y    ŷ      Residual (y - ŷ)   Absolute residual
1   7    7.0     0.0                0.0
2   14   9.8     4.2                4.2
3   10   12.6   -2.6                2.6
4   17   15.4    1.6                1.6
5   15   18.2   -3.2                3.2
6   21   21.0    0.0                0.0
7   26   23.8    2.2                2.2
8   23   26.6   -3.6                3.6
The sum of absolute residuals is 17.4. The point at (3,10) lies noticeably below the trend, while (8,23) shows a milder downward deviation; the LAD fit balances these without overemphasizing them. In contrast, ordinary least squares (OLS) minimizes the sum of squared residuals, yielding ŷ ≈ 5.75 + 2.42x (computed via β₁ = Cov(x,y)/Var(x) = 101.5/42 ≈ 2.42, using the sums of cross-products and squared deviations, since any common n-1 factor cancels in the ratio, and β₀ = 16.625 - 2.42×4.5 ≈ 5.75). The sum of absolute residuals under this OLS fit is approximately 18.2, higher than LAD's 17.4, as the line is pulled upward by higher-y points like (7,26), increasing the errors for lower points such as (3,10). The OLS sum of absolute residuals is obtained by computing the OLS fitted values (e.g., for x=1: 8.17, residual |7-8.17|=1.17) and summing the absolute residuals across all eight points, confirming LAD's better fit under the absolute error criterion and its greater resistance to outlying observations.
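The worked example can be reproduced programmatically; the sketch below solves the same linear program with SciPy and compares the resulting sums of absolute residuals for the LAD and OLS lines (expected to be roughly 17.4 and 18.2).

import numpy as np
from scipy.optimize import linprog

# Data from the worked example above.
x = np.arange(1, 9, dtype=float)
y = np.array([7, 14, 10, 17, 15, 21, 26, 23], dtype=float)
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

# LAD fit via the linear programming reformulation: variables [beta, u, v].
c = np.concatenate([np.zeros(p), np.ones(2 * n)])
A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
bounds = [(None, None)] * p + [(0, None)] * (2 * n)
beta_lad = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs").x[:p]

# OLS fit via least squares for comparison.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

for name, b in [("LAD", beta_lad), ("OLS", beta_ols)]:
    sad = np.sum(np.abs(y - X @ b))
    print(f"{name}: intercept={b[0]:.2f}, slope={b[1]:.2f}, sum|residual|={sad:.1f}")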

Extensions and Variations

Quantile Regression Connections

Least absolute deviations (LAD) estimation is the special case of quantile regression in which the quantile level τ is set to 0.5, corresponding to the conditional median of the response given the predictors. In this framework, the LAD objective minimizes the sum of absolute residuals, which aligns precisely with the check-function criterion at the median, providing a robust measure of central tendency that is less sensitive to outliers than mean-based estimators. The general form of quantile regression extends this by estimating conditional quantiles at arbitrary levels τ ∈ (0,1), formulated as the minimization of the sum of a check function applied to the residuals. Specifically, the objective is to minimize ∑_{i=1}^n ρ_τ(y_i - x_i^T β), where the check function is defined as ρ_τ(u) = u (τ - I(u < 0)), with I(·) denoting the indicator function that equals 1 if the argument is true and 0 otherwise. This piecewise-linear loss asymmetrically weights positive and negative residuals: residuals below the fitted quantile are weighted by (1 - τ), while those above are weighted by τ, allowing the full distributional response to be characterized rather than just its center. Estimation of quantiles other than the median follows the same optimization approach as LAD, reformulating the problem as a linear program (LP) that can be solved efficiently using standard simplex or interior-point methods. For a given τ, the formulation introduces auxiliary variables to handle the absolute deviations in the check function, enabling the computation of β(τ) as the solution to a set of linear constraints, much like the median case but with adjusted weights. This solvability is a key advantage, as it scales well for moderate-sized datasets and facilitates estimation across the quantile spectrum. While LAD focuses solely on the median for robust central estimation, quantile regression provides a broader interpretive lens by mapping the entire conditional distribution of the response, revealing heteroscedasticity, skewness, and varying covariate effects across different parts of the distribution. For instance, lower quantiles (small τ) emphasize behavior in the lower tail, whereas upper quantiles highlight growth or upper-bound behaviors, offering insights that a single summary of central tendency cannot capture.
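A small sketch of the check function is given below; at τ = 0.5 it reduces to half the absolute value, so minimizing it reproduces the LAD / median regression fit. The residual values used here are arbitrary.

import numpy as np

def check_loss(u, tau):
    # Koenker-Bassett check function rho_tau(u) = u * (tau - 1{u < 0})
    return u * (tau - (u < 0).astype(float))

u = np.array([-2.0, -0.5, 0.5, 2.0])
for tau in (0.25, 0.5, 0.75):
    print(tau, check_loss(u, tau))

# At tau = 0.5 the check loss is half the absolute value.
print(np.allclose(check_loss(u, 0.5), 0.5 * np.abs(u)))   # True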

Nonlinear and Multivariate Forms

Nonlinear least absolute deviations generalizes the LAD criterion to models in which the regression function f(\mathbf{x}_i; \boldsymbol{\theta}) is nonlinear in the parameters \boldsymbol{\theta}. The objective is to minimize the sum of absolute residuals, \hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}} \sum_{i=1}^n |y_i - f(\mathbf{x}_i; \boldsymbol{\theta})|, which promotes robustness to outliers while fitting complex relationships, such as in dynamic systems or growth curves. Under assumptions that the errors have a median of zero and a positive density at the median, this estimator is consistent and asymptotically normal, with the asymptotic covariance depending on the error density at zero and the gradient of the mean function. Consistent estimators of this covariance are available, enabling inference even under heteroscedasticity, with hypotheses testable via statistics based on the residuals. Penalized versions further accommodate measurement errors or sparsity, maintaining asymptotic normality with rates adjusted for the penalty. In multivariate regression, LAD extends to vector-valued responses \mathbf{y}_i \in \mathbb{R}^q by minimizing an L1-type loss, often formulated as the sum of L1 norms of the residual vectors or through multivariate median concepts, to estimate the coefficient matrix of the conditional median. A key approach uses a transformation-retransformation scheme: the response vectors are rotated into a data-driven coordinate system via an adaptive transformation, coordinatewise univariate LAD is applied to the transformed data, and the results are retransformed to recover the parameter estimates. This method ensures equivariance under linear transformations of the responses, asymptotic normality, and superior efficiency over ordinary least squares or non-equivariant coordinatewise LAD when errors are heavy-tailed and non-normal. Bootstrap procedures facilitate standard-error estimation, and simulations show robustness gains in finite samples. Adaptations of LAD to generalized linear models (GLMs) replace the traditional deviance with an absolute-deviation-type criterion, yielding robust fits for non-normal responses such as count or binary data. The estimating equations minimize a robustified deviance analogous to the sum of absolute deviations in the linear case, often using bounded weight functions to downweight outliers in the working response or variance. For instance, such a robust fit solves score equations derived from a minimum power divergence or Huber-type criterion adapted to the GLM link and variance functions, providing a high breakdown point and asymptotic normality under regularity conditions. These estimators retain the GLM structure while resisting leverage and response outliers, with inference typically carried out via resampling. A primary challenge in nonlinear and multivariate LAD is the non-convexity of the objective function when f is nonlinear, leading to multiple local minima and requiring techniques such as genetic algorithms or concave-convex procedures to avoid poor solutions. The non-differentiability at zero further complicates gradient-based methods, often necessitating smoothed approximations or iteratively reweighted least squares for practical computation. In multivariate settings, the transformation step adds computational overhead, though adaptive algorithms mitigate this. Despite these issues, the robustness benefits persist across dimensions.
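Because the nonlinear LAD objective is non-smooth (and possibly non-convex), derivative-free optimizers are a common practical choice. The sketch below fits a hypothetical exponential curve by minimizing the sum of absolute residuals with Nelder-Mead; it may only find a local minimizer, so multiple starting values are advisable. All data values and starting points are illustrative.

import numpy as np
from scipy.optimize import minimize

# Synthetic exponential-growth data with Laplace noise and one gross outlier.
rng = np.random.default_rng(2)
t = np.linspace(0, 4, 60)
y = 2.0 * np.exp(0.5 * t) + rng.laplace(scale=0.3, size=t.size)
y[10] += 8.0                                     # one gross outlier

def nlad_objective(theta):
    # Sum of absolute residuals for the nonlinear model y = a * exp(b * t)
    a, b = theta
    return np.sum(np.abs(y - a * np.exp(b * t)))

fit = minimize(nlad_objective, x0=(1.0, 0.3), method="Nelder-Mead")
print("nonlinear LAD estimates (a, b):", np.round(fit.x, 3))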

Historical Development

Origins and Key Contributors

The method of least absolute deviations (LAD), which minimizes the sum of absolute errors in regression, traces its origins to the mid-18th century in the work of Roger Joseph Boscovich. In 1757, Boscovich proposed using this approach to adjust astronomical observations by fitting a line that minimizes the total absolute deviations from the data points, predating the least squares method by about half a century. This geometric intuition emphasized balancing positive and negative deviations to achieve an optimal fit without squaring errors, providing an early robust alternative for handling measurement inaccuracies. Pierre-Simon Laplace built upon these ideas in the late 18th and early 19th centuries, advocating absolute-error minimization in the context of probability and the combination of observations. Laplace's contributions, particularly in works from 1786 onward, explored minimax and absolute-residual procedures and integrated absolute deviations into probabilistic models of measurement error, influencing the theoretical foundation of LAD as a maximum likelihood estimator under the Laplace (double exponential) distribution. His emphasis on absolute-error measures helped establish LAD's role in early statistical practice for problems involving non-normal distributions. A later formalization came from Francis Ysidro Edgeworth, who in 1888 provided a geometric interpretation of the least sum of absolute errors problem. Edgeworth described the solution as the intersection of half-planes defined by the data, enabling graphical methods for solving LAD regressions before the advent of linear programming. These pre-linear-programming techniques relied on iterative geometric constructions to find the regression line, highlighting both LAD's computational challenges and its robustness properties. Key modern contributors include Peter J. Huber, whose 1964 development of M-estimators in robust statistics positioned LAD as a fundamental robust procedure resistant to outliers; Huber's framework underscored LAD's breakdown behavior and efficiency under contaminated distributions. Similarly, Roger Koenker and Gilbert Bassett, in their 1978 paper on regression quantiles, formalized LAD as the special case of the median (τ = 0.5) regression quantile, bridging it to the broader estimation of conditional quantiles. Their work revitalized interest in LAD by embedding it within quantile regression theory and emphasizing its connections to the sample median.

Evolution in Statistical Practice

Although proposed as early as 1757 by Roger Joseph Boscovich for fitting lines to astronomical data, the least absolute deviations (LAD) method saw limited adoption in statistical practice during the 18th and 19th centuries, overshadowed by the least squares approach of Legendre and Gauss, which offered simpler closed-form solutions and better computational tractability under Gaussian errors. Pierre-Simon Laplace also explored LAD-related procedures in the late 18th century for probabilistic modeling, but the method's non-differentiable objective function made it hard to implement without modern computing, restricting its use to small-scale or geometric problems. The mid-20th century marked a revival of LAD amid growing interest in robust statistics, particularly following Peter Huber's work on robust estimation in the 1960s, which highlighted LAD's superior performance under heavy-tailed error distributions compared to least squares. A pivotal development occurred in 1978 when Koenker and Bassett formalized LAD as median regression within the broader framework of quantile regression, enabling estimation of conditional quantiles and facilitating asymptotic inference under mild conditions. This integration spurred econometric applications, exemplified by Takeshi Amemiya's 1982 introduction of two-stage LAD estimators for simultaneous equation models, analogous to two-stage least squares but robust to non-normal errors and outliers. Concurrently, Peter Bloomfield and William L. Steiger's 1983 monograph provided comprehensive algorithms, including simplex-based methods, bridging theory and practical computation for linear models. By the 1990s, advances in optimization addressed LAD's computational hurdles, with Stephen Portnoy and Roger Koenker demonstrating in 1997 that preprocessing techniques combined with interior-point methods could make LAD estimation competitive with, or faster than, least squares for large datasets (n up to 10,000), achieving 10- to 100-fold speedups over traditional simplex algorithms. This catalyzed widespread adoption in statistical software; for instance, Koenker's quantreg package in R (initially released in the early 2000s) implements LAD as the tau=0.5 case of quantile regression, supporting estimation and inference for robust analysis in fields such as econometrics and environmental science. Today, LAD remains a standard tool in practice, particularly for datasets with outliers or asymmetric errors, as evidenced by its use in econometric modeling of wage distributions and in risk assessment, where it outperforms least squares in efficiency under Laplace-distributed errors.

References

1. Least Absolute Deviation – an overview. ScienceDirect Topics.
2. LAD Regression – Hands-On Mathematical Optimization. AMPL.
3. Least Squares versus Least Absolute Deviations (PDF).
4. Analysis of Least Absolute Deviation. HKUST Department of Mathematics (PDF).
5. A Simple Noncalculus Proof That the Median Minimizes the Sum of the Absolute Deviations (PDF).
6. On Boscovich's Estimator. Project Euclid.
7. Robust Regression for the Linear Model (LAV/LAD/MSAE regression; Birkes and Dodge, 1993).
8. Least Absolute Deviations. Charles University (PDF).
9. Powell, James L. Notes on Median and Quantile Regression (PDF).
10. Portnoy, Stephen, and Roger Koenker (1997). The Gaussian Hare and the Laplacian Tortoise: Computability of Squared-Error versus Absolute-Error Estimators.
11. Least Absolute Deviation Estimation of Linear Econometric Models (PDF, 2007).
12. Wagner, Harvey M. Linear Programming Techniques for Regression Analysis.
13. Least Absolute Deviations: Theory, Applications and Algorithms.
14. An Improved Algorithm for Discrete l1 Linear Approximation.
15. A Robust Algorithm for Least Absolute Deviations Curve Fitting.
16. Fast Regression Quantiles Using a Modification of the Barrodale–Roberts Algorithm (PDF).
17. LAD Regression via IRLS Method. Real Statistics Using Excel.
18. Breakdown Point Theory Notes (PDF, 2006).
19. Statistics 5601 (Geyer, Fall 2007) Course Announcement.
20. Model Selection in Kernel Regression Using Influence Function (PDF).
21. The Influence Function of Penalized Regression Estimators (PDF, 2014).
22. LAD Estimation for Linear Regression Models with Contaminated Data.
23. Asymptotics for Least Absolute Deviation Regression (PDF).
24. Least Squares versus Minimum Absolute Deviation Estimation.
25. Second Order Representations of the Least Absolute Deviation Estimator (PDF).
26. A Robust Method for Multiple Linear Regression. JSTOR.
27. Least Absolute Deviation Estimation for All-Pass Time Series Models.
28. Regression Quantiles. JSTOR.
29. Robust Linear Regression: A Review and Comparison. arXiv (2014).
30. Robustness in Sparse Linear Models: Relative Efficiency Based … arXiv (2016).
31. Asymptotic Theory of Least Absolute Error Regression.
32. Least Absolute Deviations Method for Sparse Signal Recovery (PDF, 2013).
33. Mebarki, Fatiha, et al. (2017). Least Absolute Deviation Regression and Least Squares for Modeling Retention …
34. Contaminants in Sediments: Using Robust Regression for Grain-Size Normalization.
35. quantreg: Quantile Regression (R package documentation, PDF).
36. statsmodels.regression.quantile_regression.QuantReg (statsmodels documentation).
37. An Introduction to Quantile Regression and the QUANTREG Procedure (SAS, PDF).
38. How to Find Outliers: 4 Ways with Examples & Explanation. Scribbr (2021).
39. The Best Least Absolute Deviation Linear Regression (PDF).
40. Quantile Regression. Cambridge University Press & Assessment.
41. Least Absolute Deviation Regression. SpringerLink.
42. The Historical Development of the Linear Minimax Absolute Residual Procedure.
43. Huber, Peter J. Robust Estimation of a Location Parameter.
44. Koenker, Roger, and Gilbert Bassett, Jr. Regression Quantiles (PDF).
45. Two Stage Least Absolute Deviations Estimators. JSTOR.