Least absolute deviations
Least absolute deviations (LAD), also known as median regression, is a statistical method for linear regression that minimizes the sum of absolute residuals between observed and predicted values, formulated as \min_{\beta} \sum_{i=1}^n |y_i - x_i^T \beta|, where y_i are the responses, x_i the predictors, and \beta the parameters to estimate.[1] Unlike ordinary least squares (OLS), which minimizes the sum of squared residuals and is sensitive to outliers, LAD provides a robust estimator by treating all deviations linearly, making it equivalent to the maximum likelihood estimator under a Laplace error distribution.[2][1]

The origins of LAD trace back to 1757, when Roger Joseph Boscovich proposed it for reconciling discrepancies in astronomical measurements of Earth's shape. Pierre-Simon Laplace further developed the approach in 1788 for similar observational adjustments. Although it predates OLS—introduced by Adrien-Marie Legendre in 1805 and Carl Friedrich Gauss around 1795—LAD saw limited adoption historically due to the lack of closed-form solutions and computational challenges, which favored the differentiable nature of least squares. Advances in linear programming and optimization software in the late 20th century revived its use, enabling efficient computation via reformulation as a linear program with auxiliary error variables.[2]

LAD excels in robust statistics by mitigating the impact of outliers and heavy-tailed errors, where OLS can produce biased estimates; simulations demonstrate LAD's superior efficiency over OLS in non-normal error scenarios.[1][3] Under Gaussian errors, LAD is less asymptotically efficient than OLS, but it outperforms OLS on contaminated data.[1] Applications span financial modeling (e.g., GARCH processes for returns), chemometrics, economic surveys, and regression trees, where it minimizes median absolute deviations for node splits.[1] Modern inference for LAD includes asymptotic normality of the estimator, \sqrt{n} (\hat{\beta} - \beta_0) \to N\left(0, \frac{1}{(2f(0))^2} V^{-1}\right), where f(0) is the error density at zero and V the design matrix covariance, along with resampling-based tests for hypotheses that handle heterogeneous errors without density assumptions.[4]

Overview and Background
Definition and Motivation
Least absolute deviations (LAD), also known as L1 regression, is a statistical estimation method that determines parameters by minimizing the sum of absolute residuals between observed and predicted values, applicable to both regression analysis and location estimation problems.[4] This approach seeks to find the best-fitting model or central tendency that reduces the total absolute deviation across data points, offering a robust alternative to methods sensitive to error magnitude. The primary motivation for LAD stems from its robustness to outliers, as the absolute value penalty treats large deviations linearly rather than quadratically, preventing extreme values from disproportionately influencing the estimate.[4] In contrast, squared-error methods amplify outliers, leading to biased fits in contaminated datasets, whereas LAD maintains stability by bounding the impact of such anomalies. This property makes LAD particularly valuable in real-world applications where data may include measurement errors or atypical observations, ensuring more reliable parameter estimates under heteroscedasticity or heavy-tailed error distributions.[4]

In the univariate case, LAD estimation corresponds to the geometric median, which for one-dimensional data is simply the sample median that minimizes the sum of absolute deviations from the data points.[5] For example, given a dataset, the value that achieves this minimum balances the number of points on either side, providing a central location resistant to skewed or outlier-affected distributions.

Historically, LAD emerged as an estimation technique in the 18th century, with Roger Boscovich introducing the method in 1757 for fitting linear models by minimizing absolute deviations, predating the least squares approach and serving as an early robust alternative.[6] Its use gained renewed attention in the 20th century through computational advancements and theoretical developments, solidifying its role in robust statistics.[4]

Relation to Other Regression Methods
Ordinary least squares (OLS) regression minimizes the sum of squared residuals between observed and predicted values, providing efficient estimates under the assumption of normally distributed errors with constant variance. This method relies on the L2 norm and is highly sensitive to outliers, as large residuals disproportionately influence the estimates due to squaring. In contrast, least absolute deviations (LAD) regression minimizes the sum of absolute residuals, employing the L1 norm, which treats all deviations more equally and reduces the impact of extreme values.[7][8]

The distinction between L1 and L2 norms positions LAD as a robust alternative to OLS, particularly in datasets with non-normal errors or contamination. While OLS achieves maximum efficiency under Gaussian assumptions, LAD serves as the maximum likelihood estimator for Laplace-distributed errors, offering greater resistance to outliers without requiring normality. This makes LAD preferable in scenarios where data may include gross errors, though it sacrifices some efficiency relative to OLS in uncontaminated normal settings.[8][7]

LAD regression is equivalent to median regression, estimating the conditional median of the response variable given the predictors, analogous to how the sample median minimizes absolute deviations in univariate data. Under the assumption that errors have a conditional median of zero, LAD identifies parameters that minimize expected absolute deviations, generalizing the median to linear models and providing a location estimate robust to skewness and outliers.[9]

Within the broader M-estimation framework for robust statistics, LAD corresponds to the choice of the absolute deviation as the objective function, whose influence function is the bounded sign of the residual, balancing efficiency and breakdown resistance. As a robust M-estimator, it offers a compromise between the outlier sensitivity of OLS and more complex methods, though modern variants like MM-estimators often surpass it in high-leverage scenarios.[7]

Mathematical Formulation
General Objective Function
The general objective function of least absolute deviations (LAD) estimation seeks the parameter \theta that minimizes the sum of absolute deviations from a set of observed data points y_i \in \mathbb{R}, i = 1, \dots, n. This is formulated as \min_{\theta \in \mathbb{R}} \sum_{i=1}^n |y_i - \theta|, which corresponds to the classical location estimation problem in one dimension.[10] The method traces its origins to early statistical work, where it was recognized as a robust alternative to squared-error criteria for fitting data to a central value.[10]

In the univariate case, the solution to this objective is the sample median, \hat{\theta} = \operatorname{median}(y_1, \dots, y_n). To derive this, assume without loss of generality that the data are ordered such that y_1 \leq y_2 \leq \dots \leq y_n, and consider an odd sample size n = 2m + 1, for which the median is y_{m+1}. The objective function S(\theta) = \sum_{i=1}^n |y_i - \theta| is piecewise linear and convex; each term |y_i - \theta| has slope +1 when \theta > y_i and slope -1 when \theta < y_i, so the subdifferential is \partial S(\theta) = \sum_{i: y_i < \theta} 1 - \sum_{i: y_i > \theta} 1 + \sum_{i: y_i = \theta} [-1, 1]. For \theta < y_{m+1}, more observations lie above \theta than below, so every subgradient is negative and S(\theta) decreases as \theta increases; for \theta > y_{m+1}, more observations lie below \theta than above, so every subgradient is positive and S(\theta) increases. At \theta = y_{m+1}, exactly m observations lie on each side, so the subdifferential contains zero and the minimum is attained there. For even n = 2m, any \theta \in [y_m, y_{m+1}] works, with the conventional choice being the midpoint or either endpoint. This property was established in early probability theory as a key advantage of the median for absolute error minimization.[10]

This objective extends naturally to multivariate location estimation in \mathbb{R}^d, where the goal is to minimize the sum of L1 norms: \min_{\theta \in \mathbb{R}^d} \sum_{i=1}^n \|y_i - \theta\|_1, \quad \|z\|_1 = \sum_{j=1}^d |z^j|. Due to the separability of the L1 norm across coordinates, the optimization decouples into d independent univariate problems, yielding the component-wise sample median as the solution: \hat{\theta}^j = \operatorname{median}(y_1^j, \dots, y_n^j) for each dimension j. This multivariate form preserves the robustness of the univariate case while handling vector-valued data.

The LAD objective function is convex but non-differentiable at points where y_i = \theta for any i (or component-wise in the multivariate case), as the absolute value function has a kink at zero. This non-smoothness implies that standard gradient-based optimization techniques fail, necessitating specialized approaches like subgradient descent or reformulation into equivalent smooth or linear programs for computation.[10]
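To make the location result concrete, the following minimal Python sketch (the data and variable names are illustrative, not drawn from the cited sources) checks numerically that the sample median minimizes the sum of absolute deviations and notes the component-wise extension:

```python
import numpy as np

# Minimal sketch (illustrative data, not from the cited sources): numerically
# checking that the sample median minimizes S(theta) = sum_i |y_i - theta|.
y = np.array([3.0, 1.0, 7.0, 4.0, 100.0])        # small sample with one outlier

def sum_abs_dev(theta, data):
    """Sum of absolute deviations of `data` about a candidate location `theta`."""
    return np.abs(data - theta).sum()

median = np.median(y)                             # 4.0 for this sample
grid = np.linspace(y.min(), y.max(), 10001)       # dense grid of candidate locations
best = grid[np.argmin([sum_abs_dev(t, y) for t in grid])]

print(f"median = {median}, grid minimizer ~ {best:.3f}")
print(f"S(median) = {sum_abs_dev(median, y)}, S(mean) = {sum_abs_dev(y.mean(), y):.1f}")

# In R^d the L1 objective separates across coordinates, so the component-wise
# median, np.median(Y, axis=0) for an (n, d) array Y, solves the multivariate problem.
```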
Linear Model Specification

In the linear model specification for least absolute deviations (LAD) regression, the response variable y_i for each observation i = 1, \dots, n is modeled as a linear function of predictors plus an error term: y_i = \beta_0 + \sum_{j=1}^p \beta_j x_{ij} + \epsilon_i, where \beta_0 is the intercept parameter, \beta_j (for j = 1, \dots, p) are the slope parameters, x_{ij} are the values of the p predictor variables, and the errors \epsilon_i are assumed to have median zero.[11] The LAD estimator \hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_p)^\top is obtained by minimizing the sum of absolute residuals: \hat{\beta} = \arg\min_{\beta} \sum_{i=1}^n \left| y_i - \left( \beta_0 + \sum_{j=1}^p \beta_j x_{ij} \right) \right|. This objective directly adapts the general LAD criterion to the linear predictor form, yielding a median-based rather than mean-based fit.[11]

The intercept parameter \beta_0 shifts the regression hyperplane vertically so that the fitted value when all predictors are zero matches the conditional median of the response, while each slope parameter \beta_j measures the marginal effect of the corresponding predictor x_j on y, adjusted for the other predictors in the model.[11] These parameters collectively define the linear predictor \hat{y}_i = \beta_0 + \sum_{j=1}^p \beta_j x_{ij}, which approximates the conditional median of y_i given the predictors.[4]

In simple linear regression, the specification reduces to a single predictor (p = 1), yielding the model y_i = \beta_0 + \beta_1 x_{i1} + \epsilon_i and the objective \min_{\beta_0, \beta_1} \sum_{i=1}^n |y_i - \beta_0 - \beta_1 x_{i1}|, often visualized as the line passing through the data that minimizes total vertical absolute deviation.[11] For multiple linear regression (p > 1), the formulation incorporates additional predictors, allowing for more complex relationships while maintaining the same absolute deviation minimization over the multivariate linear combination.[11]

The absolute residuals of the fitted model are r_i = |y_i - \hat{y}_i|, where \hat{y}_i = \hat{\beta}_0 + \sum_{j=1}^p \hat{\beta}_j x_{ij}; their sum \sum_{i=1}^n r_i equals the minimized objective value, emphasizing pointwise deviations without squaring.[11]
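As a minimal illustration of this objective (hypothetical simulated data, not tied to any cited study), the criterion can be coded directly as a function of the coefficient vector; the solution methods in the next section all aim to minimize this quantity:

```python
import numpy as np

# Illustrative sketch of the LAD objective for a linear model; the simulated
# design, coefficients, and error scale below are hypothetical.
def lad_objective(beta, X, y):
    """Sum of absolute residuals |y_i - x_i^T beta| for a coefficient vector beta."""
    return np.abs(y - X @ beta).sum()

rng = np.random.default_rng(0)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # intercept plus p predictors
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.laplace(scale=1.0, size=n)            # median-zero errors

print(lad_objective(beta_true, X, y))          # objective near the true coefficients
print(lad_objective(np.zeros(p + 1), X, y))    # much larger at a poor candidate
```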
Computation and Solution Methods

Linear Programming Formulation
The least absolute deviations (LAD) estimation problem for a linear model can be reformulated as a linear program by introducing auxiliary variables to handle the absolute values in the objective function. Specifically, for observations y_i = \mathbf{x}_i^T \boldsymbol{\beta} + e_i where i = 1, \dots, n, define non-negative auxiliary variables u_i \geq 0 and v_i \geq 0 for each i such that the residual e_i = u_i - v_i and |e_i| = u_i + v_i. This transformation linearizes the absolute deviation term while preserving the original minimization of \sum_{i=1}^n |e_i|.[12]

The full linear programming problem is then stated as: \begin{align*} \min_{\boldsymbol{\beta}, \mathbf{u}, \mathbf{v}} \quad & \sum_{i=1}^n (u_i + v_i) \\ \text{subject to} \quad & u_i - v_i = y_i - \mathbf{x}_i^T \boldsymbol{\beta}, \quad i = 1, \dots, n, \\ & u_i \geq 0, \quad v_i \geq 0, \quad i = 1, \dots, n, \end{align*} where \boldsymbol{\beta} is the vector of regression parameters. To ensure feasibility and numerical stability in solvers, bounds on the parameters \boldsymbol{\beta} (e.g., L_j \leq \beta_j \leq U_j) can be added as additional constraints, converting the problem to a standard form suitable for linear programming software such as CPLEX or Gurobi. This bounded-variable formulation was first detailed for regression contexts in seminal work on applying linear programming to statistical estimation.[12][13]

For small datasets, where the number of observations n and parameters p is modest (e.g., n \leq 100), the resulting linear program can be solved exactly using the simplex method, which efficiently navigates the feasible region defined by the n equality constraints and non-negativity conditions. The simplex algorithm's pivot operations handle the bounded variables effectively, yielding the optimal LAD estimates upon termination. This approach demonstrates the computational tractability of LAD via linear programming for problems where exact solutions are preferred over approximations.[12]
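The following Python sketch sets up this auxiliary-variable program with SciPy's general-purpose linprog solver; it is an illustration on simulated data, not one of the specialized LAD codes discussed below (variable order is [\boldsymbol{\beta}, \mathbf{u}, \mathbf{v}]):

```python
import numpy as np
from scipy.optimize import linprog

# Sketch of the auxiliary-variable LP for LAD regression using SciPy's linprog.
# The data are simulated; decision variables are stacked as [beta, u, v].
def lad_via_lp(X, y):
    n, k = X.shape
    c = np.concatenate([np.zeros(k), np.ones(2 * n)])     # minimize sum(u) + sum(v)
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])          # X beta + u - v = y
    bounds = [(None, None)] * k + [(0, None)] * (2 * n)   # beta free, u and v non-negative
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:k], res.fun                             # coefficients, sum of |residuals|

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(40), rng.normal(size=40)])
y = 2.0 + 3.0 * X[:, 1] + rng.laplace(size=40)
beta_hat, sum_abs_resid = lad_via_lp(X, y)
print(beta_hat, sum_abs_resid)
```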
Iterative Approximation Algorithms

Iterative approximation algorithms for least absolute deviations (LAD) regression provide practical solutions when exact linear programming formulations become computationally prohibitive for large datasets, offering scalable alternatives that leverage iterative refinements to approximate the L1 minimizer.[14] These methods typically start from an initial estimate, such as ordinary least squares, and iteratively update parameters until convergence to a solution that minimizes the sum of absolute residuals.[15]

The Barrodale-Roberts algorithm stands as a seminal specialized variant of the simplex method tailored specifically for LAD problems, optimizing the linear programming representation of L1 regression by exploiting its structure to reduce storage and computational demands.[14] Introduced in 1973, it performs row and column operations on an augmented matrix to efficiently navigate the feasible region, achieving exact solutions faster than general-purpose simplex implementations for moderate-sized problems with up to hundreds of observations.[14] This algorithm is particularly effective for linear LAD models, as it avoids unnecessary pivots by prioritizing median-based properties inherent to the L1 objective.[16]

Another widely adopted approach is the iteratively reweighted least squares (IRLS) adaptation for L1 minimization, which approximates the non-differentiable absolute deviation function through a sequence of weighted least squares subproblems.[17] In each iteration, weights are assigned inversely proportional to the absolute residuals from the previous estimate—specifically, w_i = 1 / |\hat{e}_i| for nonzero residuals, with safeguards like a small epsilon to avoid division by zero—transforming the LAD objective into a quadratic form solvable via standard least squares solvers.[17] This method converges to the LAD solution under mild conditions, such as bounded residuals and a suitable initial guess, often requiring 10–20 iterations for practical accuracy in low-dimensional settings.

Convergence properties of these iterative methods depend on the algorithm and problem characteristics; the Barrodale-Roberts simplex variant guarantees finite termination to an exact optimum due to the finite basis in linear programming, typically in O(nm) operations where n is the number of observations and m the number of parameters.[14] In contrast, IRLS exhibits monotonic decrease in the objective function and linear convergence rates globally when augmented with smoothing regularization, though it may slow near zero residuals or require acceleration techniques for high dimensions. Common stopping criteria include a relative change in parameter estimates below a threshold (e.g., 10^{-6}), a fixed maximum number of iterations (e.g., 50), or negligible reduction in the sum of absolute deviations between successive steps.[15]

Handling non-uniqueness in LAD solutions, where multiple parameter sets achieve the same minimum due to the objective's flatness at the optimum (e.g., when an even number of residuals straddle zero), requires algorithms to detect and report alternative minimizers.
The Barrodale-Roberts algorithm identifies non-uniqueness by checking for degenerate bases or multiple optimal vertices in the simplex tableau, allowing users to select, for instance, the solution with minimal L2 norm among the equivalents.[16] For IRLS, non-uniqueness manifests as convergence to one of several equivalent minimizers, which can be mitigated by post-processing sensitivity analysis or by restarting from perturbed initial values to verify robustness.
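A minimal Python sketch of the IRLS scheme described above follows (simulated data; the safeguard constant, tolerance, and iteration cap are common choices rather than values prescribed by the cited sources):

```python
import numpy as np

# Minimal IRLS sketch for LAD regression: repeatedly solve a weighted
# least-squares problem with weights 1 / max(|residual|, eps).
def lad_irls(X, y, eps=1e-6, tol=1e-6, max_iter=50):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]          # OLS starting value
    for _ in range(max_iter):
        r = y - X @ beta
        w = 1.0 / np.maximum(np.abs(r), eps)             # safeguard against zero residuals
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:        # stopping rule on the coefficients
            return beta_new
        beta = beta_new
    return beta

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = 1.0 - 2.0 * X[:, 1] + rng.standard_t(df=2, size=100)  # heavy-tailed errors
print(lad_irls(X, y))
```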
Statistical Properties

Robustness Characteristics
Least absolute deviations (LAD) estimation demonstrates strong robustness properties, particularly in its resistance to outliers and deviations from distributional assumptions. In the univariate setting, LAD corresponds to the sample median, which achieves a breakdown point of 50%, allowing it to withstand contamination of up to nearly half the data points by arbitrary outliers before the estimate can be driven to infinity or lose all resemblance to the true parameter; this contrasts sharply with the sample mean's breakdown point of 0%, where even a single outlier can arbitrarily distort the estimate.[18][19]

The influence function further underscores LAD's robustness, revealing bounded influence from individual observations. For the univariate median, the influence function is given by \text{IF}(x; F, T) = \frac{\operatorname{sign}(x - \mu)}{2 f(\mu)}, where \mu is the true median and f(\mu) is the density at \mu; because the sign function is bounded, the effect of any single observation is limited to at most 1/(2 f(\mu)) in absolute value, preventing outliers from exerting unbounded influence. In the regression context, LAD's influence function for residuals is \psi(r) = \operatorname{sign}(r), which is bounded between -1 and 1, ensuring that outliers in the response variable contribute only a constant maximum influence regardless of their extremity, unlike ordinary least squares (OLS), where influence grows linearly with residual size.[20][21]

Under contamination models such as Huber's \epsilon-contamination framework, where the data distribution is (1 - \epsilon) F + \epsilon G with F the good distribution and G arbitrary, the bias of the univariate LAD estimator remains bounded provided \epsilon < 0.5: the median shifts continuously with the contamination fraction but stays within a bounded neighborhood of the true value. This property extends to LAD regression under similar conditions, maintaining stable estimates for moderate contamination levels when the design matrix satisfies standard regularity assumptions, such as bounded leverage.[22] Empirical simulations illustrate these characteristics, showing that in scenarios with outliers in the response direction or high-leverage points, LAD yields lower mean squared errors for slope estimates than OLS, demonstrating reduced bias and variance under outlier contamination.
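A small simulation sketch in Python (illustrative parameters only; it does not reproduce any published study) makes the univariate contrast concrete: under 20% contamination the median stays close to the true location while the mean is pulled toward the contaminating component.

```python
import numpy as np

# Illustrative simulation (not a reproduction of any cited study): location
# estimation under Huber-style epsilon-contamination, comparing the median
# (univariate LAD) with the sample mean.
rng = np.random.default_rng(3)
n, reps, eps = 100, 2000, 0.2                    # 20% of observations contaminated
mean_sqerr, med_sqerr = [], []
for _ in range(reps):
    clean = rng.normal(loc=0.0, scale=1.0, size=n)   # "good" component F, true location 0
    bad = rng.normal(loc=10.0, scale=5.0, size=n)    # contaminating component G
    x = np.where(rng.random(n) < eps, bad, clean)
    mean_sqerr.append(x.mean() ** 2)             # squared error of the mean about 0
    med_sqerr.append(np.median(x) ** 2)          # squared error of the median about 0
print("MSE of mean:  ", np.mean(mean_sqerr))
print("MSE of median:", np.mean(med_sqerr))      # typically far smaller under contamination
```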
Asymptotic Behavior

The least absolute deviations (LAD) estimator is consistent for the true regression parameters under mild conditions, including the design matrix having full column rank asymptotically and the error distribution possessing a unique median at zero with finite first moment.[23] Under additional regularity conditions, such as the error density being positive and continuous at the median, the LAD estimator exhibits asymptotic normality. Specifically, for independent and identically distributed errors with median zero, \sqrt{n} (\hat{\beta} - \beta) \xrightarrow{d} N\left(0, \frac{1}{4 f(0)^2} (E[X X^T])^{-1}\right), where f(0) denotes the error density evaluated at the median, \beta is the true parameter vector, and X represents the regressors.[23][9]

The asymptotic relative efficiency of the LAD estimator compared to ordinary least squares (OLS) depends on the error distribution. For Gaussian errors, LAD achieves approximately 64% of the efficiency of OLS due to the influence of the density term in the asymptotic variance. However, in heavy-tailed distributions like the Laplace, where outliers are more prevalent, LAD outperforms OLS in efficiency, leveraging its robustness to achieve lower asymptotic variance.[24][4]

A Bahadur representation exists for the LAD regression coefficients, expressing \hat{\beta} as the true \beta plus a linear term involving the score function and a remainder term of smaller order than n^{-1/2}, which is asymptotically negligible relative to the leading term and facilitates derivations of higher-order asymptotics and inference procedures.[25]
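The Gaussian-error figure follows directly from the asymptotic variance above; a short derivation (a standard calculation, shown here for completeness):

```latex
% ARE of LAD relative to OLS under i.i.d. N(0, sigma^2) errors.
% The normal density at its median gives f(0) = 1/(sigma*sqrt(2*pi)), so
\[
  \frac{1}{4 f(0)^2} = \frac{2\pi\sigma^2}{4} = \frac{\pi\sigma^2}{2},
  \qquad
  \mathrm{ARE}(\mathrm{LAD},\mathrm{OLS})
  = \frac{\sigma^2}{\pi\sigma^2/2} = \frac{2}{\pi} \approx 0.637,
\]
% matching the roughly 64% efficiency quoted above; the OLS asymptotic variance
% carries the factor sigma^2 in place of 1/(4 f(0)^2).
```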
Advantages and Disadvantages

Benefits in Outlier Resistance
Least absolute deviations (LAD) regression offers significant advantages in handling outliers, particularly in datasets prone to anomalies, by minimizing the sum of absolute residuals rather than squared ones, which prevents extreme values from disproportionately influencing the fit.[26] This approach yields improved prediction accuracy in contaminated environments, such as financial time series where sudden market shocks introduce outliers; for instance, applications to exchange rate returns demonstrate that LAD models effectively capture non-linear behaviors while maintaining stability against such disruptions.[27]

Unlike ordinary least squares (OLS), which can produce biased estimates when outliers are present, LAD's robustness stems from its equivalence to estimating the conditional median, providing a straightforward interpretation as the value that minimizes absolute deviations and resists vertical outliers up to nearly 50% contamination.[28] This property enhances overall model reliability in real-world scenarios with irregular data structures. In simulations of contaminated regression settings, LAD consistently outperforms OLS by delivering lower mean squared errors and higher efficiency when up to 20% of observations are outliers, underscoring its practical superiority for robust inference.[3] As noted in the discussion of robustness characteristics above, LAD achieves a high breakdown point for response outliers, further solidifying its role in outlier-resistant estimation.[26]

Limitations in Efficiency and Computation
One significant limitation of the least absolute deviations (LAD) estimator is its lower asymptotic efficiency compared to ordinary least squares (OLS) when the errors follow a normal distribution. Specifically, the asymptotic relative efficiency of the LAD estimator relative to OLS is \frac{2}{\pi} \approx 0.637, implying that the asymptotic variance of the LAD estimator is approximately \frac{\pi}{2} \approx 1.57 times larger than that of the OLS estimator under normality. This reduced efficiency arises because LAD minimizes the sum of absolute residuals, which is less optimal for symmetric unimodal densities like the normal, where squared residuals better capture the variance structure.[29]

Another challenge is the potential non-uniqueness of LAD solutions, particularly in cases of collinear predictors or sparse data configurations. When the design matrix is singular due to collinearity, the linear programming formulation of LAD admits multiple optimal solutions, as the objective function remains constant along certain directions in the parameter space. Similarly, in sparse datasets—such as those with few observations relative to parameters or with many tied residuals—the median regression nature of LAD can lead to non-unique minimizers, complicating interpretation and requiring additional regularization to select a unique estimate.[4]

LAD also suffers from higher computational demands, especially for large sample sizes n, because its linear programming formulation lacks a closed-form solution, unlike OLS. Solving the LAD problem typically requires numerical optimization via simplex or interior-point methods, with worst-case complexity scaling as O(n^3) in general-purpose solvers, making it substantially slower than the O(np^2) direct computation of OLS for fixed predictors p. This absence of closed-form expressions necessitates iterative approximations, further increasing the practical burden for high-dimensional or big data applications. While specialized algorithms like iteratively reweighted least squares can mitigate some costs, they still demand more resources than closed-form alternatives.[4]

Applications and Examples
Real-World Use Cases
In econometrics, least absolute deviations (LAD) regression is applied to model income data, which often exhibits skewness and outliers due to factors like economic shocks or high-income extremes. For instance, Glahe and Hunt (1970) used LAD to estimate parameters in a simultaneous equation model incorporating personal and farm income variables from U.S. time series data (1960–1964), demonstrating its superior performance over ordinary least squares in small samples with non-normal error distributions.[11] This robustness makes LAD particularly suitable for analyzing income distributions where outliers can distort mean-based estimates, as highlighted in broader econometric literature on thick-tailed economic indicators.[11]

In engineering, LAD regression supports signal processing tasks, especially for recovering sparse signals contaminated by heavy-tailed noise, where traditional least squares methods fail. Markovic (2013) proposed an LAD-based algorithm that adapts orthogonal matching pursuit for sparse signal reconstruction, achieving higher recovery rates than least squares under t(2)-distributed noise, which is common in real-world sensor data.[30] This approach enhances fault detection in systems by identifying deviations in signal patterns without undue influence from anomalous measurements, leveraging LAD's outlier resistance.[30]

In environmental science, LAD regression is employed to analyze skewed pollution measurements, such as contaminant concentrations in sediments or volatile compounds, where outliers from sampling variability or extreme events are prevalent. Mebarki et al. (2017) applied LAD to model retention indices of pyrazines—heterocyclic pollutants from industrial sources—using quantitative structure-retention relationships, yielding robust fits (R² > 98%) on gas chromatography data for 114 compounds.[31] Similarly, Grant and Middleton (1998) utilized least absolute values regression for grain-size normalization of metal contaminants in Humber Estuary sediments, effectively handling outliers to distinguish true pollution signals from textural artifacts.[32]

Software implementations of LAD regression are widely available, facilitating its adoption across disciplines. In R, the quantreg package provides the rq() function, where setting tau=0.5 computes the LAD estimator as a special case of quantile regression.[33] Python's statsmodels library offers QuantReg from statsmodels.regression.quantile_regression, with q=0.5 yielding LAD results for linear models.[34] In SAS, the PROC QUANTREG procedure supports LAD estimation via the quantile=0.5 option, including inference tools for robust analysis.[35]
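As a concrete illustration of the Python route, a minimal sketch with simulated data (statsmodels' QuantReg at q = 0.5 is the LAD fit, as noted above):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

# Sketch of LAD estimation via median (q = 0.5) quantile regression in
# statsmodels; the data here are simulated purely for illustration.
rng = np.random.default_rng(4)
x = rng.uniform(0, 10, size=200)
y = 1.5 + 0.8 * x + rng.laplace(scale=2.0, size=200)

X = sm.add_constant(x)                   # adds the intercept column
res = QuantReg(y, X).fit(q=0.5)          # q = 0.5 gives the LAD / median-regression fit
print(res.params)                        # estimated intercept and slope
```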
Illustrative Numerical Example

To illustrate the least absolute deviations (LAD) method, begin with the univariate case, where it corresponds to selecting the median as the estimator that minimizes the sum of absolute deviations from the data points. Consider the following dataset of 11 observations, which includes an outlier: 22, 24, 26, 28, 29, 31, 35, 37, 41, 53, 64. The values are already sorted, and with an odd number of points the median is the sixth value, 31. The absolute deviations from 31 are 9, 7, 5, 3, 2, 0, 4, 6, 10, 22, and 33, summing to 101. This sum is minimized at the median, providing robustness against the outlier at 64, which contributes only 33 to the total but would exert disproportionate influence under squared-error minimization.[36]

For comparison, the arithmetic mean of this dataset is 390 / 11 ≈ 35.45. Subtracting the mean from each value and taking absolute values gives deviations of 13.45, 11.45, 9.45, 7.45, 6.45, 4.45, 0.45, 1.55, 5.55, 17.55, and 28.55, which sum to approximately 106.4; the outlier contributes 28.55 and pulls the center upward, inflating the total. Thus the median yields a lower sum of absolute deviations (101 versus about 106.4), highlighting LAD's outlier resistance in the univariate setting.

Now consider a simple linear regression example using a dataset of 8 points, including a potential outlier such as the low value at x = 3: (1,7), (2,14), (3,10), (4,17), (5,15), (6,21), (7,26), (8,23). To fit the intercept β₀ and slope β₁, the LAD objective minimizes ∑|y_i - (β₀ + β₁ x_i)|, which can be reformulated as a basic linear program: minimize ∑(u_i + v_i) subject to y_i = β₀ + β₁ x_i + u_i - v_i for i = 1, …, 8, with u_i ≥ 0 and v_i ≥ 0 representing the positive and negative parts of each residual. This setup ensures the absolute residual |y_i - (β₀ + β₁ x_i)| = u_i + v_i is captured linearly.[37]

Solving this program yields the fitted line ŷ = 4.2 + 2.8x, which passes through two of the data points, (1,7) and (6,21), as is characteristic of LAD regression lines with two parameters. The residuals and absolute residuals are shown in the table below:

| x | y | ŷ | Residual (y - ŷ) | Absolute Residual |
|---|---|---|---|---|
| 1 | 7 | 7.0 | 0.0 | 0.0 |
| 2 | 14 | 9.8 | 4.2 | 4.2 |
| 3 | 10 | 12.6 | -2.6 | 2.6 |
| 4 | 17 | 15.4 | 1.6 | 1.6 |
| 5 | 15 | 18.2 | -3.2 | 3.2 |
| 6 | 21 | 21.0 | 0.0 | 0.0 |
| 7 | 26 | 23.8 | 2.2 | 2.2 |
| 8 | 23 | 26.6 | -3.6 | 3.6 |
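The arithmetic in both parts of this example can be checked with a short NumPy script (a verification sketch only: the LAD line is taken from the table above rather than re-estimated, and an ordinary least squares line is added purely for comparison; its sum of absolute residuals comes out larger than the LAD value of 17.4 on this dataset).

```python
import numpy as np

# Verification sketch for the worked example above (plain NumPy; no claim that
# the cited sources performed the computation this way).
data = np.array([22, 24, 26, 28, 29, 31, 35, 37, 41, 53, 64], dtype=float)
print(np.median(data))                                   # 31.0
print(np.abs(data - np.median(data)).sum())              # 101.0
print(np.abs(data - data.mean()).sum())                  # ~106.4, larger than at the median

x = np.arange(1, 9, dtype=float)
y = np.array([7, 14, 10, 17, 15, 21, 26, 23], dtype=float)

lad_fit = 4.2 + 2.8 * x                                  # the LAD line reported in the table
ols_fit = np.polyval(np.polyfit(x, y, deg=1), x)         # least-squares line for comparison

print(np.abs(y - lad_fit).sum())                         # 17.4, the minimized LAD objective
print(np.abs(y - ols_fit).sum())                         # larger for the OLS line on this data
```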