Mean absolute error
The mean absolute error (MAE), also known as L1 loss, is a fundamental statistical metric used to assess the accuracy of predictive models by quantifying the average magnitude of errors between predicted and actual values, without regard to the direction of those errors. It is particularly prevalent in regression analysis and machine learning for evaluating how closely model predictions align with observed data. The MAE is computed using the formula \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|, where n is the number of observations, y_i represents the actual values, and \hat{y}_i denotes the predicted values.[1][2]

MAE offers several advantages as an error metric, including computational simplicity, which makes it efficient for large datasets, and robustness to outliers, since it does not square errors and thereby avoids disproportionate penalties for extreme deviations.[1] In contrast to the mean squared error (MSE), which amplifies larger errors through squaring and is optimal for Gaussian-distributed errors, MAE provides a more balanced assessment suited to Laplacian error distributions and to scenarios where interpretability is prioritized over sensitivity to anomalies.[3][1] The metric is widely applied in fields such as time series forecasting, weather prediction, and economic modeling, where it quantifies typical prediction discrepancies in the same units as the data.[1][3][4]

Despite its strengths, MAE has limitations, notably its non-differentiability at zero, which can complicate gradient-based optimization when training machine learning models, potentially leading to slower convergence or suboptimal minima.[1] Additionally, while MAE excels in robustness, it may underemphasize large errors relative to MSE, making it less suitable for applications where such deviations carry severe consequences, such as safety-critical systems.[5] Overall, the choice of MAE over alternatives depends on the assumed error distribution and the specific goals of model evaluation, and it often complements other metrics in a holistic performance assessment.[3]

Definition and Interpretation
Formal Definition
The mean absolute error (MAE) is defined as the average of the absolute differences between a set of predicted values and their corresponding observed values in a dataset.[3] The formal mathematical expression for MAE, given n paired observations, is \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|, where y_i denotes the i-th observed value and \hat{y}_i denotes the i-th predicted value.[3] In regression analysis, the predictions are commonly expressed as \hat{y}_i = f(x_i), where f is the regression function and x_i represents the input features for the i-th observation, while y consistently denotes the observed responses.[6] To compute the MAE, first determine the absolute error for each pair as |y_i - \hat{y}_i|, then sum these values across all n pairs, and finally divide the total by n.[3] This process yields a single scalar value representing the average magnitude of the prediction errors, without regard to their direction.[3] The concept of mean absolute error traces its origins to 18th-century statistical theory as a measure of deviation from a central tendency, with foundational contributions from Pierre-Simon Laplace's 1774 work on modeling error frequencies.[7]
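For illustration, the three-step computation above can be written as a minimal Python sketch; the function name mae and the sample values are illustrative rather than drawn from the cited sources.

    def mae(y_true, y_pred):
        """Mean absolute error: the average of |y_i - y_hat_i| over all pairs."""
        if len(y_true) != len(y_pred):
            raise ValueError("inputs must have the same length")
        # Step 1: absolute error per pair; step 2: sum; step 3: divide by n.
        return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred)) / len(y_true)

    print(mae([3.0, 5.0, 2.5], [2.5, 5.0, 4.0]))  # (0.5 + 0.0 + 1.5) / 3 = 0.666...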
The mean absolute error (MAE) preserves the units of the original data, such as dollars when predicting prices or meters when forecasting distances, which enhances its interpretability for non-technical stakeholders by aligning directly with the scale of the measured variable.[4][8] An MAE value of 5, for instance, signifies that the model's predictions deviate from the true values by an average of 5 units across the dataset; smaller values thus reflect higher accuracy, providing a clear benchmark for model performance evaluation.[4] MAE scales linearly with the magnitude of errors, meaning larger deviations contribute proportionally to the overall score without excessive emphasis on extreme outliers.[3] As the average of the L1 norm applied to prediction errors, MAE quantifies the typical absolute deviation in a manner that remains intuitive and unit-consistent, differing from approaches that amplify errors nonlinearly.[3] For example, in temperature forecasting, an MAE of 2°C indicates an average error of 2 degrees Celsius, offering a practical sense of reliability for daily weather predictions.[9]

Mathematical Properties
Robustness to Outliers
The mean absolute error (MAE) demonstrates robustness to outliers primarily because it employs the absolute value function, which applies a linear penalty to errors of any size. This linear treatment ensures that extreme errors contribute to the overall metric in direct proportion to their deviation, without the disproportionate amplification seen in metrics that use quadratic penalties, such as mean squared error (MSE). As a result, MAE provides a more balanced assessment of model performance in datasets prone to occasional large deviations, preventing outliers from overly skewing the evaluation.[3]

This robustness stems from a fundamental statistical property: the constant that minimizes the expected absolute error over a set of observations is the median. Unlike the mean, which can be heavily influenced by extreme values, the median maintains stability by focusing on the central ordering of data points, achieving a breakdown point of 50% (the highest possible for location estimators), meaning it can tolerate contamination in up to half the observations before becoming unreliable. This property carries over to MAE-based estimation in broader contexts, enhancing its reliability when data integrity is compromised by outliers.[10][11]

To illustrate, consider a dataset of five errors: 1, 1, 1, 1, and 10. The MAE is the average absolute error, 2.8, whereas the MSE is 20.8 because squaring the outlier contributes 10² = 100 to the sum. The single large error elevates the MSE far more dramatically than the MAE, highlighting how the linear scaling in MAE limits the outlier's disruptive effect.[12]

In non-parametric statistics, MAE-based estimators are particularly favored for analyzing contaminated data, as their linear error treatment aligns well with the heavy-tailed distributions common in real-world scenarios with outliers, offering greater stability than squared-error alternatives.[3]
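The following Python sketch reproduces this worked example with the numbers from the text:

    errors = [1, 1, 1, 1, 10]

    mae = sum(abs(e) for e in errors) / len(errors)  # linear penalty
    mse = sum(e ** 2 for e in errors) / len(errors)  # quadratic penalty

    print(mae)  # 2.8  -- the outlier contributes in proportion to its size
    print(mse)  # 20.8 -- the outlier's squared term (100) dominates the sum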
Optimality Under L1 Loss
In statistics, the value that minimizes the expected absolute deviation from a random variable X, defined as \mathbb{E}[|X - c|], is any median of the distribution of X.[13] This optimality arises because the absolute loss is convex, and the subderivative of \mathbb{E}[|X - c|] with respect to c is -\mathbb{E}[\operatorname{sign}(X - c)], where \operatorname{sign}(x) equals -1 for x < 0, +1 for x > 0, and takes any value in [-1, 1] at x = 0. The subdifferential contains zero precisely when P(X \leq c) \geq 1/2 and P(X \geq c) \geq 1/2, which is the defining condition of a median; by convexity, this critical point is a global minimum.[10] In the sample setting, the empirical minimizer of the mean absolute error \frac{1}{n} \sum_{i=1}^n |x_i - m| is the sample median, which approximates the population median as n increases.[14] For regression under L1 loss, the predictor that minimizes the expected mean absolute error \mathbb{E}[|Y - f(X)|] is the conditional median \operatorname{median}(Y \mid X).[13] Unlike the mean squared error, which is minimized by the conditional mean and thus emphasizes variance reduction, the mean absolute error is optimal for median-based predictions under L1 loss, prioritizing balanced deviations around the center of the distribution.[13]
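This optimality can be checked numerically. The Python sketch below uses an illustrative five-point sample, grid-searches over candidate constants c, and confirms that the mean absolute deviation is minimized at the sample median:

    from statistics import median

    xs = [1, 2, 4, 7, 100]  # sample containing one extreme value

    def mean_abs_dev(c):
        return sum(abs(x - c) for x in xs) / len(xs)

    # Grid search over c in [0, 100] in steps of 0.1.
    best = min((k / 10 for k in range(0, 1001)), key=mean_abs_dev)
    print(best, median(xs))  # 4.0 4 -- the median is the L1-optimal constant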
Comparisons with Other Error Metrics
Versus Mean Squared Error
The mean squared error (MSE) is defined as the average of the squared differences between predicted and actual values, given by \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, where n is the number of observations, y_i are the actual values, and \hat{y}_i are the predicted values.[15] This quadratic penalization amplifies the impact of larger errors, assigning them disproportionately greater weight than smaller ones.[16] In contrast, the mean absolute error (MAE) applies a linear penalty to errors, treating deviations proportionally without squaring, which makes MAE less sensitive to outliers than MSE.[17] Consequently, MSE is more influenced by extreme values, potentially yielding much higher scores on datasets with anomalies, while MAE provides a more balanced assessment across the error distribution.[18]

MSE is typically preferred in scenarios assuming Gaussian error distributions or where penalizing large deviations is crucial.[19] MAE, however, is favored for robustness in datasets with skewed distributions or outliers, such as environmental or sensor data, as it aligns better with Laplacian error assumptions and avoids overemphasizing rare extremes.[19][18] From a computational perspective, MAE retains the same units as the target variable, facilitating direct interpretation, but its absolute value function renders it non-differentiable at zero, complicating gradient-based optimization techniques.[15] MSE, being differentiable everywhere, supports smoother optimization, but its value is expressed in squared units, which makes direct interpretation less immediate. For illustration, consider a small dataset with actual values [1, 2, 3, 100] and predictions [1, 2, 3, 4]. The MAE is \frac{|1-1| + |2-2| + |3-3| + |100-4|}{4} = 24, while the MSE is \frac{(1-1)^2 + (2-2)^2 + (3-3)^2 + (100-4)^2}{4} = 2304, demonstrating how the outlier inflates MSE far more than MAE.[16]
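The same comparison can be run with scikit-learn's metric functions, which implement these formulas directly:

    from sklearn.metrics import mean_absolute_error, mean_squared_error

    y_true = [1, 2, 3, 100]
    y_pred = [1, 2, 3, 4]

    print(mean_absolute_error(y_true, y_pred))  # 24.0
    print(mean_squared_error(y_true, y_pred))   # 2304.0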
Versus Other Absolute Error Variants
The mean absolute percentage error (MAPE) is a relative variant of absolute error metrics, defined as \text{MAPE} = \frac{100}{n} \sum_{i=1}^n \left| \frac{y_i - \hat{y}_i}{y_i} \right|, where y_i are the actual values and \hat{y}_i are the predictions. This formulation expresses errors as percentages, offering scale invariance that facilitates comparisons across datasets with varying magnitudes or units. However, MAPE has critical limitations: it is undefined when any y_i = 0 because of division by zero, and it is inapplicable to datasets containing negative values.[20] In contrast to MAPE's relative nature, the mean absolute error (MAE) is scale-dependent, as its magnitude directly reflects the units of the target variable, making it unsuitable for direct comparisons between differently scaled datasets. MAE is preferable to MAPE in scenarios involving zero or negative actual values, avoiding the computational instabilities and biases inherent in percentage-based divisions.[20]

The median absolute error (MdAE), computed as the median of the absolute residuals |y_i - \hat{y}_i|, is more robust to outliers than MAE because the median is insensitive to distributional tails. This property makes MdAE valuable for skewed or noisy data, though it sacrifices sensitivity to the full range of errors by focusing solely on the central tendency.[21] Other absolute error variants include the weighted MAE, which modifies the standard MAE by applying weights to individual errors based on factors such as observation importance or variance, thereby prioritizing critical predictions without shifting to relative scaling.[22] MAE maintains scale consistency in additive error models, where prediction deviations accumulate absolutely, in contrast to multiplicative models that align better with relative metrics like MAPE.[20]
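A short Python sketch compares these variants on one illustrative set of predictions; the data and variable names are hypothetical, and each formula follows the definitions above:

    from statistics import median

    y_true = [10.0, 20.0, 30.0, 40.0]
    y_pred = [12.0, 18.0, 33.0, 76.0]  # the last prediction is an outlier

    abs_errors = [abs(t - p) for t, p in zip(y_true, y_pred)]

    mae = sum(abs_errors) / len(abs_errors)  # scale-dependent: units of y
    mdae = median(abs_errors)                # robust: ignores the outlying residual
    mape = 100 / len(y_true) * sum(
        abs((t - p) / t) for t, p in zip(y_true, y_pred))  # requires all y_i != 0

    print(mae, mdae, mape)  # 10.75 2.5 32.5 (up to float rounding)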
Applications and Decompositions
In Forecast Verification
In meteorology and economics, the mean absolute error (MAE) serves as a fundamental metric for evaluating the accuracy of time series forecasts by quantifying the average magnitude of deviations between predicted and actual values.[23] This scale-dependent measure is particularly valued in these fields for its interpretability in the units of the forecasted variable, such as millimeters for precipitation or currency units for economic indicators.

A useful decomposition of MAE distinguishes between errors due to overall bias in totals and those arising from mismatches in distribution: MAE equals quantity disagreement plus allocation disagreement. Quantity disagreement captures the bias in aggregate quantities, defined as the absolute value of the mean error, \text{Quantity disagreement} = \left| \frac{1}{n} \sum_{i=1}^{n} (f_i - o_i) \right|, where f_i denotes the forecasted value at time or location i, o_i the observed value, and n the number of forecasts. Allocation disagreement, which reflects errors in the spatial or temporal allocation of values, is then computed as MAE minus quantity disagreement. This breakdown aids in diagnosing whether forecast inaccuracies stem primarily from under- or over-prediction of totals or from poor patterning of events.

In rainfall forecasting, for instance, quantity disagreement quantifies the systematic bias in total precipitation volume over a period, while allocation disagreement assesses discrepancies in how that rain is distributed across days or regions, enabling forecasters to refine models accordingly. MAE has been a standard verification measure in World Meteorological Organization guidelines since the 1990s, supporting systematic evaluation of numerical weather prediction outputs.[24]
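The decomposition is straightforward to compute. The Python sketch below uses hypothetical daily rainfall values chosen to show a forecast that is unbiased in total but poorly allocated across days:

    forecasts = [5.0, 0.0, 2.0, 8.0]     # forecast rainfall per day (mm)
    observations = [3.0, 1.0, 4.0, 7.0]  # observed rainfall per day (mm)

    n = len(forecasts)
    errors = [f - o for f, o in zip(forecasts, observations)]

    mae = sum(abs(e) for e in errors) / n
    quantity = abs(sum(errors) / n)   # bias in the aggregate total
    allocation = mae - quantity       # mismatch in how values are distributed

    print(mae, quantity, allocation)  # 1.5 0.0 1.5 -- no bias, purely allocation error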
In Regression and Machine Learning
In regression analysis, the mean absolute error (MAE) serves as the primary loss function for least absolute deviations (LAD) regression, also known as L1 regression, where the objective is to minimize the sum of absolute residuals between observed and predicted values. This approach yields the conditional median as the optimal predictor under the L1 loss, providing robustness in scenarios with heavy-tailed error distributions. A related variant appears in Lasso regression, which adds an L1 penalty on the coefficients to the mean squared error loss, promoting sparsity while retaining some absolute deviation sensitivity for feature selection. In machine learning frameworks, MAE is widely implemented as both an evaluation metric and a loss function, such as scikit-learn's mean_absolute_error for scoring regression models and TensorFlow's MeanAbsoluteError for assessing prediction accuracy. It is particularly favored for imbalanced datasets or those prone to outliers, including image regression tasks where pixel-level predictions require balanced error weighting without excessive penalization of large deviations.[25]
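As a usage sketch, mean_absolute_error can score any fitted scikit-learn regressor; the toy data and choice of model here are illustrative:

    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_absolute_error

    X = [[1.0], [2.0], [3.0], [4.0]]
    y = [1.1, 1.9, 3.2, 3.9]

    model = LinearRegression().fit(X, y)
    print(mean_absolute_error(y, model.predict(X)))  # average |residual|, in y's units

scikit-learn also exposes MAE for model selection through the "neg_mean_absolute_error" scoring string used by cross-validation utilities.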
Optimizing MAE directly poses challenges due to its non-differentiability at zero, necessitating subgradient methods, which substitute a valid subgradient for the gradient at the kink, so that convex optimization algorithms can still converge to the minimum. To address this, smooth approximations such as the Huber loss are employed; the Huber loss behaves like the mean squared error for small errors and like MAE for large ones, enabling smoother gradient-based training in neural networks while maintaining robustness.
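A minimal sketch of the Huber loss on a single residual, assuming the standard threshold parameter delta:

    def huber(residual, delta=1.0):
        """Quadratic for |r| <= delta (MSE-like), linear beyond (MAE-like)."""
        r = abs(residual)
        if r <= delta:
            return 0.5 * r ** 2           # smooth near zero: usable gradient
        return delta * (r - 0.5 * delta)  # linear growth: robust to outliers

    print(huber(0.5), huber(3.0))  # 0.125 2.5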
For instance, when evaluating a random forest regressor on housing price datasets, MAE quantifies average prediction errors in interpretable dollar units, directly indicating the typical deviation in estimated property values. In emerging 2020s AI applications, such as autonomous driving, MAE supports robust trajectory predictions by measuring average positional errors in vehicle behavior forecasting, enhancing safety in outlier-heavy real-world scenarios.