
Mean absolute scaled error

The mean absolute scaled error (MASE) is a scale-independent metric for assessing the accuracy of forecasts by comparing the absolute errors of a proposed method to those of a baseline naive forecast. Proposed by Rob J. Hyndman and Anne B. Koehler in 2006 as a superior alternative to traditional measures like the mean absolute percentage error (MAPE), which can suffer from degeneracy in cases involving zeros or small values, MASE normalizes errors relative to in-sample one-step naive predictions, enabling fair comparisons across diverse datasets and horizons.

For non-seasonal time series, MASE is calculated as the average absolute forecast error divided by the average absolute error of a naive one-step-ahead forecast computed on the training data. Specifically, if e_t denotes the forecast error at time t and y_t the observed value, then \text{MASE} = \frac{\frac{1}{h} \sum_{t=1}^{h} |e_t|}{\frac{1}{T-1} \sum_{t=2}^{T} |y_t - y_{t-1}|}, where h is the forecast horizon and T is the number of observations in the training period. This scaling factor, derived from the in-sample naive errors, makes the measure unit-free and avoids issues such as the infinite values that arise with percentage-based metrics. For seasonal time series, the denominator is adjusted to reflect the seasonal naive forecast, using the average absolute change over the seasonal period m (e.g., m = 12 for monthly data). The formula becomes \text{MASE} = \frac{\frac{1}{h} \sum_{t=1}^{h} |e_t|}{\frac{1}{T-m} \sum_{t=m+1}^{T} |y_t - y_{t-m}|}, providing a robust benchmark that accounts for periodicity without assuming stationarity.

MASE's key advantages include its interpretability (a value of 1 indicates the forecast performs as well as the naive method, while values below 1 signify improvement) and its applicability to intermittent demand or non-time-series data by adapting the scaling. It has been endorsed for use in forecast competitions and statistical testing frameworks, such as the Diebold-Mariano test, owing to favorable statistical properties like finite moments, making it preferable to squared-error metrics like RMSE for comparative analysis. Widely implemented in software libraries for forecasting, MASE remains a standard tool in fields such as economics, supply chain management, and machine learning for evaluating model performance.

Introduction

Definition

The mean absolute scaled error (MASE) is a scale-independent measure of forecast accuracy, defined as the mean absolute forecast error divided by the mean absolute error of a naive forecast. This approach normalizes the errors relative to a simple one-step-ahead naive method, which uses the previous observation as the forecast for the next period. MASE was proposed by Rob J. Hyndman and Anne B. Koehler in 2006 as a standard for evaluating forecast performance across diverse time series. For the non-seasonal case, MASE is calculated using the formula \text{MASE} = \frac{\frac{1}{h} \sum_{t=1}^{h} |e_t|}{\frac{1}{T-1} \sum_{t=2}^{T} |y_t - y_{t-1}|}, where e_t = y_t - \hat{y}_t denotes the forecast error at time t, with y_t as the actual value and \hat{y}_t as the forecast value; h is the number of out-of-sample forecast periods; and T is the length of the in-sample period used to compute the benchmark. The denominator serves as the scaling factor, representing the average in-sample error of the naive benchmark, which makes MASE unit-free and facilitates direct comparisons of forecast accuracy across datasets with varying scales or units.
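A minimal Python sketch of this definition, assuming NumPy and an illustrative function name `mase` not tied to any particular library:

```python
import numpy as np

def mase(y_train, y_test, y_pred):
    """Mean absolute scaled error with a one-step naive in-sample benchmark."""
    y_train = np.asarray(y_train, dtype=float)
    y_test = np.asarray(y_test, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Numerator: mean absolute error of the out-of-sample forecasts.
    mae_out_of_sample = np.mean(np.abs(y_test - y_pred))
    # Denominator: mean absolute one-step naive error on the training data,
    # i.e. (1/(T-1)) * sum_{t=2}^{T} |y_t - y_{t-1}|.
    scale = np.mean(np.abs(np.diff(y_train)))
    return mae_out_of_sample / scale
```

Here `np.diff(y_train)` produces the one-step changes y_t - y_{t-1}, so the mean of their absolute values is exactly the scaling factor in the denominator above.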

History

The mean absolute scaled error (MASE) was proposed in 2006 by Rob J. Hyndman and Anne B. Koehler as a robust alternative to traditional forecast accuracy measures in their paper "Another look at measures of forecast accuracy," published in the International Journal of Forecasting. This development emerged amid growing concerns in early-2000s forecasting research, following the M3-Competition (2000), which highlighted persistent problems with metrics like the mean absolute percentage error (MAPE), such as its tendency to become infinite or undefined in the presence of zeros or negative values, and the root mean squared error (RMSE), which is scale-dependent and overly sensitive to outliers. Hyndman and Koehler aimed to create a scale-independent measure suitable for comparing forecasts across diverse time series, drawing on real-world examples from prior competitions to demonstrate the shortcomings of existing approaches.

Following its introduction, MASE gained rapid traction in the forecasting community, with Hyndman prominently recommending it as a standard evaluation tool in his subsequent works, including the widely used textbook Forecasting: Principles and Practice (first edition 2013, with ongoing updates). In this text, Hyndman emphasizes MASE's utility for cross-series comparisons due to its favorable properties, such as avoiding the pitfalls of percentage-based errors, and integrates it into practical forecasting workflows using software such as R. This endorsement helped establish MASE as a standard in academic and applied forecasting, influencing its inclusion in forecasting competitions and software packages by the mid-2010s.

Since its inception, MASE has seen no major methodological updates, remaining largely unchanged from the 2006 formulation, though later analyses have examined its statistical properties in specific contexts. For instance, Philip Hans Franses (2016) noted in a commentary that MASE possesses desirable moment conditions that make it compatible with significance-testing frameworks such as the Diebold-Mariano test.

Properties

Advantages

The mean absolute scaled error (MASE) offers scale invariance, enabling direct comparisons of forecast accuracy across time series with different units or magnitudes, unlike scale-dependent metrics such as the mean absolute error (MAE) or root mean squared error (RMSE). This property arises from scaling the forecast errors by the in-sample MAE of a naive benchmark forecast, normalizing the metric to a unit-free scale.

MASE handles series with values near zero or intermittent demand effectively, avoiding the division-by-zero problems and infinite values that plague percentage-based metrics like the mean absolute percentage error (MAPE). By relying on absolute errors rather than relative ones, it remains well-defined even when actual values approach zero, providing reliable evaluations in such cases. The metric treats over- and under-forecasting symmetrically through its use of absolute errors, penalizing deviations equally regardless of direction, in contrast to squared-error measures like RMSE that disproportionately weight larger errors. This symmetry ensures a balanced assessment of forecast performance without bias toward one type of error.

MASE enhances interpretability by benchmarking against a simple naive forecast: values less than 1 indicate accuracy superior to the naive method, a value of 1 matches it, and values greater than 1 signify poorer performance. This intuitive scale facilitates quick judgments of model efficacy relative to a simple baseline. Furthermore, the scaled errors in MASE exhibit asymptotic normality, assuming finite first and second moments, which supports the application of statistical tests such as the Diebold-Mariano test for comparing forecast accuracies across multiple models. This property aligns MASE with established inferential frameworks, enabling rigorous hypothesis testing. As a benchmark-based measure, MASE uses in-sample errors from the naive forecast for scaling, promoting robustness to out-of-sample variations and to trends or seasonality in the data. This approach ensures consistent evaluation across diverse series without sensitivity to specific out-of-sample characteristics.

Limitations and interpretations

The mean absolute scaled error (MASE) depends critically on the quality of the naive forecast used for scaling, which can lead to misleading evaluations if an inappropriate benchmark is selected. For instance, in highly seasonal series, applying a non-seasonal naive forecast (repeating the previous observation) as the benchmark may produce a poor scaling factor, undervaluing the performance of models that properly account for seasonality; instead, a seasonal naive forecast (repeating the observation from the same season in the prior cycle) is recommended to ensure fair comparison. MASE is particularly effective for evaluating short-term forecasts but may require adjustments for longer horizons, as the scaling is based on in-sample one-step naive errors, which do not directly capture the accumulation of errors over extended periods. While MASE can be applied to multi-step forecasts, the one-step scaling may not fully reflect the relative difficulty of longer-term predictions, potentially leading to less intuitive comparisons across varying forecast lengths. Interpreting MASE values lacks universal thresholds beyond the benchmark of 1, where values less than 1 indicate superior performance to the in-sample one-step naive forecast and values greater than 1 signify inferior performance. Franses (2016) highlights the reliance of MASE on in-sample scaling for its desirable statistical properties, such as finite moments that support asymptotic normality in tests like the Diebold-Mariano framework, but notes that this in-sample focus may not always align perfectly with pure out-of-sample performance assessments. MASE's properties, including its scaling stability, assume sufficiently large samples for reliable estimation; in small datasets, the in-sample naive error denominator can become unstable, or even undefined if all historical observations are identical, leading to volatile MASE values.

Calculation methods

Non-seasonal time series

For non-seasonal time series, the mean absolute scaled error (MASE) is computed using a simple naive benchmark that assumes the forecast for each period is the value from the immediately preceding observation. This benchmark, denoted as \hat{y}_t = y_{t-1}, represents a one-step-ahead prediction based on the random walk model without drift, suitable for data lacking strong trends or periodic patterns. The scaling factor for MASE in this context is the average absolute one-step change observed in the in-sample training data, calculated as \frac{1}{T-1} \sum_{t=2}^{T} |y_t - y_{t-1}|, where T is the number of observations in the training period. This denominator captures the average one-step deviation in the historical series, providing a unit-free scale that reflects the inherent variability under the naive approach. By dividing forecast errors by this factor, MASE becomes comparable across different series and units.

The complete formula for MASE over a forecast horizon of h periods is \text{MASE} = \frac{\frac{1}{h} \sum_{t=1}^{h} |y_{T+t} - \hat{y}_{T+t}|}{\frac{1}{T-1} \sum_{t=2}^{T} |y_t - y_{t-1}|}, where the numerator is the mean absolute error (MAE) of the out-of-sample forecasts and \hat{y}_{T+t} are the predicted values for the holdout period. This ratio yields a value of 1 when the forecasting method performs equivalently to the in-sample naive benchmark; values below 1 indicate superior accuracy, while values above 1 suggest poorer performance. The approach assumes the data exhibit random walk-like behavior without pronounced seasonality, so that the naive one-lag shift serves as an appropriate baseline.

To illustrate, consider a non-seasonal monthly series of 24 observations, such as retail units sold without evident cycles: the first 18 values form the training set (T = 18). Compute the scaling factor by averaging the absolute differences between consecutive in-sample values, i.e., (|y_2 - y_1| + \cdots + |y_{18} - y_{17}|)/17. For the remaining 6 values (h = 6), generate naive forecasts (\hat{y}_{19} = y_{18}, \hat{y}_{20} = y_{19}, and so on), calculate the out-of-sample MAE, and divide by the scaling factor to obtain MASE, as in the sketch below. This process highlights how MASE normalizes errors relative to in-sample variability, aiding model evaluation in stable, non-periodic contexts.
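The worked example above can be sketched in code as follows; the 24 monthly values are invented purely for illustration, and the holdout forecasts are the rolling one-step naive predictions described in the text:

```python
import numpy as np

# 24 illustrative monthly observations (not real retail data).
y = np.array([112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
              115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140],
             dtype=float)
train, test = y[:18], y[18:]                 # T = 18, h = 6

# Scaling factor: (1/17) * sum of |y_t - y_{t-1}| over the training data.
scale = np.mean(np.abs(np.diff(train)))

# Rolling one-step naive forecasts for the holdout:
# y_hat_19 = y_18, y_hat_20 = y_19, ..., y_hat_24 = y_23.
naive_forecast = y[17:23]

mase_naive = np.mean(np.abs(test - naive_forecast)) / scale
print(f"MASE of the naive forecasts over the 6-month holdout: {mase_naive:.2f}")
```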

Seasonal time series

For time series exhibiting strong seasonal patterns, the mean absolute scaled error (MASE) is adapted by using a seasonal naive forecast as the benchmark, rather than the simple one-step naive forecast appropriate for non-seasonal data. The seasonal naive forecast predicts each future value \hat{y}_{T+t} as the value from the same season in the previous cycle, specifically \hat{y}_{T+t} = y_{T+t-m}, where m is the seasonal period (e.g., m = 12 for monthly data or m = 4 for quarterly data). This benchmark captures the inherent periodicity, making it suitable for evaluating forecasts against a simple method that replicates observed seasonal behavior.

The scaling factor for seasonal MASE is the mean absolute error (MAE) of the seasonal naive forecast applied in-sample to the training data, computed as \frac{1}{T-m} \sum_{t=m+1}^{T} |y_t - y_{t-m}|, where T is the length of the training series. This denominator normalizes the out-of-sample forecast errors relative to the average one-season-ahead error observed within the historical data, ensuring the metric is scale-independent and comparable across series with similar seasonal structure. The full formula for seasonal MASE, based on h out-of-sample forecasts, is then:

\text{MASE} = \frac{\frac{1}{h} \sum_{t=1}^{h} |y_{T+t} - \hat{y}_{T+t}|}{\frac{1}{T-m} \sum_{t=m+1}^{T} |y_t - y_{t-m}|}

This yields a value of 1 when the forecast accuracy matches that of the seasonal naive benchmark; values below 1 indicate superior performance, while values above 1 suggest poorer accuracy. For instance, in quarterly data with m = 4, forecast errors are scaled against the average absolute deviation between observations one season apart in the training period, such as comparing a model's errors to the typical quarterly fluctuations in historical sales or production data. This seasonal adaptation assumes the presence of strong, consistent periodic patterns in the data; applying the non-seasonal version to such series would yield misleading results by ignoring the dominant cyclical structure.
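A minimal sketch of the seasonal variant, assuming NumPy and the illustrative names `seasonal_mase` and `m` for the seasonal period:

```python
import numpy as np

def seasonal_mase(y_train, y_test, y_pred, m):
    """MASE scaled by the in-sample seasonal naive MAE (lag-m differences)."""
    y_train = np.asarray(y_train, dtype=float)
    y_test = np.asarray(y_test, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Denominator: (1/(T-m)) * sum_{t=m+1}^{T} |y_t - y_{t-m}|.
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    # Numerator: mean absolute error of the h out-of-sample forecasts.
    return np.mean(np.abs(y_test - y_pred)) / scale

# Usage with quarterly data (m = 4): compare a model's holdout forecasts
# against the typical quarter-on-quarter (lag-4) variation in the training set.
# quarterly_score = seasonal_mase(train, test, model_forecast, m=4)
```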

Non-time series

The mean absolute scaled error (MASE) can be adapted for non-time-series data, such as cross-sectional observations, where there is no inherent temporal ordering or seasonality. In this context, the data are treated as a collection of independent observations, and the benchmark forecast is the simple mean of the training data, denoted as \hat{y} = \bar{y}, which serves as a naive predictor without exploiting any sequential dependencies. This adaptation maintains the scale-independent property of MASE, allowing comparisons across datasets with varying units or magnitudes.

The scaling factor for non-time-series MASE is the mean absolute deviation (MAD) from the sample mean in the training data, calculated as \frac{1}{T} \sum_{t=1}^{T} |y_t - \bar{y}|, where T is the number of training observations and \bar{y} is the mean of the training set. The full MASE formula then becomes \text{MASE} = \frac{\frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|}{\frac{1}{T} \sum_{t=1}^{T} |y_t - \bar{y}|}, where n is the number of out-of-sample predictions, y_i are the actual values, and \hat{y}_i are the predicted values. A MASE value less than 1 indicates that the forecasting method outperforms the naive mean benchmark, while values greater than 1 suggest inferior performance. This formulation aligns with implementations in statistical software for non-temporal data analysis.

For example, in predicting house prices from cross-sectional features such as location and size, the training data might consist of historical sales without time ordering. Here, prediction errors are scaled by the mean absolute deviation from the mean training price, providing a unitless measure that evaluates model accuracy relative to simply predicting the average price for all houses. This approach is particularly useful in scenarios such as comparisons across product categories or regression tasks on static datasets, where temporal structure is absent.
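A corresponding sketch for cross-sectional data, again with illustrative names and NumPy, scales the errors by the mean absolute deviation from the training-set mean:

```python
import numpy as np

def mase_cross_sectional(y_train, y_true, y_pred):
    """MASE for non-temporal data, benchmarked against predicting the mean."""
    y_train = np.asarray(y_train, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Denominator: mean absolute deviation of training values from their mean.
    scale = np.mean(np.abs(y_train - y_train.mean()))
    # Numerator: mean absolute error of the model's out-of-sample predictions.
    return np.mean(np.abs(y_true - y_pred)) / scale

# Example: house-price predictions on a holdout set, scaled by how far training
# prices typically sit from the average training price; a value below 1 means
# the model beats "always predict the average price".
```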

Applications and extensions

Forecasting evaluation

The mean absolute scaled error (MASE) plays a key role in model selection for time series forecasting by providing a scale-independent metric for ranking competing models on the same dataset. For instance, it enables direct comparison of autoregressive integrated moving average (ARIMA) and exponential smoothing state space (ETS) models, where lower MASE values indicate superior out-of-sample performance. In cross-validation procedures, MASE is computed on hold-out sets or through rolling-origin cross-validation to assess model robustness across multiple forecast origins. This approach evaluates one-step-ahead forecasts iteratively, using in-sample naive errors for scaling, which ensures consistent measurement of accuracy without scale biases.

The scale-free property of MASE facilitates multi-series comparisons by allowing aggregation of errors across diverse datasets, such as averaging individual MASE values to gauge overall model efficacy in large-scale applications (see the sketch below). This aggregation is standard for hierarchical or grouped time series, enabling benchmarks like those in the M-competition datasets, where the mean MASE summarizes performance over hundreds of series. In energy demand forecasting, MASE helps identify superior models for load prediction; for example, across 25 electricity load series, hierarchical models achieved an average MASE of 0.72, outperforming benchmarks and informing grid optimization. For statistical significance, differences in MASE between models can be tested using the Diebold-Mariano framework, which leverages the asymptotic normality of scaled errors to determine whether one method significantly outperforms another.
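A rough sketch of this aggregation, assuming NumPy and illustrative helper names (`mase`, `mean_mase`, `forecast_fn`), computes each series' MASE with its own in-sample scaling factor and then averages the per-series values:

```python
import numpy as np

def mase(y_train, y_test, y_pred):
    """Non-seasonal MASE: errors scaled by the in-sample one-step naive MAE."""
    y_train = np.asarray(y_train, dtype=float)
    scale = np.mean(np.abs(np.diff(y_train)))
    return np.mean(np.abs(np.asarray(y_test, dtype=float)
                          - np.asarray(y_pred, dtype=float))) / scale

def mean_mase(series_list, forecast_fn, h):
    """Average MASE of `forecast_fn` over a collection of series.

    `forecast_fn(train, h)` is any user-supplied function returning an
    h-step-ahead forecast for the given training series.
    """
    scores = []
    for y in series_list:
        y = np.asarray(y, dtype=float)
        train, test = y[:-h], y[-h:]
        scores.append(mase(train, test, forecast_fn(train, h)))
    return float(np.mean(scores))

# A method whose mean MASE is below 1 beats the one-step naive benchmark on
# average across the collection; lower values are preferred when ranking
# competing models such as ARIMA versus ETS.
```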

Intermittent demand forecasting

Intermittent demand patterns, characterized by frequent zero observations interspersed with sporadic non-zero demands, pose significant challenges for forecast accuracy evaluation. Traditional metrics such as the mean absolute percentage error (MAPE) often fail in these scenarios because division by zero observations results in infinite or undefined values, rendering them unusable for series with high sparsity. The mean absolute scaled error (MASE) addresses this issue through its scaling approach, which divides forecast errors by the in-sample MAE of a naive forecast, avoiding division by actual values and ensuring finite results even in highly sparse data, as illustrated in the sketch below. For intermittent series, MASE typically employs the non-seasonal benchmark, calculating the scale as the average of the one-step-ahead absolute differences in the historical data, which naturally accommodates zeros without requiring special adjustments. However, in zero-inflated contexts, evaluations may emphasize errors on non-zero periods to better capture demand variability, though standard computations include all periods for a comprehensive assessment. This maintains MASE's scale-independence, allowing consistent comparisons across lumpy or erratic intermittent patterns.

A practical example arises in inventory management for slow-moving spare parts, where MASE is used to assess methods such as Croston's approach, which separately forecasts demand sizes and inter-demand intervals, and the Syntetos-Boylan approximation (SBA), a bias-corrected variant, against simple naive baselines. In empirical studies of enterprise data with over 16,000 intermittent items, MASE revealed that while Croston and SBA often underperform relative to optimized methods such as the Teunter-Syntetos-Babai (TSB) approach (with scaled MASE values indicating higher errors for Croston and SBA), they provide viable baselines for lumpy demands, outperforming naive methods in erratic subsets. The benefits of MASE in this domain include delivering stable, interpretable measures for lumpy demand series, where traditional metrics fluctuate wildly due to irregularity, thereby supporting more reliable decisions such as stock level optimization and reorder policies. This application was highlighted in foundational work by Hyndman (2006), who advocated MASE as a robust standard for intermittent demand evaluation, with subsequent research confirming its utility in ranking forecasters and handling bias in sparse inventory contexts.
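The following sketch, with invented demand values and a simple flat mean-rate forecast, illustrates the contrast described above: MAPE becomes infinite on a holdout containing zeros, while MASE remains finite because it scales by the in-sample one-step naive MAE.

```python
import numpy as np

# Invented intermittent demand series: mostly zeros with sporadic demands.
y = np.array([0, 0, 3, 0, 0, 0, 2, 0, 0, 5, 0, 0, 0, 4, 0, 0], dtype=float)
train, test = y[:12], y[12:]                     # simple holdout split

forecast = np.full(test.shape, train.mean())     # flat mean-rate forecast

# MAPE divides by actual values, so zeros in the holdout make it infinite.
with np.errstate(divide="ignore"):
    mape = np.mean(np.abs((test - forecast) / test)) * 100

# MASE divides by the in-sample one-step naive MAE instead, which is finite
# as long as the training data is not perfectly constant.
scale = np.mean(np.abs(np.diff(train)))
mase = np.mean(np.abs(test - forecast)) / scale

print(f"MAPE: {mape}")     # inf
print(f"MASE: {mase:.2f}") # finite, interpretable against the naive benchmark
```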

References

  1. [1]
    Another look at measures of forecast accuracy - ScienceDirect.com
International Journal of Forecasting. Rob J. Hyndman, Anne B. Koehler.
  2. [2]
    A note on the Mean Absolute Scaled Error - ScienceDirect.com
    Hyndman and Koehler (2006) recommend that the Mean Absolute Scaled Error (MASE) should become the standard when comparing forecast accuracies.
  3. [3]
    MeanAbsoluteScaledError — sktime documentation
    This scale-free error metric can be used to compare forecast methods on a single series and also to compare forecast accuracy between series.
  4. [4]
    5.8 Evaluating point forecast accuracy - OTexts
    Scaled errors were proposed by Hyndman & Koehler (2006) as an alternative to using percentage errors when comparing forecast accuracy across series with ...
  5. [5]
    A note on the Mean Absolute Scaled Error
  6. [6]
    [PDF] ANOTHER LOOK AT FORECAST-ACCURACY METRICS FOR ...
    Rob Hyndman summarizes these forecast accuracy metrics and explains their potential failings. He also introduces a new metric—the mean absolute scaled error.
  7. [7]
    Forecast evaluation for data scientists: common pitfalls and best ...
    Dec 2, 2022 · Benchmarks are an important part of forecast evaluation. Comparison against the right benchmarks and especially the simpler ones is essential.
  8. [8]
    [PDF] Another look at measures of forecast accuracy - Rob J Hyndman
    Nov 2, 2005 · Instead, we propose that the mean absolute scaled error become the standard measure for comparing forecast accuracy across multiple time series.
  10. [10]
    Out of sample MASE - Cross Validated - Stats StackExchange
Sep 16, 2020 · My understanding is that one limitation with using the out of sample naive MAE is that if the out of sample set is small, it is not reliable.
  11. [11]
    Accuracy measures for a forecast model - Rob J Hyndman - Software
    MASE: Mean Absolute Scaled Error. ACF1: Autocorrelation of errors at lag 1 ... non-time series data. If f is a numerical vector rather than a forecast ...
    9.10 ARIMA vs ETS | Forecasting: Principles and Practice (3rd ed)
    In this case the ARIMA model seems to be the slightly more accurate model based on the test set RMSE, MAPE and MASE. # Generate forecasts and compare ...
  14. [14]
    [PDF] A Study of Time Series Models ARIMA and ETS - MECS Press
    Apr 7, 2017 · A Comparative study brings between the ETS and ARIMA Model through SSE, MAE, RMSE, MASE and MAPE also include the criteria AIC and BIC ...
  15. [15]
    5.10 Time series cross-validation | Forecasting - OTexts
    This procedure is sometimes known as “evaluation on a rolling forecasting origin” because the “origin” at which the forecast is based rolls forward in time.
  16. [16]
    Computing aggregated MASE for multiple time series
    May 27, 2020 · The standard approach is to calculate the MASE separately for each series, using that series' scaling factor (classically, the in-sample MAE ...
  17. [17]
    A note on the Mean Absolute Scaled Error - IDEAS/RePEc
    Hyndman and Koehler (2006) recommend that the Mean Absolute Scaled Error (MASE) should become the standard when comparing forecast accuracies.