Moving-average model

The moving-average (MA) model is a class of univariate time series models in statistics that represents the current value of a series as a constant plus a finite sum of current and past error terms, capturing dependencies arising from recent random shocks. Formally, an MA model of order q, denoted MA(q), is defined by the equation Y_t = \mu + \varepsilon_t + \sum_{i=1}^q \theta_i \varepsilon_{t-i}, where \mu is the mean of the process, \{\varepsilon_t\} is a sequence of independent and identically distributed errors with mean zero and constant variance \sigma^2 > 0, and \theta_1, \dots, \theta_q are fixed parameters (with the sign convention sometimes using negative coefficients, which is equivalent). This structure implies that the model's memory is limited to the last q periods, making it suitable for modeling processes with short-term correlations. Developed within the broader autoregressive moving average (ARMA) framework by statisticians George E. P. Box and Gwilym M. Jenkins in their influential 1970 book Time Series Analysis: Forecasting and Control, the MA model became a cornerstone of modern time series analysis. Box and Jenkins emphasized its role in the autoregressive integrated moving average (ARIMA) methodology, which combines MA components with autoregressive (AR) terms and differencing to handle non-stationary data for forecasting purposes. The model's parameters are typically estimated using maximum likelihood methods, assuming the errors follow a normal distribution, though non-normal innovations can also be accommodated.

A key property of the MA(q) model is its inherent stationarity: the finite dependence on past errors ensures constant mean, variance, and autocovariances regardless of the parameter values, provided the errors are white noise. However, for practical estimation and interpretation, the model must also satisfy the invertibility condition, which requires that the roots of the polynomial 1 + \theta_1 z + \dots + \theta_q z^q = 0 lie outside the unit circle in the complex plane; this allows the process to be expressed as an infinite-order autoregression, facilitating recovery of the underlying shocks. Identification of the order q relies on the sample autocorrelation function (ACF), which theoretically cuts off to zero after lag q, while the partial ACF decays gradually.

MA models are widely applied in economics, finance, and related fields for short-term forecasting, where recent disturbances dominate, such as in stock price modeling or actuarial projections. For instance, they underpin stochastic simulations used for long-range demographic and financial projections. Extensions include seasonal MA components in SARIMA models to handle periodic patterns, enhancing their utility across diverse fields. Despite their simplicity, MA models can approximate more complex dynamics when combined with AR terms, though they may underperform for long-memory processes better captured by fractional integration.

Fundamentals

Definition

In time series analysis, the moving-average model serves as a stochastic model that describes a time series by expressing the observed value at time t as a function of current and past random shocks, thereby capturing short-term dependencies in the data. This approach assumes the process is driven by white noise innovations, making it suitable for representing dependencies that decay quickly without long-lasting effects. A moving-average model of order q, denoted MA(q), defines the observation y_t as a constant plus a linear combination of the current error term \epsilon_t and the previous q error terms, where \{\epsilon_t\} is a sequence of independent and identically distributed random variables with mean zero and finite variance \sigma^2. The model is formally expressed as: y_t = \mu + \epsilon_t + \sum_{i=1}^{q} \theta_i \epsilon_{t-i}, where \mu is the mean of the series and the \theta_i are the model parameters (moving-average coefficients). This formulation implies that the value at time t depends only on shocks up to lag q, after which the influence drops to zero. Unlike simple moving averages, which are deterministic smoothing techniques that average a fixed number of past observations to reduce noise and estimate trends in historical data, the MA(q) model is inherently probabilistic and focuses on modeling the underlying stochastic structure for forecasting future values. The term "moving average" in this context originates from early 20th-century work on time series, particularly Eugen Slutsky's 1927 exploration of random sums generating cyclic patterns (translated and published in 1937) and G. Udny Yule's 1927 contributions to linear stochastic models.
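
As a concrete illustration of the defining equation, the following minimal Python sketch simulates an MA(2) process directly from i.i.d. Gaussian shocks; the mean, coefficients, and sample size are arbitrary illustrative choices rather than values tied to any data set discussed here.

    import numpy as np

    rng = np.random.default_rng(seed=42)
    n, mu, sigma = 500, 10.0, 1.0
    theta = np.array([0.6, -0.3])              # illustrative theta_1, theta_2
    q = len(theta)

    eps = rng.normal(0.0, sigma, size=n + q)   # i.i.d. shocks, padded so lagged terms exist at t = 0
    y = np.empty(n)
    for t in range(n):
        # y_t = mu + eps_t + theta_1 * eps_{t-1} + ... + theta_q * eps_{t-q}
        y[t] = mu + eps[t + q] + sum(theta[i] * eps[t + q - 1 - i] for i in range(q))

The simulated series has sample mean close to \mu and autocorrelations that vanish beyond lag 2, consistent with the properties described in later sections.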

Historical background

The concept of moving averages in time series analysis emerged in the early 20th century amid efforts to address spurious correlations and cyclical patterns in economic and astronomical data. G. Udny Yule, in his 1926 study, highlighted how linear trends could induce misleading correlations between unrelated series, proposing methods like differencing to mitigate such artifacts while noting the smoothing effects of averaging on noisy data. Concurrently, Eugen Slutsky's work demonstrated that the summation of random shocks (a form of moving-average filtering) could generate apparent cycles in otherwise random economic series, laying groundwork for understanding how averaging random inputs produces structured fluctuations without inherent periodicity.

The formal foundation of the moving-average (MA) model was established by Herman Wold in his 1938 dissertation, where he decomposed stationary processes into a deterministic component and a purely non-deterministic part representable as an infinite-order MA of innovations. This decomposition theorem provided a rigorous theoretical basis for viewing stationary processes as filtered versions of uncorrelated errors, influencing subsequent developments in time series modeling by emphasizing the MA structure as a universal representation for the stochastic component of such processes.

The model gained widespread practical adoption through George E. P. Box and Gwilym M. Jenkins' 1970 seminal text Time Series Analysis: Forecasting and Control, which integrated finite-order MA processes into the autoregressive integrated moving average (ARIMA) framework for forecasting and control, popularizing their use in empirical analysis across economics and engineering. This positioned the pure MA model as a foundational noise-driven mechanism for capturing short-term dependencies in residuals, distinct from autoregressive components. In the post-1950s era, MA models evolved significantly within signal processing and spectral analysis, where they served as filters for smoothing and frequency-domain decomposition; Tukey's 1949 innovations in computational spectrum analysis, extended by Bartlett's 1950 lag-window methods, leveraged MA representations to estimate power spectra from finite data samples. These advancements enabled efficient implementation of MA-based techniques on early computers, bridging time-domain modeling with frequency-based applications.

Model Formulation and Properties

Mathematical representation

The moving average model of order q, denoted MA(q), represents a time series \{y_t\} as a constant mean plus a finite linear combination of current and past white noise errors \{\epsilon_t\}, where \epsilon_t is assumed to be independently and identically distributed with mean zero and variance \sigma^2. The explicit form is given by y_t = \mu + \epsilon_t + \sum_{i=1}^q \theta_i \epsilon_{t-i}, where \mu is the mean of the process and the \theta_i are the model parameters. This can be compactly expressed using the backshift operator B, defined such that B \epsilon_t = \epsilon_{t-1} and B^k \epsilon_t = \epsilon_{t-k} for positive integer k. In operator notation, the model becomes y_t - \mu = \theta(B) \epsilon_t, where \theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_q B^q is the MA polynomial of degree q. The parameters \theta_i (for i = 1, \dots, q) serve as weights that quantify the influence of shocks occurring i periods in the past on the current value y_t; a larger |\theta_i| indicates a stronger effect of the shock \epsilon_{t-i}. For invertibility, the roots of the polynomial equation \theta(z) = 1 + \theta_1 z + \cdots + \theta_q z^q = 0 must lie outside the unit circle in the complex plane, ensuring a unique representation; for the simple MA(1) case, this condition simplifies to |\theta_1| < 1.

In the multivariate setting, the model extends to a d-dimensional time series \mathbf{y}_t, expressed in vector-matrix form as \mathbf{y}_t = \boldsymbol{\mu} + \Theta(B) \boldsymbol{\epsilon}_t, where \boldsymbol{\mu} is the mean vector, \boldsymbol{\epsilon}_t is a d \times 1 white noise vector with covariance matrix \Sigma, and \Theta(B) = I_d + \Theta_1 B + \cdots + \Theta_q B^q with each \Theta_i a d \times d parameter matrix. Identifiability in this case requires that the determinant of the matrix polynomial satisfies \det\{\Theta(z)\} \neq 0 for all complex z with |z| \leq 1, often supplemented by canonical forms to resolve non-uniqueness.
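
Under this sign convention, the invertibility condition can be checked numerically by locating the roots of \theta(z). The short sketch below does so for assumed MA(2) coefficients; the specific values are hypothetical.

    import numpy as np

    theta = [0.6, -0.3]                       # hypothetical theta_1, theta_2
    coeffs = np.array([1.0] + list(theta))    # 1 + theta_1 z + theta_2 z^2, ascending powers of z
    roots = np.roots(coeffs[::-1])            # np.roots expects the highest power first
    invertible = np.all(np.abs(roots) > 1.0)
    print(roots, invertible)                  # roots near -1.08 and 3.08 lie outside the unit circle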

Stationarity and invertibility

Moving-average models of finite order, denoted MA(q), are inherently stationary. This property arises because the process is a constant mean plus a finite linear combination of white noise errors, which are themselves stationary, resulting in a constant mean \mu and finite, time-invariant variance given by \sigma^2 \sum_{i=0}^q \theta_i^2 for the centered process y_t - \mu, where \theta_0 = 1 and \sigma^2 is the error variance. Unlike autoregressive models, no additional restrictions on the parameters are needed to ensure stationarity, as the dependence on past errors is limited to the most recent q lags, preventing any explosive or non-constant behavior.

A key complementary property is invertibility, which requires that all roots of the MA polynomial \theta(z) = 1 + \theta_1 z + \cdots + \theta_q z^q = 0 lie outside the unit circle in the complex plane (i.e., have absolute value greater than 1). This condition ensures that the MA(q) process can be equivalently represented as an infinite-order autoregressive process, AR(\infty), in which the current observation depends on an infinite but exponentially decaying sequence of past observations. For example, in the simple MA(1) case, invertibility holds if |\theta_1| < 1. Non-invertible models, while mathematically valid, complicate estimation and interpretation, as they imply heavier weighting on distant past errors rather than recent ones.

The autocorrelation function (ACF) of an MA(q) model reflects its finite dependence structure, with autocorrelations \rho_k = 0 for all lags k > q. For k = 1, \dots, q, the ACF is derived from the autocovariances \gamma_k = \sigma^2 \sum_{i=0}^{q-k} \theta_i \theta_{i+k}, yielding \rho_k = \frac{\sum_{i=0}^{q-k} \theta_i \theta_{i+k}}{\sum_{i=0}^q \theta_i^2}. This results in a "cut-off" pattern in the ACF plot after lag q, aiding model identification. In contrast, the partial autocorrelation function (PACF) of an MA(q) process does not cut off but instead decays gradually (exponentially or in a damped sinusoidal manner) to zero as the lag increases, indicating persistent but diminishing direct correlations after controlling for intermediate lags.
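
The cut-off behaviour of the ACF follows directly from the autocovariance formula above. A minimal sketch, using the same illustrative MA(2) coefficients as before, evaluates the theoretical autocorrelations and shows them dropping to exactly zero beyond lag q:

    import numpy as np

    def ma_acf(theta, max_lag=10):
        """Theoretical ACF of an MA(q) with coefficients theta = [theta_1, ..., theta_q]."""
        psi = np.r_[1.0, np.asarray(theta, dtype=float)]   # prepend theta_0 = 1
        q = len(psi) - 1
        gamma0 = np.sum(psi ** 2)                          # process variance divided by sigma^2
        rho = []
        for k in range(1, max_lag + 1):
            rho.append(0.0 if k > q else np.sum(psi[:q - k + 1] * psi[k:]) / gamma0)
        return np.array(rho)

    print(ma_acf([0.6, -0.3]))   # roughly [0.29, -0.21, 0, 0, ...]: nonzero only at lags 1 and 2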

Estimation and Inference

Parameter estimation methods

Parameter estimation in moving-average (MA) models typically relies on maximum likelihood estimation (MLE) under the assumption of Gaussian-distributed errors, where the parameters \theta_1, \dots, \theta_q and the innovation variance \sigma^2 are obtained by maximizing the log-likelihood function of the observed series. This approach conditions the likelihood on the unobservable past innovations, leading to a nonlinear optimization problem that requires iterative numerical methods for solution. To initiate the MLE optimization, the conditional sum-of-squares method serves as an efficient preliminary estimator, minimizing the sum of squared differences between observed values and one-step-ahead predictions assuming zero initial innovations beyond the model's order. This technique provides reliable starting values for the subsequent MLE refinement, particularly for higher-order MA models where direct computation is challenging. For efficient evaluation of the likelihood in MLE, numerical optimization is often employed alongside the Kalman filter, which recursively computes the one-step-ahead prediction errors (innovations) and their conditional variances without explicitly forming the full covariance matrix of the observations. This enhances computational feasibility for moderate to large samples by avoiding the inversion of high-dimensional matrices inherent in exact likelihood calculations.

Estimation of MA models faces challenges related to non-uniqueness arising from overparameterization, where multiple parameter sets can yield equivalent autocovariance structures unless constrained by invertibility conditions that ensure the MA polynomial has roots outside the unit circle. Additionally, handling initial values for the unobservable innovations \epsilon_t introduces approximation errors in finite samples, potentially affecting the accuracy of estimates near the boundary of the parameter space.
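
As an illustration of how such estimation is carried out in practice, the following hedged sketch uses the statsmodels library, whose ARIMA class with order=(0, 0, q) fits a pure MA(q) with a constant by Gaussian maximum likelihood through a state-space (Kalman filter) representation; the series y is an assumed observed univariate array or pandas Series, not data from this article.

    from statsmodels.tsa.arima.model import ARIMA

    model = ARIMA(y, order=(0, 0, 2))   # p = 0, d = 0, q = 2: a pure MA(2) with a constant term
    result = model.fit()                # Gaussian MLE evaluated via the Kalman filter
    print(result.params)                # estimated constant, theta_1, theta_2, and sigma^2
    print(result.summary())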

Model selection criteria

Model selection for moving-average (MA) models involves identifying the appropriate order q and validating the fitted model to ensure it adequately captures the underlying structure without unnecessary complexity. A primary method for determining the order q relies on the sample autocorrelation function (ACF) plot, where significant autocorrelations are expected up to lag q, after which the ACF values drop sharply to near zero, indicating a cutoff pattern characteristic of an MA(q) process.

Once candidate models are identified, information criteria such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are used to compare models and select the one balancing goodness-of-fit and parsimony. The AIC is defined as \text{AIC} = -2 \log L + 2k, where L is the maximized likelihood of the model and k is the number of parameters, penalizing models with more parameters to avoid overfitting while favoring those that explain the data well. The BIC applies a stronger penalty for complexity, given by \text{BIC} = -2 \log L + k \log n, with n as the sample size, making it more conservative and often selecting simpler models, particularly in larger datasets. Lower values of AIC or BIC indicate preferable models for MA processes.

After estimation, diagnostic checks assess model adequacy by examining the residuals. The Ljung-Box test evaluates whether residuals exhibit white noise properties by testing the null hypothesis of no serial correlation up to a specified lag, with a significant result suggesting inadequate model fit and the need for higher-order terms. Additionally, quantile-quantile (Q-Q) plots compare the distribution of standardized residuals against a theoretical normal distribution to check the normality assumption, where deviations in the tails may indicate issues like heavy-tailed errors requiring model adjustments. To mitigate overfitting, especially when multiple MA orders are considered, time series cross-validation techniques are employed, such as rolling-origin or expanding-window validation, which respect the temporal order by training on past data and testing on future observations, ensuring out-of-sample performance aligns with in-sample fit.
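
A brief sketch of this workflow, continuing the assumed statsmodels-based setup from the previous section (with y an observed series), compares candidate MA orders by AIC and BIC and then applies the Ljung-Box test to the residuals of the selected model:

    from statsmodels.tsa.arima.model import ARIMA
    from statsmodels.stats.diagnostic import acorr_ljungbox

    for q in range(1, 4):
        res = ARIMA(y, order=(0, 0, q)).fit()
        print(q, res.aic, res.bic)                     # lower AIC/BIC values are preferred

    best = ARIMA(y, order=(0, 0, 2)).fit()             # suppose MA(2) minimized the criteria
    lb = acorr_ljungbox(best.resid, lags=[10], model_df=2, return_df=True)
    print(lb)                                          # a small p-value signals remaining serial correlation

The model_df argument adjusts the test's degrees of freedom for the two estimated MA parameters.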

Applications and Extensions

Forecasting procedures

Forecasting in moving-average (MA) models relies on the conditional expectation of future values given past observations, leveraging the model's finite dependence on previous error terms, which are assumed to be white noise with mean zero and variance \sigma^2. The one-step-ahead forecast at time t, denoted \hat{y}_{t+1|t}, is computed as the sum of the estimated MA coefficients multiplied by the most recent estimated error terms: \hat{y}_{t+1|t} = \hat{\theta}_1 \hat{\epsilon}_t + \hat{\theta}_2 \hat{\epsilon}_{t-1} + \dots + \hat{\theta}_q \hat{\epsilon}_{t-q+1}, where \hat{\theta}_i are the estimated parameters and \hat{\epsilon}_{t-i} are the estimated residuals from the fitted model, with future errors set to zero. If the model includes a constant term \mu (the process mean), this is added to the forecast.

For multi-step-ahead forecasts (h > 1), the optimal predictors incorporate progressively fewer past errors as the forecast horizon increases, owing to the finite memory of the MA(q) process. Specifically, \hat{y}_{t+h|t} = \mu + \sum_{i=h}^{q} \hat{\theta}_i \hat{\epsilon}_{t + h - i}, with \hat{\epsilon}_{t+j} = 0 for j > 0. For h > q, the forecast reduces to the unconditional mean of the process (zero for centered data), as no further error terms influence the prediction. The forecast variance increases with h up to h = q + 1 and then stabilizes at the unconditional variance \sigma^2 (1 + \sum_{i=1}^q \theta_i^2), reflecting growing uncertainty beyond the model's memory.

Prediction intervals for MA forecasts are constructed using the forecast error variance and assuming normality of the errors. For horizon h, the h-step-ahead forecast variance is \sigma_h^2 = \sigma^2 \left(1 + \sum_{i=1}^{h-1} \theta_i^2 \right), where \theta_i = 0 for i > q. A (1 - \alpha) \times 100\% prediction interval is then \hat{y}_{t+h|t} \pm z_{\alpha/2} \sqrt{\hat{\sigma}_h^2}, with z_{\alpha/2} the critical value from the standard normal distribution (e.g., 1.96 for 95% intervals). This approach yields symmetric intervals that widen with h until stabilizing, highlighting the model's suitability for short-term predictions.

In practice, MA forecasts are updated efficiently as new observations arrive by recalculating residuals and incorporating the latest error into the moving-average terms, exploiting the finite q lags. For non-invertible cases (where roots of the MA polynomial lie inside the unit circle), forecasts can still be obtained directly via the above formulas, though parameter estimation requires special care to ensure identifiability, and interpretations should account for potential over-differencing in the underlying process.
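
The forecasting recursions above translate into a few lines of code. The sketch below is a minimal illustration, assuming mu_hat, theta_hat, sigma2_hat, and the residual array eps_hat (holding at least the last q residuals, most recent last) come from a previously fitted MA(q) model; it returns the h-step point forecast and a normal prediction interval.

    import numpy as np
    from scipy.stats import norm

    def ma_forecast(mu_hat, theta_hat, sigma2_hat, eps_hat, h, alpha=0.05):
        """h-step-ahead point forecast and (1 - alpha) prediction interval for a fitted MA(q)."""
        theta_hat = np.asarray(theta_hat, dtype=float)
        q = len(theta_hat)
        # Point forecast: future shocks are set to zero, so only residuals dated t or
        # earlier contribute; eps_hat[-1] is the most recent residual (time t).
        point = mu_hat
        for i in range(h, q + 1):
            point += theta_hat[i - 1] * eps_hat[-(i - h + 1)]
        # Forecast-error variance: sigma^2 * (1 + theta_1^2 + ... + theta_{h-1}^2),
        # with theta_i = 0 for i > q, so it stabilizes once h exceeds q + 1.
        psi = np.r_[1.0, theta_hat][:h]
        var_h = sigma2_hat * np.sum(psi ** 2)
        half_width = norm.ppf(1 - alpha / 2) * np.sqrt(var_h)
        return point, (point - half_width, point + half_width)

For h > q the loop is empty, so the point forecast collapses to mu_hat and the interval width settles at its long-run value, matching the behaviour described above.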

Relation to ARIMA models

The autoregressive integrated moving average (ARIMA) model generalizes the moving average (MA) model by incorporating autoregressive (AR) components and differencing to handle non-stationary time series. Specifically, an ARIMA(p, d, q) model combines an AR(p) process, d levels of differencing to achieve stationarity, and an MA(q) process, where the pure MA(q) model corresponds to the special case with p = 0 and d = 0. For non-stationary series, such as those exhibiting random walk behavior, the integrated moving average (IMA) model applies differencing to the MA process; the IMA(1,1), or ARIMA(0,1,1), is particularly useful for modeling a random walk with drift or noise, where the first difference follows an MA(1) process. Extensions to seasonal data incorporate seasonal lags into the framework via the seasonal ARIMA (SARIMA) model, denoted SARIMA(p,d,q)(P,D,Q)s, where the seasonal MA component of order Q at seasonal period s captures periodic patterns in addition to the non-seasonal MA(q).

In economic and financial analysis, the MA component within ARIMA models offers advantages by modeling transient shocks and mitigating the effects of over-differencing: over-differencing introduces a unit root in the MA polynomial (e.g., an MA(1) coefficient of -1), which the MA terms can parsimoniously absorb without invalidating forecasts, whereas under-differencing leaves nonstationary behavior in the series that can produce spurious results.
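
To make the correspondence concrete, a hedged sketch using the statsmodels SARIMAX class (consistent with the assumed setup of earlier examples, with y an observed series) specifies the IMA(1,1) case and a seasonal MA(1) at an assumed period of 12:

    from statsmodels.tsa.statespace.sarimax import SARIMAX

    ima = SARIMAX(y, order=(0, 1, 1)).fit(disp=False)        # ARIMA(0,1,1), i.e. the IMA(1,1) model
    sarima = SARIMAX(y, order=(0, 1, 1),
                     seasonal_order=(0, 1, 1, 12)).fit(disp=False)   # adds a seasonal MA(1) at period 12
    print(ima.params)
    print(sarima.params)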
