
Mixed-data sampling

Mixed-data sampling, commonly abbreviated as MIDAS, refers to a class of econometric models designed to incorporate data observed at different frequencies, particularly by using high-frequency predictors to explain low-frequency dependent variables without requiring temporal aggregation. These models express the conditional expectation of the low-frequency variable as a function of the higher-frequency regressors, often parameterized through flexible forms such as exponential Almon or beta lag polynomials to maintain parsimony despite potentially long lag structures. Introduced in a seminal 2004 working paper by Eric Ghysels, Pedro Santa-Clara, and Rossen Valkanov, MIDAS addresses practical challenges in empirical analysis where data availability varies, such as monthly indicators informing quarterly economic outcomes.

The core advantage of MIDAS models lies in their ability to avoid the biases and efficiency losses associated with aggregating high-frequency data to match the low-frequency horizon, enabling more timely and accurate nowcasting and forecasting. Estimation typically proceeds via nonlinear least squares (NLS), allowing recent high-frequency observations to be weighted more heavily, which captures dynamic relationships effectively. Extensions have since proliferated, including autoregressive MIDAS (AR-MIDAS) for incorporating dynamics in the dependent variable, factor-augmented versions for high-dimensional data, and threshold variants to account for regime shifts. These developments build on the original framework's roots in distributed lag models from earlier econometric literature, such as those by Sims (1971) and Geweke (1978).

In applications, MIDAS has proven particularly valuable in macroeconomic nowcasting of GDP growth and inflation using mixed-frequency indicators like industrial production or consumer prices, as well as in finance for modeling asset return volatility with intraday data. Its robustness to model misspecification and computational simplicity relative to alternatives like state-space methods or mixed-frequency VARs have made it a standard tool in central banking and policy analysis. Recent studies continue to refine MIDAS for unbalanced panels and machine learning integrations, underscoring its ongoing relevance in handling real-world data irregularities.

Introduction

Definition and Purpose

Mixed-data sampling (MIDAS) refers to a single-equation regression framework in econometrics that enables the joint modeling of time series sampled at different frequencies, allowing a low-frequency dependent variable to incorporate high-frequency regressors without requiring temporal aggregation of the higher-frequency data. This approach specifies the conditional expectation of the low-frequency variable as a function of the high-frequency regressors, preserving the underlying dynamics of the more granular data. The primary purpose of MIDAS is to mitigate the information loss inherent in traditional econometric models that aggregate high-frequency data to match the lowest common frequency, thereby avoiding biases from discretization and enabling more accurate predictions.

By directly utilizing high-frequency information, MIDAS addresses key challenges in empirical analysis where relevant predictors evolve more rapidly than the outcome variable, such as combining monthly inflation data with quarterly GDP measures. Its advantages include greater parsimony through fewer parameters compared to unrestricted models, enhanced flexibility in handling diverse data structures, and reduced specification errors relative to multi-equation systems that impose additional constraints on variable interactions. A core motivation for MIDAS arises in real-world applications involving mixed-frequency economic and financial data, such as linking quarterly GDP growth to daily financial indicators or monthly unemployment rates to weekly survey data. For instance, it facilitates modeling annual outcomes while accounting for intra-year volatility, capturing short-term fluctuations that influence longer-term outcomes without diluting their impact through aggregation.

Historical Background

The concept of mixed-data sampling (MIDAS) regression models originated in the early 2000s as a practical approach to handling data observed at different frequencies, addressing limitations of traditional methods like Kalman filtering that required state-space representations. Eric Ghysels, along with co-authors Pedro Santa-Clara and Rossen Valkanov, first introduced MIDAS in a 2004 working paper, proposing it as a flexible regression framework that avoids the computational complexity of Kalman filters by directly incorporating higher-frequency predictors into lower-frequency models through lag polynomials. This innovation built on earlier distributed lag models but specifically targeted the challenges of frequency misalignment in econometric forecasting.

A pivotal milestone came with the 2007 publication of "MIDAS Regressions: Further Results and New Directions" in Econometric Reviews, where Ghysels, Sinko, and Valkanov expanded the theoretical foundations, asymptotic properties, and empirical applications of MIDAS, establishing it as a viable alternative for mixed-frequency analysis. Subsequent advancements included Andreou, Ghysels, and Kourtellos's 2010 paper in the Journal of Econometrics, which derived asymptotic properties for least squares estimators in MIDAS regressions and demonstrated their use in macroeconomic forecasting. Their 2013 work further applied MIDAS to incorporate daily financial data for quarterly GDP predictions, showing improved forecast accuracy over standard autoregressive models. More recently, Babii, Ghysels, and Striaukas's 2022 paper integrated machine learning techniques into MIDAS frameworks for high-dimensional data, enabling scalable nowcasting with mixed frequencies.

MIDAS evolved from basic formulations using Almon lags (influenced by Shirley Almon's 1965 distributed lag technique, which parameterized lag weights as polynomials to reduce the number of free parameters) to more sophisticated extensions. Post-2010 developments introduced threshold MIDAS models, allowing regime-switching based on covariates to capture nonlinear dynamics in mixed-frequency data. By the 2020s, high-dimensional variants emerged, incorporating sparse regularization and factor structures to handle large datasets, thus broadening MIDAS's applicability in big data settings while preserving its core innovation in frequency mixing.

Fundamental Concepts

Mixed-Frequency Data Challenges

Mixed-frequency data in econometrics arises when variables are observed at different sampling intervals, such as quarterly gross domestic product (GDP) alongside daily interest rates, or monthly industrial production indices paired with annual fiscal data. This temporal misalignment creates significant challenges, as standard econometric models assume synchronized observations, leading to difficulties in aligning high-frequency indicators with low-frequency aggregates. For instance, in macroeconomic forecasting, daily financial data must be reconciled with quarterly national accounts, often resulting in the loss of timely information from faster-sampled series.

A primary issue is temporal aggregation, where high-frequency data is typically averaged or summarized to match the lowest common frequency, diluting short-term signals and introducing distortions in the underlying relationships. This process can complicate inference in models like vector autoregressions (VARs). In panel settings with varying observation intervals across units (such as firm-level monthly sales versus industry-level quarterly aggregates), estimation becomes more complex still.

The consequences of these challenges are pronounced in estimation and forecasting: mismatched frequencies lead to biased estimates, specification errors in multi-equation systems, and reduced predictive accuracy compared to synchronized data setups. For example, temporal aggregation in New Keynesian models estimated at quarterly frequencies can upwardly bias measures of price stickiness, overstating the duration of rigidities by several months. Model complexity also escalates, as incorporating mixed frequencies requires handling unbalanced panels or state-space representations, increasing computational demands and the risk of overfitting in high-dimensional settings. Approaches like mixed-data sampling regressions have been developed to mitigate these issues by directly utilizing disaggregated data without excessive temporal smoothing.

Lag Polynomial Approach

The lag polynomial approach in mixed-data sampling (MIDAS) regressions addresses the challenge of incorporating high-frequency variables into low-frequency models by parameterizing distributed lags in a parsimonious manner. Traditional distributed lag models for time series sampled at a single frequency require estimating a coefficient for each lag, leading to an explosion in parameters at higher frequencies: for instance, a distributed lag over 12 low-frequency periods with monthly regressors and quarterly outcomes (where m = 3, the number of high-frequency observations per low-frequency period) could demand up to 36 coefficients without structure. In MIDAS, the lag polynomial imposes a finite-dimensional structure on these lags, reducing the number of free parameters while preserving the dynamic information in the high-frequency data. This parameterization assumes familiarity with standard distributed lag models, such as those in autoregressive distributed lag (ADL) frameworks, but extends them to handle frequency mismatches by adapting the lag operator to the ratio between sampling frequencies, denoted m (e.g., m = 3 for monthly to quarterly data).

The general form of the lag polynomial is B(L; \theta) = \sum_{k=0}^{K-1} w_k(\theta) L^k, where L is the lag operator, K is the finite lag length, and w_k(\theta) are weights that depend on a small set of hyperparameters \theta. These weights are designed to ensure properties like positivity and gradual decay, reflecting the economic intuition that recent high-frequency observations should carry more weight than distant ones. The polynomial effectively collapses multiple high-frequency lags into a weighted sum that serves as a single input to the low-frequency model, allowing the dynamics of rapidly sampled variables, such as daily financial indicators, to influence slower ones like quarterly GDP without overwhelming the estimation process. This structure maintains the interpretability of the model while avoiding the curse of dimensionality inherent in unrestricted high-frequency lags.

By adapting the lag operator to fractional powers, such as L^{1/m}, the approach explicitly accounts for the mixed-frequency nature of the data, enabling the polynomial to align high-frequency observations with low-frequency periods. For example, in a setup with quarterly dependent variables and monthly regressors (m = 3), the polynomial aggregates three monthly lags per quarter into a cohesive low-frequency predictor. This method has become foundational in econometric applications involving temporal aggregation, as it balances flexibility with feasibility, assuming the underlying processes are stationary and the regressors weakly exogenous.
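To make the collapsing step concrete, the following R sketch (simulated data, hypothetical parameter values) builds normalized exponential Almon weights and aggregates twelve monthly lags into a single quarterly regressor, the role played by B(L^{1/m}; \theta) above:

```r
# Illustrative sketch: collapse K = 12 monthly lags into one quarterly
# regressor using normalized exponential Almon weights.
set.seed(1)
x <- rnorm(240)                        # 20 years of monthly observations
m <- 3                                 # high-frequency obs per quarter
K <- 12                                # number of monthly lags

exp_almon <- function(theta, K) {
  k <- 0:(K - 1)
  w <- exp(theta[1] * k + theta[2] * k^2)
  w / sum(w)                           # normalize weights to sum to one
}
w <- exp_almon(c(-0.1, -0.01), K)      # monotonically declining weights

# For each quarter, the weighted sum over the K most recent monthly
# values acts as the single aggregated input to the quarterly model.
quarter_ends <- seq(K, length(x), by = m)
x_agg <- sapply(quarter_ends, function(i) sum(w * x[i - 0:(K - 1)]))
```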

MIDAS Regression Models

Basic Formulation

The basic formulation of the MIDAS regression model addresses the challenge of incorporating high-frequency data into low-frequency regressions by using a parsimonious lag structure. The model posits a linear relationship between a low-frequency dependent variable and one or more high-frequency explanatory variables, where the latter are aggregated via a weighted lag polynomial to avoid parameter proliferation. The core equation for the regression with multiple high-frequency regressors is y_t = \beta_0 + \sum_{i=1}^N \beta_i B_i(L^{1/m}; \theta_i) x_t^{(i)} + \varepsilon_t, where y_t denotes the dependent variable observed at the low (reference) frequency, such as quarterly GDP; x_t^{(i)} represents the i-th explanatory variable observed at a higher frequency, with m indicating the frequency ratio (for example, m = 3 when aligning monthly data with quarterly observations); \beta_0 is the intercept; \beta_i are scalar coefficients that scale each lag polynomial; B_i(L^{1/m}; \theta_i) is the lag polynomial of order j_{\max} that weights and sums the high-frequency lags, parameterized by \theta_i to impose structure on the weights; L^{1/m} is the fractional lag operator such that L^{k/m} x_t^{(i)} = x_{t - k/m}^{(i)}; and \varepsilon_t is the error term, assumed to be independent and identically distributed (i.i.d.) with zero mean and constant variance.

This formulation derives from the unrestricted distributed lag regression, in which y_t would be regressed directly on all m \times j_{\max} high-frequency lags of each x_t^{(i)}, leading to a large number of parameters that overparameterize the model for typical sample sizes. The MIDAS approach achieves parsimony by restricting the lag coefficients through the functional form of B_i(\cdot; \theta_i), which typically involves far fewer parameters (often one or two per polynomial) while preserving the ability to capture temporal dynamics across frequencies. Key assumptions underlying the model include linearity in the parameters, ensuring that the conditional expectation of y_t is a linear function of the transformed high-frequency regressors; exogeneity across frequencies, meaning the high-frequency variables are not correlated with the low-frequency error term; and stationarity of all involved series, implying covariance stationarity with finite second moments to validate the lag structure and inference.
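As a minimal illustration of this data-generating process, the R sketch below (with assumed values for \beta_0, \beta_1, and the weights) simulates a single-regressor version of the equation above, aligning each quarterly y_t with the twelve most recent monthly observations of x:

```r
# Simulate y_t = beta0 + beta1 * B(L^{1/3}) x_t + eps_t with m = 3, K = 12.
set.seed(2)
m <- 3; K <- 12; Tq <- 80              # 80 quarterly observations
x <- rnorm(Tq * m + K)                 # monthly regressor with burn-in
k <- 0:(K - 1)
w <- exp(-0.2 * k); w <- w / sum(w)    # assumed declining weight profile
beta0 <- 0.5; beta1 <- 2               # assumed true coefficients

idx <- K + (1:Tq) * m                  # last monthly obs of each quarter
X_wsum <- sapply(idx, function(i) sum(w * x[i - k]))
y <- beta0 + beta1 * X_wsum + rnorm(Tq, sd = 0.3)
```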

Weighting Schemes

In MIDAS regression models, weighting schemes parameterize the lag polynomial to aggregate high-frequency data into a low-frequency equivalent, ensuring parsimony while capturing temporal dynamics. These schemes impose structure on the weights w_k, which are typically normalized such that \sum_{k=0}^M w_k = 1 and constrained to be non-negative to maintain interpretability.

The Almon lag scheme employs a polynomial approximation to model the weights, providing a smooth, low-order representation suitable for gradual decay patterns. It is defined as w_k = \sum_{j=0}^J \alpha_j k^j, where J is typically small (e.g., 1 or 2) to limit parameters, and the coefficients \alpha_j are estimated. This approach, adapted from classical distributed lag models, ensures continuity and differentiability, making it effective for applications requiring monotonic weight decline.

A more flexible alternative is the beta lag scheme, which draws on the beta density function to allow diverse shapes, including hump-shaped profiles that emphasize recent data. The weights are given by w_k(\theta_1, \theta_2) = \frac{(k/M)^{\theta_1 - 1} (1 - k/M)^{\theta_2 - 1}}{\sum_{j=0}^M (j/M)^{\theta_1 - 1} (1 - j/M)^{\theta_2 - 1}}, where M is the total number of lags and the parameters \theta_1 > 0, \theta_2 > 0 control the shape: for example, \theta_1 = 1, \theta_2 > 1 yields monotonically declining weights, while \theta_1 > 1, \theta_2 > 1 produces an interior hump. This scheme inherently satisfies positivity and normalization, facilitating estimation via nonlinear least squares.

Other schemes include the exponential Almon, which extends the polynomial form with exponential terms for faster tail decay: w_k(\theta_1, \theta_2) = \frac{\exp(\theta_1 k + \theta_2 k^2)}{\sum_{j=0}^M \exp(\theta_1 j + \theta_2 j^2)}, allowing declining or hump-shaped patterns depending on \theta_2. These variants are chosen when economic intuition suggests rapid discounting of older observations.

Selection of weighting schemes depends on economic theory, such as in volatility forecasting where recent observations dominate, and on empirical performance metrics like information criteria. Desirable properties include positivity to avoid negative weights, normalization for identification, and parsimony to prevent overfitting; the beta and exponential Almon schemes excel in these regards across macroeconomic and financial applications.
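The beta and exponential Almon formulas above are easy to inspect directly; the following R sketch (shape parameters chosen purely for illustration) computes both weight profiles and plots them against the lag index:

```r
# Beta lag weights w_k(theta1, theta2), normalized over k = 0, ..., M.
beta_weights <- function(theta1, theta2, M) {
  u <- (0:M) / M
  u <- pmin(pmax(u, 1e-8), 1 - 1e-8)   # guard the endpoints 0 and 1
  w <- u^(theta1 - 1) * (1 - u)^(theta2 - 1)
  w / sum(w)
}
# Exponential Almon weights for comparison.
expalmon_weights <- function(theta1, theta2, M) {
  k <- 0:M
  w <- exp(theta1 * k + theta2 * k^2)
  w / sum(w)
}
M <- 11
w_decay <- beta_weights(1, 5, M)       # theta1 = 1, theta2 > 1: decay
w_hump  <- beta_weights(2, 5, M)       # theta1, theta2 > 1: interior hump
w_exp   <- expalmon_weights(-0.2, -0.01, M)
matplot(0:M, cbind(w_decay, w_hump, w_exp), type = "b", pch = 1:3,
        xlab = "lag k", ylab = "weight w_k")
```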

Extensions

Unrestricted and Restricted MIDAS

In restricted MIDAS models, the lag structure of high-frequency regressors is imposed through parameterized weighting functions, which significantly reduce the number of parameters to be estimated. For instance, the original MIDAS specification parameterizes the distributed lag polynomial using schemes like the exponential Almon lag, transforming a potentially large set of individual lag weights, on the order of O(mK) where m is the frequency ratio and K is the lag length, into a small number of hyperparameters, typically O(p) with p \ll mK. This approach, introduced by Ghysels, Santa-Clara, and Valkanov, enhances parsimony and interpretability while preventing the curse of dimensionality in mixed-frequency settings.

In contrast, unrestricted MIDAS models, or U-MIDAS, dispense with such functional restrictions on the lag polynomials, allowing each high-frequency lag weight to be estimated individually without assuming a specific form. These models are derived from linear projections of high-frequency variables and are typically estimated by ordinary least squares (OLS), making them straightforward to implement and useful for empirically testing the restrictions imposed in restricted variants. Foroni, Marcellino, and Schumacher developed this extension to provide greater flexibility, particularly when the frequency mismatch is modest, such as between monthly and quarterly data.

The primary trade-offs between these approaches revolve around model flexibility and estimation risk. Restricted MIDAS prioritizes parsimony, which helps avoid overfitting and facilitates economic interpretation of the lag decay patterns, but it may fail to capture irregular or non-smooth weight profiles in the data. Unrestricted MIDAS excels at accommodating complex dynamics and irregular patterns in high-frequency indicators, yet it introduces a higher risk of overfitting, especially with large frequency ratios like daily-to-quarterly data, due to the proliferation of parameters. Simulations and empirical comparisons indicate that U-MIDAS often performs competitively in-sample and at shorter forecast horizons, while restricted models retain advantages in out-of-sample forecasting with larger frequency mismatches.

To assess the validity of restrictions in restricted MIDAS, researchers employ statistical tests such as Wald tests or likelihood ratio tests, comparing the unrestricted model against the parameterized version to evaluate whether the imposed structure significantly worsens the fit. These tests help determine whether the parsimony gains justify the loss of flexibility, with rejection often signaling the need for unrestricted alternatives in specific applications.
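A minimal U-MIDAS sketch in R (simulated placeholder data) shows the mechanics: every high-frequency lag enters as its own OLS regressor, so the parameter count grows with K rather than staying fixed at the two or three hyperparameters of a restricted scheme:

```r
# U-MIDAS sketch: regress quarterly y on K unrestricted monthly lags.
set.seed(3)
m <- 3; K <- 12; Tq <- 80
x <- rnorm(Tq * m + K)                 # monthly regressor
y <- rnorm(Tq)                         # placeholder quarterly target

idx <- K + (1:Tq) * m                  # align lags to quarter ends
Xlags <- t(sapply(idx, function(i) x[i - 0:(K - 1)]))
colnames(Xlags) <- paste0("lag", 0:(K - 1))

umidas <- lm(y ~ Xlags)                # one free coefficient per lag
length(coef(umidas))                   # 13 parameters vs. 2-3 restricted
```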

Machine Learning Enhanced MIDAS

Machine learning enhancements to MIDAS regressions address the limitations of traditional models in handling high-dimensional mixed-frequency data, particularly when incorporating thousands of high-frequency predictors. A key approach combines the sparse-group LASSO (sg-LASSO) estimator with orthogonal polynomial approximations of the lag weights for variable selection in high-dimensional frameworks. This method structures the regularization to penalize entire groups of coefficients associated with each predictor, promoting sparsity at both the variable and lag levels while leveraging the natural grouping in time series data.

The formulation extends the standard MIDAS regression to accommodate high dimensionality as follows: y_t = \sum_{i=1}^p \phi(L^{1/m}; \beta_i, \theta) x_{t,i}^{(m)} + \varepsilon_t, where y_t is the low-frequency target variable, x_{t,i}^{(m)} is the i-th high-frequency predictor observed at frequency m, L^{1/m} is the lag operator adjusted for mixed frequencies, \phi(\cdot; \beta_i, \theta) is the lag weighting function approximated using orthogonal polynomials (typically of degree up to 3 or higher) with coefficient vector \beta_i for each predictor i, and the \beta_i are penalized via sg-LASSO to enforce group sparsity. The sg-LASSO penalty term is \lambda \left( \sum_{i=1}^p \|\beta_i\|_2 + \sum_{i=1}^p \sum_{k=1}^Q |\beta_{ik}| \right), which balances group and individual selection, allowing the model to select relevant predictors and their lag structures efficiently. This setup admits oracle inequalities under mixing conditions, ensuring consistent estimation even with heavy-tailed errors common in financial and macroeconomic data.

These enhancements make it feasible to work in environments with thousands of high-frequency variables, such as daily financial indicators for quarterly GDP nowcasting, where traditional MIDAS would suffer from overparameterization. In panel settings, sg-LASSO-MIDAS improves predictive accuracy by incorporating cross-sectional heterogeneity and text-based features, outperforming unstructured LASSO by exploiting group structures. For instance, applications to GDP nowcasting demonstrate reduced mean squared forecast errors compared to benchmark models, particularly in data-rich scenarios.

Recent developments after 2020 have integrated neural network embeddings to capture nonlinearities in MIDAS weights, extending beyond polynomial approximations. The DL-MIDAS model employs deep learning architectures, such as recurrent neural networks, to learn flexible, data-driven transformations of high-frequency inputs, enabling the model to capture complex nonlinear patterns in mixed-frequency data and yielding more stable predictions than linear MIDAS variants. In volatility forecasting, hybrid approaches combining MIDAS with convolutional neural networks and long short-term memory (LSTM) units preprocess mixed-frequency inputs for enhanced predictions, achieving superior out-of-sample performance in capturing regime shifts and asymmetries. These neural-enhanced MIDAS models have been applied to financial time series, improving long-horizon forecasts in volatile markets. As of 2025, further extensions include kernel methods within MIDAS frameworks for nonlinear high-dimensional forecasting and fully nonparametric MIDAS (FNP-MIDAS) approaches that avoid parametric lag assumptions for greater flexibility in lag estimation.
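The orthogonal-polynomial device that makes the problem linear is straightforward to reproduce. The R sketch below (a degree-3 Legendre basis, an assumption consistent with the description above) constructs the basis with the standard Bonnet recursion, so that each predictor's K lags collapse into four linear regressors whose coefficients form one penalized group:

```r
# Legendre basis on [0, 1] for approximating MIDAS lag weights.
legendre_basis <- function(K, degree) {
  u <- seq(0, 1, length.out = K)       # lag positions rescaled to [0, 1]
  x <- 2 * u - 1                       # map to [-1, 1]
  P <- cbind(rep(1, K), x)             # P_0 and P_1
  for (n in 2:degree) {                # Bonnet recursion for P_n
    P <- cbind(P, ((2 * n - 1) * x * P[, n] - (n - 1) * P[, n - 1]) / n)
  }
  P
}
K <- 12
W <- legendre_basis(K, 3)              # K x 4 basis matrix
# If Xlags holds one predictor's K high-frequency lags (T x K), then
# Xlags %*% W yields 4 regressors; their coefficients are the group
# penalized jointly by the sg-LASSO penalty described above.
```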

Estimation and Diagnostics

Parameter Estimation Methods

Parameter estimation in mixed-data sampling (MIDAS) regression models primarily relies on classical econometric techniques adapted to the nonlinear structure arising from frequency aggregation and weighting functions. The nonlinear least squares (NLS) estimator is the most straightforward and widely adopted approach, maximizing the objective function \hat{M}_T(\gamma) = -T^{-1} \sum_{t=1}^T \varepsilon_t(\gamma)^2 (equivalently, minimizing the sum of squared residuals), where \varepsilon_t(\gamma) = y_t - \beta_0 - f\left( \sum_{i=1}^K \sum_{j=1}^L B_{ij}(L^{1/m_i}) g(x_t^{(m_i)}) \right) and \gamma collects the parameters of interest, including the slope coefficients \beta and the weighting parameters \theta. The method is iterative because the lag weights depend nonlinearly on \theta, requiring numerical optimization algorithms such as Gauss-Newton or BFGS to converge. Under standard regularity conditions, the NLS estimator is consistent and asymptotically normal as the low-frequency sample size T \to \infty, with fixed sampling frequencies m_i.

Maximum likelihood (ML) estimation extends NLS by incorporating assumptions about the error distribution, typically Gaussian, to maximize the average log-likelihood \hat{M}_T(\gamma) = T^{-1} \sum_{t=1}^T l(\varepsilon_t | \gamma), where l(\cdot) is the log-density. This approach enables full likelihood-based inference, including likelihood ratio tests, and is particularly useful for handling potential heteroskedasticity through extensions like quasi-ML or by specifying a full covariance structure for the errors. Like NLS, ML estimators are consistent and asymptotically normal under suitable conditions on the error process and model specification. In practice, ML is implemented via similar iterative procedures and is often preferred when the errors exhibit non-normal features that NLS ignores.

A two-step semiparametric estimation offers an alternative when the weighting function is treated nonparametrically: the weights are first estimated with kernel methods such as Nadaraya-Watson, and ordinary least squares is then applied to the resulting partial linear model. This approach reduces the parametric assumptions on the weights while maintaining computational tractability, with the first step providing consistent estimates of the nonlinear component and the second step focusing on the linear parameters. It is particularly useful in exploratory analyses or when the form of the weighting scheme is uncertain.

Estimation in MIDAS models faces challenges due to the high dimensionality of potential lags and the nonlinearity in parameters, which can lead to sensitivity to initial values and convergence problems in optimization. To address initial value sensitivity, grid search methods are commonly employed to evaluate multiple starting points and select the one yielding the highest likelihood or lowest sum of squared residuals before proceeding with iterative optimization. For inference, standard errors of the parameter estimates can be obtained from the inverse Hessian at the converged values, providing asymptotic variance-covariance matrices under regularity conditions. When asymptotic approximations are unreliable due to small samples or model misspecification, bootstrap methods, such as residual or paired bootstraps, are used to compute empirical standard errors by resampling the data and re-estimating the model multiple times.
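For a concrete instance of the NLS step, the R sketch below (simulated data, exponential Almon weights, assumed true parameter values) minimizes the sum of squared residuals over (\beta_0, \beta_1, \theta_1, \theta_2) with optim() and BFGS:

```r
# NLS for a single-regressor MIDAS model with exponential Almon weights.
set.seed(4)
m <- 3; K <- 12; Tq <- 120
k <- 0:(K - 1)
x <- rnorm(Tq * m + K)
w_true <- exp(-0.3 * k); w_true <- w_true / sum(w_true)
idx <- K + (1:Tq) * m
Xw <- function(w) sapply(idx, function(i) sum(w * x[i - k]))
y <- 1 + 2 * Xw(w_true) + rnorm(Tq, sd = 0.2)

ssr <- function(par) {                 # par = (beta0, beta1, theta1, theta2)
  w <- exp(par[3] * k + par[4] * k^2)
  w <- w / sum(w)                      # normalized weights given theta
  sum((y - par[1] - par[2] * Xw(w))^2)
}
fit <- optim(c(0, 1, 0, 0), ssr, method = "BFGS")
round(fit$par, 3)                      # should be near (1, 2, -0.3, 0)
```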

Model Selection Criteria

In mixed-data sampling (MIDAS) regression models, information criteria play a central role in selecting the lag order K and the degree of the lag polynomial, balancing model fit and complexity to avoid overfitting in high-dimensional settings. The Akaike information criterion (AIC) is commonly applied, defined as \text{AIC} = -2 \log L + 2k, where L is the maximized likelihood and k is the number of estimated parameters; this formulation favors parsimonious specifications while rewarding improvements in explanatory power. The Bayesian information criterion (BIC), which substitutes k \log n for 2k (with n denoting the sample size), imposes a harsher penalty on additional parameters, making it preferable in larger samples where true model sparsity is assumed. These criteria are implemented in software tools for automated selection, such as generating tables of AIC and BIC values across varying K and polynomial degrees to identify the optimal configuration.

Out-of-sample forecasting evaluation provides a robust check on model performance, particularly for nowcasting applications where predictive accuracy is paramount. Metrics such as the mean squared forecast error (MSFE) quantify errors over hold-out periods, with competing specifications compared using the Diebold-Mariano test to assess whether differences in accuracy are statistically significant. This test evaluates the null hypothesis of equal predictive ability across models, accounting for potential correlation in forecast errors, and has been widely applied to validate MIDAS specifications against benchmarks like aggregated-data regressions. Time-series cross-validation variants, including rolling window schemes, further refine selection by iteratively training on expanding or fixed-size windows and testing on subsequent observations, thereby mitigating the lookahead bias inherent in non-stationary data.

Diagnostic tests ensure the adequacy of the selected specification by examining residuals for violations of assumptions. Tests for autocorrelation, such as the Ljung-Box Q-statistic, detect serial dependence in standardized residuals, which could indicate omitted dynamics or inadequate lag structures. Heteroskedasticity is assessed via the ARCH-LM test, which checks for volatility clustering by regressing squared residuals on their lags and testing the significance of the coefficients under the null of no ARCH effects. For models with restricted weighting schemes, specification tests like the heteroskedasticity- and autocorrelation-robust (hAhr) test evaluate the validity of constraints imposed on the lag polynomials, such as monotonicity or humped shapes, by comparing restricted and unrestricted variants. These diagnostics guide refinements, ensuring the chosen model aligns with empirical patterns in mixed-frequency data.
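The following R sketch illustrates an AIC/BIC lag-length table of the kind described above, using an unrestricted OLS fit per candidate K so that the built-in AIC() and BIC() functions apply directly (simulated placeholder data):

```r
# Build an information-criterion table over candidate lag lengths K.
set.seed(5)
m <- 3; Tq <- 120; Kmax <- 24
x <- rnorm(Tq * m + Kmax)              # monthly regressor
y <- rnorm(Tq)                         # placeholder low-frequency target
idx <- Kmax + (1:Tq) * m

ic_table <- t(sapply(seq(3, Kmax, by = 3), function(K) {
  Xlags <- t(sapply(idx, function(i) x[i - 0:(K - 1)]))
  f <- lm(y ~ Xlags)
  c(K = K, AIC = AIC(f), BIC = BIC(f))
}))
ic_table[which.min(ic_table[, "BIC"]), ]  # BIC-preferred lag length
```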

Applications

Macroeconomic Nowcasting

Mixed-data sampling (MIDAS) regression models are widely applied in macroeconomic nowcasting to forecast low-frequency aggregates, such as quarterly GDP growth or inflation rates, by integrating higher-frequency indicators like monthly Purchasing Managers' Indexes (PMIs), weekly retail sales, or daily financial metrics. This approach addresses temporal misalignment in data releases, enabling predictions of current-quarter outcomes using the most recent high-frequency information available. For instance, monthly industrial production or employment data can inform quarterly GDP estimates, while weekly surveys provide timely signals for inflation dynamics.

A notable case study involves nowcasting Eurozone Harmonized Index of Consumer Prices (HICP) inflation, where MIDAS models incorporate daily indicators such as oil prices and interest rate spreads to predict monthly inflation rates from June 2010 to June 2022. In a comparative analysis, MIDAS achieved a mean absolute error (MAE) of 0.23 percentage points and an R² of 0.77 over a 24-month evaluation period (June 2019 to June 2021), outperforming a simple AR(1) benchmark but slightly underperforming the AI-based Lag-Llama model, which yielded an MAE of 0.21 and an R² of 0.84. During the COVID-19 pandemic, MIDAS regressions were used to nowcast economic activity in Latin American and Caribbean (LAC) economies, leveraging daily Google Community Mobility Report data to predict industrial production growth rates, a key proxy for GDP, as mobility patterns reflected lockdown impacts and recovery phases; the model captured the sharp contractions of early 2020 more effectively than static benchmarks.

The primary benefits of MIDAS in nowcasting stem from its ability to provide frequent updates as new high-frequency data arrive, allowing forecast revisions without full re-estimation of low-frequency models. This timeliness is particularly valuable in bridging publication lags between indicators and targets. Empirical evidence indicates that MIDAS often outperforms autoregressive integrated moving average (ARIMA) models, especially during volatile periods like economic crises, with bridge-style variants reducing root mean squared error (RMSE) relative to univariate benchmarks in GDP forecasting exercises. Seminal work by Ghysels, Sinko, and Valkanov (2007) laid the foundation for these applications, demonstrating the framework's efficacy in handling mixed frequencies for predictive accuracy in macroeconomic settings.

Financial Time Series Analysis

Mixed-data sampling (MIDAS) models have been widely applied in financial time series analysis to forecast volatility using mixed-frequency data, particularly by incorporating high-frequency daily returns to predict realized volatility at monthly or quarterly horizons. In this framework, daily squared returns or realized measures serve as predictors for lower-frequency volatility targets, allowing intraday information to be aggregated without temporal disaggregation. The approach is particularly useful in equity markets, where high-frequency data capture short-term fluctuations that inform longer-term risk assessments. For instance, Ghysels, Santa-Clara, and Valkanov (2004) demonstrate that MIDAS regressions using daily returns significantly improve volatility forecasts compared to traditional low-frequency models, with applications extending to option pricing, where intraday data enhance the precision of implied volatility estimates.

Key examples illustrate the efficacy of MIDAS in stock return modeling. Andreou (2016) examines the integration of high-frequency volatility measures into MIDAS regressions for predicting stock returns, showing that specifications with intraday predictors outperform standard autoregressive models by better accounting for temporal dependencies in financial data. Additionally, the GARCH-MIDAS model, introduced by Engle, Ghysels, and Sohn (2013), decomposes conditional volatility into short-term (high-frequency) and long-term (low-frequency) components, enabling the separation of daily noise from persistent macroeconomic influences on stock market volatility. This hybrid approach has been applied to U.S. indices, revealing that low-frequency macroeconomic variables explain a substantial portion of long-run volatility persistence. Recent extensions include hybrid models combining MIDAS with deep learning techniques, such as convolutional neural networks, to forecast stock volatility more accurately under mixed-frequency data.

MIDAS models offer distinct advantages in high-frequency finance by capturing leverage effects, where negative returns amplify future volatility, and jumps associated with sudden market events, thereby enhancing risk management practices such as Value-at-Risk calculations. Extensions like the GARCH-MIDAS-X variant incorporate signed high-frequency returns to explicitly model leverage, improving forecasts during asymmetric market conditions. Empirical evidence supports these benefits: MIDAS specifications consistently outperform Heterogeneous Autoregressive (HAR) models in equity and foreign exchange (FX) volatility forecasting, with superior accuracy in out-of-sample tests across major indices and currency pairs. Notably, during the 2008 financial crisis, MIDAS-based forecasts demonstrated greater robustness under market stress, yielding lower forecast errors than HAR benchmarks and aiding crisis risk assessment.

Implementation

Software Packages

Several software packages implement Mixed Data Sampling (MIDAS) regression models across programming languages and environments. In R, the midasr package provides tools for estimating, selecting, and forecasting with MIDAS regressions on mixed-frequency data, including support for unrestricted and restricted models with built-in weighting schemes such as normalized beta and exponential Almon polynomials, estimation routines via nonlinear least squares, and plotting functions. The package, version 0.9 released on April 7, 2025, is open-source and available on CRAN. Complementing this, the midasml package extends MIDAS to high-dimensional settings with machine learning enhancements, implementing sparse-group LASSO (sg-LASSO) regularization for estimation and prediction in time series and panel data, with functions for data manipulation, orthogonal polynomial bases, and proximal block coordinate optimization; it is also open-source, with version 0.1.11 available on CRAN and GitHub as of October 2025.

For MATLAB, the MIDAS Toolbox, originally developed by Eric Ghysels, supports ADL-MIDAS, GARCH-MIDAS, and DCC-MIDAS regressions with flexible lag structures, Legendre polynomial weighting, and out-of-sample forecasting capabilities; version 2.4.0.0, updated in March 2021, is freely downloadable from MATLAB File Exchange. EViews has included a built-in MIDAS estimation feature since version 9.5, allowing estimation of models with a low-frequency dependent variable and high-frequency regressors using weighting schemes like Almon/PDL, step, and beta functions, along with forecast averaging and integration with external data sources; it is part of the commercial software.

In Python, MIDAS implementations are available through third-party open-source packages such as midaspy on GitHub, which provides lagged-matrix generation, ordinary least squares estimation for MIDAS regressions, and statistical summaries, or midas_pro for univariate and multivariate MIDAS; custom implementations can also leverage libraries like statsmodels for core regression tasks, though statsmodels itself contains no dedicated MIDAS module. Stata offers the user-contributed midasreg command for MIDAS estimation, primarily for restricted models with polynomial weights, though availability may be limited to private distribution within the commercial Stata environment.

Practical Examples

In R, the midasr package facilitates fitting MIDAS models to mixed-frequency data, such as annual U.S. real GDP growth regressed on monthly changes in the unemployment rate. To implement this, load the built-in datasets USrealgdp and USunempr, compute the GDP growth rate and monthly unemployment changes, and construct a MIDAS lag structure with a normalized beta weighting function, which imposes smoothness on the lag weights and is estimated by nonlinear least squares (NLS). The following code snippet demonstrates the process:
```r
library(midasr)
data("USrealgdp")
data("USunempr")
y <- diff(log(USrealgdp)) * 100            # annual GDP growth, in percent
x <- window(diff(USunempr), start = 1949)  # monthly unemployment changes
trend <- 1:length(y)                       # low-frequency linear trend
# 12 monthly lags (one year), m = 12, normalized beta lag weights (nbeta)
midas_model <- midas_r(y ~ trend + mls(x, 0:11, 12, nbeta),
                       start = list(x = c(1.7, 1, 5)))  # NLS estimation
summary(midas_model)
plot_midas_coef(midas_model)               # visualize estimated lag weights
```
This approach estimates the parameters of the beta lag polynomial, whose weights decline gradually over the lag horizon, improving forecasts of GDP growth by incorporating recent monthly unemployment dynamics.

In MATLAB, Eric Ghysels' MIDAS Toolbox supports nowcasting by regressing low-frequency measures, such as monthly realized volatility, on daily returns using ADL-MIDAS specifications. For instance, load daily returns, compute monthly realized volatility, and apply a beta lag polynomial to weight up to 66 daily lags (about two months), then estimate via OLS after constructing the MIDAS regressor. Post-estimation, plot the weights to assess their decay pattern, which typically places higher weight on recent days for better short-term predictions. The toolbox includes functions like midas_reg for estimation and plot_midas_weights for visualization, as demonstrated in applications of daily financial data to volatility nowcasting.

EViews provides built-in diagnostics for MIDAS models, aiding practical model refinement. After estimating a MIDAS regression, such as quarterly GDP growth on monthly indicators using Almon or beta weights, examine the Akaike information criterion (AIC) in the output table to select among lag structures or weight functions; lower AIC values indicate superior in-sample fit balancing complexity and explanatory power. For residual analysis, access the equation object's "Residual Graph" view to plot actual versus fitted values and residuals over time, checking for patterns like autocorrelation or heteroskedasticity that may suggest misspecification. These steps support robust interpretation in empirical applications.

A key practical consideration in implementation is preparing high-frequency data for alignment with low-frequency targets, particularly handling missing observations due to publication lags or ragged edges. A common method is last-value carry-forward (LOCF), where the most recent available high-frequency value is repeated until the next observation arrives, preserving temporal structure without introducing bias from interpolation. This technique is routinely applied in nowcasting setups to maintain alignment across frequencies.
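A minimal LOCF sketch in R, using the zoo package's na.locf() (an assumption about tooling; any equivalent fill routine works), illustrates the ragged-edge fill before MIDAS lags are built:

```r
# Carry the last published daily value forward across missing dates.
library(zoo)
x_daily <- zoo(c(1.2, NA, NA, 1.5, NA, 1.7, NA, NA),
               order.by = as.Date("2024-01-01") + 0:7)
x_filled <- na.locf(x_daily, na.rm = FALSE)  # leading NAs stay missing
x_filled
```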

Alternatives

Temporal Disaggregation Methods

Temporal disaggregation methods provide a framework for interpolating low-frequency time series, such as annual aggregates, into higher-frequency series, such as quarterly or monthly observations, while ensuring consistency with the original low-frequency benchmarks. These techniques are particularly useful in addressing mixed-frequency data challenges by expanding sparse series to align with more frequent indicators, enabling integrated analysis in time series models. Unlike approaches that aggregate high-frequency data, disaggregation emphasizes preserving movements from related indicators under the constraint that the sum or average of the disaggregated series matches the low-frequency data.

One seminal method is the Denton proportional approach, developed in the early 1970s, which focuses on benchmark-constrained disaggregation by minimizing the squared proportional deviations of the interpolated series from an indicator series, subject to the constraint that the aggregated high-frequency values equal the low-frequency observations. Formally, for a low-frequency series y observed at times k = 1, \dots, K and a high-frequency indicator z at times t = 1, \dots, T, the method solves \min_{y^*} \sum_{t=1}^T \left( \frac{y_t^* - z_t}{z_t} \right)^2 subject to B y^* = y, where y^* is the disaggregated high-frequency series and B is the aggregation matrix linking high- to low-frequency periods (e.g., summing quarters to annual totals). This proportional variant preserves relative movements and is often applied in differenced form (e.g., first differences) to enhance smoothness, making it suitable for flow variables like GDP. The univariate version operates without indicators, relying solely on the constraint, while multivariate extensions incorporate multiple indicators for improved accuracy. Denton (1971) introduced this quadratic minimization principle, which has become a standard in official statistics for its computational simplicity and ability to handle revisions.

Another influential technique is the Litterman method from the early 1980s, which employs a regression-based approach assuming the disaggregated series follows a random walk with autocorrelated innovations, using generalized least squares (GLS) to estimate high-frequency values from annualized indicators. The formulation involves regressing the low-frequency data on the indicator after annualization, with a variance-covariance matrix that accounts for serial correlation in the residuals, modeled as \epsilon_t = \rho \epsilon_{t-1} + \eta_t, where \rho is estimated from the data. This approach is particularly effective for non-cointegrated series and produces smoother interpolations by incorporating temporal dynamics. Litterman (1983) proposed this random-walk Markov model for distributing economic time series, emphasizing its utility in settings where high-frequency patterns exhibit persistence. Univariate implementations simplify to AR(1)-driven interpolation without indicators, while multivariate versions leverage vector autoregressions for joint disaggregation.

These methods find widespread application in official statistics and empirical macroeconomics, such as converting annual GDP or trade data into quarterly series to feed vector autoregression (VAR) models for forecasting and nowcasting, where low-frequency benchmarks must align with monthly indicators like industrial production. For instance, national statistical agencies routinely use Denton-based disaggregation to produce preliminary quarterly estimates from annual surveys, enabling timely macroeconomic monitoring. Multivariate Denton variants are employed in systems of accounts, such as disaggregating annual sector-level data using multiple high-frequency proxies.
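As an illustration, the level-form proportional problem stated above has a closed-form Lagrange solution; the R sketch below (toy annual totals and a quarterly indicator, sum-type aggregation assumed) implements it directly. Production systems typically use the first-difference Denton variant, for example via the tempdisagg R package, so this is illustrative rather than canonical:

```r
# Proportional Denton (level form): min sum ((y*_t - z_t)/z_t)^2
# subject to B y* = y, solved via the Lagrange conditions.
denton_prop <- function(y_low, z_high, m) {
  K <- length(y_low)
  B <- kronecker(diag(K), matrix(1, 1, m))   # sums m quarters per year
  Winv <- diag(z_high^2)                     # inverse of diag(1 / z^2)
  adj <- Winv %*% t(B) %*%
    solve(B %*% Winv %*% t(B), y_low - as.vector(B %*% z_high))
  as.vector(z_high + adj)
}
y_annual <- c(100, 120)                      # low-frequency benchmarks
z_quarterly <- c(20, 24, 26, 28, 27, 29, 31, 30)  # indicator series
q <- denton_prop(y_annual, z_quarterly, m = 4)
tapply(q, rep(1:2, each = 4), sum)           # reproduces 100 and 120
```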
Despite their practicality, temporal disaggregation methods have notable limitations, including an imposed smoothness in the interpolated series that may overlook abrupt high-frequency shocks or structural breaks, potentially leading to biased estimates in volatile environments. They also ignore explicit high-frequency drivers beyond the chosen indicator, relying heavily on its representativeness, and are prone to revisions when new low-frequency data arrive, which can propagate errors into downstream models. These constraints make them better suited as preprocessing tools than as full dynamic models.

State-Space Models

State-space models provide a multivariate framework for analyzing mixed-frequency data, extending univariate approaches by incorporating latent states whose dynamics are tracked with the Kalman filter. In this setup, the observation equation links observed variables of differing frequencies to the latent state, while the state equation governs the underlying dynamics. Formally, the model is specified as y_t = Z_t \alpha_t + \varepsilon_t, \quad \varepsilon_t \sim N(0, H_t) and \alpha_{t+1} = T_t \alpha_t + \eta_t, \quad \eta_t \sim N(0, Q_t), where y_t is the vector of observations (potentially with missing values for lower-frequency data), \alpha_t is the latent state vector, Z_t and T_t are (possibly time-varying) system matrices, and \varepsilon_t and \eta_t are Gaussian noise terms with covariance matrices H_t and Q_t, respectively. This structure allows the Kalman filter to recursively update estimates as new data arrive at irregular intervals, treating lower-frequency observations as aggregated or missing high-frequency realizations.

A key adaptation for mixed frequencies was developed by Mariano and Murasawa (2003), who embedded high-frequency updates within the state-space framework to handle datasets combining monthly indicators and quarterly GDP. Their approach transforms lower-frequency variables (e.g., quarterly aggregates) into a state-space compatible form by modeling them as sums or averages of unobserved high-frequency components, enabling maximum likelihood estimation via the Kalman filter. This method has been widely applied in dynamic factor models, where the state vector captures common latent factors driving multiple series at the highest available frequency.

State-space models offer significant advantages for mixed-frequency analysis, particularly in handling latent variables and errors-in-variables, which are common in macroeconomic data where true high-frequency measures are unobserved. They are especially suitable for factor models, allowing the extraction of common trends from disparate frequencies without explicit aggregation, thus preserving information and enabling nowcasting of low-frequency aggregates like GDP. However, these models are computationally intensive due to the iterative nature of the Kalman recursions and the need to evaluate likelihoods over high-dimensional states, particularly with large datasets or complex dynamics. They are also specification-sensitive, as the choice of state dimension, matrix structures, and initial conditions can greatly influence results, often requiring careful tuning. Moreover, reliable estimation demands substantial data, especially for the covariances in latent factor setups, which can pose challenges in short samples.
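A minimal local-level Kalman filter in R (illustrative parameter values) shows the missing-observation device: the update step is simply skipped whenever y_t is unobserved, which is how a quarterly series placed on a monthly grid is absorbed:

```r
# Local-level model: y_t = alpha_t + eps_t, alpha_t = alpha_{t-1} + eta_t.
kalman_local_level <- function(y, s_eps = 1, s_eta = 0.5, a0 = 0, P0 = 10) {
  n <- length(y); a_filt <- numeric(n); P_filt <- numeric(n)
  a <- a0; P <- P0
  for (t in seq_len(n)) {
    P <- P + s_eta^2                    # prediction step
    if (!is.na(y[t])) {                 # update only when y_t is observed
      K <- P / (P + s_eps^2)            # Kalman gain
      a <- a + K * (y[t] - a)
      P <- (1 - K) * P
    }
    a_filt[t] <- a; P_filt[t] <- P
  }
  list(state = a_filt, variance = P_filt)
}
y <- c(1.0, NA, NA, 1.4, NA, NA, 1.9)   # quarterly values on a monthly grid
kalman_local_level(y)$state
```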

References

  1. [1]
    [PDF] The MIDAS Touch: Mixed Data Sampling Regression Models∗
    We introduce Mixed Data Sampling (henceforth MIDAS) regression models. The regressions involve time series data sampled at different frequencies.
  2. [2]
  3. [3]
    [PDF] Mixed Frequency Data Sampling Regression Models
    In this article we define a general autoregressive MIDAS regression model with multiple variables of different frequencies and show how it can be specified ...
  4. [4]
    Comparative analysis of Mixed-Data Sampling (MIDAS) model ...
    Jul 11, 2024 · In this study, we compare the performance of the MIDAS model and the Lag-Llama model in nowcasting the Harmonized Index of Consumer Prices (HICP) in the Euro ...
  5. [5]
    [PDF] Forecasting fiscal time series using mixed frequency data
    filter, is that MiDaS is more parsimonious and less sensitive to specification errors due to the use of non-linear lag polynomials. To the best of our ...
  6. [6]
    [PDF] MIDAS Regressions: Further Results and New Directions∗
    Their MIxed Data Sampling – or MIDAS – regressions represent a simple, parsimonious, and flexible class of time series models that allow the left-hand and right ...
  7. [7]
    The MIDAS Touch: Mixed Data Sampling Regression Models
    Eric Ghysels & Pedro Santa-Clara & Rossen Valkanov, 2004. "The MIDAS Touch: Mixed Data Sampling Regression Models," CIRANO Working Papers 2004s-20, CIRANO.
  8. [8]
    MIDAS Regressions: Further Results and New Directions
    Abstract. We explore mixed data sampling (henceforth MIDAS) regression models. The regressions involve time series data sampled at different frequencies.Missing: 2006 | Show results with:2006
  9. [9]
    Regression models with mixed sampling frequencies - ScienceDirect
    We study regression models that involve data sampled at different frequencies, the so called Mi(xed) Da(ta) S(ampling), or MIDAS, regression models.
  10. [10]
    Should Macroeconomic Forecasters Use Daily Financial Data and ...
    We introduce easy-to-implement, regression-based methods for predicting quarterly real economic activity that use daily financial data.
  11. [11]
    Machine Learning Time Series Regressions With an Application to ...
    Apr 21, 2021 · This article introduces structured machine learning regressions for high-dimensional time series data potentially sampled at different frequencies.
  12. [12]
    The Distributed Lag Between Capital Appropriations and Expenditures
    1 (January, 1965). THE DISTRIBUTED LAG BETWEEN CAPITAL APPROPRIATIONS AND. EXPENDITURES1. BY SHIRLEY ALMON. THIS PAPER PRESENTS a new distributed lag, very ...
  13. [13]
    Threshold mixed data sampling (TMIDAS) regression models with ...
    Feb 18, 2021 · We apply the TMIDAS model to investigate presence and pattern of cyclical bias in quarterly GDP forecast errors and compare the out-of-sample ...2 Threshold Mixed Data... · 2.1 Model Estimation · 2.2 Model Specification...Missing: MIDAS | Show results with:MIDAS
  14. [14]
    [PDF] Temporal Aggregation Bias and Mixed Frequency Estimation of a ...
    This paper asks whether frequency misspecification of a New Keynesian model re- sults in temporal aggregation bias of the Calvo parameter.
  15. [15]
    The MIDAS Touch: Mixed Data Sampling Regression Models
    Author(s): Ghysels, Eric; Santa-Clara, Pedro; Valkanov, Rossen | Abstract: We introduce Mixed Data Sampling (henceforth MIDAS) regression models.Missing: citation | Show results with:citation
  16. [16]
  17. [17]
    [PDF] Bayesian MIDAS penalized regressions: estimation, selection, and ...
    Mar 1, 2019 · First, we consider MIDAS regressions resorting to Almon lag polynomial weighting schemes, which depend only on a bunch of functional parameters.
  18. [18]
    [PDF] U-MIDAS: MIDAS regressions with unrestricted lag polynomials
    This specification has the advantage of a higher flexibility compared to the functional lag polynomials in the standard MIDAS approach. However, U-MIDAS has the ...
  19. [19]
  20. [20]
    On Stock Volatility Forecasting under Mixed-Frequency Data Based ...
    This paper uses RR-MIDAS for mixed-frequency data, CNN-LSTM for feature extraction, and Optuna for hyperparameter optimization to forecast stock volatility.
  21. [21]
    Mixed data sampling (MIDAS) regression models - ScienceDirect.com
    This chapter focuses on single-equation MIDAS regression models involving stationary processes with the dependent variable observed at a lower frequency.
  22. [22]
    (PDF) Estimating MIDAS regressions via MIDAS-NLS with revised ...
    Oct 17, 2020 · Mixed data sampling (MIDAS) regression has received much attention in relation to modeling financial time series due to its flexibility.<|control11|><|separator|>
  23. [23]
    [PDF] Pooled Mean Group Estimator with MIDAS Covariates
    We started trying a standard grid search approach were we chose the set that lead to the maximum value for the initial likelihood. The problem with the latter ...<|separator|>
  24. [24]
    [PDF] midasr: Mixed Data Sampling Regression - CRAN
    Apr 7, 2025 · Creates a high frequency lag selection table for MIDAS regression model with given information criteria and minimum and maximum lags. Usage.
  25. [25]
    [PDF] Inference for Factor-MIDAS Regression Models - Yookyung Julia Koh
    A common weighting scheme in the MIDAS regression model is the exponential Almon lag with two parameters such that wk(θ) = exp(θ1k + θ2k2). PK k=1 exp(θ1k + ...
  26. [26]
    [PDF] Machine Learning Time Series Regressions With an Application to ...
    Feb 24, 2021 · We assess via simulations the out-of-sample predictive performance (forecasting and nowcasting), and the MIDAS weights recovery of the sg-LASSO ...
  27. [27]
    Comparative analysis of Mixed-Data Sampling (MIDAS) model ...
    Jul 11, 2024 · Two models were compared and assessed whether the Lag-Llama can outperform the MIDAS regression, ensuring that the MIDAS regression is evaluated ...Missing: HICP | Show results with:HICP
  28. [28]
    [PDF] Nowcasting Economic Activity in Times of COVID-19
    To reduce some of this uncertainty this paper details the use of daily mobility and air quality data to predict movements in industrial production, which is ...
  29. [29]
    [PDF] Exploring Nowcasting Techniques for Real-Time GDP Estimation in ...
    Jul 6, 2024 · The results indicate that bridge models outperform traditional time-series models, such as autoregressive integrated moving average (ARIMA), ...
  30. [30]
    [PDF] Working Paper 10914 - National Bureau of Economic Research
    A comparison of the MIDAS regressions with purely autoregressive volatility models reveals that the MIDAS forecasts are better at forecasting future realized ...
  31. [31]
    On the use of high frequency measures of volatility in MIDAS ...
    Our analysis is related to Andreou, Ghysels and Kourtellos (AGK) (2010) except they do not deal with high frequency volatility filters and study special cases ...
  32. [32]
    ANTICIPATING LONG-TERM STOCK MARKET VOLATILITY - jstor
    Aug 12, 2014 · Since we intend to directly model the effects of the macro variables on long-term volatility, we rely on the GARCH-MIDAS model proposed in Engle ...
  33. [33]
    Forecasting short-run exchange rate volatility with monetary ...
    Engle et al. (2013) find that directly including quarterly economic fundamentals in the GARCH-MIDAS model improves pseudo out-of-sample prediction accuracy for ...
  34. [34]
    MIDAS volatility forecast performance under market stress
    Studies have shown that the forecasting performance of the HAR-RV model cannot surpass that of the MIDAS-RV model in predicting high-frequency asset volatility ...Missing: FX | Show results with:FX
  35. [35]
    Stock exchange volatility forecasting under market stress with ...
    Jan 6, 2021 · Stock exchange volatility forecasting under market stress with MIDAS regression ... Andreou, E., Ghysels, E., & Kourtello, A. (2010) ...
  36. [36]
    Mixed Frequency Data Sampling Regression Models: The R ...
    Aug 16, 2016 · We introduce the R package midasr which enables estimating regression models with variables sampled at different frequencies within a MIDAS regression ...
  37. [37]
  38. [38]
    CRAN: Package midasml - R Project
    Oct 9, 2025 · The regularized MIDAS models are estimated using orthogonal (e.g. Legendre) polynomials and sparse-group LASSO (sg-LASSO) estimator. For ...
  39. [39]
    jstriaukas/midasml - GitHub
    The midasml package implements estimation and prediction methods for high-dimensional mixed-frequency (MIDAS) time-series and panel data regression models.
  40. [40]
    MIDAS Matlab Toolbox - File Exchange - MathWorks
    This toolbox is a repack of the Mi(xed) Da(ta) S(ampling) regressions (MIDAS) programs written by Eric Ghysels. It supports ADL-MIDAS type regressions.
  41. [41]
    MIDAS - EViews.com
    MIDAS is a method for estimating and forecasting when the dependent variable is recorded at a lower frequency than independent variables, using all higher ...
  42. [42]
    Mixed Data Sampling (MIDAS) Modeling in Python - GitHub
    Lagged matrix generator function to project higher frequency data onto lower frequency. Flexible MIDAS ordinary least squares regressor model and results ...
  43. [43]
    sapphire921/midas_pro: Python version of Mixed Data ... - GitHub
    This package is developed based on midaspy. This version can be used for MIDAS regression and multivariate MIDAS regression.
  44. [44]
    Mixed Data Sampling in Stata (MIDAS) - more info needed - Statalist
    Mar 24, 2015 · It seems there are packages for mixed frequency data in MATLAB, R and Eviews. But really don't want to use these tool, I am more used to stata ...<|control11|><|separator|>
  45. [45]
    [PDF] VOLATILITY MODELS AND THEIR APPLICATIONS
    A MIDAS method uses daily data to produce ... Journal of Applied Econometrics (forthcoming), 2009. 25. 85. A. Sinko, M. Sockin, and E. Ghysels. Matlab toolbox for ...
  46. [46]
    Temporal Disaggregation of Time Series - The R Journal
    Aug 25, 2013 · Abstract: Temporal disaggregation methods are used to disaggregate low frequency time series to higher frequency series, where either the sum, ...Missing: paper | Show results with:paper
  47. [47]
  48. [48]
    [PDF] Methods of Temporal Disaggregation for Estimating Output of the ...
    Nov 23, 2014 · Of Denton's movement preservation methods, the proportional first difference variant is most commonly applied for purposes of bench- marking and ...
  49. [49]
  50. [50]
    [PDF] Temporal Disaggregation Using Multivariate Structural Time Series ...
    In this paper we provide a multivariate framework for temporal disaggregation of time series observed at a certain frequency into higher frequency data. The.
  51. [51]