Interrupted time series
Interrupted time series (ITS) analysis is a quasi-experimental research design used to evaluate the causal impact of an intervention, policy, or event on an outcome variable by examining longitudinal data collected at multiple, equally spaced time points before and after the interruption.[1] This approach models the time series to detect changes in the level (an immediate shift in the outcome) or slope (an alteration in the trend) attributable to the intervention, while accounting for underlying patterns such as autocorrelation, seasonality, and secular trends.[2] ITS is particularly valuable in settings where randomized controlled trials are impractical, such as population-level health or social policy evaluations, as it provides robust evidence by leveraging the pre-intervention trend as a counterfactual.[3]
The ITS design was introduced in 1963 by Donald T. Campbell and Julian C. Stanley as a powerful quasi-experimental alternative to true experiments, especially for assessing educational and social interventions where ethical or logistical constraints prevent randomization.[4] They emphasized its strength in ruling out many threats to internal validity, such as maturation, history, and instrumentation, through repeated observations that establish a baseline trend.[5] Building on this, George E. P. Box and George C. Tiao developed formal statistical methods beginning in 1965, introducing techniques for detecting level shifts in non-stationary time series that were later formalized as intervention analysis using autoregressive integrated moving average (ARIMA) models.[6] These foundational contributions established ITS as a cornerstone of causal inference in observational data, with subsequent refinements incorporating segmented regression for simpler applications.[7]
ITS has been widely applied across disciplines, notably in public health to assess the effects of legislation such as smoke-free policies or vaccination campaigns, in economics for policy evaluations such as minimum wage increases, and in criminal justice for analyzing the impact of laws on crime rates.[2] Key advantages include its ability to handle time-varying confounders and to estimate effects over time, making it superior to simple pre-post comparisons.[3] However, effective implementation requires at least 12 data points before and 12 after the intervention to ensure adequate statistical power when using monthly data, and assumptions such as the absence of concurrent events and stable pre-intervention trends must hold to avoid bias.[8] Modern extensions, such as multiple-group ITS and Bayesian approaches, address these limitations for complex, clustered data.[9]
Overview
Definition
An interrupted time series (ITS) is a quasi-experimental research design that involves collecting multiple observations of an outcome variable at regular intervals over an extended period, with the time series divided into pre-intervention and post-intervention phases to evaluate the effects of a specific intervention.[10] This design detects potential changes in the level (immediate shift) or trend (slope over time) of the outcome that can be attributed to the intervention, such as a policy implementation or program rollout, by comparing patterns before and after the interruption point.[7] Unlike true experiments that employ random assignment to treatment and control groups to establish causality, ITS relies on temporal sequencing and the natural variation in time-series data rather than randomization, making it particularly suitable for real-world settings where ethical or practical constraints prevent controlled trials.[10] The core components include the time series data itself—typically comprising repeated measures of outcomes like rates, counts, or continuous variables (e.g., monthly hospitalization rates)—the clearly defined intervention acting as the interruption, and the analytical focus on discernible shifts in the series' trajectory post-intervention.[7]
Core Principles
Interrupted time series (ITS) analysis relies on the principle of level change, which detects an abrupt shift in the outcome variable's value immediately following the intervention, representing a discontinuous jump from the pre-intervention trajectory.[10] This abrupt alteration contrasts with the ongoing trend observed prior to the interruption and serves as a primary indicator of the intervention's short-term effect.[10] Complementing the level change, the principle of slope change examines alterations in the trend or rate of change in the outcome variable after the intervention, signaling a sustained or long-term impact.[10] For instance, if the pre-intervention trend shows a gradual increase, a post-intervention slope change might accelerate or decelerate this progression, thereby quantifying the intervention's influence on the underlying trajectory.[10]
Visually, ITS is represented through time-series plots that display the outcome variable over sequential time points, with a vertical line marking the intervention point to separate pre- and post-intervention phases.[10] These plots typically fit linear trends to the pre-intervention data to infer a counterfactual—what the outcome would have been without the intervention—and compare it to the observed post-intervention path, highlighting any deviations in level or slope.[10]
In causal inference, ITS enhances internal validity by leveraging multiple observations before and after the intervention to account for time-dependent confounds such as maturation and regression to the mean, thereby isolating the intervention's effect more robustly than simpler pre-post designs.[10] This repeated measurement strengthens the design's ability to rule out alternative explanations for observed changes.[10]
Reliable estimation of trends in ITS requires a minimum of 8–12 data points both pre- and post-intervention to capture underlying patterns and detect meaningful changes with sufficient statistical power.[11] Fewer observations may obscure true effects or inflate Type II errors, while more extensive series improve precision.[11]
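The two effect types and the extrapolated counterfactual can be illustrated with a small simulation. The following R sketch uses entirely hypothetical values; the series length, effect magnitudes, and variable names are assumptions for illustration, not figures from any cited study.
```r
# Minimal simulated illustration of a level change, a slope change, and the
# counterfactual obtained by extrapolating the pre-intervention trend.
# All values below are hypothetical.
set.seed(42)
n_pre  <- 24                       # pre-intervention time points
n_post <- 24                       # post-intervention time points
t      <- 1:(n_pre + n_post)
post   <- as.numeric(t > n_pre)    # 0 before the intervention, 1 after

# True process: rising baseline trend, then a level drop and a flatter slope
y <- 50 + 0.5 * t - 8 * post - 0.3 * (t - n_pre) * post + rnorm(length(t), sd = 2)

# Counterfactual: extend the linear trend fitted to pre-intervention data only
pre_fit        <- lm(y ~ t, subset = t <= n_pre)
counterfactual <- predict(pre_fit, newdata = data.frame(t = t))

plot(t, y, type = "b", xlab = "Time", ylab = "Outcome")
abline(v = n_pre + 0.5, lty = 3)                         # intervention point
lines(t[t > n_pre], counterfactual[t > n_pre], lty = 2)  # projected pre-trend
```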
History
Origins
The interrupted time series (ITS) design was first systematically introduced as a quasi-experimental method by psychologists Donald T. Campbell and Julian C. Stanley in their 1963 monograph Experimental and Quasi-Experimental Designs for Research. They classified the time-series experiment—characterized by multiple observations before and after an intervention—as one of the stronger alternatives to randomized controlled trials (RCTs) when randomization is infeasible, particularly for its ability to control for maturation, instrumentation, and testing effects through repeated measures.[5] This framework positioned ITS within broader quasi-experimental research, emphasizing its utility for causal inference in naturalistic settings without random assignment.[4]
During the 1960s and 1970s, ITS saw early applications in education and psychology to evaluate non-randomized programs. These applications highlighted ITS's strength in detecting intervention-induced shifts amid ongoing temporal processes, fostering its adoption in social science program evaluation.
The methodological roots of ITS also drew from econometrics, building on pre-existing time series analysis techniques for policy evaluation, notably the autoregressive integrated moving average (ARIMA) models introduced by George E. P. Box and Gwilym M. Jenkins in their 1970 book Time Series Analysis: Forecasting and Control. These models, originally developed for forecasting, were adapted to assess interventions by modeling structural breaks in time series data, enabling the detection of policy impacts on economic indicators. This econometric influence provided a rigorous statistical foundation for ITS, bridging descriptive time series with causal analysis.
A key milestone in formalizing ITS for the social sciences came with the 1980 publication of Interrupted Time Series Analysis by David McDowall, Richard McCleary, Errol E. Meidinger, and Richard A. Hay Jr., which integrated ARIMA-based intervention analysis with quasi-experimental principles to offer practical tools for researchers in fields like criminology and public policy. The book emphasized model specification, estimation, and diagnostics tailored to social data, solidifying ITS as a standard method for evaluating interventions in non-experimental contexts.
Key Developments
In the 1990s, interrupted time series (ITS) analysis saw significant methodological refinement through the integration of segmented regression techniques, which allowed for more precise estimation of intervention effects on both level and trend changes in time series data.[12] This approach was particularly advanced in public health research, where Wagner et al. provided comprehensive guidelines in 2002 for applying segmented regression to evaluate longitudinal effects of interventions, such as medication use policies, emphasizing its quasi-experimental strength over simpler before-after designs.[13] These developments built on earlier ARIMA foundations by enhancing interpretability for policy impacts without requiring randomization.
The 2010s marked the rise of Bayesian approaches in ITS, offering flexible handling of uncertainty and complex structures in time series data. A pivotal contribution was Brodersen et al.'s 2015 introduction of structural time series models within a Bayesian framework to infer causal impacts, using state-space representations to predict counterfactual outcomes and quantify intervention effects in settings like marketing and econometrics.[14] This method, implemented in tools like the R package CausalImpact, addressed limitations of classical models by incorporating priors and enabling robust inference even with sparse data.[15]
Computational advances further propelled ITS adoption through dedicated software in the 2010s. In R, the itsadug package, developed around 2015, facilitated the analysis of autocorrelated time series using generalized additive mixed models (GAMMs), supporting visualization and model diagnostics for intervention evaluations. Similarly, Stata's itsa command, introduced by Linden in 2015, streamlined single- and multiple-group ITS analyses via ordinary least-squares regression, making the method accessible for program evaluations in health and social sciences.[16] These tools democratized ITS by reducing analytical barriers and promoting standardized reporting.
Post-2010 trends emphasized controlled ITS designs and integrations with advanced techniques to mitigate confounding, enhancing causal validity in observational settings. Linden and Adams's 2011 propensity score-based weighting model, for instance, improved inference by reweighting control series to mimic treated units' pre-intervention trajectories, particularly useful in policy evaluations with non-equivalent groups.[17] This approach, extended in subsequent works, has intersected with machine learning methods for confounder adjustment, such as synthetic controls, allowing hybrid models to handle high-dimensional covariates in real-world applications.[18]
Global adoption of ITS surged post-2000, with publication volumes increasing substantially due to its utility in evaluating health policy interventions amid major events.[19] This growth was driven in part by applications assessing the impacts of the 2008 financial crisis on healthcare access and outcomes, where ITS provided timely evidence on the effects of austerity measures and reforms in multiple countries.[7] By the 2020s, ITS had become a cornerstone for population-level studies. The COVID-19 pandemic further accelerated its use, with numerous studies applying ITS to assess the effects of lockdowns, vaccination programs, and other public health measures on infection rates, hospitalizations, and mortality.[20]
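As an illustration of the Bayesian structural time series approach described above, the following sketch shows a typical invocation of the CausalImpact package on simulated data; the series, effect size, and period boundaries are purely illustrative assumptions.
```r
# Sketch of a Bayesian structural time series analysis with the CausalImpact
# package on simulated data (series, effect size, and periods are illustrative).
library(CausalImpact)

set.seed(1)
x <- 100 + arima.sim(model = list(ar = 0.5), n = 100)  # control covariate series
y <- 1.2 * x + rnorm(100)                              # outcome tracking the covariate
y[71:100] <- y[71:100] + 10                            # level shift after time 70
data <- cbind(y, x)

impact <- CausalImpact(data, pre.period = c(1, 70), post.period = c(71, 100))
summary(impact)  # posterior estimates of absolute and relative effects
plot(impact)     # observed vs. counterfactual, pointwise and cumulative effects
```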
Design Elements
Basic Design
The basic design of an interrupted time series (ITS) study involves observing a single outcome variable over multiple time points before and after the introduction of an intervention, without the use of a comparison group.[10] This single-group approach relies on the longitudinal data from one unit—such as a population, region, or organization—to establish a baseline trend and detect potential changes attributable to the intervention.[21]
Implementation begins with selecting an outcome variable that is measurable and relevant to the intervention, such as hospital admission rates or traffic fatalities.[2] Next, the intervention timing must be precisely defined, typically as a specific date when the change takes effect, enabling a clear demarcation between pre- and post-periods; this is crucial for abrupt interventions like policy implementations, though gradual rollouts (e.g., phased program introductions) can also be accommodated by noting the start of the transition.[10] Data collection then proceeds at equally spaced intervals, such as monthly or quarterly, to ensure consistency and capture the temporal pattern.[2]
The data structure consists of time-indexed observations on the outcome for the single unit, with a sufficient number of pre-intervention points—at least 8–12—to reliably establish the baseline level and trend, allowing for the assessment of any immediate level shift or change in slope following the intervention.[8] Post-intervention data should mirror this in quantity to balance the series and enhance power, often drawn from routine administrative records or archives for reliability.[10]
A representative example is the evaluation of England's 2007 smoke-free legislation on childhood asthma hospital admissions, using monthly data from April 2002 to November 2010 (63 pre-intervention and 41 post-intervention observations) to model baseline trends and observe post-intervention changes.[22]
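The resulting data structure can be sketched as follows in R; the month count, intervention timing, and column names are hypothetical placeholders, with the outcome column to be filled from routine records.
```r
# Hypothetical single-group ITS data structure: a monthly time index, the outcome
# (to be filled from routine records), an intervention indicator, and the time
# elapsed since the intervention. Names and timing are illustrative.
intervention_month <- 37                 # e.g., a policy takes effect in month 37

its_data <- data.frame(
  month   = 1:72,                        # equally spaced time index (6 years)
  outcome = NA_real_                     # e.g., monthly admission rate per 100,000
)
its_data$post       <- as.numeric(its_data$month >= intervention_month)
its_data$time_since <- pmax(0, its_data$month - intervention_month + 1)

head(its_data)  # columns: month, outcome, post, time_since
```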
Controlled Designs
Controlled interrupted time series (CITS) designs enhance the basic interrupted time series (ITS) approach by incorporating one or more comparison groups unaffected by the intervention, allowing for a more robust estimation of the counterfactual trajectory and strengthening causal inferences.[23] These designs address limitations of single-series ITS, such as vulnerability to concurrent events, by enabling direct comparison of changes in trends between treated and control units.[24]
Multiple-group ITS, also known as comparative ITS, involves adding a control series from units not exposed to the intervention to estimate the counterfactual trend for the treated group.[24] The analysis typically uses a segmented regression model that includes interaction terms between time, intervention status, and group membership to test for pre-intervention parallelism and estimate intervention effects on level and slope changes relative to the control.[24] For instance, parameters in the model capture baseline differences in intercepts and slopes between groups before the intervention, as well as differential changes post-intervention, facilitating tests of group comparability.[24] This approach has been applied in evaluations of public health policies, such as assessing the impact of speed limit reductions on road injuries by comparing implemented areas with similar untreated regions.[25]
A hybrid of difference-in-differences (DiD) and ITS integrates the parallel trends assumption from DiD with the time-series structure of ITS to test and model both immediate level shifts and gradual slope changes post-intervention.[26] In this design, pre-intervention trends between treated and control groups are compared to validate assumptions, and post-intervention deviations are analyzed to isolate the intervention effect, often using fixed-effects models or generalized least squares.[26] This method is particularly useful in educational evaluations, where it has demonstrated greater precision than standard DiD under certain conditions, such as when interventions cause sustained trend alterations.[26]
Propensity score-weighted ITS, introduced by Linden and Adams, employs a weighting scheme to balance treated and control units based on pre-intervention covariates, thereby creating a counterfactual that mimics the treated group's characteristics.[18] The process involves estimating propensity scores via logistic regression on baseline data, assigning weights (e.g., inverse probability weights for the average treatment effect on the treated) to control observations, and then applying weighted segmented regression to assess intervention impacts.[18] This technique was used to evaluate California's Proposition 99 tobacco control program, revealing a significant reduction in per capita cigarette sales compared to weighted controls, while accommodating multiple treated units without requiring complex matching.[18]
Synthetic control ITS extends the synthetic control method by constructing a weighted composite control series from multiple untreated units to approximate the treated unit's pre-intervention trajectory, particularly when no single control matches closely.[27] Originating from Abadie et al.'s framework for comparative case studies, it optimizes weights via minimization of pre-intervention differences in outcomes and predictors, then extrapolates this synthetic series post-intervention to estimate effects in time-series contexts.[27] Applications include policy evaluations such as the California tobacco program, where the synthetic control better captured secular trends than traditional controls, enhancing inference in settings with heterogeneous units.[27]
These controlled designs collectively reduce threats from historical events or maturation by focusing on relative rather than absolute changes, improving the plausibility of causal attributions in observational settings.[23]
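The multiple-group segmented regression described above can be sketched in R as follows, using simulated treated and control series; all variable names, sample sizes, and effect magnitudes are illustrative assumptions rather than values from the cited studies.
```r
# Sketch of a multiple-group (comparative) ITS segmented regression on simulated
# data. Variable names, sample sizes, and effect magnitudes are illustrative.
set.seed(7)
time   <- rep(1:48, times = 2)
group  <- rep(c(1, 0), each = 48)        # 1 = treated series, 0 = control series
post   <- as.numeric(time > 24)          # intervention after period 24
tsince <- pmax(0, time - 24)             # periods elapsed since the intervention

# Treated series receives a level drop (-5) and a slope change (-0.2) post-intervention
outcome <- 30 + 0.4 * time + 3 * group +
  (-5 * post - 0.2 * tsince) * group +
  rnorm(96, sd = 2)

# Interactions with group capture baseline differences and differential changes
cits_model <- lm(outcome ~ time * group + (post + tsince) * group)
summary(cits_model)
# The post:group and tsince:group coefficients estimate the intervention's effect
# on level and trend in the treated group relative to the control series.
```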
Statistical Methods
Segmented Regression
Segmented regression represents the foundational parametric method for analyzing interrupted time series (ITS) data, employing linear regression to model changes in outcome levels and trends before and after an intervention.[12] This approach decomposes the time series into pre- and post-intervention segments, allowing estimation of immediate level shifts and alterations in the underlying trend slope attributable to the intervention.[19] By fitting a single model across the entire series, segmented regression provides a straightforward framework for quantifying intervention effects while accommodating the temporal structure of the data.[12]
The core model equation for segmented regression in ITS analysis is
Y_t = \beta_0 + \beta_1 t + \beta_2 T_t + \beta_3 (t - \tau) T_t + \varepsilon_t
where Y_t denotes the outcome at time t, t is the time index, T_t is a binary indicator variable (0 for pre-intervention periods and 1 for post-intervention periods), and \tau marks the intervention time point.[12] Here, \beta_0 captures the baseline level at time zero, \beta_1 estimates the pre-intervention trend slope, \beta_2 quantifies the immediate level change at the intervention, and \beta_3 measures the post-intervention change in slope relative to the pre-intervention trend.[19] The error term \varepsilon_t accounts for unexplained variation and is typically assumed to follow a normal distribution.[12]
Estimation proceeds via ordinary least squares (OLS) to obtain the parameter coefficients, with adjustments for serial autocorrelation in the errors using Newey-West standard errors to ensure robust inference.[19] This heteroskedasticity- and autocorrelation-consistent (HAC) estimator corrects for potential correlation in residuals, which is common in time series data, thereby producing valid confidence intervals and p-values.
Interpretation centers on the coefficients \beta_2 and \beta_3, where statistical tests (e.g., t-tests) assess their significance to determine the presence of an immediate effect (level change) or a gradual effect (slope change) due to the intervention.[12] Visualization aids comprehension, typically plotting the observed data points alongside the fitted regression lines for the pre- and post-intervention segments to illustrate discontinuities or trend shifts.[19]
The model assumes linearity in the pre- and post-intervention trends, absence of autocorrelation in residuals (or appropriate adjustment via robust errors), and homoscedasticity (stable variance) across time periods.[12] Violations of these assumptions can bias estimates, though the Newey-West correction mitigates autocorrelation issues.[19]
Implementation is accessible in statistical software; in R, the lm() function fits the model, with the sandwich package providing NeweyWest() for robust standard errors. In Stata, the itsa command automates segmented regression for ITS, incorporating options for autocorrelation adjustments.
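A minimal R sketch of this workflow, on simulated data, fits the segmented regression with lm() and applies Newey-West standard errors via the sandwich and lmtest packages; the series length, intervention point, true effects, and lag choice are illustrative assumptions.
```r
# Minimal sketch: segmented regression for a single-group ITS with Newey-West
# standard errors, on simulated data. All values are illustrative.
library(sandwich)  # NeweyWest() HAC covariance estimator
library(lmtest)    # coeftest() for inference with a supplied covariance matrix

set.seed(123)
t      <- 1:60
post   <- as.numeric(t > 30)                       # T_t: 0 pre-, 1 post-intervention
tsince <- pmax(0, t - 30)                          # (t - tau) * T_t
y      <- 10 + 0.3 * t + 4 * post + 0.5 * tsince +
  as.numeric(arima.sim(list(ar = 0.4), n = 60))    # autocorrelated errors

fit <- lm(y ~ t + post + tsince)                   # beta0..beta3 as in the equation
coeftest(fit, vcov = NeweyWest(fit, lag = 3, prewhite = FALSE))

# Plot the observed series with fitted pre- and post-intervention segments
plot(t, y, type = "b", xlab = "Time", ylab = "Outcome")
lines(t[post == 0], fitted(fit)[post == 0])
lines(t[post == 1], fitted(fit)[post == 1])
abline(v = 30.5, lty = 3)
```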