
Synthetic control method

The synthetic control method is a statistical approach to causal inference in comparative case studies, where a single treated unit—such as a city, state, or country—undergoes an intervention, and a synthetic counterfactual is formed by assigning non-negative weights to a set of untreated control units that optimally match the treated unit's pre-treatment outcome trajectory and relevant covariates. This weighting ensures the synthetic control approximates what the treated unit's outcome would have been absent the intervention, enabling estimation of the treatment effect on the treated as the post-treatment gap between observed and synthetic outcomes. Developed by economists Alberto Abadie, Alexis Diamond, and Jens Hainmueller, the method builds on earlier work by Abadie and Javier Gardeazabal (2003) and was formalized in their 2010 paper in the Journal of the American Statistical Association, addressing gaps in traditional difference-in-differences estimators by minimizing researcher discretion in control selection and avoiding extrapolation beyond observed data through weights that sum to one and are bounded between zero and one. Unlike simple matching or averaging of controls, SCM optimizes for balance across multiple pre-treatment periods, providing transparent unit contributions and robust inference under factor model assumptions in which unobserved confounders evolve similarly across units. The method's strengths include its applicability to rare events or single-unit interventions where randomized experiments are infeasible, its data-driven nature that reduces bias from ad hoc comparison choices, and its ability to handle heterogeneous treatment effects by focusing on transparent, reproducible counterfactuals. Early applications demonstrated its utility in estimating the reduction in cigarette sales from California's Proposition 99 tobacco control initiative (approximately 26 packs per capita annually by 2000) and the short-term economic costs of German reunification for West Germany's GDP trajectory. Subsequent extensions, such as generalized and augmented variants, have refined inference and addressed limitations like sensitivity to pre-treatment fit or sparse donor pools, though the core approach remains prized for bridging qualitative insights with quantitative rigor in fields such as economics, political science, and public health.

Overview

Definition and Purpose

The synthetic control method (SCM) is a statistical technique for estimating causal effects of interventions on aggregate outcomes in comparative case studies, especially when a single treated unit—such as a city, state, or country—lacks a direct untreated counterpart with parallel characteristics. Developed by economists Alberto Abadie and Javier Gardeazabal, SCM constructs a synthetic counterfactual by selecting optimal weights for a donor pool of untreated units to replicate the treated unit's pre-treatment values of the outcome variable and relevant predictors as closely as possible. This weighted combination, termed the synthetic control, approximates what the treated unit's trajectory would have been without the intervention, with the post-treatment gap between observed and synthetic outcomes serving as the effect estimate. The method's core purpose is to enable rigorous causal inference in non-experimental settings where traditional regression or matching approaches falter due to the absence of multiple treated units or violations of assumptions like parallel trends in difference-in-differences estimators. By prioritizing close pre-treatment fit over parametric post-treatment modeling, SCM leverages longitudinal data to isolate intervention effects while mitigating biases from unobserved confounders that vary over time but are stable across units in the pre-period. It has been applied to evaluate policies such as tobacco control laws, tax hikes, and economic shocks, providing transparent, replicable estimates grounded in observable data rather than parametric modeling. Unlike propensity score methods, SCM avoids assuming functional forms for outcomes, emphasizing transparent weight optimization via minimization of prediction error in the pre-treatment phase.

Comparison to Difference-in-Differences and Other Methods

The synthetic control method (SCM) constructs a counterfactual for the treated unit by optimally weighting untreated donor units to replicate its pre-treatment outcome trajectory and relevant covariates, thereby avoiding the parallel trends assumption central to difference-in-differences (DiD) estimation. In DiD, treatment effects are identified under the assumption that treated and control groups would have followed parallel paths absent intervention, which can fail when units exhibit divergent pre-trends, leading to biased estimates if not addressed via extensions like fixed effects or trend controls. SCM mitigates this by design, as the synthetic control's weights ensure close matching on pre-treatment dynamics, making it preferable in applications with heterogeneous trends or where simple averaging of controls would distort the counterfactual. SCM excels in scenarios with one or few treated units, such as evaluating nationwide policies, where DiD struggles due to limited cross-sectional variation and reliance on group-level averages that may not capture unit-specific paths. For instance, DiD requires multiple untreated units for robust averaging but assumes their collective trends proxy the treated unit's counterfactual, whereas SCM's convexity constraints (weights summing to 1 and non-negative) provide a tailored, data-driven counterfactual without extrapolation beyond the donor pool. However, SCM demands extensive pre-treatment observations—typically 10 or more periods—and a pool of donors exceeding the number of predictors to avoid overfitting, constraints less stringent in DiD, which can operate with shorter panels if parallel trends hold. Relative to matching methods, SCM extends exact or propensity score matching by incorporating time-series structure and multiple predictors, yielding a dynamic counterfactual rather than static pairs, though it shares sensitivity to donor pool quality and dimensionality. Unlike regression discontinuity designs (RDD), which identify local average treatment effects near a forcing variable cutoff via continuity assumptions, SCM applies to non-experimental, unit-level interventions without natural discontinuities, prioritizing global trajectory approximation over localized compliance. Compared to instrumental variables (IV) approaches, SCM eschews the need for exclusion restrictions and valid instruments, relying instead on observable pre-treatment data for identification, but it forgoes IV's ability to address endogeneity from unobserved confounders when donors imperfectly match latent factors. Inference in SCM typically employs placebo permutations on donors or time, contrasting with DiD's clustered standard errors or wild bootstraps, with recent hybrids like synthetic DiD blending both approaches for improved precision in multi-unit settings.

History

Origins in Abadie and Gardeazabal (2003)

The synthetic control method was first proposed by economists Alberto Abadie and Javier Gardeazabal in their 2003 study examining the economic effects of terrorism associated with the Basque separatist group ETA in Spain's Basque Country. Motivated by the challenge of estimating causal impacts in settings lacking directly comparable untreated units—such as other Spanish regions differentially affected by national policies or economic trends—the authors developed a data-driven approach to construct a counterfactual outcome trajectory for the treated unit. This method addressed limitations of traditional difference-in-differences estimators, which rely on parallel trends assumptions that may not hold when control groups are heterogeneous or influenced by unobserved confounders. In the Basque Country application, Abadie and Gardeazabal used annual per capita GDP data spanning 1955 to 1997, treating the intensification of terrorism—marked by a surge in attacks beginning in the late 1960s and particularly after Francisco Franco's death in 1975—as the intervention. The synthetic control was formed as a weighted average of other Spanish regions serving as potential donors, with weights optimized to minimize the difference between the Basque Country's pre-terrorism (primarily 1960s) characteristics and those of the synthetic counterpart. Key predictors for matching included per capita GDP, the investment-to-GDP ratio, and other economic indicators from the pre-intervention period, yielding weights dominated by Catalonia (0.8508) and Madrid (0.1492). This construction ensured the synthetic Basque Country closely replicated the actual region's trajectory before the conflict's escalation, assuming terrorism's effects were lagged and primarily negative on outcomes like GDP. Post-intervention comparisons revealed that Basque per capita GDP fell approximately 10 percentage points relative to the synthetic control after the 1970s, attributing this gap to terrorism's direct and indirect costs, such as reduced investment. To assess robustness, the authors conducted a placebo test by applying the method to Catalonia (which received the highest weight in the Basque synthetic control) as if it were treated, finding no significant divergence and even a 4% relative outperformance in Catalonia's GDP during 1990–1997, which suggested the Basque estimate might understate the true impact. Supplementary evidence from Basque stock returns during a 1998–1999 ETA truce showed a +10.14% abnormal return, reversing to -11.21% after the truce ended, corroborating the method's ability to isolate conflict-related effects. This inaugural use established the synthetic control as a transparent, non-parametric tool for comparative case studies, emphasizing empirical matching over parametric modeling.

Extensions and Popularization (2010 Onward)

The synthetic control method gained substantial traction following its application in Abadie, Diamond, and Hainmueller's 2010 analysis of California's Proposition 99, a 1988 tobacco control program that increased cigarette taxes and funded anti-smoking advertising; this study, published in the Journal of the American Statistical Association, demonstrated the method's efficacy in estimating effects for single treated units by constructing a synthetic counterfactual from untreated states, matching pre-treatment cigarette consumption and other predictors closely. The paper's emphasis on data-driven weights reduced researcher discretion compared to traditional case studies, fostering adoption in economics and policy analysis for interventions lacking randomized controls, such as state-level reforms. By the mid-2010s, applications proliferated in areas such as public health, labor economics, and political science, with over 500 citations of the 2010 work by 2020, reflecting its utility for aggregate treatments where parallel trends assumptions of difference-in-differences fail. Software implementations accelerated popularization, including the Synth package for R and Stata released around 2011, which operationalized the original estimator, and subsequent tools like gsynth for generalized variants. These packages enabled reproducible analyses of donor pools with dozens of units, standardizing placebo tests for inference and promoting the method's use in non-experimental settings, such as evaluating minimum wage hikes or immigration reforms. Guidance from Abadie (2021) outlined feasibility criteria, like requiring at least 5-10 pre-treatment periods and sufficient donor variability, while cautioning against overfitting in small samples; this work highlighted SCM's strengths in transparent counterfactual construction but noted vulnerabilities to post-treatment donor contamination. Extensions post-2010 addressed limitations in handling multiple treated units, time-varying unobservables, and inference biases. Xu's 2017 generalized synthetic control method incorporated interactive fixed effects models, allowing estimation of average treatment effects under factor structures where unit-specific loadings interact with time-varying factors, thus relaxing strict pre-treatment matching and extending to multiple treated units with heterogeneous effects; implemented in gsynth, it has been applied to multi-country shocks. Ben-Michael, Feller, and Rothstein's 2021 augmented synthetic control method refined estimation via ridge-penalized outcome models, improving pre-treatment fit and reducing bias from extrapolation outside the convex hull of donors, particularly when covariates are few or noisy; this approach enhances robustness in finite samples through cross-validation of penalties. A 2021 Journal of the American Statistical Association special section further advanced robust variants, linking SCM to principal-components-based denoising and to matrix completion estimators under low-rank assumptions. These developments widened SCM's scope to clustered treatments and high-dimensional settings, though empirical validity hinges on unconfounded donor selection and stable latent factors.

Methodology

Data Structure and Requirements

The synthetic control method requires panel data comprising a single treated unit and a donor pool of multiple untreated control units, observed across a sequence of time periods that include both pre- and post-intervention phases. The core outcome variable, denoted Y_{jt}, must be recorded for the treated unit (j=1) and each control unit (j=2, \dots, J+1), over T total periods, with the intervention commencing after period T_0, such that pre-intervention data span t=1 to T_0 and post-intervention data span t=T_0+1 to T. This structure enables the construction of a counterfactual trajectory for the treated unit by weighting controls to replicate its pre-intervention outcome path and associated characteristics. A sufficient length of pre-intervention periods (T_0) is essential for feasibility, as it allows for precise estimation of weights that minimize discrepancies in the treated unit's predictors—typically including averages of the outcome variable over the pre-period or other time-invariant covariates unaffected by the intervention. The donor pool must contain enough units (often J \geq 10, though no strict minimum is mandated) to ensure the treated unit's pre-intervention features lie within or near the convex hull of the controls' features, facilitating a close synthetic match. Balanced panels are standard, with complete data on outcomes and predictors for all units in the pre-period; while basic implementations assume no missing values, extensions such as imputation or matrix completion methods can accommodate imbalances or post-treatment gaps in controls. Data requirements also encompass the absence of anticipation effects in the treated unit prior to T_0 and adherence to the stable unit treatment value assumption, ensuring no spillovers or interference between units that could confound the control pool's validity. Predictor variables should be selected based on their predictive relevance to the outcome and immunity to treatment effects, with empirical practice emphasizing a parsimonious set to avoid overfitting while capturing structural similarities.
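
As a concrete illustration of this layout, the sketch below arranges a toy long-format panel and extracts the pre-intervention outcome blocks used later in the weight optimization; the unit names and all numbers are purely illustrative and do not reproduce any published dataset.

```python
# A minimal sketch of the panel structure SCM expects, using pandas; every
# value below is a toy number for illustration only.
import pandas as pd

panel = pd.DataFrame({
    "unit":    ["California"] * 4 + ["Nevada"] * 4 + ["Utah"] * 4,
    "year":    [1986, 1987, 1988, 1989] * 3,
    "sales":   [100.0, 98.0, 95.0, 85.0,      # outcome Y_jt for the treated unit
                110.0, 108.0, 107.0, 106.0,   # donor 1
                 90.0,  89.0,  88.0,  87.5],  # donor 2
    "treated": [0, 0, 0, 1] + [0] * 8,        # intervention takes effect after T0 = 1988
})

# Wide pre-intervention outcome block: rows are periods t <= T0, columns are units.
pre = (panel[panel["year"] <= 1988]
       .pivot(index="year", columns="unit", values="sales"))
y1_pre = pre["California"].to_numpy()                 # treated unit's pre-period path
Y0_pre = pre.drop(columns="California").to_numpy()    # donor pool outcome matrix
```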

Constructing the Synthetic Control

The synthetic control is formed as a weighted average of untreated donor units, where the weights are selected to minimize the discrepancy between the treated unit's pre-intervention characteristics and those of the synthetic counterpart. This process relies on a pool of control units presumed unaffected by the intervention, typically chosen based on similarity in economic, institutional, or geographic features to the treated unit. The characteristics vector X_1 for the treated unit includes averages of outcome variables over the pre-treatment periods (e.g., mean GDP from t=1 to T_0) and additional predictors such as lagged outcomes, demographic factors, or sectoral shares that influence the outcome but are unaffected by the treatment. The corresponding matrix X_0 compiles these for the J donor units. Weights w^* = (w_2^*, \dots, w_{J+1}^*)^\top are obtained by solving the optimization problem w^* = \arg\min_w (X_1 - X_0 w)^\top V (X_1 - X_0 w) subject to w_j \geq 0 for all j and \sum_{j=2}^{J+1} w_j = 1, ensuring the synthetic control is a convex combination of donors. The symmetric positive definite diagonal matrix V assigns relative weights to the predictors, emphasizing those with greater predictive power for the outcome; in early applications, V was set to the identity or simple diagonals, but subsequent refinements select its diagonal entries via cross-validation, splitting pre-treatment data (e.g., 1971–1980 for training, 1981–1990 for validation) to minimize root mean squared prediction error on held-out pre-period outcomes. The resulting weights yield a synthetic pre-treatment outcome \sum_{j=2}^{J+1} w_j^* Y_{jt} that approximates Y_{1t} closely for t \leq T_0, often achieving near-perfect fit in aggregate applications like state-level policy evaluations. For instance, in estimating the effects of California's tobacco control program, weights of 0.00 for most states but positive for a handful (highest for Utah) matched cigarette consumption and predictors like retail prices and incomes over the pre-1988 period. Poor pre-treatment fit (e.g., large root mean squared prediction error) signals invalid donors or model misspecification, prompting exclusion of dissimilar units or addition of constraints for sparsity (e.g., limiting non-zero weights to 1–4 units). This matching condition underpins the counterfactual: post-treatment, the gap Y_{1t} - \sum_{j=2}^{J+1} w_j^* Y_{jt} for t > T_0 estimates the treatment effect on the treated, assuming the factor model generating the data—Y_{it}^N = \delta_t + \theta_t' Z_i + \lambda_t' \mu_i + \varepsilon_{it}, with unobserved factors \lambda_t and loadings \mu_i—permits interpolation of the treated unit's counterfactual via the weighted donors.
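
A minimal numerical sketch of this constrained optimization is given below, using scipy's SLSQP solver with a diagonal V; the helper name fit_scm_weights and the toy data are illustrative assumptions, not part of any published implementation.

```python
# Sketch of the SCM weight problem: minimize (x1 - X0 w)' V (x1 - X0 w)
# subject to w >= 0 and sum(w) = 1, with V a diagonal predictor-importance matrix.
import numpy as np
from scipy.optimize import minimize

def fit_scm_weights(x1, X0, v_diag):
    """x1: (K,) treated predictors; X0: (K, J) donor predictors; v_diag: (K,) diagonal of V."""
    K, J = X0.shape

    def objective(w):
        d = x1 - X0 @ w
        return d @ (v_diag * d)                     # quadratic form with diagonal V

    constraints = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)
    bounds = [(0.0, 1.0)] * J
    w0 = np.full(J, 1.0 / J)                        # start from equal weights
    res = minimize(objective, w0, method="SLSQP", bounds=bounds, constraints=constraints)
    return res.x

# Toy check: the treated unit is an exact convex combination of the donors,
# so the solver should recover (approximately) the generating weights.
rng = np.random.default_rng(0)
X0 = rng.normal(size=(3, 5))
x1 = X0 @ np.array([0.5, 0.3, 0.2, 0.0, 0.0])
print(np.round(fit_scm_weights(x1, X0, np.ones(3)), 3))
```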

Estimation Procedure and Weights

The estimation procedure for synthetic control weights centers on solving a constrained optimization problem using pre-intervention data to construct a counterfactual that closely matches the treated unit's observed characteristics prior to treatment. Specifically, let unit 1 be the treated unit and units j = 2, \dots, J+1 the potential controls from the donor pool; the vector of weights \mathbf{w}^* = (w_2^*, \dots, w_{J+1}^*)' minimizes the quadratic form \| \mathbf{X}_1 - \mathbf{X}_0 \mathbf{w} \|_V^2 = (\mathbf{X}_1 - \mathbf{X}_0 \mathbf{w})' V (\mathbf{X}_1 - \mathbf{X}_0 \mathbf{w}), subject to w_j \geq 0 for all j and \sum_{j=2}^{J+1} w_j = 1. Here, \mathbf{X}_1 is a K \times 1 vector of pre-treatment predictors for the treated unit, including time-invariant covariates \mathbf{Z}_1 and transformations of pre-treatment outcomes such as averages \bar{Y}_{1} = T_0^{-1} \sum_{t=1}^{T_0} Y_{1t}; \mathbf{X}_0 is the analogous K \times J matrix for controls; and V is a K \times K symmetric positive semidefinite diagonal matrix that weights the relative importance of each predictor, often calibrated to minimize mean squared prediction error (MSPE) on held-out pre-treatment periods. This optimization yields a convex combination of controls, ensuring the synthetic unit lies within the convex hull of the donor pool and avoiding extrapolation beyond observed data points, which enhances transparency and interpretability relative to methods permitting negative weights. The problem is solved via constrained quadratic programming algorithms, as implemented in packages like Synth for R and Stata, which iteratively adjust weights to achieve the minimum discrepancy. In the original application to the Basque Country's terrorism costs, Abadie and Gardeazabal (2003) specified predictors including initial per capita GDP, population shares, and sectoral investments, selecting V's elements to best replicate the Basque GDP trajectory in the 1960s, resulting in nonzero weights primarily for Catalonia (85%) and Madrid (15%). Post-treatment, for t > T_0, the counterfactual outcome is \hat{Y}_{1t}^N = \sum_{j=2}^{J+1} w_j^* Y_{jt}, and the treatment effect estimate is \hat{\alpha}_{1t} = Y_{1t} - \hat{Y}_{1t}^N. Extensions refine this core procedure; for instance, Abadie (2021) recommends splitting pre-intervention periods into training and validation subsets to select V^* via out-of-sample MSPE minimization, reducing overfitting risks when predictor dimensionality approaches the number of pre-treatment periods. While the standard approach enforces non-negativity for causal realism under parallel trends-like assumptions, alternatives like Doudchenko and Imbens (2016) relax this via elastic net penalization, allowing negative weights but introducing potential biases. The number of nonzero weights is typically bounded by the number of predictors K, promoting sparsity and economic interpretability in the synthetic composition.
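
Given fitted weights, the counterfactual path and the per-period effect estimates follow directly; the short sketch below assumes a (T x (J+1)) outcome array with the treated unit in the first column and uses illustrative names.

```python
# Post-treatment gaps Y_1t - sum_j w_j* Y_jt, plus pre- and post-period RMSPE.
import numpy as np

def scm_effects(Y, w, T0):
    """Y: (T, J+1) outcomes, treated unit in column 0; w: (J,) donor weights."""
    y_treated = Y[:, 0]
    y_synth = Y[:, 1:] @ w                   # synthetic (counterfactual) path, all periods
    gaps = y_treated - y_synth               # alpha_hat_{1t} for every period t
    pre_rmspe = np.sqrt(np.mean(gaps[:T0] ** 2))
    post_rmspe = np.sqrt(np.mean(gaps[T0:] ** 2))
    return gaps, pre_rmspe, post_rmspe
```

The post-period entries of gaps are the effect estimates \hat{\alpha}_{1t}, while the pre-period RMSPE doubles as the goodness-of-fit diagnostic discussed above.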

Assumptions and Identification Strategy

Core Identifying Assumptions

The synthetic control method identifies causal effects under a linear interactive fixed effects model for untreated potential outcomes. For control units j = 2, \dots, J+1 and all time periods t, the outcome is Y_{jt}^N = \delta_t + \theta_t' Z_j + \lambda_t' \mu_j + \varepsilon_{jt}, where \delta_t captures common time shocks, Z_j denotes observed time-invariant covariates, \mu_j represents unobserved unit-specific factor loadings, \lambda_t are unobserved common factors varying over time, and \varepsilon_{jt} is a zero-mean idiosyncratic error orthogonal to the factors and covariates. This model assumes that systematic variation in outcomes arises from these interactive fixed effects, with transitory shocks \varepsilon_{jt} being small relative to the signal, particularly in pre-treatment periods, to enable precise matching. Identification hinges on constructing non-negative weights w^* = (w_2^*, \dots, w_{J+1}^*) summing to one that minimize the discrepancy between the treated unit's pre-intervention outcomes Y_{1t} for t \leq T_0 and the synthetic counterpart \sum_{j=2}^{J+1} w_j^* Y_{jt}, often also matching covariates Z_1. This matching implies that the synthetic control approximates the treated unit's unobserved factor loadings \mu_1 and covariates Z_1, such that absent treatment, post-treatment outcomes would have evolved similarly: Y_{1t}^N \approx \sum_{j=2}^{J+1} w_j^* Y_{jt} for t > T_0. Identification requires that the treated unit's factor loadings lie within the convex hull of the donor pool's loadings, allowing exact or near-exact replication of pre-treatment trends without relying on parallel trends across raw outcomes. Crucial auxiliary assumptions include the absence of spillovers, ensuring treatment affects only the treated unit and not controls, and no anticipation, whereby effects manifest only after T_0. Additionally, the framework presumes stable causal mechanisms post-treatment, meaning unobserved confounders captured by \mu_j and \lambda_t do not evolve differentially between the treated and synthetic units due to the intervention. Violations, such as time-varying unobserved confounders uncorrelated with pre-treatment data, can bias estimates, underscoring the need for substantive knowledge to validate the approximation.
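
The display below restates this identification logic compactly; it is a heuristic summary of the factor-model argument, not a formal proof (the formal result bounds the bias by a term that shrinks as T_0 grows relative to the scale of the transitory shocks).

```latex
% Heuristic identification argument under the interactive fixed effects model.
\begin{aligned}
\text{Suppose } & \sum_{j=2}^{J+1} w_j^{*} Y_{jt} = Y_{1t} \quad (t = 1,\dots,T_0)
  \qquad\text{and}\qquad \sum_{j=2}^{J+1} w_j^{*} Z_j = Z_1 .\\
\text{Under } & Y_{jt}^{N} = \delta_t + \theta_t' Z_j + \lambda_t' \mu_j + \varepsilon_{jt},
  \text{ the weights then approximately reproduce the loadings, } \sum_{j=2}^{J+1} w_j^{*}\mu_j \approx \mu_1 ,\\
\text{so that } & \hat{Y}_{1t}^{N} = \sum_{j=2}^{J+1} w_j^{*} Y_{jt} \approx Y_{1t}^{N}
  \quad\text{and}\quad \hat{\alpha}_{1t} = Y_{1t} - \hat{Y}_{1t}^{N} \quad (t > T_0).
\end{aligned}
```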

Inference via Placebo and Permutation Tests

In the synthetic control method (SCM), standard parametric inference procedures are often unsuitable due to the typical presence of a single treated unit and a small number of donor pool controls, which precludes asymptotic approximations and variance estimation under conventional assumptions. Instead, Abadie, Diamond, and Hainmueller (2010) propose nonparametric inference based on placebo and permutation tests, which generate an empirical distribution of test statistics under a sharp null hypothesis of no treatment effect to assess significance. These approaches leverage the structure of comparative case studies by simulating "placebo interventions" or random reassignments, providing exact finite-sample inference without relying on large-sample normality. Placebo tests form the core of this inference framework, involving the application of the SCM to each untreated unit in the donor pool as if it had received the intervention. For a given placebo unit j, a synthetic control is constructed using the remaining donor units (excluding j) to match pre-intervention outcomes and covariates, yielding a counterfactual trajectory. The post-intervention discrepancy—typically measured as the root mean squared prediction error (RMSPE), defined as \sqrt{\frac{1}{T - T_0} \sum_{t > T_0} (Y_{jt} - \hat{Y}_{jt}^N)^2}, where T_0 is the pre-treatment period length and \hat{Y}_{jt}^N is the synthetic prediction—is then computed and compared to the pre-intervention fit via the ratio of post- to pre-intervention RMSPE. To mitigate distortion from poorly fitting placebos, units with pre-intervention MSPE exceeding a multiple (e.g., 5–20 times) of the treated unit's are often discarded, ensuring the placebo distribution reflects comparable goodness-of-fit. The significance of the estimated treatment effect for the actual treated unit is evaluated by its position in this placebo distribution: the p-value is the proportion of placebo units exhibiting a larger absolute post-intervention gap or RMSPE ratio than the treated unit. For instance, in the evaluation of California's Proposition 99 tobacco control program implemented in 1988, placebo applications to 38 control states yielded a p-value of 0.026 for the per capita cigarette consumption gap, indicating the observed effect was unlikely under the null. With stricter fit criteria (MSPE < 5 times California's), the p-value tightened to 0.05 using 19 placebos. Permutation tests extend this logic by randomly reassigning the treatment status across units (or, less commonly, in time) to generate a null distribution, akin to randomization inference in experimental settings. This mirrors the placebo approach but allows for broader permutations when donor pool size permits enumeration of all assignments, though computational feasibility often limits it to sampling. While effective for small samples, subsequent analyses have highlighted potential size distortions in permutation tests under repeated sampling, particularly when pre-treatment fit varies or the number of controls is limited, as the symmetry required for exact inference may not hold in non-i.i.d. data-generating processes. Monte Carlo evidence shows rejection rates exceeding nominal levels (e.g., 19–84% vs. 10%) in certain specifications, underscoring the need for robustness checks.
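
Once pre- and post-intervention RMSPEs have been computed for the treated unit and every in-space placebo, the permutation logic above reduces to a few lines; the function below is an illustrative sketch with an optional cutoff for discarding poorly fitting placebos, not code from any particular package.

```python
# Fisher-style placebo p-value: share of units (treated included) whose
# post/pre RMSPE ratio is at least as extreme as the treated unit's.
import numpy as np

def placebo_p_value(pre_rmspe, post_rmspe, treated_idx=0, fit_cutoff=None):
    """pre_rmspe, post_rmspe: arrays over units (treated unit plus placebos)."""
    pre = np.asarray(pre_rmspe, dtype=float)
    post = np.asarray(post_rmspe, dtype=float)
    ratios = post / pre
    keep = np.ones(pre.size, dtype=bool)
    if fit_cutoff is not None:                           # e.g. 5 or 20 times the treated MSPE
        keep = pre ** 2 <= fit_cutoff * pre[treated_idx] ** 2
        keep[treated_idx] = True                         # always keep the treated unit
    return float(np.mean(ratios[keep] >= ratios[treated_idx]))
```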

Applications

Early Applications in Economics

The synthetic control method found early traction in economics through its application to evaluate the causal effects of Proposition 99 in California, a November 1988 voter initiative that increased the excise tax on cigarettes by 25 cents per pack and directed a portion of revenues to anti-smoking campaigns, taking effect in January 1989. Abadie, Diamond, and Hainmueller (2010) employed the method to assess its impact on annual per capita cigarette sales, constructing a synthetic California as a weighted average of other U.S. states: Colorado (weight 0.164), Connecticut (0.069), Montana (0.199), Nevada (0.234), and Utah (0.334). These weights minimized pre-1988 discrepancies in predictors such as retail cigarette prices, per capita income, and beer consumption. The analysis indicated that Proposition 99 reduced per capita sales by approximately 26 packs by 2000 compared to the synthetic counterfactual, corresponding to an average annual reduction of about 20 packs (roughly 25% below the pre-intervention trajectory). Another formative economic application assessed the macroeconomic repercussions of German reunification, which occurred on October 3, 1990, following the fall of the Berlin Wall in November 1989 and integrated East Germany into West Germany's institutions and currency union. Abadie, Diamond, and Hainmueller (2015) used the method to estimate effects on West Germany's real per capita GDP (in 2002 purchasing power parity USD), drawing from a donor pool of 16 OECD countries excluding Germany to form a synthetic control weighted as follows: Austria (0.42), United States (0.22), Japan (0.16), Switzerland (0.11), and Netherlands (0.09). Weights were optimized via leave-one-out cross-validation to match 1960–1989 trends. Results showed reunification lowered per capita GDP by an average of 1,600 USD annually from 1990 to 2003—about 8% relative to the 1990 baseline—with the synthetic control exceeding actual GDP by 12% by 2003, attributing the divergence to fiscal transfers, labor market disruptions, and productivity drags from integration. These studies highlighted the method's capacity to generate transparent counterfactuals for aggregate, single-treated units where randomized experiments were infeasible, influencing subsequent policy analyses such as state-level tax reforms and trade shocks. By providing quantifiable estimates grounded in pre-treatment matching, they addressed selection biases inherent in difference-in-differences approaches reliant on untreated units with imperfect comparability.

Use in Policy Evaluation and Beyond

The synthetic control method (SCM) has found widespread application in policy evaluation, particularly for assessing interventions at the state, provincial, or national level where randomized experiments are infeasible. By constructing a weighted combination of untreated units that mirrors the treated unit's pre-intervention trajectory, SCM estimates counterfactual outcomes to quantify policy effects. For example, in evaluating the economic consequences of German reunification in 1990, SCM revealed that West Germany's per capita GDP grew more slowly than its synthetic counterpart post-treatment, attributing a portion of this divergence to the fiscal burdens of integration. In public health policy, SCM has been used to examine the impacts of tobacco control programs and smoking restrictions. Beyond the seminal analysis of California's Proposition 99, which demonstrated a reduction in per capita cigarette sales, the method has evaluated similar antismoking initiatives across European regions and U.S. states, often finding significant declines in consumption attributable to tax hikes and advertising bans. In broader health policy contexts, SCM addresses single-unit treatments like regional vaccination campaigns or healthcare reforms, offering advantages over difference-in-differences by avoiding assumptions of parallel trends. Environmental policy evaluations have leveraged SCM to assess air quality regulations and event-specific interventions. During the 2016 G20 Summit in Hangzhou, China, temporary pollution controls resulted in immediate and persistent improvements in air quality metrics, as evidenced by comparisons to a synthetic control constructed from similar untreated cities. Similarly, in wildfire exposure studies, two-stage generalized SCM variants have quantified heterogeneous health effects, highlighting the method's adaptability to environmental shocks. Beyond traditional policy domains, SCM extends to ecology and natural experiments, where it constructs counterfactuals for rare events like invasive species introductions or habitat alterations. In these applications, the method relates treated ecological units to donor pools of controls, enabling causal estimates without relying on parametric models. In comparative politics, SCM informs analyses of institutional shocks, such as democratic transitions, by providing transparent, data-driven benchmarks for economic and social outcomes. These extensions underscore SCM's versatility in handling aggregate data with limited treatment variation, though applications outside economics remain less common due to data demands.

Advantages

Transparency and Interpretability

The synthetic control method derives its transparency from the explicit construction of the counterfactual as a convex combination of observable control units, with weights w_j constrained to lie between 0 and 1 and sum to 1, thereby avoiding extrapolation beyond the donor pool. This formulation allows researchers to directly inspect the weights, revealing the relative contributions of specific control units to mimicking the treated unit's pre-treatment trajectory in both outcomes and predictors. Unlike opaque machine learning approaches, the optimization minimizes discrepancies via a quadratic loss function, yielding deterministic weights that prioritize close empirical matches over theoretical assumptions. Interpretability is further enhanced by the method's sparsity in weights, which typically involves only a few donor units with non-zero contributions, providing a geometric and substantive understanding of the synthetic control's composition. For instance, in applications like evaluating California's tobacco control program, the weights highlighted specific U.S. states whose combined pre-1988 patterns best approximated California's smoking rates and covariates, enabling qualitative validation alongside quantitative fit. Time-series visualizations of the treated and synthetic paths facilitate intuitive assessment of pre-treatment balance and post-intervention divergence, where the gap represents the estimated effect without relying on aggregated residuals or complex diagnostics. This data-driven transparency reduces researcher discretion in control selection, promoting replicability, though it requires sufficient pre-treatment periods and relevant predictors to achieve meaningful weights. Relative to difference-in-differences or regression discontinuity designs, synthetic controls safeguard against bias from poor matches by embedding the matching process in the estimation, making the causal claim traceable to observable data alignments.

Handling Single Treated Units

The synthetic control method (SCM) is particularly advantageous in empirical settings featuring a single treated unit, such as a specific country, state, or policy jurisdiction exposed to an intervention, accompanied by a donor pool of untreated control units. Traditional causal inference approaches like difference-in-differences require multiple treated units to estimate average treatment effects under parallel trends assumptions, rendering them inapplicable or unreliable for singleton cases where within-treated variation is absent. In contrast, SCM generates a counterfactual outcome trajectory for the treated unit by solving a constrained optimization problem: non-negative weights summing to one are assigned to control units to minimize the mean squared prediction error between the weighted control average and the treated unit's pre-intervention outcomes and predictors. This data-driven weighting ensures the synthetic control closely replicates the treated unit's pre-treatment path, allowing the post-intervention gap to estimate the causal effect without relying on untestable group-level assumptions. This capability addresses a core challenge in comparative case studies, where no individual control unit may sufficiently resemble the treated unit across multiple pre-treatment periods or covariates, but a convex combination can achieve a superior match. For instance, in evaluating the economic impact of terrorist activity in the Basque Country from 1975 onward, Abadie and Gardeazabal (2003) applied SCM to construct a synthetic Basque region from Spanish regions excluding the Basque Country and other European counterparts, yielding weights that matched per capita GDP closely pre-1975 while revealing a substantial divergence post-intervention. The method's transparency in weight selection—often interpretable as emphasizing donor units with similar economic structures—facilitates qualitative validation alongside quantitative estimation, mitigating concerns over arbitrary control selection. Inference in single treated unit applications relies on in-space and in-time placebo tests rather than large-sample asymptotics, enhancing robustness. Placebo studies involve reapplying SCM to untreated control units as pseudo-treated, generating a distribution of "placebo effects" under the null of no treatment; significant divergence of the actual post-treatment gap from this distribution indicates a genuine effect. This permutation-based approach accounts for finite-sample uncertainty inherent to singleton designs, as demonstrated in Abadie et al. (2010)'s analysis of California's tobacco control program, where the synthetic control (weighted from other U.S. states) showed no pre-trend discrepancy and placebo tests rejected the null for the true effect on cigarette sales. By formalizing counterfactual construction, SCM thus enables credible causal claims in data-scarce treated settings, prioritizing empirical fit over parametric modeling.

Limitations and Criticisms

Potential Biases and Inconsistency

The synthetic control method (SCM) relies on the assumption that the weighted combination of control units closely approximates the counterfactual outcome for the treated unit in the absence of treatment, but imperfect matching in pre-treatment periods can introduce bias by attributing residual discrepancies to the intervention effect rather than underlying differences. This interpolation bias arises when the control pool lacks units that fully span the treated unit's characteristics, leading to systematic errors that grow with the magnitude of unmodeled confounders. For instance, in applications with sparse predictors or short pre-intervention windows, the optimization may yield suboptimal weights, exacerbating bias toward over- or underestimation of treatment effects. Regression to the mean further compounds bias in SCM estimates, particularly for outcomes exhibiting mean reversion, where high or low pre-treatment values in the treated unit regress toward the population average post-treatment, mimicking or masking intervention impacts. Simulations demonstrate that this effect can inflate type I error rates and produce bias toward the null hypothesis in policy evaluations with volatile outcomes, such as health or economic indicators. Additionally, in non-linear data-generating processes, the linear weighting scheme of SCM fails to capture heterogeneous responses, resulting in biased counterfactuals that increase with unit-wise discrepancies between the treated and synthetic units. Regarding inconsistency, the SCM estimator lacks guaranteed consistency under standard asymptotics with fixed numbers of controls (J) and periods (T), as the finite-dimensional approximation may not converge to the true counterfactual even as T grows, unless the latent factor space is low-dimensional and well-spanned by controls. Theoretical analyses show that inconsistency emerges from model misspecification, such as unaccounted time-varying factors or spillover effects to controls, violating the no-interference assumption and leading to divergent estimates in large samples. Empirical studies highlight sensitivity to predictor selection, where omitting key covariates or including noisy ones amplifies inconsistent recovery of treatment effects across replications. While extensions like penalized or factor-augmented SCM aim to mitigate this by enforcing sparsity or imposing structure, base implementations remain prone to inconsistency in high-dimensional or weakly parallel settings.

Challenges in Inference and Sensitivity

Inference in the synthetic control method (SCM) is complicated by the absence of large-sample asymptotic theory, as the approach typically involves a fixed number of units, including only one treated unit, precluding standard error estimates derived from high-dimensional asymptotics. Instead, inference relies heavily on non-parametric procedures such as placebo tests, which simulate the treatment effect under the null hypothesis by applying the method to untreated units (in-space placebos) or pre-treatment periods (in-time placebos), but these can suffer from low statistical power and distorted test sizes in finite samples, particularly when control units exhibit heterogeneous pre-treatment fits. Refinements like leave-two-out placebo tests improve type-I error control and power by generating more reference distributions, yet they remain conditional on assumptions such as uniform treatment assignment probabilities and can fail uniform consistency under certain data-generating processes. Standard placebo tests implicitly assume equal probabilities of treatment across units, which, if violated in observational settings, introduces hidden bias and reduces the validity of p-values, as demonstrated in applications where sensitivity parameters (e.g., φ > 1) alter rejection thresholds significantly. Moreover, excluding control units with poor pre-intervention matches—such as those with root mean squared prediction error (RMSPE) exceeding five times the treated unit's—can mitigate these distortions but risks reducing the donor pool size, exacerbating fragility when few viable controls remain. These procedures are often restricted to testing the sharp null of zero treatment effects, with extensions to constant or linear-in-time effects requiring additional assumptions that limit generalizability and confidence set precision. Sensitivity analyses highlight vulnerabilities to violations of core assumptions, such as the invariance of latent causal structures across pre- and post-intervention periods, where distribution shifts in unobserved factors can invalidate counterfactual predictions, with bounds on the resulting bias proportional to the maximum deviation in expected covariates (e.g., ≤ N × max(|β_i|) × max(|E(x_pre_i) - E(x_post_i)|)). Estimates are also sensitive to donor composition and predictor selection; small perturbations, like altering the number of pre-treatment periods or excluding dissimilar units, can yield unstable weights and divergent effect magnitudes, as evidenced in re-analyses of cases like the Basque terrorism study, where results hinge on the quality of the pre-treatment match. While sensitivity parameters (e.g., varying assignment probabilities via φ) provide robustness checks, they demand careful justification and do not fully eliminate extrapolation risks post-intervention, underscoring the method's reliance on strong assumptions for credible causal claims.

Extensions and Recent Developments

Generalized Synthetic Controls

The generalized synthetic control method (GSCM), introduced by Yiqing Xu in 2017, extends the original synthetic control framework by incorporating linear interactive fixed effects models to estimate counterfactual outcomes in settings with binary treatments. Unlike the standard synthetic control method, which relies on weighted averages of control units to match pre-treatment outcomes under additive unit and time fixed effects, GSCM models unobserved heterogeneities as interactions between latent time-varying factors and unit-specific loadings, allowing for more flexible trends and multiple treated units. The underlying model for untreated outcomes is specified as Y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta} + \boldsymbol{\lambda}_i' \mathbf{f}_t + \epsilon_{it}, where \mathbf{x}_{it} are observed covariates, \boldsymbol{\beta} their coefficients, \boldsymbol{\lambda}_i are unobserved unit-specific factor loadings, \mathbf{f}_t are unobserved common factors varying over time, and \epsilon_{it} is an error term. For treated units, the model includes a treatment effect term \delta_{it} D_{it}, where D_{it} = 1 post-treatment. Estimation proceeds in two stages: first, the interactive fixed effects model is fitted to the control group data to recover estimates of \boldsymbol{\beta}, the factor matrix \mathbf{F}, and control loadings \boldsymbol{\Lambda}_{co}; second, pre-treatment data for treated units are used to estimate their loadings \boldsymbol{\lambda}_{tr}, enabling imputation of post-treatment counterfactuals \hat{Y}_{it}(0) = \mathbf{x}_{it}'\hat{\boldsymbol{\beta}} + \hat{\boldsymbol{\lambda}}_i' \hat{\mathbf{f}}_t. The average treatment effect on the treated (ATT) at time t is then \widehat{ATT}_t = \frac{1}{N_{tr}} \sum_{i \in \mathcal{T}} [Y_{it}(1) - \hat{Y}_{it}(0)], where N_{tr} is the number of treated units and \mathcal{T} the treated set. This approach accommodates multiple treated units and staggered treatment adoption by estimating a single set of common factors from the controls and imputing counterfactuals individually for each treated unit, avoiding the need for unit-by-unit matching. It relaxes the original synthetic control's restrictive assumptions, such as no anticipation and exact pre-treatment matching, by leveraging the full control-group panel for factor estimation and permitting violations of parallel trends through interactive effects. Inference is conducted via parametric bootstrap, resampling residuals to account for serial correlation and construct confidence intervals for ATT estimates, under assumptions including strict exogeneity of errors, a common factor structure across units, and weak serial dependence. Empirical applications demonstrate its utility; for instance, Xu (2017) applied GSCM to evaluate the impact of Election Day Registration laws on U.S. voter turnout from 1920 to 2012, estimating an increase of roughly five percentage points for early adopters using state-level panel data with multiple treated states entering at different times. The method is implemented in the R package gsynth, which supports unbalanced panels, alternative estimators such as matrix completion, and diagnostic tools for model validation, such as gap plots and pre-treatment fit checks. This extension enhances the synthetic control toolkit for policy evaluation in settings with complex heterogeneity, though it requires the interactive fixed effects structure to hold for consistency.
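
The two-stage logic can be illustrated in a deliberately simplified form that omits covariates and fixes the number of factors; the gsynth package's actual estimator uses an EM/interactive-fixed-effects routine with cross-validated factor selection, so the sketch below is only a conceptual approximation under those simplifying assumptions.

```python
# Simplified generalized-synthetic-control sketch (no covariates, r known):
# estimate common factors from the control group, estimate the treated unit's
# loadings from its pre-treatment periods, and impute the post-treatment counterfactual.
import numpy as np

def gsc_counterfactual(Y_co, y_tr, T0, r):
    """Y_co: (T, N_co) control outcomes; y_tr: (T,) treated outcomes; r: factor count."""
    U, s, _ = np.linalg.svd(Y_co, full_matrices=False)
    F = U[:, :r] * s[:r]                         # estimated time-varying factors, shape (T, r)
    lam_tr, *_ = np.linalg.lstsq(F[:T0], y_tr[:T0], rcond=None)   # treated unit's loadings
    y_hat = F @ lam_tr                           # imputed untreated outcome Y_it(0)
    att_t = y_tr[T0:] - y_hat[T0:]               # per-period effect estimates for this unit
    return y_hat, att_t
```

With several treated units, the same imputation is repeated unit by unit and the per-period effects are averaged to form the ATT path described above.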

Integration with Machine Learning and Design-Based Approaches

The synthetic control method (SCM) has been augmented with machine learning techniques to enhance donor selection, handle high-dimensional data, and mitigate overfitting in weight estimation. In traditional SCM, donors are manually selected based on substantive similarity, but machine learning approaches automate this via clustering algorithms, which group control units by features to form a more relevant pool, improving match quality and reducing extrapolation bias. Supervised learning models, such as random forests or neural networks, can further predict untreated outcomes to construct flexible synthetic controls, allowing for nonlinear interactions and better approximation in complex settings. These integrations address SCM's sensitivity to donor choice by leveraging data-driven feature importance, as demonstrated in applications estimating policy effects where ML-selected donors yield tighter confidence intervals compared to ad hoc selection. A prominent extension is the augmented synthetic control method (ASCM), which relaxes the strict pre-treatment matching requirement by incorporating a correction term estimated through ridge regression or other regularized learners, enabling valid inference even with imperfect synthetic matches. This approach debiases the estimator by projecting residuals onto the space orthogonal to covariates, drawing from double machine learning principles to nest SCM within flexible nonparametric frameworks for causal inference. Recent comparisons show ASCM outperforming standard SCM in finite samples with spillovers or staggered adoption, as it combines weighted averages with ML-estimated adjustments for unobserved confounders. Dynamic variants further integrate panel-aware double machine learning, treating SCM weights as a form of targeted regularization to estimate treatment effects under heterogeneous trends. From a design-based perspective, SCM inference has shifted toward randomization-based frameworks that condition on the observed selection of the treated unit, avoiding parametric assumptions about error distributions and instead deriving exact tests via permutations over potential treated units. This approach formalizes placebo inference—reassigning treatment to controls and recomputing synthetic gaps—as a valid randomization test under a finite-population model where unit selection is fixed or superpopulation-sampled, providing uniform validity across specifications. Unlike model-based conformal inference, design-based SCM emphasizes the superpopulation of possible donor pools, enabling sensitivity analysis to latent selection mechanisms without relying on asymptotic normality. Extensions apply this to experimental design, using synthetic controls to reweight samples for improved power in randomized trials with covariates, bridging observational and experimental causal paradigms. Hybrid integrations combine these paradigms, such as using clustering for initial donor pooling within a design-based inference framework, which enhances scalability for large datasets while preserving finite-sample guarantees through permutations. Empirical evaluations indicate these methods reduce Type I error rates in settings with few donors, outperforming classical SCM variance estimators that assume linear factor models. Challenges persist in computational tractability for high-dimensional ML components, but advances in optimization, such as convex relaxation solvers for non-negative weights, facilitate broader adoption.
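
A stripped-down version of the ridge augmentation can be written directly from the estimator's form, in which the plain SCM prediction is corrected by a ridge outcome model evaluated at the treated unit's pre-treatment imbalance; the sketch below uses pre-period outcomes as the features, relies on scikit-learn's Ridge, and its names are illustrative assumptions rather than the augsynth API.

```python
# Ridge-augmented SCM for a single post-treatment period: correct the plain SCM
# prediction by an outcome model fitted on the donors and evaluated at the
# difference between the treated unit's pre-path and the synthetic pre-path.
import numpy as np
from sklearn.linear_model import Ridge

def augmented_scm_effect(y1_pre, Y0_pre, y1_post, y0_post, w, alpha=1.0):
    """y1_pre: (T0,) treated pre-path; Y0_pre: (T0, J) donor pre-paths;
    y1_post: treated outcome in one post period; y0_post: (J,) donor outcomes then."""
    scm_pred = y0_post @ w                               # plain synthetic control prediction
    model = Ridge(alpha=alpha).fit(Y0_pre.T, y0_post)    # donors: features = pre-period paths
    correction = (model.predict(y1_pre.reshape(1, -1))
                  - model.predict((Y0_pre @ w).reshape(1, -1)))[0]
    return y1_post - (scm_pred + correction)             # augmented effect estimate
```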

Implementation

Software Packages and Tools

The synthetic control method is supported by dedicated software packages in R, Python, Stata, and MATLAB, enabling researchers to estimate counterfactuals through weighted combinations of control units. These tools typically involve data preparation functions to specify pre-intervention matching periods and covariates, followed by optimization routines (often constrained quadratic programming) to minimize discrepancies between treated and synthetic units. Implementations vary in handling extensions like multiple treated units, inference procedures, and integration with other econometric models, with core packages originating from the method's developers. In R, the Synth package implements the original synthetic control estimator, using functions like dataprep() for input preparation and synth() for weight optimization and effect estimation, as detailed in the method's foundational work. The gsynth package extends this to generalized synthetic controls via interactive fixed effects, suitable for panels with multiple treated units and providing built-in cross-validation for factor selection. Additional options include tidysynth, which adopts a tidyverse-style workflow for streamlined estimation, plotting, and placebo inference, and scul for lasso-penalized synthetic controls to enhance flexibility in high-dimensional settings. Python libraries include SyntheticControlMethods, which fits synthetic controls to panel or time-series data using optimization and supports extensions like ensemble methods for robustness. The pysyncon package offers implementations of vanilla SCM alongside robust variants, such as those with shrinkage, and includes tools for inference via placebo tests. Other options like scpi focus on uncertainty quantification, constructing prediction intervals for synthetic control estimators. In Stata, the synth command mirrors the R counterpart, requiring panel setup via tsset before specifying outcome variables, pre-treatment matching periods, and optional predictors for estimation. User-contributed extensions such as synth_runner automate workflows for multiple specifications, while allsynth addresses bias correction in stacked or multiple-treatment synthetic designs. MATLAB toolboxes, also from the original developers, provide analogous functionality through scripts for optimization and visualization, though less commonly updated than open-source alternatives. Researchers should verify package versions for compatibility with recent data structures and inference needs, as implementations may differ in default optimization solvers or handling of unit-specific trends.

Practical Considerations for Researchers

Researchers must ensure access to panel data with a sufficiently long pre-treatment period, ideally spanning multiple years or periods, to enable accurate matching on trends and reduce bias from unobserved confounders. For instance, Abadie recommends at least as many pre-treatment observations as the number of control units used in weighting to satisfy feasibility conditions for exact matching. The donor pool should comprise untreated units similar to the treated unit in relevant characteristics, excluding those exposed to spillovers, similar interventions, or major shocks that could distort the counterfactual. A large pool relative to the number of predictors (e.g., J > K+1, where J is the number of controls and K the number of predictors) enhances the likelihood of finding weights within the unit simplex, avoiding extrapolation beyond the convex hull of controls. Predictor selection involves including key covariates that influence the outcome, such as lagged outcomes and structural variables, while employing data-driven methods like cross-validation to assess fit and prevent overfitting, particularly with noisy or volatile series. Weights are estimated via constrained optimization minimizing pre-treatment discrepancies, with non-negativity constraints ensuring interpretability and no extrapolation; nested optimization for the predictor weighting matrix (V) is preferable over simpler fixed or regression-based approaches to improve fit. Researchers should standardize variables consistently across software (e.g., unit-variance rescaling) to mitigate discrepancies in estimates from packages like Synth or augsynth. Post-estimation diagnostics include evaluating pre-treatment root mean squared prediction error (RMSPE) for goodness-of-fit, conducting placebo tests by applying the method to untreated units, and leave-one-out robustness checks to assess sensitivity to individual donors (see the sketch after this paragraph). Inference relies on permutation-based p-values comparing the treated unit's post-to-pre-treatment RMSPE ratio to the placebo distribution, rather than asymptotic assumptions. Sensitivity analyses should test variations in donor pools, predictor sets, and weight constraints; pre-registering specifications guards against specification searching. Common pitfalls include reliance on pre-treatment fit alone to predict bias (often uncorrelated with true bias), omission of covariates when pre-periods are short, and uncorrected interpolation bias, which bias-corrected extensions like augmented synthetic controls can address.
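
The leave-one-out donor check mentioned above can be scripted compactly; the sketch below accepts any weight solver with the signature of the illustrative fit_scm_weights helper shown earlier and re-fits the synthetic control after dropping each donor that received a non-negligible weight.

```python
# Leave-one-out donor robustness: drop each donor with non-trivial weight,
# re-fit the synthetic control on pre-period outcomes, and return each new path.
import numpy as np

def leave_one_out_paths(Y, T0, fit_weights, weight_tol=1e-3):
    """Y: (T, J+1) outcomes with the treated unit in column 0; fit_weights(x1, X0, v_diag)."""
    y1, Y0 = Y[:, 0], Y[:, 1:]
    w_full = fit_weights(y1[:T0], Y0[:T0], np.ones(T0))       # baseline weights on pre-period outcomes
    paths = {}
    for j in np.flatnonzero(w_full > weight_tol):
        keep = np.delete(np.arange(Y0.shape[1]), j)
        w_j = fit_weights(y1[:T0], Y0[:T0][:, keep], np.ones(T0))
        paths[j] = Y0[:, keep] @ w_j                          # synthetic path without donor j
    return w_full, paths
```

Large movements of the synthetic path when a single donor is excluded flag the fragility to donor composition discussed in the limitations section.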

References

  1. [1]
    [PDF] Synthetic Control Methods For Comparative Case Studies
    Building on an idea in Abadie and Gardeazabal (2003), this article investigates the application of synthetic control methods to comparative case studies. We ...
  2. [2]
    Synthetic Control Methods for Comparative Case Studies
    Building on an idea in Abadie and Gardeazabal (2003), this article investigates the application of synthetic control methods to comparative case studies.
3. [3] Synthetic Control Methods for the Evaluation of Single-Unit ...
4. [4] Synthetic control methodology as a tool for evaluating population ...
5. [5] Generalized Synthetic Control Method: Causal Inference with ... (Feb 21, 2017)
6. [6] [PDF] The Augmented Synthetic Control Method. Eli Ben-Michael, Avi ...
7. [7] The Economic Costs of Conflict: A Case Study of the Basque Country. Abadie and Javier Gardeazabal, American Economic Review, 93(1), 113-132, March 2003.
8. [8] Using Synthetic Controls: Feasibility, Data Requirements, and ...
9. [9] [PDF] Synthetic Controls in Action - MIT Economics (Sep 17, 2021)
10. [10] [PDF] Using Synthetic Controls: Feasibility, Data Requirements, and ...
11. [11] 25 - Synthetic Difference-in-Differences — Causal Inference for the ...
12. [12] When do we prefer synthetic control over a difference in ... - Reddit (Aug 21, 2024)
13. [13] Chapter 29 Synthetic Difference-in-Differences | A Guide on Data ...
14. [14] Understanding Synthetic Control Methods | Towards Data Science (Jul 30, 2022)
15. [15] Synthetic Control Methods for the Evaluation of Single-Unit ... - NIH
16. [16] 15 - Synthetic Control — Causal Inference for the Brave and True
17. [17] Synthetic Difference-in-Differences - American Economic Association
18. [18] Summary of "The Economic Costs of Conflict: A Case Study of the Basque Country"
19. [19] Introduction to the Special Section on Synthetic Control Methods (Dec 16, 2021)
20. [20] [PDF] What Is the Synthetic Control Method? - Urban Institute
21. [21] Generalized Synthetic Control Method — gsynth - Yiqing Xu
22. [22] The Augmented Synthetic Control Method - Taylor & Francis Online
23. [23]
24. [24] Summary of "Constructing Synthetic Controls"
25. [25] [PDF] Using Synthetic Controls: Feasibility, Data Requirements, and ...
26. [26]
27. [27] [PDF] On the Assumptions of Synthetic Control Methods - arXiv (Dec 14, 2021)
28. [28] [PDF] Synthetic Control and Inference - UCR Department of Economics (Nov 28, 2017)
29. [29] [PDF] Synthetic Control Methods for Comparative Case Studies - MIT
30. [30] [PDF] Comparative Politics and the Synthetic Control Method. Alberto Abadie, Alexis Diamond, and Jens Hainmueller.
31. [31] [PDF] The Synthetic Control Method as a Tool to Understand State Policy
32. [32] Examination of the Synthetic Control Method for Evaluating Health ... (Oct 7, 2015)
33. [33] The use of generalized synthetic control method to evaluate air ...
34. [34] Applying a two-stage generalized synthetic control approach ... - LWW
35. [35] Evaluating natural experiments in ecology: using synthetic controls ... (Nov 21, 2020)
36. [36] Use of synthetic control methodology for evaluating public health ...
37. [37] [PDF] Why Synthetic Control estimators are biased and what to do about it
38. [38] Synthetic Control Methods - Introduction by Xingna Zhang (Apr 3, 2025)
39. [39] Impact of Regression to the Mean on the Synthetic Control Method
40. [40] Why Synthetic Control estimators are biased and what to do about it (Nov 21, 2021)
41. [41] [2203.11576] Predictor Selection for Synthetic Controls - arXiv (Mar 22, 2022)
42. [42] [PDF] Consistent Estimation of Optimal Synthetic Control Weights
43. [43] Using synthetic control method to evaluate the effect of a ...
44. [44] [PDF] Synthetic Control Method: Inference, Sensitivity Analysis and ... (Sep 1, 2017)
45. [45] [PDF] Inference for Synthetic Controls via Refined Placebo Tests - arXiv (Apr 19, 2025)
46. [46] Non-parametric identifiability and sensitivity analysis of synthetic ... (Jan 18, 2023)
47. [47] gsynth--The Generalized Synthetic Control Method - Yiqing Xu
48. [48] Synthetic controls with machine learning: application on the effect of ... (Apr 26, 2024)
49. [49] [PDF] A Design-Based Perspective on Synthetic Control Methods - arXiv (Jul 19, 2023)
50. [50] A Design-Based Perspective on Synthetic Control Methods (Sep 21, 2023)
51. [51] [2508.01793] A Relaxation Approach to Synthetic Control - arXiv (Aug 3, 2025)
52. [52] Difference-in-Differences Meets Synthetic Control: Doubly Robust ... (Mar 14, 2025)
53. [53] Synth: An R Package for Synthetic Control Methods in Comparative ... (Jun 14, 2011)
54. [54] tidysynth package - RDocumentation (Mar 23, 2025)
55. [55] Synthetic Control Using Lasso (SCUL) - Alex Hollingsworth
56. [56] SyntheticControlMethods · PyPI
57. [57] pysyncon 1.5.2 documentation - GitHub Pages
58. [58] SCPI - NP Packages
59. [59] Welcome to the Webpage of Jens Hainmueller - MIT
60. [60] allsynth: (Stacked) Synthetic Control Bias-Correction Utilities for Stata
61. [61] [PDF] The Myths of Synthetic Control: Recommendations for Practice
62. [62] [PDF] Perils and Pitfalls in the Use of Synthetic Control Methods to Study ... (May 30, 2024)