
Impact evaluation

Impact evaluation is a rigorous analytical approach in social science and policy research that seeks to identify the causal effects of interventions—such as programs, policies, or treatments—on specific outcomes by establishing counterfactual scenarios and attributing observed changes to the intervention itself, rather than to confounding factors. This distinguishes it from descriptive monitoring or correlational studies, as it prioritizes internal validity through techniques that isolate treatment effects from selection bias, confounding, and external influences. Central methods include randomized controlled trials (RCTs), which randomly assign participants to treatment and control groups to ensure comparability; quasi-experimental designs like difference-in-differences or regression discontinuity, which leverage natural variation or thresholds for identification; and instrumental variable approaches that exploit exogenous sources of variation to address non-compliance or hidden bias. These tools have enabled evidence-based decisions in fields like international development, education, and health, where evaluations have demonstrated, for instance, the ineffectiveness of certain cash transfer programs in altering long-term behaviors or the modest gains from deworming initiatives in improving school attendance. However, impact evaluation's defining achievements—such as informing the scaling of microfinance or conditional cash transfers—coexist with persistent challenges, including heterogeneous treatment effects across contexts that undermine generalizability and the difficulty of capturing mechanisms beyond average effects. Controversies arise from methodological limitations and systemic biases: RCTs, often hailed as the gold standard, can suffer from attrition, spillover effects, or ethical constraints in field settings, while non-experimental methods risk bias from unobserved confounders; moreover, publication and selection biases in academic and donor-funded studies favor reporting positive or significant results, inflating perceived intervention effectiveness and skewing policy toward "what works" narratives that overlook failures or null findings. Career incentives, including tenure pressures and funding from ideologically aligned institutions, exacerbate this optimism, leading to underreporting of negative impacts and overemphasis on short-term metrics over long-run causal chains. Despite these issues, rigorous impact evaluation remains essential for causal realism in resource-scarce environments, provided evaluations incorporate sensitivity analyses, pre-registration to curb p-hacking, and mixed-methods approaches to probe underlying processes.

Definition and Fundamentals

Core Concepts and Purpose

Impact evaluation entails the rigorous estimation of causal effects attributable to an intervention, program, or policy on targeted outcomes, achieved by comparing observed results against the counterfactual—what outcomes would have prevailed absent the intervention. This approach distinguishes impact evaluation from mere outcome monitoring by addressing the fundamental problem of causal inference: the counterfactual remains inherently unobservable, necessitating empirical strategies to approximate it, such as randomization or statistical matching to construct comparable control groups. Central concepts include the average treatment effect (ATE), which quantifies the mean difference in outcomes between treated and untreated units, and considerations of heterogeneity, where effects may vary across subgroups, contexts, or over time. The purpose of impact evaluation lies in generating credible evidence to ascertain whether interventions produce net benefits, the scale of those benefits, and the conditions under which they occur, thereby enabling data-driven decisions in resource-constrained environments. In development contexts, it supports the prioritization of effective programs to alleviate poverty and enhance welfare, as scarce public funds demand verification that expenditures yield measurable improvements rather than illusory gains from confounding factors. Beyond accountability, it informs program refinement, cost-effectiveness assessments, and replication, countering reliance on anecdotal or associational evidence that often overstates effectiveness due to omitted variables or selection effects. Evaluations thus promote causal realism, emphasizing mechanisms linking inputs to outputs while highlighting failures, such as null or adverse effects, to avoid perpetuating ineffective practices.
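The counterfactual logic can be made concrete with a small simulation. The Python sketch below uses entirely hypothetical values: it generates both potential outcomes for every unit (possible only in simulation), then shows how a naive comparison of self-selected participants to non-participants diverges from the true ATE, while a randomized comparison recovers it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Simulated potential outcomes: Y0 without the program, Y1 with it.
ability = rng.normal(0, 1, n)                # unobserved confounder
y0 = 10 + 2 * ability + rng.normal(0, 1, n)  # outcome absent the program
y1 = y0 + 1.5                                # true individual effect = 1.5

true_ate = np.mean(y1 - y0)                  # observable only in simulation

# Self-selection: higher-ability units are more likely to enroll.
p_enroll = 1 / (1 + np.exp(-ability))
d_selected = rng.random(n) < p_enroll
naive_diff = y1[d_selected].mean() - y0[~d_selected].mean()

# Randomization: enrollment unrelated to ability, so groups are comparable.
d_random = rng.random(n) < 0.5
rct_diff = y1[d_random].mean() - y0[~d_random].mean()

print(f"True ATE:                  {true_ate:.2f}")
print(f"Naive comparison (biased): {naive_diff:.2f}")
print(f"Randomized comparison:     {rct_diff:.2f}")
```

The naive contrast attributes part of the pre-existing ability gap to the program, which is exactly the selection problem that counterfactual designs are meant to solve.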

Historical Origins and Evolution

The systematic assessment of program impacts, particularly through experimental and quasi-experimental methods, originated in early quantitative evaluation practices but gained methodological rigor in the mid-20th century. Initial roots lie in 18th- and 19th-century education reforms, including William Farish's 1792 introduction of numerical marks for academic performance at Cambridge University and Horace Mann's 1845 standardized tests in Boston schools to gauge educational effectiveness. These efforts focused on measurement for accountability rather than causal attribution. By the early 20th century, Frederick W. Taylor's scientific management principles (circa 1911) emphasized efficiency metrics, evolving into objective testing movements that laid groundwork for outcome-oriented scrutiny, though without robust controls for confounding factors. The modern era of impact evaluation emerged in the 1950s-1960s, driven by post-World War II expansions in education and social welfare programs, including the U.S. National Defense Education Act (1958) and Elementary and Secondary Education Act (1965), which mandated evaluations amid concerns over program efficacy. The Sputnik launch in 1957 heightened demands for educational accountability, while War on Poverty initiatives spurred social experiments to test interventions like income support. Donald T. Campbell and Julian C. Stanley's 1963 monograph Experimental and Quasi-Experimental Designs for Research formalized designs to mitigate validity threats—such as history and maturation—in non-laboratory settings, enabling causal claims from observational data approximations like pre-post comparisons and nonequivalent control groups. This framework professionalized evaluation, distinguishing true experiments from quasi-experiments and influencing fields beyond education. Pioneering randomized controlled trials (RCTs) in social policy followed, with the U.S. negative income tax experiments (1968-1982) randomizing households to assess guaranteed income effects on labor supply, and the RAND Health Insurance Experiment (1971-1982) evaluating cost-sharing's impact on healthcare utilization, informing 1980s policy shifts toward deductibles. In development economics, Mexico's PROGRESA program (1997) employed RCTs to measure effects on school enrollment and health, catalyzing scalable evaluations across Latin America and beyond. The 2000s marked explosive evolution, termed the "evidence revolution," with institutions like the Poverty Action Lab (J-PAL, founded 2003) and the International Initiative for Impact Evaluation (3ie, 2008) institutionalizing RCTs and quasi-experimental methods for poverty alleviation. The U.S. Government Performance and Results Act (1993) and UK Modernizing Government initiative (1999) embedded outcome-focused evaluation in public administration. Advances integrated econometric tools, such as instrumental variables and regression discontinuity designs, to handle selection and endogeneity in large-scale data. This period's emphasis on rigorous experimentation peaked with the 2019 Nobel Memorial Prize in Economics awarded to Abhijit Banerjee, Esther Duflo, and Michael Kremer for RCTs demonstrating interventions' micro-level effects on development outcomes. Subsequent growth includes evidence synthesis via systematic reviews and government-embedded labs, though debates persist over generalizability from small-scale trials to policy scale.

Methodological Designs

Experimental Designs

Experimental designs in impact evaluation primarily utilize randomized controlled trials (RCTs), in which eligible units such as individuals, households, or communities are randomly assigned to treatment (receiving the intervention) or control (not receiving it) groups to isolate causal effects from confounding factors. This randomization, typically executed through computer algorithms or lotteries, ensures that groups are statistically equivalent on average, both in observed covariates and unobserved characteristics, allowing outcome differences to be credibly attributed to the intervention. RCTs thus provide unbiased estimates of the average treatment effect (ATE), addressing the fundamental challenge of counterfactual reasoning—what would have happened without the intervention—by using the control group as a proxy for the counterfactual. Key steps in RCT design include defining the eligible population, conducting power calculations to determine required sample size based on expected effect sizes and variability (often aiming for 80% power to detect minimum detectable effects), and verifying post-randomization balance through statistical tests on baseline covariates; a minimal power calculation is sketched below. Outcomes are measured via surveys, administrative records, or other instruments at baseline and endline, with analysis focusing on intent-to-treat (ITT) effects—comparing groups as randomized—to maintain the integrity of randomization, or treatment-on-the-treated (TOT) effects using random assignment as an instrument to handle compliance issues. Regression models may adjust for covariates to increase precision, though unadjusted differences suffice for primary inference under randomization. Variations adapt RCTs to contextual constraints. Individual-level randomization assigns treatment independently to each unit, maximizing statistical power but risking spillovers in interconnected settings. Cluster-randomized trials, conversely, assign intact groups (e.g., villages or schools) to treatment or control, mitigating spillovers while requiring larger samples and intra-cluster correlation adjustments; for example, Mexico's PROGRESA randomized 506 communities to evaluate conditional cash transfers, demonstrating sustained impacts on enrollment. Factorial designs test multiple interventions simultaneously by crossing treatment arms (e.g., combining cash transfers with training), enabling assessment of interactions and main effects within one trial, as in variations of Indonesia's Raskin subsidized rice program tested across 17.5 million beneficiaries in 2012. Stratified or blocked randomization ensures balance across subgroups such as gender or region, enhancing precision without altering causal identification. Staggered or phase-in designs roll out interventions sequentially, using groups scheduled for later phases as controls for earlier ones in scalable programs. These designs prioritize internal validity but demand safeguards against threats like spillovers (intervention diffusion to controls) or crossovers (controls accessing the intervention), which can be addressed through geographic separation between groups. Ethical randomization requires genuine uncertainty about intervention efficacy and minimal harm from withholding the intervention from controls, often justified by phasing the program in for all groups after the evaluation. Evidence from RCTs, such as a 43% reduction in violent-crime arrests from Chicago's One Summer Plus jobs program, underscores their capacity for policy-relevant causal insights when properly executed.
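The following Python sketch illustrates the standard two-arm power calculation referenced above; the effect size, standard deviation, cluster size, and intra-cluster correlation are hypothetical placeholders rather than values from any cited study.

```python
from scipy.stats import norm

def sample_size_per_arm(effect_size, sd, alpha=0.05, power=0.80):
    """Approximate n per arm for a two-sided test of a difference in means.

    effect_size : minimum detectable difference between treatment and control
    sd          : outcome standard deviation (assumed equal across arms)
    """
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return 2 * ((z_alpha + z_beta) * sd / effect_size) ** 2

# e.g. detect a 0.2 SD effect with 80% power at the 5% significance level
n = sample_size_per_arm(effect_size=0.2, sd=1.0)
print(f"Required sample size per arm: {n:.0f}")   # roughly 393

# Cluster randomization inflates this by the design effect 1 + (m - 1) * icc,
# where m is the average cluster size and icc the intra-cluster correlation.
design_effect = 1 + (25 - 1) * 0.05
print(f"Per arm with clustering: {n * design_effect:.0f}")
```

The design-effect adjustment in the last lines is why cluster-randomized trials require substantially larger samples than individual-level randomization for the same minimum detectable effect.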

Quasi-Experimental and Observational Designs

Quasi-experimental designs estimate causal impacts of interventions without randomization, relying instead on structured comparisons or natural variations to approximate experimental conditions. These approaches, first systematically outlined by Donald T. Campbell and Julian C. Stanley in their 1963 chapter, address threats to internal validity through designs like time-series analyses or nonequivalent groups, enabling inference in real-world settings where randomization is infeasible, such as policy implementations or large-scale programs. Unlike true experiments, they demand explicit assumptions—such as the absence of contemporaneous events affecting groups differentially—to isolate treatment effects, with validity often assessed via placebo tests or falsification strategies. A core quasi-experimental method is difference-in-differences (DiD), which identifies impacts by subtracting pre-treatment outcome differences from post-treatment differences between treated and control groups, under the parallel trends assumption that untreated trends would mirror counterfactuals. Applied in evaluations like the 1996 U.S. welfare reform, DiD has shown, for instance, that job training programs increased earnings by 10-20% in some cohorts when controlling for economic cycles. Extensions, such as triple differences, incorporate additional dimensions like geography to mitigate violations from heterogeneous trends, though recent critiques highlight sensitivity to staggered adoption in multi-period settings. Regression discontinuity designs (RDD) exploit deterministic assignment rules, estimating local average treatment effects from outcome discontinuities at a cutoff, where units near the threshold are quasi-randomized by the forcing variable. In a 2013 evaluation of Colombia's Ser Pilo Paga scholarship program, RDD revealed a 0.17 standard deviation increase in enrollment for scorers just above the eligibility line, with bandwidth selection via optimal methods ensuring precise local inference. Sharp RDD assumes perfect compliance at the cutoff, while fuzzy variants handle partial take-up by using the discontinuity as an instrument within the same framework; both require checks for manipulation of the running variable, such as density tests showing no bunching. Instrumental variables (IV) address endogeneity by using an exogenous instrument correlated with treatment uptake but unrelated to outcomes except through the treatment, yielding estimates for compliers under monotonicity. In Angrist and Krueger's 1991 analysis of U.S. compulsory schooling, quarter-of-birth instruments—leveraging school entry age laws—estimated a 7-10% return to an additional year of schooling, isolating causal effects amid self-selection. Instrument validity hinges on relevance (strong first-stage correlation) and exclusion (no direct outcome path), tested via overidentification restrictions in multiple-IV setups; weak instruments bias estimates toward OLS, as quantified in the Stock-Yogo critical values from 2005. Observational designs draw causal inferences from non-manipulated data, emphasizing conditioning on observables or structural assumptions to mitigate confounding, often via balancing methods like propensity score matching (PSM), which estimates treatment probabilities from covariates to pair similar units. A 2023 review found PSM effective in observational program evaluations, reducing bias by up to 80% when overlap is sufficient, though it fails with unobservables, as evidenced by simulation studies showing 20-50% attenuation under hidden confounders. Advanced observational techniques include panel fixed effects, which difference out time-invariant confounders in longitudinal data, and synthetic controls, constructing counterfactuals as weighted combinations of untreated units that match pre-treatment trajectories.
In Abadie et al.'s 2010 California tobacco control evaluation, synthetic controls attributed a 20-30% drop in per-capita cigarette sales to the policy, outperforming simple DiD under heterogeneous trends. These methods demand large samples and covariate balance diagnostics, with triangulation—combining, say, PSM and difference-in-differences—enhancing robustness, as recommended in 2021 guidelines for non-randomized studies. Despite strengths in scalability, observational designs remain vulnerable to model misspecification, necessitating pre-registration and falsification tests to approximate causal credibility.
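As a concrete illustration of the DiD logic described above, the following Python sketch simulates a two-period panel with a known treatment effect and recovers it from the treated-by-post interaction, clustering standard errors by unit as Bertrand, Duflo, and Mullainathan recommend. All parameter values are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_units, true_effect = 200, 2.0

# Two-period panel: half the units are treated, and treatment turns on post.
df = pd.DataFrame(
    [(u, t) for u in range(n_units) for t in (0, 1)],
    columns=["unit", "post"],
)
df["treated"] = (df["unit"] < n_units // 2).astype(int)
unit_fe = rng.normal(0, 3, n_units)              # time-invariant differences
df["y"] = (
    5 + unit_fe[df["unit"]]                      # unit heterogeneity
    + 1.0 * df["post"]                           # common time trend
    + true_effect * df["treated"] * df["post"]   # treatment effect
    + rng.normal(0, 1, len(df))
)

# DiD via the treated x post interaction, with unit-clustered standard errors.
model = smf.ols("y ~ treated * post", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["unit"]}
)
print(model.params["treated:post"])  # should be close to 2.0
```

Because the interaction coefficient differences out both the unit fixed effects and the common time trend, it isolates the treatment effect under the parallel trends assumption.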

Sources of Bias and Validity Threats

Selection and Attrition Biases

Selection bias occurs when systematic differences between treatment and comparison groups arise due to non-random assignment or participation, leading to distorted estimates of causal effects in impact evaluations. In observational or quasi-experimental designs, individuals self-selecting into programs often possess unobserved characteristics—such as motivation or ability—that correlate with outcomes, inflating or deflating apparent program impacts; for instance, the bias remaining after matching techniques can exceed 100% of the experimentally estimated effect in social program evaluations. This threat undermines internal validity by violating the assumption of exchangeability between groups, making it challenging to attribute outcome differences solely to the intervention rather than pre-existing disparities. Even in randomized controlled trials (RCTs), selection bias can emerge if eligibility criteria or recruitment processes favor certain subgroups, though proper randomization typically mitigates it at baseline. Attrition bias, a post-randomization form of selection bias, arises when participants exit studies at differential rates between arms, particularly if dropouts are correlated with outcomes or treatment status, thereby altering group compositions and biasing effect estimates. In RCTs for social programs, attrition rates exceeding 20% often introduce systematic imbalances, with leavers in treatment groups potentially having worse outcomes than stayers, leading to overestimation of positive effects if not addressed. This bias threatens the completeness of intention-to-treat analyses and can amplify in longitudinal evaluations where follow-up surveys fail to retain high-risk participants, as seen in teen pregnancy prevention trials where cluster-level attrition exacerbates imbalances. Unlike baseline selection, attrition introduces time-varying selection, as dropout reasons—like program dissatisfaction or external shocks—may interact with treatment exposure. Both biases compromise internal validity by eroding the comparability of groups essential for counterfactual estimation; selection operates pre-treatment, while attrition does so post-treatment, but they converge in non-random loss of observations that correlates with potential outcomes. In development impact evaluations, empirical assessments show that unadjusted attrition can shift effect sizes by 10-30% in magnitude, with bounding approaches or sensitivity analyses revealing the direction of potential distortion. Mitigation strategies include using baseline covariates for reweighting, worst-case scenario bounds, or pattern-mixture models, though these require assumptions about missingness mechanisms that may not hold without auxiliary data. High-quality evaluations report attrition rates and test for baseline differences among dropouts to quantify threats, emphasizing that low attrition alone does not guarantee unbiasedness if dropout patterns are non-ignorable.
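The two diagnostics mentioned above—comparing attrition rates across arms and checking baseline balance among those who remain—can be run in a few lines of code. The Python sketch below uses simulated data with hypothetical dropout probabilities purely for illustration.

```python
import numpy as np
from scipy.stats import chi2_contingency, ttest_ind

rng = np.random.default_rng(2)
n = 2000
treat = rng.integers(0, 2, n)
baseline_risk = rng.normal(0, 1, n)

# Hypothetical follow-up: dropout is more likely for high-risk controls,
# so the groups that remain are no longer comparable.
p_drop = 0.10 + 0.10 * (treat == 0) * (baseline_risk > 0.5)
dropped = rng.random(n) < p_drop

# 1) Differential attrition rates across arms
table = np.array([
    [np.sum((treat == 1) & ~dropped), np.sum((treat == 1) & dropped)],
    [np.sum((treat == 0) & ~dropped), np.sum((treat == 0) & dropped)],
])
chi2, p_rate, *_ = chi2_contingency(table)
print(f"Attrition: treat {dropped[treat == 1].mean():.1%}, "
      f"control {dropped[treat == 0].mean():.1%} (p = {p_rate:.3f})")

# 2) Baseline balance among stayers (should hold under ignorable attrition)
stay = ~dropped
t_stat, p_bal = ttest_ind(baseline_risk[stay & (treat == 1)],
                          baseline_risk[stay & (treat == 0)])
print(f"Baseline difference among stayers: p = {p_bal:.3f}")
```

Reporting both tests alongside effect estimates lets readers judge whether attrition is plausibly ignorable or whether bounding approaches are needed.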

Temporal and Contextual Biases

Temporal biases in impact evaluation refer to systematic errors introduced by time-related factors that confound causal attribution, often threatening internal validity by providing alternative explanations for observed changes in outcomes. History effects occur when external events, unrelated to the intervention, coincide with its implementation and influence results; for instance, a concurrent labor-market upturn might inflate estimates of a job training program's employment effects. Maturation effects arise from natural developmental or aging processes in participants, such as children's cognitive growth over the study period, which could be mistakenly attributed to an educational intervention. These biases are particularly pronounced in longitudinal or quasi-experimental designs lacking randomization, where pre-intervention trends or secular drifts—broader societal shifts like technological adoption—may parallel the intervention timeline and bias estimates upward or downward. Regression to the mean exacerbates temporal issues when extreme baseline values naturally moderate over time, as seen in evaluations of interventions targeting high-risk groups, such as remedial programs where initial severity scores revert without any intervention influence. To mitigate these threats, evaluators often employ difference-in-differences methods to test parallel trends or include time-fixed effects in models. Contextual biases stem from the specific setting or population of the intervention, which can modify effects or introduce local confounders, thereby limiting generalizability and introducing effect heterogeneity. Interaction effects with settings manifest when outcomes vary due to unmeasured site-specific factors, such as cultural norms or institutional support; for example, a program's success in rural areas may not replicate in urban contexts due to differing market dynamics. Spillover effects, where benefits leak to controls within the same community, contaminate comparisons, as documented in cluster-randomized trials where community-level diffusion biases estimates toward the null, producing underestimation. Hawthorne effects represent a reactive contextual bias, wherein participants alter behavior due to awareness of evaluation, inflating impacts in monitored settings like workplace productivity studies. Site selection bias further compounds these issues when programs are evaluated in non-representative locations correlated with higher efficacy, such as highly motivated communities, leading to overoptimistic extrapolations. Addressing these requires explicit testing for moderators via subgroup analyses or heterogeneous treatment effect estimators, alongside transparent reporting of contextual descriptors to aid generalizability assessments.
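Regression to the mean is easy to demonstrate in simulation. The Python sketch below, using hypothetical severity scores, enrolls the most extreme baseline cases and shows their endline mean falling back toward the population average with no intervention at all, which is exactly the pattern a naive pre-post comparison would misread as a program effect.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000

# Stable underlying severity plus independent measurement noise at each wave.
true_severity = rng.normal(50, 10, n)
baseline = true_severity + rng.normal(0, 10, n)
endline = true_severity + rng.normal(0, 10, n)   # no intervention at all

# "Enroll" the most extreme cases at baseline, as targeted programs often do.
high_risk = baseline > np.percentile(baseline, 90)

print(f"High-risk baseline mean: {baseline[high_risk].mean():.1f}")
print(f"High-risk endline mean:  {endline[high_risk].mean():.1f}")
# The endline mean falls toward the population mean (~50) without treatment,
# so a pre-post comparison would wrongly credit the program with the decline.
```

A control group selected under the same extreme-score rule is the standard remedy, since it reverts by the same amount.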

Estimation and Analytical Techniques

Causal Inference Methods

Causal inference methods in impact evaluation seek to identify and quantify the effects of interventions by estimating counterfactual outcomes, typically under the potential outcomes framework. This framework posits that for each unit i there exist two potential outcomes, Y_i(1) under treatment and Y_i(0) under control, with the individual treatment effect defined as Y_i(1) - Y_i(0). The average treatment effect (ATE) averages this difference across units, but the fundamental challenge arises because only one outcome is observed per unit, necessitating assumptions to link observables to the unobserved counterfactual. Originating from Neyman's work on randomized experiments (1923) and extended by Donald Rubin (1974) to broader settings, the framework underpins modern quasi-experimental estimation by emphasizing identification via ignorability or exclusion restrictions. These methods are particularly vital in observational data from impact evaluations, where randomization is absent, requiring strategies to mimic experimental conditions through covariates, instruments, or discontinuities. Common approaches include propensity score matching, instrumental variables, regression discontinuity, and difference-in-differences, each relying on distinct identifying assumptions to bound or point-identify causal effects. While powerful, their validity hinges on untestable assumptions, such as no unmeasured confounders or parallel trends, which empirical checks like placebo tests or sensitivity analyses can probe but not fully verify. Propensity score matching (PSM) balances treated and control groups by matching on the propensity score, defined as the probability of treatment given observed covariates X, e(X) = P(D=1|X). Under selection on observables (unconfoundedness: Y(1), Y(0) \perp D | X), matching yields unbiased estimates of the ATE for the treated or overall. Introduced by Rosenbaum and Rubin (1983), PSM reduces dimensionality from multiple covariates to one score, often implemented via nearest-neighbor or kernel matching, with caliper restrictions to ensure close matches. In impact evaluations of social programs, such as job training initiatives, PSM has estimated effects like a 10-20% earnings increase from participation, though it fails if unobservables like motivation confound assignment. Sensitivity to model misspecification and common support violations necessitates balance diagnostics, where covariate means post-matching should align across groups. Instrumental variables (IV) address endogeneity from unobservables by leveraging an instrument Z correlated with D (relevance: \text{Cov}(Z,D) \neq 0) but affecting outcomes Y only through D (exclusion: no direct path from Z to Y). The two-stage least squares (2SLS) estimator recovers the local average treatment effect (LATE) for compliers—those whose treatment status changes with Z—under monotonicity (no defiers). Angrist, Imbens, and Rubin (1996) formalized LATE as the relevant parameter when heterogeneity exists, applied in evaluations like quarter-of-birth instruments for schooling, yielding estimated returns of 7-10% per year of education versus 5-8% from OLS. Weak instruments bias estimates toward OLS (a first-stage F-statistic above 10 is recommended), and exclusion violations, such as spillover effects, undermine credibility; overidentification tests (Sargan-Hansen) assess multiple instruments. Regression discontinuity design (RDD) exploits sharp or fuzzy discontinuities at a known cutoff in the assignment rule, treating units just above and below as locally randomized. In sharp RDD, the treatment effect is the jump in the conditional expectation of Y at the cutoff, estimated via local polynomials or parametric regressions with bandwidth selection (e.g., Imbens-Kalyanaraman optimal).
Imbens and Lemieux (2008) outline implementation, including density tests for manipulation and placebo outcomes for bandwidth sensitivity. For policy cutoffs like scholarships at exam score thresholds, RDD has quantified effects such as a 0.2-0.5 standard deviation improvement in future earnings, with identification strongest near the cutoff but limited to that margin. Fuzzy RDD extends to imperfect compliance using IV logic, where the first-stage discontinuity instruments the treatment probability. Difference-in-differences (DiD) estimates effects by differencing changes in outcomes over time between treated and control groups, identifying the ATE under parallel trends: absent treatment, gaps would evolve similarly. The estimator is (E[Y_{\text{treated,post}}] - E[Y_{\text{treated,pre}}]) - (E[Y_{\text{control,post}}] - E[Y_{\text{control,pre}}]), the post-pre change for the treated group minus the post-pre change for the control group. Bertrand, Duflo, and Mullainathan (2004) highlight serial correlation inflating standard errors in multi-period panels, recommending clustered errors or collapsing the data to two periods for robustness. In evaluations of minimum wage hikes, DiD has shown null or small employment effects (e.g., -0.1% per 10% wage increase), with event-study plots of pre-trends used to validate the assumption. Extensions like triple differences add a third comparison dimension to control for group-specific fixed differences, but violations from differential shocks (e.g., Ashenfelter dips) require synthetic controls or staggered adoption adjustments. Other techniques, such as synthetic control for aggregate interventions, construct counterfactuals as weighted combinations of untreated units matching pre-treatment trends, effective for policy reforms affecting single units. Across methods, robustness checks, including placebo applications and falsification tests on pre-treatment outcomes, are essential, as are meta-analyses revealing that quasi-experimental estimates often align with RCTs when assumptions hold, though divergence signals assumption violations. Integration with machine learning for covariate adjustment or double robustness (combining outcome and propensity models) enhances precision but demands large samples to avoid overfitting.
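To make the IV logic tangible, the following Python sketch simulates a confounded treatment and computes two-stage least squares by hand: the first stage predicts treatment from the instrument, and the second stage regresses the outcome on those fitted values. The data-generating values are hypothetical, and the sketch reports point estimates only (proper 2SLS standard errors require the usual correction rather than plain second-stage OLS errors).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000

# Unobserved confounder u drives both take-up and the outcome.
u = rng.normal(0, 1, n)
z = rng.integers(0, 2, n)                    # exogenous instrument, e.g. a random offer
d = (0.5 * z + 0.8 * u + rng.normal(0, 1, n) > 0.5).astype(float)
y = 1.0 * d + 2.0 * u + rng.normal(0, 1, n)  # true effect of D on Y is 1.0

# Naive OLS of Y on D is biased by the confounder.
X_ols = np.column_stack([np.ones(n), d])
beta_ols = np.linalg.lstsq(X_ols, y, rcond=None)[0]

# Stage 1: predict D from Z; Stage 2: regress Y on the fitted values.
Z_mat = np.column_stack([np.ones(n), z])
d_hat = Z_mat @ np.linalg.lstsq(Z_mat, d, rcond=None)[0]
X_iv = np.column_stack([np.ones(n), d_hat])
beta_2sls = np.linalg.lstsq(X_iv, y, rcond=None)[0]

print(f"OLS estimate (confounded): {beta_ols[1]:.2f}")
print(f"2SLS estimate:             {beta_2sls[1]:.2f}")   # near 1.0
```

The contrast between the two printed estimates shows how a valid instrument strips out the variation in take-up that is driven by the confounder.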

Economic Evaluation Integration

Economic evaluation integration in impact evaluation extends causal effect estimation by incorporating cost data to assess value for money, enabling comparisons of interventions' value relative to alternatives. This approach quantifies whether observed impacts justify expended resources, often through metrics like incremental cost-effectiveness ratios (ICERs) or benefit-cost ratios (BCRs). For instance, in development programs, impact evaluations using randomized controlled trials (RCTs) may pair treatment effect estimates on outcomes such as school enrollment with program delivery costs to compute costs per additional enrollee. Such integration supports decision-making on scaling interventions, as seen in analyses by organizations like the International Initiative for Impact Evaluation (3ie), which emphasize collecting prospective cost data alongside experimental designs to avoid retrospective biases. Cost-effectiveness analysis (CEA), a primary method, measures the cost per unit of outcome achieved, such as dollars per life-year saved or per child educated, without requiring full monetization of benefits. In RCT-based impact evaluations, CEA typically applies the intervention's average cost per beneficiary to the estimated treatment effect, yielding ratios like $X per Y% increase in productivity. A 2024 3ie guidance document outlines standardized steps for CEA in impact evaluations, including delineating direct and indirect costs (e.g., staff time, materials, overhead) and sensitivity analyses for uncertainty in effect sizes or cost estimates. Challenges include attributing shared costs in multi-component interventions and using shadow prices for non-traded inputs in low-income settings, where market prices may distort true opportunity costs. Cost-benefit analysis (CBA) goes further by monetizing all outcomes, comparing discounted streams of benefits against costs to derive net present values or internal rates of return. Applied to impact evaluations, CBA requires valuing non-market effects, such as health improvements via willingness-to-pay proxies or human capital models projecting lifetime earnings gains from interventions. One analysis found that fewer than 20% of impact evaluations incorporate CBA, often due to data demands and methodological debates over valuation assumptions, yet those that do reveal high returns, like BCRs exceeding 5:1 for deworming programs in Kenya based on long-term income effects. Integration with quasi-experimental designs demands adjustments for selection biases in cost attribution, using techniques like matching to estimate counterfactual costs. Despite these advantages, integration faces institutional barriers, including underinvestment in cost data collection during trials, where the focus prioritizes causal identification of impacts over economic metrics. Guidelines from evaluation bodies advocate embedding economic components from study inception, with prospective costing protocols to capture fixed and variable expenses accurately. Empirical evidence from integrated evaluations underscores their policy relevance, as they have informed reallocations, such as prioritizing cash transfers over less cost-effective subsidies when BCRs differ by factors of 2-10. Ongoing refinements address generalizability, incorporating transferability adjustments for context-specific costs and effects across settings.
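The core metrics are simple arithmetic once effects and costs are in hand. The Python sketch below computes an ICER, an NPV, and a BCR from entirely hypothetical costs, effects, and a 5% discount rate.

```python
def icer(cost_treat, cost_control, effect_treat, effect_control):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of outcome."""
    return (cost_treat - cost_control) / (effect_treat - effect_control)

def npv(flows, rate):
    """Net present value of a stream of annual flows, discounted from year 0."""
    return sum(f / (1 + rate) ** t for t, f in enumerate(flows))

# Hypothetical tutoring program: $60 per child versus a $10 status quo,
# raising test scores by 0.25 SD versus 0.05 SD.
print(f"ICER: ${icer(60, 10, 0.25, 0.05):.0f} per additional SD of learning")

# Hypothetical CBA: $100 cost today, $30 of monetized benefits per year for
# five years, discounted at 5%; BCR is discounted benefits over costs.
benefits = npv([0, 30, 30, 30, 30, 30], rate=0.05)
costs = 100
print(f"NPV: {benefits - costs:.1f}, BCR: {benefits / costs:.2f}")
```

In practice the hard part is not this arithmetic but obtaining credible cost data and defensible monetized values for the effects, which is why prospective costing protocols matter.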

Debates and Methodological Controversies

RCT Gold Standard vs. Alternative Approaches

Randomized controlled trials (RCTs) are widely regarded as the gold standard in impact evaluation for establishing causal effects because randomization balances groups on both observed and unobserved confounders, minimizing selection bias and enabling unbiased estimates of average treatment effects under ideal conditions. This approach has been particularly influential in development economics, where organizations such as J-PAL have scaled RCTs to evaluate interventions like education and health programs, yielding precise estimates such as a 0.14 standard deviation increase in earnings from a childhood intervention in long-term follow-ups reported as of 2019. However, proponents acknowledge that RCTs assume stable mechanisms and no spillover effects, which may not hold in complex social settings. Despite their strengths in internal validity, RCTs face significant limitations that challenge their unqualified status as the gold standard. Ethical constraints prevent randomization in many contexts, such as evaluating universal programs like national policy reforms, while high costs—often exceeding $1 million per trial in development settings—and long timelines limit applicability. External validity is another concern, as RCT participants and settings are often unrepresentative; for instance, trials in controlled environments may overestimate effects in diverse real-world applications, with meta-analyses showing effect sizes in RCTs decaying by up to 50% when scaled up. Critics like Angus Deaton argue that RCTs provide narrow, context-specific knowledge without illuminating underlying mechanisms or generalizability, potentially misleading if treated as universally superior evidence, as evidenced by discrepancies between RCT findings and broader econometric data in poverty alleviation studies. Alternative approaches, particularly quasi-experimental designs, offer robust causal identification when RCTs are infeasible by exploiting natural or policy-induced variation. Methods like regression discontinuity designs (RDD) assign treatment based on a cutoff score, approximating randomization near the threshold; for example, an RDD evaluation of Colombia's scholarship program in 2012 estimated a 4.8 percentage point increase in enrollment, comparable to RCT benchmarks. Difference-in-differences (DiD) compares changes over time between treated and untreated groups assuming parallel trends, as in Card and Krueger's 1994 minimum wage study, which found no employment loss in the fast-food sector after New Jersey's 1992 hike. Instrumental variables (IV) use exogenous shocks for identification, addressing endogeneity in observational data. These methods rely on partially testable assumptions—such as no manipulation of the running variable in RDD or parallel trends in DiD—allowing empirical validation, and often provide stronger external validity by leveraging large-scale administrative data rather than small, artificial samples. The debate pits RCT advocates, including Abhijit Banerjee and Esther Duflo—who emphasize randomization's avoidance of model dependence against alternatives' reliance on untestable assumptions—against skeptics like Deaton and Nancy Cartwright, who contend that no method guarantees valid inference without theory and contextual knowledge, as RCTs can suffer from attrition (up to 20-30% in social trials) or Hawthorne effects. Empirical comparisons reveal mixed results: a 2022 analysis of labor interventions found quasi-experimental estimates aligning with RCTs 70-80% of the time when assumptions hold, but diverging in heterogeneous contexts, underscoring that alternatives can match RCT precision while better capturing policy-relevant variation.
In impact evaluation, over-reliance on RCTs, often promoted by institutions with vested interests in experimental methods, risks sidelining credible quasi-experimental evidence from natural experiments, as seen in macroeconomic policy assessments where observational designs have informed reforms such as conditional cash transfers in Latin America.
Approach | Key Strength | Key Limitation | Example Application
RCTs | High internal validity via randomization | Poor scalability, ethical barriers, limited generalizability | Microfinance impacts in India (2000s trials showing modest effects)
Quasi-experimental (e.g., DiD, RDD) | Leverages real-world variation for broader applicability | Depends on assumptions like parallel trends, testable but not always verifiable | Minimum wage effects (DiD in Card and Krueger's 1994 U.S. study)
Ultimately, causal inference demands selecting methods based on context rather than methodological hierarchy, integrating RCTs where possible with quasi-experimental and mechanistic analyses for robustness, as singular elevation of any approach ignores the pluralistic nature of evidence in complex systems.

Empirical vs. Theory-Driven Approaches

In impact evaluation, the empirical approach prioritizes observable data and statistical estimation to determine program effects, often employing randomized controlled trials (RCTs) or quasi-experimental designs to isolate causal impacts on outcomes while treating interventions as "black boxes" that link inputs directly to results without explicit modeling of internal processes. This approach, rooted in the positivist paradigm's emphasis on objective measurement and replicability, seeks to establish whether an intervention produces net benefits through rigorous hypothesis testing and control for confounding variables, as seen in evaluations by organizations like the Poverty Action Lab (J-PAL), which reported over 1,000 RCTs by 2023 demonstrating average treatment effects in areas such as health and education. Such methods excel in providing high internal validity, with meta-analyses showing RCTs yielding effect sizes that are more precise and less biased than non-experimental alternatives, though they may overlook heterogeneous effects across contexts. Theory-driven evaluation, by contrast, integrates explicit program theories—such as theories of change or realist causal models—to unpack how interventions generate outcomes via intermediate links, resources, and contextual factors, rather than relying solely on outcome measurement. Originating in the 1980s as a response to black-box limitations, this approach, advanced by evaluators like Huey Chen, posits that understanding "what works for whom, in what circumstances, and why" requires mapping assumed causal pathways and testing them empirically or qualitatively, as applied in assessments by the International Institute for Environment and Development (IIED). For instance, a 2014 study on knowledge translation initiatives used realist evaluation to identify context-mechanism-outcome configurations, revealing why certain programs succeeded in specific settings despite similar average effects. Proponents argue it enhances external validity and scalability by addressing generalizability gaps in purely empirical designs, with Treasury Board of Canada guidelines from 2021 recommending its use to examine causal chains beyond net impacts. The tension between these paradigms reflects broader methodological debates in social science, where the empirical approach is lauded for its causal rigor—evidenced by post-positivist refinements acknowledging researcher influence but still prioritizing quantifiable evidence over metaphysical assumptions—yet critiqued for a black-box orientation that ignores implementation fidelity and adaptive behaviors. Theory-driven approaches counter this by fostering deeper causal understanding through mechanism testing, but they risk bias if theories embed unverified ideological assumptions, as noted in critiques of their subjective theory construction potentially amplifying biases in settings where qualitative methods predominate. Empirical evaluations have demonstrated superior replicability in policy contexts, with a 2020 review finding that black-box RCT findings influenced 15% more legislative changes than theory-only assessments, though hybrid models combining both—such as realist RCTs—emerge as pragmatic syntheses balancing evidentiary strength with explanatory depth. In practice, over-reliance on positivist metrics in high-stakes funding decisions, like those from USAID since 2010, has prompted calls for theory integration to mitigate scale-up failures in empirically validated pilots, underscoring that while empirical methods ground truth claims in data, theory-driven elements are essential for causal interpretation without supplanting evidential primacy.

Ethical, Practical, and Ideological Critiques

Ethical critiques of impact evaluation, particularly randomized controlled trials (RCTs), center on the moral implications of randomization, which deliberately withholds interventions from control groups to establish counterfactuals. This practice raises concerns about equity and beneficence, as it may deny potentially life-improving treatments to participants in need, especially when genuine equipoise is absent, violating principles like those in the Declaration of Helsinki. In development contexts, where populations often face economic or health vulnerabilities, RCTs can exacerbate inequalities by favoring treatment groups, prompting debates over whether such designs are justifiable without assured post-trial access for controls. Critics like Angus Deaton argue that conducting RCTs when interventions are suspected to work undermines ethical standards, as it prioritizes experimental purity over participant welfare, potentially amounting to exploitation in low-resource settings. Practical challenges include the high financial and temporal costs of RCTs, which often require large samples, extended follow-ups, and sophisticated data collection, rendering them infeasible for small-scale or urgent programs in resource-constrained environments. Attrition, non-compliance, and contextual dependencies further compromise reliability, as real-world implementation deviates from idealized protocols, leading to underpowered studies unable to detect modest effects. External validity remains a persistent issue; findings from specific, controlled settings—such as deworming programs in rural Kenya—frequently fail to replicate or scale in diverse populations or policy environments, limiting their utility for broad decision-making. Ideological critiques portray RCT-centric impact evaluation as emblematic of empiricist reductionism, which elevates narrow, ahistorical data over theoretical models, contextual nuances, and structural analysis, fostering a "randomista" orthodoxy that dismisses non-experimental evidence. This approach is accused of technocratic overreach, depoliticizing policymaking by framing decisions as purely evidence-driven while sidelining value judgments, power dynamics, and ethical trade-offs inherent to public policy. In development economics, such methods have been labeled neo-colonial, imposing Western scientific paradigms on global South contexts and prioritizing measurable outcomes over holistic, theory-guided interventions that address systemic causes like institutional failures. Proponents of alternatives, including structural economists, contend that RCTs' aversion to prior assumptions hinders causal understanding in complex systems, where questions demand mechanistic reasoning beyond average effects.

Applications and Empirical Evidence

Development and Social Programs

Impact evaluations, predominantly through randomized controlled trials (RCTs), have been extensively applied to development and social programs in low- and middle-income countries, yielding causal evidence on interventions targeting poverty alleviation, health, education, and nutrition. Organizations such as the Abdul Latif Jameel Poverty Action Lab (J-PAL) and the World Bank have conducted or funded numerous RCTs to assess program effectiveness, revealing heterogeneous outcomes where some interventions demonstrate robust benefits while others show modest or null effects. These evaluations emphasize scalable, low-cost programs like deworming and cash transfers, but also highlight challenges such as generalizability beyond pilot settings and long-term sustainability. Conditional cash transfer (CCT) programs, which link payments to behaviors like school attendance and health checkups, provide some of the strongest evidence of positive impacts. Mexico's Progresa (later Oportunidades), launched in 1997, was evaluated using RCTs on over 24,000 households, showing increases in secondary school enrollment of approximately 20% for girls and improvements in health outcomes, including a 10-18% rise in preventive health visit rates and reduced malnutrition. Long-term follow-ups indicated sustained effects, such as higher educational attainment and reduced poverty into adulthood, though benefits were more pronounced for targeted poor households. Unconditional cash transfers (UCTs), without behavioral requirements, have been analyzed in a Bayesian meta-analysis of 115 studies across 72 programs, estimating average effects including a 0.08 standard deviation increase in household consumption and reduced poverty, with stronger impacts in acute-need contexts but limited evidence of transformative poverty escape. In health-focused social programs, deworming initiatives stand out for cost-effectiveness, with RCTs in Kenya demonstrating that school-based treatment reduced worm infections and increased school attendance by 25%, alongside long-run earnings gains of up to 20% for treated children tracked into adulthood. A 2022 meta-analysis of multiple studies confirmed modest nutritional benefits, such as a 0.3 kg average weight gain in children per treatment round, though effects on cognition and height were inconsistent or negligible. Reanalyses of flagship studies have debated effect sizes, attributing some discrepancies to externalities like community-wide treatment spillovers, underscoring the need for careful interpretation in scaling. Microfinance programs, aimed at fostering entrepreneurship among the poor, contrast with these successes, as RCTs across six countries found limited causal impacts on household income or consumption, with meta-analyses of seven evaluations reporting negligible effects for non-entrepreneurial households and only modest business expansion among borrowers. These null or small effects challenge earlier observational claims of broad transformative potential, revealing instead that access to credit often supports consumption smoothing rather than sustained growth, particularly in saturated markets. Overall, empirical evidence from these applications supports selective investment in high-evidence interventions like CCTs and deworming, which yield positive returns at costs under $100 per beneficiary annually, but cautions against over-reliance on programs like microfinance without addressing selection into borrowing. Integration with non-experimental methods, such as panel regressions on observational data, has complemented RCTs for broader policy contexts where randomization is infeasible.

Policy and Institutional Interventions

Impact evaluations of policy and institutional interventions employ experimental and quasi-experimental methods, such as randomized controlled trials (RCTs) and difference-in-differences (DiD) designs, to measure the effects of reforms on outcomes like governance, service delivery, and institutional quality. These assessments often reveal mixed results, with successes dependent on contextual factors including political incentives and implementation capacity, while many donor-supported initiatives fail to deliver sustained improvements. For instance, between 1998 and 2008, donor-backed "good governance" reforms in 145 countries resulted in a decline in government effectiveness for 50% of recipients, as measured by the Worldwide Governance Indicators, highlighting challenges in achieving causal improvements through institutional changes. Decentralization policies, which devolve authority to local levels, have been evaluated for their impacts on accountability and public goods provision. A randomized evaluation in India during the early 2000s assigned village council leadership to women under quotas, finding that female policymakers increased investments in public drinking water and roads—goods disproportionately benefiting women—by 10-15 percentage points compared to male-led villages, demonstrating causal effects on pro-poor outcomes via improved representation. In Bolivia, the 1994 Popular Participation Law, which decentralized 20% of national revenue to municipalities, led to shifts in spending toward education and basic services in poorer areas, with per capita infrastructure investments rising by up to 25% in responsive localities, though overall impacts varied by local capacity. Streamlining administrative institutions, such as one-stop service (OSS) reforms, aims to reduce bureaucratic hurdles for business registration and permits. In Indonesia, the 2018 OSS institutional overhaul, consolidating licensing across 369 districts, was assessed using a staggered DiD model on 2014-2018 data, revealing a short-term negative impact on per-capita GDP growth, with a coefficient of -0.011 (p<0.1), attributed to transitional disruptions like capacity gaps and risk-averse implementation. Policing institutional reforms, including procedural justice training protocols, have shown more consistent causal benefits in RCTs; a multicity U.S. trial in 2015-2016 found procedural justice training increased officer compliance with constitutional standards by 10-20%, reducing citizen complaints without elevating crime rates. Similarly, a 2024 RCT of use-of-force training in a large police department reported a statistically significant reduction in force incidents post-intervention. Broader evidence from anti-corruption reforms indicates limited success in curbing administrative corruption, with systematic reviews finding that while transparency gains reduce opportunities for graft, sustained declines require complementary enforcement, as isolated institutional tweaks often yield null or perverse effects due to entrenched incentives. These findings underscore the importance of rigorous, context-specific evaluations to distinguish effective interventions from those undermined by implementation failures or political short-termism.

Organizations, Initiatives, and Reviews

Key Promoters and Evidence Producers

The Abdul Latif Jameel Poverty Action Lab (J-PAL), established in 2003 at the Massachusetts Institute of Technology, serves as a central hub for promoting randomized controlled trials (RCTs) in impact evaluation, particularly in poverty alleviation and development economics. J-PAL-affiliated researchers have conducted or overseen more than 1,100 randomized evaluations worldwide, generating empirical evidence on interventions such as deworming programs, remedial education, and conditional cash transfers, which have informed scalable policies in over 80 countries. Its founders, including Nobel laureates Abhijit Banerjee and Esther Duflo, emphasize RCTs for establishing causal impacts, training policymakers and researchers through courses and partnerships to prioritize evidence over intuition in program design. Innovations for Poverty Action (IPA), founded in 2002 by economist Dean Karlan, functions as a research network that executes field experiments to test poverty interventions, producing evidence on topics like microfinance efficacy, agricultural innovations, and behavioral nudges. IPA has completed hundreds of RCTs across more than 50 countries, collaborating with governments and NGOs to scale proven programs, such as improving teacher attendance or reducing fraud in cash transfers, while addressing organizational challenges in embedding rigorous evaluation into operations. It complements J-PAL by focusing on implementation science, providing tools for theory-driven evaluations and partnering on joint initiatives to build capacity for evidence generation in low-resource settings. The International Initiative for Impact Evaluation (3ie), launched in 2008 as a grant-making NGO, funds and synthesizes high-quality impact studies to support evidence-informed policies in low- and middle-income countries, emphasizing transparency through systematic reviews and repositories of over 4,000 evaluations. 3ie has disbursed grants for more than 300 primary studies and produced evidence maps on sectors like health, education, and climate adaptation, promoting mixed-methods approaches alongside RCTs to enhance generalizability and uptake by decision-makers. It quality-assures outputs via rigorous protocols, countering publication bias by incentivizing registration and reporting of null results. Other notable producers include the World Bank's Strategic Impact Evaluation Fund (SIEF), active since 2008, which has supported over 100 studies measuring program effects in areas like health, education, and service delivery, influencing Bank-wide lending decisions with data from RCTs across multiple regions. The International Food Policy Research Institute (IFPRI) has conducted causal evaluations since the late 1990s, including landmark RCTs on Mexico's PROGRESA program, generating evidence on nutrition-sensitive agriculture and social safety nets adopted in multiple nations. These entities collectively advance a culture of empirical testing, though their RCT-centric focus has drawn scrutiny for potential overemphasis on narrow, context-specific findings at the expense of broader causal mechanisms.

Skeptics, Critics, and Reform Advocates

Nobel laureate Angus Deaton has critiqued the application of randomized controlled trials (RCTs) in impact evaluation, arguing that they are often misinterpreted as providing unassailable evidence for policy without addressing external validity or causal mechanisms. Deaton and co-author Nancy Cartwright contend that RCTs require minimal theoretical assumptions, which aids persuasion in skeptical contexts but hinders deeper understanding by sidelining prior knowledge and generalizability beyond specific trial conditions. They emphasize that RCTs cannot stand alone as "gold standard" proofs, as replication across varied settings is rare, and results may fail to predict outcomes in scaled implementations due to contextual differences. Lant Pritchett has similarly challenged the RCT paradigm in development impact evaluation, highlighting paradoxes in scaling where small-scale trials yield effects that diminish or reverse at larger scales due to implementation challenges and institutional constraints. Pritchett argues that RCTs disproportionately focus on marginal, short-term interventions like private goods rather than public goods or systemic reforms, diverting attention from transformative questions about economic growth and state capability. He critiques the methodology for underemphasizing mechanisms of change and scalability, noting that even positive trial findings often encounter "fade-out" when rolled out nationally, as seen in education interventions where contract teacher effects did not persist when scaled broadly. Ethical concerns form another core critique, particularly in development contexts where control groups receive no intervention, potentially withholding beneficial treatments from vulnerable populations. Deaton points to cases like cash transfers or health programs where randomization equates to denying aid, raising moral hazards absent equipoise—true uncertainty about effectiveness—that is harder to establish for social policies than medical ones. Critics like Ravi argue this practice influences research agendas toward low-stakes questions, amplifying disproportionate sway over policy while exposing participants to harms without adequate safeguards. Reform advocates urge integrating RCTs with theory-driven approaches, qualitative insights, and quasi-experimental methods to enhance external validity and policy relevance. Deaton advocates situating RCTs within cumulative scientific programs that incorporate mechanistic understanding and historical data, rather than treating them as isolated trials. Pritchett calls for frameworks prioritizing state capability and growth-oriented reforms, arguing that methodological pluralism better addresses development barriers than RCT orthodoxy. Such reforms aim to mitigate biases toward feasible but narrow studies, fostering evaluations that inform ambitious interventions despite academia's institutional incentives favoring RCT production.

Recent Developments and Challenges

Technological and Methodological Innovations

Advancements in machine learning have enhanced causal estimation in impact evaluation by addressing high-dimensional data and model misspecification. Double machine learning (Double ML) employs supervised algorithms to flexibly estimate nuisance parameters, such as propensity scores and conditional expectations, within semi-parametric estimators for average treatment effects under unconfoundedness assumptions, thereby improving precision and bias reduction compared to parametric alternatives. Targeted learning integrates ensemble methods like the Super Learner into targeted maximum likelihood estimation, allowing for data-adaptive modeling while targeting causal parameters, as demonstrated in policy effect estimations where traditional methods falter with complex covariates. These approaches, formalized in frameworks from 2019 onward, enable evaluators to incorporate vast covariate sets without the misspecification risks inherent in rigid parametric models. Synthetic control methods have seen refinements for broader applicability in non-experimental settings. Generalized synthetic control approaches, which extend the original method by incorporating interactive fixed effects, have shown superior performance over standard difference-in-differences and synthetic controls in simulations involving staggered adoption or heterogeneous treatments, particularly for evaluations with controlled donor pools. Recent extensions, such as using multiple outcomes to construct synthetic counterfactuals, mitigate interpolation biases in single-unit interventions, as applied in re-evaluations of policy shocks where pre-treatment fit is optimized across dimensions like economic and social indicators. These innovations, building on Abadie's framework, facilitate causal claims in contexts lacking randomized variation, such as regional reforms, with applications documented as early as 2015 in health interventions. Technological innovations leverage remote sensing and digital data for scalable outcome measurement and real-time assessment. Satellite imagery has enabled proxy-based evaluations of environmental and agricultural programs by capturing changes in land cover or crop yields without reliance on household surveys; for example, analyses have used it to assess productivity impacts from development interventions. Imagery data, including nighttime lights and high-resolution sensors, supports quasi-experimental designs for hard-to-measure outcomes like local economic activity, with evaluations highlighting its advantages in coverage and timeliness since the early 2020s. Administrative records and call detail records (CDRs) provide granular, longitudinal data for difference-in-differences setups, as mapped in systematic reviews linking big data to development outcomes, though causal applications remain limited by privacy and access concerns. Digital tools have transformed data collection for impact evaluation, enabling real-time monitoring and reducing logistical costs. Mobile-based surveys and GPS-enabled applications facilitate continuous tracking in RCTs and quasi-experiments, as seen in India's sanitation programs where app-based reporting monitored toilet construction and usage daily, allowing adaptive interventions. Integration of these tools with administrative data enhances precision in attributing effects, such as in agricultural RCTs measuring plot-level yields via phone-based reporting. A 2023 3ie systematic map indicates growing use of such technologies in impact studies, particularly for measurement validation, but underscores gaps in rigorous causal application due to data quality and privacy issues. These methods, accelerated by post-2020 expansions, support faster feedback loops in policy cycles compared to traditional endline surveys.
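A minimal sketch of the partialling-out variant of Double ML is shown below in Python, using scikit-learn with simulated data and hypothetical parameter values: both the treatment and the outcome are predicted from the confounders with cross-fitting, and the causal coefficient is recovered from a regression of the outcome residuals on the treatment residuals.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(5)
n, p, true_effect = 4000, 10, 1.0

# High-dimensional confounders X affect both treatment D and outcome Y.
X = rng.normal(0, 1, (n, p))
g = np.sin(X[:, 0]) + X[:, 1] ** 2            # nonlinear nuisance function
d = g + rng.normal(0, 1, n)
y = true_effect * d + 2 * g + rng.normal(0, 1, n)

# Cross-fitted residual-on-residual regression (partialling-out Double ML):
# predict D and Y from X out-of-fold, then regress the residuals.
ml = RandomForestRegressor(n_estimators=200, min_samples_leaf=20, random_state=0)
d_res = d - cross_val_predict(ml, X, d, cv=5)
y_res = y - cross_val_predict(ml, X, y, cv=5)
theta = np.sum(d_res * y_res) / np.sum(d_res ** 2)
print(f"Double ML estimate of the treatment effect: {theta:.2f}")  # near 1.0
```

Cross-fitting is what keeps the flexible machine-learning predictions from contaminating the final causal coefficient; a naive plug-in without sample splitting would generally be biased.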

Barriers to Policy Influence and Scalability

Impact evaluations frequently encounter resistance in translating findings into policy due to political and institutional dynamics that prioritize ideology or expediency over causal evidence. In one analysis of 73 randomized controlled trials conducted across 30 U.S. cities with a national behavioral insights team, positive results prompted adoption in only 27% of cases, often due to bureaucratic inertia, competing priorities, and skepticism about generalizability beyond pilot settings. Similarly, policymakers may disregard evaluations conflicting with entrenched interests, as evidenced by persistent underuse of rigorous data in some policy domains, where ideological commitments to unproven approaches prevail despite contrary empirical results. Dissemination challenges further impede influence, including untimely evaluation outputs and poor alignment between researchers' focus on average treatment effects and policymakers' need for context-specific, actionable insights. Academic and donor-driven evaluations, while methodologically sound, often fail to engage decision-makers early, leading to findings that are technically credible but politically inert; for example, systematic reviews identify the lack of timely, relevant evidence as the most cited barrier, compounded by institutional silos that fragment evidence uptake. This disconnect is exacerbated in polarized environments, where evidence is selectively interpreted to fit partisan narratives rather than assessed on causal merits. Scalability of proven interventions presents distinct hurdles, as pilot successes under controlled conditions rarely persist at larger scopes due to emergent complexities like spillovers, heterogeneous effects, and general equilibrium shifts not captured in randomized designs. Cost structures, for instance, inflate dramatically upon expansion—small-scale programs may yield high returns in trials funded by external grants, but rollout demands sustained public budgets amid diminishing marginal benefits and administrative frictions, as seen in attempts to scale micro-interventions in low-income settings where logistical and budgetary constraints erode effectiveness. Critiques highlight that many impact evaluations target incremental "islets" of improvement, such as targeted subsidies or nudges, which prove inadequate for systemic change requiring institutional overhauls beyond experimental scope. Lant Pritchett argues this micro-focus yields evidence with limited predictive power for scaled policy, as real-world adoption introduces adaptive changes that alter causal pathways; empirical tracking reveals that few RCT-backed programs achieve broad rollout, with adoption rates remaining low due to unaddressed factors like political economy constraints or weak state capacity. In development contexts, barriers such as these have constrained the scaling of even modestly successful trials, underscoring the gap between localized causal identification and feasible policy transformation.

References

  1. [1]
    Impact evaluation - Better Evaluation
    An impact evaluation must establish the cause of the observed changes. Identifying the cause is known as 'causal attribution' or 'causal inference'.
  2. [2]
    [PDF] Impact Evaluation in Practice
The basic impact evaluation question essentially constitutes a causal inference problem. Assessing the impact of a program on a series of outcomes is ...
  3. [3]
    [PDF] Impact Evaluation, Causal Inference, and Randomized Evaluation
    Oct 21, 2024 · M&E is focused on the program (process, output). • Impact evaluation is focused on cause and effect, i.e. attribution, on outcomes. How much did ...
  4. [4]
    [PDF] Causal Inference and Impact Evaluation - HAL
    Jun 12, 2020 · By definition, an instrumental variable must have a very significant impact on access to the program being evaluated – in this case, the ...
  5. [5]
    [PDF] Causal Inference and Experimental Impact Evaluation
    What is impact evaluation (IE)?. • IE question: What is the impact (or causal effect) of a program on outcome of interest?
  6. [6]
    Heterogeneous Treatment Effects in Impact Evaluation - Eva Vivalt
May 4, 2015 · I do this using a large, unique dataset of impact evaluation results. These data were gathered by a nonprofit research organization ...
  7. [7]
    Common Problems with Formal Evaluations: Selection Bias and ...
This page discusses the nature and extent of two common problems we see with formal evaluations: selection bias and publication bias.
  8. [8]
    Failures in impact evaluation | Research Evaluation - Oxford Academic
Jul 28, 2025 · Researching and evaluating failures: In practice, Andrews (2018) argues evaluations are often biased, focusing on reporting outputs and outcomes ...
  9. [9]
    Ten Reasons Not to Measure Impact—and What to Do Instead
An impact evaluation should help determine why something works, not merely whether it works. Impact evaluations should not be undertaken if they will provide no ...
  10. [10]
    [PDF] Introduction to Impact Evaluation - The World Bank
The objective of impact evaluation is to estimate the causal effect or impact of a program on outcomes of interest. Estimate the causal effect (impact) of ...
  11. [11]
    [PDF] impact evaluation - | Independent Evaluation Group - World Bank
    First, it puts forward the definition of impact evaluation as a. 'counterfactual analysis of the impact of an intervention on final welfare outcomes.' Second ...
  12. [12]
    [PDF] Impact Evaluation in Practice - World Bank Documents & Reports
    Its main goal is to expand the evidence base on what works to improve health, education, and social protection outcomes, thereby informing development policy.
  13. [13]
    [PDF] Impact Evaluation - Climate Investment Funds (CIF)
    Impact Evaluation (IE) as defined here is an evaluation that quantitatively analyzes causal links between programs or interventions and a set of outcomes.
  14. [14]
    Handbook on Impact Evaluation : Quantitative Methods and Practices
Evaluating impact is particularly critical in developing countries where resources are scarce and every dollar spent should aim to maximize its impact on poverty ...
  15. [15]
    [PDF] Principles for Impact Evaluation - 3ie
    Policy-relevant impact evaluations offer clear policy messages based on a deep understanding of context and implementation. 3. Social and economic development ...
  16. [16]
    [PDF] The Historical Development of Program Evaluation - OpenSIUC
    Program evaluation's historical development is difficult to describe, but includes seven time periods, starting with the first formal use in 1792.
  17. [17]
    HISTORY OF EVALUATION - Sage Publishing
    While evaluation as a profession is new, evaluation activity began long ago, perhaps as early as Adam and Eve. As defined in Chapter 1, evaluation is a ...
  18. [18]
    [PDF] EXPERIMENTAL AND QUASI-EXPERIMENTAL DESIGNS FOR ...
    DONALD T. CAMPBELL AND JULIAN C. STANLEY decrease the respondent's sensitivity or responsiveness to the experimental variable and thus make the results ...
  19. [19]
    [PDF] A Look Back at Two Decades of Progress in the Impact Evaluation ...
    It is the largest health policy study in US history and paved the way for increased cost sharing for medical care in the 1980s and 1990s. 1990–2000. The results ...
  20. [20]
    The history of randomized control trials: scurvy, poets and beer
    Apr 18, 2018 · In 1884, we get the first randomization in the social sciences. The (among other things) psychology researcher Charles Pierce was trying to ...
  21. [21]
    3ie: Home
3ie has been generating rigorous evide... International Initiative for Impact Evaluation (3ie)
  22. [22]
    Randomized Control Trials | Dime Wiki
Apr 13, 2021 · A randomized controlled trial (RCT) is a method of impact evaluation in which all eligible units in a sample are randomly assigned to treatment and control ...
  23. [23]
    Introduction to randomized evaluations - Poverty Action Lab
    Randomized evaluations (RCTs) randomly assign participants to treatment and comparison groups to measure the causal impact of an intervention.
  24. [24]
    Randomized controlled trials – a matter of design - PMC
    Randomized controlled trials (RCTs) are the hallmark of evidence-based medicine and form the basis for translating research data into clinical practice.
  25. [25]
    Campbell DT, Stanley JC (1963) - The James Lind Library
    Campbell DT, Stanley JC (1963). Experimental and quasi-experimental designs for research. Chicago: Rand McNally & Company.
  26. [26]
    Quasi-experimental design and methods | Better Evaluation
    Jan 7, 2014 · Quasi-experimental design tests causal hypotheses, like experimental designs, but lacks random assignment, using self or administrator ...
  27. [27]
    Difference-in-difference - Better Evaluation
    Difference-in-difference involves comparing the before-and-after difference for the group receiving the intervention (where they have not been randomly ...
  28. [28]
    Difference-in-Differences | Dime Wiki - World Bank
    Aug 7, 2023 · Difference-in-differences takes the before-after difference in treatment group's outcomes. This is the first difference.
  29. [29]
    Advances in Difference-in-differences Methods for Policy Evaluation ...
    Difference-in-differences (DiD) is a powerful, quasi-experimental research design widely used in longitudinal policy evaluations with health outcomes.
  30. [30]
    Regression discontinuity - Better Evaluation
    RDD is a quasi-experimental evaluation option that measures the impact of an intervention, or treatment, by applying a treatment assignment mechanism.
  31. [31]
    [PDF] Using Regression Discontinuity Design for Program Evaluation
    Regression discontinuity design (RDD) is a popular quasi-experimental design used to evaluate program effects. It differs from the randomized control trial (RCT) ...
  32. [32]
    Instrumental Variables | Urban Institute
    Instrumental variables methods are the backbone of causal inference because they can solve a wide variety of very thorny inference problems.
  33. [33]
    Quasi-Experimental Designs for Causal Inference - PMC
    The strongest quasi-experimental designs for causal inference are regression discontinuity designs, instrumental variable designs, matching and propensity score ...
  34. [34]
    Causal inference and observational data
    Oct 11, 2023 · Observational studies using causal inference frameworks can provide a feasible alternative to randomized controlled trials.
  35. [35]
    Causal inference with observational data: A tutorial on propensity ...
    Propensity score analysis provides a useful way to making causal claims under the assumption of no unobserved confounders.
  36. [36]
    Causal inference and effect estimation using observational data
    We provide a clear, structured overview of key concepts and terms, intended as a starting point for readers unfamiliar with the causal inference literature.
  37. [37]
    Causal inference with observational data: the need for triangulation ...
    The goal of much observational research is to identify risk factors that have a causal effect on health and social outcomes.
  38. [38]
    Observational Studies: Methods to Improve Causal Inferences - PMC
    Mar 23, 2023 · This paper focuses on understanding causal inferences and methods to improve them for observational studies.
  39. [39]
    Sources of selection bias in evaluating social programs - PNAS
    The selection bias remaining after matching is a substantial percentage—often over 100%—of the experimentally estimated impact of program participation.
  40. [40]
    [PDF] Selection Bias - The University of North Carolina at Chapel Hill
    Selection bias is a distortion in a measure of association due to a sample selection that does not accurately reflect the target population.
  41. [41]
    Biases in randomized trials: a conversation between trialists and ...
    Biases in randomized trials: a conversation between trialists and epidemiologists · Selection bias · Performance bias · Detection bias · Attrition bias · Reporting ...
  42. [42]
    [PDF] assessing attrition bias
    Attrition bias occurs when not all participants' outcomes are measured, and different rates of attrition between groups can bias the estimated intervention  ...
  43. [43]
    [PDF] Addressing Attrition Bias in Randomized Controlled Trials
    Attrition bias occurs when people leaving a study have characteristics correlated with group status or outcomes, creating systematic differences and biased ...
  44. [44]
    [PDF] Sample Attrition in Teen Pregnancy Prevention Impact Evaluations
    In this brief, we discuss how attrition affects individual- and cluster-level RCTs, how it is assessed, and strategies to limit it. We pay particular attention ...
  45. [45]
    Attrition bias | Catalog of Bias
    Attrition bias is the unequal loss of participants from study groups, where systematic differences between those who leave and those who stay can bias results.
  46. [46]
    Assessing the impact of attrition in randomized controlled trials
    The aim of this study was to investigate the impact of attrition on baseline imbalance within individual trials and across multiple trials.
  47. [47]
    Assessing the impact of attrition in randomized controlled trials
    The aim of this study was to investigate the impact of attrition on baseline imbalance within individual trials and across multiple trials.
  48. [48]
    Reporting attrition in randomised controlled trials - PMC - NIH
    Such attrition prevents a full intention to treat analysis being carried out and can introduce bias., Attrition can also occur when participants have missing ...
  49. [49]
    A Graphical Catalog of Threats to Validity - PubMed Central - NIH
    Apr 2, 2020 · We define the Campbell tradition's named threats to validity. For each threat, we provide the epidemiologic analog, a corresponding DAG, and one ...
  50. [50]
    Threats to validity - Program Evaluation - Andrew Heiss
    Oct 28, 2020 · ... threats-validity ... One helpful way to assess an evaluation's internal validity is to systematically go through each possible threat and evaluate ...
  51. [51]
    Internal Validity in Impact Evaluation: Overview, Importance, and ...
    Nov 22, 2022 · History: History is a threat to the internal validity of an experiment. History is any event besides the independent variable that happened ...
  52. [52]
    [PDF] SITE SELECTION BIAS IN PROGRAM EVALUATION
    Feb 13, 2015 · “Site selection bias” can occur when the probability that a program is adopted or evaluated is correlated with its impacts.Missing: contextual | Show results with:contextual<|control11|><|separator|>
  53. [53]
    Causal Inference Using Potential Outcomes - Taylor & Francis Online
    Causal effects are defined as comparisons of potential outcomes under different treatments on a common set of units.
  54. [54]
    Introduction to the Potential Outcomes Framework
Jan 18, 2021 · The Potential Outcomes Framework (aka the Neyman-Rubin Causal Model) is arguably the most widely used framework for causal inference in the ...
  55. [55]
    The central role of the propensity score in observational studies for ...
    The propensity score is the conditional probability of assignment to a particular treatment given a vector of observed covariates.
  56. [56]
    [PDF] Instrumental Variables in Action: Sometimes You Get What You Need
    Angrist and Evans (1998) solve this omitted-variables problem using two instrumental variables, both of which lend themselves to Wald-type estimation strategies ...
  57. [57]
    Regression discontinuity designs: A guide to practice - ScienceDirect
    The sharp regression discontinuity design. It is useful to distinguish between two general settings, the sharp and the fuzzy regression discontinuity (SRD ...
  58. [58]
    [PDF] Regression Discontinuity Designs: A Guide to Practice
    This paper was prepared as an introduction to a special issue of the Journal of Econometrics on regression discontinuity designs.
  59. [59]
    [PDF] HOW MUCH SHOULD WE TRUST DIFFERENCES-IN ...
    HOW MUCH SHOULD WE TRUST. DIFFERENCES-IN-DIFFERENCES ESTIMATES? ∗. Marianne Bertrand. Esther Duflo. Sendhil Mullainathan. This Version: June 2003. Abstract.
  60. [60]
    [PDF] NBER WORKING PAPER SERIES HOW MUCH SHOULD WE ...
    Difference in differences estimation, which deals with small effective sample size, and complicated error distribution, seems a particularly fertile ground ...
  61. [61]
    Causal Inference Methods for Combining Randomized Trials and ...
    Oct 7, 2025 · In this paper, we review the growing literature on methods for causal inference on combined RCTs and observational studies, striving for the ...
  62. [62]
    New 3ie handbook for measuring cost-effectiveness in impact ...
    Jun 4, 2024 · The handbook provides a comprehensive 'how-to' guide for implementing cost-effectiveness analysis (CEA) in impact evaluation.
  63. [63]
    Sounds good… but what will it cost? Making the case for rigorous ...
    Dec 11, 2019 · The standards of rigor for integrating CEA/CBA analysis into academic impact evaluation studies in development economics are not well-defined.
  64. [64]
    [PDF] Integrating Value for Money and Impact Evaluations
    An impact evaluation was classified as having a cost-benefit analysis (CBA) if it included a comparison of estimates of Costs and Benefits (with data of costs ...
  65. [65]
    Why don't economists do cost analysis in their impact evaluations?
    May 10, 2016 · Cost-benefit (CB) analysis examines the rate of return of an intervention: For example, what is the present value of lifetime benefits of a ...
  66. [66]
    Integrating Value for Money and Impact Evaluations - eScholarship
    This mixed methods study investigates why fewer than one in five impact evaluations integrates a value-for-money analysis of the development intervention ...
  67. [67]
    Randomised controlled trial | Better Evaluation
    Nov 12, 2021 · An impact evaluation approach that compares results between a randomly assigned control group and experimental group or groups to produce an ...
  68. [68]
    [PDF] Instruments of development: Randomization in the tropics, and the ...
RCTs are seen as generating gold standard evidence that is superior to econometric evidence, and that is immune to the methodological criticisms ...
  69. [69]
    [PDF] Alternatives to Traditional Randomized Controlled Trials
    Randomized controlled trials (RCTs) have long been considered the “gold standard” for evaluating program impacts. Randomization minimizes selection-related.
  70. [70]
    Rethinking the pros and cons of randomized controlled trials ... - NIH
    Jan 18, 2024 · Randomized controlled trials (RCTs) have traditionally been considered the gold standard for medical evidence. However, in light of emerging ...
  71. [71]
    Methods for Evaluating Causality in Observational Studies - NIH
    In clinical medical research, causality is demonstrated by randomized controlled trials (RCTs). Often, however, an RCT cannot be conducted for ethical reasons, ...
  72. [72]
    Chapter 26 Quasi-Experimental Methods | A Guide on Data Analysis
    Quasi-experimental methods offer valuable tools for causal inference when RCTs are not feasible. However, these designs come with important limitations that ...
  73. [73]
    How to Use Quasi-Experimental Methods in Cardiovascular Research
Feb 16, 2024 · In research, randomized controlled trials (RCTs) provide the strongest causal inference for treatment and effect. Increasingly, quasi- ...
  74. [74]
    [PDF] Some Comments on Deaton (2009) and Heckman and Urzua (2009)
For support for his position that “Randomization is not a gold standard” (Deaton, p. 4), Deaton quotes Nancy Cartwright (2007) as claiming that “there is no ...
  75. [75]
    Understanding and misunderstanding randomized controlled trials
    According to Chalmers (2001) and Bothwell and Podolsky (2016), the development of randomization in medicine originated with Bradford-Hill, who used ...
  76. [76]
    A comparison of four quasi-experimental methods: an analysis of the ...
Nov 3, 2022 · The aim of this study is to compare some of the commonly used non-experimental methods in estimating intervention effects, and to highlight their relative ...
  77. [77]
    [PDF] Should the Randomistas (Continue to) Rule?
    While RCTs have an important place in the toolkit for impact evaluation, an unconditional preference for RCTs as the “gold standard” is questionable on three ...
  78. [78]
    The Abdul Latif Jameel Poverty Action Lab
... J-PAL conducts randomized impact evaluations to answer critical questions in the fight against poverty. Overview. The Abdul Latif Jameel Poverty Action Lab (J- ...
  79. [79]
    Are randomised controlled trials positivist? Reviewing the social ...
    We conclude that the most appropriate paradigm for RCTs of social interventions is realism not positivism.
  80. [80]
    [PDF] Theory-based impact evaluation
    Theory-based evaluation does not estimate the net effect of an intervention, but it can help us identify controls and confounding factors that can inform the ...
  81. [81]
    Using realist evaluation to open the black box of knowledge translation
    Sep 5, 2014 · Theory-based or theory-driven approaches provide an alternative to black box evaluation that examine not only outcome, but also the possible ...
  82. [82]
    Theory-Based Approaches to Evaluation: Concepts and Practices
    Mar 22, 2021 · Approaches include theory-based evaluation (Weiss, 1995, 2000), theory-driven evaluation ... Theory-based evaluation and varieties of complexity.
  83. [83]
    Postpositivist Paradigm and Program Evaluation
    Oct 8, 2025 · The main differences between positivism and postpositivism are the level of certainty and their contrasting positions on metaphysics.
  84. [84]
  85. [85]
    Issues in the theory-driven perspective - ScienceDirect
    There is currently a strong movement in program evaluation to move from black box evaluations, concerned primarily with the relationship between the inputs ...
  86. [86]
    Understanding and misunderstanding randomized controlled trials
    RCTs can play a role in building scientific knowledge and useful predictions but they can only do so as part of a cumulative program.
  87. [87]
    The ethics of a control group in randomized impact evaluations
    Jul 6, 2011 · One concern is with equity. Systematically favoring the treatment subjects with an intervention can be seen as unfair (although presumably we ...
  88. [88]
    [PDF] Deaton Cartwright RCTs with ABSTRACT August 25
    Aug 25, 2025 · Understanding and misunderstanding randomized controlled trials. Angus Deaton and Nancy Cartwright. Princeton University.
  89. [89]
    [PDF] An Introduction to Impact Evaluations with Randomized Designs1
Randomized experiments are increasingly popular ways to evaluate the impacts of development interventions. They provide hope that we can overcome important ...
  90. [90]
    The Problem With Evidence-Based Policies by Ricardo Hausmann
    Feb 25, 2016 · Ricardo Hausmann shows why randomized control trials are the wrong way to test interventions in many areas.
  91. [91]
    Reconsidering evidence-based policy: Key issues and challenges
Key issues include the relevance of evidence, interaction between research and policy, and the view of EBP as "technocratic" with a preference for quantitative ...
  92. [92]
    [PDF] Instruments, Randomization, and Learning about Development
    RCTs are seen as generating gold standard evidence that is superior to econometric evidence and that is immune to the methodological criticisms that are.
  93. [93]
    Microcredit: Impacts and promising innovations - Poverty Action Lab
    May 1, 2023 · A meta-analysis of seven randomized evaluations similarly found that the impact of microcredit was negligible for households with no business ...
  94. [94]
    Publication: Evaluation of Development Programs
    In this context RCTs are less suitable even for the simplest interventions. The TPE can be estimated by applying regression techniques to observational data ...
  95. [95]
    [PDF] Using RCTs to Estimate Long-Run Impacts in Development ...
    This review article surveys what we have learned about the determinants of long-run living standards from this growing body of RCTs in development economics, ...
  96. [96]
    Conditional Cash Transfers: The Case of Progresa/Oportunidades
    This article reviews the literature on the development, evaluation, and findings of Progresa/Oportunidades, summarizing what is known about program effects.
  97. [97]
    The Impact of PROGRESA on Health in Mexico - Poverty Action Lab
    PROGRESA involves a cash transfer that is conditional on the recipient household engaging in a set of behaviors designed to improve health and nutrition.
  98. [98]
    The impact of Mexico's conditional cash transfer programme ... - NIH
    The Oportunidades conditional cash transfer programme improved birthweight outcomes. This finding is relevant to countries implementing conditional cash ...
  99. [99]
    Unconditional Cash Transfers: A Bayesian Meta-Analysis of ...
    Aug 1, 2024 · We use Bayesian meta-analysis methods to estimate the impact of unconditional cash transfers (UCTs). Aggregating evidence from 115 studies of 72 UCT programs ...
  100. [100]
    The impact of mass deworming programmes on schooling and ... - NIH
    The study did not find any evidence of effect on nutritional status, cognitive tests or school grades achieved, but these are not reported in the abstracts.
  101. [101]
    Deworm the World | Evidence Action
    A 2022 meta-analysis found that deworming leads to an average weight gain of 0.3kg in children (that's the equivalent to moving a three-year-old from the 25th ...
  102. [102]
    Reanalysis of health and educational impacts of a school ... - 3ie
    3ie funded a two-part replication study of Edward Miguel and Michael Kremer's well-known impact evaluation of a school-based deworming programme in Kenya.
  103. [103]
    [PDF] Six Randomized Evaluations of Microcredit - MIT Economics
    Causal evidence on microcredit impacts informs theory, practice, and debates about its effectiveness as a development tool. The six randomized evaluations ...
  104. [104]
    First generation of microcredit RCTs - Microfinance - VoxDev
    Jan 30, 2025 · In this section, we review randomised controlled trials (RCTs) that provide causal evidence on the impacts of microcredit programmes.
  105. [105]
    [PDF] Should the Randomistas (Continue to) Rule?
One source is the existence of externalities in evaluations. There is evidence that having an impact evaluation in place for an ongoing development project ...
  106. [106]
    [PDF] Evaluation of Development Programs: Randomized Controlled ...
Sep 7, 2013 · An RCT evaluation might involve drawing a random sample from the population and assign treatment randomly within this sample. The researcher ...
  107. [107]
    Impact of institutional reform on development outcomes - GSDRC
    Recent studies find that many institutional reforms do not seem to make government function better, often have quite poor results, and rarely lead to ...
  108. [108]
  109. [109]
    the impact evaluation of the institutional reforms of the one-stop ...
    Jan 5, 2022 · This paper examines the impacts of institutional reform of the One-Stop Service (OSS) structures on increases in Indonesia's economic growth.
  110. [110]
    A multicity randomized trial at crime hot spots - PMC - NIH
Mar 28, 2022 · Our study is a randomized trial in policing confirming that intensive training in procedural justice (PJ) can lead to more procedurally just behavior.
  111. [111]
    Full article: The Impact of Training on Use of Force by Police in an ...
    Oct 16, 2024 · We conclude that the PPST curriculum appears effective at reducing use of force by police in a large scale, robust trial.
  112. [112]
    Public sector reforms and their impact on the level of corruption
    May 24, 2021 · The focus of this review is administrative corruption, namely corrupt acts involving civil servants in their dealings with their superiors, ...
  113. [113]
    J-PAL Courses | The Abdul Latif Jameel Poverty Action Lab
J-PAL courses help implementers, policymakers, and researchers become better users and producers of evidence and equip learners with skills in impact evaluation ...
  114. [114]
    Our Impact | Innovations for Poverty Action
    Sep 28, 2022 · IPA is the R&D engine of the development sector, with high-quality research using the same method used in medical trials, ie randomized evaluations.
  115. [115]
    Resources and Tools for Impact Evaluation | IPA
    IPA assembled this set of resources for use in designing and running an impact evaluation. Beginning with the need for a theory-driven evaluation.
  116. [116]
    Identifying When, Why, and How to Use Impact Evaluations | IPA
    This case study provides lessons learned on identifying how, when, and why to conduct an impact assessment in large organizations.
  117. [117]
    The Strategic Impact Evaluation Fund (SIEF) - World Bank
    The World Bank's Strategic Impact Evaluation Fund (SIEF) supports scientifically rigorous research that measures the impact of programs and policies.
  118. [118]
    IFPRI and causal impact evaluation: Evidence for real-life policies
    Sep 25, 2025 · An excellent example is the impact evaluation of the HarvestPlus Reaching End Users (REU) program, which showed that an integrated approach to ...
  119. [119]
    Reinvigorating Impact Evaluation for Global Development
2011. Marking a major step forward for impact evaluations of aid programs, the Millennium Challenge Corporation (MCC) and the US Agency ...
  120. [120]
    [PDF] Randomizing Development: Method or Madness? - Lant Pritchett
    Arguments that RCT research is a good (much less “best”) investment depend on both believing in an implausibly low likelihood that non-RCT research can improve ...
  121. [121]
    Randomized control trials for development? Three problems
    May 11, 2017 · First, there is a systematic bias toward analysis of private goods as opposed to public goods. Private goods are excludable since a seller needs ...
  122. [122]
    [PDF] The Debate about RCTs in Development is Over - Lant Pritchett
    returns to contract teachers from dozens of experiences (Murgai and Pritchett 2006) but also already known but scalability was limited as every single one was ...
  123. [123]
    Some questions of ethics in randomized controlled trials - Khera
    May 26, 2023 · This paper highlights eight areas of concern. RCTs also have a disproportionate influence on shaping research agendas and on policy.
  124. [124]
    Machine learning in policy evaluation: new tools for causal inference
    Mar 1, 2019 · Abstract:While machine learning (ML) methods have received a lot of attention in recent years, these methods are primarily for prediction.
  125. [125]
    A comparison of methods for health policy evaluation with controlled ...
    In our simulations, the generalized synthetic control approach outperformed more commonly used methods (difference‐in‐differences and synthetic control methods ...
  126. [126]
    Using Multiple Outcomes to Improve the Synthetic Control Method
    Feb 21, 2025 · The synthetic control method (SCM) estimates a treated unit's counterfactual untreated outcome via a weighted average of observed outcomes for ...Missing: innovations | Show results with:innovations<|separator|>
  127. [127]
    Examination of the Synthetic Control Method for Evaluating Health ...
    Oct 7, 2015 · This paper examines the synthetic control method in contrast to commonly used difference‐in‐differences (DiD) estimation, in the context of a re‐evaluation of ...
  128. [128]
    Emerging Trends in Impact Evaluation: 7 Innovative Approaches to ...
    Jan 15, 2025 · By adopting trending methodologies such as mixed-methods evaluation, real-time monitoring, and big data analytics, development practitioners can ...
  129. [129]
    Leveraging Imagery Data in Evaluations
    Feb 26, 2024 · This paper explores the potential of imagery data in evaluations and presents various data types and methodologies demonstrating their advantages and ...Missing: big administrative records
  130. [130]
    Using big data for evaluating development outcomes: A systematic ...
    The study maps different sources of big data onto development outcomes (based on SDGs) to identify current evidence base, use and the gaps.
  131. [131]
    [PDF] Geospatial Analysis in Impact Evaluation - 3ie
Geospatial analysis uses data to measure intervention impacts, align data spatially, and process remotely sensed observations, enabling more precise analysis.
  132. [132]
    State of play for big data in impact evaluation - 3ie
    Dec 21, 2023 · This presentation provides an overview of one stock-taking effort recently conducted by 3ie to get a sense of the scope, scale, and applications of big data ...
  133. [133]
    Recommendation 2: Digital Transformation
    Technological advances in Wi-Fi, cell phones, GPS, and satellite imagery have made gathering and sharing data much easier, and new types of software make this ...
  134. [134]
    Bottlenecks for Evidence Adoption | Journal of Political Economy
    We study 30 US cities that ran 73 RCTs with a national nudge unit. Cities adopt a nudge treatment into their communications in 27% of the cases.
  135. [135]
    Policy Evaluation in Polarized Polities: The Case of Randomized ...
    This paper provides a political-economic analysis of policy evaluation. We focus on Randomized Controlled Trials (RCTs) as a subset of policy evaluations.
  136. [136]
    A systematic review of barriers to and facilitators of the use of ...
    Jan 3, 2014 · The most frequently reported barriers to evidence uptake were poor access to good quality relevant research, and lack of timely research output.
  137. [137]
    Scientific evidence and public policy: a systematic review of barriers ...
    Barriers included institutional fragmentation, limited access to actionable data, political resistance to scientific inputs, and lack of incentives for ...
  138. [138]
    Evidence-based policymaking is not like evidence-based medicine ...
    Apr 26, 2017 · The most frequently-reported barriers relate to problems with disseminating high quality information effectively, namely the lack of time, ...
  139. [139]
    The challenges of scaling effective interventions: A path forward for ...
RCT evidence by itself offers an incomplete prediction of the effects of policy, due to heterogenous effects, spillovers and general equilibrium changes, ...
  140. [140]
    Implementing successful small interventions at a large scale is hard
    Mar 19, 2020 · Cost issues can make scaling prohibitive. Programs that have large and promising effects when delivered to small numbers of households or firms ...
  141. [141]
    [PDF] Let's Take the Con Out of Randomized Control Trials in Development
May 1, 2021 · Abstract. The enthusiasm for the potential of RCTs in development rests in part on the assumption that the use of the rigorous evidence that ...
  142. [142]
    If Randomised Control Trials (RCTs) improve global development ...
    Apr 21, 2020 · While the 'randomistas' proffer RCTs as the most rigorous approach to impact evaluation, there has been a pushback from critics on its gold-standard claim.
  143. [143]
    The challenges of scaling effective interventions: A path forward for ...
    We suggest strategies for tightening the link between development research and anti-poverty policy, for example, by changing the practice of RCTs.