Causal inference

Causal inference is the branch of statistics dedicated to identifying and estimating cause-and-effect relationships from observational or experimental data, going beyond mere associations to determine how interventions on one variable affect outcomes in others. It relies on explicit causal assumptions, such as unconfoundedness or the absence of interference, to interpret effects like the average treatment effect (ATE), defined as the expected difference in potential outcomes under treatment and control. Unlike correlational analysis, which measures dependencies like P(Y|X), causal inference addresses interventional queries like P(Y|do(X)), using tools such as counterfactual reasoning to evaluate "what if" scenarios.

The field encompasses several foundational frameworks, including the potential outcomes model developed by Jerzy Neyman in 1923 and formalized by Donald Rubin in 1974, which defines causal effects for individual units as the difference between outcomes had the unit received treatment versus control, aggregated to population-level estimates under assumptions like the stable unit treatment value assumption (SUTVA). Complementing this is Judea Pearl's structural causal model (SCM), introduced in the 1990s, which integrates graphical models with structural equations to represent causal mechanisms, enabling identification via criteria like back-door adjustment to control for confounders. These approaches trace roots to early 20th-century work, such as Sewall Wright's path analysis in 1921 for genetic causation, and have evolved through econometric contributions like James Heckman's selection models in 1979. The importance of these methods was recognized by the 2021 Nobel Memorial Prize in Economic Sciences awarded to David Card, Joshua D. Angrist, and Guido W. Imbens for their empirical approach to analyzing causal relationships.

Key methods for causal estimation include randomized controlled trials (RCTs), the gold standard for establishing causality through randomization to balance covariates, as emphasized by Ronald Fisher in 1935; propensity score matching and weighting to mimic randomization in observational data; instrumental variables (IV) to address endogeneity, as in Angrist, Imbens, and Rubin's 1996 work on local average treatment effects (LATE); and regression discontinuity designs exploiting cutoff rules for quasi-experimental variation. Modern advancements incorporate machine learning, such as double machine learning for robust inference amid high-dimensional confounders and causal forests for heterogeneous effects.

Causal inference is pivotal across disciplines: in medicine for evaluating interventions like vaccine efficacy; in economics for policy impacts such as minimum wage effects; in social sciences for program evaluations like job training initiatives; and in machine learning for counterfactual reasoning in personalized recommendations or algorithmic fairness. Challenges persist, including handling interference, spillovers, and untestable assumptions, underscoring the need for transparent modeling and sensitivity analyses.

Introduction

Definition and Scope

Causal inference is the process of determining whether, to what extent, and how a cause contributes to an effect, employing statistical, epidemiological, and computational methods to estimate causal effects from data. This discipline formalizes assumptions about the data-generating process to distinguish genuine causal relationships from mere associations, enabling researchers to answer questions about interventions and their impacts on outcomes of interest. The philosophical roots of causal inference trace back to David Hume's 18th-century ideas, where causation is understood as arising from the repeated observation of constant conjunction between events, rather than any inherent necessary connection discernible by reason alone. In modern practice, causal inference spans both experimental and non-experimental settings: randomized controlled trials (RCTs) serve as the gold standard by balancing participant characteristics through randomization to attribute outcomes directly to interventions, while observational studies address scenarios where RCTs are unethical, impractical, or cost-prohibitive. However, real-world observational data often introduce challenges, such as limited generalizability due to non-representative samples and vulnerability to biases that RCTs mitigate more effectively. As an interdisciplinary field, causal inference integrates insights from statistics, epidemiology, economics, computer science, and philosophy, providing a unifying lens for cause-effect analysis across the health and social sciences and beyond. The potential outcomes framework exemplifies this by modeling what outcomes would occur under different interventions, though it requires careful assumption validation.

Historical Overview

The philosophical foundations of causal inference trace back to David Hume's 1748 work, An Enquiry Concerning Human Understanding, where he argued that causation arises from the constant conjunction of events observed in experience, rather than any inherent necessary connection between cause and effect discernible by reason alone. Hume emphasized that our belief in causal relations stems from habitual association formed through repeated observations of events occurring together, laying the groundwork for distinguishing empirical patterns from deeper causal mechanisms.

In the late 19th and early 20th centuries, the development of statistical methods began to formalize the study of associations that Hume had described philosophically. Karl Pearson introduced the correlation coefficient in 1895 as a measure of linear dependence between variables, providing a quantitative tool to assess the strength of observed conjunctions, though it could not distinguish causation from mere association. Building on this, Ronald Fisher advanced experimental design in the 1920s and 1930s, particularly through his 1935 book The Design of Experiments, where he stressed the importance of randomization to ensure that observed effects in controlled trials could be attributed to the intervention rather than confounding factors.

Mid-20th-century contributions shifted focus toward rigorous frameworks for estimating causal effects. Jerzy Neyman formalized the potential outcomes model in 1923, originally in the context of agricultural field experiments, defining causal effects as the difference between outcomes under treatment and control for the same units, and highlighting the role of randomization in unbiased estimation. In the 1970s, Donald Rubin refined this approach, extending it to nonrandomized studies by articulating what is now known as the Rubin causal model, which clarified assumptions like stable unit treatment value and the need for matching or weighting to approximate randomization.

The late 20th century saw the integration of graphical representations to model causal structures. Judea Pearl developed causal graphical models in the 1980s and 1990s, introducing directed acyclic graphs to encode assumptions about confounding and enabling identification strategies like the do-calculus for interventional queries in observational data. Entering the 21st century, causal inference merged with machine learning, exemplified by the double machine learning framework proposed by Chernozhukov et al. in 2016 (published 2018), which combines flexible prediction algorithms with debiased estimation to handle high-dimensional confounders while targeting causal parameters.

Core Concepts

Causation versus Correlation

In causal inference, a fundamental challenge is distinguishing between correlation, which measures the extent to which two variables co-vary, and causation, which implies that changes in one variable directly produce changes in another. Correlation is typically quantified using Pearson's product-moment correlation coefficient, defined as r = \frac{\cov(X,Y)}{\sigma_X \sigma_Y}, where \cov(X,Y) is the covariance between variables X and Y, and \sigma_X and \sigma_Y are their standard deviations. This metric, introduced by Karl Pearson in 1895, captures linear associations but provides no insight into whether one variable influences the other. In contrast, causation requires evidence from interventions, such as whether forcing X to a specific value (denoted as do(X)) alters Y, as formalized in Judea Pearl's framework where the interventional distribution P(Y \mid do(X)) differs from the observational conditional P(Y \mid X). Without such evidence, observed associations may reflect mere coincidence, confounding, or other non-causal mechanisms.

Several common fallacies arise when equating correlation with causation. Spurious correlations occur when two variables appear related due to a third factor or random chance, rather than any direct link; for instance, seasonal increases in both ice cream sales and shark attacks are driven by warmer weather increasing beachgoers and ice cream consumption, not by ice cream attracting sharks. Reverse causation reverses the assumed direction, as when an outcome influences the exposure, such as early symptoms of illness prompting behavioral changes that mimic the exposure causing the disease. Collider bias emerges when analyzing data conditioned on a "collider" variable—a common effect of both exposure and outcome—which artificially induces an association between them; for example, restricting analysis to hospitalized patients (a collider affected by both disease severity and treatment-seeking behavior) can create spurious links between unrelated risk factors.

Illustrative historical examples highlight these issues. In the mid-20th century, epidemiological observations revealed a strong correlation between smoking and lung cancer, but skeptics initially dismissed it as non-causal, attributing it to personality traits or genetic factors shared by smokers and cancer patients; only through rigorous case-control studies by Richard Doll and Austin Bradford Hill in 1950, showing odds ratios up to 30 times higher for heavy smokers, did evidence mount for smoking as the cause. Simpson's paradox further demonstrates how correlations can mislead in aggregated data: in one classic setup, a treatment may appear less effective overall but superior within subgroups (e.g., by patient severity), reversing when data are pooled due to uneven group sizes—a phenomenon first described by Edward Simpson in 1951 and rooted in earlier work by Karl Pearson and George Udny Yule.

To establish causation, observational studies require controlling for confounders—variables influencing both exposure and outcome—or, preferably, randomization to break such dependencies. Ronald Fisher emphasized randomization in experimental design as early as 1925, arguing it ensures treatment assignment is independent of potential outcomes, thereby isolating causal effects without systematic bias. Without these safeguards, correlations remain suggestive at best but insufficient for causal claims.
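The confounding mechanism behind such spurious correlations can be made concrete with a small simulation. The sketch below is illustrative only: the weather-style confounder and all parameter values are hypothetical. A common cause produces a strong observational correlation between two variables that have no causal link, and the correlation vanishes under a simulated intervention:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical confounder (e.g., warm weather) drives both X and Y;
# X has no causal effect on Y.
z = rng.normal(size=n)
x = z + rng.normal(size=n)   # e.g., ice cream sales
y = z + rng.normal(size=n)   # e.g., shark attacks

# Observational association is strongly positive despite no causal link.
print(f"corr(X, Y) = {np.corrcoef(x, y)[0, 1]:.2f}")  # ~0.5

# Simulated intervention do(X): X is set externally, cutting the Z -> X link,
# so the association with Y disappears.
x_do = rng.normal(size=n)
print(f"corr(do(X), Y) = {np.corrcoef(x_do, y)[0, 1]:.2f}")  # ~0.0
```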

Potential Outcomes Framework

The potential outcomes framework, also known as the Rubin causal model, formalizes causal inference through counterfactual reasoning, defining causal effects as comparisons between outcomes that would occur under different conditions for the same units. This approach treats potential outcomes as fixed but unobserved quantities, enabling precise statistical definitions of treatment effects without requiring mechanistic models of how treatments operate. Originating from Neyman's work on randomized experiments and extended by Donald Rubin, the framework shifts focus from associations to what would have happened had treatment assignment differed.

Central to the framework are potential outcomes for each unit i: Y_i(1), the outcome under treatment, and Y_i(0), the outcome under control. The individual causal effect for unit i is then \tau_i = Y_i(1) - Y_i(0). Since both potential outcomes cannot be observed for any single unit—the fundamental problem of causal inference—the average treatment effect (ATE) aggregates across units as \mathbb{E}[\tau_i] = \mathbb{E}[Y(1) - Y(0)]. This expectation represents the population-level causal impact of treatment.

To identify the ATE from observed data, key assumptions are required, including the Stable Unit Treatment Value Assumption (SUTVA), which posits no interference between units (one unit's treatment does not affect another's outcome) and consistency (the observed outcome matches the potential outcome under the assigned treatment, with no hidden variations in treatment delivery). Another critical assumption is ignorability, or the absence of unmeasured confounding, stating that treatment assignment is independent of the potential outcomes conditional on observed covariates: \{Y(1), Y(0)\} \perp T \mid X.

In randomized controlled trials (RCTs), randomization directly satisfies ignorability by balancing both observed and unobserved covariates across groups, allowing unbiased estimation of the ATE as the difference in observed means: \mathbb{E}[Y \mid T=1] - \mathbb{E}[Y \mid T=0] = \mathbb{E}[Y(1) - Y(0)]. Under the assumptions of SUTVA and randomization, this simple difference identifies the causal effect without further adjustment. For example, in a trial evaluating a drug's efficacy, the ATE quantifies the average improvement in outcomes attributable to the drug across all participants.

The framework extends beyond the ATE to other estimands, such as the average treatment effect on the treated (ATT), defined as \mathbb{E}[Y(1) - Y(0) \mid T=1], which focuses on the causal effect for units actually receiving treatment and is particularly relevant in observational settings where treatment uptake is selective. It also accommodates heterogeneous treatment effects, where \tau_i varies across units due to interactions with covariates, enabling analyses like \mathbb{E}[\tau_i \mid X=x] to reveal effect moderation. These extensions maintain the core counterfactual logic while supporting targeted inferences in diverse applications.
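As a minimal sketch of the framework (simulated data with hypothetical parameter values, not any real study), the following code generates both potential outcomes for each unit, randomizes treatment, and shows that the difference in observed means recovers the true ATE:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Potential outcomes: Y(0) is the baseline; Y(1) adds a heterogeneous effect tau_i.
y0 = rng.normal(10.0, 2.0, size=n)
tau = rng.normal(1.5, 0.5, size=n)   # true ATE = 1.5
y1 = y0 + tau

# Randomized assignment satisfies ignorability by construction.
t = rng.integers(0, 2, size=n)

# Consistency: the observed outcome is the potential outcome under the assigned arm.
y_obs = np.where(t == 1, y1, y0)

# Under SUTVA and randomization, the difference in means identifies the ATE.
ate_hat = y_obs[t == 1].mean() - y_obs[t == 0].mean()
print(f"true ATE = {tau.mean():.3f}, estimated ATE = {ate_hat:.3f}")
```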

Structural Causal Models

Structural causal models (SCMs) formalize causal relationships through a combination of directed acyclic graphs (DAGs) and structural equations, enabling the representation and analysis of causal structures in complex systems. In this framework, each variable is depicted as a node in the DAG, with directed edges signifying causal mechanisms from cause to effect variables. Exogenous variables, which are not influenced by other variables in the model, capture external influences, while endogenous variables are determined by the structural equations involving their parents in the graph. This graphical structure allows for explicit modeling of causal pathways, including confounders—common causes that produce spurious associations between variables by sending edges to multiple descendants.

A central feature of SCMs is the do-operator, which encodes interventions by severing incoming edges to a variable and setting it to a specific value, thereby distinguishing causal effects from mere associations. The interventional query P(Y \mid do(X = x)) estimates the distribution of Y under an intervention that forces X to x, in contrast to the observational conditional P(Y \mid X = x), which may be confounded. To identify such effects from observational data, the backdoor criterion provides a graphical test: a set of variables Z is admissible for adjustment if it contains no descendants of X and blocks all backdoor paths—non-directed paths from X to Y that initiate with an arrow into X. Under this criterion, the causal effect is given by the backdoor adjustment formula:

P(Y \mid do(X)) = \sum_z P(Y \mid X, z) P(z)

where the summation is over the values of Z. This formula recovers the interventional distribution solely from observable data.

For scenarios involving unmeasured confounders, the front-door criterion offers an alternative identification strategy, particularly useful for mediation analysis. It applies when a mediator set M intercepts all directed paths from X to Y, no unblocked backdoor paths exist from X to M, and all backdoor paths from M to Y are blocked by X. The effect is then identifiable as

P(Y \mid do(X = x)) = \sum_m P(M = m \mid X = x) \sum_{x'} P(Y \mid X = x', M = m) P(X = x'),

leveraging the mediator to bypass direct confounding. Additionally, d-separation serves as the foundational criterion for reading conditional independencies from the DAG: two sets of variables are conditionally independent given a third set if every path between them is blocked, where a path is blocked if it contains a non-collider in the conditioning set or a collider (with no conditioned descendants) outside it. This property underpins the graphical model's ability to encode the joint distribution via Markov factorization.

SCMs offer significant advantages in identification, as the explicit graphical representation facilitates handling unmeasured confounding when the causal structure is known, allowing identification strategies that observational conditionals alone cannot achieve. Furthermore, the framework supports causal discovery algorithms that infer DAG structures from patterns of conditional independencies and dependencies in data, bridging structural assumptions and empirical inference. This graphical approach complements the potential outcomes framework by providing tools for structural identification and counterfactual analysis.
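The backdoor adjustment formula can be applied directly to data when the confounder is observed. The sketch below uses simulated binary variables with an assumed true effect of +0.10 (all values hypothetical) and contrasts the confounded observational contrast with the adjusted estimate:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 200_000

# Binary confounder Z -> X and Z -> Y; the true effect of X on Y is +0.10.
z = rng.binomial(1, 0.5, size=n)
x = rng.binomial(1, 0.2 + 0.6 * z)            # Z raises P(X = 1)
y = rng.binomial(1, 0.1 + 0.1 * x + 0.5 * z)  # Z raises P(Y = 1)
df = pd.DataFrame({"z": z, "x": x, "y": y})

# Naive contrast P(Y=1 | X=1) - P(Y=1 | X=0) is confounded by Z.
naive = df.loc[df.x == 1, "y"].mean() - df.loc[df.x == 0, "y"].mean()

# Backdoor adjustment: average stratum-specific contrasts weighted by P(z).
adjusted = sum(
    ((df.loc[(df.x == 1) & (df.z == v), "y"].mean()
      - df.loc[(df.x == 0) & (df.z == v), "y"].mean()) * (df.z == v).mean())
    for v in (0, 1)
)
print(f"naive = {naive:.3f}, backdoor-adjusted = {adjusted:.3f}")  # adjusted ~ 0.10
```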

Methodological Foundations

Experimental Approaches

Experimental approaches in causal inference primarily rely on randomized controlled trials (RCTs), which are considered the gold standard for establishing causal relationships due to their ability to minimize bias through randomization. In an RCT, participants are randomly allocated to either a treatment group receiving the intervention or a control group receiving a placebo or standard care, ensuring that known and unknown confounders are balanced across groups on average. This process underpins the internal validity of RCTs, allowing researchers to attribute differences in outcomes directly to the intervention rather than selection biases or confounding variables.

To further reduce bias, RCTs often incorporate blinding, where participants, researchers, or both are unaware of the group assignments. Single-blind designs mask the assignment from participants to prevent placebo effects, while double-blind designs additionally conceal it from those administering the treatment to avoid observer bias. These elements of design help isolate the causal effect of the intervention, assuming the stable unit treatment value assumption (SUTVA) holds, where the treatment received by one unit does not affect others.

Analysis of RCT data typically employs intention-to-treat (ITT) principles, which include all randomized participants in their assigned groups regardless of compliance, preserving randomization and providing a pragmatic estimate of the intervention's real-world effect. In contrast, per-protocol analysis restricts the sample to those who fully adhered to the assigned treatment, yielding a more explanatory estimate but potentially introducing selection bias from non-random dropout. Sample size calculation is crucial for adequate statistical power; for detecting a difference in means \delta between two groups with standard deviation \sigma, assuming equal group sizes and a two-sided test, the required sample size per group is given by:

n = \frac{(Z_{\alpha/2} + Z_{\beta})^2 \cdot 2\sigma^2}{\delta^2}

where Z_{\alpha/2} and Z_{\beta} are the z-scores for the significance level and power, respectively.

The primary strength of RCTs lies in their high internal validity, achieved through randomization, which enables unbiased estimation of causal effects under ideal conditions. However, generalizability to broader populations—external validity—can be limited by strict eligibility criteria or controlled settings that do not reflect real-world variability. A landmark example is the 1954 Salk polio vaccine field trial, involving over 1.8 million children, with over 600,000 randomly assigned to vaccine or placebo groups across multiple U.S. sites, which demonstrated the vaccine's efficacy in reducing paralytic polio cases by about 80% in the vaccinated cohort. In technology, A/B testing applies RCT principles to compare variants, such as webpage layouts, by randomly exposing subsets of users and measuring outcomes like click-through rates to infer causal impacts on engagement. Despite these advantages, RCTs face limitations including high costs for large-scale implementation, ethical concerns when withholding potentially beneficial treatments (e.g., placebo arms in superiority trials), and challenges in generalization when trial conditions differ from everyday practice.
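The sample size formula above translates directly into code. A minimal sketch (the function name and default arguments are illustrative, not a standard API) using standard normal quantiles from SciPy:

```python
import math
from scipy.stats import norm

def sample_size_per_group(delta: float, sigma: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """n per group for a two-sided, two-sample comparison of means."""
    z_alpha = norm.ppf(1 - alpha / 2)   # e.g., 1.96 for alpha = 0.05
    z_beta = norm.ppf(power)            # e.g., 0.84 for 80% power
    n = (z_alpha + z_beta) ** 2 * 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)

# Detecting a 5-point mean difference with SD 20 at alpha = 0.05 and 80% power:
print(sample_size_per_group(delta=5, sigma=20))  # 252 per group
```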

Observational Data Challenges

Observational data, unlike data from randomized experiments, lack random assignment to treatments, making it difficult to distinguish causal effects from mere associations due to systematic biases. These biases can arise from the data generation process itself, leading to distorted causal inferences if not properly addressed.

Confounding represents a core challenge, occurring when an unmeasured or uncontrolled variable influences both the treatment assignment and the outcome, thereby creating a spurious association between them. For example, in studies assessing the causal impact of education on health outcomes, socioeconomic status often acts as a confounder by simultaneously shaping access to education and health-related behaviors or resources.

Selection bias emerges from non-random inclusion of subjects into the study sample, which can distort the distribution of variables and induce artificial dependencies. This includes collider bias, where conditioning on a common effect of the exposure and outcome opens a non-causal path, potentially reversing or exaggerating associations; Berkson's bias, a historical form of selection bias in hospital-based studies, illustrates how selection on multiple conditions can bias estimates toward the null for independent risks. In epidemiological cohort studies, the healthy adherer effect exemplifies such bias, where individuals who adhere to treatments tend to engage in other health-promoting behaviors, leading to overestimation of treatment benefits as healthier users systematically differ from non-adherers.

Measurement error in covariates or outcomes adds another layer of complication, as inaccuracies in recorded variables can bias causal estimates. Classical measurement error, characterized by observed values as true values plus independent noise, generally attenuates effect estimates toward zero in linear models. Berkson error, conversely, involves true values fluctuating around a fixed observed value, which may preserve or even inflate associations depending on the error structure and model assumptions.

Basic strategies to mitigate these challenges in observational data include matching, which pairs treated and untreated units based on observed covariates to approximate the balance achieved by randomization, and stratification, which divides the sample into homogeneous subgroups to control for confounders within each layer. These methods seek to close backdoor paths from treatment to outcome, aligning with criteria from structural causal models.
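Collider bias is easy to reproduce in simulation. In this sketch (a hypothetical hospitalization example loosely mirroring Berkson's bias; all parameters invented), two independent causes become negatively associated once the sample is restricted to the collider's positive stratum:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Two independent causes of hospitalization: disease severity and care-seeking.
severity = rng.normal(size=n)
seeking = rng.normal(size=n)
hospitalized = (severity + seeking + rng.normal(size=n)) > 1.0  # collider

# Unconditionally, the two causes are uncorrelated.
print(f"all subjects: {np.corrcoef(severity, seeking)[0, 1]:+.2f}")  # ~0.00

# Conditioning on the collider (hospitalized patients only) induces a
# spurious negative association between them.
sel = hospitalized
print(f"hospitalized: {np.corrcoef(severity[sel], seeking[sel])[0, 1]:+.2f}")  # < 0
```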

Quasi-Experimental Designs

Quasi-experimental designs leverage natural or policy-induced variations to approximate the conditions of randomized experiments, enabling causal inference in observational settings where true randomization is infeasible. These methods exploit discontinuities, time-based interventions, or comparative group structures to identify treatment effects, often under assumptions that mimic randomization locally or over time. By addressing confounding through such designs, researchers can estimate parameters akin to the average treatment effect (ATE) outlined in the potential outcomes framework, though with reliance on untestable identifying assumptions.

Difference-in-Differences (DiD)

Difference-in-differences compares changes in outcomes over time between a treated group exposed to an intervention and an untreated group, isolating the causal effect by differencing out common trends. This approach assumes parallel trends, meaning that in the absence of treatment, the outcome trajectories for both groups would evolve similarly over time. The DiD estimator is given by the difference in post- and pre-treatment outcome changes between groups:

\hat{\tau}_{DiD} = \left( E[Y_{post,treat} - Y_{pre,treat}] \right) - \left( E[Y_{post,control} - Y_{pre,control}] \right)

where Y denotes the outcome, subscripts indicate treatment status and time period, and E[\cdot] is the expectation operator. This formula captures the treatment effect under the parallel trends assumption, assuming no anticipation effects or spillover between groups. A seminal application is the study by Card and Krueger (1994), which used DiD to evaluate the 1992 minimum wage increase in New Jersey by comparing employment at fast-food restaurants in New Jersey (treated) and neighboring Pennsylvania (control) before and after the policy change, finding no significant employment reduction.
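A minimal numerical sketch of the 2x2 DiD estimator, using synthetic employment-style data with an assumed common trend of +1 and a true treatment effect of +2 (all values hypothetical):

```python
import numpy as np

def did_estimate(y_pre_t, y_post_t, y_pre_c, y_post_c):
    """Canonical 2x2 difference-in-differences estimator."""
    change_treated = np.mean(y_post_t) - np.mean(y_pre_t)
    change_control = np.mean(y_post_c) - np.mean(y_pre_c)
    return change_treated - change_control

rng = np.random.default_rng(4)
# Both groups share a common trend (+1); the treated group also receives +2.
pre_c = rng.normal(20, 1, 500)
post_c = pre_c + 1 + rng.normal(0, 1, 500)
pre_t = rng.normal(18, 1, 500)
post_t = pre_t + 1 + 2 + rng.normal(0, 1, 500)

print(f"DiD estimate: {did_estimate(pre_t, post_t, pre_c, post_c):.2f}")  # ~2.0
```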

Regression Discontinuity Design (RDD)

Regression discontinuity design exploits a known cutoff in a continuous running variable, such as a test score or an age threshold, where treatment assignment changes deterministically, creating local randomization around the cutoff. Near the cutoff, units just above and below are assumed comparable except for treatment receipt, allowing estimation of local causal effects. RDD variants include sharp RDD, where treatment status jumps fully at the cutoff (e.g., automatic eligibility for a program above a score threshold), and fuzzy RDD, where the probability of treatment increases discontinuously but compliance is imperfect, requiring instrumental variable techniques to estimate intent-to-treat and local average treatment effects. An influential example is Angrist and Lavy (1999), who applied RDD to Israel's Maimonides' rule capping class sizes at 40 students per teacher; enrollment just exceeding multiples of 40 triggered class splitting, revealing that smaller classes improved student test scores, particularly in early grades.
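A sharp RDD estimate can be sketched as two local linear fits on either side of the cutoff, with the jump between fitted intercepts as the effect. The data and fixed bandwidth below are illustrative only, not a substitute for principled bandwidth selection:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20_000

# Running variable (e.g., a score); treatment assigned when score >= 0.
score = rng.uniform(-1, 1, size=n)
treated = (score >= 0).astype(float)
y = 2.0 * score + 1.5 * treated + rng.normal(0, 1, size=n)  # true jump = 1.5

# Local linear fits within a bandwidth h on each side of the cutoff.
h = 0.2
left = (score < 0) & (score > -h)
right = (score >= 0) & (score < h)
slope_l, intercept_l = np.polyfit(score[left], y[left], 1)
slope_r, intercept_r = np.polyfit(score[right], y[right], 1)

# Effect = difference between fitted values at the cutoff (the intercepts).
print(f"RDD estimate: {intercept_r - intercept_l:.2f}")  # ~1.5
```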

Interrupted Time Series

Interrupted time series analysis assesses intervention impacts by modeling outcome trends before and after a specific intervention point, detecting shifts in level or slope attributable to the treatment. This design controls for underlying time trends and seasonality, assuming no concurrent events confound the interruption. To address autocorrelation in time-series data, where errors are correlated over time, models incorporate autoregressive terms or differencing to ensure valid inference on immediate level changes or slope alterations post-intervention.
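Interrupted time series is often implemented as segmented regression. The sketch below uses synthetic monthly data with an assumed level drop of 5 at the interruption; a full analysis would also model autocorrelation, for example with ARIMA errors or Newey-West standard errors:

```python
import numpy as np

rng = np.random.default_rng(6)
t = np.arange(48)                        # 48 monthly observations
post = (t >= 24).astype(float)           # intervention at month 24

# Baseline trend +0.3/month, level drop of 5 at the interruption, no slope change.
y = 50 + 0.3 * t - 5 * post + rng.normal(0, 1, size=48)

# Segmented regression: intercept, pre-trend, level change, slope change.
X = np.column_stack([np.ones_like(post), t, post, post * (t - 24)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"level change = {beta[2]:.2f}, slope change = {beta[3]:.2f}")  # ~-5, ~0
```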

Validity Checks

Placebo tests enhance credibility by applying the estimator to pre-intervention periods or untreated units, expecting null effects if assumptions hold; for instance, in DiD, simulating treatment in earlier time periods should yield insignificant estimates. Robustness to assumptions involves sensitivity analyses, such as varying bandwidths in RDD or testing alternative trend specifications in time series, to confirm results are not driven by model choices or violations like heterogeneous trends.

Field-Specific Applications

Epidemiology

In epidemiology, causal inference plays a central role in identifying factors that contribute to disease occurrence and progression, often relying on observational data due to ethical and practical constraints on experimentation. Unlike randomized controlled trials, which provide strong evidence of causality through experimental manipulation, epidemiological studies must carefully address confounding, selection bias, and reverse causation to infer causal relationships. Key study designs include cohort studies, which follow groups exposed and unexposed to a risk factor over time to estimate relative risks; case-control studies, which compare individuals with a disease (cases) to those without (controls) to assess prior exposures via odds ratios; and cross-sectional studies, which capture exposure and outcome data at a single point to identify associations but struggle with temporality. In case-control designs, odds ratios approximate risk ratios when the outcome is rare, facilitating causal assessment in resource-limited settings.

A seminal framework for evaluating causal evidence in epidemiology is the Bradford Hill criteria, proposed by Austin Bradford Hill in 1965, which outline nine considerations: strength of association, consistency across studies, specificity of the association, temporality (exposure preceding outcome), biological gradient (dose-response relationship), plausibility, coherence with existing knowledge, experiment (if applicable), and analogy. These criteria, derived from analyses of smoking and lung cancer, guide researchers in distinguishing causal from spurious associations without providing a strict checklist for proof. For instance, temporality is essential to rule out reverse causation, while consistency requires replication in diverse populations.

Controlling for confounding is critical in epidemiological causal inference, with methods like propensity score matching used to balance baseline characteristics between exposed and unexposed groups, mimicking randomization. Propensity scores estimate the probability of exposure given covariates, enabling matched analyses that reduce bias in observational data. Directed acyclic graphs (DAGs) further aid in identifying confounders and mediators by visually representing causal assumptions, particularly in complex modeling where pathways involve multiple variables. In infectious disease contexts, DAGs help delineate transmission dynamics and intervention effects.

Illustrative examples highlight these approaches: the Framingham Heart Study, initiated in 1948, employed prospective cohort designs to establish causal links between risk factors like hypertension and cardiovascular disease, influencing preventive guidelines through long-term follow-up of over 5,000 participants. Similarly, vaccine efficacy trials, such as the Pfizer-BioNTech COVID-19 phase 3 trial, demonstrated causal protection against severe outcomes, reporting 95% efficacy against symptomatic infection. Unique challenges in epidemiology include handling rare events, where case-control designs predominate; time-varying exposures, such as cumulative smoking doses analyzed via g-estimation; and mediation analysis in biological pathways, for example, how smoking leads to lung cancer through tar deposition as an intermediate. These aspects underscore the need for robust statistical tools to unpack complex biological mechanisms.
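The odds ratio central to case-control designs is a simple 2x2-table computation. A minimal sketch with hypothetical counts (not data from any actual study):

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Exposure odds among cases divided by exposure odds among controls."""
    return (exposed_cases / unexposed_cases) / (exposed_controls / unexposed_controls)

# Hypothetical 2x2 table: 80/100 cases exposed versus 40/100 controls exposed.
or_hat = odds_ratio(80, 20, 40, 60)
print(f"odds ratio = {or_hat:.1f}")  # 6.0; approximates the risk ratio for rare outcomes
```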

Economics and Political Science

In economics and political science, causal inference methods are extensively applied to evaluate policy interventions and understand behavioral responses in socioeconomic contexts. Natural experiments, such as randomized lotteries for school choice programs, provide quasi-random variation to estimate causal effects on student outcomes. For instance, in Chicago's public high school admissions system, lottery winners who attended their preferred schools showed no significant improvements in test scores or graduation rates compared to losers, highlighting the importance of school quality and peer effects in causal pathways. Similarly, analyses of Boston's charter school lotteries reveal substantial achievement gains for lottery winners attending oversubscribed charters, with effects equivalent to 0.4 standard deviations per year in math and reading, underscoring the role of school accountability in driving causal impacts. These lottery-based designs leverage randomization to isolate treatment effects, akin to randomized controlled trials (RCTs), while addressing selection biases inherent in observational choice data.

Synthetic control methods further advance policy evaluation by constructing counterfactuals for treated units using weighted combinations of untreated controls, particularly useful when traditional controls are unavailable. Developed to assess aggregate interventions, this approach estimates causal effects by minimizing pre-treatment differences in predictors like GDP or consumption. In a study of the Basque Country, the method quantified terrorism's economic costs, showing a roughly 10 percent decline in per capita GDP relative to a synthetic control after 1975. Applied to California's Proposition 99 tobacco control program, it estimated a reduction of roughly 20-30 packs in annual per capita cigarette sales by 2000 compared to a synthetic control built from other states.

The Oregon Health Insurance Experiment (2008), an RCT via lottery-based Medicaid expansion, exemplifies policy evaluation by demonstrating increased healthcare utilization and improved self-reported health among winners, with no significant changes in physical health outcomes after one year, informing causal debates on coverage effects. Complementing these, the Angrist-Krueger (1991) study used quarter-of-birth as an instrument to estimate returns to schooling, finding a 7-10% wage increase per additional year of education, causal evidence pivotal for education policy.

In behavioral economics, causal inference addresses endogeneity in choice models, where unobserved factors like preferences confound observed decisions, using structural estimation and revealed preference approaches to infer welfare effects. Revealed preference methods recover underlying utilities from choice data while accounting for behavioral biases, enabling causal welfare analysis beyond standard rationality assumptions. For example, extensions of revealed preference theory incorporate framing effects or cognitive biases to test consistency and estimate welfare-relevant preferences, revealing how choice inconsistencies affect causal interpretations of consumer surplus. In political science, field experiments on voter turnout causally identify mobilization effects; Gerber and Green (2000) found that nonpartisan door-to-door canvassing increased turnout by 8-10 percentage points in a New Haven RCT, while phone calls and mail had negligible or negative impacts, guiding get-out-the-vote strategies.
Panel data methods estimate dynamic causal effects by modeling time-varying treatments and outcomes, controlling for unit-specific trends to capture persistence or anticipation. Blackwell, Imai, and King (2014) propose a weighting framework for dynamic panel inference, applied to political events like policy shocks, revealing lagged effects on outcomes such as public opinion shifts. Unique to these fields are considerations of general equilibrium effects and long-term spillovers, which complicate causal identification by transmitting treatments through markets or networks. General equilibrium adjustments, such as price changes from policy-induced supply shifts, can bias partial equilibrium estimates; in urban settings, highway construction causally increased suburban populations by 20-30% via accessibility gains, but with spillovers reducing central city populations. Cash transfer programs in rural Kenya generated aggregate income multipliers of about 2.5 via spillovers, with treated households' spending boosting local economies, illustrating amplification of direct effects. Long-term spillovers extend beyond immediate outcomes, as seen in boundary discontinuity designs where district borders reveal policy spillovers; U.S. school finance reforms spilled over to neighboring districts, increasing their spending by 10% and equalizing outcomes regionally. These aspects emphasize the need for holistic causal models in policy design to account for interconnected socioeconomic dynamics.

Computer Science and Machine Learning

In computer science and machine learning, causal inference emphasizes scalable algorithms that integrate causal reasoning with high-dimensional data processing and predictive modeling to estimate treatment effects and causal structures. These approaches leverage machine learning techniques to handle complex confounders and enable causal estimation in large-scale settings, such as web-scale datasets, where traditional methods falter. By combining causal assumptions with flexible estimators, computational frameworks address identification and estimation challenges, facilitating applications in dynamic systems like online platforms.

A key advancement in causal machine learning (Causal ML) is the double/debiased machine learning (DML) framework, which uses machine learning to flexibly estimate nuisance parameters like propensity scores and outcome regressions, thereby achieving root-n consistent causal estimation even with high-dimensional confounders. This method debiases ML predictions through cross-fitting and orthogonalization, ensuring valid inference under unconfoundedness assumptions. Complementing DML, targeted learning employs ensemble methods and cross-validation to construct targeted maximum likelihood estimators (TMLEs) that update initial ML predictions toward the causal parameter of interest, providing robustness against model misspecification. These techniques are particularly suited to observational data in ML pipelines, where they mitigate bias from flexible nonparametric models.

Noise models play a crucial role in computational causal inference by providing identifiability conditions for structural causal models (SCMs), especially in linear settings. Under the additive noise model, each variable is expressed as a function of its parents plus an independent noise term, enabling the recovery of causal directions from observational data without experiments, as the noise independence breaks symmetry in linear relations. For instance, in linear SCMs, if the noise is non-Gaussian, the causal direction is identifiable via methods like linear non-Gaussian acyclic models (LiNGAM). Nonparametric extensions relax linearity while maintaining identifiability through score-based tests or independence of residuals. Briefly, these models often represent dependencies via directed acyclic graphs (DAGs) to encode causal assumptions.

In applications, causal forests extend random forests to estimate heterogeneous treatment effects by recursively partitioning data based on covariates that interact with treatment, allowing scalable inference on individual-level causal impacts. This method, which averages honest trees to reduce variance, has been applied to personalize interventions in domains like policy evaluation. Similarly, uplift modeling in marketing uses causal inference to predict the incremental effects of campaigns on customer behavior, optimizing targeting by estimating conditional average treatment effects (CATE) for subgroups. For example, in recommendation systems, causal modeling disentangles user preferences from exposure biases, enabling counterfactual predictions of engagement with unseen items. In algorithmic fairness, causal approaches quantify discrimination by tracing disparate outcomes to protected attributes via mediation analysis, informing debiasing in decision algorithms. Unique to these computational paradigms is their scalability to massive datasets via parallelization and efficient approximations, alongside causal imputation methods that leverage SCMs to model missing data mechanisms, preserving causal structure during preprocessing.
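The orthogonalization idea behind DML can be sketched without any dedicated library: estimate the nuisance functions E[T|W] and E[Y|W] with cross-fitted random forests, then regress the outcome residual on the treatment residual (a partialling-out estimator). The data-generating process here is hypothetical and the sketch omits the standard-error machinery of a full DML implementation:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(8)
n, p = 5_000, 20

# High-dimensional confounders W affect both treatment T and outcome Y.
W = rng.normal(size=(n, p))
g = np.sin(W[:, 0]) + W[:, 1] ** 2           # nonlinear nuisance signal
t = g + rng.normal(size=n)
y = 1.0 * t + g + rng.normal(size=n)         # true effect of T on Y is 1.0

# Cross-fitted nuisance estimates: out-of-fold predictions avoid overfitting bias.
t_hat = cross_val_predict(RandomForestRegressor(n_estimators=200), W, t, cv=2)
y_hat = cross_val_predict(RandomForestRegressor(n_estimators=200), W, y, cv=2)

# Orthogonalized (residual-on-residual) estimate of the treatment effect.
t_res, y_res = t - t_hat, y - y_hat
theta = (t_res @ y_res) / (t_res @ t_res)
print(f"DML-style estimate: {theta:.2f}")    # ~1.0
```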

Advanced Techniques

Instrumental Variables

Instrumental variables (IV) estimation addresses endogeneity in causal inference by introducing a variable Z, termed the instrument, that is correlated with the endogenous treatment X but uncorrelated with the error term in the outcome equation for Y. The method relies on two core assumptions: relevance, which requires \cov(Z, X) \neq 0, ensuring the instrument predicts the treatment; and exclusion, which stipulates that Z affects Y only through X, i.e., \cov(Z, \epsilon) = 0 where \epsilon is the error in the structural equation Y = \beta X + \gamma' W + \epsilon and W are exogenous covariates. These assumptions allow IV to isolate exogenous variation in X induced by Z, mitigating biases from unmeasured confounding or reverse causality, as briefly referenced in discussions of observational data challenges.

Under monotonicity—where the instrument does not decrease treatment uptake for any subgroup—the IV estimand identifies the local average treatment effect (LATE), the average effect of X on Y for compliers, those whose treatment status changes with Z. In the simplest bivariate case without covariates, the IV estimator is given by the Wald ratio:

\hat{\beta}_{IV} = \frac{\cov(Y, Z)}{\cov(X, Z)},

which, for binary Z, equals the difference in means of Y across values of Z divided by the corresponding difference in means of X. For models with covariates or multiple instruments, two-stage least squares (2SLS) provides a consistent estimator: in the first stage, regress X on Z and W to obtain fitted values \hat{X}; in the second stage, regress Y on \hat{X} and W to recover \hat{\beta}. This procedure yields the best linear approximation to the LATE in linear models and is robust to heteroskedasticity when using robust standard errors. To detect endogeneity necessitating IV over ordinary least squares (OLS), the Hausman test compares \hat{\beta}_{IV} and \hat{\beta}_{OLS}; under the null of exogeneity, the difference is asymptotically zero.

Valid IV application requires testing key assumptions. Relevance is assessed via the first-stage F-statistic from the regression of X on Z; values below 10 conventionally indicate weak instruments, leading to finite-sample bias and invalid inference, as the instrument fails to sufficiently vary X. For overidentified models (more instruments than endogenous variables), the Sargan test checks the exclusion restriction by examining residuals from the structural equation regressed on instruments; under the null, the test statistic follows a chi-squared distribution with degrees of freedom equal to the number of overidentifying restrictions. Violations can arise from instrument invalidity, underscoring the need for theoretically motivated instruments.

A seminal application is Angrist and Krueger's (1991) use of quarter-of-birth as an instrument for years of schooling to estimate returns to education. Children born in the first quarter of the year start school at an older age due to entry cutoff dates and can legally drop out after completing less schooling, leading to plausibly exogenous variation in education that affects earnings but not innate ability, yielding a 7-10% return per additional year for compliers. In experimental settings with imperfect compliance, such as randomized voter mobilization campaigns, assignment to treatment serves as an instrument for actual turnout; the IV estimate then captures the LATE for induced voters (compliers), as analyzed in frameworks handling noncompliance.
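The Wald ratio is straightforward to verify on simulated data. In the sketch below (all coefficients hypothetical), an unobserved confounder biases the OLS slope upward, while the IV estimator recovers the true effect:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

# Unobserved confounder U biases OLS; instrument Z shifts X but not Y directly.
u = rng.normal(size=n)
z = rng.binomial(1, 0.5, size=n)
x = 0.8 * z + u + rng.normal(size=n)        # relevance: Z -> X
y = 2.0 * x + 3.0 * u + rng.normal(size=n)  # true effect of X on Y is 2.0

# OLS slope is biased upward by U.
beta_ols = np.polyfit(x, y, 1)[0]

# Wald / IV estimator: cov(Y, Z) / cov(X, Z).
beta_iv = np.cov(y, z)[0, 1] / np.cov(x, z)[0, 1]
print(f"OLS = {beta_ols:.2f}, IV = {beta_iv:.2f}")  # biased ~3.4 vs ~2.0
```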

Sensitivity Analysis

Sensitivity analysis in causal inference evaluates the robustness of estimated causal effects to violations of key assumptions, such as the absence of unmeasured confounding or model misspecification. These techniques quantify how much deviation from ideal conditions, like hidden confounders, would be required to alter conclusions about causality, providing a framework to gauge the credibility of findings. By deriving bounds on potential biases, sensitivity analysis helps researchers communicate uncertainty and assess whether results hold under plausible alternative scenarios.

One prominent method is Rosenbaum's sensitivity bounds, applied in matched observational studies to assess the impact of unmeasured covariates on effect estimates. These bounds calculate the range of possible effects assuming hidden confounders differ in odds of treatment assignment up to a specified parameter Γ, where Γ=1 implies no hidden bias akin to randomization. For instance, if the upper bound of the effect crosses zero at Γ=2, it indicates that confounders twice as strongly associated with treatment and outcome as measured ones could nullify the observed effect. This approach is particularly useful in propensity score matching, where it serves as a post-estimation check to test the stability of matched estimates.

The E-value, developed by VanderWeele and Ding, measures the minimum strength of unmeasured confounding needed to explain away an observed association, offering an intuitive sensitivity metric for epidemiologic and observational research. For a risk ratio (RR) of 2, the E-value is approximately 3.4, meaning that an unmeasured confounder associated with both exposure and outcome by an RR of 3.4 or more—stronger than any measured confounder—could fully account for the observed association, rendering it non-causal. This tool applies to various effect measures, including odds ratios and hazard ratios, and is computed without requiring model refitting, making it accessible for routine sensitivity checks in regression-based analyses (see the sketch below).

Graphical tools, such as directed acyclic graphs (DAGs) augmented with latent variables, facilitate sensitivity analysis by visualizing potential unmeasured confounders and deriving partial bounds on causal effects. In a DAG, introducing a latent confounder connected to both treatment and outcome illustrates backdoor paths that, if unblocked, induce bias; partial identification then yields worst-case bounds on the causal effect, such as those ranging from the minimum to maximum possible outcomes under monotonicity assumptions. These bounds, pioneered by Manski, quantify the interval of plausible causal effects without full identification, highlighting the degree of uncertainty due to unmeasured variables. For example, in the absence of additional assumptions, the bounds might span from -1 to 1 for a binary outcome, narrowing with additional restrictions like monotonicity. Such graphical approaches build on structural causal models and the backdoor criterion while probing assumption violations.

For model specification issues, particularly in linear regressions, Cinelli and Hazlett extend the omitted variable bias framework to provide graphical and numerical diagnostics. Their method visualizes the bias contribution of a potential omitted variable through partial R-squared measures for its correlations with the regressor and outcome, enabling researchers to assess how large these associations must be to invalidate the estimate. This toolkit includes contour plots showing combinations of partial R-squared values that would overturn the causal conclusion, applicable as a post-estimation tool in ordinary least squares models.
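The E-value discussed above has a simple closed form for a risk ratio greater than one, E = RR + sqrt(RR * (RR - 1)), which the following minimal sketch implements:

```python
import math

def e_value(rr: float) -> float:
    """E-value for an observed risk ratio RR > 1 (VanderWeele & Ding)."""
    return rr + math.sqrt(rr * (rr - 1))

# An observed RR of 2 could be explained away only by unmeasured confounding
# associated with both exposure and outcome at least this strongly:
print(round(e_value(2.0), 2))  # 3.41
```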
In practice, sensitivity analysis is routinely applied as post-estimation diagnostics in instrumental variable (IV) and propensity score analyses to verify robustness. For IV methods, extensions of the Cinelli-Hazlett framework bound bias from invalid instruments or omitted variables without weak instrument concerns, while in propensity score matching, Rosenbaum bounds test for hidden biases beyond observed covariates. These checks ensure that causal claims withstand scrutiny, promoting transparent reporting of assumption-dependent results in fields like epidemiology and economics.

Causal Discovery Methods

Causal discovery methods aim to infer causal structures, typically represented as directed acyclic graphs (DAGs), from observational data without prior knowledge of the underlying mechanisms. These algorithms automate the search for causal relationships by leveraging statistical dependencies, contrasting with approaches that assume known structures for effect estimation. Broadly, they fall into two categories: constraint-based methods, which use conditional independence tests to prune edges, and score-based methods, which optimize a scoring function over possible graphs to balance fit and complexity. Both rely on key assumptions, such as the causal Markov condition, which states that a variable is independent of its non-descendants given its parents in the causal graph, and faithfulness, which posits that all conditional independencies in the data are implied by the graph's d-separation criteria.

Constraint-based methods begin by testing for unconditional and conditional independencies among variables to identify the skeleton of the graph, then orient edges using rules like collider detection. The PC algorithm, named after its developers Peter Spirtes and Clark Glymour, is a seminal constraint-based approach that iteratively applies independence tests, starting with small conditioning sets and increasing their size to reduce computational cost. It exploits d-separation, a graphical criterion where two variables are conditionally independent given a set if all paths between them are blocked by the conditioning set, to orient edges and avoid cycles. For settings with latent (unobserved) confounders, the Fast Causal Inference (FCI) algorithm extends PC by allowing bidirectional edges in partial ancestral graphs, detecting latent variables through patterns like unshielded colliders without assuming causal sufficiency.

Score-based methods evaluate candidate DAGs using a score that measures data likelihood penalized for model complexity, searching the space of graphs to find a high-scoring structure. The Bayesian Information Criterion (BIC) is a widely used score, approximating the marginal likelihood by subtracting a penalty proportional to the number of parameters and the logarithm of the sample size, which favors parsimonious models consistent with the data in large samples. The Greedy Equivalence Search (GES) algorithm applies this by operating on equivalence classes of DAGs (CPDAGs) rather than individual graphs, using forward and backward greedy steps to add, delete, or reverse edges while maximizing the score, achieving consistency under standard assumptions.

In time series data, where cycles may arise due to temporal dependencies, causal discovery adapts by incorporating lagged variables; for instance, Granger causality tests whether past values of one series improve prediction of another beyond its own past, assuming stationarity and temporal precedence to infer directional influences without full acyclicity. Practical implementations include the Tetrad software package, which integrates PC, FCI, GES, and other algorithms for simulating, estimating, and visualizing causal models from data. In genomics, these methods have reconstructed gene regulatory networks by discovering causal links from expression data, such as identifying key regulators in cancer pathways where constraint-based approaches reveal latent interactions among hundreds of genes.

Despite their strengths, causal discovery methods face challenges, including high sample size requirements for reliable independence tests, as power decreases with sparse data, leading to incomplete or erroneous graphs. Multiple testing in independence evaluations exacerbates false positives, necessitating corrections like false discovery rate control to maintain validity across numerous tests.
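The collider-orientation rule at the heart of constraint-based discovery can be demonstrated with simulated data: X and Y are marginally independent but become dependent once Z is conditioned on, which identifies Z as a collider (X -> Z <- Y). A minimal sketch (linear Gaussian data, partial correlation as a stand-in for a formal independence test):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 50_000

# Ground truth: X -> Z <- Y (Z is a collider).
x = rng.normal(size=n)
y = rng.normal(size=n)
z = x + y + rng.normal(size=n)

def partial_corr(a, b, c):
    """Correlation between a and b after linearly regressing out c."""
    res_a = a - np.polyval(np.polyfit(c, a, 1), c)
    res_b = b - np.polyval(np.polyfit(c, b, 1), c)
    return np.corrcoef(res_a, res_b)[0, 1]

# Marginal independence plus conditional dependence given Z orients the collider.
print(f"corr(X, Y)     = {np.corrcoef(x, y)[0, 1]:+.2f}")  # ~0.00
print(f"corr(X, Y | Z) = {partial_corr(x, y, z):+.2f}")    # clearly nonzero
```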

Challenges and Criticisms

Common Methodological Pitfalls

One common methodological pitfall in causal inference is the failure to prioritize replication, often leading to "fork science" where initial findings are pursued without verifying their robustness, or "junk science" where irreproducible results propagate unchecked. The replication crisis in psychology exemplifies this issue, as a large-scale effort to reproduce 100 studies from top journals found that only 36% yielded significant effects, compared to 97% in the originals, highlighting how selective reporting and low statistical power contribute to unreliable causal claims.

Another frequent error involves conducting multiple comparisons without appropriate corrections, which inflates the family-wise error rate and increases the likelihood of false positives in estimating causal effects. For instance, in observational analyses aiming to infer causal impacts across various subgroups or outcomes, unadjusted p-values can misleadingly suggest causal relationships that do not hold under scrutiny, as the probability of at least one spurious significant result rises with the number of tests performed (see the simulation sketch at the end of this subsection).

The ecological fallacy represents a critical pitfall when aggregate-level data are used to draw conclusions about individual-level causal relationships, often violating the assumptions of methods like regression discontinuity or difference-in-differences. Coined by Robinson in his seminal analysis of correlations between literacy and foreign-born populations across U.S. states versus individuals, this pitfall occurs because group-level associations may arise from compositional effects rather than true individual causation, leading to erroneous causal implications.

Post-hoc subgroup analyses, commonly known as data dredging, pose a significant risk by exploiting flexibility in data exploration to identify seemingly significant causal effects that are actually artifacts of multiple testing or chance. In randomized trials or observational studies, unplanned stratifications—such as dividing samples by age or baseline characteristics after observing overall results—can yield subgroup-specific estimates that fail to replicate, as they capitalize on noise without accounting for the increased Type I error rate.

Survivorship bias in longitudinal studies distorts causal estimates by systematically excluding participants who drop out or experience the event of interest early, biasing samples toward "survivors" and underestimating effects on the full population. For example, in mental health cohort analyses, attrition due to severe outcomes can make samples appear healthier over time, leading to overoptimistic inferences about treatment efficacy unless weighting adjustments or sensitivity checks are applied.

In instrumental variables (IV) approaches, using weak instruments—those with low correlation with the endogenous treatment variable—produces biased and imprecise causal estimates, often exacerbating bias rather than resolving it. Weak instruments fail to satisfy the relevance assumption, resulting in finite-sample bias toward the ordinary least squares estimate and unreliable inference, as demonstrated in simulations where first-stage F-statistics below 10 lead to confidence intervals that cover implausible values.

Signs of methodological malpractice in causal inference include cherry-picking models, where researchers selectively report specifications that yield desired significant effects while omitting alternatives, and the absence of pre-registration, which enables post-hoc adjustments akin to p-hacking. These practices undermine the validity of causal claims by introducing researcher degrees of freedom, as seen in cases where multiple model variants are tested until a favorable outcome emerges without disclosure.
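As flagged above, unadjusted multiple comparisons inflate the family-wise error rate. A minimal simulation (20 truly null subgroup tests per dataset; all numbers synthetic) shows how often at least one "significant" result appears by chance, and how a Bonferroni correction restores control near the nominal level:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
n_tests, n, n_sims = 20, 100, 500

any_raw, any_bonf = 0, 0
for _ in range(n_sims):
    # 20 subgroup "effects" that are all truly null.
    pvals = np.array([
        stats.ttest_ind(rng.normal(size=n), rng.normal(size=n)).pvalue
        for _ in range(n_tests)
    ])
    any_raw += (pvals < 0.05).any()
    any_bonf += (pvals < 0.05 / n_tests).any()

print(f"P(>=1 false positive), unadjusted: {any_raw / n_sims:.2f}")   # ~0.64
print(f"P(>=1 false positive), Bonferroni: {any_bonf / n_sims:.2f}")  # ~0.05
```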
To mitigate these pitfalls, researchers should adopt pre-analysis plans that outline hypotheses, specifications, and analysis steps in advance, reducing flexibility for p-hacking while preserving exploratory intent. Enhanced transparency through detailed reporting of all analyses, including null results and sensitivity tests, further promotes reproducibility; for instance, platforms like the Open Science Framework facilitate such practices, correlating with higher replication rates in registered studies.

Ethical and Practical Limitations

Causal inference methods, particularly when integrated with machine learning, raise significant ethical concerns regarding fairness. In causal machine learning applications, the selection of instrumental variables (IVs) can inadvertently perpetuate bias if the instruments are chosen from biased sources that reflect societal inequities, such as using socioeconomic proxies that disadvantage marginalized groups. For instance, surrogate IVs learned from user-item interactions in recommendation systems may amplify biases if the underlying data overrepresent certain demographics, leading to unfair causal estimates in decision processes like hiring or lending. Additionally, the use of observational data in causal inference often involves ethical dilemmas around informed consent, as individuals may not be aware that their data is being analyzed to infer causal relationships, potentially violating privacy norms without explicit approval.

Practical limitations further complicate the application of causal inference. External validity is frequently undermined by reliance on WEIRD (Western, Educated, Industrialized, Rich, Democratic) samples in psychological and behavioral research, which restricts the generalizability of causal findings to diverse populations and can lead to misleading inferences about universal human behaviors. In global health contexts, scalability poses a major barrier, as traditional causal inference techniques struggle with the computational demands of large-scale, heterogeneous datasets from low-resource settings, limiting their deployment in real-time response or policy evaluation.

Causal claims derived from these methods have profound policy implications, often directly influencing legislation and regulations. For example, epidemiological causal inferences linking tobacco use to lung cancer were pivotal in shaping U.S. policies like the 1964 Surgeon General's report, which spurred advertising restrictions and public health campaigns, demonstrating how robust causal evidence can drive protective laws. However, such applications risk unintended consequences, including policy rebound effects where interventions based on incomplete causal models exacerbate inequalities or create new harms, such as when correlation-driven assumptions overlook heterogeneous treatment effects across subgroups.

The 2014 Facebook emotional contagion experiment exemplifies these ethical tensions, where researchers manipulated the news feeds of nearly 700,000 users without prior consent to study emotional transmission, sparking debates over psychological harm and the need for institutional review board oversight in large-scale experimental manipulations. Similarly, in climate policy, causal inference faces challenges in attributing extreme weather events to human activities amid confounding variables like natural variability, complicating efforts to justify mitigation strategies and risking ineffective or inequitable resource allocation.

Looking ahead, addressing these issues requires interdisciplinary guidelines that integrate causal inference standards with broader human subjects protections, such as those outlined in international frameworks for research ethics emphasizing beneficence and justice. Promoting equitable data access is also essential, ensuring that underrepresented populations contribute to and benefit from causal datasets to mitigate biases and foster inclusive policy outcomes.
Recent developments in causal machine learning as of 2025 highlight ongoing challenges, including data quality and availability for robust causal discovery in high-dimensional settings, integration with precision medicine for personalized treatments, and methodological issues in platform trials and multisource data, which demand enhanced focus on interpretability and generalizability.

References

  1. [1]
    [PDF] Causal inference in statistics: An overview - UCLA
    Examples of causal concepts are: randomization, influence, effect, confounding, “holding constant,” disturbance, spurious correlation, faithfulness/stability, ...
  2. [2]
    [PDF] Causal Inference: A Statistical Learning Approach - Stanford University
    Sep 6, 2024 · Our goal is to estimate the effect of the treatment on the outcome. Following the Neyman–Rubin causal model, we define the causal effect of a ...
  3. [3]
    The causal inference framework: a primer on concepts and methods ...
    The purpose of this first paper is to: a) define causal inference, b) provide a brief history of the causal inference framework and associated methods, c) ...
  4. [4]
    The Importance of Being Causal - Harvard Data Science Review
    Jul 30, 2020 · Causal inference is the study of how actions, interventions, or treatments affect outcomes of interest.
  5. [5]
    David Hume - Stanford Encyclopedia of Philosophy
    Feb 26, 2001 · Hume's method dictates his strategy in the causation debate. In the critical phase, he argues that his predecessors were wrong: our causal ...Kant and Hume on Causality · Hume's Moral Philosophy · On Free Will · On Religion
  6. [6]
    Randomised controlled trials—the gold standard for effectiveness ...
    Dec 1, 2018 · RCTs are the gold-standard for studying causal relationships as randomization eliminates much of the bias inherent with other study designs.
  7. [7]
    An Enquiry Concerning Human Understanding - Project Gutenberg
    Enquiries concerning the human understanding, and concerning the principles of morals, by David Hume.
  8. [8]
    [PDF] The Design of Experiments By Sir Ronald A. Fisher.djvu
    First Published 1935. Second Edition 1937. Third Edition 1942. Fourth Edition 1947. Fifth Edition 1949. Sixth Edition 1951. Reprinted 1953. Seventh Edition 1960.Missing: primary source
  9. [9]
    [PDF] On the Application of Probability Theory to Agricultural Experiments ...
    Abstract. In the portion of the paper translated here, Neyman introduces a model for the analysis of field experiments conducted for the purpose of comparing a ...
  10. [10]
    [PDF] Estimating causal effects of treatments in randomized and ...
    A discussion of matching, randomization, random sampling, and other methods of controlling extraneous variation is presented.
  11. [11]
    Double/Debiased Machine Learning for Treatment and Causal ...
    Jul 30, 2016 · View a PDF of the paper titled Double/Debiased Machine Learning for Treatment and Causal Parameters, by Victor Chernozhukov and 6 other authors.
  12. [12]
    Causal Inference for Statistics, Social, and Biomedical Sciences
    'Guido Imbens and Don Rubin present an insightful discussion of the potential outcomes framework for causal inference … this book presents a unified ...
  13. [13]
    Causal Inference - Proceedings of Machine Learning Research
    This paper reviews a theory of causal inference based on the Structural Causal Model (SCM) described in Pearl (2000a). The theory unifies the graphical ...
  14. [14]
    [PDF] Causal diagrams for empirical research
    Pearl (1993b) shows that such judgments are equivalent to a simple graphical test, named the 'back-door criterion', which can be applied directly to the ...
  15. [15]
    d-SEPARATION WITHOUT TEARS (At the request of many readers)
    d-separation is a criterion for deciding, from a given causal graph, whether a set X of variables is independent of another set Y, given a third set Z.
  16. [16]
    Randomized controlled trials – a matter of design - PMC
    The internal validity of a clinical trial is directly related to appropriate design, conduction, and reporting of the study. The two main threats to internal ...
  17. [17]
    Rethinking the pros and cons of randomized controlled trials ... - NIH
    Jan 18, 2024 · Under ideal conditions, this design ensures high internal validity and can provide an unbiased causal effect of the exposure on the outcome [6].
  18. [18]
    Chapter 7 A/B Testing: Beyond Randomized Experiments | Causal ...
    A/B testing is not just a direct adaptation of classic randomized experiments to a new type of business and data. It has its own special aspects, unique ...
  19. [19]
    Intention-to-treat versus as-treated versus per-protocol approaches ...
    Nov 14, 2023 · There are various group-defining strategies for analyzing RCT data, including the intention-to-treat (ITT), as-treated, and per-protocol (PP) approaches.
  20. [20]
    Statistics review 4: Sample size calculations | Critical Care | Full Text
    May 10, 2002 · The first step in calculating a sample size for comparing means is to consider this difference in the context of the inherent variability in ...
  21. [21]
    Evidence for Health Decision Making — Beyond Randomized ...
    Aug 3, 2017 · Despite their strengths, RCTs have substantial limitations. Although they can have strong internal validity, RCTs sometimes lack external ...
  22. [22]
    “A calculated risk”: the Salk polio vaccine field trials of 1954 - NIH
    The 1954 polio vaccine field trials used a singular statistical design. Over 600 000 schoolchildren were injected with vaccine or placebo and over a million ...
  23. [23]
    Randomized controlled trials – The what, when, how and why
    RCTs are considered the “gold standard” as they offer the best answer on the efficacy of a treatment or intervention. A well-designed RCT with rigorous ...
  24. [24]
    Causal inference and effect estimation using observational data
    We provide a clear, structured overview of key concepts and terms, intended as a starting point for readers unfamiliar with the causal inference literature.
  25. [25]
    Causal Inference With Observational Data and Unobserved ...
    Jan 21, 2025 · The major challenge using observational data for causal inference is confounding variables: variables affecting both a causal variable and ...
  26. [26]
    Causal inference with observational data: the need for triangulation ...
    Three types of bias can arise in observational data: (i) confounding bias (which includes reverse causality), (ii) selection bias (inappropriate selection of ...
  27. [27]
    Berkson's bias, selection bias, and missing data - PMC - NIH
    Although Berkson's bias is widely recognized in the epidemiologic literature, it remains underappreciated as a model of both selection bias and bias due to missing data.
  28. [28]
    Healthy User Bias - an overview | ScienceDirect Topics
    Healthy-user bias is when patients receiving therapy engage in healthier behaviors, leading to misleading conclusions about the therapy's effectiveness.
  29. [29]
    The Measurement Error Elephant in the Room - NIH
    The Berkson error model posits a fixed value of the measured variable, A*, around which the true value, A, varies such that A = A* + U_A (Figure 1B).
  30. [30]
    five myths about measurement error in epidemiological research
    Dec 10, 2019 · In this paper, we describe five myths that contribute to misjudgments about measurement error, regarding expected structure, impact and solutions.
  31. [31]
    Matching methods for causal inference: A review and a look forward
    When estimating causal effects using observational data, it is desirable to replicate a randomized experiment as closely as possible by obtaining treated and ...
  32. [32]
    Squeezing observational data for better causal inference
    At this stage, strategies to reduce confounding include regression adjustment, restriction, stratification, matching, propensity score matching, standardisation ...
  33. [33]
    [PDF] Regression Discontinuity Designs: A Guide to Practice
    Thistlewaite, D., and D. Campbell, 1960, Regression-Discontinuity Analysis: An Alternative to the Ex-Post Facto Experiment, Journal of Educational ...
  34. [34]
    [PDF] Minimum Wages and Employment: A Case Study of the Fast-Food ...
    On April 1, 1992, New Jersey's minimum wage rose from $4.25 to $5.05 per hour. To evaluate the impact of the law we surveyed 410 fast-food restaurants in ...
  35. [35]
    EDUCATIONAL PSYCHOLOGY - APA PsycNet
    This paper has three purposes: first, it presents an alternative mode of analysis, called regression-discontinuity analysis, which we believe can be more ...
  36. [36]
    [PDF] using maimonides' rule to estimate the effect of class size on ...
    Maimonides' rule of 40 is used here to construct instrumental variables estimates of effects of class size on test scores. The resulting identification strategy ...
  37. [37]
    [PDF] Campbell, DT (1969). Reforms as experiments. American ...
    These are the "interrupted time-series design," the "con- trol series design," "regression discontinuity de- sign," and various "true experiments." The ...Missing: seminal | Show results with:seminal
  38. [38]
    [PDF] Quasi-Experiments: Interrupted Time-Series Designs
    Interrupted time-series designs are a type of quasi-experiment where a treatment's impact is assessed by observing a change in the series at the treatment ...
  39. [39]
    [PDF] Placebo Tests for Causal Inference - Knowledge UChicago
    Our formal framework clarifies the extra assumptions necessary for informative placebo tests; these assumptions can be strong, and in some cases similar.
  40. [40]
    The Environment and Disease: Association or Causation? - PMC - NIH
    Austin Bradford Hill ... This article has been reprinted. See "The environment and disease: association or causation?" in Bull World Health Organ, volume 83 on ...
  41. [41]
    Targeted Learning - Book - SpringerLink
    This book is aimed at both statisticians and applied researchers interested in causal inference and general effect estimation for observational and ...
  42. [42]
    [PDF] Consistency of Causal Inference under the Additive Noise Model
    We analyze a family of methods for statistical causal inference from sample under the so-called Additive Noise Model. While most work ...
  43. [43]
    Recursive partitioning for heterogeneous causal effects - PNAS
    Jul 5, 2016 · In this paper we propose methods for estimating heterogeneity in causal effects in experimental and observational studies and for conducting ...
  44. [44]
    [PDF] Causal Inference and Uplift Modeling A review of the literature
    Uplift modeling estimates the impact of an action on a customer outcome, using techniques to model the effect of a treatment on a customer outcome.
  45. [45]
    [2208.12397] Causal Inference in Recommender Systems - arXiv
    Aug 26, 2022 · Researchers in recommender systems have begun utilizing causal inference to extract causality, thereby enhancing the recommender system.
  46. [46]
    Causally-Aware Imputation via Learning Missing Data Mechanisms
    Nov 4, 2021 · Our proposal is a causally-aware imputation algorithm (MIRACLE). MIRACLE iteratively refines the imputation of a baseline by simultaneously modeling the ...
  47. [47]
    Identification and Estimation of Local Average Treatment Effects
    Feb 1, 1995 · We investigate conditions sufficient for identification of average treatment effects using instrumental variables.
  48. [48]
    Identification and Estimation of Local Average Treatment Effects - jstor
    (Angrist, Imbens, and Rubin (1993)), we discuss conditions similar to this in great detail, and investigate the implications of violations of these conditions.
  49. [49]
    Identification of Causal Effects Using Instrumental Variables
    We show that the instrumental variables (IV) estimand can be embedded within the Rubin Causal Model (RCM) and that under some simple and easily interpretable ...
  50. [50]
    [PDF] Specification Tests in Econometrics Author(s): J. A. Hausman Source
    where a test of H0: a = 0 is a test for errors in variables. The last orthogonality test involves a lagged endogenous variable which may be correlated with ...
  51. [51]
    Instrumental Variables Regression with Weak Instruments | NBER
    Jan 1, 1994 · Douglas Staiger and James H. Stock, "Instrumental Variables Regression with Weak Instruments," NBER Working Paper t0151 (1994), https://doi.org/ ...
  52. [52]
    Instrumental Variables Regression with Weak Instruments - jstor
    This section provides an asymptotic interpretation of this statistic as a measure of the bias resulting from weak instruments. Consider the squared bias of ...
  53. [53]
    [PDF] AngristKrueger1991.pdf
    The estimated monetary return to an additional year of schooling for those who are compelled to attend school by compulsory schooling laws is about 7.5 percent, ...
  54. [54]
    Sensitivity Analysis in Observational Research: Introducing the E ...
    Jul 11, 2017 · This article introduces a new measure called the “E-value,” which is related to the evidence for causality in observational studies that are potentially ...
  55. [55]
    Tetrad - Department of Philosophy - Carnegie Mellon University
    Tetrad is a software suite for simulating, estimating, and searching for graphical causal models of statistical data. The Tetrad suite can be used from the ...
  56. [56]
    Challenges and Opportunities with Causal Discovery Algorithms
    Feb 19, 2020 · Both of the two methods can adjust for observed confounding and one of the algorithms, FCI, has some ability to discover latent confounding.
  57. [57]
    [PDF] Bounding the Family-Wise Error Rate in Local Causal Discovery ...
    We tested multiple sample sizes and sampled 100 datasets for each sample size. We compared our algorithms and state-of-the-art ones both in the standard ...
  58. [58]
    Estimating the reproducibility of psychological science
    We conducted a large-scale, collaborative effort to obtain an initial estimate of the reproducibility of psychological science.
  59. [59]
    Common pitfalls in statistical analysis: The perils of multiple testing
    Another, more challenging type, of multiple testing occurs when authors try to salvage a negative study. If the primary endpoint does not show statistical ...
  60. [60]
    Ecological Correlations and the Behavior of Individuals - jstor
    In each instance, however, the substitution is made tacitly rather than explicitly. The purpose of this paper is to clarify the ecological correlation problem ...
  61. [61]
    Statistical Pitfalls in Medical Research - PMC - NIH
    Subgroup analysis. Ad hoc subgroup analyses are vulnerable to data dredging. Ideally results of such analysis should be viewed as exploratory. Even with ...
  62. [62]
    Uncovering survivorship bias in longitudinal mental health surveys ...
    However, survivorship bias in longitudinal mental health surveys suggests that longitudinal samples may be non-representative of population-level mental health.
  63. [63]
    Avoiding Invalid Instruments and Coping with Weak Instruments
    We call such instruments “weak.” Researchers need to guard against drawing misleading inferences from weak instruments. How can economists determine that a ...
  64. [64]
    HARKing, Cherry-Picking, P-Hacking, Fishing Expeditions, and Data ...
    Feb 18, 2021 · Cherry-picking is the presentation of favorable evidence with the concealment of unfavorable evidence. P-hacking is the relentless analysis of ...
  65. [65]
    Promises and Perils of Pre-Analysis Plans
    A pre-analysis plan is relatively straightforward to write if there is a single, simple hypothesis, with a single, obvious outcome variable of interest. But in ...
  66. [66]
    [PDF] Do Pre-Registration and Pre-analysis Plans Reduce p - EconStor
    Aug 3, 2022 · We provide what we believe to be the first systematic investigation of whether PAPs and pre-registration reduce p-hacking and publication bias ...
  67. [67]
    Instrumental Variables in Causal Inference and Machine Learning
    Jun 13, 2025 · Weak IVs can lead to biased and imprecise causal estimates, as the IV method relies on the strength of the relationship between the IV and the ...
  68. [68]
    Learning instrumental variable representation for debiasing in ...
    To mitigate confounding bias in recommendation systems, we propose learning surrogate instrumental variables (SIVs) directly from user-item interaction data.
  69. [69]
    Facebook study: a little bit unethical but worth it? - PubMed
    This paper argues that the research was unethical because (i) it should have been overseen by an independent ethics committee or review board and (ii) informed ...
  70. [70]
    A Causal Framework for Cross-Cultural Generalizability
    Sep 21, 2022 · These findings make it clear that broad, unqualified generalizations about human psychology based on WEIRD samples alone are rarely justified.
  71. [71]
    Scalable Causal Structure Learning: Scoping Review of Traditional ...
    Jan 17, 2023 · This paper provides a practical review and tutorial on scalable causal structure learning models with examples of real-world data to help health care audiences ...
  72. [72]
    Introduction and Approach to Causal Inference - NCBI - NIH
    The judgment that smoking causes a particular disease has immediate implications for prevention of the disease. Having reached a causal conclusion, one of the ...
  73. [73]
    Causal inference and observational data
    Oct 11, 2023 · Observational studies using causal inference frameworks can provide a feasible alternative to randomized controlled trials.
  74. [74]
    Experimental evidence of massive-scale emotional contagion ...
    These results indicate that emotions expressed by others on Facebook influence our own emotions, constituting experimental evidence for massive-scale contagion ...
  75. [75]
    Causal inference concepts can guide research into the effects of ...
    Nov 25, 2024 · Causal inference frameworks and their tools are increasingly used to analyse data and guide study design in epidemiology and beyond, and may ...
  76. [76]
    [PDF] International Ethical Guidelines for Health-related Research ...
    These are international ethical guidelines for health research involving humans, prepared by CIOMS and WHO, covering scientific value, low-resource settings, ...
  77. [77]
    Ethical Challenges in the use of AI for Infectious Disease ...
    Mar 7, 2025 · Data Equity and Bias. One of the fundamental ethical concerns in AI-driven epidemiology is data equity. AI models require vast amounts of data ...