Local average treatment effect
The local average treatment effect (LATE) is a causal estimand in econometrics and statistics that quantifies the average effect of a binary treatment on a specific subpopulation, known as compliers, whose treatment status is altered by an exogenous instrumental variable (IV).[1] Compliers are individuals who receive the treatment when the IV is switched on (e.g., Z=1) but would not receive it otherwise (Z=0), distinguishing them from always-takers (who receive treatment regardless of the IV), never-takers (who never receive it), and defiers (who do the opposite of what the IV encourages and are ruled out by the monotonicity assumption).[1] Formally, LATE is defined as the expected difference in potential outcomes for compliers: \mathbb{E}[Y(1) - Y(0) \mid D(1) > D(0)], where Y(1) and Y(0) are potential outcomes with and without treatment, and D(1) and D(0) are potential treatment receipts under the two IV states.[2]
LATE was introduced by Joshua Angrist and Guido Imbens in their seminal 1994 paper. For their methodological contributions to the analysis of causal relationships, including the LATE framework, Angrist and Imbens shared the 2021 Nobel Memorial Prize in Economic Sciences with David Card.[3][4] LATE provides a rigorous interpretation for IV estimators in settings with heterogeneous treatment effects and imperfect compliance, such as randomized experiments with non-adherence or observational studies with endogeneity.[3] Under the core assumptions of IV independence (the instrument is independent of potential outcomes and potential treatments), the exclusion restriction, relevance, and monotonicity (the instrument affects treatment status in one direction only, excluding defiers), LATE is nonparametrically identified by the Wald estimand: the ratio of the IV's effect on the outcome to its effect on treatment receipt, \frac{\mathbb{E}[Y \mid Z=1] - \mathbb{E}[Y \mid Z=0]}{\mathbb{E}[D \mid Z=1] - \mathbb{E}[D \mid Z=0]}.[2] This estimand equals the intention-to-treat (ITT) effect on outcomes divided by the ITT effect on treatment, offering a local rather than global measure of causality that applies only to the complier subgroup.[1]
LATE has become foundational in applied causal inference, enabling researchers to draw valid conclusions from IV strategies in fields like labor economics (e.g., estimating returns to schooling using quarter-of-birth instruments) and health policy (e.g., assessing drug effects via physician prescribing as an IV).[1] Extensions include bounding LATE under partial violations of monotonicity and nonparametric estimation methods to relax functional form assumptions, though challenges persist in extrapolating LATE to broader populations or verifying the exclusion restriction (where the IV affects outcomes only through treatment).[5] Its emphasis on subpopulation-specific effects underscores the limitations of assuming homogeneous treatment impacts, promoting more nuanced policy evaluations.[2]
Fundamentals
Definition
The local average treatment effect (LATE) is a causal estimand that measures the average effect of a treatment on a specific subgroup of the population known as "compliers," who change their treatment status in response to an exogenous instrument. Formally, it is defined as \mathbb{E}[Y(1) - Y(0) \mid D(1) > D(0)], where Y(1) and Y(0) are the potential outcomes under treatment and no treatment, and D(1) and D(0) are the potential treatment receipts under the instrument values Z=1 and Z=0.[6] In empirical settings where treatment is not perfectly randomized or compliance with assignment is imperfect, LATE isolates the impact on those individuals whose participation in the treatment is directly influenced by the instrument, providing a targeted estimate of causal effects rather than an overall population average.[6]
Consider a scenario with an outcome variable Y (such as earnings), a binary treatment indicator D (1 if treated, 0 otherwise), and a binary instrument Z (1 if the instrument is applied, 0 otherwise). The LATE then represents the mean difference in Y between treated and untreated states for compliers, that is, those with D=1 when Z=1 and D=0 when Z=0.[6] This contrasts with the average treatment effect (ATE), which averages the treatment impact across the entire population; the two coincide under full compliance, when every unit is a complier.[6]
LATE was formally introduced by Angrist and Imbens in their seminal 1994 paper, which built on earlier instrumental variables (IV) methods to address identification challenges in non-experimental data.[6] This framework has become foundational in econometrics for evaluating policies, such as the effects of education programs or draft lotteries, where instruments like random assignments influence treatment uptake imperfectly.[6]
Potential Outcomes Framework
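A small simulation can make the complier estimand concrete before the formal setup. The sketch below builds a hypothetical population in which both potential outcomes and both potential treatment statuses are known for every unit, something never available in real data. The strata shares, effect sizes, and the helper name `simulate_population` are assumptions chosen purely for illustration.

```python
import random
from statistics import mean

def simulate_population(n=10_000, seed=0):
    """Return hypothetical units whose potential values are all known."""
    rng = random.Random(seed)
    # Assumed strata: (D(0), D(1), population share, treatment effect)
    strata = {
        "always_taker": (1, 1, 0.2, 3.0),
        "complier": (0, 1, 0.5, 1.0),
        "never_taker": (0, 0, 0.3, 0.0),
    }
    names = list(strata)
    weights = [strata[name][2] for name in names]
    units = []
    for _ in range(n):
        name = rng.choices(names, weights=weights)[0]
        d0, d1, _, effect = strata[name]
        y0 = rng.gauss(10.0, 1.0)
        y1 = y0 + effect  # effect held constant within a stratum (assumption)
        units.append({"type": name, "d0": d0, "d1": d1, "y0": y0, "y1": y1})
    return units

units = simulate_population()

# ATE averages Y(1) - Y(0) over everyone; LATE only over units with D(1) > D(0).
ate = mean(u["y1"] - u["y0"] for u in units)
late = mean(u["y1"] - u["y0"] for u in units if u["d1"] > u["d0"])

print(f"ATE  = {ate:.3f}")
print(f"LATE = {late:.3f}")
```

Because the assumed effect differs across strata, the two averages differ: the LATE reflects only the complier effect, while the ATE also mixes in the effects assigned to always-takers and never-takers.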
The potential outcomes framework, also known as the Rubin causal model, provides a foundational approach to defining causal effects in observational and experimental settings.[7] In this model, for each unit i, the individual treatment effect is given by the difference between the potential outcome under treatment, Y_i(1), and the potential outcome under no treatment, Y_i(0), where Y_i(d) denotes the value of the outcome Y_i that would be observed if unit i received treatment status d \in \{0, 1\}.[7] However, only one potential outcome is observed for each unit—the one corresponding to the actual treatment received—leading to the fundamental problem of causal inference, as the counterfactual outcome remains unobservable.[7] To address settings with imperfect compliance, where treatment assignment does not perfectly determine receipt, the framework incorporates an instrumental variable Z_i, typically binary, that influences the treatment received D_i but operates through potential treatment statuses D_i(z) for instrument values z \in \{0, 1\}.[6] Here, D_i(z) represents the treatment status unit i would take if assigned to instrument level z, allowing the model to capture how the instrument affects treatment uptake without directly altering the outcome.[6] Units can be classified into principal strata based on their potential treatment behaviors: always-takers, for whom D_i(0) = 1 and D_i(1) = 1; compliers, for whom D_i(0) = 0 and D_i(1) = 1; never-takers, for whom D_i(0) = 0 and D_i(1) = 0; and defiers, for whom D_i(0) = 1 and D_i(1) = 0.[6] The local average treatment effect targets the average causal effect specifically for the complier subgroup, where treatment receipt changes with the instrument, i.e., units satisfying D_i(1) > D_i(0).[6] Non-compliance in this framework can be one-sided, such as scenarios involving only compliers and never-takers (where no unit takes treatment without the instrument), or two-sided, which includes always-takers who receive 
treatment regardless of the instrument.[6]
Assumptions
Non-Compliance Framework
In the potential outcomes framework, non-compliance arises when the instrument Z, which randomly assigns individuals to treatment or control, does not perfectly determine actual treatment receipt D. Specifically, the potential treatment variables D(1) and D(0) denote whether an individual would receive treatment if assigned to Z=1 or Z=0, respectively. This setup partitions the population into four principal strata based on compliance behavior: compliers, for whom D(1)=1 and D(0)=0; always-takers, for whom D(1)=1 and D(0)=1; never-takers, for whom D(1)=0 and D(0)=0; and defiers, for whom D(1)=0 and D(0)=1.[6][8]
Non-compliance does not bias the intent-to-treat (ITT) estimate as a measure of the effect of assignment, but it makes the ITT a diluted measure of the effect of treatment itself, because the comparison between assignment groups includes individuals whose treatment status does not respond to assignment. The local average treatment effect addresses this by focusing exclusively on compliers, whose treatment status changes with the instrument, thereby isolating the causal effect for this subgroup and undoing the attenuation in the ITT.[6][8]
Non-compliance can be one-sided or two-sided. In one-sided non-compliance, typical of encouragement designs where control group members cannot access treatment but some treatment assignees may not comply, D(0)=0 for everyone, so always-takers and defiers are both absent and the population consists only of compliers and never-takers. Two-sided non-compliance allows deviations in both assignment arms, so always-takers appear alongside compliers and never-takers. Defiers, who would take treatment when assigned to control but not when assigned to treatment, are logically possible in this setting and must be ruled out by assumption, since the instrument would move their treatment in the opposite direction.[9]
The proportion of compliers, P(D(1) > D(0)), is the size of the subpopulation on which the instrument has a causal effect on treatment receipt and thus defines the local population relevant for the local average treatment effect. Absent defiers, this proportion is estimable as the difference in treatment probabilities between assignment groups, and dividing the ITT effect by it recovers the complier-specific impact.[6]
Identification Assumptions
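The strata shares described above can be backed out from observed (Z, D) pairs once defiers are ruled out and assignment is random. The sketch below is a minimal illustration under those two assumptions; the function name `strata_shares` and the twenty data points are hypothetical. It uses the fact that, without defiers, anyone treated under Z=0 is an always-taker and anyone untreated under Z=1 is a never-taker.

```python
from statistics import mean

def strata_shares(z, d):
    """Estimate principal-strata shares from binary assignment z and receipt d.

    Assumes random assignment and no defiers; neither is checked here.
    """
    take_if_assigned = mean(di for zi, di in zip(z, d) if zi == 1)
    take_if_control = mean(di for zi, di in zip(z, d) if zi == 0)
    return {
        "always_takers": take_if_control,            # treated even when Z = 0
        "never_takers": 1 - take_if_assigned,        # untreated even when Z = 1
        "compliers": take_if_assigned - take_if_control,
    }

# Hypothetical data: 10 assigned units (7 treated), 10 control units (2 treated).
z = [1] * 10 + [0] * 10
d = [1] * 7 + [0] * 3 + [1] * 2 + [0] * 8

shares = strata_shares(z, d)
for name, share in shares.items():
    print(f"{name}: {share:.2f}")
```

The complier share computed this way is the same quantity that later appears as the denominator of the Wald ratio.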
The identification of the local average treatment effect (LATE) relies on a set of assumptions within the instrumental variables (IV) framework to ensure that the instrument validly isolates the causal effect for compliers—the subgroup whose treatment status changes with the instrument. These assumptions address potential biases from non-compliance and confounding, enabling the LATE to be recovered from observable data such as intention-to-treat effects. As outlined in the non-compliance framework, individuals are categorized into compliers, always-takers, never-takers, and potential defiers based on their treatment responses to the instrument. The independence assumption, also known as randomization, posits that the instrument Z is independent of the potential outcomes and potential treatments, i.e., Z \perp (Y_i(0), Y_i(1), D_i(0), D_i(1)) for all individuals i. This ensures that the instrument does not correlate with unobserved factors affecting the outcomes or treatment decisions, mimicking a randomized experiment. In practice, this holds when the instrument is exogenously assigned, such as in a lottery system, but it is untestable and requires domain-specific justification.[8] The exclusion restriction requires that the instrument affects the observed outcome Y solely through its impact on the treatment D, with no direct effect: Y_i(z, d) = Y_i(d) for all z and d. For compliers, this implies that their potential outcomes Y_i(1) and Y_i(0) are independent of Z. This assumption rules out direct channels from the instrument to the outcome, such as side effects unrelated to treatment, and is crucial for attributing all instrument-induced variation in Y to changes in D. 
It is untestable but can be motivated by the instrument's design, such as a policy change that only alters treatment access.[8]
The monotonicity assumption eliminates defiers (individuals for whom the instrument reverses their treatment choice) by requiring that treatment receipt does not decrease with the instrument: D_i(1) \geq D_i(0) for all i. This ensures the instrument shifts treatment in one direction only, so that the first-stage variation comes from compliers alone. If defiers were present, their opposing responses would enter the Wald ratio with negative weight, and the estimand would no longer be an average effect for a well-defined subgroup. Monotonicity is not directly testable but is often plausible when the instrument encourages uptake without discouraging it.[8]
Finally, the relevance assumption requires that the instrument actually shifts treatment, so that the proportion of compliers is positive: \Pr(D_i(1) > D_i(0)) > 0, or equivalently under monotonicity, E[D_i(1) - D_i(0)] > 0. This guarantees that the denominator in the LATE formula is non-zero. Relevance is testable via the first-stage F-statistic, and a strong first stage matters in practice because weak instruments yield imprecise and potentially biased estimates.[8]
In settings with one-sided non-compliance, where for example the control group cannot access treatment, there are no always-takers and monotonicity holds automatically because D_i(0) = 0 for all i; only independence, exclusion, and relevance remain to be justified. In contrast, two-sided non-compliance requires the full set, including monotonicity to rule out defiers. Always-takers and never-takers do not contaminate the complier effect, because their treatment status is the same under both values of Z and their contributions therefore cancel from the comparison. These distinctions allow LATE to adapt to experimental designs like encouragement trials.
Identification
Core Identification Formula
The local average treatment effect (LATE) is identified through the ratio of reduced-form intent-to-treat (ITT) effects under the instrumental variables framework. For a binary instrument Z \in \{0, 1\}, the core identification formula is given by \text{LATE} = \frac{E[Y \mid Z=1] - E[Y \mid Z=0]}{E[D \mid Z=1] - E[D \mid Z=0]}, where Y is the observed outcome, D is the observed treatment receipt, and the expectations denote population averages conditional on the instrument value.[6] This expression equals the ratio of the ITT effect on the outcome, \text{ITT}_Y = E[Y \mid Z=1] - E[Y \mid Z=0], to the ITT effect on treatment receipt, \text{ITT}_D = E[D \mid Z=1] - E[D \mid Z=0].[6]
The numerator \text{ITT}_Y measures the average causal effect of the instrument on the outcome, which under the exclusion restriction arises only through changes in treatment. The denominator \text{ITT}_D quantifies the average change in treatment receipt induced by Z, which under monotonicity is the share of compliers. Their ratio isolates the treatment effect for compliers, the subgroup whose treatment status switches from D=0 to D=1 when Z changes from 0 to 1, yielding an average causal effect specific to this subpopulation.[6] Under the standard assumptions of independence, exclusion (Z affects Y only through D), monotonicity (no defiers), and relevance, this formula equals E[Y(1) - Y(0) \mid D(1) > D(0)], the average treatment effect for compliers in the potential outcomes framework.[6]
With a multi-valued or continuous instrument, the conditional differences are replaced by covariances, giving the ratio \frac{\text{Cov}(Y, Z)}{\text{Cov}(D, Z)}. This ratio reduces to the formula above when Z is binary; more generally it identifies a weighted average of local effects for units whose treatment status changes across instrument values, rather than the effect for a single complier group.[6]
Proof of Identification
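Before the formal argument, the claim can be checked exactly on a tiny hypothetical population in which every unit's potential treatment statuses and potential outcomes are written down explicitly. The eight units and their values below are invented for illustration. Because the instrument is independent of these potential values, E[Y | Z = z] is simply the population average of the outcome each unit would reveal under Z = z, so the ratio can be computed with exact fractions and compared with the average effect among compliers.

```python
from fractions import Fraction

# Hypothetical population: (D(0), D(1), Y(0), Y(1)) per unit, with no defiers.
units = [
    (1, 1, 5, 9),  # always-taker
    (1, 1, 4, 6),  # always-taker
    (0, 1, 3, 5),  # complier
    (0, 1, 2, 3),  # complier
    (0, 1, 6, 9),  # complier
    (0, 1, 1, 3),  # complier
    (0, 0, 2, 7),  # never-taker
    (0, 0, 4, 4),  # never-taker
]
n = len(units)

def mean_outcome(z):
    """E[Y | Z = z]: a unit reveals Y(1) if D(z) = 1 and Y(0) otherwise."""
    return Fraction(sum(u[3] if u[z] == 1 else u[2] for u in units), n)

def mean_treatment(z):
    """E[D | Z = z]: the share of units with D(z) = 1."""
    return Fraction(sum(u[z] for u in units), n)

itt_y = mean_outcome(1) - mean_outcome(0)
itt_d = mean_treatment(1) - mean_treatment(0)
wald = itt_y / itt_d

# Direct average of Y(1) - Y(0) among compliers, i.e. units with D(1) > D(0).
compliers = [u for u in units if u[1] > u[0]]
late = Fraction(sum(u[3] - u[2] for u in compliers), len(compliers))

print(itt_y, itt_d, wald, late)  # prints: 1 1/2 2 2
```

Always-takers and never-takers reveal the same outcome under both instrument values, so they drop out of the numerator; only the four compliers contribute, which is the cancellation the proof formalizes.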
The proof of identification for the local average treatment effect (LATE) relies on the potential outcomes framework and three key assumptions: independence of the instrument, the exclusion restriction, and monotonicity. Under these conditions, the instrumental variables (IV) estimand equals the average treatment effect for compliers, the subpopulation whose treatment status changes with the instrument. This derivation, originally established by Imbens and Angrist (1994), proceeds step by step, expressing conditional expectations in terms of potential values and showing how non-compliers' contributions cancel out.
Consider the potential outcomes notation: for each unit i, let Y_i(1) be the outcome under treatment and Y_i(0) the outcome under no treatment, with observed outcome Y_i = D_i Y_i(1) + (1 - D_i) Y_i(0), where D_i \in \{0,1\} indicates treatment receipt. Writing the potential outcomes as functions of treatment alone already incorporates the exclusion restriction. The instrument Z_i \in \{0,1\} affects treatment via the potential treatment statuses D_i(1) and D_i(0), so that D_i = D_i(Z_i). The independence assumption states that (Y_i(0), Y_i(1), D_i(1), D_i(0)) is independent of Z_i, ensuring the instrument is exogenous. Let P(z) = E[D_i | Z_i = z] = \Pr(D_i(z) = 1), where the second equality uses independence. The monotonicity assumption further posits that D_i(1) \geq D_i(0) for all i, ruling out defiers (units who take treatment when Z=0 but not when Z=1) and partitioning the population into always-takers (D_i(1) = D_i(0) = 1), never-takers (D_i(1) = D_i(0) = 0), and compliers (D_i(1) = 1, D_i(0) = 0). This implies \Pr(D_i(1) - D_i(0) = -1) = 0.
Now, express the conditional expectation of the outcome: E[Y_i | Z_i = z] = E[Y_i(1) | D_i(z) = 1, Z_i = z] \Pr(D_i(z) = 1 | Z_i = z) + E[Y_i(0) | D_i(z) = 0, Z_i = z] \Pr(D_i(z) = 0 | Z_i = z). By independence, the conditioning on Z_i = z can be dropped, so the right-hand side equals E[Y_i(1) | D_i(z) = 1] P(z) + E[Y_i(0) | D_i(z) = 0] (1 - P(z)). These terms still depend on z, but only through which units have D_i(z) = 1. Equivalently, E[Y_i | Z_i = z] = E[Y_i(0) + D_i(z) (Y_i(1) - Y_i(0))].
Substituting z = 1 and z = 0 and taking the difference gives E[Y_i | Z_i = 1] - E[Y_i | Z_i = 0] = E[(D_i(1) - D_i(0)) (Y_i(1) - Y_i(0))]. For always-takers and never-takers, D_i(1) - D_i(0) = 0, so their contributions cancel; by monotonicity no unit has D_i(1) - D_i(0) = -1; and for compliers D_i(1) - D_i(0) = 1. Hence E[Y_i | Z_i = 1] - E[Y_i | Z_i = 0] = E[Y_i(1) - Y_i(0) | D_i(1) = 1, D_i(0) = 0] \cdot \Pr(D_i(1) = 1, D_i(0) = 0). The term E[Y_i(1) - Y_i(0) | D_i(1) = 1, D_i(0) = 0] is precisely the treatment effect for compliers.
Similarly, for the first stage: E[D_i | Z_i = 1] - E[D_i | Z_i = 0] = E[D_i(1) - D_i(0)] = \Pr(D_i(1) = 1, D_i(0) = 0) = P(1) - P(0), the proportion of compliers, again by independence and monotonicity, with always-takers and never-takers canceling.
The IV estimand, or Wald ratio, \frac{E[Y_i | Z_i = 1] - E[Y_i | Z_i = 0]}{E[D_i | Z_i = 1] - E[D_i | Z_i = 0]}, thus simplifies to \frac{E[Y_i(1) - Y_i(0) | D_i(1) = 1, D_i(0) = 0] \cdot [P(1) - P(0)]}{P(1) - P(0)} = E[Y_i(1) - Y_i(0) | D_i(1) = 1, D_i(0) = 0], identifying the LATE as the complier average causal effect. Each step invokes the assumptions: independence to drop the conditioning on Z_i, exclusion so that outcomes depend on the instrument only through treatment, and monotonicity for the clean partitioning and absence of defiers. Relevance, P(1) - P(0) > 0, ensures the ratio is well defined.
Instrumental Variables Integration
LATE in IV Estimation
In instrumental variables (IV) estimation, the local average treatment effect (LATE) is commonly estimated using the two-stage least squares (2SLS) method, which addresses endogeneity in the treatment variable D by leveraging an instrument Z. In the first stage, D is regressed on Z (and any exogenous covariates) to obtain the predicted values \hat{D}, capturing the exogenous variation induced by Z. In the second stage, the outcome Y is regressed on \hat{D} (and the same exogenous covariates), with the coefficient on \hat{D} providing a consistent estimate of the LATE for compliers (those whose treatment status changes with Z) under the standard IV assumptions of independence, exclusion, relevance, and monotonicity.[6][10]
The 2SLS estimator is consistent for the LATE, meaning it converges in probability to the true LATE parameter as the sample size grows, provided the instrument is relevant and the assumptions hold. Conventional asymptotic standard errors are available for 2SLS, but they can be misleading when the instrument is weak, that is, when the first-stage correlation between Z and D is low; weak instruments also bias the 2SLS estimate in finite samples. Heteroskedasticity-robust standard errors are recommended to improve inference reliability.[10]
Implementation of 2SLS for LATE estimation is widely available in statistical software, facilitating practical application in econometric analysis. In Stata, the ivregress 2sls command performs the two-stage procedure, allowing specification of endogenous regressors and instruments while supporting robust and cluster-robust standard errors to handle heteroskedasticity and within-cluster dependence.[11] Similarly, in R, the ivreg function from the AER package executes 2SLS estimation with options for robust variance-covariance matrices, essential for valid inference in applied settings with non-i.i.d. data.
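The two-stage procedure can also be written out by hand for the simplest case of one binary instrument and no covariates. The sketch below uses only the Python standard library and simulated data; the data-generating process, the helper names `ols_slope_intercept` and `group_mean`, and all parameter values are illustrative assumptions and are not part of the Stata or R routines named above. An unobserved factor `u` drives both treatment take-up and the outcome, so a naive regression of Y on D is confounded, while the assumed complier effect is 2.0.

```python
import random
from statistics import mean

def ols_slope_intercept(x, y):
    """Least-squares fit of y on x with an intercept; returns (slope, intercept)."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, ybar - slope * xbar

rng = random.Random(42)
z, d, y = [], [], []
for _ in range(20_000):
    zi = rng.randint(0, 1)       # randomly assigned binary instrument
    u = rng.gauss(0.0, 1.0)      # unobserved factor affecting take-up and outcome
    # Assumed compliance types, determined by u (no defiers by construction).
    kind = "always" if u > 0.8 else "complier" if u > -0.5 else "never"
    di = 1 if kind == "always" or (kind == "complier" and zi == 1) else 0
    effect = 2.0 if kind == "complier" else 0.5  # heterogeneous effects
    yi = 1.0 + effect * di + u + rng.gauss(0.0, 0.5)
    z.append(zi)
    d.append(di)
    y.append(yi)

# Stage 1: regress D on Z and form fitted values.
b1, a1 = ols_slope_intercept(z, d)
d_hat = [a1 + b1 * zi for zi in z]

# Stage 2: regress Y on the fitted values from stage 1.
beta_2sls, _ = ols_slope_intercept(d_hat, y)

def group_mean(values, group):
    """Mean of values among units whose instrument equals group."""
    return mean(v for v, zi in zip(values, z) if zi == group)

# Ratio of differences in means across instrument groups.
wald = (group_mean(y, 1) - group_mean(y, 0)) / (group_mean(d, 1) - group_mean(d, 0))

# Naive regression of Y on D, which ignores the confounding through u.
naive, _ = ols_slope_intercept(d, y)

print(f"2SLS by hand: {beta_2sls:.4f}")
print(f"Wald ratio:   {wald:.4f}")
print(f"Naive OLS:    {naive:.4f}")
```

The hand-rolled second stage reproduces the point estimate only. Standard errors taken from that second regression would be incorrect because they ignore that \hat{D} is itself estimated, which is one reason to prefer dedicated 2SLS routines in applied work.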
When multiple instruments are available, leading to an overidentified system (more instruments than endogenous variables), 2SLS generalizes by projecting D onto the space spanned by all instruments in the first stage. The resulting estimand is still local: it is a weighted average of the instrument-specific LATEs, with more weight on instruments that have a stronger first stage. Overidentification tests, such as the Sargan-Hansen statistic, can then be used to assess instrument validity, although with heterogeneous treatment effects a rejection may simply reflect that different instruments move different complier groups. In either case the estimand stays confined to compliers responsive to the instrumental variation.[10][12]
Connection to Wald Estimator
The Wald estimator provides a straightforward, non-parametric method for estimating the local average treatment effect (LATE) in settings with a binary instrument Z, treatment D, and outcome Y. It computes the ratio of the reduced-form effect of the instrument on the outcome to the first-stage effect on the treatment, using population expectations or their sample analogs. The estimator is given by \hat{\text{LATE}} = \frac{\bar{Y}_{Z=1} - \bar{Y}_{Z=0}}{\bar{D}_{Z=1} - \bar{D}_{Z=0}}, where \bar{Y}_{Z=z} and \bar{D}_{Z=z} denote the sample means of the outcome and treatment, respectively, conditional on the instrument taking value z \in \{0,1\}.[2][1] Named after the statistician Abraham Wald, who introduced an early version in the context of errors-in-variables models, the estimator was originally proposed in 1940 for fitting linear relationships with measurement error in both variables.[13] Its application to causal inference and LATE was popularized in the seminal work of Angrist, Imbens, and Rubin (1996), who formalized its interpretation within the potential outcomes framework as the average treatment effect for compliers—those whose treatment status changes with the instrument.[8][2] In relation to instrumental variables (IV) estimation, the Wald estimator corresponds exactly to the two-stage least squares (2SLS) estimator when the instrument is binary and there are no covariates, reducing to a simple ratio of differences in means.[2] This equivalence highlights its role as a foundational building block for more general IV methods, particularly in randomized experiments where the instrument is randomly assigned, ensuring efficiency under standard assumptions like monotonicity and exclusion.[1] The Wald estimator's key advantages lie in its non-parametric nature, requiring no functional form assumptions beyond the IV conditions for LATE identification, and its computational simplicity, as it relies solely on group means without iterative optimization.[8] For 
binary outcomes, it can be adapted using logit or probit models for the first stage if the treatment probability is modeled parametrically, though the basic ratio form remains applicable for linear projections.[2]
Applications
Hypothetical Scenarios
To illustrate the local average treatment effect (LATE) in an encouragement design, consider a scenario where a random subset of high school graduates receives a scholarship offer (instrument Z) intended to encourage college attendance (treatment D), with long-term earnings (outcome Y) as the measure of interest. The scholarship offer increases the probability of attendance among recipients but does not mandate it, leading to partial compliance. Under standard assumptions including instrument exogeneity, monotonicity (no defiers who attend only without the offer), and instrument relevance, the LATE identifies the average causal effect of college attendance on earnings specifically for compliers—those students induced to attend by the scholarship offer, or the marginal students who would not otherwise enroll. This contrasts with the average treatment effect (ATE), which averages the impact across the entire population, as LATE isolates the effect for this induced subgroup.[14] In settings with two-sided non-compliance, always-takers—students who attend college regardless of the scholarship offer—further complicate identification, as their treatment effects are not captured by the LATE. For instance, always-takers might enroll due to strong academic motivation or family resources, and their earnings outcomes under attendance remain unaffected by the instrument Z. The LATE excludes these individuals, focusing solely on compliers whose attendance (and thus earnings) responds to Z, while never-takers—who do not attend even with the offer—are also excluded since they receive no treatment in either case. This exclusion ensures the estimand reflects only the policy-relevant effect for those swayed by the encouragement.[14] The schedule of potential outcomes clarifies how LATE arises in this framework, distinguishing groups based on compliance types under Z=0 (no offer) and Z=1 (offer). 
Potential outcomes are denoted Y(1) for earnings with attendance and Y(0) without.

| Compliance Type | D under Z=0 | Y under Z=0 | D under Z=1 | Y under Z=1 |
|---|---|---|---|---|
| Always-Takers | 1 | Y(1) | 1 | Y(1) |
| Compliers | 0 | Y(0) | 1 | Y(1) |
| Never-Takers | 0 | Y(0) | 0 | Y(0) |
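
The table can be turned into a short worked calculation. The figures below are purely hypothetical (strata shares of 0.2, 0.5, and 0.3 for always-takers, compliers, and never-takers, and an average earnings gain of 4000 for compliers) and are meant only to show how the rows combine. Always-takers and never-takers have identical entries under Z=0 and Z=1, so only the complier row contributes to either difference.

```latex
% Hypothetical shares: always-takers 0.2, compliers 0.5, never-takers 0.3
\begin{align*}
E[D \mid Z=1] &= 0.2 + 0.5 = 0.7, \qquad E[D \mid Z=0] = 0.2, \\
\text{ITT}_D &= 0.7 - 0.2 = 0.5, \\
\text{ITT}_Y &= 0.5 \times E[Y(1) - Y(0) \mid D(1) > D(0)] = 0.5 \times 4000 = 2000, \\
\text{LATE} &= \frac{\text{ITT}_Y}{\text{ITT}_D} = \frac{2000}{0.5} = 4000.
\end{align*}
```

In this illustration the scholarship offer raises average earnings by 2000 across all students offered it, but the gain for those actually induced to attend is twice as large, because only half of the students change their attendance in response to the offer.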