
Instrumental variables estimation

Instrumental variables (IV) estimation is a statistical method in econometrics and statistics used to identify and estimate causal effects when an explanatory variable is endogenous—meaning it is correlated with the unobserved error term in a model—such as due to omitted variables, measurement error, or reverse causality. The approach relies on an instrumental variable (or instrument), a variable that is correlated with the endogenous explanatory variable (relevance condition) but uncorrelated with the error term (exogeneity or exclusion restriction), enabling consistent estimation of the causal parameter without bias from endogeneity. The primary purpose of IV estimation is to mimic the conditions of a randomized experiment in observational data by leveraging the instrument to isolate exogenous variation in the treatment or explanatory variable, thereby addressing threats to internal validity that plague ordinary least squares (OLS) regression. Common applications include estimating returns to education using quarter of birth as an instrument for schooling (due to compulsory schooling laws), or evaluating the effects of interventions through lotteries or natural experiments where direct randomization is infeasible.

For identification, IV requires two core assumptions: the instrument must influence the endogenous variable (first-stage relevance, often tested via an F-statistic greater than 10 to avoid weak instruments) and must affect the outcome only through the endogenous variable (exclusion restriction, which is non-testable and relies on theoretical justification). Additional assumptions, such as monotonicity (no "defiers" who respond oppositely to the instrument), ensure the estimator recovers the local average treatment effect (LATE) for "compliers"—those whose treatment status changes with the instrument—rather than the average treatment effect for the entire population.

In practice, IV estimation is implemented through methods like the simple Wald estimator for binary treatments and instruments, given by the ratio of the reduced-form effect of the instrument on the outcome to its first-stage effect on the treatment: \hat{\beta}_{IV} = \frac{\text{Cov}(Y, Z)}{\text{Cov}(D, Z)}, or more generally via two-stage least squares (2SLS), where the endogenous variable is first regressed on the instrument(s) to obtain predicted values, which are then used in the second-stage regression on the outcome. While 2SLS is efficient under homoskedasticity and provides standard errors that account for the two-stage procedure, challenges include weak instruments (which bias estimates toward OLS and inflate variance), overidentification (when multiple instruments are available, tested via Sargan or J-statistics), and the need for robust inference in heteroskedastic data. IV methods have become foundational in empirical economics, policy evaluation, and beyond, as detailed in influential texts like Mostly Harmless Econometrics by Angrist and Pischke.

Motivation and Examples

Endogeneity in Regression Models

In regression models, endogeneity occurs when one or more explanatory variables are correlated with the disturbance term, violating the assumption of strict exogeneity that is necessary for ordinary least squares (OLS) to produce unbiased and consistent estimates. This correlation implies that the explanatory variables are not independent of the unobservable factors captured by the error term, leading to systematic errors in parameter estimation. The main sources of endogeneity in OLS regression include omitted variable bias, measurement error in the explanatory variables, and simultaneous causation. Omitted variable bias arises when a relevant variable that affects both the dependent variable and the included explanatory variables is excluded from the model, causing the error term to absorb its influence and correlate with the regressors. Measurement error in regressors, particularly classical error where the observed variable equals the true value plus uncorrelated noise, attenuates coefficients but can induce more general bias if the error is nonclassical or correlated with the true value. Simultaneous causation, common in economic systems, occurs when the dependent variable influences the explanatory variable in the same period, as in supply-demand models, creating mutual dependence that correlates the regressors with the error term.

Consider the linear structural equation y = X\beta + \epsilon, where y is the dependent variable, X includes the explanatory variables, \beta are the parameters of interest, and \epsilon is the error term. Under the exogeneity assumption, \text{Cov}(X, \epsilon) = 0, which ensures that OLS consistently estimates \beta by projecting y onto X. In the presence of endogeneity, however, \text{Cov}(X, \epsilon) \neq 0, so the OLS estimator is inconsistent, as it attributes part of the error's variation to the explanatory variables. The consequences of endogeneity are evident in the asymptotic behavior of the estimator. In the simple univariate case with y = \beta x + \epsilon and E(\epsilon \mid x) \neq 0, the probability limit is given by \operatorname{plim} \hat{\beta}_{OLS} = \beta + \frac{\text{Cov}(x, \epsilon)}{\text{Var}(x)}, where the second term represents the bias that does not vanish as the sample size increases, potentially overstating or understating the true effect depending on the sign of the correlation. This inconsistency undermines causal interpretation, as the estimated coefficients reflect confounded variation rather than the isolated impact of x on y. Instrumental variables methods can mitigate this issue by leveraging exogenous variation in instruments correlated with X but not with \epsilon, enabling consistent estimation without relying on the violated exogeneity assumption.
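
The inconsistency formula can be checked numerically. The sketch below is an added illustration, not drawn from the cited sources: it simulates an endogenous regressor with a known \text{Cov}(x, \epsilon) and compares the empirical OLS slope with the predicted probability limit; the data-generating process and all names are assumptions chosen for the example.

```python
# Sketch: verify that the OLS slope approaches beta + Cov(x, eps)/Var(x)
# when x is endogenous. The DGP below is an assumption for illustration only.
import numpy as np

rng = np.random.default_rng(1)
n, beta = 1_000_000, 1.5
eps = rng.normal(size=n)                     # structural error
x = 0.7 * eps + rng.normal(size=n)           # endogenous: Cov(x, eps) = 0.7
y = beta * x + eps

ols_slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
plim_formula = beta + np.cov(x, eps, ddof=1)[0, 1] / np.var(x, ddof=1)
print(ols_slope, plim_formula)               # both close to 1.5 + 0.7/1.49 ≈ 1.97
```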

Illustrative Applications

One prominent application of instrumental variables (IV) estimation addresses the endogeneity in estimating returns to education that arises from ability bias, where unobserved individual ability correlates with both education levels and wages and is expected to bias ordinary least squares (OLS) estimates upward. In a seminal study, Angrist and Krueger (1991) used quarter of birth as an instrument for years of schooling in wage regressions, exploiting compulsory schooling laws that create exogenous variation in schooling based on birth timing—children born early in the year enter school at an older age and reach the legal dropout age having completed slightly less schooling than those born later in the year. This instrument is relevant because it predicts educational attainment, and it satisfies the exclusion restriction by affecting wages only through schooling, as birth quarter is plausibly unrelated to innate ability or other wage determinants. Their IV estimates suggested a roughly 7-10% return to an additional year of schooling, similar to or slightly larger than the corresponding OLS estimates, suggesting that upward ability bias in OLS is modest.

Another classic example involves using geographic proximity to colleges as an instrument for schooling in estimating wage returns. Card (1995) analyzed data from the National Longitudinal Survey of Young Men cohort, using distance to the nearest college as an instrument for years of schooling. Proximity to a college reduces the costs of attendance and increases the likelihood of enrollment, providing exogenous variation, while it is assumed to affect wages primarily through education rather than directly through local labor market conditions. The IV estimates yielded returns to schooling of 9-13%, higher than the OLS estimate of about 7%, mitigating downward biases in OLS from factors such as measurement error in schooling or omitted variables affecting both education and earnings.

To illustrate endogeneity and IV correction conceptually, consider a simple simulated data generating process where the true causal effect of an endogenous regressor x on outcome y is \beta = 0.5, but x correlates with the error term due to omitted variables. Specifically, generate data with y = 0.5x + u, where x = \pi z + v, z \sim N(2,1) is the instrument, and the errors (u, v) are jointly normal with correlation 0.8 and unit variance, using a large sample of 10,000 observations. OLS estimation yields a biased coefficient of approximately 0.902, overestimating the true effect because the endogeneity inflates the covariance between x and the error. In contrast, IV using z produces an estimate of about 0.510, closely recovering the true \beta, as the instrument provides exogenous variation uncorrelated with u but predictive of x. This simulation highlights how IV isolates the causal channel, reducing bias without requiring direct measurement of confounders; a code sketch of this exercise follows below.

In policy evaluation, IV methods have been applied to assess causal effects in randomized experiments with non-compliance, such as the impact of class size on student test scores. The Student-Teacher Achievement Ratio (STAR) experiment randomly assigned over 11,600 students to small (13-17 pupils) or regular (22-25 pupils) classes from 1985-1989, but some students switched classes, leading to non-compliance in observed class size. Krueger (1999) used initial random assignment to small classes as an instrument for the class size actually attended, which is relevant since assignment predicts attendance and exogenous under the randomization, satisfying exclusion by affecting scores only through class size. The IV estimates indicated that reducing class size by about 10 students increased test scores by roughly 0.2 standard deviations, particularly benefiting disadvantaged students, providing evidence for policy interventions like smaller classes in early grades.
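
A minimal sketch of the simulation described above follows. The first-stage coefficient \pi is not specified in the text, so \pi = 1 is an assumption chosen here; with that choice the OLS and IV estimates come out close to the values quoted above.

```python
# Sketch reproducing the simulation described above. The first-stage coefficient pi
# is not given in the text; pi = 1 is an assumption chosen for this illustration.
import numpy as np

rng = np.random.default_rng(42)
n, beta, pi, rho = 10_000, 0.5, 1.0, 0.8
z = rng.normal(loc=2.0, scale=1.0, size=n)                    # instrument, z ~ N(2, 1)
cov = np.array([[1.0, rho], [rho, 1.0]])                      # unit variances, corr 0.8
u, v = rng.multivariate_normal([0.0, 0.0], cov, size=n).T     # structural and first-stage errors
x = pi * z + v                                                # endogenous regressor
y = beta * x + u

beta_ols = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)           # approx 0.9 (biased upward)
beta_iv = np.cov(z, y, ddof=1)[0, 1] / np.cov(z, x, ddof=1)[0, 1]   # approx 0.5 (recovers beta)
print(round(beta_ols, 3), round(beta_iv, 3))
```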

Historical Development

Early Formulations

The origins of instrumental variables estimation trace back to early 20th-century developments in statistics and economics, where precursors to modern methods emerged. Sewall Wright, a geneticist, introduced path analysis in the 1920s as a technique to decompose correlations into direct and indirect causal effects within structural equation models, laying foundational groundwork for handling interdependent relationships in data. This method, detailed in his seminal 1921 paper, emphasized the use of diagrammatic representations to trace causal paths, which anticipated later econometric tools for addressing confounding in simultaneous systems. Philip G. Wright, an economist and son of Sewall Wright, extended these ideas to economic applications in 1928, proposing an early form of instrumental variables as a solution to identification challenges in models affected by simultaneity. In his analysis of tariffs on animal and vegetable oils, Wright suggested using external variables—such as lagged prices or other exogenous factors—to isolate causal effects, effectively treating them as instruments to mitigate bias from correlated errors. This approach served as a precursor to instrumental variables estimation, demonstrating its utility in supply and demand analysis by averaging estimates across multiple instruments to improve reliability in the presence of measurement errors and omitted variables.

In the 1940s, econometricians began formalizing these concepts amid growing recognition of simultaneity biases in economic models. Trygve Haavelmo's probability approach revolutionized the field by framing econometric inference within a formal probabilistic framework, explicitly highlighting how simultaneous equations lead to biased ordinary least squares estimates due to correlated disturbances. His work underscored the need for identification strategies to distinguish structural relations from reduced-form correlations, setting the stage for methods to resolve these issues. Complementing this, Tjalling C. Koopmans and William C. Hood advanced identification theory in simultaneous systems during the Cowles Commission efforts of the late 1940s and early 1950s, emphasizing conditions under which exogenous variables could uniquely determine model parameters. Their contributions clarified the role of restrictions—like exclusion conditions—in enabling consistent estimation, bridging early statistical insights to rigorous econometric practice.

Key Advancements in Econometrics

In the 1950s, the Cowles Commission played a pivotal role in formalizing instrumental variables within the framework of simultaneous equations models, building on earlier inspirations from statistical identification problems. A foundational contribution came from T.W. Anderson and H. Rubin, who in 1949 developed methods for estimating the parameters of a single equation in a complete system of linear stochastic relations, emphasizing conditions for identification and addressing simultaneity bias. This work established the rank and order conditions for identification, which remain central to IV theory. Complementing this, R.L. Basmann in 1957 proposed a generalized classical method of linear estimation for structural equations—an estimator equivalent to two-stage least squares—as an alternative to limited information maximum likelihood (LIML) and full-system approaches, which proved computationally convenient for overidentified models. Henri Theil's 1953 memoranda introduced what became two-stage least squares (2SLS) and anticipated the broader k-class of estimators, a flexible family that interpolates between ordinary least squares (k = 0) and 2SLS (k = 1) by adjusting for endogeneity via instrumental variables. These estimators, detailed in Theil's mimeographed memoranda for the Central Planning Bureau and later elaborated in his published writings, provided a unified framework for handling incomplete observations and simultaneity in multiple regression contexts.

The 1960s saw further advancements integrating Bayesian perspectives and efficient estimation techniques. Arnold Zellner introduced Bayesian approaches to instrumental variables, particularly in analyzing regression models with unobservable variables, as explored in his 1970 work that laid groundwork for posterior inference in IV settings during the decade's Bayesian surge. Concurrently, Dale W. Jorgenson contributed early ideas toward the generalized method of moments (GMM) through efficient instrumental variables estimation in simultaneous equations, notably in his 1971 collaboration with J.M. Brundy on constructing optimal instruments without initial reduced-form estimation. By the 1970s, Arthur S. Goldberger's writings solidified IV applications in linear models, with his 1972 paper on the estimation of regressions containing unobservables highlighting IV's role in handling measurement error and endogeneity. Goldberger's contributions, including extensions to full information estimators, influenced pedagogical texts and practical implementations, emphasizing the method's robustness in econometric modeling.

Core Theory and Assumptions

Identification Conditions

Instrumental variables estimation addresses endogeneity in the linear model y = X\beta + u, where y is an n \times 1 vector of outcomes, X is an n \times K matrix of endogenous regressors, \beta is a K \times 1 vector of parameters, and u is an n \times 1 error term with E(X'u) \neq 0. An instrument matrix Z ( n \times L ) is introduced such that the first-stage relation is X = Z\Pi + V, where \Pi is an L \times K matrix of coefficients, V is a matrix of first-stage errors, and the exogeneity condition holds: E(Z'u) = 0. The column rank of Z is assumed to be L, with L \geq K allowing for potential overidentification.

Identification requires two key conditions: the order condition and the rank condition. The order condition states that the model is identified only if the number of instruments L is at least as large as the number of endogenous regressors K (i.e., L \geq K). This ensures there are sufficient independent sources of exogenous variation to solve for the K parameters in \beta. When L = K, the model is just-identified, yielding a unique solution analogous to solving a square system of linear equations; when L > K, it is over-identified, providing additional instruments that allow testing of overidentifying restrictions but requiring all instruments to satisfy exogeneity. The intuition for solvability under L \geq K is that the instruments must span the space of the endogenous regressors in the projection onto the exogenous variation, preventing underidentification of the structural parameters.

The rank condition complements the order condition by requiring that the matrix E(Z'X) (or equivalently, \Pi) has full column rank K, meaning the instruments are relevant and provide linearly independent variation in the endogenous regressors. This ensures that the covariance between Z and X is of full rank, so the first-stage projection isolates exogenous components without collapsing to zero or perfect collinearity. Without full rank, even if L \geq K, identification fails because the instruments do not sufficiently predict X. Under these conditions, \beta is identified via the population moment condition E[Z'(y - X\beta)] = 0: since E(Z'u) = 0, this yields the system E[Z'y] = E[Z'X]\beta, which determines \beta uniquely when the order and rank conditions hold, enabling consistent estimation.
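
The order and rank conditions can be illustrated with a short numerical check. The sketch below is an added illustration with an assumed first-stage matrix \Pi and simulated data; it verifies L \geq K and that the sample analogue of E(Z'X) has full column rank.

```python
# Sketch: checking the order condition (L >= K) and an empirical analogue of the rank
# condition (Z'X/n has full column rank K) on simulated data. Values are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, L, K = 5_000, 3, 2
Z = rng.normal(size=(n, L))                            # instruments
Pi = np.array([[1.0, 0.5], [0.3, 0.8], [0.0, 0.2]])    # L x K first-stage coefficients
X = Z @ Pi + rng.normal(size=(n, K))                   # endogenous regressors

print("order condition (L >= K):", L >= K)
print("rank condition holds:", np.linalg.matrix_rank(Z.T @ X / n) == K)
```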

Exclusion and Relevance Restrictions

The exclusion restriction and the relevance condition constitute the two core assumptions underlying the validity of instrumental variables (IV) estimation. The relevance condition requires that the instrument Z is sufficiently correlated with the endogenous explanatory variable X, ensuring that Z provides meaningful variation in X for identification purposes. In the case of a single endogenous regressor, this is expressed as \operatorname{Corr}(Z, X) \neq 0; more generally, for multiple instruments and regressors, the matrix E[Z'X] must have full column rank. Violation of relevance leads to weak instruments, where the IV estimator exhibits substantial finite-sample bias and poor inference properties, even in large samples. Instruments are considered strong if the first-stage F-statistic exceeds 10, a rule of thumb indicating adequate explanatory power of Z for X. The exclusion restriction mandates that the instrument Z influences the outcome Y solely through its effect on the endogenous variable X, with no direct pathway from Z to Y. Mathematically, this implies that the partial derivative of Y with respect to Z, holding X fixed, is zero: \frac{\partial Y}{\partial Z} = 0. Equivalently, in the structural equation for Y, the coefficient on Z is zero, as Z is excluded from this equation after accounting for its role via X. Breaches of the exclusion restriction introduce direct confounding, rendering the IV estimator inconsistent by failing to isolate the causal channel through X. Both restrictions must hold jointly to ensure the consistency of the IV estimator, as relevance alone cannot compensate for exclusion violations, and vice versa; their absence results in biased estimates that mimic ordinary least squares inconsistencies. In settings with binary treatment, the exclusion restriction is often supplemented by a monotonicity assumption, which posits that the instrument does not reverse the treatment assignment for any subgroup (i.e., no "defiers" exist), thereby supporting interpretation of the IV estimand without altering the core exclusion requirement.

Graphical and Conceptual Frameworks

Directed Acyclic Graphs for IV

Directed acyclic graphs (DAGs) provide a visual framework for representing causal assumptions in instrumental variables (IV) estimation, where nodes represent variables and directed arrows denote causal influences. These graphs are acyclic, meaning no cycles exist among the arrows, ensuring a clear temporal or causal ordering. Backdoor paths in DAGs illustrate confounding, defined as non-directed paths from the treatment variable to the outcome that pass through common causes, potentially biasing causal estimates if unblocked. In IV applications, a DAG typically depicts the instrument Z causally influencing the endogenous treatment X, which in turn affects the outcome Y, forming the causal chain Z \to X \to Y. Crucially, no direct arrow connects Z to Y (enforcing the exclusion restriction), and Z shares no common unobserved causes with Y or X beyond this chain (ensuring independence from unobservables). This configuration blocks backdoor paths from X to Y—such as those through unobserved confounders—by leveraging Z's exogeneity, allowing identification of the causal effect of X on Y.

A standard illustrative example involves estimating the causal effect of education (X) on wages (Y), confounded by unobserved ability (U). The DAG includes arrows U \to X and U \to Y for the confounding, X \to Y for the treatment effect, and quarter of birth (Z) as the instrument with Z \to X (due to compulsory schooling laws tying school entry to birth quarter), but no arrows from Z to Y or Z to U. This structure isolates the effect of education by exploiting variation in Z that influences schooling without directly impacting wages or ability. The instrument path in a DAG parallels the front-door criterion, where causation is identified via an intermediate variable free of direct confounding, but differs in that Z serves as an external source of variation rather than an observed mediator. D-separation, a graphical criterion, verifies identification by confirming that conditioning on appropriate variables (here, exploiting the instrument's position in the graph) closes all backdoor paths while leaving the causal path open, thus enabling unbiased estimation.

Criteria for Instrument Selection

Selecting valid instruments in instrumental variables (IV) estimation requires satisfying two primary conditions: relevance, where the instrument correlates strongly with the endogenous explanatory variable, and the exclusion restriction, where the instrument affects the outcome only through the endogenous variable. These criteria ensure the instrument provides exogenous variation for causal identification without introducing bias. Economic theory often guides initial selection by identifying variables that plausibly influence the endogenous regressor, such as policy changes or natural experiments that shift behavior without directly impacting outcomes.

Relevance is assessed through both theoretical justification and empirical pre-tests. Theoretically, instruments should stem from mechanisms that credibly affect the endogenous regressor, like exogenous shocks in supply chains influencing firm decisions. Empirically, the first-stage regression tests this via t-statistics on the instrument's coefficient or, preferably, the F-statistic on the excluded instruments, with a common rule of thumb requiring an F-statistic greater than 10 to avoid weak instrument bias. Weak identification occurs when the first-stage relationship is weak, leading to finite-sample biases that inflate standard errors and distort inference, as demonstrated in simulations where low first-stage correlations produced IV estimates deviating substantially from true effects.

Validating the exclusion restriction relies heavily on theoretical and institutional arguments, as it cannot be directly tested from data alone. Instruments should represent exogenous shocks uncorrelated with unobservables affecting the outcome, such as randomized lotteries assigning treatments, ensuring no direct pathway to the dependent variable. Researchers must rule out direct effects through theoretical arguments, for instance, confirming that geographic variation in college proximity influences wages only via education and not through local economic spillovers. Directed acyclic graphs can aid this by visualizing potential pathways, highlighting instruments that block backdoor paths but preserve the desired causal link.

A key trade-off in instrument selection involves the number of instruments: more instruments enhance efficiency by exploiting additional variation, but they increase the risk of including invalid ones, amplifying bias if exclusion fails for even a single instrument. A common guideline for overidentified models is to use one more instrument than endogenous regressors (L = K + 1), allowing an overidentification test while minimizing proliferation risks. Common pitfalls include irrelevant instruments, which weaken identification and mimic OLS biases, and invalid ones correlated with errors, violating exogeneity. For example, using random lottery numbers as an instrument for income effects would fail relevance if they do not correlate with earnings-determining choices, rendering the approach ineffective despite randomness. Such errors underscore the need for rigorous pre-selection scrutiny to balance theoretical plausibility with empirical strength.

Estimation Procedures

Two-Stage Least Squares

Two-stage least squares (2SLS) is a widely used estimator for instrumental variables (IV) models in settings where endogenous regressors require correction for correlation with the error term. It operates by projecting the endogenous variables onto the space spanned by the instruments in a preliminary step, thereby isolating the exogenous variation needed for consistent estimation. The method was independently developed by Theil in 1953 and Basmann in 1957 as a practical approach to estimating parameters in systems of simultaneous equations. Under the standard IV assumptions of relevance and exogeneity of the instruments, 2SLS delivers consistent estimates of the structural parameters.

The procedure consists of two distinct stages. In the first stage, each endogenous regressor in X (an n \times k matrix) is regressed on the matrix of instruments Z (an n \times m matrix, where m \geq k) using ordinary least squares (OLS), yielding the fitted values \hat{X} = Z(Z'Z)^{-1}Z'X. This step purges the endogenous components from X, producing an instrumented version \hat{X} that is asymptotically uncorrelated with the structural error term. In the second stage, the outcome variable y (an n \times 1 vector) is regressed on \hat{X} via OLS, resulting in the 2SLS estimator \hat{\beta}_{2SLS} = (\hat{X}'\hat{X})^{-1}\hat{X}'y.

An equivalent closed-form expression for the 2SLS estimator avoids explicit computation of \hat{X} and is given by \hat{\beta}_{2SLS} = (X'P_Z X)^{-1} X'P_Z y, where P_Z = Z(Z'Z)^{-1}Z' is the projection matrix onto the column space of Z. This formulation ensures numerical stability and is the basis for implementation in statistical software. Manually running the two OLS stages yields the correct point estimates but incorrect standard errors, because the second-stage residuals are computed from \hat{X} rather than X; modern implementations therefore use the closed form, with residuals y - X\hat{\beta}_{2SLS}, to obtain correct inference. In the just-identified case where the number of instruments equals the number of endogenous regressors (m = k), the 2SLS estimator coincides exactly with the simple IV estimator \hat{\beta}_{IV} = (Z'X)^{-1}Z'y and is uniquely determined without projection. More generally, 2SLS is consistent for the true parameters as the sample size grows, provided the instruments satisfy the relevance condition (nonzero correlation with the endogenous regressors) and the exclusion restriction (uncorrelated with the error term). The estimator is asymptotically normal, enabling standard hypothesis testing and confidence intervals under homoskedasticity, though robust variants address heteroskedasticity.
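
A minimal sketch of the closed-form computation follows, using least-squares solves rather than forming the n \times n projection matrix explicitly; the function name and the simulated data are illustrative assumptions, not a reference implementation.

```python
# Sketch of the closed-form 2SLS computation beta = (X' P_Z X)^{-1} X' P_Z y,
# via least-squares solves instead of building P_Z. Names and data are assumptions.
import numpy as np

def two_stage_least_squares(y, X, Z):
    """2SLS with regressors X (n x k) and instruments Z (n x m, m >= k)."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # first stage: P_Z X
    # Second stage; X_hat' X equals X' P_Z X, so this is the closed form above.
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

rng = np.random.default_rng(0)
n = 10_000
Z = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # constant + two instruments
u = rng.normal(size=n)
x = Z[:, 1] + 0.5 * Z[:, 2] + 0.7 * u + rng.normal(size=n)  # endogenous regressor
X = np.column_stack([np.ones(n), x])                        # constant + x
y = 1.0 + 2.0 * x + u
print(two_stage_least_squares(y, X, Z))                     # approx [1.0, 2.0]
```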

Generalized Method of Moments

The generalized method of moments (GMM) serves as a broad framework for instrumental variables (IV) estimation, encompassing and generalizing approaches like two-stage least squares (2SLS) by exploiting moment conditions derived from the orthogonality of instruments to the error term. In the linear IV model y = X \beta + u, where X includes endogenous regressors and the instruments Z satisfy E[Z^T u] = 0, the population moment conditions are E[Z^T (y - X \beta)] = 0. The GMM estimator targets these by minimizing a quadratic form in the sample moments: \hat{\beta}_{\text{GMM}} = \arg\min_{\beta} \left( \frac{1}{n} Z^T (y - X \beta) \right)^T W \left( \frac{1}{n} Z^T (y - X \beta) \right), where n is the sample size and W is a positive definite weighting matrix. This setup allows for flexible estimation when the number of instruments exceeds the number of endogenous regressors, enabling efficiency improvements over simpler methods.

The efficiency of the GMM estimator depends critically on the choice of W; the optimal weighting matrix is the inverse of the asymptotic variance of the sample moments, W = S^{-1}, where S = \operatorname{AsyVar}\left(\sqrt{n} \cdot \frac{1}{n} Z^T u\right). Using this optimal W yields the minimum asymptotic variance among GMM estimators based on these moment conditions, with the limiting distribution given by \sqrt{n} (\hat{\beta}_{\text{GMM}} - \beta) \xrightarrow{d} N(0, (G^T S^{-1} G)^{-1}), where G = E[Z^T X] is the expected cross-moment of instruments and regressors. In practice, a two-step procedure estimates S from first-step residuals before applying the optimal W, or iterative methods refine it further for efficiency under general error structures.

In relation to 2SLS, the latter emerges as a special case of GMM when W is proportional to (Z^T Z)^{-1}, the weighting that is optimal under homoskedastic errors; in just-identified models (where the number of instruments equals the number of endogenous variables), the choice of W is immaterial because the sample moments can be set exactly to zero. For overidentified models with heteroskedastic errors, the optimal GMM weighting enhances efficiency by downweighting less informative moments, reducing the asymptotic variance relative to 2SLS. This connection highlights GMM's role in unifying IV procedures under a method-of-moments framework.

GMM extends naturally to non-linear IV models by replacing linear projections with general moment conditions E[g(Z, X, y; \beta)] = 0, allowing estimation of parameters in systems where relationships are non-linear in \beta. Additionally, heteroskedasticity-robust variants estimate S using kernel or cluster methods to account for error variances that depend on covariates or on groupings of observations, ensuring valid inference without strong distributional assumptions.
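
The two-step procedure can be sketched compactly for the linear case. The code below is an illustrative sketch—function names and the heteroskedastic data-generating process are assumptions—that starts from the 2SLS weighting, estimates S from first-step residuals, and re-solves with the optimal weighting.

```python
# Sketch: two-step (efficient) GMM for the linear IV model with a
# heteroskedasticity-robust weighting matrix. Names and the DGP are assumptions.
import numpy as np

def linear_iv_gmm(y, X, Z):
    n = len(y)
    A = X.T @ Z / n                                  # k x L matrix of cross-moments
    b = Z.T @ y / n                                  # L-vector of moments with y
    def solve(W):                                    # beta = (A W A')^{-1} A W b
        M = A @ W
        return np.linalg.solve(M @ A.T, M @ b)
    beta1 = solve(np.linalg.inv(Z.T @ Z / n))        # step 1: 2SLS weighting
    u = y - X @ beta1                                # first-step residuals
    S = (Z * u[:, None] ** 2).T @ Z / n              # robust estimate of Var(Z'u)
    return solve(np.linalg.inv(S))                   # step 2: optimal weighting W = S^{-1}

rng = np.random.default_rng(3)
n = 20_000
Z = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
e = rng.normal(size=n) * (1.0 + 0.5 * np.abs(Z[:, 2]))       # heteroskedastic error
x = Z[:, 1] + 0.5 * Z[:, 2] + 0.6 * e + rng.normal(size=n)   # endogenous regressor
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + e
print(linear_iv_gmm(y, X, Z))                        # approx [1.0, 2.0]
```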

Interpretation and Extensions

As a Predictor Under Homogeneity

Under the homogeneity assumption in instrumental variables (IV) estimation, the treatment effect parameter \beta is constant across all units in the population, implying a uniform causal impact of the endogenous regressor X on the outcome Y in the structural equation Y = \mu + X\beta + \varepsilon. This assumption aligns with the classical linear model in which treatment effects do not vary by individual characteristics or compliance status, enabling straightforward identification of the causal effect as the structural parameter \beta itself. Under the orthogonality condition E[Z\varepsilon] = 0, the IV estimator can be interpreted as delivering the best linear prediction of Y from X once the endogenous component of X has been purged using the instrument Z: it minimizes the expected squared prediction error E[(Y - X\beta)^2] within the span of the variation induced by the instruments. This projection interpretation implies that the estimator is consistent for \beta when the model is correctly specified.

In population terms, the IV coefficient \beta satisfies the moment condition E[Z(Y - X\beta)] = 0, which, under homogeneity, directly recovers the constant structural effect without bias from correlation between X and \varepsilon. In the simple case with a single endogenous regressor and instrument, the Wald estimand is \beta_{IV} = \frac{\text{Cov}(Y, Z)}{\text{Cov}(X, Z)}, and under homogeneity, this equals the structural \beta, representing the average causal effect across the entire population. Unlike ordinary least squares (OLS), which yields an inconsistent estimate with \operatorname{plim} \hat{\beta}_{OLS} = \beta + \frac{\text{Cov}(X, \varepsilon)}{\text{Var}(X)} due to endogeneity (\text{Cov}(X, \varepsilon) \neq 0), the IV estimator is consistent for the effect provided the instrument satisfies the relevance (\text{Cov}(X, Z) \neq 0) and exclusion (\text{Cov}(Z, \varepsilon) = 0) restrictions. Thus, in the population, \beta_{IV} = \beta holds if exclusion, relevance, and homogeneity are satisfied, as the IV projection aligns exactly with the linear structural form: \beta = (E[Z'X])^{-1} E[Z'Y]. This equivalence underscores IV's role in delivering the true constant effect, free from the endogeneity bias that plagues OLS.

Local Average Treatment Effects

When treatment effects are heterogeneous, meaning the causal effect of the treatment on the outcome varies across individuals (i.e., \beta_i differs by unit i), the instrumental variables (IV) estimand does not identify the average treatment effect (ATE) for the entire population. Instead, under specific assumptions, it identifies the local average treatment effect (LATE) for a subpopulation known as compliers—those individuals whose treatment status changes in response to the instrument Z. This contrasts with the homogeneous effects case, where IV recovers a uniform effect that applies to all units.

The LATE is formally defined in the potential outcomes framework as the average treatment effect for compliers: \text{LATE} = \mathbb{E}[y_1 - y_0 \mid D_1 > D_0], where D denotes treatment receipt (with potential treatment statuses D_1 and D_0 under Z=1 and Z=0, respectively), and y_1, y_0 are the potential outcomes under treatment and no treatment. Compliers are precisely those with D_1 > D_0, meaning they receive treatment when assigned by the instrument but not otherwise.

Identification of the LATE requires the monotonicity assumption, which rules out defiers (individuals with D_1 < D_0), ensuring that the instrument affects treatment status in only one direction. Under monotonicity, along with the standard exclusion restriction (the instrument affects the outcome only through treatment) and instrument exogeneity (the instrument is randomly assigned or as good as random), the IV estimand (Wald estimator) equals the LATE: \beta_{IV} = \frac{\mathbb{E}[Y \mid Z=1] - \mathbb{E}[Y \mid Z=0]}{\Pi} = \text{LATE}, where \Pi = \mathbb{E}[D \mid Z=1] - \mathbb{E}[D \mid Z=0] is the average change in treatment probability induced by the instrument (the first-stage effect, equal to the proportion of compliers). This result is encapsulated in the Imbens-Angrist theorem, which establishes that for binary Z and binary D, the Wald estimand—\frac{\mathbb{E}[Y \mid Z=1] - \mathbb{E}[Y \mid Z=0]}{\mathbb{E}[D \mid Z=1] - \mathbb{E}[D \mid Z=0]}—precisely recovers the LATE for compliers.

Despite its rigor, the LATE framework has key limitations: it identifies effects only for compliers, who may not represent the broader population, so the LATE differs from the population ATE and cannot be straightforwardly extrapolated to other subgroups like always-takers or never-takers. This subpopulation specificity can complicate policy interpretations, as the complier proportion \Pi often varies across contexts, limiting generalizability.
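
The Wald/LATE logic can be made concrete with a small simulation in which always-takers, never-takers, and compliers have different treatment effects; the population shares and effect sizes below are assumptions chosen for illustration. In this setup the Wald ratio recovers the complier effect (2.0) rather than the population ATE (1.5).

```python
# Sketch: with a binary instrument and binary treatment, the Wald ratio recovers the
# complier average effect (LATE), not the population ATE. Shares/effects are assumptions.
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
z = rng.integers(0, 2, size=n)                          # randomized binary instrument
types = rng.choice(["always", "never", "complier"], size=n, p=[0.2, 0.3, 0.5])
d = np.where(types == "always", 1, np.where(types == "never", 0, z))  # monotonic: no defiers
effect = np.where(types == "complier", 2.0, 1.0)        # heterogeneous treatment effects
y = effect * d + rng.normal(size=n)                     # outcome

first_stage = d[z == 1].mean() - d[z == 0].mean()       # complier share, approx 0.5
reduced_form = y[z == 1].mean() - y[z == 0].mean()      # intention-to-treat effect
print(reduced_form / first_stage)                       # approx 2.0 = LATE (population ATE = 1.5)
```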

Diagnostic Challenges

Detecting Weak Instruments

Weak instruments arise when the correlation between the instrumental variables Z and the endogenous regressor X is low, resulting in poor explanatory power in the first-stage regression. This weakness undermines the relevance condition required for consistent IV estimation, causing the two-stage least squares (2SLS) estimator to exhibit substantial finite-sample bias toward the ordinary least squares (OLS) estimator, which is itself inconsistent due to endogeneity. A useful approximation, popularized by Staiger and Stock, is that the bias of 2SLS relative to the inconsistency of OLS is roughly 1/(E[F] + 1), where F is the first-stage F-statistic for the excluded instruments; an expected F of about 10 therefore corresponds to a relative bias on the order of 10 percent. The bias grows as instrument strength, proxied by F, decreases, highlighting how even moderately weak instruments can dominate the estimation error in applied settings.

Detection of weak instruments typically relies on the first-stage F-statistic from the regression of X on Z; a common rule of thumb deems instruments weak if F < 10. In overidentified cases with multiple instruments, the Cragg-Donald Wald F-statistic provides a more refined test, where critical values tabulated by Stock and Yogo allow assessment of instrument strength based on criteria such as the maximal bias of 2SLS relative to OLS or size distortions in t-tests. The consequences of weak instruments extend beyond bias to inflated variance in the IV estimator, which amplifies uncertainty and leads to invalid confidence intervals. Monte Carlo simulations illustrate severe size distortions in hypothesis testing, with actual rejection probabilities under the null often exceeding nominal levels by factors of 2–10 when F is low, rendering standard inference unreliable. To mitigate these issues, alternatives like limited information maximum likelihood (LIML) are recommended, as they exhibit reduced bias and better finite-sample properties under weak instrument conditions.
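
A small Monte Carlo sketch illustrates both the diagnostic and the problem: with a strong first stage the median first-stage F is large and the median IV estimate centers on the truth, while with a weak first stage the median F falls below 10 and the median estimate drifts toward the OLS probability limit. The data-generating process and parameter values are assumptions for illustration.

```python
# Sketch: first-stage F as a weak-instrument diagnostic, and the drift of the IV
# estimate toward OLS when the first stage is weak. DGP values are assumptions.
import numpy as np

rng = np.random.default_rng(5)

def one_draw(pi, n=500, beta=1.0):
    z = rng.normal(size=n)
    u = rng.normal(size=n)
    x = pi * z + 0.8 * u + rng.normal(size=n)          # Cov(x, u) = 0.8, so OLS plim > beta
    y = beta * x + u
    Z = np.column_stack([np.ones(n), z])               # first-stage regressors
    coef, *_ = np.linalg.lstsq(Z, x, rcond=None)
    resid = x - Z @ coef
    var_pi = (resid @ resid / (n - 2)) * np.linalg.inv(Z.T @ Z)[1, 1]
    F = coef[1] ** 2 / var_pi                          # with one instrument, F = t^2
    beta_iv = np.cov(z, y, ddof=1)[0, 1] / np.cov(z, x, ddof=1)[0, 1]
    return F, beta_iv

for pi in (1.0, 0.05):                                 # strong vs. weak instrument
    draws = np.array([one_draw(pi) for _ in range(2000)])
    print(f"pi={pi}: median F = {np.median(draws[:, 0]):.1f}, "
          f"median IV estimate = {np.median(draws[:, 1]):.2f}")
```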

Validating Exclusion Restrictions

Validating the exclusion restriction in instrumental variables (IV) estimation is inherently difficult, as the restriction posits that instruments affect the outcome only through their effect on the endogenous explanatory variables. While this assumption cannot be directly tested in just-identified models, overidentified systems allow for diagnostic tests that assess whether all instruments are orthogonal to the error term. These tests leverage the additional moments provided by extra instruments to evaluate the overall validity of the exclusion restrictions.

The primary overidentification test is the Sargan test, originally proposed for limited information maximum likelihood estimation, and its extension, the Hansen J-test, which applies under heteroskedasticity-robust conditions in the generalized method of moments (GMM) framework. Under homoskedasticity, the Sargan form of the statistic is J = \frac{\hat{u}^\top P_Z \hat{u}}{\hat{u}^\top \hat{u} / n}, where \hat{u} denotes the residuals from the IV-estimated structural equation, P_Z = Z (Z^\top Z)^{-1} Z^\top is the projection matrix onto the instrument space Z, and n is the sample size; equivalently, it equals n R^2 from an auxiliary regression of \hat{u} on Z. The Hansen J-statistic generalizes this by weighting the sample moments Z^\top \hat{u}/n with a heteroskedasticity-robust estimate of their variance. Under the null hypothesis that all instruments are valid—satisfying both the exclusion restriction and exogeneity—the statistic follows a \chi^2 distribution with degrees of freedom L - K, where L is the number of instruments and K is the number of endogenous regressors. A rejection of the null suggests that at least one instrument violates the exclusion restriction or is correlated with the errors, though the test lacks power to identify which specific instrument is invalid. The test gains power against violations when multiple instruments are present, as invalid ones can systematically correlate with residuals.

Complementing overidentification tests, the Kleibergen-Paap statistic addresses underidentification, which indirectly supports exclusion validation by confirming instrument relevance as part of overall IV validity. This LM test statistic, based on the rank of the first-stage coefficient matrix, tests the null hypothesis that the rank of the matrix of reduced-form coefficients on the instruments is less than that required for identification; under the null, it follows a \chi^2 distribution with the appropriate degrees of freedom and is robust to heteroskedasticity and clustering. Failure to reject underidentification indicates weak or irrelevant instruments, undermining the ability to credibly assess exclusion.

Placebo tests offer a falsification approach to probe the exclusion restriction by estimating IV effects on "irrelevant" or placebo outcomes that should not be affected by the instrument or the treatment under valid exclusion. For instance, one might use the instrument to predict outcomes like pre-treatment variables or unrelated proxies, expecting null effects; significant effects suggest violation of exclusion, as the instrument influences the placebo outcome directly or through unmodeled channels. These tests provide indirect evidence but rely on the researcher's judgment in selecting appropriate placebo outcomes.

Despite their utility, these validation methods have limitations. Overidentification tests like the J-statistic exhibit low power when the number of overidentifying restrictions (L - K) is small, making it difficult to detect subtle violations, and they are inapplicable in just-identified models where no overidentifying moments exist for testing. Placebo tests, while intuitive, can suffer from specification issues if placebos are poorly chosen, and they do not constitute formal statistical tests of exclusion.
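
A sketch of the overidentification test under homoskedasticity follows, computing the Sargan form of the statistic as n times the uncentered R^2 of the 2SLS residuals on the instruments; the helper function and the simulated data are illustrative assumptions rather than a canonical implementation.

```python
# Sketch: Sargan overidentification statistic, computed as n times the uncentered R^2
# of the 2SLS residuals on the instruments. Function and data are illustrative assumptions.
import numpy as np
from scipy import stats

def sargan_test(y, X, Z):
    n = len(y)
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]       # first-stage fitted values
    beta = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)       # 2SLS estimate
    u = y - X @ beta                                       # structural residuals
    u_proj = Z @ np.linalg.lstsq(Z, u, rcond=None)[0]      # P_Z u
    J = n * (u_proj @ u_proj) / (u @ u)                    # n * uncentered R^2
    df = Z.shape[1] - X.shape[1]                           # L - K overidentifying restrictions
    return J, stats.chi2.sf(J, df)

rng = np.random.default_rng(11)
n = 10_000
Z = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # L = 3 instruments (incl. constant)
u = rng.normal(size=n)
x = Z[:, 1] + 0.5 * Z[:, 2] + 0.7 * u + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])                        # K = 2 regressors
y = 1.0 + 2.0 * x + u
print(sargan_test(y, X, Z))     # large p-value expected: both excluded instruments are valid
```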

Statistical Inference

Asymptotic Properties

Under the standard assumptions of instrumental variables estimation—namely, that the instruments are exogenous (uncorrelated with the structural error term), relevant (correlated with the endogenous regressors), and that the data satisfy linearity and random sampling conditions—the IV estimator is consistent. Specifically, as the sample size n approaches infinity, the probability limit of the IV estimator satisfies \operatorname{plim} \hat{\beta}_{\text{IV}} = \beta, where \beta is the true parameter vector.

The IV estimator is also asymptotically normal. Under homoskedasticity (where the error variance is constant conditional on the instruments), the normalized estimator converges in distribution to \sqrt{n} (\hat{\beta}_{\text{IV}} - \beta) \xrightarrow{d} N(0, V). For the just-identified case with exactly as many instruments as endogenous regressors, the asymptotic covariance matrix is V = (\Pi' E[ZZ'] \Pi)^{-1} \sigma^2_{\varepsilon}, where \Pi denotes the first-stage coefficients from the projection of the endogenous variables onto the instruments, Z is the matrix of instruments, and \sigma^2_{\varepsilon} is the variance of the structural error. In overidentified models, where the number of instruments exceeds the number of endogenous regressors (as in two-stage least squares estimation), the asymptotic covariance matrix takes the form V = \sigma^2_{\varepsilon} (E[X' P_Z X / n])^{-1}, with P_Z = Z(Z'Z)^{-1}Z' the projection matrix onto the instrument space and X the regressors; this form reflects greater efficiency compared to using only a just-identifying subset of the instruments, as additional valid instruments strengthen the projection and reduce the variance.

When errors are heteroskedastic, the homoskedasticity-based variance formula fails, but consistency and asymptotic normality continue to hold provided the relevant moments of the errors are finite; the covariance matrix, however, requires a robust adjustment. The heteroskedasticity-consistent asymptotic covariance matrix is V = (E[X' P_Z X / n])^{-1} \left( E[X' P_Z \Omega P_Z X / n] \right) (E[X' P_Z X / n])^{-1}, where \Omega is the conditional covariance matrix of the errors, estimated in practice by \operatorname{diag}(\hat{\varepsilon}_1^2, \ldots, \hat{\varepsilon}_n^2). This sandwich form accounts for conditional heteroskedasticity in the errors.

The asymptotic variance V explicitly depends on the strength of the first-stage relationship, captured by \Pi. If the instruments are weak such that \Pi \to 0, then V \to \infty, implying that the IV estimator becomes highly imprecise even in large samples.
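
The heteroskedasticity-robust covariance can be computed directly from its sample analogue. The sketch below is an added illustration—function name, data-generating process, and usage are assumptions—forming the sandwich estimator for 2SLS from the fitted first-stage values and squared 2SLS residuals.

```python
# Sketch: heteroskedasticity-robust (sandwich) covariance for the 2SLS estimator,
# mirroring the asymptotic form above. Names and the example data are assumptions.
import numpy as np

def tsls_with_robust_se(y, X, Z):
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # P_Z X
    beta = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)   # 2SLS coefficients
    u = y - X @ beta                                   # 2SLS residuals (use X, not X_hat)
    bread = np.linalg.inv(X_hat.T @ X)                 # (X' P_Z X)^{-1}
    meat = (X_hat * u[:, None] ** 2).T @ X_hat         # sum_i u_i^2 xhat_i xhat_i'
    V = bread @ meat @ bread                           # robust covariance of beta_hat
    return beta, np.sqrt(np.diag(V))

rng = np.random.default_rng(2)
n = 20_000
Z = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
e = rng.normal(size=n) * (1.0 + 0.7 * np.abs(Z[:, 1]))       # heteroskedastic error
x = Z[:, 1] + 0.5 * Z[:, 2] + 0.6 * e + rng.normal(size=n)   # endogenous regressor
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + e
print(tsls_with_robust_se(y, X, Z))   # estimates near [1.0, 2.0] with robust std. errors
```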

Hypothesis Testing Frameworks

In instrumental variables (IV) estimation, hypothesis testing typically relies on the asymptotic normality of the estimator \hat{\beta}, where standard errors are derived from the asymptotic variance-covariance matrix V, yielding \text{se}(\hat{\beta}) = \sqrt{V/n} for sample size n in the scalar case. The t-statistic for testing H_0: \beta = \beta_0 is then computed as t = (\hat{\beta} - \beta_0)/\text{se}(\hat{\beta}), which under the null follows a standard normal distribution asymptotically. Confidence intervals for \beta are constructed as \hat{\beta} \pm t_{\text{crit}} \cdot \text{se}(\hat{\beta}), where t_{\text{crit}} is the critical value from the t-distribution with degrees of freedom adjusted for the number of estimated parameters; the variance estimate itself must use the structural residuals y - X\hat{\beta} rather than the second-stage residuals to ensure consistent variance estimation.

For testing joint hypotheses, such as H_0: R\beta = 0 for q linear restrictions on the coefficients, the Wald test is employed, forming the statistic W = n \, (R\hat{\beta})^{\top} (R V R^{\top})^{-1} (R\hat{\beta}), which is asymptotically \chi^2_q distributed under the null, where R is the matrix imposing the restrictions. Divided by the number of restrictions, this yields an F-statistic variant that is particularly useful in overidentified models for assessing the overall relevance of instruments or the joint significance of regressors.

When instruments are weak, conventional t-tests and confidence intervals suffer from size distortions and poor coverage; the Anderson-Rubin (AR) test addresses this by testing H_0: \beta = \beta_0 via the statistic A_T(\beta_0) = \frac{(y - X\beta_0)' P_Z (y - X\beta_0) / K}{(y - X\beta_0)' M_Z (y - X\beta_0) / (n - K)}, where P_Z = Z(Z'Z)^{-1}Z' is the projection onto the instruments Z with K columns, and M_Z = I - P_Z; the statistic follows a \chi^2_K / K distribution asymptotically under the null regardless of instrument strength. The corresponding AR confidence set inverts this test for inference that is robust to weak instruments.

For small samples or non-normal errors, bootstrap methods provide an alternative for inference, such as the percentile bootstrap, which resamples residuals from the IV model to generate empirical distributions of \hat{\beta}; the 95% interval is then given by the 2.5th to 97.5th percentiles of the bootstrapped estimates, offering improved finite-sample performance over asymptotic approximations.
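
The Anderson-Rubin statistic can be computed directly from the formula above. The sketch below is an added illustration with an assumed data-generating process (a deliberately weak first stage) and a hypothesized value \beta_0 equal to the true coefficient; the p-value uses the asymptotic \chi^2_K / K null distribution.

```python
# Sketch: Anderson-Rubin test of H0: beta = beta0, valid even with weak instruments.
# The DGP and the tested value beta0 below are illustrative assumptions.
import numpy as np
from scipy import stats

def anderson_rubin(y, X, Z, beta0):
    n, K = Z.shape                                        # K = number of instruments
    e = y - X @ beta0                                     # residuals imposed under H0
    e_proj = Z @ np.linalg.lstsq(Z, e, rcond=None)[0]     # P_Z e
    num = (e_proj @ e_proj) / K                           # e' P_Z e / K
    den = (e @ e - e_proj @ e_proj) / (n - K)             # e' M_Z e / (n - K)
    ar = num / den
    return ar, stats.chi2.sf(K * ar, K)                   # p-value from the chi2_K / K limit

rng = np.random.default_rng(9)
n = 5_000
Z = rng.normal(size=(n, 2))                               # two instruments
u = rng.normal(size=n)
x = 0.1 * Z[:, 0] + 0.8 * u + rng.normal(size=n)          # deliberately weak first stage
y = 2.0 * x + u
print(anderson_rubin(y, X=x[:, None], Z=Z, beta0=np.array([2.0])))  # large p-value expected
```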

References

  1. Instrumental Variables
  2. Instrumental Variables Estimation — Colin Cameron
  3. 1 Instrumental Variables — LSE Economics Department
  4. Endogeneity in Empirical Corporate Finance
  5. Instrumental Variables and Two Stage Least Squares
  6. Section 11: Endogenous Regressors and Instrumental Variables
  7. Does Compulsory School Attendance Affect Schooling and Earnings?
  8. Using Geographic Variation in College Proximity to Estimate the ...
  9. 4.8 Instrumental Variables
  10. Experimental Estimates of Education Production Functions — Alan B. ...
  11. The Tariff on Animal and Vegetable Oils — Philip G. Wright
  12. The Tariff on Animal and Vegetable Oils — DSpace@GIPE
  13. The Probability Approach in Econometrics — Trygve Haavelmo
  14. Identification Problems in Economic Model Construction
  15. Studies in Econometric Method
  16. Estimation of the Parameters of a Single Equation in a Complete ...
  17. A Generalized Classical Method of Linear Estimation of Coefficients ...
  18. Some Developments of Economic Thought in the Netherlands
  19. Estimation of Regression Relationships Containing Unobservable ...
  20. Efficient Estimation of Simultaneous Equations by Instrumental ... — JSTOR
  21. Instrumental Variables in Statistics and Econometrics
  22. Microeconometrics, Chapter 5: Instrumental Variables Estimation
  23. Instrumental Variables Regression with Weak Instruments
  24. Testing for Weak Instruments in Linear IV Regression
  25. Identification of Causal Effects Using Instrumental Variables
  26. Understanding the Assumptions Underlying Instrumental Variable ...
  27. Identification and Estimation of Local Average Treatment Effects ...
  28. Causality — Cambridge University Press & Assessment
  29. 7 Instrumental Variables — Causal Inference: The Mixtape
  32. Instrumental Variables and Two Stage Least Squares
  33. A Two-Stage Penalized Least Squares Method for Constructing ...
  34. Lecture 8: Instrumental Variables — Bauer College of Business
  35. Lecture: IV and 2SLS Estimators (Wooldridge's book chapter 15)
  36. Instrumental Variables, 2SLS and GMM
  37. Instrumental Variables — Purdue University
  38. Large Sample Properties of Generalized Method of Moments — JSTOR
  39. Identification of Causal Effects Using Instrumental Variables
  40. Endogenous Regressors and Instrumental Variables
  41. Identification and Estimation of Local Average Treatment Effects — JSTOR
  42. Identification and Estimation of Local Average Treatment Effects
  43. Instrumental Variables Regression with Weak Instruments — JSTOR
  44. ... the Endogenous Explanatory Variable Is Weak — JSTOR
  45. Instrumental Variables — Kurt Schmidheiny
  46. Instrumental Variables Estimation and Two Stage Least Squares
  47. Instrumental Variables Regression with Weak Instruments
  48. Bootstrap Inference in a Linear Equation Estimated by Instrumental ...