Frisch–Waugh–Lovell theorem
The Frisch–Waugh–Lovell theorem, often abbreviated as the FWL theorem, is a fundamental result in econometrics that establishes the equivalence between the ordinary least squares (OLS) coefficients for a subset of regressors in a multiple linear regression model and the coefficients obtained from a simpler auxiliary regression involving residuals from projections onto the complementary set of regressors.[1][2] Specifically, for a model y = X_1 \beta_1 + X_2 \beta_2 + \epsilon, where X = [X_1 \, X_2] partitions the regressors, the estimates of \beta_2 are identical to those from regressing the residuals of y orthogonalized against X_1 on the residuals of X_2 orthogonalized against X_1.[3] This decomposition highlights the partialling-out interpretation of OLS, where the effect of X_2 is isolated after controlling for X_1, and it holds under standard assumptions of linear regression, including no perfect multicollinearity within subsets.[4]

Named after economists Ragnar Frisch, Frederick V. Waugh, and Michael C. Lovell, the theorem originated in Frisch and Waugh's 1933 paper, which applied it to handling linear time trends in time-series data by comparing "partial time regressions" to "individual trends," using determinants and Cramer's rule to prove the result for cases with a single variable in one subset.[1] Lovell extended and generalized the theorem in 1963, employing matrix algebra and projection operators to cover arbitrary partitions of multiple regressors in both subsets, while also addressing the equivalence of residual vectors and covariance matrix adjustments under homoskedasticity.[2] Although earlier work by G. Udny Yule in 1907 anticipated aspects of the result using algebraic methods for bivariate partial regressions, the FWL formulation became canonical in modern econometrics.[4]

The theorem's implications extend beyond estimation to computational efficiency, theoretical proofs, and extensions in generalized linear models, instrumental variables, and high-dimensional settings.[5] For instance, it underpins the use of orthogonalization in software implementations of OLS to avoid numerical instability from correlated regressors and facilitates understanding omitted variable bias or marginal effects in controlled regressions.[3] Recent developments, such as Ding's 2021 extension to heteroskedasticity-consistent covariance matrices, affirm its robustness while requiring adjustments for inference in non-iid error settings.[6] Overall, the FWL theorem remains a cornerstone for interpreting multivariate relationships, emphasizing that multiple regression coefficients capture isolated partial associations.
Overview
Theorem Statement
The Frisch–Waugh–Lovell theorem states that, in a multiple linear regression model, the ordinary least squares (OLS) estimates of the coefficients on a subset of the regressors (the included regressors) are numerically identical to the OLS estimates from a simpler auxiliary regression: the residuals from regressing the dependent variable on the excluded regressors are regressed on the residuals from regressing each of the included regressors on the excluded regressors.[7][8] Consider the linear model

\mathbf{Y} = \mathbf{X} \boldsymbol{\beta} + \mathbf{Z} \boldsymbol{\gamma} + \boldsymbol{\varepsilon},
where \mathbf{Y} is an n \times 1 vector of observations on the dependent variable, \mathbf{X} is an n \times k full-rank matrix of included regressors, \mathbf{Z} is an n \times m full-rank matrix of excluded regressors, \boldsymbol{\beta} is a k \times 1 vector of coefficients on the included regressors, \boldsymbol{\gamma} is an m \times 1 vector of coefficients on the excluded regressors, and \boldsymbol{\varepsilon} is an n \times 1 vector of errors. The theorem asserts that the OLS estimator \hat{\boldsymbol{\beta}} from the full model equals the OLS estimator from the auxiliary regression
\tilde{\mathbf{Y}} = \tilde{\mathbf{X}} \hat{\boldsymbol{\beta}} + \tilde{\boldsymbol{\varepsilon}},
or equivalently,
\hat{\boldsymbol{\beta}} = (\tilde{\mathbf{X}}' \tilde{\mathbf{X}})^{-1} \tilde{\mathbf{X}}' \tilde{\mathbf{Y}},
where \tilde{\mathbf{X}} = (\mathbf{I} - \mathbf{P}_{\mathbf{Z}}) \mathbf{X}, \tilde{\mathbf{Y}} = (\mathbf{I} - \mathbf{P}_{\mathbf{Z}}) \mathbf{Y}, and \mathbf{P}_{\mathbf{Z}} = \mathbf{Z} (\mathbf{Z}' \mathbf{Z})^{-1} \mathbf{Z}' is the projection matrix onto the column space of \mathbf{Z}.[7][8] The theorem holds under the assumption that the matrix [\mathbf{X} \ \mathbf{Z}] has full column rank (ensuring no perfect multicollinearity and that the projection matrices are well-defined).[7][8]
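The numerical equivalence asserted above can be checked directly on simulated data. The following sketch is illustrative only (the data-generating process and variable names such as beta_full are made up for the example); it uses NumPy to compare the coefficients on \mathbf{X} from the full regression of \mathbf{Y} on [\mathbf{X} \ \mathbf{Z}] with those from regressing (\mathbf{I} - \mathbf{P}_{\mathbf{Z}})\mathbf{Y} on (\mathbf{I} - \mathbf{P}_{\mathbf{Z}})\mathbf{X}.

import numpy as np

rng = np.random.default_rng(0)
n, k, m = 200, 2, 3

Z = rng.normal(size=(n, m))                                  # excluded regressors
X = Z @ rng.normal(size=(m, k)) + rng.normal(size=(n, k))    # included regressors, correlated with Z
Y = X @ np.array([1.5, -0.7]) + Z @ np.array([0.3, 0.0, 2.0]) + rng.normal(size=n)

# Full regression of Y on [X Z]; the first k coefficients belong to X
W = np.hstack([X, Z])
beta_full = np.linalg.lstsq(W, Y, rcond=None)[0][:k]

# Residualize Y and X on Z, then regress residuals on residuals
P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)                      # projection onto the column space of Z
Y_tilde = Y - P_Z @ Y
X_tilde = X - P_Z @ X
beta_fwl = np.linalg.lstsq(X_tilde, Y_tilde, rcond=None)[0]

print(np.allclose(beta_full, beta_fwl))                      # True: the two estimates coincide

Because \tilde{\mathbf{X}} is orthogonal to \mathbf{Z}, the auxiliary regression recovers exactly the coefficients that the full model assigns to \mathbf{X}.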
Motivation and Importance
The Frisch–Waugh–Lovell theorem emerged in the pre-computer era to address the computational challenges of multivariate regression analysis, particularly in econometrics where isolating the effect of specific variables amid correlated predictors was labor-intensive. Ragnar Frisch and Frederick V. Waugh developed the core idea in 1933 to resolve debates over de-trending time series data—whether to remove trends from individual variables separately or include time as a covariate in the full model—demonstrating their equivalence and simplifying manual calculations by breaking complex regressions into simpler auxiliary steps.[1] This approach reduced the burden of solving full normal equations by hand, which required inverting large matrices without mechanical aids, allowing analysts to focus on subsets of variables without re-estimating the entire model.[9]

The theorem's importance lies in its facilitation of hypothesis testing on specific coefficients while controlling for others, enabling researchers to assess the isolated impact of variables of interest in large, multicollinear datasets. It underpins the concept of partial correlations by showing that ordinary least squares coefficients represent effects net of other regressors, which is crucial for diagnosing omitted variable bias—where excluding key controls distorts estimates—and for efficient model specification in empirical work.[10] Michael Lovell's generalization in 1963 extended this to broader multiple regression contexts, including seasonal adjustments, reinforcing its role as a foundational tool for interpreting partial effects and supporting computational strategies in modern statistical software.[10] In large datasets, it promotes efficiency by avoiding redundant full-model estimations, thus streamlining workflows in fields like economics and beyond.[9]

To illustrate the coefficient invariance conceptually, consider a simple three-variable linear model y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon, where interest lies in \beta_1. The theorem implies that \beta_1 remains unchanged whether obtained directly from the full regression or by first regressing both y and x_1 on x_2 to obtain residuals \tilde{y} and \tilde{x}_1, then regressing \tilde{y} on \tilde{x}_1; for a dataset with, say, 100 observations where x_1 (e.g., income) and x_2 (e.g., education) are correlated, this partialling-out yields the same \beta_1 (e.g., the marginal effect of income on consumption, net of education), highlighting the theorem's utility in isolating effects without altering results.[1]
Mathematical Formulation
Linear Regression Model
The multiple linear regression model underlying the Frisch–Waugh–Lovell theorem is specified as \mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\gamma} + \boldsymbol{\varepsilon}, where \mathbf{Y} is an n \times 1 vector of the dependent variable, \mathbf{X} is an n \times k matrix of regressors of interest, \mathbf{Z} is an n \times m matrix of control regressors, \boldsymbol{\beta} is a k \times 1 vector of unknown parameters, \boldsymbol{\gamma} is an m \times 1 vector of unknown parameters, and \boldsymbol{\varepsilon} is an n \times 1 error vector satisfying the conditional moment condition \mathbb{E}(\boldsymbol{\varepsilon} \mid \mathbf{X}, \mathbf{Z}) = \mathbf{0} and the conditional variance \mathrm{Var}(\boldsymbol{\varepsilon} \mid \mathbf{X}, \mathbf{Z}) = \sigma^2 \mathbf{I}_n, with \sigma^2 > 0. This setup assumes strict exogeneity of the errors with respect to both sets of regressors, spherical errors (homoskedasticity and no autocorrelation), and no perfect multicollinearity among the columns of [\mathbf{X} \ \mathbf{Z}].

The ordinary least squares (OLS) estimator for \boldsymbol{\beta} in this full model takes the partitioned form \hat{\boldsymbol{\beta}} = (\mathbf{X}' \mathbf{M}_{\mathbf{Z}} \mathbf{X})^{-1} \mathbf{X}' \mathbf{M}_{\mathbf{Z}} \mathbf{Y}, where \mathbf{M}_{\mathbf{Z}} = \mathbf{I}_n - \mathbf{Z}(\mathbf{Z}' \mathbf{Z})^{-1} \mathbf{Z}' is the symmetric, idempotent residual maker matrix that projects orthogonally onto the orthogonal complement of the column space of \mathbf{Z}. This expression arises from the block inversion of the full OLS formula \hat{\boldsymbol{\theta}} = ([\mathbf{X} \ \mathbf{Z}]' [\mathbf{X} \ \mathbf{Z}])^{-1} [\mathbf{X} \ \mathbf{Z}]' \mathbf{Y}, where \boldsymbol{\theta} = [\boldsymbol{\beta}' \ \boldsymbol{\gamma}']', isolating the coefficients on \mathbf{X} after accounting for \mathbf{Z}.

Under the classical assumptions of the linear model—linearity in parameters, conditional mean independence, homoskedasticity, no serial correlation in errors, and full rank of the regressor matrix—the OLS estimator \hat{\boldsymbol{\beta}} is unbiased, meaning \mathbb{E}(\hat{\boldsymbol{\beta}} \mid \mathbf{X}, \mathbf{Z}) = \boldsymbol{\beta}, and it achieves the minimum variance among linear unbiased estimators (the BLUE property, via the Gauss–Markov theorem). For consistency, as the sample size n \to \infty, \hat{\boldsymbol{\beta}} converges in probability to \boldsymbol{\beta} provided the assumptions hold asymptotically, including bounded moments and sufficiently weak dependence in the data-generating process. These properties ensure that \hat{\boldsymbol{\beta}} provides a reliable basis for inference on the effects of interest after controlling for \mathbf{Z}.
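As a check of the partitioned form, the following sketch (illustrative only; the data-generating process is invented for the example) computes \hat{\boldsymbol{\beta}} = (\mathbf{X}' \mathbf{M}_{\mathbf{Z}} \mathbf{X})^{-1} \mathbf{X}' \mathbf{M}_{\mathbf{Z}} \mathbf{Y} with an explicit residual maker matrix and compares it with the corresponding block of the full OLS solution ([\mathbf{X} \ \mathbf{Z}]'[\mathbf{X} \ \mathbf{Z}])^{-1} [\mathbf{X} \ \mathbf{Z}]' \mathbf{Y}.

import numpy as np

rng = np.random.default_rng(1)
n, k, m = 500, 2, 4
X = rng.normal(size=(n, k))
Z = np.hstack([np.ones((n, 1)), rng.normal(size=(n, m - 1))])   # controls, including an intercept
Y = X @ np.array([0.8, -1.2]) + Z @ np.array([2.0, 0.5, 0.0, -0.3]) + rng.normal(size=n)

# Residual maker for Z and the partitioned-form estimator
M_Z = np.eye(n) - Z @ np.linalg.solve(Z.T @ Z, Z.T)
beta_partitioned = np.linalg.solve(X.T @ M_Z @ X, X.T @ M_Z @ Y)

# Block of the full OLS solution corresponding to X
W = np.hstack([X, Z])
theta_hat = np.linalg.solve(W.T @ W, W.T @ Y)
print(np.allclose(beta_partitioned, theta_hat[:k]))             # True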
Partialling-Out Procedure
The partialling-out procedure provides a practical method to estimate the coefficients of interest in a multiple linear regression model by first removing the effects of control variables through residualization. This approach is particularly useful when focusing on the relationship between a subset of regressors and the dependent variable while accounting for additional covariates. In the context of the model Y = X\beta + Z\gamma + \epsilon, where Y is the n \times 1 vector of observations on the dependent variable, X is the n \times k matrix of regressors of interest, Z is the n \times m matrix of control variables, and \epsilon is the error term, the procedure isolates \beta by orthogonalizing Y and X with respect to Z.

The process begins with the first step: regress Y on Z using ordinary least squares (OLS) to obtain the residuals \tilde{Y} = Y - P_Z Y, where P_Z = Z(Z'Z)^{-1}Z' is the projection matrix onto the column space of Z. Equivalently, \tilde{Y} = M_Z Y, with M_Z = I_n - P_Z denoting the annihilator matrix that projects orthogonally to Z. These residuals \tilde{Y} represent the variation in Y unexplained by Z.

Next, for each of the k columns of X, perform a separate OLS regression on Z to obtain the residuals \tilde{X} = X - P_Z X = M_Z X. The matrix \tilde{X} thus consists of k columns of residuals, capturing the variation in X orthogonal to Z.

In the final step, regress the residuals \tilde{Y} on the residualized regressors \tilde{X} via OLS, yielding the estimator \hat{\beta} = (\tilde{X}' \tilde{X})^{-1} \tilde{X}' \tilde{Y}. This auxiliary regression focuses solely on the k dimensions of interest, as the effects of the m controls in Z have been partialled out. A key property here is the orthogonality of \tilde{X} to Z, since \tilde{X}' Z = X' M_Z Z = X' \cdot 0 = 0 (noting that M_Z Z = 0); this ensures no residual correlation between the partialled-out regressors and the controls in the auxiliary regression, preserving the integrity of the estimation.

Computationally, this procedure offers advantages by reducing the scale of matrix operations. The full OLS estimation on the augmented model requires inverting a (k + m) \times (k + m) matrix (W'W)^{-1}, where W = [X \ Z], which can be burdensome for large m. In contrast, partialling out involves k + 1 auxiliary regressions on Z (one for Y and one for each of the k columns of X, which are cheap when m is small) followed by a single inversion of the smaller k \times k matrix (\tilde{X}' \tilde{X})^{-1}, significantly lowering computational complexity and memory requirements, especially in high-dimensional settings or with big data.
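The three steps can be packaged into a short routine. The sketch below (a hypothetical helper, partial_out, not drawn from the cited sources) residualizes Y and each column of X on Z, verifies the orthogonality \tilde{X}'Z \approx 0, and solves only the k \times k system at the end.

import numpy as np

def partial_out(Y, X, Z):
    """Estimate the coefficients on X after partialling out Z (FWL steps 1-3)."""
    # Step 1: residuals of Y after regressing on Z
    gamma_y = np.linalg.lstsq(Z, Y, rcond=None)[0]
    Y_tilde = Y - Z @ gamma_y
    # Step 2: residuals of each column of X after regressing on Z
    Gamma_x = np.linalg.lstsq(Z, X, rcond=None)[0]       # m x k matrix of auxiliary coefficients
    X_tilde = X - Z @ Gamma_x
    # Orthogonality check: the residualized regressors are orthogonal to Z
    assert np.allclose(X_tilde.T @ Z, 0.0, atol=1e-6)
    # Step 3: regress the residualized Y on the residualized X (only a k x k system)
    return np.linalg.solve(X_tilde.T @ X_tilde, X_tilde.T @ Y_tilde)

rng = np.random.default_rng(2)
n, k, m = 300, 2, 5
Z = rng.normal(size=(n, m))
X = 0.5 * Z[:, :k] + rng.normal(size=(n, k))             # regressors of interest, correlated with Z
Y = X @ np.array([1.0, 2.0]) + Z.sum(axis=1) + rng.normal(size=n)

print(partial_out(Y, X, Z))                              # close to [1.0, 2.0]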
Proof and Interpretation
Derivation of the Theorem
The Frisch–Waugh–Lovell theorem establishes the equivalence between the ordinary least squares (OLS) estimator of the coefficients on a subset of regressors in a multiple linear regression and the OLS estimator obtained from an auxiliary regression on residuals after partialling out the effects of the remaining regressors. This algebraic derivation relies on the properties of the projection matrix and the normal equations of the full model.[10]

Consider the linear regression model \mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\gamma} + \boldsymbol{\epsilon}, where \mathbf{Y} is the n \times 1 response vector, \mathbf{X} is the n \times k matrix of regressors of interest, \mathbf{Z} is the n \times m matrix of control regressors (assumed to have full column rank), \boldsymbol{\beta} and \boldsymbol{\gamma} are the corresponding coefficient vectors, and \boldsymbol{\epsilon} is the error term satisfying the standard OLS assumptions (zero mean, homoskedasticity, and no serial correlation). The full OLS estimators satisfy the normal equations formed by the augmented design matrix \mathbf{W} = [\mathbf{X} \ \mathbf{Z}]:

\begin{pmatrix} \mathbf{X}'\mathbf{X} & \mathbf{X}'\mathbf{Z} \\ \mathbf{Z}'\mathbf{X} & \mathbf{Z}'\mathbf{Z} \end{pmatrix} \begin{pmatrix} \hat{\boldsymbol{\beta}} \\ \hat{\boldsymbol{\gamma}} \end{pmatrix} = \begin{pmatrix} \mathbf{X}'\mathbf{Y} \\ \mathbf{Z}'\mathbf{Y} \end{pmatrix}.

Solving the second block equation for \hat{\boldsymbol{\gamma}} gives

\hat{\boldsymbol{\gamma}} = (\mathbf{Z}'\mathbf{Z})^{-1} (\mathbf{Z}'\mathbf{Y} - \mathbf{Z}'\mathbf{X} \hat{\boldsymbol{\beta}}).

Substituting this into the first block equation yields

\mathbf{X}'\mathbf{X} \hat{\boldsymbol{\beta}} + \mathbf{X}'\mathbf{Z} (\mathbf{Z}'\mathbf{Z})^{-1} (\mathbf{Z}'\mathbf{Y} - \mathbf{Z}'\mathbf{X} \hat{\boldsymbol{\beta}}) = \mathbf{X}'\mathbf{Y}.

Rearranging terms produces

\mathbf{X}'\mathbf{X} \hat{\boldsymbol{\beta}} - \mathbf{X}'\mathbf{Z} (\mathbf{Z}'\mathbf{Z})^{-1} \mathbf{Z}'\mathbf{X} \hat{\boldsymbol{\beta}} + \mathbf{X}'\mathbf{Z} (\mathbf{Z}'\mathbf{Z})^{-1} \mathbf{Z}'\mathbf{Y} = \mathbf{X}'\mathbf{Y},

or equivalently,

\mathbf{X}' (\mathbf{I}_n - \mathbf{Z} (\mathbf{Z}'\mathbf{Z})^{-1} \mathbf{Z}') \mathbf{X} \, \hat{\boldsymbol{\beta}} = \mathbf{X}' (\mathbf{I}_n - \mathbf{Z} (\mathbf{Z}'\mathbf{Z})^{-1} \mathbf{Z}') \mathbf{Y}.

Define the residual maker matrix \mathbf{M}_\mathbf{Z} = \mathbf{I}_n - \mathbf{P}_\mathbf{Z}, where \mathbf{P}_\mathbf{Z} = \mathbf{Z} (\mathbf{Z}'\mathbf{Z})^{-1} \mathbf{Z}' is the orthogonal projection onto the column space of \mathbf{Z}. The key properties of \mathbf{M}_\mathbf{Z} are its symmetry, idempotence (\mathbf{M}_\mathbf{Z}^2 = \mathbf{M}_\mathbf{Z}), and orthogonality to \mathbf{Z} (\mathbf{M}_\mathbf{Z} \mathbf{Z} = \mathbf{0}). Thus, the equation simplifies to

\hat{\boldsymbol{\beta}} = (\mathbf{X}' \mathbf{M}_\mathbf{Z} \mathbf{X})^{-1} \mathbf{X}' \mathbf{M}_\mathbf{Z} \mathbf{Y}.

Now define the residualized variables \tilde{\mathbf{X}} = \mathbf{M}_\mathbf{Z} \mathbf{X} and \tilde{\mathbf{Y}} = \mathbf{M}_\mathbf{Z} \mathbf{Y}, which are the residuals from regressing \mathbf{X} and \mathbf{Y} on \mathbf{Z}, respectively. Substituting these yields

\hat{\boldsymbol{\beta}} = (\tilde{\mathbf{X}}' \tilde{\mathbf{X}})^{-1} \tilde{\mathbf{X}}' \tilde{\mathbf{Y}},

which is precisely the OLS estimator from the auxiliary regression of \tilde{\mathbf{Y}} on \tilde{\mathbf{X}}.
This demonstrates the equivalence of the estimators.[10] Under the standard assumptions, the variance-covariance matrix of \hat{\boldsymbol{\beta}} in the full model is \mathrm{Var}(\hat{\boldsymbol{\beta}}) = \sigma^2 (\mathbf{X}' \mathbf{M}_\mathbf{Z} \mathbf{X})^{-1}, where \sigma^2 is the error variance. Substituting the residualized forms gives \mathrm{Var}(\hat{\boldsymbol{\beta}}) = \sigma^2 (\tilde{\mathbf{X}}' \tilde{\mathbf{X}})^{-1}, confirming that the asymptotic properties (and standard errors, up to the degrees of freedom used in estimating \sigma^2) are identical between the full and auxiliary regressions. This matrix-algebra argument, due to Lovell, extends the original result of Frisch and Waugh (who used Cramer's rule for the special case of time trends) to arbitrary partitions of the regressors in the general linear model.[10]
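Because the residual vectors of the full and auxiliary regressions coincide, the only practical difference for inference under homoskedastic errors is the degrees-of-freedom divisor used to estimate \sigma^2 (n - k - m in the full model versus n - k in the naive auxiliary regression). The sketch below is illustrative, with simulated data, and rescales the naive auxiliary standard errors by \sqrt{(n-k)/(n-k-m)} to recover the full-model standard errors.

import numpy as np

rng = np.random.default_rng(3)
n, k, m = 150, 1, 3
Z = np.hstack([np.ones((n, 1)), rng.normal(size=(n, m - 1))])
X = Z @ rng.normal(size=(m, k)) + rng.normal(size=(n, k))
Y = X @ np.array([0.9]) + Z @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=n)

M_Z = np.eye(n) - Z @ np.linalg.solve(Z.T @ Z, Z.T)
X_t, Y_t = M_Z @ X, M_Z @ Y
beta_hat = np.linalg.solve(X_t.T @ X_t, X_t.T @ Y_t)

resid = Y_t - X_t @ beta_hat                 # identical to the full-model residuals
ssr = resid @ resid
xtx_inv_diag = np.diag(np.linalg.inv(X_t.T @ X_t))
se_full = np.sqrt(ssr / (n - k - m) * xtx_inv_diag)
se_aux_naive = np.sqrt(ssr / (n - k) * xtx_inv_diag)

# The naive auxiliary standard error understates the full-model one by a fixed factor
print(np.allclose(se_full, se_aux_naive * np.sqrt((n - k) / (n - k - m))))   # True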
Geometric Interpretation
The Frisch–Waugh–Lovell (FWL) theorem admits a natural geometric interpretation in the context of Euclidean vector spaces, where ordinary least squares (OLS) estimation corresponds to orthogonal projections onto subspaces spanned by the regressors. Consider the multiple linear regression model y = X\beta + Z\gamma + \epsilon, where X represents the regressors of primary interest, Z denotes additional control variables, and the full design matrix is W = [X \ Z]. The space of observations is the Euclidean space \mathbb{R}^n (with n the sample size), and the column space of W, denoted \mathcal{S}(W), is a subspace onto which the response vector y is projected to obtain the fitted values \hat{y} = P_W y, where P_W = W(W'W)^{-1}W' is the orthogonal projection matrix. The FWL procedure geometrically "partials out" the effect of Z by projecting both y and the columns of X onto the orthogonal complement of \mathcal{S}(Z), yielding residuals \tilde{y} = (I - P_Z)y and \tilde{X} = (I - P_Z)X. These residuals lie in the subspace \mathcal{S}(Z)^\perp, ensuring they are orthogonal to every vector in \mathcal{S}(Z).[11][3][12]

The estimated coefficient \hat{\beta} on X is then the slope from regressing \tilde{y} onto \tilde{X}, which geometrically represents the orthogonal projection of \tilde{y} onto the subspace \mathcal{S}(\tilde{X}) \subseteq \mathcal{S}(Z)^\perp. This projection is invariant to the inclusion of Z in the original model because the partialling-out step removes all linear dependence on Z, leaving the relationship between \tilde{y} and \tilde{X} identical to the marginal effect of X after controlling for Z in the full model. In vector terms, the full-model projection P_W y decomposes additively into a component lying in \mathcal{S}(Z) and a component lying in \mathcal{S}(\tilde{X}) \subseteq \mathcal{S}(Z)^\perp; the latter component isolates \hat{\beta} unchanged, as orthogonality ensures no interference from Z. This invariance holds because the residuals \tilde{X} and \tilde{y} are orthogonal to Z, so adding Z back does not alter the least-squares fit within the complement space.[11][3][12]

To visualize this, consider the simplest case in which Z consists only of a constant (an intercept). Partialling out Z then replaces y and X by their deviations from sample means, effectively "demeaning" the data perpendicular to \mathcal{S}(Z); the subsequent regression of the demeaned y on the demeaned X reproduces exactly the slope \hat{\beta} from the regression that includes the intercept. In three dimensions (with one X and one Z), the full space \mathcal{S}(W) is a plane spanned by the vectors of X and Z; projecting y onto the direction within this plane that is orthogonal to Z isolates the unique contribution of X, with the residuals representing perpendicular deviations from the Z-line. This geometric view underscores how FWL simplifies multicollinearity by confining analysis to orthogonal subspaces, facilitating an intuitive understanding of coefficient isolation.[11][3]
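The orthogonal decomposition described above can be verified numerically. Because the columns of Z and \tilde{X} = M_Z X together span \mathcal{S}(W) and are mutually orthogonal, the projection onto \mathcal{S}(W) splits as P_W = P_Z + P_{\tilde{X}}. The sketch below is illustrative (generic simulated data, assumed full rank) and checks this identity and the resulting additive split of the fitted values.

import numpy as np

def proj(A):
    """Orthogonal projection matrix onto the column space of A."""
    return A @ np.linalg.solve(A.T @ A, A.T)

rng = np.random.default_rng(4)
n, k, m = 120, 2, 3
X = rng.normal(size=(n, k))
Z = rng.normal(size=(n, m))
y = rng.normal(size=n)

W = np.hstack([X, Z])
P_W, P_Z = proj(W), proj(Z)
X_tilde = (np.eye(n) - P_Z) @ X                           # residualized regressors, orthogonal to Z

# The projection onto S(W) splits into orthogonal pieces along S(Z) and S(X_tilde)
print(np.allclose(P_W, P_Z + proj(X_tilde)))              # True
# Hence the fitted values decompose additively
print(np.allclose(P_W @ y, P_Z @ y + proj(X_tilde) @ y))  # True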
Applications
In Econometrics
In econometrics, the Frisch–Waugh–Lovell theorem facilitates efficient hypothesis testing for subsets of regressors in multiple linear regression models, particularly through F-tests for joint significance. By partialling out the effects of control variables from both the dependent variable and the variables of interest, the theorem reduces the problem to a simpler auxiliary regression on the residuals, where the resulting F-statistic directly tests the null hypothesis that the coefficients on the subset are zero, conditional on the controls (provided the error variance is estimated with the full model's degrees of freedom). This approach avoids repeated full-model estimations, which is advantageous in large-scale economic datasets. For example, in evaluating the joint impact of fiscal policy variables such as government spending and tax rates on GDP growth while controlling for demographic factors like population age and urbanization, the partial regression yields the same F-test result as the complete model, enabling precise inference on policy effectiveness without computational redundancy.[13][14]

The theorem also provides a practical tool for assessing omitted variable bias, a common concern in economic modeling where key confounders may be excluded due to data limitations. It demonstrates that the coefficient on a variable of interest in the full model—after partialling out included controls—differs from the estimate in a restricted model omitting those controls, with the difference quantifying the bias induced by omission. This comparison is essential for robustness checks; for instance, in labor economics studies of wage determinants, partialling out observables like education and experience reveals how much the estimated return to schooling would be biased upward if unobserved ability were omitted, guiding model specification and interpretation.[13][15]

Additionally, the Frisch–Waugh–Lovell theorem underpins instrumental variables methods, notably as a special case in two-stage least squares (2SLS) estimation for addressing endogeneity. In 2SLS, instruments are used to purge endogenous regressors of correlation with errors; the theorem shows this is equivalent to partialling out exogenous controls from the outcome, endogenous variables, and instruments, then performing IV on the residualized data, producing identical coefficient estimates to the standard procedure. This linkage is critical in causal inference applications, such as estimating the effect of education on earnings using quarter-of-birth instruments while controlling for family background, ensuring unbiased policy-relevant conclusions.[16][17]

The theorem has been extended to generalized linear models (GLMs) and high-dimensional settings. In GLMs, FWL-inspired decompositions enable efficient estimation with high-dimensional fixed effects, such as in logit or Poisson models with numerous categorical controls, by residualizing variables before applying maximum likelihood, as implemented in tools like the Capybara package for memory-efficient computation.[18] In high-dimensional econometrics, where the number of regressors exceeds observations, FWL underpins double machine learning (DML) approaches, partialling out nuisance parameters (e.g., via lasso) to isolate treatment effects, improving bias reduction in causal estimates.[19] A 2025 extension establishes an FWL theorem for generalized method of moments (GMM), allowing residualized estimation equivalent to full GMM models, useful for dynamic panel data and nonlinear instruments.[20]
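The omitted-variable-bias comparison has an exact in-sample counterpart that follows from the same partitioned algebra: the coefficients from the short regression of y on X alone equal the full-model coefficients on X plus (X'X)^{-1}X'Z \hat{\gamma}, the auxiliary coefficients of Z on X multiplied by the full-model coefficients on Z. The sketch below is a stylized wage example with simulated data (variable names are illustrative, not from the cited studies) that verifies this decomposition.

import numpy as np

rng = np.random.default_rng(5)
n = 1000
ability = rng.normal(size=n)                          # omitted confounder
schooling = 0.6 * ability + rng.normal(size=n)        # correlated with ability
wage = 2.0 + 0.5 * schooling + 1.0 * ability + rng.normal(size=n)

X = np.column_stack([np.ones(n), schooling])          # included regressors (intercept, schooling)
Z = ability.reshape(-1, 1)                            # control that may be omitted

# Long (full) and short regressions
full = np.linalg.lstsq(np.hstack([X, Z]), wage, rcond=None)[0]
beta_full, gamma_full = full[:2], full[2:]
beta_short = np.linalg.lstsq(X, wage, rcond=None)[0]

# Exact decomposition: short = full + (coefficients of Z on X) * (coefficients on Z in full model)
delta = np.linalg.lstsq(X, Z, rcond=None)[0]          # 2 x 1 auxiliary coefficients
print(np.allclose(beta_short, beta_full + delta @ gamma_full))   # True
print(beta_short[1] - beta_full[1])                   # upward bias in the return to schooling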
In Statistical Computing
The Frisch–Waugh–Lovell (FWL) theorem enhances numerical stability in statistical computing by decomposing multivariate regressions into separate univariate regressions on residuals, thereby avoiding the direct inversion of large design matrices that can lead to ill-conditioned problems in high-dimensional settings.[21] This approach is particularly advantageous when dealing with datasets where the number of observations greatly exceeds the number of parameters (N >> K), as it processes data in manageable blocks without requiring the full matrix to be formed or inverted until the final aggregation step, reducing the risk of overflow or precision loss.[21]

In practice, the FWL theorem is implemented in major statistical software packages through manual computation of residuals from auxiliary regressions. In R, users can apply the theorem using the base lm() function to obtain residuals from regressing the variable of interest and the outcome on control variables, followed by a simple regression of those residuals; the partialling.out package formalizes this by extracting residuals from lm or fixed-effects models like those from the fixest package.[22] Similarly, in Stata, the areg command absorbs categorical factors via the FWL decomposition, adjusting variables to group means before applying ordinary least squares, which is equivalent to partialling out the absorbed effects.[23] For Python, the statsmodels library supports FWL implementation by computing residuals from OLS fits on controls, enabling users to replicate the partialling-out procedure for the main regressor and outcome before a final univariate regression.[24]
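As a concrete illustration of the Python workflow just described (a sketch with simulated data; the statsmodels calls are standard OLS usage, but the variable names are made up), the partialling-out steps can be replicated and compared with the full regression:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 500
z1, z2 = rng.normal(size=n), rng.normal(size=n)
x = 0.4 * z1 - 0.3 * z2 + rng.normal(size=n)            # regressor of interest, correlated with controls
y = 1.0 + 2.0 * x + 0.5 * z1 - 1.5 * z2 + rng.normal(size=n)

controls = sm.add_constant(np.column_stack([z1, z2]))    # controls include the intercept

# Full regression of y on x and the controls
full = sm.OLS(y, np.column_stack([x, controls])).fit()

# FWL: residualize y and x on the controls, then regress residual on residual
y_tilde = sm.OLS(y, controls).fit().resid
x_tilde = sm.OLS(x, controls).fit().resid
aux = sm.OLS(y_tilde, x_tilde).fit()                     # no constant needed: it was already partialled out

print(full.params[0], aux.params[0])                     # identical coefficients on x

Only the reported standard errors differ between the two fits, through the degrees-of-freedom divisor noted in the derivation section.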
Extensions of the FWL theorem to big data leverage its row-wise structure for parallelization, allowing auxiliary regressions to be computed independently across data partitions and aggregated via sufficient statistics like cross-products, achieving linear scalability with the number of processors.[21] This embarrassingly parallel approach is well-suited for distributed computing environments, as demonstrated in applications handling millions of observations from concatenated files without loading the entire dataset into memory.[25]
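The aggregation via sufficient statistics can be sketched as follows (an illustrative map-reduce pattern with simulated chunks, not code from the cited applications): each block contributes only its cross-products W'W and W'y, which are summed and solved once at the end, so no block ever requires the full dataset in memory.

import numpy as np

rng = np.random.default_rng(7)
p = 5                                           # total number of regressors (interest + controls)
true_theta = np.array([1.0, -2.0, 0.5, 0.0, 3.0])

# Accumulate cross-products block by block (each block could live on a different worker)
WtW = np.zeros((p, p))
Wty = np.zeros(p)
for _ in range(20):                             # 20 chunks of 10,000 rows each
    W = rng.normal(size=(10_000, p))
    y = W @ true_theta + rng.normal(size=10_000)
    WtW += W.T @ W                              # p x p sufficient statistic
    Wty += W.T @ y                              # p x 1 sufficient statistic

theta_hat = np.linalg.solve(WtW, Wty)           # one small solve after aggregation
print(np.round(theta_hat, 2))                   # close to true_theta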