Generalized least squares
Generalized least squares (GLS) is a statistical estimation method used to fit linear regression models when the errors exhibit heteroscedasticity (unequal variances) or autocorrelation (correlation among errors), by incorporating a known or estimated covariance structure to weight the observations appropriately.[1] Developed by Alexander C. Aitken, who introduced it in a 1935 paper on least squares and linear combinations of observations, GLS generalizes the ordinary least squares (OLS) approach to produce more efficient parameter estimates under these conditions.[2]

In the linear model \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}, where \mathbf{y} is the n \times 1 response vector, \mathbf{X} is the n \times k design matrix, \boldsymbol{\beta} is the k \times 1 parameter vector, and \boldsymbol{\epsilon} has mean zero and covariance matrix \boldsymbol{\Sigma} = \sigma^2 \mathbf{V} (with \mathbf{V} known up to a scalar), the GLS estimator minimizes the quadratic form (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^\top \mathbf{V}^{-1} (\mathbf{y} - \mathbf{X}\boldsymbol{\beta}).[3] The resulting estimator is \hat{\boldsymbol{\beta}}_{GLS} = (\mathbf{X}^\top \mathbf{V}^{-1} \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{V}^{-1} \mathbf{y}, which is the best linear unbiased estimator (BLUE) by the Gauss–Markov theorem when the error covariance is correctly specified, offering lower variance than OLS.[3]

When the covariance matrix \mathbf{V} is unknown, feasible GLS (FGLS) estimates it from the data, often using OLS residuals, and substitutes the estimate into the GLS formula, yielding consistent and asymptotically efficient estimates under mild conditions.[4] GLS is widely applied in econometrics, time series analysis (e.g., AR(1) errors via quasi-differencing), and panel data models (e.g., random effects), where it improves inference by correcting for serial correlation and heteroscedasticity, though it requires careful specification of the error structure to avoid inefficiency or bias.[4]
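A minimal numerical sketch of the estimator above, assuming a known diagonal \mathbf{V} and simulated heteroscedastic data (all variable names and values here are illustrative, not taken from the cited sources):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate y = X beta + eps with error standard deviation growing with x,
# so V = diag(x_i^2) describes the (assumed known) covariance structure.
n = 200
x = np.linspace(1.0, 10.0, n)
X = np.column_stack([np.ones(n), x])          # n x k design matrix
beta_true = np.array([2.0, 0.5])
V = np.diag(x**2)
eps = rng.normal(scale=x)                     # heteroscedastic errors
y = X @ beta_true + eps

# OLS: (X'X)^{-1} X'y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# GLS: (X' V^{-1} X)^{-1} X' V^{-1} y
Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)

print("OLS estimate:", beta_ols)
print("GLS estimate:", beta_gls)
```

Both estimators are unbiased in this setting, but the GLS estimate down-weights the noisier observations and therefore varies less across repeated samples.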
Background and Motivation

Limitations of Ordinary Least Squares
Ordinary least squares (OLS) regression, developed in the early 19th century by mathematicians Adrien-Marie Legendre and Carl Friedrich Gauss primarily for analyzing astronomical data, relies on the assumption that errors are independent and identically distributed (i.i.d.) with constant variance.[5] This method minimizes the sum of squared residuals to estimate parameters in a linear model, but its validity hinges on several key assumptions: linearity in parameters, independence of errors, homoscedasticity (constant error variance across observations), and no autocorrelation among errors.[6] Violations of these assumptions are common in real-world data and lead to unreliable results.[7] When these assumptions fail, OLS estimators remain unbiased and consistent under certain conditions but lose efficiency, meaning they no longer provide the minimum-variance estimates among linear unbiased estimators (BLUE).[6] More critically, the conventional OLS standard error estimates become biased, often understated, which invalidates hypothesis tests and confidence intervals; for instance, t-tests may appear overly significant, leading to overconfident inferences about parameter significance.[8] In cases of severe violations, such as endogeneity from omitted variables or measurement error, OLS can produce biased estimates.[6]

A prevalent violation is heteroscedasticity, where error variance increases with the level of an explanatory variable, as seen in cross-sectional data on household income and food expenditure: lower-income households show less variation in spending, while higher-income ones exhibit greater dispersion.[9] Autocorrelation, another common issue in time series data, occurs when errors are serially correlated, such as in an AR(1) process where current errors depend on past ones, as in economic indicators like GDP growth rates over time.[10] These violations underscore the need for extensions such as generalized least squares to restore efficiency and valid inference.[11]
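The understatement of OLS standard errors under heteroscedasticity can be seen in a small Monte Carlo sketch such as the following (purely illustrative; the design and parameter values are arbitrary assumptions, not from the cited sources):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 100, 2000
x = np.linspace(1.0, 10.0, n)
X = np.column_stack([np.ones(n), x])

slope_estimates = []
nominal_ses = []
for _ in range(reps):
    eps = rng.normal(scale=x)                 # heteroscedastic: sd grows with x
    y = X @ np.array([1.0, 0.3]) + eps
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - 2)              # conventional OLS variance estimate
    nominal_ses.append(np.sqrt(s2 * XtX_inv[1, 1]))
    slope_estimates.append(beta_hat[1])

print("average nominal OLS s.e. of slope:", np.mean(nominal_ses))
print("actual sampling s.d. of slope:    ", np.std(slope_estimates))
```

In this design the average nominal standard error falls below the observed sampling variability of the slope, which is why t-tests based on it overstate significance.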
The General Linear Model

The standard linear model in statistics is formulated as \mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, where \mathbf{Y} is an n \times 1 response vector, \mathbf{X} is an n \times p design matrix of explanatory variables, \boldsymbol{\beta} is a p \times 1 vector of unknown parameters, and \boldsymbol{\varepsilon} is an n \times 1 error vector.[12] This setup assumes that the errors satisfy E(\boldsymbol{\varepsilon}) = \mathbf{0}, so that E(\mathbf{Y}) = \mathbf{X}\boldsymbol{\beta}.[12] To accommodate real-world data exhibiting heteroscedasticity or correlation among errors, the model is generalized by relaxing the assumption on the error variance. Specifically, the errors now have E(\boldsymbol{\varepsilon}) = \mathbf{0} and \text{Var}(\boldsymbol{\varepsilon}) = \sigma^2 \boldsymbol{\Omega}, where \sigma^2 > 0 is a scalar variance parameter and \boldsymbol{\Omega} is a known, positive definite n \times n covariance matrix that need not be the identity matrix.[12] This generalization, originally developed by Aitken to handle weighted and correlated observations, allows the model to capture non-spherical error structures while preserving the linear relationship between predictors and the response.[13]

The matrix \boldsymbol{\Omega} encodes the structure of error dependence: its diagonal elements reflect heteroscedasticity through varying variances across observations, while off-diagonal elements capture correlations between errors, such as those arising from temporal or spatial dependencies.[12] For instance, in clustered data where observations within groups are correlated but independent across groups, \boldsymbol{\Omega} takes a block-diagonal form, with each block corresponding to the covariance within a cluster.[14] The positive definiteness of \boldsymbol{\Omega} ensures it is invertible, which is essential for subsequent transformations and estimation procedures, while \sigma^2 serves as an overall scale factor for the variance.[12] Ordinary least squares corresponds to the special case where \boldsymbol{\Omega} = \mathbf{I}_n, the n \times n identity matrix, implying homoscedastic and uncorrelated errors.[12]
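The following sketch illustrates two of the covariance structures just described, a diagonal \boldsymbol{\Omega} for heteroscedastic errors and a block-diagonal \boldsymbol{\Omega} for clustered data (cluster sizes, variances, and the correlation value are arbitrary illustrative choices):

```python
import numpy as np

# Heteroscedastic but uncorrelated errors: Omega is diagonal, with
# variances that differ across observations.
variances = np.array([1.0, 1.0, 4.0, 4.0, 9.0, 9.0])
omega_hetero = np.diag(variances)

# Clustered data: three clusters of two observations each, correlated
# within a cluster (correlation rho) and independent across clusters,
# so Omega is block-diagonal.
rho = 0.6
block = np.array([[1.0, rho],
                  [rho, 1.0]])
omega_clustered = np.kron(np.eye(3), block)

# Ordinary least squares is the special case Omega = I.
omega_ols = np.eye(6)

# Positive definiteness (all eigenvalues positive) guarantees invertibility.
print(np.all(np.linalg.eigvalsh(omega_clustered) > 0))
```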
GLS Methodology

Model Formulation
The generalized least squares (GLS) method addresses linear regression scenarios where the error terms exhibit heteroscedasticity or correlation, violating the ordinary least squares (OLS) assumption of independent and identically distributed errors with constant variance. The model is formulated as \mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, where \mathbf{Y} is an n \times 1 vector of observed responses, \mathbf{X} is an n \times p design matrix of predictors, \boldsymbol{\beta} is a p \times 1 vector of unknown parameters, and \boldsymbol{\varepsilon} is an n \times 1 error vector. Under the normality assumption used in derivations, \boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \boldsymbol{\Omega}), where \sigma^2 > 0 is a scalar variance and \boldsymbol{\Omega} is a known n \times n positive definite matrix capturing the error covariance structure.[2][12] However, GLS remains valid more generally under the moment conditions E(\boldsymbol{\varepsilon}) = \mathbf{0} and \text{Var}(\boldsymbol{\varepsilon}) = \sigma^2 \boldsymbol{\Omega}, without requiring normality for consistency.[15][16]

A key applicability condition is that the design matrix \mathbf{X} must have full column rank, ensuring p \leq n and \text{rank}(\mathbf{X}) = p, which guarantees that the parameters are identifiable and the normal equations are solvable.[17][2] The matrix \boldsymbol{\Omega} must be positive definite to ensure the covariance is well-defined and invertible, enabling the transformation central to GLS; its structure is typically known or assumed based on theoretical knowledge, such as autoregressive processes or clustering.[17][18] For practical implementation, the data requirements are observed values of \mathbf{Y} and \mathbf{X}, with the form of \boldsymbol{\Omega} either fully known or specified up to parameters estimated from prior data or domain theory; in the latter case, feasible GLS approximations are used when \boldsymbol{\Omega} is unknown.[18][19] This formulation corresponds to the general linear model in statistics and is distinct from generalized linear models, which handle non-normal responses via link functions.[20]
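As a rough illustration of these applicability conditions, the sketch below verifies full column rank of \mathbf{X} and positive definiteness of \boldsymbol{\Omega}, using an AR(1) correlation structure as an example of a theoretically specified \boldsymbol{\Omega}; the helper name check_gls_inputs and all parameter values are hypothetical:

```python
import numpy as np

def check_gls_inputs(X, omega):
    """Illustrative checks of the GLS applicability conditions described above."""
    n, p = X.shape
    full_rank = bool(np.linalg.matrix_rank(X) == p and p <= n)
    # Positive definiteness: all eigenvalues of the symmetric matrix are positive.
    positive_definite = bool(np.all(np.linalg.eigvalsh(omega) > 0))
    return full_rank, positive_definite

# Covariance structure specified from theory: AR(1) errors with parameter phi
# give Omega[i, j] = phi ** |i - j|.
n, phi = 5, 0.7
omega_ar1 = phi ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
print(check_gls_inputs(X, omega_ar1))   # expected: (True, True)
```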
Estimator Definition

The generalized least squares (GLS) estimator addresses the estimation of the parameter vector \beta in the linear model Y = X\beta + \epsilon, where the error term \epsilon has zero mean and covariance matrix \sigma^2 \Omega, with \Omega being a known positive definite matrix. The GLS estimator is derived by minimizing the quadratic form (Y - X\beta)^T \Omega^{-1} (Y - X\beta), yielding the closed-form expression \hat{\beta}_{\text{GLS}} = (X^T \Omega^{-1} X)^{-1} X^T \Omega^{-1} Y. This estimator generalizes the ordinary least squares approach by incorporating inverse covariance weighting to account for error heteroscedasticity and correlation.[13] The variance-covariance matrix of the GLS estimator, under the model assumptions, is \text{Var}(\hat{\beta}_{\text{GLS}}) = \sigma^2 (X^T \Omega^{-1} X)^{-1}, which reflects the efficiency gain from weighting by the inverse covariance structure.[21]

Computationally, direct inversion of \Omega can be inefficient for large matrices, so a common approach uses the Cholesky decomposition \Omega = L L^T, where L is lower triangular. The variables are then transformed as Y^* = L^{-1} Y and X^* = L^{-1} X, reducing the problem to ordinary least squares estimation on the transformed data: \hat{\beta}_{\text{GLS}} = (X^{*T} X^*)^{-1} X^{*T} Y^*. This transformation leverages the stability and efficiency of Cholesky factorization, avoiding explicit inversion.[22]

The algorithmic steps for implementing GLS, illustrated in the code sketch following the list, are as follows:

- Obtain or estimate the covariance matrix \Omega.
- Compute its inverse \Omega^{-1} (or equivalently, use the Cholesky-based transformation).
- Form the weighted matrices X_w = \Omega^{-1/2} X and Y_w = \Omega^{-1/2} Y.
- Solve the normal equations (X_w^T X_w) \hat{\beta} = X_w^T Y_w to obtain \hat{\beta}_{\text{GLS}}.
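A minimal Python sketch of these steps, assuming \Omega is known and positive definite (the function name gls_estimate, the AR(1) example covariance, and all parameter values are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import solve_triangular

def gls_estimate(X, y, omega):
    """GLS estimate via the Cholesky-based transformation (whitening)."""
    # Steps 1-2: factor Omega = L L^T instead of forming Omega^{-1} explicitly.
    L = np.linalg.cholesky(omega)
    # Step 3: whiten the data, X* = L^{-1} X and Y* = L^{-1} Y, using
    # triangular solves rather than an explicit inverse.
    X_star = solve_triangular(L, X, lower=True)
    y_star = solve_triangular(L, y, lower=True)
    # Step 4: ordinary least squares on the transformed data solves the
    # normal equations (X*^T X*) beta = X*^T Y*.
    beta_hat, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
    return beta_hat

# Illustrative use with an AR(1) error covariance (phi chosen arbitrarily).
rng = np.random.default_rng(2)
n, phi = 50, 0.8
omega = phi ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + np.linalg.cholesky(omega) @ rng.normal(size=n)
print(gls_estimate(X, y, omega))
```

Working with triangular solves against the Cholesky factor performs the whitening transformation without forming \Omega^{-1} or \Omega^{-1/2} explicitly, which is generally preferable numerically.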