Generalized least squares
Generalized least squares (GLS) is a statistical estimation method used to fit linear regression models when the errors exhibit heteroscedasticity (unequal variances) or autocorrelation (correlation among errors), by incorporating a known or estimated covariance structure to weight the observations appropriately.[1] Developed by Alexander C. Aitken, who introduced it in a 1935 paper on least squares and linear combinations of observations, GLS generalizes the ordinary least squares (OLS) approach to produce more efficient parameter estimates under these conditions.[2]

In the linear model \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}, where \mathbf{y} is the n \times 1 response vector, \mathbf{X} is the n \times k design matrix, \boldsymbol{\beta} is the k \times 1 parameter vector, and \boldsymbol{\epsilon} has mean zero and covariance matrix \boldsymbol{\Sigma} = \sigma^2 \mathbf{V} (with \mathbf{V} known up to a scalar), the GLS estimator minimizes the quadratic form (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^\top \mathbf{V}^{-1} (\mathbf{y} - \mathbf{X}\boldsymbol{\beta}).[3] The resulting estimator is \hat{\boldsymbol{\beta}}_{GLS} = (\mathbf{X}^\top \mathbf{V}^{-1} \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{V}^{-1} \mathbf{y}, which is the best linear unbiased estimator (BLUE) by the Gauss–Markov theorem when the error covariance is correctly specified, offering lower variance than OLS.[3]

When the covariance matrix \mathbf{V} is unknown, feasible GLS (FGLS) estimates it from the data, often using OLS residuals, and substitutes the estimate into the GLS formula, yielding consistent and asymptotically efficient estimates under mild conditions.[4] GLS is widely applied in econometrics, time series analysis (e.g., AR(1) errors via quasi-differencing), and panel data models (e.g., random effects), where it improves inference by correcting for serial correlation and heteroscedasticity, though it requires careful specification of the error structure to avoid inefficiency or bias.[4]
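A minimal numerical sketch of the estimator above, assuming a known diagonal \mathbf{V} and simulated heteroscedastic data (all variable names and values here are illustrative, not taken from the cited sources):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate y = X beta + eps with error standard deviation growing with x,
# so V = diag(x_i^2) describes the (assumed known) covariance structure.
n = 200
x = np.linspace(1.0, 10.0, n)
X = np.column_stack([np.ones(n), x])          # n x k design matrix
beta_true = np.array([2.0, 0.5])
V = np.diag(x**2)
eps = rng.normal(scale=x)                     # heteroscedastic errors
y = X @ beta_true + eps

# OLS: (X'X)^{-1} X'y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# GLS: (X' V^{-1} X)^{-1} X' V^{-1} y
Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)

print("OLS estimate:", beta_ols)
print("GLS estimate:", beta_gls)
```

Both estimators are unbiased in this setting, but the GLS estimate down-weights the noisier observations and therefore varies less across repeated samples.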
Background and Motivation

Limitations of Ordinary Least Squares
Ordinary least squares (OLS) regression, developed in the early 19th century by mathematicians Adrien-Marie Legendre and Carl Friedrich Gauss primarily for analyzing astronomical data, relies on the assumption that errors are independent and identically distributed (i.i.d.) with constant variance.[5] This method minimizes the sum of squared residuals to estimate parameters in a linear model, but its validity hinges on several key assumptions: linearity in parameters, independence of errors, homoscedasticity (constant error variance across observations), and no autocorrelation among errors.[6] Violations of these assumptions are common in real-world data and lead to unreliable results.[7] When these assumptions fail, OLS estimators remain unbiased and consistent under certain conditions but lose efficiency, meaning they no longer provide the minimum-variance estimates among linear unbiased estimators (BLUE).[6] More critically, the conventional OLS standard error estimates become biased, often understated, which invalidates hypothesis tests and confidence intervals; for instance, t-tests may appear overly significant, leading to overconfident inferences about parameter significance.[8] In cases of severe violations, such as endogeneity from omitted variables or measurement error, OLS can produce biased estimates.[6]

A prevalent violation is heteroscedasticity, where error variance increases with the level of an explanatory variable, as seen in cross-sectional data on household income and food expenditure: lower-income households show less variation in spending, while higher-income ones exhibit greater dispersion.[9] Autocorrelation, another common issue in time series data, occurs when errors are serially correlated, such as in an AR(1) process where current errors depend on past ones, as in economic indicators like GDP growth rates over time.[10] These violations underscore the need for extensions such as generalized least squares to restore efficiency and valid inference.[11]
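The understatement of OLS standard errors under heteroscedasticity can be seen in a small Monte Carlo sketch such as the following (purely illustrative; the design and parameter values are arbitrary assumptions, not from the cited sources):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 100, 2000
x = np.linspace(1.0, 10.0, n)
X = np.column_stack([np.ones(n), x])

slope_estimates = []
nominal_ses = []
for _ in range(reps):
    eps = rng.normal(scale=x)                 # heteroscedastic: sd grows with x
    y = X @ np.array([1.0, 0.3]) + eps
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - 2)              # conventional OLS variance estimate
    nominal_ses.append(np.sqrt(s2 * XtX_inv[1, 1]))
    slope_estimates.append(beta_hat[1])

print("average nominal OLS s.e. of slope:", np.mean(nominal_ses))
print("actual sampling s.d. of slope:    ", np.std(slope_estimates))
```

In this design the average nominal standard error falls below the observed sampling variability of the slope, which is why t-tests based on it overstate significance.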
The General Linear Model

The standard linear model in statistics is formulated as \mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, where \mathbf{Y} is an n \times 1 response vector, \mathbf{X} is an n \times p design matrix of explanatory variables, \boldsymbol{\beta} is a p \times 1 vector of unknown parameters, and \boldsymbol{\varepsilon} is an n \times 1 error vector.[12] This setup assumes that the errors satisfy E(\boldsymbol{\varepsilon}) = \mathbf{0}, so that E(\mathbf{Y}) = \mathbf{X}\boldsymbol{\beta}.[12] To accommodate real-world data exhibiting heteroscedasticity or correlation among errors, the model is generalized by relaxing the assumption on the error variance. Specifically, the errors now have E(\boldsymbol{\varepsilon}) = \mathbf{0} and \text{Var}(\boldsymbol{\varepsilon}) = \sigma^2 \boldsymbol{\Omega}, where \sigma^2 > 0 is a scalar variance parameter and \boldsymbol{\Omega} is a known, positive definite n \times n covariance matrix that need not be the identity matrix.[12] This generalization, originally developed by Aitken to handle weighted and correlated observations, allows the model to capture non-spherical error structures while preserving the linear relationship between predictors and the response.[13]

The matrix \boldsymbol{\Omega} encodes the structure of error dependence: its diagonal elements reflect heteroscedasticity through varying variances across observations, while off-diagonal elements capture correlations between errors, such as those arising from temporal or spatial dependencies.[12] For instance, in clustered data where observations within groups are correlated but independent across groups, \boldsymbol{\Omega} takes a block-diagonal form, with each block corresponding to the covariance within a cluster.[14] The positive definiteness of \boldsymbol{\Omega} ensures it is invertible, which is essential for subsequent transformations and estimation procedures, while \sigma^2 serves as an overall scale factor for the variance.[12] Ordinary least squares corresponds to the special case where \boldsymbol{\Omega} = \mathbf{I}_n, the n \times n identity matrix, implying homoscedastic and uncorrelated errors.[12]
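The following sketch illustrates two of the covariance structures just described, a diagonal \boldsymbol{\Omega} for heteroscedastic errors and a block-diagonal \boldsymbol{\Omega} for clustered data (cluster sizes, variances, and the correlation value are arbitrary illustrative choices):

```python
import numpy as np

# Heteroscedastic but uncorrelated errors: Omega is diagonal, with
# variances that differ across observations.
variances = np.array([1.0, 1.0, 4.0, 4.0, 9.0, 9.0])
omega_hetero = np.diag(variances)

# Clustered data: three clusters of two observations each, correlated
# within a cluster (correlation rho) and independent across clusters,
# so Omega is block-diagonal.
rho = 0.6
block = np.array([[1.0, rho],
                  [rho, 1.0]])
omega_clustered = np.kron(np.eye(3), block)

# Ordinary least squares is the special case Omega = I.
omega_ols = np.eye(6)

# Positive definiteness (all eigenvalues positive) guarantees invertibility.
print(np.all(np.linalg.eigvalsh(omega_clustered) > 0))
```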
GLS Methodology

Model Formulation
The generalized least squares (GLS) method addresses linear regression scenarios where the error terms exhibit heteroscedasticity or correlation, violating the ordinary least squares (OLS) assumption of independent and identically distributed errors with constant variance. The model is formulated as \mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, where \mathbf{Y} is an n \times 1 vector of observed responses, \mathbf{X} is an n \times p design matrix of predictors, \boldsymbol{\beta} is a p \times 1 vector of unknown parameters, and \boldsymbol{\varepsilon} is an n \times 1 error vector. Under the normality assumption used in derivations, \boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \boldsymbol{\Omega}), where \sigma^2 > 0 is a scalar variance and \boldsymbol{\Omega} is a known n \times n positive definite matrix capturing the error covariance structure.[2][12] However, GLS remains valid more generally under the moment conditions E(\boldsymbol{\varepsilon}) = \mathbf{0} and \text{Var}(\boldsymbol{\varepsilon}) = \sigma^2 \boldsymbol{\Omega}, without requiring normality for consistency.[15][16]

A key applicability condition is that the design matrix \mathbf{X} must have full column rank, ensuring p \leq n and \text{rank}(\mathbf{X}) = p, which guarantees that the parameters are identifiable and the normal equations are solvable.[17][2] The matrix \boldsymbol{\Omega} must be positive definite to ensure the covariance is well-defined and invertible, enabling the transformation central to GLS; its structure is typically known or assumed based on theoretical knowledge, such as autoregressive processes or clustering.[17][18] For practical implementation, the data requirements are observed values of \mathbf{Y} and \mathbf{X}, with the form of \boldsymbol{\Omega} either fully known or specified up to parameters estimated from prior data or domain theory; in the latter case, feasible GLS approximations are used when \boldsymbol{\Omega} is unknown.[18][19] This formulation corresponds to the general linear model in statistics and is distinct from generalized linear models, which handle non-normal responses via link functions.[20]
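As a rough illustration of these applicability conditions, the sketch below verifies full column rank of \mathbf{X} and positive definiteness of \boldsymbol{\Omega}, using an AR(1) correlation structure as an example of a theoretically specified \boldsymbol{\Omega}; the helper name check_gls_inputs and all parameter values are hypothetical:

```python
import numpy as np

def check_gls_inputs(X, omega):
    """Illustrative checks of the GLS applicability conditions described above."""
    n, p = X.shape
    full_rank = bool(np.linalg.matrix_rank(X) == p and p <= n)
    # Positive definiteness: all eigenvalues of the symmetric matrix are positive.
    positive_definite = bool(np.all(np.linalg.eigvalsh(omega) > 0))
    return full_rank, positive_definite

# Covariance structure specified from theory: AR(1) errors with parameter phi
# give Omega[i, j] = phi ** |i - j|.
n, phi = 5, 0.7
omega_ar1 = phi ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
print(check_gls_inputs(X, omega_ar1))   # expected: (True, True)
```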
Estimator Definition

The generalized least squares (GLS) estimator addresses the estimation of the parameter vector \beta in the linear model Y = X\beta + \epsilon, where the error term \epsilon has zero mean and covariance matrix \sigma^2 \Omega, with \Omega being a known positive definite matrix. The GLS estimator is derived by minimizing the quadratic form (Y - X\beta)^T \Omega^{-1} (Y - X\beta), yielding the closed-form expression \hat{\beta}_{\text{GLS}} = (X^T \Omega^{-1} X)^{-1} X^T \Omega^{-1} Y. This estimator generalizes the ordinary least squares approach by incorporating inverse covariance weighting to account for error heteroscedasticity and correlation.[13] The variance-covariance matrix of the GLS estimator, under the model assumptions, is \text{Var}(\hat{\beta}_{\text{GLS}}) = \sigma^2 (X^T \Omega^{-1} X)^{-1}, which reflects the efficiency gain from weighting by the inverse covariance structure.[21]

Computationally, direct inversion of \Omega can be inefficient for large matrices, so a common approach uses the Cholesky decomposition \Omega = L L^T, where L is lower triangular. The variables are then transformed as Y^* = L^{-1} Y and X^* = L^{-1} X, reducing the problem to ordinary least squares estimation on the transformed data: \hat{\beta}_{\text{GLS}} = (X^{*T} X^*)^{-1} X^{*T} Y^*. This transformation leverages the stability and efficiency of Cholesky factorization, avoiding explicit inversion.[22]

The algorithmic steps for implementing GLS, illustrated in the code sketch following the list, are as follows:

- Obtain or estimate the covariance matrix \Omega.
- Compute its inverse \Omega^{-1} (or equivalently, use the Cholesky-based transformation).
- Form the weighted matrices X_w = \Omega^{-1/2} X and Y_w = \Omega^{-1/2} Y.
- Solve the normal equations (X_w^T X_w) \hat{\beta} = X_w^T Y_w to obtain \hat{\beta}_{\text{GLS}}.
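A minimal Python sketch of these steps, assuming \Omega is known and positive definite (the function name gls_estimate, the AR(1) example covariance, and all parameter values are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import solve_triangular

def gls_estimate(X, y, omega):
    """GLS estimate via the Cholesky-based transformation (whitening)."""
    # Steps 1-2: factor Omega = L L^T instead of forming Omega^{-1} explicitly.
    L = np.linalg.cholesky(omega)
    # Step 3: whiten the data, X* = L^{-1} X and Y* = L^{-1} Y, using
    # triangular solves rather than an explicit inverse.
    X_star = solve_triangular(L, X, lower=True)
    y_star = solve_triangular(L, y, lower=True)
    # Step 4: ordinary least squares on the transformed data solves the
    # normal equations (X*^T X*) beta = X*^T Y*.
    beta_hat, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
    return beta_hat

# Illustrative use with an AR(1) error covariance (phi chosen arbitrarily).
rng = np.random.default_rng(2)
n, phi = 50, 0.8
omega = phi ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + np.linalg.cholesky(omega) @ rng.normal(size=n)
print(gls_estimate(X, y, omega))
```

Working with triangular solves against the Cholesky factor performs the whitening transformation without forming \Omega^{-1} or \Omega^{-1/2} explicitly, which is generally preferable numerically.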