Random effects model
In statistics, the random effects model is a framework for analyzing data in which certain factors or effects are treated as random variables drawn from a probability distribution, typically normal, enabling inferences about a broader population of possible levels rather than only the observed ones.[1] This approach contrasts with fixed effects models by assuming that the random effects capture unexplained variability across groups or units, such as treatments sampled from a larger set. It is formally expressed in a basic one-way model as Y_{ij} = \mu + \tau_i + \epsilon_{ij}, where \tau_i \sim N(0, \sigma_\tau^2) represents the random effect for the i-th level and \epsilon_{ij} \sim N(0, \sigma^2) is the residual error, with \tau_i and \epsilon_{ij} independent.[1] The total variance of an observation is then \sigma_\tau^2 + \sigma^2, and a key parameter is the intraclass correlation coefficient \rho = \sigma_\tau^2 / (\sigma_\tau^2 + \sigma^2), which quantifies the proportion of total variance due to the random effects.[1]

Random effects models differ fundamentally from fixed effects models in their scope of inference and assumptions about the factors involved.[1] In fixed effects models, the levels of a factor are considered exhaustive and specifically chosen by the researcher, limiting inferences to those particular levels and leading to hypotheses about equality of means (e.g., H_0: \mu_i = \mu for all i); random effects models treat the levels as a random sample from a larger population, broadening inferences to that population and testing whether the variance of the effects is zero (H_0: \sigma_\tau^2 = 0).[1] This distinction matters for hierarchical or clustered data structures, where observations within groups are correlated: random effects account for this correlation, adjusting standard errors for tests on other effects and preventing underestimation of variability.[2]

Random effects are integral to linear mixed models (LMMs), which combine fixed effects (representing all levels of direct interest for inference, such as specific treatments) with random effects for sampled factors such as blocks, replications, or locations, in order to model additional variability.[2] In LMMs, random effects are specified separately (e.g., via a random statement in software) and adjust the standard errors of fixed-effect tests, making the framework suitable for experimental designs such as split-plot or repeated-measures layouts, where effects repeated over time require explicit covariance structures.[2] Applications extend to fields such as agriculture, for analyzing field trials across varying locations,[2] and econometrics, for panel data in which individual-specific effects are assumed uncorrelated with the regressors.[3] In meta-analysis, the random effects model assumes that true effects vary across studies due to heterogeneity, incorporating between-study variance to yield more conservative estimates than fixed-effect alternatives.[4]
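As an informal illustration of this one-way model, the following minimal sketch (in Python with NumPy, using arbitrary parameter values chosen only for illustration) simulates Y_{ij} = \mu + \tau_i + \epsilon_{ij} and checks that the overall variance is close to \sigma_\tau^2 + \sigma^2, with the intraclass correlation given by \rho = \sigma_\tau^2 / (\sigma_\tau^2 + \sigma^2).

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma_tau, sigma = 70.0, 2.0, 3.0        # hypothetical parameter values
m, n = 200, 10                               # m groups, n observations per group

tau = rng.normal(0.0, sigma_tau, size=m)          # tau_i ~ N(0, sigma_tau^2)
eps = rng.normal(0.0, sigma, size=(m, n))         # eps_ij ~ N(0, sigma^2)
y = mu + tau[:, None] + eps                       # Y_ij = mu + tau_i + eps_ij

print(y.var())                                    # close to sigma_tau^2 + sigma^2 = 13
print(sigma_tau**2 / (sigma_tau**2 + sigma**2))   # true ICC, rho ~ 0.31
```
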
Fundamentals

Definition and Qualitative Description
A random effects model is a statistical framework in which certain model parameters, such as intercepts or slopes, are treated as random variables drawn from a specified probability distribution, enabling the incorporation of unobserved heterogeneity or variation across groups, individuals, or units in the data.[5] This allows the model to account for clustering or correlation among observations that arises from shared unmeasured factors, such as repeated measures on the same subject, rather than assuming all observations are independent.[6]

Qualitatively, random effects models provide an intuitive way to model data in which the levels of a factor are viewed as a random sample from a broader population, rather than as fixed and exhaustive categories of interest. For instance, in studying treatment effects across multiple schools, a random effects model treats school-specific deviations as draws from a distribution, capturing natural variation due to unmeasured school characteristics such as culture or resources; this shared deviation induces correlation among students within the same school. By contrast, a fixed effects treatment estimates a separate parameter for each observed school and restricts inference to those schools, while a model with no school terms at all would ignore this group-level variability. The random effects perspective shifts the focus from estimating a specific effect for each level to estimating the variance of those effects, facilitating generalization beyond the observed sample.[5]

Key assumptions underlying random effects models include that the random effects follow a normal distribution with mean zero and constant variance, reflecting their role as deviations from the population mean without systematic bias. In models that also include fixed effects and covariates, the random effects are assumed to be independent of the covariates, ensuring that unobserved heterogeneity is uncorrelated with observed predictors and supporting unbiased estimation of the fixed effects.[5]

The origins of random effects models trace back to the development of variance components analysis in the 1940s and 1950s, building on R. A. Fisher's foundational work in the 1920s on analysis of variance for agricultural experiments at Rothamsted Experimental Station, where he introduced methods to partition variance into components attributable to different sources. Frank Yates, collaborating with Fisher, extended these ideas in the 1930s through studies on sampling designs and yield estimation, laying the groundwork for handling random variation in experimental data. These ideas evolved into modern mixed-effects models through seminal contributions such as the 1982 framework of Laird and Ware, which formalized random effects for longitudinal data analysis.[7][8]

Comparison with Fixed Effects Models
Random effects models differ from fixed effects models primarily in the scope of inference and the treatment of factors. In fixed effects models, the levels of a factor are considered fixed and of specific interest to the researcher, with inferences limited to those levels; hypotheses typically test the equality of means across levels (e.g., H_0: \mu_i = \mu for all i). Random effects models, by contrast, treat the levels as a random sample from a larger population, allowing inference about the population variance of the effects (e.g., H_0: \sigma_\tau^2 = 0) and enabling generalization beyond the observed levels.[5] This distinction is particularly relevant for hierarchical or clustered data, where random effects account for correlation within groups, adjusting estimates and standard errors accordingly.

In applications such as panel data analysis in econometrics, fixed effects can control for unobserved time-invariant heterogeneity that may be correlated with the covariates, while random effects assume exogeneity (no such correlation) in exchange for efficiency and the ability to estimate effects of time-invariant variables; the Hausman test can help choose between them. In general statistical contexts such as ANOVA or linear mixed models, however, the focus remains on the inferential scope rather than on bias correction.[9][10]
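Where the Hausman test is used to choose between fixed and random effects, the statistic takes a familiar quadratic form in the difference between the two coefficient estimates. The sketch below is a generic NumPy rendering of that formula, not tied to any particular package; the function name and arguments are illustrative, and in practice the difference of covariance matrices may fail to be positive definite, in which case dedicated panel-data software should be used instead.

```python
import numpy as np
from scipy import stats

def hausman_statistic(b_fe, b_re, cov_fe, cov_re):
    """Hausman statistic comparing fixed- and random-effects estimates of the
    coefficients both models identify (the time-varying regressors)."""
    diff = b_fe - b_re
    # Under the null of exogenous random effects, Var(diff) = cov_fe - cov_re.
    stat = float(diff @ np.linalg.solve(cov_fe - cov_re, diff))
    p_value = stats.chi2.sf(stat, df=len(diff))   # chi-squared with k degrees of freedom
    return stat, p_value
```
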
Mathematical Formulation

Basic Model Structure
The random effects model, in its general form as a linear mixed-effects model, is formulated as a hierarchical linear regression that incorporates both fixed and random effects to account for variation across groups or clusters in the data. In its basic form for the j-th observation within the i-th group, the model is y_{ij} = X_{ij} \beta + Z_{ij} b_i + \epsilon_{ij}, where y_{ij} is the response, X_{ij} \beta is the fixed effects contribution with \beta the vector of fixed-effect parameters, Z_{ij} b_i captures the random effects for group i with b_i \sim N(0, G) a random effects vector with mean zero and covariance matrix G, and \epsilon_{ij} is the residual error, the vector of residuals for group i being distributed as \epsilon_i \sim N(0, R_i).[8] This structure handles clustered data by treating the group-specific deviations b_i as random draws from a population distribution, thereby generalizing fixed effects approaches and inducing dependence within groups.[8]

The model has a two-stage hierarchical interpretation. In the first stage, the conditional distribution of the response given the random effects is y_i | b_i \sim N(X_i \beta + Z_i b_i, R_i), modeling the within-group variability around the group-specific mean. The second stage specifies the distribution of the random effects themselves, b_i \sim N(0, G), which introduces between-group variability and ensures that observations within the same group i are correlated through the shared b_i term, with the covariance between y_{ij} and y_{ik} (for j \neq k) arising from \text{Cov}(Z_{ij} b_i, Z_{ik} b_i) = Z_{ij} G Z_{ik}^T.[8] This hierarchical setup supports inference about the fixed effects \beta while borrowing strength across groups via the random effects distribution.[8]

In matrix notation, the full model for the stacked response vector \mathbf{y} across all groups is \mathbf{y} = X \beta + Z \mathbf{b} + \epsilon, where \mathbf{y} is the n \times 1 response vector, X is the n \times p fixed-effects design matrix, Z is the n \times q random-effects design matrix, \mathbf{b} \sim N(0, \Sigma_b) is the stacked q \times 1 random effects vector with covariance \Sigma_b = \text{blockdiag}(G_1, \dots, G_m) for m groups, and \epsilon \sim N(0, \Sigma_\epsilon) with \Sigma_\epsilon = \text{blockdiag}(R_1, \dots, R_m).[8] The resulting marginal covariance of \mathbf{y} is V = Z \Sigma_b Z^T + \Sigma_\epsilon, which combines the random-effect and residual components, giving the marginal distribution \mathbf{y} \sim N(X \beta, V).[8]

Key assumptions include normality of both the random effects \mathbf{b} and the residuals \epsilon, independence between \mathbf{b} and \epsilon, and, under the common conditional independence assumption, homoscedasticity within groups such that R = \sigma^2 I (equal residual variances and no additional autocorrelation beyond that induced by the random effects).[8] These assumptions ensure that the induced within-group correlations are attributable solely to the shared random effects, enabling valid likelihood-based inference.[8]
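The block structure of the marginal covariance V = Z \Sigma_b Z^T + \Sigma_\epsilon can be made concrete with a small numerical sketch. The NumPy example below (with arbitrary sizes and variance values chosen only for illustration) builds Z for a random-intercept model and shows that observations share covariance \sigma_b^2 within a group and are uncorrelated across groups.

```python
import numpy as np

# Hypothetical sizes and variances, chosen only to make the block structure visible.
m, n_i = 3, 4                                  # m groups, n_i observations per group
sigma_b2, sigma_e2 = 2.0, 1.0                  # between- and within-group variances

Z = np.kron(np.eye(m), np.ones((n_i, 1)))      # (m*n_i) x m random-intercept design
Sigma_b = sigma_b2 * np.eye(m)                 # same G for every group
Sigma_e = sigma_e2 * np.eye(m * n_i)           # R = sigma^2 I (conditional independence)

V = Z @ Sigma_b @ Z.T + Sigma_e                # marginal covariance of y

print(V[:n_i, :n_i])   # within one group: sigma_b2 + sigma_e2 on the diagonal, sigma_b2 off it
print(V[0, n_i])       # observations from different groups: covariance 0.0
```
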
Variance Components

In the random effects model, the total variance of the observed response y is decomposed into two primary components: the between-group variance attributable to the random effects, denoted \sigma_b^2, and the within-group residual variance, denoted \sigma_\epsilon^2. This partitioning is expressed as \sigma_y^2 = \sigma_b^2 + \sigma_\epsilon^2, where \sigma_y^2 is the marginal variance of y. The decomposition highlights how unobserved heterogeneity across groups contributes to the overall variability in the data, separately from measurement error or other residual variation.

A key metric derived from this decomposition is the intraclass correlation coefficient (ICC), defined as \rho = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_\epsilon^2}. The ICC quantifies the proportion of total variance explained by the random effects or grouping structure, ranging from 0 (no clustering) to 1 (complete clustering). Values of \rho close to 0 suggest that observations within groups are nearly independent, while higher values indicate stronger dependence due to the shared random effects.

Conceptually, variance components are estimated by partitioning the total sum of squares into between-group and within-group portions, as in analysis of variance (ANOVA), where the expected mean squares identify the components under normality assumptions. This provides a foundation for interpreting heterogeneity, though detailed estimation techniques are addressed elsewhere. A large \sigma_b^2 relative to \sigma_\epsilon^2 signals substantial unobserved heterogeneity among groups, justifying the inclusion of random effects to account for clustering. In balanced designs with equal group sizes, the components are readily identifiable from the ANOVA table; unbalanced designs introduce complications, as varying group sizes affect the orthogonality of the sums of squares and can make it harder to separate the variance sources.
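For a balanced one-way layout, the ANOVA partition described above yields simple method-of-moments estimators: \hat{\sigma}_\epsilon^2 = MSW and \hat{\sigma}_b^2 = (MSB - MSW)/n, where n is the common group size. The following sketch (a hypothetical helper function, not drawn from any particular library) computes these quantities and the implied ICC.

```python
import numpy as np

def anova_variance_components(y):
    """Method-of-moments (ANOVA) estimates for a balanced one-way random effects model.

    y is an (m groups) x (n per group) array of responses."""
    y = np.asarray(y, dtype=float)
    m, n = y.shape
    group_means = y.mean(axis=1)
    grand_mean = y.mean()
    msb = n * np.sum((group_means - grand_mean) ** 2) / (m - 1)    # between-group mean square
    msw = np.sum((y - group_means[:, None]) ** 2) / (m * (n - 1))  # within-group mean square
    sigma_e2 = msw
    sigma_b2 = max((msb - msw) / n, 0.0)   # truncated at zero if the moment estimate is negative
    icc = sigma_b2 / (sigma_b2 + sigma_e2)
    return sigma_b2, sigma_e2, icc
```
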
Estimation and Inference

Maximum Likelihood Methods
In random effects models, maximum likelihood estimation (MLE) maximizes the likelihood with respect to the fixed effects parameters \beta and the variance-covariance parameters \Sigma, treating the random effects as nuisance quantities to be integrated out. The marginal likelihood is constructed by integrating over the random effects b, yielding L(\beta, \Sigma | y) = \int L(y | \beta, b, \Sigma) f(b | \Sigma) \, db, where L(y | \beta, b, \Sigma) is the conditional likelihood of the observed data y given \beta, b, and \Sigma, and f(b | \Sigma) is the density of the random effects.[11][12] Under normality assumptions for both the random effects and the residuals, this integration yields a multivariate normal marginal distribution for the data, y \sim N(X\beta, V), where V = Z G Z^T + R combines the variance components from the random-effects design matrix Z, covariance matrix G, and residual covariance R.[11]

The log-likelihood is then \ell(\beta, \Sigma) = -\frac{1}{2} \left[ n \log(2\pi) + \log|V| + (y - X\beta)^T V^{-1} (y - X\beta) \right], where n is the sample size. Evaluating it requires the determinant and inverse of V, which poses numerical challenges for large datasets because of the high dimensionality and potential ill-conditioning of V.[11][13]

Optimization proceeds iteratively, often using the expectation-maximization (EM) algorithm to handle the unobserved random effects. The EM algorithm alternates between an expectation step, which computes expected values of the random effects given the current parameter estimates, and a maximization step, which updates \beta via the generalized least squares estimator \hat{\beta} = (X^T V^{-1} X)^{-1} X^T V^{-1} y and updates the variance components by maximizing the resulting profile likelihood.[14][12]

Under correct model specification and suitable regularity conditions, such as a growing sample size with a fixed dimension of random effects, the MLEs are consistent and asymptotically normal, with \sqrt{n} (\hat{\beta} - \beta) \to N(0, (X^T V^{-1} X / n)^{-1}) and analogous results for \Sigma. In small samples, however, the MLE tends to underestimate the variance components because it does not account for the degrees of freedom used in estimating the fixed effects.[11]
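Under the assumptions above, the marginal log-likelihood and the generalized least squares update for \beta can be written directly from these formulas. The sketch below evaluates both for given variance components; the function names are illustrative, and a practical implementation would exploit the block structure of V rather than treating it as a dense matrix.

```python
import numpy as np

def marginal_loglik(beta, y, X, Z, G, R):
    """Log-likelihood of y ~ N(X beta, V) with V = Z G Z' + R."""
    V = Z @ G @ Z.T + R
    resid = y - X @ beta
    _, logdet = np.linalg.slogdet(V)
    return -0.5 * (len(y) * np.log(2 * np.pi) + logdet
                   + resid @ np.linalg.solve(V, resid))

def gls_beta(y, X, Z, G, R):
    """Generalized least squares update for the fixed effects, given the variance components."""
    V = Z @ G @ Z.T + R
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)
```
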
Restricted Maximum Likelihood (REML)

Restricted maximum likelihood (REML) is an estimation method for variance components in random effects models that maximizes the likelihood of a set of linear contrasts of the data chosen to be orthogonal to the fixed effects. The approach focuses on the residual variation remaining after the fixed effects are accounted for, yielding variance parameter estimates that are not distorted by the estimation of the fixed effects. Introduced by Patterson and Thompson in 1971, REML is particularly valuable in balanced designs, where it reproduces the analysis-of-variance estimators of the variance components.

The REML likelihood can be written, up to a constant, in terms of the profiled maximum likelihood as

L_{\text{REML}} \propto L_{\text{ML}} \times |X^T V^{-1} X|^{-1/2},

where X is the n \times p design matrix for the fixed effects (p being the number of fixed-effect parameters) and V is the variance-covariance matrix, which depends on the variance components. The determinant term penalizes the likelihood for the degrees of freedom lost in estimating the fixed effects \beta, so that variance estimates such as \sigma^2 remain unbiased even in small samples.[15]

Compared to MLE, REML reduces the downward bias in variance component estimates; for instance, the expected value of the MLE of \sigma^2 in a linear model with p fixed effects is (n - p)/n \cdot \sigma^2, whereas REML matches the unbiased sample variance in simple cases. This bias correction makes REML the preferred method for inference on variance components, as it provides more reliable standard errors and confidence intervals for the random-effect variances.[16]

The estimation procedure for REML mirrors that of MLE, using iterative optimization techniques such as expectation-maximization (EM) or Newton-Raphson to maximize the adjusted log-likelihood, often via transformed data contrasts w = A^T y with A chosen so that A^T X = 0. In practice, software implements this by profiling out the fixed effects and optimizing over the variance parameters alone. In large samples, REML estimators are asymptotically equivalent to MLE, converging to the same values as the number of observations increases.[15]

Despite its benefits, REML has limitations: because its likelihood depends on the fixed-effects specification, REML criteria cannot be used to compare models that differ in their fixed effects (for example, in likelihood-ratio tests). It can also be computationally more demanding than MLE in unbalanced designs, owing to the need to handle the matrix X^T V^{-1} X explicitly during the iterations.[17]
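The bias correction that motivates REML is easiest to see in the ordinary linear model, where the ML estimator of \sigma^2 divides the residual sum of squares by n while the REML estimator divides by n - p. The small simulation sketch below (with arbitrary parameter values) illustrates the difference; the REML average should sit near the true \sigma^2, while the ML average is shrunk by the factor (n - p)/n.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma2 = 20, 3, 4.0                      # hypothetical sample size, fixed effects, variance
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 0.5, -0.3])

ml_est, reml_est = [], []
for _ in range(5000):
    y = X @ beta + rng.normal(0.0, np.sqrt(sigma2), size=n)
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    rss = resid @ resid
    ml_est.append(rss / n)          # ML estimate: divides by n, biased downward
    reml_est.append(rss / (n - p))  # REML estimate: divides by n - p, unbiased

print(np.mean(ml_est))    # near (n - p)/n * sigma2 = 3.4
print(np.mean(reml_est))  # near sigma2 = 4.0
```
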
Examples and Applications
Simple Example
To illustrate the random effects model, consider a simulated dataset consisting of test scores for 3 students in each of 5 classrooms, where scores are clustered by classroom (cf. Pinheiro and Bates, Mixed-Effects Models in S and S-PLUS, 2000). The data are generated to reflect a scenario with an overall average performance level and variation both within and between classrooms, as follows:

| Classroom | Student 1 | Student 2 | Student 3 | Group Mean |
|---|---|---|---|---|
| 1 | 66 | 68 | 70 | 68 |
| 2 | 68 | 70 | 72 | 70 |
| 3 | 67 | 69 | 71 | 69 |
| 4 | 71 | 73 | 75 | 73 |
| 5 | 68 | 70 | 72 | 70 |
| Overall Mean | | | | 70 |
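
One way to fit a random-intercept model to these scores is sketched below in Python; it assumes statsmodels' MixedLM interface, and with such a small, nearly noise-free dataset the optimizer may emit convergence warnings, so the output should be read as illustrative rather than definitive.

```python
import pandas as pd
import statsmodels.formula.api as smf

scores = {1: [66, 68, 70], 2: [68, 70, 72], 3: [67, 69, 71],
          4: [71, 73, 75], 5: [68, 70, 72]}
df = pd.DataFrame([(c, s) for c, vals in scores.items() for s in vals],
                  columns=["classroom", "score"])

# Random-intercept model: score_ij = mu + b_i + eps_ij, with b_i ~ N(0, sigma_b^2)
model = smf.mixedlm("score ~ 1", df, groups=df["classroom"])
result = model.fit(reml=True)

print(result.fe_params)        # estimated overall mean mu
print(result.cov_re)           # estimated between-classroom variance
print(result.random_effects)   # predicted classroom deviations (BLUP-type)
```
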
Fitting such a random-intercept model produces classroom-level predictions (best linear unbiased predictors, or BLUPs) that are shrunk toward the overall mean, in contrast to the fixed-effect estimates, which are simply the per-classroom group means:

| Classroom | Fixed Effect Estimate (Group Mean) | Random Effect BLUP (Predicted Intercept) |
|---|---|---|
| 1 | 68 | 68.8 |
| 2 | 70 | 70.0 |
| 3 | 69 | 69.6 |
| 4 | 73 | 71.9 |
| 5 | 70 | 70.0 |
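
The shrinkage pattern in the table can be reproduced approximately by hand: with ANOVA variance components, each predicted intercept is the grand mean plus the group-mean deviation multiplied by the shrinkage factor \sigma_b^2 / (\sigma_b^2 + \sigma_\epsilon^2 / n). The NumPy sketch below carries out this calculation for the data above; the resulting values are close to, though not exactly identical with, the BLUPs listed in the table, since those depend on the particular variance component estimates used.

```python
import numpy as np

scores = np.array([[66, 68, 70],
                   [68, 70, 72],
                   [67, 69, 71],
                   [71, 73, 75],
                   [68, 70, 72]], dtype=float)
m, n = scores.shape
group_means = scores.mean(axis=1)
grand_mean = scores.mean()

# ANOVA (method-of-moments) variance components for the balanced design.
msb = n * np.sum((group_means - grand_mean) ** 2) / (m - 1)
msw = np.sum((scores - group_means[:, None]) ** 2) / (m * (n - 1))
sigma_b2, sigma_e2 = (msb - msw) / n, msw

# Empirical BLUPs: shrink each classroom mean toward the grand mean.
shrink = sigma_b2 / (sigma_b2 + sigma_e2 / n)
blups = grand_mean + shrink * (group_means - grand_mean)
print(blups)   # roughly [68.8, 70.0, 69.4, 71.9, 70.0]
```
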