Latent growth modeling
Latent growth modeling (LGM), also referred to as latent growth curve modeling (LGCM), is a statistical method embedded within structural equation modeling (SEM) that analyzes longitudinal data to estimate and interpret patterns of individual and average change over time.[1] It represents growth trajectories using latent variables to capture key parameters such as the intercept (initial status or starting level) and slope (rate of change), allowing researchers to model both linear and nonlinear developmental processes while accounting for measurement error and individual differences. This approach treats repeated measures as indicators of underlying latent constructs, enabling the separation of within-person change from between-person variability in a single unified framework.
The conceptual foundations of LGM trace back to early work on curve fitting and factor analysis of change, with Ledyard R. Tucker introducing generalized learning curves in 1958 as a way to parameterize functional relations in longitudinal data using principal components.[2] This idea was formalized and extended by William Meredith and John Tisak in 1990 through latent curve analysis, which applied confirmatory factor analysis to model developmental trajectories with maximum likelihood estimation and asymptotic tests for parameters. Bengt Muthén and Patrick J. Curran further advanced the technique in 1997 by integrating it fully into the SEM paradigm, emphasizing its flexibility for handling missing data, multilevel structures, and experimental designs in behavioral research.
LGM has become a cornerstone in fields such as psychology, education, and public health for investigating developmental phenomena, intervention impacts, and risk factors over time, offering advantages over traditional methods like repeated-measures ANOVA by explicitly modeling heterogeneity in growth and covariates' effects on trajectories. Key extensions include multilevel LGMs for nested data (e.g., students within schools), time-varying covariates to explain fluctuations, and growth mixture modeling to identify subpopulations with distinct trajectories. By providing interpretable estimates of growth parameters and their covariances, LGM facilitates hypothesis testing about the shape, direction, and predictors of change, making it particularly valuable for longitudinal studies with unequally spaced assessments or incomplete data.[1]
Overview
Definition and Purpose
Latent growth modeling (LGM) is a statistical technique embedded within the structural equation modeling (SEM) framework that analyzes longitudinal data by modeling individual trajectories of change over time.[1] It employs latent variables to capture unobserved heterogeneity in growth processes, specifically representing the initial status of a phenomenon (intercept) and its rate of change (slope).[3] This approach treats repeated measures as indicators of these latent factors, allowing researchers to partition observed variability into systematic growth components and random error.[1]
The primary purpose of LGM is to estimate population-average patterns of growth, quantify individual differences in those trajectories, and identify predictors that explain variability in change.[3] By modeling both intra-individual change and inter-individual differences, LGM facilitates the examination of how developmental processes unfold and how they are influenced by covariates, such as demographic factors or interventions.[1] In the basic linear growth model, the observed outcome Y_{ti} for individual i at time t is expressed as:
Y_{ti} = \alpha_i + \beta_i t + \varepsilon_{ti}
where \alpha_i denotes the random intercept (initial status), \beta_i the random slope (rate of change), t the time coding, and \varepsilon_{ti} the residual disturbance.[3]
LGM was developed to overcome key limitations of traditional methods like repeated measures ANOVA, which often treat individual differences as mere error and fail to account for measurement unreliability or latent constructs underlying observed variables.[1] Instead, LGM explicitly models measurement error and latent growth factors, providing a more nuanced understanding of dynamic processes in fields such as psychology, education, and public health.[3]
Relation to Structural Equation Modeling
Latent growth modeling (LGM) represents a specialized application within the broader framework of structural equation modeling (SEM), wherein the core growth parameters—such as the intercept (initial status) and slope (rate of change)—are modeled as latent variables that exert direct influences on observed repeated measures across multiple time points. This positioning as a special case of SEM enables LGM to incorporate the analytical rigor of SEM while prioritizing the temporal sequencing and interdependence inherent in longitudinal data, in contrast to general SEM models that may address cross-sectional or non-temporal relationships without such emphasis.[3][2]
Meredith and Tisak (1990) formalized LGM by framing it as a restricted common factor model embedded in SEM, which imposes specific constraints on factor loadings to reflect developmental trajectories over time.[4] Key shared principles between LGM and SEM include the reliance on confirmatory factor analysis (CFA) to operationalize and validate latent constructs through their observed indicators, ensuring measurement reliability and reducing bias from measurement error.[3] Additionally, both approaches employ path diagrams to articulate model structures, with latent variables symbolized as ovals or circles connected by arrows to rectangular observed variables, facilitating clear visualization of hypothesized causal pathways and covariances.[2]
Distinct adaptations in LGM arise from its emphasis on time as a foundational predictor, where factor loadings are parameterized to encode temporal progression—fixed for parametric forms like linear growth or estimated for flexible shapes—allowing the model to capture individual differences in change patterns.[3] In path diagrams for LGM, this manifests as structured arrows from the latent intercept (with loadings of 1 at all time points) and slope (with loadings of 0, 1, 2, etc., for equally spaced linear occasions) to each observed indicator, often accompanied by curved double-headed arrows denoting residual variances and potential covariances between growth factors.[2] A further unique feature is the capacity to include directional paths from latent slope factors to distal outcomes, enabling examination of how rates of change prospectively influence subsequent variables, such as in models predicting later achievement from academic growth trajectories.[3]
Historical Development
Early Foundations
The foundations of latent growth modeling trace back to mid-20th-century statistical developments in growth curve analysis, which emphasized univariate trajectories and methods to account for individual differences in longitudinal data. In 1958, Ledyard R. Tucker introduced a factor analytic approach to model growth curves, particularly in the context of learning processes, by determining the parameters of functional relations through principal components analysis.[2] This method focused on univariate growth curves, using principal components to identify the number of underlying dimensions needed to represent temporal changes, thereby capturing the shape and variability of individual trajectories without assuming a specific functional form upfront. Tucker's work laid early groundwork for handling unobserved heterogeneity by treating components as latent structures that explain differences in growth patterns across individuals.
Concurrently, C. Radhakrishna Rao developed statistical methods for comparing growth curves in 1958, addressing challenges in both balanced and unbalanced longitudinal designs.[5] Rao's score method enabled the estimation of growth parameters, such as rates and intercepts, by leveraging likelihood-based scores to test hypotheses about differences between groups or individuals, even when observations were missing or irregularly spaced. This approach introduced latent elements through the decomposition of covariance structures, allowing for the modeling of unobserved sources of variation in trajectories, which proved particularly useful for biological and psychological growth data. Both Tucker's and Rao's contributions centered on univariate analyses but pioneered the use of latent structures to represent heterogeneity, predating the formalization of structural equation modeling (SEM) while influencing its later extensions to longitudinal applications.
Key Advancements in the 20th Century
In the late 1980s, significant progress in latent growth modeling (LGM) emerged through its integration with structural equation modeling (SEM), allowing for the analysis of individual growth trajectories using latent variables. McArdle and Epstein (1987) introduced latent growth curves within developmental SEM frameworks, providing a method to model both average and individual-level changes over time by estimating parameters such as intercepts and slopes as latent factors.[6] This approach built on earlier curve-fitting techniques but formalized them within SEM to handle correlated errors and multiple outcomes more robustly.[7]
Bollen's (1989) foundational work on structural equations with latent variables further extended these ideas to panel data, enabling the specification of growth models that account for time-dependent covariances and autoregressive processes in longitudinal datasets.[8]
A key milestone came with Meredith and Tisak (1990), who established LGM—also termed latent curve analysis—as a direct subset of SEM, emphasizing the estimation of growth curves through confirmatory factor analysis where latent intercepts and slopes capture unobserved heterogeneity in trajectories.[4] Their framework introduced unconditional models, which estimate population-level growth without covariates, and conditional models, which incorporate predictors to explain inter-individual differences in change.[9]
Muthén and Curran (1997) further advanced the technique by fully integrating it into the SEM paradigm, highlighting its flexibility for handling missing data, multilevel structures, and experimental designs in behavioral research.[10]
By the 1990s, LGM had gained substantial traction in psychology for modeling developmental changes, such as cognitive or behavioral trajectories across ages, due to its ability to disentangle within- and between-person variance.[1] Adaptations in software like LISREL and EQS during this period made these models more accessible, supporting maximum likelihood estimation for complex longitudinal designs.[11]
Core Concepts
Latent Variables in Growth
In latent growth modeling, the intercept and slope serve as latent variables that encapsulate the initial status and rate of change, respectively, of an unobserved true growth trajectory, thereby isolating these constructs from measurement error inherent in observed repeated measures. The intercept factor, with loadings fixed at 1 across all time points, represents the baseline level of the outcome at the starting point (often time 0), while the slope factor, with loadings set to time-specific values (such as 0, 1, 2 for equally spaced linear assessments), models the systematic progression or decline over time. This framework, rooted in structural equation modeling principles, enables the estimation of these latent growth factors as random variables that vary across individuals, providing a more precise depiction of developmental processes than observed data alone.
The variances of these latent variables quantify inter-individual differences: the intercept variance captures heterogeneity in starting levels, and the slope variance reflects variation in growth rates, both purged of error influences. The covariance between the intercept and slope elucidates the dynamic interplay between initial conditions and change, for example, indicating whether individuals with higher baseline values exhibit steeper or flatter trajectories. These parameters allow researchers to assess both average growth patterns (via factor means) and individual deviations, offering deeper insights into stability and transformation in longitudinal data.
A representative application appears in cognitive development research, where the intercept denotes baseline ability—such as percent correct on verbal subscales like Information at age 5 (estimated around 49%)—and the slope signifies the learning rate across subsequent assessments. In the Louisville Twin Study, slopes for cognitive subscales (e.g., 0.65 percent correct units per interval for Information from ages 7–10, using a piecewise growth model) revealed significant genetic contributions to growth independent of initial levels, with factor loadings fixed to reflect linear progression over assessment intervals. This setup underscores how latent variables facilitate the partitioning of true developmental variance from error, informing etiological models of cognitive maturation.[12]
Time-Invariant vs. Time-Varying Factors
In latent growth modeling (LGM), covariates are incorporated to explain individual differences in growth trajectories, with a key distinction between time-invariant and time-varying factors. Time-invariant covariates are stable characteristics that do not change across measurement occasions for a given individual, such as gender, ethnicity, or baseline socioeconomic status (SES). These are typically regressed onto the latent intercept and slope factors to account for between-person differences in initial levels and rates of change. For instance, gender might predict a higher intercept in cognitive ability trajectories for one group compared to another, thereby explaining group-level variations in growth patterns.[13][14]
The incorporation of time-invariant covariates extends the unconditional LGM to a conditional model, where the growth factors are predicted by these stable predictors. This is represented in the equation for the intercept factor as \alpha_i = \gamma_0 + \gamma_1 X_i + \zeta_i, where \alpha_i is the individual-specific intercept, \gamma_0 is the mean intercept, \gamma_1 is the regression coefficient for the time-invariant covariate X_i, and \zeta_i is the residual. Similarly, the slope factor can be modeled as \beta_i = \delta_0 + \delta_1 X_i + \xi_i, allowing the covariate to influence the rate of change over time. Such modeling reveals how stable factors moderate growth, as seen in studies where baseline SES predicts steeper educational achievement slopes. Centering these covariates at the grand mean or group mean is recommended to facilitate interpretation and reduce multicollinearity.[15][16]
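A minimal lavaan sketch of such a conditional model follows, assuming a wide-format data frame mydata with repeated measures t1–t4 and a time-invariant covariate x (all names illustrative):
r
library(lavaan)

# growth factors: intercept i and linear slope s
model_cond <- ' i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4
                s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4
                # conditional part: regress growth factors on the covariate
                i ~ x
                s ~ x '
fit_cond <- growth(model_cond, data = mydata)
summary(fit_cond)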
In contrast, time-varying covariates fluctuate across measurement waves within individuals, such as stress levels, treatment dosage, or environmental exposures at each time point. These are modeled by including direct paths from the covariate at time t to the outcome or residual at that time, capturing within-person deviations from the overall growth trajectory beyond what the intercept and slope explain. For example, contemporaneous stress might predict higher depressive symptoms at specific waves, adjusting the trajectory locally without altering the global slope. Unlike time-invariant covariates, time-varying ones require careful centering—often at the individual mean—to distinguish within- from between-person effects, preventing biased estimates of growth parameters. This approach is central to conditional LGMs and builds on the core latent growth factors by addressing dynamic influences.[13][14][17]
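Time-varying covariates can be sketched in the same framework by adding wave-specific regression paths; here z1–z4 are hypothetical measurements of the covariate at each wave:
r
# direct paths from each wave's covariate to that wave's outcome
model_tv <- ' i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4
              s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4
              t1 ~ z1
              t2 ~ z2
              t3 ~ z3
              t4 ~ z4 '
fit_tv <- growth(model_tv, data = mydata)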
Model Specification
Linear Growth Model
The linear growth model represents the foundational specification within latent growth modeling, positing that individual trajectories in a longitudinal outcome can be captured by two latent factors: an intercept factor (\alpha_i) denoting the starting level for individual i, and a slope factor (\beta_i) capturing the rate of linear change over time. This approach models observed repeated measures Y_{ti} at time points t = 1, \dots, T for individual i as a function of these factors plus a residual term, expressed in the measurement model as:
Y_{ti} = 1 \cdot \alpha_i + \lambda_t \cdot \beta_i + \varepsilon_{ti},
where \lambda_t are fixed time scores (loadings) that define the linear trajectory, typically coded as \lambda_t = 0, 1, 2, \dots, T-1 to set the intercept at the first occasion and scale the slope in units of change per time interval.[18] The factor loading matrix \Lambda thus has a first column of all 1s for the intercept and the second column as the \lambda_t vector for the slope, ensuring model identification through these constraints.[3]
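For example, with four equally spaced occasions coded \lambda_t = 0, 1, 2, 3, the loading matrix is
\Lambda = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix},
where the first column defines the intercept factor and the second the slope factor.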
The latent factors follow a structural model where \alpha_i \sim N(\mu_\alpha, \sigma_\alpha^2) and \beta_i \sim N(\mu_\beta, \sigma_\beta^2), with a possible covariance \sigma_{\alpha\beta} between them; residuals \varepsilon_{ti} are assumed N(0, \theta) with diagonal variance-covariance matrix \Theta.[18] Interpretation centers on these parameters: the mean intercept \mu_\alpha reflects the average initial status across individuals at the time origin (e.g., the first measurement occasion), while the mean slope \mu_\beta indicates the average rate of linear change per unit time.[3] The variance \sigma_\alpha^2 quantifies heterogeneity in starting levels, \sigma_\beta^2 captures individual differences in growth rates, and \sigma_{\alpha\beta} describes the extent to which initial status and change are coupled (e.g., positive if higher starters tend to grow faster).[18][3]
Key assumptions underpin the model's validity. It presumes linear change in the outcome over the studied period, such that deviations from linearity would require alternative specifications.[3] Additionally, the repeated measures are assumed to follow a multivariate normal distribution, facilitating maximum likelihood estimation of parameters. Residuals \varepsilon_{ti} are further assumed uncorrelated across time (no autocorrelation) and often homoscedastic (constant variance), though violations can be addressed in extensions.[3] At least three time points are required for estimation, with more enhancing precision.[18]
Nonlinear and Polynomial Extensions
While linear latent growth models assume constant rates of change, polynomial extensions incorporate higher-order terms to represent curvilinear patterns, such as acceleration or deceleration in developmental trajectories. A common form is the quadratic model, which adds a second latent factor for the quadratic slope alongside the intercept and linear slope factors. The loadings for the quadratic factor are typically specified as squared time values, for example, 0, 1, and 4 for time points t=0, 1, and 2, allowing the model to capture U-shaped or inverted U-shaped growth. The structural equation for the observed outcome Y_{ti} at time t for individual i is given by:
Y_{ti} = \alpha_i + \beta_{1i} t + \beta_{2i} t^2 + \varepsilon_{ti},
where \alpha_i is the intercept, \beta_{1i} the linear slope, \beta_{2i} the quadratic slope, and \varepsilon_{ti} the residual. This extension is particularly useful for processes where growth slows over time, such as cognitive development in children, and requires at least four time points for model identification to estimate the additional parameter reliably.[1]
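In lavaan syntax, the quadratic model adds a third latent factor with squared time loadings; a minimal sketch with four illustrative occasions t1–t4:
r
model_quad <- ' i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4
                s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4
                q =~ 0*t1 + 1*t2 + 4*t3 + 9*t4 '
fit_quad <- growth(model_quad, data = mydata)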
Nonlinear extensions further enhance flexibility by relaxing parametric assumptions, enabling the modeling of complex, non-polynomial shapes like asymptotic or irregular trajectories that may occur in biological or psychological development. One approach uses freely estimated factor loadings on the slope factor, with the first loading fixed at 0 and the last at 1, while intermediate loadings are estimated from the data to accommodate unspecified nonlinear forms without predefined functional shapes. This latent basis model, building on the linear framework, allows data-driven discovery of growth patterns but demands more time points—typically five or more—for adequate identification and stable estimates due to the increased number of free parameters.
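A latent basis specification can be sketched by fixing the first and last slope loadings and freeing the rest (NA in lavaan syntax), here with five hypothetical occasions t1–t5:
r
model_basis <- ' i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4 + 1*t5
                 s =~ 0*t1 + NA*t2 + NA*t3 + NA*t4 + 1*t5 '
fit_basis <- growth(model_basis, data = mydata)
# the estimated loadings give the proportion of total change reached at each wave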
Parametric nonlinear models specify predefined functional forms, such as logistic, exponential, or Gompertz curves, to fit sigmoid or asymptotic growth common in developmental plateaus where rapid initial change levels off toward a maximum. The Gompertz model, originally from population dynamics, is adapted in latent growth modeling for asymmetric growth approaching a plateau, with the equation:
Y_{ti} = \alpha_i + \beta_{1i} \exp\left( -\exp\left( -\gamma (t_i - \mu_i) \right) \right) + \varepsilon_{ti},
where \alpha_i is the lower asymptote (approximating initial status), \beta_{1i} the range to the upper asymptote, \gamma (fixed) controls the growth rate, \mu_i (random) the inflection point (with about 37% of growth completed by then), and the double exponential terms produce an asymmetric S-shaped curve that asymptotes, ideal for modeling skill acquisition that plateaus after early gains.[19] Similarly, the logistic model enforces symmetry around the inflection point via:
Y_{ti} = \alpha_i + \beta_{1i} \frac{1}{1 + \exp(-\gamma (t_i - \mu_i))} + \varepsilon_{ti},
with \gamma (fixed) as the growth rate and \mu_i (random) the midpoint, applied to balanced acceleration-deceleration patterns in longitudinal studies.[19] These forms are estimated using nonlinear structural equation modeling procedures and are especially valuable for capturing real-world developmental processes that deviate from straight lines, though they require careful specification to ensure convergence.
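The contrast between the two shapes can be seen by plotting the implied mean trajectories; the parameter values in this sketch are purely illustrative:
r
# mean curves implied by the Gompertz and logistic forms
alpha <- 0; beta1 <- 10; gamma <- 1; mu <- 5
gompertz <- function(t) alpha + beta1 * exp(-exp(-gamma * (t - mu)))
logistic <- function(t) alpha + beta1 / (1 + exp(-gamma * (t - mu)))
curve(gompertz, from = 0, to = 10, xlab = "Time", ylab = "Y")
curve(logistic, from = 0, to = 10, add = TRUE, lty = 2)
legend("topleft", legend = c("Gompertz (asymmetric)", "Logistic (symmetric)"), lty = 1:2)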
Estimation Procedures
Maximum Likelihood Estimation
Maximum likelihood (ML) estimation serves as the primary method for obtaining parameter estimates in latent growth models (LGMs), treating the model as a special case of structural equation modeling (SEM).[20] In this approach, parameters such as growth factors (e.g., intercepts and slopes) are estimated by maximizing the likelihood function, which assumes multivariate normality of the observed repeated measures.[21] Full information maximum likelihood (FIML) is commonly employed, utilizing the complete sample information to derive estimates that minimize the discrepancy between the observed covariance matrix S and the model-implied covariance matrix \Sigma(\theta), where \theta represents the vector of free parameters.[22]
Model fit in ML-estimated LGMs is evaluated using several standard indices derived from SEM traditions. The chi-square test (\chi^2) assesses absolute fit by comparing the observed and implied covariance structures, with non-significant values (adjusted for degrees of freedom) indicating good fit, though it is sensitive to sample size.[23] Incremental fit is gauged by indices like the Comparative Fit Index (CFI), which compares the target model to a baseline null model and favors values above 0.95, and the Root Mean Square Error of Approximation (RMSEA), which accounts for model parsimony and prefers values below 0.06.[24] These indices provide a multifaceted evaluation, allowing researchers to assess both overall and comparative model adequacy.[25]
For comparing nested LGMs—such as a linear model against a more complex nonlinear extension—ML facilitates likelihood ratio tests (LRTs), where the difference in chi-square values follows a chi-square distribution with degrees of freedom equal to the difference in the number of parameters.[26] Non-significant LRT results support retaining the more restrictive model, indicating that the simpler structure suffices, while significant results favor the less restrictive model, aiding in model selection without underfitting or overfitting.[27]
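With lavaan, for example, both the fit indices and the nested-model test are available directly, assuming fit_lin and fit_quad are previously fitted linear and quadratic growth models:
r
fitMeasures(fit_lin, c("chisq", "df", "pvalue", "cfi", "rmsea"))
anova(fit_lin, fit_quad)  # chi-square difference (likelihood ratio) test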
Inference in ML-estimated LGMs relies on asymptotic standard errors, which approximate the variability of parameter estimates under large-sample conditions and enable hypothesis testing via z-statistics or confidence intervals.[28] This framework also accommodates unequal time spacing in longitudinal data by specifying time loadings on growth factors to reflect actual measurement occasions, ensuring the model captures the intended temporal structure without assuming equidistance.[22]
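For instance, assessments taken at months 0, 1, 3, and 7 can be encoded directly in the slope loadings; a sketch with illustrative variable names:
r
model_uneq <- ' i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4
                s =~ 0*t1 + 1*t2 + 3*t3 + 7*t4 '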
Handling Missing Data and Non-Normality
In latent growth modeling (LGM), handling missing data is crucial due to the longitudinal nature of the data, where attrition or intermittent nonresponse is common. Full information maximum likelihood (FIML) estimation addresses this by utilizing all available observations across individuals without resorting to listwise deletion, which can reduce statistical power and introduce bias.[29] This approach maximizes the likelihood based on the observed data patterns, assuming the data are missing at random (MAR), meaning the probability of missingness depends only on observed variables and not on unobserved ones.[30] Under MAR, FIML provides unbiased and efficient parameter estimates in LGM, outperforming traditional methods like mean imputation or pairwise deletion, as demonstrated in simulation studies with structural equation models.[31]
Non-normality in growth data, often arising from skewed outcomes or heteroscedastic errors, can inflate Type I error rates and distort standard errors in maximum likelihood estimation. Robust maximum likelihood (MLR), incorporating the Satorra-Bentler correction, adjusts the chi-square test statistic and standard errors to account for multivariate non-normality, yielding a scaled statistic that better approximates the chi-square distribution under violations of normality.[32] This correction, originally developed for structural equation modeling, applies directly to LGM by rescaling the asymptotic covariance matrix, improving model fit evaluation and inference in non-normal settings.[33] Additionally, bootstrapping techniques generate confidence intervals for growth parameters by resampling the data with replacement, providing robust coverage without relying on normality assumptions; the Bollen-Stine bootstrap, in particular, evaluates overall model fit by imposing the fitted model on bootstrap samples.[34]
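In lavaan these options are exposed as estimator and missing-data arguments; a sketch assuming the linear model and data frame from the earlier examples:
r
# robust (MLR) estimation with FIML handling of missing values
fit_mlr <- growth(model, data = mydata, estimator = "MLR", missing = "fiml")

# bootstrap standard errors and percentile confidence intervals (ML)
fit_bs <- growth(model, data = mydata, se = "bootstrap", bootstrap = 1000)
parameterEstimates(fit_bs, boot.ci.type = "perc")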
Post-2000 advancements have enhanced the robustness of LGM estimation to combined missing data and non-normality challenges. For instance, Muthén and Muthén (2002) utilized Monte Carlo simulations in Mplus to evaluate parameter recovery, standard errors, and confidence interval coverage in latent variable models with non-normal missing data, showing that robust methods like MLR maintain accuracy even under MAR violations and skewness.[35] These developments have made LGM more applicable to real-world longitudinal datasets, such as those in psychology and education, where data irregularities are prevalent.[36]
Bayesian Estimation
Bayesian estimation has emerged as a prominent alternative to ML for LGMs, particularly since the 2010s, offering flexibility for complex models and small samples. It treats parameters as random variables with prior distributions updated by the likelihood to yield posterior distributions, typically estimated via Markov chain Monte Carlo (MCMC) methods. This approach incorporates prior knowledge, provides exact finite-sample inference without asymptotic approximations, and naturally handles non-normality, missing data under MAR or MNAR, and hierarchical structures through informative priors. Simulation studies show that Bayesian methods often yield less biased estimates and better interval coverage than ML for distal outcomes or small samples (N < 200), though they require careful prior specification to avoid undue influence. Bayesian LGMs are implemented in software like Mplus, R (e.g., the brms package), and Stan, and are increasingly used for growth mixture models and time-varying effects. As of 2025, extensions include multiple-group Bayesian comparisons for testing trajectory differences across populations.[37][38]
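As one illustration, a linear LGM with homoscedastic residuals is equivalent to a random-intercept, random-slope multilevel model, which can be fitted in a Bayesian framework with brms; the long-format data frame long_data with columns y, time, and id, and the weakly informative slope prior, are assumptions of this sketch:
r
library(brms)

# random intercepts and slopes over time play the role of the latent
# growth factors; the residual variance is assumed constant across waves
fit_bayes <- brm(y ~ time + (time | id), data = long_data,
                 prior = prior(normal(0, 1), class = "b", coef = "time"),
                 chains = 4, iter = 2000)
summary(fit_bayes)  # posterior means, SDs, and credible intervals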
Applications and Examples
In Longitudinal Studies
Latent growth modeling (LGM) is extensively used in longitudinal studies within developmental psychology to examine trajectories of change in psychological and behavioral outcomes over time, particularly in social science contexts like education and mental health. Since the 1990s, following its formal introduction as a method for analyzing developmental processes, LGM has become a common tool for investigating phenomena such as achievement gaps, allowing researchers to model how disparities in academic performance evolve across school years.[4] For example, in studies of elementary students, LGM has revealed that differences in growth rates often narrow reading achievement gaps between low- and high-performing groups over time, whereas mathematics trajectories tend to preserve initial disparities, highlighting the need for targeted interventions.[39]
A key application involves modeling depression trajectories in adolescents using multi-wave panel data collected over several years. In such analyses, LGM captures the initial levels (intercepts) and rates of change (slopes) in depressive symptoms, often incorporating time-invariant or time-varying predictors to explain individual differences.[40] For instance, family support has been shown to predict the slope of depressive symptom trajectories, with higher levels of perceived support associated with slower increases or even declines in symptoms during early to mid-adolescence.[40]
The analytical process typically begins with fitting an unconditional LGM to establish the average growth pattern and variability in intercepts and slopes across the sample, providing a baseline for understanding heterogeneity in developmental paths.[21] This is followed by conditional models that introduce covariates, such as family support, to test their effects on growth parameters and interpret how these factors contribute to differences in trajectories.[21] Through this sequential approach, researchers can identify protective influences that mitigate adverse developmental trends, informing preventive strategies in social science research.[41]
In Multidisciplinary Contexts
In economics, LGM facilitates the examination of income trajectories across career spans, enabling the modeling of heterogeneous paths such as steady ascents, plateaus, or declines influenced by labor market dynamics. By specifying latent intercepts for starting salaries and slopes for wage growth, economists can incorporate time-invariant covariates like education level to predict divergence in earning profiles over decades-long panels. A key application involves panel data from national surveys, where LGM identifies subclasses of workers experiencing rapid early-career gains versus stagnation, informing policy on income inequality.[42]
Adaptations of LGM in multidisciplinary settings often involve integrating domain-specific covariates to enhance explanatory power, such as environmental exposures in health studies that moderate growth trajectories. In toxicological research, for example, air pollution levels or chemical stressors are included as time-varying predictors in LGM frameworks to assess their impact on neurodevelopmental outcomes, with latent slopes capturing cumulative effects on cognitive function over childhood. This approach reveals how initial exposure doses interact with genetic factors to alter intercept and slope variances, providing insights into preventive interventions.[43]
Post-2020, LGM has seen increased application in analyzing COVID-19 impacts on mental health, addressing discontinuities like pandemic-induced data collection disruptions through piecewise or event-based extensions. Rioux et al. (2021) outline solutions for modeling abrupt shifts in psychological trajectories, such as heightened anxiety slopes during lockdowns, by incorporating "knot points" for policy changes or infection waves in longitudinal mental health surveys.[44] These methods have been pivotal in health economics and epidemiology, quantifying recovery rates in well-being indicators across diverse populations affected by the crisis.
Software and Implementation
Open-Source Packages
OpenMx is a comprehensive open-source R package designed for structural equation modeling (SEM), including full support for latent growth modeling (LGM) through its matrix-based or path specification approaches.[45] It offers advanced capabilities such as nonlinear growth models, simulation studies, and handling of complex constraints, making it suitable for intricate longitudinal analyses.[46] Since its development in the 2000s, OpenMx has been preferred for complex LGM applications due to its flexibility and computational efficiency, with ongoing active development as of 2025.[47] For instance, a linear LGM can be specified using the RAM framework with matrices for asymmetric relationships (A), symmetric covariances (S), filter (F), and means (M), as shown in the following example code for a model with five time points:
r
require(OpenMx)

# raw longitudinal data with five repeated measures
dataRaw <- mxData(observed=myLongitudinalData, type="raw")

# A: fixed paths from the intercept (loadings of 1) and the slope
# (loadings 0-4) to the five observed variables
matrA <- mxMatrix(type="Full", nrow=7, ncol=7, free=FALSE,
                  values=c(0,0,0,0,0,1,0, 0,0,0,0,0,1,1, 0,0,0,0,0,1,2, 0,0,0,0,0,1,3, 0,0,0,0,0,1,4, 0,0,0,0,0,0,0, 0,0,0,0,0,0,0),
                  byrow=TRUE, name="A")

# S: residual variances (equated across occasions via the shared label
# "residual") plus variances of and covariance between the growth factors
matrS <- mxMatrix(type="Symm", nrow=7, ncol=7,
                  free=c(T,F,F,F,F,F,F, F,T,F,F,F,F,F, F,F,T,F,F,F,F, F,F,F,T,F,F,F, F,F,F,F,T,F,F, F,F,F,F,F,T,T, F,F,F,F,F,T,T),
                  values=c(0,0,0,0,0,0,0, 0,0,0,0,0,0,0, 0,0,0,0,0,0,0, 0,0,0,0,0,0,0, 0,0,0,0,0,0,0, 0,0,0,0,0,1,.5, 0,0,0,0,0,.5,1),
                  labels=c("residual",NA,NA,NA,NA,NA,NA, NA,"residual",NA,NA,NA,NA,NA, NA,NA,"residual",NA,NA,NA,NA, NA,NA,NA,"residual",NA,NA,NA, NA,NA,NA,NA,"residual",NA,NA, NA,NA,NA,NA,NA,"vari","cov", NA,NA,NA,NA,NA,"cov","vars"),
                  byrow=TRUE, name="S")

# F: filter matrix selecting the five observed variables from the full
# set of observed plus latent variables
matrF <- mxMatrix(type="Full", nrow=5, ncol=7, free=FALSE,
                  values=c(1,0,0,0,0,0,0, 0,1,0,0,0,0,0, 0,0,1,0,0,0,0, 0,0,0,1,0,0,0, 0,0,0,0,1,0,0),
                  byrow=TRUE, name="F")

# M: means, fixed at 0 for the observed variables and freely estimated
# for the intercept and slope factors
matrM <- mxMatrix(type="Full", nrow=1, ncol=7, free=c(F,F,F,F,F,T,T),
                  values=c(0,0,0,0,0,1,1),
                  labels=c(NA,NA,NA,NA,NA,"meani","means"), name="M")

exp <- mxExpectationRAM("A","S","F","M",
                        dimnames=c(names(myLongitudinalData),"intercept","slope"))
funML <- mxFitFunctionML()
growthCurveModel <- mxModel("Linear Growth Curve Model Matrix Specification",
                            dataRaw, matrA, matrS, matrF, matrM, exp, funML)
growthCurveFit <- mxRun(growthCurveModel)
summary(growthCurveFit)
This setup estimates intercept and slope factors with linear loadings (0, 1, 2, 3, 4), supporting maximum likelihood estimation for raw data.[46]
lavaan is another prominent open-source R package for SEM, particularly user-friendly for beginners implementing LGM due to its intuitive syntax and dedicated functions like growth().[48] It integrates seamlessly with the tidyverse ecosystem through packages such as tidySEM and broom, enabling tidy data manipulation, model tidying, and visualization workflows.[49] Additionally, lavaan supports multilevel extensions for clustered longitudinal data, allowing hierarchical LGM specifications.[50] A simple linear LGM example with four time points can be fitted as follows:
r
library(lavaan)

# intercept factor i (loadings of 1) and linear slope factor s
model <- ' i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4
           s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4 '
fit <- growth(model, data=Demo.growth)
Here, i represents the intercept factor with fixed loadings of 1, and s the slope with linear loadings; the function automatically handles mean structures and estimation.[48]
Commercial Packages
Several commercial software packages facilitate the implementation of latent growth modeling (LGM), offering proprietary features such as intuitive interfaces, robust estimation options, and integration with broader statistical environments to support researchers in applied settings.[51]
Mplus, developed by Muthén & Muthén, is a leading commercial tool specialized for LGM and advanced structural equation modeling (SEM). It supports a variety of growth models, including linear, nonlinear, and latent basis variants, with flexible syntax for model specification. Key features include Bayesian estimation via Markov chain Monte Carlo methods, which is particularly useful for complex models with non-normal data or small samples, and robust handling of missing data through full information maximum likelihood under missing at random assumptions or Bayesian approaches for non-random missingness.[52][53][54] Mplus has been widely adopted in applied research on LGM since the early 2000s, owing to its comprehensive modeling capabilities and extensive output diagnostics.[55]
IBM SPSS AMOS provides a graphical user interface for SEM, enabling users to construct LGM path diagrams visually without extensive coding, which enhances accessibility for linear growth models. It supports maximum likelihood estimation and bootstrapping for inference, with seamless integration into the SPSS ecosystem for data preparation and visualization. However, AMOS is more limited in handling nonlinear growth trajectories compared to syntax-driven packages, often requiring workarounds for advanced specifications.[56][57]
Advantages and Limitations
Strengths Over Traditional Methods
Latent growth modeling (LGM) provides distinct advantages over traditional statistical approaches like repeated measures ANOVA by explicitly accounting for measurement error through latent variables representing initial status and growth rates. This separation of true growth parameters from random error in observed indicators yields more precise estimates of individual trajectories and reduces bias in parameter interpretation.[58] Unlike ANOVA, which aggregates individual differences into residual error, LGM models inter-individual variability in intercepts and slopes as substantive latent factors, enabling researchers to examine heterogeneity in developmental processes directly.[1]
LGM's flexibility extends to incorporating predictors of growth, including time-invariant covariates like demographics and time-varying factors like environmental influences, even in designs where predictors are correlated (non-orthogonal). This allows for comprehensive tests of how external variables shape trajectories without the restrictive assumptions of orthogonality often required in traditional methods.[13] In comparison to repeated measures ANOVA, LGM handles unequal variances and covariances across time points by permitting residual variances to differ freely, accommodating the heteroscedasticity common in longitudinal data and improving model fit.[13]
For analyzing simple, homogeneous growth trajectories, LGM is more parsimonious than growth mixture models, which estimate separate parameters for multiple latent classes to capture unobserved heterogeneity and thus involve greater complexity and computational demands.[59] Simulations demonstrate LGM's superior statistical power for detecting intervention effects or changes over time, particularly in smaller samples; for example, it achieves 80% power with a sample size of 525 for a moderate effect size (0.20), compared to 725 required by ANCOVA.[13] LGM can also be extended to multilevel frameworks for handling nested data structures, further broadening its utility beyond basic longitudinal designs.[13]
Common Challenges and Criticisms
One major challenge in applying latent growth modeling (LGM) is the requirement for large sample sizes to achieve stable parameter estimates and adequate statistical power. Simulation studies indicate that sample sizes of at least 200 are typically needed for reliable detection of growth trajectories, particularly in mediation analyses or models with multiple time points, as smaller samples (e.g., N < 100) lead to biased estimates, poor model fit, and reduced power.[60]
LGM is also sensitive to the choice of time coding, which can alter the interpretation of growth parameters such as intercepts and slopes without changing the underlying data structure. For instance, recoding time from chronological age to deviations around a mean can shift the intercept from representing initial status to an average value across occasions, affecting covariances among growth factors and the precision of predictor effects. This sensitivity may lead to misinterpretations if not carefully considered, with implications for multicollinearity and statistical power varying across different codings.[61]
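A sketch of two codings with identical fit illustrates the issue (variable names illustrative):
r
# intercept defined at the first occasion
model_first    <- ' i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4
                    s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4 '
# intercept defined at the midpoint of the study via centered loadings
model_centered <- ' i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4
                    s =~ (-1.5)*t1 + (-0.5)*t2 + 0.5*t3 + 1.5*t4 '
# both specifications fit identically, but the intercept mean and variance
# and the intercept-slope covariance take on different meanings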
Nonlinear LGM specifications present additional estimation difficulties, including frequent non-convergence due to the need for precise starting values and the complexity of multiplicative random effects. In practice, these models often require reduced quadrature points or iterative fitting from simpler linear versions to achieve convergence, and even large samples (N > 20,000) may yield non-positive definite covariance matrices or unreliable likelihood ratio tests in mixture extensions.[62][63]
A key criticism of LGM is its assumption of continuous latent growth trajectories within a single homogeneous population, which may overlook discrete subpopulations exhibiting qualitatively distinct patterns better captured by mixture models. Standard LGM fails to account for unobserved heterogeneity, such as varying onset timings in developmental outcomes, potentially oversimplifying complex data and leading to biased average growth estimates.[59]
Furthermore, traditional LGM has been critiqued as ill-suited, without extensions, to the big data era, as its reliance on structural equation modeling limits scalability to high-dimensional or unstructured datasets where neural network-based latent variable models offer greater flexibility.[64]
As of 2025, LGM shows limited integration with machine learning techniques, hindering its application to dynamic, high-volume longitudinal data; however, emerging hybrid approaches combining LGM with partial least squares structural equation modeling or deep latent variable frameworks have been proposed to enhance predictive analytics in panel surveys.[65][66]