Structural equation modeling
Structural equation modeling (SEM) is a comprehensive statistical framework that integrates factor analysis and multiple regression to test and estimate causal relationships among observed and latent variables, allowing researchers to model complex theoretical structures in a single analysis.[1] Developed as an extension of path analysis, SEM accounts for measurement error and enables the specification of both measurement models, which link latent constructs to their indicators, and structural models, which depict relationships among the constructs themselves.[2]

The roots of SEM trace back to Sewall Wright's path analysis in the early 20th century, which provided a method for decomposing correlations into direct and indirect effects among variables.[3] The modern formulation emerged in the 1970s through the work of Karl G. Jöreskog, whose 1973 paper introduced a general maximum likelihood estimation method for linear structural equation systems, unifying confirmatory factor analysis and path modeling.[4] In 1974, Jöreskog and Dag Sörbom released the LISREL software, which made SEM accessible and drove its widespread adoption in social sciences, psychology, and beyond.[5]

SEM consists of two primary components: the measurement model, which defines how latent variables are reflected in observed indicators and handles issues like reliability and validity, and the structural model, which specifies the hypothesized causal paths between latent and observed variables.[6] Estimation typically employs maximum likelihood or other methods to fit the model to covariance or correlation matrices, with goodness-of-fit indices such as the chi-square test, RMSEA, and CFI used to evaluate model adequacy.[7]

Widely applied in fields like education, marketing, and health sciences, SEM facilitates theory testing, mediation analysis, and longitudinal modeling, offering advantages over traditional methods by simultaneously assessing multiple equations and latent constructs.[8] Advances continue with extensions like multilevel SEM and Bayesian approaches, enhancing its flexibility for complex data structures.[9]

Introduction
Definition and Purpose
Structural equation modeling (SEM) is a multivariate statistical framework that integrates elements of factor analysis and multiple regression to analyze relationships among observed variables and latent constructs, while accounting for measurement error in the observed indicators.[10] This approach emphasizes confirmatory analysis, where researchers specify and test theoretical models against empirical data to evaluate how well the data fit the hypothesized structure.[11] Seminal developments in SEM trace back to Karl G. Jöreskog's work in the 1970s, which provided a general method for estimating linear structural equation systems involving covariance structures.

The primary purpose of SEM is to test hypotheses about complex causal relationships and interdependencies in theoretical models, particularly in disciplines such as psychology, sociology, economics, and education, where latent variables represent unobservable constructs like intelligence or attitudes.[10] Unlike exploratory techniques, SEM's confirmatory nature allows researchers to assess whether proposed models adequately explain observed covariances, providing a rigorous means to validate or refute theories about variable interactions.

SEM facilitates the simultaneous estimation of two interrelated components: the measurement model, which links latent variables to their observed indicators via factor loadings, and the structural model, which specifies the causal paths among latent variables. This integrated estimation accounts for errors in both measurement and structural relations, yielding more accurate parameter estimates than separate analyses.[10] For instance, consider a simple SEM model examining how latent job satisfaction (inferred from survey items on workload, pay, and supervision) influences latent turnover intention (inferred from items on job search behavior and commitment). This setup distinguishes SEM from multiple regression, as the latter treats predictors as error-free observed variables and cannot model latent constructs or their measurement properties directly. Such models are often visualized using path diagrams to depict variables and hypothesized paths.[12]

Key Applications
Structural equation modeling (SEM) is extensively applied in the social sciences, particularly in psychology and sociology, where it facilitates the analysis of complex relationships involving latent constructs such as attitudes and social influences. In psychology, SEM has been employed to measure and model attitudes, enabling researchers to test theoretical frameworks that account for measurement error in multi-item scales, as demonstrated in reviews of applications across psychological journals from the late 1990s onward.[13] For instance, it has been used to examine how latent attitudes toward health behaviors predict observable outcomes like compliance with interventions. In sociology, SEM models social structures by integrating latent variables representing socioeconomic status with observed indicators, allowing for the evaluation of pathways linking socioeconomic status to life satisfaction in older adults.[14]

In economics and marketing, SEM supports the modeling of consumer behavior by simultaneously estimating latent factors like brand loyalty and their impacts on purchasing decisions, surpassing simpler regression approaches through its ability to handle multifaceted data structures. A comprehensive review of SEM applications in major marketing journals highlights its role in testing consumer preference models, such as those linking perceived value to repurchase intentions, with examples from studies on product adoption in competitive markets.[15] Economists apply SEM to assess structural relationships in behavioral economics.[16]

SEM finds significant use in health sciences for testing disease risk models, where it disentangles pathways among latent risk factors like stress or genetic predispositions and observed health metrics. Researchers have utilized SEM to quantify modifiable risk pathways for atherosclerotic cardiovascular disease, prioritizing interventions by modeling direct and indirect effects on outcomes such as cardiovascular events.[17] In education, SEM evaluates learning outcomes by linking latent variables like student satisfaction to measured performance indicators, as seen in studies of e-learning readiness and self-regulation and their influence on academic achievement in online environments.[18] For example, it has been applied to assess how teaching patterns and peer effects contribute to skill acquisition in higher education settings.[19]

One key advantage of SEM over traditional statistical methods, such as multiple regression, lies in its capacity to explicitly model and correct for measurement error in both cross-sectional and longitudinal data, providing more reliable estimates of relationships among latent and observed variables.[20] This feature is particularly valuable in theory-driven research across these fields, where unobserved constructs underpin empirical hypotheses.[21]

Historical Development
Origins in Path Analysis
Structural equation modeling traces its foundational roots to path analysis, a statistical technique pioneered by geneticist Sewall Wright in the early 20th century. Wright developed path analysis during the 1920s primarily to model causal relationships in quantitative genetics, building on his earlier work from 1918 and elaborating it in subsequent publications in 1920 and 1921.[22] This method allowed researchers to decompose observed correlations between variables into direct and indirect causal effects using path diagrams, where arrows represented hypothesized causal paths and path coefficients quantified the strength of these relationships standardized to unit variance.[22] Central to Wright's approach was the rule of tracing for path coefficients, which states that the correlation between two variables x and y, denoted p_{xy}, equals the sum of products of path coefficients along all connecting paths: p_{xy} = \sum p_{xz} \cdot p_{zy} where the summation accounts for both direct effects (when z is absent) and indirect effects through intermediate variables z.[22] This formula enabled the partitioning of total effects into components attributable to specific causal pathways, providing a graphical and algebraic framework for causal inference from correlational data.[3] Path analysis drew conceptual inspiration from earlier developments in factor analysis, particularly Charles Spearman's introduction of the two-factor model in 1904, which posited a general intelligence factor underlying correlations among cognitive tests. Spearman's work emphasized latent factors to explain observed variable intercorrelations, laying groundwork for handling unmeasured influences. In the 1930s and 1940s, Louis L. Thurstone extended this to multiple-factor analysis, advocating for several orthogonal primary mental abilities rather than a single general factor, as detailed in his 1931 paper and later book.[23] Thurstone's centroid method for extracting multiple factors from correlation matrices influenced the integration of measurement models with causal structures, bridging factor analysis and path analysis in ways that anticipated structural equation modeling.[23] These advancements in factor analysis provided tools for addressing latent constructs, complementing Wright's focus on explicit causal paths among observed variables. Early applications of path analysis appeared predominantly in biology and psychology, where it facilitated the decomposition of correlations into hypothesized causal components. In genetics, Wright applied the method to study inbreeding effects in guinea pigs, estimating that brother-sister mating reduced heterozygosity by approximately 19% per generation, and to analyze coat color patterns, attributing 38% of variation to heredity and 62% to environment.[22] He also examined transpiration in plants and livestock breeding outcomes, using path coefficients to trace influences from environmental and genetic factors.[22] In psychology, the technique was used to dissect correlations in human heredity data, such as Barbara Burks' 1928 study on intelligence inheritance, allowing separation of direct genetic paths from indirect environmental mediators.[22] These applications demonstrated path analysis's utility in causal modeling without experimental manipulation, relying instead on recursive assumptions where causes precede effects in a directed acyclic graph. 
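The tracing rule can be illustrated with a small numerical sketch. The Python snippet below uses hypothetical standardized path coefficients for a three-variable recursive model (z causes x and y, and x causes y) and checks the tracing-rule predictions against a large simulated sample; it is an illustration of the rule, not a reproduction of any of Wright's analyses.

```python
import numpy as np

# Hypothetical standardized path coefficients for a small recursive model:
# z -> x, z -> y, and x -> y (all variables scaled to unit variance).
p_xz, p_yx, p_yz = 0.5, 0.4, 0.3

# Tracing-rule predictions: sum the products of coefficients along every
# path connecting two variables (direct path plus paths through common causes).
r_xy_pred = p_yx + p_xz * p_yz          # direct x -> y plus x <- z -> y
r_zy_pred = p_yz + p_xz * p_yx          # direct z -> y plus z -> x -> y
r_zx_pred = p_xz                        # single direct path

# Check the predictions against a large simulated sample.
rng = np.random.default_rng(0)
n = 200_000
z = rng.normal(size=n)
x = p_xz * z + rng.normal(scale=np.sqrt(1 - p_xz**2), size=n)
resid_var_y = 1 - (p_yx**2 + p_yz**2 + 2 * p_yx * p_yz * p_xz)
y = p_yx * x + p_yz * z + rng.normal(scale=np.sqrt(resid_var_y), size=n)

corr = np.corrcoef(np.vstack([z, x, y]))
print("r_xy: predicted %.3f, simulated %.3f" % (r_xy_pred, corr[1, 2]))
print("r_zy: predicted %.3f, simulated %.3f" % (r_zy_pred, corr[0, 2]))
print("r_zx: predicted %.3f, simulated %.3f" % (r_zx_pred, corr[0, 1]))
```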
A key limitation of Wright's original path analysis was its assumption that all variables were measured without error, treating observed measures as true values and thus ignoring attenuation in correlations due to measurement unreliability.[5] While Wright later incorporated Spearman's correction for attenuation to adjust for known measurement error in specific cases, the standard framework did not systematically model latent variables or error terms in indicators, restricting its applicability to scenarios with precise measurements.[22] This assumption of error-free observed variables—central to early path models—highlighted the need for extensions that could incorporate measurement error, paving the way for the development of full structural equation models.[5]

Evolution into Modern SEM
The evolution of structural equation modeling (SEM) accelerated in the 1970s through Karl Jöreskog's pioneering work on the LISREL (Linear Structural Relations) model, which extended path analysis by incorporating latent variables and enabling the simultaneous estimation of measurement and structural equations using covariance-based maximum likelihood methods.[5] This framework addressed limitations in earlier approaches by allowing researchers to model unobserved constructs and their interrelationships within a unified system, marking a shift from observed-variable path models to comprehensive latent variable analysis. A foundational contribution came in Jöreskog's 1973 paper, which outlined a general method for estimating linear structural equation systems, emphasizing covariance structures and confirmatory approaches over exploratory techniques. Building on this, the integration of confirmatory factor analysis (CFA) with structural models during the 1970s and 1980s—pioneered by Jöreskog and collaborators—permitted the explicit testing of hypothesized factor structures alongside causal paths, enhancing SEM's applicability in social sciences, psychology, and beyond.[5] Jöreskog and Dag Sörbom's development of the LISREL software in the mid-1970s further operationalized these covariance-based methods, providing a practical tool for model estimation and testing.[5]

The 1980s saw SEM solidify as a standard methodology through influential texts, including Jöreskog and Sörbom's 1981 LISREL V manual, which detailed implementation strategies for linear structural relationships via maximum likelihood and least squares. Similarly, Kenneth A. Bollen's 1989 book, Structural Equations with Latent Variables, offered a rigorous theoretical foundation and practical guidance, emphasizing identification, estimation, and interpretation, which helped demystify SEM for broader audiences.[24]

By the 1990s, the advent of more accessible, graphical user interfaces in SEM software—such as enhancements to LISREL and the introduction of programs like AMOS—dramatically increased adoption across disciplines, transitioning SEM from specialized econometric tools to routine analytical methods in empirical research.[5] This user-friendly shift, coupled with rising computational power, facilitated SEM's diffusion into fields like education, marketing, and health sciences, solidifying its role in hypothesis-driven modeling.[5]

Core Concepts
Observed and Latent Variables
In structural equation modeling (SEM), observed variables, also known as manifest variables, are directly measurable quantities obtained from data collection methods such as surveys, tests, or sensors. For example, responses to specific questionnaire items about job satisfaction represent observed variables, as they are empirically recorded without inference.[25] These variables serve as the empirical foundation for SEM, providing the raw data that link theoretical constructs to real-world observations.

In contrast, latent variables are unobservable theoretical constructs that cannot be directly measured but are inferred through their relationships with multiple observed indicators. A classic example is intelligence, which is not directly quantifiable but is approximated via observed scores on cognitive tests like vocabulary or pattern recognition tasks.[26] Latent variables are essential in SEM for modeling abstract concepts such as attitudes, abilities, or personality traits that underlie patterns in observed data.[5]

The relationship between observed and latent variables is formalized in the measurement model of SEM, which specifies how latent constructs are represented by their indicators. For exogenous latent variables (those not predicted by other latents in the model), the standard equation is: \mathbf{x} = \Lambda_x \boldsymbol{\xi} + \boldsymbol{\delta} where \mathbf{x} is the vector of observed indicators, \Lambda_x is the matrix of factor loadings representing the strength of associations between the latent factors \boldsymbol{\xi} and their indicators, and \boldsymbol{\delta} is the vector of measurement errors or unique variances not explained by the latents.[5] This equation, originating from the LISREL framework, assumes that the errors \boldsymbol{\delta} are uncorrelated with the latents but may covary among themselves. A similar form applies to endogenous latents: \mathbf{y} = \Lambda_y \boldsymbol{\eta} + \boldsymbol{\varepsilon}. These models allow SEM to account for measurement error, distinguishing true latent effects from imperfect observations.

Measurement models in SEM are classified as reflective or formative, depending on the assumed direction of causality between the latent variable and its indicators. In reflective models, the latent variable causes variation in the observed indicators, implying that the indicators are interchangeable manifestations of the underlying construct; for instance, multiple survey items assessing anxiety (e.g., "I feel nervous" and "I worry excessively") are caused by the latent anxiety trait, and errors among indicators may correlate due to shared influences.[27] Reflective models are the default in many SEM applications, particularly in psychology and sociology, where constructs like motivation reflect through correlated symptoms. Conversely, formative models posit that the observed indicators cause the latent variable, treating it as a composite index; here, indicators are not interchangeable, and omitting one could alter the construct's meaning—for example, socioeconomic status as a formative latent formed by income, education level, and occupation, where changes in these predictors directly define the status without assuming correlated errors.[27] Misspecifying the model type (e.g., treating a formative construct as reflective) can bias parameter estimates and undermine causal inferences.[28]
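The algebra of the measurement model can be made concrete with a small numerical sketch. The Python snippet below (all loading, factor-covariance, and error values are hypothetical, chosen only for illustration) builds the covariance matrix of the indicators implied by \mathbf{x} = \Lambda_x \boldsymbol{\xi} + \boldsymbol{\delta}, namely \Lambda_x \Phi \Lambda_x^\top + \Theta_\delta.

```python
import numpy as np

# Hypothetical measurement model for two correlated latent factors,
# each measured by three observed indicators.
Lambda_x = np.array([      # factor loadings (rows: indicators, columns: latents)
    [1.0, 0.0],
    [0.8, 0.0],
    [0.7, 0.0],
    [0.0, 1.0],
    [0.0, 0.9],
    [0.0, 0.6],
])
Phi = np.array([           # covariance matrix of the latent factors
    [1.00, 0.40],
    [0.40, 1.00],
])
Theta_delta = np.diag([0.3, 0.4, 0.5, 0.3, 0.35, 0.6])  # measurement-error variances

# Implied covariance of the observed indicators: Sigma = Lambda Phi Lambda' + Theta
Sigma_x = Lambda_x @ Phi @ Lambda_x.T + Theta_delta
print(np.round(Sigma_x, 3))
```

Estimation then amounts to choosing the free elements of \Lambda_x, \Phi, and \Theta_\delta so that this implied matrix reproduces the sample covariance matrix as closely as possible.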
Ensuring reliability and validity is crucial for establishing credible links between observed and latent variables in SEM. Reliability assesses the consistency and stability of indicators in measuring the latent construct, often via composite reliability coefficients (e.g., values above 0.70 indicate adequate reliability) or average variance extracted (AVE > 0.50), which quantify how much indicator variance is captured by the latent rather than error. Validity evaluates whether the measurement model accurately represents the intended construct, including convergent validity (indicators strongly load on their latent, e.g., via AVE) and discriminant validity (a latent's AVE exceeds its squared correlations with other latents, per the Fornell-Larcker criterion). These assessments, typically conducted through confirmatory factor analysis within SEM, prevent issues like multicollinearity and ensure that inferences about latent relationships are robust and theoretically sound.[29]
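These reliability and validity checks reduce to simple arithmetic on the standardized estimates. A minimal sketch, assuming hypothetical standardized loadings and a hypothetical factor correlation, computes composite reliability, AVE, and the Fornell-Larcker comparison:

```python
import numpy as np

def composite_reliability(loadings):
    """Composite reliability from standardized loadings (error variance = 1 - loading^2)."""
    loadings = np.asarray(loadings, dtype=float)
    errors = 1.0 - loadings**2
    return loadings.sum()**2 / (loadings.sum()**2 + errors.sum())

def average_variance_extracted(loadings):
    """AVE: mean squared standardized loading."""
    loadings = np.asarray(loadings, dtype=float)
    return np.mean(loadings**2)

# Hypothetical standardized loadings for two latent constructs
# and a hypothetical estimated correlation between them.
loadings_f1 = [0.82, 0.78, 0.71]
loadings_f2 = [0.76, 0.72, 0.68]
phi_12 = 0.45   # estimated factor correlation

cr1, cr2 = composite_reliability(loadings_f1), composite_reliability(loadings_f2)
ave1, ave2 = average_variance_extracted(loadings_f1), average_variance_extracted(loadings_f2)

print(f"CR:  {cr1:.3f}, {cr2:.3f}  (rule of thumb: > 0.70)")
print(f"AVE: {ave1:.3f}, {ave2:.3f}  (rule of thumb: > 0.50)")
# Fornell-Larcker criterion: each AVE should exceed the squared inter-factor correlation.
print("Discriminant validity:", ave1 > phi_12**2 and ave2 > phi_12**2)
```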
Path Diagrams and Model Representation

Path diagrams provide a visual representation of structural equation models (SEM), facilitating the depiction of hypothesized relationships among observed and latent variables. In these diagrams, observed variables, which are directly measured, are conventionally represented as rectangles or squares, while latent variables, which are inferred from indicators, are shown as ovals or circles.[30][31] Directed relationships, such as regression coefficients or causal paths, are indicated by single-headed arrows pointing from predictor to outcome variables, whereas undirected associations, like covariances or correlations between variables, are denoted by double-headed arrows.[32] These conventions allow researchers to clearly illustrate the structure of complex models, including measurement and structural components, without relying solely on algebraic notation.[33]

Mathematically, the structural component of an SEM is often expressed using matrix notation derived from the LISREL framework. The core equation for the structural model is: \eta = B \eta + \Gamma \xi + \zeta where \eta represents the vector of endogenous latent variables, \xi the vector of exogenous latent variables, B the matrix of coefficients for paths among endogenous variables (with diagonal elements zero to avoid tautology), \Gamma the matrix of coefficients for paths from exogenous to endogenous variables, and \zeta the vector of disturbance terms capturing unexplained variance in \eta.[2] This formulation, introduced in the development of the LISREL model, enables the simultaneous estimation of multiple interdependent relationships.[34]

Path labeling in diagrams follows standardized Greek-letter conventions to denote parameter types. Coefficients for paths between endogenous latent variables are typically labeled with \beta (beta), reflecting intra-endogenous effects, while paths from exogenous to endogenous latent variables use \gamma (gamma) labels, indicating cross-variable influences.[35][33] These labels correspond directly to elements in the B and \Gamma matrices, aiding in the translation from visual to computational model specification. Error or disturbance terms are often labeled with \zeta or \epsilon, and measurement loadings with \lambda, though the focus here remains on structural paths.[2]

A common application of path diagrams appears in mediation models, where an independent variable influences a dependent variable both directly and indirectly through a mediating variable. Consider a simple mediation example involving stress (exogenous observed variable X), coping strategy (mediating latent variable \eta_1, indicated by observed items), and well-being (endogenous latent variable \eta_2, indicated by observed items). The path diagram would feature a single-headed arrow from X to \eta_1 (labeled \gamma_{11}, the indirect path component), a single-headed arrow from \eta_1 to \eta_2 (labeled \beta_{21}), and a direct single-headed arrow from X to \eta_2 (labeled \gamma_{21}). Double-headed arrows might connect error terms if assumed correlated. In this setup, the direct effect is the path from X to \eta_2 (\gamma_{21}), the indirect effect is the product \gamma_{11} \times \beta_{21}, and the total effect is their sum, quantifying the mediator's role in transmitting influence.[36] Such diagrams clarify how mediation decomposes overall associations, supporting hypothesis testing in fields like psychology and social sciences.[37]
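The mediation example maps directly onto the B and \Gamma matrices of the structural equation. The sketch below (all coefficient values are hypothetical) arranges the paths in matrix form and recovers the direct, indirect, and total effects from the reduced form \eta = (I - B)^{-1}(\Gamma \xi + \zeta):

```python
import numpy as np

# Hypothetical coefficients for the mediation example in the text:
# stress (xi) -> coping (eta1) -> well-being (eta2), plus a direct stress -> well-being path.
gamma_11 = 0.50   # stress -> coping
beta_21  = 0.40   # coping -> well-being
gamma_21 = -0.15  # direct stress -> well-being

B = np.array([[0.0,     0.0],      # paths among endogenous latents (eta1, eta2)
              [beta_21, 0.0]])
Gamma = np.array([[gamma_11],      # paths from the exogenous latent (xi) to eta1 and eta2
                  [gamma_21]])

# Reduced form: eta = (I - B)^{-1} (Gamma xi + zeta), so the total effects of xi on eta
# are the elements of (I - B)^{-1} Gamma.
total = np.linalg.inv(np.eye(2) - B) @ Gamma

direct_effect = gamma_21
indirect_effect = gamma_11 * beta_21          # product of the two mediated paths
print("total effect on well-being :", round(total[1, 0], 3))
print("direct + indirect          :", round(direct_effect + indirect_effect, 3))
```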
Modeling Process

Specification
Model specification is the foundational step in structural equation modeling (SEM), where researchers translate substantive theory into a formal representation of hypothesized relationships among observed and latent variables. This process involves delineating the measurement sub-model, which specifies how observed indicators relate to latent constructs, and the structural sub-model, which outlines the causal paths among the latent variables themselves. Grounded in a priori theoretical hypotheses, specification ensures the model serves as a testable representation of the underlying theory rather than an exploratory tool.

The specification of the measurement sub-model begins with identifying which observed variables serve as indicators for each latent construct, based on theoretical expectations about their convergent validity. For instance, if theory posits that cognitive ability is a latent trait reflected in multiple test scores, each score is specified as loading onto the latent factor, with unique error terms accounting for measurement unreliability. Parameters in this sub-model are categorized as fixed (set to predefined values, such as loading the first indicator to 1.0 to establish the factor's scale), free (allowed to vary and be estimated from data), or constrained (e.g., setting multiple loadings equal to test invariance across groups). These decisions stem directly from theory, ensuring that constraints reflect substantive equality rather than arbitrary assumptions.

In specifying the structural sub-model, researchers draw on a priori hypotheses to define directed paths between latent variables, including only those supported by established theory to promote model parsimony and avoid capitalization on chance. For example, in a model of employee motivation, a path from a latent "leadership quality" factor to a "job performance" factor would be included if prior research justifies the causal direction, while excluding bidirectional or unsupported paths maintains theoretical fidelity. This selective inclusion of arrows tests specific mediation or direct effects hypothesized in the literature, preventing the model from becoming overly complex or atheoretical. Path diagrams provide a visual aid for this specification, depicting latents as ovals, observed variables as rectangles, and paths as arrows.

Common errors in specification include under-specification, where theoretically relevant paths or indicators are omitted, resulting in biased estimates of remaining parameters—for instance, neglecting a mediating latent variable between stress and health outcomes would confound direct effects; and over-specification, where extraneous paths are added without theoretical justification, diluting model power and complicating interpretation, such as including a spurious link between unrelated economic and psychological factors. To mitigate these, researchers must conduct thorough literature reviews to ensure comprehensive yet focused hypothesis generation, prioritizing parsimonious models that balance theoretical coverage with statistical feasibility.
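One way to make the fixed/free/constrained distinction concrete is to write the loading matrix as a pattern in which some entries are fixed numbers and others are placeholders to be estimated. The sketch below is purely illustrative bookkeeping, not the syntax of any SEM package:

```python
import numpy as np

# Illustrative encoding of a measurement sub-model specification: np.nan marks a
# free loading to be estimated, numbers mark fixed values; equality constraints
# would be recorded by giving two entries the same label in a separate structure.
FREE = np.nan

# Two latent factors, three indicators each. The first loading of every factor
# is fixed to 1.0 to set the latent variable's scale; cross-loadings are fixed to 0.
loading_pattern = np.array([
    [1.0,  0.0],
    [FREE, 0.0],
    [FREE, 0.0],
    [0.0,  1.0],
    [0.0,  FREE],
    [0.0,  FREE],
])

n_free_loadings = int(np.isnan(loading_pattern).sum())
print("free loadings to estimate:", n_free_loadings)   # 4 in this specification
```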
Identification

In structural equation modeling (SEM), identification refers to the extent to which the parameters of a specified model can be uniquely estimated from the observed data, ensuring that the model's implied covariance structure can be distinguished from alternatives. A model is under-identified if there are more free parameters to estimate than available pieces of information from the data, leading to infinite possible solutions; just-identified if the number of free parameters equals the number of unique data elements, yielding a unique but untestable solution; and over-identified if there are fewer free parameters than data elements, allowing for statistical testing of model fit. The fundamental necessary condition for identification, known as the t-rule, requires that the number of free parameters q not exceed the number of unique variances and covariances among the p observed variables, so that the degrees of freedom satisfy df = \frac{p(p+1)}{2} - q \geq 0; positive degrees of freedom indicate over-identification, enabling chi-square tests of the discrepancy between observed and implied covariances.

Distinctions exist between local and global identification in SEM. Local identification holds if parameters can be uniquely estimated in a neighborhood around the true values, assuming the data generating process is sufficiently informative, whereas global identification requires uniqueness across the entire parameter space, which is rarer and harder to verify analytically. Latent variables in SEM are scale-free, lacking inherent units of measurement, which introduces an indeterminacy that must be resolved by imposing constraints such as fixing the variance of each latent to 1 or setting one factor loading per latent to 1 to define its metric relative to observed indicators.

Even theoretically identified models may exhibit empirical under-identification due to data characteristics, such as high correlations or small sample sizes, preventing convergence during estimation. Diagnostics include factor determinacy, which measures the reliability of estimated latent variable scores as the squared correlation between true and estimated factors (values near 1 indicate well-determined factors, while low values signal under-identification); and expected parameter change (EPC), which approximates the value a fixed parameter would take if freed, with large EPCs highlighting local under-identification or misspecification.

To achieve identification, researchers employ strategies such as using multiple indicators (at least three per latent variable) to over-identify the measurement model, fixing latent variances to 1 for scale setting, or constraining one loading per factor to 1 while estimating others; these approaches leverage the redundancy in over-identified models to resolve indeterminacies without altering theoretical meaning. For single-indicator latents, fixing the loading to 1 and error variance to 0 treats the observed variable as a perfect measure, though this assumes no measurement error.[38]
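The counting rule is easy to apply in code. A small helper (hypothetical, written only for illustration) classifies a specification from the number of observed variables and free parameters:

```python
def degrees_of_freedom(n_observed, n_free_params):
    """Counting rule: unique variances/covariances minus free parameters."""
    n_moments = n_observed * (n_observed + 1) // 2
    df = n_moments - n_free_params
    if df < 0:
        status = "under-identified (more parameters than data moments)"
    elif df == 0:
        status = "just-identified (fit cannot be tested)"
    else:
        status = "over-identified (model fit is testable)"
    return df, status

# Example: 6 indicators, two correlated factors, first loading per factor fixed to 1.
# Free parameters: 4 loadings + 6 error variances + 2 factor variances + 1 factor covariance = 13.
print(degrees_of_freedom(6, 13))   # (8, 'over-identified (model fit is testable)')
```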
Estimation

Estimation in structural equation modeling (SEM) involves obtaining parameter values that best fit the observed data to the specified model, typically by minimizing a discrepancy function between the sample covariance matrix and the covariance matrix implied by the model parameters. For this process to yield unique and consistent estimates, the model must first be identified, ensuring that parameters can be uniquely determined from the data.

The most widely used method is maximum likelihood (ML) estimation, which assumes that the observed variables follow a multivariate normal distribution and that the sample size is sufficiently large (typically at least 200 cases) to ensure the asymptotic properties of the estimator hold. Under these assumptions, ML estimation minimizes the following fit function: F_{ML}(\theta) = \log|\Sigma(\theta)| + \operatorname{tr}[\Sigma(\theta)^{-1} S] - \log|S| - p where S is the sample covariance matrix, \Sigma(\theta) is the model-implied covariance matrix parameterized by \theta, |\cdot| denotes the determinant, \operatorname{tr}(\cdot) is the trace, and p is the number of observed variables. This function is derived from the likelihood of the multivariate normal distribution and leads to estimates that are consistent, asymptotically unbiased, and efficient when the model is correctly specified and assumptions are met. The ML approach was formalized by Jöreskog in his development of covariance structure analysis, enabling the joint estimation of measurement and structural parameters in SEM.

When the multivariate normality assumption is violated, such as with categorical, ordinal, or skewed data, alternative estimators are preferred to avoid biased standard errors and inflated chi-square statistics. Generalized least squares (GLS) extends ML by weighting the covariance matrix with its asymptotic covariance, providing consistent estimates under weaker distributional assumptions but requiring larger sample sizes for reliability. Weighted least squares (WLS), particularly in its diagonally weighted form, is commonly used for non-normal or categorical data; it minimizes a quadratic form weighted by the inverse of the covariance matrix of the sample moments, making it asymptotically distribution-free under certain conditions. These methods were advanced by Browne to address limitations of ML in non-normal settings, though they can be computationally intensive and sensitive to small samples.

Parameter estimates in SEM are typically obtained through iterative numerical algorithms, as closed-form solutions are rarely available except in simple cases. The Newton-Raphson method is a standard iterative procedure that solves the first-order conditions of the likelihood equations by approximating the Hessian matrix of second derivatives at each step, updating parameters until convergence criteria (e.g., change in fit function less than a small threshold) are satisfied. Other algorithms, such as quasi-Newton methods like BFGS, may be used for improved stability when the Hessian is ill-conditioned. Non-convergence can occur due to starting values far from the solution, local minima, or model misspecification; in such cases, practitioners often employ better initial estimates (e.g., from least squares), rescaling, or alternative algorithms to achieve reliable results.
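The fit function itself is only a few lines of linear algebra. A minimal sketch (with hypothetical matrices used for the sanity check) evaluates F_{ML} for a given sample and implied covariance matrix; multiplying the minimized value by N - 1 yields the likelihood-ratio chi-square statistic used in model assessment:

```python
import numpy as np

def f_ml(S, Sigma):
    """ML discrepancy between the sample (S) and model-implied (Sigma) covariance matrices."""
    p = S.shape[0]
    Sigma_inv = np.linalg.inv(Sigma)
    return (np.log(np.linalg.det(Sigma)) + np.trace(Sigma_inv @ S)
            - np.log(np.linalg.det(S)) - p)

# Sanity check with hypothetical matrices: the discrepancy is zero when Sigma equals S.
S = np.array([[1.0, 0.4, 0.3],
              [0.4, 1.0, 0.5],
              [0.3, 0.5, 1.0]])
print(round(f_ml(S, S), 6))          # 0.0
print(round(f_ml(S, np.eye(3)), 4))  # > 0: misfit under an independence model
```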
Assessment

Assessment of structural equation models involves evaluating the degree to which the specified model reproduces the observed covariance structure in the data, typically after parameter estimation via methods such as maximum likelihood. Global fit indices provide an overall measure of model adequacy, while local indices identify specific areas of misspecification.

The chi-square (\chi^2) test serves as the primary statistical test for exact model fit in SEM; its statistic is the minimized discrepancy between the observed and model-implied covariance matrices scaled by the sample size. A non-significant \chi^2 indicates that the model fits the data exactly, but this test is highly sensitive to sample size, often rejecting well-fitting models in large samples due to its asymptotic nature. As an alternative absolute fit measure less affected by sample size, the standardized root mean square residual (SRMR) quantifies the average discrepancy between observed and predicted correlations after standardization; values below 0.08 suggest good fit.

Incremental fit indices compare the target model to a baseline null model, typically assuming no correlations among variables. The comparative fit index (CFI) measures the relative improvement in fit, with values greater than 0.95 indicating good fit. Similarly, the Tucker-Lewis index (TLI), also known as the non-normed fit index, penalizes for model complexity and supports values above 0.95 for acceptable fit. Parsimony-adjusted indices account for model complexity to avoid favoring overly elaborate specifications. The root mean square error of approximation (RMSEA) estimates the error of approximation per degree of freedom, providing a measure of discrepancy in the population; values below 0.06 indicate close fit, with 0.08 as a reasonable upper bound for acceptable fit, and reporting the 90% confidence interval is recommended to assess precision.[39]

For local fit assessment, modification indices (MI) identify fixed parameters whose relaxation would most reduce the \chi^2 statistic, serving as Lagrange multiplier tests for potential model improvements. The expected parameter change (EPC) complements MI by estimating the magnitude of change in the parameter if freed, often standardized for interpretability. However, post-hoc modifications based on MI and EPC risk capitalizing on chance, inflating fit in the current sample without generalizing to the population, and should be theoretically justified or cross-validated.
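Several of these indices have closed-form expressions in terms of the chi-square statistics and degrees of freedom of the target and baseline models. The sketch below uses the commonly cited formulas with hypothetical values; individual software packages may apply small variations.

```python
import numpy as np

def rmsea(chi2, df, n):
    """Root mean square error of approximation."""
    return np.sqrt(max(chi2 - df, 0) / (df * (n - 1)))

def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative fit index relative to the baseline (null) model."""
    num = max(chi2_m - df_m, 0)
    den = max(chi2_b - df_b, chi2_m - df_m, 0)
    return 1 - num / den

def tli(chi2_m, df_m, chi2_b, df_b):
    """Tucker-Lewis (non-normed) fit index."""
    return ((chi2_b / df_b) - (chi2_m / df_m)) / ((chi2_b / df_b) - 1)

# Hypothetical output from a fitted model and its independence baseline (n = 300).
chi2_m, df_m = 42.5, 24      # target model
chi2_b, df_b = 890.0, 36     # baseline model
print(f"RMSEA = {rmsea(chi2_m, df_m, 300):.3f}")
print(f"CFI   = {cfi(chi2_m, df_m, chi2_b, df_b):.3f}")
print(f"TLI   = {tli(chi2_m, df_m, chi2_b, df_b):.3f}")
```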
Interpretation

Interpretation of results in structural equation modeling (SEM) involves translating estimated parameters into meaningful substantive insights, assuming the model has demonstrated adequate fit during assessment. Path coefficients represent the estimated relationships between variables and are typically reported in standardized form, akin to beta weights in multiple regression, where values range from -1 to 1 and indicate the expected change in the dependent variable for a one-standard-deviation increase in the predictor while holding other variables constant.[40] For instance, a standardized path coefficient of 0.3 suggests a moderate positive effect, following guidelines adapted from Cohen's conventions for social sciences, though SEM-specific benchmarks emphasize contextual relevance over universal thresholds.[41] Error variances, or disturbance terms for endogenous variables, quantify the unexplained portion of variance after accounting for all predictors; a low error variance (e.g., approaching 0) implies strong model explanatory power for that variable.[40]

Indirect effects, central to mediation processes in SEM, are computed as the product of constituent path coefficients, such as ab in a simple mediation model where a is the effect of the independent variable on the mediator and b is the effect of the mediator on the dependent variable.[37] Total effects combine direct and indirect paths, providing an overall assessment of influence. To test the significance of indirect effects, which often violate normality assumptions, bootstrapping is recommended, generating confidence intervals (e.g., 95%) by resampling the data thousands of times to estimate the distribution of the effect without relying on parametric tests.[42]

Reporting practices in SEM emphasize effect sizes to convey practical significance beyond statistical tests. For endogenous variables, R^2 values indicate the proportion of variance explained, with thresholds like 0.02 (small), 0.13 (medium), and 0.26 (large) proposed for behavioral sciences, alongside confidence intervals for path coefficients to highlight estimation precision.[41] These metrics facilitate comparison across studies and underscore the model's theoretical implications.

While SEM facilitates testing of hypothesized causal structures, interpretations must caution against strong causal claims, particularly with cross-sectional data, which cannot establish temporal precedence and may confound directionality without additional assumptions or experimental controls.[1] Longitudinal designs or instrumental variables are preferable for robust causal inference in SEM applications.[43]
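The percentile bootstrap for an indirect effect can be sketched with ordinary least squares stand-ins for the two constituent paths; a full SEM treatment would bootstrap the latent-variable model itself. The data and coefficients below are simulated and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate hypothetical observed data for a simple mediation: X -> M -> Y, plus a direct X -> Y path.
n = 300
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)
y = 0.4 * m + 0.2 * x + rng.normal(size=n)

def indirect_effect(x, m, y):
    """Product of OLS path estimates: a (X -> M) times b (M -> Y, controlling for X)."""
    a = np.polyfit(x, m, 1)[0]
    design = np.column_stack([np.ones_like(x), m, x])
    b = np.linalg.lstsq(design, y, rcond=None)[0][1]
    return a * b

# Percentile bootstrap confidence interval for the indirect effect a * b.
boot = np.empty(5000)
for i in range(boot.size):
    idx = rng.integers(0, n, size=n)
    boot[i] = indirect_effect(x[idx], m[idx], y[idx])

est = indirect_effect(x, m, y)
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"indirect effect = {est:.3f}, 95% bootstrap CI [{lo:.3f}, {hi:.3f}]")
```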
Advanced Extensions

Multilevel and Longitudinal SEM
Multilevel structural equation modeling (ML-SEM) extends traditional SEM to accommodate clustered or hierarchical data structures, where observations are nested within groups, such as students within schools or employees within organizations. This approach decomposes the total variance into within-group (e.g., individual-level) and between-group (e.g., school-level) components, allowing researchers to model relationships at multiple levels simultaneously while accounting for the interdependence of observations within clusters. By integrating multilevel regression principles with SEM's capacity for latent variables, ML-SEM enables the examination of cross-level interactions and group-level effects that single-level models overlook.

In ML-SEM, the model is typically specified across levels. At Level 1 (within-group), the equation for an outcome y_{ij} for individual i in group j is given by y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + r_{ij}, where \beta_{0j} is the intercept, \beta_{1j} is the slope, x_{ij} is a predictor, and r_{ij} is the residual. At Level 2 (between-group), parameters from Level 1 are modeled as \beta_{0j} = \gamma_{00} + u_{0j}, where \gamma_{00} is the overall intercept and u_{0j} is the group-specific deviation; similar equations apply to slopes. This framework partitions covariances and means at each level, facilitating the analysis of complex nested structures like educational outcomes varying by individual traits and school policies.

Longitudinal SEM addresses change over time by modeling repeated measures as indicators of latent trajectories, with latent growth modeling (LGM) as a primary approach for capturing individual differences in growth patterns. In LGM, observed variables at multiple time points load onto latent factors representing initial status and rate of change, enabling tests of linear, nonlinear, or piecewise trajectories. For a linear growth model, the measurement model specifies that for each time point t, the observed variable y_{ti} = \alpha_i + \lambda_t \beta_i + \epsilon_{ti}, where \alpha_i is the latent intercept (status at time 0), \beta_i is the latent slope (rate of change), \lambda_t codes time (e.g., 0, 1, 2 for equally spaced intervals), and \epsilon_{ti} is the time-specific residual.[44] The latent growth factors \alpha_i and \beta_i can have their own means, variances, covariances, and be predicted by covariates, revealing how variables like age or intervention influence developmental paths.

Applications of longitudinal SEM are prevalent in panel data studies across psychology, sociology, and health sciences, where it models dynamic processes such as cognitive decline or behavioral adaptation over repeated observations. For instance, LGM can assess how socioeconomic factors predict trajectories of academic achievement across multiple waves of data. To handle missing data common in longitudinal designs—due to attrition or nonresponse—full information maximum likelihood (FIML) estimation is widely used, maximizing the likelihood based on all available data under the assumption of missing at random, thereby reducing bias compared to listwise deletion. Core estimation methods from single-level SEM, such as maximum likelihood, are extended in these contexts to fit multilevel and growth models efficiently.
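The linear latent growth specification can be made tangible by generating data from it. The sketch below simulates hypothetical intercept and slope factors (all means, variances, and covariances are invented for illustration) and checks that the wave means follow the implied trajectory \mu_t = \mu_\alpha + \mu_\beta \lambda_t:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a hypothetical linear latent growth model for 4 equally spaced waves.
n, times = 500, np.array([0, 1, 2, 3])

# Individual growth factors: latent intercept (alpha) and slope (beta),
# drawn with chosen means, variances, and a modest covariance.
means = np.array([10.0, 1.5])
cov = np.array([[4.0, 0.5],
                [0.5, 0.25]])
alpha, beta = rng.multivariate_normal(means, cov, size=n).T

# Observed repeated measures: y_ti = alpha_i + lambda_t * beta_i + eps_ti
eps = rng.normal(scale=1.0, size=(n, len(times)))
Y = alpha[:, None] + np.outer(beta, times) + eps

# The sample means at each wave should track the implied trajectory mu_t = 10 + 1.5 * t.
print("observed wave means:", np.round(Y.mean(axis=0), 2))
print("implied trajectory :", means[0] + means[1] * times)
```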
Bayesian and Non-Standard Estimators
Bayesian structural equation modeling (BSEM) extends traditional SEM by incorporating prior knowledge into the estimation process through a probabilistic framework, allowing for the computation of full posterior distributions of model parameters rather than point estimates. In BSEM, the posterior distribution is given by p(\theta | y) \propto p(y | \theta) p(\theta), where \theta represents the model parameters, y the observed data, p(y | \theta) the likelihood, and p(\theta) the prior distribution, providing a principled way to update beliefs based on data while accounting for uncertainty. This contrasts with frequentist maximum likelihood (ML) estimation, which maximizes the likelihood p(y | \theta) to obtain point estimates without incorporating priors, potentially leading to less flexibility in handling prior substantive knowledge or small sample sizes. Estimation in BSEM typically relies on Markov chain Monte Carlo (MCMC) methods to sample from the often intractable posterior distribution, enabling the approximation of posterior means, variances, and credible intervals for parameters.[45] Priors can be conjugate for certain distributions, such as normal-inverse Wishart priors for mean and covariance parameters in linear models, which facilitate analytical tractability and ensure posterior propriety under mild conditions. Credible intervals, interpreted as intervals containing the true parameter value with a specified posterior probability (e.g., 95%), offer a direct measure of uncertainty that is particularly useful for inference in complex models where asymptotic assumptions of frequentist methods may fail. Non-standard estimators in SEM address violations of classical assumptions, such as multivariate normality, by providing robust alternatives to standard ML. Robust maximum likelihood (MLR), incorporating the Satorra-Bentler correction, adjusts the chi-square test statistic and standard errors to account for non-normality through mean- and variance-adjusted scaling, improving the validity of fit assessments and inference in the presence of skewed or kurtotic data.[46] This correction, originally developed for covariance structure analysis, scales the normal-theory statistic by factors derived from the model's asymptotic variance, ensuring better small-sample performance and robustness without requiring data transformation.[47] Another non-standard approach involves two-stage least squares (2SLS) estimation using model-implied instrumental variables (MIIVs), which handles endogeneity and identification issues in SEM by treating instruments derived from the model's structure as exogenous predictors. In the first stage, endogenous variables are regressed on instruments to obtain predicted values; in the second stage, these predictions replace the originals in the structural equations, yielding consistent estimates even under misspecification of certain model components. This method is particularly advantageous for overidentified models, as it allows equation-by-equation estimation and tests of overidentification via Hansen's J-statistic. These Bayesian and non-standard estimators offer key advantages in scenarios with small samples or complex structures, where traditional ML may produce unstable or biased results; for instance, BSEM's priors can stabilize estimates by borrowing strength from external information, while robust methods maintain validity under distributional violations. 
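The posterior updating at the heart of BSEM can be illustrated for a single parameter. The toy sketch below uses a grid approximation rather than MCMC (which BSEM software uses to handle all parameters jointly) for a hypothetical standardized path coefficient with a small-variance prior; the data are simulated.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy illustration of Bayesian updating for a single standardized path coefficient
# theta in y = theta * x + error, with the error variance fixed at 1.
n = 50
x = rng.normal(size=n)
theta_true = 0.3
y = theta_true * x + rng.normal(size=n)

grid = np.linspace(-1.0, 1.0, 2001)
log_prior = -0.5 * (grid / 0.2) ** 2                       # informative N(0, 0.2^2) prior
log_lik = np.array([-0.5 * np.sum((y - t * x) ** 2) for t in grid])
log_post = log_prior + log_lik                             # p(theta|y) ∝ p(y|theta) p(theta)

post = np.exp(log_post - log_post.max())
post /= post.sum()                                         # normalize over the grid

post_mean = np.sum(grid * post)
cdf = np.cumsum(post)
ci_lo, ci_hi = grid[np.searchsorted(cdf, 0.025)], grid[np.searchsorted(cdf, 0.975)]
print(f"posterior mean = {post_mean:.3f}, 95% credible interval [{ci_lo:.3f}, {ci_hi:.3f}]")
```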
Software such as Mplus implements these approaches, supporting MCMC sampling with user-specified priors and providing credible intervals alongside robust corrections for practical application in psychological and social science research.[48]

Related Methods
Confirmatory Factor Analysis
Confirmatory factor analysis (CFA) represents a restricted form of factor analysis designed to test predefined theoretical structures relating observed variables to underlying latent constructs. Unlike exploratory approaches, CFA imposes specific constraints on the model parameters based on prior theory, allowing researchers to evaluate whether the data support hypothesized relationships between indicators and factors. The core measurement model in CFA is expressed as: \mathbf{x} = \boldsymbol{\Lambda} \boldsymbol{\xi} + \boldsymbol{\delta} where \mathbf{x} is the vector of observed variables, \boldsymbol{\Lambda} is the matrix of factor loadings, \boldsymbol{\xi} represents the latent factors (which may be orthogonal or correlated, with covariance matrix \boldsymbol{\Phi}), and \boldsymbol{\delta} denotes the unique errors or residuals, assumed to be uncorrelated with the factors. This formulation, introduced by Jöreskog, enables maximum likelihood estimation of the parameters while testing the adequacy of the specified structure against the sample covariance matrix.

A primary distinction between CFA and exploratory factor analysis (EFA) lies in their objectives and analytical approach: CFA is theory-driven and confirmatory, aiming to validate a prespecified model by fixing certain loadings to zero and estimating only the theoretically relevant paths, whereas EFA is data-driven and exploratory, allowing all possible loadings to be estimated without prior constraints to uncover latent structures. This confirmatory nature of CFA ensures that interpretations are guided by substantive theory rather than emergent patterns, reducing the risk of overfitting to sample-specific noise. For instance, in EFA, factor rotations are often applied to achieve interpretability, but CFA avoids such indeterminacies by directly testing the hypothesized configuration.

The implementation of CFA involves several key steps to specify and evaluate the measurement model. First, researchers define the factor structure by assigning observed indicators to specific latent factors and specifying which loadings, factor covariances, and error variances (uniquenesses) are free to be estimated versus fixed to zero or other values based on theory. Second, the model is estimated using methods like maximum likelihood, minimizing the discrepancy between the observed and implied covariance matrices. Third, fit indices such as the chi-square test, comparative fit index (CFI), and root mean square error of approximation (RMSEA) are examined to assess overall model adequacy, with values like CFI > 0.95 and RMSEA < 0.06 indicating good fit. An extension of this process includes testing measurement invariance across groups, such as by constraining loadings or intercepts to equality in multigroup CFA and comparing fit across increasingly restrictive models to determine if the construct operates similarly (e.g., across genders or cultures).

CFA finds extensive application in scale validation within psychological, educational, and social sciences research, particularly for confirming whether items load onto intended latent factors as theorized. For example, in developing a questionnaire to measure personality traits, CFA can test if subsets of items reliably indicate distinct factors like extraversion and neuroticism, ensuring the scale's structural validity before broader use.
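These steps can be sketched end to end for a toy one-factor model by minimizing the ML fit function from the Estimation section with a general-purpose optimizer. Everything below is a simplified, hypothetical illustration (simulated data, factor variance fixed to 1, no standard errors or fit indices), not a substitute for dedicated SEM software:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Simulate hypothetical data from a one-factor model with four indicators.
n = 500
true_load = np.array([0.8, 0.7, 0.6, 0.5])
factor = rng.normal(size=n)
errors = rng.normal(size=(n, 4)) * np.sqrt(1 - true_load**2)
data = factor[:, None] * true_load + errors
S = np.cov(data, rowvar=False)

def implied_cov(params):
    """Model-implied covariance: Lambda Lambda' + diag(theta), factor variance fixed to 1."""
    lam, log_theta = params[:4], params[4:]
    return np.outer(lam, lam) + np.diag(np.exp(log_theta))

def f_ml(params):
    """ML discrepancy between S and the implied covariance (see the Estimation section)."""
    Sigma = implied_cov(params)
    return (np.log(np.linalg.det(Sigma)) + np.trace(np.linalg.inv(Sigma) @ S)
            - np.log(np.linalg.det(S)) - S.shape[0])

start = np.concatenate([np.full(4, 0.5), np.log(np.full(4, 0.5))])
res = minimize(f_ml, start, method="BFGS")

print("estimated loadings:", np.round(res.x[:4], 2))
print("chi-square approx :", round((n - 1) * res.fun, 2), "with 2 degrees of freedom")
```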
By focusing on latent variables—unobserved constructs inferred from multiple indicators—CFA supports the rigorous operationalization of abstract concepts, enhancing the reliability of subsequent analyses.

Path Analysis and Regression Alternatives
Path analysis serves as an observed-variable counterpart to structural equation modeling (SEM), restricting its scope to directly measured constructs without incorporating latent variables. It employs recursive systems of linear equations to model hypothesized causal chains, where path coefficients quantify direct and indirect effects among variables. This approach decomposes correlations into causal pathways, assuming a unidirectional flow without feedback loops.[49][50] Developed by Sewall Wright as a precursor to modern SEM techniques, path analysis originated in genetic studies to trace inheritance patterns but has since been applied across disciplines for causal inference. In practice, models are specified via path diagrams, with estimation typically via ordinary least squares on the implied covariance structure. However, this method assumes perfect measurement of variables, neglecting error in observations, which can lead to attenuated estimates and overstated significance in complex chains.[49][50][51]

A primary limitation of path analysis relative to SEM lies in its failure to model measurement error, treating observed variables as true scores and potentially biasing path coefficients. For example, consider a basic causal relation: y = \beta x + \epsilon Here, path analysis estimates \beta assuming x contains no error, whereas SEM would introduce a latent true score for x to correct for unreliability, yielding more precise inferences about the underlying relationship. This oversight reduces power to detect effects and risks invalid model acceptance, particularly when error variance is substantial.[50][51]

Regression-based alternatives provide efficient options for simpler structures where full path modeling or error correction is unnecessary. Multiple regression excels in estimating effects for single outcomes from multiple observed predictors, avoiding the multivariate estimation required in path analysis while suiting exploratory or parsimonious designs. For outcomes that are categorical, logistic regression adapts the framework to binary or polytomous data, linking predictors to log-odds via maximum likelihood without needing covariance-based fitting. These methods are computationally lighter and interpretable for models lacking mediation or latent elements.[52][53]

Path analysis and regression suffice over SEM in scenarios with smaller models, exclusively observed variables, and minimal concern for measurement error, prioritizing ease and direct hypothesis testing. Related techniques, such as vector autoregression (VAR) for time series, extend this logic by modeling lagged interdependencies among observed series without enforcing recursive causality, offering a flexible alternative for dynamic data where temporal ordering is key.[54][55]
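The attenuation caused by ignoring measurement error is easy to demonstrate numerically. In the simulated sketch below (the reliability and coefficient values are hypothetical), regressing the outcome on an error-contaminated predictor shrinks the estimated slope toward \beta times the reliability, which is precisely the bias a latent-variable SEM with multiple indicators is designed to remove.

```python
import numpy as np

rng = np.random.default_rng(11)

# Illustration of attenuation when a predictor is measured with error.
n = 100_000
beta_true = 0.6
reliability = 0.7                      # proportion of observed variance that is true score

x_true = rng.normal(size=n)
x_obs = x_true + rng.normal(scale=np.sqrt((1 - reliability) / reliability), size=n)
y = beta_true * x_true + rng.normal(size=n)

beta_naive = np.polyfit(x_obs, y, 1)[0]    # regression / path analysis on the observed score
print(f"true beta      : {beta_true:.2f}")
print(f"estimated beta : {beta_naive:.2f}  (close to beta * reliability = {beta_true * reliability:.2f})")
```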
Implementation and Tools

Software Packages
Software packages for structural equation modeling (SEM) encompass both commercial and open-source options, enabling researchers to specify, estimate, and assess models ranging from basic path analyses to advanced multilevel and latent growth structures. These tools vary in accessibility, with commercial software often providing polished interfaces and support, while open-source alternatives leverage programming environments like R for customization and integration with broader statistical workflows.[56] Among commercial packages, LISREL stands as the original software for SEM, utilizing a command-based interface to handle standard and multilevel models with continuous, categorical, and incomplete data from surveys or random samples.[57] It supports maximum likelihood (ML) estimation and confirmatory factor analysis (CFA), making it suitable for traditional covariance-based SEM applications.[57] Mplus offers high flexibility for complex analyses, including mixture models, latent class growth analysis, and multilevel SEM, with robust support for ML, Bayesian estimation, and handling of cross-sectional, longitudinal, and clustered data.[58] Its input syntax allows precise specification of intricate relationships, such as those involving categorical outcomes or missing data mechanisms.[58] EQS provides robust capabilities for SEM, including support for non-normal distributions and robust estimation methods like Satorra-Bentler correction, with both graphical and command-line interfaces for model specification and testing.[59] IBM SPSS Amos integrates graphical path diagramming within the SPSS environment, simplifying model building for beginners while supporting SEM, CFA, path analysis, and Bayesian estimation for attitudinal and behavioral models.[60] The visual interface facilitates hypothesis testing and model comparison without extensive programming.[60] Stata's gsem command enables generalized structural equation modeling, accommodating multilevel mixed-effects models, generalized linear responses, and latent variables within a unified framework, integrated into Stata's syntax for seamless use in econometric and social science analyses.[61] SAS PROC CALIS offers comprehensive SEM tools within the SAS environment, supporting path analysis, CFA, full SEM, and multilevel models with various estimation methods including ML and robust corrections for non-normality.[62] Open-source packages in R provide cost-free alternatives with strong extensibility. 
The lavaan package fits a wide array of latent variable models, including CFA, full SEM, and latent growth curves, using ML and robust estimators, and benefits from R's ecosystem for preprocessing and post-estimation tasks.[63] It handles multilevel data and ordered categorical variables through intuitive syntax.[63] OpenMx extends R-based SEM with advanced optimization tools for custom likelihood functions and matrix specifications, supporting multilevel, genetic, and dynamic models beyond standard implementations.[64] Its programmatic approach excels in high-performance computing for large datasets or specialized constraints.[65] JASP offers a free, user-friendly graphical interface for SEM, built on the lavaan package, allowing point-and-click model specification, estimation, and visualization suitable for researchers preferring GUI over coding.[66] The sem package, an earlier R tool, enables basic SEM and path analysis but lacks the comprehensive features of successors like lavaan, limiting its use in modern applications.[56]

| Package | Type | Primary Interface | Key Supported Features | Best For |
|---|---|---|---|---|
| LISREL | Commercial | Command-line | ML estimation, multilevel SEM, incomplete data | Traditional covariance-based SEM |
| Mplus | Commercial | Syntax-based | Bayesian/ML, mixtures, longitudinal/multilevel | Complex, advanced modeling |
| EQS | Commercial | Graphical/Command | Robust estimators, non-normal data, CFA/SEM | Non-normal distributions, testing |
| AMOS | Commercial | Graphical (SPSS) | Path diagrams, CFA/SEM, Bayesian estimation | Beginners, visual model building |
| gsem (Stata) | Commercial | Syntax-based | Generalized linear, multilevel, latent vars | Econometric/social science integration |
| CALIS (SAS) | Commercial | Syntax-based | Path/CFA/SEM, robust methods, multilevel | SAS ecosystem, comprehensive stats |
| lavaan | Open-source | R syntax | ML/robust estimators, growth models, multilevel | Integrated R workflows |
| OpenMx | Open-source | R matrix-based | Custom optimizations, genetic/multilevel SEM | Specialized, high-power analyses |
| JASP | Open-source | Graphical (R backend) | GUI for SEM/CFA, visualization, easy estimation | User-friendly, non-programmers |
| sem | Open-source | R syntax | Basic path analysis, simple SEM | Introductory or legacy use |