Confirmatory factor analysis
Confirmatory factor analysis (CFA) is a multivariate statistical technique used within structural equation modeling to test whether the covariances or correlations observed among a set of measured variables align with a predefined theoretical model of underlying latent factors.[1] Unlike exploratory factor analysis, which identifies potential factor structures from data without prior hypotheses, CFA evaluates the adequacy of a researcher-specified model, including the number of factors, their interrelationships, and the loadings of observed variables onto those factors.[2] This method accounts for measurement error and provides goodness-of-fit indices, such as the comparative fit index (CFI) and root mean square error of approximation (RMSEA), to assess how well the model reproduces the sample data.[3]
CFA was pioneered by Karl G. Jöreskog in the 1960s as an extension of traditional factor analysis, introducing a general maximum likelihood approach for testing hypothesized structures in psychometric and social science research. Building on early factor-analytic work, from Karl Pearson's 1901 method of principal axes to Charles Spearman's 1904 model of general intelligence, Jöreskog's framework formalized the testing of a priori models, enabling rigorous validation of theoretical constructs like intelligence or attitudes.[1] By the 1970s and 1980s, CFA became integral to structural equation modeling software such as LISREL and EQS, facilitating its widespread adoption in fields including psychology, education, and health sciences.[4]
In practice, CFA involves specifying a measurement model, estimating parameters via methods like maximum likelihood under assumptions of multivariate normality, and interpreting fit statistics to confirm or refine the factor structure.[3] It is commonly applied to validate measurement instruments, such as surveys assessing primary healthcare accessibility, where it confirms whether items load onto intended dimensions like timeliness or accommodation.[2] Key advantages include its ability to handle correlated factors and test measurement invariance across groups, though challenges like model misspecification or small sample sizes can affect reliability, necessitating careful theoretical grounding and sample sizes typically exceeding 200.[4]
Overview
Definition and Purpose
Confirmatory factor analysis (CFA) is a multivariate statistical method used to test whether a set of observed variables reflects a hypothesized structure of underlying latent factors, as specified by a researcher prior to data analysis.[5] Developed as a hypothesis-testing approach, CFA evaluates the extent to which the data support preconceived theoretical models, allowing researchers to confirm or reject assumptions about factor structures, loadings, and inter-factor relationships.[6] This technique is widely applied in psychometrics to validate measurement instruments, in social sciences to assess theoretical constructs such as attitudes or behaviors, and in market research to refine survey scales for consumer insights.
The core components of a CFA model include latent variables, also known as factors, which represent unobservable constructs inferred from the data; observed indicators, which are the measured variables that serve as proxies for these factors; factor loadings, interpreted as regression coefficients indicating the strength of the relationship between a factor and its indicators; unique error terms capturing measurement unreliability or specificity; and factor covariances, which describe associations between multiple latent factors.[7] In contrast to exploratory methods, CFA imposes a priori constraints on these elements to test specific theoretical predictions.[1]
CFA models are often visualized using path diagrams, in which latent factors are depicted as circles or ovals, observed indicators as rectangles or squares, single-headed arrows from factors to indicators representing factor loadings, single-headed arrows from error terms to indicators representing unique variances, and double-headed arrows between factors representing covariances.[8] For instance, in a simple one-factor model assessing intelligence, multiple observed test items, such as vocabulary scores, pattern recognition tasks, and arithmetic problems, would load onto a single latent factor representing general cognitive ability, with the diagram illustrating how these indicators converge to measure the underlying trait while accounting for measurement errors.[9]
Historical Development
The roots of confirmatory factor analysis (CFA) lie in the broader development of factor analysis, which began with Charles Spearman's introduction of the concept in 1904 to explain correlations among cognitive abilities through a single general intelligence factor, known as the g factor. In the 1930s, Louis L. Thurstone extended this framework by proposing multiple orthogonal factors to account for diverse mental abilities, shifting away from Spearman's unitary model and laying groundwork for more complex latent variable structures.
CFA emerged as a distinct hypothesis-testing approach in the late 1960s, primarily through the work of Karl G. Jöreskog at the Educational Testing Service. Jöreskog's 1967 contributions introduced maximum likelihood methods for factor analysis with specified constraints, enabling the testing of predefined factor structures rather than exploratory derivation. This was formalized in his 1969 paper, which presented a general confirmatory maximum likelihood procedure for factor models, marking the first explicit use of the term "confirmatory factor analysis."
In the 1970s, Jöreskog integrated CFA into the structural equation modeling (SEM) framework, developing the LISREL software with Dag Sörbom to estimate latent variable models via covariance analysis.[10] This advancement built on classical test theory by allowing researchers to specify and test relationships between observed indicators and latent factors within larger causal models.[11] The widespread adoption of CFA accelerated in the 1980s, driven by computational improvements in personal computers and accessible software like LISREL, which facilitated its use in psychological research for validating measurement instruments.[12]
By the 1990s, CFA gained prominence in psychology through user-friendly tools such as AMOS, developed by James L. Arbuckle and integrated into SPSS, enabling graphical model specification and broader application in scale development and construct validation.[13] Post-2000, advancements in Bayesian CFA addressed limitations of traditional maximum likelihood estimation, such as handling non-normality and small samples, with methods incorporating prior distributions for robust inference in complex datasets.[14] In the 2020s, CFA has expanded to big data contexts, such as integrating with word embeddings for measurement models in textual data, leveraging computational power for high-dimensional analyses in fields like psychometrics and social sciences.[15]
Exploratory Factor Analysis
Exploratory factor analysis (EFA) is a data-driven statistical method used to uncover the underlying factor structure of a set of observed variables without preconceived hypotheses about the number or nature of the factors. It aims to identify latent constructs by examining patterns of correlations among variables, typically through techniques such as principal components analysis, which accounts for total variance, or principal axis factoring, which focuses on shared (common) variance among variables.[16][17] These methods, originating from early psychometric work,[18] decompose the correlation matrix to reveal potential dimensions, with factor extraction often guided by criteria like eigenvalues greater than 1 for retaining factors.[17]
In contrast to confirmatory factor analysis (CFA), which is theory-driven and tests specific, predefined models—such as a fixed number of factors and constrained factor loadings—EFA adopts an inductive, exploratory approach where all variables are permitted to load on all factors, and the structure emerges from the data.[19][17] While CFA imposes a hypothesized simple structure to validate theoretical constructs, EFA allows unrestricted loadings and relies on rotation methods (e.g., varimax) to achieve interpretability, making it suitable for initial scale development or when little is known about the data's dimensionality.[16][19] This fundamental difference underscores EFA's role in hypothesis generation versus CFA's emphasis on hypothesis testing.[17]
A common workflow involves using EFA to identify potential factor structures in an initial dataset, followed by CFA to confirm those structures in a separate validation sample, ensuring generalizability and reducing overfitting risks.[17] EFA's advantages include its flexibility for discovering unanticipated patterns and lower demands on sample size (typically 5-10 cases per variable),[20] but it suffers from subjectivity in decisions like factor retention and potential for unstable solutions without cross-validation.[16] In comparison, CFA provides rigorous validation but requires stronger theoretical justification, larger samples (often 200+),[21] and careful model specification to avoid identification issues.[19][17]
For instance, applying EFA to a battery of personality questionnaire items might reveal the five broad factors of the Big Five model (e.g., openness, conscientiousness) through emergent loading patterns, whereas CFA would subsequently test whether those items conform to a prespecified Big Five structure with cross-loadings constrained to zero.[22] This sequential approach leverages EFA's exploratory power to inform CFA's confirmatory rigor, enhancing the reliability of psychometric instruments.[19]
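As a minimal sketch of this two-step workflow, assuming two hypothetical samples (efa_sample and validation_sample) containing the questionnaire items, the exploratory step can be run with base R's factanal and the confirmatory step with the R package lavaan; all item, factor, and data set names below are illustrative.

library(lavaan)

# Step 1: exploratory factor analysis on the development sample (base R)
efa_fit <- factanal(x = efa_sample, factors = 5, rotation = "varimax")
print(efa_fit$loadings, cutoff = 0.30)   # inspect the emergent loading pattern

# Step 2: translate the emergent structure into a prespecified CFA model
# (only two of the five factors shown for brevity)
cfa_model <- '
  openness          =~ o1 + o2 + o3
  conscientiousness =~ c1 + c2 + c3
'
cfa_fit <- cfa(cfa_model, data = validation_sample)
fitMeasures(cfa_fit, c("cfi", "rmsea", "srmr"))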
Structural Equation Modeling
Structural equation modeling (SEM) represents a comprehensive multivariate statistical framework that combines confirmatory factor analysis (CFA) as the measurement model with path analysis as the structural model to test hypothesized relationships among unobserved latent variables and their observed indicators.[23] This integration allows researchers to account for measurement error in observed variables while examining complex causal structures among latent constructs, such as those in psychological, sociological, or economic theories.[24]
Within SEM, CFA serves as the foundational measurement submodel, specifying how observed variables load onto underlying latent factors and assessing the validity of this factor structure.[25] SEM extends beyond CFA by incorporating a structural component that hypothesizes directed paths or regressions between latent factors, enabling tests of theoretical relationships like mediation or moderation among constructs.[26] In contrast, standalone CFA evaluates only the adequacy of the measurement model without specifying inter-factor relations, whereas SEM fully articulates both measurement and structural components to validate comprehensive theoretical models.[27]
A practical illustration of this distinction appears in organizational research, where CFA might first confirm latent factors of job satisfaction (measured by items on pay, supervision, and promotion) and turnover intention (measured by items on intent to quit and job search behavior), after which the SEM structural model tests whether job satisfaction negatively predicts turnover intention.[28] This approach, which unified CFA's factor analytic tradition with path analysis for causal inference, was pioneered by Karl Jöreskog through the development of the LISREL framework in the 1970s, providing the first general method for estimating covariance structures in linear models with latent variables.
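A hedged lavaan sketch of this two-part model illustrates how the CFA measurement component and the structural regression are specified together; the indicator names and the data frame employee_survey are hypothetical.

library(lavaan)

sem_model <- '
  # measurement model (the CFA component)
  satisfaction =~ pay + supervision + promotion
  turnover     =~ quit_intent + job_search

  # structural model: job satisfaction predicts turnover intention
  turnover ~ satisfaction
'
sem_fit <- sem(sem_model, data = employee_survey)
summary(sem_fit, standardized = TRUE, fit.measures = TRUE)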
Like CFA, SEM relies on several key assumptions, including linearity in the relationships between latent and observed variables, as well as the absence of omitted variables that could bias parameter estimates or lead to specification errors.[29][30] Violations of these assumptions, such as nonlinear effects or unmodeled confounders, can compromise the validity of inferences drawn from the model.[31]
Model Specification
Statistical Model
The confirmatory factor analysis (CFA) model posits a linear relationship between a set of observed variables and underlying latent factors. The general structural equation is expressed as
\mathbf{Y} = \boldsymbol{\Lambda} \boldsymbol{\xi} + \boldsymbol{\varepsilon},
where \mathbf{Y} is a p \times 1 vector of observed variables, \boldsymbol{\Lambda} is a p \times k factor loading matrix linking the observed variables to the latent factors, \boldsymbol{\xi} is a k \times 1 vector of latent factors, and \boldsymbol{\varepsilon} is a p \times 1 vector of unique error terms.
Under this formulation, the error terms are assumed to have zero mean, E(\boldsymbol{\varepsilon}) = \mathbf{0}, and a diagonal covariance matrix, \text{Cov}(\boldsymbol{\varepsilon}) = \boldsymbol{\Theta}, reflecting uncorrelated unique variances across indicators. Similarly, the latent factors are assumed to have zero mean, E(\boldsymbol{\xi}) = \mathbf{0}, and covariance matrix \text{Cov}(\boldsymbol{\xi}) = \boldsymbol{\Phi}, which captures correlations among factors if they are not orthogonal.
The elements of the factor loading matrix \boldsymbol{\Lambda} quantify the direct effects or, in standardized form, correlations between each latent factor and its associated observed indicators; these are theoretically specified, with certain entries fixed at zero to enforce a measurement structure where indicators load only on designated factors.
From these specifications, the model implies a covariance structure for the observed variables given by
\boldsymbol{\Sigma} = \boldsymbol{\Lambda} \boldsymbol{\Phi} \boldsymbol{\Lambda}' + \boldsymbol{\Theta},
where \boldsymbol{\Sigma} is the p \times p population covariance matrix reproduced by the model parameters.
The degrees of freedom associated with the model equal the number of unique elements in the observed covariance matrix minus the number of free parameters to be estimated, yielding df = \frac{p(p+1)}{2} - q, where q counts the freely estimated elements of \boldsymbol{\Lambda}, \boldsymbol{\Phi}, and \boldsymbol{\Theta}.
For illustration, consider a simple two-factor CFA model with four observed variables Y_1, Y_2, Y_3, Y_4, where Y_1 and Y_2 load exclusively on the first factor \xi_1, and Y_3 and Y_4 load exclusively on the second factor \xi_2, with all cross-loadings constrained to zero. The loading matrix takes the form
\boldsymbol{\Lambda} = \begin{pmatrix} \lambda_{11} & 0 \\ \lambda_{21} & 0 \\ 0 & \lambda_{32} \\ 0 & \lambda_{42} \end{pmatrix},
leading to the system of equations Y_1 = \lambda_{11} \xi_1 + \varepsilon_1, Y_2 = \lambda_{21} \xi_1 + \varepsilon_2, Y_3 = \lambda_{32} \xi_2 + \varepsilon_3, Y_4 = \lambda_{42} \xi_2 + \varepsilon_4, and allowing for a nonzero covariance \phi_{12} between \xi_1 and \xi_2.
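The implied covariance structure for this example can be written out numerically. The following base R sketch uses illustrative parameter values, with both factor variances fixed to 1 for scale identification.

# Model-implied covariance matrix Sigma = Lambda Phi Lambda' + Theta
Lambda <- matrix(c(0.8, 0.0,
                   0.7, 0.0,
                   0.0, 0.9,
                   0.0, 0.6), nrow = 4, ncol = 2, byrow = TRUE)   # factor loadings
Phi   <- matrix(c(1.0, 0.3,
                  0.3, 1.0), nrow = 2)                            # factor covariance phi_12 = 0.3
Theta <- diag(c(0.36, 0.51, 0.19, 0.64))                          # unique error variances
Sigma <- Lambda %*% Phi %*% t(Lambda) + Theta                     # 4 x 4 implied covariance matrix

# Degrees of freedom: 4(4+1)/2 = 10 unique covariance elements minus
# 9 free parameters (4 loadings, 1 factor covariance, 4 error variances) = 1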
Identification Conditions
In confirmatory factor analysis (CFA), a model is identified if its parameters—typically the factor loading matrix \Lambda, the factor covariance matrix \Phi, and the error covariance matrix \Theta—can be uniquely estimated from the sample covariances of the observed variables. This ensures that the mapping from data to parameters is one-to-one, allowing for consistent and finite estimates under standard estimation methods like maximum likelihood.[32]
Necessary conditions for identification include the t-rule, which requires that the number of unique elements in the sample covariance matrix (i.e., p(p+1)/2, where p is the number of observed variables) is at least as large as the number of free parameters in the model. Additionally, scale identification must be imposed for each latent factor, as factor models are inherently scale-invariant; this is typically achieved by fixing one loading per factor to 1 or setting the factor variance to 1. These conditions are essential but not sufficient, as they only guarantee that the model is not underidentified in a basic counting sense.[33]
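In the R package lavaan, for example, the two common scaling conventions can be imposed as follows; the data set ability_data and indicators v1 to v3 are hypothetical.

library(lavaan)

model <- 'verbal =~ v1 + v2 + v3'

# Scale set by fixing the first loading to 1 (lavaan's default marker-variable method)
fit_marker <- cfa(model, data = ability_data)

# Alternative scaling: fix the factor variance to 1 and estimate all three loadings
fit_stdlv  <- cfa(model, data = ability_data, std.lv = TRUE)

# t-rule check: 3(3+1)/2 = 6 unique covariance elements versus 6 free parameters,
# so this one-factor, three-indicator model is just-identified (df = 0)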
Sufficient conditions provide stronger assurances of identifiability. For standard CFA models, the three-indicator rule states that a factor is identified if it is measured by at least three indicators, each loading exclusively on that factor, with no correlated errors among indicators and unconstrained factor covariances. For simpler cases, such as models with two indicators per factor, identification can be achieved if the error covariances are diagonal and there is at least one nonzero covariance between factors.[33]
Identification can be local or global. Local identification ensures a unique solution in the neighborhood of the true parameter values, often verified by checking the rank of the information matrix or using empirical tests like multiple starting values in estimation.[32] Global identification, which implies local identification, requires that the solution is unique across the entire parameter space and may involve exhaustive checks, such as testing vanishing tetrad differences (differences of products of pairs of covariances that the model implies are zero). Distinguishing between them is crucial, as local identification may hold while global identification does not, leading to potentially multiple, widely separated solutions.[33]
Underidentification occurs when these conditions fail, resulting in non-unique solutions where parameters cannot be distinguished from the data, often yielding infinitely many admissible solutions or improper values like negative error variances (known as Heywood cases). For instance, a standalone factor measured by only one or two indicators is typically underidentified; with a single indicator, the loading and error variance cannot be separated empirically.[32]
To address underidentification, analysts impose additional constraints, such as equality restrictions on loadings across groups or fixing cross-loadings to zero, while ensuring they are theoretically justified to avoid model misspecification. These strategies, when applied judiciously, can transform an underidentified model into an identified one without altering its substantive meaning.[33]
Assumptions and Estimation
Key Assumptions
Confirmatory factor analysis (CFA) relies on several key statistical and theoretical assumptions to ensure valid inference and model interpretation. One fundamental assumption is multivariate normality of the observed variables, which posits that the joint distribution of the indicators follows a multivariate normal distribution. This assumption is particularly critical when using maximum likelihood (ML) estimation, as violations can lead to biased parameter estimates and underestimated standard errors, potentially inflating Type I error rates in chi-square tests. For instance, data from Likert-scale items often exhibit non-normality due to their ordinal nature and bounded response options, which can distort factor loadings and covariances if treated as continuous.
Another core assumption is the independence of unique errors, meaning that the error terms (whose variances form the diagonal of the error covariance matrix Θ) are uncorrelated with each other and with the latent factors. This local independence implies that any observed correlations among indicators are fully explained by the common factors; violations, such as correlated residuals, may indicate omitted variables, misspecified factors, or unmodeled method effects that require model revision.
CFA also assumes a simple factor structure where indicators load primarily on one theorized factor, with no cross-loadings unless explicitly specified based on prior theory. This restriction enforces the hypothesized measurement model by constraining non-salient loadings to zero, promoting parsimony and interpretability; allowing untheorized cross-loadings can obscure the distinctiveness of factors and complicate model identification.
Adequate sample size is essential for reliable estimation and stable standard errors in CFA, with guidelines recommending a minimum of 5 to 10 cases per estimated parameter and an overall sample of at least 200 for basic models to achieve convergence and power. Smaller samples risk unstable solutions, especially in complex models with many parameters, while larger ones (e.g., over 400) enhance precision but may not be necessary for well-specified models.
Theoretically, CFA requires a priori model specification driven by substantive theory rather than exploratory data mining, ensuring that the factor structure, loadings, and covariances are hypothesized in advance to test specific measurement hypotheses. This contrasts with exploratory approaches and ties directly into estimation procedures, where violations of these assumptions may necessitate robust alternatives like bootstrapping to adjust standard errors and fit statistics under non-normality.
Estimation Procedures
Confirmatory factor analysis (CFA) parameters, including factor loadings, unique variances, and factor covariances, are estimated by minimizing a fit function that measures the discrepancy between the observed sample covariance matrix S and the model-implied covariance matrix \Sigma(\theta), where \theta denotes the parameter vector.[34]
The most widely used estimation method is maximum likelihood (ML), which assumes multivariate normality of the observed variables to ensure the consistency and asymptotic efficiency of the estimates.[35] Under this assumption, ML minimizes the fit function
F_{\text{ML}}(\theta) = \log|\Sigma(\theta)| + \text{tr}\left[S \Sigma(\theta)^{-1}\right] - \log|S| - p,
where p is the number of observed variables, yielding unbiased estimates in large samples.[35]
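A minimal base R sketch of this discrepancy function, assuming S and Sigma are both p × p positive-definite matrices, is:

# Maximum likelihood fit function F_ML for a sample covariance matrix S
# and a model-implied covariance matrix Sigma
F_ml <- function(S, Sigma) {
  p <- nrow(S)
  log(det(Sigma)) + sum(diag(S %*% solve(Sigma))) - log(det(S)) - p
}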
For non-normal or categorical data, where the normality assumption fails, weighted least squares (WLS) provides an asymptotically distribution-free alternative by weighting the discrepancies according to the inverse of the asymptotic covariance matrix of the sample statistics.[36] The WLS fit function is
F_{\text{WLS}}(\theta) = (s - \text{vech}(\Sigma(\theta)))' W^{-1} (s - \text{vech}(\Sigma(\theta))),
with s = \text{vech}(S) the vectorized form of the sample covariance matrix and W the weight matrix, often estimated from polychoric correlations for ordinal indicators.[36]
Diagonally weighted least squares (DWLS) approximates WLS for computationally intensive large models by using only the diagonal elements of W, reducing estimation time while maintaining reasonable performance for factor loadings and correlations in ordinal data analyses.[37]
To address non-normality in ML estimation, robust corrections such as the Satorra-Bentler scaled chi-square adjust the test statistic and standard errors by incorporating a scaling factor derived from the kurtosis of the data, improving the validity of inference in small or non-normal samples.[38] Additionally, bootstrapping generates empirical standard errors and confidence intervals by resampling the data with replacement and re-estimating the model multiple times, offering a non-parametric robust alternative without relying on normality.
Estimation typically proceeds iteratively using algorithms like Newton-Raphson or quasi-Newton methods, starting from initial values (e.g., from exploratory factor analysis) and continuing until convergence, defined by criteria such as a parameter change less than 0.001 or a gradient norm below a small threshold.[35]
For example, in a two-factor CFA model with three indicators per factor assessing verbal and spatial abilities, ML estimation might yield factor loadings of 0.85, 0.78, and 0.82 for verbal and 0.90, 0.88, and 0.75 for spatial, along with a factor correlation of 0.45, assuming a sample size of 500 and multivariate normality.
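In the R package lavaan, these estimation choices correspond to different estimator and standard-error options; the sketch below assumes a hypothetical data frame ability_data containing the six indicators.

library(lavaan)

model <- 'verbal  =~ v1 + v2 + v3
          spatial =~ s1 + s2 + s3'

fit_ml   <- cfa(model, data = ability_data)                        # ML under multivariate normality
fit_mlm  <- cfa(model, data = ability_data, estimator = "MLM")     # ML with Satorra-Bentler scaled chi-square
fit_ord  <- cfa(model, data = ability_data, estimator = "WLSMV",
                ordered = c("v1", "v2", "v3", "s1", "s2", "s3"))   # DWLS-based estimation for ordinal items
fit_boot <- cfa(model, data = ability_data, se = "bootstrap",
                bootstrap = 500)                                   # bootstrapped standard errors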
Model Fit Assessment
Absolute Fit Indices
Absolute fit indices assess the degree of correspondence between the observed sample covariance matrix S and the model-implied covariance matrix \Sigma implied by the confirmatory factor analysis model, without comparing to a null or baseline model. These indices provide a direct measure of absolute model fit, focusing on overall discrepancy rather than relative improvement.[39]
The chi-squared test statistic (\chi^2) serves as the foundational absolute fit measure in confirmatory factor analysis, testing the null hypothesis that the model-implied covariance matrix \Sigma reproduces the sample covariance matrix S exactly. It is computed as \chi^2 = (N-1) F_{ML}, where N is the sample size and F_{ML} is the minimum value of the maximum likelihood discrepancy function F_{ML} = \log|\Sigma| - \log|S| + \text{tr}(\Sigma^{-1} S) - p, with p denoting the number of observed variables. The degrees of freedom (df) are given by df = p(p+1)/2 - t, where t is the number of freely estimated parameters. A non-significant \chi^2 p-value (typically p > 0.05) suggests that the model fits the data adequately, indicating no significant discrepancy between S and \Sigma. However, the \chi^2 test has notable limitations: it is highly sensitive to sample size, often rejecting well-fitting models in large samples (N > 400) due to its asymptotic nature, and it assumes multivariate normality of the data, which can inflate the statistic under violations. To assess its power against misspecification, researchers conduct power analyses for the \chi^2 test, estimating the probability of detecting population discrepancies of varying magnitudes given the df and N.[39][7]
The root mean square error of approximation (RMSEA) quantifies the discrepancy per degree of freedom, adjusting for model parsimony and providing an estimate of fit in the population. It is calculated as \text{RMSEA} = \sqrt{ \frac{\max(\chi^2 - df,\, 0)}{df\,(N - 1)} }, with a 90% confidence interval often reported to assess precision; values close to zero indicate better approximation of the population covariance structure. RMSEA values below 0.06 suggest good fit, while those below 0.08 indicate acceptable fit, particularly when combined with other indices. This index is less sensitive to sample size than \chi^2 and does not assume exact fit, making it suitable for evaluating approximate model adequacy.[40][41]
The standardized root mean square residual (SRMR) measures the average size of standardized residuals between the observed and model-implied covariances, offering a straightforward absolute fit assessment. It is defined as \text{SRMR} = \sqrt{ \frac{1}{t} \sum_{i \leq j} \epsilon_{ij}^2 }, where t = p(p+1)/2 is the number of unique elements in the covariance matrix, and \epsilon_{ij} = (s_{ij} - \sigma_{ij}) / \sqrt{\sigma_{ii} \sigma_{jj}} are the standardized residuals. SRMR values below 0.08 indicate good model fit, as they reflect minimal average discrepancy after standardization for scale. Unlike unstandardized versions, SRMR facilitates comparability across models with differing variable scales.[42][40]
The goodness-of-fit index (GFI) and adjusted goodness-of-fit index (AGFI) evaluate the relative amount of observed covariances reproduced by the model. GFI is computed as \text{GFI} = 1 - \frac{\sum (s_{ij} - \sigma_{ij})^2}{\sum s_{ij}^2}, representing the proportion of variance and covariance in S accounted for by \Sigma; values above 0.90 suggest good fit. AGFI adjusts GFI for model parsimony using \text{AGFI} = 1 - \frac{ \frac{p(p+1)}{2} (1 - \text{GFI}) }{df}, where p is the number of observed variables, penalizing less parsimonious models; thresholds mirror GFI at >0.90 for acceptable fit.[43][44][41]
However, both indices are sensitive to non-normality in the data and do not perform well under violations of multivariate normality, leading to recommendations against their primary use in favor of more robust alternatives.[43]
Guidelines for interpreting absolute fit indices emphasize combining multiple measures for robust evaluation, as no single index is definitive. Hu and Bentler recommend RMSEA < 0.06 and SRMR < 0.08 alongside \chi^2 assessments, while cautioning against sole reliance on GFI/AGFI due to their limitations. These cutoffs, derived from Monte Carlo simulations under varied conditions, balance Type I and Type II error rates in model rejection decisions.[40]
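Assuming a fitted lavaan object fit, the absolute fit indices discussed above can be extracted directly, or RMSEA can be computed by hand from the chi-square statistic; the numeric values below are illustrative.

library(lavaan)

# Extract absolute fit indices from a fitted CFA object
fitMeasures(fit, c("chisq", "df", "pvalue", "rmsea",
                   "rmsea.ci.lower", "rmsea.ci.upper", "srmr", "gfi", "agfi"))

# RMSEA computed directly from the chi-square, its degrees of freedom, and N
chi2 <- 53.2; df <- 19; N <- 400
sqrt(max(chi2 - df, 0) / (df * (N - 1)))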
Relative Fit Indices
Relative fit indices, also known as incremental or comparative fit indices, evaluate the improvement in model fit relative to a baseline model, typically the null model assuming complete independence among observed variables with no correlations or factor structure. This comparison quantifies the proportional reduction in lack of fit achieved by the target confirmatory factor analysis model over the baseline, providing a measure of relative adequacy rather than absolute fit.
The Normed Fit Index (NFI), introduced by Bentler and Bonett, is calculated as:
\text{NFI} = \frac{\chi^2_{\text{null}} - \chi^2_{\text{model}}}{\chi^2_{\text{null}}}
where \chi^2_{\text{null}} is the chi-square statistic for the null model and \chi^2_{\text{model}} is that for the target model. Values range from 0 to 1, with higher values indicating better relative fit; Bentler and Bonett suggested that NFI values above 0.90 indicate good fit, while values around 0.80 may suggest acceptable fit in some contexts, though lower values like 0.75 signal poor relative improvement. However, NFI tends to underestimate fit in small samples (e.g., N < 200) due to its sensitivity to sample size and lack of parsimony adjustment, making it less reliable without corroboration from other indices.
The Non-Normed Fit Index (NNFI), also known as the Tucker-Lewis Index (TLI), addresses some limitations of the NFI by incorporating a penalty for model complexity through degrees of freedom. It is computed as:
\text{NNFI} = \frac{ (\chi^2_{\text{null}} / \text{df}_{\text{null}}) - (\chi^2_{\text{model}} / \text{df}_{\text{model}}) }{ (\chi^2_{\text{null}} / \text{df}_{\text{null}}) - 1 }
where df denotes degrees of freedom. Unlike the NFI, NNFI values can exceed 1 but generally fall between 0 and 1 for well-fitting models; thresholds above 0.95 are considered excellent, 0.90 to 0.95 acceptable, and below 0.90 indicative of poor relative fit. This index favors parsimonious models by rewarding improvements in the chi-square per degree of freedom ratio, making it particularly useful for comparing nested models in confirmatory factor analysis.
The Comparative Fit Index (CFI), developed by Bentler, builds on the NNFI to provide a more robust measure less affected by sample size variations. Its formula is:
\text{CFI} = 1 - \frac{ \max(\chi^2_{\text{model}} - \text{df}_{\text{model}},\, 0) }{ \max(\chi^2_{\text{model}} - \text{df}_{\text{model}},\, \chi^2_{\text{null}} - \text{df}_{\text{null}},\, 0) }
which normalizes the relative reduction in noncentrality while bounding values between 0 and 1. CFI values greater than 0.95 indicate excellent relative fit, 0.90 to 0.95 suggest good fit, and below 0.90 point to inadequate improvement over the baseline; for example, a CFI of 0.97 reflects strong relative performance, whereas 0.85 indicates substantial room for model refinement. Its resilience to sample size makes it one of the most widely recommended relative indices in confirmatory factor analysis applications.
To account for model parsimony, adjustments such as the Parsimony Normed Fit Index (PNFI) modify the NFI by multiplying it by the parsimony ratio, defined as df_model / df_null. Thus,
\text{PNFI} = \text{NFI} \times \frac{\text{df}_{\text{model}}}{\text{df}_{\text{null}}}
This penalizes overly complex models that achieve fit through excessive parameters, with values above 0.50 often considered acceptable for parsimonious relative fit, though it should be interpreted alongside unadjusted indices. Similar parsimony-corrected versions exist for other relative indices, emphasizing the trade-off between fit and simplicity in model evaluation.
In practice, relative fit indices should be used in conjunction with absolute fit measures to provide a comprehensive assessment, as they highlight incremental gains but do not evaluate standalone discrepancy. For instance, a model with NFI = 0.92, NNFI = 0.94, and CFI = 0.96 alongside good absolute fit supports retention, while values like NFI = 0.78, NNFI = 0.82, and CFI = 0.85 may warrant respecification even if absolute indices are marginal.
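The following base R sketch computes these indices from illustrative chi-square values for a target model and its independence (null) baseline; lavaan reports the same quantities through fitMeasures.

chi2_m <- 85;  df_m <- 40     # target model (illustrative values)
chi2_0 <- 900; df_0 <- 55     # independence (null) model

nfi  <- (chi2_0 - chi2_m) / chi2_0
tli  <- ((chi2_0 / df_0) - (chi2_m / df_m)) / ((chi2_0 / df_0) - 1)
cfi  <- 1 - max(chi2_m - df_m, 0) / max(chi2_m - df_m, chi2_0 - df_0, 0)
pnfi <- nfi * (df_m / df_0)

c(NFI = nfi, TLI = tli, CFI = cfi, PNFI = pnfi)
# In lavaan: fitMeasures(fit, c("nfi", "tli", "cfi", "pnfi"))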
Applications and Extensions
Practical Applications
Confirmatory factor analysis (CFA) is widely applied in psychology to validate personality scales, such as the Big Five inventory, where it confirms the underlying factor structure of traits like extraversion and neuroticism from self-report items.[45] For instance, CFA has been used to test the unidimensionality of the revised NEO Personality Inventory, supporting its five-factor model in normative samples despite some cross-loading issues.[46] Similarly, in assessing clinical tools like the Minnesota Multiphasic Personality Inventory (MMPI), CFA validates the restructured clinical scales by examining how items load onto latent dimensions of psychopathology, such as emotional dysfunction, in clinical populations.[47] In depression research, CFA evaluates the unidimensionality of measures like the Patient Health Questionnaire-9 (PHQ-9), often revealing deviations from a single-factor model across time points or populations, which informs scale refinement for accurate symptom assessment.
In the social sciences, CFA tests latent factors underlying attitudes toward policies, such as political trust in institutions, by confirming measurement equivalence across diverse regimes and enabling comparisons of economic beliefs or dissatisfaction dimensions from survey data.[48] For example, multi-group CFA has identified distinct factors for trust in central versus local governance, supporting policy analysis on public perceptions of economic reforms. In marketing, CFA analyzes survey data to validate constructs like brand loyalty, where items measuring repurchase intention and preference load onto latent factors influenced by brand awareness and image, as demonstrated in studies of consumer behavior in competitive markets like smartphones.[49] In health research, CFA confirms quality-of-life factors in patient-reported outcomes, such as the PROMIS framework, by verifying how physical, mental, and social health domains interrelate through item responses, with models showing improved fit when allowing cross-loadings for symptoms like fatigue.[50]
A practical case study in education involves applying CFA to test scores from reading achievement assessments, confirming whether items load onto a single "ability" factor to evaluate construct validity and inform curriculum policy decisions.[51] For instance, in analyzing reading achievement tests for third-year university English majors, CFA supported a one-factor model, validating the scale's use for assessing reading ability and informing instructional strategies. However, applications often face challenges like cross-cultural invariance testing, where CFA assesses configural invariance (same factor structure across groups) and metric invariance (equivalent item loadings), ensuring measures like personality scales are comparable internationally without bias.[52]
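A hedged lavaan sketch of such an invariance test, with a hypothetical data frame survey_data, grouping variable country, and items e1 to e4, fits the configural and metric models and compares them.

library(lavaan)

model <- 'extraversion =~ e1 + e2 + e3 + e4'

# Configural invariance: same factor structure, parameters free across groups
fit_configural <- cfa(model, data = survey_data, group = "country")

# Metric invariance: factor loadings constrained equal across groups
fit_metric <- cfa(model, data = survey_data, group = "country",
                  group.equal = "loadings")

# Chi-square difference test between the nested models
lavTestLRT(fit_configural, fit_metric)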
In the 2020s, recent trends integrate CFA with machine learning techniques for handling high-dimensional data, such as textual big data, where word embeddings serve as indicators in confirmatory models to validate latent constructs in large-scale surveys or social media analyses.[15] These hybrids enhance scalability for complex datasets, as seen in semi-confirmatory approaches that automate factor selection in sparse, high-dimensional settings. CFA findings can also extend briefly to structural equation modeling for inferring causal relationships among latent variables in applied contexts.
Software and Implementation
Several software packages facilitate the implementation of confirmatory factor analysis (CFA), catering to varying levels of user expertise and analytical needs. Mplus stands out for its flexibility in specifying advanced models, such as multilevel and latent growth analyses, and includes Bayesian estimation options for handling complex data structures.[53] The lavaan package in R serves as a robust, free, and open-source tool for CFA and broader structural equation modeling, enabling users to fit models with customizable estimation methods like maximum likelihood.[54] AMOS, an extension of SPSS, provides a user-friendly graphical user interface that allows model specification via path diagrams, ideal for researchers preferring visual over syntactic approaches. LISREL, one of the earliest dedicated programs for structural equation modeling, remains a historical standard for covariance-based CFA, though it requires more manual input compared to modern alternatives.
Implementing CFA in these software packages follows a structured workflow to ensure reliable results. First, specify the model by defining latent factors and their relationships to observed indicators, either through syntax in programs like lavaan or Mplus, or via diagrams in AMOS.[7] Second, input the data as raw observations or precomputed covariance or correlation matrices, depending on the software's capabilities and data characteristics. Third, estimate the model parameters using methods such as maximum likelihood, monitoring for convergence. Fourth, verify model identification by ensuring sufficient constraints (e.g., fixing one loading per factor to 1) and check for estimation convergence to avoid Heywood cases or negative variances. Fifth, assess overall model fit using indices like chi-square, RMSEA, and CFI. Sixth, if respecification is needed, add theoretically justified paths or covariances based on modification indices, rather than purely empirical adjustments.[7]
Interpreting CFA output focuses on key parameter estimates and diagnostics to validate the measurement model. Standardized factor loadings exceeding roughly 0.40 are typically treated as substantively meaningful associations between indicators and latent factors, provided they are statistically significant, with values closer to 0.70 or higher reflecting strong convergent validity. Modification indices, which quantify the expected decrease in chi-square if a fixed parameter is freed, can highlight potential improvements, such as residual covariances among errors; however, they must be interpreted cautiously to prevent capitalizing on chance and undermining model generalizability.
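Assuming a fitted lavaan object fit, standardized loadings and modification indices can be inspected as follows; the threshold used is a convention rather than a fixed rule.

library(lavaan)

standardizedSolution(fit)     # standardized loadings with standard errors and p-values
mod <- modindices(fit)        # expected chi-square decrease if each fixed parameter were freed
subset(mod, mi > 10)          # examine only substantial, theoretically defensible suggestions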
Best practices in CFA implementation emphasize transparency and robustness to enhance reproducibility and validity. Always report a comprehensive set of fit indices (e.g., CFI, RMSEA, SRMR) alongside parameter estimates, and justify sample size adequacy, as CFA generally requires at least 200–300 cases for stable estimation, with larger samples preferred for complex models. For missing data, employ full information maximum likelihood (FIML) estimation, available in most packages, to utilize all available information without listwise deletion bias.
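As a brief illustration, FIML is requested in lavaan through the missing argument; the model string and data set name are the same hypothetical ones used in the other sketches.

library(lavaan)

# Full information maximum likelihood: uses all available observations instead of listwise deletion
fit <- cfa(model, data = your_dataset, missing = "ml")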
To improve accessibility, free graphical interfaces like jamovi and JASP integrate CFA modules, often powered by lavaan, allowing point-and-click model building without coding. Recent software updates continue to expand capabilities; for instance, Mplus Version 9, released in October 2025, introduced enhancements for multilevel and three-level structural equation modeling, including applications to CFA.[55]
As an example, the following lavaan syntax specifies and fits a simple one-factor CFA model where a latent factor loads on three observed items:
library(lavaan)

# One-factor measurement model: three observed items load on a single latent factor
model <- 'factor1 =~ item1 + item2 + item3'
fit <- cfa(model, data = your_dataset)
summary(fit, standardized = TRUE, fit.measures = TRUE)
This code fixes the first loading to 1 for identification, estimates the remaining parameters, and outputs standardized loadings, variances, and fit measures for evaluation.[56]