Exploratory factor analysis

Exploratory factor analysis (EFA) is a multivariate statistical technique designed to uncover the underlying latent factors or constructs that explain patterns of intercorrelations among a set of observed variables, thereby reducing data dimensionality while revealing the structure of the data. It models the population factor structure using sample data to identify common variance shared by variables, distinguishing it from principal component analysis (PCA), which focuses on total variance reduction rather than latent constructs. EFA assumes linear relationships, multivariate normality, and sufficient sample sizes (typically at least 300 cases) to ensure reliable factor extraction and interpretation. Developed in the early 20th century, EFA traces its origins to Charles Spearman's 1904 work on general intelligence, in which he proposed a single latent factor accounting for correlations among mental test scores, laying the foundation for factor analytic methods in psychometrics. The technique gained prominence with advancements in statistical computing during the mid-20th century, enabling broader applications in the social sciences, education, and health research for theory building and scale development. Unlike confirmatory factor analysis (CFA), which tests predefined hypotheses about factor structures, EFA is exploratory, allowing factors to emerge from the data without prior specifications, making it ideal for initial instrument validation. In practice, EFA involves assessing data suitability, extracting initial factors, determining the optimal number of factors, and rotating the solution for interpretability, with researchers addressing potential issues such as weak loadings or poorly determined factors to ensure robust results. EFA plays a crucial role in establishing construct validity by providing evidence of an instrument's internal structure, such as grouping items into subscales for measures of psychological traits or attitudes. It is widely applied across disciplines, including the development of attitude surveys and the identification of factors influencing health behaviors. Recent guidelines emphasize reporting full extraction criteria, rotation decisions, and parallel analysis for factor retention to enhance transparency and reproducibility of findings.

Introduction and Background

Definition and Purpose

Exploratory factor analysis (EFA) is a multivariate statistical technique employed to uncover the underlying latent factors or constructs that account for the patterns of correlations observed among a set of manifest variables. By reducing the dimensionality of the data, EFA identifies a smaller number of factors that explain the interrelationships among the variables, thereby simplifying complex datasets while preserving essential information about their structure. The primary purpose of EFA is to partition the variance in the observed variables into common variance—shared across variables and attributable to the latent factors—and unique variance, which includes measurement error and specific factors unique to individual variables. This contrasts with principal component analysis, which seeks to explain the total variance through orthogonal linear combinations of the observed variables without distinguishing latent constructs or incorporating error terms. EFA serves key objectives such as generating hypotheses about underlying structures, summarizing data for interpretive clarity, and facilitating theory building in disciplines including psychology and marketing, where it aids in developing psychometric scales and understanding consumer behaviors. At its core, EFA operates within the general linear factor model, expressed as X = \Lambda F + \epsilon, where X represents the vector of observed variables, \Lambda is the matrix of factor loadings, F is the vector of latent factors, and \epsilon denotes the vector of unique factors or errors. As the exploratory counterpart to confirmatory factor analysis (CFA), which tests predefined hypotheses, EFA allows for flexible discovery of factor structures without prior specifications.

Historical Development

The roots of exploratory factor analysis trace back to 19th-century advancements in psychophysics and statistics, particularly Francis Galton's pioneering work on correlation as a measure of association between variables in human traits and measurements. Galton's development of the regression line and the correlation concept in the 1880s provided the foundational statistical tools for later multivariate analyses in psychology. A pivotal milestone occurred in 1904 when psychologist Charles Spearman introduced the two-factor theory of intelligence, positing a general factor (g) underlying cognitive abilities alongside specific factors, using early factor analytic techniques to explain intercorrelations among mental tests. This work, published in the American Journal of Psychology, marked the formal inception of factor analysis as a method to uncover latent structures in observed data. In the 1930s, Louis L. Thurstone advanced the field by developing multiple-factor analysis, challenging Spearman's emphasis on a single general factor and proposing rotation to identify primary mental abilities; his 1935 book, The Vectors of Mind, detailed the centroid method for extracting multiple factors from correlation matrices. During the 1940s and 1950s, statisticians, notably D. N. Lawley, formalized estimation procedures, with Lawley's 1940 introduction of maximum likelihood methods providing a probabilistic framework for estimating factor model parameters under normality assumptions. Post-World War II, Harry H. Harman's 1960 textbook, Modern Factor Analysis, synthesized these developments and popularized exploratory techniques among researchers through accessible expositions of rotation criteria and interpretation strategies. The 1960s and 1970s saw a shift from manual computations—reliant on graphical and iterative hand calculations—to computational implementations enabled by emerging statistical software, facilitating larger datasets and more efficient rotations. Key events include Karl G. Jöreskog's 1960s contributions to maximum likelihood estimation in factor analysis, which laid the groundwork for his development of the LISREL software in the early 1970s, bridging exploratory methods to confirmatory approaches. This evolution culminated in the 1970s transition toward confirmatory factor analysis, allowing hypothesis testing of predefined models.

Theoretical Foundations

The Factor Model

The classical linear factor model in exploratory factor analysis (EFA) posits that the observed covariance matrix \Sigma of p variables can be expressed as \Sigma = \Lambda \Phi \Lambda^T + \Psi, where \Lambda is a p \times m matrix of factor loadings, \Phi is an m \times m correlation matrix among the m common factors (with \Phi positive definite), and \Psi is a p \times p diagonal matrix of unique variances. This formulation assumes that the observed variables x arise from an underlying structure where x = \mu + \Lambda f + \epsilon, with \mu the mean vector, f the vector of latent factors (having mean zero and covariance \Phi), and \epsilon the vector of errors (with mean zero, covariance \Psi, and uncorrelated with f). The model, extended to multiple factors by Thurstone building on Spearman's two-factor theory, aims to parsimoniously account for the covariances among observed variables through a smaller number of latent factors. The loadings \lambda_{jk} represent the regression coefficients of the j-th observed variable on the k-th factor, indicating the strength and direction of the relationship between the variable and the factor. When variables and factors are standardized (as is typical in EFA applications) and the factors are uncorrelated, these loadings can be interpreted as correlations between the variables and the factors. The communality for the j-th variable, h_j^2, is the proportion of its variance explained by the factors, given by the j-th diagonal element of \Lambda \Phi \Lambda^T, or h_j^2 = \sum_{i=1}^m \sum_{k=1}^m \lambda_{ji} \phi_{ik} \lambda_{jk}. Conversely, the uniqueness \psi_j (the j-th diagonal element of \Psi) captures the variance specific to the j-th variable, including measurement error and variable-specific influences not shared with other variables. The primary goal of estimation in the factor model is to select \Lambda, \Phi, and \Psi such that the model-implied covariance matrix closely reproduces the sample covariance matrix, thereby maximizing the common variance explained by the factors while confining the residual variance to \Psi. This approach focuses on partitioning total variance into shared (common) and unique components, unlike principal component analysis (PCA), which provides an exact decomposition of the covariance matrix as \Sigma = V D V^T (where V contains the eigenvectors and D the eigenvalues) without distinguishing latent factors or error terms—treating all variance as systematic rather than modeling latent structures with uniqueness.
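
As a concrete illustration of this decomposition, the minimal sketch below (Python with NumPy; the loading and correlation values are purely hypothetical) computes communalities as the diagonal of \Lambda \Phi \Lambda^T and assembles the model-implied correlation matrix \Sigma = \Lambda \Phi \Lambda^T + \Psi for standardized variables.

```python
import numpy as np

# Hypothetical 2-factor model for 4 standardized variables (illustrative values only).
Lambda = np.array([[0.8, 0.1],
                   [0.7, 0.2],
                   [0.1, 0.9],
                   [0.2, 0.6]])          # p x m loading matrix
Phi = np.array([[1.0, 0.3],
                [0.3, 1.0]])             # m x m factor correlation matrix

# Communalities: diagonal of Lambda Phi Lambda^T.
common = Lambda @ Phi @ Lambda.T
h2 = np.diag(common)                     # proportion of variance explained by the factors
Psi = np.diag(1.0 - h2)                  # unique variances for standardized variables

# Model-implied correlation matrix: Sigma = Lambda Phi Lambda^T + Psi.
Sigma = common + Psi
print("communalities:", np.round(h2, 3))
print("model-implied correlations:\n", np.round(Sigma, 3))
```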

Key Assumptions

Exploratory factor analysis (EFA) relies on several core statistical assumptions to ensure the validity of the underlying model, which posits that the observed variables X can be expressed as a linear combination of common factors F and unique errors \epsilon, or X = \Lambda F + \epsilon. One fundamental assumption is multivariate normality of the observed variables, which is particularly critical for methods like maximum likelihood estimation to produce reliable parameter estimates and fit statistics. Violations of this assumption can lead to biased results, though estimation techniques that do not require normality, such as principal axis factoring, can mitigate these effects in non-normal data. Another key assumption is the linearity of relationships among the observed variables, meaning that the covariances or correlations captured in the data matrix reflect linear associations rather than nonlinear ones. EFA also assumes no extreme multicollinearity or singularity in the correlation or covariance matrix, which would indicate perfect linear dependencies among variables that prevent unique solutions. This is typically checked by ensuring the determinant of the correlation matrix is sufficiently large (e.g., greater than 0.00001) and that no variables are perfectly predictable from others. Additionally, the errors are assumed to exhibit homoscedasticity, implying constant variance across levels of the factors, which supports the stability of the factor structure. For the factor solution to be meaningful, there must be sufficient intercorrelation among the variables (e.g., at least some correlations exceeding 0.30), ensuring they are not entirely orthogonal and thus capable of forming coherent factors. Regarding factorial assumptions, the common factors F are assumed to be uncorrelated with the unique errors \epsilon, formally expressed as \text{Cov}(F, \epsilon) = 0, which ensures that the factors explain variance independently of measurement-specific noise. Furthermore, the unique errors are assumed to be uncorrelated with each other, represented by a diagonal matrix \Psi in the factor model, where off-diagonal elements are zero. This diagonal structure of \Psi implies that any remaining variance after extracting common factors is attributable to unique sources without cross-contamination. Sample size requirements are essential for stable factor solutions in EFA, with a minimum of 5 to 10 cases per variable recommended to achieve reliable correlations and avoid unstable estimates. Ideally, total sample sizes of 100 to 200 or more are preferred, particularly when communalities are low or factors are weak, as smaller samples increase the risk of distorted loadings. For handling violations such as non-normality, robust estimation methods can be employed to maintain validity without strict adherence to distributional assumptions.

Data Preparation and Suitability

Assessing Data Adequacy

Before conducting exploratory factor analysis (EFA), it is essential to evaluate the suitability of the dataset using empirical diagnostics that assess whether the correlations among variables are sufficiently strong and the sample is adequate for identifying underlying factors. These tests help ensure that the data meet the preconditions for meaningful factor extraction, preventing issues such as unstable solutions or spurious factors. The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy quantifies the proportion of variance among variables that may be common variance, indicating the appropriateness of the data for factor analysis. The KMO is calculated as KMO = \frac{\sum_{i \neq j} r_{ij}^2}{\sum_{i \neq j} r_{ij}^2 + \sum_{i \neq j} p_{ij}^2}, where r_{ij} represents the correlation coefficients and p_{ij} the partial correlation coefficients between variables i and j. Values greater than 0.6 generally suggest that the data are suitable for EFA, with thresholds interpreted as follows: above 0.9 (marvelous), 0.8–0.9 (meritorious), 0.7–0.8 (middling), 0.6–0.7 (mediocre), 0.5–0.6 (miserable), and below 0.5 (unacceptable), the last signaling inadequate data for proceeding. Bartlett's test of sphericity examines whether the correlation matrix significantly differs from an identity matrix, in which variables are uncorrelated—a condition that would render factor analysis inappropriate. The test statistic is approximated by \chi^2 = -\left( n-1 - \frac{2p+5}{6} \right) \ln |R|, with n as the sample size, p as the number of variables, and |R| as the determinant of the correlation matrix; a significant result (typically p < 0.05) indicates the presence of substantial correlations, supporting the use of EFA and relating to the assumption of non-sphericity in the data. Additional diagnostics include the determinant of the correlation matrix, which should exceed 0.00001 to avoid singularity and multicollinearity issues that could invalidate the analysis; values close to zero suggest linear dependencies among variables. Anti-image diagnostics, derived from the inverse of the correlation matrix, provide per-variable measures of sampling adequacy that should generally be above 0.5 to confirm each variable's contribution to the factor structure. Initial communality estimates, often based on squared multiple correlations, provide preliminary indicators of how much variance each variable shares with others, with values below 0.4 potentially flagging problematic items. Guidelines for sample size emphasize a minimum of 100–200 observations for stable results, with a preferred ratio of at least 5–10 cases per variable to ensure reliable correlation estimates and robust factor solutions. KMO values below 0.5, combined with a non-significant Bartlett's test or poor ancillary diagnostics, indicate inadequate data—often due to insufficient sample size, weak inter-variable relationships, or excessive noise—necessitating additional data collection or variable refinement before extraction.
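
Both diagnostics can be computed directly from a correlation matrix; the sketch below (Python with NumPy and SciPy, written for this article rather than taken from any particular package) follows the formulas above, obtaining the partial correlations from the inverse of R.

```python
import numpy as np
from scipy.stats import chi2

def kmo_bartlett(R, n):
    """Data-adequacy diagnostics from a correlation matrix R and sample size n."""
    p = R.shape[0]
    R_inv = np.linalg.inv(R)
    # Partial correlations: p_ij = -R_inv[i, j] / sqrt(R_inv[i, i] * R_inv[j, j])
    d = np.sqrt(np.outer(np.diag(R_inv), np.diag(R_inv)))
    partial = -R_inv / d
    off = ~np.eye(p, dtype=bool)
    kmo = np.sum(R[off] ** 2) / (np.sum(R[off] ** 2) + np.sum(partial[off] ** 2))
    # Bartlett's test of sphericity against an identity correlation matrix.
    chi_sq = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    p_value = chi2.sf(chi_sq, df)
    return kmo, chi_sq, df, p_value
```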

Preprocessing Techniques

Before conducting exploratory factor analysis (EFA), data must be preprocessed to ensure the input matrix is suitable for factor extraction, typically focusing on the correlation or covariance matrix as the primary input. For continuous variables measured on comparable scales, the Pearson correlation matrix is commonly used, as it standardizes variables by accounting for differences in units and variances, thereby preventing variables with larger scales from disproportionately influencing the analysis. When variables are measured on different scales, the correlation matrix is preferred over the covariance matrix to equalize their contributions, avoiding bias toward higher-variance items. For ordinal data, such as Likert-scale responses, polychoric correlations are recommended over Pearson correlations, as they better approximate the underlying continuous latent structure and yield more accurate factor loadings and model fit in EFA simulations. For binary categorical variables, tetrachoric correlations serve a similar purpose, estimating the correlation under an assumed bivariate normal distribution for latent continuous traits. Handling missing data is crucial in EFA preprocessing to minimize bias, particularly in small samples where case loss can distort the factor structure. Listwise deletion, which removes all cases with any missing values, is a simple default method but can lead to substantial sample reduction and biased estimates unless data are missing completely at random (MCAR); it is generally discouraged in small samples (e.g., N < 200) due to reduced power and increased standard errors. Mean imputation, replacing missing values with variable means, is another basic approach but introduces bias by underestimating variances, especially with non-MCAR mechanisms or higher missingness rates (>5%). Multiple imputation, which generates multiple plausible datasets using methods like predictive mean matching (PMM) and pools results via Rubin's rules, outperforms deletion and single imputation by preserving relationships and reducing bias in EFA factor recovery, even with 25% missingness in samples as small as N = 60. Full information maximum likelihood (FIML) offers an alternative that estimates the covariance matrix directly without imputation, providing unbiased results under missing-at-random assumptions, though multiple imputation with PMM is often superior for exploratory contexts with small samples. Variable selection and scaling further refine the dataset by eliminating irrelevant or problematic items and ensuring comparability. Items with very low variance (e.g., standard deviation < 0.5 on a typical rating scale) should be removed prior to EFA, as they contribute little to the covariance structure and can inflate uniqueness estimates without adding meaningful information. When variables are on differing scales (e.g., one 1–5, another 0–100), standardization—dividing each variable by its standard deviation—is essential if using a covariance matrix, but the correlation matrix inherently standardizes, making it the default choice for heterogeneous scales. For categorical variables, beyond correlation adjustments, selection involves checking item discrimination; poorly discriminating items (e.g., correlating < 0.30 with all other items) are typically excluded to enhance factor clarity. Sample considerations are integral to preprocessing, as inadequate sample size can undermine EFA reliability regardless of other preparations. The sample must be representative of the population, avoiding convenience sampling biases that skew factor loadings. A minimum subjects-to-variables ratio of 10:1 is widely recommended for stable results, though 20:1 or higher (e.g., N ≥ 300 for 15 variables) is optimal, especially with low communalities (< 0.40) or many factors; ratios below 5:1 often yield unstable solutions and poor replicability. These guidelines stem from simulations showing that smaller ratios increase factor indeterminacy, though requirements lessen with strong data (high communalities > 0.70). Preprocessing may also reference initial suitability tests, such as the Kaiser-Meyer-Olkin measure, to confirm adequacy before finalizing the dataset.
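
A minimal preprocessing sketch along these lines is shown below (Python with NumPy; the thresholds are the conventional values cited above, not fixed rules): it applies listwise deletion, screens out near-constant items, warns when the subjects-to-variables ratio falls below 10:1, and returns a Pearson correlation matrix as the EFA input.

```python
import numpy as np

def prepare_for_efa(X, min_sd=0.5, min_ratio=10):
    """Basic preprocessing sketch: listwise deletion, low-variance screening,
    a subjects-to-variables check, and a Pearson correlation matrix as EFA input."""
    X = np.asarray(X, dtype=float)
    X = X[~np.isnan(X).any(axis=1)]            # listwise deletion (simple default; see caveats above)
    keep = X.std(axis=0, ddof=1) >= min_sd     # drop near-constant items
    X = X[:, keep]
    n, p = X.shape
    if n / p < min_ratio:
        print(f"warning: {n}/{p} = {n/p:.1f} cases per variable (< {min_ratio}:1)")
    R = np.corrcoef(X, rowvar=False)           # correlation matrix equalizes differing scales
    return R, keep
```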

Factor Extraction Methods

Principal Axis Factoring

Principal axis factoring (PAF) is an iterative extraction method in exploratory factor analysis that estimates the common factor structure by adjusting for unique variances through communality estimates placed on the diagonal of the correlation matrix. Unlike methods that assume full variance contribution from all variables, PAF begins with initial communality estimates derived from squared multiple correlations (SMCs), which approximate the shared variance for each variable based on its correlations with the others. These SMCs are calculated as h_j^2 = 1 - 1/(R^{-1})_{jj}, where R is the sample correlation matrix and (R^{-1})_{jj} is the j-th diagonal element of its inverse, so that 1/(R^{-1})_{jj} represents the proportion of unique variance. The resulting reduced matrix, with communalities replacing the diagonal unities, undergoes eigenvalue decomposition to yield initial factor loadings, and the process iterates to refine these estimates until stability. This approach, developed from early work by Thomson and Hotelling and formalized in subsequent psychometric literature, prioritizes the latent common factors while explicitly modeling measurement error and specific variances. The iterative algorithm for PAF follows these steps (a code sketch appears after the list):
  1. Compute the initial uniqueness diagonal matrix \Psi_0 with elements \psi_{0j} = 1 - h_j^2, using SMCs as initial communalities.
  2. Form the reduced correlation matrix \tilde{R} = R - \Psi_0.
  3. Perform eigenvalue decomposition on \tilde{R} = \tilde{V} \tilde{\Lambda} \tilde{V}', where \tilde{\Lambda} contains the eigenvalues \tilde{\lambda}_k and \tilde{V} the corresponding eigenvectors \tilde{v}_k; compute the loadings as \tilde{\ell}_{jk} = \tilde{\lambda}_k^{1/2} \tilde{v}_{jk} for the first m factors.
  4. Update the uniqueness estimates as \psi_j = 1 - \sum_{k=1}^m \tilde{\ell}_{jk}^2 for each variable j, forming a new \Psi.
  5. Repeat steps 2–4, substituting the updated \Psi, until convergence, typically defined by the maximum change in communality estimates being less than 0.001.
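
The sketch below (Python with NumPy, assuming a correlation matrix R and a chosen number of factors m) implements these steps directly; it is illustrative rather than a substitute for production implementations such as those in the R psych package.

```python
import numpy as np

def principal_axis_factoring(R, m, tol=1e-3, max_iter=100):
    """Iterative PAF sketch following the steps above (R: correlation matrix, m: factors)."""
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))       # initial communalities: squared multiple correlations
    for _ in range(max_iter):
        R_reduced = R.copy()
        np.fill_diagonal(R_reduced, h2)              # reduced correlation matrix
        eigval, eigvec = np.linalg.eigh(R_reduced)
        idx = np.argsort(eigval)[::-1][:m]           # m largest eigenvalues
        loadings = eigvec[:, idx] * np.sqrt(np.clip(eigval[idx], 0, None))
        h2_new = np.sum(loadings ** 2, axis=1)       # updated communalities
        if np.max(np.abs(h2_new - h2)) < tol:        # convergence on communality change
            h2 = h2_new
            break
        h2 = h2_new
    return loadings, h2
```
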
PAF offers distinct advantages in applications involving imperfect measures, such as psychological assessments, where unique variances from error or specific factors are prevalent; by partitioning variance into common and unique components, it provides a more theoretically grounded representation of underlying constructs than total-variance methods. This explicit accounting for unique variance enhances the interpretability of factors as latent variables driving the observed correlations. In comparison to principal component analysis (PCA), which decomposes the full correlation matrix with unities on the diagonal to capture total variance, PAF uses the reduced matrix and thus produces smaller eigenvalues that reflect only common variance, often requiring fewer factors to explain the shared structure while excluding noise from unique sources.

Maximum Likelihood Estimation

Maximum likelihood (ML) estimation serves as a parametric method for factor extraction in exploratory factor analysis, treating the observed variables as multivariate normally distributed and estimating the factor loading matrix and the diagonal matrix of unique variances by maximizing the likelihood of the observed data given the factor model. This approach assumes multivariate normality of the variables, which underpins the probabilistic framework for parameter estimation. The ML framework maximizes the log-likelihood \ln L = -\frac{N}{2}\left[ p \ln(2\pi) + \ln|\Sigma| + \operatorname{tr}(\Sigma^{-1} S) \right], where N is the sample size, p is the number of variables, S is the sample covariance matrix, and \Sigma = \Lambda \Lambda^T + \Psi is the model-implied covariance matrix. Equivalently, estimation minimizes the ML discrepancy function F_{ML} = \ln|\Sigma| + \operatorname{tr}(S \Sigma^{-1}) - \ln|S| - p, typically through iterative optimization algorithms such as Newton-Raphson, which solve the necessary conditions for the maximum by updating parameter estimates based on first- and second-order derivatives of the likelihood. A key strength of ML estimation is its provision of a chi-square goodness-of-fit test statistic, \chi^2 = (N-1) F_{ML}, which assesses the discrepancy between the sample covariance matrix and the model-implied covariance under the null hypothesis that the factor model holds, distributed asymptotically as chi-square with degrees of freedom equal to \frac{(p-t)^2 - (p+t)}{2} for t factors. Additionally, ML yields standard errors for the estimated parameters, derived from the inverse of the observed information matrix, facilitating inference about the reliability of the loadings and uniquenesses. Despite these advantages, ML estimation is sensitive to violations of multivariate normality, leading to biased parameter estimates and inflated chi-square statistics when data exhibit skewness, kurtosis, or other departures from normality. It is also prone to Heywood cases, where estimated unique variances become negative (equivalently, communalities exceed 1), often signaling model misspecification, small sample sizes, or overfactoring, which can disrupt convergence and interpretability.
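
Given a fitted loading matrix and unique variances, the discrepancy function and chi-square test can be computed as in the following sketch (Python with NumPy and SciPy; the inputs are assumed to come from a prior extraction step, and an orthogonal model \Sigma = \Lambda \Lambda^T + \Psi is assumed).

```python
import numpy as np
from scipy.stats import chi2

def ml_fit_statistics(S, Lambda, Psi_diag, n):
    """ML discrepancy F_ML and chi-square test for a fitted orthogonal factor model.
    S: sample covariance/correlation matrix; Lambda: p x t loadings; Psi_diag: unique variances."""
    p, t = Lambda.shape
    Sigma = Lambda @ Lambda.T + np.diag(Psi_diag)        # model-implied covariance
    Sigma_inv = np.linalg.inv(Sigma)
    _, logdet_Sigma = np.linalg.slogdet(Sigma)
    _, logdet_S = np.linalg.slogdet(S)
    F_ml = logdet_Sigma + np.trace(S @ Sigma_inv) - logdet_S - p
    chi_sq = (n - 1) * F_ml
    df = ((p - t) ** 2 - (p + t)) / 2                    # model degrees of freedom
    return F_ml, chi_sq, df, chi2.sf(chi_sq, df)
```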

Determining the Number of Factors

Eigenvalue-Based Criteria

One of the primary eigenvalue-based criteria for determining the number of factors to retain in exploratory factor analysis is the Kaiser criterion, which recommends retaining only those factors whose eigenvalues exceed 1.0 when derived from the correlation matrix. Proposed by Henry Kaiser in 1960, this rule posits that factors with eigenvalues greater than 1 explain more total variance than an individual original variable, as each standardized variable contributes an eigenvalue of 1 to the correlation matrix underlying the structure. The criterion is straightforward to apply and has been widely adopted due to its simplicity in computational settings. The theoretical basis for the eigenvalue threshold of 1 originates from Guttman's lower bound criterion, established in 1954, which asserts that the minimum number of common factors in a population equals the number of eigenvalues greater than or equal to 1. Guttman demonstrated that eigenvalues below 1 represent trivial contributions, as they account for less variance than a single standardized variable, thereby serving as a conservative estimate to avoid underfactoring while bounding the factor space from below. This latent roots criterion aligns closely with Kaiser's practical guideline, and the two are often referred to jointly as the Kaiser-Guttman rule. In addition to absolute eigenvalue thresholds, researchers frequently consider the cumulative or individual variance explained by retained factors as a complementary eigenvalue-derived criterion. Common guidelines suggest retaining factors until they collectively account for 70–80% of the total variance, or ensuring that each retained factor explains at least 10% for the first factor and progressively less (e.g., 5–9%) for subsequent ones, to balance parsimony with explanatory power. These thresholds help interpret eigenvalues in terms of practical significance, as the proportion of variance explained by the i-th factor is given by \lambda_i / p, where p is the number of variables. Despite its popularity, the Kaiser rule has faced significant critique for overextraction, particularly in analyses with large sample sizes or many variables, where it may retain spurious factors by inflating the number beyond the true underlying structure. Simulations have shown that this tendency arises because the rule does not account for sampling variability or the dimensionality of the data, leading to unreliable decisions in non-ideal conditions. In principal axis factoring, eigenvalues are obtained through the eigendecomposition of the correlation matrix modified by replacing its diagonal elements with estimated communalities, which are iteratively refined during the extraction process; the sum of these eigenvalues equals the total sum of communalities across all variables. Graphical plots of eigenvalues can provide supplementary intuition for applying these criteria, though numerical thresholds remain the core focus.
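
The rule and the associated variance proportions reduce to a few lines of code; the sketch below (Python with NumPy) applies the Kaiser-Guttman cutoff to the eigenvalues of a correlation matrix and reports \lambda_i / p together with its cumulative sum.

```python
import numpy as np

def kaiser_retention(R):
    """Apply the Kaiser-Guttman rule and report variance proportions (R: correlation matrix)."""
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
    n_retain = int(np.sum(eigvals > 1.0))            # eigenvalue-greater-than-one rule
    proportion = eigvals / R.shape[0]                # variance explained by each factor: lambda_i / p
    cumulative = np.cumsum(proportion)
    return n_retain, eigvals, proportion, cumulative
```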

Graphical and Structural Methods

Graphical methods for determining the number of factors in exploratory factor analysis rely on visualizing the eigenvalues obtained from factor extraction techniques, such as principal axis factoring or principal components analysis, to identify natural breaks in their decline. These approaches emphasize subjective or semi-automated inspection of plots to discern where the explained variance stabilizes, aiding researchers in retaining interpretable factors without rigid numerical thresholds. One of the most widely adopted graphical tools is the scree test, proposed by Raymond B. Cattell in 1966. This method involves plotting the eigenvalues in decreasing order on the y-axis against their corresponding factor numbers on the x-axis, resulting in a curve that typically descends steeply at first before flattening. Researchers are instructed to retain the factors up to the point known as the "elbow," where the plot's slope markedly decreases, indicating diminishing returns from additional factors. This visual inspection helps balance parsimony and explanatory power, though it requires judgment to locate the elbow precisely. The scree plot's simplicity has made it a staple in psychological research, particularly for datasets with clear structural breaks. To address the subjectivity inherent in elbow identification, extensions like the optimal coordinates and acceleration factor methods provide more objective graphical aids by quantifying breaks in the eigenvalue curve. The optimal coordinates approach extrapolates a line through the initial eigenvalues and identifies the point where subsequent eigenvalues deviate significantly from this line, marking a structural shift. Complementing this, the acceleration factor computes the second differences (accelerations) of the eigenvalues, highlighting the point where the curve's rate of change slows most abruptly, akin to the elbow's location. These techniques, formalized as non-graphical solutions to the scree test, can be overlaid on the plot for enhanced visualization, improving reliability in retention decisions across varied sample sizes and structures. Shifting to structural methods, the Very Simple Structure (VSS) criterion, developed by William Revelle and Timothy Rocklin in 1979, evaluates factor solutions based on the simplicity and interpretability of the rotated loading matrix rather than eigenvalues alone. VSS maximizes the proportion of loadings that are either high (close to 1 or -1) or low (close to 0) per variable, promoting a "simple structure" in which each observed variable loads prominently on few factors. For each potential number of factors, an index is computed by fitting the loading matrix to this ideal and assessing the fit; higher VSS values indicate better simplicity. Plots of VSS indices across increasing factor numbers typically peak at the optimal count, providing a graphical complement to traditional scree plots by incorporating rotation outcomes. This method is particularly valuable for oblique rotations, where correlated factors enhance interpretability. Interpretation of these graphical and structural methods involves subjective judgment for scree and optimal-coordinates elbows, guided by theory and replication across subsets, while VSS plots offer a more objective peak for comparison. Researchers often combine these tools—examining eigenvalue declines visually alongside simplicity fits—to arrive at a consensus on the number of factors, ensuring the solution captures meaningful latent structure without overextraction.
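
For the acceleration factor specifically, a minimal numerical sketch is given below (Python with NumPy); it computes second differences of the sorted eigenvalues and treats the factor count just before the sharpest bend as the elbow, which is one common reading of this criterion rather than a definitive rule.

```python
import numpy as np

def acceleration_factor(eigvals):
    """Non-graphical elbow aid: second differences of the sorted eigenvalues.
    The factor number preceding the largest acceleration approximates the scree elbow."""
    e = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    accel = e[:-2] - 2 * e[1:-1] + e[2:]      # acceleration at interior positions 2..p-1
    elbow = int(np.argmax(accel)) + 1         # factors retained before the sharpest bend
    return elbow, accel
```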

Statistical Comparison Approaches

Statistical comparison approaches in exploratory factor analysis (EFA) provide quantitative criteria for selecting the number of factors by evaluating model fit or residual correlations directly from the sample data, offering a rigorous alternative to heuristic or visual methods. These techniques often rely on nested model comparisons or partial correlation analyses to assess how well increasing the number of factors reduces unexplained variance without overfitting. For instance, in maximum likelihood (ML) estimation, fit statistics such as the chi-square test can be used to compare models with differing numbers of factors, where a significant chi-square difference indicates that adding a factor improves fit beyond chance. Model comparison techniques, including likelihood ratio tests and information criteria like the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), are particularly suited for nested EFA models to determine the optimal number of factors. The likelihood ratio test involves computing a chi-square difference between models with k and k-1 factors; a non-significant difference suggests that the additional factor does not justify the loss of parsimony. Similarly, AIC penalizes model complexity less severely than BIC, which imposes a stronger penalty for extra parameters, making BIC preferable for larger samples to avoid overextraction. These approaches evaluate the trade-off between goodness-of-fit and parsimony, with BIC often yielding more conservative estimates in EFA applications. Velicer's minimum average partial (MAP) test offers a distinct statistical approach by focusing on the average squared partial correlations remaining after extracting successive factors, aiming to identify the point where common variance is most effectively captured. Introduced as a procedure to minimize the average of the squared off-diagonal elements of the partial correlation matrix after principal components extraction, the MAP test proceeds by iteratively factoring the correlation matrix, computing partial correlations among variables with the effects of the retained components removed, and selecting the number of factors at which this average reaches its minimum value. This minimum signals that further factors primarily account for unique variance rather than shared structure, providing a data-driven cutoff without relying on eigenvalues. The test performs robustly across various factor loading patterns and sample sizes, though it assumes continuous data with Pearson correlations. The efficacy of statistical comparison approaches is enhanced by considering convergence across multiple tests, as no single method is universally superior. For example, combining the MAP test with other criteria, such as parallel analysis, yields higher accuracy in retaining the correct number of factors, particularly when the ratio of variables to factors is high or when factors are moderately correlated. A seminal simulation study demonstrated that MAP and parallel analysis together outperformed eigenvalue-based rules like the Kaiser criterion in over 80% of conditions tested, emphasizing the value of methodological agreement for reliable decisions in EFA. Courtney's procedures extend these statistical comparisons to handle ordinal data more appropriately by incorporating polychoric correlations, which estimate underlying continuous relationships from categorical variables, while reverting to Pearson correlations for continuous data. This tailored approach integrates the MAP test and other retention criteria within software such as the R-Menu for ordinal factor analysis, allowing users to specify correlation types and evaluate factor retention criteria in a unified framework. For ordinal data common in survey research, using polychoric matrices with the MAP test or information criteria reduces bias in factor number estimation compared to standard Pearson-based analyses, ensuring the procedures align with the data's distributional assumptions.
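
A compact sketch of Velicer's MAP test is given below (Python with NumPy, using squared partial correlations as in the original formulation; it assumes a well-conditioned Pearson or polychoric correlation matrix supplied by the user).

```python
import numpy as np

def velicer_map(R, max_factors=None):
    """Velicer's minimum average partial (MAP) test sketch on a correlation matrix R."""
    p = R.shape[0]
    max_factors = max_factors or p - 2
    eigval, eigvec = np.linalg.eigh(R)
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    loadings = eigvec * np.sqrt(np.clip(eigval, 0, None))   # principal component loadings
    off = ~np.eye(p, dtype=bool)
    avg_sq = [np.mean(R[off] ** 2)]                          # baseline: zero components extracted
    for m in range(1, max_factors + 1):
        A = loadings[:, :m]
        C = R - A @ A.T                                      # partial covariance after m components
        d = np.sqrt(np.outer(np.diag(C), np.diag(C)))
        partial = C / d                                      # partial correlation matrix
        avg_sq.append(np.mean(partial[off] ** 2))
    return int(np.argmin(avg_sq)), avg_sq                    # factor count at the minimum average
```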

Parallel Analysis and Simulations

Parallel analysis, introduced by John Horn in 1965, is a simulation-based method for determining the number of factors to retain in exploratory factor analysis by comparing the eigenvalues from the sample data to those generated from random data matrices of the same dimensions. The procedure involves generating multiple random matrices that match the sample size and number of variables, assuming no underlying factor structure, and computing their eigenvalues. Factors are retained in the original analysis if their corresponding eigenvalues exceed the 95th percentile (or a similar threshold) of the eigenvalues from the random matrices, providing an empirical benchmark to distinguish true factors from noise. A refinement of this approach, proposed by Ruscio and Roche in 2012, enhances accuracy through the comparison data method, which uses simulations to generate datasets from known factorial structures rather than purely random noise, and compares the full set of eigenvalues rather than just their means. This method better accounts for the specific characteristics of the observed data, such as variable intercorrelations, leading to more precise retention decisions in complex scenarios. In practice, parallel analysis is implemented by simulating 100 to 1000 random datasets to ensure stable estimates, with higher numbers providing greater reliability at the cost of computational time; many software packages, such as R's psych package or SPSS syntax implementations, include options to adjust for finite sample sizes by rescaling eigenvalues or incorporating sampling variability. These simulations help mitigate biases that arise in smaller samples, where random eigenvalues tend to be larger. Parallel analysis offers key advantages over simpler eigenvalue-based criteria like the Kaiser rule, as it explicitly handles the effects of sample size and variable-to-factor ratios on eigenvalue distributions, demonstrating superior performance in simulation studies across diverse conditions. Empirical evidence consistently shows it reduces overextraction of factors compared to the Kaiser rule, which often retains too many trivial components, particularly in non-normal data or smaller samples. Recent advances include approaches that leverage machine learning for factor retention (Goretzko & Bühner, 2023) and methods minimizing out-of-sample prediction error (Preacher et al., 2023). A 2025 review emphasizes the need for method convergence in practice (Goretzko et al., 2025).
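
The core procedure is straightforward to simulate; the sketch below (Python with NumPy, using normally distributed random data and a 95th-percentile threshold) illustrates Horn-style parallel analysis on a raw data matrix X.

```python
import numpy as np

def parallel_analysis(X, n_sim=500, percentile=95, seed=0):
    """Horn-style parallel analysis sketch: retain factors whose sample eigenvalues
    exceed the chosen percentile of eigenvalues from random normal data of the same shape."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    sample_eig = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    random_eig = np.empty((n_sim, p))
    for s in range(n_sim):
        Z = rng.standard_normal((n, p))                      # random data with no factor structure
        random_eig[s] = np.sort(np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)))[::-1]
    threshold = np.percentile(random_eig, percentile, axis=0)
    n_factors = int(np.sum(sample_eig > threshold))          # count of eigenvalues above the benchmark
    return n_factors, sample_eig, threshold
```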

Factor Rotation Techniques

Orthogonal Rotation

Orthogonal rotation methods in exploratory factor analysis transform the initial factor solution to improve interpretability while preserving the uncorrelated nature of the factors. The primary purpose is to achieve simple structure, a configuration in which the factor loading matrix exhibits high loadings on one or a few factors per variable and near-zero loadings elsewhere, facilitating clearer interpretation of the underlying dimensions. This approach maximizes the variance of the squared loadings, promoting patterns that align with theoretical expectations of distinct, non-overlapping constructs. The varimax rotation, introduced by Henry Kaiser in 1958, is the most commonly applied orthogonal method due to its effectiveness in producing balanced loading patterns across variables and factors. It operates by maximizing the following criterion: V = \sum_{j=1}^{m} \left( \sum_{i=1}^{p} p_{ij}^{4} - \frac{1}{p} \left( \sum_{i=1}^{p} p_{ij}^{2} \right)^{2} \right) where p_{ij} represents the rotated loadings, p is the number of variables, and m is the number of factors. This formulation equalizes the influence of variables, avoiding bias toward those with higher communalities, and has been widely adopted for its computational efficiency and robustness in psychometric applications. In contrast, the quartimax rotation, proposed by Neuhaus and Wrigley in 1954, emphasizes row simplicity in the loading matrix by minimizing the number of factors required to account for each variable's variance. It achieves this by maximizing: Q = \sum_{i=1}^{p} \sum_{j=1}^{m} p_{ij}^{4} This criterion tends to yield a general factor on which most variables load substantially, with the remaining factors capturing specific variances, making it suitable for scenarios where hierarchical structures are anticipated but less ideal when several equally prominent factors are expected. Equamax rotation, developed by Saunders in 1962, serves as a hybrid variant that balances the column-focused simplicity of varimax and the row-focused simplicity of quartimax. As a special case of the orthomax family with parameter \gamma = m/2, it adjusts the weighting to promote both variable-specific and factor-specific clarity in loadings, often resulting in more evenly distributed solutions than pure varimax or quartimax. Related parsimony criteria, such as those proposed by Crawford and Ferguson in 1970, further refine this balance by penalizing complex cross-loadings more stringently, enhancing overall structure in datasets with moderate factor interdependencies under orthogonal constraints. Orthogonal rotations are particularly recommended when theoretical models posit uncorrelated factors, such as unrelated personality traits or cognitive abilities in psychometric research, where assuming zero correlations simplifies interpretation without sacrificing validity. In such cases, they outperform unrotated solutions by redistributing variance to highlight substantive patterns while leaving the total explained variance invariant to the rotation.
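
The varimax criterion is typically maximized with an iterative SVD-based algorithm; the sketch below (Python with NumPy) follows that standard scheme and is intended as an illustration rather than a reference implementation.

```python
import numpy as np

def varimax(Lambda, gamma=1.0, tol=1e-6, max_iter=100):
    """Iterative orthomax rotation sketch (gamma=1 gives the varimax criterion)."""
    p, m = Lambda.shape
    T = np.eye(m)                                   # accumulated orthogonal rotation
    crit_old = 0.0
    for _ in range(max_iter):
        L = Lambda @ T
        # Gradient of the orthomax criterion with respect to the rotation matrix.
        G = Lambda.T @ (L ** 3 - (gamma / p) * L @ np.diag(np.sum(L ** 2, axis=0)))
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt                                  # nearest orthogonal update
        if s.sum() < crit_old * (1 + tol):          # stop when the criterion no longer improves
            break
        crit_old = s.sum()
    return Lambda @ T, T
```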

Oblique Rotation

Oblique rotation in exploratory factor analysis permits the extracted factors to correlate with one another, reflecting the reality that underlying latent constructs in many datasets—particularly in the social sciences and psychology—are often interrelated rather than independent. This approach transforms the initial loading matrix \Lambda and introduces a factor correlation matrix \Phi, targeting a solution that reproduces the observed correlations through \Lambda \Phi \Lambda^T + \Psi, where \Psi is the diagonal matrix of unique variances. Unlike orthogonal methods, which enforce \Phi = I, oblique rotation provides greater flexibility for achieving simple structure while preserving theoretical plausibility in correlated scenarios. The oblimin criterion, introduced by Jennrich and Sampson in 1966, represents a general framework for oblique rotation that minimizes a quartic simplicity function applied to the loadings to promote sparsity and interpretability. This family includes a parameter \gamma that adjusts the allowance for factor correlations: \gamma = 0 (direct quartimin) is the common default and permits substantial correlations, whereas more negative values yield progressively more nearly orthogonal solutions, balancing simplicity against realism. Direct oblimin, a computationally efficient variant, optimizes the function directly on the pattern matrix loadings, making it widely adopted for its simplicity and effectiveness in producing clear factor distinctions. Promax rotation, developed by Hendrickson and White in 1964, offers a practical alternative to oblimin by first applying an orthogonal varimax rotation to achieve initial simple structure, followed by a power transformation (typically raising the loadings to a power of about four to form a target matrix) and an oblique transformation toward that target. This two-step process is computationally straightforward and often yields results comparable to more iterative methods, especially when factors exhibit moderate correlations, though it may slightly oversimplify highly complex structures. In oblique solutions, the output includes the pattern matrix \Lambda, which contains the partial regression coefficients of variables on the factors (unique contributions after accounting for the other factors), and the structure matrix \Lambda \Phi, which represents the total correlations between variables and factors (including shared variance). The factor correlation matrix \Phi is also reported, allowing assessment of inter-factor relationships; correlations in \Phi are interpreted directly, with their values indicating the strength and direction of factor associations. For interpretation, the pattern matrix is typically preferred for identifying primary variable-factor links, while the structure matrix highlights overall affinities. The choice of oblique over orthogonal rotation is guided by substantive theory and empirical evidence from the solution: if correlations in \Phi exceed 0.32 in absolute value, oblique rotation is warranted, as this threshold signifies at least 10% overlapping variance among factors, justifying the allowance for correlation to avoid distorted interpretations.
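
The relation between the pattern and structure matrices, and the 0.32 rule of thumb, can be expressed in a few lines; the sketch below (Python with NumPy) assumes a pattern matrix and factor correlation matrix already produced by an oblique rotation such as promax or direct oblimin.

```python
import numpy as np

def oblique_summaries(pattern, phi):
    """Given an oblique pattern matrix and factor correlation matrix Phi,
    derive the structure matrix and flag whether an oblique solution is
    warranted by the conventional 0.32 rule of thumb."""
    structure = pattern @ phi                    # total variable-factor correlations
    off = ~np.eye(phi.shape[0], dtype=bool)
    oblique_warranted = bool(np.any(np.abs(phi[off]) > 0.32))
    return structure, oblique_warranted
```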

Unrotated Solutions

In exploratory factor analysis, the initial extraction process produces an unrotated factor solution consisting of factor loadings that represent the correlations between the observed variables and the underlying factors. For extraction methods such as principal components analysis, these unrotated loadings are obtained from the eigenvectors of the correlation matrix, scaled by the square roots of the corresponding eigenvalues to yield the component loadings. This output forms the starting point for further analysis, capturing the maximum variance in a hierarchical manner across successive factors. A fundamental issue with the unrotated solution is rotational indeterminacy, arising from the mathematical structure of the factor model. Specifically, if \Lambda denotes the matrix of unrotated factor loadings, then for any orthogonal matrix T (where T^T T = I), the transformed loadings \Lambda T paired with the transformed factors T^{-1} F reproduce the observed correlations exactly, yielding infinitely many equivalent solutions. This indeterminacy implies that the initial factors lack a unique orientation in the factor space, making direct psychological or substantive interpretation challenging without additional criteria. Heywood's problem further underscores the lack of constraints in unrotated solutions, where unconstrained estimation can lead to improper estimates, such as negative unique variances or communalities exceeding 1, as originally demonstrated in analyses without bounds on error terms. The unrotated solution nonetheless plays a crucial role by providing essential diagnostics, including eigenvalues that quantify the amount of variance explained by each factor and communalities that indicate the shared variance proportion for each variable. These metrics guide decisions on the number of factors to retain but highlight the solution's primary limitation: its loadings are often diffuse and complex, obscuring clear variable-factor associations and necessitating rotation for meaningful interpretation. Historically, early techniques like Thurstone's centroid method generated such unrotated approximations by averaging row and column correlations to estimate initial loadings, offering computational simplicity but yielding results that were neither invariant across studies nor psychologically insightful.
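
Rotational indeterminacy is easy to verify numerically, as in the brief sketch below (Python with NumPy, using randomly generated hypothetical loadings): an arbitrary orthogonal transformation leaves the reproduced common-variance matrix \Lambda \Lambda^T unchanged.

```python
import numpy as np

# Rotational indeterminacy: any orthogonal T leaves the reproduced correlations unchanged.
rng = np.random.default_rng(1)
Lambda = rng.normal(size=(6, 2))                 # hypothetical unrotated loadings
Q, _ = np.linalg.qr(rng.normal(size=(2, 2)))     # arbitrary orthogonal matrix T
reproduced_original = Lambda @ Lambda.T
reproduced_rotated = (Lambda @ Q) @ (Lambda @ Q).T
print(np.allclose(reproduced_original, reproduced_rotated))   # True
```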

Factor Interpretation and Validation

Interpreting Loadings and Communalities

In exploratory factor analysis, factor loadings represent the associations between each observed variable and the underlying factors, with absolute values greater than 0.3 or 0.4 typically considered salient for interpretation (e.g., loadings ≥ 0.32 are often treated as the minimum meaningful value, with the required magnitude depending on sample size), indicating a meaningful contribution to the factor. Loadings exceeding 0.5 are often viewed as primary or strong indicators, defining the core variables for a factor, while cross-loadings—salient associations with multiple factors—should ideally be minimized, or suppressed in output if below about 0.3, to enhance clarity and avoid ambiguity in factor assignment. Rotation techniques play a key role in achieving this by maximizing high loadings and minimizing others, facilitating more interpretable patterns. Communalities, denoted as h^2, quantify the proportion of each variable's variance explained by the retained factors, calculated (for orthogonal solutions) as the sum of the squared loadings for that variable across all factors. Values closer to 1 indicate strong representation by the factor model, whereas low communalities (below 0.4, or below 0.2 under more lenient guidelines) suggest that a substantial portion of the variable's variance remains unexplained, potentially signaling poor model fit or the need to reconsider variable inclusion. Aiming for simple structure in the loading matrix is essential for meaningful interpretation, as defined by Thurstone's criteria, which promote patterns in which variables load predominantly on one factor and near zero on the others. These criteria include ensuring that each variable has at least one near-zero loading, that each factor has at least as many near-zero loadings as there are factors, and that for any pair of factors a substantial number of variables load highly on one but near zero on the other. The full set of Thurstone's five criteria for simple structure is outlined below:
  1. Each row (variable) of the loading matrix contains at least one near-zero loading.
  2. Each column (factor) contains at least as many near-zero loadings as there are factors.
  3. For every pair of factors, some variables have near-zero loadings on one factor and high loadings on the other.
  4. For every pair of factors, a sizable proportion of variables have near-zero loadings on both.
  5. For every pair of factors, only a small number of variables have high loadings on both.
Suppression effects can complicate interpretation when a variable exhibits a low or negative loading on one factor while enhancing the relationship between other variables and the factors by suppressing irrelevant variance, often requiring careful examination of the pattern matrix to discern true associations. Variable complexity refers to the number of factors on which a variable has salient loadings; ideally, variables should exhibit low complexity (loading primarily on one factor) to align with simple structure, but higher complexity may indicate multifaceted constructs. Bipolar factors, characterized by both positive and negative loadings, represent oppositional dimensions (e.g., high versus low levels of a trait) and should be interpreted by considering the directionality of the loadings to capture the underlying construct.
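
The quantities discussed above—communalities, primary loadings, cross-loadings, and variable complexity—can be tabulated from a rotated loading matrix as in the sketch below (Python with NumPy; the thresholds are the conventional values mentioned above, and the communality formula assumes an orthogonal solution).

```python
import numpy as np

def loading_summary(L, salient=0.40, cross=0.32):
    """Summarize a rotated loading matrix: communalities, primary factor per variable,
    and flags for cross-loading or low-communality items (thresholds are conventions)."""
    h2 = np.sum(L ** 2, axis=1)                        # communality of each variable (orthogonal case)
    primary = np.argmax(np.abs(L), axis=1)             # factor with the largest absolute loading
    complexity = np.sum(np.abs(L) >= cross, axis=1)    # number of salient loadings per variable
    flags = {
        "low_communality": np.where(h2 < 0.40)[0],
        "cross_loading": np.where(complexity > 1)[0],
        "weak_primary": np.where(np.max(np.abs(L), axis=1) < salient)[0],
    }
    return h2, primary, complexity, flags
```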

Computing Factor Scores

Factor scores in exploratory factor analysis represent estimates of an individual's standing on the underlying latent factors, derived from the observed variables after factor extraction and rotation. These scores are computed using the factor loading matrix and the covariance structure of the data, providing a way to summarize multivariate data into a reduced set of dimensions for individual-level analysis. One common method is Bartlett's weighted least squares approach, which minimizes the error-weighted sum of squared differences between observed and model-implied values to produce conditionally unbiased estimates. The formula for Bartlett factor scores is given by \hat{F} = (\Lambda^T \Psi^{-1} \Lambda)^{-1} \Lambda^T \Psi^{-1} X, where \hat{F} is the matrix of estimated factor scores, \Lambda is the factor loading matrix, \Psi is the diagonal matrix of unique variances, and X is the matrix of standardized observed variables. This method, originally proposed by Bartlett, weights the observed variables inversely by their unique variances to account for measurement error, producing scores whose expected values depend only on their respective factors. For solutions involving orthogonal rotation, where factors are assumed uncorrelated, the Anderson-Rubin method provides an alternative that constrains the estimated factor scores to be uncorrelated with unit variance. This approach adjusts Bartlett's criterion to produce scores uncorrelated with one another, offering improved properties when orthogonality is theoretically justified. Other techniques, notably the regression (Thurstone) method and related least squares estimators, address trade-offs between bias and variability in score estimation: the regression method weights the variables by the inverse of the model-implied correlation matrix to minimize overall prediction error, shrinking estimates toward the mean, while least squares variants apply the loadings more directly. These approaches are particularly useful when dealing with oblique rotations or complex structures. Factor scores are often used to profile individuals on latent dimensions or as predictors in subsequent models, enabling the application of factor analytic results to individual data points. Their reliability can be evaluated, in the single-factor case, by the validity coefficient of the regression-based (Thurstone) estimates, which gives the proportion of variance in the true factor explained by the score as \rho = \frac{\lambda^T \Psi^{-1} \lambda}{\lambda^T \Psi^{-1} \lambda + 1}, where \lambda is the vector of loadings for that factor. Despite these methods, factor scores remain estimates subject to inherent indeterminacy, as multiple sets of scores can reproduce the observed correlations equally well due to the rotational freedom and score indeterminacy of the common factor model. This indeterminacy implies that scores include estimation error and are not unique representations of the latent factors, necessitating caution in their use as precise individual measures.
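
A minimal sketch of the two estimators discussed above is given below (Python with NumPy, assuming standardized data and an orthogonal solution; oblique solutions would additionally require the factor correlation matrix).

```python
import numpy as np

def factor_scores(X, Lambda, Psi_diag, method="bartlett"):
    """Estimate factor scores from standardized data X (n x p), loadings Lambda (p x m),
    and unique variances Psi_diag; a sketch of the weighted least squares (Bartlett)
    and regression (Thurstone) estimators for an orthogonal solution."""
    Psi_inv = np.diag(1.0 / np.asarray(Psi_diag, dtype=float))
    if method == "bartlett":
        W = np.linalg.inv(Lambda.T @ Psi_inv @ Lambda) @ Lambda.T @ Psi_inv
    else:  # regression scores use the model-implied correlation matrix
        Sigma = Lambda @ Lambda.T + np.diag(Psi_diag)
        W = Lambda.T @ np.linalg.inv(Sigma)
    return X @ W.T                                     # n x m matrix of score estimates
```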

Applications and Extensions

Use in Psychometrics and Social Sciences

Exploratory factor analysis (EFA) plays a central role in psychometrics for developing and refining psychological questionnaires by identifying underlying constructs from item responses, enabling the creation of reliable multi-dimensional scales. A seminal application is in personality assessment, where Lewis R. Goldberg applied EFA to lexical descriptors to delineate the Big Five personality factors—openness, conscientiousness, extraversion, agreeableness, and neuroticism—resulting in marker scales that have become foundational for measuring broad traits across diverse populations. Similarly, EFA has been instrumental in validating and refining the Minnesota Multiphasic Personality Inventory (MMPI) content scales, where factor analyses of item pools have helped evaluate core dimensions of psychopathology, such as emotional distress and behavioral tendencies, enhancing the instrument's utility in clinical diagnostics. In intelligence testing, EFA facilitates item reduction by evaluating factor loadings and communalities to eliminate redundant or weakly contributing items, streamlining assessments while preserving construct coverage. For instance, during the refinement of cognitive ability measures, EFA identifies latent factors like verbal comprehension and perceptual reasoning, allowing developers to select a parsimonious set of items that maximize validity without inflating test length. This approach ensures that shortened versions of tests, such as adaptations of the Wechsler Adult Intelligence Scale, retain high fidelity to the underlying cognitive dimensions. Within the social sciences, EFA is widely employed to measure attitudes and summarize survey data, uncovering latent structures in responses to complex social phenomena. In political science, for example, EFA has been used to develop scales assessing ideological orientations, revealing multi-dimensional constructs—such as distinct economic and social policy dimensions—from self-report items on policy preferences. In sociology, EFA aids in condensing large-scale survey datasets, like those from national attitude polls, into interpretable factors that represent societal trends, such as dimensions of social trust or cultural values, thereby facilitating cross-sectional and longitudinal analyses. A prominent health-related example is the SF-36 Health Survey, where EFA during its development supported an eight-scale structure encompassing physical and mental components, supporting its use in evaluating quality-of-life outcomes in population studies. The typical workflow in these fields involves iterative EFA applications for initial item selection and structure exploration, followed by confirmatory factor analysis to validate the emerging model. This process begins with examining correlations among a broad pool of items, extracting factors via methods like principal axis factoring, and refining by removing items with low loadings (e.g., below 0.40) or high cross-loadings, ultimately yielding psychometrically sound scales ready for broader application.

Applications in Other Disciplines

In marketing, exploratory factor analysis (EFA) is employed to uncover underlying dimensions from consumer survey data, such as identifying the key factors influencing customer perception. For instance, EFA has been applied in service quality studies to reduce multiple attributes into latent factors like reliability and responsiveness, enabling marketers to refine strategies. Similarly, in market segmentation, EFA analyzes attitudinal and behavioral variables to profile consumer groups, such as grouping preferences for product features to target specific segments more effectively. In genetics and genomics, EFA facilitates the identification of patterns in expression data and trait clusters by reducing high-dimensional genomic data into interpretable latent factors. A notable application involves analyzing pathway copy number aberrations in tumor datasets, where EFA extracts factors representing overall aberration levels and gain-versus-loss contrasts, which correlate with expression profiles to reveal regulatory mechanisms. In plant genetics, EFA has been used to model multiple phenotypes in crop populations, identifying eight latent factors spanning grain yield, architectural traits, grain composition, and resistance traits, thereby clustering correlated traits and informing breeding programs through inferred genetic networks. Within finance, EFA contributes to risk factor models by exploring latent structures in asset return data, aiding in the identification of underlying drivers of portfolio volatility. It supports macroeconomic factor modeling by analyzing economic indicators such as interest rates to derive factors that explain asset returns and mitigate risks in portfolio management. Additionally, EFA enhances credit risk assessment by distilling financial and non-financial variables into factors that predict borrower default probabilities, improving lending decisions. In environmental science, EFA is utilized to analyze correlations among pollutants for constructing air quality indices and understanding emission sources. For example, EFA combined with principal component analysis has been used to develop a new air quality index incorporating variables like NO₂, O₃, PM₁₀, CO, and SO₂, capturing synergistic effects across seasons in urban settings such as Delhi, where it revealed higher pollution in winter compared to the monsoon months. In studies of urban air pollution patterns, EFA identifies key factors like particulate matter contributions (explaining 40.85% of variance), gaseous pollutants from traffic (30.46%), and meteorological influences, as demonstrated in Hangzhou, China, to inform pollution control strategies. Adaptations of EFA, such as robust methods, address the non-normal distributions common in environmental and ecological analyses. Techniques like minimum residuals (MINRES) or unweighted least squares (ULS) extraction, along with robust estimators, maintain factor recovery accuracy under skewness or outliers, as evaluated in simulations for environmental datasets. In ecological applications, these robust EFA variants have been applied to environmental variables, such as integrating climate measurements and vegetation indices, to extract latent factors representing seasonal variability and habitat suitability without assuming multivariate normality.

Limitations and Considerations

Common Pitfalls and Criticisms

One common pitfall in exploratory factor analysis (EFA) is over-reliance on the Kaiser criterion, which retains factors with eigenvalues greater than 1 and often leads to overfactoring and the retention of too many trivial factors. This rule has been widely discredited for its poor performance across various data conditions, yet it remains a default in many software packages and was used in a substantial portion of published psychological studies reviewed between 1991 and 1995. Researchers are advised to combine it with other methods, such as parallel analysis or the scree test, to avoid misleading structures. Another frequent error involves ignoring sample size effects, where small samples (e.g., below 200) distort factor loadings and communalities, particularly when factors are weakly determined or communalities are low. Guidelines recommend at least 5–10 observations per variable, with larger samples (up to 400–800) needed for complex models to ensure stable solutions. In one review of published applications, modest sample sizes contributed to suboptimal EFA implementations in over half of the analyzed studies. Interpreting cross-loadings as substantively meaningful represents a further pitfall, as these often arise from rotational choices like orthogonal varimax, which can inflate secondary loadings and obscure simple structure. Cross-loadings exceeding 0.32 on multiple factors may signal poor item design or the need for oblique rotation, but overinterpretation without considering model fit can lead to erroneous factor assignments. Misuses of EFA include applying it to confirmatory questions, where an a priori structure exists, instead of using confirmatory factor analysis (CFA) for hypothesis testing. This exploratory approach cannot rigorously test causal relationships or predefined models, potentially yielding invalid inferences when treated as confirmatory. Additionally, analyzing ordinal data (e.g., Likert scales) with Pearson correlations underestimates the true population correlations and biases factor loadings downward, violating the assumption of multivariate normality; polychoric correlations are recommended to mitigate this bias in such cases. Theoretical criticisms of EFA center on its rotational indeterminacy, whereby multiple factor solutions (via different rotations) fit the data equally well, complicating unique interpretations. This indeterminacy arises from the common factor model's underidentification, allowing infinite transformations without altering the reproduced correlations. Subjectivity in factor interpretation exacerbates this, as decisions on rotation type, factor retention, and labeling rely heavily on researcher judgment, with no objective criterion for "simple structure"; the scree test, for instance, involves subjective identification of an "elbow" in the eigenvalue plot. EFA is also critiqued for its unsuitability for causal inference, as it identifies associations without establishing directionality or ruling out confounders, limiting its role to hypothesis generation rather than testing. Methodological reviews underscore these issues: a survey of EFA applications in prominent psychological journals found that the majority involved at least one questionable practice, such as using principal components analysis instead of common factor extraction (in 48–53% of cases) or failing to report factor retention methods (in 32% of cases). Such errors routinely compromise the validity of conclusions in applied research.

Modern Alternatives and Advances

Modern alternatives to classical exploratory factor analysis (EFA) address limitations in handling non-normal distributions, ordinal data, small samples, and high-dimensional settings by incorporating robust estimation, Bayesian inference, and regularization techniques. These advances build on the foundational assumptions of EFA while enhancing flexibility and reliability across diverse data scenarios.

Robust methods for EFA with non-normal data include robust maximum likelihood (ML) estimation, such as the Satorra-Bentler correction, which rescales test statistics and standard errors to account for departures from multivariate normality. Developed in the late 1980s and refined through the 2000s, this approach adjusts for skewness and kurtosis, improving the accuracy of inference for EFA models estimated with software such as lavaan or Mplus. For ordinal data, such as Likert-scale items, weighted least squares (WLS) estimation, particularly diagonally weighted least squares with robust corrections (WLSMV), is recommended: it uses polychoric correlations to treat categories as ordered thresholds rather than continuous scores, yielding more accurate factor loadings and fit indices than standard ML.

Bayesian EFA places prior distributions on factor loadings and variances, enabling MCMC estimation with tools such as JAGS or Stan to sample from the posterior distribution. This post-2010 framework, which can impose sparsity-inducing priors to aid factor identification, performs better with small samples by propagating uncertainty and avoiding the Heywood cases common in classical maximum likelihood estimation. For instance, informative priors on the loadings can shrink irrelevant paths toward zero, facilitating interpretable solutions in low-power settings.

Exploratory structural equation modeling (ESEM), proposed in 2009, integrates EFA's rotational freedom directly into structural equation modeling frameworks, allowing all cross-loadings to be estimated rather than fixed to zero as in confirmatory factor analysis (CFA). This approach, implemented in Mplus, provides a more realistic representation of complex factor structures in applied research. Complementing this, regularized EFA applies penalties such as the L1 (lasso) penalty to the factor loadings during extraction, promoting sparse, interpretable solutions in high-dimensional data where variable-to-sample ratios exceed classical thresholds. Recent implementations, such as those in the R package regsem, have demonstrated high recovery rates for true zero loadings in simulations.

By 2025, machine learning integrations have extended EFA to nonlinear factor structures through variational autoencoders (VAEs), which learn latent representations with neural networks trained on a reconstruction loss and can capture non-Gaussian dependencies that linear models miss. For example, VAEs applied to psychological datasets have identified curved manifolds in item-factor relations, outperforming traditional EFA in the recovery of nonlinear structures, with errors reduced by up to 20%. Open-source tooling has advanced accordingly: the R package psych reached version 2.5.6 in mid-2025, incorporating updated EFA functions for robust estimators, Bayesian interfaces via bridges to Stan, and regularization options that support high-dimensional analyses.
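
As a rough illustration of how an L1 penalty yields sparse loadings, the toy sketch below alternates least-squares updates of factor scores and loadings and applies a soft-thresholding step to the loadings. It omits unique-variance estimation, rotation, and identification constraints, so it is closer to a penalized matrix factorization than to a full regularized EFA such as regsem provides; the function names, data, and tuning values are assumptions made for this example.

```python
import numpy as np

def soft_threshold(a, lam):
    # Proximal operator of the L1 penalty: entries smaller than lam become exactly zero
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def sparse_loadings_sketch(X, n_factors=2, lam=0.1, n_iter=200, seed=0):
    """Toy alternating least squares for Z ~ F @ L.T with soft-thresholded loadings L."""
    rng = np.random.default_rng(seed)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)         # standardize observed variables
    p = Z.shape[1]
    L = rng.normal(scale=0.1, size=(p, n_factors))   # random initial loadings
    for _ in range(n_iter):
        F = Z @ L @ np.linalg.pinv(L.T @ L)          # least-squares scores given loadings
        L = Z.T @ F @ np.linalg.pinv(F.T @ F)        # least-squares loadings given scores
        L = soft_threshold(L, lam)                   # lasso-style shrinkage toward sparsity
    return L

# Illustrative data: two latent factors, eight items with small cross-loadings
rng = np.random.default_rng(3)
scores = rng.normal(size=(400, 2))
true_L = np.vstack([np.repeat([[0.7, 0.1]], 4, axis=0),
                    np.repeat([[0.1, 0.7]], 4, axis=0)])
X = scores @ true_L.T + rng.normal(scale=0.6, size=(400, 8))
print(np.round(sparse_loadings_sketch(X), 2))        # small cross-loadings are shrunk toward zero
```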

    Jun 23, 2025 · Page 1. Package 'psych'. July 23, 2025. Version 2.5.6. Date 2025-06-20. Title Procedures for Psychological, Psychometric, and Personality.