Rasch model
The Rasch model, also known as the one-parameter logistic (1PL) model, is a psychometric framework within item response theory (IRT) that models the probability of an individual's correct response to a dichotomous test item as a logistic function of the difference between the person's latent trait level (such as ability or attitude) and the item's difficulty parameter, assuming uniform item discrimination across all items.[1] This model enables the estimation of interval-level measures from ordinal data, facilitating objective and invariant comparisons of person abilities and item difficulties independent of the specific sample or test form used.[2] The mathematical formulation is given by P_{ni}(x=1) = \frac{e^{(\theta_n - \delta_i)}}{1 + e^{(\theta_n - \delta_i)}}, where \theta_n represents person n's ability and \delta_i denotes item i's difficulty, both expressed in logit units.[1]
Developed by Danish mathematician Georg Rasch in the 1950s, the model was first formalized in his 1960 monograph Probabilistic Models for Some Intelligence and Attainment Tests, which applied probabilistic approaches to educational assessment data from Danish schools.[3] Building on earlier scaling methods like those of Louis Leon Thurstone in the 1920s, Rasch's work emphasized "specific objectivity," ensuring that measurements remain consistent regardless of the persons or items involved, a principle that distinguished it from classical test theory.[1] The model's adoption accelerated in the 1960s through collaborations, particularly with American psychometrician Benjamin Drake Wright, who introduced it via lectures and training at the University of Chicago, leading to extensions such as the partial credit model for polytomous responses (Masters, 1982) and multifaceted versions for rater effects (Linacre, 1989).[3]
Key assumptions of the Rasch model include unidimensionality (all items measure a single underlying trait), local independence (item responses are conditionally independent given the trait level), and monotonicity (higher trait levels increase the probability of success).[1] These features allow for rigorous evaluation of instrument quality through fit statistics and differential item functioning analysis, making it particularly valuable for developing and refining scales in fields beyond education, such as health outcomes (e.g., patient-reported measures like the Eating Assessment Tool) and social sciences.[2] By the 2010s, Rasch-based research had produced over 5,000 publications, underscoring its enduring influence on measurement science through accessible software like Winsteps and RUMM.[3]
Overview
Definition and purpose
The Rasch model is a one-parameter logistic model in item response theory (IRT) that estimates the probability of a correct response to a binary item as a function of the difference between a person's latent ability parameter (θ) and the item's difficulty parameter (β).[4] Developed by Danish mathematician Georg Rasch, it assumes that observed responses reflect an underlying probabilistic structure in which success depends solely on this ability-difficulty contrast, without item discrimination parameters that vary across items.[1]
The primary purpose of the Rasch model is to facilitate invariant measurement, meaning that estimates of person abilities and item difficulties remain consistent regardless of the particular sample of persons or items used in the assessment, thereby enabling objective comparisons.[4] This contrasts with classical test theory (CTT), which relies on aggregate test scores and is sample-dependent, often producing ordinal rather than interval-level measurements that vary across different groups or item sets.[5] By achieving parameter separability, in which person and item parameters can be estimated independently of one another, the model supports fundamental measurement in fields like education and psychology, promoting fairness and precision in assessing latent traits.
Key assumptions underlying the Rasch model include unidimensionality, positing that all items measure a single latent trait; local stochastic independence, ensuring that responses to items are independent given the person's ability; and equal item discrimination, with all items having the same discrimination parameter fixed at unity.[1] Additionally, the model assumes monotonicity, where the probability of a correct response increases as ability exceeds item difficulty.[4] For example, in educational testing, the Rasch model can analyze student responses to multiple-choice questions, revealing that the likelihood of a correct answer rises monotonically with the student's reading comprehension ability relative to each question's difficulty, allowing for tailored assessments that maintain measurement invariance across diverse student populations.
Historical development
The Rasch model was developed by Danish mathematician Georg Rasch during the 1950s as a probabilistic framework for analyzing categorical response data in educational and psychological assessments, particularly to estimate latent traits such as ability and item difficulty.[6] Rasch first applied the model empirically to reading comprehension data in the 1950s, modeling counts of errors in oral reading tasks to demonstrate its utility in attainment testing.[7] This application formed the basis for his foundational publication, Probabilistic Models for Some Intelligence and Attainment Tests, which presented the model as a means to achieve invariant comparisons between persons and items in intelligence and achievement contexts.[8]
The model's theoretical underpinnings were influenced by L.L. Thurstone's earlier work on psychological scaling, which sought to place items and individuals on a common metric for comparative measurement.[1] Rasch extended these ideas by integrating Ronald A. Fisher's concept of statistical sufficiency, ensuring that parameter estimates remained stable regardless of the specific sample of respondents or items, thus enabling objective inferences about underlying constructs.[9]
Adoption of the Rasch model accelerated in the 1960s through the advocacy of Benjamin D. Wright and collaborators at the University of Chicago, who emphasized its practical implementation via computational tools and educational programs.[10] Wright, having invited Rasch to lecture at Chicago in 1960 and overseen the 1980 English republication of his book, organized the inaugural International Objective Measurement Workshop in 1981, fostering a community around the approach.[11] This effort catalyzed the Rasch measurement movement, promoting the model as a cornerstone for sample-independent, fundamental measurement in the social sciences.[12] Over subsequent decades, the Rasch model transitioned from a specialized tool for probabilistic modeling of test responses to a broader paradigm for objective measurement theory, aligning psychometric practices with principles of invariance and separability akin to those in physical metrology.[13]
Mathematical formulation
Dichotomous model
The dichotomous Rasch model specifies the probability of a correct response to a binary item, assuming unidimensionality of the underlying trait.[2] For person n with ability \theta_n responding to item i with difficulty \beta_i, the probability P(X_{ni}=1 \mid \theta_n, \beta_i) of a correct response (X_{ni}=1) is given by the logistic function: P(X_{ni}=1 \mid \theta_n, \beta_i) = \frac{e^{\theta_n - \beta_i}}{1 + e^{\theta_n - \beta_i}}. This equation models the response as a function of the difference between ability and difficulty, with higher ability relative to difficulty increasing the probability of success.[2][14] The logit form of this probability, \log\left(\frac{P(X_{ni}=1 \mid \theta_n, \beta_i)}{1 - P(X_{ni}=1 \mid \theta_n, \beta_i)}\right) = \theta_n - \beta_i, directly links the log-odds of success to the linear difference on a logistic scale, where \theta_n and \beta_i are expressed in logit units.[2]
The model can be viewed as a logistic regression for each item, treating ability \theta_n as the predictor and difficulty \beta_i as the intercept, with the response X_{ni} as the binary outcome; this perspective highlights its equivalence to a conditional logistic regression framework under specific constraints.[15] Derivationally, the Rasch model emerges from the exponential family of distributions, where the joint probability of responses factorizes to separate person and item contributions via sufficient statistics: the total score for each person Y_{n+} = \sum_i X_{ni} is sufficient for \theta_n, and the total score for each item Y_{+i} = \sum_n X_{ni} is sufficient for \beta_i, ensuring parameter separability and enabling independent estimation.[16] The resulting logit scale provides interval-level measurement, where equal intervals represent equal changes in the log-odds of success, allowing direct comparability of ability and difficulty locations along a continuous linear continuum in logits.[17]
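The two expressions above translate directly into code. The following is a minimal Python sketch; the function names rasch_probability and rasch_logit are illustrative, not drawn from any particular package:

```python
import numpy as np

def rasch_probability(theta, beta):
    """P(X = 1) = exp(theta - beta) / (1 + exp(theta - beta))."""
    return 1.0 / (1.0 + np.exp(-(theta - beta)))

def rasch_logit(theta, beta):
    """Log-odds of success: log(P / (1 - P)) = theta - beta."""
    return theta - beta

# A person whose ability exceeds the item's difficulty by 1 logit:
print(rasch_probability(0.5, -0.5))  # ~0.731
print(rasch_logit(0.5, -0.5))        # 1.0
```
Parameter estimation methods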
Parameter estimation in the Rasch model involves deriving values for person abilities \theta_i and item difficulties \beta_j from observed response data Y_{ij}, where Y_{ij} = 1 indicates a correct response by person i to item j. Several maximum likelihood-based methods are employed, each addressing the incidental parameters problem inherent in item response theory, where the number of person parameters grows with the sample size. These methods vary in their treatment of person abilities and assumptions about their distribution, impacting consistency, bias, and computational feasibility.[18]
Joint maximum likelihood (JML) estimation simultaneously maximizes the likelihood for both person abilities \theta_i and item difficulties \beta_j by treating all parameters as fixed effects. The log-likelihood function is given by \ell_J(\theta, \beta) = \sum_i \sum_j \left[ Y_{ij} \log p_{ij} + (1 - Y_{ij}) \log (1 - p_{ij}) \right], where p_{ij} = \frac{\exp(\theta_i - \beta_j)}{1 + \exp(\theta_i - \beta_j)} and typically \beta_1 = 0 for identifiability. This approach is computationally efficient, often using iterative algorithms like Newton-Raphson, and provides reasonable starting values for other methods. However, JML yields inconsistent estimates for finite samples because person parameters are incidental; as the number of persons increases while items remain fixed, biases accumulate, particularly for extreme scores where persons achieve all correct or all incorrect responses.[18][19]
Conditional maximum likelihood (CML) estimation addresses JML's inconsistencies by conditioning on the sufficient statistics for person abilities, the total scores Y_{i+} = \sum_j Y_{ij}, thereby eliminating \theta_i from the estimation. The conditional likelihood for the item parameters is L_C(\beta \mid \{Y_{i+}\}) = \prod_i \frac{\exp\left( -\sum_j \beta_j Y_{ij} \right)}{\gamma_{Y_{i+}}(\beta)}, where \gamma_r(\beta) = \sum_{\mathbf{y} \in C_r} \exp\left( -\sum_j \beta_j y_j \right) is the elementary symmetric function of order r and C_r is the set of response patterns with exactly r correct answers. Maximization yields consistent and asymptotically normal estimates of \beta_j as the number of persons grows, independent of the ability distribution. CML is particularly suitable for the Rasch model due to its sufficiency properties but requires complete data across items for all persons and can be computationally intensive for large datasets, though modern implementations mitigate this.[18]
Marginal maximum likelihood (MML) estimation integrates out person abilities by assuming they follow a known distribution, typically standard normal \phi(\theta), treating items as fixed and persons as random effects. The marginal likelihood is L_M(\beta) = \prod_i \int \prod_j p_{ij}^{Y_{ij}} (1 - p_{ij})^{1 - Y_{ij}} \phi(\theta_i) \, d\theta_i, approximated numerically via Gauss-Hermite quadrature. This method produces consistent estimates for both \beta_j and the ability distribution parameters, even with extreme scores, and is widely implemented in software such as the R package ltm, which uses MML for Rasch model fitting. MML is advantageous for moderate sample sizes and allows estimation of person abilities via empirical Bayes methods post-hoc.[20]
For person ability estimation, Warm's weighted likelihood estimation (WLE) provides a bias-adjusted alternative to direct maximum likelihood, whose estimates are infinite for zero and perfect scores. WLE obtains \hat{\theta}_i by maximizing the likelihood weighted by the square root of the test information function, \left[ \prod_j p_{ij}(\theta)^{Y_{ij}} (1 - p_{ij}(\theta))^{1 - Y_{ij}} \right] \sqrt{I(\theta)}, where I(\theta) = \sum_j p_{ij}(\theta)\left(1 - p_{ij}(\theta)\right); the weighting term removes the first-order bias of the maximum likelihood estimator and yields finite estimates even for extreme scores. This method improves stability and mean-squared error over unweighted maximum likelihood, particularly in small samples or with sparse data.
Estimation in the Rasch model faces challenges related to sample size and data completeness. Stable item parameter estimates generally require at least 30 persons per item to minimize sampling variability and ensure model fit assessment reliability. For missing data, pairwise deletion (using only observed responses for each item-person pair) is commonly applied in JML and pairwise maximum likelihood approaches, preserving information without imputation bias, though it may reduce effective sample size for correlated items.[21][22]
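As an illustration of the JML scheme described above, the following Python sketch alternates Newton-Raphson sweeps over persons and items on simulated data. It is a simplified illustration under stated assumptions (extreme all-correct or all-incorrect response vectors removed beforehand, item difficulties centred at zero for identification), not a substitute for the dedicated packages discussed later:

```python
import numpy as np

def jml_estimate(Y, max_iter=200, tol=1e-6):
    """Joint maximum likelihood (JML) for the dichotomous Rasch model.

    Y : binary response matrix (persons x items) with extreme rows/columns
        (all 0 or all 1) already removed, since their ML estimates are infinite.
    Returns (theta, beta) in logits, with item difficulties centred at zero.
    """
    n_persons, n_items = Y.shape
    r = Y.sum(axis=1)                      # person scores, sufficient for theta
    s = Y.sum(axis=0)                      # item scores, sufficient for beta
    theta = np.log(r / (n_items - r))      # logit starting values
    beta = np.log((n_persons - s) / s)
    beta -= beta.mean()                    # identification constraint

    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(theta[:, None] - beta[None, :])))
        info = p * (1.0 - p)
        # Newton-Raphson sweep: persons with items fixed, then items with persons fixed.
        theta_new = theta + (r - p.sum(axis=1)) / info.sum(axis=1)
        beta_new = beta + (p.sum(axis=0) - s) / info.sum(axis=0)
        beta_new -= beta_new.mean()
        change = max(np.abs(theta_new - theta).max(), np.abs(beta_new - beta).max())
        theta, beta = theta_new, beta_new
        if change < tol:
            break
    return theta, beta

# Simulated check: 500 persons, 10 items with known difficulties.
rng = np.random.default_rng(0)
true_theta = rng.normal(0.0, 1.0, 500)
true_beta = np.linspace(-2.0, 2.0, 10)
P = 1.0 / (1.0 + np.exp(-(true_theta[:, None] - true_beta[None, :])))
Y = (rng.random(P.shape) < P).astype(int)
keep = (Y.sum(axis=1) > 0) & (Y.sum(axis=1) < Y.shape[1])   # drop extreme persons
theta_hat, beta_hat = jml_estimate(Y[keep])
print(np.round(beta_hat, 2))
```

The small-sample bias noted for JML would appear here as recovered difficulties somewhat more spread out than the generating values; CML or MML, as implemented in dedicated software, avoids or corrects this.
Key properties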
Invariant measurement
In the Rasch model, invariant measurement refers to the property where estimates of person ability are independent of the particular set of items administered, and estimates of item difficulty are independent of the particular sample of persons tested, a concept known as specific objectivity.[23] This invariance ensures that comparisons between persons or between items remain consistent regardless of the context in which they are observed, provided the data fit the model.[24] Specific objectivity arises from the model's structure, which separates person and item parameters, allowing for objective scaling that aligns with fundamental measurement principles in the sciences.[23]
The mathematical basis for this invariance stems from the separability of parameters in the Rasch model, where the log-odds of a correct response for person n on item i is given by \theta_n - \beta_i, with \theta_n representing person ability and \beta_i item difficulty.[24] Under conditional inference, these log-odds ratios are invariant because the model's probabilistic structure ensures that parameter estimates do not depend on the specific sample or test form, as long as the sufficient statistics for persons and items are used.[24] This separability contrasts with classical test theory (CTT), where item difficulties and person scores are sample-dependent and test-dependent, respectively, leading to comparisons that vary across administrations.
The implications of invariant measurement include the ability to equitably link different test forms and scales, facilitating fair comparisons over time or across groups. For instance, in adaptive testing, item banks can be used to administer tailored subsets of items to persons, yet person ability estimates remain comparable across individuals due to the model's invariance, enabling efficient and precise measurement without compromising objectivity. This property supports the construction of stable measurement systems in fields like education and psychology, where consistent scaling is essential for monitoring progress or evaluating interventions.[23] However, invariant measurement in the Rasch model assumes adequate fit to the data; violations of model assumptions, such as local independence or unidimensionality, can introduce dependencies that undermine parameter invariance and lead to biased estimates.[25]
Sufficiency and conditional independence
The Rasch model belongs to the exponential family of probability distributions, a property that guarantees the existence of sufficient statistics for its parameters. In this context, the total score for a person, defined as the sum of their binary responses across all items, serves as a minimal sufficient statistic for estimating the person's ability parameter θ. Likewise, the column total for each item, representing the sum of responses across all persons, is a sufficient statistic for the item's difficulty parameter β. This sufficiency implies that all relevant information about θ or β is encapsulated in these marginal totals, independent of the specific patterns of individual responses.[26][27]
A key consequence of this structure is the form of the joint likelihood function. For a response matrix X, the likelihood can be written in exponential-family form as L(\theta, \beta \mid X) = \frac{\exp\left( \sum_n \theta_n r_n - \sum_i \beta_i s_i \right)}{\prod_{n,i} \left[ 1 + \exp(\theta_n - \beta_i) \right]}, where \mathbf{r} denotes the vector of person total scores and \mathbf{s} the vector of item total scores. Because the parameters enter the kernel only through these totals, conditioning on the person totals eliminates the person parameters (and vice versa), allowing estimation of person and item parameters to proceed separately once the totals are observed.[28]
The sufficiency property underpins conditional independence in the Rasch model: given a person's total score, the probability of any specific response pattern depends only on the item difficulties and not on the person's ability. Response patterns with the same total score therefore carry no additional information about θ, which facilitates person-free calibration of item parameters without needing to estimate individual θ values simultaneously.[27][26]
These properties have significant implications for model estimation and application. For instance, conditional maximum likelihood (CML) estimation leverages this independence to derive consistent item parameter estimates by conditioning on the observed totals, avoiding the incidental parameter bias associated with full maximum likelihood. Moreover, sufficiency enables efficient probabilistic predictions and model comparisons using only the total scores, rather than the entire response matrix, which reduces computational demands in large datasets. The framework also connects to the additivity of the measurement scale: by ensuring that person and item effects combine additively on the logit scale, the model realizes Rasch's vision of conjoint additivity, where comparisons remain invariant across contexts.[28][14]
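This conditioning argument can be checked numerically. The short Python sketch below (illustrative names, arbitrary item difficulties, brute-force enumeration of response patterns) computes the probability of one pattern given its total score for several ability values, and compares it with \exp(-\sum_j \beta_j x_j) / \gamma_2(\beta), which involves the item difficulties only:

```python
import numpy as np
from itertools import combinations

def pattern_probability(x, theta, beta):
    """Joint probability of response pattern x for a person with ability theta."""
    p = 1.0 / (1.0 + np.exp(-(theta - beta)))
    return np.prod(np.where(x == 1, p, 1.0 - p))

def conditional_pattern_probability(x, theta, beta):
    """P(pattern | total score); under the Rasch model this is free of theta."""
    r, k = int(x.sum()), len(beta)
    total = 0.0
    for idx in combinations(range(k), r):      # every pattern with the same total
        y = np.zeros(k, dtype=int)
        y[list(idx)] = 1
        total += pattern_probability(y, theta, beta)
    return pattern_probability(x, theta, beta) / total

beta = np.array([-1.0, 0.0, 0.5, 1.5])          # illustrative item difficulties
x = np.array([1, 1, 0, 0])                      # a pattern with total score 2

for theta in (-1.0, 0.0, 2.0):                  # identical output for every ability
    print(conditional_pattern_probability(x, theta, beta))

# The same value from the item difficulties alone: exp(-sum beta_j x_j) / gamma_2(beta)
numerator = np.exp(-beta[x == 1].sum())
gamma_2 = sum(np.exp(-(beta[i] + beta[j])) for i, j in combinations(range(4), 2))
print(numerator / gamma_2)
```
Model extensions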
Polytomous response models
The Rasch model extends to polytomous response formats to analyze ordered categorical data beyond binary outcomes, maintaining the core principles of probabilistic measurement while accounting for multiple response levels. These extensions are particularly useful for items where respondents select from a scale of ordered options, such as agreement levels or performance gradations. Two key models in this family are the Rating Scale Model (RSM) and the Partial Credit Model (PCM), each addressing specific structures in response data.[29]
The Rating Scale Model (RSM), proposed by Andrich in 1978, applies to sets of items sharing a common rating scale structure, such as Likert-type items assessing attitudes or perceptions. In the RSM, the probability P(X_{ni} = k) that person n scores in category k (where k = 0, 1, \dots, M) on item i is modeled using shared category thresholds \delta_k across items: P(X_{ni} = k) = \frac{\exp \left[ \sum_{j=0}^{k} (\theta_n - \beta_i - \delta_j) \right]}{\sum_{m=0}^{M} \exp \left[ \sum_{j=0}^{m} (\theta_n - \beta_i - \delta_j) \right]}, where \theta_n is the person's ability, \beta_i is the item's difficulty, and \delta_j (with \delta_0 = 0) represent the step difficulties between categories, identical for all items. This formulation arises from an adjacent-categories logit framework, where the log-odds of responding in category k rather than k-1 equals \theta_n - (\beta_i + \delta_k). The thresholds \delta_k are interpreted as the additional difficulty required to advance from one category to the next, enabling the assessment of how uniformly the scale functions across items.[30][31]
In contrast, the Partial Credit Model (PCM), introduced by Masters in 1982, allows each item to have its own unique set of category thresholds, making it ideal for constructed-response tasks where partial credit reflects varying step difficulties per item. The probability P(X_{ni} = m) (for m = 0, 1, \dots, M) is: P(X_{ni} = m) = \frac{\exp \left[ \sum_{j=0}^{m} (\theta_n - \delta_{ij}) \right]}{\sum_{l=0}^{M} \exp \left[ \sum_{j=0}^{l} (\theta_n - \delta_{ij}) \right]}, where \delta_{ij} denotes the difficulty of step j for item i (with \delta_{i0} = 0), and the item's overall difficulty emerges from the cumulative steps. The defining structure is an adjacent-categories logit, with the log-odds between consecutive categories m-1 and m equal to \theta_n - \delta_{im}; cumulative probabilities of scoring at or above a given category can be derived from the category probabilities, although the step parameters themselves are not cumulative-logit thresholds. The step parameters \delta_{ij} quantify the incremental challenges within each item, such as progressing from incorrect to partially correct responses.[32][29]
The RSM and PCM differ primarily in their assumption about threshold uniformity: the RSM imposes a common structure suitable for standardized scales, reducing parameters and enhancing stability when category observations are sparse, while the PCM's item-specific thresholds offer flexibility for heterogeneous tasks but require larger samples to estimate reliably. Both models preserve Rasch invariances, such as separation of person and item parameters, and share the adjacent-categories logit form, which permits comparable interpretation of thresholds across the two models.
Because the RSM is nested within the PCM, a likelihood-ratio (chi-square difference) test, or a comparison of fit and separation indices, can guide model selection based on the data structure.[29][31]
These polytomous models find applications in attitude surveys using Likert scales, where the RSM evaluates consistent response patterns across items, and in performance assessments like open-ended tasks, where the PCM assigns nuanced credit for partial successes, such as in educational evaluations of problem-solving steps. In both cases, threshold estimates reveal scale functioning, informing item design by identifying disordered steps that disrupt measurement precision.[31][32]
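To make the PCM category probabilities concrete, the following Python sketch evaluates them for a single item from its step difficulties; the function name and example values are illustrative only. The RSM corresponds to the special case in which each item's step difficulties take the form \beta_i + \delta_j, with the \delta_j shared across items:

```python
import numpy as np

def pcm_probabilities(theta, delta):
    """Category probabilities for one item under the partial credit model.

    theta : person ability in logits
    delta : step difficulties (delta_1, ..., delta_M); the category-0 term is 0.
    Returns P(X = 0), ..., P(X = M).
    """
    steps = np.concatenate(([0.0], theta - np.asarray(delta, dtype=float)))
    numerators = np.exp(np.cumsum(steps))   # exp of sum_{j<=m} (theta - delta_j)
    return numerators / numerators.sum()

# A three-category item (scores 0/1/2) whose first step is easier than its second.
delta = [-0.5, 1.0]
for theta in (-1.0, 0.0, 2.0):
    print(theta, pcm_probabilities(theta, delta).round(3))
```
Multidimensional variants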
The multidimensional Rasch model (MRM) extends the unidimensional Rasch framework to account for multiple latent traits, allowing for the measurement of correlated abilities or skills within a single assessment. In this model, the probability of a correct response incorporates a vector of person abilities \theta_n = (\theta_{n1}, \theta_{n2}, \dots, \theta_{nD}) and an item discrimination vector \alpha_i = (\alpha_{i1}, \alpha_{i2}, \dots, \alpha_{iD}); in the Rasch case the components of \alpha_i are fixed at 1 for the dimensions the item measures (often specified through a Q-matrix of loadings) and at 0 otherwise, so that no item discrimination parameters are estimated. The response probability for a dichotomous item is given by: P(X_{ni} = 1 \mid \theta_n, \beta_i, \alpha_i) = \frac{\exp(\alpha_i \cdot \theta_n - \beta_i)}{1 + \exp(\alpha_i \cdot \theta_n - \beta_i)}, where \beta_i is the item's scalar difficulty (a brief numerical sketch of this expression appears at the end of this subsection).[33]
Applications of the MRM are particularly valuable in educational testing where constructs involve distinct but related subskills, such as in mathematics assessments distinguishing between algebra (e.g., modeling relationships) and geometry (e.g., spatial reasoning) traits.[34] For instance, analyses of PISA mathematics data have used the MRM to calibrate items across domains like quantity, uncertainty, space and shape, and change and relationships, revealing nuanced performance patterns across these multidimensional skills.[34]
Estimation in the MRM presents challenges due to the increased number of parameters with higher dimensionality, which can lead to issues like slower convergence and higher computational demands compared to unidimensional models.[35] Common approaches include marginal maximum likelihood (MML) estimation, which integrates over the ability distribution to avoid incidental parameters problems, and Bayesian methods using Markov chain Monte Carlo (MCMC) for handling complex priors and multidimensional integrals.[36]
Key properties of the Rasch model are partially retained in the MRM; for example, measurement invariance holds conditionally on the discrimination parameters if they are constrained to be equal across dimensions, preserving comparability of ability estimates along specific trait directions.[37] Additionally, the MRM serves as a diagnostic tool for detecting violations of unidimensionality, as model fit comparisons (e.g., via likelihood ratio tests) can indicate whether multiple traits better explain the data structure.
A related extension is the many-facet Rasch model (MFRM; Linacre, 1989), which accounts for multiple facets such as rater or judge effects in subjective assessments while maintaining a unidimensional latent trait. This model incorporates facets like judges' severity and specific criteria alongside person abilities and item difficulties to enable fairer measurement by adjusting for variability in rater judgments.[38][39]
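A minimal sketch of the between-item multidimensional response probability, assuming a 0/1 Q-matrix row as described above (names and values are illustrative):

```python
import numpy as np

def mrm_probability(theta, q_row, beta):
    """Success probability for one item in a between-item multidimensional Rasch model.

    theta : vector of abilities, one entry per dimension
    q_row : 0/1 Q-matrix row marking the dimensions the item measures
    beta  : scalar item difficulty
    """
    eta = float(np.dot(q_row, theta)) - beta
    return 1.0 / (1.0 + np.exp(-eta))

# Two dimensions (e.g., algebra and geometry); this item loads only on the second.
theta = np.array([1.2, -0.4])
q_row = np.array([0, 1])
print(mrm_probability(theta, q_row, beta=0.5))   # depends only on theta[1]
```
Applications and interpretations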
Educational and psychological testing
The Rasch model is widely applied in educational testing for item banking, which involves calibrating items on a common scale to create pools for constructing equivalent test forms. This approach enables equitable score comparisons across administrations, as seen in standardized assessments where items are selected and equated based on their difficulty parameters independent of the test-taking sample. For instance, in large-scale programs like Australia's National Assessment Program – Literacy and Numeracy (NAPLAN), Rasch measurement supports item banking to ensure consistent evaluation of student achievement across diverse populations. Similarly, the Graduate Record Examination (GRE) employs item response theory (IRT) frameworks, such as the 2PL model, for equating sections and maintaining fairness in admissions testing.[40][41]
In computerized adaptive testing (CAT), the Rasch model facilitates real-time item selection tailored to an examinee's estimated ability (θ), optimizing test efficiency by administering fewer items while achieving high precision. CAT systems using Rasch select subsequent items that maximize information at the current θ estimate, reducing test length by up to 50% compared to fixed-form tests without compromising reliability. This has been implemented in educational contexts, such as nursing competency assessments, where adaptive algorithms based on Rasch improve measurement accuracy for individualized learning evaluations.[42][43][44]
Psychological applications of the Rasch model extend to measuring latent traits like attitudes and health outcomes, often via extensions such as the partial credit model (PCM) for polytomous responses. In depression assessment, the Depression Anxiety Stress Scales (DASS-21) have been validated using Rasch analysis under the PCM, confirming unidimensionality and reliable scoring across response categories for clinical screening. The model also supports patient-reported outcomes (PROs) in clinical trials, where Rasch-calibrated scales quantify changes in symptoms like pain or function, enhancing sensitivity to treatment effects in randomized studies. For example, Rasch optimization of PRO measures has improved the detection of meaningful differences in mobility self-reports for rehabilitation trials.[45][46][47]
Compared to classical test theory (CTT), the Rasch model offers sample-invariant item calibration, where item difficulties remain stable across different groups, unlike CTT's reliance on test-specific statistics. This invariance supports generalizable measurements, as demonstrated in instrument development where Rasch ensures consistent trait estimation regardless of sample composition. Additionally, Rasch enables detection of differential item functioning (DIF), identifying biased items that perform differently across subgroups (e.g., by gender or ethnicity), promoting fairness in educational and psychological assessments. For instance, routine DIF analysis using Rasch has been recommended for validating science education instruments to eliminate cultural biases.[48][49][50][51]
Case studies illustrate the Rasch model's role in refining psychological instruments, such as its application to IQ tests for fluid intelligence scales, where Rasch modeling combined with cognitive principles yielded invariant person ability estimates across age groups.
In personality inventories, Rasch analysis of the Proactive Personality Scale confirmed item fit and category functioning, supporting its use in occupational psychology for trait assessment. The Rasch measurement community, through organizations like the International Objective Measurement Workshop, advances instrument development by emphasizing these applications, fostering collaborative validation of scales for educational and attitudinal research.[52][53]
Overall, the Rasch model's impact is evident in large-scale assessments like the Programme for International Student Assessment (PISA), where Rasch-based IRT scaling improves validity by equating literacy measures across cycles and countries, ensuring comparable international benchmarks for educational policy. This has enhanced the reliability of global proficiency estimates, influencing reforms in over 70 participating nations.[54][55]
Interpreting parameters and fit assessment
In the Rasch model, the person parameter θ quantifies an individual's ability or trait level along the latent variable, expressed in logit units with the zero point of the scale fixed by convention (commonly at the mean item difficulty) and higher values indicating greater ability.[56] This parameterization allows for interval-level measurement, enabling comparisons of relative proficiency; for instance, a difference of 1 logit multiplies the odds of success on items of equivalent difficulty by e ≈ 2.72 (a difference of about 0.7 logits doubles the odds).[2] The item parameter β, conversely, represents the location or difficulty of an item on the same logit scale, marking the point where the probability of a correct response is 50% for a person with θ = β.[57] Items with higher β values are more challenging, targeting higher-ability persons, while lower β values suit lower-ability individuals.
Person-item maps, also known as Wright maps, provide a visual alignment of these parameters by plotting person abilities (typically as asterisks or "x" symbols on the left) and item difficulties (as numbers or labels on the right) against a common logit scale, with the vertical axis ranging from low to high measures.[56] This depiction illustrates targeting, such as whether most items cluster around the mean person ability (marked M) or spread across standard deviations (S for ±1 SD, T for ±2 SD), highlighting gaps or overlaps that inform test construction.[56] Item characteristic curves (ICCs) complement this by graphing the expected probability of success as a function of θ for a fixed β, forming an S-shaped logistic curve that rises steeply around the item's difficulty; deviations from the ideal curve in empirical plots signal potential misfit.[58]
Fit assessment evaluates how well observed responses align with model expectations, primarily through residual-based statistics derived from differences between observed (X) and expected (E) scores.
Item fit is gauged using infit and outfit mean-square statistics, both chi-square variants normalized by degrees of freedom and expected to approximate 1 under perfect fit.[59] Infit, an information-weighted measure, is sensitive to "inlier" patterns, that is, unexpected responses near a person's ability level, such as overfit Guttman-like determinism (mean-square < 0.5, indicating predictability) or underfit erratic responses (mean-square > 1.5, suggesting noise); it is less affected by outliers.[59] Outfit, outlier-sensitive, detects extreme surprises far from ability, like lucky guesses (high underfit, mean-square > 2.0, degrading measurement) or imputed responses (low overfit); values between 0.5 and 1.5 are productive, while extremes warrant item revision.[59] Chi-square item fit tests aggregate these residuals, with non-significant p-values (often > 0.01) confirming alignment, though sample size influences sensitivity.[60]
Person fit examines individual response patterns for anomalies, using similar mean-square statistics or t-tests on standardized residuals to identify guessing (high outfit > 2.0), carelessness, or deterministic overfit (infit < 0.5), which may indicate cheating or misunderstanding.[59] Standardized person-fit statistics, expressed as z-scores with expectation 0 and variance 1, flag misfit when z > 2 (unexpectedly inconsistent patterns) or z < -2 (overly predictable patterns); values beyond ±3 suggest invalid measures, and extreme scores fit trivially and are therefore excluded from the computation.[61] Unusual patterns, like inconsistent successes on hard items paired with failures on easy ones, elevate outfit, signaling potential data issues.[59]
Model criticism addresses violations like local dependence or multidimensionality through principal components analysis (PCA) of inter-item residual correlations, after standardizing residuals as (X - E)/\sqrt{E(1 - E)} to approximate normality.[62] The first component captures the Rasch dimension; a dominant second eigenvalue (> 10% of the first, or unexplained variance > 5%) indicates local dependence, such as correlated residuals from similar content (e.g., correlations > 0.2 between item pairs like bladder and bowel functions), violating conditional independence.[62] For unidimensionality, PCA contrasts loadings to split items into subsets; if subset measures differ significantly (t-test p < 0.05), multidimensionality is evident, prompting model extensions or item removal.[63] This diagnostic ensures the scale measures a single construct, with low residual variance supporting validity.[64]
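The infit and outfit mean-squares follow directly from these residual definitions. Below is a minimal Python sketch for dichotomous data, assuming person and item parameters have already been estimated (function name illustrative):

```python
import numpy as np

def item_fit_statistics(Y, theta, beta):
    """Infit and outfit mean-squares per item for dichotomous Rasch data.

    Y     : observed binary responses (persons x items)
    theta : estimated person abilities
    beta  : estimated item difficulties
    Values near 1.0 indicate responses consistent with model expectations.
    """
    expected = 1.0 / (1.0 + np.exp(-(theta[:, None] - beta[None, :])))
    variance = expected * (1.0 - expected)               # model variance of each response
    sq_resid = (Y - expected) ** 2
    z2 = sq_resid / variance                             # squared standardized residuals
    outfit = z2.mean(axis=0)                             # unweighted: sensitive to outliers
    infit = sq_resid.sum(axis=0) / variance.sum(axis=0)  # information-weighted
    return infit, outfit

# Person fit uses the same formulas with sums taken over items (axis=1) instead.
```
Implementation and software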
Estimation software tools
Several software packages and tools are available for estimating parameters in the Rasch model, ranging from open-source R packages to commercial standalone applications, enabling researchers to fit the model to dichotomous, polytomous, and multidimensional data.[65] These tools typically implement estimation methods such as joint maximum likelihood (JML), conditional maximum likelihood (CML), and marginal maximum likelihood (MML), facilitating analysis in educational testing and psychological measurement.[66]
In the R programming environment, several packages provide robust support for Rasch estimation. The ltm package analyzes dichotomous and polytomous data under item response theory (IRT), including the Rasch model, using maximum likelihood estimation for parameter fitting and diagnostics.[67] The TAM package offers MML and JML/CML estimation for unidimensional and multidimensional Rasch models, as well as the multifaceted Rasch model, with functions like tam.mml() for model calibration and support for large datasets through plausible value imputation.[68] Similarly, the eRm package specializes in extended Rasch modeling, fitting the Rasch model (RM), rating scale model (RSM), and partial credit model (PCM) via CML for item parameters and ML for person parameters, including features for fit assessment like infit/outfit statistics and automated item elimination.[69]
Specialized standalone software provides user-friendly interfaces for Rasch analysis. Winsteps, a commercial Windows-based tool, employs JML and CML estimation to construct measures from rectangular datasets, generating person-item maps for visualizing ability and difficulty distributions, handling large datasets on 64-bit systems, and performing differential item functioning (DIF) analysis to detect bias.[70] The free, open-source jMetrik offers a graphical user interface (GUI) for Rasch estimation alongside classical and IRT analyses, supporting DIF detection, item response theory linking, and direct export of results to Excel for further processing.[71] IRTPRO, a commercial package from Scientific Software International, uses MML estimation for the Rasch model as a one-parameter logistic IRT variant, accommodating complex designs with an intuitive GUI suitable for test developers.[72]
Additional tools integrate Rasch estimation into broader statistical environments. ACER ConQuest, a commercial program, fits unidimensional and multidimensional Rasch models using MML, JML, or Bayesian MCMC, with capabilities for latent regression and direct import/export to SPSS, Excel, or CSV formats, making it ideal for large-scale assessments.[73] For users of general-purpose software, Rasch estimation can be achieved in SPSS via extensions like the SPSSINC_RASCH procedure, which leverages the R ltm package for model fitting, or the SPIRIT macro for one-parameter IRT analyses.[74][75] In SAS, procedures such as PROC LOGISTIC estimate dichotomous Rasch parameters through logistic regression frameworks, while macros like %lrasch_mml enable MML fitting for polytomous models.[76][77]
When selecting software, consider the specific model requirements: for example, TAM is preferable for multidimensional or multifaceted extensions, while eRm excels in conditional estimation for dichotomous data.[68][69] Open-source options like R packages and jMetrik promote accessibility and reproducibility, whereas commercial tools such as Winsteps and IRTPRO offer advanced DIF and visualization features for professional applications.[70][72]
| Software | Type | Key Estimation Methods | Notable Features | Open-Source/Commercial |
|---|---|---|---|---|
| ltm (R) | Package | Maximum likelihood | Polytomous support, diagnostics | Open-source[67] |
| TAM (R) | Package | MML, JML/CML | Multidimensional, multifaceted, large datasets | Open-source[68] |
| eRm (R) | Package | CML, ML | Fit statistics, item elimination | Open-source[69] |
| Winsteps | Standalone | JML, CML | Person-item maps, DIF, 64-bit large data | Commercial[70] |
| jMetrik | Standalone | IRT-based (Rasch) | GUI, DIF, Excel export | Open-source[71] |
| IRTPRO | Standalone | MML | GUI for complex IRT, test scoring | Commercial[72] |
| ConQuest | Standalone | MML, JML, MCMC | Multidimensional, SPSS/Excel integration | Commercial[73] |
| SPSS extensions (e.g., SPIRIT) | Integration | Via R or macro | One-parameter IRT, syntax interface | Extension (free macro)[75] |
| SAS (PROC LOGISTIC, macros) | Integration | Logistic regression, MML | Polytomous support, flexible macros | Commercial software[76][77] |