
Rasch model

The Rasch model, also known as the one-parameter logistic (1PL) model, is a psychometric framework within item response theory (IRT) that models the probability of an individual's correct response to a dichotomous test item as a function of the difference between the person's latent trait level (such as ability or attitude) and the item's difficulty parameter, assuming uniform item discrimination across all items. This model enables the estimation of interval-level measures from ordinal raw scores, facilitating objective and invariant comparisons of person abilities and item difficulties independent of the specific sample or test form used. The mathematical formulation is given by P_{ni}(x=1) = \frac{e^{\theta_n - \delta_i}}{1 + e^{\theta_n - \delta_i}}, where \theta_n represents person n's ability and \delta_i denotes item i's difficulty, both expressed in logit units.

Developed by Danish mathematician Georg Rasch in the 1950s, the model was first formalized in his 1960 monograph Probabilistic Models for Some Intelligence and Attainment Tests, which applied probabilistic approaches to reading and attainment data from Danish schools. Building on earlier scaling methods like those of L.L. Thurstone in the 1920s, Rasch's work emphasized "specific objectivity," ensuring that measurements remain consistent regardless of the persons or items involved, a principle that distinguished it from classical test theory. The model's adoption accelerated in the 1960s through collaborations, particularly with American psychometrician Benjamin Drake Wright, who introduced it via lectures and training at the University of Chicago, leading to extensions such as the partial credit model for polytomous responses (Masters, 1982) and multifaceted versions for rater effects (Linacre, 1989).

Key assumptions of the Rasch model include unidimensionality (all items measure a single underlying trait), local independence (item responses are conditionally independent given the trait level), and monotonicity (higher trait levels increase the probability of a correct response). These features allow for rigorous evaluation of instrument quality through fit statistics and residual analysis, making the model particularly valuable for developing and refining scales in fields beyond education, such as health outcomes (e.g., patient-reported measures like the Eating Assessment Tool) and the social sciences. Rasch-based research has since produced over 5,000 publications, underscoring the model's enduring influence on measurement through accessible software like Winsteps and RUMM.

Overview

Definition and purpose

The Rasch model is a one-parameter logistic model in item response theory (IRT) that estimates the probability of a correct response to a dichotomous item as a function of the difference between a person's latent ability parameter (θ) and the item's difficulty parameter (β). Developed by Danish mathematician Georg Rasch, it assumes that observed responses reflect an underlying probabilistic structure where success depends solely on this ability-difficulty contrast, without item-specific discrimination parameters varying across items. The primary purpose of the Rasch model is to facilitate invariant measurement, meaning that estimates of person abilities and item difficulties remain consistent regardless of the particular sample of persons or items used in the analysis, thereby enabling objective comparisons. This contrasts with classical test theory (CTT), which relies on aggregate test scores and is sample-dependent, often producing ordinal rather than interval-level scores that vary across different groups or item sets. By achieving parameter separability, in which person and item parameters can be estimated independently, the model supports fundamental measurement in fields like education and psychology, promoting fairness and precision in assessing latent traits. Key assumptions underlying the Rasch model include unidimensionality, positing that all items measure a single latent trait; local independence, ensuring that responses to items are independent given the person's trait level; and equal item discrimination, with all items sharing a common discrimination conventionally fixed at 1. Additionally, the model assumes monotonicity, where the probability of a correct response increases as ability exceeds item difficulty. For example, in educational testing, the Rasch model can analyze student responses to multiple-choice questions, revealing that the likelihood of a correct answer rises monotonically with the student's ability relative to each question's difficulty, allowing for tailored assessments that maintain measurement invariance across diverse student populations.
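The response function itself is simple enough to state in a few lines of code. The following minimal Python sketch evaluates the model for a given ability and difficulty; the function name rasch_probability is illustrative, not drawn from any particular package:

```python
import math

def rasch_probability(theta: float, beta: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

# A person 1 logit above an item's difficulty succeeds about 73% of the time;
# when theta equals beta, the probability is exactly 0.5.
print(rasch_probability(1.0, 0.0))   # ~0.731
print(rasch_probability(0.0, 0.0))   # 0.5
```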

Historical development

The Rasch model was developed by Danish mathematician Georg Rasch during the 1950s as a probabilistic framework for analyzing categorical response data in educational and psychological assessments, particularly to estimate latent traits such as ability and item difficulty. Rasch first applied the model empirically to reading-test data in the early 1950s, modeling counts of errors in oral reading tasks to demonstrate its utility in attainment testing. This application formed the basis for his foundational 1960 publication, Probabilistic Models for Some Intelligence and Attainment Tests, which presented the model as a means to achieve invariant comparisons between persons and items in intelligence and achievement contexts. The model's theoretical underpinnings were influenced by L.L. Thurstone's earlier work on psychological scaling, which sought to place items and individuals on a common metric for comparative measurement. Rasch extended these ideas by integrating Ronald A. Fisher's concept of statistical sufficiency, ensuring that parameter estimates remained stable regardless of the specific sample of respondents or items, thus enabling objective inferences about underlying constructs. Adoption of the Rasch model accelerated in the 1960s through the advocacy of Benjamin D. Wright and collaborators at the University of Chicago, who emphasized its practical implementation via computational tools and educational programs. Wright, having invited Rasch to lecture in Chicago in 1960 and overseen the 1980 English republication of his book, organized the inaugural International Objective Measurement Workshop in 1981, fostering a community around the approach. This effort catalyzed the Rasch measurement movement, promoting the model as a cornerstone for sample-independent, fundamental measurement in the social sciences. Over subsequent decades, the Rasch model transitioned from a specialized tool for probabilistic modeling of test responses to a broader paradigm for objective measurement theory, aligning psychometric practices with principles of invariance and separability akin to those in the physical sciences.

Mathematical formulation

Dichotomous model

The dichotomous Rasch model specifies the probability of a correct response to a binary item, assuming unidimensionality of the underlying trait. For person n with ability \theta_n responding to item i with difficulty \beta_i, the probability P(X_{ni}=1 \mid \theta_n, \beta_i) of a correct response (X_{ni}=1) is given by the logistic function: P(X_{ni}=1 \mid \theta_n, \beta_i) = \frac{e^{\theta_n - \beta_i}}{1 + e^{\theta_n - \beta_i}}. This equation models the response as a function of the difference between ability and difficulty, with higher ability relative to difficulty increasing the probability of success. The logit form of this probability, \log\left(\frac{P(X_{ni}=1 \mid \theta_n, \beta_i)}{1 - P(X_{ni}=1 \mid \theta_n, \beta_i)}\right) = \theta_n - \beta_i, directly links the log-odds of success to the linear difference on a logistic scale, where \theta_n and \beta_i are expressed in logit units. The model can be viewed as a logistic regression for each item, treating ability \theta_n as the predictor and difficulty \beta_i as the intercept, with the response X_{ni} as the binary outcome; this perspective highlights its equivalence to a conditional logistic regression framework under specific constraints. Derivationally, the Rasch model belongs to the exponential family of distributions, where the joint probability of responses factorizes to separate person and item contributions via sufficient statistics: the total score for each person, Y_{n+} = \sum_i X_{ni}, is sufficient for \theta_n, and the total score for each item, Y_{+i} = \sum_n X_{ni}, is sufficient for \beta_i, ensuring parameter separability and enabling independent estimation. The resulting scale provides interval-level measurement, where equal intervals represent equal changes in the log-odds of success, allowing direct comparability of ability and difficulty locations along a continuous linear scale in logits.
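As an illustration of the logit form, the short Python sketch below evaluates the item characteristic curve on a grid of abilities and confirms that the recovered log-odds equal θ - β; all names and values are illustrative:

```python
import numpy as np

def icc(theta, beta):
    """Item characteristic curve: P(X=1 | theta, beta) on a grid of abilities."""
    return np.exp(theta - beta) / (1.0 + np.exp(theta - beta))

theta_grid = np.linspace(-4, 4, 9)      # abilities in logits
beta = 0.5                              # item difficulty in logits
p = icc(theta_grid, beta)
log_odds = np.log(p / (1 - p))          # recovers theta - beta exactly
for t, prob, lo in zip(theta_grid, p, log_odds):
    print(f"theta={t:+.1f}  P={prob:.3f}  logit={lo:+.2f}")
```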

Parameter estimation methods

Parameter estimation in the Rasch model involves deriving values for person abilities \theta_i and item difficulties \beta_j from observed response data Y_{ij}, where Y_{ij} = 1 indicates a correct response by person i to item j. Several maximum likelihood-based methods are employed, each addressing the incidental parameters problem inherent in joint estimation, where the number of person parameters grows with the sample size. These methods vary in their treatment of person abilities and assumptions about their distribution, impacting consistency, efficiency, and computational feasibility.

Joint maximum likelihood (JML) simultaneously maximizes the likelihood for both person abilities \theta_i and item difficulties \beta_j by treating all parameters as fixed effects. The log-likelihood is given by \ell_J(\theta, \beta) = \sum_i \sum_j \left[ Y_{ij} \log p_{ij} + (1 - Y_{ij}) \log (1 - p_{ij}) \right], where p_{ij} = \frac{\exp(\theta_i - \beta_j)}{1 + \exp(\theta_i - \beta_j)} and typically \beta_1 = 0 for identification. This approach is computationally efficient, often using iterative algorithms like Newton-Raphson, and provides reasonable starting values for other methods. However, JML yields inconsistent estimates for finite samples because person parameters are incidental; as the number of persons increases while items remain fixed, biases accumulate, particularly for extreme scores where persons achieve all correct or all incorrect responses.

Conditional maximum likelihood (CML) estimation addresses JML's inconsistencies by conditioning on the sufficient statistics for person abilities, the total scores Y_{i+} = \sum_j Y_{ij}, thereby eliminating \theta_i from the likelihood. The conditional likelihood for item parameters is L_C(\beta \mid \{Y_{i+}\}) = \prod_k \frac{\exp\left( -\sum_j \beta_j c_{jk} \right)}{\sum_{\mathbf{c} \in C_k} \exp\left( -\sum_j \beta_j c_j \right)}, where c_{jk} counts the number of persons with total score k who responded correctly to item j, and C_k is the set of response patterns with exactly k correct answers. Maximization yields consistent and asymptotically normal estimates of \beta_j as the number of persons grows, independent of the ability distribution. CML is particularly suitable for the Rasch model due to its sufficiency properties but requires complete data across items for all persons and can be computationally intensive for large datasets, though modern implementations mitigate this.

Marginal maximum likelihood (MML) estimation integrates out person abilities by assuming they follow a known distribution, typically standard normal \phi(\theta), treating items as fixed and persons as random effects. The marginal likelihood is L_M(\beta) = \prod_i \int \prod_j p_{ij}^{Y_{ij}} (1 - p_{ij})^{1 - Y_{ij}} \phi(\theta_i) \, d\theta_i, approximated numerically via Gauss-Hermite quadrature. This method produces consistent estimates for both \beta_j and the ability distribution parameters, even with extreme scores, and is widely implemented in software such as the R package ltm, which uses MML for Rasch model fitting. MML is advantageous for moderate sample sizes and allows post-hoc estimation of person abilities, for example via expected a posteriori (EAP) scoring.

For person ability estimation, Warm's weighted likelihood estimation (WLE) provides a bias-adjusted alternative to direct maximum likelihood, which yields infinite estimates for perfect scores. WLE computes \hat{\theta}_i by maximizing a weighted log-likelihood of the form \sum_j w_{ij} \left[ Y_{ij} \log p_{ij}(\theta) + (1 - Y_{ij}) \log (1 - p_{ij}(\theta)) \right], where the weights w_{ij} are chosen to reduce first-order bias, based on the test information function. This method improves stability and mean-squared error over unweighted maximum likelihood, particularly in small samples or with sparse data.

Estimation in the Rasch model faces challenges related to sample size and data completeness. Stable item estimates generally require at least 30 persons per item to minimize sampling variability and ensure reliable model fit assessment. For missing data, pairwise deletion, which uses only observed responses for each item-person pair, is commonly applied in JML and pairwise maximum likelihood approaches, preserving the observed information without imputation, though it may reduce the effective sample size for correlated items.
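To make the JML procedure concrete, the following minimal Python sketch alternates Newton-Raphson updates for person and item parameters. It assumes a complete 0/1 response matrix with no extreme (all-correct or all-incorrect) rows or columns, for which JML estimates would be infinite, and uses a sum-to-zero constraint on item difficulties rather than fixing \beta_1 = 0; production software adds safeguards this sketch omits:

```python
import numpy as np

def jml_rasch(X, n_iter=200, tol=1e-6):
    """Minimal JML sketch for the dichotomous Rasch model.

    X: persons x items 0/1 matrix without extreme rows/columns.
    Returns (theta, beta) with beta centered at 0 for identification.
    """
    n, k = X.shape
    theta = np.zeros(n)
    beta = np.zeros(k)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(theta[:, None] - beta[None, :])))
        info = p * (1 - p)                       # binomial variance terms
        # Newton-Raphson step for each person parameter, items held fixed
        theta_new = theta + (X.sum(1) - p.sum(1)) / info.sum(1)
        # Newton-Raphson step for each item parameter (note the sign of beta)
        beta_new = beta - (X.sum(0) - p.sum(0)) / info.sum(0)
        beta_new -= beta_new.mean()              # identification constraint
        converged = max(np.abs(theta_new - theta).max(),
                        np.abs(beta_new - beta).max()) < tol
        theta, beta = theta_new, beta_new
        if converged:
            break
    return theta, beta
```

The bias discussed above is visible here: nothing in these updates corrects for the fact that each \theta_i is estimated from only k responses, which is why CML or MML is preferred for item calibration.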

Key properties

Invariant measurement

In the Rasch model, invariant measurement refers to the property where estimates of person ability are independent of the particular set of items administered, and estimates of item difficulty are independent of the particular sample of persons tested, a concept known as specific objectivity. This invariance ensures that comparisons between persons or between items remain consistent regardless of the context in which they are observed, provided the data fit the model. Specific objectivity arises from the model's structure, which separates person and item parameters, allowing for objective scaling that aligns with fundamental measurement principles in the physical sciences. The mathematical basis for this invariance stems from the separability of parameters in the Rasch model, where the log-odds of a correct response for person n on item i is given by \theta_n - \beta_i, with \theta_n representing person ability and \beta_i item difficulty. Because the model permits conditioning on sufficient statistics for persons and items, these log-odds comparisons are invariant: parameter estimates do not depend on the specific sample or test form, as long as the sufficient statistics are used. This separability contrasts with classical test theory (CTT), where item difficulties and person scores are sample-dependent and test-dependent, respectively, leading to comparisons that vary across administrations. The implications of invariant measurement include the ability to equate and link different test forms and scales, facilitating fair comparisons over time or across groups. For instance, in adaptive testing, item banks can be used to administer tailored subsets of items to persons, yet person ability estimates remain comparable across individuals due to the model's invariance, enabling efficient and precise measurement without compromising objectivity. This property supports the construction of stable measurement systems in fields like education and health care, where consistent scaling is essential for monitoring progress or evaluating interventions. However, invariant measurement in the Rasch model assumes adequate fit to the data; violations of model assumptions, such as local independence or unidimensionality, can introduce dependencies that undermine invariance and lead to biased estimates.
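The separability argument can be checked numerically: under the model, the difference in log-odds between two items equals \beta_2 - \beta_1 for every person, whatever their ability. A small Python verification with illustrative values:

```python
import numpy as np

def log_odds(theta, beta):
    """Log-odds of success under the Rasch model: theta - beta."""
    return theta - beta

betas = np.array([-1.0, 0.5])            # two item difficulties (logits)
for theta in (-2.0, 0.0, 3.0):           # very different ability levels
    diff = log_odds(theta, betas[0]) - log_odds(theta, betas[1])
    print(f"theta={theta:+.1f}: item log-odds difference = {diff:+.2f}")
# Always +1.50 (= beta_2 - beta_1): the item comparison does not depend
# on which persons respond, which is specific objectivity in miniature.
```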

Sufficiency and conditional independence

The Rasch model belongs to the exponential family of probability distributions, a property that guarantees the existence of sufficient statistics for its parameters. In this context, the total score for a person, defined as the sum of their responses across all items, serves as a minimal sufficient statistic for estimating the person's ability θ. Likewise, the column total for each item, representing the sum of responses across all persons, is a sufficient statistic for the item's difficulty β. This sufficiency implies that all relevant information about θ or β is encapsulated in these marginal totals, independent of the specific patterns of individual responses.

A key consequence of this structure is the factorization of the joint likelihood function. The likelihood of the observed response matrix X given the parameters can be decomposed into separate components reliant solely on the sufficient totals: L(\theta, \beta \mid X) = L(\theta \mid \mathbf{r}) \cdot L(\beta \mid \mathbf{s}), where \mathbf{r} denotes the vector of person total scores and \mathbf{s} the vector of item total scores. This separation arises directly from the exponential-family form of the model, allowing estimation of person and item parameters to proceed independently once the totals are observed.

The sufficiency property underpins conditional independence in the Rasch model: given a person's total score, their responses to individual items are exchangeable and conditionally independent. That is, the probability of a specific response pattern, conditional on the total, depends only on the item difficulties and not on the person's ability or correlations between items. This exchangeability means that any response pattern yielding the same total score is equally likely under equal item difficulties, facilitating person-free calibration of item parameters without needing to estimate individual θ values simultaneously.

These properties have significant implications for model estimation and application. For instance, conditional maximum likelihood (CML) estimation leverages this independence to derive consistent estimates by conditioning on the observed totals, avoiding the incidental parameters problem associated with full maximum likelihood. Moreover, sufficiency enables efficient probabilistic predictions and model comparisons using only the total scores, rather than the entire response matrix, which reduces computational demands in large datasets. The framework also connects to the additivity of the measurement scale: by ensuring that person and item effects combine additively on the logit scale, the model realizes Rasch's vision of conjoint additivity, where comparisons remain invariant across contexts.
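The conditional independence property can likewise be verified numerically: the probability of a particular response pattern, given its total score, is identical at every ability level. A brief Python sketch with three hypothetical item difficulties:

```python
import itertools
import numpy as np

betas = np.array([-1.0, 0.0, 1.0])       # three hypothetical item difficulties

def pattern_prob(pattern, theta):
    """Unconditional probability of a full response pattern given theta."""
    p = 1.0 / (1.0 + np.exp(-(theta - betas)))
    return np.prod(np.where(np.array(pattern) == 1, p, 1 - p))

target = (1, 0, 0)                        # a pattern with total score 1
same_total = [c for c in itertools.product([0, 1], repeat=3)
              if sum(c) == sum(target)]
for theta in (-2.0, 0.0, 2.0):
    denom = sum(pattern_prob(c, theta) for c in same_total)
    print(f"theta={theta:+.1f}: P(pattern | total=1) = "
          f"{pattern_prob(target, theta) / denom:.4f}")
# The conditional probability is the same for every theta: it depends only
# on the item difficulties, which is exactly what makes CML possible.
```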

Model extensions

Polytomous response models

The Rasch model extends to polytomous response formats to analyze ordered categorical data beyond dichotomous outcomes, maintaining the core principles of probabilistic measurement while accounting for multiple response levels. These extensions are particularly useful for items where respondents select from a scale of ordered options, such as agreement levels or performance gradations. Two key models in this family are the Rating Scale Model (RSM) and the Partial Credit Model (PCM), each addressing specific structures in response data.

The Rating Scale Model (RSM), proposed by Andrich in 1978, applies to sets of items sharing a common rating scale structure, such as Likert-type items assessing attitudes or perceptions. In the RSM, the probability P(X_{ni} = k) that person n scores in category k (where k = 0, 1, \dots, M) on item i is modeled using shared category thresholds \delta_k across items: P(X_{ni} = k) = \frac{\exp \left[ \sum_{j=0}^{k} (\theta_n - \beta_i - \delta_j) \right]}{\sum_{m=0}^{M} \exp \left[ \sum_{j=0}^{m} (\theta_n - \beta_i - \delta_j) \right]}, where \theta_n is the person's ability, \beta_i is the item's difficulty, and the \delta_j (with \delta_0 = 0) represent the step difficulties between adjacent categories, identical for all items. This formulation arises from an adjacent-categories logit framework, where the log-odds of responding in category k rather than k-1 equals \theta_n - (\beta_i + \delta_k). The thresholds \delta_k are interpreted as the additional difficulty required to advance from one category to the next, enabling the assessment of how uniformly the scale functions across items.

In contrast, the Partial Credit Model (PCM), introduced by Masters in 1982, allows each item to have its own unique set of category thresholds, making it ideal for constructed-response tasks where partial credit reflects varying step difficulties per item. The probability P(X_{ni} = m) (for m = 0, 1, \dots, M) is: P(X_{ni} = m) = \frac{\exp \left[ \sum_{j=0}^{m} (\theta_n - \delta_{ij}) \right]}{\sum_{l=0}^{M} \exp \left[ \sum_{j=0}^{l} (\theta_n - \delta_{ij}) \right]}, where \delta_{ij} denotes the difficulty of step j for item i (with \delta_{i0} = 0), and the item's overall difficulty emerges from the cumulative steps. This can be viewed through an adjacent-categories formulation, with the log-odds between consecutive categories m-1 and m as \theta_n - \delta_{im}, or a cumulative form that compares the probability of scoring at or above m versus below. The step parameters \delta_{ij} quantify the incremental challenges within each item, such as progressing from incorrect to partially correct responses.

The RSM and PCM differ primarily in their assumption about threshold uniformity: the RSM imposes a common structure suitable for standardized scales, reducing parameters and enhancing stability when category observations are sparse, while the PCM's item-specific thresholds offer flexibility for heterogeneous tasks but require larger samples to estimate reliably. Both models preserve Rasch invariances, such as the separation of person and item parameters, and can be framed equivalently in adjacent-categories or cumulative terms for interpretation. A likelihood-ratio difference test or a comparison of fit indices, such as person separation, can guide model selection based on the data structure. These polytomous models find applications in attitude surveys using Likert scales, where the RSM evaluates consistent response patterns across items, and in performance assessments like open-ended tasks, where the PCM assigns nuanced credit for partial successes, such as in educational evaluations of problem-solving steps.
In both cases, threshold estimates reveal scale functioning, informing item design by identifying disordered steps that disrupt measurement precision.
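As a concrete illustration of the PCM formula above, the following Python sketch computes category probabilities for a single three-category item from its step difficulties; all values are hypothetical:

```python
import numpy as np

def pcm_probs(theta, deltas):
    """Partial credit model category probabilities for one item.

    deltas: step difficulties (delta_i1, ..., delta_iM); category 0 has
    no step, so its exponent is 0. Returns P(X = 0), ..., P(X = M).
    """
    # cumulative sums of (theta - delta_j), with 0 prepended for category 0
    steps = np.cumsum(theta - np.asarray(deltas, dtype=float))
    numerators = np.exp(np.concatenate(([0.0], steps)))
    return numerators / numerators.sum()

# A 3-category item (scores 0, 1, 2) with steps at -0.5 and +1.0 logits:
for theta in (-1.0, 0.5, 2.0):
    print(theta, np.round(pcm_probs(theta, [-0.5, 1.0]), 3))
# As theta grows, probability mass shifts from category 0 toward category 2.
```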

Multidimensional variants

The multidimensional Rasch model (MRM) extends the unidimensional Rasch framework to account for multiple latent traits, allowing for the measurement of correlated abilities or skills within a single assessment. In this model, the probability of a correct response incorporates a vector of person abilities \theta_n = (\theta_{n1}, \theta_{n2}, \dots, \theta_{nD}) and an item loading vector \alpha_i = (\alpha_{i1}, \alpha_{i2}, \dots, \alpha_{iD}), where in the Rasch case the components of \alpha_i are fixed to 1 for the dimensions the item measures (often specified through a Q-matrix) and 0 otherwise, preserving the assumption of unit discrimination. The response probability for a dichotomous item is given by: P(X_{ni} = 1 \mid \theta_n, \beta_i, \alpha_i) = \frac{\exp(\alpha_i \cdot \theta_n - \beta_i)}{1 + \exp(\alpha_i \cdot \theta_n - \beta_i)}, where \beta_i is the item's scalar difficulty.

Applications of the MRM are particularly valuable in educational testing where constructs involve distinct but related subskills, such as mathematics assessments distinguishing between algebra (e.g., modeling relationships) and geometry (e.g., spatial reasoning) traits. For instance, analyses of PISA mathematics data have used the MRM to calibrate items across domains like quantity, uncertainty, space and shape, and change and relationships, revealing nuanced performance patterns across these multidimensional skills.

Estimation in the MRM presents challenges due to the increased number of parameters at higher dimensionality, which can lead to slower convergence and higher computational demands compared to unidimensional models. Common approaches include marginal maximum likelihood (MML) estimation, which integrates over the ability distribution to avoid the incidental parameters problem, and Bayesian methods using Markov chain Monte Carlo (MCMC) for handling complex priors and multidimensional integrals. Key properties of the Rasch model are partially retained in the MRM; for example, measurement invariance holds conditionally on the loading structure if it is constrained to be equal across dimensions, preserving comparability of estimates along specific directions. Additionally, the MRM serves as a diagnostic tool for detecting violations of unidimensionality, as model fit comparisons (e.g., via likelihood ratio tests) can indicate whether multiple traits better explain the data structure.

A related extension is the many-facet Rasch model (MFRM; Linacre, 1989), which accounts for additional facets such as rater effects in subjective assessments while maintaining a unidimensional latent trait. This model incorporates facets like judges' severity and specific scoring criteria alongside person abilities and item difficulties, enabling fairer measurement by adjusting for variability in rater judgments.
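A minimal sketch of the MRM response function, using a Q-matrix with loadings fixed at 1 as described above; the abilities, difficulties, and Q-matrix here are hypothetical:

```python
import numpy as np

def mrm_prob(theta, q_row, beta):
    """Multidimensional Rasch probability for one item, with loadings fixed
    at 1 via a Q-matrix row (1 = the item taps that dimension, 0 = it does not)."""
    return 1.0 / (1.0 + np.exp(-(np.dot(q_row, theta) - beta)))

theta = np.array([0.8, -0.3])        # e.g., algebra and geometry abilities
Q = np.array([[1, 0],                # item 1 measures algebra only
              [0, 1],                # item 2 measures geometry only
              [1, 1]])               # item 3 taps both (within-item model)
betas = np.array([0.0, 0.5, 1.0])
for q_row, beta in zip(Q, betas):
    print(q_row, round(mrm_prob(theta, q_row, beta), 3))
```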

Applications and interpretations

Educational and psychological testing

The Rasch model is widely applied in educational testing for item banking, which involves calibrating items on a common scale to create pools for constructing equivalent test forms. This approach enables equitable score comparisons across administrations, as seen in standardized assessments where items are selected and equated based on their difficulty parameters independent of the test-taking sample. For instance, in large-scale programs like Australia's National Assessment Program – Literacy and Numeracy (NAPLAN), Rasch measurement supports item banking to ensure consistent evaluation of student achievement across diverse populations. Similarly, the Graduate Record Examination (GRE) employs item response theory (IRT) frameworks, such as the 2PL model, for equating sections and maintaining fairness in admissions testing.

In computerized adaptive testing (CAT), the Rasch model facilitates real-time item selection tailored to an examinee's estimated ability (θ), optimizing test efficiency by administering fewer items while achieving high precision. CAT systems using Rasch select subsequent items that maximize information at the current θ estimate (as sketched at the end of this subsection), reducing test length by up to 50% compared to fixed-form tests without compromising reliability. This has been implemented in educational contexts, such as competency assessments, where adaptive algorithms based on Rasch improve accuracy for individualized learning evaluations.

Psychological applications of the Rasch model extend to measuring latent traits like attitudes and health outcomes, often via extensions such as the partial credit model (PCM) for polytomous responses. In depression assessment, the Depression Anxiety Stress Scales (DASS-21) have been validated using Rasch analysis under the PCM, confirming unidimensionality and reliable scoring across response categories for clinical screening. The model also supports patient-reported outcomes (PROs) in clinical trials, where Rasch-calibrated scales quantify changes in symptoms like pain or function, enhancing sensitivity to treatment effects in randomized studies. For example, Rasch optimization of PRO measures has improved the detection of meaningful differences in mobility self-reports for rehabilitation trials.

Compared to classical test theory (CTT), the Rasch model offers sample-invariant item calibration, where item difficulties remain stable across different groups, unlike CTT's reliance on test-specific statistics. This invariance supports generalizable measurements, as demonstrated in instrument development where Rasch analysis ensures consistent trait estimation regardless of sample composition. Additionally, Rasch analysis enables detection of differential item functioning (DIF), identifying biased items that perform differently across subgroups (e.g., by gender or cultural background), promoting fairness in educational and psychological assessments. For instance, routine DIF analysis using Rasch methods has been recommended for validating instruments to eliminate cultural biases.

Case studies illustrate the Rasch model's role in refining psychological instruments, such as its application to fluid intelligence scales, where Rasch modeling combined with principles from cognitive psychology yielded invariant person ability estimates across age groups. In personality assessment, Rasch analysis of the Proactive Personality Scale confirmed item fit and category functioning, supporting its use in occupational settings for trait assessment.

The Rasch community, through organizations like the International Objective Measurement Workshop, advances instrument development by emphasizing these applications, fostering collaborative validation of scales for educational and attitudinal research. Overall, the Rasch model's impact is evident in large-scale assessments like the Programme for International Student Assessment (PISA), where Rasch-based IRT scaling improves validity by equating literacy measures across cycles and countries, ensuring comparable international benchmarks for educational policy. This has enhanced the reliability of global proficiency estimates, influencing reforms in over 70 participating nations.
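Under the Rasch model, the CAT selection rule described above reduces to choosing the unadministered item whose difficulty is closest to the current ability estimate, since Fisher information p(1 - p) peaks where β = θ. A minimal Python sketch, with a hypothetical item bank:

```python
import numpy as np

def next_item(theta_hat, bank_betas, administered):
    """Pick the unadministered item that maximizes Fisher information
    p(1-p) at the current ability estimate; for the Rasch model this is
    simply the item whose difficulty is nearest theta_hat in logits."""
    info = np.full(len(bank_betas), -np.inf)
    for j, beta in enumerate(bank_betas):
        if j not in administered:
            p = 1.0 / (1.0 + np.exp(-(theta_hat - beta)))
            info[j] = p * (1 - p)
    return int(np.argmax(info))

bank = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])    # calibrated bank (logits)
print(next_item(0.3, bank, administered={2}))    # -> 3 (beta = 1.0 is closest)
```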

Interpreting parameters and fit assessment

In the Rasch model, the person parameter θ quantifies an individual's ability or trait level along the latent variable, expressed in logit units where θ = 0 corresponds by convention to the average level, with higher values indicating greater ability. This parameterization allows for interval-level measurement, enabling comparisons of relative proficiency; for instance, a difference of 1 logit multiplies the odds of success by about e ≈ 2.7 on items of equivalent difficulty. The item parameter β, conversely, represents the location or difficulty of an item on the same scale, marking the point where the probability of a correct response is 50% for a person with θ = β. Items with higher β values are more challenging, targeting higher-ability persons, while lower β values suit lower-ability individuals.

Person-item maps, also known as Wright maps, provide a visual summary of these parameters by plotting person abilities (typically as asterisks or "x" symbols on the left) and item difficulties (on the right) against a common logit scale, with the vertical axis ranging from low to high measures. This depiction illustrates targeting, such as whether most items cluster around the mean (M) or spread across standard deviations (S for ±1 SD, T for ±2 SD), highlighting gaps or overlaps that inform test construction. Item characteristic curves (ICCs) complement this by graphing the expected probability of success as a function of θ for a fixed β, forming an S-shaped logistic curve that rises steeply around the item's difficulty; deviations from the ideal curve in empirical plots signal potential misfit.

Fit assessment evaluates how well observed responses align with model expectations, primarily through residual-based statistics derived from differences between observed (X) and expected (E) scores; the computations are sketched at the end of this subsection. Item fit is gauged using infit and outfit mean-square statistics, both chi-square-based statistics divided by their degrees of freedom and expected to approximate 1 under perfect fit. Infit, an information-weighted measure, is sensitive to "inlier" patterns, that is, unexpected responses near a person's ability level, such as overfit Guttman-like responding (mean-square < 0.5, indicating excessive predictability) or underfit erratic responding (mean-square > 1.5, suggesting noise); it is less affected by outliers. Outfit, being outlier-sensitive, detects extreme surprises far from a person's level, like lucky guesses (high underfit, mean-square > 2.0, degrading measurement) or imputed responses (low overfit); values between 0.5 and 1.5 are considered productive for measurement, while extremes warrant item revision. Chi-square item fit tests aggregate these residuals, with non-significant p-values (often > 0.01) confirming alignment, though sample size influences their sensitivity.

Person fit examines individual response patterns for anomalies, using similar mean-square statistics or t-tests on standardized residuals to identify guessing (high outfit > 2.0), carelessness, or deterministic overfit (infit < 0.5), which may indicate cheating or misunderstanding. T-tests of person residuals, standardized to z-scores with expectation 0 and variance 1, flag misfit if z > 2 (unexpected patterns) or z < -2 (overly predictable patterns); values beyond ±3 suggest invalid measures, and extreme scores fit trivially and are therefore excluded from the computation. Unusual patterns, like inconsistent successes on hard items paired with failures on easy ones, elevate outfit, signaling potential data issues.

Model criticism addresses violations like local dependence or multidimensionality through principal components analysis (PCA) of inter-item residual correlations, after standardizing residuals as (X - E)/√[E(1 - E)] to approximate normality. The first component captures the Rasch dimension; a dominant second eigenvalue (greater than about 10% of the first, or unexplained variance above 5%) indicates local dependence, such as correlated residuals from items with similar content (e.g., residual correlations > 0.2 between item pairs like bladder and bowel functions), violating conditional independence. For unidimensionality, the loadings on the first residual contrast are used to split items into subsets; if the person measures derived from these subsets differ significantly (t-test p < 0.05), multidimensionality is evident, prompting model extensions or item removal. This diagnostic ensures the scale measures a single construct, with low residual variance supporting validity.
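The infit and outfit statistics described above are straightforward to compute from the residual matrix. The following Python sketch follows the standard mean-square definitions for the dichotomous model: information-weighted sums for infit, and the unweighted average of standardized squared residuals for outfit:

```python
import numpy as np

def item_fit(X, theta, beta):
    """Infit and outfit mean-square statistics for each item, given person
    and item parameter estimates under the dichotomous Rasch model."""
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - beta[None, :])))
    w = p * (1 - p)                               # model variance per response
    sq_resid = (X - p) ** 2                       # squared score residuals
    outfit = (sq_resid / w).mean(axis=0)          # unweighted: outlier-sensitive
    infit = sq_resid.sum(axis=0) / w.sum(axis=0)  # information-weighted
    return infit, outfit
```

Swapping the axis (persons instead of items) yields the corresponding person fit statistics.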

Implementation and software

Estimation software tools

Several software packages and tools are available for estimating parameters in the Rasch model, ranging from open-source R packages to commercial standalone applications, enabling researchers to fit the model to dichotomous, polytomous, and multidimensional data. These tools typically implement estimation methods such as joint maximum likelihood (JML), conditional maximum likelihood (CML), and marginal maximum likelihood (MML), facilitating analysis in educational testing and psychological measurement.

In the R programming environment, several packages provide robust support for Rasch estimation. The ltm package analyzes dichotomous and polytomous data under item response theory (IRT), including the Rasch model, using maximum likelihood estimation for parameter fitting and diagnostics. The TAM package offers MML and JML estimation for unidimensional and multidimensional Rasch models, as well as the multifaceted Rasch model, with functions like tam.mml() for model calibration and support for large datasets through plausible value imputation. Similarly, the eRm package specializes in extended Rasch modeling, fitting the Rasch model (RM), rating scale model (RSM), and partial credit model (PCM) via CML for item parameters and ML for person parameters, including features for fit assessment like infit/outfit statistics and automated item elimination.

Specialized standalone software provides user-friendly interfaces for Rasch analysis. Winsteps, a commercial Windows-based tool, employs JML and CML estimation to construct measures from rectangular datasets, generating person-item maps for visualizing ability and difficulty distributions, handling large datasets on 64-bit systems, and performing differential item functioning (DIF) analysis to detect bias. The free, open-source jMetrik offers a graphical user interface (GUI) for Rasch estimation alongside classical and IRT analyses, supporting DIF detection, IRT linking, and direct export of results to Excel for further processing. IRTPRO, a commercial package from Scientific Software International, uses MML estimation for the Rasch model as a one-parameter logistic IRT variant, accommodating complex designs with an intuitive GUI suitable for test developers.

Additional tools integrate Rasch estimation into broader statistical environments. ConQuest, a standalone program from the Australian Council for Educational Research, fits unidimensional and multidimensional Rasch models using MML, JML, or Bayesian MCMC, with capabilities for latent regression and direct import/export to SPSS, Excel, or text formats, making it suitable for large-scale assessments. For users of general-purpose software, Rasch estimation can be achieved in SPSS via extensions like the SPSSINC_RASCH procedure, which leverages the rasch function from the R ltm package, or the SPIRIT macro for one-parameter IRT analyses. In SAS, procedures such as PROC LOGISTIC estimate dichotomous Rasch parameters through a logistic regression framework, while macros like %lrasch_mml enable MML fitting for polytomous models.

When selecting software, consider the specific model requirements: for example, TAM and ConQuest are well suited to multidimensional or multifaceted extensions, while eRm excels in conditional estimation for dichotomous data. Open-source options like the R packages and jMetrik promote accessibility and reproducibility, whereas commercial tools such as Winsteps and IRTPRO offer advanced DIF and visualization features for professional applications.
Software | Type | Key estimation methods | Notable features | License
ltm (R) | Package | Maximum likelihood | Polytomous support, diagnostics | Open-source
TAM (R) | Package | MML, JML | Multidimensional, multifaceted, large datasets | Open-source
eRm (R) | Package | CML, ML | Fit statistics, item elimination | Open-source
Winsteps | Standalone | JML, CML | Person-item maps, DIF, 64-bit large data | Commercial
jMetrik | Standalone | IRT-based (Rasch) | GUI, DIF, Excel export | Open-source
IRTPRO | Standalone | MML | GUI for complex IRT, test scoring | Commercial
ConQuest | Standalone | MML, JML, MCMC | Multidimensional, SPSS/Excel integration | Commercial
SPSS extensions (e.g., SPIRIT) | Integration | Via R or macro | One-parameter IRT, syntax interface | Free extension/macro
SAS (PROC LOGISTIC, macros) | Integration | Logistic regression, MML | Polytomous support, flexible macros | Commercial

Practical considerations in analysis

In Rasch analysis, data requirements emphasize a balanced design, in which the numbers of persons and items are sufficient and well matched to facilitate stable parameter estimation via joint maximum likelihood. This balance helps mitigate biases that arise from disproportionate sample compositions, ensuring that person and item difficulties are calibrated reliably without excessive variability. For handling incomplete data, plausible value imputation is a recommended approach, particularly in educational assessments, where multiple imputed datasets are generated to approximate the underlying ability distribution while preserving the probabilistic structure of the model.

Sample size guidelines typically recommend a minimum of 100 persons to achieve stable estimates and reduce the risk of instability, though smaller samples (around 50) can be used in pilot studies with caution, given the increased likelihood of disordered item thresholds. Larger samples enhance the precision of fit statistics but may paradoxically increase the detection of minor misfits, necessitating careful interpretation. Adequate statistical power is essential for detecting item misfit, as small samples (n < 500) often fail to identify deviations reliably, especially when multiple items are aberrant, underscoring the need for power simulations tailored to the study's context.

Common pitfalls include assuming unidimensionality without empirical testing, which can lead to invalid scale construction if multidimensionality underlies the data, as revealed by principal components analysis of residuals or fit diagnostics. Another frequent issue is ignoring disordered categories in polytomous items, where adjacent response categories exhibit overlapping thresholds, resulting in disordered functioning and inflated misfit statistics; failing to collapse such categories artificially reduces item information and compromises measure precision.

Validation steps involve cross-validation across independent samples to verify parameter invariance and generalizability, ensuring that item difficulties remain consistent beyond the calibration dataset. Reliability is assessed using indices like person separation, which quantifies the spread of persons into distinct ability levels and serves as an analog to Cronbach's alpha, with values above 0.80 indicating adequate separation for group-level inferences (a computation is sketched below).

Ethical issues in Rasch applications, particularly in high-stakes testing, center on ensuring fairness by detecting and mitigating differential item functioning that could disadvantage subgroups, thereby upholding equity in consequential decisions. Reporting standards, as outlined in the AERA/APA/NCME Standards for Educational and Psychological Testing, mandate transparent disclosure of model assumptions, fit results, and limitations to support valid interpretations and prevent misuse of measures in consequential contexts.

Future directions include integrating Rasch models with machine learning techniques to enhance adaptive testing designs, such as using neural networks to predict item responses while maintaining probabilistic invariance for personalized assessments. This hybrid approach combines IRT's measurement rigor with ML's scalability to optimize item selection in computerized adaptive tests.
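As an illustration of the person separation indices mentioned above, the following Python sketch computes the separation index and its associated reliability from ability estimates and their standard errors, following Wright's formulation in which true variance is observed variance minus mean error variance; variable names are illustrative:

```python
import numpy as np

def person_separation(theta_hat, se):
    """Person separation index and separation reliability from ability
    estimates (theta_hat) and their standard errors (se)."""
    obs_var = np.var(theta_hat, ddof=1)      # observed variance of measures
    err_var = np.mean(se ** 2)               # mean squared standard error
    true_var = max(obs_var - err_var, 0.0)   # estimated true variance
    reliability = true_var / obs_var if obs_var > 0 else 0.0
    separation = np.sqrt(true_var / err_var) if err_var > 0 else float("inf")
    return separation, reliability
```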

References

1. An introduction to Item Response Theory and Rasch Analysis - NIH
2. Rasch Modeling
3. A Scientometric Review of Rasch Measurement - PubMed Central
4. Rasch Model - an overview - ScienceDirect Topics
5. Classical test theory versus Rasch analysis for quality of life ...
6. Rasch Model - Sage Knowledge
7. The Measurement Theory of Georg Rasch - Ronald Mead (PDF)
8. Rasch, G. (1960). Studies in mathematical psychology: I. Probabilistic models for some intelligence and attainment tests. Nielsen & Lydiche.
9. Misunderstanding the Rasch model - ResearchGate
10. A Scientometric Review of Rasch Measurement: The Rise ... - Frontiers
11. Georg Rasch and Benjamin D. Wright: The Early Years
12. Ben Wright: the Measure of the Man - ResearchGate
13. MESA Memo 63: Probabilistic Models: Foreword and Preface
14. Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests (reprint).
15. Fitting the Rasch model under the logistic regression framework ...
16. A short note on the Rasch model and latent trait model (PDF)
17. Does the Rasch Model Convert an Ordinal Scale into an Interval ...
18. Joint and Conditional Maximum Likelihood Estimation for the Rasch ... (PDF)
19. Linacre, J. M. (1999). Estimation Methods for Rasch Measures (PDF)
20. Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood (MML) estimation for the Rasch model.
21. Rasch analysis: A primer for school psychology researchers and ...
22. Rasch Model Estimation: Further Topics - Winsteps.com (PDF)
23. Specific objectivity - local and general - Rasch.org
24. Dichotomous Rasch Model derived from Specific Objectivity
25. Rasch fit statistics as a test of the invariance of item parameter ...
26. Sufficient statistics and latent trait models
27. Raw Scores as Sufficient Statistics - Rasch.org
28. Asymptotic Properties of Conditional Maximum-Likelihood Estimators
29. Comparing and Choosing between "Partial Credit Models" (PCM) ...
30. Andrich, D. A rating formulation for ordered response categories. Psychometrika.
31. Rating Scale Model (RSM) or Partial Credit Model (PCM)? - Rasch.org
32. Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika.
33. A Multidimensional Rasch Analysis of Gender Differences in PISA ... (PDF)
34. A Comparison of Unidimensional and Multidimensional Rasch ... (PDF)
35. Statistical Inference for the Multidimensional ... - HAL (PDF)
36. Assessing the dimensionality of the CES-D using multi ... - ResearchGate
37. Linacre MFRM Book - Winsteps.com (PDF)
38. MESA Memo 61: Facets Model for Judging - Rasch.org
39. Why Studying Rasch Measurement is a Smart Career Move
40. Exploring the use of IRT equating for the GRE Subject Test in ... - ResearchGate
41. Computer-adaptive testing algorithm - Rasch.org
42. Utilizing Rasch Measurement Models to Develop a Computer ...
43. Computerized Adaptive Testing (CAT): Introduction and Benefits
44. Rasch Model: DASS Analysis of Depression, Anxiety, Stress
45. Rasch model analysis: Depression, Anxiety, Stress Scales (DASS)
46. Evaluating the impact of calibration of patient-reported outcomes measures on results from randomized clinical trials: a simulation study based on Rasch measurement theory
47. Why Rasch: Selection of a Quantitative Model (PDF)
48. The Promising Advantages of Rasch
49. Why Differential Item Functioning Analysis Should Be a Routine Part ...
50. Differential Item Functioning and Differential Test Functioning (DIF ...)
51. Developing a fluid intelligence scale through a combination of Rasch modeling and cognitive psychology
52. Rasch Analysis of the Proactive Personality Scale - Sage Journals
53. Theoretical considerations on scaling methodology in PISA - OECD (PDF)
54. Equating Reading Literacy Measures Over Time: A Rasch ... - ResearchGate
55. Using The Very Useful Wright Map - Rasch.org
56. An Evaluation of Overall Goodness-of-Fit Tests for the Rasch Model
57. Estimating Item Discriminations - Rasch.org
58. What do Infit and Outfit, Mean-square and Standardized mean?
59. Item fit statistics for Rasch analysis: can we trust them?
60. Fit diagnosis: infit outfit mean-square standardized - Winsteps Help
61. Local Dependency, Correlations and Principal Components
62. Comparison with principal component analysis of residuals - NIH
63. Chapter 4: Evaluating the Quality of Measures - Rasch (Bookdown)
64. Rasch Measurement Analysis Software Directory
65. A survey and review of packages for the estimation of Rasch models
66.
67. CRAN: Package TAM - R Project
68. CRAN: Package eRm
69. WINSTEPS Rasch Software - Winsteps 5.10.3
70. Psychomeasurement Systems - jMetrik
71. SSI Software - IRTPRO
72. ConQuest - ACER - Australian Council for Educational Research
73. IBMPredictiveAnalytics/SPSSINC_RASCH: Estimate a Rasch model ...
74. IRT in SPSS Using the SPIRIT Macro - PMC - NIH
75. 342-2011: Using PROC LOGISTIC to Estimate the Rasch Model (PDF)
76. %lrasch_mml: A SAS Macro for Marginal Maximum Likelihood ... (PDF)
77. On designing data-sampling for Rasch model calibrating an ... - ResearchGate
78. Sample Size and Item Calibration or Person Measure Stability
79. Rasch fit statistics and sample size considerations for polytomous data
80. Is Rasch model analysis applicable in small sample size pilot ...
81. Interpreting results from Rasch analysis 2. Advanced model ...
82. Detecting Item Misfit in Rasch Models - ResearchGate
83. Rasch analysis in physics education research: Why measurement ...
84. Combining (Collapsing) and Splitting Categories - Rasch.org
85. Collapsing or Not? A Practical Guide to Handling Sparse ... (PDF)
86. Rasch validation of the Warwick-Edinburgh Mental Well-Being Scale ...
87. Psychometric limitations of the 13-item Sense of Coherence Scale ...
88. Position Statement on High-Stakes Testing
89. Standards for Educational and Psychological Testing (2014 edition, AERA/APA/NCME, PDF)
90. Integrating Machine Learning into Item Response Theory for ... - Lirias (PDF)
91. Survey of Computerized Adaptive Testing: A Machine Learning ...