
Pseudo-R-squared

Pseudo-R-squared, also known as pseudo R^2, encompasses a collection of goodness-of-fit statistics employed in logistic regression and other generalized linear models to evaluate the explanatory power of a model, serving as an analogue to the coefficient of determination (R^2) in ordinary least squares regression. Unlike the traditional R^2, which quantifies the proportion of variance explained by the model in linear regression, pseudo-R-squared measures do not directly assess variance reduction due to the absence of a homoscedastic error term in nonlinear models; instead, they typically compare the log-likelihood of the fitted model to that of a null (intercept-only) model, yielding values between 0 and 1 where higher values indicate better fit, though interpretations vary across indices. These measures are particularly vital for binary outcome models, such as logistic regression, where maximum likelihood estimation is used rather than least squares, and no single pseudo-R-squared is universally preferred due to differences in scaling, sensitivity to model complexity, and comparability across datasets. Common variants include McFadden's pseudo-R-squared, calculated as 1 - \frac{\log L(M)}{\log L(0)} (where \log L(M) is the log-likelihood of the full model and \log L(0) that of the null model), which assesses relative improvement in likelihood but tends to yield lower values; Cox and Snell's, given by 1 - \left( \frac{L(0)}{L(M)} \right)^{2/n} (with n as the sample size), which is bounded below 1; and Nagelkerke's adjustment to Cox and Snell, rescaled to reach a maximum of 1 for fuller interpretability. Other indices, such as those proposed by McKelvey and Zavoina or Efron, incorporate latent-variable predictions or residual correlations, while adjusted versions penalize for the number of predictors to avoid overfitting. Despite their utility in model comparison and reporting effect sizes, pseudo-R-squared values are generally smaller than OLS R^2 and should not be interpreted as proportions of explained variance, with caution advised against direct cross-model comparisons due to varying properties such as monotonicity with added regressors or upper bounds below 1 in some cases. Desirable pseudo-R-squared measures, as outlined in information-theoretic frameworks, emphasize properties such as nondecreasing behavior with additional variables and interpretability via reductions in uncertainty (e.g., Kullback-Leibler divergence), extending applicability beyond logistic regression to other generalized linear models.

Overview and Background

Definition and Purpose

Pseudo-R-squared serves as an analog to the ordinary R-squared statistic for generalized linear models (GLMs), such as logistic regression, where the assumptions of ordinary least squares (OLS) regression—particularly normally distributed errors and homoscedasticity—do not hold. In these non-linear contexts, pseudo-R-squared provides a measure of goodness-of-fit by adapting likelihood-based concepts to evaluate how well the model captures the relationship between predictors and the outcome variable. Unlike OLS, which relies on squared residuals, pseudo-R-squared draws on likelihood principles to assess model adequacy without assuming a continuous dependent variable. The primary purpose of pseudo-R-squared is to quantify the improvement in model fit attributable to the inclusion of predictor variables, relative to a baseline null model (typically an intercept-only model). It offers a way to gauge predictive strength for categorical or non-normal outcomes, but crucially, it does not represent the proportion of variance explained, as this concept is not directly applicable in GLMs. Instead, it focuses on the relative enhancement of the model's likelihood, helping researchers compare models or evaluate whether predictors meaningfully reduce uncertainty in predictions. Pseudo-R-squared values generally range from 0, indicating no improvement over the null model, to less than 1, though the exact upper bound depends on the specific measure employed. This variability arises because different pseudo-R-squared indices normalize the likelihood ratio statistic in distinct ways to approximate the interpretive convenience of OLS R-squared. While some measures are deviance-based—using the deviance as a scaled version of the log-likelihood—pseudo-R-squared emphasizes likelihood ratio principles to provide a standardized, albeit approximate, indicator of fit.
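The null-versus-full comparison described above can be made concrete with a short sketch. The example below is an illustration only, not a canonical implementation: it simulates binary data, fits an intercept-only and a full logistic model with Python's statsmodels, and prints the two log-likelihoods that all likelihood-based pseudo-R² measures compare. The variable names and simulated coefficients are assumptions for demonstration.

```python
import numpy as np
import statsmodels.api as sm

# Synthetic binary data (assumed values, for illustration only)
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))   # true success probability
y = rng.binomial(1, p)

# Null (intercept-only) model versus full model with the predictor
null_fit = sm.Logit(y, np.ones((n, 1))).fit(disp=0)
full_fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)

# Likelihood-based pseudo-R-squared measures compare these two quantities
print("log-likelihood, null model:", null_fit.llf)
print("log-likelihood, full model:", full_fit.llf)
```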

Relation to Ordinary R-squared

In ordinary least squares (OLS) regression, the coefficient of determination, known as the ordinary R-squared (R^2), quantifies the proportion of the total variance in the dependent variable that is explained by the model. It is calculated as R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}} = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}, where SS_{\text{res}} = \sum (y_i - \hat{y}_i)^2 is the residual sum of squares, SS_{\text{tot}} = \sum (y_i - \bar{y})^2 is the total sum of squares, y_i are the observed values, \hat{y}_i are the predicted values, and \bar{y} is the mean of the observed values. This measure relies on the assumptions of normally distributed errors and homoscedasticity (constant variance of errors), which ensure that the sum of squared residuals provides a meaningful decomposition of variance. These assumptions do not hold in generalized linear models (GLMs) and other non-linear regression frameworks, such as logistic regression, where the dependent variable is often categorical or bounded, and errors follow a non-normal distribution from the exponential family with variance depending on the mean (heteroscedasticity on the original scale). As a result, the ordinary R^2 based on sums of squares can produce invalid results, such as values outside the [0,1] range, failure to increase with added predictors, or lack of a coherent variance decomposition, rendering it inapplicable for assessing fit. Instead, pseudo-R-squared measures adapt the concept by substituting likelihood functions or deviance statistics for sums of squares, comparing the fitted model's explanatory power to that of a null (intercept-only) model. Unlike R^2, which directly measures explained variance, pseudo-R-squared does not quantify the proportion of variance accounted for but rather indicates the relative improvement in model fit over the baseline, with interpretations varying by specific measure. However, in the special case of a Gaussian model with the canonical identity link and known variance, certain pseudo-R-squared measures, such as those based on Kullback-Leibler divergence, reduce exactly to the ordinary R^2.
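For reference, the OLS decomposition above can be computed directly from residuals. The snippet below is a minimal numpy sketch using made-up observed and fitted values.

```python
import numpy as np

# Assumed example values: observed outcomes and OLS-fitted predictions
y = np.array([2.0, 3.5, 4.1, 5.0, 6.3])
y_hat = np.array([2.2, 3.3, 4.0, 5.2, 6.1])

ss_res = np.sum((y - y_hat) ** 2)       # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares
r_squared = 1 - ss_res / ss_tot         # ordinary R-squared
print(round(r_squared, 3))
```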

Historical Context

The development of pseudo-R-squared measures was driven by the increasing adoption of logistic and probit models for analyzing categorical data in the social sciences and econometrics during the 1960s and 1970s, when researchers sought goodness-of-fit statistics comparable to the ordinary R-squared but adapted for maximum likelihood frameworks. These models, popularized after Cox's early work on the regression analysis of binary data in 1958, lacked direct analogs for assessing explanatory power, leading to initial reliance on ad-hoc measures like deviance reductions from the generalized linear model theory introduced by Nelder and Wedderburn in 1972. The earliest formal proposal for a pseudo-R² appeared in 1970 with Cragg and Uhler's likelihood ratio-based index for limited dependent variable models, applied in their analysis of automobile demand, marking the initial attempt to standardize fit assessment in nonlinear contexts. This was soon followed by McFadden's 1974 contribution in discrete choice modeling, where the measure was framed as a proportional reduction in log-likelihood relative to a null model, influencing applications in econometrics and transportation research. Later, Cox and Snell formalized a similar likelihood-based pseudo-R² in 1989 for binary data analysis. These measures developed independently, with no single inventor, often in response to limitations in early statistical software like GLIM (released in 1974) and early SAS implementations, which provided deviance statistics but not intuitive R-squared-like summaries for non-technical users. Subsequent refinements addressed scaling issues, such as Nagelkerke's 1991 adjustment to the Cox-Snell measure, which rescales it to reach a maximum of 1 for a perfect fit, enhancing comparability across models. In 2009, Tjur proposed the coefficient of discrimination for binary outcomes, emphasizing discriminatory power over likelihood ratios as a more intuitive metric. Overall, the evolution reflected a shift from informal deviance-based comparisons to standardized pseudo-R² indices, enabling broader model evaluation and reporting in fields like econometrics and the social sciences, though diversity in definitions persisted due to the inherent challenges of non-normal outcomes.

Common Measures

Cox and Snell’s Pseudo-R² (R²CS)

Cox and Snell's pseudo-R², denoted as R²_CS, was formalized by David R. Cox and Evelyn J. Snell in the 1989 second edition of their book on the analysis of binary data—outcomes whose binomial distribution belongs to the exponential family—though a measure of this form was first proposed by Cragg and Uhler (1970), and it has since been generalized to other generalized linear models (GLMs). This measure serves as an analog to the ordinary R², assessing the improvement in model fit by comparing the likelihood of a null (intercept-only) model to that of the fitted model. The formula for R²_CS is given by: R^2_{CS} = 1 - \exp\left( -\frac{2}{n} (\log L_{\text{full}} - \log L_{\text{null}}) \right) = 1 - \left( \frac{L_{\text{null}}}{L_{\text{full}}} \right)^{2/n}, where L_{\text{null}} is the likelihood of the null model, L_{\text{full}} is the likelihood of the fitted model, and n is the sample size. This expression derives from the proportional reduction in deviance, where the deviance is defined as D = -2 \log L; the term -2 (\log L_{\text{null}} - \log L_{\text{full}}) is the likelihood ratio test statistic (G²), measuring the deviance reduction from the null to the full model, and the exponential transformation scaled by 1/n provides a measure akin to the geometric mean improvement in likelihood per observation. Key properties of R²_CS include its range from 0 (indicating no improvement over the null model) to a value strictly less than 1, even for perfect fits, with an upper bound of 1 - \exp\left( \frac{2}{n} \log L_{\text{null}} \right) = 1 - L_{\text{null}}^{2/n}. This bound often results in low values for R²_CS, even when the model provides a substantial improvement in fit, particularly in datasets with small n or unbalanced outcomes. Due to these characteristics, R²_CS is commonly reported by statistical software such as SPSS, SAS, and R packages for logistic regression and GLMs.
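A minimal sketch of this computation, assuming the two log-likelihoods are already available (e.g., from a fitted logistic model); the numeric values are placeholders, not results from a real dataset.

```python
import numpy as np

def cox_snell_r2(ll_full, ll_null, n):
    """Cox and Snell pseudo-R^2 from model and null log-likelihoods."""
    return 1 - np.exp(-2.0 / n * (ll_full - ll_null))

# Placeholder log-likelihoods and sample size for illustration
r2_cs = cox_snell_r2(ll_full=-250.0, ll_null=-320.0, n=500)
upper_bound = 1 - np.exp(2.0 / 500 * -320.0)   # maximum attainable R^2_CS
print(round(r2_cs, 3), round(upper_bound, 3))
```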

Nagelkerke’s Pseudo-R² (R²N)

Nagelkerke’s pseudo-R², denoted as R^2_N, was proposed by Nico J. D. Nagelkerke in 1991 as an adjustment to Cox and Snell’s pseudo-R² for logistic regression and similar maximum likelihood models, aiming to provide a more intuitive measure of fit by scaling values to the full 0-1 range. This normalization addresses the limitation of Cox and Snell’s measure, which cannot reach 1 even for a perfect model due to the discrete nature of the outcome variable in logistic regression. The derivation involves direct scaling of Cox and Snell’s pseudo-R² (R^2_{CS}) by dividing it by its theoretical maximum value under a perfectly fitting model, ensuring the adjusted measure achieves an upper bound of 1. The formula is given by: R^2_N = \frac{R^2_{CS}}{1 - \exp\left( \frac{2}{n} l(0) \right)}, where n is the sample size and l(0) is the log-likelihood of the null (intercept-only) model. Equivalently, it can be expressed directly in terms of log-likelihoods as: R^2_N = \frac{1 - \exp\left\{ \frac{2}{n} [l(0) - l(\hat{\theta})] \right\}}{1 - \exp\left\{ \frac{2}{n} l(0) \right\}}, with l(\hat{\theta}) denoting the log-likelihood of the fitted model. Key properties of R^2_N include its range from 0 (null model) to 1 (perfect prediction); it is always greater than or equal to R^2_{CS} for the same model and facilitates comparisons relative to the best attainable fit. It is asymptotically independent of sample size and is maximized by the maximum likelihood estimates, offering an interpretable proportion of explained variation analogous to the ordinary least squares R². Due to these attributes, Nagelkerke’s measure is widely reported by statistical software such as SPSS and SAS for logistic regression goodness-of-fit, as it provides a standardized scale for model evaluation.
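A small sketch of the rescaling, reusing the Cox and Snell form and the null log-likelihood; the inputs are placeholder numbers for illustration.

```python
import numpy as np

def nagelkerke_r2(ll_full, ll_null, n):
    """Nagelkerke pseudo-R^2: Cox-Snell value rescaled by its maximum."""
    r2_cs = 1 - np.exp(-2.0 / n * (ll_full - ll_null))
    max_cs = 1 - np.exp(2.0 / n * ll_null)   # upper bound of R^2_CS
    return r2_cs / max_cs

print(round(nagelkerke_r2(ll_full=-250.0, ll_null=-320.0, n=500), 3))
```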

McFadden’s Pseudo-R² (R²McF)

McFadden's pseudo-R², also known as the likelihood ratio index, was developed by Daniel McFadden in the 1970s as a measure of goodness-of-fit for discrete choice models, particularly the multinomial logit model used in econometrics and transportation research. It quantifies the improvement in model fit by comparing the log-likelihood of the full model to that of a null model, typically one with only intercepts and no predictors. This measure is widely adopted in software for estimating maximum likelihood models, such as Stata, where it is the default pseudo-R² reported for logistic and multinomial logit regressions. The formula for McFadden's pseudo-R² is given by R^2_{\text{McF}} = 1 - \frac{\ln L_{\text{full}}}{\ln L_{\text{null}}}, where \ln L_{\text{full}} is the log-likelihood of the fitted (full) model and \ln L_{\text{null}} is the log-likelihood of the null model. This derives from the proportional reduction in the log-likelihood achieved by including predictors relative to the null model, analogous to the likelihood ratio statistic but normalized to resemble the ordinary R². In this framework, the null model assumes the outcome is unrelated to the predictors, providing a baseline for assessing improvement in fit. Key properties include a range from 0 (no improvement over the null) to values approaching but rarely reaching 1, as perfect prediction is infeasible in probabilistic models like the logit due to inherent randomness. Values between 0.2 and 0.4 are typically interpreted as indicating a very good fit, reflecting substantial explanatory power in discrete choice contexts. The measure is invariant to proportional scaling of the utility function parameters, a feature arising from the normalization in random utility models, but it is sensitive to the specification of the null model—for instance, whether the null includes alternative-specific constants or excludes them entirely.
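A brief sketch, assuming statsmodels and synthetic data: the ratio of log-likelihoods below implements the formula above, and statsmodels' prsquared attribute for a fitted logit reports the same quantity, so the two printed values should agree.

```python
import numpy as np
import statsmodels.api as sm

# Synthetic binary data (assumed, for illustration)
rng = np.random.default_rng(1)
x = rng.normal(size=400)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.3 + 0.9 * x))))

fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
r2_mcf = 1 - fit.llf / fit.llnull                  # McFadden's pseudo-R^2
print(round(r2_mcf, 3), round(fit.prsquared, 3))   # should match
```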

Cohen’s Pseudo-R² (R²L)

Cohen’s pseudo-R², denoted as R²_L, is equivalent to McFadden's pseudo-R² and was discussed by Jacob Cohen in 1983 as a likelihood ratio-based analog to the ordinary least squares R-squared for logistic regression models with binomial outcomes. It quantifies the proportional reduction in the log-likelihood attributable to the predictors. This measure is particularly useful in generalized linear models (GLMs), where the log-likelihood functions as a measure of model adequacy, analogous to the residual sum of squares in linear regression. The formula for Cohen’s pseudo-R² is given by: R^2_L = 1 - \frac{\ln L_{\text{full}}}{\ln L_{\text{null}}}, where \ln L_{\text{full}} is the log-likelihood of the fitted (full) model and \ln L_{\text{null}} is the log-likelihood of the null (intercept-only) model. This measure possesses several key properties: it is bounded between 0 and 1, with values closer to 1 indicating better fit; it does not include a separate penalty for model complexity, as it is unadjusted like the base McFadden measure; and it is rarely distinguished from McFadden's in practice because of their equivalence, though it is sometimes referenced separately in behavioral sciences contexts. Despite its limitations, R²_L provides a useful benchmark for comparing model fit in logistic contexts.

Tjur’s Coefficient of Discrimination (R²T)

Tjur’s coefficient of discrimination, also known as R^2_T, serves as a pseudo-R-squared measure specifically designed for binary logistic regression models to assess the model's discriminatory strength between outcome categories. Introduced by Tue Tjur in 2009, it provides an intuitive evaluation by focusing on the separation in predicted probabilities rather than goodness-of-fit based on likelihoods. The measure is computed using the formula R^2_T = \overline{\hat{y}}_1 - \overline{\hat{y}}_0, where \overline{\hat{y}}_1 denotes the average of the predicted probabilities for cases where the observed outcome is 1 (positive class), and \overline{\hat{y}}_0 is the average for cases where the outcome is 0 (negative class). These predicted probabilities are derived directly from the fitted values of the logistic regression model. This derivation stems from comparing the mean fitted probabilities across the two groups defined by the binary response variable, emphasizing the model's ability to assign higher probabilities to the correct category on average. By avoiding reliance on likelihood functions or deviance statistics, R^2_T isolates the pure discrimination aspect, making it distinct from measures like McFadden's or Nagelkerke's pseudo-R². Notable properties of R^2_T include its bounded range of 0 to 1, with 0 signifying no discriminatory power (equivalent predicted probabilities across groups) and 1 indicating perfect discrimination (predicted probabilities of 1 for the positive class and 0 for the negative). It has been recommended for its straightforward interpretability, as the coefficient value itself quantifies the magnitude of separation in a scale akin to the ordinary R-squared, without the need for arbitrary adjustments or transformations.
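A minimal sketch of the computation, assuming statsmodels and a synthetic dataset: the coefficient is simply the difference between the mean fitted probability among observed events and among observed non-events.

```python
import numpy as np
import statsmodels.api as sm

# Synthetic binary outcome and predictor (illustrative only)
rng = np.random.default_rng(2)
x = rng.normal(size=300)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.2 + 1.0 * x))))

fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
p_hat = fit.predict()                    # fitted probabilities

# Tjur's coefficient: mean fitted probability in each observed group
r2_tjur = p_hat[y == 1].mean() - p_hat[y == 0].mean()
print(round(r2_tjur, 3))
```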

Interpretation and Application

Guidelines for Values

Interpreting pseudo-R² values requires caution, as they measure relative improvement in model fit over a baseline (typically the intercept-only model) rather than the proportion of variance explained, unlike the R-squared in ordinary least squares regression. These values should primarily be used for comparing models within the same dataset and model family, such as logistic regressions, rather than establishing absolute model quality. No universal thresholds exist across all pseudo-R² measures due to their differing scales and properties; instead, they are best evaluated alongside information criteria like AIC and BIC for comprehensive model assessment. Common rules of thumb suggest that pseudo-R² values below 0.2 indicate weak fit, 0.2 to 0.4 suggest moderate fit, and above 0.4 indicate strong fit, though these ranges must be adjusted per measure since some, like McFadden's R²McF, systematically produce lower values than others, such as Nagelkerke's R²N. For instance, McFadden's R²McF values between 0.2 and 0.4 are often regarded as excellent, reflecting substantial improvement over the intercept-only model in discrete choice contexts. In contrast, Nagelkerke's R²N tends to yield higher values closer to those of the OLS R-squared and can reach 1 for perfect prediction, making thresholds like >0.4 more indicative of strong relative fit. Measure-specific guidelines further refine interpretation: Tjur's coefficient of discrimination R²T, for example, assesses the difference in mean predicted probabilities between outcome groups, with higher values signaling stronger separation between groups. Across all measures, pseudo-R² values emphasize proportional gains in explanatory power rather than predictive accuracy alone, underscoring the need for contextual evaluation within similar modeling scenarios.

Limitations and Misuses

Pseudo-R-squared measures in logistic regression and other generalized linear models possess several inherent limitations that distinguish them from the ordinary R-squared in linear regression. Unlike the ordinary R-squared, which directly quantifies the proportion of variance explained by the model, pseudo-R-squared values cannot be interpreted this way due to the non-linear nature of these models and the lack of a direct equivalent to the total sum of squares. They are not additive across predictors, meaning the incremental contribution of additional variables does not accumulate in a straightforward manner as it does in ordinary least squares. Furthermore, these measures are sensitive to sample size and the number of predictors; for instance, larger samples can inflate pseudo-R-squared values through increased model complexity without necessarily improving predictive accuracy. Certain pseudo-R-squared variants have upper bounds less than 1, limiting their interpretability and comparability. Cox and Snell's pseudo-R-squared, for example, cannot reach 1 even for a perfect model, with its maximum value depending on the sample size and marginal distribution of the outcome (e.g., approaching but not exceeding 0.75 for balanced binary outcomes). Nagelkerke's adjustment scales it to a maximum of 1, but this can produce misleadingly high values that overstate model fit. These measures are also sensitive to predictor and data characteristics such as base rates, where low event probabilities disproportionately affect uncorrected indices. Additionally, pseudo-R-squared is not equivalent to a squared correlation coefficient; although some variants (like Efron's) relate to correlations between observed and predicted outcomes, this connection does not hold universally across measures. Common misuses of pseudo-R-squared include treating it as an indicator of explained variance akin to the ordinary R-squared, which can lead to overinterpretation of model performance. Another frequent error is comparing values across different pseudo-R-squared types (e.g., McFadden's versus Tjur's) or model families (e.g., logistic versus probit), as they operate on disparate scales and optimization criteria, rendering such comparisons invalid. All pseudo-R-squared measures can yield high values for overfitted models, particularly when numerous predictors are included without adjustment, masking poor generalizability. Recent critiques emphasize that pseudo-R-squared should be used alongside cross-validation to assess out-of-sample performance, as it tends to overestimate fit in small samples or complex models and is unsuitable for direct evaluation of predictive accuracy.

Model Selection and Reporting

In model selection for logistic regression and related generalized linear models, pseudo-R² measures are often combined with information criteria such as the Akaike information criterion (AIC) to balance goodness-of-fit and model parsimony, as AIC penalizes excessive complexity while pseudo-R² evaluates explanatory power. For instance, researchers may select the model with the lowest AIC among candidates and corroborate this with improvements in pseudo-R², such as McFadden's R², to ensure the chosen model enhances predictive accuracy without overfitting. Reporting multiple pseudo-R² indices, like both Nagelkerke's and McFadden's, promotes transparency by revealing how different measures assess fit on the same dataset, avoiding overreliance on a single metric that may vary substantially. Best practices for reporting emphasize specifying the exact type of pseudo-R² used, to facilitate interpretation and replication; for example, one might state "Nagelkerke's pseudo-R² = 0.35, indicating moderate fit." Where feasible, confidence intervals for pseudo-R² estimates should be included, particularly in simulation-based validation, to quantify uncertainty in model fit. Style guides, including the American Psychological Association (APA) publication manual, 7th edition, recommend contextualizing pseudo-R² values with residual diagnostics, such as deviance residuals, to verify assumptions like linearity in the logit and the absence of influential outliers, ensuring the measure reflects robust model performance rather than isolated fit. Pseudo-R² plays a supportive role in stepwise selection procedures or cross-validation by comparing nested models on the same data—for instance, tracking changes during forward or backward elimination—but it should not be the sole criterion, as it may inflate with added predictors without improving generalizability. Instead, it should be integrated with AIC or out-of-sample validation to mitigate risks such as overfitting, prioritizing overall model utility over isolated pseudo-R² gains.
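The AIC-plus-pseudo-R² workflow can be illustrated with a short sketch, assuming statsmodels and simulated data in which one predictor is informative and one is noise; model names and coefficients are assumptions for demonstration.

```python
import numpy as np
import statsmodels.api as sm

# Simulated data: x1 is informative, x2 is pure noise (assumed)
rng = np.random.default_rng(3)
n = 600
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.4 + 1.1 * x1))))

m1 = sm.Logit(y, sm.add_constant(x1)).fit(disp=0)
m2 = sm.Logit(y, sm.add_constant(np.column_stack([x1, x2]))).fit(disp=0)

# Report AIC (penalizes complexity) alongside McFadden's pseudo-R^2
for name, m in [("x1 only", m1), ("x1 + x2", m2)]:
    print(name, "AIC:", round(m.aic, 1),
          "McFadden R2:", round(m.prsquared, 3))
```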

Comparisons

Relative Strengths and Weaknesses

The pseudo-R-squared measures for logistic and related models each offer distinct advantages and limitations when compared to one another, particularly in terms of their theoretical foundations, computational simplicity, sensitivity to model complexity, and applicability across data structures. Cox and Snell's pseudo-R² (R²_CS) serves as a foundational measure, deriving directly from likelihood ratios in a manner analogous to the OLS R², making it computationally straightforward and interpretable as a proportional reduction in deviance; however, its upper bound is inherently less than 1 (often capped around 0.75 for balanced data), preventing it from reaching the full scale of the traditional R² even for perfect fits, which limits comparability across models. Nagelkerke's pseudo-R² (R²_N) addresses this range limitation by normalizing Cox and Snell's measure to achieve a maximum of 1, enhancing its interpretability and alignment with the OLS R² for reporting purposes, while remaining easy to compute as a simple rescaling; yet it retains dependence on the model's likelihood, which can inflate values in imbalanced datasets and reduce comparability relative to base-rate-independent alternatives like McFadden's. McFadden's pseudo-R² (R²_McF) excels in simplicity for discrete choice models, relying solely on log-likelihood ratios without additional adjustments, and demonstrates relative independence from the base rate of the outcome, making it robust for rare outcomes compared to likelihood-bounded measures; its primary drawback is consistently producing low values (often below 0.2 even for strong models), which can understate fit relative to rescaled options like Nagelkerke's and complicate intuitive communication. Cohen's pseudo-R² (R²_L) incorporates an adjustment for model complexity by penalizing for the number of predictors, akin to the adjusted R² in linear models, which provides a strength in preventing overly optimistic evaluations compared to unadjusted measures like McFadden's; however, this penalty is somewhat ad hoc and can overly deflate values in parsimonious models, reducing its appeal relative to more theoretically grounded alternatives. Tjur's coefficient of discrimination (R²_T), introduced in 2009, offers intuitive interpretability as the difference in average predicted probabilities between outcome groups, achieving a full 0-1 range and ease of computation without reliance on the likelihood, outperforming likelihood-based measures in direct assessment of discrimination for binary outcomes; its limitation lies in applicability restricted to binary cases, lacking straightforward extensions to multinomial or ordinal models, unlike McFadden's broader utility.
| Measure | Key Strength | Key Weakness | Relative to Others |
|---|---|---|---|
| Cox and Snell (R²_CS) | Foundational likelihood-based analogy to OLS R² | Bounded upper limit below 1 | Less scalable than Nagelkerke's normalization; more bounded than Tjur's full range |
| Nagelkerke (R²_N) | Normalized 0-1 range for better comparability | Likelihood-dependent inflation | Improves on Cox and Snell's bound but less base-rate independent than McFadden's |
| McFadden (R²_McF) | Simple, base-rate independent for discrete models | Produces low values | More robust to rare outcomes than Cox and Snell but undervalues fit vs. rescaled measures |
| Cohen (R²_L) | Adjusts for complexity to avoid overfitting | Ad-hoc penalty can deflate values | Penalizes more than unadjusted measures like McFadden's; less theoretically grounded than likelihood-based alternatives |
| Tjur (R²_T) | Intuitive discrimination metric with full range | Limited to binary outcomes | More intuitive than likelihood-based measures but less generalizable than McFadden's |

Choosing an Appropriate Measure

Selecting an appropriate pseudo-R-squared measure for logistic regression and related models requires consideration of the research context, as no single variant is universally superior. The choice hinges on factors such as the model's purpose—whether emphasizing overall fit, predictive discrimination, or comparability to linear-model analogs—and the underlying model family, such as binary versus multinomial outcomes. For instance, in general applications where an intuitive scale from 0 to 1 is desired, Nagelkerke's pseudo-R² (R²N) is often preferred due to its adjustment of the Cox and Snell measure to achieve this full range, facilitating easier interpretation akin to the ordinary R². In econometric analyses, McFadden's pseudo-R² (R²McF) remains a standard choice, as it directly relates to the improvement in log-likelihood over the null model and aligns well with likelihood-based inference common in that field. For binary outcome models with a focus on predictive discrimination, Tjur's coefficient of discrimination (R²T) is increasingly recommended, as it quantifies the difference in average predicted probabilities between event and non-event groups, providing a straightforward measure of separation without relying on likelihood ratios. These selections reflect each measure's alignment with analytic goals: R²N for broad goodness-of-fit assessment, R²McF for theoretical model comparison, and R²T for emphasis on discriminative performance. Software availability also influences the decision, as default outputs vary across packages. In R, the base glm function does not compute pseudo-R² directly, but packages like pscl and DescTools provide multiple measures (e.g., R²McF, R²N, and others) via functions such as pR2, allowing users to select or report several without additional coding. Stata's fitstat command similarly outputs a suite including R²McF and R²N, while Stata's standard logit output reports McFadden's pseudo-R² (R²McF) by default. For multinomial models, R²McF is more readily available and interpretable across categories, whereas R²T is primarily suited to binary cases and may require manual computation in some environments. Sample size and event prevalence can affect measure comparability, as some variants like Cox and Snell have upper bounds dependent on the marginal proportion of events, potentially leading to underestimation in imbalanced datasets; thus, adjustments like R²N or alternatives like R²T are favored in such scenarios to mitigate this limitation. Given these nuances, hybrid reporting of at least two measures (e.g., R²McF alongside R²T) is recommended to provide a balanced view of model performance; a small computational sketch of such reporting follows the table below. In applied contexts during the 2020s, there has been growing preference for Tjur-like metrics emphasizing discrimination, particularly in predictive modeling where separation between classes is prioritized over likelihood-based fit. The following table summarizes key decision criteria for selecting a pseudo-R² measure:
| Criterion | Recommended Measure | Rationale |
|---|---|---|
| General logistic fit (intuitive 0-1 scale) | Nagelkerke's R²N | Adjusted for full range; suitable for broad applications. |
| Econometric/likelihood-based comparison | McFadden's R²McF | Aligns with log-likelihood improvements; standard in econometrics. |
| Binary prediction/discrimination | Tjur's R²T | Measures probability separation; intuitive for classification tasks. |
| Multinomial models | McFadden's R²McF | Handles multiple categories; widely supported in software. |
| Imbalanced data/small samples | Nagelkerke's R²N or Tjur's R²T | Less sensitive to event proportion or sample size effects. |
This framework aids in transparent selection, ensuring the measure supports the study's objectives without over-relying on any one variant.
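As a sketch of the hybrid reporting suggested above, the helper below (a hypothetical function name, assuming statsmodels result objects and synthetic data) returns McFadden's and Tjur's measures together from a single fitted binary logit.

```python
import numpy as np
import statsmodels.api as sm

def report_pseudo_r2(fit, y):
    """Return McFadden's and Tjur's pseudo-R^2 for a fitted binary logit
    (hypothetical helper; statsmodels result object assumed)."""
    p_hat = fit.predict()
    return {
        "McFadden": 1 - fit.llf / fit.llnull,
        "Tjur": p_hat[y == 1].mean() - p_hat[y == 0].mean(),
    }

# Illustrative use with synthetic data
rng = np.random.default_rng(4)
x = rng.normal(size=500)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.0 * x))))
fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
print(report_pseudo_r2(fit, y))
```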
