Coefficient of determination

The coefficient of determination, often denoted R^2, is a statistical measure in regression analysis that quantifies the proportion of the total variance in the dependent variable that can be explained by the independent variable(s) in a model. It ranges from 0 to 1, where a value of 0 indicates that the model explains none of the variability, a value of 1 indicates a perfect fit, and intermediate values represent the proportion of variance accounted for by the model (e.g., an R^2 of 0.75 means 75% of the variance is explained). Introduced by Sewall Wright in his 1921 paper "Correlation and Causation," the concept emerged in the context of path analysis to assess relationships in complex systems, such as agricultural and biological data. In simple linear regression, R^2 is equivalent to the square of the Pearson correlation coefficient (r) between the observed and predicted values, providing a direct link to measures of linear association. For multiple linear regression models of the form Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_p X_{ip} + \epsilon_i, where Y_i is the dependent variable, X_{ij} are predictors, \beta_j are coefficients, and \epsilon_i is the error term, R^2 is calculated as the ratio of the regression sum of squares (SSR) to the total sum of squares (SST): R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}, with SSE denoting the sum of squared residuals (unexplained variance). This decomposition, SST = SSR + SSE, highlights how R^2 partitions total variability into explained and residual components, making it a key goodness-of-fit statistic.

While widely used across fields such as economics, the natural and social sciences, and machine learning to evaluate model performance, R^2 has limitations: it does not imply causation, it can increase when irrelevant predictors are added in multiple regression (a problem that led to the development of the adjusted R^2), and what counts as a "good" value depends on context; for instance, values above 0.8 may be excellent in the physical sciences but modest in behavioral studies. Despite these caveats, R^2 remains a foundational tool for interpreting the explanatory power of models in statistical analysis.

Definitions

Proportion of explained variance

The coefficient of determination, denoted R^2, quantifies the proportion of the total variance in the dependent variable that is explained by the independent variables in a regression model. It is formally defined as R^2 = 1 - \frac{SS_{res}}{SS_{tot}}, where SS_{res} is the residual sum of squares, representing the unexplained variance, and SS_{tot} is the total sum of squares, capturing the overall variability in the data. This measure ranges from 0 to 1, where R^2 = 0 indicates that the model explains none of the variance (equivalent to using the mean of the response as the predictor), and R^2 = 1 signifies a perfect fit with no residual variance.

The total sum of squares, SS_{tot} = \sum (y_i - \bar{y})^2, measures the total variability of the observed values y_i around their mean \bar{y}, serving as a baseline for the dispersion in the dependent variable before any modeling. After fitting the regression model, the residual sum of squares, SS_{res} = \sum (y_i - \hat{y}_i)^2, quantifies the remaining unexplained variability between the observed values and the predicted values \hat{y}_i. Thus, R^2 directly reflects the fraction of SS_{tot} that the model accounts for, highlighting its effectiveness in capturing patterns in the data.

From the perspective of variance reduction in prediction, R^2 arises as the complement of the proportion of variance left unexplained by the model. In predictive terms, the variance of the prediction error is proportional to SS_{res}, while the fitted model reduces the expected error variance from the total level SS_{tot} by the amount attributable to the predictors. This decomposition underscores R^2 as a metric of how much the regression improves predictions over a naive mean-based approach, with higher values indicating a greater reduction in prediction error.

To illustrate, consider a simple dataset with four observations of an independent variable x (e.g., dosage levels: 1, 2, 3, 4) and dependent variable y (e.g., response rates: 2, 4, 5, 4). The mean of y is \bar{y} = 3.75, so SS_{tot} = (2-3.75)^2 + (4-3.75)^2 + (5-3.75)^2 + (4-3.75)^2 = 4.75. Fitting a least-squares line yields predicted values \hat{y} = 2.7, 3.4, 4.1, 4.8, with residuals leading to SS_{res} = (2-2.7)^2 + (4-3.4)^2 + (5-4.1)^2 + (4-4.8)^2 = 2.3. Thus, R^2 = 1 - \frac{2.3}{4.75} \approx 0.516, meaning approximately 51.6% of the variance in y is explained by x.
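The arithmetic above can be reproduced in a few lines of code. The following is a minimal sketch, assuming NumPy as the tooling (not something prescribed by the text), that fits the least-squares line and recovers SS_tot, SS_res, and R^2 for the toy data:

```python
import numpy as np

# Toy data from the worked example above
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 5.0, 4.0])

# Fit a least-squares line y_hat = b0 + b1 * x
b1, b0 = np.polyfit(x, y, deg=1)      # slope, intercept
y_hat = b0 + b1 * x                   # predicted values: 2.7, 3.4, 4.1, 4.8

# Decompose the variability
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares = 4.75
ss_res = np.sum((y - y_hat) ** 2)     # residual sum of squares = 2.3
r_squared = 1 - ss_res / ss_tot       # approximately 0.516

print(f"SS_tot = {ss_tot:.2f}, SS_res = {ss_res:.2f}, R^2 = {r_squared:.3f}")
```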

Relation to unexplained variance

The complement of the coefficient of determination, 1 - R^2, quantifies the proportion of the total variance in the dependent variable that remains unexplained by the model. This value, sometimes called the coefficient of non-determination, directly measures the model's failure to account for variability in the response variable. The unexplained variance is formally computed as the ratio of the residual sum of squares (SSres) to the total sum of squares (SStot): 1 - R^2 = \frac{\text{SS}_{\text{res}}}{\text{SS}_{\text{tot}}} = \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}, where y_i are the observed values, \hat{y}_i are the predicted values from the model, and \bar{y} is the mean of the observed values. This component captures irreducible error (inherent stochastic variability in the data that no model can eliminate) as well as variance attributable to omitted variables, model misspecification, or unmodeled interactions. A high value of 1 - R^2 signals inadequate model performance, as it reflects a large portion of the data's variability left unaccounted for, potentially indicating the need for additional predictors or a different modeling approach. For instance, in an analysis relating observed rates to latitude, an R^2 of approximately 0.68 implies 1 - R^2 \approx 0.32, meaning 32% of the variance in the rates is unexplained by latitude alone, possibly reflecting omitted factors. In contrast, a dataset with a strong linear relationship might yield R^2 = 0.80, so 1 - R^2 = 0.20, where the residuals e_i = y_i - \hat{y}_i sum to squared values representing only 20% of total variability (e.g., SSres = 200 when SStot = 1000), highlighting better but still imperfect model adequacy.

Squared correlation coefficient

In simple linear regression, the coefficient of determination R^2 is equal to the square of the sample Pearson correlation coefficient r between the observed response values y and the predicted values \hat{y}. This relationship holds specifically for the bivariate case with one predictor variable. The mathematical equivalence arises because R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}}, where SSR is the regression sum of squares and SST is the total sum of squares, and this ratio simplifies to the squared correlation. To see this, note that the Pearson correlation is r = \frac{\mathrm{cov}(x, y)}{s_x s_y}, where s_x and s_y are the standard deviations of the predictor x and response y. In simple linear regression, the slope is \hat{\beta}_1 = r \frac{s_y}{s_x}, and substituting into the expression for SSR yields R^2 = \left( \frac{\mathrm{cov}(x, y)}{s_x s_y} \right)^2 = r^2. Equivalently, since the predicted values \hat{y} are a linear transformation of x, r also equals the correlation between y and \hat{y}, confirming R^2 = [\mathrm{cor}(y, \hat{y})]^2. This equivalence is valid under the assumptions of simple linear regression, particularly that the relationship between the predictor and response is linear, and the analysis is limited to two variables without additional predictors.

For example, consider data on college GPA (colgpa) and high school GPA (hsgpa) for n = 141 students. The Pearson correlation r between colgpa and hsgpa is 0.4146. Squaring this gives r^2 = 0.4146^2 = 0.1719. Fitting the regression model yields SSR = 3.335 and SST = 19.406, so R^2 = \frac{3.335}{19.406} = 0.1719, matching the squared correlation.
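The identity can also be verified numerically. The sketch below, using simulated data and NumPy (both assumptions for illustration), compares the squared Pearson correlation with the R^2 of the corresponding least-squares fit:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated bivariate data with a linear trend plus noise
x = rng.normal(size=200)
y = 1.5 + 0.8 * x + rng.normal(scale=1.0, size=200)

# Pearson correlation between x and y
r = np.corrcoef(x, y)[0, 1]

# R^2 from the fitted simple linear regression
b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"r^2 = {r**2:.6f}, R^2 = {r_squared:.6f}")  # identical up to rounding
```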

Interpretation

In simple linear regression

In simple linear regression, the coefficient of determination, denoted R^2, represents the proportion of the total variance in the response variable Y that is explained by the predictor variable X. For instance, an R^2 value of 0.75 indicates that 75% of the variability in Y can be attributed to its linear relationship with X, while the remaining 25% is due to other factors or random error. This measure provides a straightforward way to assess how well the fitted line captures the underlying pattern in the data.

The magnitude of R^2 in this context reflects the degree to which the model's predictions align with the actual observed values along the fitted straight line. Higher values suggest that the data points cluster closely around the line, implying more reliable predictions for new observations within the range of X. Conversely, a low R^2 indicates greater scatter, meaning the linear fit offers limited insight into Y's behavior. In simple linear regression, R^2 is equivalent to the square of the Pearson correlation coefficient between X and Y, reinforcing its role as a measure of linear association strength.

To illustrate intuitively, consider a scatterplot of points representing height (X) and weight (Y) for a group of individuals, with a straight line fitted through them. The total deviation of points from the mean weight (a horizontal line) decomposes into explained deviations (vertical distances from the fitted line to the mean) and residual deviations (vertical distances from the points to the line). An R^2 of 0.80 here would mean 80% of the spread in weights is accounted for by the linear trend with height, visualized by the line passing near most points, while the residuals show the unexplained scatter.

The value of R^2 ranges from 0 to 1, where 0 signifies no linear relationship (the line explains none of the variance, as points are randomly scattered) and 1 indicates a perfect linear fit (all points lie exactly on the line). However, this range applies specifically to linear associations; a strong nonlinear relationship may yield a low R^2 despite a clear pattern, as the metric does not capture curvature or other non-straight forms.
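The caveat about nonlinear association is easy to demonstrate. In the sketch below (simulated data, NumPy assumed), a perfectly deterministic quadratic pattern yields an R^2 of essentially zero for a straight-line fit:

```python
import numpy as np

# A deterministic, strongly nonlinear relationship: y = x^2 on a symmetric grid
x = np.linspace(-3, 3, 101)
y = x ** 2

# Straight-line fit and its R^2
b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x
r_squared = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print(f"R^2 of the straight-line fit: {r_squared:.3f}")  # ~0.0 despite a perfect pattern
```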

In multiple linear regression

In multiple linear regression, the coefficient of determination, denoted R^2, quantifies the collective explanatory power of all predictor variables in accounting for the variability in the response variable. It represents the fraction of the total variance in the response that the model captures through the combined effects of multiple predictors, providing a measure of overall model fit. This value always ranges between 0 and 1, where a higher R^2 indicates that a larger proportion of the response variance is explained by the predictors together, though the interpretation emphasizes the model's performance relative to a baseline intercept-only model that explains none of the variance beyond the mean.

As additional predictors are incorporated into the model, R^2 will not decrease and typically increases, reflecting the added variables' contribution to reducing unexplained variance; however, this rise does not necessarily signify a substantial or meaningful enhancement in understanding, particularly if the new predictors overlap substantially with existing ones. For instance, in a stepwise regression approach where predictors are added sequentially based on statistical criteria, each step can show an incremental increase in R^2, with the marginal contribution of a new predictor interpreted as the change in R^2 attributable to its inclusion, highlighting how the model's explanatory power accumulates but requires caution against overinterpretation.

Multicollinearity, arising when predictors are moderately or highly correlated, can result in a high overall R^2 while complicating the attribution of explanatory effects to individual predictors, as it increases the variance of coefficient estimates and leads to less reliable assessments of their unique roles despite the strong combined fit. This extension from simple linear regression, where R^2 reflects the squared correlation between one predictor and the response, underscores the cumulative nature of explanation in multivariate settings.
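The non-decreasing behavior can be illustrated with a short simulation. The sketch below (NumPy assumed; the data and helper function are illustrative, not from any cited source) refits an ordinary least-squares model as predictors are appended, including one that is correlated with an existing predictor and one that is pure noise:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200

# Response driven by two predictors; a third is pure noise
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(scale=0.5, size=n)   # correlated with x1 (multicollinearity)
x3 = rng.normal(size=n)                          # irrelevant
y = 2.0 + 1.0 * x1 + 0.5 * x2 + rng.normal(size=n)

def r_squared(X, y):
    """In-sample R^2 of an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

for cols, label in [((x1,), "x1"), ((x1, x2), "x1+x2"), ((x1, x2, x3), "x1+x2+x3")]:
    print(f"{label:10s} R^2 = {r_squared(np.column_stack(cols), y):.4f}")
# R^2 is non-decreasing as predictors are added, even for the noise variable x3.
```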

Limitations and inflation effects

One key limitation of the coefficient of determination, R^2, is that adding more predictor variables to a regression model, even variables that are irrelevant or purely noisy, will always increase (or at least not decrease) the value of R^2 when fitted to the sample data. This inflation occurs because the model gains flexibility to fit the specific quirks and noise in the training sample, rather than capturing true underlying patterns, which promotes overfitting and reduces the model's generalizability.

Several caveats further underscore the risks of over-relying on R^2. A high R^2 does not imply causation between predictors and the response variable; it only measures association, and spurious correlations can yield misleadingly strong fits. Similarly, R^2 can appear elevated in misspecified models, such as those omitting key variables or assuming incorrect functional forms, masking structural flaws in the specification. Moreover, R^2 is computed solely from in-sample data and provides no insight into out-of-sample error, potentially overestimating a model's predictive accuracy for new observations.

To illustrate the inflation effect, consider a simulated dataset with 50 observations and an initial simple model using one relevant predictor, yielding an R^2 of around 0.3; upon adding nine irrelevant variables (randomly generated), the R^2 inflates, and with enough noise predictors relative to the sample size it can approach 0.9 or higher, as the model increasingly interpolates the noise rather than the signal, though this apparent fit fails to hold on unseen data. To mitigate these issues, R^2 should be interpreted alongside other diagnostics, such as p-values to assess predictor significance and cross-validation techniques to evaluate out-of-sample performance and detect overfitting.
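The inflation mechanism itself is straightforward to reproduce. The following sketch (simulated data, NumPy assumed) adds growing blocks of pure-noise predictors to a weak one-predictor model with n = 50 and reports the in-sample R^2, which only climbs, most dramatically as the number of predictors approaches the sample size:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50

# One genuinely relevant predictor with modest signal (in-sample R^2 around 0.3)
x_true = rng.normal(size=n)
y = 0.7 * x_true + rng.normal(size=n)

def in_sample_r2(X, y):
    """In-sample R^2 of an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

for k in [0, 9, 30, 45]:
    noise = rng.normal(size=(n, k))                      # irrelevant predictors
    X = np.column_stack([x_true, noise]) if k else x_true
    print(f"{1 + k:2d} predictors: in-sample R^2 = {in_sample_r2(X, y):.3f}")
# R^2 only climbs as noise predictors are added; it says nothing about out-of-sample fit.
```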

Extensions

Adjusted coefficient of determination

The adjusted coefficient of determination, denoted \bar{R}^2, modifies the ordinary coefficient of determination R^2 by incorporating a penalty for the number of predictors in the model, yielding a less biased estimate of the proportion of explained variance. Unlike R^2, which monotonically increases or stays the same when additional predictors are included regardless of their relevance, \bar{R}^2 decreases if the added predictors do not sufficiently improve the model fit, thereby discouraging overfitting.

The formula for the adjusted coefficient of determination is \bar{R}^2 = 1 - (1 - R^2) \frac{n-1}{n - k - 1}, where n is the sample size and k is the number of predictors (excluding the intercept). This adjustment arises from a derivation that accounts for degrees of freedom in variance estimation: the total sum of squares (TSS) is divided by its degrees of freedom n-1 to obtain an unbiased estimate of the total variance, while the residual sum of squares (RSS) is divided by n - k - 1 for an unbiased estimate of the error variance; \bar{R}^2 then equals 1 minus the ratio of the unbiased error variance to the unbiased total variance.

To illustrate, consider a regression with n = 30 observations where the unadjusted R^2 = 0.60. For a model with k = 1 predictor, \bar{R}^2 = 1 - (1 - 0.60) \frac{29}{28} \approx 0.586, indicating a slight downward adjustment. If the same R^2 = 0.60 holds for a model with k = 5 predictors, \bar{R}^2 = 1 - (1 - 0.60) \frac{29}{24} \approx 0.517, demonstrating how the penalty grows with model complexity even without improvement in fit.
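The formula is simple enough to compute directly; the small helper below (illustrative code, not from any particular package) reproduces the two worked values:

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for a model with n observations and k predictors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Worked values from the text: n = 30, unadjusted R^2 = 0.60
print(adjusted_r2(0.60, n=30, k=1))  # approximately 0.586
print(adjusted_r2(0.60, n=30, k=5))  # approximately 0.517
```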

Partial coefficient of determination

The partial coefficient of determination, often denoted R^2_{Y_j | \mathbf{X}_{-j}}, quantifies the marginal contribution of a specific predictor variable X_j to explaining the variance in the response variable Y in a multiple linear regression model, after controlling for the effects of all other predictors \mathbf{X}_{-j}. It is defined as the proportional reduction in the residual sum of squares (SSE) when X_j is added to the model containing the other predictors: R^2_{Y_j | \mathbf{X}_{-j}} = 1 - \frac{\text{SSE}(\mathbf{X}_{-j}, X_j)}{\text{SSE}(\mathbf{X}_{-j})} = \frac{\text{SSR}(X_j | \mathbf{X}_{-j})}{\text{SSE}(\mathbf{X}_{-j})}, where \text{SSR}(X_j | \mathbf{X}_{-j}) is the extra sum of squares due to X_j, \text{SSE}(\mathbf{X}_{-j}) is the error sum of squares for the reduced model excluding X_j, and \text{SSE}(\mathbf{X}_{-j}, X_j) is the error sum of squares for the full model. This measure isolates the unique explanatory power of X_j, ranging from 0 (no additional contribution) to 1 (complete explanation of the remaining variance).

In interpretation, the partial R^2 represents the proportion of the variance in Y that remains unexplained by the other predictors and is subsequently accounted for by adding X_j. Unlike the overall coefficient of determination, which assesses the full model's fit, the partial version highlights the incremental benefit of an individual predictor, making it valuable for identifying redundant variables or cases where predictors overlap in their explanations. For instance, a partial R^2 near 0 indicates that X_j adds little unique information beyond the variables already in the model.

The partial coefficient of determination can also be expressed in terms of correlations. It equals the squared partial correlation pr^2_{Y_j \cdot \mathbf{X}_{-j}} and relates to the squared semi-partial correlation sr^2_{Y_j (\mathbf{X}_{-j})} via R^2_{Y_j | \mathbf{X}_{-j}} = pr^2_{Y_j \cdot \mathbf{X}_{-j}} = \frac{sr^2_{Y_j (\mathbf{X}_{-j})}}{1 - R^2_{Y | \mathbf{X}_{-j}}}, where sr^2_{Y_j (\mathbf{X}_{-j})} = R^2_{Y | \mathbf{X}_{-j}, X_j} - R^2_{Y | \mathbf{X}_{-j}} is the semi-partial squared correlation, measuring the unique contribution to the total variance, while the denominator adjusts for the variance already explained by the reduced model. This formulation underscores how the partial R^2 normalizes the semi-partial contribution to the unexplained variance.

Consider an example from a multiple regression of body fat (Y) predicted by triceps skinfold thickness (X_1) and thigh circumference (X_2). The reduced model with only X_1 yields R^2_{Y | X_1} = 0.71 and SSE = 143.12, while the full model gives SSE = 109.95. The partial R^2 for X_2 given X_1 is then R^2_{Y_2 | X_1} = (143.12 - 109.95)/143.12 = 0.232, indicating that X_2 explains an additional 23.2% of the variance in body fat not accounted for by X_1 alone. This value is modest compared to the overall R^2 \approx 0.78 for the full model, illustrating how predictor overlap can diminish an individual variable's partial contribution despite a strong total fit.
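The worked example reduces to one line of arithmetic on the two error sums of squares; a brief snippet (illustrative only) confirms the value:

```python
# Partial R^2 of X2 given X1, from the error sums of squares quoted in the text
sse_reduced = 143.12   # SSE for the model with X1 only
sse_full = 109.95      # SSE for the model with X1 and X2

partial_r2 = (sse_reduced - sse_full) / sse_reduced
print(f"Partial R^2 of X2 given X1: {partial_r2:.3f}")  # approximately 0.232
```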

Generalizations and decompositions

In models with orthogonal predictors, the coefficient of determination decomposes additively into the sum of the individual R^2 values (or squared partial correlations) contributed by each predictor, reflecting their separate contributions to the explained variance. This follows from the geometry of the least-squares projection, where the projection of the centered response onto the fitted values is the sum of orthogonal projections onto each predictor's space, yielding R^2 = \sum_{j=1}^p R_j^2, with R_j^2 = \frac{\| P_j y \|^2}{\| y - \bar{y} \|^2} for the orthogonal projection P_j onto the j-th predictor.

When predictors are correlated (non-orthogonal cases), such additive decomposition no longer holds directly, but hierarchical partitioning addresses this by evaluating all possible subsets of predictors and allocating variance based on the average independent contribution of each across models, thus providing a measure of relative importance while accounting for correlation among predictors. Alternatively, the Shapley value method from cooperative game theory decomposes R^2 by computing the average marginal contribution of each predictor (or group) over all possible combinations, ensuring an equitable partition that sums to the total R^2 and handles shared variance. For instance, in a multiple regression with environmental and socioeconomic predictors, this approach might attribute 0.15 of an overall R^2 = 0.45 to climate variables and 0.20 to income factors, after averaging marginal gains across coalitions.

A broader geometric generalization interprets R^2 within the vector space of centered observations, where it equals the squared cosine of the angle \theta^* between the observed response y - \bar{y} and the fitted values \hat{y} - \bar{y}, i.e., R^2 = \cos^2(\theta^*) = \left( \frac{(\hat{y} - \bar{y})^\top (y - \bar{y})}{\| y - \bar{y} \| \cdot \| \hat{y} - \bar{y} \|} \right)^2, emphasizing the directional alignment between actual and predicted data.

Extensions to nonlinear models introduce pseudo-R^2 forms to approximate goodness of fit. McFadden's pseudo-R^2, commonly used for models such as logistic regression, is given by \rho^2 = 1 - \frac{\ln L(M)}{\ln L(M_0)}, where L(M) is the likelihood of the full model and L(M_0) that of the intercept-only null model; values near 0.2–0.4 often indicate reasonable fit, though McFadden's measure typically takes smaller values than the linear R^2 for comparably good models.
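The geometric identity can be checked numerically for an ordinary least-squares fit with an intercept, where the mean of the fitted values equals the mean of the response. The sketch below (simulated data, NumPy assumed) compares the conventional R^2 with the squared cosine of the angle between the centered observed and fitted vectors:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, -0.5, 0.25]) + rng.normal(size=n)

# Ordinary least-squares fit with intercept
Xd = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
y_hat = Xd @ beta

# Conventional R^2
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

# Squared cosine of the angle between the centered observed and fitted vectors
u, v = y - y.mean(), y_hat - y_hat.mean()
cos2 = (u @ v) ** 2 / ((u @ u) * (v @ v))

print(f"R^2 = {r2:.6f}, cos^2(theta*) = {cos2:.6f}")  # equal for an OLS fit with intercept
```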

Application in logistic regression

In logistic regression, which models binary outcomes, the coefficient of determination cannot be applied directly as in linear regression because of the nonlinear link function and the absence of a straightforward variance decomposition. Instead, pseudo-R^2 measures are used to assess model fit by quantifying the improvement in the likelihood over a null model. These measures are derived from the likelihood function and provide a way to evaluate how well the predictors explain the observed data relative to an intercept-only model.

One common pseudo-R^2 variant is the Cox and Snell measure, defined as R^2_{CS} = 1 - \left( \frac{L_0}{L_1} \right)^{2/n}, where L_0 is the likelihood of the null (intercept-only) model, L_1 is the likelihood of the fitted model, and n is the sample size. This measure, proposed by Cox and Snell, ranges between 0 and a maximum strictly less than 1, reflecting the proportional improvement in likelihood but bounded by the likelihood of the null model. To address the limitation that Cox and Snell's R^2 cannot reach 1 even for a perfect model, Nagelkerke introduced a scaled version: R^2_N = \frac{R^2_{CS}}{1 - L_0^{2/n}}. This adjustment normalizes the measure so its maximum value is 1, making it more intuitive for comparing fit across models while still being based on likelihood ratios. Nagelkerke's formulation is widely adopted in statistical software for binary logistic regression.

Interpreting these pseudo-R^2 values presents challenges distinct from linear regression. Unlike the ordinary R^2, which represents the proportion of total variance explained by the model, pseudo-R^2 measures indicate the relative improvement in predictive likelihood rather than variance reduction. For instance, a value of 0.10 does not mean 10% of the "variance" is explained but rather reflects a relative improvement in likelihood of roughly that magnitude, adjusted for sample size; values are typically lower than in linear models for similar data. These measures are most useful for comparing nested models rather than assessing absolute explanatory power.

Consider a logistic regression example predicting binary income (1 for above-median, 0 otherwise) from years of education, with a sample of n = 500. The null model's log-likelihood is -346.574, while the fitted model's is -322.489. The resulting Cox and Snell R^2 is 0.092, and Nagelkerke's R^2 is 0.122. In a corresponding linear regression on the same data, the ordinary R^2 is 0.11, showing rough concordance in scale but highlighting that pseudo-R^2 values remain modest and emphasize likelihood gains over variance fit.

Despite their utility, pseudo-R^2 measures in logistic regression have limitations: they are not directly comparable to the linear R^2 due to differing underlying assumptions about error distributions and cannot be interpreted as proportions of variance explained in the outcome. Instead, they serve primarily for relative model comparison on the same data, such as evaluating whether adding predictors meaningfully improves fit beyond the null model. Over-reliance on a single pseudo-R^2 can mislead, so complementary diagnostics such as AIC or the Hosmer-Lemeshow test are recommended.
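Both pseudo-R^2 values in the example follow directly from the two log-likelihoods; the computation below (plain Python, working on the log scale to avoid underflow) reproduces them:

```python
import math

# Log-likelihoods quoted in the text's example (n = 500 binary outcomes)
n = 500
ll_null = -346.574   # intercept-only model
ll_full = -322.489   # model with years of education

# Cox and Snell pseudo-R^2: 1 - (L0 / L1)^(2/n), computed on the log scale
r2_cs = 1 - math.exp((2 / n) * (ll_null - ll_full))

# Nagelkerke rescales so the maximum attainable value is 1
r2_max = 1 - math.exp((2 / n) * ll_null)
r2_nagelkerke = r2_cs / r2_max

print(f"Cox-Snell R^2 = {r2_cs:.3f}, Nagelkerke R^2 = {r2_nagelkerke:.3f}")  # 0.092, 0.122
```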

Comparisons

With other goodness-of-fit measures

The coefficient of determination, R^2, quantifies the proportion of variance in the response variable explained by the model in linear regression, but it does not penalize model complexity and tends to increase with additional predictors, potentially leading to overfitting. In contrast, information criteria such as the Akaike information criterion (AIC) and Bayesian information criterion (BIC) provide alternative goodness-of-fit measures that balance explanatory power against model complexity, making them suitable for model selection. These criteria are particularly useful when comparing models for predictive accuracy rather than the in-sample fit that R^2 emphasizes.

The AIC is defined as \text{AIC} = -2 \log L + 2k, where L is the maximized likelihood of the model and k is the number of estimated parameters, imposing a fixed penalty of 2 per parameter when gauging relative predictive performance across candidate models. Unlike the adjusted R^2, which penalizes complexity through a degrees-of-freedom correction to the unexplained variance, AIC derives from information theory and asymptotically approximates the expected Kullback-Leibler divergence, favoring models with lower values for out-of-sample prediction. The BIC, formulated as \text{BIC} = -2 \log L + k \log n with n as the sample size, applies a stronger penalty that grows with n, making it more conservative in selecting parsimonious models, especially in large datasets. This logarithmic penalty in BIC contrasts with AIC's constant one, leading BIC to favor simpler models more aggressively than AIC or the adjusted R^2.

In generalized linear models (GLMs), the deviance serves as a goodness-of-fit measure analogous to the residual sum of squares in linear regression, defined as D = -2 (\log L_m - \log L_s), where L_m is the likelihood of the fitted model and L_s is the likelihood of the saturated model. Lower deviance indicates better fit, and reductions in deviance can be used to test model improvements, much like changes in 1 - R^2. For instance, in logistic regression, a common GLM application, deviance assesses fit similarly to pseudo-R^2 measures, though it focuses on likelihood rather than variance explained.

R^2 is preferred for interpreting explanatory power within the training data, particularly in simple linear contexts, while AIC and BIC are favored for model selection aimed at prediction, as they incorporate penalties to avoid overfitting. Deviance is ideal for GLMs where likelihood-based inference is central, offering a direct parallel to R^2's role in ordinary least squares. As an illustration with n = 100 observations, a parsimonious model might yield R^2 = 0.70, while adding two extraneous predictors nudges R^2 up only slightly (say, to 0.705) yet increases AIC and BIC, because the complexity penalties outweigh the small gain in likelihood; the criteria therefore select the simpler model despite the modest improvement in fit.
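For a linear model with Gaussian errors, AIC and BIC can be computed from the residual sum of squares. The sketch below uses hypothetical SSE values chosen so that adding two predictors improves the fit only marginally (roughly the R^2 = 0.70 versus 0.705 scenario sketched above), in which case both criteria prefer the simpler model:

```python
import math

def gaussian_aic_bic(sse: float, n: int, k: int):
    """AIC and BIC for a linear model with Gaussian errors.

    k counts all estimated parameters (coefficients including the intercept,
    plus the error variance). Constants are included so values are comparable
    across models fitted to the same data.
    """
    log_lik = -0.5 * n * (math.log(2 * math.pi * sse / n) + 1)
    aic = -2 * log_lik + 2 * k
    bic = -2 * log_lik + k * math.log(n)
    return aic, bic

# Hypothetical comparison: adding two noise predictors barely reduces SSE
n = 100
for label, sse, k in [("1 predictor ", 30.0, 3), ("3 predictors", 29.5, 5)]:
    aic, bic = gaussian_aic_bic(sse, n, k)
    print(f"{label}: AIC = {aic:.1f}, BIC = {bic:.1f}")
# Both AIC and BIC come out lower for the simpler model: the penalties
# outweigh the small likelihood gain from the extra predictors.
```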

Relation to residual statistics

The coefficient of determination, R^2, relates directly to residual statistics: writing MSE for the mean squared residual and MSTot for the mean total variation, R^2 = 1 - \frac{\text{MSE}}{\text{MSTot}} whenever the two mean squares use the same divisor (for example, SS_res/n and SS_tot/n). When the degrees-of-freedom-corrected mean squares MSE = SS_res/(n - k - 1) and MSTot = SS_tot/(n - 1) are used instead, the same expression yields the adjusted R^2. Either way, the connection highlights how R^2 measures error reduction: a higher R^2 corresponds to a smaller residual mean square relative to the total variability, implying that the model's predictions deviate less from the actual outcomes.

The standard error of the estimate, defined as s = \sqrt{\text{MSE}}, serves as the typical prediction error and is inversely related to R^2; as R^2 approaches 1, s decreases, reflecting tighter fits around the regression line. In simple linear regression, s \approx \sqrt{1 - R^2} \times \text{SD}(y), where SD(y) is the standard deviation of the observed values, underscoring the link between explained variance and residual dispersion.

In the context of hypothesis testing, R^2 integrates with the ANOVA F-statistic to assess overall significance. The F-statistic is computed as F = \frac{\text{MSR}}{\text{MSE}}, where MSR (the mean square for regression) derives from the explained sum of squares, and this ratio can be expressed in terms of R^2 as F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}, with k as the number of predictors and n as the sample size; a significant F-statistic supports the conclusion that R^2 exceeds what would occur by chance.

To illustrate, consider a simple linear regression with n = 10 observations where the total sum of squares is SS_tot = 100 and the residual sum of squares is SS_res = 40. Then R^2 = 1 - 40/100 = 0.60, a 60% reduction in squared error relative to a model that predicts only the mean. The corresponding mean squares are MSE = 40/(10-2) = 5 and MSTot = 100/(10-1) \approx 11.11, and the degrees-of-freedom-adjusted ratio 1 - 5/11.11 \approx 0.55 is the adjusted R^2 for the same fit. This calculation from residuals directly yields R^2 and its adjusted counterpart, emphasizing their role in evaluating predictive accuracy.
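All of the quantities in this example follow from SS_res and SS_tot; a short snippet (illustrative only) ties them together:

```python
# Residual-based summary statistics for the n = 10 example in the text
n, k = 10, 1            # observations and predictors (simple linear regression)
ss_tot, ss_res = 100.0, 40.0

r2 = 1 - ss_res / ss_tot                                  # 0.60
adj_r2 = 1 - (ss_res / (n - k - 1)) / (ss_tot / (n - 1))  # approximately 0.55
mse = ss_res / (n - k - 1)                                # 5.0
s = mse ** 0.5                                            # standard error of the estimate
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))              # ANOVA F-statistic = 12.0

print(r2, round(adj_r2, 3), round(s, 3), f_stat)
```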

Historical development

Origins in early statistics

The concept of the coefficient of determination emerged in the early 20th century, building on earlier statistical developments. The idea of partitioning total variance into explained and unexplained components was advanced through the analysis of variance (ANOVA) developed by Ronald A. Fisher during the 1910s and 1920s. Fisher's work on ANOVA, beginning with his 1918 paper "The Correlation Between Relatives on the Supposition of Mendelian Inheritance" and expanding through subsequent publications, partitioned total variance into components attributable to different sources, providing a framework for quantifying the proportion of explained variation in experimental data. This variance decomposition laid important groundwork for measures like the coefficient of determination, which expresses the ratio of explained variance to total variance as a summary of model fit.

The specific term "coefficient of determination," denoted R^2, was introduced by Sewall Wright in his 1921 paper "Correlation and Causation," in the context of path analysis to assess relationships in complex systems such as biological and agricultural data. The formulation built directly on the method of least squares pioneered by Adrien-Marie Legendre in 1805 and Carl Friedrich Gauss in 1809, who minimized the sum of squared residuals to estimate parameters in linear models. Their approach quantified the discrepancy between observed and predicted values but did not explicitly frame it as a proportion of explained variation; instead, it emphasized optimal fitting for astronomical and geodetic data. Fisher's innovation extended this by integrating least squares into the modern regression model, transforming the residual-based metric into a standardized measure of explained variation.

Preceding Fisher's contributions, Francis Galton introduced the idea of regression in the 1880s through studies on hereditary traits, such as stature, where he observed that the offspring of exceptional parents tended to regress toward the population mean. Galton's work established the linear relationship between variables but lacked an explicit coefficient of determination, focusing instead on the slope of regression lines without quantifying the proportion of variance explained.

In simple linear regression, the coefficient of determination is equivalent to the square of the Pearson correlation coefficient (r), an interpretation that Fisher elaborated in his seminal 1925 book Statistical Methods for Research Workers. There, Fisher provided rigorous statistical grounding through tables and tests for significance, linking bivariate correlation with the analysis of variance and making the metric accessible for biological and agricultural research. An early equivalent, the squared correlation coefficient, had been noted in prior work but gained practical application through Fisher's contributions.

Evolution and modern usage

In the decades after its introduction, the coefficient of determination saw significant refinements to address limitations in multiple regression settings. Although Mordecai Ezekiel introduced the adjusted R^2 in 1930 as a penalty for additional predictors to mitigate its inflation, widespread adoption came later, amid growing computational capabilities and the rise of multivariate statistical analysis. The adjustment had become a standard tool in econometric and applied statistical research by the 1970s, as evidenced in influential textbooks that emphasized its role in model comparison. Concurrently, the partial R^2 emerged as a key extension in multiple regression, quantifying the unique contribution of individual predictors while controlling for others; its formalization and application gained traction through standard works on linear models, facilitating hierarchical testing of predictors in applied research.

The standardization of R^2 and its variants accelerated with the proliferation of statistical software in the late 20th century. SAS, first released in 1976, incorporated R^2 and adjusted R^2 as default outputs in procedures like PROC REG, enabling routine computation in large-scale data analysis while including caveats in its documentation about interpreting the statistic as explanatory power rather than predictive accuracy. Similarly, the R programming language, developed in the early 1990s and first publicly announced in 1993, with source code released under the GPL in 1995 and version 1.0.0 in 2000, reports these metrics in the summary output of its base lm() function, promoting open-source accessibility while warning against overreliance on in-sample R^2 for causal inference. These implementations democratized the use of R^2 across disciplines but also highlighted risks of misuse, such as data dredging to inflate values without theoretical justification.

In the 1980s, econometric debates intensified around R^2's vulnerabilities to specification bias and overfitting, prompting a shift toward more robust validation methods. Edward Leamer's 1983 critique underscored how extreme-bounds sensitivity analyses could reveal fragility in models with high R^2, influencing the field to prioritize out-of-sample testing for generalizability. Today, R^2 remains integral to machine learning workflows for assessing regression models, including non-linear ones, as implemented in libraries such as scikit-learn's r2_score function, which can evaluate fit on held-out data. In the era of large, high-dimensional datasets, however, critiques emphasize its limitations, since a high in-sample value may mask poor generalization; practitioners now routinely pair it with cross-validation to avoid overoptimism.
