
Explained sum of squares

The explained sum of squares (ESS), also known as the regression sum of squares (SSR) or sum of squares due to regression, is a fundamental statistical quantity in regression analysis and analysis of variance (ANOVA) that measures the portion of the total variation in the response variable accounted for by the fitted model or by differences between group means. It arises from the partitioning of the total sum of squares (TSS), which captures all variability in the data around the mean, into two components: the ESS representing explained variation and the residual sum of squares (RSS) or error sum of squares (SSE) representing unexplained variation, such that TSS = ESS + RSS. In the context of simple linear regression, the ESS is calculated as the sum of the squared deviations of the predicted values from the overall mean of the response variable, formally ESS = \sum (\hat{y}_i - \bar{y})^2, where \hat{y}_i are the fitted values and \bar{y} is the sample mean. This decomposition highlights how much of the observed variability in the dependent variable Y is attributable to its linear relationship with the predictor X, as opposed to random error. In multiple linear regression, the ESS extends to account for multiple predictors, quantifying the collective explanatory power of the model.

In one-way ANOVA, the ESS corresponds to the between-groups sum of squares, which measures variation due to differences among treatment or group means, computed as the sum of squared deviations of group means from the grand mean, weighted by group sizes: \text{SS}_{\text{between}} = \sum n_j (\bar{y}_j - \bar{y})^2, where n_j is the sample size of group j and \bar{y}_j its mean. This between-groups component isolates the effect of the factor under study from the within-group error variation (SSE), enabling tests of significance via the F-statistic. For more complex designs, such as factorial ANOVA, sequential or partial sums of squares can refine the ESS to assess individual factor contributions while controlling for others.

The ESS plays a central role in model evaluation, particularly through the coefficient of determination R², defined as R^2 = \text{ESS} / \text{TSS}, which indicates the proportion of total variance explained by the model and ranges from 0 to 1. A higher ESS relative to TSS signifies better model fit, though it must be interpreted alongside considerations such as sample size and potential overfitting in multiple regression. This metric underpins inference across applied fields, where quantifying explained variation is essential for hypothesis testing and prediction.
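
As a concrete illustration of the between-groups decomposition described above, the following minimal Python sketch (using small hypothetical groups) verifies that the between-groups and within-groups sums of squares add up to the total sum of squares.

```python
# Minimal sketch (hypothetical group data): one-way ANOVA decomposition of the
# total sum of squares into between-group (explained) and within-group (error) parts.
import numpy as np

groups = [np.array([4.0, 5.0, 6.0]),   # hypothetical group 1
          np.array([7.0, 8.0, 9.0]),   # hypothetical group 2
          np.array([2.0, 3.0, 4.0])]   # hypothetical group 3

all_y = np.concatenate(groups)
grand_mean = all_y.mean()

# Between-groups sum of squares: sum of n_j * (group mean - grand mean)^2
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-groups (error) sum of squares
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
# Total sum of squares
ss_total = ((all_y - grand_mean) ** 2).sum()

assert np.isclose(ss_total, ss_between + ss_within)
print(ss_between, ss_within, ss_total)
```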

Fundamentals

Definition

The explained sum of squares, often denoted SSR or ESS, is a key component in regression analysis and the analysis of variance (ANOVA), representing the sum of the squared deviations between the predicted values from a fitted model and the overall mean of the observed dependent variable. Mathematically, it is expressed as \text{SSR} = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2, where \hat{y}_i denotes the predicted value for the i-th observation, \bar{y} is the sample mean of the observed values y_i, and the sum is taken over all n observations. This measure captures the portion of the total variation in the dependent variable that the model attributes to the influence of the independent variables. In essence, SSR quantifies the improvement in predictive accuracy gained by the model over simply using the mean of the dependent variable as the predictor, thereby indicating how much variability the independent variables explain. The concept forms part of the fundamental partitioning of the total sum of squares (SST) into explained (SSR) and residual (SSE) components, where SST = SSR + SSE. The term and its underlying framework were formalized in the early 20th century by Ronald A. Fisher within the development of ANOVA and regression techniques, particularly in his 1925 book Statistical Methods for Research Workers, as a means to decompose variance in experimental data.
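
A minimal sketch of the definition, using hypothetical data and NumPy's polyfit for the least squares fit, computes SSR directly from the fitted values and the response mean.

```python
# Minimal sketch (hypothetical data): compute SSR = sum((y_hat - y_bar)^2)
# from a fitted simple linear regression.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # hypothetical predictor
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])    # hypothetical response

slope, intercept = np.polyfit(x, y, 1)      # ordinary least squares fit
y_hat = intercept + slope * x               # fitted values
y_bar = y.mean()                            # sample mean of the response

ssr = ((y_hat - y_bar) ** 2).sum()          # explained sum of squares
print(ssr)
```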

Relation to Variance Partitioning

In regression analysis, the total sum of squares (SST), which quantifies the total variation in the response variable around its mean, is partitioned into the explained sum of squares (SSR) and the sum of squares due to error (SSE). Specifically, SST is given by \sum_{i=1}^n (y_i - \bar{y})^2, measuring the overall variability in the observed data; SSR, \sum_{i=1}^n (\hat{y}_i - \bar{y})^2, captures the variation explained by the model; and SSE, \sum_{i=1}^n (y_i - \hat{y}_i)^2, represents the unexplained residual variation. This decomposition satisfies the identity SST = SSR + SSE, providing a fundamental way to assess how much of the total variance is accounted for by the regression model. This partitioning bears a close conceptual relationship to the analysis of variance (ANOVA), where SSR functions similarly to the "between-group" sum of squares, attributing variation to the explanatory factors in the model, while SSE corresponds to the "within-group" or error sum of squares. In both frameworks, the total variability is decomposed into systematic (model-related) and random (unexplained) components to facilitate inference about the model's explanatory power. The degrees of freedom associated with SSR reflect the model's complexity: in simple linear regression with one predictor, df_{SSR} = 1, corresponding to the single slope parameter; in multiple regression with k predictors, df_{SSR} = k, accounting for the parameters beyond the intercept. Dividing SSR by its degrees of freedom yields the mean square for regression (MSR), enabling F-tests for overall model significance.
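
The partitioning and the associated mean square can be checked numerically; the sketch below, again on hypothetical data, verifies SST = SSR + SSE and forms MSR = SSR / df_{SSR} for the one-predictor case.

```python
# Minimal sketch (hypothetical data): verify SST = SSR + SSE and form MSR.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # hypothetical predictor
y = np.array([1.2, 2.9, 3.1, 4.8, 5.1, 6.9])   # hypothetical response

slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x
y_bar = y.mean()

sst = ((y - y_bar) ** 2).sum()       # total sum of squares
ssr = ((y_hat - y_bar) ** 2).sum()   # explained sum of squares
sse = ((y - y_hat) ** 2).sum()       # residual (error) sum of squares

assert np.isclose(sst, ssr + sse)    # variance partitioning identity

df_ssr = 1                           # one predictor in simple regression
msr = ssr / df_ssr                   # mean square for regression
print(sst, ssr, sse, msr)
```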

Simple Linear Regression

Model Setup and Partitioning

In simple linear regression, the relationship between a response variable y and a single predictor variable x is modeled as y_i = \beta_0 + \beta_1 x_i + \epsilon_i for i = 1, \dots, n, where \beta_0 represents the intercept, \beta_1 the slope, and \epsilon_i the random error term with E(\epsilon_i) = 0 and \text{Var}(\epsilon_i) = \sigma^2. This setup assumes linearity in the mean response, independence of the errors, normality of the errors (for exact inference), and homoscedasticity (constant variance across all levels of x). These assumptions ensure that the ordinary least squares estimates of \beta_0 and \beta_1 are unbiased and provide a basis for inference about the linear relationship. The total variability in the observed responses, quantified by the total sum of squares (SST) as \sum_{i=1}^n (y_i - \bar{y})^2, where \bar{y} is the sample mean of y, is partitioned under this model into two components: the explained sum of squares (SSR), which captures the variation attributable to the predictor x, and the sum of squares due to error (SSE), which reflects unexplained variation. This partitioning is expressed as SST = SSR + SSE, providing a decomposition of the total variation into the part explained by the fitted line and the part remaining after accounting for the linear relationship. In the simple linear regression context, SSR can be written explicitly as \hat{\beta}_1^2 \sum_{i=1}^n (x_i - \bar{x})^2, where \hat{\beta}_1 is the slope estimate and \bar{x} is the sample mean of x; this formula arises from the projection of the centered response onto the span of the centered predictor. Geometrically, SSR corresponds to the squared norm of the projection of the centered response \mathbf{y} - \bar{y} \mathbf{1} (where \mathbf{1} is the vector of ones) onto the column space spanned by the constant vector \mathbf{1} and the predictor vector \mathbf{x}, illustrating how the model captures the component of variation aligned with the subspace defined by the regressors.
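
The closed-form expression SSR = \hat{\beta}_1^2 \sum (x_i - \bar{x})^2 can be verified against the direct definition; the following sketch does so on hypothetical data.

```python
# Minimal sketch (hypothetical data): SSR computed from fitted values equals
# the closed form beta1_hat^2 * sum((x - x_bar)^2).
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # hypothetical predictor
y = np.array([3.1, 4.8, 7.2, 8.9, 11.1])   # hypothetical response

x_bar, y_bar = x.mean(), y.mean()
beta1_hat = ((x - x_bar) * (y - y_bar)).sum() / ((x - x_bar) ** 2).sum()
beta0_hat = y_bar - beta1_hat * x_bar
y_hat = beta0_hat + beta1_hat * x

ssr_direct = ((y_hat - y_bar) ** 2).sum()                 # definition
ssr_formula = beta1_hat ** 2 * ((x - x_bar) ** 2).sum()   # closed form
assert np.isclose(ssr_direct, ssr_formula)
```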

Derivation of Explained Sum of Squares

In simple linear regression, the ordinary least squares (OLS) method seeks to minimize the sum of squared errors (SSE), defined as \text{SSE} = \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2, with respect to the parameters \beta_0 and \beta_1, yielding the estimates \hat{\beta}_0 and \hat{\beta}_1. The fitted values are then \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i. From the normal equations of OLS, \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}, where \bar{y} and \bar{x} are the sample means of y and x, respectively. Substituting this into the fitted equation gives \hat{y}_i = \bar{y} + \hat{\beta}_1 (x_i - \bar{x}). Thus, the deviation of the fitted value from the mean is \hat{y}_i - \bar{y} = \hat{\beta}_1 (x_i - \bar{x}). The explained sum of squares (SSR), which quantifies the variation in y accounted for by the regression model, is \text{SSR} = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2. Substituting the expression for \hat{y}_i - \bar{y} yields \text{SSR} = \sum_{i=1}^n [\hat{\beta}_1 (x_i - \bar{x})]^2 = \hat{\beta}_1^2 \sum_{i=1}^n (x_i - \bar{x})^2. The OLS estimate for the slope is \hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}, so SSR can also be expressed as \text{SSR} = \frac{[\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})]^2}{\sum_{i=1}^n (x_i - \bar{x})^2}. To relate SSR to the total sum of squares (SST = \sum_{i=1}^n (y_i - \bar{y})^2) and SSE, consider the identity y_i - \bar{y} = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i), where e_i = y_i - \hat{y}_i is the residual. Squaring and summing gives \text{SST} = \sum_{i=1}^n [(\hat{y}_i - \bar{y}) + e_i]^2 = \text{SSR} + \text{SSE} + 2 \sum_{i=1}^n e_i (\hat{y}_i - \bar{y}). The cross-product term vanishes because the residuals are orthogonal to the fitted deviations: \sum_{i=1}^n e_i (\hat{y}_i - \bar{y}) = 0. This orthogonality follows from the OLS normal equations, which ensure that the residuals are perpendicular to both the constant term (implying \sum e_i = 0) and the predictor deviations (implying \sum e_i (x_i - \bar{x}) = 0). Since \hat{y}_i - \bar{y} is a scalar multiple of x_i - \bar{x}, the residuals are orthogonal to it as well. In vector terms, the explained and residual vectors are orthogonal in Euclidean space, so SST = SSR + SSE holds by the Pythagorean theorem.
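
The orthogonality argument at the heart of this derivation can be checked numerically: on any dataset, the OLS residuals sum to zero and are orthogonal to both the centered predictor and the fitted deviations. A sketch with hypothetical data:

```python
# Minimal sketch (hypothetical data): the cross-product term in the
# SST = SSR + SSE expansion vanishes because residuals are orthogonal
# to the fitted deviations.
import numpy as np

x = np.array([1.0, 3.0, 5.0, 7.0, 9.0])   # hypothetical predictor
y = np.array([2.2, 3.1, 6.8, 7.9, 9.5])   # hypothetical response

x_bar, y_bar = x.mean(), y.mean()
beta1_hat = ((x - x_bar) * (y - y_bar)).sum() / ((x - x_bar) ** 2).sum()
beta0_hat = y_bar - beta1_hat * x_bar
y_hat = beta0_hat + beta1_hat * x
resid = y - y_hat

print(resid.sum())                         # ~0: residuals sum to zero
print((resid * (x - x_bar)).sum())         # ~0: orthogonal to predictor deviations
print((resid * (y_hat - y_bar)).sum())     # ~0: cross-product term vanishes
```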

General Linear Models

Extension to Multiple Regression

In multiple linear regression, the explained sum of squares (SSR) extends the concept from simple linear regression to models incorporating several predictor variables, allowing for the quantification of variation in the response variable attributed to the collective influence of multiple predictors. The general ordinary least squares (OLS) model is formulated as \mathbf{y} = X \boldsymbol{\beta} + \boldsymbol{\varepsilon}, where \mathbf{y} is an n \times 1 vector of observed responses, X is an n \times (k+1) design matrix comprising a column of ones for the intercept and k columns of predictor variables, \boldsymbol{\beta} is a (k+1) \times 1 vector of unknown regression coefficients, and \boldsymbol{\varepsilon} is an n \times 1 vector of independent errors assumed to have mean zero and constant variance. This matrix representation generalizes the simple regression case, where only one predictor is used, by accommodating the joint effects of multiple independent variables on the response. The fitted values are given by \hat{\mathbf{y}} = X \hat{\boldsymbol{\beta}}, where the OLS estimator \hat{\boldsymbol{\beta}} = (X^T X)^{-1} X^T \mathbf{y} minimizes the residual sum of squares. The projection matrix P = X (X^T X)^{-1} X^T, often called the hat matrix, projects \mathbf{y} onto the column space of X, yielding \hat{\mathbf{y}} = P \mathbf{y}. In this framework, the explained sum of squares measures the variation explained by the model relative to the mean and is expressed in matrix form as SSR = \hat{\mathbf{y}}^T \hat{\mathbf{y}} - n \bar{y}^2, where \bar{y} is the sample mean of \mathbf{y}. Equivalently, centering the data around the mean gives SSR = \hat{\boldsymbol{\beta}}^T X^T (\mathbf{y} - \bar{y} \mathbf{1}), with \mathbf{1} denoting the n \times 1 vector of ones, or SSR = \| P \mathbf{y} - \bar{y} \mathbf{1} \|^2, highlighting the deviation of the fitted values from the grand mean. The fundamental partitioning of the total sum of squares (SST) into explained and unexplained components persists in multiple regression: SST = SSR + SSE, where SSE is the error sum of squares SSE = \| \mathbf{y} - \hat{\mathbf{y}} \|^2 = \mathbf{y}^T (I - P) \mathbf{y} and I is the n \times n identity matrix. This decomposition quantifies how much of the total variation in \mathbf{y}, after adjusting for the mean (SST = \sum (y_i - \bar{y})^2), is captured by the k predictors (SSR) versus remaining as residual variation (SSE). The degrees of freedom associated with SSR equal k, reflecting the number of predictors excluding the intercept, while SST has n - 1 degrees of freedom and SSE has n - k - 1. This extension maintains the interpretability of SSR as a measure of model fit while accounting for the multidimensional predictor space.
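
A short NumPy sketch with simulated (hypothetical) data illustrates the matrix-form quantities above; the explicit inverse and hat matrix are formed here only for illustration, since more stable methods are discussed in the next subsection.

```python
# Minimal sketch (simulated data): matrix-form SSR in multiple regression.
import numpy as np

rng = np.random.default_rng(0)
n, k = 20, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # design matrix with intercept
beta_true = np.array([1.0, 2.0, -0.5])                      # hypothetical coefficients
y = X @ beta_true + rng.normal(scale=0.3, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimates via the normal equations
P = X @ np.linalg.inv(X.T @ X) @ X.T           # hat (projection) matrix, illustration only
y_hat = P @ y
y_bar = y.mean()

ssr = y_hat @ y_hat - n * y_bar ** 2                    # SSR = y_hat'y_hat - n*y_bar^2
ssr_alt = beta_hat @ X.T @ (y - y_bar * np.ones(n))     # SSR = beta_hat' X'(y - y_bar 1)
sse = y @ (np.eye(n) - P) @ y                           # SSE = y'(I - P)y
sst = ((y - y_bar) ** 2).sum()

assert np.isclose(sst, ssr + sse) and np.isclose(ssr, ssr_alt)
print(ssr, sse, sst)
```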

Computational Methods

In general linear models, the explained sum of squares (SSR) is computed by first fitting the model to obtain the predicted values \hat{y}_i for each observation i = 1, \dots, n. The mean of the observed response variable \bar{y} is then calculated, and SSR is given by \sum_{i=1}^n (\hat{y}_i - \bar{y})^2. This approach directly quantifies the variation explained by the model relative to the overall mean. An equivalent and often more efficient method leverages the partitioning of the total sum of squares (SST), where SSR = SST - SSE, with SSE being the sum of squared residuals \sum_{i=1}^n (y_i - \hat{y}_i)^2. Here, SST is \sum_{i=1}^n (y_i - \bar{y})^2, and both SST and SSE are readily available after the model is fitted. When the model includes categorical predictors, these are incorporated via dummy coding in the design matrix X, where each category level (except one reference level) becomes a binary indicator column. The resulting SSR then captures the variation explained by these dummy variables alongside any continuous predictors, as the fit accounts for their joint effects in the projection onto the column space of X. For numerical stability, especially with ill-conditioned design matrices (e.g., due to multicollinearity or scaling issues), direct inversion of X^T X should be avoided; instead, QR decomposition of X is recommended to solve the normal equations reliably. Centering the predictors (subtracting their means) further mitigates conditioning problems by decorrelating the intercept from the other terms. Modern statistical software handles these computations automatically: in Python's statsmodels library, the OLSResults object provides the explained sum of squares directly via the ess attribute after fitting, while R reports it in the ANOVA table produced by anova() for an lm() fit, both using stable algorithms such as QR decomposition under the hood. In high-dimensional settings where the number of predictors p > n, ordinary least squares (OLS) does not yield a unique solution, rendering SSR undefined without additional constraints; standard OLS assumes p < n for full-rank estimation, though regularization methods such as ridge regression can adapt the concept by penalizing coefficients to stabilize fits.
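
The sketch below shows one way to obtain the explained sum of squares in practice, using the ess attribute of statsmodels on simulated (hypothetical) data and cross-checking it against a QR-based fit; note that in statsmodels the attribute named ssr is the residual sum of squares.

```python
# Minimal sketch (simulated data): SSR via statsmodels and via a QR-based fit.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=(50, 2))                               # two hypothetical predictors
y = 1.0 + x @ np.array([0.8, -1.2]) + rng.normal(scale=0.5, size=50)

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()
print(fit.ess)                        # explained sum of squares
print(fit.centered_tss - fit.ssr)     # same quantity via SST - SSE (ssr = residual SS here)

# QR-based solution avoids forming (X'X)^{-1} directly
Q, R = np.linalg.qr(X)
beta_hat = np.linalg.solve(R, Q.T @ y)
y_hat = X @ beta_hat
print(((y_hat - y.mean()) ** 2).sum())  # matches fit.ess
```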

Interpretation and Applications

Statistical Significance

The explained sum of squares (SSR) plays a central role in assessing the overall significance of regression models through the F-statistic, which tests the null hypothesis that all regression coefficients β_j = 0 for j = 1 to k, indicating no linear relationship between the predictors and the response variable. The F-statistic is computed as F = \frac{\text{SSR} / k}{\text{SSE} / (n - k - 1)}, where k is the number of predictors, n is the sample size, and SSE is the error sum of squares; under the null hypothesis, this statistic follows an F-distribution with k and n - k - 1 degrees of freedom. A large value of F, driven by a substantial SSR relative to SSE, provides evidence against the null, with the p-value determining significance at a chosen alpha level (e.g., 0.05). For testing individual predictors, the partial SSR measures the incremental increase in SSR when a specific predictor is added to a model that already includes the others, enabling a partial F-test of its unique contribution. This partial F-statistic is equivalent to the square of the corresponding t-statistic for the predictor's coefficient, such that F = t^2, with 1 and n - k - 1 degrees of freedom; rejection of the null hypothesis β_j = 0 for that predictor indicates its significance. The coefficient of determination, R^2 = SSR / SST, where SST is the total sum of squares, quantifies the proportion of variance explained by the model and relates directly to the F-statistic, as higher R^2 values amplify the numerator of the F-statistic and thus make rejection of the null more likely. For multiple regression, the adjusted R^2 accounts for model complexity to avoid overestimating fit in significance testing: \bar{R}^2 = 1 - \frac{(1 - R^2)(n-1)}{n - k - 1}, which penalizes additional predictors and provides a more reliable measure of fit when evaluating overall model significance. In terms of statistical power, a larger SSR relative to the error variance signals stronger evidence against the null hypothesis, increasing the power of the F-test to detect true effects, though power also depends on the effect size and sample size n. Sample size requirements for adequate power (e.g., 0.80) rise when the expected R^2 increments from predictors are small, ensuring the test can reliably identify meaningful relationships while controlling for Type II errors.
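
A minimal sketch of the overall F-test computed from the sums of squares, using hypothetical values for SSR, SSE, n, and k, with the p-value taken from the F-distribution via SciPy:

```python
# Minimal sketch (hypothetical values): overall F-test from sums of squares.
from scipy import stats

n, k = 30, 3          # hypothetical sample size and number of predictors
ssr = 120.0           # hypothetical explained sum of squares
sse = 80.0            # hypothetical error sum of squares

msr = ssr / k                      # mean square for regression
mse = sse / (n - k - 1)            # mean square error
F = msr / mse                      # overall F-statistic
p_value = stats.f.sf(F, k, n - k - 1)   # upper-tail probability

r2 = ssr / (ssr + sse)             # coefficient of determination
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(F, p_value, r2, adj_r2)
```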

Simple Regression Example

Consider a small dataset with five observations relating height (in inches, as the predictor variable x) to weight (in pounds, as the response variable y): (60, 120), (62, 125), (64, 130), (66, 135), (68, 140). The mean weight \bar{y} is 130 pounds. The total sum of squares (SST) measures the total variation in weight around this mean and equals 250. Fitting a simple linear regression model yields the line \hat{y} = -30 + 2.5x, with predicted weights of 120, 125, 130, 135, and 140 pounds, respectively. The explained sum of squares (SSR) captures the variation explained by this model relative to the mean and is 250. The residual sum of squares (SSE), representing unexplained variation, is 0. Thus, SST = SSR + SSE holds, confirming the partitioning. The proportion of variation explained is SSR/SST = 250/250 = 1, or 100%, indicating that height accounts for 100% of the variability in weight in this sample, reflecting a perfect linear relationship.
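
The numbers in this example can be reproduced with a few lines of Python; the sketch below recovers the slope 2.5, intercept -30, SST = SSR = 250, and SSE = 0 (up to floating-point error).

```python
# Sketch reproducing the height-weight example: the fit is exact,
# so SSR = SST = 250 and SSE = 0.
import numpy as np

height = np.array([60.0, 62.0, 64.0, 66.0, 68.0])
weight = np.array([120.0, 125.0, 130.0, 135.0, 140.0])

slope, intercept = np.polyfit(height, weight, 1)   # expect 2.5 and -30
w_hat = intercept + slope * height
w_bar = weight.mean()

sst = ((weight - w_bar) ** 2).sum()    # 250
ssr = ((w_hat - w_bar) ** 2).sum()     # 250
sse = ((weight - w_hat) ** 2).sum()    # ~0 (floating-point error only)
print(slope, intercept, sst, ssr, sse)
```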

Multiple Regression Example

For an illustration of incremental SSR, consider a dataset from a study on cognitive performance where the response is an overall score, the first predictor is a vocabulary score (x), and the second is a symbol-digit modalities test score (z). The simple regression SSR for vocabulary alone is 2.691, with SSE = 40.359 and SST ≈ 43.05. Adding the second predictor increases SSR to 11.778, with SSE decreasing to 31.272. This incremental SSR of 9.087 (from 2.691 to 11.778) demonstrates the additional explanatory power contributed by the second predictor, reducing unexplained variation and raising the proportion of variation explained from about 6.3% to 27.4%. Such sequential increases in SSR help assess the value of each predictor in multiple regression contexts.
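
The same sequential comparison can be carried out in software by fitting nested models and differencing their explained sums of squares; the sketch below uses simulated (hypothetical) data rather than the study data above, purely to show the mechanics.

```python
# Minimal sketch (simulated data): incremental (sequential) SSR from adding
# a second predictor z to a model that already contains x.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 40
x = rng.normal(size=n)                        # hypothetical first predictor
z = 0.5 * x + rng.normal(size=n)              # hypothetical second predictor
y = 1.0 + 0.4 * x + 0.9 * z + rng.normal(size=n)

fit_x = sm.OLS(y, sm.add_constant(x)).fit()                          # x only
fit_xz = sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit()   # x and z

incremental_ssr = fit_xz.ess - fit_x.ess      # extra variation explained by z
print(fit_x.ess, fit_xz.ess, incremental_ssr)
```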

Real-World Application in Economics

In macroeconomic modeling, the explained sum of squares quantifies how well economic indicators predict GDP growth. For instance, a multiple regression model using quarterly data might show interest rates and inflation explaining an SSR of 450 out of a total SST of 600, yielding an R^2 of 75% and highlighting these factors' role in GDP fluctuations.

Visualization

A scatterplot of the height-weight data reveals points lying exactly on an upward-sloping regression line. With zero residuals, there are no vertical distances from the points to the line (SSE = 0, no unexplained variation), while the vertical deviations from the fitted line to the horizontal mean line illustrate the explained variation, with their spread showing that the full explained sum of squares (SSR) matches SST. This graphical decomposition aids in visually interpreting the partitioning of total variation.
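
A matplotlib sketch of this plot for the height-weight data (scatter of observations, fitted line, and horizontal mean line) might look as follows.

```python
# Sketch of the described visualization: data points, fitted line, and mean line.
import numpy as np
import matplotlib.pyplot as plt

height = np.array([60.0, 62.0, 64.0, 66.0, 68.0])
weight = np.array([120.0, 125.0, 130.0, 135.0, 140.0])
slope, intercept = np.polyfit(height, weight, 1)

plt.scatter(height, weight, label="observations")
plt.plot(height, intercept + slope * height, label="fitted line")
plt.axhline(weight.mean(), linestyle="--", label="mean weight")
plt.xlabel("height (inches)")
plt.ylabel("weight (pounds)")
plt.legend()
plt.show()
```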
