Explained sum of squares
The explained sum of squares (ESS), also known as the regression sum of squares (SSR) or the sum of squares due to regression, is a fundamental quantity in regression analysis and the analysis of variance (ANOVA) that measures the portion of the total variation in the response variable accounted for by the fitted model or by differences between group means.[1][2] It arises from the partitioning of the total sum of squares (TSS), which captures all variability in the data around the mean, into two components: the ESS, representing explained variation, and the residual sum of squares (RSS), or error sum of squares (SSE), representing unexplained variation, so that TSS = ESS + RSS.[3][4] In simple linear regression, the ESS is the sum of the squared deviations of the predicted values from the overall mean of the response variable, formally ESS = \sum (\hat{y}_i - \bar{y})^2, where \hat{y}_i are the fitted values and \bar{y} is the sample mean.[1] This decomposition shows how much of the observed variability in the dependent variable Y is attributable to its linear relationship with the predictor X, as opposed to random error.[2] In multiple linear regression, the ESS extends to several predictors and quantifies their collective explanatory power.[1]

In one-way ANOVA, the ESS corresponds to the between-groups (treatment) sum of squares, which measures variation due to differences among treatment or group means. It is computed as the sum of squared deviations of the group means from the grand mean, weighted by group sizes: SS_{\text{between}} = \sum n_j (\bar{y}_j - \bar{y})^2, where n_j is the sample size of group j and \bar{y}_j its mean.[3] This between-groups component isolates the effect of the factor under study from the within-group error variation (SSE), enabling tests of significance via the F-statistic.[3] For more complex designs, such as factorial ANOVA, sequential or partial sums of squares refine the ESS to assess individual factor contributions while controlling for others.[5]

The ESS plays a central role in model evaluation, particularly through the coefficient of determination R^2, defined as R^2 = ESS / TSS, which gives the proportion of total variance explained by the model and ranges from 0 to 1.[1][2] A higher ESS relative to TSS signifies better model fit, though it must be interpreted alongside considerations such as sample size and potential overfitting in multiple regression.[6] This metric underpins inference in fields such as economics, biology, and engineering, where quantifying explanatory power is essential for hypothesis testing and prediction.[3]
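To make the between-groups formula and the explained-to-total ratio concrete, here is a minimal NumPy sketch with made-up data for three groups; the group sizes, values, and variable names are illustrative only, not taken from any cited source.

```python
import numpy as np

# Hypothetical measurements for three treatment groups (illustrative data only).
groups = [
    np.array([4.1, 5.0, 4.6, 5.3]),
    np.array([6.2, 5.8, 6.6]),
    np.array([3.2, 3.9, 3.5, 3.0, 3.7]),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# Between-groups (explained) sum of squares: n_j * (group mean - grand mean)^2, summed over groups.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

# Within-groups (error) sum of squares: squared deviations from each group's own mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Total sum of squares around the grand mean.
ss_total = ((all_values - grand_mean) ** 2).sum()

print(ss_between + ss_within, ss_total)          # the two agree up to floating-point error
print("proportion explained:", ss_between / ss_total)
```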
Fundamentals

Definition
The explained sum of squares, often denoted SSR or ESS, is a key component in regression analysis and the analysis of variance (ANOVA), representing the sum of the squared deviations between the predicted values from a fitted model and the overall mean of the observed dependent variable.[7] Mathematically, it is expressed as \text{SSR} = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2, where \hat{y}_i denotes the predicted value for the i-th observation, \bar{y} is the sample mean of the observed values y_i, and the summation runs over all n observations.[2] This measure captures the portion of the total variation in the dependent variable that the model attributes to the influence of the independent variables. In essence, SSR quantifies the improvement in predictive accuracy gained by the model over simply using the mean of the dependent variable as the predictor, thereby indicating how much variability the independent variables explain.[7]

The concept forms part of the fundamental partitioning of the total sum of squares (SST) into explained (SSR) and residual (SSE) components, where SST = SSR + SSE.[2] The term and its underlying framework were formalized in the early 20th century by Ronald A. Fisher within the development of ANOVA and regression techniques, particularly in his 1925 book Statistical Methods for Research Workers, as a means to decompose variance in experimental data.[8]
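The definition translates directly into code. The following minimal sketch assumes illustrative data and uses NumPy's polyfit for the least squares fit; it computes SSR as the sum of squared deviations of the fitted values from the sample mean, alongside SSE and SST.

```python
import numpy as np

# Illustrative data (not from any real study).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 3.6, 4.8, 5.1, 6.2])

# Fit y = b0 + b1 * x by ordinary least squares (polyfit returns highest degree first).
b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x

y_bar = y.mean()
ssr = ((y_hat - y_bar) ** 2).sum()   # explained sum of squares
sse = ((y - y_hat) ** 2).sum()       # residual (error) sum of squares
sst = ((y - y_bar) ** 2).sum()       # total sum of squares

print(ssr, sse, sst)                 # ssr + sse equals sst, up to rounding
```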
Relation to Variance Partitioning

In regression analysis, the total sum of squares (SST), which quantifies the total variation in the response variable around its mean, is partitioned into the explained sum of squares (SSR) and the sum of squares due to error (SSE). Specifically, SST is given by \sum_{i=1}^n (y_i - \bar{y})^2, measuring the overall variability in the observed data; SSR, \sum_{i=1}^n (\hat{y}_i - \bar{y})^2, captures the variation explained by the model; and SSE, \sum_{i=1}^n (y_i - \hat{y}_i)^2, represents the unexplained residual variation. This decomposition satisfies the identity SST = SSR + SSE, providing a fundamental way to assess how much of the total variance is accounted for by the regression model.[9][10]

This partitioning bears a close conceptual analogy to the analysis of variance (ANOVA), where SSR functions like the "between-group" sum of squares, attributing variation to the explanatory factors in the model, while SSE corresponds to the "within-group" or error sum of squares. In both frameworks, the total variability is decomposed into systematic (model-related) and random (unexplained) components to facilitate inference about the model's explanatory power.[9][11]

The degrees of freedom associated with SSR reflect the model's complexity: in simple linear regression with one predictor, df_{SSR} = 1, corresponding to the single slope parameter; in multiple regression with k predictors, df_{SSR} = k, accounting for the additional parameters beyond the intercept. Dividing SSR by its degrees of freedom yields the mean square for regression, enabling F-tests of model significance.[9][5]
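A short sketch of this partitioning and the resulting F-test, again with illustrative single-predictor data (NumPy and SciPy assumed available); the numbers are invented for demonstration.

```python
import numpy as np
from scipy import stats

# Illustrative data with a single predictor (k = 1).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([1.8, 2.7, 3.1, 4.5, 4.9, 6.2, 6.8, 8.1])
n, k = len(y), 1

b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x
y_bar = y.mean()

ssr = ((y_hat - y_bar) ** 2).sum()
sse = ((y - y_hat) ** 2).sum()
sst = ((y - y_bar) ** 2).sum()       # equals ssr + sse up to rounding

msr = ssr / k                        # mean square for regression, df = k
mse = sse / (n - k - 1)              # mean square error, df = n - k - 1
f_stat = msr / mse
p_value = stats.f.sf(f_stat, k, n - k - 1)   # upper-tail probability of the F distribution

print(f_stat, p_value)
```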
Simple Linear Regression

Model Setup and Partitioning
In simple linear regression, the relationship between a response variable y and a single predictor variable x is modeled as y_i = \beta_0 + \beta_1 x_i + \epsilon_i for i = 1, \dots, n, where \beta_0 represents the y-intercept, \beta_1 the slope, and \epsilon_i the random error term with E(\epsilon_i) = 0 and \text{Var}(\epsilon_i) = \sigma^2.[12] This setup assumes linearity in the mean response, independence of the errors, normality of the error distribution, and homoscedasticity (constant error variance across all levels of x).[13] These assumptions ensure that the ordinary least squares estimates of \beta_0 and \beta_1 are unbiased and provide a basis for inference about the linear relationship.

The total variability in the observed responses, quantified by the total sum of squares (SST) as \sum_{i=1}^n (y_i - \bar{y})^2, where \bar{y} is the sample mean of y, is partitioned under this model into two components: the explained sum of squares (SSR), which captures the variation attributable to the predictor x, and the sum of squares due to error (SSE), which reflects unexplained residual variation.[14] This partitioning is expressed as SST = SSR + SSE, decomposing the total variation into the part explained by the fitted line and the part remaining after accounting for the linear relationship.[14]

In the simple linear regression context, SSR is explicitly calculated as \hat{\beta}_1^2 \sum_{i=1}^n (x_i - \bar{x})^2, where \hat{\beta}_1 is the least squares slope estimate and \bar{x} is the sample mean of x; this formula arises from the projection of the centered response onto the direction of the centered predictor.[15] Geometrically, SSR corresponds to the squared Euclidean norm of the projection of the centered response vector \mathbf{y} - \bar{y} \mathbf{1} (where \mathbf{1} is the vector of ones) onto the column space spanned by the constant vector \mathbf{1} and the predictor vector \mathbf{x}, illustrating how the model captures the component of variation aligned with the linear subspace defined by the regressors.[16]
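The projection interpretation can be checked numerically. In this minimal sketch (illustrative data), the centered response is projected onto the centered predictor, and the squared norm of that projection matches \hat{\beta}_1^2 \sum (x_i - \bar{x})^2.

```python
import numpy as np

x = np.array([0.5, 1.5, 2.0, 3.5, 4.0, 5.5])
y = np.array([1.2, 2.0, 2.3, 3.9, 4.1, 5.6])

x_c = x - x.mean()                            # centered predictor
y_c = y - y.mean()                            # centered response

beta1_hat = (x_c @ y_c) / (x_c @ x_c)         # least squares slope
proj = beta1_hat * x_c                        # projection of y_c onto the direction of x_c

ssr_projection = (proj ** 2).sum()            # squared norm of the projection
ssr_formula = beta1_hat ** 2 * (x_c ** 2).sum()

print(ssr_projection, ssr_formula)            # the two agree
```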
Derivation of Explained Sum of Squares

In simple linear regression, the ordinary least squares (OLS) method seeks to minimize the sum of squared errors (SSE), defined as \text{SSE} = \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2, with respect to the parameters \beta_0 and \beta_1, yielding the estimates \hat{\beta}_0 and \hat{\beta}_1.[15] The fitted values are then \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i. From the normal equations of OLS, \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}, where \bar{y} and \bar{x} are the sample means of y and x, respectively. Substituting this in gives \hat{y}_i = \bar{y} + \hat{\beta}_1 (x_i - \bar{x}).[17] Thus, the deviation of the fitted value from the mean is \hat{y}_i - \bar{y} = \hat{\beta}_1 (x_i - \bar{x}).

The explained sum of squares (SSR), which quantifies the variation in y accounted for by the regression model, is \text{SSR} = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2. Substituting the expression for \hat{y}_i - \bar{y} yields \text{SSR} = \sum_{i=1}^n [\hat{\beta}_1 (x_i - \bar{x})]^2 = \hat{\beta}_1^2 \sum_{i=1}^n (x_i - \bar{x})^2. The OLS estimate of the slope is \hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}, so SSR can also be expressed as \text{SSR} = \frac{\left[\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})\right]^2}{\sum_{i=1}^n (x_i - \bar{x})^2}.[18][19]

To relate SSR to the total sum of squares (SST = \sum_{i=1}^n (y_i - \bar{y})^2) and SSE, consider the identity y_i - \bar{y} = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i), where e_i = y_i - \hat{y}_i is the residual. Squaring and summing gives \text{SST} = \sum_{i=1}^n [(\hat{y}_i - \bar{y}) + e_i]^2 = \text{SSR} + \text{SSE} + 2 \sum_{i=1}^n e_i (\hat{y}_i - \bar{y}). The cross-product term vanishes because the residuals are orthogonal to the fitted deviations: \sum_{i=1}^n e_i (\hat{y}_i - \bar{y}) = 0. This orthogonality follows from the OLS normal equations, which ensure that the residuals are perpendicular to both the constant term (implying \sum e_i = 0) and the predictor deviations (implying \sum e_i (x_i - \bar{x}) = 0). Since \hat{y}_i - \bar{y} is a scalar multiple of x_i - \bar{x}, the residuals are orthogonal to it as well. In vector terms, the explained and residual vectors are orthogonal in Euclidean space, so SST = SSR + SSE holds by the Pythagorean theorem.[17][15][18]
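The individual steps of this derivation are easy to verify numerically. The sketch below, using illustrative data, checks the two closed-form expressions for SSR, the vanishing of the cross-product term, and the resulting identity SST = SSR + SSE.

```python
import numpy as np

x = np.array([2.0, 4.0, 5.0, 7.0, 9.0, 10.0])
y = np.array([3.1, 4.8, 5.2, 7.3, 8.8, 9.9])

x_bar, y_bar = x.mean(), y.mean()
s_xx = ((x - x_bar) ** 2).sum()
s_xy = ((x - x_bar) * (y - y_bar)).sum()

beta1 = s_xy / s_xx                  # OLS slope
beta0 = y_bar - beta1 * x_bar        # OLS intercept from the normal equations
y_hat = beta0 + beta1 * x
resid = y - y_hat

ssr = ((y_hat - y_bar) ** 2).sum()
print(np.isclose(ssr, beta1 ** 2 * s_xx))                              # SSR = beta1^2 * Sxx
print(np.isclose(ssr, s_xy ** 2 / s_xx))                               # SSR = Sxy^2 / Sxx
print(np.isclose((resid * (y_hat - y_bar)).sum(), 0.0, atol=1e-10))    # cross-product term vanishes
print(np.isclose(((y - y_bar) ** 2).sum(), ssr + (resid ** 2).sum()))  # SST = SSR + SSE
```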
General Linear Models

Extension to Multiple Regression
In multiple linear regression, the explained sum of squares (SSR) extends the concept from simple linear regression to models incorporating several predictor variables, allowing for the quantification of variation in the response variable attributed to the collective influence of multiple predictors.[20] The general ordinary least squares (OLS) model is formulated as \mathbf{y} = X \boldsymbol{\beta} + \boldsymbol{\varepsilon}, where \mathbf{y} is an n \times 1 vector of observed responses, X is an n \times (k+1) design matrix comprising a column of ones for the intercept and k columns of predictor variables, \boldsymbol{\beta} is a (k+1) \times 1 vector of unknown regression coefficients, and \boldsymbol{\varepsilon} is an n \times 1 vector of independent errors assumed to have mean zero and constant variance.[20][21] This matrix representation generalizes the simple linear regression case, where only one predictor is used, by accommodating the joint effects of multiple independent variables on the response.[22]

The fitted values are given by \hat{\mathbf{y}} = X \hat{\boldsymbol{\beta}}, where the OLS estimator \hat{\boldsymbol{\beta}} = (X^T X)^{-1} X^T \mathbf{y} minimizes the residual sum of squares.[20] The projection matrix P = X (X^T X)^{-1} X^T, often called the hat matrix, projects \mathbf{y} onto the column space of X, yielding \hat{\mathbf{y}} = P \mathbf{y}.[22] In this framework, the explained sum of squares measures the variation explained by the model relative to the mean and is expressed in matrix form as SSR = \hat{\mathbf{y}}^T \hat{\mathbf{y}} - n \bar{y}^2, where \bar{y} is the sample mean of \mathbf{y}.[21] Equivalently, centering the data around the mean gives SSR = \hat{\boldsymbol{\beta}}^T X^T (\mathbf{y} - \bar{y} \mathbf{1}), with \mathbf{1} denoting the n \times 1 vector of ones, or SSR = \| P \mathbf{y} - \bar{y} \mathbf{1} \|^2, highlighting the deviation of the fitted values from the grand mean.[22][21]

The fundamental partitioning of the total sum of squares (SST) into explained and unexplained components persists in multiple regression: SST = SSR + SSE, where SSE is the error sum of squares SSE = \| \mathbf{y} - \hat{\mathbf{y}} \|^2 = \mathbf{y}^T (I - P) \mathbf{y} and I is the n \times n identity matrix.[20][21] This decomposition quantifies how much of the total variation in \mathbf{y}, after adjusting for the mean (SST = \sum (y_i - \bar{y})^2), is captured by the k predictors (SSR) versus remaining as residual variation (SSE).[22] The degrees of freedom associated with SSR equal k, reflecting the number of predictors excluding the intercept, while SST has n-1 degrees of freedom and SSE has n - k - 1.[21] This extension maintains the interpretability of SSR as a measure of model fit while accounting for the multidimensional predictor space.[20]
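The matrix expressions can be exercised directly. In the following sketch, the data are simulated purely for illustration, the design matrix includes an intercept column and two predictors, and the two equivalent formulas for SSR given above are compared.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 12, 2

x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.3, size=n)

X = np.column_stack([np.ones(n), x1, x2])        # n x (k+1) design matrix

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)     # OLS coefficients
y_hat = X @ beta_hat                             # equals P @ y with P = X (X'X)^{-1} X'
y_bar = y.mean()

ssr_matrix = y_hat @ y_hat - n * y_bar ** 2      # SSR = yhat'yhat - n*ybar^2
ssr_centered = ((y_hat - y_bar) ** 2).sum()      # SSR = ||P y - ybar 1||^2
sse = ((y - y_hat) ** 2).sum()
sst = ((y - y_bar) ** 2).sum()

print(np.isclose(ssr_matrix, ssr_centered))
print(np.isclose(sst, ssr_matrix + sse))
# Degrees of freedom here: SSR has k = 2, SSE has n - k - 1 = 9, SST has n - 1 = 11.
```

Solving the normal equations with np.linalg.solve (or using a QR factorization) rather than forming (X^T X)^{-1} explicitly is the usual choice for numerical stability, which anticipates the computational points below.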
Computational Methods

In general linear models, the explained sum of squares (SSR) is computed by first fitting the model to obtain the predicted values \hat{y}_i for each observation i = 1, \dots, n. The mean of the observed response variable \bar{y} is then calculated, and SSR is given by \sum_{i=1}^n (\hat{y}_i - \bar{y})^2.[23] This approach directly quantifies the variation explained by the model relative to the overall mean. An equivalent and often more convenient route uses the partitioning of the total sum of squares (SST), where SSR = SST - SSE, with SSE being the sum of squared residuals \sum_{i=1}^n (y_i - \hat{y}_i)^2.[23] Here, SST is \sum_{i=1}^n (y_i - \bar{y})^2, and both SST and SSE can be derived after fitting the model.[7]

When the model includes categorical predictors, these are incorporated via dummy coding in the design matrix X, where each category level (except one reference level) becomes a binary indicator column. The resulting SSR then captures the variation explained by these dummy variables alongside any continuous predictors, as the least squares fit accounts for their joint effects in the projection onto the column space of X.

For numerical stability, especially with ill-conditioned design matrices (e.g., due to multicollinearity or scaling issues), direct inversion of X^T X should be avoided; instead, a QR decomposition of X is recommended for solving the least squares problem reliably.[24] Centering the predictors (subtracting their means) further mitigates conditioning problems by decorrelating the intercept from the other terms. Modern statistical software handles these computations automatically: in Python's statsmodels library, the fitted OLS results object exposes the explained sum of squares via its ess attribute, while in R the anova() function applied to an lm() fit reports the sums of squares for each model term, which together make up the regression sum of squares; both rely on numerically stable matrix factorizations (such as QR or SVD) rather than explicit inversion of X^T X.[25]
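As a sketch of the software route, assuming the statsmodels package is available, the example below fits an OLS model on simulated data and reads the relevant quantities from the results object. Note the naming clash: in statsmodels, ess is the explained sum of squares, while the attribute called ssr is the sum of squared residuals (SSE in this article's notation).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=(50, 2))
y = 0.5 + 1.5 * x[:, 0] - 2.0 * x[:, 1] + rng.normal(scale=0.5, size=50)

X = sm.add_constant(x)               # prepend the intercept column
results = sm.OLS(y, X).fit()

print(results.ess)                   # explained sum of squares (SSR in this article's notation)
print(results.ssr)                   # residual sum of squares (SSE in this article's notation)
print(results.centered_tss)          # total sum of squares about the mean
print(np.isclose(results.centered_tss, results.ess + results.ssr))
```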
In high-dimensional settings where the number of predictors p exceeds the number of observations n, ordinary least squares (OLS) does not yield a unique solution, so the usual SSR-based decomposition is not well defined without additional constraints; standard OLS requires a full-rank design matrix with fewer estimated parameters than observations. Regularization methods such as ridge regression adapt the idea by penalizing the coefficients to stabilize the fit, although the exact identity SST = SSR + SSE need not hold for a penalized fit.[26]
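As a brief, hedged sketch of the high-dimensional case (simulated data, a manual ridge solution, and an arbitrary penalty lam chosen for illustration): fitted values and an SSR-like quantity can still be computed, but the exact partition of SST generally no longer holds for the penalized fit.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 10, 25                         # more predictors than observations
X = rng.normal(size=(n, p))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.2, size=n)

# Center y and the columns of X, then solve the ridge equations
# (X'X + lam*I) beta = X'y, which are well posed even though X'X is singular.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
lam = 1.0
beta_ridge = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

y_hat = y.mean() + Xc @ beta_ridge
ssr_like = ((y_hat - y.mean()) ** 2).sum()   # "explained" variation of the penalized fit
sse = ((y - y_hat) ** 2).sum()
sst = ((y - y.mean()) ** 2).sum()

print(ssr_like + sse, sst)            # generally not equal for a penalized fit
```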