Total sum of squares
The total sum of squares (TSS), also denoted as the sum of squares total (SST), is a fundamental statistical measure that quantifies the overall variability in a dataset by summing the squared deviations of each observed value from the sample mean.[1] For a set of n observations y_1, y_2, \dots, y_n with mean \bar{y}, it is formally defined by the formula \text{TSS} = \sum_{i=1}^n (y_i - \bar{y})^2, which captures the total dispersion around the central tendency without regard to direction, since squaring eliminates negative differences.[2] This metric serves as the baseline for assessing variation in data analysis: a larger TSS indicates greater spread, while smaller values suggest that the data points cluster closely around the mean.[1]

In linear regression analysis, TSS represents the total variation in the dependent variable that a model aims to explain. It decomposes into the regression sum of squares (SSR), which measures variation accounted for by the predictors, and the error sum of squares (SSE), which captures unexplained residual variation, such that TSS = SSR + SSE.[2] This decomposition enables the calculation of the coefficient of determination R^2 = \frac{\text{SSR}}{\text{TSS}}, a key goodness-of-fit statistic ranging from 0 to 1 that indicates the proportion of total variation explained by the model.[2] In simple linear regression, for instance, a high R^2 signals strong explanatory power, while low values highlight model inadequacies.[1]

Within the framework of analysis of variance (ANOVA), TSS quantifies the total variation across all observations in an experiment. It partitions into the between-group sum of squares (SSB), reflecting differences attributable to experimental factors or treatments, and the within-group sum of squares (SSW), representing random error or variability within groups, with TSS = SSB + SSW.[3] This breakdown is essential for hypothesis testing: the F-statistic compares the mean squares obtained by dividing each component sum of squares by its degrees of freedom to determine whether group differences are statistically significant beyond chance.[3] ANOVA applications span experimental design and quality control, where TSS helps evaluate treatment effects reliably.[3]

Beyond regression and ANOVA, TSS underpins broader variance-based methods, such as the computation of the sample variance s^2 = \frac{\text{TSS}}{n-1}, which adjusts for degrees of freedom to provide an unbiased estimator of the population variance.[2] Because it is built from squared deviations, TSS is sensitive to outliers, which amplifies their influence on the total variability measure; this sensitivity can help flag influential data points, but it also means TSS is not a robust measure of spread.[1] Overall, TSS remains a versatile tool in statistical modeling, providing a precise quantification of data spread that is essential for inference and decision-making across disciplines.[2]
Definition and Formulation

Mathematical Definition
The total sum of squares (TSS) is a fundamental measure in statistics that quantifies the total variability of a dataset around its sample mean. It is defined as the sum of the squared deviations of each observation from the overall mean, providing an algebraic representation of the dispersion in the data.[4] Mathematically, for a sample of n observations y_1, y_2, \dots, y_n with sample mean \bar{y} = \frac{1}{n} \sum_{i=1}^n y_i, \text{TSS} = \sum_{i=1}^n (y_i - \bar{y})^2. This formula captures the total variation in the response variable without considering any explanatory factors.[4] Common notations for this quantity include TSS, SST (total sum of squares), or \text{SS}_{\text{total}}.[5] TSS is directly related to the sample variance s^2, which estimates the population variance, through the equation \text{TSS} = (n-1) s^2, where s^2 = \frac{1}{n-1} \sum_{i=1}^n (y_i - \bar{y})^2. This connection shows that the sample variance is simply TSS rescaled by its degrees of freedom.[4]
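For illustration, the following Python sketch (using NumPy on made-up data) computes TSS directly from the squared deviations and checks the identity \text{TSS} = (n-1) s^2; the specific values are arbitrary.

```python
import numpy as np

# Made-up sample data, purely for illustration
y = np.array([1.0, 3.0, 5.0, 7.0])
n = y.size

# TSS: sum of squared deviations from the sample mean
tss = np.sum((y - y.mean()) ** 2)

# Sample variance with the unbiased (n - 1) denominator
s2 = y.var(ddof=1)

print(tss)           # 20.0
print((n - 1) * s2)  # 20.0, confirming TSS = (n - 1) * s^2
```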
Geometric Interpretation

The total sum of squares (TSS) geometrically represents the overall spread or dispersion of data points relative to their mean, visualized in a scatterplot as the sum of squared vertical distances from each observation to the horizontal line at the sample mean. This measure captures the total variability inherent in the dataset, akin to the collective deviation of points from their central tendency. In a univariate context, these distances quantify how far the data scatter around the mean on a one-dimensional axis, providing an intuitive sense of data clustering or spread: a larger TSS indicates greater variability and potential outliers pulling points farther from the center.[6]

In vector space terms, the sample can be viewed as a single vector in n-dimensional space, and the TSS is the squared length (Euclidean norm) of the deviation vector whose components are the differences y_i - \bar{y}, drawing an analogy to the Pythagorean theorem extended to higher dimensions. This perspective highlights how TSS quantifies the total "energy" or magnitude of deviations, much as the squared hypotenuse of a right triangle equals the sum of its squared legs, and it underlies the orthogonal decomposition of variability discussed below.[7]

This geometric view extends naturally to multivariate settings, where data points are vectors in a p-dimensional space, and the TSS is given by \text{TSS} = \sum_{i=1}^n \| \mathbf{y}_i - \bar{\mathbf{y}} \|^2, with \mathbf{y}_i denoting the i-th observation vector and \bar{\mathbf{y}} the mean vector; here, \| \cdot \|^2 is the squared Euclidean norm, measuring the total squared distance across all dimensions. In multiple dimensions, a larger TSS similarly signals higher multivariate variability, such as elongated clusters or dispersed point clouds in the feature space.[8]
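The following NumPy sketch (again with made-up data) illustrates both views: the univariate TSS as the squared Euclidean norm of the deviation vector, and the multivariate TSS as a sum of squared distances from the mean vector.

```python
import numpy as np

# Univariate: TSS equals the squared Euclidean norm of the deviation vector
y = np.array([1.0, 3.0, 5.0, 7.0])
dev = y - y.mean()
tss_uni = np.linalg.norm(dev) ** 2           # same value as np.sum(dev ** 2)

# Multivariate: each row of Y is an observation vector in p-dimensional space
Y = np.array([[1.0, 2.0],
              [3.0, 3.0],
              [2.0, 7.0]])
dev_mat = Y - Y.mean(axis=0)                 # deviations from the mean vector
tss_multi = np.sum(np.linalg.norm(dev_mat, axis=1) ** 2)

print(tss_uni, tss_multi)
```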
Role in Statistical Analysis

In Analysis of Variance
In one-way analysis of variance (ANOVA), the total sum of squares (TSS) serves as a foundational measure for decomposing the overall variability in a dataset into components attributable to differences between group means and variability within groups, enabling tests of whether observed group differences are statistically significant.[9] This decomposition, originally developed by Ronald A. Fisher in the context of experimental design for agricultural research, partitions the TSS as TSS = SSB + SSW, where SSB is the between-group sum of squares and SSW is the within-group sum of squares.[10][9]

The between-group sum of squares (SSB) quantifies the variation explained by the group means relative to the grand mean and is calculated as SSB = \sum_{j=1}^k n_j (\bar{y}_j - \bar{y})^2, where k is the number of groups, n_j is the sample size of the j-th group, \bar{y}_j is the mean of the j-th group, and \bar{y} is the overall grand mean.[9] The within-group sum of squares (SSW), which captures the unexplained variation or residual error within each group, is computed as the sum of squared deviations from each group's mean: SSW = \sum_{j=1}^k \sum_{i=1}^{n_j} (y_{ij} - \bar{y}_j)^2, where y_{ij} is the i-th observation in the j-th group; dividing SSW by its degrees of freedom yields the pooled variance estimate shared across groups under the null hypothesis of equal means.[9]

This partitioning of the TSS facilitates the F-test in one-way ANOVA, where the mean square between groups (MSB = SSB / (k-1)) is divided by the mean square within groups (MSW = SSW / (N-k), with N the total sample size) to form the F-statistic: F = \frac{MSB}{MSW}.[9] A sufficiently large F-value, judged against the F-distribution with k-1 and N-k degrees of freedom, indicates that at least one group mean differs from the others, leading to rejection of the null hypothesis of equal group means.[10][9]
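A minimal Python sketch of this partition and the resulting F-test, using NumPy and SciPy on made-up group data (the group values and sizes are purely illustrative):

```python
import numpy as np
from scipy import stats

# Hypothetical one-way layout with three treatment groups
groups = [np.array([4.0, 5.0, 6.0]),
          np.array([7.0, 8.0, 9.0]),
          np.array([5.0, 6.0, 10.0])]

all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()
N, k = all_obs.size, len(groups)

# Between-group SS: size-weighted squared deviations of group means from the grand mean
ssb = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group SS: squared deviations of observations from their own group mean
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)
tss = ((all_obs - grand_mean) ** 2).sum()

assert np.isclose(tss, ssb + ssw)            # TSS = SSB + SSW

msb, msw = ssb / (k - 1), ssw / (N - k)      # mean squares
f_stat = msb / msw
p_value = stats.f.sf(f_stat, k - 1, N - k)   # upper-tail F probability
print(f_stat, p_value)
```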
In Linear Regression

In linear regression, the total sum of squares (TSS) quantifies the total variation in the response variable around its mean, serving as a baseline measure of variability before accounting for the model's explanatory power. For simple linear regression with a single continuous predictor, TSS is defined as the sum of squared deviations of the observed responses from their mean, and it decomposes into the regression sum of squares (SSR) and the error sum of squares (SSE), such that TSS = SSR + SSE. This partitioning illustrates how much of the total variation is explained by the linear relationship versus what remains unexplained by the model.[4][11]

The SSR measures the variation in the response attributable to the fitted regression line and is calculated as \sum_{i=1}^n (\hat{y}_i - \bar{y})^2, where \hat{y}_i is the predicted value for the i-th observation and \bar{y} is the mean of the observed responses. This component captures the improvement in fit due to the predictor variable relative to a null model that simply uses the mean. In contrast, the SSE represents the residual variation not explained by the model and is given by \sum_{i=1}^n (y_i - \hat{y}_i)^2, where y_i is the observed response; it reflects the discrepancies between actual and predicted values.[4][11][12]

This same decomposition extends naturally to multiple linear regression, where multiple continuous predictors are involved, with TSS still equaling SSR + SSE and SSR now accounting for the combined explanatory effects of all predictors on the response variation. The approach parallels the partitioning in analysis of variance but applies to continuous covariate structures in predictive modeling.[13][4]
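The decomposition can be verified numerically; the sketch below fits a simple regression by ordinary least squares with NumPy on made-up data and checks that TSS = SSR + SSE.

```python
import numpy as np

# Made-up predictor/response data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Ordinary least squares fit of a straight line (slope and intercept)
slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x

tss = ((y - y.mean()) ** 2).sum()      # total sum of squares
ssr = ((y_hat - y.mean()) ** 2).sum()  # regression (explained) sum of squares
sse = ((y - y_hat) ** 2).sum()         # error (residual) sum of squares

assert np.isclose(tss, ssr + sse)      # TSS = SSR + SSE
print(tss, ssr, sse)
```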
Relationships and Decompositions

Partition into Component Sums
In statistical modeling, the total sum of squares (TSS) quantifies the overall variation in the observed response variable relative to its mean and decomposes additively into the explained sum of squares (ESS), which measures the variation attributable to the fitted model, and the residual sum of squares (RSS), which captures the remaining unexplained variation. This identity, TSS = ESS + RSS, holds for any model estimated via ordinary least squares (OLS) whose design matrix includes an intercept, and it partitions the total variability into components that facilitate variance analysis across diverse contexts such as regression and analysis of variance (ANOVA).[14]

The proof of this decomposition relies on the geometric property of orthogonality inherent to OLS projection. Consider a response vector \mathbf{y} with mean \bar{y}, fitted values \hat{\mathbf{y}}, and residuals \mathbf{e} = \mathbf{y} - \hat{\mathbf{y}}. Each deviation from the mean can be expressed as y_i - \bar{y} = (\hat{y}_i - \bar{y}) + e_i. Squaring and summing over all observations yields \sum_{i=1}^n (y_i - \bar{y})^2 = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^n e_i^2 + 2 \sum_{i=1}^n e_i (\hat{y}_i - \bar{y}), where the cross-product term equals zero because the residuals are orthogonal to the column space of the design matrix (including the intercept), implying \sum e_i = 0 and \sum e_i \hat{y}_i = 0. Thus, TSS = ESS + RSS. Equivalent decompositions, such as the between-group sum of squares (SSB) plus within-group sum of squares (SSW) in ANOVA, follow the same orthogonal projection principle.[14]

Because the partition is an algebraic consequence of the OLS normal equations, it requires only that the fitted model be linear in its parameters and that the design matrix contain an intercept (or otherwise span the constant vector), so that the residuals sum to zero and are uncorrelated with the fitted values. Distributional assumptions such as independence of errors and homoscedasticity matter for inference based on the decomposition (for example, F-tests) rather than for the identity itself, while the absence of perfect multicollinearity ensures that the coefficient estimates are uniquely determined; omitting the intercept, however, does break the additivity of the sums of squares.[14]

In the multivariate setting, where the response comprises multiple dependent variables forming a matrix \mathbf{Y}, the total sum of squares generalizes to a sum-of-squares-and-cross-products (SSP) matrix, \mathbf{T} = \mathbf{Y}^\top (\mathbf{I} - \mathbf{1}\mathbf{1}^\top / n) \mathbf{Y}, whose trace equals the scalar TSS. This matrix decomposes into an explained SSP matrix \mathbf{H} (hypothesis sum of squares) and a residual SSP matrix \mathbf{E}, such that \mathbf{T} = \mathbf{H} + \mathbf{E} and \operatorname{tr}(\mathbf{T}) = \operatorname{tr}(\mathbf{H}) + \operatorname{tr}(\mathbf{E}), mirroring the univariate case and enabling multivariate variance partitioning.[15]
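The orthogonality argument can be checked numerically. The sketch below (NumPy, simulated data) fits a multiple regression with an intercept by least squares and confirms that the cross-product term vanishes, so that TSS = ESS + RSS.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated design with an intercept column plus two predictors
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(scale=0.3, size=n)

# OLS fit via least squares
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ coef
resid = y - y_hat

ess = ((y_hat - y.mean()) ** 2).sum()  # explained sum of squares
rss = (resid ** 2).sum()               # residual sum of squares
tss = ((y - y.mean()) ** 2).sum()

# Residuals sum to ~0 and are orthogonal to the fitted values,
# so the cross-product term is numerically zero and the partition holds
cross = 2 * (resid * (y_hat - y.mean())).sum()
print(round(cross, 10))                # ~0
assert np.isclose(tss, ess + rss)
```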
Connection to Measures of Fit

The coefficient of determination, denoted R^2, quantifies the proportion of variance in the response variable explained by the model in linear regression and is computed as R^2 = 1 - \frac{\text{SSE}}{\text{TSS}} = \frac{\text{SSR}}{\text{TSS}}, where SSE is the residual sum of squares and SSR is the regression sum of squares.[16] This measure, introduced by Sewall Wright in 1921 as a correlation index, provides a goodness-of-fit statistic between 0 and 1, with higher values indicating better explanatory power of the model relative to a null model that simply predicts the mean.[17][18] The total sum of squares (TSS) serves as the denominator in R^2, normalizing the explained variation against the overall variability in the data and rendering the metric unitless, which facilitates comparisons of model fit across datasets differing in scale or measurement units.[19] This standardization emphasizes the relative improvement over baseline prediction, highlighting how much the model's predictions reduce uncertainty compared to using the sample mean alone.

To mitigate overestimation of fit in models with multiple predictors, the adjusted coefficient of determination, \bar{R}^2, incorporates a penalty for model complexity: \bar{R}^2 = 1 - (1 - R^2) \frac{n-1}{n-p-1}, where n is the number of observations and p is the number of predictors.[20] This formula adjusts R^2 downward as p increases relative to n, providing a more reliable assessment when comparing models with varying numbers of variables.[21]

A key limitation of R^2 is that it never decreases when predictors are added to the model, even if those predictors are irrelevant, because any additional variable can capture minor random variation and inflate the explained proportion.[22] The TSS, while essential for this normalization, remains fixed for a given dataset and does not inherently penalize such overfitting, necessitating the use of adjusted metrics or other diagnostics for robust model evaluation.[23]
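The relationships between TSS, SSE, R^2, and the adjusted \bar{R}^2 translate directly into code; the following sketch (plain NumPy, with made-up observed and fitted values and an assumed p = 2 predictors) is one way to compute them.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SSE / TSS."""
    sse = ((y - y_hat) ** 2).sum()
    tss = ((y - y.mean()) ** 2).sum()
    return 1.0 - sse / tss

def adjusted_r_squared(y, y_hat, p):
    """Adjusted R^2, penalizing the number of predictors p."""
    n = len(y)
    r2 = r_squared(y, y_hat)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Made-up observed and fitted values from a model with p = 2 predictors
y = np.array([3.0, 5.0, 7.0, 6.0, 9.0, 11.0])
y_hat = np.array([3.4, 4.8, 6.9, 6.5, 9.2, 10.7])
print(r_squared(y, y_hat), adjusted_r_squared(y, y_hat, p=2))
```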
Applications and Examples

Numerical Illustration
To illustrate the computation of the total sum of squares (TSS), consider a simple univariate dataset consisting of the values {2, 4, 4, 4, 5, 5, 7, 9}. The TSS measures the total variation in this data around its mean, calculated as the sum of the squared deviations from the mean: \text{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2, where y_i are the data points, \bar{y} is the sample mean, and n=8 is the number of observations. First, compute the sample mean \bar{y}: the sum of the values is 40, so \bar{y} = 40 / 8 = 5. Next, calculate the deviations from the mean for each value:
- For 2: 2 - 5 = -3
- For each 4: 4 - 5 = -1 (occurs three times)
- For each 5: 5 - 5 = 0 (occurs two times)
- For 7: 7 - 5 = 2
- For 9: 9 - 5 = 4
Then square each deviation and note its contribution to the total:
- (-3)^2 = 9
- Three instances of (-1)^2 = 1, totaling 3 \times 1 = 3
- Two instances of 0^2 = 0, totaling 0
- 2^2 = 4
- 4^2 = 16

Summing these contributions gives \text{TSS} = 9 + 3 + 0 + 4 + 16 = 32.
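The same arithmetic can be reproduced in a few lines of Python; this is a minimal sketch assuming only NumPy.

```python
import numpy as np

y = np.array([2, 4, 4, 4, 5, 5, 7, 9], dtype=float)
deviations = y - y.mean()   # mean is 5, deviations are -3, -1, -1, -1, 0, 0, 2, 4
tss = (deviations ** 2).sum()
print(tss)                  # 32.0
```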
Practical Use in Data Analysis
In statistical software, the total sum of squares (TSS) is routinely computed as part of ANOVA and regression outputs to quantify overall variability in datasets. In R, the anova() function applied to a linear model object generates an ANOVA table with a sum of squares for each model term and for the residuals; the total sum of squares, representing the sum of squared deviations from the grand mean across all observations, is obtained by adding these entries.[24] Similarly, in Python's statsmodels library, the anova_lm() function produces a table with sums of squares for model components and residuals, where the total sum of squares can be computed by summing these values or as the sum of the model's explained sum of squares (ess) and residual sum of squares (ssr) attributes for ordinary least squares fits, enabling direct assessment of data dispersion.[25] The scikit-learn library computes TSS implicitly in evaluating the coefficient of determination R^2, defined as 1 - \frac{\text{RSS}}{\text{TSS}}, where TSS is the sum of squared deviations of the response variable from its mean, often calculated as ((y_true - y_true.mean())**2).sum().[26] In SPSS, the ONEWAY command outputs an ANOVA table displaying the total sum of squares in the "Total" row, alongside between- and within-group components, facilitating quick variability checks in applied analyses.[27]
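As a rough sketch of these workflows (assuming statsmodels and scikit-learn are installed; the data are made up), TSS can be recovered from a fitted ordinary least squares model as the sum of its ess and ssr attributes and compared with the direct computation used by r2_score:

```python
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import r2_score

# Made-up data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8, 12.1])

# statsmodels OLS fit: TSS = explained SS (ess) + residual SS (ssr)
X = sm.add_constant(x)
results = sm.OLS(y, X).fit()
tss_from_fit = results.ess + results.ssr

# Direct computation of TSS, mirroring the normalization used by r2_score
tss_direct = ((y - y.mean()) ** 2).sum()
r2 = r2_score(y, results.fittedvalues)

print(tss_from_fit, tss_direct, r2)
```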
In hypothesis testing, the total variability measured by TSS is associated with n-1 degrees of freedom, where n is the sample size, and these total degrees of freedom are partitioned among the components that enter the F-statistic in ANOVA and regression contexts. The total mean square (TSS divided by n-1) serves as a baseline variance estimate under the null hypothesis of no effects. For instance, in one-way ANOVA, the F-statistic is the ratio of the between-group mean square to the within-group mean square, whose degrees of freedom (k-1 and n-k, respectively) sum to the total n-1, providing the test of group mean equality.[28][29]
In extensions to generalized linear models (GLMs), the TSS concept is adapted through deviance measures: the null deviance, computed from an intercept-only model, plays the role of TSS in capturing the total variability to be explained, while the residual deviance is the analogue of the residual sum of squares used for model assessment. This allows analogous goodness-of-fit testing in non-normal response scenarios, such as logistic regression for binary outcomes, where twice the difference in log-likelihoods yields a deviance statistic that is approximately chi-squared distributed.[30]
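A brief sketch of this deviance analogue (assuming statsmodels; the binary data are simulated) fits a logistic regression and compares the null and residual deviances, which play the roles of TSS and the residual sum of squares, respectively:

```python
import numpy as np
import statsmodels.api as sm

# Simulated binary-response data for a logistic regression
rng = np.random.default_rng(1)
x = rng.normal(size=100)
prob = 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * x)))
y = rng.binomial(1, prob)

X = sm.add_constant(x)
fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()

# null_deviance is the TSS analogue; deviance is the RSS analogue
print(fit.null_deviance, fit.deviance)
# A deviance-based analogue of R^2:
print(1.0 - fit.deviance / fit.null_deviance)
```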
A practical case study in manufacturing quality control illustrates TSS application: in assessing variability of machined part dimensions across production shifts, NIST guidelines recommend computing TSS from repeated measurements to decompose total process variation into shift effects and error, enabling F-tests to identify significant sources of inconsistency and guide process adjustments for improved yield.[31]