
Total sum of squares

The total sum of squares (TSS), also denoted as the sum of squares total (SST), is a fundamental statistical measure that quantifies the overall variability in a dataset by summing the squared deviations of each observed value from the sample mean. For a set of n observations y_1, y_2, \dots, y_n with mean \bar{y}, it is formally defined by the formula \text{TSS} = \sum_{i=1}^n (y_i - \bar{y})^2, which captures the total dispersion around the central tendency without regard to direction, as squaring eliminates negative differences. This metric serves as the baseline for assessing variation in data analysis, providing a scalar value where larger TSS indicates greater spread and smaller values suggest data points cluster closely around the mean.

In regression analysis, TSS represents the total variation in the dependent variable that a model aims to explain, decomposing into the regression sum of squares (SSR), which measures variation accounted for by the predictors, and the error sum of squares (SSE), which captures unexplained residual variation, such that TSS = SSR + SSE. This decomposition enables calculation of the coefficient of determination R^2 = \frac{\text{SSR}}{\text{TSS}}, a key goodness-of-fit statistic ranging from 0 to 1 that indicates the proportion of total variation explained by the model. A high R^2 value derived from TSS signals strong explanatory power, while low values highlight model inadequacies.

Within the framework of analysis of variance (ANOVA), TSS quantifies the total variation across all observations in an experiment, partitioning it into the between-group sum of squares (SSB), reflecting differences attributable to experimental factors or treatments, and the within-group sum of squares (SSW), representing random error or variability within groups, with TSS = SSB + SSW. This breakdown is essential for hypothesis testing, as the F-statistic compares mean squares (each sum of squares divided by its degrees of freedom) to determine whether group differences are statistically significant beyond chance. ANOVA applications span experimental design across many fields, where TSS helps evaluate treatment effects reliably.

Beyond regression and ANOVA, TSS underpins broader variance-based methods, such as the computation of the sample variance s^2 = \frac{\text{TSS}}{n-1}, which adjusts for degrees of freedom to provide an unbiased estimator of the population variance. Its emphasis on squared deviations makes it sensitive to outliers, amplifying their influence on the total variability measure, a property that can aid in detecting influential data points or departures from normality. Overall, TSS remains a versatile tool in statistical modeling, facilitating precise quantification of data spread essential for inference and decision-making across disciplines.
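The outlier sensitivity noted above is easy to see numerically. The following minimal NumPy sketch uses made-up values (not from any source) to compute TSS for a small sample and for the same sample with one extreme observation; squaring the deviations makes the second TSS far larger.

```python
import numpy as np

def total_sum_of_squares(y):
    """TSS: sum of squared deviations from the sample mean."""
    y = np.asarray(y, dtype=float)
    return float(np.sum((y - y.mean()) ** 2))

# Illustrative (made-up) values: the same sample with and without one outlier.
y_clean = [4.8, 5.1, 5.0, 4.9, 5.2]
y_outlier = [4.8, 5.1, 5.0, 4.9, 15.2]   # one extreme observation

print(total_sum_of_squares(y_clean))     # about 0.10: points cluster near the mean
print(total_sum_of_squares(y_outlier))   # about 84.1: squaring amplifies the outlier
```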

Definition and Formulation

Mathematical Definition

The total sum of squares (TSS) is a fundamental measure in statistics that quantifies the total variability of a dataset around its sample mean. It is defined as the sum of the squared deviations of each observation from the overall mean, providing an algebraic representation of the dispersion in the data. Mathematically, for a sample of n observations y_1, y_2, \dots, y_n with sample mean \bar{y} = \frac{1}{n} \sum_{i=1}^n y_i, the total sum of squares is \text{TSS} = \sum_{i=1}^n (y_i - \bar{y})^2. This formula captures the total variation in the response variable without considering any explanatory factors. Common notations for this quantity include TSS, SST (sum of squares total), and \text{SS}_{\text{total}}. TSS is directly related to the sample variance s^2, which estimates the population variance, through the equation \text{TSS} = (n-1) s^2, where s^2 = \frac{1}{n-1} \sum_{i=1}^n (y_i - \bar{y})^2. This connection shows that the sample variance is simply the TSS rescaled by the degrees of freedom n-1.
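A short sketch of this relation in NumPy, using simulated values chosen only for illustration, is shown below; it verifies TSS = (n - 1) s^2 directly.

```python
import numpy as np

# A minimal check of TSS = (n - 1) * s^2 on simulated data (values are arbitrary).
rng = np.random.default_rng(0)
y = rng.normal(loc=10.0, scale=2.0, size=25)

tss = np.sum((y - y.mean()) ** 2)   # sum of squared deviations from the mean
s2 = y.var(ddof=1)                  # sample variance with the n - 1 denominator

print(np.isclose(tss, (y.size - 1) * s2))   # True, up to floating-point error
```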

Geometric Interpretation

The total sum of squares (TSS) geometrically represents the overall spread or dispersion of data points relative to their mean, visualized in a scatterplot as the sum of squared vertical distances from each point to the horizontal line at the sample mean. This measure captures the total variability inherent in the dataset, akin to the collective deviation of the points from their center. In a univariate context, these distances quantify how far the data scatter around the mean along a single axis, providing an intuitive sense of clustering or spread: a larger TSS indicates greater variability, with potential outliers pulling points farther from the center. In vector terms, the TSS can be interpreted as the squared length of the vector formed by the deviations from the mean, drawing an analogy to the Pythagorean theorem extended to higher dimensions. The full sample is viewed as a point in an n-dimensional space, and the TSS is the squared Euclidean norm of the vector connecting the mean vector (whose coordinates all equal \bar{y}) to the data vector, emphasizing the orthogonal structure of variability in a geometric framework. This perspective highlights how TSS quantifies the total "energy" or magnitude of deviations, much like the squared hypotenuse in a right triangle equals the sum of the squared legs. This geometric view extends naturally to multivariate settings, where data points are vectors in a p-dimensional space and the TSS is given by \text{TSS} = \sum_{i=1}^n \| \mathbf{y}_i - \bar{\mathbf{y}} \|^2, with \mathbf{y}_i denoting the i-th observation vector and \bar{\mathbf{y}} the mean vector; here \| \cdot \| is the Euclidean norm, so the TSS is the total squared distance across all dimensions. In multiple dimensions, a larger TSS similarly signals higher multivariate variability, such as elongated clusters or dispersed point clouds in the feature space.
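The norm-based view translates directly into code. The sketch below (with arbitrary illustrative values) computes the univariate TSS as a squared vector norm and the multivariate TSS as a sum of squared distances to the mean vector.

```python
import numpy as np

# Univariate: TSS is the squared Euclidean norm of the deviation vector.
y = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
deviations = y - y.mean()
tss_1d = np.linalg.norm(deviations) ** 2        # same as np.sum(deviations ** 2)

# Multivariate: sum of squared Euclidean distances to the mean vector
# (rows are observations, columns are variables; values are illustrative).
Y = np.array([[1.0, 2.0], [2.0, 0.0], [3.0, 4.0], [6.0, 2.0]])
tss_md = np.sum(np.linalg.norm(Y - Y.mean(axis=0), axis=1) ** 2)

print(tss_1d, tss_md)
```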

Role in Statistical Analysis

In Analysis of Variance

In analysis of variance (ANOVA), the total sum of squares (TSS) serves as a foundational measure for decomposing the overall variability in a dataset into components attributable to differences between group means and variability within groups, enabling tests of whether observed group differences are statistically significant. This decomposition, originally developed by Ronald A. Fisher in the context of experimental design for agricultural research, partitions the TSS as TSS = SSB + SSW, where SSB is the between-group sum of squares and SSW is the within-group sum of squares. The between-group sum of squares (SSB) quantifies the variation explained by the group means relative to the grand mean and is calculated as SSB = \sum_{j=1}^k n_j (\bar{y}_j - \bar{y})^2, where k is the number of groups, n_j is the sample size of the j-th group, \bar{y}_j is the mean of the j-th group, and \bar{y} is the overall grand mean. The within-group sum of squares (SSW), which captures the unexplained variation or residual error within each group, is computed as the sum of squared deviations from each group's mean: SSW = \sum_{j=1}^k \sum_{i=1}^{n_j} (y_{ij} - \bar{y}_j)^2, where y_{ij} is the i-th observation in the j-th group; SSW thus represents the pooled residual variation across all groups under the null hypothesis of equal means. This partitioning of the TSS underlies the F-test in one-way ANOVA, where the mean square between groups (MSB = SSB / (k-1)) is divided by the mean square within groups (MSW = SSW / (N-k), with N the total sample size) to form the F-statistic: F = \frac{MSB}{MSW}. A sufficiently large F-value, compared against the F-distribution with k-1 and N-k degrees of freedom, indicates that at least one group mean differs from the others, rejecting the null hypothesis of homogeneity.
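A minimal sketch of this partition, using made-up measurements for three hypothetical treatment groups, is given below; it checks TSS = SSB + SSW and forms the F-statistic as described (SciPy is used only for the F-distribution tail probability).

```python
import numpy as np
from scipy import stats  # used only for the F-distribution tail probability

# Hypothetical one-way layout with three treatment groups (made-up measurements).
groups = [np.array([23.0, 25.0, 21.0, 24.0]),
          np.array([28.0, 30.0, 27.0, 29.0]),
          np.array([22.0, 20.0, 24.0, 23.0])]

all_y = np.concatenate(groups)
grand_mean = all_y.mean()
N, k = all_y.size, len(groups)

tss = np.sum((all_y - grand_mean) ** 2)
ssb = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)  # between groups
ssw = sum(np.sum((g - g.mean()) ** 2) for g in groups)            # within groups
assert np.isclose(tss, ssb + ssw)                                 # TSS = SSB + SSW

f_stat = (ssb / (k - 1)) / (ssw / (N - k))   # MSB / MSW
p_value = stats.f.sf(f_stat, k - 1, N - k)   # upper-tail probability
print(f_stat, p_value)
```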

In Linear Regression

In linear regression, the total sum of squares (TSS) quantifies the total variation in the response variable around its mean, serving as a baseline measure of variability before accounting for the model's explanatory power. For simple linear regression with a single continuous predictor, TSS is defined as the sum of squared deviations of the observed responses from their mean, and it decomposes into the regression sum of squares (SSR) and the error sum of squares (SSE), such that TSS = SSR + SSE. This partitioning illustrates how much of the total variation is explained by the linear relationship versus what remains unexplained by the model. The SSR measures the variation in the response attributable to the fitted line and is calculated as \sum_{i=1}^n (\hat{y}_i - \bar{y})^2, where \hat{y}_i is the predicted value for the i-th observation and \bar{y} is the mean of the observed responses. This component captures the improvement in fit due to the predictor variable relative to a null model that simply predicts the mean. In contrast, the SSE represents the variation not explained by the model and is given by \sum_{i=1}^n (y_i - \hat{y}_i)^2, where y_i is the observed response; it reflects the discrepancies between actual and predicted values. The same decomposition extends naturally to multiple linear regression, where several continuous predictors are involved, with TSS still equaling SSR + SSE and SSR now accounting for the combined explanatory effects of all predictors on the response variation. The approach parallels the partitioning in analysis of variance but applies to continuous covariate structures in predictive modeling.
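As a brief sketch under these definitions, the snippet below fits a least-squares line to made-up (x, y) pairs with numpy.polyfit and confirms that SSR and SSE add up to TSS.

```python
import numpy as np

# Hypothetical simple linear regression fitted by ordinary least squares.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1])

slope, intercept = np.polyfit(x, y, deg=1)   # OLS line with an intercept
y_hat = intercept + slope * x

tss = np.sum((y - y.mean()) ** 2)       # total variation around the mean
ssr = np.sum((y_hat - y.mean()) ** 2)   # variation explained by the fitted line
sse = np.sum((y - y_hat) ** 2)          # residual, unexplained variation

print(np.isclose(tss, ssr + sse))       # True: the decomposition holds for OLS
```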

Relationships and Decompositions

Partition into Component Sums

In statistical modeling, the total sum of squares (TSS) quantifies the overall variation in the observed response variable relative to its mean and decomposes additively into the explained sum of squares (ESS), which measures the variation attributable to the fitted model, and the residual sum of squares (RSS), which captures the remaining unexplained variation. This fundamental identity, TSS = ESS + RSS, applies to linear models estimated via ordinary least squares (OLS), partitioning the total variability into components that facilitate variance analysis across diverse contexts such as regression and analysis of variance (ANOVA).

The proof of this decomposition relies on the geometric property of orthogonality inherent to OLS estimation. Consider a response vector \mathbf{y} with mean \bar{y}, fitted values \hat{\mathbf{y}}, and residuals \mathbf{e} = \mathbf{y} - \hat{\mathbf{y}}. Each deviation from the mean can be expressed as y_i - \bar{y} = (\hat{y}_i - \bar{y}) + e_i. Squaring and summing over all observations yields \sum_{i=1}^n (y_i - \bar{y})^2 = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^n e_i^2 + 2 \sum_{i=1}^n e_i (\hat{y}_i - \bar{y}), where the cross-product term equals zero because the residuals are orthogonal to the column space of the design matrix (including the intercept), implying \sum e_i = 0 and \sum e_i \hat{y}_i = 0. Thus, TSS = ESS + RSS. Equivalent decompositions, such as the between-group sum of squares (SSB) plus the within-group sum of squares (SSW) in ANOVA, follow the same orthogonality argument.

The identity holds under standard OLS assumptions, including correct model specification (the true relationship follows the specified functional form), independence of errors, homoscedasticity (constant error variance), and no perfect multicollinearity among predictors, with the design matrix incorporating an intercept so that the fitted values are centered around the mean. These conditions ensure the residuals are uncorrelated with the model predictions, preserving the additivity of the sums of squares. Violations, such as dependence or nonlinearity, may invalidate the decomposition.

In the multivariate setting, where the response comprises multiple dependent variables forming a matrix \mathbf{Y}, the total sum of squares generalizes to a sum-of-squares-and-cross-products (SSP) matrix, \mathbf{T} = \mathbf{Y}^\top (\mathbf{I} - \mathbf{1}\mathbf{1}^\top / n) \mathbf{Y}, whose diagonal entries are the TSS values of the individual responses, so its trace equals their sum. This matrix decomposes into an explained SSP matrix \mathbf{H} (the hypothesis, or regression, sum of squares) and a residual SSP matrix \mathbf{E}, such that \mathbf{T} = \mathbf{H} + \mathbf{E} and \operatorname{tr}(\mathbf{T}) = \operatorname{tr}(\mathbf{H}) + \operatorname{tr}(\mathbf{E}), mirroring the univariate case and enabling multivariate variance partitioning.
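The orthogonality step in the proof can be checked numerically. The sketch below, using simulated data and an arbitrary design with an intercept, verifies that the residuals sum to zero, are orthogonal to the fitted values, and therefore that TSS = ESS + RSS.

```python
import numpy as np

# Numerical check of the orthogonality argument for an OLS fit with an intercept
# (simulated data; the design matrix and coefficients are arbitrary).
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(30), rng.normal(size=30)])   # intercept + one predictor
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=30)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat
e = y - y_hat

# Residuals are orthogonal to the column space of X (which contains the intercept),
# so both quantities below vanish and the cross-product term in the proof is zero.
print(np.isclose(e.sum(), 0.0), np.isclose(e @ y_hat, 0.0))

tss = np.sum((y - y.mean()) ** 2)
ess = np.sum((y_hat - y.mean()) ** 2)
rss = np.sum(e ** 2)
print(np.isclose(tss, ess + rss))   # TSS = ESS + RSS
```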

Connection to Measures of Fit

The coefficient of determination, denoted R^2, quantifies the proportion of variance in the response variable explained by the model in linear regression and is computed as R^2 = 1 - \frac{\text{SSE}}{\text{TSS}} = \frac{\text{SSR}}{\text{TSS}}, where SSE is the residual sum of squares and SSR is the regression sum of squares. This measure, introduced by Sewall Wright in 1921 as a correlation index, provides a goodness-of-fit statistic between 0 and 1, with higher values indicating better explanatory power of the model relative to a null model that simply predicts the mean. The total sum of squares (TSS) serves as the denominator in R^2, normalizing the explained and unexplained variation against the overall variability in the data and rendering the metric unitless, which facilitates comparisons of model fit across datasets differing in scale or measurement units. This normalization emphasizes the relative improvement over mean-only prediction, highlighting how much the model's residuals shrink compared to using the sample mean alone. To mitigate overestimation of fit in models with multiple predictors, the adjusted coefficient of determination, \bar{R}^2, incorporates a penalty for model complexity: \bar{R}^2 = 1 - (1 - R^2) \frac{n-1}{n-p-1}, where n is the number of observations and p is the number of predictors. This formula adjusts R^2 downward as p increases relative to n, providing a more reliable assessment when comparing models with varying numbers of variables. A key limitation of R^2 is that it never decreases when predictors are added to the model, even if those predictors are irrelevant, because any additional variable can capture minor random variation and inflate the explained proportion. The TSS, while essential for this normalization, remains fixed for a given dataset and does not inherently penalize such overfitting, necessitating the use of adjusted metrics or other diagnostics for robust model evaluation.
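A small helper function illustrating both formulas is sketched below; the function name and the convention that p excludes the intercept are assumptions of this sketch, not taken from the text.

```python
import numpy as np

def r2_and_adjusted(y, y_hat, p):
    """Compute R^2 = 1 - SSE/TSS and the adjusted R^2.

    p is the number of predictors, excluding the intercept (an assumption of
    this sketch; conventions differ across texts)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    n = y.size
    tss = np.sum((y - y.mean()) ** 2)
    sse = np.sum((y - y_hat) ** 2)
    r2 = 1.0 - sse / tss
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, adj_r2
```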

Applications and Examples

Numerical Illustration

To illustrate the computation of the total sum of squares (TSS), consider a simple univariate dataset consisting of the values {2, 4, 4, 4, 5, 5, 7, 9}. The TSS measures the total variability in this data around its mean, calculated as the sum of the squared deviations from the mean: \text{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2, where y_i are the data points, \bar{y} is the sample mean, and n=8 is the number of observations. First, compute the sample mean \bar{y}: the sum of the values is 40, so \bar{y} = 40 / 8 = 5. Next, calculate the deviations from the mean for each value:
  • For 2: 2 - 5 = -3
  • For each 4: 4 - 5 = -1 (occurs three times)
  • For each 5: 5 - 5 = 0 (occurs two times)
  • For 7: 7 - 5 = 2
  • For 9: 9 - 5 = 4
Square these deviations:
  • (-3)^2 = 9
  • Three instances of (-1)^2 = 1, totaling 3 \times 1 = 3
  • Two instances of 0^2 = 0, totaling 0
  • 2^2 = 4
  • 4^2 = 16
Sum the squared deviations to obtain the TSS: 9 + 3 + 0 + 4 + 16 = 32. This TSS value relates directly to the sample variance s^2, which is an unbiased estimator of the population variance and is given by s^2 = \text{TSS} / (n-1). For this dataset, s^2 = 32 / 7 \approx 4.57.
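The same arithmetic can be reproduced in a few lines of NumPy, which serves as a quick check of the hand calculation above.

```python
import numpy as np

y = np.array([2, 4, 4, 4, 5, 5, 7, 9], dtype=float)

mean = y.mean()                    # 5.0
tss = np.sum((y - mean) ** 2)      # 32.0
s2 = tss / (y.size - 1)            # 32 / 7, approximately 4.57

print(mean, tss, s2)
```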

Practical Use in Data Analysis

In statistical software, the total sum of squares (TSS) is routinely computed as part of ANOVA and regression outputs to quantify overall variability in datasets. In R, the anova() function applied to a linear model object generates an ANOVA table whose sums of squares for the model terms and residuals add up to the total sum of squares, the sum of squared deviations from the grand mean across all observations. Similarly, in Python's statsmodels library, the anova_lm() function produces a table with sums of squares for model components and residuals, where the total sum of squares can be obtained by summing these values or, for ordinary least squares fits, as the sum of the model's explained sum of squares (ess) and residual sum of squares (ssr) attributes, enabling direct assessment of data dispersion. The scikit-learn library computes TSS implicitly when evaluating the coefficient of determination R^2, defined as 1 - \frac{\text{RSS}}{\text{TSS}}, where TSS is the sum of squared deviations of the response variable from its mean, calculated as ((y_true - y_true.mean()) ** 2).sum(). In SPSS, the ONEWAY command outputs an ANOVA table displaying the total sum of squares in the "Total" row, alongside between- and within-group components, facilitating quick variability checks in applied analyses.

The TSS plays a key role in hypothesis testing by determining the degrees of freedom for the total variability, which is n-1 where n is the sample size, directly influencing the F-statistic in ANOVA and regression contexts. These degrees of freedom ensure the F-test compares mean squares appropriately, with the total mean square (TSS divided by n-1) serving as a baseline variance estimate under the null hypothesis of no effects. For instance, in one-way ANOVA, the F-statistic is the ratio of the between-group mean square to the within-group mean square, each formed by dividing its sum of squares by the corresponding degrees of freedom, and is used to test group mean equality.

In extensions to generalized linear models (GLMs), the TSS concept is adapted through deviance measures: the null deviance replaces TSS to capture unexplained variability under an intercept-only model, and the residual deviance plays the role of the residual sum of squares for model assessment. This allows analogous goodness-of-fit testing in non-normal response scenarios, such as logistic regression for binary outcomes, where twice the log-likelihood difference yields the deviance statistic, distributed approximately as chi-squared.

A practical example in manufacturing illustrates TSS in application: when assessing variability of machined part dimensions across production shifts, NIST guidelines recommend computing TSS from repeated measurements to decompose total process variation into shift effects and error, enabling F-tests to identify significant sources of inconsistency and guide process adjustments for improved quality.
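As a sketch of the statsmodels workflow described above, the snippet below fits an OLS model to simulated data and reads the sums of squares off the result object; the attribute names ess, ssr, centered_tss, and rsquared are as they appear in statsmodels' OLS results, and the data are made up for illustration.

```python
import numpy as np
import statsmodels.api as sm

# Simulated data; an OLS fit in statsmodels exposes the sums of squares directly.
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=40)
y = 3.0 + 1.5 * x + rng.normal(scale=2.0, size=40)

res = sm.OLS(y, sm.add_constant(x)).fit()   # add_constant includes the intercept

tss = res.centered_tss                         # total sum of squares about the mean
print(np.isclose(tss, res.ess + res.ssr))      # explained + residual = total
print(np.isclose(res.rsquared, 1 - res.ssr / tss))
```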
