Studentized residual
In statistics, particularly within linear regression analysis, a studentized residual is a standardized measure of the difference between an observed value and its predicted value from the fitted model, obtained by dividing the ordinary residual by an estimate of its standard deviation.[1] This standardization accounts for the variability in residual sizes due to leverage and model fit, making studentized residuals useful for detecting outliers and assessing model assumptions.[2] There are two primary variants: internally studentized residuals, which use the mean squared error (MSE) from the full dataset to estimate the standard deviation, and externally studentized residuals (also called deleted studentized residuals), which exclude the observation itself when computing the MSE for a more precise variance estimate, especially when identifying potential outliers.[1][2] The formula for an internally studentized residual for the ith observation is given by
t_i = \frac{e_i}{\sqrt{\hat{\sigma}^2 (1 - h_{ii})}}
where e_i is the ordinary residual, \hat{\sigma}^2 is the MSE from the full model, and h_{ii} is the leverage of the ith observation.[2] For externally studentized residuals, the denominator incorporates an adjusted MSE excluding the ith observation, denoted s_{(-i)}^2, yielding
t_i^* = \frac{e_i}{\sqrt{s_{(-i)}^2 (1 - h_{ii})}}
which follows a t-distribution with n - p - 1 degrees of freedom under the null hypothesis of no outliers, where n is the sample size and p is the number of parameters.[1] These residuals are particularly valuable in regression diagnostics because values exceeding approximately 3 in absolute magnitude (for large samples) often indicate outliers that may unduly influence the model, prompting further investigation or data adjustment.[2][1]
Fundamentals of Residuals
Ordinary Residuals
In the linear regression model \mathbf{y} = X\beta + \epsilon, where \mathbf{y} is the vector of observed responses, X is the design matrix, \beta is the parameter vector, and \epsilon represents the error terms, the ordinary residual for the i-th observation is defined as the difference between the observed response y_i and the fitted value \hat{y}_i, expressed as e_i = y_i - \hat{y}_i.[3] These residuals quantify the discrepancy between the model's predictions and the actual data points, serving as estimates of the random error component in the model.[4] Ordinary residuals are essential for evaluating the adequacy of the linear regression model, particularly in verifying underlying assumptions such as homoscedasticity, which requires the residuals to exhibit constant variance across all levels of the independent variables, and independence, which assumes no systematic correlations among the residuals or between residuals and predictors.[5][6] Plots of residuals against fitted values or predictors can reveal violations of these assumptions, such as patterns indicating non-linearity or heteroscedasticity, thereby guiding model refinements.[7]

To facilitate comparison and initial outlier detection, standardized residuals provide a simple scaling of the ordinary residuals by dividing them by the estimated residual standard deviation s, yielding r_i = e_i / s, where s is typically the square root of the mean squared error from the regression.[8] This standardization places the residuals on a common scale, approximating a standard normal distribution under ideal conditions.[9] The concept of residuals originated in the early development of regression analysis during the late 19th and early 20th centuries, with Karl Pearson playing a pivotal role in formalizing their use within the mathematical framework of least squares estimation.[10]
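As a minimal sketch of these quantities (a small synthetic dataset and the statsmodels library are assumed here for illustration, neither of which comes from the cited sources), the following Python snippet computes the ordinary residuals e_i and the crude standardization r_i = e_i / s described above:

```python
import numpy as np
import statsmodels.api as sm

# Synthetic data (assumed for illustration only)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 30)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=x.size)

# Fit y = b0 + b1*x by ordinary least squares
X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

e = fit.resid               # ordinary residuals e_i = y_i - yhat_i
s = np.sqrt(fit.mse_resid)  # residual standard deviation, sqrt(RSS / (n - p))
r = e / s                   # crudely standardized residuals e_i / s

print(r[:5])
```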
Variance Estimation in Regression
In regression analysis, the variance of the residuals is estimated using the ordinary residuals obtained from the fitted model. The residual sum of squares (RSS) measures the total variation unexplained by the model and is calculated as the sum of the squared ordinary residuals: \text{RSS} = \sum_{i=1}^n e_i^2, where e_i = y_i - \hat{y}_i for each observation i.[11] An unbiased estimator of the error variance \sigma^2, denoted s^2, is then given by s^2 = \frac{\text{RSS}}{n - p}, where n is the sample size and p is the number of parameters in the model (including the intercept).[11] This estimation relies on key assumptions about the error terms in the linear regression model: the errors are normally distributed with mean zero, they are independent across observations, and they exhibit constant variance (homoscedasticity).[12] These assumptions ensure that the residuals behave like independent draws from a normal distribution with variance \sigma^2, allowing s^2 to provide a reliable estimate for subsequent statistical inference.[12]

The denominator n - p represents the degrees of freedom for the error, which accounts for the loss of information due to estimating p parameters from the data. Without this adjustment, dividing RSS by n would yield a biased underestimate of \sigma^2, as the expected value of RSS is (n - p)\sigma^2 under the model assumptions; thus, s^2 has expectation \sigma^2 and corrects for this bias.[11] For instance, in simple linear regression (where p = 2), consider a dataset with n = 12 observations of high school GPAs (x_i) and corresponding college entrance test scores (y_i). After fitting the model \hat{y}_i = b_0 + b_1 x_i to obtain predicted values, the residuals are computed as e_i = y_i - \hat{y}_i, RSS is the sum of e_i^2 over all i, and s^2 = \text{RSS} / (12 - 2) = \text{RSS} / 10. In this example, the resulting s^2 \approx 0.8678, providing the scale for residual variability.[13]
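The following Python sketch mirrors this calculation with made-up GPA-style numbers (the data are assumed for illustration, so the resulting s^2 will not match the cited value of approximately 0.8678):

```python
import numpy as np
import statsmodels.api as sm

# Synthetic stand-in for the GPA/test-score example (values assumed, not the cited data)
gpa   = np.array([2.1, 2.5, 2.8, 3.0, 3.1, 3.2, 3.4, 3.5, 3.6, 3.7, 3.9, 4.0])
score = np.array([1.8, 2.2, 3.0, 2.7, 3.2, 3.3, 3.1, 3.6, 3.8, 3.6, 3.9, 4.1])

X = sm.add_constant(gpa)
fit = sm.OLS(score, X).fit()

e = fit.resid
rss = np.sum(e**2)             # residual sum of squares
n, p = len(score), X.shape[1]  # p = 2 (intercept and slope)
s2 = rss / (n - p)             # unbiased estimate of sigma^2

print(s2, fit.mse_resid)       # the two agree
```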
Definition and Types
Internal Studentized Residual
The internal studentized residual for the i-th observation in a linear regression model is defined as t_i = \frac{e_i}{s \sqrt{1 - h_{ii}}}, where e_i = y_i - \hat{y}_i is the ordinary residual, s = \sqrt{\frac{\sum_{j=1}^n e_j^2}{n - p}} is the residual standard error estimated using all n observations with p parameters in the model, and h_{ii} is the i-th diagonal element of the leverage matrix (hat matrix) H = X(X^T X)^{-1} X^T.[14][15] This form of studentization is termed "internal" because it relies on the variance estimate s^2 derived from the complete dataset, incorporating the i-th observation itself in the estimation process.[2] Unlike approaches that exclude the observation for variance computation, internal studentization provides a standardized measure that reflects the residual's deviation relative to the model's overall fit, making it suitable for initial diagnostic screening.[15]

The primary purpose of the internal studentized residual is to adjust the ordinary residual for both the estimated error variance and the leverage h_{ii}, which quantifies the influence of the i-th observation's predictor values on its own fitted value \hat{y}_i. High-leverage points (h_{ii} close to 1) tend to pull the fitted line toward them, reducing the apparent size of e_i and potentially masking outliers; the denominator \sqrt{1 - h_{ii}} corrects for this effect, yielding a more equitable scale across observations.[14][15] Under the model assumptions, the internal studentized residuals are theoretically bounded by |t_i| \leq \sqrt{n - p}, which reduces to \sqrt{n - 2} in simple linear regression (with p = 2); in practice, observations with |t_i| > 2 (roughly two standard deviations) are commonly examined as potential outliers, while values exceeding 3 in absolute value strongly suggest discordance.[16][2]
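A minimal Python sketch of this definition, computing t_i directly from the formula and cross-checking it against statsmodels' diagnostics (the data are synthetic and assumed for illustration):

```python
import numpy as np
import statsmodels.api as sm

# Synthetic data (assumed for illustration)
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 25)
y = 1.0 + 0.8 * x + rng.normal(scale=1.5, size=x.size)

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

e = fit.resid
s = np.sqrt(fit.mse_resid)             # full-data residual standard error
H = X @ np.linalg.solve(X.T @ X, X.T)  # hat matrix X (X'X)^{-1} X'
h = np.diag(H)                         # leverages h_ii

t_internal = e / (s * np.sqrt(1 - h))  # t_i = e_i / (s * sqrt(1 - h_ii))

# Cross-check against the library's internal studentized residuals
print(np.allclose(t_internal, fit.get_influence().resid_studentized_internal))
```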
External Studentized Residual
The external studentized residual, denoted t_i^*, is defined as t_i^* = \frac{e_i}{s_{(-i)} \sqrt{1 - h_{ii}}}, where e_i is the ordinary residual for the i-th observation, s_{(-i)}^2 is the mean squared error from the regression model refitted excluding the i-th observation, and h_{ii} is the leverage of the i-th observation, given by the diagonal element of the hat matrix.[17] This formulation, introduced in seminal regression diagnostics work, standardizes the residual using a variance estimate independent of the potentially anomalous observation. Unlike the internal studentized residual, which incorporates all observations in the variance estimate, the external version excludes the i-th point to prevent bias in detecting outliers.[17] This exclusion makes the external studentized residual more sensitive to outliers, as the variance estimate s_{(-i)}^2 reflects the model's error structure without influence from the suspicious observation.[17]

The external studentized residual relates to the internal studentized residual t_i through the formula t_i^* = t_i \sqrt{\frac{n - p - 1}{n - p - t_i^2}}, where n is the number of observations and p is the number of parameters (including the intercept), enabling computation without repeatedly refitting the model.[17] This relationship derives from the difference in variance estimates between the full and deleted models.[1] Under the assumption of normally distributed errors and when the i-th observation is deleted from the model, the external studentized residuals follow a t-distribution with n - p - 1 degrees of freedom exactly.[1] This exact distributional property holds because the deleted model has n - 1 observations, yielding n - p - 1 degrees of freedom for the t-statistic.[17]
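The deletion relationship can be sanity-checked numerically; the sketch below (synthetic data with one planted anomaly, assumed for illustration) computes t_i^* both from the closed-form relation to t_i and by literally refitting the model without the i-th observation:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 20)
y = 3.0 - 0.4 * x + rng.normal(scale=1.0, size=x.size)
y[7] += 5.0                              # plant a suspicious point

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()
n, p = len(y), X.shape[1]

infl = fit.get_influence()
t_int = infl.resid_studentized_internal  # internal t_i
h = infl.hat_matrix_diag

# External residual via the closed-form relation (no refitting)
t_ext = t_int * np.sqrt((n - p - 1) / (n - p - t_int**2))

# Brute-force check for observation i = 7: refit without it
i = 7
keep = np.arange(n) != i
fit_i = sm.OLS(y[keep], X[keep]).fit()
s_minus_i = np.sqrt(fit_i.mse_resid)
t_ext_brute = fit.resid[i] / (s_minus_i * np.sqrt(1 - h[i]))

print(t_ext[i], t_ext_brute, infl.resid_studentized_external[i])  # all agree
```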
Computation
Formulas for Studentization
In linear regression, the ordinary residuals e_i = y_i - \hat{y}_i have heterogeneous variances given by \operatorname{Var}(e_i) = \sigma^2 (1 - h_{ii}), where h_{ii} is the i-th diagonal element of the hat matrix H = X (X^T X)^{-1} X^T.[18][19] The hat matrix projects the response vector onto the column space of the design matrix X, and its diagonal elements h_{ii} measure the leverage of the i-th observation, satisfying 0 \leq h_{ii} \leq 1 and \sum_{i=1}^n h_{ii} = p (where p is the number of parameters), so the average leverage is p/n.[19]

The internal studentized residual standardizes e_i using the global variance estimate s^2 = \operatorname{RSS}/(n - p), where \operatorname{RSS} = \sum e_i^2 is the residual sum of squares:
t_i = \frac{e_i}{s \sqrt{1 - h_{ii}}}
This follows directly from replacing \sigma^2 with s^2 in the variance formula, yielding a pivotal quantity approximately distributed as a t-random variable under model assumptions.[1] For the external studentized residual, the standardization excludes the i-th observation from the variance estimate to mitigate bias if the point is an outlier:
t_i^* = \frac{e_i}{s_{(-i)} \sqrt{1 - h_{ii}}}
where s_{(-i)}^2 = \operatorname{RSS}_{(-i)} / (n - p - 1) and \operatorname{RSS}_{(-i)} is the residual sum of squares from the model fit excluding the i-th observation.[17]

Computing \operatorname{RSS}_{(-i)} naively requires refitting the model n times, but an equivalent expression derives from the relation \operatorname{RSS}_{(-i)} = \operatorname{RSS} - e_i^2 / (1 - h_{ii}):
s_{(-i)}^2 = s^2 \frac{n - p - t_i^2}{n - p - 1}.
This adjustment leverages the internal residual t_i and avoids repeated least-squares fits, improving computational efficiency and numerical stability for large n.[1][17] Equivalently, the external residual relates to the internal via
t_i^* = t_i \sqrt{ \frac{n - p - 1}{n - p - t_i^2} },
which follows from substituting the expression for s_{(-i)}^2 and simplifying.[1]
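A short numerical check of the deletion identity \operatorname{RSS}_{(-i)} = \operatorname{RSS} - e_i^2 / (1 - h_{ii}), using synthetic data assumed for illustration:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=15)
y = 2.0 + 1.2 * x + rng.normal(size=x.size)

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()
n, p = len(y), X.shape[1]

e = fit.resid
h = fit.get_influence().hat_matrix_diag
rss = np.sum(e**2)

i = 4                                           # any observation index
rss_minus_i_shortcut = rss - e[i]**2 / (1 - h[i])

# Direct computation by refitting without observation i
keep = np.arange(n) != i
rss_minus_i_direct = np.sum(sm.OLS(y[keep], X[keep]).fit().resid**2)

print(np.isclose(rss_minus_i_shortcut, rss_minus_i_direct))  # True
```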
Step-by-Step Calculation Process
The computation of studentized residuals in linear regression follows a structured algorithm that leverages the fitted model and the hat matrix to avoid multiple refits for efficiency. First, fit the linear regression model to the full dataset using ordinary least squares to obtain the predicted values \hat{y}_i, the ordinary residuals e_i = y_i - \hat{y}_i for each observation i = 1, \dots, n, and the residual mean square s^2 = \frac{\sum_{i=1}^n e_i^2}{n - p}, where n is the sample size and p is the total number of parameters in the model (including the intercept).[1] Second, construct the hat matrix H = X(X^T X)^{-1} X^T, where X is the n \times p design matrix, and extract the leverage values h_{ii} from its diagonal; these h_{ii} measure the influence of each observation on its own fitted value. Third, compute the internal studentized residuals as t_i = \frac{e_i}{s \sqrt{1 - h_{ii}}}.[1] Finally, obtain the external studentized residuals t_i^* using the adjustment t_i^* = t_i \sqrt{\frac{n - p - 1}{n - p - t_i^2}}, which exactly computes the deleted residual divided by the deleted standard error without refitting the model n times. To illustrate, consider a simple linear regression example with n=5 observations and p=2 parameters (intercept and one predictor), using the dataset below; a worked computation for this dataset appears in the sketch following the table.

| x_i | y_i |
|---|---|
| 1 | 2 |
| 2 | 4 |
| 3 | 5 |
| 4 | 4 |
| 5 | 5 |
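A Python sketch that works this five-point example through the four steps above (the printed values are produced by the code itself rather than quoted from a source):

```python
import numpy as np
import statsmodels.api as sm

# The five-observation example dataset from the table above
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

X = sm.add_constant(x)                   # n = 5 observations, p = 2 parameters
fit = sm.OLS(y, X).fit()
n, p = len(y), X.shape[1]

e = fit.resid                            # step 1: ordinary residuals
s2 = np.sum(e**2) / (n - p)              # residual mean square
h = fit.get_influence().hat_matrix_diag  # step 2: leverages h_ii

t = e / np.sqrt(s2 * (1 - h))            # step 3: internal studentized residuals
t_star = t * np.sqrt((n - p - 1) / (n - p - t**2))  # step 4: external, no refitting

for i in range(n):
    print(f"x={x[i]:.0f} y={y[i]:.0f} e={e[i]:+.3f} h={h[i]:.3f} "
          f"t={t[i]:+.3f} t*={t_star[i]:+.3f}")
```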
Statistical Properties
Distribution Under Normality
Under the assumption of normally distributed errors in the linear regression model, the distributions of studentized residuals provide a standardized scale for assessing deviations from the fitted model. The internal studentized residual, defined using the mean squared error from the full dataset, follows approximately a Student's t-distribution; this approximation arises because the denominator depends on the data including the observation in question, introducing slight dependence that prevents an exact distribution.[20] In contrast, the external studentized residual (also known as the deleted studentized residual) is computed using the mean squared error from the model fitted without the i-th observation, yielding an exact Student's t-distribution with n - p - 1 degrees of freedom when the i-th error is normally distributed and independent of the others; this exact pivotal distribution makes external residuals particularly useful for formal inference.[17]

Critical values for studentized residuals are obtained from the t-distribution to identify potential outliers. For instance, if the absolute value of the external studentized residual |t_i^*| exceeds the t_{n-p-1, 0.975} quantile (corresponding to a two-sided 5% significance level), the observation is flagged as an outlier at that level, assuming the model is otherwise correct.[1] This threshold adjusts for the degrees of freedom, approaching 1.96 as n increases, reflecting the t-distribution's convergence to the standard normal.

Although marginally distributed as described, the studentized residuals are not independent across observations, as the residuals sum to zero and their variances are linked through the leverage values h_{ii} from the hat matrix; higher leverage for one observation reduces the potential residual magnitude for others, inducing correlations that must be considered in multivariate diagnostics.
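As an illustrative sketch of this flagging rule (synthetic data and an injected outlier are assumed; no multiple-testing correction is applied), the critical value can be taken from the t-distribution via scipy:

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(4)
x = np.linspace(0, 10, 40)
y = 1.5 + 0.6 * x + rng.normal(size=x.size)
y[10] += 6.0                           # an injected outlier

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()
n, p = len(y), X.shape[1]

t_star = fit.get_influence().resid_studentized_external

# Two-sided 5% critical value from t with n - p - 1 degrees of freedom
crit = stats.t.ppf(0.975, df=n - p - 1)
flagged = np.where(np.abs(t_star) > crit)[0]

# The injected point is expected to appear; other indices may exceed the
# unadjusted threshold by chance.
print(crit, flagged)
```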
Asymptotic Behavior
As the sample size n approaches infinity with a fixed number of parameters p, both the internal studentized residual t_i and the external studentized residual t_i^* converge in distribution to a standard normal distribution N(0,1), provided the error variance estimator \hat{\sigma}^2 is consistent and standard least squares assumptions hold, such as finite second moments for the errors and regressors.[21] This asymptotic normality follows from the central limit theorem applied to the least squares estimator \hat{\beta}, combined with Slutsky's theorem for the studentization, ensuring that the scaled residuals behave like independent standard normals under the null model.[22]

Under non-normal errors with finite variance, the studentized residuals retain their asymptotic normality and exhibit consistency for outlier detection, meaning that if an observation corresponds to a true outlier (e.g., a gross error in the response), the corresponding t_i or t_i^* diverges to infinity in probability, while non-outlying residuals remain bounded.[23] This robustness arises because the least squares estimator remains consistent and asymptotically normal via the central limit theorem without requiring normality of the errors, as long as the outliers do not contaminate the bulk of the data.[23]

In high-dimensional settings where the dimension p grows with n such that p/n \to c \in (0,1), the leverage values h_{ii} concentrate around c for typical observations under random designs, leading to increased variability in the residuals that the studentization process adjusts for by incorporating the leverage-dependent standard error.[21] The asymptotic normality of the studentized residuals persists under these conditions, though the effective degrees of freedom decrease, amplifying the need for careful variance estimation to maintain the N(0,1) limit.[21]
Applications in Diagnostics
Outlier and Influence Detection
Studentized residuals play a crucial role in outlier detection within linear regression models by standardizing residuals to account for varying variances, allowing practitioners to identify observations that deviate substantially from the expected behavior under the assumed model. Specifically, an observation is flagged as a potential outlier if the absolute value of its externally studentized residual, |t_i^*|, exceeds 3 or surpasses the 97.5th percentile of the t-distribution with n - p - 1 degrees of freedom, where n is the sample size and p is the number of parameters; such thresholds indicate deviations larger than what would be anticipated under the null hypothesis of no outliers. This approach enhances sensitivity compared to raw residuals, as it adjusts for the leverage and model fit specific to each point.

To assess influence, studentized residuals are combined with leverage measures, such as the diagonal elements of the hat matrix h_{ii}, to detect points that disproportionately affect the regression estimates. High leverage is typically indicated by h_{ii} > 2p/n, and when paired with a large |t_i^*|, it suggests an influential outlier; this combination can be quantified using Cook's distance, defined in terms of the internal studentized residual t_i as
D_i = \frac{t_i^{2} h_{ii}}{p (1 - h_{ii})}
which measures the change in fitted values across all observations if the ith point is removed, with values exceeding 4/n (or, more conservatively, the median of the F_{p, n-p} distribution) commonly taken to signal high influence.

A representative case study involves Anscombe's quartet, particularly the third dataset, which features a single outlier against an otherwise nearly perfect simple linear relationship. In this example, fitting y on x yields studentized residuals where the anomalous point exhibits a large |t_i^*|, exceeding the threshold, thereby pinpointing the outlier and prompting its investigation or removal to restore model reliability; without such detection, the outlier can unduly influence the fitted model.

Despite their utility, studentized residuals have limitations in outlier detection, particularly when outliers occur in clusters, as mutual proximity can mask individual deviations by pulling the fitted model toward the group and reducing apparent residual sizes. Additionally, these residuals are tailored for linear models and require adaptations, such as generalized versions, for non-linear or generalized linear models to maintain valid detection properties.
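A minimal sketch combining these thresholds, with a synthetic dataset and one planted high-leverage, poorly fitting point (both assumed for illustration); the Cook's distance formula above is cross-checked against statsmodels:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = np.linspace(0, 10, 30)
y = 2.0 + 0.7 * x + rng.normal(size=x.size)
x[29], y[29] = 20.0, 5.0                # a high-leverage, poorly fitting point

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()
n, p = len(y), X.shape[1]

infl = fit.get_influence()
h = infl.hat_matrix_diag
t_int = infl.resid_studentized_internal
t_ext = infl.resid_studentized_external

cooks_d = t_int**2 * h / (p * (1 - h))  # Cook's distance from internal residuals
print(np.allclose(cooks_d, infl.cooks_distance[0]))

suspect = (np.abs(t_ext) > 3) | ((h > 2 * p / n) & (cooks_d > 4 / n))
print(np.where(suspect)[0])             # the planted point (index 29) is expected here
```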
Model Validation Techniques
Studentized residuals play a central role in validating linear regression models by facilitating checks on key assumptions such as homoscedasticity, independence, and normality. For detecting heteroscedasticity, a scatterplot of internal studentized residuals t_i versus fitted values \hat{y}_i is constructed; a random scatter around zero with constant spread supports the assumption, whereas a funnel-shaped pattern or increasing variance indicates non-constant residual variance.[24] To assess autocorrelation, especially in models with ordered or time-dependent observations, the Durbin-Watson test is applied to the ordered residuals, yielding a statistic between 0 and 4; values near 2 suggest no first-order serial correlation, while deviations signal potential issues with the independence assumption.[25] Normality of residuals is evaluated using a quantile-quantile (Q-Q) plot of external studentized residuals t_i^* against theoretical quantiles from a t-distribution with n - p - 1 degrees of freedom, where points aligning closely with the reference line affirm approximate normality; alternatively, the Shapiro-Wilk test applied to these residuals provides a formal p-value for normality assessment, with low values rejecting the null hypothesis.[26]

In model selection, observations exhibiting large absolute external studentized residuals |t_i^*| (e.g., exceeding 3) are iteratively removed as suspected outliers, followed by refitting the model and comparing metrics like the coefficient of determination R^2 or Akaike information criterion (AIC); improvements in R^2 or reductions in AIC indicate enhanced model fit and parsimony after exclusion.[1] These techniques extend to generalized linear models, where deviance residuals are studentized analogously by scaling with the estimated dispersion and leverage to approximate a t-distribution, enabling similar diagnostic plots and tests for assumption validation in non-Gaussian settings.[27] The adoption of studentized residuals for such comprehensive diagnostics became prominent in the late 1970s and early 1980s through systematic regression diagnostics frameworks such as that of Belsley, Kuh, and Welsch.
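A brief sketch of these checks on studentized residuals, using statsmodels and scipy on synthetic data (assumed for illustration; the Q-Q plot itself is omitted):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from scipy import stats

rng = np.random.default_rng(6)
x = np.linspace(0, 10, 50)
y = 4.0 + 0.3 * x + rng.normal(size=x.size)

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()
infl = fit.get_influence()

t_ext = infl.resid_studentized_external

dw = durbin_watson(fit.resid)           # near 2: little first-order autocorrelation
sw_stat, sw_p = stats.shapiro(t_ext)    # formal normality check on studentized residuals

print(f"Durbin-Watson: {dw:.2f}")
print(f"Shapiro-Wilk p-value: {sw_p:.3f}")
print("Max |t*|:", np.max(np.abs(t_ext)))  # candidates for removal if > ~3
```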
Implementation
Numerical Considerations
In small samples where the number of observations n is close to the number of parameters p, the estimate of the residual variance s^2 introduces bias into the computation of studentized residuals, compromising their reliability for outlier detection and model diagnostics. This bias arises because the degrees of freedom n - p - 1 become critically low, leading to unstable variance estimates that distort the t-distribution approximation underlying the residuals. To address this, bootstrap methods for variance estimation are recommended, as they provide more robust approximations of the sampling distribution without relying on parametric assumptions.[28]

High leverage points, where the diagonal element of the hat matrix h_{ii} approaches 1, pose significant challenges in computing studentized residuals, as the denominator s_{(-i)} \sqrt{1 - h_{ii}} nears zero, rendering t_i^* undefined or excessively large even for moderate residuals.[29] Such points dominate the fit, masking potential outliers and amplifying sensitivity to minor perturbations.[30] In these scenarios, generalized studentized residuals, which incorporate group deletion to account for multiple high-leverage observations, offer a more stable alternative for influence assessment.[30]

Computational precision in deriving studentized residuals is vulnerable to floating-point errors, particularly when inverting X^T X to obtain the hat matrix H, as multicollinearity can inflate the condition number and propagate rounding inaccuracies throughout the residuals.[29] These errors are exacerbated in ill-conditioned designs, where small changes in input data yield disproportionately large variations in h_{ii} and thus in the studentized values. Using QR decomposition to compute H mitigates this by avoiding explicit inversion, ensuring backward stability and reducing error bounds in finite-precision arithmetic.

Handling missing data requires preprocessing before studentization, as incomplete observations disrupt the least-squares fit and residual calculations. Common approaches include casewise deletion, which removes rows with any missing values to maintain a complete dataset for regression, or imputation techniques like multiple imputation to fill gaps while preserving uncertainty. Imputation is preferable when missingness is not completely at random, as deletion can introduce bias in the variance estimate s^2 and subsequent studentized residuals.[31]
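To illustrate the QR-based route to the leverages, the following sketch (with an assumed, nearly collinear design) compares leverages computed via explicit inversion of X^T X with those obtained from the thin QR factorization, for which H = QQ^T and hence h_{ii} = \sum_j Q_{ij}^2:

```python
import numpy as np

# A mildly ill-conditioned design: two nearly collinear predictors (assumed for illustration)
rng = np.random.default_rng(7)
x1 = rng.uniform(0, 1, 100)
x2 = x1 + 1e-6 * rng.normal(size=100)
X = np.column_stack([np.ones(100), x1, x2])

# Leverages via explicit inversion of X'X (accuracy degrades as conditioning worsens)
h_inv = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

# Leverages via the thin QR decomposition: H = Q Q', so h_ii = sum_j Q_ij^2
Q, _ = np.linalg.qr(X)
h_qr = np.sum(Q**2, axis=1)

# Difference between the two routes (grows as the design becomes more ill-conditioned)
print(np.max(np.abs(h_inv - h_qr)))
```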
Software Examples
In R, studentized residuals are computed using functions from the base stats package for linear models fitted with lm(). The rstudent() function returns externally studentized residuals (denoted t_i^*), while rstandard() provides internally studentized residuals (denoted t_i). These can be applied to the built-in cars dataset, which records speed and stopping distance for 50 cars.
```r
# Load data and fit model
data(cars)
model <- lm(dist ~ speed, data = cars)

# Compute externally studentized residuals (default in rstudent)
t_star <- rstudent(model)

# Compute internally studentized residuals
t <- rstandard(model)

# Residuals e_i and leverage h_ii
e <- residuals(model)
h <- hatvalues(model)

# View first few
data.frame(speed = cars$speed[1:5], dist = cars$dist[1:5],
           e = e[1:5], h = h[1:5], t = t[1:5], t_star = t_star[1:5])
```

The output combines these diagnostics into a data frame for inspection, aiding outlier detection where |t_i^*| > 2 or |t_i| > 2 flags potential issues.

In Python, the statsmodels library supports computation of both internal and external studentized residuals via the OLSInfluence class from an ordinary least squares fit. The resid_studentized_internal attribute gives internally studentized residuals, and resid_studentized_external provides externally studentized ones, both accessible after calling get_influence(). Using the cars dataset (loadable via statsmodels or pandas), the process integrates seamlessly with NumPy arrays.[32]

```python
import statsmodels.api as sm
import pandas as pd

# Load cars data (example using pandas for simplicity; adjust path as needed)
cars = pd.read_csv('cars.csv')  # Columns: speed, dist

X = sm.add_constant(cars['speed'])
y = cars['dist']
model = sm.OLS(y, X).fit()

# Get influence measures
influence = model.get_influence()

# Internally studentized residuals (t_i)
t = influence.resid_studentized_internal

# Externally studentized residuals (t_i^*)
t_star = influence.resid_studentized_external

e = model.resid
h = influence.hat_matrix_diag

df = pd.DataFrame({
    'speed': cars['speed'],
    'dist': cars['dist'],
    'e': e,
    'h': h,
    't': t,
    't_star': t_star
})
print(df.head())
```

This yields a DataFrame with the diagnostics, where external residuals adjust for leave-one-out variance estimates, enhancing robustness for model validation.[32]

In SAS, PROC REG computes studentized residuals using the OUTPUT statement after specifying the model. The RSTUDENT keyword outputs externally studentized residuals (t_i^*), while STUDENT provides internally studentized residuals (t_i). Leverage (h_{ii}) is obtained via H=, and raw residuals via R=. Applied to the cars dataset, this generates an output dataset for further analysis.[33]

```sas
PROC REG DATA=cars;
    MODEL dist = speed;
    OUTPUT OUT=stats R=e H=h STUDENT=t RSTUDENT=t_star;
RUN;

PROC PRINT DATA=stats(OBS=5);
RUN;
```

The resulting stats dataset includes the first five observations' diagnostics, printed via PROC PRINT, facilitating graphical or tabular review in the SAS output viewer.[33]
Across these packages, outputs align on key diagnostics but differ in defaults and accessibility. R's rstudent() defaults to external studentized residuals, matching SAS's RSTUDENT, while Python's statsmodels requires explicit selection for internal versus external. For the cars dataset, combining residuals (e_i), leverage (h_{ii}), internally studentized (t_i), and externally studentized (t_i^*) via the respective functions produces comparable tables, though numerical precision may vary slightly across implementations.[32][33]
| Software | Residuals (e_i) | Leverage (h_{ii}) | Internal (t_i) | External (t_i^*) | Default Type |
|---|---|---|---|---|---|
| R | residuals(model) | hatvalues(model) | rstandard(model) | rstudent(model) | External |
| Python (statsmodels) | model.resid | influence.hat_matrix_diag | influence.resid_studentized_internal | influence.resid_studentized_external | Neither (explicit) |
| SAS | OUTPUT R= | OUTPUT H= | OUTPUT STUDENT= | OUTPUT RSTUDENT= | External (RSTUDENT) |
In scikit-learn, which does not expose studentized residuals directly, they can be computed manually from LinearRegression fits, often combined with NumPy for leverage and variance adjustments.[34]
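A minimal sketch of such a manual computation from a scikit-learn fit (the data are synthetic and assumed for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumed for illustration)
rng = np.random.default_rng(8)
x = rng.uniform(0, 25, 50)
y = -17.6 + 3.9 * x + rng.normal(scale=15.0, size=x.size)

reg = LinearRegression().fit(x.reshape(-1, 1), y)
e = y - reg.predict(x.reshape(-1, 1))           # ordinary residuals

# Design matrix with intercept; leverages via thin QR (h_ii = sum_j Q_ij^2)
X = np.column_stack([np.ones_like(x), x])
Q, _ = np.linalg.qr(X)
h = np.sum(Q**2, axis=1)

n, p = X.shape
s2 = np.sum(e**2) / (n - p)
t_int = e / np.sqrt(s2 * (1 - h))                          # internal studentized residuals
t_ext = t_int * np.sqrt((n - p - 1) / (n - p - t_int**2))  # external, without refitting

print(t_ext[:5])
```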