Simple linear regression
Simple linear regression is a statistical method that models the relationship between two continuous variables: one independent variable (predictor) and one dependent variable (response), assuming a straight-line relationship between them.[1] The model is expressed as y = \beta_0 + \beta_1 x + \epsilon, where y is the dependent variable, x is the independent variable, \beta_0 is the y-intercept, \beta_1 is the slope, and \epsilon represents the random error term.[2] This technique enables prediction of the dependent variable from the independent variable and is foundational in fields such as economics, biology, and engineering for analyzing linear associations.[3]

The origins of simple linear regression trace back to the late 19th century, when Sir Francis Galton developed the concept while studying heredity and the phenomenon of regression toward the mean in biological traits, such as the heights of parents and children.[4] Building on earlier work in least squares estimation by Adrien-Marie Legendre in 1805 and Carl Friedrich Gauss, Galton introduced the term "regression" to describe the tendency of extreme values to move toward the average in subsequent generations.[5] Karl Pearson later formalized the mathematical framework in the early 20th century, extending it through correlation analysis, which solidified linear regression as a core tool in statistical inference.[6]

To ensure valid inferences, simple linear regression relies on several key assumptions: linearity (the true relationship is linear in parameters), independence of observations, homoscedasticity (constant variance of residuals across levels of the independent variable), and normality of the error terms.[7] Violations of these assumptions, such as nonlinearity or heteroscedasticity, can lead to biased estimates or invalid predictions, necessitating diagnostic checks like residual plots.[8]

Parameter estimates are typically obtained via ordinary least squares (OLS), which minimizes the sum of squared residuals between observed and predicted values, providing unbiased and efficient estimators under the model assumptions.[9] In practice, simple linear regression is widely applied for prediction, hypothesis testing on the slope (to assess the significance of the relationship), and understanding associative patterns in data, though it cannot establish causation without additional experimental design.[10] Extensions include multiple linear regression for more predictors and robust methods for handling assumption violations, but the simple form remains essential for introductory statistical modeling due to its interpretability and computational simplicity.

Model and Assumptions
Definition and Model Equation
Simple linear regression is a fundamental statistical technique used to model and analyze the linear relationship between a single predictor variable, denoted as X, and a response variable, denoted as Y. It posits that the expected value of the response variable can be expressed as a straight-line function of the predictor, enabling predictions and inferences about how changes in X affect Y. This method is widely applied in fields such as economics, biology, and engineering to quantify associations in bivariate data.[1]

The population model for simple linear regression is given by the equation

Y_i = \beta_0 + \beta_1 X_i + \epsilon_i, \quad i = 1, 2, \dots, n,

where Y_i is the i-th observation of the response variable, X_i is the corresponding predictor value, \beta_0 represents the y-intercept (the expected value of Y when X = 0), \beta_1 denotes the slope (the expected change in Y for a one-unit increase in X), and \epsilon_i is the random error term for the i-th observation, assumed to be independent with mean zero and constant variance \sigma^2. The error terms \epsilon_i capture the unexplained variation in Y after accounting for the linear effect of X.[11][10]

In practice, with a sample of n data points drawn from the population, the parameters \beta_0 and \beta_1 are unknown and must be estimated, typically using Roman letters such as b_0 and b_1 to distinguish sample estimates from the true population values represented by Greek letters. This distinction underscores the inferential nature of regression analysis, where sample-based estimates inform broader population characteristics.[2]
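To make the notation concrete, the following sketch generates a sample from the population model Y_i = \beta_0 + \beta_1 X_i + \epsilon_i. The parameter values, sample size, and noise level are arbitrary choices for illustration, not values implied by the article.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Illustrative "true" population parameters (arbitrary assumptions).
beta0, beta1, sigma = 2.0, 0.5, 1.0
n = 50

x = rng.uniform(0, 10, size=n)      # predictor values X_i
eps = rng.normal(0, sigma, size=n)  # errors eps_i with mean 0 and variance sigma^2
y = beta0 + beta1 * x + eps         # responses Y_i = beta_0 + beta_1 * X_i + eps_i
```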
Key Assumptions
The simple linear regression model is built upon a set of classical assumptions that underpin the validity of parameter estimation and statistical inference. These assumptions ensure that the model's predictions align with the underlying data-generating process and that the ordinary least squares (OLS) estimators possess desirable properties, such as unbiasedness and minimum variance under the Gauss-Markov theorem.[12] While the core assumptions apply to both simple and multiple regression, in the simple case they simplify due to the presence of only one predictor variable.

Linearity: The primary assumption is that the conditional expected value of the response variable Y given the predictor X is a linear function of X, expressed as E(Y \mid X) = \beta_0 + \beta_1 X. This posits that the mean response changes linearly with the predictor, allowing the model to capture a straight-line relationship without curvature or higher-order terms.[7] Violation of linearity, such as when the true relationship is quadratic, can lead to biased estimates, though graphical diagnostics like scatterplots can help detect this.[13]

Independence: The errors \varepsilon_i across observations must be independent, meaning that the value of one error does not influence another. This assumption arises from the requirement that the data constitute a random sample, ensuring no serial correlation or dependence structure, such as in time-series data.[7] In the simple linear regression context, independence implies that observations are drawn without clustering or autocorrelation, which is crucial for the validity of standard errors.[14]

Homoscedasticity: The variance of the errors is constant across all levels of the predictor, so \text{Var}(\varepsilon_i) = \sigma^2 for all i, regardless of X. This equal spread of residuals around the regression line rules out heteroscedasticity, where the variance increases or decreases with X, which would make the usual standard error formulas unreliable.[7] The assumption is part of the Gauss-Markov conditions that make OLS the best linear unbiased estimator (BLUE).[14]

Normality: For exact finite-sample inference, such as t-tests and F-tests, the errors are assumed to be normally distributed, \varepsilon_i \sim N(0, \sigma^2). This Gaussian assumption facilitates the derivation of the sampling distribution of the OLS estimators.[7] However, it is not required for consistency or unbiasedness; in large samples, the central limit theorem ensures asymptotic normality of the estimators even under non-normal errors.

No perfect multicollinearity: In simple linear regression, this reduces to the requirement that the predictor X is not constant across all observations, ensuring enough variation in X to estimate \beta_1. Without this, the model parameters cannot be uniquely identified.[12]

Violations of these assumptions can compromise the model's reliability, but simple linear regression is robust in several ways, particularly with large sample sizes. For instance, breaches of homoscedasticity or normality often do not severely affect point estimates, though they may affect inference; asymptotic theory supports valid hypothesis tests as the number of observations grows.[15] Violations of linearity and independence, however, tend to have more pronounced effects, potentially requiring model respecification or alternative methods.[15]
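As a rough, informal illustration of such diagnostic checks (a sketch only, not a formal test), the helper below inspects the residuals from a fitted line: their mean should be near zero, and their spread should not change systematically with the predictor. The inputs x, y, and yhat are assumed to be NumPy arrays of observed predictors, responses, and fitted values.

```python
import numpy as np

def residual_diagnostics(x, y, yhat):
    """Crude checks of the error assumptions based on residuals e_i = y_i - yhat_i."""
    e = y - yhat
    # Zero-mean errors: the residuals should average out near zero.
    mean_resid = e.mean()
    # Homoscedasticity: compare residual spread in the lower and upper halves of x;
    # a ratio far from 1 suggests the variance changes with the predictor.
    order = np.argsort(x)
    lower = e[order[: len(e) // 2]]
    upper = e[order[len(e) // 2:]]
    spread_ratio = upper.std(ddof=1) / lower.std(ddof=1)
    return {"mean_residual": mean_resid, "spread_ratio": spread_ratio}
```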
Estimation Methods
Ordinary Least Squares
Ordinary least squares (OLS) is the primary method for estimating the parameters of the simple linear regression model, introduced by Adrien-Marie Legendre in 1805 as a technique for fitting lines to observational data by minimizing the sum of squared errors.[16] The core principle of OLS is to select the intercept b_0 and slope b_1 that minimize the sum of squared residuals (the error sum of squares, SSE), defined as \text{SSE} = \sum_{i=1}^n (Y_i - \hat{Y}_i)^2, where \hat{Y}_i = b_0 + b_1 X_i represents the predicted value for the i-th observation.[17] This minimization criterion measures errors vertically, from the observed response Y to the line, emphasizing the model's predictive accuracy for Y given X.[18]

Geometrically, the OLS regression line can be interpreted as the straight line passing through the centroid (\bar{X}, \bar{Y}) of the data cloud that minimizes the sum of the squared vertical distances from each data point to the line.[18] This property ensures the line balances the data around the mean, providing an intuitive visual representation of the best linear fit in the plane spanned by the observations.[17]

To derive the OLS estimates, the sum of squared residuals is treated as a function of b_0 and b_1, and its partial derivatives are set to zero, yielding a system of linear equations known as the normal equations:

\sum_{i=1}^n Y_i = n b_0 + b_1 \sum_{i=1}^n X_i,
\sum_{i=1}^n X_i Y_i = b_0 \sum_{i=1}^n X_i + b_1 \sum_{i=1}^n X_i^2.

These equations arise directly from the calculus-based optimization and form the foundation for solving the parameter estimates.[17][18]

Under the assumptions of linearity in parameters and strict exogeneity (E[\varepsilon \mid X] = 0), the OLS estimators are unbiased, meaning their expected values equal the true population parameters.[19] Furthermore, the Gauss-Markov theorem establishes that OLS produces the best linear unbiased estimators (BLUE), with the smallest variance among all linear unbiased estimators, provided the additional assumption of homoscedasticity holds.[20][21] OLS is also computationally straightforward, relying solely on sums and products of the data, which facilitates its implementation even with limited resources.[18]
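As a sketch of this derivation, the normal equations can be written as a 2×2 linear system in (b_0, b_1) and solved directly. The snippet below is illustrative and assumes x and y are NumPy arrays of equal length.

```python
import numpy as np

def ols_via_normal_equations(x, y):
    """Solve the two normal equations for the intercept b0 and slope b1."""
    n = len(x)
    # Normal equations in matrix form:
    #   sum(y)   = n * b0      + b1 * sum(x)
    #   sum(x*y) = b0 * sum(x) + b1 * sum(x**2)
    A = np.array([[n, x.sum()],
                  [x.sum(), (x ** 2).sum()]])
    rhs = np.array([y.sum(), (x * y).sum()])
    b0, b1 = np.linalg.solve(A, rhs)
    return b0, b1
```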
Coefficient Formulas
The ordinary least squares (OLS) estimators for the coefficients in simple linear regression are obtained by solving the normal equations, which minimize the sum of squared residuals.[2] The slope estimator b_1 is given by the sample covariance of X and Y divided by the sample variance of X:

b_1 = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2} = \frac{\text{Cov}(X, Y)}{\text{Var}(X)},

where \bar{X} and \bar{Y} are the sample means of the predictor and response variables, respectively.[2] This formulation, rooted in the method of least squares introduced by Adrien-Marie Legendre in 1805, expresses the slope as a measure of linear association scaled by the variability in X.[22]

An alternative computational form for the slope, useful for direct calculation from raw data, is

b_1 = \frac{n \sum_{i=1}^n X_i Y_i - \left( \sum_{i=1}^n X_i \right) \left( \sum_{i=1}^n Y_i \right)}{n \sum_{i=1}^n X_i^2 - \left( \sum_{i=1}^n X_i \right)^2}.

This expression avoids explicit computation of means and is equivalent to the covariance form.[2] The intercept estimator b_0 is then

b_0 = \bar{Y} - b_1 \bar{X},

ensuring the regression line passes through the point of means (\bar{X}, \bar{Y}).[2]

The fitted values for the response variable are predicted by the estimated model: \hat{Y}_i = b_0 + b_1 X_i for each observation i = 1, \dots, n.[2] The residuals, which represent the deviations between observed and fitted values, are defined as e_i = Y_i - \hat{Y}_i.[2]
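A minimal sketch of these closed-form estimators, assuming x and y are equal-length NumPy arrays, is shown below; it returns the intercept, slope, fitted values, and residuals defined above.

```python
import numpy as np

def ols_closed_form(x, y):
    """Compute b1 = Sxy / Sxx and b0 = ybar - b1 * xbar, plus fitted values and residuals."""
    xbar, ybar = x.mean(), y.mean()
    sxy = ((x - xbar) * (y - ybar)).sum()  # sum of cross-deviations (numerator of b1)
    sxx = ((x - xbar) ** 2).sum()          # sum of squared deviations in x (denominator)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    yhat = b0 + b1 * x                     # fitted values
    resid = y - yhat                       # residuals e_i
    return b0, b1, yhat, resid
```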
Interpretation
Slope Meaning
In simple linear regression, the slope coefficient, denoted b_1, represents the estimated change in the expected value of the response variable Y for each one-unit increase in the predictor variable X.[23] This interpretation holds under the model's assumptions, where no other factors are involved, providing a measure of the average linear association between X and Y.[24]

The sign of b_1 indicates the direction of this association: a positive value suggests a direct relationship, where increases in X are associated with increases in Y, while a negative value implies an inverse relationship, with increases in X linked to decreases in Y.[23] For instance, in a model relating height to weight, a positive b_1 would mean that taller individuals tend to weigh more, with the magnitude specifying the average weight gain per additional unit of height.[23] The units of b_1 are determined by the scales of Y and X, specifically units of Y per unit of X, ensuring the coefficient's interpretability remains tied to the data's measurement context.[23] A slope of zero indicates no linear association between X and Y, implying that changes in X do not systematically predict changes in Y under the model.[25]

Furthermore, b_1 is directly related to the covariance between X and Y, scaled by the inverse of the variance of X, which quantifies how the joint variability of the variables contributes to the estimated linear effect.[26] This connection underscores the slope's role in capturing the strength and direction of the linear dependence relative to the predictor's spread.[27]

Intercept Meaning
In simple linear regression, the estimated intercept, denoted b_0, gives the predicted value \hat{Y} when the predictor variable X equals zero; the corresponding population parameter \beta_0 is the expected value of the response Y at X = 0.[28][29] This interpretation follows directly from the model equation E(Y \mid X = x) = \beta_0 + \beta_1 x.[10]

However, the practical relevance of the intercept can be limited if X = 0 falls outside the observed range of the data or represents an impossible scenario in the context of the variables.[28] For instance, in a regression model predicting weight from height, an intercept implying a negative weight at zero height lacks physical meaning, as heights are positive.[28] In such cases, the intercept serves more as a mathematical adjustment than a substantive prediction.[10]

The ordinary least squares estimate of the intercept ensures that the fitted regression line passes through the point of means (\bar{X}, \bar{Y}), which centers the model around the data.[30] This property is reflected in the formula b_0 = \bar{Y} - b_1 \bar{X}, guaranteeing that the predicted value at the average predictor equals the average response.[30] Omitting the intercept by setting b_0 = 0 fundamentally alters the model, forcing the line through the origin and potentially biasing estimates unless theoretically justified.

Correlation Coefficient
The Pearson correlation coefficient, denoted as r, is a standardized measure of the strength and direction of the linear relationship between two variables, X and Y. It is defined as the covariance between X and Y divided by the product of their standard deviations:

r = \frac{\text{Cov}(X, Y)}{s_X s_Y},

where s_X and s_Y are the sample standard deviations of X and Y, respectively. Equivalently, it can be computed as

r = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^n (X_i - \bar{X})^2 \sum_{i=1}^n (Y_i - \bar{Y})^2}},

which normalizes the data by centering at the means \bar{X} and \bar{Y}. The value of r ranges from -1 to 1, where r = 1 indicates a perfect positive linear relationship, r = -1 a perfect negative linear relationship, and r = 0 no linear association.[31][32]

In the context of simple linear regression, the correlation coefficient is closely related to the slope estimate b_1. Specifically, r = b_1 \frac{s_X}{s_Y}, or equivalently, b_1 = r \frac{s_Y}{s_X}, showing that the sign of r always matches the sign of the slope, while its magnitude reflects the slope scaled by the ratio of standard deviations. This relationship highlights how r standardizes the association to be unitless, unlike the slope, which depends on the units of X and Y.[33]

The absolute value |r| indicates the strength of the linear association: values near 1 suggest a strong linear relationship, while values near 0 indicate weak or no linear association. The sign of r conveys the direction: positive for direct association and negative for inverse. Additionally, the square of the correlation coefficient, R^2 = r^2, is the coefficient of determination, representing the proportion of the variance in Y that is explained by the linear variation in X under the regression model. For example, if r = 0.8, then R^2 = 0.64, meaning 64% of the variability in Y is accounted for by X.[34][33]

Despite its utility, the Pearson correlation coefficient has notable limitations. It measures only linear associations and may detect no correlation even for strong nonlinear relationships, such as quadratic patterns. Furthermore, it is highly sensitive to outliers, which can disproportionately influence the coefficient and lead to misleading interpretations of the association strength.[35][36]
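The sketch below computes r from its definition and recovers the OLS slope from it via b_1 = r (s_Y / s_X); the function names are illustrative and the inputs are assumed to be NumPy arrays.

```python
import numpy as np

def pearson_r(x, y):
    """Sample Pearson correlation r = Cov(x, y) / (s_x * s_y)."""
    xd, yd = x - x.mean(), y - y.mean()
    return (xd * yd).sum() / np.sqrt((xd ** 2).sum() * (yd ** 2).sum())

def slope_from_r(x, y):
    """Recover the OLS slope via b1 = r * (s_y / s_x); R^2 is simply r ** 2."""
    r = pearson_r(x, y)
    return r * y.std(ddof=1) / x.std(ddof=1)
```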
Statistical Properties
Unbiasedness
In simple linear regression, the ordinary least squares (OLS) estimators b_0 and b_1 are unbiased, meaning their expected values equal the true population parameters: E(b_1) = \beta_1 and E(b_0) = \beta_0.[37] This property holds under the core assumptions of the model, specifically linearity in parameters and the strict exogeneity condition that the errors have zero conditional mean given the predictors, E(\varepsilon_i \mid X_i) = 0.[21]

To demonstrate unbiasedness for the slope estimator, consider the formula

b_1 = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2},

where \bar{X} and \bar{Y} are the sample means. Substituting the model Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i yields Y_i - \bar{Y} = \beta_1 (X_i - \bar{X}) + (\varepsilon_i - \bar{\varepsilon}), so

b_1 = \beta_1 + \frac{\sum_{i=1}^n (X_i - \bar{X})(\varepsilon_i - \bar{\varepsilon})}{\sum_{i=1}^n (X_i - \bar{X})^2}.

Taking expectations,

E(b_1) = \beta_1 + E\left[ \frac{\sum_{i=1}^n (X_i - \bar{X})(\varepsilon_i - \bar{\varepsilon})}{\sum_{i=1}^n (X_i - \bar{X})^2} \right].

Under the assumptions, the expected value of the second term is zero because E(\varepsilon_i) = 0 for all i and the errors are independent of the predictors (treating X as fixed or conditioning on X).[37][18]

For the intercept estimator, b_0 = \bar{Y} - b_1 \bar{X}. The sample mean \bar{Y} is unbiased for its expectation, E(\bar{Y}) = \beta_0 + \beta_1 \bar{X}, and since E(b_1) = \beta_1, it follows that E(b_0) = E(\bar{Y}) - E(b_1) \bar{X} = \beta_0.[37]

Unbiasedness requires only linearity and the zero-conditional-mean (exogeneity) assumption on the errors; it does not depend on normality of the errors or on homoscedasticity.[19] Under the Gauss-Markov theorem, which assumes linearity, strict exogeneity, homoscedasticity, and no serial correlation in the errors, the OLS estimators are the best linear unbiased estimators (BLUE), meaning they have the minimum variance among all linear unbiased estimators.[38] This theorem, originally developed by Carl Friedrich Gauss in his 1821 work on least squares, provides the foundational justification for OLS in linear models.[39]
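Because unbiasedness is a statement about repeated sampling, a small Monte Carlo sketch can illustrate it: over many simulated samples from a known model, the average of the slope estimates settles near the true \beta_1. The parameter values, sample size, and number of replications below are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
beta0, beta1, sigma = 1.0, 2.0, 3.0   # illustrative "true" parameters
n, reps = 30, 5000                    # sample size and number of replications

x = rng.uniform(0, 10, size=n)        # fixed design reused across replications
slopes = np.empty(reps)
for i in range(reps):
    y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)
    xd = x - x.mean()
    slopes[i] = (xd * (y - y.mean())).sum() / (xd ** 2).sum()  # OLS slope b1

# The Monte Carlo average of b1 should land close to the true beta1 = 2.0,
# illustrating E(b1) = beta1.
print(slopes.mean())
```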
Variances of Estimators
In simple linear regression, the ordinary least squares (OLS) estimators b_0 and b_1 are random variables due to the stochastic nature of the error terms, and their variances quantify the sampling variability around their expected values. Under the standard assumptions of the linear model (linearity, strict exogeneity, homoscedasticity with constant error variance \sigma^2, and no perfect collinearity), the variance of the slope estimator is given by

\text{Var}(b_1) = \frac{\sigma^2}{\sum_{i=1}^n (x_i - \bar{x})^2},

where \sum_{i=1}^n (x_i - \bar{x})^2 denotes the total variation in the predictor variable x, often abbreviated as S_{xx}. This formula arises under the Gauss-Markov conditions, which establish the OLS estimators as the best linear unbiased estimators with minimum variance.[40]

The variance of the intercept estimator is

\text{Var}(b_0) = \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^n (x_i - \bar{x})^2} \right),

reflecting contributions from both the sample size n and the position of the mean \bar{x} relative to the spread in x. Larger values of \sum_{i=1}^n (x_i - \bar{x})^2 reduce \text{Var}(b_1), improving precision by leveraging greater dispersion in the predictors, while increasing n diminishes \text{Var}(b_0) through the 1/n term.[40] The covariance between b_0 and b_1 is \text{Cov}(b_0, b_1) = -\bar{x} \sigma^2 / \sum_{i=1}^n (x_i - \bar{x})^2, indicating a negative dependence that strengthens when \bar{x} is farther from zero.

Since \sigma^2 is unknown in practice, it is estimated unbiasedly by s^2 = \sum_{i=1}^n e_i^2 / (n-2), where e_i = y_i - \hat{y}_i are the residuals and n-2 reflects the degrees of freedom lost to estimating two parameters. The standard error of the slope, s_{b_1} = s / \sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}, then provides an estimate of \sqrt{\text{Var}(b_1)} for inference purposes, assuming homoscedasticity holds.[40] These expressions highlight how estimator precision depends on the error variance and the data configuration, with violations of homoscedasticity potentially invalidating these variance formulas.
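A short sketch of these calculations, assuming x and y are NumPy arrays and b_0, b_1 are OLS estimates obtained as above, is given below; it returns the residual standard error together with the estimated standard errors of both coefficients.

```python
import numpy as np

def ols_standard_errors(x, y, b0, b1):
    """Estimate sigma^2 by s^2 = SSE / (n - 2) and return (s, se_b0, se_b1)."""
    n = len(x)
    resid = y - (b0 + b1 * x)
    s2 = (resid ** 2).sum() / (n - 2)              # unbiased estimate of sigma^2
    sxx = ((x - x.mean()) ** 2).sum()              # S_xx, the spread of the predictor
    se_b1 = np.sqrt(s2 / sxx)                      # estimated sqrt(Var(b1))
    se_b0 = np.sqrt(s2 * (1 / n + x.mean() ** 2 / sxx))
    return np.sqrt(s2), se_b0, se_b1
```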
Inference Procedures
Confidence Intervals
In simple linear regression, confidence intervals quantify the uncertainty around estimates of the regression coefficients and predicted values by providing a range likely to contain the true population values with a specified confidence level, such as 95%. These intervals rely on the t-distribution with n-2 degrees of freedom to account for the additional variability from estimating the error variance \sigma^2 with s^2. The standard errors used in these intervals derive from the variances of the estimators, ensuring the intervals reflect the sampling variability under the model's assumptions.[41] The confidence interval for the slope coefficient \beta_1 is constructed as
b_1 \pm t_{\alpha/2, n-2} \, s_{b_1},
where s_{b_1} = s / \sqrt{\sum_{i=1}^n (x_i - \bar{x})^2} is the standard error of the slope, and s = \sqrt{\sum_{i=1}^n (y_i - \hat{y}_i)^2 / (n-2)} is the residual standard error. Similarly, the interval for the intercept \beta_0 is
b_0 \pm t_{\alpha/2, n-2} \, s_{b_0},
with s_{b_0} = s \sqrt{1/n + \bar{x}^2 / \sum_{i=1}^n (x_i - \bar{x})^2}. These intervals indicate the plausible range for the true coefficients, with narrower widths for larger sample sizes or stronger linear relationships.[42][43] For the mean response at a specific predictor value x_0, the confidence interval estimates the expected value of y and is given by
\hat{y}_0 \pm t_{\alpha/2, n-2} \, s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^n (x_i - \bar{x})^2}}.
This formula incorporates the leverage of x_0 relative to the data, making the interval wider when x_0 is distant from \bar{x}, as extrapolation increases uncertainty.[41] The prediction interval for an individual future observation at x_0 extends beyond the mean response to account for both the uncertainty in the mean and the inherent variability of a single y, yielding
\hat{y}_0 \pm t_{\alpha/2, n-2} \, s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^n (x_i - \bar{x})^2}}.
The extra term under the square root ensures this interval is always wider than the corresponding interval for the mean response, reflecting the added variability of individual observations. Both types of interval widen as x_0 moves away from the center of the observed predictor values, emphasizing the risks of extrapolation.[44] In large samples, an asymptotic approximation replaces the t critical value with the z critical value from the standard normal distribution, particularly when strict normality of the errors is not assumed, though the t-based approach remains preferred for smaller samples.[41]
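A sketch of these interval formulas is given below; it assumes x and y are NumPy arrays, uses scipy.stats for the t critical value, and returns the confidence interval for the slope together with the mean-response and prediction intervals at a chosen x_0.

```python
import numpy as np
from scipy import stats

def regression_intervals(x, y, x0, conf=0.95):
    """Intervals for the slope, the mean response at x0, and a new observation at x0."""
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    sxx = ((x - xbar) ** 2).sum()
    b1 = ((x - xbar) * (y - ybar)).sum() / sxx
    b0 = ybar - b1 * xbar
    s = np.sqrt(((y - b0 - b1 * x) ** 2).sum() / (n - 2))  # residual standard error
    tcrit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 2)      # t critical value

    slope_ci = (b1 - tcrit * s / np.sqrt(sxx), b1 + tcrit * s / np.sqrt(sxx))
    lev = 1 / n + (x0 - xbar) ** 2 / sxx                   # leverage-type term at x0
    yhat0 = b0 + b1 * x0
    mean_ci = (yhat0 - tcrit * s * np.sqrt(lev), yhat0 + tcrit * s * np.sqrt(lev))
    pred_pi = (yhat0 - tcrit * s * np.sqrt(1 + lev), yhat0 + tcrit * s * np.sqrt(1 + lev))
    return slope_ci, mean_ci, pred_pi
```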
Hypothesis Testing
In simple linear regression, hypothesis testing is used to assess the significance of the relationship between the predictor variable x and the response variable y. The primary test focuses on the slope parameter \beta_1, evaluating whether it differs from zero, which would indicate no linear relationship. The null hypothesis is H_0: \beta_1 = 0 (no linear association), and the alternative hypothesis is H_a: \beta_1 \neq 0 (a linear association exists).[45] The test statistic is the t-ratio, given by

t = \frac{b_1}{s_{b_1}},

where b_1 is the estimated slope and s_{b_1} is its standard error. Under H_0, this statistic follows a t-distribution with n-2 degrees of freedom, where n is the sample size.[45][46]

For the overall model fit, an F-test examines whether the regression explains a significant portion of the variability in y. The null hypothesis is again H_0: \beta_1 = 0, testing whether the model improves on a horizontal line through the mean. The test statistic is

F = \frac{\text{MSR}}{\text{MSE}} = \frac{\text{SSR}/1}{\text{SSE}/(n-2)},

where SSR is the regression sum of squares (variability explained by the model) and SSE is the error sum of squares (unexplained variability). This simplifies to F = \frac{R^2}{1 - R^2} \cdot (n-2), with R^2 as the coefficient of determination. Under H_0, F follows an F-distribution with 1 and n-2 degrees of freedom. In simple linear regression, the F-test is mathematically equivalent to the square of the t-test for the slope, as F = t^2.[47][48]

Decisions in hypothesis testing rely on p-values, which give the probability of observing a test statistic at least as extreme as the calculated value under H_0. The null hypothesis is rejected if the p-value is less than the significance level \alpha (commonly 0.05), indicating sufficient evidence of a linear relationship. Critical values from the t- or F-distribution can also be used for comparison.[45][46]

The analysis of variance (ANOVA) table provides a structured summary for these tests, decomposing the total sum of squares (SST) into SSR and SSE components: \text{SST} = \text{SSR} + \text{SSE}, where SST measures the total variability in y around its mean. The table includes degrees of freedom (df: 1 for regression, n-2 for error, n-1 total), mean squares (MSR = SSR/1, MSE = SSE/(n-2)), the F-statistic, and the p-value. This breakdown quantifies how much variance the model captures versus random error.[47][48]

Power considerations in these tests highlight the importance of sample size for detecting true effects. The power (1 - \beta) is the probability of rejecting H_0 when it is false, and it depends on the effect size (e.g., the standardized slope), the significance level \alpha, and n. For instance, detecting a small slope difference often requires larger samples, with formulas or software like G*Power used to compute the required n for a desired power (e.g., 0.80).[49][50]
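The sketch below works through these tests and the ANOVA decomposition for arrays x and y, using scipy.stats for the p-values; the function name and return structure are illustrative choices.

```python
import numpy as np
from scipy import stats

def slope_tests(x, y):
    """t- and F-tests of H0: beta1 = 0, with the ANOVA decomposition SST = SSR + SSE."""
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    sxx = ((x - xbar) ** 2).sum()
    b1 = ((x - xbar) * (y - ybar)).sum() / sxx
    b0 = ybar - b1 * xbar
    yhat = b0 + b1 * x

    sst = ((y - ybar) ** 2).sum()   # total sum of squares
    sse = ((y - yhat) ** 2).sum()   # error (residual) sum of squares
    ssr = sst - sse                 # regression sum of squares

    mse = sse / (n - 2)
    t_stat = b1 / np.sqrt(mse / sxx)
    f_stat = (ssr / 1) / mse        # equals t_stat ** 2 in simple regression
    p_t = 2 * stats.t.sf(abs(t_stat), df=n - 2)   # two-sided p-value for the t-test
    p_f = stats.f.sf(f_stat, dfn=1, dfd=n - 2)    # p-value for the F-test
    return {"t": t_stat, "F": f_stat, "p_t": p_t, "p_F": p_f,
            "SSR": ssr, "SSE": sse, "SST": sst}
```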
Numerical Example
Data Setup
To illustrate simple linear regression, consider data from a sample of 10 students examining the relationship between height (in inches) and weight (in pounds).[51] The dataset consists of the following paired observations:

| Height (inches) | Weight (pounds) |
|---|---|
| 63 | 127 |
| 64 | 121 |
| 66 | 142 |
| 69 | 157 |
| 69 | 162 |
| 71 | 156 |
| 71 | 169 |
| 72 | 165 |
| 73 | 181 |
| 75 | 208 |
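For readers following along in code, the paired observations in the table can be entered directly as arrays; the variable names below are arbitrary.

```python
import numpy as np

# Heights (inches) and weights (pounds) for the 10 students in the table above.
height = np.array([63, 64, 66, 69, 69, 71, 71, 72, 73, 75], dtype=float)
weight = np.array([127, 121, 142, 157, 162, 156, 169, 165, 181, 208], dtype=float)
```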