
Simple linear regression

Simple linear regression is a statistical method that models the relationship between two continuous variables: one independent (predictor) and one dependent (response), assuming a straight-line relationship between them. The model is expressed as y = \beta_0 + \beta_1 x + \epsilon, where y is the dependent variable, x is the independent variable, \beta_0 is the intercept, \beta_1 is the slope, and \epsilon represents the random error term. This technique enables the estimation of the dependent variable from the independent variable and is foundational across the natural and social sciences for analyzing linear associations.

The origins of simple linear regression trace back to the late 19th century, when Francis Galton developed the concept while studying heredity and the phenomenon of regression toward the mean in biological traits, such as the heights of parents and children. Building on earlier work in least-squares estimation by Adrien-Marie Legendre in 1805 and Carl Friedrich Gauss, Galton introduced the term "regression" to describe the tendency of extreme values to move toward the average in subsequent generations. Karl Pearson later formalized the mathematical framework in the early 20th century, extending it through correlation analysis, which solidified linear regression as a core tool in statistics.

To ensure valid inferences, simple linear regression relies on several key assumptions: linearity (the true relationship is linear in the parameters), independence of observations, homoscedasticity (constant variance of residuals across levels of the independent variable), and normality of the error terms. Violations of these assumptions, such as nonlinearity or heteroscedasticity, can lead to biased estimates or invalid predictions, necessitating diagnostic checks like residual plots. Parameter estimates are typically obtained via ordinary least squares (OLS), which minimizes the sum of squared residuals between observed and predicted values, providing unbiased and efficient estimators under the model assumptions. In practice, simple linear regression is widely applied for prediction, for testing hypotheses about the slope (to assess the significance of the linear relationship), and for understanding causal or associative patterns in data, though it cannot establish causation without additional experimental design. Extensions include multiple linear regression for additional predictors and robust methods for handling assumption violations, but the simple form remains essential for introductory statistical modeling due to its interpretability and computational simplicity.

Model and Assumptions

Definition and Model Equation

Simple linear regression is a fundamental statistical technique used to model and analyze the linear relationship between a single predictor variable, denoted as X, and a response variable, denoted as Y. It posits that the mean of the response can be expressed as a straight-line function of the predictor, enabling predictions and inferences about how changes in X affect Y. This method is widely applied across the sciences to quantify associations in observed data. The population model for simple linear regression is given by the equation Y_i = \beta_0 + \beta_1 X_i + \epsilon_i, \quad i = 1, 2, \dots, n, where Y_i is the i-th observation of the response variable, X_i is the corresponding predictor value, \beta_0 represents the y-intercept (the expected value of Y when X = 0), \beta_1 denotes the slope (the expected change in Y for a one-unit increase in X), and \epsilon_i is the random error term for the i-th observation, assumed to be independent with mean zero and constant variance \sigma^2. The error terms \epsilon_i capture the unexplained variation in Y after accounting for the linear effect of X. In practice, with a sample of n data points drawn from the population, the parameters \beta_0 and \beta_1 are unknown and must be estimated; Latin letters such as b_0 and b_1 are typically used to distinguish sample estimates from the true population values represented by Greek letters. This distinction underscores the inferential nature of regression analysis, where sample-based estimates inform broader population characteristics.
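
As an illustration of the population model, the following Python sketch simulates observations from Y_i = \beta_0 + \beta_1 X_i + \epsilon_i. The parameter values beta0, beta1, and sigma are arbitrary choices for demonstration, not values taken from any dataset in this article.

```python
import numpy as np

rng = np.random.default_rng(42)

# Arbitrary "true" population parameters chosen for illustration.
beta0, beta1, sigma = 5.0, 2.0, 1.5
n = 50

x = rng.uniform(0, 10, size=n)          # predictor values X_i
epsilon = rng.normal(0, sigma, size=n)  # random errors with mean 0 and sd sigma
y = beta0 + beta1 * x + epsilon         # responses generated by the linear model

print(y[:5])  # first few simulated responses
```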

Key Assumptions

The simple linear regression model is built upon a set of classical assumptions that underpin the validity of parameter estimation and inference. These assumptions ensure that the model's predictions align with the underlying data-generating process and that the ordinary least squares (OLS) estimators possess desirable properties, such as unbiasedness and minimum variance under the Gauss-Markov theorem. While the core assumptions apply to both simple and multiple regression, in the simple case they simplify due to the presence of only one predictor variable.

Linearity: The primary assumption is that the conditional mean of the response variable Y given the predictor X is a linear function of X, expressed as E(Y \mid X) = \beta_0 + \beta_1 X. This posits that the mean response changes linearly with the predictor, allowing the model to capture a straight-line relationship without curvature or higher-order terms. Violation of linearity, such as when the true relationship is nonlinear, can lead to biased estimates, though graphical diagnostics like scatterplots can help detect this.

Independence: The errors \varepsilon_i across observations must be independent, meaning that the value of one error does not influence another. This assumption arises from the requirement that the data constitute a random sample, ensuring no serial correlation or dependence structure, such as in time-series data. In the simple linear regression context, independence implies that observations are drawn without clustering or autocorrelation, which is crucial for the validity of standard errors.

Homoscedasticity: The variance of the errors is constant across all levels of the predictor, so \text{Var}(\varepsilon_i) = \sigma^2 for all i, regardless of X. This equal spread of residuals around the regression line rules out heteroscedasticity, where variance increases or decreases with X, which could otherwise distort standard errors for certain predictions. The assumption is part of the Gauss-Markov conditions that make OLS the best linear unbiased estimator (BLUE).

Normality: For exact finite-sample inference, such as t-tests and F-tests, the errors are assumed to be normally distributed, \varepsilon_i \sim N(0, \sigma^2). This Gaussian assumption facilitates the derivation of the sampling distribution of the OLS estimators. However, it is not required for consistency or unbiasedness; in large samples, the central limit theorem ensures asymptotic normality of the estimators even under non-normal errors.

No perfect multicollinearity: In simple linear regression, this reduces to the requirement that the predictor X is not constant across all observations, ensuring enough variation in X to allow estimation of \beta_1. Without this, the model parameters cannot be uniquely identified.

Violations of these assumptions can compromise the model's reliability, but simple linear regression is robust in several ways, particularly with large sample sizes. For instance, breaches of homoscedasticity or normality often do not severely affect point estimates, though they may affect standard errors and inference; asymptotic theory supports valid tests as the number of observations grows. Linearity and independence violations, however, tend to have more pronounced effects, potentially requiring model respecification or alternative methods.
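
A minimal sketch of how these assumptions are commonly checked in practice, assuming paired arrays x and y are available (the simulated data and plotting choices below are illustrative, not prescriptive): a residual-versus-fitted plot for linearity and homoscedasticity, and a normal Q-Q plot for the normality of the errors.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Illustrative data; in practice x and y come from the study at hand.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 40)
y = 3.0 + 1.2 * x + rng.normal(0, 2.0, 40)

# Fit by ordinary least squares and form the residuals.
b1, b0 = np.polyfit(x, y, deg=1)      # polyfit returns slope first, then intercept
residuals = y - (b0 + b1 * x)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))

# Residuals vs. fitted: look for random scatter (linearity) with even spread (homoscedasticity).
ax1.scatter(b0 + b1 * x, residuals)
ax1.axhline(0, color="gray", linestyle="--")
ax1.set_xlabel("Fitted values")
ax1.set_ylabel("Residuals")

# Normal Q-Q plot: roughly straight points support the normality assumption.
stats.probplot(residuals, dist="norm", plot=ax2)

plt.tight_layout()
plt.show()
```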

Estimation Methods

Ordinary Least Squares

Ordinary least squares (OLS) is the primary method for estimating the parameters of the simple linear regression model, introduced by Adrien-Marie Legendre in 1805 as a technique to fit lines to observational data by minimizing the sum of squared errors. The core principle of OLS involves selecting the intercept b_0 and slope b_1 that minimize the sum of squared residuals (also called the error sum of squares, SSE), defined as \text{SSE} = \sum_{i=1}^n (Y_i - \hat{Y}_i)^2, where \hat{Y}_i = b_0 + b_1 X_i represents the predicted value for the i-th observation. This minimization criterion measures errors vertically from the observed response Y to the line, emphasizing the model's predictive accuracy for Y given X. Geometrically, the OLS regression line can be interpreted as the straight line passing through the centroid (\bar{X}, \bar{Y}) of the data cloud that minimizes the sum of the squared vertical distances from each data point to the line. This property ensures the line balances the data around the mean, providing an intuitive visual representation of the best linear fit in the plane spanned by the observations. To derive the OLS estimates, the sum of squared residuals is treated as a function of b_0 and b_1, and its partial derivatives are set to zero, yielding a system known as the normal equations: \sum_{i=1}^n Y_i = n b_0 + b_1 \sum_{i=1}^n X_i, \quad \sum_{i=1}^n X_i Y_i = b_0 \sum_{i=1}^n X_i + b_1 \sum_{i=1}^n X_i^2. These equations arise directly from the calculus-based optimization and form the foundation for solving the parameter estimates. Under the assumptions of linearity in parameters and strict exogeneity (E[\varepsilon \mid X] = 0), the OLS estimators are unbiased, meaning their expected values equal the true parameters. Furthermore, the Gauss-Markov theorem establishes that OLS produces the best linear unbiased estimators (BLUE), with the smallest variance among all linear unbiased estimators, provided the additional assumptions of homoscedasticity and uncorrelated errors hold. OLS is also computationally straightforward, relying solely on sums and products of the data, which facilitates its implementation even with limited resources.
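
Because the normal equations form a 2-by-2 linear system in b_0 and b_1, they can be solved directly. The sketch below, using illustrative data, builds that system from the sums appearing in the equations above and solves it with NumPy.

```python
import numpy as np

# Illustrative data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.8])
n = len(x)

# Coefficient matrix and right-hand side of the normal equations:
#   n*b0       + (sum x)*b1   = sum y
#   (sum x)*b0 + (sum x^2)*b1 = sum x*y
A = np.array([[n,       x.sum()],
              [x.sum(), (x ** 2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])

b0, b1 = np.linalg.solve(A, rhs)
print(b0, b1)  # intercept and slope minimizing the sum of squared residuals
```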

Coefficient Formulas

The ordinary least squares (OLS) estimators for the coefficients in simple linear regression are obtained by solving the normal equations, which minimize the sum of squared residuals. The slope estimator b_1 is given by the sample covariance of X and Y divided by the sample variance of X: b_1 = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2} = \frac{\text{Cov}(X, Y)}{\text{Var}(X)}, where \bar{X} and \bar{Y} are the sample means of the predictor and response variables, respectively. This formulation, rooted in the method of least squares introduced by Adrien-Marie Legendre in 1805, expresses the slope as a measure of linear association scaled by the variability in X. An alternative computational form for the slope, useful for direct calculation from raw data, is b_1 = \frac{n \sum_{i=1}^n X_i Y_i - \left( \sum_{i=1}^n X_i \right) \left( \sum_{i=1}^n Y_i \right)}{n \sum_{i=1}^n X_i^2 - \left( \sum_{i=1}^n X_i \right)^2}. This expression avoids explicit computation of means and is equivalent to the covariance form. The intercept estimator b_0 is then b_0 = \bar{Y} - b_1 \bar{X}, ensuring the regression line passes through the point of means (\bar{X}, \bar{Y}). The fitted values for the response are predicted by the estimated model: \hat{Y}_i = b_0 + b_1 X_i for each i = 1, \dots, n. The residuals, which represent the deviations between observed and fitted values, are defined as e_i = Y_i - \hat{Y}_i.
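
The closed-form expressions translate directly into code. This sketch, again with illustrative data, computes the slope from the covariance form, the intercept from the point of means, and then the fitted values and residuals.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.8])

x_bar, y_bar = x.mean(), y.mean()

# Slope: sample covariance of X and Y divided by sample variance of X.
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
# Intercept: forces the fitted line through the point of means.
b0 = y_bar - b1 * x_bar

y_hat = b0 + b1 * x       # fitted values
residuals = y - y_hat     # observed minus fitted

print(b0, b1)
print(residuals)
```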

Interpretation

Slope Meaning

In simple linear regression, the slope coefficient, denoted b_1, represents the estimated change in the mean of the response Y for each one-unit increase in the predictor X. This interpretation holds under the model's assumptions, where no other factors are involved, providing a measure of the average linear association between X and Y. The sign of b_1 indicates the direction of this association: a positive value suggests a direct relationship, where increases in X are associated with increases in Y, while a negative value implies an inverse relationship, with increases in X linked to decreases in Y. For instance, in a model relating height to weight, a positive b_1 would mean that taller individuals tend to weigh more, with the magnitude specifying the average weight gain per additional unit of height. The units of b_1 are determined by the scales of Y and X, specifically units of Y per unit of X, ensuring the coefficient's interpretation remains tied to the data's context. A slope of zero indicates no linear association between X and Y, implying that changes in X do not systematically predict changes in Y under the model. Furthermore, b_1 equals the covariance between X and Y divided by the variance of X, which quantifies how the joint variability of the variables contributes to the estimated linear effect. This connection underscores the slope's role in capturing the strength and direction of the linear dependence relative to the predictor's spread.

Intercept Meaning

In simple linear regression, the intercept parameter, denoted as b_0 in the fitted model, represents the expected value of the response variable Y when the predictor variable X is equal to zero, or equivalently, the predicted value \hat{Y} at X = 0. This interpretation follows directly from the model equation E(Y \mid X = x) = \beta_0 + \beta_1 x, where \beta_0 is the true population intercept. However, the practical relevance of the intercept can be limited if X = 0 falls outside the observed range of the data or represents an impossible scenario in the context of the variables. For instance, in a model predicting weight from height, an intercept implying a negative weight at zero height lacks physical meaning, as heights are positive. In such cases, the intercept serves more as a mathematical adjustment than a substantive prediction. The ordinary least squares estimate of the intercept ensures that the fitted regression line passes through the point of means (\bar{X}, \bar{Y}), which centers the model around the data. This property is reflected in the formula b_0 = \bar{Y} - b_1 \bar{X}, guaranteeing that the predicted value at the average predictor equals the average response. Omitting the intercept by setting b_0 = 0 fundamentally alters the model, forcing the line through the origin and potentially biasing estimates unless theoretically justified.

Correlation Coefficient

The Pearson correlation coefficient, denoted as r, is a standardized measure of the strength and direction of the linear relationship between two variables, X and Y. It is defined as the covariance between X and Y divided by the product of their standard deviations: r = \frac{\text{Cov}(X, Y)}{s_X s_Y}, where s_X and s_Y are the sample standard deviations of X and Y, respectively. Equivalently, it can be computed as r = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^n (X_i - \bar{X})^2 \sum_{i=1}^n (Y_i - \bar{Y})^2}}, which normalizes the data by centering at the means \bar{X} and \bar{Y}. The value of r ranges from -1 to 1, where r = 1 indicates a perfect positive linear relationship, r = -1 a perfect negative linear relationship, and r = 0 no linear association. In the context of simple linear regression, the correlation coefficient is closely related to the slope estimate b_1. Specifically, r = b_1 \frac{s_X}{s_Y}, or equivalently, b_1 = r \frac{s_Y}{s_X}, showing that the sign of r always matches the sign of the slope, while its magnitude reflects the slope scaled by the ratio of standard deviations. This highlights how r standardizes the association to be unitless, unlike the slope, which depends on the units of X and Y. The magnitude |r| indicates the strength of the linear association: values near 1 suggest a strong linear relationship, while values near 0 indicate a weak or absent linear association. The sign of r conveys the direction: positive for direct relationships and negative for inverse ones. Additionally, the square of the correlation coefficient, R^2 = r^2, is the coefficient of determination, representing the proportion of the variance in Y that is explained by the linear variation in X under the model. For example, if r = 0.8, then R^2 = 0.64, meaning 64% of the variability in Y is accounted for by X. Despite its utility, the correlation coefficient has notable limitations. It measures only linear association and may show little or no correlation even for strong nonlinear relationships, such as U-shaped patterns. Furthermore, it is highly sensitive to outliers, which can disproportionately influence the coefficient and lead to misleading interpretations of the association strength.
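
The link between r, the slope, and R^2 can be verified numerically. In this sketch, with arbitrary simulated data, the correlation is computed from its definition and compared with the fitted slope rescaled by the sample standard deviations.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 30)
y = 2.0 + 0.8 * x + rng.normal(0, 1.0, 30)

# Pearson correlation from its definition (ddof=1 gives sample statistics).
s_x, s_y = x.std(ddof=1), y.std(ddof=1)
cov_xy = np.cov(x, y, ddof=1)[0, 1]
r = cov_xy / (s_x * s_y)

# OLS slope, and the equivalent expression r rescaled back: r = b1 * s_x / s_y.
b1 = cov_xy / x.var(ddof=1)
print(r, b1 * s_x / s_y)   # these two numbers agree
print(r ** 2)              # coefficient of determination R^2
```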

Statistical Properties

Unbiasedness

In simple linear regression, the ordinary least squares (OLS) estimators b_0 and b_1 are unbiased, meaning their expected values equal the true population parameters: E(b_1) = \beta_1 and E(b_0) = \beta_0. This property holds under the core assumptions of the model, specifically linearity in parameters and the strict exogeneity condition that the errors have zero conditional mean given the predictors, E(\varepsilon_i \mid X_i) = 0. To demonstrate unbiasedness for the slope estimator, consider the formula b_1 = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2}, where \bar{X} and \bar{Y} are the sample means. Substituting the model Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i yields Y_i - \bar{Y} = \beta_1 (X_i - \bar{X}) + (\varepsilon_i - \bar{\varepsilon}), so b_1 = \beta_1 + \frac{\sum_{i=1}^n (X_i - \bar{X})(\varepsilon_i - \bar{\varepsilon})}{\sum_{i=1}^n (X_i - \bar{X})^2}. Taking expectations, E(b_1) = \beta_1 + E\left[ \frac{\sum_{i=1}^n (X_i - \bar{X})(\varepsilon_i - \bar{\varepsilon})}{\sum_{i=1}^n (X_i - \bar{X})^2} \right]. Under the assumptions, the expectation of the second term is zero because E(\varepsilon_i) = 0 for all i and the errors are independent of the predictors (treating X as fixed or conditioning on X). For the intercept estimator, b_0 = \bar{Y} - b_1 \bar{X}. The sample mean \bar{Y} has expectation E(\bar{Y}) = \beta_0 + \beta_1 \bar{X}, and since E(b_1) = \beta_1, it follows that E(b_0) = E(\bar{Y}) - E(b_1) \bar{X} = \beta_0. Unbiasedness requires only the linearity and zero-conditional-mean (exogeneity) assumptions; it does not depend on normality of errors or homoscedasticity. Under the Gauss-Markov theorem, which assumes linearity, strict exogeneity, homoscedasticity, and no serial correlation in the errors, the OLS estimators are the best linear unbiased estimators (BLUE), meaning they have the minimum variance among all linear unbiased estimators. This theorem builds on the least-squares theory developed by Carl Friedrich Gauss in his 1821 work on the combination of observations, and it provides the foundational justification for OLS in linear models.
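
Unbiasedness can be illustrated by simulation: hold the predictor values fixed, generate many samples of errors, refit the model each time, and compare the average estimates with the true parameters. The sketch below uses arbitrary true values beta0 = 1 and beta1 = 2 chosen purely for demonstration.

```python
import numpy as np

rng = np.random.default_rng(7)
beta0, beta1, sigma = 1.0, 2.0, 1.0   # arbitrary "true" parameters
x = np.linspace(0, 10, 25)            # fixed design of predictor values
n_sims = 20_000

slopes, intercepts = [], []
for _ in range(n_sims):
    y = beta0 + beta1 * x + rng.normal(0, sigma, x.size)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    slopes.append(b1)
    intercepts.append(b0)

# Averages should be close to the true values, reflecting E(b1) = beta1 and E(b0) = beta0.
print(np.mean(slopes), np.mean(intercepts))
```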

Variances of Estimators

In simple linear regression, the ordinary least squares (OLS) estimators \hat{\beta}_0 and \hat{\beta}_1 are random variables due to the randomness of the error terms, and their variances quantify the sampling variability around their expected values. Under the assumptions of the classical linear model, including linearity, strict exogeneity, homoscedasticity (constant error variance \sigma^2), and a non-constant predictor, the variance of the slope estimator is given by \text{Var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^n (x_i - \bar{x})^2}, where \sum_{i=1}^n (x_i - \bar{x})^2 denotes the total variation in the predictor variable x, often abbreviated as S_{xx}. This formula arises within the Gauss-Markov framework, under which the OLS estimators are the best linear unbiased estimators with minimum variance. The variance of the intercept estimator is \text{Var}(\hat{\beta}_0) = \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^n (x_i - \bar{x})^2} \right), reflecting contributions from both the sample size n and the position of the mean \bar{x} relative to the spread in x. Larger values of \sum_{i=1}^n (x_i - \bar{x})^2 reduce \text{Var}(\hat{\beta}_1), improving precision by leveraging greater dispersion in the predictors, while increasing n diminishes \text{Var}(\hat{\beta}_0) through the 1/n term. The covariance between \hat{\beta}_0 and \hat{\beta}_1 is \text{Cov}(\hat{\beta}_0, \hat{\beta}_1) = -\bar{x} \sigma^2 / \sum_{i=1}^n (x_i - \bar{x})^2, indicating a negative dependence that strengthens when \bar{x} is farther from zero. Since \sigma^2 is unknown in practice, it is estimated unbiasedly by s^2 = \sum_{i=1}^n e_i^2 / (n-2), where e_i = y_i - \hat{y}_i are the residuals and n-2 reflects the degrees of freedom lost to estimating two parameters. The standard error of the slope, s_{\hat{\beta}_1} = s / \sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}, then provides an estimate of \sqrt{\text{Var}(\hat{\beta}_1)} for inference purposes, assuming homoscedasticity holds. These expressions highlight how estimator precision depends on the error variance and the configuration of the predictor values; violations of homoscedasticity make these variance formulas unreliable.
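
A sketch of how these quantities are estimated from data, using illustrative arrays x and y: the residual variance s^2 uses n - 2 degrees of freedom, and the standard errors follow the variance formulas above.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.7, 12.1])
n = len(x)

s_xx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / s_xx
b0 = y.mean() - b1 * x.mean()

residuals = y - (b0 + b1 * x)
s2 = np.sum(residuals ** 2) / (n - 2)                 # unbiased estimate of sigma^2

se_b1 = np.sqrt(s2 / s_xx)                            # standard error of the slope
se_b0 = np.sqrt(s2 * (1 / n + x.mean() ** 2 / s_xx))  # standard error of the intercept

print(se_b0, se_b1)
```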

Inference Procedures

Confidence Intervals

In simple linear regression, confidence intervals quantify the uncertainty around estimates of the regression coefficients and predicted values by providing a range likely to contain the true population values at a specified confidence level, such as 95%. These intervals rely on the t-distribution with n-2 degrees of freedom to account for the additional variability introduced by estimating the error variance \sigma^2 with s^2. The standard errors used in these intervals derive from the variances of the estimators, ensuring the intervals reflect the sampling variability under the model's assumptions. The confidence interval for the slope coefficient \beta_1 is constructed as
b_1 \pm t_{\alpha/2, n-2} \, s_{b_1},
where s_{b_1} = s / \sqrt{\sum_{i=1}^n (x_i - \bar{x})^2} is the standard error of the slope, and s = \sqrt{\sum_{i=1}^n (y_i - \hat{y}_i)^2 / (n-2)} is the residual standard error. Similarly, the interval for the intercept \beta_0 is
b_0 \pm t_{\alpha/2, n-2} \, s_{b_0},
with s_{b_0} = s \sqrt{1/n + \bar{x}^2 / \sum_{i=1}^n (x_i - \bar{x})^2}. These intervals indicate the plausible range for the true coefficients, with narrower widths for larger sample sizes or stronger linear relationships.
For the mean response at a specific predictor value x_0, the confidence interval estimates the mean of y at that value and is given by
\hat{y}_0 \pm t_{\alpha/2, n-2} \, s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^n (x_i - \bar{x})^2}}.
This formula incorporates the distance of x_0 from the mean of the observed data, making the interval wider when x_0 is far from \bar{x}, since estimation away from the bulk of the data carries greater uncertainty.
The prediction interval for an individual future observation at x_0 extends beyond the mean response to account for both the uncertainty in the estimated mean and the inherent variability of a single new y, yielding
\hat{y}_0 \pm t_{\alpha/2, n-2} \, s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^n (x_i - \bar{x})^2}}.
The extra term of 1 under the square root ensures this interval is always wider than the mean-response interval, reflecting the added variability of individual observations. Intervals for both the mean response and individual predictions broaden with greater distance from the center of the observed predictor range, emphasizing the risks of extrapolation.
In large samples, an asymptotic approximation replaces the t critical value with the z critical value from the standard normal distribution, particularly when strict normality of errors is not assumed, though the t-based approach remains preferred for smaller samples.
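
A sketch of the interval computations on illustrative data, with an arbitrary new predictor value x0; scipy.stats.t supplies the t critical value, and the standard errors follow the formulas above.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.7, 12.1])
n = len(x)

s_xx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / s_xx
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))  # residual standard error

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)

# 95% confidence interval for the slope.
se_b1 = s / np.sqrt(s_xx)
print("slope CI:", (b1 - t_crit * se_b1, b1 + t_crit * se_b1))

# Mean-response CI and prediction interval at a new predictor value x0.
x0 = 3.5
y0_hat = b0 + b1 * x0
se_mean = s * np.sqrt(1 / n + (x0 - x.mean()) ** 2 / s_xx)
se_pred = s * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / s_xx)
print("mean response CI:", (y0_hat - t_crit * se_mean, y0_hat + t_crit * se_mean))
print("prediction interval:", (y0_hat - t_crit * se_pred, y0_hat + t_crit * se_pred))
```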

Hypothesis Testing

In simple linear regression, hypothesis testing is used to assess the significance of the relationship between the predictor variable x and the response variable y. The primary test focuses on the slope parameter \beta_1, evaluating whether it differs from zero, which would indicate no linear relationship. The null hypothesis is H_0: \beta_1 = 0 (no linear association), and the alternative hypothesis is H_a: \beta_1 \neq 0 (a linear association exists). The test statistic is the t-ratio, given by t = \frac{b_1}{s_{b_1}}, where b_1 is the estimated slope and s_{b_1} is its standard error. Under H_0, this statistic follows a t-distribution with n-2 degrees of freedom, where n is the sample size. For the overall model fit, an F-test examines whether the regression explains a significant portion of the variability in y. The null hypothesis is again H_0: \beta_1 = 0, testing whether the model fits better than a horizontal line through the mean of y. The test statistic is F = \frac{\text{MSR}}{\text{MSE}} = \frac{\text{SSR}/1}{\text{SSE}/(n-2)}, where SSR is the regression sum of squares (variability explained by the model) and SSE is the error sum of squares (unexplained variability). This simplifies to F = \frac{R^2}{1 - R^2} \cdot (n-2), with R^2 as the coefficient of determination. Under H_0, F follows an F-distribution with 1 and n-2 degrees of freedom. In simple linear regression, the F-test is mathematically equivalent to the t-test for the slope, as F = t^2. Decisions in hypothesis testing rely on p-values, which give the probability of observing a test statistic at least as extreme as the calculated value under H_0. The null hypothesis is rejected if the p-value is less than the significance level \alpha (commonly 0.05), indicating sufficient evidence of a linear relationship. Critical values from the t- or F-distribution can also be used for comparison. The analysis of variance (ANOVA) table provides a structured summary for these tests, decomposing the total sum of squares (SST) into SSR and SSE components: \text{SST} = \text{SSR} + \text{SSE}, where SST measures total variability in y around its mean. The table includes degrees of freedom (df: 1 for regression, n-2 for error, n-1 total), mean squares (MSR = SSR/1, MSE = SSE/(n-2)), the F-statistic, and the p-value. This breakdown quantifies how much variance the model captures versus random error. Power considerations in these tests highlight the importance of sample size for detecting true effects. The power (1 - \beta) is the probability of rejecting H_0 when it is false, and depends on the effect size (e.g., the standardized slope), the significance level \alpha, and n. For instance, detecting a small slope often requires larger samples, with formulas or software such as G*Power used to compute the required n for a desired power (e.g., 0.80).
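
The t-test for the slope and the equivalent F-test can be carried out directly. This sketch, with illustrative data, computes the t statistic, its two-sided p-value, and verifies F = t^2 using the ANOVA decomposition.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.3, 3.1, 4.8, 4.9, 6.2, 7.1, 7.8, 9.0])
n = len(x)

s_xx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / s_xx
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sse = np.sum((y - y_hat) ** 2)          # error sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)   # regression sum of squares
mse = sse / (n - 2)

# t-test for H0: beta1 = 0.
t_stat = b1 / np.sqrt(mse / s_xx)
p_value_t = 2 * stats.t.sf(abs(t_stat), n - 2)

# Overall F-test; in simple linear regression F equals t^2.
f_stat = (ssr / 1) / mse
p_value_f = stats.f.sf(f_stat, 1, n - 2)

print(t_stat, p_value_t)
print(f_stat, t_stat ** 2, p_value_f)
```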

Numerical Example

Data Setup

To illustrate simple linear regression, consider data from a sample of 10 students examining the relationship between height (in inches) and weight (in pounds). The dataset consists of the following paired observations:
Height (inches)    Weight (pounds)
63                 127
64                 121
66                 142
69                 157
69                 162
71                 156
71                 169
72                 165
73                 181
75                 208
Summary statistics for this dataset include a mean height \bar{X} = 69.3 inches, mean weight \bar{Y} = 158.8 pounds, standard deviation of height s_X \approx 3.92 inches, standard deviation of weight s_Y \approx 25.4 pounds, and sample correlation coefficient r \approx 0.95. A scatterplot of weight against height shows a strong positive linear trend, with points generally increasing from lower left to upper right and minimal deviation from a straight line, suggesting that height is a useful predictor of weight in this sample.

Computation Steps

To compute the ordinary least squares (OLS) estimates for the simple linear regression model using the example data, first calculate the sample means \bar{x} = 69.3 and \bar{y} = 158.8. The slope estimate b_1 is then computed using the formula b_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}, which yields b_1 \approx 6.14. The intercept estimate is b_0 = \bar{y} - b_1 \bar{x} = 158.8 - 6.14 \times 69.3 \approx -266.5. Thus, the fitted line is \hat{Y} = -266.5 + 6.14 X. Next, fitted values \hat{y}_i = b_0 + b_1 x_i and residuals e_i = y_i - \hat{y}_i are computed for each observation. The sum of squared residuals (SSE) is approximately 597. The unbiased estimate of the error variance is s^2 = \frac{\mathrm{SSE}}{n-2} = \frac{597}{8} \approx 75, where n = 10. The coefficient of determination R^2 measures the proportion of variance in Y explained by the model and equals R^2 = r^2 \approx (0.947)^2 \approx 0.90: the total sum of squares is \mathrm{SST} = (n-1) s_Y^2 \approx 5800, so \mathrm{SSE} = (1 - R^2)\,\mathrm{SST} \approx 0.103 \times 5800 \approx 597, consistent with the direct residual calculation.
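
The worked example can be reproduced end to end. This sketch uses the height-weight data from the table above and recomputes the slope, intercept, SSE, R^2, and error-variance estimate; small differences from the rounded values in the text are expected.

```python
import numpy as np

height = np.array([63, 64, 66, 69, 69, 71, 71, 72, 73, 75], dtype=float)
weight = np.array([127, 121, 142, 157, 162, 156, 169, 165, 181, 208], dtype=float)
n = len(height)

s_xx = np.sum((height - height.mean()) ** 2)
s_xy = np.sum((height - height.mean()) * (weight - weight.mean()))

b1 = s_xy / s_xx                          # approx. 6.14
b0 = weight.mean() - b1 * height.mean()   # approx. -266.5

fitted = b0 + b1 * height
sse = np.sum((weight - fitted) ** 2)      # approx. 597
sst = np.sum((weight - weight.mean()) ** 2)
r_squared = 1 - sse / sst                 # approx. 0.90
s2 = sse / (n - 2)                        # approx. 75

print(b0, b1, sse, r_squared, s2)
```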

Extensions and Alternatives

Regression Without Intercept

In simple linear regression without an intercept, also known as regression through the origin, the model assumes that the response variable Y_i is directly proportional to the predictor variable X_i, with no additive constant. The model is expressed as Y_i = \beta_1 X_i + \varepsilon_i, where the \varepsilon_i are random errors with mean zero and variance \sigma^2, and the fitted value is \hat{Y}_i = b_1 X_i. The least-squares estimator for the slope b_1 is obtained by minimizing the sum of squared residuals \sum_{i=1}^n (Y_i - b_1 X_i)^2. Differentiating this objective with respect to b_1 and setting it to zero yields the normal equation \sum X_i Y_i = b_1 \sum X_i^2, so b_1 = \frac{\sum_{i=1}^n X_i Y_i}{\sum_{i=1}^n X_i^2}. This estimator is unbiased under the model assumption that the true intercept is zero, but it becomes biased if the true model includes a nonzero intercept. This formulation is appropriate when theoretical or physical principles dictate that the relationship passes through the origin, such as in Hooke's law, where the restoring force of a spring is directly proportional to its displacement with no offset, or in experiments where zero input necessarily yields zero output. In contrast to the model with an intercept, the no-intercept version has only one parameter to estimate, so the degrees of freedom for estimating \sigma^2 are n-1 rather than n-2. When applying this model to data originally analyzed with an intercept, the resulting R^2, defined as 1 - \frac{\sum (Y_i - \hat{Y}_i)^2}{\sum Y_i^2}, is not comparable to the intercept model's R^2: it uses the uncorrected total sum of squares \sum Y_i^2 in place of the centered total sum of squares, so it is often inflated even though forcing the line through the origin increases the residual sum of squares, unless the true intercept is indeed zero.
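
A sketch of regression through the origin on illustrative data: the single slope estimate is the ratio of \sum X_i Y_i to \sum X_i^2, and the error-variance estimate uses n - 1 degrees of freedom.

```python
import numpy as np

# Illustrative roughly proportional data.
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
y = np.array([1.1, 1.9, 3.2, 4.1, 4.8, 6.2])
n = len(x)

b1 = np.sum(x * y) / np.sum(x ** 2)    # no-intercept least-squares slope
fitted = b1 * x
residuals = y - fitted

s2 = np.sum(residuals ** 2) / (n - 1)  # one parameter estimated, so n - 1 df
r2_uncorrected = 1 - np.sum(residuals ** 2) / np.sum(y ** 2)

print(b1, s2, r2_uncorrected)
```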

Other Fitting Methods

When the assumption of ordinary least squares (OLS) that errors occur only in the dependent variable fails, alternative fitting methods account for measurement errors in both the independent and dependent variables. These techniques, known as errors-in-variables models, minimize distances that are not solely vertical, providing more appropriate fits in scenarios such as calibration problems or physical measurements where both variables are subject to noise. Total least squares (TLS), also called orthogonal regression, minimizes the sum of the squared perpendicular (orthogonal) distances from data points to the fitted line, treating errors symmetrically in both variables. This approach is particularly suitable when the error variances in the independent variable x and dependent variable y are assumed equal. The slope b_1 in TLS is given by b_1 = \frac{s_Y^2 - s_X^2 + \sqrt{(s_Y^2 - s_X^2)^2 + 4 r^2 s_X^2 s_Y^2}}{2 r s_X s_Y}, where s_X and s_Y are the sample standard deviations of x and y, and r is the sample correlation coefficient; the intercept b_0 follows as b_0 = \bar{y} - b_1 \bar{x}. TLS can be computed via singular value decomposition of the data matrix, offering a geometrically intuitive solution for symmetric error structures. Deming regression extends TLS by incorporating a known ratio \lambda of the error variance in y to the error variance in x, weighting the distances accordingly to balance the contributions from errors in the two variables. The slope is then b_1 = \frac{s_Y^2 - \lambda s_X^2 + \sqrt{(s_Y^2 - \lambda s_X^2)^2 + 4 \lambda r^2 s_X^2 s_Y^2}}{2 r s_X s_Y}, and setting \lambda = 1 recovers the TLS slope. This method, originally formulated in the context of the statistical adjustment of data with measurement errors in both variables, is widely used in fields like analytical chemistry for method comparison. These alternatives are recommended when there is substantial measurement error in the predictors or when residuals should not be assumed vertical, such as in instrumental calibrations or biological assays. In contrast to OLS, which assumes no error in x and thus minimizes only vertical residuals, TLS and Deming regression treat errors in both variables symmetrically but can be less statistically efficient if the OLS assumptions hold, as they do not exploit the error-free predictor. Despite their advantages, TLS and Deming regression are more computationally involved than OLS and require an assumed error-variance ratio, which may introduce bias if misspecified; additionally, they assume homoscedastic errors and linearity, limiting applicability without further extensions.
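
A sketch of the total least squares and Deming slope computations using the formulas above, with illustrative data; lam denotes the assumed error-variance ratio described in the text, and setting it to 1 recovers the TLS fit.

```python
import numpy as np

x = np.array([1.0, 2.1, 2.9, 4.2, 5.1, 6.0])
y = np.array([1.2, 2.0, 3.1, 3.9, 5.2, 5.9])

s_xx = x.var(ddof=1)                 # sample variance of x
s_yy = y.var(ddof=1)                 # sample variance of y
s_xy = np.cov(x, y, ddof=1)[0, 1]    # sample covariance, equal to r * s_X * s_Y

def deming_slope(lam):
    """Deming slope for error-variance ratio lam; lam = 1 gives total least squares."""
    d = s_yy - lam * s_xx
    return (d + np.sqrt(d ** 2 + 4 * lam * s_xy ** 2)) / (2 * s_xy)

b1_tls = deming_slope(1.0)
b1_deming = deming_slope(2.0)            # illustrative ratio
b0_tls = y.mean() - b1_tls * x.mean()    # intercept through the point of means

print(b1_tls, b0_tls, b1_deming)
```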
