Line fitting
Line fitting is a statistical procedure for determining the straight line that best approximates a set of scattered data points in a two-dimensional plane, typically representing the linear relationship between an independent variable (often denoted x) and a dependent variable (often denoted y).[1] The primary goal is to minimize the discrepancies, or residuals, between the observed data and the predicted values on the line, enabling predictions, trend identification, and modeling of linear associations in fields such as statistics, economics, engineering, and the natural sciences.[2]

The most widely used technique for line fitting is the method of least squares, which constructs the line by minimizing the sum of the squared vertical residuals between each data point and the line.[3] For a dataset with n points (x_i, y_i), the line equation is \hat{y} = b_0 + b_1 x, where the slope is b_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2} and the intercept is b_0 = \bar{y} - b_1 \bar{x}, with \bar{x} and \bar{y} the means of the x and y values, respectively.[1] This approach assumes that errors occur primarily in the y-direction and follow a Gaussian distribution, yielding unbiased estimators under conditions such as linearity and homoscedasticity.[3] The goodness of fit is often assessed with the coefficient of determination r^2, which measures the proportion of the variance in y explained by the line and ranges from 0 to 1.[1]

Historically, the least squares method emerged in the early 19th century amid efforts to refine astronomical observations. Adrien-Marie Legendre first published the technique in 1805 in his work Nouvelles méthodes pour la détermination des orbites des comètes, presenting it as an algebraic tool for minimizing errors in planetary position calculations without probabilistic grounding.[4] Carl Friedrich Gauss, in his 1809 book Theoria Motus Corporum Coelestium, claimed prior development around 1795 and provided a theoretical justification based on the assumption of normally distributed errors, arguing that least squares yields the maximum likelihood estimate.[4] This sparked a priority dispute, though both contributions advanced the method's adoption; independently, the American mathematician Robert Adrain derived a similar approach in 1808 for surveying problems.[4]

While least squares dominates due to its mathematical simplicity and statistical properties, alternative methods address limitations such as errors in both variables or sensitivity to outliers. Total least squares (or orthogonal regression) minimizes perpendicular distances to the line and is suitable when measurement errors affect both x and y equally, as in calibration problems.[5] Robust techniques, such as M-estimation or median-based fitting, reduce the impact of outliers by replacing squared residuals with less sensitive loss functions.[6] These alternatives are particularly valuable for noisy datasets from physics or environmental science, though they often require more computational effort.[6]

In practice, line fitting underpins simple linear regression and extends to diagnostic tools such as residual plots, which are used to validate assumptions and detect non-linearity or heteroscedasticity.[3] Software such as MATLAB, R, or Python's SciPy implements these methods efficiently, facilitating applications from economic forecasting to biological growth modeling.[7]
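To make the least-squares formulas above concrete, the following Python sketch (one possible illustration, using NumPy and the SciPy library mentioned above) computes the slope b_1, intercept b_0, and r^2 for a small set of made-up data points and cross-checks the result against scipy.stats.linregress.

```python
import numpy as np
from scipy.stats import linregress

# Illustrative data (hypothetical values): x is the predictor, y the response.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.7, 5.2, 5.8])

# Closed-form least-squares estimates from the formulas above.
x_bar, y_bar = x.mean(), y.mean()
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)  # slope
b0 = y_bar - b1 * x_bar                                            # intercept

# Coefficient of determination r^2: proportion of variance in y explained.
y_hat = b0 + b1 * x
r_squared = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y_bar) ** 2)

# Cross-check with SciPy's implementation of the same estimator.
result = linregress(x, y)
print(b0, b1, r_squared)
print(result.intercept, result.slope, result.rvalue ** 2)
```

Because both computations implement ordinary least squares for the simple two-variable case, the closed-form estimates and the library call should agree to floating-point precision.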
Overview

Definition
Line fitting, also known as simple linear regression, is a fundamental statistical technique used to identify the straight line that best approximates a set of two-dimensional data points, each consisting of an observed pair (x_i, y_i) for i = 1 to n.[8][9] In this context, the line is typically represented by the equation y = mx + c, where m denotes the slope (the rate of change in y per unit change in x) and c is the y-intercept (the value of y when x = 0).[8][10] The primary purposes of line fitting include summarizing underlying trends in the data, predicting future values of the dependent variable y from the independent variable x, and modeling the linear relationship between two quantitative variables.[8][9] This approach assumes that the relationship can be adequately captured by a linear function, enabling quantitative analysis in fields such as economics, biology, and engineering.[11]

Line fitting models can be distinguished as either deterministic or stochastic. A deterministic model posits an exact relationship without accounting for variability or error, so that each y is precisely determined by x via the line equation.[12][13] In contrast, a stochastic model incorporates random error terms to reflect real-world measurement inaccuracies or unexplained variation, treating the observed points as realizations scattered around the true line.[13][14] The quality of the fit in these models can be interpreted geometrically as the line that minimizes deviations from the data points in a relevant metric.[15]
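As a brief illustration of the deterministic versus stochastic distinction, the sketch below (with a hypothetical slope, intercept, and noise level chosen only for illustration) generates data from the same line both ways: once exactly, and once with additive random error.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility

# Hypothetical true line: slope m and intercept c (made-up values).
m, c = 0.6, 41.0
x = np.linspace(0.0, 10.0, 20)

# Deterministic model: each y is exactly determined by x.
y_deterministic = m * x + c

# Stochastic model: observed y values scatter around the true line
# with additive random error of standard deviation sigma.
sigma = 1.5
y_stochastic = m * x + c + rng.normal(0.0, sigma, size=x.size)
```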
Geometric interpretations

In line fitting, the vertical distance quantifies the discrepancy between an observed data point (x_i, y_i) and the corresponding point on the fitted line (x_i, \hat{y}_i), defined as the residual e_i = y_i - \hat{y}_i, where \hat{y}_i = m x_i + c and m, c are the slope and intercept parameters.[16] This measure assumes that the independent variable x_i is measured without error, while deviations occur only in the dependent variable y_i, making it suitable for models where x is controlled or precisely known.[17]

The perpendicular, or orthogonal, distance provides a more symmetric geometric measure: the shortest Euclidean distance from a data point to the fitted line, expressed as d_i = \frac{|a x_i + b y_i + c|}{\sqrt{a^2 + b^2}} for the line equation a x + b y + c = 0. This distance treats errors in the x and y directions equally, projecting the point orthogonally onto the line rather than vertically.[18]

The distinction between vertical and perpendicular distances is critical: vertical distances are appropriate for asymmetric error models where predictions focus on y given x, whereas perpendicular distances account for isotropic errors in both variables, leading to a more balanced fit in scenarios such as calibration or principal component analysis. For instance, in ordinary least squares, residuals appear as vertical segments in visualizations, emphasizing vertical deviations, while total least squares uses perpendicular segments to minimize orthogonal offsets.[19]

Scatter plots illustrate these concepts effectively by overlaying the fitted line on the data cloud and depicting residual vectors as arrows from each point to the line. In a classic example using brushtail possum measurements, a scatter plot of head length versus total length shows the regression line \hat{y} = 41 + 0.59x with vertical residuals, such as -1.1 for the point (77.0, 85.3) and +7.45 for (85.0, 98.6), highlighting how points above the line contribute positive residuals and those below contribute negative ones.[16] Similarly, in studies of alcohol consumption and muscle strength, residual plots against fitted values show random scatter around zero if the linear model holds, with vertical distances mirroring deviations from the line in the original scatter plot.[17] These visualizations underscore the geometric quality of the fit, where tight clustering of residuals indicates good alignment.
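The two distance measures can be compared directly using the possum example quoted above; the short sketch below evaluates the signed vertical residual and the orthogonal distance to the line \hat{y} = 41 + 0.59x (the helper function names are illustrative, not from any particular library).

```python
import math

# Fitted line from the possum example in the text: y_hat = 41 + 0.59 * x.
m, c = 0.59, 41.0

def vertical_residual(x, y):
    """Signed vertical distance from (x, y) to the line y = m*x + c."""
    return y - (m * x + c)

def perpendicular_distance(x, y):
    """Shortest (orthogonal) distance from (x, y) to the same line,
    written in general form m*x - y + c = 0 (so a = m, b = -1)."""
    return abs(m * x - y + c) / math.sqrt(m ** 2 + 1.0)

print(vertical_residual(77.0, 85.3))       # about -1.1, as quoted above
print(vertical_residual(85.0, 98.6))       # about +7.45
print(perpendicular_distance(77.0, 85.3))  # smaller than |vertical residual|
```

For any point, the orthogonal distance equals the absolute vertical residual divided by \sqrt{1 + m^2}, so it is never larger than the vertical deviation.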
Mathematical foundations

Model specification
In line fitting, the parametric linear model describes the relationship between a response variable y_i and a predictor variable x_i for i = 1, \dots, n observations as

y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,
where \beta_0 is the intercept, \beta_1 is the slope, and \varepsilon_i is the error term representing the deviation from the true line.[20] This model posits that the expected value of y_i given x_i follows a straight line, with the errors capturing unexplained variability.

The error terms \varepsilon_i are typically assumed to be independent and identically distributed as \varepsilon_i \sim N(0, \sigma^2), that is, Gaussian with mean zero and constant variance \sigma^2.[21] This normality assumption facilitates statistical inference, such as confidence intervals and hypothesis tests for the parameters. The model can, however, be generalized to non-normal error distributions, as in generalized linear models, depending on the characteristics of the data.[21]

A key aspect of the error structure is homoscedasticity, in which the variance \sigma^2 remains constant across all levels of x_i; in contrast, heteroscedasticity occurs when the error variance varies with x_i, potentially requiring adjusted estimation techniques.[22] In the context of fitting a line to bivariate points, these errors usually correspond to vertical distances from the points to the line, though geometric interpretations may instead involve perpendicular distances.[20] Simple linear regression represents a special case of line fitting in which the predictors x_i are treated as fixed and non-stochastic and errors are confined to the response direction.[23]
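The assumptions in this specification can be illustrated by simulation. The sketch below (with arbitrary, made-up parameter values) draws homoscedastic Gaussian errors around a known line, refits the line with scipy.stats.linregress, and checks that the residuals behave roughly as the model predicts.

```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(1)  # seeded for reproducibility

# Hypothetical true parameters of the model y_i = beta0 + beta1*x_i + eps_i.
beta0, beta1, sigma = 2.0, 0.5, 0.3
n = 200
x = rng.uniform(0.0, 10.0, size=n)

# Homoscedastic errors: constant variance sigma^2 at every level of x.
eps = rng.normal(0.0, sigma, size=n)
y = beta0 + beta1 * x + eps

# Fit the line and inspect the residuals; under the model assumptions
# they should scatter around zero with roughly constant spread.
fit = linregress(x, y)
residuals = y - (fit.intercept + fit.slope * x)
print(fit.intercept, fit.slope)           # close to beta0 and beta1
print(residuals.mean(), residuals.std())  # near 0 and near sigma
```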