
Line fitting

Line fitting is a statistical procedure for determining the straight line that best approximates a set of scattered data points in a two-dimensional plane, typically representing the linear relationship between an independent variable (often denoted x) and a dependent variable (often denoted y). The primary goal is to minimize the discrepancies, or residuals, between the observed data and the predicted values on the line, enabling prediction, trend identification, and modeling of linear associations in fields such as statistics, economics, engineering, and the natural sciences.

The most widely used technique for line fitting is the method of least squares, which constructs the line by minimizing the sum of the squared vertical residuals between each data point and the line. For a dataset with n points (x_i, y_i), the fitted line is \hat{y} = b_0 + b_1 x, where the slope is b_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2} and the intercept is b_0 = \bar{y} - b_1 \bar{x}, with \bar{x} and \bar{y} the means of the x and y values, respectively. This approach assumes that errors occur primarily in the y-direction and follow a Gaussian distribution, yielding unbiased estimators under conditions such as independence and homoscedasticity. The quality of the fit is often assessed using the coefficient of determination r^2, which measures the proportion of variance in y explained by the line and ranges from 0 to 1.

Historically, the method emerged in the early 19th century amid efforts to refine astronomical observations. Adrien-Marie Legendre first published the technique in 1805 in his work Nouvelles méthodes pour la détermination des orbites des comètes, presenting it as an algebraic tool for minimizing errors in planetary position calculations without probabilistic grounding. Carl Friedrich Gauss, in his 1809 book Theoria Motus Corporum Coelestium, claimed prior development around 1795 and provided a theoretical justification based on the assumption of normally distributed errors, arguing that least squares yields the maximum likelihood estimate. This sparked a priority dispute, though both contributions advanced the method's adoption; independently, the American mathematician Robert Adrain derived a similar approach in 1808 in connection with surveying problems.

While ordinary least squares dominates due to its mathematical simplicity and statistical properties, alternative methods address limitations such as errors in both variables or sensitivity to outliers. Total least squares (or orthogonal regression) minimizes perpendicular distances to the line and is suitable when measurement errors affect both x and y equally, as in errors-in-variables problems. Robust techniques, such as M-estimation or median-based fitting, reduce the impact of outliers by using loss functions less sensitive than squared residuals. These alternatives are particularly valuable for noisy experimental datasets, though they often require more computational effort.

In practice, line fitting underpins regression analysis and extends to diagnostic tools such as residual plots to validate assumptions and detect non-linearity or heteroscedasticity. Statistical software implements these methods efficiently, facilitating applications ranging from instrument calibration to biological growth modeling.
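
The closed-form slope and intercept formulas above translate directly into code. The following minimal sketch (NumPy; the data values and variable names are illustrative, not taken from any source) fits a least-squares line and reports r^2:

```python
import numpy as np

# Small synthetic dataset (illustrative values only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x_bar, y_bar = x.mean(), y.mean()

# Slope: b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
# Intercept: b0 = y_bar - b1 * x_bar
b0 = y_bar - b1 * x_bar

y_hat = b0 + b1 * x                 # fitted values on the line
residuals = y - y_hat               # vertical residuals
r_squared = 1 - np.sum(residuals ** 2) / np.sum((y - y_bar) ** 2)

print(f"y_hat = {b0:.3f} + {b1:.3f} x,  r^2 = {r_squared:.3f}")
```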

Overview

Definition

Line fitting is a fundamental statistical technique used to identify the straight line that best approximates a set of two-dimensional data points, each consisting of an observed pair (x_i, y_i) for i = 1 to n. In this context, the line is typically represented by the equation y = mx + c, where m denotes the slope (indicating the rate of change in y per unit change in x) and c is the intercept (the value of y when x = 0). The primary purposes of line fitting include summarizing underlying trends in the data, predicting values of the dependent variable y based on the independent variable x, and modeling the linear relationship between two quantitative variables. This approach assumes that the relationship can be adequately captured by a straight line, enabling analysis and prediction in fields such as economics, engineering, and the natural sciences. Line fitting models can be distinguished as either deterministic or stochastic. A deterministic model posits an exact relationship without accounting for variability or error, such that each y is precisely determined by x via the line equation. In contrast, a stochastic model incorporates random error terms to reflect real-world inaccuracies or unexplained variation, treating the observed points as realizations scattered around the true line. The quality of the fit in these models can be interpreted geometrically as the degree to which the line minimizes deviations from the data points under a chosen distance measure.

Geometric interpretations

In line fitting, the vertical distance quantifies the discrepancy between an observed data point (x_i, y_i) and the corresponding point on the fitted line (x_i, \hat{y}_i), defined as the residual e_i = y_i - \hat{y}_i, where \hat{y}_i = m x_i + c and m, c are the slope and intercept parameters. This measure assumes that the independent variable x_i is measured without error, while deviations occur only in the dependent variable y_i, making it suitable for models where x is controlled or precisely known. The perpendicular, or orthogonal, distance provides a more symmetric geometric measure by calculating the shortest distance from a data point to the fitted line, expressed as d_i = \frac{|a x_i + b y_i + c|}{\sqrt{a^2 + b^2}} for the line equation a x + b y + c = 0. This treats errors in the x and y directions equally, projecting the point orthogonally onto the line rather than vertically.

The distinction between vertical and perpendicular distances is critical: vertical distances are appropriate for asymmetric error models where predictions focus on y given x, whereas perpendicular distances account for isotropic errors in both variables, leading to a more balanced fit in scenarios such as errors-in-variables modeling. For instance, in ordinary least squares, residuals appear as vertical segments in visualizations, emphasizing vertical deviations, while total least squares uses perpendicular segments to minimize orthogonal offsets.

Scatter plots effectively illustrate these concepts by overlaying the fitted line on the data cloud and depicting residuals as arrows from each point to the line. In a classic example using brushtail possum measurements, a scatter plot of head length versus total length shows the regression line \hat{y} = 41 + 0.59x with vertical residuals, such as -1.1 for the point (77.0, 85.3) and +7.45 for (85.0, 98.6), highlighting how points above the line contribute positive residuals and those below contribute negative ones. Similarly, in studies of alcohol consumption and muscle strength, plots of residuals versus fitted values reveal random scatter around zero if the linear model holds, with vertical distances mirroring deviations from the line in the original scatter plot. These visualizations underscore the geometric quality of the fit, where tight clustering of residuals indicates good alignment.
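
To make the two distance measures concrete, the sketch below computes both the vertical residual and the perpendicular distance from a point to a line y = m x + c, rewritten as m x - y + c = 0 (a hedged NumPy illustration; the helper names are hypothetical):

```python
import numpy as np

def vertical_residual(x_i, y_i, m, c):
    """Vertical residual e_i = y_i - (m*x_i + c)."""
    return y_i - (m * x_i + c)

def perpendicular_distance(x_i, y_i, m, c):
    """Orthogonal distance from (x_i, y_i) to the line m*x - y + c = 0,
    i.e. |a*x + b*y + c| / sqrt(a^2 + b^2) with a = m, b = -1."""
    return abs(m * x_i - y_i + c) / np.sqrt(m ** 2 + 1)

# Example line and point (same slope/intercept as the possum example above).
m, c = 0.59, 41.0
x_i, y_i = 77.0, 85.3
print(vertical_residual(x_i, y_i, m, c))       # about -1.1
print(perpendicular_distance(x_i, y_i, m, c))  # somewhat smaller in magnitude
```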

Mathematical foundations

Model specification

In line fitting, the parametric model describes the relationship between a response y_i and a predictor x_i for i = 1, \dots, n observations as
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,
where \beta_0 is the intercept, \beta_1 is the slope, and \varepsilon_i is the random error term representing the deviation from the true line. This model posits that the conditional mean of y_i given x_i follows a straight line, with the errors capturing unexplained variability.
The error terms \varepsilon_i are typically assumed to be independent and identically distributed as \varepsilon_i \sim N(0, \sigma^2), implying a normal distribution with mean zero and constant variance \sigma^2. This normality assumption facilitates inference, such as confidence intervals and hypothesis tests for the parameters. However, the model can be generalized to non-normal error distributions, as in generalized linear models, depending on the data characteristics. A key aspect of the error structure is homoscedasticity, where the variance \sigma^2 remains constant across all levels of x_i; in contrast, heteroscedasticity occurs when the error variance varies with x_i, potentially requiring adjusted estimation techniques. In the context of fitting a line to bivariate points, these errors typically correspond to vertical distances from the points to the line, though other geometric interpretations may involve perpendicular distances. Simple linear regression represents a special case of line fitting in which the predictors x_i are treated as fixed and non-stochastic and errors are confined to the response direction.
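
As an illustration of this stochastic specification, the following sketch simulates observations from y_i = \beta_0 + \beta_1 x_i + \varepsilon_i with independent, homoscedastic normal errors (the parameter values are assumptions chosen for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

beta0, beta1, sigma = 2.0, 0.5, 1.0      # assumed true parameters
n = 50
x = np.linspace(0.0, 10.0, n)            # fixed, non-stochastic predictor

eps = rng.normal(loc=0.0, scale=sigma, size=n)   # epsilon_i ~ N(0, sigma^2)
y = beta0 + beta1 * x + eps                      # observed responses

# The conditional mean E[y | x] lies exactly on the true line;
# the observed points scatter vertically around it.
conditional_mean = beta0 + beta1 * x
```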

Objective functions

In line fitting, the objective function defines the mathematical criterion for selecting the parameters that yield the "best" approximating line to a set of data points, typically by minimizing a measure of discrepancy or error. The most common approach is the least-squares criterion, which minimizes the sum of the squared vertical residuals between the observed points and the line. For a simple linear model y_i = \beta_0 + \beta_1 x_i + \epsilon_i, where i = 1, \dots, n, the objective function is S(\beta_0, \beta_1) = \sum_{i=1}^n (y_i - (\beta_0 + \beta_1 x_i))^2. This method assumes the errors \epsilon_i are independent and identically distributed, often normally, and focuses on vertical distances, implying that measurement imprecision is confined to the dependent variable y. It was first formally proposed by Adrien-Marie Legendre in 1805 for orbit determination and justified probabilistically by Carl Friedrich Gauss in 1809 as the maximum likelihood estimator under Gaussian errors. Geometrically, these residuals represent vertical distances from data points to the fitted line, emphasizing errors solely in the response variable.

When errors are present in both the independent and dependent variables, orthogonal regression (also called total least squares) provides an alternative by minimizing the sum of squared perpendicular (orthogonal) distances from points to the line. The objective function becomes S(\beta_0, \beta_1) = \sum_{i=1}^n \frac{(y_i - \beta_0 - \beta_1 x_i)^2}{1 + \beta_1^2}, which treats both variables symmetrically and is suitable for errors-in-variables models. This criterion was introduced by R. J. Adcock in 1878 as an extension of least squares for cases with equal error variances in x and y.

Other objective functions address specific data characteristics, such as robustness or varying error scales. The least absolute deviations (L1 norm) criterion minimizes the sum of absolute residuals, S(\beta_0, \beta_1) = \sum_{i=1}^n |y_i - (\beta_0 + \beta_1 x_i)|, yielding estimates of the conditional median and offering greater resistance to outliers than squared errors; it dates to Roger Joseph Boscovich's work in 1757 on fitting lines to astronomical observations. For heteroscedastic errors, where the residual variance changes with x, weighted least squares incorporates weights w_i (often inversely proportional to estimated variances) into the sum of squared residuals, S(\beta_0, \beta_1) = \sum_{i=1}^n w_i (y_i - (\beta_0 + \beta_1 x_i))^2, to achieve more efficient estimates by downweighting high-variance observations.

These objectives involve trade-offs in robustness and efficiency. Squared-error methods such as ordinary least squares are sensitive to outliers, as large deviations disproportionately influence the fit due to quadratic penalization, potentially leading to biased lines in contaminated data. In contrast, the L1 norm's linear penalization reduces outlier impact, promoting stability but at the cost of lower efficiency under Gaussian errors compared to least squares. Orthogonal and weighted variants balance these concerns by adapting to the error structure, though they require assumptions about error equality or variance forms.
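
These criteria can be written as explicit loss functions of (\beta_0, \beta_1). The sketch below (NumPy; illustrative only, since an actual fit would minimize these in closed form or numerically) mirrors the four objective functions defined above:

```python
import numpy as np

def sse(beta0, beta1, x, y):
    """Ordinary least squares: sum of squared vertical residuals."""
    r = y - (beta0 + beta1 * x)
    return np.sum(r ** 2)

def orthogonal_sse(beta0, beta1, x, y):
    """Sum of squared perpendicular distances (total least squares criterion)."""
    r = y - (beta0 + beta1 * x)
    return np.sum(r ** 2 / (1.0 + beta1 ** 2))

def lad(beta0, beta1, x, y):
    """Least absolute deviations (L1 norm) criterion."""
    return np.sum(np.abs(y - (beta0 + beta1 * x)))

def wsse(beta0, beta1, x, y, w):
    """Weighted least squares with observation weights w_i."""
    r = y - (beta0 + beta1 * x)
    return np.sum(w * r ** 2)

# Evaluate two criteria at a trial line (illustrative data and parameters).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
print(sse(1.0, 2.0, x, y), lad(1.0, 2.0, x, y))
```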

Estimation methods

Ordinary least squares

Ordinary least squares (OLS) is the foundational estimation method for fitting a straight line to data points in simple linear regression, where the model takes the form y_i = \beta_0 + \beta_1 x_i + \epsilon_i for i = 1, \dots, n, with errors \epsilon_i confined to the response variable y and assumed to be independent, identically distributed, and normally distributed with mean zero and constant variance \sigma^2. This approach minimizes the sum of squared vertical residuals, \sum_{i=1}^n (y_i - \hat{y}_i)^2, where \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i represents the predicted values on the fitted line.

To derive the OLS estimators, the objective function (the residual sum of squares) is differentiated with respect to \beta_0 and \beta_1, and the resulting normal equations are solved simultaneously. The slope estimator is \hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}, where \bar{x} and \bar{y} are the sample means of the predictors and responses, respectively. The intercept estimator follows as \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}. These closed-form expressions arise from the orthogonality conditions requiring the residuals to be uncorrelated with the constant term and with the predictor values.

In matrix notation, the simple linear model can be expressed as \mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\epsilon}, where \mathbf{y} is the n \times 1 vector of observations, \mathbf{X} is the n \times 2 design matrix with a first column of ones and a second column of the x_i values, \boldsymbol{\beta} = (\beta_0, \beta_1)^T is the parameter vector, and \boldsymbol{\epsilon} is the error vector. The OLS estimator is then \hat{\boldsymbol{\beta}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}, assuming \mathbf{X}^T \mathbf{X} is invertible, which requires no perfect collinearity (here, ensured by variation among the x_i).

Computationally, OLS proceeds in steps: first, calculate the sample means \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i and \bar{y} = \frac{1}{n} \sum_{i=1}^n y_i; next, compute the covariance-like numerator \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) and the denominator \sum_{i=1}^n (x_i - \bar{x})^2; then apply the formulas for \hat{\beta}_1 and \hat{\beta}_0. This process scales efficiently for simple cases and forms the basis for implementations in statistical software.

Under the classical linear model assumptions of linearity in parameters, strict exogeneity (E[\epsilon_i | \mathbf{X}] = 0), homoscedasticity (Var(\epsilon_i | \mathbf{X}) = \sigma^2), and no perfect collinearity, the OLS estimators are unbiased, meaning E[\hat{\boldsymbol{\beta}}] = \boldsymbol{\beta}. Furthermore, by the Gauss-Markov theorem, they possess the minimum variance among all linear unbiased estimators, rendering them the best linear unbiased estimators (BLUE). This efficiency holds even without normality of the errors, though normality enables additional inference procedures such as t-tests.
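
A minimal sketch of these computations, using both the matrix form \hat{\boldsymbol{\beta}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y} and the summation formulas (NumPy; a least-squares solver is used in place of an explicit inverse, which is a common numerical practice rather than anything prescribed here):

```python
import numpy as np

def ols_fit(x, y):
    """Return (beta0_hat, beta1_hat) via the matrix formulation."""
    X = np.column_stack([np.ones_like(x), x])         # n x 2 design matrix
    # Solve the normal equations X^T X beta = X^T y in a numerically stable way.
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat[0], beta_hat[1]

def ols_fit_closed_form(x, y):
    """Equivalent computation from the summation formulas."""
    x_bar, y_bar = x.mean(), y.mean()
    b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    return y_bar - b1 * x_bar, b1

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])
print(ols_fit(x, y), ols_fit_closed_form(x, y))   # the two approaches agree
```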

Total least squares

Total least squares (TLS), also known as orthogonal regression, addresses line fitting scenarios where measurement errors are present in both the independent variable x and the dependent variable y, treating them symmetrically. Unlike ordinary least squares, which minimizes vertical residuals assuming error only in y, TLS minimizes the sum of the squared perpendicular (orthogonal) distances from the points to the fitted line. This approach is particularly suitable for errors-in-variables problems or when both variables are measured with comparable precision.

The mathematical formulation of TLS for fitting a line to centered data points (x_i - \bar{x}, y_i - \bar{y}) involves constructing the centered data matrix \mathbf{X} = \begin{bmatrix} x_1 - \bar{x} & y_1 - \bar{y} \\ \vdots & \vdots \\ x_n - \bar{x} & y_n - \bar{y} \end{bmatrix} and finding the line that minimizes the Frobenius norm of the perturbation, equivalent to solving \mathbf{X} \mathbf{v} \approx \mathbf{0} with \|\mathbf{v}\| = 1. This is achieved through the singular value decomposition (SVD) of \mathbf{X} = U \Sigma V^T, where the line's normal vector is the right singular vector \mathbf{v} corresponding to the smallest singular value. The slope m of the line is then m = -\frac{v_{1}}{v_{2}}, where v_{1} and v_{2} are the first and second components of \mathbf{v}, respectively; the intercept is c = \bar{y} - m \bar{x}. This SVD-based solution is numerically stable and computationally efficient for large datasets.

TLS is mathematically equivalent to performing principal component analysis on the centered data matrix [\mathbf{x} \ \mathbf{y}], where the first principal component (the direction of maximum variance) aligns with the fitted line, thereby minimizing the variance in the perpendicular direction. In the context of the errors-in-variables (EIV) model, TLS assumes that the errors in x and y are independent, normally distributed, and have equal variances (\sigma_x^2 = \sigma_y^2), which justifies the isotropic perturbation in both dimensions. This assumption distinguishes TLS from more general EIV models that allow unequal variances. Orthogonal regression represents the unweighted, symmetric special case of TLS, emphasizing geometric fidelity over directional bias.
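
The SVD recipe can be sketched directly, under the stated assumption of equal error variances; the helper below is illustrative and assumes the fitted line is not vertical:

```python
import numpy as np

def tls_line(x, y):
    """Total least squares (orthogonal) line fit: returns (slope, intercept)."""
    x_bar, y_bar = x.mean(), y.mean()
    # Centered n x 2 data matrix [x - x_bar, y - y_bar].
    X = np.column_stack([x - x_bar, y - y_bar])
    # The right singular vector for the smallest singular value is the
    # normal vector (v1, v2) of the fitted line.
    _, _, vt = np.linalg.svd(X)
    v = vt[-1]                       # last row of V^T: smallest-singular-value direction
    slope = -v[0] / v[1]             # m = -v1 / v2 (requires a non-vertical line)
    intercept = y_bar - slope * x_bar
    return slope, intercept

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 1.1, 1.9, 3.2, 3.9])
print(tls_line(x, y))    # slope and intercept close to (1, 0) for this data
```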

Other variants

Deming regression extends total least squares by incorporating a known ratio \delta = \sigma_\epsilon^2 / \sigma_\eta^2 of the error variance in the y direction to that in the x direction, allowing a weighted treatment of measurement errors in both variables. This method is particularly useful in method comparison studies where errors in the predictor are non-negligible but their relative magnitude to response errors is estimated or assumed. The slope estimate \beta_1 is given by \beta_1 = \frac{s_{yy} - \delta s_{xx} + \sqrt{(s_{yy} - \delta s_{xx})^2 + 4 \delta s_{xy}^2}}{2 s_{xy}}, where s_{yy}, s_{xx}, and s_{xy} denote the sample variances and covariance, respectively. The intercept \beta_0 follows from \beta_0 = \bar{y} - \beta_1 \bar{x}. Originally outlined in foundational work on statistical adjustment, Deming regression provides unbiased estimates under the specified error structure and is implemented in statistical software for practical applications such as analytical chemistry.

Robust methods address violations of standard assumptions, such as the presence of outliers, by downweighting or excluding anomalous data points during estimation. The random sample consensus (RANSAC) algorithm iteratively samples minimal subsets of data to hypothesize model parameters, then evaluates which points fit the hypothesized line within a tolerance threshold, selecting the model with the largest consensus set of inliers for robustness against gross errors. Introduced for computer vision tasks like line fitting in imagery, RANSAC has been widely adopted in engineering and image analysis for its ability to handle up to 50% outliers without distributional assumptions. Iteratively reweighted least squares (IRLS), another robust approach, minimizes a weighted sum of squared residuals in which the weights are updated at each iteration according to a robust loss function (e.g., Huber's \psi function) that reduces the influence of large residuals, until convergence. This method, effective for moderate contamination levels, approximates M-estimators and is computationally efficient for line fitting in datasets with leverage points or heteroscedasticity.

Bayesian line fitting treats the parameters as random variables, deriving posterior distributions by combining a likelihood (typically Gaussian) with priors on the slope, intercept, and error variance to quantify uncertainty and incorporate prior information. For a simple linear model y_i = \beta_0 + \beta_1 x_i + \epsilon_i, conjugate normal-inverse-gamma priors yield closed-form posteriors, while non-conjugate cases use Markov chain Monte Carlo (MCMC) sampling to approximate the full posterior for inference on parameters and predictions. MCMC methods enable exploration of complex posteriors in scenarios with informative priors, providing credible intervals that adapt to data sparsity or prior strength, as seen in geophysical and astronomical line fitting applications.

While the above variants remain parametric, non-parametric alternatives like kernel smoothing offer flexible fitting by locally weighting observations without assuming a fixed functional form, using kernel functions (e.g., Gaussian) to estimate the regression function. The Nadaraya-Watson estimator, \hat{m}(x) = \frac{\sum_{i=1}^n K\left(\frac{x - x_i}{h}\right) y_i}{\sum_{i=1}^n K\left(\frac{x - x_i}{h}\right)}, where K is the kernel and h the bandwidth, provides smooth fits robust to mild nonlinearity but requires bandwidth selection to balance bias and variance. These methods serve as extensions when parametric assumptions fail, though they are computationally intensive for large datasets.
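
As one concrete instance of these variants, the Deming slope formula quoted above can be implemented in a few lines. A hedged sketch (NumPy; the function name and data are illustrative, and \delta = 1 recovers orthogonal regression):

```python
import numpy as np

def deming_fit(x, y, delta=1.0):
    """Deming regression (intercept, slope) for a known error-variance ratio
    delta = (error variance in y) / (error variance in x).
    delta = 1 corresponds to orthogonal (total least squares) regression."""
    n = len(x)
    x_bar, y_bar = x.mean(), y.mean()
    s_xx = np.sum((x - x_bar) ** 2) / (n - 1)
    s_yy = np.sum((y - y_bar) ** 2) / (n - 1)
    s_xy = np.sum((x - x_bar) * (y - y_bar)) / (n - 1)
    beta1 = (s_yy - delta * s_xx
             + np.sqrt((s_yy - delta * s_xx) ** 2 + 4 * delta * s_xy ** 2)) / (2 * s_xy)
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 1.9, 3.1, 4.2])
print(deming_fit(x, y, delta=1.0))
```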

Assumptions and diagnostics

Underlying assumptions

Line fitting, particularly through methods like ordinary least squares (OLS), relies on several key statistical assumptions to ensure the validity of inferences about the model parameters. These assumptions underpin the theoretical properties of the estimators, such as unbiasedness and efficiency under the Gauss-Markov theorem.

The primary assumption is linearity, which posits that the true relationship between the response variable y and the predictor variable x is linear in the parameters. This means the model can be expressed as y_i = \beta_0 + \beta_1 x_i + \epsilon_i, where the errors \epsilon_i capture deviations from this linear form, ensuring that the conditional mean satisfies E(y_i | x_i) = \beta_0 + \beta_1 x_i. Violations of linearity, such as nonlinear relationships, can lead to biased parameter estimates in OLS.

Another crucial assumption is the independence of errors, requiring that the disturbances \epsilon_i are independent across observations. This independence typically arises from random sampling of the data, preventing issues like autocorrelation that could arise in time-series contexts. Independence ensures that the variance-covariance matrix of the errors is diagonal, supporting the efficiency of OLS estimators.

Homoscedasticity assumes constant variance of the errors, specifically \text{Var}(\epsilon_i) = \sigma^2 for all i, regardless of the value of x_i. This equal spread of residuals around the fitted line is essential for the OLS estimator to achieve minimum variance among linear unbiased estimators, as per the Gauss-Markov theorem. Departures from homoscedasticity, such as heteroscedasticity, can result in inefficient estimates and invalid standard errors.

For statistical inference, such as constructing confidence intervals or performing t-tests on the coefficients, the errors are assumed to be normally distributed: \epsilon_i \sim N(0, \sigma^2). This normality assumption facilitates exact finite-sample inference under OLS, although it is less critical for large samples because of the central limit theorem. While OLS remains consistent without normality, violations can affect the reliability of hypothesis tests.

Finally, the absence of perfect multicollinearity is assumed, meaning the predictors are not linearly dependent. In simple line fitting with a single predictor, this condition is inherently satisfied unless the predictor is constant across all observations, avoiding singular matrices and ensuring unique estimates. In extensions to multiple predictors, perfect multicollinearity would render some parameters inestimable. Violations of these assumptions can compromise the properties of OLS estimators, leading to biased, inconsistent, or inefficient results.
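
The unbiasedness promised under these assumptions can be checked empirically with a small Monte Carlo experiment; the sketch below uses assumed parameter values purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
beta0_true, beta1_true, sigma = 1.0, 2.0, 1.0   # assumed true parameters
n, n_sims = 30, 5000
x = np.linspace(0.0, 1.0, n)                    # fixed design, not constant

slopes = np.empty(n_sims)
for s in range(n_sims):
    # Generate data satisfying linearity, independence, homoscedasticity, normality.
    y = beta0_true + beta1_true * x + rng.normal(0.0, sigma, size=n)
    x_bar, y_bar = x.mean(), y.mean()
    slopes[s] = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)

# The average estimated slope is close to the true slope (unbiasedness).
print(slopes.mean())   # approximately 2.0
```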

Model validation techniques

Model validation techniques in line fitting involve statistical methods to evaluate the adequacy of the fitted line, ensuring it meets key assumptions such as linearity, homoscedasticity, and independence of errors, while also assessing overall fit and identifying potential issues like outliers. These techniques are essential after estimation to confirm the reliability of the model for prediction and inference.

Residual analysis is a primary diagnostic tool, in which residuals, defined as the differences between observed and fitted values, e_i = y_i - \hat{y}_i, are examined to verify model assumptions. Plotting residuals against fitted values helps detect non-linearity (if a systematic pattern or curvature appears), heteroscedasticity (if the variance changes across fitted values), and outliers (if points lie far from zero). For time-series or ordered data, the Durbin-Watson test assesses autocorrelation in the residuals; the statistic d ranges from 0 to 4, with values near 2 indicating no autocorrelation, while deviations suggest positive (d < 2) or negative (d > 2) serial correlation. Geometric residuals, which measure perpendicular distances to the line, can provide an alternative view but are less commonly used than vertical residuals for these checks.

Goodness of fit is often quantified using the coefficient of determination, R^2, which measures the proportion of total variance in the response variable explained by the model: R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2} = 1 - \frac{SS_{res}}{SS_{tot}}, where SS_{res} is the residual sum of squares and SS_{tot} is the total sum of squares. Values of R^2 closer to 1 indicate better fit, though it should be interpreted cautiously, as it neither implies causation nor accounts for model complexity.

Hypothesis tests provide formal statistical evaluation of the model's parameters. The t-test for the slope \beta_1 tests the null hypothesis H_0: \beta_1 = 0 against the alternative of a non-zero slope, using the statistic t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} with n-2 degrees of freedom; a low p-value rejects the null, indicating a significant linear relationship. The overall F-test assesses whether the model explains significant variance beyond an intercept-only model, with the statistic F = \frac{MS_{reg}}{MS_{res}} following an F distribution with 1 and n-2 degrees of freedom for line fitting; rejection of H_0: \beta_1 = 0 supports model utility.

Outlier and influential-point detection uses measures such as leverage and Cook's distance to identify data points that disproportionately affect the fit. The leverage h_i quantifies how far an observation's predictor value lies from the mean of the predictors, with high-leverage points (typically h_i > \frac{2(p+1)}{n}, where p = 1 for simple regression) potentially pulling the line toward them; values near 1 indicate extreme influence from position alone. Cook's distance D_i combines leverage and residual size to measure overall influence on the fitted coefficients, calculated as D_i = \frac{e_i^2 h_i}{(p+1) s^2 (1 - h_i)^2}; points with D_i > \frac{4}{n} are commonly flagged as influential. These diagnostics, introduced in seminal work on regression diagnostics, help ensure robust model validation.
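
These diagnostics follow directly from the formulas in this section. The sketch below (NumPy; p = 1 for simple regression, and the function is an illustration rather than a reference implementation) computes R^2, the slope t-statistic, leverages, and Cook's distances:

```python
import numpy as np

def diagnostics(x, y):
    n, p = len(x), 1
    x_bar, y_bar = x.mean(), y.mean()
    b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    b0 = y_bar - b1 * x_bar
    y_hat = b0 + b1 * x
    e = y - y_hat                                    # residuals

    ss_res = np.sum(e ** 2)                          # residual sum of squares
    ss_tot = np.sum((y - y_bar) ** 2)                # total sum of squares
    r_squared = 1.0 - ss_res / ss_tot

    s2 = ss_res / (n - 2)                            # residual variance estimate
    se_b1 = np.sqrt(s2 / np.sum((x - x_bar) ** 2))   # standard error of the slope
    t_stat = b1 / se_b1                              # t-test of H0: beta1 = 0

    # Leverage and Cook's distance for each observation.
    h = 1.0 / n + (x - x_bar) ** 2 / np.sum((x - x_bar) ** 2)
    cooks_d = e ** 2 * h / ((p + 1) * s2 * (1.0 - h) ** 2)
    return r_squared, t_stat, h, cooks_d

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.2, 3.9, 6.1, 8.3, 9.7, 12.2])
r2, t, leverage, cooks_d = diagnostics(x, y)
```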

Applications and extensions

Core applications

Line fitting, particularly through simple linear regression, serves as a foundational tool for modeling the relationship between two continuous variables, enabling predictions of one based on the other. A classic example is predicting body weight from height, where data from college students have been used to fit a line showing that weight increases by approximately 4.85 pounds per inch of height, allowing estimates for new observations. This approach assumes a straight-line relationship and minimizes squared errors to quantify how changes in the predictor influence the response.

In growth studies, line fitting estimates rates of change over time or across conditions, such as in biological growth curves where linear models describe body weight development in animals. For instance, researchers apply linear fits to track body weight or other physiological changes over time, providing slopes that represent average growth rates per unit time, which inform evolutionary studies and breeding strategies in species such as woody perennials. These applications highlight line fitting's role in capturing linear trends amid natural variability, without delving into nonlinear dynamics.

Calibration represents another core use, where line fitting establishes response curves for analytical instruments, such as in absorption spectroscopy to relate absorbance to concentration. Standards of known concentration are measured to construct a linear calibration plot, typically following Beer's law, enabling accurate quantification of unknown samples by interpolating from the fitted line. This method ensures precision in fields such as clinical chemistry, where deviations from linearity signal potential errors in measurement procedures.

In statistical quality control, line fitting integrates with control charts to monitor process stability, such as tracking linear profiles in manufacturing, where regression lines set upper and lower bounds for acceptable variation. For example, regression control charts detect shifts in the slope or intercept of a trend, such as part dimensions over time, and can outperform traditional charts when correlations between variables are present. This facilitates early intervention to maintain product consistency.
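
For example, a linear calibration curve can be fitted to standards of known concentration and then inverted to estimate an unknown sample; the sketch below uses made-up absorbance values for illustration:

```python
import numpy as np

# Calibration standards: known concentrations and measured absorbances
# (the numbers here are illustrative, not real measurements).
conc = np.array([0.0, 10.0, 20.0, 40.0, 80.0])
absorbance = np.array([0.002, 0.105, 0.198, 0.409, 0.801])

# Fit absorbance = m * concentration + c by ordinary least squares.
m, c = np.polyfit(conc, absorbance, deg=1)

# Invert the calibration line to estimate the concentration of an unknown sample.
unknown_absorbance = 0.300
estimated_conc = (unknown_absorbance - c) / m
print(f"Estimated concentration: {estimated_conc:.1f}")
```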

Advanced extensions

Line fitting extends to multivariate settings through multiple linear regression, which generalizes the simple linear model to incorporate multiple predictors. The model is expressed as y_i = \beta_0 + \sum_{j=1}^p \beta_j x_{ij} + \epsilon_i, or in matrix form as \mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\epsilon}, where \mathbf{y} is the n \times 1 response vector, \mathbf{X} is the n \times (p+1) design matrix with a column of ones for the intercept, \boldsymbol{\beta} is the (p+1) \times 1 parameter vector, and \boldsymbol{\epsilon} is the error term assumed to be normally distributed with mean zero and constant variance. The parameters are estimated using ordinary least squares (OLS), yielding \hat{\boldsymbol{\beta}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}, which minimizes the sum of squared residuals and provides unbiased estimates under the standard assumptions of linearity, independence, homoscedasticity, and no perfect multicollinearity. This matrix formulation facilitates computational efficiency for large datasets and enables extensions such as hypothesis testing via the F-statistic for overall model significance and t-tests for individual coefficients. In practice, multiple linear regression is foundational in fields such as econometrics and the social sciences, where it models relationships involving several covariates, though it requires careful handling of multicollinearity through variance inflation factors.

Generalized linear models (GLMs) further extend line fitting to cases where the response variable does not follow a normal distribution, accommodating non-normal errors common in categorical or count data. Introduced by Nelder and Wedderburn, GLMs unify linear regression with models like logistic and Poisson regression under a common framework consisting of a linear predictor \eta = \mathbf{X} \boldsymbol{\beta}, a link function g(\mu) = \eta that connects the mean \mu of the response to the linear predictor, and an exponential-family distribution for the response. For binary outcomes, the logistic GLM uses the logit link \log\left(\frac{\mu}{1-\mu}\right) = \eta, enabling probability estimation via maximum likelihood, which is solved iteratively using methods like iteratively reweighted least squares. This approach maintains the interpretability of linear coefficients as log-odds ratios while relaxing normality assumptions, making it suitable for outcomes like disease prevalence or event counts.

Nonlinear least squares addresses scenarios where the relationship between predictors and response is inherently nonlinear, such as in exponential growth models or pharmacokinetic curves. The objective is to minimize \sum_{i=1}^n (y_i - f(\mathbf{x}_i; \boldsymbol{\theta}))^2, where f(\cdot; \boldsymbol{\theta}) is a nonlinear function parameterized by \boldsymbol{\theta}, often requiring iterative optimization since closed-form solutions are unavailable. A widely adopted method is the Levenberg-Marquardt algorithm, which blends gradient descent and Gauss-Newton approaches by solving (\mathbf{J}^T \mathbf{J} + \lambda \mathbf{I}) \Delta \boldsymbol{\theta} = \mathbf{J}^T \mathbf{r}, where \mathbf{J} is the Jacobian matrix of partial derivatives, \mathbf{r} is the residual vector, and \lambda is a damping parameter that ensures stability near local minima. This iterative procedure converges rapidly under suitable conditions, providing reliable fits for models in physics and biology, though initial parameter guesses and computational cost can pose challenges.
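
As a concrete instance of the damped update (\mathbf{J}^T \mathbf{J} + \lambda \mathbf{I}) \Delta \boldsymbol{\theta} = \mathbf{J}^T \mathbf{r}, the sketch below fits an assumed exponential model y \approx a e^{b x} with a minimal Levenberg-Marquardt loop (the model, synthetic data, and step-acceptance rule are illustrative simplifications):

```python
import numpy as np

def levenberg_marquardt_exp(x, y, theta0, lam=1e-3, n_iter=100):
    """Fit y ~ a * exp(b * x) with a minimal Levenberg-Marquardt loop.
    theta = (a, b); lam is the damping parameter."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        a, b = theta
        f = a * np.exp(b * x)                  # model predictions
        r = y - f                              # residual vector
        # Jacobian of f with respect to (a, b).
        J = np.column_stack([np.exp(b * x), a * x * np.exp(b * x)])
        # Damped normal equations: (J^T J + lam * I) delta = J^T r.
        A = J.T @ J + lam * np.eye(2)
        delta = np.linalg.solve(A, J.T @ r)
        candidate = theta + delta
        new_r = y - candidate[0] * np.exp(candidate[1] * x)
        if np.sum(new_r ** 2) < np.sum(r ** 2):
            theta, lam = candidate, lam * 0.5  # accept step, reduce damping
        else:
            lam *= 2.0                         # reject step, increase damping
    return theta

rng = np.random.default_rng(2)
x = np.linspace(0.0, 2.0, 40)
y = 1.5 * np.exp(0.8 * x) + rng.normal(0.0, 0.05, size=x.size)
print(levenberg_marquardt_exp(x, y, theta0=(1.0, 1.0)))   # close to (1.5, 0.8)
```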
In high-dimensional settings where the number of predictors p exceeds the sample size n, standard OLS fails because \mathbf{X}^T \mathbf{X} is singular; regularization techniques such as ridge regression and the lasso mitigate this by adding penalties to the loss function. Ridge regression, proposed by Hoerl and Kennard, minimizes \|\mathbf{y} - \mathbf{X} \boldsymbol{\beta}\|^2 + \lambda \|\boldsymbol{\beta}\|^2, yielding \hat{\boldsymbol{\beta}} = (\mathbf{X}^T \mathbf{X} + \lambda \mathbf{I})^{-1} \mathbf{X}^T \mathbf{y}, which shrinks coefficients toward zero to stabilize estimates in the presence of multicollinearity while retaining all predictors. The lasso, introduced by Tibshirani, employs an L1 penalty via \|\mathbf{y} - \mathbf{X} \boldsymbol{\beta}\|^2 + \lambda \|\boldsymbol{\beta}\|_1, promoting sparsity by setting some coefficients exactly to zero and thus performing simultaneous variable selection and estimation; solutions are computed using algorithms such as coordinate descent. These methods are crucial in genomics and other high-dimensional applications where p \gg n, with the lasso's sparsity enabling interpretable models, though tuning \lambda via cross-validation is essential for good performance.
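
The ridge estimator has the closed form given above, which makes it easy to sketch; the example below (NumPy; synthetic data, with the usual caveat that in practice the intercept is left unpenalized and \lambda is tuned by cross-validation) contrasts it with the lasso, which has no closed form:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge regression: beta_hat = (X^T X + lam * I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Illustrative setting with p close to n (synthetic data).
rng = np.random.default_rng(3)
n, p = 20, 15
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.0, 0.5]             # only a few nonzero coefficients
y = X @ beta_true + rng.normal(0.0, 0.1, size=n)

beta_ridge = ridge_fit(X, y, lam=1.0)         # shrunk toward zero, none exactly zero
# The lasso's L1 penalty has no closed-form solution; it is typically solved
# iteratively (e.g., by coordinate descent), producing exact zeros in beta.
```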
