
Local regression

Local regression, also known as LOESS (locally estimated scatterplot smoothing) or LOWESS (locally weighted scatterplot smoothing), is a non-parametric statistical method that estimates the relationship between variables by fitting simple models, such as low-degree polynomials, to localized subsets of data points, producing a smooth curve that captures underlying trends without assuming a global functional form. This approach weights observations based on their proximity to the point of interest, with closer points receiving higher influence, allowing it to adapt flexibly to complex, nonlinear patterns in scatterplots.

Developed by statistician William S. Cleveland in the late 1970s, local regression builds on earlier smoothing techniques by incorporating robust weighting schemes to mitigate the impact of outliers and enhance reliability in noisy data. The method gained prominence through Cleveland's foundational work, including robust variants that use iterative procedures like bisquare weights to downweight deviant points, making it suitable for real-world datasets with noise or anomalies. By the late 1980s, extensions such as multivariate LOESS further broadened its applicability to higher-dimensional data.

Key features of local regression include adjustable parameters such as the smoothing span (bandwidth), which controls the neighborhood size, and the choice of weight functions like the tricube, ensuring smooth transitions across the fitted curve. It excels in data exploration, diagnostic checking of parametric models, and presentation of fitted relationships, particularly for visualizing stressor-response relationships in fields like ecology and environmental science. Unlike rigid parametric methods, its minimal assumptions about data structure make it versatile for large datasets, though it requires careful tuning to balance bias and variance.

Overview

Definition and intuition

Local regression, also known as LOESS (locally estimated scatterplot smoothing) or LOWESS (locally weighted scatterplot smoothing), is a nonparametric method for fitting smooth curves or surfaces to data by performing separate low-degree polynomial regressions on localized subsets of the data around each point of interest. This approach estimates the underlying relationship between predictor and response variables without assuming a fixed global functional form, making it particularly useful for exploring complex, nonlinear patterns in datasets. Intuitively, local regression mimics the flexibility of nonlinear modeling while retaining the simplicity of linear least squares, as it builds the fit piecewise by focusing on nearby data points rather than the entire dataset.

For a target point x_0, the method selects a neighborhood of data points within a specified bandwidth h, assigns higher weights to points closer to x_0 (typically using a decreasing kernel function like the tricube weight), and fits a low-degree polynomial—often linear or quadratic—via weighted least squares to this subset, evaluating the polynomial at x_0 to obtain the smoothed estimate. This local weighting ensures that the fit adapts to variations in the data's structure, such as changes in slope or curvature, producing a smooth overall regression curve that can reveal trends hidden in global models. In a simple univariate example, consider a scatterplot of response values against a predictor exhibiting a wavy, nonlinear trend amid random noise, such as measurements of temperature fluctuations over time; local regression generates a fitted curve that closely tracks the underlying oscillations by averaging local linear trends, resulting in a visually intuitive smoothed line that highlights the data's essential pattern without overgeneralizing distant outliers.

The acronym LOWESS, coined by William S. Cleveland in his 1979 work on robust smoothing for scatterplots, emphasizes the weighted scatterplot aspect for univariate cases, while LOESS, formalized in Cleveland and Susan J. Devlin's 1988 extension, broadens the technique to multivariate local regression for more general predictor structures.
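
To make this intuition concrete, the following minimal sketch (Python with NumPy; the simulated data, the local_linear_at helper, and the 0.3 smoothing fraction are illustrative assumptions, not part of any standard library) computes the smoothed value at a single target point by weighting its nearest neighbors with the tricube function and fitting a straight line by weighted least squares; sweeping the target over a grid traces out the full LOESS-style curve.

import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))                      # predictor
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)        # noisy, wavy response

def local_linear_at(x0, x, y, frac=0.3):
    # the k nearest neighbors of x0 form the local subset
    k = max(2, int(frac * len(x)))
    d = np.abs(x - x0)
    idx = np.argsort(d)[:k]
    w = (1 - (d[idx] / d[idx].max()) ** 3) ** 3            # tricube weights
    # weighted least squares for an intercept and slope, centered at x0
    X = np.column_stack([np.ones(k), x[idx] - x0])
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X, XtW @ y[idx])
    return beta[0]                                         # smoothed estimate at x0

grid = np.linspace(0, 10, 101)
curve = np.array([local_linear_at(g, x, y) for g in grid])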

Comparison to other regression methods

Local regression, also known as LOESS or LOWESS, differs fundamentally from parametric methods such as linear or polynomial regression by avoiding global functional assumptions, instead fitting simple models locally to subsets of the data weighted by proximity to each evaluation point. This non-parametric approach excels at capturing complex non-linear relationships without presupposing a specific form, making it suitable for exploratory analysis where the underlying trend is unknown or irregular. However, it sacrifices the interpretability of parametric models, which provide explicit coefficients representing global effects, and requires tuning parameters like the span or bandwidth that can complicate inference.

In comparison to kernel regression, such as the Nadaraya-Watson estimator, local regression extends the kernel approach by employing local polynomial fits—typically of degree 1 or 2—rather than locally constant approximations, which allows for higher-order bias reduction and better adaptation to curvature. Both methods use kernel weights (e.g., tricube or Gaussian) to emphasize nearby observations, but local regression's polynomial component provides smoother estimates with reduced boundary bias, though at the expense of slightly higher computational demands for least-squares solving.

Local regression contrasts with spline-based methods, like cubic splines, by being fully data-adaptive without requiring predefined knots or global penalties to enforce smoothness across the entire domain. This locality makes it more flexible for irregular or sparse data, where splines might over-smooth due to fixed knot placement, but local regression is computationally more intensive for large datasets owing to repeated local fits, unlike the efficient matrix-based solving in splines.

Relative to nearest-neighbor methods, such as k-nearest neighbors regression, local regression emphasizes weighted local polynomial fitting over simple averaging of neighbors, enabling better handling of non-uniform data densities and smoother trends through the incorporation of local trends within neighborhoods. While both are instance-based and non-parametric, local regression's kernel weighting and polynomial fitting yield more robust predictions against noise, though it demands more parameters to tune compared to selecting just k.
Method | Assumptions | Flexibility | Typical Use Cases
Parametric (e.g., linear regression) | Global functional form (e.g., linearity) | Low; fixed structure | Well-understood linear relationships; interpretable predictions
Kernel Regression (e.g., Nadaraya-Watson) | Local smoothness via fixed bandwidth | Moderate; constant/linear fits | Simple smoothing; basic non-parametric estimation
Splines (e.g., cubic) | Piecewise polynomials with knots | High; global smoothness constraints | Structured data with known breakpoints
Nearest Neighbors | Local averaging in fixed neighborhood | Low; no trend modeling | Quick non-parametric prototyping
Local Regression (LOESS/LOWESS) | Local smoothness | High; adaptive to data patterns | Scatterplot smoothing; complex non-linear exploration
GAMs | Additive smooth components | Moderate; assumes additivity | Multivariate modeling with interactions minimized

Historical Development

Early smoothing techniques

Early smoothing techniques in statistics emerged primarily in the context of actuarial graduation and time series analysis, laying foundational concepts for localized fitting by addressing irregular data through weighted local averages. In the nineteenth century, actuaries developed methods to graduate mortality tables, smoothing raw counts of deaths to produce reliable life tables for insurance purposes. A key approach involved weighted averages applied locally to adjacent age groups, balancing fidelity to observed data with overall smoothness. W.S.B. Woolhouse's 1870 method exemplified this, using osculatory interpolation—essentially local polynomial fitting of low degree—to adjust mortality rates while minimizing bias from sampling variability. This technique influenced later local regression by emphasizing neighborhood-based adjustments, as discussed in historical reviews of actuarial smoothing.

The Whittaker-Henderson graduation method, introduced in the 1920s, represented a significant precursor to penalized smoothing. Developed by R. Henderson in 1916 and refined by E. T. Whittaker in 1923, it minimized a combination of squared deviations from the raw data and squared second (or higher-order) differences to enforce smoothness, akin to a discrete analog of penalizing curvature in smoothing splines. This approach was widely adopted for mortality and other actuarial data, providing a framework for local fits that penalize roughness without global parametric assumptions.

In the early twentieth century, time series analysis extended these ideas through moving averages, which compute local means over sliding windows to reduce noise in sequential data. F.R. Macaulay's 1931 work formalized moving averages for economic time series, treating them as simple uniform-weighted local regressions of degree zero. Exponential smoothing, introduced by C.C. Holt in the 1950s, advanced this by applying decaying weights to past observations, effectively performing weighted local averaging with an exponentially decreasing kernel for trend and seasonal forecasting. These methods highlighted the value of local weighting in capturing underlying patterns amid volatility.

By the 1960s, smoothing techniques incorporated robustness against outliers, with running medians gaining prominence as a non-parametric local estimator resistant to extreme values. John Tukey's exploratory data analysis in the late 1960s promoted running medians for initial scatterplot smoothing, computing medians over moving windows to provide a robust baseline before parametric fitting. These pre-Cleveland innovations marked a transition toward robust local methods, emphasizing outlier detection and downweighting in neighborhood-based estimation to improve reliability in noisy datasets.

Modern contributions

The foundational modern contributions to local regression began with William S. Cleveland's introduction of LOWESS (locally weighted scatterplot smoothing) in 1979, a robust smoother designed for scatterplots by fitting weighted regressions locally around each point, emphasizing resistance to outliers through iterative reweighting. This approach addressed limitations in earlier techniques by incorporating robustness, making it suitable for exploratory analysis in noisy data. In 1988, Cleveland and Susan J. Devlin extended LOWESS to LOESS (local regression), adapting it for multivariate predictors through local polynomial fitting, with a matrix-based implementation that facilitated efficient computation and included diagnostic tools for model assessment. Key innovations in these works included the span parameter as an analog to the bandwidth for controlling smoothness, selection of polynomial degree (typically 0 or 1 for simplicity), and iterative reweighting schemes using robust loss functions like the bisquare to downweight influential outliers.

Subsequent theoretical advancements were provided by Jianqing Fan and Irene Gijbels in their 1996 monograph, which rigorously analyzed local polynomial modelling, establishing asymptotic properties such as bias and variance rates that guide optimal bandwidth selection and degree choice for improved estimation efficiency. Loader's 1999 monograph Local Regression and Likelihood further broadened the framework by integrating local likelihood methods, enabling extensions to generalized linear models and density estimation while maintaining computational tractability through efficient algorithms. Post-2000 developments have primarily involved integrating local regression into machine learning pipelines, such as approximating Gaussian processes for scalable inference on large datasets, though without introducing major paradigm shifts in the core methodology as of 2025.

Model Formulation

Local polynomial regression

Local polynomial regression estimates the regression function m(x) at a point x by fitting a polynomial of degree p to the data points in a local neighborhood around x, using weights that decrease with distance from x. This approach approximates the unknown function locally as m(x + u) \approx \sum_{j=0}^p \beta_j u^j, where the coefficients \beta = (\beta_0, \dots, \beta_p)^\top are determined via weighted least squares, and the estimate is \hat{m}(x) = \hat{\beta}_0. The coefficient vector is obtained by solving the minimization problem: \hat{\beta}(x) = \arg\min_{\beta} \sum_{i=1}^n K\left( \frac{x_i - x}{h} \right) \left( y_i - \sum_{j=0}^p \beta_j (x_i - x)^j \right)^2, where K(\cdot) is a kernel function that assigns higher weights to observations closer to x, and h > 0 is the bandwidth controlling the size of the local neighborhood. This formulation ensures that the fit is adapted to the local structure of the data, with the kernel K typically being a symmetric, positive, and bounded function such as the Epanechnikov or Gaussian kernel.

In the univariate case, the model directly applies the above criterion, centering the polynomial terms at (x_i - x) to improve numerical stability and reduce collinearity among the powers, which would otherwise arise from high-degree polynomials evaluated at distant points. This centering shifts the design matrix to have a column of ones and subsequent columns with mean zero near x, mitigating ill-conditioning in the least-squares solution. For multivariate extensions to predictors in d > 1 dimensions, local polynomial regression, as implemented in methods like LOESS (locally estimated scatterplot smoothing), fits polynomials using a tensor product or truncated power basis to handle interactions among variables. LOESS typically employs polynomials of degree up to 2, with weights based on distances scaled by variable standard deviations, allowing flexible surface estimation without assuming a global parametric form.

A key example is the choice of polynomial degree p. When p = 0, the model reduces to a local constant fit, equivalent to the Nadaraya-Watson estimator \hat{m}(x) = \sum_{i=1}^n w_i(x) y_i with weights w_i(x) = K((x_i - x)/h) / \sum_j K((x_j - x)/h), which provides a simple weighted average but can exhibit pronounced bias at boundaries. In contrast, for p = 1 (local linear), the fit incorporates a slope term, yielding second-order accuracy and better adaptation to linear trends, as analyzed in early theoretical work on local smoothers.
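
A compact sketch of this formulation (Python with NumPy; the epanechnikov and local_poly_fit names, the example data, and the bandwidth are illustrative assumptions) solves the weighted least squares problem for an arbitrary degree p and returns the full coefficient vector, so beta[0] estimates m(x) and, when p >= 1, beta[1] estimates the first derivative at x; setting p=0 reproduces the Nadaraya-Watson weighted average.

import numpy as np

def epanechnikov(u):
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def local_poly_fit(x0, x, y, h, p=1):
    # kernel weights for the neighborhood of x0
    w = epanechnikov((x - x0) / h)
    keep = w > 0
    # design matrix with columns (x_i - x0)^j, j = 0..p
    X = np.vander(x[keep] - x0, N=p + 1, increasing=True)
    XtW = X.T * w[keep]
    beta = np.linalg.solve(XtW @ X, XtW @ y[keep])
    return beta            # beta[0] = m_hat(x0); beta[1] estimates m'(x0) if p >= 1

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-2, 2, 300))
y = x**3 - x + rng.normal(scale=0.2, size=x.size)
beta_hat = local_poly_fit(0.5, x, y, h=0.4, p=2)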

Estimation procedure

The estimation procedure for local regression computes fitted values by performing a series of local weighted fits around each point in the predictor space. For a target point x, the algorithm first identifies a local subset of data points, typically the nearest f n observations (where f is the smoothing fraction and n is the sample size) or points within a bandwidth h, to define the neighborhood. Weights are then assigned to these points using a kernel function, such as the tricube kernel, with w_i = K\left( \frac{\|x_i - x\|}{d(x)} \right), where d(x) is the distance to the farthest neighbor in the subset and higher weights favor closer points. These weights enter a weighted least squares minimization to estimate the coefficients of a low-degree polynomial (usually linear or quadratic), and the local fit is evaluated at x to obtain the estimate \hat{y}(x). The full curve or surface is constructed by repeating this for all desired x.

In the LOESS variant, which builds on LOWESS for added flexibility, the procedure incorporates iterations for robustness. After an initial fit, residuals are calculated, and robustness weights (e.g., via the bisquare function) downweight points with large residuals relative to a robust scale estimate of the residuals. Updated combined weights are used to refit the local model, with iterations (typically 1–4) continuing until changes in the fit are negligible, such as when the root mean squared residual stabilizes.

The naive computational complexity is O(n^2), as each of up to n local fits may examine all n points. For large n, efficiencies like binning or k-d tree-based nearest neighbor searches reduce the cost per fit to O(\log n) or better, enabling near-linear overall scaling. For multivariate predictors, distances are computed using the Euclidean metric (after optional standardization of the predictors), and the local subset is selected accordingly before applying the weighted fit. While full multivariate polynomials can be used, coordinate-wise smoothing—fitting univariate local regressions sequentially along each dimension—offers a practical alternative in high dimensions to avoid excessive parameters. A pseudocode outline for univariate LOWESS, emphasizing the iterative robustness steps, is provided below:
for each evaluation point x_i (i = 1 to n):
    Select the nearest f*n data points to x_i based on |x_j - x_i|
    Compute initial kernel weights w_j = (1 - (|x_j - x_i| / d_i)^3)^3 for j in neighbors, where d_i is distance to f*n-th nearest
    Fit local polynomial (degree 1 or 2) via weighted least squares using w_j to get initial ŷ_i and local fitted values at x_j
    for iteration = 1 to max_iterations (e.g., 4):
        Compute residuals r_j = y_j - fitted at x_j for neighbors
        Compute median absolute deviation MAD = median(|r_j - median(r)|)
        Compute robustness weights rw_j = [1 - (r_j / (6 * MAD))^2]^2 if |r_j| < 6 * MAD, else 0
        Combine weights w_total_j = w_j * rw_j (keeping the original kernel weights w_j fixed)
        Refit local polynomial using w_total_j to get new ŷ_i
    Assign final ŷ_i as the estimate at x_i
This structure ensures robustness by iteratively suppressing outlier influence while maintaining local adaptivity.
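
For reference, statsmodels ships an implementation of essentially this algorithm; in the usage sketch below (simulated data assumed for illustration), frac plays the role of the fraction f above and it sets the number of robustness iterations.

import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 10, 300))
y = np.sin(x) + rng.normal(scale=0.4, size=x.size)
y[::50] += 3.0                                   # a few gross outliers

# frac ~ smoothing fraction f; it = number of bisquare robustness iterations
smoothed = lowess(y, x, frac=0.4, it=3, return_sorted=True)
x_sorted, y_hat = smoothed[:, 0], smoothed[:, 1]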

Matrix representation

The matrix representation of local polynomial regression provides a compact algebraic framework for the estimator, enabling efficient computation and analysis of its properties. For a fixed evaluation point x in the univariate case, the local design matrix \mathbf{X}(x) is an n \times (p+1) matrix, where n is the number of data points and p is the polynomial degree, with the i-th row given by [1, (x_i - x), (x_i - x)^2, \dots, (x_i - x)^p]. The corresponding diagonal weight matrix \mathbf{W}(x) has entries K((x_i - x)/h), where K is the kernel function and h is the bandwidth. The local parameter vector is then estimated via weighted least squares as \hat{\boldsymbol{\beta}}(x) = [\mathbf{X}(x)^T \mathbf{W}(x) \mathbf{X}(x)]^{-1} \mathbf{X}(x)^T \mathbf{W}(x) \mathbf{y}, where \mathbf{y} is the response vector, and the fitted value is \hat{m}(x) = \mathbf{e}_1^T \hat{\boldsymbol{\beta}}(x) with \mathbf{e}_1 = (1, 0, \dots, 0)^T.

To obtain fitted values across all data points, local regression can be expressed globally using the smoother matrix (or hat matrix) \mathbf{H}, an n \times n matrix whose i-th row consists of the effective weights from the local fit at x_i, specifically \mathbf{e}_1^T [\mathbf{X}(x_i)^T \mathbf{W}(x_i) \mathbf{X}(x_i)]^{-1} \mathbf{X}(x_i)^T \mathbf{W}(x_i). The vector of fitted values is thus \hat{\mathbf{y}} = \mathbf{H} \mathbf{y}. This matrix form facilitates key diagnostic and inferential tools: the effective degrees of freedom of the fit is given by \operatorname{tr}(\mathbf{H}), which quantifies the model's complexity and aids in bandwidth selection, while the diagonal elements H_{ii} serve as leverage measures to identify influential observations. In the multivariate setting, with predictor dimension d > 1, the weight matrix \mathbf{W}(x) incorporates tensor products of univariate kernels, typically as product kernels K((x - x_i)/h) = \prod_{j=1}^d K_j((x_{ij} - x_j)/h_j) to handle anisotropic smoothing across dimensions, while the design matrix extends to multivariate polynomial terms.
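
The smoother matrix is easy to materialize directly for moderate n; the sketch below (NumPy, local linear fit with a Gaussian kernel; the function names, data, and bandwidth are illustrative assumptions) builds H row by row, then reads the effective degrees of freedom from its trace and the leverages from its diagonal.

import numpy as np

def gaussian(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def smoother_matrix(x, h, p=1):
    # row i holds the effective weights of the local degree-p fit at x_i
    n = len(x)
    H = np.zeros((n, n))
    e1 = np.zeros(p + 1)
    e1[0] = 1.0
    for i in range(n):
        X = np.vander(x - x[i], N=p + 1, increasing=True)
        W = np.diag(gaussian((x - x[i]) / h))
        H[i] = e1 @ np.linalg.solve(X.T @ W @ X, X.T @ W)
    return H

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 1, 80))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)
H = smoother_matrix(x, h=0.1)
y_hat = H @ y                      # fitted values
edf = np.trace(H)                  # effective degrees of freedom
leverage = np.diag(H)              # influence diagnostics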

Tuning Parameters

Bandwidth selection

The bandwidth h in local regression controls the degree of locality in the fitting process, determining the size of the neighborhood around each point used for estimation. A small h produces wiggly fits that closely follow the data, resulting in low bias but high variance, while a large h yields smoother fits with reduced variance at the cost of increased bias.

Cross-validation methods are widely used for bandwidth selection by minimizing an estimate of prediction error. Leave-one-out cross-validation (LOOCV) fits the model excluding the i-th observation and minimizes the criterion \sum_{i=1}^n (y_i - \hat{m}_{-i}(x_i))^2, where \hat{m}_{-i} is the fit without the i-th point; this approach avoids in-sample optimism and is computationally feasible due to efficient update formulas in local polynomial settings. Generalized cross-validation (GCV) extends LOOCV by penalizing for the effective degrees of freedom, approximating the trace of the smoothing matrix to select h in a way that balances fit and complexity without refitting the model n times.

Plug-in selectors derive h from asymptotic approximations to the mean integrated squared error (MISE). A common rule-of-thumb for local linear regression assumes Gaussian errors and uses h \approx 1.06 \sigma n^{-1/5}, where \sigma is the residual standard deviation and n is the sample size; this provides a simple initial choice based on the optimal asymptotic bandwidth scaling. More refined plug-in methods iteratively estimate higher-order derivatives and roughness penalties to solve for the MISE-minimizing h, often outperforming cross-validation in finite samples for smooth functions.

Adaptive bandwidth selection allows h to vary locally, adjusting to data density or heteroscedasticity for improved efficiency in non-uniform settings. For example, in regions of sparse data or high variance, a larger local h increases the effective sample size, while denser areas use smaller h to capture detail; this can be implemented via variable kernel weighting or nearest-neighbor scaling. Such methods are particularly useful when residual variance changes with the predictor, as in heteroscedastic errors.

Diagnostics for bandwidth choice often involve visual or quantitative assessment post-fitting. Inspecting standardized residuals for patterns of systematic over- or undersmoothing helps identify whether h is too large (residuals retain systematic structure) or too small (the fit tracks noise); roughness measures, such as integrated squared second derivatives, can quantify oversmoothing by comparing the fitted curve's roughness to expected noise levels.
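
A small grid search (NumPy sketch; the hat-matrix helper mirrors the one sketched in the matrix-representation section, and the leave-one-out shortcut for linear smoothers is an approximation) scores candidate bandwidths by LOOCV and GCV:

import numpy as np

def gaussian(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def hat_matrix(x, h, p=1):
    n = len(x)
    H = np.zeros((n, n))
    e1 = np.zeros(p + 1)
    e1[0] = 1.0
    for i in range(n):
        X = np.vander(x - x[i], N=p + 1, increasing=True)
        W = np.diag(gaussian((x - x[i]) / h))
        H[i] = e1 @ np.linalg.solve(X.T @ W @ X, X.T @ W)
    return H

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

for h in (0.02, 0.05, 0.1, 0.2, 0.4):
    H = hat_matrix(x, h)
    resid = y - H @ y
    n, edf = len(x), np.trace(H)
    loocv = np.mean((resid / (1 - np.diag(H))) ** 2)       # leave-one-out shortcut
    gcv = n * np.sum(resid**2) / (n - edf) ** 2            # generalized cross-validation
    print(f"h={h:.2f}  edf={edf:5.1f}  LOOCV={loocv:.4f}  GCV={gcv:.4f}")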

Polynomial degree and weights

In local regression, the choice of polynomial degree p balances bias and variance in the local fits. For p = 0, the model is locally constant, equivalent to a weighted average akin to the Nadaraya-Watson kernel estimator, which provides straightforward smoothing but can suffer from higher bias, especially near data boundaries. Local linear regression with p = 1 fits a straight line within each neighborhood, reducing bias compared to the constant case while maintaining low variance. Higher degrees, such as p = 2 (quadratic) or p = 3 (cubic), allow for better capture of local curvature but are rarely employed in practice due to substantially increased variance, more parameters to estimate, and greater sensitivity to outliers.

Key tradeoffs in selecting p involve the bias-variance dilemma and boundary performance. Odd degrees like p = 1 are generally preferred over even degrees because they promote symmetry in the local fits and yield superior behavior at data edges, where even degrees may introduce artifacts from asymmetric extrapolation. Asymptotically, p = 1 often proves optimal for minimizing mean squared error (MSE) under common regularity conditions, achieving minimax efficiency rates comparable to the best possible nonparametric estimators.

Weights in local regression are derived from kernel functions that downweight distant observations, ensuring the local fit emphasizes nearby data points. These kernels must be nonnegative, symmetric about zero, and integrate to 1 over their support to preserve proper normalization and approximate unbiasedness in the local estimates. Popular choices for smooth weighting include the tri-cubic kernel, K(u) = (1 - |u|^3)^3 for |u| \leq 1 (and 0 otherwise), valued for its compact support and smoothness, and the Gaussian kernel, K(u) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{u^2}{2} \right), which provides infinite support but decays rapidly. To enhance computational efficiency, kernels are typically truncated such that weights are nonzero only within a bounded effective support, often scaled to [-1, 1] after division by the bandwidth, beyond which contributions are negligible or explicitly set to zero.

For illustration, consider simulated data from a nonlinear function like \sin(x) + 0.5 \cos(2x) with added noise and 100 observations; the local linear fit (p = 1) demonstrates reduced bias and better adaptation to curvature near boundaries relative to the local constant fit (p = 0), resulting in closer alignment with the underlying trend.
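
A sketch of that illustration (NumPy; the simulated data, bandwidth, and helper names are assumptions chosen for display) fits the same tricube-weighted neighborhoods with p = 0 and p = 1 so the two curves can be compared directly:

import numpy as np

def tricube(u):
    au = np.abs(u)
    return np.where(au < 1, (1 - au**3) ** 3, 0.0)

def fit_at(x0, x, y, h, p):
    w = tricube((x - x0) / h)
    keep = w > 0
    X = np.vander(x[keep] - x0, N=p + 1, increasing=True)
    XtW = X.T * w[keep]
    beta = np.linalg.solve(XtW @ X, XtW @ y[keep])
    return beta[0]

rng = np.random.default_rng(5)
x = np.sort(rng.uniform(0, 2 * np.pi, 100))
y = np.sin(x) + 0.5 * np.cos(2 * x) + rng.normal(scale=0.3, size=x.size)

grid = np.linspace(x.min(), x.max(), 200)
h = 1.0
constant_fit = np.array([fit_at(g, x, y, h, p=0) for g in grid])   # local constant
linear_fit = np.array([fit_at(g, x, y, h, p=1) for g in grid])     # local linear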

Kernel functions

Kernel functions in local regression serve as weighting mechanisms that assign greater influence to observations nearer the target point x, while diminishing the impact of distant points. These functions possess key properties: they are non-negative, symmetric about zero (i.e., K(-u) = K(u)), integrate to 1 over the real line (\int K(u) \, du = 1), and preferably have bounded support to limit computational demands and focus weights locally.

Common univariate kernels include the Epanechnikov kernel, given by K(u) = 0.75 (1 - u^2) I(|u| \leq 1), where I(\cdot) is the indicator function; this kernel is optimal for minimizing mean squared error in asymptotic analyses of local polynomial regression. The tricube kernel is K(u) = (1 - |u|^3)^3 I(|u| \leq 1), offering smooth weight decay with compact support. The Gaussian kernel, K(u) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{u^2}{2} \right), features infinite support, assigning non-zero weights to all observations, which can enhance global smoothness but increase sensitivity to outliers.

In multivariate settings, product kernels extend univariate forms via K(\mathbf{u}) = \prod_{j=1}^d K_j(u_j), accommodating dimension-specific bandwidths for anisotropic smoothing. For isotropic cases, spherical kernels apply a univariate kernel to the Euclidean norm, such as K(\mathbf{u}) = k(\|\mathbf{u}\|), promoting uniform weighting across directions. Monotone decreasing kernels, exemplified by the Epanechnikov and biweight, enhance robustness by sharply reducing weights for distant or outlying observations, thereby mitigating their undue influence on local fits. Although the Epanechnikov achieves theoretical optimality for asymptotic mean squared error, the tricube is often selected in practice for LOESS due to its superior smoothness and effective outlier handling.
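
The common kernels translate directly into code (a NumPy sketch; the product_kernel helper for the multivariate case is an illustrative assumption):

import numpy as np

def epanechnikov(u):
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def tricube(u):
    return np.where(np.abs(u) <= 1, (1 - np.abs(u) ** 3) ** 3, 0.0)

def gaussian(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def product_kernel(U, kernel=gaussian):
    # U: one row per observation, one column per dimension,
    # already scaled by the per-dimension bandwidths
    return np.prod(kernel(U), axis=1)

u = np.linspace(-1.5, 1.5, 7)
print(epanechnikov(u))
print(tricube(u))
print(gaussian(u))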

Fitting Methods

Standard least squares

In local polynomial regression of degree p, the standard estimation procedure employs a weighted least squares criterion to fit the model locally around a target point x_0. Specifically, the parameters \beta = (\beta_0, \beta_1, \dots, \beta_p)^T are obtained by minimizing the objective function \sum_{i=1}^n w_i (y_i - \sum_{j=0}^p \beta_j (x_i - x_0)^j)^2, where the weights w_i = K((x_i - x_0)/h) are derived from a kernel function K and bandwidth h. The resulting estimator \hat{\beta}(x_0) yields a linear smoother, expressed as \hat{m}(x_0) = \sum_{i=1}^n l_i(x_0) y_i, where the weights l_i(x_0) depend on the design points, kernel, bandwidth, and polynomial degree but not on the response values. This linearity facilitates computational and analytical tractability. Furthermore, under the assumption that the true function m(x) is itself a polynomial of degree no greater than the fitted degree (for example, linear when p = 1) and certain regularity conditions on the kernel and design density hold, the estimator is unbiased at x_0, meaning E[\hat{m}(x_0)] = m(x_0).

To assess the fit, residuals are computed as e_i = y_i - \hat{y}_i, where \hat{y}_i is the predicted value at x_i. Standardized residuals, scaled by an estimate of the standard deviation, help identify influential points and model adequacy. Lack-of-fit tests, such as those comparing observed residuals to expected variability under the model, can detect systematic departures from the local polynomial assumption.

Pointwise confidence intervals for \hat{m}(x_0) are constructed using asymptotic normality of the estimator, with variance given by \sigma^2 \, e_1^T (X^T W X)^{-1} X^T W^2 X (X^T W X)^{-1} e_1, where X is the local design matrix, W is the diagonal weight matrix, e_1 = (1, 0, \dots, 0)^T, and \sigma^2 is estimated from the residuals. These bands provide interval estimates that account for local variability, typically at the 95% level using standard normal quantiles.

Despite its simplicity, the standard approach is sensitive to outliers, as large residuals can disproportionately influence the weighted fit. It also assumes homoscedasticity, where the error variance \sigma^2 is constant across the support, which may not hold in heteroscedastic settings and can lead to inefficient estimates.
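
The sandwich variance formula above can be evaluated directly; the sketch below (NumPy; the helper name, bandwidth, and plug-in value of sigma^2 are illustrative assumptions) returns a local linear estimate with an approximate 95% pointwise interval.

import numpy as np

def epanechnikov(u):
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def local_linear_ci(x0, x, y, h, sigma2):
    w = epanechnikov((x - x0) / h)
    keep = w > 0
    X = np.column_stack([np.ones(keep.sum()), x[keep] - x0])
    W = np.diag(w[keep])
    A = np.linalg.inv(X.T @ W @ X)
    beta = A @ X.T @ W @ y[keep]
    e1 = np.array([1.0, 0.0])
    # sigma^2 * e1' (X'WX)^-1 X'W^2 X (X'WX)^-1 e1
    var = sigma2 * e1 @ A @ X.T @ W @ W @ X @ A @ e1
    se = np.sqrt(var)
    return beta[0], beta[0] - 1.96 * se, beta[0] + 1.96 * se

rng = np.random.default_rng(6)
x = np.sort(rng.uniform(0, 1, 150))
y = np.sin(3 * x) + rng.normal(scale=0.2, size=x.size)
estimate, lower, upper = local_linear_ci(0.5, x, y, h=0.15, sigma2=0.2**2)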

Local likelihood estimation

Local likelihood estimation generalizes the local regression paradigm to settings where the response variable follows a non-Gaussian distribution, such as in generalized linear models (GLMs). Introduced by Tibshirani and Hastie, this method fits the model locally by maximizing a weighted log-likelihood centered at a target point x, using kernel-based weights to emphasize nearby observations. This allows for flexible smoothing of regression functions under various distributional assumptions, bridging parametric GLMs and nonparametric techniques.

The core framework defines the local log-likelihood as L_x(a) = \sum_i w_i(x) \, l(y_i, \langle a, A(x_i - x) \rangle), where w_i(x) are kernel weights (e.g., w_i(x) = K((x_i - x)/h) normalized over the sample), l(\cdot) is the log-density of the assumed distribution, a is the vector of local coefficients, and A(u) is the polynomial basis for the local fit (e.g., A(u) = (1, u) for local linear fitting). The mean response is then modeled as \mu_i = g^{-1}(\langle a, A(x_i - x) \rangle), with g as the link function (e.g., the log link for Poisson counts). This formulation enables estimation for distributions like the Poisson, where the response represents counts, or the binomial for binary outcomes.

Estimation proceeds iteratively, typically via iteratively reweighted least squares (IRLS) or Newton-Raphson updates adapted to the local weights. Starting from an initial fit (e.g., local constant), each iteration solves a weighted least squares problem: a^{(k+1)} = a^{(k)} + (X^T W V X)^{-1} X^T W \dot{l}(y, X a^{(k)}), where X incorporates the polynomial basis, W is the diagonal matrix of weights w_i(x), V is the variance function of the distribution, and \dot{l} is the score vector. Convergence yields the local parameter estimates, from which the smoothed value is extracted as \hat{\theta}(x) = \langle \hat{a}, A(0) \rangle. For Gaussian errors with the identity link, this reduces to standard local least squares fitting.

Applications include smoothing count data via local Poisson regression, which models rates with a log link to ensure positivity, and local logistic regression for binary outcomes, estimating probabilities through the logit link. This method's advantages lie in its ability to handle heteroscedasticity and non-normal errors inherent to such data, providing robust local inference without assuming global parametric forms. Modern implementations are available in libraries such as the locfit package for R as of 2025.
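
A minimal sketch of local logistic regression (NumPy; the function names, bandwidth, and the simulated Bernoulli data are illustrative assumptions) runs kernel-weighted Newton/IRLS updates at a single target point and returns the smoothed probability there.

import numpy as np

def tricube(u):
    return np.where(np.abs(u) <= 1, (1 - np.abs(u) ** 3) ** 3, 0.0)

def local_logistic_at(x0, x, y, h, max_iter=25, tol=1e-8):
    w = tricube((x - x0) / h)
    keep = w > 0
    X = np.column_stack([np.ones(keep.sum()), x[keep] - x0])
    yy, ww = y[keep], w[keep]
    a = np.zeros(2)                                  # local (intercept, slope)
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ a)))           # inverse logit link
        v = p * (1.0 - p)                            # Bernoulli variance function
        score = X.T @ (ww * (yy - p))                # kernel-weighted score
        info = X.T @ (ww[:, None] * v[:, None] * X)  # kernel-weighted information
        step = np.linalg.solve(info, score)
        a += step
        if np.max(np.abs(step)) < tol:
            break
    return 1.0 / (1.0 + np.exp(-a[0]))               # smoothed P(y = 1 | x0)

rng = np.random.default_rng(7)
x = np.sort(rng.uniform(-2, 2, 400))
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-2 * x)))
prob_at_zero = local_logistic_at(0.0, x, y, h=0.8)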

Robust local regression

Robust local regression extends standard local regression techniques to mitigate the influence of outliers and heavy-tailed error distributions by iteratively downweighting observations with large residuals. This is achieved through the use of robust loss functions, denoted as \rho(r / \sigma), where r is the residual and \sigma is a robust scale estimate, such as the median absolute deviation. A common example is Huber's \rho function, defined as \rho(r) = r^2 / 2 for |r| \leq 1 and \rho(r) = |r| - 0.5 for |r| > 1, which penalizes small residuals quadratically, like least squares, while linearly bounding the impact of large ones. This approach draws from M-estimation principles adapted to the local context, where the objective is to minimize \sum w_i \rho((y_i - \mathbf{p}(x_i)^\top \boldsymbol{\beta}) / \sigma) over a local neighborhood, with w_i as kernel-based spatial weights and \mathbf{p}(x_i) the local polynomial basis.

A prominent implementation is the LOWESS (locally weighted scatterplot smoothing) procedure, which combines initial fitting with robust reweighting in an iterative framework. In the first pass, a local weighted least squares fit is performed to obtain preliminary residuals. The second pass computes robustness weights using a bisquare function, B(e / (6s)) = (1 - (e / (6s))^2)^2 for |e / (6s)| < 1 and 0 otherwise, where e are the residuals and s is the robust scale; these weights downweight outliers. A third pass then refits using weighted least squares on the combined spatial and robustness weights, typically requiring only 2-3 iterations for convergence. This method effectively cleans the data locally, reducing distortion from deviant points.

Such robust modifications substantially improve the breakdown point—the smallest fraction of contaminated data that can cause the estimator to fail—from 0 for standard least squares to approximately 0.25 for typical M-estimators like those employing Tukey's biweight (bisquare) with standard tuning constants. For instance, in Cleveland's analysis of abrasion loss data contaminated by outliers, the robust LOWESS fit accurately captures the underlying linear trend, whereas the non-robust version is severely pulled toward the aberrant points, demonstrating enhanced resistance in practical settings with up to 25% contamination.
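
The effect is easy to reproduce with statsmodels' lowess (a usage sketch on simulated data with injected outliers; it=0 disables the robustness iterations, while it=3 applies the bisquare reweighting described above):

import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(8)
x = np.sort(rng.uniform(0, 10, 200))
y = 0.5 * x + rng.normal(scale=0.5, size=x.size)
y[rng.choice(x.size, size=10, replace=False)] += 8.0   # gross outliers

plain = lowess(y, x, frac=0.3, it=0)    # no robustness iterations: pulled toward outliers
robust = lowess(y, x, frac=0.3, it=3)   # bisquare reweighting resists them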

Theoretical Properties

Bias-variance tradeoff

In local polynomial regression of degree p, the estimator at a point x exhibits a bias-variance decomposition that governs its finite-sample performance. The bias arises from the approximation of the local polynomial to the underlying regression function, typically of order O(h^{p+1}), where h is the bandwidth; this error stems from the neglect of higher-order terms in the Taylor expansion of the function around x. Increasing the polynomial degree p reduces this bias by allowing better local approximation, but it also amplifies the estimator's sensitivity to noise, thereby increasing the variance.

The variance of the local polynomial estimator, in the univariate case, is of order O(1/(n h)), where n is the sample size; this reflects the effective local sample size n h f(x), with f(x) denoting the design density at x. Smaller bandwidths concentrate the fit on fewer nearby points, reducing bias but inflating variance due to noisier local estimates, while larger bandwidths incorporate more data points, stabilizing the variance at the expense of higher bias from a coarser approximation. The mean squared error (MSE), which combines squared bias and variance, is minimized by selecting a bandwidth that balances these competing effects, yielding an optimal h of order n^{-1/(2p+3)}. This choice achieves an optimal MSE rate of O(n^{-2(p+1)/(2p+3)}), highlighting the inherent tradeoff: higher p lowers the bias, permits larger optimal bandwidths, and yields faster MSE convergence rates, but only up to the point where the increased variance outweighs the bias reduction.

For model selection and inference, the effective degrees of freedom (edf) of the local polynomial fit are determined by the trace of the smoother matrix, which captures the estimator's complexity; for very large bandwidths the fit approaches a global degree-p polynomial with edf near p+1, while smaller bandwidths, boundary effects, and the kernel choice inflate the trace. Simulations illustrate this tradeoff vividly: for a fixed sample and true function, plotting integrated MSE against h produces U-shaped curves for each p, with the minimum typically occurring at larger h as p rises from 0 to 3, and the minimal MSE decreasing due to lower bias, though curves for higher p rise more steeply at small h owing to elevated variance.
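
Treating the leading bias and variance constants as generic C_1 and C_2, the optimal rate follows from a one-line minimization:

\mathrm{MSE}(h) \approx C_1 h^{2(p+1)} + \frac{C_2}{nh}, \qquad
\frac{d}{dh}\,\mathrm{MSE}(h) = 2(p+1)\,C_1 h^{2p+1} - \frac{C_2}{n h^2} = 0
\;\Longrightarrow\;
h_{\mathrm{opt}} = \left( \frac{C_2}{2(p+1)\,C_1\, n} \right)^{1/(2p+3)} \asymp n^{-1/(2p+3)},
\qquad
\mathrm{MSE}(h_{\mathrm{opt}}) \asymp n^{-2(p+1)/(2p+3)}.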

Asymptotic convergence

Local regression estimators exhibit strong consistency under standard regularity conditions. Specifically, for a local polynomial estimator \hat{m}(x) of degree p fitting the regression function m(x), \hat{m}(x) \to m(x) in probability as the sample size n \to \infty, provided the bandwidth h \to 0 and nh \to \infty, assuming the design density f(x) > 0, the error variance \sigma^2(x) is finite and positive, m is sufficiently smooth (at least p+1 times differentiable), and the kernel K has compact support or sufficient moments.

The asymptotic distribution of the local polynomial estimator centers around its bias term and is normal after scaling. For fixed x in the interior of the support, \sqrt{nh} \left( \hat{m}(x) - m(x) - b(x) h^{p+1} \right) \to_d N\left(0, \frac{\sigma^2(x)}{f(x)} \nu_0 \right), where b(x) is a constant depending on m^{(p+1)}(x) and kernel moments (specifically, b(x) = \frac{m^{(p+1)}(x)}{(p+1)!} \int u^{p+1} K(u) \, du for appropriate kernels), \nu_0 = \int K(u)^2 \, du, and the convergence holds under conditions including h \to 0, nh \to \infty, nh^{2p+3} \to 0 (to render higher-order bias terms negligible), i.i.d. errors, and sufficient smoothness of m and f. For local linear regression (p=1), the asymptotic variance simplifies to the same form as that of the Nadaraya-Watson estimator, \frac{\sigma^2(x)}{f(x)} \int K(u)^2 \, du. Boundary points require adjustments, such as one-sided kernels, but local regression typically maintains the optimal interior convergence rate O(n^{-2(p+1)/(2p+3)}) for MSE near edges, an advantage over methods like Nadaraya-Watson.

Optimal bandwidth selection yields minimax convergence rates for local polynomial estimators. Choosing h \asymp n^{-1/(2p+3)} minimizes the asymptotic MSE, achieving \mathrm{MSE}(\hat{m}(x)) \asymp n^{-2(p+1)/(2p+3)} at interior points, where the constant depends on \sigma^2(x), f(x), m^{(p+1)}(x), and kernel moments. This rate matches the minimax rate over Sobolev classes of smoothness p+1. Local linear regression (p=1) attains the optimal rate n^{-4/5} for twice-differentiable functions, matching the n^{-4/5} rate of local constant fits (with symmetric kernels).

Local polynomial estimators relate closely to kernel regression methods. In particular, the local linear estimator is asymptotically equivalent to a second-order kernel estimator, inheriting its bias and variance properties while offering better boundary performance. Higher odd-degree polynomials (p = 2\nu - 1) correspond to higher-order kernels of order 2\nu, enabling adaptation to smoother functions without increasing variance.

Strengths and Limitations

Advantages

Local regression, also known as LOESS (locally estimated scatterplot smoothing), offers significant adaptivity in modeling complex relationships by fitting simple models locally around each point in the dataset, thereby capturing non-linearities that vary across the data without requiring a predefined global functional form. This local fitting approach allows the model to adapt to the specific structure of the data in different regions, providing a flexible representation of underlying patterns that global parametric methods might oversimplify or miss.

A key benefit lies in its built-in robustness options, where weighting schemes downplay the influence of outliers and leverage points, ensuring that the fitted surface remains representative of the majority of the data. By incorporating robust estimation techniques, such as iterative bisquare reweighting, local regression mitigates distortions from anomalous observations, making it particularly suitable for real-world datasets prone to noise or errors.

Local regression excels in exploratory tasks, serving as an effective tool for data visualization through scatterplot smoothing that highlights trends, clusters, and relationships otherwise obscured in raw data. This graphical utility enhances perceptual understanding, enabling researchers to discern subtle patterns and inform subsequent modeling decisions. By eschewing strong assumptions about the overall relationship form, local regression avoids misspecification that can plague traditional models in scenarios involving intricate or unknown dependencies, allowing for more reliable inference in complex environments. Additionally, its multivariate extension via LOESS naturally accommodates interactions among predictors by performing local fits in higher dimensions, facilitating the exploration of multidimensional data structures without explicit interaction terms.

Disadvantages

Local regression, while flexible for capturing nonlinear relationships, incurs significant computational costs due to its reliance on separate weighted fits at each evaluation point, resulting in a naive complexity of O(n^2) for n points, which becomes prohibitive for large datasets without specialized approximations or fast algorithms. This expense arises from iteratively solving local models across the predictor space, particularly in multivariate settings where the number of parameters grows with dimensionality, limiting scalability in high-throughput applications.

A key risk in local regression is overfitting, especially when the bandwidth h is chosen too small, leading to high-variance, noisy fits that capture spurious patterns in the noise rather than underlying trends. This occurs because smaller h values reduce bias but amplify variance through reliance on fewer nearby points, necessitating careful tuning to balance the bias-variance tradeoff—though optimal h depends on unknown aspects of the true regression function, complicating reliable selection. For simple underlying relations, local methods are often less efficient than parametric alternatives, as their flexibility introduces unnecessary variability without proportional gains in accuracy.

Interpretability poses another challenge, as local regression produces fits defined by a collection of local parameters rather than a single global model, making it difficult to extract concise, explainable insights about the overall relationship. Unlike parametric regression, where coefficients directly quantify effects, the "black-box" nature of local estimates hinders understanding of how predictors influence responses across the domain, particularly in multivariate cases.

Edge effects further limit local regression, introducing bias at the boundaries of the data range where fewer points are available for fitting, resulting in wider confidence intervals and distorted estimates. This boundary effect is pronounced in kernel-based implementations, as the weighting scheme unevenly influences fits near the edges, potentially leading to over- or underestimation compared to interior regions.

Finally, local regression suffers from the curse of dimensionality, degrading rapidly in high-dimensional spaces because the effective sample size in local neighborhoods shrinks exponentially, requiring nh^d \to \infty (with d the predictor dimension) for consistent estimation, which demands impractically large datasets. Beyond 2-3 dimensions, sparsity leads to unreliable fits, often prompting reliance on dimensionality-reducing assumptions like additivity to mitigate this issue.

Applications and Implementation

Real-world uses

Local regression, also known as LOESS or LOWESS, finds extensive application in statistics and visualization for smoothing residuals in analysis of variance (ANOVA) models and detecting trends in time series data. In ANOVA contexts, it enables diagnostic checking by fitting smooth curves to residual plots, revealing non-linear patterns or outliers that linear models might overlook, as demonstrated in analyses of industrial datasets. For instance, local regression smooths residuals to identify heteroscedasticity or curvature in regression diagnostics, improving model interpretation without assuming global parametric forms. In time series analysis, particularly with NOAA data, local regression quantifies nonlinear trends in global mean surface temperature (GMST), offering advantages over linear methods by capturing gradual accelerations in warming rates from 1850 onward, with reduced uncertainty in long-term estimates compared to segmented regressions. This approach has been applied to NOAA's historical temperature records to assess variability, providing robust trend lines that align closely with physical models of climate change.

In finance, local regression is employed for estimating volatility surfaces and identifying local trends in asset returns. Volatility surface estimation uses local fits to interpolate implied volatilities across strikes and maturities, ensuring no-arbitrage conditions while calibrating local volatility models to option prices; for example, automatic local regression techniques calibrate surfaces for indices like the S&P 500. For returns, nonparametric local regression analyzes price movements by weighting recent data points and can outperform global linear models in emerging markets where local factors dominate, as seen in analyses of Asian indices.

Geophysics leverages local regression for signal processing in seismic data, particularly high-frequency smoothing to enhance event detection. In studies of pre-earthquake signals, local regression type methods smooth geophysical time series to isolate precursors, such as anomalous strain variations before the 2011 Tohoku event, by applying LOWESS to high-frequency recordings from global networks; local fitting aids in the identification of non-stationary patterns overlooked by global filters.

In biostatistics, local regression smooths dose-response curves and growth trajectories, accommodating irregular sampling without rigid assumptions. For dose-response analysis, it smooths nonlinear relationships in pharmacological assays. Robust local regression variants handle outliers in assay data. In trajectory smoothing, local regression models longitudinal measurements from longitudinal studies, such as growth or development cohorts, by estimating smooth derivatives for rates of change; applications to ecological datasets, like tree height measurements over decades, reveal inflection points in trajectories influenced by environmental stressors, with smoothing parameters tuned to balance bias and variance in sparse observations.

Within machine learning, local regression serves as a preprocessing step for non-linear feature engineering, particularly in exploratory analysis and model input preparation. It transforms raw features by smoothing nonlinear relationships, creating derived variables that enhance predictive performance in tasks like classification; for example, in R's ggplot2 package, LOESS smoothing generates trend lines for scatterplots of high-dimensional datasets, aiding in genomic studies where it preprocesses expression data to highlight non-linear clusters. This preprocessing step can improve downstream algorithms, such as random forests, by reducing noise in engineered features derived from local fits, as evidenced in applications to tabular datasets.
As of 2025, local regression continues to see applications in finance, such as non-parametric calibration of local volatility surfaces for option pricing.

Software and computational aspects

Local regression, also known as LOESS or LOWESS, is implemented in several statistical software packages and programming languages, facilitating its use in data analysis workflows. In the R programming language, the loess() function from the base stats package performs local polynomial regression fitting, allowing users to specify parameters such as the smoothing span and polynomial degree for flexible curve estimation. Additionally, the ggplot2 package integrates LOESS smoothing via the geom_smooth() layer, which defaults to LOESS for scatterplot visualizations and supports customization of the span parameter to control smoothness.

In Python, the statsmodels library provides the lowess() function in its nonparametric module, implementing locally weighted scatterplot smoothing based on Cleveland's algorithm with options for fraction-based span and iterative robust fitting. While scikit-learn lacks a native LOESS implementation, statsmodels serves as a primary alternative for local regression tasks within the Python ecosystem. For MATLAB users, the Curve Fitting Toolbox includes the smooth() function with a 'lowess' option, enabling robust local linear or quadratic fits directly in the Curve Fitter app or via command-line interfaces. Spreadsheet users can access LOESS through add-ins like the Real Statistics Resource Pack for Excel, which offers the LOESS array function to compute smoothed values without requiring advanced programming.

Computational efficiency remains a challenge for large datasets due to the iterative nearest-neighbor searches inherent in local regression algorithms. To address this, implementations often employ k-d trees for rapid identification of local neighborhoods, as seen in the SAS/STAT LOESS procedure, which partitions predictor space into rectangular cells for faster neighbor retrieval. For equally spaced data, approximations using fast Fourier transforms (FFT) or convolution can accelerate smoothing by exploiting regularity in the input grid, though these are less common in standard libraries. Binning techniques provide further approximations by pre-aggregating data into discrete intervals before local fitting, reducing the effective sample size and computation time at the cost of minor accuracy loss in high-density regions. In the 2020s, local regression has seen integration into broader libraries for scalable applications, with Python's statsmodels increasingly used alongside machine learning frameworks for hybrid nonparametric-parametric models. GPU acceleration remains limited for pure LOESS due to its sequential neighborhood dependencies.
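
As one concrete example of such approximations, statsmodels' lowess exposes a delta argument that skips the local regression for evaluation points close to the previously fitted one and linearly interpolates instead (a usage sketch; the choice of roughly 1% of the predictor range is a commonly suggested heuristic):

import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(9)
x = np.sort(rng.uniform(0, 100, 50_000))
y = np.sin(x / 10) + rng.normal(scale=0.3, size=x.size)

# delta > 0 interpolates between nearby fit points instead of refitting,
# which substantially reduces run time on large, dense inputs
smoothed = lowess(y, x, frac=0.1, it=2, delta=0.01 * (x.max() - x.min()))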
