
Kernel smoother

A kernel smoother is a nonparametric statistical technique used to estimate a real-valued function, such as a regression function or a probability density, by computing a locally weighted average of observed data points, where the weights are assigned via a kernel function that diminishes with increasing distance from the evaluation point. This approach, which avoids assuming a specific parametric form for the underlying function, relies on two primary components: the kernel (commonly Gaussian, or Epanechnikov for its asymptotic efficiency and compact support) that defines the shape of the weighting, and the bandwidth parameter that controls the degree of smoothing, balancing bias and variance in the estimate. Kernel smoothing is particularly effective for revealing underlying patterns in noisy data without overparameterization.

The origins of kernel smoothing trace back to early 20th-century actuarial methods, such as Spencer's 15-point moving average formula from 1904, but modern developments began with kernel density estimation, introduced by Murray Rosenblatt in 1956 and further formalized by Emanuel Parzen in 1962, establishing the Parzen-Rosenblatt window method for nonparametric probability density estimation. For regression, the seminal Nadaraya-Watson estimator emerged independently in 1964, providing a local constant approximation to the conditional expectation m(x) = E[Y \mid X = x] through the formula \hat{m}(x) = \frac{\sum_{i=1}^n K\left(\frac{x - X_i}{h}\right) Y_i}{\sum_{i=1}^n K\left(\frac{x - X_i}{h}\right)}, where K is the kernel and h is the bandwidth. Subsequent advancements include the Priestley-Chao (1969) and Gasser-Müller (1979) estimators for ordered or fixed designs, as well as local polynomial extensions such as local linear smoothing, studied by Stone in 1977 and developed further in the locally weighted regression literature, which mitigate boundary bias and improve mean squared error performance.

Kernel smoothers find broad applications in nonparametric regression and density estimation, lack-of-fit testing for parametric models, and semiparametric adjustments, such as covariate correction in experimental designs, with asymptotic analyses guiding optimal bandwidth selection to minimize estimation error. Their flexibility makes them valuable across applied fields for tasks including scatterplot smoothing, density estimation, and trend estimation, though challenges such as the curse of dimensionality persist in high dimensions.

Fundamentals

Definition and Motivation

Kernel smoothing is a non-parametric statistical technique for estimating the underlying structure of data, particularly in regression and density estimation, by constructing a weighted average of observed values in which the weights are assigned via a kernel function that prioritizes data points closer to the target evaluation point. This approach allows flexible estimation without imposing a predefined functional form on the relationship between variables. In regression settings, kernel smoothing addresses the problem of estimating the conditional mean function m(x) = \mathbb{E}[Y \mid X = x] from a sample of independent observations \{(x_i, y_i)\}_{i=1}^n, where no specific shape for m is assumed, enabling the method to adapt to complex, unknown patterns in the data.

The primary motivation stems from the limitations of parametric models, which require assuming a fixed functional form (e.g., linear or polynomial) that may lead to systematic bias if misspecified, whereas kernel smoothing offers a data-driven alternative that captures local variation without such restrictive assumptions. This flexibility makes it particularly useful when the true relationship is nonlinear or poorly understood, providing smooth estimates that trade off bias and variance through local weighting. In contrast to other non-parametric alternatives such as spline smoothing or Fourier-based methods, which rely on global basis expansions or piecewise polynomials, kernel smoothing emphasizes proximity-based weighting to produce locally adaptive fits.

Historically, the foundations of kernel methods emerged with Murray Rosenblatt's introduction in 1956 of kernel-based nonparametric estimators of probability density functions, followed by Emanuel Parzen's refinements in 1962. The extension to regression occurred independently in 1964 through the work of E. A. Nadaraya on estimating regression functions and of G. S. Watson on smooth regression analysis, marking the shift toward broader applications in function estimation.

Kernel Functions and Properties

In kernel smoothing, a kernel function K is a symmetric, non-negative function that integrates to 1 over the real line and often has bounded support. The essential properties of such kernels include non-negativity, K(u) \geq 0 for all u; symmetry, K(-u) = K(u); the normalization condition \int_{-\infty}^{\infty} K(u) \, du = 1; and a finite second moment \int_{-\infty}^{\infty} u^2 K(u) \, du < \infty, which facilitates bias analysis in smoothing procedures. Common kernel functions used in practice include the Epanechnikov kernel, K(u) = \frac{3}{4}(1 - u^2) for |u| \leq 1 and 0 otherwise; the uniform kernel, K(u) = \frac{1}{2} for |u| \leq 1 and 0 otherwise; and the biweight kernel, K(u) = \frac{15}{16}(1 - u^2)^2 for |u| \leq 1 and 0 otherwise. Kernel functions determine the weights assigned to observations in the smoothing process, with weights diminishing as the distance from the evaluation point grows, modulated by the bandwidth parameter h that controls the degree of smoothing.
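
As an illustration, the kernels above can be coded directly and their normalization and second moments checked numerically. The following is a minimal sketch using NumPy; the function names are illustrative rather than taken from any particular library.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: (3/4)(1 - u^2) for |u| <= 1, else 0."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1, 0.75 * (1.0 - u**2), 0.0)

def uniform(u):
    """Uniform (boxcar) kernel: 1/2 for |u| <= 1, else 0."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1, 0.5, 0.0)

def biweight(u):
    """Biweight (quartic) kernel: (15/16)(1 - u^2)^2 for |u| <= 1, else 0."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1, (15.0 / 16.0) * (1.0 - u**2) ** 2, 0.0)

# Riemann-sum check of the integral (should be ~1) and the second moment
# on a fine grid covering the support [-1, 1].
grid, dx = np.linspace(-1.5, 1.5, 300_001, retstep=True)
for K in (epanechnikov, uniform, biweight):
    mass = np.sum(K(grid)) * dx
    mu2 = np.sum(grid**2 * K(grid)) * dx
    print(f"{K.__name__}: integral ~ {mass:.4f}, second moment ~ {mu2:.4f}")
```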

Core Methods

Nadaraya-Watson Estimator

The Nadaraya-Watson estimator is a foundational method in kernel smoothing for nonparametric regression. Introduced independently by Nadaraya and Watson in 1964, it estimates the conditional expectation E[Y \mid X = x] as a locally weighted average of observed responses, with weights derived from a kernel function that prioritizes data points near x. This approach, also known as local constant smoothing, provides a flexible way to model unknown regression functions without assuming a parametric form.

The estimator takes the form \hat{y}(x) = \frac{\sum_{i=1}^n K\left( \frac{x - x_i}{h} \right) y_i }{\sum_{i=1}^n K\left( \frac{x - x_i}{h} \right) }, where K(\cdot) is a symmetric kernel function satisfying \int K(u) \, du = 1 and \int u K(u) \, du = 0, h > 0 is the bandwidth controlling the degree of local averaging, and \{(x_i, y_i)\}_{i=1}^n are the observed data pairs. The denominator ensures the weights sum to 1, making \hat{y}(x) a convex combination of the y_i.

This form derives from kernel approximations to density ratios. The conditional expectation E[Y \mid X = x] = \int y f(y \mid x) \, dy = \frac{\int y f_{X,Y}(x, y) \, dy}{f_X(x)} is estimated by replacing the joint density f_{X,Y}(x, y) and the marginal f_X(x) with their kernel density counterparts: \hat{f}_{X,Y}(x, y) = \frac{1}{n h^{d+1}} \sum_{i=1}^n K\left( \frac{x - x_i}{h} \right) L\left( \frac{y - y_i}{h} \right) and \hat{f}_X(x) = \frac{1}{n h^d} \sum_{i=1}^n K\left( \frac{x - x_i}{h} \right), where d is the dimension of X and L is a kernel for Y. Integrating out y reduces the ratio to the weighted average above when the kernels have matching bandwidths and appropriate moments.

The bias and variance properties underpin its performance. Under regularity conditions, including a twice-differentiable true regression function m(x) = E[Y \mid X = x] and a design density f_X(x) > 0, the bias in one dimension is E[\hat{y}(x)] - m(x) = \frac{h^2}{2} \mu_2(K) \left( m''(x) + \frac{2 m'(x) f_X'(x)}{f_X(x)} \right) + o(h^2) = O(h^2), where \mu_2(K) = \int u^2 K(u) \, du, obtained via Taylor expansion around x. The variance is \text{Var}(\hat{y}(x)) = \frac{\sigma^2(x) \int K(u)^2 \, du}{n h f_X(x)} + o\left(\frac{1}{nh}\right) = O\left(\frac{1}{nh}\right), reflecting the effective local sample size of order n h f_X(x). The approximate mean squared error is then \text{AMSE}(x) = \text{bias}^2 + \text{variance} = O(h^4) + O(1/(n h)).

For a conceptual illustration in univariate regression, suppose data are generated from y_i = \sin(2\pi x_i) + \epsilon_i with x_i \sim \text{Uniform}[0,1] and \epsilon_i \sim \mathcal{N}(0, 0.1^2). The Nadaraya-Watson estimator, using an Epanechnikov kernel and a bandwidth of order n^{-1/5}, produces a smooth curve that traces the oscillatory sine pattern while attenuating the noise, though it exhibits slight boundary bias and oversmoothing in flat regions.
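
The following is a minimal NumPy sketch of the Nadaraya-Watson estimator under the simulated setting just described (sine signal plus Gaussian noise); the kernel, the constant multiplying the n^{-1/5} rate, and the random seed are illustrative choices rather than prescribed values.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel with support [-1, 1]."""
    return np.where(np.abs(u) <= 1, 0.75 * (1.0 - u**2), 0.0)

def nadaraya_watson(x_eval, x, y, h, kernel=epanechnikov):
    """Local constant (Nadaraya-Watson) estimate at each point of x_eval."""
    # Pairwise scaled differences: shape (len(x_eval), len(x)).
    u = (x_eval[:, None] - x[None, :]) / h
    w = kernel(u)
    denom = w.sum(axis=1)
    # Guard against evaluation points with no data inside the kernel window.
    denom = np.where(denom > 0, denom, np.nan)
    return (w * y).sum(axis=1) / denom

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, n)

h = 0.3 * n ** (-1 / 5)          # crude scaling of the n^(-1/5) rate
x_eval = np.linspace(0.0, 1.0, 101)
m_hat = nadaraya_watson(x_eval, x, y, h)
print(m_hat[:5])
```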

Local Constant Smoothing

Local constant smoothing, also known as the zero-order local polynomial smoother, estimates the regression function at a point x by fitting a constant locally within a neighborhood defined by the bandwidth h. This approach minimizes the criterion \sum_{i=1}^n K\left(\frac{x - x_i}{h}\right) (y_i - \beta)^2, where K is the kernel function, \{(x_i, y_i)\}_{i=1}^n are the data points, and \beta is the constant to be estimated. The solution for \beta is a weighted average of the y_i, with weights proportional to K\left(\frac{x - x_i}{h}\right), establishing its direct equivalence to the Nadaraya-Watson estimator. This formulation treats the kernel values as non-negative weights that sum to unity after normalization, so the weighted least squares solution requires no additional constraints.

In practice, implementing local constant smoothing involves evaluating the estimator at each target point, with a naive computational cost of O(n) operations per point due to the summation over all data. For one-dimensional data with a compact-support kernel, efficiency can be improved: after an initial O(n \log n) sort of the x_i, a binary search locates the window [x - h, x + h] in O(\log n) time per evaluation point, and the sum then runs only over the points inside that window. The kernel density estimator is the analogous construction for densities, \hat{f}(x) = \frac{1}{n h} \sum_{i=1}^n K\left(\frac{x - x_i}{h}\right), with each data point contributing a scaled kernel bump.

A key limitation of local constant smoothing is boundary bias: the estimator systematically over- or underestimates the true function near the edges of the support because the kernel weighting becomes asymmetric there. This leads to poor performance in boundary regions, often requiring undersmoothing (choosing a smaller bandwidth) to reduce bias at the cost of increased variance. Unlike global averaging, which computes a single constant fit across the entire dataset and ignores local variation, local constant smoothing adapts the estimate to the density and structure of nearby points, providing a more flexible representation of heterogeneous functions.
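
The windowed evaluation described above can be sketched as follows for one-dimensional data and the compact-support Epanechnikov kernel; this is an illustrative implementation, with the data sorted once and np.searchsorted performing the binary search for each local window.

```python
import numpy as np

def local_constant_sorted(x_eval, x, y, h):
    """Local constant (Nadaraya-Watson) fit using a sorted-window evaluation."""
    order = np.argsort(x)                  # O(n log n) preprocessing
    xs, ys = x[order], y[order]
    out = np.empty_like(x_eval, dtype=float)
    for j, x0 in enumerate(x_eval):
        lo = np.searchsorted(xs, x0 - h, side="left")    # O(log n) per point
        hi = np.searchsorted(xs, x0 + h, side="right")
        u = (x0 - xs[lo:hi]) / h
        w = 0.75 * np.clip(1.0 - u**2, 0.0, None)        # Epanechnikov weights
        s = w.sum()
        out[j] = np.nan if s == 0 else np.dot(w, ys[lo:hi]) / s
    return out
```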

Local Regression Techniques

Local Linear Regression

Local linear regression addresses a key limitation of local constant smoothing by incorporating a linear trend in the local fit, thereby reducing bias, particularly near the boundaries of the data support. With roots in Stone (1977) and subsequent work on kernel and locally weighted regression, this method fits a straight line to the data points in a neighborhood around each evaluation point x, weighted by a kernel function. Specifically, at each x the parameters \beta_0 and \beta_1 are chosen to minimize the weighted sum of squared residuals \sum_{i=1}^n K\left(\frac{x_i - x}{h}\right) \left( y_i - \beta_0 - \beta_1 (x_i - x) \right)^2, where K is the kernel and h is the bandwidth; the resulting estimate of the regression function is \hat{m}(x) = \hat{\beta}_0.

This optimization problem has a closed-form solution via weighted least squares. Define the design matrix X with rows [1, (x_i - x)], the response vector y = (y_1, \dots, y_n)^T, and the diagonal weight matrix W with entries K((x_i - x)/h). Then \hat{\beta} = (X^T W X)^{-1} X^T W y, and \hat{m}(x) is the first component of \hat{\beta}. This formulation is computationally simple while adapting the fit to the local slope of the data.

Compared to local constant smoothing, local linear regression removes the design-dependent bias term that appears in the interior expansion of the Nadaraya-Watson estimator and reduces the boundary bias from O(h) to O(h^2) under standard smoothness assumptions, which translates to markedly better accuracy near the edges of the support, where asymmetric kernel weighting otherwise distorts the estimate. The improvement occurs because the linear term lets the smoother track the underlying function's slope, mitigating the endpoint effects inherent in local averaging. For example, when smoothing noisy observations from a simple increasing linear trend over a bounded interval, the local constant fit exhibits pronounced upward bias at the lower boundary, where the kernel's one-sided weighting pulls the estimate toward interior observations and away from the true line, whereas the local linear fit aligns closely with the function throughout, including the endpoints, demonstrating its practical advantage in bias correction.
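
A minimal sketch of the closed-form local linear fit follows; solving the weighted least squares problem with np.linalg.lstsq on the square-root-weighted design avoids forming (X^T W X)^{-1} explicitly, and the Epanechnikov kernel is an illustrative choice.

```python
import numpy as np

def local_linear(x_eval, x, y, h):
    """Local linear fit: m_hat(x0) is the intercept of a weighted line fit."""
    out = np.empty_like(x_eval, dtype=float)
    for j, x0 in enumerate(x_eval):
        u = (x - x0) / h
        w = np.where(np.abs(u) <= 1, 0.75 * (1.0 - u**2), 0.0)  # Epanechnikov
        X = np.column_stack([np.ones_like(x), x - x0])           # rows [1, x_i - x0]
        sw = np.sqrt(w)
        # Weighted least squares via sqrt-weighting; assumes the window
        # around x0 contains at least two distinct design points.
        beta, *_ = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)
        out[j] = beta[0]                                         # intercept = m_hat(x0)
    return out
```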

Local Polynomial Regression

Local polynomial regression extends the kernel smoothing framework by fitting a polynomial of arbitrary degree p \geq 0 locally around each evaluation point x, using kernel weights to emphasize nearby data points. Introduced by Stone (1977) and developed comprehensively by Fan and Gijbels (1996), this approach allows more flexible approximation of the underlying regression function than constant or linear fits, particularly for capturing local curvature.

The estimator \hat{m}(x) is obtained by solving a weighted least squares problem that minimizes \sum_{i=1}^n K\left( \frac{x_i - x}{h} \right) \left( y_i - \sum_{j=0}^p \beta_j (x_i - x)^j \right)^2, where K is the kernel function, h > 0 is the bandwidth, and the \beta_j are the polynomial coefficients. The solution takes the form \hat{\boldsymbol{\beta}}(x) = \left( \mathbf{X}^T \mathbf{W}(x) \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{W}(x) \mathbf{y}, with the estimate given by the intercept \hat{m}(x) = \hat{\beta}_0(x); here \mathbf{X} is the n \times (p+1) Vandermonde-style design matrix with entries X_{ij} = (x_i - x)^j for j = 0, \dots, p, and \mathbf{W}(x) is the diagonal weight matrix with entries K((x_i - x)/h). This formulation ensures that the fit is locally weighted, with the kernel controlling the influence of distant points.

Increasing the polynomial degree p reduces the order of the bias, which is O(h^{p+1}) for odd p, enabling better adaptation to smoother underlying functions, but it simultaneously amplifies variance because of the higher effective dimensionality of the local fit and can introduce numerical instability, especially with small samples or narrow bandwidths. In practice the degree is kept low, typically 1 to 3, as higher values offer diminishing returns in bias reduction while exacerbating variance and computational cost. Local linear regression is the special case p = 1.

The design matrix \mathbf{X} interacts with the parity of p to influence boundary behavior: odd degrees incorporate asymmetric terms that adjust the local slope and thereby adapt automatically to one-sided data near the edges, mitigating boundary bias more effectively than even degrees, which rely on symmetric terms and tend to perform worse near boundaries. This automatic boundary adaptation helps maintain accuracy across the support of the design, though higher values of p require careful implementation to avoid ill-conditioning.
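
The same construction extends to arbitrary degree p. The sketch below is illustrative (Epanechnikov kernel, brute-force evaluation); it builds the Vandermonde-style design matrix with np.vander and returns the intercept as the estimate.

```python
import numpy as np

def local_polynomial(x_eval, x, y, h, p=2):
    """Degree-p local polynomial fit; returns beta_0(x0) = m_hat(x0)."""
    out = np.empty_like(x_eval, dtype=float)
    for j, x0 in enumerate(x_eval):
        u = (x - x0) / h
        w = np.where(np.abs(u) <= 1, 0.75 * (1.0 - u**2), 0.0)  # kernel weights
        # Design matrix with columns (x_i - x0)^0, ..., (x_i - x0)^p.
        X = np.vander(x - x0, N=p + 1, increasing=True)
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)
        out[j] = beta[0]                     # intercept is the local estimate
    return out
```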

Special Cases and Variants

Gaussian Kernel Smoother

The Gaussian kernel is a widely adopted choice in kernel smoothing because of its mathematical properties and ease of implementation. Defined as K(u) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{u^2}{2} \right), this kernel is the standard normal density, with mean 0 and variance 1, so it integrates to 1 and is symmetric about zero.

In the Nadaraya-Watson estimator, the Gaussian kernel determines the weights assigned to each data point, with influence decaying exponentially in squared distance. The normalized weight for the i-th observation at evaluation point x is w_i(x) = \frac{ \exp\left( -\frac{(x - x_i)^2}{2 h^2} \right) }{ \sum_{j=1}^n \exp\left( -\frac{(x - x_j)^2}{2 h^2} \right) }, where h > 0 is the bandwidth controlling the smoothness; the leading constant \frac{1}{\sqrt{2\pi}} from the kernel definition cancels during normalization. The resulting estimator is \hat{m}(x) = \sum_{i=1}^n w_i(x) y_i. This formulation, originally proposed for general positive kernels, readily accommodates the Gaussian form for producing weighted local averages.

The Gaussian kernel yields smooth, infinitely differentiable regression fits, which is useful where derivative estimates or visual interpretability matter. Its form also establishes direct connections to other methods: in radial basis function (RBF) networks, Gaussian kernels act as localized basis functions for function approximation, bridging nonparametric smoothing with neural network architectures; similarly, in Gaussian process regression, the squared exponential covariance kernel, which has the same functional form, induces priors over smooth functions, linking kernel smoothing to Bayesian nonparametric modeling.

Despite these strengths, the Gaussian kernel's infinite support introduces notable challenges. All data points influence the estimate everywhere, albeit with rapidly diminishing weights, so the smoother is global rather than strictly local, and predictions require evaluating the entire dataset rather than a local subset. This also makes the method sensitive to outliers, since even remote anomalous points carry non-zero weights that can subtly bias the fit, particularly in low-density regions. Normalization proceeds via the denominator sum, but numerical issues arise in floating-point arithmetic, such as exponent underflow for large distances or ill-conditioning when h is small, necessitating careful scaling in software.
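
A short sketch of the Gaussian-weight computation, including the underflow guard mentioned above (shifting exponents by their maximum before exponentiating, a log-sum-exp-style trick); the helper names are illustrative.

```python
import numpy as np

def gaussian_weights(x0, x, h):
    """Normalized Gaussian kernel weights at evaluation point x0."""
    log_w = -0.5 * ((x0 - x) / h) ** 2   # 1/sqrt(2*pi) cancels on normalization
    log_w -= log_w.max()                 # guard against exp() underflow
    w = np.exp(log_w)
    return w / w.sum()

def gaussian_smoother(x_eval, x, y, h):
    """Nadaraya-Watson fit with a Gaussian kernel."""
    return np.array([np.dot(gaussian_weights(x0, x, h), y) for x0 in x_eval])
```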

Nearest Neighbor Smoother

The nearest neighbor smoother, often referred to as the k-nearest neighbor (k-NN) smoother, constructs estimates by averaging the responses of the k data points closest to the target evaluation point x in the predictor space. It can be viewed as a kernel smoother with a uniform kernel applied exclusively to those nearest points, with the estimate \hat{m}(x) = \frac{1}{k} \sum_{i \in N_k(x)} y_i, where N_k(x) denotes the set of indices of the k nearest neighbors of x, ordered by Euclidean (or another) distance. The uniform kernel assigns equal weight 1/k to neighbors within the adaptive radius d_k(x) (the distance to the k-th nearest neighbor) and zero weight otherwise, effectively creating a data-dependent neighborhood size.

In relation to general kernel smoothing, the k-NN method corresponds to a Nadaraya-Watson estimator with a uniform kernel and an adaptive bandwidth h(x) = d_k(x), which varies locally with the data density rather than being fixed globally. This adaptivity arises because denser regions yield smaller d_k(x), giving finer local resolution, while sparser regions expand the effective neighborhood until it contains the required k points. With k held fixed, the effective bandwidth shrinks toward zero in high-density regions as the sample grows, so the method balances local adaptation with computational simplicity in metric spaces.

Key properties of the nearest neighbor smoother include its piecewise constant nature: the estimate is discontinuous, jumping wherever the set of k nearest neighbors changes, typically at points midway between observations, in contrast with the smoother curves produced by kernels with continuously decaying weights. It exhibits higher variance than smooth kernel alternatives, because distant neighbors in sparse regions receive the same weight as close ones and the weights do not decay gradually, though it is consistent under mild conditions on the design density provided k grows appropriately with the sample size. Beyond regression, the uniform averaging principle extends naturally to classification, where the predicted class is the majority vote among the k neighbors.

A representative example is a k = 5 nearest neighbor smoother applied to a univariate scatterplot of noisy observations, such as temperature measurements over time; the resulting fit is a step function, constant on intervals over which the set of five nearest points does not change and jumping at the breakpoints between them to reflect the new local average.
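
A minimal sketch of the k-nearest-neighbor smoother follows; the brute-force neighbor search shown here is an illustrative choice suited to small one-dimensional samples (tree-based searches would be used at scale).

```python
import numpy as np

def knn_smoother(x_eval, x, y, k=5):
    """k-NN regression smoother: average the responses of the k closest points."""
    out = np.empty_like(x_eval, dtype=float)
    for j, x0 in enumerate(x_eval):
        idx = np.argsort(np.abs(x - x0))[:k]   # indices of the k nearest points
        out[j] = y[idx].mean()                 # uniform weights 1/k
    return out
```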

Implementation and Extensions

Bandwidth Selection Methods

The bandwidth parameter h in kernel smoothing governs the fundamental bias-variance tradeoff: smaller values of h yield low bias but high variance (undersmoothing), while larger values produce low variance but high bias (oversmoothing). For kernel density estimation, the asymptotic mean integrated squared error (AMISE)-optimal h is of order n^{-1/5}, where n is the sample size; a similar order holds in nonparametric regression because of analogous asymptotic expansions.

Several data-driven methods exist for selecting h. Least squares cross-validation (LSCV) minimizes the leave-one-out criterion \text{CV}(h) = \sum_{i=1}^n \left( y_i - \hat{y}_{-i}(x_i) \right)^2, where \hat{y}_{-i}(x_i) denotes the kernel estimator at x_i computed with the i-th observation excluded; this provides an approximately unbiased estimate of the prediction error and is widely used for its computational feasibility. Plug-in selectors approximate the AMISE-optimal h by estimating its components, such as second derivatives of the regression function and the residual variance, often requiring an initial pilot bandwidth chosen via LSCV or a rule of thumb. A practical rule of thumb for Gaussian errors and a Gaussian kernel is h = 1.06 \sigma n^{-1/5}, where \sigma is the error standard deviation, offering a quick initial choice that approximates the optimum under normality assumptions.

Adaptive bandwidth methods adjust h locally according to the data density, using larger h in sparse regions to control variance while keeping smaller h in dense regions for precision. These approaches typically scale the bandwidth in proportion to a pilot estimate of the design density raised to a negative power, for example h(x) \propto [\hat{f}(x)]^{-1/5}, improving performance on heterogeneous data.

The impact of the choice of h is evident when the mean squared error (MSE) is evaluated as a function of h: the MSE typically falls to a minimum at an optimal bandwidth and rises on either side, with small h corresponding to undersmoothing (MSE dominated by variance) and large h to oversmoothing (MSE dominated by bias); such curves from simulation studies underscore the sensitivity of kernel estimators to bandwidth choice.
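
The leave-one-out criterion can be evaluated directly by zeroing each observation's own kernel weight, as in the following illustrative sketch; the Gaussian kernel and the candidate grid are assumptions for the example rather than prescribed choices.

```python
import numpy as np

def loo_cv_score(x, y, h):
    """Leave-one-out CV score for a Nadaraya-Watson fit with a Gaussian kernel."""
    u = (x[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * u**2)            # normalizing constant cancels in the ratio
    np.fill_diagonal(w, 0.0)           # exclude each point from its own fit
    denom = w.sum(axis=1)
    y_loo = (w @ y) / np.where(denom > 0, denom, np.nan)
    return np.nanmean((y - y_loo) ** 2)

def select_bandwidth(x, y, grid):
    """Return the candidate bandwidth with the smallest LOO CV score."""
    scores = [loo_cv_score(x, y, h) for h in grid]
    return grid[int(np.argmin(scores))]

# Example usage over a log-spaced grid of candidate bandwidths:
# h_opt = select_bandwidth(x, y, np.logspace(-2, 0, 30))
```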

Multivariate and Adaptive Extensions

Multivariate extensions of kernel smoothers generalize the univariate Nadaraya-Watson estimator to higher-dimensional input spaces, accommodating vector-valued covariates \mathbf{x} \in \mathbb{R}^d. The estimator takes the form \hat{m}(\mathbf{x}) = \frac{\sum_{i=1}^n K_H(\mathbf{x} - \mathbf{X}_i) Y_i}{\sum_{i=1}^n K_H(\mathbf{x} - \mathbf{X}_i)}, where K_H(\mathbf{u}) = |\mathbf{H}|^{-1/2} K(\mathbf{H}^{-1/2} \mathbf{u}), K is a multivariate kernel (often a product or radial kernel), and \mathbf{H} is a symmetric positive definite bandwidth matrix controlling the amount of smoothing in each direction. For isotropic smoothing, \mathbf{H} is diagonal with equal entries, while anisotropic versions allow direction-specific smoothing via a full matrix, enabling adaptation to elongated structures. This formulation arises naturally from local averaging over ellipsoidal neighborhoods defined by \mathbf{H}.

A key challenge in multivariate settings is the curse of dimensionality: estimation accuracy degrades rapidly as d increases because data become sparse in high-dimensional spaces. The optimal bandwidth scales as h \asymp n^{-1/(d+4)} for the mean integrated squared error in local constant kernel regression, implying that sample sizes must grow exponentially with d to maintain performance; for example, with d = 10, millions of observations may be needed for smoothing comparable in reliability to univariate cases.

Adaptive smoothing addresses local data variability by employing variable bandwidths or kernel shapes, such as inflating the bandwidth by a factor proportional to the inverse square root of a pilot density estimate, as in Abramson's method, to apply wider smoothing in sparse regions and sharper smoothing in dense ones. Robust variants further mitigate outliers by using truncated or M-estimator-based weights in the kernel, reducing their influence on the local average.

These extensions find applications in image denoising, where adaptive kernel regression preserves edges by locally adjusting the smoothing based on image gradients, outperforming fixed-kernel methods on noisy grayscale images, and in spatial statistics, where multivariate kernel smoothers estimate intensity functions for point processes, such as crime hotspots or disease outbreaks, by smoothing over geographic coordinates to reveal underlying spatial patterns. Computational efficiency is enhanced through binning techniques, which discretize the data space into grids and approximate kernel sums from precomputed bin counts, reducing complexity from O(n^2) to near-linear time for large datasets. Limitations include the proliferation of parameters in \mathbf{H} (up to d(d+1)/2 entries), which complicates selection and risks overfitting or instability in high dimensions, where the curse of dimensionality inflates variance unless data are plentiful.
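
A minimal sketch of the multivariate Nadaraya-Watson estimator with a bandwidth matrix \mathbf{H} and a Gaussian kernel (an illustrative choice); as in the univariate case, the |\mathbf{H}|^{-1/2} factor cancels in the ratio and is therefore omitted.

```python
import numpy as np

def nw_multivariate(x_eval, X, y, H):
    """Multivariate Nadaraya-Watson estimate with Gaussian kernel K_H.

    x_eval: (m, d) evaluation points; X: (n, d) design; y: (n,) responses;
    H: (d, d) symmetric positive definite bandwidth matrix.
    """
    H_inv = np.linalg.inv(H)
    out = np.empty(len(x_eval))
    for j, x0 in enumerate(x_eval):
        d = X - x0                                   # (n, d) differences
        q = np.einsum("ij,jk,ik->i", d, H_inv, d)    # Mahalanobis-type distances
        w = np.exp(-0.5 * q)
        s = w.sum()
        out[j] = np.dot(w, y) / s if s > 0 else np.nan
    return out

# Example of anisotropic smoothing in d = 2, with more smoothing along the
# second coordinate (illustrative values):
# H = np.diag([0.05, 0.20]) ** 2
```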
