Winsorized mean
The Winsorized mean is a robust statistical measure of central tendency that mitigates the impact of outliers by replacing the most extreme values in a dataset—typically the lowest and highest percentiles—with less extreme values from adjacent percentiles, before computing the arithmetic mean of the modified data.[1] This approach yields a more stable estimate of the data's location compared to the conventional arithmetic mean, especially in distributions affected by anomalous observations.[2] Named after biostatistician Charles P. Winsor, the method was introduced in 1946 as a technique for handling outliers in counting statistics.
The computation of the Winsorized mean involves specifying a trimming proportion α (often 0.05 or 0.10 for 90% or 80% Winsorization, respectively), sorting the dataset, and capping the αn lowest values at the (αn + 1)th ordered value while capping the αn highest values at the [n(1 - α)]th ordered value, where n is the sample size.[1] For instance, consider the dataset {1, 2, 3, 4, 100} with n=5 and α=0.20 (20% Winsorization): the lowest value (1) is replaced by the second lowest (2), and the highest (100) by the second highest (4), resulting in the adjusted set {2, 2, 3, 4, 4} and a Winsorized mean of 3.[2] This process can be implemented symmetrically for balanced tails or asymmetrically if needed, and it is available in statistical software such as R and SAS.[3]
In contrast to the trimmed mean, which excludes extreme values entirely and reduces the effective sample size, the Winsorized mean preserves all data points, avoiding loss of information and maintaining degrees of freedom for inference.[4] It is particularly valuable in applied fields like economics, psychology, and biomedical research, where outliers from measurement errors or rare events can distort results, and for symmetric distributions, it provides an unbiased estimator of the population mean.[5][3] The technique enhances the reliability of analyses such as t-tests and regression models by producing variance estimates less sensitive to contamination.[2]
Definition and Background
Definition
The arithmetic mean, a common measure of central tendency, is highly sensitive to outliers—extreme values in a dataset that deviate significantly from the majority of observations—because they can pull the average toward themselves, leading to distorted estimates of the population parameter.[3] To mitigate this without discarding data, the Winsorized mean serves as a robust alternative estimator that adjusts extreme values rather than removing them.[6]
Winsorizing involves replacing the extreme observations in a sorted dataset with less extreme values drawn from the dataset itself, typically at specified lower and upper percentiles such as \alpha and 1 - \alpha, where 0 < \alpha < 0.5.[7] For a dataset of n observations sorted in non-decreasing order as x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)}, and a trimming level k (often k = \lfloor \alpha n \rfloor), the k smallest values x_{(1)} to x_{(k)} are replaced by x_{(k+1)}, and the k largest values x_{(n-k+1)} to x_{(n)} are replaced by x_{(n-k)}.[3] This capping process preserves the original sample size n, distinguishing it from trimming methods that delete extremes and reduce the effective sample.[6]
The Winsorized mean is then computed as the arithmetic mean of this adjusted dataset:
\bar{x}_w = \frac{1}{n} \sum_{i=1}^n w_i,
where w_i are the Winsorized values (with w_i = x_{(k+1)} for the lowest k originals, w_i = x_{(n-k)} for the highest k, and w_i = x_i otherwise).[7] This approach limits the influence of outliers while retaining all data points, making it suitable for datasets suspected of containing anomalies.[3]
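For instance, applying this definition to the dataset {1, 2, 3, 4, 100} from the introduction, with n = 5 and k = 1, the Winsorized values are (2, 2, 3, 4, 4) and
\bar{x}_w = \frac{1}{5}\left(2 + 2 + 3 + 4 + 4\right) = 3.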
Historical Development
The Winsorized mean was introduced by biostatistician Charles P. Winsor during his tenure at the Johns Hopkins School of Hygiene and Public Health, where he advocated for replacing extreme observations in datasets with values from adjacent non-extreme points to reduce outlier influence. This approach emerged in 1941, as recounted by John W. Tukey, who first encountered Winsor's ideas that year and credited him with a practical philosophy for treating "wild shots" in real-world data without discarding them outright.
Winsor's innovation arose in the context of early 20th-century statistics, a period marked by increasing recognition of the limitations of classical methods like the arithmetic mean when applied to non-normal distributions prevalent in biological and public health studies.[8] At the time, researchers in fields such as epidemiology and vital statistics sought robust alternatives to better accommodate skewed or contaminated data from observational sources, reflecting broader debates on estimation stability that traced back to late-19th-century concerns over least squares assumptions.[8]
Following World War II, the Winsorized mean gained traction in econometrics and finance starting in the post-1950s era, as these disciplines grappled with volatile economic indicators and heavy-tailed distributions in time-series data. Tukey formalized the terminology "Winsorized" in his 1962 seminal paper, popularizing the method and linking it to emerging robust statistics frameworks, though Winsor's foundational contribution predated and inspired Tukey's later exploratory techniques.
By the 2000s, computational advancements made Winsorizing routine, with implementations in open-source software like R's DescTools package and Python's SciPy library enabling easy application across disciplines. Recent studies continue to affirm its utility, such as in 2024 analyses of financial forecasting where winsorization improved model stability against extreme market events.[9]
Computation
Procedure
The procedure for computing the Winsorized mean involves sorting the data and systematically replacing extreme values to mitigate the influence of outliers.[6][2]
Begin by sorting the dataset in ascending order to obtain the order statistics x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)}, where n is the sample size.[6][3]
Next, select the trimming level \alpha, a value between 0 and 0.5 that specifies the proportion of observations to adjust in each tail (for example, \alpha = 0.05 for 5% per tail). Compute k = \lfloor \alpha n \rfloor using the floor function to obtain an integer number of values to replace.[5][6]
When \alpha n is not an integer, the floor function ensures k values are replaced per tail, which may yield a proportion slightly less than \alpha for small or odd n; alternatively, some implementations use ceiling or rounding for k, but flooring is the conventional choice for consistency and to avoid over-adjustment.[5][6]
Then, replace the k smallest values (x_{(1)} through x_{(k)}) with x_{(k+1)}, and replace the k largest values (x_{(n-k+1)} through x_{(n)}) with x_{(n-k)}.[6][3][2]
Finally, calculate the arithmetic mean of the resulting modified dataset.[5][6]
The following pseudocode outlines the process in a clear, implementable form (assuming 1-based indexing):

```
function winsorized_mean(data, alpha):
    n = length(data)
    if n <= 1:
        return mean(data)            // or handle edge case
    sort data ascending              // data[1] ≤ ... ≤ data[n]
    k = floor(alpha * n)
    if k > 0 and 2 * k < n:          // ensure valid range
        lower = data[k + 1]
        upper = data[n - k]
        for i = 1 to k:
            data[i] = lower
        for i = n - k + 1 to n:
            data[i] = upper
    return sum(data) / n
```
This algorithm preserves the sample size while capping extremes, with the condition k < n/2 preventing the entire dataset from being replaced.[6][3]
The α-Winsorized mean of a sample X = (X_1, \dots, X_n) consisting of independent and identically distributed random variables is formally defined as
\bar{X}^{(\alpha)} = \frac{1}{n} \sum_{i=1}^n W_i,
where W_i = \min\left( \max\left( X_i, Q_{\alpha} \right), Q_{1-\alpha} \right) for each i, and Q_p denotes the p-quantile of the underlying distribution (or the sample p-quantile in the empirical setting).[10] This transformation caps observations below the α-quantile at Q_{\alpha} and those above the (1-α)-quantile at Q_{1-\alpha}, thereby bounding the influence of extreme values while retaining all data points in the averaging process.[11]
In terms of order statistics, let X_{(1)} \leq X_{(2)} \leq \dots \leq X_{(n)} be the ordered sample, and set k = \lfloor \alpha n \rfloor. The α-Winsorized mean can equivalently be expressed as
\bar{X}^{(\alpha)} = \frac{1}{n} \left[ k X_{(k+1)} + \sum_{i=k+1}^{n-k} X_{(i)} + k X_{(n-k)} \right].
This formulation replaces the k smallest observations with X_{(k+1)} and the k largest with X_{(n-k)}, then computes the sample mean of the adjusted values.[10][11]
The sample quantiles Q_p are estimated using the empirical cumulative distribution function (CDF) F_n(x) = \frac{1}{n} \sum_{i=1}^n I(X_i \leq x), where I(\cdot) is the indicator function, via Q_p = \inf \{ x : F_n(x) \geq p \}. This ties the Winsorized mean directly to the empirical distribution, ensuring it is a nonparametric functional of the data.[11]
Asymptotically, as n \to \infty, the α-Winsorized mean is consistent for the population α-Winsorized mean E[\min(\max(X, Q_\alpha), Q_{1-\alpha})], which coincides with the population mean \mu for symmetric distributions, assuming a finite first absolute moment for the underlying distribution.[11]
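The order-statistic formulation translates directly into code. The following NumPy sketch is an illustrative implementation under the floor-based choice of k described above (the function name and edge-case handling are our own); it reproduces the Winsorized mean of 3 for the dataset {1, 2, 3, 4, 100} used in the introduction:

```python
import numpy as np

def winsorized_mean(x, alpha):
    """Alpha-Winsorized mean via the order-statistic formula (illustrative sketch)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    k = int(np.floor(alpha * n))          # observations replaced per tail
    if k == 0 or 2 * k >= n:
        return x.mean()                   # nothing to replace, or alpha too large
    # (1/n) * [ k * X_(k+1) + sum_{i=k+1}^{n-k} X_(i) + k * X_(n-k) ], written with 0-based indices
    return (k * x[k] + x[k:n - k].sum() + k * x[n - k - 1]) / n

print(winsorized_mean([1, 2, 3, 4, 100], alpha=0.20))   # 3.0
```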
Properties
Robustness Characteristics
The Winsorized mean exhibits a finite-sample breakdown point of \min(\alpha, \beta), where \alpha and \beta represent the proportions replaced at the lower and upper tails, respectively; in the symmetric case with \alpha = \beta, this equals \alpha, indicating that the estimator can resist contamination by up to an \alpha proportion of arbitrary outliers before its value becomes unbounded or meaningless.[12] This property ensures qualitative robustness, as the estimator remains stable unless the fraction of outliers exceeds this threshold, outperforming non-robust alternatives like the arithmetic mean, which has a breakdown point approaching zero.[10]
The influence function of the Winsorized mean is bounded, which qualitatively measures its resistance to perturbations in the underlying distribution by limiting the contribution of any single observation.[13] Specifically, the bounded influence caps the effect of an outlier at a value related to the distance between the \alpha- and (1-\alpha)-quantiles, preventing any individual extreme value from disproportionately skewing the estimate.[13] In contrast, the arithmetic mean's unbounded influence function allows a single outlier to produce arbitrarily large bias.[10]
Under gross-error models, such as the \epsilon-contaminated normal distribution where a proportion \epsilon of observations arise from a contaminating distribution, the Winsorized mean demonstrates superior performance to the arithmetic mean by reducing bias and mean squared error, particularly for low contamination levels (\epsilon \leq 0.10) and sample sizes n \geq 100.[14] This robustness extends to distributions with heavy tails, where the Winsorized mean maintains lower sensitivity to extreme deviations compared to the sample mean, as evidenced by Monte Carlo simulations across varying trimming proportions and contamination scenarios.[13]
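This behavior can be illustrated with a small Monte Carlo sketch under a symmetric ε-contaminated normal model; the sample size, contamination level, and trimming proportion below are illustrative choices rather than values taken from the cited studies.

```python
import numpy as np
from scipy.stats.mstats import winsorize

rng = np.random.default_rng(0)
n, eps, alpha, reps = 200, 0.05, 0.10, 2000    # illustrative settings

mse_mean, mse_wins = [], []
for _ in range(reps):
    clean = rng.normal(0.0, 1.0, n)
    gross = rng.normal(0.0, 10.0, n)                    # gross-error component with inflated scale
    x = np.where(rng.random(n) < eps, gross, clean)     # true mean is 0
    mse_mean.append(x.mean() ** 2)
    mse_wins.append(float(winsorize(x, limits=[alpha, alpha]).mean()) ** 2)

print("MSE, sample mean:    ", np.mean(mse_mean))
print("MSE, Winsorized mean:", np.mean(mse_wins))       # typically smaller under contamination
```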
Bias and Efficiency
The Winsorized mean is unbiased for the population mean under symmetric distributions, such as the normal distribution, because the symmetric replacement of extreme values with their corresponding quantiles preserves the expectation of the estimator.[15] In skewed distributions, however, the procedure introduces a small bias, as the clipping of extremes does not fully account for the asymmetry, though this bias can be minimized by selecting an optimal cutoff value that reduces the mean squared error.[16] For the normal distribution specifically, the bias is zero, as confirmed by the symmetry property.[17]
The variance of the Winsorized mean is lower than that of the arithmetic mean in distributions susceptible to outliers, since replacing extreme values dampens the contribution of heavy tails to overall variability. Under the normal distribution, the asymptotic variance takes the form \frac{\sigma^2}{n} \left(1 - 2\alpha + \frac{2\alpha^2}{1 - 2\alpha}\right), reflecting the reduced spread after Winsorization while accounting for the data-dependent nature of the cutoffs.[18] This formula highlights how modest clipping levels (\alpha) yield variances close to but slightly above the arithmetic mean's \sigma^2 / n due to the variability in estimating the quantiles.
Relative efficiency, here the ratio of the arithmetic mean's asymptotic variance to that of the Winsorized mean, stands at approximately 95% for \alpha = 0.05 under a clean normal distribution, indicating a minor efficiency loss in uncontaminated settings. In contrast, for contaminated normal distributions (such as those with 10% scale contamination), the relative efficiency exceeds 1, often reaching 1.2 or higher depending on the contamination level, demonstrating superior performance when outliers are present.[19] Similar gains occur with heavier-tailed distributions like the t-distribution; for instance, with \alpha = 0.05 and 3 degrees of freedom, efficiency surpasses 1.1 relative to the normal case, increasing to over 1.3 for \alpha = 0.1, as clipping mitigates the influence of extreme values more effectively in non-normal scenarios.[20] These efficiencies underscore the bias-variance tradeoff favoring the Winsorized mean in robust settings, though optimal \alpha selection balances the loss in clean data against gains in robustness.
Advantages and Limitations
Advantages
The Winsorized mean offers simplicity in computation, as it requires only sorting the data values and replacing the extremes with specified percentile thresholds before taking the arithmetic average, making it accessible without advanced programming or optimization techniques.[21] This approach contrasts with more intricate robust estimators, such as M-estimators or regression-based methods, which often involve iterative solving of nonlinear equations or hyperparameter tuning. By replacing outlier values rather than discarding them, the Winsorized mean preserves the full sample size and retains all observations, thereby avoiding the information loss inherent in trimming or deletion procedures that reduce effective dataset size.[5] This preservation maintains greater data variability compared to trimmed means, where extremes are entirely excluded from the calculation.[5]
The method demonstrates versatility across data types, serving as a robust location estimator for univariate distributions with heavy tails or asymmetry, and extending to multivariate contexts through adaptations like robust mean vector estimation in control charts or composite indicators.[22] In non-ideal conditions, such as datasets contaminated by outliers or deviations from normality, it enhances the reliability of statistical inference by mitigating extreme influences while stabilizing variance estimates.[21] Modern software integration further bolsters its practicality, with implementations available in libraries like SciPy's masked statistics module (as of version 1.16.2 in 2025), enabling seamless use in data analysis and machine learning workflows,[23] although there is an ongoing proposal to deprecate the mstats module.[23][24] Overall, this reduces the impact of outliers on central tendency estimates without requiring distributional assumptions.[21]
Limitations
One key limitation of the Winsorized mean is the arbitrary selection of the parameter α, which determines the proportion of data winsorized at each tail and lacks universally objective criteria, thereby introducing potential subjectivity into the estimation process.[6] Researchers often rely on conventional values such as α = 0.05 or 0.1, but these choices can vary based on domain knowledge or exploratory analysis, and sensitivity analyses are recommended to assess the impact of different α levels on results.[5] This subjectivity can undermine reproducibility, particularly when the optimal α is context-dependent.[6]
The method also exhibits residual sensitivity to outliers if their number exceeds the proportion specified by α or if they cluster near the chosen quantiles, as the replacement values may still be contaminated by extreme observations.[6] In such cases, the Winsorized mean fails to fully mitigate outlier influence, potentially leading to biased estimates that resemble those of the ordinary mean.[6] For instance, if more than αn outliers exist in one tail, the (αn + 1)th order statistic used for replacement could itself be an outlier, preserving undue leverage.[6]
Additionally, the Winsorized mean relies on asymptotic assumptions that may not hold in small samples (typically n < 30) or highly skewed distributions, where it can perform poorly without further adjustments.[25] In small-sample scenarios, such as usability testing with n around 10–20, the Winsorized mean has been shown to yield inaccurate point estimates compared to more robust alternatives.[25] For skewed data, the procedure can introduce bias by asymmetrically altering the distribution, especially if tails are unequal.[26]
Traditional discussions of α selection have been limited, but post-2010 developments include data-driven methods such as using the median absolute deviation (MAD) to set thresholds.[27] These approaches, often integrated in modern robust estimation frameworks, help mitigate subjectivity by adaptively choosing truncation levels based on empirical evidence. For example, MAD-based selection identifies outlier boundaries relative to the median.[27]
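A minimal sketch of such a MAD-based scheme is shown below; the cutoff constant of 3 and the function name are illustrative choices, not a prescription from the cited work.

```python
import numpy as np

def mad_winsorized_mean(x, c=3.0):
    """Winsorize at median +/- c * MAD (MAD scaled for consistency at the normal), then average."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = 1.4826 * np.median(np.abs(x - med))
    return np.clip(x, med - c * mad, med + c * mad).mean()

print(mad_winsorized_mean([1, 2, 3, 100, -50, 4, 5]))   # extremes capped at data-driven bounds
```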
Comparison with Trimmed Mean
The primary distinction between the Winsorized mean and the trimmed mean lies in their treatment of extreme values: the Winsorized mean replaces outliers beyond specified percentiles with the nearest non-extreme values, thereby capping their influence while retaining the full sample size, whereas the trimmed mean discards those extremes entirely, resulting in a reduced effective sample size.[28][4]
In positively skewed distributions, where high-value outliers are more prevalent, the Winsorized mean tends to produce slightly higher estimates than the trimmed mean because it incorporates capped versions of the extremes into the average across the original sample size, avoiding the downward pull from outright removal of those values.
Regarding performance, both estimators exhibit similar breakdown points, typically up to the proportion of data trimmed or Winsorized (e.g., 50% for maximal robustness), but the Winsorized mean demonstrates higher efficiency under symmetric contamination scenarios, such as normal distributions with balanced outliers on both tails.[28][29]
The choice between them depends on context: the trimmed mean is preferable when strict outlier removal is desired to eliminate potential contamination entirely, while the Winsorized mean is better suited for small datasets where preserving the full sample size is crucial to maintain statistical power.[28][29]
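The contrast can be checked numerically; the sketch below (with illustrative data and trimming level) compares the two estimators on a positively skewed sample using SciPy's trim_mean and mstats.winsorize:

```python
import numpy as np
from scipy.stats import trim_mean
from scipy.stats.mstats import winsorize

rng = np.random.default_rng(1)
x = rng.lognormal(mean=0.0, sigma=1.0, size=500)   # positively skewed sample

alpha = 0.10
print("arithmetic mean:", x.mean())
print("trimmed mean:   ", trim_mean(x, proportiontocut=alpha))
print("Winsorized mean:", float(winsorize(x, limits=[alpha, alpha]).mean()))
```

Because the capped upper-tail values remain in the average, the Winsorized mean is typically the larger of the two robust estimates on such data, consistent with the comparison above.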
Comparison with Other Robust Estimators
The Winsorized mean provides greater statistical efficiency than the median under distributions close to normal with only mild contamination, as it retains more information from the data while capping extremes. For instance, under the standard normal distribution, a 5% Winsorized mean achieves approximately 95% relative efficiency compared to the sample mean, whereas the median attains only about 64% relative efficiency (2/π). The median, however, offers simplicity in computation and asymptotic distribution-free properties, making it preferable when contamination is heavy or computational resources are limited.
In comparison to the Huber M-estimator, the Winsorized mean is non-iterative and thus easier to implement, requiring only sorting and replacement of extremes rather than solving nonlinear estimating equations. The Huber estimator, introduced as a foundational M-estimator, allows fine-tuning of its robustness parameter to balance efficiency and outlier resistance, achieving up to 95% relative efficiency under normality while bounding influence; certain extensions or combinations with high-breakdown initial estimates can yield breakdown points up to 25%. In robust regression contexts, the Winsorized mean serves as a straightforward alternative to Huber methods, particularly when iteration is undesirable, though it trades off some flexibility in tuning for computational simplicity.
The following table summarizes key properties for a 5% Winsorized mean (α=0.05), the median, and the Huber M-estimator tuned for 95% efficiency under normality:
| Estimator | Breakdown Point | Relative Efficiency (Normal) | Computational Complexity |
|---|---|---|---|
| Winsorized mean (α=0.05) | 0.05 | ~0.95 | O(n log n) (sorting) |
| Median | 0.50 | 0.64 | O(n) |
| Huber M-estimator | 0 (asymptotic) | 0.95 | O(n × iterations) |
Breakdown points reflect the maximum proportion of contaminated observations the estimator can tolerate before arbitrary bias; efficiencies are asymptotic relative to the sample mean.
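To illustrate the computational contrast, the following sketch computes a Huber location estimate by iteratively reweighted averaging with the scale fixed at the MAD; the tuning constant c ≈ 1.345 targets roughly 95% efficiency under normality, and the function name and convergence settings are illustrative choices rather than a reference implementation.

```python
import numpy as np

def huber_location(x, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted averaging (scale fixed at the MAD)."""
    x = np.asarray(x, dtype=float)
    mu = np.median(x)
    s = 1.4826 * np.median(np.abs(x - mu))    # normal-consistent MAD
    if s == 0:
        return mu
    for _ in range(max_iter):
        r = np.abs(x - mu) / s
        w = np.ones_like(r)
        w[r > c] = c / r[r > c]               # Huber weights: downweight large residuals
        mu_new = np.sum(w * x) / np.sum(w)
        if abs(mu_new - mu) < tol * s:
            break
        mu = mu_new
    return mu

print(huber_location(np.array([1, 2, 3, 100, -50, 4, 5])))   # requires iteration, unlike Winsorizing
```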
Examples and Applications
Numerical Example
Consider a hypothetical dataset consisting of 10 exam scores: 50, 55, 60, 65, 70, 75, 80, 85, 90, 200. This set includes a clear outlier at 200, which inflates the arithmetic mean to 83 (calculated as the sum 830 divided by 10).[3]
To illustrate the 10% Winsorized mean (with α = 0.1 per tail, yielding k = 1 value replaced at each end), first sort the data in ascending order: 50, 55, 60, 65, 70, 75, 80, 85, 90, 200. Replace the lowest value (50) with the next lowest (55) and the highest value (200) with the next highest (90). The resulting Winsorized dataset is 55, 55, 60, 65, 70, 75, 80, 85, 90, 90, with a sum of 725 and a mean of 72.5 (725 divided by 10). This adjustment caps the outlier's influence, lowering the mean by 10.5 points compared to the original.[3]
For comparison, the corresponding 10% trimmed mean removes the extreme values (50 and 200), leaving the central 8 scores: 55, 60, 65, 70, 75, 80, 85, 90. The sum is 580, yielding a mean of 72.5 (580 divided by 8). In this case, both robust estimators produce the same result, highlighting how Winsorization retains all observations while achieving similar outlier mitigation.[3]
The following table displays the datasets side by side for clarity:
| Position | Original Data | Winsorized Data (10%) | Trimmed Data (10%) |
|---|---|---|---|
| 1 | 50 | 55 | (removed) |
| 2 | 55 | 55 | 55 |
| 3 | 60 | 60 | 60 |
| 4 | 65 | 65 | 65 |
| 5 | 70 | 70 | 70 |
| 6 | 75 | 75 | 75 |
| 7 | 80 | 80 | 80 |
| 8 | 85 | 85 | 85 |
| 9 | 90 | 90 | 90 |
| 10 | 200 | 90 | (removed) |
Means: Original = 83; Winsorized = 72.5; Trimmed = 72.5
This example demonstrates the Winsorized mean's ability to reduce outlier impact without discarding data points, making the estimate more representative of the typical performance in the exam scores.[3]
Visualizing the data via boxplots further illustrates the effect: the original dataset's boxplot shows a median around 72.5 with an upper whisker extended to 200 due to the outlier, creating a skewed representation. After Winsorization, the boxplot's upper whisker shortens to 90, resulting in a more symmetric and compact display that better captures the central distribution of scores.
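The figures above can be reproduced with SciPy, assuming its winsorize and trim_mean routines adjust exactly one value per tail for n = 10 at the 10% level (which holds under their floor-based counting):

```python
import numpy as np
from scipy.stats import trim_mean
from scipy.stats.mstats import winsorize

scores = np.array([50, 55, 60, 65, 70, 75, 80, 85, 90, 200], dtype=float)

print(scores.mean())                                        # 83.0
print(float(winsorize(scores, limits=[0.1, 0.1]).mean()))   # 72.5
print(trim_mean(scores, proportiontocut=0.1))               # 72.5
```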
Practical Applications
The Winsorized mean finds significant application in finance, particularly in analyzing portfolio returns where extreme market events can skew traditional means. During the 2008 financial crisis, researchers applied winsorization to fixed income mutual fund return data to eliminate spurious outliers and enhance statistical efficiency, enabling more reliable performance evaluations before, during, and after the crisis period.[30] Similarly, in studies of volatility across financial crises, winsorized monthly real returns are used to compute standard deviations, mitigating the distorting effects of outliers in international stock data spanning 60 countries.[31] This approach provides a robust estimate of central tendency, crucial for risk assessment and momentum profitability analysis in volatile markets.[32]
In medicine and biostatistics, the Winsorized mean supports robust averaging of clinical measurements prone to errors or outliers, such as in laboratory quality control and trial outcomes. For patient-based real-time quality control in medical laboratories, winsorization of extreme values outperforms simple outlier removal, yielding more stable medians and percentiles for monitoring analytical performance.[33] In electronic health record systems, it is applied to turn-around times for critical test results, where winsorizing reveals a reduction in mean processing time from 34 minutes to 20 minutes after system improvements, highlighting operational efficiencies otherwise masked by extremes.[34] These uses ensure reliable inference in datasets with measurement variability, as seen in screening colonoscopy performance models that employ winsorization for outlier adjustment.[35]
Recent applications extend to machine learning preprocessing, where the Winsorized mean handles outliers in feature data to improve model stability and prediction accuracy. In cognitive age prediction models, winsorization limits extreme values in features like psychophysiological test data such as reaction time and accuracy metrics, reducing distortions and enhancing regression performance alongside variance inflation factor selection.[36] This technique preserves dataset size while curbing outlier influence, making it suitable for high-dimensional inputs in neural networks and time-series forecasting.[27]
Software implementations facilitate practical adoption of the Winsorized mean across disciplines. In R, the robustHD package offers the winsorize function to shrink outlying observations to data borders, ideal for cleaning financial or clinical datasets; post-2020 updates in related libraries like DescTools enhance its efficiency for large-scale analysis.[37]
```r
library(robustHD)
data <- c(1, 2, 3, 100, -50, 4, 5)   # example data with outliers
winsorized_data <- winsorize(data)   # default: shrink values outside median +/- 2*MAD to those borders
mean(winsorized_data)                # Winsorized mean of the cleaned data
# For percentile-based capping (e.g., 5% per tail), DescTools::Winsorize can be used instead.
```
In Python, SciPy's mstats.winsorize from the scipy.stats module caps extremes at specified percentiles, commonly used in machine learning pipelines for preprocessing numerical features.
```python
from scipy.stats.mstats import winsorize
import numpy as np

data = np.array([1, 2, 3, 100, -50, 4, 5])        # example data with outliers
winsorized = winsorize(data, limits=[0.2, 0.2])   # 20% on each tail (one value per tail for n = 7)
np.mean(winsorized)                               # compute the Winsorized mean
```