Tolerance interval
A tolerance interval is a statistical interval, derived from sample data, that contains at least a specified proportion p of the population values with a stated confidence level 1 - \alpha. This distinguishes it from confidence intervals, which cover population parameters, and from prediction intervals, which cover future observations.[1] For a normal distribution, the interval is typically two-sided, with lower limit \bar{Y} - k s and upper limit \bar{Y} + k s, where \bar{Y} is the sample mean, s is the sample standard deviation, and k is a factor depending on p, \alpha, and the sample size.[1] One-sided intervals, providing either a lower or an upper bound, are also used.[1]
The concept originated in the early 1940s with Samuel S. Wilks' work on determining sample sizes for setting tolerance limits, building on nonparametric methods to ensure coverage of population proportions.[2] Subsequent developments, such as W. G. Howe's 1969 method for normal distributions, standardized calculations using chi-squared and t-distributions for precise computation of the factor k. Tolerance intervals are applied in fields such as manufacturing quality control, where they verify whether specification limits encompass a high proportion (e.g., 99%) of product measurements with 95% confidence, and environmental monitoring of pollutant ranges.[3] They are particularly valuable in process validation, such as pharmaceutical assays or food safety assessments, for inferring population behavior from limited samples without assuming a full distributional form.[3] Key distinctions include their focus on individual values rather than means or single predictions; they require larger samples than confidence intervals for similar precision because they must control both the coverage proportion and the confidence level.[1] Modern implementations in software such as Minitab or R handle both parametric (normal) and nonparametric cases, with Bayesian extensions emerging for complex distributions.[4]
Fundamentals
Definition
A tolerance interval is a type of statistical interval designed to contain at least a specified proportion p of the values in a population, with a stated confidence level \gamma = 1 - \alpha that the interval achieves this coverage.[1] This makes it particularly useful for describing the range within which a large share of the population is expected to fall, accounting for both the variability in the sample and the uncertainty in estimating that variability.[5] For instance, a tolerance interval might be constructed to cover 95% of the population (p = 0.95) with 95% confidence (\gamma = 0.95), ensuring that the probability the interval misses more than 5% of the population is only 5%.[1] The key parameters defining a tolerance interval include the coverage proportion p, which specifies the minimum fraction of the population to be enclosed; the confidence level \gamma, which quantifies the reliability of the coverage claim; and the sample size n, which influences the interval's width and precision.[1] Larger samples generally yield narrower intervals for the same p and \gamma, but the construction balances these to reflect population spread rather than point estimates.[4] Tolerance intervals differ fundamentally from other statistical intervals: they do not bound population parameters like confidence intervals, nor do they predict single future observations like prediction intervals; instead, they provide bounds for a substantial portion of the entire population distribution.[1][6] The concept of tolerance intervals originated in the 1940s amid growing applications of statistics to quality control in manufacturing, with foundational theoretical developments by Samuel S. Wilks in his work on statistical prediction and tolerance limits. Subsequent contributions, including those by Abraham Wald on setting tolerance limits for normal distributions, further established their role in industrial settings.
Types
Tolerance intervals are classified primarily by their directionality, which determines whether they provide bounds on one tail or both tails of the distribution. One-sided tolerance intervals establish either a lower bound, ensuring that at least a proportion p of the population exceeds the limit with confidence \gamma, or an upper bound, ensuring that no more than a proportion 1 - p exceeds the limit with confidence \gamma.[1] These are commonly applied in scenarios such as quality control where only a minimum strength or maximum defect rate is of interest, for instance, setting a lower tolerance limit for the tensile strength of materials to guarantee that at least 95% of items meet the requirement with 99% confidence. Two-sided tolerance intervals provide both lower and upper bounds, capturing at least proportion p of the population within the interval with confidence \gamma, and can be symmetric or asymmetric depending on the application.[1] Within two-sided intervals, equal-tailed variants allocate equal proportions of the uncovered content to each tail, such as (1-p)/2 on each side, resulting in a central interval that symmetrically bounds the distribution.[7] In contrast, unequal-tailed two-sided intervals allow asymmetric tail probabilities, which may be preferred when the distribution or requirements are skewed, enabling more coverage on one side while still achieving the overall content guarantee.[8] Tolerance intervals are further categorized by distributional assumptions into parametric and non-parametric types.
Parametric tolerance intervals assume a specific underlying distribution, such as the normal distribution, to derive bounds that are typically narrower and more precise for smaller samples when the assumption holds.[9] Non-parametric tolerance intervals, also known as distribution-free, make no such assumptions and rely on the order statistics of the sample, offering broader applicability but requiring larger sample sizes to achieve comparable coverage guarantees.[9] Special cases extend tolerance intervals beyond univariate settings. Tolerance bands for regression provide interval bounds that vary with predictor variables in linear or nonlinear models, ensuring coverage of future responses across the range of covariates, often constructed simultaneously to control overall confidence.[10] Multivariate tolerance intervals or regions bound a proportion p of a multidimensional population with confidence \gamma, using methods like data depth or spacings to define ellipsoidal or other shaped regions suitable for vector-valued quality characteristics.[11]
Methods
Parametric Approaches
Parametric approaches to tolerance intervals rely on the assumption that the underlying population follows a specified parametric distribution, with the normal distribution being the most prevalent case due to its widespread applicability in quality control and engineering contexts.[1] Under this framework, the data are modeled as independent and identically distributed samples from a normal distribution N(\mu, \sigma^2), where the population mean \mu and variance \sigma^2 are typically unknown and estimated from the sample. This parametric assumption enables the derivation of exact or approximate intervals that guarantee, with a specified confidence level \gamma, coverage of at least a proportion p of the population. For one-sided tolerance intervals, the lower bound is constructed as L = \bar{y} - k s, where \bar{y} is the sample mean, s is the sample standard deviation, and k is a tolerance factor chosen such that the interval covers at least proportion p of the population with confidence \gamma.[1] The factor is given by k = t'_{\gamma; n-1, \delta} / \sqrt{n}, where t'_{\gamma; n-1, \delta} is the \gamma-quantile of a noncentral t-distribution with n - 1 degrees of freedom and noncentrality parameter \delta = z_p \sqrt{n}, z_p is the p-quantile of the standard normal distribution, and n is the sample size. This approach, originally proposed by Paulson, ensures the probabilistic coverage by linking the interval to the distribution of future observations relative to the sample estimates. An upper one-sided bound follows symmetrically as U = \bar{y} + k s.
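The one-sided factor can be evaluated directly from the noncentral t-distribution; the following is a minimal sketch assuming SciPy is available (the function name one_sided_k and the 95/95 example are illustrative, not part of any standard API):

```python
import math
from scipy.stats import norm, nct

def one_sided_k(n, p, gamma):
    """Tolerance factor k for a one-sided normal tolerance bound
    (lower: y_bar - k*s, upper: y_bar + k*s): the gamma-quantile of a
    noncentral t-distribution with n-1 degrees of freedom and
    noncentrality z_p * sqrt(n), rescaled by 1/sqrt(n)."""
    delta = norm.ppf(p) * math.sqrt(n)          # noncentrality parameter
    return nct.ppf(gamma, df=n - 1, nc=delta) / math.sqrt(n)

# Example: n = 20, cover 95% of the population with 95% confidence;
# published one-sided tables give k close to 2.396 for this case.
k = one_sided_k(20, 0.95, 0.95)
print(round(k, 3))
```

For the two-sided case the factor cannot be obtained from a single noncentral t quantile and requires the approximations or numerical methods discussed below.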
Two-sided tolerance intervals take the form [\bar{y} - k s, \bar{y} + k s], where the tolerance factor k is determined to achieve the desired coverage p with confidence \gamma, but its computation is more involved than in the one-sided case because of the joint uncertainty in the mean and variance estimates.[1] The factor k incorporates the chi-squared distribution to account for the variability of s^2 / \sigma^2, which follows a \chi^2_{n-1} distribution scaled by 1/(n-1); specifically, approximate methods solve for k such that the expected coverage meets the criterion, often using k \approx z_{(1+p)/2} \sqrt{(n-1) / \chi^2_{\alpha, n-1}}, where \alpha = 1 - \gamma and \chi^2_{\alpha, n-1} is the lower \alpha quantile of the chi-squared distribution with n - 1 degrees of freedom (the value exceeded with probability \gamma). A more refined approximation multiplies this by \sqrt{1 + 1/n}, and exact solutions require numerical integration over the joint distribution of \bar{y} and s. This formulation balances the symmetric bounds while ensuring the interval's reliability under normality. The derivation of these intervals centers on pivotal quantities that transform the problem into a distribution-free coverage probability statement. For the normal distribution, the standardized sample mean (\bar{y} - \mu)/(s / \sqrt{n}) follows a t_{n-1} distribution, and the coverage of a future observation Y relative to the interval involves a noncentral t pivotal quantity for one-sided bounds, leading to the selection of k that satisfies \Pr(\Pr(Y > L \mid \bar{y}, s) \geq p) = \gamma.
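A Howe-type approximation of the two-sided factor, including the \sqrt{1 + 1/n} refinement, can be sketched as follows (assuming SciPy; the function name is illustrative):

```python
import math
from scipy.stats import norm, chi2

def two_sided_k_howe(n, p, gamma):
    """Approximate two-sided normal tolerance factor: the normal quantile
    z_{(1+p)/2}, inflated by sqrt(1 + 1/n) for uncertainty in the mean and
    by a chi-squared term for uncertainty in the variance."""
    z = norm.ppf((1 + p) / 2)
    chi2_lower = chi2.ppf(1 - gamma, n - 1)   # lower (1 - gamma) quantile
    return z * math.sqrt((n - 1) * (1 + 1 / n) / chi2_lower)

# Example: n = 20, p = 0.95, gamma = 0.95; the tabulated exact factor
# for this case is 2.752, and the approximation lands very close to it.
k = two_sided_k_howe(20, 0.95, 0.95)
print(round(k, 3))
```

Because the lower chi-squared quantile is below n - 1, the square-root term exceeds 1, correctly widening the interval to guard against an underestimated standard deviation.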
For two-sided intervals, order statistics of the sample play a role in approximating the minimum coverage, but the primary reliance is on the joint pivotal distribution of the mean and variance, often requiring inversion of the coverage probability function.[1] When \mu and \sigma^2 are unknown, as is standard, the tolerance factors k explicitly incorporate the degrees-of-freedom adjustment in the t and chi-squared distributions to reflect estimation uncertainty, widening the interval relative to the known-parameter case. For finite populations of size N, a correction factor \sqrt{(N - n)/(N - 1)} is applied to the standard deviation component of the interval formula, reducing the effective variability and narrowing the bounds to account for the sampling fraction n/N. This adjustment preserves the coverage guarantees when the population is not effectively infinite.
Non-Parametric Approaches
Non-parametric approaches to constructing tolerance intervals rely on distribution-free methods that do not assume any specific form for the underlying population distribution, making them robust alternatives when parametric assumptions, such as normality, cannot be justified. These methods primarily utilize order statistics from a random sample of size n, where the observations are ranked as X_{(1)} \leq X_{(2)} \leq \cdots \leq X_{(n)}. A two-sided tolerance interval is typically formed as [X_{(r+1)}, X_{(n-s)}], with r and s being non-negative integers giving the number of observations excluded from the lower and upper tails, respectively.[12] The values of r and s are determined from exact binomial probabilities so that the interval has content at least p with confidence \gamma. Let k = r + s denote the total number of observations trimmed from the tails. For any continuous distribution, the coverage of [X_{(r+1)}, X_{(n-s)}] (the proportion of the population it contains) follows a Beta(n - k - 1, k + 2) distribution, with mean (n - k - 1)/(n + 1), and the confidence that this coverage is at least p equals \Pr(\mathrm{Bin}(n, p) \leq n - k - 2) = \sum_{i=0}^{n-k-2} \binom{n}{i} p^i (1-p)^{n-i}; r and s are chosen as the largest integers for which this probability is still at least \gamma. For symmetric intervals, r = s, though asymmetric choices may be used for skewed data. This approach guarantees that the tolerance interval contains at least 100p\% of the population with confidence \gamma, independent of the distribution.[12] These methods offer key advantages, including applicability to skewed, multimodal, or otherwise unknown distributions without requiring goodness-of-fit tests, thereby enhancing robustness in real-world scenarios where data may deviate from parametric ideals.
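The binomial confidence calculation can be sketched in a few lines (assuming SciPy; the function name is illustrative). Setting r = s = 0 recovers the classic result that 93 observations are needed before the sample minimum and maximum form a 95%-content, 95%-confidence distribution-free interval:

```python
from scipy.stats import binom

def np_two_sided_confidence(n, p, r=0, s=0):
    """Confidence that [X_(r+1), X_(n-s)] covers at least proportion p of
    a continuous population: P(Bin(n, p) <= n - r - s - 2), which follows
    from the Beta distribution of the interval's coverage."""
    return binom.cdf(n - r - s - 2, n, p)

# Smallest n for which the sample min and max (r = s = 0) give a
# 95%-content / 95%-confidence distribution-free tolerance interval.
n = 2
while np_two_sided_confidence(n, 0.95) < 0.95:
    n += 1
print(n)  # 93
```

This illustrates why distribution-free intervals demand larger samples than parametric ones: the confidence grows only slowly with n when high content is required.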
However, they come with limitations: the resulting intervals are generally wider than those from parametric methods because of reduced statistical efficiency, often necessitating larger sample sizes (e.g., hundreds of observations for high p and \gamma); in addition, while the binomial computations were historically laborious, they are easily handled today, though very large n can still pose numerical challenges.[12] The historical development of non-parametric tolerance intervals traces back to foundational work by Wilks in the early 1940s, who introduced order-statistic-based limits for small samples, with significant expansions in the 1960s through contributions such as Walsh's tables and approximations that facilitated practical implementation for various sample sizes and coverage levels.[2]
Comparisons
With Confidence Intervals
A confidence interval (CI) is a statistical range that bounds an unknown population parameter, such as the mean or variance, with a specified confidence level 1 - \alpha, indicating the probability that the interval contains the true parameter value based on the sample.[1] In contrast, a tolerance interval (TI) aims to enclose a specified proportion p of the population distribution with confidence 1 - \alpha, focusing on the spread of individual values rather than a single parameter; this makes TIs generally wider than CIs, since they must incorporate both sampling variability and population dispersion.[6][1] A key mathematical relation between the two is that a TI can be interpreted as a confidence interval on a quantile of the underlying distribution, the TI bounds ensuring coverage of the proportion p with the desired confidence.[13] For instance, a 95% CI for the population mean of a product's strength might span 100 to 120 units, estimating the central tendency, whereas a 95% TI covering 95% of the population values at 95% confidence could extend more broadly, such as 80 to 140 units, to account for the full range of variability in individual measurements.[6] When the population parameters, such as the mean and standard deviation, are fully known, a TI reduces to the exact fixed percentiles of the distribution (e.g., the (1-p)/2 and (1+p)/2 quantiles for a two-sided interval covering the central proportion p), as there is no sampling uncertainty to incorporate.[1]
With Prediction Intervals
A prediction interval (PI) provides an interval that, with confidence level 1 - \alpha, is expected to contain the value of a single future observation drawn from the same population, incorporating both the uncertainty in estimating the population parameters from the sample and the random variability of the individual observation itself.[14] Unlike a tolerance interval (TI), which aims to encompass a specified proportion p of the entire population with confidence 1 - \alpha, a PI is tailored to bound just one additional data point; for the usual large coverage proportions it is therefore narrower than a TI, which must account for the spread across the population rather than a solitary instance.[6] The two intervals are related in limiting cases: when the population parameters are known with certainty, a TI designed to cover proportion p = 1 - \alpha with certainty coincides exactly with a PI at the same confidence level, as both reduce to the deterministic bounds \mu \pm z_{1 - \alpha/2} \sigma, where z is the standard normal quantile, \mu the mean, and \sigma the standard deviation.[1] In terms of construction for normally distributed data, the formulas highlight their distinct statistical foundations.
A PI is typically computed as \bar{y} \pm t_{\alpha/2, n-1} s \sqrt{1 + 1/n}, where \bar{y} is the sample mean, s the sample standard deviation, t_{\alpha/2, n-1} the critical value from the t-distribution with n-1 degrees of freedom, and the term under the square root captures both parameter estimation error and observational variance.[14] By contrast, a TI employs \bar{y} \pm k s, where the tolerance factor k is derived from the noncentral t-distribution or chi-squared distribution to ensure the required population coverage and confidence, resulting in a more complex adjustment for the joint uncertainty in the mean and variance estimates.[1] Prediction intervals find application in scenarios requiring forecasts for individual outcomes, such as estimating the performance of a single future unit in regression-based predictions.[6] Tolerance intervals, however, are better suited to batch quality assurance, where the goal is to verify that a large proportion of produced items, such as 99% of a manufacturing run, falls within acceptable limits with high confidence, ensuring overall process reliability.[15]
Applications
Example 1: Parametric Two-Sided Tolerance Interval for Machine Part Diameters
Consider a sample of 20 machine part diameters measured in millimeters, assumed to follow a normal distribution: 9.72, 9.88, 10.05, 9.95, 10.12, 9.81, 10.03, 9.94, 10.08, 9.89, 10.01, 9.96, 10.07, 9.92, 10.04, 9.98, 10.02, 9.93, 10.00, 9.97. The sample mean is \bar{y} = 9.97 and the sample standard deviation is s = 0.095.[1] For a two-sided tolerance interval covering at least 95% of the population (p = 0.95) with 95% confidence (\gamma = 0.95), the interval is given by \bar{y} \pm k s, where k is the tolerance factor obtained from standard tables for the normal distribution. For n = 20, k = 2.752.[16] Thus, the lower limit is L = 9.97 - 2.752 \times 0.095 \approx 9.71 and the upper limit is U = 9.97 + 2.752 \times 0.095 \approx 10.23, yielding the interval [9.71, 10.23]. This means there is 95% confidence that at least 95% of future diameters will fall within this range.[1]
Example 2: One-Sided Non-Parametric Upper Tolerance Interval for Environmental Contaminant Levels
Consider a sample of 30 contaminant levels measured in parts per million (ppm) from environmental samples, without assuming a specific distribution: 1.12, 1.25, 1.18, 1.34, 1.21, 1.45, 1.29, 1.52, 1.37, 1.61, 1.43, 1.68, 1.49, 1.73, 1.55, 1.79, 1.62, 1.84, 1.67, 1.91, 1.72, 1.96, 1.78, 2.02, 1.83, 2.08, 1.89, 2.14, 1.94, 2.20. For a one-sided upper tolerance interval covering at least 90% of the population (p = 0.90) with 95% confidence (\gamma = 0.95), the interval is (-\infty, X_{(r)}], where X_{(r)} is the r-th order statistic and r is the smallest integer such that P(\text{Bin}(n, p) \leq r - 1) \geq \gamma, the confidence that X_{(r)} exceeds the p-quantile of the population. For n = 30, p = 0.90, \gamma = 0.95, this requires r = 30, since P(\text{Bin}(30, 0.90) \leq 29) = 1 - 0.90^{30} \approx 0.958 \geq 0.95, while no smaller order statistic attains the required confidence (for r = 29, the confidence is only about 0.816). The largest observation is X_{(30)} = 2.20, so the interval is (-\infty, 2.20]. This guarantees with at least 95% confidence that at least 90% of future contaminant levels will be below 2.20 ppm. Note that a 99% confidence level is unattainable here: it requires 1 - 0.90^n \geq 0.99, i.e., n \geq 44, so no order statistic from a sample of 30 suffices.
Interpretation of Results
Post-calculation verification of the coverage and confidence properties can be performed by Monte Carlo simulation. For the parametric example, generate 10,000 samples of size 20 from a normal distribution with known mean and standard deviation, construct the tolerance interval from each sample, and confirm that the proportion of intervals containing at least the stated fraction p of the true distribution is at least the stated confidence level \gamma. For the non-parametric example, simulate samples of size 30 from any continuous distribution, take the prescribed order statistic as the upper bound each time, and verify that the bound equals or exceeds the true p-quantile in at least a proportion \gamma of the simulations. Such procedures empirically validate the interval's properties beyond the theoretical construction.[1]
Sensitivity Analysis: Effect of Sample Size on Interval Width
Increasing the sample size n reduces the tolerance factor k in parametric methods, narrowing the interval width 2 k s for fixed s. The following table illustrates this for two-sided normal tolerance intervals with p = 0.95 and \gamma = 0.95:
| Sample Size n | Tolerance Factor k | Approximate Width Factor (Relative to s) |
|---|---|---|
| 10 | 3.379 | 6.758 |
| 20 | 2.752 | 5.504 |
| 30 | 2.549 | 5.098 |
| 50 | 2.379 | 4.758 |
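The narrowing shown in the table can be reproduced, to within a few thousandths of the tabulated exact factors, with the chi-squared approximation described under Parametric Approaches; a sketch assuming SciPy (function name illustrative):

```python
import math
from scipy.stats import norm, chi2

def k_two_sided(n, p=0.95, gamma=0.95):
    """Howe-type approximate two-sided normal tolerance factor."""
    z = norm.ppf((1 + p) / 2)
    return z * math.sqrt((n - 1) * (1 + 1 / n) / chi2.ppf(1 - gamma, n - 1))

for n in (10, 20, 30, 50):
    k = k_two_sided(n)
    print(f"n={n:3d}  k={k:.3f}  width={2 * k:.3f} (in units of s)")
```

The monotone decrease of k with n reflects the shrinking estimation uncertainty in \bar{y} and s, though k never falls below the known-parameter limit z_{(1+p)/2} \approx 1.96.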