Decile
A decile is a quantile that divides a sorted dataset into ten equal parts, each containing 10% of the observations; the nine decile points correspond to the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th percentiles.[1] Deciles extend the idea behind quartiles (which divide data into four parts) and quintiles (five parts) by providing finer granularity, which is particularly useful in large datasets where the behavior of individual segments of the distribution matters.[2] They are calculated by first ordering the data from lowest to highest and then locating the position of the k-th decile with the formula L_{D_k} = \frac{k(n+1)}{10}, where n is the number of data points and k ranges from 1 to 9; if the position falls between two integers, interpolation is typically applied.[3] For grouped or continuous data, an adjusted formula uses cumulative frequencies and class intervals to estimate the decile value within the relevant interval.[4]
In practice, deciles are widely applied in economics, finance, and the social sciences to summarize income, wealth, and earnings distributions, revealing patterns of inequality and variability across population segments.[2] For instance, the U.S. Bureau of Labor Statistics routinely publishes deciles alongside quartiles to describe usual weekly earnings for full-time workers, aiding policymakers in assessing labor market trends.[5] Similarly, deciles are used in educational and health research to categorize outcomes by socioeconomic group, for example to identify mortality differentials across earnings deciles.[6]
Definition and Fundamentals
Definition of Decile
A decile is any of the nine values that divide a sorted dataset into ten equal-frequency subsets, with each subset containing 10% of the data points.[7] These values mark the boundaries where the cumulative frequency reaches 10%, 20%, and so on up to 90% of the total observations.[8] The term "decile" derives from the Latin word decem, meaning "ten," reflecting its role in partitioning data into tenths.[9] In statistical contexts, the concept was first introduced in 1882 by Francis Galton, who used it to describe divisions in anthropometric data distributions.[10] Deciles are typically denoted D_k for the k-th decile, where k = 1, 2, \dots, 9; each D_k is the boundary between the k-th and (k+1)-th tenth of the ordered data. Deciles are a special case of percentiles, which generalize such divisions to any percentage.[8]
Relation to Percentiles and Quartiles
Deciles are specific percentiles: they divide a dataset or probability distribution into ten equal parts, each comprising 10% of the data. The k-th decile corresponds exactly to the (10k)-th percentile, so the first decile (D1) is the 10th percentile, the second decile (D2) is the 20th percentile, and so on, up to the ninth decile (D9) at the 90th percentile.[11][12]
Quartiles, by comparison, partition data into four equal segments of 25% each, with the first quartile (Q1) at the 25th percentile, the second quartile (Q2) at the 50th percentile, and the third quartile (Q3) at the 75th percentile. Deciles offer a finer view by creating ten segments of 10% each. The median is both the second quartile (Q2) and the fifth decile (D5), providing a common reference point across these measures.[13][14]
Visually, deciles appear as points along the cumulative distribution function (CDF) of a random variable, marking the values where the CDF reaches 0.1, 0.2, ..., 0.9. Read off the CDF in this way, the nine deciles trace how probability mass accumulates across the distribution and give a compact summary of its shape.[15][16] Compared with quartiles, deciles provide finer granularity, which is particularly beneficial for skewed distributions, where the additional division points reveal asymmetries and tail behavior that coarser quartiles might obscure.[14][17]
Calculation and Computation
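The correspondence between deciles, percentiles, and quartiles described in the previous section can be checked numerically. The sketch below uses `statistics.quantiles` from Python's standard library. The sample values are purely illustrative, and the choice of the `exclusive` method (which uses (n + 1)-based positions) is an assumption made for this example; other quantile conventions return slightly different values but preserve the same correspondence.

```python
import statistics

# Illustrative sample (hypothetical values, not from any real dataset).
sample = [12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45]

# 9 decile cut points, 99 percentile cut points, 3 quartile cut points.
deciles = statistics.quantiles(sample, n=10, method="exclusive")
percentiles = statistics.quantiles(sample, n=100, method="exclusive")
quartiles = statistics.quantiles(sample, n=4, method="exclusive")

# The k-th decile should match the (10k)-th percentile.
for k, d_k in enumerate(deciles, start=1):
    p_10k = percentiles[10 * k - 1]  # percentiles[0] is the 1st percentile
    print(f"D{k} = {d_k:.2f}   P{10 * k} = {p_10k:.2f}")

# The fifth decile, the second quartile, and the median should coincide.
print(deciles[4], quartiles[1], statistics.median(sample))
```

Each printed row should show the same value in both columns, and the final line should show one value three times.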
Empirical Method for Sample Data
To compute deciles from a finite sample, first sort the data in ascending order to obtain the ordered sample x_1 \leq x_2 \leq \cdots \leq x_n, where n is the sample size.[18] The k-th decile D_k (for k = 1, 2, \dots, 9) divides the data such that approximately (10k)% of the observations lie at or below it.[18] The position of the k-th decile in the ordered sample is given by the formula
i_k = \frac{k}{10} (n + 1).
If i_k is an integer i, then D_k = x_i.[18] This formula applies regardless of whether n is even or odd, as the addition of 1 ensures consistent positioning across sample sizes.[18] If i_k is not an integer, express it as i_k = i + f, where i = \lfloor i_k \rfloor is the integer part and 0 < f < 1 is the fractional part. Linear interpolation yields
D_k = x_i + f (x_{i+1} - x_i).
This approach provides a smooth estimate between adjacent ordered values.[18] In the presence of ties (repeated values in the dataset), sort the data as usual so that tied observations occupy consecutive positions; the position formula and interpolation then apply unchanged.[18] If an integer position falls on a tied value, the decile takes that shared value; if interpolation falls between two identical values (x_i = x_{i+1}), the difference x_{i+1} - x_i is zero and the result is again the tied value, whatever the fraction f.[18]
Consider a small example dataset of 10 test scores: 55, 62, 67, 71, 74, 78, 82, 85, 89, 95 (n = 10). The data are already in ascending order, so x = [55, 62, 67, 71, 74, 78, 82, 85, 89, 95]. With n + 1 = 11, the positions are i_k = (k/10) \times 11 = 1.1k.
- For D_1: i_1 = 1.1, so D_1 = 55 + 0.1(62 - 55) = 55 + 0.7 = 55.7.
- For D_2: i_2 = 2.2, so D_2 = 62 + 0.2(67 - 62) = 62 + 1 = 63.
- For D_3: i_3 = 3.3, so D_3 = 67 + 0.3(71 - 67) = 67 + 1.2 = 68.2.
- For D_4: i_4 = 4.4, so D_4 = 71 + 0.4(74 - 71) = 71 + 1.2 = 72.2.
- For D_5: i_5 = 5.5, so D_5 = 74 + 0.5(78 - 74) = 74 + 2 = 76.
- For D_6: i_6 = 6.6, so D_6 = 78 + 0.6(82 - 78) = 78 + 2.4 = 80.4.
- For D_7: i_7 = 7.7, so D_7 = 82 + 0.7(85 - 82) = 82 + 2.1 = 84.1.
- For D_8: i_8 = 8.8, so D_8 = 85 + 0.8(89 - 85) = 85 + 3.2 = 88.2.
- For D_9: i_9 = 9.9, so D_9 = 89 + 0.9(95 - 89) = 89 + 5.4 = 94.4.
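The procedure above can be sketched in Python as follows. The function name `decile` is hypothetical, and the clamping of positions that fall outside 1..n (which happens when n < 9) is an assumption of this sketch, not part of the method described above. Integer arithmetic is used for the position so that the integer and fractional parts are exact.

```python
def decile(data, k):
    """Return the k-th decile (k = 1..9) using the (n + 1) position rule."""
    if not 1 <= k <= 9:
        raise ValueError("k must be an integer from 1 to 9")
    x = sorted(data)
    n = len(x)
    if n == 0:
        raise ValueError("data must not be empty")

    # Position i_k = k (n + 1) / 10, split exactly as i + r/10.
    i, r = divmod(k * (n + 1), 10)

    # Assumption of this sketch: positions outside 1..n are clamped
    # to the smallest or largest observation (only possible for n < 9).
    if i < 1:
        return x[0]
    if i >= n:
        return x[-1]

    if r == 0:
        return x[i - 1]          # integer position: D_k = x_i (1-based)
    f = r / 10                   # fractional part of the position
    return x[i - 1] + f * (x[i] - x[i - 1])  # linear interpolation


scores = [55, 62, 67, 71, 74, 78, 82, 85, 89, 95]
for k in range(1, 10):
    print(f"D_{k} = {decile(scores, k):.1f}")
```

Formatted to one decimal place, the printed values should agree with the nine deciles worked out by hand above. With tied data, interpolation between equal neighbors adds zero, so `decile([7] * 10, 4)` returns the tied value.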