
Subsampling

Subsampling is the process of selecting a subset of points or observations from a larger sample or population, often to reduce computational demands, approximate population characteristics, or enable efficient inference while maintaining representativeness. In statistics, a subsample is explicitly defined as a sample drawn from an existing sample: a portion of the original sample, which itself represents a part of the broader population. In statistical inference, subsampling denotes a specific resampling method developed by Dimitris N. Politis and Joseph P. Romano in 1994, which estimates the sampling distribution of a normalized statistic—known as a root statistic—by computing that statistic over multiple contiguous or non-overlapping subsamples of fixed size b drawn from the original dataset of size n, where b \to \infty and b/n \to 0. This approach constructs asymptotically valid confidence regions under minimal assumptions, requiring only that the root statistic converges to a limiting distribution, without needing uniformity of convergence or strong conditions on the underlying data-generating process. Unlike the bootstrap, which resamples with replacement from the empirical distribution and can fail for nonsmooth or discontinuous statistics (e.g., extreme order statistics), subsampling proves robust by directly leveraging the structure of the observed data. It has been extended to dependent data such as stationary time series and random fields. The foundational text on this method, Subsampling by Politis, Romano, and Michael Wolf (1999), provides a comprehensive framework for its implementation and theoretical underpinnings.

Beyond inferential statistics, subsampling plays a crucial role in machine learning, where it involves randomly selecting subsets of training data to accelerate model fitting, mitigate class imbalances through undersampling the majority class, or enhance privacy via mechanisms like privacy amplification by subsampling in differential privacy protocols. For instance, in boosting algorithms or stochastic gradient descent, subsampling reduces overfitting and computational costs by training iteratively on fractions of the dataset.

In signal processing and digital media, subsampling typically refers to downsampling, which decreases the sampling rate to lower temporal or spatial resolution, or to chroma subsampling, a compression technique that reduces the resolution of the color (chroma) components relative to the brightness (luma) information—expressed in ratios like 4:2:0—to minimize bandwidth and storage while exploiting the greater human visual sensitivity to luminance than to color detail. The latter is standard in image and video coding formats such as JPEG and MPEG, enabling efficient transmission without perceptible quality loss for most viewers.

General Principles

Definition and Terminology

Subsampling is a data reduction technique that involves selecting a subset of points from a larger dataset or signal, thereby decreasing its overall size while striving to retain key characteristics such as statistical properties or frequency content. This process is commonly employed across disciplines such as signal processing and statistics to manage computational demands, lower storage needs, or simplify analysis without substantial loss of information. By reducing the number of samples, the sampling rate, or the density of observations, subsampling facilitates efficient handling of high-dimensional or voluminous data.

In signal processing, subsampling is often synonymous with downsampling, which specifically denotes the reduction of a signal's sampling rate by an integer factor M, typically involving low-pass filtering to prevent aliasing followed by retention of every M-th sample. Related terminology includes "decimation," which refers to the same integer-factor rate reduction process, emphasizing the combined filtering and subsampling steps that convert a signal from a higher sampling rate f_s to a lower one f_s / M. "Undersampling," by contrast, describes intentional sampling at a rate below the Nyquist rate for bandlimited signals, which can exploit aliasing for applications like bandpass signal acquisition but risks distortion if not carefully managed. These terms trace their origins to foundational sampling theory developed in the 1940s, building on Harry Nyquist's 1928 work on telegraph transmission limits and Claude Shannon's 1949 formalization of reconstruction conditions for bandlimited signals. A key distinction exists between subsampling and supersampling: while subsampling decreases the number of data points to achieve reduction, supersampling increases sampling density—often by rendering or acquiring at higher resolution—before averaging to combat aliasing artifacts in the final output. This contrast highlights subsampling's focus on efficiency through sparsity rather than enhancement through redundancy. Understanding these concepts presupposes familiarity with the Nyquist-Shannon sampling theorem, which asserts that a continuous bandlimited signal with maximum frequency f_{\max} can be perfectly reconstructed from discrete samples taken at a rate greater than 2 f_{\max}, providing the theoretical basis for safe rate reductions in subsampling.

In statistics, subsampling involves selecting subsets from an existing sample to approximate population parameters or construct confidence intervals; this can use uniform random selection without replacement or non-random schemes such as contiguous blocks that preserve temporal or spatial correlations in dependent data, particularly when full resampling such as the bootstrap is infeasible. In engineering applications, such as audio or image bandwidth reduction, subsampling reduces resource requirements while maintaining perceptual quality.
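
The following minimal sketch (the sampling rate, tone frequency, and factor M are illustrative assumptions, not from the source) shows the most basic form of subsampling in the signal sense: retaining every M-th sample of a discrete-time sequence. Without prior low-pass filtering this is safe only when the signal is already bandlimited below the new Nyquist frequency.

import numpy as np

fs = 1000                       # assumed original sampling rate, Hz
t = np.arange(0, 1, 1 / fs)     # one second of samples
x = np.sin(2 * np.pi * 40 * t)  # 40 Hz tone, well below fs / (2 * M) for M = 4

M = 4
y = x[::M]                      # subsampled sequence y[m] = x[M * m]
fs_new = fs / M                 # effective sampling rate after subsampling

print(len(x), len(y), fs_new)   # 1000 250 250.0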

Mathematical Foundations

Subsampling of a discrete-time signal x[n] by an integer factor M produces a new sequence y[m] = x[Mm], where m \in \mathbb{Z}. This operation retains every M-th sample of the original signal, effectively reducing the sampling rate by a factor of M. The process can be viewed as compressing the time axis, yielding a more compact representation in the time domain but potentially introducing distortions in the frequency domain if not properly managed. To derive the effects of subsampling, the subsampled sequence is analyzed in the frequency domain. The derivation involves spectral replication due to the periodic selection of samples, leading to aliased components in which higher frequencies fold into the baseband. In the Fourier domain, the discrete-time Fourier transform (DTFT) of the subsampled signal is given by

Y(e^{j\omega}) = \frac{1}{M} \sum_{k=0}^{M-1} X\left(e^{j (\omega - 2\pi k)/M}\right),

where X(e^{j\omega}) is the DTFT of the original signal. This shows stretching of the frequency axis by a factor of M together with amplitude scaling by 1/M and aliasing: the M shifted and scaled copies of the original spectrum are summed, causing higher-frequency components to fold into the lower-frequency range. Equivalently, in terms of normalized frequency (cycles per sample), the input components at frequencies (f + k)/M for k = 0, 1, \dots, M-1 all map to the same output frequency f at the reduced rate, illustrating how distinct spectral components become indistinguishable after subsampling.

In statistical and data contexts, subsampling refers to selecting a subset of size b from an original sample of size N via random selection without replacement, where each possible subset is equally likely. The probability that any specific data point is included in the subsample is p = b / N. This approach approximates the original sample and is used for inference, such as estimating variability without strong parametric assumptions. The error in subsampled representations can be quantified using the mean squared error (MSE), particularly for estimators like the sample mean. For subsampling without replacement, the MSE of the subsample mean \bar{y} (which is unbiased for the full-sample mean) equals its variance, given by

\text{MSE}(\bar{y}) = \frac{N - b}{N} \cdot \frac{\sigma^2}{b},

where \sigma^2 is the variance of the original sample computed with divisor N - 1. This formula shows that the error decreases with larger subsample sizes b and is further reduced by the finite population correction factor (N - b)/N, bounding the accuracy of the subsampled estimate relative to the full sample.
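
A short Monte Carlo check of the variance formula above can make it concrete; this is an illustrative sketch (sample size, subsample size, and number of replications are assumptions), not code from the source.

import numpy as np

# Verify MSE(ybar) = ((N - b) / N) * sigma^2 / b, with sigma^2 the variance of
# the original sample computed with divisor N - 1.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)           # fixed "original sample" of size N
N, b = len(x), 50
sigma2 = x.var(ddof=1)

reps = 20000
means = np.array([x[rng.choice(N, size=b, replace=False)].mean()
                  for _ in range(reps)])
empirical_mse = np.mean((means - x.mean()) ** 2)

theoretical_mse = (N - b) / N * sigma2 / b
print(empirical_mse, theoretical_mse)   # the two values should agree closely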

Subsampling in Signal Processing

Downsampling Process

The downsampling process in digital signal processing, also known as decimation, reduces the sampling rate of a discrete-time signal by an integer factor M while preserving the signal's relevant information content. This is achieved through a two-step procedure designed to prevent aliasing and maintain signal integrity. The first step applies a low-pass filter to the input signal x[n] to bandlimit it, ensuring that frequency components above the new Nyquist frequency f_s / (2M) (where f_s is the original sampling frequency) are attenuated. The second step is decimation proper, in which every M-th sample of the filtered signal is retained and the intermediate samples are discarded. For example, with M = 2, the downsampled signal y is formed as y[0] = x[0], y[1] = x[2], y[2] = x[4], and so on, halving the sampling rate.

For efficient implementation, especially in hardware or real-time systems, polyphase decomposition reconfigures the filtering and downsampling operations to minimize computations. The filter's impulse response h[n] is partitioned into M polyphase components E_k(z) = \sum_{m} h(Mm + k) z^{-m} for k = 0, 1, \dots, M-1, allowing the downsampler to be moved before the filters using the noble identity. The result is that the filtering runs at the lower output rate rather than the full input rate, reducing multiplications by approximately a factor of M. For M = 4, the structure consists of four parallel branches: the input signal is delayed successively by 0, 1, 2, and 3 samples, each branch is downsampled by 4 (selecting every fourth sample) and filtered by the corresponding polyphase filter E_k(z), and the branch outputs are summed to produce the decimated signal.

When the desired rate change is a rational factor L/M (with L and M both greater than 1), the process combines upsampling by L with low-pass interpolation filtering, followed by downsampling by M with anti-aliasing filtering. The overall low-pass filter is designed to meet the stricter of the two requirements, with a normalized cutoff of \min(\pi/L, \pi/M) radians per sample at the intermediate rate. The foundational concepts of multirate digital signal processing, including these downsampling techniques, were developed in the 1970s by Ronald E. Crochiere and Lawrence R. Rabiner at Bell Laboratories.
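
A minimal sketch of the two-step procedure is shown below, assuming SciPy's firwin and lfilter are available; the function name decimate_by_M, the tap count, and the test signal are illustrative, not a specific library API.

import numpy as np
from scipy.signal import firwin, lfilter

def decimate_by_M(x, M, numtaps=63):
    """Two-step decimation sketch: low-pass filter, then keep every M-th sample.

    The linear-phase FIR cutoff sits at the new Nyquist frequency (1/M in units
    of the original Nyquist), attenuating components that would otherwise alias
    when the intermediate samples are discarded.
    """
    h = firwin(numtaps, 1.0 / M)   # anti-aliasing FIR, cutoff pi/M rad/sample
    xf = lfilter(h, 1.0, x)        # bandlimit the input at the full rate
    return xf[::M]                 # decimation: retain every M-th sample

# Example: decimate a two-tone signal by M = 2; the 3500 Hz tone, which would
# alias at the new 4000 Hz sampling rate, is suppressed by the filter instead.
fs = 8000
t = np.arange(0, 0.1, 1 / fs)
x = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 3500 * t)
y = decimate_by_M(x, M=2)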

Anti-Aliasing and Filtering

In subsampling by an integer factor M, aliasing occurs because the signal's spectrum is replicated at intervals of 2\pi/M radians per sample in the frequency domain, causing folding in which components above the new Nyquist frequency \pi/M overlap with the baseband spectrum. This mechanism distorts the subsampled signal by mapping high-frequency content into the lower-frequency range, potentially corrupting the desired information. For instance, with M = 2, a sinusoid at \omega = \pi/2 + \delta (just above the new Nyquist limit \pi/2) folds back after subsampling so that it becomes indistinguishable from a component originally at \pi/2 - \delta, masquerading as part of the baseband and introducing artifacts that cannot be removed after subsampling.

To prevent such aliasing, an anti-aliasing low-pass filter is essential before the subsampling operation, ideally with a cutoff frequency of \pi/M radians per sample to preserve the baseband while attenuating higher frequencies that would otherwise fold in. The frequency response of this ideal filter is rectangular, passing signals unchanged up to \pi/M and rejecting everything above, with its impulse response given by a sinc function:

h[n] = \frac{1}{M} \cdot \frac{\sin(\pi n / M)}{\pi n / M}, \quad n \neq 0,

with h[0] = 1/M by continuity, giving unity passband gain so that signal levels are preserved through decimation. This theoretical filter provides perfect preservation of the baseband but is non-causal and infinite in duration, making it unrealizable in practice without approximation.

In practical implementations, finite impulse response (FIR) filters are commonly employed for anti-aliasing because they can achieve exactly linear phase, which avoids phase distortion and preserves the temporal alignment of signal features, unlike infinite impulse response (IIR) filters, which offer computational efficiency through recursion but introduce nonlinear phase shifts. FIR designs often use windowing methods, such as the Kaiser window, where the shape parameter \beta (typically ranging from 4 to 10) trades off passband ripple against stopband attenuation to meet aliasing-suppression requirements; for example, \beta = 5.5 yields roughly 60 dB of stopband rejection, suitable for many audio tasks. IIR filters, while requiring fewer coefficients (e.g., second-order sections for elliptic designs achieving sharp roll-off), demand careful stability analysis and are less favored when phase linearity is critical, though they can reduce multiplications per output sample by up to an order of magnitude in resource-constrained systems. The choice balances computational cost—FIR filters may need hundreds of taps for steep transitions, increasing latency and power use—against performance, with polyphase structures further optimizing FIR decimators by partitioning the filter into M subfilters operating at the reduced rate. Early digital signal processing literature focused on basic sampling theory often underemphasized efficient polyphase techniques for subsampling, treating anti-aliasing filters as straightforward convolutions without rate-specific optimizations; these methods, popularized through the seminal multirate analyses of the 1970s and 1980s, are now standard for minimizing redundant computations in decimators by up to a factor of M.
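
As a hedged illustration of the Kaiser-window FIR design mentioned above, the sketch below uses SciPy's kaiserord and firwin; the 60 dB attenuation target, transition width, and cutoff placement are assumptions chosen for the example rather than values from the source.

from scipy.signal import kaiserord, firwin

M = 4
atten_db = 60.0          # desired stopband rejection, dB (illustrative)
trans_width = 0.05       # transition width, in units of the Nyquist frequency

numtaps, beta = kaiserord(atten_db, trans_width)   # Kaiser design rule
h = firwin(numtaps, 1.0 / M - trans_width / 2,     # cutoff just below pi/M
           window=("kaiser", beta))

print(numtaps, round(beta, 2))   # tap count grows as the transition band narrows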

Subsampling in Statistics

Methods and Techniques

In statistics, subsampling refers to a resampling method introduced by Dimitris N. Politis and Joseph P. Romano in 1994 for estimating the sampling distribution of a normalized statistic, known as a root statistic, under minimal assumptions. The core procedure computes the root statistic over multiple subsamples of fixed size b drawn from the original sample of size n, where b \to \infty and b/n \to 0 as n \to \infty. For independent and identically distributed (i.i.d.) data, subsamples can be non-overlapping blocks or random draws without replacement. For dependent data, such as time series, contiguous blocks of size b are typically used to preserve temporal structure. The empirical distribution of the root statistics from these subsamples approximates the limiting distribution of the full-sample root statistic, enabling the construction of asymptotically valid confidence regions without strong parametric assumptions or uniformity conditions. Unlike the bootstrap, which resamples with replacement from the empirical distribution and may fail for nonstandard or discontinuous statistics (e.g., extreme order statistics), subsampling is robust because it directly uses the observed data structure without artificial resampling. The choice of b is critical; theoretical guidelines suggest b on the order of n^{1/3} in many cases to balance bias and variance in the approximation.

Extensions include the jackknife, a special case of subsampling in which b = n - 1 and each subsample omits one observation, used primarily for variance estimation. The jackknife variance estimator for an estimator \hat{\theta} is

\hat{\text{Var}}(\hat{\theta}) = \frac{n-1}{n} \sum_{i=1}^{n} (\hat{\theta}_{(i)} - \bar{\hat{\theta}})^2,

where \hat{\theta}_{(i)} is the estimator computed from the subsample excluding the i-th observation and \bar{\hat{\theta}} is the average of the \hat{\theta}_{(i)}. Introduced by Maurice Quenouille for bias reduction and extended by John Tukey in 1958 to variance estimation, the jackknife provides a nonparametric uncertainty measure but can overestimate variance for non-smooth estimators.
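
The sketch below illustrates the subsampling idea for i.i.d. data under the assumption of a \sqrt{n} convergence rate for the root statistic; the function name, the number of random subsamples, and the use of random draws without replacement are illustrative choices, not a prescribed implementation.

import numpy as np

def subsampling_ci(x, b, stat=np.mean, alpha=0.05, n_sub=1000, rng=None):
    """Approximate (1 - alpha) confidence interval via subsampling.

    Roots sqrt(b) * (stat(subsample) - stat(full sample)) are computed over
    random size-b subsamples drawn without replacement; their empirical
    quantiles approximate those of sqrt(n) * (stat(full) - true value),
    assuming a sqrt(n)-consistent estimator.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x)
    n = len(x)
    theta_n = stat(x)
    roots = np.empty(n_sub)
    for i in range(n_sub):
        idx = rng.choice(n, size=b, replace=False)   # subsample w/o replacement
        roots[i] = np.sqrt(b) * (stat(x[idx]) - theta_n)
    lo, hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    # Invert the root to obtain the interval for the true parameter.
    return theta_n - hi / np.sqrt(n), theta_n - lo / np.sqrt(n)

x = np.random.default_rng(0).exponential(size=5000)
print(subsampling_ci(x, b=int(len(x) ** (1 / 3)) * 10))   # interval around the mean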

Applications in Estimation

Subsampling is widely applied in constructing confidence intervals and testing hypotheses for complex estimators, particularly in nonparametric and semiparametric settings. In survey sampling, it facilitates efficient estimation in multi-stage designs; for example, the U.S. Census Bureau used subsampling in its 2000 Accuracy and Coverage Evaluation (A.C.E.) post-enumeration survey, which sampled approximately 300,000 housing units to measure census undercoverage and support dual-system estimation for demographic adjustments. For high-dimensional data where the number of variables p grows with the sample size n, subsampling integrates with methods like penalized empirical likelihood (PEL) to form valid confidence regions. In PEL, subsamples of size m (with m/n \to 0 and p/m \to 0) calibrate the empirical distribution of the adjusted log-likelihood ratio statistic,

\hat{G}_n(x) = |I_n|^{-1} \sum_{I \in I_n} \mathbf{1}(-\log R^*_m(\mu_0; I) \leq x),

enabling confidence intervals for means in sparse regimes. Subsampling has been extended to dependent data, including stationary time series and random fields, for applications such as volatility forecasting. The foundational text, Subsampling by Politis, Romano, and Michael Wolf (1999), details these extensions and their implementation. Early implementations faced computational challenges for massive datasets, but post-2010 advancements, including distributed and GPU-accelerated computation and techniques like subagging (subsample aggregation), have enabled scalable inference with convergence rates matching full-data methods. As of 2025, subsampling remains integral to big data analytics and machine learning-adjacent statistical procedures.
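
A minimal sketch of subagging (subsample aggregation) is given below; the estimator, subsample size, and number of subsamples are illustrative assumptions, and the function is not a specific library API.

import numpy as np

def subagging_estimate(x, stat, b, n_sub=200, rng=None):
    """Fit a statistic on many random size-b subsamples (without replacement)
    and average the results to stabilize the final estimate."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x)
    n = len(x)
    estimates = [stat(x[rng.choice(n, size=b, replace=False)])
                 for _ in range(n_sub)]
    return np.mean(estimates)

x = np.random.default_rng(1).exponential(size=10_000)
print(subagging_estimate(x, np.median, b=500))   # aggregated median over subsamples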

Subsampling in Image Processing

Chroma Subsampling

Chroma subsampling is a technique in image and video processing that reduces the resolution of color (chroma) information relative to brightness (luma) to achieve compression efficiency, exploiting the human visual system's lower spatial sensitivity to chromatic detail compared with luminance variations. This principle allows the chroma channels, typically represented as Cb and Cr in the YCbCr color space, to be downsampled by a factor of 2 horizontally and/or vertically without significantly impairing perceived quality. The process begins by converting the input RGB image to YCbCr, where Y carries the luma and Cb and Cr encode the blue and red color differences, respectively; subsampling then averages or filters chroma values over blocks of luma samples.

Key standards define specific subsampling ratios relative to full 4:4:4 sampling. In 4:2:2, chroma is subsampled by 2 horizontally but not vertically, halving the color data per line while preserving vertical color resolution, as specified in ITU-R BT.601 for studio digital video interfaces. The 4:2:0 format, common in consumer applications, further subsamples chroma by 2 vertically as well, reducing the color information to one-quarter of the luma resolution by averaging over 2x2 blocks; this is the default in JPEG (ISO/IEC 10918-1) for still images and MPEG-1/2 (ISO/IEC 11172-2 and 13818-2) for video compression. A typical pipeline involves color-space conversion, low-pass filtering of the chroma channels to prevent aliasing, subsampling, and subsequent quantization and encoding, with the luma channel kept on the full sampling grid while the chroma grids are coarsened.

Despite its efficiency, chroma subsampling introduces artifacts, primarily color bleeding, in which sharp color edges appear smeared into adjacent areas due to the reduced chroma resolution and quantization effects. This is most noticeable in high-contrast color transitions, such as text on colored backgrounds, though the impact on overall quality is minimal for most content, with objective metrics showing only modest degradation compared to full 4:4:4 sampling. The technique originated with the development of analog color television in the late 1940s and 1950s, building on Alda V. Bedford's work at RCA and a 1949 patent on bandwidth-efficient color encoding, which separated luma and chroma to fit color signals within monochrome broadcast limits. Early implementations overlooked some perceptual nuances, such as chroma-luma interactions, leading to visible color artifacts. Refinements occurred in the 1990s with codecs like MPEG-1 and MPEG-2, which standardized 4:2:0 for DVD and broadcast, incorporating better filtering to align with human vision models while maintaining compression efficiency. As of 2025, 4:2:0 remains the standard format in UHD Blu-ray and many streaming services using HEVC compression, balancing quality and bandwidth efficiency.
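
The sketch below shows 4:2:0 chroma subsampling as a simple 2x2 block average on already-separated YCbCr planes; input shapes and the averaging filter are assumptions for illustration, since real codecs use specified low-pass filters and chroma sample-siting conventions.

import numpy as np

def subsample_420(y, cb, cr):
    """Keep luma at full resolution; average each chroma plane over 2x2 blocks."""
    def avg_2x2(c):
        h, w = c.shape
        c = c[: h - h % 2, : w - w % 2]                 # crop to even dimensions
        return c.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return y, avg_2x2(cb), avg_2x2(cr)

y = np.random.rand(480, 640)
cb = np.random.rand(480, 640)
cr = np.random.rand(480, 640)
y2, cb2, cr2 = subsample_420(y, cb, cr)
print(y2.shape, cb2.shape, cr2.shape)   # (480, 640) (240, 320) (240, 320)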

Spatial and Temporal Subsampling

Spatial subsampling reduces the resolution of an image by mapping multiple original pixels to each new pixel in a lower-resolution grid, decreasing spatial detail while preserving overall structure. Two primary methods are nearest-neighbor selection, which assigns to each new pixel the value of the nearest original pixel for simplicity and speed, and pixel averaging, which computes the mean of neighboring pixels to smooth the result. For a 2× reduction in both dimensions, pixel averaging typically takes the mean of each 2×2 block:

x'_{i,j} = \frac{x_{2i,2j} + x_{2i+1,2j} + x_{2i,2j+1} + x_{2i+1,2j+1}}{4}.

This approach, equivalent to box filtering (bilinear interpolation with uniform weights), retains intensity information more evenly than nearest-neighbor selection, though it may introduce slight smoothing.

Temporal subsampling lowers the frame rate in video sequences to cut data volume, often by frame dropping, such as retaining every second frame to halve the rate from 60 fps to 30 fps. This simple technique achieves compression but can produce jerkiness or stuttering in fast-motion scenes due to abrupt temporal changes. To mitigate this, motion compensation interpolates dropped frames using estimated motion vectors between retained frames, synthesizing intermediate content for smoother playback.

These subsampling strategies find widespread use in generating thumbnails for quick previews and enhancing video streaming efficiency, especially under the bandwidth limitations prevalent in web applications during the early 2000s, when average connection speeds were below 1 Mbps. By reducing file sizes, spatial subsampling enables faster loading of images, while temporal subsampling optimizes real-time video delivery without excessive buffering. A key challenge in subsampling is the introduction of blurring artifacts from excessive pre-filtering, which over-smooths edges and details, or aliasing from insufficient low-pass filtering that fails to suppress high frequencies adequately. These issues, related to anti-aliasing principles, are effectively addressed by the Lanczos resampling kernel, a windowed-sinc filter that balances sharpness and artifact reduction through its controlled frequency response, minimizing both blurring and aliasing in downscaled outputs.
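
Minimal sketches of the two operations described above are shown below, assuming images and videos are represented as NumPy arrays; the frame dimensions and rates are illustrative.

import numpy as np

def downscale_2x2(img):
    """Spatial subsampling by 2x2 pixel averaging (grayscale or multi-channel)."""
    h, w = img.shape[:2]
    img = img[: h - h % 2, : w - w % 2]
    return img.reshape(h // 2, 2, w // 2, 2, *img.shape[2:]).mean(axis=(1, 3))

def drop_frames(video, keep_every=2):
    """Temporal subsampling by frame dropping, e.g. 60 fps -> 30 fps."""
    return video[::keep_every]

frames = np.random.rand(60, 720, 1280)   # one second of 720p grayscale video
small = downscale_2x2(frames[0])         # shape (360, 640)
half_rate = drop_frames(frames)          # shape (30, 720, 1280)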

Subsampling in Machine Learning

Data Reduction Techniques

Subsampling serves as a key data reduction technique in machine learning for managing large datasets by selecting representative subsets for training, thereby addressing memory constraints and computational demands while preserving model performance. This approach draws on statistical sampling principles but adapts them to the iterative, optimization-driven nature of machine learning workflows, enabling efficient processing of datasets that exceed single-machine capabilities.

Random subsampling without replacement is a foundational method in which a fixed number k of samples are uniformly selected from the full dataset to form a training set, ensuring no duplicates and fitting within available memory limits. For instance, when training deep neural networks on massive image corpora, this technique reduces the effective set from millions to thousands of examples per iteration, accelerating training without introducing systematic bias into the resulting estimates. Such uniform selection is particularly effective for initial model fitting, as it preserves the original data distribution proportionally.

Importance sampling extends random methods by assigning selection probabilities based on the magnitude of loss gradients, with each sample x_i chosen with probability proportional to \|\nabla L(x_i)\|, prioritizing examples that contribute most to error reduction. This weighted approach reduces the variance of gradient estimates, leading to faster training on imbalanced or high-variance datasets, such as those in which rare events drive performance gains. By focusing computational effort on informative samples, importance sampling can reduce training time by up to an order of magnitude (e.g., a 10x speedup) in some tasks compared with uniform sampling, while maintaining or improving accuracy (see the sketch after this section's paragraphs).

Curriculum subsampling introduces an ordering over the training data, beginning with easier examples—often defined by lower initial loss values—and gradually incorporating harder ones to mimic human learning trajectories and stabilize early optimization. In practice, this involves sequencing subsamples by increasing difficulty metrics, such as model confidence scores, which has been shown to improve convergence in sequence-to-sequence models relative to random ordering. The method's efficacy stems from smoothing the optimization landscape, making it suitable for complex tasks where early exposure to simple patterns builds robust representations.

Subsampling also plays a role in privacy-preserving machine learning, such as privacy amplification by subsampling in differential privacy protocols. This technique randomly selects a subset of records for processing, amplifying privacy guarantees (e.g., reducing the effective privacy cost of each step in DP-SGD) while enabling training on sensitive datasets, as standardized in differential privacy frameworks through 2025.

Prior to the widespread adoption of distributed computing frameworks, subsampling techniques in machine learning were often inefficient for very large datasets because of their reliance on single-node processing, limiting practical use to datasets under a few gigabytes and resulting in prohibitive training times for emerging large-scale applications. Since the release of Apache Spark and its machine learning libraries, these methods have been integrated into distributed systems, enabling parallel subsampling across clusters for terabyte-scale data, with built-in functions for uniform and stratified sampling that reduce overhead by orders of magnitude. This shift has facilitated widespread adoption in production environments, such as recommendation systems handling billions of interactions.
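
The sketch below illustrates importance-based subsampling with inverse-probability weighting; the per-example score array is a stand-in for gradient norms, and the dataset size and batch size are assumptions chosen for the example.

import numpy as np

rng = np.random.default_rng(0)
n, k = 100_000, 1_024
grad_norm = rng.gamma(shape=2.0, scale=1.0, size=n)   # stand-in for ||grad L(x_i)||

p = grad_norm / grad_norm.sum()                        # selection probabilities
idx = rng.choice(n, size=k, replace=True, p=p)         # draw proportional to score
weights = 1.0 / (n * p[idx])                           # inverse-probability weights

# A weighted average of per-example losses or gradients over `idx`, using
# `weights`, is an unbiased estimate of the corresponding full-data average,
# while concentrating computation on the most informative examples.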

Integration with Algorithms

Subsampling plays a crucial role in ensemble methods, particularly through bagging, where bootstrap sampling with replacement typically draws approximately 63% unique instances from the training set to create diverse base learners, thereby reducing variance in predictions. Variants like subagging employ subsampling without replacement at rates of 10-50% of the dataset size, which improves computational efficiency while maintaining or improving predictive stability compared with traditional bagging. These approaches aggregate predictions from multiple subsampled models to yield robust ensemble outputs, as demonstrated in tree-based ensembles, where subagging achieves accuracy comparable to bagging with lower overhead.

In stochastic gradient descent (SGD), subsampling forms the basis of mini-batch training, where gradients are estimated from subsets of size 32-256 to balance computational speed and optimization stability. This mini-batch approach reduces the variance of the stochastic gradient estimate from \sigma^2 (the single-sample case) to approximately \sigma^2 / b, where b is the batch size, enabling faster convergence toward minima in large-scale problems such as deep network training. Larger batches further mitigate gradient noise but increase per-iteration costs, making the batch size a key tunable parameter in practical workflows.

Active learning integrates subsampling by selectively querying subsets of unlabeled data for oracle labeling, focusing on high-uncertainty or informative instances to minimize annotation effort. This pool-based strategy reduces labeling costs relative to random sampling, as shown in text classification tasks where uncertainty sampling achieves target accuracy with far fewer queries. For instance, in radiology image labeling, active methods have cut human effort by up to 90% while preserving model performance.

As of 2023, advances have embedded subsampling within transformer architectures to handle long sequences, addressing the quadratic cost of self-attention through techniques like random and layerwise token dropping during pretraining. These methods drop a fraction of input tokens (e.g., 10-20%) to sparsify computations without substantial accuracy loss, challenging early assumptions of uniform token importance and enabling efficient processing of sequences exceeding 512 tokens. Such methods have been pivotal in scaling models like BERT variants for tasks involving extended contexts, yielding up to 2x speedups in training time.
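
A minimal mini-batch SGD sketch for least squares is given below; the synthetic data, batch size, learning rate, and step count are illustrative assumptions, and the point is simply that each step estimates the gradient from a random subsample of size b.

import numpy as np

rng = np.random.default_rng(0)
n, d, b, lr = 10_000, 20, 128, 0.05
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)
for step in range(2_000):
    idx = rng.choice(n, size=b, replace=False)      # subsample a mini-batch
    grad = X[idx].T @ (X[idx] @ w - y[idx]) / b     # mini-batch gradient estimate
    w -= lr * grad                                  # gradient noise ~ sigma^2 / b

print(np.linalg.norm(w - w_true))   # should be small after training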

    Nov 30, 2021 · We conclude that the human effort required to label an image dataset can be reduced by approximately 90% in most cases by using the described ...