
Lossy compression

Lossy compression is a data compression method that reduces the size of data by irreversibly discarding portions of the original information deemed less essential, allowing for a reconstructed version that approximates but does not exactly match the source. This approach achieves significantly higher compression ratios than lossless methods, often 6:1 to 100:1 depending on the data type and quality requirements, by exploiting human perceptual limitations or application-specific tolerances for error. Unlike lossless compression, which preserves all data exactly, lossy techniques prioritize storage and transmission efficiency for bandwidth-constrained scenarios.

The core principle of lossy compression revolves around the rate-distortion tradeoff, where the goal is to minimize distortion (measured by metrics such as mean squared error or perceptual difference scores) for a given bit rate, or equivalently, maximize compression while keeping perceptible quality high. It relies on models of statistical redundancy and perceptual irrelevance; for instance, in visual or auditory signals, subtle details below human sensory thresholds can be removed without noticeable degradation. Quantization plays a central role, mapping continuous or high-precision values to a finite set of discrete levels, often combined with entropy coding for further efficiency. Common techniques include transform coding, such as the discrete cosine transform (DCT) used to concentrate energy in low-frequency components that are prioritized during quantization, and predictive coding like differential pulse code modulation (DPCM) that exploits temporal or spatial correlations. Vector quantization groups data into clusters represented by codebook entries, while subband or wavelet coding decomposes signals into frequency bands for selective compression.

Notable examples are the JPEG standard for images, which applies DCT and quantization to achieve ratios up to 20:1 with acceptable quality, and MP3 for audio, employing perceptual coding to mask inaudible frequencies. Video formats like MPEG extend these by adding motion compensation across frames.

Lossy compression finds widespread applications in digital media, including streaming services, mobile devices, and storage systems, where it enables efficient handling of large volumes of images, audio, and video without prohibitive bandwidth or space demands. Standards such as the ITU-T H.26x family for video conferencing and ISO MPEG-4 for versatile delivery incorporate lossy methods to support scalable coding and progressive refinement. While it introduces irreversible artifacts at high compression levels, its benefits in enabling efficient transmission and broad accessibility outweigh drawbacks for most consumer and professional uses.

Fundamentals

Definition and Principles

Lossy compression is a data compression technique that achieves reduced file sizes by permanently discarding redundant or perceptually irrelevant information from the original data, making exact reconstruction impossible upon decompression. This approach contrasts with lossless methods by prioritizing significant size reduction over perfect fidelity, often achieving compression ratios several times higher while maintaining acceptable perceptual quality for human observers. The core principle underlying lossy compression involves exploiting models of human perception to identify and eliminate data that contributes minimally to the subjective experience, such as subtle details below sensory thresholds.

Key principles of lossy compression include the use of psychoacoustic models for audio data, which account for auditory masking and frequency sensitivity to discard inaudible components, and psychovisual models for visual data, which leverage characteristics like reduced sensitivity to high spatial frequencies or color differences. These principles are grounded in rate-distortion theory, developed by Claude Shannon in 1948, which formalizes the tradeoff between data rate and allowable distortion. The compression process typically unfolds in stages: initial analysis to model perceptual irrelevance, quantization to approximate values with fewer bits, and entropy encoding to further compact the representation. These stages ensure that the discarded information does not substantially impair the perceived quality, guided by empirical studies of human sensory limits.

Foundational concepts and early practical applications of lossy compression emerged in the 1970s, coinciding with the development of early standards, notably adaptive differential pulse-code modulation (ADPCM) for speech audio introduced by researchers at Bell Laboratories. ADPCM exemplified lossy techniques by adaptively quantizing prediction errors in audio signals, achieving efficient bit-rate reduction for telephony applications while introducing controlled distortion.

A basic pipeline for lossy compression can be described as follows: raw input data is transformed into a representation that concentrates energy (e.g., frequency coefficients), quantized to lower precision levels based on perceptual models, and then subjected to entropy coding to generate the final bitstream. For example, in audio processing, lossy compression often removes high-frequency components beyond the typical human hearing limit of about 20 kHz, as these are imperceptible and contribute disproportionately to data volume. Transform coding serves as a prevalent implementation in the transformation stage, reorganizing data to facilitate efficient quantization of less perceptible elements.
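
To make the three-stage pipeline concrete, the following sketch (a toy illustration in Python using NumPy and SciPy, with an arbitrary signal and step size rather than any standardized codec) transforms a correlated signal, quantizes the coefficients, and estimates the coded size from the entropy of the quantized symbols:

```python
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(0)
# Synthetic "smooth" signal with correlated samples, standing in for audio or an image row.
signal = np.cumsum(rng.normal(size=256))

# 1) Analysis/transform: concentrate energy in a few coefficients.
coeffs = dct(signal, type=2, norm="ortho")

# 2) Quantization: map coefficients to discrete levels (the lossy step).
step = 2.0                              # arbitrary step size; larger means more loss
quantized = np.round(coeffs / step)

# 3) Entropy estimate: bits needed for the quantized symbols (stand-in for entropy coding).
_, counts = np.unique(quantized, return_counts=True)
p = counts / counts.sum()
estimated_bits = -(p * np.log2(p)).sum() * quantized.size

# Reconstruct and measure the distortion introduced by quantization.
reconstruction = idct(quantized * step, type=2, norm="ortho")
mse = np.mean((signal - reconstruction) ** 2)
print(f"estimated {estimated_bits:.0f} bits vs {signal.size * 64} bits raw, MSE {mse:.4f}")
```

Increasing the step size lowers the estimated bit count at the cost of a larger reconstruction error, which is the rate-distortion tradeoff in miniature.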

Advantages Over Lossless

Lossy compression achieves significantly higher compression ratios compared to lossless methods, often exceeding 10:1 for multimedia data, which substantially reduces storage requirements and enhances transmission efficiency over networks. This efficiency is particularly beneficial in bandwidth-constrained scenarios, where lossy techniques enable faster transfer without requiring excessive network resources. In practical applications, lossy compression excels in web media delivery, mobile devices, and streaming, where maintaining perceptual quality for human viewers or listeners is sufficient, allowing content providers to serve large audiences with limited bandwidth. For instance, streaming services rely on lossy formats to optimize playback in real-time environments with variable network conditions. While lossy compression introduces irreversible information loss as a necessary compromise for these gains, the discarded data typically falls below human perceptual thresholds, preserving acceptable quality in most use cases. This trade-off also yields energy savings in storage systems, as smaller file sizes decrease the power consumption associated with data handling and retention. A representative quantitative example is in audio, where the MP3 format achieves an approximately 11:1 ratio from uncompressed CD-quality audio, compared to the roughly 2:1 ratio of lossless FLAC, often without noticeable quality degradation for typical listeners. Furthermore, by minimizing data volumes, lossy compression contributes to reduced energy use in data centers, lowering the overall environmental impact through decreased electricity demands for storage and transmission.
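
The MP3 figure follows directly from the bitrates involved; a quick back-of-the-envelope check using the standard CD audio parameters:

```python
# Uncompressed CD audio: 44,100 samples/s x 16 bits/sample x 2 channels.
cd_bitrate = 44_100 * 16 * 2      # 1,411,200 bits per second
mp3_bitrate = 128_000             # a common MP3 setting, bits per second

ratio = cd_bitrate / mp3_bitrate
print(f"compression ratio roughly {ratio:.1f}:1")   # ~11.0:1
```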

Core Techniques

Transform Coding

Transform coding is a foundational technique in lossy compression that converts input data from the spatial or time domain into the frequency domain using a reversible mathematical transform. This process exploits the statistical properties of signals, such as their tendency to have correlated samples, by representing the data as a set of frequency coefficients. A key benefit is the concentration of signal energy into a small number of low-frequency coefficients, while high-frequency components carry less energy and can often be discarded or approximated with minimal perceptual impact. This energy compaction property arises because transforms like the Karhunen-Loève transform (KLT) optimally diagonalize the signal's covariance matrix, decorrelating the coefficients and enabling efficient subsequent processing.

Among the most widely used transforms, the discrete cosine transform (DCT) is prevalent for image and video compression due to its excellent energy compaction for correlated data, closely approximating the performance of the optimal KLT with lower computational complexity. Introduced by Ahmed, Natarajan, and Rao in 1974, the DCT expresses a sequence of N real numbers as a sum of cosine functions oscillating at different frequencies. The one-dimensional type-II DCT, commonly employed in block-based coding, is defined as:

X_k = \sum_{n=0}^{N-1} x_n \cos\left[\frac{\pi}{N} \left(n + \frac{1}{2}\right) k \right], \quad k = 0, 1, \dots, N-1

where x_n are the input samples and X_k are the DCT coefficients. For audio compression, the Modified Discrete Cosine Transform (MDCT), developed by Princen and Bradley in 1987, is favored for its perfect reconstruction capabilities in critically sampled filter banks and overlap-add structures that reduce aliasing artifacts. The MDCT builds on the DCT-IV by incorporating time-domain aliasing cancellation, making it suitable for time-varying signals.

The typical workflow in transform coding begins with applying the forward transform to blocks of input samples to generate coefficients, followed by coefficient selection—often prioritizing low-frequency terms based on their energy content—and then applying an inverse transform at the decoder to reconstruct the signal. This selection step facilitates targeted information loss by focusing coding effort on perceptually significant components. Quantization often follows as the primary lossy mechanism to further reduce coefficient precision. The decorrelation achieved by these transforms simplifies quantization and entropy coding, as independent coefficients require less bitrate for representation compared to the original correlated samples, leading to higher compression ratios without spreading distortion uniformly across the signal.

Historically, the DCT gained prominence through its adoption in the JPEG still image compression standard, finalized in 1992, where it enabled efficient lossy coding of continuous-tone images by processing 8x8 blocks. This standardization demonstrated the practical efficacy of transform coding, influencing subsequent formats in image and video compression.
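
The type-II DCT above can be implemented directly from its definition; the short NumPy sketch below (unnormalized, matching the form given, with an arbitrary smooth test signal) also illustrates the energy-compaction property that motivates transform coding:

```python
import numpy as np

def dct2_type2(x):
    """Unnormalized type-II DCT computed straight from the definition."""
    N = len(x)
    n = np.arange(N)
    return np.array([np.sum(x * np.cos(np.pi / N * (n + 0.5) * k)) for k in range(N)])

# A smooth, correlated input: most energy should land in low-frequency coefficients.
rng = np.random.default_rng(1)
x = np.cos(np.linspace(0, np.pi, 32)) + 0.05 * rng.normal(size=32)

X = dct2_type2(x)
energy = X ** 2
print(f"energy in the 8 lowest of 32 coefficients: {energy[:8].sum() / energy.sum():.3f}")
```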

Quantization and Prediction

Quantization serves as the primary mechanism for introducing controlled loss in lossy compression by mapping continuous or high-precision input values to a finite set of discrete levels, thereby reducing the data representation to fewer bits while inevitably producing errors. This process partitions the input range into intervals, assigning each interval a representative value, known as the reconstruction level, which introduces a quantization error defined as e = x - \hat{x}, where x is the original input and \hat{x} is the quantized output. The error arises because the exact value x is replaced by the nearest level, and its magnitude depends on the interval size and input distribution; for instance, in high-rate approximations, the distortion is roughly proportional to the signal variance times 2^{-2R}, where R is the rate in bits per symbol.

Uniform quantization employs equal-sized intervals across the input range, simplifying implementation and suiting signals with uniform distributions or high signal-to-noise ratios, but it can inefficiently allocate levels for non-uniform signals like speech or images. In contrast, non-uniform quantization uses varying interval sizes, often finer in regions of high perceptual importance—such as low-amplitude signals in audio—to minimize subjective distortion through perceptual weighting, as seen in companding techniques that compress the signal's dynamic range before uniform quantization and expand it afterward. These approaches optimize the quantizer design, such as via the Lloyd-Max algorithm, which iteratively refines reconstruction levels and decision boundaries to minimize mean squared error for a given number of levels.

Prediction enhances lossy compression by exploiting statistical redundancies in data, estimating subsequent values from prior ones to encode only the residuals, which are then quantized to introduce loss. Intra-frame prediction operates spatially within a single frame, using neighboring samples to forecast a current value, such as in image coding where a pixel is predicted from adjacent pixels via linear filters; the residual is quantized and transmitted, reducing the variance of the encoded signal. Inter-frame prediction extends this temporally across frames, predicting from previous reconstructed frames to capture motion or evolution in sequences like video, again quantizing the difference to balance compression and fidelity; this yields prediction gains, for example, up to 1/(1 - r^2) for first-order Markov processes with correlation coefficient r. A classic example is Differential Pulse Code Modulation (DPCM), widely used in speech coding, where a linear predictor estimates the next sample from past ones, quantizes the prediction error, and reconstructs the signal at the decoder, achieving significant bit-rate savings over direct pulse code modulation while introducing granular noise as the primary distortion.

Vector quantization extends scalar methods by treating groups of input samples as multidimensional vectors, mapping them jointly to codebook entries—predefined representative vectors—to exploit inter-sample correlations for greater efficiency. The codebook, a finite set of representative vectors, is designed via clustering algorithms like k-means to minimize average distortion, such as mean squared error, enabling lower rates for equivalent quality compared to scalar quantization. This technique is particularly effective for correlated data like speech parameters or image blocks, though it requires larger codebooks and more complex searches, often mitigated by tree-structured approximations.

Rate-distortion optimization integrates quantization and prediction by systematically balancing the trade-off between distortion D (e.g., mean squared error) and rate R (bits required), ensuring efficient allocation of resources across coding units. This is typically formulated as minimizing the Lagrangian cost J = D + \lambda R, where \lambda is a Lagrange multiplier corresponding to the slope of the convex hull of feasible rate-distortion points, allowing adaptation to constraints like target bit budgets in image or video coding. In practice, it guides decisions such as quantizer selection or mode choice, as applied in standards like the H.26x and MPEG families, to achieve operating points that maximize quality per bit while respecting perceptual or objective fidelity limits.
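
A minimal DPCM loop illustrates how prediction and quantization interact; the sketch below is a simplified first-order example (the predictor coefficient and step size are arbitrary choices for illustration, not from any standard) in which the encoder predicts each sample from the previously reconstructed value so that the decoder can track it exactly:

```python
import numpy as np

def dpcm_encode_decode(x, step=0.5, a=0.95):
    """First-order DPCM: predict from the previous *reconstructed* sample,
    quantize the residual uniformly, and reconstruct as the decoder would."""
    recon = np.zeros_like(x)
    prev = 0.0
    indices = []
    for i, sample in enumerate(x):
        pred = a * prev                            # linear prediction
        q = int(np.round((sample - pred) / step))  # quantize the residual (lossy step)
        indices.append(q)
        prev = pred + q * step                     # decoder-side reconstruction
        recon[i] = prev
    return np.array(indices), recon

rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(scale=0.2, size=200))     # slowly varying, correlated signal
idx, recon = dpcm_encode_decode(x)
print("residual index range:", idx.min(), "to", idx.max())
print("MSE:", np.mean((x - recon) ** 2))
```

Because the residual indices occupy a much smaller range than the raw samples, they can be entropy coded with far fewer bits, which is the prediction gain the text describes.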

Media Applications

Image Compression

Lossy image compression techniques exploit the limitations of human vision to discard data that has minimal impact on perceived quality, enabling significant file size reductions for still images. These methods prioritize preserving luminance detail over color detail and high-frequency spatial content, to which the eye is less sensitive. Quantization serves as the primary mechanism for introducing controlled loss in these processes.

The JPEG standard, formalized in ISO/IEC 10918-1:1994, represents a foundational approach to lossy image compression using discrete cosine transform (DCT) coding. In the encoding pipeline, input images are first converted from RGB to the YCbCr color space to separate luminance (Y) from chrominance (Cb, Cr) components, allowing coarser quantization of color data. The image is divided into 8x8 pixel blocks, each undergoing a forward DCT to transform spatial data into frequency-domain coefficients, emphasizing low-frequency components that carry most visual energy. These coefficients are then quantized using application-defined tables, followed by zigzag scanning to reorder them from low to high frequency for efficient entropy coding. Finally, Huffman coding is applied to the scanned coefficients, with DC values encoded differentially across blocks and AC coefficients using run-length and amplitude coding. JPEG supports baseline sequential mode for straightforward single-scan encoding of 8-bit images with 1-4 components, and progressive mode for multi-scan transmission that refines image quality gradually by spectral selection (grouping frequency bands) and successive approximation (bit-plane refinement). Compression levels are controlled via a quality factor on a 1-100 scale, where higher values reduce quantization scaling to retain more detail, while lower values increase it for greater compression—though even at 100, minor rounding losses occur. Typical quality settings of 75-90 balance file size and visual fidelity for photographic images.

Common artifacts in JPEG-compressed images include blocking, visible as grid-like discontinuities at block boundaries due to independent quantization, and ringing, oscillatory distortions around sharp edges from the truncation of high-frequency coefficients in the inverse DCT. These are exacerbated at low bit rates, degrading perceived quality in smooth or high-contrast regions. Mitigation often involves post-processing with deblocking filters that adaptively smooth block edges based on local variance, or deringing filters that suppress high-frequency oscillations while preserving edges; such techniques can improve PSNR by 1-3 dB without altering the core bitstream.

Subsequent standards build on these principles for enhanced efficiency. WebP, introduced by Google in 2010 and based on the VP8 format described in RFC 6386, employs intra-frame coding for lossy compression, using block prediction from neighboring pixels, DCT on residuals, and arithmetic coding to achieve 25-34% smaller files than JPEG at equivalent quality. HEIF (High Efficiency Image Format), defined in ISO/IEC 23008-12:2017, uses HEVC (H.265) intra-frame encoding within an ISO base media file format container, supporting features like layered images and transparency for up to 50% better compression than JPEG. More recently, AVIF (AV1 Image File Format), specified by the Alliance for Open Media in 2019 and registered as image/avif by IANA, leverages AV1 video codec intra-frames in a HEIF container, offering superior web efficiency with 20-50% size reductions over JPEG and growing adoption in browsers since 2020 for HDR and wide-color-gamut images. JPEG XL, standardized as ISO/IEC 18181-1:2022 by the Joint Photographic Experts Group, introduces a modern royalty-free format supporting both lossy and lossless compression with improved efficiency over JPEG (up to 60% size reduction at similar quality) and features like HDR, animation, and lossless recompression of legacy JPEG files. It uses a transform-based coding pipeline with tools such as variable-size DCTs and adaptive quantization, and as of 2025 it sees growing use in professional imaging, though browser support remains partial.
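
The per-block JPEG-style processing can be sketched compactly; the example below (Python with SciPy, luminance only, a single 8x8 block, and the widely cited example luminance quantization table scaled by an arbitrary factor) performs the level shift, forward DCT, quantization, and reconstruction, omitting zigzag ordering and Huffman coding:

```python
import numpy as np
from scipy.fft import dctn, idctn

# Example 8x8 luminance quantization table (as published in the JPEG specification's
# informative annex); the exact entries are not critical for this illustration.
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99]], dtype=float)

def jpeg_like_block(block, scale=1.0):
    """Forward DCT -> quantize -> dequantize -> inverse DCT on one 8x8 block."""
    shifted = block.astype(float) - 128.0           # level shift as in JPEG
    coeffs = dctn(shifted, type=2, norm="ortho")    # 2-D DCT-II
    quantized = np.round(coeffs / (Q * scale))      # the lossy step
    recon = idctn(quantized * (Q * scale), type=2, norm="ortho") + 128.0
    return quantized, np.clip(np.round(recon), 0, 255)

block = np.tile(np.linspace(60, 200, 8), (8, 1))    # smooth gradient block
q, recon = jpeg_like_block(block, scale=2.0)        # larger scale = lower quality
print("nonzero coefficients kept:", np.count_nonzero(q), "of 64")
print("max pixel error:", np.abs(block - recon).max())
```

On smooth content most quantized coefficients become zero, which is exactly what makes the subsequent run-length and Huffman stages effective.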

Video Compression

Video compression is a cornerstone of lossy compression techniques applied to moving images, exploiting both spatial redundancy within frames and temporal redundancy across frames to achieve significant data reduction while maintaining perceptual quality. Unlike still image compression, which operates on individual pictures, video codecs incorporate inter-frame prediction to model motion, allowing for bitrates as low as 1-4 Mbps for high-definition content in modern standards. This approach is essential for streaming, broadcasting, and storage, where uncompressed raw video can exceed 100 Mbps per stream.

The MPEG family of standards, developed by the Moving Picture Experts Group under ISO/IEC, forms the backbone of video compression. MPEG-2, standardized in 1995, enabled digital television and DVD with ratios up to 50:1 for standard-definition video, supporting bitrates from 1.5 to 15 Mbps. H.264/AVC (Advanced Video Coding), jointly developed by ITU-T VCEG and MPEG and finalized in 2003, introduced more efficient tools like variable block sizes and improved entropy coding, achieving 50% bitrate savings over MPEG-2 at equivalent quality, and is widely used in Blu-ray, streaming, and mobile video. H.265/HEVC (High Efficiency Video Coding), released in 2013, further advances this with larger coding tree units and advanced motion vector prediction, offering up to 50% better compression than H.264 for 4K and 8K resolutions, though at higher computational cost. Complementing these, AV1, developed by the Alliance for Open Media and released in 2018, is a royalty-free alternative that rivals HEVC's efficiency with up to 30% bitrate reduction over H.264, gaining adoption in web video platforms like YouTube due to its open-source nature. Versatile Video Coding (VVC/H.266), standardized by ITU-T and ISO/IEC in 2020, builds on HEVC with enhanced tools for higher resolutions and immersive media, providing up to 50% bitrate savings over HEVC at equivalent quality for 8K and beyond, though requiring significantly more encoding/decoding power. As of 2025, VVC sees increasing deployment in professional broadcasting, streaming, and hardware like set-top boxes, supported by profiles targeting low-latency and high-fidelity use cases.

Central to these standards is motion estimation and compensation, which predict frame content from previous or future frames to minimize residual data. Block matching divides frames into macroblocks (typically 16x16 pixels) and searches for the best-matching block in a reference frame, formalized as minimizing the sum of absolute differences (SAD):

\min_{(mv_x, mv_y)} \sum_{(x, y)} \left| f_t(x, y) - f_{t-1}(x + mv_x, y + mv_y) \right|

where f_t is the current frame at time t, f_{t-1} is the reference frame, and (mv_x, mv_y) is the motion vector. This process, often refined with quarter-pixel accuracy in H.264 and later, captures object movement efficiently. Frames are classified as I-frames (intra-coded, self-contained like still images), P-frames (predicted from prior frames), and B-frames (bi-directionally predicted from past and future frames), with B-frames providing the highest compression by referencing multiple reference frames but increasing decoding latency. Transform coding and quantization, similar to intra-frame methods, are applied to residuals after motion compensation.

Standards define profiles and levels to balance complexity and performance. The Baseline profile in H.264 suits low-latency applications like video conferencing with simpler entropy coding and no B-frames, while the High profile adds features like 8x8 transforms and CABAC for superior efficiency in broadcast and storage. HEVC extends this with Main and Main 10 profiles for 10-bit support, and later codecs offer similar tiers for progressive enhancement. Bitrate control mechanisms further optimize delivery: constant bitrate (CBR) maintains steady output for live streaming to avoid buffering, whereas variable bitrate (VBR) allocates more bits to complex scenes for consistent quality, often using two-pass encoding where the first pass analyzes the video and the second encodes accordingly.
Despite these advances, lossy video compression introduces visible artifacts. Motion blur arises from inaccurate motion estimation in fast-moving scenes, smearing details across frames, while mosquito noise manifests as ringing or halos around edges due to quantization of high-frequency components in motion-compensated residuals. These artifacts are more pronounced at low bitrates, prompting perceptual models in modern codecs to prioritize the regions to which human vision is most sensitive.
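
The block-matching search formalized above can be implemented as a brute-force SAD minimization; the sketch below (NumPy, with illustrative block size and search range, and a synthetic shifted frame standing in for real motion) shows the basic search that practical encoders accelerate with hierarchical and sub-pixel refinements:

```python
import numpy as np

def block_match(cur, ref, bx, by, bs=16, search=8):
    """Exhaustive block matching: find the motion vector minimizing the sum of
    absolute differences (SAD) between a block in the current frame and
    candidate blocks in the reference frame."""
    block = cur[by:by + bs, bx:bx + bs]
    best_mv, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + bs > ref.shape[1] or y + bs > ref.shape[0]:
                continue
            sad = np.abs(block - ref[y:y + bs, x:x + bs]).sum()
            if sad < best_sad:
                best_sad, best_mv = sad, (dx, dy)
    return best_mv, best_sad

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (64, 64)).astype(float)
# Shift the content so that the true displacement of the block is (dx=+2, dy=-3).
cur = np.roll(ref, shift=(3, -2), axis=(0, 1))
mv, sad = block_match(cur, ref, bx=24, by=24)
print("estimated motion vector (dx, dy):", mv, "with SAD:", sad)
```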

Audio Compression

Lossy audio compression leverages the limitations of human auditory perception, particularly through psychoacoustic principles that allow the removal of inaudible signal components while preserving perceived quality. Central to this approach are critical bands, which represent frequency ranges where the ear's resolution is roughly constant, modeled on scales like the Bark or Equivalent Rectangular Bandwidth (ERB) scales. These bands, numbering about 24 for audible frequencies, enable efficient encoding by grouping spectral energy and focusing compression on perceptually relevant details. Psychoacoustic models analyze the input signal to identify masked regions, ensuring quantization noise falls below auditory thresholds.

Frequency masking, or simultaneous masking, occurs when a louder sound (masker) at frequency f_m renders quieter sounds (maskees) nearby inaudible within the same or adjacent bands due to the ear's limited frequency selectivity. The masking effect spreads asymmetrically: stronger toward higher frequencies (upward spread) and weaker downward, quantified by a spreading function that raises the threshold of hearing. In MPEG standards, this is approximated on the Bark scale z, where the masking threshold T_q(z) for a maskee at z due to a masker at z_m follows a form like T_q(z) = a \cdot 10^{b(z - z_m)} for z > z_m, with parameters a and b derived from empirical data (the threshold decays by roughly 10-15 dB per Bark above the masker and more steeply, roughly 25-27 dB per Bark, below it). This allows bit allocation to prioritize unmasked frequencies.

Temporal masking complements frequency masking by exploiting the ear's sluggish response to rapid changes: a loud sound elevates the hearing threshold for subsequent (post-masking, up to 200 ms) or preceding (pre-masking, 5-20 ms) quieter sounds in the same frequency range. The temporal spreading function models this decay, often exponentially, as T_t(t) = T_m \cdot e^{-t / \tau}, where \tau is a time constant varying with signal level (longer for sustained tones). Combined, these masking effects guide noise shaping in perceptual coders, confining quantization errors to imperceptible regions.

The MPEG-1 Audio Layer III (MP3) standard, finalized in 1993, exemplifies these principles in a widely adopted format for general audio. It employs a hybrid filterbank: a 32-subband polyphase filterbank followed by a modified discrete cosine transform (MDCT), yielding 576 or 192 spectral lines and providing fine frequency resolution (down to about 41.67 Hz) alongside short blocks for transient handling and pre-echo avoidance. The psychoacoustic model (Model 1 or 2) computes masking thresholds via FFT analysis, identifies tonal and noise-like maskers, and allocates bits to subbands based on signal-to-masking ratios (SMR), keeping quantization noise below thresholds using Huffman-coded quantized MDCT coefficients. Bit rates are dynamically adjusted via a bit reservoir mechanism.

For stereo audio, MP3 incorporates joint stereo coding to exploit inter-channel redundancies. Intensity stereo encodes high-frequency bands with a single mono signal modulated by channel-specific intensity factors, preserving spatial cues without full separation. Mid-side (M/S) coding transforms left-right channels into sum (mid) and difference (side) signals, quantizing the often low-energy side channel more coarsely while reconstructing the stereo image at decoding. These techniques reduce bitrate needs by 20-30% at low rates without perceptual loss.

Advanced Audio Coding (AAC), defined in ISO/IEC 14496-3 (MPEG-4 Part 3) as MP3's successor, enhances these methods for better efficiency at low bitrates. AAC uses a pure MDCT filterbank with longer windows (1024-2048 samples) for improved frequency resolution, a more sophisticated psychoacoustic model incorporating temporal masking, and tools like perceptual noise substitution for noisy signals. It supports multichannel audio and variable rates, achieving transparent quality at lower bitrates than MP3. Opus, standardized as RFC 6716 by the IETF in 2012, is a versatile royalty-free codec for both speech and music, using a hybrid SILK (linear prediction, for speech) and CELT (MDCT, for music) structure with dynamic switching based on content. It supports bitrates from 6 to 510 kbit/s, frame sizes as low as 2.5 ms for low latency, and features like in-band forward error correction and packet loss concealment, outperforming older codecs such as MP3 in quality at bitrates below 128 kbit/s; it is widely adopted in WebRTC, VoIP, and streaming services as of 2025. Typical bitrates for near-CD quality (44.1 kHz, 16-bit stereo) in these formats are around 128 kbps, where artifacts are minimal for most listeners, balancing file size and fidelity; higher rates like 192-256 kbps approach transparency.
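
The exponential post-masking model above can be evaluated directly; the toy calculation below simply plugs illustrative numbers (the masker level, time constant, and hearing floor are assumptions, not calibrated psychoacoustic data) into T_t(t) = T_m e^{-t/\tau} to estimate how long a quieter sound remains masked:

```python
import numpy as np

def post_masking_threshold(t_ms, masker_level_db, tau_ms=50.0, floor_db=20.0):
    """Exponential decay of the masking threshold after a masker stops,
    clipped at an assumed absolute threshold of hearing (floor)."""
    decayed = masker_level_db * np.exp(-t_ms / tau_ms)
    return np.maximum(decayed, floor_db)

t = np.arange(0, 201, 10.0)                      # milliseconds after the masker ends
threshold = post_masking_threshold(t, masker_level_db=80.0)
quiet_sound_db = 40.0                            # level of a following quiet sound
audible = t[threshold < quiet_sound_db]
print("quiet sound stays masked for about", int(audible[0]) if audible.size else 200, "ms")
```

Real encoders combine such temporal estimates with the frequency-domain spreading function to decide, per band and per block, how much quantization noise can be hidden.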

Specialized Applications

Speech and 3D Graphics

Lossy compression for speech signals primarily relies on parametric models that synthesize voice based on vocal tract characteristics rather than directly encoding waveforms, enabling efficient representation at low bitrates suitable for real-time transmission. Linear Predictive Coding (LPC) forms the foundation of many such techniques by modeling the spectral envelope of speech through a linear prediction filter that estimates current samples from past ones, capturing formants essential to speech intelligibility. The core LPC synthesis equation is given by \hat{s}(n) = \sum_{k=1}^p a_k s(n-k) + G u(n), where \hat{s}(n) is the predicted speech sample, a_k are the predictor coefficients, p is the prediction order (typically 10-12 for speech), G is the gain, and u(n) is the excitation signal. This approach discards fine waveform details in favor of parameters that can be quantized, achieving compression ratios far beyond waveform coders while preserving perceived quality.

Building on LPC, code-excited linear prediction (CELP) enhances synthesis by using a codebook to select an optimal excitation sequence that minimizes prediction error through analysis-by-synthesis optimization, allowing high-quality speech at bitrates as low as 4.8 kbps. Modern standards like Opus, standardized in 2012, integrate linear-prediction-based methods (via its SILK component) for speech bandwidths up to 20 kHz, operating effectively in the 6-24 kbps range for voice applications, and support hybrid modes for both speech and music. Similarly, the Enhanced Voice Services (EVS) codec, developed by 3GPP in 2014, employs a CELP core with super-wideband extension up to 20 kHz, targeting 5.9-24 kbps for conversational quality in mobile networks, with quantization applied to LPC parameters and codebook indices to balance bitrate and quality. These trade-offs prioritize intelligibility over exact reproduction, as minor parameter distortions remain imperceptible in voiced segments but can introduce artifacts at very low bitrates. Quantization of these parameters further reduces data by mapping continuous values to discrete levels, typically using vector quantization for efficiency. Such speech compression techniques find primary application in Voice over IP (VoIP) systems, where low-latency encoding at constrained bitrates ensures reliable transmission over packet-switched networks without excessive bandwidth demands.

For 3D graphics, lossy compression addresses the high storage and transmission costs of polygonal meshes and associated textures by approximating geometry and visuals while maintaining interactive rendering quality. Mesh simplification reduces vertex count through edge-collapse operations guided by error metrics like distance to the original surface, enabling progressive transmission and level-of-detail adjustments with minimal perceptual loss in complex models. Texture compression employs block-based methods, such as BC7, which partitions 4x4 texel blocks into subsets and uses endpoint interpolation with indices for high-fidelity RGB/RGBA encoding at 8 bits per texel, supporting Direct3D 11 and later APIs for real-time graphics. Google's Draco library, released in 2017, combines predictive geometry encoding with entropy coding for meshes and point clouds, achieving up to 90% size reduction for typical assets while preserving visual fidelity through edgebreaker traversal and quantization of positions and attributes.
Trade-offs in 3D compression emphasize visual fidelity over geometric precision, as small vertex perturbations or texture approximations are often imperceptible in rendered scenes, particularly under shading and lighting; for instance, Draco balances compression speed and ratio via tunable quantization levels. These methods are critical for virtual reality (VR) and augmented reality (AR) applications, where compressed 3D models enable efficient streaming of immersive environments over bandwidth-limited connections.
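
To make the LPC formulation concrete, the sketch below (plain NumPy, autocorrelation method solved with a direct linear solve rather than the usual Levinson-Durbin recursion; the frame length and order are arbitrary choices) fits predictor coefficients to a synthetic voiced-like frame and shows how much smaller the residual energy is than the signal energy, which is what makes quantizing only the parameters and residual so efficient:

```python
import numpy as np

def lpc_coefficients(frame, order=10):
    """Autocorrelation-method LPC: solve the normal equations R a = r."""
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

def lpc_residual(frame, a):
    """Prediction residual e(n) = s(n) - sum_k a_k s(n-k)."""
    order = len(a)
    pred = np.zeros_like(frame)
    for k in range(1, order + 1):
        pred[order:] += a[k - 1] * frame[order - k:len(frame) - k]
    return frame[order:] - pred[order:]

# Synthetic voiced-like frame: a decaying resonance plus a little noise.
rng = np.random.default_rng(0)
n = np.arange(240)
frame = np.sin(2 * np.pi * 0.03 * n) * np.exp(-n / 400) + 0.01 * rng.normal(size=n.size)

a = lpc_coefficients(frame, order=10)
e = lpc_residual(frame, a)
print("residual/signal energy ratio: %.4f" % (np.sum(e ** 2) / np.sum(frame[10:] ** 2)))
```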

Scientific and Other Data

Lossy compression plays a crucial role in managing the vast volumes of numerical data generated by scientific simulations and observations, where storage and transmission constraints are severe, yet fidelity to underlying physical phenomena must be maintained. Applications span climate modeling, genomics, and astronomical data, often employing error-bounded techniques to ensure that compression-induced errors do not compromise downstream analyses. For instance, in climate modeling, lossy compression reduces data volumes from high-resolution simulations while preserving key statistical properties like means and variability patterns. Similarly, in genomics, it targets quality scores in sequencing data to enable efficient storage without significantly affecting variant calling accuracy. For astronomical data, such methods compress images while safeguarding photometry results essential for source detection. The SZ compressor exemplifies error-controlled approaches, providing pointwise absolute or relative error bounds for floating-point scientific datasets across simulations and instruments.

Key techniques include autoencoders for dimensionality reduction, which learn compact latent representations of high-dimensional scientific arrays, and floating-point quantization with user-specified tolerances to approximate values within acceptable error margins. Autoencoders, particularly hierarchical variants, achieve substantial compression for large-scale simulation outputs by reconstructing data with controlled distortion. Quantization methods, such as block floating-point schemes, scale and round values to lower precision levels, ensuring errors remain below predefined thresholds suitable for numerical analysis. Standards like ZFP, a library for compressed floating-point arrays, support high-throughput compression with fixed-rate or error-bounded modes optimized for spatially correlated data from physics simulations. MGARD, a multigrid-based framework, enables multilevel refactoring and decomposition with guaranteed error control, applicable to structured and unstructured meshes in scientific workflows.

Evaluation metrics emphasize relative error, defined as \frac{|x - \hat{x}|}{|x|} < \epsilon for original value x and reconstruction \hat{x}, ensuring proportional accuracy across data scales common in scientific domains. This metric underpins relative-error-bounded compressors like SZ, which adapt to data magnitudes for consistent quality. In the 2020s, efforts have intensified around exascale computing, where lossy methods address I/O bottlenecks in petabyte-scale simulations by integrating with HPC workflows for in-situ compression. A primary challenge lies in balancing scientific accuracy with aggressive size reduction, as ratios up to 100:1 can be achieved—such as with ZFP on correlated floating-point fields—but require careful error tuning to avoid altering physical insights or statistical validity. Prediction-based techniques for time-series data, such as those applied to successive simulation outputs, can further enhance ratios by exploiting temporal correlations within error bounds.
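
A toy relative-error-bounded quantizer shows how the bound above can be enforced per value; the sketch below uses logarithmic quantization under an assumed bound epsilon (real compressors such as SZ add prediction and entropy coding on top of a step like this):

```python
import numpy as np

def rel_error_quantize(x, eps=1e-2):
    """Quantize magnitudes on a logarithmic grid so that |x - x_hat| / |x| <= eps
    for all nonzero x; zeros are passed through unchanged."""
    step = np.log1p(eps)                       # log-domain step derived from the bound
    sign = np.sign(x)
    mag = np.abs(x)
    idx = np.round(np.log(np.where(mag > 0, mag, 1.0)) / step).astype(int)
    x_hat = np.where(mag > 0, sign * np.exp(idx * step), 0.0)
    return idx, x_hat

rng = np.random.default_rng(0)
data = rng.lognormal(mean=0.0, sigma=3.0, size=100_000)   # values spanning many scales
idx, recon = rel_error_quantize(data, eps=1e-2)
rel_err = np.abs(data - recon) / np.abs(data)
print("max relative error:", rel_err.max(), "(requested bound 1e-2)")
print("distinct quantization bins used:", np.unique(idx).size)
```

Because the grid is logarithmic, large and small values receive the same proportional accuracy, which is exactly the property the relative-error metric rewards.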

Evaluation

Information Loss and Transparency

In lossy compression, information loss primarily occurs through the irreversible removal of elements that are imperceptible to human sensory perception, such as subtle spatial variations in images or inaudible frequency components in audio signals. This approach exploits perceptual redundancies, discarding details below the thresholds of human vision or hearing while preserving essential structural and semantic content. The discarded information cannot be recovered upon decompression, distinguishing lossy methods from lossless ones, but the loss is engineered to minimize noticeable degradation.

A key concern is the accumulation of loss across multiple compression-decompression cycles, known as generation loss, where artifacts from initial encoding propagate and amplify in subsequent generations. This compounding effect arises because each cycle introduces additional quantization errors or approximations, leading to progressive distortion that becomes more perceptible over iterations, particularly in formats like JPEG for images or MP3 for audio.

Transparency refers to the bitrate or quality level at which the compressed output is perceptually indistinguishable from the original, meaning no audible or visible differences can be detected under typical conditions. For example, in audio coding, Advanced Audio Coding (AAC) achieves transparency at approximately 192 kbps for stereo signals in many listening scenarios, balancing file size with fidelity. This threshold varies by content and listener but represents the "transparent bitrate" where further increases yield diminishing perceptual returns.

Objective metrics quantify information loss by comparing the original and reconstructed signals. The peak signal-to-noise ratio (PSNR) measures fidelity as the ratio of the maximum signal power to the power of corrupting noise, calculated as \text{PSNR} = 10 \log_{10} \left( \frac{\text{MAX}^2}{\text{MSE}} \right), where MAX is the maximum possible signal value and MSE is the mean squared error between original and compressed versions; higher PSNR values indicate less loss, with typical ranges of 30–50 dB for acceptable quality in images and video. Another metric, the Structural Similarity Index (SSIM), evaluates perceived changes in luminance, contrast, and structure, defined as \text{SSIM}(x, y) = [l(x, y)] \cdot [c(x, y)] \cdot [s(x, y)], with l(x, y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, c(x, y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, and s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3}, where \mu denotes means, \sigma variances and covariance, and C stabilization constants; SSIM values near 1 signify high structural similarity.

Perceptual models guide loss minimization by incorporating human visual or auditory sensitivities, particularly through Just Noticeable Difference (JND) thresholds, which define the minimum distortion level undetectable by observers. JND-based approaches, such as those modeling contrast masking or luminance adaptation, allow compressors to allocate bits preferentially to perceptible regions, enabling up to 15–20% bitrate savings without quality loss in image and video applications. Subjective evaluation complements objective metrics via the mean opinion score (MOS), a standardized scale from 1 (bad) to 5 (excellent) derived from human listener or viewer ratings in controlled tests. MOS assesses overall perceptual quality, accounting for nuances like fatigue or context that metrics like PSNR overlook, and is integral to validating transparency in audio and video compression standards.
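
Both metrics translate directly into code; the sketch below computes PSNR and a simplified single-window SSIM from global statistics (standard SSIM averages the index over local windows, so this is an approximation for illustration only):

```python
import numpy as np

def psnr(orig, recon, max_val=255.0):
    """Peak signal-to-noise ratio in dB for 8-bit images."""
    mse = np.mean((orig.astype(float) - recon.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(max_val ** 2 / mse)

def ssim_global(x, y, max_val=255.0):
    """Single-window SSIM using global means/variances (a simplification)."""
    x, y = x.astype(float), y.astype(float)
    C1, C2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + C1) * (2 * cov + C2)) / ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))

rng = np.random.default_rng(0)
orig = rng.integers(0, 256, (64, 64)).astype(float)
recon = np.clip(orig + rng.normal(scale=5.0, size=orig.shape), 0, 255)
print(f"PSNR: {psnr(orig, recon):.1f} dB, SSIM (global): {ssim_global(orig, recon):.3f}")
```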

Compression Ratios and Efficiency

The compression ratio in lossy compression is defined as the ratio of the original data size to the compressed data size, quantifying the reduction in storage or transmission requirements. For images, typical ratios range from 10:1 to 50:1 depending on quality settings and content, as seen in JPEG where medium-quality encoding often achieves around 10:1 to 20:1 without severe degradation. In video, ratios can extend to 20:1 to 200:1 for standards like MPEG-4, balancing bitrate and perceptible quality.

Efficiency is commonly measured using bits per pixel (BPP) for images, which represents the average number of bits needed to represent each pixel after compression; lower BPP values indicate higher efficiency, such as reducing from 24 BPP in uncompressed RGB to 1-4 BPP in compressed formats like JPEG. For video codecs, the Bjøntegaard Delta rate (BD-rate) provides a standardized comparison by integrating rate-distortion curves to compute average bitrate savings at equivalent quality levels, often expressed as a percentage improvement over a reference codec.

Compared to lossless compression, which typically yields 2:1 to 5:1 ratios for media data while preserving all information, lossy methods achieve 5-50 times higher ratios by discarding perceptually irrelevant details, though this introduces irreversible loss. Efficiency varies with content dependency; smooth gradients in images or low-motion videos compress more effectively (higher ratios) than noisy or high-detail content due to better predictability of the signal. Computational complexity also influences practical efficiency, as advanced codecs like HEVC (H.265) offer about 50% bitrate savings over H.264 at similar quality but require 2-10 times more encoding time owing to larger block sizes and more prediction modes. Recent benchmarks highlight AV1's gains, delivering approximately 30% better compression efficiency than HEVC (negative 30% BD-rate) across diverse content from 2018 to 2025 evaluations, while VVC (Versatile Video Coding, H.266) provides additional 20-40% efficiency improvements over HEVC as of 2025, often outperforming AV1 in high-resolution scenarios. These ratios are optimized near transparency thresholds where further compression yields diminishing returns in quality preservation.
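
The ratio and bits-per-pixel figures come straight from the sizes involved, as in this quick calculation (the compressed size is an assumed example value):

```python
width, height = 1920, 1080
raw_bytes = width * height * 3            # 24-bit RGB, uncompressed
jpeg_bytes = 310_000                      # assumed example compressed file size

ratio = raw_bytes / jpeg_bytes
bpp = jpeg_bytes * 8 / (width * height)
print(f"compression ratio ~{ratio:.0f}:1, {bpp:.2f} bits per pixel (down from 24 BPP raw)")
```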

Practical Challenges

Editing and Transcoding

Editing lossy compressed media introduces significant challenges due to the need for decoding and subsequent re-encoding, which exacerbates compression artifacts through a process known as generational loss. In image editing, for instance, operations such as cropping or resizing a JPEG file require decompressing the image, applying modifications, and recompressing it, often at the same or lower quality level. This re-compression amplifies visible artifacts like blocking, where pixelation appears along 8x8 block boundaries, as the quantization errors from the initial compression interact with new transformations. Additionally, editing can lead to color shifts, particularly in regions with subtle gradients, where chroma subsampling and requantization introduce inaccuracies that propagate across cycles. Similar issues arise in audio editing; altering pitch in lossy compressed files, such as those using MP3 or AAC, can amplify quantization noise, as frequency-domain modifications redistribute errors across the spectrum, making subtle distortions more audible in the altered signal.

Transcoding, the conversion of media from one lossy format to another, compounds these problems by necessitating a full decode-encode cycle, which introduces cumulative distortions. For video, converting from one compressed format to another involves decoding the source stream and re-encoding it, leading to drift accumulation where prediction errors from motion compensation and residual quantization propagate across frames, causing temporal inconsistencies like blurring or ghosting in motion-heavy scenes. This drift arises because the decoder's reconstructed frames deviate from the original, and subsequent encoding builds predictions on these imperfect references, resulting in error buildup over multiple generations. In cascaded compression scenarios, such as repeated transcoding for distribution across platforms, these effects intensify, reducing overall quality even if the target bitrate remains constant.

To mitigate generational loss, workflows often incorporate non-destructive techniques, where modifications are stored as metadata or layered adjustments without altering the underlying compressed data until final export. Working with uncompressed or lossless intermediate formats during editing preserves original quality, avoiding intermediate re-compressions. Proxy workflows further address these issues by generating low-resolution, lightweight versions of high-quality source files for editing; these proxies undergo any necessary re-compressions without affecting the originals, which are linked and substituted only during final rendering to minimize artifact accumulation.
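
Generational loss is straightforward to reproduce; the sketch below (Python with the Pillow library assumed available, an arbitrary quality setting, and a synthetic gradient image) repeatedly re-encodes the same picture and reports the error against the original:

```python
import io
import numpy as np
from PIL import Image

# Synthetic RGB image with smooth gradients, where re-compression artifacts show up.
x = np.linspace(0, 255, 256, dtype=np.uint8)
img = np.stack([np.tile(x, (256, 1)),                  # gradient along width
                np.tile(x[:, None], (1, 256)),         # gradient along height
                np.full((256, 256), 128, np.uint8)],   # flat blue channel
               axis=-1)
original = img.astype(float)

current = Image.fromarray(img, "RGB")
for generation in range(1, 11):
    buf = io.BytesIO()
    current.save(buf, format="JPEG", quality=75)       # one decode/re-encode cycle
    buf.seek(0)
    current = Image.open(buf).convert("RGB")
    if generation in (1, 5, 10):
        err = np.mean((np.asarray(current, dtype=float) - original) ** 2)
        print(f"generation {generation}: MSE vs original = {err:.2f}")
```

In practice the degradation is worst when edits such as resizing or color adjustments occur between re-encodes; re-saving an unchanged image at a fixed quality tends to stabilize after the first few generations.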

Scalability and Resolution Adjustment

Lossy compression techniques often incorporate scalability to allow adaptation of the compressed data to varying network conditions, device capabilities, or user preferences without requiring complete re-encoding. This is achieved through layered bitstream structures that enable partial decoding for lower resolutions, frame rates, or quality levels. In image compression, Progressive JPEG exemplifies this by organizing the DCT coefficients into multiple scans, permitting a coarse approximation of the image to be displayed first, with successive scans refining the detail.

For video, Scalable Video Coding (SVC), an extension to H.264/AVC defined in Annex G of ITU-T Recommendation H.264, introduces spatial, temporal, and quality (SNR) scalability through a base layer and enhancement layers. The base layer provides a low-resolution or low-quality version compatible with standard H.264 decoders, while enhancement layers add higher resolution (spatial scalability, e.g., from quarter to full size), higher frame rates (temporal scalability via hierarchical B-frames), or reduced quantization noise (SNR scalability). This layered approach allows extraction of subsets of the bitstream for targeted decoding, reducing bandwidth needs by up to 50% in adaptive scenarios compared to simulcasting multiple independent streams. The Scalable High Efficiency Video Coding (SHVC) extension to HEVC (H.265), specified in Annexes F and G of ITU-T Recommendation H.265, builds on this with improved efficiency, supporting spatial scaling ratios of 1.5x or 2x between layers and SNR scalability through medium-grain or coarse-grain quality refinement. Enhancement layers in SHVC use inter-layer prediction, such as upsampling the base layer via dedicated filters specified in the standard for spatial alignment, to minimize redundancy while preserving compression gains of 30-50% over non-scalable HEVC for multi-resolution delivery.

These scalability features enable dynamic adjustment, such as downsampling resolution by halving width and height to fit lower bitrates, followed by upscaling at the decoder using interpolation methods to approximate higher quality. In practice, adaptive streaming protocols like Dynamic Adaptive Streaming over HTTP (DASH), standardized in ISO/IEC 23009-1, leverage scalable bitstreams to switch layers in real time based on available bandwidth, ensuring seamless playback across devices. For instance, DASH segments can include multiple representations, allowing clients to select appropriate scalability layers without re-encoding. However, non-scalably designed coders can introduce mismatch artifacts, such as drift between encoder and decoder predictions, leading to accumulating errors in enhancement layers if inter-layer references misalign. This drift, exacerbated in SNR scalability, can manifest as blocking or blurring artifacts, requiring careful mode decisions to limit overhead to under 10% in SVC/SHVC. Modern codecs like AV1 support spatial and temporal scalability through multi-layer tiling and temporal sublayers, but SNR scalability remains limited, often relying on external enhancements rather than native fine-grained layers.
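
A simplified version of the representation selection that adaptive streaming clients perform is sketched below; the bitrate ladder, names, and safety margin are hypothetical values for illustration, not taken from any standard manifest:

```python
from dataclasses import dataclass

@dataclass
class Representation:
    name: str
    bitrate_kbps: int     # average encoded bitrate of this layer/representation
    height: int           # spatial resolution

# Hypothetical ladder of representations (scalable layers or independent encodings).
ladder = [
    Representation("240p", 400, 240),
    Representation("480p", 1200, 480),
    Representation("720p", 2800, 720),
    Representation("1080p", 5000, 1080),
]

def select(ladder, measured_kbps, safety=0.8):
    """Pick the highest-quality representation whose bitrate fits the measured
    throughput, with a safety margin to absorb short-term fluctuations."""
    budget = measured_kbps * safety
    candidates = [r for r in ladder if r.bitrate_kbps <= budget]
    return max(candidates, key=lambda r: r.bitrate_kbps) if candidates else ladder[0]

for throughput in (500, 2000, 8000):     # kbps measured over recent segments
    r = select(ladder, throughput)
    print(f"{throughput} kbps available -> {r.name} ({r.bitrate_kbps} kbps)")
```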

AI-Driven Methods

Recent advancements in lossy compression have leveraged machine learning, particularly deep learning techniques, to surpass traditional methods in rate-distortion performance and perceptual quality. Neural autoencoders form the core of many end-to-end learned compression systems, where an encoder maps input data to a compact latent representation, followed by quantization and decoding to reconstruct the output. These models are trained jointly to minimize a rate-distortion loss, enabling the network to learn data-specific transformations that capture essential features more efficiently than hand-engineered transforms like the DCT.

Generative adversarial networks (GANs) have been integrated to mitigate compression artifacts, such as blocking or blurring, by training a discriminator to distinguish real from reconstructed images, forcing the generator (decoder) to produce more realistic outputs. This adversarial training enhances perceptual fidelity beyond pixel-wise metrics like PSNR. For instance, a fully convolutional residual network trained adversarially can effectively remove compression artifacts, improving visual quality at low bitrates.

Prominent examples include Google's neural image compression framework, introduced in 2018, which employs variational autoencoders with a scale hyperprior to model spatial dependencies in the latent representation. Models by Ballé et al. demonstrate superior rate-distortion curves compared to BPG on standard datasets while maintaining similar PSNR levels. These systems often incorporate learned perceptual losses, such as those based on LPIPS, which align better with human judgments than traditional distortion measures, leading to visually preferable reconstructions at equivalent rates. A key benefit of these AI-driven approaches is support for variable-rate compression through manipulation of the latent representation, allowing dynamic adjustment of quality without retraining.

By 2025, extensions to scientific data compression have emerged, such as error-bounded methods using neural autoencoders like AE-SZ, which ensure reconstruction errors stay within user-defined thresholds while achieving 100%-800% higher compression ratios than traditional compressors like SZ on multidimensional simulation data. DeepSZ applies similar principles to compress deep neural network weights with guaranteed accuracy loss bounds, facilitating efficient storage and deployment of AI models themselves.

Despite these gains, AI-driven methods face challenges, including the need for large, diverse training datasets to generalize across data types, which can introduce biases if not representative. Additionally, the computational overhead during encoding and decoding remains high, often requiring specialized hardware to match the throughput of classical codecs, though ongoing optimizations aim to address this.
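
A deliberately tiny end-to-end sketch in PyTorch (assumed available) shows the ingredients: a convolutional encoder and decoder, additive uniform noise as a differentiable stand-in for quantization during training, and a loss of the form D + \lambda R. The L1 magnitude of the latents is used here as a crude rate proxy in place of the learned entropy models of published systems, so this is an illustration of the training setup rather than a faithful reproduction of any specific model:

```python
import torch
import torch.nn as nn

class TinyImageCodec(nn.Module):
    """Minimal learned codec: conv encoder -> (noisy) quantization -> conv decoder."""
    def __init__(self, latent_channels=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, latent_channels, 4, stride=2, padding=1),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_channels, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        y = self.encoder(x)
        # Additive uniform noise approximates rounding during training (a common
        # trick in learned compression); hard rounding is used at inference time.
        y_hat = y + (torch.rand_like(y) - 0.5) if self.training else torch.round(y)
        return self.decoder(y_hat), y_hat

model = TinyImageCodec()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
lam = 0.01                                     # rate-distortion tradeoff weight

x = torch.rand(4, 3, 64, 64)                   # stand-in batch of images
for step in range(5):                          # a few illustrative training steps
    recon, y_hat = model(x)
    distortion = torch.mean((recon - x) ** 2)
    rate_proxy = torch.mean(torch.abs(y_hat))  # crude proxy for coded bits
    loss = distortion + lam * rate_proxy       # J = D + lambda * R
    opt.zero_grad(); loss.backward(); opt.step()
    print(f"step {step}: D={distortion.item():.4f}, R~{rate_proxy.item():.4f}")
```

Sweeping lam traces out an operating curve in the rate-distortion plane, mirroring how quality settings work in conventional codecs.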

Hardware Acceleration

Hardware acceleration plays a crucial role in enabling real-time lossy compression for applications demanding high throughput, such as video streaming and scientific data processing, by leveraging specialized processors to offload computationally intensive tasks from general-purpose CPUs. Application-specific integrated circuits (ASICs) are commonly used in hardware encoders to optimize fixed-function operations in standards like H.264, where Intel Quick Sync Video integrates dedicated encoding hardware directly into the CPU die for efficient video compression. This approach achieves speeds exceeding 300 frames per second on modern Intel processors, significantly reducing encoding latency compared to software-only implementations.

Graphics Processing Units (GPUs) excel in parallelizing transforms essential to lossy compression algorithms, such as the discrete cosine transform (DCT) in image and video encoding. NVIDIA's NVENC, a dedicated ASIC on RTX GPUs, accelerates video compression using codecs like H.264, HEVC, and AV1, delivering up to 4x faster export times in video editing tools while maintaining comparable quality to CPU encoding. For JPEG decoding, advanced GPU-accelerated decoders extending the nvJPEG library achieve throughputs that outperform CPU-based libjpeg-turbo by up to 51x on high-end GPUs like the A100. In scientific data contexts, the CuSZ framework further demonstrates GPU potential, providing error-bounded lossy compression up to 370x faster than a single CPU core and 13x faster than multi-core CPU setups on datasets like those from high-performance computing simulations.

These technologies yield substantial gains, including 10-100x speedups in compression throughput and improved power efficiency, particularly beneficial for mobile and edge devices where power constrains processing. For instance, NVENC's AV1 support on RTX 50-series GPUs offers 43% better efficiency than H.264 at equivalent bitrates, enabling high-quality video at lower bandwidths. AI-driven lossy compression methods, such as neural autoencoders, also benefit from acceleration on platforms like Apple's Neural Engine, which provides up to 26x peak throughput improvements for transformer-based models since the A11 Bionic in 2017.

By 2025, developments in Field-Programmable Gate Arrays (FPGAs) have advanced custom scientific compressors, such as FPGA-enhanced implementations of hyperspectral lossy algorithms like HyperLCA, which adaptively control distortion for real-time data from remote-sensing instruments. These FPGA designs achieve high-speed processing tailored to onboard resource constraints, outperforming general-purpose hardware in specialized error-bounded scenarios. Edge AI chips further reduce latency in compression tasks, enabling on-device processing for latency-sensitive applications with minimal data transmission delays, as seen in 2025 market advancements emphasizing energy-efficient inference. Despite these benefits, hardware acceleration involves trade-offs between fixed-function ASICs, which offer high efficiency for specific tasks but lack flexibility, and programmable options like GPUs or FPGAs, which support diverse algorithms at the cost of higher power consumption and design complexity.

    In this paper, we performed a survey on current 3D mesh compression techniques ... Kobbelt, Simplification and compression of 3D meshes, in: Tutorials on ...
  52. [52]
    BC7 Format - Win32 apps - Microsoft Learn
    Dec 14, 2022 · The BC7 format is a texture compression format used for high-quality compression of RGB and RGBA data. For info about the block modes of the BC ...Missing: AMD | Show results with:AMD
  53. [53]
    Draco 3D Graphics Compression - Google
    Draco is an open-source library for compressing and decompressing 3D geometric meshes and point clouds. It is intended to improve the storage and transmission ...
  54. [54]
    Introducing Draco: compression for 3D graphics
    Jan 13, 2017 · Draco can be used to compress meshes and point-cloud data. It also supports compressing points, connectivity information, texture coordinates, ...
  55. [55]
    Google's Draco for Mixed Reality Applications: Compression Test
    Jan 8, 2025 · The description of Draco from the GitHub repo reads “Draco is a library for compressing and decompressing 3D geometric meshes and point clouds.
  56. [56]
    Evaluating lossy data compression on climate simulation ... - GMD
    We find that applying lossy data compression to climate model data effectively reduces data volumes with minimal effect on scientific results. We apply lossy ...
  57. [57]
    Performance evaluation of lossy quality compression algorithms for ...
    Jul 20, 2020 · Lossy genomic data compression, especially of the base quality values of sequencing data, is emerging as an efficient way to handle this ...
  58. [58]
    Lossy Compression of Integer Astronomical Images Preserving ...
    Nov 14, 2024 · This paper presents a novel lossy compression technique that is able to preserve the results of photometry analysis with high fidelity
  59. [59]
    [PDF] Fast Error-bounded Lossy HPC Data Compression with SZ
    The method linearizes data, uses curve-fitting models for predictable data, and lossy compression for unpredictable data, with a compression ratio of 3.3/1 - ...
  60. [60]
    zfp | Computing - Lawrence Livermore National Laboratory
    zfp is a BSD licensed open-source library for compressed floating-point and integer arrays that support high throughput read and write random access.
  61. [61]
    MGARD: A multigrid framework for high-performance, error ... - arXiv
    Jan 11, 2024 · With exceptional data compression capability and precise error control, MGARD addresses a wide range of requirements, including storage ...
  62. [62]
  63. [63]
    Floating Point Compression: Lossless and Lossy Solutions
    Our zfp compressor for floating-point and integer data often achieves compression ratios on the order of 100:1, i.e., to less than 1 bit per value of compressed ...
  64. [64]
  65. [65]
  66. [66]
    [PDF] On Perceptual Lossy Compression - arXiv
    Jun 5, 2021 · Abstract. Lossy compression algorithms are typically de- signed to achieve the lowest possible distortion at a given bit rate.
  67. [67]
  68. [68]
    MPEG-4 scalable lossless audio transparent bitrate and its application
    **Summary of Transparent Bitrate for AAC Audio:**
  69. [69]
    [PDF] On the Computation of PSNR for a Set of Images or Video - arXiv
    Apr 30, 2021 · This paper investigates different approaches to computing PSNR for sets of images, single video, and sets of video and the relation between them ...
  70. [70]
    Image quality assessment: from error visibility to structural similarity
    We introduce an alternative complementary framework for quality assessment based on the degradation of structural information.
  71. [71]
    [PDF] A Survey of Visual Just Noticeable Difference Estimation
    The JND threshold reveals the visual redundancy, and thus is useful for perception oriented visual signal processing, e.g., perceptual signal compression, image ...<|separator|>
  72. [72]
    Lossy Data Compression: JPEG - Stanford Computer Science
    The baseline algorithm, which is capable of compressing continuous tone images to less that 10% of their original size without visible degradation of the image ...<|separator|>
  73. [73]
    Image and Video Processing
    However, lossy compression methods such as MPEG-4 result in compression ratios from 20 to 200 depending on the video stream. Even MPEG-4 is the most ...Missing: typical | Show results with:typical
  74. [74]
    Understand the concept of "Bpp" and "Mbps" to define your ... - intoPIX
    Sep 30, 2020 · The "bits per pixel" (bpp) refers to the sum of the "number of bits per color channel" i.e. the total number of bits required to code the color ...The BPP "bits-per-pixel" concept · Chroma subsampling to...
  75. [75]
    Bjøntegaard Delta (BD): A Tutorial Overview of the Metric, Evolution ...
    Jan 8, 2024 · The Bjøntegaard Delta (BD) method proposed in 2001 has become a popular tool for comparing video codec compression efficiency.
  76. [76]
    [PDF] Image compression overview - arXiv
    Sep 14, 2014 · For lossless methods, we can get the average of 3-4 times smaller files than the original ones. With lossy methods, we can obtain ratios up to.Missing: numerical | Show results with:numerical
  77. [77]
    Lossy compression of x-ray diffraction images
    For these images, the "theoretical maximum compression ratio" ranged from 1.2 to 4.8 with mean 2.7 and standard deviation 0.7. The values for Huffmann encoding ...
  78. [78]
    AV1 vs HEVC: Which Codec is Best for You? - Gumlet
    May 3, 2023 · AV1 offers 30% better performance than HEVC. Cons: The AV1 codec is one of the slowest in terms of encoding/decoding efficiencies and ...Difference Between AV1 vs... · AV1 vs. HEVC: Which One is...
  79. [79]
    Understanding The Effectiveness of Lossy Compression in Machine ...
    Mar 23, 2024 · Data compression is generally divided into two categories: lossless compression and lossy compression. Lossy compression methods can now be ...
  80. [80]
    Transients + Noise Audio Representation for Data Compression and ...
    The purpose of this paper is to demonstrate a low bitrate audio ... Transients + Noise Audio Representation for Data Compression and Time / Pitch Scale Modi ...<|separator|>
  81. [81]
    [PDF] Drift Compensation for Reduced Spatial Resolution Transcoding
    Aug 1, 2002 · This paper discusses the problem of reduced-resolution transcoding of compressed video bitstreams. An anal- ysis of drift errors is provided ...
  82. [82]
    (PDF) Drift compensation for reduced spatial resolution transcoding
    Aug 5, 2025 · This paper discusses the problem of reduced-resolution transcoding of compressed video bitstreams. An analysis of drift errors is provided ...
  83. [83]
    Edit faster with the proxy workflow in Premiere Pro
    Sep 19, 2024 · Adobe Premiere Pro logo. Craft the perfect story with Premiere Pro Find the best-in-class video-editing tools all in one place.
  84. [84]
    Reimagining the Possibilities of Proxy Workflows for Media Production
    Aug 24, 2022 · By using a highly compressed alternative, work can continue on the substitute material that is more appropriate for the remote circumstances.Missing: generational lossy
  85. [85]
  86. [86]
    Dynamic adaptive streaming over HTTP (DASH) — Part 1 ... - ISO
    This document primarily specifies formats for the Media Presentation Description and Segments for dynamic adaptive streaming delivery of MPEG media over HTTP.
  87. [87]
    [PDF] Overview of the Scalable Video Coding Extension of the H.264/AVC ...
    Differences between these prediction loops lead to a “drift” that can accumulate over time and produce annoying artifacts. However, the scalability bit stream ...<|control11|><|separator|>
  88. [88]
    Variational image compression with a scale hyperprior - arXiv
    Feb 1, 2018 · We describe an end-to-end trainable model for image compression based on variational autoencoders. The model incorporates a hyperprior to effectively capture ...
  89. [89]
    Deep Generative Adversarial Compression Artifact Removal - arXiv
    Apr 8, 2017 · We present a feed-forward fully convolutional residual network model trained using a generative adversarial framework.
  90. [90]
    [1912.08771] Computationally Efficient Neural Image Compression
    Dec 18, 2019 · We apply automatic network optimization techniques to reduce the computational complexity of a popular architecture used in neural image compression.
  91. [91]
    DeepSZ: A Novel Framework to Compress Deep Neural Networks ...
    In this paper, we propose DeepSZ: an accuracy-loss expected neural network compression framework, which involves four key steps: network pruning, error bound ...
  92. [92]
    [PDF] Computationally-Efficient Neural Image Compression with Shallow ...
    Neural image compression methods have seen increas- ingly strong performance in recent years. However, they suffer orders of magnitude higher computational ...
  93. [93]
    Intel GPU | Jellyfin
    This tutorial guides you on setting up full video hardware acceleration on Intel integrated GPUs and ARC discrete GPUs via QSV and VA-API.
  94. [94]
    Oh yeah, 380 fps transcoding (via intel Quicksync)
    Mar 15, 2023 · qsv transcode sometimes hits 400fps, it's speed is varying (I guess read/write speed of disk?), but it's constantly above 365 fps, seems to average around 380 ...
  95. [95]
    NVIDIA NVENC Obs Guide | GeForce News
    Jan 30, 2025 · The latest AV1 codec on NVIDIA GeForce RTX 50 series is 5% more efficient than the previous generation and ~43% more efficient than H.264. This ...
  96. [96]
    Export up to 4X faster with hardware encoding (NVENC) in Premiere ...
    Mar 25, 2021 · And this results in HUGE time-saving differences! Use the NVIDIA encoder or NVENC ... video is sponsored by NVIDIA #PremierePro #NVENC #NVIDIA.
  97. [97]
    [PDF] Accelerating JPEG Decompression on GPUs - arXiv
    Nov 17, 2021 · For GPU-accelerated computer vision and deep learning tasks, such as the training of image classification models, efficient JPEG decoding is ...
  98. [98]
    [PDF] CuSZ: An Efficient GPU-Based Error-Bounded Lossy Compression ...
    Sep 30, 2020 · CuSZ is an optimized GPU-based error-bounded lossy compression framework for scientific data, the first of its kind on GPUs.
  99. [99]
    Deploying Transformers on the Apple Neural Engine
    Jun 6, 2022 · The 16-core Neural Engine on the A15 Bionic chip on iPhone 13 Pro has a peak throughput of 15.8 teraflops, an increase of 26 times that of ...Missing: 2020s | Show results with:2020s
  100. [100]
    (PDF) FPGA-Based Hyperspectral Lossy Compressor With Adaptive ...
    Aug 27, 2025 · In this paper, a transform-based lossy compressor, HyperLCA, has been extended to include a run-time adaptive distortion feature that brings ...
  101. [101]
    Edge AI in Embedded Devices: What's New in 2025 for IoT and EVs
    Sep 26, 2025 · The 2025 Edge AI Technology Report highlights that edge AI is central to minimizing data transmission, reducing latency, and cutting energy ...
  102. [102]
    How "exactly" are AI-accelerator chip ASICs built differently than ...
    Jan 10, 2023 · AI ASICs have fixed-function for specific tasks, like image recognition, while GPUs are general-purpose. ASICs are faster for neural networks, ...<|separator|>