
Sub-band coding

Sub-band coding is a signal processing technique that decomposes an input signal into multiple frequency sub-bands using a bank of bandpass filters, followed by decimation to reduce the sampling rate in each sub-band, independent quantization and encoding of the sub-band signals with optimized bit allocation, and reconstruction via interpolation and filtering to approximate the original signal. This approach exploits the varying perceptual importance and statistical properties of different frequency components to achieve efficient compression while minimizing distortion, particularly in applications like audio and image processing.

The core components of sub-band coding include analysis filter banks for signal decomposition, which apply low-pass and high-pass filters to separate frequency bands, and synthesis filter banks for reconstruction, often designed using quadrature mirror filters (QMFs) to achieve perfect or near-perfect reconstruction with minimal aliasing and phase distortion. In a two-channel bank, decimation by a factor of two halves the sampling rate in each band, so the total number of samples is unchanged while bits can be allocated adaptively, based on sub-band energy or perceptual models, to the perceptually sensitive frequencies. Quantization introduces errors that are controlled across the frequency spectrum, and techniques like entropy coding further reduce redundancy within sub-bands.

Sub-band coding emerged in the 1970s as an extension of multirate signal processing, with foundational work on alias-cancelling quadrature mirror filter banks by Croisier et al. in 1976 and detailed theoretical development in Crochiere and Rabiner's 1983 book Multirate Digital Signal Processing. It gained prominence in the 1980s for speech compression and evolved in the 1990s through connections to wavelet transforms, as explored in Vetterli and Kovačević's 1995 book Wavelets and Subband Coding, which unified sub-band methods with multiresolution analysis for better energy compaction. Advances in filter design and computational efficiency have made it a cornerstone of modern compression standards.

Notable applications include audio coding in standards like MPEG-1 Audio Layer III (MP3) and Advanced Audio Coding (AAC), where sub-band decomposition enables perceptual noise shaping for high-fidelity compression at low bit rates. In image and video compression, it underpins JPEG 2000, which uses wavelet-based sub-band coding for scalable, progressive transmission and better performance than DCT-based JPEG at low bit rates. Other uses include hyperspectral image analysis and error-resilient transmission.

Fundamentals

Definition and Motivation

Sub-band coding (SBC) is a technique that decomposes an input signal into narrower frequency sub-bands through a bank of bandpass filters, permitting independent processing, quantization, and coding of each sub-band so that compression can be targeted at specific frequency content. This decomposition exploits the frequency-domain structure of the signal, allowing for reduced redundancy and tailored representation compared to time-domain methods.

The motivation for sub-band coding arises from the limitations of uniform coding schemes like pulse-code modulation (PCM), which apply equal bit resolution across the entire signal spectrum and thus spend bits on high-frequency components that contribute little to perceptual quality. Instead, SBC leverages signal statistics (such as the concentration of energy in lower-frequency bands) and perceptual models, including auditory or visual masking where stronger signals obscure weaker ones at nearby frequencies, to enable non-uniform bit allocation that lowers the bitrate without audible or visible degradation. This approach is particularly advantageous for bandwidth-constrained applications like audio and image compression, where preserving subjective quality is paramount.

In its basic model, an input discrete-time signal x(n) is divided into sub-bands via filter banks, with each sub-band signal decimated to lower its sampling rate, quantized based on perceptual relevance, and encoded for transmission or storage at a reduced overall bitrate. Reconstruction approximates the original signal by upsampling the decoded sub-bands, applying synthesis filters, and summing the results.

A representative example in audio coding illustrates this efficiency: compact disc (CD) audio uses 16-bit PCM at a stereo bitrate of 1.411 Mbit/s (44.1 kHz sampling), but early perceptual sub-band coders achieve near-CD quality at approximately 110 kbit/s per channel by assigning fewer bits to masked high frequencies.
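The bitrate comparison above is simple arithmetic, sketched here in Python. The function name `pcm_bitrate` and the assumption that the coded stream carries two channels at 110 kbit/s each are illustrative choices, not taken from any standard.

```python
def pcm_bitrate(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

# CD audio: 44.1 kHz sampling, 16 bits per sample, stereo.
cd_bps = pcm_bitrate(44_100, 16, 2)

# Assumed coded stream: about 110 kbit/s for each of two channels.
coded_bps = 2 * 110_000
ratio = cd_bps / coded_bps

print(cd_bps)           # 1411200
print(round(ratio, 2))  # 6.41
```

Under these assumptions the sub-band coder needs a little under one sixth of the PCM bitrate.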

Historical Development

Sub-band coding emerged in the 1970s as an application of multirate signal processing techniques to speech coding, building on foundational work in decimation and interpolation. Early explorations demonstrated that dividing speech signals into sub-bands allowed for more efficient quantization and bit allocation, reducing overall coding rates while controlling noise. A seminal contribution was the 1976 paper by Crochiere, Webber, and Flanagan, which proposed digitally coding speech in sub-bands using bit allocation based on perceptual importance, achieving significant bitrate reductions. Concurrently, Crochiere and Rabiner advanced the theoretical underpinnings through their work on multirate processing, providing tools for efficient sub-band decomposition without excessive computational overhead.

In the 1980s, advances in filter design propelled sub-band coding toward practical implementation, particularly through the development of quadrature mirror filters (QMFs) that enabled near-perfect reconstruction with minimal aliasing. At Bell Laboratories, J. D. Johnston introduced a family of optimized filters specifically tailored for QMF banks in 1980, improving frequency selectivity and reconstruction quality. This era also saw the first international standardization with ITU-T G.722 in 1988, a sub-band adaptive differential pulse-code modulation (SB-ADPCM) codec operating at 64 kbit/s for wideband speech (7 kHz), marking a milestone in high-quality speech transmission over digital networks.

The 1990s integrated sub-band coding into multimedia standards, blending it with transform techniques for broader applications. The MPEG-1 Audio standard, finalized in 1992, incorporated a hybrid sub-band/transform filter bank in its Layer III (MP3) profile, enabling efficient compression of music at bitrates around 128 kbit/s for stereo and transforming digital music distribution. Martin Vetterli and Jelena Kovačević's 1995 book Wavelets and Subband Coding formalized the deep connections between sub-band methods and wavelet theory, influencing subsequent designs by emphasizing multiresolution analysis for signal representation.

Later developments extended sub-band coding's efficiency into the late 1990s and beyond, with Advanced Audio Coding (AAC) standardized in 1997 as a perceptual coder using MDCT-based sub-bands for multichannel audio at lower bitrates than MP3. High-Efficiency AAC (HE-AAC), introduced in 2003, further improved low-bitrate performance through spectral band replication, giving good perceived quality at around 48 kbit/s for stereo. In imaging, JPEG 2000 (2000) adopted wavelet sub-band decomposition for scalable compression, supporting lossless and lossy modes and outperforming DCT-based JPEG in visual quality at low bit rates.

Signal Decomposition and Analysis

Filter Banks

In sub-band coding, the analysis filter bank serves as the core mechanism for decomposing the input signal into multiple sub-bands, enabling efficient representation and subsequent processing. It consists of a parallel array of bandpass filters H_k(z), indexed by k = 0 to M-1, where M denotes the number of sub-bands, each designed to isolate a specific portion of the signal's spectrum. Following filtering, each sub-band signal is downsampled by the factor M, which reduces the sampling rate and data volume while preserving the essential spectral content, thereby achieving critical sampling in maximally decimated filter banks. The output of the k-th downsampler, representing the decimated sub-band signal, is expressed as y_k(m) = \sum_{n} h_k(mM - n) x(n), where h_k(n) is the impulse response of the k-th analysis filter and x(n) is the input signal; this convolution-decimation operation computes only the filter outputs that survive downsampling.

A foundational type of filter bank is the two-channel quadrature mirror filter (QMF) bank, which performs critically sampled decomposition by splitting the signal into low- and high-frequency sub-bands using a low-pass prototype filter and its mirror image, with aliasing cancellation properties inherent to the QMF structure. For applications requiring more sub-bands, polyphase implementations improve efficiency by restructuring the prototype filter into polyphase components followed by a computationally lightweight modulation stage, reducing the overall operation count in multirate systems.

Design principles for these filter banks emphasize maximizing frequency selectivity to sharply delineate sub-bands, thereby minimizing energy leakage, while simultaneously suppressing the aliasing introduced by downsampling through appropriate stopband attenuation and transition-band control. Finite impulse response (FIR) filters are often preferred for their linear-phase characteristics and stability, though infinite impulse response (IIR) filters can offer sharper responses at lower computational cost in some configurations.

From a computational perspective, multirate processing in filter banks eliminates redundancy by combining decimation with filtering, achieving up to an M-fold reduction in processing load compared to non-multirate alternatives. Additionally, intentional overlap in the responses of adjacent filters ensures smooth transitions and avoids gaps at sub-band boundaries, supporting seamless signal reconstruction.
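As a minimal sketch of the convolution-decimation step, the following Python evaluates y_k(m) = \sum_n h_k(mM - n) x(n) directly, computing only the samples that survive downsampling. The two-tap Haar pair used as the filters is an assumption chosen for brevity; practical coders use much longer filters.

```python
import math

def analyze(x, h, M):
    """Filter x with h and keep every M-th output: y(m) = sum_n h(mM - n) x(n)."""
    full_len = len(x) + len(h) - 1          # length of the full convolution
    n_out = (full_len + M - 1) // M         # number of samples kept
    y = []
    for m in range(n_out):
        acc = 0.0
        for n in range(len(x)):
            k = m * M - n
            if 0 <= k < len(h):
                acc += h[k] * x[n]
        y.append(acc)
    return y

s = 1 / math.sqrt(2)
h0 = [s, s]      # Haar low-pass (assumed example filter)
h1 = [s, -s]     # Haar high-pass (assumed example filter)

x = [1.0] * 6    # constant input: all energy at zero frequency
low = analyze(x, h0, 2)
high = analyze(x, h1, 2)
```

For the constant input, the interior samples of `high` are zero and those of `low` equal the square root of two, so the signal energy lands entirely in the low band apart from edge effects.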

Sub-band Division Process

The sub-band division process in sub-band coding constitutes the analysis stage, where the input signal is decomposed into frequency-specific sub-bands to enable efficient processing and compression. This begins with filtering the input signal using an analysis filter bank, which applies bandpass filters to isolate distinct frequency components, such as the lowpass and highpass filters denoted H_0(z) and H_1(z) in the z-domain. Each sub-band signal is then downsampled by an integer factor, typically 2 or N for an N-channel bank, to reduce the sampling rate and data volume while preserving essential information; this stretches the spectrum of each sub-band to fill the new, lower Nyquist range. Optionally, further transformations like the discrete cosine transform (DCT) may be applied within individual sub-bands to achieve additional decorrelation and energy compaction, particularly in image or audio applications.

Frequency partitioning during this process can be uniform, dividing the signal spectrum into sub-bands of equal bandwidth, or non-uniform, such as octave-like bands that approximate the critical bands of hearing (for example, the Bark scale) to better match human psychoacoustics. Uniform partitioning is straightforward for fixed-rate systems, while non-uniform partitioning adapts to the varying energy distribution across frequencies, often using logarithmic spacing to give finer resolution at lower frequencies where perceptual sensitivity is higher.

To mitigate the aliasing introduced by downsampling, anti-aliasing filters (for the lowest band, a lowpass filter with a cutoff at \pi/N) are applied prior to decimation, so that overlap from adjacent bands is minimized or canceled through careful filter design. Quadrature mirror filters (QMF) are a common choice for this analysis stage because they provide aliasing cancellation in critically sampled systems. A representative example is a 32-sub-band audio filter bank for signals sampled at 48 kHz, where the 0 to 24 kHz range is partitioned into uniform bands of 750 Hz each to facilitate perceptual coding.

The signal flow can be implemented via parallel filter banks, where all sub-bands are processed simultaneously, or tree-structured banks for multi-resolution analysis, involving iterative decomposition (e.g., applying lowpass/highpass pairs successively to create octave bands). In the parallel configuration, the input feeds into multiple analysis filters followed by downsamplers; tree structures, by contrast, cascade filters hierarchically, downsampling at each level to build a pyramid of resolutions suitable for progressive coding.
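The two partitioning schemes can be compared by listing their nominal band edges. The sketch below assumes ideal (brick-wall) band boundaries, and the helper names `uniform_bands` and `octave_bands` are hypothetical. With 48 kHz sampling, the Nyquist range of 24 kHz split into 32 uniform bands gives 750 Hz per band.

```python
def uniform_bands(sample_rate_hz, num_bands):
    """Equal-width bands covering 0 .. sample_rate/2."""
    width = (sample_rate_hz / 2) / num_bands
    return [(k * width, (k + 1) * width) for k in range(num_bands)]

def octave_bands(sample_rate_hz, levels):
    """Band edges of a tree that re-splits the low band at every level."""
    bands = []
    hi = sample_rate_hz / 2
    for _ in range(levels):
        lo = hi / 2
        bands.append((lo, hi))   # high-pass branch produced at this level
        hi = lo
    bands.append((0.0, hi))      # remaining low-pass band
    return bands[::-1]           # lowest band first

print(uniform_bands(48_000, 32)[0])   # (0.0, 750.0)
print(octave_bands(48_000, 3))
# [(0.0, 3000.0), (3000.0, 6000.0), (6000.0, 12000.0), (12000.0, 24000.0)]
```

The tree-structured split yields four bands from three levels, each an octave wide except the lowest, which is the pattern used in dyadic wavelet decompositions.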

Coding and Quantization

Quantization Techniques

Quantization in sub-band coding maps the continuous amplitude values of the sub-band coefficients, obtained after signal decomposition, to a finite set of levels in order to reduce the data rate while minimizing perceptual distortion. This introduces quantization noise, which must be controlled so that it remains inaudible or imperceptible. The two primary approaches are scalar quantization, applied independently to each coefficient, and vector quantization, which processes groups of coefficients jointly to exploit statistical dependencies.

Scalar quantization is widely used because of its low computational complexity and is typically either uniform, with equal step sizes across the range, or non-uniform, with varying step sizes to better match signal distributions or perceptual characteristics. In uniform scalar quantization, the coefficient y is quantized as q = \operatorname{round}(y / \Delta) \cdot \Delta, where \Delta is the fixed quantization step size and \operatorname{round} denotes rounding to the nearest integer. Non-uniform variants, such as companded quantization, apply a nonlinear mapping before uniform quantization to allocate finer resolution to smaller amplitudes, improving the signal-to-noise ratio for low-level signals.

Vector quantization offers higher efficiency by treating multiple sub-band coefficients as a vector and mapping it to the nearest codeword in a predefined codebook, capturing inter-coefficient correlations for better rate-distortion performance at low bit rates. However, its higher complexity limits adoption in real-time sub-band coders, where scalar methods predominate.

Perceptual quantization tailors the process to human sensory models, keeping quantization noise below psychoacoustic masking thresholds to achieve transparency at reduced bit rates. In audio applications, bit allocation assigns fewer bits to sub-bands where the signal energy is below the absolute hearing threshold or masked by stronger components, concentrating resources on perceptually relevant regions. The step size \Delta is often set in proportion to the masking threshold, so that stronger masking allows coarser quantization without audible artifacts.

Adaptive quantization dynamically adjusts the step size based on local signal energy or perceptual criteria within each sub-band, scaling the quantizer range to track the signal level and thereby controlling the overall noise distribution. Noise shaping further refines this by redistributing quantization error spectrally or temporally, pushing it into frequency or time regions of high masking to improve perceived quality.

In audio sub-band coding, simultaneous masking (arising from frequency proximity) and temporal masking (due to onset and offset effects) are modeled to determine sub-band-specific step sizes \Delta_k for the k-th sub-band, keeping the noise below the combined masking threshold. This approach, rooted in psychoacoustic principles, enables high-fidelity compression as demonstrated in standards like MPEG Audio.
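The uniform quantizer rule q = round(y / \Delta) \cdot \Delta with a per-band step size \Delta_k can be sketched as follows. The coefficient values and step sizes are made-up illustrative numbers, and rounding is done with floor(v + 0.5) because Python's built-in `round` rounds halves to the nearest even integer.

```python
import math

def quantize(y, step):
    """Uniform scalar quantizer: returns (integer index, reconstructed value)."""
    index = math.floor(y / step + 0.5)   # round to nearest, halves rounded up
    return index, index * step

# Hypothetical sub-band coefficients and per-band step sizes (Delta_k).
# A larger step stands in for a band with stronger masking.
coeffs = [0.37, -0.90, 0.05, 1.20]
steps = [0.25, 0.25, 0.50, 1.00]

quantized = [quantize(y, d) for y, d in zip(coeffs, steps)]
errors = [abs(y - q) for y, (_, q) in zip(coeffs, quantized)]
```

Each reconstruction error is bounded by half the step size of its band, which is why a band with a higher masking threshold can tolerate a larger \Delta_k.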

Bit Allocation and Entropy Coding

In sub-band coding, bit allocation dynamically distributes a limited total bitrate across sub-bands to minimize reconstruction distortion while satisfying the rate constraint. This process optimizes rate-distortion performance by assigning more bits to sub-bands with higher signal variance or perceptual significance, often by minimizing the mean squared error (MSE) under perceptual weighting. Sub-bands that contribute most to overall quality are prioritized.

For signals modeled as parallel Gaussian sources, the reverse water-filling algorithm provides an optimal strategy: the distortion in every active sub-band is equalized at a common "water level", and sub-bands whose variance lies below this level receive zero bits. This method, adapted from information theory, minimizes the distortion for a given rate by favoring stronger sub-bands. The optimal bit assignment for each sub-band k is derived from rate-distortion theory as b_k = \max\left(0, \frac{1}{2} \log_2 \left( \frac{\sigma_k^2}{\lambda} \right)\right), where \sigma_k^2 denotes the variance of the sub-band signal and \lambda is the water level (Lagrange multiplier) adjusted to meet the total bitrate constraint; integer rounding and iterative refinement are applied in practice for feasibility.

In perceptual applications like audio coding, bit allocation incorporates psychoacoustic models to weight sub-bands according to human auditory masking thresholds, keeping distortion imperceptible. These models, as described in the ISO MPEG audio standards, compute signal-to-mask ratios to guide allocation, distinguishing tonal and noise-like components and de-emphasizing masked regions. For instance, in the MP3 audio standard, variable bit allocation is applied across 576 frequency-domain samples per granule, dynamically adjusting the bits per band to balance compression and perceptual fidelity.
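A minimal sketch of the variance-based allocation rule follows, with the result clamped at zero bits for weak bands as the water-filling description implies. The function name `allocate_bits`, the bisection search over \lambda, and the example variances are assumptions for illustration; real coders add perceptual weighting and integer rounding.

```python
import math

def allocate_bits(variances, total_bits, iters=60):
    """Choose lambda so that sum_k max(0, 0.5*log2(var_k/lambda)) meets total_bits."""
    def bits_for(lam):
        return [max(0.0, 0.5 * math.log2(v / lam)) for v in variances]

    # At lambda = max variance every band gets zero bits; a tiny lambda gives many.
    lo, hi = 1e-12, max(variances)
    for _ in range(iters):
        mid = math.sqrt(lo * hi)        # bisect on a logarithmic scale
        if sum(bits_for(mid)) > total_bits:
            lo = mid                    # too many bits: raise the water level
        else:
            hi = mid
    return bits_for(hi)

# Hypothetical sub-band variances and a budget of 4 bits per sample set.
bits = allocate_bits([16.0, 4.0, 1.0, 0.01], total_bits=4.0)
```

With these inputs the weakest band falls below the water level and receives nothing, while the remaining three share the budget in steps of one bit per factor of four in variance.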
After quantization, entropy coding compresses the resulting symbol stream by exploiting symbol probabilities, further reducing redundancy without loss. Huffman coding assigns variable-length prefix codes to quantized indices, with shorter codes for frequent values such as near-zero coefficients, achieving near-entropy rates for typical sub-band distributions. Arithmetic coding offers higher efficiency by encoding an entire sequence as a single fractional number, approaching the theoretical limit more closely than Huffman coding, especially when the ideal code lengths are not whole numbers of bits. In sparse high-frequency sub-bands, where many coefficients are zero, run-length coding complements these methods by representing runs of consecutive zeros as (value, length) pairs, minimizing the bits spent on insignificant details.
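The run-length representation and the entropy bound mentioned above can be illustrated with a short Python sketch. The sample sequence of quantized indices is invented for the example, and the empirical entropy is only a lower bound for a memoryless symbol-by-symbol code.

```python
import math
from collections import Counter

def rle_encode(values):
    """Collapse runs of equal values into (value, run_length) pairs."""
    pairs = []
    for v in values:
        if pairs and pairs[-1][0] == v:
            pairs[-1] = (v, pairs[-1][1] + 1)
        else:
            pairs.append((v, 1))
    return pairs

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for v, n in pairs:
        out.extend([v] * n)
    return out

def entropy_bits(values):
    """Empirical entropy in bits per symbol."""
    counts = Counter(values)
    total = len(values)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

indices = [3, 0, 0, 0, 0, -1, 0, 0]      # hypothetical quantized sub-band
pairs = rle_encode(indices)
print(pairs)   # [(3, 1), (0, 4), (-1, 1), (0, 2)]
```

Because three quarters of the symbols are zero, `entropy_bits(indices)` is only slightly above one bit per symbol, which is the kind of skew Huffman and arithmetic coders exploit.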

Reconstruction and Synthesis

Synthesis Filter Banks

In sub-band coding, the synthesis filter bank reconstructs the signal from the quantized and coded sub-band signals by first upsampling each sub-band component and then applying low-pass or band-pass filtering, followed by summation across all bands. Specifically, each quantized sub-band signal y_k(m), for k = 0, 1, \dots, M-1, is upsampled by the decimation factor M through zero-insertion, which raises the sampling rate and replicates the spectrum, after which it is filtered by the synthesis filter g_k(n) (or G_k(z) in the z-domain) to interpolate and suppress the imaging artifacts before the outputs are added to form the reconstructed signal \hat{x}(n). This inverts the analysis stage, where the input signal was decomposed into sub-bands via filtering and downsampling.

The synthesis filters are typically designed as time-reversed counterparts of the corresponding analysis filters, such that g_k(n) = h_k(-n) (delayed in practice so that the filters are causal), which facilitates aliasing management in finite impulse response (FIR) implementations. For computational efficiency, polyphase representations decompose the filters into parallel branches, avoiding redundant operations in the upsampling and filtering cascade and reducing the overall structure to a matrix operation in the polyphase domain. These designs are particularly effective in maximally decimated filter banks, where the number of channels equals the decimation factor M, so that the combined output runs at the original sampling rate.

Aliasing introduced by downsampling in the analysis stage is canceled in the synthesis bank through coordinated filter responses: the synthesis filters are chosen so that the shifted spectral components arising from aliasing in the individual sub-bands cancel when the bands are summed. This cancellation relies on the algebraic relationship between the analysis and synthesis filters rather than on frequency selectivity alone, so aliases from adjacent bands do not propagate into the reconstructed output.

The reconstructed signal can be expressed in the time domain as \hat{x}(n) = \sum_{k=0}^{M-1} \sum_{m=-\infty}^{\infty} g_k(n - mM) \, y_k(m), where g_k(n) denotes the impulse response of the k-th synthesis filter, and the inner sum accounts for the upsampled nature of y_k(m) by spacing the contributions every M samples. This convolution-based formulation highlights the role of the synthesis filters in recovering the full-bandwidth signal.

In practical implementations, the computational cost of the synthesis bank is reduced through polyphase structures, whose cost grows linearly with the filter length, and FFT-based methods for modulated or DFT filter banks reduce the modulation stage to O(M \log M) operations per block of M output samples, making them suitable for audio and image applications despite the multi-channel structure.
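The synthesis sum \hat{x}(n) = \sum_k \sum_m g_k(n - mM) y_k(m) can be checked numerically on a small case. The sketch below assumes the two-tap Haar pair as analysis filters and the time-reversed, one-sample-delayed versions as synthesis filters; without quantization the output should equal the input delayed by one sample.

```python
import math

def analyze(x, h, M=2):
    """y(m) = sum_n h(mM - n) x(n): filter, then keep every M-th sample."""
    n_out = (len(x) + len(h) - 1 + M - 1) // M
    return [sum(h[m * M - n] * x[n]
                for n in range(len(x)) if 0 <= m * M - n < len(h))
            for m in range(n_out)]

def synthesize(subbands, filters, M=2):
    """x_hat(n) = sum_k sum_m g_k(n - mM) y_k(m): upsample, filter, add."""
    length = M * len(subbands[0])
    out = [0.0] * length
    for y, g in zip(subbands, filters):
        for m, ym in enumerate(y):
            for i, gi in enumerate(g):      # i plays the role of n - mM
                if m * M + i < length:
                    out[m * M + i] += gi * ym
    return out

s = 1 / math.sqrt(2)
h0, h1 = [s, s], [s, -s]        # assumed Haar analysis pair
g0, g1 = [s, s], [-s, s]        # time-reversed analysis filters, delayed by one

x = [1.0, 2.0, 3.0, 4.0]
x_hat = synthesize([analyze(x, h0), analyze(x, h1)], [g0, g1])
```

Here `x_hat` comes out as the input preceded by one zero and followed by one zero, up to floating-point rounding, which is the delayed copy that perfect reconstruction predicts.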

Conditions for Perfect Reconstruction

Perfect reconstruction (PR) in sub-band coding refers to the ability of a filter bank to recover the original input signal X(z) exactly as a delayed version \hat{X}(z) = z^{-l} X(z), where l is an integer delay, without any distortion or aliasing artifacts. This property is essential for lossless decomposition and reconstruction, ensuring that the coding process does not introduce irreversible errors in the absence of quantization.

In a two-channel quadrature mirror filter (QMF) bank, PR is achieved when the analysis filters H_0(z) (low-pass) and H_1(z) (high-pass), along with the synthesis filters G_0(z) and G_1(z), satisfy two conditions. The distortion transfer function must satisfy H_0(z) G_0(z) + H_1(z) G_1(z) = 2z^{-l}, ensuring no amplitude or phase distortion beyond the delay, while the aliasing cancellation condition requires H_0(-z) G_0(z) + H_1(-z) G_1(z) = 0. These equations arise from the overall input-output relation of the critically sampled system (decimation factor of 2), where the aliasing terms from downsampling are eliminated by an appropriate choice of synthesis filters, such as G_0(z) = H_1(-z) and G_1(z) = -H_0(-z).

For advanced designs, paraunitary filter banks enable orthogonal PR, where the analysis and synthesis filters form an orthonormal basis, preserving energy and allowing efficient implementation via lattice structures. In wavelet filter banks, PR is extended to multi-resolution analysis, supporting hierarchical sub-band decomposition with compactly supported filters that maintain PR across scales. A key result states that PR is possible if the filters are biorthogonal, meaning the analysis and synthesis filter sets satisfy inner product conditions that ensure invertibility without aliasing.

In practice, quantization during coding introduces errors that prevent exact PR, leading to imperfect reconstruction in which the error signal e degrades the output. This degradation is quantified by the signal-to-noise ratio (SNR), defined as \text{SNR} = 10 \log_{10} (P_x / P_e), where P_x is the power of the original signal and P_e is the power of the quantization error; higher SNR values indicate better fidelity, with values above 30 dB typically targeted in audio applications.

Oversampled filter banks, where the decimation factor is less than the number of channels (introducing redundancy), relax the PR conditions by providing frame expansions that allow robust reconstruction even with imperfect filters, as the additional samples compensate for aliasing and distortion.
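The two-channel conditions can be verified by multiplying filter polynomials in z^{-1}. The sketch below assumes the Haar pair as the analysis filters and uses the synthesis choice G_0(z) = H_1(-z), G_1(z) = -H_0(-z); the helper names are hypothetical. It checks that H_0(z)G_0(z) + H_1(z)G_1(z) is a pure delay scaled by 2 and that H_0(-z)G_0(z) + H_1(-z)G_1(z) vanishes.

```python
import math

def poly_mul(a, b):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_add(a, b):
    """Add two coefficient lists, padding the shorter one with zeros."""
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0.0) + (b[i] if i < len(b) else 0.0)
            for i in range(n)]

def modulate(h):
    """Coefficients of H(-z): negate the odd powers of z^-1."""
    return [c if i % 2 == 0 else -c for i, c in enumerate(h)]

s = 1 / math.sqrt(2)
h0, h1 = [s, s], [s, -s]             # assumed Haar analysis filters
g0 = modulate(h1)                    # G0(z) = H1(-z)
g1 = [-c for c in modulate(h0)]      # G1(z) = -H0(-z)

distortion = poly_add(poly_mul(h0, g0), poly_mul(h1, g1))
alias = poly_add(poly_mul(modulate(h0), g0), poly_mul(modulate(h1), g1))
```

Up to rounding, `distortion` is [0, 2, 0], that is 2z^{-1} with l = 1, and `alias` is all zeros, so this pair meets both conditions.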

Applications

Audio Compression

Sub-band coding plays a pivotal role in audio compression by decomposing the signal into frequency sub-bands, allowing efficient encoding that exploits human auditory perception within the typical hearing range of 20 Hz to 20 kHz. This approach enables significant bitrate reduction while maintaining perceptual quality, as it discards inaudible components and allocates bits according to psychoacoustic models that account for masking effects.

In early standards like MPEG-1 Audio Layer I, a 32-band polyphase quadrature mirror filter (QMF) bank divides the input signal into equal-width bands, supporting sampling rates of 32, 44.1, and 48 kHz at bitrates from 32 to 448 kbit/s. Each band is quantized with its own scale factor and bit allocation, which reduces the uncompressed CD audio bitrate of 1411 kbit/s (16-bit, 44.1 kHz stereo) to levels like 192 kbit/s while preserving near-transparent quality for most listeners. A notable example is the MP3 format (MPEG-1 Audio Layer III), which uses a hybrid filter bank combining a polyphase QMF for the initial 32-band division with a modified discrete cosine transform (MDCT) that further refines the frequency resolution, typically operating at about 128 kbit/s for stereo audio; the resulting spectral values are quantized according to a psychoacoustic model that determines masking thresholds for bit allocation.

Other codecs illustrate sub-band coding's versatility in audio applications. The G.722 standard implements sub-band adaptive differential pulse-code modulation (SB-ADPCM), splitting the signal into two bands to cover a 7 kHz bandwidth at 64 kbit/s and providing speech quality superior to narrowband alternatives. Advanced Audio Coding (AAC), an evolution with roots in sub-band techniques, primarily uses an MDCT with 1024 spectral coefficients per long block but incorporates perceptual, band-wise processing and supports sampling rates up to 96 kHz across multiple channels.

In modern usage, the Opus codec, standardized in 2012, employs a hybrid approach integrating sub-band coding elements with linear prediction and the MDCT for low-latency VoIP, enabling bitrates from 6 to 510 kbit/s with robust performance in real-time communication.

Image and Video Coding

Sub-band coding has been adapted for image compression by extending one-dimensional filter banks to two-dimensional separable structures, where one-dimensional filters are applied separately along rows and columns to decompose images into frequency sub-bands. This approach enables multi-resolution analysis, capturing both low-frequency approximations and high-frequency details essential for visual fidelity.

In image compression, the JPEG 2000 standard uses the discrete wavelet transform (DWT) implemented via such 2D filter banks, achieving efficient coding through sub-band decomposition over multiple levels (the standard permits up to 32, although around five is typical). For lossless compression in JPEG 2000, the reversible 5/3 Le Gall filter is employed, with integer lifting steps that support exact reconstruction without information loss. In lossy mode, the irreversible 9/7 Cohen-Daubechies-Feauveau filter is used, offering better energy compaction at compression ratios such as 20:1 to 50:1 and proving particularly effective for complex textures. These decompositions facilitate progressive transmission, where low-frequency sub-bands are prioritized for quick previews. Bit allocation in these systems can incorporate visual masking models, akin to psychoacoustic principles in audio, to allocate fewer bits to perceptually less sensitive sub-bands.

Key processes in image sub-band coding include the Laplacian pyramid, which generates a series of difference images between Gaussian-smoothed versions at progressively reduced resolutions, forming bandpass sub-bands that help preserve edges. Lapped transforms, such as the modulated lapped transform (MLT), overlap adjacent blocks to reduce blocking artifacts, improving coding efficiency in non-stationary image regions. For encoding, embedded zerotree wavelet (EZW) coding exploits the statistical dependency of wavelet coefficients across sub-bands, treating insignificant coefficients as zerotrees rooted in parent-child relationships to produce scalable, embedded bitstreams suitable for progressive transmission.

In terms of performance, JPEG 2000 compresses better than DCT-based JPEG, especially at high ratios above 20:1, where wavelet sub-bands preserve more detail and avoid blocking artifacts, typically achieving 1-3 dB higher PSNR on natural images.

For video coding, sub-band techniques extend to three-dimensional decompositions, incorporating the temporal dimension alongside the spatial ones to handle motion efficiently. Motion-compensated sub-band coding was explored during the development of scalable extensions such as H.264/AVC's Scalable Video Coding (SVC), with enhancement layers formed from wavelet-like decompositions of residual frames. SVC provides layers for spatial, temporal, and quality scalability, allowing layered bitstreams that adapt to varying network conditions. A prominent example of the sub-band approach is the use of 3D wavelets with motion-compensated temporal filtering (MCTF), which applies temporal sub-band filtering across frames after motion compensation to decorrelate sequences, followed by spatial 2D wavelets on the resulting sub-bands for compact representation. This enables efficient handling of dynamic scenes, with temporal low-pass sub-bands capturing content that persists along motion trajectories and high-pass sub-bands isolating local changes. Video coding via these methods supports SNR scalability, where finer quantization of the sub-bands is carried in enhancement layers that can be truncated for graceful quality degradation without disrupting base-layer decoding.
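The exact invertibility of the reversible 5/3 filter mentioned in this section comes from its integer lifting steps. The following one-level, one-dimensional sketch uses the lifting form commonly given for that filter, with simplified symmetric boundary handling and even-length inputs assumed; it is an illustration rather than a conformant JPEG 2000 implementation.

```python
def fwd53(x):
    """One level of integer 5/3 lifting on an even-length list of ints."""
    n = len(x) // 2
    even, odd = x[0::2], x[1::2]
    # Predict: detail = odd sample minus the floored mean of its even neighbours.
    d = [odd[i] - ((even[i] + even[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    # Update: smooth = even sample plus a rounded quarter of adjacent details.
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(n)]
    return s, d

def inv53(s, d):
    """Undo the update step, then the predict step, and re-interleave."""
    n = len(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(n)]
    odd = [d[i] + ((even[i] + even[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    x = [0] * (2 * n)
    x[0::2], x[1::2] = even, odd
    return x

row = [10, 12, 14, 13, 9, 7, 200, 3]   # hypothetical pixel row
low, high = fwd53(row)
restored = inv53(low, high)
```

Because the inverse subtracts exactly the integer quantities the forward pass added, `restored` equals `row` bit for bit, and a constant row produces an all-zero detail band. A 2D transform would apply the same steps along rows and then columns.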

Other Applications

Sub-band coding is also applied in hyperspectral image analysis for efficient compression of multi-spectral data, using 3D wavelet decompositions to exploit spectral and spatial correlations while preserving the fine spectral details essential for material identification. Additionally, it supports error-resilient transmission in communication systems by enabling unequal error protection, where more bits or redundancy are allocated to the perceptually important low-frequency sub-bands, improving robustness against channel errors.

Advantages and Limitations

Key Benefits

Sub-band coding provides significant efficiency gains by enabling adaptive coding tailored to the characteristics of individual frequency bands, exploiting the non-stationary nature of signals to allocate bits more effectively than uniform methods like pulse-code modulation (PCM). This approach can achieve bitrate reductions of up to 50%, such as compressing speech from 128 kbit/s to 64 kbit/s while maintaining quality, owing to the coding gain derived from variance disparities across sub-bands.

A key perceptual advantage lies in confining quantization errors to specific sub-bands where they are masked by the human auditory or visual system, so that distortions remain imperceptible and transparency is preserved even at low bitrates. For instance, in audio coding, this psychoacoustic exploitation allows systems like MPEG-1 Audio Layer I, which is based on sub-band filtering, to deliver near-CD quality at 384 kbit/s, compared to the 1.411 Mbit/s required for uncompressed PCM; this is roughly a 4:1 reduction, with higher layers reaching about one tenth of the PCM bitrate.

The flexibility of sub-band coding supports scalable decoding from coarse to fine resolutions, facilitating progressive transmission, and it is compatible with multi-resolution analysis through tree-structured implementations that decompose signals hierarchically. Additionally, the independence of sub-bands enables parallel processing, allowing computation to be distributed across bands to reduce latency and computational load in real-time systems via pipelined or polyphase structures.

In image compression, sub-band methods often outperform block-based transforms like the DCT by providing better energy compaction and higher peak signal-to-noise ratio (PSNR), with fewer blocking artifacts at low rates, as demonstrated in comparative studies. These benefits appear across applications, from audio standards achieving perceptual transparency to image coders yielding better reconstruction metrics.
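The coding gain mentioned above is often estimated as the ratio of the arithmetic mean to the geometric mean of the sub-band variances. The sketch below uses that textbook approximation, which assumes equal-width bands from an orthonormal filter bank and fine (high-rate) quantization; the variances are invented for illustration.

```python
import math

def coding_gain_db(variances):
    """Arithmetic-to-geometric mean ratio of sub-band variances, in dB."""
    n = len(variances)
    arith = sum(variances) / n
    geo = math.exp(sum(math.log(v) for v in variances) / n)
    return 10 * math.log10(arith / geo)

flat = coding_gain_db([1.0, 1.0, 1.0, 1.0])       # no variance disparity
skewed = coding_gain_db([16.0, 4.0, 1.0, 0.25])   # energy concentrated low
```

A flat spectrum yields no gain, while the skewed example gives a little over 4 dB, which at roughly 6 dB per bit corresponds to about 0.7 bit per sample saved at equal distortion.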

Challenges and Drawbacks

Sub-band coding systems exhibit high computational complexity, primarily due to the implementation of the analysis and synthesis filter banks, which often require O(N log N) operations per frame in FFT-based designs for signal decomposition and reconstruction. This overhead can be mitigated through efficient algorithms such as the modified discrete cosine transform (MDCT), which reduces processing demands while maintaining critical sampling in audio applications.

Imperfections in reconstruction lead to various artifacts, including ringing effects around edges, exacerbated by long synthesis filters and quantization noise. In critically sampled filter banks, aliasing arises from subsampling and manifests as spectral overlap that degrades signal quality unless it is canceled by filters satisfying the perfect reconstruction conditions. These issues are particularly pronounced in image and video coding, where nonlinear phase responses introduce additional waveform distortions.

Bit allocation in sub-band coding requires the transmission of side information, such as scale factors and bit-allocation data, which can amount to as much as 14 kbit/s per channel and represent a notable portion of the total bitrate (approximately 5-10% in mid-range scenarios). This overhead makes the system sensitive to channel errors during transmission, as corruption of the side information can propagate distortions across sub-bands.

At very low bitrates, sub-band coding proves less effective without hybrid approaches, such as combining it with parametric or sinusoidal modeling, because too few bits remain for the high-frequency sub-bands, leading to increased distortion. Filter design involves inherent trade-offs between filter length and delay: longer filters improve frequency selectivity and reduce aliasing but introduce greater algorithmic delay (proportional to half the filter length in linear-phase FIR implementations).

In the 2020s, neural audio codecs have begun to outperform traditional sub-band coding at ultra-low bitrates, achieving better perceptual quality through end-to-end learning.
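The length-versus-delay trade-off noted above can be put in numbers. A linear-phase FIR filter of length L has a group delay of (L - 1)/2 samples, so the sketch below converts that to milliseconds; the filter lengths and the 48 kHz sampling rate are assumed example values, and a complete codec adds framing and look-ahead delay on top.

```python
def linear_phase_delay_ms(filter_length, sample_rate_hz):
    """Group delay of one linear-phase FIR filter, (L - 1) / 2 samples, in ms."""
    return 1000.0 * (filter_length - 1) / 2 / sample_rate_hz

def bank_delay_ms(filter_length, sample_rate_hz):
    """Analysis plus synthesis filter of equal length: L - 1 samples in total."""
    return 2 * linear_phase_delay_ms(filter_length, sample_rate_hz)

short = bank_delay_ms(33, 48_000)    # low selectivity, low delay
long_ = bank_delay_ms(513, 48_000)   # sharper bands, 16 times the delay
```

Moving from 33 to 513 taps raises the filter-bank delay from under a millisecond to more than ten, which matters for conversational applications.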
