
Audio codec

An audio codec, short for coder-decoder, is a device or software that encodes audio signals into a compressed digital format for efficient transmission or storage and decodes them back to reconstruct the signal for playback. These codecs are essential in telecommunications, broadcasting, and streaming media, enabling the reduction of data rates while aiming to preserve audio quality through techniques like perceptual coding, which exploits human auditory limitations to discard inaudible information. The development of audio codecs traces back to early digital audio efforts in the mid-20th century, with perceptual audio coding research gaining momentum around 1986 to achieve compression that maintains near-transparent quality at low bit rates. Milestone standards emerged in the 1990s, including the MPEG-1 Audio Layer III (MP3) codec, defined in 1991 and finalized in 1992, which revolutionized digital music distribution by compressing CD-quality audio to about 1/12th its original size without significant perceptual loss. Subsequent advancements, such as Advanced Audio Coding (AAC), developed in the mid-1990s and standardized in 1997, improved efficiency further, requiring roughly 70% of MP3's bit rate for equivalent quality and supporting multichannel audio. Audio codecs are broadly categorized into uncompressed, lossless, and lossy types: uncompressed formats like pulse-code modulation (PCM) retain all original data at full size; lossy variants like MP3 and AAC discard data deemed imperceptible to achieve higher compression ratios, often at bit rates from 64 to 320 kbps; and lossless codecs such as FLAC (Free Lossless Audio Codec) preserve all original data, resulting in files about half the size of uncompressed PCM with no quality degradation. Hybrid approaches, including scalable codecs like Opus (standardized in 2012 by the IETF), combine layers for adaptive quality based on network conditions, supporting bit rates from 6 to 510 kbps and applications from voice calls to high-fidelity streaming.
Widely used codecs also include older telephony standards like G.711 (pulse-code modulation at 64 kbps for basic voice) and G.722 (wideband speech at 48-64 kbps for improved clarity), which form the backbone of VoIP and broadcast systems. In modern contexts, codecs facilitate diverse applications: AAC powers platforms like Apple's iTunes and YouTube, Vorbis enables open-source formats like Ogg, and AMR (Adaptive Multi-Rate) supports mobile speech at variable bit rates from 4.75 to 12.2 kbps. Ongoing standardization by bodies like MPEG and the ITU continues to evolve codecs for emerging needs, such as immersive audio in virtual reality and low-latency streaming.

Fundamentals

Definition and Purpose

An audio codec, short for coder-decoder, is a device, software application, or algorithm that implements the encoding of analog or uncompressed digital signals into a compressed format and the subsequent decoding of that format back into a playable signal. This dual functionality enables the efficient handling of audio data across various applications, from consumer streaming to professional broadcasting. The primary purpose of an audio codec is to minimize the storage and transmission requirements of audio data while maintaining acceptable perceptual quality for human listeners. For instance, uncompressed CD-quality stereo audio, sampled at 44.1 kHz with 16-bit depth, requires a bitrate of approximately 1.411 Mbps, whereas a typical lossy codec can reduce this to under 128 kbps without significant audible degradation in many scenarios. This compression addresses fundamental challenges in digital media, such as limited bandwidth in early telecommunications networks and storage constraints in portable devices, allowing audio to be streamed or stored more economically. At a high level, an audio codec consists of an encoder and a decoder as its core components. The encoder processes the input audio through quantization, which maps continuous amplitude values to discrete levels to facilitate digital representation, followed by source coding techniques that exploit redundancies in the signal for further compression. The decoder performs the inverse operations: decoding to reconstruct the quantized coefficients and dequantization to approximate the original signal values. A generic codec can be visualized as follows:
Input Audio Signal
       |
       v
   [Encoder]
   - Quantization
   - Source Coding
       |
       v
Compressed Bitstream
       |
       v
   [Decoder]
   - Source Decoding
   - Dequantization
       |
       v
Reconstructed Audio Signal
This architecture originated in mid-20th-century telephony, where codecs were developed to compress voice signals for efficient transmission over limited-bandwidth lines, with early standards like G.711 emerging in the 1970s.
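The bitrates quoted above follow directly from multiplying the three PCM parameters; a minimal sketch (function name is illustrative):

```python
def pcm_bitrate(sample_rate_hz, bit_depth, channels):
    """Bitrate of uncompressed LPCM in bits per second."""
    return sample_rate_hz * bit_depth * channels

cd = pcm_bitrate(44_100, 16, 2)
print(cd)            # 1411200 bps, i.e. ~1.411 Mbps for CD audio
print(cd / 128_000)  # ~11x reduction at a typical 128 kbps lossy rate
```

The same arithmetic explains why a minute of CD audio occupies roughly 10 MB before compression.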

Encoding and Decoding Processes

The encoding process in an audio codec begins with converting analog audio signals into digital form, if the input is not already digital. This involves sampling the continuous analog waveform at regular intervals to produce discrete time-domain samples, followed by quantization, which maps these samples to a finite set of digital values using a fixed number of bits per sample, such as 16 bits for pulse-code modulation (PCM). Compression then occurs in two main aspects: removing redundancies by exploiting statistical correlations in the signal, often through predictive or transform-based techniques, and eliminating perceptual irrelevancies by discarding audio components below human hearing thresholds, guided by psychoacoustic principles. The resulting compressed data is finally packaged into a structured bitstream, which includes the encoded audio coefficients along with side information necessary for decoding, such as synchronization markers; for example, an uncompressed PCM input at a sampling rate like 44.1 kHz can be transformed into a lower-bitrate stream suitable for storage or transmission. The decoding process reverses these steps to reconstruct the audio signal. It starts with unpacking the bitstream to extract the compressed frequency-domain or time-domain coefficients and associated side information. Decompression follows, reinstating redundancies and perceptual details through inverse transformations, such as synthesis filterbanks, to approximate the original signal structure. Dequantization then restores the quantized values to a higher-precision representation, mitigating some of the precision loss from encoding. Finally, digital-to-analog conversion (DAC) interpolates the digital samples back into a continuous analog waveform for playback via speakers or headphones.
Most audio codecs exhibit asymmetry between encoding and decoding, with the encoding phase being computationally intensive due to the need for complex analysis, such as psychoacoustic modeling and bit allocation optimization, while decoding is designed to be lightweight and efficient to support real-time playback on resource-constrained devices like mobile phones or embedded systems. This design choice ensures low-latency reconstruction without excessive hardware demands on the consumer side. To maintain integrity during transmission or storage, audio codecs incorporate basic error handling mechanisms in the bitstream, such as cyclic redundancy check (CRC) codes for detecting bit errors or error-concealment techniques to enable recovery from transmission losses, thereby preventing audible artifacts from corrupted data.
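Error detection of the kind described is often a CRC computed over each frame's bytes; a minimal sketch with Python's zlib (the frame contents here are arbitrary):

```python
import zlib

# Hypothetical frame payload; a CRC is computed at the encoder and
# carried as side information so the decoder can detect corruption.
frame = bytes(range(32))
crc = zlib.crc32(frame)

# Simulate a single flipped bit during transmission.
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]

print(zlib.crc32(frame) == crc)      # intact frame passes the check
print(zlib.crc32(corrupted) == crc)  # corrupted frame fails it
```

On a CRC mismatch a decoder would typically mute or interpolate the affected frame rather than play the corrupted samples.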

Historical Development

Early Analog-to-Digital Transitions

In the pre-1970s era, analog audio technologies such as magnetic tape recording and vinyl phonographs suffered from inherent limitations that degraded signal quality over time and distance. Tape hiss, arising from the random thermal motion of magnetic particles on the recording medium, introduced a persistent high-frequency noise floor, typically limiting the signal-to-noise ratio (SNR) to around 60-72 dB for professional studio masters. Similarly, bandwidth constraints in analog broadcasting, such as FM radio's restriction to approximately 15 kHz for audio signals to fit within allocated spectrum, resulted in reduced fidelity and susceptibility to interference, making long-distance transmission and repeated playback increasingly problematic. The transition to digital audio began with the invention of pulse-code modulation (PCM) in 1937 by British engineer Alec H. Reeves while working at International Telephone and Telegraph (IT&T) in Paris, primarily to address noise accumulation in long-haul telephony lines by converting analog signals into discrete binary pulses. Although initially overlooked, PCM gained traction during World War II through developments at Bell Laboratories, where it was implemented in the SIGSALY system—a speech encryption terminal operational from 1943 that used a channel vocoder to analyze speech into 10 frequency bands, sampled at 50 Hz with 6-level quantization per band, for secure transatlantic communications, demonstrating early potential for digital transmission without cumulative noise. This marked an early practical shift from continuous analog waveforms to sampled digital representations, laying the groundwork for codec evolution by enabling error detection and regeneration without cumulative degradation. Claude Shannon's 1948 paper "A Mathematical Theory of Communication" provided the theoretical foundation for PCM quantization, quantifying the trade-offs between bit rate, sampling rate, and distortion through concepts like channel capacity and rate-distortion theory, which directly influenced optimal signal quantization for audio coding.
Building on this, the 1970s saw key advancements in speech codecs, including the standardization of μ-law companding in G.711 (1972), which compressed 14-bit linear PCM to 8 bits for North American telephone networks, improving bandwidth efficiency while maintaining toll-quality voice at 64 kb/s. Concurrently, adaptive differential PCM (ADPCM) emerged in 1973 from research by P. Cummiskey, N. S. Jayant, and J. L. Flanagan, which predicted signal differences to reduce bit rates to 32-40 kb/s for speech with minimal perceptual loss, driven by the need for economical transmission in telephone systems. These innovations accelerated the analog-to-digital shift, motivated by superior noise immunity and scalability for telecommunications and recording applications.
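The companding idea can be sketched with the continuous μ-law curve (the deployed G.711 codec uses a segmented 8-bit approximation of this curve rather than the direct formula below):

```python
import math

MU = 255.0  # μ-law parameter used in North America and Japan

def mulaw_compress(x):
    """Map a sample in [-1, 1] onto a logarithmic scale in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_expand(y):
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def encode(x, bits=8):
    """Quantize the companded value: fine steps for quiet sounds."""
    levels = 2 ** (bits - 1) - 1
    return round(mulaw_compress(x) * levels)

def decode(code, bits=8):
    levels = 2 ** (bits - 1) - 1
    return mulaw_expand(code / levels)

x = 0.01  # a quiet sample
print(abs(decode(encode(x)) - x))  # tiny error despite only 8 bits
```

Quantizing in the companded domain gives small amplitudes much finer effective step sizes than direct 8-bit linear PCM would, which is why 64 kb/s suffices for toll-quality voice.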

Digital Compression Milestones

The introduction of the compact disc (CD) in 1982 by Sony and Philips marked a pivotal benchmark in digital audio, utilizing uncompressed pulse-code modulation (PCM) at 44.1 kHz sampling and 16-bit depth, which delivered high-fidelity sound but generated large data volumes—approximately 10 MB per minute—prompting the need for efficient compression technologies to enable broader distribution and storage. In the mid-1980s, Dolby Laboratories advanced digital compression with AC-1, an adaptive delta modulation scheme initially developed for satellite broadcasting, serving as a precursor to the more sophisticated AC-3 (Dolby Digital) format and demonstrating early viability of perceptual coding for multichannel audio. The 1990s saw significant standardization efforts, beginning with the MPEG-1 Audio standard in 1991, which introduced layered perceptual coding techniques that facilitated the development of portable digital music players by reducing file sizes while maintaining near-CD quality. This culminated in the ISO/IEC 11172-3 specification for MP3 (MPEG-1 Audio Layer III) in 1992, pioneered by the Fraunhofer Society's research on psychoacoustic models that exploit human auditory masking to achieve compression ratios up to 12:1 without perceptible loss. The decade's innovations were amplified by the rise of internet audio, exemplified by RealNetworks' release of RealAudio in 1995, the first widely adopted streaming format that compressed speech and music for dial-up connections, accelerating online media adoption despite modest quality. However, MP3's commercial success was tempered by patent licensing disputes in the late 1990s, involving Fraunhofer and other rights holders, which established a royalty model but sparked legal challenges over intellectual property rights. Entering the 2000s, Advanced Audio Coding (AAC) emerged as a successor to MP3, standardized in MPEG-2 in 1997 but gaining widespread adoption through Apple's iTunes Music Store launch in 2003, where it became the default format for 70 million tracks sold by 2006, offering superior efficiency at bitrates around 128 kbps.
For lossless compression, the Free Lossless Audio Codec (FLAC) was released in 2000 by Josh Coalson and later maintained under the Xiph.Org Foundation, providing 50-70% size reduction over uncompressed PCM with perfect reconstruction, ideal for archival purposes and gaining traction in open-source ecosystems. The 2010s introduced Opus in 2012 via IETF RFC 6716, a versatile hybrid codec combining SILK for speech and CELT for music, optimized for low-latency applications like VoIP with delays under 30 ms and bitrates as low as 6 kbps, supporting real-time communication across bandwidth-constrained networks. In the 2020s, integration of advanced audio codecs with video standards like AV1 has enhanced streaming efficiency, with Opus frequently paired with AV1 in containers for platforms such as YouTube and Netflix, enabling 4K video delivery with high-quality audio at reduced bandwidth since widespread hardware support emerged around 2020. AI-assisted innovations have further pushed boundaries, as seen in Google's Lyra codec released in 2021, which leverages neural networks for ultra-low-bitrate speech compression at 3 kbps—about one-tenth of traditional codecs—while preserving intelligibility for voice calls over poor connections. In 2024, the FLAC format received formal standardization as RFC 9639 by the IETF. Additionally, the LC3 codec, part of the Bluetooth LE Audio standard finalized in 2020, saw broad device adoption by 2023-2025, enabling efficient low-latency wireless audio for hearing aids and TWS earbuds at bitrates from 160 to 345 kbps.

Technical Principles

Digital Audio Representation

Digital audio representation begins with pulse-code modulation (PCM), the foundational uncompressed format for converting analog audio signals into digital form. In PCM, the continuous-time analog waveform is sampled at regular intervals to capture its amplitude values, which are then quantized into discrete binary levels. Key parameters include the sampling rate, measured in hertz (Hz), which determines the representable frequency range; bit depth, indicating the number of bits per sample for amplitude precision; and the number of channels, such as mono (1) or stereo (2). For instance, the compact disc (CD) standard employs a sampling rate of 44.1 kHz, 16-bit depth, and two channels, enabling representation of frequencies up to 22.05 kHz with a dynamic range of approximately 96 dB. The Nyquist-Shannon sampling theorem underpins accurate digital representation by stipulating that the sampling rate f_s must be at least twice the highest frequency component f_{\max} in the signal to prevent aliasing, where higher frequencies masquerade as lower ones, distorting reconstruction. This requirement is expressed as: f_s \geq 2 f_{\max}. For human auditory perception, which extends to about 20 kHz, a minimum f_s of 40 kHz suffices, though the CD's 44.1 kHz provides margin against filter imperfections. Anti-aliasing filters are applied prior to sampling to band-limit the signal accordingly. Quantization in PCM approximates each sampled amplitude to the nearest level from a finite set, introducing quantization error that can manifest as noise or distortion. For an ideal uniform quantizer with n bits, the signal-to-quantization-noise ratio (SQNR) quantifies this error, derived from the ratio of signal power to the mean-square quantization noise assuming a full-scale sinusoidal input. The formula is: \text{SQNR} = 6.02n + 1.76 \, \text{dB}, where the 6.02n dB term arises from the 2^n quantization levels and the 1.76 dB from the sine wave's power relative to the uniform quantization noise. For 16-bit PCM, this yields about 98 dB SQNR, sufficient for high-fidelity audio.
Beyond fixed-point PCM, floating-point PCM representations are employed in professional recording and mixing workflows, using a mantissa-exponent format such as 32-bit IEEE 754 floating point to accommodate wider dynamic ranges without clipping. To mitigate quantization error's nonlinear effects, such as harmonic distortion in low-level signals, dithering introduces a small, uncorrelated noise signal before quantization, randomizing errors and preserving low-level detail across the dynamic range. Triangular probability density function (TPDF) dither is commonly used in audio because it renders the quantization error statistically independent of the signal.
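A quick numerical check of the SQNR formula, plus a TPDF dither variant (non-subtractive TPDF dither costs roughly 4.8 dB of SQNR in exchange for decorrelating the error from the signal); the 997 Hz test tone is an arbitrary but conventional choice:

```python
import math, random

random.seed(0)

def quantize(x, bits=16):
    """Round a sample in [-1, 1] to the nearest uniform level."""
    step = 2.0 / (2 ** bits)
    return round(x / step) * step

def quantize_dithered(x, bits=16):
    """Add +/-1 LSB triangular-pdf dither before rounding."""
    step = 2.0 / (2 ** bits)
    tpdf = (random.random() - random.random()) * step
    return round((x + tpdf) / step) * step

n, bits = 10_000, 16
sig = [math.sin(2 * math.pi * 997 * t / 44_100) for t in range(n)]

def sqnr_db(quantizer):
    err = [s - quantizer(s, bits) for s in sig]
    sig_pow = sum(s * s for s in sig) / n
    err_pow = sum(e * e for e in err) / n
    return 10 * math.log10(sig_pow / err_pow)

print(round(sqnr_db(quantize), 1))           # near 6.02*16 + 1.76 = 98.1 dB
print(round(sqnr_db(quantize_dithered), 1))  # roughly 4.8 dB lower
```

The measured value for the plain quantizer lands close to the theoretical 98.1 dB, confirming the 6.02n + 1.76 dB rule empirically.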

Compression Algorithms

Audio compression algorithms exploit redundancies and irrelevancies in digital audio signals to reduce data rates while preserving perceptual quality or exact reconstruction. Redundancy refers to statistical dependencies in the signal, such as repeated patterns or predictable samples, which can be eliminated through efficient encoding. Irrelevancy involves components inaudible to human hearing, guided by psychoacoustic models. These methods form the foundation for both lossless and lossy codecs, often combined in hybrid schemes to achieve high compression ratios.

Redundancy Reduction

Statistical coding techniques minimize the average code length by assigning shorter codes to more probable symbols, approaching the theoretical limit set by information entropy. The entropy H of a discrete source with symbols having probabilities p_i is given by H = -\sum p_i \log_2 p_i, representing the minimum average bits per symbol needed for lossless encoding. Huffman coding constructs optimal variable-length prefix codes via a binary tree, where leaf nodes correspond to symbols weighted by their probabilities; the code length for each symbol approximates -\log_2 p_i. Introduced in 1952, it achieves near-entropy efficiency for audio symbols like quantized coefficients but requires predefined probabilities. Arithmetic coding, an alternative, encodes entire sequences into a single fractional number within [0,1), dynamically updating interval subranges based on cumulative probabilities; this avoids codeword boundaries, yielding compression closer to the exact entropy, especially for sources with skewed distributions common in audio residuals. Developed from earlier ideas in 1963 and refined in practical implementations by 1987, it offers superior performance over Huffman coding for adaptive scenarios but incurs higher computational cost.
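The entropy bound and Huffman's construction can be sketched briefly (a min-heap over symbol weights; the symbol set and counts are invented for illustration). For the dyadic distribution below, the Huffman average length lands exactly on the entropy:

```python
import heapq, math

def entropy(probs):
    """Shannon entropy H = -sum p_i log2 p_i, in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def huffman_lengths(freqs):
    """Code length per symbol, from a Huffman tree built on a min-heap."""
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tick = len(heap)  # unique tiebreaker so dicts are never compared
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}  # one level deeper
        heapq.heappush(heap, (w1 + w2, tick, merged))
        tick += 1
    return heap[0][2]

# Skewed symbol counts, typical of quantized residuals
freqs = {"0": 8, "1": 4, "2": 2, "3": 2}
total = sum(freqs.values())
lengths = huffman_lengths(freqs)
avg = sum(freqs[s] * lengths[s] for s in freqs) / total
H = entropy([f / total for f in freqs.values()])
print(avg, round(H, 3))  # average code length meets the 1.75-bit entropy
```

Because the probabilities here are exact powers of two, Huffman coding is exactly optimal; for other distributions it can exceed the entropy by up to one bit per symbol, which is where arithmetic coding gains its edge.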

Irrelevancy Removal

Psychoacoustic principles identify signal components that contribute minimally to perceived quality, enabling selective discard in lossy coding. Masking effects, where a stronger sound obscures a weaker one, are central: simultaneous masking occurs when tones near a masker's frequency raise detection thresholds, while temporal masking affects sounds preceding or following the masker by up to 200 ms. These phenomena, quantified through critical bands—frequency ranges of about 100-400 Hz width where masking is strongest—allow codecs to allocate fewer bits to masked regions. Seminal experiments in the 1930s and 1940s established that masking thresholds vary with frequency and level, forming the basis for perceptual models. Filter banks decompose the audio into subbands for targeted analysis and quantization, mimicking the auditory system's frequency selectivity. A subband coder applies bandpass filters followed by downsampling to isolate critical bands, reducing data in less perceptually sensitive areas; perfect-reconstruction filter banks ensure lossless inversion if no quantization occurs. Early designs in the 1970s-1980s used quadrature mirror filters for aliasing cancellation, enabling efficient decomposition with minimal distortion.

Differential Coding

Differential coding exploits temporal correlations by encoding differences between samples rather than absolute values, assuming signal predictability from prior samples. Differential pulse-code modulation (DPCM) quantizes the prediction error e(n) = x(n) - \hat{x}(n), where \hat{x}(n) is the predictor output; this reduces variance and thus the quantization bits needed compared to direct PCM. Proposed in 1952 for signals like speech and video, DPCM achieves 2-4 dB SNR gains for speech and audio at similar rates. Linear prediction models the signal autoregressively, estimating the current sample as a weighted sum of past samples: \hat{x}(n) = \sum_{k=1}^p a_k x(n-k), with coefficients a_k optimized to minimize the mean-squared error (e.g., via the Levinson-Durbin algorithm). For audio, orders p = 8-12 capture formant structures; applied in 1967 for speech coding, it reduces bit rates by 50-70% over PCM while maintaining intelligibility.
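The variance reduction that motivates DPCM shows up even with a first-order predictor (the previous sample as the prediction); the signal below is a synthetic low-frequency tone, so the figure is illustrative only:

```python
import math

# First-order DPCM: predict each sample as the previous one and keep
# only the residual, which has far less variance for correlated audio.
n = 1000
sig = [math.sin(2 * math.pi * 200 * t / 44_100) for t in range(n)]

residuals = [sig[t] - sig[t - 1] for t in range(1, n)]

def var(xs):
    return sum(x * x for x in xs) / len(xs)

gain_db = 10 * math.log10(var(sig) / var(residuals))
print(round(gain_db, 1))  # prediction gain in dB for this tonal signal
```

Every 6 dB of prediction gain saves roughly one bit per sample at the same reconstruction error, which is how DPCM and linear prediction translate correlation into bitrate savings.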

Hybrid Approaches

Hybrid methods integrate transforms for frequency decorrelation with quantization and statistical coding, balancing energy compaction and redundancy removal. The discrete cosine transform (DCT) projects the signal onto cosine basis functions, concentrating energy in low frequencies for efficient quantization; fast algorithms compute it in O(N log N) operations via butterfly structures, substantially reducing the number of multiplications for the short blocks common in audio. The modified DCT (MDCT) extends this for critically sampled, overlap-add processing, transforming 2N real samples into N coefficients with time-domain aliasing cancellation via symmetric windowing. Its equation is X_k = \sum_{n=0}^{2N-1} x(n) \cos\left[\frac{\pi(k+0.5)(2n+1+N)}{2N}\right] for k = 0 to N-1, enabling seamless block transitions and better pre-echo control; introduced in 1987, it underpins modern codecs by combining transform efficiency with filter-bank-like subband resolution, achieving compression ratios up to 12:1 at transparent quality. Quantization follows, scaling coefficients inversely to perceptual importance before entropy coding.
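A minimal sketch of an MDCT analysis/synthesis pair with the sine window applied on both sides; the 2/N inverse scaling is one common normalization (conventions vary between texts), and overlap-adding two adjacent 50%-overlapping blocks recovers the shared N samples exactly:

```python
import math

def mdct(x, N):
    """Forward MDCT: 2N windowed samples -> N coefficients."""
    w = [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]
    return [sum(w[n] * x[n] *
                math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(X, N):
    """Inverse MDCT with synthesis window: N coefficients -> 2N samples."""
    w = [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]
    return [w[n] * (2 / N) * sum(
                X[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for k in range(N))
            for n in range(2 * N)]

# TDAC demo: each block alone is aliased, but adding the second half of
# one block to the first half of the next cancels the aliasing exactly.
N = 8
sig = [math.sin(0.3 * t) + 0.2 * math.cos(1.1 * t) for t in range(3 * N)]
a = imdct(mdct(sig[0:2 * N], N), N)
b = imdct(mdct(sig[N:3 * N], N), N)
mid = [a[N + j] + b[j] for j in range(N)]
err = max(abs(mid[j] - sig[N + j]) for j in range(N))
print(err < 1e-9)  # middle N samples reconstructed exactly
```

Note that each individual block's output is not the input, only the overlap-added sum is; this is the "time-domain aliasing cancellation" that lets the MDCT be critically sampled despite the 50% overlap.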

Codec Categories

Uncompressed Codecs

Uncompressed audio codecs store and transmit signals without applying any data reduction techniques, preserving the original sampled waveform in its entirety. The foundational encoding method for these codecs is Linear Pulse Code Modulation (LPCM), which represents audio as a sequence of quantized amplitude samples taken at regular intervals, without logarithmic or other nonlinear adjustments. LPCM ensures exact replication of the source material, making it the standard for applications requiring unaltered fidelity. Key container formats for LPCM include the Waveform Audio File Format (WAV), developed by Microsoft and IBM in 1991 as a subset of the Resource Interchange File Format (RIFF) specifically for uncompressed multimedia storage. WAV files typically encapsulate LPCM data, supporting various sample rates and bit depths while maintaining a simple header structure for easy access and compatibility across Windows systems. Another prominent format is Apple's Audio Interchange File Format (AIFF), introduced in 1988 for professional audio interchange on Macintosh platforms, which stores uncompressed LPCM samples in a chunk-based layout similar to RIFF but optimized for big-endian byte order. AIFF supports metadata like loop points and instrument parameters, facilitating its use in music production software. These codecs exhibit no compression artifacts, delivering full audio fidelity from the original recording, with decoding that involves straightforward sample reconstruction without complex algorithms. For instance, standard compact disc digital audio (CD-DA) employs 16-bit LPCM at a 44.1 kHz sampling rate for two channels, resulting in a bitrate of 1,411 kbps that captures the full dynamic range and frequency response of the medium. Advantages include seamless editing in digital environments and immunity to generational loss during repeated processing, though the primary drawback is substantially larger file sizes compared to compressed alternatives—often several megabytes per minute of audio.
In professional recording studios, uncompressed LPCM at higher resolutions such as 24-bit depth and 96 kHz sampling rate is standard, providing extended dynamic range (up to 144 dB) and broader frequency capture (up to 48 kHz) for mastering and editing workflows. Hardware implementations, like CD players, directly decode CD-DA's LPCM streams via dedicated digital-to-analog converters to reproduce the original signal without intermediary decompression.
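Writing LPCM into a WAV container is straightforward with Python's standard-library wave module; this sketch stores one second of a 440 Hz tone at CD-style parameters (mono rather than stereo, for brevity):

```python
import math, struct, wave

rate, bits, seconds = 44_100, 16, 1
amp = 2 ** (bits - 1) - 1  # full scale for signed 16-bit samples

# Little-endian signed 16-bit samples, as WAV's LPCM chunk expects.
frames = b"".join(
    struct.pack("<h", int(amp * 0.5 * math.sin(2 * math.pi * 440 * t / rate)))
    for t in range(rate * seconds)
)

with wave.open("tone.wav", "wb") as f:
    f.setnchannels(1)          # mono
    f.setsampwidth(bits // 8)  # 2 bytes per sample
    f.setframerate(rate)
    f.writeframes(frames)      # module fills in the RIFF header

with wave.open("tone.wav", "rb") as f:
    print(f.getnframes(), f.getframerate())  # 44100 44100
```

The resulting file is playable by any WAV-capable player; the module writes the RIFF/fmt chunks automatically, illustrating how thin the container layer is over raw LPCM.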

Lossless Compression Codecs

Lossless compression codecs reduce the size of audio files by exploiting statistical redundancies in the signal, such as correlations between adjacent samples, without discarding any data, ensuring that decoding reconstructs the original waveform bit-for-bit. These codecs typically achieve compressed sizes of 40-60% of the original for common audio material like CD-quality recordings, depending on the signal's complexity and predictability. The core approach involves predictive modeling to estimate samples based on past ones, followed by efficient encoding of the prediction errors, or residuals, which typically follow a Laplacian distribution. This reversible process preserves all information, making it ideal for applications requiring archival fidelity, such as high-definition audio collections where exact reproduction is paramount. Key algorithms in lossless audio compression center on linear prediction combined with entropy coding. Linear prediction uses adaptive filters to forecast sample values: short-term prediction (STP) models local correlations over a few preceding samples (orders 1-4), while long-term prediction (LTP) captures periodicities across larger windows, such as in tonal music. The residuals are then compressed using entropy coders like Rice coding, which employs variable-length prefix codes parameterized by a Rice parameter to match the distribution of errors, offering fast encoding and decoding with minimal overhead. These techniques, often applied in fixed or adaptive blocks of 4,000-8,000 samples, include inter-channel decorrelation for stereo or multichannel audio to further reduce redundancy. Prominent formats include the Free Lossless Audio Codec (FLAC), developed by Josh Coalson in 2000 and standardized as RFC 9639, which supports sample depths from 4 to 32 bits and sample rates from 1 Hz to 655,350 Hz, using fixed and linear predictive filters with Rice-coded residuals for broad compatibility in open-source ecosystems.
Apple Lossless Audio Codec (ALAC), introduced in 2004 with iTunes 4.5, employs similar linear prediction methods within an MP4 container, targeting seamless integration in Apple devices while maintaining bit-identical decoding. Monkey's Audio (APE), originating from Matthew T. Ashland's work around 1999 and now open-source, enhances prediction with neural network-inspired filters and convolutional predictors, achieving competitive compression through adaptive entropy coding. Verification of lossless integrity relies on embedded checksums, such as 128-bit MD5 hashes computed over the uncompressed PCM data, allowing decoders to confirm bit-perfect reconstruction against the original. For instance, FLAC's STREAMINFO metadata block includes an MD5 signature that players can validate post-decoding, ensuring no errors during storage or transmission in archival scenarios like professional mastering or hi-res libraries. This mechanism underpins the reliability of these codecs for long-term preservation, where even minor alterations could compromise audio quality.
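Rice coding of residuals, as used in FLAC-style codecs, can be sketched as follows (the helper names and the zigzag mapping for signed values are illustrative conventions; real codecs pack bits rather than build strings):

```python
def zigzag(v):
    """Map ..., -2, -1, 0, 1, 2, ... to non-negative 3, 1, 0, 2, 4, ..."""
    return 2 * v if v >= 0 else -2 * v - 1

def rice_encode(values, k):
    """Each value: unary quotient, a '0' terminator, then k remainder bits."""
    bits = []
    for v in values:
        u = zigzag(v)
        q, r = u >> k, u & ((1 << k) - 1)
        bits.append("1" * q + "0" + format(r, f"0{k}b"))
    return "".join(bits)

def rice_decode(bitstring, count, k):
    out, i = [], 0
    for _ in range(count):
        q = 0
        while bitstring[i] == "1":
            q, i = q + 1, i + 1
        i += 1  # skip the terminating '0'
        r = int(bitstring[i:i + k], 2) if k else 0
        i += k
        u = (q << k) | r
        out.append(u // 2 if u % 2 == 0 else -(u + 1) // 2)  # undo zigzag
    return out

residuals = [0, -1, 2, 3, -4, 0, 1]
coded = rice_encode(residuals, k=2)
print(rice_decode(coded, len(residuals), 2) == residuals)  # True
```

Choosing the Rice parameter k near log2 of the mean residual magnitude keeps the unary quotients short, which is why encoders estimate k per block from the residual statistics.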

Lossy Compression Codecs

Lossy compression codecs achieve higher data reduction than lossless methods by discarding audio data that is perceptually irrelevant to human hearing, based on psychoacoustic models. This allows for significantly smaller file sizes at the cost of some fidelity, making them suitable for storage and transmission where bandwidth is limited. Common bitrates range from 64 kbps for voice to 320 kbps for music, with quality varying by codec and content. These codecs often employ perceptual coding, which analyzes the signal to identify and remove components masked by louder sounds or outside the audible frequency range (typically 20 Hz to 20 kHz). Transform-based methods, such as the modified discrete cosine transform (MDCT) used in MP3 and AAC, convert the time-domain signal to the frequency domain for efficient quantization and encoding of spectral coefficients. Examples include MPEG-1 Audio Layer III (MP3) and Advanced Audio Coding (AAC), which balance compression efficiency and perceived quality for consumer applications. Further details on specific techniques are covered in subsequent sections.

Lossy Compression Codecs

Perceptual Coding Techniques

Perceptual coding techniques in audio compression leverage models of human auditory perception to discard signal components that are inaudible or imperceptible, thereby achieving high compression ratios without significant quality degradation. These methods rely on psychoacoustic principles to identify redundancies based on how the ear and brain process sound, focusing on phenomena such as masking and loudness perception. Central to this approach is the psychoacoustic model, which analyzes the audio signal to compute masking thresholds that determine the just-noticeable levels of quantization noise. Seminal work by Johnston introduced the concept of perceptual entropy as a measure of the information content audible to the human ear, guiding the efficient allocation of bits in lossy codecs. The psychoacoustic model incorporates simultaneous masking, where a louder sound raises the detection threshold for nearby frequencies, and temporal masking, where a sound influences audibility before or after its occurrence. In simultaneous masking, a masker significantly elevates the detection threshold for signals within its critical band, with the amount depending on the masker's intensity and spectral proximity; the effect spreads asymmetrically, with the threshold falling off steeply toward lower frequencies (roughly 25-30 dB per Bark) and more gradually toward higher ones (about 10-15 dB per Bark), so masking extends further above the masker's frequency. Temporal masking includes post-masking lasting 100-200 ms after the masker and pre-masking up to 20 ms before it, allowing subsequent quantization noise to be hidden in these temporal windows. Equal-loudness contours, originally mapped by Fletcher and Munson, account for the ear's varying sensitivity across frequencies, with lower sensitivity at the bass and treble extremes; for instance, at 60 phons, sensitivity peaks around 3-4 kHz but drops by 10-20 dB at 100 Hz and 10 kHz. These contours are integrated into the model via scales like the Bark or ERB scale, which approximate critical bands for perceptual grouping.
Bit allocation dynamically assigns quantization precision based on computed masking thresholds, prioritizing audible regions while minimizing bits in masked areas. Frequencies are grouped into scalefactor bands—typically 20-30 bands mimicking critical bandwidths—to enable efficient allocation, where each band's masking threshold informs the allowable quantization noise. Noise shaping further refines this by redistributing quantization noise, pushing it into bands where it falls below the masking threshold T(f) = T_q(f) + \Delta M(f), with T_q(f) as the absolute threshold in quiet and \Delta M(f) the masking offset from signal components. This ensures perceptual transparency at low bitrates, as noise becomes inaudible within masked regions. Advancements in perceptual coding have led to hybrid psychoacoustic models that incorporate binaural hearing effects, enhancing efficiency for spatial audio. Binaural unmasking, via the binaural masking level difference (BMLD), can lower thresholds by up to 15 dB for signals with interaural differences, allowing better exploitation of redundancies in modern codecs. These models combine monaural masking with binaural cues, improving bitrate savings while preserving spatial fidelity.
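A toy illustration of threshold-driven bit allocation (all band levels and thresholds below are invented for the example, and the roughly 6 dB-per-bit noise reduction is the idealized uniform-quantizer model): each extra bit goes to the band whose quantization noise most exceeds its masking threshold.

```python
# Hypothetical per-band signal levels and masking thresholds, in dB SPL.
signal_db = [70, 62, 55, 48, 40]
threshold_db = [45, 40, 38, 35, 33]
pool = 16                       # total bits available for this frame
bits = [0] * len(signal_db)

def noise_db(band):
    """Quantization noise level after bits[band] bits (~6.02 dB/bit)."""
    return signal_db[band] - 6.02 * bits[band]

for _ in range(pool):
    # Greedily fund the band with the worst noise-to-mask ratio.
    worst = max(range(len(bits)), key=lambda b: noise_db(b) - threshold_db[b])
    bits[worst] += 1

print(bits)  # louder bands with high thresholds end up with more bits
print(all(noise_db(b) <= threshold_db[b] + 6.02 for b in range(len(bits))))
```

Real encoders iterate a similar noise-to-mask loop jointly with the rate constraint from the entropy coder, but the greedy water-filling idea is the same.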

Transform-Based Methods

Transform-based methods in lossy audio codecs employ mathematical transformations to convert time-domain audio signals into the frequency domain, enabling more efficient representation and compression by concentrating signal energy into fewer coefficients. These techniques facilitate the identification and quantization of perceptually relevant components while discarding or coarsely representing less important ones. The modified discrete cosine transform (MDCT) is a prominent lapped transform used in such codecs, providing a critically sampled representation with perfect reconstruction capabilities through time-domain aliasing cancellation (TDAC). Introduced by Princen, Johnson, and Bradley, the MDCT processes overlapping blocks of audio samples, typically with 50% overlap between adjacent frames, to minimize artifacts like blocking at frame boundaries. The transform operates on an input block of 2N samples, producing N real-valued coefficients, which supports efficient encoding of the signal's spectral content. To mitigate spectral leakage and ensure smooth transitions during overlap-add reconstruction, window functions are applied to the input blocks before transformation. Common choices include the sine window, defined as w(n) = \sin\left[\frac{\pi (n + 0.5)}{N}\right] for n = 0 to N-1 (with N the window length), which satisfies the constant overlap-add (COLA) condition for perfect reconstruction, and the Kaiser-Bessel derived (KBD) window, an approximation of the prolate spheroidal window that optimizes energy concentration in the main lobe. These windows reduce inter-frame discontinuities, enhancing the codec's ability to handle transient signals without introducing audible distortions. Quadrature mirror filters (QMF) and filter banks enable subband decomposition in transform-based systems, dividing the audio spectrum into narrower frequency bands for targeted processing. Proposed by Esteban and Galand, QMFs consist of analysis filters that split the signal into low- and high-pass subbands, with synthesis filters reconstructing it while minimizing aliasing through mirror-image symmetry in their frequency responses.
For efficiency in multi-band implementations, critically sampled polyphase filter banks are employed, representing the filters as polyphase components downsampled by the number of bands, which reduces computational complexity without loss of information in the transform domain. In the encoder, the transform coefficients resulting from MDCT or QMF-based decompositions are quantized to reduce precision, exploiting the signal's energy distribution, and then entropy-coded using techniques like Huffman coding to further compress the data by assigning shorter codes to frequent values. At the decoder, the inverse MDCT synthesizes the time-domain signal via overlap-add of windowed inverse-transformed blocks. Consistent with a forward transform of 2N samples into N coefficients, the inverse MDCT reconstructs \hat{x}(n) from the coefficients X_k as: \hat{x}(n) = \frac{2}{N} \sum_{k=0}^{N-1} X_k \cos\left[\frac{\pi(k+0.5)(2n+1+N)}{2N}\right] for n = 0 to 2N-1, ensuring aliasing cancellation when the windowed outputs of adjacent frames are combined by overlap-add. As alternatives to MDCT and QMF, wavelet transforms have been explored in experimental audio codecs for superior time-frequency resolution, particularly in handling non-stationary signals like transients. Wavelet-based approaches decompose the signal into multi-resolution subbands using scalable bases, allowing adaptive bitrate allocation and better preservation of temporal details compared to fixed-block transforms.
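The Princen-Bradley property of the sine window can be checked numerically: with w(n) = sin[π(n+0.5)/N], the squared window plus its half-length shift sums to one, which is what makes the twice-windowed overlap-add unity-gain.

```python
import math

# Sine window of length N; verify w(n)^2 + w(n + N/2)^2 = 1 for all n
# in the first half, the condition for perfect overlap-add reconstruction.
N = 1024
w = [math.sin(math.pi * (n + 0.5) / N) for n in range(N)]
worst = max(abs(w[n] ** 2 + w[n + N // 2] ** 2 - 1.0) for n in range(N // 2))
print(worst < 1e-12)  # holds to floating-point precision
```

The identity follows from sin²θ + cos²θ = 1, since shifting the sine window by half its length turns it into a cosine; the KBD window satisfies the same condition by construction rather than by trigonometric identity.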

Major Audio Codec Standards

MPEG Family (MP3, AAC)

The MPEG family of audio codecs encompasses standards developed under the Moving Picture Experts Group (MPEG), with MP3 and AAC representing pivotal advancements in perceptual audio coding for digital media. MP3, formally known as MPEG-1/2 Audio Layer III, was standardized in 1993 as an extension of earlier MPEG audio layers, enabling efficient lossy compression of stereo audio signals. Its core architecture relies on a polyphase filter bank that divides the input signal into 32 equally spaced subbands, each approximately 689 Hz wide at a 44.1 kHz sampling rate, followed by a hybrid stage incorporating a modified discrete cosine transform (MDCT) to yield 576 frequency lines per granule for finer spectral resolution. Quantized spectral coefficients are then entropy-coded using Huffman coding, which employs variable-length codes selected from 32 tables based on signal statistics to minimize bitrate while preserving perceptual quality. Joint stereo techniques, including mid-side (M/S) stereo for low frequencies and intensity stereo for higher bands, further exploit inter-channel redundancies to enhance efficiency. MP3 supports bitrates ranging from 32 to 320 kbps, with constant bitrate (CBR) or variable bitrate (VBR) modes, making it suitable for a wide array of applications from voice to music. Licensing for MP3 implementation was managed by the Fraunhofer Institute for Integrated Circuits (Fraunhofer IIS), which held key patents and administered royalties until their expiration in 2017.

Building on MP3's foundation, Advanced Audio Coding (AAC) was introduced in 1997 as part of MPEG-2 and later refined in MPEG-4, offering improved compression through a more sophisticated perceptual model and filter bank design. Unlike MP3's hybrid approach, AAC employs a pure MDCT with up to 1024 spectral lines, providing higher resolution and better handling of transient signals via window switching between 2048- and 256-sample transform lengths. Temporal Noise Shaping (TNS) shapes quantization noise in the time domain to reduce pre-echo artifacts, particularly beneficial for percussive sounds and speech at low bitrates.
For enhanced efficiency at very low bitrates, Spectral Band Replication (SBR) reconstructs high-frequency content from a lower-bandwidth core signal, enabling profiles like High-Efficiency AAC (HE-AAC), which combines AAC-LC (Low Complexity) with SBR to maintain acceptable quality down to 24 kbps. These features allow AAC to support multichannel audio (up to 48 channels) and sampling rates up to 96 kHz across its various profiles.

MP3 gained widespread adoption following the release of the Diamond Rio PMP300 portable player in 1998, the first commercially successful device to store and play back MP3 files, holding up to 32 minutes of music at 128 kbps and catalyzing the portable digital music player market. AAC, in turn, became the preferred codec for modern wireless and streaming applications, serving as the default audio format for Bluetooth audio transmission on Apple devices and in many streaming implementations due to its balance of quality and low latency. It is also the recommended audio codec for YouTube uploads, with guidelines specifying AAC-LC at 128 kbps or higher for optimal playback.

Despite their successes, MP3 exhibits noticeable perceptual artifacts, such as pre-echo and quantization noise, at bitrates below 96 kbps, where spectral smearing and muffled high frequencies become audible, limiting its suitability for bandwidth-constrained scenarios. AAC addresses these shortcomings with approximately 30% greater efficiency, achieving comparable perceptual quality to MP3 at about 70% of the bitrate (AAC at 96 kbps rivals MP3 at 128 kbps for stereo audio) through advanced tools like TNS and scalable profiles.
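The mid-side joint stereo technique mentioned above can be illustrated in a few lines. This sketch uses the plain sum/difference form as an illustrative assumption, not MP3's exact normalization:

```python
import numpy as np

# Mid-side (M/S) joint stereo: code the sum and difference instead of L/R.
# For correlated channels the energy concentrates in the mid signal, so
# the side signal can be quantized with far fewer bits.
rng = np.random.default_rng(7)
left = rng.standard_normal(1024)
right = 0.9 * left + 0.1 * rng.standard_normal(1024)  # highly correlated stereo

mid = (left + right) / 2.0
side = (left - right) / 2.0

# Most of the energy ends up in the mid channel for correlated material.
print(np.sum(side**2) < 0.1 * np.sum(mid**2))  # True

# Decoding is exact: L = M + S, R = M - S.
print(np.allclose(mid + side, left), np.allclose(mid - side, right))  # True True
```

Intensity stereo goes further by discarding the side signal entirely above a cutoff and transmitting only per-band level differences, which is why it is restricted to higher bands where the ear is less sensitive to phase.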

Open Standards (Opus, Vorbis)

Open standards in audio codecs refer to royalty-free, open-source formats developed independently of proprietary or patented technologies, such as those from the MPEG consortium, allowing for broad, unrestricted adoption and community-driven improvements. These codecs prioritize versatility across applications like streaming, storage, and real-time communication, fostering innovation without licensing barriers.

Vorbis, released by the Xiph.Org Foundation in 2000, is a lossy perceptual audio codec designed as a free alternative to proprietary formats. It employs the Ogg container format and utilizes the modified discrete cosine transform (MDCT) combined with cascaded vector quantization for efficient compression, enabling variable-bitrate encoding that adapts to content complexity. Vorbis supports sampling rates from 8 kHz to 192 kHz and multichannel audio, making it suitable for high-quality music reproduction at bitrates around 128 kbps for stereo. Its open-source nature has led to widespread use in video games for in-game audio and in web applications, including HTML5 audio playback.

Opus, standardized by the Internet Engineering Task Force (IETF) in RFC 6716 in 2012, represents a hybrid open codec tailored for both speech and music, merging the SILK framework for linear prediction-based speech coding with the CELT transform coder using MDCT for general audio. This dual-mode design allows seamless switching between speech optimization and fullband music handling, with adaptive bitrates ranging from 6 kbps to 510 kbps and frame sizes as short as 2.5 ms, achieving algorithmic latency as low as 5 ms. As the default audio codec in WebRTC, Opus excels in interactive scenarios requiring low delay and high efficiency.

Key advantages of these open standards include the absence of licensing fees, enabling free implementation in software and hardware worldwide, unlike patented alternatives. Opus in particular outperforms older codecs such as MP3 at low bitrates for both speech and music transmission, providing superior perceptual quality in bandwidth-constrained environments such as mobile streaming.
Opus serves as a foundational successor to Vorbis within open ecosystems, offering comparable or better compression efficiency without patent encumbrances, which has sustained its role in community-driven media tools. Implementations of these codecs are deeply integrated into modern platforms: Chrome and Firefox provide native decoding support for Vorbis in Ogg containers, facilitating its use in web audio playback since the early 2010s. For real-time applications, Opus powers voice-over-IP in services like Discord and WhatsApp, leveraging the WebRTC framework to handle millions of concurrent users with minimal latency and resilience to packet loss.
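Frame-based codecs trade latency for efficiency: algorithmic delay is roughly the frame duration plus the encoder's lookahead. A small sketch using Opus-like figures (the lookahead values are illustrative assumptions, not normative):

```python
# Algorithmic delay of a frame-based codec: the encoder must buffer a
# full frame plus any lookahead before it can emit a packet.

def algorithmic_delay_ms(frame_ms: float, lookahead_ms: float) -> float:
    return frame_ms + lookahead_ms

# Typical VoIP configuration: 20 ms frames + 6.5 ms lookahead.
print(algorithmic_delay_ms(20.0, 6.5))  # 26.5
# Restricted low-delay mode: 2.5 ms frames + 2.5 ms lookahead.
print(algorithmic_delay_ms(2.5, 2.5))   # 5.0
```

This is why shrinking the frame size alone cannot reach arbitrarily low delay; the transform overlap and prediction lookahead set a floor.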

Applications and Implementations

Consumer Media and Streaming

In consumer media and streaming, audio codecs play a pivotal role in enabling efficient playback and distribution on everyday devices, balancing quality with storage and bandwidth constraints. Smartphones and portable media players commonly employ lossy codecs like AAC and MP3 to handle audio files, with Apple's iPhone supporting HE-AAC playback since iPhone OS 3.1 in 2009 for enhanced efficiency at lower bitrates. For wireless audio transmission, codecs such as SBC, the mandatory baseline for Bluetooth A2DP audio, and AAC are widely used in smartphones, with SBC providing basic compression up to 328 kbps and aptX offering improved quality and lower latency on compatible devices. These implementations ensure seamless integration in mobile ecosystems, where decoding efficiency directly influences device performance.

Streaming services leverage advanced codecs to deliver on-demand audio over variable networks, often employing adaptive bitrate streaming to adjust quality dynamically. Spotify streams premium content using lossy formats like Ogg Vorbis or AAC at up to 320 kbps, with lossless FLAC up to 24-bit/44.1 kHz available as of September 2025, and lower tiers at around 96 kbps, allowing real-time switching based on connection speed to minimize buffering. Apple Music primarily uses AAC at 256 kbps for its standard streaming, supplemented by ALAC for lossless options up to 24-bit/192 kHz, enabling adaptive adjustments that prioritize user experience across iOS devices. This approach supports high-volume distribution while optimizing for mobile data usage.

Audio files in consumer contexts are typically packaged in container formats that encapsulate codec data for compatibility and metadata handling. The MP4 container is standard for AAC audio, supporting features like chapters and artwork in files often saved as .m4a, making it ideal for iOS and cross-platform playback.
Similarly, the Ogg container is commonly used for Vorbis, providing an open-source alternative with efficient seeking and multi-stream support for web and desktop applications. During CD ripping, transcoding uncompressed PCM audio to MP3 introduces challenges like irreversible quality loss due to perceptual compression, potential artifacts from poor encoder settings, and the need for accurate track metadata to avoid playback issues.

From a user perspective, efficient codec decoding in portable devices contributes to significant battery life extensions by reducing computational demands on the processor. For instance, hardware implementations of codecs like AAC in mobile chipsets reduce audio playback power consumption compared to software decoding of uncompressed formats, allowing hours of additional listening time. Historically, 128 kbps has served as a widely accepted "good enough" quality threshold for MP3 in consumer scenarios, delivering acceptable fidelity for casual listening on devices with limited storage, though higher bitrates are now preferred for nuanced reproduction.
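The adaptive bitrate switching described above boils down to picking the highest tier that fits the measured bandwidth with some headroom. A toy sketch with an illustrative tier ladder loosely based on the bitrates mentioned in this section (the ladder and headroom factor are assumptions, not any service's actual values):

```python
# Illustrative audio tiers in kbps: low lossy, mid lossy, high lossy, lossless.
TIERS_KBPS = [96, 160, 320, 1411]

def pick_tier(measured_kbps: float, headroom: float = 0.8) -> int:
    """Return the highest tier not exceeding headroom * measured bandwidth."""
    budget = measured_kbps * headroom
    usable = [t for t in TIERS_KBPS if t <= budget]
    return usable[-1] if usable else TIERS_KBPS[0]

print(pick_tier(4000))  # 1411 (fast connection: lossless tier fits)
print(pick_tier(500))   # 320
print(pick_tier(100))   # 96  (falls back to the lowest tier)
```

Real clients also smooth bandwidth estimates over time and hold buffers of several seconds so that a momentary dip does not force an immediate downswitch.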

Professional Audio and Broadcasting

In professional audio production, uncompressed pulse-code modulation (PCM) formats such as WAV are standard for recording and mixing in digital audio workstations (DAWs) like Avid Pro Tools to preserve full fidelity and prevent generational loss from repeated encoding-decoding cycles. Pro Tools natively supports importing and processing WAV files containing uncompressed PCM audio, ensuring no quality degradation during editing, effects application, and mastering stages. Lossless compressed formats like FLAC are also employed in professional workflows for efficient storage of high-resolution sessions, particularly in DAWs such as Steinberg Nuendo or PreSonus Studio One, where they decode transparently to PCM without quality loss.

For broadcasting, the AC-3 codec, known as Dolby Digital, serves as the mandated audio compression standard in the Advanced Television Systems Committee (ATSC) framework for digital television, adopted in 1995 to enable multichannel transmission over limited bandwidth. In Europe, High-Efficiency AAC (HE-AAC) has been integral to Digital Audio Broadcasting Plus (DAB+) since its specification in 2006, providing superior efficiency for stereo and surround audio while maintaining broadcast quality at lower bitrates. These standards ensure reliable delivery of high-fidelity audio in over-the-air and cable systems, prioritizing perceptual transparency for live and pre-recorded content.

In telephony and Voice over Internet Protocol (VoIP) applications, codecs like G.711 and Opus address the need for real-time communication with minimal delay. G.711, an ITU-T standard for pulse-code modulation of voice frequencies at 64 kbit/s, remains the baseline for traditional telephony due to its simple, low-complexity design and low algorithmic latency of approximately 0.125 ms. Opus, defined in IETF RFC 6716, is widely adopted for modern VoIP in platforms requiring interactive speech and audio, offering a low default delay of 26.5 ms and adaptability to varying network conditions. End-to-end latency in these systems is recommended to stay below 150 ms per ITU-T G.114 to maintain natural conversational flow without perceptible impairment.
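The G.114 guideline can be sanity-checked with a simple one-way delay budget. The component values below are illustrative assumptions, not measurements from any deployment:

```python
# One-way latency budget for a VoIP path; G.114 recommends < 150 ms
# end to end for natural conversation.
budget_ms = {
    "codec_algorithmic": 26.5,  # e.g. a 20 ms frame plus encoder lookahead
    "packetization": 20.0,      # one frame per RTP packet
    "jitter_buffer": 40.0,      # receiver-side de-jitter buffering
    "network_transit": 50.0,    # propagation and queuing
}

total = sum(budget_ms.values())
print(total, total < 150.0)  # 136.5 True
```

Budgets like this make clear why the codec's algorithmic delay matters: on long network paths it competes directly with transit time for the 150 ms allowance.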
Audio archiving in professional and institutional settings relies on 24-bit lossless formats to capture the full dynamic range of analog sources, as recommended by the International Association of Sound Archives (IASA) for digitizing analog material without introducing quantization noise. During the 1990s and 2000s, libraries and archives undertook widespread migrations from deteriorating analog tapes to digital formats like PCM on optical media or early hard drives, driven by preservation initiatives from organizations such as the Association for Recorded Sound Collections (ARSC) to safeguard against media degradation. These efforts established 24-bit/96 kHz files as a common archival master, enabling long-term access while retaining full spectral detail from original analog masters.

Performance Metrics and Evaluation

Quality Assessment Methods

Quality assessment of audio codecs primarily focuses on evaluating the perceptual fidelity of the reconstructed signal compared to the original, using both subjective listening tests and computational metrics. Subjective methods capture human perception directly but require controlled environments and trained listeners, while objective methods offer repeatable, automated evaluations that approximate auditory responses. These approaches ensure codecs balance efficiency with minimal audible artifacts, though efficiency trade-offs are analyzed separately.

Subjective evaluation often employs the mean opinion score (MOS), a standardized scale from 1 (bad) to 5 (excellent) on which multiple listeners rate audio samples, with the arithmetic mean providing the overall score. This method, detailed in ITU-T Recommendation P.800, is foundational for assessing speech and general audio quality through absolute category rating (ACR) procedures. For more nuanced testing of intermediate-quality codecs, the MUSHRA (MUltiple Stimuli with Hidden Reference and Anchor) method is preferred, involving expert listeners rating several processed versions alongside a hidden original reference and low-quality anchors on a 0-100 continuous scale. As specified in ITU-R Recommendation BS.1534, MUSHRA enhances reliability by mitigating bias through randomization and anchoring, making it suitable for codec development and comparison.

Objective metrics provide quantifiable proxies for perceived quality without human involvement. The signal-to-noise ratio (SNR) is a basic measure of distortion, defined as the ratio of the original signal power to the noise power introduced by encoding and decoding:

\text{SNR} = 10 \log_{10} \left( \frac{P_{\text{signal}}}{P_{\text{noise}}} \right) \ \text{dB}

Higher SNR values indicate lower distortion and better fidelity, with values above roughly 30 dB typically considered high quality for audio systems.
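The SNR definition translates directly into code. A minimal NumPy sketch with a toy sine signal and additive noise standing in for coding error:

```python
import numpy as np

def snr_db(original: np.ndarray, decoded: np.ndarray) -> float:
    """SNR = 10*log10(P_signal / P_noise), with noise = original - decoded."""
    noise = original - decoded
    return 10.0 * np.log10(np.sum(original**2) / np.sum(noise**2))

# Toy example: one second of a 440 Hz tone with small added "coding" noise.
t = np.linspace(0.0, 1.0, 44100, endpoint=False)
x = np.sin(2 * np.pi * 440.0 * t)
y = x + 0.001 * np.random.default_rng(0).standard_normal(x.size)

print(snr_db(x, y) > 30.0)  # True: well above the 30 dB threshold
```

Note that SNR treats all error energy equally, which is exactly why perceptually weighted measures like PEAQ correlate better with listening tests for lossy codecs.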
More perceptually relevant is the Perceptual Evaluation of Audio Quality (PEAQ) model, which emulates human psychoacoustic processing through a series of filters, error mapping, and cognitive modeling to predict subjective annoyance. Standardized in ITU-R Recommendation BS.1387, PEAQ outputs the Basic Objective Difference Grade (ODG), ranging from -4 (very annoying degradation) to 0 (imperceptible difference), correlating strongly with MOS scores for lossy codecs. Blind testing complements these methods by verifying codec transparency—whether differences are inaudible under realistic conditions—using ABX comparators. In an ABX test, listeners compare reference A (original), B (encoded), and an unknown X (either A or B) in a double-blind setup, with statistical analysis determining detectability. This technique, formalized in Audio Engineering Society Convention Paper 3167, is widely applied to confirm perceptual equivalence in evaluations.
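The statistical analysis of an ABX run reduces to a one-sided binomial test against chance (p = 0.5): how likely is a score at least this good if the listener were guessing? A short sketch:

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """One-sided binomial p-value: probability of at least `correct`
    right answers out of `trials` under pure guessing (p = 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2**trials

# 12 of 16 correct identifications is unlikely under guessing:
print(round(abx_p_value(12, 16), 4))  # 0.0384
# 9 of 16 is entirely consistent with guessing:
print(round(abx_p_value(9, 16), 4))   # 0.4018
```

A p-value below a preset threshold (commonly 0.05) indicates the listener can reliably distinguish the encoded signal from the reference, i.e. the codec is not transparent for that listener and material.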

Bitrate and Efficiency Comparisons

Audio codecs vary significantly in their bitrate requirements to achieve comparable perceptual quality, often measured through listening tests that evaluate transparency or mean opinion scores (MOS). For instance, in a 2014 multiformat listening test on stereo music, Opus at approximately 107 kbps achieved a quality rating of 4.66 out of 5, outperforming MP3 at 136 kbps with a score of 4.24, demonstrating Opus's superior efficiency at lower bitrates. Similarly, LC-AAC at 104 kbps scored 4.42, indicating about 25% greater efficiency than MP3 for similar quality levels. These bitrate ladders highlight how modern codecs like Opus and AAC can deliver near-transparent audio at rates 20-50% lower than legacy formats like MP3, reducing bandwidth needs without perceptible loss. Subsequent informal listening tests through 2023 have generally reaffirmed these efficiency advantages. Efficiency also encompasses computational complexity, typically quantified in million instructions per second (MIPS) per channel for encoding and decoding. Opus, optimized for real-time applications, requires around 52 MIPS for encoding in CELT mode at high complexity (level 10) and 32 kbps, making it suitable for resource-constrained devices. In contrast, MP3 encoders like LAME demand higher MIPS for equivalent tasks, often exceeding 60 MIPS at mid-bitrates, while AAC implementations balance at 40-50 MIPS depending on profile. These metrics underscore trade-offs in processing power, with Opus's hybrid design enabling lower overall MIPS for mixed speech-music content.
| Codec | Typical Bitrate for Near-Transparent Quality (Stereo, 44.1 kHz) | Efficiency Gain vs. MP3 | Computational Complexity (MIPS, Encode/Decode, Approx.) |
| --- | --- | --- | --- |
| MP3 | 128-192 kbps | Baseline | 60 / 10 |
| AAC | 96-128 kbps | ~25% better | 45 / 12 |
| Opus | 64-96 kbps | ~40-50% better | 52 / 18 |
| FLAC | Variable (lossless, ~700-1000 kbps effective) | 50% file size reduction vs. WAV | 20 / 15 (decode-focused) |
The table above summarizes representative data from standardized listening tests and implementation benchmarks, where FLAC achieves approximately 50% file size reduction compared to uncompressed WAV without quality loss, ideal for storage but not low-bitrate transmission. Latency is a critical efficiency factor, particularly for interactive applications. Speech-oriented codecs, such as those in VoIP, typically exhibit delays of 6-20 ms to support real-time conversation, with Opus achieving a 26.5 ms default latency via 20 ms frames. Music-focused codecs like MP3 and AAC, prioritizing quality over immediacy, often incur delays of 100 ms or more due to longer analysis windows and buffering. On mobile devices, power consumption further influences efficiency; hardware-accelerated decoding of codecs such as AAC reduces power draw, making them preferable for battery-constrained streaming.

Trade-offs in bitrate and efficiency depend on use cases, balancing quality, bandwidth, and computational resources. For transparency in music playback, higher bitrates like 256 kbps ensure audio indistinguishable from the original, suitable for high-fidelity streaming where bandwidth is abundant. Conversely, low-bandwidth scenarios, such as voice over narrow networks, favor 32 kbps or lower modes, maintaining intelligibility with minimal data while accepting some artifacts in non-speech elements. These choices optimize for scenarios like mobile broadcasting (low power, moderate quality) versus archival storage (lossless efficiency).
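The storage implications of these bitrates follow from simple arithmetic (size = bitrate * duration / 8). A quick sketch using figures consistent with the table above:

```python
# Back-of-the-envelope storage math: bytes = bits-per-second * seconds / 8.

def size_mb(bitrate_kbps: float, minutes: float) -> float:
    return bitrate_kbps * 1000 * minutes * 60 / 8 / 1e6

# One hour of stereo audio at various rates:
print(size_mb(1411, 60))  # 634.95  (uncompressed CD-quality PCM, ~635 MB)
print(size_mb(700, 60))   # 315.0   (FLAC at roughly 50% of PCM)
print(size_mb(128, 60))   # 57.6    (MP3 at 128 kbps)
```

The roughly 11:1 gap between CD-quality PCM and 128 kbps MP3 is the "about 1/12th" compression ratio that made early portable players and dial-up-era distribution practical.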

References

  1. [1]
    Codecs - Voice, GSM, Speech - ETSI
    An audio codec (COder/DECoder) converts analog audio signals into digital signals for transmission or encodes them for storage.
  2. [2]
    [PDF] MP3 and AAC Explained
    Newer audio compression technologies​​ MPEG-1 Layer-3 has been defined in 1991. Since then, research on perceptual audio coding has progressed and codecs with ...
  3. [3]
    Web audio codec guide - Media | MDN
    ### Summary of Audio Codecs from MDN Web Audio Codec Guide
  4. [4]
    Codecs FAQ - Microsoft Support
    A codec can consist of two parts: an encoder that compresses the media file (encoding) and a decoder that decompresses the file (decoding). Some codecs ...<|control11|><|separator|>
  5. [5]
    Selecting and Implementing Audio Codecs | DigiKey
    Dec 2, 2020 · An audio codec is a hardware component that is capable of encoding or decoding a digital data stream containing audio information1. An audio ...
  6. [6]
    Understanding audio bitrate and audio quality - Adobe
    Audio CD bitrate is always 1,411 kilobits per second (Kbps). The MP3 format ... CD-quality bitrate, which is high, sounds its best on a professional ...
  7. [7]
    High bitrate audio is overkill: CD quality is still great - SoundGuys
    Oct 1, 2025 · For example, a CD uses a 16-bit signal that's sampled at 44.1 thousand times per second (kHz). The bitrate for such a file would be 1,411kbps ...
  8. [8]
    [PDF] Lecture 7: Audio Compression & Coding - Electrical Engineering
    How to reduce? - lower sampling rate → less bandwidth (muffled). - lower channel count → no stereo image. - lower sample size → quantization ...
  9. [9]
    Beginners Guide to VoIP Audio CODECs - DLS Internet Services
    Oct 7, 2019 · Introduced in 1972 by ITU, the G.711 CODEC has been used in digital telephony since then. The Codec has two main variants: A-Law (used in Europe ...G. 711 · G. 729 · G. 723.1
  10. [10]
    [PDF] Digital Audio Compression - Electrical Engineering
    MPEG/Audio Encoding and Decoding. Figure 6 shows block diagrams of the MPEG/audio encoder and decoder.[11,12] In this high-level rep- resentation, encoding ...
  11. [11]
    [PDF] Perceptual Coding of High-Quality Digital Audio - Index of /
    ABSTRACT | This paper introduces high-quality audio coding using psychoacoustic models. This technology is now abun- dant, with gadgets named after a ...
  12. [12]
    Analog Tape Can Never Be HD: Here's Why - Real HD-Audio
    Apr 10, 2013 · Analog tape's frequency response drops after 20kHz, and each copy degrades SNR. Dynamic range is limited to 72dB, and 12-bit PCM can capture ...
  13. [13]
    Digital Recording White Paper - Sanders Sound Systems
    Specifically, the frequency bandwidth of LPs and FM multiplex broadcasts were limited from 30 Hz to 15 KHz. The S/N (Signal to Noise ratio) was limited to ...
  14. [14]
    Pulse Code Modulation - Engineering and Technology History Wiki
    In 1937, Alec Reeves came up with the idea of Pulse Code Modulation (PCM). At the time, few, if any, took notice of Reeve's development.
  15. [15]
    SIGSALY - Crypto Museum
    Oct 30, 2016 · SIGSALY was a digital speech encryption system developed by Bell Labs, used during WWII for secure talks, and used One-Time Pad encryption.
  16. [16]
    [PDF] A Mathematical Theory of Communication
    Continuous information sources that have been rendered discrete by some quantizing process. For example, the quantized speech from a PCM transmitter, or a ...
  17. [17]
    [PDF] Adaptive Quantization in Differential PCM Coding of Speech - vtda.org
    CUMMISKEY, N. S. JAYANT, and J. L. FLANAGAN. (Manuscript received March 12, 1973). We describe an adaptive differential PCM (ADPCM) coder which makes ...
  18. [18]
    Introduction of the Compact Disc - Vintage Digital
    The Compact Disc (CD) was launched in August 1982 as a new optical digital audio format, co-developed by Philips and Sony. It offered higher fidelity, ...
  19. [19]
    Compact Disc (1982 – ) | Museum of Obsolete Media
    Compact Disc (Compact Disc Digital Audio or CD) is a digital optical disc format for audio playback, released commercially in Japan in late 1982 (followed ...
  20. [20]
    What Is AC-1, AC-2, and AC-3? - Computer Hope
    Jun 16, 2017 · Dolby AC-1 was the first digital coding technology introduced in 1987, A development that would later become HDTV (High-Definition TeleVision). ...Missing: 1980s | Show results with:1980s
  21. [21]
    [PDF] MP3 and AAC Explained - Fraunhofer IIS
    ISO/IEC IS 11172 in late 1992. The audio coding part of MPEG-1 (ISO/IEC IS 11172-3, see [5] describes a generic coding system, designed to fit the demands of.Missing: Society | Show results with:Society
  22. [22]
    MP3 (MPEG Layer III Audio Encoding) - The Library of Congress
    Mar 26, 2024 · MP3 is MPEG Layer III audio encoding, using perceptual coding to reduce audio precision, and is defined in ISO/IEC specifications.
  23. [23]
    RealAudio | Definition, History, & Facts - Britannica
    RealAudio, a compressed audio format created in 1995 by Progressive Networks (after 1997, RealNetworks, Inc.) that was popular in the 1990s and early 2000s.Missing: rise | Show results with:rise
  24. [24]
    The Early History Of The Streaming Media Industry and The Battle ...
    Mar 9, 2016 · Progressive Networks is considered by many to have started the streaming media industry with their launch of RealAudio 1.0 in April of 1995.Missing: rise | Show results with:rise
  25. [25]
    A scrap over patents - The Economist
    Feb 23, 2007 · Vinyl long-playing records and cassette tapes were supplanted by the compact disc. Now that technology faces stiff competition from the MP3 file ...
  26. [26]
    Thirty years of audio coding and counting - Leonardo's Blog
    Feb 3, 2019 · The dominating role of MP3 in music distribution was shaken in 2003 when Apple announced that its iTunes and iPod products would use MPEG-4 AAC ...
  27. [27]
    25 years of Apple's innovation with the iTunes Music Store
    Feb 11, 2025 · The fourth of Apple's top 10 major areas of innovation in the last 25 years is 2003's iTunes Music Store. ... Using Apple's own AAC encoder, it ...
  28. [28]
    FLAC - What is FLAC? - Xiph.org
    FLAC stands for Free Lossless Audio Codec, an audio format similar to MP3, but lossless, meaning that audio is compressed in FLAC without any loss in quality.Downloads · Changelog · Using FLAC · FLAC 1.5.0 released
  29. [29]
    RFC 9639 - Free Lossless Audio Codec (FLAC) - IETF Datatracker
    Jan 22, 2025 · The FLAC format was first specified in December 2000, and the bitstream format was considered frozen with the release of FLAC 1.0 (the ...
  30. [30]
    RFC 6716 - Definition of the Opus Audio Codec - IETF Datatracker
    Mar 24, 2023 · This document defines the Opus interactive speech and audio codec. Opus is designed to handle a wide range of interactive audio applications.
  31. [31]
    Opus audio codec is now RFC6716, Opus 1.0.1 reference ... - Xiph.org
    Sep 11, 2012 · Despite its low latency, Opus also excels at streaming and storage applications, beating existing high-delay codecs like Vorbis and HE-AAC.
  32. [32]
    Lyra: A New Very Low-Bitrate Codec for Speech Compression
    Feb 25, 2021 · We have created Lyra, a high-quality, very low-bitrate speech codec that makes voice communication available even on the slowest networks.
  33. [33]
    google/lyra: A Very Low-Bitrate Codec for Speech Compression
    Lyra is a high-quality, low-bitrate speech codec that makes voice communication available even on the slowest networks.
  34. [34]
    [PDF] AES WHITE PAPER - Audio Engineering Society
    Jun 4, 2009 · A stereo CD quality audio stream (16- bit resolution, 44.1-kHz sampling) requires 1.4 Mbps1 of data throughput, a quantity easily supported ...
  35. [35]
    [PDF] AES White Paper - Stanford CCRMA
    Standard sampling rates include 32. kHz, 44.1 kHz, 48 kHz, 88.2 kHz, and 96 kHz. Standard data formats include linear PCM with 16-, 18-,. 20-, 24-, and 32-bit ...
  36. [36]
    Sampling Theory - Stanford CCRMA
    The sampling theorem provides that a properly bandlimited continuous-time signal can be sampled and reconstructed from its samples without error, in principle.Missing: equation | Show results with:equation
  37. [37]
    Music 220A - Stanford CCRMA
    The Nyquist Theorem: to represent digitally a signal containing frequency components up to f Hz it is necessary to use a sampling rate of more than 2f samples ...Missing: audio equation
  38. [38]
    [PDF] MT-001: Taking the Mystery out of the Infamous Formula,"SNR ...
    Once the rms quantization noise voltage is known, the theoretical signal-to-noise ratio ... SNR = 6.02N + 1.76dB, over the dc to fs/2 bandwidth. Eq. 9. Bennett's ...
  39. [39]
    [PDF] Digital Audio Standards
    Takasu, “Improved PCM (Pulse Code Modulation) Re- cording System,” presented at the 56th AES Convention,. Paris, March 1-4, 1977; AES Preprint No. 1206. [7] ...
  40. [40]
    On the Nature of Granulation Noisein Uniform Quantization Systems*
    quantization process in digital audio systems is pre- sented first. The intent is to emphasize the properties. The input analog signal in aconventional pulse ...
  41. [41]
    [PDF] A Method for the Construction of Minimum-Redundancy Codes*
    Minimum-Redundancy Codes*. DAVID A. HUFFMAN+, ASSOCIATE, IRE. September. Page 2. 1952. Huffman: A Method for the Construction of Minimum-Redundancy Codes. 1099 ...
  42. [42]
    [PDF] ARITHMETIC CODING FOR DATA COIUPRESSION
    Their paper discusses several variable-length codings for the integers used as cache indexes. Arithmetic coding allows any probability distribution to be ...Missing: Peter | Show results with:Peter
  43. [43]
    Auditory Masking and the Critical Band - AIP Publishing
    Masked audiograms were studied as a function of the bandwidth, level, and frequency of a masking noise. In a reverse procedure, audiograms were determined ...
  44. [44]
    Predictive Quantizing Systems (Differential Pulse Code Modulation ...
    Differential pulse code modulation (DP CM) and predictive quantizing are two names for a technique used to encode analog signals into digital pulses.Missing: original | Show results with:original
  45. [45]
    Adaptive Predictive Coding of Speech Signals - Atal - 1970
    We describe in this paper a method for efficient encoding of speech signals, based on predictive coding. In this coding method, both the transmitter and the ...
  46. [46]
  47. [47]
    PCM, Pulse Code Modulated Audio - The Library of Congress
    Apr 26, 2024 · Linear PCM is an uncompressed format. Compressed variants are widely used for telephony and other low-bandwidth applications. Relationship to ...
  48. [48]
    AudioFileTypeID | Apple Developer Documentation
    An Audio Interchange File Format Compressed (AIFF-C) file. A Microsoft WAVE file. A Sound Designer II file.
  49. [49]
    AIFF / AIFC Sound File Specifications - McGill University
    Sep 20, 2017 · The AIFF and the later AIFF-C specifications came from Apple Computer. This format is used on SGI machines. The latest data formats from Apple ...
  50. [50]
  51. [51]
    Red Book CD Format Explained - TravSonic
    It also specifies the form of digital audio encoding: 2-channel signed 16-bit Linear PCM sampled at 44,100 Hz. This sample rate is adapted from that attained ...
  52. [52]
    The History of the DAW - Yamaha Music Blog
    May 1, 2019 · Learn about the history of the Digital Audio Workstation (DAW) from the earliest days to current systems.
  53. [53]
    [PDF] Lossless Compression of Audio Data - Montana State University
    The compressed files are assigned the file extension * . shn. Audio files compressed losslessly by Shorten are typically between 40 and 60% of the original ...
  54. [54]
    [PDF] Overview of lossless audio codecs - ERK
    In this paper, we overview existing lossless audio formats, their way of predicting audio samples and encoding differences between the samples and predicted.Missing: principles entropy ALAC
  55. [55]
  56. [56]
    After seven years, Apple open sources its Apple Lossless Audio ...
    Oct 28, 2011 · Apple first introduced its own lossless audio compression format, Apple Lossless Audio Codec (ALAC), in 2004 with iTunes 4.5.
  57. [57]
    Monkey's Audio - a fast and powerful lossless audio compressor
    Monkey's Audio is a fast and easy way to compress digital music. Unlike traditional methods such as mp3, ogg, or wma that permanently discard quality to save ...Download · Theory · Help · License
  58. [58]
    [PDF] Transform coding of audio signals using perceptual noise criteria
    Johnston, “A method of estimating the perceptual entropy of an audio signal,” submitted to ICASSP '88. [7] -, “Digital coding of musical sound-Some statistics ...
  59. [59]
    Loudness, Its Definition, Measurement and Calculation
    Author & Article Information. Harvey Fletcher , W. A. Munson. Bell Telephone Laboratories. J. Acoust. Soc. Am. 5, 82–108 (1933). https://doi.org/10.1121/ ...
  60. [60]
    Psychoacoustic Models for Perceptual Audio Coding—A Tutorial ...
    This paper provides a tutorial introduction of the most commonly used psychoacoustic models for low bitrate perceptual audio coding.Missing: seminal | Show results with:seminal
  61. [61]
    Subband/Transform coding using filter bank designs based on time ...
    The application of TDAC systems to Subband/Transform coding is also discussed and the objective performance of a 32 band coder using several different window ...
  62. [62]
    [PDF] arXiv:1902.01053v1 [eess.AS] 4 Feb 2019
    Feb 4, 2019 · Commonly used windows include the half-sine and a Kaiser-. Bessel derived window. The latter is an approximation of the discrete prolate ...
  63. [63]
    [PDF] A Low Bit Rate Audio Codec Using Wavelet Transform - IJERA
    In this paper, a low bit rate audio codec algorithm using wavelet transform and wavelet packet transform has been developed, which is simple yet effective ...Missing: experimental | Show results with:experimental
  64. [64]
  65. [65]
    [PDF] The Theory Behind Mp3
    Working within ISO, the Moving Picture Experts Group was assigned to initiate the development of a common standard for coding/compressing a representation of ...
  66. [66]
  67. [67]
    MP3 | Make Software, Change the World! | Computer History Museum
    Rio PMP300 player and ear phones, 1998. The compact PMP300 was the first commercially successful MP3 player. It cost $200, held 30 minutes of music, and ran 10 ...
  68. [68]
    Video and audio formatting specifications - YouTube Help
    Video formatting guidelines ; Video codec: H.264 ; Audio codec: AAC ; Audio bitrate: 128 kbps or better.
  69. [69]
    What Are Bluetooth Codecs? A Guide to Everything From AAC to SBC
    AAC is the highest-quality codec that Apple products support, but they default to transmitting over SBC when paired headphones don't support that codec. So if ...
  70. [70]
    Bitrate in Audio - What It Is & How It Works - Audiodrome
    May 13, 2025 · At very low bitrates, such as 96 kbps or below, you'll often hear obvious artifacts. High frequencies might sound muffled or washed out. Reverb ...
  71. [71]
    Vorbis.com: FAQ - Xiph.org
    Oct 3, 2003 · Ogg Vorbis is a free, open, unpatented audio compression format, designed to replace proprietary formats like MP3. Vorbis is the compression ...
  72. [72]
    RFC 6716 - Definition of the Opus Audio Codec - IETF Datatracker
    1. Bitrate Opus supports all bitrates from 6 kbit/s to 510 kbit/s. · 2. Number of Channels (Mono/Stereo) Opus can transmit either mono or stereo frames within a ...
  73. [73]
    Vorbis I specification - Xiph.org
    Jul 4, 2020 · Vorbis is a general purpose perceptual audio CODEC intended to allow maximum encoder flexibility, thus allowing it to scale competitively over an exceptionally ...
  74. [74]
    Ogg Vorbis audio format | Can I use... Support tables for ... - CanIUse
    Ogg Vorbis audio format — Chrome: supported in 4-142; Edge: not supported in 12-16, supported in 17-142; Safari: not supported in 3.1-14, supported since 14.1 ...
  75. [75]
    Opus Codec
    Sampling rates from 8 kHz (narrowband) to 48 kHz (fullband); Frame sizes from 2.5 ms to 60 ms; Support for both constant bitrate (CBR) and variable bitrate (VBR) ...
  76. [76]
    [PDF] The Opus Codec - arXiv
    Feb 15, 2016 · ABSTRACT. The IETF recently standardized the Opus codec as RFC6716. Opus targets a wide range of real-time Internet ...
  77. [77]
    HTML5 - XiphWiki
    Nov 12, 2015 · Firefox 3.5 includes "support for the HTML5 <video> and <audio> elements including native support for Ogg Theora encoded video and Vorbis ...
  78. [78]
    How Discord Handles Two and Half Million Concurrent Voice Users ...
    Sep 10, 2018 · Using the WebRTC native library allows us to use a lower level API from WebRTC (webrtc::Call) to create both send stream and receive stream. We ...
  79. [79]
    What is Opus Audio Codec? Features, Benefits & Use Cases - Vodlix
    Apr 22, 2025 · Opus is a highly versatile and efficient audio codec designed for interactive, real-time audio applications like voice calls, video conferencing, and live ...
  80. [80]
    Supported Audio file formats in iPhone [closed] - Stack Overflow
    Nov 19, 2009 · The audio playback formats supported in iOS are the following: AAC (AAC-LC); HE-AAC (v1 and v2); xHE-AAC - supported since iOS 13.0; AC-3 ...
  81. [81]
    Understanding Bluetooth codecs - SoundGuys
    May 30, 2025 · Why choose aptX over SBC? aptX's greater transfer rates are able to preserve more data than SBC, allowing for better overall sound quality. The ...
  82. [82]
    Apple Music vs Spotify - SoundGuys
    Aug 30, 2022 · The app streams audio using the open-source Ogg Vorbis codec at up to 320kbps for Spotify Premium users and up to 160kbps for people with a free ...
  83. [83]
    About lossless audio in Apple Music - Apple Support
    In addition to AAC, most of the Apple Music catalog is now also encoded using ALAC in resolutions ranging from 16-bit/44.1 kHz (CD Quality) up to 24-bit/192 kHz ...
  84. [84]
    Media container formats (file types) - MDN Web Docs
    Jun 10, 2025 · The most commonly used containers for media on the web are probably MPEG-4 Part-14 (MP4) and Web Media File (WEBM). However, you may also encounter Ogg, WAV, ...
  85. [85]
    How to Rip CDs Like a Pro: Avoiding Quality Loss and Metadata ...
    Feb 27, 2025 · While ripping CDs is a straightforward process, it comes with its challenges. Two common pitfalls include: Quality loss during conversion: MP3 ...
  86. [86]
    Modernizing Audio Codec Industry Standards For Enhanced Power ...
    May 13, 2021 · This translates to a significant improvement in mobile phone battery life, which means you can talk or stream mixed audio content longer without ...
  87. [87]
    In Defense of the 128 Kbps MP3: The Greatest Music Media Format ...
    Apr 1, 2019 · The humble 128 kbps MP3 is the true MVP of music mediums, the black sheep diamond in the rough with more than swagger and noise floor to go around.
  88. [88]
    Pro Tools Audio File Type and Session Support
    Mar 24, 2023 · Audio files of the following types can be imported into Pro Tools sessions and projects without conversion .wav. PCM (uncompressed) audio.
  89. [89]
    Demystifying Audio Formats: WAV, AIFF, MP3, FLAC & When to Use ...
    Oct 31, 2025 · Learn how to choose the right audio format with LANDR—WAV, AIFF, FLAC, MP3, even Dolby Atmos—to keep your mix sounding pro from studio to ...
  90. [90]
  91. [91]
    FLAC for lossless audio? - Nuendo - Steinberg Forums
    Mar 19, 2013 · FLAC is an excellent archive format for projects and it's great for music playback - but for real work - I would rather just stay uncompressed ( ...
  92. [92]
    [PDF] ATSC Standard: Digital Audio Compression (AC-3, E-AC-3)
    Dec 17, 2012 · 12 April 1995. Annex B, “AC-3 Data Stream ... Vernon, Steve, “Dolby Digital: Audio Coding for Digital Television and Storage Applications,”.
  93. [93]
    MPEG-4 HE-AAC v2 — audio coding for today's digital media world
    Jan 30, 2006 · The MPEG-4 High Efficiency AAC v2 profile (HE-AAC v2) has proven, in several independent tests, to be the most efficient audio compression scheme available ...
  94. [94]
    G.711 : Pulse code modulation (PCM) of voice frequencies - ITU
    Mar 14, 2011 · G.711 : Pulse code modulation (PCM) of voice frequencies. Corresponding ANSI-C code is available in the G.711 module of the ITU-T G. ...
  95. [95]
  96. [96]
    Guidelines on the Production and Preservation of Digital Audio ...
    IASA recommends an encoding rate of at least 24 bit to capture all analogue materials. For audio digital-original items, the bit depth of the storage technology ...
  97. [97]
    [PDF] ARSC Guide to Audio Preservation
    Jan 1, 2015 · The ARSC Guide to Audio Preservation, published by CLIR, covers audio preservation, including audio formats and their deterioration.
  98. [98]
    [PDF] Sound Savings - Association of Research Libraries
    We produce most of our audio masters at 96 kilocycles and 24-bit word length. At this time, we make two service copies: first, a down-sampled WAVE file at.
  99. [99]
    Results of the public multiformat listening test (July 2014)
  100. [100]
    Opus Codec - Interactive Audio Vocoder | Adaptive Digital Tech
    Opus on Armv7-M / Armv8-M, wideband (WB): encode in CELT mode only, complexity 10, at a 32 kbps rate requires 52 MIPS and 110k bytes of program memory ...
  101. [101]
    WAV(PCM) vs FLAC - Audio Science Review (ASR) Forum
    Aug 29, 2021 · FLAC typically gives you a file about 60% of the original but it depends on the file. "Simple" sounds compress better (to a smaller file and ...
  102. [102]
    [PDF] A High-Quality Speech and Audio Codec With Less Than 10 ms Delay
    On the other hand, commonly used audio codecs, such as MP3 and Vorbis [5], can achieve high quality but have delays exceeding 100 ms. None of these codecs ...
  103. [103]
    Opus Codec: The Audio Format Explained | WebRTC Streaming
    Jul 29, 2020 · Opus provides a very performant 26.5 ms latency using its default settings (20 ms frame size), making it highly suitable for Voice over IP (VoIP) ...
  104. [104]
    Opus 1.2 audio codec brings better sound at low bit rates (free and ...
    Jun 26, 2017 · This release of Opus 1.2 has a ton of ARM Neon optimizations to improve decoding performance on mobile devices.
  105. [105]
  106. [106]
    Comparison – Opus Codec
    The figure below illustrates the quality of various codecs as a function of the bitrate. It attempts to summarize results from a collection of listening tests.