Pulse-code modulation
Pulse-code modulation (PCM) is a method used to digitally represent analog signals, in which the amplitude of the signal is sampled at uniform time intervals, quantized into a finite set of discrete levels, and then encoded into a series of binary codes representing the quantized values.[1] This process transforms continuous analog waveforms into a discrete digital format suitable for transmission, storage, and processing in digital systems.[2] PCM serves as the foundational technique for digital audio representation in applications such as compact discs, computers, and telephony.[2]

Invented in 1937 by British engineer Alec H. Reeves while working at International Telephone and Telegraph (ITT) Laboratories in Paris, PCM was conceived as a way to transmit multiple voice channels over noisy lines by converting them into digital pulses resistant to interference, an idea that later proved valuable for secure communications during World War II.[3] Reeves patented the technique in 1938 (French patent 852,183),[4] and it was first described in a 1939 publication, marking it as a pioneering step toward digital communications.[5] Although PCM was initially overlooked because of the dominance of analog technologies, Bell Laboratories in the United States developed practical implementations in the 1940s, constructing the first working PCM system for experimental telephony in 1943.[6]

PCM involves three main stages: sampling, in which the analog signal's amplitude is measured at a rate at least twice the highest frequency component (per the Nyquist-Shannon sampling theorem, to avoid aliasing); quantization, which maps each sample to the nearest level in a predefined set of discrete values, introducing a small, controlled distortion known as quantization error; and encoding, in which the quantized levels are converted into binary code words, typically using a fixed number of bits per sample (e.g., 8 bits for 256 levels).[7] The fidelity of PCM depends on the sampling rate, the number of quantization levels, and the bit depth; for instance, standard CD audio uses 44.1 kHz sampling and 16-bit encoding for high-quality reproduction.[8]

PCM's significance lies in its robustness against noise and errors compared to analog methods, enabling error detection and correction in digital systems, and it underpins modern telecommunications, including the T-carrier systems introduced by Bell Labs in 1962 for commercial long-distance telephony.[9] Its adoption revolutionized data transmission, paving the way for the digital revolution in audio, video, and broadband communications, with variants such as linear PCM remaining uncompressed standards in professional audio production.[8]
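
The three stages can be made concrete with a short numerical sketch. The following Python fragment is illustrative only, assuming a 1 kHz test tone and the CD parameters quoted above (44.1 kHz sampling, 16-bit samples); the signal, its amplitude, and all variable names are arbitrary choices rather than part of any standard.

```python
import numpy as np

# End-to-end PCM sketch: sample, quantize, encode.
fs = 44_100          # sampling frequency in Hz
bits = 16            # bits per sample -> 2**16 = 65,536 levels
duration = 0.01      # seconds of signal to encode

# 1. Sampling: measure the waveform at uniform intervals T_s = 1/f_s.
t = np.arange(0, duration, 1 / fs)
analog = 0.8 * np.sin(2 * np.pi * 1_000 * t)        # 1 kHz test tone

# 2. Quantization: map each sample to the nearest of 2**bits levels
#    spanning the range [-1.0, +1.0).
levels = 2 ** bits
step = 2.0 / levels
codes = np.clip(np.round(analog / step), -levels // 2, levels // 2 - 1).astype(int)

# 3. Encoding: write each level as a fixed-width binary word
#    (printed here as the 16-bit two's-complement bit pattern).
words = [format(int(c) & (levels - 1), f"0{bits}b") for c in codes[:4]]
print(words)                                        # first four code words
print("bit rate:", bits * fs, "bit/s per channel")  # 705,600 bit/s
```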
Fundamentals

Sampling
Sampling is the initial step in pulse-code modulation (PCM), where a continuous analog signal is transformed into a sequence of discrete-time samples by measuring its amplitude at regular intervals. This process creates a pulse-amplitude modulated (PAM) signal, consisting of narrow pulses whose amplitudes correspond to the instantaneous values of the original waveform at each sampling instant. Uniform sampling ensures that the time between samples, known as the sampling period T_s, is constant, with T_s = \frac{1}{f_s}, where f_s is the sampling frequency.[10]

The Nyquist-Shannon sampling theorem provides the theoretical foundation for this process, stating that a band-limited continuous-time signal can be perfectly reconstructed from its samples if the sampling frequency f_s is at least twice the highest frequency component f_{\max} in the signal, i.e., f_s \geq 2 f_{\max}. This requirement, often called the Nyquist rate, prevents aliasing, a distortion in which higher frequencies masquerade as lower ones in the sampled signal. The theorem was first articulated by Harry Nyquist in 1928 in the context of telegraph transmission limits and formalized by Claude Shannon in 1949 for communication systems.[11][12]

To satisfy the Nyquist-Shannon criterion, an anti-aliasing filter (a low-pass filter) is placed before the sampler; it attenuates frequency components above f_s / 2, band-limiting the signal to the Nyquist frequency and preserving the integrity of the sampled representation.[13]

In audio applications, sampling converts continuous acoustic waveforms into discrete PAM samples; for instance, human hearing extends to about 20 kHz, so compact discs use a sampling frequency of 44.1 kHz, more than twice this bandwidth, to capture high-fidelity sound without aliasing. These PAM samples form the basis for subsequent PCM stages, such as quantization.[14]
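
The aliasing behaviour that the Nyquist criterion guards against can be sketched numerically. In the fragment below (illustrative assumptions: an 8 kHz sampling rate with 3 kHz and 5 kHz test tones), a tone above f_s / 2 produces exactly the same samples, up to a sign inversion, as its in-band alias, which is why the anti-aliasing filter must remove such components before sampling.

```python
import numpy as np

# Sketch of the Nyquist criterion with assumed parameters.
fs = 8_000                       # sampling frequency in Hz
n = np.arange(16)                # sample indices
t = n / fs                       # uniform sampling instants (T_s = 1/f_s)

in_band = np.sin(2 * np.pi * 3_000 * t)       # 3 kHz < fs/2: captured faithfully
too_high = np.sin(2 * np.pi * 5_000 * t)      # 5 kHz > fs/2: aliases
alias = np.sin(2 * np.pi * (fs - 5_000) * t)  # the 3 kHz alias image

# The 5 kHz samples are indistinguishable (up to sign) from samples of a
# 3 kHz tone, so the two cannot be separated after sampling.
print(np.allclose(too_high, -alias))          # True
```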
Quantization

Quantization in pulse-code modulation (PCM) discretizes the amplitude of each sampled value from a continuous range to one of a finite set of discrete levels, approximating the original analog signal with a digital representation. In uniform quantization, the full amplitude range from V_{\min} to V_{\max} is divided into 2^n equally spaced levels, where n is the number of bits per sample, resulting in a fixed step size \Delta = \frac{V_{\max} - V_{\min}}{2^n}.[15] This process maps each sample to the nearest quantization level, introducing an inherent approximation that determines the fidelity of the digital signal in PCM.[16]

The difference between the original sample value and its quantized counterpart is known as the quantization error, which manifests as noise in the reconstructed signal. For a uniform quantizer, assuming the error is uniformly distributed over -\Delta/2 to \Delta/2, the mean squared error is \Delta^2/12. For sinusoidal input signals spanning the full dynamic range, the signal-to-quantization-noise ratio (SQNR) is given by \mathrm{SQNR} = 6.02n + 1.76 \, \mathrm{dB}, a theoretical measure of quantization performance that improves by approximately 6 dB per additional bit.[17] This formula highlights the trade-off between bit depth and noise level: higher n reduces error but increases bandwidth requirements.

Uniform quantizers are categorized into mid-riser and mid-tread types based on the placement of the zero level relative to the decision thresholds. In a mid-riser quantizer, the zero input falls midway between two output levels (e.g., between -1 and +1), so there is no zero output code and a potential DC offset exists, and sign-magnitude representation is often used.[18] Conversely, a mid-tread quantizer positions the zero at the center of a quantization interval, including a zero output level for zero input and typically employing two's complement coding, which rounds small signals to zero and avoids offset.[19] These designs influence error characteristics; mid-tread is often preferred for signals that cross zero frequently, such as audio.

Quantization errors in PCM arise primarily from two sources: granular noise and overload noise. Granular noise refers to the small-scale distortion within the quantizer's dynamic range, corresponding to the uniform error distribution in each step, and dominates for signals that fit within the levels.[20] Overload noise occurs when the input amplitude exceeds the maximum representable level, causing clipping and large distortions; it is mitigated by ensuring the signal stays within V_{\min} to V_{\max} or by allowing headroom in practice.[21]

While basic PCM relies on uniform quantization for linearity, non-uniform quantization effectively varies the step size through companding, compressing the signal before uniform quantization and expanding it afterward, to allocate finer levels to smaller amplitudes and reduce overall impairment for signals with wide dynamic ranges such as speech.[22] This approach retains the simplicity of uniform coding while improving SQNR for low-level signals without altering the core PCM structure.
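
The step-size and SQNR relationships above can be checked numerically. The sketch below is a minimal illustration, assuming a full-scale sine test signal and a mid-riser rounding quantizer; the sample count and frequency are arbitrary. It measures the signal-to-quantization-noise ratio at several bit depths and compares it with 6.02n + 1.76 dB.

```python
import numpy as np

# Full-scale sine in [-1, 1]; the non-round frequency avoids repeating the
# same handful of sample values.
x = np.sin(2 * np.pi * np.arange(100_000) * 0.1234)

for n_bits in (8, 12, 16):
    step = 2.0 / 2 ** n_bits                  # Delta = (Vmax - Vmin) / 2^n
    # Mid-riser uniform quantizer: reconstruction levels at odd multiples of Delta/2.
    xq = (np.floor(x / step) + 0.5) * step
    noise = x - xq
    sqnr_measured = 10 * np.log10(np.mean(x**2) / np.mean(noise**2))
    sqnr_theory = 6.02 * n_bits + 1.76
    print(f"{n_bits:2d} bits: measured {sqnr_measured:6.2f} dB, "
          f"theory {sqnr_theory:6.2f} dB")
```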
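
The companding idea in the preceding paragraph can likewise be sketched. The example below is illustrative only: it wraps the same kind of uniform quantizer in a continuous μ-law characteristic with μ = 255 (the value used in North American telephony), and the low-level test tone and helper names are assumptions. For a signal at 1% of full scale, the companded path yields a substantially higher SQNR than plain uniform 8-bit quantization.

```python
import numpy as np

MU = 255.0  # mu-law parameter

def compress(x):
    """mu-law compressor applied before uniform quantization."""
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def expand(y):
    """Inverse mu-law characteristic applied after decoding."""
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU

def quantize(x, n_bits):
    """Uniform mid-tread quantizer over [-1, 1)."""
    step = 2.0 / 2 ** n_bits
    return np.clip(np.round(x / step) * step, -1.0, 1.0 - step)

def sqnr_db(x, xq):
    return 10 * np.log10(np.mean(x**2) / np.mean((x - xq)**2))

# A low-level sine (1% of full scale): uniform 8-bit PCM leaves only a few
# levels for it, while companding spends its levels where the signal is.
x = 0.01 * np.sin(2 * np.pi * np.arange(50_000) * 0.1234)
print("uniform 8-bit SQNR:   %.1f dB" % sqnr_db(x, quantize(x, 8)))
print("companded 8-bit SQNR: %.1f dB" % sqnr_db(x, expand(quantize(compress(x), 8))))
```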
Binary Encoding

In the binary encoding stage of pulse-code modulation (PCM), the discrete amplitude levels resulting from quantization are mapped to fixed-length binary codes, forming the pulse codes that represent the original signal for digital transmission or storage.[23] Each quantized level is assigned a unique binary word, typically consisting of n bits, where the number of levels N = 2^n, allowing representation of N distinct values.[24] Common encoding schemes include natural binary coding, in which levels are assigned sequential binary numbers (e.g., level 0 as 0000, level 1 as 0001), and Gray coding, which ensures that adjacent levels differ by only one bit to reduce error propagation in noisy channels (e.g., level 0 as 0000, level 1 as 0001, level 2 as 0011).[23][24]

The bit rate R_b of the resulting PCM signal is the product of the number of bits per sample n and the sampling frequency f_s, given by R_b = n \cdot f_s, where R_b is in bits per second.[24] For instance, in telephony applications using 8 bits per sample and an 8 kHz sampling rate, this yields a bit rate of 64 kbps.[23]

In multi-channel PCM systems, such as those in digital telephony, binary-encoded samples from multiple channels are organized into time-division multiplexed (TDM) frames to enable efficient transmission.[25] Each frame typically includes one binary word from each channel plus additional synchronization bits for frame alignment and timing recovery at the receiver.[25] For example, the T1 carrier system frames 24 channels using 8-bit PCM words per channel, resulting in a 193-bit frame (24 × 8 + 1 framing bit) transmitted at 8,000 frames per second.[25]

As an illustrative example, consider a 16-level quantizer (N = 16, n = 4) encoding levels from 0 to 15. In natural binary coding, the assignments are straightforward increments, while Gray coding adjusts for single-bit transitions; a short sketch reproducing these assignments follows the table:

| Level | Natural Binary | Gray Code |
|---|---|---|
| 0 | 0000 | 0000 |
| 1 | 0001 | 0001 |
| 2 | 0010 | 0011 |
| 3 | 0011 | 0010 |
| ... | ... | ... |
| 14 | 1110 | 1001 |
| 15 | 1111 | 1000 |
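
The table entries and the bit-rate arithmetic in this section can be reproduced with a short Python sketch; the function and variable names below are illustrative, not part of any standard.

```python
# Reproduce the 4-bit natural binary / Gray code assignments above and the
# bit-rate figures from this section.

def gray_encode(level: int) -> int:
    """Map a natural binary level to its Gray code (adjacent codes differ by one bit)."""
    return level ^ (level >> 1)

n = 4                                    # bits per sample -> N = 2**n = 16 levels
for level in range(2 ** n):
    natural = format(level, f"0{n}b")
    gray = format(gray_encode(level), f"0{n}b")
    print(f"{level:2d}  {natural}  {gray}")

# Bit rate R_b = n * f_s: 8-bit samples at 8 kHz give 64 kbit/s per channel.
bits_per_sample, fs = 8, 8_000
print("per-channel bit rate:", bits_per_sample * fs, "bit/s")          # 64,000

# T1 framing: 24 channels x 8 bits + 1 framing bit = 193 bits per frame,
# transmitted 8,000 times per second.
frame_bits = 24 * bits_per_sample + 1
print("frame size:", frame_bits, "bits; line rate:", frame_bits * 8_000, "bit/s")
```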