Full Rate
Full Rate (FR), also designated GSM 06.10, is a digital speech coding standard developed by the European Telecommunications Standards Institute (ETSI) for the Global System for Mobile Communications (GSM), the first widely deployed second-generation cellular standard, launched commercially in 1991.[1] It is the original codec for full-rate traffic channels (TCH/F). It compresses 20-millisecond frames of speech, each consisting of 160 samples of 13-bit uniform pulse-code modulation (PCM) at an 8 kHz sampling rate, into 260 bits, for a bit rate of 13 kbit/s. This allows intelligible speech to be carried over the limited bandwidth of a mobile channel.[2]
The encoding process relies on Regular Pulse Excitation with Long Term Prediction (RPE-LTP), a linear predictive coding technique that models the human vocal tract to predict and quantize the speech signal. Reflection coefficients describing the short-term spectrum are computed once per frame, while pitch lag, gain, and excitation pulses are extracted for each of four 5-millisecond subframes.[3] The standard, published by ETSI as EN 300 961, specifies both the encoder and the decoder. Its theoretical minimum delay is 20 milliseconds, and practical implementations may add 3 to 8 milliseconds for processing, for a total delay of 23 to 28 milliseconds. It also defines codec homing sequences for bit-exact testing and supports A-law or μ-law compressed input via the GSM A-interface.[2]
As the initial speech codec for GSM, Full Rate prioritized network capacity and compatibility, striking a balance between quality and efficiency suited to early mobile environments, though its speech intelligibility was later criticized for artifacts in noisy conditions.[4] It laid the groundwork for later GSM codecs, such as the Enhanced Full Rate (EFR) codec introduced in 1995, which improved perceptual quality using Algebraic Code Excited Linear Prediction (ACELP) on the same full-rate channel, and the Half Rate codec at 5.6 kbit/s, which doubled channel capacity at reduced fidelity.[4] Despite the move to more advanced codecs such as Adaptive Multi-Rate (AMR), used in third-generation networks, Full Rate remains relevant in legacy GSM systems and certain embedded applications, and open-source implementations such as libgsm support its use in software-defined radio and archival audio processing.[3]
History and Development
Origins in GSM Standardization
The Groupe Spécial Mobile (GSM) was established in 1982 by the European Conference of Postal and Telecommunications Administrations (CEPT) to develop a pan-European standard for digital mobile communications. A central requirement was a unified digital speech coding system that could replace fragmented analog networks and enable cross-border roaming in Europe.[5][6] The initiative addressed the inefficiencies of existing national systems, aiming for interoperability and efficient spectrum use in second-generation (2G) cellular technology.[7] The speech codec selection process, conducted under CEPT auspices from 1987 to 1988 and later overseen by the European Telecommunications Standards Institute (ETSI), evaluated multiple candidate algorithms against stringent requirements for speech quality, computational complexity, and robustness over noisy mobile channels.[6] Six primary codec proposals had been shortlisted for detailed testing in 1986, including submissions from Germany (Regular Pulse Excitation-Linear Predictive Coding, or RPE-LPC) and France (Multi-Pulse Excitation-Long Term Prediction, or MPE-LTP), as well as proposals from Italy, Norway, Sweden, and the United Kingdom, among them sub-band coders.[6] The RPE-LTP technique, a hybrid approach combining regular pulse excitation for the residual signal with long-term prediction to model pitch periodicity, was selected in 1988 for its trade-off between natural-sounding speech and processing demands low enough for early digital mobile hardware.[6] Standardization culminated in the publication of ETSI specification GSM 06.10 in 1990, which formally defined the RPE-LTP-based Full Rate codec as the speech coding standard for GSM Phase 1 deployment.[6][2] The document specified the codec's integration into the GSM air interface, ensuring compatibility across European operators.[2] Significant contributions to the codec's development came from industrial research laboratories such as Philips Research Laboratories in Eindhoven, which advanced low-complexity analysis-by-synthesis techniques, and from academic groups such as those at the Technical University of Delft.[6] This work refined the excitation models and prediction algorithms until real-time operation on a single-chip digital signal processor became practical.[6]
Adoption and Timeline
The first commercial GSM network launched in 1991 in Finland with the Radiolinja service, built with Nokia and Mobira equipment, marking the first deployment of the Full Rate speech codec as the default standard for voice transmission.[8] Germany followed in July 1992, when Deutsche Telekom launched its D1 network using the same codec.[9] The codec fit GSM's time-division multiple access (TDMA) scheme, in which each 200 kHz carrier supports up to eight simultaneous full-rate voice channels, allowing network capacity to scale in these early deployments.[10] By the mid-1990s, the number of GSM subscribers worldwide had passed 10 million, with Full Rate as the codec underlying the technology's rapid expansion.[5] This growth propelled GSM to dominate the 2G market, reaching approximately 80% global market share by 2000 and eventually connecting hundreds of millions of users worldwide.[11] ETSI's introduction of the Enhanced Full Rate (EFR) codec in 1995 improved speech quality while remaining compatible with existing Full Rate infrastructure, beginning a gradual shift away from the original standard.[12] The shift accelerated with the standardization of the Adaptive Multi-Rate (AMR) codec in 1999 by ETSI and 3GPP, which offered adaptive bit rates for better error resilience and efficiency and led to Full Rate's replacement in most networks during the 2000s.[13] By the 2010s, Full Rate usage had declined to legacy support, primarily in developing regions where 2G infrastructure persisted because of slower migration to later generations.
As of 2025, Full Rate sees minimal active deployment amid widespread migration to 3G, 4G, and 5G, though it remains in some IoT applications and legacy systems that rely on 2G for low-bandwidth connectivity.[14] In the UK, operators such as O2 plan to restrict 2G services to IoT and emergency use starting in October 2026, with full national shutdowns targeted by 2033, signaling the codec's ongoing phase-out.[15]
Technical Overview
Core Principles
The Full Rate (FR) speech codec, standardized for the Global System for Mobile Communications (GSM), uses a hybrid approach that combines linear predictive coding (LPC) for short-term spectral modeling with long-term prediction (LTP) for the periodic components of speech. LPC analysis derives filter coefficients that capture short-term correlations and thus represent the spectral envelope, while LTP searches for the optimal pitch lag and quantizes a long-term prediction gain to capture the periodicity of voiced speech. Together they model both the formant structure and the pitch of human speech, achieving compression without excessive distortion.[2]
At the core of the excitation mechanism is Regular Pulse Excitation (RPE), which approximates the residual left after LPC and LTP filtering by placing pulses on a fixed grid within each subframe, rather than searching exhaustively over all possible pulse positions. This grid-based approximation greatly lowers computational complexity while still representing the energy distribution of the excitation adequately, which suited the resource-constrained hardware of early mobile networks. The RPE process decimates the residual by a factor of three, selects the grid position with the most energy, and quantizes the selected amplitudes to form the codec's output parameters.[2]
Designed for the bandwidth limitations of GSM digital mobile telephony, the FR codec targets near-toll-quality speech at a bit rate of 13 kbit/s, balancing audio fidelity, low delay (a 20 ms algorithmic delay and under 30 ms end to end), and processing demands modest enough for 1990s hardware.
Input consists of 13-bit linear pulse code modulation (PCM) samples at an 8 kHz sampling rate, processed in 20 ms frames of 160 samples, and the output comprises 260 bits per frame encoding the LPC, LTP, and RPE parameters for transmission and subsequent synthesis. The framework was designed for robust performance over noisy wireless channels and prioritized perceptual quality for conversational use.[2][16]
Key Parameters
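The headline figures in this section follow from simple arithmetic on the frame structure: 160 samples of 13-bit PCM in and 260 bits out every 20 ms. The Python check below computes them. It counts the input at its full 13-bit linear resolution, so the compression ratio refers to linear PCM rather than to 8-bit A-law or μ-law input.

```python
# Rate arithmetic for the GSM Full Rate frame structure.
SAMPLE_RATE_HZ = 8000   # telephone-band sampling rate
PCM_BITS = 13           # uniform PCM resolution at the encoder input
FRAME_SAMPLES = 160     # one 20 ms frame
SUBFRAME_SAMPLES = 40   # one 5 ms subframe
FRAME_BITS = 260        # encoded bits per frame

frame_ms = 1000 * FRAME_SAMPLES / SAMPLE_RATE_HZ        # 20 ms
subframe_ms = 1000 * SUBFRAME_SAMPLES / SAMPLE_RATE_HZ  # 5 ms
frames_per_second = SAMPLE_RATE_HZ // FRAME_SAMPLES     # 50 frames per second
input_bps = SAMPLE_RATE_HZ * PCM_BITS                   # linear PCM input rate
output_bps = FRAME_BITS * frames_per_second             # coded output rate
bits_per_sample = FRAME_BITS / FRAME_SAMPLES
compression_ratio = input_bps / output_bps

print(f"frame / subframe  : {frame_ms:g} ms / {subframe_ms:g} ms")
print(f"input rate        : {input_bps} bit/s")
print(f"output rate       : {output_bps} bit/s")
print(f"bits per sample   : {bits_per_sample}")
print(f"compression ratio : {compression_ratio:g}:1")
```

With these constants the script reports 104000 bit/s in, 13000 bit/s out, 1.625 bits per sample, and an 8:1 ratio.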
The Full Rate codec operates at a bit rate of 13 kbit/s, producing 260 bits per 20 ms frame, which corresponds to 1.625 bits per audio sample at the 8 kHz sampling rate.[2] This is the net rate of the encoded speech parameters before channel coding, which raises the gross rate on a full-rate traffic channel to 22.8 kbit/s. Each 20 ms frame of 160 samples is divided into four subframes of 40 samples (5 ms each), on which the excitation and long-term prediction parameters are computed.[2] Linear predictive coding uses an 8th-order analysis filter to model short-term correlations in the speech signal, with its reflection coefficients (coded as log-area ratios) quantized using a total of 36 bits per frame.[2] Long-term prediction uses lags from 40 to 120 samples (5 to 15 ms), coded with 7 bits per subframe, and prediction gains quantized with 2 bits per subframe, for totals of 28 and 8 bits per frame, respectively.[2] Regular pulse excitation models the remaining residual with 13 pulses spaced three samples apart within each 40-sample subframe; the grid position is selected with 2 bits per subframe (one of four possible alignments, 8 bits per frame).[2] The block amplitude is quantized logarithmically with 6 bits per subframe (24 bits per frame), and the 13 pulse values are each represented with 3-bit adaptive pulse code modulation after normalization, giving 39 bits per subframe for the pulses (156 bits per frame).[2] The design prioritizes low latency: end-to-end transcoder delay is below 30 ms (a theoretical minimum of 20 ms plus minimal processing overhead), and frame encoding needs no look-ahead buffering.[2]
Encoding and Decoding Process
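Before the individual stages, it helps to see how the 260 bits of an encoded frame divide among the parameters those stages produce. The Python sketch below tabulates the per-frame and per-subframe allocation listed under Key Parameters and checks that it sums to 260. The parameter names are descriptive labels chosen for this example, not identifiers from the specification, and the sketch says nothing about the order in which the bits are packed.

```python
# Bit budget of one GSM Full Rate frame (labels are illustrative, not spec names).
SUBFRAMES = 4

# Sent once per frame: eight quantized log-area ratios, coarser for higher orders.
LAR_BITS = [6, 6, 5, 5, 4, 4, 3, 3]

# Sent once per 5 ms subframe.
SUBFRAME_BITS = {
    "ltp_lag": 7,          # pitch lag, 40..120 samples
    "ltp_gain": 2,         # long-term prediction gain code
    "rpe_grid": 2,         # which of the four pulse grids was chosen
    "block_max": 6,        # log-quantized block amplitude
    "rpe_pulses": 13 * 3,  # 13 pulse amplitudes at 3 bits each
}

lar_total = sum(LAR_BITS)
per_subframe = sum(SUBFRAME_BITS.values())
frame_total = lar_total + SUBFRAMES * per_subframe

print(f"{'LARs':>10}: {lar_total:3d} bits per frame "
      f"(levels: {[2 ** b for b in LAR_BITS]})")
for name, bits in SUBFRAME_BITS.items():
    print(f"{name:>10}: {bits:3d} x {SUBFRAMES} = {bits * SUBFRAMES:3d} bits")
print(f"{'total':>10}: {frame_total} bits per frame")
```

The frame-level LARs take 36 bits and each subframe 56 bits, so 36 + 4 × 56 = 260.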
Linear Prediction Analysis
In the linear prediction analysis stage of the Full Rate codec, the input speech signal undergoes pre-emphasis with a first-order high-pass FIR filter that boosts higher frequencies to compensate for the spectral tilt of voiced speech. The filter is H_p(z) = 1 - \beta z^{-1}, with \beta = 28180 \times 2^{-15} \approx 0.86, applied to the 13-bit linear PCM samples at an 8 kHz sampling rate. The autocorrelation of the pre-emphasized 160-sample frame (20 ms of speech) is then computed; the fixed-point specification applies no tapering window at this stage, only a dynamic scaling of the samples to prevent overflow.[2] The short-term predictor is modeled as an 8th-order all-pole filter whose coefficients are derived with the autocorrelation method. The autocorrelation function r(k) is computed for lags k = 0 to 8 as r(k) = \sum_{n=k}^{159} s(n) s(n-k), where s(n) is the pre-emphasized signal; this yields the nine autocorrelation values needed for an 8th-order analysis. These values are processed with the Schur recursion, a fixed-point-friendly alternative to the Levinson-Durbin recursion that yields the same solution, to obtain the eight reflection coefficients k_i (i = 1 to 8), which lie in the range -1 < k_i < 1 and act as the stage coefficients of a lattice filter.
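The analysis front end can be sketched in floating-point Python. This is not the bit-exact fixed-point procedure of GSM 06.10: it substitutes the textbook Levinson-Durbin recursion for the Schur recursion (both give the same reflection coefficients in exact arithmetic), computes the autocorrelation sums exactly as written above over whatever frame it is given (any windowing is left to the caller), and adopts the sign convention k_1 = r(1)/r(0), whereas some texts define reflection coefficients with the opposite sign. The test signal is an arbitrary tone plus noise.

```python
import math
import random

BETA = 28180 / 32768  # pre-emphasis coefficient, about 0.86
ORDER = 8             # short-term predictor order


def pre_emphasis(x, beta=BETA, prev=0.0):
    """Apply H(z) = 1 - beta * z^-1. `prev` is the last input sample of the previous frame."""
    out = []
    for sample in x:
        out.append(sample - beta * prev)
        prev = sample
    return out


def autocorrelation(s, order=ORDER):
    """r(k) = sum_{n=k}^{N-1} s(n) * s(n-k) for k = 0..order."""
    return [sum(s[n] * s[n - k] for n in range(k, len(s))) for k in range(order + 1)]


def levinson_durbin(r):
    """Return reflection coefficients k_1..k_p, predictor coefficients a_1..a_p,
    and the final prediction-error energy, for the predictor s(n) ~ sum_i a_i s(n-i).
    """
    p = len(r) - 1
    if r[0] <= 0.0:  # silent frame: nothing to predict
        return [0.0] * p, [0.0] * p, 0.0
    a = [0.0] * (p + 1)  # a[0] is unused
    refl = []
    err = r[0]
    for i in range(1, p + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        refl.append(k)
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k  # error energy shrinks by (1 - k^2) per stage
    return refl, a[1:], err


# Usage: a 440 Hz tone plus noise, one 160-sample frame at 8 kHz.
rng = random.Random(0)
frame = [1000 * math.sin(2 * math.pi * 440 * n / 8000) + rng.gauss(0, 100)
         for n in range(160)]
r = autocorrelation(pre_emphasis(frame))
k, a, err = levinson_durbin(r)
print("reflection coefficients:", [round(x, 3) for x in k])
print("residual energy / r(0):", round(err / r[0], 4))
```

Because each stage multiplies the error energy by 1 - k_i^2, reflection coefficients inside (-1, 1) guarantee a stable lattice, which is one reason the codec works with them rather than with the direct-form a_i.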
The reflection coefficients are then transformed into log-area ratios (LARs), \text{LAR}(i) = \frac{1}{2} \ln \left( \frac{1 + k_i}{1 - k_i} \right), a parameterization that is more uniform in perceptual sensitivity and better suited to quantization; the fixed-point specification evaluates this transform with a piecewise-linear approximation.[2][17] Quantization of the LPC parameters occurs once per frame on the LARs, using scalar quantizers whose scaling and offset are tailored to each coefficient's dynamic range and perceptual importance, with a total of 36 bits: 6 bits each for LARs 1 and 2 (64 levels), 5 bits each for LARs 3 and 4 (32 levels), 4 bits each for LARs 5 and 6 (16 levels), and 3 bits each for LARs 7 and 8 (8 levels). This allocation reflects the greater sensitivity of the lower-order coefficients to quantization error; the quantized LARs (\text{LAR}_c) are transmitted as part of the 260-bit frame. At the decoder (and, for the short-term analysis filter, also at the encoder), the decoded LARs are interpolated between consecutive frames over the first 40 samples of each frame, in segments weighted progressively from the previous frame's values toward the current frame's, and are then converted back to reflection coefficients for the short-term filters. The synthesis filter reconstructs the speech signal by inverse linear prediction, passing the excitation through an 8th-order all-pole lattice filter equivalent to \hat{s}(n) = \sum_{i=1}^{8} a_i \hat{s}(n-i) + u(n), where a_i are the predictor coefficients derived from the reflection coefficients and u(n) is the decoded excitation. The output is then de-emphasized with the inverse of the pre-emphasis filter, 1 / (1 - \beta z^{-1}), to restore the original spectral balance.[2]
Regular Pulse Excitation
In the Full Rate codec, Long Term Prediction (LTP) models the periodic components of the speech signal that remain in the short-term (LPC) residual. For each 40-sample subframe, the encoder searches lags from 40 to 120 samples for the one at which the past reconstructed residual best matches the current subframe, maximizing their cross-correlation (in effect, an adaptive-codebook search), and computes the associated prediction gain. The selected lag is coded with 7 bits per subframe (28 bits per frame) and the gain with 2 bits per subframe (8 bits per frame), for 36 bits of LTP parameters per frame. The scaled, delayed signal is then subtracted from the LPC residual to yield the LTP residual, which carries the non-periodic part of the excitation.[2]
The Regular Pulse Excitation (RPE) stage then represents the LTP residual as a sparse set of pulses on a regular grid. The 40-sample LTP residual is passed through a weighting filter and decimated by a factor of three into four candidate sequences, each of 13 samples at positions exactly three samples apart (m, m + 3, ..., m + 36 for grid offset m = 0 to 3). The sequence with the highest energy is selected, and its offset is coded with 2 bits per subframe. The 13 selected amplitudes are quantized with 3 bits each using adaptive pulse code modulation (APCM), for 39 bits per subframe (156 bits per frame). This captures the main energy concentrations of the residual with few bits.[2]
To normalize the pulses, the block maximum (the largest absolute amplitude in the selected sequence) is quantized on a roughly logarithmic scale with 6 bits per subframe. The amplitudes are divided by it before APCM quantization, and the decoder multiplies by it to restore the residual's dynamic range.[2]
In the decoder, the RPE pulses are first reconstructed by placing the dequantized amplitudes at their grid positions within the 40-sample subframe, with zeros elsewhere, after scaling by the decoded block maximum.
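The per-subframe LTP and RPE steps can be sketched in floating point. This is a simplified illustration, not the bit-exact GSM 06.10 procedure: it omits the weighting filter applied before grid selection, the APCM quantization of the pulses, and all fixed-point scaling. It also assumes decision thresholds of 0.2, 0.5, and 0.8 and reconstruction levels of 0.10, 0.35, 0.65, and 1.00 for the 2-bit LTP gain code; these mirror the reference tables as we understand them and should be checked against the specification. The function names (ltp_search, select_grid, place_pulses) are hypothetical.

```python
SUBFRAME = 40
MIN_LAG, MAX_LAG = 40, 120
PULSES, GRIDS = 13, 4
# Assumed 2-bit LTP gain quantizer: decision thresholds and reconstruction levels.
GAIN_THRESHOLDS = (0.2, 0.5, 0.8)
GAIN_LEVELS = (0.10, 0.35, 0.65, 1.00)


def ltp_search(d, history):
    """Return (lag, gain_code, prediction) for the residual subframe `d`.

    `history` holds at least MAX_LAG past reconstructed residual samples,
    oldest first; history[-1] is the sample immediately before d[0].
    """
    base = len(history)

    def delayed(lag):
        # The 40 past samples lying `lag` samples before d[0..39].
        return [history[base - lag + n] for n in range(SUBFRAME)]

    best_lag, best_corr = MIN_LAG, float("-inf")
    for lag in range(MIN_LAG, MAX_LAG + 1):
        corr = sum(x * y for x, y in zip(d, delayed(lag)))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    past = delayed(best_lag)
    energy = sum(x * x for x in past)
    gain = best_corr / energy if energy > 0 and best_corr > 0 else 0.0
    code = sum(gain > t for t in GAIN_THRESHOLDS)  # 0..3
    prediction = [GAIN_LEVELS[code] * x for x in past]
    return best_lag, code, prediction


def select_grid(x):
    """Choose offset m whose samples x[m], x[m+3], ..., x[m+36] carry the most energy."""
    def energy(m):
        return sum(x[m + 3 * i] ** 2 for i in range(PULSES))
    m = max(range(GRIDS), key=energy)
    return m, [x[m + 3 * i] for i in range(PULSES)]


def place_pulses(m, amplitudes):
    """Decoder side: put the 13 amplitudes on grid m, zeros elsewhere."""
    excitation = [0.0] * SUBFRAME
    for i, amp in enumerate(amplitudes):
        excitation[m + 3 * i] = amp
    return excitation


# Usage: one past pulse 60 samples back, echoed at 0.7 in the current subframe.
history = [0.0] * MAX_LAG
history[MAX_LAG - 60] = 1.0
d = [0.0] * SUBFRAME
d[0] = 0.7
lag, code, prediction = ltp_search(d, history)
ltp_residual = [x - p for x, p in zip(d, prediction)]

x = [0.1 if n % 3 == 2 else 0.0 for n in range(SUBFRAME)]
m, pulses = select_grid(x)
print(lag, code, round(ltp_residual[0], 2), m)
```

Because only one past sample is nonzero, the search settles on a lag of 60 with gain code 2 (a gain of 0.7 lies between the 0.5 and 0.8 thresholds), and the grid selection picks offset 2, whose pulses place_pulses puts back where they came from.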
The LTP contribution is then added: the past reconstructed excitation, delayed by the decoded lag and scaled by the decoded gain, is summed with the RPE pulses to yield the full excitation signal. This excitation is passed through the LPC synthesis filter to produce the output speech samples, followed by post-processing such as de-emphasis.[2]
Implementations
Software Libraries
The first open-source implementation of the GSM Full Rate speech codec was libgsm, developed between 1992 and 1994 by Jutta Degener and Carsten Bormann at Technische Universität Berlin.[18] This C library provides royalty-free encoding and decoding of GSM 06.10 audio and was fast enough for real-time processing on the low-end CPUs of the early 1990s.[18] ETSI released the official reference implementation as part of the GSM 06.10 specification, including fixed-point ANSI C code for the RPE-LTP transcoder and test vectors for verification.[2] This reference code serves as the baseline for compliant implementations and is freely available through ETSI.[2] Full Rate support also appears in multimedia tools: FFmpeg includes a native GSM decoder and can optionally be built with libgsm, and SoX (Sound eXchange) encodes and decodes the format through a GSM library.[19][20] Both tools support workflows such as converting uncompressed audio to GSM-compressed files. Licensing is permissive: libgsm uses a custom copyright notice that allows free use, modification, and distribution without royalties, while the ETSI reference code is available under ETSI terms.[21] The original RPE-LTP patents expired around 2010, removing any remaining patent encumbrance.[18] For command-line use, libgsm's toast utility compresses the files named on its command line (for example, toast input.au), replacing each with a counterpart carrying a .gsm extension, while SoX performs similar conversions with sox input.wav -r 8000 -c 1 output.gsm, resampling to the 8 kHz mono format the codec requires.[18][20] These tools facilitate audio archiving and legacy telephony applications.
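For archiving, a useful rule of thumb is the size of a raw .gsm file of the kind these tools write. The sketch below assumes libgsm's framing, in which each 260-bit frame is stored in 33 bytes (the codec bits plus a 4-bit signature nibble), so one second of speech (50 frames) occupies 1,650 bytes; a trailing partial frame is assumed to be padded to a full one. Other containers, such as the Microsoft WAV variant of GSM 06.10, pack frames differently and are not covered. The function names are hypothetical.

```python
import math

FRAME_BYTES = 33        # 260 codec bits + 4-bit signature = 264 bits
FRAMES_PER_SECOND = 50  # one frame per 20 ms


def gsm_file_size(duration_s):
    """Bytes needed for `duration_s` seconds, rounding up to whole frames."""
    frames = math.ceil(duration_s * FRAMES_PER_SECOND)
    return frames * FRAME_BYTES


def gsm_duration(size_bytes):
    """Playing time in seconds of a raw file made of whole 33-byte frames."""
    frames, remainder = divmod(size_bytes, FRAME_BYTES)
    if remainder:
        raise ValueError("size is not a whole number of 33-byte frames")
    return frames / FRAMES_PER_SECOND


pcm_bytes_per_second = 8000 * 2  # 16-bit linear PCM at 8 kHz, for comparison
print("one minute of GSM:", gsm_file_size(60), "bytes")
print("ratio to 16-bit PCM:", round(pcm_bytes_per_second / gsm_file_size(1), 2))
```

One minute of speech comes to 99,000 bytes, roughly a tenth of the same audio stored as 16-bit linear PCM.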