4B5B
4B/5B is a block coding scheme used in data communications that maps every group of 4 data bits (a nibble) to a predefined 5-bit code group, expanding the data stream by 25% in exchange for more reliable transmission over physical media.[1] The encoding supports clock recovery and signal synchronization by guaranteeing a minimum density of transitions in the transmitted signal: the data code groups are chosen so that the encoded stream never contains more than three consecutive zeros, and each data code group contains at least two ones, which become transitions once NRZI or MLT-3 line coding is applied. This prevents the long runs of identical signal levels that cause a receiver to lose timing.[1] The scheme is 80% efficient, since only 4 of every 5 transmitted bits carry data.[2] 4B/5B also defines special non-data symbols, such as Idle (I), the Start-of-Stream Delimiter (J/K), and the End-of-Stream Delimiter (T/R), which are used for frame delimiting, error detection, and link management.[2]

Introduced as part of standards for high-speed local area networks, 4B/5B is used in the Physical Coding Sublayer (PCS) of Fast Ethernet, defined by the IEEE 802.3u-1995 standard, where a 100 Mbps data rate is carried at a line rate of 125 Mbaud.[2] In 100BASE-TX over Category 5 twisted-pair cabling, the 4B/5B-encoded bits are scrambled to reduce electromagnetic interference and then sent with Multi-Level Transmit-3 (MLT-3) line coding, while 100BASE-FX over optical fiber applies Non-Return-to-Zero Inverted (NRZI) encoding to the 4B/5B output.[2] The technique is also a core component of the Fiber Distributed Data Interface (FDDI), an ANSI X3.166 standard for 100 Mbps token-passing networks over multimode fiber, where it is combined with NRZI to support ring topologies in backbone applications.[3] Although largely superseded by other encodings in Gigabit and faster Ethernet variants, 4B/5B remains influential for its role in bridging the gap from 10 Mbps to faster speeds while maintaining compatibility with existing physical layer principles.[4]

Fundamentals
Definition and Encoding Principle
4B5B is a block line code used in data communications that maps each group of 4 data bits, known as a nibble, to a unique 5-bit code group for transmission over a physical medium. The scheme introduces a 25% overhead, because the 5-bit symbols require a higher signaling rate than the original data rate; for example, to achieve 100 Mbps of data throughput, the physical layer must signal at 125 Mbaud (100 × 5/4). The 5-bit symbols are selected from the 32 possible combinations for properties that aid reliable transmission, such as sufficient signal transitions for clock synchronization.[3]

The core encoding principle is to divide the incoming data stream into 4-bit nibbles, either taken from a parallel interface or grouped from a serial bit stream, and then to substitute each nibble with a predefined 5-bit symbol via a lookup table. Each of the 16 possible 4-bit values (0000 to 1111) is assigned one of 16 carefully chosen 5-bit symbols designed to limit long runs of identical bits. The remaining 16 patterns are never used for data: some are reserved for control purposes and the rest are treated as invalid. Because the mapping is one-to-one, the encoded stream can be decoded unambiguously at the receiver.[3][5]

In operation, the encoder aggregates input bits into nibbles and applies the mapping table to generate the 5-bit output stream, which is then typically further encoded (e.g., using NRZI) for the physical medium. The decoder, conversely, identifies valid 5-bit symbols in the incoming stream, maps them back to the original 4-bit nibbles using the inverse table, and reassembles the data. For example, the nibble 1010 (hexadecimal A) is encoded as the 5-bit symbol 10110, which contains three ones and therefore produces three transitions under NRZI. This mechanism was originally specified in the ANSI FDDI standard and later incorporated into IEEE 802.3u for Fast Ethernet.[3][5][6]

Key Properties and Benefits
The 4B5B encoding scheme has a run-length limited (RLL) property: the data symbols are chosen so that no more than three consecutive zeros appear anywhere in the encoded stream, including across symbol boundaries. When the stream is sent with a transition-on-one line code such as NRZI, this guarantees a signal transition at least once every four bit periods.[6] The frequent transitions allow phase-locked loops (PLLs) to recover the clock from the data stream itself, without a separate clock signal.[7] By limiting long runs of identical bits, 4B5B improves signal integrity in both optical and electrical transmission media and reduces the risk of timing jitter.[8]

Regarding DC balance, the fixed 5-bit symbols offer bounded disparity rather than strict balance: individual data symbols contain between two and four ones, so the short-term density of ones ranges from 2/5 (40%) to 4/5 (80%).[7] This partial balancing limits baseline wander in AC-coupled systems, such as those using capacitors or transformers, and so supports stable transmission without excessive low-frequency distortion.[7] Because only 16 of the 32 possible 5-bit combinations are valid data symbols, the code also enables basic error detection: a received pattern that is not a valid code group signals a transmission error. Detection is only partial, however, since a bit error can also turn one valid symbol into another (for example, 10100 into 10101).[1]

Key benefits of 4B5B include good bandwidth efficiency, with only 25% overhead (5 bits transmitted for every 4 data bits) compared with the 100% overhead of Manchester encoding, which doubles the baud rate to achieve self-clocking.[7] This efficiency allows higher effective data rates over constrained media, while the self-clocking nature eliminates the need for dedicated clock lines, simplifying hardware design.[8] As a simpler predecessor to 8B10B, 4B5B uses fixed mappings without running-disparity management, which reduces encoding and decoding complexity at the cost of less stringent DC control.[8]

Encoding Details
Data Symbols
In 4B5B encoding, the data symbols are the core mechanism for transmitting payload information: each group of 4 bits (a nibble) from the input data stream is mapped to a specific 5-bit code group. This mapping ensures reliable transmission over the physical medium by guaranteeing sufficient signal transitions for clock recovery, at a line rate of 125 Mbaud for 100 Mbps of data. The 16 possible 4-bit data values, ranging from 0000 (hex 0) to 1111 (hex F), are encoded into predefined 5-bit patterns selected from the 32 possible 5-bit combinations to meet the encoding constraints.[9]

The complete mapping for the 16 data symbols is shown in the following table, including binary representations and hexadecimal equivalents. Symbol names use the conventional 4-bit hexadecimal notation (e.g., 0 for 0000), as defined in IEEE 802.3u for Fast Ethernet; FDDI uses similar mappings but assigns additional meanings to some codes for signaling.[10][3]

| 4-bit Data (Binary / Hex) | 5-bit Code (Binary / Hex) | Symbol Name |
|---|---|---|
| 0000 / 0 | 11110 / 1E | 0 |
| 0001 / 1 | 01001 / 09 | 1 |
| 0010 / 2 | 10100 / 14 | 2 |
| 0011 / 3 | 10101 / 15 | 3 |
| 0100 / 4 | 01010 / 0A | 4 |
| 0101 / 5 | 01011 / 0B | 5 |
| 0110 / 6 | 01110 / 0E | 6 |
| 0111 / 7 | 01111 / 0F | 7 |
| 1000 / 8 | 10010 / 12 | 8 |
| 1001 / 9 | 10011 / 13 | 9 |
| 1010 / A | 10110 / 16 | A |
| 1011 / B | 10111 / 17 | B |
| 1100 / C | 11010 / 1A | C |
| 1101 / D | 11011 / 1B | D |
| 1110 / E | 11100 / 1C | E |
| 1111 / F | 11101 / 1D | F |
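
The table above can be turned directly into a lookup-based encoder and decoder. The following Python sketch is illustrative only: the names `DATA_CODES`, `encode_bytes`, `decode_bits`, and `longest_zero_run` are hypothetical, and the choice to encode the low nibble of each byte first is an assumption (it matches the nibble order commonly described for the Fast Ethernet MII, but other systems may differ).

```python
# Data code groups copied from the table above, indexed by nibble value 0x0..0xF.
DATA_CODES = [
    0b11110, 0b01001, 0b10100, 0b10101,
    0b01010, 0b01011, 0b01110, 0b01111,
    0b10010, 0b10011, 0b10110, 0b10111,
    0b11010, 0b11011, 0b11100, 0b11101,
]
# Inverse table used by the decoder: 5-bit code group -> 4-bit nibble.
DECODE = {code: nibble for nibble, code in enumerate(DATA_CODES)}


def encode_bytes(data: bytes) -> str:
    """Return the code groups for `data` as a string of '0'/'1' characters.

    Assumption: the low nibble of each byte is encoded before the high nibble.
    """
    out = []
    for byte in data:
        out.append(format(DATA_CODES[byte & 0x0F], "05b"))
        out.append(format(DATA_CODES[byte >> 4], "05b"))
    return "".join(out)


def decode_bits(bits: str) -> bytes:
    """Inverse of encode_bytes; raises ValueError on a non-data code group."""
    if len(bits) % 10:
        raise ValueError("expected a whole number of 10-bit byte groups")
    nibbles = []
    for i in range(0, len(bits), 5):
        group = bits[i:i + 5]
        code = int(group, 2)
        if code not in DECODE:
            raise ValueError(f"invalid data code group {group}")
        nibbles.append(DECODE[code])
    # Recombine nibble pairs in the same low-then-high order used above.
    return bytes(lo | (hi << 4) for lo, hi in zip(nibbles[0::2], nibbles[1::2]))


def longest_zero_run(bits: str) -> int:
    """Length of the longest run of consecutive '0' characters."""
    return max(len(run) for run in bits.split("1"))
```

For instance, `encode_bytes(b"\x0a")` yields `1011011110`: the low nibble A becomes 10110 and the high nibble 0 becomes 11110. Encoding every byte value and passing the result to `longest_zero_run` gives 3, consistent with the run-length limit of three consecutive zeros.
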
Control and Command Symbols
In 4B5B encoding, six special control symbols (H, I, J, K, R, and T) are defined in addition to the 16 data symbols to handle framing, synchronization, idle periods, termination, and error signaling. Each is assigned a 5-bit pattern that is not used by any data symbol, so the decoder cannot mistake it for payload; several of them (J, K, H, and R) also contain runs of zeros that no data symbol is allowed to have. This design allows the physical coding sublayer (PCS) to insert control information transparently and without ambiguity. Any received 5-bit pattern that matches neither a data code nor a control code is categorized as an invalid symbol, V.[13]

The specific mappings for these symbols are as follows:

| Symbol | 5-Bit Binary Code | Hex | Equivalent 4-Bit Input (if applicable) | Role |
|---|---|---|---|---|
| I | 11111 | 1F | N/A | Idle (line state during no transmission) |
| J | 11000 | 18 | 0101 | First part of start delimiter |
| K | 10001 | 11 | 0101 | Second part of start delimiter |
| H | 00100 | 04 | N/A | Error propagation |
| T | 01101 | 0D | N/A | First part of end/terminate delimiter |
| R | 00111 | 07 | N/A | Second part of end/terminate delimiter |
| V | Various (unused 5-bit patterns, e.g., 00000, 00001) | N/A | N/A | Invalid or error-indicating received pattern |
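
As a rough illustration of how the control symbols delimit a stream, the Python sketch below wraps a list of data nibbles as idle, J/K, data, T/R, idle, and then recovers the payload on the receive side. It is a simplified model under stated assumptions: the function names (`classify`, `frame`, `extract_payload`) are hypothetical, code-group alignment is taken as already established, and preamble handling, scrambling, and line coding are omitted.

```python
# Data code groups indexed by nibble value, and the six control code groups,
# both copied from the tables in this section.
DATA_CODES = [
    0b11110, 0b01001, 0b10100, 0b10101,
    0b01010, 0b01011, 0b01110, 0b01111,
    0b10010, 0b10011, 0b10110, 0b10111,
    0b11010, 0b11011, 0b11100, 0b11101,
]
CONTROL_CODES = {
    "I": 0b11111, "J": 0b11000, "K": 0b10001,
    "H": 0b00100, "T": 0b01101, "R": 0b00111,
}
HEX_NAMES = set("0123456789ABCDEF")

# Reverse lookup: 5-bit code group -> symbol name ("0".."F" or a control letter).
NAME_BY_CODE = {code: format(n, "X") for n, code in enumerate(DATA_CODES)}
NAME_BY_CODE.update({code: name for name, code in CONTROL_CODES.items()})


def classify(code: int) -> str:
    """Name a received 5-bit code group; any unassigned pattern is 'V'."""
    return NAME_BY_CODE.get(code, "V")


def frame(nibbles, idle=2):
    """Return code groups for: idle, J, K, data nibbles, T, R, idle."""
    codes = [CONTROL_CODES["I"]] * idle
    codes += [CONTROL_CODES["J"], CONTROL_CODES["K"]]
    codes += [DATA_CODES[n] for n in nibbles]
    codes += [CONTROL_CODES["T"], CONTROL_CODES["R"]]
    codes += [CONTROL_CODES["I"]] * idle
    return codes


def extract_payload(codes):
    """Return the data nibbles between J/K and T/R, or raise ValueError."""
    names = [classify(c) for c in codes]
    for i in range(len(names) - 1):
        if names[i] == "J" and names[i + 1] == "K":
            start = i + 2
            break
    else:
        raise ValueError("no J/K start delimiter found")
    payload = []
    for j in range(start, len(names) - 1):
        if names[j] == "T" and names[j + 1] == "R":
            return payload
        if names[j] not in HEX_NAMES:
            raise ValueError(f"unexpected symbol {names[j]} inside frame")
        payload.append(int(names[j], 16))
    raise ValueError("no T/R end delimiter found")
```

With these definitions, `frame([0xA, 0x0, 0x5])` classifies as the symbol sequence I, I, J, K, A, 0, 5, T, R, I, I, and `extract_payload` returns the original three nibbles. Of the 32 possible patterns, 16 are data and 6 are control, so `classify` reports the remaining 10 as V.
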