Two-out-of-five code
The two-out-of-five code is a nonlinear binary constant-weight code consisting of the 10 codewords of length 5 and weight 2 (exactly two 1s per codeword), providing a unique encoding for the decimal digits 0 through 9.[1] This encoding scheme was employed in early computing systems for efficient decimal representation and basic error detection, as it allows verification that each digit maintains precisely two set bits, flagging deviations such as single-bit flips or unidirectional errors that alter the weight.[2] In the IBM 7070 data processing system, introduced in 1959, the code served as the internal representation for decimal digits in magnetic-core storage, arithmetic operations, and input/output devices like punched cards and magnetic tape, where it supported fixed-count checking to ensure data integrity during transfers.[2] The specific bit patterns for digits in the IBM 7070, for example, assigned 11000 to 0, 00110 to 6 (also used for the minus sign), and 10001 to 3 (also for the alpha sign), enabling compact storage of 10-digit words with a sign bit while minimizing errors in mechanical and electronic components.[2] Beyond computing hardware, the two-out-of-five code found prominent use in barcode symbologies for its simplicity and readability. In the United States Postal Service's POSTNET (Postal Numeric Encoding Technique) system, each digit is represented by five bars where exactly two are full-height (corresponding to 1s) and three are half-height (0s), facilitating automated mail sorting from the 1980s until its phase-out in favor of Intelligent Mail barcodes.[3] Similarly, the PLANET (Postal Alpha Numeric Encoding Technique) barcode adopted the same encoding for tracking outbound mail, while the numeric portions of Code 39 barcodes—widely used in logistics and inventory since the 1970s—map digit values to pairs of wide bars among five positions using this scheme.[1] These applications highlight the code's enduring value in error-detecting, machine-readable formats requiring exactly 10 distinct symbols.[1]Overview
Definition
The two-out-of-five code is a nonlinear binary constant-weight code of length 5, in which each valid codeword contains exactly two 1's and three 0's, thereby encoding information through the positions of the 1's.[1] This structure yields precisely 10 distinct codewords, as the number of ways to choose 2 positions out of 5 for the 1's is given by the binomial coefficient \binom{5}{2} = 10.[1] The fixed Hamming weight of 2 makes the code particularly suitable for representing the 10 decimal digits (0-9) in applications requiring reliable digit encoding.[1] In contrast to variable-weight codes, where the number of 1's can differ across codewords, the two-out-of-five code's constant weight enables inherent error detection by verifying that incoming words maintain exactly two 1's; any deviation signals a potential transmission error.[4] This property also simplifies hardware implementation, as the weight check can be performed efficiently with basic combinational logic circuits, such as adders or counters to tally the 1's.[4] A generic example of a codeword in this scheme is 11000, where the two 1's occupy the first and second bit positions, with the remaining bits set to 0; the exact mapping of such patterns to specific digits depends on the implementation context.[4]Encoding Principles
The two-out-of-five code encodes each decimal digit from 0 to 9 as a 5-bit binary codeword containing exactly two 1's and three 0's, yielding precisely ten unique combinations to match the decimal digits. This constant-weight structure ensures a fixed Hamming weight of 2 for every valid codeword, facilitating reliable representation in binary systems. The generic encoding process selects two distinct positions out of the five bits to place the 1's, with each digit mapped to a unique pair of positions to avoid ambiguity during decoding.[5] To enable efficient decoding, particularly in hardware or transmission contexts, bit positions are often assigned positional weights, such as powers of 2 or custom values, where the digit value corresponds to the sum of the weights in the positions set to 1. This weighted approach transforms the combinatorial selection into a numerical summation, allowing decoders to compute the digit directly from the codeword. For instance, in a scheme with weights w_1 = 0, w_2 = 1, w_3 = 2, w_4 = 3, w_5 = 6 (assigned from left to right), the digit value is calculated as the sum of the weights of the 1-bit positions, providing unique sums for most digits.[5][6] The digit 0 presents a special case, as no combination of two positive weights sums to zero, and an all-zero codeword would violate the two-1's rule. Thus, 0 is assigned a predefined pattern that does not conform to the summation rule, such as 01100 (positions 2 and 3 set to 1), which is reserved exclusively for zero despite potentially summing to 3 under the weighting scheme. This exception ensures all digits have distinct representations while maintaining the constant-weight property.[5]Variants
Telecommunication Variant
The telecommunication variant of the two-out-of-five code assigns weights of 0, 1, 2, 3, and 6 to the five bit positions, typically from left to right, to encode decimal digits 0 through 9 using exactly two 1s per codeword. This weighting scheme provides a unique identifier for each digit from 1 to 9 corresponding to the positions of the 1s, while digit 0 uses a codeword that would otherwise correspond to 3 (shared with one representation of 3, but reassigned for zero). Designed for punched cards and early data transmission in telecommunication systems during the 1940s, this variant prioritized readability and single-error detection in noisy channels, as employed by Bell Labs for reliable digit representation. Its constant weight of two 1s facilitated mechanical implementation, such as activating exactly two relays per digit in electromechanical setups, enabling straightforward detection and reducing wiring complexity in teletype and telegraph equipment.[7] The full encoding table for digits 0-9 is as follows, with binary representations read left to right (MSB to LSB) and identifiers based on the weights:| Digit | Binary Code | 1s Positions (Weights) | Identifier |
|---|---|---|---|
| 0 | 01100 | 2nd (1), 3rd (2) | 3 |
| 1 | 11000 | 1st (0), 2nd (1) | 1 |
| 2 | 10100 | 1st (0), 3rd (2) | 2 |
| 3 | 10010 | 1st (0), 4th (3) | 3 |
| 4 | 01010 | 2nd (1), 4th (3) | 4 |
| 5 | 00110 | 3rd (2), 4th (3) | 5 |
| 6 | 10001 | 1st (0), 5th (6) | 6 |
| 7 | 01001 | 2nd (1), 5th (6) | 7 |
| 8 | 00101 | 3rd (2), 5th (6) | 8 |
| 9 | 00011 | 4th (3), 5th (6) | 9 |
POSTNET Variant
The POSTNET (Postal Numeric Encoding Technique) variant of the two-out-of-five code was developed by the United States Postal Service (USPS) for encoding numeric ZIP Code information in barcodes on mail pieces to facilitate automated sorting.[8] This adaptation uses a binary representation where each digit from 0 to 9 is encoded into a five-bar pattern consisting of exactly two tall bars (representing 1s) and three short bars (representing 0s), ensuring the constant weight property of the code.[8] Unlike binary transmission variants, the POSTNET encoding is visualized vertically, with tall bars at full height (approximately 0.125 inches) and short bars at half height (approximately 0.050 inches), all with a uniform width of about 0.020 inches.[8] The encoding assigns weights of 7, 4, 2, 1, and 0 to the five bar positions from left to right, allowing the positions of the two tall bars to represent values from 0 to 9 through the sum of their weights (with digit 0 using a special combination summing to 11).[9] This weighted scheme ensures each digit has a unique pattern, and a check digit is appended to make the total sum of all digits a multiple of 10 for error detection.[8] The full encoding table for digits 0–9 is as follows, where 1 indicates a tall bar and 0 a short bar:| Digit | Pattern |
|---|---|
| 0 | 11000 |
| 1 | 00011 |
| 2 | 00101 |
| 3 | 00110 |
| 4 | 01001 |
| 5 | 01010 |
| 6 | 01100 |
| 7 | 10001 |
| 8 | 10010 |
| 9 | 10100 |
IBM 707x Variant
The IBM 707x variant of the two-out-of-five code was utilized in the IBM 7070, 7072, and 7074 mainframe computers for internal data representation in core storage. Each word comprised 10 decimal digits plus a sign, with every digit encoded across five bit positions designated by weights 0, 1, 2, 3, and 4—corresponding to binary places from least significant bit (weight 0) to most significant (weight 4). This constant-weight scheme set exactly two bits to 1 per digit, enabling basic error detection by verifying the bit count during operations, which was advantageous given the era's core memory costs and reliability needs. The encoding prioritized density, packing 55 bits per word (50 for digits + 5 for sign handling, though sign used three bits).[2] The specific bit patterns for digits 0–9 in this variant followed a systematic assignment of the 10 possible two-1 combinations across the five positions, adapted for efficient numeric processing distinct from external input formats. For example:| Digit | Binary (MSB to LSB: 4-3-2-1-0) | Positions On (Weights) |
|---|---|---|
| 0 | 00011 | 0, 1 |
| 1 | 00101 | 0, 2 |
| 2 | 00110 | 1, 2 |
| 3 | 01001 | 0, 3 |
| 4 | 01010 | 1, 3 |
| 5 | 01100 | 2, 3 |
| 6 | 10001 | 0, 4 |
| 7 | 10010 | 1, 4 |
| 8 | 10100 | 2, 4 |
| 9 | 11000 | 3, 4 |
Code 39 Variant
The Code 39 variant of the two-out-of-five code is integrated into the Code 39 barcode symbology, a discrete, self-checking alphanumeric standard developed by Intermec in 1974 and later standardized under ANSI MH10.8M-1983. This variant specifically supports the numeric subset for digits 0-9 by encoding two wide bars out of five bar positions, enabling 10 unique combinations suitable for decimal representation within the broader 43-character set of Code 39, which includes uppercase letters and special symbols.[10] In this encoding, the five bar positions are assigned equivalence values of 1, 2, 4, 7, and 0 from left to right, with the final position functioning as a parity bit to enhance error detection. The positions of the two wide bars are selected to correspond to unique identifiers from 1 to 10 (with 10 representing digit 0) based on the equivalence values of the positions. The overall character structure in Code 39 consists of nine elements—five bars and four intervening spaces—with exactly three wide elements total: two among the bars (via the two-out-of-five logic) and one among the spaces to convey additional information bits.[10] Code 39 barcodes are framed by start and stop characters represented by the asterisk (*), which feature a distinctive asymmetric pattern (narrow, narrow, wide, wide, narrow for the bars) not derived from the two-out-of-five numeric encoding. This delimiter ensures proper scan initiation, termination, and directionality without being interpreted as data.[11]Properties
Error Detection Capabilities
The two-out-of-five code utilizes a constant Hamming weight of two in each five-bit codeword, which facilitates error detection by verifying that the received word maintains exactly two ones. A single-bit error, such as flipping a zero to one or a one to zero, inevitably alters this weight to either one or three ones, allowing the error to be detected through a simple parity check on the number of ones. This mechanism ensures that all single-bit errors are reliably identified without requiring additional parity bits.[6][1] The code also excels in detecting unidirectional errors, where all bit flips occur in the same direction (either all from zero to one or all from one to zero). Any number of such errors will change the weight from two—for instance, one or more zeros flipping to ones results in a weight of three or higher, or one or more ones flipping to zeros results in a weight of one or zero—making all unidirectional errors detectable via the weight check.[1] Double-bit errors are not guaranteed to be detected, as certain combinations can preserve the weight of two but map to another valid codeword (e.g., one zero-to-one and one one-to-zero flip might keep the weight but result in a different decimal digit). Other double errors alter the weight (e.g., two zero-to-one flips result in weight four), making them detectable. Since all possible weight-two words are valid codewords, any weight-preserving double error shifts to another valid codeword and is thus undetectable, while weight-changing ones are detected.[6][1] Compared to traditional even or odd parity schemes, which add an extra bit to detect only an odd number of errors in binary data, the two-out-of-five code provides more efficient error detection for decimal encoding within a fixed five-bit structure. It inherently checks for exact weight compliance without extra overhead, offering stronger protection against single errors and all unidirectional errors in applications like digit representation.[6]Hamming Distance and Weight
The two-out-of-five code is a constant-weight code in which every codeword has a Hamming weight of exactly 2, meaning each 5-bit codeword contains precisely two 1s and three 0s. This fixed weight property ensures that the 10 codewords, corresponding to the decimal digits 0 through 9, are selected from the \binom{5}{2} = 10 possible combinations of positions for the two 1s.[12] The minimum Hamming distance of the code is 2, as any two distinct codewords differ in at least two bit positions. This arises because distinct codewords share at most one position with a 1; if they shared both, they would be identical. For two codewords with supports (positions of 1s) A and B, the Hamming distance d is given byd = 2 \times (2 - |A \cap B|) ,
where |A \cap B| is the size of the intersection of their supports. The minimum occurs when |A \cap B| = 1, yielding d = 2; if |A \cap B| = 0, then d = 4.[12] These properties imply that the code can detect up to 1 error but correct 0 errors, consistent with the general rule for binary codes where the error detection capability is d - 1 and correction capability is \lfloor (d - 1)/2 \rfloor. For constant-weight codes like this one, the Singleton bound further confirms the minimum distance aligns with the code's parameters, as d \leq n - \log_2 M + 1 (here n=5, M=10, yielding d \leq 2), and the achieved d=2 satisfies this limit while enabling single-error detection.[12]