Shannon–Fano coding
Shannon–Fano coding is a variable-length prefix coding technique used in lossless data compression to represent symbols from a source alphabet with binary codewords, where more probable symbols receive shorter codes to approximate the theoretical minimum average code length given by the source's entropy.[1]
The method, named after its co-developers Claude E. Shannon and Robert M. Fano, emerged in the late 1940s as part of early work in information theory. Shannon first outlined the approach in his 1948 paper "A Mathematical Theory of Communication", crediting Fano for independently devising a substantially equivalent encoding scheme that divides messages into probability-balanced groups and assigns binary digits recursively.[1] Fano elaborated on the technique in his 1949 MIT technical report "The Transmission of Information", providing a practical algorithm for constructing such codes. This development laid foundational groundwork for source coding, enabling efficient representation of data by exploiting symbol probabilities without loss of information.[1]
The resulting binary tree ensures the prefix property, allowing instantaneous and unambiguous decoding from left to right.[2] While the codes are typically within one bit of the optimal length, the greedy partitioning can lead to slight inefficiencies compared to later methods like Huffman coding, which builds trees bottom-up for guaranteed optimality.[3]
Shannon–Fano coding exemplifies entropy coding principles central to information theory, influencing subsequent compression algorithms and applications in telecommunications, file archiving, and multimedia encoding.[1] Its simplicity and near-optimality made it a precursor to more advanced techniques, though modern implementations often favor Huffman or arithmetic coding for better performance.[3]
Background
Prefix codes
Prefix codes are variable-length codes designed for lossless data compression and transmission, defined by the property that no codeword is a prefix of another codeword in the code set. This ensures unique decodability, allowing the original sequence of symbols to be recovered unambiguously from the concatenated codewords without requiring separators or lookahead beyond the current codeword.[1]
The prefix condition facilitates instantaneous decoding, in which each codeword can be recognized and decoded immediately upon completion, independent of subsequent bits. This property supports efficient, real-time decoding of a transmitted stream, as the decoder need not buffer subsequent bits or look ahead to resolve ambiguities.[4]
A fundamental result characterizing prefix codes is the Kraft inequality, which asserts that for any binary prefix code with codeword lengths l_1, l_2, \dots, l_n,
\sum_{i=1}^n 2^{-l_i} \leq 1.
This provides both a necessary condition that any set of prefix code lengths must satisfy and a sufficient condition for the existence of a prefix code achieving those lengths.[5]
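As a concrete illustration, a brief Python sketch (with illustrative codeword lengths, not drawn from any particular source) can check whether a given set of lengths admits a binary prefix code:

def satisfies_kraft(lengths):
    # Kraft sum for a binary alphabet; a value of at most 1 means a prefix code
    # with exactly these codeword lengths exists.
    return sum(2 ** -l for l in lengths) <= 1

print(satisfies_kraft([1, 2, 3, 3]))   # True: realized, e.g., by codewords 0, 10, 110, 111
print(satisfies_kraft([1, 1, 2]))      # False: no binary prefix code has these lengths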
The notion of prefix codes traces back to early communication systems predating formal information theory, such as the variable-length Morse code invented in 1837, which used timing-based separations to achieve unique decodability akin to the prefix property.[6] Such codes enable compression efficiencies approaching the source entropy, the theoretical lower bound on average code length.[1]
Entropy and code efficiency
In information theory, Shannon entropy quantifies the average uncertainty or information content associated with a discrete random variable X representing the output of an information source. For a discrete memoryless source with symbols having probabilities p_i > 0 where \sum p_i = 1, the entropy H(X) is defined as
H(X) = -\sum_i p_i \log_2 p_i,
measured in bits per symbol.[1] This formula arises from the need to measure the expected amount of information needed to specify the outcome of a random event, with the base-2 logarithm reflecting binary encoding.[1]
The value of H(X) represents the average information content per symbol produced by the source, serving as a fundamental limit on the compressibility of the data.[7] For example, a source emitting symbols with equal probability p_i = 1/n yields H(X) = \log_2 n bits, indicating maximum uncertainty, while a deterministic source (p_1 = 1) has H(X) = 0, requiring no information to describe.[1] Prefix codes, which ensure instantaneous decodability, provide a practical means to approach this entropy bound by assigning shorter codewords to more probable symbols.[7]
In the context of source coding, optimal prefix codes achieve an average codeword length L satisfying H(X) \leq L < H(X) + 1 bits per symbol, demonstrating that entropy sets the theoretical minimum for lossless compression.[1] This inequality, part of the noiseless coding theorem (also known as Shannon's source coding theorem), establishes that no uniquely decodable code can have an average length below H(X), while efficient codes can get arbitrarily close to it with sufficiently long blocks.[4] The theorem thus bounds the efficiency of compression schemes, highlighting entropy as the irreducible limit for data representation without loss.[7]
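The definitions above can be checked numerically; the following sketch uses an arbitrary four-symbol distribution and the lengths of one prefix code for it, both chosen only for illustration:

from math import log2

def entropy(probs):
    # Shannon entropy of a distribution, in bits per symbol.
    return -sum(p * log2(p) for p in probs if p > 0)

probs = [0.4, 0.25, 0.2, 0.15]
lengths = [1, 2, 3, 3]                              # lengths of one prefix code for this source
H = entropy(probs)                                  # about 1.90 bits per symbol
L = sum(p * l for p, l in zip(probs, lengths))      # 1.95 bits, satisfying H <= L < H + 1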
History
Shannon's contributions
Claude Shannon's foundational work on coding theory emerged from his research at Bell Laboratories in the years following World War II, where he focused on improving the efficiency of communication systems amid growing demands for reliable information transmission.[8] As a mathematician and engineer, Shannon sought to quantify the limits of data compression and error-free signaling, drawing on probabilistic models to address practical challenges in telephony and telegraphy.[1] This effort culminated in his seminal 1948 paper, "A Mathematical Theory of Communication," published in the Bell System Technical Journal, which introduced key concepts in information theory, including entropy as a measure of uncertainty in a source and the principles of source coding.[9]
In the paper, Shannon defined the entropy H of a discrete information source with symbols occurring at probabilities p_i as
H = -\sum_i p_i \log_2 p_i,
representing the average information content per symbol in bits.[1] He established the source coding theorem, asserting that the minimum average code length for uniquely decodable codes is bounded by the entropy H, with efficient codes achieving lengths arbitrarily close to this limit for large block sizes.[9] This theorem provided a theoretical foundation for lossless data compression, emphasizing that redundancy in sources could be exploited to reduce transmission rates without information loss.
Shannon proposed that the code length for each symbol should be approximately equal to -\log_2 p_i, the information content of that symbol, rounded up to an integer number of bits while maintaining the prefix property for instantaneous decodability.[1] This assignment ensures that more probable symbols receive shorter codes, minimizing the expected code length to approach the entropy bound.[9]
To construct such codes, Shannon outlined a method involving arranging the symbols in decreasing order of probability and encoding each using the binary expansion of the cumulative probability up to the preceding symbol, which yields code lengths close to the ideal -\log_2 p_i.[1] Although presented as a conceptual approach rather than a fully detailed algorithm, it laid the groundwork for practical coding techniques in information theory, and later editions of the paper noted its substantial equivalence to a method independently developed by Robert M. Fano.[1]
Fano's contributions
Robert M. Fano made significant contributions to the development of practical source coding techniques shortly after Claude E. Shannon's introduction of information theory, with his work appearing in a 1949 technical report from the MIT Research Laboratory of Electronics.[10] In this report, titled The Transmission of Information, Fano presented adaptive coding methods that built directly on Shannon's entropy concept, providing algorithmic approaches to achieve coding efficiency close to the theoretical entropy bound for discrete sources.[10] Funded under joint contracts from the U.S. Army Signal Corps, Navy, and Air Force that continued MIT's wartime Radiation Laboratory program, Fano's research emphasized constructive, implementable procedures for information transmission, contrasting with the more theoretical focus of prior work.[10]
A key innovation in Fano's 1949 report was the introduction of the binary splitting method for encoding symbols with unequal probabilities, where the symbols are sorted by probability and recursively divided into two subsets whose total probabilities are as equal as possible.[10] This partitioning process assigns code lengths proportional to the negative logarithm of the probabilities, enabling variable-length prefix codes that minimize average code length while ensuring instantaneous decodability.[10] Fano demonstrated through examples how this method approximates the entropy limit, offering a straightforward computational alternative to exhaustive optimization.[10]
Fano's contributions extended beyond noiseless coding to practical algorithms for noisy channels, where he explored recoding strategies to maximize mutual information under constraints like fixed power and bandwidth.[10] By analyzing binary symmetric channels and pulse-amplitude modulation, Fano highlighted how source coding could be combined with channel adaptations to improve reliability, laying groundwork for later error-correcting code designs.[10] This emphasis on actionable engineering solutions within Shannon's framework influenced subsequent developments in communication systems at MIT and beyond.[10]
Algorithms
Shannon's method
Shannon's method constructs prefix codes by assigning to each symbol a codeword derived from the binary expansion of its cumulative probability. Symbols are sorted in decreasing order of probability, and the code for the i-th symbol is the binary expansion of the cumulative probability of all preceding symbols, truncated to length l_i = \lceil -\log_2 p_i \rceil, which is enough to guarantee the prefix property. This direct assignment yields an average code length less than H + 1, where H is the entropy, but it does not always achieve the minimum attainable by an optimal prefix code.[1]
The process avoids recursive partitioning and instead uses interval mapping:
- Sort symbols in decreasing order of probability: p_1 \geq p_2 \geq \cdots \geq p_n > 0.
- Compute cumulative probabilities c_0 = 0, c_i = \sum_{j=1}^i p_j for i = 1 to n.
- For each symbol i, the codeword consists of the first l_i = \lceil -\log_2 p_i \rceil binary digits of c_{i-1} after the binary point; because every later cumulative probability exceeds c_{i-1} by at least p_i \geq 2^{-l_i}, the truncated expansions never share a prefix.
This ensures the Kraft inequality \sum 2^{-l_i} \leq 1 and prefix-free decoding. Unlike Fano's recursive splitting, Shannon's approach focuses on probabilistic intervals for straightforward code generation, though it can lead to longer codes in some cases compared to bottom-up methods.[11]
The following Python implementation sketches the codeword assignment:

from math import ceil, log2

def shannon_codes(symbols, probs):
    # Sort by decreasing probability so that codeword lengths are non-decreasing.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    sorted_symbols = [symbols[i] for i in order]
    sorted_probs = [probs[i] for i in order]
    # cumulative[i] holds the total probability of all symbols preceding symbol i.
    cumulative = [0.0]
    for p in sorted_probs:
        cumulative.append(cumulative[-1] + p)
    codes = {}
    for i, symbol in enumerate(sorted_symbols):
        start = cumulative[i]
        length = ceil(-log2(sorted_probs[i]))
        # Binary expansion of start, keeping `length` bits after the binary point.
        bits, frac = [], start
        for _ in range(length):
            frac *= 2
            bit = int(frac)
            bits.append(str(bit))
            frac -= bit
        codes[symbol] = "".join(bits)
    return codes
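A minimal usage sketch, assuming the shannon_codes function above, reproduces the codes derived in the worked example later in this article:

probs = {"A": 0.4, "B": 0.25, "C": 0.2, "D": 0.15}
codes = shannon_codes(list(probs), list(probs.values()))
# codes == {"A": "00", "B": "01", "C": "101", "D": "110"}
average_length = sum(probs[s] * len(codes[s]) for s in probs)   # 2.35 bits per symbol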
Fano's method
Fano's method constructs a binary prefix code by recursively partitioning the set of symbols into subsets with approximately equal total probabilities, building a decision tree directly from the probability distribution. The process begins with all symbols sorted in descending order of probability and proceeds by dividing them into two groups whose summed probabilities are as balanced as possible, assigning a binary digit (0 or 1) to each branch. This recursive splitting continues until each leaf node corresponds to a single symbol, with codewords formed by the path from the root to the leaf.[11][12]
The algorithm steps are as follows:
- Sort the symbols in decreasing order of their probabilities: p_1 \geq p_2 \geq \cdots \geq p_r, where r is the number of symbols.[11][12]
- Partition the ordered list into two subsets A and B such that the total probability of A (the sum of probabilities of symbols in A) is as close as possible to half the total probability of all symbols, minimizing the absolute difference |\sum_{i \in A} p_i - \sum_{i \in B} p_i|. In case of ties, select the split at the first point where the cumulative probability is at least half the total.[11][12]
- Assign 0 to the branch leading to subset A and 1 to the branch leading to subset B (or vice versa, though conventionally 0 for the higher-probability subset).[13]
- Recur on each subset independently until every subset contains only one symbol.[11][12]
The following Python implementation sketches the recursive partitioning process:

def build_fano_tree(symbols, probs, prefix=""):
    # Assumes symbols and probs are already sorted by decreasing probability.
    if len(symbols) == 1:
        return {symbols[0]: prefix}
    total_prob = sum(probs)
    # cumulative[k] is the total probability of the first k + 1 symbols.
    cumulative = [sum(probs[:k + 1]) for k in range(len(probs))]
    # Choose the split that makes the two groups' total probabilities as equal as
    # possible; on a tie, prefer the split whose left group reaches at least half
    # of the total probability.
    min_diff = float("inf")
    split_idx = 1
    for k in range(1, len(probs)):
        left = cumulative[k - 1]          # probability mass of symbols[:k]
        diff = abs(2 * left - total_prob)
        if diff < min_diff or (diff == min_diff and left >= total_prob / 2):
            min_diff = diff
            split_idx = k
    codes = {}
    codes.update(build_fano_tree(symbols[:split_idx], probs[:split_idx], prefix + "0"))
    codes.update(build_fano_tree(symbols[split_idx:], probs[split_idx:], prefix + "1"))
    return codes
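A short usage sketch, assuming the build_fano_tree function above and input already sorted by decreasing probability, reproduces the codes from Fano's worked example below:

symbols = ["A", "B", "C", "D"]          # sorted by decreasing probability
probs = [0.4, 0.25, 0.2, 0.15]
codes = build_fano_tree(symbols, probs)
# codes == {"A": "0", "B": "10", "C": "110", "D": "111"}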
This direct tree-building approach can result in codeword lengths that are slightly suboptimal compared to entropy bounds, as the greedy split decisions may not always yield the globally optimal prefix code, with average length bounded by H + 2.[11][12] The codeword length for each symbol is determined by the depth of its leaf in the resulting tree.[12]
Tree representation
Shannon–Fano codes are represented as binary trees, where the root node corresponds to the full set of symbols and their probabilities, and each internal node branches into two child nodes labeled 0 (left) and 1 (right).[10] The unique path from the root to a leaf node, formed by the sequence of branch labels, defines the binary codeword for the symbol at that leaf.[14] This structure ensures that codewords are uniquely decodable, as the tree encodes the mapping between symbols and their variable-length codes.
The depth of each leaf in the tree directly corresponds to the length of the codeword for its symbol; shorter paths (shallower leaves) are assigned to more probable symbols to minimize the average code length. In Shannon–Fano trees, the branching aims for balanced splits that approximate equal probability masses on each side, leading to relatively uniform subtree probabilities and near-optimal code lengths for many distributions.[10]
A generic Shannon–Fano tree can be visualized in text form as follows, for a simple alphabet {A, B, C, D} with decreasing probabilities:
      Root
     /    \
    0      1
   / \    / \
  A   B  C   D
Here, A and B share the left subtree (code prefix 0), while C and D share the right (prefix 1), with codewords 00, 01, 10, and 11, respectively; deeper trees would extend branches unevenly based on probabilities.[15]
The prefix property—that no codeword is a prefix of another—is inherently ensured by constructing the tree as a full binary tree, where every internal node has exactly two children and no unary branches exist, preventing any leaf from lying on the path to another leaf.[14] This full structure guarantees instantaneous decodability during encoding and decoding processes.
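To illustrate instantaneous decoding from such a tree, the following sketch (a minimal example using a codeword table rather than an explicit tree structure) emits each symbol as soon as a complete codeword has been read:

def decode(bitstring, codes):
    # Invert the symbol-to-codeword mapping; a lookup succeeds exactly when a full
    # codeword has been read, which is unambiguous because the code is prefix-free.
    inverse = {code: symbol for symbol, code in codes.items()}
    decoded, current = [], ""
    for bit in bitstring:
        current += bit
        if current in inverse:
            decoded.append(inverse[current])
            current = ""
    return decoded

# With codes {"A": "0", "B": "10", "C": "110", "D": "111"}:
# decode("010110111", codes) returns ["A", "B", "C", "D"].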
Examples
Shannon's coding example
To illustrate Shannon's method for constructing a prefix code, consider a source with four symbols A, B, C, and D having probabilities 0.4, 0.25, 0.2, and 0.15, respectively.
The symbols are first sorted in decreasing order of probability: A (0.4), B (0.25), C (0.2), D (0.15). The cumulative probability of all preceding symbols is then computed for each: 0 for A, 0.4 for B, 0.65 for C, and 0.85 for D.
Each codeword length is set to l_i = \lceil -\log_2 p_i \rceil. Since -\log_2 0.4 \approx 1.32 and -\log_2 0.25 = 2, A and B receive 2-bit codewords; since -\log_2 0.2 \approx 2.32 and -\log_2 0.15 \approx 2.74, C and D receive 3-bit codewords.
The codeword for each symbol consists of the first l_i bits of the binary expansion of its preceding cumulative probability: the expansion of 0 gives 00 for A, the expansion of 0.4 (binary 0.0110...) gives 01 for B, the expansion of 0.65 (binary 0.1010...) gives 101 for C, and the expansion of 0.85 (binary 0.1101...) gives 110 for D.
The resulting binary codes are prefix-free. The code table is as follows:
| Symbol | Probability | Preceding cumulative | Codeword | Length |
|---|---|---|---|---|
| A | 0.4 | 0.00 | 00 | 2 |
| B | 0.25 | 0.40 | 01 | 2 |
| C | 0.2 | 0.65 | 101 | 3 |
| D | 0.15 | 0.85 | 110 | 3 |
The average codeword length is 0.4 \times 2 + 0.25 \times 2 + 0.2 \times 3 + 0.15 \times 3 = 2.35 bits per symbol, somewhat above the source entropy of about 1.90 bits; Fano's method, illustrated next, achieves 1.95 bits on the same distribution.
Fano's coding example
To illustrate Fano's method, consider the same symbol distribution as in the Shannon coding example: symbols A, B, C, and D with probabilities 0.4, 0.25, 0.2, and 0.15, respectively. The symbols are first sorted in decreasing order of probability: A (0.4), B (0.25), C (0.2), D (0.15).
Fano's algorithm proceeds by recursively dividing the sorted list into two subgroups whose total probabilities are as equal as possible. The initial split occurs after the first symbol, separating A (probability 0.4) from the group {B, C, D} (total probability 0.6), as this minimizes the difference between subgroup sums compared to alternatives like {A, B} (0.65) versus {C, D} (0.35). The left subgroup (A) is assigned prefix 0, and the right subgroup ({B, C, D}) is assigned prefix 1. Next, the right subgroup is split after B (0.25) from {C, D} (0.35), again choosing the division closest to equal probabilities within that group. The {C, D} subgroup is then divided into individual symbols C (0.2) and D (0.15), with C assigned 0 and D assigned 1 relative to their prefix.
This recursive splitting constructs a binary tree where each leaf corresponds to a symbol, and the code for each symbol is the sequence of branch labels (0 for left, 1 for right) from root to leaf; see the Tree representation section above for details on this structure. The resulting prefix codes are A: 0, B: 10, C: 110, and D: 111, with codeword lengths of 1, 2, 3, and 3 bits, respectively.
The code table below summarizes the symbols, probabilities, splits, and assigned codes, highlighting the choices made at each division:
| Symbol | Probability | Initial Group Split | Subgroup Split | Code |
|---|---|---|---|---|
| A | 0.4 | A vs. {B, C, D} (0.4 vs. 0.6) | N/A | 0 |
| B | 0.25 | {B, C, D} prefix 1 | B vs. {C, D} (0.25 vs. 0.35) | 10 |
| C | 0.2 | {C, D} prefix 11 | C vs. D (0.2 vs. 0.15) | 110 |
| D | 0.15 | {C, D} prefix 11 | C vs. D (0.2 vs. 0.15) | 111 |
These codes ensure the prefix property, allowing unambiguous decoding.
Analysis
Expected length calculation
The average codeword length L for a Shannon–Fano code is given by the formula
L = \sum_i p_i l_i,
where p_i is the probability of occurrence of symbol i and l_i is the length (in bits) of the codeword assigned to that symbol. This expected length measures the code's efficiency, representing the average number of bits needed to encode a single symbol drawn from the source distribution.[1]
Following construction of the code, whether via Shannon's cumulative-probability assignment or Fano's recursive partitioning, the codewords correspond to paths from the root of the code tree to its leaves, with l_i equal to the depth of the leaf for symbol i. The lengths l_i are then weighted by the source probabilities p_i and summed to obtain L, providing a direct assessment of compression performance.[1][10]
The redundancy of the code, defined as L - H(X), quantifies the excess length over the theoretical minimum, where H(X) is the entropy of the source distribution.
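A minimal sketch of the calculation, using the probabilities and Fano code lengths from the examples above:

from math import log2

probs = [0.4, 0.25, 0.2, 0.15]
lengths = [1, 2, 3, 3]                               # leaf depths of the Fano code tree

L = sum(p * l for p, l in zip(probs, lengths))       # expected length: 1.95 bits per symbol
H = -sum(p * log2(p) for p in probs)                 # entropy: about 1.90 bits per symbol
redundancy = L - H                                   # roughly 0.05 bits per symbol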
Optimality bounds
Shannon–Fano coding produces prefix codes whose average codeword length L satisfies H(X) \leq L < H(X) + 1, where H(X) denotes the entropy of the source, although the resulting codes are not always optimal within that range.[11] The suboptimality stems from rounding the ideal code lengths -\log_2 p_i up to integers and from greedy decisions when partitioning symbol sets during tree construction, which can leave individual code lengths nearly 1 bit longer than their theoretical ideal for unfavorable probability distributions.[11] These factors prevent Shannon–Fano codes from achieving the minimal possible L guaranteed by more refined methods such as Huffman coding, particularly when symbol probabilities are uneven.
In Shannon's variant, each code length \lceil -\log_2 p_i \rceil rounds the ideal length up to an integer, so the lengths always satisfy the Kraft inequality, but a symbol's codeword can be nearly one bit longer than its information content whenever its probability is not a negative power of two.[15] Fano's variant, by contrast, builds the code tree through recursive splits aiming for equal cumulative probabilities on each branch, yet imbalances in these splits, often unavoidable with discrete probabilities, can force low-probability symbols into deeper tree levels, inflating their code lengths and thus the overall average L.[11]
Asymptotically, Shannon–Fano coding approaches optimality when applied to large alphabets with smoothly decreasing probabilities, such as geometric distributions, where the redundancy L - H(X) stabilizes around 0.5 bits but the relative inefficiency diminishes as the alphabet size grows, leveraging the law of large numbers in block extensions.[16] This behavior highlights its utility for practical scenarios with expansive symbol sets, despite persistent gaps from split-induced variances.[16]
Comparisons
Versus Huffman coding
Huffman coding, introduced by David A. Huffman in 1952, constructs an optimal prefix code through a bottom-up process that repeatedly merges the two symbols or subtrees with the lowest probabilities, building a binary tree that minimizes the weighted path length.[17] This greedy approach ensures the resulting code is uniquely decodable and achieves the minimum possible average codeword length among all prefix codes for a given probability distribution.[17]
In contrast, Shannon–Fano coding employs a top-down strategy, either by assigning codes based on cumulative probabilities (Shannon's method) or by recursively partitioning symbols into groups of roughly equal total probability (Fano's method), which approximates but does not guarantee the optimal code lengths.[1][11] While both methods produce prefix codes, Huffman's exact minimization of the expected length via dynamic merging outperforms Shannon–Fano's partitioning, which can lead to suboptimal splits, particularly when probabilities are uneven.[15]
Performance-wise, Huffman coding typically yields an average codeword length L satisfying H(X) \leq L < H(X) + 1, with redundancy bounded by the probability of the most likely symbol plus a small constant (approximately 0.086 bits).[15] Shannon–Fano coding, however, is bounded by H(X) \leq L \leq H(X) + 1, and in practice, it may exceed Huffman's length by up to 1 bit on distributions with skewed probabilities, as the partitioning can imbalance the tree more than necessary.[15][11] For example, on probabilities {0.35, 0.17, 0.17, 0.16, 0.15}, Shannon–Fano achieves L = 2.31 bits while Huffman reaches L = 2.30 bits.[15]
Both algorithms exhibit O(n \log n) time complexity due to the need for sorting probabilities, though Huffman requires an additional priority queue for efficient merging of the smallest elements.[15]
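To make the comparison concrete, the sketch below computes Huffman codeword lengths with a priority queue and evaluates the example distribution quoted above; huffman_lengths is an illustrative helper written for this article, not a library routine:

import heapq

def huffman_lengths(probs):
    # Repeatedly merge the two least probable subtrees; each merge adds one bit to
    # the codeword length of every leaf contained in the merged subtrees.
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, leaves1 = heapq.heappop(heap)
        p2, tag, leaves2 = heapq.heappop(heap)
        for leaf in leaves1 + leaves2:
            lengths[leaf] += 1
        heapq.heappush(heap, (p1 + p2, tag, leaves1 + leaves2))
    return lengths

probs = [0.35, 0.17, 0.17, 0.16, 0.15]
L_huffman = sum(p * l for p, l in zip(probs, huffman_lengths(probs)))   # 2.30 bits
# Fano's partitioning of the same distribution yields lengths 2, 2, 2, 3, 3,
# for an average of 2.31 bits per symbol.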
Versus fixed-length coding
Fixed-length coding assigns the same number of bits to each symbol in the alphabet, typically \lceil \log_2 n \rceil bits for an alphabet of size n, resulting in a constant codeword length L regardless of symbol probabilities.[1] This approach was a baseline in early communication systems, such as teleprinters using the Baudot code, which employed fixed 5-bit encodings for 32 symbols to simplify transmission in telegraphy.[18]
Shannon–Fano coding, by contrast, employs variable-length codewords, assigning shorter codes to more probable symbols and longer ones to less probable ones, which reduces the average code length for non-uniform probability distributions.[10] For example, consider a source with four symbols having probabilities 0.5, 0.25, 0.125, and 0.125; fixed-length coding requires 2 bits per symbol, yielding an average length of 2 bits, while Shannon–Fano coding produces codewords of lengths 1, 2, 3, and 3 bits, respectively, for an average length of 1.75 bits.[1]
| Symbol | Probability | Fixed-Length Code (2 bits) | Length | Shannon–Fano Code | Length |
|---|---|---|---|---|---|
| A | 0.5 | 00 | 2 | 0 | 1 |
| B | 0.25 | 01 | 2 | 10 | 2 |
| C | 0.125 | 10 | 2 | 110 | 3 |
| D | 0.125 | 11 | 2 | 111 | 3 |
| Average | - | - | 2 | - | 1.75 |
This compression gain arises because Shannon–Fano approximates the source entropy H(X), the theoretical minimum average length, enabling more efficient use of channel capacity compared to fixed-length methods.[10]
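Because the probabilities in this example are negative powers of two, the entropy equals the Shannon–Fano average exactly, which a few lines of arithmetic confirm (a sketch of the calculation only):

from math import ceil, log2

probs = [0.5, 0.25, 0.125, 0.125]
H = -sum(p * log2(p) for p in probs)                                # exactly 1.75 bits per symbol
fixed_length = ceil(log2(len(probs)))                               # 2 bits for every symbol
variable_length = sum(p * l for p, l in zip(probs, [1, 2, 3, 3]))   # 1.75 bits per symbol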
Fixed-length coding suffices when symbol probabilities are uniform, as the entropy H(X) \approx \log_2 n, making the constant length L nearly optimal with minimal redundancy.[1] In such cases, the simplicity of fixed-length encoding avoids the overhead of variable-length prefix rules, as seen in uniform-signal telegraphy systems before statistical coding advancements.[18] Historically, Shannon–Fano represented an improvement over these fixed baselines by exploiting probability unevenness, laying groundwork for modern data compression in non-uniform sources like natural language.[10]