
Entropy coding

Entropy coding is a class of lossless data compression algorithms that assign shorter codes to more probable symbols in a message, thereby minimizing the average code length and approaching the theoretical lower bound given by the source's Shannon entropy. This technique exploits the statistical properties of the source to eliminate redundancy without losing information, making it fundamental to efficient data representation in fields such as digital communications, image and video compression, and storage systems.

The concept of entropy coding originates from Claude Shannon's foundational work in information theory, particularly his 1948 paper "A Mathematical Theory of Communication," where he defined entropy as a measure of uncertainty or information content in a random source and proved the source coding theorem, establishing entropy as the ultimate limit for lossless compression efficiency. Building on this, practical methods emerged in the mid-20th century; David Huffman developed the first widely adopted algorithm in 1952, introducing variable-length prefix codes that optimally assign code lengths proportional to the negative logarithm of symbol probabilities for discrete sources with known distributions. Huffman's method, known as Huffman coding, constructs a binary code tree based on symbol frequencies to generate uniquely decodable codes, achieving near-entropy performance for stationary sources but requiring integer code lengths, which can introduce minor inefficiencies.

Subsequent advancements addressed these limitations, with arithmetic coding emerging as a more flexible alternative. Initially conceptualized by Peter Elias in the early 1960s and first practically implemented by Jorma Rissanen in 1976, arithmetic coding encodes an entire sequence of symbols into a single fractional number within a subinterval of the unit interval, subdivided according to cumulative probabilities, allowing code lengths to more precisely match the entropy without integer constraints. This approach gained prominence through the 1987 paper by Ian H. Witten, Radford M. Neal, and John G. Cleary, which popularized adaptive and context-based variants for real-world applications. Modern entropy coders, such as Context-Adaptive Binary Arithmetic Coding (CABAC) used in video standards like H.264/AVC and HEVC, combine arithmetic principles with adaptive probability estimation to handle non-stationary data effectively.

Key advantages of entropy coding include its optimality for sources with skewed symbol probabilities and its integration with other compression stages, such as transform or prediction coding in multimedia formats. However, it requires accurate probability models, and computational complexity can be higher for arithmetic variants than for simpler Huffman implementations, though optimizations have mitigated this in contemporary systems. Overall, entropy coding remains a cornerstone of data compression, enabling efficient representation of information across diverse applications.

Introduction

Definition and Principles

Entropy coding is a lossless data compression technique that assigns variable-length codes to symbols from a finite alphabet based on their probabilities of occurrence, with more probable symbols receiving shorter codes to minimize the average number of bits required per symbol. This approach exploits the statistical redundancy in the source data, ensuring that the encoded bitstream can be uniquely decoded back to the original sequence without any loss of information.

The fundamental principles of entropy coding revolve around achieving an average code length that approaches the theoretical lower bound given by the source's entropy, while maintaining unique decodability of the codewords. Codes are typically designed to be instantaneous, meaning they are prefix-free—no codeword is a prefix of another—allowing the decoder to identify symbol boundaries immediately upon reading the bitstream without lookahead. More broadly, the codes ensure unique decodability, where any concatenation of codewords corresponds to exactly one possible sequence of symbols, preventing ambiguity in decoding. The Shannon entropy of the source serves as the irreducible limit on the average code length per symbol of a memoryless source.

To illustrate the benefits, consider a simple alphabet with three symbols A, B, and C having probabilities 0.5, 0.25, and 0.25, respectively. A fixed-length scheme would require 2 bits per symbol (e.g., A: 00, B: 01, C: 10), yielding an average length of 2 bits. In contrast, a variable-length prefix-free code can assign A: 0 (1 bit), B: 10 (2 bits), and C: 11 (2 bits), reducing the average length to 1.5 bits while remaining uniquely decodable. Unlike compression methods such as LZ77, which model dependencies between symbols to remove redundancy through dictionary-based matching, entropy coding operates solely on the statistical frequencies of individual symbols without modeling higher-order correlations.
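
To make the example concrete, the short Python sketch below (illustrative only; the dictionaries and variable names are ad hoc, not from any particular library) checks that the code {A: 0, B: 10, C: 11} is prefix-free and that its average length is 1.5 bits per symbol versus 2 bits for a fixed-length code.

```python
# Minimal sketch verifying the three-symbol example from the text.

probs = {"A": 0.5, "B": 0.25, "C": 0.25}
code = {"A": "0", "B": "10", "C": "11"}

# Prefix-free check: no codeword is a prefix of another codeword.
words = list(code.values())
prefix_free = all(
    not w2.startswith(w1)
    for i, w1 in enumerate(words)
    for j, w2 in enumerate(words)
    if i != j
)

# Expected codeword length under the symbol probabilities.
avg_len = sum(probs[s] * len(code[s]) for s in probs)

print(prefix_free)  # True
print(avg_len)      # 1.5 bits/symbol, versus 2.0 for a fixed 2-bit code
```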

Historical Context

The foundations of entropy coding trace back to the development of information theory in the mid-20th century. In 1948, Claude Shannon published "A Mathematical Theory of Communication," where he introduced the concept of entropy as the fundamental lower bound on the average number of bits required to represent information from a source, establishing the theoretical limits for lossless data compression. This work was later expanded in a 1949 book co-authored with Warren Weaver, who provided an accessible interpretation of Shannon's ideas for broader scientific audiences, emphasizing their implications for communication systems.

Key milestones in the practical realization of entropy coding followed soon after. David Huffman's 1952 algorithm provided an optimal method for constructing prefix-free codes that minimize redundancy based on symbol probabilities, marking a seminal advance in block-based entropy coding. During the same decade, Peter Elias contributed early advancements in source coding at MIT, including predictive coding techniques that built on Shannon's principles to address efficient encoding for discrete sources. Later, in 1976, Jorma Rissanen developed arithmetic coding, patented the following year, which represented a breakthrough by enabling stream-based encoding that could approach Shannon's limit more closely than Huffman codes for certain sources.

The evolution of entropy coding shifted from block-oriented methods like Huffman coding to more flexible stream-based approaches during the 1980s and 1990s, driven by the need for higher efficiency in real-time applications. Arithmetic coding gained practical traction in this period, overcoming earlier implementation challenges related to precision and complexity. This transition facilitated integration into international standards, such as the JPEG image compression standard released in 1992, which employed Huffman coding as its baseline entropy method while optionally supporting arithmetic coding for improved performance. Entropy coding saw practical deployments in fax machines and modems starting in the 1980s, where Huffman-based schemes like Modified Huffman in the CCITT Group 3 standard (adopted in 1980) enabled efficient transmission of binary images over telephone lines. Open-source implementations emerged in the mid-1990s, exemplified by the zlib library released in 1995, which incorporated Huffman coding within the DEFLATE algorithm for widespread use in software compression.

Post-2010, entropy coding has been revitalized in machine learning contexts, particularly for neural network-based image and video compression, where techniques like hyperprior models estimate symbol probabilities for the entropy coder to achieve near-optimal rates. Seminal works, such as Ballé et al.'s 2018 variational image compression model with a scale hyperprior, demonstrated how learned probability models could surpass traditional methods in end-to-end learned systems.

Theoretical Foundations

Shannon's Entropy

Shannon entropy, denoted as H(X), quantifies the average uncertainty or information content associated with a random variable X taking values from a discrete alphabet with probabilities p_i for each symbol i. Formally, it is defined as H(X) = -\sum_i p_i \log_2 p_i, where the logarithm is base-2 to measure information in bits; this represents the expected number of bits required to encode the outcomes of X in an optimal, lossless manner. The concept originates from the self-information of an individual outcome x, given by I(x) = -\log_2 p(x), which measures the surprise or improbability of that specific event—the rarer the event, the higher the information content. Entropy then emerges as the ensemble average of this self-information: H(X) = \mathbb{E}[I(X)] = \sum_i p_i I(x_i), providing a foundational measure of the source's inherent unpredictability that underpins efficient coding strategies.

The derivation of Shannon entropy follows from axiomatic principles for an information measure, emphasizing continuity, monotonicity in probability, and additivity for independent events. Starting with self-information as the negative logarithm of probability, which satisfies these axioms for a single outcome, averaging over the distribution yields entropy as the unique function (up to a multiplicative constant) meeting the criteria. This formulation interprets entropy not only as average information content but also as the minimal average description length of the source, with higher values indicating greater unpredictability of the symbols.

Key properties of Shannon entropy include its additivity for independent random variables, where H(X,Y) = H(X) + H(Y) if X and Y are independent, reflecting the combined uncertainty of separate sources. Entropy is concave in the probability distribution, meaning H(\lambda p + (1-\lambda) q) \geq \lambda H(p) + (1-\lambda) H(q) for 0 \leq \lambda \leq 1, which implies that entropy is maximized when probabilities are uniform and decreases as the distribution becomes more peaked. The joint entropy H(X,Y) = -\sum_{i,j} p(i,j) \log_2 p(i,j) generalizes to multiple variables and satisfies H(X,Y) \leq H(X) + H(Y), with equality under independence, bounding the total uncertainty. For a simple binary source such as a fair coin flip, where p(0) = p(1) = 1/2, the entropy evaluates to H(X) = -(1/2 \log_2 1/2 + 1/2 \log_2 1/2) = 1 bit, illustrating the need for one bit on average to specify the outcome.

As the irreducible limit for lossless source coding, Shannon entropy establishes that no coding scheme can achieve an average codeword length below H(X) bits per symbol in the long run, serving as the theoretical foundation for evaluating efficiency in subsequent theorems like the source coding theorem.
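
The entropy formula translates directly into a few lines of code. The following sketch (the function name shannon_entropy is illustrative, not a standard API) evaluates H(X) for the fair coin, the skewed three-symbol source used earlier, and a uniform four-symbol alphabet.

```python
import math

def shannon_entropy(probs):
    """H(X) = -sum_i p_i log2 p_i, in bits; zero-probability terms contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))         # 1.0 bit for a fair coin flip
print(shannon_entropy([0.5, 0.25, 0.25]))  # 1.5 bits for the skewed 3-symbol source
print(shannon_entropy([0.25] * 4))         # 2.0 bits: maximal for 4 symbols (uniform)
```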

Source Coding Theorem and Kraft Inequality

The source coding theorem, also known as Shannon's noiseless coding theorem, establishes the fundamental limits of lossless compression for discrete sources. For a memoryless source X with entropy H(X), the theorem states that the average codeword length \bar{l} of any uniquely decodable code satisfies \bar{l} \geq H(X), where lengths are measured in bits assuming a binary code alphabet. This lower bound is approachable asymptotically: for blocks of length n, there exist codes with \bar{l}_n / n \to H(X) as n \to \infty, provided the source is memoryless.

The theorem has weak and strong versions. The weak version applies to independent and identically distributed (i.i.d.) sources, proving that the entropy is both a lower bound on the compression rate and achievable in the limit using block coding. The strong version extends this to ergodic processes, where the entropy rate H (defined as the limit of the per-symbol entropy of n-blocks) serves as the optimal compression rate, achievable with vanishing error probability for rates above H.

The Kraft inequality provides a necessary and sufficient condition for the existence of a binary prefix code with specified codeword lengths l_1, l_2, \dots, l_m: \sum_{i=1}^m 2^{-l_i} \leq 1. This inequality ensures that the codewords do not overlap in the prefix sense, allowing instantaneous decoding. For a D-ary alphabet, the generalized form is \sum_{i=1}^m D^{-l_i} \leq 1. A proof sketch uses a binary tree representation: each prefix code corresponds to leaves in a binary tree at depths l_i, where no leaf is an ancestor of another. To show necessity, extend the tree to a complete binary tree by adding dummy leaves; the sum \sum 2^{-l_i} then equals the fraction of deepest-level leaves covered by the original codewords, which cannot exceed 1. Sufficiency follows by constructing the code greedily on a binary tree, assigning codewords in order of increasing length; the inequality guarantees that an unused node of each required depth always remains available.

The Kraft inequality implies that Huffman codes are optimal among prefix codes with integer lengths, as their lengths l_i \approx -\log_2 p_i satisfy the inequality (with equality for complete codes), achieving an average length within 1 bit of the entropy bound. For example, consider a 3-symbol source with probabilities p_1 = 0.5, p_2 = 0.25, p_3 = 0.25; code lengths 1, 2, 2 yield \sum 2^{-l_i} = 2^{-1} + 2^{-2} + 2^{-2} = 0.5 + 0.25 + 0.25 = 1, verifying feasibility and optimality. McMillan's theorem generalizes the Kraft inequality to all uniquely decodable codes (not just prefix codes), stating that the same condition \sum 2^{-l_i} \leq 1 is necessary and sufficient for the existence of such a code with the given lengths. This extension ensures the inequality applies broadly to lossless coding schemes. The Kraft inequality originated in the late 1940s, with Kraft's 1949 work building upon related ideas in Shannon's 1948 paper, stemming from early discussions on pulse coding at MIT.
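
The Kraft sum is equally simple to evaluate numerically. The sketch below (kraft_sum is an ad hoc helper, not a library function) tests a few length assignments against the inequality.

```python
def kraft_sum(lengths, D=2):
    """Sum of D^(-l_i); a prefix code with these lengths exists iff the sum <= 1."""
    return sum(D ** (-l) for l in lengths)

print(kraft_sum([1, 2, 2]))      # 1.0  -> feasible; the code tree is complete
print(kraft_sum([1, 2, 3, 3]))   # 1.0  -> also complete (lengths of the example in Performance Evaluation)
print(kraft_sum([1, 1, 2]))      # 1.25 -> no prefix (or uniquely decodable) code exists
```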

Core Techniques

Huffman Coding

Huffman coding is a foundational entropy coding technique that constructs variable-length prefix codes optimized for a known probability distribution of symbols, assigning shorter codes to more frequent symbols to minimize the average code length. Developed by David A. Huffman in 1952 during his graduate studies at MIT, the method builds a binary code tree where the path from the root to a leaf represents the code for a symbol, ensuring no code is a prefix of another to enable instantaneous decoding. This approach achieves an average code length that is at most the source entropy plus one bit per symbol, though it is constrained to integer code lengths.

The algorithm employs a bottom-up greedy strategy to construct the code tree using a priority queue (typically a min-heap) ordered by symbol probabilities. It begins by creating a leaf node for each symbol with its probability. Repeatedly, the two nodes with the smallest probabilities are extracted, merged into a new internal node with probability equal to their sum, and reinserted into the queue; this process continues until only the root remains. Code assignment occurs via a traversal of the final tree, appending '0' for left branches and '1' for right branches to determine each symbol's codeword. The resulting code satisfies the Kraft inequality with equality, since every internal node of the tree has two children, ensuring prefix-freeness.

To illustrate, consider four symbols A, B, C, and D with probabilities 0.4, 0.3, 0.2, and 0.1, respectively. Initialize the priority queue with these nodes. Merge the lowest-probability nodes D (0.1) and C (0.2) into an internal node X (0.3). The queue now holds A (0.4), B (0.3), and X (0.3); next, merge B (0.3) and X (0.3) into Y (0.6). Finally, merge Y (0.6) and A (0.4) into the root (1.0). Traversing the tree (left as 0, right as 1) yields codes A: 0 (length 1), B: 10 (length 2), C: 110 (length 3), and D: 111 (length 3). The average code length is 0.4 \times 1 + 0.3 \times 2 + 0.2 \times 3 + 0.1 \times 3 = 1.9 bits per symbol.

Huffman coding exists in static and adaptive variants. In static Huffman coding, the probability distribution is assumed known beforehand, allowing the full tree to be built and transmitted once before encoding the data. Adaptive Huffman coding, introduced by Robert Gallager in 1978, dynamically updates the tree during encoding based on observed symbol frequencies, enabling real-time adaptation to changing statistics without prior knowledge of the distribution. Another variant, canonical Huffman coding, generates codes deterministically from the sorted list of code lengths rather than storing the full tree, reducing overhead for transmission or storage in formats such as DEFLATE and JPEG.
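
The merge procedure can be sketched compactly with a min-heap. The following Python code (an illustrative implementation, not a reference one; names such as huffman_code are ad hoc) reproduces the four-symbol example and recovers codeword lengths 1, 2, 3, and 3 with the 1.9 bits/symbol average, although the particular 0/1 labels may differ from those shown above.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman code for {symbol: probability} by repeatedly merging
    the two least probable nodes drawn from a min-heap."""
    tiebreak = count()  # keeps heap comparisons well-defined on equal probabilities
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # smallest probability
        p2, _, c2 = heapq.heappop(heap)   # second smallest
        # Prepend '0' to one subtree's codes and '1' to the other's.
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

probs = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
code = huffman_code(probs)
avg = sum(probs[s] * len(code[s]) for s in probs)
print(code)  # codeword lengths 1 (A), 2 (B), 3 (C), 3 (D)
print(avg)   # 1.9 bits/symbol for this distribution
```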

Arithmetic Coding

Arithmetic coding is an entropy coding technique that represents an entire message as a single fractional number within the unit interval [0, 1), achieving compression by allocating code space proportionally to symbol probabilities rather than assigning fixed-length codes to individual symbols. Introduced by Jorma Rissanen in his 1976 paper on the generalized Kraft inequality and arithmetic coding, this method allows fractional bit allocation per symbol, overcoming the up-to-one-bit-per-symbol overhead of Huffman coding and approaching the entropy limit more closely for skewed probability distributions. However, its early adoption was limited by patents held by IBM, which expired in the mid-1990s, allowing broader implementation thereafter. The approach was further developed by Rissanen and Glenn G. Langdon in their 1979 work, which generalized the technique for practical implementation.

The mechanism of arithmetic coding begins by initializing the coding interval to [0, 1). For each symbol in the message, the interval is subdivided according to the symbol's probability distribution, and the subinterval corresponding to the actual symbol is selected, cumulatively narrowing the range. This subdivision uses the cumulative distribution function of the probabilities: if the current interval is [low, high) with width = high - low, the subinterval for a symbol with probability p_i starts at low + (cumulative probability of the preceding symbols) × width and has length p_i × width. The process continues until the entire message is encoded, resulting in a final subinterval; any representative value within it (e.g., the midpoint) serves as the code. Decoding reverses this by starting from the code value and iteratively identifying the symbol whose subinterval contains it, using the same probability model.

To illustrate, consider encoding the message "BAC" over an alphabet {A, B, C} with probabilities P(A)=0.5, P(B)=0.25, P(C)=0.25. Start with [0, 1). For 'B' (cumulative probability up to A: 0.5), the subinterval is [0.5, 0.75). For 'A' within this interval (width 0.25, P(A)=0.5, so length 0.125, starting at 0.5), it becomes [0.5, 0.625). For 'C' within [0.5, 0.625) (width 0.125), the subintervals are A: [0.5, 0.5 + 0.5 × 0.125) = [0.5, 0.5625), B: [0.5625, 0.5625 + 0.25 × 0.125) = [0.5625, 0.59375), and C: [0.59375, 0.625). Thus, select [0.59375, 0.625) for 'C'. A binary representation of any value in [0.59375, 0.625), such as 0.10011 in binary (= 0.59375), can be output as the code.

Arithmetic coding would require arbitrary-precision arithmetic to represent the shrinking intervals exactly, as naive finite fixed-point representations can lead to precision loss and decoding errors over long messages. To address this in practice, renormalization is employed: when the interval width falls below a threshold (e.g., 1/2^k for k-bit precision), the interval is rescaled by shifting out leading bits, outputting those bits to the code stream and maintaining sufficient precision for streaming encoding and decoding. This ensures the method remains efficient without excessive overhead, typically adding negligible bits beyond the entropy bound.
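
The interval-narrowing step can be sketched exactly with rational arithmetic. The code below (illustrative only; encode_interval is an ad hoc helper, and renormalization, termination, and bit output are omitted) reproduces the final interval [0.59375, 0.625) for the message "BAC".

```python
from fractions import Fraction as F

# Probability model for the alphabet {A, B, C}: P(A)=1/2, P(B)=1/4, P(C)=1/4.
probs = {"A": F(1, 2), "B": F(1, 4), "C": F(1, 4)}
cum, c = {}, F(0)
for s, p in probs.items():
    cum[s] = c        # cumulative probability of all symbols preceding s
    c += p

def encode_interval(message):
    low, width = F(0), F(1)          # start with the unit interval [0, 1)
    for s in message:
        low = low + cum[s] * width   # shift to the symbol's subinterval
        width = width * probs[s]     # shrink proportionally to P(s)
    return low, low + width          # final interval [low, high)

low, high = encode_interval("BAC")
print(low, high)                     # 19/32 and 5/8, i.e. [0.59375, 0.625)
print(float(low), float(high))
```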

Advanced Variants (Range and ANS Coding)

Range coding represents a practical implementation of arithmetic coding principles, adapting interval-based encoding to fixed-precision integer arithmetic for improved computational efficiency. Introduced by G. Nigel N. Martin in 1979, it maintains a current interval defined by a lower bound and a width (the range), typically using a fixed scale such as 2^{32} or 2^{64} to avoid floating-point operations. When encoding a symbol, the range is subdivided proportionally to symbol probabilities, and the lower bound and width are updated accordingly. To prevent underflow—where the range becomes too small to represent precisely—renormalization occurs by shifting bits from the lower bound to the output and rescaling the range, effectively handling carries in the lower bound by outputting whole bytes at once. This approach reduces overhead compared to full arithmetic coding while preserving near-optimal compression ratios.

A key application of this family of coders appears in the JPEG 2000 standard, where the MQ-coder serves as an adaptive binary arithmetic coder built on range coding techniques to entropy-encode quantized wavelet coefficients. The MQ-coder employs 46 probability states per context for adaptation and performs renormalization through bit shifting, enabling efficient processing of binary symbols without multiplications, which contributes to the standard's superior compression performance over baseline JPEG for high-quality images.

Asymmetric numeral systems (ANS) offer another advancement, proposed by Jarek Duda in 2007 as a family of entropy coders that model encoding as probabilistic state transitions within a discrete state space. Unlike arithmetic coding's continuous interval updates, ANS represents the encoded data as a state value x in a predefined range, where each symbol emission or absorption corresponds to a deterministic mapping derived from symbol probabilities, often implemented via precomputed lookup tables for rapid table-based encoding and decoding. This table-driven mechanism avoids multiplications entirely, relying instead on bit shifts and table lookups, which enhances speed in software implementations—typically achieving speeds comparable to Huffman coding while matching arithmetic coding's compression efficiency. ANS has seen adoption in modern compression tools, such as Facebook's Zstandard algorithm introduced in 2015, where variants like finite-state entropy (FSE) and streamed ANS enable fast, adaptive coding atop dictionary-based stages. In learned compression, post-2014 methods leverage ANS for coding quantized weights, activations, and latents; for instance, end-to-end learned frameworks use ANS together with hyperprior probability models to achieve state-of-the-art rate-distortion performance with low overhead. Key advantages include better parallelization potential due to independent state updates and reduced computational overhead, making ANS particularly suitable for resource-constrained environments such as embedded and mobile devices.
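
As a rough illustration of the ANS state mechanism, the sketch below implements an unbounded-precision range-variant (rANS) coder in Python, assuming integer frequencies that sum to a power of two and omitting the renormalization and table packing used in production coders such as FSE; all names are ad hoc.

```python
# Minimal rANS sketch: the compressed output is a single integer state x.

def build_tables(freqs):
    """freqs: dict symbol -> integer frequency; M is their sum (a power of two here)."""
    M, cum, c = sum(freqs.values()), {}, 0
    for s, f in freqs.items():
        cum[s] = c      # cumulative frequency of all symbols preceding s
        c += f
    return M, cum

def rans_encode(symbols, freqs):
    M, cum = build_tables(freqs)
    x = 1                              # initial state
    for s in reversed(symbols):        # encode in reverse so decoding runs forward
        f, c = freqs[s], cum[s]
        x = (x // f) * M + c + (x % f)
    return x

def rans_decode(x, n, freqs):
    M, cum = build_tables(freqs)
    slots = {k: s for s, f in freqs.items() for k in range(cum[s], cum[s] + f)}
    out = []
    for _ in range(n):
        slot = x % M                   # identifies the symbol
        s = slots[slot]
        f, c = freqs[s], cum[s]
        x = f * (x // M) + slot - c    # inverse of the encoding step
        out.append(s)
    return out

freqs = {"A": 2, "B": 1, "C": 1}       # probabilities 0.5, 0.25, 0.25 scaled to M = 4
msg = list("BACA")
state = rans_encode(msg, freqs)
print(state)                                       # 54 (6 bits, matching the 6-bit entropy of "BACA")
print(rans_decode(state, len(msg), freqs) == msg)  # True
```

Encoding in reverse order is a standard ANS convention so that decoding emits symbols in forward order; in this toy version the final integer state itself plays the role of the compressed output.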

Performance Evaluation

Compression Efficiency Metrics

The primary metric for evaluating the compression efficiency of an entropy code is the average code length L, defined as the expected number of bits required to encode a single symbol from the source, given by L = \sum_i p_i l_i, where p_i is the probability of the i-th symbol and l_i is the length of its codeword in bits. This measure quantifies the size of the encoded output per symbol, with lower values indicating better compression for a fixed source distribution. A closely related metric is the redundancy \rho, which captures the excess bits beyond the theoretical minimum and is calculated as \rho = L - H(X), where H(X) is the entropy of the source, H(X) = -\sum_i p_i \log_2 p_i. Redundancy is always nonnegative for uniquely decodable codes due to the source coding theorem, and \rho \to 0 as the code approaches optimality.

Another common metric is the compression ratio, defined as the ratio of the original data size (in bits) to the encoded size, which directly reflects storage or transmission savings and is inversely related to L. Bits-per-symbol efficiency is often assessed by comparing L to H(X); an ideal code achieves L \approx H(X), reducing the bits per symbol to the source's inherent uncertainty. The variance of the code lengths, \text{Var}(l) = \sum_i p_i (l_i - L)^2, influences practical performance by affecting decoding complexity and buffer requirements, though it does not directly alter the average rate.

For example, consider a Huffman-coded source with symbols A (probability 0.5, code length 1), B (0.25, length 2), C (0.125, length 3), and D (0.125, length 3). The average code length is L = 0.5 \cdot 1 + 0.25 \cdot 2 + 0.125 \cdot 3 + 0.125 \cdot 3 = 1.75 bits/symbol, while the entropy is H(X) = 1.75 bits/symbol, yielding \rho = 1.75 - 1.75 = 0 bits/symbol.

Efficiency is further affected by factors such as block size and model overhead. Larger block sizes in block-based coding allow better approximation of the entropy bound, as the average code length per symbol converges to H(X) for sequences of length N \to \infty, though practical limits arise from memory and latency constraints. Additionally, overhead from transmitting code tables (e.g., Huffman codebooks) reduces net compression for short messages, as the fixed cost of the table amortized over fewer symbols increases the effective L.
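
These metrics are straightforward to compute for a given code. The sketch below (the helper metrics is hypothetical, not a library function) evaluates the average length, entropy, redundancy, and code-length variance for the worked example, plus the compression ratio relative to a fixed 2-bit encoding of the same four-symbol alphabet.

```python
import math

def metrics(probs, lengths):
    """Average code length L, entropy H, redundancy rho = L - H, and code-length variance."""
    L = sum(p * l for p, l in zip(probs, lengths))
    H = -sum(p * math.log2(p) for p in probs if p > 0)
    var = sum(p * (l - L) ** 2 for p, l in zip(probs, lengths))
    return L, H, L - H, var

# Worked example: A (0.5, 1 bit), B (0.25, 2), C (0.125, 3), D (0.125, 3).
L, H, rho, var = metrics([0.5, 0.25, 0.125, 0.125], [1, 2, 3, 3])
print(L, H, rho, var)  # 1.75, 1.75, 0.0, and code-length variance 0.6875
print(2 / L)           # ~1.14 compression ratio versus a fixed 2-bit/symbol code
```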

Comparative Analysis of Techniques

Entropy coding techniques exhibit distinct trade-offs in compression efficiency, computational complexity, and practical implementation. Huffman coding, while straightforward and optimal among integer-length prefix codes for fixed probabilities, incurs redundancy overhead, particularly for skewed distributions where codeword lengths can exceed the ideal -\log_2 p_i by up to about 1 bit per symbol due to integer codeword constraints. In contrast, arithmetic coding and asymmetric numeral systems (ANS) achieve compression rates closer to the entropy limit with negligible redundancy, often approaching 0 excess bits per symbol, but at the cost of higher computational demands involving multiplications and state updates. ANS mitigates some of arithmetic coding's slowdown by using table-based operations, offering speeds comparable to Huffman coding while retaining near-optimal efficiency. These differences are summarized in the following comparison:
| Method | Average Redundancy (bits/symbol) | Time Complexity (Encoding) |
| --- | --- | --- |
| Huffman | <1 (higher for skewed distributions) | O(n log n) tree build + O(n) encoding |
| Arithmetic | ~0 (near entropy limit) | O(n) |
| ANS | ~0 (near entropy limit) | O(n) |
Huffman coding suits scenarios with small alphabets, such as binary data or simple symbol sets, where its low overhead and fast table lookups enable rapid processing without significant redundancy loss. Arithmetic coding excels in handling text and image data with larger, non-uniform alphabets, providing superior compression ratios despite slower execution. ANS is particularly advantageous for high-speed applications like video streaming, where its linear-time operations and table-driven design yield faster decoding than arithmetic coding in formats such as Zstandard. A common limitation across these methods is the assumption of known symbol probabilities; static versions perform optimally only with precomputed statistics, while adaptive variants introduce additional overhead for probability estimation, increasing both time and memory usage in dynamic data streams. Post-2020 advancements have addressed complexity through hardware acceleration, such as FPGA implementations of ANS coders that achieve high throughputs for lossless compression tasks, with examples reaching up to 124 MS/s per lane as of 2021. No single entropy coding technique dominates universally; instead, hybrid approaches prevail in modern standards, exemplified by H.265/HEVC's CABAC, which integrates arithmetic coding with context adaptation for balanced efficiency in video compression.

Applications and Extensions

In Data Compression Standards

Entropy coding has been integral to numerous data compression standards since the late 1980s, with Huffman coding serving as the foundational technique in formats like ZIP and PNG. The ZIP format, introduced in 1989, employs Huffman coding in conjunction with LZ77 dictionary compression via the DEFLATE algorithm to achieve lossless compression of files. Similarly, the PNG standard, finalized in 1996, uses DEFLATE with Huffman coding for its IDAT chunks, enabling efficient lossless compression of raster images while supporting transparency and interlacing. These early adoptions in the 1980s and 1990s established Huffman coding as a patent-free, computationally efficient method for general-purpose file and image compression.

Arithmetic coding emerged as a more efficient alternative in multimedia standards, offering finer granularity in probability modeling compared to Huffman coding. The JPEG standard (ISO/IEC 10918-1), developed in 1992 and embodied in the JFIF file format, includes an optional arithmetic coding mode for entropy encoding quantized DCT coefficients, providing marginal gains over Huffman coding at the cost of higher complexity. In video compression, the H.264/AVC standard (ITU-T Recommendation H.264, 2003) introduced Context-Adaptive Binary Arithmetic Coding (CABAC), a binary arithmetic coder that models syntax elements with context-specific probabilities, achieving approximately 10-15% bit-rate reduction over the baseline CAVLC method. CABAC binarizes input symbols and adapts probability estimates dynamically based on previously coded data, enhancing efficiency for diverse video content.

More recent standards have incorporated advanced variants like range coding and asymmetric numeral systems (ANS) to balance compression ratios and speed. Google's WebP format, announced in 2010, utilizes a boolean range coder—derived from the VP8 video codec—for entropy coding in its lossy mode, encoding boolean decisions and coefficients, while the format as a whole supports both lossy and lossless compression of web images. The Zstandard (zstd) algorithm, released by Facebook in 2016, employs finite-state entropy (FSE), an ANS variant, for its entropy stage following LZ77-style preprocessing, delivering compression ratios superior to DEFLATE while maintaining high throughput for real-time applications. Brotli, specified in an IETF draft in 2015 and later as RFC 7932, relies on context-modeled Huffman coding with pre-defined dictionaries for web content compression, outperforming gzip by roughly 20-26% on text and HTML payloads.

The DEFLATE specification (RFC 1951, 1996) exemplifies Huffman integration by applying dynamic and static Huffman codes to encode LZ77 literal/length and distance symbols, forming the basis for tools like gzip and for PNG compression. Brotli extends this with second-order context modeling in its entropy stage, adapting to byte histories for better web-specific gains. Evolutionarily, standards have shifted toward context-adaptive entropy coders to exploit spatial and temporal dependencies; for instance, CABAC in HEVC selects among several hundred context models depending on the syntax element and updates probability estimates after each coded bin, adapting effectively within slices or blocks to capture local statistics. This adaptability contributes significantly to overall compression, with CABAC providing roughly 10-20% bit-rate reduction compared to CAVLC in video coding.

Beyond Compression (e.g., in Machine Learning and Cryptography)

In machine learning, entropy coding techniques like asymmetric numeral systems (ANS) have been combined with variational autoencoders (VAEs), whose latent variables are discretized so that the model's probabilities can drive a lossless coder. Specifically, the bits-back ANS (BB-ANS) method achieves lossless compression of data with latent-variable models by using ANS to encode and decode samples from discretized distributions, approaching the optimal rate given by the model's evidence lower bound. This approach, introduced in 2019, facilitates practical compression with generative models in which continuous latent spaces are discretized, improving coding efficiency without significant loss in reconstruction quality.

Entropy regularization, drawing on information-theoretic principles akin to those in entropy coding, is widely used in reinforcement learning (RL) to promote policy exploration by adding a term that maximizes the entropy of the policy distribution. In algorithms like Soft Actor-Critic (SAC), this regularization encourages the agent to maintain stochasticity in action selection, balancing exploitation of known rewards with exploration of uncertain states, which accelerates learning in high-dimensional environments. The seminal SAC framework, published in 2018, demonstrates that entropy-augmented objectives lead to robust policies in continuous control tasks, with empirical gains in sample efficiency over standard RL methods.

In cryptography, arithmetic coding has been adapted for secure entropy coding schemes that enhance data protection during compression, particularly by incorporating nonlinear dynamic filters to obfuscate symbol probabilities. One such method, proposed in 2009, uses changeable coefficients in the filter to generate pseudorandom probability intervals, ensuring that the encoded bitstream resists statistical attacks while maintaining compression efficiency close to standard arithmetic coding, making it suitable for applications requiring confidentiality in compressed streams, such as secure data transmission. In the context of homomorphic encryption, entropy-related computations are beginning to be carried out directly on compactly encoded ciphertexts, for example for entropy estimation on encrypted strings. A 2024 technique computes Shannon entropy directly on homomorphically encrypted data using packed encodings, allowing detection of anomalous patterns (e.g., packed code in malware) without decryption, which reduces computational overhead in secure cloud analytics.

Beyond these domains, entropy coding plays a role in genomic data processing through formats like CRAM, which employs rANS and adaptive arithmetic encoders to compress aligned sequencing reads. In CRAM 3.1, released in 2022, enhancements include SIMD-accelerated rANS with data transformations such as run-length encoding (RLE) and bit-packing, achieving 7-15% better compression on Illumina data compared to prior versions by modeling symbol frequencies in quality scores and read names. This enables efficient storage and transmission of large-scale genomic datasets, with the format's entropy coders adapting to the redundancies of biological data.

In audio processing, Huffman coding is utilized in watermarking schemes for MP3-encoded signals to embed semi-fragile identifiers within the compressed bitstream. A 2020 method modifies Huffman codewords in the scale factor bands to insert authentication data, preserving audio quality (signal-to-noise ratio above 30 dB) while detecting tampering through codeword discrepancies upon decoding. This technique leverages the variable-length nature of Huffman codes for high-capacity embedding without altering the perceptual audio characteristics.

A further application in generative modeling uses entropy coding for efficient lossless coding of samples from complex distributions, as in bits-back ANS for VAEs, which recovers "bits back" during decoding to enable bidirectional inference in generative models. In emerging 2020s developments such as federated learning, relative entropy coding (REC) compresses model updates to reduce communication bandwidth, achieving significant savings, such as an 18-fold reduction on the FEMNIST dataset, while maintaining accuracy in privacy-preserving settings. Techniques like quantization combined with entropy encoding further optimize transmission in resource-constrained networks, as demonstrated in 2021 frameworks.
