Brotli
Brotli is a generic-purpose lossless compression algorithm that compresses data using a combination of a modern variant of the LZ77 algorithm, Huffman coding, and second-order context modeling.[1][2] Developed by Google engineers including Jyrki Alakuijala and Zoltán Szabadka, it was introduced in late 2013 as an open-source project to improve web content compression efficiency.[3] The Brotli compressed data format is formally specified in RFC 7932, published by the Internet Engineering Task Force (IETF) in July 2016, defining a structure optimized for both compression density and decoding speed.[2] Brotli offers twelve compression levels, 0 to 11, allowing trade-offs between speed and ratio, and it achieves 10–20% better compression than gzip for most text-based web assets at comparable decompression speeds.[4][5] By 2025, Brotli enjoys near-universal support across major web browsers, including Chrome, Firefox, Safari, and Edge, as well as widespread implementation in web servers such as NGINX and Apache.[6][7] This adoption has made it a standard for reducing bandwidth usage and accelerating page loads, particularly for HTTP responses, while maintaining compatibility as a content-encoding method.[8]
Introduction
Etymology
The name "Brotli" derives from the Swiss German word Brötli, the diminutive of Brot ("bread"), referring to a small bread roll or bun.[9][10] The primary developers, Jyrki Alakuijala and Zoltán Szabadka, based at Google Research in Zürich, Switzerland, chose the name to evoke a lighthearted theme inspired by Swiss culinary traditions.[11] The naming follows a playful convention in Google's compression projects: the earlier Zopfli algorithm is likewise named after a traditional Swiss braided bread.[9] Both names reflect the team's base in Zürich and its connection to Swiss culture.[11]
Definition and Purpose
Brotli is a free and open-source lossless data compression algorithm and associated file format developed by Google.[1][3] Initially developed in 2013 for compressing web fonts in the WOFF 2.0 format,[12] it reduces data size without any loss of information, making it suitable for a wide range of applications while preserving content integrity.[2] The primary purpose of Brotli is to deliver higher compression ratios than established methods such as gzip and deflate, coupled with fast decompression, in order to minimize bandwidth usage in web transfers.[3] This focus targets significant savings in data transmission over HTTP, particularly benefiting resource-constrained environments such as mobile networks.[8] While Brotli functions as a general-purpose compressor, it is particularly optimized for text-based web content, including HTML, CSS, and JavaScript files.[8] Users can adjust its compression intensity through quality levels ranging from 0, the fastest setting with the lowest density, to 11, which maximizes density at the expense of encoding time.[1]
History
Development
Brotli was initiated in 2013 by Google engineers Jyrki Alakuijala and Zoltán Szabadka as a compression algorithm designed for WOFF 2.0, the Web Open Font Format version 2.0, to reduce the size of web font transmissions and thereby accelerate web page loading.[13][12] The project emerged from Google's efforts to optimize font delivery in browsers, where large font files often contributed significantly to page weight, prompting the need for a more efficient encoding method tailored to font data.[14]
The primary motivation for Brotli's development was to overcome the limitations of existing compressors like gzip, which, while fast, provided suboptimal ratios for web assets such as fonts and HTML. Alakuijala and Szabadka aimed to achieve 20–26% better compression than Zopfli, a Google-developed encoder producing deflate-compatible output, on representative web content, including the Canterbury Corpus benchmark, while maintaining decompression speeds comparable to deflate.[9] This focus on density improvements was driven by the goal of minimizing bandwidth usage without excessively increasing computational demands on client devices.[3]
Early prototypes of Brotli underwent internal testing at Google, with iterative refinements to the format, including enhancements to the context model for text and fonts, yielding approximately 30% faster decompression on international HTML datasets compared to initial versions.[14] These prototypes were integrated into Chrome's font rendering pipeline to evaluate real-world performance, validating the algorithm's efficacy for WOFF 2.0 before broader open-sourcing efforts.[15] By late 2013, a simplified Brotli specification draft had been prepared, alongside an open-source WOFF 2.0 converter, marking the transition from internal experimentation to standardization.[14]
Releases and Standardization
Brotli was open-sourced by Google in September 2015 under the permissive MIT license, making the reference implementation freely available for use and modification.[1] This release marked the transition from internal development, initially focused on compressing web fonts in the WOFF2 format, to a general-purpose compression tool accessible to the broader community. The first stable release, version 1.0.0, followed in September 2017, prioritizing stability and refinement of the core library for production use.[16]
In July 2016, the Internet Engineering Task Force (IETF) formalized Brotli's compressed data format in RFC 7932, establishing it as a standard method for HTTP content encoding.[2] The specification details the bitstream structure, enabling interoperable implementations across servers, browsers, and other tools while ensuring compatibility with existing web infrastructure.
Subsequent releases emphasized security and performance refinements. Version 1.0.9, released on August 27, 2020, addressed critical issues, including a decoder vulnerability (CVE-2020-8927), sped up the encoder and decoder, and added WebAssembly support.[18] Version 1.2.0, released on October 27, 2025, incorporated further efficiency enhancements such as faster encoding and reduced binary sizes through static initialization options.
On the standards side, RFC 9841, published by the IETF in September 2025, extended the format with shared-dictionary support and formalized the experimental Large Window Brotli variant, allowing window sizes up to 1 GiB to better handle large-file compression scenarios.[17][19]
Technical Description
Compression Algorithm
Brotli's compression process begins by dividing the input into meta-blocks, each holding up to 16 MiB (2^{24} bytes) of uncompressed data, which are processed in sequence so that streams can be produced and consumed incrementally.[2] Each meta-block consists of a header specifying its size and type, uncompressed, compressed, or empty (an empty final block marks the end of the stream), followed by the block's content encoded as a sequence of commands.[2] This structure supports efficient handling of large inputs while keeping memory overhead low during decompression.[2]
The core of Brotli's compression is a variant of the LZ77 algorithm, which replaces repeated strings with backward references, each consisting of a length and a distance pointing to prior data within a sliding window of up to 16 MiB (addressable with 24 bits).[2] Distances are encoded as a prefix-coded distance code plus extra bits. For codes beyond the short and direct ranges, the final distance is computed as \text{distance} = ((\text{offset} + \text{dextra}) \ll \text{NPOSTFIX}) + \text{lcode} + \text{NDIRECT} + 1, where dextra holds the ndistbits extra bits (up to 24), offset and lcode are derived from the distance code, NPOSTFIX (0–3) sets the number of postfix bits, and NDIRECT (0–120) the number of direct short-distance codes; the full encoding occupies 5–31 bits depending on the parameters.[2]
To enhance ratios for web content, LZ77 is augmented by a static dictionary of 122,784 bytes containing 13,504 common words of 4 to 24 bytes, each combinable with one of 121 transforms (such as capitalization changes or appended punctuation). Dictionary entries are referenced through distances that exceed the maximum backward distance, with the excess selecting the word and transform.[2]
Following LZ77 parsing, Brotli applies entropy coding to the resulting symbols (literals, distances, and insert-and-copy command prefixes) using canonical prefix (Huffman) codes with context modeling. Literal contexts take one of 64 IDs (0–63), computed from the previous two uncompressed bytes under a per-block-type context mode such as LSB6 (the least significant 6 bits of the prior byte) or a UTF-8-aware mode; together with up to 256 literal block types, a context map selects which code to use.[2] Distance contexts use 4 IDs based on the copy length (e.g., ID 0 for copies of length 2).[2] Prefix codes are conveyed per meta-block either in a simple form listing a handful of symbols directly or in a complex form whose code lengths are themselves prefix-coded, allowing adaptation to local data statistics.[2]
The encoder may also emit uncompressed meta-blocks, which are aligned to byte boundaries, for data that does not compress.[2] It supports 12 quality levels (0–11), trading compression speed for ratio; higher levels employ more thorough LZ77 parsing (e.g., two iterations at maximum quality using entropic cost estimates) and advanced optimizations like dynamic programming for command selection.[20]
An experimental large-window extension, defined in RFC 9841, widens the sliding window beyond 16 MiB, with distance extra bits of up to 62; the reference implementation's large-window mode supports windows of up to 30 bits (1 GiB). Larger windows enable better ratios on massive corpora by increasing maximum distances, while raising encoding complexity and requiring 64-bit integers for full support; the mode is signaled by a special 14-bit pattern in the stream header.[21]
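To make the distance reconstruction concrete, the following C sketch implements the RFC 7932 formulas for a distance code at or beyond 16 + NDIRECT. The function name brotli_distance and the standalone main are hypothetical illustrations; a real decoder would first read the ndistbits extra bits from the stream into dextra.

```c
#include <stdint.h>
#include <stdio.h>

/* Sketch of distance reconstruction per RFC 7932, Section 4, for
 * dcode >= 16 + NDIRECT. 'dextra' holds the ndistbits extra bits
 * already read from the stream. */
static uint32_t brotli_distance(uint32_t dcode, uint32_t dextra,
                                uint32_t npostfix, uint32_t ndirect) {
    uint32_t postfix_mask = (1u << npostfix) - 1;
    uint32_t base = dcode - ndirect - 16;              /* index past direct codes */
    uint32_t hcode = base >> npostfix;                 /* selects distance bucket */
    uint32_t lcode = base & postfix_mask;              /* postfix (low) bits */
    uint32_t ndistbits = 1 + (base >> (npostfix + 1)); /* extra bits for this code */
    uint32_t offset = ((2 + (hcode & 1)) << ndistbits) - 4;
    return ((offset + dextra) << npostfix) + lcode + ndirect + 1;
}

int main(void) {
    /* With NPOSTFIX = 0 and NDIRECT = 0, the first regular distance
     * code (16) with zero extra bits decodes to distance 1. */
    printf("%u\n", brotli_distance(16, 0, 0, 0));
    return 0;
}
```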
Decompression Process
The Brotli decompression process is engineered for simplicity and high speed, enabling efficient decoding even on resource-constrained devices. It operates by sequentially parsing the compressed stream, reconstructing data through LZ77-based techniques, and assembling the output while maintaining low memory overhead. This design prioritizes rapid execution over complex preprocessing, making it suitable for real-time applications.[22]
Stream parsing begins with the decoder reading the stream header, which specifies parameters such as the window size (ranging from 1 KiB - 16 B to 16 MiB - 16 B) and, where applicable, the large-window flag. The decoder then processes meta-blocks in order, each preceded by a header that includes an "is last meta-block" flag, an "is uncompressed" flag, and the meta-block's output size (up to 16 MiB). Compressed meta-blocks carry prefix-coded data for literals, commands, and distances, while uncompressed ones deliver their bytes directly. This sequential structure supports streaming decompression, allowing partial decoding of ongoing streams without buffering the entire input.[23][24]
LZ77 decoding reconstructs the original data by emitting literals or copying segments from backward references within the sliding window. The window is managed as a ring buffer that tracks recent output, with the last four distances (initialized to 16, 15, 11, and 4) kept for compact encoding of repeated references. A backward reference specifies a distance and a length, pulling data from the ring buffer or, when the distance reaches past the available window, from the static dictionary (predefined words of 4 to 24 bytes). Commands decoded from the meta-block dictate whether to insert literals, copy from a distance reference, or substitute a dictionary match, ensuring the output sequence matches the original input.[25][26][27]
Huffman decoding employs fast, table-based lookups to interpret the bitstream efficiently. Symbols for literals (contexts 0 to 63), insert-and-copy commands, and distances (contexts 0 to 3) are decoded using prefix-code tables built per meta-block and selected through context maps that adapt to prior output. These tables are constructed on the fly from the meta-block's prefix-code descriptions using simple bit-reading operations; a malformed description, such as an invalid code-length sequence, causes the decoder to reject the stream rather than guess.[28][29]
Output assembly concatenates the decompressed contents of each meta-block, inserting any transformed dictionary words directly into the stream. The decoder writes literals and copied segments to the output buffer sequentially, rejecting the stream if errors occur, such as meta-block lengths exceeding limits, invalid distances, or buffer overruns. This final stage yields the complete, contiguous output without additional post-processing.[27][26]
The overall design emphasizes low resource consumption, using constant space beyond the sliding window to accommodate memory-limited environments like mobile devices. Implementations are advised to include sanity checks on parameters such as window size to guard against malformed inputs, further enhancing reliability without increasing overhead. This focus on efficiency allows Brotli decoders to operate swiftly across diverse hardware.[22][30]
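The ring-buffer copy at the heart of LZ77 decoding can be sketched as follows. DecoderState and copy_back_reference are hypothetical names for illustration, and the sketch assumes the decoder has already validated that the distance does not exceed the bytes produced so far (larger distances select static-dictionary words instead). The copy proceeds byte by byte because a reference may overlap the bytes it is producing, which is how short repeating patterns are expressed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder state: a power-of-two ring buffer holding the
 * most recent window of output. */
typedef struct {
    uint8_t *ring;  /* sliding-window ring buffer */
    size_t   mask;  /* ring capacity - 1 (capacity is a power of two) */
    size_t   pos;   /* total bytes emitted so far */
} DecoderState;

/* Copy 'length' bytes from 'distance' bytes behind the write head,
 * byte by byte so that overlapping references repeat correctly.
 * Assumes distance <= s->pos has already been checked. */
static void copy_back_reference(DecoderState *s, size_t distance,
                                size_t length, uint8_t *out) {
    for (size_t i = 0; i < length; i++) {
        uint8_t b = s->ring[(s->pos - distance) & s->mask];
        s->ring[s->pos & s->mask] = b;  /* keep the window current */
        out[i] = b;
        s->pos++;
    }
}
```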
Implementations
Reference Implementation
The reference implementation of Brotli is Google's official C library, hosted on GitHub, which provides a complete encoder and decoder for the Brotli compression format.[1] Released under the permissive MIT license in 2015, the library includes the core components brotlienc for encoding and brotlidec for decoding, implementing lossless compression using a variant of the LZ77 algorithm combined with Huffman coding.[1][31]
Key features of this implementation encompass support for all 12 compression quality levels from 0 to 11, where level 0 prioritizes speed with minimal compression and level 11 maximizes density at the cost of higher computational demands.[32] It also offers command-line tools via the brotli executable for straightforward compression and decompression tasks, such as brotli -q 6 input.txt to compress a file at quality level 6, alongside integration APIs designed for embedding in C and C++ applications.[33] The library is portable across multiple platforms, including Windows, macOS, Linux, and embedded systems, due to its ANSI C codebase.[1]
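As an illustration of the C API, the following sketch round-trips a small buffer through the library's one-shot entry points, BrotliEncoderCompress and BrotliDecoderDecompress. The fixed buffer sizes are chosen for the toy input; production code would size the output with BrotliEncoderMaxCompressedSize.

```c
#include <brotli/decode.h>
#include <brotli/encode.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    const uint8_t input[] = "Brotli round-trip example: hello, hello, hello!";
    const size_t input_size = sizeof(input);

    /* One-shot compression at quality 6 with the default 16 MiB window. */
    uint8_t encoded[1024];
    size_t encoded_size = sizeof(encoded);
    if (!BrotliEncoderCompress(6, BROTLI_DEFAULT_WINDOW, BROTLI_MODE_TEXT,
                               input_size, input, &encoded_size, encoded)) {
        fprintf(stderr, "compression failed\n");
        return 1;
    }

    /* One-shot decompression into a buffer known to be large enough. */
    uint8_t decoded[1024];
    size_t decoded_size = sizeof(decoded);
    if (BrotliDecoderDecompress(encoded_size, encoded, &decoded_size,
                                decoded) != BROTLI_DECODER_RESULT_SUCCESS) {
        fprintf(stderr, "decompression failed\n");
        return 1;
    }

    printf("in=%zu compressed=%zu match=%d\n", input_size, encoded_size,
           decoded_size == input_size &&
           memcmp(input, decoded, input_size) == 0);
    return 0;
}
```

Linking typically requires the encoder and decoder libraries, for example -lbrotlienc -lbrotlidec on systems where pkg-config provides the paths.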
The implementation remains actively maintained by Google, with regular updates addressing security, performance, and compatibility. The version 1.2.0 release in October 2025 introduced bug fixes, enhanced Large Window support in the encoder for handling binaries larger than 2 GiB, and optimizations like faster static dictionary initialization.[19] This library serves as the default Brotli implementation in Google Chrome for web content decoding and in Android for system-level compression tasks.[34]
Alternative Implementations
In addition to the reference C implementation, several alternative implementations and ports of Brotli have been developed to offer licensing flexibility, support diverse programming environments, and enable specialized use cases.[1] One notable reimplementation is the independent decoder by Mark Adler, released in 2016 under the Apache License 2.0, which provides compatibility for projects that cannot use the original MIT-licensed code while adhering strictly to the Brotli format specification.[35] This version handles decompression only and has been integrated into various open-source projects requiring permissive licensing.[35]
Ports to other programming languages have extended Brotli's accessibility beyond C. For JavaScript, particularly in browser environments, Emscripten-compiled builds of the C reference code target WebAssembly, enabling client-side compression and decompression without native dependencies; the brotli-wasm library, for example, supports both Node.js and browsers.[36] In Rust, the brotli crate offers a direct port of the C implementation, providing no_std compatibility for embedded systems and kernel use while maintaining high performance for both encoding and decoding.[37] Similarly, Python bindings via the Brotli package on PyPI wrap the C library, allowing seamless integration into Python applications for tasks like data archiving and web serving.[38]
Specialized variants address hardware-specific optimizations. Brotli-G, introduced by AMD in 2022, adapts the algorithm for GPU acceleration, particularly suited for compressing graphics and digital assets, with CPU-based encoding and GPU-accelerated decoding to leverage parallel processing in graphics pipelines.[39] Experimental research forks have explored large-window decoders, extending beyond the standard 24-bit (16 MiB) window to 30 bits or more for improved compression on datasets with long-range redundancies, though these remain incompatible with the core specification and are primarily used in academic benchmarks.[40]
Brotli has also been integrated into popular compression tools. Extended versions of 7-Zip, such as the 7-Zip-zstd fork, incorporate Brotli support within the 7z container format, enabling users to create and extract Brotli-compressed archives alongside other methods like LZMA.[41] PeaZip natively supports Brotli for both compression and extraction of .br files, as well as embedding it in multi-format archives, making it available in a portable, open-source file manager.[42] In Android, Brotli has been used since 2017 in over-the-air (OTA) update utilities in the Android Open Source Project to compress system images, reducing download sizes and accelerating updates.[43]
Performance
Benchmarks
Brotli's performance has been evaluated using standard corpora such as the Silesia benchmark, which consists of diverse file types totaling approximately 211 MB. At lower quality settings, such as level 1, Brotli achieves a compression ratio of about 2.88 on this corpus, with decompression speeds reaching around 425 MB/s on modern x86-64 CPUs.[44] At maximum quality level 11, the compressed size falls to roughly 50 MB, a higher ratio of approximately 4.2, though compression time increases substantially.[45]
Brotli offers 12 quality levels (0 to 11), balancing compression speed, ratio, and resource demands. At level 1, suitable for fast compression of dynamic content, throughput reaches about 290 MB/s.[44] In contrast, level 11 prioritizes density: compression speeds drop to around 1 MB/s, but outputs are 20–30% smaller than those at level 1 for text-heavy data like web assets. Decompression remains efficient across levels, typically 400–500 MB/s on contemporary hardware, making it suitable for client-side operations.[44]
The encoder's memory footprint scales with quality and window size; at maximum settings (quality 11, 24-bit window), it can require up to 256 MB for internal buffers and dictionaries during processing. The decoder, however, is lightweight, using under 20 MB even for large streams, which supports its deployment in resource-constrained environments like browsers. The 2025 release of version 1.2.0 added optimizations such as static initialization for the encoder and decoder, reducing binary sizes and startup times across platforms without altering core ratios.[46]
| Quality Level | Approx. Compression Speed (MB/s) | Relative Output Size Reduction vs. Level 1 |
|---|---|---|
| 1 | 290 | Baseline |
| 11 | ~1 | 20-30% smaller |
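The speed-for-ratio trade-off in the table can be reproduced in rough form with the reference encoder. The following C harness is an illustrative sketch: the synthetic input and single clock()-based timing run are simplifications, so its numbers will differ from corpus benchmarks such as Silesia.

```c
#include <brotli/encode.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Crude single-run timing of one-shot compression at a given quality. */
static void bench(const uint8_t *data, size_t size, int quality) {
    size_t cap = BrotliEncoderMaxCompressedSize(size);
    uint8_t *out = malloc(cap);
    size_t out_size = cap;
    clock_t t0 = clock();
    if (!BrotliEncoderCompress(quality, BROTLI_DEFAULT_WINDOW,
                               BROTLI_MODE_GENERIC, size, data,
                               &out_size, out)) {
        fprintf(stderr, "compression failed\n");
        free(out);
        return;
    }
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
    printf("q=%2d ratio=%.2f speed=%.1f MB/s\n", quality,
           (double)size / (double)out_size,
           secs > 0.0 ? (double)size / secs / 1e6 : 0.0);
    free(out);
}

int main(void) {
    /* Synthetic, mildly repetitive 4 MiB input; real corpora behave differently. */
    const size_t size = (size_t)4 << 20;
    uint8_t *data = malloc(size);
    for (size_t i = 0; i < size; i++)
        data[i] = (uint8_t)("brotli! "[i % 8] + (i / 4096) % 3);
    bench(data, size, 1);
    bench(data, size, 11);
    free(data);
    return 0;
}
```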