bzip2
bzip2 is a free and open-source lossless file compression program designed for compressing individual files using the Burrows–Wheeler block-sorting transform followed by move-to-front transformation and Huffman coding.[1] It produces compressed files with the .bz2 extension and is not an archiver, meaning it handles single files rather than multiple files or directories.[2]
Developed by Julian Seward, bzip2 was first released on July 18, 1996, as version 0.15, with the stable version 1.0 arriving in late 2000 and the latest update, version 1.0.8, on July 13, 2019.[2][1] The program is patent-free and licensed under a BSD-like license, ensuring broad compatibility and redistribution across major operating systems, including Unix-like systems where it is a standard utility.[3] It operates on data blocks ranging from 100 kB to 900 kB, applying run-length encoding before the Burrows–Wheeler transform to exploit local redundancies and achieve compression ratios typically 10–15% better than gzip for text and similar data.[1][4]
In terms of performance, bzip2 generally provides superior compression density compared to gzip but at the cost of slower compression and decompression speeds; for example, on large files like the Linux kernel source, it achieves about 17.8% of original size versus gzip's 22.4%, while taking roughly 3–4 times longer to compress and over 10 times longer to decompress.[4] Despite these trade-offs, its efficiency, integrity checks via 32-bit CRC, and support for partial recovery from damaged files make it a preferred choice for archival purposes in environments prioritizing space savings over speed.[2] The associated library, libbzip2, enables integration into other software for in-memory compression tasks.[1]
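The in-memory use of libbzip2 mentioned above is exposed through bindings in many languages; for instance, Python's standard-library bz2 module wraps it. A minimal sketch of an in-memory round trip (buffer contents here are illustrative):

```python
import bz2

# Compress a repetitive in-memory buffer at the maximum level (9),
# then decompress and verify the round trip.
original = b"to be or not to be, " * 1000
compressed = bz2.compress(original, compresslevel=9)
restored = bz2.decompress(compressed)

assert restored == original
print(f"{len(original)} -> {len(compressed)} bytes")
```

Highly repetitive input like this compresses to a small fraction of its original size, illustrating the density advantage described above.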
Introduction
Overview
bzip2 is a free and open-source lossless data compression program and algorithm that employs block-sorting techniques to achieve high compression ratios, particularly effective for text and structured data.[5][6] Developed by Julian Seward, it was first released on July 18, 1996, as version 0.15, and produces compressed files with the .bz2 extension.[3][7] The tool supports streaming compression, allowing it to process data in blocks without requiring the entire input to fit in memory, and is designed to balance superior compression density—typically 10-15% better than earlier methods like gzip—with acceptable processing speeds.[5][6] This makes bzip2 a popular choice for archiving and data transfer in environments where storage efficiency is prioritized over rapid compression.[8] As of November 2025, bzip2 remains a stable and widely available utility in Unix-like operating systems, with the latest official release being version 1.0.8 from July 13, 2019.[9] Following Julian Seward, maintenance was assumed by Federico Mena in June 2019 and by Micah Snyder in June 2021 for feature development toward version 1.1 and later, with stable updates handled by Mark Wielaard at Sourceware.[5][10] Maintained under a BSD-style license, it continues to be integrated into various software ecosystems for reliable data compression needs.[5]
Basic Usage
bzip2 is a command-line utility commonly pre-installed on Unix-like operating systems such as Linux distributions and macOS, allowing users to compress and decompress files without additional setup in many cases.[5] For systems where it is not available, installation on Debian-based Linux distributions like Ubuntu can be achieved via the Advanced Package Tool (APT) with the command sudo apt install bzip2.[11] On macOS, users can install it using Homebrew by running brew install bzip2.[12]
The core syntax for the bzip2 command is bzip2 [options] [filenames ...], where specifying one or more filenames compresses each file into a corresponding .bz2 archive, replacing the original unless otherwise specified.[1] For example, running bzip2 file.txt compresses file.txt into file.txt.bz2, appending the .bz2 extension to indicate the bzip2-compressed format.[1] To decompress, use bzip2 -d file.txt.bz2, which restores the original file.txt and removes the archive.[1]
Several key options enhance basic functionality: the -d or --decompress flag forces decompression mode; -k or --keep preserves the original files after compression; compression levels range from -1 (fastest, least compression) to -9 (slowest, highest compression), with -9 providing the maximum ratio at the cost of more processing time; and -v or --verbose outputs details like the compression ratio achieved.[1] For instance, bzip2 -9 -k -v file.txt compresses file.txt at the highest level, keeps the original, and reports the ratio.[1]
Piping enables streaming compression without intermediate files, useful for handling large data on the fly.[1] An example is cat file.txt | bzip2 -c > file.txt.bz2, where -c outputs to stdout for redirection into the archive.[1] Similarly, decompression via pipe is shown in bzcat file.txt.bz2 | less, displaying the contents without extracting to disk.[1]
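The same streaming behavior is available programmatically; Python's standard bz2 module offers incremental compressor and decompressor objects that process data chunk by chunk, mirroring piped command-line use (the chunk contents below are illustrative):

```python
import bz2

# Incremental (streaming) compression: feed data chunk by chunk, as a
# pipe would, without materializing the whole input or output at once.
compressor = bz2.BZ2Compressor(9)
chunks = [b"chunk-%d " % i * 100 for i in range(50)]

out = b"".join(compressor.compress(c) for c in chunks)
out += compressor.flush()  # finish the stream and emit the trailer

restored = bz2.BZ2Decompressor().decompress(out)
assert restored == b"".join(chunks)
```

Because each compress() call may buffer internally, flush() is required to complete the stream before decompression.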
Common errors include insufficient permissions, which result in an exit code of 1 indicating an environmental issue, often resolved by using sudo or adjusting file access rights.[1] Corrupted archives trigger an exit code of 2; in such cases, the bzip2recover tool can attempt salvage by running bzip2recover damaged.bz2, though success depends on the damage extent.[1]
bzip2 integrates seamlessly with archiving tools like tar for creating compressed bundles.[1] A typical command is tar -cjf archive.tar.bz2 directory/, where -c creates the archive, -j specifies bzip2 compression, and -f names the output file.[1]
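The tar integration can also be scripted; Python's tarfile module accepts a "w:bz2" mode that is the library-level equivalent of tar -cjf (the file and archive names below are illustrative):

```python
import os
import tarfile
import tempfile

# Build a .tar.bz2 archive containing one file, then list its members.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "hello.txt")
    with open(src, "w") as f:
        f.write("hello bzip2\n")

    archive = os.path.join(tmp, "archive.tar.bz2")
    with tarfile.open(archive, "w:bz2") as tar:
        tar.add(src, arcname="hello.txt")

    with tarfile.open(archive, "r:bz2") as tar:
        names = tar.getnames()

assert names == ["hello.txt"]
```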
History and Development
Origins and Creation
In the mid-1990s, the dominant file compression tool gzip, based on the LZ77 algorithm combined with Huffman coding, was widely used but exhibited limitations in achieving higher compression ratios for certain data types, such as text-heavy files and software archives, where more efficient block-based methods were sought.[13] This context motivated the development of alternatives that could leverage emerging techniques for improved performance without excessive computational overhead. bzip2 was created by Julian Seward, a British developer, who began work on the project in 1995.[14] Inspired by the Burrows-Wheeler transform (BWT) introduced in a 1994 technical report by Michael Burrows and David J. Wheeler at Digital Equipment Corporation, Seward aimed to implement a practical compressor that exploited BWT's ability to reorder data for better statistical predictability, surpassing gzip's ratios while remaining suitable for open-source software distribution and archival needs.[13] The initial version, 0.15, was released on July 18, 1996, marking the program's debut as a free, open-source utility focused on lossless compression.[3] By 1997, bzip2 had gained early recognition in the open-source community, reflecting its rapid acceptance for handling compressed archives in Unix-like environments. Seward personally maintained the project from its inception, without corporate sponsorship, until maintenance was handed off in 2019, ensuring its evolution as a standalone, community-driven tool. Since June 2021, Micah Snyder has served as the maintainer for feature development.[3][15]
Versions and Maintenance
bzip2's development has proceeded through a series of incremental releases since its initial stable version, focusing primarily on stability, security enhancements, and portability improvements. Version 1.0, released in 2000, established a stable API and included features such as large file support with 64-bit counters and robust decompression handling. Subsequent updates addressed specific issues: 1.0.2 in 2003 fixed a segmentation fault loop and I/O error conditions, along with adding scripts like bzdiff and bzgrep for enhanced usability. Version 1.0.3, released on February 15, 2005, bolstered decompression robustness against malformed inputs (addressing CAN-2005-1260). Further security-focused releases followed, including 1.0.4 on December 20, 2006, which resolved file permission races (CAN-2005-0953) and improved script security (CAN-2005-0758); 1.0.5 on December 10, 2007, for CERT-FI 20469 vulnerabilities; and 1.0.6 on September 6, 2010, tackling CVE-2010-0405. Later versions, 1.0.7 on June 27, 2019, and 1.0.8 on July 13, 2019, corrected undefined behaviors, buffer overflows (CVE-2016-3189), and selector range issues (CVE-2019-12900), while also enhancing Windows file handling for files larger than 4 GB and adding a comprehensive testsuite.[9] The latest stable release remains 1.0.8 from 2019, which primarily addressed minor security refinements and portability across platforms, with no upstream changes to the core compression algorithm.[9][16] As of 2025, bzip2 maintenance has seen no major upstream updates since 1.0.8, reflecting a deliberate emphasis on stability over frequent changes; however, community-driven efforts persist through forks and integrations, such as the libbzip2 library for embedding in other software like 7-Zip, which incorporates bzip2 compression without official multi-core support in the core tool. 
As of early 2025, development continues with contributions like Meson build support, though no new release has materialized.[17] A development repository on GitLab, maintained by Micah Snyder since 2021, explores enhancements for a potential 1.1 version, including build system improvements and test expansions.[5][15] bzip2 is distributed under a permissive, BSD-like open-source license that permits free use, modification, and redistribution, with source code available via the official website at sourceware.org/bzip2 and various mirrors.[5][18] The project's maintenance challenges stem from its historical solo stewardship by creator Julian Seward, resulting in a slow release cycle prioritized for reliability; this approach has sustained bzip2's longevity but limited adaptations like native multi-threading, often handled via external wrappers in modern applications.[5][15]
Technical Details
Algorithm
bzip2 employs a block-based compression strategy, dividing input data into independent blocks of up to 900 kB each to handle large files efficiently, with the default block size being 900 kB.[1] Each block undergoes a sequence of reversible transformations: first, run-length encoding (RLE) to collapse long runs of identical bytes in the input; second, the Burrows-Wheeler transform (BWT) to reorder the data and cluster similar characters; third, move-to-front (MTF) coding to convert the reordered symbols into a sequence dominated by small indices; fourth, a second run-length stage that replaces runs of zeros in the MTF output with two special symbols, RUNA and RUNB, encoding the run length in bijective base-2 notation; and finally, Huffman coding for entropy encoding of the resulting symbol stream.[1][19] This pipeline exploits local redundancies without loss of information, enabling high compression ratios. Decompression reverses these steps: Huffman decoding, zero-run expansion, inverse MTF, inverse BWT, and inverse RLE.
Each block includes a 32-bit CRC checksum for error detection, ensuring integrity during processing or transmission.[1] The core of the algorithm is the BWT, which rearranges a block of input text S of length n by conceptually forming all cyclic rotations of S (textbook presentations append a unique end-of-block sentinel to make the rotations distinct), sorting these rotations lexicographically, and outputting the last column of the sorted rotation matrix as the transformed string L, along with the index I of the row containing the original string.[19] Formally, if the i-th sorted rotation begins at position j_i of S, then L[i] = S[(j_i - 1) \bmod n]. This sorting groups identical or similar characters into runs, facilitating subsequent compression stages, as each character in L is followed (in the original text) by one of the lexicographically sorted contexts whose first characters form the first column F.[20] For decompression, the inverse BWT reconstructs the original block from L and I using the last-to-first (LF) mapping, which exploits the relationship between the last and first columns of the rotation matrix. The LF function is defined as \text{LF}(i) = C[c_i] + \text{Occ}(c_i, i), where c_i = L[i] is the character at position i, C[c] is the number of characters in the block strictly smaller than c, and \text{Occ}(c_i, i) is the number of occurrences of c_i in L[1..i-1].
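The transform and its LF-based inverse can be illustrated with a short Python sketch. This is the textbook sentinel-based formulation with a naive quadratic-time sort of explicit rotations, not bzip2's optimized pointer-based sorting:

```python
SENTINEL = "\x00"  # unique end marker, assumed absent from the input text

def bwt(s: str) -> str:
    """Last column of the lexicographically sorted cyclic rotations."""
    s += SENTINEL
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)

def inverse_bwt(L: str) -> str:
    """Rebuild the text by walking LF(i) = C[L[i]] + Occ(L[i], i)."""
    # C[c]: number of characters in L strictly smaller than c
    first_col = sorted(L)
    C = {}
    for i, c in enumerate(first_col):
        C.setdefault(c, i)
    # occ[i]: occurrences of L[i] earlier in L, i.e. Occ(L[i], i)
    occ, seen = [], {}
    for c in L:
        occ.append(seen.get(c, 0))
        seen[c] = seen.get(c, 0) + 1
    # Start at the row whose last character is the sentinel: that row is
    # the original string itself, so reading L[i] walks it back to front.
    i = L.index(SENTINEL)
    out = []
    for _ in range(len(L)):
        out.append(L[i])
        i = C[L[i]] + occ[i]  # LF step to the preceding character's row
    return "".join(reversed(out)).rstrip(SENTINEL)

assert inverse_bwt(bwt("banana")) == "banana"
```

Note how bwt("banana") groups the repeated 'a's and 'n's into runs, which is exactly what the later MTF and Huffman stages exploit.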
Starting from position I, the original block is recovered by iteratively emitting L[i] and applying LF, building the string from back to front, until all n characters have been produced.[19] This process runs in O(n) time with appropriate data structures for C and \text{Occ}.[20] A pseudocode outline for the inverse BWT is:

    function inverseBWT(L, I):
        n = length(L)
        // Precompute C[c]: number of characters in L smaller than c
        // Precompute Occ(c, i): occurrences of c in L[1..i-1]
        S = empty string
        i = I
        for j = 1 to n:
            S = L[i] + S                  // prepend the preceding character
            i = C[L[i]] + Occ(L[i], i)    // LF(i)
        return S

Following BWT, the MTF transform processes the output string L by maintaining a dynamic list of symbols ordered by recency of appearance, initially sorted alphabetically. For each symbol in L, MTF outputs its current index in the list (starting from 0) and moves it to the front of the list. This yields a sequence where frequently recurring symbols, especially those clustered by BWT, map to low values like 0, creating long runs suitable for further encoding.[19] Inverse MTF reverses this by starting with the sorted alphabet and, for each index in the MTF output, emitting the symbol at that position and shifting it to the front. A simple pseudocode for forward MTF is:

    function MTF(L):
        alphabet = sorted unique symbols in L
        output = empty list
        for each c in L:
            idx = position of c in alphabet
            output.append(idx)
            move c to front of alphabet
        return output

An initial RLE pass is applied to the raw input before the BWT: any run of four or more identical bytes is written as four literal bytes followed by a count of additional repeats (0 to 251, limiting a single run to 255 bytes), reducing pathological inputs before sorting.[20] Huffman coding finalizes the compression by building code tables from the symbol frequencies of the post-MTF, zero-run-encoded stream, assigning shorter codes to more frequent symbols and packing the bits into the output. bzip2 generates between two and six Huffman tables per block and records a selector for each successive group of 50 symbols indicating which table encodes it; the tables are transmitted in canonical form for compact decoding. The overall decompression flow thus proceeds as: decode Huffman to recover the zero-run-encoded MTF indices, expand the zero runs and apply inverse MTF to obtain the BWT string, apply inverse BWT to retrieve the RLE-encoded block, and apply inverse RLE to recover the original data. Multi-block files concatenate these processed blocks, with an overall CRC for the entire stream alongside the per-block checksums.[1]
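The MTF stage and the RUNA/RUNB zero-run stage can be sketched together in Python. The RUNA/RUNB string constants are illustrative stand-ins for the two symbols; in the real format they occupy the first two slots of the Huffman alphabet:

```python
def mtf_encode(data: bytes) -> list:
    """Move-to-front: emit each byte's index in a recency-ordered list."""
    table = list(range(256))
    out = []
    for b in data:
        idx = table.index(b)
        out.append(idx)
        table.pop(idx)
        table.insert(0, b)  # most recent symbol moves to the front
    return out

def mtf_decode(indices: list) -> bytes:
    """Inverse MTF: index into the same evolving recency list."""
    table = list(range(256))
    out = []
    for idx in indices:
        b = table.pop(idx)
        out.append(b)
        table.insert(0, b)
    return bytes(out)

RUNA, RUNB = "RUNA", "RUNB"  # digit values 1 and 2 respectively

def encode_zero_run(n: int) -> list:
    """Run length n >= 1 as bijective base-2 digits, least significant first."""
    out = []
    while n > 0:
        if n & 1:
            out.append(RUNA)
            n = (n - 1) // 2
        else:
            out.append(RUNB)
            n = (n - 2) // 2
    return out

def decode_zero_run(symbols: list) -> int:
    return sum((1 if s == RUNA else 2) << k for k, s in enumerate(symbols))
```

For example, a run of four zeros becomes RUNB then RUNA (2·1 + 1·2 = 4), so even very long zero runs need only logarithmically many symbols.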
File Format
The bzip2 file format consists of one or more independent streams, each representing a compressed sequence of data, enabling support for concatenated files without additional archival structures. Each stream begins with a 4-byte header: the magic bytes 0x42 0x5A 0x68 (ASCII 'BZh'), followed by a single ASCII digit '0' to '9', where '1' through '9' specifies the block size multiplier in hundreds of kilobytes (100 kB to 900 kB), and '0' indicates an unspecified or default size.[3][1][2] Following the stream header, the format includes zero or more compressed blocks, each prefixed by a 48-bit synchronization marker 0x314159265359 (the digits of π in binary-coded decimal; read as bytes, ASCII '1AY&SY'). The marker is immediately followed by a 32-bit CRC checksum computed over the original uncompressed block, a 1-bit deprecated "randomised" flag (set to 0 by all modern implementations), and a 24-bit pointer giving the row of the original string in the sorted Burrows-Wheeler rotation matrix. The compressed block data then follows as a variable-length bitstream representing up to 900,000 bytes of transformed data: a symbol-usage bitmap, the Huffman table definitions, the table selectors, and the Huffman-coded stream of move-to-front indices and RUNA/RUNB zero-run symbols (an alphabet of up to 258 symbols, including a dedicated end-of-block symbol). The Huffman coding uses 2 to 6 tables per block, with selectors specifying which table applies to each successive group of 50 symbols, ensuring efficient encoding of the transformed data.[21][22] Blocks are delimited by their headers, with no explicit end-of-block marker beyond the start of the next block or the stream's conclusion; the bitstream for each block concludes after all symbols are encoded.
The stream ends with a 48-bit termination marker 0x177245385090 (BCD representation of the square root of π), followed by a 32-bit CRC checksum for the entire original uncompressed stream and 0 to 7 padding bits to achieve byte alignment.[21][2] Error detection relies solely on these per-block and per-stream 32-bit CRCs, providing integrity verification with a detection failure probability of approximately 1 in 4 billion for random errors; the format includes no encryption or extraneous metadata.[1] The design ensures backward compatibility across bzip2 versions since 0.9, with tools such as pbzip2 adhering to the identical structure for parallel decompression. For parsing and interoperability, the format's key components can be represented as follows:
| Component | Size | Value/Description |
|---|---|---|
| Stream Header Magic | 3 bytes | 0x42 0x5A 0x68 ('BZh') |
| Block Size Digit | 1 byte | ASCII '0'–'9' (multiplier for block size) |
| Block Magic Marker | 6 bytes | 0x31 0x41 0x59 0x26 0x53 0x59 ('1AY&SY') |
| Per-Block CRC | 4 bytes | 32-bit CRC of original block data |
| Randomised Flag | 1 bit | Deprecated; 0 in files from modern implementations |
| BWT Origin Pointer | 24 bits | Row index of the original string in the sorted rotation matrix |
| Compressed Block Data | Variable bits | Symbol bitmap, Huffman table definitions, selectors, and Huffman-coded MTF/zero-run symbols (alphabet of up to 258 symbols) |
| Stream End Magic | 6 bytes | 0x17 0x72 0x45 0x38 0x50 0x90 |
| Per-Stream CRC | 4 bytes | 32-bit CRC of entire original stream |
| Padding | 0–7 bits | To next byte boundary |
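The layout above can be checked directly against real output. A small Python probe (using the standard bz2 module, which wraps libbzip2) reads the stream-header magic, the block-size digit, and the byte-aligned first block marker:

```python
import bz2

data = bz2.compress(b"hello, bzip2 format", compresslevel=9)

# Stream header: 'BZh' magic plus ASCII block-size digit
assert data[:3] == b"BZh"
assert data[3:4] == b"9"  # level 9 -> 900 kB block size

# First block magic: the pi digits 0x314159265359 start at byte offset 4
assert data[4:10] == bytes.fromhex("314159265359")
```

Only the first block marker is byte-aligned this way; subsequent block markers fall at arbitrary bit offsets inside the stream.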
Performance and Applications
Efficiency Characteristics
bzip2 achieves compression ratios that typically reduce file sizes to 20-40% of their original size for text and repetitive data, leveraging the Burrows-Wheeler transform to group similar characters and enhance subsequent entropy coding efficiency.[3] On the Calgary Corpus benchmark, a standard test suite for compression algorithms consisting of 14 files totaling 3,141,622 bytes, bzip2 at the highest compression level (-9) produces a compressed size of 828,642 bytes, yielding an average ratio of about 26.4% of the original size.[1] This performance excels on data with redundancy, such as source code or logs, where the transform effectively clusters repeated patterns, but it approaches the original size (ratios near 100%) on random or incompressible data like encrypted files.[3]

Compression speeds for bzip2 vary by level, with the highest level (-9) processing data at around 10-15 MB/s on modern hardware as of 2024, while lower levels (e.g., -1) reach up to 15-20 MB/s. Decompression is notably faster, often 3-5 times the compression rate, achieving 30-40 MB/s under similar conditions due to the simpler inverse operations required. These metrics, derived from 2024 benchmarks on in-memory file systems, reflect single-threaded execution on contemporary CPUs without disk I/O interference.[23]

Resource usage includes memory demands scaling with block size, where compression at the default 900 kB block requires up to 7.6 MB of RAM (calculated as 400 kB base + 8 × block size), and decompression needs about 3.7 MB for the same block. The standard implementation lacks multi-threading, relying on a single core, which limits scalability on multi-core systems. Block size directly influences trade-offs: larger blocks (up to 900 kB) improve ratios by capturing more context for the transform but increase both time and memory, while smaller blocks (100 kB minimum) prioritize speed at the cost of density.
Compression levels 1 through 9 correspond to these block sizes, allowing users to balance performance needs.[1][24] In embedded and mobile contexts, bzip2's CPU-intensive nature results in higher power draw during operation compared to lighter algorithms, but its strong ratios can enhance overall efficiency by minimizing storage and I/O energy costs. A 2006 study on energy-aware compression found bzip2 to be over 25% more energy-efficient than alternatives in low-idle-power scenarios, a benefit that persists for battery-constrained devices where reduced data volume offsets compute overhead. On 2025 hardware, these traits make it suitable for offline archiving in resource-limited environments, though real-time applications may favor faster alternatives.[25]
Comparisons and Use Cases
bzip2 offers superior compression ratios compared to gzip, typically achieving 10-15% better size reduction on text-heavy files, though gzip provides faster compression and decompression speeds suitable for quick operations.[26] In contrast, xz generally delivers even higher ratios than bzip2—up to 30-40% smaller files in maximum settings—but at the cost of significantly slower processing times, making bzip2 a middle ground for scenarios prioritizing density over extreme optimization.[27] Modern alternatives like zstd outperform bzip2 in 2025 benchmarks, balancing compression ratios comparable to bzip2 (around 50-60% file size at default levels) with 5-10 times faster speeds, particularly in decompression, rendering zstd preferable for most general-purpose tasks.[28] Brotli, optimized for web content, achieves similar or better ratios than bzip2 on HTML and text but excels in faster web delivery scenarios where bzip2 lags due to its computational intensity.[29]
| Compressor | Typical Ratio (vs. Original Size) | Compression Speed | Decompression Speed | Best For |
|---|---|---|---|---|
| gzip | 60-70% | Fast | Fast | Quick archiving of small files |
| bzip2 | 50-60% | Medium | Medium | Text/logs with moderate speed needs |
| xz | 40-50% | Slow | Medium | Maximum archival density |
| zstd | 50-60% (tunable to 30%) | Fast-Medium | Very Fast | Balanced modern use |
| Brotli | 45-55% | Medium-Slow | Fast | Web-optimized content |