Fact-checked by Grok 2 weeks ago

Burrows–Wheeler transform

The Burrows–Wheeler transform (BWT) is a reversible algorithm that rearranges the characters of a finite to produce runs of similar symbols, thereby enhancing the of the without loss of information. It operates by generating all cyclic rotations of the input , sorting them lexicographically, and outputting the last column of the resulting , along with an index pointing to the original string's position. First conceived by David Wheeler in 1983 and formally published in a 1994 technical report by Michael Burrows and Wheeler, the BWT transforms blocks of text—typically several kilobytes in size—to group identical or contextually similar characters, making subsequent encoding with methods like move-to-front () and highly efficient. The transform's efficacy stems from its ability to exploit local redundancies in the data: after applying the BWT, the output string exhibits long runs of repeated characters, which simple statistical coders can compress to rates approaching the of the original text. Computationally, the BWT can be implemented using sorting techniques, such as a variant of , achieving O(n log n) for a string of length n, though practical implementations often use suffix arrays for , requiring about 9 bytes per in . is equally reversible: by the transformed string to reconstruct the first column and iteratively matching characters via a predecessor mapping, the original input is recovered in linear time. Theoretical bounds confirm its guarantees, with the MTF-encoded BWT output size limited by approximately 8 times the zeroth-order plus negligible overhead terms. Beyond its foundational role in lossless compression tools like bzip2—which processes blocks up to 900 KB and outperforms gzip on text corpora such as the Calgary or Canterbury sets—the BWT has found applications in diverse fields including bioinformatics for DNA sequence alignment (e.g., in tools like BWA and Bowtie), compressed text indexing via suffix arrays, and even image and multimedia compression. Variants, such as the inverse BWT or extensions for dynamic data structures, further enable space-efficient storage and querying of large datasets, underscoring the transform's enduring impact on algorithmic design.

Overview

Definition

The Burrows–Wheeler transform (BWT) is a reversible that rearranges the characters of a finite S of n over a finite into a new L, which is a of S, along with an I indicating the position of the original in the sorted rotations. This is achieved by generating all cyclic rotations of S, sorting them in to form a , and extracting the last column of that as L. The result groups similar characters together, creating runs of identical symbols that enhance through subsequent encoding techniques. The primary purpose of the BWT is to preprocess strings for lossless data , particularly text, by exploiting local redundancies without losing information. Originally proposed as part of a block-sorting scheme, it transforms the input into a form where identical characters are clustered, making it more amenable to or . This approach was designed to achieve high ratios while maintaining computational efficiency. The transform is named after Michael Burrows and David J. Wheeler, who introduced it in their 1994 technical report. Their work demonstrated its effectiveness in practical compression systems, influencing tools like bzip2.

Motivation

The Burrows–Wheeler transform (BWT) was motivated by the limitations of traditional techniques in handling the statistical redundancies inherent in natural language texts. Methods such as and excel at exploiting first-order or zeroth-order symbol frequencies but falter with long-range dependencies, where the probability of a character depends on distant contextual elements, such as or thematic repetitions spanning paragraphs. These approaches often require computationally intensive higher-order models, like prediction by partial matching (), to capture such patterns effectively, resulting in slower encoding and decoding speeds unsuitable for practical applications. To address these challenges, the BWT rearranges the input to exploit second-order redundancies—correlations between a and its preceding —by grouping characters that follow similar contexts together. This is achieved through all cyclic rotations of the , which positions suffixes with comparable prefixes adjacently; consequently, in the final output (the characters immediately preceding the sorted suffixes), context-similar symbols cluster, forming locally homogeneous segments that enhance using simple follow-up coders. For instance, in English text, vowels following common consonant sequences or repeated words tend to aggregate, amplifying the effectiveness of run-length or encoding on the transformed data. In comparison to earlier adaptive transforms like move-to-front (MTF) coding, which reorders symbols based on recent usage to favor frequent recurrences with , the BWT offers a superior preprocessing step by implicitly modeling without dynamic adaptation. MTF alone creates short runs by leveraging local recency but fails to generate the extended homogeneous blocks that BWT produces, leading to better ratios—often approaching higher-order bounds—when BWT precedes MTF and a zeroth-order coder. This combination yields outputs dominated by zeros (over 50% in typical English text), simplifying subsequent without the overhead of explicit tracking. The BWT emerged in 1994 from research at Digital Equipment Corporation's Systems Research Center in , where Michael Burrows and David J. Wheeler formalized and analyzed the technique to overcome inefficiencies in prior indexing and sorting approaches for large-scale data handling. Originally conceived by Wheeler in 1983 during his time at Bell Laboratories, the transform was refined at DEC SRC to support block-based that balanced speed and ratio, drawing on Burrows' expertise in sorted data structures for efficient file processing.

Construction

Algorithm

The Burrows–Wheeler transform (BWT) constructs a of the input S of n by rearranging its characters to group similar symbols together, facilitating subsequent . To compute the BWT, first a character, typically denoted as \ , to S to form an extended [string](/page/String) S' of [length](/page/Length) n+1 , where $ $ is lexicographically smaller than any character in the and appears only once; this ensures all rotations are and sortable without ambiguity. The algorithm proceeds in three main steps. First, generate all n+1 cyclic s of S', where each i (for i = 0 to n) is S'[i:] + S'[:i]. Second, sort these rotations lexicographically to form the BWT , an (n+1) \times (n+1) where each row is one sorted . Third, extract the last column of this sorted to obtain the BWT output L, which has the same length as S' and consists of the characters immediately preceding the first column in the sorted rotations. The original S can be recovered from L by removing the and inverting the transform, though the index of the original is not required when using the . The following pseudocode illustrates the basic construction, assuming 0-based indexing and standard string operations:
function BWT(S):
    append '$' to S to get S'  // unique sentinel, length n+1
    rotations = empty list of length n+1
    for i from 0 to n:
        rotations[i] = S'[i:] + S'[:i]
    sort rotations lexicographically
    L = empty string
    for each row in rotations:
        append row[-1] to L  // last character of each row
    return L  // BWT output, including sentinel
This naive implementation has a of O((n+1) \log (n+1)) dominated by the step, assuming a comparison-based sort like , and a of O((n+1)^2) to store all rotations explicitly; however, the space can be optimized to O(n) using suffix techniques that avoid materializing the full .

Example

To illustrate the construction of the Burrows–Wheeler transform, consider the input "" appended with a unique sentinel character "" (lexicographically smaller than 'a', 'b', or 'n'), yielding "banana". All 7 cyclic rotations of this string are generated and sorted in lexicographical order to form the rotation matrix. The sorted rotations and their corresponding last characters are shown below:
Sorted RotationLast Character
$bananaa
a$banann
ana$bann
anana$bb
banana$$
na$banaa
nana$baa
The Burrows–Wheeler transform output L is the concatenation of the last characters from the sorted rows: "annb$aa". In this output, runs of identical characters such as "nn" and "aa" emerge, arising from similarities among the suffixes in the original rotations.

Properties

Mathematical Formulation

The Burrows–Wheeler transform (BWT) of a string S of length n over an \Sigma, typically terminated by a unique end-of-string symbol such as $, is formally defined via the lexicographically sorted cyclic rotations of S. Consider the n \times n matrix M whose rows are these rotations, ordered such that the i-th row (for i = 0, \dots, n-1) is the rotation starting at position SA in S, where SA is the —a of \{0, \dots, n-1\} giving the starting indices of the sorted suffixes of S. The entry M[i, j] is then S[(SA + j) \mod n]. The BWT output, denoted L, is the last column of M: L = M[i, n-1] = S[(SA - 1) \mod n] for i = 0, \dots, n-1, which extracts the character immediately preceding the suffix starting at SA in the cyclic sense. Equivalently, L can be viewed as the result of permuting S according to the Burrows–Wheeler , but the matrix perspective highlights its structure: the first column F of M consists of the characters of S in order, since the rotations are cyclic and aligns identical prefixes. A key property is that F is precisely the version of L, as both columns contain each character of S exactly once, and the of rotations ensures F reflects the of the characters. This structure implies a crucial character mapping: for each row i, the character L immediately precedes F in the original S. To see why, note that in the i-th sorted , F is the first character and L is the last; shifting the left by one yields a beginning with L followed by the rest, but since the rows are sorted by their full content, the relative preserves the cyclic adjacency from S. A proof sketch follows from the : the suffixes after the first character in each determine the row , but the cyclic ensures that the last character of row i (from SA - 1) is the one preceding the suffix at SA, effectively permuting characters based on their right-context in the suffixes. This alignment exploits the suffix to cluster similar characters in L, facilitating subsequent ing.

Invertibility

The Burrows–Wheeler transform (BWT) is invertible, allowing the original string to be recovered from its transformed output, the last column L of the implicit , along with the index I indicating the position of the original string's row (typically marked by a unique character such as EOF). The inversion process relies on the Last-to-First (LF) , which establishes a correspondence between positions in L and the sorted first column F, without needing to reconstruct the full of rotations. This exploits the fact that F is the sorted version of L, enabling efficient traversal of the cyclic shifts. To perform the inversion, first compute the cumulative count array C, where C represents the number of characters in the that are strictly smaller than character c (i.e., the starting position in F for occurrences of c). Next, construct a rank array P, which for each position i in L gives the number of occurrences of L in the prefix L[0 \dots i-1] (0-based ing). The LF mapping is then defined by the equation: \text{LF}(i) = C[L] + P This yields the position in F (and thus in the sorted rotations) corresponding to the character at L. Starting from the I (the row containing the sentinel), iteratively apply the LF mapping to trace backward through the rotations: prepend F[\text{LF}(current)] to a result string until the sentinel is reached after n steps, where n is the string length. This process reconstructs the original string by following the stable sort order of the rotations. The BWT's invertibility stems from its bijectivity when a unique is appended to the input string, ensuring all rotations are distinct and the transformation permutes the string in a reversible manner. Without the sentinel, the BWT is not uniquely invertible, as multiple rotations of the original string may produce the same output; invertibility holds only if all characters are distinct. With precomputed and arrays (achievable in a single pass over the data), the entire inversion runs in O(n) time and space, where n is the string length, avoiding the O(n^2) cost of explicitly building the .

Variants

Bijective Variant

The standard Burrows–Wheeler transform (BWT) requires an artificial character, typically denoted as $, smaller than all characters in the , to guarantee that all cyclic rotations of the input are distinct and can be uniquely sorted. Without this sentinel, rotations may coincide in strings containing repeats, resulting in non-unique sorting orders and cases where the transform is not invertible, as multiple original strings could map to the same output. The bijective variant of the BWT, particularly in the context of the , resolves this by applying stable sorting to the cyclic s while using the original positions as tie-breakers. Specifically, each is paired with its starting in the original , and the pairs are sorted lexicographically by the ; in case of ties (equal rotations), the pair with the smaller original precedes the other. This ensures a total, unique ordering over all strings in the without needing a . The output comprises the BWT L, consisting of the last character of each sorted , and an array I (the inverse of the rotations), which captures the mapping from sorted ranks back to original positions. To invert the transform, the first column F is obtained by stably L (which is a permutation of the original ). The original S of length n is then reconstructed using I via the relation S = F[I] for j = 1 to n, where I gives the rank of the rotation starting just after j (with wrap-around for cyclic nature). Equivalently, starting from the primary k (the row in the sorted corresponding to the original rotation, often I or derived from I), the can be cycled as S = L[k : n] + L[0 : k] after applying the last-to-first (LF) mapping to align the positions, yielding the full sequence through iterative application of the defined by L and F. This process leverages the stable sort to handle repeats correctly, as the tie-breakers preserve positional uniqueness. This variant's key advantages stem from obviating the , enabling direct applicability to arbitrary strings over the without modification. It facilitates efficient implementations in scenarios involving dynamic updates or streaming inputs, where inserting a unique would disrupt the data flow or require additional preprocessing, while maintaining invertibility through the compact storage of I (typically O(n log n) bits, compressible in practice).

Dynamic Variant

The standard Burrows–Wheeler transform (BWT) is inherently static, necessitating a complete recomputation of the underlying and transform for any modification to the input , such as an insertion or deletion; this process incurs a time cost of O(n log n) using standard suffix array construction algorithms. Dynamic variants of the BWT address this limitation by enabling efficient s to the transform without full rebuilding, often leveraging advanced data structures to maintain the suffix array or equivalent representations. For instance, approaches using trees to represent the BWT and balanced parentheses to encode the implicit allow for incremental updates while preserving query capabilities. These methods typically employ auxiliary structures like ropes for manipulation or treaps for balanced operations on suffixes, achieving update times of O(log n) amortized in some cases. A seminal implementation is the dynamic FM-index proposed by Hon et al., which extends the (a BWT-based compressed index) to support insertions and deletions in O(log² n) time by incrementally updating occurrence ranks and the last-to-first (LF) mappings. This approach maintains the core BWT properties while allowing modifications to the text collection, making it suitable for scenarios requiring real-time indexing, such as evolving databases. However, these dynamic variants come with trade-offs, including higher space overhead—typically O(n log n) bits—compared to the static BWT's more compact O(n log |Σ|) bits, where |Σ| is the alphabet size, due to the additional structures needed for efficient updates.

Implementations and Optimizations

Sample Implementation

A sample implementation of the Burrows–Wheeler transform (BWT) and its inverse can be expressed in Python-like , suitable for small strings where explicit construction of rotations is feasible. This approach follows description in the seminal paper by Burrows and Wheeler. The forward BWT appends a sentinel character '' (assumed to be lexicographically smaller than all input characters) to the input string s $, generates all cyclic rotations, sorts them lexicographically, and extracts the last character of each sorted rotation to form the output string.
python
def bwt(s):
    if not s:
        return ''
    n = len(s)
    sentinel = '$'
    rotations = [s[i:] + s[:i] + sentinel for i in range(n)]
    rotations.append(sentinel + s)  # Rotation for the sentinel alone, but integrated
    sorted_rotations = sorted(rotations)
    return ''.join(row[-1] for row in sorted_rotations)
This implementation requires O(n^2) time and space due to the explicit storage of rotations, making it impractical for large n but illustrative for understanding the transform. The inverse BWT reconstructs the original string from the transformed string L (the BWT output) using the last-to-first (LF) mapping. It relies on the first column F, obtained by sorting L; the cumulative count array C, which gives the starting position in F for each character c; and the occurrence rank \text{rank}(c, i), which counts the number of times c appears in L[0 \dots i-1]. Starting from the row containing the sentinel in F (typically index 0, assuming '' is smallest), the algorithm iteratively applies the LF mapping \text{LF}(i) = C[L] + \text{rank}(L, i) $ to traverse the string in reverse order, prepending characters until the sentinel is reached. The LF mapping concept enables this efficient recovery, as detailed in the original work. For a practical implementation, the rank can be computed on-the-fly for small inputs by scanning prefixes of L, though this yields O(n^2) time; in production, a precomputed occurrence table or wavelet tree would achieve O(n) time.
python
def rank(L, i, c):
    return sum(1 for j in range(i) if L[j] == c)

def ibwt(L):
    if not L:
        return ''
    n = len(L)
    chars = sorted(set(L))
    # Compute C: cumulative counts of characters strictly smaller than c
    C = {}
    total = 0
    char_count = {ch: L.count(ch) for ch in chars}
    for ch in chars:
        C[ch] = total
        total += char_count[ch]
    # Assume '$' is the smallest and starts at index 0 in F
    i = 0
    result = []
    while True:
        char = L[i]
        result.append(char)
        if char == '$':
            break
        i = C[char] + [rank](/page/Rank)(L, i, char)
    return ''.join(reversed(result[:-1]))  # Remove [sentinel](/page/Sentinel) and reverse
For edge cases, an empty input string returns an empty output for both forward and transforms. For a single-character string such as "a", the forward BWT yields "a$", and the recovers "a", assuming the sentinel is handled appropriately. To verify correctness, consider the input "". The forward BWT produces "annbaa". Applying the [inverse](/page/Inverse) BWT to "annbaa" recovers "", demonstrating round-trip fidelity. This example illustrates the transform's invertibility for typical inputs.

Optimization Techniques

The naive construction of the (BWT) requires generating and all cyclic rotations of the input , incurring O(n²) time and for a of length n. To achieve linear-time computation, -based methods first construct the SA in O(n) time using algorithms such as DC3 or SA-IS, then derive the BWT by setting BWT = S[SA - 1] (or a if SA = 0), avoiding the need to build the full . The DC3 employs a divide-and-conquer to recursively sort sampled , while SA-IS uses induced on L- and S-type to propagate ranks efficiently. Space-efficient variants further optimize these approaches by inducing the sorting of all suffixes without explicitly storing rotations or intermediate arrays beyond O(n) space. The induced sorting technique, as refined by Manzini, classifies suffixes into L-type (starting with a smaller character) and S-type (starting with a larger character) and sorts them by inducing order from already-sorted subsets, enabling lightweight suffix array construction in O(n) time and space. This method reduces memory usage to a constant factor of the input size, making it suitable for large-scale texts. For parallelization on multi-core systems, divide-and-conquer extensions of induced sorting distribute suffix classification and ranking across processors, achieving near-linear speedup. Manzini's optimizations in induced sorting facilitate such parallelism by minimizing inter-processor communication during bucket scans. Algorithms like ParaBWT partition the input into blocks, compute partial suffix arrays in parallel, and merge results using multi-way sorting, scaling to O(n / p) time with p processors while maintaining O(n) space. BWT inversion benefits from precomputing rank arrays to support the last-to-first (LF) mapping in time overall. trees provide a compact representation for queries on the BWT string, decomposing it into bit vectors for O(log σ) time per query (where σ is the size), with total in O(n log σ) time and space. This enables efficient of the original string by iteratively applying LF mappings starting from the end position. Advances in the 2010s leveraged GPU acceleration for computing BWT on massive genomic datasets, where n exceeds billions. Blockwise algorithms sort rotations in parallel across thousands of GPU cores, achieving up to 10x over CPU-based induced for short-read collections up to 100 gigabases, effectively scaling as O(n / p) with p GPU threads. These methods are particularly impactful for bioinformatics pipelines, reducing time to under two hours on commodity hardware. More recent work in the 2020s includes algorithms like for efficient BWT on DNA datasets with highly diverse read lengths.

Applications

Data Compression

The Burrows–Wheeler transform (BWT) plays a central role in general-purpose lossless data compression by rearranging input data into a form that groups similar characters together, facilitating subsequent stages. In the compressor, introduced in 1996, the pipeline applies the BWT to the input block, followed by a transform to map symbols to small integers based on recent occurrences, then (RLE) to handle the resulting runs of identical symbols, and finally with adaptive modeling tailored to the BWT output. This sequence exploits the local correlations induced by the BWT, where characters preceding identical suffixes tend to cluster, enabling efficient compression without complex statistical modeling. The effectiveness of BWT in stems from its ability to transform data into a high-locality representation, often producing long runs of identical symbols that simple coders like RLE or Huffman can exploit with minimal overhead. For instance, in the example of the "banana", the BWT yields "annbaa", featuring clustered 'a's and 'n's that enhance compressibility. On text data, bzip2 achieves compression ratios approximately 20-30% better than (which relies on LZ77 and Huffman), reducing file sizes more effectively due to BWT's superior handling of contextual redundancies. Standalone implementations of BWT-based , such as the libbsc library, provide versatile tools for integrating the transform into custom pipelines, often outperforming LZ77-based methods like those in or on highly repetitive datasets by leveraging block-sorting to amplify redundancy. However, BWT performs poorly on random or incompressible data, where it introduces no exploitable structure and may even slightly increase , necessitating with post-BWT RLE to encode any emergent runs before further processing.

Bioinformatics

The FM-index, introduced by Ferragina and Manzini in 2000, serves as a cornerstone for bioinformatics applications of the Burrows–Wheeler transform by providing a compressed representation of a suffix array that supports constant-time random access to text characters and pattern matching in O(m + occ) time, where m is the pattern length and occ is the number of occurrences. This structure leverages the BWT to emulate suffix array functionality without storing the full array, making it ideal for indexing large biological sequences like genomes, where space efficiency is critical due to the repetitive nature of DNA. In practice, the FM-index has been integrated into widely used read alignment tools such as Bowtie, which aligns short DNA reads to reference genomes using Burrows–Wheeler indexing for a memory footprint of approximately 1.3 GB on the human genome, and BWA, which employs FM-index construction for fast exact matching in short-read alignment. A key mechanism in FM-index-based alignment is the backward search algorithm, which uses last-to-first (LF) mapping to traverse the implicit suffix array in reverse, enabling the discovery of exact and approximate matches without materializing the entire suffix array. This process starts from the pattern's suffix and iteratively applies LF-mapping to count and locate occurrences efficiently, accommodating mismatches or indels through extensions like seeding and extension strategies in tools like BWA-MEM. By avoiding the O(n log n) space of traditional suffix trees or tries, the FM-index offers sublinear space usage—often close to the text's compressed size—for massive datasets exceeding 100 GB, such as pan-genome collections, where suffix trees would require prohibitive memory (e.g., over 20 bytes per base). More recent advancements build on BWT for genomic , exemplified by the r-index introduced in the 2020s, which combines the BWT with sampled positions to achieve ratios of 1-2 bits per base for the while supporting efficient pattern queries. This approach exploits the low number of runs (r) in the BWT of repetitive biological sequences, using O(r \log \log n) space to index entire genomes or collections, facilitating tasks like read alignment across diverse strains without decompressing the data. The r-index's design addresses the scalability challenges of growing genomic databases, enabling searches on terabyte-scale inputs that traditional methods cannot handle due to space constraints.

Other Uses

The Burrows–Wheeler transform (BWT) has been extended to two-dimensional variants for , where images are rasterized into one-dimensional sequences for processing. In these approaches, row-by-row (horizontal raster) or column-by-column (vertical raster) scanning converts the pixel grid into a linear , allowing application of the standard BWT to group similar values and runs of identical pixels, which enhances subsequent . For instance, combining BWT with (DCT) and (RLE) achieves compression ratios around 29:1 to 30:1 for images, outperforming traditional in lossless scenarios by exploiting pixel redundancy through sorted contexts. These methods are particularly effective in sparse image coders, where BWT-induced grouping reduces the entropy of pixel runs in low-complexity regions, such as medical or scanned documents. Beyond images, BWT facilitates sequence prediction tasks in language modeling by leveraging its suffix-sorting property to model contexts efficiently. The transform rearranges training sequences such that similar prefixes (contexts) are clustered, enabling of the next or based on the conditional probabilities derived from the BWT representation and its . A succinct using BWT and trees achieves lossless competitive with state-of-the-art methods, storing training data compactly while querying for the most likely next in O(1) time per after preprocessing. This context-sorting mechanism has applications in grammar-based , where predicted symbols guide construction and for adaptive encoding. In , BWT aids by converting sequential data into strings and grouping similar patterns through permutation. For vehicle controller area network ( traffic, of message IDs are mapped to character strings, and BWT clusters recurring subsequences, allowing similarity analysis to identify deviations as anomalies with high (e.g., F1-scores above 0.95 on datasets). This grouping exploits the transform's ability to localize related temporal patterns, facilitating efficient detection without exhaustive pairwise comparisons. BWT also contributes to cryptographic applications, particularly in permutation-based ciphers for preservation. By prior to , BWT generates permutations that obscure statistical patterns, increasing the unicity distance (e.g., up to 8.3 bits) and resistance to when integrated into schemes for sensor feeds, such as the proposed SEC system. In such systems, BWT followed by move-to-front encoding and achieves dual and , reducing bits per character to under 2 while protecting transmitted against unauthorized .

References

  1. [1]
    [PDF] A Block-sorting Lossless Data Compression Algorithm
    May 10, 1994 · We describe a block-sorting, lossless data compression algorithm, and our imple- mentation of that algorithm. We compare the performance of ...
  2. [2]
    Wheeler graphs: A framework for BWT-based data structures
    Oct 25, 2017 · The Burrows–Wheeler Transformation (BWT) has a very peculiar history. First conceived in 1983, it was published only eleven years later in a ...Missing: invention date
  3. [3]
    [PDF] The Burrows-Wheeler Transform: Theory and Practice
    In this paper we describe the Burrows-Wheeler Transform. (BWT) a completely new approach to data compression which is the basis of some of the best ...
  4. [4]
  5. [5]
    An analysis of the Burrows—Wheeler transform - ACM Digital Library
    In this paper, we analyze two algorithms that use this technique. The first one is the original algorithm described by Burrows and Wheeler, which, despite ...
  6. [6]
    [PDF] Burrows-Wheeler Transform
    BWT(banana) = annb$aa. Tends to put runs of the same character together ... BWT Searching Example 2. $bananna a$banann ananna$b anna$ban bananna$ na ...
  7. [7]
  8. [8]
    [PDF] 1 The Burrows-Wheeler Transform
    Nov 29, 2018 · The BWT was invented in 1983, and published as a tech report in 1994 by Burrows and Wheeler1. We will see: • How to compute the BWT of a string.<|separator|>
  9. [9]
    When a dollar makes a BWT - ScienceDirect.com
    Feb 12, 2021 · Clearly, all rotations of v$ are distinct, thus the inverse of the BWT becomes unique, due to the condition that the sentinel character must be ...Missing: invertibility | Show results with:invertibility<|control11|><|separator|>
  10. [10]
  11. [11]
    [0908.0239] On Bijective Variants of the Burrows-Wheeler Transform
    Aug 3, 2009 · Abstract page for arXiv paper 0908.0239: On Bijective Variants of the Burrows-Wheeler Transform. ... In this paper, we discuss a bijective variant ...
  12. [12]
    Constructing the Bijective and the Extended Burrows-Wheeler ...
    Nov 16, 2019 · View a PDF of the paper titled Constructing the Bijective and the Extended Burrows-Wheeler Transform in Linear Time, by Hideo Bannai and ...Missing: online | Show results with:online
  13. [13]
  14. [14]
    Compressed indexes for dynamic text collections - ACM Digital Library
    This article extends the work on optimal-space indexing to a dynamic collection of texts. Our first result is a compressed solution to the library management ...
  15. [15]
    GPU-Accelerated BWT Construction for Large Collection of Short ...
    Jan 29, 2014 · The BWT of a short-read collection of up to 100 Gigabases can be constructed in less than 2 hours using a machine equipped with a quad-core CPU and a GPU.
  16. [16]
  17. [17]
    bzip2
    ### bzip2 Algorithm Pipeline and Compression Comparisons
  18. [18]
    Parallel lossless data compression on the GPU - IEEE Xplore
    Our approach parallelizes three main stages in the bzip2 compression pipeline: Burrows-Wheeler transform (BWT), move-to-front transform (MTF), and Huffman ...
  19. [19]
    A Quick Benchmark: Gzip vs. Bzip2 vs. LZMA - The Tukaani Project
    May 31, 2005 · bzip2 has notably better compression ratio than gzip, which has to be the reason for the popularity of bzip2; it is slower than gzip especially ...<|separator|>
  20. [20]
    High performance data compression library
    bsc and libbsc is a high-performance program and a library for lossless data compression. This is an open source, portable data compression software written ...Missing: Burrows- Wheeler transform
  21. [21]
    An Efficient Compression Algorithm for Short Text Strings - Baeldung
    Mar 18, 2024 · High compression ratio: The BWT algorithm can achieve a higher compression ratio than other algorithms such as LZ77 and LZ78, especially for ...
  22. [22]
    [PDF] Opportunistic Data Structures with Applications
    In this paper, we address the issue of compressing and indexing data by studying it in a theoretical framework. We devise a novel data structure for indexing ...
  23. [23]
    Ultrafast and memory-efficient alignment of short DNA sequences to ...
    Mar 4, 2009 · Bowtie employs a Burrows-Wheeler index based on the full-text minute-space (FM) index, which has a memory footprint of only about 1.3 gigabytes ...
  24. [24]
    [PDF] Indexing Compressed Text
    HON, W., LAM, T., SADAKANE, K., SUNG, W., AND YIU, S. 2004a. Compressed index for dynamic text. In. Proceedings of the IEEE Data Compression Conference. IEEE ...
  25. [25]
    [PDF] Image Compression Using Burrows-Wheeler Transform
    The purpose of this thesis was to study image compression using the Burrows-. Wheeler transform. The aim of image compression is to compress the image into ...
  26. [26]
    On the Utilization of Reversible Colour Transforms for Lossless 2-D ...
    The BWT is a lossless transform invented by M. Burrows and D. J. Wheeler for text compression [19]. By increasing the recurrence of image pixels, redundancy is ...
  27. [27]