Fact-checked by Grok 2 weeks ago

De Bruijn sequence

A de Bruijn sequence B(k, n) is a cyclic sequence over an of size k and k^n, in which every possible of n from the alphabet appears exactly once as a contiguous . These sequences are fundamental objects in and , corresponding to Eulerian cycles in a known as the , where vertices represent all strings of n-1 and edges represent overlaps to form strings of n. For example, a de Bruijn sequence of order 3 (with k=2, n=3) has 8 and includes all 8 possible binary triples exactly once when read cyclically, such as the sequence "00010111". The concept traces back to 1894, when French mathematician Camille Flye Sainte-Marie proved the existence of such sequences for binary alphabets in response to a problem posed in the journal L'Intermédiaire des Mathématiciens. In 1946, Dutch mathematician independently studied and generalized these sequences to arbitrary alphabet sizes k \geq 2, publishing his results in "A Combinatorial Problem" and later acknowledging Sainte-Marie's priority in 1975. De Bruijn's work built on graph-theoretic ideas, including Euler's 1735 solution to the Königsberg bridge problem, to establish that de Bruijn sequences exist for all positive integers k and n. De Bruijn sequences have wide-ranging applications across and , including the design of feedback shift registers for pseudo-random number generation and error-correcting codes. In bioinformatics, they underpin de Bruijn graph-based algorithms for genome assembly from short DNA reads in next-generation sequencing, where k-mers (substrings of length k) serve as edges to reconstruct sequences efficiently despite repeats and errors. Additional uses include problems like cracking combination locks and constructing mathematical card tricks, highlighting their utility in both theoretical and practical domains.

Fundamentals

Definition

A de Bruijn sequence, denoted as B(k, n), is a cyclic sequence of length k^n over an alphabet of size k in which every possible substring of length n appears exactly once. Here, k represents the alphabet size—for instance, k = 2 for a binary alphabet {0, 1}—and n denotes the length of the substrings, often referred to as n-mers. The cyclic nature ensures that substrings wrapping around the sequence's end are considered, achieving complete coverage of all k^n distinct n-length combinations without repetition. Equivalently, a de Bruijn sequence can be represented as a linear of k^n + n - 1, where the first n - 1 symbols of the are appended to its end to simulate the and verify uniqueness. This formulation highlights how the sequence's minimal guarantees exhaustive and non-overlapping (beyond the necessary cyclic overlap) enumeration of all possible n-mers.

Properties

A de Bruijn sequence B(k, n) for an of size k and length n exhibits universality, meaning it contains every possible length-n over the alphabet exactly once as a contiguous , thereby providing exhaustive coverage without or gaps. This property ensures the sequence serves as a compact representation of the full combinatorial space of k^n possible strings. The of a de Bruijn sequence has minimal length k^n, as this matches the number of distinct substrings required. To represent it linearly while preserving all substrings (including those wrapping around the ), the sequence is extended to length k^n + n - 1. A defining feature of de Bruijn sequences is their invariance under cyclic shifts: rotating the sequence by any number of positions yields the same set of length-n substrings, reflecting the circular structure inherent to the . The overlap property states that consecutive length-n substrings in the sequence share exactly n-1 symbols, enabling the maximal compression and the achievement of the shortest possible length. This overlap is a consequence of the sequential advancement by one symbol at a time. The existence of a de Bruijn sequence B(k, n) is equivalent to the existence of an Eulerian cycle in the G_{k,n-1}, where vertices correspond to length-(n-1) strings and directed edges to length-n strings; traversing each edge exactly once yields the sequence via edge labels. Equivalently, it corresponds to a Hamiltonian cycle in the graph where vertices are the length-n strings and edges connect overlapping strings. Since de Bruijn graphs are strongly connected and balanced (with equal in- and out-degrees k), such cycles always exist. The number of distinct cyclic de Bruijn sequences B(k, n) is given by \frac{(k!)^{k^{n-1}}}{k^n}, which arises from counting the Eulerian cycles in the de Bruijn graph using the BEST theorem for directed graphs. For the binary alphabet (k=2), this reduces to $2^{2^{n-1} - n}. For example, there is 1 such sequence for n=2, 2 for n=3, and 16 for n=4.

Historical Development

Early Constructions

The earliest documented construction of a de Bruijn-like sequence dates to 1894, when Camille Flye Sainte-Marie solved a puzzle posed by A. de Rivière as Question 48 in the journal L'Intermédiaire des Mathématiciens. The puzzle involved finding a circular arrangement of $2^n binary symbols such that every possible binary string of length n appears exactly once as a contiguous substring; this is a binary de Bruijn sequence of order n. Flye Sainte-Marie constructed such a sequence for n=2, the cyclic binary string 0011, which contains the substrings 00, 01, 11, and 10 (with the last wrapping around to the first). Flye Sainte-Marie's approach was manual and combinatorial, representing each n-letter arrangement over the binary alphabet as a pair consisting of its first n-1 letters (antecedent) and last n-1 letters (consequent). He formed a table T_n of all such pairs and constructed a closed chain C_n by linking pairs where the consequent of one matches the antecedent of the next, ensuring each pair appears exactly once—effectively tracing an Eulerian cycle through the structure. Odd-numbered terms in the chain were then replaced with 0 and even-numbered terms with 1 to yield the sequence. This method also allowed him to count the number of such distinct cyclic arrangements as $2^{2^n - n}.

Naming and Generalization

De Bruijn sequences are named after the mathematician , who formalized and generalized the concept in his seminal 1946 paper titled "A Combinatorial Problem," extending it to sequences over an arbitrary of size k that contain every possible substring of length n exactly once. This work addressed the existence of the shortest cyclic sequence achieving this property, marking a key advancement in combinatorial sequence theory. Published in the Proceedings of the Section of Sciences of the Koninklijke Nederlandse Akademie van Wetenschappen (volume 49, pages 758–764) and republished in Indagationes Mathematicae (volume 8, pages 461–467), de Bruijn's paper provided a rigorous proof of existence through a recursive construction, demonstrating that such sequences of length k^n always exist for any positive integers k and n. The recursive approach builds larger sequences from smaller ones, establishing the foundational theoretical framework for their generation and properties. Although de Bruijn's contribution established the general case, an earlier precursor for binary alphabets (k=2) appeared in the work of Flye Sainte-Marie in 1894, who proved the existence of such sequences in the context of circular arrangements. De Bruijn later acknowledged this priority in a 1975 note. The sequences gained broader prominence in the through W. Golomb's influential book Shift Register Sequences (1967), which connected them to linear feedback shift registers and applications in , thereby popularizing their use beyond pure .

Illustrative Examples

Binary Sequences

A de Bruijn sequence for a alphabet and substring length n, denoted B(2, n), is a cyclic sequence of length $2^n in which every possible string of length n appears exactly once as a when considering the sequence circularly. For n=1, the cyclic sequence covers all strings of length 1: the substrings are 0 and . This is the unique such sequence up to rotation and complement. For n=2, the cyclic sequence 0011 covers all binary strings of length 2: starting from the first position, the substrings are 00 (positions 1-2), 01 (2-3), 11 (3-4), and 10 (4-1, wrapping around). These are 00, 01, 11, and 10, confirming unique coverage of all four possibilities. Again, this is the unique sequence up to rotation and complement. For n=3, a cyclic sequence of length 8 is 00010111, which covers all eight strings of length 3. The substrings, extracted with wrap-around, are:
  • Positions 1-3: 000
  • 2-4: 001
  • 3-5: 010
  • 4-6: 101
  • 5-7: 011
  • 6-8: 111
  • 7-1 (wrap): 110 (1 from position 7, 1 from 8, 0 from 1)
  • 8-2 (wrap): 100 (1 from 8, 0 from 1, 0 from 2)
These yield 000, 001, 010, 101, 011, 111, 110, and 100, encompassing all possible triples uniquely. The reverse complement 11101000 serves as another example. Binary de Bruijn sequences are commonly represented in for brevity, though linear versions of length $2^n + n - 1 can be obtained by unfolding the while preserving the coverage, such as writing 0001011100 linearly for n=3.

Multinary Sequences

De Bruijn sequences extend naturally to multinary s of size k > 2, where the cyclic sequence of length k^n contains every possible of length n over the exactly once. This generalization increases the sequence length exponentially with k, while maintaining the property of maximal overlap of n-1 symbols between consecutive s. For the simplest case with n=1 and k=3 over the alphabet \{0,1,2\}, the cyclic sequence 012 covers all single symbols: 0, 1, and 2. A more illustrative example occurs for n=2 and k=3, yielding a cyclic sequence of length 9 that covers all 9 possible pairs (00, 01, 02, 10, 11, 12, 20, 21, 22). One such sequence is 001122102, where the consecutive pairs are 00, 01, 11, 12, 22, 21, 10, 02, and the wrap-around pair 20. This demonstrates how the sequence traverses all combinations through directed overlaps, starting from an initial symbol and appending each subsequent symbol to form the required substrings. As the alphabet size k grows, constructing these sequences becomes more complex due to the need for longer strings and more intricate overlap patterns to ensure complete coverage without repetition or omission. For instance, while sequences (k=2) serve as a foundational case with relatively straightforward patterns, multinary versions demand denser interconnections among the k^{n-1} possible overlapping prefixes. In , the sequence 001122102 for k=3, n=2 can be presented without cyclicity, but to fully cover all substrings including the wrap-around (here, 20), one appends the first symbol to the end, resulting in 0011221020 of length 10; the relevant substrings are then extracted from positions 1-2 through 9-10.

Construction Methods

De Bruijn Graph Approach

The de Bruijn graph provides a graph-theoretic framework for constructing de Bruijn sequences. For a de Bruijn sequence B(k, n) over an alphabet \Sigma of size k, the graph G(k, n) has vertices corresponding to all possible strings of length n-1 over \Sigma, known as (n-1)-mers; there are thus k^{n-1} vertices. A directed edge exists from vertex u to vertex v if the suffix of u of length n-2 equals the prefix of v of length n-2; this edge is labeled by the symbol \sigma \in \Sigma such that v is the suffix of u \sigma of length n-1, and the edge itself represents the n-mer u \sigma. Each vertex has out-degree k, corresponding to the k possible choices for \sigma, and similarly in-degree k. A de Bruijn sequence B(k, n) corresponds precisely to an Eulerian cycle in G(k, n), a closed directed path that traverses each of the k^n edges exactly once. A de Bruijn sequence B(k, n) corresponds to the concatenation of the labels of the edges traversed in the Eulerian cycle, yielding a string of length k^n. When viewed cyclically, this string contains every possible n-mer over \Sigma exactly once as a substring. For a concrete illustration, consider B(2, 3) over the alphabet \{0, 1\}. The vertices are the 2-mers: 00, 01, 10, 11. The edges are as follows:
  • From 00: to 00 labeled 0 (representing 000), to 01 labeled 1 (001)
  • From 01: to 10 labeled 0 (010), to 11 labeled 1 (011)
  • From 10: to 00 labeled 0 (100), to 01 labeled 1 (101)
  • From 11: to 10 labeled 0 (110), to 11 labeled 1 (111)
One such Eulerian cycle yields the de Bruijn sequence 00010111. When considered cyclically, 00010111 contains all eight 3-mers: 000, 001, 010, 101, 011, 111, 110, 100. The G(k, n) admits an because it is strongly connected and every vertex has equal in-degree and out-degree k, satisfying the necessary and sufficient conditions from for the existence of such a in a .

Prefer-One and Hierarchical Methods

The prefer-one method is a technique specifically designed for constructing de Bruijn sequences of order n. It initializes the sequence with n zeros and iteratively appends the next bit, preferring 1 whenever appending it would produce a novel n-bit not previously seen in the sequence. If appending 1 would repeat an existing , it appends instead, provided that choice yields a new . This process continues until all $2^n possible substrings are covered, at which point the sequence forms a full when considered cyclically. The method requires no explicit for alphabets, as it reliably generates a complete de Bruijn sequence, though it may produce sequences biased toward more 1s (e.g., up to 90% in initial segments for large n). This approach is particularly simple and efficient for binary cases with small to moderate n, as it avoids constructing the full de Bruijn graph and operates in linear time relative to the sequence length. However, for larger n or non-binary alphabets, it becomes less practical due to the need to track seen substrings efficiently and potential biases that may affect applications requiring balanced sequences. In contrast to the method, which relies on finding Eulerian paths, the prefer-one method uses local decision rules without global . Hierarchical constructions build de Bruijn sequences recursively from smaller orders by interleaving or inserting subsequences to incorporate all new n-tuples while preserving the coverage of lower-order tuples. One such , based on interleaving two de Bruijn sequences of order n-1 over size k, produces a sequence for order n by alternating symbols from the smaller sequences in a structured manner that ensures decodability and completeness. This recursive layering allows building sequences hierarchically, starting from base cases like order 1 or 2, and scales by combining verified smaller structures. For instance, beginning with the de Bruijn sequence of order 3, B(2,3) = 00010111, targeted insertions of bits (such as adding 0s in positions that extend overlaps) yield B(2,4) = 0001001101011110, covering all 16 quaternary substrings. These hierarchical methods offer advantages in modularity for small alphabets, enabling verification at each level without recomputing from scratch, and are useful for generating decodable variants suitable for error-correcting codes. Nonetheless, they are less efficient than graph-based approaches for large k or n, as the interleaving steps can introduce exponential growth in complexity during recursion, limiting scalability compared to direct Eulerian path algorithms.

Algorithmic Generation

One common algorithmic approach to generating a de Bruijn sequence B(k, n) involves constructing the de Bruijn graph of order n over an alphabet of size k, where vertices represent all strings of length n-1, and directed edges correspond to strings of length n by appending a symbol to a vertex and labeling the edge with that symbol. An Eulerian cycle in this graph, which visits every edge exactly once, yields the desired sequence when the labels of the traversed edges are concatenated (with the initial vertex prefix added). This cycle can be found efficiently using Hierholzer's algorithm, a depth-first search (DFS)-based method that builds the tour by iteratively discovering and merging cycles until all edges are covered. Hierholzer's algorithm proceeds as follows in for the :
function Hierholzer(start_vertex):
    [stack](/page/Stack) = empty [stack](/page/Stack)
    [circuit](/page/Circuit) = empty list
    [stack](/page/Stack).push(start_vertex)
    while [stack](/page/Stack) is not empty:
        current = [stack](/page/Stack).top()
        if current has unused edges:
            next_edge = pick an unused outgoing edge from current
            remove next_edge from [graph](/page/Graph)
            next_vertex = target of next_edge
            [stack](/page/Stack).push(next_vertex)
        else:
            [circuit](/page/Circuit).append([stack](/page/Stack).pop())
    return reverse([circuit](/page/Circuit))  // Reconstruct path from post-order traversal
The resulting path's edge labels form the de Bruijn sequence of length k^n, excluding the cyclic overlap. This method requires O(k^n) time and space due to the 's size (|V| = k^{n-1}, |E| = k^n), as the DFS traversal processes each edge once. For large n, direct graph construction becomes infeasible, but optimizations using linear shift registers (LFSRs) simulate the generation with O(1) time per bit and O(n) space by exploiting feedback polynomials that produce maximal-length sequences approximating de Bruijn properties, such as those based on primitive polynomials like x^n + x + 1. Another computational technique employs the inverse Burrows-Wheeler transform (iBWT), which constructs a de Bruijn sequence by starting with the sorted list of all possible rotations (or suffixes in a cyclic sense) of the target substrings and iteratively linking them via stable sorting and permutation recovery to form the full cyclic string. This reverses the Burrows-Wheeler transform's compression and yields a de Bruijn sequence, often the lexicographically smallest one using Lyndon words. For B(2,3), applying iBWT to the sorted suffixes produces the sequence 00010111, which cyclically contains all 3-bit strings exactly once.

Applications

Combinatorics and Coding Theory

De Bruijn sequences play a key role in through their generalization to universal cycles, which provide efficient ways to cover all elements of a combinatorial family exactly once in a cyclic . A universal cycle for a family of objects, such as k-subsets or , extends the de Bruijn cycle concept by embedding each object as a consecutive , minimizing redundancy and enabling compact representations for enumeration and optimization problems. For instance, graph universal cycles apply this to graph classes like or permutation graphs, where de Bruijn cycles serve as a foundational model for constructing cycles that cover all labeled or unlabeled graphs with specified overlaps, achieving compression ratios close to theoretical minima. In , de Bruijn sequences form the basis for de Bruijn covering codes, which are short strings ensuring that every possible n-symbol word over an of size q is within R of some , facilitating efficient error correction and detection. These codes achieve lengths near the sphere-covering bound of q^n / \sum_{i=0}^R \binom{n}{i} (q-1)^i, with constructions like probabilistic methods yielding upper bounds of (R + 1 + o(1)) q^n \log n / (R n (q-1)^R) for q. They support testing for fault detection by generating test patterns that cover state spaces comprehensively, reducing the length needed for exhaustive in error-correcting systems. Linear feedback shift registers (LFSRs) generate de Bruijn sequences via primitive polynomials over finite fields, producing maximal-length m-sequences of 2^n - 1 that approximate full de Bruijn sequences by omitting only the all-zero ; prepending a 0 to an m-sequence yields an exact de Bruijn of 2^n. Recent advancements highlight de Bruijn sequences' role in optimal codes, with post-2020 constructions using folding, interleaving, and primitive polynomials to derive shorter de Bruijn arrays that meet or approach lower bounds on area for multidimensional . For example, probabilistic upper bounds on the minimal length of (m,n,R)-de Bruijn arrays provide scalable solutions for q-ary spaces, improving upon prior sphere- limits and enabling tighter designs for fault-tolerant . These insights underscore de Bruijn sequences' utility in bounding the efficiency of for large-scale combinatorial and applications.

Computer Science and Cryptography

In , de Bruijn sequences enable efficient techniques, particularly for determining the position of the least or most significant set bit in a word without conditional branching or loops. A seminal involves multiplying the input value, after isolating the relevant bit (e.g., via x \& -x for the least significant bit), by a precomputed de Bruijn constant—a value derived from a de Bruijn sequence that maps each possible single-bit position to a unique pattern in the higher bits of the product. For 32-bit words, the constant $0x077CB531 is commonly used, producing a distinct 5-bit signature in the upper bits that indexes into a small for the bit position. The general formula for the position is given by: \text{pos} = ((x \times \text{magic}) \gg (32 - n)) \mod 2^n where x is the isolated bit value, \text{magic} is the de Bruijn constant, n is the word size in bits (e.g., 32), and \gg denotes right shift; the result is then typically masked and used to index a table. This approach, introduced in 1998, achieves constant-time execution on most architectures and has been widely adopted in low-level optimizations, such as in compiler intrinsics and performance-critical code like chess engines for bitboard operations. De Bruijn sequences also optimize brute-force attacks on combination locks or PIN pads that lack an "enter" key and evaluate only the last n digits entered, allowing overlaps to cover all possibilities efficiently. For a 4-digit PIN over digits 0-9 (alphabet size k=10, substring length n=4), there are $10^4 = 10,000 , but a de Bruijn sequence B(10,4) of length 10,000 ensures every possible PIN appears exactly once as a consecutive , requiring at most 10,003 key presses (accounting for wrap-around by appending the first 3 symbols). This reduces the search space dramatically compared to naive enumeration, which might require up to $10,000 \times 4 = 40,000 presses without overlaps, and has practical implications for of such devices. In , de Bruijn sequences serve as keystreams in stream due to their maximal period and balanced symbol distribution, often generated from nonlinear shift registers (NFSRs) to enhance against linear attacks. They facilitate keyspace by providing a compact traversal of all possible states, useful in or exhaustive searches for weak keys in symmetric systems. For instance, binary de Bruijn sequences derived from LFSRs (as discussed in methods) can model full-period outputs for testing cipher resilience, with properties like low aperiodic making them suitable for pseudorandom number generation in secure protocols. High-impact work has focused on constructing cryptographically strong variants with large linear complexity to resist correlation attacks. Another application arises in for detection and , where de Bruijn sequences form rotating codes on circular markers, such as or encoders, to precisely identify orientation from partial views. By inscribing the sequence around a robot's , a camera or can read any n-symbol to uniquely determine the among k^n possibilities, enabling robust localization even with or . This technique, optimized for rotary optical encoders, prevents errors like blooming and supports high-resolution measurements in systems.

Bioinformatics and Synthetic Biology

In bioinformatics, de Bruijn graphs serve as a foundational tool for assembling short sequencing reads into longer contigs during genome sequencing, by representing k-mers as nodes and overlaps as edges to resolve the underlying sequence. The Velvet assembler exemplifies this approach, employing de Bruijn graphs to process error-prone short reads from platforms like Illumina, achieving contigs up to 50 kb in length for bacterial genomes by iteratively simplifying graph structures to remove tips, bubbles, and branches caused by sequencing errors. This method has become a standard in assembly pipelines, enabling efficient reconstruction of microbial and eukaryotic genomes from fragmented data. In synthetic biology and DNA storage, de Bruijn sequences facilitate the encoding of digital information into DNA strands, ensuring comprehensive coverage of possible oligonucleotides while minimizing synthesis errors and unwanted biochemical motifs. The Explorer algorithm, introduced in 2024, leverages de Bruijn graphs to generate high-efficiency DNA codes that satisfy arbitrary local constraints (e.g., avoiding homopolymers) and global constraints (e.g., balanced GC content), achieving up to 95% encoding efficiency for data storage applications. Similarly, orthogonal de Bruijn sequences enable the design of non-interfering DNA barcode libraries for multiplexed gene synthesis, as demonstrated in a 2024 method that constructs libraries covering all k-mers while preventing cross-hybridization, supporting scalable synthetic biology experiments. Recent advancements extend de Bruijn graphs to , where an extended variant processes sequences to extract motifs for models in biological . This 2024 framework builds graphs from protein k-mers to identify structural and functional features, improving predictive accuracy in tasks like classification by capturing higher-order dependencies in sequential data. In , de Bruijn graphs enhance coverage for detection by diverse microbial communities from short reads, resolving quasispecies variability in populations. For instance, strain-resolved using de Bruijn graphs has identified novel gut genomes in metagenomes, achieving over 90% recovery of contigs longer than 10 kb from low-abundance samples. Beyond , de Bruijn sequences inform experimental design in , particularly for (fMRI) studies requiring balanced stimulus presentation to decode neural responses. De Bruijn cycles generate pseudo-random sequences that counterbalance carryover effects, ensuring every possible n-label subsequence appears exactly once, which optimizes event-related fMRI paradigms for estimating hemodynamic responses with minimal .

Generalizations

f-Fold Sequences

An f-fold de Bruijn sequence, denoted B_f(k, n), is a cyclic sequence over an alphabet of size k with length f \cdot k^n, in which every possible substring of length n (n-mer) appears exactly f times. This generalizes the standard de Bruijn sequence, corresponding to the case f = 1, by introducing controlled redundancy through multiple occurrences of each n-mer. Construction of f-fold de Bruijn sequences relies on generalizations of the de Bruijn graph approach. Specifically, a multi de Bruijn graph G(m, q, k) (with m = f, q = k) is constructed, where vertices represent all (n-1)-mers, and for each possible n-mer, there are exactly f parallel directed edges from the prefix (n-1)-mer to the suffix (n-1)-mer. An f-fold de Bruijn sequence corresponds to an Eulerian cycle in this , which traverses each edge exactly once and returns to the starting vertex. Alternative methods include extensions of the Burrows-Wheeler transform to produce multicyclic sequences that can be concatenated into the desired form. For example, in the case (k = 2, f = 2, n = 3), the sequence $1111011000101000 has length 16 and contains each of the 8 possible 3-bit substrings exactly twice, such as 000 appearing at positions wrapping around the . The number of distinct f-fold de Bruijn sequences, denoted N_f(k, n) or |C(f, k, n)| for the case, grows factorially with n due to the in Eulerian counts within the . A is given by |C(f, k, n)| = \frac{1}{f k^n} \sum_{r \mid f} \phi\left( \frac{f}{r} \right) W(r, k, n), where \phi is and W(r, k, n) = \frac{(r k)!}{r!^k k^{n-1}}, though practical computation for larger parameters often requires algorithmic enumeration. For the case with f = 2, the values are N_1 = 2, N_2 = 5, and N_3 = 82. Applications of f-fold de Bruijn sequences arise in scenarios requiring for reliability, such as robust testing of circuits or fault-tolerant error-correcting codes, where multiple coverings ensure detection and correction of failures despite the increased sequence length.

De Bruijn Tori

A de Bruijn torus B(k; m, n) is a two-dimensional of dimensions r \times s (with r s = k^{m n}) over an of size k, such that every possible m \times n over that alphabet appears exactly once when considering contiguous subarrays with wrap-around boundaries. This structure generalizes the one-dimensional de Bruijn sequence to a periodic , where the wrapping ensures that the array covers all k^{m n} possible subarrays without repetition or omission. The total number of symbols in a de Bruijn torus B(k; m, n) is exactly k^{m n}, reflecting the exhaustive enumeration of all distinct m \times n blocks. Unlike linear sequences, the nature allows for seamless cycling in both dimensions, making it particularly useful for modeling periodic two-dimensional patterns. Construction methods for de Bruijn tori typically involve two-dimensional de Bruijn s, where vertices represent (m-1) \times (n-1) blocks and edges correspond to overlapping m \times n blocks, with an Eulerian cycle in this graph yielding the torus arrangement. Alternatively, recursive techniques can build larger tori from smaller ones by combining and shifting substructures to ensure coverage. For a concrete example, consider the de Bruijn torus B(2; 2, 2), which is a $4 \times 4 that covers all possible $2 \times 2 blocks under wrapping:
1 0 1 1
1 0 0 0
0 0 0 1
1 1 0 1
This contains each of the possible 2 × 2 subarrays exactly once. Recent has explored the existence and construction of de Bruijn tori for non-square dimensions, including conditions for "perfect maps" that guarantee coverage without defects. A study on provides constructions for non-square cases where |m - n| \leq 2, using graph-theoretic approaches and de Bruijn rings to verify Eulerian properties in generalized de Bruijn graphs.

Orthogonal Variants

Orthogonal de Bruijn sequences refer to a collection of de Bruijn sequences, all of order k over an of size \sigma, such that the sets of (k+1)-length substrings appearing in their cyclic shifts are pairwise disjoint—no (k+1)-mer occurs in the shifts of more than one sequence in the collection. This property ensures that the sequences provide independent coverings of the substring space, with the longest common substring between any two being of length exactly k. A maximal such collection covers all possible \sigma^{k+1} (k+1)-mers exactly once across the union of shifts, analogous to a of the de Bruijn graph's edges into disjoint Eulerian cycles. Constructions of orthogonal de Bruijn sequences often rely on finding edge-disjoint cycles or Eulerian circuits in the G_{\sigma,k} or its , where each cycle corresponds to one sequence. For instance, rewiring techniques can adjust graph edges to ensure disjointness while preserving the de Bruijn property, yielding at least \lfloor \sigma/2 \rfloor mutually orthogonal sequences for any k \geq 1. Alternatively, orthogonal labelings of —bipermutative functions on edges that, when superposed, uniquely identify paths—can be derived from pairs of orthogonal Latin squares using bipermutative cellular automata, enabling systematic generation of such collections. These methods scale to larger \sigma, with conjectured bounds approaching \sigma - 1 orthogonal sequences for \sigma \geq 3 and k \geq 2. Recent generalizations relax the strict disjointness for practical use, allowing each (k+1)-mer to appear at most \ell times across the collection ( \ell-orthogonal) or requiring each k-mer to appear exactly b times in individual sequences ( b-balanced). These variants, explored in a 2025 study, support constructions via tensor-product graphs and fixed-weight restrictions, with bounds like \Omega_{\ell}(\sigma,k) \geq 2\ell for \sigma \geq 3. For example, consider a pair of orthogonal de Bruijn sequences for \sigma=4, k=2: one might be AACAGATCCGCTGGTT (covering all 16 dimers), paired with a disjoint sequence like TTGGCCTAAGGTCCAA, where no 3-mer overlaps in their cyclic shifts, ensuring complete coverage of 64 trimers. In applications, orthogonal de Bruijn sequences facilitate multi-channel coding in communications by providing non-interfering symbol streams for parallel transmission. In , they design orthogonal DNA barcode libraries to minimize cross-hybridization in probes or synthetic genes, with 2025 extensions to Kautz variants—directed analogs without self-loops—enhancing efficiency in and error detection.

Decoding Techniques

De Bruijn decoding involves recovering the original cyclic generating sequence from a collection of overlapping substrings, typically represented as edges in a . This process is fundamental in applications like genome assembly, where substrings (k-mers) derived from sequencing reads form the graph, and an or cycle reconstructs the underlying sequence by traversing overlaps. The standard algorithm constructs a directed de Bruijn graph with nodes as (k-1)-mers and directed edges labeled by k-mers, where each edge connects the prefix and suffix of the k-mer. An Eulerian cycle in this graph (found using Hierholzer's algorithm or variants) yields the de Bruijn sequence by concatenating the node labels along the path, effectively unwinding the cyclic overlaps into the linear representation. For instance, given the set of all eight binary triples {000, 001, 010, 101, 011, 111, 110, 100}, the corresponding de Bruijn graph for k=3 has nodes {00, 01, 10, 11} with edges forming a single cycle: 00 → 00 (via 000) → 01 (via 001) → 10 (via 010) → 01 (via 101) → 11 (via 011) → 11 (via 111) → 10 (via 110) → 00 (via 100), reconstructing the sequence 00010111 when starting from 00 and appending the last symbol of each edge. Advanced techniques leverage data structures like suffix arrays for efficient reconstruction and position decoding within the sequence. Suffix arrays, constructed in O(k^n) time for a de Bruijn sequence of length k^n, enable logarithmic-time queries to locate a given substring's starting position by binary search over sorted suffixes, aiding in verifying or refining the unwound cycle. Similarly, the inverse Burrows-Wheeler transform can unwind cyclic representations by iteratively sorting rotations and reconstructing the original string from the transform matrix, providing an alternative path to decode the sequence from a permuted overlap encoding. Challenges arise from ambiguities in non-unique paths, such as branches caused by repeated substrings or sequencing errors, which can produce multiple possible reconstructions. These are resolved using additional constraints, including k-mer coverage frequencies to prioritize high-confidence edges or paired-end read information to enforce long-range consistencies. In , this decoding reconstructs contigs from graphs in de Bruijn-based assemblers, enabling scalable genome assembly despite noisy data.

Connections to Other Structures

De Bruijn sequences exhibit a close structural relationship with Lyndon words, which are primitive words that are minimal among their cyclic rotations. A fundamental connection arises from the Chen–Fox–Lyndon theorem, which states that every nonempty word over a totally ordered admits a unique into a non-increasing sequence of Lyndon words. This property enables the construction of the lexicographically smallest de Bruijn sequence of order n over an of size \sigma by concatenating, in , all Lyndon words whose lengths divide n. This method, known as the FKM construction, yields a linear-time for generating such sequences and provides a unique representation that leverages the primitive nature of Lyndon words to ensure every n- appears exactly once in the cyclic arrangement. Universal cycles represent a broader generalization of de Bruijn sequences, extending the concept of cyclic enumeration from all possible strings to arbitrary sets of combinatorial objects. While a de Bruijn sequence is precisely a universal cycle for the set of all length-n strings over a k-ary , universal cycles apply to more diverse structures, such as k-subsets of an n-set, permutations, or even graphs and binary matrices, where each object appears exactly once as a contiguous "window" in the . This framework preserves the minimality and overlap properties of de Bruijn sequences but allows for representations tailored to the specific constraints of the object set, such as avoiding certain patterns or satisfying sum conditions. For instance, universal cycles for strings summing to at least s or with limited cyclic descents can be constructed using greedy or necklace-based algorithms analogous to those for de Bruijn sequences. Kautz sequences serve as directed graph analogs to de Bruijn sequences, particularly for structures prohibiting self-loops. In a de Bruijn graph of order k-1 over an of size \sigma, vertices represent (k-1)-, and directed edges correspond to valid k- extensions, allowing repeated symbols and thus self-loops. Kautz sequences, by contrast, arise from the Kautz , which excludes self-loops by forbidding consecutive identical symbols, ensuring that every k- in the sequence has no repeated adjacent entries. A Kautz sequence of order k is thus a cyclic where every possible such restricted k- appears exactly once, with length \sigma(\sigma-1)^{k-1}, and it corresponds to a Hamiltonian cycle in the Kautz . This makes Kautz sequences valuable in contexts like network routing or where symbol repetition must be avoided, while maintaining the exhaustive coverage property of de Bruijn sequences. Linear feedback shift registers (LFSRs) provide a practical mechanism for generating sequences closely approximating de Bruijn properties through m-sequences. An LFSR of length n over the binary field, governed by a of degree n, produces a maximal-length sequence (m-sequence) of $2^n - 1, in which every nonzero n-bit appears exactly once as a , excluding only the all-zero . These m-sequences thus exhibit near-de Bruijn behavior, covering nearly all possible n-s in a just one shorter than a full de Bruijn of order n. To achieve exact de Bruijn sequences, techniques such as joining or nonlinear extensions to feedback shift registers (NFSRs) are employed, where short cycles from the LFSR are merged using special marker states, often yielding at least exponentially many distinct de Bruijn sequences. This connection underscores the role of LFSRs in efficient implementations for pseudorandom generation and testing. In a broader combinatorial context, de Bruijn sequences relate to and in via their shared emphasis on efficient traversals of discrete structures. A de Bruijn sequence of n corresponds to an Eulerian cycle in the B(2,n), where vertices are binary strings of n-1 and edges represent overlaps, analogous to how a reflected Gray code defines a or cycle in the n-dimensional hypercube Q_n, traversing all $2^n vertices by single-bit flips. This parallelism highlights de Bruijn sequences as a form of "overlapping Gray code" for higher-dimensional shifts, with applications in both error-correcting codes and sequential listing problems, though the graphs differ in connectivity—hypercubes are undirected and loop-free, while de Bruijn graphs are directed with overlaps.

References

  1. [1]
    de Bruijn Sequence -- from Wolfram MathWorld
    The shortest circular sequence of length sigma^n such that every string of length n on the alphabet a of size sigma occurs as a contiguous subrange of the ...Missing: definition | Show results with:definition
  2. [2]
    [PDF] De Bruijn Sequences
    A De Bruijn sequence of rank n on an alphabet of size k is a cyclic word in which each of the kn words of length n appears exactly once as we travel around the ...
  3. [3]
    [PDF] Lecture 21 1 De Bruijn Sequences
    Apr 28, 2011 · Definition 1 A binary De Bruijn Sequence of order n is a string of bits bi ∈ {0,1}, b = {b1, ..., b2n } such that ever string of lengh n, ...
  4. [4]
    [PDF] Acknowledgement of priority to C. Flye Sainte-Marie on the counting ...
    Jan 1, 1975 · Citation for published version (APA):. Bruijn, de, N. G. (1975). Acknowledgement of priority to C. Flye Sainte-Marie on the counting of circular.
  5. [5]
    Why are de Bruijn graphs useful for genome assembly? - PMC - NIH
    Origin of de Bruijn graphs. In 1946, the Dutch mathematician Nicolaas de Bruijn became interested in the Superstring Problem: find a shortest circular ...
  6. [6]
    [PDF] Generalizing the classic greedy and necklace constructions of de ...
    Martin proved that the algorithm always terminates with a sequence that has length kn + n − 1 and suffix kn, and that a de Bruijn sequence is obtained by ...<|control11|><|separator|>
  7. [7]
    A016031 - OEIS
    Aug 8, 2025 · De Bruijn's sequence: 2^(2^(n-1) - n): number of ways of arranging 2^n bits in circle so all 2^n consecutive strings of length n are distinct.
  8. [8]
  9. [9]
    [PDF] DeBruijn Cycles and Relatives
    There are 28 edges in this graph and each edge corresponds to a domino. An Eulerian cycle corresponds to a maximal domino game like that shown in Figure 7.1.
  10. [10]
    [PDF] A combinatorial problem - Pure
    A combinatorial problem. Proceedings of the Section of Sciences of the Koninklijke. Nederlandse Akademie van Wetenschappen te Amsterdam, 49(7), 758-764.
  11. [11]
    Shift register sequences : : Golomb, Solomon W. (Solomon Wolf)
    Sep 19, 2022 · Shift register sequences : ; Publication date: 1982 ; Topics: Coding theory, Switching theory, Sequential machine theory, Shift registers.Missing: de Bruijn 1960s
  12. [12]
    [PDF] Art of de Bruijn Sequences - The Bridges Archive
    In this paper, we discuss the basics of de Bruijn sequences and present some drawings and grid-based designs inspired by them. de Bruijn Basics. In his 1946 ...
  13. [13]
    [PDF] a simple combinatorial algorithm for de bruijn sequences - mimuw
    Although the existence of such sequences is not obvious, it is well known that they exist for all orders n and that the number of distinct sequences is 22k−1−k, ...
  14. [14]
    Combined Greedy Algorithm on Construction Binary De Bruijn ...
    For example, if n = 2, then there is exactly one binary de bruijn sequence of order two that is 0011. If n = 3, then there are two different sequences, 00010111 ...<|control11|><|separator|>
  15. [15]
    [PDF] Revisiting the Prefer-same and Prefer-opposite de Bruijn sequence ...
    The most well-known greedy de Bruijn sequence construction is the Prefer-one (or equivalently Prefer-zero) described as follows. Prefer-one algorithm [9]. 1.
  16. [16]
  17. [17]
    None
    ### Summary of Hierholzer's Algorithm for De Bruijn Sequences
  18. [18]
    De Bruijn sequence | Set 1 - GeeksforGeeks
    Jul 11, 2025 · // Function to find a de Bruijn sequence // of order n on k ... // Number of edges int l = pow(k, n); for (int i = 0; i < l; ++i) S += ...
  19. [19]
    An efficiently generated family of binary de Bruijn sequences
    A binary de Bruijn sequence of order n is a 2 n -periodic sequence in which each n -tuple occurs exactly once per period. There are ...Missing: citation | Show results with:citation
  20. [20]
  21. [21]
    Graph Universal Cycles: Compression and Connections to ... - arXiv
    Sep 28, 2022 · Universal cycles, such as De Bruijn cycles, are cyclic sequences of symbols that represent every combinatorial object from some family exactly ...
  22. [22]
    [PDF] De Bruijn cycles for covering codes - UCSD Math
    ABSTRACT: A de Bruijn covering code is a q-ary string S so that every q-ary string is at most R symbol changes from some n-word appearing consecutively in ...
  23. [23]
    None
    Nothing is retrieved...<|separator|>
  24. [24]
    Investigating the discrepancy property of de Bruijn sequences
    ### Summary of m-Sequences and de Bruijn Sequences from arXiv:2005.01638
  25. [25]
    On de Bruijn Covering Sequences and Arrays
    ### Summary of de Bruijn Sequences in Covering Codes from arXiv:2404.13674
  26. [26]
    Bit Twiddling Hacks - Stanford Computer Graphics Laboratory
    The constant 0x077CB531UL is a de Bruijn sequence, which produces a unique pattern of bits into the high 5 bits for each possible bit position that it is ...Missing: ternary | Show results with:ternary<|control11|><|separator|>
  27. [27]
    de Bruijn cycles for neural decoding - PMC - NIH
    De Bruijn sequences may be represented, and thus constructed, as the path through a de Bruijn graph: a network of kn connected nodes in which each node is one ...
  28. [28]
    Cracking pass codes with De Bruijn sequences
    Oct 22, 2019 · Using De Bruijn sequences cuts the amount of work necessary to perform a brute force attack by about n; it would be exactly n if it weren't for ...Missing: manual constructions enumeration
  29. [29]
    Aperiodic Linear Complexities of de Bruijn Sequences - SpringerLink
    Dec 1, 2000 · These properties suggest that de Bruijn sequences may be useful in stream ciphers. However, any binary sequence can be generated using a ...
  30. [30]
    [PDF] Cryptographically Strong de Bruijn Sequences with Large Periods
    A de. Bruijn sequence can be generated by an n-stage NLFSR and it has known randomness properties such as long period, balance, span n property [3,. 8, 9]. The ...<|control11|><|separator|>
  31. [31]
    Partitioning de Bruijn graphs into fixed-length cycles for robot ...
    Hamiltonian cycles in de Bruijn graphs are called de Bruijn sequences. The ... A proof of Golomb's conjecture for the de Bruijn graph. J. Combin. Theory ...
  32. [32]
    Optimizing the De-Bruijn Code of Rotary Optical Encoders ...
    The De-Bruijn sequences could provide appropriate circular arrangements to achieve a single-track absolute rotary optical encoder in the servo motor ...
  33. [33]
    Velvet: Algorithms for de novo short read assembly using de Bruijn ...
    Velvet is a set of algorithms using de Bruijn graphs for genomic sequence assembly, ideal for short read data, and can produce contigs of significant length.
  34. [34]
    algorithms for de novo short read assembly using de Bruijn graphs
    Velvet is a set of algorithms for genomic sequence assembly using de Bruijn graphs, ideal for short read data, producing contigs up to 50-kb N50.
  35. [35]
    efficient DNA coding by De Bruijn graph toward arbitrary local and ...
    Jul 29, 2024 · The De Bruijn graph has been demonstrated to be valuable in the field of molecular biology because of the connectedness between adjacent ...
  36. [36]
    Scalable design of orthogonal DNA barcode libraries - Nature
    Jun 7, 2024 · A method for constructing artificial DNA libraries based on generalized de bruijn sequences. ... A new algorithm for DNA sequence assembly.
  37. [37]
    An extended de Bruijn graph for feature engineering over biological ...
    Jul 19, 2024 · In this study, we introduce a novel de Bruijn graph (dBG) based framework for feature engineering in biological sequential data such as proteins.
  38. [38]
    Applications of de Bruijn graphs in microbiome research - PMC
    Mar 1, 2022 · De Bruijn graphs are used for assembling sequencing data, gene-targeted assembly, comparative genomics, and large-scale searches in microbiome ...
  39. [39]
    Complementary insights into gut viral genomes: a comparative ...
    Dec 20, 2024 · Metagenome-assembled viral genomes have significantly advanced the discovery and characterization of the human gut virome.
  40. [40]
    de Bruijn cycles for neural decoding - ScienceDirect.com
    Jun 1, 2011 · De Bruijn sequences are a cyclic ordering of k labels of order-n ... Efficient design of event-related fMRI experiments using M-sequences ...
  41. [41]
    [PDF] Multi de Bruijn Sequences
    ... f-fold de Bruijn sequence”; this is the same as a cyclic multi de Bruijn sequence but in different terminology and notation. The description below is in ...
  42. [42]
    Constructing Orthogonal de Bruijn Sequences - SpringerLink
    In this paper, we show how to construct large collections of orthogonal de Bruijn sequences. In particular, we prove that there are at least mutually- ...
  43. [43]
    (PDF) Constructing Orthogonal de Bruijn Sequences - ResearchGate
    Aug 7, 2025 · In this paper, we show how to construct large collections of orthogonal de Bruijn sequences. In particular, we prove that there are at least ⌊ σ ...
  44. [44]
    [2501.12921] Generalized Orthogonal de Bruijn and Kautz Sequences
    Jan 22, 2025 · Here we study three relevant practical generalizations of orthogonal de Bruijn sequences where we relax either the constraint that every (k+1)-sequence appears ...Missing: 2024 | Show results with:2024
  45. [45]
    [PDF] Orthogonal labelings in de Bruijn graphs - Luca Mariot
    Two Latin squares L1 and L2 of order N are orthogonal if their superposition yields all the pairs (x,y) ∈ [N]×[N]. What do we know so far?
  46. [46]
    An Eulerian path approach to DNA fragment assembly - PNAS
    This paper suggests an approach to the fragment assembly problem based on the notion of the de Bruijn graph. In an informal way, one can visualize the ...
  47. [47]
    Efficient Ranking of Lyndon Words and Decoding Lexicographically ...
    Efficient Ranking of Lyndon Words and Decoding Lexicographically Minimal de Bruijn Sequence ... We focus on two settings, where (1) the input array is available ...
  48. [48]
    [PDF] Burrows-Wheeler transformations and de Bruijn words - arXiv
    Jan 24, 2019 · The extended BW transform however does allow the inversion of an arbitrary word and the result in general is a multiset (a set allowing repeats).
  49. [49]
    [PDF] Efficient Ranking of Lyndon Words and Decoding Lexicographically ...
    Dec 11, 2023 · A de Bruijn sequence of rank n [10] is a cyclic sequence of length σn in which every possible word of length n occurs as a factor exactly once.
  50. [50]
    [PDF] more de bruijn sequences as concatenation of lyndon words
    We also show that an appropriate bijective image of the binary, prefer-opposite de Bruijn sequence is uniquely factored as a concatenation of Lyndon words. ...
  51. [51]
    Generalizing the Classic Greedy and Necklace Constructions of de ...
    Feb 5, 2016 · Examples include all strings (in which case universal cycles are commonly known as de Bruijn sequences), strings that sum to at least s s , ...
  52. [52]
    [PDF] De Bruijn Sequences from Symmetric Shift Registers
    Nov 6, 2015 · The efficiency of their algorithm depends on the length of the longest cycle in the base FSR. To find FSRs that contain only very short cycles, ...
  53. [53]
    [PDF] Combinatorial Gray codes—an updated survey
    Feb 2, 2022 · content that either transposes two symbols or permutes three symbols in each step; see. Figure 6 (g). ... De Bruijn sequence and universal cycle ...<|control11|><|separator|>