Fact-checked by Grok 2 weeks ago

De Bruijn graph

A De Bruijn graph is a directed graph in graph theory that represents overlaps between sequences of symbols from a finite alphabet of size k, typically defined for a given order n such that the vertices correspond to all possible strings (or words) of length n-1 over the alphabet, and each directed edge corresponds to a unique string of length n, connecting the prefix of length n-1 to the suffix of length n-1.^[1] This structure ensures that every vertex has exactly k incoming and k outgoing edges, making the graph Eulerian and balanced in degree.^[2] The concept originated independently in 1946 from the work of Dutch mathematician Nicolaas Govert de Bruijn and British statistician I. J. Good, who developed it in the context of combinatorial problems related to recurring decimals and shift register sequences, though de Bruijn's formulation focused on finding cyclic sequences containing all possible substrings of a given length.^[3]^[4] De Bruijn graphs generalize these ideas to model de Bruijn sequences, which are cyclic strings where every possible substring of length n appears exactly once, and an Eulerian cycle in the graph corresponds to such a sequence of minimal length k^n.^[5] Key properties of De Bruijn graphs include their strong connectivity for k \geq 2 and n \geq 1, the existence of Hamiltonian paths and cycles under certain conditions, and their utility in encoding information efficiently due to the graph's regularity.^[2] They have found extensive applications beyond combinatorics, notably in bioinformatics for de novo genome assembly, where edges represent overlapping k-mers from sequencing reads to reconstruct contigs via Eulerian paths; in network theory for designing fault-tolerant interconnection networks with high connectivity and diameter logarithmic in the number of nodes; and in computer science for modeling shift registers, data compression, and string matching algorithms.^[5]^[2] Modern variants, such as succinct or dynamic De Bruijn graphs, address scalability for large-scale genomic data analysis.^[6]

Definition and History

Formal Definition

A directed De Bruijn graph B(k, d), where k \geq 1 is an integer parameter representing the substring length and d \geq 2 is the size of a finite alphabet \Sigma with |\Sigma| = d, is a directed graph G = (V, E).^[3] The vertex set V consists of all possible strings of length k-1 over \Sigma, denoted V = \Sigma^{k-1}. Thus, the number of vertices is |V| = d^{k-1}. The edge set E corresponds to all possible strings of length k over \Sigma; for each such k-length string (or k-mer) w = w_1 w_2 \cdots w_k \in \Sigma^k, there is a directed edge from the source vertex u = w_1 w_2 \cdots w_{k-1} (the prefix of length k-1) to the target vertex v = w_2 w_3 \cdots w_k (the suffix of length k-1). Equivalently, E = \{ (u, v) \mid u, v \in \Sigma^{k-1} \text{ and } u_2 u_3 \cdots u_{k-1} = v_1 v_2 \cdots v_{k-2} \}. The number of edges is therefore |E| = d^k, with each vertex having out-degree and in-degree exactly d. An equivalent formulation uses vertices as all strings of length k over \Sigma, with a directed edge from u to v if the suffix of u of length k-1 equals the prefix of v of length k-1; this is commonly used in bioinformatics applications.^[7] For a concrete example, consider the binary De Bruijn graph B(2, 2) over \Sigma = \{0, 1\}. The vertices are the length-1 strings: V = \{ 0, 1 \}. The edges, corresponding to the length-2 strings, are $0 \to 0 (for 00), $0 \to 1 (for 01), $1 \to 0 (for 10), and $1 \to 1 (for 11).

Historical Development

The concept underlying De Bruijn graphs traces back to earlier work on cyclic arrangements and sequences. In 1894, French mathematician Camille Flye Sainte-Marie demonstrated the existence of certain binary cyclic sequences in his paper "Question 48" published in Revue de mathématiques spéciales, providing an implicit foundation for what would later be recognized as de Bruijn sequences of order 2, though without an explicit graph structure. De Bruijn later acknowledged Sainte-Marie's priority in this regard in a 1975 note.^[8] The explicit development of De Bruijn graphs emerged independently in 1946 amid post-World War II interest in combinatorial problems related to sequences and coding. Dutch mathematician Nicolaas Govert de Bruijn introduced a graph-based approach in his paper "A combinatorial problem," published in Proceedings of the Section of Sciences of the Koninklijke Nederlandse Akademie van Wetenschappen, volume 49, pages 758–764, to count the number of distinct periodic sequences over a finite alphabet and solve related enumeration issues for binary cases.^[3] Concurrently, British statistician Irving John Good, working on the "teleprinter problem" during wartime codebreaking efforts, described a similar graph construction in his 1946 paper "Normal recurring decimals," published in Journal of the London Mathematical Society, series 1, volume 21, pages 167–169, to analyze recurring decimal expansions and normal numbers in base 10, effectively modeling overlaps in sequences.^[9] During the 1950s and 1960s, these ideas gained traction in combinatorics and information theory, particularly through studies of shift-register sequences and Eulerian paths, as explored in Solomon Golomb's 1967 book Shift Register Sequences, which highlighted their utility in pseudorandom number generation and coding theory without yet standardizing the graph terminology. The formal naming of the structure as the "de Bruijn graph" occurred in the 1970s, reflecting de Bruijn's foundational contributions, with key algorithmic advancements by Harold Fredricksen in his 1975 paper "A class of nonlinear de Bruijn cycles," published in the Journal of Combinatorial Theory, Series A, which introduced methods for constructing de Bruijn sequences for general alphabets using efficient cycle generation.^[10] A significant milestone came in the late 1980s with the integration of de Bruijn graphs into bioinformatics, pioneered by Pavel Pevzner in his 1989 paper "1-Tuple DNA sequencing: computer analysis," published in Journal of Biomolecular Structure & Dynamics, volume 7, pages 63–73, where they were adapted for sequencing by hybridization to reconstruct DNA from short oligonucleotide spectra.^[11]

Variants

Directed De Bruijn Graphs

In directed De Bruijn graphs, the directionality of edges is crucial for modeling overlaps in sequences, where each edge represents a left-shift operation on a k-mer. Specifically, for a k-mer over an alphabet of size d, the directed edge connects the prefix of length k-1 to the suffix of length k-1, effectively capturing the overlap after removing the first symbol and appending a new one. This structure, originally introduced to analyze recurring decimals and sequence overlaps, allows the graph to represent transitions in linear or cyclic sequences faithfully.^[5] A key property of the directed De Bruijn graph B(k, d) is that every vertex has both in-degree and out-degree exactly equal to d. This regularity arises because, for any (k-1)-mer as a vertex, there are precisely d possible symbols to append for an outgoing edge (forming a k-mer), and similarly d possible preceding symbols for incoming edges. This balanced degree structure ensures the graph is Eulerian under certain conditions, though the focus here is on the directed nature facilitating sequence reconstruction through path traversals.^[12]^[5] To illustrate, consider the binary directed De Bruijn graph B(3, 2), where vertices correspond to all possible 2-mers over {0,1}: 00, 01, 10, 11. The edges are defined by 3-mers, such as 000 directing from 00 to 00, 001 from 00 to 01, 010 from 01 to 10, 011 from 01 to 11, 100 from 10 to 00, 101 from 10 to 01, 110 from 11 to 10, and 111 from 11 to 11. This configuration visualizes how directionality encodes the shift: the edge for 000, for instance, indicates that after "00", appending 0 yields the overlapping suffix "00". The resulting graph consists of these four vertices with two outgoing edges each, forming a compact representation of all possible binary overlaps.^[5]^[12] Unique to the directed variant, the graph is generally cyclic, reflecting the repetitive and overlapping nature of sequences it models. This directionality makes directed De Bruijn graphs particularly suited for modeling shifts in symbolic sequences, such as in coding theory and data compression, where the oriented paths correspond to progressive symbol transitions.^[12]

Undirected and Generalized Variants

The undirected de Bruijn graph modifies the standard directed structure by treating edges as bidirectional, with vertices representing (k-1)-mers over an alphabet of size d and edges corresponding to k-mers where the direction of overlap is ignored. This results in a multigraph that may include loops (when the prefix and suffix (k-1)-mers are identical) and multiple edges between the same pair of vertices (when distinct k-mers connect the same unordered pair). Unlike the directed variant, which enforces a shift in one direction, the undirected version is suited for problems where sequence overlap symmetry matters, such as certain fault-tolerant network routings or undirected Eulerian path explorations in combinatorial designs.^[13] For example, consider the undirected de Bruijn graph B(2,2) over the binary alphabet {0,1} with k=2. The vertices are the 1-mers: 0 and 1. The k-mers are 00 (connecting {0,0}, a loop at 0), 01 (connecting {0,1}), 10 (connecting {1,0}, same as {0,1}), and 11 (connecting {1,1}, a loop at 1). This yields a multigraph with a loop at each vertex and two parallel edges between 0 and 1. In general, such graphs have exactly d^k edges counting multiplicity, making them useful in undirected sequence reconstruction tasks.^[13] Generalized variants extend the de Bruijn graph framework beyond uniform directions and binary alphabets. Weighted de Bruijn graphs assign frequencies or abundances to edges, reflecting the multiplicity of k-mers in input data, such as read coverage in probabilistic sequence models or genome assembly. This weighting enables handling of coverage variations and errors, as each edge's weight represents how often the corresponding k-mer appears, facilitating algorithms that prioritize high-confidence paths. For instance, in DNA assembly, edge weights correspond to k-mer counts from sequencing reads, allowing Eulerian path traversals to account for abundance.^[14] Higher-order variants increase the parameter k to model longer n-grams, capturing dependencies in sequences like linguistic or genomic data where substrings exceed fixed short lengths. These are essentially de Bruijn graphs with larger k, supporting n-gram analysis in higher-order Markov chains by representing overlaps of arbitrary length. Multi-dimensional generalizations adapt the structure to higher dimensions, such as 2D de Bruijn tori for image processing or toroidal networks, where vertices are tuples in multiple dimensions and edges reflect shifts in each, extending applications to spatial or multi-sequence data.^[15]

Properties

Structural Properties

The directed De Bruijn graph B(k, n), where k is the alphabet size and n is the order, is a k-regular digraph. Every vertex in B(k, n) has in-degree and out-degree exactly k, as each vertex corresponds to a unique string of length n-1 over an alphabet of k symbols, and outgoing edges are formed by appending any one of the k symbols (shifting the overlap), while incoming edges arise analogously from prepending symbols.^[2]^[12] This balanced degree structure ensures the graph is Eulerian, though the focus here is on its topological regularity.^[16] The graph B(k, n) is strongly connected for k \geq 2 and n \geq 1, meaning there exists a directed path between any pair of distinct vertices.^[2] This property stems from the comprehensive overlap representation, allowing traversal across all possible (n-1)-mers via extensions and shifts. The vertex connectivity of B(k, n) is k-1, indicating robustness to vertex failures up to that threshold while preserving strong connectivity.^[17] Due to the k-regularity, the total number of edges balances precisely with k times the number of vertices: |E| = k \cdot |V| = k^n, since |V| = k^{n-1}.^[2] A defining recursive feature is that B(k, n) is the line digraph of B(k, n-1); specifically, the vertices of B(k, n) correspond to the edges of B(k, n-1), and two such vertices are adjacent in B(k, n) if the corresponding edges in B(k, n-1) share a common vertex (via the n-2 overlap).^[2] This iterative construction highlights the hierarchical overlap structure inherent to De Bruijn graphs. For a concrete example, B(k, 1) consists of k vertices, each representing a single alphabet symbol, with a directed edge from every vertex i to every vertex j (including self-loops when i = j), forming a complete digraph augmented with self-loops.^[2]

Combinatorial Properties

De Bruijn graphs exhibit several key combinatorial properties related to their cycle structures and embeddings. Every De Bruijn graph B(k,n), defined over an alphabet of size k and substrings of length n, is Eulerian. This follows from the graph being strongly connected and balanced, with every vertex having equal in-degree and out-degree k. Consequently, B(k,n) admits an Eulerian circuit that traverses each of its k^n edges exactly once.^[18] Likewise, every De Bruijn graph B(k,n) is Hamiltonian, possessing a cycle that visits each of its k^{n-1} vertices exactly once. For the binary case (k=2), this was established by de Bruijn using a recursive construction based on T-nets, and the result extends to general k via analogous inductive arguments.^[18] The number of distinct de Bruijn sequences B(k,n) is \frac{(k!)^{k^{n-1}}}{k^n}. This count arises from the BEST theorem applied to the number of Eulerian circuits, adjusted for the cyclic nature of the sequences (each sequence corresponds to k^n Eulerian circuits due to rotational symmetry). For de Bruijn graphs, the number of Eulerian circuits is (k!)^{k^{n-1}}, reflecting the choices for ordering outgoing edges at each vertex while accounting for the graph's symmetric structure. Each de Bruijn sequence corresponds to an Eulerian cycle in the graph when the edge labels are read cyclically. For the binary case, the number of sequences simplifies to $2^{2^{n-1} - n}.^[19]^[20] Binary de Bruijn graphs (k=2) also have queue number 2, admitting a queue layout where vertices are linearly ordered and edges partitioned into two queues without nested or crossing conflicts in the same queue. This property facilitates efficient parallel computation models and layout algorithms.^[21] Moreover, binary de Bruijn graphs have book thickness at most 5, allowing an embedding into a book with five half-planes (pages) such that no two edges on the same page cross when vertices are placed along the spine. This bound is achieved through recursive embedding techniques for the graph's hierarchical structure.^[22]

Connection to De Bruijn Sequences

Eulerian Paths and Cycles

In de Bruijn graphs, an Eulerian path is a trail that visits every edge exactly once, and the labels of these edges correspond to all possible n-mers of length n over an alphabet of size k, thereby generating a sequence that overlaps to cover every substring exactly once.^[23] This direct mapping establishes the theoretical foundation linking graph traversal to sequence generation, where the path's edge sequence forms the de Bruijn sequence by concatenating the symbols while accounting for the (n-1)-length overlap between consecutive edges.^[24] An Eulerian cycle, which is a closed Eulerian path returning to the starting vertex, produces a cyclic de Bruijn sequence of length exactly k^n, ensuring that all possible substrings of length n appear precisely once when the sequence is considered circularly.^[3] This cyclic property arises because the cycle's edge labels wrap around, with the final edge's symbols overlapping the initial ones to complete the coverage without redundancy.^[24] The existence of an Eulerian cycle in a de Bruijn graph B(k, n) is guaranteed by Euler's theorem for directed graphs, which states that a strongly connected digraph with equal in-degree and out-degree at every vertex admits such a cycle; in de Bruijn graphs, every vertex has in-degree and out-degree equal to k due to the complete set of outgoing and incoming edges labeled by the alphabet symbols.^[25] This balanced degree condition holds inherently for the standard directed de Bruijn graph, ensuring traversability without isolated components or imbalances.^[23] For a concrete illustration, consider the binary de Bruijn graph B(2,2), with vertices labeled 0 and 1, and edges 00 (0→0), 01 (0→1), 10 (1→0), and 11 (1→1). An Eulerian cycle such as 0 \xrightarrow{00} 0 \xrightarrow{01} 1 \xrightarrow{11} 1 \xrightarrow{10} 0 yields the edge-label sequence 00110, which cyclically contains all 2-mers: 00, 01, 11, 10 (with the wrap-around providing the closing overlap).^[24]

Construction Methods

The construction of a directed De Bruijn graph B(k, n) over an alphabet \Sigma of size k proceeds by defining the vertex set V as all possible strings of length n-1 from \Sigma, known as (n-1)-mers, resulting in |V| = k^{n-1} vertices. For each possible n-mer s = s_1 s_2 \dots s_n \in \Sigma^n, a directed edge is added from the vertex representing the prefix s_1 s_2 \dots s_{n-1} to the vertex representing the suffix s_2 s_3 \dots s_n, with the edge labeled by s or simply by the last symbol s_n. This ensures that every n-mer corresponds uniquely to an edge, and the graph is balanced with in-degree and out-degree k at each vertex. To derive a de Bruijn sequence from the graph, an Eulerian cycle that traverses each edge exactly once must be found, as such a cycle encodes a cyclic sequence containing every n-mer precisely once. An adaptation of Hierholzer's algorithm accomplishes this by initiating a tour from any vertex, recursively traversing unused edges to form sub-cycles, and splicing these sub-cycles into the main tour at appropriate vertices until all edges are exhausted. This process leverages the graph's Eulerian property—arising from equal in- and out-degrees—and runs in linear time relative to the number of edges.^[26] For binary De Bruijn sequences (k=2), the prefer-one method offers a graph-free greedy construction that implicitly follows a specific Eulerian path in the De Bruijn graph. Starting with a seed string of length n-1 (typically all zeros or ones), the algorithm appends a '1' if it does not produce a duplicate n-mer substring already present, otherwise appending '0'; this preference for '1' ensures the traversal avoids closing cycles prematurely and covers all $2^n binary n-mers exactly once. The resulting sequence can be generated in O(2^n) time using O(n) space, and it corresponds to the lexicographically largest binary de Bruijn sequence.^[27] The explicit construction of the full De Bruijn graph enumerates all k^{n-1} vertices and k^n edges, incurring O(k^n) time and space complexity, which limits its practicality to small n (e.g., n \leq 20 for k=4 in DNA applications). For larger n, implicit or compacted representations are employed to mitigate exponential growth, though these fall outside standard explicit methods.

Applications

Bioinformatics and Sequence Assembly

De Bruijn graphs were introduced to bioinformatics for DNA sequence assembly by Idury and Waterman in 1995, who proposed an algorithm that models shotgun sequencing data as paths in a directed graph where nodes represent (k-1)-mers and edges correspond to k-mers observed in the fragments. This approach transforms the fragment assembly problem into finding an Eulerian path that reconstructs the original sequence by traversing each edge exactly once, leveraging the combinatorial structure to handle overlapping subsequences efficiently.^[28] In de novo genome assembly, short sequencing reads are decomposed into k-mers, which serve as edges in the De Bruijn graph, with nodes representing the prefixes and suffixes of these k-mers. Connected components in the graph correspond to contigs—continuous sequences assembled from the reads—while the overall Eulerian path through the graph aims to reconstruct the full genome by linking these contigs. This method excels in high-coverage datasets, as multiple traversals of the same k-mer edges reinforce the path, but it requires careful preprocessing to address sequencing errors and biological repeats. For instance, low-coverage erroneous k-mers manifest as short "tips" or dead-end paths in the graph, which can be trimmed during error correction to simplify the structure without losing true sequence information.^[5] Repetitive regions in the genome create branches or cycles in the De Bruijn graph, where multiple edges emanate from a node due to identical k-mers appearing in different contexts, complicating path reconstruction. Assemblers mitigate this by resolving "bulges" (short branches from errors or variants) and using paired-end read information to bridge ambiguous junctions, though long perfect repeats longer than the read length remain unresolved. Tools like Velvet (Zerbino and Birney, 2008) employ iterative graph simplification, including tip removal and bubble popping, to handle errors and repeats, producing high-quality contigs from short reads. Similarly, ABySS (Simpson et al., 2009) distributes the De Bruijn graph across parallel processors for large-scale assemblies, enabling efficient handling of repetitive bacterial and eukaryotic genomes by varying k-mer sizes to balance resolution of repeats against computational demands.^[29]^[30] The choice of k-mer length is critical, as smaller k values increase graph connectivity and tolerance to sequencing errors but amplify branching from repeats, while larger k values enhance specificity for unique regions at the cost of requiring deeper coverage to ensure all k-mers are sampled. Optimal k is often selected empirically or via automated methods that assess coverage statistics and assembly metrics, striking a balance between contig length, accuracy, and computational resources.^[31]^[5] Recent advances have extended De Bruijn graphs to long-read sequencing technologies, such as Pacific Biosciences and Oxford Nanopore, where high-error long reads are handled via multiplex or colored De Bruijn graphs that incorporate read colors or error profiles to improve accuracy. For example, the Verkko assembler (2022) combines De Bruijn graphs with overlap graphs for hybrid assembly of high-fidelity long reads, achieving complete bacterial and eukaryotic genomes. Additionally, machine learning approaches, like GnnDebugger (2024), use graph neural networks to detect and correct errors in De Bruijn graphs, enhancing assembly quality for complex metagenomes and polyploid genomes. These developments address scalability for large-scale genomic data, including pangenome construction, where De Bruijn graphs facilitate variation-aware alignments as of 2025.^[32]^[33]^[34]

Computer Science and Networks

In distributed systems, De Bruijn graphs provide an efficient topology for peer-to-peer networks, particularly in distributed hash tables (DHTs) where low-diameter routing is essential for scalability. The Koorde DHT, introduced by Kaashoek and Karger, leverages the De Bruijn graph structure to achieve a constant node degree while maintaining a routing diameter of O(\log n) for n nodes, enabling logarithmic-time lookups with minimal overhead compared to other DHTs like Chord. This design inherits Chord's simplicity but optimizes for degree, using de Bruijn shifts to connect nodes in a way that supports fault-tolerant and load-balanced routing.^[35] Binary De Bruijn graphs are fundamental in modeling linear feedback shift registers (LFSRs), which serve as compact hardware for pseudorandom number generation in applications such as cryptography, error-correcting codes, and simulation. In an LFSR of length n over a binary alphabet, the states correspond to nodes in the binary de Bruijn graph of order n, with feedback connections represented as edges that cycle through $2^n possible states to produce sequences exhibiting ideal autocorrelation properties for randomness testing. Constructions from LFSRs with arbitrary characteristic polynomials allow the generation of full-period de Bruijn sequences by joining cycles in the graph, ensuring maximal length without repetition. This modeling facilitates analysis of sequence periodicity and predictability, with primitive polynomials yielding m-sequences that approximate de Bruijn sequences closely. De Bruijn graphs also underpin interconnect topologies in parallel computing architectures, where their fixed degree and logarithmic diameter minimize latency in message passing among processors. In binary de Bruijn networks for massively parallel systems, the diameter is \log_2 N for N nodes, allowing any processor to reach any other via at most \log_2 N hops, with each node maintaining O(1) connections for simplicity and scalability. These networks exhibit high fault tolerance, as multiple disjoint paths exist between nodes, and have been proposed for optical implementations to support high-bandwidth communication in multiprocessor environments. Products of de Bruijn graphs with other fixed-degree networks further enhance embeddability and dilation properties, making them suitable for embedding complex algorithms with low communication costs.^[36] In algorithmic applications, De Bruijn graphs enable efficient processing of n-gram overlaps for tasks like spell-checking, where nodes represent word fragments and edges model valid transitions to identify and correct errors by finding probable paths. Routing tables in De Bruijn-based DHTs and interconnects maintain a constant size of O(k) entries per node in k-ary graphs with diameter d = \log_k n, supporting greedy forwarding without exponential storage growth.^[37]^[38]

Dynamical Systems and Symbolic Dynamics

In symbolic dynamics, De Bruijn graphs provide a graphical model for shift spaces, particularly shifts of finite type. The binary De Bruijn graph B(2,2), with vertices representing binary strings of length 2 and edges corresponding to overlapping strings of length 3, models the full shift on two symbols \{0,1\}. Infinite paths in this graph encode bi-infinite sequences in the shift space, where each edge traversal appends a new symbol while respecting the overlap structure, thus capturing all possible admissible sequences without forbidden blocks.^[39] This modeling connects directly to classical ergodic transformations, such as the Bernoulli map defined by the doubling transformation T(x) = 2x \mod 1 on the unit interval [0,1). The itinerary of a point x under T with respect to the partition \{ [0, 1/2), [1/2, 1) \} generates a binary sequence indicating which subinterval contains each iterate T^n(x), and these itineraries correspond precisely to paths in the De Bruijn graph, where edges represent allowed transitions between partition sets. The full shift induced by these itineraries is measure-theoretically isomorphic to the dynamics of T with respect to Lebesgue measure, highlighting the graph's role in conjugating continuous dynamics to discrete symbolic representations. The full shift on the De Bruijn graph is ergodic with respect to the uniform Bernoulli measure (product measure with equal probabilities $1/2 on each symbol), meaning that almost every sequence is dense in the shift space and invariant sets have measure 0 or 1. This ergodicity follows from the graph's irreducible and aperiodic transition matrix, ensuring the existence of a unique invariant probability measure of maximal entropy, which the graph structure preserves through its balanced in- and out-degrees. The graph thus facilitates the analysis of invariant measures and mixing properties in symbolic systems. De Bruijn graphs also exhibit a structural resemblance to the Lorenz attractor when embedded in three dimensions, where the graph's looping paths mimic the attractor's butterfly-shaped lobes and branching trajectories. Furthermore, they are employed in studying m-adic transformations, generalizations of the doubling map to base m, by discretizing the dynamics into symbolic sequences via the graph's edge shifts, enabling the generation of full-length periodic orbits and analysis of their properties.

References

[1]
[PDF] Lecture 21 1 De Bruijn Sequences
Apr 28, 2011 · Definition 3 A De Bruijn graph is a directed graph. DBn = ({0,1}n−1,{0,1}n), that is, a graph with vertex set {0,1}n−1 and edge set {0,1}n ...
[2]
[PDF] De Bruijn graphs and their applications to fault tolerant networks
A De Bruijn graph is a directed graph with dn nodes labeled by n-tuples over a d-character alphabet (denoted by juxtaposition). The edges are defined to be ...<|control11|><|separator|>
[3]
[PDF] A combinatorial problem - Pure
A combinatorial problem. Proceedings of the Section of Sciences of the Koninklijke. Nederlandse Akademie van Wetenschappen te Amsterdam, 49(7), 758-764.
[4]
NORMAL RECURRING DECIMALS I. J. GodD*. Suppose that in the ...
NORMAL RECURRING DECIMALS. 107. NORMAL RECURRING DECIMALS. I. J. GodD*. Suppose that in the decimal representation of a particular number between 0 and 1 a ...
[5]
Why are de Bruijn graphs useful for genome assembly? - PMC - NIH
Edges of the de Bruijn graph represent all possible k-mers, and thus an Eulerian cycle in B represents a shortest (cyclic) superstring that contains each k-mer ...
[6]
Succinct dynamic de Bruijn graphs | Bioinformatics - Oxford Academic
The de Bruijn graph is one of the fundamental data structures for analysis of high throughput sequencing data. In order to be applicable to population-scale ...
[7]
https://pmc.ncbi.nlm.nih.gov/articles/PMC5206522/
[8]
[PDF] Acknowledgement of priority to C. Flye Sainte-Marie on the counting ...
Jan 1, 1975 · Citation for published version (APA):. Bruijn, de, N. G. (1975). Acknowledgement of priority to C. Flye Sainte-Marie on the counting of circular.
[9]
[PDF] IJ Good's Shorter Publications List
Jun 26, 2003 · P. 7. “Normal recurring decimals”, JLMS 21 (1946), 167-169. (The 'teleprinter problem': solved in. 1941 but not ...
[10]
A new look at the de Bruijn graph - ScienceDirect
The Good-de Bruijn graph was originally defined to settle a question of existence of a certain shift register sequence, namely a binary cycle of length 2n ...
[11]
[PDF] A Comprehensive Review of the de Bruijn Graph and Its ...
The de Bruijn graph was proposed by de Bruijn and Good in 1946. It was initially employed in binary sequence research but has been widely adopted across ...
[12]
[PDF] An Unoriented Variation on de Bruijn Sequences - arXiv
Aug 30, 2016 · Our definition of an unoriented de Bruijn graph differs from that of the undirected de Bruijn graph defined by Esfahanian and Hakimi [4] and ...
[13]
an efficient and near-exact representation of the weighted de Bruijn ...
Jul 12, 2017 · Each k-mer has a k–1-base overlap with adjacent k-mers in the sequence. Pellow et al. (2016) use this redundancy to detect false positives in a ...Missing: acyclic | Show results with:acyclic
[14]
[PDF] De Bruijn Graph assembly
De Bruijn graph is a directed multigraph. Page 10. Eulerian walk definitions and statements.Missing: theory | Show results with:theory
[15]
[PDF] Graph-Theoretic Analysis of de Bruijn Graphs: Fault Resilience and ...
Jan 1, 2024 · De Bruijn graphs, characterized by regularity, Eulerian, and Hamiltonian properties, boast a low diameter in close proximity to optimal ...
[16]
On the connectivity of the De Bruijn graph - ScienceDirect.com
We show that the connectivity of the d-ary De Bruijn graph is at least d − 1. This bound is tight since the outdegree of this graph with self-loops deleted ...Missing: regularity | Show results with:regularity
[17]
[PDF] Applications De Bruijn Graphs of Hamiltonian and Eulerian Cycles ...
An undirected De Bruijn graph is a De Bruijn graph modified so that: 1) All edges which are self loops are removed. 2) If aaaa. ⃗ is an edge in B(d, n), ...
[18]
[PDF] Words and Automata, Lecture 2
The number of de Bruijn cycles of order n on an alphabet with k letters is k−n(k!)kn−1 . In particular, for k = 2, there are 22n−1−n de Bruijn cycles of order.
[19]
[PDF] DeBruijn Cycles and Relatives
Along the way we will discover a nice formula for the number of Eulerian cycles and a CAT algorithm for generating all Eulerian cycles of G. If G is a ...
[20]
Laying Out Graphs Using Queues | SIAM Journal on Computing
The problem of laying out the edges of a graph using queues is studied. In a k-queue layout, vertices of the graph are placed in some linear order.
[21]
Embedding de Bruijn and Shuffle-Exchange Graphs in Five Pages
Algorithms for embedding de Bruijn and shuffle-exchange graphs in books of five pages, with cumulative pagewidth $( 5/3 )2^n - ( 2/3 ) - ( 8/3 ) ( n\bmod 2 ) ...
[22]
[PDF] arXiv:cs/0505036v1 [cs.DM] 12 May 2005
This property explains the relation between de Bruijn graphs and de Bruijn sequence: BD,n+1 is the label of an Eulerian trail of GD,n. Therefore, given a ...
[23]
[PDF] De Bruijn Sequences
Sep 28, 2006 · Since the De Bruijn graph is strongly connected and every vertex has in-degree equal to its out-degree, Theorem 6.54 tells us that the graph ...
[24]
[PDF] Making de Bruijn Graphs Eulerian - Hal-Inria
Oct 28, 2022 · Euler's theorem tells us that a weakly connected directed multigraph is Eulerian if and only if every node is balanced. Given a collection S of ...
[25]
https://inria.hal.science/hal-03832887/document
[26]
https://doi.org/10.1016/j.disc.2018.07.010
[27]
An Eulerian path approach to DNA fragment assembly - PNAS
Pevzner (10) proposed a different approach that reduces SBH to an easy-to-solve Eulerian Path Problem in the de Bruijn graph. Because the Eulerian path approach ...
[28]
Velvet: Algorithms for de novo short read assembly using de Bruijn ...
In the de Bruijn graph, each node N represents a series of overlapping k-mers (cf. Fig. 1 for a small example). Adjacent k-mers overlap by k − 1 nucleotides.
[29]
ABySS: A parallel assembler for short read sequence data - PMC - NIH
The primary innovation in ABySS is a distributed representation of a de Bruijn graph, which allows parallel computation of the assembly algorithm across a ...
[30]
Informed and automated k-mer size selection for genome assembly
The de Bruijn graph is constructed with nodes being the (k – 1)-mers and the edges being the k-mers present in the reads. Broadly speaking, an assembler ...
[31]
Koorde: A Simple Degree-Optimal Distributed Hash Table
Koorde is a distributed hash table (DHT) based on Chord and de Bruijn graphs, inheriting Chord's simplicity.Missing: diameter | Show results with:diameter
[32]
[PDF] Products of networks with logarithmic diameter and fixed degree
Bruijn graph is shown to require a logarithmic dilation cost, ... The next two results show that PD,(N) is more powerful than the N'-node de Bruijn graph.
[33]
https://www.biorxiv.org/content/10.1101/2024.05.07.592713v1
[34]
[PDF] A Practical Distributed Hashtable Based on the De-Bruijn Topology
Apr 8, 2010 · Similarly to other De Bruijn based hashtables it uses a onstant size O(k) routing table instead of O(k log N) (where N is the number of nodes) ...
[35]
[PDF] Introduction to Symbolic Dynamics - Part 2: Shifts of finite type
Apr 14, 2010 · Consider the de Bruijn graph of order M on X: V(G) = BM(X). E(G) = BM+1(X) with i(e) = e[1,M] and t(e) ...