Euler tour technique
The Euler tour technique (ETT) is a fundamental method in computer science for linearly representing rooted trees to enable efficient computation of subtree properties and other tree-related functions. It constructs a sequence by performing a depth-first traversal of the tree, recording each visit to a node upon entry and exit (and sometimes backtracks), effectively traversing each edge twice—once downward and once upward—yielding a tour of length $2n-1 for a tree with n nodes. In this representation, the positions corresponding to any subtree form a contiguous interval in the sequence, allowing complex tree operations to be reduced to simple range queries or updates on a linear array.[1][2]
Originally developed for parallel computing models, the technique was introduced by Robert Tarjan and Uzi Vishkin in 1984 as part of an algorithm for finding biconnected components in graphs and computing various tree functions, such as node depths and subtree sizes, in logarithmic parallel time on the PRAM model. By converting tree computations into associative operations (e.g., prefix sums with +1 for entries and -1 for exits), ETT facilitates parallel evaluation using scan primitives, achieving optimal work-efficient algorithms for problems like rerooting trees or enumerating traversal orders (preorder, inorder, postorder). The method draws from Eulerian circuits in directed graphs obtained by duplicating tree edges bidirectionally.[3][4]
Beyond parallel algorithms, ETT has influenced sequential data structures and graph processing, such as enabling lowest common ancestor queries via range minimum queries on the tour or supporting dynamic tree updates in external-memory settings. It remains a building block in advanced techniques like link-cut trees and has been adapted for GPU-accelerated graph analytics, where the linear representation aids in parallel subgraph computations like finding bridges or connectivity.[5][4]
Introduction
Definition and Purpose
The Euler tour technique (ETT) represents a rooted tree as a linear sequence obtained by performing a depth-first traversal in which each edge is traversed twice—once downward from parent to child and once upward from child to parent—forming a closed tour that starts and ends at the root. This process records the nodes visited at each step, resulting in a sequence where each node appears multiple times: specifically, the root appears once for each of its subtrees plus one initial visit, while leaf nodes appear only once, and internal nodes appear k+1 times for k children. The technique assumes familiarity with basic tree structures, including nodes, directed edges from parent to children, and a designated root, but focuses on this specific traversal to linearize the hierarchy without cycles.[6]
The primary purpose of the ETT is to flatten the tree's hierarchical structure into a one-dimensional array-like sequence, enabling efficient linear-time processing of static tree queries through operations such as prefix sums, range minimum queries, or segment tree constructions on the tour. By mapping subtrees to contiguous intervals in this sequence, the technique simplifies computations that would otherwise require recursive traversal, achieving O(n) time complexity for many problems where n is the number of nodes. It was originally motivated by the need for parallel algorithms in the PRAM model, allowing tree functions like preorder numbering or subtree aggregation to be computed in O(log n) time using O(n/log n) processors.[6]
For illustration, consider a simple rooted tree with root A connected to leaf children B and C. The depth-first traversal begins at A, descends to B and returns, then descends to C and returns, producing the tour sequence A-B-A-C-A. In this sequence, the subtree rooted at B corresponds to the single position of B, while the entire tree spans the full tour, demonstrating how multiple visits to A delineate subtree boundaries.
Historical Background
The Euler tour technique was first introduced by Robert Tarjan and Uzi Vishkin in 1984 in their seminal work on parallel graph algorithms. In the paper "An Efficient Parallel Biconnectivity Algorithm," they presented the technique as a method to linearize tree structures for efficient parallel processing in the PRAM model, achieving O(log n) time complexity using O(n / log n) processors for finding biconnected components in undirected graphs.[6]
The early development of the Euler tour technique was motivated by the need to solve fundamental tree-related problems in parallel computing environments, such as computing biconnected components and lowest common ancestors. Building on classical graph theory concepts like Eulerian paths, which traverse each edge exactly once, Tarjan and Vishkin adapted the idea to rooted trees by recording a depth-first traversal that visits each edge twice—once descending and once ascending—enabling parallel operations like prefix computations on the tour sequence.[6] This approach addressed limitations in sequential algorithms by facilitating concurrent evaluation of tree properties without excessive synchronization overhead in the PRAM model.[7]
In the 1990s, the technique saw significant extensions to handle dynamic tree structures, where trees undergo insertions and deletions. Dynamic Euler tour trees, which maintain a representation of the tour using balanced binary search trees, were introduced by Monika Henzinger and Valerie King in 1995 to support link-cut operations efficiently on evolving forests.[8] More recent advancements include batch-parallel variants developed by Thomas Tseng, Laxman Dhulipala, and Guy Blelloch in 2019, which adapt the Euler tour tree for multicore processors handling batches of updates in O(k log(1 + n/k)) work and O(log n) depth with high probability, bridging classical parallel models with modern hardware.[9]
The Euler tour technique has had lasting impact beyond theoretical parallel algorithms, influencing practical implementations in competitive programming and educational resources. For instance, it has been adopted in libraries and guides for efficient tree queries, such as subtree sizes and path aggregates, as seen in the U.S. Computing Olympiad (USACO) training materials, where it simplifies complex tree manipulations into array-based operations.[10]
Constructing the Euler Tour
Depth-First Search Procedure
The depth-first search (DFS) procedure forms the core of constructing an Euler tour for a tree by performing a recursive traversal that simulates walking along every edge twice—once downward to a child and once upward back to the parent. This process begins by selecting an arbitrary node as the root, which orients the tree for traversal without loss of generality, as the underlying undirected structure remains symmetric regardless of root choice, though the specific tour sequence depends on this selection. From the root, the algorithm visits each subtree recursively: it enters a node, explores all its children in sequence, and returns to the parent after each subtree, ensuring comprehensive coverage of the tree structure.
The recursive nature of DFS relies on the call stack to manage the traversal depth, where each recursive call corresponds to descending an edge, and each return simulates ascending it; the maximum stack depth equals the tree's height, which is O(n) in the worst case for a skewed tree but typically smaller for balanced ones. This procedure runs in O(n) time for a tree with n nodes, as the total work is proportional to the tour length of 2n - 1 node visits, with each edge traversal contributing constant-time operations.
The following pseudocode depicts the standard recursive DFS for generating the Euler tour sequence, where tour is a list that accumulates node identifiers:
def dfs(node, parent):
tour.append(node)
for child in adj[node]:
if child != parent:
dfs(child, node)
tour.append(node)
def dfs(node, parent):
tour.append(node)
for child in adj[node]:
if child != parent:
dfs(child, node)
tour.append(node)
In this implementation, the node is appended upon entry (first visit) and again after returning from each child (simulating the backtrack), including after the last child; for leaves, only the entry append occurs. This ensures internal nodes appear degree + 1 times in the rooted tree, and the tour ends at the root.
To illustrate, consider a linear chain tree with nodes A (root) connected to B, and B to C. The procedure starts at A (append A), recurses to B (append B), recurses to C (append C; C is a leaf, so return), appends B (back from C), returns to A, and appends A (back from B), yielding the tour [A, B, C, B, A]. This demonstrates the recursive calls mirroring the down-and-up traversals along the path.
Recording Entry and Exit Events
In the Euler tour technique, entry and exit events are recorded during a depth-first search (DFS) traversal of the rooted tree to construct the explicit tour sequence. An entry event is logged when a node is first visited, signaling the beginning of its subtree exploration, while an exit event is logged after processing all its children, marking the end of the subtree. This dual recording ensures that each edge in the tree is traversed twice—once downward and once upward—resulting in internal nodes appearing multiple times in the sequence, specifically one more time than their number of children (degree + 1 in the rooted tree).[6][11]
The sequence is built by appending the node's identifier to a list at each entry and exit event, producing a linear record of the visit order that forms an Eulerian path starting and ending at the root. For example, consider a simple rooted tree with root 1 having children 2 and 4, where 2 has child 3; the resulting tour sequence would be [1, 2, 3, 2, 1, 4, 1], capturing the recursive descent into subtrees and returns to parent nodes. In a balanced binary tree, this sequence similarly embeds the full hierarchical structure as a continuous path that revisits ancestors between subtree tours.[11][12]
This tour is typically represented as a linear array of node identifiers, enabling constant-time access by index for static queries. For scenarios requiring dynamic modifications, such as in advanced tree representations, the tour can alternatively be maintained as a doubly-linked list, allowing efficient splits and concatenations in constant time per operation. The total length of the tour is 2n - 1 entries for a tree with n nodes, accounting for the initial entry to the root and one entry/exit pair per edge (with 2(n-1) edge traversals plus the starting node).[12][13][11]
Variations of the recording include augmenting the sequence with edge labels between consecutive node entries to track traversals explicitly, or assigning incremental timestamps to events for ordering purposes, though the core method relies on node-only entries. The technique generally employs a symmetric tour, where each tree edge is traversed in both directions, distinguishing it from asymmetric variants that might omit return paths; however, the standard approach prioritizes the node-recording DFS tour for simplicity and efficiency.[12][6]
Building the tour array requires O(n) space for storage and O(n) time, achieved through a single DFS pass that appends entries in visit order.[6][11]
Fundamental Properties
Entry and Exit Times
In the Euler tour technique, timestamps are assigned to nodes during a depth-first search (DFS) traversal of the rooted tree. Specifically, upon first visiting a node u, an entry time e is recorded as the current timestamp, which increments sequentially starting from 1. The traversal then recursively visits all subtrees of u, and upon returning from the last subtree, an exit time x is recorded as the updated timestamp before backtracking to the parent. This process generates a total of $2n - 1 timestamps for a tree with n nodes, corresponding to the n-1 edges traversed twice each (down and up) plus the initial entry to the root.
The entry and exit times define contiguous intervals in the Euler tour that represent subtrees. For each node u, the interval [e, x] in the timestamp sequence encompasses all entry and exit events of u and its descendants, capturing the complete DFS exploration of the subtree rooted at u. This interval representation linearizes the hierarchical structure of the tree into a one-dimensional array, facilitating efficient queries on subtree properties.
The length of this interval relates directly to the subtree size. Let s(u) denote the number of nodes in the subtree rooted at u. Then,
x - e + 1 = 2 \cdot s(u) - 1,
since the interval includes one entry time for each of the s(u) nodes, one exit time for each of the s(u) - 1 non-root nodes in the subtree, and accounts for the two traversals per subtree edge plus the root's entry. This formula allows immediate computation of subtree sizes from the timestamps without additional traversal.
These intervals exhibit key structural properties: they are properly nested, meaning the interval of a node contains the intervals of its children without overlap between disjoint subtrees, reflecting the recursive nature of DFS. The root's interval [e, x] = [1, 2n-1] spans the entire tour. For example, consider a tree with root A having children B and C, where B and C are leaves; the Euler tour A-B-A-C-A yields e[A]=1, x[A]=5; e[B]=2, x[B]=2; e[C]=4, x[C]=4, with subtree intervals [1,5] for A, [2,2] for B, and [4,4] for C, satisfying the length formula for each. The segments between consecutive timestamps classify as advances (to children) or retreats (to parents), as detailed subsequently.
Advance, Retreat, and Tree Edges
In the Euler tour technique applied to trees, the traversal path is analyzed by classifying the transitions between consecutive positions in the tour sequence. These transitions correspond to directed movements along the tree's edges during the depth-first search. Specifically, an advance edge represents a downward movement from a parent to an unvisited child, increasing the depth by one level. Conversely, a retreat edge denotes an upward movement from a child back to its parent after the subtree has been fully explored, decreasing the depth by one level. Tree edges refer to the original undirected edges of the input tree, each of which is traversed twice in opposite directions during the tour—once as an advance and once as a retreat—except at the root where the tour begins and ends.[14]
The segments of the tour between consecutive entries (or visits) to nodes are categorized based on these movements. An advance segment occurs when moving to an unvisited child, initiating exploration of a new subtree. A retreat segment happens upon returning from a completed subtree to the parent node. For leaf nodes, which have no children, the subtree interval consists of a single position in the tour (where entry and exit coincide), with no advances or retreats within the subtree. These classifications facilitate efficient parallel computations on tree structures by mapping local movements to global tour properties.[14]
The Euler tour is inherently circular due to the bidirectional traversal of each tree edge, but it is typically linearized starting and ending at the root, effectively splitting the circuit at the root's exit event. Rerooting the tree to a new node v can be achieved in constant time by rotating the tour's starting position to the entry event e of v, which reorients the linear representation while preserving key properties such as subtree intervals and relative entry-exit timings. This operation maintains the integrity of the tour for queries without requiring reconstruction.[15]
Depths in the tree can be computed using the entry and exit times assigned during the tour construction. Alternatively, depths can be derived via prefix sums on depth changes along the tour sequence, where each advance contributes +1 and each retreat contributes -1. The cumulative sum up to any position in the tour yields the relative depth at that point, enabling batch computation in parallel models.
d = \sum_{j=1}^{i} \delta, \quad \delta =
\begin{cases}
+1 & \text{if position } j \text{ is an advance} \\
-1 & \text{if position } j \text{ is a retreat}
\end{cases}
The depth of a node v is then d[e], providing a direct link between tour positions and tree structure.[16][14]
Applications to Tree Problems
Subtree Size and Depth Queries
The Euler tour technique facilitates efficient computation of subtree sizes and node depths in static rooted trees by utilizing the entry and exit times assigned during the depth-first search traversal that constructs the tour. For a node u, let e denote its entry time (the position in the tour when u is first visited) and x its exit time (the position when the traversal leaves u after exploring its entire subtree). The segment of the tour from e to x inclusive captures exactly the Euler tour of the subtree rooted at u, which has length $2 \times s - 1, where s is the size of the subtree (number of nodes). Thus, the subtree size s is given by
s = \frac{x - e + 2}{2}.
This formula arises because the tour traverses each of the s-1 edges in the subtree twice (down and up), recording a node visit each time, plus one additional recording for the entry to u.[6]
To query the depth of a node, the technique employs prefix sums over depth increments along the tour. During tour construction, associate a delta value with each consecutive pair of positions: +1 for an advance edge (moving down to a child, increasing depth) and -1 for a retreat edge (moving up to a parent, decreasing depth). The depth of node u (with root depth 0) is the prefix sum of these deltas from the start of the tour up to position e. These prefix sums are computed in O(n) sequential time or O(\log n) parallel time on a PRAM using a parallel prefix sum algorithm, where n is the number of nodes.[6]
The overall preprocessing involves building the Euler tour in O(n) time via DFS and computing the necessary prefix sums in O(n) time (or O(\log n) parallel), enabling both subtree size and depth queries to be answered in O(1) time by direct array lookups post-preprocessing. This approach was originally motivated by the need for parallelizable tree function computations, such as those in biconnectivity algorithms, where prefix operations allow logarithmic-time parallel evaluation.[6]
For illustration, consider a rooted tree with node 1 as root, child 2 (a leaf), and child 3 with child 4 (a leaf). A possible Euler tour sequence (recording nodes on visits) is 1, 2, 1, 3, 4, 3, 1, with positions 1 to 7. For node 3, e{{grok:render&&&type=render_inline_citation&&&citation_id=3&&&citation_type=wikipedia}}=4, x{{grok:render&&&type=render_inline_citation&&&citation_id=3&&&citation_type=wikipedia}}=6, so subtree size = (6 - 4 + 2)/2 = 2 (nodes 3 and 4). The deltas are +1 (1→2), -1 (2→1), +1 (1→3), +1 (3→4), -1 (4→3), -1 (3→1). Prefix sums up to each position yield depths: at position 4 (e{{grok:render&&&type=render_inline_citation&&&citation_id=3&&&citation_type=wikipedia}}), the sum is 1, confirming depth 1 for node 3.[6]
Lowest Common Ancestor
The Euler tour technique provides an efficient method for computing the lowest common ancestor (LCA) of two nodes in a static rooted tree by reducing the problem to a range minimum query (RMQ) on the depths of nodes in the tour sequence.[17] In this approach, the Euler tour records the sequence of nodes visited during a depth-first traversal, capturing entry and exit events for each node, which results in a sequence of length 2n-1 for a tree with n nodes.[18] Alongside this tour, a depth array is constructed where each position stores the depth of the corresponding node in the sequence, and a first-occurrence array records the initial position of each node in the tour.[19] The LCA of two nodes u and v is then the node with the minimum depth in the subsegment of the tour between their first occurrences, as this segment encompasses the path from u to v via their common ancestor, and the deepest common ancestor appears as the shallowest point in that range.[17]
To compute the LCA(u, v), first identify the positions i and j as the first occurrences of u and v in the tour, assuming without loss of generality that i < j. The LCA is the node at the position k = argmin_{m \in [i, j]} depth in the depth array. This RMQ can be answered in constant time after appropriate preprocessing on the depth array. Preprocessing the Euler tour takes O(n) time, and standard RMQ structures such as sparse tables require O(n log n) time and space for O(1) queries, while more advanced methods using Cartesian trees achieve O(n) preprocessing for the same query time.[18][19]
An alternative perspective leverages the interval property of the Euler tour: the subtrees of u and v are represented by disjoint intervals in the tour, and their lowest common ancestor defines the minimal interval containing both, where the depth-minimizing node in the overlapping tour portion between these intervals identifies the LCA. This formulation aligns with the RMQ reduction but emphasizes the nested structure of tour intervals corresponding to subtrees.[17]
For illustration, consider a tree with root A having children B and C. The Euler tour might be A, B, A, C, A, with depths 0, 1, 0, 1, 0. The first occurrence of B is at position 1 and of C at position 3. The segment from 1 to 3 is B-A-C with depths 1, 0, 1, and the minimum depth occurs at position 2 (node A), which is the LCA.[18]
Euler Tour Trees
Overview and Representation
An Euler tour tree (ETT) is a dynamic data structure that represents a tree or forest by storing the sequence of an Euler tour in a balanced binary search tree (BST), such as a splay tree or AVL tree, where the keys are the positions in the tour corresponding to entry and exit times of nodes.[13][20] This allows for efficient splitting and concatenation of tour segments, enabling support for dynamic modifications like linking and cutting subtrees.[21]
In the representation, each node in the BST corresponds to a specific event in the Euler tour—either the entry or exit of a tree node—and is keyed by its timestamp in the tour sequence.[13] Auxiliary information, such as subtree sizes, depths, or aggregate values (e.g., sums or minima over subtrees), can be stored in the BST nodes to facilitate queries on subtrees or paths.[20] For forests, each connected component is maintained as a separate ETT, allowing independent operations on multiple trees.[13]
To build an ETT, the initial Euler tour is constructed using a depth-first search traversal of the tree, recording entry and exit events as in the static case, which produces a sequence of length 2n-1 for a tree with n nodes.[13] This sequence is then inserted into the balanced BST in order, achieving O(n log n) time complexity due to the logarithmic cost per insertion.[20]
Compared to static Euler tours, which are fixed sequences for querying subtree properties, ETTs enable dynamic updates by leveraging BST operations to split the tour at entry/exit points of a subtree and merge segments during links or cuts, thus maintaining the structure under modifications in amortized O(log n) time per operation.[21][20]
The concept of Euler tour trees was developed following the introduction of dynamic tree structures in the 1980s, with key contributions by Henzinger and King in 1995, who integrated ETTs into fully dynamic graph algorithms to achieve polylogarithmic time for connectivity queries and updates using randomized balanced BSTs.[21] Earlier work by Miltersen et al. in 1994 also explored the complexity of such representations for dynamic trees.[20]
Link-Cut Operations and Queries
Euler tour trees support a suite of dynamic operations on forests, including linking and cutting edges, as well as queries on subtrees and paths, all leveraging the balanced binary search tree (BST) representation of Euler tours. These operations enable the maintenance of dynamic trees where edges can be added or removed, while preserving the ability to query aggregates and structural properties efficiently. The underlying BST, often implemented as a splay tree, ensures amortized logarithmic time by dynamically adjusting the tree structure after accesses.
The link operation adds an edge between two vertices u and v belonging to different trees in the forest, effectively merging the trees by making v a child of u. To perform the link, the Euler tour of v's tree is concatenated to the tour of u's tree at the attachment point following u's first visit in its tour; this is accomplished by splitting u's BST immediately after the node representing u's first occurrence and then concatenating v's BST between the split segments using two merge operations. Access to the roots is facilitated by following parent pointers or splaying relevant nodes, with the entire process taking O(\log n) amortized time due to the splay tree's self-adjusting properties.[22]
The cut operation removes the edge connecting a vertex u to its parent, splitting the tree into two separate components: one containing the subtree rooted at u and the other consisting of the remaining portion of the original tree. This is executed by splitting u's BST at the interval corresponding to u's subtree, specifically before the first and after the last occurrence of u in the tour, which separates the subtree's tour segment from the rest; the two resulting BST segments are then maintained as independent trees. The operation also updates parent pointers accordingly and runs in O(\log n) amortized time.[13]
Queries in Euler tour trees exploit the contiguous representation of subtrees and paths within the BST. For subtree aggregates, such as the sum of values in the subtree rooted at u, a range query is performed on the BST over the interval [e, x], where e and x denote the entry and exit times of u in the tour; the BST is augmented to store partial aggregates in internal nodes for efficient computation in O(\log n) time. The lowest common ancestor (LCA) of two nodes u and v can be found in O(\log n) time by performing a range minimum query on the depths over the interval in the tour between the first occurrences of u and v, after splitting the BST to isolate that range; the position with the minimum depth corresponds to the LCA.[13] While basic ETTs excel at subtree operations, efficient path queries often leverage extensions like link-cut trees, which use splay trees to represent paths directly. Path aggregates, such as sums along the path from u to v, can be computed using the LCA and aggregates from the root to u and v, assuming root-path queries are supported via similar range techniques or rerooting, in O(\log n) time.[13]
Maintenance of the structure following link and cut operations involves implicitly updating entry and exit times through the BST keys, which represent positions in the tour and adjust dynamically during splits and merges without explicit recomputation. Rerooting a tree at a new vertex r is achieved by splaying r to the root of its auxiliary splay tree and reversing the tour direction if necessary to reflect the new orientation, preserving the validity of subtree intervals in O(\log n) amortized time. All core operations—link, cut, and queries—achieve O(\log n) amortized time complexity, enabling applications in dynamic graph connectivity and minimum spanning tree maintenance.[22]
As an illustrative example, consider linking two single-node trees, each represented by a singleton Euler tour consisting of just the node itself. The link operation splits the tour of one node (trivially) and concatenates the other's tour, resulting in a combined tour such as u, v, u for the new tree with edge u-v, where the BST now holds this sequence and supports subsequent queries on the merged structure.