Segment tree
A segment tree is a balanced binary tree data structure for storing a collection of intervals (or segments) on the real line, supporting efficient range queries, such as stabbing queries that report all intervals containing a given point, as well as updates to the stored set. Its leaves correspond to the elementary intervals defined by the sorted endpoints of the input segments, and each internal node covers the union of its children's intervals; this canonical decomposition ensures that any input interval is stored in O(log n) nodes, where n is the number of segments. The structure supports O(n log n) preprocessing time for construction, O(log n + k) query time (where k is the output size), and O(log n) update time, making it a cornerstone of computational geometry for problems like orthogonal range searching and line segment intersection.
Invented by Jon L. Bentley in 1977 as part of solutions to Klee's rectangle problems, the segment tree builds on earlier ideas in multidimensional searching and was refined through the late 1970s and 1980s. In its original form, it is a one-dimensional structure over intervals with fixed endpoints, often augmented with secondary structures (e.g., for y-coordinates in 2D applications) to handle multidimensional queries; combined with fractional cascading, this reduces query times from O(log² n + k) to O(log n + k).
A closely related variant, commonly employed in algorithm design and competitive programming, applies the segment tree concept to discrete arrays for efficient range aggregate queries (e.g., sum, minimum, or maximum over a subarray) and point updates, again achieving O(log n) time per operation after O(n) construction; this form leverages lazy propagation for range updates in O(log n) time as well.[1] The space complexity for both variants is O(n log n) in the general case, though optimized implementations can achieve O(n) space.[1] Segment trees remain influential due to their versatility, forming the basis for advanced structures like range trees and interval trees in geometric and database applications.
Fundamentals
Definition
A segment tree is a full binary tree data structure in which each node represents a contiguous interval of an underlying array or sequence.[2] It was introduced by Jon Louis Bentley in his 1977 unpublished manuscript addressing Klee's rectangle problems in computational geometry.[3]
In this structure, the leaf nodes each correspond to a single element of the array, while each internal node aggregates information over the interval spanned by its two child nodes, typically by applying an associative operation such as summation, minimum, or maximum.[2] Formally, for an array A of size n, the root node represents the entire interval [1, n], and the tree is recursively subdivided into halves until reaching single-element intervals at the leaves, ensuring balanced partitioning.[2]
The primary purpose of a segment tree is to support efficient range queries and point updates on static or dynamic arrays, achieving O(log n) time complexity for both operations.[2]
For example, consider the array A = [1, 3, 5, 7] with indices 1 to 4. The segment tree would have leaves storing these values, with internal nodes above them representing sums (or other aggregates) over pairwise intervals: the node for [1,2] stores 1 + 3 = 4, the node for [3,4] stores 5 + 7 = 12, and the root for [1,4] stores 4 + 12 = 16. This hierarchical representation allows quick computation of aggregates over arbitrary subintervals without traversing the entire array.[2]
Tree Structure
The segment tree organizes data in a binary tree structure that hierarchically partitions the index range of an underlying array into subintervals. The root node represents the full array interval [L, R], where L is the starting index (typically 0 or 1) and R is the ending index (typically n-1 for an array of size n). This root encompasses the entire range over which queries and updates will operate.[4]
Each internal node, corresponding to a subinterval [L, R], divides it evenly into two child intervals: the left child covers [L, mid] and the right child covers [mid+1, R], with mid = ⌊(L + R)/2⌋. This binary splitting rule is applied recursively to both children until the base case is reached. The process terminates at the leaf nodes, each of which represents a singleton interval covering exactly one array index, such as [i, i] for some i.[4]
The resulting tree is balanced, with a height of ⌈log₂ n⌉, guaranteeing that paths from the root to any leaf traverse a logarithmic number of nodes. This balanced structure arises from the consistent halving of intervals, preventing skewed subtrees and enabling predictable performance in operations.[4]
To illustrate, consider an array of size 8 with indices [0, 7]. The root node covers [0, 7]. It splits at mid = 3, yielding left child [0, 3] and right child [4, 7]. The left child further splits at mid = 1 into [0, 1] and [2, 3], while the right splits at mid = 5 into [4, 5] and [6, 7]. At the next level, these divide into singletons: [0, 0] and [1, 1], [2, 2] and [3, 3], [4, 4] and [5, 5], [6, 6] and [7, 7]. This forms a full binary tree of height 3, where each level refines the interval granularity.[4]
Storage Representation
Segment trees are commonly represented in memory using a one-dimensional array, which stores all tree nodes in a contiguous block for efficient access and cache performance. This array-based approach leverages the complete binary tree structure inherent to segment trees, allowing parent-child relationships to be computed via index arithmetic rather than pointers. The array is typically allocated with a size of 4n, where n is the length of the underlying data array, providing sufficient space to represent the tree without overflow even for arbitrary n.[2]
The root node, representing the full interval [0, n-1], is placed at index 1 (using 1-based indexing). For any node at index i, its left child is at index 2i and its right child at 2i+1. This fixed indexing enables constant-time computation of node locations during traversal.[5][2]
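As a minimal sketch of this index arithmetic (the helper names are illustrative, not standard):

```cpp
#include <cassert>

// Implicit 1-based tree indexing: parent/child positions are computed
// arithmetically, so no pointers need to be stored.
int left_child(int i)  { return 2 * i; }
int right_child(int i) { return 2 * i + 1; }
int parent(int i)      { return i / 2; }
```

For the root at index 1, the children sit at indices 2 and 3; integer division by two recovers the parent, so a leaf-to-root walk is a sequence of halvings.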
The space complexity of this representation is O(n), as the 4n allocation bounds the memory usage linearly despite some slots remaining unused, particularly when n is not a power of two. In such cases, the implementation pads the effective array size to the next power of two or relies on the generous 4n allocation to avoid explicit resizing, ensuring the tree fits without additional adjustments during indexing.[2][6]
Each array element corresponding to a node stores aggregated information for the interval it covers, such as the sum of elements (for range sum queries) or a single scalar like the minimum value (for range minimum queries). For queries requiring multiple aggregates, a node may hold a lightweight structure, such as a pair storing both minimum and maximum values within the interval.[7][2]
The mapping from array indices to specific intervals [L, R] is determined implicitly during the tree's recursive construction, where each node's position encodes the segment boundaries passed as parameters, avoiding the need for explicit storage of interval details in every node.[2]
Construction and Operations
Building the Tree
A segment tree is constructed from an input array by populating its nodes to represent aggregated values over array intervals, typically using a merge operation such as summation or minimum. The construction process ensures that leaf nodes hold individual array elements, while internal nodes store the merged results of their child subtrees. Two primary approaches exist: bottom-up iterative construction, which begins at the leaves and propagates values upward, and top-down recursive construction, which starts at the root and subdivides intervals recursively.[2][8]
In the bottom-up approach, the tree array is first initialized for non-leaf nodes with appropriate neutral values depending on the merge operation (for instance, positive infinity for minimum queries or zero for summation) to ensure correct merging even if partial construction occurs. The leaf nodes, corresponding to array indices, are then directly copied from the input array into the appropriate positions in the tree array (typically indices from n to 2n-1 for an array of size n). Subsequently, an iterative loop computes the values of internal nodes by merging pairs of children, starting from the level just above the leaves and moving upward to the root. This method is particularly efficient in practice due to its non-recursive nature, avoiding stack overhead.[8][9]
Pseudocode for the bottom-up construction, assuming a summation merge and a global tree array t of size 2n with leaves at indices n to 2n-1, is as follows:
void build_bottom_up(int a[], int n) {
    // Initialize non-leaf nodes (indices 1 to n-1) to 0 for sum
    for (int i = 1; i < n; i++) {
        t[i] = 0;
    }
    // Copy array to leaves
    for (int i = 0; i < n; i++) {
        t[n + i] = a[i];
    }
    // Build internal nodes bottom-up
    for (int i = n - 1; i > 0; --i) {
        t[i] = t[i * 2] + t[i * 2 + 1];
    }
}
This process visits each node exactly once, yielding a time complexity of O(n).[8]
The top-down approach employs recursion to build the tree by dividing the array interval at the root and constructing subtrees for the left and right halves until reaching single-element intervals. Node indexing is used to place values in the tree array, with the root at index 1, the left child of node v at 2v, and the right child at 2v + 1. The base case assigns the array value directly to leaf nodes, while non-leaf nodes merge the results from recursive calls on children. This recursive method aligns naturally with the divide-and-conquer paradigm underlying segment trees.[2][10]
Pseudocode for the top-down recursive build, again for summation and using a tree array t of size up to 4n, is:
void build_top_down(int a[], int v, int tl, int tr) {
    if (tl == tr) {
        t[v] = a[tl];
    } else {
        int tm = (tl + tr) / 2;
        build_top_down(a, v * 2, tl, tm);
        build_top_down(a, v * 2 + 1, tm + 1, tr);
        t[v] = t[v * 2] + t[v * 2 + 1];
    }
}
The function is invoked initially as build_top_down(a, 1, 0, n-1). Like the bottom-up method, it achieves O(n) time complexity, as the recursion visits each of the tree's O(n) nodes exactly once. Both approaches produce an equivalent fully constructed tree ready for subsequent operations.[2][10]
Range Query
The range query operation in a segment tree retrieves the aggregated value (such as sum, minimum, or maximum) over a specified interval [ql, qr] in the underlying array by traversing the tree from the root. Each node represents a fixed interval [start, end] of the array, and the query determines the overlap between this interval and the query range to decide the next action. This traversal avoids visiting irrelevant parts of the tree, ensuring efficiency.[11]
The query function handles three cases based on overlap:
- If the current node's interval [start, end] is completely outside [ql, qr] (i.e., end < ql or start > qr), it returns the neutral element for the operation: 0 for sum queries, positive infinity (∞) for minimum queries, or negative infinity (-∞) for maximum queries.[11]
- If [start, end] is fully contained within [ql, qr] (i.e., ql ≤ start and end ≤ qr), it returns the precomputed value stored in the node, which represents the aggregate over that interval.[11]
- If [start, end] partially overlaps [ql, qr], the function recurses on the left and right child nodes, then combines their results using the appropriate operation (e.g., addition for sum, minimum for min). This splitting leverages the tree's binary structure, where child intervals cover [start, mid] and [mid+1, end], with mid = (start + end) / 2.[11]
The following pseudocode illustrates a general range query for an associative operation ⊕ (e.g., + for sum, min for minimum), assuming a 1-based array representation of the tree where the root is at index 1:
function query(node, start, end, ql, qr):
    if qr < start or end < ql:
        return neutral_element // e.g., 0 for sum, ∞ for min
    if ql ≤ start and end ≤ qr:
        return tree[node]
    mid = (start + end) / 2
    left_result = query(2 * node, start, mid, ql, qr)
    right_result = query(2 * node + 1, mid + 1, end, ql, qr)
    return left_result ⊕ right_result
This recursive approach starts at the root (node=1, start=1, end=n) and terminates when base cases are met.[11]
The time complexity of a range query is O(log n), where n is the array size, because the traversal visits at most O(log n) nodes: at each level, it branches into at most two paths that lead to the query boundaries, and the tree height is O(log n). This bound holds regardless of the specific aggregate operation, as long as it is computable in constant time.[11]
Consider an example with array A = [1, 3, 5, 7, 9] (1-based indices 1 to 5) and a sum query on [2, 4], which should return 3 + 5 + 7 = 15. Starting at the root (interval [1,5]), the node partially overlaps [2,4], so recurse on left child [1,3] and right child [4,5]. For [1,3], it partially overlaps, so recurse on its children [1,2] and [3,3]: [1,2] partially overlaps (recurse to leaf 1: outside, returns 0; leaf 2: fully inside, returns 3; combine to 3), and [3,3] fully inside, returns 5; combine left subtree to 3 + 5 = 8. For right subtree [4,5], it partially overlaps, so recurse on [4,4] (fully inside, returns 7) and [5,5] (outside, returns 0); combine to 7. Finally, root combines 8 + 7 = 15. This path visits nine nodes in a tree of height 3.[11]
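The worked example above can be reproduced with a small self-contained implementation. This sketch uses a global array t and 0-based element indices (so the 1-based query [2, 4] becomes [1, 3]); the names are chosen for illustration:

```cpp
#include <cassert>
#include <vector>

std::vector<long long> t;  // tree values; index 1 is the root

// Top-down build for sums, as in the construction section.
void build(const std::vector<long long>& a, int v, int tl, int tr) {
    if (tl == tr) { t[v] = a[tl]; return; }
    int tm = (tl + tr) / 2;
    build(a, 2 * v, tl, tm);
    build(a, 2 * v + 1, tm + 1, tr);
    t[v] = t[2 * v] + t[2 * v + 1];
}

// Sum over [ql, qr]; 0 is the neutral element for addition.
long long query(int v, int tl, int tr, int ql, int qr) {
    if (qr < tl || tr < ql) return 0;       // no overlap
    if (ql <= tl && tr <= qr) return t[v];  // node fully inside query range
    int tm = (tl + tr) / 2;                 // partial overlap: split
    return query(2 * v, tl, tm, ql, qr)
         + query(2 * v + 1, tm + 1, tr, ql, qr);
}
```

Building over {1, 3, 5, 7, 9} and querying indices 1 through 3 returns 15, matching the traversal traced above.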
Point Update
A point update in a segment tree modifies the value stored at a single index in the underlying array and propagates this change upward through the tree to ensure all interval aggregates remain consistent. The process starts at the root and recursively traverses down to the leaf node representing the target index, where the value is directly updated. On the return path, each ancestor node's value is recomputed by combining the values of its children—for example, in a sum-based segment tree, the parent's sum is set to the sum of its left and right children's sums.[2]
The following pseudocode illustrates the point update function for a sum segment tree, where t is the array storing node values, v is the current node index, [tl, tr] is the interval it covers, pos is the index to update (expressed in the same indexing as [tl, tr]), and new_val is the new value:
void update(int v, int tl, int tr, int pos, int new_val) {
    if (tl == tr) {
        t[v] = new_val;
    } else {
        int tm = (tl + tr) / 2;
        if (pos <= tm)
            update(2*v, tl, tm, pos, new_val);
        else
            update(2*v+1, tm+1, tr, pos, new_val);
        t[v] = t[2*v] + t[2*v+1];
    }
}
This implementation handles the base case at the leaf by assigning the new value and, for internal nodes, recurses to the appropriate child before merging the children's values to update the current node.[2]
The time complexity of point update is O(log n), where n is the array length, as the operation affects only the O(log n) nodes along the path from root to leaf.[2]
For example, consider the array [1, 3, 5, 7, 9] with 1-based indexing (positions 1 to 5). Updating position 2 from 3 to 4 involves changing the corresponding leaf node and recomputing its ancestors: the node for interval [1,2] updates from 1 + 3 = 4 to 1 + 4 = 5, the node for [1,3] updates from 9 to 10, and the root for [1,5] updates from 1 + 3 + 5 + 7 + 9 = 25 to 26. This ensures all affected nodes reflect the change.[2]
Consistency is maintained by recomputing every ancestor node's value during the ascent from the leaf, guaranteeing that the tree accurately represents the updated array for subsequent queries.[2]
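Combining the update above with a build, the worked example can be checked end to end; this sketch uses 0-based indices (so 1-based position 2 becomes index 1) and illustrative names:

```cpp
#include <cassert>
#include <vector>

std::vector<long long> t;  // tree values; index 1 is the root

void build(const std::vector<long long>& a, int v, int tl, int tr) {
    if (tl == tr) { t[v] = a[tl]; return; }
    int tm = (tl + tr) / 2;
    build(a, 2 * v, tl, tm);
    build(a, 2 * v + 1, tm + 1, tr);
    t[v] = t[2 * v] + t[2 * v + 1];
}

// Point update: descend to the leaf, then recompute each ancestor.
void update(int v, int tl, int tr, int pos, long long new_val) {
    if (tl == tr) { t[v] = new_val; return; }
    int tm = (tl + tr) / 2;
    if (pos <= tm) update(2 * v, tl, tm, pos, new_val);
    else           update(2 * v + 1, tm + 1, tr, pos, new_val);
    t[v] = t[2 * v] + t[2 * v + 1];  // merge children on the way back up
}
```

After building over {1, 3, 5, 7, 9}, the root holds 25; setting index 1 to 4 raises it to 26, as in the example.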
Advanced Features
Lazy Propagation
Lazy propagation is an optimization technique introduced to segment trees to handle range updates efficiently by deferring the application of updates to subtrees until they are necessary.[5] Each node in the segment tree maintains an additional lazy value that stores pending updates, such as adding a constant to all elements in the represented interval, without immediately propagating them to the leaves or child nodes.[12] This lazy value allows the tree to mark an entire range as updated in a single node, avoiding the need to traverse and modify every affected leaf, which would otherwise be inefficient for large ranges.[13]
The propagation rule involves pushing the lazy value down to child nodes when required. For a node representing an interval of length len, its value is updated by adding lazy × len, and if it is not a leaf, the children's lazy values are incremented by the parent's lazy value, so that their stored values are adjusted when they are themselves visited.[5] After pushing, the parent's lazy value is reset to zero to indicate that no pending updates remain at that level.[12] This ensures that updates are correctly distributed without redundant computations, maintaining the tree's integrity for subsequent operations.
Propagation is triggered whenever a node with a non-zero lazy value is visited during a range query or update operation, specifically before recursing into its children.[13] This on-demand approach contrasts with point updates, which immediately propagate changes along a single path to the leaf without deferral.[5] Implementing lazy propagation requires an additional array of size O(n) to store the lazy values, parallel to the main segment tree array, resulting in overall space complexity of O(n).[12]
A typical implementation of the push function for lazy propagation, assuming a sum-based segment tree and additive updates, is as follows:
void push(int node, int start, int end) {
    if (lazy[node] != 0) {
        tree[node] += (end - start + 1) * lazy[node]; // Update current node value
        if (start != end) {
            lazy[2 * node] += lazy[node]; // Propagate to left child
            lazy[2 * node + 1] += lazy[node]; // Propagate to right child
        }
        lazy[node] = 0; // Clear lazy flag
    }
}
Here, tree holds the node values, lazy holds pending updates, and indices follow the standard 1-based array representation of the segment tree.[5]
The primary benefit of lazy propagation is that it enables range updates to be performed in O(log n) time by only updating O(log n) nodes along the path to the range boundaries, rather than visiting all elements in the range, which would take O(n) time in the worst case.[13] This makes it particularly valuable for applications involving frequent range modifications alongside queries.[12]
Range Update
Range updates in segment trees extend the capability to modify all elements within a specified interval [l, r] efficiently, typically by adding a value or setting a value to each element in the range, while maintaining support for aggregate queries like sums. This is achieved by integrating lazy propagation, which defers the application of updates to subtrees until necessary, ensuring that the operation traverses only O(log n) nodes. The update process mirrors the recursive traversal used in range queries but focuses on applying the modification to nodes whose intervals are fully or partially covered by [l, r].[2]
When performing a range update on a node representing interval [start, end] for update range [l, r] with value val (assuming an additive update for sum queries), the function first checks if any pending lazy update exists on the current node; if so, it propagates that update downward before proceeding. For a fully covered case (where [start, end] is entirely within [l, r]), the node's aggregate value is immediately adjusted by val multiplied by the length of the interval (end - start + 1), and a lazy flag is set on the node with val to defer propagation to children. In the partial coverage case (where [start, end] overlaps but does not fully contain [l, r]), the function propagates any existing lazy value to the children, recursively updates the left and right child subtrees as needed, and then recombines the children's aggregate values to update the current node (e.g., sum = left_sum + right_sum). This ensures consistency without redundant traversals.
The following pseudocode illustrates the range update function for additive updates on sum queries, assuming a 1-based array indexing and tree stored in arrays tree[] and lazy[]:
void range_update(int node, int start, int end, int l, int r, long val) {
    // Propagate pending lazy update if any
    if (lazy[node] != 0) {
        tree[node] += (end - start + 1) * lazy[node];
        if (start != end) {
            lazy[2*node] += lazy[node];
            lazy[2*node + 1] += lazy[node];
        }
        lazy[node] = 0;
    }
    // No overlap
    if (start > end || start > r || end < l) {
        return;
    }
    // Fully covered
    if (start >= l && end <= r) {
        tree[node] += (end - start + 1) * val;
        if (start != end) {
            lazy[2*node] += val;
            lazy[2*node + 1] += val;
        }
        return;
    }
    // Partial overlap
    int mid = (start + end) / 2;
    range_update(2*node, start, mid, l, r, val);
    range_update(2*node + 1, mid + 1, end, l, r, val);
    tree[node] = tree[2*node] + tree[2*node + 1];
}
This implementation correctly handles the length adjustment for sum aggregates during both propagation and application of the update. The time complexity for a range update is O(log n), as the recursion depth is logarithmic and each level processes a constant number of nodes, with lazy propagation preventing full subtree updates.[2]
Consider an example with an initial array A = [1, 3, 5, 7, 9] (1-based indices 1 to 5) and a segment tree built for sum queries. To add 2 to the range [2, 4], the update function starts at the root (node 1, interval [1,5]). Since [1,5] partially overlaps [2,4], it propagates any pending lazy value (none initially) and recurses to the left child (node 2, [1,3]) and the right child (node 3, [4,5]). The left child [1,3] partially overlaps, so it recurses further: [1,2] partially overlaps (leaf [1,1] has no overlap, while leaf [2,2] is fully covered and its sum increases by 2), and leaf [3,3] is fully covered, so its sum also increases by 2. The right child [4,5] partially overlaps: leaf [4,4] is fully covered (sum increases by 2), while [5,5] has no overlap. On the way back up, each parent recombines its children's sums, so the root increases by the total 2 × 3 = 6. In this particular case every fully covered node is a leaf, so no lazy tags persist; for a wider range such as [2, 5], the internal node [4,5] would instead be fully covered, its sum incremented by 2 × 2 = 4 and its lazy value set to 2, deferring the adjustment of leaves [4,4] and [5,5] until a subsequent query or update triggers propagation.
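The push and range_update routines above, together with a lazy-aware query, form a complete sketch; the following version uses 0-based indices and illustrative names:

```cpp
#include <cassert>
#include <vector>

std::vector<long long> tree, lazy;  // node sums and pending additions

void build(const std::vector<long long>& a, int v, int tl, int tr) {
    if (tl == tr) { tree[v] = a[tl]; return; }
    int tm = (tl + tr) / 2;
    build(a, 2 * v, tl, tm);
    build(a, 2 * v + 1, tm + 1, tr);
    tree[v] = tree[2 * v] + tree[2 * v + 1];
}

// Apply a node's pending addition to its sum and defer it to its children.
void push(int node, int start, int end) {
    if (lazy[node] != 0) {
        tree[node] += (long long)(end - start + 1) * lazy[node];
        if (start != end) {
            lazy[2 * node] += lazy[node];
            lazy[2 * node + 1] += lazy[node];
        }
        lazy[node] = 0;
    }
}

// Add val to every element in [l, r].
void range_update(int node, int start, int end, int l, int r, long long val) {
    push(node, start, end);
    if (end < l || r < start) return;   // no overlap
    if (l <= start && end <= r) {       // fully covered: tag and apply here
        lazy[node] += val;
        push(node, start, end);
        return;
    }
    int mid = (start + end) / 2;
    range_update(2 * node, start, mid, l, r, val);
    range_update(2 * node + 1, mid + 1, end, l, r, val);
    tree[node] = tree[2 * node] + tree[2 * node + 1];
}

// Sum over [l, r], pushing pending updates along the way.
long long range_query(int node, int start, int end, int l, int r) {
    if (end < l || r < start) return 0;
    push(node, start, end);
    if (l <= start && end <= r) return tree[node];
    int mid = (start + end) / 2;
    return range_query(2 * node, start, mid, l, r)
         + range_query(2 * node + 1, mid + 1, end, l, r);
}
```

Over {1, 3, 5, 7, 9}, adding 2 to indices 1 through 3 raises the total from 25 to 31, while a query for index 1 alone returns 5, confirming that deferred additions reach the leaves on demand. In the fully covered case, tagging the node and immediately pushing is equivalent to adjusting its sum by val × length and tagging only its children, as in the pseudocode above.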
Higher-Dimensional Generalizations
Higher-dimensional generalizations of the segment tree extend the one-dimensional structure to handle multi-dimensional data, such as points in d-dimensional space or arrays with multiple indices, by recursively nesting segment trees along each dimension. This approach enables efficient range queries and updates in higher dimensions, though at increased computational cost.
In two dimensions, the structure is a tree of trees: the root is a one-dimensional segment tree over the first coordinate (e.g., rows or x-coordinates), and each node in this outer tree contains another one-dimensional segment tree over the second coordinate (e.g., columns or y-coordinates). This nested design, originally described for orthogonal range searching, allows the data structure to partition the space hierarchically along each axis.
Construction proceeds recursively: first build the outer segment tree on the first dimension; then, for each node in the outer tree (representing a range in the first dimension), construct the corresponding inner segment tree on the second dimension using the elements (or aggregates) within that range.[17] For a set of n points with coordinates up to size n (after compression), the space complexity is O(n log n): each element appears in the inner trees of the O(log n) outer nodes whose intervals contain it. This generalizes to O(n log^{d-1} n) space for d dimensions.[18]
Range queries in two dimensions involve nested traversals: query the outer tree to identify O(log n) canonical nodes covering the range in the first dimension, then for each such node, query its inner tree for the range in the second dimension, yielding a total time of O(log^2 n); in d dimensions, this extends to O(log^d n) time through successive nested queries.[19] One key application is computing 2D range sums on matrices, where the structure aggregates values over rectangular subregions efficiently.[17]
Despite these advantages, higher-dimensional segment trees face significant limitations in space and time: for 2D, queries and updates take O(log^2 n) time, while the O(n log n) space can strain memory for large n. These costs grow with dimensionality as O(log^d n) time and O(n log^{d-1} n) space, making the structure impractical beyond low dimensions without optimizations like sparsity or dynamic node allocation.[19]
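The nested construction and queries described above can be sketched as a tree of trees for 2D range sums; this is one common layout (an outer tree over rows whose nodes each hold an inner tree over columns), with illustrative names:

```cpp
#include <cassert>
#include <vector>

int n, m;                               // matrix dimensions
std::vector<std::vector<long long>> t;  // t[vx][vy]: sum over a sub-rectangle

// Build the inner y-tree of outer node vx (which covers rows [lx, rx]).
void build_y(const std::vector<std::vector<long long>>& a,
             int vx, int lx, int rx, int vy, int ly, int ry) {
    if (ly == ry) {
        // y-leaf: take the cell directly (x-leaf) or merge the x-children.
        t[vx][vy] = (lx == rx) ? a[lx][ly]
                               : t[2 * vx][vy] + t[2 * vx + 1][vy];
    } else {
        int my = (ly + ry) / 2;
        build_y(a, vx, lx, rx, 2 * vy, ly, my);
        build_y(a, vx, lx, rx, 2 * vy + 1, my + 1, ry);
        t[vx][vy] = t[vx][2 * vy] + t[vx][2 * vy + 1];
    }
}

// Build the outer tree over rows, then each node's inner tree over columns.
void build_x(const std::vector<std::vector<long long>>& a,
             int vx, int lx, int rx) {
    if (lx != rx) {
        int mx = (lx + rx) / 2;
        build_x(a, 2 * vx, lx, mx);
        build_x(a, 2 * vx + 1, mx + 1, rx);
    }
    build_y(a, vx, lx, rx, 1, 0, m - 1);
}

long long query_y(int vx, int vy, int ly, int ry, int y1, int y2) {
    if (y2 < ly || ry < y1) return 0;
    if (y1 <= ly && ry <= y2) return t[vx][vy];
    int my = (ly + ry) / 2;
    return query_y(vx, 2 * vy, ly, my, y1, y2)
         + query_y(vx, 2 * vy + 1, my + 1, ry, y1, y2);
}

// Sum over rows [x1, x2] and columns [y1, y2].
long long query_x(int vx, int lx, int rx, int x1, int x2, int y1, int y2) {
    if (x2 < lx || rx < x1) return 0;
    if (x1 <= lx && rx <= x2) return query_y(vx, 1, 0, m - 1, y1, y2);
    int mx = (lx + rx) / 2;
    return query_x(2 * vx, lx, mx, x1, x2, y1, y2)
         + query_x(2 * vx + 1, mx + 1, rx, x1, x2, y1, y2);
}
```

Each of the O(log n) canonical outer nodes answers its column range in O(log m), giving O(log n · log m) per query, in line with the bounds above.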
Applications and Variants
Common Applications
Segment trees are widely employed in competitive programming platforms such as Codeforces and SPOJ for efficiently handling range sum, minimum, and maximum queries on arrays, enabling solutions to problems involving dynamic updates and aggregations over subarrays.[8] For instance, they facilitate solving tasks like computing prefix sums or finding extrema in specified intervals, which are common in algorithmic contests where naive approaches would exceed time limits.[20]
In problems requiring dynamic updates, segment trees support a variety of range queries and modifications on arrays.
In image processing, segment trees enable range updates for applying filters to pixel arrays, such as cost aggregation in stereo matching where they efficiently compute disparities over image segments while preserving edge information.[21]
Bioinformatics applications leverage segment trees for aggregating scores over genomic ranges, as in tools like Segtor, which use them to rapidly annotate single nucleotide variations by querying overlapping intervals in transcript data.[22] This structure supports efficient intersection queries across multiple genomic interval sets, crucial for variant analysis in large-scale sequencing data.[23]
For online query processing in real-time data streams, segment trees handle insertions and range queries dynamically, such as in distributed systems where they overlay interval representations to process video feeds or network data with low latency.[24]
In computational geometry, segment trees are fundamental for solving problems like orthogonal range searching and reporting all segments stabbed by a query point. They serve as a core component in more advanced structures, such as range trees for multidimensional range queries and interval trees for efficient interval overlap detection.[25][26]
Fenwick Tree Comparison
Both segment trees and Fenwick trees (also known as binary indexed trees) are efficient data structures for handling range sum queries and point updates on an array of size n, achieving O(log n) time complexity for both operations.[27] This similarity makes them suitable alternatives for problems involving prefix sums, where neither requires rebuilding the structure after updates.[27]
A key difference lies in their representation: segment trees explicitly construct a balanced binary tree, typically using an array of size 4n to store node values representing subarray sums, which facilitates straightforward adaptation for non-sum operations like minimum or maximum queries by storing the respective aggregates at each node.[28] In contrast, Fenwick trees employ an implicit structure via a compact array of size n+1, relying on bit manipulation for index traversal, which results in simpler implementation for prefix sum computations but limits native support to sum-like operations without additional modifications.[27]
Segment trees are preferable when the application requires range updates, support for min/max or other associative operations, or extensions to higher dimensions, as their explicit tree structure enables features like lazy propagation for efficient range modifications in O(log n) time.[29] Fenwick trees, however, are ideal for straightforward prefix sum scenarios due to their minimal code complexity and lower practical overhead.[27]
While Fenwick trees can emulate certain segment tree operations, such as range queries via prefix sum differences, they lack native lazy propagation and require multiple auxiliary trees for range updates, leading to O(log² n) complexity in modified forms.[29]
In terms of space and time trade-offs, both structures use O(n) space overall, but segment trees incur higher constants with up to 4n array elements compared to the Fenwick tree's n+1, potentially increasing memory access costs; nonetheless, Fenwick trees often exhibit 1.17× to 2.43× speedups in practice for sum operations due to fewer iterations (about 50% less on average) and cache-friendly access patterns.[27]
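For contrast with the segment tree code earlier, a Fenwick tree for the same sum workload fits in a few lines; this sketch uses 1-based positions and illustrative names:

```cpp
#include <cassert>
#include <vector>

std::vector<long long> bit;  // 1-based; bit[i] covers a block ending at i

// Point update: add delta to position i, touching O(log n) entries.
void add(int i, long long delta) {
    for (; i < (int)bit.size(); i += i & -i) bit[i] += delta;
}

// Sum of positions 1..i, again touching O(log n) entries.
long long prefix_sum(int i) {
    long long s = 0;
    for (; i > 0; i -= i & -i) s += bit[i];
    return s;
}

// Range sums are obtained only as prefix differences, reflecting the
// structure's native restriction to sum-like (invertible) operations.
long long range_sum(int l, int r) {
    return prefix_sum(r) - prefix_sum(l - 1);
}
```

Loading {1, 3, 5, 7, 9} into positions 1 through 5 gives range_sum(2, 4) == 15, matching the segment tree examples; the traversal is pure index arithmetic (i & -i isolates the lowest set bit) rather than an explicit tree walk.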
History and Development
Origins
The segment tree was invented by Jon L. Bentley in 1977 as part of a suite of range searching structures developed for efficient querying in computational geometry.[30]
Bentley first described the segment tree in his technical report "Solutions to Klee's Rectangle Problems," where it served as a one-dimensional variant of more general multidimensional binary search trees, adapted for interval-based operations.[30] This report focused on solving Klee's problems, which involve counting and reporting intersections among axis-aligned rectangles, using the segment tree to manage projections onto lines for range reporting.[30]
The primary motivation behind the segment tree's invention was to enable efficient range reporting queries in database applications and geometric computations, where traditional linear scans were too slow for large datasets.
Early implementations of the segment tree appeared in computational geometry contexts, particularly for handling stabbing queries on intervals, which laid groundwork for structures like interval trees used in reporting all intervals containing a query point.[31]
Key Contributions
In the late 1970s and 1980s, the segment tree was refined for multidimensional applications through contributions from researchers such as Edward M. McCreight, who developed priority search trees; G.S. Lueker, who worked on range trees; and Daniel E. Willard, who advanced layered range trees and fractional cascading techniques to improve query efficiency.
The technique of lazy propagation enables efficient range updates in segment trees by deferring the application of updates to child nodes until explicitly required during a query or further update. This optimization ensures that both range updates and queries operate in O(log n) time complexity, making it essential for handling large-scale interval modifications without redundant traversals. By the 2000s, lazy propagation had become a cornerstone in competitive programming, where it is routinely implemented in templates for solving dynamic range problems.[2]
Higher-dimensional generalizations of segment trees emerged in the 1980s within computational geometry, extending the structure to multi-dimensional range searching by nesting segment trees across dimensions. These variants support queries on d-dimensional orthogonal ranges with space usage of O(n log^{d-1} n) and query times of O(log^d n + k), where k is the output size. A seminal analysis by Edelsbrunner et al. in 1983 examined the expected and worst-case space requirements of such multi-dimensional segment trees, confirming their practicality for geometric applications despite exponential growth in higher dimensions.[32]
Segment trees can be integrated with persistent data structures to facilitate versioned operations, allowing updates to produce new tree versions while preserving prior states for historical queries. This combination supports O(log n) time per update and query across versions, with space overhead of O(log n) per modification, enabling applications like version control in databases and reversible computations.
Open-source implementations of segment trees, including support for lazy propagation and persistence, are prevalent in competitive programming repositories and algorithmic libraries, promoting widespread adoption in software engineering and contest solutions.[2]
The survey "Geometric Range Searching and Its Relatives" by Agarwal and Erickson (1999) provided a formal framework for segment trees in orthogonal and simplex range searching, synthesizing their extensions for semialgebraic sets and establishing benchmarks for query efficiency in geometric data structures.[33]