Fact-checked by Grok 2 weeks ago

Interval tree

An interval tree is a specialized tree-based data structure designed to store a dynamic set of intervals on the real line, facilitating efficient queries to identify intervals that contain a given point (stabbing queries) or overlap with a query interval. It is typically implemented as an augmented balanced , such as a red-black tree, where nodes are ordered by the low (left) endpoints of the intervals. Each node augments the standard structure by storing the interval it represents along with the maximum high (right) endpoint value among all intervals in its subtree, enabling pruning during searches. This augmentation allows the tree to maintain balance and support operations in logarithmic time relative to the number of stored intervals n. The structure was originally proposed by Herbert Edelsbrunner in as part of dynamic data structures for orthogonal intersection queries, providing an efficient way to report s intersecting a query in optimal time. In the standard augmented variant, insertion and deletion of s require updating the tree keys (low endpoints) and propagating the maximum high endpoint augmentations along the path, both achieving O(log n) amortized time due to the balanced nature of the underlying tree. For a query seeking all s containing a point x, the algorithm traverses from the , checking the current node's for containment and recursing into the appropriate child while using the subtree maximum to skip irrelevant branches, reporting O(log n + k) time where k is the number of matching s. Overlap queries are handled by a traversal starting from the , reporting the current if it overlaps the query, recursing left if the left subtree's max endpoint is at least the query's low endpoint, and recursing right if the current node's low endpoint is at most the query's high endpoint, also in O(log n + k) time. Interval trees require O(n) space, as each interval is stored once and augmentations add constant overhead per node. They find applications in for intersections, scheduling to detect conflicts, and bioinformatics for interval queries, where efficient is critical. Variants, such as those using segment trees or priority search trees, extend the basic interval tree for multidimensional or weighted scenarios while preserving core efficiency.

Overview

Definition and Motivation

An interval tree is an ordered tree data structure that stores a collection of intervals, each defined as a pair of real numbers [low, high] representing the endpoints of a segment on the real line, and supports efficient queries to identify all stored intervals that overlap with a given query point or interval. This augmentation of a binary search tree allows for the maintenance of additional information at each node, such as the maximum endpoint value in subtrees, to facilitate rapid overlap detection without exhaustive scanning. The primary motivation for interval trees stems from the inefficiency of naive methods in handling range overlap queries on large sets of intervals, where checking each of the n intervals against a query requires O(n) time in the worst case, becoming prohibitive for applications involving thousands or millions of intervals. By leveraging the tree's hierarchical , interval trees enable query operations in O(log n + k) time, where k is the number of overlapping intervals reported, thus providing a logarithmic factor improvement essential for real-time or high-volume data processing in domains like scheduling and . Interval trees originated from foundational work in aimed at solving orthogonal problems efficiently, with the structure independently formalized by Herbert Edelsbrunner and Edward M. McCreight in 1980 as part of dynamic data structures for reporting intersecting intervals on the real line. This innovation addressed the growing need for balanced space and time complexities in geometric query processing, influencing subsequent developments in augmented tree-based indexing.

Applications

Interval trees find widespread application in database systems for optimizing range searches and managing dynamic intervals. In object-oriented and temporal databases, they support efficient stabbing queries on intervals representing time periods or attribute ranges, enabling updates and queries in optimal external memory bounds. For instance, external interval trees handle insertions, deletions, and point-location queries on large datasets stored across disk blocks, which is essential for scalable . Similarly, in indexing uncertain data, interval trees partition query spaces into slabs based on endpoints, allowing probabilistic range queries to report relevant intervals with probability thresholds above a specified value, thus improving query performance in probabilistic databases. In geographic information systems (GIS), interval trees facilitate spatial overlap detection for linear features like roads or rivers along one dimension, often integrated into windowing queries for visualization. They efficiently identify segments intersecting a query , such as zooming into a rectangular area on a digital to retrieve all contained or overlapping paths, supporting rendering and . This application extends to higher-dimensional extensions for multi-attribute spatial data, where interval trees augment relational indexes like the RI-Tree to transform region queries into intersections. Bioinformatics leverages interval trees extensively for genomic interval queries, such as annotating variants against large sets of genes, exons, or regulatory elements. Tools like augmented range trees built on interval structures process massive datasets from projects like and gnomAD, identifying overlaps between query variants and annotation intervals to reveal functional impacts. For example, bedtk employs an implicit interval tree to compute overlaps in BED-formatted genomic data, enabling operations like and on millions of intervals with high efficiency. These applications benefit from the structure's ability to report all overlapping intervals in O(log n + k) time, where n is the number of stored intervals and k is the output size, ensuring scalability for genome-wide analyses.

Basic Concepts

Intervals and Overlap Definitions

In interval trees, an interval is typically represented as a closed interval [a, b], where a \leq b and a, b are the endpoints, which can be integers or real numbers. This representation includes both endpoints, meaning the interval encompasses all points from a to b inclusive. Open intervals (a, b) excluding endpoints or half-open variants like [a, b) are possible but less common in standard interval tree formulations, as closed intervals simplify overlap detection without loss of generality for most applications. The overlap condition between two intervals [a_1, b_1] and [a_2, b_2] is defined such that they overlap if \max(a_1, a_2) \leq \min(b_1, b_2), or equivalently, if a_1 \leq b_2 and a_2 \leq b_1. This criterion captures various overlap types, including proper overlap (where the interiors intersect but neither contains the other), (one interval fully encloses the other), and touching endpoints (sharing an endpoint without interior overlap, such as [1, 2] and [2, 3]). An alternative formulation for testing non-overlap is b_1 < a_2 or b_2 < a_1, so overlap holds if the negation is true: \neg (b_1 < a_2 \lor b_2 < a_1). Related to these definitions are specific query types that rely on overlap detection. A stabbing query identifies all intervals containing a given point x, equivalent to overlapping the degenerate interval [x, x]. An interval overlap query retrieves all intervals in a set that overlap a specified query interval [q_a, q_b], using the standard overlap condition to determine matches. These concepts underpin interval tree operations in domains like and .

Query Types and Operations

Interval trees support a variety of query types and operations designed to efficiently manage and retrieve information from sets of intervals on the real line. The primary queries focus on identifying intervals that intersect or contain specified points or ranges, building on the fundamental overlap condition where two intervals [a, b] and [c, d] intersect if max(a, c) ≤ min(b, d). The stabbing query, given a point x, reports all intervals in the tree that contain x (i.e., those for which the interval's low endpoint ≤ x ≤ high endpoint). This operation is particularly useful for determining coverage at a specific location and returns a list of all matching intervals. Unlike traditional point location structures that identify a single discrete position, stabbing queries in interval trees handle continuous ranges by reporting all relevant overlapping intervals. The overlap query, given a query interval [q_low, q_high], reports all stored intervals that intersect with it according to the overlap condition. This is the core retrieval operation for many applications, such as detecting conflicts in scheduling or genomic regions, and outputs a list of intersecting intervals. Efficiency for both stabbing and overlap queries is typically measured by the time required to locate the relevant tree nodes plus the time to report the k matching results. In addition to these queries, interval trees support insertion to add a new interval to the structure and deletion to remove an existing one, enabling dynamic maintenance of the interval set. Membership operations include checking whether a specific point is contained within any interval (via a stabbing query that verifies if at least one interval is reported) or whether a particular interval exists in the tree (via an exact search based on endpoints). These operations allow interval trees to serve as versatile data structures for evolving datasets.

Naive Methods

Simple List-Based Approach

The simple list-based approach represents a fundamental method for managing a collection of intervals without employing any advanced indexing structure. All intervals are stored in a contiguous array or linked list, which can be unsorted or optionally sorted by their starting points for minor optimizations in certain traversal patterns. This storage mechanism requires O(n) space for n intervals and supports insertions in amortized O(1) time by appending new intervals to the end of the list. To perform an overlap query with a given interval i = [t, s], the approach iterates through every stored interval j = [l, r] and checks for overlap using the condition l ≤ s and t ≤ r; if true for any j, it is reported as overlapping. Similarly, for a stabbing query at a point x, the scan verifies whether l ≤ x ≤ r for each interval j, reporting all that contain x. These operations each take O(n) time due to the full linear traversal, making the method straightforward to implement but inefficient for large datasets. This approach is practical in scenarios with a small number of intervals, such as prototyping or applications where the dataset size remains limited and query frequency is low. For illustration, consider a list containing intervals [1, 4], [2, 5], and [3, 6]; a query for [2.5, 3.5] would scan all three, confirming overlaps with each via the test.

Performance Limitations

The simple list-based approach to managing intervals, which relies on linear scans to detect overlaps, exhibits significant performance drawbacks as dataset sizes grow. Each query operation requires examining every stored interval to determine overlaps, yielding a worst-case time complexity of O(n) per query, where n denotes the number of intervals. Insertions are efficient at O(1) time, as they involve merely appending the new interval to the list without any restructuring. However, when processing multiple queries—common in applications involving batch operations—the cumulative cost escalates to O(n²) in the worst case for n queries, rendering it impractical for even moderately large collections. Space usage remains linear at O(n), storing each interval directly without auxiliary data, but this comes at the expense of lacking any indexing mechanism for accelerated access. Consequently, the structure offers no inherent support for efficient dynamic updates beyond basic insertions; deletions or modifications necessitate full rescans for subsequent queries, as there is no organized retrieval path. These limitations become particularly acute in scalability-demanding domains such as genomics, where datasets routinely encompass millions of intervals representing genomic features like exons or regulatory regions. Linear scans prove prohibitive here, often taking seconds to minutes per query on standard hardware, and fail to accommodate the high volume of overlapping queries typical in sequence analysis pipelines. This motivates the adoption of hierarchical structures like , which enable logarithmic-time operations—O(log n + k) for queries reporting k overlaps—to handle such workloads efficiently.

Centered Interval Trees

Construction Process

The centered interval tree is a variant of the interval tree data structure that organizes intervals by selecting a central point, typically the median of all endpoints, to partition the set of intervals into left, center, and right subsets based on their coverage relative to this point. This approach ensures balanced recursion and efficient storage of crossing intervals at each node. To construct the tree, first sort all 2n endpoints (both starts and ends) of the n intervals to identify the median endpoint as the center point x_{\text{mid}}. Partition the intervals as follows: those entirely to the left of x_{\text{mid}} (where the end point is less than x_{\text{mid}}) form the left subset; those entirely to the right (where the start point is greater than x_{\text{mid}}) form the right subset; and those crossing x_{\text{mid}} (where the start is at most x_{\text{mid}} and the end is at least x_{\text{mid}}) are stored directly at the current node, typically in two sorted lists—one by start points and one by end points—for efficient querying. Recursively build the left and right subtrees using their respective subsets until no intervals remain. This process creates a binary tree where each node holds the center value and the lists of crossing intervals. The construction achieves a time complexity of O(n \log n), primarily due to the initial sorting of endpoints and the recursive partitioning, which can leverage selection algorithms like quickselect for median finding in average linear time per level. The space complexity is O(n), as the tree has fewer than n nodes, and each interval is stored exactly once across the structure, with crossing lists at nodes requiring additional but linear storage.

Intersection Queries

In centered interval trees, intersection queries encompass both stabbing queries (finding all intervals containing a given point) and overlap queries (finding all intervals overlapping a given query interval). These operations leverage the tree's structure, where intervals are partitioned based on their relation to each node’s center point, with crossing intervals stored directly at the node. The stabbing query for a point x begins at the root and follows the binary search path determined by comparing x to each node's center c. At the current node, the algorithm examines the centered intervals stored there—those that overlap c—and reports any that contain x (i.e., intervals [l, r] where l \leq x \leq r). If x = c, all centered intervals are reported, as they inherently contain the center, and no recursion occurs, since left and right subtrees contain only intervals strictly to one side of c. Otherwise, the algorithm recurses into the left subtree if x < c or the right subtree if x > c, repeating the process. This path traversal ensures all relevant centered lists are checked, as any interval containing x must cross one of the centers on the path to x's position. To efficiently identify qualifying centered intervals at each node (which are typically stored sorted by left endpoint), a binary search can be used to find and report those satisfying the containment condition, exploiting the property that for x < c, all centered intervals with left endpoint \leq x contain x (since their right endpoint \geq c > x). The overall time complexity is O(\log^2 n + k), where n is the number of intervals and k is the output size, due to O(\log n) binary searches along the path; practical implementations often achieve effective O(\log n + k) performance with optimized reporting. The overlap query for a query interval [q_l, q_r] follows a similar -based traversal from the root, but adapts to the query's extent. At each with center c, the algorithm checks the centered intervals for overlap with [q_l, q_r] (reporting those where the intervals intersect) and recurses into the left subtree if [q_l, q_r] could overlap intervals there (e.g., if q_r \geq the left subtree's range minimum), and similarly for the right. In an optimized variant, the traversal simplifies to a single descending : at the current , if c \in [q_l, q_r], all centered intervals are reported without individual checks, as they contain c and thus overlap the query; then, descend right if c < q_l (left subtrees are too small), or left if c > q_r (right subtrees are too large). This reports all overlapping intervals efficiently, as any overlapping must cross a center contained in the query or be handled in subtrees along the path. The is O(\log n + k), benefiting from the single-path descent and bulk reporting at qualifying nodes. These queries assume the tree is balanced (e.g., via red-black tree underlying structure), ensuring path lengths of O(\log n). Representative examples include applications, where queries locate all genomic intervals (e.g., genes) containing a specific , or scheduling systems using overlap queries to detect conflicts with a proposed time slot.

Higher Dimensions and Extensions

To extend centered interval trees to higher dimensions, a nested structure is employed, particularly for handling multidimensional intervals such as axis-aligned rectangles in space. In this approach, a primary interval tree is constructed on the projections of the intervals onto the first dimension (e.g., x-coordinates), and at each node of this primary tree, a secondary interval tree is built on the projections onto the second dimension (e.g., y-coordinates) for the set of intervals stored in that subtree. This hierarchical organization leverages the efficiency of 1D interval trees while enabling multidimensional overlap detection, generalizing the centered approach where point serves as a splitting criterion in each dimension. For overlap queries in 2D, the process begins with an query on the primary tree to identify all nodes whose subtrees may contain rectangles overlapping the query rectangle's x-; this step reports O(log n) candidate subtrees. Within each of these subtrees, a secondary query is performed on the associated y- trees to filter for actual y-overlaps, ultimately reporting all overlapping rectangles. This nested querying adapts the 1D overlap operation to higher dimensions without requiring full traversal of the structure. The space requirements for this 2D extension are O(n log n), as each interval appears in O(log n) secondary trees across the primary tree's height. Query time is O(log² n + k) for reporting k overlapping rectangles, reflecting the logarithmic cost per dimension plus output size. In higher dimensions (d > 2), space grows to O(n log^{d-1} n) and query time to O(log^d n + k), though practical implementations often balance this with techniques. These limitations arise from the exponential increase in nesting depth, making the structure more suitable for low-dimensional cases. Such multidimensional extensions of centered interval trees are applied in spatial databases to support region queries, such as identifying overlapping geographic regions or objects in geographic information systems (GIS). For instance, they accelerate retrieval of map features within query windows by indexing bounding rectangles of spatial entities.

Deletion and Balancing

Deletion of an interval in a centered interval tree typically involves locating the interval in the appropriate node's list of overlapping intervals and removing it from that list, followed by rebuilding the affected subtrees to restore the partition properties and balance. This rebuild process ensures that the center points remain appropriately chosen, but it can be costly, with a full tree reconstruction requiring O(n log n) time in the worst case. To maintain the tree's logarithmic height, balancing is achieved during the initial construction by selecting each node's center point as the median of the endpoints (starts and ends) of the intervals assigned to that subtree, resulting in a balanced partition without the need for rotations or other self-adjusting mechanisms like those in AVL trees. Deletions pose challenges to this balance, as removing intervals can skew the distribution of remaining endpoints, leading to unbalanced subtrees and degraded query performance over time; to address this, periodic full rebuilds are often necessary to reselect optimal center points and repartition the intervals efficiently. An alternative to immediate rebuilding is lazy deletion, where removed intervals are simply marked as invalid in their node lists rather than physically deleted, allowing queries to skip marked entries while deferring costly restructurings until the proportion of deleted intervals warrants a complete rebuild. This approach amortizes the maintenance cost but requires additional space for the marked entries.

Augmented Trees

Tree Structure and Augmentation

The interval tree is fundamentally an augmented (BST) in which nodes are ordered by the low endpoints of the stored s. Each node represents an [low, high] and contains pointers to its left and right children, forming the standard BST structure where all intervals in the left subtree have low endpoints less than the node's low, and all in the right subtree have greater low endpoints. This ordering ensures that searching for potential overlaps can proceed along the BST paths efficiently. To support fast overlap detection, the tree is augmented such that each stores an additional value: the maximum high (max-high) among all intervals in the subtree rooted at that . This augmentation value is computed as the maximum of the node's own high and the max-high values of its left and right subtrees. The max-high field enables quick pruning during traversal—for instance, if a query 's low exceeds the max-high of a subtree, no interval in that subtree can overlap the query, allowing the entire subtree to be skipped. This design addresses the inefficiencies of naive BSTs without augmentation, which would require examining many non-overlapping subtrees. Construction of the interval tree proceeds by inserting intervals one by one using standard BST insertion rules, ordered by their low endpoints, while updating the max-high values bottom-up along the insertion path. For a set of n intervals, this process yields an expected O(n log n) construction time if using a balanced variant like a red-black tree, though unbalanced BSTs may degrade to O(n^2) in the worst case. The space complexity is O(n), as each node requires constant additional storage beyond the interval itself and child pointers to hold the max-high value.

Membership and Overlap Queries

In augmented trees, membership queries determine whether a given point x is contained within any stored in the tree, which is a special case of a stabbing query. A stabbing query reports all that contain the point x, by treating it as a degenerate [x, x] and performing an overlap query. The tree, organized as a on the low endpoints of with each node augmented by the maximum high endpoint in its subtree, enables efficient traversal: start at the root and check if x falls within the current node's ; if the left subtree's maximum high is at least x, recurse left to capture potential containing with lower starting points; otherwise, recurse right if necessary. This process follows a single path down the , visiting O(\log n) nodes, where n is the number of . Overlap queries find all intervals that intersect a given query interval i = [l, r], leveraging the same augmented structure for pruning. The algorithm begins at the root: if the current node's interval overlaps i (i.e., its low \leq r and high \geq l), report it. Then, if the left subtree's maximum high \geq l, recurse left, as intervals there may extend into i despite lower lows. Finally, if the current node's low \leq r, recurse right, since higher-low intervals might still overlap. This visits O(\log n) nodes regardless of output size, reporting k overlapping intervals in O(\log n + k) total time, as subtrees without potential overlaps are skipped via the augmentation. These queries rely on the BST ordering by low endpoints and the max-high augmentation to avoid exhaustive searches, ensuring balanced trees like red-black variants maintain the logarithmic bounds during insertions and deletions.

Code Implementation Examples

The augmented interval tree can be implemented in using a ordered by the low of each , with each node augmented to store the maximum high in its subtree. This allows efficient insertion and query operations, assuming a balanced for O(log n) time complexity. The following code snippets illustrate the core components, adapted from standard algorithmic descriptions.

Node Class

The node structure includes fields for the interval's low and high values, the maximum high value in the subtree, and references to left and right children.
java
class Interval {
    int low;
    int high;
    
    public Interval(int low, int high) {
        this.low = low;
        this.high = high;
    }
}

class Node {
    Interval interval;
    int maxHigh;
    Node left;
    Node right;
    
    public Node(Interval interval) {
        this.interval = interval;
        this.maxHigh = interval.high;
        this.left = null;
        this.right = null;
    }
}

Insertion Method

Insertion follows the standard procedure based on the low endpoint, followed by recursive updates to the maxHigh values along the to maintain the augmentation. This ensures the maxHigh at each is the maximum of its own high, left subtree's maxHigh, and right subtree's maxHigh.
java
[public](/page/Public) [class](/page/Class) IntervalTree {
    [private](/page/Private) [Node](/page/Node) root;
    
    [public](/page/Public) void insert(Interval newInterval) {
        root = insertRec(root, newInterval);
    }
    
    [private](/page/Private) [Node](/page/Node) insertRec([Node](/page/Node) node, Interval newInterval) {
        if (node == [null](/page/Null)) {
            return new [Node](/page/Node)(newInterval);
        }
        
        if (newInterval.low < node.interval.low) {
            node.left = insertRec(node.left, newInterval);
        } else {
            node.right = insertRec(node.right, newInterval);
        }
        
        // Update maxHigh after insertion
        node.maxHigh = Math.max(node.interval.high,
                                Math.max((node.left != null ? node.left.maxHigh : node.interval.high),
                                         (node.right != null ? node.right.maxHigh : node.interval.high)));
        
        return node;
    }
}
The insertion operates in O(log n) time on average, assuming the tree remains balanced (e.g., via a self-balancing variant like ).

Search Methods

For stabbing queries (finding intervals containing a point x), the search traverses the tree, pruning left subtrees if their maxHigh is less than x, and collects matching intervals. For interval overlap queries (finding intervals overlapping a query interval [low, high]), the traversal checks for potential overlaps using low/high comparisons and the maxHigh augmentation.

Stabbing Query (Point x)

java
import java.util.ArrayList;
import java.util.List;

public List<Interval> searchPoint(int x) {
    List<Interval> result = new ArrayList<>();
    searchPointRec(root, x, result);
    return result;
}

private void searchPointRec([Node](/page/Node) node, int x, List<Interval> result) {
    if (node == null) {
        return;
    }
    
    // Check current node
    if ([node](/page/Node).interval.low <= x && x <= [node](/page/Node).interval.high) {
        result.add([node](/page/Node).interval);
    }
    
    // Traverse left if possible overlap
    if ([node](/page/Node).left != null && [node](/page/Node).left.maxHigh >= x) {
        searchPointRec([node](/page/Node).left, x, result);
    }
    
    // Traverse right only if current low <= x
    if ([node](/page/Node).interval.low <= x && [node](/page/Node).right != null) {
        searchPointRec([node](/page/Node).right, x, result);
    }
}

Overlap Query (Interval queryInterval)

java
public List<Interval> searchOverlap([Interval](/page/Interval) queryInterval) {
    List<Interval> result = new ArrayList<>();
    searchOverlapRec(root, queryInterval, result);
    return result;
}

private void searchOverlapRec([Node](/page/Node) node, [Interval](/page/Interval) queryInterval, List<Interval> result) {
    if (node == null) {
        return;
    }
    
    // Check for overlap with current node: not (high[node] < low[query] or high[query] < low[node])
    if (!(node.interval.high < queryInterval.low || queryInterval.high < node.interval.low)) {
        result.add(node.interval);
    }
    
    // Traverse left if possible overlap (using maxHigh)
    if (node.left != null && node.left.maxHigh >= queryInterval.low) {
        searchOverlapRec(node.left, queryInterval, result);
    }
    
    // Traverse right only if current low <= query high
    if (node.interval.low <= queryInterval.high && node.right != null) {
        searchOverlapRec(node.right, queryInterval, result);
    }
}
Both search methods run in O(log n + k) time, where k is the number of reported intervals, by following a single path down the tree while reporting overlaps in subtrees.

Multidimensional Adaptations

Augmented interval trees, which store intervals along one dimension while maintaining auxiliary information such as the maximum endpoint in subtrees, can be extended to handle multidimensional overlap queries by combining them with range tree structures. In two or three dimensions, this adaptation involves constructing a primary interval tree on the x-projections of the multidimensional intervals (such as rectangles), where each node is augmented not only with the standard maximum x-endpoint but also with a secondary augmentation for the y-dimension, typically the maximum y-endpoint among intervals in the subtree. This y-max augmentation enables quick pruning of subtrees that cannot overlap with the query in the y-direction during traversal. For a 2D overlap query with a , the process begins by traversing the x-interval to identify O(log n) subtrees whose intervals may overlap the query's x-interval, using the standard interval mechanism. Within these subtrees, the y-max augmentation filters out non-overlapping candidates, followed by explicit checks or secondary traversals for y-overlap, resulting in an overall query time of O(log² n + k), where k is the number of reported overlapping . This approach leverages the 1D interval 's efficiency while extending it orthogonally to additional dimensions. In higher dimensions, the structure generalizes through recursive nesting: the primary tree handles the first dimension as an interval tree, with each node containing a lower-dimensional augmented range tree for the remaining coordinates, cycled across dimensions to balance the load. This leads to a space complexity of O(n log^{d-1} n) for d dimensions, trading increased storage for query efficiency in applications like spatial databases or genomic region overlaps. To mitigate the repeated logarithmic factors in query times, variants incorporate fractional cascading, which links sorted auxiliary structures across tree levels with shared pointers and subsets, allowing subsequent searches to proceed in constant time after an initial binary search. This technique, originally developed for geometric searching, reduces the 2D query to O(log n + k) in practice for fixed low dimensions while preserving linear preprocessing updates.

Alternative Variants

Segment Tree Implementation

A segment tree serves as an effective alternative to traditional interval trees for handling overlap queries on a fixed, discrete universe of endpoints, such as the range [1, U] where U is the size of the universe. The structure is a full binary tree with O(U) nodes, where leaf nodes each represent a single point in the universe, and each internal node represents the contiguous interval spanning the points in its subtree. This tree is typically built bottom-up or recursively, ensuring that the depth is O(log U) and that every possible subinterval of the universe can be decomposed into a disjoint union of O(log U) node intervals, known as the canonical interval decomposition. To support interval storage and queries, each is augmented with a list of the input that exactly cover the 's in the canonical decomposition. Specifically, for a set of n input , each is decomposed into its canonical parts and stored only in the O(log U) whose ranges precisely match those parts, avoiding redundancy. This augmentation ensures that no is stored multiple times in overlapping , leading to a total of O(n log U). Construction begins by the n intervals by their left endpoints, which takes O(n log n) time, followed by inserting each interval into the tree by traversing from the to identify and augment the O(log U) canonical nodes it covers. The overall build time is O(n log U), as each of the n intervals requires O(log U) work to decompose and store. This process assumes a static set of intervals, making it suitable for scenarios where the universe U is known and manageable, unlike dynamic augmented trees that support insertions and deletions in O(log n) time per operation. For an overlap query with a target [q_l, q_r), the algorithm decomposes the query into O(log U) disjoint canonical node intervals that exactly cover [q_l, q_r). It then visits these nodes and unions the lists of stored intervals from each, reporting all intervals that overlap the query since any overlapping input interval must cover at least one full subinterval of the query. The query time is O(log U + k), where k is the number of reported overlapping intervals, as collecting and unioning the lists from O(log U) nodes takes O(log U + k) time assuming sorted or hashed lists for efficient merging if needed.

Analysis and Comparisons

Time and Space Complexity

Interval trees achieve linear space complexity of O(n) for storing n intervals, as each interval is represented once with additional augmentation data that does not increase the asymptotic space beyond a constant factor per node. Construction of the tree typically requires O(n \log n) time, involving sorting the intervals by their low endpoints and building the augmented structure recursively. Overlap queries, which report all intervals intersecting a given query interval, run in O(\log n + k) time, where k is the number of matching intervals output; this bound arises from traversing O(\log n) nodes to identify candidate subtrees and then scanning the relevant stored intervals. More formally, the query time complexity is T(q) = O(\log n + |output|), capturing both the tree descent cost and the output enumeration. In the augmented binary search tree variant, balancing (e.g., via AVL or red-black trees) ensures all operations—insertion, deletion, and queries—achieve O(\log n) worst-case time amortized, with insertions and deletions updating the max-endpoint augmentations along the path in constant time per level. The segment tree variant, which discretizes endpoints into a fixed array of size U (the coordinate universe), maintains O(n \log U) space but yields query times of O(\log U + k); while U can grow large for uncoordinated inputs, coordinate compression reduces the effective size, yielding O(n \log n) space and O(\log n + k) query time in practice. The centered (or medial) interval tree variant supports O(\log n + k) queries without augmentation by partitioning around center points, but insertions and deletions often require periodic rebuilds in O(n \log n) time to maintain balance, leading to amortized O(\log n) per update across a sequence of operations. Interval trees share similarities with segment trees in supporting interval queries, such as (finding intervals containing a point) and overlap detection, but differ significantly in structure and suitability. Segment trees are static structures built over a fixed of size U, enabling O(\log U + k) time for queries where k is the output size, yet they require superlinear space of O(n \log U) or O(U) in the worst case, making them efficient for dense, predefined ranges but inefficient for sparse data. In comparison, interval trees maintain a dynamic, balanced augmented with lists, achieving O(\log n + k) query time and O(n) linear space with O(\log n) updates, thus outperforming segment trees for evolving, sparse sets. Range trees, designed for multidimensional point data, extend interval tree principles to higher dimensions but at greater complexity. While interval trees focus on 1D interval overlaps with O(n) space and O(\log n + k) reporting time, range trees store points in a tree-of-trees structure, supporting orthogonal range queries in O(\log^d n + k) time for d dimensions using O(n \log^{d-1} n) space, making them ideal for multi-D point-range searches rather than pure 1D intervals. Interval trees remain simpler and more space-efficient for one-dimensional overlap problems without needing multidimensional extensions. Fenwick trees, or binary indexed trees, contrast sharply as they optimize aggregate operations like prefix sums over arrays in O(\log n) time with O(n) space, but lack support for geometric overlap reporting, limiting them to numerical queries rather than intersections. Key trade-offs highlight interval trees' strength in dynamic overlap reporting for arbitrary , balancing time and space better than segment trees' rigidity or trees' dimensionality overhead. For dynamic collections, augmented interval trees are preferred due to update efficiency and linear storage; static, dense universes favor segment trees despite space costs.

References

  1. [1]
    None
    ### Summary of Interval Trees
  2. [2]
  3. [3]
    Speeding up isosurface extraction using interval trees - IEEE Xplore
    The interval tree is an optimally efficient search structure proposed by Edelsbrunner (1980) to retrieve intervals on the real line that contain a given query ...
  4. [4]
    [PDF] Interval Trees
    Interval trees track axis-aligned segments, built recursively by sorting endpoints and using a median point, with depth O(log n) and O(n) storage.
  5. [5]
    [PDF] dynamic data structures for orthogonal - intersection queries
    In this paper we will describe a slightly modified version of the interval tree that Leads to a better dynamic tructure than developed by Edelsbrunner [6]. As ...
  6. [6]
    [PDF] Interval and Segment Trees - Amin Allam Home Page
    An interval [lo, hi] consists of the range between lo and hi inclusive, where lo ≤ hi. A point is a special case of an interval where lo = hi.
  7. [7]
    Interval Trees
    Define a closed interval object i as [t1, t2] such that \(t_1 \leq t_2\) , low(i) =t1, and high(i) = t2. Overlap(i1, i2) 1 $\;\;\;\;\;$ if low(i1) $\leq$ high( ...<|control11|><|separator|>
  8. [8]
    [PDF] HINT: A Hierarchical Index for Intervals in Main Memory
    Interval tree. One of the most popular data structures for intervals is Edelsbrunner's interval tree [16], a binary search tree, which takes O(n) space and ...
  9. [9]
    Finding All Overlapping Intervals | Baeldung on Computer Science
    Nov 11, 2022 · In this tutorial, we'll discuss overlapping intervals and how to detect them. In the beginning, we'll introduce the problem of finding overlapping ranges.2. Definitions · 4. Sweep-Line Approach · 4.2. Implementation
  10. [10]
    Bedtk: finding interval overlap with implicit interval tree | Bioinformatics
    We present bedtk, a new toolkit for manipulating genomic intervals in the BED format. It supports sorting, merging, intersection, subtraction and the ...
  11. [11]
  12. [12]
    Data Structures for Range Searching | ACM Computing Surveys
    A, AND BENTLEY, J. L. "Quad trees--a data structure for retrieval on composite keys," Acta Inf 4, 1 (1974), 1-9. Google Scholar. [12]. FREDMAN, M. "A near ...Missing: original | Show results with:original
  13. [13]
    [PDF] Efficient implementations of range trees - People | MIT CSAIL
    1 Range Trees. Range trees are the extension of a unidimensional data structure called the segment tree. I will first describe the segment tree, and then ...
  14. [14]
    [PDF] CMSC 420: Lecture 17 Range Trees - UMD CS
    Thus, each node of the 2-dimensional range tree has a pointer to a auxiliary 1-dimensional range tree. We can extend this to any number of dimensions. At ...
  15. [15]
    [PDF] the use of range-tree spatial indexing to speed cis data retrieval
    This paper discusses the use of a new concept, range trees, in these applications. Range trees are well suited to indexing GIS data elements which have finite ...
  16. [16]
    [PDF] Data-parallel Construction of Interval Trees and Range Trees
    Jun 10, 2024 · This report documents the theory and the data-parallel implementations of interval trees and range trees, or more specifically, ...
  17. [17]
    [PDF] Recognizing Weakly Simple Polygons - Jeff Erickson
    [tmin;tmax]. • We use a centered interval tree [4] for all O(n) intervals I ... deletion and merge operations as follows. First delete all nodes that ...
  18. [18]
    [PDF] CMSC 420: Lecture 5 AVL Trees - UMD CS
    An intriguing alternative for avoiding coding up the deletion algorithm is called lazy deletion. For each node, we maintain a boolean value indicating whether ...
  19. [19]
    [PDF] Augmented Search Trees
    If q is to the right of left endpoint of interval in r we have two cases: (a) If max(left(r)) > q there must be a segment in left subtree containing q and we.Missing: paper high
  20. [20]
    14.3 Interval trees - CLRS Solutions - walkccc.me
    Since all the interval tree operations used run in time O ( lg ⁡ n ) O(\lg n) O(lgn) and we only call them at most 3 n 3n 3n times, we have that the runtime is ...
  21. [21]
    [PDF] Tree Structures for Sets of Intervals
    4 Tree Structures for Sets of Intervals. Theorem. The interval tree structure is a static data structure that can be built in time O(n log n) and needs space ...
  22. [22]
    [PDF] 10 Range Trees and Segment Trees - Jeff Erickson
    The interval [v.min, v.max] is called the canonical range associated with v. In all examples, I will use the pivot value v.pivot ...
  23. [23]
    [PDF] on the intersection of orthogonal objects
    The third basic type is called the interval tree which stores n intervals in linear space and supports the enumeration of those which intersect a query interval ...
  24. [24]
    [PDF] Lecture 9: Augmentation - MIT OpenCourseWare
    The complexity is O(lg n + k). Sorted arrays are inefficient for insertion and deletion. For a dynamic data struc ture that supports range queries, we can use ...
  25. [25]
    [PDF] Optimal External Memory Interval Management
    In this paper we present the external interval tree, an optimal external memory data structure for answering stabbing queries on a set of dynamically maintained ...<|control11|><|separator|>
  26. [26]
    [PDF] Interval, Segment, Range, and Priority Search Trees
    O ut p ut : An interval tree, rooted at Interval Tree root ( S ) . Me t h o ... W e shall in this section describe an augmented binary search tree which is easily.
  27. [27]
    Difference Between Segment Trees, Interval Trees, Range Trees ...
    Mar 18, 2024 · In this tutorial, we'll discuss the difference between various types of trees: Segment Tree, Interval Tree, Range Tree, and Binary Indexed Tree.