
M-tree

The M-tree is a balanced, paged, and dynamic tree-based data structure designed for indexing large datasets in generic metric spaces, where object similarity is defined solely by a distance function that satisfies the properties of positivity, symmetry, and the triangle inequality. Introduced in 1997, it enables efficient support for similarity queries, such as range searches and k-nearest neighbor retrievals, in applications like multimedia databases without requiring objects to fit into traditional vector spaces or adhere to specific metrics like L_p norms. At its core, the M-tree organizes data hierarchically: leaf nodes store the actual database objects along with their distances to the parent routing objects, while internal (routing) nodes contain routing entries that represent subregions of the metric space, each defined by a covering radius to bound the maximum distance from the routing object to any object in its subtree. This structure leverages the triangle inequality to prune irrelevant subtrees during query processing, thereby minimizing both disk I/O operations and expensive distance computations, which is particularly advantageous in high-dimensional or non-Euclidean spaces. The tree remains balanced through dynamic insertion algorithms that handle node splits using tunable policies, such as the mRAD policy for minimizing overlap or random selection for faster builds, ensuring scalability without periodic reorganizations. Notable for its generality, the M-tree has influenced subsequent metric indexing techniques, including variants like the Slim-tree and PM-tree, which address issues such as node overlap and dynamic adaptability in even larger datasets. Its performance has been empirically validated across diverse metric spaces, demonstrating superior efficiency over linear scans for similarity retrieval in domains like image processing and bioinformatics.

Introduction

Definition and Purpose

The M-tree is a balanced, paged tree data structure designed to index and organize objects within a generic metric space, where distances between objects are computed using a distance function that satisfies the properties of positivity, symmetry, and the triangle inequality, without relying on assumptions of Euclidean geometry or coordinate embeddings. This structure supports dynamic insertions and deletions, maintaining balance without requiring periodic reorganizations, making it suitable for evolving datasets. The primary purpose of the M-tree is to enable efficient similarity searches, particularly range queries and k-nearest neighbor (k-NN) queries, on large-scale datasets where only a black-box distance function is available, such as in applications involving multimedia content like color images, and bioinformatics sequences like genomes. By leveraging the metric properties, the M-tree facilitates pruning of irrelevant subtrees during searches, significantly reducing the number of distance computations needed—often by up to 40% in experimental settings—thus improving query performance on non-vector data that cannot be easily embedded into traditional spatial indexes. A key advantage of the M-tree lies in its exclusive reliance on the triangle inequality for search-space pruning, which avoids the coordinate-based assumptions inherent in structures like R-trees, allowing it to handle arbitrary metric spaces more effectively while remaining competitive in CPU costs even for vector data. The tree's fanout is controlled by two parameters: a minimum capacity m and a maximum capacity M for entries per node, with non-leaf nodes typically supporting between m+1 and M children, and leaf nodes holding m to M data entries, where m is often set to approximately M/2 to balance load and splitting efficiency.

Historical Development

The M-tree was introduced in 1997 by Paolo Ciaccia, Marco Patella, and Pavel Zezula in their seminal paper presented at the 23rd International Conference on Very Large Data Bases (VLDB). This work proposed the M-tree as a dynamic, paged access method specifically designed to support efficient similarity searches in generic metric spaces, where objects are represented by feature values and distances are computed using any function adhering to the basic metric postulates of positivity, symmetry, and the triangle inequality. The development of the M-tree was motivated by the shortcomings of existing spatial access methods, such as R-trees and their variants, which were primarily tailored for multidimensional vector spaces under specific metrics like Euclidean or L_p norms. These structures struggled with the non-vector spaces common in emerging applications, where distance computations could be arbitrary and computationally intensive, leading to inefficiencies in indexing and querying large datasets. By leveraging the triangle inequality to prune search spaces without embedding data into coordinates, the M-tree extended the applicability of tree-based indexing to broader domains, including content-based retrieval in multimedia databases. Following its introduction, early experiments validated the M-tree's performance on real-world datasets, such as color histograms from image databases and DNA sequences, demonstrating significant reductions in page I/Os and distance computations compared to adapted R-tree variants, particularly in higher dimensions. Subsequent developments in the late 1990s and early 2000s built upon this foundation, including bulk-loading techniques for improved construction efficiency in 1998 and optimizations to the building principles in 2003, which enhanced its scalability for dynamic insertions.
The M-tree exerted considerable influence on the metric indexing field, inspiring a family of structures like the PM-tree in 2004 and becoming a benchmark for similarity search techniques, with later implementations in database systems, such as a 2019 extension using PostgreSQL's GiST framework. Its high citation count—over 2,000 as of recent analyses—underscores its role in advancing exact and approximate similarity queries across multimedia and bioinformatics applications.

Theoretical Foundations

Metric Spaces and Distance Measures

A metric space is formally defined as a pair M = (D, d), where D is a domain consisting of a set of objects and d is a function that assigns a non-negative real number to every pair of objects in D. The function d: D \times D \to \mathbb{R}_{\geq 0} must satisfy three fundamental axioms to qualify as a metric: positivity, which requires d(o_1, o_2) > 0 for all distinct objects o_1, o_2 \in D and d(o, o) = 0 for any o \in D; symmetry, which states that d(o_1, o_2) = d(o_2, o_1) for all o_1, o_2 \in D; and the triangle inequality, expressed as d(o_1, o_3) \leq d(o_1, o_2) + d(o_2, o_3) for all o_1, o_2, o_3 \in D. These properties ensure that the distance function behaves intuitively, providing a foundation for measuring similarity in diverse data types without requiring coordinate embeddings. Representative examples of distance measures that satisfy these metric axioms include the Euclidean distance for vector data, defined as d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^n (x_i - y_i)^2}, commonly applied to compare histograms such as color distributions in images. The Manhattan distance, or L_1 norm, given by d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^n |x_i - y_i|, is another vector-based measure suitable for histogram comparisons where grid-like paths are relevant. For textual data, the edit distance (Levenshtein distance) quantifies the minimum number of single-character insertions, deletions, or substitutions needed to transform one string into another, serving as a metric for string similarity. In shape analysis, the Hausdorff distance between two sets A and B is d_H(A, B) = \max \left( \sup_{a \in A} \inf_{b \in B} d(a, b), \sup_{b \in B} \inf_{a \in A} d(b, a) \right), capturing the largest deviation between corresponding points and enabling distance-based comparisons of geometric objects. For compatibility with the M-tree, the distance function must be computable in finite time and strictly adhere to the metric axioms, allowing the structure to leverage relative distances without assuming any underlying coordinate system or geometry.
This black-box treatment of the distance function enables the M-tree to handle arbitrary object domains, from numerical vectors to complex multimedia features, as long as the metric properties hold. However, certain challenges arise in practice. High-dimensional data can lead to performance degradation due to the curse of dimensionality, where distances between points become increasingly uniform, reducing the selectivity of indexing and increasing I/O costs in structures like the M-tree. Additionally, distance functions that violate the metric axioms, such as those lacking the triangle inequality, are incompatible with the M-tree, as they prevent effective pruning during searches.
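Each of the measures above can be implemented in a few lines; the following Python sketch (assuming equal-length numeric vectors, and finite point sets for the Hausdorff case) illustrates them:

```python
import math

# Euclidean (L2) and Manhattan (L1) distances over equal-length vectors.
def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

# Levenshtein (edit) distance via row-by-row dynamic programming.
def levenshtein(s, t):
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution
        prev = cur
    return prev[-1]

# Hausdorff distance between two finite point sets, over any base metric.
def hausdorff(A, B, d=euclidean):
    directed = lambda X, Y: max(min(d(x, y) for y in Y) for x in X)
    return max(directed(A, B), directed(B, A))
```

All four satisfy the metric axioms, so any of them can serve as the black-box distance function of an M-tree.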

Triangle Inequality in Indexing

The triangle inequality, a fundamental axiom of metric spaces, enables efficient indexing in structures like the M-tree by facilitating distance bounds that avoid exhaustive computations during search operations. For a query object Q and an object o in a subtree associated with a routing object p, the inequality yields a lower bound d(Q, o) \geq d(Q, p) - d(p, o), allowing estimation of minimum distances to entire subtrees without evaluating distances to every contained object. This bounding mechanism is precomputed where possible, reducing the computational overhead of distance calculations, which are often the bottleneck in metric-based similarity search. Pruning strategies in metric tree indexing exploit these bounds to skip irrelevant subtrees during traversal. In a range query with threshold radius r(Q), a subtree rooted at an object O_r with covering radius r(O_r) is explored if d(Q, O_r) \leq r(Q) + r(O_r); it can be pruned if d(Q, O_r) > r(Q) + r(O_r), or equivalently, if the lower bound d(Q, O_r) - r(O_r) > r(Q) guarantees no qualifying objects within, enabling safe elimination. This approach, derived directly from the triangle inequality, can prune up to 40% of otherwise necessary distance computations in practice by leveraging stored inter-object distances. Compared to Euclidean indexes like R-trees, which depend on coordinate systems and degrade under the curse of dimensionality in high-dimensional spaces due to overlapping bounding regions, the M-tree's reliance on the triangle inequality operates in arbitrary metric spaces without coordinate representations. This avoids embedding requirements and preserves efficacy for non-Euclidean distances, such as edit distances or those on multimedia features, where traditional spatial indexes fail. Despite these advantages, the triangle inequality's bounds can be loose in degenerate cases, such as when data points align nearly collinearly in the metric space, resulting in lower bounds close to zero and reduced prunability—sometimes as low as 0.005 at some tree levels—necessitating more traversals and diminishing overall efficiency.
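The range-query pruning condition reduces to a single comparison once d(Q, O_r) is known; a minimal sketch:

```python
def can_prune(d_Q_Or, r_Or, r_Q):
    """Range-query pruning test: the subtree around routing object O_r
    (covering radius r_Or) cannot contain any object within r_Q of the
    query Q when the lower bound d(Q, O_r) - r(O_r) exceeds r(Q)."""
    return d_Q_Or - r_Or > r_Q

# The bound is safe by the triangle inequality: for any object o in the
# subtree, d(O_r, o) <= r_Or, hence
# d(Q, o) >= d(Q, O_r) - d(O_r, o) >= d_Q_Or - r_Or.

print(can_prune(10.0, 3.0, 5.0))  # lower bound 7.0 > 5.0 -> True
print(can_prune(7.0, 3.0, 5.0))   # lower bound 4.0 <= 5.0 -> False
```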

M-tree Architecture

Node Types and Entries

The M-tree organizes data in a hierarchical structure consisting of internal (non-leaf) nodes and leaf nodes. Internal nodes contain routing entries that guide the search process toward relevant subtrees, while leaf nodes store the actual data entries representing the database objects. This paged design ensures efficient disk-based storage and retrieval, with nodes fixed at a typical size of 4 KB to match the underlying disk page constraints. Entries in the M-tree are structured to leverage the metric properties without requiring coordinate embeddings. In leaf nodes, each entry consists of the feature value of the object O_j, its object identifier (oid(O_j)), and the precomputed distance to the parent routing object (d(O_j, P(O_j))). Internal node entries, known as routing entries, include the routing object O_i (serving as a representative pivot), a pointer to the root of the child subtree (ptr(T(O_i))), the covering radius of the subtree (r(O_i)), and the distance to the parent (d(O_i, P(O_i))). These components enable pruning during queries by exploiting the triangle inequality, with the covering radius playing the primary role in pruning decisions. The tree maintains balance through parameters that control node fanout: m specifies the minimum number of entries per node (except for the root), and M denotes the maximum, ensuring that nodes do not become overly sparse or full. This results in all leaf nodes residing at the same level, akin to B-trees, which guarantees logarithmic height and predictable query performance regardless of insertion order. For navigation, internal nodes use child pointers to traverse downward, while the stored distances to parents facilitate relative positioning within the metric space, supporting efficient index maintenance in secondary storage environments.
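A minimal sketch of these entry layouts in Python; the class and field names are chosen for illustration only, as the original formulation does not prescribe an in-memory representation:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class LeafEntry:
    obj: Any            # feature value of the database object O_j
    oid: int            # object identifier oid(O_j)
    parent_dist: float  # precomputed d(O_j, P(O_j))

@dataclass
class RoutingEntry:
    obj: Any            # routing object O_i, acting as a pivot
    child: "Node"       # ptr(T(O_i)), root of the covering subtree
    radius: float       # covering radius r(O_i)
    parent_dist: float  # d(O_i, P(O_i)); undefined at the root

@dataclass
class Node:
    entries: list       # LeafEntry items or RoutingEntry items
    is_leaf: bool = True
```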

Routing and Covering Radii

In the M-tree, routing objects serve as representatives for subtrees in internal nodes. Each routing object, denoted as O_r, is a tuple consisting of the object itself, a pointer to the root of its covering tree T(O_r), a covering radius r(O_r), and a parent distance d(O_r, P(O_r)). The covering radius r(O_r) is defined as the maximum distance from O_r to any object within its subtree, ensuring that all objects in T(O_r) lie within a ball of radius r(O_r) centered at O_r. The parent distance component precomputes the metric distance d(O_r, P(O_r)) between the routing object and its parent routing object P(O_r), which is undefined for entries at the root level. This precomputation avoids redundant distance calculations during traversal, enhancing efficiency in metric spaces where distance computations can be costly. These attributes enable effective bounding and pruning of subtrees without accessing child nodes. By leveraging the triangle inequality, the covering radius and parent distance provide lower and upper bounds on distances from a query object to subtree contents—for instance, if the distance from a routing object to the query exceeds the sum of the query's search radius and the subtree's covering radius, the entire subtree can be pruned. This mechanism significantly reduces the number of distance computations, with experimental results showing savings of up to 40% in typical scenarios. During node splits, the covering radii of promoted routing objects are adjusted to minimize overlap between sibling subtrees. For a split resulting in new groups N_1 and N_2, the radius for a promoted object O_{p1} in an internal node is updated as r(O_{p1}) = \max\{d(O_r, O_{p1}) + r(O_r) \mid O_r \in N_1\}, ensuring tight bounds for the redistributed entries. The parent distance is then recomputed relative to the new parent after the split.
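The split-time radius update can be sketched directly from the formula above; the (routing object, radius) tuple layout is an assumption of this example:

```python
def updated_radius(promoted, group, d):
    """Covering radius for a promoted routing object after an
    internal-node split: the max over assigned entries (o_r, r_or) of
    d(o_r, promoted) + r_or. By the triangle inequality this bounds the
    distance from `promoted` to every object in those subtrees, without
    descending into them (so the bound may be loose but is always safe)."""
    return max(d(o_r, promoted) + r_or for o_r, r_or in group)

# Toy example on the real line with d(a, b) = |a - b|:
d = lambda a, b: abs(a - b)
print(updated_radius(0.0, [(3.0, 1.0), (5.0, 2.0)], d))  # max(4.0, 7.0) = 7.0
```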

Construction Algorithms

Insertion Procedure

The insertion procedure in an M-tree begins with the new object O_n and starts at the root node, recursively traversing down the tree to locate the most appropriate leaf node for accommodation. This descent relies on distance computations between O_n and the routing objects (ROs) in each node's entries, leveraging the triangle inequality to efficiently prune and select paths without exhaustive searches. Specifically, at an internal node, the algorithm identifies a set of "fitting" child subtrees where the distance d(O_n, O_r) from O_n to the RO O_r does not exceed the current covering radius r(O_r) of that entry; if such fitting entries exist, it selects the one minimizing d(O_n, O_r). If no fitting subtrees are available, the procedure chooses the entry that minimizes the potential increase in covering radius, computed as d(O_n, O_r) - r(O_r), and subsequently updates r(O_r) to \max(r(O_r), d(O_n, O_r)) for the selected path. This selection process uses pre-stored distances within entries and applies the triangle inequality to avoid unnecessary distance calculations—for instance, eliminating entries where |d(O_n, O_p) - d(O_r, O_p)| > r(O_r), indicating O_n cannot fit within the subtree's covering radius, without computing d(O_n, O_r) directly. The descent continues until reaching a leaf node, keeping the tree balanced while minimizing radius expansions along the chosen path. Upon arriving at a leaf node, if the node has available space (fewer than the fanout limit M entries), the new entry for O_n—including its object identifier and its distance to the node's parent RO—is simply stored. However, if the leaf is full, an overflow condition triggers a split operation on that node, which may redistribute entries and propagate upward if an internal node subsequently overflows. This propagation recurses toward the root, potentially creating a new root node if the original root splits, thereby maintaining the tree's height-balanced structure. The core insertion algorithm can be outlined in pseudocode as follows:
Insert(N: node, entry(O_n): M-tree-entry) {
    if N is not a leaf then {
        let N_fit = {entries E in N | d(O_n, O_r) ≤ r(O_r)};
        if N_fit ≠ ∅ then
            let E ∈ N_fit such that d(O_n, O_r) is minimum;
        else {
            let E ∈ N such that d(O_n, O_r) - r(O_r) is minimum;
            r(O_r) = max(r(O_r), d(O_n, O_r));
        }
        Insert(*ptr(T(O_r)), entry(O_n));
    } else {
        if |N| < M then
            store entry(O_n) in N;
        else
            Split(N, entry(O_n));
    }
}
This recursive formulation ensures efficient integration of new objects while preserving the metric properties of the index.
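As a runnable companion to the pseudocode, the descent logic can be sketched in Python; the dict-based node layout and key names here are assumptions of this sketch, and split handling is elided:

```python
M = 4  # maximum leaf capacity (illustrative value)

def insert(node, o_n, d):
    """Descend to the best leaf for object o_n and store it there.
    Nodes are dicts {'leaf': bool, 'entries': [...]}; internal entries
    are dicts with keys 'obj', 'radius', 'child'. `d` is the black-box
    metric. Returns True when the target leaf overflows, i.e. when the
    full algorithm would invoke Split."""
    if not node['leaf']:
        dists = {id(e): d(o_n, e['obj']) for e in node['entries']}
        # Prefer subtrees whose covering ball already contains o_n.
        fitting = [e for e in node['entries'] if dists[id(e)] <= e['radius']]
        if fitting:
            best = min(fitting, key=lambda e: dists[id(e)])
        else:
            # Otherwise minimize the required radius enlargement and
            # expand the chosen entry's covering radius.
            best = min(node['entries'],
                       key=lambda e: dists[id(e)] - e['radius'])
            best['radius'] = max(best['radius'], dists[id(best)])
        return insert(best['child'], o_n, d)
    node['entries'].append(o_n)
    return len(node['entries']) > M
```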

Node Splitting Mechanism

In the M-tree, node splitting is triggered when the number of entries in a leaf or internal node exceeds the maximum fanout parameter M during an insertion operation. This overflow condition ensures the tree remains balanced and bounded in height, preventing excessive growth in a single node. Upon overflow, the splitting procedure selects two objects from the overflowing node as new pivots to partition its entries into two groups, thereby creating two sibling nodes. Pivot selection, often referred to as the "promote" step, employs heuristics to choose the pair of routing objects that optimizes the structure of the resulting subtrees. Common policies include the minimized radii sum (mRAD) approach, which selects pivots to minimize the sum of the covering radii of the new nodes, and the minimized maximum radius (mMRAD) policy, which aims to reduce the largest covering radius among the two. Other variants, such as M_LB_DIST, which retains the previous routing object as one pivot and promotes the entry farthest from it using only precomputed distances, or random selection (RANDOM), provide trade-offs in efficiency and balance. Once pivots are chosen, the remaining entries are reassigned to one of the two new nodes based on their proximity to the respective pivots, using a generalized hyperplane partitioning method that assigns each entry to the nearest pivot; a balanced variant alternates assignments to ensure more even distribution. Following partitioning, the covering radii for the new routing objects (pivots) are recalculated as the maximum distance from each pivot to the objects in its assigned subtree, ensuring tight bounds that facilitate efficient pruning during queries. This adjustment minimizes overlap between sibling subtrees while adhering to the triangle inequality property of the underlying metric space.
The split then propagates upward: the parent node is updated to include entries for both new routing objects, and if the parent overflows, the splitting recurses until a non-overflowing level is reached or the root is split, in which case a new root node is created with the two former root pivots as its children. Heuristics in partitioning prioritize minimizing the maximum radius or overlap between the new nodes to maintain low query costs and structural compactness.
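A brute-force sketch of mRAD-style promotion with generalized hyperplane partitioning for a leaf split follows; the exhaustive pair enumeration is for illustration only, and a production implementation would use cheaper heuristics:

```python
from itertools import combinations

def split(entries, d):
    """Split an overflowing leaf's objects into two groups. Promotion
    tries every pivot pair and keeps the one minimizing the sum of the
    resulting covering radii (mRAD-style); partitioning assigns each
    object to its nearest pivot (generalized hyperplane)."""
    def partition(p1, p2):
        g1, g2 = [], []
        for o in entries:
            (g1 if d(o, p1) <= d(o, p2) else g2).append(o)
        return g1, g2

    def radius(p, group):
        return max((d(p, o) for o in group), default=0.0)

    best = None
    for p1, p2 in combinations(entries, 2):
        g1, g2 = partition(p1, p2)
        cost = radius(p1, g1) + radius(p2, g2)
        if best is None or cost < best[0]:
            best = (cost, p1, p2, g1, g2)
    _, p1, p2, g1, g2 = best
    return (p1, g1, radius(p1, g1)), (p2, g2, radius(p2, g2))
```

On two well-separated clusters, the minimized radii sum drives the pivots into opposite clusters, yielding two tight, low-overlap covering balls.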

Query Processing

Range Query Algorithm

The range query algorithm in the M-tree retrieves all data objects within a specified query radius r from a query object Q, leveraging the tree's structure to minimize distance computations through pruning. The process begins at the root node and proceeds recursively down the tree, examining entries in internal and leaf nodes to identify qualifying objects. During traversal of an internal node, the algorithm evaluates each routing object O_r associated with a child pointer. The distance d(Q, O_r) is computed only if a preliminary pruning test using the triangle inequality and pre-stored parent distances allows it; specifically, the test checks if |d(O_p, Q) - d(O_r, O_p)| \leq r + r(O_r), where O_p is the node's parent object and r(O_r) is the covering radius of O_r. If this holds, d(Q, O_r) is calculated, and lower and upper bounds on possible distances from Q to objects in the subtree are derived as lb = d(Q, O_r) - r(O_r) and ub = d(Q, O_r) + r(O_r). Subtrees are pruned if lb > r, indicating no overlap between the query ball and the subtree's covering ball. Otherwise, the child node is recursively traversed. At leaf nodes, the algorithm processes each data object entry similarly: a pruning test using the triangle inequality (|d(O_p, Q) - d(O_j, O_p)| \leq r, where O_j is the data object) determines whether to compute the exact distance d(Q, O_j). If d(Q, O_j) \leq r, the object is collected as part of the result set. This step ensures only relevant distances are evaluated, reducing computational overhead. The output of the algorithm is a list of all qualifying data objects, typically including their object identifiers and the computed distances d(Q, O_j) for verification or further processing. The k-nearest neighbors (k-NN) search in an M-tree retrieves the k objects closest to a given query object Q from the indexed dataset, assuming at least k objects are present.
The algorithm employs a branch-and-bound strategy to efficiently prune the search space, leveraging the properties of the metric space. It maintains a priority queue of pending requests (PR) to manage unexplored subtrees and a result array (NN) of size k to store the current best k nearest neighbors, ordered by their distances to Q. A dynamic threshold, denoted as d_k, represents the distance to the k-th nearest neighbor in NN and is updated as closer objects are discovered, enabling progressive pruning. The traversal process adapts the principles of range query pruning but uses the evolving d_k as the search radius instead of a fixed query radius. Starting from the root node, the algorithm recursively visits nodes, computing the lower bound d_{\min}(T(o_r)) = \max\{d(o_r, Q) - r(o_r), 0\} for each routing entry o_r in a node, where r(o_r) is the covering radius of the subtree rooted at o_r. Subtrees are pruned if d_{\min}(T(o_r)) > d_k, as they cannot contain objects closer than the current k-th best. For non-pruned subtrees, the algorithm descends into leaf nodes to evaluate exact distances d(o, Q) for objects o, inserting qualifying objects into NN and resorting it to update d_k. This dynamic adjustment ensures that the search focuses on promising regions while discarding irrelevant ones as better candidates emerge. The PR operates as a min-heap containing entries for active subtrees, prioritized by their lower bound distances d_{\min}(T(o_r)) to facilitate a best-first exploration strategy. Each queue entry includes a pointer to the subtree, its d_{\min}, and an upper bound d_{\max}(T(o_r)) = d(o_r, Q) + r(o_r) for additional verification during processing. The algorithm selects the entry with the smallest d_{\min} using a ChooseNode function, processes the corresponding node via a k-NN node-search subroutine—which may add child subtrees to PR or update NN—and continues until no viable candidates remain. This prioritization ensures that subtrees likely to yield the nearest neighbors are examined early, tightening d_k and accelerating convergence.
The search terminates when PR is empty, indicating that all potentially relevant subtrees have been explored and no closer objects can exist beyond the current d_k. At this point, NN holds the k nearest neighbors, sorted by distance to Q. The algorithm guarantees exact results by exhaustively checking all non-pruned branches, with the branch-and-bound mechanism minimizing unnecessary distance computations.
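The best-first k-NN traversal described above can be sketched with a binary heap; the dict/tuple node layout here is an assumption of this example, not the original implementation's interface:

```python
import heapq

def knn_search(root, Q, k, d):
    """Best-first k-NN sketch over nodes shaped as
    {'is_leaf': bool, 'entries': [...]}: leaf entries are raw objects,
    internal entries are (routing_obj, radius, child) triples. PR is a
    min-heap on the lower bound d_min; d_k shrinks as better neighbors
    are found, progressively tightening the pruning threshold."""
    pr = [(0.0, 0, root)]        # (d_min, tiebreak, node)
    nn = []                      # current best: (distance, object) pairs
    counter = 1
    d_k = float("inf")
    while pr:
        d_min, _, node = heapq.heappop(pr)
        if d_min > d_k:          # no remaining subtree can improve NN
            break
        if node["is_leaf"]:
            for o in node["entries"]:
                dist = d(Q, o)
                if len(nn) < k or dist < d_k:
                    nn.append((dist, o))
                    nn.sort(key=lambda t: t[0])
                    nn = nn[:k]
                    if len(nn) == k:
                        d_k = nn[-1][0]
        else:
            for o_r, r_or, child in node["entries"]:
                lb = max(d(Q, o_r) - r_or, 0.0)
                if lb <= d_k:    # keep only potentially useful subtrees
                    heapq.heappush(pr, (lb, counter, child))
                    counter += 1
    return nn
```

The `break` on `d_min > d_k` is the termination shortcut: because the heap pops subtrees in lower-bound order, no later entry can beat the current k-th best.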

Advanced Topics

Performance Analysis

The M-tree exhibits average-case time complexity of O(log n) for insertions and node splits, where n is the number of objects, due to its balanced structure and logarithmic tree height. Range queries and k-nearest neighbor (k-NN) searches achieve O(log n + k) complexity in the best case, assuming minimal node overlaps, though performance degrades with increasing overlaps as more subtrees must be traversed. Space complexity remains efficient, with fixed node sizes (e.g., 4 KB) accommodating up to hundreds of entries depending on object and routing-entry sizes, leading to overall storage scaling linearly with n. Several factors influence M-tree performance, including the parameter M (node capacity), which balances I/O efficiency against split frequency; larger M reduces tree height but increases per-node processing costs. Dataset size impacts traversal depth logarithmically, while data distributions and dimensionality affect overlap degrees—higher dimensions often reduce effective node capacity and elevate I/O costs. Overlaps, exacerbated by unbalanced splits or clustered data, significantly raise query traversal costs by necessitating broader subtree explorations. Empirical evaluations in the original implementation demonstrate the M-tree's superiority over sequential scans, processing range queries on datasets of 100,000+ color histogram objects with up to 99% fewer distance computations. For k-NN queries, I/O and CPU costs grow logarithmically with dataset size, from approximately 10-20 node accesses for 10^4 objects to 30-50 for 10^5 objects under random split policies. Trade-offs between CPU and I/O are evident: optimized split policies like mMRAD reduce I/O by 20-25% in high-dimensional spaces at the expense of minor CPU increases during insertions.
Limitations include substantial split overhead from O(M^2) CPU costs in dynamic insertions, which can accumulate in frequently updated datasets, and sensitivity to pivot selection, as poorly chosen routing objects lead to excessive overlaps and degraded pruning effectiveness. High-dimensional data further constrain pruning, amplifying I/O in disk-based scenarios without adaptive node sizing.

Variants and Extensions

The Slim-tree, introduced by Traina et al. in 2000, extends the M-tree by incorporating the Slim-Down technique, which redistributes entries after node splits to minimize overlaps between covering regions, and employs a minimal-spanning-tree-based split algorithm for more balanced partitioning during splits. This approach reduces the "fat-factor," a measure of node overlap, leading to improved pruning efficiency in metric spaces. Studies show that the Slim-tree can outperform the original M-tree by up to 200% in range query performance, particularly in high-dimensional datasets. The PM-tree, or Pivoting M-tree, proposed by Skopal et al. in 2004, enhances the M-tree by integrating pivot-based filtering at multiple levels to accelerate similarity searches in large databases. It combines the hierarchical structure of the M-tree with global and local pivots to exclude irrelevant subtrees more effectively, supporting dynamic insertions and deletions while maintaining metric properties. This variant is particularly suited to concurrent multi-user environments through its efficient pruning mechanisms. Approximate M-trees facilitate faster queries by allowing controlled error bounds, often through probabilistic nearest neighbor searches that specify an accuracy parameter ε to trade precision for speed. These variants, building on the core M-tree, enable ε-approximate k-NN retrieval where results are guaranteed to include all objects within (1+ε) times the true distance to the k-th nearest neighbor, reducing traversal costs in expansive search spaces. Extensions of the M-tree include hybrids with inverted files, such as the MHIM-tree, which integrates Manhattan hashing, inverted indices for textual features, and M-tree routing for metric similarity in retrieval tasks. These structures support combined text-image queries by leveraging inverted lists for keyword filtering and M-tree branches for spatial or feature-based ranking.
Additionally, the dynamic insertion capabilities of M-trees make them suitable for applications involving evolving datasets, such as data streams, without requiring full rebuilds.
