
Isolation forest

The Isolation Forest is an algorithm that identifies outliers in data by explicitly isolating them through random partitioning, rather than by profiling normal instances. Developed by Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou in 2008, it constructs an ensemble of isolation trees (iTrees), where each tree recursively splits a random subsample of the data using randomly selected features and split values until instances are isolated. Anomalies are detected based on their shorter average path lengths in these trees, as they require fewer splits to isolate due to their distinctiveness from the majority of normal points. This approach leverages the principle that anomalies are "few and different," enabling efficient detection without assuming a data distribution or employing distance measures. The algorithm achieves linear time complexity for training (O(t ψ log ψ), where t is the number of trees and ψ is the subsample size) and a per-instance evaluation cost that is independent of the dataset size, making it scalable to large, high-dimensional datasets where irrelevant features are common. By default, it uses 100 trees and subsamples of 256 instances, mitigating issues like the "swamping" effect (where normal points close to anomalies are wrongly flagged) and the "masking" effect (where clustered anomalies conceal one another) through subsampling and ensembling. Anomaly scores are computed as s(x, n) = 2^{-E(h(x))/c(n)}, where E(h(x)) is the average path length for an instance x across the trees, and c(n) normalizes it against the expected path length for n instances; scores at or below 0.5 indicate normal points, while those approaching 1 (commonly above about 0.75) signal anomalies. Isolation Forest has demonstrated superior performance over methods like the Local Outlier Factor (LOF) and One-Class Support Vector Machines (OC-SVM) in terms of area under the ROC curve (AUC) and execution speed on benchmark datasets, particularly those with evolving anomalies or irrelevant attributes. It is particularly useful in applications such as fraud detection, network intrusion monitoring, and system fault diagnosis, where training data may lack labeled anomalies. Extensions such as SCiForest and the Extended Isolation Forest have further improved its handling of clustered anomalies and the consistency of its scores by incorporating hyperplane-based splits.
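This behavior is available in off-the-shelf libraries. The following minimal sketch (an illustration, not code from the original paper) assumes scikit-learn is installed and fits the default configuration of 100 trees and 256-instance subsamples on synthetic two-dimensional data before scoring a few test points:

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
X_train = rng.normal(size=(1000, 2))                      # mostly "normal" points
X_test = np.vstack([rng.normal(size=(5, 2)),              # likely inliers
                    [[6.0, 6.0], [-7.0, 5.0]]])           # obvious outliers

clf = IsolationForest(n_estimators=100, max_samples=256, random_state=42)
clf.fit(X_train)

# score_samples returns the negated anomaly score s(x, n): values near -1
# indicate anomalies, values around -0.5 or higher indicate normal points.
print(clf.score_samples(X_test))
print(clf.predict(X_test))  # +1 for inliers, -1 for outliers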

Introduction and History

History

The Isolation Forest algorithm was introduced in 2008 by Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou as an unsupervised method for anomaly detection, leveraging random partitioning to isolate outliers more efficiently than traditional density-based or distance-based approaches. The core idea, detailed in their seminal paper presented at the Eighth IEEE International Conference on Data Mining, emphasized the algorithm's linear time complexity and scalability for high-dimensional data without requiring distance computations. In 2010, Liu, Ting, and Zhou proposed SCiForest as an evolution of Isolation Forest specifically tailored for detecting clustered anomalies in high-dimensional spaces, replacing axis-parallel splits with selected hyperplanes to enhance the isolation mechanism while preserving computational efficiency. This variant addressed limitations in handling local anomalies by choosing hyperplanes that best separate the data, improving detection accuracy in scenarios with dense normal clusters. A key extension for dynamic environments came in 2013 with the development of iForestASD by Z. Ding and M. Fei, adapting Isolation Forest for streaming data through a sliding-window framework that accommodates concept drift by periodically rebuilding trees to reflect evolving data distributions. This approach maintained the algorithm's isolation principle while enabling real-time processing and adaptation to non-stationary streams, such as sensor data or network traffic. Post-2015, Isolation Forest saw widespread industry adoption due to its robustness and ease of implementation, with notable integrations including its addition to the scikit-learn library in version 0.18 (released in 2016), facilitating broader use in machine learning pipelines for fraud detection, cybersecurity, and predictive maintenance. By 2018, enhancements in version 0.20 further stabilized its behavior for production environments. As of 2025, recent advancements have focused on hybrid models that combine Isolation Forest with deep learning techniques for enhanced real-time anomaly detection, such as integrating it with long short-term memory (LSTM) networks for sequence modeling in intrusion detection via network traffic analysis. These hybrids leverage Isolation Forest's efficiency for initial isolation alongside neural architectures for capturing temporal dependencies, achieving superior accuracy in dynamic settings without excessive computational overhead.

Overview

Isolation Forest is an unsupervised algorithm designed to identify outliers in datasets by exploiting the principle that anomalies are rare and distinct from the majority of data points. Rather than profiling normal instances, as in many traditional methods, it explicitly isolates anomalies through random partitioning of the data space. This approach was first introduced in a 2008 paper by Liu, Ting, and Zhou. The core intuition of Isolation Forest relies on the observation that anomalies, being "few and different," require fewer partitions to be separated from the rest of the data compared to normal points, which tend to cluster and thus take longer to isolate. By constructing an ensemble of isolation trees—each a binary tree built via random splits on features and values—the algorithm aggregates path lengths from the root to leaf nodes across trees to determine anomaly likelihood, providing robustness through this collective decision-making process. As an unsupervised method, Isolation Forest requires no labeled examples of anomalies during training, distinguishing it from supervised techniques like Support Vector Machines (SVM), which depend on labeled data for classification. Through subsampling it achieves linear time complexity, O(t \psi \log \psi) for training, where t is the number of trees and \psi is the subsample size, together with constant memory usage, enabling efficient processing of large-scale datasets. Isolation Forest excels in high-dimensional spaces and in datasets containing irrelevant attributes, as it makes no assumptions about data distributions—unlike density-based methods such as the Local Outlier Factor (LOF)—allowing it to handle complex, real-world data without preprocessing for normality. This scalability and distribution-agnostic nature contribute to its superior performance in terms of both accuracy and speed over baselines like one-class SVM and random forests in empirical evaluations.

Algorithm Fundamentals

Isolation Tree Construction

The isolation tree, a core component of the Isolation Forest, is constructed as a binary tree through recursive random partitioning of a subsample. This method leverages randomness to isolate data points efficiently, assuming that anomalies are more susceptible to isolation due to their distinct characteristics. The construction begins with a randomly selected subsample of size \psi from the original dataset, typically set to 256 or 512 points to balance computational efficiency and detection accuracy. At each internal node, the partitioning step involves randomly selecting a feature q from the set of available attributes in the current subsample X. A split value p is then chosen uniformly at random from the observed range of q in X, specifically p \in [\min_q, \max_q], where \min_q and \max_q are the minimum and maximum values of q in the subsample. This randomization in both feature and split value ensures that the tree structure avoids overfitting to specific patterns in the data, promoting diversity across the ensemble. The subsample X is subsequently partitioned into two child subsets: the left subset X_l containing points where q < p, and the right subset X_r containing points where q \geq p. The partitioning process is recursive, with each node splitting its subsample until one of the stopping conditions is met: the subsample size |X| reduces to 1 (achieving isolation), all points in X share identical attribute values (indicating no further meaningful split), or the tree height reaches the predefined limit l = \lceil \log_2 \psi \rceil. This height limit prevents excessive depth and maintains computational tractability, as deeper trees would require more resources without proportional benefits in isolation. External nodes, which are leaves, store the size of the subsample that reached them, aiding in subsequent path length computations, though the focus here remains on the growth mechanism. The algorithm for constructing an isolation tree can be formalized as follows in pseudocode:
iTree(X, e, l)
Input: subsample X, current height e, height limit l
Output: isolation tree structure

if |X| ≤ 1 or e ≥ l then
    return external node with size |X|
else
    randomly select feature q ∈ Q (attributes in X)
    randomly select split value p ∈ [min_q, max_q]
    partition X into X_l (q < p) and X_r (q ≥ p)
    left child = iTree(X_l, e+1, l)
    right child = iTree(X_r, e+1, l)
    return internal node (q, p, left child, right child)
This recursive procedure ensures that each isolation tree is built independently and efficiently, with an average construction time complexity of O(\psi \log \psi) due to the balanced expected height from random splits.
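For concreteness, the pseudocode above can be rendered as a short Python sketch; the names ExternalNode, InternalNode, and itree below are illustrative and not taken from any published implementation:

import math
import random

class ExternalNode:
    def __init__(self, size):
        self.size = size                     # number of points that reached this leaf

class InternalNode:
    def __init__(self, q, p, left, right):
        self.q, self.p = q, p                # split feature index and split value
        self.left, self.right = left, right

def itree(X, e, l):
    # X: list of feature vectors (the subsample), e: current height, l: height limit
    if len(X) <= 1 or e >= l:
        return ExternalNode(len(X))
    q = random.randrange(len(X[0]))          # randomly selected feature
    lo, hi = min(x[q] for x in X), max(x[q] for x in X)
    if lo == hi:                             # all values identical: no meaningful split
        return ExternalNode(len(X))
    p = random.uniform(lo, hi)               # randomly selected split value
    X_l = [x for x in X if x[q] < p]
    X_r = [x for x in X if x[q] >= p]
    return InternalNode(q, p, itree(X_l, e + 1, l), itree(X_r, e + 1, l))

# Height limit l = ceil(log2(psi)) for a subsample of size psi, as described above.
psi = 256
height_limit = math.ceil(math.log2(psi))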

Isolation Forest Ensemble

The Isolation Forest algorithm constructs an ensemble of t isolation trees, denoted iTrees, to enhance the robustness of anomaly isolation through collective decision-making. Each iTree in the ensemble is built independently using a random subsample of size \psi drawn without replacement from the original dataset, with a default subsample size of \psi = 256 to balance computational efficiency and detection performance. This random subsampling introduces diversity among the trees by exposing each iTree to a unique subset of the data, thereby mitigating the swamping effect (where normal points close to anomalies are wrongly flagged as anomalous) and the masking effect (where clustered anomalies conceal one another from isolation). The training of the Isolation Forest ensemble involves generating these t subsamples and constructing an iTree on each via recursive random partitioning of the feature space, typically until individual instances are isolated or a predefined tree height is reached. Since the construction of each iTree is independent, the ensemble can be built in parallel across multiple processors, achieving a time complexity of O(t \psi \log \psi) and enabling efficient scaling to large datasets. In the ensemble, each iTree isolates data points independently based on its random partitions, and the results are aggregated to form a unified model that reduces the variance inherent in any single tree's isolation behavior. The hyperparameter t, representing the number of trees, plays a critical role in the ensemble's stability; a default value of t = 100 is commonly used, as performance metrics stabilize well before this point, and increasing t progressively reduces variance in the isolation outcomes across trees. This variance reduction arises from the averaging effect of the ensemble, where diverse subsamples ensure that the collective isolation paths capture a more representative view of the data's structure without overfitting to noise in individual subsamples.
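The ensemble hyperparameters described above map directly onto scikit-learn's estimator; the sketch below (illustrative, not a prescribed configuration) shows t trees, \psi-point subsamples drawn without replacement, and parallel tree construction:

from sklearn.ensemble import IsolationForest

forest = IsolationForest(
    n_estimators=100,   # t: number of isolation trees in the ensemble
    max_samples=256,    # psi: subsample size used to build each tree
    bootstrap=False,    # draw each subsample without replacement (the default)
    n_jobs=-1,          # build trees in parallel on all available cores
    random_state=0,
)
# forest.fit(X)  # X: array of shape (n_samples, n_features)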

Anomaly Detection Process

The anomaly detection process in Isolation Forest evaluates test instances against a pre-trained ensemble of isolation trees to identify outliers based on their susceptibility to isolation. For a new data point x, inference begins by passing it through each of the t trees in the forest, typically with t = 100 for stable results. In each tree, x is traversed from the root node to an external node, recording the path length h(x) as the number of edges encountered during this descent, plus an adjustment c(s) if the external node holds s > 1 instances, where c(s) is the expected path length of an unbuilt subtree over s instances. The average path length across all trees, denoted E(h(x)), then serves as the primary isolation measure for x. Anomalies are determined through thresholding: points with shorter average path lengths, indicating easier isolation, are classified as anomalies, as they deviate from the denser clusters of normal data that require longer paths to separate. This approach leverages the fact that outliers are fewer and more distinct, allowing them to be isolated with fewer splits. Isolation Forest primarily operates in batch detection mode, where a collection of test points is processed collectively against the fixed ensemble for efficiency, with a per-point evaluation cost that does not grow with the dataset size. Extensions, such as hybrid models, enable online detection for streaming data by incrementally updating the forest structure. Post-processing typically involves ranking instances by their average path lengths in ascending order, facilitating the identification and prioritization of the most easily isolated points as top anomalies. This ranking supports downstream tasks like visualization or alerting without altering the core detection logic.
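A minimal ranking step might look like the following sketch, assuming a fitted scikit-learn model named forest and a test matrix X_test (both placeholder names):

import numpy as np

scores = -forest.score_samples(X_test)   # higher value = shorter average path = more anomalous
ranking = np.argsort(scores)[::-1]       # indices of X_test, most anomalous first
top_anomalies = ranking[:10]             # e.g., the ten most easily isolated points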

Anomaly Scoring and Properties

Anomaly Score Calculation

In the Isolation Forest algorithm, the path length h(x) for an instance x in a single isolation tree is defined as the number of edges traversed from the root node to the external (leaf) node where the isolation of x terminates. This measure captures how quickly an instance can be isolated by random splits, with shorter paths indicating instances that are easier to separate from the majority of the data. To normalize path lengths across trees and datasets, the average path length c(\psi) is used, where \psi is the subsample size employed in tree construction. The formula is given by c(\psi) = \begin{cases} 2H(\psi - 1) - \frac{2(\psi - 1)}{\psi} & \text{for } \psi > 2, \\ 1 & \text{for } \psi = 2, \\ 0 & \text{otherwise}, \end{cases} with H(i) denoting the i-th harmonic number, approximated as H(i) \approx \ln(i) + \gamma (where \gamma \approx 0.5772156649 is the Euler-Mascheroni constant). This normalization, derived from the expected path length of an unsuccessful search in a binary search tree, adjusts for varying subsample sizes to ensure comparable measures regardless of scale. The anomaly score s(x, n) aggregates path lengths across the ensemble of isolation trees through the expected path length E(h(x)), the average of h(x) over all trees. It is computed as s(x, n) = 2^{-\frac{E(h(x))}{c(n)}}, where n represents the subsample size (consistent with \psi in tree building). This exponential form transforms the normalized path length into a score between 0 and 1, providing a probabilistic interpretation of anomaly likelihood. The score interpretation hinges on its value relative to 0.5: scores approaching 1 indicate anomalies, as E(h(x)) is much shorter than c(n), signifying easy isolation; scores near 0.5 suggest instances neither clearly anomalous nor normal, occurring when E(h(x)) \approx c(n); scores below 0.5 (approaching 0 for very long paths) identify inliers, as these instances require many splits to isolate and blend with the data majority. The use of c(n) inherently normalizes scores for different subsample sizes, making the method robust to variations in data volume without additional rescaling.
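These formulas translate directly into code; the following sketch (with illustrative variable names) computes c(\psi) and the anomaly score from an averaged path length:

import math

EULER_GAMMA = 0.5772156649

def harmonic(i):
    # H(i) ≈ ln(i) + gamma
    return math.log(i) + EULER_GAMMA

def c(psi):
    # Expected path length of an unsuccessful binary-search-tree search over psi points
    if psi > 2:
        return 2.0 * harmonic(psi - 1) - 2.0 * (psi - 1) / psi
    if psi == 2:
        return 1.0
    return 0.0

def anomaly_score(avg_path_length, psi):
    # s(x, n) = 2^(-E(h(x)) / c(n)), with n the subsample size
    return 2.0 ** (-avg_path_length / c(psi))

# With psi = 256, a short average path of 4 yields a score of roughly 0.76,
# while a path equal to c(256) yields exactly 0.5.
print(anomaly_score(4.0, 256), anomaly_score(c(256), 256))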

Key Properties

Isolation Forest exhibits several inherent characteristics that distinguish it from traditional anomaly detection methods, primarily due to its isolation-based approach rather than density or distance estimation. A key property is its isolation efficiency, where anomalies are isolated more rapidly than normal instances because of their sparsity and distinct attribute values in the feature space. This results in shorter path lengths for anomalies in isolation trees, as they require fewer random partitions to be separated from the majority of data points. The algorithm demonstrates strong scalability, with a training time complexity of O(t \psi \log \psi), where t is the number of trees and \psi is the subsample size, making it independent of the number of anomalies in the dataset. This efficiency arises from subsampling the data for each tree's construction, ensuring linear scalability with respect to the total data size n when \psi is fixed at a small constant such as 256. Additionally, memory usage remains low, typically requiring only O(t \psi) space. Robustness to irrelevant features is another notable property, achieved through random attribute selection during tree construction, which mitigates the impact of high-dimensional or noisy data without requiring explicit feature selection. This random partitioning helps avoid overfitting to irrelevant dimensions and maintains performance in sparse or high-dimensional spaces. Despite these strengths, Isolation Forest has limitations, including sensitivity to the contamination parameter, which estimates the proportion of anomalies and directly influences the score threshold; misestimation can lead to suboptimal detection rates. It is also less effective for detecting clustered anomalies, where outliers may be masked by similar nearby points or swamp the isolation process in dense regions. Theoretically, these properties are underpinned by mass-volume analysis, which demonstrates that outliers are isolated faster than normal points because anomalies occupy sparser regions of the data space and thus require fewer splits to isolate. This analysis provides a probabilistic foundation for why path lengths serve as a reliable indicator of abnormality, with shorter average paths corresponding to higher isolation likelihood for outliers.

Parameter Tuning and Implementation

Parameter Selection

The selection of hyperparameters in the Isolation Forest algorithm is essential for achieving effective anomaly detection while managing computational resources and adapting to dataset characteristics such as dimensionality and anomaly prevalence. The primary parameters include the contamination rate, number of trees, subsample size, and maximum features per split, each influencing the model's isolation mechanism and output scores. The contamination parameter represents the expected fraction of anomalies in the data, with a common default value of 0.1 in practical implementations; it determines the threshold applied to anomaly scores, where values closer to 0.5 allow for higher anomaly proportions but risk over-detection. In the scikit-learn implementation, the default is 'auto', which sets the decision threshold according to the expected-path-length score of the original formulation, though users can override it with a float in (0, 0.5] based on domain knowledge to adjust the decision boundary. The number of trees, denoted t, defaults to 100 and controls the ensemble size, offering a trade-off between detection accuracy and training speed; typical values range from 100 to 500, as higher counts enhance stability but increase runtime linearly. The original authors note that average path lengths converge effectively at t \approx 100, making this a robust starting point for most applications without unnecessary overhead. The subsample size, often symbolized as \psi, specifies the number of instances drawn randomly for constructing each isolation tree and defaults to 256 (more precisely, the minimum of 256 and the dataset size); values of 256 or 512 are standard, as they balance bias-variance trade-offs by promoting diverse isolations while keeping memory usage low. Empirical evaluations in the seminal work recommend restricting \psi to 128–512, beyond which processing time rises disproportionately without accuracy improvements. The max_features parameter defaults to 1.0, indicating the use of all features for splits in each tree, but can be set to a fraction (e.g., 0.8) to subsample features randomly, which adds diversity in high-dimensional data at the cost of extended computation. This setting, when reduced below 1.0, helps mitigate the influence of irrelevant features in feature-rich environments but requires careful calibration to avoid diluting isolation effectiveness. Tuning these parameters typically involves grid search or random search over predefined ranges, often using cross-validation on a validation set if partial labels are available to optimize supervised metrics like AUC-ROC. In purely unsupervised contexts, intrinsic metrics such as the silhouette score can evaluate the separation between normal points and detected anomalies, guiding selection via tools like scikit-learn's GridSearchCV with custom scorers. Such methods refine the anomaly scores by aligning isolation paths more closely with data structure, though they demand computational investment proportional to the search space.
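When a small labeled validation set is available, a simple search over these parameters can be scored by AUC-ROC; the sketch below is one such approach, with X_train, X_val, and y_val (1 = anomaly) as placeholder names:

from sklearn.ensemble import IsolationForest
from sklearn.model_selection import ParameterGrid
from sklearn.metrics import roc_auc_score

param_grid = {
    "n_estimators": [100, 200, 500],
    "max_samples": [128, 256, 512],
    "max_features": [0.8, 1.0],
}

best_auc, best_params = -1.0, None
for params in ParameterGrid(param_grid):
    model = IsolationForest(random_state=0, **params).fit(X_train)
    auc = roc_auc_score(y_val, -model.score_samples(X_val))  # negate: higher = more anomalous
    if auc > best_auc:
        best_auc, best_params = auc, params

print(best_params, best_auc)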

Open-Source Implementations

One of the most widely used open-source implementations of the Isolation Forest algorithm is available in the scikit-learn library, a popular machine learning toolkit for Python. The IsolationForest class was introduced in version 0.18 of scikit-learn and provides methods such as fit for training the model on data and predict for scoring new observations as anomalies or inliers. It supports key parameters like contamination, which estimates the proportion of outliers in the dataset to adjust decision thresholds. PyOD is another Python library, specialized for outlier and anomaly detection, offering an enhanced variant of Isolation Forest through its IForest class. This implementation wraps scikit-learn's IsolationForest but adds extended functionalities, such as integration with other detection algorithms and built-in benchmarking tools to evaluate performance across datasets. PyOD's IForest is designed for seamless use in machine learning pipelines, supporting parameters like the number of trees while providing utilities for model comparison and visualization of results. For large-scale data processing, a distributed implementation of Isolation Forest is provided by the isolation-forest library, which operates on Apache Spark for scalable anomaly detection. This Scala-based tool enables training on clusters, handling big data volumes through parallel processing of isolation trees, and supports features like ONNX export for model portability across environments. In the R programming language, the isotree package offers a fast, multi-threaded implementation of Isolation Forest, suitable for outlier detection in tabular data. It includes core functions for model fitting and prediction, along with utilities for generating diagnostic plots to visualize isolation paths and anomaly scores. The table below summarizes these implementations, and a minimal PyOD usage sketch follows it.
Library | Language | Key Features | Scalability
scikit-learn | Python | fit/predict methods; contamination parameter; ensemble integration | Single-machine; handles moderate datasets
PyOD | Python | Enhanced wrapper; benchmarking tools; pipeline support | Single-machine; optimized for anomaly tasks
isolation-forest (Spark) | Scala | Distributed training; ONNX export; parallel tree building | Cluster-based; big data via Apache Spark
isotree | R | Multi-threaded; plotting utilities; extended variants | Single-machine; multi-core parallelization
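As one example of these interfaces, the sketch below uses PyOD's IForest wrapper (assuming PyOD is installed); the synthetic data and parameter choices are illustrative:

import numpy as np
from pyod.models.iforest import IForest

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(size=(500, 3)),            # bulk of normal points
               rng.normal(loc=6.0, size=(5, 3))])    # a few shifted outliers

detector = IForest(n_estimators=100, contamination=0.01, random_state=0)
detector.fit(X)

print(detector.labels_[:10])               # 0 = inlier, 1 = outlier on the training data
print(detector.decision_scores_[:10])      # raw outlier scores (higher = more anomalous)
print(detector.decision_function(X[-5:]))  # scores for new observations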

Variants

SCiForest

SCiForest addresses the limitations of the original Isolation Forest in detecting clustered anomalies within high-dimensional data, where the curse of dimensionality causes points to become sparse and equidistant, leading to a masking effect that hides locally clustered anomalies confined to certain subspaces. This variant is particularly motivated by challenges in sparse, high-dimensional datasets, such as text data, where traditional methods struggle to isolate anomalous clusters without assuming global sparsity. In SCiForest, subspace selection occurs by generating random hyperplanes defined over a small number of attributes q (with q \ll d, typically q = 2) to focus on informative subspaces that better separate anomalies from normal points. These hyperplanes are non-axis-parallel, and the best one is chosen from multiple candidates (\tau = 10) using a standard-deviation-based gain criterion (Sd gain), which measures the separation between the data distributions on either side of the hyperplane. The algorithm builds an ensemble of isolation trees (t = 100) on subsamples (\psi = 256), where each tree is constructed by recursively applying the selected hyperplane splits until subsets are small (|X'| \leq 2). For anomaly scoring, SCiForest computes the path length h(x) for a test instance x through each tree, averaging these across the ensemble to obtain the expected path length E(h(x)), with shorter paths indicating anomalies. The score is normalized using the average path length c(\psi) for subsamples of size \psi, and thresholds are set based on these subspace-adjusted averages to classify points, penalizing those with out-of-range paths. Key hyperparameters include the subspace size q (e.g., 2, tunable based on dimensionality), the number of hyperplane trials \tau, the number of trees t, and the subsample size \psi. This approach offers advantages in sparse high-dimensional data by leveraging hyperplanes and Sd gain to detect locally clustered anomalies more effectively than the original Isolation Forest, with reported empirical superiority on datasets like KDDCUP 1999 and synthetic high-dimensional benchmarks, while maintaining linear time complexity.
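A heavily simplified sketch of this split-selection idea is given below; it omits details of the published algorithm (such as attribute standardization and per-branch acceptable ranges), uses an approximate standard-deviation gain, and relies on illustrative names throughout:

import numpy as np

def sd_gain(proj, split):
    # Relative reduction in standard deviation obtained by splitting `proj` at `split`
    left, right = proj[proj < split], proj[proj >= split]
    if len(left) == 0 or len(right) == 0:
        return -np.inf
    return (proj.std() - (left.std() + right.std()) / 2.0) / (proj.std() + 1e-12)

def best_hyperplane_split(X, q=2, tau=10, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    best = None
    for _ in range(tau):
        attrs = rng.choice(X.shape[1], size=q, replace=False)  # random q-attribute subspace
        coefs = rng.normal(size=q)                              # non-axis-parallel direction
        proj = X[:, attrs] @ coefs                              # 1-D projection of the data
        split = rng.uniform(proj.min(), proj.max())             # candidate split point
        gain = sd_gain(proj, split)
        if best is None or gain > best[0]:
            best = (gain, attrs, coefs, split)
    return best  # (gain, attributes, coefficients, split value)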

Extended Isolation Forest

The Extended Isolation Forest (EIF) is an enhancement to the original Isolation Forest that addresses inconsistencies in score assignment caused by the axis-parallel splits in standard isolation trees, which can lead to artifacts in score heatmaps and biased variance along certain directions. Developed by Sahand Hariri, Matias Carrasco Kind, and Robert J. Brunner in 2018, EIF improves the robustness and reliability of anomaly scoring by incorporating randomly oriented hyperplanes for data partitioning, reducing score variance for points on constant level sets. EIF operates in a manner similar to the original algorithm but modifies the tree construction to use non-axis-aligned splits. In the preferred approach, each split employs a hyperplane with a randomly chosen slope and a random intercept drawn from the range of the data, selected to isolate instances more uniformly and avoid the directional biases inherent in random feature and value selections. An alternative method involves randomly transforming the data before building each tree to average out biases, though the hyperplane-based method is recommended for better performance. The ensemble of such extended isolation trees (eTrees) is used to compute anomaly scores based on average path lengths, normalized as in the original algorithm, where shorter paths indicate anomalies. This variant maintains the linear time complexity and scalability of Isolation Forest while providing more consistent anomaly scores, particularly beneficial in high-dimensional spaces where split directions matter. Evaluations on synthetic datasets and real-world benchmarks demonstrate AUROC and AUPRC comparable to the original method, with reduced variance in score assignments and no increase in computation time. EIF has been applied in areas such as astronomy data analysis and general outlier detection tasks requiring precise scoring.
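The branching rule itself is compact; a minimal sketch of an EIF-style split (illustrative, not the authors' reference implementation) draws a random normal vector and a random intercept point within the data range and partitions points by the side of that hyperplane they fall on:

import numpy as np

def extended_split(X, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    n = rng.normal(size=X.shape[1])                    # random slope (normal vector)
    p = rng.uniform(X.min(axis=0), X.max(axis=0))      # random intercept point within the data range
    mask = (X - p) @ n <= 0.0                          # side of the hyperplane for each point
    return X[mask], X[~mask]

X = np.random.default_rng(1).normal(size=(100, 3))
left, right = extended_split(X)
print(len(left), len(right))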

Applications

General Use Cases

Isolation Forest has found wide applicability in unsupervised anomaly detection across diverse domains due to its efficiency in handling high-dimensional data and scalability to large datasets. In cybersecurity, it is employed to identify intrusions and distributed denial-of-service (DDoS) attacks by analyzing patterns in network traffic data, where anomalies manifest as unusual traffic volumes or protocols that deviate from normal behavior. For instance, the algorithm isolates aberrant packets or flows that could indicate malicious activities, enabling real-time threat mitigation without labeled training data. In predictive maintenance, Isolation Forest aids in detecting faulty sensors within Internet of Things (IoT) streams by flagging outliers in sensor readings that signal equipment malfunctions or environmental interferences. This approach is particularly suited for continuous monitoring in industrial settings, where it processes streaming sensor data to anticipate maintenance needs and prevent downtime. Similarly, in finance, beyond specific fraud scenarios, it detects outliers in trading volumes, such as sudden spikes or drops in market activity that may indicate economic irregularities or system errors. By leveraging random partitioning, it efficiently ranks these deviations in high-volume financial datasets. Healthcare applications include monitoring anomalous patient vitals in real-time systems, where Isolation Forest identifies irregular heart rates, blood pressure, or oxygen levels that could signal critical health events. This unsupervised method integrates seamlessly with wearable devices and electronic health records to prioritize alerts for medical intervention. In environmental monitoring, it uncovers unusual patterns in climate sensor data, such as aberrant temperature or humidity readings that deviate from expected seasonal trends, supporting early detection of ecological disruptions. Anomalies are typically ranked using the isolation-based scores from the ensemble of trees. Recent extensions as of 2025 have applied it to real-time IoT security for environmental parameters and aero-engine fault detection in manufacturing. Performance in these unsupervised settings is commonly evaluated using metrics like the area under the receiver operating characteristic curve (AUC-ROC) to assess ranking quality across thresholds, or precision-recall curves to handle imbalanced data where anomalies are rare.
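Evaluation in these label-scarce settings often reduces to the two metrics named above; the sketch below assumes ground-truth labels y (1 = anomaly) are available for assessment and that scores holds higher-is-more-anomalous values (e.g., the negated score_samples output), both placeholder names:

from sklearn.metrics import roc_auc_score, average_precision_score

auc_roc = roc_auc_score(y, scores)            # ranking quality across all thresholds
auc_pr = average_precision_score(y, scores)   # more informative under heavy class imbalance
print(f"AUC-ROC: {auc_roc:.3f}  AUC-PR: {auc_pr:.3f}")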

Credit Card Fraud Detection Case Study

The Credit Card Fraud Detection dataset, provided by the ULB Machine Learning Group, consists of anonymized transactions from European cardholders over two days in September 2013, totaling 284,807 records with 492 confirmed frauds (0.172% prevalence), resulting in a highly imbalanced distribution. The features include 28 principal component analysis (PCA)-transformed variables (V1 to V28), along with 'Time' (seconds elapsed since the first transaction) and 'Amount' (transaction value), with the target 'Class' labeling fraud as 1 and legitimate transactions as 0.

Preprocessing involves scaling the non-PCA features to handle outliers in transaction amounts, using a robust scaler that reduces sensitivity to extreme values by relying on medians and quartiles rather than means. To address class imbalance without relying solely on the model's contamination parameter, undersampling of the majority (legitimate) class is applied, selecting a subset to balance the training data while preserving all fraud instances. Feature selection follows, using importance scores to retain the top five most discriminative features (e.g., V14 with an importance of 0.191), reducing dimensionality and focusing on key indicators.

The Isolation Forest model is trained using scikit-learn's IsolationForest, with key parameters set to n_estimators=100 (number of trees), max_samples=128 (subsample size per tree), and contamination=0.001 (approximating the known fraud proportion of ~0.0017). The dataset is split into 70% training and 30% testing sets, with further validation partitioning (60:40) to tune thresholds; the model isolates anomalies by constructing random isolation trees on the preprocessed features, leveraging the algorithm's efficiency for high-dimensional, imbalanced data.

Evaluation emphasizes metrics suited to imbalance, including precision, recall, F1-score, ROC-AUC, and AUCPR (area under the precision-recall curve). On the test set, the model achieves a precision of 0.807, a recall of 0.764 (detecting 76.4% of frauds), and an F1-score of 0.785; ROC-AUC reaches 0.974, while AUCPR is 0.759, indicating strong discrimination at low false positive rates. A confusion matrix reveals effective separation, with high true negatives for legitimate transactions and minimized false positives among the vast majority class. Visualizations include precision-recall and ROC curves, demonstrating the model's ability to maintain high precision (>0.90) at thresholds above 0.50, highlighting its robustness to imbalance without explicit labeling assumptions. Path length distributions from the isolation trees further illustrate shorter paths for fraudulent instances, confirming their anomalous status. These strengths enable Isolation Forest to handle the dataset's skewness effectively, outperforming baselines like the Local Outlier Factor (F1-score ~0.00 on similar unsupervised setups) by isolating anomalies faster in high dimensions. Overall results show that a fraud detection rate exceeding 90% is achievable at tuned thresholds with low false positives (<1% of legitimate transactions flagged), surpassing the Local Outlier Factor's performance in recall and computational efficiency on this benchmark.
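A hedged reconstruction of the core of this pipeline is sketched below; the file name, column handling, and split sizes follow the description above but are illustrative, and the undersampling and feature-selection steps are omitted for brevity:

import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import RobustScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, roc_auc_score

df = pd.read_csv("creditcard.csv")                       # Kaggle ULB dataset; path is illustrative
df[["Time", "Amount"]] = RobustScaler().fit_transform(df[["Time", "Amount"]])

X = df.drop(columns=["Class"])
y = df["Class"]                                          # 1 = fraud, 0 = legitimate
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

model = IsolationForest(n_estimators=100, max_samples=128,
                        contamination=0.001, random_state=42)
model.fit(X_train)

y_pred = (model.predict(X_test) == -1).astype(int)       # map -1 (outlier) to the fraud label 1
scores = -model.score_samples(X_test)                    # higher = more anomalous

print(classification_report(y_test, y_pred, digits=3))
print("ROC-AUC:", roc_auc_score(y_test, scores))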
