Anomaly detection
Anomaly detection is a core technique in data analysis and machine learning used to identify rare items, events, or observations that deviate significantly from the majority of the data, often indicating potential issues such as errors, fraud, or faults.[1] These anomalies, also known as outliers, arise when patterns do not conform to expected normal behavior, which can be induced by malicious activities, system failures, or novel events.[1] The process typically involves modeling normal data distributions and flagging deviations, though challenges include defining "normal" behavior in unlabeled or high-dimensional datasets.[2]

Anomalies are categorized into three primary types: point anomalies, which affect individual data instances (e.g., a single fraudulent transaction); contextual anomalies, which are unusual only within a specific context (e.g., high spending during off-peak hours); and collective anomalies, where a collection of related instances is abnormal (e.g., a sequence of network attacks).[1] This classification helps tailor detection methods to the data's structure, whether static, time-series, spatial, or graph-based.[2] Recent advancements emphasize handling evolving data streams and complex patterns, incorporating contextual factors for more accurate identification.[2]

The technique finds widespread applications across domains, including cybersecurity for intrusion detection, finance for fraud prevention, healthcare for disease outbreak monitoring, and manufacturing for fault diagnosis.[1] In environmental monitoring, it detects unusual sensor readings signaling pollution spikes, while in social media, it identifies anomalous user behaviors indicative of bot activity.[2] These uses underscore its role in enabling proactive responses, though effectiveness depends on data quality and domain-specific assumptions about normality.[1]

Detection methods span statistical, machine learning, and hybrid approaches, with statistical techniques like Gaussian mixture models assuming normal data follows probabilistic distributions.[1] Machine learning methods include clustering-based (e.g., DBSCAN for density estimation) and nearest-neighbor-based (e.g., local outlier factor for relative deviation scoring) algorithms, often unsupervised due to the rarity of labeled anomalies.[2] Emerging deep learning techniques, such as autoencoders and generative adversarial networks, excel in high-dimensional data like images and time series by learning latent representations of normality.[2]

Definition and Types
Core Concepts
Anomaly detection refers to the identification of rare events or observations in a dataset that differ substantially from the expected norm, often signaling deviations generated by different underlying processes. A foundational definition comes from Hawkins (1980), who described an outlier as "an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism."[3] This concept has been adapted in anomaly detection to encompass patterns that do not conform to the typical behavior of the data, as articulated by Chandola et al. (2009), who define anomalies as instances that deviate from expected norms in a given context.[4]

Anomaly detection is closely related to but sometimes distinguished from outlier detection, with the latter often emphasizing statistical deviations that may include noise or errors, while anomalies highlight contextually significant irregularities of interest.[5] There is also notable ambiguity in terminology, particularly between anomaly detection and novelty detection: the former broadly identifies deviations within potentially contaminated data, whereas novelty detection specifically targets previously unseen patterns and assumes a training set free of anomalies. Effective anomaly detection typically presupposes that normal data points form the vast majority of the dataset and exhibit discernible structure or distribution, rendering anomalies both rare and detectable as exceptions.[4]

Within data mining, anomaly detection serves as a critical exploratory task for uncovering unexpected patterns amid large volumes of data, predominantly through unsupervised paradigms that leverage the abundance of normal instances without requiring anomaly labels, alongside semi-supervised approaches that use only normal data for training.[4] Supervised variants, though feasible, are constrained by the rarity of labeled anomalies. This framework enables applications such as detecting irregular activities in cybersecurity systems.[4]

Types of Anomalies
Anomalies are broadly categorized into point, contextual, and collective types based on their deviation patterns relative to the data distribution. These classifications help in understanding the structural variations in anomalous instances before applying detection techniques.[1]

Point anomalies, often referred to as global outliers, consist of individual data instances that deviate substantially from the overall expected behavior or norm of the dataset. For example, a single credit card transaction amounting to an unusually high value compared to a user's typical spending pattern exemplifies a point anomaly.[1]

Contextual anomalies arise when a data instance is deviant only within a particular context, such as a specific time period or location, while it may conform to norms in other contexts. These anomalies are characterized by both contextual attributes (e.g., temporal or spatial factors) and behavioral attributes (e.g., the value itself). A classic illustration is a temperature reading of 35°F, which is anomalous during summer but normal in winter within a time-series dataset.[1]

Collective anomalies involve a collection of related data instances that are anomalous when considered together as a group, even though the individual instances might appear normal in isolation. Such anomalies often manifest in sequences or clusters, like a coordinated series of network intrusions in cybersecurity logs that deviate from baseline system behavior.[1]

Anomalies can further be differentiated as global or local depending on their scope relative to the data structure. Global anomalies deviate from the entire dataset's distribution, such as an isolated point far from the main cluster in a uniform dataset. In contrast, local anomalies are outliers only with respect to their immediate neighborhood, for instance, a data point in a dense cluster that has lower density compared to surrounding points in high-dimensional data.[6]

Regarding dimensionality, anomalies are also classified as temporal or spatial. Temporal anomalies occur in time-ordered data, where deviations disrupt sequential patterns, such as irregular heart rate spikes in electrocardiogram signals over time. Spatial anomalies, meanwhile, pertain to positional or geographic data, exemplified by aberrant sensor readings in a localized area, like unusual seismic activity confined to a specific region. These distinctions are particularly relevant in time-series applications, where temporal anomalies might appear as discords in sequential data streams.[7]

History
Early Foundations
The foundations of anomaly detection trace back to the 19th century, when statisticians began addressing outliers—data points deviating markedly from the expected pattern—as a core challenge in data analysis. Early efforts focused on identifying and handling "discordant observations" to improve the reliability of statistical inferences. For instance, in 1863, William Chauvenet proposed a criterion for rejecting outliers in astronomical data based on the probability of their occurrence under a normal distribution, marking one of the first formalized methods for outlier exclusion. This approach influenced subsequent work, emphasizing the need to distinguish genuine anomalies from measurement errors. Building on these ideas, Francis Ysidro Edgeworth contributed significantly in 1887 with his analysis of discordant observations, exploring how outliers affect probability distributions and advocating for robust statistical tests to detect abnormal deviations. Edgeworth's work highlighted the importance of considering the tails of distributions in outlier identification, laying groundwork for modern statistical anomaly detection. By the early 20th century, William Sealy Gosset, publishing as "Student" in 1908, introduced the t-distribution for inference on small samples; it later underpinned formal outlier tests such as Grubbs' test, strengthening anomaly assessment in limited datasets. These developments established outlier detection as a statistical discipline, prioritizing probabilistic models to quantify deviations.

In the realm of computing, anomaly detection emerged in the 1970s through manual monitoring of early networks like ARPANET, where system administrators reviewed audit logs by hand to identify unusual activities indicative of misuse or faults. This labor-intensive process represented the initial shift toward real-time surveillance in networked environments, driven by growing concerns over unauthorized access. A pivotal advancement came in 1987 with Dorothy E. Denning's framework for intrusion detection systems (IDS), which formalized anomaly detection using statistical profiles of user behavior to flag deviations from normal patterns, distinguishing it from rule-based approaches. Denning's model integrated audit data analysis with threshold-based alerts, enabling automated detection of intrusions without predefined attack signatures.[8]

Key milestones in this evolution were later synthesized by Richard A. Kemmerer and colleagues in 2002, who traced the progression from rule-based IDS—reliant on explicit misuse signatures—to statistical anomaly detection, noting how the latter's adaptability to novel threats addressed limitations of earlier systems. This historical overview underscored the enduring value of statistical foundations in handling unknowns, paving the way for more sophisticated techniques in subsequent decades.[9]

Modern Developments
In the 1990s and 2000s, anomaly detection shifted toward real-time applications, particularly in intrusion detection systems (IDS), driven by evaluations like those conducted by DARPA in 1998 and 1999. These evaluations, organized by MIT Lincoln Laboratory, tested IDS performance using simulated network traffic with embedded attacks, including both offline and real-time scenarios to assess detection accuracy and false alarm rates under operational conditions.[10][11] The 1999 DARPA effort specifically incorporated real-time testing on a controlled network testbed, marking a transition from batch processing to dynamic monitoring essential for cybersecurity.[12] Concurrently, data mining techniques gained prominence, enabling scalable analysis of large datasets through methods like clustering and association rules, which addressed the limitations of traditional statistical approaches in handling high-dimensional data.[1]

From the 2010s onward, anomaly detection increasingly integrated machine learning (ML) and deep learning (DL), enhancing capabilities for unsupervised and semi-supervised scenarios where labeled anomalies are scarce. Seminal surveys, such as Chandola et al.'s 2009 overview, categorized techniques into statistical, proximity-based, and density-based methods, laying groundwork for ML extensions that improved adaptability to evolving data patterns.[4] DL models, including autoencoders and recurrent neural networks, emerged prominently in the mid-2010s, offering superior feature extraction for complex, non-linear anomalies in domains like network security and sensor data.[13] This period also saw rapid growth in IoT applications, where anomaly detection addressed resource-constrained environments through lightweight ML algorithms for real-time threat identification in interconnected devices.[14][15]

In the 2020s, advances in AI have further refined time-series anomaly detection, with emphasis on scalable, explainable models for streaming data in big data ecosystems. Recent trends highlighted in PVLDB proceedings underscore hybrid AI approaches that combine forecasting with reconstruction errors to pinpoint subtle deviations in temporal patterns, achieving higher precision in industrial and financial monitoring. The global anomaly detection market reflects this momentum, projected to reach $6.90 billion in 2025, fueled by AI-driven demand in cybersecurity and predictive maintenance.[16]

Methods
Statistical Methods
Statistical methods for anomaly detection rely on modeling the underlying probability distribution of normal data to identify deviations that indicate anomalies. These approaches assume that anomalies are rare events occurring outside the expected statistical behavior of the majority of the data. Traditional statistical techniques are particularly effective for univariate or low-dimensional data where distributional assumptions hold, providing interpretable and computationally efficient solutions.

Parametric Methods
Parametric methods presuppose a specific probability distribution for the normal data, such as the Gaussian distribution, enabling the estimation of parameters like the mean \mu and standard deviation \sigma from the observed samples. A common technique is the z-score, which standardizes data points to measure their deviation from the mean in units of standard deviation, calculated as z = \frac{x - \mu}{\sigma}. Data points with |z| > 3 are typically flagged as anomalies, as this threshold corresponds to approximately 0.3% of data under a normal distribution, assuming independence and normality.[17]

For more formal hypothesis testing, Grubbs' test detects a single outlier in a univariate dataset assumed to follow a normal distribution by comparing the deviation of the suspected outlier to the standard deviation of the remaining data. The test statistic is G = \frac{\max |x_i - \bar{x}|}{s}, where \bar{x} is the sample mean and s is the sample standard deviation; the null hypothesis of no outliers is rejected if G exceeds a critical value derived from the t-distribution. This method, originally proposed for small to moderate sample sizes, provides a p-value for decision-making and has been widely adopted in quality control applications.[18]
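As a minimal illustration of these parametric tests, the following sketch (assuming NumPy and SciPy are available) flags z-score outliers and runs a two-sided Grubbs' test; the synthetic data, the |z| > 3 threshold, and the significance level are illustrative choices rather than prescribed values.

```python
import numpy as np
from scipy import stats

def zscore_outliers(x, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    z = (x - x.mean()) / x.std(ddof=1)
    return np.abs(z) > threshold

def grubbs_test(x, alpha=0.05):
    """Two-sided Grubbs' test for a single outlier in normal data.

    Returns (G, critical value, reject?) for the most extreme point.
    """
    n = len(x)
    G = np.max(np.abs(x - x.mean())) / x.std(ddof=1)
    # Critical value is derived from the t-distribution with n-2 d.o.f.
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    G_crit = ((n - 1) / np.sqrt(n)) * np.sqrt(t**2 / (n - 2 + t**2))
    return G, G_crit, G > G_crit

rng = np.random.default_rng(0)
data = np.append(rng.normal(0, 1, 100), 6.0)  # inject one extreme value
print(zscore_outliers(data).sum())            # expect ~1 flagged point
print(grubbs_test(data))                      # G should exceed the critical value
```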
Non-Parametric Methods
Non-parametric methods do not assume a specific distributional form, making them robust to violations of normality and suitable for exploratory analysis. Histogram-based approaches estimate the empirical density by binning the data and identifying points in low-density bins as potential anomalies; for instance, the outlier score can be inversely proportional to the bin height, reflecting rarity. Box plots, introduced by John Tukey as a visual tool, display the interquartile range (IQR) and flag points beyond Q_1 - 1.5 \times \text{IQR} or Q_3 + 1.5 \times \text{IQR} as outliers, where Q_1 and Q_3 are the first and third quartiles, providing a quick, non-parametric rule for symmetric or skewed distributions.

Extreme value theory (EVT) extends non-parametric analysis to model the tails of distributions, focusing on the behavior of maxima or minima rather than the bulk. By fitting a generalized Pareto distribution (GPD) to exceedances over a high threshold, EVT estimates the probability of extreme events, flagging anomalies when observations fall into the upper tail with low exceedance probability. This framework is particularly useful for heavy-tailed data where standard methods fail, as it theoretically justifies the rarity of anomalies in the extremes.
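A short sketch of the box-plot rule and a peaks-over-threshold GPD fit, again assuming NumPy and SciPy; the 1.5 multiplier and the 95th-percentile threshold are conventional but adjustable assumptions.

```python
import numpy as np
from scipy.stats import genpareto

def iqr_outliers(x, k=1.5):
    """Tukey's box-plot rule: flag points beyond k * IQR outside [Q1, Q3]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

def evt_exceedance_prob(x, value, threshold_quantile=95):
    """Peaks-over-threshold: fit a GPD to tail exceedances and estimate
    P(X > value); meaningful for `value` above the chosen threshold."""
    u = np.percentile(x, threshold_quantile)
    exceedances = x[x > u] - u
    c, loc, scale = genpareto.fit(exceedances, floc=0)  # pin location at 0
    tail_frac = exceedances.size / x.size               # empirical P(X > u)
    return tail_frac * genpareto.sf(value - u, c, loc=loc, scale=scale)
```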
Parameter-Free Methods
Parameter-free methods, such as the Histogram-based Outlier Score (HBOS), operate without explicit distributional assumptions or user-defined parameters beyond basic binning, achieving linear time complexity O(n) for n data points. HBOS assumes feature independence and computes univariate histograms for each dimension d, estimating the probability p(x_{i,d}) as the relative frequency of the bin containing x_{i,d}. The outlier score for a point \mathbf{x}_i is then \text{HBOS}(\mathbf{x}_i) = -\sum_{d=1}^D \log p(x_{i,d}), where higher scores indicate greater outlierness; anomalies are those exceeding a percentile-based threshold on the scores. The algorithm proceeds in steps: (1) construct histograms for each feature using a fixed number of bins (e.g., \sqrt{n}), (2) compute HBOS scores for all points, and (3) rank or threshold to detect outliers. This method excels in high-dimensional settings due to its efficiency and has demonstrated competitive performance on benchmark datasets compared to more complex alternatives.[19]
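The three HBOS steps above translate almost directly into NumPy; this sketch assumes equal-width bins, independent features, and the \sqrt{n} bin-count default mentioned in the description.

```python
import numpy as np

def hbos_scores(X, n_bins=None):
    """Histogram-based Outlier Score for an (n, d) array, assuming
    independent features and equal-width bins."""
    n, d = X.shape
    n_bins = n_bins or int(np.sqrt(n))          # step (1): bin count
    scores = np.zeros(n)
    for j in range(d):
        counts, edges = np.histogram(X[:, j], bins=n_bins)
        p = counts / n                           # relative bin frequencies
        idx = np.digitize(X[:, j], edges[1:-1])  # bin index of each point
        scores += -np.log(p[idx] + 1e-12)        # step (2): sum of -log p
    return scores                                # step (3): rank or threshold

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (500, 3)), [[8.0, 8.0, 8.0]]])
print(hbos_scores(X).argmax())  # index 500: the planted outlier scores highest
```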
Proximity and Clustering Methods
Proximity-based methods for anomaly detection identify outliers by measuring how isolated a data point is from its nearest neighbors using distance metrics, such as Euclidean distance, without assuming an underlying data distribution. These approaches treat anomalies as points that are significantly farther from the majority of the data than normal instances. A foundational technique is the k-nearest neighbors (k-NN) distance method, where an anomaly score for a point p is computed as the distance to its k-th nearest neighbor; points with large scores are flagged as outliers since normal data tend to cluster closely. To compute this score, the algorithm first calculates pairwise distances between all points, then for each point identifies the k nearest neighbors and takes the maximum distance among them as the score, enabling ranking of potential anomalies. This method excels in settings where parametric assumptions fail, though it can suffer from the curse of dimensionality, leading to less meaningful distances as dimensions increase.[20]

The Local Outlier Factor (LOF) extends proximity concepts by incorporating local density variations, defining the outlier degree of a point based on how much its density differs from that of its neighbors. For a point p, the LOF is calculated using the k-distance of neighbors, reachability distances, and local reachability densities (lrd). Specifically, the reachability distance of p with respect to neighbor o is \max\{k\text{-distance}(o), d(p, o)\}, where d(p, o) is the distance between p and o, and k\text{-distance}(o) is the distance to the k-th nearest neighbor of o. The local reachability density of p is then lrd_k(p) = \frac{k}{\sum_{o \in N_k(p)} \text{reach-dist}_k(p, o)}, where N_k(p) is the k-neighborhood of p. The LOF score is LOF_k(p) = \frac{\sum_{o \in N_k(p)} lrd_k(o) / lrd_k(p)}{k}, capturing relative density deviation; values near 1 indicate normal points, while higher values signal local outliers. LOF's steps involve computing neighborhoods, densities, and ratios for all points, making it effective for detecting outliers in varying density regions, though its O(n^2) computational complexity limits scalability for large datasets. In high dimensions, LOF's reliance on distances can degrade performance due to sparsity.[21]

Clustering-based methods leverage grouping algorithms to isolate anomalies as points that do not fit into dense clusters. In DBSCAN (Density-Based Spatial Clustering of Applications with Noise), anomalies are identified as noise points—those not assigned to any cluster—by defining clusters as dense regions separated by sparse areas using parameters \epsilon (neighborhood radius) and MinPts (minimum points for density). The algorithm starts by selecting an arbitrary point, expands its \epsilon-neighborhood if it has at least MinPts points (marking it core), and propagates to form clusters; isolated points or those in low-density areas remain unlabeled as noise, serving as anomaly candidates. This approach inherently handles arbitrary cluster shapes and varying densities, with advantages in high dimensions when \epsilon is tuned appropriately, though sensitivity to parameters can affect results.[22]

Isolation Forest builds on tree-based partitioning to isolate anomalies through random recursive splitting, treating the path length in isolation trees as an anomaly measure.
It constructs an ensemble of isolation trees by randomly selecting features and split values to partition data until points are isolated; anomalies, being few and distinct, require shorter paths (fewer splits) to isolate than normal points, which share similar values and take longer. The anomaly score for a point is derived from the average path length across trees, normalized such that scores close to 1 indicate anomalies, while scores well below 0.5 indicate normal points. Training involves subsampling the data for each tree to enhance efficiency, achieving linear time complexity O(n), and it performs robustly in high-dimensional spaces by avoiding distance computations altogether, mitigating the curse of dimensionality.[23]
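Both LOF and Isolation Forest are available in scikit-learn, so the preceding descriptions can be exercised in a few lines; the synthetic data, neighbor count, and tree count below are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)), [[6.0, 6.0]]])  # one planted outlier

# LOF: scores near 1 are normal, larger values are locally outlying.
lof = LocalOutlierFactor(n_neighbors=20)
lof_labels = lof.fit_predict(X)                 # -1 marks outliers
lof_scores = -lof.negative_outlier_factor_

# Isolation Forest: shorter average path length -> higher anomaly score.
iso = IsolationForest(n_estimators=100, random_state=0).fit(X)
iso_scores = -iso.score_samples(X)              # higher = more anomalous

print(lof_scores.argmax(), iso_scores.argmax())  # both should report index 200
```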
Density-Based Methods
Density-based methods for anomaly detection identify data points as anomalies if they lie in regions of low probability density relative to the overall data distribution. These approaches estimate the underlying density function of the normal data and assign anomaly scores based on how much a point deviates from high-density regions. Kernel density estimation (KDE) is a foundational non-parametric technique used in these methods, where the density at a point \mathbf{x} is approximated as \hat{f}(\mathbf{x}) = \frac{1}{n h^d} \sum_{i=1}^n K\left( \frac{\mathbf{x} - \mathbf{x}_i}{h} \right), with K as the kernel function, h as the bandwidth, and d as the dimensionality. Gaussian kernels, defined by K(\mathbf{u}) = \frac{1}{(2\pi)^{d/2}} \exp\left( -\frac{\|\mathbf{u}\|^2}{2} \right), are commonly selected for their radial symmetry and ability to produce smooth density estimates, enabling effective detection of isolated low-density points as anomalies.[1]

Support Vector Data Description (SVDD) represents a kernelized density-based approach that models normal data as lying within a tight hypersphere in a feature space. The objective is to minimize the radius R of the hypersphere centered at \mathbf{a}, formulated as \min_{R, \mathbf{a}, \xi_i} R^2 + C \sum_{i=1}^n \xi_i subject to \|\phi(\mathbf{x}_i) - \mathbf{a}\|^2 \leq R^2 + \xi_i for all i, where \phi maps data to a higher-dimensional space, \xi_i are slack variables for robustness, and C controls the trade-off. The support vectors define the boundary, and a test point \mathbf{x} is anomalous if \|\phi(\mathbf{x}) - \mathbf{a}\|^2 > R^2. This method effectively captures compact, high-density regions while excluding low-density outliers.[24]

The one-class support vector machine (OC-SVM) extends density estimation by finding a hyperplane that maximizes the margin from the origin in feature space, thereby enclosing the support of the normal data distribution. The optimization problem is \min_{w, \xi, \rho} \frac{1}{2} \|w\|^2 + \frac{1}{\nu n} \sum_{i=1}^n \xi_i - \rho, subject to \langle w, \phi(\mathbf{x}_i) \rangle \geq \rho - \xi_i and \xi_i \geq 0, where \nu bounds the fraction of outliers and support vectors. Points for which \langle w, \phi(\mathbf{x}) \rangle < \rho are classified as anomalies, providing a flexible boundary for irregular high-density regions.

The connectivity-based outlier factor (COF) improves density estimation in unevenly dense data by incorporating path connectivity among neighbors, avoiding issues with sparse regions. It defines the chaining distance d_{\text{chain}}(p, q) as the minimum length of paths connecting points p and q via intermediate neighbors. The path-based density of a point p is then \rho(p) = \frac{|N_k(p)|}{\sum_{o \in N_k(p)} d_{\text{chain}}(p, o)}, where N_k(p) denotes the k-nearest neighbors of p. The COF score is computed as \text{COF}_k(p) = \frac{\frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \rho(o)}{\rho(p)}, yielding high values for points with poor connectivity to dense areas, thus identifying outliers based on relational density paths.[25]
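A brief scikit-learn sketch of Gaussian-kernel KDE scoring and a \nu-parameterized one-class SVM; the bandwidth, \nu = 0.02, and the 2% flagging quantile are illustrative assumptions rather than recommended settings.

```python
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (300, 2)), [[5.0, 5.0]]])

# KDE: points with unusually low log-density are candidate anomalies.
kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(X)
log_density = kde.score_samples(X)
kde_flags = log_density < np.percentile(log_density, 2)

# OC-SVM: nu upper-bounds the fraction of training points treated as outliers.
ocsvm = OneClassSVM(kernel="rbf", nu=0.02, gamma="scale").fit(X)
svm_flags = ocsvm.predict(X) == -1              # -1 = outside the boundary

print(kde_flags.nonzero()[0], svm_flags.nonzero()[0])
```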
Neural Network Methods
Neural network methods in anomaly detection leverage deep learning architectures to automatically extract intricate features from complex, high-dimensional data, enabling the identification of outliers without relying on predefined statistical assumptions. These approaches excel in unsupervised settings, where normal patterns are learned from unlabeled data, and anomalies are flagged based on deviations such as reconstruction errors or generative inconsistencies. By modeling non-linear relationships, neural networks outperform traditional methods in domains like images, sequences, and multivariate time series, though they require substantial computational resources and careful tuning to avoid overfitting.

Autoencoders form a cornerstone of neural network-based anomaly detection, consisting of an encoder that compresses input data into a lower-dimensional latent space and a decoder that reconstructs it from this representation. The reconstruction error—typically the mean squared error between input and output—serves as the anomaly score, with higher errors indicating deviations from learned normal patterns. This approach is particularly effective for high-dimensional data where manual feature engineering is impractical. A seminal demonstration by Sakurada and Yairi showed that autoencoders with nonlinear dimensionality reduction outperform linear principal component analysis (PCA) in detecting subtle anomalies in spacecraft telemetry data, achieving improved accuracy by capturing manifold structures.[26]

Variational autoencoders (VAEs) enhance standard autoencoders by introducing probabilistic latent variables, promoting smoother latent spaces and better generalization for anomaly scoring. The VAE optimizes the evidence lower bound (ELBO), balancing reconstruction fidelity with regularization via the Kullback-Leibler (KL) divergence:

\mathcal{L} = \mathbb{E}_{q(z|x)}[\log p(x|z)] - D_{KL}(q(z|x) \| p(z))

This loss function, formulated by Kingma and Welling, encourages the approximate posterior q(z|x) to match a prior p(z), typically a standard Gaussian, while maximizing the expected log-likelihood of the data. In anomaly detection, VAEs quantify uncertainty in reconstructions, proving robust for tasks like fraud detection where noisy normal data predominates.[27]

Generative adversarial networks (GANs) adapt adversarial training for anomaly detection by pitting a generator against a discriminator to model the distribution of normal data, with anomalies detected via poor generation or discrimination scores. This framework allows synthesis of realistic normal samples, aiding detection in imbalanced datasets. AnoGAN, introduced by Schlegl et al., applies GANs to unsupervised anomaly detection in medical images by iteratively mapping test inputs to the learned latent manifold and measuring reconstruction discrepancies, achieving high sensitivity on datasets like chest X-rays without labeled anomalies.[28] Recent advancements include CloudGEN, which employs GANs to generate adaptive cloud traffic patterns for real-time anomaly detection in cloud computing environments, outperforming baselines like isolation forests by up to 15% in precision on synthetic workloads.[29]

Convolutional neural networks (CNNs) extend autoencoder principles to spatial data, using convolutional layers to capture local patterns in images for anomaly detection in visual inspection tasks.
CNN-based models, such as convolutional autoencoders, reconstruct pixel-level features, flagging defects through elevated errors in industrial or medical imaging. For example, a deep CNN autoencoder framework has demonstrated effectiveness in identifying anomalies in non-image manufacturing data converted to 2D representations, reducing false positives compared to traditional thresholding.[30]

Recurrent neural networks (RNNs), particularly long short-term memory (LSTM) variants, address sequential anomalies by modeling temporal dependencies in time series data. LSTM autoencoders encode past observations to predict and reconstruct future sequences, with prediction errors signaling deviations like sudden spikes in sensor readings. A foundational LSTM approach by Malhotra et al. utilized stacked LSTM networks for multivariate time series, achieving superior detection rates on datasets from server machines and space shuttles by learning long-term patterns without supervision.

Transformer-based models represent cutting-edge neural methods for time series anomaly detection, employing self-attention mechanisms to process entire sequences in parallel and capture global dependencies more efficiently than RNNs. These architectures often use masked autoencoders or reconstruction objectives tailored to temporal data, enhancing scalability for long horizons. In 2025, the RTdetector model advanced this paradigm by incorporating reconstruction trends in transformers, improving anomaly localization in multivariate industrial time series with up to 20% better F1-scores over LSTM baselines on benchmarks like SMD.[31]
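A minimal reconstruction-error detector in the spirit of the autoencoder approach above, sketched with TensorFlow/Keras; the layer sizes, training budget, and 99th-percentile threshold are arbitrary illustrative choices, not values from the cited studies.

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
X_train = rng.normal(0, 1, (1000, 20)).astype("float32")  # "normal" data only

# Dense autoencoder: 20 -> 8 -> 3 -> 8 -> 20.
autoencoder = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(8, activation="relu"),   # encoder
    tf.keras.layers.Dense(3, activation="relu"),   # bottleneck
    tf.keras.layers.Dense(8, activation="relu"),   # decoder
    tf.keras.layers.Dense(20),
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X_train, X_train, epochs=20, batch_size=64, verbose=0)

def anomaly_score(x):
    """Per-sample mean squared reconstruction error."""
    recon = autoencoder.predict(x, verbose=0)
    return np.mean((x - recon) ** 2, axis=1)

# Threshold set from the training data; exceeding it flags an anomaly.
threshold = np.percentile(anomaly_score(X_train), 99)
print(anomaly_score(rng.normal(5, 1, (1, 20)).astype("float32")) > threshold)
```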
Ensemble and Hybrid Methods
Ensemble methods in anomaly detection combine multiple base detectors to enhance robustness and accuracy, addressing limitations of individual models such as sensitivity to noise or parameter tuning. By aggregating predictions from diverse detectors, ensembles reduce false positives and improve generalization across varied datasets. Bagging, or bootstrap aggregating, involves training base detectors on random subsets of data with replacement, promoting diversity and stability; for instance, it has been applied to outlier detection to mitigate overfitting in high-dimensional spaces. Boosting iteratively refines weak learners by assigning higher weights to misclassified instances, sequentially improving detection performance in unsupervised settings. These approaches outperform single detectors in benchmarks.[32]

Feature bagging extends this paradigm by randomly selecting subsets of features for each base tree in algorithms like Isolation Forest, isolating anomalies more efficiently through randomized partitioning. In extensions of Isolation Forest, feature bagging improves scalability for large-scale data while maintaining high detection rates. A seminal implementation demonstrated that such ensembles detect anomalies with 90% precision on synthetic datasets by leveraging tree diversity.

Hybrid methods fuse statistical and machine learning techniques to leverage complementary strengths, such as the interpretability of statistics with the adaptability of ML models. For example, combining z-score thresholding for initial outlier flagging with autoencoder reconstruction errors refines anomaly scoring in time-series data, improving sensitivity in non-stationary environments. Multi-view ensembles for IoT applications integrate heterogeneous sensor data across multiple perspectives, using stacking or voting to fuse outputs from base models like random forests and SVMs; recent surveys highlight their efficacy in cybersecurity, with high F1-scores on datasets like IoTID20.[33]

Voting mechanisms aggregate individual detector scores to produce a consensus decision, with majority voting classifying instances based on the most common prediction and weighted averaging incorporating model confidence or performance metrics. These mechanisms are particularly effective in unsupervised anomaly detection, where normalized scores from detectors like LOF are combined to yield a final anomaly rank. Diversity metrics, such as the Q-statistic, quantify pairwise agreement among base detectors—values closer to -1 indicate high diversity, optimizing ensemble selection for better coverage of anomaly types. Studies show that ensembles with diversity measures like the Q-statistic achieve improved recall on benchmark datasets.[32][34]
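A sketch of score-level averaging across two heterogeneous detectors, assuming scikit-learn; the min-max normalization and the top-1% consensus threshold are illustrative design choices.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

def minmax(s):
    """Rescale scores to [0, 1] so detectors are comparable before averaging."""
    return (s - s.min()) / (s.max() - s.min() + 1e-12)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (500, 4)), rng.normal(5, 1, (5, 4))])

iso = IsolationForest(random_state=0).fit(X)
lof = LocalOutlierFactor(n_neighbors=20)
lof.fit(X)

# Unweighted average of normalized scores; weights could encode confidence.
ensemble = np.mean([minmax(-iso.score_samples(X)),
                    minmax(-lof.negative_outlier_factor_)], axis=0)
flags = ensemble > np.percentile(ensemble, 99)   # consensus top 1%
print(flags.nonzero()[0])                        # expect indices 500-504
```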
Applications
Cybersecurity
In cybersecurity, anomaly detection is integral to intrusion detection systems (IDS), which safeguard networks and endpoints against unauthorized access and malicious activities. Network-based IDS (NIDS) monitor traffic across entire networks at key points, such as vulnerable subnets, by analyzing packet contents and metadata to detect suspicious patterns. In contrast, host-based IDS (HIDS) focus on individual devices, examining system logs, file integrity, and local activities to identify threats specific to endpoints like servers or workstations. NIDS offer broad visibility for large-scale monitoring but may miss encrypted or host-internal attacks, while HIDS provide granular protection for critical assets yet require deployment on each device, increasing management overhead.[35]

IDS employ two primary detection modes: signature-based and anomaly-based. Signature-based systems match observed traffic against a database of known attack patterns, enabling rapid identification of familiar exploits like SQL injections, with the advantage of low false positives for established threats but the limitation of ineffectiveness against novel variants that evade predefined signatures. Anomaly-based systems, however, establish a baseline of normal behavior through statistical models or machine learning and flag deviations, such as unusual data flows, allowing detection of unknown threats including zero-day exploits; their strength lies in adaptability to emerging risks, though they can generate more false positives if baselines are poorly tuned. Hybrid approaches combining both modes are increasingly common to balance precision and coverage in dynamic environments.[36]

Specific applications highlight anomaly detection's value in threat mitigation. For Distributed Denial of Service (DDoS) attacks, it identifies anomalies like sudden traffic volume spikes—often 10 times normal rates—from atypical sources or protocols, such as SYN floods, enabling real-time responses like traffic rerouting to avert service disruptions; studies show this can reduce detection time from over an hour to seconds in enterprise settings. On endpoints, anomaly detection counters zero-day attacks by monitoring behavioral indicators, including unexpected process launches, abnormal outbound connections to unknown IP addresses, or unauthorized file modifications, which signal exploits bypassing traditional antivirus signatures.[37][38]

Recent developments emphasize machine learning enhancements for sophisticated threats like Advanced Persistent Threats (APTs), which unfold in multi-stage intrusions involving reconnaissance, lateral movement, and exfiltration. ML techniques, such as optimized LightGBM models using feature selection via LDR-RFECV and hyperparameter tuning with LWHO, detect anomalies in network behaviors during these phases, achieving accuracies of 97.31% on the DAPT2020 dataset and 98.32% on Unraveled, outperforming baselines by up to 4% in identifying subtle lateral movements. In Security Information and Event Management (SIEM) tools, anomaly detection mitigates false positives by dynamically learning normal patterns—like typical login times or file access—through behavioral analysis, suppressing alerts for benign variations and prioritizing genuine risks, which boosts security operations center efficiency by reducing alert fatigue.[39][40]

Financial Fraud Detection
Anomaly detection plays a pivotal role in financial fraud detection by identifying irregular patterns in transaction data that deviate from established norms, enabling proactive intervention to mitigate losses. Transaction monitoring systems leverage anomaly detection to scrutinize real-time financial activities, focusing on velocity checks that flag abnormal transaction frequencies or volumes, such as sudden spikes in payment activity that exceed historical baselines for an account.[41] These systems also detect unusual amounts, where transaction values significantly differ from a user's typical spending profile, and anomalous locations, such as purchases originating from geographically distant or inconsistent regions relative to the account holder's known behavior. By modeling normal behavioral profiles through unsupervised learning techniques, these methods achieve early identification of potential fraud without relying solely on predefined rules, reducing false positives in high-volume banking environments.[42]

Graph-based anomaly detection extends this capability to uncover complex networks of illicit activities, particularly in money laundering schemes where transactions form interconnected patterns across multiple accounts. These approaches represent financial entities as nodes and interactions as edges in a graph, applying algorithms like graph neural networks to detect structural anomalies, such as dense subgraphs indicative of layering or smurfing tactics used to obscure fund origins.[43] A seminal application involves community detection and oddball identification to isolate suspicious clusters that deviate from legitimate network topologies, as demonstrated in analyses of inter-account transfer graphs where anomalous centrality measures highlight mule accounts.[44] This method has proven effective in regulatory compliance, enabling the tracing of obfuscated flows in large-scale payment networks.[45]

Real-time scoring with machine learning integrates anomaly detection into operational workflows, processing streaming transaction data to assign risk scores instantaneously and trigger alerts or blocks. Techniques such as isolation forests or autoencoders compute deviation scores from learned normalcy models, allowing systems to adapt to evolving fraud tactics while handling millions of transactions per second.[46] In 2024, IBM introduced enhancements to its watsonx.ai platform and Safer Payments suite, incorporating generative AI for synthetic data augmentation and anomaly scoring tailored to financial services, which improves detection accuracy in low-data scenarios common to emerging fraud types. These tools facilitate hybrid rule-ML architectures that balance interpretability with predictive power, as seen in deployments reducing fraud investigation times by up to 50% in banking consortia.[47]

Given the severe class imbalance in fraud datasets—where fraudulent cases represent less than 1% of transactions—evaluation metrics emphasize precision and recall over accuracy to prioritize true positive detection without excessive false alarms.
Precision measures the proportion of flagged anomalies that are actual frauds, crucial for minimizing customer friction from erroneous blocks, while recall captures the fraction of genuine frauds identified, essential for overall risk reduction.[48] In credit card fraud case studies, ensemble models combining random forests with oversampling techniques have achieved precision and recall values exceeding 0.95 on imbalanced European cardholder datasets, demonstrating scalability to real-world volumes exceeding 284,000 transactions.[49] Such results underscore anomaly detection's practical impact on fraud losses while maintaining operational efficiency.[50]
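The sketch below, assuming scikit-learn and synthetic labels, shows why precision, recall, and the threshold-free average precision are reported instead of accuracy on a dataset with a 0.5% fraud rate; all numbers are illustrative.

```python
import numpy as np
from sklearn.metrics import (precision_score, recall_score,
                             average_precision_score)

rng = np.random.default_rng(0)
y_true = np.array([0] * 995 + [1] * 5)           # 0.5% fraud rate
scores = np.concatenate([rng.normal(0, 1, 995),  # legitimate transactions
                         rng.normal(3, 1, 5)])   # fraud scores are higher
y_pred = scores > np.percentile(scores, 99)      # flag the top 1%

# Accuracy would look excellent even for a useless detector; these do not:
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("avg prec: ", average_precision_score(y_true, scores))
```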
Healthcare
Anomaly detection plays a crucial role in healthcare by identifying deviations in physiological and epidemiological data to support early diagnostics and intervention. In medical contexts, it analyzes signals and images to flag irregularities indicative of conditions such as cardiac arrhythmias or tumors, leveraging techniques like waveform analysis and reconstruction modeling to enhance patient outcomes.[51]

In electrocardiogram (ECG) and electroencephalogram (EEG) monitoring, anomaly detection focuses on waveform deviations to identify arrhythmias, where irregular patterns in cardiac or neural signals signal potential health risks. The Robust and Accurate Anomaly Detection (RAAD) algorithm, for instance, employs time series motif discovery and Dynamic Time Warping to segment ECG morphologies and distinguish artifacts from true anomalies, achieving 100% accuracy and a 0% false alarm rate on datasets like the MIT-BIH Arrhythmia Database.[52] Deep learning approaches, including convolutional neural networks and long short-term memory models, further improve arrhythmia detection by extracting temporal features from ECG signals, with hybrid models reaching up to 99.46% accuracy on benchmarks such as the MIT-BIH and PTB datasets.[53] These methods, often applied to time-series data, enable real-time monitoring for conditions like atrial fibrillation.[53]

Wearable Internet of Things (IoT) devices extend anomaly detection to continuous vital signs monitoring, such as heart rate and blood pressure, facilitating early alerts for elderly or chronic patients. Anomaly detection frameworks for wearables integrate unsupervised algorithms like k-means and isolation forests to identify point or contextual anomalies in multivariate time-series data from devices like Fitbit, associating deviations with health events such as atrial fibrillation in over 16,000 patients.[54] For hypertension management, deep learning models combining ResNet for feature extraction and LSTM for sequential analysis detect anomalies in photoplethysmography signals, yielding a mean absolute error of 6.2 mmHg and enabling non-invasive, remote interventions.[55] Such systems support precision health by linking wearable data to electronic health records for proactive care.[54]

In medical imaging, anomaly detection identifies tumors in magnetic resonance imaging (MRI) scans through reconstruction errors, where models trained on healthy data highlight deviations as pathological regions. Unsupervised methods learn abstract distributions from large healthy brain MRI datasets, using reconstruction discrepancies to detect anomalies like glioblastomas with high sensitivity.[56] Denoising autoencoders, employing skip connections for high-fidelity reconstructions, outperform variational autoencoders in tumor localization, providing a robust baseline for unsupervised brain MRI analysis without labeled anomalies.[57]

Anomaly detection also aids epidemic outbreak identification by flagging unusual patterns in surveillance data, enabling timely public health responses.
Unsupervised machine learning techniques, such as principal component analysis and isolation forests, detect spikes in endemic disease cases like malaria from aggregated monthly records, identifying outbreak onsets and peaks across regions with up to 10% contamination thresholds.[58] Time-series analysis of helpline call trends for symptoms like fever and cough has also proven effective, with algorithms spotting collective anomalies seven days before confirmed COVID-19 cases in Sweden, using dynamic thresholds on rolling averages.[59]

Ethical considerations in healthcare anomaly detection emphasize compliance with regulations like the Health Insurance Portability and Accountability Act (HIPAA) to safeguard patient data in machine learning applications. Machine learning models trained on HIPAA-protected datasets must incorporate privacy-preserving techniques to mitigate risks of data leakage during anomaly detection in electronic health records, with one study achieving high recall (93.0%) for detecting privacy infringements.[60] A 2025 review of artificial intelligence in healthcare underscores ongoing challenges in patient privacy for ML-based anomaly detection, advocating federated learning and anonymization to balance diagnostic utility with ethical data security.[51]

Industrial and IoT Systems
In industrial settings, anomaly detection plays a crucial role in predictive maintenance, particularly through the analysis of vibration and sensor data from rotating machinery to forecast faults such as imbalances, misalignments, and bearing defects.[61] Accelerometers are the primary sensors employed due to their accuracy and non-invasive nature, capturing signals that are processed via time-domain methods like root mean square (RMS) for imbalance detection and kurtosis for bearing faults, as well as frequency-domain techniques such as the fast Fourier transform (FFT) for identifying harmonic patterns indicative of gear issues.[61] Time-frequency approaches, including wavelet transforms, further enhance detection of non-stationary signals in complex machinery environments, enabling early intervention to minimize downtime.[61] These methods collectively support condition-based maintenance strategies, reducing operational costs in manufacturing applications through proactive fault identification.[62]

In the oil and gas sector, anomaly detection is applied to pipeline monitoring using sensors for pressure, temperature, and flow rates to identify leaks caused by corrosion or structural failures.[63] Machine learning models, such as support vector machines (SVM), have demonstrated high efficacy, achieving 97.4% accuracy in classifying leak severities after feature scaling and optimization on industrial datasets.[63] Deep neural networks extend this to offshore naturally flowing wells, where they detect flow anomalies in real time, addressing data complexity and improving safety by alerting to deviations from expected patterns.[64]

For Internet of Things (IoT) systems in industrial contexts, anomaly detection leverages edge computing to enable real-time processing of heterogeneous data streams from connected devices, mitigating latency issues inherent in cloud-based alternatives.[65] Recurrent deep learning models with instance-level reduction techniques process high-dimensional traffic at the edge, achieving up to 99% accuracy in identifying network anomalies while handling noise and scalability demands.[65] Recent surveys highlight challenges like device heterogeneity, which complicates model generalization across diverse IoT ecosystems, and resource constraints on edge nodes that limit deployment of complex algorithms.[66] These issues are exacerbated in industrial IoT, where real-time detection is essential for operational continuity, prompting advancements in federated learning to preserve privacy and adapt to varying data types.[66]

Video surveillance enhances anomaly detection in factories by analyzing motion patterns to identify deviations such as worker falls, slips, or unsafe interactions with machinery.[67] Deep learning frameworks, including Mask R-CNN for object detection and long short-term memory (LSTM) networks for pose estimation, achieve 97% accuracy in recognizing anomalous behaviors like tool breakage or machine malfunctions through frame-by-frame analysis of human-object interactions.[67] In the petroleum industry, IoT-integrated systems incorporate video streaming alongside gas sensors for comprehensive monitoring, using one-class SVM to detect motion anomalies and gas spikes in real time, thereby bolstering safety in hazardous environments.[68]
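A compact sketch of the time- and frequency-domain features named above (RMS, kurtosis, FFT peak), assuming NumPy and SciPy; the 50 Hz synthetic signal and 10 kHz sampling rate are hypothetical stand-ins for accelerometer data.

```python
import numpy as np
from scipy.stats import kurtosis

def vibration_features(signal, fs):
    """Basic condition-monitoring features for a vibration signal."""
    rms = np.sqrt(np.mean(signal ** 2))          # energy; rises with imbalance
    kurt = kurtosis(signal)                      # impulsiveness; bearing faults
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(signal.size, d=1 / fs)
    dominant = freqs[spectrum[1:].argmax() + 1]  # skip the DC component
    return {"rms": rms, "kurtosis": kurt, "dominant_freq_hz": dominant}

fs = 10_000                                      # hypothetical 10 kHz sampling
t = np.arange(0, 1, 1 / fs)
sig = np.sin(2 * np.pi * 50 * t) + 0.1 * np.random.default_rng(0).normal(size=t.size)
print(vibration_features(sig, fs))               # dominant frequency ~ 50 Hz
```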
Special Topics
Anomaly Detection in Dynamic Networks
Anomaly detection in dynamic networks involves identifying unusual patterns in evolving graph structures where nodes and edges change over time, such as in social media interactions or communication systems.[69] These networks are modeled as temporal graphs, capturing changes through discrete snapshots (e.g., a sequence of graphs G_t = (V_t, E_t) at time t) or continuous-time representations that track evolving node attributes and edge formations.[69] One foundational approach for outlier detection in graphs is ODIN (Outlier Detection using Indegree Number), which constructs a k-nearest neighbor graph and measures outlierness based on the indegree of nodes, highlighting entities with few reciprocal nearest neighbors as anomalies.[70]

Common anomaly types in dynamic networks include those characterized by sudden shifts in group connectivity or unusual paths that deviate from expected norms.[69] For instance, anomalies might manifest as a rapid isolation of a subgroup in a collaboration network or unexpected long-range connections bridging distant components.[71] Recent advancements have emphasized streaming graph processing for real-time detection, incorporating scalable techniques like approximate counting (e.g., Count-Min Sketch) to handle high-velocity edge arrivals without full graph recomputation.[71] Methods such as NetWalk leverage incremental clustering on streaming data to flag node or edge anomalies as they emerge, achieving efficient performance on large-scale temporal datasets.[72]

Key algorithms for dynamic anomaly detection include subgraph matching with evolution scores, which identifies anomalous substructures by comparing temporal changes in subgraph densities or motifs, and trajectory-based approaches that model node behaviors as time-series paths.[69] For subgraph matching, techniques like StrGNN extract h-hop subgraphs at each timestamp and compute evolution scores using graph convolutional networks (GCNs) combined with gated recurrent units (GRUs) to score deviations in structural continuity.[73] Trajectory-based methods, such as TADDY and AddGraph, represent node trajectories as sequences of embeddings and detect anomalies via reconstruction errors or attention mechanisms that highlight irregular temporal patterns in edge formations.[74][75] These algorithms prioritize capturing both spatial dependencies and temporal dynamics, with empirical evaluations showing improved precision over static baselines on common temporal datasets.[71]
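As a toy baseline for the snapshot model (far simpler than NetWalk or StrGNN), the sketch below z-scores each node's neighborhood churn between consecutive snapshots; the dict-of-sets graph representation is an assumption made for illustration.

```python
import numpy as np

def snapshot_churn_scores(snapshots):
    """Score nodes by edge-set change between consecutive snapshots.

    `snapshots` is a list of dicts mapping node -> set of neighbors; a node
    whose neighborhood churns far more than is typical for the graph at that
    step receives a high score.
    """
    all_scores = []
    for g_prev, g_curr in zip(snapshots, snapshots[1:]):
        nodes = set(g_prev) | set(g_curr)
        churn = {v: len(g_prev.get(v, set()) ^ g_curr.get(v, set()))
                 for v in nodes}                      # symmetric difference
        vals = np.array(list(churn.values()), dtype=float)
        mu, sigma = vals.mean(), vals.std() + 1e-12
        all_scores.append({v: (c - mu) / sigma for v, c in churn.items()})
    return all_scores

g0 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
g1 = {"a": {"b"}, "b": {"a"}, "c": set(), "d": {"e"}, "e": {"d"}}
print(snapshot_churn_scores([g0, g1])[0])  # stable node "a" scores lowest
```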
Explainable Anomaly Detection
Explainable anomaly detection addresses the limitations of opaque, black-box models by incorporating interpretability mechanisms that elucidate why specific instances are flagged as anomalous. This is crucial in high-stakes domains where understanding the rationale behind detections informs decision-making and builds trust in the system. Traditional anomaly detection often relies on complex algorithms like neural networks, which excel in performance but obscure the decision process; explainable approaches integrate transparency without fully sacrificing efficacy.

A prominent strategy involves integrating explainable AI (XAI) techniques such as LIME and SHAP for feature attribution in anomaly detection models. LIME, or Local Interpretable Model-agnostic Explanations, approximates the behavior of black-box models locally around an instance by fitting a simple surrogate model, highlighting which features contribute most to an anomaly's score; for example, in intrusion detection, it reveals specific network traffic attributes driving outlier status.[76] Similarly, SHAP (SHapley Additive exPlanations) assigns importance values to features based on game-theoretic principles, quantifying their impact on the anomaly prediction across the dataset; applied to models like isolation forests, SHAP has demonstrated robust explanations in financial fraud scenarios by attributing deviations to key transaction variables. These post-hoc methods are model-agnostic, enabling retrofitting to existing anomaly detectors.

Self-organizing maps (SOMs) provide an in-model explainable framework by visualizing high-dimensional data in low-dimensional lattices, where anomalies appear as points distant from dense normal clusters. SOMs facilitate interpretation through topological preservation, allowing users to inspect cluster structures and boundary violations; in industrial sensor data, this has enabled intuitive anomaly localization by contrasting descriptor vectors against learned prototypes.[77][78] The method's strength lies in its unsupervised nature, generating human-readable maps that inherently explain deviations without additional post-processing.

Rule-based explainers, such as those employing contrasting outlier explanations via subspace density contrasts, generate local interpretations by comparing an anomalous instance against neighboring normal points or subgroups, identifying feature subsets where the outlier deviates significantly. For instance, approaches using a subspace density contrastive loss produce concise rules like "high feature A and low feature B relative to similar instances indicate anomaly"; this has been effective in high-dimensional datasets for pinpointing discriminative attributes.[79] These methods prioritize fidelity to the data's local structure, offering actionable insights over global summaries.

Recent work has extended counterfactual explanations to neural anomaly detection, generating minimal perturbations that transform an anomalous input into a normal one to reveal critical decision boundaries. For autoencoder-based models, counterfactuals elucidate reconstruction errors by suggesting "what-if" scenarios, such as altering specific time-series values to normalize the output; the AR-Pro framework formalizes this for repairable anomalies in vision and time-series data, achieving interpretable repairs while maintaining detection accuracy.[80][81] However, these enhancements often involve trade-offs between interpretability and performance.
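A hedged sketch of model-agnostic Kernel SHAP applied to an Isolation Forest's anomaly score, assuming the shap library is installed; the planted anomaly and the background-sample size are illustrative.

```python
import numpy as np
import shap
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(0, 1, (300, 5))
X[-1] = [0, 0, 6, 0, 0]                      # anomaly driven by feature 2

iso = IsolationForest(random_state=0).fit(X)

# Kernel SHAP treats the score function as a black box; a small background
# sample keeps the approximation tractable.
background = shap.sample(X, 50)
explainer = shap.KernelExplainer(iso.decision_function, background)
shap_values = explainer.shap_values(X[-1:])
print(shap_values)  # the largest-magnitude attribution should be feature 2
```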
Evaluation and Resources
Datasets and Benchmarks
Anomaly detection research relies on standardized datasets to evaluate algorithms across diverse scenarios, enabling reproducible comparisons and highlighting strengths in handling imbalanced data or real-world variability.[82] Among classic datasets, the KDD Cup 1999 dataset serves as a foundational benchmark for intrusion detection systems (IDS), comprising approximately 4.9 million network connection records labeled with normal traffic and 24 types of intrusions simulated in a military network environment.[83] This dataset, derived from DARPA 1998 data processed into 41 features like protocol type and connection duration, has been extensively used to assess anomaly-based detection despite criticisms of its dated attack patterns and redundancy issues.[84]

The Numenta Anomaly Benchmark (NAB), introduced in 2015, provides 58 time series datasets across seven categories such as AWS CloudWatch and artificial data, totaling over 300,000 data points with labeled anomaly windows for streaming anomaly detection evaluation.[85] NAB emphasizes real-time performance by scoring algorithms on detection delay and false positives, making it suitable for time-series applications like sensor monitoring.[86] Complementing these, the Open Anomaly Benchmark (OAB) framework from LMU Munich offers a modular repository for unsupervised and semi-supervised anomaly detection on image and tabular data, including over 50 datasets with standardized splits and evaluation protocols to facilitate fair comparisons.[87]

| Dataset | Domain | Key Characteristics | Source |
|---|---|---|---|
| KDD Cup 1999 | Network Intrusion | 4.9M records, 41 features, 24 attack types + normal | UCI KDD Archive |
| NAB | Time Series | 58 datasets, labeled windows, streaming focus | Numenta GitHub |
| OAB | Image/Tabular | 50+ datasets, unsupervised/semi-supervised splits | LMU MCML |
Software Tools
A variety of open-source libraries and frameworks facilitate the implementation of anomaly detection algorithms, enabling researchers and practitioners to apply methods across diverse datasets. PyOD (Python Outlier Detection) is a comprehensive Python library, established in 2017, that supports over 50 detection algorithms, including classical techniques like local outlier factor and emerging deep learning-based models, making it suitable for scalable multivariate analysis.[95][96] The library's modular design allows for easy integration and benchmarking, with recent updates in PyOD 2.0 incorporating large language model-powered model selection to automate pipeline optimization.[97] A minimal usage sketch appears after the comparison table below.

Scikit-learn, a widely used machine learning library in Python, provides the Isolation Forest algorithm as a core tool for unsupervised anomaly detection, which isolates outliers by constructing random trees and measuring path lengths in the ensemble.[98] This implementation is efficient for high-dimensional data and supports customizable parameters such as the contamination rate to estimate the proportion of anomalies, facilitating rapid prototyping in applications like fraud detection.[99]

For Java-based environments, ELKI (Environment for Developing KDD-Applications Supported by Index-Structures) serves as an open-source data mining framework emphasizing unsupervised methods, including a suite of outlier detection algorithms such as distance-based and density-based approaches.[100] Its modular architecture supports custom algorithm development and evaluation on large-scale datasets, with built-in indexing for efficient distance computations.[101]

KNIME, an open-source platform for data analytics, offers visual workflows for anomaly detection, integrating nodes for time series analysis, control charts, and machine learning models to preprocess data and identify deviations without extensive coding.[102] Users can build end-to-end pipelines, such as those using autoencoders or statistical thresholds, leveraging extensions for predictive maintenance scenarios.[103]

In deep learning contexts, TensorFlow provides robust support for anomaly detection through its ecosystem, including autoencoder models and probabilistic layers for reconstruction-based outlier identification, with 2025 enhancements in TensorFlow 2.17 improving scalability for edge deployments.[104][105]

Commercial tools extend these capabilities with enterprise-grade integrations.
The Splunk Machine Learning Toolkit (MLTK) enables anomaly detection directly within Splunk's search processing language, using algorithms like DensityFunction for time series outliers and supporting scalable training on log data.[106] Version 5.5, released in 2025, introduces improved histogram-based detection for real-time alerting in security operations.[107]

Darktrace's AI-driven platform specializes in cybersecurity anomaly detection, employing self-learning models to baseline normal network behavior and flag subtle deviations in real time, without relying on predefined signatures.[108] Its autonomous response features integrate with existing infrastructure to mitigate threats proactively.[109]

Amazon SageMaker offers built-in anomaly detection via the Random Cut Forest (RCF) algorithm, an unsupervised method that builds ensembles of isolation trees for streaming data, integrable with AWS services like Kinesis for real-time inference.[110] This cloud-native integration supports automated model tuning and deployment for industrial IoT monitoring.[111]

| Tool Category | Example | Key Features | Primary Language/Platform |
|---|---|---|---|
| Open-Source Library | PyOD | 50+ algorithms, deep learning support | Python |
| Open-Source Library | scikit-learn Isolation Forest | Ensemble isolation, high-dimensional efficiency | Python |
| Framework | ELKI | Modular outlier methods, indexing | Java |
| Framework | KNIME | Visual workflows, time series nodes | GUI-based |
| Deep Learning Framework | TensorFlow | Autoencoders, edge scalability | Python |
| Commercial Toolkit | Splunk MLTK | Time series anomalies, SPL integration | Splunk |
| Commercial Platform | Darktrace | Self-learning AI, cybersecurity focus | Cloud/On-prem |
| Cloud Integration | AWS SageMaker RCF | Streaming detection, auto-tuning | AWS Cloud |
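To make the comparison concrete, a minimal PyOD sketch fitting two of the detectors discussed earlier; the synthetic data and parameter values are illustrative.

```python
import numpy as np
from pyod.models.lof import LOF
from pyod.models.iforest import IForest

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (500, 3)), rng.normal(6, 1, (5, 3))])

for model in (LOF(n_neighbors=20), IForest(random_state=0)):
    model.fit(X)
    # PyOD exposes binary labels (1 = outlier) and raw anomaly scores.
    print(type(model).__name__,
          "flagged:", int(model.labels_.sum()),
          "top score index:", int(model.decision_scores_.argmax()))
```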