
Fault detection and isolation

Fault detection and isolation (FDI) is a systematic engineering approach used to identify the occurrence of faults—unintended deviations from normal system behavior—and to pinpoint their specific type, location, and timing within complex dynamic systems, thereby enabling timely corrective actions to maintain safety and reliability.^[1] This methodology is essential in mission-critical applications such as aerospace, nuclear power plants, automotive systems, and industrial processes, where undetected faults can lead to catastrophic failures, economic losses, or safety hazards.^[1] FDI typically operates through two primary stages: fault detection, which monitors system performance to recognize anomalies as early as possible, and fault isolation, which determines the root cause by analyzing the affected components or subsystems.^[2] The core mechanism involves residual generation, where discrepancies between observed and predicted system outputs (based on mathematical models or data patterns) signal potential issues; these residuals are then evaluated for sensitivity to specific faults.^[3] Robustness to noise, disturbances, and model uncertainties is a key challenge, addressed through techniques like thresholding and statistical analysis.^[3] Broadly, FDI methods are categorized into model-based, model-free, and data-driven paradigms. 
Model-based approaches, such as state observers and Kalman filters, rely on analytical redundancy relations derived from system dynamics to generate structured residuals for precise isolation.^[1]^[3] Model-free methods employ physical redundancy, like multiple sensors, to compare outputs and detect inconsistencies without explicit modeling.^[1] Data-driven techniques, including artificial neural networks (ANNs), fuzzy logic, and machine learning algorithms, leverage historical data for pattern recognition, offering adaptability in nonlinear or uncertain environments.^[2]^[3] In modern contexts, FDI extends to fault detection, isolation, identification, and recovery (FDIIR), particularly in autonomous systems like self-driving vehicles, where perception sensors (e.g., LiDAR, cameras) must be monitored for environmentally induced faults such as noise or occlusion, with recovery strategies like software reconfiguration ensuring continued operation. Advances in intelligent algorithms, including structural analysis and binary integer linear programming for residual selection, enhance fault isolability and diagnostic efficiency in large-scale systems.^[2] Overall, FDI's evolution reflects increasing system complexity, with ongoing research emphasizing real-time implementation, integration with prognostics for fault prediction, and hybrid methods combining multiple paradigms for superior performance.^[1]^[2]

Overview

Definition and Objectives

Fault detection and isolation (FDI) is a subfield of control engineering focused on monitoring dynamic systems to identify anomalies, known as faults, and determine their specific locations or sources within the system.^[4] This process typically involves generating residuals (discrepancies between expected and observed system behaviors) to signal the presence of faults, followed by analytical techniques to pinpoint the affected components, such as sensors or actuators. Unlike fault identification, which aims to estimate the magnitude, type, or extent of a fault once detected, FDI emphasizes binary decisions on occurrence and localization to enable prompt intervention.^[4]

The primary objectives of FDI are to enable early detection of faults, thereby preventing catastrophic system failures and minimizing downtime in critical applications like aerospace and manufacturing processes. By isolating faults to specific subsystems, FDI facilitates targeted repairs or reconfigurations, reducing overall maintenance costs and enhancing operational safety.^[4] Furthermore, FDI integrates with feedback control systems to maintain reliability, allowing for fault-tolerant designs that sustain performance even under degraded conditions.

Key performance metrics for evaluating FDI systems include detection time, which measures the delay from fault onset to alert generation; false alarm rate, indicating the frequency of erroneous detections; isolation resolution, assessing the precision in identifying fault locations; and sensitivity thresholds, which define the minimum detectable fault size. These metrics ensure that FDI schemes balance responsiveness with robustness against noise and modeling uncertainties.^[4]

In the conceptual framework of FDI within feedback control loops, faults disrupt the closed-loop dynamics, such as by altering sensor measurements or actuator responses; residual-based monitoring flags these disruptions so that corrective action can restore nominal behavior.^[4] For instance, in a simple DC motor control system, a sensor fault might bias speed feedback, leading to unstable velocity tracking, while an actuator fault could reduce torque output, causing position deviations; FDI isolates these by comparing loop outputs against model predictions.^[5]
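Two of the metrics above, detection time and false alarm rate, can be computed directly from a logged alarm sequence. The following is a minimal sketch, assuming the true fault onset is known (as it would be in a simulation or benchmark); the function name `detection_metrics` and the example data are illustrative and not taken from any cited source.

```python
def detection_metrics(alarms, fault_onset, dt=1.0):
    """Return (detection_delay, false_alarm_rate) for one monitored run.

    alarms      -- list of booleans, one per sample (True = alarm raised)
    fault_onset -- index of the first faulty sample (assumed ground truth)
    dt          -- sampling period in seconds
    """
    # False alarm rate: fraction of fault-free samples that raised an alarm.
    pre_fault = alarms[:fault_onset]
    false_alarm_rate = sum(pre_fault) / len(pre_fault) if pre_fault else 0.0

    # Detection delay: time from fault onset to the first alarm at or after it.
    detection_delay = None  # None means the fault was never detected
    for k in range(fault_onset, len(alarms)):
        if alarms[k]:
            detection_delay = (k - fault_onset) * dt
            break
    return detection_delay, false_alarm_rate


# Illustrative run: one spurious alarm before the fault at sample 10,
# and the first true alarm three samples after onset.
alarms = [False] * 20
alarms[4] = True
alarms[13:] = [True] * 7
delay, far = detection_metrics(alarms, fault_onset=10, dt=0.1)
print(delay, far)
```

Lowering the detection threshold typically shortens the delay but raises the false alarm rate, which is the trade-off these metrics are meant to expose.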

Historical Background

The field of fault detection and isolation (FDI) emerged in the early 1970s within control theory, primarily driven by the need to enhance system reliability in aerospace and process industries. Richard V. Beard's 1971 dissertation introduced observer-based methods for failure accommodation in linear systems, laying foundational concepts for detecting and isolating faults through state estimation and self-reorganization. Complementing this, Howard L. Jones's 1973 thesis developed parity relations as a technique for failure detection in linear systems, enabling consistency checks on system measurements without explicit state observers.^[4] These early contributions established FDI as a distinct subfield, focusing on analytical methods to monitor dynamic systems proactively.

The 1980s marked significant advancements, particularly with the formalization of model-based FDI. Edward Y. Chow and Alan S. Willsky's 1984 paper introduced analytical redundancy relations, which utilized mathematical models to generate residuals for robust failure detection and isolation, decoupling fault signatures from system uncertainties.^[6] This work unified observer and parity approaches, establishing model-based FDI as a core paradigm and influencing subsequent designs for safety-critical applications. By the late 1980s, integration with robust control techniques addressed real-world uncertainties, as exemplified by Paul M. Frank's comprehensive survey in 1990, which reviewed analytical and knowledge-based redundancy methods while proposing solutions for fault decoupling under disturbances.

The 1990s saw FDI expand amid growing computational capabilities, with a rise in data-driven methods alongside model-based ones; Frank's ongoing contributions emphasized robustness to uncertainties, enabling applications in automotive and manufacturing sectors. The 2000s further consolidated the field through influential surveys, such as Rolf Isermann's 2006 book, which provided a systematic overview of fault diagnosis from detection to tolerance, highlighting process model-based estimation techniques.^[7]

From the 2010s onward, FDI shifted toward artificial intelligence integration, with machine learning methods enabling pattern recognition in complex data and, since around 2015, deep learning models such as convolutional neural networks (CNNs) applied to fault pattern detection.^[8] In the 2020s, emphasis has grown on real-time FDI for cyber-physical systems, supported by recent IEEE standards such as IEEE 7009-2024 for fail-safe design in autonomous systems, ensuring safety in interconnected environments.^[9]

Core Principles

Types of Faults

Faults in dynamic systems are broadly classified based on their nature, manifestation, location, persistence, and impact, providing a foundational taxonomy for fault detection and isolation (FDI) strategies. This classification helps in understanding how anomalies deviate from nominal system behavior, influencing the design of diagnostic approaches. Seminal works in FDI, such as those by Isermann, emphasize these categories to distinguish between external disturbances and internal degradations, enabling targeted monitoring in industrial processes, aerospace, and automotive systems.

Additive faults introduce an external offset or bias to system signals or states, typically appearing as superimposed disturbances independent of the system's operating point. For instance, a constant bias in a sensor reading exemplifies an additive fault, where the error adds a fixed value to the measured output regardless of the true signal magnitude. In contrast, multiplicative faults scale or alter the system's parameters proportionally to the operating conditions, such as gain degradation in an amplifier or efficiency loss in a motor, which multiplies the nominal response by a factor deviating from unity. This distinction is critical in model-based FDI, as additive faults affect residuals linearly while multiplicative ones introduce nonlinearities in the system dynamics.^[10]^[11]

Faults are further categorized by their temporal evolution: abrupt faults occur suddenly as step-like changes, often due to instantaneous events like component breakage or electrical short circuits, leading to immediate and significant deviations from normal operation. Incipient faults, however, develop gradually as drifting or ramp-like progressions, such as mechanical wear in bearings or slow corrosion in pipelines, which may remain subtle until accumulating to affect performance. These gradual faults pose unique challenges in early detection, as their signatures are often masked by process noise or variability.^[12]^[13]

Component-specific faults are localized to particular elements within the system. Sensor faults manifest as measurement inaccuracies, including bias, drift, or complete loss of signal, compromising the feedback loop in control systems. Actuator faults involve failures in control signal delivery, such as partial blockage in a valve or jamming in a servo motor, resulting in reduced or erroneous actuation. Process faults, also known as component or plant faults, arise from internal dynamic shifts, exemplified by sticking in mechanical components or parameter variations in chemical reactors, altering the core system equations. These categories (sensor, actuator, and process) form the basis for structured residual generation in FDI schemes.^[14]

Regarding persistence, permanent faults endure until corrective intervention, causing sustained degradation like a fully broken wire leading to total signal loss. Intermittent faults, conversely, appear sporadically and self-resolve, often triggered by transient conditions such as loose connections or thermal fluctuations, complicating isolation due to their non-reproducible nature. Environmental influences exacerbate these, with noise from electromagnetic interference acting as intermittent additive disturbances, while cyber-attacks in networked systems can induce both intermittent and permanent manipulations of sensor or actuator data.^[12]^[15]

Fault severity is assessed by the extent of system impact: catastrophic faults precipitate immediate shutdown or failure, such as a turbine blade fracture risking total system collapse and safety hazards. Degradative faults, on the other hand, cause progressive performance loss without instant breakdown, like gradual insulation wear in electrical components leading to reduced efficiency over time. This severity spectrum guides prioritization in FDI, where high-severity events demand rapid response to avert disasters.^[16]^[17]
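The additive/multiplicative and abrupt/incipient distinctions can be made concrete by injecting synthetic faults into a recorded signal, a common way to test diagnostic schemes. The sketch below is illustrative only: the helper `inject_fault`, its fault labels, and the sample values are assumptions introduced here.

```python
def inject_fault(signal, onset, kind, size):
    """Return a copy of `signal` with a synthetic fault starting at index `onset`.

    kind -- "bias"  : abrupt additive offset of `size`
            "drift" : incipient additive ramp growing by `size` per sample
            "gain"  : multiplicative fault, output scaled by the factor `size`
    """
    faulty = list(signal)
    for k in range(onset, len(faulty)):
        if kind == "bias":
            faulty[k] = signal[k] + size
        elif kind == "drift":
            faulty[k] = signal[k] + size * (k - onset + 1)
        elif kind == "gain":
            faulty[k] = signal[k] * size
        else:
            raise ValueError(f"unknown fault kind: {kind}")
    return faulty


healthy = [2.0, 2.0, 4.0, 4.0, 4.0, 4.0]
biased = inject_fault(healthy, onset=2, kind="bias", size=0.5)     # step change
drifting = inject_fault(healthy, onset=2, kind="drift", size=0.1)  # slow ramp
degraded = inject_fault(healthy, onset=2, kind="gain", size=0.8)   # 20% gain loss
print(biased, drifting, degraded, sep="\n")
```

The bias adds the same 0.5 whatever the signal level, whereas the error caused by the gain fault scales with the signal itself, which is why multiplicative faults are harder to separate from changes in operating point.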

Detection, Isolation, and Identification

Fault detection and isolation (FDI) encompasses three sequential processes: detection, which identifies the presence of a fault; isolation, which localizes the fault to specific components; and identification, which characterizes the fault's nature. These steps form the core of diagnostic frameworks in dynamic systems, relying on discrepancies between observed and expected behaviors to ensure timely system supervision.^[18]

Detection involves monitoring residuals, defined as the differences between actual system measurements and those predicted by a nominal model, to flag anomalies indicative of faults. Residuals are generated through analytical methods, such as state observers or parity equations, capturing deviations caused by faults in actuators, sensors, or processes. To distinguish faults from noise or modeling uncertainties, residuals are evaluated against predefined thresholds; for instance, a residual whose magnitude exceeds a threshold ε signals a fault occurrence, where ε is typically set from statistical bounds such as three standard deviations of the residual under fault-free conditions. This threshold-based approach provides robustness while limiting false alarms, as residuals remain close to zero in healthy operation but diverge significantly upon fault inception.^[19]

Isolation follows detection and aims to pinpoint the affected subsystem or component using structured fault signatures derived from residual patterns. Fault signatures represent unique combinations of residual responses to specific faults, often encoded in binary diagnostic matrices where rows correspond to residuals and columns to potential fault candidates; a '1' indicates sensitivity to a fault, while '0' denotes insensitivity. Decision logic, such as pattern matching or inference rules, compares observed residual vectors against these signatures to identify the fault location; for example, if only certain residuals deviate in a manner matching a predefined column, the corresponding component is isolated. This matrix-based method facilitates efficient isolation in multi-variable systems by leveraging redundancy in measurements.^[20]

Identification extends isolation by estimating the fault's quantitative attributes, including its magnitude, type (e.g., additive or multiplicative), and onset time. Techniques such as least-squares parameter estimation adapt system models to fit faulty data, yielding fault estimates without requiring full model inversion; for instance, an actuator fault magnitude can be approximated by minimizing the error between predicted and measured outputs. This process often integrates prior isolation results to focus estimation on candidate faults, providing actionable insights for maintenance or reconfiguration.^[19]^[21]

These processes are interdependent, with detection serving as a prerequisite for both isolation and identification, as undetected faults cannot be localized or characterized. In multi-fault scenarios, challenges arise from fault masking, where one fault's effects obscure another's, leading to ambiguous signatures and reduced isolability; simultaneous faults may produce composite residuals that mimic single-fault patterns, necessitating advanced decoupling strategies.^[18]

Evaluation of FDI performance hinges on criteria like fault detectability and isolability. Detectability assesses the minimum detectable fault size, defined as the smallest fault magnitude that produces a residual deviation exceeding the threshold despite disturbances, often quantified intrinsically by the fault's effect on system trajectories or performatively by detection delay metrics. Isolability evaluates the distinguishability of fault modes, requiring unique residual signatures for each fault to avoid confounding; for linear systems, this is ensured if fault directions in residual space are linearly independent. These criteria guide system design, ensuring faults are reliably addressed before propagation.^[22]^[23]
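The detection and isolation steps described above can be sketched in a few lines: thresholds are set at three standard deviations of fault-free residuals, the thresholded residual vector forms a binary pattern, and that pattern is matched against the columns of a signature matrix. The signature matrix, fault names, and residual values below are hypothetical and serve only as an illustration.

```python
from statistics import stdev


def three_sigma_thresholds(healthy_residuals):
    """One threshold per residual: 3 standard deviations of fault-free data."""
    return [3.0 * stdev(channel) for channel in healthy_residuals]


def fire_pattern(residual_vector, thresholds):
    """Binary detection pattern: 1 where |r_i| exceeds its threshold."""
    return tuple(int(abs(r) > eps) for r, eps in zip(residual_vector, thresholds))


def isolate(pattern, signatures):
    """Return the fault candidates whose signature equals the observed pattern."""
    return [name for name, sig in signatures.items() if tuple(sig) == pattern]


# Hypothetical signature matrix: three residuals, one column per fault candidate.
signatures = {
    "sensor_1": (1, 1, 0),
    "sensor_2": (0, 1, 1),
    "actuator": (1, 0, 1),
}

# Fault-free residual records (one list per residual) used to set thresholds.
healthy = [
    [0.01, -0.02, 0.00, 0.02, -0.01],
    [0.03, -0.01, -0.02, 0.01, -0.01],
    [-0.02, 0.02, 0.01, -0.01, 0.00],
]
thresholds = three_sigma_thresholds(healthy)

observed = [0.40, -0.35, 0.01]           # residuals r1 and r2 deviate, r3 does not
pattern = fire_pattern(observed, thresholds)
candidates = isolate(pattern, signatures)
print(pattern, candidates)
```

An all-zero pattern means no detection, and an empty candidate list for a nonzero pattern signals either an unmodeled fault or simultaneous faults whose combined signature matches no single column.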

Model-Based FDI

Analytical Redundancy Relations

Analytical redundancy refers to the use of mathematical models of a system to generate expected outputs from known inputs and compare them against actual measurements, thereby creating residuals that indicate discrepancies due to faults; this approach substitutes for physical sensor redundancy by exploiting the inherent relationships within the system model.^[24] In model-based fault detection and isolation (FDI), analytical redundancy enables the computation of parity relations, equations that must hold for fault-free operation, allowing faults to be detected when these relations are violated.^[25]

For linear time-invariant systems described by the state-space model \dot{x} = Ax + Bu + Ld, y = Cx + Du + Ff, where x is the state vector, u the input, y the output, d the disturbances, f the faults, and L and F the disturbance and fault distribution matrices, respectively, the parity vector is constructed to form residuals insensitive to inputs and disturbances but sensitive to faults. A basic residual is generated as r = y - \hat{y}, where \hat{y} = C\hat{x} + Du and \hat{x} is an estimate derived from the model, often simplified in static cases to r = y - Cx - Du under full state knowledge, though practical implementations use past inputs and outputs to eliminate unmeasured states.^[24] The parity vector w satisfies w(s)(y(s) - G_u(s)u(s)) = 0 in the fault-free case, where G_u(s) is the input-output transfer function and s the Laplace variable, ensuring residuals r = w(s)(y(s) - G_u(s)u(s)) decouple from nominal behavior.^[25]

The fault signature matrix, also known as the fault direction matrix, organizes residuals for isolation: its rows correspond to independent residuals, and columns to potential faults, with entries indicating the effect of each fault on each residual (e.g., nonzero if the fault affects the residual). Structured residuals are designed such that each fault produces a unique pattern of nonzero residuals, enabling isolation; for instance, if a fault in actuator f_1 affects only residual r_1 (signature [1, 0]^T), while f_2 affects only r_2 ([0, 1]^T), the observed residual vector uniquely identifies the fault.^[24]

Generation of parity relations can be direct (static), using algebraic elimination of states from the system equations for instantaneous residuals, or dynamic, incorporating transfer functions or delay operators for time-series data to handle system dynamics.^[25] In the dynamic approach, a stable left annihilator W(s) of the system transfer function matrix ensures residuals are zero under no faults, enhancing robustness to noise.

A representative example is fault detection in a DC motor drive system, modeled as \dot{x} = Ax + Bu + Ef, y = Cx, with E the fault distribution matrix, where analytical redundancy relations (ARRs) like R_m i_m + L_m \frac{di_m}{dt} + \mu_m \omega = v (with R_m and L_m the motor resistance and inductance, \mu_m the back-EMF constant, i_m the current, \omega the speed, and v the voltage) generate residuals sensitive to faults in resistance or inductance; the fault signature matrix then isolates, e.g., motor faults from gear faults by unique residual patterns.^[26]

Analytical redundancy offers the advantage of avoiding hardware duplication, relying instead on software-based model computations for cost-effective FDI, and provides explicit fault isolability through structured designs.^[24] However, it faces limitations in nonlinear systems, where deriving exact parity relations is challenging due to the lack of linear superposition, often requiring approximations or extensions like polynomial models, which may reduce robustness.^[25]
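The DC motor relation above can be turned into a residual by moving every term to one side, r = v - R_m i_m - L_m di_m/dt - \mu_m \omega, and evaluating it with nominal parameters on measured data. The following sketch simulates a motor whose winding resistance jumps partway through the run; all parameter values, the simple mechanical model, and the function names are assumptions chosen for illustration, not values from the cited work.

```python
# Nominal electrical parameters (assumed values): resistance, inductance, back-EMF constant
R_M, L_M, MU_M = 1.0, 0.5, 0.05
# Assumed rotor inertia and viscous friction for the simulated speed signal
J, B = 0.01, 0.1


def simulate_motor(n_steps, dt, v, fault_step, delta_r):
    """Euler simulation; the true resistance rises by delta_r from fault_step on."""
    i, w = 0.0, 0.0
    currents, speeds = [], []
    for k in range(n_steps):
        currents.append(i)
        speeds.append(w)
        r_true = R_M + (delta_r if k >= fault_step else 0.0)
        di_dt = (v - r_true * i - MU_M * w) / L_M
        dw_dt = (MU_M * i - B * w) / J
        i, w = i + dt * di_dt, w + dt * dw_dt
    return currents, speeds


def arr_residuals(currents, speeds, v, dt):
    """Evaluate r = v - R_m*i - L_m*di/dt - mu_m*w with nominal parameters."""
    residuals = []
    for k in range(len(currents) - 1):
        di_dt = (currents[k + 1] - currents[k]) / dt  # forward difference
        residuals.append(v - R_M * currents[k] - L_M * di_dt - MU_M * speeds[k])
    return residuals


DT, V = 0.001, 1.0
currents, speeds = simulate_motor(4000, DT, V, fault_step=2000, delta_r=0.5)
residuals = arr_residuals(currents, speeds, V, DT)
print(max(abs(r) for r in residuals[:2000]), residuals[2000], residuals[-1])
```

Because the simulation and the residual use the same discretization, the residual is numerically zero before the fault and equals the resistance change times the current afterwards. With real measurements, sensor noise and the numerical derivative would require filtering and a nonzero threshold.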

Observer-Based Approaches

Observer-based approaches to fault detection and isolation (FDI) in model-based frameworks utilize state observers to estimate system states from measurable outputs, generating residuals that signal deviations due to faults. These methods rely on the principle of analytical redundancy, where discrepancies between predicted and actual outputs indicate anomalies. The core idea involves designing an observer that asymptotically tracks the fault-free system dynamics, allowing fault effects to manifest in the estimation error.

The foundational observer for linear time-invariant systems is the Luenberger observer, proposed for state estimation in deterministic systems described by \dot{x} = Ax + Bu, y = Cx. The observer dynamics are given by \dot{\hat{x}} = A\hat{x} + Bu + L(y - C\hat{x}), where \hat{x} is the estimated state and L is the observer gain matrix chosen to ensure error convergence. The residual is typically defined as r = y - C\hat{x}, which converges to zero in the fault-free case if the observer is stable.

For fault detection, the observer is extended to handle unknown inputs such as disturbances and faults through unknown input observers (UIOs). In UIO designs, the observer structure decouples the effects of unknown inputs from the residual, ensuring sensitivity to faults like actuator or sensor malfunctions while remaining robust to process disturbances. For instance, residual generation for process faults involves modifying the observer to treat faults as additive terms in the state equation, where the residual r becomes non-zero only when faults occur, as derived from the error dynamics \dot{e} = (A - LC)e + E f, with e = x - \hat{x}, E the fault distribution matrix, and f the fault vector; stability is achieved by placing the eigenvalues of A - LC in the left half-plane via pole placement techniques for L. Seminal UIO formulations state existence conditions, such as rank constraints on the output and unknown-input distribution matrices, under which disturbance decoupling is possible.

Fault isolation in observer-based schemes employs structured configurations like the dedicated observer scheme (DOS), which uses a bank of observers, one dedicated to each potential fault hypothesis. In DOS, each observer is insensitive to all faults except the one it monitors, allowing isolation by identifying the unique residual that deviates from zero. Adaptive thresholds may be applied to residuals to account for modeling uncertainties, improving isolation reliability and reducing false alarms. The gain L for each observer is designed independently using LMI-based or pole placement methods to guarantee asymptotic stability and fault sensitivity.

Extensions to nonlinear systems incorporate sliding mode observers (SMOs) for enhanced robustness against Lipschitz nonlinearities and matched uncertainties. SMOs enforce a sliding surface on the output error, driving the output estimation error to zero in finite time; the injection signal needed to maintain sliding can then be used to reconstruct the fault. For example, in \dot{x} = f(x) + g(x)u + d(x) + E(x)f, the SMO adds a switching term \nu = \rho \frac{r}{|r| + \delta} to the correction, where r = y - C\hat{x} is the output error, the gain \rho exceeds the bound on the uncertainty, and the small constant \delta > 0 smooths the discontinuity to limit chattering, ensuring robust residual generation for fault isolation in applications like electric drives. These approaches maintain the error dynamics principles while addressing nonlinear fault propagation.
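The Luenberger residual generator and its error dynamics \dot{e} = (A - LC)e + Ef can be illustrated on a small example. The sketch below assumes a second-order plant with one measured state, an additive actuator fault, and a gain that places the eigenvalues of A - LC at -5 and -6; the plant matrices, fault size, and Euler discretization are all illustrative choices.

```python
def simulate_observer(t_end=10.0, dt=0.001, fault_time=5.0, fault_size=3.0):
    """Return the residual r = y - C*x_hat for an assumed 2-state plant.

    Plant:    x1' = x2,  x2' = -2*x1 - 3*x2 + u + f,  y = x1
    Observer: same model plus the correction L*(y - x1_hat), with L = [8, 4]^T,
              which gives A - LC the characteristic polynomial s^2 + 11s + 30.
    """
    l1, l2 = 8.0, 4.0
    x1, x2 = 1.0, 0.0      # true state
    xh1, xh2 = 0.0, 0.0    # observer state, deliberately wrong at t = 0
    n_steps = int(round(t_end / dt))
    fault_step = int(round(fault_time / dt))
    residuals = []
    for k in range(n_steps):
        u = 1.0                                   # constant input
        f = fault_size if k >= fault_step else 0.0
        r = x1 - xh1                              # residual (y = x1)
        residuals.append(r)
        # Plant derivatives (the fault enters through the actuator channel)
        dx1, dx2 = x2, -2.0 * x1 - 3.0 * x2 + u + f
        # Observer derivatives (fault-free model plus output injection)
        dxh1 = xh2 + l1 * r
        dxh2 = -2.0 * xh1 - 3.0 * xh2 + u + l2 * r
        x1, x2 = x1 + dt * dx1, x2 + dt * dx2
        xh1, xh2 = xh1 + dt * dxh1, xh2 + dt * dxh2
    return residuals


residuals = simulate_observer()
print(residuals[0], residuals[4999], residuals[-1])
```

The residual starts at the initial estimation error, decays toward zero while the system is healthy, and settles at f/30 = 0.1 after the fault, the steady-state solution of the error dynamics for this gain. A dedicated observer scheme would run several such observers in parallel, each designed to react to a different fault.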

Data-Driven FDI

Signal Processing Techniques

Signal processing techniques form a cornerstone of data-driven fault detection and isolation (FDI) by transforming raw time-series sensor data into forms that reveal fault-induced anomalies without relying on system models. These methods emphasize filtering, decomposition, and feature extraction to identify patterns such as transients, harmonic distortions, or non-stationary behaviors in signals from sensors like accelerometers or current probes. Widely applied in rotating machinery and electrical systems, they enable early detection of faults like bearing wear or winding shorts by analyzing vibration or electrical signatures directly.^[27]^[28]

In the time domain, moving average filters smooth noisy signals to highlight gradual or transient faults by averaging consecutive samples, reducing high-frequency noise while preserving fault-related trends. For instance, in brushless DC motor drives, a moving average filter processes back electromotive force signals to detect open-circuit faults by isolating deviations in the smoothed waveform. Wavelet transforms extend this capability for transient detection, decomposing signals into time-localized frequency components using orthogonal basis functions; Daubechies wavelets, known for their compact support and smoothness, excel at capturing abrupt changes like cracks in transmission lines or impacts in mechanical systems. These wavelets perform multi-resolution analysis, where higher-order Daubechies (e.g., db4) provide better approximation of sharp transients compared to simpler Haar wavelets, enabling isolation of fault events from healthy baselines.^[29]^[30]^[31]

Frequency-domain analysis employs the Fast Fourier Transform (FFT) to convert time signals into spectra, revealing harmonic shifts indicative of faults; in bearing diagnosis, FFT identifies characteristic peaks at fault frequencies (e.g., ball pass frequencies) amid vibration spectra, where inner-race defects produce sidebands around the carrier frequency. This approach quantifies fault severity by measuring amplitude increases in specific harmonics, as demonstrated in rolling element bearings where outer-race faults manifest as elevated energy at the outer-race ball pass frequency, a geometry-dependent multiple of the shaft rotation rate. Such spectral peaks allow isolation by comparing against healthy spectra, though FFT assumes stationarity and may smear transient events.^[32]^[33]

For non-stationary signals, time-frequency methods like the Short-Time Fourier Transform (STFT) and Continuous Wavelet Transform (CWT) provide joint representations, balancing time and frequency resolution. STFT segments the signal into overlapping windows and applies FFT to each, producing spectrograms that track evolving fault frequencies in varying-speed machinery; however, its fixed window limits resolution for wideband transients. CWT overcomes this with scalable wavelets, offering variable resolution suited to non-stationary vibrations, such as in internal combustion engines where it localizes fault impulses in both time and scale domains for precise isolation. In rolling bearings, CWT scalograms highlight energy concentrations at fault scales, outperforming STFT for early-stage detection under speed fluctuations.^[34]^[35]

Feature extraction condenses raw or filtered sensor signals into scalar metrics for threshold-based detection rules. Root Mean Square (RMS) measures signal energy to detect increased vibration levels from faults like misalignment; kurtosis quantifies peakedness, rising above 3 (its value for Gaussian noise) for impulsive faults such as bearing spalls; and crest factor, the ratio of peak to RMS, signals transients when it exceeds a threshold (e.g., values above about 6 are commonly treated as a sign of bearing damage rather than healthy operation). These time-domain features enable simple rule-based screening without probabilistic modeling (e.g., a kurtosis above about 4 flags a possible impulsive fault such as an inner-race defect), though they are often combined for robustness in applications like wind turbine monitoring. Such techniques process unmodeled sensor streams in real-time, facilitating FDI in industrial settings like chemical plants or power grids.^[36]^[37]
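The three scalar features named above are straightforward to compute. The sketch below evaluates them on a synthetic sinusoid and on the same sinusoid with periodic impulses added, a crude stand-in for the impacts produced by a localized bearing defect; the signals and impulse amplitude are illustrative assumptions, not measured data.

```python
import math


def rms(x):
    """Root mean square of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def kurtosis(x):
    """Non-excess kurtosis: fourth central moment over squared variance."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m4 / (m2 * m2)


def crest_factor(x):
    """Ratio of peak absolute value to RMS."""
    return max(abs(v) for v in x) / rms(x)


# Synthetic "healthy" vibration: 5 full cycles of a unit sinusoid.
n = 1000
healthy = [math.sin(2 * math.pi * 5 * k / n) for k in range(n)]

# Synthetic "faulty" vibration: the same signal with an impulse every 100 samples.
faulty = list(healthy)
for k in range(0, n, 100):
    faulty[k] += 8.0

for name, sig in (("healthy", healthy), ("faulty", faulty)):
    print(name, round(rms(sig), 3), round(kurtosis(sig), 2), round(crest_factor(sig), 2))
```

A pure sinusoid has a kurtosis of 1.5 and a crest factor of \sqrt{2}, both well below the Gaussian reference value of 3 for kurtosis, while the impulsive signal pushes both features sharply upward even though its RMS changes comparatively little.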

Statistical and Parity Methods

Statistical and parity methods represent a class of data-driven fault detection and isolation (FDI) techniques that leverage historical process data to establish statistical models for identifying deviations indicative of faults. These approaches assume that normal operating conditions produce data following known statistical distributions, such as multivariate Gaussian, allowing residuals or test statistics to signal anomalies when they exceed predefined thresholds. By focusing on empirical correlations and variances from data, these methods avoid reliance on explicit physical models, making them suitable for complex systems where full modeling is impractical.^[38]

In statistical process monitoring, multivariate data under normal conditions is often analyzed using Hotelling's T^2 statistic, which measures the squared Mahalanobis distance of a new observation from the process mean, accounting for data covariance. This statistic is defined as T^2 = (\mathbf{x} - \boldsymbol{\mu})^T \mathbf{S}^{-1} (\mathbf{x} - \boldsymbol{\mu}), where \mathbf{x} is the observation vector, \boldsymbol{\mu} is the mean vector estimated from historical data, and \mathbf{S} is the sample covariance matrix; under Gaussian assumptions, it follows a scaled F distribution when \boldsymbol{\mu} and \mathbf{S} are estimated from data (approaching a chi-squared distribution for large samples), enabling threshold setting for fault detection. For instance, in manufacturing processes, T^2 charts have been applied to detect shifts in multiple sensor readings, isolating faults by examining contributions from individual variables to the statistic. Complementing T^2, the chi-squared (\chi^2) test is used for monitoring squared residuals from model predictions, assuming Gaussian noise, where the test statistic Q = \mathbf{e}^T \mathbf{e} (with \mathbf{e} the residuals normalized to unit variance) follows a \chi^2 distribution to detect non-conforming residual patterns. These tools are foundational in multivariate statistical process control for early fault alerting in industrial settings.^[39]

Parity methods in a data-driven context generate parity vectors through dimensionality reduction techniques like principal component analysis (PCA), which decomposes historical data into principal components capturing normal variability, while residuals in the orthogonal space (non-principal directions) highlight faults. In PCA-based parity approaches, the parity vector is constructed as \mathbf{r} = \mathbf{P}^\perp \mathbf{y}, where \mathbf{P}^\perp is the projection matrix onto the residual subspace orthogonal to the principal components, and \mathbf{y} is the measurement vector; faults are isolated by identifying which variables contribute most to \|\mathbf{r}\|^2 exceeding thresholds. This method excels in high-dimensional systems, such as sensor networks, by reducing noise and isolating actuator or sensor faults through structured partial PCA on variable subsets. For example, in dynamic systems, PCA-derived parities have demonstrated effective isolation of multiple sensor failures by reconstructing fault signatures from residual patterns.^[40]^[41]

Likelihood ratio tests provide a hypothesis-testing framework for FDI, comparing the likelihood of data under a null hypothesis (H_0: no fault, normal operation) against an alternative (H_1: fault present) using the test statistic \Lambda = 2 \ln \left( \frac{L(H_1)}{L(H_0)} \right), which under Gaussian assumptions approximates a chi-squared distribution for threshold decisions. In chemical processes, such as distillation columns, these tests have been applied to detect catalyst degradation or valve sticking by modeling fault-induced shifts in process variables, achieving detection rates above 95% in benchmark simulations while isolating faults via maximized likelihood under specific fault hypotheses.^[42]^[43]

Covariance-based residuals in data-only setups focus on innovation sequences (differences between observed values and values predicted from empirical covariance structures) without requiring a full state-space model. These residuals are generated as \mathbf{\nu}(k) = \mathbf{y}(k) - \hat{\mathbf{y}}(k|k-1), with their covariance \mathbf{P}_\nu estimated directly from historical data to form test statistics like the generalized variance |\mathbf{P}_\nu|, tested against chi-squared thresholds to detect sensor or actuator anomalies. This approach, akin to Kalman filter innovations but purely data-driven, has been used in stochastic systems to monitor residual covariance deviations, ensuring robustness to process noise in applications like power grids.^[44]^[45]

For handling multiple faults, generalized likelihood ratio (GLR) tests extend standard likelihood ratios by jointly estimating fault parameters under H_1, using \Lambda_g = 2 \ln \left( \frac{\sup_{\theta \in \Theta_1} L(\theta)}{\sup_{\theta \in \Theta_0} L(\theta)} \right) to detect and isolate concurrent faults like multiple sensor biases. In complex systems, such as aerospace controls, GLR has isolated multi-fault scenarios with low false alarms by partitioning the parameter space, outperforming single-fault methods in simulations with overlapping fault signatures.^[46]^[47]
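Hotelling's T^2 from the definition above can be worked through by hand in two dimensions, where the covariance inverse has a closed form. The sketch below fits the mean and sample covariance to a handful of fault-free samples and scores new observations; the data and the control limit of 50 are illustrative assumptions (in practice the limit would come from an F or chi-squared quantile at a chosen significance level).

```python
def fit_normal_model(samples):
    """Mean and sample covariance (sxx, sxy, syy) of fault-free 2-D samples."""
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    sxx = sum((s[0] - mx) ** 2 for s in samples) / (n - 1)
    syy = sum((s[1] - my) ** 2 for s in samples) / (n - 1)
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def hotelling_t2(x, mean, cov):
    """T^2 = (x - mu)^T S^{-1} (x - mu), using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


# Two strongly correlated sensor readings recorded under normal operation.
normal_data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1), (6.0, 5.9)]
mean, cov = fit_normal_model(normal_data)

LIMIT = 50.0  # assumed control limit, for illustration only

t2_consistent = hotelling_t2((4.5, 4.5), mean, cov)  # follows the usual correlation
t2_faulty = hotelling_t2((4.5, 2.5), mean, cov)      # second sensor reads low
print(t2_consistent, t2_faulty, t2_faulty > LIMIT)
```

Both test points lie within the range of each individual variable, so univariate limits would pass them; only the second violates the learned correlation between the sensors, which is what T^2 detects.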

Artificial Intelligence in FDI

Machine Learning Techniques

Machine learning techniques in fault detection and isolation (FDI) leverage algorithms to identify patterns in sensor data or system features, enabling the classification or grouping of faults without relying on explicit physical models. These methods are particularly valuable in complex systems where data abundance allows for learning from historical or simulated fault scenarios, improving automation and reducing human intervention in diagnostics. Supervised and unsupervised approaches form the core, with ensembles enhancing robustness, while feature selection and validation strategies ensure practical deployment.^[48] Supervised machine learning methods, such as support vector machines (SVMs), classify fault types by constructing hyperplanes in high-dimensional feature spaces to separate normal operations from various fault classes, maximizing the margin between them for improved generalization. SVMs have been effectively applied in wind turbine FDI, where they detect and isolate actuator and sensor faults by training on vibration and operational data, achieving high classification accuracy in multi-fault scenarios.^[49] Similarly, k-nearest neighbors (k-NN) isolates faults by measuring proximity in feature space, assigning a data point to the fault class of its closest labeled neighbors, which proves useful for nonlinear industrial processes where fault boundaries are irregular. In process monitoring, k-NN rules have demonstrated robust isolation performance by adapting to data distributions without assuming underlying models. Unsupervised methods address scenarios with limited labeled fault data by identifying anomalies through inherent data structures. K-means clustering partitions data into clusters representing normal and anomalous behaviors, detecting faults as points deviating from the dominant normal cluster, which has been utilized in industrial process monitoring to group sensor readings and flag outliers indicative of faults. 
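The k-nearest-neighbour isolation rule described above can be sketched in a few lines. This is a minimal illustration using only the standard library; the two features, the class names, and the training points are hypothetical, and in practice features would be scaled to comparable ranges before distances are computed.

```python
import math
from collections import Counter


def knn_classify(sample, labeled_points, k=3):
    """Assign `sample` the majority label among its k nearest labeled points."""
    by_distance = sorted(labeled_points, key=lambda p: math.dist(sample, p[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical (vibration RMS, temperature rise) features with known classes.
training = [
    ((0.20, 1.0), "normal"), ((0.30, 1.2), "normal"), ((0.25, 0.9), "normal"),
    ((1.10, 1.1), "bearing_fault"), ((1.30, 1.4), "bearing_fault"),
    ((1.20, 0.8), "bearing_fault"),
    ((0.30, 4.0), "cooling_fault"), ((0.40, 4.5), "cooling_fault"),
    ((0.20, 3.8), "cooling_fault"),
]

label = knn_classify((1.15, 1.0), training)
```

Because the decision depends only on local neighbours, no shape is assumed for the boundary between fault classes, which is why the rule suits irregular fault regions.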
For novelty detection in normal operations, one-class SVM learns a boundary around typical data points in kernel feature space (a separating hyperplane in the standard formulation, or an enclosing hypersphere in the closely related support vector data description), flagging deviations as potential faults; this approach excels in engineering systems like machinery where only healthy data is abundant for training, enabling early isolation of unseen anomalies.

Ensemble techniques, such as random forests, aggregate multiple decision trees to enhance fault isolation robustness, particularly in handling imbalanced datasets common in FDI where fault events are rare. By employing bagging to create diverse trees from bootstrapped samples and random feature subsets, random forests reduce overfitting and improve decision boundaries for classifying multiple fault types in unsteady-state processes. This method has shown superior performance in diagnosing faults in chemical plants by ranking feature importance and, when combined with class weighting or balanced resampling, mitigating bias toward majority classes.

Feature selection is crucial in FDI to manage high-dimensional data from sensors, with recursive feature elimination (RFE) iteratively removing the least important features, as ranked by the fitted model, to retain discriminative ones. In wind turbine fault classification, RFE combined with classifiers like random forests selects key vibration and power features, improving detection accuracy by focusing on fault-relevant signals and reducing computational overhead.

Training paradigms in ML-based FDI emphasize generalization through cross-validation, which partitions data into folds to evaluate model performance across subsets, exposing overfitting to specific fault instances. For imbalanced fault data, metrics like precision and recall are prioritized over accuracy; precision measures the proportion of true faults among detected positives, while recall captures the fraction of actual faults identified, often averaged via macro or weighted schemes in k-fold validation to guide hyperparameter tuning in applications like turbine diagnostics.
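To make the point about accuracy versus precision and recall concrete, the sketch below computes per-class and macro-averaged scores by hand for one small, deliberately imbalanced test fold. The labels and predictions are made up for illustration.

```python
def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)}, computed one-vs-rest for each label."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[label] = (precision, recall)
    return scores


def macro_average(scores):
    """Unweighted mean of per-class precision and recall."""
    n = len(scores)
    return (sum(p for p, _ in scores.values()) / n,
            sum(r for _, r in scores.values()) / n)


# Hypothetical imbalanced fold: 8 normal samples and 2 faults, one fault missed.
y_true = ["normal"] * 8 + ["fault"] * 2
y_pred = ["normal"] * 8 + ["normal", "fault"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
scores = per_class_precision_recall(y_true, y_pred)
macro_precision, macro_recall = macro_average(scores)
```

Accuracy comes out at 0.9 even though half of the faults were missed (fault recall of 0.5), which is why recall on the fault class is usually the more informative figure when faults are rare.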

Deep Learning Techniques

Deep learning techniques in fault detection and isolation (FDI) leverage hierarchical neural architectures to automatically extract intricate fault patterns from raw sensor data, surpassing traditional methods by handling non-linearities and high-dimensional inputs without manual feature engineering. These approaches, particularly neural networks with multiple layers, enable end-to-end learning of fault representations, improving accuracy in complex systems like rotating machinery and pipelines. Seminal works have demonstrated their efficacy in industrial applications, where vast datasets from vibrations, acoustics, or time-series signals allow models to generalize across fault types.^[50] Convolutional neural networks (CNNs) are widely applied in FDI for processing image-like representations, such as spectrograms derived from vibration signals, to classify faults in mechanical components. The architecture typically includes convolutional layers that apply filters to detect local patterns like frequency peaks indicative of bearing wear, followed by pooling layers to reduce dimensionality and enhance translation invariance, culminating in fully connected layers with a softmax activation for multi-class fault isolation. For instance, in rotating machinery diagnostics, CNNs have achieved over 95% accuracy by directly learning from raw time-frequency data, avoiding the need for hand-crafted features.^[50]^[51] Recurrent neural networks (RNNs), particularly long short-term memory (LSTM) variants, excel in FDI tasks involving sequential data, where they capture temporal dependencies in time-series signals for early fault detection. LSTMs mitigate vanishing gradient issues in standard RNNs through gating mechanisms that selectively retain relevant historical information, making them suitable for monitoring dynamic processes like fluid leaks in pipelines. 
In such applications, LSTM models trained on pressure and flow data have detected anomalies with precision exceeding 90%, enabling isolation of fault locations by analyzing sequence patterns over time.^[52]^[53] Autoencoders provide an unsupervised framework for anomaly detection in FDI by learning compressed representations of normal system behavior, flagging deviations as potential faults through reconstruction error thresholds. The encoder compresses input data into a latent space, while the decoder reconstructs it; high errors on test data indicate faults, facilitating isolation without labeled examples. Variational autoencoders (VAEs) extend this by incorporating probabilistic modeling, where the latent space follows a prior distribution (e.g., Gaussian), allowing generative sampling for fault scenario simulation and probabilistic isolation in noisy industrial processes. VAEs have shown robust performance in process monitoring by quantifying uncertainty in fault likelihood, with fault detection rates around 83-87% in industrial applications.^[54]^[55] Transfer learning addresses data scarcity in FDI, particularly for rare faults, by fine-tuning pre-trained models like ResNet on domain-specific datasets. ResNet's residual connections enable deep architectures to learn transferable features from large-scale image tasks, which are adapted for vibration spectrograms, yielding accuracies up to 98% even with limited fault samples in mechanical systems. This approach mitigates overfitting in scarce-data scenarios, such as infrequent actuator failures, by initializing with ImageNet weights and retraining only the classifier layers.^[56]^[57] Hybrid models, such as CNN-LSTM architectures, integrate spatial and temporal feature extraction for spatio-temporal FDI challenges, like fault propagation in networked systems. CNN layers process local patterns in signal spectra, while LSTM layers model sequence evolution, enabling comprehensive isolation of dynamic faults. 
Training employs backpropagation to minimize loss functions tailored to FDI, such as cross-entropy for multi-class isolation:

L = -\sum_{i=1}^{C} y_i \log(\hat{y}_i)

where y_i is the true label for class i among C fault types, and \hat{y}_i is the predicted probability from softmax. Backpropagation computes gradients via the chain rule, updating weights layer-by-layer to optimize fault discrimination; in bearing diagnostics, these hybrids have shown improved accuracy by fusing vibration and temporal trends.^[58]^[59] Recent advances as of 2025 include autonomous AI agents for fault detection and self-healing in smart manufacturing systems, as well as enhanced AI integration for diagnosing faults in electric vehicles, improving real-time adaptability and prognostics.^[60]^[61]
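The cross-entropy loss defined above can be evaluated directly for a single sample. The sketch below is a minimal standard-library illustration; the logits and the three class names are hypothetical. It also computes the gradient of the loss with respect to the logits, which for softmax followed by cross-entropy is simply the predicted probabilities minus the one-hot label.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y_true, y_prob, eps=1e-12):
    """L = -sum_i y_i * log(y_hat_i); eps guards against log(0)."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, y_prob))


# Hypothetical outputs for C = 3 classes: [normal, inner-race, outer-race].
logits = [0.5, 2.0, -1.0]
one_hot = [0, 1, 0]  # the true class is "inner-race"

probs = softmax(logits)
loss = cross_entropy(one_hot, probs)
grad_logits = [p - y for p, y in zip(probs, one_hot)]  # dL/dlogits
```

This gradient is the quantity that backpropagation passes from the output layer back through the preceding LSTM and convolutional layers.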

Robust and Advanced FDI

Handling Uncertainties and Disturbances

In robust fault detection and isolation (FDI), uncertainties arise from various sources that can degrade the performance of diagnostic schemes, including parametric uncertainties due to modeling errors in system parameters, nonparametric uncertainties from unmodeled dynamics, and stochastic uncertainties manifested as measurement noise or process disturbances. These uncertainties must be explicitly addressed to ensure reliable residual generation and evaluation, as they can mimic fault signatures and lead to false alarms or missed detections. H∞ filtering provides a minimax approach to robust FDI by minimizing the worst-case energy gain from disturbances to residuals, thereby achieving disturbance rejection while maintaining sensitivity to faults. In this framework, residual generators are designed as H∞ filters that bound the influence of uncertainties, ensuring that the H∞ norm of the transfer function from disturbances to residuals remains below a prescribed level, often formulated as a standard filtering problem solvable via linear matrix inequalities (LMIs).^[62] This method extends basic observer-based techniques by incorporating robustness constraints, allowing for effective isolation even under bounded energy disturbances. 
Adaptive thresholds enhance robustness by establishing time-varying bounds on residuals that account for estimated disturbance levels, often integrated with unknown input decoupling in observer designs to eliminate the direct effect of disturbances on fault signatures.^[63] For instance, in linear parameter-varying (LPV) systems, interval observers generate adaptive thresholds that dynamically adjust based on uncertainty bounds, reducing false alarms without compromising fault detectability.^[64] This approach decouples unknown inputs—such as external disturbances—from the residual dynamics, ensuring that thresholds reflect only the residual's sensitivity to faults.^[65] Fuzzy logic integration addresses nonlinear uncertainties in FDI by employing membership functions and rule bases to model vague or imprecise knowledge about system behavior under disturbances.^[66] Takagi-Sugeno fuzzy observers, for example, approximate nonlinear dynamics with local linear models weighted by fuzzy rules, generating residuals robust to parametric variations and unmodeled nonlinearities while isolating faults through defuzzified decision logic.^[67] This method is particularly effective for systems where uncertainties defy precise quantification, allowing rule-based compensation for disturbances in real-time applications.^[68] Performance guarantees in robust FDI involve explicit trade-offs between fault sensitivity—measured by the minimum gain from faults to residuals—and disturbance robustness, often quantified using condition numbers that assess the ill-conditioning of residual generators under uncertainty.^[69] Seminal analyses show that optimizing the H−/H∞ index balances these objectives, with higher condition numbers indicating vulnerability to disturbances that could mask faults, thus guiding filter design to achieve specified detection rates while bounding false alarm probabilities.^[70] Such guarantees ensure that robust FDI schemes maintain efficacy across operating 
regimes, prioritizing high-impact metrics like the disturbance-to-fault sensitivity ratio over exhaustive benchmarks.^[71]
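The adaptive thresholds described in this section can be sketched as follows. The residual is evaluated by its RMS over a sliding window and compared against a threshold that scales with an estimate of the disturbance level. The gain `gamma` stands in for a disturbance-to-residual bound such as the H∞ level, and `floor` covers noise; both values, the window length, and the signals are assumptions made for illustration only.

```python
import math


def windowed_rms(signal, k, window):
    """RMS of `signal` over the last `window` samples ending at index k."""
    start = max(0, k - window + 1)
    chunk = signal[start:k + 1]
    return math.sqrt(sum(x * x for x in chunk) / len(chunk))


def adaptive_alarms(residual, disturbance, gamma, floor, window=5):
    """Return the indices k where residual RMS exceeds gamma * disturbance RMS + floor."""
    alarms = []
    for k in range(len(residual)):
        evaluation = windowed_rms(residual, k, window)
        threshold = gamma * windowed_rms(disturbance, k, window) + floor
        if evaluation > threshold:
            alarms.append(k)
    return alarms


# Hypothetical scenario: a disturbance burst during samples 10-19 and a
# fault-induced residual offset from sample 24 onwards.
disturbance = [0.1] * 10 + [1.0] * 10 + [0.1] * 10
residual = [0.05] * 10 + [0.5] * 10 + [0.05] * 4 + [0.55] * 6

alarms = adaptive_alarms(residual, disturbance, gamma=0.8, floor=0.05)
```

With these numbers the alarm is raised only from sample 24, when the fault appears. A fixed threshold equal to the quiet-period value (0.13) would already have been exceeded during the disturbance burst and produced false alarms.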

Integrated Fault-Tolerant Systems

Integrated fault-tolerant systems embed fault detection and isolation (FDI) mechanisms directly into control architectures to ensure continuous operation despite faults, enabling seamless transitions from nominal to degraded modes. Fault-tolerant control (FTC) strategies are categorized into passive and active paradigms. Passive FTC relies on robust controllers designed a priori to tolerate predefined faults without requiring real-time diagnosis, leveraging techniques like sliding mode control for inherent resilience against uncertainties.^[72] In contrast, active FTC incorporates FDI outputs to dynamically reconfigure the system, such as adjusting control laws based on fault severity, which enhances adaptability but demands faster computation.^[73] The integration of FDI with FTC facilitates real-time fault estimation that directly informs controller adjustments, minimizing performance degradation. In this framework, FDI modules estimate fault parameters—such as magnitude and location—using observer-based or data-driven methods, which are then fed into adaptive control gains to compensate for anomalies. A prominent example is in flight control systems with redundant actuators, where FDI detects partial failures in hydraulic or electro-mechanical actuators, enabling the controller to redistribute commands among healthy units while preserving stability during maneuvers. This approach has been demonstrated in simulations of civil aircraft.^[74] Reconfigurable control within integrated FTC often employs model predictive control (MPC) to adjust trajectories based on isolated faults, optimizing future states under constraints like actuator limits. Upon fault isolation, MPC reformulates its optimization problem to incorporate fault effects, such as reduced effector authority, ensuring constraint satisfaction and reference tracking. 
Stability is rigorously guaranteed through Lyapunov analysis, where a Lyapunov function—typically quadratic in state errors—is constructed to prove asymptotic convergence even under reconfiguration, with terminal constraints ensuring recursive feasibility. Such methods have shown robust performance in nonlinear systems.^[75] Hierarchical architectures in integrated FTC position the FDI layer above the control layer, allowing modular fault handling across system scales. The FDI layer processes raw sensor data for detection and isolation, passing refined fault signatures to the lower control layer for reconfiguration, which promotes scalability in complex systems like multi-agent networks. Voting mechanisms enhance reliability in multi-sensor fusion by aggregating outputs from redundant sensors—such as majority or weighted voting—to isolate faulty readings, thereby improving FDI accuracy in noisy environments. This structure has been applied in distributed systems.^[76] Compliance with standards like ISO 26262 is essential for automotive FTC implementations, mandating hazard analysis, fault injection testing, and ASIL-rated architectures to achieve functional safety up to ASIL D. The standard requires verifiable fault tolerance through metrics like diagnostic coverage exceeding 99% for high-risk items, guiding the design of redundant electronics and software partitioning.^[77] As of 2024-2025, recent developments in FTC for unmanned aerial vehicles include reinforcement learning-based approaches for quadrotor fault tolerance and distributed control for drone swarms, improving adaptability to actuator faults in dynamic environments.^[78]
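The voting mechanisms mentioned above can be illustrated with a small sketch for redundant analog sensors. It uses a mid-value (median) vote, a common stand-in for majority voting when readings are continuous rather than discrete, followed by a weighted average over the sensors that were not flagged. The sensor names, tolerance, and weights are hypothetical.

```python
from statistics import median


def vote(readings, tolerance):
    """Mid-value vote over redundant readings.

    Returns (voted_value, faulty_sensor_names). A sensor is flagged when it
    deviates from the median by more than `tolerance`.
    """
    voted = median(readings.values())
    faulty = sorted(name for name, value in readings.items()
                    if abs(value - voted) > tolerance)
    return voted, faulty


def weighted_vote(readings, weights, faulty=()):
    """Weighted average of the readings that were not flagged as faulty."""
    healthy = [name for name in readings if name not in faulty]
    total = sum(weights[name] for name in healthy)
    return sum(weights[name] * readings[name] for name in healthy) / total


# Hypothetical triplex pressure measurement; sensor C has drifted.
readings = {"A": 101.2, "B": 100.8, "C": 87.5}
voted, faulty = vote(readings, tolerance=2.0)
fused = weighted_vote(readings, {"A": 2.0, "B": 1.0, "C": 1.0}, faulty)
```

The median tolerates one arbitrary outlier among three sensors, so the drifted reading neither corrupts the voted value nor escapes isolation.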

Fault Recovery

Accommodation Strategies

Accommodation strategies in fault detection and isolation (FDI) focus on immediate mitigation of fault effects through software-based adjustments, enabling continued system operation without structural changes until repairs can be performed. These techniques typically activate post-fault isolation, substituting or compensating for faulty components to maintain stability and performance. For instance, in sensor faults, virtual sensors generate estimates to replace erroneous measurements, while actuator faults may be addressed by adapting control gains. Such approaches are essential in safety-critical systems, where rapid response minimizes downtime and prevents cascading failures. Sensor fault accommodation often employs virtual sensors, which use state estimation algorithms to substitute faulty readings with predicted values derived from system models and healthy sensor data. This method reconstructs the sensor output by integrating observer-based techniques, such as Kalman filters or sliding mode observers, to ensure continuity in feedback loops. In applications like wind turbines, virtual sensors have demonstrated effective fault hiding by maintaining control accuracy despite sensor degradation.^[79] Similarly, for grid-side converters, virtual sensors enable fault accommodation through estimation techniques.^[80] Actuator fault accommodation commonly involves gain scheduling, where control parameters are dynamically adjusted based on the identified fault magnitude to redistribute control effort among remaining actuators. This technique leverages linear parameter-varying (LPV) models to interpolate gains that compensate for partial actuator losses, ensuring robust performance under varying operating conditions. 
In aeroengine control, gain-scheduled robust controllers accommodate performance degradation by estimating fault impacts and optimizing thrust response.^[81] For networked systems, internal model control (IMC)-based PID architectures facilitate actuator fault tolerance through scheduled gains, minimizing overshoot for actuator faults of up to 50% loss of effectiveness.^[82]

Isolation-based responses utilize pre-computed lookup tables that map identified fault modes to predefined accommodation actions, allowing swift implementation in real-time systems. These tables store optimized control adjustments for common fault scenarios, derived from offline simulations or historical data, and are particularly effective in process industries where computational resources are limited. In chemical plants, lookup tables enable rapid switching to backup control laws upon fault isolation, such as adjusting valve positions to maintain reaction stability; studies report accommodation times under 1 second for multi-variable processes. This approach reduces reliance on online optimization, enhancing reliability in environments with high fault predictability.

Optimization-based methods, such as model predictive fault accommodation, minimize a cost function that balances fault impact against control effort. The objective is formulated as:

\min J = \sum_{k=1}^{N} \left( \| \hat{y}(k) - y_{ref}(k) \|^2_Q + \| \Delta u(k) \|^2_R \right) + \sum_{k=1}^{N} \| f(k) \|^2_P

where J incorporates predicted outputs \hat{y}, reference y_{ref}, control increments \Delta u, estimated fault f, and weighting matrices Q, R, P; this setup accommodates faults by constraining inputs to feasible sets while prioritizing performance recovery. In omni-directional mobile robots, such predictive schemes have achieved fault tolerance for wheel actuator failures, restoring trajectory tracking. For nonlinear systems like two-rotor aero-dynamical setups, neural network-enhanced MPC ensures accommodation without full reconfiguration.^[83]^[84] Despite their efficacy, accommodation strategies serve as temporary measures, bridging the gap to physical repairs, and are evaluated using metrics like response time from fault isolation to effective mitigation. Limitations include dependency on accurate fault estimation, potential performance degradation in severe faults, and increased computational load in optimization-based methods, which may exceed real-time constraints in resource-limited settings. In practice, targets for rapid accommodation in critical systems help avoid safety violations. A representative example is fault bypassing in hydraulic systems via parallel paths, where redundant flow routes activate upon detecting a blockage or leak in the primary actuator path. This software-mediated rerouting maintains pressure and flow continuity, as seen in heavy-duty mobile machinery, where parallel cylinder configurations rephase to compensate for single-path failures, preserving lifting capacity with minimal speed loss (typically <10%). Such strategies highlight the role of accommodation in extending operational life without hardware intervention.
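As a worked illustration of the cost function J above, the sketch below evaluates it for two candidate input sequences on a scalar first-order plant whose actuator has lost half of its effectiveness. The plant model, its coefficients, the weights, and the candidate sequences are all assumptions chosen to keep the arithmetic simple; a real predictive controller would optimize over the increments rather than compare two fixed candidates.

```python
def predict(y0, u0, delta_u, a, b, effectiveness):
    """Roll out y(k+1) = a*y(k) + b*effectiveness*u(k) for a list of input increments."""
    y, u, outputs = y0, u0, []
    for du in delta_u:
        u += du
        y = a * y + b * effectiveness * u
        outputs.append(y)
    return outputs


def accommodation_cost(y_pred, y_ref, delta_u, fault, q, r, p):
    """J = sum_k q*(y_pred - y_ref)^2 + r*delta_u^2 + p*fault^2 for scalar signals."""
    tracking = sum(q * (yp - yr) ** 2 for yp, yr in zip(y_pred, y_ref))
    effort = sum(r * du ** 2 for du in delta_u)
    fault_term = sum(p * f ** 2 for f in fault)
    return tracking + effort + fault_term


# Assumed plant and fault estimate (50% loss of actuator effectiveness).
a, b, effectiveness = 0.5, 1.0, 0.5
y_ref = [1.0, 1.0, 1.0]
fault = [0.5, 0.5, 0.5]

nominal_plan = [0.5, 0.0, 0.0]      # increments that would suit a healthy actuator
compensated_plan = [1.0, 0.0, 0.0]  # larger step to offset the lost effectiveness

y_nominal = predict(0.0, 0.0, nominal_plan, a, b, effectiveness)
y_compensated = predict(0.0, 0.0, compensated_plan, a, b, effectiveness)
j_nominal = accommodation_cost(y_nominal, y_ref, nominal_plan, fault, q=1.0, r=0.1, p=1.0)
j_compensated = accommodation_cost(y_compensated, y_ref, compensated_plan, fault, q=1.0, r=0.1, p=1.0)
```

The compensated plan gives the lower cost because the improved tracking outweighs the extra control effort. The fault term is identical for both plans in this sketch, since the fault estimate is treated as given rather than as a decision variable.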

System Reconfiguration

System reconfiguration in fault detection and isolation (FDI) involves dynamically altering the system's architecture after a fault has been detected and isolated to restore operational functionality and maintain performance objectives. This process contrasts with mere accommodation by emphasizing structural changes, such as rerouting resources or switching components, to adapt to the degraded state. Effective reconfiguration minimizes downtime and ensures the system continues to meet safety and reliability requirements in critical applications like aerospace and robotics.^[85]

Hardware reconfiguration primarily relies on redundancy mechanisms to switch to backup components upon fault occurrence. In avionics systems, failover techniques enable seamless transition to redundant hardware, such as spare actuators or processors, to prevent mission failure. For instance, integrated modular avionics (IMA) employs multiprocessor reconfiguration algorithms that isolate faulty modules and redistribute tasks across healthy ones, enhancing overall fault tolerance. A prominent voting scheme is triple modular redundancy (TMR), where three identical hardware modules process inputs in parallel, and a majority vote determines the output, effectively masking single-point failures with a reliability improvement factor of up to 10^6 in radiation-prone environments. TMR was used in flight computers such as the Saturn V launch vehicle digital computer, ensuring continued operation despite transient faults.^[86]^[87]^[88]

Software reconfiguration focuses on updating control algorithms without hardware changes, often through adaptive mechanisms that modify system behavior in real-time. Adaptive control laws, updated via online parameter identification, allow the system to compensate for faults by recalibrating gains or switching to alternative controllers.
In cabin pressure control systems, simple adaptive control (SAC) reconfigures by incorporating a parallel feedforward compensator, maintaining stability during actuator partial failures (e.g., 50% loss) or sensor drifts without requiring explicit fault models. In robotics, reconfiguration handles joint failures by redistributing tasks among redundant degrees of freedom; for a 2-DOF manipulator with a locked joint, the control law adapts to preserve workspace functionality using kinematic redundancy. Hybrid fault-tolerant control (FTC) in industrial robots combines passive robustness with active reconfiguration, improving recovery in multi-joint scenarios.^[89]^[90] Hybrid approaches integrate hardware and software by modeling the system as a graph, enabling topology changes for optimal fault recovery. Graph-based models represent components as nodes and connections as edges, allowing algorithms to identify and reroute paths post-fault. For industrial plants, directed weighted graphs simulate fault propagation and use genetic algorithms to activate switch nodes, minimizing cascade effects while preserving service capacity (e.g., maintaining 80-90% of total service in node failure simulations). Dijkstra's algorithm computes shortest paths for rerouting in sparse topologies, ensuring efficient resource allocation; in a 100-node network, it reduces reconfiguration actions to 1-3 flips, boosting node survival to 99%. These methods leverage redundancy at both levels, such as combining TMR hardware with adaptive software overlays.^[91] Key challenges in system reconfiguration include managing time delays during transitions and guaranteeing post-reconfiguration stability. Detection and switching delays can destabilize the system, particularly in switched control architectures where short dwell-times conflict with closed-loop stability requirements; delays exceeding 10-20% of the system time constant may lead to oscillations or divergence. 
Stability assurance often employs invariant sets, which define regions in state space where trajectories remain confined post-reconfiguration, ensuring bounded errors and convergence. For switching systems, maximal controlled invariant sets are computed offline to verify safety specifications, with online set-membership tests minimizing computational overhead while providing global stability guarantees.^[92]^[93] Performance is evaluated using metrics like recovery success rate and post-reconfiguration degradation. Recovery success rate measures the percentage of faults where full or partial functionality is restored, often exceeding 95% in redundant avionics with TMR but dropping to 70-80% in non-redundant robotics without timely reconfiguration. Post-reconfiguration performance degradation quantifies losses in metrics such as tracking error or throughput; these metrics highlight the trade-off between rapid recovery and sustained efficiency, guiding design for minimal impact (e.g., <5% degradation in high-reliability applications).
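The graph-based rerouting discussed in this section can be sketched with Dijkstra's algorithm on a small weighted graph, recomputing the shortest path after a node has been isolated as faulty. The plant topology, node names, and edge weights below are invented for illustration.

```python
import heapq


def shortest_path(graph, source, target, failed=frozenset()):
    """Dijkstra over a dict-of-dicts graph, skipping nodes listed in `failed`.

    Returns (cost, path), or (float("inf"), []) when the target is unreachable.
    """
    queue = [(0.0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited and neighbor not in failed:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []


# Hypothetical flow network: the pump feeds the reactor through valve_a,
# with a longer route through valve_b and a bypass line.
plant = {
    "pump": {"valve_a": 1, "valve_b": 2},
    "valve_a": {"reactor": 1},
    "valve_b": {"bypass": 1},
    "bypass": {"reactor": 2},
    "reactor": {},
}

nominal = shortest_path(plant, "pump", "reactor")
rerouted = shortest_path(plant, "pump", "reactor", failed={"valve_a"})
isolated = shortest_path(plant, "pump", "reactor", failed={"valve_a", "valve_b"})
```

The increase in path cost after rerouting gives a simple measure of post-reconfiguration degradation, and an unreachable target corresponds to a fault that the available redundancy cannot cover.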

Applications

Industrial and Mechanical Systems

In industrial and mechanical systems, fault detection and isolation (FDI) plays a crucial role in maintaining operational efficiency, particularly through mechanical fault diagnosis targeting common failure points such as gearboxes and bearings. Vibration analysis is a primary technique for diagnosing gearbox faults, employing time-domain methods like waveform analysis and statistical indices (e.g., kurtosis and crest factor) to detect anomalies such as gear wear or misalignment, as well as frequency-domain approaches including Fourier transforms to identify characteristic fault frequencies. For bearing faults, which account for over 41% of machine breakdowns, vibration techniques such as root mean square (RMS) measurements, crest factor analysis, and spectral envelope methods enable early detection of defects like inner race cracks by isolating impulsive signals from background noise.^[94] A representative case study in predictive maintenance involves wind turbines, where FDI systems using vibration monitoring for pitch system faults—responsible for up to 20% of downtime—have demonstrated reductions in unplanned outages by up to 12% through timely fault isolation and accommodation strategies.^[95]^[96] In process industries like chemical plants, model-based FDI approaches are widely applied to detect and isolate faults such as valve leaks, which can compromise safety and efficiency. 
These methods generate residuals from discrepancies between observed and predicted system behavior, using techniques like neural networks trained on valve performance metrics (e.g., rise time, overshoot) to diagnose actuator faults including diaphragm leakage or supply pressure issues without additional hardware.^[97] For instance, in a fluid catalytic cracking (FCC) pilot plant, a causal model-based diagnostic module employing fuzzy logic and hitting-set algorithms isolated valve leaks between the stripper and column in 5 minutes, compared to 50 minutes via manual operator assessment, enhancing process reliability.^[98] Statistical methods complement these in batch processes, where multivariate statistical process control (MSPC) techniques, such as principal component analysis (PCA) and partial least squares (PLS), monitor trajectory deviations to detect faults like inconsistent reaction rates, enabling isolation in chemical batch reactors by aligning historical data phases.^[99] Implementation of FDI in industrial settings often involves integration with supervisory control and data acquisition (SCADA) systems, where FDI modules process real-time sensor data to generate alarms and isolate faults, as demonstrated in longwall mining machinery where SCADA-enabled FDI reduced downtime by identifying shearer drum overloads.^[100] A notable real-world example is Siemens' deployment of FDI-enhanced systems in factories post-2010, such as the Amberg Electronics Plant, which uses AI-driven fault diagnostics integrated into production lines to achieve near-zero defect rates and predictive maintenance, supporting Industry 4.0 transitions.^[101]^[102] Challenges in these environments include sensor degradation due to harsh conditions like high temperatures, corrosive chemicals, and mechanical vibrations, which can introduce false positives in FDI signals and necessitate robust, high-temperature electronics for reliable operation.^[103] However, Industry 4.0 advancements with 
Internet of Things (IoT) data mitigate these by enabling distributed sensing and cloud-based analytics, improving FDI accuracy through real-time fusion of multi-sensor inputs and reducing fault propagation in manufacturing chains.^[104] Quantitative impacts of FDI in heavy machinery highlight significant cost savings from reduced unplanned outages, with predictive approaches yielding 15-30% lower maintenance expenses by minimizing reactive repairs and extending asset life, as evidenced in sectors like mining and power generation.^[105]^[106]
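The time-domain vibration indices named earlier in this section (RMS, crest factor, and kurtosis) are straightforward to compute. The sketch below applies them to two synthetic records: a pure sine wave and the same sine with periodic impacts added as a crude stand-in for a localized bearing defect. The signals are artificial and serve only to show how the indices respond to impulsive content.

```python
import math


def vibration_indices(signal):
    """Return the RMS, crest factor and kurtosis of a vibration record."""
    n = len(signal)
    mean = sum(signal) / n
    rms = math.sqrt(sum(x * x for x in signal) / n)
    crest_factor = max(abs(x) for x in signal) / rms
    variance = sum((x - mean) ** 2 for x in signal) / n
    kurtosis = sum((x - mean) ** 4 for x in signal) / (n * variance ** 2)
    return {"rms": rms, "crest_factor": crest_factor, "kurtosis": kurtosis}


# Synthetic records: 20 periods of a sine, then the same sine with an impact
# of amplitude 4 added every 100 samples.
healthy = [math.sin(2 * math.pi * k / 50) for k in range(1000)]
damaged = [x + (4.0 if k % 100 == 0 else 0.0) for k, x in enumerate(healthy)]

healthy_indices = vibration_indices(healthy)
damaged_indices = vibration_indices(damaged)
```

A pure sine has a kurtosis of 1.5 and a crest factor close to 1.41, while the record with impacts shows markedly higher values of both even though its RMS changes only modestly. This is the reason impulsive indices are favored for early bearing-fault detection.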

Aerospace and Automotive Systems

In aerospace systems, fault detection and isolation (FDI) is critical for maintaining operational safety in high-stakes environments like engine health monitoring, where model-based methods analyze sensor data to identify anomalies in gas turbine performance. For instance, General Electric employs ensemble-based hierarchical classifiers for diagnosing and isolating faults in Frame 9 gas turbines, leveraging time-series data from sensors to detect degradation in components such as compressors and turbines.^[107] Similarly, NASA-developed architectures use model-based approaches for gas path FDI in aircraft engines, processing streaming data through Kalman filters and residual generation to achieve precise isolation of faults like sensor biases or actuator failures with minimal false alarms.^[108] These techniques ensure early detection, often within sub-second timelines, to support real-time decision-making during flight.^[109] Flight control systems in modern aircraft incorporate redundancies and fault-tolerant control (FTC) to handle FDI seamlessly. 
The Boeing 787's primary flight computers feature triple-redundant fly-by-wire architecture, where faults in actuators or sensors trigger automatic reconfiguration to backup channels, maintaining stability even under multiple failures.^[110] This design achieves high reliability in fault isolation for critical flight phases, aligning with FAA and EASA certification requirements that mandate robust FDI validation through extensive simulation and flight testing to ensure system integrity under 14 CFR Part 25 and CS-25 standards.^[111] A notable case is the Airbus A380's implementation in the 2000s, where electrohydrostatic actuators (EHAs) in the hydraulic systems enable fault recovery by switching to electrical backups upon detection of pressure losses or leaks.^[112]

In automotive applications, FDI focuses on real-time diagnostics for safety-critical components, with On-Board Diagnostics II (OBD-II) standards enabling isolation of faults in brakes and engines through standardized diagnostic trouble codes (DTCs) and protocols like ISO 15765-4 (CAN).^[113] OBD-II systems monitor parameters such as brake pressure and engine misfires, triggering isolation via ECU analysis to comply with emissions and safety regulations, often detecting issues in under a second to avert accidents.^[114] Advanced driver-assistance systems (ADAS) integrate deep learning for sensor fault detection, using neural networks to identify failures in cameras or radars, as seen in Tesla's Autopilot evolutions since 2018, where machine learning models process fleet data to enhance isolation accuracy and mitigate risks like phantom braking.^[115] Compliance with ISO 26262 governs these systems, assigning Automotive Safety Integrity Levels (ASIL) from A to D based on hazard severity, exposure, and controllability; for example, brake FDI typically requires ASIL D, demanding quantitative hardware metrics such as ≥99% diagnostic coverage to control random hardware failures, alongside process requirements that address systematic failures.^[116]

Electric vehicle (EV) battery
management exemplifies 2020s FDI advancements, with model-based and data-driven methods isolating cell-level faults to prevent thermal runaway. Techniques such as electrochemical modeling and machine learning detect voltage or temperature imbalances, and battery management system (BMS) algorithms isolate the faulty modules before the fault propagates, with sub-second response times that are critical for passenger safety.^[117] These approaches are applied within the ISO 26262 framework to ensure fault-tolerant operation under high-stress conditions such as fast charging.^[118]

Overall, aerospace and automotive FDI prioritizes sub-second detection and high reliability to meet stringent regulatory demands, enabling timely recovery in dynamic transport scenarios.^[119]
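The Kalman-filter residual generation mentioned in the engine monitoring example can be sketched in a few lines. The Python snippet below is a minimal illustration, not any vendor's or agency's implementation: it assumes a scalar, slowly varying signal modeled as a random walk, and the function name `kalman_residual_monitor`, the noise variances `q` and `r`, and the threshold are all hypothetical choices. A sample raises an alarm when its normalized innovation squared exceeds the threshold (9.0 corresponds to a 3-sigma test).

```python
def kalman_residual_monitor(measurements, q=1e-4, r=0.04, threshold=9.0):
    """Return indices of samples whose Kalman innovation fails a 3-sigma test.

    Assumed model (illustrative): x[k+1] = x[k] + w, z[k] = x[k] + v,
    with process variance q and measurement variance r.
    """
    if not measurements:
        raise ValueError("need at least one measurement")
    x, p = measurements[0], 1.0  # initial state estimate and variance
    alarms = []
    for k, z in enumerate(measurements[1:], start=1):
        p_pred = p + q           # predict: the state is assumed constant
        residual = z - x         # innovation: measured minus predicted output
        s = p_pred + r           # innovation variance
        if residual * residual / s > threshold:
            alarms.append(k)     # fault detected at sample k
            p = p_pred           # skip the update so the fault cannot
            continue             # contaminate the state estimate
        gain = p_pred / s        # Kalman gain
        x = x + gain * residual
        p = (1.0 - gain) * p_pred
    return alarms


if __name__ == "__main__":
    # Samples 5 and 6 carry an injected sensor bias of about +2.
    data = [1.0, 1.1, 0.9, 1.05, 0.95, 3.0, 3.1, 1.0]
    print(kalman_residual_monitor(data))  # prints [5, 6]
```

This sketch covers detection only. Isolation would typically require a bank of such filters, or structured residuals, each made sensitive to a different fault.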

References

[1]
(PDF) Overview on Fault Detection and Isolation - ResearchGate
Jul 20, 2022 · This paper presents a detailed survey of fault detection and isolation methods and reviews of scientific researches in this field.
[2]
https://doi.org/10.3390/s24082656
[3]
Comparison of fault detection and isolation methods: A review
[4]
https://doi.org/10.1016/0005-1098(90)90018-D
[5]
[PDF] Adaptive Fault Detection and Isolation for DC Motor Input and Sensors
This paper is devoted to the actuator and sensor adaptive fault detection and isolation for armature controlled direct current motors. Unknown input observers ...
[6]
A Review of Parity Space Approaches to Fault Diagnosis
This paper reviews the state of the art in fault detection and isolation for dynamic systems, based on the parity space concept.
[7]
[PDF] Analytical Redundancy and the Design of Robust Failure Detection ...
Using the concept of parity relations, residuals can be generated in a number of ways and the design of a robust residual generation process can be formulated ...
[8]
Fault-Diagnosis Systems - SpringerLink
The book covers fault detection methods, including signal analysis and process models, and fault-diagnosis methods like classification and inference, and fault ...
[9]
Deep Learning Techniques in Intelligent Fault Diagnosis and ... - NIH
This paper tries to give a comprehensive guideline for further research into the problem of intelligent industrial FDP for the community.
[10]
Model-based fault-detection and diagnosis – status and applications
Additive faults appear, e.g., as offsets of sensors, whereas multiplicative faults are parameter changes within a process. Now lumped-parameter processes are ...
[11]
Model-based fault-detection and diagnosis – status and applications
Model-based methods of fault-detection were developed by using input and output signals and applying dynamic process models.
[12]
[PDF] MODEL-BASED FAULT DETECTION AND DIAGNOSIS
Hence, parity equations are suitable for the detection of additive faults. They are simpler to design and to implement than output observer-based approaches and ...
[13]
A survey and classification of incipient fault diagnosis approaches
Incipient faults almost occur gradually at a low rate in systems and usually are unnoticeable during their early stages. If diagnostic tools or proper ...
[14]
(PDF) Model-based Fault Diagnosis in Dynamic Systems Using ...
The proposed fault diagnosis scheme has been tested on an real industrial chemical process in the presence of sensor, actuator and component faults. The ...
[15]
Detection of Additive and Multiplicative Faults - Parity Space vs ...
Faults which appear in technical processes can often be described as additive or multiplicative faults with respect to the process model.
[16]
Fault Detection and Severity Level Identification of Spiral Bevel ...
This study uses AI techniques, specifically ANN and KNN, to detect and identify the severity of faults in spiral bevel gears under different operating ...
[17]
[PDF] Fault Diagnosis and Fault Severity Prediction Based on ...
While less severe faults might merely degrade performance, affecting the quality and quantity of output, serious faults can lead to complete system shutdowns,.
[18]
1. Introduction - SpringerLink
Since without fault detection it is impossible to perform fault isolation and, consequently, fault identification, all efforts regarding the improvement of.
[19]
https://doi.org/10.1016/S1474-6670(17)50824-1
[20]
Improved diagnosis of hybrid systems using instantaneous ...
One approach to quantitative model-based fault detection and isolation (FDI) is based on analytical redundancy relations (ARRs) and fault signatures.
[21]
Fault diagnosis of machines via parameter estimation and ...
The paper describes a general methodology for machines and other processes by using few measurements, dynamic process and signal models and parameter estimation
[22]
https://doi.org/10.3166/ejc.7.625-637
[23]
On Fault Detectability and Isolability - ScienceDirect
Several existing detectability and isolability definitions are reviewed. It is argued that two types of definitions have to be distinguished.
[24]
Fault diagnosis in dynamic systems using analytical and knowledge ...
The paper reviews the state of the art of fault detection and isolation in automatic processes using analytical redundancy, and presents some new results.
[25]
[PDF] Generation of Analytical Redundancy Relations for FDI purposes
Aug 13, 2008 · This paper presents the fundamental results obtained in this area. Keywords: redundancy relations, fault detection and isolation, model-based ...
[26]
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12158299/
[27]
A Review of Signal Processing Techniques for Detection and ...
This paper reviews signal processing techniques for fault detection, including Machine learning (ML), AI, Wide Area Measurement (WAM), and Phasor Measurement ...
[28]
A review of signal processing for fault diagnosis in systems with ...
This paper reviews the performance of 133 variations of signal processing techniques, aiming to highlight those that present the most significant potential for ...
[29]
A Fault Diagnosis Method in BLDC Motor Drive Systems Using ...
A Fault Diagnosis Method in BLDC Motor Drive Systems Using Moving Average Filter for Back Electromotive Force Signal Processing. Abstract: BLDC has the ...
[30]
Wavelet based rule for fault detection - ScienceDirect.com
The paper presents wavelet based fault detection method and the analysis of error rates. The proposed method is based on signal representations in Daubechies ...
[31]
(PDF) Fault Isolation Based on Wavelets Transform - ResearchGate
This paper evaluates how wavelet transform can be used to detect and isolate particular faults. The diagnostic method that is proposed is based on the ...
[32]
Frequency Energy Analysis in Detecting Rolling Bearing Faults
This research explores a method of classifying rolling bearing faults utilizing the total energy gathered from the Power Spectral Density (PSD) of a Fast ...
[33]
Multi-fault diagnosis of ball bearing using FFT, wavelet energy ...
The analysis results from ball bearing signals with six different faults in various working conditions show that the diagnosis approach based on using wavelet ...
[34]
Continuous wavelet transform technique for fault signal diagnosis of ...
A fault signal diagnosis technique for internal combustion engines that uses a continuous wavelet transform algorithm is presented in this paper.
[35]
A Comparative Study of Time–Frequency Representations ... - MDPI
Spectrograms are generated using the STFT, which analyzes non-stationary signals ... This makes CWT particularly well-suited for analyzing non-stationary signals ...
[36]
A Review of Feature Extraction Methods in Vibration-Based ... - MDPI
This paper presents an empirical study of feature extraction methods for the application of low-speed slew bearing condition monitoring.
[37]
[PDF] Chapter 4 REVIEW OF VIBRATION ANALYSIS TECHNIQUES
domain techniques, such as RMS, Crest Factor and Kurtosis, provide good bearing fault detection capabilities if the signal-to-noise ratio is sufficiently high.
[38]
Subspace method aided data-driven design of fault detection and ...
This paper deals with data-driven design of fault detection and isolation (FDI) systems. The basic idea is to identify a primary form of residual generators ...
[39]
[PDF] Statistical Fault Detection with Applications to IMU Disturbances
Often are sums of squared Gaussian residuals used as a test statistic and the result will then be chi-square distributed if the residuals are white and Gaussian ...
[40]
Sensor and actuator fault isolation by structured partial PCA with ...
Partial PCA based on the link between PCA and parity relations is a useful method in fault isolation. By performing PCA on subsets of variables, a set of ...
[41]
Fault Detection and Isolation in Dynamic Systems Using Principal ...
This paper proposes a decomposition of a global system in different subsystems by means of PCA framework, in respect to the IFATIS european project (EU-IST-2001 ...
[42]
Kernel Generalized Likelihood Ratio Test for Fault Detection of ...
In this paper, we develop an improved fault detection (FD) technique in order to enhance monitoring abilities of nonlinear chemical processes.
[43]
https://ieeexplore.ieee.org/document/8096094
[44]
Testing the covariance matrix of the innovation sequence with ...
Aug 6, 2025 · A new statistical fault detection technique based on the Kalman filter innovation covariance testing is proposed. The generalized variance ...
[45]
Fault detection of uncertain chemical processes using interval partial ...
Therefore, this work addresses the problem of fault detection of uncertain chemical processes using interval input-output PLS-based generalized likelihood ratio ...
[46]
A fault detection, isolation, and identification technique for complex ...
This paper presents a method, based on a generalized likelihood ratio test (GLRT), for combining fault detection, isolation, and identification in complex ...
[47]
Detection and Estimation of Multiple Fault Profiles Using ...
This paper discusses a fault detection scheme based on a tunable generalized likelihood algorithm. We discuss the detector algorithm, and then demonstrate its ...
[48]
Convolutional Neural Network Based Fault Detection for Rotating ...
Sep 1, 2016 · In this article we propose a feature learning model for condition monitoring based on convolutional neural networks.
[49]
Convolutional Neural Networks for Fault Diagnosis Using Rotating ...
This paper will focus on developing a convolutional neural network (CNN) to learn features directly from frequency data of vibration signals and testing the ...
[50]
Application of long short-term memory recurrent neural networks for ...
In this paper, long short-term memory recurrent neural networks were trained and tested to CH4 leakage source in a chemical process module.
[51]
[PDF] Real-time pipeline leak detection and localization using an attention ...
Apr 12, 2023 · The second step is to utilize the LSTM network to learn the temporal information and classify the sequential data. The recurrent Neural. Network ...
[52]
Exploiting Autoencoder-Based Anomaly Detection to Enhance ...
The paper proposes a semi-supervised hybrid deep learning model using AE-GRU and anomaly detection to enhance cybersecurity in smart grids, accurately ...
[53]
Fault Detection and Diagnosis in Industrial Processes with ... - NIH
Dec 29, 2021 · This work considers industrial process monitoring using a variational autoencoder (VAE). As a powerful deep generative model, the variational ...
[54]
Deep transfer learning strategy for efficient domain generalisation in ...
Apr 24, 2023 · Wen et al. proposed a TL strategy that includes the use of a pre-trained ResNet-50 network to identify fault by fine-tuning just the fully ...
[55]
A Novel Mechanical Fault Diagnosis Based on Transfer Learning ...
For fault diagnosis, convolutional neural networks (CNN) have been performing as a data-driven method to identify mechanical fault features in forms of ...
[56]
Machine Fault Detection Using a Hybrid CNN-LSTM Attention ... - NIH
In this paper, the issue of predicting electrical machine failures by predicting possible anomalies in the data is addressed through time series analysis.
[57]
Bearing fault diagnosis with parallel CNN and LSTM - AIMS Press
Jan 16, 2024 · We construct a fault diagnostic model based on convolutional neural network (CNN) and long short-term memory (LSTM) parallel network to extract their temporal ...
[58]
Standard H∞ Filtering Formulation of Robust Fault Detection
This paper studies the robust fault detection problem using the standard H∞ filtering formulation. With this formulation, the minimization of the ...
[59]
Robust fault detection based on adaptive threshold generation using ...
Jul 6, 2011 · In this paper, robust fault detection based on adaptive threshold generation of a non-linear system described by means of a linear ...
[60]
[PDF] Robust fault detection based on adaptive threshold generation using ...
SUMMARY. In this paper, robust fault detection based on adaptive threshold generation of a non-linear system described.
[61]
Using unknown input observers for robust adaptive fault detection in ...
The purpose of this manuscript is to construct natural observers for vector second-order systems by utilising unknown input observer (UIO) methods.
[62]
Fuzzy‐Logic‐Based Control, Filtering, and Fault Detection for ...
Oct 25, 2015 · This paper is concerned with the overview of the recent progress in fuzzy-logic-based filtering, control, and fault detection problems.
[63]
Fuzzy Model-Based Fault Detection Approach for Nonlinear Control ...
This work uses Takagi-Sugeno fuzzy systems to approximate nonlinear processes, a fuzzy controller, image representations, and signal errors for fault detection.
[64]
Fuzzy Kalman observer for fault detection in nonlinear discrete ...
Dec 10, 2015 · This paper presents an approach to design a fuzzy augmented state Kalman observer based on interval type-2 fuzzy logic for state estimation ...
[65]
Integrated trade‐off design of fault detection system for linear ...
Feb 1, 2013 · Optimal trade-off between sensitivity to faults and robustness to disturbances does not guarantee optimal trade-off between FDR and FAR.
[66]
Norm-based design of robust FDI schemes for uncertain systems ...
Aug 9, 2025 · To compare the fault sensitivity and robustness performances of ... disturbance. First, an internal dynamic variable is incorporated ...
[67]
[PDF] Distributionally robust trade-off design of parity relation based fault ...
Inspired by robust control concepts, one worst-case approach employs system norms to describe robustness to disturbances and sensitivity to faults. It first ...
[68]
A Survey on Active Fault-Tolerant Control Systems - MDPI
Fault, based on its location, can be classified to sensor, actuator, and plant (component or parameter) faults [17]. 2.1. Fault Types and Causes. In general, ...
[69]
Fault-tolerant control systems: A comparative study between active ...
This paper demystifies active and passive fault-tolerant control systems (FTCSs) by examining the similarities and differences between these two approaches.
[70]
Integrated sensor/actuator FDI and reconfigurable control for fault ...
Aug 6, 2025 · In this paper an approach to fault-tolerant flight control system design based on the integration of sensor/actuator fault detection and ...
[71]
Proactive fault-tolerant model predictive control - IEEE Xplore
In this paper, we propose a proactive fault-tolerant Lyapunov-based model predictive controller (LMPC) that can effectively deal with an incipient control ...
[72]
(PDF) Hierarchical Design of Distributed Fault Tolerant Control ...
This work deals with the description of a design procedure for hierarchical fault tolerant control (FTC) of large, distributed system. Following a.
[73]
Multi-Sensor Fault Detection, Identification, Isolation and Health ...
This paper proposes a novel fault detection, isolation, identification and prediction (based on detection) architecture for multi-fault in multi-sensor systems ...
[74]
Model-based diagnosis and fault tolerant control for ensuring torque ...
This paper studies the necessary steps for achieving functional safety using this model-based approach, and presents a case study on torque functional safety of ...
[75]
A Fault-Tolerant Control Architecture for Unmanned ... - Georgia Tech
Like previous hierarchical fault-tolerant control architectures, this one is expandable vertically. However, this architecture separates itself from its ...
[76]
Angle of attack prediction using recurrent neural networks in flight ...
Dec 17, 2021 · The best method to cope with faulty sensor measurements is to create trustworthy virtual sensors to replace them. The approach focuses on ...
[77]
Accommodation of actuator fault using local diagnosis and IMC-PID
Oct 8, 2014 · This paper presents an Internal Model Control (IMC) based PID control system architecture that can tolerate faulty actuators in a networked ...
[78]
Model Predictive Fault Tolerant Control for Omni-directional Mobile ...
May 27, 2019 · This paper describes the design of a Fault Tolerant Control scheme for an omni-directional mobile robot with four mecanum wheels.
[79]
https://www.sciencedirect.com/science/article/abs/pii/S096706611300230X
[80]
[PDF] Reliability of Fault Tolerant Control Systems: Part I
This paper reports Part I of a two part effort, that is intended to delineate the relationship between reliability and fault tolerant control.
[81]
Reconfigurable Fault-tolerant Control: A Tutorial Introduction
This paper provides a tutorial introduction to reconfigurable control and surveys recent advances on this topic.
[82]
[PDF] Fault-Tolerant Avionics - UNC Computer Science
The Apollo guidance and control system employed proven, highly reliable components and triple modular redundancy (TMR) with voting to select the correct output.
[83]
Hardware reconfiguration algorithm in multiprocessor systems of ...
Aug 10, 2025 · Reconfiguration of multiprocessor systems makes it possible to improve their failure-resistance that is especially important for the ...
[84]
Reliability analysis of the triple modular redundancy system under ...
Sep 7, 2023 · Triple modular redundancy (TMR) is a robust technique utilized in safety-critical applications to enhance fault-tolerance and reliability.
[85]
Simple Adaptive Control‐Based Reconfiguration Design of Cabin ...
Mar 30, 2021 · In particular, the reconfiguration system can update the control law online when the fault occurs without the system identification process.
[86]
Fault-tolerant control strategies for industrial robots: state of the art ...
Aug 30, 2025 · For reconfiguration or control adaptation, an alternate control law is switched to maintain functionality of the robot. Reliable and fail-safe ...
[87]
[PDF] arXiv:2302.06473v1 [eess.SY] 10 Feb 2023
Feb 10, 2023 · In this work we present a quantitative approach, employing directed graphs to the simulation and automatic reconfiguration of a fault in a ...
[88]
[PDF] Model-free reconfiguration mechanism for fault tolerance - HAL
Jul 4, 2011 · Note that a short detection delay requirement will need a short dwell-time that clearly conflicts with the stability of the closed-loop switched ...
[89]
(PDF) Invariant Sets and Control Synthesis for Switching Systems ...
Aug 7, 2025 · A structural procedure is proposed for solving the problem of maximal safe-set determination based on maximal controlled invariant sets.
[90]
[PDF] The Effect of Fault Detection, Diagnosis, and Recovery on ...
Dec 22, 2023 · Faults cause system instability and performance degradation. ... system performance, sudden deterioration, and delayed recovery. In ...
[91]
An Overview of Vibration Analysis Techniques for the Fault ...
This paper provides a fairly brief overview of methods and means for monitoring the condition and diagnosis of rolling bearings and also describes one of the ...
[92]
Effect of Drive and Power System Faults on Wind Turbine Shutdown ...
Aug 5, 2025 · The proposed strategies are estimated to reduce unplanned downtime by up to 12%, potentially lowering maintenance costs by approximately 8%–10%, ...
[93]
Fault diagnosis and prognosis capabilities for wind turbine hydraulic ...
Feb 1, 2025 · Defects in the pitch system are responsible for up to 20% of a wind turbine downtime. Thus, monitoring such defects is essential for avoiding it ...
[94]
Diagnosis of process valve actuator faults using a multilayer neural ...
This paper investigates the ability of a multilayer neural network to diagnose actuator faults in a Fisher-Rosemount 667 process control valve.
[95]
Model Based Diagnostic Module for a FCC Pilot Plant
It indicates that if the pressure stripper is low and the valve opening that regulates the pressure is 0%, then there is a leakage between the riser and ...
[96]
(PDF) Use of Multivariate Statistical Methods for Control of Chemical ...
This thesis focuses on the study and application of multivariate statistical methods to control product quality in chemical batch processes. These multivariate ...
[97]
Fault Detection and Identification for Longwall Machinery Using ...
Real-time fault detection and identification (FDI) offers maintenance personnel the ability to minimise, and potentially eliminate one or more of these factors, ...
[98]
Unlocking the Power of Artificial Intelligence in Manufacturing with ...
Feb 19, 2024 · The example from electronics manufacturing is not an isolated case: Siemens already employs AI in numerous applications for quality ...
[99]
FDD: Get The Savings Rolling In! - Siemens Blog
Sep 15, 2025 · Fault Detection & Diagnostics (FDD) applies rules, AI, and advanced analytics to continuously compare live system data against expected behavior ...
[100]
[PDF] Sensor Systems for Extremely Harsh Environments
Dec 22, 2022 · Sensor systems for harsh environments include sensing elements, integrated electronics, and signal processing, designed for high temperatures, ...
[101]
Impact of IoT on Manufacturing Industry 4.0: A New Triangular ...
Implementation of IoT has enabled the manufacturers to embrace digital transformations from multiple contexts such as customer focus, efficient productivity, ...
[102]
Predictive Maintenance Cost Savings | ATS
Predictive maintenance can save 8-12% over preventive, up to 40% over reactive, and 18-25% in maintenance costs, with reduced downtime.
[103]
Is Predictive Maintenance Really Cost-Effective? - Infraspeak Blog
Companies decreased maintenance costs by 12% (less than in the McKinsey study, which pointed to 18-25%) and availability improved by 9%.
[104]
Fault detection and isolation of gas turbine - ScienceDirect.com
The paper proposes an ensemble-based hierarchical classifier to diagnose and isolate faults in GE Frame9 gas turbines.
[105]
[PDF] An Integrated Architecture for Aircraft Engine Performance ...
The model-based approach to gas path fault detection and isolation presented in this paper is a promising architecture for the processing of streaming ...
[106]
Fault detection and isolation in aircraft gas turbine engines. Part 1
Jun 2, 2008 · This two-part paper formulates and validates a novel methodology of degradation monitoring of aircraft gas turbine engines with emphasis on ...
[107]
Boeing 787 – Flight controls - SmartCockpit - Airline training guides ...
The flight control system automatically operates in the normal mode and any system fault will automatically reset the PFCs (Primary Flight Computers). The ...
[108]
International Aircraft Certification - Federal Aviation Administration
Feb 27, 2025 · International aircraft certification includes bilateral agreements, working procedures, European Aviation Safety Agency information, and import ...
[109]
[PDF] the a380 flight control electrohydrostatic actuators, achievements ...
The hydraulic actuators are normally active while the electrically powered actuators are normally stand-by and become operative in the event of a failure of the ...
[110]
[PDF] In-flight uncontained engine failure Airbus A380-842, VH-OQA
Nov 4, 2010 · The A380 had two independent hydraulic systems identified as the Green system ... In the event of a hydraulic system failure, the following ...
[111]
Automotive Diagnostic Standards - x-engineer.org
This article delves into the core diagnostic communication standards governing modern automotive systems, including ISO 15031, ISO 27145, ISO 14229, ...
[112]
Introduction to the OBD-II Standard - Kvaser
The OBD-II standard defines DTCs (Diagnostic Trouble Codes) as well as other diagnostic information that is usually passed through a gateway to the OBD II port.
[113]
Safety assurance for automated systems in transport: A collective ...
Tesla's Artificial Intelligence (AI) team uses data collected from Autopilot-equipped vehicles to continuously develop its ML models and algorithms, and ...
[114]
A Guide to Automotive Safety Integrity Levels (ASIL) - Jama Software
ASIL is defined by the ISO 26262 standard, part nine, and is adapted from the Safety Integrity Level (SIL) guidance published in IEC 61508.
[115]
Fault Detection of Li–Ion Batteries in Electric Vehicles - MDPI
A failure in just one cell can trigger thermal runaway, where the cell's temperature rises rapidly and uncontrollably, possibly causing nearby cells to overheat ...
[116]
Thermal runaway prevention and mitigation for lithium-ion battery ...
This paper provides a comprehensive review of the current understanding of thermal runaway and the various technologies to prevent or mitigate thermal runaway.
[117]
Fault-Tolerant Platforms for Automotive Safety-Critical Applications
Fault-tolerant electronic sub-systems are becoming a standard requirement in the automotive industrial sector as electronics becomes pervasive in present cars.