Functional safety
Functional safety is the aspect of overall safety that relies on a system or equipment performing its intended safety functions correctly in response to its inputs, thereby reducing risks to an acceptable level.[1] It specifically addresses the correct operation of electrical, electronic, or programmable electronic (E/E/PE) safety-related systems to prevent hazardous events, such as physical injury, environmental damage, or property loss.[2] This concept is foundational in industries where automated control systems manage critical processes, ensuring that safety mechanisms like sensors, logic solvers, and actuators function reliably under all foreseeable conditions.[3]

The primary international standard governing functional safety is IEC 61508, a risk-based framework applicable across sectors including industrial automation, transportation, medical devices, and energy production.[4] First published between 1998 and 2000 and updated in subsequent revisions, IEC 61508 sets out requirements for the full safety lifecycle, from initial concept and risk assessment through design, implementation, operation, maintenance, and eventual decommissioning, in order to manage both systematic failures (e.g., design errors) and random hardware faults.[2] The standard is structured into seven parts, covering general requirements, hardware and software specifications, definitions, and guidelines for achieving safety integrity.[4]

Central to IEC 61508 are Safety Integrity Levels (SILs), which quantify the required reliability of safety functions on a scale from SIL 1 (lowest risk reduction) to SIL 4 (highest).[3] These levels are defined by probabilistic targets. For low-demand mode operation, SIL 1 requires an average probability of failure on demand (PFD) of at least 10⁻² but below 10⁻¹, tightening to at least 10⁻⁵ but below 10⁻⁴ for SIL 4. In high-demand or continuous mode, the dangerous failure rate (PFH) must lie between 10⁻⁶ and 10⁻⁵ per hour for SIL 1, tightening to between 10⁻⁹ and 10⁻⁸ per hour for SIL 4.[3]

Compliance often involves third-party certification, incorporating techniques like failure modes and effects analysis (FMEA), fault injection testing, and rigorous documentation to verify that risks are mitigated to tolerable thresholds.[2] Functional safety standards like IEC 61508 have spurred sector-specific derivatives, such as IEC 61511 for process industries and ISO 26262 for automotive systems, promoting harmonized practices globally.[3] By emphasizing verifiable risk reduction through engineered safeguards rather than inherent safety alone, these frameworks enhance system resilience in increasingly complex, automated environments.[5]

Fundamentals
Definition
Functional safety is defined as the part of the overall safety of a system or piece of equipment that depends on the safety-related systems functioning correctly in response to their inputs, so that a safe state is achieved or maintained and hazardous situations are avoided.[1] This concept focuses on ensuring that electrical, electronic, or programmable electronic systems perform their intended safety functions reliably, thereby mitigating risks of physical injury or damage to health and the environment.[6]

A key distinction in functional safety lies between fail-safe and fail-operational approaches. In a fail-safe system, upon detection of a failure, the system transitions to a predefined safe state, such as shutting down operations to prevent hazards.[7] Conversely, a fail-operational system is designed to continue performing its safety functions in a degraded but still safe manner after a failure, maintaining availability for critical operations like automated driving.[8]

Central terminology in functional safety includes the tolerable hazard rate, which represents the maximum acceptable frequency of a hazardous event occurring under normal operating conditions and serves as a benchmark for risk reduction.[9] Failures fall into two categories. Systematic failures arise from errors in design, implementation, or operation and can be eliminated through process improvements. Random hardware failures are probabilistic events caused by component degradation or external factors and are addressed through probabilistic analysis and redundancy.[9]

The concept of functional safety evolved from safety engineering practices in the 1970s, when solid-state electronics began to be integrated into process control systems, raising concerns about reliability in safety-critical applications.[10] This led to the development of IEC 61508 during the 1980s and 1990s as the foundational international standard, finalized between 1998 and 2000 to provide a risk-based framework for safety-related systems across industries.[11]

Objectives and Importance
The primary objectives of functional safety are to ensure that electrical, electronic, or programmable electronic systems perform their intended safety functions correctly under all foreseeable conditions, thereby reducing the risk of physical injury or damage to health to an acceptable level. This involves mitigating random hardware failures through quantitative reliability measures, such as safety integrity levels that specify the probability of dangerous failure, and addressing systematic failures (those arising from errors in design, development, or operation) through structured processes applied throughout the system lifecycle. These goals form the foundation of international standards like IEC 61508, which adopts a risk-based approach to determine the necessary performance of safety functions.[1][9]

Functional safety is critically important for protecting human lives and property in high-risk domains, including transportation, manufacturing, and energy production, where system malfunctions can lead to catastrophic consequences. By minimizing the impact of human errors and equipment failures, it enhances overall system reliability and prevents accidents. In vehicles, for example, safety functions in advanced driver-assistance systems help avert collisions, contributing to efforts to reduce the global toll of road traffic crashes, which caused an estimated 1.19 million deaths in 2021. This societal impact underscores functional safety's role in fostering safer environments, particularly as automation and complexity increase in industrial applications.[12][13]

Beyond human protection, functional safety delivers substantial economic benefits by averting the high costs associated with failures, such as product recalls and legal liabilities. The Takata airbag defect recall, for instance, incurred over $1 billion in direct costs to the supplier and billions more for automakers like Honda due to widespread safety risks. Regulatory drivers further amplify its importance, with mandates like the EU Machinery Directive 2006/42/EC requiring manufacturers to incorporate health and safety protections in machinery design to reduce accident risks and ensure market compliance. These factors collectively drive widespread adoption, yielding long-term savings and promoting innovation in safe technologies.[14][15]

Core Concepts
Hazard Analysis and Risk Assessment
Hazard analysis and risk assessment (HARA) forms the foundational step in functional safety. It involves the systematic identification of potential hazards associated with a system, evaluation of their risks, and determination of the safety measures needed to reduce those risks to tolerable levels. This process ensures that safety-related functions are defined based on the potential for harm, in line with the risk-based approach of international standards for electrical, electronic, and programmable electronic systems.[3] HARA typically proceeds through three main steps: hazard identification, which catalogs all foreseeable sources of danger under normal and abnormal conditions; risk evaluation, which assesses the combination of likelihood (probability of occurrence) and severity (potential consequences); and the establishment of tolerable risk criteria, where unacceptable risks are flagged for mitigation through safety functions.[11]

Several established methods support HARA in functional safety. The Hazard and Operability Study (HAZOP) is a structured technique that examines deviations from design intent using guide words (e.g., "no," "more," "less") applied to process parameters, helping to uncover hazards and operability issues in complex systems.[16] Failure Modes and Effects Analysis (FMEA) provides a bottom-up approach by systematically reviewing potential failure modes of components, their effects on the system, and associated risks. It is often extended to Failure Mode, Effects, and Diagnostic Analysis (FMEDA) to quantify the fault coverage of safety mechanisms.[17] Fault Tree Analysis (FTA), a top-down deductive method, models the logical combinations of faults leading to a top-level hazardous event using Boolean gates, enabling probabilistic risk quantification where failure data are available.[17] These techniques are selected based on system complexity, with HAZOP suited to process-oriented systems and FMEA and FTA to hardware and software reliability assessments.[11]

Risk evaluation often employs a qualitative risk matrix to classify hazards by plotting probability against severity, which allows prioritization without precise numerical data. For instance, probability scales might range from "rare" (occurs less than once in the system lifetime) to "frequent" (occurs multiple times per hour of operation), while severity scales span "negligible" (no injury or minor disruption) to "catastrophic" (multiple fatalities or widespread damage). The resulting matrix assigns risk levels (low, medium, high, or intolerable), guiding decisions on whether existing controls suffice or additional safety functions are needed.[18]

| Severity / Probability | Rare | Unlikely | Possible | Likely | Frequent |
|---|---|---|---|---|---|
| Catastrophic | High | High | Intolerable | Intolerable | Intolerable |
| Major | Medium | High | High | Intolerable | Intolerable |
| Moderate | Low | Medium | High | High | High |
| Minor | Low | Low | Medium | Medium | High |
| Negligible | Low | Low | Low | Low | Medium |
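
The matrix lookup can be sketched in a few lines of Python. This is an illustrative encoding of the table above, not a prescribed method: the function names and the choice of which risk levels count as acceptable without further measures are assumptions, and a real project would take its tolerability criteria from its own risk policy.

```python
# Column order of the qualitative risk matrix shown above.
PROBABILITIES = ["Rare", "Unlikely", "Possible", "Likely", "Frequent"]

# One row per severity class, copied from the matrix above.
RISK_MATRIX = {
    "Catastrophic": ["High", "High", "Intolerable", "Intolerable", "Intolerable"],
    "Major": ["Medium", "High", "High", "Intolerable", "Intolerable"],
    "Moderate": ["Low", "Medium", "High", "High", "High"],
    "Minor": ["Low", "Low", "Medium", "Medium", "High"],
    "Negligible": ["Low", "Low", "Low", "Low", "Medium"],
}


def classify_risk(severity: str, probability: str) -> str:
    """Return the risk level for one hazard (hypothetical helper)."""
    column = PROBABILITIES.index(probability)
    return RISK_MATRIX[severity][column]


def needs_safety_function(severity: str, probability: str,
                          acceptable=("Low", "Medium")) -> bool:
    """Flag hazards whose risk exceeds the acceptable levels.

    The default `acceptable` set is an assumption made for this sketch.
    """
    return classify_risk(severity, probability) not in acceptable


if __name__ == "__main__":
    # Hypothetical hazards, each with a severity and a probability class.
    hazards = [
        ("Overpressure in vessel", "Catastrophic", "Unlikely"),
        ("Sensor drift", "Minor", "Possible"),
    ]
    for name, severity, probability in hazards:
        level = classify_risk(severity, probability)
        flagged = needs_safety_function(severity, probability)
        print(f"{name}: {level}, mitigation needed: {flagged}")
```

Keeping the matrix as plain data makes it easy to review against the documented table and to swap in a different tolerability threshold without touching the lookup logic.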
Safety Integrity Levels
Safety Integrity Levels (SILs) provide a quantitative framework for specifying the required performance of safety functions in functional safety systems, as defined in the international standard IEC 61508. These levels range from SIL 1 to SIL 4, with SIL 4 representing the highest degree of risk reduction, needed for the most critical safety functions. The SIL indicates the reliability required to prevent dangerous failures, ensuring that the safety system achieves the necessary reduction in risk from identified hazards.[19]

The performance of a safety function is measured differently depending on the mode of operation. In low-demand mode, where the safety function is invoked infrequently (no more than once per year), the metric is the average probability of failure on demand (PFDavg), which represents the likelihood that the safety function fails to perform when called upon. In high-demand or continuous mode, the metric is the average frequency of dangerous failure per hour (PFH), which quantifies how often dangerous failures occur over time. Each SIL corresponds to specific target ranges for these metrics, as outlined in IEC 61508.[20][21]

The following tables summarize the target ranges for PFDavg and PFH according to IEC 61508:

Low-Demand Mode (PFDavg):
| SIL | PFDavg Range |
|---|---|
| 1 | ≥ 10⁻² to < 10⁻¹ |
| 2 | ≥ 10⁻³ to < 10⁻² |
| 3 | ≥ 10⁻⁴ to < 10⁻³ |
| 4 | ≥ 10⁻⁵ to < 10⁻⁴ |

High-Demand or Continuous Mode (PFH):
| SIL | PFH Range (per hour) |
|---|---|
| 1 | ≥ 10⁻⁶ to < 10⁻⁵ |
| 2 | ≥ 10⁻⁷ to < 10⁻⁶ |
| 3 | ≥ 10⁻⁸ to < 10⁻⁷ |
| 4 | ≥ 10⁻⁹ to < 10⁻⁸ |
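
As a rough illustration of how these bands are applied, the following Python sketch maps a calculated PFDavg or PFH value to the SIL band it falls in. The function names are hypothetical, and the band check covers only the numerical targets tabulated above; it says nothing about the other requirements a safety function must meet. The risk reduction factor is included as the commonly used reciprocal of PFDavg, which is an addition for illustration and not part of the tables.

```python
from typing import Optional

# (SIL, inclusive lower bound, exclusive upper bound), from the tables above.
PFD_AVG_BANDS = [
    (1, 1e-2, 1e-1),
    (2, 1e-3, 1e-2),
    (3, 1e-4, 1e-3),
    (4, 1e-5, 1e-4),
]
PFH_BANDS = [  # dangerous failures per hour
    (1, 1e-6, 1e-5),
    (2, 1e-7, 1e-6),
    (3, 1e-8, 1e-7),
    (4, 1e-9, 1e-8),
]


def _band_lookup(value: float, bands) -> Optional[int]:
    """Return the SIL whose band contains `value`, or None if no band does."""
    for sil, low, high in bands:
        if low <= value < high:
            return sil
    return None


def sil_low_demand(pfd_avg: float) -> Optional[int]:
    """SIL band for a low-demand safety function (hypothetical helper)."""
    return _band_lookup(pfd_avg, PFD_AVG_BANDS)


def sil_high_demand(pfh: float) -> Optional[int]:
    """SIL band for a high-demand or continuous-mode safety function."""
    return _band_lookup(pfh, PFH_BANDS)


def risk_reduction_factor(pfd_avg: float) -> float:
    """Reciprocal of PFDavg, a common way to express low-demand risk reduction."""
    return 1.0 / pfd_avg


if __name__ == "__main__":
    print(sil_low_demand(5e-3))   # falls in the SIL 2 band
    print(sil_high_demand(3e-8))  # falls in the SIL 3 band
    print(sil_low_demand(0.5))    # outside every band, so None
```

Values outside every tabulated band return `None` rather than being forced into SIL 1 or SIL 4, since the tables define no level for them.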