RAID
RAID, or Redundant Array of Independent Disks (originally Redundant Arrays of Inexpensive Disks), is a data storage virtualization technology that combines multiple physical disk drives—such as hard disk drives (HDDs) or solid-state drives (SSDs)—into one or more logical storage units to enhance performance, increase storage capacity, and provide data redundancy against drive failures.[1][2] By distributing data across the array using techniques like striping (spreading data chunks across drives for parallel access) and parity (adding error-checking information), RAID mitigates the risk of data loss from single points of failure while potentially improving input/output (I/O) throughput.[3][4]
The concept of RAID originated in the late 1980s amid rapid advancements in CPU and memory speeds that outpaced improvements in single large expensive disk (SLED) performance, creating an impending I/O bottleneck.[4] In 1987, researchers David A. Patterson, Garth Gibson, and Randy H. Katz at the University of California, Berkeley, coined the term and proposed RAID as an alternative using arrays of smaller, cost-effective PC disks to achieve an order-of-magnitude gain in performance, reliability, power efficiency, and scalability compared to traditional high-end drives.[1][4] Their seminal 1988 paper, "A Case for Redundant Arrays of Inexpensive Disks (RAID)," outlined five initial levels, emphasizing redundancy to overcome the higher failure rates of numerous small disks.[4] Over time, as disk costs decreased and independence from specific hardware became key, the acronym shifted to "Independent Disks," though the core principles remain.[1]
RAID implementations vary by level, each balancing trade-offs in redundancy, capacity, and performance; common levels include RAID 0 (striping without redundancy for maximum speed), RAID 1 (mirroring for full duplication and high availability), and RAID 5 (distributed parity allowing tolerance of one drive failure while optimizing capacity).[3][1] These can be managed via hardware controllers (offloading computation to dedicated chips) or software (using operating system resources, often at lower cost but with CPU overhead).[1] While RAID improves fault tolerance, it is not a substitute for backups, as it protects against hardware failure but not data corruption, deletion, or disasters affecting the entire array.[5] Today, RAID supports diverse applications from enterprise servers to consumer NAS devices, with nested configurations like RAID 10 combining levels for enhanced resilience.[1]
Fundamentals
Definition and Purpose
RAID, originally an acronym for Redundant Array of Inexpensive Disks, is a data storage virtualization technology that combines multiple physical disk drives into one or more logical units.[4] This approach enables the creation of a unified storage system from individual drives, treating them as a single entity for data management.[1] The primary purposes of RAID are to provide data redundancy for fault tolerance, enhance performance through data parallelism across drives, and optimize storage capacity utilization by distributing data efficiently.[4] Redundancy ensures data integrity by allowing recovery from drive failures without loss, while parallelism improves input/output operations by leveraging multiple drives simultaneously.[4] Capacity utilization is maximized by minimizing wasted space through techniques like parity distribution, enabling larger effective storage pools.[4]
Key benefits include robust fault tolerance, such as surviving single or multiple drive failures depending on the configuration; significantly improved I/O throughput for demanding workloads; and cost-effective scaling that outperforms traditional single large expensive disks (SLEDs) in reliability and performance per dollar.[4] Compared to SLEDs, RAID arrays offer up to tenfold improvements in mean time to failure and better power efficiency.[4] The acronym evolved from "Inexpensive" to "Independent" Disks in subsequent industry usage, reflecting the shift from emphasizing low-cost components to the autonomy of individual drives in modern systems.[1] This concept originated in a 1987 technical report by David A. Patterson, Garth Gibson, and Randy H. Katz at the University of California, Berkeley.[6]
Basic Principles
RAID employs three fundamental techniques for data management across multiple disk drives: block-level striping, mirroring, and parity. Block-level striping divides data into fixed-size blocks and distributes them sequentially across the drives in an array, enabling parallel read and write operations that enhance overall I/O performance by exploiting the combined bandwidth of the drives.[7] This method allows large data transfers to be spread across all disks in the group, reducing transfer times and minimizing queueing delays.[7] Mirroring involves creating exact duplicate copies of data on separate drives within the array, providing redundancy by ensuring that if one drive fails, the identical data remains accessible on another.[7] Parity, in contrast, computes redundant information—typically using exclusive-OR (XOR) operations—stored alongside the data to enable reconstruction of lost information following a drive failure, without duplicating the entire dataset.[7]
These techniques involve inherent trade-offs between redundancy, performance, and capacity. Redundancy mechanisms like mirroring and parity improve fault tolerance by allowing data recovery after a failure, but they impose costs: mirroring requires writing data to multiple locations, which can degrade write performance, while parity calculations add computational overhead during writes.[7] Striping alone boosts throughput for both reads and writes by parallelizing access but offers no inherent redundancy, increasing the risk of total data loss from a single drive failure.[7]
The usable capacity of a RAID array is determined by the total number of drives, their individual sizes, and the redundancy overhead. In general, usable capacity can be expressed as N × S × (1 − r), where N is the number of drives, S is the size of each drive, and r is the fraction of capacity dedicated to redundancy.[7] For mirroring with two copies, r = 0.5, yielding 50% usable capacity—for instance, two 1 TB drives provide 1 TB of usable storage.[7]
RAID abstracts the physical drives from the operating system through logical block addressing, where the array controller maps sequential logical block numbers specified by the OS to physical locations across the drives, presenting the array as a single, contiguous storage device.[7] This abstraction hides the complexity of data distribution and redundancy, allowing applications to interact with the array using standard block I/O interfaces without awareness of the underlying physical configuration.[7] Standard RAID levels represent specific combinations of these principles to balance performance, redundancy, and cost.[7]
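The striping, parity, and capacity relationships above can be made concrete with a brief sketch. The following Python fragment is a minimal illustration rather than a real RAID implementation; the helper names (xor_blocks, stripe_location) are invented for the example. It maps logical block numbers to drives under plain striping, computes an XOR parity block for one stripe, rebuilds a lost block from the surviving blocks, and evaluates the usable-capacity formula.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together; this is the parity operation."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def stripe_location(logical_block, n_drives):
    """Map a logical block number to (drive index, offset on that drive) under plain striping."""
    return logical_block % n_drives, logical_block // n_drives

# Striping: logical blocks 0..7 spread round-robin across 4 drives.
layout = {lb: stripe_location(lb, 4) for lb in range(8)}
assert layout[5] == (1, 1)  # logical block 5 lives on drive 1 at offset 1

# Parity: one stripe of three data blocks plus one XOR parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
stripe = data + [xor_blocks(data)]

# If the drive holding block 1 fails, XOR-ing the survivors (data and parity) rebuilds it.
survivors = [blk for i, blk in enumerate(stripe) if i != 1]
assert xor_blocks(survivors) == data[1]

# Capacity: usable space is N x S x (1 - r).
# Four 1 TB drives with one drive's worth of parity (r = 1/4) give 3 TB;
# two 1 TB drives mirrored (r = 0.5) give 1 TB.
print(4 * 1.0 * (1 - 1 / 4), 2 * 1.0 * (1 - 0.5))  # 3.0 1.0
```

Real arrays operate on much larger stripe units and, in parity-based levels such as RAID 5, rotate the parity block across drives, but the underlying arithmetic is the same.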
History
Origins and Invention
The concept of RAID was first introduced in a seminal 1987 technical report by researchers David A. Patterson, Garth A. Gibson, and Randy H. Katz at the University of California, Berkeley, titled "A Case for Redundant Arrays of Inexpensive Disks (RAID)."[4] This work proposed using arrays of small, low-cost disk drives as an alternative to the prevailing Single Large Expensive Disks (SLEDs), such as the IBM 3380, to achieve higher storage capacity, improved performance, and enhanced reliability at a fraction of the cost.[4] The acronym RAID specifically denoted these "Redundant Arrays of Inexpensive Disks," emphasizing redundancy to tolerate failures while leveraging the parallelism of multiple drives.[4]
The primary motivations stemmed from economic and performance trends in the mid-1980s. By 1987, the cost per megabyte of small personal computer disks, like the Conner CP3100, had fallen to $8–$11/MB, undercutting the $10–$18/MB of large SLEDs and enabling arrays of inexpensive drives to match or exceed the capacity of expensive single units at lower overall cost and power use.[4] Additionally, emerging applications demanded greater I/O bandwidth: supercomputing workloads required high transfer rates for large sequential accesses, while transaction-oriented databases needed high rates for numerous small, random I/O operations—capabilities that single drives struggled to provide without costly custom solutions.[4] An array of 75 inexpensive disks, for instance, could deliver approximately 12 times the I/O bandwidth of an IBM 3380 of equivalent capacity.[4]
The paper outlined five initial RAID levels, with particular emphasis on Levels 2 through 5 to enable error correction and fault tolerance in high-reliability settings. Level 2 employed bit-level striping with Hamming codes for single-error correction, suitable for error-prone environments; Level 3 used sector-level striping with dedicated parity disks for large transfers; Level 4 allowed independent access with a single parity disk; and Level 5 distributed parity across all drives to balance load during writes. All of these accepted a redundancy overhead of roughly 10–40% of capacity in exchange for the ability to detect and recover from disk failures.[4]
In parallel with the theoretical framework, the Berkeley team developed early experimental prototypes to validate the RAID approach using off-the-shelf inexpensive disks. These efforts culminated in the RAID-I prototype in 1989, built on a Sun 4/280 workstation with 28 disks and specialized controllers for striping and redundancy.[8] Concurrently, industry responses included Thinking Machines' Data Vault (a Level 2 array for the Connection Machine supercomputer) and announcements from Maxtor and Micropolis of Level 3 systems with synchronized drives.[4]
Standardization and Evolution
The formal standardization of RAID technology gained momentum in the early 1990s through the establishment of the RAID Advisory Board (RAB) in August 1992, a Massachusetts-based organization formed by leading storage industry companies to educate users, standardize terminology, and classify RAID configurations.[9] The RAB's initial membership included eight diverse vendors focused on reducing confusion in the rapidly growing disk array market, with subsequent growth to over 50 members by the mid-1990s. In 1993, the board published the first edition of The RAIDbook, a seminal handbook that defined the core RAID levels 0 through 6, establishing industry benchmarks for striping, mirroring, and parity-based redundancy while emphasizing fault tolerance and performance.[10] This effort built upon earlier academic concepts from the University of California, Berkeley's 1988 RAID paper, transitioning theoretical ideas into commercial standards. The RAB's work facilitated widespread adoption by providing a unified framework that vendors like Compaq and Digital Equipment Corporation (DEC) could reference for compatible implementations.[11]
Throughout the 2000s, RAID evolved to accommodate emerging interfaces and storage media, particularly the shift from parallel ATA (PATA) to Serial ATA (SATA) in 2003, which simplified cabling, offered initial data transfer rates of up to 1.5 Gbit/s, and improved scalability for enterprise arrays. This transition enabled RAID controllers to support higher-density configurations at lower costs, making RAID viable for mainstream servers and NAS devices. Concurrently, the introduction of solid-state drives (SSDs) around 2006 necessitated RAID adaptations for flash-specific behaviors; for instance, SSD wear-leveling algorithms, which evenly distribute write operations across NAND cells to prevent premature failure, can interact with RAID parity calculations to exacerbate write amplification in levels like RAID 5, potentially reducing overall endurance in write-intensive scenarios without optimized firmware.[12][13] Manufacturers addressed this through hybrid controllers that incorporate over-provisioning and TRIM support to mitigate wear in RAID environments.
The 2010s marked the rise of NVMe RAID following the NVMe specification's release in 2011, leveraging PCIe lanes for low-latency access and enabling throughputs exceeding 3 GB/s per drive, a stark contrast to SATA's limits. This era saw RAID controllers evolve to handle NVMe's parallel command queuing, boosting IOPS for database and virtualization workloads. A related advancement was the development of NVMe over Fabrics (NVMe-oF) starting around 2014, which extends NVMe protocols over Ethernet, Fibre Channel, or InfiniBand for networked RAID, allowing disaggregated storage pools with sub-millisecond latencies and scalability to petabyte levels in data centers.[14][15]
In 2024 and 2025, RAID continued to innovate with hybrid architectures to meet AI and high-performance computing demands, exemplified by Graid Technology's SupremeRAID SR-1001, launched in January 2024 as an entry-level GPU-accelerated solution that offloads RAID processing from CPUs to NVIDIA GPUs in a CPU-GPU hybrid model, delivering up to 80 GB/s throughput and 6 million IOPS from eight NVMe SSDs while reducing CPU overhead by 90%.[16] This approach addresses bottlenecks in traditional CPU-based RAID for mixed workloads, enhancing efficiency in GPU-heavy environments.
RAID Levels
Standard Levels
The standard RAID levels are commonly considered to encompass configurations 0 through 6, with levels 1 through 5 originally proposed in the seminal 1988 paper by Patterson, Gibson, and Katz.[4] RAID 0 and RAID 6 were developed subsequently to provide non-redundant striping and extended fault tolerance, respectively. These levels rely on fundamental techniques like striping and parity for data distribution and redundancy, balancing capacity, performance, and reliability across multiple drives. They serve as building blocks for more complex setups and are widely implemented in storage systems for applications ranging from high-throughput computing to data protection.
RAID 0 employs data striping across n drives without any redundancy, dividing consecutive blocks evenly to maximize parallelism.[17] This yields full capacity utilization of n times the size of a single drive but offers no fault tolerance, as the failure of any drive results in total data loss. Performance benefits include high throughput for large sequential reads and writes, making it suitable for non-critical applications that prioritize speed, such as video editing or temporary data processing.
RAID 1 uses mirroring, duplicating data across two or more drives to provide redundancy.[4] Capacity is limited to the size of one drive, regardless of the number of mirrors, while tolerating the failure of all but one drive.[4] It enhances read performance through parallel access to mirrors but incurs a write penalty equivalent to the number of mirrors, as data must be written to all of them.[4] This level is ideal for small datasets requiring high availability and reliability, such as operating system boot volumes or critical configuration files.[4]
RAID 2 implements bit-level striping across n data drives with dedicated Hamming-code parity drives for error correction.[4] The Hamming code provides on-the-fly correction of single-bit errors but requires synchronized drive operation, and capacity efficiency is approximately 71-83% depending on the number of parity drives (e.g., 4 parity drives for 10 data drives yield 71%).[4] Performance excels in large sequential I/O but falters for small random accesses due to the bit-level granularity.[4] Largely obsolete today, it has been supplanted by modern drives with built-in error-correcting codes (ECC), which eliminate the need for array-level bit error handling.[18]
RAID 3 features byte-level striping across n-1 data drives with a single dedicated parity drive using XOR calculations for redundancy.[4] It tolerates one drive failure, with capacity efficiency around 91-96% (e.g., 91% for 10 drives).[4] The setup delivers high bandwidth for large, sequential transfers but suffers from bottlenecks on the parity drive for small or random I/O, as all operations must access it.[4] Like RAID 2, it is obsolete in contemporary systems due to inherent ECC in drives and the inefficiency of byte-level striping for modern workloads.[18]
RAID 4 applies block-level striping across n-1 data drives with a dedicated parity drive, also using XOR for parity computation.[4] This configuration tolerates one failure, achieving capacity efficiency similar to RAID 3 (91-96%).[4] It improves on RAID 3 by supporting efficient small reads via direct data drive access, but writes remain constrained by the parity drive bottleneck, requiring updates to both data and parity.[4] Obsolete for similar reasons as RAID 3 (modern drive ECC and the dedicated parity-drive limitation), it sees no practical use today.[18]
RAID 5 distributes both data and parity blocks across all n drives in a rotating manner, using XOR operations to compute parity (e.g., a stripe's parity block is the XOR of its data blocks).[4] It tolerates one drive failure and provides capacity of (n-1) times a single drive's size, with efficiency approaching 100% as n increases.[4] Performance is strong for large I/O and small reads, but small writes incur a penalty of four operations: reading the old data and parity, then writing the new data and updated parity.[4] It is commonly used for balanced storage needs in servers and databases where moderate redundancy and capacity are key.[4]
RAID 6 extends RAID 5 with dual distributed parity across n drives, typically employing Reed-Solomon codes to compute two independent parity blocks per stripe, enabling tolerance of two concurrent drive failures.[19] Capacity is (n-2) times a single drive's size, with similarly high efficiency for large n.[19] It maintains good read performance but amplifies the small-write penalty to six operations because both parity blocks must be updated.[20] It is suited to large arrays in archival or enterprise storage where higher fault tolerance outweighs the capacity and performance trade-offs.[19]
The table below summarizes these characteristics; a short sketch of the RAID 5 small-write arithmetic follows it.
| RAID Level | Fault Tolerance | Capacity Efficiency | Write Penalty (Small Writes) |
|---|---|---|---|
| 0 | 0 failures | 100% | 1 |
| 1 | n-1 failures (of n mirrors) | 50% (for 2 drives) | 2 (for 2 drives) |
| 2 | 1 failure | 71-83% | Varies (high for small I/O) |
| 3 | 1 failure | 91-96% | High (parity bottleneck) |
| 4 | 1 failure | 91-96% | 4 |
| 5 | 1 failure | (n-1)/n | 4 |
| 6 | 2 failures | (n-2)/n | 6 |
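The write-penalty figures in the table reflect the number of physical I/Os needed to update a single block. The short Python sketch below is illustrative only; the function name raid5_small_write is invented for the example and does not correspond to any real controller API. It models the RAID 5 read-modify-write sequence, in which the new parity is derived from the old data, the old parity, and the new data, and confirms the four-I/O count.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def raid5_small_write(old_data: bytes, old_parity: bytes, new_data: bytes):
    """
    Model the RAID 5 read-modify-write update of one block:
      1. read old data   2. read old parity
      3. write new data  4. write new parity
    New parity = old parity XOR old data XOR new data.
    Returns the new parity and the number of physical I/Os.
    """
    io_count = 2  # reads of the old data block and the old parity block
    new_parity = xor(xor(old_parity, old_data), new_data)
    io_count += 2  # writes of the new data block and the new parity block
    return new_parity, io_count

# One stripe of three data blocks on a four-drive RAID 5 set.
d = [b"\x01\x01", b"\x02\x02", b"\x03\x03"]
parity = xor(xor(d[0], d[1]), d[2])

# Overwrite d[1]; the rest of the stripe never has to be re-read.
new_block = b"\x0f\x0f"
new_parity, ios = raid5_small_write(d[1], parity, new_block)
assert ios == 4
assert new_parity == xor(xor(d[0], new_block), d[2])  # matches recomputing parity from scratch

# Capacity efficiencies from the table: (n-1)/n for RAID 5, (n-2)/n for RAID 6.
n = 8
print(f"RAID 5: {(n - 1) / n:.0%} usable, RAID 6: {(n - 2) / n:.0%} usable")
```

RAID 6 performs the analogous update on two parity blocks, adding one more read and one more write for the second parity block, which is where the six-operation penalty in the table comes from.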