Extended file system
The Extended File System (ext) is the first file system specifically designed and implemented for the Linux kernel, serving as a foundational technology for data storage and management in early Linux distributions. Developed in April 1992 by Rémy Card, it was integrated into Linux kernel version 0.96c to address the severe limitations of the previously used Minix file system, which capped file systems at 64 MB, restricted filenames to 14 characters, and lacked support for larger partitions.[1] Key features of ext included support for file systems up to 2 GB in size, filenames of up to 255 characters, and standard Unix-like semantics for files, directories, and symbolic links, making it a significant improvement for growing Linux usage on personal computers and servers.[1]

However, ext had notable drawbacks, such as inefficient management of free blocks and inodes through simple linked lists, which caused fragmentation and degraded performance over time, particularly as file systems filled up.[1] It also lacked separate timestamps for file access, modification, and status changes, relying instead on a single timestamp field.[1] These limitations prompted rapid iteration, with the Second Extended File System (ext2) released in January 1993 as a direct successor that introduced block groups to reduce fragmentation, variable block sizes, and a maximum file system size of 4 TB, while maintaining backward compatibility with ext.[1] Ext quickly became obsolete in practice, as ext2 offered superior reliability and efficiency, but it played a crucial role in enabling Linux's early adoption by providing a scalable alternative to proprietary or restrictive file systems like those in MS-DOS or Minix.[2]

The legacy of ext endures through its influence on the broader extended file system family, which evolved into ext3 (adding journaling for crash recovery in 2001) and ext4 (enhancing scalability for modern storage in 2008), remaining the default choice for many Linux installations as of 2025 due to their robustness, open-source nature, and compatibility, although some distributions are exploring alternatives like Btrfs.[2][3][4]
History
Origins and Initial Development
The development of the Extended File System (ext) was initiated in 1992 by French software developer Rémy Card as part of his work to create a native file system for the Linux kernel, addressing the constraints of the Minix file system then used by Linux.[1] This effort was prompted by the need for a more robust storage solution, as sought by Linux creator Linus Torvalds.[5] Key motivations for ext included overcoming Minix's severe size restrictions, enabling support for larger partitions up to 2 gigabytes, filenames up to 255 characters, and full Unix-like file permissions to better align with Linux's Unix heritage.[1][6] These enhancements were essential for handling growing storage needs and providing proper access control in a multi-user environment.[1]

The initial design of ext was inode-based, drawing inspiration from the Berkeley Fast File System, with each inode containing pointers to data blocks for file addressing, but lacking journaling for crash recovery.[1] This structure allowed efficient access to file data while keeping implementation straightforward.[1] Ext was introduced in the Linux kernel version 0.96c in April 1992, marking the first file system tailored specifically for Linux via the Virtual File System (VFS) interface.[1][6]

However, it had notable limitations, including a 2 gigabyte cap on both partition and file sizes due to its block addressing scheme, as well as fixed inode allocation that prevented dynamic adjustment based on usage.[1][6] Early testing by the Linux developer community revealed these constraints, prompting rapid feedback and iterations that highlighted the need for improvements in scalability and flexibility.[1] This community-driven process quickly led to the transition to ext2 as a more advanced successor.[1]
Evolution Through Versions
The second extended file system (ext2) was developed in 1993 by Rémy Card, Theodore Ts'o, and Stephen Tweedie as a major rewrite of the original Extended file system to address limitations in scalability and functionality.[7] It introduced dynamic inode allocation within block groups, support for volumes up to 4 terabytes (with 4 KB block sizes), and efficient handling of symbolic links stored directly in inodes for short paths.[1] These enhancements enabled better performance and flexibility for growing Linux systems, with ext2 first integrated into the kernel around early 1993 releases.[7]

The third extended file system (ext3) was first released in September 1999 by Stephen Tweedie, building on ext2 by integrating a journaling mechanism to enable rapid crash recovery and reduce filesystem check times after power failures.[8] Designed for backward compatibility, ext3 volumes could be mounted as ext2 without modification, allowing seamless upgrades while adding metadata journaling as a core innovation for data integrity.[8] Initial support arrived in Linux kernel 2.4.15, marking a shift toward more robust filesystems in enterprise and desktop environments.[9]

Ext4, the fourth extended file system, emerged in 2008 through contributions from developers including Andreas Dilger, Mingming Cao, and others, extending ext3's capabilities to meet demands for massive storage in modern hardware.[10] Key upgrades included support for volumes up to 1 exabyte and files up to 16 terabytes (with 4 KB blocks), delayed block allocation to minimize fragmentation, and online defragmentation tools for maintenance without downtime.[10] It was merged as stable code into Linux kernel 2.6.28, solidifying its role as the default filesystem for many distributions.[10]

Parallel to these filesystem versions, the e2fsprogs utility suite evolved to provide tools for creation, maintenance, and repair, starting with ext2 support in 1993 and expanding to include ext3 journaling features by 2001 and ext4 extents by 2008.[11] The original Extended file system (ext) was deprecated in modern Linux kernels by the early 2000s, as ext2 and its successors fully supplanted it due to superior performance and features.[7] Throughout this progression, community-driven improvements were coordinated via mailing lists like linux-fsdevel and linux-ext4, as well as conferences such as the Ottawa Linux Symposium, fostering collaborative enhancements from global developers.[12]
Design and Architecture
Core Data Structures
The superblock serves as the primary metadata structure in the Extended File System (ext), storing essential global information about the entire filesystem. It includes fields such as the total number of blocks and inodes, the counts of free blocks and inodes, the filesystem state (e.g., clean or erroneous), mount counts, check intervals, the revision level, the operating system identifier, a volume name, and a UUID. The superblock is fixed at 1024 bytes in size and is located at offset 1024 from the beginning of the filesystem.[13]

Unlike later versions, the original ext did not include block groups or extensive backup mechanisms for the superblock; free space was managed via simple linked lists of available blocks and inodes, and only a single timestamp field was used for file metadata (combining access, modification, and status change times). This approach, while basic, supported Unix-like semantics but suffered from inefficiency and fragmentation as the filesystem grew.[1][14]

The inode structure is a fixed-size record that encapsulates metadata for each file, directory, or other filesystem object, excluding the filename. In the original ext and ext2 implementations, inodes are 128 bytes each, containing fields for permissions and file type, ownership (user and group IDs), timestamps (access, modification, creation, and deletion), link count, file size, and block counts, along with pointers to data blocks. These pointers consist of 12 direct addresses to data blocks, one single-indirect pointer (referencing a block of further pointers), one double-indirect pointer, and one triple-indirect pointer, enabling scalable access to large files.
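As a concrete illustration of this layout, the pointer slots can be pulled out of a raw inode with Python's struct module. The 128-byte record size and the i_block offset of 40 bytes follow the classic ext2 on-disk inode layout; the helper itself is only a sketch, not part of any real tool:

```python
import struct

EXT2_INODE_SIZE = 128  # bytes; fixed size in ext and ext2

def decode_block_pointers(inode_raw: bytes) -> dict:
    """Split an ext2 inode's i_block array into direct and indirect pointers.

    The 15-slot i_block array begins at byte offset 40 of the 128-byte
    on-disk inode; each slot is a little-endian 32-bit block number.
    """
    if len(inode_raw) != EXT2_INODE_SIZE:
        raise ValueError("expected a raw 128-byte ext2 inode")
    slots = struct.unpack_from("<15I", inode_raw, 40)
    return {
        "direct": list(slots[:12]),    # first 12 data blocks, addressed directly
        "single_indirect": slots[12],  # block containing further block pointers
        "double_indirect": slots[13],
        "triple_indirect": slots[14],
    }
```

Fed a 128-byte inode record read from an ext2 inode table, this returns the twelve direct block numbers and the three indirect-tree roots described above (a value of 0 marks an unused slot).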
Later versions like ext3 and ext4 expand the inode size (up to 256 bytes or more with features enabled) to accommodate additional fields such as extended attributes, ACLs, nanosecond timestamps, and 64-bit file sizes.[13][1]

Directory entries function as variable-length records within directory files, mapping filenames to their corresponding inodes to facilitate name resolution. Each entry includes the inode number, the entry's length, the filename length, the filename itself (up to 255 characters), and in revision 1 and later, a file type indicator for quicker validation. These entries form a linear linked list within the directory's data blocks, though ext2 and subsequent versions support hashed directory indexing (e.g., via HTree in extensions) to accelerate lookups in large directories by organizing entries into a tree structure based on filename hashes.[13][15]

Introduced in ext2, group descriptors are an array of structures, one per block group, that provide locality and redundancy for managing filesystem subsets. Each 32-byte descriptor (in ext2/ext3; expanded in ext4 with 64-bit features) tracks the starting block numbers and sizes of inode and block bitmaps, the inode table location, and counts of free inodes and blocks within its group. Positioned immediately after the superblock (with backups alongside superblock copies), this organization divides the filesystem into groups of up to 32,768 blocks (limited by bitmap size), promoting efficient allocation and reducing seek times by localizing related metadata.[13]

In ext2, the maximum file size is constrained by the inode's block pointers and is calculated as the sum of blocks addressable through direct and indirect pointers multiplied by the block size. With 32-bit block pointers, the number of addressable blocks is $12 + N + N^2 + N^3$, where $N$ is the number of pointers per indirect block ($N = \text{block size} / 4$).
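The formula translates directly into code. The helper below is an illustrative sketch (not an e2fsprogs utility) that evaluates the pointer arithmetic and also applies ext2's 32-bit i_blocks cap, which counts 512-byte sectors:

```python
def ext2_max_file_size(block_size: int) -> int:
    """Largest file reachable via 12 direct plus 1/2/3-level indirect pointers."""
    n = block_size // 4                        # 32-bit (4-byte) pointers per block
    addressable_blocks = 12 + n + n**2 + n**3
    addressable_bytes = addressable_blocks * block_size
    # The inode's 32-bit i_blocks field counts 512-byte sectors, capping files
    # at just under 2 TiB regardless of what the pointer arithmetic allows.
    i_blocks_cap = (2**32 - 1) * 512
    return min(addressable_bytes, i_blocks_cap)
```

For 1 KiB blocks this returns about 16 GiB, for 2 KiB about 256 GiB, and for 4 KiB the i_blocks cap of just under 2 TiB, matching the figures quoted below.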
For a 1 KiB block size ($N = 256$), this yields approximately 16 GiB; for 2 KiB ($N = 512$), 256 GiB; and for 4 KiB ($N = 1024$), 2 TiB (though originally limited to 4 GiB by the 32-bit size field until extended in later revisions). These inodes support block allocation by providing the mapping from logical offsets to physical blocks via their pointer hierarchy.
Block Allocation and Management
The original Extended File System (ext) managed disk space without block groups, using linked lists to track free blocks and inodes, which allowed basic allocation but resulted in fragmentation and poor performance over time. This was improved in ext2, which partitions the disk into block groups to facilitate parallel access and enhance fault tolerance by localizing metadata and data. Each block group typically contains around 32,768 blocks for a default 4 KiB block size, equating to 128 MiB per group, though this can vary based on configuration. This structure reduces seek times and fragmentation by keeping related data close together. Within each group, separate bitmaps—one for blocks and one for inodes—track availability, with each bit representing one block or inode to efficiently manage free space.[7][18][2]

The allocation algorithm in ext2 employs a goal-directed approach via the Orlov allocator, which prioritizes locality by attempting to place new data blocks in the same block group as the inode referencing them, thereby clustering related files and minimizing fragmentation. It scans bitmaps starting from a "goal" location to find free blocks, favoring contiguous allocations near the inode to reduce disk seeks. If the preferred group lacks space, it falls back to nearby groups or global scanning. This strategy spreads top-level directories across groups to avoid hotspots while keeping subdirectory contents proximate to their parents. Inode pointers reference these allocated blocks, directing access to the appropriate group.[7][19]

Free space management in ext2 and later relies on per-group bitmaps, where a single bit per block or inode indicates usage, enabling quick queries and updates during allocation. To scale for larger filesystems, ext4 introduces flexible block groups (flex_bg), which logically combine multiple traditional groups—typically powers of two, such as 16—into one unit.
Metadata like bitmaps and inode tables for the entire flex group are consolidated in the first physical group, improving allocation efficiency and reducing overhead for very large volumes exceeding traditional group limits. This enhances scalability without altering the core bitmap mechanism.[20][18]

Fragmentation handling in ext2 and ext3 lacks a built-in defragmenter, relying instead on the locality-preferring allocator to prevent excessive scattering during writes; over time, however, repeated allocations can lead to non-contiguous blocks, impacting performance on aging filesystems. Ext4 mitigates this through multi-block allocation, which uses delayed allocation to batch requests and assign large contiguous extents—up to 128 MiB—from bitmap-derived free space lists, built via a buddy allocator at mount time. This approach minimizes seek operations and external fragmentation compared to the single-block allocations in ext2 and ext3.[21][22]

The number of block groups is determined by dividing the total number of blocks by the blocks per group:

$$\text{number of groups} = \frac{\text{total blocks}}{\text{blocks per group}}$$
For instance, on an 8 GiB drive with 4 KiB blocks (yielding approximately 2,097,152 total blocks and 32,768 blocks per group), this results in about 64 block groups.[18]
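The worked example can be reproduced in a few lines; the constants assume a 4 KiB block size and the one-bit-per-block, one-block bitmap described above:

```python
BLOCK_SIZE = 4096
BLOCKS_PER_GROUP = 8 * BLOCK_SIZE              # a one-block bitmap holds 32,768 bits
total_blocks = (8 * 2**30) // BLOCK_SIZE       # 8 GiB volume -> 2,097,152 blocks
groups = -(-total_blocks // BLOCKS_PER_GROUP)  # ceiling division
print(groups)  # 64
```

The ceiling division matters on volumes whose size is not an exact multiple of the group size, where the final, partially filled group still needs its own descriptor and bitmaps.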
Specific Implementations
ext2 File System
The ext2 file system, also known as the second extended file system, represents a significant advancement over the original extended file system (ext) for Linux, emphasizing performance, scalability, and reliability without incorporating journaling. Developed by Rémy Card, Theodore Ts'o, and Stephen Tweedie, it was first released in January 1993 as part of Linux kernel 0.99 and quickly became the default file system for Linux distributions due to its robust design inspired by traditional Unix file systems like BSD FFS.[7][23]

ext2 organizes data into block groups to minimize fragmentation, with each group containing bitmaps for block and inode allocation, an inode table, and data blocks; the number of blocks per group is capped at eight times the block size in bytes, since the single-block allocation bitmap holds one bit per block, allowing for a flexible number of groups across large volumes. The superblock supports up to 2^32 inodes filesystem-wide, enabling handling of billions of files in theory, though practical limits depend on inode density settings during formatting. With 4 KB blocks—the common default—ext2 supports maximum volume sizes of 16 TiB, constrained by the 32-bit block count in the superblock. File sizes are limited to 2 TiB due to the inode's 32-bit i_blocks field (counting 512-byte sectors), with addressing achieved through a combination of 12 direct block pointers, one single-indirect, one double-indirect, and one triple-indirect pointer; for 1 KB blocks, this limit drops to approximately 16 GiB.[7][23]

| Block Size | Max. File Size | Max. Filesystem Size |
|---|---|---|
| 1 KB | 16 GiB | 4 TiB |
| 2 KB | 256 GiB | 8 TiB |
| 4 KB | 2 TiB | 16 TiB |
| 8 KB | 2 TiB | 32 TiB |
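The rightmost column can be verified from the superblock's 32-bit block count alone; a quick sketch:

```python
# The 32-bit block count in the superblock bounds the filesystem at 2**32
# blocks, regardless of block size.
for block_size in (1024, 2048, 4096, 8192):
    max_fs_tib = (2**32 * block_size) // 2**40
    print(f"{block_size // 1024} KB blocks -> {max_fs_tib} TiB")
# prints 4, 8, 16 and 32 TiB for the four block sizes
```

The file-size column, by contrast, is governed by the indirect-pointer scheme and the i_blocks sector count described under Core Data Structures, which is why it plateaus at 2 TiB rather than growing with the block size.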
ext3 File System
The ext3 file system represents a significant evolution from its predecessor, ext2, by incorporating journaling capabilities to enhance data reliability and reduce recovery times after system crashes or power failures. Developed primarily by Stephen Tweedie and released in September 1999 for Linux kernel 2.2, with ports to later versions, ext3 builds on ext2's block group structure while adding an on-disk journal as a dedicated log for pending file system operations.[26][9] This journaling mechanism logs changes before they are committed to the main file system, enabling recovery in seconds rather than the hours or more required by full file system checks on ext2.[9]

The core upgrade in ext3 is its journaling layer, implemented as a circular buffer on disk with a default size of 32 MB (corresponding to 8192 blocks at a 4 KB block size). The journal consists of transaction descriptors that outline operations, commit blocks to mark successful completions, and revoke records that prevent stale log entries from being replayed, ensuring atomicity. ext3 supports three journaling modes: data mode, which journals both data and metadata for maximum integrity; ordered mode (the default), which journals only metadata but guarantees data blocks are written to disk before the corresponding metadata commit; and writeback mode, which journals metadata without data ordering for potentially higher performance at some risk of inconsistency.[26][27][28] In ordered mode, data is flushed to the file system prior to metadata journaling, balancing reliability and efficiency.[27]

ext3 maintains full backward compatibility with ext2, allowing ext3 volumes to be mounted and used as ext2 without modifications, as the journal is an optional feature flag.
It inherits ext2's limits, such as a maximum file size of 2 TB and up to 32,000 subdirectories per directory, but introduces htree (hashed B-tree) indexing for directories, which significantly improves lookup times in large folders by enabling efficient traversal beyond linear scans.[26][29]

ext3 reached its peak adoption as the default file system in many Linux distributions, including Red Hat Enterprise Linux and Debian, from 2001 to 2008, due to its stability and ease of upgrade from ext2. It has been supported in Linux kernels 2.6 and later, remaining viable for legacy systems.[26]

Despite these advances, ext3 has notable drawbacks, including the absence of extent-based allocation, which relies instead on indirect block pointers and leads to fragmentation and inefficiency for very large files. Additionally, on 32-bit systems, the maximum volume size is limited to 16 TB due to block addressing constraints.[27][30]
ext4 File System
The ext4 file system represents an evolution from ext3's journaling mechanism, incorporating extensive scalability enhancements to support modern storage demands while maintaining core compatibility.[22] It achieves this through 64-bit operations across key structures, enabling vastly larger volumes and files compared to its predecessors. Specifically, ext4 supports filesystem volumes up to 1 exabyte (EB), individual files up to 16 terabytes (TB), and inode sizes as small as 128 bytes for efficient metadata handling.[22] These limits are facilitated by features like 48-bit block addressing and dynamic inode allocation, allowing for billions of files on large-scale systems.[18]

Unique to ext4 are several efficiency-focused innovations in block management. Delayed allocation defers block assignment until data is committed to disk, reducing fragmentation and overhead by coalescing multiple small writes into larger, contiguous extents during flush operations.[22] Persistent preallocation reserves space for files in advance—particularly useful for streaming media and databases—using a special flag in extent structures to mark uninitialized blocks without immediate data writes.[22] Additionally, nanosecond-resolution timestamps for modification (mtime), access (atime), change (ctime), and creation times provide precise file tracking, enabled through extended fields in larger inodes.[31]

Ext4 ensures seamless integration with legacy setups via backward compatibility, allowing it to mount and operate on ext3 and ext2 partitions without reformatting; tools like tune2fs enable in-place migration, such as converting indirect blocks to extents or resizing inodes.[31] Released as stable in Linux kernel 2.6.28 in December 2008, ext4 became the default filesystem in major distributions like Ubuntu starting with version 9.10 in 2009, and it receives ongoing maintenance through kernel updates for reliability and feature refinements.
Write barriers are enabled by default to enforce proper ordering for data integrity on storage devices, preventing metadata corruption during power failures.[31] In 2012, metadata checksums using CRC32C were introduced across structures like superblocks, inodes, and journals, adding a layer of integrity verification to detect and mitigate corruption.
Key Features
Journaling Mechanism
The journaling mechanism in ext3 and ext4 provides data integrity by recording pending changes in a dedicated log area, known as the journal, before applying them to the main file system metadata and data blocks. This log acts as a circular buffer of fixed size, typically consisting of descriptor blocks, data or metadata blocks, commit blocks, and revoke blocks. During normal operation, file system modifications are grouped into transactions, which are atomic units that either fully complete or are rolled back in case of interruption, ensuring the file system remains in a consistent state. Upon mounting after a crash or power loss, the journal is scanned and replayed: committed transactions are applied, while incomplete ones are discarded, avoiding the need for extensive file system checks.[32]

ext3 and ext4 offer three journaling modes to trade off between performance and safety. The default mode in ext3 is ordered (data=ordered), where only metadata changes are logged in the journal, but data blocks are flushed to disk before the corresponding metadata commit to prevent inconsistencies like zeroed data with non-zero metadata. In writeback mode (data=writeback), metadata is journaled while data writes occur asynchronously, maximizing throughput but risking data corruption on crash. The data=journal mode logs both metadata and new data blocks in the journal before writing them to their final locations, providing the highest integrity at the cost of reduced performance due to doubled writes for data. These modes are selected at mount time using mount options.[33][9]

A transaction begins with allocating a handle via the journaling block device layer (JBD or JBD2), which reserves space in the journal—typically a few dozen blocks per transaction to amortize overhead. Metadata updates (and data in data=journal mode) are then logged by writing before-and-after images or just the new state, depending on the operation.
The transaction commits by appending a commit block and issuing a disk barrier to ensure durability; checkpoints may follow to reuse journal space by verifying committed changes have reached the file system. If a crash interrupts, recovery replays the journal starting from the last checkpoint, applying only transactions with valid commit blocks. Commits occur periodically, often every 5 seconds, batching multiple system calls for efficiency.[9][34]

The revoke mechanism prevents replay of obsolete or invalid log entries, such as when a block is modified multiple times within a transaction or freed before commit. When a logged block becomes irrelevant—e.g., during truncation or reallocation—a revoke record is inserted into the journal, listing the block numbers to ignore during recovery. These revoke blocks are hashed for quick lookup and processed sequentially during replay, ensuring that superseded metadata does not corrupt the file system. This feature is crucial for maintaining consistency in complex operations like file deletion, where old metadata logs must be invalidated.[9][32]

By enabling atomic updates and rapid recovery, the journaling mechanism drastically reduces file system check (fsck) times after unclean shutdowns—from potentially hours to under a second—and enhances fault tolerance against power failures, as only consistent states are restored without data loss in metadata or ordered modes.[33][9]

Journal size recommendations aim to cover typical workloads without excessive overhead; a common guideline is 1-5% of the file system size or at least 32 MB, with the minimum being 1024 blocks (e.g., 4 MB for 4 KB blocks).
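The recovery behaviour described above (replaying only committed transactions while honouring revoke records) can be modelled with a toy sketch. The record format here is invented for illustration and is far simpler than the real JBD2 on-disk format:

```python
def replay_journal(journal, disk):
    """Replay committed transactions in order, honouring revoke records.

    Each transaction is a dict (a toy stand-in for JBD2's on-disk records):
      "writes":    {block_number: new_contents} logged by the transaction
      "revokes":   block numbers whose earlier log entries are stale
      "committed": True only if the commit block reached the disk
    """
    pending = {}  # block -> data staged for replay; later writes win
    for txn in journal:
        if not txn["committed"]:
            continue  # no commit block: discard the transaction entirely
        for block in txn["revokes"]:
            pending.pop(block, None)  # cancel stale entries logged earlier
        for block, data in txn["writes"].items():
            pending[block] = data
    disk.update(pending)
    return disk

disk = replay_journal(
    [
        {"writes": {10: b"old"}, "revokes": set(), "committed": True},
        {"writes": {11: b"dir"}, "revokes": {10}, "committed": True},
        {"writes": {12: b"new"}, "revokes": set(), "committed": False},  # crashed
    ],
    {},
)
# Block 10 was revoked and block 12 never committed, so only block 11 is applied.
```

Even in this toy form, the two invariants from the text are visible: a transaction without its commit block is never partially applied, and a revoke cancels earlier logged versions of a block without disturbing later ones.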
Transactions per commit generally encompass 100-1000 blocks to balance latency and throughput, though defaults are tuned for recovery in about 1 second on standard hardware.[35][33]

Additionally, starting with Linux kernel 5.10 (December 2020), ext4 introduced the fast commit feature to reduce latency for synchronous operations in data=ordered mode. Fast commit uses a dedicated area within the journal to log only the minimal metadata deltas required for quick recovery of recent changes, allowing faster commits without full transactions for operations like file creation, deletion, and linking. This improves performance for workloads with frequent fsync calls, such as databases, while maintaining data integrity. The feature must be enabled at filesystem creation using the mkfs.ext4 -O fast_commit option and is supported alongside standard journaling. Recent developments include performance optimizations and bug fixes in Linux 6.11 (September 2024) and further enhancements in Linux 6.18 (October 2025).[32]
Extent-Based Storage
In the ext4 file system, extent-based storage replaces the traditional indirect block mapping used in earlier ext versions with an extent tree, a hierarchical data structure that efficiently maps logical file offsets to ranges of physical disk blocks. The tree consists of interior nodes, defined by the struct ext4_extent_idx (12 bytes each), which point to child nodes, and leaf nodes, defined by the struct ext4_extent (also 12 bytes), which contain the actual mappings. Each extent in a leaf node specifies a starting logical block (ee_block), a length (ee_len, up to 32,768 blocks), and a starting physical block address (split across the ee_start_hi and ee_start_lo fields). With a typical 4 KB block size, a single extent can thus represent up to 128 MB of contiguous data. The tree root is stored in the inode's i_block array, supporting a maximum depth of five levels to handle files up to 16 TB.[36]
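The 12-byte leaf record can be decoded in a few lines; the field layout mirrors struct ext4_extent from the kernel sources, but the standalone decoder is only an illustrative sketch:

```python
import struct

def decode_ext4_extent(raw: bytes) -> dict:
    """Decode one on-disk struct ext4_extent (12 bytes, little-endian)."""
    ee_block, ee_len, ee_start_hi, ee_start_lo = struct.unpack("<IHHI", raw)
    # ee_len values above 32,768 mark an uninitialized (unwritten) extent;
    # the effective length is then ee_len - 32768.
    uninitialized = ee_len > 32768
    length = ee_len - 32768 if uninitialized else ee_len
    return {
        "logical_block": ee_block,
        "length": length,                                     # in filesystem blocks
        "physical_block": (ee_start_hi << 32) | ee_start_lo,  # 48-bit block number
        "uninitialized": uninitialized,
    }

# One maximal extent: logical blocks 0..32767 mapped to physical blocks starting
# at 1,000,000, i.e. 128 MiB of contiguous data with 4 KiB blocks.
extent = decode_ext4_extent(struct.pack("<IHHI", 0, 32768, 0, 1_000_000))
```

Splitting the 48-bit physical block number across ee_start_hi and ee_start_lo is what lets a 12-byte record address volumes far beyond the 32-bit limit of the older indirect pointers.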
This design significantly reduces metadata overhead compared to indirect blocks, where large files require numerous pointer blocks across direct, single, double, and triple indirect levels. For instance, representing a 1 GB file (262,144 blocks with a 4 KB block size) using traditional indirect blocks requires roughly 256 pointer blocks at the double indirect level alone, regardless of how contiguous the file is. In contrast, extents can represent the same file with as few as eight leaf entries when it is fully contiguous (each extent covering the 32,768-block maximum), keeping the mapping metadata to a small fraction of a single block. Additionally, extents improve performance for sequential access patterns by promoting contiguous physical allocation, reducing seek times and fragmentation during reads and writes.[36][37]
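Those figures can be derived explicitly. The sketch below assumes 4 KiB blocks (so 1,024 four-byte pointers per indirect block and 32,768-block maximum extents) and a fully contiguous 1 GiB file:

```python
import math

BLOCK_SIZE = 4096
PTRS_PER_BLOCK = BLOCK_SIZE // 4  # 1,024 four-byte pointers per indirect block
MAX_EXTENT_BLOCKS = 32768         # largest initialized extent (ee_len limit)

def indirect_metadata_blocks(file_blocks: int) -> int:
    """Pointer blocks consumed by ext2/ext3-style indirect mapping."""
    remaining = max(0, file_blocks - 12)  # 12 direct pointers live in the inode
    meta = 0
    if remaining:  # one single-indirect block covers the next 1,024 data blocks
        meta += 1
        remaining -= min(remaining, PTRS_PER_BLOCK)
    if remaining:  # double indirect: a top block plus one block per 1,024 data blocks
        meta += 1 + math.ceil(min(remaining, PTRS_PER_BLOCK**2) / PTRS_PER_BLOCK)
        remaining -= min(remaining, PTRS_PER_BLOCK**2)
    # Triple indirect omitted: a 1 GiB file never reaches it with 4 KiB blocks.
    return meta

file_blocks = (1 * 2**30) // BLOCK_SIZE                     # 1 GiB -> 262,144 blocks
extent_count = math.ceil(file_blocks / MAX_EXTENT_BLOCKS)
print(indirect_metadata_blocks(file_blocks), extent_count)  # 257 8
```

The indirect scheme spends 257 whole pointer blocks (one single-indirect, one double-indirect top block, and 255 second-level blocks), while the extent mapping needs only eight 12-byte records.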
Extent allocation in ext4 leverages the multi-block allocator (mballoc), which uses goal-oriented heuristics and buddy system bitmaps to reserve large contiguous ranges on disk, enhancing locality and reducing external fragmentation. For sparse files, uninitialized (uninit) extents are employed, marking ranges as allocated but unwritten until data is actually stored, which defers physical writes and supports efficient hole punching. This mechanism builds on ext4's delayed allocation in one key aspect: by postponing block commits until writeback, it allows the allocator to merge nearby writes into larger extents.[38]
To enable extent-based storage, the extent feature flag must be set in the superblock (e.g., via tune2fs -O extent on an unmounted ext4 filesystem), which sets the INCOMPAT_EXTENTS incompatible-feature flag. Individual inodes opt into extents by setting the EXT4_EXTENTS_FL flag, with the filesystem falling back to indirect block mapping for compatibility if the feature is disabled or for small files where the overhead of building a tree is unnecessary. Once enabled, ext4 preferentially uses extents for new files, ensuring backward compatibility with ext2/ext3 tools via on-the-fly conversion.[37]