EROFS
EROFS (Enhanced Read-Only File System) is a modern, high-performance, block-based, immutable filesystem designed for read-only use cases in the Linux kernel, emphasizing flexibility, scalability, and minimal I/O amplification.[1] Introduced in Linux kernel version 4.19, it provides an optimized on-disk format that supports compression algorithms such as LZ4 and Zstandard, along with byte-granularity deduplication to reduce storage needs while maintaining efficient access times.[1][2] Unlike traditional archive formats, EROFS is engineered for general-purpose applications, including embedded devices and system images, where immutability and fast read operations are critical.[3]
EROFS features two inode formats: a compact 32-byte version suitable for files up to 4 GiB with simplified metadata (16-bit owner IDs and link count, timestamps in whole seconds), and an extended 64-byte version supporting larger files up to 16 EiB, 32-bit owner IDs and link counts, and nanosecond modification timestamps; both can carry extended attributes.[4] It enables block-aligned storage for maximum I/O efficiency, direct I/O support, and filesystem DAX (Direct Access) for memory-mapped operations, making it ideal for resource-constrained environments.[1] Compression is optional and per-file configurable, with in-place decompression ensuring that over 99% of blocks remain inline, which contributes to up to 45% smaller image sizes compared to uncompressed formats.
In practical deployments, EROFS has been integrated into Android starting with version 13, where it serves as the default filesystem for system partitions, offering superior random and sequential read performance over alternatives like SquashFS or EXT4 while supporting full Virtual A/B OTA updates.[6] Initially developed by Huawei engineer Xiang Gao and now maintained by a global open-source community under the Linux kernel umbrella, EROFS continues to evolve with enhancements like multi-device backing and advanced deduplication, positioning it as a versatile solution for immutable storage in both mobile and embedded systems.[3] Its design prioritizes runtime efficiency over mere compression ratios, resulting in faster boot times and lower memory usage in production scenarios.[1]
Introduction
Overview
EROFS, or Enhanced Read-Only File System, is a lightweight, block-based, immutable filesystem designed specifically for read-only scenarios in the Linux kernel.[1] It serves as a generic solution for immutable and trusted storage environments, delivering high read performance while maintaining a low memory footprint.[1] This makes it particularly suitable for resource-constrained applications, such as embedded devices, mobile operating system images, and containerized workloads where write operations are not required.[6]
The primary design goals of EROFS emphasize flexibility, scalability, and compression-friendliness, setting it apart from earlier read-only filesystems like SquashFS or ROMFS through its focus on modern kernel integration and reduced runtime overhead.[1] It prioritizes high-performance random access with minimal I/O amplification and memory usage, enabling efficient handling of large-scale read-only data sets in diverse environments.[1] These attributes support feature extendability and user-friendly payload management, allowing seamless adaptation to evolving storage needs without compromising integrity.[1]
At its core, EROFS employs a little-endian byte order for on-disk structures to ensure compatibility with prevalent architectures.[1] It utilizes 48-bit block addressing to accommodate massive storage volumes, supporting up to exabyte-scale filesystems in recent implementations.[7] Additionally, it provides native support for multi-device backing stores, facilitating the distribution of external data blobs across multiple volumes for enhanced scalability in container and cloud scenarios.[1]
Development History
EROFS was initiated by Huawei in late 2017 to address the need for an efficient read-only filesystem optimized for mobile and embedded systems, focusing on compression-friendly designs to reduce storage footprints while maintaining high read performance.[8] The project aimed to provide a lightweight alternative to existing read-only filesystems like SquashFS, emphasizing scalability and low memory usage for resource-constrained environments.[9] Initial development progressed rapidly, with EROFS entering the Linux kernel's staging area in version 4.19 released in October 2018 for testing and refinement.[10] After a year of staging, it was fully merged into the mainline kernel with Linux 5.4 on November 24, 2019, marking its official availability as a stable filesystem.[11] This integration was driven by primary developer Xiang Gao and the Huawei team, who submitted the promotion patch after proving its stability through extensive testing.[12] Originally developed under Huawei, EROFS transitioned to open-source community maintenance following its mainline inclusion, involving contributors from organizations including Alibaba Cloud, Bytedance, Google, OPPO, Red Hat, Shanghai Jiao Tong University, and VIVO.[13] The community's efforts have ensured ongoing enhancements, with Xiang Gao continuing as the lead maintainer.[14] This collaborative model has broadened EROFS's applicability beyond its Huawei origins. 
Key milestones include the presentation of EROFS's compression-friendly architecture at the 2019 USENIX Annual Technical Conference, where the paper detailed its fixed-sized output compression and memory-efficient decompression techniques.[9] In Linux 5.13 (2021), support for big pclusters was added, enabling compression units up to 1 MiB for improved ratios on larger files.[15] Linux 5.16 (2022) introduced MicroLZMA compression as an option for better efficiency in specific scenarios.[16] Further advancements came in Linux 5.19 (2022) with file-based fscache integration for on-demand loading without block devices.[17] In Linux 6.15 (2025), the maximum volume size was expanded to 1 EiB to accommodate emerging large-scale use cases like AI datasets.[18]
As of November 2025, EROFS remains under active development, with recent hardening patches merged in Linux 6.18 (October 2025) to mitigate risks from specially crafted images that could cause crashes or infinite loops.[19] This update enhances security without compromising performance, reflecting the filesystem's evolution toward robustness in diverse deployments, including its adoption in Android for read-only partitions.[6]
Design and Architecture
On-Disk Format
The EROFS on-disk format employs a block-based physical layout for data, with metadata that is not strictly aligned to blocks, allowing for flexible arrangement to optimize access patterns. The filesystem begins with 1024 bytes of unused space, followed by the superblock at offset 1024, which is 128 bytes in size and records essential metadata such as the magic number (0xE0F5E1E2), a CRC32-C checksum for integrity, the block size expressed as a bit shift (12 by default, giving 4096-byte blocks), the starting block address of the metadata area (meta_blkaddr), and support indicators for additional features. This design ensures compatibility with little-endian byte order and facilitates efficient mounting by providing all core parameters in a compact, fixed location.[20][1][21]
The logical organization of EROFS volumes intermixes metadata and data regions starting after the superblock, with metadata primarily concentrated beginning at the meta_blkaddr to enable sequential access while permitting user-defined layouts that blend the two for reduced seek times. This mixed approach supports inline storage of small data and extended attributes directly within metadata structures, though detailed inode handling is separate. The format avoids centralized tables for inodes or directories, instead relying on a tree-like structure of components that can be arranged arbitrarily by the creator tool, promoting compactness and performance in read-only scenarios.[1][20]
Addressing in EROFS utilizes 48-bit logical block addressing, enabling volumes up to 1 EiB in size with the default 4 KiB block size, an extension from the original 32-bit limit of 16 TiB to accommodate larger storage needs in modern applications. Block addresses are resolved relative to the superblock's parameters: an inode's byte offset is meta_blkaddr multiplied by the block size plus 32 times its node ID (NID), matching the 32-byte inode slots and ensuring predictable positioning without sparse allocation overhead.
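The superblock location and the addressing arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the field order assumed for the first 48 superblock bytes follows a reading of the kernel's `erofs_fs.h` and should be checked against that header, and the function names are hypothetical.

```python
import struct

EROFS_SUPER_OFFSET = 1024   # the superblock follows 1024 unused bytes
EROFS_MAGIC = 0xE0F5E1E2    # magic number quoted in the text
INODE_SLOT_SIZE = 32        # inode slots are 32 bytes wide

# Assumed little-endian layout of the first 48 superblock bytes:
# magic, checksum, feature_compat, blkszbits, sb_extslots, root_nid,
# inos, build_time, build_time_nsec, blocks, meta_blkaddr, xattr_blkaddr
SB_HEAD = struct.Struct("<IIIBBHQQIIII")


def parse_superblock(image: bytes) -> dict:
    """Read the core addressing parameters from an in-memory image."""
    fields = SB_HEAD.unpack_from(image, EROFS_SUPER_OFFSET)
    magic, blkszbits, root_nid = fields[0], fields[3], fields[5]
    meta_blkaddr, xattr_blkaddr = fields[10], fields[11]
    if magic != EROFS_MAGIC:
        raise ValueError(f"not an EROFS image (magic {magic:#x})")
    return {
        "block_size": 1 << blkszbits,   # stored as a bit shift
        "root_nid": root_nid,
        "meta_blkaddr": meta_blkaddr,
        "xattr_blkaddr": xattr_blkaddr,
    }


def inode_offset(sb: dict, nid: int) -> int:
    """Byte offset of an inode: metadata base plus 32 bytes per NID."""
    return sb["meta_blkaddr"] * sb["block_size"] + INODE_SLOT_SIZE * nid


def max_volume_bytes(addr_bits: int, block_size: int) -> int:
    """Largest volume addressable with the given block-address width."""
    return (1 << addr_bits) * block_size
```

With 4 KiB blocks, `max_volume_bytes(32, 4096)` gives 16 TiB and `max_volume_bytes(48, 4096)` gives 1 EiB, which matches the limits stated above.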
This scheme supports fixed-sized output units for data placement, aiding in bounded decompression and direct access.[1][20]
EROFS includes multi-device support to span volumes across multiple block devices, indicated by the extra_devices field in the superblock, which specifies the count of additional devices beyond the primary one. A device table follows at an offset defined by devt_slotoff, listing details like device IDs and sizes, allowing for distributed storage in scenarios such as container images or large archives. This feature maintains a unified namespace while distributing physical extents.[20][1]
Metadata organization in EROFS emphasizes compact encoding to minimize space usage, with inode tables positioned at fixed 32-byte aligned slots starting from the metadata base address, and directory entries stored in blocks that separate entry metadata from filenames for alphabetical ordering and efficient lookup. Shared extended attributes occupy a dedicated area at xattr_blkaddr, aligned to 4-byte slots, while extent maps use dense representations to track data locations without redundancy. This structure reduces overall footprint and enhances read locality on flash-based media.[1][20]
Inode Structures
EROFS utilizes two primary inode formats to accommodate varying file sizes and metadata requirements while maintaining efficiency in a read-only environment. The compact inode targets smaller files, optimizing space usage, whereas the extended inode provides expanded capabilities for larger files, including enhanced attribute support and advanced data layouts. Both formats adhere to POSIX standards for core file attributes and integrate extended attributes (xattrs) for access control lists (ACLs), with metadata aligned on 32-byte boundaries within dedicated metablocks.[1][20]
The compact inode spans 32 bytes and supports files up to 4 GiB - 1 byte in size, making it suitable for the majority of files in resource-constrained systems. It includes a modification timestamp in seconds along with essential fields for file permissions, ownership, and basic data location. Extended attributes are indicated by the i_xattr_icount field, allowing optional inline storage immediately following the inode. The structure is defined as follows:
| Offset | Size (bytes) | Field | Description |
|---|---|---|---|
| 0x00 | 2 | i_format | Inode version and data layout hints (e.g., plain, inline, or chunk-based). |
| 0x02 | 2 | i_xattr_icount | Number of extended attribute entries (0 for none). |
| 0x04 | 2 | i_mode | File type and permissions (POSIX mode bits). |
| 0x06 | 2 | i_nlink | Hard link count (16-bit, maximum 65,535). |
| 0x08 | 4 | i_size | Logical file size in bytes (32-bit). |
| 0x0C | 4 | i_mtime | Modification time in seconds since epoch. |
| 0x10 | 4 | i_u | Union for data mapping: direct block address for plain files, index table address for compressed, or extent header for chunk-based. |
| 0x14 | 4 | i_ino | Inode number. |
| 0x18 | 2 | i_uid | Owner user ID (16-bit, maximum 65,535). |
| 0x1A | 2 | i_gid | Owner group ID (16-bit, maximum 65,535). |
| 0x1C | 4 | i_reserved | Reserved for future use. |
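As an illustration of the table above, the 32-byte compact inode can be unpacked with Python's `struct` module. This is a minimal sketch that follows the offsets and widths listed in the table; the `CompactInode` type and helper names are hypothetical, not part of any EROFS tooling.

```python
import stat
import struct
from typing import NamedTuple


class CompactInode(NamedTuple):
    i_format: int
    i_xattr_icount: int
    i_mode: int
    i_nlink: int
    i_size: int
    i_mtime: int
    i_u: int
    i_ino: int
    i_uid: int
    i_gid: int
    i_reserved: int


# Little-endian: four 16-bit fields, four 32-bit fields,
# two 16-bit IDs, then one reserved 32-bit field.
COMPACT = struct.Struct("<HHHHIIIIHHI")
assert COMPACT.size == 32


def parse_compact_inode(buf: bytes, offset: int = 0) -> CompactInode:
    """Decode one compact inode starting at `offset` in `buf`."""
    return CompactInode(*COMPACT.unpack_from(buf, offset))


def is_regular_file(inode: CompactInode) -> bool:
    """i_mode carries ordinary POSIX mode bits, so `stat` helpers apply."""
    return stat.S_ISREG(inode.i_mode)
```

Because every field has a fixed offset, a reader can decode an inode from a single 32-byte read once its NID has been translated to a byte offset.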
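The extended inode spans 64 bytes. It widens the size field to 64 bits, uses 32-bit owner IDs and a 32-bit hard link count, and records the modification time with nanosecond precision. Its i_u union supports sparse file handling through hole markers in extent trees. The structure builds on the compact format with these additions: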
| Offset | Size (bytes) | Field | Description |
|---|---|---|---|
| 0x00 | 2 | i_format | Inode version and data layout hints. |
| 0x02 | 2 | i_xattr_icount | Number of extended attribute entries. |
| 0x04 | 2 | i_mode | File type and permissions. |
| 0x06 | 2 | i_reserved | Reserved in the extended format; the hard link count is the 32-bit i_nlink at 0x2C. |
| 0x08 | 8 | i_size | Logical file size in bytes (64-bit). |
| 0x10 | 4 | i_u | Union for data mapping, with extended support for large extents and sparse holes. |
| 0x14 | 4 | i_ino | Inode number. |
| 0x18 | 4 | i_uid | Owner user ID (32-bit, maximum 4,294,967,295). |
| 0x1C | 4 | i_gid | Owner group ID (32-bit, maximum 4,294,967,295). |
| 0x20 | 8 | i_mtime | Modification time in seconds since epoch. |
| 0x28 | 4 | i_mtime_nsec | Nanosecond precision for i_mtime. |
| 0x2C | 4 | i_nlink | Hard link count (32-bit, maximum 4,294,967,295). |
| 0x30 | 16 | i_reserved2 | Reserved for future use. |
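The 64-byte layout in the table can be decoded the same way. The sketch below follows the table's offsets, treats the 16-bit field at 0x06 as opaque, and combines i_mtime with i_mtime_nsec into a UTC timestamp; the function names are hypothetical.

```python
import struct
from datetime import datetime, timezone

# Little-endian: 4 x u16, u64 size, 4 x u32, u64 mtime, 2 x u32, 16 reserved bytes.
EXTENDED = struct.Struct("<HHHHQIIIIQII16s")
assert EXTENDED.size == 64


def parse_extended_inode(buf: bytes, offset: int = 0) -> dict:
    """Decode one extended inode starting at `offset` in `buf`."""
    (i_format, i_xattr_icount, i_mode, _field_0x06, i_size, i_u, i_ino,
     i_uid, i_gid, i_mtime, i_mtime_nsec, i_nlink,
     _reserved2) = EXTENDED.unpack_from(buf, offset)
    return {
        "i_format": i_format,
        "i_xattr_icount": i_xattr_icount,
        "i_mode": i_mode,
        "i_size": i_size,              # 64-bit logical size
        "i_u": i_u,
        "i_ino": i_ino,
        "i_uid": i_uid,                # 32-bit owner IDs
        "i_gid": i_gid,
        "i_mtime": i_mtime,
        "i_mtime_nsec": i_mtime_nsec,
        "i_nlink": i_nlink,            # 32-bit link count at 0x2C
    }


def mtime_utc(inode: dict) -> datetime:
    """Modification time as an aware datetime (microsecond resolution)."""
    base = datetime.fromtimestamp(inode["i_mtime"], tz=timezone.utc)
    return base.replace(microsecond=inode["i_mtime_nsec"] // 1000)
```

Python's `datetime` stops at microseconds, so the last three decimal digits of the nanosecond field are dropped in `mtime_utc`.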
In both formats, inline extended attributes follow the inode, with their number given by i_xattr_icount (up to filesystem limits). Inline data via tail-packing is available for small files under 184 bytes, embedding content directly after the inode metadata to eliminate separate block allocation and reduce fragmentation. This feature applies to both formats when the EROFS_INODE_FLAT_INLINE layout is selected, enhancing efficiency for tiny files common in embedded scenarios.[1][20]
Data mapping in EROFS inodes varies by file characteristics to optimize access patterns. Direct mapping (EROFS_INODE_FLAT_PLAIN) addresses consecutive blocks via i_u for small, uncompressed files. Indirect mapping occurs in compressed layouts (EROFS_INODE_COMPRESSED_*), where i_u points to an index table listing physical cluster addresses for logical clusters, with per-cluster compression flags indicating LZ4, Zstd, or plain data. For large files, extent trees in the chunk-based layout (EROFS_INODE_CHUNK_BASED) use a hierarchical structure in i_u for mapping up to 16 EiB, supporting sparse files by denoting holes with null addresses (-1) and enabling deduplication through shared chunk references. These mappings integrate with the overall on-disk layout by referencing blocks from the superblock's metablock address.[1][20][21]
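How a reader might tell these layouts apart can be sketched by decoding i_format. The bit positions (bit 0 for the inode version, bits 1 to 3 for the data layout) and the numeric layout values below are assumptions taken from a reading of the kernel's `erofs_fs.h`, not from the text above, and should be verified against that header.

```python
from enum import IntEnum


class DataLayout(IntEnum):
    # Numeric values are assumed from erofs_fs.h (EROFS_INODE_* constants).
    FLAT_PLAIN = 0
    COMPRESSED_FULL = 1
    FLAT_INLINE = 2
    COMPRESSED_COMPACT = 3
    CHUNK_BASED = 4


def decode_i_format(i_format: int) -> tuple[bool, DataLayout]:
    """Split i_format into (is_extended, data layout).

    Raises ValueError for a layout value this sketch does not know.
    """
    is_extended = bool(i_format & 0x1)          # bit 0: compact (0) or extended (1)
    layout = DataLayout((i_format >> 1) & 0x7)  # bits 1-3: data layout
    return is_extended, layout


def is_compressed(layout: DataLayout) -> bool:
    """True for the layouts whose i_u refers to a compression index."""
    return layout in (DataLayout.COMPRESSED_FULL, DataLayout.COMPRESSED_COMPACT)
```

A parser would typically call `decode_i_format` first, pick the 32-byte or 64-byte structure accordingly, and only then interpret i_u.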
Features
Compression Mechanisms
EROFS supports transparent data compression on a per-file basis to enhance storage efficiency while maintaining read performance. The default algorithm is LZ4, available since Linux kernel 5.3, providing fast decompression suitable for resource-constrained environments. MicroLZMA was added in Linux 5.16 for achieving higher compression ratios at the expense of increased decompression overhead. DEFLATE support was introduced in Linux 6.6, offering a balance between compression ratio and CPU usage compared to LZ4. Zstandard support was introduced in Linux 6.10, providing high compression ratios with moderate decompression overhead.[2][1] Compression in EROFS employs a fixed-sized output approach, where variable-length input data is compressed into fixed-sized physical clusters (pclusters), typically 4 KiB by default, to enable in-place decompression directly into the page cache without requiring temporary buffers. Logical clusters (lclusters) are fixed-sized, equal to the filesystem block size (typically 4 KiB), allowing partial reads of compressed data to minimize I/O amplification and latency. This integration is indicated by flags in the inode structure, enabling selective application per file or cluster.[1][9] The benefits include compression ratios of up to 2-3x for typical payloads such as application data and firmware images, reducing overall storage footprint while supporting efficient random access. By design, this mechanism lowers I/O latency through partial cluster decompression, avoiding full-block reads common in other compressed file systems.[9][2] Configuration occurs during filesystem creation using the mkfs.erofs tool, where the algorithm and compression level are specified via the -z option (e.g., -zlz4 for LZ4 or -zlzma,9 for MicroLZMA at level 9). 
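The fixed-sized output idea can be illustrated with a toy model: for each physical cluster, consume as much input as will still compress into one 4 KiB unit. The sketch below uses zlib from the Python standard library as a stand-in for LZ4 and a simple binary search to find the input length; it is not a description of how mkfs.erofs or the kernel implement it, and all names are hypothetical.

```python
import zlib

PCLUSTER_SIZE = 4096  # fixed physical cluster size, as in the text


def pack_fixed_output(data: bytes, pcluster_size: int = PCLUSTER_SIZE):
    """Return a list of (logical_offset, logical_length, padded_pcluster)."""
    extents = []
    pos = 0
    while pos < len(data):
        # A 1-byte input always fits, so `lo` is a valid starting point.
        lo = 1
        hi = min(len(data) - pos, 64 * pcluster_size)  # bound the search
        best = zlib.compress(data[pos:pos + 1])
        while lo < hi:
            mid = (lo + hi + 1) // 2
            blob = zlib.compress(data[pos:pos + mid])
            if len(blob) <= pcluster_size:
                lo, best = mid, blob    # fits: try a longer input
            else:
                hi = mid - 1            # too big: shorten the input
        # Pad so every physical cluster has exactly the same size.
        extents.append((pos, lo, best.ljust(pcluster_size, b"\0")))
        pos += lo
    return extents


def read_back(extents) -> bytes:
    """Decompress each cluster independently; padding is ignored by zlib."""
    out = bytearray()
    for _offset, _length, pcluster in extents:
        out += zlib.decompressobj().decompress(pcluster)
    return bytes(out)
```

Each extent decompresses on its own, so a random read only has to fetch and decode the single fixed-size cluster that covers the requested offset. That independence is what the fixed-sized output design is meant to provide.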
At runtime, decompression leverages kernel threads, including per-CPU kthreads introduced in Linux 6.3 for low-latency processing, or direct in-place CPU paths to optimize for read-heavy workloads.[22][23]
Deduplication and Performance Optimizations
EROFS supports chunk-based data deduplication, introduced in Linux kernel version 5.15, which splits files into equal-sized data chunks and allows extents to reference shared compressed data blocks, thereby reducing storage redundancy in scenarios like multi-layer container images or system updates.[24] This mechanism enables efficient sharing of identical data chunks across files without duplicating them on disk, particularly beneficial for read-only images where common data patterns, such as shared libraries, can be referenced multiple times. Since Linux 6.1, physical clusters (pclusters) can serve multiple logical clusters (lclusters) or extents, further enhancing deduplication by allowing compressed data to be shared across different file regions.[24] To optimize read performance, EROFS incorporates big pcluster mode, available since Linux 5.13, which allows compressed physical clusters to exceed the standard file-system block size—up to 1 MiB—facilitating better compression ratios and faster sequential reads on storage devices like flash memory.[1] This mode packs multiple logical clusters into larger physical units, minimizing decompression overhead for linear access patterns common in boot processes or application loading. Additionally, EROFS employs memory-efficient techniques for metadata handling, leveraging the kernel's slab allocator to reduce fragmentation and allocation latency when caching inode and extent information. 
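Chunk-based deduplication can be modelled in a few lines: files are cut into equal-sized chunks, identical chunks are stored once, and each file keeps a list of references into the shared store. This is a conceptual sketch only; the SHA-256 lookup and the in-memory structures are assumptions for illustration and do not reflect the EROFS on-disk chunk index format.

```python
import hashlib


def dedupe_chunks(files: dict[str, bytes], chunk_size: int = 4096):
    """Return (store, maps): unique chunks and per-file chunk references."""
    store: list[bytes] = []            # unique chunks, in order of first use
    index: dict[bytes, int] = {}       # chunk digest -> position in store
    maps: dict[str, list[int]] = {}    # file name -> chunk positions
    for name, data in files.items():
        refs = []
        for off in range(0, len(data), chunk_size):
            chunk = data[off:off + chunk_size]
            digest = hashlib.sha256(chunk).digest()
            if digest not in index:    # first time this content is seen
                index[digest] = len(store)
                store.append(chunk)
            refs.append(index[digest])
        maps[name] = refs
    return store, maps


def read_file(store: list[bytes], refs: list[int]) -> bytes:
    """Reassemble a file from its chunk references."""
    return b"".join(store[i] for i in refs)
```

Two container layers that ship the same shared library would map its chunks to the same store entries, so the data is written once and referenced twice.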
For uncompressed files, the filesystem supports direct I/O without buffer heads, bypassing the page cache to prevent double buffering and improve throughput in high-bandwidth scenarios.[22]
Key performance enhancements include in-place decompression, which reuses file pages for temporary compressed data storage, limiting additional RAM usage to under 4 KiB per decompression operation in most cases and avoiding the need for large bounce buffers.[25] This approach ensures low memory footprint during runtime, with evaluations showing only a 4.9% increase in memory usage compared to uncompressed filesystems like Ext4. Since Linux 5.19, integration with the fscache framework enables on-demand caching of compressed data blocks, allowing EROFS images to be mounted without underlying block devices and supporting lazy loading for container environments.[26]
As a read-only filesystem, EROFS inherently avoids write amplification issues associated with compression and deduplication in mutable systems, eliminating the need for garbage collection or data rewriting that could accelerate wear on flash storage. Its design prioritizes compact on-disk layouts and efficient read paths, aligning with flash-optimized wear-leveling by minimizing physical block erasures through immutable data structures.[9]
Implementation
Kernel Integration
EROFS is mounted using the standard Linux mount command with the -t erofs option, specifying the device or image file as the source.[1] It supports mounting via loop devices for filesystem images stored as regular files, enabling use without dedicated block devices, and can function as the lower directory in OverlayFS configurations for writable overlays on read-only bases.[27] The kernel enforces read-only access at the filesystem level, preventing any modifications to the mounted volume and ensuring immutability.[6]
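The mount invocations described above can be assembled as argument lists, for example to pass to `subprocess.run` from a provisioning script. The sketch below only builds the commands; all paths are hypothetical placeholders, and running the commands would require root privileges.

```python
def erofs_mount_argv(image: str, mountpoint: str, loop: bool = True) -> list[str]:
    """mount -t erofs [-o loop] IMAGE MOUNTPOINT"""
    argv = ["mount", "-t", "erofs"]
    if loop:
        argv += ["-o", "loop"]   # image is a regular file, not a block device
    return argv + [image, mountpoint]


def overlay_mount_argv(lower: str, upper: str, work: str, merged: str) -> list[str]:
    """Stack a writable OverlayFS layer on top of a mounted EROFS tree."""
    opts = f"lowerdir={lower},upperdir={upper},workdir={work}"
    return ["mount", "-t", "overlay", "overlay", "-o", opts, merged]


if __name__ == "__main__":
    # Hypothetical paths, printed rather than executed.
    print(" ".join(erofs_mount_argv("/images/rootfs.erofs", "/mnt/ro")))
    print(" ".join(overlay_mount_argv("/mnt/ro", "/data/upper",
                                      "/data/work", "/mnt/merged")))
```

Keeping the EROFS image as the lower directory leaves it untouched: all writes land in the upper directory, so the base image can be replaced or verified independently.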
In runtime operation, EROFS integrates seamlessly with the Linux Virtual File System (VFS) layer, providing standard interfaces for directory traversal, file lookups, and read operations.[1] It employs bio-based I/O mechanisms for efficient block-level data access, leveraging the block layer to submit read requests directly to underlying storage. Decompression runs either synchronously within the read path for small requests or asynchronously via kernel threads for larger compressed clusters, minimizing latency while supporting algorithms like LZ4 and zlib.[1]
Key enhancements to EROFS kernel integration include support for file-backed mounts since Linux 6.12 (replacing the fscache backend introduced in 5.19), enabling persistent, on-demand caching of filesystem data without requiring block devices, improving performance in container and networked scenarios.[2] Multi-threaded decompression was introduced in Linux 6.3, allowing parallel processing of compressed data across CPU cores to accelerate read throughput, particularly for chunk-based files.[23] Additionally, security hardening measures were merged for Linux 6.18 to mitigate risks from malformed filesystem images, such as potential kernel crashes or memory corruption, through enhanced validation of on-disk structures.[19]
As a read-only filesystem, EROFS provides no support for write operations, making it unsuitable for environments requiring mutable storage.[1] It is also incompatible with filesystems that rely on journaling for consistency, as its immutable design precludes any form of write journaling or recovery mechanisms.[6]
User-Space Tools
The primary user-space tool for EROFS is mkfs.erofs, part of the erofs-utils package, which formats EROFS filesystem images from a source directory or tarball, supporting options for compression algorithms such as LZ4, zlib, zstd (kernel support since Linux 6.10), and others, along with configurable compression levels to balance size and performance.[28][29] It also enables deduplication through the -E dedupe option for global removal of duplicate data blocks (requiring Linux kernel 6.1 or later) and fragment deduplication via -E fragdedupe when combined with fragmentation support.[28] Inline data placement is enabled by default to optimize small files, but can be disabled with -E ^inline_data for compatibility with file system DAX modes in Linux 5.15 and newer.[28] Extended attributes (xattrs) are inlined by default up to two per inode, adjustable via -x N, and can be filtered by prefix using --xattr-prefix for selective inclusion (Linux 6.4+).[28] Multi-device setups are supported by specifying an additional backing device with --blobdev FILE for chunk-based data distribution.[28]
Additional utilities in erofs-utils include fsck.erofs for verifying the integrity of an EROFS image by scanning its metadata and data structures, with options like -f to force checks on mounted images and -V for version information.[30] The dump.erofs tool extracts and inspects filesystem contents, displaying overall disk statistics, superblock details, or specific file information from an image, aiding in debugging and analysis.[31] Image size optimization is facilitated through mkfs.erofs flags such as -b BYTES for the block size (defaulting to the system page size, minimum 512 bytes) and compression-specific parameters to reduce overhead without compromising read efficiency.[28]
The erofs-utils package is available in major Linux distributions, including Arch Linux, Debian, Ubuntu, Fedora, openSUSE, and Alpine Linux, often via standard repositories like extra in Arch or universe in Ubuntu.[32][33] Source code is hosted at the official kernel repository git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git and mirrored on GitHub at erofs/erofs-utils for community contributions.[28][34]
Basic usage of mkfs.erofs follows the syntax mkfs.erofs [OPTIONS] DESTINATION SOURCE, where DESTINATION is the output image file and SOURCE is the input directory or tarball; for example, mkfs.erofs -zlz4 /output.img /path/to/source creates an LZ4-compressed image.[28] To enable deduplication and inline xattrs for a multi-device setup, one might use mkfs.erofs -E dedupe -x 4 --blobdev /dev/sdb /output.img /source/dir.[28] For verification, fsck.erofs /output.img performs a standard integrity check, while dump.erofs -i /output.img shows inode details for inspection.[30][31]
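For scripted image builds, the command lines above can be wrapped in a small helper. The sketch below assumes the `-z<algorithm>[,level]` and `-Ededupe` spellings from the mkfs.erofs manual page and the DESTINATION-then-SOURCE argument order; the function names and paths are hypothetical, and the tools are only run if erofs-utils is installed.

```python
import shutil
import subprocess


def mkfs_erofs_argv(dest, source, compressor=None, level=None, dedupe=False):
    """Build a mkfs.erofs command line: options, then DESTINATION, then SOURCE."""
    argv = ["mkfs.erofs"]
    if compressor:
        spec = f"-z{compressor}"
        if level is not None:
            spec += f",{level}"      # e.g. -zlzma,9
        argv.append(spec)
    if dedupe:
        argv.append("-Ededupe")      # assumed spelling of the dedupe option
    return argv + [dest, source]


def build_and_check(dest, source, **options):
    """Create an image, then verify it with fsck.erofs."""
    for tool in ("mkfs.erofs", "fsck.erofs"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found; is erofs-utils installed?")
    subprocess.run(mkfs_erofs_argv(dest, source, **options), check=True)
    subprocess.run(["fsck.erofs", dest], check=True)
```

For instance, `build_and_check("/output.img", "/source/dir", compressor="lz4")` would format the directory with LZ4 and fail loudly if either step returns a non-zero exit status.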