
Log-structured file system

A log-structured file system (LFS) is a file system architecture that appends all modifications to files and metadata sequentially to a single log on disk, rather than performing in-place updates, thereby treating the disk as an append-only sequence of ordered records. This design leverages sequential writes to boost throughput on rotating disks and flash storage, while enabling atomic operations and simplified recovery from crashes by replaying the log from the most recent checkpoint. The concept was pioneered in 1991 by Mendel Rosenblum and John K. Ousterhout at the University of California, Berkeley, as part of their work on the Sprite operating system, with the seminal paper published in 1992 detailing its design and prototype implementation. The LFS prototype achieved write throughputs up to an order of magnitude higher than contemporary Unix file systems for small-file workloads, utilizing up to 70% of available disk bandwidth compared to 5-10% for traditional systems.

Key mechanisms include dividing the log into fixed-size segments (typically 1 MB), maintaining an in-memory or disk-based inode map to track current file locations within the log, and employing a background segment cleaner to garbage-collect obsolete data by copying live blocks to new segments and erasing old ones. LFS offers several advantages, particularly for write-intensive applications: sequential appending eliminates random seeks, enabling high write bandwidth; inherent journaling ensures crash consistency without separate recovery logs; and it avoids free-space fragmentation by reusing cleaned segments. However, drawbacks include potential read amplification from scattered file fragments (mitigated by caching but worsening on cache misses), increased overhead from segment cleaning as the disk fills (potentially halving performance above 50% utilization), and higher complexity in managing the cleaner to minimize write amplification. These trade-offs make LFS especially suitable for environments with large caches and sequential access patterns, but less ideal for random-read-heavy workloads on mechanical disks.

Notable implementations include the original Sprite LFS, a 1993 port to Unix by Margo Seltzer and colleagues that integrated with the vnode interface for broader compatibility and robustness, and specialized variants for flash memory such as JFFS2 and YAFFS (developed for embedded systems). In modern systems, pure or hybrid LFS designs persist in commercial storage like NetApp's WAFL, Oracle's ZFS (via intent logging), and Linux's Btrfs and NILFS2, while nearly all solid-state drives (SSDs) employ LFS-like flash translation layers (FTLs) for wear leveling and write optimization. Recent research continues to refine LFS for emerging storage, such as garbage-collection-free variants to reduce overhead on high-capacity SSDs.

History and Development

Origins and Invention

The log-structured file system (LFS) was invented by Mendel Rosenblum and John K. Ousterhout, along with colleagues in the Sprite operating system project at the University of California, Berkeley. Developed in the late 1980s and implemented by mid-1990, it emerged as part of the broader research effort to create an efficient file system for networked workstation environments. The LFS prototype was designed specifically to address inefficiencies in traditional file systems, particularly for workloads involving frequent small writes, which were common in systems of the era.

The invention was motivated by the growing disparity between rapidly advancing CPU and memory speeds and the relatively stagnant performance of disk drives during the 1980s. Disk access times improved only modestly compared to the exponential gains in processor performance, while transfer rates for sequential operations began outpacing random-access capabilities due to evolving magnetic disk technologies. These trends highlighted the limitations of conventional file systems like the Berkeley Fast File System (FFS), which suffered from high seek times and fragmentation when handling small, random writes—operations that dominated many real-world workloads. By the late 1980s, large file caches in main memory further shifted disk I/O toward write-dominated patterns, amplifying the need for an approach optimized for sequential write throughput.

The core innovation of LFS lay in treating the entire disk as a sequential log for all modifications, thereby minimizing seek overhead and leveraging the full bandwidth of emerging disk drives. This concept was first detailed in the seminal paper "The Design and Implementation of a Log-Structured File System," presented by Rosenblum and Ousterhout at the 13th ACM Symposium on Operating Systems Principles (SOSP) in 1991. The publication outlined the Sprite LFS prototype's goals of dramatically improving write performance for small files while simplifying crash recovery through log-based structures. As the authors noted, this sequential writing paradigm allowed LFS to achieve near-optimal use of disk bandwidth, marking a foundational shift in file system design.

Evolution and Key Publications

Following the introduction of the Sprite Log-structured File System (LFS) in 1991, subsequent research in the 1990s focused on practical implementations and optimizations to address cleaning overheads and integration challenges. In 1993, Margo Seltzer and colleagues developed an LFS prototype for Unix systems, demonstrating improved write throughput and crash recovery while highlighting the need for efficient segment cleaning to mitigate performance degradation from live data relocation. This work built on the original cleaning policy by introducing heuristic approaches that prioritized segments based on utilization and age, reducing cleaning costs in workloads with high update rates. Concurrently, early experiments, such as the Linux LFS project initiated in the late 1990s, explored porting LFS concepts to open-source environments, though full implementations like LinLogFS emerged in 2000, emphasizing fast recovery and ordered writes.

The 2000s saw LFS principles extend to specialized storage, particularly for flash memory and high-availability systems. Matthew Dillon's HAMMER file system, first prototyped in 2005 and detailed in a 2008 paper, incorporated LFS-style logging with B+ trees for metadata, enabling features like snapshots and history tracking in DragonFly BSD and improving reliability for large-scale storage up to exabytes. For flash-optimized variants, systems like JFFS2 (2001) and YAFFS (2002) adapted log-structured writing to NAND flash constraints, using sequential appends to minimize erase cycles and support wear leveling by distributing writes evenly across blocks. These adaptations suited flash's out-of-place update nature.

In the 2000s and 2010s, LFS ideas influenced distributed and SSD-centric storage, evolving into hybrid structures for scalability and durability. The log-structured merge-tree (LSM-tree), proposed by Patrick O'Neil et al. in 1996 but widely adopted in the 2000s and after through systems like LevelDB (2011), extended LFS-style sequential writing to key-value stores, enabling high ingestion rates in write-optimized databases by merging sorted logs to bound read amplification. Google's Colossus, deployed around 2010 as a successor to GFS, incorporated log-structured elements for append-only files and metadata logging, supporting exabyte-scale clusters with sub-millisecond latencies in Google's data centers. SSD-specific evolutions, such as Samsung's F2FS (introduced in Linux 3.8 around 2013), refined log structuring with multi-log zones to align with flash translation layers, reducing garbage collection overhead and enhancing flash endurance through hot/cold data separation.

Key publications marking this progression include the seminal 1991 SOSP paper by Rosenblum and Ousterhout on LFS design, the 1993 implementation paper by Seltzer et al., and the 1995 paper on heuristic cleaning algorithms by Seltzer and colleagues. The 2008 overview by Dillon, the 2011 LevelDB technical notes, and the 2015 FAST paper on F2FS represent 2000s and 2010s advancements. A comprehensive 2018 survey by Luo and Carey in The VLDB Journal analyzed LSM-based extensions of LFS, covering over 50 works and emphasizing their role in modern systems with trade-offs among write, read, and space amplification. These contributions underscore LFS's enduring impact on sequential-write optimization amid shifting hardware paradigms.

Core Concepts and Design

Fundamental Principles

A log-structured file system (LFS) treats the entire storage medium as an append-only log, where all file system modifications—including file creations, deletions, and updates—are written sequentially to the end of this log rather than in place. This design mimics a journaling mechanism but applies it to the whole disk, ensuring that every change, from inode updates to data blocks, is appended as a contiguous sequence. The rationale for this log structuring stems from the inherent geometry of disk drives, where sequential writes significantly outperform random ones by minimizing mechanical seeks and rotational delays. Traditional file systems suffer from fragmented updates that require small writes scattered across the disk, leading to inefficient access patterns; LFS amortizes these costs by batching operations into large sequential transfers, thereby exploiting the full sequential bandwidth of disks even for workloads dominated by small, random-like modifications.

At its core, the key abstraction in an LFS is the disk viewed as a circular log composed of fixed-size segments, typically 512 KB to 1 MB, which serve as atomic units for writing and management. Old data within these segments is invalidated by dropping references to it rather than immediately overwriting it, allowing the log to wrap around circularly once it reaches the disk's end, thus maintaining a monotonically growing structure without gaps. This approach represents a fundamental conceptual shift from conventional file systems, which rely on in-place updates to fixed locations for files and inodes. In an LFS, no such overwrites occur; instead, entirely new versions of affected structures are appended to the log, with the current valid state reconstructed via dynamic maps or in-memory pointers that track the latest references, enabling efficient reads while deferring space reclamation.
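The out-of-place update rule can be made concrete with a minimal Python sketch that models the disk as an append-only sequence of block versions, with an in-memory map tracking the newest copy of each block. All names here are illustrative, not taken from any real implementation:

    class AppendOnlyLog:
        """Toy model: the 'disk' only ever grows; reads follow the map."""
        def __init__(self):
            self.log = []      # sequence of (file_id, block_no, data) records
            self.latest = {}   # (file_id, block_no) -> index of newest copy

        def write_block(self, file_id, block_no, data):
            # Every update appends a new version; nothing is overwritten.
            self.log.append((file_id, block_no, data))
            self.latest[(file_id, block_no)] = len(self.log) - 1

        def read_block(self, file_id, block_no):
            # Reads go through the map to the most recent copy in the log.
            return self.log[self.latest[(file_id, block_no)]][2]

        def is_live(self, index):
            # A record is live only if the map still points at it;
            # superseded copies are dead space awaiting the cleaner.
            file_id, block_no, _ = self.log[index]
            return self.latest[(file_id, block_no)] == index

    log = AppendOnlyLog()
    log.write_block(1, 0, b"v1")
    log.write_block(1, 0, b"v2")          # supersedes v1 without overwriting it
    assert log.read_block(1, 0) == b"v2"
    assert not log.is_live(0)             # the old copy is now garbage

The superseded copy at index 0 remains physically present until reclaimed, which is precisely the space the segment cleaner exists to recover.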

Log Structure Mechanics

In a log-structured file system (LFS), the entire disk is organized as a single sequential log, divided into fixed-size segments, typically ranging from 512 KB to 1 MB in length, to which all modifications are appended in a contiguous manner. These segments serve as the fundamental units of storage and include a variety of content: file data blocks, inodes, indirect blocks for larger files, portions of the inode map, segment summary blocks, and dedicated checkpoint regions. A superblock is maintained at a fixed disk location to provide initial bootstrap information, such as the location of the most recent checkpoint, while each segment begins with a header followed by a summary block that records identifying information for every block within it, including the file number, block offset, and version number, to facilitate quick identification of contents during recovery or cleaning.

Central to the LFS mechanics are the mapping structures that translate virtual file addresses to physical locations in the log, ensuring efficient access without in-place updates. The inode map (IMAP) is a key component, consisting of fixed-size entries that point to the current location and version of each file's inode within the log; it is primarily cached in memory for fast lookups but periodically flushed to disk as part of checkpointing. Complementing the IMAP is the segment usage table (SUT), which maintains per-segment information such as the number of live bytes, the age of the oldest data, and timestamps for cleaning decisions, helping the system track which portions of the log contain valid versus obsolete data. Additional cleaner metadata, derived from segment summaries and the SUT, supports the identification and relocation of live data during space reclamation, with these structures themselves stored as appends within the log to maintain consistency.

Checkpoints provide atomic snapshots of the volatile mapping tables, enabling reliable and rapid mount-time reconstruction of the active state. Generated periodically—often after writing a certain volume of data or at shutdown—a checkpoint involves a two-phase process: first, all modified data and partial maps are appended to the log; then the complete IMAP and segment usage table are written to one of two fixed checkpoint regions on disk, alternating between them to avoid overwriting active structures (sketched below). Upon mounting, the system reads the latest checkpoint to load the IMAP and SUT into memory, then scans recent segment summaries to update mappings for any post-checkpoint writes, ensuring the file system view reflects the state at the last consistent point.

File operations in an LFS leverage these structures through a virtual-to-physical addressing scheme that promotes batched writes for atomicity and efficiency. When creating or modifying a file, new blocks—including data blocks, inodes, and directory entries—are appended sequentially to the current segment, with their virtual addresses (inode number and block offset) mapped to physical positions via an in-memory table that extends the IMAP for individual blocks. This mapping is updated dynamically in memory and committed durably through subsequent checkpoints, while a separate log of directory operations ensures that name-to-inode linkages remain consistent even if a crash occurs mid-update, as all changes are grouped into indivisible appends rather than scattered updates.
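The two-region checkpoint scheme can be illustrated with a short, hedged Python sketch; the structure and field names are hypothetical simplifications of what a real LFS would write to its fixed checkpoint regions:

    class Checkpointer:
        """Alternates between two fixed checkpoint regions so a crash
        mid-checkpoint always leaves one complete, older snapshot intact."""
        def __init__(self):
            self.regions = [None, None]   # the two fixed on-disk slots
            self.next_slot = 0

        def write_checkpoint(self, imap, seg_usage, timestamp):
            snapshot = {"imap": dict(imap), "sut": dict(seg_usage), "ts": timestamp}
            self.regions[self.next_slot] = snapshot   # never touch the live copy
            self.next_slot ^= 1                       # alternate on each checkpoint

        def mount(self):
            # Load the newer of the two valid snapshots; a real system would
            # then roll forward through segment summaries written after it.
            valid = [r for r in self.regions if r is not None]
            return max(valid, key=lambda r: r["ts"])

    cp = Checkpointer()
    cp.write_checkpoint({1: 100}, {0: 512}, timestamp=10)
    cp.write_checkpoint({1: 220}, {0: 256}, timestamp=20)
    assert cp.mount()["imap"][1] == 220   # mount sees the latest consistent state

Alternation is what makes the checkpoint atomic in practice: a torn write can corrupt at most the slot being written, never the previously committed one.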

Operational Mechanisms

Writing and Allocation

In a log-structured file system (LFS), the writing process begins by buffering small, asynchronous writes in a kernel-level cache to aggregate them into larger, segment-sized units, typically ranging from 512 KB to 1 MB, before committing them sequentially to disk. This batching ensures that modifications to data blocks, inodes, and other metadata are grouped together and written atomically as a single log segment, maximizing disk bandwidth utilization by converting random small writes into sequential large I/O operations. For instance, when updating a file, both the new data and the corresponding inode revisions are appended to the log in this bundled fashion, maintaining consistency without immediate on-disk reorganization.

Space allocation for these writes relies on the segment usage table (SUT), a structure maintained in memory and periodically checkpointed to disk, which tracks the status of each disk segment by recording its number of live bytes and last modification time. To allocate a new segment, the system scans the SUT to identify free or sufficiently clean segments—those with minimal live data—and reserves one for the impending write, ensuring that the write occurs at the log's current tail without fragmentation. This approach allows for rapid allocation, as the SUT provides an efficient index into the log's overall structure, where segments are written contiguously.

Allocation policies in LFS guide the selection of segments for writing and future reuse. Common strategies include a greedy policy that prioritizes the least-utilized (cleanest) segments to quickly reclaim space, and a more sophisticated cost-benefit policy that weighs the space reclaimed against the effort required to relocate live data. The cost-benefit approach, for example, selects segments based on a score that favors those with low utilization u (the ratio of live bytes to segment size) and high age (time since last modification), using the formula benefit/cost = [(1 − u) × age] / (1 + u), to minimize the long-term cost of relocating live data during space recovery (see the sketch below). These policies are applied proactively during write allocation to balance immediate performance with sustained free-space availability.

When handling overwrites or modifications to existing files, LFS invalidates the old versions of affected blocks directly in the inode map or block mapping structures without physically erasing them from their original positions, thereby marking that space as obsolete for later reclamation. The new data is then written to a fresh address within the newly allocated segment, and the mapping tables are updated to point to this new location; for actively modified files, a version number associated with the inodes ensures that subsequent reads access the most recent copy through the updated mappings. This out-of-place update scheme avoids the penalties of in-place revisions, treating overwrites as ordinary append operations.

Integration with the operating system occurs through kernel-managed buffering that groups disparate write operations—such as data updates, directory changes, and inode modifications—into cohesive segments, enforcing sequential I/O patterns even for mixed workloads. For example, a metadata-intensive operation like creating a new file involves buffering the directory entry update alongside the initial inode write, allowing both to be flushed atomically to the log without separate synchronous disk accesses, which reduces latency and enhances throughput in multi-user environments. This buffering layer, often implemented as an extension to the vnode interface, ensures that LFS presents a standard POSIX-like interface while internally optimizing for sequential appends.
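As a concrete illustration of the cost-benefit formula above, the following hedged Python sketch scores candidate segments from a hypothetical usage table and picks the most profitable one; the field names are assumptions for illustration, not part of any specific implementation:

    def cost_benefit_score(live_bytes, seg_bytes, age):
        """benefit/cost = (1 - u) * age / (1 + u), with u = live fraction."""
        u = live_bytes / seg_bytes
        if u >= 1.0:
            return 0.0                 # fully live: cleaning frees nothing
        return (1.0 - u) * age / (1.0 + u)

    def pick_segment(segments):
        # segments: entries from the usage table, each carrying live bytes,
        # total size, and time since last modification ("age").
        return max(segments,
                   key=lambda s: cost_benefit_score(s["live"], s["size"], s["age"]))

    segments = [
        {"id": 1, "live": 900_000, "size": 1_000_000, "age": 50},    # hot, mostly live
        {"id": 2, "live": 150_000, "size": 1_000_000, "age": 400},   # cold, mostly dead
    ]
    print(pick_segment(segments)["id"])   # -> 2: old, sparsely utilized segments win

The age term is what distinguishes this policy from greedy selection: a cold, moderately utilized segment can outrank a slightly emptier but hot one, because its live data is unlikely to be invalidated soon after being copied.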

Garbage Collection and Cleaning

In log-structured file systems (LFS), garbage collection, often referred to as segment cleaning, is the background process responsible for reclaiming space in log segments that contain a mix of live and invalid (dead) data blocks. The cleaner scans candidate segments to determine their live-to-dead ratios, typically using metadata like segment summary blocks to identify live blocks by file identifiers and offsets. Live data is then relocated and compacted into fewer new, clean segments, after which the original segments are erased and marked as available for reuse. This process is triggered proactively when the number of clean segments drops below a configurable threshold, such as a few tens of free segments, and continues until a higher threshold is met, such as 50-100 clean segments, to ensure steady-state operation without interrupting foreground writes.

The efficiency of cleaning hinges on cost models that quantify the overhead of relocation, as analyzed by Rosenblum in 1992. A key insight is write amplification: rewriting live data increases total I/O beyond the original writes. For a segment with utilization u (the fraction of live data), the write-cost multiplier is (1 + u) / (1 − u), reflecting the live data that must be read and rewritten for every unit of new data (worked through in the sketch below). Cleaning is therefore profitable only for segments with low utilization; at u = 1/3 the multiplier is exactly 2. Real implementations achieve effective write costs of 1.2-1.6 by selecting low-utilization segments, enabling 65-75% of disk bandwidth to be used for writing new data (with the remainder spent on cleaning). Building on segment usage tracking via summary blocks, these models enable predictive selection to minimize amplification.

Cleaning policies vary to balance overhead and effectiveness, with two primary approaches: age-based and cost-based. Age-based policies prioritize the oldest segments, segregating hot (frequently updated) and cold (static) data to reduce repeated rewrites of hotspots, such as active files that could otherwise amplify costs in localized workloads. Cost-based policies, like the cost-benefit heuristic, select segments by maximizing the score ((1 − u) × age) / (1 + u), favoring high-dead-ratio (low u) and aged segments to create a bimodal distribution—mostly full cold segments cleaned at around 75% utilization and sparse hot ones at around 15%—outperforming greedy selection (by utilization alone) by up to 50% in simulations. These policies handle hotspots by deprioritizing recently written hot data, preventing thrashing where high live ratios lead to inefficient cleaning cycles.

Practical challenges include the burstiness of cleaning I/O, which can spike during intensive reclamation and compete with user requests, and the risk of thrashing if live ratios remain high across segments, exacerbating write amplification. Mitigations involve running the cleaner in the background during idle periods to smooth bursts, and policy-driven selection to maintain a steady supply of low-cost segments, as demonstrated in production traces where 69% of cleaned segments were nearly empty, with average utilizations of 0.133-0.535.
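The cost model can be made tangible with a brief Python sketch that evaluates the multiplier quoted above for a few utilization levels; it is a worked illustration of the formula, not a profile of any real system:

    def write_cost(u):
        """Write-cost multiplier (1 + u) / (1 - u) for live fraction u."""
        assert 0.0 <= u < 1.0, "a fully live segment cannot be cleaned profitably"
        return (1.0 + u) / (1.0 - u)

    for u in (0.2, 1.0 / 3.0, 0.5, 0.8):
        print(f"u = {u:.2f}: cost = {write_cost(u):.2f}x")
    # u = 0.20: 1.50x   u = 0.33: 2.00x   u = 0.50: 3.00x   u = 0.80: 9.00x
    # Costs climb steeply with utilization, which is why the cleaner targets
    # sparse segments and why measured effective costs stay near 1.2-1.6.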

Advantages and Limitations

Performance Benefits

Log-structured file systems (LFS) provide significant performance improvements primarily through their sequential write mechanism, which batches modifications into large, contiguous log appends rather than scattering them across the disk as random updates. This approach dramatically increases write throughput for small-file operations, such as creates and deletes, by converting costly random I/O into efficient sequential transfers. In benchmarks from the Sprite LFS implementation, small-file (1 KB) creates and deletes achieved rates of approximately 160 files per second, compared to about 20 files per second in the Unix Fast File System (FFS), representing up to a 10-fold improvement. Similarly, random writes to large files reached approximately 700 KB/s in Sprite LFS, outperforming SunOS's 400 KB/s, while sequential writes reached 800 KB/s versus 600 KB/s.

By amortizing disk seeks through batching, LFS reduces the average seek time per operation, particularly benefiting metadata-intensive workloads like directory listings and file attribute updates. This is achieved by grouping related operations—such as multiple small appends or metadata changes—into single log segments, minimizing the number of head movements on mechanical disks. Production measurements of Sprite LFS demonstrated that it could utilize 65-75% of the disk's raw bandwidth for writing new data, in contrast to the 5-10% typical of traditional Unix file systems. Trace-driven simulations further illustrated this efficiency, showing LFS write costs as low as 1.2-1.6 (in terms of disk bandwidth consumed per unit of useful data written), compared to 10-20 for FFS, effectively yielding 6- to 16-fold improvements in write efficiency for production workloads.

Read performance in LFS remains generally comparable to traditional systems, supported by in-memory structures like the inode map that enable quick location of current file versions within the log. For small-file reads, Sprite LFS delivered about 180 files per second, slightly exceeding SunOS's 140 files per second. Sequential scans also benefit from the ordered structure, facilitating efficient traversal for applications requiring full-file or directory scans. LFS proves particularly suitable for environments characterized by frequent small appends and metadata operations, such as office or engineering workloads, where simulations indicated up to 70% overall write speedup relative to FFS.

Drawbacks and Challenges

One significant drawback of log-structured file systems (LFS) is the substantial cleaning overhead required to reclaim space occupied by invalid data. In steady-state operation, cleaning can consume 20-50% of the disk's bandwidth, as it involves reading partially filled segments, copying live data to new locations, and writing it back, often producing write bursts that cause stalls. For instance, simulations demonstrate that when segments contain only 30% live data, the write amplification factor reaches about 1.4x, meaning 1.4 units of data must be written to the disk for every unit of new user data (see the worked model below). This overhead arises from the core cleaning process, in which the system periodically compacts data to maintain free space for sequential writes.

LFS also suffers from space inefficiency due to the accumulation of invalid blocks until cleaning occurs, necessitating the reservation of extra disk space to avoid frequent or inefficient cleanups. To achieve acceptable performance, LFS typically requires 20-50% of the disk to remain free, as lower utilization reduces cleaning costs but leaves less usable capacity for user data. This inefficiency is particularly sensitive to workload patterns; for example, frequent rewrites of large files exacerbate fragmentation, increasing the proportion of invalid data and demanding more reserved space.

Recovery in LFS adds complexity, relying on periodic checkpoints to reconstruct the current inode map at mount time after a crash or unclean shutdown. This process scans the log from the last checkpoint, replaying updates to rebuild in-memory structures, which can take significant time—up to 132 seconds for a 50 MB log of small files—and risks data loss or inconsistency if corruption affects the checkpoint or log tail. Unclean shutdowns heighten this vulnerability, as partial writes may leave the system in an inconsistent state without atomic commit mechanisms beyond checkpoints.

Finally, LFS performance is highly dependent on underlying device characteristics. On traditional hard disk drives (HDDs), the random reads performed during segment cleaning can introduce seek overhead, and the design requires adaptations on zoned block devices, where writes must follow sequential zone constraints and segments must be managed to avoid zone overflows. On solid-state drives (SSDs), while sequential writes align well with flash behavior, the absence of TRIM support prevents the SSD controller from efficiently garbage-collecting invalid blocks, resulting in accelerated wear through unnecessary erases and writes.
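The 1.4x figure is consistent with a simple amplification model (one of several in the literature): reclaiming a segment whose live fraction is u frees (1 − u) of its space but must recopy the u that is still live, so each byte of new user data costs roughly 1 / (1 − u) bytes of device writes. A minimal Python check:

    def write_amplification(live_fraction):
        """Device bytes written per byte of new user data, assuming the
        freed space (1 - u) absorbs new data and the live u is recopied."""
        return 1.0 / (1.0 - live_fraction)

    print(round(write_amplification(0.30), 2))   # -> 1.43, i.e. the ~1.4x above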

Implementations and Applications

Early and Research Systems

The Sprite Log-structured File System (LFS), developed at the University of California, Berkeley, served as a pioneering prototype implementation in 1991 within the Sprite network operating system, a distributed environment running on Sun workstations. It employed a 4 KB block size and configurable segment sizes of 512 KB or 1 MB, with all modifications written sequentially to the log to optimize disk throughput. Dynamic segment allocation was handled via an in-memory segment map that tracked free space, complemented by a segment cleaner that copied live data from partially filled segments to reclaim space and maintain utilization above 70%. This design emphasized high write performance for small files and rapid crash recovery through checkpoints, though it required substantial memory for mapping structures.

In 1993, Margo Seltzer and colleagues implemented a port of LFS to 4.4BSD Unix, integrating it with the vnode interface for improved compatibility and robustness. This BSD-LFS retained core log-structured principles, including sequential writes and checkpoint-based recovery, and was later ported to derivatives like NetBSD, where it became functional again with the NetBSD 4.0 release in 2007. Used primarily for research, it demonstrated portability but required kernel modifications and was limited to specific architectures.

Early LFS systems like Sprite LFS and BSD-LFS were inherently tied to their host operating systems, complicating portability to other Unix variants or non-BSD kernels. Their experimental status often meant incomplete feature sets, such as limited support for legacy tools, and persistent issues with cleaner overhead at high disk utilization, restricting them to controlled research settings rather than broad adoption.

Modern and Commercial Uses

One prominent open-source implementation of log-structured file systems is the Flash-Friendly File System (F2FS), developed by Samsung and integrated into the Linux kernel starting with version 3.8 in 2012. F2FS is optimized for flash storage such as eMMC and SSDs, employing an append-only logging scheme with multi-head log structures that separate hot and cold data into distinct sections to minimize cleaning overhead. It incorporates adaptive cleaning algorithms that dynamically adjust garbage collection based on segment utilization and device characteristics, improving performance on flash media by reducing random writes.

ZFS, originally developed by Sun Microsystems in 2005 and now available in open-source distributions like OpenZFS, integrates log-structured elements through its copy-on-write (CoW) design, where all modifications append new blocks rather than overwriting existing ones. This approach ensures data immutability and enables efficient snapshots and clones, with the ZFS Intent Log (ZIL) serving as a dedicated log for synchronous writes to enhance reliability on storage pools spanning multiple devices.

In commercial environments, Apple's Apple File System (APFS), introduced in 2017 with macOS High Sierra, adopts LFS-like mechanisms including CoW for snapshots and cloning operations, allowing space-efficient versioning without duplicating data. APFS structures its container-based volumes to support these features natively on SSDs, facilitating capabilities like backups through immutable snapshots. The Google File System (GFS), deployed in 2003, employs a log-structured approach for metadata management via an operation log that records all critical changes, enabling fault-tolerant writes across distributed clusters. Its successor, Colossus, introduced around 2010 and evolved for petabyte-scale storage, extends this with replication and distributed file semantics to handle massive workloads in Google's data centers, supporting services such as Spanner. For embedded and cloud applications, Microsoft's Resilient File System (ReFS), available since Windows Server 2012, incorporates integrity streams that use checksums for metadata and optionally for file data, enabling it to detect and repair corruption in virtualized or storage-spaces configurations, particularly when integrity enforcement is enabled for files.

Key-value stores like LevelDB (developed by Google in 2011) and its derivative RocksDB (by Facebook in 2012) utilize log-structured merge-trees (LSM-trees), which organize data in immutable levels of sorted runs for write-optimized persistence in databases and storage systems. Recent advancements in the 2020s include hybrid LFS designs tailored for zoned storage, such as those leveraging the Zoned Namespaces (ZNS) standard ratified by NVM Express in 2020, which divides SSD capacity into sequential-write zones to reduce internal fragmentation and flash-management overhead. Systems like Z-LFS build on this by adapting log-structured allocation to ZNS constraints, achieving up to 33x performance gains on small-zone SSDs through zone-aware segment management and reduced host-side garbage collection.
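The LSM-tree pattern mentioned above can be sketched in a few lines of Python: writes land in an in-memory buffer (a memtable) and are periodically flushed as immutable sorted runs, with reads consulting newest data first. This is a hypothetical miniature; real LevelDB and RocksDB add write-ahead logs, compaction, bloom filters, and leveled organization:

    class TinyLSM:
        def __init__(self, flush_at=2):
            self.memtable = {}
            self.runs = []            # immutable sorted runs, newest first
            self.flush_at = flush_at

        def put(self, key, value):
            self.memtable[key] = value
            if len(self.memtable) >= self.flush_at:
                # Flush is a single sequential write of a sorted run.
                self.runs.insert(0, sorted(self.memtable.items()))
                self.memtable = {}

        def get(self, key):
            if key in self.memtable:
                return self.memtable[key]
            for run in self.runs:     # newest run shadows older ones
                for k, v in run:
                    if k == key:
                        return v
            return None

    db = TinyLSM()
    db.put("a", 1); db.put("b", 2)    # second put triggers a flush
    db.put("a", 3)                    # newer value shadows the flushed one
    assert db.get("a") == 3 and db.get("b") == 2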

Comparisons with Other File Systems

Versus Traditional Inode-Based Systems

Log-structured file systems (LFS) fundamentally differ from traditional inode-based systems, such as the Berkeley Fast File System (FFS) or ext4, in their approach to data and metadata updates. In LFS, all modifications—whether to file data, directories, or inodes—are appended sequentially to a log-like structure on disk, creating new versions of blocks rather than overwriting existing ones in place. This append-only model enables high write throughput by amortizing multiple small writes into large, sequential operations, achieving up to 65-75% of raw disk bandwidth for writes, compared to 5-10% in FFS, whose scattered in-place updates require multiple seeks per operation. However, this leads to higher space amplification in LFS, as obsolete data accumulates until garbage collection reclaims it, often operating effectively only at 85-90% disk utilization and consuming approximately 20% more space than FFS for equivalent workloads.

Metadata handling in LFS contrasts sharply with the scattered inode tables of traditional systems. LFS batches inodes and related metadata into the log alongside data, using an inode map (a separate structure pointing to the latest versions) to locate current inodes, which reduces random seeks during writes but requires additional indirection for reads. In contrast, FFS and ext4 maintain fixed inode tables distributed across cylinder groups or block groups, allowing direct access but necessitating separate I/O operations for updates, such as up to five seeks for a simple file creation. Without effective caching, LFS reads can thus incur higher latency due to the extra map lookups, though in practice it performs comparably to FFS for common access patterns.

Crash recovery mechanisms highlight another key divergence. LFS employs checkpointing to mark consistent log states, followed by rapid forward replay of post-checkpoint operations to reconstruct the in-memory maps, typically completing in under 1 to 132 seconds after a crash, depending on the amount of log to replay. Traditional inode-based systems like FFS rely on full-disk scans via tools such as fsck to detect and repair inconsistencies, a process that can take tens of minutes or more for large volumes, especially after write-heavy failures. This makes LFS particularly resilient for workloads with frequent small writes, localizing recovery to recent changes rather than rescanning the entire disk.

LFS is optimized for workloads dominated by small-write operations, such as appending to logs or creating many small files, where it delivers 1.5-10x faster write throughput than inode-based systems—for instance, 0.28 MB/sec for appends versus 0.11 MB/sec in FFS. Conversely, traditional systems like ext4 or UFS excel in random read/write patterns typical of databases or general-purpose use, where LFS's cleaning overhead can degrade throughput by up to 40% at high utilization, and read performance suffers without fragment support for small files under 8 KB. In benchmarks like the Andrew benchmark, LFS outperforms FFS in write-intensive phases (e.g., 1.30 seconds versus 3.30 seconds for file creation and writes) but trails in read-heavy scenarios.

Versus Other Journaling Approaches

Log-structured file systems (LFS) treat the entire disk as a single log, where both data and metadata modifications are written sequentially without in-place updates. In contrast, traditional journaling file systems such as ext3 primarily log metadata changes in a separate, fixed-size journal (typically 1-32 MB), while data is written to fixed locations on disk in ordered or writeback modes; full data journaling in ext3 logs both data and metadata but still separates the journal from the main structure. NTFS similarly focuses on metadata-only journaling, recording structural changes in its $LogFile while performing in-place writes. This unified logging in LFS promotes more completely sequential write patterns, enhancing performance on media that favor sequential access, but it widens the scope of garbage collection compared to the bounded journals of ext3 and NTFS.

Regarding overhead, LFS incurs write amplification from segment cleaning, with factors typically ranging from 1.2 to 1.6 times the user data volume under realistic workloads, though simulations indicate up to 2.5-3 times at higher utilization levels. Journaling systems like ext3 and NTFS maintain lower, more predictable overhead through their small, fixed journals—avoiding widespread rewriting—but data journaling in ext3 doubles writes by logging data temporarily before committing it to its final location. Despite this, LFS scales more favorably for small, random writes by aggregating them into sequential log appends, outperforming journaling approaches in such scenarios.

For crash consistency and recovery, both LFS and journaling file systems provide atomic update mechanisms to prevent partial updates after crashes. LFS ensures this via periodic checkpoints, followed by a straightforward replay of the log tail from the last checkpoint, which restores the file system state without needing transaction-specific rollbacks. Journaling systems like ext3 and NTFS rely on their logs for redo (replaying committed changes) or undo mechanisms for incomplete ones, scanning only the journal for quicker recovery in most cases. However, LFS's approach trades this simplicity against potential read amplification during normal operation, as the log structure requires indirect access to current file maps via in-memory or on-disk indices.

Modern hybrid systems like Btrfs and F2FS incorporate log-structured mechanisms, blending LFS's append-only updates with tree-based indexing to mitigate the fragmentation and cleaning overheads inherent in pure LFS designs. For instance, ext4's delayed allocation feature batches small writes before committing them to disk, akin to LFS's log batching for sequential efficiency, but within an inode-based framework rather than a full log.

References

  1. [1] "The Design and Implementation of a Log-Structured File System" — presents the log-structured technique for disk storage management, in which all modifications are written sequentially to a log.
  2. [2] "LogStructuredFilesystem" — notes on Rosenblum and Ousterhout's 1991 SOSP proposal, covering checkpoints, space recovery, and performance.
  3. [3] "Log-structured filesystems" — CS 4410 lecture notes, Summer 2015, on LFS advantages and disadvantages.
  4. [4] "Log-structured file systems: There's one in every SSD" — LWN.net, September 18, 2009.
  5. [5] "Log-structured File Systems" — cs.wisc.edu (PDF), discussing Rosenblum and Ousterhout's SOSP '91 paper.
  6. [6] "IPLFS: Log-Structured File System without Garbage Collection" — July 13, 2022 (PDF).
  7. [7] "An Implementation of a Log-Structured File System for Unix" — USENIX (PDF), Seltzer et al.
  8. [8] "Heuristic Cleaning Algorithms in Log-Structured File Systems" (PDF).
  9. [9] "Linux Log-structured Filesystem Project" — outflux.net.
  10. [10] "The HAMMER Filesystem" — Matthew Dillon, DragonFly BSD, June 21, 2008 (PDF).
  11. [11] "A peek behind Colossus, Google's file system" — Google Cloud Blog, April 19, 2021.
  12. [12] "F2FS: A New File System for Flash Storage" — USENIX, February 19, 2015 (PDF).
  13. [13] "The Design and Implementation of a Log-Structured File System" (PDF).
  14. [14] "LSM-based Storage Techniques: A Survey" — arXiv:1812.07527, December 18, 2018.
  15. [15] "The Design and Implementation of a Log-Structured File System" — July 24, 1991 (PDF).
  16. [16] "LFS — Log-Structured Filesystem for NetBSD" — hhhh.org, June 27, 2002.
  17. [17] "A Brief Retrospective on the Sprite Network Operating System."
  18. [18] "Flash-Friendly File System (F2FS)" — The Linux Kernel documentation.
  19. [19] "Apple File System Reference" — June 22, 2020 (PDF).
  20. [20] "The Google File System" (PDF).
  21. [21] "Resilient File System (ReFS) overview" — Microsoft Learn, July 28, 2025.
  22. [22] "NVMe Zoned Namespaces (ZNS) Command Set Specification" — NVM Express.
  23. [23] "Z-LFS: A Zoned Namespace-tailored Log-structured File System" (PDF).
  24. [24] "An Implementation of a Log-Structured File System for UNIX" — USENIX (PDF).
  25. [25] "File System Logging versus Clustering: A Performance Comparison" (PDF).
  26. [26] "Analysis and Evolution of Journaling File Systems."
  27. [27] "EXT3, Journaling Filesystem" — cs.wisc.edu, July 20, 2000 (PDF).
  28. [28] "Analyzing IO Amplification in Linux File Systems" — arXiv:1707.08514.
  29. [29] "Extents and Extent allocation in Ext4" — Oracle Blogs, October 15, 2024.