zswap
Zswap is a Linux kernel feature that serves as a lightweight compressed cache for swap pages, intercepting pages during the swapout process to compress them and store the results in a RAM-based pool rather than writing directly to a backing swap device.[1] Introduced in kernel version 3.11, it aims to mitigate the performance overhead of swapping by minimizing disk I/O, particularly beneficial for systems with limited RAM, overcommitted virtual machines, or solid-state drives where write amplification can degrade longevity.[1] By using compression algorithms such as LZO, LZ4, or Zstd, zswap can store multiple compressed pages in the space typically occupied by a single uncompressed page, effectively extending available memory.[1] The mechanism relies on the zsmalloc allocator to manage the compressed pool dynamically, mapping swap entries to compressed data through an indexed structure (red-black trees in earlier kernels, later replaced by XArrays) for efficient retrieval.[1] When the pool reaches its configured maximum size—typically a percentage of total RAM, such as 20%—zswap evicts the least recently used pages to the backing swap device on a write-back basis.[1] It also optimizes storage for same-filled pages, like zero pages, by avoiding full compression and instead using a compact representation to further enhance efficiency.[1] An optional shrinker mechanism allows proactive reclamation of cold compressed pages under memory pressure, integrating with the kernel's global reclaim process.[1] Configuration of zswap is flexible, supporting boot-time enabling via kernel parameters like zswap.enabled=1 or runtime toggling through sysfs interfaces in /sys/module/zswap/parameters/.[1] Users can select the compressor algorithm, set the maximum pool percentage with max_pool_percent, and adjust an acceptance threshold to control hysteresis in pool usage.[1] While zswap requires a backing swap device—unlike the related zram feature, which creates a standalone compressed block device—it complements traditional swapping by acting as a front-cache, and disabling it does not immediately evict stored pages, allowing gradual drainage.[1] This design makes zswap particularly suitable for environments seeking to balance memory efficiency with minimal configuration overhead.[1]
Introduction
Overview
Zswap is a Linux kernel feature, introduced in kernel version 3.11, that serves as a compressed write-back cache for swapped-out pages, storing compressed versions of these pages in a RAM-based memory pool to avoid or delay disk I/O operations.[1] It functions by intercepting pages during the swap-out process, attempting to compress them, and managing the compressed data within a dynamic pool allocated from available RAM.[1] The core purpose of zswap is to reduce swap I/O overhead by keeping frequently accessed swapped pages in compressed form in memory, thereby trading CPU cycles for potential I/O savings.[2] Zswap is built on the frontswap API, which provides a "transcendent memory" interface that allows swap pages to be handled by backend mechanisms before reaching the backing swap device.[2] In the basic workflow, when a page is slated for swapping, zswap attempts to compress it; if compression succeeds and space is available in the pool, the compressed page is stored there along with metadata mapping it to the original swap entry.[1] If compression fails or the pool is full, the page is written directly to the backing swap device as a fallback.[1] Pages stored in the pool can later be decompressed and faulted back into memory when accessed, with eviction occurring on an LRU basis to manage pool size.[1]
Benefits and Use Cases
Zswap provides significant performance benefits in memory-constrained environments by compressing swap pages in RAM, thereby minimizing disk I/O operations that would otherwise degrade system responsiveness. In systems with limited RAM, this approach mitigates the performance impact of swapping by keeping more inactive pages in a compressed form within memory rather than evicting them to slower storage devices. For overcommitted virtual machines, zswap reduces I/O pressure on shared backing storage, leading to lower latency and improved workload efficiency. Additionally, by decreasing the frequency of writes to swap devices, zswap extends the lifespan of SSD-based swap partitions, which are particularly sensitive to wear from repeated write cycles.[3]
The resource efficiency of zswap stems from its ability to compress inactive pages, allowing RAM to be used more effectively for active workloads while effectively expanding available memory capacity. Typical compression ratios range from 3.6:1 on x86 architectures to 4.3:1 on POWER systems, depending on data compressibility, which can increase effective memory by 2-4 times in practice. For instance, benchmarks using the SPECjbb2005 workload demonstrated up to 40% performance improvements on systems exceeding available memory, with even higher gains (up to 60%) when leveraging hardware-accelerated compression on POWER7+ processors. This compression enables systems to handle larger workloads without proportional increases in physical memory demands.[2]
Zswap is particularly well-suited for use cases involving memory-limited setups, such as desktops and laptops with 4-8 GB of RAM under multitasking loads, where it prevents thrashing by prioritizing compressed caching over disk swaps. In server environments with unpredictable workloads, it enhances stability by buffering bursts of memory pressure without immediate I/O spikes. Cloud virtual machines allocated minimal RAM benefit from reduced contention on shared storage, making zswap ideal for cost-optimized deployments. Systems relying on slow or expensive storage for swap, like remote or networked devices, also gain from deferred disk access, as zswap acts as an intermediary cache. A kernel build benchmark illustrated these advantages, showing a 53% reduction in runtime and 76% less I/O at high thread counts.[4]
While zswap offers these gains, it introduces trade-offs, primarily additional CPU overhead for compression and decompression operations, which can impact performance in CPU-bound scenarios if the I/O savings do not outweigh the cycles spent. However, in most memory-pressure situations, avoiding disk I/O, which is often orders of magnitude slower than RAM access, makes this trade-off favorable, especially on modern multi-core processors.[3]
Functionality
Compression and Storage Mechanism
Zswap compresses anonymous swap pages prior to storage in a dedicated RAM pool, aiming to reduce memory pressure by retaining frequently accessed pages in compressed form rather than writing them to backing swap devices. The compression process employs a user-selectable algorithm such as LZO, LZ4, or Zstd, with the default determined by the kernel configuration option CONFIG_ZSWAP_COMPRESSOR_DEFAULT.[3] This selection can be overridden at boot time via the zswap.compressor= kernel parameter or at runtime through the /sys/module/zswap/parameters/compressor sysfs interface.[3] During compression, each 4 KiB page is processed individually; pages are rejected if the resulting compressed size is greater than or equal to the original page size, deeming them incompressible.[5]
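Because zswap obtains its compressors from the kernel's crypto API, only algorithms registered with the running kernel can be selected. The following shell sketch, which assumes root privileges and that the /proc/crypto field layout matches mainline kernels, lists the available compression algorithms and then switches zswap to Zstd at runtime:

    # List compression algorithms registered with the crypto API;
    # zswap can only use compressors that appear here.
    awk '/^name/ {n = $3} /^type/ && ($3 == "scomp" || $3 == "compression") {print n}' /proc/crypto | sort -u

    # Switch the compressor at runtime (affects new pages only; existing
    # entries keep their original compression until evicted).
    echo zstd > /sys/module/zswap/parameters/compressor
    cat /sys/module/zswap/parameters/compressor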
For storage, zswap utilizes the zsmalloc allocator, which manages a slab-based pool of compressed objects in RAM without requiring preallocation of fixed-size blocks.[3] Zsmalloc supports variable-sized allocations efficiently by packing multiple smaller compressed pages into larger zpages, minimizing internal fragmentation while providing handles rather than direct pointers for referencing stored data.[3] The pool's size is dynamically adjustable up to a percentage of total RAM, capped by the max_pool_percent parameter (default 20%), ensuring it does not consume excessive system memory.[3]
To facilitate quick lookups and management, zswap maintains per-swap-type XArrays that map swap entries—identified by their offset within the swap device—to the corresponding zsmalloc handles.[5] These indexed array structures enable efficient insertion, deletion, and retrieval operations. When a compressed page is stored, its swap entry is inserted into the XArray; upon a subsequent page fault, the handle is retrieved to decompress and restore the original page into the process's address space.[3]
Zswap includes optimizations for special page patterns through the same_filled_pages_enabled option (enabled by default), which detects pages filled with identical bytes—such as zero-filled pages common in sparse allocations—and stores only a representative pattern without invoking the full compression algorithm.[3] In this case, the compressed length is recorded as zero, and the pattern (e.g., all zeros) is associated with the handle, significantly reducing storage overhead for such pages while simplifying retrieval.[3]
If the zsmalloc pool reaches its capacity limit, zswap rejects the store operation, allowing the page to proceed directly to the backing swap device via the standard swap subsystem.[3] Rejection due to pool fullness is governed by an acceptance threshold (default 90% via accept_threshold_percent), introducing hysteresis to prevent thrashing between acceptance and rejection states as pool usage fluctuates.[5]
Eviction and Integration with Swap
Zswap integrates with the Linux kernel's swap subsystem by intercepting pages during the swap-out process, attempting to compress them into its in-memory pool before they reach the backing swap device. This integration allows zswap to serve as a cache layer, reducing direct I/O to slower storage devices. When the compressed pool fills to its configured limit, zswap begins evicting pages to maintain space.[1] The eviction policy in zswap is based on a least recently used (LRU) algorithm, targeting the oldest compressed pages in the pool for removal when the pool reaches its maximum size, defined by the max_pool_percent parameter. These evicted pages are decompressed and then written back to the backing swap device, ensuring that the original uncompressed data is preserved on disk. This process helps manage memory pressure by proactively freeing pool space without immediately rejecting new incoming pages.[1]
To support fine-grained control, zswap allows disabling writeback on a per-cgroup basis, preventing evicted pages from being sent to swap for specific memory control groups. Administrators can achieve this by setting the memory.zswap.writeback file to 0 within a cgroup's memory subsystem directory, such as echo 0 > /sys/fs/cgroup/<cgroup-name>/memory.zswap.writeback. This feature is useful in environments where certain workloads should avoid swap I/O entirely, though it risks pool exhaustion if not monitored.[1]
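As a concrete illustration, the following sketch pins a workload's compressed pages in RAM. It assumes a cgroup v2 hierarchy mounted at /sys/fs/cgroup, a kernel recent enough to expose memory.zswap.writeback, and a hypothetical group name:

    # Create a control group for a latency-sensitive workload
    # (the name "latency-critical" is illustrative).
    mkdir /sys/fs/cgroup/latency-critical

    # Disable zswap writeback: evicted pages for this group will not
    # be written to the backing swap device.
    echo 0 > /sys/fs/cgroup/latency-critical/memory.zswap.writeback

    # Move the current shell (and its children) into the group.
    echo $$ > /sys/fs/cgroup/latency-critical/cgroup.procs

As noted above, such groups should be monitored, since their compressed pages can only occupy pool space and never drain to disk.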
Under broader system memory pressure, zswap can optionally employ a shrinker mechanism for proactive reclamation. When enabled via the shrinker_enabled parameter (e.g., echo Y > /sys/module/zswap/parameters/shrinker_enabled), the shrinker scans the pool for cold pages—those unlikely to be accessed soon—and evicts them to the swap device ahead of full pool limits. This is disabled by default, although kernels built with the CONFIG_ZSWAP_SHRINKER_DEFAULT_ON configuration option enable it from boot; the shrinker aids overall memory management by integrating with the kernel's global reclaim paths.[1]
To prevent thrashing during repeated pool overflows, zswap implements hysteresis through the accept_threshold_percent parameter. Once the pool exceeds its limit and begins rejecting new pages, zswap will not resume accepting them until the pool usage drops below this threshold (default 90% of the maximum). For instance, setting echo 80 > /sys/module/zswap/parameters/accept_threshold_percent ensures a buffer zone, stabilizing acceptance behavior; a value of 100 disables this hysteresis entirely. This mechanism balances pool utilization and system responsiveness during fluctuating memory demands.[1]
Configuration
Boot-Time Parameters
Zswap supports kernel command-line parameters for configuration at boot time, providing initial settings. These are useful for systems where zswap is compiled in or loaded as a module. The zswap.enabled parameter toggles zswap's availability: setting it to 1 enables zswap, while 0 disables it; the default depends on whether the kernel was built with CONFIG_ZSWAP_DEFAULT_ON=y.[1]
The zswap.compressor parameter specifies the default compression algorithm, such as lzo, lz4, or zstd, overriding the build-time default from CONFIG_ZSWAP_COMPRESSOR_DEFAULT. Supported algorithms must be compiled into the kernel, with lz4 often preferred for its balance of speed and compression ratio.[1]
The zswap.zpool parameter selects the zpool implementation for the compressed pool, such as zsmalloc (the default in recent kernels), zbud, or others if compiled. For example, zswap.zpool=zsmalloc ensures use of the more efficient allocator.[1]
The shrinker can be enabled at boot if CONFIG_ZSWAP_SHRINKER_DEFAULT_ON=y is set in the kernel configuration.[1]
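On GRUB-based systems, these boot-time settings are typically added to the kernel command line. The sketch below uses illustrative values rather than recommendations and assumes the common /etc/default/grub layout; the regeneration command varies by distribution (e.g., update-grub or grub2-mkconfig):

    # /etc/default/grub -- illustrative zswap boot configuration
    GRUB_CMDLINE_LINUX="zswap.enabled=1 zswap.compressor=zstd zswap.zpool=zsmalloc zswap.max_pool_percent=25"

    # Regenerate the bootloader configuration, then reboot.
    grub-mkconfig -o /boot/grub/grub.cfg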
Runtime Tuning and Controls
Zswap supports dynamic adjustments after boot through sysfs interfaces in /sys/module/zswap/parameters/. These allow enabling/disabling, changing algorithms, adjusting limits, and tuning options without kernel reload.[1]
To enable zswap at runtime, write 1 to enabled (e.g., echo 1 > /sys/module/zswap/parameters/enabled). Writing 0 disables it; existing pages remain in the pool until invalidated, faulted in, or evicted. To force eviction, use swapoff on the swap devices.[1]
The compression algorithm can be changed by writing to compressor (e.g., echo lz4 > /sys/module/zswap/parameters/compressor). New pages use the new compressor; existing ones retain the original until evicted. Supported options depend on kernel configuration and zpool backends like zsmalloc.[1]
The max_pool_percent parameter (default: 20) sets the maximum pool size as a percentage of total RAM (e.g., echo 30 > /sys/module/zswap/parameters/max_pool_percent). This balances memory usage and compression benefits.[1]
The same_filled_pages_enabled parameter (default: Y) optimizes storage for identical-value pages like zero pages without compression (e.g., echo 0 > /sys/module/zswap/parameters/same_filled_pages_enabled to disable).[1]
The shrinker_enabled parameter (default: N, unless CONFIG_ZSWAP_SHRINKER_DEFAULT_ON=y) enables proactive cold page reclamation under memory pressure (e.g., echo Y > /sys/module/zswap/parameters/shrinker_enabled).[1]
The accept_threshold_percent parameter (default: 90) sets hysteresis for accepting pages into a full pool (e.g., echo 80 > /sys/module/zswap/parameters/accept_threshold_percent). Values below 100 prevent thrashing by refusing pages until usage drops below the threshold; 100 disables hysteresis.[1]
These options enable tuning for specific workloads.[1]
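Taken together, these controls can be applied in one pass. The following sketch (run as root, with sysfs mounted at /sys, and with illustrative values rather than recommendations) sets a runtime tuning profile:

    #!/bin/sh
    # Apply a zswap runtime tuning profile (run as root).
    Z=/sys/module/zswap/parameters

    echo 1    > "$Z/enabled"                  # ensure zswap is active
    echo zstd > "$Z/compressor"               # new pages compressed with zstd
    echo 30   > "$Z/max_pool_percent"         # pool may use up to 30% of RAM
    echo 80   > "$Z/accept_threshold_percent" # resume accepting below 80% usage

    # Changing the compressor does not recompress existing entries; to
    # drain the pool entirely, cycle the swap devices:
    #   swapoff -a && swapon -a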
Monitoring
Sysfs Interfaces
The sysfs interfaces for zswap are provided under the /sys/module/zswap/parameters/ directory, allowing runtime configuration and querying of core zswap settings. These parameters control global behavior, as zswap operates system-wide without per-cgroup sysfs controls in its base implementation; however, writeback to backing swap can be disabled on a per-cgroup basis using the cgroup v2 interface at /sys/fs/cgroup/<cgroup-name>/memory.zswap.writeback by writing 0 to disable it.[1]
The enabled parameter toggles zswap on or off by writing 1 or 0, respectively; for example, echo 1 > /sys/module/zswap/parameters/enabled activates it at runtime, assuming sysfs is mounted at /sys. When disabled, any pages already in the compressed pool remain until evicted or invalidated, but no new pages are accepted.[1]
The compressor parameter specifies the compression algorithm in use, such as lzo, lz4, or zstd, and supports runtime changes via writes to the file; the default is set by the kernel configuration CONFIG_ZSWAP_COMPRESSOR_DEFAULT, but alterations do not recompress existing pages.[1]
The max_pool_percent parameter limits the compressed pool size as a percentage of total system RAM (default 20%), influencing when zswap begins evicting pages to backing swap; reading this file returns the current limit value.[1]
Zswap optimizes storage for same-filled pages, like zero pages, by using a compact representation with a compressed length of zero, avoiding unnecessary compression overhead; older kernels exposed this behavior through the same_filled_pages_enabled parameter, while recent kernels apply it unconditionally.[1]
The accept_threshold_percent parameter sets a hysteresis threshold (default 90%) below the pool limit, at which zswap resumes accepting new pages after reaching capacity; writing 100 disables this mechanism entirely.[1]
The shrinker_enabled parameter activates the pool shrinker (default off), which evicts cold pages under memory pressure to reclaim RAM; it can be toggled at runtime with Y or N.[1]
These interfaces enable monitoring of basic zswap state during operation; for instance, repeatedly reading /sys/module/zswap/parameters/max_pool_percent and /sys/module/zswap/parameters/enabled in a script can track whether the pool limit is approached and zswap remains active under load. Detailed counters for actual pool usage, such as stored pages or total bytes, are available via debugfs.[1]
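A minimal polling script along those lines (assuming sysfs at /sys and a POSIX shell) might look like this:

    #!/bin/sh
    # Log basic zswap state once per minute.
    Z=/sys/module/zswap/parameters
    while true; do
        printf '%s enabled=%s compressor=%s max_pool_percent=%s\n' \
            "$(date +%FT%T)" \
            "$(cat "$Z/enabled")" \
            "$(cat "$Z/compressor")" \
            "$(cat "$Z/max_pool_percent")"
        sleep 60
    done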
Debugfs Statistics
The debugfs interface for zswap provides detailed runtime statistics on pool usage, compression outcomes, and rejection events, accessible under the path /sys/kernel/debug/zswap/ once debugfs is mounted.[1][6] To enable access, the debugfs filesystem must be mounted with the command mount -t debugfs none /sys/kernel/debug, assuming the kernel was compiled with CONFIG_DEBUG_FS enabled. This interface exposes multiple read-only files, each containing a single 64-bit integer value representing cumulative or current metrics for zswap operations.
Key statistics include pool_total_size, which reports the total size of the compressed pool in bytes, calculated as the product of the number of pool pages and the page size.[6] The stored_pages file tracks the current number of compressed pages held in the pool, offering insight into storage utilization and compression efficiency.[6] Similarly, written_back_pages counts the number of pages evicted from the pool and written to backing swap storage, typically due to the pool reaching the maximum capacity defined by max_pool_percent.[6][1]
Rejection counters provide diagnostics on failed store attempts: reject_alloc_fail increments when the underlying buddy allocator cannot provide sufficient memory for a compressed page; reject_kmemcache_fail records rare failures to allocate metadata for pool entries; and reject_reclaim_fail tallies stores rejected after unsuccessful reclaim attempts when the pool is full.[6] Additional counters like reject_compress_fail (compression algorithm errors) and reject_compress_poor (pages compressed to sizes unsuitable for efficient storage) highlight potential issues with the chosen compressor.[6] The decompress_fail counter tracks failures in decompressing pages during load or writeback operations. The pool_limit_hit metric counts instances where the pool limit was reached, triggering potential writebacks.[6]
These statistics aid in troubleshooting zswap performance; for example, elevated reject_alloc_fail values suggest high memory pressure constraining the pool allocator, while frequent increments in written_back_pages indicate regular evictions and reliance on disk swap.[1] Administrators can monitor these files periodically or via scripts to assess compression success rates, such as by comparing stored_pages against total swap pressure, without altering runtime behavior.[6] For broader control parameters, sysfs interfaces offer complementary tuning options.[1]
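For example, the effective compression ratio can be estimated from these counters. The sketch below assumes debugfs is mounted at /sys/kernel/debug and a 4 KiB page size:

    #!/bin/sh
    # Estimate zswap's effective compression ratio from debugfs counters.
    D=/sys/kernel/debug/zswap

    stored=$(cat "$D/stored_pages")     # pages currently held compressed
    bytes=$(cat "$D/pool_total_size")   # RAM actually consumed by the pool

    # Uncompressed size of the cached pages divided by pool RAM in use.
    awk -v s="$stored" -v b="$bytes" \
        'BEGIN { if (b > 0) printf "compression ratio: %.2f:1\n", s * 4096 / b }'

    echo "written_back_pages: $(cat "$D/written_back_pages")"
    echo "pool_limit_hit:     $(cat "$D/pool_limit_hit")"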
History and Development
Introduction and Early Versions
zswap is a Linux kernel feature developed as part of the kernel's memory management subsystem to provide a compressed cache for swapped-out pages, aiming to mitigate the performance penalties associated with traditional swapping to disk. It was first merged into the mainline kernel in version 3.11, released on September 2, 2013, by primary developer Seth Jennings of IBM.[7][8] The implementation addressed the need for efficient memory compression in environments with constrained RAM, where swap operations could become significant bottlenecks due to I/O latency on storage devices. The initial design of zswap was built upon the frontswap API, which had been introduced in kernel version 3.1 in October 2011 to enable "transcendent memory" interfaces for swap pages without requiring modifications to the core swap subsystem.[9][10] Unlike device-based compression solutions that necessitate dedicated RAM-backed block devices, zswap provided a lightweight, in-kernel mechanism for compressing pages directly into a RAM pool, deferring or avoiding writes to the backing swap device. This approach was motivated by the desire to reduce swap I/O in RAM-limited systems, such as embedded devices or virtualized guests under memory pressure, where even modest compression ratios could yield substantial performance gains by trading CPU cycles for fewer disk accesses.[11] At launch, zswap supported only the LZO compression algorithm, chosen for its balance of speed and compression efficiency suitable for real-time swap operations.[12] Pages destined for swapping were compressed and stored in a dynamic pool managed by the zsmalloc allocator, which was also introduced in kernel 3.1 and optimized for handling variably sized, compressed objects with minimal fragmentation.[13][14] If compression succeeded, the page was cached; failures resulted in simple rejection, allowing the page to proceed to standard swap-out without further intervention. Eviction from the pool followed a least recently used (LRU) policy, writing compressed pages back to the swap device when the pool reached its configured size limit, ensuring the cache remained bounded and did not exacerbate memory pressure.[12] These features established zswap as a simple yet effective cache, with early benchmarks showing up to 76% reduction in swap I/O under heavy load.[12]
Recent Enhancements
Since the mid-2010s, zswap has seen several key enhancements aimed at improving compression efficiency, pool management, and integration with modern Linux kernel features. In kernel version 3.11 (released in September 2013), support for the LZ4 compressor was added alongside the initial LZO support, providing a faster alternative for scenarios prioritizing speed over compression ratio.[1] Later, in kernel 4.19 (October 2018), Zstd compressor support was introduced, offering superior compression ratios at reasonable speeds, further expanding zswap's configurability for diverse workloads. These compressors can be specified at boot time using the zswap.compressor parameter or adjusted via sysfs at runtime.[1]
In kernel 4.20 (December 2018), a shrinker mechanism was added to zswap, enabling proactive reclamation of cold (infrequently accessed) pages from the compressed pool under memory pressure. This feature helps mitigate pool bloat by evicting stale entries to backing swap storage, reducing overall memory footprint without waiting for the pool to fill completely. The shrinker is disabled by default but can be enabled via the zswap.shrinker_enabled sysfs parameter or boot-time configuration.[1] Building on this, post-kernel 5.0 releases introduced hysteresis controls, such as the accept_threshold_percent parameter, which allows zswap to reject new pages into the pool until it shrinks below a specified percentage of capacity. This prevents thrashing between acceptance and eviction, improving stability in high-pressure scenarios.[1]
Cgroup integration advanced with the addition of per-cgroup accounting and limits for zswap, followed by per-cgroup controls for disabling writeback, allowing administrators to prevent specific workloads or containers from evicting compressed pages to disk. This is useful for prioritizing latency-sensitive tasks by keeping their swap data in RAM. Further refinements in the kernel 6.x series, including 6.8 (March 2024), added the ability to force cold page eviction under tight memory conditions and a mode to disable writeback entirely.[15]
In the 2020s, zswap benefited from tighter integration with zsmalloc, the default allocator, including optimizations for handling same-filled pages to reduce duplication and improve density. Ongoing developments through kernel 6.17 (September 2025) include a second-chance eviction algorithm in 6.12 for dynamic pool sizing, compression batching with hardware accelerator support like IAA for high-throughput systems, and node-preferred allocation policies in zsmalloc to enhance NUMA awareness and reduce remote memory access. These updates, including reduced write amplification for SSD longevity and better VM overcommit support, emphasize scalability and efficiency in cloud and edge environments.[16][17][1]
Comparisons
zswap vs. zram
Zswap and zram are both Linux kernel mechanisms for compressed in-memory swap, but they differ fundamentally in architecture. Zswap operates as a lightweight cache layered on top of an existing backing swap device, such as a disk partition or file, intercepting pages during the swap-out process and compressing them into a RAM-based pool using the zsmalloc allocator. If the pool reaches its limit, the least recently used compressed pages are written back to the underlying swap device under a write-back policy. In contrast, zram functions as a standalone compressed block device that resides entirely in RAM, acting as a self-contained swap space without requiring any backing storage; pages swapped to zram are compressed and stored directly in the allocated memory, with no eviction to disk.[1][18]
These architectural distinctions lead to notable performance trade-offs. Zswap is particularly effective for large or unpredictable workloads on systems with fast backing storage like SSDs, as its dynamic sizing allows the cache to grow or shrink based on memory pressure, reducing I/O latency by keeping hot pages in compressed RAM while offloading cold ones to disk. Zram, however, offers simpler operation for fixed-size, small-scale swap needs on slower or absent storage, providing very fast in-memory I/O since all operations remain in RAM, though it may incur higher overall CPU overhead due to consistent compression and decompression without the option for disk fallback.[1][18]
In terms of resource utilization, zswap employs a dynamic RAM pool limited by the max_pool_percent parameter (defaulting to 20% of total system memory), enabling flexible allocation that avoids overcommitting RAM, with an indexed structure (an XArray in recent kernels) used for efficient page tracking. Zram, by comparison, requires a fixed device size to be declared via the disksize parameter, which bounds its compressed storage, typically achieving a 2:1 compression ratio but dedicating a static upper budget upfront without the adaptability of zswap's pool.[1][18]
Choosing between zswap and zram depends on system constraints: zswap suits environments with an existing swap device on disk (e.g., SSD-backed systems) to extend effective memory capacity without fully replacing traditional swapping, while zram is preferable for diskless setups, embedded devices, or scenarios lacking persistent storage where a dedicated in-RAM swap is needed. While zswap and zram can be used together (e.g., with zram as the backing swap device), doing so is generally discouraged due to overlapping compression roles that lead to inefficiency from double compression.[1][18]
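To check which mechanism a given system is actually using, a few standard commands suffice (a sketch; zramctl ships with util-linux and is absent if zram is not packaged):

    # Is zswap enabled? (Y/N; the file is absent if zswap is not built in.)
    cat /sys/module/zswap/parameters/enabled

    # What backs the swap space? A /dev/zramN entry indicates zram;
    # a partition or file is a conventional device that zswap can front.
    swapon --show

    # Inspect configured zram devices, if any.
    zramctl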