Page replacement algorithm
A page replacement algorithm is a method employed by operating systems in virtual memory management to determine which page residing in physical memory should be evicted to disk when a page fault occurs and all memory frames are occupied, thereby minimizing the number of page faults and optimizing system performance.[1] These algorithms are essential for maintaining the illusion of a large contiguous memory space provided to processes despite limited physical RAM, relying on the principle of locality of reference, whereby programs tend to access a small subset of pages frequently.[2]
The concept of page replacement originated with the invention of virtual memory on the Atlas computer at the University of Manchester, where the first such algorithm was implemented in 1962 as part of a "learning program" that estimated page usage cycles to replace the least-used page.[3] This innovation, patented under GB976633, automated the transfer of data between fast core memory and slower drum storage, eliminating the need for manual overlay management.[3] Over time, various algorithms have been developed to approximate optimal replacement, with the ideal Optimal (OPT or MIN) algorithm—replacing the page that will not be referenced for the longest time in the future—serving as a theoretical benchmark, though it is unrealizable without knowledge of future accesses.[1][2]
Common practical algorithms include First-In, First-Out (FIFO), which evicts the oldest page in memory regardless of usage; it is simple to implement but susceptible to Belady's anomaly, where increasing memory frames can paradoxically increase page faults.[1] Least Recently Used (LRU) approximates optimality by replacing the page unused for the longest recent period, leveraging hardware counters or stacks for tracking, and performs well under locality assumptions but can be computationally expensive.[2][1] Other variants, such as the Clock (Second-Chance) algorithm, use a circular list and an accessed bit to give recently used pages a "second chance" before eviction, providing an efficient approximation of LRU with minimal overhead.[1] Additional approaches include Least Frequently Used (LFU), which prioritizes eviction based on access counts (with decay to avoid stagnation), and Random replacement, which avoids pathological behaviors through sheer simplicity, though neither always minimizes faults.[1]
Page replacement strategies significantly impact system throughput, as poor choices can lead to thrashing—excessive paging that degrades performance—and modern operating systems often combine these with frame allocation policies to balance multiprogramming levels.[2] Research continues to refine these algorithms for emerging hardware like non-volatile memory, emphasizing adaptability to access patterns in databases and high-performance computing.[4]
Historical Context
Early Developments in Paging
The paging mechanism was first introduced in the Atlas computer, developed by Ferranti in collaboration with the University of Manchester and operational from December 1962.[3] This system pioneered virtual memory by treating core memory and magnetic drum storage as a unified "one-level store," subdivided into fixed-size pages of 512 words each to facilitate efficient address mapping via page address registers.[5] Demand paging, a core innovation, loaded pages into physical memory only upon access, triggered by page faults detected by hardware, thereby allowing programs larger than available core to execute without manual intervention.[3]
In early multiprogrammed systems like Atlas, page replacement played a critical role in managing page faults, where the operating system supervisor automatically selected pages for eviction from core to drum when no free frames were available.[5] Initial implementations relied on fixed allocation of page frames to individual processes, which restricted the degree of multiprogramming and aimed to minimize interference between concurrent jobs by assigning static memory quotas at load time.[6] This approach ensured predictable performance in batch processing environments but proved inflexible for varying workloads, as exceeding allocated frames led to immediate job termination rather than dynamic adjustment.[7]
One specific example of early limitations occurred in the Manchester Mark 1 computer, a precursor to Atlas completed in 1949, which lacked any automatic paging or replacement mechanism, forcing programmers to manually manage overlays by explicitly coding transfers between main memory and auxiliary storage like drums. Such manual intervention was labor-intensive and error-prone, often requiring recompilation for large programs that exceeded the limited 1K-word Williams tube memory.
Systems like Multics, developed starting in 1965 by MIT, General Electric, and Bell Labs, encountered initial challenges with thrashing in the late 1960s, where aggressive multiprogramming caused excessive page faults and swapping overhead, drastically reducing CPU utilization to near zero as the system oscillated between memory operations. This phenomenon arose from fixed or poorly tuned allocation strategies that overcommitted physical memory, prompting the development of dynamic replacement techniques to monitor and limit active memory usage per process.[7]
Key Contributions and Evolution
The foundational work on page replacement algorithms began with Laszlo A. Belady's 1966 paper, which introduced stack algorithms and formalized the optimal replacement strategy, known as Belady's MIN, that evicts the page referenced furthest in the future to minimize faults.[8] Belady's analysis, conducted at IBM, also identified the counterintuitive Belady's anomaly, where increasing the number of page frames can lead to more page faults under certain reference patterns, challenging assumptions about memory allocation.[9]
Building on this, Peter J. Denning's 1968 paper proposed the working set model, which defined a process's active memory as the set of pages recently referenced within a time window, influencing replacement policies to retain these pages and prevent thrashing in multiprogrammed systems.[10] Denning's model, developed during his time at Princeton University, provided a theoretical basis for demand-paging systems and emphasized locality of reference, shaping subsequent hardware and software designs at institutions like Bell Labs.
In the 1970s, hardware limitations—such as the high cost and complexity of maintaining full access histories—led to approximations of the ideal Least Recently Used (LRU) algorithm, which evicts the least recently accessed page but requires timestamp tracking impractical for large memories.[11] These approximations, including reference bit-based counters, emerged as feasible alternatives in early virtual memory implementations, balancing performance with resource constraints.
During the 1980s and 1990s, page replacement evolved in Unix-like systems through refinements like the clock algorithm, an LRU approximation using a circular list and reference bits to simulate aging without full timestamps, as implemented in Berkeley Software Distribution (BSD) variants.[12] System V Release 4 (SVR4), developed at Bell Labs, further advanced this with the two-handed clock algorithm, incorporating priority bits to distinguish frequently used pages and reduce write traffic.[13] Aging techniques, which decrement counters for unreferenced pages to approximate recency, also gained adoption in Unix derivatives for efficient LRU simulation under hardware constraints.
By the 2000s, the rise of solid-state drives (SSDs) with asymmetric read/write costs and limited write endurance prompted hybrid algorithms tailored to flash memory, such as Clean-First LRU (CFLRU), which prioritizes evicting clean pages to minimize writes while preserving hit ratios.[14] These developments, building on IBM and Bell Labs foundations, addressed modern storage hierarchies by integrating device-specific optimizations into traditional policies.
Fundamental Concepts
Local versus Global Replacement
In page replacement algorithms, the scope of replacement—local or global—determines how physical memory frames are selected for eviction when a page fault occurs and no free frames are available. Local replacement confines the selection of victim pages to the set of frames allocated specifically to the faulting process, ensuring that only that process's pages are considered for replacement.[15] This approach maintains isolation between processes, as one process cannot evict pages belonging to another. In contrast, global replacement considers all frames in the system, regardless of which process owns them, allowing the algorithm to select any victim page system-wide.[15]
Local replacement offers several advantages, including fairness and protection against interference: a resource-intensive process cannot degrade the performance of others by monopolizing frame replacements, thereby preventing scenarios where one process starves the system of usable memory.[15] However, it has drawbacks, such as potential underutilization of physical memory; if a process's allocated frames exceed its active working set, idle frames remain unused while the system as a whole suffers from fragmentation or insufficient total allocation.[15] Global replacement, by accessing the entire frame pool, optimizes overall memory utilization and can reduce total page faults across all processes, as it dynamically balances load.[15] Its primary disadvantage is the risk of unfairness: a single thrashing process may repeatedly evict useful pages from well-behaved processes, leading to system-wide performance degradation.[15]
Local replacement is particularly suited to fixed-partition systems or those emphasizing process isolation, such as early implementations inspired by the working set model, where each process receives a predetermined number of frames based on its estimated memory needs to avoid thrashing. This model, proposed by Peter Denning, ensures that a process only executes if its working set fits within its allocation, with replacements confined locally to maintain predictable fault rates. Global replacement predominates in demand-paging operating systems like modern Unix variants, including Linux, where memory is allocated on demand without fixed per-process limits, allowing flexible sharing of the entire physical memory pool to maximize throughput in multiprogrammed environments.[16]
To illustrate, consider a multiprogrammed system with two processes sharing 8 frames total, where Process A is allocated 4 frames locally and Process B the remaining 4. If Process A experiences a page fault under local replacement, it must evict one of its own pages, preserving Process B's frames and avoiding cross-process interference. Under global replacement, the fault in Process A could evict a page from Process B's frames if deemed least useful system-wide, potentially increasing faults for Process B but possibly reducing overall system faults by better utilizing idle frames in Process B. This highlights how local policies prioritize equity, while global ones favor efficiency, though the latter requires mechanisms like priority scheduling to mitigate unfairness.[15] In paging systems, these scopes are invoked during page fault handling to maintain virtual memory illusion without halting the entire system.[15]
Detecting Page References and Modifications
In virtual memory systems, detecting page references and modifications is essential for efficient page replacement, as it allows the operating system to identify frequently accessed pages and those requiring write-back to secondary storage. The reference bit, also known as the accessed bit, is a hardware-provided flag in the page table entry (PTE) that is automatically set to 1 by the memory management unit (MMU) whenever a page is read or written, indicating recent usage.[17] The operating system periodically samples and clears these bits—typically to 0—during page replacement decisions to approximate recency of access, enabling algorithms to prioritize evicting unreferenced pages.[17]
The dirty bit, or modified bit, complements reference tracking by signaling whether a page's contents have been altered since it was loaded into physical memory. Set to 1 by hardware on the first write to the page, this bit in the PTE informs the OS that the page must be written back to disk upon eviction to preserve data integrity, avoiding unnecessary I/O for unmodified pages.[18] In the x86 architecture, both the accessed (reference) bit at position 5 and the dirty bit at position 6 have been standard features of PTEs since the introduction of the Intel 80386 processor in 1985, supporting robust virtual memory management.[19]
When hardware support for reference bits is unavailable or insufficient, operating systems employ software approximations using page tables augmented with timestamps or counters to track access patterns. For instance, on access, the OS updates a software-maintained counter or timestamp in the PTE, simulating hardware bit behavior through trap handling, though this incurs higher overhead than native MMU support.[20] In systems like VAX/VMS, such software simulation of reference bits has been shown to consume minimal processor resources, around 0.05% of CPU time.[20]
These detection mechanisms introduce challenges, particularly the overhead associated with bit clearing and sampling, which requires the OS to traverse page tables periodically. To manage this, systems often leverage timer interrupts to reset reference bits at fixed intervals, such as every few seconds, balancing accuracy with performance; however, frequent clearing can lead to increased context switches and memory access latency if not optimized.[17] In local replacement strategies, accurate detection helps confine decisions to a process's page set, though global approaches may aggregate bits across the system for broader efficiency.[17]
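A schematic sketch of such a periodic sampling pass is shown below, assuming software-visible PTE words with the accessed and dirty flags at the x86 bit positions given above; the pageTable mapping and function name are illustrative rather than any particular kernel's interface.

ACCESSED_BIT = 1 << 5     # x86 PTE accessed flag (bit 5), per the layout described above
DIRTY_BIT    = 1 << 6     # x86 PTE dirty flag (bit 6)

def samplePageTable(pageTable):
    # pageTable: dict of page -> PTE word. Returns (recently referenced, dirty) page sets.
    referenced, dirty = set(), set()
    for page, pte in pageTable.items():
        if pte & ACCESSED_BIT:
            referenced.add(page)
            pageTable[page] = pte & ~ACCESSED_BIT   # clear for the next sampling interval
        if pte & DIRTY_BIT:
            dirty.add(page)                          # stays set until the page is written back
    return referenced, dirty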
Precleaning Mechanisms
Precleaning mechanisms in page replacement involve the asynchronous writing of modified (dirty) pages back to secondary storage before they are selected for eviction, thereby mitigating the latency associated with synchronous I/O operations during actual page faults.[21] This approach contrasts with demand cleaning, where dirty pages are written out only upon eviction, potentially blocking the replacement process and increasing fault handling time.[22]
Common methods for precleaning include periodic scanning of dirty pages by background daemons and triggering writes when memory pressure reaches predefined thresholds. In Linux kernels prior to version 2.6.32, the pdflush daemon managed this by monitoring system memory and initiating writeback when the proportion of dirty pages exceeded tunable parameters like dirty_background_ratio (typically 5-10% of total RAM), ensuring proactive flushing without immediate need for eviction.[22][23] Threshold-based precleaning activates under low free memory conditions to reclaim space preemptively, while opportunistic precleaning occurs during system idle periods to minimize interference with active workloads.[21]
These mechanisms enhance system responsiveness by decoupling write operations from page replacement decisions, reducing the average time to handle faults and preventing stalls in interactive applications.[22] However, precleaning can waste I/O bandwidth if a page is modified again or referenced before eviction, leading to unnecessary disk writes and potential amplification of overhead in write-intensive environments.[21]
Dirty pages are identified using a dirty bit set in the page table entry upon modification, enabling efficient selection for precleaning without scanning all memory.[22] Overall, precleaning balances the trade-off between latency reduction and resource efficiency, particularly in systems with high memory turnover.
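A simplified sketch of threshold-triggered precleaning is given below, with dirtyBackgroundRatio playing the role of the Linux dirty_background_ratio tunable mentioned above; writeBack stands in for an asynchronous flush routine, and the page structure is illustrative.

def precleanIfNeeded(pages, totalFrames, writeBack, dirtyBackgroundRatio=0.10):
    # pages: dict of page -> {"dirty": bool}. Flush dirty pages once the threshold is exceeded.
    dirtyPages = [p for p, info in pages.items() if info["dirty"]]
    if len(dirtyPages) > dirtyBackgroundRatio * totalFrames:
        for page in dirtyPages:
            writeBack(page)                   # asynchronous write-back in a real system
            pages[page]["dirty"] = False      # page is now clean and cheap to evict later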
Theoretical Foundations
Belady's Optimal Page Replacement
Belady's optimal page replacement algorithm, also known as OPT, is a theoretical strategy designed to minimize the number of page faults in a paging system by selecting the page to evict that will not be referenced again for the longest period of time.[24] Introduced by László Bélády, this algorithm assumes complete knowledge of the future reference string, allowing it to achieve the lowest possible page fault rate among all replacement policies for a given sequence of page accesses.[24] Upon a page fault, the algorithm examines the future references and replaces the resident page whose next use appears farthest ahead in the string; if a page is never referenced again, it is immediately selected for eviction.[24]
The optimality of this approach can be formally expressed through the eviction decision rule. For a set of pages P currently in memory and a future reference string R, the page to evict is chosen as:
p^* = \arg\max_{p \in P} \min \{ i \mid R_i = p, i > t \}
where t is the current time step, and \min \{ i \mid R_i = p, i > t \} is the position of the next reference to page p after time t; if no such i exists, assign infinity.[24] This ensures that page faults are minimized by preserving pages with imminent future uses. A pseudocode representation of the algorithm, assuming access to the full reference string, is as follows:
def optimalReplace(currentPages, faultPage, referenceString, currentIndex, frameCount):
    # Load directly if a free frame remains; otherwise evict the page whose
    # next reference lies farthest in the future (or that is never used again).
    if len(currentPages) < frameCount:
        currentPages.append(faultPage)
        return
    maxDistance = -1
    victim = None
    for page in currentPages:
        nextUse = findNextReference(referenceString, currentIndex + 1, page)
        if nextUse > maxDistance:          # infinity (never used again) wins automatically
            maxDistance = nextUse
            victim = page
    currentPages.remove(victim)
    currentPages.append(faultPage)

def findNextReference(refString, startIdx, page):
    for i in range(startIdx, len(refString)):
        if refString[i] == page:
            return i
    return float('inf')                    # page not used again
This implementation scans the future portion of the reference string for each candidate page to determine the eviction target.[24]
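As a usage sketch, a small driver around the functions above counts faults over a whole reference string; simulateOPT is an illustrative name, and frameCount is passed explicitly as in the listing. On the reference string used in the anomaly example below, three frames yield 7 faults under OPT.

def simulateOPT(referenceString, frameCount):
    currentPages = []
    faults = 0
    for i, page in enumerate(referenceString):
        if page not in currentPages:          # page fault
            faults += 1
            optimalReplace(currentPages, page, referenceString, i, frameCount)
    return faults

print(simulateOPT([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 3))   # 7 faults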
Although OPT provides the theoretical lower bound on page faults, it is impractical for real systems because it requires oracle-like foresight into future memory accesses, which is unavailable at runtime.[24] Instead, it serves as a benchmark to evaluate the performance of practical algorithms, such as least recently used (LRU), which approximates OPT by assuming recent past usage predicts future needs.[24]
Bélády's work also highlighted counterintuitive behaviors in paging systems, notably Bélády's anomaly, where increasing the number of available frames can paradoxically lead to more page faults for certain algorithms like FIFO.[25] For example, with the reference string 1,2,3,4,1,2,5,1,2,3,4,5, FIFO incurs 9 faults with 3 frames but 10 faults with 4 frames, demonstrating how allocation strategies can interact unexpectedly with access patterns.[25] In contrast, OPT remains anomaly-free, always benefiting from additional frames.[25]
The (h,k)-Paging Problem
The (h,k)-paging problem is a theoretical model for analyzing online page replacement, in which an online algorithm managing a memory of k page frames is compared against an optimal offline algorithm that is allowed only h frames, with h ≤ k. The goal is to bound, over all possible reference strings, the ratio between the number of page faults incurred by the online algorithm with its k frames and the number incurred by the offline optimum restricted to h frames, thereby quantifying how much additional memory compensates an online policy for its lack of future knowledge.
This formulation extends Belady's optimal page replacement algorithm, which defines the offline baseline, from the equal-capacity case h = k to a resource-augmented setting. Comparing against a weaker adversary with only h frames yields more refined guarantees than the classical equal-capacity bound, separating the cost of online decision-making from the benefit of extra physical memory.[26]
Approximations are possible through stack algorithms, which preserve the stack depth property (where smaller memory states are subsets of larger ones) and offer bounded approximation ratios relative to the optimum. These algorithms provide practical heuristics for fault reduction.
Sleator and Tarjan's 1985 work introduced competitive analysis to paging, adapting it to the (h,k) setting to evaluate online algorithms against an oblivious adversary. They proved that deterministic strategies like LRU achieve a competitive ratio of k/(k - h + 1) in this model, establishing a tight bound that no better deterministic online algorithm exists. This analysis underscores the inherent limitations of online replacement in constrained paging scenarios, influencing subsequent developments in approximation techniques.[27]
For example, with h = 3 and k = 5, the bound guarantees that LRU managing 5 frames incurs at most 5/(5 - 3 + 1) = 5/3 times as many page faults as an optimal offline algorithm restricted to 3 frames on any reference string, whereas in the equal-capacity case h = k = 5 the guarantee weakens to a factor of 5.[27]
Working Set Model
The working set model, proposed by Peter J. Denning in 1968, conceptualizes the active memory footprint of a process based on its recent reference behavior to inform efficient page replacement decisions.[10] The core idea defines the working set W(t, \tau) of a process at time t as the collection of distinct pages referenced within the preceding time window of length \tau, where \tau represents a fixed parameter tuned to the system's instruction execution rate or memory access patterns.[10] This model leverages the principle of locality of reference, positing that programs exhibit phases of concentrated access to a limited subset of their pages, allowing the system to allocate physical frames accordingly without excessive paging overhead.[10]
The size of the working set, denoted w(t, \tau) = |\{\, p \mid p \text{ referenced in } [t - \tau, t] \,\}|, quantifies the number of unique pages p accessed in that interval, providing a dynamic estimate of the minimum memory required for efficient execution.[10] In the context of page replacement, the algorithm maintains a separate working set for each process by monitoring page references and ensuring that physical memory allocation meets or exceeds this size; if available frames fall short, the process is suspended or swapped out entirely to avert thrashing, rather than evicting pages from the active set.[10] This per-process management promotes multiprogramming balance by prioritizing active workloads and scaling resource allocation to actual usage patterns.[10]
Practical estimation of the working set avoids exhaustive tracking by relying on hardware reference bits set by the memory management unit on each access, which are periodically scanned and cleared at intervals approximating \tau to identify recently used pages.[10] This approximation enables low-overhead computation, updating the working set as the process advances through its reference string.[10] The model's benefits include its adaptability to evolving locality phases in program behavior, reducing page faults during transitions and enhancing overall system throughput compared to static allocation strategies.[10] It has influenced virtual memory implementations in production systems, such as OpenVMS, where working sets explicitly govern process memory quotas.
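A small sketch of computing W(t, τ) directly from a recorded reference string, using the closed window [t - τ, t] from the definition above, is given below; this is a simulation aid rather than how an operating system tracks working sets, and the names are illustrative.

def workingSet(referenceString, t, tau):
    # Distinct pages referenced during the window [t - tau, t].
    window = referenceString[max(0, t - tau) : t + 1]
    return set(window)

refs = [1, 2, 1, 3, 2, 4, 4, 4, 5, 1]
print(workingSet(refs, t=8, tau=4))   # {2, 4, 5}: the working set size here is 3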
Practical Algorithms
First-In, First-Out (FIFO)
The First-In, First-Out (FIFO) page replacement algorithm operates by maintaining a queue of pages in physical memory, ordered according to their loading time. Upon a page fault when memory is full, the operating system evicts the page at the front of the queue—the one that has resided in memory the longest—and appends the newly faulted page to the rear. This mechanism can be implemented efficiently using a simple queue or a cyclic counter to approximate age ordering, avoiding the need for complex tracking of usage patterns.[8]
The primary advantages of FIFO lie in its simplicity and minimal runtime overhead, making it suitable for resource-constrained environments where ease of implementation outweighs potential inefficiencies. It requires no hardware support for reference counting or timestamps, relying solely on sequential ordering.[8]
Despite these benefits, FIFO suffers from significant drawbacks, as it disregards the recency of page accesses and may evict frequently referenced pages simply because they were loaded early. A key illustration of its limitations is Belady's anomaly, where allocating more page frames paradoxically increases the number of page faults. Consider the reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5: with 3 frames, FIFO produces 9 page faults, but with 4 frames, it results in 10 page faults.[8][28]
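The anomaly is easy to reproduce in a few lines; the sketch below simulates FIFO over the reference string above and reports 9 faults with 3 frames but 10 with 4 (the queue front holds the oldest resident page, and the helper name is illustrative).

from collections import deque

def fifoFaults(referenceString, frameCount):
    frames = deque()                      # front = oldest page in memory
    faults = 0
    for page in referenceString:
        if page not in frames:
            faults += 1
            if len(frames) == frameCount:
                frames.popleft()          # evict the oldest page
            frames.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifoFaults(refs, 3))   # 9 faults
print(fifoFaults(refs, 4))   # 10 faults, despite the extra frame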
Performance evaluations through simulations reveal that FIFO typically underperforms compared to algorithms like LRU, often generating 2-3 times more page faults for the same reference strings and frame counts, particularly in workloads with looping access patterns. For example, in tests with varying frame allocations (e.g., 4 to 8 frames), FIFO's fault rate consistently exceeds LRU's by a factor of approximately 2.5 on average across synthetic traces.[29][30]
FIFO found application in early virtual memory implementations, including components of IBM's OS/360 operating system, where simplicity was prioritized in initial paging designs. It can be enhanced with a second-chance mechanism, described in the next section, to protect recently referenced pages from immediate eviction.
Second-Chance Algorithm
The Second-Chance algorithm enhances the First-In, First-Out (FIFO) page replacement method by incorporating a single reference bit per page, allowing recently accessed pages a second opportunity to avoid eviction. This approach addresses a key limitation of FIFO, where the oldest page is evicted regardless of its recent usage, potentially removing pages that are still relevant. By checking the reference bit before eviction, the algorithm prioritizes retaining pages that have been referenced since their last consideration, thereby improving overall memory utilization in demand-paging systems.[31][32]
In operation, pages are maintained in a FIFO queue ordered by their loading time. Upon a page fault when all frames are occupied, the algorithm inspects the front page of the queue. If its reference bit is 1 (indicating a recent access), the bit is cleared to 0, and the page is relocated to the tail of the queue; the inspection then proceeds to the new front page. This scanning continues until a page with a reference bit of 0 is encountered at the front, which is selected for eviction and replacement by the faulting page. The reference bit is hardware-managed, automatically set to 1 by the CPU on any access to the page (read or write). If the queue is fully scanned without finding a bit of 0, the original front page (now with bit 0) is evicted after all bits have been cleared. Also referred to as the clock algorithm using one bit, this mechanism effectively simulates a circular scan through the queue.[31][32][33]
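A minimal sketch of this scan is shown below, assuming the resident set is kept in a deque with the oldest page at the front and refBits is a dictionary of hardware-maintained reference bits; the names are illustrative.

from collections import deque

def secondChanceEvict(queue, refBits):
    # queue: deque of resident pages, front = oldest; refBits: page -> 0 or 1
    while True:
        page = queue.popleft()
        if refBits[page]:
            refBits[page] = 0     # clear the bit: the page gets a second chance
            queue.append(page)    # move it to the tail and keep scanning
        else:
            return page           # reference bit is 0: evict this page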
The algorithm offers several advantages over basic FIFO, including reduced page faults by preserving pages with ongoing utility, while approximating the more effective Least Recently Used (LRU) policy at significantly lower overhead—no stack or timestamp maintenance is required. It performs better than FIFO in workloads with temporal locality, as demonstrated in simulations where it incurs fewer faults by deferring eviction of referenced pages. However, its queue-based nature can lead to inefficiencies, such as extended scanning times (potentially O(n) in the worst case, where n is the number of frames) if many pages have set reference bits, resulting in delays akin to those in FIFO but with added bit-checking costs. Additionally, if reference bits remain unset for long periods, it may degenerate into standard FIFO behavior.[31][33][31]
To illustrate, consider the reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 with three frames, assuming reference bits start at 0 when a page is loaded and are set only by subsequent accesses. Pages 1, 2, and 3 are loaded on the first three faults. The fault on 4 finds page 1 at the front with its bit clear and evicts it immediately; the following faults on 1, 2, and 5 likewise evict 2, 3, and 4, none of which have been re-referenced. The subsequent hits on 1 and 2 set their reference bits, so the fault on 3 gives both pages a second chance, clearing their bits and moving them to the tail, before evicting 5, whose bit is 0. The final faults on 4 and 5 then evict 1 and 2, whose bits were cleared during that sweep. This yields 10 faults in total, one more than FIFO's 9 on this particular string; the benefit of Second-Chance appears in workloads with stronger recency, where deferring the eviction of recently referenced pages such as 1 and 2 avoids faults that FIFO would incur.[31]
Implementation requires only modest hardware support: a single reference bit in each page table entry, which the memory management unit sets on access without software intervention, enabling efficient approximation of usage patterns at low cost.[31][32]
Clock Algorithm
The Clock algorithm, also known as the second-chance algorithm, is a page replacement policy that approximates the Least Recently Used (LRU) strategy using a circular list of pages and hardware-supported reference bits. In this approach, physical memory frames are organized in a circular queue, with a "clock hand" pointer that sweeps through the list to identify victim pages for replacement. Each page frame has an associated reference bit, set to 1 by the hardware whenever the page is accessed (read or written). When a page fault occurs and no free frame is available, the clock hand advances to the next frame in the circle. If the reference bit of the current page is 0, that page is selected for replacement, as it has not been recently used. If the bit is 1, it is cleared to 0 (giving the page a "second chance"), and the hand moves to the next frame, repeating the process until a suitable victim is found. This mechanism simulates the recency-based eviction of LRU without requiring exact timestamps or counters per page, relying instead on periodic bit clearing by the sweeping hand.[34]
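A minimal sketch of the hand's sweep follows, assuming frames and refBits are parallel lists and hand is the current index; the names are illustrative and not tied to any particular kernel.

def clockEvict(frames, refBits, hand):
    # Returns (index of the victim frame, new hand position).
    while True:
        if refBits[hand] == 0:
            victim = hand                     # not recently used: replace this frame
            return victim, (hand + 1) % len(frames)
        refBits[hand] = 0                     # recently used: clear the bit (second chance)
        hand = (hand + 1) % len(frames)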
The algorithm's efficiency stems from its constant average-case time complexity of O(1), as the hand typically advances only a few positions before finding an unreferenced page in practice, making it suitable for real-time operating system decisions. It is particularly hardware-friendly, leveraging the same reference bits used for demand paging without additional per-page storage or complex computations. In systems with frequent page accesses, the hand's movement effectively partitions pages into "recently used" (with bit 1) and "not recently used" (bit 0) categories, providing a low-overhead approximation of LRU that typically avoids the pathological evictions of actively used pages seen in simpler policies like FIFO.[34]
Variants of the Clock algorithm enhance its basic mechanism to handle additional factors like page modifications or working sets. The two-handed Clock variant employs separate "front" and "back" hands to accelerate victim selection: the front hand clears reference bits as it advances, while the back hand, trailing at a fixed distance, selects pages with cleared bits for replacement, reducing sweep times in large memory systems. This is commonly used in operating systems like Solaris to balance responsiveness and overhead. Another variant, WSClock, integrates working set concepts with Clock by incorporating a modified bit and reference time stamps; it evicts unmodified pages immediately if unreferenced, but for dirty pages, it delays replacement until a precleaning phase based on age, combining global replacement with local working set protection to minimize thrashing. WSClock was proposed as an effective hybrid for virtual memory management in multiprogrammed environments.[35][36]
In practice, the Clock algorithm performs close to LRU, with empirical studies showing fault rates within 10-20% of optimal in typical workloads, due to its effective approximation of recency. It has been widely adopted in Unix-like systems, notably introduced in 4.2BSD as a global replacement policy to manage demand-paged virtual memory efficiently across processes.[34][37]
Least Recently Used (LRU)
The Least Recently Used (LRU) page replacement algorithm is a recency-based policy that approximates the optimal replacement strategy by evicting the page whose last access occurred furthest in the past. Upon a page fault, LRU identifies the victim page as the one with the earliest timestamp of access among those currently in memory, thereby prioritizing the retention of more recently accessed pages under the assumption of temporal locality. This approach was first systematically evaluated in early studies of virtual memory systems, where it demonstrated superior performance to simpler policies like FIFO by accounting for usage patterns rather than arrival order.[8]
In operation, LRU maintains an ordered structure, such as a doubly-linked list or an array of timestamps, to track the recency of page accesses. Each time a page is referenced (read or written), its position in the list is updated to the most recent spot, or its timestamp is refreshed to the current time. On a fault requiring replacement, the algorithm evicts the page at the tail of the list (least recent) or with the minimum timestamp, then inserts the new page at the head or stamps it with the current time. This exact tracking ensures fidelity to recency and can be performed with O(1) updates per access using data structures such as hash tables combined with linked lists. For analysis, LRU's behavior aligns with the stack model, where page references form a stack discipline, allowing simulation of faults via stack distances—the number of distinct pages referenced since last use.[8][38]
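A compact sketch of exact LRU bookkeeping using an ordered dictionary, where insertion order doubles as the recency list, is shown below; this is a simplification of the linked-list-plus-hash-table scheme described above, and the class name is illustrative.

from collections import OrderedDict

class LRUFrames:
    def __init__(self, frameCount):
        self.frameCount = frameCount
        self.pages = OrderedDict()            # oldest entry = least recently used

    def reference(self, page):
        # Returns True if the reference caused a page fault.
        if page in self.pages:
            self.pages.move_to_end(page)      # refresh recency on a hit
            return False
        if len(self.pages) == self.frameCount:
            self.pages.popitem(last=False)    # evict the least recently used page
        self.pages[page] = True
        return True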
The primary advantage of LRU is its optimality under the stack hypothesis, where future references follow the same access pattern as past ones, making it equivalent to Belady's optimal algorithm in such cases. Additionally, competitive analysis establishes that LRU achieves a competitive ratio of k against the optimal offline algorithm for a cache of size k, meaning its fault count is at most k times that of the optimum in the worst case, which is tight and the best possible for deterministic online algorithms. This theoretical guarantee underscores its robustness for workloads exhibiting locality. However, LRU incurs high implementation overhead due to the need for frequent list traversals or timestamp comparisons on every access, potentially leading to significant CPU costs in software; in hardware, exact LRU may require costly counters or content-addressable memory. The stack distance metric further aids in tuning but adds analytical complexity.[39][40]
Variants of LRU address these limitations while preserving core principles. LRU-k generalizes the policy by evicting the page whose k-th most recent access was furthest in the past, using the inter-reference gap (time between the (k-1)-th and k-th references) to better handle infrequent but recurring accesses, particularly in database buffers. Pseudo-LRU approximations, often implemented with binary trees or bit hierarchies, simulate recency ordering with lower overhead by approximating the full list order, suitable for hardware caches where exact tracking is infeasible. The eviction decision in standard LRU can be formally expressed as selecting the page p that minimizes the last access time:
p = \arg\min_{q \in \text{memory}} \tau_q
where \tau_q is the timestamp of the most recent reference to page q.[41]
In performance evaluations, LRU consistently outperforms FIFO and random replacement in standard benchmarks like those from early virtual memory simulations, generating 2-3 times fewer faults than non-recency policies but still trailing the theoretical optimum by a factor related to cache size. It is widely adopted in database systems and operating system page caches for its balance of effectiveness and analyzability, though often approximated by lower-overhead mechanisms like the clock algorithm to mitigate costs.[8]
Not Recently Used (NRU)
The Not Recently Used (NRU) page replacement algorithm classifies memory pages into four states based on two hardware-provided bits: the reference bit (R), which is set to 1 whenever the page is accessed (read or written) and indicates recent usage, and the modified bit (M), which is set to 1 if the page has been altered and signals the need for writing back to disk upon eviction.[42] These bits enable a simple approximation of recency and dirtiness without maintaining complex data structures like stacks or queues. Pages fall into the following classes: class 00 (R=0, M=0: not referenced and not modified, preferred victims), class 01 (R=0, M=1: not referenced but modified), class 10 (R=1, M=0: referenced but not modified), and class 11 (R=1, M=1: referenced and modified, least desirable for eviction).[42]
When a page fault occurs and no free frames are available, the NRU algorithm identifies the lowest-numbered non-empty class and randomly selects a page from it for replacement.[42] To maintain accuracy over time, the operating system periodically resets all reference bits to 0 (e.g., every few seconds or clock ticks), effectively sampling usage over a recent window without tracking exact timestamps.[42] This reset mechanism ensures that only truly recent accesses influence decisions, while the modified bit remains set until the page is written back, as it cannot be reset without risking data loss. The hardware typically detects references and modifications via the memory management unit (MMU), setting the bits on-the-fly during address translation.[42]
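A brief sketch of the classification and selection step follows, assuming pages maps each resident page to its (reference bit, modified bit) pair as sampled from the page table; the structure and function name are illustrative.

import random

def nruSelectVictim(pages):
    # pages: dict of page -> (referenceBit, modifiedBit)
    classes = {0: [], 1: [], 2: [], 3: []}
    for page, (r, m) in pages.items():
        classes[2 * r + m].append(page)       # 0: R=0,M=0 ... 3: R=1,M=1
    for c in range(4):
        if classes[c]:
            return random.choice(classes[c])  # random pick from the lowest non-empty class
    return None                               # no resident pages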
The algorithm's primary advantages lie in its speed and low overhead: it requires no ordered lists or counters, making it efficient for implementation in resource-constrained systems, and it balances recency (via the reference bit) with the cost of eviction (via the modified bit) to minimize unnecessary disk I/O.[42] However, NRU is coarse-grained, as it groups pages into only four categories without considering the precise order or frequency of references within the sampling period, potentially leading to suboptimal choices compared to more precise methods.[42]
In practice, under low memory conditions, NRU prioritizes evicting from class 00 over higher classes like 11; for instance, replacing an unmodified, unreferenced page avoids the overhead of flushing a recently accessed dirty page to disk, thereby reducing page fault latency.[42] Similar classification-based policies have been explored in various systems for lightweight page replacement.
Not Frequently Used (NFU)
The Not Frequently Used (NFU) page replacement algorithm is a frequency-based policy designed to approximate the Least Frequently Used (LFU) strategy by tracking page access counts through software counters, prioritizing the eviction of pages with the lowest usage frequency over time. Each page in physical memory is assigned a software counter, set to zero when the page is first loaded into memory. Periodically, at each clock interrupt—typically every 20 milliseconds—the operating system scans all pages in memory. For any page where the hardware reference bit is set (indicating a reference since the last scan), the counter is incremented by one, and the reference bit is cleared to zero; if the bit is not set, the counter remains unchanged. Upon a page fault requiring replacement, the algorithm selects the page with the smallest counter value for eviction, breaking ties arbitrarily if needed. This approach provides a simple, low-overhead way to estimate long-term frequency without requiring hardware support for exact counting on every access.
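A minimal sketch of the periodic counter update and eviction choice is given below, assuming pages maps each page to a counter and a sampled reference bit; the structure is illustrative and not tied to a specific kernel.

def nfuClockTick(pages):
    # Called at each clock interrupt: fold the sampled reference bits into the counters.
    for info in pages.values():
        if info["ref"]:
            info["counter"] += 1
        info["ref"] = 0                       # clear the bit for the next interval

def nfuSelectVictim(pages):
    # Evict the page with the smallest accumulated reference count.
    return min(pages, key=lambda p: pages[p]["counter"])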
To optimize hardware implementation and reduce overhead, NFU counters can be approximated using shift registers or bit-shifting techniques, where reference bits are shifted into a fixed-width register (e.g., 8 bits) to represent a weighted frequency score, simulating the counter increments efficiently without full arithmetic operations. Early implementations, such as those in systems predating widespread LRU hardware, relied on such approximations to manage reference tracking in resource-constrained environments.
NFU excels in simplicity, requiring only periodic software scans and basic counter maintenance, making it suitable for systems without dedicated LRU stacks or timestamps. It performs particularly well on workloads with stable, looping access patterns, where frequently accessed pages accumulate higher counters and remain protected from eviction, minimizing faults in repetitive scenarios.
However, NFU has notable drawbacks due to its exclusive focus on frequency, completely ignoring recency of access. A page involved in an intense but one-time burst of references long ago can retain a high counter indefinitely, potentially evicting a currently active page with fewer total references, leading to suboptimal performance in dynamic workloads with shifting patterns. As a pure frequency variant without recency weighting—unlike aging algorithms that decay counters over time—NFU treats historical usage as permanent, exacerbating this issue. NFU serves as a foundational approximation to LFU, sampling frequency at fixed intervals rather than incrementing on every reference, which avoids the high cost of exact LFU but limits adaptability. Concepts from NFU have also been combined with LRU in hybrid policies such as LRFU, which balances frequency and recency in eviction decisions.
Aging Algorithm
The aging algorithm approximates the least recently used (LRU) page replacement strategy by maintaining a counter for each page that encodes both the recency and frequency of accesses through bit-shifting operations, providing a low-overhead alternative suitable for software implementation.[43] It extends the not frequently used (NFU) approach by incorporating periodic aging to decay older references, allowing more responsive eviction decisions based on recent usage patterns.[44]
In operation, each page table entry includes an 8-bit counter, initialized to zero, which serves as an age value summarizing the page's recent usage history.[43] Periodically, such as on each clock tick, the operating system updates every resident page's counter by shifting it one bit to the right and inserting the page's hardware reference bit R into the most significant position, after which the reference bit is cleared for the next interval:
\text{counter} = (\text{counter} \gg 1) \mid (R \ll 7)
A reference during the most recent interval therefore sets the high-order bit of the counter, while every tick without a reference halves the counter's weight, so the influence of older accesses decays steadily.[44]
To evict a page during a fault, the algorithm selects the page with the smallest counter value, as this indicates the least recent or frequent usage.[44]
For example, a page referenced in each of the last three intervals but idle before that carries the counter 11100000 (224 in decimal), while a page that was heavily used in the past but has since gone untouched loses one bit of history per tick and reaches zero after eight unreferenced intervals, falling in priority relative to actively used pages.[43] This mechanism ensures that pages with recent activity retain high counter values, while long-idle pages age out completely within the 8-bit window.
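A short sketch of the per-tick update and victim selection under an assumed pages structure (an 8-bit age counter and a sampled reference bit per page) follows; the names are illustrative.

AGE_BITS = 8

def agingClockTick(pages):
    # Shift each counter right and insert the reference bit at the high-order end.
    for info in pages.values():
        info["age"] = (info["age"] >> 1) | (info["ref"] << (AGE_BITS - 1))
        info["ref"] = 0                       # clear the sampled reference bit

def agingSelectVictim(pages):
    # The smallest age value marks the least recently and least frequently used page.
    return min(pages, key=lambda p: pages[p]["age"])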
The algorithm's advantages include hardware efficiency, as it relies on simple bit operations without needing timestamps or stacks, and it balances recency (recent references occupy the high-order bits and dominate comparisons) with frequency (the number of set bits reflects how often the page was referenced across recent ticks), often performing close to LRU in practice with minimal overhead.[44] However, it has drawbacks such as a fixed history window limited by the counter width: accesses older than eight ticks are forgotten, and pages whose counters have decayed to zero can no longer be distinguished from one another.[43] This aging algorithm was employed in older Unix systems for its simplicity and effectiveness in managing virtual memory under resource constraints.[45]
Random Replacement
The random replacement algorithm operates by selecting a victim page frame uniformly at random from the set of currently allocated frames whenever a page fault occurs and no free frames are available. This non-deterministic approach eliminates the need for any reference bits, timestamps, or access history maintenance, incurring zero overhead for tracking page usage patterns. As a result, it is exceptionally simple to implement in both software and hardware, requiring only a random number generator to choose among the available frames.
A key advantage of random replacement is its ability to avoid the overhead associated with more sophisticated algorithms, making it ideal for environments where computational resources are limited or predictability in execution time is prioritized over optimality. In practice, it often delivers average performance that is surprisingly competitive, particularly as a baseline for evaluating other policies, and can outperform FIFO in scenarios where FIFO's deterministic nature leads to suboptimal eviction sequences. However, it does not exploit temporal or spatial locality, leading to potentially high page fault rates in workloads with strong access patterns, and its worst-case fault rate can be far higher than that of the optimal (OPT) algorithm.
Random replacement finds application in resource-constrained embedded systems, where its low overhead and analyzable worst-case behavior support probabilistic timing guarantees in real-time environments. It also serves as a straightforward baseline in simulations and benchmarks for more advanced policies. In flash-based storage systems, such as SSDs, random eviction strategies help distribute write operations evenly, contributing to wear leveling by preventing concentrated erasure cycles on specific blocks and thereby extending device lifespan.
Longest Distance First (LDF)
The Longest Distance First (LDF) page replacement algorithm serves as a practical approximation to Belady's optimal replacement strategy by evicting the page predicted to have the longest time until its next use. It operates by analyzing historical access patterns or known access sequences to estimate the future reference distance for each page currently in memory, then selecting the one with the maximum estimated distance for replacement when a page fault occurs and no free frames are available. This prediction can be derived from techniques such as reference history tracking or sequential pattern recognition, avoiding the need for complete future knowledge required by the exact optimal algorithm.[46]
LDF demonstrates advantages over backward-looking algorithms like LRU, particularly in workloads exhibiting sequential or predictable access patterns, where it achieves lower page fault rates by better anticipating future needs—for instance, simulations on reference strings show LDF incurring 12 faults compared to 15 for LRU with 4 frames. In database systems, where queries often involve sequential scans, LDF leverages prefetching mechanisms to estimate distances, making it suitable for buffer management to minimize I/O overhead. A representative example occurs in streaming applications, where the algorithm evicts pages positioned farthest ahead in the prefetch queue, as these are least likely to be accessed imminently, thus optimizing memory for incoming sequential data blocks.[46][47]
Despite these benefits, LDF's reliance on an effective prediction model introduces implementation complexity, including overhead for maintaining and updating distance estimates, which can degrade performance in unpredictable or random access scenarios. As a variant of Belady's OPT that dispenses with full future visibility, LDF balances optimality with feasibility but remains more computationally intensive than simpler heuristics.[46]
Implementation and Applications
Reference Tracking Without Hardware Support
In systems lacking dedicated hardware reference bits, operating systems employ software-based techniques to track page access patterns for page replacement decisions. These methods simulate reference tracking by periodically scanning page tables or using indirect indicators of access frequency, though they introduce significant overhead compared to hardware-assisted approaches. While bit-based hardware methods provide ideal low-latency tracking of page references, software alternatives are essential for legacy or resource-constrained architectures.[48]
One common software simulation approach involves the operating system periodically scanning page tables to identify access patterns. During these scans, the OS examines entries for indicators such as modification status or recent activity, approximating recency or frequency to select victim pages. For instance, in 4.2BSD Unix, the pagedaemon process—a low-priority kernel thread—awakens when free memory falls below a threshold and linearly scans the core memory map (cmap), a structure tracking allocated frames. It marks potentially reclaimable pages as invalid to simulate a reference bit; if a process accesses such a page before the next scan, a page fault occurs, allowing the OS to validate and update the page's status. This global clock-like sweep reclaims unreferenced pages while aiming to limit CPU usage to under 10% on typical VAX systems with 8-16 MB of memory. However, the technique incurs high overhead from frequent page faults and prolonged scan times, especially under memory pressure, prompting optimizations like a dual-hand clock in later versions such as 4.3BSD.[48][49]
Another technique uses timestamps recorded on page faults to approximate least recently used (LRU) behavior without tracking every access. Upon a page fault, the OS records the current time in the page table entry, enabling later comparisons to estimate recency during replacement decisions. This avoids the cost of updating timestamps on all memory references, which would require trapping every access—a prohibitive overhead in software. Systems may combine this with periodic scans to refresh timestamps, providing a lightweight approximation suitable for environments without hardware counters.[50]
Inverted page tables offer an efficient structure for reference tracking in large address spaces, using a single table indexed by physical frame number rather than virtual addresses. To handle mappings, entries are organized via a hash table where virtual page-process ID pairs hash to frame entries, with collisions resolved through chained lists of potential matches. During page replacement, the OS scans or traverses these hash chains in the inverted table to approximate access patterns, such as by checking resident page activity across processes. This approach reduces storage compared to full per-process page tables while enabling global replacement scans, though chain traversals can add latency in contended scenarios.[51][52]
Sampling accesses via software traps provides a probabilistic tracking method, where the OS selectively protects pages (e.g., marking them read-only or invalid) to generate traps on access. These traps log sampled references without monitoring all memory operations, allowing the OS to infer overall patterns for replacement. For example, page fault-based sampling triggers on selected pages, capturing access frequency through fault handlers while minimizing intervention. This is particularly useful in virtualized or multi-tenant environments but risks under-sampling bursty workloads.[53][54]
Indirect tracking via translation lookaside buffer (TLB) statistics leverages hardware caches to approximate page accesses without direct bit support. The OS monitors TLB miss rates or augments TLB miss handlers to record translation faults as proxies for page references, building access traces over time. Such statistics provide coarse-grained insights into working set locality but may overlook accesses that hit in the TLB.[55]
These hardware-constrained techniques collectively face challenges such as elevated CPU overhead from scans and traps, which can degrade system throughput; implementations typically target scan overhead below 10% of CPU time, though observed costs vary in memory-intensive workloads. Despite optimizations, they underscore the performance gap versus native hardware support, influencing designs in early Unix variants and embedded systems.[48][49]
Operating System Examples: Linux Page Cache
The Linux kernel's virtual memory (VM) subsystem implements page replacement using an approximate least recently used (LRU) policy, which has relied on active and inactive lists since version 2.6 to distinguish frequently accessed pages (working set) from candidates for eviction.[16] These per-zone lists help approximate recency without full timestamp tracking, promoting pages from inactive to active upon access while demoting inactive pages after a period of disuse.[16] The page cache operates as a unified structure for both file-backed pages (from disk I/O) and anonymous pages (from processes), allowing shared management under the LRU approximation to optimize caching for diverse workloads.[16]
Background page reclamation is handled by the kswapd daemon, a per-node kernel thread that activates when available memory drops below low watermarks, scanning and freeing pages from the inactive list until high watermarks are reached.[16] Implemented primarily in mm/vmscan.c, this process prioritizes reclaiming clean pages first and involves slab shrinking alongside LRU scanning, ensuring proactive memory management without immediate process blocking.[56] The swappiness parameter (/proc/sys/vm/swappiness, default 60 on a 0–100 scale) tunes the policy's bias, favoring file page eviction over anonymous page swapping at lower values to preserve application data in memory, while higher values encourage earlier swapping to reduce disk cache pressure.[57]
Since Linux 6.1, the Multi-Generational LRU (MGLRU) has augmented the traditional two-list approach by organizing pages into multiple generations based on access recency, enabling finer-grained working set estimation through per-generation histograms and proactive cold page eviction.[58] This adaptation improves reclaim efficiency under pressure, reducing kswapd CPU overhead by up to 51% in web browsing benchmarks and minimizing thrashing via tunable parameters like min_ttl_ms for short-lived workloads.[59] For SSD-backed swap, MGLRU pairs effectively with zswap, a compressed in-RAM cache that intercepts evictable anonymous pages, compresses them (e.g., via zstd, the default since kernel 6.3, or LZO) into a dynamic pool (limited by max_pool_percent), and only spills to disk on overflow, thereby cutting I/O latency and extending SSD lifespan.[60][61]
In extreme cases, if direct reclaim (triggered by allocations) and kswapd fail to free sufficient memory, the system invokes the Out-of-Memory (OOM) killer to select and terminate high-memory processes based on oom_score, integrating page replacement exhaustion with broader memory recovery.