Hazard pointer
Hazard pointers are a synchronization primitive in concurrent programming designed to enable safe memory reclamation for dynamic lock-free data structures, allowing threads to protect shared objects from premature deallocation while avoiding the need for locks or garbage collection.[1] Introduced by Maged M. Michael in 2004, this technique addresses the challenges of memory management in multithreaded environments where threads concurrently read and modify linked data structures, such as queues and stacks, without blocking each other.[1]
The core mechanism involves each thread maintaining a fixed number of hazard pointers—special shared variables that point to objects the thread is currently accessing or may soon access.[1] When a thread retires (potentially deletes) an object, it adds it to a local list and periodically scans all active hazard pointers across threads; if the retired object is not referenced by any hazard pointer, it can be safely reclaimed, and with an appropriate scan threshold the scheme guarantees constant amortized expected time per retired node.[1] This approach mitigates issues like the ABA problem, where a reused memory address could lead to incorrect operations, and supports scalable, non-blocking concurrency without hardware-specific instructions.[1]
Hazard pointers have been widely adopted in high-performance systems, including implementations in libraries like Facebook's Folly for concurrent hash maps and sets, where they facilitate fast, lock-free iteration and read-mostly access patterns.[2] Hazard pointers were adopted into C++26 via paper P2530R3, which specifies a standard interface in the <hazard_pointer> header—including the std::hazard_pointer_obj_base class for reclaimable objects and functions for domain management—to simplify their use in portable code.[2] Their standardization aims to promote broader adoption in lock-free algorithms, enhancing concurrency in applications from databases to real-time systems.[2]
Background and motivation
Problems in lock-free concurrent programming
Lock-based approaches to concurrent programming rely on mutual exclusion primitives, such as mutexes, to serialize access to shared data structures, ensuring thread safety but introducing risks like deadlocks, priority inversion, and convoying effects where threads block each other unnecessarily.[3] In contrast, lock-free methods eschew locks entirely, using atomic operations like compare-and-swap (CAS) to enable concurrent modifications while guaranteeing non-blocking progress: if multiple threads are active on the data structure, at least one will complete its operation in a finite number of steps, preventing system-wide stalls even if individual threads experience repeated failures.[3] This progress property stems from the inherent resilience of atomic primitives to process delays or failures, making lock-free algorithms more robust in asynchronous environments compared to lock-based ones, which can halt entirely due to blocking.[3]
A prominent challenge in lock-free programming is the ABA problem, which arises during CAS operations when a shared value appears unchanged between reads but has been modified intermediately by other threads, leading to spurious successes and data corruption.[4] Consider a lock-free linked list removal: Thread T1 reads a node's next pointer as null (value A) to confirm it is the tail and prepares a CAS to update the previous node's next to null; concurrently, Thread T2 removes the node (changing the value to some other pointer B), frees it, reuses the memory for a new node with next set back to null (value A again), and inserts it elsewhere.[4] When T1 resumes and performs the CAS, it succeeds incorrectly because the value matches A, but the structure now points to the wrong node, potentially orphaning elements or creating cycles.[4] This issue is exacerbated in environments without garbage collection, where memory reuse is common, and it undermines the linearizability of operations unless mitigated by techniques like version counters appended to pointers.[4]
Beyond ABA, lock-free data structures face general race conditions in pointer manipulations, where concurrent atomic updates to shared pointers can result in lost nodes, duplicate insertions, or invalid traversals without additional synchronization beyond CAS.[3] For instance, in a concurrent queue, multiple threads attempting to swing a tail pointer via CAS may overwrite each other's updates if not carefully ordered, leading to broken links or enqueued items becoming inaccessible.[3] Early attempts at such structures, like those by Stone (1990) and Valois (1995), encountered races in pointer swings that caused list fragmentation or item loss, highlighting the subtlety required in designing atomic pointer operations.[3]
The development of lock-free structures began in the late 1980s and 1990s, with foundational work including Treiber's 1986 lock-free stack algorithm, which used CAS on a top pointer for push and pop operations in a singly-linked list, establishing a simple non-blocking pattern for LIFO access.[3] This was followed by efforts on queues, such as incomplete algorithms by Hwang and Briggs (1987) and Sites (1989), which suffered from specification gaps and races.[3] The seminal Michael and Scott queue (1996) addressed these by combining CAS with careful pointer tagging and validation steps, achieving practical lock-free FIFO behavior, though it still grappled with issues like the ABA problem and memory management.[3] These early structures demonstrated the viability of lock-free programming but underscored persistent challenges in ensuring correctness without locks, particularly around dynamic memory and concurrent pointer updates.[4] One specific unsolved issue in these designs was safe memory reclamation for retired nodes, as threads might retain dangling references amid ongoing operations.[3]
The memory reclamation challenge
In lock-free concurrent data structures, such as linked lists and search trees, nodes are dynamically allocated and deallocated by multiple threads operating simultaneously without mutual exclusion.[5] This concurrent memory management is essential for scalability but introduces the risk of premature reclamation, where a thread removes and frees a node while another thread still holds a pointer to it.[5] If the freed memory is reused for a different purpose, the second thread may dereference a dangling pointer, leading to data corruption, crashes, or undefined behavior.[5]
Garbage collection provides automatic memory safety by tracking reachable objects but is unsuitable for real-time or high-performance lock-free systems due to its non-deterministic pause times, which can halt all threads and violate timing guarantees.[6] These pauses arise from stop-the-world phases needed to scan and compact memory, making garbage collection incompatible with applications requiring bounded latency, such as embedded systems or high-throughput servers.[6]
Reference counting, an alternative where each node maintains a counter of active pointers, fails in lock-free settings due to atomicity challenges and the potential for incomplete reclamation. Updating the count atomically while the node may be concurrently freed requires strong multi-word primitives that are unavailable on most hardware, leading to inefficiency and complexity.[5] Moreover, circular references among nodes can prevent counts from reaching zero even for unreachable objects, blocking reclamation indefinitely. The ABA problem compounds these difficulties: a reused node whose pointer value matches a prior state can incorrectly appear safe, so counting schemes face the same reclamation race they are meant to prevent.[5]
Core concept
Hazard pointers definition
Hazard pointers are a form of per-thread, single-writer multi-reader atomic pointers designed to facilitate safe memory reclamation in lock-free concurrent data structures.[7] Each thread maintains a small, fixed number of these pointers—typically one or two—to reference nodes that it may access during operations on dynamic objects, such as linked lists or queues.[7] These pointers are implemented using standard atomic operations, such as single-word atomic reads and writes, ensuring that only the owning thread can modify its hazard pointers while all threads can read any of them.[7]
Key properties of hazard pointers include their global visibility across all threads and initial null values, which signal that no protection is active.[7] In the context of optimistic concurrency, a thread "publishes" a hazard pointer by atomically setting it to the address of a node before performing a read or traversal, thereby announcing to other threads that the node is in active use and should not be reclaimed.[7] This mechanism allows threads to proceed without locks while guaranteeing that protected nodes remain valid during the operation.
For example, during an enqueue operation in a lock-free FIFO queue, a thread might read the current tail pointer into a local variable t, atomically set its first hazard pointer hp0 to t (*hp0 = t), and then re-read the tail to validate that t is still current before linking the new node.[7] This publish-and-validate step ensures the protected node cannot be reclaimed and reused while the thread holds it.
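The publish-then-validate idiom can be sketched in C++ as follows. This is a minimal single-threaded sketch, not the paper's exact code: hp0, shared_src, and protect are illustrative names, and hp0 stands for the calling thread's first hazard pointer slot.

```cpp
#include <atomic>
#include <cassert>

struct Node { int data; };

std::atomic<Node*> shared_src{nullptr};     // e.g. a queue's Tail pointer
thread_local std::atomic<Node*> hp0{nullptr};  // this thread's first hazard slot

// Repeatedly read the shared pointer, publish it, and re-check: only when
// the value is unchanged after publication is the node guaranteed to be
// protected from reclamation.
Node* protect(std::atomic<Node*>& src, std::atomic<Node*>& hp) {
    Node* t = src.load();
    for (;;) {
        hp.store(t);             // publish (announce intent to access t)
        Node* t2 = src.load();   // validate: is it still the same node?
        if (t2 == t) return t;   // yes: t is now safely protected
        t = t2;                  // no: retry with the fresh value
    }
}
```

The re-read is essential: between the first load and the store to the hazard slot, another thread may already have retired and scanned, so protection only takes effect once the published value is confirmed still reachable.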
Publisher-subscriber paradigm
In the hazard pointer mechanism, the publisher-subscriber paradigm facilitates safe memory reclamation by enabling threads to announce their intent to access shared objects while allowing other threads to verify the safety of deallocation operations.[8] Threads act as publishers by setting their hazard pointers to reference specific nodes they plan to read or traverse, thereby notifying all other threads of potential ongoing access to those nodes. This publication step ensures visibility across the system without requiring centralized coordination.[9]
Conversely, threads attempting to retire or reclaim memory nodes serve as subscribers by scanning and checking the set of all active hazard pointers maintained by publishers.[8] If a retired node does not match any published hazard pointer, the subscriber can proceed with deallocation, as no other thread is currently accessing it. This subscription process decouples the protection announcements from the reclamation decisions, promoting lock-free progress.[9]
The paradigm relies on single-writer multi-reader semantics for each hazard pointer, where only the owning thread (publisher) can update its value, while any thread (subscriber) can read it concurrently. This ownership model limits contention to the writer's updates and supports efficient reads by multiple subscribers.[8]
Atomicity in hazard pointer updates is critical to maintain consistency, as publishers use atomic write operations to set pointers without intermediate states that could lead to races. Subscribers similarly rely on atomic reads to observe the latest published values, ensuring that the coordination remains reliable in a non-blocking environment.[9]
Operation
Protecting objects
In hazard pointers, protecting an object involves a thread announcing its intent to access a specific node by atomically publishing a pointer to it in one of its designated hazard pointer slots, ensuring that the node cannot be reclaimed by other threads during the access period.[1] This process relies on the publisher-subscriber paradigm, where the accessing thread acts as the publisher by updating its local hazard pointer.[1]
The protection sequence begins when a thread reads a pointer to a node from a concurrent data structure, such as during a traversal in a lock-free linked list. To safeguard this node, the thread atomically sets one of its hazard pointers to point to the read node. Once set, the thread can safely perform read or write operations on the node, as any potential reclaimer will detect the hazard pointer and defer reclamation. After completing the access, the thread clears the hazard pointer by atomically setting it to null, releasing the protection.[1]
For data structures requiring traversal of multiple linked nodes, such as doubly-linked lists or trees, each thread maintains multiple hazard pointer slots—commonly one or two—to protect both the current node and its successor or predecessor simultaneously. This allows safe navigation without unprotected intermediate reads that could lead to dangling pointers. The number of slots per thread is fixed and chosen based on the maximum concurrency and structure complexity, balancing protection needs with space efficiency.[1]
The following pseudocode illustrates the basic protection mechanism for a single node in a lock-free context:
function protect_and_access(node_ptr p):
    thread_hazard[slot] = p     // atomic set: publish the hazard pointer
    // Now p is protected; safely access the node
    value = p->data             // example read operation
    // ... perform other operations on p ...
    thread_hazard[slot] = null  // atomic clear: release protection
    return value
This approach ensures race-free publication, as the atomic writes to the thread's own slots are contention-free, while the global list of hazard pointers (maintained separately) allows deferring reclamation without locks.[1]
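The multi-slot traversal described above can be sketched as follows. This is a minimal, single-threaded C++ illustration with illustrative names; a production implementation must also re-validate each link after publishing, as discussed earlier.

```cpp
#include <atomic>
#include <cassert>

struct Node { int key; std::atomic<Node*> next{nullptr}; };

thread_local std::atomic<Node*> hp[2] = {};  // two hazard slots per thread

// Hand-over-hand protection: slot (i % 2) covers the node about to be
// dereferenced, so the current node and its predecessor are always
// protected at the same time during the walk.
Node* find(std::atomic<Node*>& head, int key) {
    int i = 0;
    Node* cur = head.load();
    while (cur != nullptr) {
        hp[i % 2].store(cur);    // protect cur before touching it
        if (cur->key == key) return cur;
        cur = cur->next.load();  // predecessor stays protected in the other slot
        ++i;
    }
    return nullptr;
}
```

Alternating between two slots is what lets a thread follow an unbounded chain of links while never holding more than a constant number of hazard pointers.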
Retiring and reclaiming nodes
In the hazard pointer methodology, when a thread logically removes a node from a concurrent data structure, it does not immediately deallocate the memory to avoid use-after-free errors from concurrent accesses. Instead, the thread retires the node by appending it to a thread-local list of retired nodes, often denoted as rlist, which accumulates potentially reusable objects.[7]
To bound the memory overhead and trigger periodic reclamation, each thread monitors the size of its rlist against a threshold R chosen so that R = H + \Omega(H) (for example, R = 2H), where H is the maximum number of hazard pointers that can be active across all threads (typically H = N \times K, with N threads and K pointers per thread). When the local retired count reaches or exceeds R, the thread invokes a scan operation to identify safe-to-reclaim nodes. This threshold ensures that reclamation efforts occur infrequently enough to amortize costs while preventing unbounded list growth.[7]
The scanning algorithm proceeds in two stages to determine which retired nodes can be safely reclaimed. In the first stage, the thread collects all currently active hazard pointers by iterating over the shared array of hazard pointer records, inserting non-null values into a local list or hash table (denoted plist) for efficient lookup. This captures pointers to objects that other threads may still be accessing. In the second stage, the thread examines each node in its rlist and checks if it matches any entry in plist; nodes without a match are deemed unreferenced and eligible for reuse or deallocation, after which they are removed from rlist. The process relies on the atomicity of hazard pointer updates to guarantee that no thread can access a reclaimed node.[7]
The retirement and scanning processes are implemented via simple routines, as shown in the following pseudocode from the original formulation:
RetireNode routine:
RetireNode(node)
    push node onto rlist     // insert node into the thread-local retired list
    rcount = rcount + 1      // increment the retired count
    if (rcount >= R)         // if threshold R is reached
        Scan()               // scan hazard pointers and reclaim what is safe
Scan routine:
Scan()
    // Stage 1: collect all non-null hazard pointers into a private list
    plist = empty
    for each hazard pointer record hp in the HP list
        if (hp != null)
            insert hp into plist
    // Stage 2: reclaim retired nodes not found in plist
    for each node n in rlist
        if (lookup(n, plist) == failure)  // n not referenced by any hazard pointer
            PrepareForReuse(n)            // reclaim node
            remove n from rlist
    rcount = length(rlist)                // update retired count
These routines use a hash table or sorted array for plist lookups to achieve efficient matching, with the PrepareForReuse step handling actual memory deallocation or recycling.[7]
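The two-stage Scan above can be sketched in C++ using std::unordered_set as the plist for O(1) expected lookups. This is a simplified sketch: the fixed-size g_hazards array stands in for the global list of hazard pointer records, and the retired list is passed in explicitly rather than kept thread-locally.

```cpp
#include <atomic>
#include <cassert>
#include <unordered_set>
#include <vector>

struct Node { int data; };

std::vector<std::atomic<Node*>> g_hazards(8);  // all threads' HP slots (sketch)

// Stage 1: snapshot the non-null hazard pointers; Stage 2: free retired
// nodes that no snapshot entry protects. Nodes still protected stay in
// rlist for a later scan.
std::vector<Node*> scan(std::vector<Node*>& rlist) {
    std::unordered_set<Node*> plist;           // Stage 1
    for (auto& hp : g_hazards)
        if (Node* p = hp.load()) plist.insert(p);

    std::vector<Node*> still_retired;          // Stage 2
    for (Node* n : rlist) {
        if (plist.count(n) == 0)
            delete n;                          // PrepareForReuse
        else
            still_retired.push_back(n);        // still protected: keep
    }
    rlist = still_retired;
    return rlist;
}
```

Because the snapshot is taken once per scan, a node that becomes protected after the snapshot cannot be in rlist: the hazard-pointer protocol's validation step guarantees that only reachable nodes can be newly protected, and retired nodes are already unreachable.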
Analysis
The operations for setting and clearing hazard pointers are single-word atomic writes—no compare-and-swap is required—giving constant O(1) time per operation, though a memory fence may be needed after publication to make the write visible before subsequent validating reads.[7] This efficiency stems from relying only on atomic reads and writes, without complex synchronization primitives beyond standard hardware support.[7]
The scanning process, which checks retired nodes against active hazard pointers to determine safe reclamation, has O(H) time complexity per scan, where H is the total number of hazard pointers in the system (typically bounded by the number of threads times the maximum hazard pointers per thread).[7] Scans are triggered only when the number of retired nodes reaches a threshold R, set to Θ(H) to balance frequency and cost, leading to an amortized O(1) expected time per retired node across all retirements.[7]
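The amortization argument can be made concrete with a worked instance (illustrative numbers, not taken from the source):

```latex
% With N = 16 threads and K = 2 slots per thread:
\[
H = NK = 16 \times 2 = 32, \qquad R = 2H = 64 .
\]
% Each scan performs O(H) lookups, but at most H retired nodes can be
% protected by hazard pointers at any moment, so every scan reclaims at
% least R - H = H nodes. The amortized cost is therefore
\[
\frac{O(H)}{R - H} \;=\; \frac{O(H)}{H} \;=\; O(1)
\quad \text{per retired node.}
\]
```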
Hazard pointers provide wait-free progress guarantees for the memory reclamation process, meaning each thread completes its reclamation operations in a bounded number of steps independent of the actions or delays of other threads.[7] Performance is influenced by the number of threads, which directly scales H and thus the scan overhead, as well as the depth of the underlying data structure, which affects the overall traversal time during which hazard pointers must be maintained.[7]
Space bounds
Hazard pointers impose a modest per-thread space overhead, consisting of typically 1 to 2 hazard pointers for protecting objects and a local retired list that accumulates up to R retired nodes before triggering a scan.[7] This local list serves as a buffer for nodes pending reclamation, ensuring that threads manage their own retirements without immediate global coordination.[7]
Globally, the number of unreclaimed nodes is bounded by at most H + N \times R, where H denotes the total number of hazard pointers across all threads and N is the number of threads; in practice, this yields a worst-case space complexity of O(H^2).[10] The retirement threshold is selected as R = H + \Omega(H) (for example, R = 2H) to balance the frequency of costly scans against accumulated space usage, maintaining constant amortized time per retired node while keeping memory overhead manageable.[7]
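For a concrete sense of the bound (illustrative numbers):

```latex
% With N = 16 threads and K = 2 slots per thread, H = NK = 32.
% Taking R = 2H = 64, each thread's retired list holds fewer than R
% nodes, so the retired-but-unreclaimed total is at most
\[
N \times R \;=\; 16 \times 64 \;=\; 1024 \;=\; O(H^2)
\]
% nodes at any time, a fixed bound that holds no matter how long other
% threads are delayed or descheduled.
```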
Unlike certain bounded reclamation schemes that retain memory within the application indefinitely, hazard pointers permit the actual return of reclaimed nodes to the operating system, facilitating efficient long-term memory utilization.[7]
Advantages and limitations
Benefits over alternatives
Hazard pointers offer several advantages over reference counting techniques for memory reclamation in lock-free data structures. Unlike lock-free reference counting, which requires frequent atomic increments and decrements on per-object counters—often leading to contention and inefficiency—hazard pointers avoid these operations entirely by having threads announce their interest in specific objects via hazard pointers, enabling safe reclamation without modifying object metadata during access.[7] This approach also implicitly handles cyclic references, which can cause leaks or require complex cycle detection in reference counting systems, as hazard pointers rely solely on pointer announcements rather than count-based tracking.[7] Experimental evaluations demonstrate that hazard pointers achieve significantly higher throughput than reference counting; for instance, in a concurrent hash table benchmark on four processors, hazard pointers delivered up to 573% better performance under moderate load factors.[7]
Compared to epoch-based reclamation (EBR) methods, hazard pointers provide stronger progress guarantees without the need for global synchronization mechanisms like epochs or quiescent states. EBR requires threads to periodically enter quiescent periods or synchronize on global epochs to ensure safe reclamation, which can introduce blocking delays, especially in the presence of thread failures or scheduling variations, and may lead to unbounded growth in unreclaimed objects if threads lag.[7] In contrast, hazard pointers are fully lock-free, bounding the number of unreclaimed nodes independently of thread behavior and allowing immediate reuse once no hazards point to an object, thus avoiding global coordination overhead. This fine-grained protection per object, rather than coarse-grained epoch reservations, ensures predictable memory usage even under high contention or irregular thread participation.[11] Their inclusion in C++26 further enhances portability by providing a standardized interface.[2]
Relative to Read-Copy-Update (RCU), hazard pointers exhibit lower latency for write operations and greater flexibility for arbitrary lock-free structures. RCU excels in read-mostly scenarios by minimizing reader overhead through grace periods but imposes read-side critical sections and global synchronization for writers, delaying reclamation until all readers complete their phases, which can increase writer latency in mixed workloads. Hazard pointers, by contrast, enable writers to retire and scan for reclaimable nodes locally without waiting for global quiescence, making them suitable for structures lacking natural read-side barriers and providing consistent low-latency updates across all operations. This applicability extends to fully concurrent environments where RCU's assumptions about read-heavy access patterns do not hold.
In general, hazard pointers are highly portable, relying only on standard single-word atomic reads and writes without requiring specialized hardware like double-wide compare-and-swap (double-CAS) instructions, which are unavailable on many architectures.[7] They also resolve the ABA problem inherent in lock-free algorithms—where a reused pointer could lead to incorrect operations—without needing pointer-counter pairs or double-CAS, achieving the same prevention efficacy as garbage collection through direct pointer protection.[7] These properties make hazard pointers particularly efficient for moderate thread counts, where their per-thread announcement overhead scales linearly while delivering bounded space usage superior to epoch-based alternatives.[7]
Drawbacks and overheads
Hazard pointers, while effective for safe memory reclamation in lock-free data structures, exhibit scalability limitations primarily due to the reclamation scan process. Each thread retiring a node must scan all active hazard pointers to determine if the node is safe to reclaim, incurring a time complexity of O(H), where H is the total number of hazard pointers across all threads.[5] Since H scales linearly with the number of threads P and the maximum number of pointers protected per thread K (i.e., H = Θ(PK)), this scan time can become significant in highly concurrent environments with many threads. Additionally, the number of unreclaimed records waiting to be freed can reach O(P^2 K) in the worst case.[12]
Integrating hazard pointers into existing concurrent algorithms requires manual redesign, as developers must explicitly insert publications of hazard pointers at every point where a node might be accessed, ensuring protection before traversal or modification. This process is non-trivial and can be incompatible with certain lock-free data structures, such as those using marked nodes or helping mechanisms, potentially necessitating operation restarts that undermine lock-free progress guarantees.[12]
The technique also introduces space overhead through temporary retention of retired nodes. Threads accumulate retired nodes in private lists and only reclaim them after a successful scan confirms no active hazard pointers reference them, leading to bounded but potentially large buffers of unreclaimed memory that grow with contention or delayed scans.[12] This waste is exacerbated in scenarios with high deletion rates, where nodes remain allocated longer than necessary until reclamation thresholds are met.
Frequent atomic operations on the shared array of hazard pointers contribute to cache contention and performance degradation. Publishing a hazard pointer requires an atomic write, often followed by a memory barrier on architectures like x86 to ensure visibility, while validating during scans involves atomic reads across the array; these operations, repeated for every protected access, induce inter-thread cache invalidations and coherence traffic, particularly under high concurrency.[12]
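The store/fence requirement can be illustrated in C++. On x86, an ordinary release store of the hazard pointer may be reordered after the subsequent validating load, so the publish needs sequentially consistent ordering or an explicit full fence. The following is a sketch with illustrative names, showing both equivalent spellings:

```cpp
#include <atomic>
#include <cassert>

struct Node { int data; };

std::atomic<Node*> g_src{nullptr};
std::atomic<Node*> g_hp{nullptr};

// Variant 1: a sequentially consistent store acts as the full barrier
// between publishing the hazard pointer and re-reading the source.
Node* protect_seq_cst(std::atomic<Node*>& src, std::atomic<Node*>& hp) {
    Node* t = src.load(std::memory_order_acquire);
    for (;;) {
        hp.store(t, std::memory_order_seq_cst);   // publish + full barrier
        Node* t2 = src.load(std::memory_order_acquire);
        if (t2 == t) return t;                    // validated: protected
        t = t2;
    }
}

// Variant 2: a relaxed store followed by an explicit seq_cst fence,
// which some implementations prefer for finer control.
Node* protect_fenced(std::atomic<Node*>& src, std::atomic<Node*>& hp) {
    Node* t = src.load(std::memory_order_acquire);
    for (;;) {
        hp.store(t, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        Node* t2 = src.load(std::memory_order_acquire);
        if (t2 == t) return t;
        t = t2;
    }
}
```

It is precisely this per-access barrier, repeated on every protected read, that generates the coherence traffic described above.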
Applications and implementations
Use in concurrent data structures
Hazard pointers are widely applied in the construction of lock-free concurrent data structures to ensure safe memory reclamation during operations that involve pointer traversals and node retirements.[13] In these structures, threads use hazard pointers to protect nodes that may be accessed concurrently, preventing premature deallocation while allowing efficient reuse of memory without locks or garbage collection.[13]
A prominent example is the Michael-Scott queue, a first-in-first-out (FIFO) structure where hazard pointers are integrated to safeguard nodes during enqueue and dequeue operations.[13] During traversal of the queue's linked list—such as following next pointers to locate the tail—threads publish hazard pointers to the nodes being read, ensuring that any concurrently retired nodes remain protected until the scan confirms they are no longer in use.[13] Similarly, in a lock-free stack (LIFO), hazard pointers protect the top node during push and pop operations, where threads announce the current top pointer before attempting to modify it.[13] For unordered hash tables, hazard pointers enable lock-free insertions, deletions, and lookups by protecting bucket chains and key-value nodes during traversals, allowing multiple threads to safely navigate and modify the table without blocking.[13]
The adaptation of hazard pointers to these data structures typically involves inserting hazard pointer announcements at key points in traversal algorithms, such as before dereferencing next or child pointers in linked lists or trees.[13] This process ensures that any node read during the operation is marked as hazardous, blocking its reclamation until the thread completes its scan of the global hazard pointer list.[13] For instance, in a queue dequeue, a thread might set a hazard pointer to the head node, follow the next pointer while protecting intermediate nodes, and only retire the head after verifying no other thread holds a hazard pointer to it.[13]
A detailed case study illustrates this in the pop operation of a lock-free stack. The thread first reads the current top node and publishes it as a hazard pointer to protect against concurrent pops that might retire it. If the pop succeeds by atomically updating the top to the next node, the original top is retired but not immediately reclaimed; instead, it joins a local retire list, and a subsequent scan checks if any active hazard pointer references it, delaying reclamation if necessary. This mechanism ensures linearizability and prevents use-after-free errors even under high contention.[13]
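The pop path described above can be sketched as a Treiber-style stack with a single hazard slot per thread. This is a minimal illustration with illustrative names: retired nodes accumulate in a local list and would be reclaimed by a scan as in the earlier sections, which is omitted here.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

struct Node { int value; Node* next; };

std::atomic<Node*> g_top{nullptr};
thread_local std::atomic<Node*> t_hp{nullptr};   // this thread's hazard slot
thread_local std::vector<Node*> t_retired;       // popped nodes awaiting a scan

void push(int v) {
    Node* n = new Node{v, g_top.load()};
    // Classic CAS loop: on failure, n->next is refreshed to the current top.
    while (!g_top.compare_exchange_weak(n->next, n)) {}
}

bool pop(int& out) {
    for (;;) {
        Node* t = g_top.load();
        if (t == nullptr) return false;          // empty stack
        t_hp.store(t);                           // publish hazard on the top node
        if (g_top.load() != t) continue;         // validate, else retry
        if (g_top.compare_exchange_strong(t, t->next)) {  // t->next is safe: t is protected
            out = t->value;
            t_hp.store(nullptr);                 // done accessing t
            t_retired.push_back(t);              // retire; do NOT delete yet
            return true;
        }
    }
}
```

Note that t->next is only dereferenced after the validation step succeeds, which is what rules out the use-after-free that an unprotected pop would risk under concurrent retirement.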
Extensions of hazard pointers support multiple domains, where distinct sets of hazard pointers can be maintained for different execution contexts, such as separate thread groups or isolated subsystems.[13] Additionally, assigning one domain per data structure enhances isolation, allowing hazard scans to be confined to relevant pointers and reducing overhead in multi-structure environments by preventing cross-contamination of protections.[13]
They have also been adopted in production libraries, such as Facebook's Folly, for implementing lock-free concurrent hash maps and sets.[2]
Standardization in C++
The standardization of hazard pointers in C++ originated with proposal P0233R6 in 2017, which introduced the core concepts for integrating this safe memory reclamation technique into the language standard. This initial paper focused on providing a foundation for lock-free concurrent programming by defining hazard pointers as a mechanism to protect dynamically allocated objects from premature reclamation. Over the following years, the proposal evolved through iterative revisions, incorporating feedback from the C++ standards committee to refine the interface and ensure compatibility with existing atomic operations. The matured version, P2530R3 from 2023, was ultimately accepted for inclusion in C++26, marking a significant step toward standardizing non-blocking memory management tools.[14][2]
In C++26, hazard pointers are implemented via the <hazard_pointer> header, featuring std::hazard_pointer as the primary type, which operates as a single-writer, multi-reader pointer owned by at most one thread at a time. This design supports multiple domains to segregate reclamation activities across different parts of an application, preventing interference between concurrent components. Key elements include the std::hazard_pointer_obj_base<T> base class for objects, with its retire() method to defer reclamation until safe; the library handles scanning of hazard pointers internally to identify reclaimable nodes. These facilities enable efficient, thread-safe memory management without relying on garbage collection or coarse-grained locking.[2]
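Usage under the P2530R3 interface is expected to look roughly like the following sketch. It is shown for orientation only and will not compile with current toolchains, since no mainstream standard library ships <hazard_pointer> yet; the names follow the proposal as described above.

```cpp
#include <atomic>
#include <hazard_pointer>  // C++26; not yet available in shipping toolchains
#include <string>

// A reclaimable type derives from hazard_pointer_obj_base.
struct Name : std::hazard_pointer_obj_base<Name> {
    std::string value;
};

std::atomic<Name*> current_name{nullptr};

void reader() {
    std::hazard_pointer hp = std::make_hazard_pointer();
    Name* p = hp.protect(current_name);  // publish-and-validate in one call
    if (p) {
        // safely read p->value; p cannot be reclaimed while protected
    }
}   // hp's destructor clears the protection

void writer(Name* fresh) {
    Name* old = current_name.exchange(fresh);
    if (old) old->retire();  // deferred reclamation once no hazard pointer covers it
}
```

The protect() call encapsulates the publish/validate retry loop that hand-written implementations must code manually, and retire() replaces the explicit retired-list and scan bookkeeping.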
The inclusion of hazard pointers in C++26 offers developers portable, high-performance implementations that abstract away the complexities of manual hazard pointer coding, reducing the risk of subtle concurrency bugs in lock-free data structures. By leveraging compiler and library optimizations, users can achieve scalable memory reclamation tailored to multicore systems without vendor-specific extensions. As of November 2025, the C++26 feature set has been frozen with hazard pointers included, and implementations are in development for major standard library distributions such as libc++ and libstdc++.[2][15][16]