
Compare-and-swap

Compare-and-swap (CAS), also known as compare-and-exchange, is an atomic hardware instruction used in concurrent programming to safely update shared memory locations in multiprocessor systems. It operates by comparing the value currently stored at a specified memory address with an expected value provided as input; if the two values match, it atomically replaces the memory location's contents with a new value, also provided as input, and returns success, while if they do not match, it leaves the memory unchanged and typically returns failure. This operation ensures indivisibility, preventing race conditions where multiple threads might interfere with each other's updates to the same variable. First introduced by IBM in the System/370 architecture in 1970, CAS provided a foundational synchronization primitive for operating systems like OS/360 and early UNIX ports, enabling efficient synchronization without traditional locks in certain scenarios. Over time, CAS has become a standard feature in modern processor architectures, including x86 (via the CMPXCHG instruction), ARM (via CAS and CASP), and others, supporting both single-word and double-word variants for 32-bit and 64-bit operations. Its semantics are commonly represented in pseudocode as:
bool CAS(address, expected, new_value) {
    if (*address == expected) {
        *address = new_value;
        return true;
    }
    return false;
}
This simplicity belies its power in enabling non-blocking algorithms, where threads retry operations on failure rather than blocking, improving scalability in high-contention environments. In the theory of concurrent computing, Maurice Herlihy's 1991 work demonstrated CAS's universality for wait-free synchronization, proving it sufficient to implement any shared object in a wait-free manner—guaranteeing progress for each thread independent of others—unlike weaker primitives like read-write registers or fetch-and-add. This result has profoundly influenced the design of lock-free data structures, such as queues, stacks, and hash tables. Extensions like multi-word CAS (MCAS) address limitations in updating multiple disjoint locations atomically, further expanding its applicability in advanced concurrent algorithms. Despite its benefits, CAS can suffer from the ABA problem, where a value temporarily changes and reverts, leading to incorrect updates, often mitigated by hazard pointers or version tags.

Fundamentals

Definition

Compare-and-swap (CAS) is an atomic primitive implemented as a single, indivisible CPU instruction or software-emulated operation that reads the current value from a specified memory location, compares it to an expected value, and conditionally stores a new value only if the current value matches the expected one. This ensures the entire read-compare-write sequence occurs without interference from other threads, maintaining data consistency in concurrent environments. The operation requires three primary parameters: a pointer to the target memory location, the expected value (representing the anticipated current state), and the desired new value to write if the comparison succeeds. Upon execution, CAS returns a boolean value indicating success—true if the comparison matched and the write occurred, or false if the memory value had changed in the interim. In multithreaded contexts, CAS facilitates non-blocking (lock-free) concurrency by supporting optimistic update strategies, where threads attempt modifications assuming no conflicts and retry only upon failure, avoiding the overhead and potential deadlocks of traditional locks.
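These semantics can be observed directly with C11's <stdatomic.h>; the following minimal sketch (the wrapper name try_cas is ours, not a standard function) shows the three observable outcomes: success, failure, and the update of the expected value on failure.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Attempt one CAS on *target. Atomically: if *target == *expected,
   store desired and return true; otherwise copy the current value of
   *target into *expected and return false. */
bool try_cas(_Atomic int *target, int *expected, int desired) {
    return atomic_compare_exchange_strong(target, expected, desired);
}
```

On failure the caller's expected value is overwritten with the value actually found, which is what makes retry loops cheap: no separate reload is needed before the next attempt.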

Pseudocode and Operation

The compare-and-swap (CAS) operation performs an atomic read-modify-write on a memory location, proceeding in the following logical steps: first, it loads the current value from the specified address; second, it compares this value to an expected value provided as input; third, if the values match, it stores a new value at the address and indicates success; fourth, if they do not match, it leaves the memory unchanged and indicates failure. The following pseudocode illustrates the basic logic of CAS, where the operation takes an address, an expected value, and a new value as parameters:
function compareAndSwap(address, expected, newValue) returns boolean
    current ← load(address)  // Read current value from memory
    if current == expected then
        store(address, newValue)  // Write new value if match
        return true
    else
        return false
    end if
end function
In actual implementations, the entire sequence—reading, comparing, and conditionally writing—must execute atomically to ensure indivisibility and prevent concurrent interference. A non-atomic software emulation of this logic, such as the pseudocode above without hardware support, risks race conditions where concurrent threads could interleave reads and writes, leading to inconsistent states; achieving atomicity in software typically requires mechanisms like disabling interrupts, using memory barriers or fences, or wrapping the sequence in locks, whereas hardware implementations (e.g., via dedicated instructions) guarantee indivisibility at the instruction level. The operation returns a boolean value indicating whether the swap succeeded (true if the value matched and was replaced, false otherwise), which enables algorithms to detect failures and retry in a loop until success, supporting non-blocking progress in concurrent settings.

Applications

Lock-Free Data Structures

Compare-and-swap (CAS) enables the construction of lock-free data structures by providing an atomic primitive for optimistic concurrency control, where threads read shared state, compute updates assuming no interference, and use CAS to install changes only if the state remains unmodified. This approach contrasts with lock-based synchronization, such as mutexes, which serializes access and can lead to deadlocks or priority inversion under contention, whereas lock-free methods ensure that at least one thread makes progress without blocking others. In lock-free designs, failed CAS attempts trigger retries, promoting non-blocking progress in multiprocessor environments. Common lock-free data structures leverage CAS for pointer manipulations in linked representations. For instance, Treiber's stack implements push and pop operations by atomically swapping the top pointer: a thread reads the current top, links a new node to it for push, or unlinks it for pop, and retries via CAS if concurrent modifications occur. Similarly, the Michael-Scott queue uses CAS to update head and tail pointers in a singly-linked list, enabling lock-free enqueue and dequeue by swinging pointers and handling races through repeated attempts. Hash tables, such as those built on Harris's design, employ CAS for inserting or removing elements in per-bucket lock-free linked lists, maintaining consistency without global locks. These structures offer scalability benefits on multi-core systems by minimizing contention; unlike coarse-grained locks that bottleneck at shared resources, CAS-based retries distribute load across cores, yielding higher throughput under high parallelism, as demonstrated in benchmarks where throughput scales near-linearly with core count. Lock-freedom specifically guarantees system-wide progress—ensuring some thread completes its operation in finite steps amid contention—providing stronger assurances than obstruction-free methods, which only require progress in the absence of contention.
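Treiber's stack can be sketched in C11 atomics as follows. This is a simplified illustration: popped nodes are not reclaimed (safe reclamation needs hazard pointers or similar), and the ABA problem discussed later is ignored.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

/* Top-of-stack pointer shared by all threads. */
static _Atomic(struct node *) top = NULL;

/* Push: link the new node above the observed top, then CAS the top
   pointer over to it; retry if another thread moved the top first. */
void push(int value) {
    struct node *n = malloc(sizeof *n);
    n->value = value;
    n->next = atomic_load(&top);
    while (!atomic_compare_exchange_weak(&top, &n->next, n))
        ;  /* on failure, n->next now holds the current top; retry */
}

/* Pop: CAS the top pointer past the observed head; retry on races.
   Returns 0 and stores the value on success, -1 if the stack is empty.
   The popped node is intentionally leaked here: freeing it safely in a
   concurrent setting requires a memory-reclamation scheme. */
int pop(int *out) {
    struct node *head = atomic_load(&top);
    while (head != NULL &&
           !atomic_compare_exchange_weak(&top, &head, head->next))
        ;  /* head is refreshed by the failed CAS */
    if (head == NULL)
        return -1;
    *out = head->value;
    return 0;
}
```

Both operations follow the same shape: read the shared pointer, prepare the new link locally, and publish it with a single CAS.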

Example: Atomic Adder

One practical application of compare-and-swap (CAS) is implementing a thread-safe atomic increment on a shared variable, allowing multiple threads to update the counter without using locks and avoiding race conditions. In this scenario, threads concurrently attempt to increment a shared counter, such as one tracking the number of processed items in a multithreaded application; CAS ensures that only one thread's update succeeds at a time, with others retrying transparently. The algorithm operates in a retry loop: a thread first reads the current value of the counter as the expected value, computes the new value by adding the increment (typically 1), and then attempts a CAS operation to replace the counter's value with the new one only if it still matches the expected value. If the CAS succeeds, the update is complete; if it fails—indicating another thread modified the counter in the interim—the thread discards the attempt and retries by reading the updated value. This retry mechanism serializes the increments without blocking, leveraging the atomicity of CAS to detect and resolve conflicts. The following pseudocode illustrates a general atomic add operation, where address points to the shared counter and delta is the increment amount; note that, unlike the boolean form shown earlier, the CAS here returns the value found at the address:
atomic_add(address, delta) {
  do {
    expected = *address;
    new_val = expected + delta;
  } while (CAS(address, expected, new_val) != expected);
  return expected;
}
Here, the CAS primitive returns the original value at address—equal to expected if the swap succeeds, or the differing current value if it fails—with a mismatch prompting another loop iteration. For a simple increment, set delta = 1. This implementation is lock-free, as each thread makes progress independently, though retries may occur under contention. Consider a trace of two threads, Thread A and Thread B, both attempting to increment a shared counter initialized to 0:
  1. Both threads read the counter as expected = 0 and compute new_val = 1.
  2. Thread A performs CAS(counter, 0, 1), which succeeds since the value is still 0; the counter is now 1, and Thread A returns.
  3. Thread B's CAS(counter, 0, 1) fails because the counter is now 1 (not matching expected = 0); Thread B retries.
  4. On retry, Thread B reads expected = 1, computes new_val = 2, and performs CAS(counter, 1, 2), which succeeds; the counter is now 2, and Thread B returns.
This execution ensures the counter reaches 2 atomically, with increments applied in some sequential order despite concurrency. The operation is atomic because the CAS instruction executes indivisibly as a single hardware primitive, guaranteeing that no other thread can observe or interfere with the comparison and write midway; combined with the retry loop, it emulates a sequentially consistent increment equivalent to a locked counter++ operation, preventing lost updates or races.
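The pseudocode above maps closely onto C11 atomics. A sketch using the boolean-returning atomic_compare_exchange_weak (which writes the current value back into expected on failure, so no separate reload is needed):

```c
#include <stdatomic.h>

/* Add delta to *address atomically via a CAS retry loop and return
   the value observed just before the successful update. */
int atomic_add(_Atomic int *address, int delta) {
    int expected = atomic_load(address);
    /* On failure, the CAS stores the current value into 'expected',
       so the loop simply recomputes expected + delta and retries. */
    while (!atomic_compare_exchange_weak(address, &expected,
                                         expected + delta))
        ;
    return expected;  /* prior value, as in the pseudocode */
}
```

In practice C11 also provides atomic_fetch_add, which compiles to a dedicated instruction on many platforms; the CAS loop is shown here because it generalizes to arbitrary read-modify-write updates, not just addition.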

Challenges

ABA Problem

The ABA problem arises in compare-and-swap (CAS) operations when a thread reads a shared value A from a memory location, performs some computation, and later attempts a CAS to replace it with a new value, but another thread has meanwhile modified the location from A to B and then back to A, causing the CAS to succeed incorrectly despite the intervening change. This issue stems from CAS relying solely on direct value comparison without accounting for the history of modifications to the location. It is particularly prevalent in environments with memory reuse, such as garbage-collected languages or systems using object pools, where deallocated memory can be quickly reallocated with the same pointer value. A classic illustration occurs in lock-free linked-list implementations, such as during node deletion. Suppose thread T1 reads a pointer to node N (value A) from the previous node's next field. While T1 prepares to unlink N, thread T2 removes N, freeing it, and possibly another thread reuses the same address for a new node, so the pointer value A reappears. When T1 executes CAS to update the previous node's next field, skipping over the now-reused N, the operation succeeds because the pointer still appears as A, potentially leading to incorrect linking and loss of subsequent nodes. This scenario highlights how pointer-based structures exacerbate the problem, as addresses can cycle through reuse faster than threads complete operations. The impact of the ABA problem includes data corruption, such as lost updates or structural inconsistencies in data structures, and can even induce infinite loops in retry-based algorithms if the false success propagates undetected. To detect and mitigate it, one common approach appends a version counter or tag to the value being compared, ensuring the CAS fails even if the base value reappears; for instance, in a 64-bit word, the low 32 bits might hold the pointer or value, while the high 32 bits store an incrementing counter updated on each modification.
In Java, the AtomicStampedReference class implements this by pairing an object reference with an integer stamp, allowing CAS operations like compareAndSet to validate both the reference and the stamp atomically, thus preventing false successes in scenarios like the linked-list example. This tagged approach maintains the efficiency of single-word CAS while addressing the historical oversight inherent in plain value comparisons.
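The version-tag scheme can be sketched in C by packing a 32-bit value and a 32-bit modification counter into one CAS-able 64-bit atomic word. The layout and the helper names (pack, tagged_cas) are our choices for illustration:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Pack a value and a version tag into one 64-bit word:
   low 32 bits hold the value, high 32 bits an incrementing counter. */
static inline uint64_t pack(uint32_t value, uint32_t tag) {
    return ((uint64_t)tag << 32) | value;
}

/* Tagged CAS: every successful update bumps the tag, so a later CAS
   against a stale snapshot fails even if the value component has
   cycled back (A -> B -> A). Returns nonzero on success. */
int tagged_cas(_Atomic uint64_t *loc, uint64_t snapshot, uint32_t new_value) {
    uint64_t desired = pack(new_value, (uint32_t)(snapshot >> 32) + 1);
    return atomic_compare_exchange_strong(loc, &snapshot, desired);
}
```

Because the tag differs after any intervening modification, a stale snapshot can never match the stored word, which is exactly the property plain value comparison lacks.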

Performance Trade-offs

Compare-and-swap (CAS) operations provide significant benefits in concurrent environments, particularly for uncontended access, where they execute as a single instruction with low latency, typically around 4-5 CPU cycles on modern processors. This efficiency stems from avoiding the overhead of acquiring and releasing locks, enabling better scalability on multi-core systems by permitting concurrent progress without serializing threads. In read-heavy workloads, CAS-based algorithms often outperform traditional locking mechanisms, as they minimize blocking and support higher throughput under low conflict. However, under high contention, CAS incurs substantial costs due to the need for retry loops, where failed attempts require reloading values and spinning, which can consume hundreds of CPU cycles per operation—up to 200 cycles or more for remote accesses on multi-socket (NUMA) architectures. This spin-waiting exacerbates cache line bouncing in multiprocessor systems, as frequent invalidations and coherency traffic degrade performance, especially in write-heavy scenarios where correctness guarantees demand strict atomicity. Benchmarks indicate that at moderate contention levels (e.g., around a 50% success rate), CAS retries can waste 10-100 times more cycles than a successful lock acquisition, depending on the architecture and workload. Performance varies across architectures; for instance, 2020s benchmarks on recent AArch64 processors show CAS exhibiting quadratic latency scaling under high thread counts due to contention, often performing worse than x86 equivalents in such cases, though load-link/store-conditional alternatives can mitigate this on ARM. CAS is ideal for low-conflict operations, such as counters or queues with infrequent updates, but developers should fall back to locks in high-contention environments to avoid excessive retry overhead. In write-heavy workloads, the stronger consistency provided by CAS can lead to slower overall performance compared to coarser-grained locks, as retries enforce atomicity at the expense of throughput.
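A common mitigation for retry storms under contention is bounded exponential backoff between failed CAS attempts. The following sketch illustrates the idea; the busy-wait mechanism and the cap of 1024 iterations are arbitrary illustrative choices, and a production implementation would typically use a CPU pause hint instead of a bare spin.

```c
#include <stdatomic.h>

/* CAS increment with bounded exponential backoff: after each failed
   attempt, spin for a doubling (capped) number of iterations before
   retrying, reducing coherence traffic under contention. */
void add_with_backoff(_Atomic int *address, int delta) {
    int expected = atomic_load(address);
    unsigned delay = 1;
    while (!atomic_compare_exchange_weak(address, &expected,
                                         expected + delta)) {
        for (volatile unsigned i = 0; i < delay; i++)
            ;                 /* busy-wait between attempts */
        if (delay < 1024)
            delay *= 2;       /* grow the backoff window, capped */
    }
}
```

Backoff trades a little latency on the failing thread for reduced cache-line traffic overall, which often raises total throughput when many cores target the same word.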

Implementations

In C and related languages, compare-and-swap (CAS) is implemented through standardized atomic libraries and compiler-specific intrinsics, enabling portable lock-free programming across platforms. The C11 standard introduces the <stdatomic.h> header, which defines atomic types via the _Atomic qualifier and provides generic atomic operations, including atomic_compare_exchange_strong and atomic_compare_exchange_weak for portable CAS on atomic types. These functions atomically load the value from an atomic object, compare it to an expected value, and—if they match—store a desired value while returning true; otherwise, they store the current value into the expected pointer and return false. A representative example is an increment of a shared counter using the weak variant atomic_compare_exchange_weak, which may spuriously fail for performance reasons and so must be used in a retry loop:
#include <stdatomic.h>

_Atomic int counter = 0;

void increment(void) {
    int expected = atomic_load(&counter);
    // On failure the call writes the current value back into expected,
    // so the loop can retry without a separate reload.
    while (!atomic_compare_exchange_weak(&counter, &expected, expected + 1))
        ;  // retry until the increment is installed
}
This pattern ensures progress despite contention or spurious failures, with the weak form often compiling to more efficient instructions on hardware supporting it. In C++, the <atomic> header extends this with std::atomic<T>, offering compare_exchange_strong (guaranteed no spurious failure) and compare_exchange_weak (allows spurious failure for potential speedups in retry loops, such as lock-free queues). The weak variant is recommended for performance-critical loops where retries are expected anyway, as it avoids the internal loop the strong variant needs to hide spurious failures on load-link/store-conditional platforms such as ARM; on x86, the two typically compile to the same CMPXCHG-based code. For platform dependencies, GCC's legacy __sync_val_compare_and_swap intrinsic (superseded since GCC 4.8 by the __atomic builtins) provided a simple CAS but lacked memory order control; it is replaced by __atomic_compare_exchange_n, which supports weak and strong semantics and explicit ordering. On Microsoft Visual C++, InterlockedCompareExchange delivers a 32-bit CAS with implicit full barriers, suitable for Windows threading. Best practices emphasize choosing memory orderings deliberately: use memory_order_acquire on loads and memory_order_release on successful CAS stores to synchronize dependent operations between threads without the cost of memory_order_seq_cst full barriers, which impose a total ordering. Pitfalls include retry loops that ignore the updated expected value after a failed CAS, and overusing the strong variant in loops, which can force redundant internal retries on LL/SC hardware.
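The acquire/release discipline described above looks like the following with the C11 _explicit variants. The ordering choices shown are one reasonable pattern for a simple CAS-based lock flag, not the only correct one:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Try to acquire a simple CAS-based lock flag: acquire ordering on
   success so later reads/writes cannot be reordered before the lock,
   relaxed ordering on failure where no synchronization is needed. */
bool try_lock(_Atomic int *flag) {
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        flag, &expected, 1,
        memory_order_acquire,    /* success ordering */
        memory_order_relaxed);   /* failure ordering */
}

/* Release the lock, publishing all prior writes to the next acquirer. */
void unlock(_Atomic int *flag) {
    atomic_store_explicit(flag, 0, memory_order_release);
}
```

The acquire/release pair establishes a happens-before edge from the unlocking thread to the next locking thread, without imposing the global ordering of seq_cst on every operation.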

Hardware-Level Support

The compare-and-swap (CAS) operation is natively supported in the x86 architecture through the CMPXCHG instruction, which performs an atomic compare-and-exchange on byte, word, or doubleword operands, introduced in the Intel 80486 processor in 1989. This instruction compares the value in the accumulator (AL, AX, EAX, or RAX depending on operand size) with a memory operand and, if equal, replaces it with a source value while setting the zero flag accordingly; a LOCK prefix ensures atomicity across multi-core systems. Extensions followed with CMPXCHG8B for 64-bit operations in the original Pentium processor in 1993, enabling atomic swaps on quadwords essential for 64-bit concurrency primitives, and CMPXCHG16B in the x86-64 extensions (originally specified by AMD in 2000), supporting 128-bit atomic operations for advanced synchronization in modern 64-bit environments, though the instruction was absent from the earliest AMD64 processors. In the ARM architecture, direct single-word CAS was long not provided as a primitive instruction; instead, the Load-Exclusive (LDREX or LDXR) and Store-Exclusive (STREX or STXR) pair, introduced in ARMv6, emulates CAS through a reservation-based mechanism that detects intervening modifications between load and store. True hardware CAS support arrived with the ARMv8.1 Large System Extensions, which added the CAS family of instructions along with CASP for paired-register compare-and-swap on doublewords, facilitating lock-free algorithms without exclusive monitors. Other architectures provide analogous hardware support: PowerPC has employed the Load Word And Reserve Indexed (lwarx) and Store Word Conditional Indexed (stwcx.) instructions since the PowerPC 601 in 1993 to implement CAS via reservation, ensuring atomicity for 32- and 64-bit operations in a manner similar to ARM's exclusive access.
RISC-V's base "A" (atomic) extension uses Load-Reserved/Store-Conditional (LR/SC) loops for CAS emulation, with AMO atomic read-modify-write instructions providing related primitives; the Zacas extension, ratified in November 2023, introduces direct amocas.w, amocas.d, and amocas.q instructions for explicit 32-, 64-, and 128-bit CAS to reduce loop overhead in software. By the early 2000s, hardware CAS or equivalent support had become ubiquitous in 64-bit CPUs across x86, ARM, and PowerPC, driven by the rise of multi-core processors and the need for scalable concurrency, and such support was later carried into emerging ISAs like RISC-V. Hardware atomicity for CAS is enforced through mechanisms like bus locking in early designs or, more efficiently, cache coherence protocols such as MESI (Modified, Exclusive, Shared, Invalid), which ensure a core gains exclusive ownership of the cache line before performing the read-compare-write sequence, preventing interference from other cores. Under MESI, the atomic operation transitions the line to the Modified state, serializing access via snooping or directory-based coherence to guarantee visibility and consistency across cores without full bus locking in modern implementations. Post-2020 developments, such as RISC-V's Zacas, address gaps in direct CAS support while maintaining compatibility with existing coherence models.

Extensions

Load-Link/Store-Conditional

Load-link/store-conditional (LL/SC) is a synchronization primitive consisting of two instructions designed to enable atomic read-modify-write operations in load-store architectures. The load-link (LL) instruction retrieves the current value from a specified memory address into a register while establishing a hardware reservation on that address, marking it for exclusive monitoring. The subsequent store-conditional (SC) instruction attempts to write a new value from a register back to the same address, succeeding and updating the memory only if no other processor or thread has performed a store to that address since the LL; in case of interference, the SC does not modify memory and typically returns a failure indicator to the issuing processor, prompting a retry. This mechanism relies on hardware enforcement of the reservation, often through cache coherence protocols or dedicated monitors, ensuring atomicity without locking the entire system. LL/SC provides functional equivalence to the CAS operation, allowing CAS to be implemented using a retry loop: an LL loads the current value, which is compared locally against the expected value to determine whether to apply the update, followed by an SC to install the change if the reservation remains valid. If the SC fails, the loop reloads via LL and retries, ensuring the update occurs atomically relative to the original load. Unlike CAS, which cannot detect an intervening change when the value has returned to the expected one, LL/SC inherently mitigates the ABA problem by failing the SC upon any intervening store to the address, irrespective of whether the value has reverted to its prior state. This detection of modifications at the address level, rather than the value level, makes LL/SC particularly robust for pointer-based structures where value recycling could otherwise lead to incorrect behavior. Several architectures provide native LL/SC support, including MIPS with its LL and SC instructions for 32- or 64-bit words, ARM through variants like Load-Exclusive (LDREX or LDAXR) and Store-Exclusive (STREX or STLXR), and PowerPC via Load Word And Reserve Indexed (lwarx) and Store Word Conditional Indexed (stwcx.).
These instructions are typically restricted to cached, aligned memory accesses to leverage coherence hardware efficiently. In software ecosystems, LL/SC underpins atomic operations in the Java platform's sun.misc.Unsafe class, which emulates CAS on LL/SC-capable architectures like ARM and PowerPC by wrapping the instructions in retry loops for portable lock-free programming. The Linux kernel similarly employs LL/SC on ARM and PowerPC to realize generic atomic primitives, such as atomic increments and exchanges, ensuring cross-architecture consistency in concurrent code. Compared to CAS, LL/SC offers advantages in ABA avoidance and alignment with RISC principles by separating the load and store phases, potentially enabling more flexible optimizations. However, it introduces disadvantages, including overhead from reservation tracking—such as per-core exclusive monitors or reservation bits in caches—that can consume additional hardware resources. Additionally, LL/SC loops risk livelock from spurious SC failures, often triggered by context switches, cache evictions, or other events that invalidate the reservation without an actual conflicting store, necessitating backoff strategies for progress guarantees. Historically, LL/SC originated as a proposal by Jensen, Hagensen, and Broughton for the S-1 AAP multiprocessor project at Lawrence Livermore National Laboratory in the 1980s, serving as an early foundation for non-blocking synchronization in shared-memory systems and influencing subsequent RISC designs before widespread adoption of single-instruction CAS primitives.
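A CAS built from an LL/SC pair is commonly sketched as the following retry loop, in the pseudocode style used earlier:

```
function CAS_via_LLSC(address, expected, newValue) returns boolean
    loop
        current ← load_link(address)           // load and set reservation
        if current != expected then
            return false                       // value mismatch: no store
        end if
        if store_conditional(address, newValue) then
            return true                        // reservation held: swap done
        end if
        // reservation lost (possibly spuriously): retry the whole pair
    end loop
end function
```

The outer loop is needed because an SC may fail even without a conflicting store; only a value mismatch is a definitive CAS failure.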

Multi-Word Variants

Multi-word variants of compare-and-swap (CAS) extend the atomic primitive to multiple memory locations, enabling compound updates that maintain consistency in concurrent environments without intermediate states visible to other threads. The double compare-and-swap (DCAS) operation, which atomically compares the contents of two distinct memory locations against expected values and replaces both with new values if the comparisons succeed, was proposed in the mid-1990s as a foundation for non-blocking techniques in operating system structures. This extension addresses limitations of single-word CAS in scenarios requiring simultaneous updates to related data, such as pointer pairs in linked structures. Implementations of DCAS and broader multi-word CAS (MCAS) primarily use software emulation due to sparse hardware support. Lock-based software methods ensure atomicity but incur blocking overhead and reduce scalability under contention. Lock-free emulations built atop single-word CAS, such as the descriptor-based approach for arbitrary N-word operations, provide practical alternatives with disjoint-access parallelism, though they require careful management of helping (threads completing each other's operations) and retry mechanisms. On the hardware side, instructions like x86's CMPXCHG16B offer double-width CAS for 128-bit operands—effectively two adjacent 64-bit words—but apply only to contiguous data, not arbitrary disjoint locations, and are available only on processors supporting 64-bit extensions. DCAS facilitates atomic operations in concurrent data structures, such as updating both head and tail pointers in double-ended queues (deques) to avoid races during enqueue or dequeue. It also supports consistent hash-table modifications by atomically swapping key-value pairs, preventing partial updates that could lead to inconsistencies.
Generic transactional memory libraries, including Microsoft's Persistent Multi-word Compare-and-Swap (PMwCAS), employ MCAS to enable lock-free updates across multiple variables, particularly in persistent-memory systems for durable, concurrent programming. Key challenges in multi-word variants include the rarity of dedicated hardware support, limited to specialized instructions such as x86's double-width CAS on select CPUs, which restricts performance and portability. Scalability degrades beyond 2-4 words due to heightened contention in emulations and growth in state-validation complexity. Work in the 2020s leverages Intel's Transactional Synchronization Extensions (TSX) to implement MCAS via hardware transactions, reducing aborts from retry-heavy single-CAS sequences by providing all-or-nothing multi-word atomicity, albeit with a 10-15% performance overhead in some lock-free algorithms.
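DCAS generalizes the single-word semantics to two independent addresses, as the following pseudocode shows; no mainstream ISA implements this as a direct instruction, so the atomic step must be realized in software or via transactions:

```
function DCAS(addr1, expected1, new1, addr2, expected2, new2) returns boolean
    // the entire body executes as one indivisible step
    if load(addr1) == expected1 and load(addr2) == expected2 then
        store(addr1, new1)
        store(addr2, new2)
        return true
    else
        return false
    end if
end function
```

Both comparisons must succeed for either store to occur, which is what makes the primitive strictly stronger than two independent single-word CAS operations.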

References

  1. [1]
    Spin locks (CS 4410, Summer 2018)
    The compare and swap instruction (CAS) is similar to, but more complicated than, the test_and_set instruction. The CAS instruction takes three parameters: a ...
  2. [2]
    [PDF] Implementation of Atomic Primitives on Distributed Shared Memory ...
    The compare and swap primitive was first provided on the IBM System/370 [4]. Compare and swap takes three parameters: the address of the destination operand ...
  3. [3]
    A UNIX System Implementation for System/370 - Nokia
    Since its introduction by IBM in 1970, System/370 ... However, it does include a synchronizing instruction, Compare and Swap (CS), that was used in implementing P ...
  4. [4]
    The different groups of A64 atomic instructions - Arm Developer
    Compare and swap instructions, CAS and CASP. Atomic memory operation instructions, LD<OP> and ST<OP> , where OP is one of arithmetic and bitwise operations.<|control11|><|separator|>
  5. [5]
    [PDF] 6.852: Distributed Algorithms - csail
    If initial value = v, then access the C&S shared variable with compare-and-swap( ⊥, v ), obtain the previous value w. • If w = ⊥ then decide v. (You are ...
  6. [6]
    Wait-free synchronization | ACM Transactions on Programming ...
    ASPNES, J., AND HERLIHY, M.P. Fast randomized ... On utility accrual processor scheduling with wait-free synchronization for embedded real-time software.
  7. [7]
    Impossibility and universality results for wait-free synchronization
    Wait-free synchronization. A wait-free implementation of a concurrent data object is one that guarantees that any process can complete any operation in a finite ...
  8. [8]
    A Practical Multi-word Compare-and-Swap Operation
    We present an efficient and practical lock-free implementation of a concurrent deque that supports parallelism for disjoint accesses and uses atomic primitives ...
  9. [9]
    [PDF] Synchronization Lecture Notes
    In using compare-and-swap, the architecture is asked to operate based on whether the expected value is still present at that memory location. For commutable ...
  10. [10]
    The Balancing Act of Choosing Nonblocking Features
    Sep 1, 2013 · ... Compare-and-Swap (CAS) atomic primitive. CAS atomically reads a shared location and compares the read value with an expected value. If equal ...
  11. [11]
    BQ: A Lock-Free Queue with Batching - ACM Digital Library
    A CAS primitive is defined by a triplet consisting of a target memory address, an expected value, and a new value. A CAS operation compares the value stored in ...<|control11|><|separator|>
  12. [12]
    Lock-free linked lists using compare-and-swap - ACM Digital Library
    Lock-free linked lists using compare-and-swap. Article. Free access. Share on. Lock-free linked lists using compare-and-swap. Author: John D. Valois. John D.
  13. [13]
    [PDF] IBM System/370 Principles of Operation
    Sep 1, 1975 · IBM System/3 70 highlights some of the major features of System/370 ... COMPARE AND SWAP. COMPARE DOUBLE AND SWAP. COMPARE HALFWORD ...
  14. [14]
    General-Purpose Multicore Architectures - arXiv
    Aug 23, 2024 · The wider adoption of multicore CPUs was hastened by two trends observed in the late 1990s. First, a handful of computer engineers started ...
  15. [15]
    [PDF] Wait-free synchronization - Brown CS
    It is impossible to construct a wait-free implementation of a compare&swap register from a set of registers that support any combination of read, write, ...
  16. [16]
    [PDF] Art of Multiprocessor Programming - School of Computing Science
    Page 1. Page 2. The Art of Multiprocessor Programming. Page 3. This page ... compare(Timestamp);. 3. } 4 public interface TimestampSystem {. 5 public ...
  17. [17]
    [PDF] Real-Time Computing with Lock-Free Shared Objects
    As the name suggests, lock-free shared objects are distinguished by the fact that they are accessed without locking. As such, they do not give rise to priority ...<|control11|><|separator|>
  18. [18]
    [PDF] Systems Programming: Coping With Parallelism - IBM Research
    SYSTEMS PROGRAMMING: COPING WITH PARALLELISM. R. Kent Treiber. IBM Almaden Research Center. 650 Harry Road. San Jose, California 95120-6099 ?:== Research ...
  19. [19]
    [PDF] Simple, Fast, and Practical Non-Blocking and Blocking Concurrent ...
    We use Treiber's simple and efficient non-blocking stack algorithm [21] to implement a non-blocking free list. Figure 2 presents commented pseudo-code for the ...
  20. [20]
    High performance dynamic lock-free hash tables and list-based sets
    This paper presents the first CAS-based lock-free list-based set algorithm that is compatible with all lock-free memory management methods.
  21. [21]
    [PDF] Understanding and Effectively Preventing the ABA Problem in ...
    The usage of version tags is the most commonly cited solution for ABA avoidance [6]. The approach is effective, when it is possible to apply, but suffers from a ...Missing: mitigation | Show results with:mitigation
  22. [22]
    Simple, fast, and practical non-blocking and blocking concurrent ...
    1 Compare.and.swap, introduced on the IBM System 370, take:s as arguments ... ABA problem: if a process reads a value A in a shared loca- tion ...<|control11|><|separator|>
  23. [23]
    [PDF] A Pragmatic Implementation of Non-Blocking Linked-Lists
    This paper presents a new non-blocking, linearizable linked-list implementation using compare-and-swap (CAS) with two CAS operations for deletion, and is ...
  24. [24]
    AtomicStampedReference (Java Platform SE 8 )
    ### Summary of How AtomicStampedReference Solves the ABA Problem
  25. [25]
    [PDF] Evaluating the Cost of Atomic Operations on Modern Architectures
    Oct 19, 2020 · In this paper we establish an evaluation methodology, develop a performance model, and present a set of detailed benchmarks for latency and ...Missing: ARM | Show results with:ARM
  26. [26]
    The Balancing Act of Choosing Nonblocking Features - ACM Queue
    Aug 12, 2013 · ... Compare-and-Swap (CAS) atomic primitive. CAS atomically reads a shared location and compares the read value with an expected value. If they ...
  27. [27]
    A Study on the Performance Implications of AArch64 Atomics
    May 21, 2023 · In this paper we explore the performance of the CAS and LL-SC approaches to implement CAS operations on recent high-performance Arm-based CPUs, ...
  28. [28]
    C11 Lock-free Stack - null program
    Sep 2, 2014 · It's centered around the new C11 stdatomic.h function atomic_compare_exchange_weak(). This is an atomic operation more generally called compare ...
  29. [29]
    std::atomic<T>::compare_exchange_weak, std::atomic<T>::compare_exchange_strong - cppreference.com
    compare_exchange_strong atomically compares the value of the atomic object with an expected value and, if they are equal, replaces it with a desired value; compare_exchange_weak may additionally fail spuriously and is intended for use in retry loops.
  30. [30]
    __atomic Builtins (Using the GNU Compiler Collection (GCC))
    This built-in function implements an atomic compare and exchange operation. This compares the contents of *ptr with the contents of *expected. If equal ...
  31. [31]
    InterlockedCompareExchange function (winnt.h) - Win32 apps
    May 23, 2022 · Performs an atomic compare-and-exchange operation on the specified values. The function compares two specified 32-bit values and exchanges with another 32-bit ...
  32. [32]
  33. [33]
  34. [34]
    CMPXCHG — Compare and Exchange
    Compares the value in the AL, AX, EAX, or RAX register with the first operand (destination operand). If the two values are equal, the second operand (source ...
  35. [35]
    Why The Latest Linux Kernel Won't Run On Your 486 And 586 ...
    Jul 2, 2025 · ... CMPXCHG8B instructions. These became part of x86 when Intel introduced the very first Pentium processors to the market in the early 1990s.
  36. [36]
    X86-64 microarchitecture levels - openSUSE Wiki
    Jul 4, 2025 · The original specification, created by AMD and released in 2000, has been implemented by AMD, Intel, and VIA. ... CPU features: CMOV, CMPXCHG8B ( ...
  37. [37]
    LDREX and STREX - RealView Developer Kit Assembler Guide
    The address used in an STREX instruction must be the same as the address in the most recently executed LDREX instruction. The result of executing an STREX ...
  38. [38]
    caspa, caspal, casp, caspl, caspal, casp, caspl - Arm Developer
    Supported in ARMv8.1 and later. Usage. Compare and Swap Pair of words or doublewords in memory reads a pair of 32-bit words or 64 ...
  39. [39]
    riscv-zacas created from docs-spec-template template - GitHub
    Mar 21, 2024 · The Zacas extension adds support for amocas.w, amocas.d, and amocas.q instructions to perform atomic Compare-and-Swap (CAS) operations.
  40. [40]
    Which CPU architectures support Compare And Swap (CAS)?
    Sep 30, 2008 · There are two major approaches for hardware synchronization support by CPU: atomic-transaction based and cache-coherence-protocol based.
  41. [41]
    [PDF] Cache Coherence and Atomic Operations in Hardware
    Dec 1, 2006 · Writable: exactly one cache has the block, and only that processor can write to it. Read-only: any number of caches can hold the block.
  42. [42]
    How does the cache coherency protocol enforce atomicity?
    Aug 17, 2014 · In case of MESI, the core executing the atomic instruction will first ensure it has ownership of the cache line containing the operand and ...
  43. [43]
    Ratified Extensions - Home - RISC-V Tech Hub
    Aug 29, 2025 · Atomic Compare-and-Swap (CAS) Instructions (Zacas). November 2023. Zacas. RISC-V Cryptography Extensions Volume II: Vector Instructions.
  44. [44]
    [PDF] MIPS IV Instruction Set
    There are paired instructions, Load Linked and Store Conditional, that can be used ...
  45. [45]
    Exclusive access instructions - ARMv8-A synchronization primitives
    In AArch32 state, A32 and T32 have instructions with a similar behavior: Load Exclusive ( LDREX ); Store Exclusive ( STREX ). Whenever an address is read using ...
  46. [46]
    2023/09/01 - CAS, ABA and LL/SC - memzero
    Load linked store conditional. So far we looked at atomic CAS instructions which have read-modify-write semantics and are typically found with CISC ISAs.
  47. [47]
    Atomic types - The Linux Kernel documentation
    However, this is not evident on LL/SC architectures, because while an LL/SC architecture 'can/should/must' provide forward progress guarantees between competing ...
  48. [48]
    Livermore S-1 Supercomputer -- A Short History - Clemson University
    Summary: The S-1 project was an attempt to build a family of multiprocessor supercomputers. The project was envisioned by Lowell Wood at the Lawrence Livermore ...
  49. [49]
    [PDF] The Synergy Between Non-blocking Synchronization and Operating ...
    The main techniques we use for modularity, performance and reliability are atomic DCAS (or Double-Compare-and-Swap), type-stable memory management (TSM), and ...
  50. [50]
    [PDF] A Practical Multi-Word Compare-and-swap Operation - Tim Harris
    Abstract. Work on non-blocking data structures has proposed extending processor designs with a compare-and-swap primitive, CAS2, which ...
  51. [51]
    CMPXCHG8B/CMPXCHG16B — Compare and Exchange Bytes
    Compares the 64-bit value in EDX:EAX (or 128-bit value in RDX:RAX if operand size is 128 bits) with the operand (destination operand).
  52. [52]
    DCAS-based concurrent deques - ACM Digital Library
    The computer industry is currently examining the use of strong synchronization operations such as double compare-and-swap (DCAS) as a means of supporting ...
  53. [53]
    [PDF] DRAMHiT: A Hash Table Architected for the Speed of DRAM
    To implement insertion for tuples that are smaller than 16 bytes, DRAMHiT relies on a double-word compare-and-swap (cas) instruction. Concur- rent updates are ...
  54. [54]
    [PDF] On the Interplay of Hardware Transactional Memory and Lock-Free ...
    A nice feature of lock elision is that it is identical to programming with locks (Figure 1b). The difference between traditional lock-based synchronization is ...