Transactional memory
Transactional memory (TM) is a concurrency control mechanism in computer science that enables groups of memory read and write operations to execute atomically and in isolation, akin to database transactions, ensuring that concurrent threads perceive either all changes from a transaction or none at all.[1] This approach simplifies parallel programming by allowing developers to mark critical code sections as transactions, automatically handling synchronization without explicit locks.[2] Proposed initially in theoretical work by David Lomet in 1977 and practically introduced as an architectural concept by Maurice Herlihy and J. Eliot B. Moss in 1993, TM addresses the complexities of shared-memory multiprocessors by providing atomicity, consistency, and isolation (ACI) for memory accesses; durability, the fourth ACID property of database transactions, is generally omitted because main memory is volatile.[1][2]
TM variants include hardware transactional memory (HTM), which leverages processor extensions like speculative execution and cache modifications for low-overhead performance; software transactional memory (STM), implemented through runtime libraries that manage conflicts via versioning or locking; and hybrid TM, which combines hardware acceleration with software fallbacks for robustness.[3] HTM implementations, such as Intel's Transactional Synchronization Extensions (TSX) introduced in 2013, support instructions like XBEGIN and XEND to delimit transactions, adding as little as 2-10% overhead relative to non-transactional code in ideal cases.[4] STM, exemplified by early systems like DSTM from 2003, offers greater flexibility across hardware but incurs higher runtime costs due to software-based conflict detection.[2]
The primary advantages of TM over traditional lock-based synchronization include reduced risk of deadlocks, priority inversion, and convoying effects, as well as enhanced composability for building complex concurrent data structures.[1] Simulations and benchmarks from the 1990s onward demonstrated that TM could match or exceed lock performance in scenarios like counter increments and linked-list operations on up to 32 processors.[1] By the 2000s, surging interest in multicore processors fueled TM research, leading to its adoption in production systems like IBM's Blue Gene/Q and applications in garbage collection, data-race detection, and side-channel mitigation.[5][4]
Despite these benefits, TM faces challenges such as transaction aborts from conflicts, limited cache capacities (e.g., write sets restricted to L1 cache sizes of 22-31 KB in Intel TSX), and nondeterministic behavior due to hardware policies like cache replacement.[4] As of 2021, optimizations like cache warmup techniques have pushed effective read capacities to nearly full last-level cache sizes (up to 97% utilization), but widespread commercial use remains tempered by these limitations and the need for fallback mechanisms in hybrid systems.[4] Ongoing research explores machine learning for conflict prediction and energy-efficient implementations, underscoring TM's enduring relevance in scalable parallel computing.[6]
Fundamentals
Definition and Core Principles
Transactional memory (TM) is a concurrency control mechanism that enables groups of memory operations, such as reads and writes, to execute atomically and in isolation within multi-threaded programs on shared-memory multiprocessors.[7][8] This paradigm draws an analogy to database transactions, allowing programmers to define critical sections as transactions that appear to execute sequentially relative to one another, without the need for explicit locking of individual data items.[9]
At its core, transactional memory relies on speculative execution, where a transaction proceeds optimistically by performing memory operations under the assumption that no conflicts will arise with concurrent transactions.[7][9] Key phases include the transaction's begin point, which initiates a local execution context; the execution phase, involving tentative reads and writes; validation to check for conflicts with other transactions; and either a commit to atomically install changes or an abort to discard them and rollback the state.[8][9] These principles ensure that transactions maintain atomicity, consistency, and isolation (ACI).[9]
In the operational model, a transaction consists of a sequence of load and store instructions executed as if in a single-threaded context, with the underlying system—whether hardware or software—responsible for buffering changes and guaranteeing atomicity only upon successful commit.[7][8] Conflicts detected during validation trigger aborts, often leading to retries, while successful commits make the transaction's effects visible to other threads instantaneously.[9]
A simple example is a transactional increment of a shared counter, which demonstrates a read-modify-write operation:
begin_transaction
    local_value = load(counter)
    local_value = local_value + 1
    store(counter, local_value)
if commit_transaction then
    // Success: counter is atomically incremented
else
    // Abort: retry the transaction
end if
This pseudocode illustrates how the increment appears atomic to other transactions, with the system handling any necessary rollback on conflict.[7][9]
Key Properties
Transactional memory (TM) systems are designed to provide a set of core properties that guarantee correct concurrent execution of critical sections, drawing inspiration from database transactions but adapted for in-memory operations. These properties, often referred to as ACI (atomicity, consistency, isolation), ensure that transactions behave as indivisible units without interfering with one another, while a fourth property, durability, is typically absent in standard TM but can be added in persistent variants.[10][9]
Atomicity guarantees that all memory operations within a transaction are executed as a single, indivisible unit: upon successful completion, all changes become visible to other threads simultaneously; if the transaction aborts due to a conflict or exception, all changes are discarded, leaving the shared state unchanged. This "all-or-nothing" semantics prevents partial updates that could lead to inconsistent data structures, as exemplified in lock-free implementations where tentative writes are buffered until commit.[7][10]
Consistency ensures that each transaction, starting from a consistent shared state, transforms the data into another consistent state, preserving application-specific invariants such as the absence of duplicates in a data structure. In TM, this is often achieved through serializability guarantees, where transactions maintain the illusion of sequential execution, and validation mechanisms ensure reads observe a consistent snapshot without violations of program logic. For instance, opacity—a stronger form of consistency—prevents transactions from observing intermediate states of aborted or concurrent transactions.[9][11]
Isolation provides the property that concurrent transactions do not observe or interfere with each other's intermediate states, making the execution equivalent to some serial ordering of the transactions. This is typically enforced through conflict detection, where reads and writes are tracked to identify overlaps, leading to aborts and retries if necessary; committed transactions appear atomic and non-interleaved to others, supporting serializability levels like strict serializability in advanced models.[7][10]
Durability, while a standard property in database transactions, is not inherently provided by conventional TM systems operating on volatile memory, as aborts simply discard buffered changes without persistence guarantees. However, in persistent transactional memory (PTM) designs integrated with non-volatile memory technologies like phase-change memory, durability ensures that once a transaction commits, its effects survive system crashes or failures, often through logging or direct writes to persistent storage upon commit.[10][12]
The ownership model in TM grants transactions implicit exclusive access to memory locations during execution, eliminating the need for programmers to manage explicit locks while preventing concurrent modifications. This is realized through hardware or software mechanisms that track read and write permissions on data items, such as cache-line ownership in hardware TM or versioned metadata in software TM, ensuring conflicts are detected efficiently without global synchronization.[7][9]
Motivation and Advantages
Limitations of Traditional Synchronization
Traditional synchronization mechanisms, such as locks and barriers, are prone to deadlocks, particularly when using fine-grained locking to minimize contention, as this can lead to circular waits where multiple threads hold resources while awaiting others in a cycle.[13] Avoiding deadlocks requires complex strategies like lock ordering or timeout mechanisms, which add significant design overhead and are error-prone in large-scale systems.[14]
Locks also suffer from composability issues, making it difficult to nest or combine locking operations across modules without introducing races, deadlocks, or priority inversions, as components must explicitly coordinate on lock acquisition orders.[15] This lack of modularity complicates software maintenance and reuse, as changes in one part of the code can propagate locking dependencies throughout the system.[16]
Scalability problems arise with coarse-grained locks, which serialize access and cause high contention under load, limiting throughput on multicore processors, while fine-grained locks, though reducing contention, incur substantial overhead from frequent acquisition and release operations, along with increased risk of bugs.[17] In read-heavy workloads, such as those in the readers-writers problem, traditional reader-writer locks often fail to scale efficiently, as writers block multiple concurrent readers, exacerbating bottlenecks.[18]
The programming complexity of manual lock management further compounds these issues, as developers must meticulously handle acquisition and release to prevent bugs like forgotten unlocks, which can lead to indefinite blocking or resource leaks, and race conditions from improper ordering.[16] Classic examples illustrate these pitfalls: in the dining philosophers problem, five philosophers sharing forks via locks can deadlock if each acquires one fork and waits for the next, demonstrating circular wait hazards without careful protocol enforcement.[19] Similarly, the readers-writers problem highlights fairness and scalability challenges, where naive locking favors readers but starves writers or vice versa, requiring intricate priority schemes that are hard to implement correctly.[18]
Benefits for Concurrent Programming
Transactional memory (TM) significantly simplifies concurrent programming by automating synchronization through transactional annotations, allowing developers to write lock-free code without manually managing locks, unlocks, or complex retry logic. This approach shifts the burden of concurrency control from programmers to the underlying system, enabling focus on algorithmic logic rather than low-level synchronization details. For instance, operations on shared data structures can be enclosed in atomic blocks, where the system handles conflicts transparently, reducing the error-prone aspects of traditional locking mechanisms.[20][21]
A key advantage is TM's inherent deadlock-freedom, achieved through optimistic execution that avoids explicit locks and their associated holding periods. Unlike fine-grained locking, which risks deadlocks from circular wait conditions, TM speculatively executes transactions and aborts only on conflicts, preventing indefinite blocking. While livelocks are mitigated and starvation remains possible under high contention, this model substantially lowers the risk of programming errors related to lock ordering and acquisition hierarchies.[20][22]
TM enhances composability by allowing transactions to nest or combine modularly, facilitating the construction of larger concurrent abstractions from smaller, independently developed components. This property supports natural integration of operations across multiple data structures, unlike locks that often require careful coordination to avoid interference. As a result, developers can build scalable, reusable concurrent algorithms with reduced coupling between synchronization and data access.[22][21]
In terms of performance, TM yields gains in low-contention environments by enabling speculative parallel execution, which reduces overhead from lock contention and improves scalability on multicore systems. Simulations and benchmarks demonstrate that TM matches or exceeds locking techniques for simple operations, with speedups up to 29 times over sequential code in read-dominated workloads on 64 threads. This is particularly evident in scenarios with infrequent conflicts, where transactions commit efficiently without frequent aborts.[20][23]
Practical examples illustrate these benefits: implementing a concurrent queue with TM requires only wrapping enqueue and dequeue operations in transactions, automatically handling races without custom retry loops or lock variants, unlike locked queues that demand careful lock granularity to avoid bottlenecks. Similarly, hash tables benefit from TM's ability to atomically update multiple entries, simplifying resizing or insertion under concurrency while maintaining higher throughput in low-conflict access patterns compared to reader-writer locks.[20][21][23]
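The queue example can be made concrete with a short sketch. The following assumes GCC's experimental -fgnu-tm support for the __transaction_atomic construct; the Node type and function names are illustrative rather than drawn from a particular library.

struct Node { int value; Node* next; };

Node* head = nullptr;   // shared state: front of the queue
Node* tail = nullptr;   // shared state: back of the queue

void enqueue(Node* n) {
    n->next = nullptr;
    __transaction_atomic {          // the whole linkage update commits atomically
        if (tail) tail->next = n; else head = n;
        tail = n;
    }
}

Node* dequeue() {
    Node* n;
    __transaction_atomic {          // readers never observe a half-unlinked node
        n = head;
        if (n) {
            head = n->next;
            if (!head) tail = nullptr;
        }
    }
    return n;                       // nullptr if the queue was empty
}

No retry loop or lock ordering appears in the code; conflict detection and re-execution are the runtime's responsibility.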
Implementation Paradigms
Hardware Transactional Memory
Hardware transactional memory (HTM) provides processor-level support for executing sequences of instructions atomically and in isolation, leveraging dedicated hardware mechanisms to track and manage transactional state without software intervention. Core implementations typically employ hardware buffers, such as load and store buffers integrated into the processor's cache hierarchy, to capture and isolate reads and writes performed within a transaction. These buffers maintain a private copy of modified data, ensuring that transactional operations do not affect the global memory state until commit. Conflict detection operates at the granularity of cache lines (typically 64 bytes), where the underlying cache coherence protocol—such as MESI or MOESI—monitors access patterns to identify read-write or write-write conflicts between concurrent transactions.[24][25][26]
A key design choice in HTM systems is the approach to conflict detection: eager or lazy. Eager conflict detection checks for read-write and write-write overlaps at the moment of the conflicting access and immediately aborts (or stalls) one of the transactions; it is often paired with eager version management, in which updates are made in place and recorded in an undo log, minimizing commit latency for short transactions but increasing abort rates in high-contention scenarios. Lazy conflict detection, conversely, buffers all changes privately and defers validation until the transaction attempts to commit, relying on coherence messages to signal conflicts during execution; this allows more optimistic progress but may lead to costly rollbacks if conflicts are detected late. Many commercial HTM designs adopt a hybrid model, combining elements of both for balanced performance.[27][28][29]
Prominent examples of HTM include Intel's Transactional Synchronization Extensions (TSX), introduced in 2013 with the Haswell microarchitecture, which supports two modes: Restricted Transactional Memory (RTM), using explicit instructions like XBEGIN, XEND, and XABORT to delimit transaction boundaries, and Hardware Lock Elision (HLE), using memory prefixes to implicitly elide lock-based code. TSX buffers transactional writes in the L1 data cache and detects conflicts at cache-line granularity through the cache coherence protocol, a largely lazy scheme in which coherence events can nonetheless trigger early aborts. Following security vulnerabilities such as TSX Asynchronous Abort (TAA, 2019), TSX was disabled by default via microcode updates starting in 2021 and formally deprecated by 2023, though it remains available on some processors with ongoing microcode support and user-enabled reconfiguration as of 2025. Another implementation is ARM's Transactional Memory Extension (TME), announced in 2019 as an optional feature of the Armv9 architecture, which provides instructions such as TSTART, TCOMMIT, TCANCEL, and TTEST to delimit transactions in an execution-state-based model. TME employs eager conflict detection with lazy version management, also at cache-line granularity, offering best-effort isolation for short, low-conflict code regions.[30][26][31]
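The RTM instructions above are exposed to C and C++ code through compiler intrinsics. A minimal sketch of the common usage pattern follows, assuming immintrin.h; the fallback path shown is the standard idiom rather than anything mandated by the hardware.

#include <immintrin.h>
#include <atomic>

std::atomic<long> shared_counter{0};

void increment() {
    if (_xbegin() == _XBEGIN_STARTED) {            // enter transactional execution
        long v = shared_counter.load(std::memory_order_relaxed);
        shared_counter.store(v + 1, std::memory_order_relaxed);  // buffered until commit
        _xend();                                   // commit: the write becomes visible atomically
    } else {
        shared_counter.fetch_add(1);               // abort path: plain atomic fallback
    }
}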
HTM excels in providing low-overhead speculation for short transactions, often achieving near-native execution speeds with minimal latency for conflict-free paths, making it suitable for fine-grained parallelism in uncontested workloads. However, limitations include bounded transaction capacity tied to hardware resources like L1 cache size—typically around 32 KB for reads in Intel TSX implementations—beyond which transactions abort due to overflow, restricting applicability to larger or data-intensive operations.[25][32]
Software Transactional Memory
Software transactional memory (STM) implements the transactional memory abstraction entirely in software, relying on runtime libraries and compiler support to manage concurrency without dedicated hardware instructions. This approach prioritizes portability across diverse architectures, enabling transactional semantics on commodity processors where hardware transactional memory is unavailable or insufficient. Unlike hardware-based systems that leverage specialized buffers for fast execution, STM enforces atomicity, consistency, and isolation through general-purpose atomic primitives and metadata tracking, though at the cost of higher runtime overhead.[33]
Core techniques in STM revolve around atomic primitives such as compare-and-swap (CAS) for validation and versioning mechanisms for tracking read-write sets. During transaction execution, reads and writes are buffered in software structures, often using opaque or timestamp-based versioning to detect conflicts by comparing versions against a global clock or per-object counters. For instance, a transaction records the version of each read location; at commit, it uses CAS operations to update write locations and validate that no read versions have changed, ensuring isolation without immediate locking. This direct-update style minimizes buffering overhead but requires careful metadata management to avoid false conflicts.[34][35]
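A highly simplified sketch of this timestamp-and-validate scheme appears below. All names are illustrative, and a real system such as TL2 additionally acquires per-location write locks before validating, which this sketch omits for brevity.

#include <atomic>
#include <cstdint>
#include <unordered_map>

std::atomic<uint64_t> global_clock{0};            // advanced on every commit

struct VersionedCell {
    std::atomic<uint64_t> version{0};             // timestamp of last committed write
    std::atomic<int>      value{0};
};

struct Txn {
    uint64_t start_time;                                  // snapshot timestamp at begin
    std::unordered_map<VersionedCell*, uint64_t> reads;   // read set: cell -> version seen
    std::unordered_map<VersionedCell*, int>      writes;  // write set: buffered updates
};

bool txn_read(Txn& t, VersionedCell& c, int& out) {
    if (auto it = t.writes.find(&c); it != t.writes.end()) {
        out = it->second;                         // read-your-own-writes
        return true;
    }
    uint64_t v = c.version.load();
    out = c.value.load();
    if (v > t.start_time || v != c.version.load())
        return false;                             // changed since our snapshot: abort
    t.reads[&c] = v;
    return true;
}

bool txn_commit(Txn& t) {
    for (auto& [cell, ver] : t.reads)             // re-validate the read set
        if (cell->version.load() != ver) return false;    // conflict: abort
    uint64_t commit_time = global_clock.fetch_add(1) + 1;
    for (auto& [cell, val] : t.writes) {          // publish buffered writes
        cell->value.store(val);
        cell->version.store(commit_time);
    }
    return true;
}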
STM systems differ in conflict detection strategies: eager versus lazy approaches. Eager conflict detection acquires fine-grained locks on accessed locations during transaction execution, performing both detection and resolution immediately to prevent wasted work on doomed transactions, which suits low-contention workloads but can introduce serialization under high contention. In contrast, lazy conflict detection defers locking and validation until commit time, allowing optimistic execution with minimal interference during reads and writes, but risks higher abort rates if conflicts are detected late, making it preferable for read-heavy or unpredictable access patterns. The choice impacts progress guarantees and scalability, with hybrid policies sometimes blending elements for balance.[36][37]
Overhead in STM arises primarily from metadata management, frequent aborts in high-contention scenarios, and instrumentation choices. Each transaction maintains per-object metadata (e.g., locks, version counters, or reader/writer bitmaps), which incurs space and access costs, often amplified by dynamic allocation during execution. Aborts, triggered by validation failures, lead to restarts that amplify computation waste, especially in contended environments where abort rates can exceed 50% without contention mitigation. Instrumentation—dynamic (runtime interception) versus static (compiler-inserted)—further contributes, with dynamic methods adding indirection overhead but enabling flexibility, while static reduces it at the expense of code size. These factors make STM 2-10x slower than lock-based code in low-contention benchmarks, though optimizations like contention avoidance can mitigate impacts.[38][39]
A seminal example is the TL2 algorithm, introduced in 2006, which combines a global version clock with per-location versioned write-locks. Transactions buffer their writes and record the version of each location read; at commit, they acquire locks on the write set, validate that no read location carries a version newer than the transaction's start timestamp, and then publish the writes under a fresh clock value, with striped lock tables reducing metadata contention. TL2 achieves scalability on up to 64 cores with low abort rates in benchmarks like STAMP, outperforming earlier STMs. More recently, PIM-STM (2024) adapts STM for processing-in-memory architectures, integrating versioning with near-memory compute to minimize data movement; it reports up to 4x speedup over CPU-based STMs in graph workloads by localizing conflict resolution, demonstrating STM's evolution toward specialized hardware-software synergies without relying on core transactional instructions.[34][40]
Hybrid Approaches
Hybrid transactional memory (HyTM) systems integrate hardware transactional memory (HTM) capabilities with software transactional memory (STM) mechanisms to achieve a balance between the high performance of hardware acceleration and the flexibility of software to handle limitations such as transaction size constraints and contention scenarios. In HyTM designs, transactions typically attempt execution using hardware speculation for the fast path, falling back to software-based execution upon hardware aborts. This approach ensures progress even when hardware resources are exhausted or conflicts exceed hardware capabilities. A seminal HyTM proposal, introduced in 2006, demonstrates that such integration can significantly outperform pure software or hardware-only systems on existing hardware by dynamically selecting the execution path based on transaction characteristics.
One prominent example is the Hybrid NOrec protocol, which builds on the lightweight NOrec STM by incorporating best-effort HTM for speculative execution while using software fallbacks, often lock-based, for aborts. This design minimizes software overhead during successful hardware commits and ensures scalability under high contention by leveraging NOrec's lazy conflict detection in fallback mode. Evaluations show Hybrid NOrec achieving up to 2x speedup over pure STM in microbenchmarks with low abort rates, while maintaining correctness on hardware like early HTM prototypes.[41]
Contention management in HyTM often assigns hardware to low-conflict, small transactions for rapid execution, reserving software for complex conflicts or oversized transactions that exceed hardware buffers. For instance, Intel's TSX, with its restricted transactional memory (RTM) interface, is commonly paired with software wrappers to handle irrecoverable aborts—such as those due to capacity overflows or asynchronous conflicts—by retrying via STM or locks, thus mitigating TSX's vulnerability to denial-of-service attacks from repeated aborts. Similarly, ARM's Transactional Memory Extension (TME) integrates with STM libraries to enhance durability in persistent memory settings; the 2024 DUMBO system exemplifies this by using HTM for efficient read-only transactions, augmented with software for persistence guarantees, outperforming prior persistent HTM designs by up to 4x in benchmarks like TPC-C.[42][43]
These hybrid approaches leverage hardware's speed for common cases—reducing latency by orders of magnitude compared to pure STM—while software addresses hardware limits like the L1-cache-bounded transaction footprints of TSX and security concerns such as speculative abort amplification, resulting in robust, progressive systems suitable for diverse workloads.[44]
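The fast-path/fallback structure common to these systems can be sketched as follows, assuming Intel's documented RTM intrinsics (immintrin.h); the retry count, flag name, and lock-subscription idiom are illustrative rather than drawn from a specific HyTM implementation.

#include <immintrin.h>
#include <atomic>
#include <mutex>

std::mutex fallback_lock;
std::atomic<bool> lock_held{false};    // read ("subscribed to") inside transactions

template <typename F>
void run_transaction(F&& critical_section) {
    for (int attempt = 0; attempt < 3; ++attempt) {       // bounded hardware retries
        unsigned status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            if (lock_held.load(std::memory_order_relaxed))
                _xabort(0xff);                    // a fallback writer is active: serialize
            critical_section();
            _xend();                              // hardware commit
            return;
        }
        if (!(status & _XABORT_RETRY))            // e.g. capacity abort: retrying is futile
            break;
    }
    std::lock_guard<std::mutex> guard(fallback_lock);     // software fallback path
    lock_held.store(true, std::memory_order_relaxed);
    critical_section();
    lock_held.store(false, std::memory_order_relaxed);
}

Reading the fallback flag inside the transaction places it in the transaction's read set, so a software-mode writer aborts all concurrent hardware transactions, preserving mutual exclusion between the two paths.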
Historical Development
Early Concepts
The foundational idea of using atomic actions similar to database transactions for process structuring and synchronization was proposed by David Lomet in 1977.[45] The roots of transactional memory trace back to the 1980s, when concepts from database transactions were explored for hardware-level concurrency control. In particular, H. T. Kung and John T. Robinson proposed optimistic concurrency control methods that avoid locking by allowing transactions to proceed speculatively and validate at commit time, drawing parallels to database systems but adaptable to hardware environments.[46] This approach emphasized forward processing with backward validation to minimize contention, laying groundwork for non-locking mechanisms in shared-memory systems.[47]
A pivotal milestone came in 1993 with Maurice Herlihy and J. Eliot B. Moss's proposal of hardware transactional memory as a direct alternative to traditional locking primitives. Their architecture introduced speculative buffering, where transactions use a dedicated cache to hold tentative updates, enabling atomic multi-word operations without immediate visibility to other processors.[7] By extending cache-coherence protocols, this design supported instructions like load-transactional and commit, aiming to simplify lock-free data structures while reducing the complexity of compare-and-swap sequences.[7]
Early ideas in software transactional memory emerged around the early 2000s, influenced by efforts to build practical lock-free data structures. Keir Fraser and Tim Harris's 2003 work demonstrated scalable implementations of concurrent objects, such as queues and lists, using obstruction-free techniques that inspired subsequent software transactional memory systems by providing composable, non-blocking abstractions.[48]
These foundational proposals also highlighted key challenges, including the need for invisible reads to prevent transactions from exposing partial states and mechanisms for conflict serialization to ensure consistent ordering upon aborts.[7] Herlihy and Moss addressed invisible reads through validation checks to avoid inconsistencies from aborted "orphan" transactions, while conflicts were serialized via busy signals or backoff protocols.[7]
Major Milestones
In the mid-2000s, one of the first commercial implementations of hardware transactional memory (HTM) was provided by Azul Systems in their Vega appliances, starting with the Vega 2 in 2007, to support concurrent Java applications on multicore hardware through optimistic transaction execution.[49] Later in the decade, Sun Microsystems advanced hardware transactional memory (HTM) with its Rock processor prototype, announced in 2007 as a 16-core SPARC design incorporating HTM support for simplified multithreading.[50] The Rock project, intended to boost server performance via transactional features, was ultimately canceled in 2009 amid financial challenges at Sun.[51]
The 2010s saw significant hardware advancements, beginning with IBM's Blue Gene/Q supercomputer in 2011, which integrated eDRAM-based HTM into its PowerPC A2 cores to facilitate efficient transactional execution in high-performance computing environments.[52] This implementation provided low-overhead conflict detection and rollback, scaling to massive parallel workloads. In 2013, Intel launched Transactional Synchronization Extensions (TSX) with its Haswell microarchitecture, introducing restricted transactional memory (RTM) and hardware lock elision (HLE) instructions to mainstream x86 processors for broader software adoption. However, TSX faced setbacks due to security vulnerabilities, including the 2019 disclosure of TSX Asynchronous Abort (TAA, CVE-2019-11135), which enabled speculative data leakage. This led to microcode updates, such as those of June 2021 that disabled TSX by default on many processors, followed by further restrictions; as of 2025, it remains disabled in desktop processors but supported in some Xeon models.[53]
Entering the early 2020s, ARM announced the Transactional Memory Extension (TME) in 2019 as an optional feature for the Armv9 architecture, providing best-effort HTM support to enhance concurrency in embedded and server systems.[54] Despite this, standardization efforts for transactional memory in C++ stalled, with ISO/IEC TS 19841—published in 2015 as a technical specification for TM extensions—showing no progress toward integration into the core language standard by 2025.[55] Meanwhile, simulation tools advanced, as seen in the 2020 integration of ARM TME support into the gem5 simulator, enabling researchers to model and evaluate HTM behaviors without physical hardware.[56]
Overall, transactional memory has experienced limited industry adoption due to implementation complexity, including challenges in hardware resource management and vulnerability mitigation, though it persists in niche high-performance and research contexts.[33]
Algorithms and Mechanisms
Conflict Detection
Conflict detection in transactional memory systems identifies when concurrent transactions access overlapping data in ways that violate isolation, ensuring that committed transactions appear to execute serially in some order. The primary conflict types are read-write (RW), where one transaction reads data later written by another; write-read (WR), where one transaction writes data later read by another; and write-write (WW), where two transactions write to the same data. These conflicts arise because transactional memory enforces serializability or opacity, requiring a total serialization order among committed transactions to prevent anomalies like non-repeatable reads or lost updates.[57]
Eager conflict detection identifies violations during transaction execution, aborting conflicting transactions immediately to minimize wasted work. In hardware transactional memory (HTM), this often involves monitoring accessed addresses using signatures or Bloom filters to approximate read and write sets, leveraging cache coherence protocols to detect overlaps on the fly. For example, LogTM-SE uses Bloom filter-based signatures to summarize transaction footprints and trigger aborts via coherence messages. In software transactional memory (STM), eager detection typically employs per-object locks acquired on first access, blocking or aborting transactions that attempt to access locked items. This approach reduces commit-time overhead but can increase contention under high overlap.[58][57]
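The signature idea can be illustrated with a toy Bloom filter over cache-line addresses. The filter size and hash functions below are illustrative; LogTM-SE's actual signatures are hardware registers updated by the pipeline.

#include <bitset>
#include <cstdint>

struct Signature {
    std::bitset<1024> bits;                        // compact summary of accessed lines
    static size_t h1(uintptr_t a) { return (a >> 6) % 1024; }                  // line granularity
    static size_t h2(uintptr_t a) { return ((a >> 6) * 2654435761u) % 1024; }  // second hash
    void add(uintptr_t addr)              { bits.set(h1(addr)); bits.set(h2(addr)); }
    bool mayContain(uintptr_t addr) const { return bits.test(h1(addr)) && bits.test(h2(addr)); }
};

// An incoming remote write conflicts if it may intersect either set.
// False positives cause unnecessary aborts but never missed conflicts.
bool conflicts(const Signature& readSet, const Signature& writeSet, uintptr_t remoteWriteAddr) {
    return readSet.mayContain(remoteWriteAddr) || writeSet.mayContain(remoteWriteAddr);
}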
Lazy conflict detection defers checks until commit time, allowing transactions to execute optimistically while tracking reads and writes in logs for later validation. Systems maintain version numbers or timestamps on shared objects; at commit, a transaction verifies that no intervening writes occurred by comparing log entries against current versions, aborting if a RW, WR, or WW conflict is found. TL2, a widely adopted STM algorithm, exemplifies this by using global clocks for versioning and validating read sets against write timestamps during commit. This method suits low-contention workloads but may lead to more rollbacks in high-conflict scenarios.[34]
Conflict detection granularity affects precision and overhead, with finer levels reducing false positives but increasing tracking costs. Word-level detection allows precise checks on individual data items, common in software implementations for avoiding unnecessary aborts. In contrast, many hardware systems operate at cache-line granularity to align with cache protocols, potentially causing false conflicts when unrelated data shares a line. Intel's Transactional Synchronization Extensions (TSX), for instance, detects conflicts at 64-byte cache-line boundaries using coherence snoops, leading to spurious aborts on intra-line overlaps.[59]
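The effect of cache-line granularity can be seen in a two-field structure: transactions touching logically distinct fields still conflict when the fields share a 64-byte line, as in this illustrative sketch.

// Both counters occupy the same 64-byte cache line, so a transaction
// writing a conflicts with one writing b, although no data is shared.
struct Counters {
    long a;   // updated by thread 1
    long b;   // updated by thread 2: false conflict with a
};

// Forcing each counter onto its own line removes the false conflict
// (alignas(64) also pads sizeof up to a full line).
struct alignas(64) PaddedCounter {
    long value;
};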
Transaction Management
Transactional memory systems manage the lifecycle of transactions through distinct states that ensure atomicity and isolation. A transaction begins in an active state, where it speculatively executes operations on shared data without immediately making changes visible to other threads. During this phase, reads and writes are tracked in buffers or logs to maintain isolation. Upon successful completion without conflicts, the transaction transitions to a committed state, atomically installing all changes as if they occurred instantaneously. If a conflict or other issue arises, the transaction enters an aborted state, where all speculative modifications are discarded, restoring the system to its pre-transaction state.[60][9]
The commit process requires validation of the transaction's read and write sets to confirm no conflicts with concurrent transactions. In hardware transactional memory (HTM), such as Intel's Transactional Synchronization Extensions (TSX), successful validation culminates in a single atomic instruction, like XEND, which flushes speculative writes from a hardware buffer to the main memory hierarchy, making changes globally visible without intermediate states. This ensures atomicity by leveraging cache coherence protocols to broadcast updates efficiently. In software transactional memory (STM), commit involves updating version numbers or locks on accessed objects and integrating shadow copies into the global state, often using optimistic concurrency control.[59][7]
Abort handling reverses speculative execution to preserve consistency. In HTM implementations like TSX, aborts discard buffered writes automatically via cache invalidation. For irrecoverable cases such as exceptions, the transaction aborts and the exception is re-raised; for other aborts, the EAX register is set with a status code indicating the reason, allowing software to inspect it and invoke fallback code specified via XBEGIN. STM systems typically employ undo logs or shadow copies for rollback: writes are recorded in per-transaction logs, which are discarded on abort, or tentative versions are maintained separately until commit. This mechanism ensures that aborted transactions leave no observable effects, as if they had never executed.[59][61]
Nested transactions extend the lifecycle by composing inner transactions within outer ones, enabling modular programming. In closed nesting, an inner transaction's commit only affects the outer's private state, with visibility deferred until the outermost commit; aborts propagate upward, rolling back the entire hierarchy. Open nesting allows inner commits to release resources immediately, visible to the outer transaction but not globally until the outer commits, supporting partial progress. These models maintain isolation across levels using shared or separate read/write sets.[62]
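Closed nesting is what makes transactional code composable. Under GCC's experimental -fgnu-tm support, for example, independently written operations can be combined as in this sketch; the function names are illustrative.

__attribute__((transaction_safe))
void withdraw(long& account, long amount) {
    __transaction_atomic { account -= amount; }   // inner transaction
}

__attribute__((transaction_safe))
void deposit(long& account, long amount) {
    __transaction_atomic { account += amount; }   // inner transaction
}

void transfer(long& from, long& to, long amount) {
    __transaction_atomic {          // outer transaction: inner effects stay
        withdraw(from, amount);     // private until this outermost commit
        deposit(to, amount);
    }
}

With closed nesting, an abort anywhere inside transfer rolls back both the withdrawal and the deposit, so no state in which money has left one account but not arrived in the other is ever visible.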
Recovery from aborts often involves retrying the transaction, with repeated failures triggering a fallback to traditional locking mechanisms to guarantee progress in hybrid systems. For durability in persistent memory environments, commit protocols flush transaction logs or buffers to non-volatile storage, ensuring crash recovery reconstructs committed states via redo logs while discarding aborted ones. This integrates transactional semantics with persistence without compromising atomicity.[63][64]
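A hedged sketch of such a redo-log commit on x86 persistent memory follows, using the documented cache-line write-back intrinsics (_mm_clwb and _mm_sfence from immintrin.h); the log layout and fixed capacity are illustrative.

#include <immintrin.h>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct LogEntry { void* addr; uint64_t value; };

struct RedoLog {
    LogEntry entries[64];      // illustrative fixed capacity
    uint32_t count;
    uint32_t committed;        // commit marker: written last
};

void persist(const void* p, size_t n) {
    for (size_t off = 0; off < n; off += 64)
        _mm_clwb(static_cast<const char*>(p) + off);   // write lines back to NVM
    _mm_sfence();                                      // order flushes before later stores
}

void commit(RedoLog* log) {
    persist(log->entries, log->count * sizeof(LogEntry));   // 1. persist redo entries
    log->committed = 1;
    persist(&log->committed, sizeof log->committed);        // 2. persist commit marker
    for (uint32_t i = 0; i < log->count; ++i) {             // 3. apply updates in place
        std::memcpy(log->entries[i].addr, &log->entries[i].value, sizeof(uint64_t));
        persist(log->entries[i].addr, sizeof(uint64_t));
    }
    log->committed = 0;                                     // 4. retire the log
    persist(&log->committed, sizeof log->committed);
}

Recovery scans the log: if the commit marker is set, the entries are replayed (redo); otherwise they are discarded, matching the abort semantics described above.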
Current Implementations
Hardware Support
Hardware support for transactional memory has been integrated into several major processor architectures, enabling efficient speculative execution of atomic code regions directly in hardware. Intel's processors provide one of the most mature implementations through the Transactional Synchronization Extensions (TSX), comprising both Hardware Lock Elision (HLE) and Restricted Transactional Memory (RTM) modes. TSX appeared in Intel Core processors starting with the Haswell microarchitecture (2013) and in later generations such as Broadwell and Skylake. These extensions allow developers to mark code regions as transactions using specific instructions, with hardware managing conflict detection and rollback on aborts. TSX has been disabled by default since 2021 microcode updates due to security concerns, including vulnerabilities like Transactional Asynchronous Abort (TAA), and has been removed from recent client processors, though it can still be re-enabled via BIOS settings or software on some supported hardware; ongoing microcode updates continue to address related issues as of 2025. [https://access.redhat.com/articles/tsx-asynchronousabort]
ARM architectures incorporate transactional memory via the optional Transactional Memory Extension (TME), introduced as part of the Armv9-A architecture specification in 2021. TME provides a best-effort hardware transactional memory mechanism, supporting nested transactions and conflict detection through dedicated instructions, with potential implementation in high-performance Armv9 cores such as those in the Neoverse series for server processors. [https://developer.arm.com/documentation/ddi0617/latest/] Earlier Neoverse V1 cores, used in AWS Graviton3 since 2021, do not support TME, as they are based on Armv8.x; later Armv9 designs enable it for cloud and datacenter workloads where implemented. TME focuses on isolated execution states, with hardware buffering writes until commit, and has seen extensions in Armv9.2 for improved scalability in multi-core environments, though durable transaction guarantees remain software-dependent.
Other platforms offer hardware transactional memory (HTM) features, though adoption varies. IBM's POWER9 processors, introduced in 2017, include HTM support inherited from POWER8, enabling speculative execution with hardware-managed conflict resolution for enterprise applications; this remains available in 2025 despite end-of-support announcements for select models in 2026. [https://docs.kernel.org/arch/powerpc/transactional_memory.html] [https://www.ibm.com/support/pages/power9-power10-and-power11-system-fw-release-planned-schedule-2024-2025-updated-july-2025] Similarly, IBM z15 mainframe processors (2019) support the Transactional Execution facility, an HTM variant optimized for high-reliability workloads like financial transactions. Support in GPUs is limited; NVIDIA's architectures, including the Hopper and Blackwell series up to 2025, lack standardized HTM but include experimental coherence mechanisms for transactional-like operations in research contexts. [https://arxiv.org/html/2507.02770v1] As of 2025, RISC-V has no widespread hardware support for transactional memory, with ongoing discussions in the community but no ratified extension in the base ISA or profiles like RVA23. [https://riscv.org/blog/whats-on-tap-from-risc-v-in-2025/] [https://www.researchgate.net/publication/368673799_RISC-V_Instruction_Set_Architecture_Extensions_A_Survey]
Programmers access these features through architecture-specific intrinsics. For Intel TSX (RTM), key instructions include XBEGIN to initiate a transaction, XEND to commit it, and XABORT for explicit cancellation, with XTEST querying transaction status; these are exposed via compiler intrinsics like _xbegin() and _xend(). Compatibility notes include potential disabling of TSX in some Linux kernels (e.g., via prctl or boot parameters) for security reasons related to side-channel attacks, though it can be re-enabled. [https://access.redhat.com/articles/tsx-asynchronousabort] In ARM TME, intrinsics such as __tstart() start a transaction (returning 0 on success), __tcommit() finalizes it, and __tcancel() aborts it, with support for up to 255 nesting levels. [https://developer.arm.com/documentation/101028/latest/16--Transactional-Memory-Extension--TME--intrinsics] For IBM platforms, POWER uses instructions like TBEGIN and TEND, while z15 employs analogous transactional-execution opcodes, often abstracted in compilers like GCC. [https://docs.kernel.org/arch/powerpc/transactional_memory.html]
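As a sketch of the TME intrinsics just mentioned (ACLE's arm_acle.h, available when the compiler defines __ARM_FEATURE_TME; the single-attempt policy here is illustrative):

#include <arm_acle.h>
#include <cstdint>

long shared_value = 0;

bool tme_increment() {
    uint64_t status = __tstart();      // returns 0 when the transaction starts
    if (status == 0) {
        ++shared_value;                // tentative: buffered until commit
        __tcommit();                   // atomically publish the update
        return true;
    }
    // status encodes the abort cause; callers would retry or fall back
    return false;
}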
Software Libraries
Software transactional memory (STM) libraries provide portable implementations of transactional semantics in software, enabling concurrent programming without hardware-specific features. These libraries typically rely on fine-grained locking, optimistic concurrency control, or lock-free techniques like compare-and-swap (CAS) operations to manage conflicts and ensure atomicity.[65] Established libraries such as RSTM for C++ offer a framework for building STM systems with support for word-based and object-based transactions, emphasizing portability across architectures.[65] Similarly, TL2 implements a portable STM algorithm using thread-local read and write sets with global time stamps for conflict detection, achieving low overhead in multiprocessor environments.[66]
In functional languages, Haskell's STM library, updated in April 2024, enhances composability by allowing transactions to be nested and composed modularly, facilitating safe concurrent data structures like TVars for shared state.[67] For Python, PyPy's STM implementation is an experimental feature based on Python 2.7, enabling parallelism for pure Python code within a separate interpreter branch, but it has seen no significant updates or integration into mainline PyPy as of 2025.[68]
Recent advancements include PIM-STM, introduced in 2024, which adapts STM for processing-in-memory (PIM) architectures to minimize data movement latency by executing transactions directly in memory controllers.[69] In OCaml, the kcas library provides a lock-free STM based on multi-word compare-and-set (MCAS) operations, supporting composable concurrent abstractions without traditional locks.[70] For Scala, ZIO STM integrates transactional effects into the ZIO ecosystem, enabling functional concurrency with typed error handling and resource management within transactions.[71]
Language-specific support extends to C/C++ via libraries like libtm, which offers a lightweight STM interface for integrating transactions into existing codebases using compiler directives.[38] In Java, DeuceSTM provides a dynamic, instrumented STM that transforms bytecode to support opaque transactions, suitable for legacy applications.[72] Notably, Unreal Engine's Verse scripting language, updated in 2024, incorporates TM semantics for game development, ensuring atomic updates in multiplayer scenarios through compiler-enforced transactional blocks.[73]
Current trends in software TM libraries emphasize integration with hardware transactional memory (HTM) for fallback mechanisms in hybrid systems, support for persistent transactions on non-volatile memory (NVM) to ensure durability across failures, and developer-friendly features like source-code annotations for automatic transaction demarcation. These evolutions aim to balance performance and usability in heterogeneous computing environments.[74]
Challenges and Limitations
Transactional memory (TM) systems introduce several overheads that affect runtime performance, particularly through speculation and conflict resolution mechanisms. In low-contention environments, speculation costs arise from tracking memory accesses, validation, and commit operations; these typically add 1-10% overhead relative to non-transactional code, enabling near-native performance when conflicts are rare, as seen with Intel's TSX on microbenchmarks with small transactions.[39] Such overheads are often amortized in larger critical sections but remain noticeable in short ones.
Under high-contention workloads, abort frequency becomes a dominant issue, with rates exceeding 80% in persistent TM systems like DUDETM, necessitating multiple retries—sometimes up to dozens per transaction—to resolve conflicts and ensure progress. This amplifies wasted work, as rollbacks discard speculative state and restart execution, exacerbating latency in data structures with frequent overlapping accesses. Eager conflict detection can mitigate some aborts via stalls rather than full restarts, but dueling upgrades and friendly fire pathologies still degrade throughput by 30-70% in benchmarks like raytracing and caching.[75][76]
Scalability in TM is generally strong for small core counts, yielding linear throughput gains up to 8 cores in microbenchmarks such as hash tables, where contention remains manageable. Beyond this, performance plateaus or declines due to validation contention on shared metadata and centralized components like grace-period detection, limiting benefits on larger systems. Hardware constraints, such as implementation-defined limits on read/write set sizes in ARM's TME, further cap transaction capacity and contribute to abort-induced bottlenecks.

Despite these challenges, TM delivers notable results, such as 5x higher throughput in in-memory database indexes under mixed read-write loads using TSX, compared to traditional reader-writer locks. Rollbacks impose additional energy costs, potentially increasing consumption by 20-30% over successful executions, though overall TM can reduce total energy by up to 50% relative to lock-based synchronization by minimizing unnecessary shared memory accesses. Adaptive fallback strategies, such as dynamically switching to locks upon repeated aborts, help alleviate these issues without detailed tuning.[75][77][78][79]
Practical Constraints
Transactional memory implementations impose strict scope limitations to ensure atomicity and isolation, prohibiting operations that could introduce non-transactional side effects or external dependencies. For instance, input/output (I/O) operations and system calls are not permitted within transactions, as they may involve irrevocable actions or interactions with the operating system kernel. In hardware transactional memory (HTM) systems like Intel's Transactional Synchronization Extensions (TSX), transactions abort upon encountering such operations, including those triggered by interrupts or exceptions, to prevent partial execution states from affecting the broader system.[80][44]
Transaction size is another critical constraint, bounded by hardware resources such as cache capacities, which limit the footprint of read and write sets. In Intel TSX, for example, transactions typically span only hundreds to thousands of instructions before aborting due to capacity overflows in L1 caches or buffering structures, necessitating careful code structuring to avoid exceeding these thresholds.[81][82]
Security vulnerabilities further restrict practical deployment of transactional memory, particularly in speculative hardware implementations. The TSX Asynchronous Abort (TAA) vulnerability (CVE-2019-11135), disclosed in 2019, enables unprivileged speculative access to sensitive data in CPU buffers during transaction aborts, posing risks to confidentiality. Mitigations include microcode updates from Intel that disable TSX or clear affected buffers on context switches, with these updates integrated into processor firmware by 2021 and remaining standard in supported systems as of 2025.[53] Additionally, the speculative nature of transactional execution exposes systems to broader side-channel attacks, such as those exploiting transient data leaks during aborted speculations, amplifying risks in multi-tenant environments.[83]
Debugging transactional memory programs presents significant hurdles due to their inherent non-determinism. Abort events, often caused by conflicts or capacity issues, vary across executions, making it difficult to reproduce and isolate bugs without specialized tools. Unlike traditional locking mechanisms, which offer predictable contention points, transactional aborts provide limited diagnostic information—such as coarse error codes in Intel TSX—lacking precise details on conflicting addresses or execution paths. The scarcity of mature debugging tools exacerbates this, as standard debuggers struggle with speculative states and parallel retries, often requiring ad hoc workarounds that alter program behavior.[84]
Adoption of transactional memory faces systemic barriers, including the absence of a standardized interface in major languages. As of 2025, C++ lacks built-in transactional memory support in its core standard, with proposals remaining as technical specifications rather than integrated features, complicating portability across compilers and platforms. Fallback mechanisms, essential for handling frequent aborts, introduce additional complexity by requiring hybrid designs that revert to locks or other synchronization primitives, which can lead to deadlocks, memory leaks, or performance inconsistencies if not meticulously implemented. Furthermore, transactional memory is inherently confined to shared-memory architectures, unsuitable for distributed systems where data resides across non-coherent nodes without explicit message passing.[85][86]