
MOESI protocol

The MOESI protocol is a cache coherence protocol used in multiprocessor and multi-core systems to maintain consistency across multiple caches that may hold copies of the same memory block, ensuring that all processors observe a single, unified view of shared data. Implemented in AMD's x86-64 processors beginning with the Opteron in 2003, it extends the standard MESI protocol by incorporating five distinct states, Modified (M), Owned (O), Exclusive (E), Shared (S), and Invalid (I), which govern how cache lines are accessed, updated, and shared, while optimizing bus traffic through mechanisms like cache-to-cache data transfers. In the MOESI protocol, the Invalid (I) state indicates that a cache line contains no valid data, which must be fetched from main memory or another cache on access. The Exclusive (E) state signifies that the cache holds the only clean copy of the line, matching main memory and allowing local writes without bus notification. The Shared (S) state denotes multiple clean copies across caches, permitting read-only access but requiring invalidation of other copies before a write. The Modified (M) state represents a unique, dirty copy that has been altered locally, making main memory stale and necessitating a write-back on eviction. Uniquely, the Owned (O) state handles shared dirty data: one cache acts as the owner of the modified line, supplying it to other caches on read requests without an immediate write-back to memory, thus enabling efficient sharing while deferring memory updates. State transitions in MOESI are triggered by local processor actions (reads, writes) or by bus snooping events, such as a remote read (BusRd) that converts M to O for data supply or E to S for sharing, and a remote write request (BusRdX) that invalidates S, M, or O lines while flushing dirty data. This design reduces coherence overhead compared to MESI by avoiding redundant memory accesses in producer-consumer scenarios, where the owner can transfer data directly between caches. The protocol is implemented in hardware by processor vendors, notably in AMD's x86-64 architectures such as the Opteron series for maintaining coherency in symmetric multiprocessing environments, and in many ARM-based multi-core designs. Its advantages include lower latency for shared data access and reduced bandwidth usage on the interconnect, making it suitable for high-performance computing workloads where cache contention is common.
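
The five states can be summarized by a handful of per-line properties. The sketch below is a hypothetical Python rendering of that summary (the enum and table are illustrative, not taken from any vendor implementation), recording for each state whether the line is valid, whether main memory is up to date, whether other caches may hold copies, and whether the holder must supply data on a snoop hit.

```python
from enum import Enum

class State(Enum):
    MODIFIED = "M"   # dirty, sole copy; holder must supply data on snoops
    OWNED = "O"      # dirty, shared; owner supplies data, memory is stale
    EXCLUSIVE = "E"  # clean, sole copy; writable without a bus transaction
    SHARED = "S"     # read-only copy, possibly mirroring an Owned line
    INVALID = "I"    # no usable data; any access is a miss

# (valid, memory_up_to_date, others_may_share, must_supply_on_snoop)
PROPERTIES = {
    State.MODIFIED:  (True,  False, False, True),
    State.OWNED:     (True,  False, True,  True),
    State.EXCLUSIVE: (True,  True,  False, False),
    State.SHARED:    (True,  None,  True,  False),  # memory stale only if an Owned copy exists
    State.INVALID:   (False, None,  None,  False),
}

for state, (valid, mem_fresh, shared, supplies) in PROPERTIES.items():
    print(f"{state.value}: valid={valid}, memory_up_to_date={mem_fresh}, "
          f"others_may_share={shared}, supplies_on_snoop={supplies}")
```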

Introduction

Definition and Purpose

The MOESI protocol is a cache coherency mechanism employed in shared-memory multiprocessor systems to maintain consistency among multiple cached copies of data. It utilizes five distinct states, Modified (M), Owned (O), Exclusive (E), Shared (S), and Invalid (I), to track the status of individual cache lines, where each state defines the permissions for reading, writing, and sharing data while ensuring that all caches observe a consistent view of memory. This protocol builds on earlier designs like MESI by incorporating the Owned state, allowing for more efficient handling of shared modified data without immediate updates to main memory. The primary purpose of the MOESI protocol is to address the cache coherence problem in multiprocessor environments, where multiple processors may simultaneously access and modify the same data, by enforcing rules for cache line modifications, sharing, and invalidation to prevent the propagation of stale or inconsistent data across caches. It guarantees key coherence invariants, such as single-writer-multiple-reader access patterns and the delivery of the most recent data value to requesting caches, thereby preserving the illusion of a single, unified memory system despite distributed caches. At its core, the MOESI protocol optimizes system performance by minimizing bandwidth consumption through state transitions that reduce unnecessary write-backs to main memory, such as retaining ownership of modified data in the Owned state to supply it directly to other caches, or enabling silent upgrades from Exclusive to Modified without bus notifications. This approach lowers coherence overhead in bus-based or directory-based implementations, making it suitable for scalable multicore architectures.

Historical Context

The MOESI protocol emerged as an extension of the earlier MESI protocol during the late 1980s, introduced in 1986 by Paul Sweazey and Alan Jay Smith in their paper on a class of compatible cache consistency protocols supported by the IEEE Futurebus. This development addressed the growing demands of multiprocessor systems, where increasing numbers of processors necessitated more nuanced state management to handle shared data efficiently and minimize unnecessary memory traffic in snooping-based architectures. Key milestones in MOESI's adoption occurred in the 2000s and 2010s, with early commercial implementations appearing in server environments. IBM's POWER5 microprocessor, released in 2004, incorporated a snooping coherence protocol featuring MOESI-like states to support coherence across multi-chip modules, marking one of the first major deployments in enterprise server architectures. By the 2010s, the protocol saw widespread integration into ARM-based processors, where most designs, excluding the Cortex-A9 introduced in 2007, adopted MOESI to optimize coherence in mobile and embedded systems with rising core counts. The evolution of MOESI was primarily driven by the need to enhance performance in scaling multiprocessor designs, particularly by introducing the Owned state to better support producer-consumer sharing patterns without excessive write-backs to memory, thereby reducing bandwidth contention in systems transitioning from uniprocessors to multicore configurations. Notable early adopters included IBM's POWER line for scalable servers and ARM Cortex-A cores in consumer devices, reflecting its role in enabling efficient shared-memory parallelism amid the multicore microprocessor proliferation of the era.

Background Concepts

Cache Coherence Fundamentals

In shared-memory multiprocessor systems, the cache coherence problem emerges when multiple processors maintain private caches that hold copies of the same blocks from main memory, potentially resulting in inconsistent views of shared data across the system. This inconsistency arises because updates to a block in one cache are not automatically propagated or reflected in other caches, leading to processors operating on divergent copies of the same memory location. Key challenges include the risk of stale data, where a processor reads an obsolete version of a memory block while another processor has already modified it, and race conditions during concurrent writes, where the order of updates becomes unpredictable without synchronization. Cache coherence protocols address these by enforcing write serialization, ensuring that writes to shared data appear in the same order to all processors, and write propagation, guaranteeing that updates become visible to subsequent reads by other processors in a timely manner. These mechanisms prevent incorrect execution in parallel environments, such as when one processor updates a shared variable and others must observe the change to avoid errors in computation. Two primary approaches to maintaining coherence are snooping-based and directory-based protocols. Snooping protocols rely on a shared interconnect, such as a bus, where each cache controller monitors (or "snoops") all transactions broadcast by other processors to detect and respond to events affecting cached data, making them efficient for small-scale systems with broadcast support. In contrast, directory-based protocols use a centralized or distributed directory to track the state and location of cached blocks, enabling point-to-point communication without broadcasts, which scales better to larger systems but introduces higher latency and storage overhead for directory maintenance. Snooping remains relevant for bus-connected multiprocessors due to its simplicity and low implementation cost. As background for understanding cache states in coherence protocols, multi-level cache hierarchies often employ inclusive or exclusive policies to manage data replication. An inclusion policy requires that all data in higher-level caches (e.g., L1) is also present in lower-level caches (e.g., L2 or L3), simplifying coherence by allowing lower levels to serve as authoritative copies and filter snoops, though it may duplicate data and reduce effective capacity. An exclusive policy, conversely, prohibits the same data block from residing in multiple cache levels simultaneously, maximizing total cache capacity by avoiding duplication but requiring more complex mechanisms to handle data movement and coherence during transfers between levels. These policies influence how coherence traffic and state transitions are optimized in hierarchical designs.
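
To make the stale-data hazard concrete, the toy sketch below models two private write-back caches as Python dictionaries in front of a shared memory, with no coherence protocol at all; after P0 updates its private copy, P1 continues to read the old value. The variable names and the single-address memory are illustrative assumptions.

```python
memory = {0x100: 0}   # shared main memory: address -> value

cache_p0 = {}         # private cache of processor P0
cache_p1 = {}         # private cache of processor P1

# Both processors read address 0x100 and cache the value.
cache_p0[0x100] = memory[0x100]
cache_p1[0x100] = memory[0x100]

# P0 updates its private copy (write-back cache: memory is not updated yet).
cache_p0[0x100] = 42

# Without a coherence protocol, P1 still observes the stale value.
print("P0 sees:", cache_p0[0x100])   # 42
print("P1 sees:", cache_p1[0x100])   # 0  <- stale read, the problem coherence protocols solve
```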

Evolution from MESI

The MESI (Modified, Exclusive, Shared, Invalid) cache coherence protocol, introduced in 1984, provides a foundational mechanism for maintaining consistency in multiprocessor systems with private write-back caches by tracking the status of cache lines across multiple caches. In MESI, the Modified state indicates a cache line that has been altered and is the sole copy, Exclusive denotes a clean unique copy, Shared represents read-only copies in multiple caches, and Invalid marks unusable lines; however, this design requires that modified data be written back to main memory before it can be shared, leading to increased bus traffic in scenarios involving frequent reads of recently modified data. The MOESI protocol, proposed in 1986 as part of a class of compatible protocols for the IEEE Futurebus, extends MESI by introducing a fifth state, Owned, to address these limitations. The Owned state allows a cache to hold a modified (dirty) copy of a line as the authoritative owner while permitting other caches to obtain read-only shared copies directly from the owner via cache-to-cache transfer, without requiring an immediate write-back to memory. This addition merges the benefits of the Modified state's dirtiness with the Shared state's multi-reader capability, enabling efficient handling of data that is both actively modified by one cache and accessed by others. The primary motivation for the Owned state in MOESI stems from the need to minimize bus contention in bus-based multiprocessor architectures, where MESI's mandatory memory write-backs for sharing dirty data can bottleneck performance under workloads with shared writable data. By facilitating direct cache-to-cache transfers during snooping operations, MOESI reduces the number of memory accesses and overall bus traffic, a benefit particularly relevant to systems built around the high-performance 32-bit microprocessors of the era. This evolution resolves MESI's inefficiency in ownership scenarios, where data requires centralized modification control without full exclusivity, paving the way for more scalable coherence in shared-memory environments.

Protocol States

Modified State

In the MOESI cache coherence protocol, the Modified state signifies that the local cache holds the sole, up-to-date (dirty) copy of a cache line, which has been altered by the local processor and no longer matches the stale version in main memory. This state ensures the data's exclusivity, preventing other caches from accessing it without a coherence transaction, and positions the local cache as the authoritative source for the line. The cache line in this state is writable without bus transactions for local operations, but any modifications must eventually propagate to maintain system-wide consistency. A cache line enters the Modified state via a local write operation to a line previously in the Exclusive state, where the write dirties the clean exclusive copy while preserving its unshared nature, or to an Owned line via a local write that regains exclusivity. Entry can also occur from the Shared or Invalid states on a write miss that acquires exclusive access and modifies the data. The Modified state is exited through several mechanisms to uphold coherence. On eviction or explicit flush, the dirty data is written back to main memory, typically transitioning the line to Invalid. A snoop hit from a read request by another processor prompts the local cache to supply the updated data, typically transitioning to the Owned state to allow shared access to the modified copy without an immediate memory update. For a snooped write request, the local cache supplies the data and transitions to Invalid, ensuring the requester gains exclusivity. This state plays a critical role in write serialization, guaranteeing that modifications become visible to other processors in a consistent order by enforcing exclusivity until the line is shared or discarded. The owning cache bears responsibility for responding to snoop requests with the latest data, optimizing memory traffic by enabling direct cache-to-cache transfers and deferring memory writes.
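
A minimal sketch of these exits, assuming a simplified snooping model in which a cache line is just a dictionary and bus actions are returned as tuples (the function and event names are illustrative, not from any hardware specification):

```python
def snoop_in_modified(line, event):
    """Hypothetical snoop/eviction handling for a line currently in state M.
    `line` is a dict like {"state": "M", "addr": 0x100, "data": 42}."""
    actions = []
    if event == "BusRd":              # remote read: supply dirty data cache-to-cache
        actions.append(("supply_data", line["data"]))
        line["state"] = "O"           # M -> O: still the owner, now shared
    elif event == "BusRdX":           # remote write intent: supply data, then invalidate
        actions.append(("supply_data", line["data"]))
        line["state"] = "I"
    elif event == "Evict":            # replacement: dirty data must reach memory
        actions.append(("write_back", line["addr"], line["data"]))
        line["state"] = "I"
    return actions

line = {"state": "M", "addr": 0x100, "data": 42}
print(snoop_in_modified(line, "BusRd"), line["state"])   # supplies data, ends in O
```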

Owned State

In the MOESI cache coherence protocol, the Owned state indicates that a cache line contains modified data that is potentially shared across multiple caches, with the owning cache holding the most recent and authoritative copy while main memory remains stale. Unlike the Modified state, which enforces exclusivity, the Owned state permits other caches to hold read-only copies in the Shared state, allowing the owner to supply updated data directly to requesters. This state distinguishes MOESI from MESI and is implemented in systems such as AMD's x86-64 architecture to handle scenarios where dirty data needs to be accessed concurrently without immediate memory intervention. A line enters the Owned state primarily from the Modified state when a snoop request for shared access arrives from another cache, prompting the owner to downgrade its exclusivity while retaining responsibility for supplying the data. These entry conditions ensure that modifications propagate efficiently in multi-core environments without unnecessary bus traffic. The Owned state is exited upon eviction, transitioning to Invalid after a write-back to main memory to preserve the modifications; on a local exclusive write by the owner, it upgrades to Modified to regain full exclusivity; or it moves to Shared or Invalid in response to broader sharing or invalidation requests that relinquish ownership. Only one cache can hold a given line in the Owned state at a time, maintaining coherence by designating a single point of authority for the dirty data. The primary benefit of the Owned state lies in facilitating direct cache-to-cache transfers of dirty data, where the owner supplies the latest version to other caches, thereby avoiding costly write-backs to main memory and reducing overall latency and bandwidth usage on the interconnect. This optimization is particularly valuable in shared-memory multiprocessors, as it minimizes delays in data dissemination compared to protocols lacking this state.
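
The deferred write-back can be illustrated with a small sketch, under the assumption that the owner simply hands out its dirty value on each read request and writes memory only once, on eviction; the function names and counters are hypothetical.

```python
memory_writes = 0

def owner_serves_read(owner_line):
    """Owner in O supplies its dirty data; no write-back, state stays O."""
    assert owner_line["state"] == "O"
    return owner_line["data"]          # direct cache-to-cache transfer

def owner_evicts(owner_line):
    """Only on eviction does the dirty data reach main memory."""
    global memory_writes
    memory_writes += 1                 # single deferred write-back
    owner_line["state"] = "I"

owner = {"state": "O", "addr": 0x200, "data": 99}
readers = [owner_serves_read(owner) for _ in range(4)]   # four consumers served
owner_evicts(owner)
print(readers, "memory writes:", memory_writes)          # [99, 99, 99, 99] memory writes: 1
```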

Exclusive State

In the MOESI cache coherence protocol, the Exclusive state denotes a cache line that is valid, unmodified (clean), and identical to the corresponding block in main memory, while being held solely by one cache with no copies present in any other caches. This state initially grants the holding cache read access and exclusive permission, without the ownership responsibility of supplying the data to others. Unlike the Shared state, which permits multiple copies across caches, Exclusive maintains uniqueness to support private data handling. A cache line enters the Exclusive state primarily through a read miss where the requesting cache issues a GetS (or equivalent snoop request) and confirms via directory or bus snooping that no other cache holds a valid copy, prompting a fetch from main memory or the last-level cache. This transition commonly occurs from the Invalid state on an uncontested load miss, where the response provides unique data without indications of sharing. The process avoids unnecessary coherence actions if exclusivity is verified, streamlining initial acquisition of private data. From the Exclusive state, a local write by the holding processor transitions the line to the Modified state without requiring a bus transaction, as no other caches need notification. A snoop hit from another cache's read request changes it to the Shared state, allowing multiple readers while preserving cleanliness. Invalidation requests from other caches (e.g., for their writes) demote it to Invalid, and eviction is silent with no write-back to memory due to the clean nature of the data. In contrast to the Modified state, which involves dirty (modified) private data requiring an eventual write-back, Exclusive handles only unmodified lines. The Exclusive state optimizes performance for private, clean data by enabling efficient read-then-write sequences, as the silent upgrade to Modified halves coherence traffic compared to starting from Shared. It reduces bus contention and latency in multiprocessor systems by minimizing invalidations for non-shared blocks, which is particularly beneficial in workloads with localized access patterns. This design supports single-writer, multiple-reader invariants without immediate memory updates, enhancing overall system efficiency in implementations like those in AMD processors.
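
A short sketch of the read-then-write benefit, counting only bus transactions under a simplified model (the local_write helper and its return convention are assumptions for illustration): upgrading from E to M is silent, while upgrading from S or O requires a bus upgrade, and a miss from I requires a BusRdX.

```python
def local_write(line):
    """Hypothetical write handling; returns the bus transactions issued."""
    if line["state"] == "M":
        return []                       # already exclusive and dirty
    if line["state"] == "E":
        line["state"] = "M"             # silent upgrade: no other copies exist
        return []
    if line["state"] in ("S", "O"):
        line["state"] = "M"             # other sharers must be invalidated first
        return ["BusUpgr"]
    line["state"] = "M"                 # write miss from I
    return ["BusRdX"]

private = {"state": "E"}
shared = {"state": "S"}
print(local_write(private))   # []          -> read-then-write to private data is free
print(local_write(shared))    # ['BusUpgr'] -> shared data pays one invalidation
```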

Shared State

In the MOESI cache coherence protocol, the Shared state indicates that a cache line is present in multiple caches across processors, with all copies containing the most recent and correct data: the copies match the main memory copy unless an Owned copy exists, in which case they match the modified Owned data and main memory is stale, and any holder may read the line without requiring an ownership transfer. This state ensures that the cached data itself remains up-to-date across caches, even though main memory may be stale when the modifications are held by an owner elsewhere. A line enters the Shared state primarily through a read operation where the requesting cache receives the data from another cache that supplies it (typically from that cache's Exclusive, Modified, or Owned state), or via a shared read miss resolved directly from main memory when no other cache holds a modified version. In such cases, the bus or interconnect responds with a shared signal, indicating that multiple caches can retain valid copies, often triggered by a snoop hit that confirms the line's presence in other caches without granting exclusive access. The Shared state is exited when a local processor attempts to write to the line, prompting an invalidation of all other shared copies across the system and transitioning the local copy to the Modified state to maintain coherence. Alternatively, eviction of the line due to replacement requires no write-back to memory, since a Shared copy carries no ownership responsibility, simply moving it to the Invalid state without further bus traffic. This state facilitates efficient multi-processor read sharing by permitting concurrent read access without the overhead of ownership designation, as snooping mechanisms can verify and propagate the shared status across caches, reducing bus traffic for read-heavy workloads in systems like AMD's multi-core architectures. Unlike the Exclusive state, which limits the clean copy to a single cache for potential future writes, Shared explicitly supports multiple holders for optimized read distribution. In contrast to the Owned state, which involves a dirty copy with a designated supplier and stale memory, a Shared line may hold either clean data matching memory or data matching an Owned (dirty) copy.
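
From the sharer's perspective, the transitions above reduce to a small snoop handler, sketched below under the same simplified dictionary-based model used earlier; the SharedOK response name follows the signal naming used later in this article, and the rest is illustrative.

```python
def snoop_in_shared(line, event):
    """Hypothetical handling for a line held read-only in state S."""
    if event in ("BusRdX", "BusUpgr"):   # another cache is about to write
        line["state"] = "I"              # drop the read-only copy
    elif event == "Evict":               # no ownership responsibility: no write-back
        line["state"] = "I"
    elif event == "BusRd":               # another reader: assert the shared signal
        return "SharedOK"                # copy stays in S
    return None

copy = {"state": "S"}
print(snoop_in_shared(copy, "BusRd"), copy["state"])    # SharedOK S
print(snoop_in_shared(copy, "BusUpgr"), copy["state"])  # None I
```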

Invalid State

In the MOESI cache coherence protocol, the Invalid state indicates that a cache line does not contain a valid copy of the data, meaning it is either absent from the cache or holds stale information that cannot be used by the processor. This state serves as the baseline condition where no local data is available, necessitating retrieval from main memory or another cache upon any access attempt. Cache lines enter the Invalid state under several conditions, including system power-up or reset, where all cache lines initialize to this state to ensure a clean starting point free of unverified data. Eviction of a cache line to make room for new data also transitions it to Invalid, preventing the retention of potentially outdated entries. Additionally, snooping mechanisms trigger invalidation when another processor issues a write request, such as through a BusRdX transaction, which broadcasts an invalidation to maintain coherence across caches. Exiting the Invalid state occurs primarily on cache misses. A read miss prompts the processor to issue a BusRd request, transitioning the line to the Exclusive state if no other caches hold a copy or to the Shared state if copies exist elsewhere, allowing read access without modification. For a write miss, the processor issues a BusRdX request to acquire exclusive ownership, invalidating copies in other caches and transitioning directly to the Modified state after updating the data, ensuring the write propagates correctly. The Invalid state plays a critical role as the foundational point for all coherence operations in MOESI, guaranteeing that processors never operate on inconsistent or obsolete data by forcing fresh fetches on every access from this condition. This mechanism underpins the protocol's reliability in multiprocessor systems, where it prevents data races and maintains a single consistent view of memory.
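
The exits from Invalid can be sketched as a requester-side decision, assuming the bus reports whether any other cache asserted a shared signal; the helper below is illustrative only.

```python
def resolve_miss(access, other_copies_exist):
    """Hypothetical state chosen by a requester whose line starts in I."""
    if access == "read":
        # BusRd: exclusive if no one else holds the line, otherwise shared
        return "E" if not other_copies_exist else "S"
    if access == "write":
        # BusRdX: all other copies are invalidated, writer takes the line dirty
        return "M"

print(resolve_miss("read", other_copies_exist=False))   # E
print(resolve_miss("read", other_copies_exist=True))    # S
print(resolve_miss("write", other_copies_exist=True))   # M
```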

Operational Mechanics

Read Operations

In the MOESI protocol, a read operation begins when a processor requests data from its local cache. If the request results in a hit, meaning the cache line is present in one of the valid states (Exclusive, Shared, Owned, or Modified), the data is supplied directly from the local cache without initiating any bus traffic or state changes. Specifically, in the Exclusive or Shared states, no further action is required, as the data is already accessible and consistent across caches. If the hit occurs in the Owned state, the local cache supplies the data internally and can respond to snoops from other caches by providing the most recent copy without altering its own state. If the hit occurs in the Modified state, the local cache likewise supplies the data internally, but responding to a snooped read request from another cache transitions the line to the Owned state to reflect shared dirty data. For a read miss, where the cache line is in the Invalid state, the requesting processor issues a bus read request to acquire the data. This triggers snoop probes to other caches to check for existing copies. If no other caches respond with a shared signal or data supply, the data is fetched from main memory, and the requesting cache transitions to the Exclusive state, indicating it holds the only clean copy. However, if other caches assert a shared response, such as from the Shared or Exclusive states, the requesting cache transitions to the Shared state and obtains the data, typically from memory since those states hold clean copies. If a cache in the Owned or Modified state detects the snoop, it supplies the latest (potentially dirty) data directly to the requestor, causing the requesting cache to transition to the Shared state; a supplier in Owned remains in Owned, while one in Modified transitions to Owned to reflect shared ownership of the dirty data. This snoop-based mechanism optimizes read misses by allowing caches in the Owned or Modified states to supply data peer-to-peer, bypassing main memory access when the supplier holds the up-to-date version. As a result, the Owned state remains unchanged after supplying a copy, preserving its role as the authoritative holder for subsequent reads without forcing a write-back. The requesting line always transitions to either Exclusive or Shared upon a successful read miss resolution, ensuring the data is now valid and coherent.
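
To complement the requester-side view, the sketch below shows how each possible holder might respond to a snooped BusRd under the same simplified model; the respond_to_busrd helper and its return tuples are assumptions for illustration.

```python
def respond_to_busrd(line):
    """What a snooping cache does when it sees a BusRd for a line it holds."""
    state = line["state"]
    if state == "M":
        line["state"] = "O"                       # keep ownership, now shared
        return ("supply_data", line["data"])      # dirty data goes cache-to-cache
    if state == "O":
        return ("supply_data", line["data"])      # owner keeps supplying, stays O
    if state == "E":
        line["state"] = "S"                       # clean copy becomes shared
        return ("assert_shared", None)            # memory can supply the data
    if state == "S":
        return ("assert_shared", None)            # stays S
    return (None, None)                           # I: not involved

for s in ("M", "O", "E", "S", "I"):
    line = {"state": s, "data": 7}
    print(s, "->", respond_to_busrd(line), "now", line["state"])
```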

Write Operations

In the MOESI protocol, write operations are handled differently based on whether the cache line is present (hit) or absent (miss) in the requesting cache, ensuring coherence through state transitions and bus interventions. For a write hit in the Modified state, the processor updates the data locally without any bus activity, as it already holds exclusive ownership of the modified copy, keeping the line in the Modified state. Similarly, a write hit in the Owned state allows local modification but transitions the line to the Modified state, revoking shared read access from other caches to enforce exclusive write permission. In the Exclusive state, a write hit silently updates the unmodified copy and changes the state to Modified, with no need for invalidations since no other caches hold the line. However, for a write hit in the Shared state, the requesting cache issues a bus upgrade request to invalidate all other shared copies, transitioning its own line to Modified only after confirmations ensure no lingering sharers. Write misses occur when the line is in the Invalid state, prompting the processor to issue a read-for-ownership request (such as BusRdX) on the bus to fetch the data while simultaneously flushing or invalidating any Modified or Owned copies in other caches. The responding cache or main memory provides the data, and the requesting cache acquires the line in the Modified state, ready for the write, with other caches transitioning Shared lines to Invalid if they held copies. If an Owned copy exists elsewhere, the owner supplies the latest data directly, invalidating its own copy and allowing the requester to take Modified ownership without involving main memory. Key state transitions during writes include non-owner caches moving from Shared to Invalid upon receiving invalidation signals, while the writer's line shifts from Invalid to Modified on a miss, or from Exclusive, Shared, or Owned to Modified on a hit. Post-write, if the line is later shared via read requests from other processors, it may transition from Modified to Owned, but this occurs independently of the initial write sequence. The protocol serializes writes through these bus upgrades and exclusive read requests, preventing concurrent modifications and ensuring that only one cache holds writable ownership at a time.
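
The serialization described in the last sentence can be sketched as follows, under the assumption of a single bus that grants one upgrade at a time: two caches holding the same line in S both attempt to write, the winner's BusUpgr invalidates the loser's copy, and the loser must then issue a BusRdX that observes the winner's value. The function and value names are hypothetical.

```python
def arbitrate_writes(cache_a, cache_b):
    """Hypothetical serialization of two concurrent writes to the same S line.
    The bus grants cache_a's BusUpgr first; cache_b's copy is invalidated and
    it must retry with a BusRdX, which pulls the winner's dirty data."""
    # Cache A wins arbitration: its BusUpgr invalidates B's shared copy.
    cache_a["state"], cache_a["data"] = "M", "A-value"
    cache_b["state"] = "I"

    # Cache B retries as a write miss: BusRdX makes A supply data and invalidate.
    observed = cache_a["data"]            # cache-to-cache transfer of the dirty line
    cache_a["state"] = "I"
    cache_b["state"], cache_b["data"] = "M", "B-value"  # B's write happens second
    return observed

a = {"state": "S", "data": 0}
b = {"state": "S", "data": 0}
print(arbitrate_writes(a, b))   # A-value: B observed A's write before overwriting it
print(a["state"], b["state"])   # I M -> exactly one writable owner at any time
```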

Snoop and Bus Transactions

In the MOESI protocol, caches enforce coherence through a snooping mechanism in which each cache controller continuously monitors all bus transactions for addresses corresponding to blocks in its local cache. Upon detecting a relevant transaction, the snooper evaluates the local state (Modified, Owned, Exclusive, Shared, or Invalid) and generates an appropriate response to maintain system-wide consistency, such as supplying data or invalidating copies. This broadcast-based approach leverages the shared bus to propagate coherence actions efficiently without centralized directories. The protocol defines three primary bus transaction types to handle read and write accesses: BusRd, BusRdX, and BusUpgr. BusRd is initiated by a cache on a read miss to obtain a shared copy of the block; snooping caches respond by supplying data if holding it in the Modified or Owned states, or by asserting a shared signal if in the Shared state, allowing main memory to provide the block otherwise. BusRdX is used for exclusive access, typically preceding a write, where the requesting cache obtains the block while snooping caches invalidate their copies and supply data if they are the current owner. BusUpgr occurs on a write hit to a Shared block, upgrading it to Modified without fetching data but requiring snooping caches to invalidate their Shared copies to ensure exclusivity. These transactions support the Owned state's role in direct cache-to-cache transfers, reducing memory traffic compared to simpler protocols. Snoop responses are encoded as bus signals that collectively determine transaction outcomes and trigger state changes. For instance, a SharedOK signal is asserted by any cache holding a Shared copy during a BusRd, informing the requester that multiple copies exist and downgrading the state it installs from Exclusive to Shared. A Modified or Owned response from a snooper indicates that it will supply the dirty data, with a Modified line transitioning to Owned while the requester installs the block in the Shared state. A SupplyData response facilitates the actual transfer from the owner cache, often in conjunction with invalidation acknowledgments that confirm coherence actions. These responses are typically implemented via wired-OR lines on the bus, allowing concurrent assertions from multiple snoopers to be resolved atomically. Bus arbitration mechanisms ensure atomicity in multi-cache interactions by serializing access to the shared medium. A central arbiter grants the bus to one agent at a time using fair algorithms such as round-robin or priority-based schemes, preventing overlapping transactions that could violate ordering. In split-transaction buses, additional tags identify outstanding requests, and response phases are separated from request phases to improve utilization while maintaining coherence ordering; snoopers must also handle transactions held in write buffers to avoid races during writebacks or interventions.
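
The wired-OR combination of snoop responses can be sketched as a simple reduction over per-cache responses, as assumed below: if any snooper asserts the shared signal, the requester installs the line in S rather than E, and if any snooper in M or O will supply data, the block comes from that cache rather than from memory. The dictionary-based response format is illustrative.

```python
def combine_snoop_responses(responses):
    """Hypothetical wired-OR combination of per-cache snoop responses to a BusRd.
    Each response is a dict like {"shared": bool, "will_supply": bool}."""
    shared = any(r["shared"] for r in responses)         # wired-OR SharedOK line
    supplier = any(r["will_supply"] for r in responses)  # an M/O cache intervenes
    requester_state = "S" if shared else "E"
    data_source = "owner cache" if supplier else "memory"
    return requester_state, data_source

# One cache holds the line in O, another in S, the rest are not involved.
responses = [
    {"shared": True, "will_supply": True},    # owner in O
    {"shared": True, "will_supply": False},   # sharer in S
    {"shared": False, "will_supply": False},  # line not present
]
print(combine_snoop_responses(responses))     # ('S', 'owner cache')
```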

Advantages and Implementations

Key Advantages

The MOESI protocol offers significant performance benefits through its Owned state, which enables the sharing of modified (dirty) data among multiple caches without requiring an immediate write-back to main memory. This state allows one cache to act as the authoritative owner, supplying data directly to requesting caches via cache-to-cache transfers, thereby eliminating unnecessary memory interventions. As a result, MOESI reduces overall memory traffic in scenarios involving frequent shared modifications, such as producer-consumer workloads common in multi-threaded applications. One key advantage is the substantial reduction in bus bandwidth usage, particularly in shared-write intensive environments. By avoiding the write-back overhead that protocols like MESI incur when transitioning modified data to a shared state, MOESI can cut bus traffic significantly; for instance, benchmarks on systems using shared-buffer reuse patterns show throughput improvements of 5-50% compared to configurations without optimized dirty-data sharing. This efficiency stems from fewer bus transactions, as the Owned state facilitates direct data forwarding, minimizing contention on shared interconnects. Additionally, MOESI provides lower access latency for shared modified data by enabling direct cache-to-cache communication, bypassing slower round-trips to main memory. In dual-core evaluations, such as those on Athlon 64 X2 processors, cache-to-cache transfer latencies are reported at around 68 ns, contributing to overall system speedups in multi-programmed workloads without the delays associated with write-backs. This latency reduction enhances responsiveness in coherent multi-core environments. The protocol's design also improves scalability for multi-core systems with high sharing degrees, as the Owned state better manages data ownership among numerous processors, reducing overhead as core counts increase. Simulations and power-consumption analyses indicate that MOESI maintains lower broadcast-traffic growth compared to simpler protocols, supporting efficient operation in systems with 4-8 cores or more by optimizing shared-data handling.

Hardware Implementations

In x86-64 architectures, AMD has implemented the MOESI protocol in processors like the Opteron series since the early 2000s, using it for snoopy cache coherence in multi-socket symmetric multiprocessing (SMP) environments. The protocol operates over the HyperTransport interconnect for inter-processor communication, enabling efficient data sharing across nodes. Later designs, such as those in the Zen architecture (e.g., the Ryzen and EPYC series as of 2025), continue to employ MOESI variants for maintaining coherency in multi-core and multi-chiplet configurations. In ARM-based processors, the MOESI protocol is the standard for most Cortex-A series cores, enabling coherent multi-core operation in system-on-chip (SoC) designs. For instance, the Cortex-A53 implements MOESI to maintain data coherency across its L1 data caches, with states encoded in the tag RAM for shareable lines. Similarly, the Cortex-A72 utilizes a hybrid approach with MOESI for L2 cache integration and MESI for L1, supporting both inclusive and exclusive cache policies where the L2 acts as an inclusive backing store to higher-level caches. In contrast, the earlier Cortex-A9 deviates by using the simpler MESI protocol, lacking the Owned state for optimized dirty-data transfers. Modern implementations of MOESI also appear in mobile and laptop SoCs, where it facilitates multi-core coherence in power-constrained environments. Qualcomm's Snapdragon X series, featuring Oryon cores, employs MOESI for full coherency across its L1 and shared L2 caches, with a 64-byte line size to handle read/write transactions efficiently (as of 2024). Variations of MOESI extend to directory-based systems for scalability in larger multi-core setups, where a centralized directory tracks line ownership to reduce snoop traffic. In chip multiprocessor (CMP) designs, such as those simulated in gem5, a directory-based MOESI protocol coalesces requests and handles the Owned state to minimize overhead in non-inclusive hierarchies. These adaptations integrate with L2 and L3 caches, supporting inclusive policies in SoCs for broader coherence domains and exclusive policies to avoid redundant data storage.

Comparisons

With MESI Protocol

The MOESI protocol extends the MESI protocol by introducing an Owned state, which allows a cache line to be both modified (dirty) and shared among multiple caches without requiring an immediate write-back to main memory. In contrast, MESI's Modified state is exclusive to a single cache, and any attempt to share modified data necessitates a write-back to memory before transitioning to the Shared state, followed by other caches acquiring a clean copy. The Owned state in MOESI effectively merges the behaviors of MESI's Modified and Shared states for scenarios involving dirty-data sharing, enabling the owning cache to supply the most recent copy directly to requesting caches while retaining responsibility for eventual memory updates. Transaction differences arise primarily in write-sharing operations, where MOESI optimizes bus activity by permitting direct cache-to-cache transfers of dirty data via the Owned state, avoiding the need for a full write-back and subsequent exclusive read (such as a BusRdX transaction) in many cases. For instance, when a cache in the Owned state receives a read request, it can forward the dirty data to the requester without flushing to memory, reducing the number of bus cycles compared to MESI, which would require the Modified cache to write back data, invalidate copies, and then perform an exclusive read for the new writer. This results in fewer bus messages and lower latency for shared modifications, as MOESI eliminates redundant memory accesses that MESI mandates for state transitions involving dirty lines. In terms of performance, MOESI demonstrates advantages in write-sharing scenarios by reducing bus traffic and write-back operations, with studies showing up to 23% fewer write-backs compared to MESI, and both MESI and MOESI achieving an average 7% reduction in broadcast traffic compared to simpler protocols across multiprocessor benchmarks. While MESI's four-state design is simpler and incurs lower hardware complexity for basic invalidation and sharing, it is less efficient in environments with frequent shared writes, leading to higher miss rates and increased memory-bandwidth pressure. MOESI's optimizations make it more suitable for high-sharing workloads, such as those in multi-core systems with heavy inter-cache communication. MESI finds use in simpler systems prioritizing ease of implementation, such as Intel's early x86 processors and the ARM Cortex-A9, where exclusive ownership suffices for most coherence needs without the added Owned-state logic. Conversely, MOESI is employed in high-sharing environments like AMD processors and most ARM multi-core implementations, where the efficiency gains in dirty-data sharing justify the additional state management.
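
The difference can be made concrete with a rough, illustrative count of the transactions needed when a remote cache reads a line that another cache holds dirty, under the simplifying assumption stated in this section that MESI must write the line back to memory before it can be shared; the helper below is a sketch, not a model of any specific implementation.

```python
def share_dirty_line(protocol):
    """Illustrative bus/memory events when a remote cache reads a dirty line."""
    if protocol == "MESI":
        # M holder must update memory before the line can become Shared.
        return ["write-back to memory", "memory supplies data to requester"]
    if protocol == "MOESI":
        # M holder moves to O and forwards the data itself; memory is untouched.
        return ["cache-to-cache supply (M -> O)"]
    return []

for p in ("MESI", "MOESI"):
    events = share_dirty_line(p)
    print(p, events, "->", len(events), "transactions")
```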

With MOSI Protocol

The MOESI protocol extends the MOSI protocol by introducing an Exclusive (E) state, which specifically denotes a clean, private copy of a line that matches main memory and is held uniquely by one cache, allowing silent upgrades to the Modified state without generating bus traffic. In contrast, the MOSI protocol, consisting of the Modified (M), Owned (O), Shared (S), and Invalid (I) states, lacks this distinct Exclusive state and instead merges the handling of private data into the Owned state, treating all private copies, whether clean or dirty, as potentially owned, which implies that the cache assumes responsibility for supplying the data to other requesters even if it is unmodified. This difference arises because the Owned state in MOSI is primarily designed for dirty data that can be shared read-only by other caches while delaying write-backs to memory, effectively broadening its role to encompass scenarios where MOESI would use Exclusive for a cleaner separation of clean private data. Efficiency trade-offs between MOESI and MOSI stem from this distinction: MOESI's Exclusive state optimizes scenarios involving private reads by avoiding the overhead of notifying or supplying data to non-existent sharers, thereby reducing unnecessary bus snoops and messages in read-mostly workloads. MOSI, being simpler with one fewer state, incurs potential extra snoops when confirming exclusivity for writes, as the Owned state may trigger broader checks to ensure no other caches hold copies, leading to higher traffic in systems with frequent private-to-shared transitions. Simulations indicate that while MOSI reduces write-backs by up to 23% compared to simpler protocols like MESI through its Owned state, MOESI combines this benefit with the Exclusive state's contribution to broadcast reduction (up to 24% compared to protocols lacking the Exclusive state), offering better overall efficiency at the cost of an additional state. In terms of transaction variances, MOESI's Exclusive state avoids unnecessary ownership transfers in read-only private cases, such as when a cache holds a unique clean copy and later modifies it without needing to acquire ownership from another cache or from memory, streamlining upgrade paths. MOSI, however, applies the Owned state more broadly to private modifications, which can necessitate additional snoop responses or interventions to maintain the owner's role in supplying data, potentially increasing latency for exclusive writes in low-sharing environments. This makes MOESI particularly advantageous for workloads with a mix of private reads and occasional writes, as it minimizes overhead without the pervasive ownership assumptions of MOSI. Regarding adoption contexts, MOSI has been implemented in certain older multiprocessor systems emphasizing simplicity and shared dirty-data handling, such as variants in directory-based architectures like aspects of the Sun Starfire E10000, where reducing memory write-backs was prioritized over fine-grained exclusivity. In comparison, MOESI is preferred for modern balanced private/shared workloads, appearing in hardware like AMD processors and most ARM Cortex-A cores (except the Cortex-A9), due to its ability to handle both exclusive clean data and owned dirty sharing efficiently, supporting higher performance across diverse applications.
