
MSI protocol

The MSI protocol, or Modified-Shared-Invalid protocol, is a foundational cache coherence mechanism designed for multiprocessor systems with write-back caches, ensuring that all processors observe a consistent view of memory by managing the validity and exclusivity of data blocks across multiple caches. It defines three stable states for each cache block: Modified (M), where the block is exclusively held and altered in one cache, differing from main memory; Shared (S), where the block is read-only and consistent across one or more caches and main memory; and Invalid (I), where the block is not present or valid in the cache. This state-based approach enforces key invariants, such as single-writer-multiple-reader (SWMR), preventing multiple caches from simultaneously modifying the same block while allowing concurrent reads. Introduced as part of a broader class of cache consistency protocols in the mid-1980s, the MSI protocol relies on a snooping-based implementation in which caches monitor bus transactions to detect and respond to read or write requests from other processors, triggering state transitions to maintain coherence. Common transitions include a cache block moving from Invalid to Shared on a read miss via a BusRd request, from Shared to Modified on a write via BusRdX or an upgrade, and from Modified to Shared or Invalid when another processor intervenes or the block is evicted with a writeback. These operations serialize writes and supply the most recent data value, typically from the modified cache if applicable, reducing unnecessary memory traffic compared to simpler valid-invalid schemes. While effective for basic multiprocessor environments, the MSI protocol forms the basis for more advanced variants like MESI (adding an Exclusive state) and MOESI (adding an Owned state), which address limitations such as shared-modified races or ownership tracking in larger systems.
Its simplicity makes it a standard teaching example in computer architecture courses, highlighting trade-offs in bus traffic, latency, and hardware complexity for achieving coherence in shared-memory parallelism.

Background

Cache Coherence Fundamentals

In shared-memory multiprocessor systems, cache coherence refers to the mechanism that ensures all processors observe a consistent view of the shared memory content at all times, despite each processor maintaining private caches to reduce access latency and pressure on the main memory. These systems typically consist of multiple processors, each equipped with its own high-speed cache, connected to a common main memory through a shared bus or a scalable interconnect such as a multistage interconnection network, allowing efficient access to shared data while minimizing contention. The coherence problem arises when multiple private caches hold copies of the same memory block, leading to potential inconsistencies if updates by one processor are not propagated correctly to others, resulting in stale data being read by subsequent accesses. For instance, without proper coherence, a processor writing to a shared block might leave other caches with outdated values, violating the expected behavior of parallel programs that assume a unified address space. This issue is exacerbated in systems with virtual address caches, where synonyms (multiple virtual addresses mapping to the same physical location) further complicate maintaining uniformity across caches. To address these challenges, cache coherence protocols must enforce key requirements, including the single-writer-multiple-reader (SWMR) invariant, which ensures that only one cache can modify a memory block at a time while allowing multiple caches to read it simultaneously. Additionally, writes to the same location must be serialized, meaning all processors perceive them in the same order, often aligned with consistency models that guarantee the outcome of any execution matches some interleaving of individual processor operations. Protocols like MSI achieve this through mechanisms such as snooping on shared interconnects to detect and resolve inconsistencies.
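
The SWMR invariant above can be stated as a small predicate over the per-cache states of a single block. The following Python sketch is purely illustrative; the function name and the single-letter state encoding are assumptions, not part of any standard:

```python
# Illustrative check of the single-writer-multiple-reader (SWMR) invariant:
# for any block, at most one cache may hold it writable (M), and a writer
# excludes all readers (S).

def swmr_holds(states):
    """states: list of per-cache states for one block, each 'M', 'S', or 'I'."""
    writers = states.count('M')
    readers = states.count('S')
    # At most one writer, and never a writer alongside readers.
    return writers <= 1 and not (writers == 1 and readers > 0)

print(swmr_holds(['M', 'I', 'I']))  # True: a single exclusive writer
print(swmr_holds(['S', 'S', 'I']))  # True: concurrent readers are allowed
print(swmr_holds(['M', 'S', 'I']))  # False: a writer coexists with a reader
```

Every legal MSI state combination satisfies this predicate; the protocol's invalidation rules exist precisely to keep the system out of the failing cases.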

Role of Snooping Protocols

Snooping protocols serve as the primary enforcement mechanism for cache coherence in shared-memory multiprocessor systems, where multiple caches may hold copies of the same data block. In these protocols, each cache controller continuously monitors, or "snoops," all transactions broadcast over the shared interconnect, such as a bus, to detect accesses that could affect the coherence of local cache lines. This monitoring ensures that caches maintain a consistent view of memory by invalidating or updating their copies as needed, preventing inconsistencies like stale data reads. The operation of snooping relies on a broadcast-based interconnect where every memory transaction, such as a read or write request, is visible to all cache controllers. When a processor initiates a transaction, it places it on the bus, and other controllers snoop this activity; if a controller detects a relevant event (e.g., another cache requesting exclusive access to a shared block), it responds by altering its local state, such as invalidating the block or supplying its modified copy to fulfill the request. This decentralized approach leverages the natural broadcast nature of buses to enforce coherence without a central arbiter, making it suitable for systems where all processors share a single communication medium. Snooping protocols gained early adoption in the 1980s for bus-based multiprocessors, marking them as the first widely deployed class of cache coherence solutions due to their simplicity in small-scale systems. For instance, the Silicon Graphics (SGI) 4D series, introduced in the late 1980s, implemented a snooping-based protocol similar to MSI to support coherence across its multiprocessor nodes. This historical reliance on snooping addressed the emerging coherence problem in parallel systems by enabling efficient monitoring in environments with limited processor counts.
At a high level, snooping protocols contrast with directory-based approaches in their trade-offs: while snooping excels in low-latency coherence for small systems (up to around 16 processors) via simple broadcasts, it suffers from bandwidth contention and escalating traffic as processor count grows, limiting its use in large-scale machines. Directory protocols mitigate this by using a centralized or distributed tracking structure to direct point-to-point messages only to the relevant caches, reducing unnecessary overhead but introducing higher complexity and potential latency from directory lookups. These trade-offs position snooping protocols, including variants like MSI, as ideal for bus-limited, modest-sized multiprocessors.

Core Protocol Mechanics

Cache States

The MSI protocol employs three fundamental states for cache lines, Modified (M), Shared (S), and Invalid (I), to enforce consistency across multiple caches in a multiprocessor system. These states dictate the validity of data in a cache line, the permissions for read and write operations, and the mechanisms for sourcing data from memory or other caches, thereby preventing processors from accessing inconsistent or outdated information. Snooping mechanisms monitor bus transactions to enforce these states, ensuring coherence without centralized control. In the Modified (M) state, a cache line holds the unique, up-to-date copy of the data across the entire system, which is "dirty" in that it differs from the stale version in main memory. This state provides the owning cache with exclusive ownership, permitting both local reads and writes without generating additional bus traffic, as no other cache has a valid copy. For instance, subsequent writes in this state can occur silently, relying solely on local state. However, if the line is evicted or another processor requests it, the owning cache must supply the data and write it back to memory to restore consistency. Data sourcing for other caches thus originates exclusively from this modified copy, underscoring its role in maintaining the authoritative version of the block. The Shared (S) state signifies that the cache line contains a clean, unmodified copy identical to that in main memory, and it may be present in multiple caches simultaneously. Permissions in this state are restricted to reads only, enabling efficient local read operations across caches without bus involvement, as all copies remain consistent. Writes are prohibited in the Shared state to avoid divergence; attempting one requires invalidating all other copies and transitioning to Modified. This setup supports scalable read sharing in applications with frequent data reads, such as parallel computations. Data can be sourced from main memory or any cache holding a Shared copy, promoting low-latency access when multiple processors need the same read-only data.
Under the Invalid (I) state, a cache line is deemed unusable, either because it is absent from the cache or contains outdated data that no longer reflects the system's view. No read or write permissions are granted, forcing any access to initiate a fetch from main memory or another cache's valid copy, typically into the Shared or Modified state. This state acts as a safeguard against stale usage, ensuring that processors always obtain fresh data before proceeding. Implications include increased latency on first access but guaranteed consistency, with data sourcing directed to the most recent valid copy in the memory hierarchy.
State | Validity | Read Permission | Write Permission | Data Sourcing | Key Implication
Modified (M) | Unique, dirty copy | Yes (local) | Yes (local, no bus traffic) | From the owning cache | Exclusive ownership; writeback on eviction required for consistency.
Shared (S) | Clean, possibly multiple copies | Yes (local) | No (requires upgrade) | From memory or any Shared cache | Supports multi-reader scenarios; invalidation needed for writes.
Invalid (I) | Stale or absent | No (fetch required) | No (fetch and upgrade required) | From memory or a valid cache | Prevents stale reads; triggers coherence actions.
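
The permission rules in the table can be captured directly in code. This minimal Python sketch (the names are illustrative, not a standard API) encodes which operations each state allows locally, without bus involvement:

```python
from enum import Enum

class State(Enum):
    M = "Modified"   # unique, dirty copy: full local access
    S = "Shared"     # clean, possibly replicated: read-only
    I = "Invalid"    # stale or absent: no access without a fetch

def can_read(state: State) -> bool:
    # Reads hit locally in M or S; from I a fetch (BusRd) is required.
    return state in (State.M, State.S)

def can_write(state: State) -> bool:
    # Only M permits writes; S must upgrade first, I must fetch exclusively.
    return state is State.M
```

In a real controller these checks are wired into the hit/miss logic; a failed check is what triggers the bus transactions described in the next section.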

State Transition Rules

The state transition rules in the MSI protocol govern how a cache block moves between the Invalid (I), Shared (S), and Modified (M) states in response to local processor requests and snooped bus transactions, ensuring coherence across multiprocessor caches. These rules are implemented via a finite-state machine in each cache controller, where transitions are triggered by processor events or observed bus messages, often involving data fetches, flushes, or invalidations to maintain consistency. Processor-initiated actions include PrRd (processor read hit or miss) and PrWr (processor write hit or miss). On a PrRd from I, the cache issues a BusRd to fetch a shared copy, transitioning to S; from S or M, it remains in the current state as a hit. On a PrWr from I, the cache issues a BusRdX for exclusive ownership, transitioning to M; from S, it issues a BusUpgr to invalidate other sharers and transitions to M; from M, it remains in M as a hit. Bus-observed actions include BusRd (another processor's read request), BusRdX (another processor's exclusive read for write), and BusUpgr (another processor's upgrade request). On BusRd from S, the cache remains in S and may supply data if needed; from M, it flushes the modified data to the bus or another cache and transitions to S. On BusRdX or BusUpgr from S, the cache transitions to I to invalidate its copy; from M, it flushes the data and transitions to I. From I, no action is taken on bus events. The following table summarizes the core state transitions, indicating the action taken (e.g., bus message issued or data supplied):
Current State | PrRd | PrWr | BusRd | BusRdX / BusUpgr
I | → S (issue BusRd) | → M (issue BusRdX) | No change | No change
S | → S (no action) | → M (issue BusUpgr) | → S (no action) | → I (invalidate)
M | → M (no action) | → M (no action) | → S (flush/supply) | → I (flush/supply)
Evictions (e.g., on cache replacement) from S transition to I without further action, while evictions from M require a writeback to memory. Invalidation rules ensure exclusivity for writes: BusRdX and BusUpgr cause all other caches holding S copies to transition to I, preventing multiple modified versions; similarly, a local PrWr from S triggers a BusUpgr to enforce this across the system. This write-invalidation approach guarantees that no other cache holds a valid copy when transitioning to M.
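
The transition table above can be encoded directly as a lookup keyed by (state, event). The Python sketch below uses hypothetical names and maps each pair to the next state and the bus action taken (None for silent transitions):

```python
# Minimal sketch of the MSI finite-state machine. Events are either
# processor-initiated (PrRd, PrWr) or snooped bus events (BusRd, BusRdX,
# BusUpgr). Each entry maps to (next_state, action), where action is the
# bus message issued or the snoop response, and None means no action.

TRANSITIONS = {
    ('I', 'PrRd'):    ('S', 'BusRd'),
    ('I', 'PrWr'):    ('M', 'BusRdX'),
    ('I', 'BusRd'):   ('I', None),
    ('I', 'BusRdX'):  ('I', None),
    ('I', 'BusUpgr'): ('I', None),
    ('S', 'PrRd'):    ('S', None),
    ('S', 'PrWr'):    ('M', 'BusUpgr'),
    ('S', 'BusRd'):   ('S', None),
    ('S', 'BusRdX'):  ('I', None),
    ('S', 'BusUpgr'): ('I', None),
    ('M', 'PrRd'):    ('M', None),
    ('M', 'PrWr'):    ('M', None),
    ('M', 'BusRd'):   ('S', 'Flush'),
    ('M', 'BusRdX'):  ('I', 'Flush'),
    # Unreachable in a correct run (no cache can upgrade from S while
    # another holds M), but listed for completeness:
    ('M', 'BusUpgr'): ('I', 'Flush'),
}

def step(state, event):
    return TRANSITIONS[(state, event)]

# A read miss followed by a write hit on the same block:
state, action = step('I', 'PrRd')    # -> ('S', 'BusRd')
state, action = step(state, 'PrWr')  # -> ('M', 'BusUpgr')
```

Encoding the protocol as a table like this mirrors how hardware controllers are specified, and makes it easy to check the SWMR invariant exhaustively over event sequences.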

Practical Implementation

Bus Operations and Messages

In snooping-based implementations of the MSI cache coherence protocol, bus operations serve as the primary mechanism for coordinating cache state changes across multiple processors connected via a shared bus. These operations are broadcast to all caches, allowing each to snoop the transaction and respond accordingly to maintain coherence. The core transactions include read requests, exclusive read requests, and writebacks, ensuring that data consistency is preserved without direct inter-cache communication. The BusRd operation is initiated by a cache on a read miss when the block is in the Invalid (I) state. It broadcasts a request for the block, prompting memory or another cache to supply the data while transitioning the requesting cache to the Shared (S) state if no other cache holds a Modified (M) copy. If a snooping cache is in the M state, it must flush the dirty data to the bus or memory before the transaction completes, allowing the block to enter S in multiple caches. This operation supports read-only sharing efficiently in multiprocessor systems. For write accesses, the BusRdX (read-exclusive) transaction is used, typically on a write miss from the I state or to upgrade from S. The requesting processor broadcasts BusRdX, which invalidates the block in all other caches (transitioning them from S or M to I) and supplies the data to the requester, placing it in the M state. If another cache holds the block in M, it responds by flushing the data to the bus for a cache-to-cache transfer, minimizing memory access latency. This ensures exclusive write access but incurs higher bus traffic due to invalidations. The Flush or BusWB (writeback) operation pushes a dirty block from the M state back to memory or onto the bus, occurring during eviction or replacement, or when snooping a BusRd or BusRdX that requires supplying data. For instance, on a snooped BusRdX, an M-state cache transitions to I after flushing, serializing writes to prevent inconsistencies. This mechanism guarantees that modifications are eventually propagated to the memory hierarchy.
In some MSI implementations, an optimization called BusUpgr (bus upgrade) allows a cache in the S state to transition to M for a write without fetching data, since it already holds a valid copy; the upgrade still invalidates any other sharers but avoids a redundant data transfer. This reduces bus contention for read-then-write sequences, though it is sometimes treated as an extension to the basic MSI specification and may require additional signaling. A typical message flow for a write miss in MSI illustrates these operations: suppose processor P1 has the block in I and issues a write. P1 broadcasts BusRdX. If no other cache has the block (all in I), memory supplies the data, and P1 enters M. If P2 has it in S, P2 invalidates to I and discards its copy; if P3 has it in M, P3 flushes to the bus, transitions to I, and P1 enters M with the updated data. A subsequent read by P4 would trigger a BusRd, snooped by P1 (in M), causing a flush and leaving the block in S in both P1 and P4. These sequences rely on the bus to ensure atomicity and ordering.
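
The write-miss flow just described can be sketched as a small simulation. The `Cache` class and method names below are hypothetical, and the scenario assumes P1 writes a block that P3 holds in the Modified state while P2 does not have it:

```python
# Illustrative simulation of a write miss under MSI: the requester
# broadcasts BusRdX, an M-state snooper flushes and invalidates, and
# the requester enters M. Class and method names are hypothetical.

class Cache:
    def __init__(self, name, state='I'):
        self.name, self.state = name, state

    def snoop_bus_rdx(self, log):
        # Respond to another processor's read-exclusive request.
        if self.state == 'M':
            log.append(f'{self.name}: flush dirty block to bus')
        if self.state in ('M', 'S'):
            log.append(f'{self.name}: {self.state} -> I')
            self.state = 'I'

def write_miss(requester, others):
    log = [f'{requester.name}: broadcast BusRdX']
    for cache in others:
        cache.snoop_bus_rdx(log)
    requester.state = 'M'   # exclusive ownership after invalidations
    log.append(f'{requester.name}: enter M')
    return log

p1, p2, p3 = Cache('P1'), Cache('P2'), Cache('P3', state='M')
for event in write_miss(p1, [p2, p3]):
    print(event)
```

Running this prints the BusRdX broadcast, P3's flush and invalidation, and P1's entry into M; P2, holding nothing, stays silent, which is exactly the targeted behavior the snoop logic provides.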

Directory-Based Adaptations

Directory-based adaptations of the MSI protocol extend its principles to scalable multiprocessor systems by replacing broadcast-based snooping with a directory that tracks cache line ownership and sharing information, enabling point-to-point communication in non-bus interconnects such as NUMA or mesh networks. In these systems, the directory maintains per-cache-line entries that record the global state (Modified, Shared, or Invalid) and the set of caches holding copies, allowing coherence actions to target only relevant nodes rather than the entire system. This approach addresses the bandwidth limitations of broadcast snooping in large-scale architectures while preserving the core MSI invariants of single-writer-multiple-reader access and data consistency. The directory can be centralized, where a single controller associated with main memory tracks all sharers using a bit vector or pointer list, or distributed, with entries spread across nodes to reduce contention and latency. For instance, in a distributed setup, each node holds directory information for a portion of the address space, often embedded in the last-level cache or memory controller, with bit vectors indicating sharers (one bit per potential cache) or pointers to a subset for sparse representations. Adapting the MSI states involves encoding the global coherence state in the directory entry: Modified indicates a single owner with a potentially dirty copy; Shared lists multiple readers with clean copies matching memory; and Invalid signifies no valid copies in caches, with the block residing solely in memory. Transient states, such as an intermediate state during an Invalid-to-Modified transition awaiting acknowledgments, are used to serialize ongoing requests and prevent races, ensuring ordered updates without broadcasts. Request handling begins with a directory lookup on cache misses: for a processor read (PrRd or GetS), the directory examines the entry; if Shared, it supplies the data from memory or forwards the request to an owner; if Modified, it requests the block from the owner before adding the requester to the sharer list.
For a processor write (PrWr or GetM), the directory invalidates all Shared copies by sending targeted invalidation messages to the listed sharers, collecting acknowledgments before granting exclusive Modified state to the requester, often involving three or more message hops per data transfer. Evictions and writebacks notify the directory of ownership changes, keeping the sharer set accurate. The seminal Stanford DASH multiprocessor implemented this adaptation using a distributed directory per cluster, where inter-cluster requests route through home nodes for lookup and forwarding. These adaptations provide scalability benefits by minimizing network traffic: unicast messages replace broadcasts, reducing contention in interconnects supporting hundreds to thousands of processors, as demonstrated in systems like the SGI Origin, which scaled to hundreds of processors. In NUMA environments, distributed directories localize lookup overhead, lowering average latency for remote accesses compared to bus-based snooping, though at the cost of additional storage for sharer tracking (e.g., O(P) bits per line for P processors). Overall, this enables efficient coherence in mesh or torus topologies without the O(N) broadcast traffic of snooping.
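
A minimal directory entry with a sharer set can illustrate the lookup-and-forward handling described above. This Python sketch uses hypothetical names; GetS/GetM follow the request terminology in the text, and the message list stands in for the point-to-point network:

```python
# Illustrative directory entry for one memory block: the directory
# tracks the global MSI state and the set of sharers, and issues
# targeted (unicast) messages instead of broadcasts.

class DirectoryEntry:
    def __init__(self):
        self.state = 'I'      # global MSI state as seen by the directory
        self.sharers = set()  # ids of caches holding a copy
        self.sent = []        # point-to-point messages issued

    def get_s(self, node):
        # Read request: demote a Modified owner first, then add the reader.
        if self.state == 'M':
            (owner,) = self.sharers
            self.sent.append(('Fwd-GetS', owner))  # owner supplies data
        self.sharers.add(node)
        self.state = 'S'

    def get_m(self, node):
        # Write request: invalidate every other sharer, grant exclusivity.
        for other in sorted(self.sharers - {node}):
            self.sent.append(('Inv', other))
        self.sharers = {node}
        self.state = 'M'

d = DirectoryEntry()
d.get_s(1)
d.get_s(2)   # two readers share the block
d.get_m(3)   # a writer triggers targeted invalidations
print(d.state, sorted(d.sharers), d.sent)  # M [3] [('Inv', 1), ('Inv', 2)]
```

Note that only the two recorded sharers receive invalidations, which is the unicast saving over snooping's broadcast; a full implementation would also track acknowledgments via the transient states mentioned above.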

Extensions and Analysis

Key Variants

The MESI protocol extends the core MSI framework by incorporating an Exclusive (E) state, which denotes a clean cache line held uniquely by a single cache without any other copies in the system. This addition enables a processor to silently transition a line from the E state to the Modified (M) state upon a write hit, eliminating the bus upgrade request that the basic MSI protocol would otherwise require to invalidate potential shared copies. As a result, MESI reduces bus traffic in workloads where processors frequently read unique data before modifying it, such as single-threaded or low-sharing scenarios. The MOESI protocol further refines this approach by introducing an Owned (O) state alongside the M, E, Shared (S), and Invalid (I) states, allowing a modified line to be shared among multiple caches without an immediate write-back to main memory. In the O state, the owning cache retains responsibility for eventual memory updates, supplying data directly to requesting caches while keeping its modified version, which avoids redundant memory accesses in shared-modified situations common in multiprocessor environments. For instance, a transition to O can occur when a cache in M responds to a read request by providing its data to another cache, converting the line to shared without flushing it to memory first. These variants, MESI and MOESI, have been adopted for traffic optimization in high-contention workloads, with MESI implemented in Intel x86 processors and MOESI in AMD processors and many ARM cores, enabling efficient scaling in multi-core architectures.
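
The traffic saving from the Exclusive state can be seen by counting bus transactions for a private read-then-write sequence. This sketch (the function names are illustrative) assumes no other cache holds the block:

```python
# Sketch: bus transactions issued for a private read-then-write sequence
# under MSI versus MESI, assuming no other cache holds the block.

def msi_read_then_write():
    msgs = ['BusRd']        # I -> S on the read miss
    msgs.append('BusUpgr')  # S -> M on the write, even with zero sharers
    return msgs

def mesi_read_then_write():
    msgs = ['BusRd']        # I -> E: the bus reports no sharers
    # E -> M is silent: no bus message is needed for the write.
    return msgs

print(len(msi_read_then_write()), len(mesi_read_then_write()))  # 2 1
```

MSI pays a second transaction purely because the S state cannot distinguish "clean and shared" from "clean and private"; the E state records exactly that missing fact.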

Advantages and Limitations

The MSI protocol offers several key advantages due to its straightforward design, making it particularly suitable for small-scale symmetric multiprocessor (SMP) systems with up to 4-8 processors. Its use of only three states (Modified, Shared, and Invalid) results in low overhead and simplified hardware, as it requires minimal state tracking and control logic compared to protocols with additional states. This simplicity enables efficient coherence maintenance through snooping on a shared bus, supporting multiple copies of a cache block in the Shared state while distinguishing exclusive modifications in the Modified state, without needing to fetch data from lower levels of the memory hierarchy for certain transitions. Despite these benefits, the MSI protocol has notable limitations, primarily stemming from its reliance on bus-based snooping, which leads to high bus traffic in scenarios involving frequent read-write accesses. For instance, a processor transitioning from the Shared to the Modified state must issue a BusUpgr transaction, broadcasting invalidation messages to all caches even if it holds the only valid copy, resulting in unnecessary overhead and performance degradation. Additionally, read operations always set the state to Shared, assuming potential sharing, which forces a subsequent write to generate an extra bus transaction (a BusRd followed by a BusUpgr), exacerbating contention on the shared bus. These issues contribute to poor scalability beyond small processor counts, as broadcast traffic grows with the number of nodes, making the protocol inefficient for larger multiprocessor configurations. In comparison to variants like MESI, the MSI protocol generates more coherence traffic; for example, MESI's addition of an Exclusive state allows a local upgrade from Exclusive to Modified without bus intervention, reducing upgrade transactions by up to 50% in read-then-write workloads and overall bus traffic by 20-30% in certain benchmarks.
This makes MSI less optimal for workloads with sequential read-write patterns common in shared-memory applications. In modern contexts, the MSI protocol has largely been superseded by more advanced variants in commercial processors due to its scalability constraints, but it remains valuable for educational purposes in illustrating fundamental coherence principles and for low-end embedded systems where simplicity and minimal overhead are prioritized over performance at larger scales.

    Apr 18, 2018 · This paper presents an overview of emerging cache coherence protocols which aim at improving the performance of CMPs. Furthermore, an example of ...Missing: original | Show results with:original<|control11|><|separator|>