Cache coherence is the glue that holds multiprocessor systems together. When multiple processors each have their own cache, you're creating multiple "versions" of the same data. Without a coherence protocol, those versions can diverge, leading to incorrect program behavior. You're being tested on your understanding of how these protocols maintain consistency, why certain designs scale better than others, and when to apply different strategies based on system architecture.
The concepts here connect directly to broader themes in computer architecture: parallelism and its challenges, memory hierarchy trade-offs, scalability versus simplicity, and bandwidth optimization. Exam questions often ask you to compare protocols, identify which approach fits a given system size, or analyze the trade-offs between bus traffic and implementation complexity. Don't just memorize protocol names and states. Know what problem each protocol solves and why its mechanism works.
These protocols form the backbone of cache coherence in bus-based systems. Each cache line exists in one of several states, and state transitions are triggered by local processor actions or observed bus transactions. Understanding the progression from MSI to MESI to MOESI reveals how architects incrementally solved performance problems.
The simplest invalidation-based protocol uses three states: Modified (M), meaning this cache holds the only copy and it is dirty; Shared (S), meaning the copy is clean and other caches may also hold it; and Invalid (I), meaning the line holds no valid data.
The write-invalidate mechanism forces all other caches to move their copies to Invalid before a write completes. This guarantees only one writable copy exists at any time.
MSI's main weakness: it has no way to distinguish between "I'm the only cache with this line" and "multiple caches share this line." Even private data that no other cache holds will generate unnecessary bus invalidations on a write. This is exactly what MESI fixes.
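The asymmetry is easy to see in a toy state machine. This sketch (illustrative Python, not real controller logic) models a single cache's write handler: any write from S or I must go to the bus, because MSI has no way to mark a line as private:

```python
from enum import Enum

class MSI(Enum):
    MODIFIED = "M"
    SHARED = "S"
    INVALID = "I"

def msi_write(state):
    """One cache's write handler: returns (new_state, bus_transaction_needed).
    Toy model of the write-invalidate rule, not a full protocol."""
    if state in (MSI.SHARED, MSI.INVALID):
        # MSI cannot tell "private" from "shared", so a write from S or I
        # must always put an invalidation (or read-for-ownership) on the bus.
        return MSI.MODIFIED, True
    return MSI.MODIFIED, False   # already Modified: the write stays local

# A line that no other cache holds still pays the bus cost:
print(msi_write(MSI.SHARED))   # bus transaction needed -- wasted if private
```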
MESI adds a fourth state, Exclusive (E), which means: this cache holds the only copy, and it's clean (matches main memory).
Why this matters so much: a cache line in E can be silently upgraded to M on a write, with no bus transaction at all. The cache already knows no one else has a copy, so there's nothing to invalidate.
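The same toy write handler under MESI (again an illustrative sketch, with names of my own choosing) shows the upgrade paths: only the Exclusive case skips the bus entirely:

```python
from enum import Enum

class MESI(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"

def mesi_write(state):
    """One cache's write handler: returns (new_state, bus_transaction_needed).
    Toy model of MESI's upgrade paths only."""
    if state is MESI.EXCLUSIVE:
        # Silent upgrade: we already know no other cache holds a copy,
        # so there is nothing to invalidate and no bus transaction at all.
        return MESI.MODIFIED, False
    if state is MESI.SHARED:
        return MESI.MODIFIED, True    # must broadcast an invalidation first
    if state is MESI.INVALID:
        return MESI.MODIFIED, True    # read-for-ownership on the bus
    return MESI.MODIFIED, False       # already Modified: purely local

print(mesi_write(MESI.EXCLUSIVE))   # the silent E -> M upgrade
```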
MOESI adds a fifth state, Owned (O), which solves a specific and costly problem: what happens when a processor with a Modified line sees another processor request that data?
In MESI, the modifying cache must write back to main memory, and then both caches hold the line in S. That write-back costs memory bandwidth. In MOESI, the modifying cache transitions to Owned and supplies the data directly to the requester via a cache-to-cache transfer. The O state means: "I have a dirty copy that I'm responsible for eventually writing back, but other caches can hold Shared copies that they got from me."
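The difference boils down to what a cache holding a dirty line does when it snoops a remote read. This toy function (illustrative only, not a complete protocol) contrasts the two outcomes:

```python
from enum import Enum

State = Enum("State", ["M", "O", "E", "S", "I"])

def on_snooped_read(state, protocol):
    """What a cache holding a Modified line does when it snoops another
    processor's read. Returns (new_state, memory_writeback, cache_to_cache).
    Illustrative sketch of the MESI-vs-MOESI difference only."""
    if state is not State.M:
        return state, False, False
    if protocol == "MESI":
        # Must flush to main memory; both caches end up in Shared.
        return State.S, True, False
    # MOESI: keep the dirty data in Owned, supply it directly to the requester.
    return State.O, False, True

print(on_snooped_read(State.M, "MESI"))    # write-back costs memory bandwidth
print(on_snooped_read(State.M, "MOESI"))   # cache-to-cache, no write-back yet
```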
Compare: MSI vs. MESI: both use invalidation-based coherence, but MESI's Exclusive state eliminates bus traffic for private data. If a question asks about optimizing single-threaded performance in a multiprocessor, MESI's silent upgrade path is your answer. MOESI goes further by also eliminating unnecessary write-backs when sharing dirty data.
The mechanism for detecting coherence violations differs fundamentally between these approaches. Snooping relies on broadcast; directories rely on point-to-point messaging. This distinction drives scalability trade-offs that appear frequently on exams.
Every cache controller monitors (snoops on) every bus transaction, checking addresses against its own tags. When a cache sees a transaction relevant to a line it holds, it takes action (e.g., invalidating or supplying data).
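The snoop check itself is a simple tag comparison. A rough sketch (toy MSI-style states; real controllers often keep a duplicate tag array so snooping does not stall the processor):

```python
def snoop(my_tags, bus_addr, bus_op):
    """Run by every cache controller on every bus transaction (toy sketch;
    my_tags maps cached line addresses to MSI-style states)."""
    state = my_tags.get(bus_addr)
    if state is None:
        return None                   # we don't hold the line: ignore it
    if bus_op == "write":
        my_tags[bus_addr] = "I"       # another cache is writing: drop our copy
        return "invalidated"
    if bus_op == "read" and state == "M":
        my_tags[bus_addr] = "S"       # supply/flush our dirty data, downgrade
        return "flushed"
    return None

tags = {0x1000: "M", 0x2000: "S"}
print(snoop(tags, 0x1000, "read"))    # we held it dirty: flush and share
print(snoop(tags, 0x3000, "write"))   # not in our cache: no action
```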
Instead of broadcasting, a directory tracks which caches hold copies of each memory line and what state those copies are in. When a coherence action is needed, the directory sends targeted (point-to-point) messages only to the caches that actually hold the line.
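A directory entry can be sketched as a sharer list plus a dirty bit. Real hardware typically packs the sharers into a bit-vector; this sketch uses a Python set for clarity, and all names are illustrative:

```python
class DirectoryEntry:
    """Per-line bookkeeping kept at the line's home node (toy layout)."""
    def __init__(self):
        self.sharers = set()   # which caches currently hold a copy
        self.dirty = False

def directory_write(entry, writer):
    """Handle a write request: invalidate only the caches that actually
    hold the line, via point-to-point messages -- no broadcast."""
    targets = sorted(entry.sharers - {writer})
    entry.sharers = {writer}
    entry.dirty = True
    return targets

entry = DirectoryEntry()
entry.sharers = {3, 17, 42}
print(directory_write(entry, 3))   # only the actual sharers get a message
```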
"Bus-based" refers to the interconnect choice rather than the protocol logic itself. A shared bus serves as the communication backbone for snooping protocols.
Compare: Snooping vs. Directory: snooping is simpler and has lower latency for small systems, but directory-based protocols scale to hundreds of processors. When analyzing a system design question, processor count is your first decision point. Small system (under ~16 cores)? Snooping is likely fine. Large system? You need a directory.
When a processor writes to shared data, the protocol must decide how to inform other caches. This fundamental choice affects bandwidth consumption, read latency, and overall system performance.
Write-invalidate discards all other cached copies before the write proceeds. After the invalidation, only the writing cache holds valid data. Any subsequent read from another processor will miss and fetch the new value.
Write-update (also called write-broadcast) pushes the new value to all caches that hold the line, keeping every copy current.
The trade-offs break down like this: write-invalidate spends one short bus message per write burst but forces other processors to miss and refetch the whole line on their next read; write-update spends a data broadcast on every single write but lets other processors keep hitting in their own caches.
Most real systems use write-invalidate because, in practice, writes to shared data that are never re-read by other processors are common. Write-update wastes bandwidth in those cases. Write-update can outperform write-invalidate in specific patterns like tight producer-consumer loops, where one core writes and another core reads the same line repeatedly.
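A toy cost model makes the crossover concrete. The constants below (word, line, and control-message costs) are made up for illustration, not taken from real hardware:

```python
def bus_words(strategy, writes, reader_rereads_each_write,
              word=1, line=8, ctrl=1):
    """Toy bus-traffic model: word = cost of broadcasting one updated word,
    line = cost of a full line fill on a miss, ctrl = cost of an
    invalidation message. Purely illustrative constants."""
    if strategy == "update":
        return writes * word              # every write is broadcast
    if reader_rereads_each_write:
        # Ping-pong: each write re-invalidates the reader's copy, and
        # each re-read misses and pulls the whole line back.
        return writes * (ctrl + line)
    return ctrl                           # one invalidation, then local writes

# One core writes 100 times and nobody re-reads: invalidate wins big.
print(bus_words("invalidate", 100, False), bus_words("update", 100, False))
# Tight producer-consumer (re-read after every write): update wins.
print(bus_words("invalidate", 100, True), bus_words("update", 100, True))
```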
Compare: Write-invalidate vs. Write-update: invalidate optimizes for bandwidth, update optimizes for read latency. The key question is whether other caches will actually re-read the data after a write. If yes, update helps. If no, update wastes bandwidth.
These protocols address limitations of traditional approaches, enabling coherence in systems with dozens to thousands of processors. They represent the cutting edge of coherence research and appear in high-end servers and supercomputers.
SCI (Scalable Coherent Interface) replaces the shared bus with a ring or hierarchical topology and tracks each line's sharers in a distributed linked list, allowing coherence transactions to proceed in parallel across different parts of the system.
Token coherence takes a fundamentally different approach to guaranteeing correctness. Each cache line in the system has a fixed number of tokens (typically equal to the number of processors plus one). A cache may read a line only while holding at least one of its tokens, and may write it only while holding all of them, so simple counting rules out conflicting accesses no matter how requests are routed.
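The safety invariant is pure counting: a writer holding every token provably excludes all other readers and writers. A minimal sketch (function names are my own):

```python
def can_read(tokens_held: int) -> bool:
    """Reading requires holding at least one of the line's tokens."""
    return tokens_held >= 1

def can_write(tokens_held: int, total_tokens: int) -> bool:
    """Writing requires holding every token for the line, which means
    no other cache can hold even one -- so no one else can read it."""
    return tokens_held == total_tokens

TOTAL = 5                                  # e.g., 4 processors plus one
print(can_write(TOTAL, TOTAL))             # holder of all tokens may write
print(can_read(1), can_write(1, TOTAL))    # one token: reader, not writer
```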
These protocols use logical timestamps to order operations rather than relying on bus ordering or directory serialization.
Compare: SCI vs. Token Coherence: both target large-scale systems, but SCI uses distributed linked-list directories while token coherence uses a counting mechanism. Token coherence offers more flexibility for performance optimization (you can mix broadcast and point-to-point freely) at the cost of more complex implementation and the need for a persistent request mechanism to handle token loss or starvation.
| Concept | Best Examples |
|---|---|
| Basic state-based protocols | MSI, MESI, MOESI |
| Broadcast-based coherence | Snooping protocols, Bus-based protocols |
| Scalable coherence | Directory-based, SCI, Token coherence |
| Write strategy trade-offs | Write-invalidate, Write-update |
| Reducing memory traffic | MOESI (Owned state), Cache-to-cache transfers |
| Small system optimization | MESI, Snooping protocols |
| Large system optimization | Directory-based, SCI |
| Flexible ordering | Timestamp-based protocols |
Which two protocols both use state-based coherence but differ in how they handle clean, private data? Explain why this difference matters for performance.
A system architect is designing a 64-processor server. Which coherence approach (snooping or directory) should they choose, and what specific scalability limitation are they avoiding?
Compare and contrast write-invalidate and write-update strategies. Under what workload conditions would write-update actually outperform write-invalidate?
If a question asks you to reduce memory bandwidth consumption in a system with high read-sharing of modified data, which protocol state (from MOESI) directly addresses this, and how does it work?
Token coherence and directory-based protocols both aim to scale beyond bus-based systems. What fundamental mechanism differs between them, and what trade-off does each approach make?