Cache coherence is the glue that holds multiprocessor systems together. When multiple processors each have their own cache, you're essentially creating multiple "versions" of the same data—and without a coherence protocol, those versions can diverge, leading to incorrect program behavior. You're being tested on your understanding of how these protocols maintain consistency, why certain designs scale better than others, and when to apply different strategies based on system architecture.
The concepts here connect directly to broader themes in computer architecture: parallelism and its challenges, memory hierarchy trade-offs, scalability versus simplicity, and bandwidth optimization. Exam questions often ask you to compare protocols, identify which approach fits a given system size, or analyze the trade-offs between bus traffic and implementation complexity. Don't just memorize protocol names and states—know what problem each protocol solves and why its mechanism works.
These protocols form the backbone of cache coherence in bus-based systems. Each cache line exists in one of several states, and state transitions are triggered by local processor actions or observed bus transactions. Understanding the progression from MSI to MESI to MOESI reveals how architects incrementally solved performance problems.
Compare: MSI vs. MESI—both use invalidation-based coherence, but MESI's Exclusive state eliminates bus traffic for private data. If an FRQ asks about optimizing single-threaded performance in a multiprocessor, MESI's silent upgrade path is your answer.
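The Exclusive-state optimization can be made concrete with a minimal sketch of MESI transitions for a single cache line. This is a simplified model (state names and transaction labels like `BusUpgr`/`BusRdX` follow common textbook convention, not any specific hardware): note how a write to an Exclusive line upgrades to Modified with no bus transaction at all.

```python
from enum import Enum

class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"

def on_local_write(state):
    """Return (new_state, bus_transaction) for a local processor write.
    bus_transaction is None when the write generates no bus traffic."""
    if state == State.MODIFIED:
        return State.MODIFIED, None       # already dirty and exclusive
    if state == State.EXCLUSIVE:
        return State.MODIFIED, None       # silent upgrade: no bus traffic
    if state == State.SHARED:
        return State.MODIFIED, "BusUpgr"  # must invalidate other copies
    return State.MODIFIED, "BusRdX"       # INVALID: read-for-ownership

def on_local_read(state):
    """A read miss issues BusRd; snoop responses then decide E vs. S."""
    if state == State.INVALID:
        return None, "BusRd"
    return state, None
```

Under plain MSI there is no Exclusive state, so even a private, unshared line sits in Shared and every first write pays the upgrade transaction; MESI's E state is exactly what removes that cost for single-threaded data.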
The mechanism for tracking which caches hold a copy of a block differs fundamentally between these approaches. Snooping relies on broadcast over a shared bus; directories rely on point-to-point messaging guided by per-block sharer lists. This distinction drives scalability trade-offs that appear frequently on exams.
Compare: Snooping vs. Directory—snooping is simpler and has lower latency for small systems, but directory-based protocols scale to hundreds of processors. When analyzing a system design question, processor count is your first decision point.
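The point-to-point character of directory protocols is easiest to see in the directory entry itself. The sketch below is an assumed, simplified structure (a full-map bit vector per block; real directories use many encodings): on a write, invalidations go only to the caches recorded as sharers, rather than being broadcast to every processor as on a snooping bus.

```python
class DirectoryEntry:
    """Per-block directory state: a sharer bit vector and an owner field.
    Storage grows with processor count -- the classic directory overhead."""

    def __init__(self, num_procs):
        self.num_procs = num_procs
        self.sharers = 0      # bit i set => cache i holds a copy
        self.owner = None     # cache id if the block is dirty in some cache

    def handle_read_request(self, requester):
        """Record the requester as a sharer."""
        self.sharers |= 1 << requester

    def handle_write_request(self, requester):
        """Return the caches to send point-to-point invalidations to
        (only the recorded sharers, not a broadcast)."""
        targets = [i for i in range(self.num_procs)
                   if (self.sharers >> i) & 1 and i != requester]
        self.sharers = 1 << requester
        self.owner = requester
        return targets
```

With 64 processors, a write to a block shared by three caches triggers three targeted messages instead of a 64-way broadcast, which is why directories keep interconnect traffic bounded as the machine grows.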
When a processor writes to shared data, the protocol must decide how to inform other caches. This fundamental choice affects bandwidth consumption, read latency, and overall system performance.
Compare: Write-invalidate vs. Write-update—invalidate optimizes for bandwidth, update optimizes for read latency. Most real systems use invalidate because writes to shared data that's never re-read waste bandwidth under update protocols.
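That bandwidth argument can be checked with simple message counts. This is a back-of-the-envelope model under stated assumptions (each invalidation or update counts as one bus message; the line is never re-read between writes): after the first invalidation the writer holds the line exclusively, so repeated writes are free, while an update protocol pays on every write.

```python
def invalidate_traffic(num_writes, num_sharers):
    """One invalidation on the first write; the line is then held
    exclusively, so the remaining writes generate no bus traffic."""
    return 1 if num_writes > 0 and num_sharers > 0 else 0

def update_traffic(num_writes, num_sharers):
    """Every write broadcasts the new value to the sharing caches."""
    return num_writes if num_sharers > 0 else 0

# 100 consecutive writes to a line cached by 3 other processors,
# never re-read: invalidate costs 1 message, update costs 100.
```

Flip the workload, though, and the ranking flips: if every write is immediately re-read by the sharers, update delivers the new value directly, while invalidate forces each reader to take a miss and refetch the line.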
These protocols address the scalability limits of bus-based snooping, enabling coherence in systems with dozens to thousands of processors. They represent the cutting edge of coherence research and appear in high-end servers and supercomputers.
Compare: SCI vs. Token Coherence—both target large-scale systems, but SCI uses distributed directories while token coherence uses a counting mechanism. Token coherence offers more flexibility for performance optimization at the cost of implementation complexity.
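Token coherence's counting mechanism rests on one invariant: a fixed number of tokens exists per block, a cache needs at least one token to read and all of them to write. The sketch below is a toy model of that invariant only (token counts, starting ownership at memory, and the 4-token total are illustrative assumptions), ignoring the transient-request and timeout machinery a real implementation needs.

```python
class TokenBlock:
    """Token-coherence invariant for one block: TOTAL_TOKENS exist
    system-wide; tokens move between holders but are never created
    or destroyed, which guarantees at most one writer at a time."""
    TOTAL_TOKENS = 4  # illustrative: e.g., one per cache in a 4-CPU system

    def __init__(self):
        self.tokens = {"memory": self.TOTAL_TOKENS}  # memory holds all initially

    def can_read(self, holder):
        return self.tokens.get(holder, 0) >= 1   # any token permits reading

    def can_write(self, holder):
        return self.tokens.get(holder, 0) == self.TOTAL_TOKENS  # all tokens

    def transfer(self, src, dst, n):
        """Move n tokens; the conservation check is the whole protocol."""
        assert self.tokens.get(src, 0) >= n
        self.tokens[src] = self.tokens.get(src, 0) - n
        self.tokens[dst] = self.tokens.get(dst, 0) + n
        assert sum(self.tokens.values()) == self.TOTAL_TOKENS
```

Because correctness follows from token conservation rather than from any particular message ordering, the system is free to route and optimize requests aggressively, which is the flexibility the comparison above refers to; the cost is the extra machinery for counting tokens and retrying starved requests.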
| Concept | Best Examples |
|---|---|
| Basic state-based protocols | MSI, MESI, MOESI |
| Broadcast-based coherence | Snooping protocols, Bus-based protocols |
| Scalable coherence | Directory-based, SCI, Token coherence |
| Write strategy trade-offs | Write-invalidate, Write-update |
| Reducing memory traffic | MOESI (Owned state), Cache-to-cache transfers |
| Small system optimization | MESI, Snooping protocols |
| Large system optimization | Directory-based, SCI |
| Flexible ordering | Timestamp-based protocols |
Which two protocols both use state-based coherence but differ in how they handle clean, private data? Explain why this difference matters for performance.
A system architect is designing a 64-processor server. Which coherence approach (snooping or directory) should they choose, and what specific scalability limitation are they avoiding?
Compare and contrast write-invalidate and write-update strategies. Under what workload conditions would write-update actually outperform write-invalidate?
If an FRQ asks you to reduce memory bandwidth consumption in a system with high read-sharing of modified data, which protocol state (from MOESI) directly addresses this, and how does it work?
Token coherence and directory-based protocols both aim to scale beyond bus-based systems. What fundamental mechanism differs between them, and what trade-off does each approach make?