Advanced Computer Architecture

Key Concepts of Cache Coherence Protocols


Why This Matters

Cache coherence is the glue that holds multiprocessor systems together. When multiple processors each have their own cache, you're creating multiple "versions" of the same data. Without a coherence protocol, those versions can diverge, leading to incorrect program behavior. You're being tested on your understanding of how these protocols maintain consistency, why certain designs scale better than others, and when to apply different strategies based on system architecture.

The concepts here connect directly to broader themes in computer architecture: parallelism and its challenges, memory hierarchy trade-offs, scalability versus simplicity, and bandwidth optimization. Exam questions often ask you to compare protocols, identify which approach fits a given system size, or analyze the trade-offs between bus traffic and implementation complexity. Don't just memorize protocol names and states. Know what problem each protocol solves and why its mechanism works.


Snoopy Protocol Foundations: State-Based Coherence

These protocols form the backbone of cache coherence in bus-based systems. Each cache line exists in one of several states, and state transitions are triggered by local processor actions or observed bus transactions. Understanding the progression from MSI to MESI to MOESI reveals how architects incrementally solved performance problems.

MSI (Modified-Shared-Invalid) Protocol

The simplest invalidation-based protocol uses three states:

  • Modified (M): This cache holds the only valid copy, and it's dirty (modified relative to main memory). The cache is responsible for writing back before the line can be evicted.
  • Shared (S): The line is clean and may exist in multiple caches. Any cache in S can read freely, but must issue a bus transaction before writing.
  • Invalid (I): The line is not usable. Any access triggers a miss.

The write-invalidate mechanism forces all other caches to move their copies to Invalid before a write completes. This guarantees only one writable copy exists at any time.

MSI's main weakness: it has no way to distinguish between "I'm the only cache with this line" and "multiple caches share this line." Even private data that no other cache holds will generate unnecessary bus invalidations on a write. This is exactly what MESI fixes.
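The three states and the write-invalidate rule can be captured as a small transition table. This is a minimal sketch of one cache line in one cache; the event names (PR_RD/PR_WR for local processor reads and writes, BUS_RD/BUS_RDX for observed bus transactions) and bus-action names are illustrative, not from any specific implementation.

```python
# Minimal sketch of MSI state transitions for a single cache line.
# (state, event) -> (next_state, bus_action); names are illustrative.
MSI = {
    ("I", "PR_RD"):   ("S", "BusRd"),   # read miss: fetch a shared copy
    ("I", "PR_WR"):   ("M", "BusRdX"),  # write miss: fetch and invalidate others
    ("S", "PR_RD"):   ("S", None),      # read hit, no bus traffic
    ("S", "PR_WR"):   ("M", "BusRdX"),  # must invalidate other sharers first
    ("S", "BUS_RDX"): ("I", None),      # another cache is writing: invalidate
    ("M", "PR_RD"):   ("M", None),
    ("M", "PR_WR"):   ("M", None),
    ("M", "BUS_RD"):  ("S", "Flush"),   # supply dirty data, demote to Shared
    ("M", "BUS_RDX"): ("I", "Flush"),   # supply dirty data, then invalidate
}

def step(state, event):
    return MSI.get((state, event), (state, None))

# A write to a Shared line always goes to the bus, even if no other cache
# actually holds a copy -- this is exactly the weakness MESI addresses:
print(step("S", "PR_WR"))  # ('M', 'BusRdX')
```

Note that the `("S", "PR_WR")` row issues a bus transaction unconditionally: MSI has no way to know whether the line is actually shared.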

MESI (Modified-Exclusive-Shared-Invalid) Protocol

MESI adds a fourth state, Exclusive (E), which means: this cache holds the only copy, and it's clean (matches main memory).

Why this matters so much: a cache line in E can be silently upgraded to M on a write, with no bus transaction at all. The cache already knows no one else has a copy, so there's nothing to invalidate.

  • Reduced bus traffic compared to MSI, especially for private data or single-threaded workloads running on a multiprocessor
  • Industry standard in most modern x86 and ARM multiprocessors due to its balance of simplicity and efficiency
  • On a read miss, if no other cache holds the line, the line enters E rather than S. If another cache does hold it, both copies enter S.
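The silent upgrade path can be sketched in a few lines. The function and bus-action names (e.g., `BusUpgr`) are illustrative; the point is that only the Exclusive case produces no bus traffic on a write.

```python
# Sketch of MESI's key optimization: Exclusive -> Modified is silent.
# Names are illustrative, not from any real spec.

def mesi_write(state):
    """Return (next_state, bus_action) for a local processor write."""
    if state in ("M", "E"):
        return ("M", None)        # E: silent upgrade -- no one else has a copy
    if state == "S":
        return ("M", "BusUpgr")   # must invalidate other sharers
    return ("M", "BusRdX")        # Invalid: full read-for-ownership

def mesi_read_miss(other_cache_has_line):
    """On a read miss, enter E if no other cache holds the line, else S."""
    return "S" if other_cache_has_line else "E"

print(mesi_write("E"))        # ('M', None) -- zero bus traffic
print(mesi_read_miss(False))  # 'E'
```

Contrast this with MSI, where a write to any non-Modified line requires a bus transaction, shared or not.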

MOESI (Modified-Owned-Exclusive-Shared-Invalid) Protocol

MOESI adds a fifth state, Owned (O), which solves a specific and costly problem: what happens when a processor with a Modified line sees another processor request that data?

In MESI, the modifying cache must write back to main memory, and then both caches hold the line in S. That write-back costs memory bandwidth. In MOESI, the modifying cache transitions to Owned and supplies the data directly to the requester via a cache-to-cache transfer. The O state means: "I have a dirty copy that I'm responsible for eventually writing back, but other caches can hold Shared copies that they got from me."

  • Cache-to-cache transfers reduce memory bandwidth pressure significantly when multiple processors share frequently-modified data
  • Best suited for high-contention workloads where shared data is read often but modified by one processor at a time
  • Used notably in AMD processors
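The Owned state's behavior shows up in how a cache reacts to a snooped read. This sketch (names illustrative) contrasts the MOESI reaction with what MESI would do at the same point: MESI forces a write-back to memory, MOESI supplies the data cache-to-cache and keeps the write-back obligation.

```python
# Sketch of the MOESI Owned state: when another cache reads a line we hold
# dirty, supply it directly and retain write-back responsibility.

def moesi_snoop_bus_read(state):
    """Reaction of a cache observing a BusRd for a line it holds."""
    if state == "M":
        # MESI would write back to memory here; MOESI avoids that.
        return ("O", "cache_to_cache_transfer")
    if state == "O":
        return ("O", "cache_to_cache_transfer")  # keep supplying sharers
    if state == "E":
        return ("S", None)   # line is clean: memory can supply the data
    return (state, None)

print(moesi_snoop_bus_read("M"))  # ('O', 'cache_to_cache_transfer')
```

The requester ends up in S, the owner in O, and main memory is never touched until the Owned line is finally evicted.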

Compare: MSI vs. MESI: both use invalidation-based coherence, but MESI's Exclusive state eliminates bus traffic for private data. If a question asks about optimizing single-threaded performance in a multiprocessor, MESI's silent upgrade path is your answer. MOESI goes further by also eliminating unnecessary write-backs when sharing dirty data.


Architectural Approaches: Snooping vs. Directory

The mechanism for detecting coherence violations differs fundamentally between these approaches. Snooping relies on broadcast; directories rely on point-to-point messaging. This distinction drives scalability trade-offs that appear frequently on exams.

Snooping Protocols

Every cache controller monitors (snoops on) every bus transaction, checking addresses against its own tags. When a cache sees a transaction relevant to a line it holds, it takes action (e.g., invalidating or supplying data).

  • Broadcast-based communication makes implementation straightforward but creates O(n) traffic per coherence event, where n is the number of processors
  • Low latency for small systems because the bus provides a natural serialization point and all caches see transactions simultaneously
  • Scalability ceiling around 8-16 processors before bus bandwidth saturates

Directory-Based Coherence Protocols

Instead of broadcasting, a directory tracks which caches hold copies of each memory line and what state those copies are in. When a coherence action is needed, the directory sends targeted (point-to-point) messages only to the caches that actually hold the line.

  • The directory can be centralized (at a single memory controller) or distributed (partitioned across memory controllers in a NUMA system)
  • Point-to-point messages replace broadcasts, so traffic per coherence event scales with the number of actual sharers of a line rather than the total processor count
  • The trade-off: directory lookups add indirection latency (an extra hop through the directory), and the directory itself consumes storage proportional to the number of cache lines times the number of processors
  • Essential for large-scale systems like NUMA architectures with dozens to hundreds of processors
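A common way to organize a directory is a full-map entry per memory line: one presence bit per processor plus a dirty bit, which is exactly why directory storage grows with lines times processors. This is a hedged sketch; the class and method names are illustrative.

```python
# Sketch of a full-map directory entry: a presence bit vector plus a dirty
# bit, so coherence messages go only to actual sharers. Names illustrative.

class DirectoryEntry:
    def __init__(self, n_processors):
        self.sharers = [False] * n_processors  # presence bit per processor
        self.dirty = False

    def read_miss(self, requester):
        targets = []
        if self.dirty:
            # Exactly one cache holds the dirty copy; fetch from it.
            targets = [p for p, s in enumerate(self.sharers) if s]
            self.dirty = False
        self.sharers[requester] = True
        return targets  # point-to-point messages, never a broadcast

    def write_miss(self, requester):
        # Invalidate every current sharer except the requester.
        targets = [p for p, s in enumerate(self.sharers) if s and p != requester]
        self.sharers = [False] * len(self.sharers)
        self.sharers[requester] = True
        self.dirty = True
        return targets

d = DirectoryEntry(8)
d.read_miss(0)
d.read_miss(3)
print(d.write_miss(5))  # [0, 3] -- only the two actual sharers are contacted
```

With 64 processors and a snooping bus, that write would have been broadcast to all 63 other caches; here it generates exactly two messages (plus the directory hop, which is the indirection-latency cost noted above).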

Bus-Based Coherence Protocols

This refers to the interconnect choice rather than the protocol logic itself. A shared bus serves as the communication backbone for snooping protocols.

  • Atomic transactions simplify protocol design because only one request can be in flight at a time, providing a natural total order on memory operations
  • This atomicity is what makes snooping protocols relatively easy to reason about and verify
  • Bottleneck at scale makes this approach unsuitable for systems beyond roughly 16 processors, since every coherence message competes for the same shared bus

Compare: Snooping vs. Directory: snooping is simpler and has lower latency for small systems, but directory-based protocols scale to hundreds of processors. When analyzing a system design question, processor count is your first decision point. Small system (under ~16 cores)? Snooping is likely fine. Large system? You need a directory.


Write Strategies: Invalidate vs. Update

When a processor writes to shared data, the protocol must decide how to inform other caches. This fundamental choice affects bandwidth consumption, read latency, and overall system performance.

Write-Invalidate vs. Write-Update Protocols

Write-invalidate discards all other cached copies before the write proceeds. After the invalidation, only the writing cache holds valid data. Any subsequent read from another processor will miss and fetch the new value.

Write-update (also called write-broadcast) pushes the new value to all caches that hold the line, keeping every copy current.

The trade-offs break down like this:

  • Bandwidth: Write-invalidate is cheaper per write because it sends a short invalidation message, not the full data. Write-update sends the new data value on every write.
  • Read latency: Write-update wins here because other caches always have the latest value, so they never miss on a subsequent read.
  • Multiple writes to the same line: Write-invalidate pays the invalidation cost only once; subsequent writes are local. Write-update pays the broadcast cost on every write, even if no other processor ever reads the updated value.

Most real systems use write-invalidate because, in practice, writes to shared data that are never re-read by other processors are common. Write-update wastes bandwidth in those cases. Write-update can outperform write-invalidate in specific patterns like tight producer-consumer loops, where one core writes and another core reads the same line repeatedly.

Compare: Write-invalidate vs. Write-update: invalidate optimizes for bandwidth, update optimizes for read latency. The key question is whether other caches will actually re-read the data after a write. If yes, update helps. If no, update wastes bandwidth.


Advanced Scalability Solutions

These protocols address limitations of traditional approaches, enabling coherence in systems with dozens to thousands of processors. They represent the cutting edge of coherence research and appear in high-end servers and supercomputers.

Scalable Coherence Interface (SCI)

SCI replaces the shared bus with a ring or hierarchy topology, allowing parallel coherence transactions across different parts of the system.

  • Distributed directory embedded in cache lines: each cached copy contains forward and backward pointers to other caches sharing that line, forming a doubly-linked list. This eliminates the need for a separate centralized directory structure.
  • IEEE standard (1596), designed specifically for systems where bus-based approaches fail
  • The linked-list approach means adding or removing a sharer requires pointer updates, which adds protocol complexity but avoids any single point of contention
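The pointer mechanics are ordinary doubly-linked-list operations, just distributed across caches. This sketch shows the two operations the bullet describes: a new sharer attaching at the head of the list, and a sharer splicing itself out on eviction. Class and function names are illustrative, not SCI terminology.

```python
# Sketch of SCI-style sharing lists: each cached copy carries forward and
# backward pointers to other sharers of the same line. Names illustrative.

class Sharer:
    def __init__(self, cache_id):
        self.cache_id = cache_id
        self.fwd = None  # next sharer in the list
        self.bwd = None  # previous sharer

def attach_at_head(head, new_sharer):
    """New sharers prepend themselves; only local pointer updates needed."""
    new_sharer.fwd = head
    if head is not None:
        head.bwd = new_sharer
    return new_sharer  # new head of the sharing list

def detach(node, head):
    """On eviction, splice neighbours together -- no central structure."""
    if node.bwd is not None:
        node.bwd.fwd = node.fwd
    if node.fwd is not None:
        node.fwd.bwd = node.bwd
    return node.fwd if node is head else head

def sharer_ids(head):
    out = []
    while head is not None:
        out.append(head.cache_id)
        head = head.fwd
    return out

head = None
for cid in (0, 1, 2):
    head = attach_at_head(head, Sharer(cid))
print(sharer_ids(head))  # [2, 1, 0] -- most recent sharer at the head
```

Because each update touches only a node and its immediate neighbours, there is no single contended structure; the cost is the extra protocol traffic to coordinate those pointer updates between caches.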

Token Coherence

Token coherence takes a fundamentally different approach to guaranteeing correctness. Each cache line in the system has a fixed number of tokens (typically equal to the number of processors plus one).

  • A processor needs all tokens to perform a write, and at least one token to perform a read
  • This decouples correctness from performance: the token counting mechanism guarantees safety (no conflicting accesses), while the transport layer can use any combination of broadcast, point-to-point, or persistent requests to move tokens around
  • Reduces serialization compared to directory protocols because token transfers can occur without central coordination. Two unrelated coherence events don't need to go through the same directory controller.
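The safety rule itself is just counting. This sketch assumes a total of five tokens for the line purely for illustration; the function names are not from the token coherence literature.

```python
# Sketch of token coherence's safety rule: a read needs at least one token,
# a write needs every token for the line. Token count is an assumption.

TOTAL_TOKENS = 5  # fixed per line, set at system design time (illustrative)

def can_read(tokens_held):
    return tokens_held >= 1

def can_write(tokens_held):
    return tokens_held == TOTAL_TOKENS

# Tokens split 2/3 between two caches: both may read, neither may write.
print(can_read(2), can_read(3))  # True True
print(can_write(3))              # False
print(can_write(TOTAL_TOKENS))   # True -- all tokens collected
```

Because this invariant alone rules out conflicting accesses, the system is free to move tokens by broadcast, point-to-point messages, or persistent requests without affecting correctness, which is the decoupling the bullets above describe.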

Timestamp-Based Coherence Protocols

These protocols use logical timestamps to order operations rather than relying on bus ordering or directory serialization.

  • Each write increments a timestamp associated with the cache line, and caches compare timestamps to determine whether their copy is still valid
  • Lazy coherence becomes possible: stale data can be tolerated briefly if the application's consistency model allows it
  • Particularly useful with relaxed consistency models where strict sequential ordering of all memory operations isn't required for correctness
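The validity check reduces to comparing timestamps, and the "lazy" variant just widens the comparison. This is a loose sketch under assumed structure: a per-line write counter held somewhere authoritative, and a staleness bound exposed to the consistency model. All names are illustrative.

```python
# Sketch of timestamp-based validity: a cached copy is usable while its
# timestamp matches the latest write timestamp -- or, under a relaxed
# consistency model, lags it by at most an allowed bound. Illustrative.

latest_ts = {}  # authoritative write timestamp per line address

def write(addr):
    """Each write increments the line's timestamp; returns the new value."""
    latest_ts[addr] = latest_ts.get(addr, 0) + 1
    return latest_ts[addr]

def is_valid(addr, cached_ts, staleness_allowed=0):
    """Strict mode (bound 0) demands an exact match; lazy mode tolerates lag."""
    return latest_ts.get(addr, 0) - cached_ts <= staleness_allowed

ts = write(0x40)      # a cache copies the line at timestamp 1
write(0x40)           # another core writes: latest timestamp is now 2
print(is_valid(0x40, ts))                       # False -- stale under strict
print(is_valid(0x40, ts, staleness_allowed=1))  # True  -- lazy coherence
```

The `staleness_allowed` parameter is where a relaxed consistency model buys performance: briefly tolerating stale data avoids an immediate refetch.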

Compare: SCI vs. Token Coherence: both target large-scale systems, but SCI uses distributed linked-list directories while token coherence uses a counting mechanism. Token coherence offers more flexibility for performance optimization (you can mix broadcast and point-to-point freely) at the cost of more complex implementation and the need for a persistent request mechanism to handle token loss or starvation.


Quick Reference Table

Concept                        Best Examples
Basic state-based protocols    MSI, MESI, MOESI
Broadcast-based coherence      Snooping protocols, Bus-based protocols
Scalable coherence             Directory-based, SCI, Token coherence
Write strategy trade-offs      Write-invalidate, Write-update
Reducing memory traffic        MOESI (Owned state), Cache-to-cache transfers
Small system optimization      MESI, Snooping protocols
Large system optimization      Directory-based, SCI
Flexible ordering              Timestamp-based protocols

Self-Check Questions

  1. Which two protocols both use state-based coherence but differ in how they handle clean, private data? Explain why this difference matters for performance.

  2. A system architect is designing a 64-processor server. Which coherence approach (snooping or directory) should they choose, and what specific scalability limitation are they avoiding?

  3. Compare and contrast write-invalidate and write-update strategies. Under what workload conditions would write-update actually outperform write-invalidate?

  4. If a question asks you to reduce memory bandwidth consumption in a system with high read-sharing of modified data, which protocol state (from MOESI) directly addresses this, and how does it work?

  5. Token coherence and directory-based protocols both aim to scale beyond bus-based systems. What fundamental mechanism differs between them, and what trade-off does each approach make?
