Distributed algorithms are the invisible backbone of every system where multiple machines need to work together—from cloud databases to blockchain networks to the apps on your phone syncing across devices. You're being tested on your understanding of how nodes coordinate without a single point of control, why failures don't crash entire systems, and what mechanisms ensure consistency when machines can't trust each other or even agree on what time it is. These concepts appear repeatedly in exam questions because they represent the fundamental challenges of building reliable systems at scale.
Don't just memorize algorithm names—know what problem each algorithm solves and why that problem is hard in a distributed context. When you see Paxos, think "agreement despite failures." When you see Chandy-Lamport, think "capturing global state without stopping the world." This conceptual mapping is what separates students who can handle FRQ scenarios from those who just recognize vocabulary.
When distributed nodes must agree on a single value or decision, you need consensus algorithms. The core challenge is that nodes can fail, messages can be delayed or lost, and yet the system must still make progress and never contradict itself.
Compare: Raft vs. PBFT—both achieve consensus, but Raft assumes nodes fail by crashing (crash fault tolerance), while PBFT assumes nodes might actively lie (Byzantine fault tolerance). If an FRQ asks about blockchain or adversarial environments, PBFT is your answer; for trusted internal systems, Raft is simpler and faster.
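The safety core shared by Paxos and Raft is majority-quorum voting: a decision counts only if a strict majority accepts it, so any two quorums overlap in at least one node. Here's a minimal sketch of that idea as a term-based leader election; the `Node` and `elect` names are illustrative, not from any real library.

```python
# Minimal sketch of majority-quorum voting, the agreement rule at the heart
# of crash-fault-tolerant consensus (Paxos, Raft). A candidate wins only if
# a strict majority of nodes grants it a vote for the current term.

class Node:
    def __init__(self, name):
        self.name = name
        self.voted_term = 0  # highest term this node has voted in

    def grant_vote(self, term):
        # At most one vote per term. This is the crash-fault model:
        # nodes may stop responding, but they never lie about their vote.
        if term > self.voted_term:
            self.voted_term = term
            return True
        return False

def elect(nodes, candidate_term):
    votes = sum(node.grant_vote(candidate_term) for node in nodes)
    return votes > len(nodes) // 2  # strict majority required

cluster = [Node(f"n{i}") for i in range(5)]
print(elect(cluster, 1))  # True: all five nodes grant term-1 votes
print(elect(cluster, 1))  # False: term-1 votes are already spent
```

The one-vote-per-term rule is why two candidates can never both win the same term: the overlapping node in any two majorities can only have voted once. PBFT needs more machinery precisely because a Byzantine node can break this promise.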
When multiple processes need exclusive access to shared resources, you need mechanisms to prevent simultaneous access. The challenge is coordinating without a central lock server that could become a bottleneck or single point of failure.
Compare: Ricart-Agrawala vs. Token Ring—both achieve mutual exclusion, but Ricart-Agrawala requires 2(N−1) messages per critical-section entry (N−1 requests plus N−1 replies), while Token Ring has lower message complexity per entry but higher latency, since a process must wait for the circulating token to arrive. Choose based on whether message overhead or response time matters more.
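The 2(N−1) cost of Ricart-Agrawala comes directly from its message pattern: one REQUEST to each of the other N−1 processes, and one REPLY from each. A toy count of that pattern (not a full implementation; reply deferral and timestamp tie-breaking are omitted):

```python
# Sketch of the Ricart-Agrawala message pattern: entering the critical
# section costs exactly 2(N-1) messages. In the real algorithm, each REPLY
# may be deferred until the replier exits its own critical section, with
# (Lamport timestamp, process id) pairs breaking ties; that logic is
# omitted here to show only the message count.

def request_cs(requester, processes):
    messages = 0
    for p in processes:
        if p == requester:
            continue
        messages += 1  # REQUEST sent to p
        messages += 1  # REPLY sent back by p
    return messages

procs = list(range(5))             # N = 5 processes
print(request_cs(procs[0], procs)) # prints 8, i.e. 2 * (5 - 1)
```

Token Ring, by contrast, sends only the token itself, but a process may wait up to N−1 hops for it to come around, which is the latency trade-off the comparison above describes.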
Distributed systems need ways to understand global state and spread information efficiently. The challenge is that there's no global clock or instantaneous communication—you can't just "pause" everything to take a picture.
Compare: Chandy-Lamport vs. Gossip—Chandy-Lamport captures a single consistent snapshot at one logical moment, while Gossip continuously propagates updates toward eventual consistency. Use snapshots for recovery points; use gossip for ongoing state synchronization.
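To see why gossip scales, here is a toy push-gossip simulation under simplified assumptions (synchronous rounds, one random push per informed node, no failures). The number of rounds needed to inform all N nodes grows roughly as O(log N), since the informed set approximately doubles early on:

```python
import random

# Toy push-gossip simulation: each informed node pushes the update to one
# uniformly random peer per round. Rounds to full dissemination grow
# roughly logarithmically in cluster size, which is why gossip protocols
# scale to very large memberships.

def gossip_rounds(n, seed=0):
    random.seed(seed)      # fixed seed so the simulation is repeatable
    informed = {0}         # node 0 starts with the update
    rounds = 0
    while len(informed) < n:
        for node in list(informed):
            informed.add(random.randrange(n))  # push to a random peer
        rounds += 1
    return rounds

print(gossip_rounds(1000))  # a small number of rounds relative to N = 1000
```

This is also the answer shape the last review question below is probing for: convergence in O(log N) rounds with no central coordinator.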
Large-scale distributed systems need efficient ways to store data and route messages without centralized indexes. The challenge is maintaining structure and enabling fast lookups when nodes join, leave, and fail constantly.
Compare: DHTs vs. Spanning Trees—DHTs optimize for data lookup and storage distribution, while spanning trees optimize for message routing and broadcast efficiency. A system might use both: DHT for finding where data lives, spanning tree for efficiently reaching all nodes.
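The placement rule behind Chord-style DHTs is consistent hashing: nodes and keys hash onto the same identifier ring, and a key lives on the first node at or after its hash (its successor). A small sketch, using a linear scan in place of Chord's O(log N) finger-table routing, with illustrative node names:

```python
import hashlib

# Sketch of consistent hashing, the key-placement rule in Chord-style DHTs.
# Nodes and keys hash onto one ring; a key is stored on the first node whose
# ring position is >= the key's position, wrapping around at the top.
# Real Chord uses a 160-bit ring and finger tables for O(log N) lookups;
# this sketch scans linearly for clarity.

RING = 2 ** 16  # tiny ring for readability

def ring_hash(s):
    return int(hashlib.sha1(s.encode()).hexdigest(), 16) % RING

def successor(key, node_names):
    k = ring_hash(key)
    points = sorted((ring_hash(n), n) for n in node_names)
    for point, name in points:
        if point >= k:
            return name
    return points[0][1]  # wrapped past the top of the ring

nodes = ["node-a", "node-b", "node-c", "node-d"]
print(successor("song.mp3", nodes))  # whichever node owns this key's slice
```

The payoff is churn tolerance: when a node leaves, only the keys it owned shift to its successor, while every other key stays put—exactly the property a system with nodes joining and failing constantly needs.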
| Concept | Best Examples |
|---|---|
| Agreement despite crashes | Paxos, Raft, Leader Election |
| Agreement despite malicious nodes | PBFT, Byzantine fault tolerance |
| Exclusive resource access | Ricart-Agrawala, Token Ring |
| Time and ordering | NTP, Berkeley Algorithm, Lamport timestamps |
| Global state capture | Chandy-Lamport snapshots |
| Scalable information spread | Gossip protocols |
| Decentralized data storage | Chord, Kademlia (DHTs) |
| Network topology management | Spanning Tree Protocol, distributed MST |
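For the "Time and ordering" row above, Lamport timestamps are the simplest mechanism worth knowing cold: each process keeps a counter, increments it on every local event and send, and on receive jumps to one past the larger of its own clock and the message's stamp. A minimal sketch:

```python
# Sketch of a Lamport logical clock. It provides an ordering consistent
# with causality (if event a happened-before b, then clock(a) < clock(b))
# without any physical clock or NTP-style synchronization.

class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):            # local event
        self.time += 1
        return self.time

    def send(self):            # stamp an outgoing message
        return self.tick()

    def receive(self, stamp):  # merge an incoming timestamp
        self.time = max(self.time, stamp) + 1
        return self.time

a, b = LamportClock(), LamportClock()
stamp = a.send()         # a's clock: 1
b.tick()                 # b's clock: 1, concurrent with a's send
print(b.receive(stamp))  # prints 2: max(1, 1) + 1
```

Note the converse does not hold: clock(a) < clock(b) does not imply a caused b, which is why vector clocks exist—but for exam purposes, the max-plus-one receive rule is the key detail.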
Both Raft and PBFT achieve consensus—what fundamental assumption about node failures distinguishes when you'd use each one?
If a system needs to detect whether a distributed computation has terminated, which algorithm family would you use to capture the necessary global state, and why can't you simply query each node individually?
Compare Ricart-Agrawala and Token Ring for distributed mutual exclusion: which has better message complexity, and which is more vulnerable to single points of failure?
A peer-to-peer file sharing system needs to locate files across millions of nodes without any central server. Which distributed structure enables lookups, and what mechanism determines which node stores which keys?
An FRQ describes a large-scale database where nodes need to share membership information and detect failures, but centralized coordination would create a bottleneck. Which protocol family achieves eventual consistency through randomized peer-to-peer communication, and what is its typical convergence time in terms of network size?