An interactive deep-dive into two landmark distributed consensus algorithms — Raft and Viewstamped Replication — with live simulations you can explore and break.
Distributed systems are hard because machines fail. A server can crash mid-write, a network can partition, a message can vanish. The question at the heart of fault-tolerant systems is: how do a group of servers agree on anything when any of them can fail at any time?
That question is answered by consensus algorithms. Two of the most important are:
- Viewstamped Replication (Oki & Liskov, 1988) — historically the first complete consensus protocol for state machine replication
- Raft (Ongaro & Ousterhout, 2014) — designed from the ground up to be understandable, now the backbone of etcd, CockroachDB, and many cloud systems
Both algorithms solve the same problem with surprisingly similar shapes. Let's explore them interactively.
The visualizations below are live simulations. Click any node to disconnect or reconnect it. Watch how the cluster reacts. Add or remove nodes. Hit "Client write" to trigger log replication. All message flows — the coloured circles flying between nodes — are the actual RPCs each algorithm sends.
Raft
Raft decomposes the consensus problem into three largely independent sub-problems:
- Leader election — at most one server can be authoritative in any given term
- Log replication — the leader accepts writes and replicates them to followers
- Safety — once an entry is committed, it can never be overwritten or lost
Cluster roles
Every Raft server is always in one of three states:
| State | Colour | Description |
|---|---|---|
| Follower | 🔵 Blue | Passive; just applies entries from the leader |
| Candidate | 🟡 Yellow | Campaigning to become leader |
| Leader | 🔴 Red | Handles all client writes; replicates to followers |
Leader election and the election timeout
Each follower has an election timeout — the arc you see growing around each blue node. If that timer expires without receiving a heartbeat from the leader, the follower assumes the leader is dead and promotes itself to Candidate.
The candidate increments its term number (visible inside the node as v1, v2, …), votes for itself, and fires RequestVote (purple circles) at every other server. A server grants its vote if:
- the candidate's term is at least as large as its own,
- it hasn't already voted for someone else this term, and
- the candidate's log is at least as up-to-date as its own
If the candidate collects votes from a majority of the cluster, it wins and becomes the new leader — and immediately starts sending AppendEntries heartbeats (blue circles) to reset everyone's timers and prevent a new election.
The randomised timeout window (each follower picks a different timeout) ensures that in most cases only one candidate starts an election at a time, avoiding perpetual split votes.
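The vote-granting rules above can be sketched as a single handler. This is an illustrative sketch, not code from any real Raft implementation; the state and field names (current_term, voted_for, and so on) are assumptions chosen to mirror the paper's terminology.

```python
def handle_request_vote(state, req):
    """Decide whether this server grants its vote to a candidate.

    `state` is the server's persistent state; `req` is the RequestVote RPC.
    """
    # A candidate from a stale term is always rejected.
    if req["term"] < state["current_term"]:
        return False
    # Seeing a higher term moves us into that term and clears our vote.
    if req["term"] > state["current_term"]:
        state["current_term"] = req["term"]
        state["voted_for"] = None
    # At most one vote per term.
    if state["voted_for"] not in (None, req["candidate_id"]):
        return False
    # The candidate's log must be at least as up-to-date as ours:
    # compare last log term first, then last log index.
    ours = (state["last_log_term"], state["last_log_index"])
    theirs = (req["last_log_term"], req["last_log_index"])
    if theirs < ours:
        return False
    state["voted_for"] = req["candidate_id"]
    return True
```

The tuple comparison captures Raft's "up-to-date" rule in one line: a higher last term wins outright, and equal terms fall back to comparing log length.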
▶ Simulation started — RAFT
Try it: disconnect the red leader node (click it). You'll see the blue nodes' timer arcs drain. One runs out first, turns yellow, and fires purple RequestVote RPCs. Green VoteGranted circles come back. Once a majority responds, a new red leader appears and immediately sends blue heartbeats.
Reconnect the old leader — it receives a heartbeat from the new leader with a higher term, realises someone else is in charge, and quietly steps back to follower.
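That step-down behaviour applies to every server, not just deposed leaders. A minimal sketch of the rule, under assumed names (the dictionary-based state is illustrative, not any library's API):

```python
def on_append_entries(state, leader_term):
    """Return the role this server should hold after seeing a heartbeat."""
    if leader_term > state["current_term"]:
        # Someone won an election in a later term: adopt it and step down.
        state["current_term"] = leader_term
        state["voted_for"] = None
        state["role"] = "follower"
    elif leader_term == state["current_term"] and state["role"] == "candidate":
        # A legitimate leader already exists for our term; stop campaigning.
        state["role"] = "follower"
    return state["role"]
```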
Log replication
Once a leader is elected, it handles all client writes. The flow is:
- Leader appends the entry to its own log
- Leader sends AppendEntries to all followers (the blue circles — these double as heartbeats when empty)
- Once a majority of servers acknowledge, the entry is committed
- Leader tells followers to commit in the next round
This majority requirement is the key safety property: a committed entry lives on enough servers that any future leader's electing majority must overlap the majority that stored it. Because a server only votes for a candidate whose log is at least as up-to-date as its own, the winner of any election must already hold every committed entry. This is Raft's Leader Completeness guarantee: committed entries are never lost.
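Concretely, a leader can derive its commit index from the acknowledgements it has received. A sketch (illustrative names, not the paper's pseudocode):

```python
def commit_index(match_index, cluster_size):
    """Return the highest log index known to be replicated on a majority.

    `match_index` holds, for every server including the leader itself, the
    highest log index known to be stored on that server.
    """
    assert len(match_index) == cluster_size
    # Sort descending: the value at position ⌊N/2⌋ is held by a majority.
    ranked = sorted(match_index, reverse=True)
    return ranked[cluster_size // 2]
```

One caveat from the paper: a leader only commits entries from its *own* term this way; earlier-term entries are committed indirectly once a current-term entry on top of them commits.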
Viewstamped Replication
VSR (first described by Oki & Liskov in 1988 and revised by Liskov & Cowling in 2012) frames the problem as state machine replication: keep replicas of a state machine in sync so that the system continues to work even when some replicas fail.
Terminology
VSR uses different names for the same concepts:
| VSR term | Raft equivalent |
|---|---|
| Primary (red) | Leader |
| Backup (blue) | Follower |
| View | Term |
| View change | Leader election |
| Op-number | Log index |
Normal operation
During normal operation, the primary receives client requests and drives replication:
- Primary sends Prepare(view, op-number, request) to all backups (blue circles)
- Each backup increments its op-number, appends the entry, and responds PrepareOK (teal circles)
- When the primary has collected PrepareOK from a majority, the entry is committed
- Primary broadcasts Commit (yellow circles) so backups can advance their commit pointer
The primary also sends periodic Prepare messages even with no new client operations, acting as a heartbeat and keeping backups' timers reset.
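The primary's side of normal operation can be sketched as a small amount of bookkeeping. This is a minimal sketch under assumed names (real VSR implementations track considerably more state, including the client table):

```python
class Primary:
    def __init__(self, view, num_replicas):
        self.view = view
        self.n = num_replicas
        self.op_number = 0
        self.commit_number = 0
        self.acks = {}  # op-number -> set of replica ids that sent PrepareOK

    def on_client_request(self, request):
        """Assign the next op-number and build the Prepare to broadcast."""
        self.op_number += 1
        self.acks[self.op_number] = {0}  # the primary counts as one ack
        return ("Prepare", self.view, self.op_number, request)

    def on_prepare_ok(self, replica_id, op_number):
        """Record a PrepareOK; commit once a majority holds the entry."""
        self.acks[op_number].add(replica_id)
        majority = self.n // 2 + 1
        if len(self.acks[op_number]) >= majority and op_number == self.commit_number + 1:
            self.commit_number = op_number
        return self.commit_number
```

Note that the commit pointer only advances in op-number order, which mirrors how the log itself is strictly ordered.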
▶ Simulation started — VSR
Try it: disconnect the primary. Backups notice the silence, their timers expire, and they kick off a view change. Red StartViewChange messages propagate through the cluster. Purple DoViewChange messages converge on the deterministically chosen new primary (the node at index view mod N). Once it collects a majority, it fires yellow StartView to announce the new view, and normal Prepare replication resumes.
View change protocol
The VSR view change is more structured than Raft's election:
- A backup whose timer expires increments the view number and broadcasts StartViewChange(v+1) to all replicas
- Any replica receiving this message also broadcasts it (propagation), and sends DoViewChange to the designated new primary — deterministically the node at index v+1 mod N
- The new primary waits for DoViewChange from a majority, then starts the new view with StartView
The key difference from Raft: the new leader is not elected by competition — it is mathematically determined by the view number. If that node is also down, the view number keeps incrementing until an available node lands on its slot.
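The deterministic selection is two lines of arithmetic. A sketch, following the article's view mod N indexing convention (actual implementations may number replicas differently):

```python
def primary_for_view(view, num_replicas):
    """The primary for a view is fixed by arithmetic, not by voting."""
    return view % num_replicas

def next_live_view(view, num_replicas, down):
    """Keep advancing the view until its designated primary is reachable."""
    v = view + 1
    while primary_for_view(v, num_replicas) in down:
        v += 1
    return v
```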
Raft vs. Viewstamped Replication
Both algorithms tolerate up to ⌊(N−1)/2⌋ simultaneous failures in a cluster of N servers. A 5-node cluster can lose 2 nodes and still make progress.
| Property | Raft | VSR |
|---|---|---|
| New leader selection | First candidate to win a majority vote | Deterministic: view mod N |
| Heartbeat mechanism | AppendEntries (also carries log entries) | Prepare + separate Commit |
| View/term tracking | currentTerm | view-number |
| Vote deduplication | votedFor per term | Implicit via DoViewChange quorum |
| Historical role | Modern, pedagogical (2014) | Pioneer; published before Paxos (1988) |
Conceptually, the two are duals: Raft handles leader election and log replication in the same RPC (AppendEntries), while VSR separates replication (Prepare/PrepareOK) from commit notification (Commit) and keeps view changes as a distinct three-phase protocol.
Both algorithms require a quorum — a strict majority of servers — to make any progress. This is unavoidable: it is the minimum requirement to prevent split-brain (two leaders simultaneously accepting writes). With N=5, you need 3 servers alive. Try adding nodes in the simulation and then disconnecting several at once to see when the cluster halts.
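The quorum arithmetic is worth writing out. A two-line helper (not any particular library's API) makes the ⌊(N−1)/2⌋ bound concrete:

```python
def quorum_size(n):
    """Smallest strict majority of an n-node cluster."""
    return n // 2 + 1

def max_failures(n):
    """Simultaneous node failures the cluster survives: ⌊(n−1)/2⌋."""
    return (n - 1) // 2
```

One consequence visible here: a 4-node cluster tolerates no more failures than a 3-node one (both survive only 1), which is why production clusters almost always use odd sizes.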
What the simulation doesn't show
The visualisation is simplified in a few ways to keep it legible:
- Log contents are not shown — nodes just track a commit counter
- Disk writes are omitted (in real systems, entries are fsynced before acknowledging)
- Network partitions are modelled as total node disconnections, not one-directional partitions
- Snapshots / log compaction (needed in long-running systems) are not simulated
- Membership changes (adding/removing nodes without downtime) are not modelled
For a deeper treatment of these topics, the original papers are surprisingly readable.
Further reading
- Ongaro & Ousterhout, In Search of an Understandable Consensus Algorithm (Raft paper, 2014)
- Liskov & Cowling, Viewstamped Replication Revisited (2012)
- raft.github.io — Ongaro's own interactive visualisation
- Lamport, Paxos Made Simple (2001) — the third corner of the consensus triangle
- Howard, Flexible Paxos (2016) — a generalisation showing quorums don't have to be majorities