Raft & Viewstamped Replication: Consensus Visualized

An interactive deep-dive into two landmark distributed consensus algorithms — Raft and Viewstamped Replication — with live simulations you can explore and break.

7 min read

Distributed systems are hard because machines fail. A server can crash mid-write, a network can partition, a message can vanish. The question at the heart of fault-tolerant systems is: how do a group of servers agree on anything when any of them can fail at any time?

That question is answered by consensus algorithms. Two of the most important are:

  • Viewstamped Replication (Oki & Liskov, 1988) — historically the first complete consensus protocol for state machine replication
  • Raft (Ongaro & Ousterhout, 2014) — designed from the ground up to be understandable, now the backbone of etcd, CockroachDB, and many cloud systems

Both algorithms solve the same problem with surprisingly similar shapes. Let's explore them interactively.

💡

The visualizations below are live simulations. Click any node to disconnect or reconnect it. Watch how the cluster reacts. Add or remove nodes. Hit "Client write" to trigger log replication. All message flows — the coloured circles flying between nodes — are the actual RPCs each algorithm sends.

Raft

Raft decomposes the consensus problem into three largely independent sub-problems:

  1. Leader election — exactly one server must be authoritative at any term
  2. Log replication — the leader accepts writes and replicates them to followers
  3. Safety — once an entry is committed, it can never be overwritten or lost

Cluster roles

Every Raft server is always in one of three states:

| State | Colour | Description |
| --- | --- | --- |
| Follower | 🔵 Blue | Passive; just applies entries from the leader |
| Candidate | 🟡 Yellow | Campaigning to become leader |
| Leader | 🔴 Red | Handles all client writes; replicates to followers |

Leader election and the election timeout

Each follower has an election timeout — the arc you see growing around each blue node. If that timer expires without receiving a heartbeat from the leader, the follower assumes the leader is dead and promotes itself to Candidate.

The candidate increments its term number (visible inside the node as v1, v2, …), votes for itself, and fires RequestVote (purple circles) at every other server. A server grants its vote if:

  • the candidate's term is at least as large as its own, and
  • it hasn't already voted for another candidate this term

(Full Raft adds a third check: the candidate's log must be at least as up-to-date as the voter's. The simulation skips it because log contents aren't tracked.)

If the candidate collects votes from a majority of the cluster, it wins and becomes the new leader — and immediately starts sending AppendEntries heartbeats (blue circles) to reset everyone's timers and prevent a new election.

The randomised timeout window (each follower picks a different timeout) ensures that in most cases only one candidate starts an election at a time, avoiding perpetual split votes.
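The randomised window itself is one line of code. The 150–300 ms range below comes from the Raft paper's suggested defaults; any range wide relative to message latency works:

```python
import random

ELECTION_TIMEOUT_MIN_MS = 150  # illustrative values from the Raft paper
ELECTION_TIMEOUT_MAX_MS = 300

def next_election_timeout() -> float:
    # Each follower draws its own timeout, so one node usually times out
    # well before the others and wins the election uncontested.
    return random.uniform(ELECTION_TIMEOUT_MIN_MS, ELECTION_TIMEOUT_MAX_MS)
```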

[Interactive simulation: Raft. Node states: Leader, Follower, Candidate, Disconnected. Messages: Client write, RequestVote, VoteGranted, VoteDenied, AppendEntries, AppendEntriesOK.]

Try it: disconnect the red leader node (click it). You'll see the blue nodes' timer arcs drain. One runs out first, turns yellow, and fires purple RequestVote RPCs. Green VoteGranted circles come back. Once a majority responds, a new red leader appears and immediately sends blue heartbeats.

Reconnect the old leader — it receives a heartbeat from the new leader with a higher term, realises someone else is in charge, and quietly steps back to follower.

Log replication

Once a leader is elected, it handles all client writes. The flow is:

  1. Leader appends the entry to its own log
  2. Leader sends AppendEntries to all followers (the blue circles — these double as heartbeats when empty)
  3. Once a majority of servers acknowledge, the entry is committed
  4. Leader tells followers to commit in the next round

This majority requirement is the key safety property: a committed entry must be on enough servers that any future leader will have it in its log. Raft's election restriction only lets a candidate win if its log is at least as up-to-date as each voter's in the majority that elects it; since a committed entry sits on some majority, and any two majorities overlap, the new leader must already hold every committed entry. This is Raft's Leader Completeness guarantee: committed entries are never lost.
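The commit rule in step 3 reduces to counting replicas. Here is a hypothetical helper (it ignores Raft's additional rule that the leader only commits entries from its current term by counting):

```python
def committed_index(leader_log_len: int, match_index: dict[int, int]) -> int:
    """Highest log index replicated on a majority of the cluster.

    match_index maps follower id -> highest log index known to be
    replicated on that follower; the leader counts itself via its own log.
    """
    n = len(match_index) + 1          # cluster size including the leader
    majority = n // 2 + 1
    indices = sorted(list(match_index.values()) + [leader_log_len], reverse=True)
    # The majority-th highest index is on at least `majority` servers.
    return indices[majority - 1]
```

For example, with a 5-node cluster where the leader has 5 entries but followers have replicated up to indices 5, 3, 2, and 1, only index 3 is on a majority.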


Viewstamped Replication

VSR (first described by Oki & Liskov in 1988 and revised by Liskov & Cowling in 2012) frames the problem as state machine replication: keep replicas of a state machine in sync so that the system continues to work even when some replicas fail.

Terminology

VSR uses different names for the same concepts:

| VSR term | Raft equivalent |
| --- | --- |
| Primary (red) | Leader |
| Backup (blue) | Follower |
| View | Term |
| View change | Leader election |
| Op-number | Log index |

Normal operation

During normal operation, the primary receives client requests and drives replication:

  1. Primary sends Prepare(view, op-number, request) to all backups (blue circles)
  2. Each backup increments its op-number, appends the entry, and responds PrepareOK (teal circles)
  3. When the primary has collected PrepareOK from a majority, the entry is committed
  4. Primary broadcasts Commit (yellow circles) so backups can advance their commit pointer

The primary also sends periodic Prepare messages even with no new client operations, acting as a heartbeat and keeping backups' timers reset.
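The four numbered steps, seen from the primary's side, can be sketched as follows. This is a hypothetical reduction in which "sending" a message just means returning a tuple:

```python
from dataclasses import dataclass, field

@dataclass
class Primary:
    view: int
    n_replicas: int                    # total replicas, primary included
    op_number: int = 0
    commit_number: int = 0
    prepare_oks: dict[int, set[int]] = field(default_factory=dict)

    def on_client_request(self, request) -> tuple:
        # Step 1-2: assign the next op-number and broadcast Prepare to backups.
        self.op_number += 1
        self.prepare_oks[self.op_number] = set()
        return ("Prepare", self.view, self.op_number, request)

    def on_prepare_ok(self, backup_id: int, op_number: int) -> bool:
        # Step 3: the primary counts itself; a strict majority commits the op.
        self.prepare_oks[op_number].add(backup_id)
        if len(self.prepare_oks[op_number]) + 1 >= self.n_replicas // 2 + 1:
            self.commit_number = max(self.commit_number, op_number)
            return True   # Step 4: now broadcast Commit(view, commit_number)
        return False
```

With five replicas, the primary plus two PrepareOKs make the required quorum of three.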

[Interactive simulation: VSR. Node states: Primary, Backup, View-change, Disconnected. Messages: Client write, Prepare, PrepareOK, Commit, StartViewChange, DoViewChange, StartView.]

Try it: disconnect the primary. Backups notice the silence, their timers expire, and they kick off a view change. Red StartViewChange messages propagate through the cluster. Purple DoViewChange messages converge on the deterministically chosen new primary (the node at index view mod N). Once it collects a majority, it fires yellow StartView to announce the new view, and normal Prepare replication resumes.

View change protocol

The VSR view change is more structured than Raft's election:

  1. A backup whose timer expires increments the view number and broadcasts StartViewChange(v+1) to all replicas
  2. Any replica receiving this message also broadcasts it (propagation), and sends DoViewChange to the designated new primary — deterministically the node at index (v+1) mod N
  3. The new primary waits for DoViewChange from a majority, then starts the new view with StartView

The key difference from Raft: the new leader is not elected by competition — it is mathematically determined by the view number. If that node is also down, the view number keeps incrementing until an available node lands on its slot.
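That selection rule fits in a couple of lines. The second function is a hypothetical simplification: real VSR discovers a dead designated primary through further timed-out view changes rather than a local loop:

```python
def new_primary(view: int, n_replicas: int) -> int:
    # The next primary is fixed by arithmetic on the view number,
    # not by an election race.
    return view % n_replicas

def advance_to_live_primary(view: int, n_replicas: int, alive: set[int]) -> int:
    # If the designated primary is down, the view number keeps climbing
    # until it lands on a reachable replica.
    while new_primary(view, n_replicas) not in alive:
        view += 1
    return view
```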


Raft vs. Viewstamped Replication

Both algorithms tolerate up to ⌊(N−1)/2⌋ simultaneous failures in a cluster of N servers. A 5-node cluster can lose 2 nodes and still make progress.
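The quorum arithmetic both algorithms share:

```python
def quorum(n: int) -> int:
    # Smallest strict majority of an n-node cluster.
    return n // 2 + 1

def max_failures(n: int) -> int:
    # ⌊(N−1)/2⌋: the most nodes that can fail while a quorum survives.
    return (n - 1) // 2

# Any two quorums overlap in at least one node, which is what rules out
# two disjoint groups each electing their own leader.
assert quorum(5) + quorum(5) > 5
```

Note that a 4-node cluster needs a quorum of 3 and so tolerates only one failure, the same as a 3-node cluster: even cluster sizes buy no extra fault tolerance.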

| Property | Raft | VSR |
| --- | --- | --- |
| New leader selection | First candidate to win a majority vote | Deterministic: view mod N |
| Heartbeat mechanism | AppendEntries (also carries log entries) | Prepare + separate Commit |
| View/term tracking | currentTerm | view-number |
| Vote deduplication | votedFor per term | Implicit via DoViewChange quorum |
| Historical role | Modern, pedagogical (2014) | Pioneer (1988); predates the publication of Paxos |

Conceptually, the two are close cousins: Raft folds heartbeats and log replication into a single RPC (AppendEntries), while VSR separates replication (Prepare/PrepareOK) from commit notification (Commit) and keeps view changes as a distinct three-phase protocol.

Both algorithms require a quorum — a strict majority of servers — to make any progress. This is unavoidable: it is the minimum requirement to prevent split-brain (two leaders simultaneously accepting writes). With N=5, you need 3 servers alive. Try adding nodes in the simulation and then disconnecting several at once to see when the cluster halts.

What the simulation doesn't show

The visualisation is simplified in a few ways to keep it legible:

  • Log contents are not shown — nodes just track a commit counter
  • Disk writes are omitted (in real systems, entries are fsynced before acknowledging)
  • Network partitions are modelled as total node disconnections, not one-directional partitions
  • Snapshots / log compaction (needed in long-running systems) are not simulated
  • Membership changes (adding/removing nodes without downtime) are not modelled

For a deeper treatment of these topics, the original papers are surprisingly readable.
