
Replication

How distributed systems copy data across multiple nodes to achieve high availability, fault tolerance, and geographic distribution—and the fundamental trade-offs involved

TL;DR

Replication is copying data across multiple nodes to survive failures, distribute load, and reduce latency. The core trade-off: synchronous replication guarantees consistency but adds latency; asynchronous replication is fast but risks data loss. Choose your strategy based on whether correctness or availability matters more.

Visual Overview

Replication Overview

Why Replicate?

| Motivation | Single Node | With Replication |
|---|---|---|
| Availability | Node fails = down | Failover to replica |
| Read Scaling | One node handles all | Distribute reads |
| Latency | Single location | Replicas near users |
| Durability | Disk fails = data loss | Multiple copies survive |

Replication Strategies

1. Single-Leader (Primary-Replica)

One node (the leader) accepts all writes; followers replicate changes from the leader.

         Writes
            ↓
        [Leader]
        /   |   \
       ↓    ↓    ↓
    [F1]  [F2]  [F3]   Followers
       ↓    ↓    ↓
         Reads

Used by: PostgreSQL, MySQL, MongoDB, Redis

Pros: Simple; strong consistency possible
Cons: Leader is a write bottleneck; failover complexity
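The write path above can be sketched in a few lines. This is a minimal illustration with hypothetical `Leader`/`Follower` classes, not any database's actual API: the leader appends every write to an ordered log, and followers replay the log from where they left off.

```python
# Minimal sketch of single-leader replication (hypothetical classes).
class Leader:
    def __init__(self):
        self.data = {}
        self.log = []  # ordered replication log

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))  # followers replay this entry later

class Follower:
    def __init__(self):
        self.data = {}
        self.applied = 0  # log position already replayed

    def pull(self, leader):
        # Apply only the log entries we have not yet seen, in order.
        for key, value in leader.log[self.applied:]:
            self.data[key] = value
        self.applied = len(leader.log)

leader = Leader()
followers = [Follower(), Follower()]
leader.write("x", 1)
for f in followers:
    f.pull(leader)
# All replicas now agree: f.data["x"] == 1 on every follower
```

Because every follower replays the same log in the same order, all replicas converge to the same state; the cost is that the single leader serializes all writes.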

2. Multi-Leader

Multiple nodes accept writes. Replicate changes between leaders.

    [Leader DC1] ──── [Leader DC2]
        |                   |
    [Follower]          [Follower]

Used by: CockroachDB (multi-region), collaborative editing

Pros: Local writes in each region
Cons: Conflict resolution required

3. Leaderless

All nodes accept reads and writes. Quorum determines success.

    Client
   /   |   \
  ↓    ↓    ↓
[N1]  [N2]  [N3]

Used by: Cassandra, DynamoDB, Riak

Pros: No failover needed; high availability
Cons: Eventually consistent; conflict resolution required
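The quorum rule can be made concrete. The sketch below is illustrative (not Cassandra's or DynamoDB's actual protocol): with N replicas, a write succeeds once W replicas acknowledge it and a read contacts R replicas; choosing R + W > N guarantees the read set overlaps the latest write set, so the reader can pick the newest version.

```python
# Sketch of leaderless quorum reads/writes (illustrative, simplified versioning).
N, W, R = 3, 2, 2
replicas = [{} for _ in range(N)]
clock = 0  # simple global version counter (real systems use per-key versions)

def write(key, value):
    global clock
    clock += 1
    acks = 0
    for rep in replicas:
        rep[key] = (value, clock)  # in reality some replicas may miss the write
        acks += 1
        if acks == W:
            break                  # stop once the write quorum is met
    return acks >= W

def read(key):
    # Contact R replicas and return the value with the highest version.
    versions = [rep[key] for rep in replicas[:R] if key in rep]
    return max(versions, key=lambda v: v[1])[0]

write("x", "v1")
print(read("x"))  # "v1": the read quorum overlaps the write quorum
```

With N=3, W=2, R=2 the overlap is guaranteed even though replica N3 never saw the write; that overlap is what stands in for a leader.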

Synchronous vs Asynchronous

Synchronous Replication

Leader waits for replica acknowledgment before confirming write.

Client → Leader: WRITE x=1
Leader → Follower: Replicate x=1
Follower → Leader: ACK
Leader → Client: SUCCESS

Guarantee: If the write succeeds, the data is on multiple nodes
Cost: Latency = leader write + slowest replica's acknowledgment

Asynchronous Replication

Leader confirms immediately, replicates in background.

Client → Leader: WRITE x=1
Leader → Client: SUCCESS
Leader → Follower: Replicate x=1 (background)

Guarantee: None; a leader crash can lose writes already acknowledged to clients
Benefit: Low latency

Semi-Synchronous

Wait for at least one replica (not all).

W = 2 (write quorum)
Wait for leader + 1 follower
Remaining followers: async

Used by: MySQL semi-sync, PostgreSQL synchronous_commit
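A toy sketch of the semi-synchronous rule (hypothetical helper, not MySQL's actual protocol): the commit is acknowledged once the first `sync_count` followers have the data, and the remaining followers catch up afterwards.

```python
# Sketch of semi-synchronous replication: wait for sync_count follower acks,
# replicate to the rest asynchronously (modeled here as a deferred loop).
def semi_sync_write(followers, key, value, sync_count=1):
    acked = 0
    pending = []
    for f in followers:
        if acked < sync_count:
            f[key] = value        # synchronous: wait for this follower's ack
            acked += 1
        else:
            pending.append(f)     # asynchronous: replicate later
    committed = acked >= sync_count  # commit point: quorum of acks reached
    for f in pending:             # background catch-up (would be a task/thread)
        f[key] = value
    return committed

followers = [{}, {}, {}]
semi_sync_write(followers, "x", 1)  # returns True: 1 sync ack received
```

The appeal is that durability holds as long as the leader and the one synchronous follower do not fail together, while write latency tracks the fastest follower rather than the slowest.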

Replication Lag

The delay between write on leader and visibility on replica.

Time →
Leader:   Write x=1 ─────── Write x=2 ─────── Write x=3
Replica:  ───────── x=1 ─────────── x=2 ─────────── x=3
               Lag: 50ms       Lag: 50ms       Lag: 50ms

Lag Causes Problems

Read-your-writes violation:

User: POST update profile
User: GET profile (routed to stale replica)
User: Sees old profile! Confused.

Solutions:

  • Read own writes from leader
  • Track write timestamp, ensure replica is caught up
  • Session affinity to same replica
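The second solution above (track the write timestamp) can be sketched as follows. The helpers and the `replicated_up_to` watermark are hypothetical, but the pattern is general: the client remembers when its last write happened and only reads from a replica that has replayed at least that far, falling back to the leader otherwise.

```python
# Sketch of read-your-writes via a client-tracked write timestamp.
import time

class Replica:
    def __init__(self):
        self.data = {}
        self.replicated_up_to = 0.0  # timestamp of the last applied write

leader = Replica()
stale_replica = Replica()  # lagging: has not replayed recent writes

def write(key, value):
    ts = time.time()
    leader.data[key] = value
    leader.replicated_up_to = ts
    return ts  # the client stores this as its "last write" watermark

def read_your_writes(key, last_write_ts, replica):
    # If the replica has not caught up to our last write, read from the leader.
    source = replica if replica.replicated_up_to >= last_write_ts else leader
    return source.data[key]

ts = write("profile", "new bio")
print(read_your_writes("profile", ts, stale_replica))  # "new bio", via the leader
```

Note this only guarantees the user sees *their own* writes; other users' reads may still be stale, which is usually acceptable.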

Failover

When leader fails, promote a follower.

Before:  [Leader*]   [F1]   [F2]
             ↓ crash
After:   [Leader]    [F1*]  [F2]
         (dead)   (new leader)

(* marks the current leader)

Challenges:

  1. Detection: Is leader dead or just slow?
  2. Election: Which follower becomes leader?
  3. Reconfiguration: Redirect clients to new leader
  4. Data loss: Async replication may lose recent writes
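Challenge 1 (detection) is usually handled with heartbeat timeouts. The thresholds below are assumed values for illustration: a follower suspects the leader only after several heartbeat intervals pass without contact, trading detection speed against false alarms on a slow network.

```python
# Sketch of leader failure detection by heartbeat timeout (assumed thresholds).
import time

HEARTBEAT_INTERVAL = 0.5   # leader sends a heartbeat every 500 ms (assumption)
SUSPECT_AFTER = 3          # missed intervals before declaring the leader dead

def leader_suspected(last_heartbeat_at, now):
    # Suspect the leader once the silence exceeds SUSPECT_AFTER intervals.
    return (now - last_heartbeat_at) > HEARTBEAT_INTERVAL * SUSPECT_AFTER

now = time.time()
print(leader_suspected(now - 0.4, now))  # False: within one interval
print(leader_suspected(now - 2.0, now))  # True: four intervals of silence
```

There is no safe universal timeout: too short and a slow-but-alive leader triggers needless failovers; too long and the system stays down longer than necessary.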

Split-Brain

Network partition creates two leaders:

[Old Leader] ──X── [New Leader + Followers]
      ↑                      ↑
   Client A               Client B

Both accept writes → data divergence!

Prevention: Quorum-based leader election (Raft, Paxos)
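The core of quorum-based prevention fits in one line: a candidate becomes leader only with votes from a strict majority, so after a partition at most one side can elect a leader for a given term. A simplified, Raft-style sketch:

```python
# Sketch of the majority rule behind Raft/Paxos leader election (simplified).
def can_become_leader(votes_received, cluster_size):
    return votes_received > cluster_size // 2  # strict majority required

# A 5-node cluster split 3 / 2 by a network partition:
print(can_become_leader(3, 5))  # True:  the majority side elects a leader
print(can_become_leader(2, 5))  # False: the minority side cannot -> no split-brain
```

This is also why clusters usually have an odd number of nodes: in a 4-node cluster split 2/2, neither side has a majority and the whole system stalls.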

Replication Mechanisms

| Mechanism | How It Works | Used By |
|---|---|---|
| Statement-based | Replicate SQL statements | MySQL (legacy) |
| WAL shipping | Ship the write-ahead log | PostgreSQL |
| Logical (row-based) | Ship changed rows | MySQL binlog |
| Trigger-based | Application-level capture | Custom solutions |

Statement-Based Problems

-- Leader executes:
INSERT INTO logs VALUES (NOW(), 'event');

-- Replica executes same statement:
INSERT INTO logs VALUES (NOW(), 'event');
-- NOW() returns different time! Data diverges.

Issues: Non-deterministic functions, auto-increment IDs

WAL Shipping

Send the physical log bytes:

Block 427: [old_bytes] → [new_bytes]
Block 893: [old_bytes] → [new_bytes]

Pros: Exact byte-for-byte replication
Cons: Tied to the storage engine version

Logical (Row-Based)

{
  "table": "users",
  "op": "UPDATE",
  "before": {"id": 1, "name": "old"},
  "after": {"id": 1, "name": "new"}
}

Pros: Version-independent, readable, enables CDC
Cons: More verbose than WAL
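Applying such a change event on a replica (or a CDC consumer) is straightforward. The sketch below assumes the event shape shown above with an `id` primary key; it is not modeled on any specific CDC tool:

```python
# Sketch of applying a logical (row-based) change event on a replica.
import json

def apply_change(tables, event):
    table = tables.setdefault(event["table"], {})
    if event["op"] in ("INSERT", "UPDATE"):
        row = event["after"]
        table[row["id"]] = row               # upsert by primary key
    elif event["op"] == "DELETE":
        table.pop(event["before"]["id"], None)

tables = {}
event = json.loads('{"table": "users", "op": "UPDATE", '
                   '"before": {"id": 1, "name": "old"}, '
                   '"after": {"id": 1, "name": "new"}}')
apply_change(tables, event)
print(tables["users"][1]["name"])  # "new"
```

Because the event describes rows rather than SQL or storage bytes, the same stream can feed a replica, a search index, or a cache, which is exactly what makes logical replication the basis for CDC.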

Production Systems

| System | Strategy | Sync Mode | Typical Lag |
|---|---|---|---|
| PostgreSQL | Single-leader | Configurable | Sub-second (async) |
| MySQL | Single-leader | Semi-sync | Sub-second |
| MongoDB | Replica set | Configurable | Sub-second |
| Cassandra | Leaderless | Quorum | 10-100 ms |
| Kafka | Single-leader per partition | ISR | Under 100 ms |


Used In Systems:

  • PostgreSQL, MySQL, MongoDB (database replication)
  • Kafka (partition replication with ISR)
  • Redis (leader-follower, Redis Cluster)
  • etcd, Consul (Raft-based replication)

Next Recommended: Leader-Follower Replication - Deep dive into the most common pattern

Interview Notes

⭐ Must-Know: appears in roughly 85% of distributed systems interviews
🏭 Production Impact: underpins every HA database, Kafka, and distributed storage
⚡ Performance: sync vs async replication is the key latency trade-off
📈 Scalability: enables read scaling and geographic distribution