Leader-Follower Replication

TL;DR

Leader-follower replication is a pattern where one node (the leader) handles all writes and replicates data to multiple follower nodes that serve reads and provide fault tolerance. If the leader fails, a follower is promoted to become the new leader. This pattern achieves high availability, fault tolerance, and read scalability.

Visual Overview

LEADER-FOLLOWER ARCHITECTURE:

                      ┌──────────────┐
                      │   Leader     │
                      │  (Partition 0)│
                      └───────┬──────┘
                              │
                   Replication (async/sync)
                              │
            ┌─────────────────┼─────────────────┐
            │                 │                 │
            ▼                 ▼                 ▼
      ┌──────────┐      ┌──────────┐      ┌──────────┐
      │Follower 1│      │Follower 2│      │Follower 3│
      │ (Replica)│      │ (Replica)│      │ (Replica)│
      └──────────┘      └──────────┘      └──────────┘

WRITE FLOW:
1. Producer → Leader (write)
2. Leader → Append to local log
3. Leader → Replicate to Followers
4. Followers → Acknowledge replication
5. Leader → Acknowledge to Producer (after sync replicas)

READ FLOW:
- Option 1: Read from Leader (strong consistency)
- Option 2: Read from Followers (eventual consistency, higher throughput)

FAILURE SCENARIO:
Leader Fails:
  ┌──────────────┐
  │   Leader ✕  │
  └──────────────┘
         ↓
  Election Process
         ↓
  ┌──────────────┐
  │ New Leader ✓ │ ← Follower 1 promoted
  │ (Follower 1) │
  └──────────────┘

Core Explanation

What is Leader-Follower Replication?

Leader-follower replication (also called master-slave or primary-secondary) is a replication pattern where:

One Leader: Handles all writes, maintains authoritative copy
Multiple Followers: Replicate leader’s data, can serve reads
Automatic Failover: Follower promoted to leader on failure
Consistency: Leader ensures all replicas converge to same state

Single Node (No Replication):
┌─────────────┐
│  Server A   │
│  [DATA]     │
└─────────────┘
  ↓
Server crashes → DATA LOST ✕

Leader-Follower Replication:
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Leader     │ ──→ │ Follower 1  │ ──→ │ Follower 2  │
│  [DATA]     │     │  [DATA]     │     │  [DATA]     │
└─────────────┘     └─────────────┘     └─────────────┘
  ↓
Leader crashes → Follower promoted → DATA SAFE ✓

Replication Modes: Synchronous vs Asynchronous

Synchronous Replication (Strong Consistency):

SYNCHRONOUS REPLICATION:

1. Client → Write to Leader
2. Leader → Write to local log
3. Leader → Send to Follower 1 & 2 (parallel)
4. Follower 1 → Acknowledge
5. Follower 2 → Acknowledge
6. Leader → Acknowledge to Client (AFTER followers ACK)

Timeline:
Client Write ─┐
              │
Leader Write  ├──► [10ms]
              │
Followers ACK ├──► [20ms] ← Wait for all followers
              │
Client ACK    └──► [25ms] ← Client waits 25ms total

Pros:
✓ No data loss (all replicas have data before ACK)
✓ Strong consistency (read from any replica is up-to-date)

Cons:
✕ High latency (wait for slowest follower)
✕ Availability issues (if follower down, writes block)

Asynchronous Replication (Eventual Consistency):

ASYNCHRONOUS REPLICATION:

1. Client → Write to Leader
2. Leader → Write to local log
3. Leader → Acknowledge to Client (IMMEDIATE)
4. Leader → Send to Followers (async, in background)
5. Followers → Eventually apply updates

Timeline:
Client Write ─┐
              │
Leader Write  ├──► [10ms]
              │
Client ACK    └──► [12ms] ← Client doesn't wait for followers
              ↓
Followers ACK (background, 50ms later)

Pros:
✓ Low latency (client doesn't wait for followers)
✓ High availability (follower failures don't block writes)

Cons:
✕ Potential data loss (if leader crashes before replication)
✕ Stale reads (followers may lag behind leader)

Hybrid: In-Sync Replicas (ISR) - Kafka’s Approach:

IN-SYNC REPLICAS (ISR):

Config: replication.factor=3, min.insync.replicas=2

Replicas:
- Leader (always in ISR)
- Follower 1 (in ISR if < 10s lag)
- Follower 2 (in ISR if < 10s lag)
- Follower 3 (NOT in ISR, lagging > 10s)

Write Flow:
1. Client → Write to Leader
2. Leader → Write + send to all followers
3. Wait for min.insync.replicas=2 ACKs (Leader + 1 follower)
4. Acknowledge to Client

Guarantees:
✓ At least 2 replicas have data before ACK
✓ Fast writes (don't wait for slow follower 3)
✕ If ISR count < min.insync.replicas, writes fail (availability hit)

Best of both worlds!

Leader Election and Failover

When Does Failover Happen?

FAILURE DETECTION:

Heartbeat Mechanism:
Followers → Send heartbeat to Leader every 1s
Leader → Send heartbeat to Followers every 1s

Failure Scenarios:
1. Leader misses heartbeats → Followers detect leader failure
2. Follower misses heartbeats → Leader removes from ISR
3. Network partition → Split-brain prevention needed

Leader Election Process:

LEADER ELECTION (Simplified):

1. FAILURE DETECTION
   Leader fails (no heartbeat for 10s)

2. ELECTION TRIGGER
   Followers detect failure
   Start election process

3. CANDIDATE SELECTION
   Criteria for new leader:
   - ✓ In ISR (up-to-date replica)
   - ✓ Highest offset (most data)
   - ✓ Lowest broker ID (tie-breaker)

4. NEW LEADER ANNOUNCED
   Controller broadcasts new leader
   Followers connect to new leader

5. RESUME OPERATIONS
   New leader accepts writes
   Old leader (if recovers) becomes follower

Total Failover Time: ~5-30 seconds

Kafka’s Controller-Based Election:

KAFKA ARCHITECTURE:

┌─────────────────────────────────────────┐
│         ZooKeeper / KRaft              │ ← Metadata store
│   - Broker liveness                    │
│   - Controller election                │
│   - Partition assignments              │
└────────────────┬────────────────────────┘
                 │
         ┌───────┴────────┐
         │   Controller   │ ← ONE broker elected as controller
         │   (Broker 2)   │    Manages all leader elections
         └───────┬────────┘
                 │
    ┌────────────┼────────────┐
    │            │            │
┌───┴───┐   ┌───┴───┐   ┌───┴───┐
│Broker1│   │Broker2│   │Broker3│
└───────┘   └───────┘   └───────┘

Controller Responsibilities:
1. Monitor broker liveness
2. Elect partition leaders when failures occur
3. Update metadata in ZooKeeper/KRaft
4. Notify all brokers of leadership changes

Replication Lag and ISR Management

What is Replication Lag?

REPLICATION LAG:

Leader:       [msg0][msg1][msg2][msg3][msg4][msg5] ← Offset 5

Follower 1:   [msg0][msg1][msg2][msg3][msg4][msg5] ← Lag: 0 ✓ IN ISR

Follower 2:   [msg0][msg1][msg2][msg3]             ← Lag: 2 ⚠️ IN ISR

Follower 3:   [msg0][msg1]                         ← Lag: 4 ✕ OUT OF ISR

ISR Criteria:
- replica.lag.time.max.ms=10000 (10 seconds)
- If follower doesn't fetch within 10s → Removed from ISR

ISR Dynamics:

TIMELINE OF ISR CHANGES:

T=0: All replicas in ISR
  ISR = [Leader, Follower1, Follower2, Follower3]

T=15s: Follower3 network issue, can't fetch
  ISR = [Leader, Follower1, Follower2]
  (Follower3 removed after 10s lag)

T=30s: Follower3 recovers, catches up
  ISR = [Leader, Follower1, Follower2, Follower3]
  (Follower3 re-added after catching up)

T=45s: Leader crashes
  Election triggered
  New Leader = Follower1 (highest offset in ISR)
  ISR = [Follower1(new leader), Follower2, Follower3]

Read Patterns

Read from Leader (Strong Consistency):

All reads go to Leader:

Clients → Leader (reads)
         ↓
     [DATA v5] ← Always latest version

Pros:
✓ Strong consistency (always up-to-date)
✓ Simple (no staleness issues)

Cons:
✕ Leader bottleneck (all read traffic)
✕ Doesn't scale with more replicas

Read from Followers (Eventual Consistency):

Reads distributed across replicas:

Client A → Follower 1 [DATA v4] ← Slightly stale
Client B → Follower 2 [DATA v5] ← Up-to-date
Client C → Leader    [DATA v5] ← Always latest

Pros:
✓ Read scalability (horizontal scaling)
✓ Lower latency (geographically closer follower)

Cons:
✕ Eventual consistency (may read stale data)
✕ Monotonic read issues (read v5, then v4)

Hybrid: Read-Your-Writes Consistency:

Strategy: Track last write offset, read only from replicas >= that offset

1. Client writes to Leader → Receives offset 100
2. Client reads → Request includes "minOffset=100"
3. Router → Send to follower with offset >= 100
4. If no follower caught up → Read from Leader

Result: Client always reads its own writes ✓

Tradeoffs

Advantages:

✓ Fault tolerance (survive N-1 failures with N replicas)
✓ High availability (automatic failover)
✓ Read scalability (distribute reads to followers)
✓ Data durability (multiple copies)

Disadvantages:

✕ Write latency (replication overhead)
✕ Consistency complexity (sync vs async tradeoffs)
✕ Failover time (10-30s downtime during leader election)
✕ Split-brain risk (requires external coordinator)

Real Systems Using This

Apache Kafka

Implementation: Leader per partition, ISR-based replication
Scale: 3-5 replicas typical, 7+ for critical data
Failover: Controller-based election, ~10s failover time
Typical Setup: replication.factor=3, min.insync.replicas=2

MongoDB

Implementation: Replica sets with primary and secondaries
Scale: 3-7 replicas per replica set
Failover: Raft-based election, ~10-40s failover
Typical Setup: 3 replicas, read preference “primaryPreferred”

PostgreSQL

Implementation: Streaming replication (WAL-based)
Scale: 1 primary + N standbys
Failover: Manual or automatic (with tools like Patroni)
Typical Setup: 1 primary + 2 standbys, async replication

Redis

Implementation: Master-slave replication
Scale: 1 master + multiple slaves
Failover: Redis Sentinel for automatic failover
Typical Setup: 1 master + 2 slaves + 3 Sentinel nodes

When to Use Leader-Follower Replication

✓ Perfect Use Cases

High Availability Critical Systems

Scenario: E-commerce platform requiring 99.99% uptime
Solution: 3 replicas, auto-failover on leader crash
Result: Survive single node failure with <30s downtime

Read-Heavy Workloads

Scenario: News site with 10:1 read/write ratio
Solution: 1 leader + 5 followers, reads from followers
Result: 6x read throughput

Geo-Distributed Reads

Scenario: Global application with users in US, EU, Asia
Solution: Leader in US, followers in EU and Asia
Result: Low-latency reads for all regions

✕ When NOT to Use

Multi-Region Writes

Problem: Users in EU and Asia need to write locally
Issue: All writes go to single leader (high latency)
Alternative: Multi-leader replication or sharding

Need for Strong Consistency Reads

Problem: Bank balance must always be current
Issue: Follower reads may be stale
Alternative: Read from leader or use quorum reads

Extremely High Write Throughput

Problem: 100K writes/sec overwhelming single leader
Issue: Leader bottleneck
Alternative: Partition data across multiple leaders (sharding)

Interview Application

Common Interview Question 1

Q: “Design a highly available message queue. How would you handle broker failures?”

Strong Answer:

“I’d use leader-follower replication with in-sync replicas (Kafka’s model):

Architecture:

Each partition has replication.factor=3 (1 leader + 2 followers)

min.insync.replicas=2 (leader + at least 1 follower must ACK)

Controller broker manages leader elections

Normal Operation:

Producers write to partition leader

Leader replicates to followers in parallel

ACK to producer after min 2 replicas confirm

Consumers read from leader (or followers for lower priority)

Failure Handling:

Follower failure: Removed from ISR, writes continue with remaining ISR

Leader failure: Controller elects new leader from ISR within 10-30s

Network partition: Rely on ZooKeeper quorum to prevent split-brain

Trade-offs:

Synchronous to ISR = no data loss but slightly higher latency

Async to non-ISR replicas = fast writes but potential data loss on leader crash

This is exactly how Kafka achieves 99.99%+ availability at LinkedIn scale.”

Why this is good:

Specific configuration values
Handles multiple failure scenarios
Explains trade-offs clearly
References real-world implementation

Common Interview Question 2

Q: “What’s the difference between synchronous and asynchronous replication? When would you use each?”