Leader-Follower Replication

TL;DR

Leader-follower replication is a pattern where one node (the leader) handles all writes and replicates data to multiple follower nodes that serve reads and provide fault tolerance. If the leader fails, a follower is promoted to become the new leader. This pattern achieves high availability, fault tolerance, and read scalability.

Visual Overview

Leader-Follower Architecture

┌────────────────────────────────────────────┐
│ Leader                                     │
│ - accepts writes                           │
│ - appends to local log                     │
│ - replicates entries to followers          │
└────────────────────────────────────────────┘
                  │
                  ▼
┌────────────────────────────────────────────┐
│ Followers                                  │
│ - copy the leader log                      │
│ - serve reads when staleness is acceptable │
│ - participate in failover                  │
└────────────────────────────────────────────┘

WRITE FLOW:

1. Producer → Leader (write)
2. Leader → Append to local log
3. Leader → Replicate to Followers
4. Followers → Acknowledge replication
5. Leader → Acknowledge to Producer (after the in-sync replicas acknowledge)

READ FLOW:

- Option 1: Read from Leader (strong consistency)
- Option 2: Read from Followers (eventual consistency, higher throughput)

FAILURE SCENARIO:
┌────────────────────────────────────────────┐
│ Leader fails                               │
│ Followers detect missed heartbeats         │
│ Election chooses the most up-to-date node  │
│ New leader resumes writes                  │
└────────────────────────────────────────────┘

Core Explanation

What is Leader-Follower Replication?

Leader-follower replication (also called master-slave or primary-secondary) is a replication pattern where:

- One Leader: Handles all writes, maintains the authoritative copy
- Multiple Followers: Replicate the leader’s data, can serve reads
- Automatic Failover: A follower is promoted to leader on failure
- Consistency: The leader ensures all replicas converge to the same state

Single Node vs Leader-Follower

Single Node (No Replication):
┌─────────────┐
│  Server A   │
│  [DATA]     │
└─────────────┘
↓
Server crashes → DATA LOST ✕

Leader-Follower Replication:
┌────────────────────────────────────────────┐
│ Leader stores DATA                         │
│ Followers replicate DATA                   │
│ Leader crashes                             │
│ Follower promoted                          │
│ Result: DATA SAFE                          │
└────────────────────────────────────────────┘

Replication Modes: Synchronous vs Asynchronous

Synchronous Replication (Strong Consistency):

1. Client → Write to Leader
2. Leader → Write to local log
3. Leader → Send to Follower 1 & 2 (parallel)
4. Follower 1 → Acknowledge
5. Follower 2 → Acknowledge
6. Leader → Acknowledge to Client (AFTER followers ACK)

Timeline:
Client Write  ─┐
               │
Leader Write   ├──► [10ms]
               │
Followers ACK  ├──► [20ms] ← Wait for all followers
               │
Client ACK     └──► [25ms] ← Client waits 25ms total

Pros:
✓ No data loss (all replicas have data before ACK)
✓ Strong consistency (every acknowledged write is already on all replicas, so a read from any replica sees it)

Cons:
✕ High latency (wait for slowest follower)
✕ Availability issues (if follower down, writes block)
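
The timeline above reduces to simple arithmetic. The following is a minimal sketch under assumed timings (the function name and its parameters are illustrative, not taken from any library): because followers are contacted in parallel, the client-visible latency is the leader's local write, plus the slowest follower's acknowledgement, plus the final acknowledgement to the client.

```python
def sync_write_latency_ms(leader_write_ms, follower_ack_ms, client_ack_ms):
    """Client-visible latency when the leader waits for ALL followers.

    follower_ack_ms: per-follower time to replicate and acknowledge,
    measured from the moment the leader finishes its local write.
    Followers are contacted in parallel, so only the slowest one matters.
    """
    if not follower_ack_ms:
        raise ValueError("synchronous replication needs at least one follower")
    return leader_write_ms + max(follower_ack_ms) + client_ack_ms


# Numbers chosen to match the timeline: 10 ms leader write, the slowest
# follower acknowledges 10 ms later, 5 ms to acknowledge the client.
latency = sync_write_latency_ms(10, [8, 10], 5)
print(latency)  # 25

# One slow follower dominates the whole write path.
slow = sync_write_latency_ms(10, [8, 200], 5)
print(slow)  # 215
```

The second call illustrates the "wait for slowest follower" drawback: a single lagging replica sets the latency for every write.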

Asynchronous Replication (Eventual Consistency):

1. Client → Write to Leader
2. Leader → Write to local log
3. Leader → Acknowledge to Client (IMMEDIATE)
4. Leader → Send to Followers (async, in background)
5. Followers → Eventually apply updates

Timeline:
Client Write  ─┐
               │
Leader Write   ├──► [10ms]
               │
Client ACK     └──► [12ms] ← Client doesn't wait for followers
                     ↓
               Followers ACK (background, 50ms later)

Pros:
✓ Low latency (client doesn't wait for followers)
✓ High availability (follower failures don't block writes)

Cons:
✕ Potential data loss (if leader crashes before replication)
✕ Stale reads (followers may lag behind leader)
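
The data-loss window can be made concrete with a toy model. This is only a sketch: AsyncLeader and replicate_pending are hypothetical names, and logs are plain in-memory lists. It shows a write that the client saw acknowledged but that never reached a follower because the leader crashed before the background replication step ran.

```python
class AsyncLeader:
    """Toy leader that acknowledges writes before replicating them."""

    def __init__(self, follower_logs):
        self.log = []                      # leader's local log
        self.follower_logs = follower_logs
        self.replicated_upto = 0           # entries already shipped to followers

    def write(self, entry):
        self.log.append(entry)
        return len(self.log) - 1           # ack immediately with the entry's offset

    def replicate_pending(self):
        """Background step: ship not-yet-replicated entries to followers."""
        pending = self.log[self.replicated_upto:]
        for follower_log in self.follower_logs:
            follower_log.extend(pending)
        self.replicated_upto = len(self.log)


follower = []
leader = AsyncLeader([follower])

leader.write("a")
leader.replicate_pending()    # "a" reaches the follower
acked = leader.write("b")     # the client gets an ack for "b" ...
# ... but assume the leader crashes here, before the next background run.

lost = leader.log[len(follower):]
print(follower)  # ['a']
print(lost)      # ['b']
```

If the follower is now promoted, "b" is gone even though the client was told it succeeded.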

Hybrid: In-Sync Replicas (ISR) - Kafka’s Approach:

Config: replication.factor=4, min.insync.replicas=2, producer acks=all

Replicas:

- Leader (always in ISR)
- Follower 1 (in ISR if < 10s lag)
- Follower 2 (in ISR if < 10s lag)
- Follower 3 (NOT in ISR, lagging > 10s)

Write Flow:

1. Client → Write to Leader
2. Leader → Write + send to all followers
3. Wait for ACKs from every replica currently in the ISR (Leader + Followers 1 and 2)
4. Acknowledge to Client

Guarantees:
✓ At least 2 replicas have data before ACK (min.insync.replicas=2 is the smallest ISR that still accepts writes)
✓ Fast writes (don't wait for slow follower 3)
✕ If ISR count < min.insync.replicas, writes fail (availability hit)

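Result: a middle ground between the two modes, with synchronous durability for the in-sync replicas and no waiting on lagging ones.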
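
The ISR rules can be sketched as a single decision function. This is a simplified model, not Kafka's code: try_commit and NotEnoughReplicas are hypothetical names, and it assumes acks=all semantics, where the leader waits for every replica currently in the ISR while min.insync.replicas only sets the smallest ISR that may accept writes.

```python
class NotEnoughReplicas(Exception):
    """Raised when the ISR is smaller than min_insync_replicas."""


def try_commit(isr, acked_by, min_insync_replicas):
    """Return True once a write is committed, False if still waiting.

    isr: set of replica ids currently in sync (leader included).
    acked_by: set of replica ids that have persisted the write.
    """
    if len(isr) < min_insync_replicas:
        raise NotEnoughReplicas(f"ISR size {len(isr)} < {min_insync_replicas}")
    # Replicas outside the ISR are never waited for.
    return isr <= acked_by


# One slow follower has already dropped out of the ISR.
isr = {"leader", "follower1", "follower2"}
waiting = try_commit(isr, {"leader", "follower1"}, 2)
committed = try_commit(isr, {"leader", "follower1", "follower2"}, 2)
print(waiting)    # False
print(committed)  # True

try:
    try_commit({"leader"}, {"leader"}, 2)  # ISR shrank below the minimum
except NotEnoughReplicas as exc:
    print("rejected:", exc)
```

The last call shows the availability cost: once the ISR falls below the minimum, the write is refused rather than accepted with too few copies.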

Leader Election and Failover

When Does Failover Happen?

Failure Detection

Heartbeat Mechanism:
Followers → Send heartbeat to Leader every 1s
Leader → Send heartbeat to Followers every 1s

Failure Scenarios:

1. Leader misses heartbeats → Followers detect leader failure
2. Follower misses heartbeats → Leader removes from ISR
3. Network partition → Split-brain prevention needed
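
A timeout-based detector is one simple way to turn missed heartbeats into a failure signal. The sketch below is illustrative only: HeartbeatMonitor is a hypothetical class, the 10-second timeout is an assumed value, and time is passed in explicitly so the example stays deterministic.

```python
class HeartbeatMonitor:
    """Suspects a node after `timeout_s` seconds without a heartbeat."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}  # node id -> time of the most recent heartbeat

    def record_heartbeat(self, node_id, now):
        self.last_seen[node_id] = now

    def failed_nodes(self, now):
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


monitor = HeartbeatMonitor(timeout_s=10)
monitor.record_heartbeat("leader", now=0)
monitor.record_heartbeat("follower1", now=0)

# follower1 keeps sending a heartbeat every second; the leader goes silent.
for t in range(1, 12):
    monitor.record_heartbeat("follower1", now=t)

print(monitor.failed_nodes(now=11))  # ['leader']
```

A timeout cannot tell a crashed leader from one that is merely unreachable, which is why the network-partition case needs separate split-brain protection.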

Leader Election Process:

LEADER ELECTION (Simplified):

1. FAILURE DETECTION
 Leader fails (no heartbeat for 10s)

2. ELECTION TRIGGER
 Followers detect failure
 Start election process

3. CANDIDATE SELECTION
 Criteria for new leader:
 - ✓ In ISR (up-to-date replica)
 - ✓ Highest offset (most data)
 - ✓ Lowest broker ID (tie-breaker)

4. NEW LEADER ANNOUNCED
 Controller broadcasts new leader
 Followers connect to new leader

5. RESUME OPERATIONS
 New leader accepts writes
 Old leader (if recovers) becomes follower

Total Failover Time: ~5-30 seconds
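
The candidate-selection criteria in step 3 can be written as a short function. This sketch follows the simplified criteria listed above and is not a description of any real controller's algorithm; Replica and elect_leader are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    broker_id: int
    offset: int    # last offset present in this replica's log
    in_isr: bool


def elect_leader(replicas):
    """Pick the in-sync replica with the highest offset; lowest id breaks ties."""
    candidates = [r for r in replicas if r.in_isr]
    if not candidates:
        return None  # no in-sync candidate: promoting anyone risks data loss
    return min(candidates, key=lambda r: (-r.offset, r.broker_id))


replicas = [
    Replica(broker_id=3, offset=120, in_isr=True),
    Replica(broker_id=2, offset=120, in_isr=True),
    Replica(broker_id=4, offset=90, in_isr=False),  # lagging, excluded
]
winner = elect_leader(replicas)
print(winner.broker_id)  # 2
```

Brokers 2 and 3 tie on offset, so the lower broker ID wins. Returning None when no replica is in sync makes the choice between waiting and accepting data loss explicit instead of hiding it.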

Kafka’s Controller-Based Election:

Kafka Controller Architecture

KAFKA ARCHITECTURE:

┌─────────────────────────────────────────┐
│ ZooKeeper / KRaft                       │
│ Metadata store                          │
│ - Broker liveness                       │
│ - Controller election                   │
│ - Partition assignments                 │
└─────────────────────────────────────────┘
                     │
┌─────────────────────────────────────────┐
│ Controller (Broker 2)                   │
│ One broker manages leader elections     │
└─────────────────────────────────────────┘
                     │
       ┌─────────────┼─────────────┐
       │             │             │
   ┌───┴───┐     ┌───┴───┐     ┌───┴───┐
   │Broker1│     │Broker2│     │Broker3│
   └───────┘     └───────┘     └───────┘

Controller Responsibilities:

1. Monitor broker liveness
2. Elect partition leaders when failures occur
3. Update metadata in ZooKeeper/KRaft
4. Notify all brokers of leadership changes

Replication Lag and ISR Management

What is Replication Lag?

Leader: [msg0][msg1][msg2][msg3][msg4][msg5] ← Offset 5

Follower 1: [msg0][msg1][msg2][msg3][msg4][msg5] ← Lag: 0 ✓ IN ISR

Follower 2: [msg0][msg1][msg2][msg3] ← Lag: 2 ⚠️ IN ISR

Follower 3: [msg0][msg1] ← Lag: 4 ✕ OUT OF ISR

ISR Criteria:

- replica.lag.time.max.ms=10000 (10 seconds)
- If a follower doesn't fetch, or doesn't catch up to the leader's log end, within 10s → Removed from ISR
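
The time-based ISR criterion can be sketched as a membership check. This is a simplified model with hypothetical names (compute_isr, last_caught_up_ms); it assumes the leader records, per follower, the last time that follower was fully caught up.

```python
REPLICA_LAG_TIME_MAX_MS = 10_000  # the value used in the text above


def compute_isr(leader_id, last_caught_up_ms, now_ms,
                max_lag_ms=REPLICA_LAG_TIME_MAX_MS):
    """Return the ids that remain in the ISR at time `now_ms`.

    last_caught_up_ms maps follower id -> last time (ms) the follower
    was caught up with the leader. The leader is always a member.
    """
    isr = {leader_id}
    for follower_id, caught_up_at in last_caught_up_ms.items():
        if now_ms - caught_up_at <= max_lag_ms:
            isr.add(follower_id)
    return isr


last_caught_up = {"follower1": 14_500, "follower2": 12_000, "follower3": 4_000}
isr_now = sorted(compute_isr("leader", last_caught_up, now_ms=15_000))
print(isr_now)  # ['follower1', 'follower2', 'leader']
```

follower3 was last caught up 11 seconds ago, so it drops out. Note that membership here depends on time since last caught up, not on how many messages a follower is behind.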

ISR Dynamics:

TIMELINE OF ISR CHANGES:

T=0: All replicas in ISR
ISR = [Leader, Follower1, Follower2, Follower3]

T=15s: Follower3 network issue, can't fetch
ISR = [Leader, Follower1, Follower2]
(Follower3 removed after 10s lag)

T=30s: Follower3 recovers, catches up
ISR = [Leader, Follower1, Follower2, Follower3]
(Follower3 re-added after catching up)

T=45s: Leader crashes
Election triggered
New Leader = Follower1 (highest offset in ISR)
ISR = [Follower1(new leader), Follower2, Follower3]

Read Patterns

Read from Leader (Strong Consistency):

All reads go to Leader:

Clients → Leader (reads)
↓
[DATA v5] ← Always latest version

Pros:
✓ Strong consistency (always up-to-date)
✓ Simple (no staleness issues)

Cons:
✕ Leader bottleneck (all read traffic)
✕ Doesn't scale with more replicas

Read from Followers (Eventual Consistency):

Reads distributed across replicas:

Client A → Follower 1 [DATA v4] ← Slightly stale
Client B → Follower 2 [DATA v5] ← Up-to-date
Client C → Leader [DATA v5] ← Always latest

Pros:
✓ Read scalability (horizontal scaling)
✓ Lower latency (geographically closer follower)

Cons:
✕ Eventual consistency (may read stale data)
✕ Monotonic read issues (read v5, then v4)

Hybrid: Read-Your-Writes Consistency:

Strategy: Track last write offset, read only from replicas >= that offset

1. Client writes to Leader → Receives offset 100
2. Client reads → Request includes "minOffset=100"
3. Router → Send to follower with offset >= 100
4. If no follower caught up → Read from Leader

Result: Client always reads its own writes ✓
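
The routing rule in steps 2 to 4 can be sketched as follows. The function name, the replica ids, and the offset map are assumptions made for illustration; a real router would also need a way to learn each follower's current offset.

```python
def choose_replica(min_offset, follower_offsets, leader_id="leader"):
    """Route a read to a follower that has reached `min_offset`, else the leader.

    follower_offsets maps follower id -> highest offset it has applied.
    """
    eligible = [f for f, off in follower_offsets.items() if off >= min_offset]
    if eligible:
        # Deterministic pick for the example; a real router might load-balance.
        return min(eligible)
    return leader_id


followers = {"follower1": 98, "follower2": 100}
print(choose_replica(100, followers))  # follower2
print(choose_replica(101, followers))  # leader
print(choose_replica(50, followers))   # follower1
```

With minOffset=100 only follower2 qualifies. With minOffset=101 no follower has caught up, so the read falls back to the leader.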

Tradeoffs

Advantages:

✓ Fault tolerance (with N fully in-sync replicas, data survives up to N-1 failures; quorum-based election additionally needs a majority of nodes alive)
✓ High availability (automatic failover)
✓ Read scalability (distribute reads to followers)
✓ Data durability (multiple copies)

Disadvantages:

✕ Write latency (replication overhead)
✕ Consistency complexity (sync vs async tradeoffs)
✕ Failover time (10-30s downtime during leader election)
✕ Split-brain risk (requires an external coordinator or quorum-based consensus to fence the old leader)

Real Systems Using This

Apache Kafka

Implementation: Leader per partition, ISR-based replication
Scale: 3-5 replicas typical, 7+ for critical data
Failover: Controller-based election, ~10s failover time
Typical Setup: replication.factor=3, min.insync.replicas=2

MongoDB

Implementation: Replica sets with primary and secondaries
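Scale: 3-7 voting members typical (a replica set allows up to 50 members, at most 7 of them voting)
Failover: Raft-like election protocol; MongoDB's documentation cites a median of about 12s with default settings, longer under network issues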
Typical Setup: 3 replicas, read preference “primaryPreferred”

PostgreSQL

Implementation: Streaming replication (WAL-based)
Scale: 1 primary + N standbys
Failover: Manual or automatic (with tools like Patroni)
Typical Setup: 1 primary + 2 standbys, async replication

Redis

Implementation: Master-slave replication
Scale: 1 master + multiple slaves
Failover: Redis Sentinel for automatic failover
Typical Setup: 1 master + 2 slaves + 3 Sentinel nodes

When to Use Leader-Follower Replication

✓ Perfect Use Cases

Use Case	Scenario	Solution	Result
High Availability Critical Systems	E-commerce platform requiring 99.99% uptime	3 replicas, auto-failover on leader crash	Survive single node failure with under 30s downtime
Read-Heavy Workloads	News site with 10:1 read/write ratio	1 leader + 5 followers, reads from followers	Up to ~6x read capacity (5 followers plus the leader)
Geo-Distributed Reads	Global application with users in US, EU, Asia	Leader in US, followers in EU and Asia	Low-latency reads for all regions

✕ When NOT to Use

Anti-Pattern	Problem	Issue	Alternative
Multi-Region Writes	Users in EU and Asia need to write locally	All writes go to single leader (high latency)	Multi-leader replication or sharding
Need for Strong Consistency Reads	Bank balance must always be current	Follower reads may be stale	Read from leader or use quorum reads
Extremely High Write Throughput	100K writes/sec overwhelming single leader	Leader bottleneck	Partition data across multiple leaders (sharding)

Interview Application

Common Interview Question 1

Q: “Design a highly available message queue. How would you handle broker failures?”

Strong Answer:

“I’d use leader-follower replication with in-sync replicas (Kafka’s model):

Architecture:

- Each partition has replication.factor=3 (1 leader + 2 followers)
- min.insync.replicas=2 (leader + at least 1 follower must ACK)
- Controller broker manages leader elections

Normal Operation:

- Producers write to the partition leader
- Leader replicates to followers in parallel
- ACK to producer once the in-sync replicas confirm (at least 2)
- Consumers read from the leader (or from followers for lower-priority reads)

Failure Handling:

- Follower failure: Removed from ISR, writes continue with remaining ISR
- Leader failure: Controller elects new leader from ISR within 10-30s
- Network partition: Rely on the ZooKeeper/KRaft quorum to prevent split-brain

Trade-offs:

- Synchronous to ISR = no loss of acknowledged writes, but slightly higher latency
- Not waiting for non-ISR replicas = fast writes, but fewer up-to-date copies if the leader crashes

This is the model Kafka uses to stay available through broker failures in large production deployments such as LinkedIn's.”

Why this is good:

- Specific configuration values
- Handles multiple failure scenarios
- Explains trade-offs clearly
- References real-world implementation

Common Interview Question 2

Q: “What’s the difference between synchronous and asynchronous replication? When would you use each?”