How distributed systems achieve fault tolerance and high availability by replicating data from a leader node to multiple follower nodes
85% of distributed systems interviews
Powers systems such as Kafka, MongoDB, and PostgreSQL
99.99%+ uptime
7+ trillion messages/day (LinkedIn)
TL;DR
Leader-follower replication is a pattern where one node (the leader) handles all writes and replicates data to multiple follower nodes that serve reads and provide fault tolerance. If the leader fails, a follower is promoted to become the new leader. This pattern achieves high availability, fault tolerance, and read scalability.
Visual Overview
LEADER-FOLLOWER ARCHITECTURE:
                ┌──────────────┐
                │    Leader    │
                │ (Partition 0)│
                └───────┬──────┘
                        │
            Replication (async/sync)
                        │
        ┌───────────────┼───────────────┐
        │               │               │
        ▼               ▼               ▼
  ┌──────────┐    ┌──────────┐    ┌──────────┐
  │Follower 1│    │Follower 2│    │Follower 3│
  │ (Replica)│    │ (Replica)│    │ (Replica)│
  └──────────┘    └──────────┘    └──────────┘
WRITE FLOW:
1. Producer → Leader (write)
2. Leader → Append to local log
3. Leader → Replicate to Followers
4. Followers → Acknowledge replication
5. Leader → Acknowledge to Producer (after in-sync replicas ACK)
READ FLOW:
- Option 1: Read from Leader (strong consistency)
- Option 2: Read from Followers (eventual consistency, higher throughput)
FAILURE SCENARIO:
Leader Fails:
┌──────────────┐
│   Leader ❌   │
└──────┬───────┘
       │
 Election Process
       │
┌──────┴───────┐
│  New Leader  │  ← Follower 1 promoted
│ (Follower 1) │
└──────────────┘
Core Explanation
What is Leader-Follower Replication?
Leader-follower replication (also called master-slave or primary-secondary) is a replication pattern where:
- One Leader: Handles all writes, maintains authoritative copy
- Multiple Followers: Replicate the leader's data, can serve reads
- Automatic Failover: Follower promoted to leader on failure
- Consistency: Leader ensures all replicas converge to same state
Single Node (No Replication):
┌─────────────┐
│  Server A   │
│   [DATA]    │
└─────────────┘
      ✗
Server crashes → DATA LOST ❌
Leader-Follower Replication:
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Leader    │ ──▶ │ Follower 1  │ ──▶ │ Follower 2  │
│   [DATA]    │     │   [DATA]    │     │   [DATA]    │
└─────────────┘     └─────────────┘     └─────────────┘
      ✗
Leader crashes → Follower promoted → DATA SAFE ✅
Replication Modes: Synchronous vs Asynchronous
Synchronous Replication (Strong Consistency):
SYNCHRONOUS REPLICATION:
1. Client → Write to Leader
2. Leader → Write to local log
3. Leader → Send to Follower 1 & 2 (parallel)
4. Follower 1 → Acknowledge
5. Follower 2 → Acknowledge
6. Leader → Acknowledge to Client (AFTER followers ACK)
Timeline:
Client Write   ──▶
Leader Write   ──▶ [10ms]
Followers ACK  ──▶ [20ms]  ← Wait for all followers
Client ACK     ──▶ [25ms]  ← Client waits 25ms total
Pros:
✅ No data loss (all replicas have data before ACK)
✅ Strong consistency (read from any replica is up-to-date)
Cons:
❌ High latency (wait for slowest follower)
❌ Availability issues (if a follower is down, writes block)
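To make the ordering concrete, here is a minimal sketch of synchronous replication in Python (toy in-memory classes, not any real database's API): the leader appends locally, waits for every follower to confirm, and only then acknowledges the client.

```python
class Follower:
    def __init__(self, name):
        self.name = name
        self.log = []

    def replicate(self, entry):
        # In a real system this is a network round trip; here we just apply and confirm.
        self.log.append(entry)
        return True

class SyncLeader:
    def __init__(self, followers):
        self.log = []
        self.followers = followers

    def write(self, entry):
        self.log.append(entry)                                # 1. append to the local log
        acks = [f.replicate(entry) for f in self.followers]   # 2. wait for EVERY follower
        if not all(acks):
            raise RuntimeError("replication failed; write not acknowledged")
        return "ACK"                                          # 3. only now answer the client

leader = SyncLeader([Follower("f1"), Follower("f2")])
print(leader.write({"key": "order-42", "value": "created"}))  # ACK only after both followers confirmed
```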
Asynchronous Replication (Eventual Consistency):
ASYNCHRONOUS REPLICATION:
1. Client → Write to Leader
2. Leader → Write to local log
3. Leader → Acknowledge to Client (IMMEDIATE)
4. Leader → Send to Followers (async, in background)
5. Followers → Eventually apply updates
Timeline:
Client Write   ──▶
Leader Write   ──▶ [10ms]
Client ACK     ──▶ [12ms]  ← Client doesn't wait for followers
Followers ACK  ──▶ (background, ~50ms later)
Pros:
✅ Low latency (client doesn't wait for followers)
✅ High availability (follower failures don't block writes)
Cons:
❌ Potential data loss (if leader crashes before replication)
❌ Stale reads (followers may lag behind leader)
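The asynchronous variant acknowledges before followers have the data. A minimal sketch under the same toy model, with a background thread standing in for the replication stream:

```python
import queue
import threading
import time

class Follower:
    def __init__(self, name):
        self.name = name
        self.log = []

    def replicate(self, entry):
        self.log.append(entry)

class AsyncLeader:
    def __init__(self, followers):
        self.log = []
        self.followers = followers
        self.pending = queue.Queue()
        threading.Thread(target=self._replicate_loop, daemon=True).start()

    def write(self, entry):
        self.log.append(entry)      # 1. append to the local log
        self.pending.put(entry)     # 2. hand off to background replication
        return "ACK"                # 3. acknowledge immediately -> data-loss window if the leader dies now

    def _replicate_loop(self):
        while True:
            entry = self.pending.get()
            for f in self.followers:
                f.replicate(entry)  # followers catch up eventually

leader = AsyncLeader([Follower("f1"), Follower("f2")])
print(leader.write({"key": "gps-ping", "value": "lat=48.1,lon=11.6"}))
time.sleep(0.1)                     # give the background thread a moment
print([len(f.log) for f in leader.followers])
```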
Hybrid: In-Sync Replicas (ISR) - Kafka's Approach:
IN-SYNC REPLICAS (ISR):
Config: replication.factor=3, min.insync.replicas=2
Replicas:
- Leader (always in ISR)
- Follower 1 (in ISR if < 10s lag)
- Follower 2 (in ISR if < 10s lag)
- Follower 3 (NOT in ISR, lagging > 10s)
Write Flow:
1. Client → Write to Leader
2. Leader → Write + send to all followers
3. Wait for min.insync.replicas=2 ACKs (Leader + 1 follower)
4. Acknowledge to Client
Guarantees:
✅ At least 2 replicas have data before ACK
✅ Fast writes (don't wait for slow follower 3)
⚠️ If ISR count < min.insync.replicas, writes fail (availability hit)
Best of both worlds!
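In Kafka this hybrid is expressed through producer acks plus the topic's min.insync.replicas setting. Below is a sketch using the confluent-kafka Python client; the broker address and topic name are placeholders, and the topic is assumed to have been created with replication.factor=3 and min.insync.replicas=2.

```python
from confluent_kafka import Producer

# acks='all' -> the leader waits for all replicas currently in the ISR before acknowledging.
# Combined with min.insync.replicas=2 on the topic, writes fail fast if the ISR shrinks below 2.
producer = Producer({
    "bootstrap.servers": "localhost:9092",  # placeholder broker address
    "acks": "all",
    "enable.idempotence": True,             # avoid duplicates when retrying
})

def on_delivery(err, msg):
    if err is not None:
        # e.g. NOT_ENOUGH_REPLICAS when fewer than min.insync.replicas are available
        print(f"delivery failed: {err}")
    else:
        print(f"committed to {msg.topic()}[{msg.partition()}] @ offset {msg.offset()}")

producer.produce("orders", key="order-42", value=b"created", callback=on_delivery)
producer.flush()
```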
Leader Election and Failover
When Does Failover Happen?
FAILURE DETECTION:
Heartbeat Mechanism:
Followers → Send heartbeat to Leader every 1s
Leader → Send heartbeat to Followers every 1s
Failure Scenarios:
1. Leader stops sending heartbeats → Followers detect leader failure
2. Follower stops sending heartbeats → Leader removes it from ISR
3. Network partition → Split-brain prevention needed
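A toy failure detector along these lines (the node names and the 10-second timeout are illustrative, not any system's defaults):

```python
import time

HEARTBEAT_TIMEOUT_S = 10.0

class FailureDetector:
    def __init__(self, nodes):
        # Remember when we last heard from each node.
        self.last_seen = {node: time.monotonic() for node in nodes}

    def record_heartbeat(self, node):
        self.last_seen[node] = time.monotonic()

    def suspected_down(self):
        now = time.monotonic()
        return [n for n, t in self.last_seen.items() if now - t > HEARTBEAT_TIMEOUT_S]

detector = FailureDetector(["leader", "follower1", "follower2"])
detector.record_heartbeat("follower1")
# If "leader" shows up in suspected_down(), followers trigger an election;
# if a follower shows up, the leader drops it from the ISR.
print(detector.suspected_down())  # [] right after startup
```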
Leader Election Process:
LEADER ELECTION (Simplified):
1. FAILURE DETECTION
Leader fails (no heartbeat for 10s)
2. ELECTION TRIGGER
Followers detect failure
Start election process
3. CANDIDATE SELECTION
Criteria for new leader:
- ✅ In ISR (up-to-date replica)
- ✅ Highest offset (most data)
- ✅ Lowest broker ID (tie-breaker)
4. NEW LEADER ANNOUNCED
Controller broadcasts new leader
Followers connect to new leader
5. RESUME OPERATIONS
New leader accepts writes
Old leader (if recovers) becomes follower
Total Failover Time: ~5-30 seconds
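The candidate-selection rule above translates almost directly into code. A sketch assuming each replica reports whether it is in the ISR, its highest offset, and its broker ID:

```python
from dataclasses import dataclass

@dataclass
class Replica:
    broker_id: int
    offset: int      # highest replicated log position
    in_isr: bool

def elect_leader(replicas):
    # Only in-sync replicas are eligible: they are guaranteed to hold all committed data.
    candidates = [r for r in replicas if r.in_isr]
    if not candidates:
        raise RuntimeError("no in-sync replica available (an unclean election would risk data loss)")
    # Prefer the most caught-up replica; break ties with the lowest broker ID.
    return min(candidates, key=lambda r: (-r.offset, r.broker_id))

replicas = [Replica(1, 1042, True), Replica(2, 1042, True), Replica(3, 980, False)]
print(elect_leader(replicas).broker_id)  # -> 1 (offsets tie, lowest broker ID wins)
```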
Kafka's Controller-Based Election:
KAFKA ARCHITECTURE:
┌──────────────────────────────────────────┐
│           ZooKeeper / KRaft              │  ← Metadata store
│  - Broker liveness                       │
│  - Controller election                   │
│  - Partition assignments                 │
└────────────────────┬─────────────────────┘
                     │
            ┌────────┴────────┐
            │   Controller    │  ← ONE broker elected as controller
            │   (Broker 2)    │    Manages all leader elections
            └────────┬────────┘
                     │
        ┌────────────┼────────────┐
        │            │            │
    ┌───┴───┐    ┌───┴───┐    ┌───┴───┐
    │Broker1│    │Broker2│    │Broker3│
    └───────┘    └───────┘    └───────┘
Controller Responsibilities:
1. Monitor broker liveness
2. Elect partition leaders when failures occur
3. Update metadata in ZooKeeper/KRaft
4. Notify all brokers of leadership changes
Replication Lag and ISR Management
What is Replication Lag?
REPLICATION LAG:
Leader:      [msg0][msg1][msg2][msg3][msg4][msg5]  ← Offset 5
Follower 1:  [msg0][msg1][msg2][msg3][msg4][msg5]  ← Lag: 0  ✅ IN ISR
Follower 2:  [msg0][msg1][msg2][msg3]              ← Lag: 2  ⚠️ IN ISR
Follower 3:  [msg0][msg1]                          ← Lag: 4  ❌ OUT OF ISR
ISR Criteria:
- replica.lag.time.max.ms=10000 (10 seconds)
- If a follower doesn't fetch within 10s → Removed from ISR
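Since membership is judged by time since the last full catch-up rather than by message count, the check boils down to a timestamp comparison. A small sketch, with replica.lag.time.max.ms mirrored as a constant:

```python
import time

REPLICA_LAG_TIME_MAX_MS = 10_000  # mirrors replica.lag.time.max.ms

def current_isr(leader_id, last_caught_up_ms, now_ms):
    """last_caught_up_ms maps follower id -> last time (ms) it was fully caught up."""
    isr = {leader_id}  # the leader is always in the ISR
    for follower, caught_up_at in last_caught_up_ms.items():
        if now_ms - caught_up_at <= REPLICA_LAG_TIME_MAX_MS:
            isr.add(follower)
    return isr

now_ms = time.time() * 1000
print(current_isr("leader", {
    "follower1": now_ms - 500,     # caught up recently      -> in ISR
    "follower2": now_ms - 4_000,   # lagging but under 10s   -> in ISR
    "follower3": now_ms - 30_000,  # stalled for 30s         -> dropped from ISR
}, now_ms))
```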
ISR Dynamics:
TIMELINE OF ISR CHANGES:
T=0: All replicas in ISR
ISR = [Leader, Follower1, Follower2, Follower3]
T=15s: Follower3 network issue, can't fetch
ISR = [Leader, Follower1, Follower2]
(Follower3 removed after 10s lag)
T=30s: Follower3 recovers, catches up
ISR = [Leader, Follower1, Follower2, Follower3]
(Follower3 re-added after catching up)
T=45s: Leader crashes
Election triggered
New Leader = Follower1 (highest offset in ISR)
ISR = [Follower1(new leader), Follower2, Follower3]
Read Patterns
Read from Leader (Strong Consistency):
All reads go to Leader:
Clients → Leader (reads)
              ↓
          [DATA v5]  ← Always latest version
Pros:
✅ Strong consistency (always up-to-date)
✅ Simple (no staleness issues)
Cons:
❌ Leader bottleneck (all read traffic)
❌ Doesn't scale with more replicas
Read from Followers (Eventual Consistency):
Reads distributed across replicas:
Client A → Follower 1  [DATA v4]  ← Slightly stale
Client B → Follower 2  [DATA v5]  ← Up-to-date
Client C → Leader      [DATA v5]  ← Always latest
Pros:
✅ Read scalability (horizontal scaling)
✅ Lower latency (geographically closer follower)
Cons:
❌ Eventual consistency (may read stale data)
❌ Monotonic read issues (read v5, then v4)
Hybrid: Read-Your-Writes Consistency:
Strategy: Track last write offset, read only from replicas >= that offset
1. Client writes to Leader → Receives offset 100
2. Client reads → Request includes "minOffset=100"
3. Router → Send to a follower with offset >= 100
4. If no follower has caught up → Read from Leader
Result: Client always reads its own writes ✅
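A sketch of that routing decision, assuming each replica exposes the highest offset it has applied (names and offsets are made up for illustration):

```python
def route_read(min_offset, leader, followers):
    """Pick a replica that has applied at least min_offset (the client's last write)."""
    # followers: list of (name, applied_offset) pairs; prefer any follower that has caught up.
    for name, applied_offset in followers:
        if applied_offset >= min_offset:
            return name
    return leader  # nobody has caught up yet: fall back to the leader

followers = [("follower1", 97), ("follower2", 100)]
print(route_read(100, "leader", followers))  # -> follower2 (has the client's write)
print(route_read(103, "leader", followers))  # -> leader (no follower at offset 103 yet)
```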
Tradeoffs
Advantages:
- ✅ Fault tolerance (survive N-1 failures with N replicas)
- ✅ High availability (automatic failover)
- ✅ Read scalability (distribute reads to followers)
- ✅ Data durability (multiple copies)
Disadvantages:
- ❌ Write latency (replication overhead)
- ❌ Consistency complexity (sync vs async tradeoffs)
- ❌ Failover time (10-30s downtime during leader election)
- ❌ Split-brain risk (requires external coordinator)
Real Systems Using This
Apache Kafka
- Implementation: Leader per partition, ISR-based replication
- Scale: 3-5 replicas typical, 7+ for critical data
- Failover: Controller-based election, ~10s failover time
- Typical Setup: replication.factor=3, min.insync.replicas=2
MongoDB
- Implementation: Replica sets with primary and secondaries
- Scale: 3-7 replicas per replica set
- Failover: Raft-based election, ~10-40s failover
- Typical Setup: 3 replicas, read preference 'primaryPreferred'
PostgreSQL
- Implementation: Streaming replication (WAL-based)
- Scale: 1 primary + N standbys
- Failover: Manual or automatic (with tools like Patroni)
- Typical Setup: 1 primary + 2 standbys, async replication
Redis
- Implementation: Master-slave replication
- Scale: 1 master + multiple slaves
- Failover: Redis Sentinel for automatic failover
- Typical Setup: 1 master + 2 slaves + 3 Sentinel nodes
When to Use Leader-Follower Replication
✅ Perfect Use Cases
High Availability Critical Systems
Scenario: E-commerce platform requiring 99.99% uptime
Solution: 3 replicas, auto-failover on leader crash
Result: Survive single node failure with <30s downtime
Read-Heavy Workloads
Scenario: News site with 10:1 read/write ratio
Solution: 1 leader + 5 followers, reads from followers
Result: 6x read throughput
Geo-Distributed Reads
Scenario: Global application with users in US, EU, Asia
Solution: Leader in US, followers in EU and Asia
Result: Low-latency reads for all regions
❌ When NOT to Use
Multi-Region Writes
Problem: Users in EU and Asia need to write locally
Issue: All writes go to single leader (high latency)
Alternative: Multi-leader replication or sharding
Need for Strong Consistency Reads
Problem: Bank balance must always be current
Issue: Follower reads may be stale
Alternative: Read from leader or use quorum reads
Extremely High Write Throughput
Problem: 100K writes/sec overwhelming single leader
Issue: Leader bottleneck
Alternative: Partition data across multiple leaders (sharding)
Interview Application
Common Interview Question 1
Q: "Design a highly available message queue. How would you handle broker failures?"
Strong Answer:
"I'd use leader-follower replication with in-sync replicas (Kafka's model):
Architecture:
- Each partition has replication.factor=3 (1 leader + 2 followers)
- min.insync.replicas=2 (leader + at least 1 follower must ACK)
- Controller broker manages leader elections
Normal Operation:
- Producers write to partition leader
- Leader replicates to followers in parallel
- ACK to producer after min 2 replicas confirm
- Consumers read from the leader (or from followers when some read staleness is acceptable)
Failure Handling:
- Follower failure: Removed from ISR, writes continue with remaining ISR
- Leader failure: Controller elects new leader from ISR within 10-30s
- Network partition: Rely on ZooKeeper quorum to prevent split-brain
Trade-offs:
- Synchronous to ISR = no data loss but slightly higher latency
- Async to non-ISR replicas = fast writes but potential data loss on leader crash
This is exactly how Kafka achieves 99.99%+ availability at LinkedIn scale."
Why this is good:
- Specific configuration values
- Handles multiple failure scenarios
- Explains trade-offs clearly
- References real-world implementation
Common Interview Question 2
Q: "What's the difference between synchronous and asynchronous replication? When would you use each?"
Strong Answer:
"Synchronous Replication:
- Leader waits for follower ACKs before responding to client
- Guarantees: No data loss (all replicas have data)
- Trade-off: Higher latency, lower availability (blocked if follower down)
- Use case: Financial transactions, critical metadata
Asynchronous Replication:
- Leader responds immediately, replicates in background
- Guarantees: Low latency, high availability
- Trade-off: Potential data loss if leader crashes before replication
- Use case: Analytics logs, user activity streams
In Production: Most systems use a hybrid like Kafkaβs ISR:
- Synchronous to a quorum (e.g., 2 out of 3 replicas)
- Asynchronous to remaining replicas
- Dynamically remove slow replicas from ISR to maintain availability
- Result: Balance between durability and performance
For example, at Uber, we'd use sync replication for payment events (can't lose money) but async for GPS location updates (can tolerate occasional loss)."
Why this is good:
- Clear comparison of both approaches
- Explains when to use each
- Mentions hybrid approach (real-world)
- Concrete examples for each use case
Red Flags to Avoid
- ❌ Not understanding the difference between sync and async replication
- ❌ Ignoring split-brain scenarios and how to prevent them
- ❌ Thinking failover is instant (it typically takes 10-30s)
- ❌ Not considering replication lag impact on read consistency
Quick Self-Check
Before moving on, can you:
- Explain leader-follower replication in 60 seconds?
- Draw the write and read flow diagrams?
- Compare synchronous vs asynchronous replication?
- Explain how leader election works?
- Describe ISR (in-sync replicas) concept?
- Identify when to use vs NOT use this pattern?
Related Content
Prerequisites
None - this is a foundational distributed systems pattern
Related Concepts
- Consensus - How nodes agree on leader
- Quorum - Minimum replicas for operation
- Failover - Automatic recovery process
- Eventual Consistency - Async replication guarantee
Used In Systems
- Distributed Database - PostgreSQL replication
- Message Queue - Kafka replication
Explained In Detail
- Kafka Architecture - Replication and ISR (45 minutes)
- Deep dive into partition leadership, ISR management, and controller election
Next Recommended: Consensus - Learn how distributed systems agree on a single leader