Log-Based Storage

How distributed systems use append-only logs for durable, ordered, and high-throughput data storage with time-travel and replay capabilities

TL;DR

Log-based storage uses an append-only, immutable sequence of records (a log) as the primary data structure. Records are written sequentially to the end of the log and never modified in place. This design enables high write throughput, simple crash recovery, and built-in time-travel, and it forms the foundation of systems such as Kafka, database write-ahead logs, and distributed consensus protocols.

Visual Overview

[Figure: Log-Based Storage Overview]

Core Explanation

What is Log-Based Storage?

A log in distributed systems is an append-only, totally-ordered sequence of records. Think of it as an immutable array where:

  • Append-only: Records are added to the end, never inserted in the middle
  • Immutable: Once written, records never change
  • Ordered: Each record has a unique, sequential offset (0, 1, 2, …)
  • Durable: Records are persisted to disk before the write is acknowledged
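
A minimal sketch of these properties, assuming a single local segment file; the AppendOnlyLog class and its length-prefixed record framing are illustrative, not any particular system's on-disk format:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative append-only log: records only ever go at the end of the file,
// and each append returns the record's offset (its position in the sequence).
public class AppendOnlyLog {
  private final FileChannel channel;
  private long nextOffset = 0;

  public AppendOnlyLog(Path file) throws IOException {
    this.channel = FileChannel.open(file,
        StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
  }

  // Append one record and return its offset; fsync before acknowledging (durability).
  public long append(byte[] record) throws IOException {
    ByteBuffer buf = ByteBuffer.allocate(4 + record.length);
    buf.putInt(record.length);   // length prefix so records can be framed on read
    buf.put(record);
    buf.flip();
    while (buf.hasRemaining()) {
      channel.write(buf);        // sequential write, always at the current end of the file
    }
    channel.force(false);        // flush to disk before the write is acknowledged
    return nextOffset++;
  }
}
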
[Figure: Traditional vs Log-Based Storage]

Why Sequential Writes Are Fast

Disk Performance Characteristics:

[Figure: Disk Performance Characteristics]

How Log Storage Achieves Sequential Writes:

[Figure: B-Tree vs Log Storage]
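
The gap can be illustrated with a rough micro-benchmark sketch; treat it as an illustration only, since the exact numbers depend on the drive, the filesystem, and how much the OS page cache absorbs (the difference is largest on spinning disks and with frequent flushes):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ThreadLocalRandom;

public class SequentialVsRandomWrites {
  static final int RECORDS = 100_000;
  static final byte[] PAYLOAD = new byte[512];

  public static void main(String[] args) throws IOException {
    Path seqFile = Files.createTempFile("seq", ".log");
    Path rndFile = Files.createTempFile("rnd", ".dat");

    // Log-style: every write lands at the current end of the file.
    long t1 = System.nanoTime();
    try (FileChannel ch = FileChannel.open(seqFile,
        StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
      for (int i = 0; i < RECORDS; i++) {
        ch.write(ByteBuffer.wrap(PAYLOAD));
      }
      ch.force(false);
    }
    long seqMs = (System.nanoTime() - t1) / 1_000_000;

    // Update-in-place style: every write lands at an arbitrary position.
    long fileSize = (long) RECORDS * PAYLOAD.length;
    long t2 = System.nanoTime();
    try (FileChannel ch = FileChannel.open(rndFile, StandardOpenOption.WRITE)) {
      for (int i = 0; i < RECORDS; i++) {
        long pos = ThreadLocalRandom.current().nextLong(fileSize - PAYLOAD.length);
        ch.write(ByteBuffer.wrap(PAYLOAD), pos);
      }
      ch.force(false);
    }
    long rndMs = (System.nanoTime() - t2) / 1_000_000;

    System.out.printf("sequential: %d ms, random: %d ms%n", seqMs, rndMs);
  }
}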

Log Segments and Retention

Why Segments?

Instead of one giant log file, logs are split into segments:

[Figure: Why Segments]

Segment Management:

// Segment configuration (Kafka-style properties)
segment.bytes = 1073741824   // roll to a new segment at 1 GB...
segment.ms = 604800000       // ...or after 7 days, whichever comes first

// Retention strategies
log.retention.bytes = 107374182400  // keep at most ~100 GB per partition...
log.retention.ms = 604800000        // ...or 7 days of data, whichever limit is hit first

// Deletion process (pseudocode, runs periodically, oldest segments first)
for (Segment segment : oldestFirst(segments)) {
  if (segment.isExpired() || totalSize > maxRetentionBytes) {
    totalSize -= segment.sizeInBytes();
    segment.delete();  // whole-file deletion, O(1), no per-record work
  }
}

Indexing for Fast Reads

The Challenge:

[Figure: Indexing Challenge]

The Solution: Sparse Index

[Figure: Sparse Index Structure]
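
A minimal sketch of a sparse index, assuming one in-memory index per segment; the class name, method names, and the every-few-kilobytes indexing interval are illustrative assumptions:

import java.util.Map;
import java.util.TreeMap;

// Illustrative sparse index: only some records get an entry mapping
// logical offset -> byte position in the segment file.
public class SparseOffsetIndex {
  private final TreeMap<Long, Long> offsetToPosition = new TreeMap<>();

  // Called occasionally while appending, e.g. once every ~4 KB of log data.
  public void maybeAddEntry(long offset, long filePosition) {
    offsetToPosition.put(offset, filePosition);
  }

  // To locate a target offset: jump to the nearest indexed entry at or before it,
  // then scan forward sequentially through the segment until the offset is found.
  public long startScanPosition(long targetOffset) {
    Map.Entry<Long, Long> floor = offsetToPosition.floorEntry(targetOffset);
    return floor != null ? floor.getValue() : 0L;  // fall back to start of segment
  }
}

Kafka's per-segment .index files follow the same idea: a bounded set of offset-to-position entries that is binary-searched, followed by a short sequential scan to reach the exact record.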

Time-Travel and Replay

Core Feature of Log Storage:

[Figure: Time-Travel and Replay]

Production Use Case:

[Figure: Bug Recovery with Log Storage]
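
A minimal replay sketch using the Kafka consumer API; the broker address, the "events" topic, the consumer group name, and the 30-day window are placeholder assumptions:

import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

public class ReplayFromTimestamp {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");   // assumption: local broker
    props.put("group.id", "replay-fixed-pipeline");     // fresh group = independent offsets
    props.put("key.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
      TopicPartition tp = new TopicPartition("events", 0);  // "events" is a placeholder topic
      consumer.assign(List.of(tp));

      // Ask the broker which offset corresponds to "30 days ago".
      long thirtyDaysAgo = Instant.now().minus(Duration.ofDays(30)).toEpochMilli();
      Map<TopicPartition, OffsetAndTimestamp> startOffsets =
          consumer.offsetsForTimes(Map.of(tp, thirtyDaysAgo));

      OffsetAndTimestamp start = startOffsets.get(tp);
      if (start != null) {
        consumer.seek(tp, start.offset());   // time-travel: rewind the read position
      }

      while (true) {   // simplified: loop until the replay has caught up
        for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
          // reprocess the record with the corrected logic
        }
      }
    }
  }
}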

Log Cleanup: Retention and Compaction

Time-Based Retention:

[Figure: Time-Based Retention]

Size-Based Retention:

[Figure: Size-Based Retention]

Log Compaction (Key-Based):

[Figure: Log Compaction]
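
A minimal sketch of the key-based idea, keeping only the latest record per key; the Record type and the single-pass approach are illustrative simplifications of what a real log cleaner does:

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative compaction pass: scan the log in offset order and keep only the
// most recent record for each key. A record with a null value is a "tombstone"
// that marks the key as deleted.
public class LogCompactor {
  record Record(String key, String value, long offset) {}

  static List<Record> compact(List<Record> log) {
    Map<String, Record> latestByKey = new LinkedHashMap<>();
    for (Record r : log) {
      if (r.value() == null) {
        latestByKey.remove(r.key());    // tombstone: drop earlier versions of this key
      } else {
        latestByKey.put(r.key(), r);    // a later record overwrites earlier ones
      }
    }
    // The compacted log keeps the latest value per key, still in offset order.
    List<Record> compacted = new ArrayList<>(latestByKey.values());
    compacted.sort((a, b) -> Long.compare(a.offset(), b.offset()));
    return compacted;
  }
}

Real cleaners (Kafka's log compaction, for example) work segment by segment and keep tombstones around for a configurable period so consumers get a chance to observe the delete.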

Tradeoffs

Advantages:

  • ✓ Extremely high write throughput (sequential I/O)
  • ✓ Simple crash recovery (just find last valid offset)
  • ✓ Built-in audit trail and time-travel
  • ✓ Easy replication (just copy log segments)
  • ✓ Immutability eliminates update anomalies

Disadvantages:

  • ✕ Slow point reads without good indexing
  • ✕ Space amplification (old versions kept until deletion)
  • ✕ Range queries require scanning
  • ✕ Compaction overhead for key-based retention
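
To make the "simple crash recovery" advantage above concrete, here is a sketch of a recovery scan; it assumes an illustrative record format with a 4-byte length prefix and a 4-byte CRC32, which is not any specific system's layout:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.CRC32;

// Illustrative recovery scan: walk length-prefixed, checksummed records from the
// start of the segment; stop at the first record that is truncated or corrupt.
// Everything before that point is the valid log.
public class RecoveryScan {
  public static long findLastValidPosition(Path segment) throws IOException {
    try (FileChannel ch = FileChannel.open(segment, StandardOpenOption.READ)) {
      long pos = 0;
      long size = ch.size();
      ByteBuffer header = ByteBuffer.allocate(8);   // 4-byte length + 4-byte CRC32
      while (pos + 8 <= size) {
        header.clear();
        ch.read(header, pos);
        header.flip();
        int len = header.getInt();
        long storedCrc = Integer.toUnsignedLong(header.getInt());
        if (len < 0 || pos + 8 + len > size) break;   // truncated tail from a crash
        ByteBuffer payload = ByteBuffer.allocate(len);
        ch.read(payload, pos + 8);
        CRC32 crc = new CRC32();
        crc.update(payload.array(), 0, len);
        if (crc.getValue() != storedCrc) break;       // corrupt record, stop here
        pos += 8 + len;                               // record is valid, keep going
      }
      return pos;   // caller truncates the file to this position and resumes appending
    }
  }
}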

Real Systems Using This

Apache Kafka

  • Implementation: Partitioned, replicated logs as primary abstraction
  • Scale: 7+ trillion messages/day at LinkedIn
  • Segments: 1 GB segments, time or size-based retention
  • Typical Setup: 7 day retention, 100+ partitions

Database Write-Ahead Logs (WAL)

  • PostgreSQL WAL: All changes written to log before data files
  • MySQL Binlog: Replication and point-in-time recovery
  • Redis AOF: Append-only file for durability
  • Purpose: Crash recovery, replication, backups
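
A minimal sketch of the write-ahead idea behind all of these, assuming an in-memory key-value map backed by a local log file; the class and the tab-separated record format are illustrative, and a production WAL would also fsync each append and checksum records:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

// Illustrative write-ahead log: every change is appended to the log BEFORE the
// live data structure is updated, so a crash can be recovered by replaying the log.
public class WalBackedStore {
  private final Path walFile;
  private final Map<String, String> data = new HashMap<>();

  public WalBackedStore(Path walFile) throws IOException {
    this.walFile = walFile;
    recover();
  }

  public void put(String key, String value) throws IOException {
    // 1. Durably record the intent first (a real WAL would also fsync here).
    Files.writeString(walFile, "PUT\t" + key + "\t" + value + "\n",
        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    // 2. Only then apply the change to the live data structure.
    data.put(key, value);
  }

  private void recover() throws IOException {
    if (!Files.exists(walFile)) return;
    for (String line : Files.readAllLines(walFile)) {   // replay every logged change
      String[] parts = line.split("\t");
      if (parts.length == 3 && parts[0].equals("PUT")) {
        data.put(parts[1], parts[2]);
      }
    }
  }
}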

Distributed Consensus (Raft, Paxos)

  • Implementation: Replicated log of commands
  • Purpose: Ensure all nodes apply same operations in same order
  • Examples: etcd, Consul, ZooKeeper

Event Sourcing Systems

  • EventStore: Specialized event sourcing database
  • Axon Framework: CQRS/ES on top of logs
  • Purpose: Complete audit trail, temporal queries

When to Use Log-Based Storage

✓ Perfect Use Cases

High-Throughput Writes

[Figure: High-Throughput Writes Use Case]

Audit and Compliance

[Figure: Audit and Compliance Use Case]

Event Sourcing / CQRS

[Figure: Event Sourcing Use Case]

Stream Processing

[Figure: Stream Processing Use Case]

✕ When NOT to Use

Point Queries Without Indexing

[Figure: Point Queries Warning]

Frequent Updates to Same Key

[Figure: Frequent Updates Warning]

Need to Delete Individual Records (GDPR)

[Figure: GDPR Deletion Warning]

Interview Application

Common Interview Question 1

Q: “Why does Kafka achieve such high throughput compared to traditional message queues?”

Strong Answer:

“Kafka’s high throughput comes from its log-based storage design that exploits sequential I/O:

  1. Sequential writes: All writes append to the end of a log segment file, achieving 100K+ writes/sec on HDDs vs ~100/sec for random writes in traditional MQs
  2. Zero-copy transfers: Kafka uses sendfile() to transfer data from disk → OS page cache → network socket without copying to application memory
  3. Batching: Producers batch multiple messages, consumers fetch in batches, amortizing overhead
  4. No per-message disk seeks: Traditional MQs update indices and metadata per message; Kafka just appends

LinkedIn achieves 7+ trillion messages/day using this design - sequential I/O is the key.”

Why this is good:

  • Specific technical reasons
  • Quantifies performance difference
  • Compares to alternatives
  • Cites real-world scale
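
To make the batching point from the answer above concrete, here is a hedged producer configuration sketch; the broker address, the "events" topic, and the specific values are illustrative, not recommendations:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BatchingProducer {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");  // assumption: local broker
    props.put("key.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");
    // Batching knobs: accumulate up to 64 KB per partition, or wait up to 10 ms,
    // so many records share one request and one sequential append on the broker.
    props.put("batch.size", 65536);
    props.put("linger.ms", 10);
    props.put("compression.type", "lz4");  // compress whole batches, not single records

    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
      for (int i = 0; i < 1_000; i++) {
        producer.send(new ProducerRecord<>("events", Integer.toString(i), "payload-" + i));
      }
    }   // close() flushes any remaining batched records
  }
}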

Common Interview Question 2

Q: “How would you design a system that can replay the last 30 days of events after discovering a bug in processing logic?”

Strong Answer:

“I’d use log-based storage with time-based retention:

Architecture:

  • Events written to Kafka topic with 30-day retention
  • Processing pipeline consumes from topic
  • Outputs to versioned tables (e.g., analytics_v1, analytics_v2)

Bug recovery process:

  1. Keep old pipeline running (serves current traffic)
  2. Deploy fixed pipeline as NEW consumer group
  3. Use consumer.offsetsForTimes() to seek to 30 days ago
  4. Replay events with corrected logic to analytics_v2 table
  5. Validate results, then switch traffic to v2
  6. Delete old consumer group and v1 table

Key decisions:

  • Separate consumer groups = independent offset tracking
  • Versioned outputs = safe validation before cutover
  • Log retention = enables replay without impacting production

This is exactly how teams at Uber and Netflix do data pipeline repairs.”

Why this is good:

  • Complete architecture design
  • Step-by-step process
  • Explains key design decisions
  • Zero-downtime approach
  • Real-world examples

Red Flags to Avoid

  • ✕ Confusing logs with log files (text files)
  • ✕ Not understanding sequential vs random I/O performance difference
  • ✕ Thinking logs are just for debugging
  • ✕ Not knowing about segments and retention strategies

Quick Self-Check

Before moving on, can you:

  • Explain log-based storage in 60 seconds?
  • Draw the structure of a log with segments?
  • Explain why sequential writes are 1000x faster?
  • Describe how to find a specific offset quickly?
  • Identify when to use vs NOT use log-based storage?
  • Explain time-travel and replay capabilities?

Prerequisites

None - this is a foundational storage concept

Used In Systems

  • Distributed Message Queues - Kafka-style messaging
  • Event-Driven Architectures - Using logs for communication

Next Recommended: Event Sourcing - See how to build applications using log-based storage

Interview Notes

  • Interview Relevance: ~75% of database interviews
  • Production Impact: powers Kafka, database WALs, and consensus systems
  • Performance: sequential writes roughly 1000x faster than random writes
  • Scalability: trillions of messages per day at the largest deployments