
Log Compaction

How Kafka's log compaction turns a topic into a key-value table by keeping only the latest value per key.

[Visual explainer, four panels: (1) the default delete policy removing whole segments after retention.ms; (2) compaction keeping only the latest value per key (K:A → v=3, K:B → v=2, K:C → v=1); (3) the cleaner thread scanning a closed segment and rewriting 6 records down to 3, one per unique key; (4) tombstones (null values) removing keys entirely, with the compacted topic shown as a key-value table.]

Traditional: Delete Old Segments

By default, Kafka deletes log segments after a retention period (e.g., 7 days). This works for event streams that consumers process soon after they arrive, where old records can safely be discarded.

But what if you only care about the latest value for each key?

  • cleanup.policy=delete (default)
  • Deletes by time or size
  • Loses historical values after retention
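Time-based deletion can be sketched in a few lines. This is a hypothetical simulation (the segment dicts and `expired_segments` helper are illustrative, not Kafka APIs): a segment is eligible for deletion once its newest record is older than the retention window.

```python
# Sketch of cleanup.policy=delete: whole segments whose newest record
# has aged past retention.ms are deleted.
RETENTION_MS = 7 * 24 * 60 * 60 * 1000  # retention.ms = 7 days

def expired_segments(segments, now_ms, retention_ms=RETENTION_MS):
    """Return segments whose newest record is older than the retention window."""
    return [s for s in segments if now_ms - s["last_timestamp_ms"] > retention_ms]

segments = [
    {"name": "segment-1", "last_timestamp_ms": 0},               # 8 days old
    {"name": "segment-2", "last_timestamp_ms": 5 * 86_400_000},  # 3 days old
]
now = 8 * 86_400_000
print([s["name"] for s in expired_segments(segments, now)])  # ['segment-1']
```

Note that deletion is all-or-nothing per segment: every record in an expired segment is lost, regardless of key.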

Compaction: Keep Latest Per Key

Log compaction retains only the most recent record for each key. If Key A has values v1, v2, v3 — after compaction, only v3 remains.

This turns the log into a changelog table.

  • cleanup.policy=compact
  • One record per unique key
  • Older values removed
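The keep-latest-per-key rule can be sketched as a single pass over the log. A minimal simulation (the `compact` helper is illustrative, not Kafka's actual cleaner code), using the same records as the diagram above:

```python
def compact(log):
    """Keep only the latest record per key, preserving original offset order."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later offsets overwrite earlier ones
    # Re-emit the surviving records in their original offset order.
    return [(k, v) for k, (off, v) in sorted(latest.items(), key=lambda kv: kv[1][0])]

log = [("A", "v1"), ("B", "v1"), ("A", "v2"), ("C", "v1"), ("B", "v2"), ("A", "v3")]
print(compact(log))  # [('C', 'v1'), ('B', 'v2'), ('A', 'v3')]
```

Six records shrink to three, one per unique key, and surviving records keep their relative order.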

The Compaction Cleaner

A background thread (the cleaner) periodically scans closed segments. It builds a new segment containing only the latest value per key.

The active segment is never compacted — only closed segments.

  • Background process, not blocking
  • Only closed segments compacted
  • Configurable cleaner threads
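The closed-segments-only rule can be sketched like this (a hypothetical model, not the real cleaner's implementation): the cleaner merges closed segments into a compacted one, while the active segment is returned untouched.

```python
def clean(closed_segments, active_segment):
    """Compact closed segments into one; the active segment is never touched."""
    latest = {}
    for segment in closed_segments:
        for key, value in segment:
            latest[key] = value  # records in later segments win
    return list(latest.items()), active_segment

closed = [[("A", 1), ("B", 1)], [("A", 2), ("C", 1)]]
active = [("B", 2)]
compacted, untouched = clean(closed, active)
print(compacted)  # [('A', 2), ('B', 1), ('C', 1)]
print(untouched)  # [('B', 2)] -- still being written, so left alone
```

Because the active segment is skipped, a reader may briefly see an older value for B in the compacted segments alongside the newer B:2 still in the active one; the next cleaning pass resolves it.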

Deleting Keys: Tombstones

To delete a key, produce a record with that key and a null value. This is called a tombstone. After compaction, the key is removed entirely.

Tombstones are retained briefly to propagate the delete to consumers.

  • null value = tombstone
  • delete.retention.ms controls tombstone life
  • Key fully removed after propagation
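Tombstone handling can be sketched in two phases (an illustrative model; `purge_tombstones` stands in for the passage of `delete.retention.ms`): first the tombstone supersedes older values for its key, then the tombstone itself is dropped.

```python
def compact_with_tombstones(log, purge_tombstones=False):
    """A None value is a tombstone: it marks its key for deletion."""
    latest = {}
    for key, value in log:
        latest[key] = value  # a tombstone overwrites older values for the key
    if purge_tombstones:
        # Models delete.retention.ms elapsing: tombstones themselves are removed.
        latest = {k: v for k, v in latest.items() if v is not None}
    return latest

log = [("A", 5), ("B", 3), ("C", 2), ("A", None)]
print(compact_with_tombstones(log))                         # {'A': None, 'B': 3, 'C': 2}
print(compact_with_tombstones(log, purge_tombstones=True))  # {'B': 3, 'C': 2}
```

The intermediate phase matters: consumers that were offline need to see the tombstone to delete key A from their own state before it vanishes from the log.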

Log Becomes a Table

After compaction, the topic is effectively a key-value table. Consumers can rebuild state by reading from offset 0.

Use cases: User profiles, configuration, CDC changelog, consumer offset storage.

  • Compacted topic = key-value store
  • Full state from offset 0
  • KTable in Kafka Streams
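Rebuilding state from offset 0 can be sketched as a simple replay loop (illustrative only; a real consumer would poll the broker). Each record upserts its key, and tombstones delete it, so the final dict is the key-value table:

```python
def rebuild_state(records):
    """Replay a compacted topic from offset 0 into an in-memory table."""
    table = {}
    for key, value in records:
        if value is None:         # tombstone: remove the key
            table.pop(key, None)
        else:                     # normal record: upsert the latest value
            table[key] = value
    return table

topic = [("user:1", "Alice"), ("user:2", "Bob"), ("user:3", "Carol")]
print(rebuild_state(topic))  # {'user:1': 'Alice', 'user:2': 'Bob', 'user:3': 'Carol'}
```

This replay-to-a-table pattern is exactly what a KTable in Kafka Streams does under the hood.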

What's Next?

Now that you understand log compaction, explore related patterns: Write-Ahead Log (WAL) for durability, Kafka Topic Partitioning for message distribution, and Log-Based Storage for the underlying principles.