Log Compaction

How Kafka's log compaction turns a topic into a key-value table by keeping only the latest value per key.

Read this as: Which latest value should survive for each key?
Failure Trap
Forgetting tombstones, compaction lag, and consumers that read before cleanup catches up.
Decision Rule
Use compaction for keyed state topics, and design retention plus tombstones around recovery expectations.
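
As a rough sketch of what that design can look like, the settings below collect the topic-level configs that govern compaction, tombstone lifetime, and compaction lag. The config names are real Kafka topic settings; the values are illustrative assumptions, not recommendations, and `to_cli_flags` is a hypothetical helper that only formats them as `--config name=value` pairs.

```python
# Illustrative settings for a compacted state topic (values are assumptions).
COMPACTED_TOPIC_CONFIG = {
    "cleanup.policy": "compact",
    # How long tombstones stay readable after cleaning. A consumer that is
    # offline for longer than this may never see the delete.
    "delete.retention.ms": "86400000",      # 24 hours
    # Minimum time a record stays uncompacted after it is written.
    "min.compaction.lag.ms": "0",
    # Upper bound on how long a record may wait before becoming eligible.
    "max.compaction.lag.ms": "604800000",   # 7 days
    # Share of uncleaned log required before the cleaner picks up the log.
    "min.cleanable.dirty.ratio": "0.5",
    # Roll segments hourly; the active segment is never compacted.
    "segment.ms": "3600000",
}


def to_cli_flags(config):
    """Format a config dict as a flat list of '--config name=value' arguments."""
    flags = []
    for name, value in config.items():
        flags += ["--config", f"{name}={value}"]
    return flags


flags = to_cli_flags(COMPACTED_TOPIC_CONFIG)
print(" ".join(flags))
```

Shortening `segment.ms` and bounding `max.compaction.lag.ms` are the two knobs in this sketch that limit how stale the uncompacted tail can get.
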
Figure: Kafka log compaction. A five-step walkthrough: traditional retention deletes whole old segments; compaction instead keeps only the latest record per key; a background cleaner rewrites closed segments; a null-value tombstone deletes a key; the compacted topic becomes a key-value table.

Traditional: Delete Old Segments

By default, Kafka deletes whole log segments once they exceed a retention limit (for example, 7 days). This works for event streams where every event matters, but only within the retention window.

But what if you only care about the latest value for each key?

  • cleanup.policy=delete (default)
  • Deletes by time or size
  • Loses historical values after retention
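
A minimal sketch of time-based retention, using the segment ages from the walkthrough figure (8, 7, and 2 days, plus the active segment). The `Segment` type and `segments_to_delete` function are hypothetical names, and treating a segment as eligible once its age reaches the retention period is a simplification of the broker's actual check.

```python
from dataclasses import dataclass

DAY_MS = 24 * 60 * 60 * 1000


@dataclass
class Segment:
    name: str
    age_ms: int           # age of the newest record in the segment
    active: bool = False  # the segment currently being written to


def segments_to_delete(segments, retention_ms):
    """Return names of closed segments whose age has reached the retention period."""
    return [s.name for s in segments if not s.active and s.age_ms >= retention_ms]


segments = [
    Segment("segment-1", 8 * DAY_MS),
    Segment("segment-2", 7 * DAY_MS),
    Segment("segment-3", 2 * DAY_MS),
    Segment("active", 0, active=True),
]

deleted = segments_to_delete(segments, retention_ms=7 * DAY_MS)
print(deleted)  # ['segment-1', 'segment-2']
```

Deletion works on whole segments, so every key in an expired segment disappears together, regardless of whether a newer value exists elsewhere.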

Compaction: Keep Latest Per Key

Log compaction retains at least the most recent record for each key. If key A has values v1, v2, v3, then after compaction only v3 remains.

This turns the log into a changelog table.

  • cleanup.policy=compact
  • One record per unique key once the cleaner has run
  • Older values removed
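
The rule can be sketched in a few lines. This is an in-memory model rather than broker code: `compact` is a hypothetical function, and the log is the A/B example from the figure, represented as `(offset, key, value)` tuples.

```python
def compact(log):
    """Keep only the record with the highest offset for each key.

    log: list of (offset, key, value) tuples in offset order.
    Surviving records keep their original offsets, so the result has gaps.
    """
    latest_offset = {}
    for offset, key, _value in log:
        latest_offset[key] = offset  # later records overwrite earlier ones
    return [rec for rec in log if latest_offset[rec[1]] == rec[0]]


log = [
    (0, "A", "v1"),
    (1, "B", "v1"),
    (2, "A", "v2"),
    (3, "B", "v2"),
    (4, "A", "v3"),
]

compacted = compact(log)
print(compacted)  # [(3, 'B', 'v2'), (4, 'A', 'v3')]
```

Note that the survivors stay in offset order and are not renumbered, which is why a consumer of a compacted topic sees non-contiguous offsets.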

The Compaction Cleaner

A background thread (the cleaner) periodically scans closed segments. It builds a new segment containing only the latest value per key.

The active segment is never compacted; only closed segments are.

  • Background process, not blocking
  • Only closed segments compacted
  • Configurable cleaner threads
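
The cleaner's work can be approximated as two passes over the closed segments: build a map from each key to its latest offset, then copy only the matching records into a new segment. The sketch below assumes that model; `clean` is a hypothetical function, merging everything into a single output segment is a simplification, and the record in the active segment is an invented example added to show that it is left alone.

```python
def clean(closed_segments, active_segment):
    """Rewrite closed segments, keeping the latest record per key.

    Each segment is a list of (offset, key, value) tuples in offset order.
    The active segment is returned untouched and is not consulted.
    """
    # Pass 1: key -> highest offset seen in the closed segments.
    latest = {}
    for segment in closed_segments:
        for offset, key, _value in segment:
            latest[key] = offset

    # Pass 2: copy only the surviving records into one new segment.
    cleaned = [
        rec
        for segment in closed_segments
        for rec in segment
        if latest[rec[1]] == rec[0]
    ]
    return [cleaned], active_segment


closed = [
    [(0, "A", 1), (1, "B", 1), (2, "A", 2)],
    [(3, "C", 1), (4, "B", 2), (5, "A", 3)],
]
active = [(6, "A", 4)]  # hypothetical record still being written

new_closed, new_active = clean(closed, active)
print(new_closed)  # [[(3, 'C', 1), (4, 'B', 2), (5, 'A', 3)]]
print(new_active)  # [(6, 'A', 4)]
```

Six closed records shrink to three, while key A temporarily has two values (offsets 5 and 6) because the newer one still sits in the active segment. Consumers therefore cannot assume exactly one record per key at any given moment.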

Deleting Keys: Tombstones

To delete a key, produce a record with that key and a null value. This is called a tombstone. After compaction, all earlier values for the key are removed.

The tombstone itself is retained for delete.retention.ms so that consumers can observe the delete, and is then removed as well.

  • null value = tombstone
  • delete.retention.ms controls tombstone life
  • Key fully removed after propagation
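
A sketch of the two-stage delete, using the A/B example from the figure. `compact_with_tombstones` is a hypothetical function, the timestamps are made up, and measuring tombstone age from the record's own timestamp is a simplifying assumption about how the broker tracks it.

```python
def compact_with_tombstones(log, now_ms, delete_retention_ms):
    """Compact a log and drop tombstones older than delete_retention_ms.

    log: list of (offset, key, value, timestamp_ms); value None is a tombstone.
    """
    latest = {}
    for offset, key, _value, _ts in log:
        latest[key] = offset

    survivors = []
    for offset, key, value, ts in log:
        if latest[key] != offset:
            continue  # superseded by a newer record for the same key
        if value is None and now_ms - ts > delete_retention_ms:
            continue  # tombstone has outlived its retention window
        survivors.append((offset, key, value, ts))
    return survivors


log = [
    (0, "A", 5, 1_000),
    (1, "B", 3, 2_000),
    (2, "A", None, 3_000),  # tombstone for A
]

# Shortly after the delete: A's old value is gone, the tombstone remains.
early = compact_with_tombstones(log, now_ms=5_000, delete_retention_ms=10_000)
print(early)  # [(1, 'B', 3, 2000), (2, 'A', None, 3000)]

# After the retention window: the tombstone is removed too.
late = compact_with_tombstones(log, now_ms=20_000, delete_retention_ms=10_000)
print(late)   # [(1, 'B', 3, 2000)]
```

A consumer that reads only the `late` state never learns that A existed, which is why the retention window should exceed the longest expected consumer downtime.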

Log Becomes a Table

After compaction, the topic is effectively a key-value table. Consumers can rebuild the full state by reading the topic from the beginning (offset 0) and keeping the last value seen for each key.

Use cases: User profiles, configuration, CDC changelog, consumer offset storage.

  • Compacted topic = key-value store
  • Full state from offset 0
  • KTable in Kafka Streams
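
Rebuilding the table on the consumer side amounts to replaying records into a map. In this sketch `materialize` is a hypothetical function, the records are plain `(key, value)` pairs rather than real consumer records, and the uncompacted log, including the `user:4` entries, is an invented example.

```python
def materialize(records):
    """Replay (key, value) records in log order into a key-value table.

    A None value is a tombstone and removes the key.
    """
    table = {}
    for key, value in records:
        if value is None:
            table.pop(key, None)
        else:
            table[key] = value
    return table


# The compacted topic from the figure.
compacted = [("user:1", "Alice"), ("user:2", "Bob"), ("user:3", "Carol")]
table = materialize(compacted)
print(table)  # {'user:1': 'Alice', 'user:2': 'Bob', 'user:3': 'Carol'}

# The same topic read before the cleaner has caught up.
uncompacted = [
    ("user:1", "Al"),
    ("user:1", "Alice"),
    ("user:4", "Dan"),
    ("user:2", "Bob"),
    ("user:4", None),
    ("user:3", "Carol"),
]
assert materialize(uncompacted) == table
```

Because the replay is last-write-wins, it produces the same table whether or not compaction has run; compaction only shortens how much has to be read.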