Log Compaction
How Kafka's log compaction turns a topic into a key-value table by keeping only the latest value per key.
Traditional: Delete Old Segments
By default, Kafka deletes log segments after a retention period (e.g., 7 days). This works for event streams where only recent history matters.
But what if you only care about the latest value for each key?
- cleanup.policy=delete (default)
- Deletes by time or size
- Loses historical values after retention
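Time-based deletion can be sketched as follows. This is a minimal illustration of the retention rule, not Kafka's implementation; the segment dictionaries and field names (base_offset, max_timestamp_ms) are assumptions for the sketch.

```python
# Hypothetical sketch of time-based retention (cleanup.policy=delete):
# a closed segment is eligible for deletion once its newest record is
# older than the retention window.
RETENTION_MS = 7 * 24 * 60 * 60 * 1000  # e.g. retention.ms = 7 days

def expired_segments(segments, now_ms, retention_ms=RETENTION_MS):
    """Return segments whose newest record has aged past retention."""
    return [s for s in segments if now_ms - s["max_timestamp_ms"] > retention_ms]

now = 10 * 24 * 60 * 60 * 1000  # "day 10" on a millisecond clock
segments = [
    {"base_offset": 0,   "max_timestamp_ms": 1 * 24 * 60 * 60 * 1000},  # day 1
    {"base_offset": 500, "max_timestamp_ms": 9 * 24 * 60 * 60 * 1000},  # day 9
]
print([s["base_offset"] for s in expired_segments(segments, now)])  # [0]
```

The day-1 segment is past the 7-day window and goes away, taking all of its records with it regardless of key.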
Compaction: Keep Latest Per Key
Log compaction retains only the most recent record for each key. If Key A has values v1, v2, v3 — after compaction, only v3 remains.
This turns the log into a changelog table.
- cleanup.policy=compact
- One record per unique key
- Older values removed
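The keep-latest-per-key rule can be sketched in a few lines. This collapses a whole log in one pass, assuming the real cleaner works incrementally on segments; the data layout (offset, key, value tuples) is illustrative.

```python
# Minimal sketch of compaction semantics: keep only the newest value per key.
def compact(log):
    latest = {}
    for offset, key, value in log:
        latest[key] = (offset, key, value)  # a later offset overwrites earlier ones
    # Surviving records keep their original offsets and relative order.
    return sorted(latest.values())

log = [(0, "A", "v1"), (1, "B", "v1"), (2, "A", "v2"), (3, "A", "v3")]
print(compact(log))  # [(1, 'B', 'v1'), (3, 'A', 'v3')]
```

Key A's v1 and v2 are gone; only v3 at offset 3 survives, alongside B's single value. Note that offsets are preserved, so the compacted log has gaps.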
The Compaction Cleaner
A background thread (the cleaner) periodically scans closed segments. It builds a new segment containing only the latest value per key.
The active segment is never compacted — only closed segments.
- Background process, not blocking
- Only closed segments compacted
- Configurable cleaner threads
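A single cleaner pass might be sketched like this, under the assumption (stated above) that only closed segments are scanned and the active segment is left alone. The list-of-lists segment model is an illustration, not Kafka's on-disk format.

```python
# Hedged sketch of one cleaner pass: compact closed segments into a new,
# smaller segment; never touch the active (still-written) segment.
def clean(closed_segments, active_segment):
    latest = {}
    for segment in closed_segments:             # scan closed segments only
        for offset, key, value in segment:
            latest[key] = (offset, key, value)  # newest offset wins per key
    cleaned = [sorted(latest.values())]         # one new compacted segment
    return cleaned + [active_segment]           # active segment passes through

closed = [[(0, "A", "v1"), (1, "B", "v1")], [(2, "A", "v2")]]
active = [(3, "A", "v3")]  # still being written; may duplicate keys in closed segments
print(clean(closed, active))
# [[(1, 'B', 'v1'), (2, 'A', 'v2')], [(3, 'A', 'v3')]]
```

Because the active segment is skipped, key A briefly appears twice (v2 in the compacted segment, v3 in the active one); a later pass removes v2 once that segment closes.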
Deleting Keys: Tombstones
To delete a key, produce a record with that key and a null value. This is called a tombstone. After compaction, the key is removed entirely.
Tombstones are retained briefly to propagate the delete to consumers.
- null value = tombstone
- delete.retention.ms controls tombstone lifetime
- Key fully removed after propagation
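Tombstone handling can be layered onto the same sketch. Here the delete.retention.ms window is modeled as a simple boolean flag (an assumption for brevity; the real cleaner compares timestamps).

```python
# Sketch of tombstone semantics: a null (None) value marks a key for deletion.
# The tombstone itself survives compaction until delete.retention.ms elapses,
# modeled here as the tombstones_expired flag.
def compact_with_tombstones(log, tombstones_expired=False):
    latest = {}
    for offset, key, value in log:
        latest[key] = (offset, key, value)  # newest record per key, tombstone or not
    survivors = sorted(latest.values())
    if tombstones_expired:
        # After delete.retention.ms, tombstones are removed entirely.
        survivors = [r for r in survivors if r[2] is not None]
    return survivors

log = [(0, "A", "v1"), (1, "B", "v1"), (2, "A", None)]  # offset 2 is a tombstone
print(compact_with_tombstones(log))
# [(1, 'B', 'v1'), (2, 'A', None)]  -- tombstone retained for consumers
print(compact_with_tombstones(log, tombstones_expired=True))
# [(1, 'B', 'v1')]                  -- key A fully removed
```

The intermediate state matters: a consumer that last read offset 1 must still see the tombstone at offset 2 to learn that key A was deleted.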
Log Becomes a Table
After compaction, the topic is effectively a key-value table. Consumers can rebuild state by reading from offset 0.
Use cases: user profiles, configuration, CDC changelogs, consumer offset storage.
- Compacted topic = key-value store
- Full state from offset 0
- KTable in Kafka Streams
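Rebuilding state from offset 0 is just a replay into a map, which is essentially what Kafka Streams materializes as a KTable. A minimal sketch, with illustrative keys:

```python
# Sketch: replaying a compacted topic from offset 0 rebuilds the
# full key-value table.
def materialize(records):
    table = {}
    for key, value in records:    # records arrive in offset order
        if value is None:
            table.pop(key, None)  # tombstone: drop the key
        else:
            table[key] = value    # latest value wins
    return table

compacted = [("user:1", "alice"), ("user:2", "bob"), ("user:2", None), ("user:3", "carol")]
print(materialize(compacted))  # {'user:1': 'alice', 'user:3': 'carol'}
```

Handling tombstones during replay covers the window in which a delete has been compacted in but the tombstone has not yet expired.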