Exactly-Once Semantics
How Kafka achieves exactly-once processing using idempotent producers, transactions, and consumer isolation.
The Problem: Duplicates Everywhere
Distributed systems are plagued by duplicates. A producer retries on timeout — now there are two copies in Kafka. A consumer crashes after processing but before committing — now the same message is processed twice.
At-least-once is easy. Exactly-once is hard.
- Network timeouts cause producer retries
- Consumer crashes cause reprocessing
- Duplicates corrupt downstream systems
Solution Part 1: Idempotent Producer
Kafka assigns each producer a Producer ID (PID) and tracks sequence numbers per partition. If a retry arrives with a sequence already seen, the broker discards it.
Enable with: enable.idempotence=true
- PID assigned on producer init
- Sequence number per partition
- Broker deduplicates automatically
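The dedup mechanism can be sketched in a few lines of plain Python — a toy in-memory "broker" that tracks the highest sequence number seen per (PID, partition) and drops retries. All names here are illustrative; real brokers persist this state in the partition log.

```python
# Toy model of broker-side deduplication via (PID, partition) sequence
# numbers. Illustrative only -- not the Kafka broker implementation.

class DedupBroker:
    def __init__(self):
        self.log = {}        # partition -> appended records
        self.last_seq = {}   # (pid, partition) -> highest sequence seen

    def append(self, pid, partition, seq, value):
        """Append a record unless its sequence number was already seen."""
        key = (pid, partition)
        if seq <= self.last_seq.get(key, -1):
            return False     # duplicate retry: silently discarded
        self.last_seq[key] = seq
        self.log.setdefault(partition, []).append(value)
        return True

broker = DedupBroker()
broker.append(pid=1, partition=0, seq=0, value="order-42")
# Producer times out waiting for the ack and retries the same batch:
broker.append(pid=1, partition=0, seq=0, value="order-42")
assert broker.log[0] == ["order-42"]   # only one copy lands in the log
```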
But Consumers Still Duplicate...
Idempotent producers solve writes. But consumers can still duplicate:
- Read message
- Process and write to output
- Crash before offset commit
- Restart → Read same message → Duplicate output
We need the output write and offset commit to be atomic.
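The failure window above is easy to reproduce without Kafka at all — a loop that writes output before committing its offset, crashed at the wrong moment, reprocesses the same record on restart (plain Python simulation; the variable names are illustrative):

```python
# Simulating the crash-before-commit window. `committed_offset` stands in
# for the offset the consumer group has durably committed.

input_topic = ["msg-a", "msg-b"]
output = []
committed_offset = 0

def run_consumer(crash_before_commit):
    global committed_offset
    offset = committed_offset                       # resume from last commit
    while offset < len(input_topic):
        output.append(input_topic[offset].upper())  # process + write output
        if crash_before_commit:
            return                                  # crash: commit never happens
        offset += 1
        committed_offset = offset                   # commit offset

run_consumer(crash_before_commit=True)   # processes "msg-a", then crashes
run_consumer(crash_before_commit=False)  # restart: re-reads "msg-a"
assert output == ["MSG-A", "MSG-A", "MSG-B"]   # duplicate output
```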
Solution Part 2: Transactions
Kafka transactions group multiple operations into an atomic unit. Either ALL succeed (commit) or ALL fail (abort).
A transaction can include: writes to multiple partitions AND offset commits.
- transactional.id identifies the transactional producer
- BEGIN → operations → COMMIT/ABORT
- Cross-partition atomicity
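The commit/abort semantics can be modeled with a toy producer that stages writes in a buffer and makes them visible across partitions only on commit (a sketch, not the real client API — real transactions are coordinated broker-side by a transaction coordinator):

```python
# Toy model of transactional writes: records are staged and become
# visible only on commit; abort discards everything. Illustrative only.

class TxnProducer:
    def __init__(self):
        self.partitions = {}   # partition -> committed, visible records
        self._staged = []

    def begin(self):
        self._staged = []

    def send(self, partition, value):
        self._staged.append((partition, value))   # buffered, not yet visible

    def commit(self):
        for partition, value in self._staged:     # all writes apply together
            self.partitions.setdefault(partition, []).append(value)
        self._staged = []

    def abort(self):
        self._staged = []                         # all writes discarded

p = TxnProducer()
p.begin(); p.send(0, "a"); p.send(1, "b"); p.commit()
p.begin(); p.send(0, "x"); p.abort()   # nothing from this txn lands
assert p.partitions == {0: ["a"], 1: ["b"]}
```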
Atomic Commit: All or Nothing
In a stream processing app:
- Read from input topic
- Process
- Write to output topic
- Commit input offset
With transactions, steps 3 and 4 happen atomically. Either both succeed or both roll back.
- sendOffsetsToTransaction() binds the offset commit to the transaction
- Output write and offset commit are atomic
- No partial state possible
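The effect of binding the offset to the transaction can be sketched by staging both the output write and the offset advance, then applying them together at commit — a crash beforehand leaves both untouched (plain Python; names are illustrative):

```python
# Sketch of atomic "write output + commit offset", the effect of
# sendOffsetsToTransaction(). A crash before commit leaves both the
# output topic and the committed offset unchanged. Illustrative only.

input_topic = ["msg-a", "msg-b"]
output_topic = []
committed_offset = 0

def process_one(crash_before_commit):
    global committed_offset
    offset = committed_offset
    staged_output = [input_topic[offset].upper()]   # write rides in the txn
    staged_offset = offset + 1                      # offset rides in the txn
    if crash_before_commit:
        return                        # txn aborts: neither change applies
    output_topic.extend(staged_output)              # commit: both apply
    committed_offset = staged_offset                #         atomically

process_one(crash_before_commit=True)    # crash: no output, offset unchanged
process_one(crash_before_commit=False)   # retry produces the record once
assert output_topic == ["MSG-A"] and committed_offset == 1
```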
Crash? Transaction Aborts
If the processor crashes mid-transaction, the transaction times out and aborts. The output writes are discarded, and the offset stays at its previous position.
On restart, processing begins from the last committed offset — no duplicates.
- Uncommitted transactions abort on timeout
- No partial output persisted
- Clean restart from committed state
read_committed: See Only Committed
Downstream consumers set isolation.level=read_committed to only
see messages from committed transactions. They never see duplicates or partial
writes within Kafka.
External sinks (databases, APIs) must be transactional or idempotent to extend guarantees beyond Kafka.
- read_committed filters out uncommitted records
- read_uncommitted sees everything (the default)
- Exactly-once within Kafka; external sinks need idempotency
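Pulled together, the configuration surface is small. These are the standard Kafka config keys named above; the transactional.id value is just an example:

```properties
# Producer side
enable.idempotence=true          # dedupe retries via PID + sequence numbers
transactional.id=orders-app-1    # stable per-app id; enables transactions

# Consumer side
isolation.level=read_committed   # only deliver records from committed txns
```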