~110s Visual Explainer

Exactly-Once Semantics

How Kafka achieves exactly-once processing using idempotent producers, transactions, and consumer isolation.

[Animated diagram: duplicates from producer retries and consumer crashes; idempotent producer dedup via PID + sequence number; transactions binding output writes and offset commits into one atomic unit; crash recovery via transaction abort; read_committed consumers seeing only committed messages]

The Problem: Duplicates Everywhere

Distributed systems are plagued by duplicates. A producer retries on timeout — now there are two copies in Kafka. A consumer crashes after processing but before committing — now the same message is processed twice.

At-least-once is easy. Exactly-once is hard.

  • Network timeouts cause producer retries
  • Consumer crashes cause reprocessing
  • Duplicates corrupt downstream systems
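To make the producer-side failure concrete, here is a minimal model (not the Kafka client) of a broker log with no deduplication, where a retry after a timed-out ack lands a second copy:

```python
# Illustrative model: the broker appends the message, but the ack is lost,
# so the producer retries and the log ends up with two copies.
log = []

def send(msg):
    log.append(msg)   # broker appends the message...
    return False      # ...but the ack times out on the way back

ok = send("M1")
if not ok:
    send("M1")        # retry appends a second copy

print(log)            # ['M1', 'M1'] -- two copies of one logical message
```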

Solution Part 1: Idempotent Producer

Kafka assigns each producer a Producer ID (PID) and tracks sequence numbers per partition. If a retry arrives with a sequence already seen, the broker discards it.

Enable with: enable.idempotence=true (the default since Kafka 3.0)

  • PID assigned on producer init
  • Sequence number per partition
  • Broker deduplicates automatically
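The broker-side dedup can be sketched as follows. This is an illustrative model, not broker code, and it assumes a single partition: the broker remembers the last sequence number accepted per producer ID and discards replays.

```python
# Sketch of idempotent-producer dedup: last sequence number per PID.
class Broker:
    def __init__(self):
        self.log = []
        self.last_seq = {}  # producer_id -> last sequence number accepted

    def append(self, pid, seq, msg):
        if seq <= self.last_seq.get(pid, -1):
            return "duplicate-discarded"   # replayed send, drop it
        self.last_seq[pid] = seq
        self.log.append(msg)
        return "ack"

b = Broker()
b.append(pid=1, seq=5, msg="M1")            # original write -> "ack"
result = b.append(pid=1, seq=5, msg="M1")   # retry with the same sequence
print(result, b.log)                        # duplicate-discarded ['M1']
```

The retry is detected purely from (PID, sequence), so the producer can retry freely without risking duplicates in the log.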

But Consumers Still Duplicate...

Idempotent producers deduplicate the write path into Kafka. But the read path can still duplicate:

  1. Read message
  2. Process and write to output
  3. Crash before offset commit
  4. Restart → Read same message → Duplicate output

We need the output write and offset commit to be atomic.
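The four steps above can be modeled in a few lines. This is a simulation of the failure, not client code: the sink write and the offset commit are separate steps, so a crash between them replays the message.

```python
# Model of read-process-write when offset commit is NOT atomic with the
# sink write: a crash between the two duplicates the output.
kafka = ["M100"]
sink, offset = [], 0

def poll(off):
    return kafka[off] if off < len(kafka) else None

# First attempt: write to the sink, then crash before the offset commit.
msg = poll(offset)
sink.append(msg)      # write to sink DB
# CRASH here -- the offset commit never happens

# After restart: offset is still 0, so the same message is read again.
msg = poll(offset)
sink.append(msg)      # processed a second time
offset += 1           # commit finally succeeds

print(sink)           # ['M100', 'M100'] -- duplicate in the sink
```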

Solution Part 2: Transactions

Kafka transactions group multiple operations into an atomic unit. Either ALL succeed (commit) or ALL fail (abort).

A transaction can include: writes to multiple partitions AND offset commits.

  • transactional.id identifies the transactional producer
  • BEGIN → operations → COMMIT/ABORT
  • Cross-partition atomicity
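The all-or-nothing behavior can be sketched with a toy transaction buffer. This simplifies how Kafka actually works (real brokers write uncommitted records to the log and mark them with commit/abort control records), but it captures the contract: buffered writes become visible together on commit, or not at all on abort.

```python
# Toy model of a Kafka-style transaction: writes accumulate uncommitted,
# then either all become visible (commit) or none (abort).
class Txn:
    def __init__(self, topic):
        self.topic = topic    # committed messages live here
        self.pending = []     # uncommitted writes

    def write(self, msg):
        self.pending.append(msg)

    def commit(self):
        self.topic.extend(self.pending)  # everything becomes visible at once
        self.pending.clear()

    def abort(self):
        self.pending.clear()             # every buffered write is discarded

out = []
t = Txn(out)
t.write("M1'"); t.write("M2'")
t.abort()
print(out)        # [] -- nothing leaked

t.write("M1'"); t.write("M2'")
t.commit()
print(out)        # ["M1'", "M2'"] -- all or nothing
```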

Atomic Commit: All or Nothing

In a stream processing app:

  1. Read from input topic
  2. Process
  3. Write to output topic
  4. Commit input offset

With transactions, steps 3 and 4 happen atomically. Either both succeed or both roll back.

  • sendOffsetsToTransaction() binds offset to txn
  • Output and offset commit are atomic
  • No partial state possible
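The key property of sendOffsetsToTransaction() can be modeled as a single commit point that moves the output and the input offset together. This sketch is an analogy, not the Kafka API: both values are staged and then take effect in one step, so they can never diverge.

```python
# Model of binding the offset commit to the output write: one commit point
# applies both, so output and offset always move in lockstep.
class State:
    def __init__(self):
        self.output, self.offset = [], 0

def process_atomically(state, input_topic):
    msg = input_topic[state.offset]
    new_output = state.output + [msg + "'"]   # step 3: write to output topic
    new_offset = state.offset + 1             # step 4: commit input offset
    # Commit point: both take effect together or not at all.
    state.output, state.offset = new_output, new_offset

s = State()
process_atomically(s, ["M1", "M2"])
print(s.output, s.offset)   # ["M1'"] 1 -- output and offset moved together
```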

Crash? Transaction Aborts

If the processor crashes mid-transaction, the transaction times out and aborts. The output writes are discarded, and the offset stays at its previous position.

On restart, processing begins from the last committed offset — no duplicates.

  • Uncommitted transactions abort on timeout
  • No partial output persisted
  • Clean restart from committed state
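The abort path above can be simulated by making crash recovery equivalent to "pending state discarded": an in-flight transaction that never reaches its commit point leaves no trace, so the restart replays from the old offset and the message lands exactly once. This is an illustrative model, not recovery code.

```python
# Model of crash recovery under transactions: a crash before the commit
# point leaves committed state untouched; the retry succeeds exactly once.
committed = {"output": [], "offset": 0}

def run_once(state, input_topic, crash):
    pending = {
        "output": state["output"] + [input_topic[state["offset"]] + "'"],
        "offset": state["offset"] + 1,
    }
    if crash:
        return state      # txn aborts: pending writes and offset discarded
    return pending        # txn commits: pending state becomes visible

committed = run_once(committed, ["M1"], crash=True)
print(committed)          # {'output': [], 'offset': 0} -- unchanged

committed = run_once(committed, ["M1"], crash=False)
print(committed)          # {'output': ["M1'"], 'offset': 1} -- exactly once
```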

read_committed: See Only Committed

Downstream consumers set isolation.level=read_committed to only see messages from committed transactions. They never see duplicates or partial writes within Kafka.

External sinks (databases, APIs) must be transactional or idempotent to extend guarantees beyond Kafka.

  • read_committed filters uncommitted
  • read_uncommitted sees everything (default)
  • Exactly-once within Kafka; sinks need idempotency
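The consumer-side filter can be sketched as follows. This is an illustrative model of isolation.level=read_committed, not the consumer internals: only records whose transaction has committed are returned, matching the diagram where the consumer sees M1 and M2 but skips M3 and M4.

```python
# Sketch of read_committed: return only records from committed transactions.
records = [
    {"msg": "M1", "txn": "t1"},
    {"msg": "M2", "txn": "t1"},
    {"msg": "M3", "txn": "t2"},   # t2 aborted or still in progress
    {"msg": "M4", "txn": "t2"},
]
committed_txns = {"t1"}

def poll_read_committed(recs):
    return [r["msg"] for r in recs if r["txn"] in committed_txns]

print(poll_read_committed(records))   # ['M1', 'M2'] -- M3, M4 filtered out
```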

What's Next?

Now that you understand exactly-once semantics, explore related patterns: Kafka Topic Partitioning for message distribution, Consumer Group Rebalancing for partition assignment, and Producer Acknowledgments for durability guarantees.