Exactly-Once Semantics

How Kafka achieves exactly-once processing using idempotent producers, transactions, and consumer isolation.

Read this as
Where are duplicates prevented, and where are they only hidden?
Failure Trap
Assuming exactly-once holds outside the broker, transaction, and consumer isolation boundary.
Decision Rule
Combine idempotent producers, transactions, stable keys, and idempotent sinks for end-to-end correctness.
Figure: Exactly-once semantics in Kafka. A seven-step walkthrough: duplicates arise from producer retries and consumer crashes; idempotent producers dedupe by producer ID and sequence number; Kafka transactions commit the output and the consumer offset atomically; aborted transactions leave no partial output; and read_committed consumers see only committed messages, giving exactly-once processing end to end.
Panels: A retry duplicates; Producer dedupes; Consumer can crash; Wrap in a transaction; Output + offset; Crash? It aborts; read_committed.

The Problem: Duplicates Everywhere

Distributed systems are plagued by duplicates. A producer retries after a timeout, and if the first send had actually succeeded, there are now two copies in Kafka. A consumer crashes after processing but before committing its offset, and the same message is processed twice after restart.

At-least-once is easy. Exactly-once is hard.

  • Network timeouts cause producer retries
  • Consumer crashes cause reprocessing
  • Duplicates corrupt downstream systems

Solution Part 1: Idempotent Producer

Kafka assigns each producer a Producer ID (PID) and tracks sequence numbers per partition. If a retry arrives with a sequence already seen, the broker discards it.

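Enable with: enable.idempotence=true (the default since Kafka 3.0). It requires acks=all, retries greater than 0, and max.in.flight.requests.per.connection of at most 5.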

  • PID assigned on producer init
  • Sequence number per partition
  • Broker deduplicates automatically
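
The deduplication rule can be sketched as a toy model. This is not the broker's implementation: ToyPartition and its return strings are hypothetical names, and the real broker tracks sequence numbers per batch rather than per record. The sketch only shows why a retry with an already-seen sequence number does not produce a second copy.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPartition:
    """Toy partition log that remembers the last sequence number per producer ID."""
    log: list = field(default_factory=list)
    last_seq: dict = field(default_factory=dict)  # PID -> last appended sequence

    def append(self, pid: int, seq: int, value: str) -> str:
        last = self.last_seq.get(pid, -1)
        if seq <= last:
            # Retry of a record already in the log: acknowledge, do not append again.
            return "duplicate"
        if seq != last + 1:
            # A gap means an earlier record is missing; reject rather than reorder.
            return "out_of_order"
        self.log.append(value)
        self.last_seq[pid] = seq
        return "appended"


partition = ToyPartition()
first = partition.append(pid=1, seq=0, value="M1")
retry = partition.append(pid=1, seq=0, value="M1")  # ack was lost, producer resends
print(first, retry, partition.log)  # appended duplicate ['M1']
```

Because the state is keyed by PID, the check only covers retries from the same producer session; a second producer sending the same payload would get a fresh PID and would not be deduplicated.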

But Consumers Still Duplicate...

Idempotent producers prevent duplicate writes caused by retries. But a consumer that reads, processes, and writes output can still duplicate:

  1. Read message
  2. Process and write to output
  3. Crash before offset commit
  4. Restart → Read same message → Duplicate output

We need the output write and offset commit to be atomic.
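
The four steps above can be reproduced in a few lines. This is a minimal sketch with in-memory lists standing in for the topic and the sink; run_consumer and crash_before_commit_at are hypothetical names used only for illustration.

```python
def run_consumer(messages, committed_offset, sink, crash_before_commit_at=None):
    """Process messages from committed_offset; return the new committed offset."""
    for offset in range(committed_offset, len(messages)):
        sink.append(messages[offset].upper())  # step 2: process and write output
        if offset == crash_before_commit_at:
            return committed_offset            # step 3: crash, commit never happens
        committed_offset = offset + 1          # offset commit
    return committed_offset


messages = ["m0", "m1", "m2"]
sink = []
committed = run_consumer(messages, 0, sink, crash_before_commit_at=1)
committed = run_consumer(messages, committed, sink)  # step 4: restart
print(committed, sink)  # 3 ['M0', 'M1', 'M1', 'M2']
```

The output write and the offset commit are two separate steps here, so a crash between them leaves "M1" in the sink twice.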

Solution Part 2: Transactions

Kafka transactions group multiple operations into an atomic unit. Either ALL succeed (commit) or ALL fail (abort).

A transaction can include: writes to multiple partitions AND offset commits.

  • transactional.id identifies the transactional producer
  • BEGIN → operations → COMMIT/ABORT
  • Cross-partition atomicity
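
Cross-partition atomicity can be pictured with a toy log in which one commit or abort decision covers every partition a transaction wrote to. ToyTxnLog and the partition names are hypothetical. The model is also simplified: a real read_committed consumer additionally stops at the first still-open transaction, which this sketch does not model.

```python
class ToyTxnLog:
    """Toy cluster: records carry a transaction ID; one marker decides their fate."""

    def __init__(self, partitions):
        self.partitions = {p: [] for p in partitions}  # partition -> [(txn_id, value)]
        self.outcome = {}                              # txn_id -> "commit" | "abort"

    def send(self, txn_id, partition, value):
        # Records are appended immediately, before the outcome is known.
        self.partitions[partition].append((txn_id, value))

    def end(self, txn_id, outcome):
        # One decision covers every partition the transaction touched.
        self.outcome[txn_id] = outcome

    def read_committed(self, partition):
        return [v for t, v in self.partitions[partition]
                if self.outcome.get(t) == "commit"]


log = ToyTxnLog(["orders-0", "orders-1"])
log.send("t1", "orders-0", "a")
log.send("t1", "orders-1", "b")
log.end("t1", "commit")
log.send("t2", "orders-0", "c")
log.send("t2", "orders-1", "d")
log.end("t2", "abort")
print(log.read_committed("orders-0"), log.read_committed("orders-1"))  # ['a'] ['b']
```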

Atomic Commit: All or Nothing

In a stream processing app:

  1. Read from input topic
  2. Process
  3. Write to output topic
  4. Commit input offset

With transactions, steps 3 and 4 happen atomically. Either both succeed or both roll back.

  • sendOffsetsToTransaction() binds offset to txn
  • Output and offset commit are atomic
  • No partial state possible

Crash? Transaction Aborts

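If the processor crashes mid-transaction, the transaction aborts, either when it times out or when a restarted producer with the same transactional.id initializes and fences the old one. The output records already written stay in the log but are marked aborted and hidden from read_committed consumers, and the offset stays at its previous position.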

On restart, processing begins from the last committed offset — no duplicates.

  • Uncommitted transactions abort on timeout
  • No partial output visible to read_committed consumers
  • Clean restart from committed state
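
A toy read-process-write loop shows the crash-and-restart behavior. This is a sketch under a strong assumption: the "transaction" is modeled as a single in-memory update of output and offset together, which stands in for Kafka's commit protocol rather than reproducing it. Crash, process_one, and the state dictionary are hypothetical names.

```python
class Crash(Exception):
    """Stands in for the processor dying mid-transaction."""


def process_one(state, crash=False):
    """Read one input, then commit the output and the new offset together or not at all."""
    offset = state["offset"]
    pending_output = state["input"][offset].upper()  # read + process
    if crash:
        raise Crash()                                # nothing below runs: abort
    # Commit point: output and new offset become visible in one step.
    state["output"].append(pending_output)
    state["offset"] = offset + 1


state = {"input": ["m0", "m1", "m2"], "output": [], "offset": 0}
process_one(state)
try:
    process_one(state, crash=True)   # crash while handling m1
except Crash:
    pass                             # transaction aborted; state unchanged
while state["offset"] < len(state["input"]):
    process_one(state)               # restart from the last committed offset
print(state["offset"], state["output"])  # 3 ['M0', 'M1', 'M2']
```

Compared with a loop that commits the offset separately, "M1" appears once because the crash left neither the output nor the offset advanced.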

read_committed: See Only Committed

Downstream consumers set isolation.level=read_committed to see only messages from committed transactions. Records from aborted or still-open transactions are hidden from them, so they never see duplicates or partial writes within Kafka.

External sinks (databases, APIs) must be transactional or idempotent to extend guarantees beyond Kafka.

  • read_committed filters uncommitted
  • read_uncommitted sees everything (default)
  • Exactly-once within Kafka; sinks need idempotency
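
For a sink outside Kafka, one common way to get idempotency is to upsert by a stable key so that a redelivered record overwrites itself instead of adding a row. The sketch below assumes the key is (topic, partition, offset); IdempotentSink and deliver are hypothetical names, and a real sink would be a database or API rather than a dictionary.

```python
class IdempotentSink:
    """Toy external store that upserts by a stable key, so replays overwrite."""

    def __init__(self):
        self.rows = {}
        self.writes = 0

    def upsert(self, key, value):
        self.writes += 1
        self.rows[key] = value


def deliver(records, sink):
    for topic, partition, offset, value in records:
        # (topic, partition, offset) is stable across redeliveries of the same record.
        sink.upsert((topic, partition, offset), value)


sink = IdempotentSink()
batch = [("out", 0, 0, "M0"), ("out", 0, 1, "M1")]
deliver(batch, sink)
deliver(batch[1:], sink)  # redelivery after a crash before the offset commit
print(sink.writes, len(sink.rows))  # 3 2
```

Three writes reach the sink but only two rows exist, so the duplicate delivery is absorbed at the sink rather than prevented upstream.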