Exactly-Once Semantics

TL;DR

Exactly-once semantics (EOS) in distributed messaging ensures each message is processed precisely one time, even in the presence of failures, retries, and network issues. It combines idempotent producers (deduplicating retries) with transactions (atomic multi-message operations) to eliminate duplicates while maintaining strong consistency guarantees.

Visual Overview

THREE DELIVERY GUARANTEE LEVELS:

AT-MOST-ONCE (Fire and Forget):
Producer ──▶ [Message] ──▶ Broker  [FAIL] message lost
Result: 0 or 1 deliveries (message may be lost)

AT-LEAST-ONCE (Retry until success):
Producer ──▶ [Message] ──▶ Broker [STORED] ACK lost in transit
         ──▶ [Message] ──▶ Broker [STORED] Retry on timeout (DUPLICATE!)
Result: 1+ deliveries (duplicates possible)

EXACTLY-ONCE (Idempotent + Transactional):
Producer ──▶ [Message seq=1] ──▶ Broker [STORED]
         ──▶ [Message seq=1] ──▶ Broker [IGNORED] Duplicate detected
Result: Exactly 1 delivery (no duplicates, no loss)
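
The three levels can be compared with a small in-memory simulation. This is only a sketch, not Kafka code: `ToyBroker`, `Fault`, and `DeliveryDemo.deliver` are hypothetical names, and the failure model (only the first attempt can fail, either by losing the request or by losing the ACK) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** What goes wrong on the first send attempt (hypothetical failure model). */
enum Fault { NONE, LOSE_REQUEST, LOSE_ACK }

/** Toy broker: appends to a log and optionally deduplicates by sequence number. */
class ToyBroker {
    final List<String> log = new ArrayList<>();
    private final Set<Integer> seenSeqs = new HashSet<>();
    private final boolean dedup;
    private Fault nextFault;

    ToyBroker(boolean dedup, Fault firstFault) {
        this.dedup = dedup;
        this.nextFault = firstFault;
    }

    /** Returns true if the producer receives an ACK for this attempt. */
    boolean receive(int seq, String msg) {
        Fault fault = nextFault;
        nextFault = Fault.NONE;                 // only the first attempt is faulty
        if (fault == Fault.LOSE_REQUEST) {
            return false;                       // the message never reached the broker
        }
        if (!dedup || seenSeqs.add(seq)) {
            log.add(msg);                       // stored (a repeat is skipped when dedup is on)
        }
        return fault != Fault.LOSE_ACK;         // stored, but the ACK may be lost
    }
}

class DeliveryDemo {
    /** Sends one message and returns how many copies end up in the broker log. */
    static int deliver(boolean retry, boolean dedup, Fault fault) {
        ToyBroker broker = new ToyBroker(dedup, fault);
        boolean acked = broker.receive(1, "order-123");
        while (retry && !acked) {
            acked = broker.receive(1, "order-123"); // a retry reuses the same sequence number
        }
        return broker.log.size();
    }
}
```

In this model, `deliver(false, false, Fault.LOSE_REQUEST)` corresponds to at-most-once, `deliver(true, false, Fault.LOSE_ACK)` to at-least-once, and `deliver(true, true, ...)` to the idempotent case.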

EXACTLY-ONCE COMPONENTS:
┌────────────────────────────────────────────────┐
│ 1. Idempotent Producer                        │
│    ├── Producer ID (PID)                       │
│    ├── Sequence numbers per partition          │
│    └── Broker-side deduplication               │
│                                                 │
│ 2. Transactions                                │
│    ├── Transaction coordinator                 │
│    ├── Two-phase commit protocol               │
│    ├── Atomic multi-partition writes           │
│    └── Read isolation (read_committed)         │
│                                                 │
│ 3. Zombie Fencing                              │
│    ├── Producer epochs                          │
│    ├── Fencing old producer instances          │
│    └── Preventing split-brain scenarios        │
└────────────────────────────────────────────────┘

Core Explanation

What is Exactly-Once Semantics?

Exactly-once semantics guarantees that:

Every message sent is delivered to the consumer
No message is delivered more than once
Messages are processed atomically across multiple operations

This is achieved through two mechanisms:

MECHANISM 1: IDEMPOTENT PRODUCER (Eliminates duplicate writes)

Producer Instance
├── Producer ID (PID): 12345
├── Sequence Numbers:
│   ├── Partition 0: seq=[0, 1, 2, 3, ...]
│   ├── Partition 1: seq=[0, 1, 2, 3, ...]
│   └── Partition 2: seq=[0, 1, 2, 3, ...]

Retry Scenario:
T=0:  Send msg (seq=5) ──▶ Broker [STORED]
T=1:  Network timeout, no ACK received
T=2:  Retry msg (seq=5) ──▶ Broker sees PID=12345, seq=5 already stored
                             ▶ Ignores duplicate, returns success
Result: Message stored exactly once

MECHANISM 2: TRANSACTIONS (Atomic multi-message operations)

Transaction {
  Write to topic A, partition 0
  Write to topic B, partition 2
  Write to topic C, partition 1
} ──▶ All succeed OR all fail atomically

Consumer with isolation.level=read_committed:
  ├── Sees only committed transactions
  ├── Never sees partial/aborted transactions
  └── Guaranteed consistent view
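
To make the read_committed behavior concrete, here is a minimal sketch of how a reader could decide which records are visible. It assumes each partition log contains data records plus COMMIT/ABORT control markers (the two-phase commit section describes these markers). `ReadCommittedView`, `Entry`, and `visible` are hypothetical names, and using one string id per transaction is a simplification.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ReadCommittedView {
    enum Kind { DATA, COMMIT, ABORT }

    /** One log entry: a data record or a control marker, tagged with its transaction id. */
    static final class Entry {
        final String txnId;
        final Kind kind;
        final String value; // null for control markers

        Entry(String txnId, Kind kind, String value) {
            this.txnId = txnId;
            this.kind = kind;
            this.value = value;
        }
    }

    /** Returns the data values a read_committed reader would see, in log order. */
    static List<String> visible(List<Entry> log) {
        Set<String> committed = new HashSet<>();
        Set<String> finished = new HashSet<>();
        for (Entry e : log) {
            if (e.kind == Kind.COMMIT) {
                committed.add(e.txnId);
                finished.add(e.txnId);
            } else if (e.kind == Kind.ABORT) {
                finished.add(e.txnId);
            }
        }
        List<String> out = new ArrayList<>();
        for (Entry e : log) {
            if (e.kind != Kind.DATA) {
                continue;               // markers are never handed to the application
            }
            if (!finished.contains(e.txnId)) {
                break;                  // first record of a still-open transaction: stop here
                                        // (Kafka calls this boundary the last stable offset)
            }
            if (committed.contains(e.txnId)) {
                out.add(e.value);       // aborted records are skipped
            }
        }
        return out;
    }
}
```

The early `break` reflects that a reader in this mode does not run ahead of an open transaction, so ordering within the partition is preserved.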

How Idempotent Producers Work

Producer ID and Sequence Numbers:

PRODUCER INITIALIZATION:

1. Producer starts up
2. Requests Producer ID (PID) from broker
3. Broker assigns unique PID: 12345
4. Producer maintains sequence counters per partition:

   Partition 0: next_seq = 0
   Partition 1: next_seq = 0
   Partition 2: next_seq = 0

SENDING MESSAGES:

Producer.send(topic="orders", partition=0, msg="order-123")
  ├── Attach PID=12345
  ├── Attach seq=0 (for partition 0)
  ├── Increment partition 0 seq to 1
  └── Send to broker

Broker receives (PID=12345, partition=0, seq=0):
  ├── Check: Is this a duplicate?
  ├── Last seq for (PID=12345, partition=0) = -1 (no previous)
  ├── Accept: seq 0 == last seq + 1, this is the next expected message
  ├── Store message
  └── Update last seq to 0

Producer.send(topic="orders", partition=0, msg="order-456")
  ├── Attach PID=12345
  ├── Attach seq=1 (incremented)
  └── Send to broker

RETRY SCENARIO (Network failure):

Producer.send(topic="orders", partition=0, msg="order-789")
  ├── Send with seq=2
  ├── Broker stores it
  ├── ACK packet lost in network [FAIL]
  └── Producer doesn't receive ACK

Producer retries:
  ├── Resend with same seq=2 (didn't increment)
  └── Send to broker

Broker receives (PID=12345, partition=0, seq=2):
  ├── Check: Last seq = 2 (already stored)
  ├── Detect duplicate: seq=2 <= last seq 2
  ├── Return success ACK without appending (idempotent)
  └── No duplicate stored!
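
The broker-side check walked through above can be sketched in a few lines. This is a simplified model under stated assumptions, not Kafka's implementation: `SequenceTracker`, `append`, and `Result` are hypothetical names, one sequence number stands for one message, and the real broker bookkeeping (which works on record batches) is more involved.

```java
import java.util.HashMap;
import java.util.Map;

class SequenceTracker {
    enum Result { STORED, DUPLICATE, OUT_OF_ORDER }

    // Last accepted sequence per "pid:partition"; a missing key means nothing stored yet.
    private final Map<String, Integer> lastSeq = new HashMap<>();

    /** Decides what to do with a message carrying (pid, partition, seq). */
    Result append(long pid, int partition, int seq) {
        String key = pid + ":" + partition;
        int last = lastSeq.getOrDefault(key, -1);
        if (seq <= last) {
            return Result.DUPLICATE;      // already stored: acknowledge, do not append again
        }
        if (seq != last + 1) {
            return Result.OUT_OF_ORDER;   // a gap means an earlier message is missing
        }
        lastSeq.put(key, seq);
        return Result.STORED;
    }
}
```

Treating a gap as an error, rather than accepting any larger sequence number, is what lets the broker preserve per-partition ordering as well as avoid duplicates.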

Configuration:

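Properties props = new Properties();

// Placeholder connection and serializer settings (adjust for your cluster)
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

// Enable idempotency (requires acks=all, retries > 0, max.in.flight <= 5)
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);

KafkaProducer<String, String> producer = new KafkaProducer<>(props);

// Settings that idempotence relies on (also the client defaults since Kafka 3.0):
// - acks=all (wait for the in-sync replicas)
// - retries=Integer.MAX_VALUE (retry until delivery.timeout.ms expires)
// - max.in.flight.requests.per.connection=5 (at most 5 pipelined requests)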

How Transactions Work

Transaction Coordinator Architecture:

TRANSACTION COORDINATOR SYSTEM:

__transaction_state Topic (Internal, 50 partitions)
┌──────────────────────────────────────────────────┐
│ Stores transaction metadata:                     │
│ - transactional.id → Producer ID mapping         │
│ - Transaction status (ONGOING/COMMITTED/ABORTED) │
│ - Partitions involved in transaction             │
│ - Producer epochs (for zombie fencing)           │
└──────────────────────────────────────────────────┘

Transaction Flow:
┌─────────────────────────────────────────────────┐
│ 1. Producer: initTransactions()                 │
│    ├── Request Producer ID                       │
│    ├── Increment epoch (fence old producers)     │
│    └── Get transaction coordinator assignment    │
│                                                   │
│ 2. Producer: beginTransaction()                  │
│    └── Mark transaction as ONGOING locally        │
│                                                   │
│ 3. Producer: send() messages                     │
│    ├── Send to partition leaders                 │
│    └── Coordinator tracks partitions involved    │
│                                                   │
│ 4. Producer: commitTransaction()                 │
│    ├── Write PREPARE_COMMIT to __transaction_state│
│    ├── Send COMMIT markers to all partitions     │
│    ├── Wait for partition ACKs                   │
│    ├── Write COMPLETE_COMMIT                     │
│    └── Transaction complete [SUCCESS]            │
└─────────────────────────────────────────────────┘

Two-Phase Commit Protocol:

TWO-PHASE COMMIT FLOW:

PHASE 1: PREPARE
────────────────────────────────────────────────
Producer ──▶ Transaction Coordinator
             "I want to commit transaction X"

Coordinator:
  1. Validate transaction state (must be ONGOING)
  2. Write PREPARE_COMMIT to __transaction_state
  3. Identify all partitions in transaction:
     - topic-A, partition-0
     - topic-B, partition-2
     - topic-C, partition-1

PHASE 2: COMMIT
────────────────────────────────────────────────
Coordinator ──▶ Partition Leaders
                "Write COMMIT markers"

Partition Leaders:
  topic-A, partition-0: [msg1][msg2][COMMIT_MARKER]
  topic-B, partition-2: [msg3][COMMIT_MARKER]
  topic-C, partition-1: [msg4][msg5][COMMIT_MARKER]
                                ↑
                        Transaction boundary marker

Coordinator receives all ACKs:
  1. Write COMPLETE_COMMIT to __transaction_state
  2. Transaction is now durable and visible
  3. Consumers with read_committed see all messages

ABORT SCENARIO:
────────────────────────────────────────────────
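If the producer calls abortTransaction(), or the transaction times out, before PREPARE_COMMIT is written:
  1. Coordinator writes PREPARE_ABORT
  2. Send ABORT markers to all partitions
  3. Write COMPLETE_ABORT
  4. Consumers with read_committed never see aborted messages
Once PREPARE_COMMIT is logged, the decision is final: the coordinator retries the marker writes until they succeed.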
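
The coordinator's bookkeeping can be viewed as a small state machine. The sketch below is a hypothetical model, not the coordinator's actual code: `TxnStateMachine` and `transitionTo` are made-up names, `EMPTY` is an assumed initial state, and the transition table encodes the usual two-phase-commit rule that a logged commit decision cannot be turned into an abort.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

class TxnStateMachine {
    enum State { EMPTY, ONGOING, PREPARE_COMMIT, PREPARE_ABORT, COMPLETE_COMMIT, COMPLETE_ABORT }

    // Allowed next states for each state (simplified).
    private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);
    static {
        ALLOWED.put(State.EMPTY, EnumSet.of(State.ONGOING));
        ALLOWED.put(State.ONGOING, EnumSet.of(State.PREPARE_COMMIT, State.PREPARE_ABORT));
        ALLOWED.put(State.PREPARE_COMMIT, EnumSet.of(State.COMPLETE_COMMIT));
        ALLOWED.put(State.PREPARE_ABORT, EnumSet.of(State.COMPLETE_ABORT));
        ALLOWED.put(State.COMPLETE_COMMIT, EnumSet.of(State.ONGOING)); // next transaction
        ALLOWED.put(State.COMPLETE_ABORT, EnumSet.of(State.ONGOING));  // next transaction
    }

    private State state = State.EMPTY;

    State state() {
        return state;
    }

    /** Moves to the next state, or throws if the protocol does not allow it. */
    void transitionTo(State next) {
        if (!ALLOWED.get(state).contains(next)) {
            throw new IllegalStateException(state + " -> " + next + " is not allowed");
        }
        state = next;
    }
}
```

Persisting each transition before acting on it is what allows a replacement coordinator to pick up a half-finished commit and drive it to completion.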

Code Example:

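import java.time.Duration;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.AuthorizationException;
import org.apache.kafka.common.errors.OutOfOrderSequenceException;
import org.apache.kafka.common.errors.ProducerFencedException;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class ExactlyOnceProcessor {
    private final KafkaProducer<String, String> producer;
    private final KafkaConsumer<String, String> consumer;

    public ExactlyOnceProcessor() {
        // Producer setup ("localhost:9092" is a placeholder address)
        Properties producerProps = new Properties();
        producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
            StringSerializer.class.getName());
        producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
            StringSerializer.class.getName());
        producerProps.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG,
            "payment-processor-1");
        producerProps.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);

        producer = new KafkaProducer<>(producerProps);
        producer.initTransactions(); // Register transactional.id, fence older instances

        // Consumer setup
        Properties consumerProps = new Properties();
        consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
            StringDeserializer.class.getName());
        consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
            StringDeserializer.class.getName());
        consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "payment-group");
        consumerProps.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
        consumerProps.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);

        consumer = new KafkaConsumer<>(consumerProps);
    }

    public void processExactlyOnce() {
        consumer.subscribe(Arrays.asList("input-topic"));

        while (true) {
            ConsumerRecords<String, String> records =
                consumer.poll(Duration.ofMillis(1000));

            if (!records.isEmpty()) {
                // Start transaction
                producer.beginTransaction();

                try {
                    // Process records and produce outputs
                    for (ConsumerRecord<String, String> record : records) {
                        String result = processRecord(record);

                        producer.send(new ProducerRecord<>(
                            "output-topic",
                            record.key(),
                            result
                        ));
                    }

                    // Commit offsets as part of transaction
                    Map<TopicPartition, OffsetAndMetadata> offsets =
                        getOffsetsToCommit(records);

                    producer.sendOffsetsToTransaction(
                        offsets,
                        consumer.groupMetadata()
                    );

                    // Atomic commit: outputs + offsets together
                    producer.commitTransaction();

                    // Exactly-once guarantee:
                    // - Input offsets advance only if the transaction commits
                    // - Output becomes visible only if the transaction commits
                    // - Both atomic (all or nothing)

                } catch (ProducerFencedException | OutOfOrderSequenceException
                         | AuthorizationException e) {
                    // Fatal errors: this producer cannot continue, so close
                    // it instead of aborting
                    producer.close();
                    consumer.close();
                    throw e;
                } catch (Exception e) {
                    // Abort transaction on any recoverable failure
                    producer.abortTransaction();

                    // Offsets NOT committed; rewind so the next poll()
                    // re-reads this batch
                    rewindToBatchStart(records);
                    // Output messages NOT visible to read_committed consumers
                }
            }
        }
    }

    // Placeholder business logic (hypothetical): replace with real processing
    private String processRecord(ConsumerRecord<String, String> record) {
        return record.value();
    }

    // Offset to commit per partition = last processed offset + 1
    private Map<TopicPartition, OffsetAndMetadata> getOffsetsToCommit(
            ConsumerRecords<String, String> records) {
        Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
        for (TopicPartition tp : records.partitions()) {
            List<ConsumerRecord<String, String>> batch = records.records(tp);
            long lastOffset = batch.get(batch.size() - 1).offset();
            offsets.put(tp, new OffsetAndMetadata(lastOffset + 1));
        }
        return offsets;
    }

    // Move each partition back to the first record of the failed batch
    private void rewindToBatchStart(ConsumerRecords<String, String> records) {
        for (TopicPartition tp : records.partitions()) {
            consumer.seek(tp, records.records(tp).get(0).offset());
        }
    }
}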

Zombie Producer Fencing

The Zombie Problem:

SCENARIO: Producer appears to fail, new instance starts

Timeline:
T=0:   Producer-A (PID=123, epoch=5) is running
T=10:  Network partition, Producer-A isolated
T=20:  Application restarts Producer-B (same transactional.id)
T=21:  Producer-B gets (PID=123, epoch=6) ← Epoch incremented
T=30:  Network heals, Producer-A reconnects (zombie!)

Without Fencing:
  Producer-A: Writes with epoch=5
  Producer-B: Writes with epoch=6
  Result: Both write, duplicates! [PROBLEM]

With Fencing:
  Producer-A: Sends request with epoch=5
  Broker: Current epoch=6, reject with INVALID_PRODUCER_EPOCH
  Producer-A: Permanently fenced, stops writes [FENCED]
  Producer-B: Only valid producer, no duplicates

How Epochs Work:

// Automatic epoch management

// Producer 1 (original)
producer1.initTransactions(); // Gets epoch=5
producer1.beginTransaction();
producer1.send(record);
// ... network partition ...

// Producer 2 (new instance, same transactional.id)
producer2.initTransactions(); // Gets epoch=6 (incremented)
producer2.beginTransaction();
producer2.send(record);
producer2.commitTransaction(); // Succeeds [OK]

// Producer 1 (zombie, network recovers)
producer1.send(record);
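// Broker rejects: epoch=5 < current epoch=6
// The failure surfaces as ProducerFencedException (from the send's Future
// or from the next transactional call such as commitTransaction())
// Producer 1 cannot write anymore and must be closed [FENCED]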
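
The epoch check itself is simple enough to model in memory. The following is a hedged sketch of the idea rather than broker code: `EpochFence`, `initProducer`, and `acceptWrite` are hypothetical names, and epochs start at 0 here purely for convenience.

```java
import java.util.HashMap;
import java.util.Map;

class EpochFence {
    // Current epoch per transactional.id, as tracked on the coordinator side.
    private final Map<String, Integer> currentEpoch = new HashMap<>();

    /** Called when a producer instance initializes: bumps and returns the epoch. */
    int initProducer(String transactionalId) {
        // First registration gets epoch 0; every later one gets previous + 1.
        return currentEpoch.merge(transactionalId, 0, (previous, unused) -> previous + 1);
    }

    /** Returns true if a write carrying this epoch is accepted, false if the sender is fenced. */
    boolean acceptWrite(String transactionalId, int epoch) {
        Integer current = currentEpoch.get(transactionalId);
        return current != null && epoch == current;
    }
}
```

With this model, a second call to `initProducer` for the same id invalidates every write from the instance that still holds the older epoch, which is the fencing behavior shown in the timeline.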

Performance Impact

Throughput and Latency Tradeoffs:

BENCHMARK COMPARISON (Same hardware, same workload):

Configuration 1: Non-Idempotent Producer (acks=1)
├── Throughput: 500K msg/sec
├── Latency p99: 2ms
├── CPU: Baseline
└── Guarantee: At-least-once with retries (duplicates possible; acknowledged writes can be lost on leader failover)

Configuration 2: Idempotent Producer (acks=all)
├── Throughput: 450K msg/sec (-10%)
├── Latency p99: 4ms (2x)
├── CPU: +5%
└── Guarantee: No duplicates from retries (per partition, within one producer session)

Configuration 3: Full Transactions (acks=all + transactions)
├── Throughput: 300K msg/sec (-40%)
├── Latency p99: 8ms (4x)
├── CPU: +25%
└── Guarantee: Exactly-once end-to-end (atomic operations)

OVERHEAD SOURCES:
├── Two-phase commit coordination (~3ms)
├── Transaction state writes (~2ms)
├── Producer ID & epoch tracking (~1ms)
├── Commit markers to partitions (~2ms)
└── Coordinator communication (~2ms)

Optimization Strategies:

// Optimize transactional performance

Properties props = new Properties();

// Batch more aggressively
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 65536);     // 64 KB
props.put(ProducerConfig.LINGER_MS_CONFIG, 50);         // 50ms wait

// Larger buffer for transactions
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 134217728); // 128 MB

// Compression (helps with large transactions)
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");

// Result: Can achieve ~60% of non-transactional throughput
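
Batching helps because the commit-related costs are paid once per transaction rather than once per message. The sketch below does that arithmetic under a loud assumption: it treats the overhead figures listed under OVERHEAD SOURCES as additive, fixed, per-commit costs (3 + 2 + 1 + 2 + 2 = 10 ms), which is a simplification for illustration and not a measurement. `TxnOverhead` and `perMessageMs` are hypothetical names.

```java
class TxnOverhead {
    /** Assumed fixed cost per commit in ms: the listed sources summed (3 + 2 + 1 + 2 + 2). */
    static final double PER_COMMIT_MS = 3 + 2 + 1 + 2 + 2;

    /** Per-message share of the commit overhead when one transaction carries n messages. */
    static double perMessageMs(int messagesPerTxn) {
        if (messagesPerTxn <= 0) {
            throw new IllegalArgumentException("messagesPerTxn must be positive");
        }
        return PER_COMMIT_MS / messagesPerTxn;
    }
}
```

Under these assumptions a transaction of 1 message carries the full 10 ms, while a transaction of 1,000 messages carries 0.01 ms per message, which is why larger batches and a longer linger time narrow the throughput gap.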

Tradeoffs

Advantages:

Eliminates duplicates completely
Atomic multi-message operations
Strong consistency guarantees
Automatic zombie producer fencing
Simplifies application logic (no deduplication needed)

Disadvantages:

40-60% throughput reduction
2-4x latency increase
Higher CPU and memory usage
More complex operational model
Limited to Kafka-to-Kafka operations (side effects in external systems need their own idempotency)

Real Systems Using This

Apache Kafka

Implementation: Idempotent producers + transactional API
Scale: LinkedIn runs Kafka at trillions of messages per day
Performance: ~300K msg/sec with transactions (production workloads)
Use case: Financial transactions, order processing, exactly-once ETL

Apache Flink

Implementation: Two-phase commit with checkpoints
Integration: Uses Kafka transactions for exactly-once sinks
Scale: Processes 1+ trillion events per day at Alibaba
Use case: Real-time analytics with exactly-once guarantees

Google Cloud Dataflow

Implementation: Exactly-once processing with idempotent writes
Guarantee: End-to-end exactly-once in streaming pipelines
Scale: Petabyte-scale data processing
Use case: Financial reporting, regulatory compliance

When to Use Exactly-Once Semantics

Perfect Use Cases

Financial Transactions

Scenario: Payment processing, money transfers
Why EOS: Double charges unacceptable, regulatory requirements
Config: acks=all, transactions, read_committed
Example: Stripe, Square, PayPal payment systems

Order Processing

Scenario: E-commerce order fulfillment
Why EOS: Duplicate orders = angry customers + loss
Config: Full transactions with offset commits
Example: Amazon order pipeline, Shopify checkouts

Database Change Data Capture (CDC)

Scenario: Replicating database changes to data warehouse
Why EOS: Duplicate records corrupt analytics
Config: Transactional producer with idempotent writes
Example: Debezium → Kafka → Snowflake pipelines

Audit and Compliance Logs

Scenario: Financial audit trails, healthcare records
Why EOS: Legal requirement for accurate records
Config: acks=all, transactions, long retention
Example: Banking transaction logs, HIPAA-compliant systems

When NOT to Use

High-Volume Metrics/Logs

Problem: A 40-60% throughput hit is unacceptable for logs
Alternative: At-least-once + application-level dedup
Example: Observability data, clickstream analytics

Performance-Critical Real-Time Systems

Problem: 4x latency increase breaks SLA
Alternative: At-least-once with dedup cache
Example: Ad bidding, real-time recommendations

Idempotent Consumers

Problem: Consumer already handles duplicates
Alternative: No need for EOS overhead
Example: Upserts keyed by a unique ID, or setting a field to an absolute value (incrementing a counter is NOT idempotent)
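
The "at-least-once plus dedup cache" alternative mentioned above can be sketched as a bounded set of recently seen message ids. This is an illustrative sketch with assumptions: `DedupCache` and `firstTime` are hypothetical names, each message is assumed to carry a unique id, and a bounded cache only catches duplicates that arrive while the original id is still cached.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DedupCache {
    private final Map<String, Boolean> seen;

    DedupCache(final int capacity) {
        // Access-ordered LinkedHashMap that evicts the least recently used id once full.
        this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns true the first time an id is seen (process it), false for a repeat (skip it). */
    boolean firstTime(String messageId) {
        if (seen.get(messageId) != null) {
            return false;
        }
        seen.put(messageId, Boolean.TRUE);
        return true;
    }
}
```

A consumer would call `firstTime(id)` before applying a message and skip the message when it returns false. The cache capacity should cover the longest window in which a redelivery can realistically occur.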

Interview Application

Common Interview Question 1

Q: “Explain how Kafka achieves exactly-once semantics and why it’s difficult in distributed systems.”

Strong Answer:

“Exactly-once is challenging because distributed systems face network failures, crashes, and retries that naturally cause duplicates. Kafka solves this with three mechanisms:

1. Idempotent Producers (Eliminates retry duplicates):

Producer gets unique Producer ID (PID) from broker
Each message tagged with sequence number per partition
Broker tracks (PID, partition) → last sequence seen
Duplicate retries: Same sequence number → Broker ignores, returns success
Example: Send seq=5 → timeout → retry seq=5 → Broker: ‘Already have seq=5, ignoring’

2. Transactions (Atomic multi-message operations):

Transaction coordinator manages distributed commit via two-phase protocol
Phase 1: PREPARE_COMMIT written to __transaction_state topic
Phase 2: COMMIT markers written to all involved partitions
Consumers with read_committed see only completed transactions
All messages in transaction visible atomically, or none at all

3. Zombie Fencing (Prevents split-brain):

Each producer instance gets incrementing epoch number
Old instance (zombie) has stale epoch → Broker rejects writes
Guarantees only one active producer per transactional ID

Why it’s hard:

Requires coordination across multiple brokers
Two-phase commit adds latency (2-4x increase)
Need to track sequence numbers per (producer, partition)
Zombie producer detection and fencing complexity

Tradeoff: 40% throughput reduction for guaranteed consistency. Worth it for financial transactions, not worth it for logs.”

Why this is good:

Explains all three mechanisms clearly
Provides concrete examples
Explains why it’s difficult
Quantifies performance impact
Shows decision-making on when to use

Common Interview Question 2

Q: “Design a payment processing system that guarantees no duplicate charges, even if services restart or experience network failures.”