Mechanisms by which message producers receive confirmation that their messages were successfully persisted, enabling reliability tradeoffs between latency and durability
TL;DR
Producer acknowledgments (acks) control when Kafka considers a message successfully written. Options are acks=0 (no confirmation), acks=1 (leader confirms), and acks=all (all in-sync replicas confirm), trading latency against durability guarantees. Critical for balancing performance against data safety in message brokers.
Visual Overview
ACKS = 0 (Fire and Forget)
┌──────────────────────────────────────────────────
│ Producer ──message──▶ Kafka Leader
│                       (don't wait)
│ Immediate return ✓
│ Latency: <1ms
│
│ Risk: Message may be lost if:
│   - Network failure before reaching leader
│   - Leader crashes before writing to disk
│   - Leader crashes before replication
│
│ Use case: Metrics, logs (lossy OK)
└──────────────────────────────────────────────────
ACKS = 1 (Leader Acknowledgment)
┌──────────────────────────────────────────────────
│ Producer ──message──▶ Kafka Leader
│                           │
│                      Write to log
│                           │
│ Producer ◀──── ACK ───────┘
│ Latency: 5-10ms
│
│ Meanwhile (async):
│   Leader ──replicate──▶ Follower 1
│   Leader ──replicate──▶ Follower 2
│
│ Risk: Message lost if leader crashes
│       before replication completes
│
│ Use case: Most production workloads (default)
└──────────────────────────────────────────────────
ACKS = ALL (Full Quorum)
┌──────────────────────────────────────────────────
│ Producer ──message──▶ Kafka Leader
│                           │
│                      Write to log
│                           │
│            Replicate to all ISR replicas
│                           │
│            Follower 1: written ✓
│            Follower 2: written ✓
│                           │
│ Producer ◀──── ACK ───────┘
│ Latency: 10-50ms (network + replication)
│
│ Risk: Effectively none
│       (unless all ISR replicas fail simultaneously)
│
│ Use case: Financial transactions, orders
└──────────────────────────────────────────────────
TIMELINE COMPARISON:
┌──────────────────────────────────────────────────
│ acks=0:
│   T0:  Send
│   T1:  Return (1ms) ✓
│
│ acks=1:
│   T0:  Send
│   T5:  Leader writes
│   T10: Return (10ms) ✓
│
│ acks=all:
│   T0:  Send
│   T5:  Leader writes
│   T15: Follower 1 writes
│   T20: Follower 2 writes
│   T25: Return (25ms) ✓
└──────────────────────────────────────────────────
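The timeline comparison can be played with in a toy model. Everything here is illustrative: the hop latencies are invented round numbers, and real clusters replicate via follower fetches with variable lag.

```javascript
// Toy model of producer-observed latency for each ack level.
// leaderMs: producer -> leader round trip; followerLagsMs: per-follower
// replication delays (all numbers illustrative, not measurements).
function ackLatency(acks, leaderMs, followerLagsMs) {
  if (acks === 0) return 1; // returns as soon as the message is handed off
  if (acks === 1) return leaderMs; // wait for the leader's append + ACK
  // acks=all (-1): also wait for the slowest in-sync follower
  return leaderMs + Math.max(...followerLagsMs);
}

console.log(ackLatency(0, 10, [5, 15]));  // 1   (fire and forget)
console.log(ackLatency(1, 10, [5, 15]));  // 10  (leader only)
console.log(ackLatency(-1, 10, [5, 15])); // 25  (slowest follower at +15ms)
```

Note how acks=all is bounded by the slowest member of the ISR, which is why a single lagging follower inflates tail latency for every producer.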
Core Explanation
What are Producer Acknowledgments?
Producer acknowledgments (acks) control when a Kafka producer considers a write operation successful. This determines:
- When producer receives confirmation that message is safe
- How many replicas must persist the message
- Trade-off between latency and durability
Three Levels:
- acks=0: No acknowledgment (fire-and-forget)
- acks=1: Leader acknowledgment (the long-time default; Kafka 3.0+ producers default to acks=all)
- acks=all: Full ISR acknowledgment (safest)
acks=0: No Acknowledgment
Behavior:
Producer sends message, immediately considers it sent
Leader receives message (maybe)
No confirmation sent back
Result:
- Highest throughput (no waiting)
- Lowest latency (<1ms)
- Zero durability guarantee
When Message Can Be Lost:
1. Network failure before reaching broker
   Producer → [network drops packet] → Leader (never arrives)
2. Leader crash before writing to disk
   Producer → Leader (in memory) → [crash] ✗
3. Leader crash before replication
   Producer → Leader (written) → [crash before replicating] ✗
Probability of loss: workload- and cluster-dependent; anything in flight during a network blip or broker crash is simply gone.
Configuration:
// In kafkajs, acks and compression are per-send() options,
// not producer constructor options:
const producer = kafka.producer();
await producer.send({
  topic,
  messages,
  acks: 0, // no acknowledgment
  compression: CompressionTypes.GZIP, // often paired with acks=0 for max throughput
});
Use Cases:
✅ Log aggregation (OK to lose some logs)
✅ Metrics collection (OK to lose some data points)
✅ IoT sensor data (high volume, natural redundancy)
✅ Clickstream tracking (lossy acceptable)
❌ Financial transactions
❌ User-facing data (messages, posts)
❌ Critical business events
acks=1: Leader Acknowledgment
Behavior:
Producer sends message
Leader writes to local log (durable on leader disk)
Leader sends ACK to producer
Producer considers message sent ✓
Meanwhile (asynchronous):
Leader replicates to followers (background)
Result:
- Good throughput
- Moderate latency (5-10ms)
- Durability: Survives producer/network failure
- Risk: Lost if leader fails before replication
When Message Can Be Lost:
Scenario: Leader fails before replication
T0: Producer → Leader (message written to leader)
T1: Leader → ACK → Producer ✓
T2: Producer moves on
T3: Leader crashes ⚡ (before replicating)
T4: Follower promoted to new leader
T5: Message is GONE ✗ (was only on the failed leader)
Probability: low; only messages acked inside the replication window when a leader dies
Window of vulnerability: the follower fetch lag, typically tens to hundreds of milliseconds
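The vulnerability window can be made concrete with a small simulation. This is a sketch, not kafkajs API: it flags messages that the leader acked but had not yet replicated when it died, and the timestamps and lag figure are invented.

```javascript
// Under acks=1, a message is lost if the leader acked it but crashed before
// the followers fetched it. Model replication as a fixed lag after the ack.
function lostUnderAcksOne(messages, crashAtMs, replicationLagMs) {
  return messages.filter(
    (m) => m.ackedAt <= crashAtMs && m.ackedAt + replicationLagMs > crashAtMs
  );
}

const acked = [
  { id: "a", ackedAt: 100 }, // replicated long before the crash -> safe
  { id: "b", ackedAt: 480 }, // acked, still inside the lag window -> LOST
  { id: "c", ackedAt: 520 }, // never acked; the producer retries it -> not lost
];

console.log(lostUnderAcksOne(acked, 500, 50).map((m) => m.id)); // ["b"]
```

Only message "b" is silently lost: the producer saw a successful ack and moved on, so no retry ever happens for it.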
Configuration:
const producer = kafka.producer({
  retry: {
    retries: 3, // retry on failure
  },
});
// acks and timeout are per-send() options in kafkajs:
await producer.send({
  topic,
  messages,
  acks: 1,        // leader acknowledgment
  timeout: 30000, // 30s ack timeout
});
Use Cases:
✅ Most production workloads (default choice)
✅ High-throughput messaging
✅ Real-time analytics
✅ Event streaming
Balance between performance and safety
acks=all: Full ISR Acknowledgment
Behavior:
Producer sends message
Leader writes to local log
Leader waits for ALL in-sync replicas (ISR) to acknowledge
All ISR replicas write to their logs
Leader sends ACK to producer
Producer considers message sent ✓
Result:
- Lower throughput
- Higher latency (10-50ms)
- Maximum durability
- Message replicated before acknowledgment
In-Sync Replicas (ISR):
ISR = Set of replicas that are "caught up" with leader
Example:
- Leader: Broker 1
- Followers: Broker 2 (in sync), Broker 3 (lagging)
- ISR = {Broker 1, Broker 2}
acks=all waits for: Broker 1 + Broker 2
If Broker 2 falls behind (network issue):
ISR = {Broker 1} (just leader)
acks=all waits for: Broker 1 only (no followers!)
This is why min.insync.replicas is critical!
min.insync.replicas:
Configuration: Minimum ISR size required for writes
min.insync.replicas=2 (recommended for acks=all)
- Requires at least 2 replicas in ISR
- If ISR shrinks to 1, producer gets error
- Prevents data loss when only leader is alive
Example with 3 replicas:
┌──────────────────────────────────────────────────
│ Normal: ISR = {Leader, Follower1, Follower2}
│   acks=all waits for all three; the ack is sent
│   once every ISR member has the message
│
│ Follower1 fails: ISR = {Leader, Follower2}
│   acks=all waits for Leader + Follower2 ✓
│
│ Follower2 also fails: ISR = {Leader}
│   acks=all REJECTS writes ✗
│   (ISR size 1 < min.insync.replicas 2)
└──────────────────────────────────────────────────
Protection: Cannot lose data if leader fails,
because message is on at least 2 replicas
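The broker-side decision can be sketched in a few lines. This is a simplification of what Kafka actually does (followers fetch asynchronously and the leader tracks each one's lag against `replica.lag.time.max.ms`); the replica objects and helper names are invented for illustration.

```javascript
// A follower stays in the ISR while its lag is within replica.lag.time.max.ms
// (broker default: 30000). acks=all writes are accepted only while the ISR
// has at least min.insync.replicas members.
function inSyncReplicas(replicas, maxLagMs = 30000) {
  return replicas.filter((r) => r.lagMs <= maxLagMs);
}

function acceptsWrite(replicas, minInsyncReplicas, maxLagMs = 30000) {
  return inSyncReplicas(replicas, maxLagMs).length >= minInsyncReplicas;
}

const replicas = [
  { broker: 1, lagMs: 0 },     // leader
  { broker: 2, lagMs: 120 },   // caught-up follower
  { broker: 3, lagMs: 45000 }, // lagging follower, falls out of the ISR
];

console.log(inSyncReplicas(replicas).map((r) => r.broker)); // [1, 2]
console.log(acceptsWrite(replicas, 2));      // true  (ISR size 2 >= 2)
console.log(acceptsWrite([replicas[0]], 2)); // false (NOT_ENOUGH_REPLICAS)
```

The last case is exactly the "ISR shrinks to just the leader" scenario above: the broker chooses to fail the write rather than ack something that lives on a single machine.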
Configuration:
const producer = kafka.producer({
  retry: {
    retries: 5,
  },
});
await producer.send({
  topic,
  messages,
  acks: -1,       // -1 means "all" (wait for the full ISR)
  timeout: 30000,
});
// Topic/broker configuration (set server-side, not in kafkajs):
//   min.insync.replicas=2   at least 2 replicas must ack
//   replication.factor=3    total of 3 replicas
Use Cases:
✅ Financial transactions
✅ E-commerce orders
✅ User-generated content (posts, messages)
✅ Critical business events
✅ Regulatory/compliance data
Anywhere data loss is unacceptable
Real Systems Using Producer Acks
| System | Default acks | Typical Config | Rationale |
|---|---|---|---|
| Kafka Streams | acks=all | acks=all, min.insync.replicas=2 | State stores require durability |
| Netflix (Keystone) | acks=1 | acks=1, replication=3 | High throughput, tolerate rare loss |
| | acks=all | acks=all, min.insync.replicas=2 | Business-critical events |
| Uber | acks=1 | acks=1 (logs), acks=all (trips) | Mixed based on data criticality |
| Confluent Cloud | acks=all | acks=all, min.insync.replicas=2 | Default for safety |
Case Study: Kafka at LinkedIn
LinkedIn's Kafka usage (origin of Kafka):
- 100+ billion messages/day
- 1000s of topics
- Multi-datacenter deployment
Acknowledgment Strategy:
┌──────────────────────────────────────────────────
│ Critical data (jobs, connections):
│   - acks=all
│   - min.insync.replicas=2
│   - replication.factor=3
│   → Latency: 20-30ms
│   → Zero data loss
│
│ Metrics/logs (high volume):
│   - acks=1
│   - replication.factor=2
│   → Latency: 5-10ms
│   → Acceptable loss rate: <0.1%
│
│ Analytics events (ultra-high volume):
│   - acks=0
│   - compression=gzip
│   → Latency: 1-2ms
│   → Loss rate: 1-2% (acceptable)
└──────────────────────────────────────────────────
Lesson: Different acks for different data criticality
When to Use Each Ack Level
acks=0: Fire and Forget
Use When:
✅ High throughput required (100k+ msg/sec)
✅ Data loss is acceptable (logs, metrics)
✅ Data has natural redundancy (sensor arrays)
✅ Ultra-low latency required (<1ms)
Example: IoT sensor network
- 1000 sensors sending data every second
- If 1% of readings lost, still have 99%
- Aggregate statistics still accurate
acks=1: Leader Only
Use When:
✅ Good balance of performance and safety
✅ Occasional loss acceptable during failures
✅ High throughput with moderate durability
✅ Default choice for most workloads
Example: User activity tracking
- Click events, page views, etc.
- Occasional loss during broker failure OK
- Still maintain 99%+ delivery
acks=all: Full Replication
Use When:
✅ Zero data loss required
✅ Regulatory/compliance requirements
✅ Financial or critical business data
✅ Can tolerate higher latency (10-50ms)
Example: E-commerce order placement
- User places order (creates Kafka event)
- Order must not be lost
- OK to wait 20-30ms for full replication
- Worth latency cost for safety
Hybrid Approach
Different Topics, Different Acks:
// In kafkajs, acks is a per-send() option, so a single producer can use a
// different level for each topic:
const producer = kafka.producer();

// Critical orders: acks=all
await producer.send({ topic: "orders", messages: orderMessages, acks: -1, timeout: 30000 });

// Analytics events: acks=1
await producer.send({ topic: "analytics", messages: analyticsMessages, acks: 1, timeout: 10000 });

// Metrics: acks=0
await producer.send({
  topic: "metrics",
  messages: metricMessages,
  acks: 0,
  compression: CompressionTypes.GZIP,
});
Interview Application
Common Interview Question
Q: "How would you ensure zero data loss in a Kafka-based order processing system?"
Strong Answer:
"To ensure zero data loss for orders, I'd configure producers with acks=all and proper ISR settings:
Producer Configuration:
- acks=all (or acks=-1)
- min.insync.replicas=2
- replication.factor=3
- retries=MAX_INT (effectively infinite retries)
- enable.idempotence=true (so retries cannot create duplicates)
- max.in.flight.requests.per.connection=1 (strict ordering; with idempotence, up to 5 in flight is still safe)
How This Prevents Loss:
- acks=all: Producer waits for full replication before considering write successful
- min.insync.replicas=2: Requires at least 2 replicas (leader + 1 follower) to acknowledge
- replication.factor=3: Total of 3 copies across brokers
- Result: Message on ≥2 replicas before ACK
Failure Scenarios:
- Network failure: Producer retries until successful
- Leader failure: Message already on follower (promoted to new leader)
- Follower failure: Still have leader + other follower (meets min ISR)
- Leader + Follower fail: Third replica exists, can rebuild ISR
Only lose data if: All 3 replicas fail simultaneously (extremely rare)
Trade-offs:
- Latency: 20-30ms vs 5-10ms for acks=1
- Throughput: Lower (wait for replication)
- Availability: May reject writes if ISR < 2
Worth It: For orders where data loss = lost revenue + angry customers
Monitoring: Alert if ISR falls below min.insync.replicas"
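The "only if all three replicas fail simultaneously" claim can be sanity-checked with a back-of-envelope calculation. The per-broker downtime figure is an invented illustration, and the independence assumption is optimistic (correlated failures such as a rack or AZ outage are the real risk):

```javascript
// If each of 3 brokers is independently unavailable 0.1% of the time,
// the chance that all replicas are down at the same instant is the
// product of the individual probabilities:
const perBrokerDowntime = 0.001;
const replicationFactor = 3;
const allDown = perBrokerDowntime ** replicationFactor;
console.log(allDown); // ~1e-9, i.e. roughly 30ms of exposure per year
```

This is why the practical failure mode for acks=all is not simultaneous crashes but a shrunken ISR rejecting writes, which is an availability problem rather than a durability one.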
Code Example
Producer with Different Ack Levels
const { Kafka, CompressionTypes } = require("kafkajs");
const kafka = new Kafka({
clientId: "my-producer",
brokers: ["kafka1:9092", "kafka2:9092", "kafka3:9092"],
});
// Configuration 1: acks=0 (Fire and Forget)
async function sendMetrics() {
  const producer = kafka.producer();
  await producer.connect();
  const start = Date.now();
  // In kafkajs, acks and compression are per-send() options:
  await producer.send({
    topic: "metrics",
    messages: [{ value: JSON.stringify({ cpu: 80, mem: 60 }) }],
    acks: 0, // no acknowledgment
    compression: CompressionTypes.GZIP,
  });
  const latency = Date.now() - start;
  console.log(`Metrics sent (acks=0): ${latency}ms`);
  // Typical output: 1-2ms
  // Risk: message may be silently lost
}
// Configuration 2: acks=1 (Leader Acknowledgment)
async function sendUserActivity() {
  const producer = kafka.producer({
    retry: {
      retries: 3, // retry on failure
      initialRetryTime: 100,
    },
  });
  await producer.connect();
  const start = Date.now();
  await producer.send({
    topic: "user-activity",
    messages: [
      {
        key: "user-123",
        value: JSON.stringify({ action: "click", page: "/products" }),
      },
    ],
    acks: 1,        // leader acknowledgment
    timeout: 30000, // 30s ack timeout
  });
  const latency = Date.now() - start;
  console.log(`Activity sent (acks=1): ${latency}ms`);
  // Typical output: 5-10ms
  // Risk: lost if leader fails before replication
}
// Configuration 3: acks=all (Full ISR Acknowledgment)
async function sendOrder() {
  const producer = kafka.producer({
    idempotent: true,       // exactly-once per partition; requires acks=all
    maxInFlightRequests: 1, // preserve ordering
    retry: {
      retries: Number.MAX_VALUE, // retry "forever"
      initialRetryTime: 100,
      maxRetryTime: 30000,
    },
  });
  await producer.connect();
  const start = Date.now();
  try {
    await producer.send({
      topic: "orders", // topic config: min.insync.replicas=2, replication.factor=3
      messages: [
        {
          key: "order-456",
          value: JSON.stringify({
            orderId: "456",
            userId: "123",
            total: 99.99,
            items: [{ id: "product-1", qty: 2 }],
          }),
        },
      ],
      acks: -1,       // acks=all (wait for the full ISR)
      timeout: 30000,
    });
    const latency = Date.now() - start;
    console.log(`Order sent (acks=all): ${latency}ms`);
    // Typical output: 15-30ms
    // Guarantee: message on ≥2 replicas before the ack
  } catch (error) {
    if (error.type === "NOT_ENOUGH_REPLICAS") {
      // ISR < min.insync.replicas (degraded cluster)
      console.error("Cluster degraded: Not enough in-sync replicas");
      // Alert operations team; queue order for retry
    }
    throw error;
  }
}
// Demonstrating latency differences
async function benchmark() {
console.log("Benchmarking producer acknowledgments...\n");
await sendMetrics(); // ~1-2ms
await sendUserActivity(); // ~5-10ms
await sendOrder(); // ~15-30ms
// Trade-off: Latency vs Durability
// acks=0: Fastest, least safe
// acks=1: Balanced (default)
// acks=all: Slowest, safest
}
benchmark().catch(console.error);
Error Handling with acks=all
async function sendCriticalData(data) {
  const producer = kafka.producer({
    retry: {
      retries: 5,
      initialRetryTime: 300,
    },
  });
  await producer.connect();
  try {
    await producer.send({
      topic: "critical-data",
      messages: [{ value: JSON.stringify(data) }],
      acks: -1, // wait for the full ISR
    });
    console.log("Data persisted successfully (acks=all)");
  } catch (error) {
    // kafkajs surfaces broker error codes on error.type:
    if (error.type === "NOT_ENOUGH_REPLICAS") {
      // ISR < min.insync.replicas
      console.error("Not enough in-sync replicas");
      // Action: alert operations, queue for retry
    }
    if (error.type === "NOT_ENOUGH_REPLICAS_AFTER_APPEND") {
      // Message written to leader, but ISR shrank before replication
      console.error("Replication failed after append");
      // Action: retry (may duplicate; use an idempotent producer)
    }
    if (error.type === "REQUEST_TIMED_OUT") {
      // Replication took longer than the timeout
      console.error("Acknowledgment timeout");
      // Action: retry (may duplicate)
    }
    // Store in a dead letter queue for manual review
    // (storeInDLQ is an application-provided helper, not part of kafkajs)
    await storeInDLQ(data, error);
    throw error;
  }
}
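The snippet above assumes an application-provided storeInDLQ helper. One hedged sketch: build a dead-letter record capturing the payload and failure context, then publish it to a side topic with relaxed settings (e.g. acks=1) so the DLQ write cannot fail for the same reason as the original. Only the pure record builder is shown, so it stays runnable without a broker; the field layout is invented.

```javascript
// Build a dead-letter record carrying the failed payload plus failure context.
// Actually sending it (producer.send({ topic: "critical-data.dlq", ... }))
// is left to the caller.
function toDLQRecord(data, error, now = new Date()) {
  return {
    key: error.type || "UNKNOWN",
    value: JSON.stringify({
      payload: data,
      errorType: error.type || "UNKNOWN",
      errorMessage: error.message,
      failedAt: now.toISOString(),
    }),
  };
}

const err = Object.assign(new Error("ack timeout"), { type: "REQUEST_TIMED_OUT" });
const record = toDLQRecord({ orderId: "456" }, err);
console.log(record.key);                       // "REQUEST_TIMED_OUT"
console.log(JSON.parse(record.value).payload); // { orderId: '456' }
```

Keying the DLQ record by error type makes it easy to partition and triage failures later.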
Related Content
Prerequisites:
- Leader-Follower Replication - Understanding ISR
- Topic Partitioning - Kafka architecture
Related Concepts:
- Quorum - ISR is a form of quorum
- Idempotence - Idempotent producer with acks=all
- Exactly-Once Semantics - Combines idempotence + acks=all
Used In Systems:
- Kafka (producer acknowledgments)
- Pulsar (similar ack levels)
- RabbitMQ (publisher confirms)
Explained In Detail:
- Kafka Deep Dive - Producer mechanics and acknowledgments
Quick Self-Check
- Can explain acks=0/1/all in 60 seconds?
- Understand latency vs durability trade-offs?
- Know when messages can be lost for each ack level?
- Can explain min.insync.replicas and ISR?
- Understand acks=all + min.insync.replicas=2 pattern?
- Know which ack level to use for different use cases?