
Producer Acknowledgments

7 min Β· Intermediate Β· Messaging Β· Interview relevance: 65%

Mechanisms by which message producers receive confirmation that their messages were successfully persisted, enabling reliability tradeoffs between latency and durability

Interview Relevance: 65% of messaging interviews
Production Impact: Durability control
Performance: Latency vs safety trade-offs
Scalability: Data loss prevention

TL;DR

Producer acknowledgments (acks) control when Kafka considers a message successfully written. Options include acks=0 (no confirmation), acks=1 (leader confirms), and acks=all (all replicas confirm), trading latency for durability guarantees. Critical for balancing performance vs data safety in message brokers.

Visual Overview

ACKS = 0 (Fire and Forget)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Producer β†’ Message β†’ Kafka Leader             β”‚
β”‚      ↓                      (don't wait)       β”‚
β”‚  Immediate return βœ“                            β”‚
β”‚  Latency: <1ms                                 β”‚
β”‚                                                β”‚
β”‚  Risk: Message may be lost if:                 β”‚
β”‚  - Network failure before reaching leader      β”‚
β”‚  - Leader crashes before writing to disk       β”‚
β”‚  - Leader crashes before replication           β”‚
β”‚                                                β”‚
β”‚  Use case: Metrics, logs (lossy OK)            β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

ACKS = 1 (Leader Acknowledgment)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Producer β†’ Message β†’ Kafka Leader             β”‚
β”‚                         ↓                      β”‚
β”‚                    Write to log βœ“              β”‚
β”‚                         ↓                      β”‚
β”‚                    Send ACK β†’ Producer         β”‚
β”‚  Latency: 5-10ms                               β”‚
β”‚                                                β”‚
β”‚  Meanwhile (async):                            β”‚
β”‚  Leader β†’ Replicate β†’ Follower 1               β”‚
β”‚  Leader β†’ Replicate β†’ Follower 2               β”‚
β”‚                                                β”‚
β”‚  Risk: Message lost if leader crashes          β”‚
β”‚  before replication completes                  β”‚
β”‚                                                β”‚
β”‚  Use case: Most production workloads (default) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

ACKS = ALL (Full Quorum)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Producer β†’ Message β†’ Kafka Leader             β”‚
β”‚                         ↓                      β”‚
β”‚                    Write to log                β”‚
β”‚                         ↓                      β”‚
β”‚            Replicate to all ISR replicas       β”‚
β”‚                         ↓                      β”‚
β”‚  Follower 1: Written βœ“                         β”‚
β”‚  Follower 2: Written βœ“                         β”‚
β”‚                         ↓                      β”‚
β”‚                    Send ACK β†’ Producer         β”‚
β”‚  Latency: 10-50ms (network + replication)      β”‚
β”‚                                                β”‚
β”‚  Risk: Near zero (message is replicated)       β”‚
β”‚  (unless all ISR replicas fail simultaneously) β”‚
β”‚                                                β”‚
β”‚  Use case: Financial transactions, orders      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

TIMELINE COMPARISON:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  acks=0:                                       β”‚
β”‚  T0: Send                                      β”‚
β”‚  T1: Return (1ms) βœ“                            β”‚
β”‚                                                β”‚
β”‚  acks=1:                                       β”‚
β”‚  T0: Send                                      β”‚
β”‚  T5: Leader writes                             β”‚
β”‚  T10: Return (10ms) βœ“                          β”‚
β”‚                                                β”‚
β”‚  acks=all:                                     β”‚
β”‚  T0: Send                                      β”‚
β”‚  T5: Leader writes                             β”‚
β”‚  T15: Follower 1 writes                        β”‚
β”‚  T20: Follower 2 writes                        β”‚
β”‚  T25: Return (25ms) βœ“                          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Core Explanation

What are Producer Acknowledgments?

Producer acknowledgments (acks) control when a Kafka producer considers a write operation successful. This determines:

  1. When producer receives confirmation that message is safe
  2. How many replicas must persist the message
  3. Trade-off between latency and durability

Three Levels:

acks=0: No acknowledgment (fire-and-forget)
acks=1: Leader acknowledgment (the pre-3.0 default; newer clients, including kafkajs, default to acks=all)
acks=all: Full ISR acknowledgment (safest)

acks=0: No Acknowledgment

Behavior:

Producer sends message, immediately considers it sent
Leader receives message (maybe)
No confirmation sent back

Result:
- Highest throughput (no waiting)
- Lowest latency (<1ms)
- Zero durability guarantee

When Message Can Be Lost:

1. Network failure before reaching broker
   Producer β†’ [Network drops packet] β†’ Leader (never arrives)

2. Leader crash before writing to disk
   Producer β†’ Leader (in memory) β†’ [Crash] βœ—

3. Leader crash before replication
   Producer β†’ Leader (written) β†’ [Crash before replicating] βœ—

Probability of loss: Relatively high (1-5%)

Configuration:

// Note: kafkajs takes acks per send() call, not in the producer config
await producer.send({
  topic: "metrics",
  acks: 0, // No acknowledgment
  compression: CompressionTypes.GZIP, // Often used with acks=0 for max throughput
  messages,
});

Use Cases:

βœ“ Log aggregation (OK to lose some logs)
βœ“ Metrics collection (OK to lose some data points)
βœ“ IoT sensor data (high volume, redundancy)
βœ“ Clickstream tracking (lossy acceptable)

βœ— Financial transactions
βœ— User-facing data (messages, posts)
βœ— Critical business events

acks=1: Leader Acknowledgment

Behavior:

Producer sends message
Leader writes to local log (durable on leader disk)
Leader sends ACK to producer
Producer considers message sent βœ“

Meanwhile (asynchronous):
Leader replicates to followers (background)

Result:
- Good throughput
- Moderate latency (5-10ms)
- Durability: Survives producer/network failure
- Risk: Lost if leader fails before replication

When Message Can Be Lost:

Scenario: Leader fails before replication

T0: Producer β†’ Leader (message written to leader)
T1: Leader β†’ ACK β†’ Producer βœ“
T2: Producer moves on
T3: Leader crashes ⚑ (before replicating)
T4: Follower promoted to new leader
T5: Message is GONE βœ— (was only on failed leader)

Probability: Low (1-2% during failures)
Window of vulnerability: ~500ms (replication lag)
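To put that window in perspective, a back-of-envelope estimate (the throughput figure is an assumed number for illustration, not from the source):

```javascript
// Under acks=1, the messages at risk per leader failure are roughly
// producer throughput x replication-lag window.
const throughputPerSec = 50000; // assumed producer throughput (msgs/sec)
const replicationLagSec = 0.5;  // the ~500ms vulnerability window above
const messagesAtRisk = throughputPerSec * replicationLagSec;
console.log(messagesAtRisk); // 25000
```

Even a short replication lag can expose tens of thousands of messages on a busy partition, which is why high-value topics move to acks=all.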

Configuration:

const producer = kafka.producer({
  retry: {
    retries: 3, // Retry on failure
  },
});

// acks and timeout are passed per send() in kafkajs
await producer.send({
  topic: "user-activity",
  acks: 1, // Leader acknowledgment
  timeout: 30000, // 30s timeout
  messages,
});
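The retry settings above follow kafkajs's exponential-backoff style. A rough sketch of the resulting delay schedule (ignoring the random jitter kafkajs adds; `backoffSchedule` is an illustrative helper, not a kafkajs API):

```javascript
// Exponential backoff: each retry waits about double the previous delay,
// capped at maxRetryTime. Illustrative only; kafkajs also applies jitter.
function backoffSchedule({ retries, initialRetryTime, multiplier = 2, maxRetryTime = 30000 }) {
  const delays = [];
  let delay = initialRetryTime;
  for (let i = 0; i < retries; i++) {
    delays.push(Math.min(delay, maxRetryTime));
    delay *= multiplier;
  }
  return delays;
}

console.log(backoffSchedule({ retries: 3, initialRetryTime: 100 })); // [ 100, 200, 400 ]
```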

Use Cases:

βœ“ Most production workloads (default choice)
βœ“ High-throughput messaging
βœ“ Real-time analytics
βœ“ Event streaming

Balance between performance and safety

acks=all: Full ISR Acknowledgment

Behavior:

Producer sends message
Leader writes to local log
Leader waits for ALL in-sync replicas (ISR) to acknowledge
All ISR replicas write to their logs
Leader sends ACK to producer
Producer considers message sent βœ“

Result:
- Lower throughput
- Higher latency (10-50ms)
- Maximum durability
- Message replicated before acknowledgment

In-Sync Replicas (ISR):

ISR = Set of replicas that are "caught up" with leader

Example:
- Leader: Broker 1
- Followers: Broker 2 (in sync), Broker 3 (lagging)
- ISR = {Broker 1, Broker 2}

acks=all waits for: Broker 1 + Broker 2

If Broker 2 falls behind (network issue):
ISR = {Broker 1}  (just leader)
acks=all waits for: Broker 1 only (no followers!)

This is why min.insync.replicas is critical!

min.insync.replicas:

Configuration: Minimum ISR size required for writes

min.insync.replicas=2 (recommended for acks=all)
- Requires at least 2 replicas in ISR
- If ISR shrinks to 1, producer gets error
- Prevents data loss when only leader is alive

Example with 3 replicas:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Normal: ISR = {Leader, Follower1, Follower2}β”‚
β”‚  acks=all waits for Leader + Follower1      β”‚
β”‚  (or Leader + Follower2, first to respond)  β”‚
β”‚                                             β”‚
β”‚  Follower1 fails: ISR = {Leader, Follower2} β”‚
β”‚  acks=all waits for Leader + Follower2 βœ“    β”‚
β”‚                                             β”‚
β”‚  Follower2 also fails: ISR = {Leader}       β”‚
β”‚  acks=all REJECTS writes βœ—                  β”‚
β”‚  (ISR size 1 < min.insync.replicas 2)       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Protection: Cannot lose data if leader fails,
because message is on at least 2 replicas
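The acceptance rule in the box above boils down to one comparison. A minimal sketch (`isrAccepts` is a hypothetical helper for illustration, not a Kafka API):

```javascript
// Broker-side write-acceptance check under acks=all: the leader rejects
// the write (NOT_ENOUGH_REPLICAS) when the current ISR is smaller than
// min.insync.replicas.
function isrAccepts(isrSize, minInsyncReplicas) {
  return isrSize >= minInsyncReplicas;
}

console.log(isrAccepts(3, 2)); // true  - leader + 2 followers in sync
console.log(isrAccepts(2, 2)); // true  - one follower down, still OK
console.log(isrAccepts(1, 2)); // false - only the leader left: writes rejected
```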

Configuration:

const producer = kafka.producer({
  retry: {
    retries: 5,
  },
});

await producer.send({
  topic: "orders",
  acks: -1, // -1 means "all": wait for the full ISR (per-send in kafkajs)
  timeout: 30000,
  messages,
});

// Topic-level configuration (set when creating the topic):
// min.insync.replicas=2  -> at least 2 replicas must ack
// replication factor 3   -> 3 copies in total

Use Cases:

βœ“ Financial transactions
βœ“ E-commerce orders
βœ“ User-generated content (posts, messages)
βœ“ Critical business events
βœ“ Regulatory/compliance data

Anywhere data loss is unacceptable

Real Systems Using Producer Acks

System             | Default acks | Typical Config                  | Rationale
Kafka Streams      | acks=all     | acks=all, min.insync.replicas=2 | State stores require durability
Netflix (Keystone) | acks=1       | acks=1, replication=3           | High throughput, tolerate rare loss
LinkedIn           | acks=all     | acks=all, min.insync.replicas=2 | Business-critical events
Uber               | acks=1       | acks=1 (logs), acks=all (trips) | Mixed based on data criticality
Confluent Cloud    | acks=all     | acks=all, min.insync.replicas=2 | Default for safety

Case Study: Kafka at LinkedIn

LinkedIn's Kafka usage (origin of Kafka):
- 100+ billion messages/day
- 1000s of topics
- Multi-datacenter deployment

Acknowledgment Strategy:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Critical Data (jobs, connections):       β”‚
β”‚  - acks=all                               β”‚
β”‚  - min.insync.replicas=2                  β”‚
β”‚  - replication.factor=3                   β”‚
β”‚  β†’ Latency: 20-30ms                       β”‚
β”‚  β†’ Zero data loss                         β”‚
β”‚                                           β”‚
β”‚  Metrics/Logs (high volume):              β”‚
β”‚  - acks=1                                 β”‚
β”‚  - replication.factor=2                   β”‚
β”‚  β†’ Latency: 5-10ms                        β”‚
β”‚  β†’ Acceptable loss rate: <0.1%            β”‚
β”‚                                           β”‚
β”‚  Analytics Events (ultra-high volume):    β”‚
β”‚  - acks=0                                 β”‚
β”‚  - compression=gzip                       β”‚
β”‚  β†’ Latency: 1-2ms                         β”‚
β”‚  β†’ Loss rate: 1-2% (acceptable)           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Lesson: Different acks for different data criticality

When to Use Each Ack Level

acks=0: Fire and Forget

Use When:

βœ“ High throughput required (100k+ msg/sec)
βœ“ Data loss is acceptable (logs, metrics)
βœ“ Data has natural redundancy (sensor arrays)
βœ“ Ultra-low latency required (<1ms)

Example: IoT sensor network
- 1000 sensors sending data every second
- If 1% of readings lost, still have 99%
- Aggregate statistics still accurate
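A toy simulation of that claim, with synthetic sensor values (the data and loss pattern are assumed for illustration):

```javascript
// Simulate 10,000 sensor readings and drop 1% of them, as acks=0 might
// under failure. The aggregate mean barely moves.
const readings = Array.from({ length: 10000 }, (_, i) => 50 + Math.sin(i) * 10);
const kept = readings.filter((_, i) => i % 100 !== 0); // simulate 1% loss
const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;

console.log(Math.abs(mean(readings) - mean(kept)) < 0.5); // true
```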

acks=1: Leader Only

Use When:

βœ“ Good balance of performance and safety
βœ“ Occasional loss acceptable during failures
βœ“ High throughput with moderate durability
βœ“ Default choice for most workloads

Example: User activity tracking
- Click events, page views, etc.
- Occasional loss during broker failure OK
- Still maintain 99%+ delivery

acks=all: Full Replication

Use When:

βœ“ Zero data loss required
βœ“ Regulatory/compliance requirements
βœ“ Financial or critical business data
βœ“ Can tolerate higher latency (10-50ms)

Example: E-commerce order placement
- User places order (creates Kafka event)
- Order must not be lost
- OK to wait 20-30ms for full replication
- Worth latency cost for safety

Hybrid Approach

Different Topics, Different Acks:

// Critical orders: acks=all
// In kafkajs the ack level travels with each send() call, so one
// connected producer can serve topics with different guarantees:

// Critical orders: acks=all
await producer.send({ topic: "orders", acks: -1, timeout: 30000, messages: orders });

// Analytics events: acks=1
await producer.send({ topic: "analytics", acks: 1, timeout: 10000, messages: events });

// Metrics: acks=0
await producer.send({
  topic: "metrics",
  acks: 0,
  compression: CompressionTypes.GZIP,
  messages: metrics,
});

Interview Application

Common Interview Question

Q: β€œHow would you ensure zero data loss in a Kafka-based order processing system?”

Strong Answer:

β€œTo ensure zero data loss for orders, I’d configure producers with acks=all and proper ISR settings:

Producer Configuration:

acks=all (or acks=-1)
min.insync.replicas=2 (topic/broker config)
replication.factor=3 (set at topic creation)
retries=Integer.MAX_VALUE (retry until success)
enable.idempotence=true (no duplicates on retry)
max.in.flight.requests.per.connection=1 (strict ordering; up to 5 is safe once idempotence is on)

How This Prevents Loss:

  1. acks=all: Producer waits for full replication before considering write successful
  2. min.insync.replicas=2: Requires at least 2 replicas (leader + 1 follower) to acknowledge
  3. replication.factor=3: Total of 3 copies across brokers
  4. Result: Message on β‰₯2 replicas before ACK

Failure Scenarios:

  • Network failure: Producer retries until successful
  • Leader failure: Message already on follower (promoted to new leader)
  • Follower failure: Still have leader + other follower (meets min ISR)
  • Leader + Follower fail: Third replica exists, can rebuild ISR

Only lose data if: All 3 replicas fail simultaneously (extremely rare)

Trade-offs:

  • Latency: 20-30ms vs 5-10ms for acks=1
  • Throughput: Lower (wait for replication)
  • Availability: May reject writes if ISR < 2

Worth It: For orders where data loss = lost revenue + angry customers

Monitoring: Alert if ISR falls below min.insync.replicas”
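The β€œextremely rare” claim above can be made concrete with a back-of-envelope estimate (the per-broker failure probability is an assumed illustrative number):

```javascript
// If each of the 3 brokers fails independently with probability p during
// the replication window, all copies are lost with probability p^3.
const p = 0.001; // assumed per-broker failure probability (illustration only)
const lossProbability = p ** 3;
console.log(lossProbability < 1e-8); // true: roughly one in a billion
```

Correlated failures (shared rack, AZ, or power) break the independence assumption, which is why rack-aware replica placement matters in practice.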

Code Example

Producer with Different Ack Levels

const { Kafka, CompressionTypes } = require("kafkajs");

const kafka = new Kafka({
  clientId: "my-producer",
  brokers: ["kafka1:9092", "kafka2:9092", "kafka3:9092"],
});

// Configuration 1: acks=0 (Fire and Forget)
// Note: kafkajs sets acks, timeout, and compression per send() call,
// not on the producer constructor.
async function sendMetrics() {
  const producer = kafka.producer();

  await producer.connect();

  const start = Date.now();
  await producer.send({
    topic: "metrics",
    acks: 0, // No acknowledgment
    compression: CompressionTypes.GZIP,
    messages: [{ value: JSON.stringify({ cpu: 80, mem: 60 }) }],
  });
  const latency = Date.now() - start;

  console.log(`Metrics sent (acks=0): ${latency}ms`);
  // Typical output: 1-2ms
  // Risk: Message may be lost
}

// Configuration 2: acks=1 (Leader Acknowledgment)
async function sendUserActivity() {
  const producer = kafka.producer({
    retry: {
      retries: 3,
      initialRetryTime: 100,
    },
  });

  await producer.connect();

  const start = Date.now();
  await producer.send({
    topic: "user-activity",
    acks: 1, // Leader acknowledgment
    timeout: 30000,
    messages: [
      {
        key: "user-123",
        value: JSON.stringify({ action: "click", page: "/products" }),
      },
    ],
  });
  const latency = Date.now() - start;

  console.log(`Activity sent (acks=1): ${latency}ms`);
  // Typical output: 5-10ms
  // Risk: Lost if leader fails before replication
}

// Configuration 3: acks=all (Full ISR Acknowledgment)
async function sendOrder() {
  const producer = kafka.producer({
    idempotent: true, // Prevents duplicates on retry
    maxInFlightRequests: 1, // Preserve strict ordering
    retry: {
      retries: Number.MAX_SAFE_INTEGER, // Retry effectively forever
      initialRetryTime: 100,
      maxRetryTime: 30000,
    },
  });

  await producer.connect();

  const start = Date.now();
  try {
    await producer.send({
      topic: "orders", // Topic config: min.insync.replicas=2, replication factor 3
      acks: -1, // acks=all (wait for the full ISR)
      timeout: 30000,
      messages: [
        {
          key: "order-456",
          value: JSON.stringify({
            orderId: "456",
            userId: "123",
            total: 99.99,
            items: [{ id: "product-1", qty: 2 }],
          }),
        },
      ],
    });
    const latency = Date.now() - start;

    console.log(`Order sent (acks=all): ${latency}ms`);
    // Typical output: 15-30ms
    // Guarantee: Message on β‰₯2 replicas before the ACK
  } catch (error) {
    if (error.type === "NOT_ENOUGH_REPLICAS") {
      // ISR < min.insync.replicas (degraded cluster)
      console.error("Cluster degraded: Not enough in-sync replicas");
      // Alert operations team
      // Queue order for retry
    }
    throw error;
  }
}

// Demonstrating latency differences
async function benchmark() {
  console.log("Benchmarking producer acknowledgments...\n");

  await sendMetrics(); // ~1-2ms
  await sendUserActivity(); // ~5-10ms
  await sendOrder(); // ~15-30ms

  // Trade-off: Latency vs Durability
  // acks=0:   Fastest, least safe
  // acks=1:   Balanced
  // acks=all: Slowest, safest
}

benchmark().catch(console.error);

Error Handling with acks=all

async function sendCriticalData(data) {
  const producer = kafka.producer({
    retry: {
      retries: 5,
      initialRetryTime: 300,
    },
  });

  await producer.connect();

  try {
    await producer.send({
      topic: "critical-data",
      acks: -1, // acks=all (per-send in kafkajs)
      messages: [{ value: JSON.stringify(data) }],
    });

    console.log("Data persisted successfully (acks=all)");
  } catch (error) {
    // Error types to handle:

    if (error.type === "NOT_ENOUGH_REPLICAS") {
      // ISR < min.insync.replicas
      console.error("Not enough in-sync replicas");
      // Action: Alert operations, queue for retry
    }

    if (error.type === "NOT_ENOUGH_REPLICAS_AFTER_APPEND") {
      // Message written to leader, but ISR shrank before replication
      console.error("Replication failed after append");
      // Action: Retry (may create a duplicate; use an idempotent producer)
    }

    if (error.type === "REQUEST_TIMED_OUT") {
      // Replication took longer than the timeout
      console.error("Acknowledgment timeout");
      // Action: Retry (may be a duplicate)
    }

    // Store in dead letter queue for manual review
    await storeInDLQ(data, error); // storeInDLQ: your DLQ writer (not shown)
    throw error;
  }
}

Used In Systems:

  • Kafka (producer acknowledgments)
  • Pulsar (similar ack levels)
  • RabbitMQ (publisher confirms)

Explained In Detail:

  • Kafka Deep Dive - Producer mechanics and acknowledgments

Quick Self-Check

  • Can explain acks=0/1/all in 60 seconds?
  • Understand latency vs durability trade-offs?
  • Know when messages can be lost for each ack level?
  • Can explain min.insync.replicas and ISR?
  • Understand acks=all + min.insync.replicas=2 pattern?
  • Know which ack level to use for different use cases?