
Producer Batching

8 min · Intermediate · Messaging

💼 Interview Relevance: 70% of performance interviews
🏭 Production Impact: powers systems at LinkedIn (7+ trillion msgs/day)
⚡ Performance: 10-100x throughput improvement
📈 Scalability: 90%+ fewer network requests

TL;DR

Producer batching groups multiple messages together before sending them to the server, amortizing network overhead and maximizing throughput. Instead of sending each message immediately (1 message = 1 network request), batching collects messages for a short time window or until reaching a size threshold, then sends them together in a single request. This technique can improve throughput by 10-100x.

Visual Overview

WITHOUT BATCHING (Naive Approach):
T=0ms:  [Message A] ──▶ Network Request 1
T=5ms:  [Message B] ──▶ Network Request 2
T=8ms:  [Message C] ──▶ Network Request 3
T=12ms: [Message D] ──▶ Network Request 4

Result: 4 network requests, ~50ms total latency
Overhead: 4x network round-trips, 4x TCP overhead

WITH BATCHING (Optimized):
T=0ms:  [Message A] ──┐
T=5ms:  [Message B] ──┤
T=8ms:  [Message C] ──┼── Batch Accumulation
T=12ms: [Message D] ──┘
T=20ms: [Batch: A,B,C,D] ──▶ Single Network Request

Result: 1 network request, ~30ms total latency
Overhead: 1x network round-trip, 4x compression efficiency

BATCH TRIGGERS:
├── Size Threshold: batch.size = 32 KB (default is 16 KB)
├── Time Threshold: linger.ms = 20 ms (default is 0 ms)
├── Memory Pressure: buffer full, send immediately
└── Explicit Flush: application calls flush()

Core Explanation

What is Producer Batching?

Producer batching is a performance optimization where a message producer accumulates multiple messages in memory before sending them to the server in a single network request.

BATCHING ARCHITECTURE:

Application Thread:
  producer.send(message_1)  ──┐
  producer.send(message_2)  ──┤
  producer.send(message_3)  ──┼──▶ Batch Buffer (per partition)
  producer.send(message_4)  ──┤         │
  producer.send(message_5)  ──┘         │
                                        ▼
Background Sender Thread:  ┌─────────────────────────┐
                           │ Wait for trigger:       │
                           │ - Size >= 32 KB         │
                           │ - Time >= linger.ms     │
                           │ - Buffer full           │
                           └────────────┬────────────┘
                                        ▼
                           [Send Batch] ──▶ Server

Key Batching Parameters:

// Batch size threshold (bytes)
batch.size = 16384  // 16 KB default; production systems often raise this to 32-128 KB

// Time to wait for batch to fill (milliseconds)
linger.ms = 0       // Send immediately (default)
linger.ms = 20      // Wait up to 20ms for more messages

// Total memory for all batches
buffer.memory = 33554432  // 32 MB default
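
A minimal end-to-end sketch of plugging these settings into a real producer. The bootstrap address, topic name, and String serializers below are illustrative assumptions, not part of the configuration discussed above:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class BatchingProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // The batching knobs discussed above, tuned past the defaults
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 32768);        // 32 KB per-partition batches
        props.put(ProducerConfig.LINGER_MS_CONFIG, 20);            // wait up to 20ms for batches to fill
        props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 67108864);  // 64 MB total buffer

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // send() returns immediately; the background sender thread ships full batches
            producer.send(new ProducerRecord<>("events", "key", "value")); // "events" is a placeholder topic
            producer.flush(); // force any partially filled batch out before exiting
        }
    }
}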

Why Batching Dramatically Improves Performance

Network Overhead Analysis:

SINGLE MESSAGE SEND:
┌─────────────────────────────────────────────┐
│ TCP/IP Header: 40 bytes                     │
│ Kafka Protocol Header: 100 bytes            │
│ Message Overhead: 50 bytes                  │
│ Actual Message Payload: 200 bytes           │
│ ─────────────────────────────────────────── │
│ Total: 390 bytes                            │
│ Efficiency: 200/390 = 51%                   │
└─────────────────────────────────────────────┘

BATCHED SEND (100 messages):
┌─────────────────────────────────────────────┐
│ TCP/IP Header: 40 bytes (1x)                │
│ Kafka Protocol Header: 100 bytes (1x)       │
│ Message Overhead: 50 bytes × 100 = 5,000    │
│ Actual Message Payload: 200 × 100 = 20,000  │
│ ─────────────────────────────────────────── │
│ Total: 25,140 bytes                         │
│ Efficiency: 20,000/25,140 = 80%             │
│ Network Savings: 100x fewer requests        │
└─────────────────────────────────────────────┘

Result: 100x fewer requests, and the fixed per-request headers are paid once instead of 100 times!

Throughput Impact:

Scenario: Send 100,000 messages (200 bytes each)

NO BATCHING:
├── Network RTT: 1ms per request
├── Total time: 100,000 × 1ms = 100 seconds
└── Throughput: 1,000 messages/sec

WITH BATCHING (100 msg/batch):
├── Network RTT: 1ms per batch
├── Total time: 1,000 batches × 1ms = 1 second
└── Throughput: 100,000 messages/sec

100x improvement! 🚀
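
The arithmetic behind these numbers is easy to sanity-check in a few lines of plain Java; the 1ms round trip is the same illustrative assumption used above, not a measured value:

public class BatchingMath {
    public static void main(String[] args) {
        long messages = 100_000;
        double rttMs = 1.0;          // assumed network round trip per request
        int messagesPerBatch = 100;

        double unbatchedSec = messages * rttMs / 1000;                     // one request per message
        double batchedSec = (messages / messagesPerBatch) * rttMs / 1000;  // one request per batch

        System.out.printf("No batching: %.0f s  (%.0f msg/sec)%n", unbatchedSec, messages / unbatchedSec);
        System.out.printf("Batched:     %.0f s  (%.0f msg/sec)%n", batchedSec, messages / batchedSec);
        // Prints 100 s vs 1 s: the 100x improvement claimed above
    }
}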

Batching Triggers and Tradeoffs

Batch Completion Triggers:

TRIGGER 1: SIZE THRESHOLD REACHED
─────────────────────────────────────
Current batch: 31 KB
New message: 2 KB
Adding it would exceed batch.size (32 KB)
Action: Seal and send the current batch; the new message starts the next one

TRIGGER 2: TIME THRESHOLD REACHED
─────────────────────────────────────
Batch started: T=0ms
Current time: T=20ms >= linger.ms (20ms)
Action: Send batch (even if not full)

TRIGGER 3: MEMORY PRESSURE
─────────────────────────────────────
Buffer memory: 64 MB
Used: 62 MB (97% full)
Action: Send oldest batches to free memory

TRIGGER 4: EXPLICIT FLUSH
─────────────────────────────────────
Application calls: producer.flush()
Action: Send all pending batches immediately
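
To make these triggers concrete, here is a toy, single-threaded batcher. This is not Kafka's actual accumulator; the class and method names are invented for illustration, and the memory-pressure trigger (3) is omitted for brevity:

import java.util.ArrayList;
import java.util.List;

class ToyBatcher {
    private final int maxBatchBytes;  // analogous to batch.size
    private final long lingerMs;      // analogous to linger.ms
    private final List<byte[]> batch = new ArrayList<>();
    private int batchBytes = 0;
    private long batchStartMs = -1;

    ToyBatcher(int maxBatchBytes, long lingerMs) {
        this.maxBatchBytes = maxBatchBytes;
        this.lingerMs = lingerMs;
    }

    void send(byte[] message) {
        if (batch.isEmpty()) batchStartMs = System.currentTimeMillis();
        batch.add(message);
        batchBytes += message.length;
        if (batchBytes >= maxBatchBytes) flush("size threshold reached");  // TRIGGER 1
    }

    // A real producer's sender thread polls like this in the background.
    void poll() {
        if (!batch.isEmpty() && System.currentTimeMillis() - batchStartMs >= lingerMs) {
            flush("linger.ms elapsed");                                    // TRIGGER 2
        }
    }

    void flush(String reason) {                                            // TRIGGER 4 when called directly
        System.out.printf("Sending %d messages (%d bytes): %s%n", batch.size(), batchBytes, reason);
        // ...the single network request would happen here...
        batch.clear();
        batchBytes = 0;
    }
}

public class ToyBatcherDemo {
    public static void main(String[] args) throws InterruptedException {
        ToyBatcher batcher = new ToyBatcher(32 * 1024, 20);
        for (int i = 0; i < 1000; i++) batcher.send(new byte[200]); // triggers several size-based flushes
        batcher.poll();     // linger window still open: nothing sent
        Thread.sleep(25);
        batcher.poll();     // time trigger fires, remainder is sent
    }
}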

The Latency-Throughput Tradeoff:

CONFIGURATION SPECTRUM:

Low Latency (Real-time Systems):
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ linger.ms = 0                   β”‚  ← Send immediately
β”‚ batch.size = 16384 (16 KB)      β”‚
β”‚                                 β”‚
β”‚ Latency: ~1-2ms                 β”‚
β”‚ Throughput: ~10K msg/sec        β”‚
β”‚ Use case: Trading, alerts       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Balanced (Most Applications):
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ linger.ms = 10-20               β”‚  ← Small wait window
β”‚ batch.size = 32768 (32 KB)      β”‚
β”‚                                 β”‚
β”‚ Latency: ~15-25ms               β”‚
β”‚ Throughput: ~50K msg/sec        β”‚
β”‚ Use case: Event streaming       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

High Throughput (Analytics):
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ linger.ms = 50-100              β”‚  ← Longer wait
β”‚ batch.size = 131072 (128 KB)    β”‚
β”‚                                 β”‚
β”‚ Latency: ~60-120ms              β”‚
β”‚ Throughput: ~200K msg/sec       β”‚
β”‚ Use case: Log aggregation       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Production Configuration Examples

Example 1: High-Throughput Log Ingestion

Properties props = new Properties();

// Optimize for throughput
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 131072);    // 128 KB
props.put(ProducerConfig.LINGER_MS_CONFIG, 50);         // Wait 50ms
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 268435456); // 256 MB

// Enable compression for better batching
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");

// Allow more in-flight requests
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5);

// Result: 10x throughput improvement
// Tradeoff: ~60ms added latency

Example 2: Low-Latency Real-Time Events

Properties props = new Properties();

// Optimize for latency
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);     // 16 KB
props.put(ProducerConfig.LINGER_MS_CONFIG, 0);          // No wait
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 33554432); // 32 MB

// Minimal compression overhead
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");

// Limit in-flight for ordering
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);

// Result: <5ms p99 latency
// Tradeoff: Lower throughput (~20K msg/sec)

Batch Compression and Efficiency

Compression with Batching:

WHY BATCHING IMPROVES COMPRESSION:

Single Message Compression:
Message 1: {"user_id": 123, "event": "click", "timestamp": 1234567890}
Compressed: 58 bytes → 52 bytes (10% savings)

Batched Messages Compression (100 messages):
Original: 5,800 bytes
Compressed (lz4): 1,200 bytes (80% savings!)

Why better compression?
├── Repeated keys: "user_id", "event", "timestamp" appear 100x
├── Similar values: timestamps are sequential
├── Pattern recognition: better with larger data sets
└── Compression dictionary: more effective context

Combined Batching + Compression:
├── Network overhead: 100x fewer requests (batching)
├── Payload size: ~5x fewer bytes (compression)
└── Combined: the two savings multiply, cutting per-message cost by orders of magnitude
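
A quick way to observe this effect uses the JDK's built-in GZIP (lz4 would require a third-party library); the byte counts will differ from the figures above, but the repeated keys make the batch ratio far better than the single-message ratio:

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class CompressionDemo {
    public static void main(String[] args) throws Exception {
        StringBuilder batch = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            // Repeated keys and near-sequential timestamps, as in the example above
            batch.append("{\"user_id\": ").append(100 + i)
                 .append(", \"event\": \"click\", \"timestamp\": ").append(1234567890L + i)
                 .append("}\n");
        }
        byte[] raw = batch.toString().getBytes(StandardCharsets.UTF_8);

        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(compressed)) {
            gzip.write(raw);
        }
        System.out.printf("raw=%d bytes, compressed=%d bytes (%.0f%% savings)%n",
                raw.length, compressed.size(),
                100.0 * (raw.length - compressed.size()) / raw.length);
    }
}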

Production Compression Strategy:

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class CompressionStrategy {

    // compression.type takes a single codec; repeated put() calls would
    // simply overwrite each other, so choose one per workload.
    static Properties forWorkload(Properties props, String workload) {
        switch (workload) {
            case "high-throughput":
                // LZ4: fast compression, low CPU (~2:1 ratio, ~300 MB/sec)
                // Best for: high-throughput systems with large batches
                props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
                break;
            case "balanced":
                // Snappy: balanced (~2.3:1 ratio, ~250 MB/sec)
                // Best for: moderate throughput, balanced CPU usage
                props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");
                break;
            case "network-limited":
                // GZIP: best compression (~3.2:1 ratio, ~50 MB/sec, high CPU)
                // Best for: network-limited systems, low volume
                props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "gzip");
                break;
            default:
                // None: for already-compressed data (images, video)
                props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "none");
        }
        return props;
    }
}

Memory Management and Buffer Pool

Buffer Pool Architecture:

PRODUCER MEMORY LAYOUT:

Total Buffer: 64 MB (buffer.memory)
┌─────────────────────────────────────────┐
│ Partition 0 Batch: 32 KB (ready)        │ ← full batch
│ Partition 1 Batch: 28 KB (building)     │ ← accumulating
│ Partition 2 Batch: 31 KB (ready)        │ ← full batch
│ Partition 3 Batch: 15 KB (building)     │
│ ...                                     │
│ Free Memory: 10 MB                      │
└─────────────────────────────────────────┘

Memory Exhaustion Behavior:
1. Buffer full (free < new batch size)
2. Block send() call for max.block.ms (default 60s)
3. If still full, throw BufferExhaustedException
4. As batches send, memory freed for new batches

Monitoring:
kafka.producer:type=producer-metrics,name=buffer-available-bytes
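
The same metric can also be read in-process through the producer's metrics() map; a minimal sketch, assuming a KafkaProducer instance already exists:

import java.util.Map;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;

public class BufferMonitor {
    // A value near zero means send() is about to block (see behavior above).
    static void logBufferHeadroom(KafkaProducer<?, ?> producer) {
        for (Map.Entry<MetricName, ? extends Metric> entry : producer.metrics().entrySet()) {
            MetricName name = entry.getKey();
            if ("producer-metrics".equals(name.group())
                    && "buffer-available-bytes".equals(name.name())) {
                System.out.println("buffer-available-bytes = " + entry.getValue().metricValue());
            }
        }
    }
}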

Tradeoffs

Advantages:

  • ✓ Massively improved throughput (10-100x)
  • ✓ Reduced network overhead (90%+ fewer requests)
  • ✓ Better compression efficiency with larger batches
  • ✓ Lower CPU usage per message (amortized overhead)
  • ✓ Reduced server-side processing load

Disadvantages:

  • ✕ Increased latency (messages wait in batch)
  • ✕ Higher memory usage (buffering messages)
  • ✕ Complexity in tuning (batch.size vs linger.ms)
  • ✕ Risk of data loss if producer crashes before send
  • ✕ Larger failure blast radius (entire batch fails together)

Real Systems Using This

Apache Kafka

  • Implementation: Per-partition batching with configurable size and time thresholds
  • Scale: 7+ trillion messages/day at LinkedIn with aggressive batching
  • Default Config: 16 KB batch.size, 0ms linger.ms (conservative)
  • Production Config: 64-128 KB batch.size, 20-50ms linger.ms (optimized)

AWS Kinesis

  • Implementation: Automatic batching via PutRecords API (up to 500 records per request)
  • Limits: 1 MB/sec write throughput per shard; up to 5 MB per PutRecords request
  • SDK Behavior: KPL (Kinesis Producer Library) batches automatically

Google Cloud Pub/Sub

  • Implementation: Client library batches messages automatically
  • Config: Max batch size (1000 messages), max batch bytes (10 MB)
  • Optimization: Batching + request compression for efficiency

RabbitMQ

  • Implementation: Optional publisher confirms batching
  • Config: Manual batching via application-level buffering
  • Performance: 10x improvement with batching enabled

When to Use Producer Batching

✓ Perfect Use Cases

High-Volume Event Streaming

Scenario: Ingesting millions of events per second
Why batching: Maximizes network and disk efficiency
Example: Clickstream analytics, IoT sensor data
Config: Large batches (128 KB), medium linger (20-50ms)

Log Aggregation

Scenario: Centralized logging from 1000s of services
Why batching: Reduces load on logging infrastructure
Example: ELK stack ingestion, Splunk forwarding
Config: Large batches (128 KB), high linger (50-100ms)

Bulk Data Migration

Scenario: Moving large datasets between systems
Why batching: Maximum throughput, latency not critical
Example: Database CDC, ETL pipelines
Config: Maximum batches (256 KB), high linger (100ms)

✕ When NOT to Use (or Use Minimal Batching)

Real-Time Alerting

Problem: Critical alerts delayed by batching
Solution: linger.ms=0, small batches (16 KB)
Example: Security alerts, system monitoring

Trading Systems

Problem: Milliseconds matter, batching adds latency
Solution: No batching (linger.ms=0) or very small windows
Example: High-frequency trading, order execution

Request-Response Patterns

Problem: User waiting for immediate response
Solution: Minimal batching, sync sends
Example: API calls, user-facing operations
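
Here "sync send" means blocking on the Future that send() returns; a minimal fragment, assuming producer and record already exist:

// Block until the broker acknowledges this record.
// get() throws ExecutionException (wrapping the broker error) on failure.
RecordMetadata metadata = producer.send(record).get();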

Interview Application

Common Interview Question 1

Q: "How would you optimize a producer that's sending 100,000 small messages per second, causing high CPU and network usage?"

Strong Answer:

"The issue is likely excessive network overhead from sending each message individually. I'd implement producer batching:

Diagnosis:

  • Current: 100K messages/sec at ~1 KB each = 100K network requests/sec
  • Network overhead: ~50% of bandwidth wasted on headers
  • CPU overhead: 100K serialize/send operations every second

Solution:

// Enable aggressive batching
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 65536);    // 64 KB
props.put(ProducerConfig.LINGER_MS_CONFIG, 20);         // 20ms window
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");

Result:

  • Batching: 100K requests/sec → ~2K batches/sec (50x reduction)
  • Compression: 64 KB → ~15 KB per batch (~4x savings)
  • Network: 98% reduction in requests
  • CPU: 95% reduction in per-message overhead
  • Added latency: ~20ms (acceptable for most use cases)

Tradeoff: 20ms added latency vs 50x throughput improvement. For log/event streaming, this is optimal."

Why this is good:

  • Quantifies the problem
  • Provides specific configuration
  • Explains each parameter choice
  • Analyzes tradeoffs explicitly
  • Gives measurable results

Common Interview Question 2

Q: "Your Kafka producer is dropping messages under high load. How would you debug and fix this?"

Strong Answer:

"Message drops under load suggest buffer memory exhaustion. Here's my approach:

Diagnosis Steps:

  1. Check JMX metric: buffer-available-bytes → likely near 0
  2. Check logs for BufferExhaustedException
  3. Check max.block.ms timeout (default 60s)

Root Cause Analysis:

  • Batches accumulating faster than sender thread can send
  • Possible causes:
    • Network slowness (broker response time)
    • Too small buffer.memory for traffic volume
    • Inefficient batching (small batches = more sends)

Solutions (in order):

1. Increase buffer memory:

props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 268435456); // 256 MB

2. Optimize batching for throughput:

props.put(ProducerConfig.BATCH_SIZE_CONFIG, 131072);    // 128 KB
props.put(ProducerConfig.LINGER_MS_CONFIG, 50);         // Wait for fuller batches
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");

3. Application-level backpressure:

try {
    producer.send(record, (metadata, exception) -> {
        if (exception != null) {
            // Broker-side failure: log, retry, or dead-letter the record
        }
    });
} catch (BufferExhaustedException e) {
    // Thrown synchronously once send() has blocked for max.block.ms
    // Apply backpressure: retry with exponential backoff,
    // or shed load (return 503 to clients)
}

Result: Larger buffer + more efficient batching = 10x capacity improvement"

Why this is good:

  • Systematic debugging approach
  • Multiple solution layers
  • Specific metrics to check
  • Code examples
  • Explains root cause clearly

Red Flags to Avoid

  • ✕ Not understanding latency tradeoff of batching
  • ✕ Setting linger.ms without understanding batch.size
  • ✕ Not considering memory implications
  • ✕ Ignoring compression benefits with batching
  • ✕ Not knowing how to measure batching efficiency

Quick Self-Check

Before moving on, can you:

  • Explain producer batching in 60 seconds?
  • Draw the batching flow from send() to network?
  • List all 4 batch trigger conditions?
  • Explain the latency-throughput tradeoff?
  • Calculate network savings from batching?
  • Configure producer for high-throughput vs low-latency?

Prerequisites

None - this is a foundational performance concept


Next Recommended: Producer Acknowledgments - Understand reliability guarantees