
Producer Batching

8 min · Intermediate · Messaging

💼 Interview Relevance: 70% of performance interviews
🏭 Production Impact: powers systems at LinkedIn (7+ trillion msgs/day)
⚡ Performance: 10-100x throughput improvement
📈 Scalability: 90%+ fewer network requests

TL;DR

Producer batching groups multiple messages together before sending them to the server, amortizing network overhead and maximizing throughput. Instead of sending each message immediately (1 message = 1 network request), batching collects messages for a short time window or until reaching a size threshold, then sends them together in a single request. This technique can improve throughput by 10-100x.

Visual Overview

WITHOUT BATCHING (Naive Approach):
T=0ms:  [Message A] ──▶ Network Request 1
T=5ms:  [Message B] ──▶ Network Request 2
T=8ms:  [Message C] ──▶ Network Request 3
T=12ms: [Message D] ──▶ Network Request 4

Result: 4 network requests, ~50ms total latency
Overhead: 4x network round-trips, 4x TCP overhead

WITH BATCHING (Optimized):
T=0ms:  [Message A] ──┐
T=5ms:  [Message B] ──┤
T=8ms:  [Message C] ──┼── Batch Accumulation
T=12ms: [Message D] ──┘
T=20ms: [Batch: A,B,C,D] ──▶ Single Network Request

Result: 1 network request, ~30ms total latency
Overhead: 1x network round-trip, 4x compression efficiency

BATCH TRIGGERS:
├── Size Threshold: batch.size = 32 KB (default is 16 KB)
├── Time Threshold: linger.ms = 20 ms (default is 0 ms)
├── Memory Pressure: buffer full, send immediately
└── Explicit Flush: application calls flush()

Core Explanation

What is Producer Batching?

Producer batching is a performance optimization where a message producer accumulates multiple messages in memory before sending them to the server in a single network request.

BATCHING ARCHITECTURE:

Application Thread:
  producer.send(message_1)  ──┐
  producer.send(message_2)  ──┤
  producer.send(message_3)  ──┼──▶ Batch Buffer (per partition)
  producer.send(message_4)  ──┤         │
  producer.send(message_5)  ──┘         │
                                        ▼
Background Sender Thread:  ┌─────────────────────────┐
                           │ Wait for trigger:       │
                           │ - Size >= 32 KB         │
                           │ - Time >= linger.ms     │
                           │ - Buffer full           │
                           └────────────┬────────────┘
                                        ▼
                           [Send Batch] ──▶ Server

Key Batching Parameters:

// Batch size threshold (bytes)
batch.size = 16384  // 16 KB default; production systems often raise this to 32-128 KB

// Time to wait for batch to fill (milliseconds)
linger.ms = 0       // Send immediately (default)
linger.ms = 20      // Wait up to 20ms for more messages

// Total memory for all batches
buffer.memory = 33554432  // 32 MB default
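
A minimal end-to-end sketch of plugging these settings into a real producer. The bootstrap address, topic name, and String serializers below are illustrative assumptions, not part of the configuration discussed above:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class BatchingProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // The batching knobs discussed above, tuned past the defaults
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 32768);        // 32 KB per-partition batches
        props.put(ProducerConfig.LINGER_MS_CONFIG, 20);            // wait up to 20ms for batches to fill
        props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 67108864);  // 64 MB total buffer

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // send() returns immediately; the background sender thread ships full batches
            producer.send(new ProducerRecord<>("events", "key", "value")); // "events" is a placeholder topic
            producer.flush(); // force any partially filled batch out before exiting
        }
    }
}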

Why Batching Dramatically Improves Performance

Network Overhead Analysis:

SINGLE MESSAGE SEND:
┌─────────────────────────────────────────────┐
│ TCP/IP Header: 40 bytes                     │
│ Kafka Protocol Header: 100 bytes            │
│ Message Overhead: 50 bytes                  │
│ Actual Message Payload: 200 bytes           │
│ ─────────────────────────────────────────── │
│ Total: 390 bytes                            │
│ Efficiency: 200/390 = 51%                   │
└─────────────────────────────────────────────┘

BATCHED SEND (100 messages):
┌─────────────────────────────────────────────┐
│ TCP/IP Header: 40 bytes (1x)                │
│ Kafka Protocol Header: 100 bytes (1x)       │
│ Message Overhead: 50 bytes × 100 = 5,000    │
│ Actual Message Payload: 200 × 100 = 20,000  │
│ ─────────────────────────────────────────── │
│ Total: 25,140 bytes                         │
│ Efficiency: 20,000/25,140 = 80%             │
│ Network Savings: 100x fewer requests        │
└─────────────────────────────────────────────┘

Result: 100x fewer requests, and the fixed per-request headers are paid once instead of 100 times!

Throughput Impact:

Scenario: Send 100,000 messages (200 bytes each)

NO BATCHING:
├── Network RTT: 1ms per request
├── Total time: 100,000 × 1ms = 100 seconds
└── Throughput: 1,000 messages/sec

WITH BATCHING (100 msg/batch):
├── Network RTT: 1ms per batch
├── Total time: 1,000 batches × 1ms = 1 second
└── Throughput: 100,000 messages/sec

100x improvement! 🚀
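
The arithmetic behind these numbers is easy to sanity-check in a few lines of plain Java; the 1ms round trip is the same illustrative assumption used above, not a measured value:

public class BatchingMath {
    public static void main(String[] args) {
        long messages = 100_000;
        double rttMs = 1.0;          // assumed network round trip per request
        int messagesPerBatch = 100;

        double unbatchedSec = messages * rttMs / 1000;                     // one request per message
        double batchedSec = (messages / messagesPerBatch) * rttMs / 1000;  // one request per batch

        System.out.printf("No batching: %.0f s  (%.0f msg/sec)%n", unbatchedSec, messages / unbatchedSec);
        System.out.printf("Batched:     %.0f s  (%.0f msg/sec)%n", batchedSec, messages / batchedSec);
        // Prints 100 s vs 1 s: the 100x improvement claimed above
    }
}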

Batching Triggers and Tradeoffs

Batch Completion Triggers:

TRIGGER 1: SIZE THRESHOLD REACHED
─────────────────────────────────────
Current batch: 31 KB
New message: 2 KB
Adding it would exceed batch.size (32 KB)
Action: Seal and send the current batch; the new message starts the next one

TRIGGER 2: TIME THRESHOLD REACHED
─────────────────────────────────────
Batch started: T=0ms
Current time: T=20ms >= linger.ms (20ms)
Action: Send batch (even if not full)

TRIGGER 3: MEMORY PRESSURE
─────────────────────────────────────
Buffer memory: 64 MB
Used: 62 MB (97% full)
Action: Send oldest batches to free memory

TRIGGER 4: EXPLICIT FLUSH
─────────────────────────────────────
Application calls: producer.flush()
Action: Send all pending batches immediately
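
To make these triggers concrete, here is a toy, single-threaded batcher. This is not Kafka's actual accumulator; the class and method names are invented for illustration, and the memory-pressure trigger (3) is omitted for brevity:

import java.util.ArrayList;
import java.util.List;

class ToyBatcher {
    private final int maxBatchBytes;  // analogous to batch.size
    private final long lingerMs;      // analogous to linger.ms
    private final List<byte[]> batch = new ArrayList<>();
    private int batchBytes = 0;
    private long batchStartMs = -1;

    ToyBatcher(int maxBatchBytes, long lingerMs) {
        this.maxBatchBytes = maxBatchBytes;
        this.lingerMs = lingerMs;
    }

    void send(byte[] message) {
        if (batch.isEmpty()) batchStartMs = System.currentTimeMillis();
        batch.add(message);
        batchBytes += message.length;
        if (batchBytes >= maxBatchBytes) flush("size threshold reached");  // TRIGGER 1
    }

    // A real producer's sender thread polls like this in the background.
    void poll() {
        if (!batch.isEmpty() && System.currentTimeMillis() - batchStartMs >= lingerMs) {
            flush("linger.ms elapsed");                                    // TRIGGER 2
        }
    }

    void flush(String reason) {                                            // TRIGGER 4 when called directly
        System.out.printf("Sending %d messages (%d bytes): %s%n", batch.size(), batchBytes, reason);
        // ...the single network request would happen here...
        batch.clear();
        batchBytes = 0;
    }
}

public class ToyBatcherDemo {
    public static void main(String[] args) throws InterruptedException {
        ToyBatcher batcher = new ToyBatcher(32 * 1024, 20);
        for (int i = 0; i < 1000; i++) batcher.send(new byte[200]); // triggers several size-based flushes
        batcher.poll();     // linger window still open: nothing sent
        Thread.sleep(25);
        batcher.poll();     // time trigger fires, remainder is sent
    }
}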

The Latency-Throughput Tradeoff:

CONFIGURATION SPECTRUM:

Low Latency (Real-time Systems):
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ linger.ms = 0                   β”‚  ← Send immediately
β”‚ batch.size = 16384 (16 KB)      β”‚
β”‚                                 β”‚
β”‚ Latency: ~1-2ms                 β”‚
β”‚ Throughput: ~10K msg/sec        β”‚
β”‚ Use case: Trading, alerts       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Balanced (Most Applications):
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ linger.ms = 10-20               β”‚  ← Small wait window
β”‚ batch.size = 32768 (32 KB)      β”‚
β”‚                                 β”‚
β”‚ Latency: ~15-25ms               β”‚
β”‚ Throughput: ~50K msg/sec        β”‚
β”‚ Use case: Event streaming       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

High Throughput (Analytics):
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ linger.ms = 50-100              β”‚  ← Longer wait
β”‚ batch.size = 131072 (128 KB)    β”‚
β”‚                                 β”‚
β”‚ Latency: ~60-120ms              β”‚
β”‚ Throughput: ~200K msg/sec       β”‚
β”‚ Use case: Log aggregation       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Production Configuration Examples

Example 1: High-Throughput Log Ingestion

Properties props = new Properties();

// Optimize for throughput
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 131072);    // 128 KB
props.put(ProducerConfig.LINGER_MS_CONFIG, 50);         // Wait 50ms
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 268435456); // 256 MB

// Enable compression for better batching
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");

// Allow more in-flight requests
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5);

// Result: 10x throughput improvement
// Tradeoff: ~60ms added latency

Example 2: Low-Latency Real-Time Events

Properties props = new Properties();

// Optimize for latency
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);     // 16 KB
props.put(ProducerConfig.LINGER_MS_CONFIG, 0);          // No wait
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 33554432); // 32 MB

// Minimal compression overhead
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");

// Limit in-flight for ordering
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);

// Result: <5ms p99 latency
// Tradeoff: Lower throughput (~20K msg/sec)

Batch Compression and Efficiency

Compression with Batching:

WHY BATCHING IMPROVES COMPRESSION:

Single Message Compression:
Message 1: {"user_id": 123, "event": "click", "timestamp": 1234567890}
Compressed: 58 bytes → 52 bytes (10% savings)

Batched Messages Compression (100 messages):
Original: 5,800 bytes
Compressed (lz4): 1,200 bytes (80% savings!)

Why better compression?
├── Repeated keys: "user_id", "event", "timestamp" appear 100x
├── Similar values: timestamps are sequential
├── Pattern recognition: better with larger data sets
└── Compression dictionary: more effective context

Combined Batching + Compression:
├── Network overhead: 100x fewer requests (batching)
├── Payload size: ~5x fewer bytes (compression)
└── Combined: the two savings multiply, cutting per-message cost by orders of magnitude
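
A quick way to observe this effect uses the JDK's built-in GZIP (lz4 would require a third-party library); the byte counts will differ from the figures above, but the repeated keys make the batch ratio far better than the single-message ratio:

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class CompressionDemo {
    public static void main(String[] args) throws Exception {
        StringBuilder batch = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            // Repeated keys and near-sequential timestamps, as in the example above
            batch.append("{\"user_id\": ").append(100 + i)
                 .append(", \"event\": \"click\", \"timestamp\": ").append(1234567890L + i)
                 .append("}\n");
        }
        byte[] raw = batch.toString().getBytes(StandardCharsets.UTF_8);

        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(compressed)) {
            gzip.write(raw);
        }
        System.out.printf("raw=%d bytes, compressed=%d bytes (%.0f%% savings)%n",
                raw.length, compressed.size(),
                100.0 * (raw.length - compressed.size()) / raw.length);
    }
}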

Production Compression Strategy:

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class CompressionStrategy {

    // compression.type takes a single codec; repeated put() calls would
    // simply overwrite each other, so choose one per workload.
    static Properties forWorkload(Properties props, String workload) {
        switch (workload) {
            case "high-throughput":
                // LZ4: fast compression, low CPU (~2:1 ratio, ~300 MB/sec)
                // Best for: high-throughput systems with large batches
                props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
                break;
            case "balanced":
                // Snappy: balanced (~2.3:1 ratio, ~250 MB/sec)
                // Best for: moderate throughput, balanced CPU usage
                props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");
                break;
            case "network-limited":
                // GZIP: best compression (~3.2:1 ratio, ~50 MB/sec, high CPU)
                // Best for: network-limited systems, low volume
                props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "gzip");
                break;
            default:
                // None: for already-compressed data (images, video)
                props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "none");
        }
        return props;
    }
}

Memory Management and Buffer Pool

Buffer Pool Architecture:

PRODUCER MEMORY LAYOUT:

Total Buffer: 64 MB (buffer.memory)
┌─────────────────────────────────────────┐
│ Partition 0 Batch: 32 KB (ready)        │ ← full batch
│ Partition 1 Batch: 28 KB (building)     │ ← accumulating
│ Partition 2 Batch: 31 KB (ready)        │ ← full batch
│ Partition 3 Batch: 15 KB (building)     │
│ ...                                     │
│ Free Memory: 10 MB                      │
└─────────────────────────────────────────┘

Memory Exhaustion Behavior:
1. Buffer full (free < new batch size)
2. Block send() call for max.block.ms (default 60s)
3. If still full, throw BufferExhaustedException
4. As batches send, memory freed for new batches

Monitoring:
kafka.producer:type=producer-metrics,name=buffer-available-bytes
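
The same metric can also be read in-process through the producer's metrics() map; a minimal sketch, assuming a KafkaProducer instance already exists:

import java.util.Map;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;

public class BufferMonitor {
    // A value near zero means send() is about to block (see behavior above).
    static void logBufferHeadroom(KafkaProducer<?, ?> producer) {
        for (Map.Entry<MetricName, ? extends Metric> entry : producer.metrics().entrySet()) {
            MetricName name = entry.getKey();
            if ("producer-metrics".equals(name.group())
                    && "buffer-available-bytes".equals(name.name())) {
                System.out.println("buffer-available-bytes = " + entry.getValue().metricValue());
            }
        }
    }
}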

Tradeoffs

Advantages:

  • ✓ Massively improved throughput (10-100x)
  • ✓ Reduced network overhead (90%+ fewer requests)
  • ✓ Better compression efficiency with larger batches
  • ✓ Lower CPU usage per message (amortized overhead)
  • ✓ Reduced server-side processing load

Disadvantages:

  • ✕ Increased latency (messages wait in batch)
  • ✕ Higher memory usage (buffering messages)
  • ✕ Complexity in tuning (batch.size vs linger.ms)
  • ✕ Risk of data loss if producer crashes before send
  • ✕ Larger failure blast radius (entire batch fails together)

Real Systems Using This

Apache Kafka

  • Implementation: Per-partition batching with configurable size and time thresholds
  • Scale: 7+ trillion messages/day at LinkedIn with aggressive batching
  • Default Config: 16 KB batch.size, 0ms linger.ms (conservative)
  • Production Config: 64-128 KB batch.size, 20-50ms linger.ms (optimized)

AWS Kinesis

  • Implementation: Automatic batching via PutRecords API (up to 500 records per request)
  • Limits: 1 MB/sec write throughput per shard; up to 5 MB per PutRecords request
  • SDK Behavior: KPL (Kinesis Producer Library) batches automatically

Google Cloud Pub/Sub

  • Implementation: Client library batches messages automatically
  • Config: Max batch size (1000 messages), max batch bytes (10 MB)
  • Optimization: Batching + request compression for efficiency

RabbitMQ

  • Implementation: Optional publisher confirms batching
  • Config: Manual batching via application-level buffering
  • Performance: 10x improvement with batching enabled

When to Use Producer Batching

✓ Perfect Use Cases

High-Volume Event Streaming

Scenario: Ingesting millions of events per second
Why batching: Maximizes network and disk efficiency
Example: Clickstream analytics, IoT sensor data
Config: Large batches (128 KB), medium linger (20-50ms)

Log Aggregation

Scenario: Centralized logging from 1000s of services
Why batching: Reduces load on logging infrastructure
Example: ELK stack ingestion, Splunk forwarding
Config: Large batches (128 KB), high linger (50-100ms)

Bulk Data Migration

Scenario: Moving large datasets between systems
Why batching: Maximum throughput, latency not critical
Example: Database CDC, ETL pipelines
Config: Maximum batches (256 KB), high linger (100ms)

✕ When NOT to Use (or Use Minimal Batching)

Real-Time Alerting

Problem: Critical alerts delayed by batching
Solution: linger.ms=0, small batches (16 KB)
Example: Security alerts, system monitoring

Trading Systems

Problem: Milliseconds matter, batching adds latency
Solution: No batching (linger.ms=0) or very small windows
Example: High-frequency trading, order execution

Request-Response Patterns

Problem: User waiting for immediate response
Solution: Minimal batching, sync sends
Example: API calls, user-facing operations
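
Here "sync send" means blocking on the Future that send() returns; a minimal fragment, assuming producer and record already exist:

// Block until the broker acknowledges this record.
// get() throws ExecutionException (wrapping the broker error) on failure.
RecordMetadata metadata = producer.send(record).get();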

Interview Application

Common Interview Question 1

Q: "How would you optimize a producer that's sending 100,000 small messages per second, causing high CPU and network usage?"

Strong Answer:

"The issue is likely excessive network overhead from sending each message individually. I'd implement producer batching:

Diagnosis:

  • Current: 100K messages/sec at ~1 KB each = 100K network requests/sec
  • Network overhead: ~50% of bandwidth wasted on headers
  • CPU overhead: 100K serialize/send operations every second

Solution:

// Enable aggressive batching
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 65536);    // 64 KB
props.put(ProducerConfig.LINGER_MS_CONFIG, 20);         // 20ms window
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");

Result:

  • Batching: 100K requests/sec → ~2K batches/sec (50x reduction)
  • Compression: 64 KB → ~15 KB per batch (~4x savings)
  • Network: 98% reduction in requests
  • CPU: 95% reduction in per-message overhead
  • Added latency: ~20ms (acceptable for most use cases)

Tradeoff: 20ms added latency vs 50x throughput improvement. For log/event streaming, this is optimal."

Why this is good:

  • Quantifies the problem
  • Provides specific configuration
  • Explains each parameter choice
  • Analyzes tradeoffs explicitly
  • Gives measurable results

Common Interview Question 2

Q: "Your Kafka producer is dropping messages under high load. How would you debug and fix this?"

Strong Answer:

"Message drops under load suggest buffer memory exhaustion. Here's my approach:

Diagnosis Steps:

  1. Check JMX metric: buffer-available-bytes → likely near 0
  2. Check logs for BufferExhaustedException
  3. Check max.block.ms timeout (default 60s)

Root Cause Analysis:

  • Batches accumulating faster than sender thread can send
  • Possible causes:
    • Network slowness (broker response time)
    • Too small buffer.memory for traffic volume
    • Inefficient batching (small batches = more sends)

Solutions (in order):

1. Increase buffer memory:

props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 268435456); // 256 MB

2. Optimize batching for throughput:

props.put(ProducerConfig.BATCH_SIZE_CONFIG, 131072);    // 128 KB
props.put(ProducerConfig.LINGER_MS_CONFIG, 50);         // Wait for fuller batches
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");

3. Application-level backpressure:

try {
    producer.send(record, (metadata, exception) -> {
        if (exception != null) {
            // Broker-side failure: log, retry, or dead-letter the record
        }
    });
} catch (BufferExhaustedException e) {
    // Thrown synchronously once send() has blocked for max.block.ms
    // Apply backpressure: retry with exponential backoff,
    // or shed load (return 503 to clients)
}

Result: Larger buffer + more efficient batching = 10x capacity improvement"

Why this is good:

  • Systematic debugging approach
  • Multiple solution layers
  • Specific metrics to check
  • Code examples
  • Explains root cause clearly

Red Flags to Avoid

  • ✕ Not understanding latency tradeoff of batching
  • ✕ Setting linger.ms without understanding batch.size
  • ✕ Not considering memory implications
  • ✕ Ignoring compression benefits with batching
  • ✕ Not knowing how to measure batching efficiency

Quick Self-Check

Before moving on, can you:

  • Explain producer batching in 60 seconds?
  • Draw the batching flow from send() to network?
  • List all 4 batch trigger conditions?
  • Explain the latency-throughput tradeoff?
  • Calculate network savings from batching?
  • Configure producer for high-throughput vs low-latency?

Prerequisites

None - this is a foundational performance concept


Next Recommended: Producer Acknowledgments - Understand reliability guarantees