Consumer Groups

How multiple consumers coordinate to process partitions in parallel with fault tolerance, automatic rebalancing, and exclusive partition ownership

TL;DR

Consumer groups enable multiple consumer instances to work together to process partitions from a topic in parallel. Each partition is assigned to exactly one consumer within a group, which allows parallel processing while preserving per-partition ordering. Automatic rebalancing handles consumer failures and scaling.

Visual Overview

[Diagram: Consumer Group Architecture]

Core Explanation

What is a Consumer Group?

A consumer group is a logical collection of consumer instances that work together to consume messages from a topic. The group provides:

  • Load distribution: Partitions spread across consumers
  • Fault tolerance: Partitions of a failed consumer are automatically reassigned to the remaining members
  • Scaling: Add/remove consumers dynamically
  • Coordination: Group coordinator manages partition assignments

Partition Assignment Guarantee

The Golden Rule:

Each partition is assigned to exactly one consumer within a consumer group at any given time.

[Diagram: Partition Assignment Guarantee]

This guarantee ensures:

  • No two members of a group process the same partition concurrently
  • Ordering maintained per partition
  • Clear ownership of each partition
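The rule can be expressed as a simple check over an assignment map. The sketch below is illustrative only: `AssignmentCheck` and `isExclusive` are hypothetical names, not part of any Kafka API, and partitions are assumed to be numbered from 0.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not a Kafka API): validates the one-owner-per-partition rule.
class AssignmentCheck {
    /** Returns true if every partition 0..partitionCount-1 has exactly one owner. */
    static boolean isExclusive(Map<String, List<Integer>> assignment, int partitionCount) {
        Set<Integer> seen = new HashSet<>();
        for (List<Integer> owned : assignment.values()) {
            for (int partition : owned) {
                if (partition < 0 || partition >= partitionCount) return false; // unknown partition
                if (!seen.add(partition)) return false; // second owner for the same partition
            }
        }
        return seen.size() == partitionCount; // no partition left unowned
    }
}
```

An assignment such as A: [0, 1], B: [1, 2] fails the check because partition 1 has two owners.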

How Partition Assignment Works

Assignment Strategies:

// 1. RANGE STRATEGY (default)
// Assigns consecutive partitions to consumers
Topic: user-events (6 partitions)
Consumer A: [0, 1]
Consumer B: [2, 3]
Consumer C: [4, 5]
// Pro: Simple, predictable
// Con: Uneven if partition count doesn't divide evenly

// 2. ROUND-ROBIN STRATEGY
// Distributes partitions one-by-one in round-robin
Topic: user-events (6 partitions)
Consumer A: [0, 3]
Consumer B: [1, 4]
Consumer C: [2, 5]
// Pro: Even distribution
// Con: Less predictable, more partition movement on rebalance

// 3. STICKY STRATEGY
// Minimizes partition movement during rebalance
// Keeps existing assignments when possible
// Pro: Reduces rebalancing overhead
// Con: Slightly more complex
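To make the first two strategies concrete, here is a minimal single-topic simulation. It is a sketch, not Kafka's assignor code: the class and method names are hypothetical, and it assumes the consumer list is already sorted the way the group would order its members.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified single-topic simulation; class and method names are hypothetical.
class AssignmentSketch {
    /** Range: contiguous blocks; the first (partitions % consumers) members get one extra. */
    static Map<String, List<Integer>> range(List<String> consumers, int partitions) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int base = partitions / consumers.size();
        int extra = partitions % consumers.size();
        int next = 0;
        for (int i = 0; i < consumers.size(); i++) {
            int count = base + (i < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int j = 0; j < count; j++) owned.add(next++);
            result.put(consumers.get(i), owned);
        }
        return result;
    }

    /** Round-robin: partition p goes to the consumer at index p % consumerCount. */
    static Map<String, List<Integer>> roundRobin(List<String> consumers, int partitions) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String c : consumers) result.put(c, new ArrayList<>());
        for (int p = 0; p < partitions; p++) {
            result.get(consumers.get(p % consumers.size())).add(p);
        }
        return result;
    }
}
```

With consumers A, B, C and 6 partitions, `range` yields A: [0, 1], B: [2, 3], C: [4, 5] and `roundRobin` yields A: [0, 3], B: [1, 4], C: [2, 5], matching the listings above. With 7 partitions, `range` gives A three partitions and B and C two each. Because range assignment is applied per topic, that extra partition tends to land on the same first consumers for every subscribed topic.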

Configuration:

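import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

Properties props = new Properties();
// All consumers sharing this group.id form one consumer group
props.put(ConsumerConfig.GROUP_ID_CONFIG, "analytics-processors");
props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
    "org.apache.kafka.clients.consumer.RangeAssignor");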

Group Coordinator and Rebalancing

Coordinator Selection:

[Diagram: Coordinator Selection]
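Coordinator selection is commonly described as hashing the group ID onto a partition of the internal __consumer_offsets topic; the broker leading that partition acts as the group's coordinator. The sketch below mirrors that description under stated assumptions: `CoordinatorLookup` is a hypothetical name, and the partition count is passed in (50 is the default for offsets.topic.num.partitions).

```java
// Sketch of the commonly described mapping from group.id to a __consumer_offsets partition.
// Class and method names are hypothetical; this is not broker code.
class CoordinatorLookup {
    static int offsetsPartitionFor(String groupId, int offsetsPartitionCount) {
        int hash = groupId.hashCode();
        // Guard against Integer.MIN_VALUE, whose absolute value overflows.
        int nonNegative = (hash == Integer.MIN_VALUE) ? 0 : Math.abs(hash);
        return nonNegative % offsetsPartitionCount;
    }
}
```

Because the mapping depends only on the group ID and the partition count, every member of a group resolves to the same coordinator.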

Rebalancing Protocol (Simplified):

[Diagram: Rebalancing Protocol]
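One trigger for a rebalance is a member that stops sending heartbeats. The toy model below sketches that detection step only; it is not Kafka's coordinator code, and `SessionTracker` with its methods is a hypothetical construct using caller-supplied timestamps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of failure detection; not Kafka's actual coordinator code.
class SessionTracker {
    private final long sessionTimeoutMs;
    private final Map<String, Long> lastHeartbeatMs = new HashMap<>();

    SessionTracker(long sessionTimeoutMs) { this.sessionTimeoutMs = sessionTimeoutMs; }

    void heartbeat(String memberId, long nowMs) { lastHeartbeatMs.put(memberId, nowMs); }

    /** Removes and returns members whose last heartbeat is older than the session timeout. */
    List<String> expire(long nowMs) {
        List<String> expired = new ArrayList<>();
        lastHeartbeatMs.entrySet().removeIf(e -> {
            boolean dead = nowMs - e.getValue() > sessionTimeoutMs;
            if (dead) expired.add(e.getKey());
            return dead;
        });
        return expired; // a non-empty result would trigger a rebalance
    }
}
```

With a 10-second timeout, a member last heard from at t=0 is still alive at t=9s but is expired at t=12s, at which point its partitions would be handed to the remaining members.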

Scaling Patterns

Under-Subscribed (Fewer Consumers than Partitions):

[Diagram: Under-Subscribed Pattern]

Fully-Subscribed (Equal Consumers and Partitions):

[Diagram: Fully-Subscribed Pattern]

Over-Subscribed (More Consumers than Partitions):

[Diagram: Over-Subscribed Pattern]
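The arithmetic behind the three patterns is small enough to sketch directly. `GroupSizing` and its methods are hypothetical helpers, and the per-consumer figure assumes an even split across members.

```java
// Hypothetical helper illustrating the three scaling patterns.
class GroupSizing {
    /** Consumers that own at least one partition. */
    static int active(int consumers, int partitions) {
        return Math.min(consumers, partitions);
    }

    /** Consumers with no partition; they act only as standby capacity. */
    static int idle(int consumers, int partitions) {
        return Math.max(0, consumers - partitions);
    }

    /** Largest number of partitions any single consumer owns under an even split. */
    static int maxPartitionsPerConsumer(int consumers, int partitions) {
        return (partitions + consumers - 1) / consumers; // ceiling division
    }
}
```

For a 10-partition topic, 4 consumers means no idle members and at most 3 partitions each, while 15 consumers means 10 active and 5 idle.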

Multiple Consumer Groups

Independent Processing:

[Diagram: Multiple Consumer Groups]

Use Case - Multiple Processing Pipelines:

[Diagram: Multiple Processing Pipelines]
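Groups are independent because each one tracks its own committed offset per partition. The in-memory model below is a sketch of that idea only: real offsets are stored in the __consumer_offsets topic, the group names in the usage are made up, and returning 0 for a group with no commit is a simplification (in Kafka the starting point is governed by auto.offset.reset).

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory model; real offsets live in the __consumer_offsets topic.
class GroupOffsets {
    // key: "groupId/partition" -> next offset to read
    private final Map<String, Long> committed = new HashMap<>();

    void commit(String groupId, int partition, long offset) {
        committed.put(groupId + "/" + partition, offset);
    }

    /** Returns the group's committed offset, or 0 if the group has not committed yet. */
    long position(String groupId, int partition) {
        return committed.getOrDefault(groupId + "/" + partition, 0L);
    }
}
```

If a hypothetical "analytics" group commits offset 120 on partition 0 while "billing" commits 45, neither affects the other, and a newly created group starts from its own position.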

Tradeoffs

Advantages:

  • ✓ Horizontal scalability (add more consumers)
  • ✓ Automatic fault tolerance (consumer failures handled)
  • ✓ Load balancing across consumers
  • ✓ Multiple independent processing pipelines (multiple groups)

Disadvantages:

  • ✕ Rebalancing pauses processing (stop-the-world under the eager protocol; cooperative rebalancing in Kafka 2.4+ reduces the pause)
  • ✕ Cannot scale beyond partition count
  • ✕ Partition assignment may be uneven
  • ✕ Rebalancing overhead on frequent consumer changes

Real Systems Using This

Kafka (Apache)

  • Implementation: Each group's coordinator is the broker that leads the __consumer_offsets partition the group ID hashes to
  • Scale: Thousands of consumer groups processing trillions of messages
  • Typical Setup: 10-50 consumers per group for high-throughput topics

Amazon Kinesis

  • Implementation: Kinesis Client Library (KCL) provides similar consumer group semantics
  • Scale: Auto-scaling consumer groups based on shard count
  • Typical Setup: 1 worker per shard, auto-scaling with shard splits/merges

Apache Pulsar

  • Implementation: Shared and Key_Shared subscriptions (similar to consumer groups, but messages are distributed per message or per key rather than per partition)
  • Scale: Automatic load rebalancing without stop-the-world pauses
  • Typical Setup: Dynamic consumer scaling with minimal disruption

When to Use Consumer Groups

✓ Perfect Use Cases

High-Throughput Event Processing

[Diagram: High-Throughput Event Processing]

Parallel Data Pipeline

[Diagram: Parallel Data Pipeline]

Multiple Processing Pipelines

[Diagram: Multiple Processing Pipelines Use Case]

✕ When NOT to Use

Need Broadcast to All Consumers

[Diagram: Broadcast Requirement]
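A single group cannot broadcast, because each partition goes to only one member. A common workaround is to give every instance its own group ID so that each one reads the full topic. The sketch below assumes that approach; `BroadcastConfig` and the prefix used in the usage are hypothetical, while "group.id" is the real consumer config key.

```java
import java.util.Properties;
import java.util.UUID;

// Sketch only: builds consumer properties with a per-instance group id.
class BroadcastConfig {
    static Properties forInstance(String servicePrefix) {
        Properties props = new Properties();
        // A unique group.id means this instance never shares partitions with a peer,
        // so it receives every message on the topic.
        props.put("group.id", servicePrefix + "-" + UUID.randomUUID());
        return props;
    }
}
```

Two calls with the same prefix, for example "cache-invalidator", produce two different group IDs. The tradeoff is that a random ID is not stable across restarts, so a restarted instance has no committed offsets to resume from.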

Very Low Latency Requirements

[Diagram: Very Low Latency Requirements]

More Consumers than Partitions Long-Term

[Diagram: More Consumers than Partitions]

Interview Application

Common Interview Question 1

Q: “You have a topic with 10 partitions. If you deploy 15 consumers in the same consumer group, what happens?”

Strong Answer:

“Only 10 consumers will be active, one per partition. The remaining 5 will sit idle, since each partition can be assigned to only one consumer in a group; they add no throughput, though they can take over if an active consumer fails. To put all 15 consumers to work, I’d increase the partition count to 15 or more, or split the workload across multiple topics. If further scaling is anticipated, I’d over-provision partitions upfront: Kafka allows adding partitions later but never removing them, and adding them changes the key-to-partition mapping, which breaks per-key ordering for existing keys.”

Why this is good:

  • Shows understanding of partition assignment constraint
  • Identifies the inefficiency
  • Provides multiple solutions
  • Considers future scaling
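One reason partition counts are usually chosen generously upfront is that hash-based partitioning ties each key to a partition through the partition count. The sketch below illustrates the effect with a stand-in hash: Kafka's default partitioner uses murmur2 on the serialized key rather than `String.hashCode`, and the class, methods, and "user-N" keys are all hypothetical.

```java
// Stand-in partitioner: Kafka's default uses murmur2 on the serialized key,
// but any hash-mod-N scheme shows the same effect. Names are hypothetical.
class KeyPartitioning {
    static int partitionFor(String key, int partitionCount) {
        return (key.hashCode() & 0x7fffffff) % partitionCount;
    }

    /** Counts keys that land on a different partition after the count changes. */
    static int movedKeys(int keyCount, int before, int after) {
        int moved = 0;
        for (int i = 0; i < keyCount; i++) {
            String key = "user-" + i;
            if (partitionFor(key, before) != partitionFor(key, after)) moved++;
        }
        return moved;
    }
}
```

Calling `KeyPartitioning.movedKeys(1000, 10, 15)` reports how many of the sample keys would switch partitions when growing from 10 to 15. Any moved key loses its ordering relationship with messages already written to its old partition.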

Common Interview Question 2

Q: “What happens during a consumer group rebalance? How does it affect processing?”

Strong Answer:

“Rebalancing occurs when consumers join, leave, or crash. The process:

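  1. Coordinator detects the change (missed heartbeats past session.timeout.ms, or an explicit join/leave request)
  2. Coordinator signals REBALANCE_IN_PROGRESS to group members in their heartbeat responses
  3. Consumers stop processing, commit their offsets, and (under the eager protocol) give up their partitions
  4. All consumers re-join the group
  5. The group leader (one of the consumers) computes new partition assignments using the configured strategy, and the coordinator distributes them
  6. Consumers receive new assignments and resume processing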

Impact: Processing pauses for ~500ms to several seconds. In production, we minimize rebalances by:

  • Using static membership (Kafka 2.3+) to avoid rebalances on restarts
  • Tuning session.timeout.ms and heartbeat.interval.ms
  • Using the sticky (or cooperative sticky) assignor to minimize partition movement
  • Graceful shutdowns with proper leave group notifications”
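The tuning points in this answer map onto a handful of consumer settings. The sketch below collects them in one place: the config keys are real Kafka consumer settings, but `StableGroupConfig` is a hypothetical helper and the values are illustrative, not recommendations.

```java
import java.util.Properties;

// Sketch of consumer settings that reduce rebalance frequency and cost.
// The config keys are real Kafka consumer settings; the values are illustrative only.
class StableGroupConfig {
    static Properties build(String groupId, String instanceId) {
        Properties props = new Properties();
        props.put("group.id", groupId);
        // Static membership: a restart with the same id rejoins without a rebalance
        // as long as it returns within session.timeout.ms.
        props.put("group.instance.id", instanceId);
        props.put("session.timeout.ms", "30000");
        // Commonly kept at no more than one third of the session timeout.
        props.put("heartbeat.interval.ms", "10000");
        // Cooperative sticky assignor (Kafka 2.4+): moves only the partitions that must move.
        props.put("partition.assignment.strategy",
            "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
        return props;
    }
}
```

Each instance needs its own stable `instanceId` (for example a pod or host name); two live consumers sharing one ID would conflict.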

Why this is good:

  • Detailed step-by-step understanding
  • Quantifies the impact
  • Shows production awareness
  • Provides optimization strategies

Red Flags to Avoid

  • ✕ Confusing consumer groups with partition replicas
  • ✕ Claiming you can assign same partition to multiple consumers in one group
  • ✕ Not knowing about rebalancing and its impact
  • ✕ Forgetting that consumers beyond the partition count sit idle and add no throughput

Quick Self-Check

Before moving on, can you:

  • Explain consumer groups in 60 seconds?
  • Draw a diagram showing partition-to-consumer assignment?
  • Explain what triggers a rebalance?
  • Calculate optimal consumer count given partition count?
  • Identify when to use multiple consumer groups?
  • Explain the partition assignment guarantee?

Used In Systems

  • Real-Time Analytics Pipelines - Consumer groups for parallel processing
  • Event-Driven Microservices - Multiple consumer groups per service

Explained In Detail

  • Kafka Architecture - Consumer Groups & Rebalancing section (30 minutes)
  • Deep dive into rebalancing protocols, partition assignment strategies, and coordinator mechanics

Next Recommended: Offset Management - Learn how consumers track their position in partitions

Interview Notes

  • Must-Know
  • Interview Relevance: 90% of messaging interviews
  • Production Impact: Powers systems at LinkedIn, Uber, Netflix
  • Performance: Billions of messages
  • Scalability: Hundreds of parallel workers