
Consumer Groups

10 min · Intermediate · Messaging

How multiple consumers coordinate to process partitions in parallel, with fault tolerance, automatic rebalancing, and exclusive per-partition ownership

⭐ Must-Know
💼 Interview Relevance: 90% of messaging interviews
🏭 Production Impact: powers systems at LinkedIn, Uber, Netflix
⚡ Performance: billions of messages processed
📈 Scalability: hundreds of parallel workers

TL;DR

Consumer groups enable multiple consumer instances to work together to process partitions from a topic in parallel. Each partition is assigned to exactly one consumer within a group, providing parallel processing while maintaining ordering guarantees. Automatic rebalancing handles failures and scaling.

Visual Overview

CONSUMER GROUP ARCHITECTURE:

Topic: user-events (4 partitions)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚Part 0   β”‚Part 1   β”‚Part 2   β”‚Part 3   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
     β”‚        β”‚        β”‚        β”‚
     β”‚        β”‚        β”‚        β”‚
     β–Ό        β–Ό        β–Ό        β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚Consumer β”‚Consumer β”‚Consumer β”‚Consumer β”‚
β”‚   A     β”‚   B     β”‚   C     β”‚   D     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Group: "analytics-processors"

KEY GUARANTEES:
β”œβ”€β”€ Each partition assigned to exactly ONE consumer in group
β”œβ”€β”€ Each consumer can handle multiple partitions
β”œβ”€β”€ Automatic rebalancing on member changes
└── Fault tolerance through coordinator failover

REBALANCING SCENARIOS:
1. Consumer joins:   a 5th consumer joins a 4-consumer group (rebalance)
2. Consumer crashes: group shrinks from 4 to 3 consumers (rebalance)
3. Partition added:  new partition needs an owner (rebalance)

Core Explanation

What is a Consumer Group?

A consumer group is a logical collection of consumer instances that work together to consume messages from a topic. The group provides:

  • Load distribution: Partitions spread across consumers
  • Fault tolerance: Failed consumers automatically replaced
  • Scaling: Add/remove consumers dynamically
  • Coordination: Group coordinator manages partition assignments

Partition Assignment Guarantee

The Golden Rule:

Each partition is assigned to exactly one consumer within a consumer group at any given time.

VALID ASSIGNMENT (4 partitions, 3 consumers):
Consumer A: [Partition 0, Partition 1]
Consumer B: [Partition 2]
Consumer C: [Partition 3]
βœ“ Each partition assigned exactly once

INVALID ASSIGNMENT:
Consumer A: [Partition 0]
Consumer B: [Partition 0]  βœ• Partition 0 assigned twice!

This guarantee ensures:

  • No duplicate processing within a group
  • Ordering maintained per partition
  • Clear ownership of each partition

How Partition Assignment Works

Assignment Strategies:

// 1. RANGE STRATEGY (default)
// Assigns consecutive partitions to consumers
Topic: user-events (6 partitions)
Consumer A: [0, 1]
Consumer B: [2, 3]
Consumer C: [4, 5]
// Pro: Simple, predictable
// Con: Uneven if partition count doesn't divide evenly

// 2. ROUND-ROBIN STRATEGY
// Distributes partitions one-by-one in round-robin
Topic: user-events (6 partitions)
Consumer A: [0, 3]
Consumer B: [1, 4]
Consumer C: [2, 5]
// Pro: Even distribution
// Con: Less predictable, more partition movement on rebalance

// 3. STICKY STRATEGY
// Minimizes partition movement during rebalance
// Keeps existing assignments when possible
// (The CooperativeStickyAssignor variant also avoids stop-the-world pauses)
// Pro: Reduces rebalancing overhead
// Con: Slightly more complex
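The range and round-robin layouts above can be reproduced with two small helper functions. This is a simplified sketch for a single topic, not the actual `RangeAssignor`/`RoundRobinAssignor` implementations (which also handle multiple topics and per-consumer subscriptions):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AssignmentSketch {
    // Range: give each consumer a consecutive chunk; the first
    // (partitions % consumers) consumers take one extra partition.
    static Map<Integer, List<Integer>> range(int partitions, int consumers) {
        Map<Integer, List<Integer>> assignment = new TreeMap<>();
        int quota = partitions / consumers;
        int extra = partitions % consumers;
        int next = 0;
        for (int c = 0; c < consumers; c++) {
            List<Integer> owned = new ArrayList<>();
            int count = quota + (c < extra ? 1 : 0);
            for (int i = 0; i < count; i++) owned.add(next++);
            assignment.put(c, owned);
        }
        return assignment;
    }

    // Round-robin: deal partitions out one at a time.
    static Map<Integer, List<Integer>> roundRobin(int partitions, int consumers) {
        Map<Integer, List<Integer>> assignment = new TreeMap<>();
        for (int c = 0; c < consumers; c++) assignment.put(c, new ArrayList<>());
        for (int p = 0; p < partitions; p++) assignment.get(p % consumers).add(p);
        return assignment;
    }

    public static void main(String[] args) {
        System.out.println("range:       " + range(6, 3));      // {0=[0, 1], 1=[2, 3], 2=[4, 5]}
        System.out.println("round-robin: " + roundRobin(6, 3)); // {0=[0, 3], 1=[1, 4], 2=[2, 5]}
    }
}
```

Note how range assignment skews toward the first consumers when the partition count does not divide evenly, which is exactly the "uneven" con listed above.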

Configuration:

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "analytics-processors");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
    "org.apache.kafka.clients.consumer.RangeAssignor");

Group Coordinator and Rebalancing

Coordinator Selection:

Group ID: "analytics-processors"
                ↓
    hash(groupId) % num_partitions(__consumer_offsets)
                ↓
    Partition 23 in __consumer_offsets
                ↓
    Broker 2 (leader of partition 23)
                ↓
    Broker 2 becomes Group Coordinator
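The lookup above is a single line of arithmetic. A minimal sketch (simplified from the broker's actual computation; 50 is the default `offsets.topic.num.partitions`):

```java
public class CoordinatorLookup {
    // __consumer_offsets has 50 partitions by default
    // (broker setting offsets.topic.num.partitions).
    static final int OFFSETS_TOPIC_PARTITIONS = 50;

    // Simplified version of the broker's lookup: hash the group id
    // onto a partition of __consumer_offsets; whichever broker leads
    // that partition acts as the group coordinator.
    static int coordinatorPartition(String groupId) {
        return Math.abs(groupId.hashCode() % OFFSETS_TOPIC_PARTITIONS);
    }

    public static void main(String[] args) {
        System.out.println("group 'analytics-processors' -> __consumer_offsets partition "
            + coordinatorPartition("analytics-processors"));
    }
}
```

Because the mapping is deterministic, every broker can compute the same coordinator for a given group without extra coordination.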

Rebalancing Protocol (Simplified):

REBALANCING FLOW:

1. TRIGGER EVENT
   β”œβ”€β”€ Consumer joins group
   β”œβ”€β”€ Consumer leaves/crashes
   β”œβ”€β”€ Consumer heartbeat timeout
   └── Partition count changes

2. COORDINATOR INITIATES REBALANCE
   ├── Signals REBALANCE_IN_PROGRESS via heartbeat responses
   └── Consumers stop processing, commit offsets

3. JOIN GROUP PHASE
   β”œβ”€β”€ All consumers re-join group
   β”œβ”€β”€ Send their supported partition assignment strategies
   └── Coordinator collects member info

4. ASSIGNMENT PHASE
   β”œβ”€β”€ Coordinator runs assignment strategy
   β”œβ”€β”€ Calculates new partition assignments
   └── Sends assignments to consumers

5. RESUME PROCESSING
   └── Consumers start consuming from new assignments

TOTAL REBALANCE TIME: ~500ms to several seconds

Scaling Patterns

Under-Subscribed (Fewer Consumers than Partitions):

4 Partitions, 2 Consumers:
Consumer A: [P0, P1]
Consumer B: [P2, P3]

Throughput: 2x (2 parallel consumers)
Utilization: 100% (all consumers busy)

Fully-Subscribed (Equal Consumers and Partitions):

4 Partitions, 4 Consumers:
Consumer A: [P0]
Consumer B: [P1]
Consumer C: [P2]
Consumer D: [P3]

Throughput: 4x (4 parallel consumers)
Utilization: 100% (optimal)

Over-Subscribed (More Consumers than Partitions):

4 Partitions, 6 Consumers:
Consumer A: [P0]
Consumer B: [P1]
Consumer C: [P2]
Consumer D: [P3]
Consumer E: []  ⚠️ IDLE
Consumer F: []  ⚠️ IDLE

Throughput: 4x (limited by partitions)
Utilization: 67% (2 consumers wasted)

βœ• Cannot scale beyond partition count!

Multiple Consumer Groups

Independent Processing:

Topic: user-events (4 partitions)
        β”‚
        β”œβ”€β”€β”€β–Ί Group: "analytics" (processes all events)
        β”‚     Consumer A: [P0, P1]
        β”‚     Consumer B: [P2, P3]
        β”‚
        └───► Group: "fraud-detection" (also processes all events)
              Consumer X: [P0, P1, P2, P3]

Each group independently consumes ALL messages.
Groups do NOT affect each other.

Use Case - Multiple Processing Pipelines:

Topic: "user-actions"

Group 1: "real-time-analytics"
  β†’ Processes events for live dashboards

Group 2: "ml-feature-pipeline"
  β†’ Extracts features for ML models

Group 3: "audit-logger"
  β†’ Archives events for compliance

All three groups consume the SAME messages independently.

Tradeoffs

Advantages:

  • βœ“ Horizontal scalability (add more consumers)
  • βœ“ Automatic fault tolerance (consumer failures handled)
  • βœ“ Load balancing across consumers
  • βœ“ Multiple independent processing pipelines (multiple groups)

Disadvantages:

  • βœ• Rebalancing causes processing pause (stop-the-world)
  • βœ• Cannot scale beyond partition count
  • βœ• Partition assignment may be uneven
  • βœ• Rebalancing overhead on frequent consumer changes

Real Systems Using This

Kafka (Apache)

  • Implementation: Group coordinator per partition in __consumer_offsets
  • Scale: Thousands of consumer groups processing trillions of messages
  • Typical Setup: 10-50 consumers per group for high-throughput topics

Amazon Kinesis

  • Implementation: Kinesis Client Library (KCL) provides similar consumer group semantics
  • Scale: Auto-scaling consumer groups based on shard count
  • Typical Setup: 1 worker per shard, auto-scaling with shard splits/merges

Apache Pulsar

  • Implementation: Shared subscription model (similar to consumer groups)
  • Scale: Automatic load rebalancing without stop-the-world pauses
  • Typical Setup: Dynamic consumer scaling with minimal disruption

When to Use Consumer Groups

βœ“ Perfect Use Cases

High-Throughput Event Processing

Scenario: Processing 1M events/sec from user activity stream
Solution: Consumer group with 100 consumers (10K events/sec each)
Result: Linear scaling, automatic fault tolerance

Parallel Data Pipeline

Scenario: Real-time ETL from Kafka to data warehouse
Solution: Consumer group with partitions = number of available cores
Result: Maximize parallelism while maintaining ordering per partition

Multiple Processing Pipelines

Scenario: Same events need processing by analytics, ML, and audit systems
Solution: Three separate consumer groups on same topic
Result: Independent processing without interfering with each other

βœ• When NOT to Use

Need Broadcast to All Consumers

Problem: Every consumer must receive ALL messages
Issue: Consumer groups distribute messages (each gets subset)
Alternative: Give each consumer its own group ID (every group receives all messages)

Very Low Latency Requirements

Problem: Sub-millisecond latency critical
Issue: Rebalancing causes temporary processing pause
Alternative: Single consumer or fixed partition assignment

More Consumers than Partitions Long-Term

Problem: Want to run 100 consumers with only 10 partitions
Issue: 90 consumers will be idle, wasting resources
Alternative: Increase partition count or reduce consumers

Interview Application

Common Interview Question 1

Q: β€œYou have a topic with 10 partitions. If you deploy 15 consumers in the same consumer group, what happens?”

Strong Answer:

β€œOnly 10 consumers will be active - one per partition. The remaining 5 consumers will be idle since each partition can only be assigned to one consumer in a group. This is inefficient. To utilize all 15 consumers, I’d either increase the partition count to 15+, or split the workload across multiple topics. If scaling further is anticipated, I’d over-provision partitions upfront since changing partition count requires topic recreation.”

Why this is good:

  • Shows understanding of partition assignment constraint
  • Identifies the inefficiency
  • Provides multiple solutions
  • Considers future scaling

Common Interview Question 2

Q: β€œWhat happens during a consumer group rebalance? How does it affect processing?”

Strong Answer:

β€œRebalancing occurs when consumers join, leave, or crash. The process:

  1. Coordinator detects the change (heartbeat timeout or explicit notification)
  2. Sends REBALANCE_IN_PROGRESS to all group members
  3. Consumers stop processing and commit their offsets
  4. All consumers re-join the group
  5. Coordinator calculates new partition assignments using the configured strategy
  6. Consumers receive new assignments and resume processing

Impact: Processing pauses for ~500ms to several seconds. In production, we minimize rebalances by:

  • Using static membership (Kafka 2.3+) to avoid rebalances on restarts
  • Tuning session.timeout.ms and heartbeat.interval.ms
  • Using sticky assignor to minimize partition movement
  • Graceful shutdowns with proper leave group notifications”

Why this is good:

  • Detailed step-by-step understanding
  • Quantifies the impact
  • Shows production awareness
  • Provides optimization strategies
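Those mitigations map onto consumer configuration roughly as follows. This is an illustrative sketch with plain string keys (real client code would typically use `ConsumerConfig` constants, and `analytics-worker-1` is a made-up instance id):

```java
import java.util.Properties;

public class RebalanceTuning {
    static Properties props() {
        Properties props = new Properties();
        // Static membership (Kafka 2.3+): restarts with the same
        // group.instance.id do not trigger a rebalance.
        // "analytics-worker-1" is a made-up id for illustration.
        props.put("group.instance.id", "analytics-worker-1");
        // How long the coordinator waits for heartbeats before
        // evicting a member and rebalancing.
        props.put("session.timeout.ms", "45000");
        props.put("heartbeat.interval.ms", "3000");
        // Cooperative sticky assignment: incremental rebalances that
        // keep most partitions on their current owners.
        props.put("partition.assignment.strategy",
            "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
        return props;
    }

    public static void main(String[] args) {
        props().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Graceful shutdown is handled in code rather than config: calling `consumer.close()` sends a leave-group request so the coordinator rebalances immediately instead of waiting for a session timeout.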

Red Flags to Avoid

  • βœ• Confusing consumer groups with partition replicas
  • βœ• Claiming you can assign same partition to multiple consumers in one group
  • βœ• Not knowing about rebalancing and its impact
  • βœ• Forgetting that consumer count cannot exceed partition count for effectiveness

Quick Self-Check

Before moving on, can you:

  • Explain consumer groups in 60 seconds?
  • Draw a diagram showing partition-to-consumer assignment?
  • Explain what triggers a rebalance?
  • Calculate optimal consumer count given partition count?
  • Identify when to use multiple consumer groups?
  • Explain the partition assignment guarantee?

Explained In Detail

  • Kafka Architecture - Consumer Groups & Rebalancing section (30 minutes)
  • Deep dive into rebalancing protocols, partition assignment strategies, and coordinator mechanics

Next Recommended: Offset Management - Learn how consumers track their position in partitions