~100s Visual Explainer

Consumer Group Rebalancing

How Kafka redistributes partitions among consumers when members join, leave, or fail.

[Animation, six frames: topic "events" (6 partitions, P0 to P5) and consumer group "order-processors".
1. Balanced: C1 has P0, P1; C2 has P2, P3; C3 has P4, P5.
2. C2 crashes: session timeout 10s; P2 and P3 are orphaned.
3. Rebalancing: C1 and C3 are paused.
4. Partitions redistributed: C1 has P0, P1, P2; C3 has P3, P4, P5.
5. Rebalance complete: processing resumes from committed offsets.
6. Scaling up, C4 joins: C1 has P0, P1; C3 has P2, P3; C4 has P4, P5. Maximum parallelism is 6 consumers.]

Balanced Consumer Group

A consumer group is processing a topic with 6 partitions. Three consumers share the load, each handling 2 partitions. This is an even distribution: every consumer carries the same load, and the group could grow to 6 consumers before parallelism is capped by the partition count.

Each consumer regularly sends heartbeats to the group coordinator.

  • One partition → exactly one consumer (within group)
  • Even distribution maximizes throughput
  • Heartbeats prove liveness
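
The even split above can be sketched in a few lines of Python. This is only an illustration, not Kafka's assignor code: the function names assign_contiguous and each_partition_has_one_owner are made up for this example, and the contiguous chunking is merely similar in spirit to range-style assignment on a single topic.

```python
from collections import Counter

def assign_contiguous(partitions, consumers):
    """Split partitions into contiguous, near-equal chunks, one per consumer."""
    base, extra = divmod(len(partitions), len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(consumers):
        size = base + (1 if i < extra else 0)  # first `extra` consumers get one more
        assignment[consumer] = partitions[start:start + size]
        start += size
    return assignment

def each_partition_has_one_owner(assignment, partitions):
    """Check the group invariant: every partition has exactly one consumer."""
    owners = Counter(p for parts in assignment.values() for p in parts)
    return all(owners[p] == 1 for p in partitions)

partitions = [f"P{i}" for i in range(6)]
assignment = assign_contiguous(partitions, ["C1", "C2", "C3"])
print(assignment)
print(each_partition_has_one_owner(assignment, partitions))
```

With 6 partitions and 3 consumers this reproduces the layout in the first frame: C1 gets P0 and P1, C2 gets P2 and P3, C3 gets P4 and P5.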

Consumer C2 Crashes!

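C2 stops sending heartbeats. After the session timeout (10 seconds in this example; the session.timeout.ms default is 45 seconds since Kafka 3.0 and was 10 seconds in earlier versions), the coordinator considers C2 dead. Its partitions (P2, P3) are now orphaned.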

Messages accumulate on these partitions — no one is consuming them.

  • Session timeout detects failures
  • Orphaned partitions stop processing
  • Lag increases during failure
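
Failure detection by session timeout can be sketched as a simple timestamp comparison. This is a simplified model, not the coordinator's real implementation: the function expired_members, the timestamps, and the 10-second constant are assumptions chosen to match this walkthrough.

```python
SESSION_TIMEOUT_S = 10.0  # value used in this walkthrough, not necessarily your broker's

def expired_members(last_heartbeat, now, session_timeout=SESSION_TIMEOUT_S):
    """Return members whose most recent heartbeat is older than the session timeout."""
    return sorted(
        member
        for member, seen_at in last_heartbeat.items()
        if now - seen_at > session_timeout
    )

# Hypothetical timestamps in seconds: C2 last heartbeated at t=15 and then crashed.
last_heartbeat = {"C1": 27.0, "C2": 15.0, "C3": 28.0}

print(expired_members(last_heartbeat, now=24.0))  # C2 is silent but still within the timeout
print(expired_members(last_heartbeat, now=30.0))  # C2 has now exceeded the timeout
```

The gap between the crash and the moment the timeout expires is exactly the window in which P2 and P3 accumulate lag, which is why a shorter timeout detects failures faster but also risks evicting consumers that are only briefly slow.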

Group Coordinator Triggers Rebalance

The coordinator initiates a rebalance. Consumers learn about it through their heartbeat responses, which effectively say: "Stop processing, we're redistributing partitions." Each consumer then rejoins the group.

With the eager rebalance protocol, the entire consumer group pauses for the duration of the rebalance. This is the "stop-the-world" moment.

  • Rebalance affects ALL consumers
  • Processing halts during rebalance
  • Critical for correctness, painful for latency
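
The stop-the-world step can be modelled as every surviving member giving up all of its partitions while the group moves to a new generation. This is a toy model under stated assumptions: eager_rebalance_start is a hypothetical helper, and the real protocol involves join and sync requests that are not shown here.

```python
def eager_rebalance_start(assignment, generation):
    """Model the start of an eager rebalance.

    Every member revokes everything it owns, so nothing is consumed until
    new assignments arrive. The group generation is incremented.
    """
    revoked = {member: list(parts) for member, parts in assignment.items()}
    cleared = {member: [] for member in assignment}
    return cleared, revoked, generation + 1

# C2 has already been removed; C1 and C3 are the surviving members.
survivors = {"C1": ["P0", "P1"], "C3": ["P4", "P5"]}
cleared, revoked, generation = eager_rebalance_start(survivors, generation=1)

print(cleared)     # no member owns any partition during the rebalance
print(revoked)     # what each member had to give up
print(generation)
```

Note that C1 and C3 give up partitions they will most likely get straight back, which is the cost that incremental rebalancing is designed to avoid.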

Partitions Redistributed

Either the coordinator or a designated consumer (the group leader, in the classic protocol) runs the partition assignment strategy, and the coordinator delivers the result to every member. Orphaned partitions are assigned to the remaining consumers.

C1 now handles P0, P1, P2. C3 handles P3, P4, P5. Load is still balanced, though each consumer does more work.

  • Assignment strategies: Range, RoundRobin, Sticky
  • Sticky minimizes partition movement
  • Remaining consumers handle more partitions
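
The idea behind sticky assignment can be sketched as follows: survivors keep what they already own, and only the orphaned partitions move. This is a minimal illustration, not Kafka's StickyAssignor. The function reassign_sticky is hypothetical, and it only handles the case where members leave; it does not take partitions away from existing members when a new one joins.

```python
def reassign_sticky(previous, survivors, partitions):
    """Keep each survivor's partitions; hand orphans to the least-loaded survivor."""
    assignment = {c: list(previous.get(c, [])) for c in survivors}
    owned = {p for parts in assignment.values() for p in parts}
    orphans = [p for p in partitions if p not in owned]
    for p in orphans:
        # min() returns the first least-loaded consumer, so ties go in list order.
        target = min(survivors, key=lambda c: len(assignment[c]))
        assignment[target].append(p)
    return {c: sorted(parts) for c, parts in assignment.items()}

previous = {"C1": ["P0", "P1"], "C2": ["P2", "P3"], "C3": ["P4", "P5"]}
partitions = [f"P{i}" for i in range(6)]

after = reassign_sticky(previous, ["C1", "C3"], partitions)
print(after)
```

For the crash in this walkthrough, only P2 and P3 change owner: C1 ends up with P0, P1, P2 and C3 with P3, P4, P5, matching the frame above.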

Processing Resumes

Consumers receive their new assignments and resume processing. Each consumer seeks to the last committed offset for newly assigned partitions.

The accumulated lag is processed, and the group catches up.

  • Offset tracking enables seamless handoff
  • Lag from pause gets processed
  • Group returns to steady state
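
The handoff can be pictured as a lookup in a shared table of committed offsets. The sketch below uses made-up offset numbers and hypothetical helpers (resume_positions, lag); in a real cluster the committed offsets are stored by Kafka and fetched by the client.

```python
def resume_positions(committed, assigned):
    """A consumer starts each assigned partition at its last committed offset."""
    return {p: committed[p] for p in assigned}

def lag(log_end, committed, assigned):
    """Lag per partition: messages written but not yet committed as processed."""
    return {p: log_end[p] - committed[p] for p in assigned}

# Hypothetical numbers: P2 kept receiving messages while C2 was down.
committed = {"P0": 120, "P1": 98, "P2": 200}
log_end = {"P0": 125, "P1": 98, "P2": 260}

c1_partitions = ["P0", "P1", "P2"]  # C1's assignment after the rebalance
print(resume_positions(committed, c1_partitions))
print(lag(log_end, committed, c1_partitions))
```

Here C1 picks up P2 at offset 200, where C2's last commit left off, and has 60 messages of backlog to work through on that partition.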

Scaling Up: C4 Joins

When traffic increases, we add C4 to the group. This triggers another rebalance, and the partitions redistribute across the three live consumers (C1, C3, and C4), 2 partitions each.

More consumers = more parallelism (up to partition count).

  • Scale by adding consumers
  • Maximum useful consumers = partition count (extras sit idle)
  • Each change triggers rebalance
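
The scaling limit is simple arithmetic: since a partition has only one owner within a group, consumers beyond the partition count receive nothing. The helper active_and_idle below is a hypothetical name used only for this illustration.

```python
def active_and_idle(num_consumers, num_partitions):
    """Return (consumers that receive partitions, consumers left idle)."""
    active = min(num_consumers, num_partitions)
    return active, num_consumers - active

for consumers in (3, 6, 8):
    active, idle = active_and_idle(consumers, num_partitions=6)
    print(f"{consumers} consumers on 6 partitions: {active} active, {idle} idle")
```

With 6 partitions, going from 3 to 6 consumers doubles parallelism, while going from 6 to 8 only adds two idle members. To scale further, the topic needs more partitions.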

What's Next?

Understanding rebalancing helps you design resilient Kafka consumers. Key optimizations include sticky assignment (minimizes partition movement), incremental rebalancing (avoids stop-the-world), and tuning session timeouts for your workload.
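
As a starting point, the optimizations above map onto a handful of consumer settings. The dictionary below is a sketch, not a recommended production configuration: the keys are standard Kafka consumer property names, the values are illustrative, and the "cooperative-sticky" spelling is the one used by librdkafka-based clients (the Java client takes the class name org.apache.kafka.clients.consumer.CooperativeStickyAssignor instead).

```python
# Illustrative values only; tune them for your own workload.
consumer_config = {
    "group.id": "order-processors",
    # Sticky and incremental: keep existing assignments, move only what must move.
    "partition.assignment.strategy": "cooperative-sticky",
    # How long the coordinator waits without heartbeats before declaring a member dead.
    "session.timeout.ms": 45000,
    # How often the consumer heartbeats; Kafka's documentation suggests
    # keeping this at no more than one third of the session timeout.
    "heartbeat.interval.ms": 3000,
}

def heartbeat_fits_session(config):
    """Check the 'heartbeat at most one third of session timeout' guideline."""
    return config["heartbeat.interval.ms"] * 3 <= config["session.timeout.ms"]

print(heartbeat_fits_session(consumer_config))
```

A shorter session timeout shortens the period in which orphaned partitions sit unprocessed, at the price of more rebalances caused by consumers that were only temporarily slow.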