Kafka Topic Partitioning
How Kafka distributes messages across partitions for parallelism and ordering guarantees.
A Topic with Three Partitions
In Kafka, a topic is divided into partitions — ordered, immutable sequences of records. Each partition is an independent log that can be hosted on different brokers.
Think of partitions as parallel lanes on a highway. More lanes = more throughput.
- Topics are logical groupings of related messages
- Partitions enable horizontal scaling
- Each partition maintains strict ordering
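For a concrete starting point, here is a minimal sketch that creates a three-partition topic with the Java AdminClient. The topic name (orders), the replication factor of 1, and the localhost:9092 broker address are illustrative assumptions for a local development setup.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumes a broker is reachable at localhost:9092
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Topic "orders" with 3 partitions, replication factor 1 (single-broker dev setup)
            NewTopic orders = new NewTopic("orders", 3, (short) 1);
            admin.createTopics(Collections.singleton(orders)).all().get();
        }
    }
}
```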
Messages Written by Key
When a producer sends a message, it includes an optional partition key. This key determines which partition receives the message.
Common keys include user IDs, order IDs, or session IDs — anything that groups related events together.
- Keys are optional but recommended for ordering
- Messages without keys are spread across partitions (round-robin, or sticky partitioning in Kafka 2.4+)
- Key choice affects both ordering and load distribution
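A minimal producer sketch, reusing the hypothetical orders topic and broker address from above: the record key (here a user ID) is what routes the message, and the send callback reports which partition it landed in.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The key ("user-42") decides the partition; all events for this user
            // land in the same partition and therefore stay in order.
            ProducerRecord<String, String> record =
                new ProducerRecord<>("orders", "user-42", "order-created");

            producer.send(record, (RecordMetadata metadata, Exception e) -> {
                if (e == null) {
                    System.out.printf("key=user-42 -> partition=%d offset=%d%n",
                        metadata.partition(), metadata.offset());
                }
            });
            producer.flush();
        }
    }
}
```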
Hash Function Determines Partition
Kafka computes hash(key) % num_partitions to determine the target partition. This is deterministic — the same key always maps to the same partition.
This is why adding partitions later requires careful planning. The hash mapping changes, and existing keys may route differently.
- Default partitioner uses murmur2 hash
- Custom partitioners possible for special routing
- Partition count changes affect key distribution
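The snippet below mirrors that computation for keyed records using the murmur2 and toPositive helpers that ship in the Kafka clients library. It is a simplified sketch of the default behaviour, not the full partitioner, which also handles null keys and other cases.

```java
import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.utils.Utils;

public class PartitionForKey {
    // Mirrors what the default partitioner does for keyed messages:
    // murmur2-hash the serialized key bytes, force the result positive,
    // then take it modulo the partition count.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    public static void main(String[] args) {
        // The mapping is deterministic for a fixed partition count...
        System.out.println(partitionFor("user-42", 3));  // same result every run
        // ...but changes when the topic is expanded, which is why adding
        // partitions can re-route existing keys.
        System.out.println(partitionFor("user-42", 4));  // may differ from the value above
    }
}
```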
Order Preserved Within Partitions
Messages within a single partition are strictly ordered by offset. If message A was written before message B, consumers will always see A before B.
However, there's no ordering guarantee across partitions. If you need total ordering, you need a single partition (sacrificing parallelism).
- Offsets are sequential integers per partition
- Same-key messages always land in same partition
- Cross-partition ordering requires application logic
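As an illustration, this consumer sketch tracks the last offset seen per partition and fails if offsets ever move backwards within one partition. Topic name, group id, and broker address are the same hypothetical values used earlier.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OrderingCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "ordering-check");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        // Track the last offset seen per partition to show that offsets
        // only ever increase within a partition.
        Map<Integer, Long> lastOffset = new HashMap<>();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    Long previous = lastOffset.put(record.partition(), record.offset());
                    if (previous != null && record.offset() <= previous) {
                        throw new IllegalStateException("Offsets went backwards within a partition");
                    }
                    System.out.printf("partition=%d offset=%d key=%s%n",
                        record.partition(), record.offset(), record.key());
                }
            }
        }
    }
}
```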
Consumers Claim Partitions
In a consumer group, each partition is assigned to exactly one consumer. This ensures each record is processed by only one consumer within the group.
If you have 3 partitions and 3 consumers, each gets one partition. Add a 4th consumer? It sits idle until a partition becomes available.
- One partition → one consumer (within a group)
- More consumers than partitions = idle consumers
- Rebalancing redistributes partitions on changes
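The sketch below registers a ConsumerRebalanceListener so each group member prints the partitions it has been assigned. Starting three copies against a three-partition topic should show one partition each; a fourth copy should print an empty assignment. The group id orders-processors is an arbitrary example.

```java
import java.time.Duration;
import java.util.Collection;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class GroupMember {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Every instance started with this group id shares the topic's partitions.
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-processors");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders"), new ConsumerRebalanceListener() {
                @Override
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    // With 3 partitions and 3 members, each member gets one;
                    // a 4th member sees an empty assignment here.
                    System.out.println("Assigned: " + partitions);
                }

                @Override
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                    System.out.println("Revoked: " + partitions);
                }
            });

            while (true) {
                consumer.poll(Duration.ofMillis(500));  // processing omitted; this sketch only shows assignment
            }
        }
    }
}
```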
Maximum Parallelism
The maximum parallelism equals the number of partitions. With three partitions, at most three consumers in a group can work simultaneously.
This is why partition count is a critical design decision. Too few partitions limit throughput. Too many create overhead and complicate ordering.
- Parallelism upper bound = partition count
- Plan partition count based on expected throughput
- Typical: 3-12 partitions for moderate topics
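One rough sizing heuristic (a sketch under assumed numbers, not an official formula) is to divide the target throughput by the measured per-partition producer throughput and by the per-consumer throughput, then keep the larger result:

```java
public class PartitionSizing {
    // Rough heuristic: enough partitions that both the producer side and the
    // consumer side can keep up with the target throughput.
    static int suggestPartitions(double targetMBps, double perPartitionProduceMBps,
                                 double perConsumerConsumeMBps) {
        int forProducing = (int) Math.ceil(targetMBps / perPartitionProduceMBps);
        int forConsuming = (int) Math.ceil(targetMBps / perConsumerConsumeMBps);
        return Math.max(forProducing, forConsuming);
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 30 MB/s target, 10 MB/s per partition when producing,
        // 5 MB/s per consumer when processing => 6 partitions.
        System.out.println(suggestPartitions(30, 10, 5));
    }
}
```

The throughput figures are placeholders; in practice they come from benchmarking your own producers and consumers, and it is common to add headroom on top of the computed count.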