Kafka Topic Partitioning
How Kafka distributes messages across partitions for parallelism and ordering guarantees.
A Topic with Three Partitions
In Kafka, a topic is divided into partitions — ordered, immutable sequences of records. Each partition is an independent log that can be hosted on different brokers.
Think of partitions as parallel lanes on a highway. More lanes = more throughput.
- Topics are logical groupings of related messages
- Partitions enable horizontal scaling
- Each partition maintains strict ordering
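For a concrete starting point, here is a minimal sketch that creates a three-partition topic with the Java AdminClient. The topic name (orders), the replication factor of 1, and the localhost:9092 broker address are illustrative assumptions for a local development setup.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumes a broker is reachable at localhost:9092
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Topic "orders" with 3 partitions, replication factor 1 (single-broker dev setup)
            NewTopic orders = new NewTopic("orders", 3, (short) 1);
            admin.createTopics(Collections.singleton(orders)).all().get();
        }
    }
}
```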
Messages Written by Key
When a producer sends a message, it includes an optional partition key. This key determines which partition receives the message.
Common keys include user IDs, order IDs, or session IDs — anything that groups related events together.
- Keys are optional but recommended for ordering
- Messages without keys are spread across partitions (round-robin, or sticky partitioning in Kafka 2.4+)
- Key choice affects both ordering and load distribution
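A minimal producer sketch, reusing the hypothetical orders topic and broker address from above: the record key (here a user ID) is what routes the message, and the send callback reports which partition it landed in.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The key ("user-42") decides the partition; all events for this user
            // land in the same partition and therefore stay in order.
            ProducerRecord<String, String> record =
                new ProducerRecord<>("orders", "user-42", "order-created");

            producer.send(record, (RecordMetadata metadata, Exception e) -> {
                if (e == null) {
                    System.out.printf("key=user-42 -> partition=%d offset=%d%n",
                        metadata.partition(), metadata.offset());
                }
            });
            producer.flush();
        }
    }
}
```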
Hash Function Determines Partition
Kafka computes hash(key) % num_partitions to determine the target partition. This is deterministic — the same key always maps to the same partition.
This is why adding partitions later requires careful planning. The hash mapping changes, and existing keys may route differently.
- Default partitioner uses murmur2 hash
- Custom partitioners possible for special routing
- Partition count changes affect key distribution
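The snippet below mirrors that computation for keyed records using the murmur2 and toPositive helpers that ship in the Kafka clients library. It is a simplified sketch of the default behaviour, not the full partitioner, which also handles null keys and other cases.

```java
import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.utils.Utils;

public class PartitionForKey {
    // Mirrors what the default partitioner does for keyed messages:
    // murmur2-hash the serialized key bytes, force the result positive,
    // then take it modulo the partition count.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    public static void main(String[] args) {
        // The mapping is deterministic for a fixed partition count...
        System.out.println(partitionFor("user-42", 3));  // same result every run
        // ...but changes when the topic is expanded, which is why adding
        // partitions can re-route existing keys.
        System.out.println(partitionFor("user-42", 4));  // may differ from the value above
    }
}
```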
Order Preserved Within Partitions
Messages within a single partition are strictly ordered by offset. If message A was written before message B, consumers will always see A before B.
However, there's no ordering guarantee across partitions. If you need total ordering, you need a single partition (sacrificing parallelism).
- Offsets are sequential integers per partition
- Same-key messages always land in same partition
- Cross-partition ordering requires application logic
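As an illustration, this consumer sketch tracks the last offset seen per partition and fails if offsets ever move backwards within one partition. Topic name, group id, and broker address are the same hypothetical values used earlier.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OrderingCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "ordering-check");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        // Track the last offset seen per partition to show that offsets
        // only ever increase within a partition.
        Map<Integer, Long> lastOffset = new HashMap<>();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    Long previous = lastOffset.put(record.partition(), record.offset());
                    if (previous != null && record.offset() <= previous) {
                        throw new IllegalStateException("Offsets went backwards within a partition");
                    }
                    System.out.printf("partition=%d offset=%d key=%s%n",
                        record.partition(), record.offset(), record.key());
                }
            }
        }
    }
}
```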
Consumers Claim Partitions
In a consumer group, each partition is assigned to exactly one consumer. This ensures each record is processed by only one consumer within the group.
If you have 3 partitions and 3 consumers, each gets one partition. Add a 4th consumer? It sits idle until a partition becomes available.
- One partition → one consumer (within a group)
- More consumers than partitions = idle consumers
- Rebalancing redistributes partitions on changes
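The sketch below registers a ConsumerRebalanceListener so each group member prints the partitions it has been assigned. Starting three copies against a three-partition topic should show one partition each; a fourth copy should print an empty assignment. The group id orders-processors is an arbitrary example.

```java
import java.time.Duration;
import java.util.Collection;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class GroupMember {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Every instance started with this group id shares the topic's partitions.
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-processors");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders"), new ConsumerRebalanceListener() {
                @Override
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    // With 3 partitions and 3 members, each member gets one;
                    // a 4th member sees an empty assignment here.
                    System.out.println("Assigned: " + partitions);
                }

                @Override
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                    System.out.println("Revoked: " + partitions);
                }
            });

            while (true) {
                consumer.poll(Duration.ofMillis(500));  // processing omitted; this sketch only shows assignment
            }
        }
    }
}
```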
Maximum Parallelism
The maximum parallelism equals the number of partitions. With three partitions, at most three consumers in a group can work simultaneously.
This is why partition count is a critical design decision. Too few partitions limit throughput. Too many create overhead and complicate ordering.
- Parallelism upper bound = partition count
- Plan partition count based on expected throughput
- Typical: 3-12 partitions for moderate topics
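One rough sizing heuristic (a sketch under assumed numbers, not an official formula) is to divide the target throughput by the measured per-partition producer throughput and by the per-consumer throughput, then keep the larger result:

```java
public class PartitionSizing {
    // Rough heuristic: enough partitions that both the producer side and the
    // consumer side can keep up with the target throughput.
    static int suggestPartitions(double targetMBps, double perPartitionProduceMBps,
                                 double perConsumerConsumeMBps) {
        int forProducing = (int) Math.ceil(targetMBps / perPartitionProduceMBps);
        int forConsuming = (int) Math.ceil(targetMBps / perConsumerConsumeMBps);
        return Math.max(forProducing, forConsuming);
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 30 MB/s target, 10 MB/s per partition when producing,
        // 5 MB/s per consumer when processing => 6 partitions.
        System.out.println(suggestPartitions(30, 10, 5));
    }
}
```

The throughput figures are placeholders; in practice they come from benchmarking your own producers and consumers, and it is common to add headroom on top of the computed count.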