Consistent Hashing
How distributed systems minimize data movement when nodes change — consistent hashing maps both keys and nodes to a ring.
Traditional Hashing: The Problem
Simple hashing uses hash(key) % num_servers. With 3
servers, adding a 4th means hash(key) % 4 instead of % 3 — almost every key maps to a different server!
This causes a redistribution storm: massive data movement on any topology change.
- Modulo hashing: key_server = hash(key) % N
- Change N → almost all keys move
- Expensive in distributed systems
- Caches invalidated, data copied
The Consistent Hashing Ring
Consistent hashing imagines the hash space as a ring. Instead of modulo, we map both keys AND nodes onto this ring, then assign each key to the nearest node clockwise.
The ring is just a conceptual view of the hash range (e.g., 0 to 2³²-1) wrapped into a circle.
- Hash space forms a circular continuum
- Both nodes and keys are hashed
- Position determines ownership
Nodes Positioned on the Ring
Each node is placed on the ring based on its hash (e.g., hash of IP address). A node "owns" all positions from the previous node up to itself.
N2 at 150° owns everything from 46° to 150°.
- Node position = hash(node_id)
- Ownership range = (prev_node, this_node]
- Even distribution is ideal
- Keys go to first node clockwise
Keys Find Their Node
To find a key's owner: hash the key, then walk clockwise until you hit a node. That node stores the key.
Key "user-123" hashes to 100°. Walking clockwise, the first node is N2 at 150°. N2 owns this key.
- Key position = hash(key)
- Owner = first node clockwise
- Lookup is O(log N) with sorted node list
Add Node: Minimal Movement
When N4 joins at 200°, it takes ownership of keys between 150° and 200° — keys that WERE going to N3. Only those keys move.
On average, roughly 1/(N+1) of keys move when adding a node (about K/N for large N). Much better than modulo hashing!
- Only keys in new range move
- Average: ~K/(N+1) keys affected
- Other nodes untouched
Remove Node: Graceful Handoff
When N2 leaves, its keys simply extend their clockwise journey to the next node (N3). Only N2's keys move.
This is why consistent hashing is used in caching, distributed databases, and load balancers — it enables graceful scaling.
- Removed node's keys go to successor
- Only 1/N keys affected
- Enables graceful scaling up/down