Intentional Deliberate Engineering · Essay

Akka Cluster Sharding vs a Consistent-Hash Ring: What You Trade Away

When I built a small distributed cache I reached for Akka Cluster Sharding instead of writing a consistent-hash ring myself. Six months later I think it was the right call — for a learning project. Here is what each choice actually buys you, and where the marketing gloss falls apart.

Six months ago I wanted a small, real distributed cache to run on my laptop in three nodes — something where I could pull a cable, watch state move, and have an HTTP API I could curl. The honest goal was understanding, not production traffic.

The tradeoff at a glance

Consistent-hash ring versus Akka Cluster Sharding — the abstraction boundary moves, the underlying problems do not.

The instinctive design is a consistent-hash ring. I have written one before. It is a satisfying weekend of work: a sorted set of virtual node tokens, a hash function, a couple of methods that walk the ring clockwise from hash(key) to find the owner, and a notion of replicas to walk further for fault tolerance. Everyone who has read about Dynamo can sketch it on a whiteboard.

I reached for Akka Cluster Sharding instead. Here is why, and what that decision actually costs.

What a consistent-hash ring gives you

The model is small enough to fit in your head:

CONSISTENT-HASH RING
ring = sorted([
  (hash(node, v), node)      # 100-256 vnodes per physical node,
  for node in cluster        # each vnode contributing its own token
  for v in 0..vnodes
])

owners(key) = [
  primary  = ring.successor(hash(key)),       # first vnode clockwise of hash(key)
  replica1 = ring.nextDistinctNode(primary),  # next distinct physical node
  replica2 = ring.nextDistinctNode(replica1)  # and the one after that
]
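
Here is the same model as a runnable Scala sketch; the names (Ring, owners) and the vnode count are illustrative, not lifted from any real codebase:

import scala.collection.immutable.TreeMap
import scala.util.hashing.MurmurHash3

// Minimal sketch: virtual-node tokens in a sorted map, owners found by
// walking clockwise from hash(key). Names and vnode count are illustrative.
final case class Ring(tokens: TreeMap[Int, String]) {

  // First `replicas` distinct physical nodes clockwise from hash(key).
  def owners(key: String, replicas: Int = 3): List[String] = {
    val h = MurmurHash3.stringHash(key)
    val clockwise = tokens.iteratorFrom(h) ++ tokens.iterator  // wrap around
    clockwise.map(_._2).toList.distinct.take(replicas)
  }
}

object Ring {
  def apply(nodes: Seq[String], vnodes: Int = 128): Ring =
    Ring(TreeMap.from(for {
      node <- nodes
      v    <- 0 until vnodes
    } yield MurmurHash3.stringHash(s"$node:$v") -> node))
}

The sketch covers ownership lookup and nothing else; it never notices that the cluster changed, and it never moves a byte of data.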

You own the routing. You own the replica count. You own the membership list — usually backed by a gossip protocol or a strong-coordination layer like etcd. When a node joins or leaves, you walk the ring, identify which key ranges moved, and stream the affected entries to the new owner.

The cost is everything in that last paragraph. You own the routing. Membership detection. Failure detection. Rebalancing. Hand-off in flight. Backpressure during transfer. Quorum during partial partition. Membership, failure detection, and rebalancing alone are 80% of the code in any production distributed cache.

This is not bad — it is just where the work is.

What Akka Cluster Sharding gives you

Akka takes a different shape. Keys are not hashed onto a ring; they are hashed onto a fixed pool of shards (I used 10 — overkill for 3 nodes, intentional for headroom). Each shard has exactly one owning node at steady state — during rebalance there is a brief handoff window where messages to that shard are buffered, not routed. Each entity — in my case, a CacheActor for a single key — lives inside its assigned shard.

AKKA CLUSTER SHARDING

The shard → node assignment is owned by a shard coordinator — a cluster singleton actor — whose state is replicated through Distributed Data (ddata), Akka’s gossip-replicated key-value store backed by CRDTs. When a node joins or leaves, the coordinator computes a new assignment, persists it via ddata, and the per-node ShardRegion actors migrate entities to their new owners. Messages routed through ShardRegion during the handoff are buffered until the destination is ready; messages sent directly to a stale ActorRef end up in dead letters.
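
For concreteness, here is roughly the shape of the wiring with Akka’s typed Cluster Sharding API. This is a hedged sketch, not the code in my repo: the CacheActor protocol and names are illustrative, and the shard count lives in configuration as akka.cluster.sharding.number-of-shards.

import akka.actor.typed.{ActorRef, ActorSystem, Behavior}
import akka.actor.typed.scaladsl.Behaviors
import akka.cluster.sharding.typed.scaladsl.{ClusterSharding, Entity, EntityTypeKey}

// Illustrative protocol: one entity (one actor) per cache key.
object CacheActor {
  sealed trait Command
  final case class Put(value: String)                     extends Command
  final case class Get(replyTo: ActorRef[Option[String]]) extends Command

  val TypeKey: EntityTypeKey[Command] = EntityTypeKey[Command]("CacheEntry")

  def apply(key: String): Behavior[Command] = cached(None)

  // The mailbox serializes access to this state: no locks, no CAS.
  private def cached(value: Option[String]): Behavior[Command] =
    Behaviors.receiveMessage[Command] {
      case Put(v)       => cached(Some(v))
      case Get(replyTo) => replyTo ! value; Behaviors.same
    }
}

object CacheSharding {
  // Run on every node at startup; the coordinator decides which shards
  // (and therefore which entities) this particular node ends up hosting.
  def init(system: ActorSystem[_]): Unit =
    ClusterSharding(system).init(
      Entity(CacheActor.TypeKey)(ctx => CacheActor(ctx.entityId))
    )
}

// Callers route by key through sharding rather than holding entity refs:
//   ClusterSharding(system).entityRefFor(CacheActor.TypeKey, key) ! CacheActor.Put("v")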

The win is what is not in my code:

  • No membership protocol. Akka Cluster handles join/leave/unreachable transitions.
  • No rebalancing logic. Shard coordinator does it, with backpressure.
  • No external dependency. ddata is in-process; no etcd or ZooKeeper.
  • No concurrency control on values. One actor per key — the mailbox is the lock.

For a learning project, this is exactly the right division of labor. I wanted to study consistency models and operator behavior, not reimplement gossip.

Where the marketing gloss falls apart

Both stories sound clean on a slide. Neither is.

Akka Sharding is not a database. ddata holds the shard assignment map, not your entity state. When a node dies, the entities it owned vanish — the shard moves to a new node, but the new node starts those entities cold. If you want replication, you bolt on akka-persistence (with a real journal — not the in-memory one I used for the demo) and accept the latency and operator overhead that comes with it.
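
To make “bolt on akka-persistence” concrete: at minimum the journal plugin points at a real store and the entity becomes an EventSourcedBehavior. The snippet below is purely illustrative; the demo ran on the in-memory journal, and the JDBC plugin is just one example of a “real” journal, not something this repo ships.

import com.typesafe.config.ConfigFactory

// What the demo used: events live in process memory and die with the JVM.
// (Equivalently, a single line in application.conf.)
val demoJournal = ConfigFactory.parseString(
  """akka.persistence.journal.plugin = "akka.persistence.journal.inmem"
    |""".stripMargin)

// What production needs: a real journal plugin (akka-persistence-jdbc shown
// purely as an example), plus rewriting the entity as an EventSourcedBehavior
// and accepting the write latency and operational surface of the store.
val prodJournal = ConfigFactory.parseString(
  """akka.persistence.journal.plugin = "jdbc-journal"
    |""".stripMargin)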

A consistent-hash ring with replication is harder than the diagram suggests. N=3 with R=2, W=2 (so R+W>N for read-your-writes) sounds like CAP-tutorial material. In practice you fight three problems forever: (1) hinted handoff for writes that arrived during a partition, (2) read repair vs. background anti-entropy and the bandwidth budget between them, and (3) reconciling a node that returns with stale data after the replica set has already accepted newer writes. Cassandra and Riak each spent years on these. Yours will be less robust than either, by a wide margin, on day one.

Both lose data in the same place: when an unreplicated node dies. This is not a flaw in either model — it is the cache contract. You promise speed, not durability. The trap is the README that claims “fault tolerance” because the cluster survives the node death; the data on that node, by default, does not.

What I actually saw when I pulled the cable

The reason to build the thing instead of read about it: behavior under partition is rarely what the diagram suggests.

Killing one node of three (SIGKILL, not graceful leave) produced a measurable gap before the cluster reacted. With Akka’s defaults there is no automatic downing: the unreachable node stays “unreachable but not removed” until a downing provider or a human marks it down. During that window, requests to entities owned by the dead node’s shards either time out or pile up in the ShardRegion’s buffer — the rebalance does not start. With the Split Brain Resolver configured (keep-majority), the gap dropped to a few seconds; without it, it was indefinite. This is the single most useful thing I learned: the failure-detection-to-rebalance latency is your real availability budget, and it is a tuning problem, not a design choice.
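
What shrank that gap is a few lines of configuration. A hedged sketch of what I mean: these are Akka’s own Split Brain Resolver settings, with an illustrative stable-after value rather than the one I actually tuned against.

import com.typesafe.config.ConfigFactory

// Illustrative only: enable Akka's Split Brain Resolver with keep-majority.
// stable-after is how long membership must look stable before SBR acts; 20s
// is an example, not a recommendation. (Equivalently, put these lines
// straight into application.conf.)
val downingConfig = ConfigFactory.parseString(
  """akka.cluster.downing-provider-class = "akka.cluster.sbr.SplitBrainResolverProvider"
    |akka.cluster.split-brain-resolver {
    |  active-strategy = keep-majority
    |  stable-after    = 20s
    |}
    |""".stripMargin)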

A consistent-hash ring would have shown me the same lesson, dressed differently — failure detector tunings, suspicion thresholds, gossip convergence time. The names change; the shape of the problem does not.

The honest tradeoff

If your goal is to study consistent hashing, write the ring. If your goal is to study how a sharded service behaves under partition and rolling restart, use Akka Sharding (or its analog in your runtime — Orleans, Service Fabric, Ray) and spend your effort on the failure modes rather than the routing math.

If your goal is a production cache, do neither. Use Redis Cluster, or DynamoDB, or whatever your team already operates.

The mistake I made on the first version of my repo’s README was claiming “consistent hashing, ONE/QUORUM/ALL consistency, fault-tolerant replication.” None of that was implemented. The current README documents what is actually there: Akka Sharding, ddata-replicated coordinator state, in-memory entity state, no cross-node replication, no persistence. The gap between the original claims and the actual implementation is the gap between every distributed-cache blog post and a system you would put on call.

What this project taught me — and what I would put on a production-cache requirements list before writing any code again:

  • A documented split-brain policy, not a default. Auto-downing is a footgun.
  • An explicit replication factor and a quorum model that survives one-node failure without operator action.
  • A persistence story for the entity state, not just the coordinator state.
  • Backpressure on the ingress path during rebalance, measured in p99 latency, not theory.
  • A failure-detection budget tuned against your real network, not the localhost demo.

Code is at github.com/ketankhairnar/akka-distributed-cache. Read the Limits section before the Features section.