Virtual nodes (vnodes) assign multiple positions on the consistent hash ring to each physical node. Instead of one position per server, a server might have 100-200 virtual positions. This improves load balance, handles heterogeneous hardware (more vnodes for bigger servers), and smooths rebalancing when nodes join or leave.
Visual Overview
Virtual Nodes
WITHOUT VIRTUAL NODES
┌────────────────────────────────────────────────────┐
│ 3 physical nodes, 1 position each:                 │
│                                                    │
│          ● Node A (owns 50% of ring!)              │
│         / \                                        │
│        /   \                                       │
│       ●     ●                                      │
│   Node C   Node B                                  │
│ (owns 15%) (owns 35%)                              │
│                                                    │
│ Problem: Random hash positions → uneven load       │
│ With bad luck, one node gets most of the data!     │
└────────────────────────────────────────────────────┘
WITH VIRTUAL NODES (4 vnodes per physical node)
┌────────────────────────────────────────────────────┐
│ Same 3 physical nodes, but 12 ring positions:      │
│                                                    │
│     A1      B2      A3      C1                     │
│     ●       ●       ●       ●                      │
│     C2      B1      A2      C3                     │
│     ●       ●       ●       ●                      │
│     B3      A4      C4      B4                     │
│     ●       ●       ●       ●                      │
│                                                    │
│ Result: Each node owns ~33% of ring                │
│ More vnodes = smoother distribution                │
└────────────────────────────────────────────────────┘
Core Explanation
What are Virtual Nodes?
Real-World Analogy: Imagine dividing a pizza among 3 people, but instead of giving each person 1 large slice, you cut the pizza into 30 small slices and give each person 10 random slices. Even if some slices are bigger than others, the randomness averages out—everyone ends up with roughly 1/3 of the pizza.
Virtual nodes work the same way: instead of each physical server claiming one spot on the hash ring, it claims many spots. The randomness of hash positions averages out, giving each server roughly equal load.
Why Virtual Nodes?
With only one ring position per physical node, load distribution depends entirely on where nodes happen to hash. Bad luck means one node might own 60% of the keyspace while another owns 10%.
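This luck-of-the-hash effect is easy to measure directly. Below is a minimal sketch (the function names and the `node-i` labeling scheme are illustrative, not taken from any particular system) that computes how much of the ring each node owns with 1 versus 100 positions per node:

```python
import hashlib

RING = 2 ** 128  # size of the MD5 hash space

def positions(node: str, vnodes: int) -> list[tuple[int, str]]:
    """One ring position per virtual node, derived from '<node>-<i>'."""
    return [
        (int(hashlib.md5(f"{node}-{i}".encode()).hexdigest(), 16), node)
        for i in range(vnodes)
    ]

def ownership(nodes: list[str], vnodes: int) -> dict[str, float]:
    """Fraction of the ring each physical node owns.

    A vnode owns the arc from its predecessor (counter-clockwise)
    up to its own position, wrapping at the ring boundary.
    """
    ring = sorted(p for n in nodes for p in positions(n, vnodes))
    owned = {n: 0.0 for n in nodes}
    for (pos, node), (prev, _) in zip(ring, [ring[-1]] + ring[:-1]):
        owned[node] += ((pos - prev) % RING) / RING
    return owned

for v in (1, 100):
    shares = ownership(["a", "b", "c"], v)
    print(v, {n: round(s, 3) for n, s in shares.items()})
```

With one position per node, the shares depend entirely on where three hashes happen to land; with 100 positions each, every node's share settles close to 1/3.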
The Statistics of Virtual Nodes
LOAD BALANCE BY VNODE COUNT
┌────────────────────────────────────────────────────┐
│ Vnodes │ Expected Load │ Std Dev │ Worst Case      │
│────────┼───────────────┼─────────┼─────────────────│
│    1   │     33.3%     │  ~15%   │   50-60%        │
│   10   │     33.3%     │   ~5%   │   40-45%        │
│   50   │     33.3%     │   ~2%   │   36-38%        │
│  100   │     33.3%     │  ~1.5%  │   35-36%        │
│  256   │     33.3%     │   ~1%   │   34-35%        │
│                                                    │
│ More vnodes = lower variance = better balance      │
│ (Numbers are illustrative for a 3-node cluster)    │
└────────────────────────────────────────────────────┘
LAW OF LARGE NUMBERS IN ACTION
┌────────────────────────────────────────────────────┐
│ 1 vnode per node: Each position is random          │
│   └─ High variance, luck-dependent balance         │
│                                                    │
│ 100 vnodes per node: 100 random positions each     │
│   └─ Randomness averages out                       │
│   └─ Each node converges to its fair share         │
│                                                    │
│ It's like flipping a coin:                         │
│   - 10 flips: might get 70% heads                  │
│   - 1000 flips: very close to 50% heads            │
└────────────────────────────────────────────────────┘
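The coin-flip analogy can be checked in a couple of lines (the seed is arbitrary, chosen only to make the run reproducible):

```python
import random

random.seed(7)  # arbitrary seed for a reproducible demo

for flips in (10, 1000):
    # Count heads in a run of fair coin flips
    heads = sum(random.random() < 0.5 for _ in range(flips))
    print(f"{flips:>5} flips: {heads / flips:.1%} heads")
```

Small samples swing widely; large samples hug 50%. Vnodes apply the same averaging to ring positions.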
How Virtual Nodes Work
Virtual Node Mapping
VNODE CREATION
┌────────────────────────────────────────────────────┐
│ Physical node: server-a.example.com                │
│                                                    │
│ Generate vnode positions:                          │
│   hash("server-a-0")  → position 0x1A3F...         │
│   hash("server-a-1")  → position 0x4B2C...         │
│   hash("server-a-2")  → position 0x7D8E...         │
│   ...                                              │
│   hash("server-a-99") → position 0xF2A1...         │
│                                                    │
│ All 100 positions map back to server-a             │
└────────────────────────────────────────────────────┘
KEY LOOKUP WITH VNODES
┌────────────────────────────────────────────────────┐
│ 1. Hash the key: hash("user:123") → 0x5C2D...      │
│                                                    │
│ 2. Find the next vnode position on the ring        │
│    └─ Closest >= 0x5C2D is 0x5F1A (server-b-42)    │
│                                                    │
│ 3. Resolve the vnode to a physical node            │
│    └─ server-b-42 → server-b.example.com           │
│                                                    │
│ 4. Route the request to that physical node         │
│                                                    │
│ Lookup cost: O(log(N × V))                         │
│   where N = nodes, V = vnodes per node             │
└────────────────────────────────────────────────────┘
Benefits of Virtual Nodes
Virtual Node Benefits
1. BETTER LOAD BALANCE
┌────────────────────────────────────────────────────┐
│ Without vnodes: 3 nodes, random distribution       │
│   Node A: 50%, Node B: 35%, Node C: 15%            │
│                                                    │
│ With 100 vnodes each: ~33% each ± 2%               │
│   Node A: 34%, Node B: 33%, Node C: 33%            │
└────────────────────────────────────────────────────┘
2. HETEROGENEOUS HARDWARE
┌────────────────────────────────────────────────────┐
│ Want bigger machines to handle more data?          │
│                                                    │
│ Small server:   50 vnodes → ~17% of ring           │
│ Medium server: 100 vnodes → ~33% of ring           │
│ Large server:  150 vnodes → ~50% of ring           │
│                                                    │
│ Capacity is proportional to vnode count!           │
└────────────────────────────────────────────────────┘
3. SMOOTHER REBALANCING
┌────────────────────────────────────────────────────┐
│ Node failure without vnodes:                       │
│   └─ 1 node takes over the entire failed range     │
│   └─ That node suddenly has 2× the load            │
│                                                    │
│ Node failure with vnodes:                          │
│   └─ Failed node's 100 vnodes are redistributed    │
│   └─ Each surviving node picks up ~50 vnodes       │
│   └─ Load increase is spread evenly                │
└────────────────────────────────────────────────────┘
4. FASTER RECOVERY
┌────────────────────────────────────────────────────┐
│ Without vnodes: 1 node rebuilds from 1 other       │
│ With vnodes: many nodes participate in the rebuild │
│                                                    │
│ Recovery time: O(data/bandwidth) → O(data/(N×bw))  │
│ Parallelism speeds recovery proportionally         │
└────────────────────────────────────────────────────┘
Trade-offs
Virtual Node Trade-offs
MEMORY OVERHEAD
┌────────────────────────────────────────────────────┐
│ Ring metadata per cluster:                         │
│   N nodes × V vnodes × (position + node_id)        │
│                                                    │
│ Example: 100 nodes × 256 vnodes × 32 bytes         │
│   = 25,600 entries × 32 bytes                      │
│   = ~800 KB                                        │
│                                                    │
│ Negligible for most systems                        │
└────────────────────────────────────────────────────┘
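The metadata arithmetic is simple enough to sanity-check in code. A tiny sketch (the 32-byte entry size is the same illustrative assumption as in the box above):

```python
def ring_metadata_bytes(nodes: int, vnodes: int, entry_bytes: int = 32) -> int:
    """Ring metadata size: one (position, node_id) entry per virtual node."""
    return nodes * vnodes * entry_bytes

size = ring_metadata_bytes(100, 256)
print(f"{size} bytes = {size / 1024:.0f} KB")  # 819200 bytes = 800 KB
```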
ROUTING COMPLEXITY
┌────────────────────────────────────────────────────┐
│ Lookup: O(log(N × V)) instead of O(log N)          │
│                                                    │
│ With 100 nodes × 256 vnodes:                       │
│   log₂(25,600) ≈ 15 comparisons                    │
│                                                    │
│ Still very fast: microseconds                      │
└────────────────────────────────────────────────────┘
CONFIGURATION COMPLEXITY
┌────────────────────────────────────────────────────┐
│ Questions to answer:                               │
│   - How many vnodes per node?                      │
│   - How to handle heterogeneous hardware?          │
│   - What's the rebalancing strategy?               │
│                                                    │
│ Most systems: use the defaults (128-256)           │
└────────────────────────────────────────────────────┘
Real Systems Using Virtual Nodes
| System           | Default Vnodes             | Configuration                  | Notes                        |
|------------------|----------------------------|--------------------------------|------------------------------|
| Apache Cassandra | 256 (16 in newer versions) | num_tokens in cassandra.yaml   | Reduced for faster bootstrap |
| Amazon DynamoDB  | Internal partitioning      | Not configurable               | Managed service              |
| Riak             | 64                         | ring_size                      | Fixed per cluster            |
| Akka Cluster     | Configurable               | Per-node setting               | Virtual nodes per member     |
| Consul           | Configurable               | For service discovery          | Hash ring for consistency    |
Note: Default values change across versions. Verify in current documentation.
Case Study: Cassandra Token Assignment
Cassandra Token Distribution
CASSANDRA VNODE ASSIGNMENT (Illustrative)
┌────────────────────────────────────────────────────┐
│ 3-node cluster, 256 vnodes each = 768 tokens       │
│                                                    │
│ Node 1 tokens: [0x0A..., 0x1F..., ..., 0xE3...]    │
│ Node 2 tokens: [0x03..., 0x2B..., ..., 0xF1...]    │
│ Node 3 tokens: [0x08..., 0x19..., ..., 0xD7...]    │
│                                                    │
│ Each node owns ~256 token ranges (~33% of ring)    │
│                                                    │
│ Key "user:123" → hash 0x4F...                      │
│   └─ Falls in Node 2's token range                 │
│   └─ Replicas on Node 1, Node 3 (RF=3)             │
└────────────────────────────────────────────────────┘
ADDING A 4TH NODE
┌────────────────────────────────────────────────────┐
│ New Node 4 claims 256 new tokens                   │
│ Each existing node transfers ~64 ranges            │
│   (25% of their 256 tokens each)                   │
│                                                    │
│ Final state: each node owns ~25% of ring           │
│                                                    │
│ Without vnodes: would transfer ~33% from 1 node    │
│ With vnodes: spreads the transfer across all       │
└────────────────────────────────────────────────────┘
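The transfer arithmetic generalizes to any cluster size. A small sketch (the function name is hypothetical; the 256-token default matches the Cassandra-style example above, and it assumes vnodes spread the transfer evenly):

```python
def join_impact(old_nodes: int, tokens_per_node: int = 256) -> dict:
    """Illustrative arithmetic for one node joining an evenly
    balanced cluster where vnodes spread the transfer evenly."""
    new_nodes = old_nodes + 1
    ring_share_moved = 1 / new_nodes               # data the newcomer claims
    loss_per_old_node = 1 - old_nodes / new_nodes  # fraction of each old node's data
    return {
        "ring_share_moved": ring_share_moved,
        "loss_per_old_node": loss_per_old_node,
        "ranges_per_old_node": round(tokens_per_node * loss_per_old_node),
    }

print(join_impact(3))
# {'ring_share_moved': 0.25, 'loss_per_old_node': 0.25, 'ranges_per_old_node': 64}
```

Growing from 3 to 4 nodes moves 25% of the ring, drawn as ~64 token ranges from each existing node rather than a single large slab from one victim.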
When to Use Virtual Nodes
✓ Perfect Use Cases
Virtual Node Use Cases
DISTRIBUTED KEY-VALUE STORES
Scenario: Sharding user data across servers
Requirement: Even distribution, smooth scaling
Configuration: 100-256 vnodes per node
Trade-off: Slightly more metadata to manage
DISTRIBUTED CACHING
Scenario: Memcached/Redis cluster
Requirement: Cache invalidation on node change
Configuration: Vnodes minimize invalidation scope
Trade-off: Client library must understand vnodes
HETEROGENEOUS CLUSTERS
Scenario: Mix of server capacities
Requirement: Proportional load to capacity
Configuration: Vnodes proportional to resources
Trade-off: Manual vnode count management
MULTI-TENANT SYSTEMS
Scenario: Isolated data per tenant
Requirement: Even tenant distribution
Configuration: Hash(tenant_id) → vnode
Trade-off: Some tenants may colocate
✕ When NOT to Use
When Virtual Nodes May Not Fit
VERY SMALL CLUSTERS
Problem: 3 nodes with vnodes is overkill
Example: Simple 3-server database
Alternative: Fixed partitioning or single-node
When OK: If you expect to grow significantly
ORDERED DATA ACCESS
Problem: Vnodes scatter related keys
Example: Time-series data, range queries
Alternative: Ordered partitioning by key range
When OK: Point lookups only, no range scans
EXTREME CONSISTENCY REQUIREMENTS
Problem: Vnodes add routing complexity
Example: Financial ledger, strict ordering
Alternative: Single-leader or Paxos group
When OK: Eventual consistency acceptable
ALREADY BALANCED WORKLOAD
Problem: No benefit if load is even
Example: Round-robin by sequence ID
Alternative: Simple modulo partitioning
When OK: Natural imbalance exists
Interview Application
Common Interview Question
Q: “In consistent hashing, what are virtual nodes and why are they important?”
Strong Answer:
“Virtual nodes solve the load imbalance problem in consistent hashing. Here’s the issue and solution:
The Problem:
With one ring position per physical node, load distribution is random. A 3-node cluster might have 50%, 35%, 15% distribution instead of 33% each. Bad hash luck = hot spots.
The Solution:
Assign multiple positions (virtual nodes) to each physical node. Instead of 3 positions on the ring, you have 300 (100 per node). Randomness averages out—each node ends up with ~33% ± 2%.
Key Benefits:
Better balance: More positions → lower variance
Heterogeneous hardware: 200 vnodes for big servers, 50 for small
Smoother rebalancing: When a node fails, its 100 vnodes spread across all survivors, not just one
Faster recovery: Multiple nodes can participate in parallel rebuild
Trade-offs:
More metadata: O(N × V) ring entries vs O(N)
Slightly more complex routing
Configuration decisions (how many vnodes?)
Real-World:
Cassandra historically used 256 vnodes per node by default, reduced to 16 in newer versions for faster bootstrap. DynamoDB uses virtual partitioning internally. The overhead is negligible: a 100-node cluster with 256 vnodes each needs only ~800 KB of ring metadata.”
Follow-up: How do virtual nodes help with node failure recovery?
“Without vnodes, when a node fails, its entire range goes to one other node. That node suddenly has 2× data and 2× traffic.
With vnodes, the failed node’s 100+ positions are scattered across the ring. Each surviving node picks up a portion. The load increase is distributed evenly—much smaller per-node impact.
Recovery is also faster because it’s parallelized. Instead of one node streaming all data from one peer, many nodes stream small portions simultaneously. If you have 10 surviving nodes and each helps rebuild, recovery is 10× faster.”
Follow-up: How would you choose the number of virtual nodes?
“It’s a trade-off between balance quality and overhead:
Factors to consider:
Cluster size: Smaller clusters need more vnodes for balance
Heterogeneity: More vnodes needed if node capacities vary
Overhead: More vnodes means more ring metadata, more token ranges to repair, and slower bootstrap
Cassandra actually reduced their default from 256 to 16 in later versions because modern clusters are larger and faster bootstrap matters more than perfect balance.”
Code Example
Virtual Nodes Consistent Hashing (Python)
import hashlib
import bisect
from typing import Dict, List, Optional


class VirtualNodeRing:
    """
    Consistent hash ring with virtual nodes.

    Each physical node gets multiple positions on the ring
    for better load distribution.
    """

    def __init__(self, vnodes_per_node: int = 100):
        """
        Args:
            vnodes_per_node: Virtual positions per physical node
        """
        self.vnodes_per_node = vnodes_per_node
        self.ring: List[int] = []  # Sorted list of hash positions
        self.ring_to_node: Dict[int, str] = {}  # Hash position → physical node

    def _hash(self, key: str) -> int:
        """Generate consistent hash for a key."""
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str, weight: int = 1) -> None:
        """
        Add a physical node with virtual nodes.

        Args:
            node: Physical node identifier
            weight: Multiplier for vnodes (for heterogeneous hardware)
        """
        num_vnodes = self.vnodes_per_node * weight
        for i in range(num_vnodes):
            # Generate unique vnode identifier
            vnode_key = f"{node}-vnode-{i}"
            position = self._hash(vnode_key)
            # Add to ring, keeping it sorted
            bisect.insort(self.ring, position)
            self.ring_to_node[position] = node

    def remove_node(self, node: str, weight: int = 1) -> None:
        """Remove a physical node and all its virtual nodes."""
        num_vnodes = self.vnodes_per_node * weight
        for i in range(num_vnodes):
            vnode_key = f"{node}-vnode-{i}"
            position = self._hash(vnode_key)
            if position in self.ring_to_node:
                self.ring.remove(position)
                del self.ring_to_node[position]

    def get_node(self, key: str) -> Optional[str]:
        """
        Get the physical node responsible for a key.

        Args:
            key: The key to look up

        Returns:
            Physical node identifier, or None if ring is empty
        """
        if not self.ring:
            return None
        position = self._hash(key)
        # Find first vnode position >= key's position
        # (bisect_left so an exact match maps to that vnode)
        idx = bisect.bisect_left(self.ring, position)
        # Wrap around to first position if needed
        if idx >= len(self.ring):
            idx = 0
        vnode_position = self.ring[idx]
        return self.ring_to_node[vnode_position]

    def get_distribution(self) -> Dict[str, float]:
        """Calculate the theoretical load distribution."""
        if not self.ring:
            return {}
        # Count vnodes per physical node
        node_vnodes: Dict[str, int] = {}
        for node in self.ring_to_node.values():
            node_vnodes[node] = node_vnodes.get(node, 0) + 1
        total = len(self.ring)
        return {node: count / total for node, count in node_vnodes.items()}


# Usage example
if __name__ == "__main__":
    print("=== Virtual Nodes Demo ===\n")

    # Create ring with 50 vnodes per physical node
    ring = VirtualNodeRing(vnodes_per_node=50)

    # Add 3 nodes
    ring.add_node("server-a")
    ring.add_node("server-b")
    ring.add_node("server-c")

    print("Distribution with equal vnodes (50 each):")
    for node, pct in ring.get_distribution().items():
        print(f"  {node}: {pct:.1%}")

    # Add heterogeneous node with 2× capacity
    ring.add_node("server-d-large", weight=2)
    print("\nAfter adding server-d-large (2× weight):")
    for node, pct in ring.get_distribution().items():
        print(f"  {node}: {pct:.1%}")

    # Simulate key lookups
    print("\nKey assignments:")
    keys = ["user:1", "user:2", "user:3", "order:100", "session:xyz"]
    for key in keys:
        node = ring.get_node(key)
        print(f"  {key} → {node}")

    # Simulate node failure
    print("\nSimulating server-b failure...")
    ring.remove_node("server-b")

    print("\nDistribution after server-b removal:")
    for node, pct in ring.get_distribution().items():
        print(f"  {node}: {pct:.1%}")

    print("\nKey assignments (same keys, new distribution):")
    for key in keys:
        node = ring.get_node(key)
        print(f"  {key} → {node}")
Load Distribution Analysis
import random
from collections import Counter

# Assumes VirtualNodeRing from the previous example is defined in scope
# (same module, or imported from wherever you saved it).


def analyze_distribution(
    num_nodes: int,
    vnodes_per_node: int,
    num_keys: int = 100000,
) -> dict:
    """
    Analyze key distribution across nodes.

    Returns statistics about load balance.
    """
    ring = VirtualNodeRing(vnodes_per_node=vnodes_per_node)

    # Add nodes
    for i in range(num_nodes):
        ring.add_node(f"node-{i}")

    # Distribute random keys
    key_counts: Counter = Counter()
    for _ in range(num_keys):
        key = f"key-{random.randint(0, 10**12)}"
        node = ring.get_node(key)
        key_counts[node] += 1

    # Calculate statistics
    counts = list(key_counts.values())
    expected = num_keys / num_nodes
    variance = sum((c - expected) ** 2 for c in counts) / num_nodes
    std_dev = variance ** 0.5
    coefficient_of_variation = std_dev / expected

    return {
        "expected_per_node": expected,
        "actual_min": min(counts),
        "actual_max": max(counts),
        "std_dev": std_dev,
        "cv": coefficient_of_variation,  # Lower = more balanced
    }


if __name__ == "__main__":
    print("=== Distribution Analysis ===\n")
    print("5 nodes, varying vnode counts:")
    for vnodes in [1, 10, 50, 100, 200]:
        stats = analyze_distribution(
            num_nodes=5,
            vnodes_per_node=vnodes,
            num_keys=100000,
        )
        print(f"\n  {vnodes} vnodes per node:")
        print(f"    Expected: {stats['expected_per_node']:.0f}")
        print(f"    Range: {stats['actual_min']} - {stats['actual_max']}")
        print(f"    Std Dev: {stats['std_dev']:.0f}")
        print(f"    CV: {stats['cv']:.2%} (lower = more balanced)")