A rate limiting algorithm that allows controlled bursts of traffic while enforcing an average rate limit over time
TL;DR
Token bucket is a rate limiting algorithm where tokens accumulate in a bucket at a fixed rate up to a maximum capacity. Each request consumes a token. If tokens are available, the request proceeds; if not, it’s rejected. This allows bursts up to bucket capacity while enforcing an average rate equal to the refill rate.
Visual Overview
Core Explanation
What is Token Bucket?
Real-World Analogy: Think of a parking garage with a limited number of spaces. Cars (requests) arrive and need a parking spot (token). The garage has 50 spots (capacity). Cars leave at a steady rate of 1 per minute (refill rate).
- If spots are available, cars park immediately
- If the garage is full, cars must wait or go elsewhere
- After rush hour, spots gradually become available again
- A burst of arrivals is fine—up to 50 cars—but then they must slow down
This is why token bucket is perfect for APIs: it allows legitimate burst traffic (page loads, app startup) while preventing sustained overload.
How It Works
The algorithm maintains two pieces of state per rate-limited key:
- tokens: Current number of available tokens (float)
- last_refill: Timestamp of last refill calculation
The Algorithm
on request:
    now = current_time()
    elapsed = now - last_refill
    tokens_to_add = elapsed * refill_rate
    tokens = min(capacity, tokens + tokens_to_add)
    last_refill = now

    if tokens >= 1:
        tokens -= 1
        return ALLOW
    else:
        return REJECT
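The pseudocode above maps almost line-for-line to a minimal runnable version. This is a sketch for a single key only (the helper name `make_limiter` is illustrative, not from any library):

```python
import time

def make_limiter(capacity: float, refill_rate: float):
    """Minimal single-key token bucket as a closure."""
    state = {"tokens": capacity, "last": time.monotonic()}

    def allow() -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, capped at capacity
        state["tokens"] = min(capacity, state["tokens"] + (now - state["last"]) * refill_rate)
        state["last"] = now
        # Consume a token if one is available
        if state["tokens"] >= 1:
            state["tokens"] -= 1
            return True
        return False

    return allow

allow = make_limiter(capacity=3, refill_rate=1.0)
print([allow() for _ in range(5)])  # burst of 3 allowed, then rejections
```

Calling it five times back-to-back allows exactly the first three requests (the burst), since almost no refill happens between calls.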
Key Properties
| Property | Value | Notes |
|---|---|---|
| Time Complexity | O(1) per request | Simple arithmetic |
| Space Complexity | O(1) per key | ~16 bytes (tokens + timestamp) |
| Burst Tolerance | Up to capacity | Allows full burst immediately |
| Average Rate | refill_rate | Long-term steady state |
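The burst and average-rate rows combine into one useful bound: over any window of T seconds, a bucket can admit at most capacity + refill_rate × T requests. A quick sketch (the function name is illustrative):

```python
def max_admitted(capacity: float, refill_rate: float, window_seconds: float) -> float:
    """Upper bound on requests a token bucket can admit in any window:
    a full burst plus whatever refills during the window."""
    return capacity + refill_rate * window_seconds

# capacity=100, refill 10/s: at most 700 requests in any 60-second window
print(max_admitted(100, 10, 60))
```

Dividing by T shows why the long-term average converges to refill_rate: the capacity term becomes negligible as T grows.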
Token Bucket vs Leaky Bucket
Real Systems Using Token Bucket
| System | Implementation | Configuration | Use Case |
|---|---|---|---|
| AWS API Gateway | Native token bucket | Configurable per stage | API management |
| NGINX | ngx_http_limit_req_module | burst parameter controls capacity | Web server rate limiting |
| Envoy Proxy | Local/global rate limiting | Configurable via xDS | Service mesh |
| Google Cloud Endpoints | Token bucket style | Quota configuration | API quotas |
| Kong Gateway | Rate limiting plugin | Configurable policy | API gateway |
Note: Implementations may vary slightly. Verify current behavior in official documentation.
Configuration Patterns
When to Use Token Bucket
✓ Perfect Use Cases
✕ When NOT to Use
Interview Application
Common Interview Question
Q: “How would you implement rate limiting for a public API? Walk me through token bucket and when you’d choose it over alternatives.”
Strong Answer:
“I’d implement rate limiting using a token bucket algorithm for most API use cases. Here’s my reasoning:
Why Token Bucket:
- Burst tolerance: Real clients are bursty—mobile apps make 10 requests on launch, then go quiet. Token bucket handles this gracefully.
- O(1) per request: Just arithmetic—no scanning, no sorting. Critical for high-throughput APIs.
- Simple state: Two values per key (tokens, timestamp). Easy to store in Redis.
The Algorithm:
- Each request checks: do I have tokens? If yes, consume one and proceed. If no, reject with 429.
- Tokens refill at a steady rate (e.g., 10/second) up to capacity (e.g., 100).
- Capacity controls burst size; refill rate controls sustained throughput.
Distributed Implementation:
- Store {tokens, last_refill} in Redis per API key
- Use Lua script for atomic check-and-decrement
- Handle Redis failures by failing open (allow request)—better to risk some overload than block everyone
When I’d choose alternatives:
- Sliding window: When I need exact counting for billing (no burst overage)
- Leaky bucket: When I need to smooth output to a downstream service
- Fixed window: Never—boundary burst problem makes it unreliable
Headers I’d return:
- X-RateLimit-Limit: Bucket capacity
- X-RateLimit-Remaining: Current tokens
- X-RateLimit-Reset: When bucket refills to capacity
- Retry-After: Seconds until tokens available (on 429)”
Follow-up: What’s the boundary burst problem and how does token bucket handle it?
“The boundary burst problem affects fixed window rate limiting. If the window resets at the top of each minute, a client can make 100 requests at 0:59 and 100 more at 1:00—effectively 200 requests in 2 seconds while technically staying within ‘100/minute’.
Token bucket avoids this because it doesn’t have discrete windows. Tokens accumulate continuously. If you burn all tokens at 0:59, you don’t magically get more at 1:00—you wait for refill. The maximum burst is always capped at capacity, regardless of timing.”
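That difference is easy to check with a small simulation. This sketch fires 200 requests into the 2 seconds straddling a minute boundary, under a nominal 100/minute limit (timings and limits are illustrative):

```python
def fixed_window(events, limit=100, window=60):
    """Count admitted requests with naive fixed-window counting."""
    admitted, counts = 0, {}
    for t in events:
        w = int(t // window)  # which window this request falls in
        if counts.get(w, 0) < limit:
            counts[w] = counts.get(w, 0) + 1
            admitted += 1
    return admitted

def token_bucket(events, capacity=100, refill_rate=100 / 60):
    """Count admitted requests with a token bucket at the same average rate."""
    admitted, tokens, last = 0, float(capacity), 0.0
    for t in events:
        tokens = min(capacity, tokens + (t - last) * refill_rate)
        last = t
        if tokens >= 1:
            tokens -= 1
            admitted += 1
    return admitted

# 200 requests packed into the 2 seconds around the minute boundary
events = [59.0 + i * 0.01 for i in range(200)]
print(fixed_window(events))  # admits all 200: 100 in each window
print(token_bucket(events))  # caps the burst near capacity
```

Fixed window admits all 200 (100 per window), while the token bucket admits only slightly more than its 100-token capacity, the excess coming from the trickle of refill during those 2 seconds.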
Follow-up: How do you handle token bucket in a multi-region deployment?
“There are three approaches with different trade-offs:
Sticky routing: Route each user to one region. Simple but adds latency for users far from their assigned region.
Synchronized global state: Single Redis cluster, all regions check it. Accurate but adds cross-region latency to every request.
Local with eventual sync: Each region has local bucket, periodically sync totals. Fast but user might get N × limit globally.
For most APIs, I’d choose option 3—slightly exceeding limits is acceptable compared to adding 100ms+ latency to every request. For billing-critical endpoints, option 2 or sticky routing.”
Code Example
Token Bucket Rate Limiter (Python)
```python
import time
from dataclasses import dataclass
from typing import Tuple, Dict

@dataclass
class BucketState:
    """State for a single rate-limited key."""
    tokens: float
    last_refill: float

class TokenBucketRateLimiter:
    """
    In-memory token bucket rate limiter.
    For production, replace self.buckets with Redis storage.
    """

    def __init__(self, capacity: int, refill_rate: float):
        """
        Args:
            capacity: Maximum tokens (burst size)
            refill_rate: Tokens added per second
        """
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.buckets: Dict[str, BucketState] = {}

    def is_allowed(self, key: str) -> Tuple[bool, dict]:
        """
        Check if request is allowed and consume a token.

        Args:
            key: Rate limit key (user_id, ip, api_key)

        Returns:
            (allowed: bool, info: dict with remaining, reset_at)
        """
        now = time.time()

        # Get or create bucket state
        if key not in self.buckets:
            self.buckets[key] = BucketState(
                tokens=self.capacity,
                last_refill=now
            )
        bucket = self.buckets[key]

        # Calculate tokens to add since last refill
        elapsed = now - bucket.last_refill
        tokens_to_add = elapsed * self.refill_rate

        # Refill bucket (cap at capacity)
        bucket.tokens = min(self.capacity, bucket.tokens + tokens_to_add)
        bucket.last_refill = now

        # Check if we have tokens
        if bucket.tokens >= 1:
            bucket.tokens -= 1
            allowed = True
        else:
            allowed = False

        # Calculate when bucket will be full again
        tokens_needed = self.capacity - bucket.tokens
        seconds_to_full = tokens_needed / self.refill_rate
        reset_at = int(now + seconds_to_full)

        return allowed, {
            "remaining": int(bucket.tokens),
            "limit": self.capacity,
            "reset_at": reset_at
        }

# Usage example
if __name__ == "__main__":
    # 1 request/second sustained, with a burst of 5
    limiter = TokenBucketRateLimiter(
        capacity=5,     # Allow burst of 5
        refill_rate=1   # Refill 1 token/second
    )

    user = "user_123"

    # Simulate burst of 7 requests
    print("Burst of 7 requests:")
    for i in range(7):
        allowed, info = limiter.is_allowed(user)
        status = "✓ ALLOWED" if allowed else "✗ REJECTED"
        print(f"  Request {i+1}: {status} (remaining: {info['remaining']})")

    # Wait 3 seconds for refill
    print("\nWaiting 3 seconds...")
    time.sleep(3)

    # Try again
    print("\nAfter waiting:")
    for i in range(4):
        allowed, info = limiter.is_allowed(user)
        status = "✓ ALLOWED" if allowed else "✗ REJECTED"
        print(f"  Request {i+1}: {status} (remaining: {info['remaining']})")
```
Token Bucket with Redis (Production)
```python
import time
import redis

class RedisTokenBucket:
    """
    Distributed token bucket using Redis.
    Uses Lua script for atomic check-and-decrement.
    """

    LUA_SCRIPT = """
    local key = KEYS[1]
    local capacity = tonumber(ARGV[1])
    local refill_rate = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])

    -- Get current state (or initialize)
    local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
    local tokens = tonumber(bucket[1]) or capacity
    local last_refill = tonumber(bucket[2]) or now

    -- Calculate refill
    local elapsed = now - last_refill
    tokens = math.min(capacity, tokens + elapsed * refill_rate)

    -- Try to consume token
    local allowed = 0
    if tokens >= 1 then
        tokens = tokens - 1
        allowed = 1
    end

    -- Update state
    redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
    redis.call('EXPIRE', key, 3600)  -- Expire after 1 hour idle

    -- Return: allowed, remaining, reset_at
    local reset_at = now + (capacity - tokens) / refill_rate
    return {allowed, math.floor(tokens), math.floor(reset_at)}
    """

    def __init__(self, redis_client: redis.Redis, capacity: int, refill_rate: float):
        self.redis = redis_client
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.script = self.redis.register_script(self.LUA_SCRIPT)

    def is_allowed(self, key: str) -> tuple[bool, dict]:
        """Check and consume token atomically."""
        now = time.time()
        result = self.script(
            keys=[f"ratelimit:{key}"],
            args=[self.capacity, self.refill_rate, now]
        )
        return result[0] == 1, {
            "remaining": result[1],
            "limit": self.capacity,
            "reset_at": result[2]
        }
```
Express.js Middleware
```javascript
const Redis = require('ioredis');
const redis = new Redis(process.env.REDIS_URL);

// Token bucket middleware factory
function tokenBucket({ capacity, refillRate, keyGenerator }) {
  const luaScript = `
    local key = KEYS[1]
    local capacity = tonumber(ARGV[1])
    local refill_rate = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])

    local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
    local tokens = tonumber(bucket[1]) or capacity
    local last_refill = tonumber(bucket[2]) or now

    local elapsed = now - last_refill
    tokens = math.min(capacity, tokens + elapsed * refill_rate)

    local allowed = 0
    if tokens >= 1 then
      tokens = tokens - 1
      allowed = 1
    end

    redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
    redis.call('EXPIRE', key, 3600)

    return {allowed, math.floor(tokens), math.floor(now + (capacity - tokens) / refill_rate)}
  `;

  return async (req, res, next) => {
    const key = keyGenerator(req);
    const now = Date.now() / 1000;

    try {
      const [allowed, remaining, resetAt] = await redis.eval(
        luaScript,
        1,
        `ratelimit:${key}`,
        capacity,
        refillRate,
        now
      );

      // Set rate limit headers
      res.set({
        'X-RateLimit-Limit': capacity,
        'X-RateLimit-Remaining': remaining,
        'X-RateLimit-Reset': resetAt,
      });

      if (allowed) {
        next();
      } else {
        res.set('Retry-After', Math.ceil(1 / refillRate));
        res.status(429).json({
          error: 'Too Many Requests',
          retryAfter: Math.ceil(1 / refillRate),
        });
      }
    } catch (err) {
      // Fail open on Redis errors
      console.error('Rate limiter error:', err);
      next();
    }
  };
}

// Usage
app.use('/api/', tokenBucket({
  capacity: 100,   // Burst of 100
  refillRate: 10,  // 10 requests/second sustained
  keyGenerator: (req) => req.user?.id || req.ip,
}));
```
Related Content
See It In Action:
- Rate Limiting Explainer - Compares token bucket with sliding window
Related Concepts:
- Rate Limiting - Parent concept
- Sliding Window - Alternative: smoother enforcement, no burst
- Leaky Bucket - Smooths output rate with queueing
Quick Self-Check
- Can explain token bucket in 60 seconds?
- Understand the difference between capacity and refill rate?
- Know why token bucket allows bursts while leaky bucket smooths output?
- Can implement atomic token bucket in Redis with Lua?
- Understand the trade-off with fixed window (boundary burst problem)?
- Know when to choose sliding window over token bucket?
Interview Notes
- Comes up in roughly 60% of rate limiting discussions
- Powers API gateways and network QoS systems
- O(1) work per request
- Minimal state per key