
Token Bucket

A rate limiting algorithm that allows controlled bursts of traffic while enforcing an average rate limit over time

TL;DR

Token bucket is a rate limiting algorithm where tokens accumulate in a bucket at a fixed rate up to a maximum capacity. Each request consumes a token. If tokens are available, the request proceeds; if not, it’s rejected. This allows bursts up to bucket capacity while enforcing an average rate equal to the refill rate.

Visual Overview

(Diagram: Token Bucket Algorithm)

Core Explanation

What is Token Bucket?

Real-World Analogy: Think of a parking garage with a limited number of spaces. Cars (requests) arrive and need a parking spot (token). The garage has 50 spots (capacity). Cars leave at a steady rate of 1 per minute (refill rate).

  • If spots are available, cars park immediately
  • If the garage is full, cars must wait or go elsewhere
  • After rush hour, spots gradually become available again
  • A burst of arrivals is fine—up to 50 cars—but then they must slow down

This is why token bucket is perfect for APIs: it allows legitimate burst traffic (page loads, app startup) while preventing sustained overload.

How It Works

The algorithm maintains two pieces of state per rate-limited key:

  1. tokens: Current number of available tokens (float)
  2. last_refill: Timestamp of last refill calculation

(Diagram: Token Bucket State Machine)

The Algorithm

on request:
  now = current_time()
  elapsed = now - last_refill
  tokens_to_add = elapsed * refill_rate
  tokens = min(capacity, tokens + tokens_to_add)
  last_refill = now

  if tokens >= 1:
    tokens -= 1
    return ALLOW
  else:
    return REJECT

Key Properties

| Property         | Value            | Notes                          |
|------------------|------------------|--------------------------------|
| Time Complexity  | O(1) per request | Simple arithmetic              |
| Space Complexity | O(1) per key     | ~16 bytes (tokens + timestamp) |
| Burst Tolerance  | Up to capacity   | Allows full burst immediately  |
| Average Rate     | refill_rate      | Long-term steady state         |
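These properties fall out of simple arithmetic. A quick sketch (plain Python; the capacity and refill values here are illustrative, not prescriptive):

```python
# Illustrative configuration: capacity = 100 tokens, refill = 10 tokens/second.
capacity = 100
refill_rate = 10.0

# Maximum instantaneous burst is the bucket capacity.
max_burst = capacity  # 100 back-to-back requests from a full bucket

# Maximum requests admitted in any window of T seconds:
# the initial burst plus whatever refills during the window.
def max_requests(window_seconds: float) -> float:
    return capacity + refill_rate * window_seconds

# Time for an empty bucket to refill completely.
time_to_full = capacity / refill_rate  # 10.0 seconds

print(max_burst, max_requests(60), time_to_full)  # 100 700.0 10.0
```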

Token Bucket vs Leaky Bucket

(Diagram: Token Bucket vs Leaky Bucket)

Real Systems Using Token Bucket

| System                 | Implementation                                         | Configuration                     | Use Case                 |
|------------------------|--------------------------------------------------------|-----------------------------------|--------------------------|
| AWS API Gateway        | Native token bucket                                    | Configurable per stage            | API management           |
| NGINX                  | ngx_http_limit_req_module (documented as leaky bucket) | burst parameter controls capacity | Web server rate limiting |
| Envoy Proxy            | Local/global rate limiting                             | Configurable via xDS              | Service mesh             |
| Google Cloud Endpoints | Token bucket style                                     | Quota configuration               | API quotas               |
| Kong Gateway           | Rate limiting plugin                                   | Configurable policy               | API gateway              |

Note: Implementations may vary slightly. Verify current behavior in official documentation.

Configuration Patterns

(Diagram: Common Token Bucket Configurations)

When to Use Token Bucket

✓ Perfect Use Cases

(Diagram: Token Bucket Use Cases)

✕ When NOT to Use

(Diagram: When Token Bucket May Not Fit)

Interview Application

Common Interview Question

Q: “How would you implement rate limiting for a public API? Walk me through token bucket and when you’d choose it over alternatives.”

Strong Answer:

“I’d implement rate limiting using a token bucket algorithm for most API use cases. Here’s my reasoning:

Why Token Bucket:

  1. Burst tolerance: Real clients are bursty—mobile apps make 10 requests on launch, then go quiet. Token bucket handles this gracefully.
  2. O(1) per request: Just arithmetic—no scanning, no sorting. Critical for high-throughput APIs.
  3. Simple state: Two values per key (tokens, timestamp). Easy to store in Redis.

The Algorithm:

  • Each request checks: do I have tokens? If yes, consume one and proceed. If no, reject with 429.
  • Tokens refill at a steady rate (e.g., 10/second) up to capacity (e.g., 100).
  • Capacity controls burst size; refill rate controls sustained throughput.

Distributed Implementation:

  • Store {tokens, last_refill} in Redis per API key
  • Use Lua script for atomic check-and-decrement
  • Handle Redis failures by failing open (allow request)—better to risk some overload than block everyone

When I’d choose alternatives:

  • Sliding window: When I need exact counting for billing (no burst overage)
  • Leaky bucket: When I need to smooth output to a downstream service
  • Fixed window: Never—boundary burst problem makes it unreliable

Headers I’d return:

  • X-RateLimit-Limit: Bucket capacity
  • X-RateLimit-Remaining: Current tokens
  • X-RateLimit-Reset: When bucket refills to capacity
  • Retry-After: Seconds until tokens available (on 429)”

Follow-up: What’s the boundary burst problem and how does token bucket handle it?

“The boundary burst problem affects fixed window rate limiting. If the window resets at the top of each minute, a client can make 100 requests at 0:59 and 100 more at 1:00—effectively 200 requests in 2 seconds while technically staying within ‘100/minute’.

Token bucket avoids this because it doesn’t have discrete windows. Tokens accumulate continuously. If you burn all tokens at 0:59, you don’t magically get more at 1:00—you wait for refill. The maximum burst is always capped at capacity, regardless of timing.”
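This contrast is easy to simulate. The sketch below (illustrative parameters: a "100/minute" limit) counts how many requests each scheme admits when a client fires 100 requests just before a minute boundary and 100 more just after:

```python
# Fixed window: 100 requests per minute, windows aligned to whole minutes.
def fixed_window_allowed(arrivals, limit=100, window=60):
    counts = {}
    allowed = 0
    for t in arrivals:
        w = int(t // window)  # which window this request falls in
        if counts.get(w, 0) < limit:
            counts[w] = counts.get(w, 0) + 1
            allowed += 1
    return allowed

# Token bucket: capacity 100, refilling 100 tokens per 60 seconds.
def token_bucket_allowed(arrivals, capacity=100, refill_rate=100 / 60):
    tokens, last = float(capacity), arrivals[0]
    allowed = 0
    for t in arrivals:
        tokens = min(capacity, tokens + (t - last) * refill_rate)
        last = t
        if tokens >= 1:
            tokens -= 1
            allowed += 1
    return allowed

# 100 requests at t=59s, 100 more at t=60s (the window boundary).
burst = [59.0] * 100 + [60.0] * 100
print(fixed_window_allowed(burst))   # 200 -- each window accepts a full quota
print(token_bucket_allowed(burst))   # 101 -- capacity plus ~1.67 refilled tokens
```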

Follow-up: How do you handle token bucket in a multi-region deployment?

“There are three approaches with different trade-offs:

  1. Sticky routing: Route each user to one region. Simple but adds latency for users far from their assigned region.

  2. Synchronized global state: Single Redis cluster, all regions check it. Accurate but adds cross-region latency to every request.

  3. Local with eventual sync: Each region has local bucket, periodically sync totals. Fast but user might get N × limit globally.

For most APIs, I’d choose option 3—slightly exceeding limits is acceptable compared to adding 100ms+ latency to every request. For billing-critical endpoints, option 2 or sticky routing.”
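The N × limit overage of option 3 can be shown with a minimal, deterministic sketch (names and numbers are illustrative; timestamps are passed in explicitly rather than read from the clock):

```python
class LocalBucket:
    """One region's local token bucket, with no cross-region coordination."""

    def __init__(self, capacity: int, refill_rate: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)
        self.last = now

    def is_allowed(self, now: float) -> bool:
        # Standard refill-then-consume step.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Each of 4 regions enforces the 100-token burst independently, so a client
# spraying requests across all regions can burst up to 4 x 100 = 400 globally.
regions = [LocalBucket(capacity=100, refill_rate=10.0) for _ in range(4)]
global_burst = sum(b.is_allowed(now=0.0) for b in regions for _ in range(150))
print(global_burst)  # 400, versus the intended global burst of 100
```

Periodic background sync (or splitting the global limit across regions) narrows this gap, at the cost of either lag or under-admission.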

Code Example

Token Bucket Rate Limiter (Python)

import time
from dataclasses import dataclass
from typing import Tuple, Dict

@dataclass
class BucketState:
    """State for a single rate-limited key."""
    tokens: float
    last_refill: float

class TokenBucketRateLimiter:
    """
    In-memory token bucket rate limiter.

    For production, replace self.buckets with Redis storage.
    """

    def __init__(self, capacity: int, refill_rate: float):
        """
        Args:
            capacity: Maximum tokens (burst size)
            refill_rate: Tokens added per second
        """
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.buckets: Dict[str, BucketState] = {}

    def is_allowed(self, key: str) -> Tuple[bool, dict]:
        """
        Check if request is allowed and consume a token.

        Args:
            key: Rate limit key (user_id, ip, api_key)

        Returns:
            (allowed: bool, info: dict with remaining, reset_at)
        """
        now = time.time()

        # Get or create bucket state
        if key not in self.buckets:
            self.buckets[key] = BucketState(
                tokens=self.capacity,
                last_refill=now
            )

        bucket = self.buckets[key]

        # Calculate tokens to add since last refill
        elapsed = now - bucket.last_refill
        tokens_to_add = elapsed * self.refill_rate

        # Refill bucket (cap at capacity)
        bucket.tokens = min(self.capacity, bucket.tokens + tokens_to_add)
        bucket.last_refill = now

        # Check if we have tokens
        if bucket.tokens >= 1:
            bucket.tokens -= 1
            allowed = True
        else:
            allowed = False

        # Calculate when bucket will be full again
        tokens_needed = self.capacity - bucket.tokens
        seconds_to_full = tokens_needed / self.refill_rate
        reset_at = int(now + seconds_to_full)

        return allowed, {
            "remaining": int(bucket.tokens),
            "limit": self.capacity,
            "reset_at": reset_at
        }


# Usage example
if __name__ == "__main__":
    # 1 request/second sustained, with a burst of 5
    limiter = TokenBucketRateLimiter(
        capacity=5,        # Allow burst of 5
        refill_rate=1      # Refill 1 token/second
    )

    user = "user_123"

    # Simulate burst of 7 requests
    print("Burst of 7 requests:")
    for i in range(7):
        allowed, info = limiter.is_allowed(user)
        status = "✓ ALLOWED" if allowed else "✗ REJECTED"
        print(f"  Request {i+1}: {status} (remaining: {info['remaining']})")

    # Wait 3 seconds for refill
    print("\nWaiting 3 seconds...")
    time.sleep(3)

    # Try again
    print("\nAfter waiting:")
    for i in range(4):
        allowed, info = limiter.is_allowed(user)
        status = "✓ ALLOWED" if allowed else "✗ REJECTED"
        print(f"  Request {i+1}: {status} (remaining: {info['remaining']})")

Token Bucket with Redis (Production)

import time
import redis

class RedisTokenBucket:
    """
    Distributed token bucket using Redis.

    Uses Lua script for atomic check-and-decrement.
    """

    LUA_SCRIPT = """
    local key = KEYS[1]
    local capacity = tonumber(ARGV[1])
    local refill_rate = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])

    -- Get current state (or initialize)
    local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
    local tokens = tonumber(bucket[1]) or capacity
    local last_refill = tonumber(bucket[2]) or now

    -- Calculate refill
    local elapsed = now - last_refill
    tokens = math.min(capacity, tokens + elapsed * refill_rate)

    -- Try to consume token
    local allowed = 0
    if tokens >= 1 then
        tokens = tokens - 1
        allowed = 1
    end

    -- Update state
    redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
    redis.call('EXPIRE', key, 3600)  -- Expire after 1 hour idle

    -- Return: allowed, remaining, reset_at
    local reset_at = now + (capacity - tokens) / refill_rate
    return {allowed, math.floor(tokens), math.floor(reset_at)}
    """

    def __init__(self, redis_client: redis.Redis, capacity: int, refill_rate: float):
        self.redis = redis_client
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.script = self.redis.register_script(self.LUA_SCRIPT)

    def is_allowed(self, key: str) -> tuple[bool, dict]:
        """Check and consume token atomically."""
        now = time.time()
        result = self.script(
            keys=[f"ratelimit:{key}"],
            args=[self.capacity, self.refill_rate, now]
        )

        return result[0] == 1, {
            "remaining": result[1],
            "limit": self.capacity,
            "reset_at": result[2]
        }

Express.js Middleware

const express = require('express');
const Redis = require('ioredis');

const app = express();
const redis = new Redis(process.env.REDIS_URL);

// Token bucket middleware factory
function tokenBucket({ capacity, refillRate, keyGenerator }) {
  const luaScript = `
    local key = KEYS[1]
    local capacity = tonumber(ARGV[1])
    local refill_rate = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])

    local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
    local tokens = tonumber(bucket[1]) or capacity
    local last_refill = tonumber(bucket[2]) or now

    local elapsed = now - last_refill
    tokens = math.min(capacity, tokens + elapsed * refill_rate)

    local allowed = 0
    if tokens >= 1 then
      tokens = tokens - 1
      allowed = 1
    end

    redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
    redis.call('EXPIRE', key, 3600)

    return {allowed, math.floor(tokens), math.floor(now + (capacity - tokens) / refill_rate)}
  `;

  return async (req, res, next) => {
    const key = keyGenerator(req);
    const now = Date.now() / 1000;

    try {
      const [allowed, remaining, resetAt] = await redis.eval(
        luaScript,
        1,
        `ratelimit:${key}`,
        capacity,
        refillRate,
        now
      );

      // Set rate limit headers
      res.set({
        'X-RateLimit-Limit': capacity,
        'X-RateLimit-Remaining': remaining,
        'X-RateLimit-Reset': resetAt,
      });

      if (allowed) {
        next();
      } else {
        res.set('Retry-After', Math.ceil(1 / refillRate));
        res.status(429).json({
          error: 'Too Many Requests',
          retryAfter: Math.ceil(1 / refillRate),
        });
      }
    } catch (err) {
      // Fail open on Redis errors
      console.error('Rate limiter error:', err);
      next();
    }
  };
}

// Usage
app.use('/api/', tokenBucket({
  capacity: 100,      // Burst of 100
  refillRate: 10,     // 10 requests/second sustained
  keyGenerator: (req) => req.user?.id || req.ip,
}));


Quick Self-Check

  • Can explain token bucket in 60 seconds?
  • Understand the difference between capacity and refill rate?
  • Know why token bucket allows bursts while leaky bucket smooths output?
  • Can implement atomic token bucket in Redis with Lua?
  • Understand the trade-off with fixed window (boundary burst problem)?
  • Know when to choose sliding window over token bucket?

Interview Notes

  • Interview Relevance: ~60% of rate limiting discussions
  • Production Impact: API gateways, network QoS
  • Performance: O(1) per request
  • Scalability: minimal state per key
Minimal state per key