Token Bucket

A rate limiting algorithm that allows controlled bursts of traffic while enforcing an average rate limit over time

TL;DR

Token bucket is a rate limiting algorithm where tokens accumulate in a bucket at a fixed rate up to a maximum capacity. Each request consumes a token. If tokens are available, the request proceeds; if not, it’s rejected. This allows bursts up to bucket capacity while enforcing an average rate equal to the refill rate.

Visual Overview

Token Bucket Algorithm
TOKEN BUCKET MECHANICS

                                                    
  Bucket State: [●●●●●] (5/5 tokens)                
                                                    
  Parameters:                                       
  - Capacity: 5 tokens (max burst)                  
  - Refill: 1 token/second (average rate)           
                                                    
  Request 1: [●●●●○]  ALLOWED (4 remaining)        
  Request 2: [●●●○○]  ALLOWED (3 remaining)        
  Request 3: [●●○○○]  ALLOWED (2 remaining)        
  Request 4: [●○○○○]  ALLOWED (1 remaining)        
  Request 5: [○○○○○]  ALLOWED (0 remaining)        
  Request 6: [○○○○○]  REJECTED (no tokens!)        
                                                    
  After 3 seconds: [●●●○○] (3 tokens refilled)      
                                                    


TWO KEY PARAMETERS

                                                    
  CAPACITY (bucket size)                            
   Maximum burst size                             
   How many requests can fire instantly           
                                                    
  REFILL RATE (tokens per second)                   
   Steady-state throughput                        
   Average rate over time                         
                                                    
  Example: capacity=100, refill=10/sec              
  - Burst: 100 requests instantly                   
  - Sustained: 10 requests/second average           
                                                    


Core Explanation

What is Token Bucket?

Real-World Analogy: Think of a parking garage with a limited number of spaces. Cars (requests) arrive and need a parking spot (token). The garage has 50 spots (capacity). Cars leave at a steady rate of 1 per minute (refill rate).

  • If spots are available, cars park immediately
  • If the garage is full, cars must wait or go elsewhere
  • After rush hour, spots gradually become available again
  • A burst of arrivals is fine—up to 50 cars—but then they must slow down

This is why token bucket is perfect for APIs: it allows legitimate burst traffic (page loads, app startup) while preventing sustained overload.

How It Works

The algorithm maintains two pieces of state per rate-limited key:

  1. tokens: Current number of available tokens (float)
  2. last_refill: Timestamp of last refill calculation
Token Bucket State Machine

                                                    
  On Request Arrival:                               
     
   1. Calculate elapsed time since last_refill    
   2. Add tokens: elapsed × refill_rate           
   3. Cap at capacity (no overflow)               
   4. Update last_refill = now                    
     
                                                   
     
   Check: tokens >= 1?                            
       YES  tokens -= 1, return ALLOW            
       NO   return REJECT (429 Too Many)         
     
                                                    
  State per key: ~16 bytes (float + timestamp)      
  Time complexity: O(1) per request                 
                                                    

The Algorithm

on request:
  now = current_time()
  elapsed = now - last_refill
  tokens_to_add = elapsed * refill_rate
  tokens = min(capacity, tokens + tokens_to_add)
  last_refill = now

  if tokens >= 1:
    tokens -= 1
    return ALLOW
  else:
    return REJECT

Key Properties

PropertyValueNotes
Time ComplexityO(1) per requestSimple arithmetic
Space ComplexityO(1) per key~16 bytes (tokens + timestamp)
Burst ToleranceUp to capacityAllows full burst immediately
Average Raterefill_rateLong-term steady state

Token Bucket vs Leaky Bucket

Token Bucket vs Leaky Bucket
TOKEN BUCKET

  Requests  Check Tokens  Allow/Reject            
                                                    
  [●●●●●] full bucket                               
   (5 rapid requests)                          
  [○○○○○] empty  REJECT until refill               
                                                    
   Allows bursts up to capacity                    
   Good for bursty but legitimate traffic          
   Simple to implement                             


LEAKY BUCKET

  Requests  Queue  Drain at fixed rate            
                                                    
   (5 rapid requests)                          
  [●●●●●] queue fills                               
      (drains one at a time at fixed rate)         
out  ●out  ●out...                          
                                                    
   Smooths output to constant rate                 
   Good for shaping traffic to downstream          
  ✕ Adds latency (queueing)                         


KEY DIFFERENCE

  Token Bucket: "Can you handle this burst?"        
     Yes/No immediately (no queueing)              
                                                    
  Leaky Bucket: "Process requests at steady rate"   
     Queues bursts, drains smoothly                
                                                    
  Use Token Bucket for: API rate limiting           
  Use Leaky Bucket for: Network traffic shaping     


Real Systems Using Token Bucket

SystemImplementationConfigurationUse Case
AWS API GatewayNative token bucketConfigurable per stageAPI management
NGINXngx_http_limit_req_moduleburst parameter controls capacityWeb server rate limiting
Envoy ProxyLocal/global rate limitingConfigurable via xDSService mesh
Google Cloud EndpointsToken bucket styleQuota configurationAPI quotas
Kong GatewayRate limiting pluginConfigurable policyAPI gateway

Note: Implementations may vary slightly. Verify current behavior in official documentation.

Configuration Patterns

Common Token Bucket Configurations
API ENDPOINT PATTERNS (Illustrative)

  Endpoint Type      Capacity  Refill     Notes  
 
  Standard API       100       10/sec     Burst  
  Expensive Query    10        1/sec      Heavy  
  Login/Auth         5         1/min      Slow   
  Webhook Receiver   500       100/sec    Spiky  


TIERED RATE LIMITS

  Tier        Bucket Size  Refill Rate  Monthly  
 
  Free        Small        Low          Trial    
  Developer   Medium       Medium       Basic    
  Business    Large        High         Standard 
  Enterprise  Custom       Custom       Custom   

(Exact limits vary by provider—consult documentation)

When to Use Token Bucket

✓ Perfect Use Cases

Use CaseScenarioRequirementConfigurationTrade-off
Public APIs with bursty clientsREST API called by mobile appsAllow burst on app open, enforce average ratecapacity=50, refill=10/secAllows short bursts, blocks sustained overload
CDN/edge rate limitingEdge nodes rate-limit per IPFast O(1) check, minimal statecapacity=100, refill=20/sec per IPLarger capacity = more tolerance for legitimate spikes
Tiered API accessFree vs paid API tiersDifferent burst/rate per subscriptionFree(10,1), Pro(100,10), Enterprise(1000,100)Need to track tier per API key
Internal service protectionDatabase connection rate limitingPrevent connection stormscapacity=20, refill=5/sec per serviceMay queue legitimate requests during spikes

✕ When NOT to Use

When Token Bucket May Not Fit
STRICT BILLING/QUOTA ENFORCEMENT
Problem: Token bucket allows bursts that may exceed quota
Example: "1000 requests/hour"  user burns 100 instantly
Alternative: Sliding window log for exact counting
         (counter variant is also approximate)
When OK: If slight overage is acceptable

SMOOTH OUTPUT REQUIRED
Problem: Token bucket doesn't smooth output rate
Example: Database can handle 10 req/sec, not 100 in 1 second
Alternative: Leaky bucket (queues and drains steadily)
When OK: If downstream handles bursts fine

GLOBAL DISTRIBUTED RATE LIMITING
Problem: Local buckets can allow N × limit globally
Example: 10 regions × 100 burst = 1000 concurrent burst
Alternative: Centralized limiter or eventual sync
When OK: If approximate limiting is acceptable

Interview Application

Common Interview Question

Q: “How would you implement rate limiting for a public API? Walk me through token bucket and when you’d choose it over alternatives.”

Strong Answer:

“I’d implement rate limiting using a token bucket algorithm for most API use cases. Here’s my reasoning:

Why Token Bucket:

  1. Burst tolerance: Real clients are bursty—mobile apps make 10 requests on launch, then go quiet. Token bucket handles this gracefully.
  2. O(1) per request: Just arithmetic—no scanning, no sorting. Critical for high-throughput APIs.
  3. Simple state: Two values per key (tokens, timestamp). Easy to store in Redis.

The Algorithm:

  • Each request checks: do I have tokens? If yes, consume one and proceed. If no, reject with 429.
  • Tokens refill at a steady rate (e.g., 10/second) up to capacity (e.g., 100).
  • Capacity controls burst size; refill rate controls sustained throughput.

Distributed Implementation:

  • Store {tokens, last_refill} in Redis per API key
  • Use Lua script for atomic check-and-decrement
  • Handle Redis failures by failing open (allow request)—better to risk some overload than block everyone

When I’d choose alternatives:

  • Sliding window: When I need exact counting for billing (no burst overage)
  • Leaky bucket: When I need to smooth output to a downstream service
  • Fixed window: Never—boundary burst problem makes it unreliable

Headers I’d return:

  • X-RateLimit-Limit: Bucket capacity
  • X-RateLimit-Remaining: Current tokens
  • X-RateLimit-Reset: When bucket refills to capacity
  • Retry-After: Seconds until tokens available (on 429)”

Follow-up: What’s the boundary burst problem and how does token bucket handle it?

“The boundary burst problem affects fixed window rate limiting. If the window resets at the top of each minute, a client can make 100 requests at 0:59 and 100 more at 1:00—effectively 200 requests in 2 seconds while technically staying within ‘100/minute’.

Token bucket avoids this because it doesn’t have discrete windows. Tokens accumulate continuously. If you burn all tokens at 0:59, you don’t magically get more at 1:00—you wait for refill. The maximum burst is always capped at capacity, regardless of timing.”

Follow-up: How do you handle token bucket in a multi-region deployment?

“There are three approaches with different trade-offs:

  1. Sticky routing: Route each user to one region. Simple but adds latency for users far from their assigned region.

  2. Synchronized global state: Single Redis cluster, all regions check it. Accurate but adds cross-region latency to every request.

  3. Local with eventual sync: Each region has local bucket, periodically sync totals. Fast but user might get N × limit globally.

For most APIs, I’d choose option 3—slightly exceeding limits is acceptable compared to adding 100ms+ latency to every request. For billing-critical endpoints, option 2 or sticky routing.”

Code Example

Token Bucket Rate Limiter (Python)

import time
from dataclasses import dataclass
from typing import Tuple, Dict

@dataclass
class BucketState:
    """State for a single rate-limited key."""
    tokens: float
    last_refill: float

class TokenBucketRateLimiter:
    """
    In-memory token bucket rate limiter.

    For production, replace self.buckets with Redis storage.
    """

    def __init__(self, capacity: int, refill_rate: float):
        """
        Args:
            capacity: Maximum tokens (burst size)
            refill_rate: Tokens added per second
    # ... omitted: keep concept snippets short
    # Wait 3 seconds for refill
    print("\nWaiting 3 seconds...")
    time.sleep(3)

    # Try again
    print("\nAfter waiting:")
    for i in range(4):
        allowed, info = limiter.is_allowed(user)
        status = "✓ ALLOWED" if allowed else "✗ REJECTED"
        print(f"  Request {i+1}: {status} (remaining: {info['remaining']})")

Token Bucket with Redis (Production)

import time
import redis

class RedisTokenBucket:
    """
    Distributed token bucket using Redis.

    Uses Lua script for atomic check-and-decrement.
    """

    LUA_SCRIPT = """
    local key = KEYS[1]
    local capacity = tonumber(ARGV[1])
    local refill_rate = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])

    -- Get current state (or initialize)
    local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
    local tokens = tonumber(bucket[1]) or capacity
    local last_refill = tonumber(bucket[2]) or now

    -- Calculate refill
    # ... omitted: keep concept snippets short
        result = self.script(
            keys=[f"ratelimit:{key}"],
            args=[self.capacity, self.refill_rate, now]
        )

        return result[0] == 1, {
            "remaining": result[1],
            "limit": self.capacity,
            "reset_at": result[2]
        }

Express.js Middleware

const Redis = require('ioredis');

const redis = new Redis(process.env.REDIS_URL);

// Token bucket middleware factory
function tokenBucket({ capacity, refillRate, keyGenerator }) {
  const luaScript = `
    local key = KEYS[1]
    local capacity = tonumber(ARGV[1])
    local refill_rate = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])

    local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
    local tokens = tonumber(bucket[1]) or capacity
    local last_refill = tonumber(bucket[2]) or now

    local elapsed = now - last_refill
    tokens = math.min(capacity, tokens + elapsed * refill_rate)

    local allowed = 0
    if tokens >= 1 then
      tokens = tokens - 1
  // ... omitted: keep concept snippets short
    }
  };
}

// Usage
app.use('/api/', tokenBucket({
  capacity: 100,      // Burst of 100
  refillRate: 10,     // 10 requests/second sustained
  keyGenerator: (req) => req.user?.id || req.ip,
}));

See It In Action:

Related Concepts:

Quick Self-Check

  • Can explain token bucket in 60 seconds?
  • Understand the difference between capacity and refill rate?
  • Know why token bucket allows bursts while leaky bucket smooths output?
  • Can implement atomic token bucket in Redis with Lua?
  • Understand the trade-off with fixed window (boundary burst problem)?
  • Know when to choose sliding window over token bucket?

Production signal

Why this concept matters

Interview 60% of rate limiting discussions
Production API gateways, network QoS
Performance O(1) per request
Scale Minimal state per key