A rate limiting algorithm that allows controlled bursts of traffic while enforcing an average rate limit over time
TL;DR
Token bucket is a rate limiting algorithm where tokens accumulate in a bucket at a fixed rate up to a maximum capacity. Each request consumes a token. If tokens are available, the request proceeds; if not, it’s rejected. This allows bursts up to bucket capacity while enforcing an average rate equal to the refill rate.
Visual Overview
Core Explanation
What is Token Bucket?
Real-World Analogy: Think of a parking garage with a limited number of spaces. Cars (requests) arrive and need a parking spot (token). The garage has 50 spots (capacity). Cars leave at a steady rate of 1 per minute (refill rate).
- If spots are available, cars park immediately
- If the garage is full, cars must wait or go elsewhere
- After rush hour, spots gradually become available again
- A burst of arrivals is fine—up to 50 cars—but then they must slow down
This is why token bucket is perfect for APIs: it allows legitimate burst traffic (page loads, app startup) while preventing sustained overload.
How It Works
The algorithm maintains two pieces of state per rate-limited key:
- tokens: Current number of available tokens (float)
- last_refill: Timestamp of last refill calculation
The Algorithm
on request:
    now = current_time()
    elapsed = now - last_refill
    tokens_to_add = elapsed * refill_rate
    tokens = min(capacity, tokens + tokens_to_add)
    last_refill = now

    if tokens >= 1:
        tokens -= 1
        return ALLOW
    else:
        return REJECT
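The pseudocode above maps almost line-for-line to a minimal runnable version. This is a sketch for a single key only (the helper name `make_limiter` is illustrative, not from any library):

```python
import time

def make_limiter(capacity: float, refill_rate: float):
    """Minimal single-key token bucket as a closure."""
    state = {"tokens": capacity, "last": time.monotonic()}

    def allow() -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, capped at capacity
        state["tokens"] = min(capacity, state["tokens"] + (now - state["last"]) * refill_rate)
        state["last"] = now
        # Consume a token if one is available
        if state["tokens"] >= 1:
            state["tokens"] -= 1
            return True
        return False

    return allow

allow = make_limiter(capacity=3, refill_rate=1.0)
print([allow() for _ in range(5)])  # burst of 3 allowed, then rejections
```

Calling it five times back-to-back allows exactly the first three requests (the burst), since almost no refill happens between calls.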
Key Properties
| Property | Value | Notes |
|---|---|---|
| Time Complexity | O(1) per request | Simple arithmetic |
| Space Complexity | O(1) per key | ~16 bytes (tokens + timestamp) |
| Burst Tolerance | Up to capacity | Allows full burst immediately |
| Average Rate | refill_rate | Long-term steady state |
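The burst and average-rate rows combine into one useful bound: over any window of T seconds, a bucket can admit at most capacity + refill_rate × T requests. A quick sketch (the function name is illustrative):

```python
def max_admitted(capacity: float, refill_rate: float, window_seconds: float) -> float:
    """Upper bound on requests a token bucket can admit in any window:
    a full burst plus whatever refills during the window."""
    return capacity + refill_rate * window_seconds

# capacity=100, refill 10/s: at most 700 requests in any 60-second window
print(max_admitted(100, 10, 60))
```

Dividing by T shows why the long-term average converges to refill_rate: the capacity term becomes negligible as T grows.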
Token Bucket vs Leaky Bucket
Real Systems Using Token Bucket
| System | Implementation | Configuration | Use Case |
|---|---|---|---|
| AWS API Gateway | Native token bucket | Configurable per stage | API management |
| NGINX | ngx_http_limit_req_module | burst parameter controls capacity | Web server rate limiting |
| Envoy Proxy | Local/global rate limiting | Configurable via xDS | Service mesh |
| Google Cloud Endpoints | Token bucket style | Quota configuration | API quotas |
| Kong Gateway | Rate limiting plugin | Configurable policy | API gateway |
Note: Implementations may vary slightly. Verify current behavior in official documentation.
Configuration Patterns
When to Use Token Bucket
✓ Perfect Use Cases
✕ When NOT to Use
Interview Application
Common Interview Question
Q: “How would you implement rate limiting for a public API? Walk me through token bucket and when you’d choose it over alternatives.”
Strong Answer:
“I’d implement rate limiting using a token bucket algorithm for most API use cases. Here’s my reasoning:
Why Token Bucket:
- Burst tolerance: Real clients are bursty—mobile apps make 10 requests on launch, then go quiet. Token bucket handles this gracefully.
- O(1) per request: Just arithmetic—no scanning, no sorting. Critical for high-throughput APIs.
- Simple state: Two values per key (tokens, timestamp). Easy to store in Redis.
The Algorithm:
- Each request checks: do I have tokens? If yes, consume one and proceed. If no, reject with 429.
- Tokens refill at a steady rate (e.g., 10/second) up to capacity (e.g., 100).
- Capacity controls burst size; refill rate controls sustained throughput.
Distributed Implementation:
- Store {tokens, last_refill} in Redis per API key
- Use Lua script for atomic check-and-decrement
- Handle Redis failures by failing open (allow request)—better to risk some overload than block everyone
When I’d choose alternatives:
- Sliding window: When I need exact counting for billing (no burst overage)
- Leaky bucket: When I need to smooth output to a downstream service
- Fixed window: Never—boundary burst problem makes it unreliable
Headers I’d return:
- X-RateLimit-Limit: Bucket capacity
- X-RateLimit-Remaining: Current tokens
- X-RateLimit-Reset: When bucket refills to capacity
- Retry-After: Seconds until tokens available (on 429)”
Follow-up: What’s the boundary burst problem and how does token bucket handle it?
“The boundary burst problem affects fixed window rate limiting. If the window resets at the top of each minute, a client can make 100 requests at 0:59 and 100 more at 1:00—effectively 200 requests in 2 seconds while technically staying within ‘100/minute’.
Token bucket avoids this because it doesn’t have discrete windows. Tokens accumulate continuously. If you burn all tokens at 0:59, you don’t magically get more at 1:00—you wait for refill. The maximum burst is always capped at capacity, regardless of timing.”
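That difference is easy to check with a small simulation. This sketch fires 200 requests into the 2 seconds straddling a minute boundary, under a nominal 100/minute limit (timings and limits are illustrative):

```python
def fixed_window(events, limit=100, window=60):
    """Count admitted requests with naive fixed-window counting."""
    admitted, counts = 0, {}
    for t in events:
        w = int(t // window)  # which window this request falls in
        if counts.get(w, 0) < limit:
            counts[w] = counts.get(w, 0) + 1
            admitted += 1
    return admitted

def token_bucket(events, capacity=100, refill_rate=100 / 60):
    """Count admitted requests with a token bucket at the same average rate."""
    admitted, tokens, last = 0, float(capacity), 0.0
    for t in events:
        tokens = min(capacity, tokens + (t - last) * refill_rate)
        last = t
        if tokens >= 1:
            tokens -= 1
            admitted += 1
    return admitted

# 200 requests packed into the 2 seconds around the minute boundary
events = [59.0 + i * 0.01 for i in range(200)]
print(fixed_window(events))  # admits all 200: 100 in each window
print(token_bucket(events))  # caps the burst near capacity
```

Fixed window admits all 200 (100 per window), while the token bucket admits only slightly more than its 100-token capacity, the excess coming from the trickle of refill during those 2 seconds.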
Follow-up: How do you handle token bucket in a multi-region deployment?
“There are three approaches with different trade-offs:
Sticky routing: Route each user to one region. Simple but adds latency for users far from their assigned region.
Synchronized global state: Single Redis cluster, all regions check it. Accurate but adds cross-region latency to every request.
Local with eventual sync: Each region has local bucket, periodically sync totals. Fast but user might get N × limit globally.
For most APIs, I’d choose option 3—slightly exceeding limits is acceptable compared to adding 100ms+ latency to every request. For billing-critical endpoints, option 2 or sticky routing.”
Code Example
Token Bucket Rate Limiter (Python)
```python
import time
from dataclasses import dataclass
from typing import Tuple, Dict

@dataclass
class BucketState:
    """State for a single rate-limited key."""
    tokens: float
    last_refill: float

class TokenBucketRateLimiter:
    """
    In-memory token bucket rate limiter.
    For production, replace self.buckets with Redis storage.
    """

    def __init__(self, capacity: int, refill_rate: float):
        """
        Args:
            capacity: Maximum tokens (burst size)
            refill_rate: Tokens added per second
        """
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.buckets: Dict[str, BucketState] = {}

    def is_allowed(self, key: str) -> Tuple[bool, dict]:
        """
        Check if request is allowed and consume a token.

        Args:
            key: Rate limit key (user_id, ip, api_key)

        Returns:
            (allowed: bool, info: dict with remaining, reset_at)
        """
        now = time.time()

        # Get or create bucket state
        if key not in self.buckets:
            self.buckets[key] = BucketState(
                tokens=self.capacity,
                last_refill=now
            )
        bucket = self.buckets[key]

        # Calculate tokens to add since last refill
        elapsed = now - bucket.last_refill
        tokens_to_add = elapsed * self.refill_rate

        # Refill bucket (cap at capacity)
        bucket.tokens = min(self.capacity, bucket.tokens + tokens_to_add)
        bucket.last_refill = now

        # Check if we have tokens
        if bucket.tokens >= 1:
            bucket.tokens -= 1
            allowed = True
        else:
            allowed = False

        # Calculate when bucket will be full again
        tokens_needed = self.capacity - bucket.tokens
        seconds_to_full = tokens_needed / self.refill_rate
        reset_at = int(now + seconds_to_full)

        return allowed, {
            "remaining": int(bucket.tokens),
            "limit": self.capacity,
            "reset_at": reset_at
        }

# Usage example
if __name__ == "__main__":
    # 1 request/second sustained, with a burst of 5
    limiter = TokenBucketRateLimiter(
        capacity=5,     # Allow burst of 5
        refill_rate=1   # Refill 1 token/second
    )

    user = "user_123"

    # Simulate burst of 7 requests
    print("Burst of 7 requests:")
    for i in range(7):
        allowed, info = limiter.is_allowed(user)
        status = "✓ ALLOWED" if allowed else "✗ REJECTED"
        print(f"  Request {i+1}: {status} (remaining: {info['remaining']})")

    # Wait 3 seconds for refill
    print("\nWaiting 3 seconds...")
    time.sleep(3)

    # Try again
    print("\nAfter waiting:")
    for i in range(4):
        allowed, info = limiter.is_allowed(user)
        status = "✓ ALLOWED" if allowed else "✗ REJECTED"
        print(f"  Request {i+1}: {status} (remaining: {info['remaining']})")
```
Token Bucket with Redis (Production)
```python
import time
import redis

class RedisTokenBucket:
    """
    Distributed token bucket using Redis.
    Uses Lua script for atomic check-and-decrement.
    """

    LUA_SCRIPT = """
    local key = KEYS[1]
    local capacity = tonumber(ARGV[1])
    local refill_rate = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])

    -- Get current state (or initialize)
    local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
    local tokens = tonumber(bucket[1]) or capacity
    local last_refill = tonumber(bucket[2]) or now

    -- Calculate refill
    local elapsed = now - last_refill
    tokens = math.min(capacity, tokens + elapsed * refill_rate)

    -- Try to consume token
    local allowed = 0
    if tokens >= 1 then
        tokens = tokens - 1
        allowed = 1
    end

    -- Update state
    redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
    redis.call('EXPIRE', key, 3600)  -- Expire after 1 hour idle

    -- Return: allowed, remaining, reset_at
    local reset_at = now + (capacity - tokens) / refill_rate
    return {allowed, math.floor(tokens), math.floor(reset_at)}
    """

    def __init__(self, redis_client: redis.Redis, capacity: int, refill_rate: float):
        self.redis = redis_client
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.script = self.redis.register_script(self.LUA_SCRIPT)

    def is_allowed(self, key: str) -> tuple[bool, dict]:
        """Check and consume token atomically."""
        now = time.time()
        result = self.script(
            keys=[f"ratelimit:{key}"],
            args=[self.capacity, self.refill_rate, now]
        )
        return result[0] == 1, {
            "remaining": result[1],
            "limit": self.capacity,
            "reset_at": result[2]
        }
```
Express.js Middleware
```javascript
const Redis = require('ioredis');
const redis = new Redis(process.env.REDIS_URL);

// Token bucket middleware factory
function tokenBucket({ capacity, refillRate, keyGenerator }) {
  const luaScript = `
    local key = KEYS[1]
    local capacity = tonumber(ARGV[1])
    local refill_rate = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])

    local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
    local tokens = tonumber(bucket[1]) or capacity
    local last_refill = tonumber(bucket[2]) or now

    local elapsed = now - last_refill
    tokens = math.min(capacity, tokens + elapsed * refill_rate)

    local allowed = 0
    if tokens >= 1 then
      tokens = tokens - 1
      allowed = 1
    end

    redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
    redis.call('EXPIRE', key, 3600)

    return {allowed, math.floor(tokens), math.floor(now + (capacity - tokens) / refill_rate)}
  `;

  return async (req, res, next) => {
    const key = keyGenerator(req);
    const now = Date.now() / 1000;

    try {
      const [allowed, remaining, resetAt] = await redis.eval(
        luaScript,
        1,
        `ratelimit:${key}`,
        capacity,
        refillRate,
        now
      );

      // Set rate limit headers
      res.set({
        'X-RateLimit-Limit': capacity,
        'X-RateLimit-Remaining': remaining,
        'X-RateLimit-Reset': resetAt,
      });

      if (allowed) {
        next();
      } else {
        res.set('Retry-After', Math.ceil(1 / refillRate));
        res.status(429).json({
          error: 'Too Many Requests',
          retryAfter: Math.ceil(1 / refillRate),
        });
      }
    } catch (err) {
      // Fail open on Redis errors
      console.error('Rate limiter error:', err);
      next();
    }
  };
}

// Usage
app.use('/api/', tokenBucket({
  capacity: 100,   // Burst of 100
  refillRate: 10,  // 10 requests/second sustained
  keyGenerator: (req) => req.user?.id || req.ip,
}));
```
Related Content
See It In Action:
- Rate Limiting Explainer - Compares token bucket with sliding window
Related Concepts:
- Rate Limiting - Parent concept
- Sliding Window - Alternative: smoother enforcement, no burst
- Leaky Bucket - Smooths output rate with queueing
Quick Self-Check
- Can explain token bucket in 60 seconds?
- Understand the difference between capacity and refill rate?
- Know why token bucket allows bursts while leaky bucket smooths output?
- Can implement atomic token bucket in Redis with Lua?
- Understand the trade-off with fixed window (boundary burst problem)?
- Know when to choose sliding window over token bucket?
Interview Notes
- Comes up in roughly 60% of rate limiting discussions
- Powers API gateways and network QoS systems
- O(1) work per request
- Minimal state per key