Rate Limiting Algorithms
Visual comparison of Token Bucket vs Sliding Window — understand burst handling, accuracy trade-offs, and when to use each algorithm.
Why Rate Limiting?
Without rate limiting, a single user or malicious actor can overwhelm your system with traffic spikes. Rate limiting protects your infrastructure, controls costs, and ensures fair usage across all users.
- Resource protection: Prevent database connection exhaustion, memory spikes
- Cost control: Limit cloud spend on serverless and usage-based pricing
- Security: Stop brute force attacks and credential stuffing
Token Bucket Algorithm
Imagine a bucket filled with tokens. Each request consumes one token. The bucket refills at a steady rate (e.g., 10 tokens/second) up to a maximum capacity.
Many APIs use token bucket-style limits to allow short bursts while preventing abuse. Most API gateways and edge services use variations of this algorithm.
- Capacity: Maximum burst size (e.g., 1,000 tokens)
- Refill rate: Tokens added per second (e.g., 100/sec)
- Time complexity:
O(1)per request
Burst Handling
When traffic is light, tokens accumulate. When a burst arrives, the bucket can absorb it — up to capacity. This is perfect for APIs where legitimate users have bursty patterns.
If tokens run out, requests are rejected with HTTP 429 (Too
Many Requests) until the bucket refills. Include a Retry-After header for proper client handling.
- Bursts allowed up to bucket capacity
- Steady-state rate limited by refill rate
- Common in API gateways, edge services, payment APIs
Sliding Window Counter
Fixed windows have a flaw: users can hit the limit at 2:59 PM, then again at 3:00 PM — effectively doubling their rate at the boundary.
Sliding window counter fixes this by weighting the previous window's count based on current position in the window.
- Formula:
prev_count × (1 - elapsed_ratio) + curr_count - Approximation: Smooths boundary spikes without full log storage
- Memory efficient: Only 2 counters per user (previous + current)
Smooth Rate Control
Sliding window counters ensure no spike at window boundaries by blending the previous window into the current one.
At 30% into the current window, if previous window had 10 requests and
current has 3, the weighted count is: 10 × 0.70 + 3 = 10.
Still at the limit!
- No boundary burst problem (unlike fixed windows)
- Better user experience with predictable enforcement
- Common in API rate limiters, billing systems, client SDKs
When to Use Which?
Choose based on your requirements:
- Token Bucket: When you need burst tolerance. Use for payment APIs, user-facing features, or when legitimate traffic is bursty.
- Sliding Window: When you need smooth, predictable rate enforcement. Use for background jobs, third-party API quotas, or billing limits.
For distributed systems, both algorithms need shared state (Redis, Memcached) to enforce limits across multiple instances.