Rate Limiting Algorithms

Read this as Should excess demand be queued, shaped, or rejected?

Failure Trap: Average request rates hide boundary bursts and unfair clients.
Decision Rule: Use token buckets when bursts are acceptable; use sliding windows when fairness and smoother limits matter more.

1 / ?

Why Rate Limiting?

Without rate limiting, a single user or malicious actor can overwhelm your system with traffic spikes. Rate limiting protects your infrastructure, controls costs, and ensures fair usage across all users.

Resource protection: Prevent database connection exhaustion, memory spikes
Cost control: Limit cloud spend on serverless and usage-based pricing
Security: Stop brute force attacks and credential stuffing

Token Bucket Algorithm

Imagine a bucket filled with tokens. Each request consumes one token. The bucket refills at a steady rate up to a fixed maximum capacity. The diagram uses small round numbers — a capacity of 5 tokens refilling at +1 token/sec — so the mechanics stay clear; production limits scale the same way (e.g., thousands of tokens, hundreds per second).

Many APIs use token bucket-style limits to allow short bursts while preventing abuse. Most API gateways and edge services use variations of this algorithm.

Capacity: Maximum burst size (here: 5 tokens)
Refill rate: Tokens added per second (here: +1/sec)
Time complexity: O(1) per request

Burst Handling

When traffic is light, tokens accumulate. When a burst arrives, the bucket can absorb it — up to capacity. This is perfect for APIs where legitimate users have bursty patterns.

If tokens run out, requests are rejected with HTTP 429 (Too Many Requests) until the bucket refills. Include a Retry-After header for proper client handling.

Bursts allowed up to bucket capacity
Steady-state rate limited by refill rate
Common in API gateways, edge services, payment APIs

Sliding Window Counter

Fixed windows have a flaw: users can hit the limit at 2:59 PM, then again at 3:00 PM — effectively doubling their rate at the boundary.

Sliding window counter fixes this by weighting the previous window's count based on current position in the window.

Formula: prev_count × (1 - elapsed_ratio) + curr_count
Approximation: Smooths boundary spikes without full log storage
Memory efficient: Only 2 counters per user (previous + current)

Smooth Rate Control

Sliding window counters ensure no spike at window boundaries by blending the previous window into the current one.

At 30% into the current window, if previous window had 10 requests and current has 3, the weighted count is: 10 × 0.70 + 3 = 10. Still at the limit!

No boundary burst problem (unlike fixed windows)
Better user experience with predictable enforcement
Common in API rate limiters, billing systems, client SDKs

When to Use Which?

Choose based on your requirements:

Token Bucket: When you need burst tolerance. Use for payment APIs, user-facing features, or when legitimate traffic is bursty.
Sliding Window: When you need smooth, predictable rate enforcement. Use for background jobs, third-party API quotas, or billing limits.

For distributed systems, both algorithms need shared state (Redis, Memcached) to enforce limits across multiple instances.