Rate Limiting Algorithms

Visual comparison of Token Bucket vs Sliding Window — understand burst handling, accuracy trade-offs, and when to use each algorithm.

Read this as Should excess demand be queued, shaped, or rejected?
Failure Trap
Average request rates hide boundary bursts and unfair clients.
Decision Rule
Use token buckets when bursts are acceptable; use sliding windows when fairness and smoother limits matter more.
Rate limiting: token bucket versus sliding window A six-step walkthrough. First, an unprotected API is overwhelmed by a flood of requests and returns errors. Then the token bucket algorithm holds a fixed number of tokens that refill at a steady rate, absorbing short bursts up to capacity. Finally, the sliding window counter blends the previous window into the current one with a weighted formula to smooth out boundary bursts, with a guide to when to choose each algorithm. No limit: API overwhelmed Flood API overloaded 503 Token bucket: holds tokens Capacity 5 +1 / sec 1 req = 1 token Refills steady Empty = reject O(1) each Burst of 4: absorbed 1 left Req 1 OK Req 2 OK Req 3 OK Req 4 OK Burst OK 4 served Sliding window counter Previous 10 req Current 3 req 30% in Weighted count prev × (1 − elapsed) + curr 10 × 0.7 + 3 = 10 Smooth rate control Current usage 8 / 10 No boundary burst When to use which? Token bucket Bursts OK Simple state Gateways User-facing Sliding window Smooth rate No boundary Mem light Billing
1 / ?

Why Rate Limiting?

Without rate limiting, a single user or malicious actor can overwhelm your system with traffic spikes. Rate limiting protects your infrastructure, controls costs, and ensures fair usage across all users.

  • Resource protection: Prevent database connection exhaustion, memory spikes
  • Cost control: Limit cloud spend on serverless and usage-based pricing
  • Security: Stop brute force attacks and credential stuffing

Token Bucket Algorithm

Imagine a bucket filled with tokens. Each request consumes one token. The bucket refills at a steady rate up to a fixed maximum capacity. The diagram uses small round numbers — a capacity of 5 tokens refilling at +1 token/sec — so the mechanics stay clear; production limits scale the same way (e.g., thousands of tokens, hundreds per second).

Many APIs use token bucket-style limits to allow short bursts while preventing abuse. Most API gateways and edge services use variations of this algorithm.

  • Capacity: Maximum burst size (here: 5 tokens)
  • Refill rate: Tokens added per second (here: +1/sec)
  • Time complexity: O(1) per request

Burst Handling

When traffic is light, tokens accumulate. When a burst arrives, the bucket can absorb it — up to capacity. This is perfect for APIs where legitimate users have bursty patterns.

If tokens run out, requests are rejected with HTTP 429 (Too Many Requests) until the bucket refills. Include a Retry-After header for proper client handling.

  • Bursts allowed up to bucket capacity
  • Steady-state rate limited by refill rate
  • Common in API gateways, edge services, payment APIs

Sliding Window Counter

Fixed windows have a flaw: users can hit the limit at 2:59 PM, then again at 3:00 PM — effectively doubling their rate at the boundary.

Sliding window counter fixes this by weighting the previous window's count based on current position in the window.

  • Formula: prev_count × (1 - elapsed_ratio) + curr_count
  • Approximation: Smooths boundary spikes without full log storage
  • Memory efficient: Only 2 counters per user (previous + current)

Smooth Rate Control

Sliding window counters ensure no spike at window boundaries by blending the previous window into the current one.

At 30% into the current window, if previous window had 10 requests and current has 3, the weighted count is: 10 × 0.70 + 3 = 10. Still at the limit!

  • No boundary burst problem (unlike fixed windows)
  • Better user experience with predictable enforcement
  • Common in API rate limiters, billing systems, client SDKs

When to Use Which?

Choose based on your requirements:

  • Token Bucket: When you need burst tolerance. Use for payment APIs, user-facing features, or when legitimate traffic is bursty.
  • Sliding Window: When you need smooth, predictable rate enforcement. Use for background jobs, third-party API quotas, or billing limits.

For distributed systems, both algorithms need shared state (Redis, Memcached) to enforce limits across multiple instances.