
Rate Limiting Algorithms

Visual comparison of Token Bucket vs Sliding Window — understand burst handling, accuracy trade-offs, and when to use each algorithm.

[Interactive visualization: Token Bucket (capacity 5, +1 token/sec) absorbing a burst of 4 requests, side by side with a Sliding Window Counter (weighted count: 10 × 0.7 + 3 = 10, smooth enforcement), plus a summary panel of when to use each algorithm.]

Why Rate Limiting?

Without rate limiting, a single user or malicious actor can overwhelm your system with traffic spikes. Rate limiting protects your infrastructure, controls costs, and ensures fair usage across all users.

  • Resource protection: Prevent database connection exhaustion, memory spikes
  • Cost control: Limit cloud spend on serverless and usage-based pricing
  • Security: Stop brute force attacks and credential stuffing

Token Bucket Algorithm

Imagine a bucket filled with tokens. Each request consumes one token. The bucket refills at a steady rate (e.g., 10 tokens/second) up to a maximum capacity.

Token-bucket-style limits allow short bursts while preventing sustained abuse; most API gateways and edge services use a variation of this algorithm.

  • Capacity: Maximum burst size (e.g., 1,000 tokens)
  • Refill rate: Tokens added per second (e.g., 100/sec)
  • Time complexity: O(1) per request
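The refill logic above can be sketched in a few lines. This is a minimal single-process illustration (the class name and parameters are illustrative, not from any particular library); note the lazy refill, which keeps each check O(1) with no background timer:

```python
import time

class TokenBucket:
    """Minimal token bucket sketch: capacity = max burst, refill_rate = tokens/sec."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity            # start full so an initial burst is allowed
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Lazily refill based on elapsed time -- no timer thread needed.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `TokenBucket(5, 1)`, five back-to-back requests succeed (the burst), then requests fail until tokens refill at one per second.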

Burst Handling

When traffic is light, tokens accumulate. When a burst arrives, the bucket can absorb it — up to capacity. This is perfect for APIs where legitimate users have bursty patterns.

If tokens run out, requests are rejected with HTTP 429 (Too Many Requests) until the bucket refills. Include a Retry-After header so clients know how long to back off.

  • Bursts allowed up to bucket capacity
  • Steady-state rate limited by refill rate
  • Common in API gateways, edge services, payment APIs
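When rejecting a request, the Retry-After value falls out of the bucket state directly: it is the time needed to refill the token deficit. A small self-contained sketch (the helper name is hypothetical):

```python
import math

def retry_after_seconds(tokens: float, refill_rate: float,
                        needed: float = 1.0) -> int:
    """Whole seconds until the bucket holds `needed` tokens again.

    tokens: current token count (may be fractional)
    refill_rate: tokens added per second
    """
    deficit = max(0.0, needed - tokens)
    # Round up: Retry-After must not invite the client back too early.
    return math.ceil(deficit / refill_rate)
```

For example, an empty bucket refilling at 0.25 tokens/sec yields `Retry-After: 4`.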

Sliding Window Counter

Fixed windows have a flaw: users can hit the limit at 2:59 PM, then again at 3:00 PM — effectively doubling their rate at the boundary.

The sliding window counter fixes this by weighting the previous window's count according to how far the current window has elapsed, approximating a true rolling window with just two counters.

  • Formula: prev_count × (1 - elapsed_ratio) + curr_count
  • Approximation: Smooths boundary spikes without full log storage
  • Memory efficient: Only 2 counters per user (previous + current)
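The formula above translates into a compact limiter. A minimal single-process sketch (class name and parameters are illustrative); the weighted count decides admission, and the two counters roll forward when a window boundary passes:

```python
import time

class SlidingWindowCounter:
    """Minimal sliding window counter sketch: limit requests per window_size seconds."""

    def __init__(self, limit: int, window_size: float):
        self.limit = limit
        self.window_size = window_size
        self.window_start = time.monotonic()
        self.prev_count = 0
        self.curr_count = 0

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.window_start
        if elapsed >= self.window_size:
            # Roll forward: current becomes previous; if more than one
            # full window passed with no traffic, previous is empty too.
            windows_passed = int(elapsed // self.window_size)
            self.prev_count = self.curr_count if windows_passed == 1 else 0
            self.curr_count = 0
            self.window_start += windows_passed * self.window_size
            elapsed = now - self.window_start
        elapsed_ratio = elapsed / self.window_size
        # prev_count * (1 - elapsed_ratio) + curr_count
        weighted = self.prev_count * (1 - elapsed_ratio) + self.curr_count
        if weighted < self.limit:
            self.curr_count += 1
            return True
        return False
```

With `SlidingWindowCounter(10, 60)`, the eleventh immediate request is rejected, and as the window slides, the previous window's weight decays smoothly instead of resetting at a boundary.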

Smooth Rate Control

Sliding window counters ensure no spike at window boundaries by blending the previous window into the current one.

At 30% into the current window, if previous window had 10 requests and current has 3, the weighted count is: 10 × 0.70 + 3 = 10. Still at the limit!

  • No boundary burst problem (unlike fixed windows)
  • Better user experience with predictable enforcement
  • Common in API rate limiters, billing systems, client SDKs

When to Use Which?

Choose based on your requirements:

  • Token Bucket: When you need burst tolerance. Use for payment APIs, user-facing features, or when legitimate traffic is bursty.
  • Sliding Window: When you need smooth, predictable rate enforcement. Use for background jobs, third-party API quotas, or billing limits.

For distributed systems, both algorithms need shared state (Redis, Memcached) to enforce limits across multiple instances.
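The shared-state idea can be sketched by keying bucket state per user in an external store. Here a plain dict stands in for Redis (an assumption for illustration only); a real deployment needs atomic read-modify-write, e.g. a Redis Lua script, so concurrent instances don't race on the same key:

```python
import time

# Dict stands in for a shared store like Redis (illustration only).
# Maps key -> (tokens, last_refill_timestamp).
store: dict = {}

def allow(key: str, capacity: float, refill_rate: float) -> bool:
    """Token bucket check against shared per-key state.

    NOTE: this read-modify-write is NOT atomic; in production, wrap it in
    a Redis Lua script or equivalent so multiple instances can't race.
    """
    now = time.monotonic()
    tokens, last = store.get(key, (capacity, now))
    tokens = min(capacity, tokens + (now - last) * refill_rate)
    allowed = tokens >= 1
    if allowed:
        tokens -= 1
    store[key] = (tokens, now)
    return allowed
```

Because state is keyed per user, every instance that shares the store enforces the same limit for `"user:1"` while leaving `"user:2"` unaffected.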

What's Next?

Now that you understand rate limiting algorithms, explore related patterns: Load Balancing for distributing traffic, Circuit Breakers for handling downstream failures, or dive into Distributed Systems fundamentals.