
Rate Limiting Algorithms

Visual comparison of Token Bucket vs Sliding Window — understand burst handling, accuracy trade-offs, and when to use each algorithm.

[Interactive visualization: Token Bucket (capacity 5, +1 token/sec) absorbing a burst of 4 requests, side by side with a Sliding Window Counter (weighted count: 10 × 0.7 + 3 = 10, smooth enforcement), plus a summary panel of when to use each algorithm.]

Why Rate Limiting?

Without rate limiting, a single user or malicious actor can overwhelm your system with traffic spikes. Rate limiting protects your infrastructure, controls costs, and ensures fair usage across all users.

  • Resource protection: Prevent database connection exhaustion, memory spikes
  • Cost control: Limit cloud spend on serverless and usage-based pricing
  • Security: Stop brute force attacks and credential stuffing

Token Bucket Algorithm

Imagine a bucket filled with tokens. Each request consumes one token. The bucket refills at a steady rate (e.g., 10 tokens/second) up to a maximum capacity.

Token-bucket-style limits allow short bursts while preventing sustained abuse; most API gateways and edge services use a variation of this algorithm.

  • Capacity: Maximum burst size (e.g., 1,000 tokens)
  • Refill rate: Tokens added per second (e.g., 100/sec)
  • Time complexity: O(1) per request
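The refill logic above can be sketched in a few lines. This is a minimal single-process illustration (the class name and parameters are illustrative, not from any particular library); note the lazy refill, which keeps each check O(1) with no background timer:

```python
import time

class TokenBucket:
    """Minimal token bucket sketch: capacity = max burst, refill_rate = tokens/sec."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity            # start full so an initial burst is allowed
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Lazily refill based on elapsed time -- no timer thread needed.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `TokenBucket(5, 1)`, five back-to-back requests succeed (the burst), then requests fail until tokens refill at one per second.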

Burst Handling

When traffic is light, tokens accumulate. When a burst arrives, the bucket can absorb it — up to capacity. This is perfect for APIs where legitimate users have bursty patterns.

If tokens run out, requests are rejected with HTTP 429 (Too Many Requests) until the bucket refills. Include a Retry-After header so clients know how long to back off.

  • Bursts allowed up to bucket capacity
  • Steady-state rate limited by refill rate
  • Common in API gateways, edge services, payment APIs
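When rejecting a request, the Retry-After value falls out of the bucket state directly: it is the time needed to refill the token deficit. A small self-contained sketch (the helper name is hypothetical):

```python
import math

def retry_after_seconds(tokens: float, refill_rate: float,
                        needed: float = 1.0) -> int:
    """Whole seconds until the bucket holds `needed` tokens again.

    tokens: current token count (may be fractional)
    refill_rate: tokens added per second
    """
    deficit = max(0.0, needed - tokens)
    # Round up: Retry-After must not invite the client back too early.
    return math.ceil(deficit / refill_rate)
```

For example, an empty bucket refilling at 0.25 tokens/sec yields `Retry-After: 4`.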

Sliding Window Counter

Fixed windows have a flaw: users can hit the limit at 2:59 PM, then again at 3:00 PM — effectively doubling their rate at the boundary.

The sliding window counter fixes this by weighting the previous window's count according to how far the current window has elapsed, approximating a true rolling window with just two counters.

  • Formula: prev_count × (1 - elapsed_ratio) + curr_count
  • Approximation: Smooths boundary spikes without full log storage
  • Memory efficient: Only 2 counters per user (previous + current)
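The formula above translates into a compact limiter. A minimal single-process sketch (class name and parameters are illustrative); the weighted count decides admission, and the two counters roll forward when a window boundary passes:

```python
import time

class SlidingWindowCounter:
    """Minimal sliding window counter sketch: limit requests per window_size seconds."""

    def __init__(self, limit: int, window_size: float):
        self.limit = limit
        self.window_size = window_size
        self.window_start = time.monotonic()
        self.prev_count = 0
        self.curr_count = 0

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.window_start
        if elapsed >= self.window_size:
            # Roll forward: current becomes previous; if more than one
            # full window passed with no traffic, previous is empty too.
            windows_passed = int(elapsed // self.window_size)
            self.prev_count = self.curr_count if windows_passed == 1 else 0
            self.curr_count = 0
            self.window_start += windows_passed * self.window_size
            elapsed = now - self.window_start
        elapsed_ratio = elapsed / self.window_size
        # prev_count * (1 - elapsed_ratio) + curr_count
        weighted = self.prev_count * (1 - elapsed_ratio) + self.curr_count
        if weighted < self.limit:
            self.curr_count += 1
            return True
        return False
```

With `SlidingWindowCounter(10, 60)`, the eleventh immediate request is rejected, and as the window slides, the previous window's weight decays smoothly instead of resetting at a boundary.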

Smooth Rate Control

Sliding window counters ensure no spike at window boundaries by blending the previous window into the current one.

At 30% into the current window, if previous window had 10 requests and current has 3, the weighted count is: 10 × 0.70 + 3 = 10. Still at the limit!

  • No boundary burst problem (unlike fixed windows)
  • Better user experience with predictable enforcement
  • Common in API rate limiters, billing systems, client SDKs

When to Use Which?

Choose based on your requirements:

  • Token Bucket: When you need burst tolerance. Use for payment APIs, user-facing features, or when legitimate traffic is bursty.
  • Sliding Window: When you need smooth, predictable rate enforcement. Use for background jobs, third-party API quotas, or billing limits.

For distributed systems, both algorithms need shared state (Redis, Memcached) to enforce limits across multiple instances.
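The shared-state idea can be sketched by keying bucket state per user in an external store. Here a plain dict stands in for Redis (an assumption for illustration only); a real deployment needs atomic read-modify-write, e.g. a Redis Lua script, so concurrent instances don't race on the same key:

```python
import time

# Dict stands in for a shared store like Redis (illustration only).
# Maps key -> (tokens, last_refill_timestamp).
store: dict = {}

def allow(key: str, capacity: float, refill_rate: float) -> bool:
    """Token bucket check against shared per-key state.

    NOTE: this read-modify-write is NOT atomic; in production, wrap it in
    a Redis Lua script or equivalent so multiple instances can't race.
    """
    now = time.monotonic()
    tokens, last = store.get(key, (capacity, now))
    tokens = min(capacity, tokens + (now - last) * refill_rate)
    allowed = tokens >= 1
    if allowed:
        tokens -= 1
    store[key] = (tokens, now)
    return allowed
```

Because state is keyed per user, every instance that shares the store enforces the same limit for `"user:1"` while leaving `"user:2"` unaffected.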

What's Next?

Now that you understand rate limiting algorithms, explore related patterns: Load Balancing for distributing traffic, Circuit Breakers for handling downstream failures, or dive into Distributed Systems fundamentals.