Rate limiters protect services from overload, abuse, and runaway clients. The algorithm choice matters more than people think — it determines what “N requests per second” actually means in practice. This article covers the main algorithms and their real implementations.

The five algorithms

Fixed window

Count requests per window (e.g., per minute). Reset counter each window.

  • Request 1 at 00:00:59 → count = 1
  • Request 2 at 00:01:00 → counter resets, count = 1

Pros: simple.

Cons: bursting at the window boundary. A 100-req/min limit allows nearly 200 requests in about two seconds: 100 at :59 and another 100 at :00, double the intended rate.
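
A minimal single-node sketch (class and method names are illustrative):

public class FixedWindowLimiter {
    private final int limit;          // max requests per window
    private final long windowMillis;  // window length in ms
    private long windowStart;         // start of the current window
    private int count;                // requests seen in the current window

    public FixedWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        long currentWindow = now - (now % windowMillis); // align windows to the epoch
        if (currentWindow != windowStart) {              // crossed a boundary:
            windowStart = currentWindow;                 // start a fresh window
            count = 0;                                   // and reset the counter
        }
        return ++count <= limit;
    }
}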

Sliding window log

Store each request’s timestamp. To check limit, count timestamps in the last N seconds.

Pros: accurate, no boundary problem.

Cons: memory per client (one timestamp per request); expensive at high RPS.
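
A single-node sketch of the log variant; the memory cost is visible as one stored timestamp per accepted request (names illustrative):

import java.util.ArrayDeque;
import java.util.Deque;

public class SlidingWindowLogLimiter {
    private final int limit;          // max requests in any rolling window
    private final long windowMillis;  // window length in ms
    private final Deque<Long> timestamps = new ArrayDeque<>();

    public SlidingWindowLogLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Evict timestamps that have fallen out of the rolling window.
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= now - windowMillis) {
            timestamps.pollFirst();
        }
        if (timestamps.size() >= limit) {
            return false;             // window is full
        }
        timestamps.addLast(now);      // record this request
        return true;
    }
}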

Sliding window counter

An approximation: keep counters for two adjacent fixed windows and weight the previous one by its overlap with the rolling window.

  • Keep a counter for minute N and one for minute N−1
  • At second 30 of minute N: estimated count = 0.5 × counter(N−1) + counter(N)

Pros: low memory, smooth rate.

Cons: approximate; it assumes the previous window's requests were evenly spread, so bursty traffic skews the estimate.
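
A single-node sketch of the weighted estimate (names illustrative):

public class SlidingWindowCounterLimiter {
    private final int limit;          // max requests per rolling window
    private final long windowMillis;  // fixed window length in ms
    private long currentWindowStart;  // start of the current fixed window
    private int currentCount;         // requests in the current fixed window
    private int previousCount;        // requests in the previous fixed window

    public SlidingWindowCounterLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        long windowStart = now - (now % windowMillis);
        if (windowStart != currentWindowStart) {
            // Rolled into a new window; the old count becomes "previous"
            // (or zero if more than one whole window has passed).
            previousCount = (windowStart - currentWindowStart == windowMillis) ? currentCount : 0;
            currentWindowStart = windowStart;
            currentCount = 0;
        }
        // Weight the previous window by how much of it the rolling window still covers.
        double previousWeight = 1.0 - (double) (now - windowStart) / windowMillis;
        if (previousCount * previousWeight + currentCount >= limit) {
            return false;             // estimated count has reached the limit
        }
        currentCount++;
        return true;
    }
}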

Token bucket

Virtual bucket of N tokens. Refills at rate R tokens/sec. Each request takes 1 token. Empty bucket = rate limited.

Pros: naturally handles bursts (a full bucket allows a burst of N requests). The most widely used in production.

Cons: slightly harder to implement correctly.

Leaky bucket

Requests queue; leak out at constant rate. Overflow = reject.

Pros: smooths traffic (output rate is constant).

Cons: adds latency for queued requests. Less common for APIs; common in traffic shaping.
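
A sketch of the queue variant, draining one request per tick at a constant rate (names and the scheduling approach are illustrative):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class LeakyBucketLimiter {
    private final BlockingQueue<Runnable> queue;

    public LeakyBucketLimiter(int capacity, long leakIntervalMillis) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        ScheduledExecutorService drain = Executors.newSingleThreadScheduledExecutor();
        // Leak one queued request per interval: the output rate is constant.
        drain.scheduleAtFixedRate(() -> {
            Runnable task = queue.poll();
            if (task != null) task.run();
        }, leakIntervalMillis, leakIntervalMillis, TimeUnit.MILLISECONDS);
    }

    // offer() is non-blocking: a full bucket means the request is rejected.
    public boolean submit(Runnable request) {
        return queue.offer(request);
    }
}

Queued requests wait for their leak tick, which is exactly the latency cost noted above.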

The practical winner — token bucket

For most API rate limits, token bucket is the standard. It handles bursts naturally (desirable for most real workloads), scales well, and is straightforward to implement in a distributed setting.

Key parameters:

  • Capacity (N) — maximum burst size
  • Refill rate (R) — sustained rate

A steady 100 requests/minute means R = 100/60 ≈ 1.67 tokens/s. A capacity of 100 allows a 100-request burst.
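
A minimal in-memory version showing those two parameters; the Redis implementation below is its distributed equivalent (names illustrative):

public class TokenBucketLimiter {
    private final double capacity;      // maximum burst size (N)
    private final double refillPerSec;  // sustained rate (R)
    private double tokens;              // current token count (may be fractional)
    private long lastRefillNanos;

    public TokenBucketLimiter(double capacity, double refillPerSec) {
        this.capacity = capacity;
        this.refillPerSec = refillPerSec;
        this.tokens = capacity;         // start full: the full burst is available
        this.lastRefillNanos = System.nanoTime();
    }

    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        double elapsedSec = (now - lastRefillNanos) / 1_000_000_000.0;
        lastRefillNanos = now;
        // Lazy refill: credit tokens for the elapsed time, capped at capacity.
        tokens = Math.min(capacity, tokens + elapsedSec * refillPerSec);
        if (tokens < 1.0) {
            return false;               // bucket empty: rate limited
        }
        tokens -= 1.0;
        return true;
    }
}

Constructing it as new TokenBucketLimiter(100, 100.0 / 60.0) gives the 100/minute policy above.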

Distributed implementation with Redis

In-memory counters work for a single node. A distributed deployment needs shared state, and Redis is the typical choice.

import java.util.List;

import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.data.redis.core.script.RedisScript;

public class RedisTokenBucket {

    private static final String LUA = """
        local key = KEYS[1]
        local capacity = tonumber(ARGV[1])
        local refill_per_sec = tonumber(ARGV[2])
        local now = tonumber(ARGV[3])
        local tokens_wanted = tonumber(ARGV[4])

        -- Load bucket state; a missing key means a full bucket.
        local state = redis.call('HMGET', key, 'tokens', 'last')
        local tokens = tonumber(state[1]) or capacity
        local last = tonumber(state[2]) or now

        -- Lazy refill: credit tokens for the elapsed time, capped at capacity.
        local elapsed = math.max(0, now - last)
        tokens = math.min(capacity, tokens + elapsed * refill_per_sec)

        local allowed = 0
        if tokens >= tokens_wanted then
            tokens = tokens - tokens_wanted
            allowed = 1
        end

        redis.call('HMSET', key, 'tokens', tokens, 'last', now)
        redis.call('EXPIRE', key, 3600)  -- let idle buckets expire

        return {allowed, tokens}
        """;

    @SuppressWarnings("rawtypes")
    private static final RedisScript<List> SCRIPT = RedisScript.of(LUA, List.class);

    private final StringRedisTemplate redisTemplate;

    public RedisTokenBucket(StringRedisTemplate redisTemplate) {
        this.redisTemplate = redisTemplate;
    }

    public boolean tryAcquire(String key, int capacity, double refillPerSec) {
        long now = System.currentTimeMillis() / 1000; // seconds; see the clock-skew note below
        List<?> result = redisTemplate.execute(
            SCRIPT,
            List.of(key),
            String.valueOf(capacity),
            String.valueOf(refillPerSec),
            String.valueOf(now),
            "1"                                       // tokens wanted per request
        );
        return result != null && Long.valueOf(1L).equals(result.get(0));
    }
}

The Lua script ensures atomicity: the read, compute, and write all happen in one round-trip, with a single Redis hash per bucket.

Keying strategies

Rate limit by what?

  • Per user / API key. Standard. ratelimit:user:u-123.
  • Per IP. For unauthenticated or public endpoints. Careful with NAT (whole offices share IPs).
  • Per endpoint. Different limits per URL.
  • Composite. ratelimit:user:u-123:endpoint:POST-orders.

Composite keys give fine control but multiply Redis load.
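
The key formats above could be centralized in a small helper (a sketch; prefixes follow the examples in the list and are illustrative):

public final class RateLimitKeys {
    private RateLimitKeys() {}

    public static String perUser(String userId) {
        return "ratelimit:user:" + userId;
    }

    public static String perIp(String ip) {
        return "ratelimit:ip:" + ip;
    }

    public static String composite(String userId, String method, String endpoint) {
        // e.g. ratelimit:user:u-123:endpoint:POST-orders
        return "ratelimit:user:" + userId + ":endpoint:" + method + "-" + endpoint;
    }
}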

Client-side behavior

Good API design tells the client what happened:

  • HTTP 429 Too Many Requests
  • X-RateLimit-Remaining header — tokens left
  • X-RateLimit-Reset — when more tokens arrive
  • Retry-After — how long to wait

Well-behaved clients respect these and back off. Poorly-behaved clients hammer regardless — that’s when the rate limiter earns its keep.
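
Wired together, the rejection path could look like this servlet filter. This is a sketch: the key format, limits, and Retry-After value are illustrative, and fully accurate Remaining/Reset headers would use the token count the Lua script already returns.

import java.io.IOException;

import jakarta.servlet.Filter;
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.ServletRequest;
import jakarta.servlet.ServletResponse;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;

public class RateLimitFilter implements Filter {
    private final RedisTokenBucket bucket;

    public RateLimitFilter(RedisTokenBucket bucket) {
        this.bucket = bucket;
    }

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;

        String key = "ratelimit:ip:" + request.getRemoteAddr();
        if (!bucket.tryAcquire(key, 100, 100.0 / 60.0)) {
            response.setStatus(429);                        // Too Many Requests
            response.setHeader("X-RateLimit-Remaining", "0");
            response.setHeader("Retry-After", "1");         // next token within ~0.6 s at 1.67 tokens/s
            return;                                         // do not invoke the handler
        }
        chain.doFilter(req, res);
    }
}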

Rate limiting at different layers

Gateway. First line of defense. Quick rejection of abuse.

Application. Fine-grained, business-aware (premium users = higher limits).

Database. Connection pool limits, statement timeouts.

Defense in depth. Each layer protects the next.

Fairness

Global rate limits are unfair — one heavy user starves others. Per-user limits are fair but allow total load to grow with user count.

Common compromise:

  • Per-user limit (fairness)
  • Global limit (capacity protection)
  • Priority tiers (paid users > free users)

Token buckets compose well for this — per-user bucket refilled by a per-tier policy.
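
A sketch of that composition, reusing the Redis bucket from earlier (tier numbers are illustrative):

public class TieredRateLimiter {
    private final RedisTokenBucket bucket;

    public TieredRateLimiter(RedisTokenBucket bucket) {
        this.bucket = bucket;
    }

    public boolean tryAcquire(String userId, boolean premium) {
        // Per-tier policy: premium users get a larger, faster-refilling bucket.
        int userCapacity = premium ? 1000 : 100;
        double userRefill = premium ? 1000.0 / 60 : 100.0 / 60;

        // Per-user bucket first (fairness), then the shared global bucket (capacity).
        // Note: the user token is spent even if the global check then rejects;
        // a production version might refund it or check the global bucket first.
        return bucket.tryAcquire("ratelimit:user:" + userId, userCapacity, userRefill)
            && bucket.tryAcquire("ratelimit:global", 10_000, 10_000.0 / 60);
    }
}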

Burst handling

Bursts aren’t always abuse. A user makes 20 fast requests loading a dashboard — fine. Same user makes 2000 requests in 10 seconds — abuse.

Token bucket handles this naturally: burst capacity absorbs legitimate spikes; sustained rate enforcement catches abuse.

Distributed rate limiting gotchas

Clock skew. Different servers have different clocks. Minor skew is fine for sliding windows; for precise limits, use a single time source, such as a logical clock or the Redis server's own clock, as sketched below.
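
One option, sticking with the Lua script from earlier: read the clock inside the script, so every application server shares the Redis server's time (a sketch; ARGV[3] would then be dropped):

-- Take the timestamp from Redis rather than the app server.
-- TIME returns {seconds, microseconds} of the server clock; because it is
-- non-deterministic, this relies on effect-based script replication
-- (the default since Redis 5).
local t = redis.call('TIME')
local now = tonumber(t[1]) + tonumber(t[2]) / 1e6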

Redis as SPOF. If Redis is down, what happens? Options:

  • Fail open (allow all) — safer for availability (see the sketch after this list)
  • Fail closed (reject all) — safer for backend
  • Fallback to local limiter per instance — approximate, OK for most cases
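
A fail-open wrapper around the earlier Redis bucket might look like this (a sketch; the exception type is Spring Data's):

import org.springframework.dao.DataAccessException;

public class FailOpenRateLimiter {
    private final RedisTokenBucket delegate;

    public FailOpenRateLimiter(RedisTokenBucket delegate) {
        this.delegate = delegate;
    }

    public boolean tryAcquire(String key, int capacity, double refillPerSec) {
        try {
            return delegate.tryAcquire(key, capacity, refillPerSec);
        } catch (DataAccessException e) {
            // Redis down or timing out: choose availability over enforcement.
            // A fallback to a per-instance in-memory bucket could go here instead.
            return true;
        }
    }
}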

Hot keys. One extremely active user hammers a single Redis key. Mitigation: shard keys by user ID across multiple Redis nodes.

Alternative: API gateway built-ins

Most gateways (Kong, nginx, Envoy, Spring Cloud Gateway) have rate limiting built in. Use them when possible — they’re battle-tested and off the hot path of your service.

Custom implementation only when business logic is involved (per-tier limits, dynamic adjustments, bypass rules).

Closing note

Rate limiting looks simple — “count requests per second” — and reveals depth fast. Token bucket with Redis-backed distributed state handles most use cases cleanly. Respect for clients via informative headers turns rate limits from a hostile wall into a predictable contract. Defense in depth across layers catches what slips through any single one. Get those three things right and rate limits become quietly useful infrastructure rather than a source of support tickets.