Redis vs In-Memory Token Bucket
Choosing between a Redis-backed and an in-memory token bucket is the first hard decision you make when you scale rate limiting past a single process. This question sits under the token bucket algorithm guide, and it turns on one variable: whether your limit must hold globally across every node, or only locally per process. Pick wrong and you either double your effective limit under load balancing, or you add a network round-trip to every request for a guarantee you never needed.
The problem in concrete numbers
Say you advertise 100 requests/second per API key and you run 5 application nodes behind a round-robin load balancer. With a purely in-memory bucket, each node refills its own 100 rps reservoir, so a single key can spend up to 500 rps across the fleet — a 5× overshoot that silently breaks the contract you sell to customers. A Redis-backed bucket keeps one authoritative reservoir, so the same key is held to 100 rps no matter which node answers. The cost: every decision becomes a Redis round-trip, typically 0.2–1 ms intra-AZ, added to request latency and to your Redis QPS bill.
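The overshoot can be reproduced without any infrastructure. The following is a minimal sketch, assuming a simulated millisecond clock and a steady 1000 rps offered load; `take` and `simulateAccepted` are illustrative names, not part of any library. One bucket models the shared Redis reservoir, five buckets model per-node in-memory state behind a round-robin balancer.

```typescript
// Deterministic simulation: a fake millisecond clock, no network, no Redis.
interface Bucket { tokens: number; ts: number }

// Refill the bucket up to `capacity`, then try to take one token.
function take(b: Bucket, nowMs: number, capacity: number, ratePerSec: number): boolean {
  b.tokens = Math.min(capacity, b.tokens + ((nowMs - b.ts) / 1000) * ratePerSec);
  b.ts = nowMs;
  if (b.tokens < 1) return false;
  b.tokens -= 1;
  return true;
}

// Offer one request per millisecond (1000 rps) and count how many are accepted.
// bucketCount = 1 models one shared bucket; bucketCount = N models one bucket
// per node behind a round-robin load balancer.
function simulateAccepted(bucketCount: number, durationMs = 3000): number {
  const capacity = 100;
  const ratePerSec = 100;
  const buckets: Bucket[] = Array.from(
    { length: bucketCount },
    () => ({ tokens: capacity, ts: 0 }),
  );
  let accepted = 0;
  for (let t = 0; t < durationMs; t++) {
    const b = buckets[t % bucketCount]!; // round-robin routing
    if (take(b, t, capacity, ratePerSec)) accepted++;
  }
  return accepted;
}

const shared = simulateAccepted(1);  // one authoritative bucket
const perNode = simulateAccepted(5); // five independent in-memory buckets
console.log({ shared, perNode, overshoot: perNode / shared });
```

Over the 3-second window the shared bucket should accept close to 400 requests (the 100-token burst plus about 100 per second), while the five independent buckets together should accept close to 2,000, which is the 5× overshoot described above.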
Decision matrix
| Criterion | In-memory bucket | Redis-backed bucket |
|---|---|---|
| Accuracy across nodes | Per-process only; N× overshoot | Global, exact within one round-trip |
| Added per-request latency | ~0 (local memory) | ~0.2–1 ms intra-AZ; more cross-AZ |
| Throughput ceiling | CPU-bound, millions/s | Redis QPS + network bound |
| Failure blast radius | None (no dependency) | Redis outage stalls every limit check |
| State on deploy/restart | Lost: buckets reset, brief over-allow | Survives app restarts; keys live until their TTL expires |
| Operational cost | Nil | Redis cluster to run, monitor, scale |
| Best fit | Single node, or coarse per-node caps | Multi-node global quotas, billing-critical limits |
Selection rules:
- Use in-memory when you run a single instance, when “per-node” is an acceptable definition of the limit, or as a fast local pre-filter that rejects obvious floods before they reach Redis.
- Use Redis when the advertised limit is global, when correctness is billing- or compliance-critical, or when autoscaling makes the node count unpredictable (a per-node cap drifts every time you scale).
- Use both (tiered): a generous in-memory bucket absorbs bursts and shields Redis, while a Redis-backed bucket (see Redis Counter Architecture) holds the authoritative global limit. This is the standard high-traffic pattern.
Step-by-step: a hybrid local + global bucket
The robust production design is a two-tier check: a local in-memory bucket that acts as a pre-filter and as the fallback when Redis is unreachable, and a global Redis bucket that is authoritative. Implement it in order.
- Define the limit once in config (capacity, refill_rate).
- Emit X-RateLimit-* and Retry-After headers.
// Two-tier token bucket: local pre-filter + authoritative Redis bucket.
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL!);

// Atomic refill-and-consume; returns [allowed, remaining]. One round-trip, no race.
const BUCKET_LUA = `
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local rate = tonumber(ARGV[2]) -- tokens per second
local now = tonumber(ARGV[3]) -- ms, supplied by the calling node
local cost = tonumber(ARGV[4])
local b = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(b[1]) or capacity
local ts = tonumber(b[2]) or now
if now < ts then now = ts end -- a caller with a lagging clock must not rewind the bucket
tokens = math.min(capacity, tokens + (now - ts) / 1000 * rate)
local allowed = 0
if tokens >= cost then tokens = tokens - cost; allowed = 1 end
redis.call('HSET', key, 'tokens', tokens, 'ts', now)
redis.call('PEXPIRE', key, math.ceil(capacity / rate * 1000) + 1000)
return { allowed, math.floor(tokens) }`;

// Per-process buckets. Entries are never evicted in this sketch.
const local = new Map<string, { tokens: number; ts: number }>();

function localAllow(key: string, cap: number, rate: number): boolean {
  const now = Date.now();
  const b = local.get(key) ?? { tokens: cap, ts: now };
  b.tokens = Math.min(cap, b.tokens + ((now - b.ts) / 1000) * rate);
  b.ts = now;
  if (b.tokens < 1) {
    local.set(key, b);
    return false; // shed before hitting Redis
  }
  b.tokens -= 1;
  local.set(key, b);
  return true;
}

export async function allow(key: string): Promise<{ ok: boolean; remaining: number }> {
  // Local pre-filter at 2x the fair per-node share: 100 rps / 5 nodes = 20 rps,
  // so 40 rps refill with an 80-token burst.
  if (!localAllow(`local:${key}`, 80, 40)) return { ok: false, remaining: 0 };
  try {
    const [ok, remaining] = (await redis.eval(
      BUCKET_LUA, 1, `rl:${key}`, 100, 100, Date.now(), 1,
    )) as [number, number];
    return { ok: ok === 1, remaining };
  } catch {
    return { ok: true, remaining: -1 }; // fail-open: Redis down, lean on local bucket
  }
}
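The header step can be sketched as a pure function over the decision that `allow` returns. This is a minimal sketch under stated assumptions: `rateLimitHeaders` is a hypothetical helper, and the `X-RateLimit-Limit` / `X-RateLimit-Remaining` names follow a common convention, not a formal standard. `Retry-After` is a standard HTTP header expressed here in whole seconds.

```typescript
// Hypothetical helper: maps a rate-limit decision to response headers.
interface Decision { ok: boolean; remaining: number }

function rateLimitHeaders(
  d: Decision,
  limit: number,      // advertised limit, e.g. 100
  ratePerSec: number, // refill rate, e.g. 100 tokens/s
  cost = 1,
): Record<string, string> {
  const headers: Record<string, string> = {
    "X-RateLimit-Limit": String(limit),
  };
  // remaining === -1 is the fail-open sentinel: the true count is unknown,
  // so the header is omitted instead of reporting a made-up number.
  if (d.remaining >= 0) headers["X-RateLimit-Remaining"] = String(d.remaining);
  if (!d.ok) {
    // Worst case the bucket is empty, so `cost` tokens need cost / rate seconds.
    // Round up to whole seconds and never advertise 0.
    headers["Retry-After"] = String(Math.max(1, Math.ceil(cost / ratePerSec)));
  }
  return headers;
}

console.log(rateLimitHeaders({ ok: false, remaining: 0 }, 100, 100));
console.log(rateLimitHeaders({ ok: true, remaining: -1 }, 100, 100));
```

Keeping the mapping separate from the framework makes it easy to reuse from whichever HTTP layer wraps `allow`, and it keeps the fail-open path from leaking `-1` to clients.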
Gotchas & edge cases
- In-memory overshoot is silent. Nothing logs the 5× breach; you only see it as downstream saturation. If you advertise a global number, a per-node bucket is the wrong tool, not a cheaper one.
- Fail-open vs fail-closed is a business decision. Failing open during a Redis outage protects availability but lets traffic through; failing closed protects the backend but returns 429s during your incident. Choose deliberately and alert on the fallback path.
- Clock skew skews refill. The script computes refill from whatever timestamp the caller passes, so sending each node's Date.now() lets clock skew between nodes perturb the refill. Prefer redis.call('TIME') inside the script for a single clock source.
- Restarts reset in-memory state. A rolling deploy briefly resets local buckets, allowing a short over-allow window. Usually fine; not fine for hard quotas.
- Hot keys concentrate load. One abusive key funnels every request to a single Redis slot. A local pre-filter is what saves Redis here.
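The single-clock suggestion above might look like the following variant of the bucket script. It is a sketch, not a drop-in replacement: the constant name `BUCKET_LUA_SERVER_TIME` is illustrative, and it assumes Redis 5 or newer, where scripts replicate by effects and may call the non-deterministic `TIME` command before writing.

```typescript
// Variant of the bucket script that reads the clock from Redis instead of ARGV,
// so every node shares one time source.
// Assumption: Redis 5+ (effects replication), so TIME may precede a write.
const BUCKET_LUA_SERVER_TIME = `
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local rate = tonumber(ARGV[2]) -- tokens per second
local cost = tonumber(ARGV[3])
local t = redis.call('TIME') -- { seconds, microseconds }, both as strings
local now = tonumber(t[1]) * 1000 + math.floor(tonumber(t[2]) / 1000) -- ms
local b = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(b[1]) or capacity
local ts = tonumber(b[2]) or now
tokens = math.min(capacity, tokens + math.max(0, now - ts) / 1000 * rate)
local allowed = 0
if tokens >= cost then tokens = tokens - cost; allowed = 1 end
redis.call('HSET', key, 'tokens', tokens, 'ts', now)
redis.call('PEXPIRE', key, math.ceil(capacity / rate * 1000) + 1000)
return { allowed, math.floor(tokens) }`;

// The call drops the timestamp argument, so ARGV is (capacity, rate, cost):
//   redis.eval(BUCKET_LUA_SERVER_TIME, 1, "rl:" + key, 100, 100, 1)
```

The trade-off is that the argument order changes, so callers and script must be deployed together.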
Verification & testing
Drive concurrent load from multiple workers and confirm the aggregate accepted rate matches the limit, not a multiple of it.
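# 5 parallel workers, same key, 3 s. With Redis, expect roughly 100 accepted (200)
# responses per second in total, plus the initial 100-token burst; a pure in-memory
# bucket on 5 nodes accepts about 5x that.
# Requests/sec counts every response, so read the accepted rate from the [200] lines.
seq 5 | xargs -P5 -I{} sh -c \
  'hey -z 3s -c 20 -H "X-API-Key: acct_42" https://api.example.com/v1/search' \
  | grep -E "Requests/sec|\[[0-9]+\]"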
Watch rate_limit_redis_latency_seconds and the accepted-vs-rejected ratio while the test runs; see Prometheus metrics for rate limiting for the instrumentation.
Frequently Asked Questions
Is the Redis round-trip latency worth it?
For a global limit, yes — 0.2–1 ms intra-AZ is negligible next to typical API latency, and it is the only way to hold one number across many nodes. If your limit is genuinely per-node, the round-trip buys you nothing and an in-memory bucket is correct.
Can I avoid Redis and still get a global limit?
Only approximately. Gossip or CRDT-based counters spread state across nodes without a central store, but they trade exactness for availability and are far more complex to operate. For most teams a single Redis (with replicas) is simpler and accurate enough.
How big should the local pre-filter bucket be?
Larger than the fair per-node share so it never rejects legitimate traffic that Redis would allow, but small enough to shed obvious floods. Roughly 1.5–2× the per-node share (global limit ÷ node count) is a sensible default.
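As a worked example of that sizing rule, here is a small sketch. `localPrefilterConfig` and its parameters are hypothetical names, and the `burstSeconds` knob is an assumption about how to derive capacity from the refill rate; it is chosen so the defaults reproduce the 80-token, 40 rps local bucket used in the two-tier code.

```typescript
// Hypothetical sizing helper for the local pre-filter bucket.
interface LocalBucketConfig { capacity: number; ratePerSec: number }

function localPrefilterConfig(
  globalRatePerSec: number,
  nodeCount: number,
  headroom = 2,     // multiple of the fair share; 1.5 to 2 is the suggested range
  burstSeconds = 2, // assumption: capacity holds this many seconds of refill
): LocalBucketConfig {
  if (nodeCount < 1) throw new RangeError("nodeCount must be at least 1");
  const fairShare = globalRatePerSec / nodeCount; // e.g. 100 / 5 = 20 rps
  const ratePerSec = fairShare * headroom;        // e.g. 20 * 2 = 40 rps
  return { capacity: Math.ceil(ratePerSec * burstSeconds), ratePerSec };
}

console.log(localPrefilterConfig(100, 5));      // { capacity: 80, ratePerSec: 40 }
console.log(localPrefilterConfig(100, 5, 1.5)); // { capacity: 60, ratePerSec: 30 }
```

Because the fair share depends on node count, an autoscaled fleet would need to recompute this whenever the count changes, or size for the smallest expected fleet.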
What happens to in-flight limits during a deploy?
In-memory buckets reset to full on restart, so a rolling deploy briefly over-allows. Redis-backed buckets persist through the deploy because state lives outside the process. For hard quotas, keep the authoritative state in Redis.
Related
- Token Bucket Implementation — the parent algorithm guide and atomic refill mechanics.
- How to Choose Between Token Bucket and Leaky Bucket — sibling decision guide.
- Redis Counter Architecture — how the authoritative store is built and scaled.
- Distributed Algorithm Sync — clock skew and cross-node consistency.