Redis vs In-Memory Token Bucket

Choosing between a Redis-backed and an in-memory token bucket is the first hard decision you make when you scale rate limiting past a single process. This question sits under the token bucket algorithm guide, and it turns on one variable: whether your limit must hold globally across every node, or only locally per process. Pick wrong and you either multiply your effective limit by the node count under load balancing, or you add a network round-trip to every request for a guarantee you never needed.

The problem in concrete numbers

Say you advertise 100 requests/second per API key and you run 5 application nodes behind a round-robin load balancer. With a purely in-memory bucket, each node refills its own 100 rps reservoir, so a single key can spend up to 500 rps across the fleet — a 5× overshoot that silently breaks the contract you sell to customers. A Redis-backed bucket keeps one authoritative reservoir, so the same key is held to 100 rps no matter which node answers. The cost: every decision becomes a Redis round-trip, typically 0.2–1 ms intra-AZ, added to request latency and to your Redis QPS bill.
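To make the overshoot tangible, here is a minimal simulation sketch. It assumes a perfectly even round-robin balancer, a flood of 1000 offered requests per second on one key, and a simulated clock; the names TokenBucket and acceptedPerSecond are illustrative, not part of any library.

```typescript
// Minimal token bucket with an injected clock so the simulation is deterministic.
class TokenBucket {
  private tokens: number;
  private last: number;
  constructor(private capacity: number, private ratePerSec: number, startMs = 0) {
    this.tokens = capacity; // buckets start full
    this.last = startMs;
  }
  tryTake(nowMs: number): boolean {
    const elapsed = Math.max(0, nowMs - this.last);
    this.tokens = Math.min(this.capacity, this.tokens + (elapsed / 1000) * this.ratePerSec);
    this.last = nowMs;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// Offer `offeredRps` requests per second for `seconds`, round-robin across `nodes`.
// shared = true models one authoritative bucket; false models one bucket per node.
function acceptedPerSecond(nodes: number, shared: boolean, offeredRps = 1000, seconds = 10): number {
  const buckets = Array.from({ length: shared ? 1 : nodes }, () => new TokenBucket(100, 100));
  const total = offeredRps * seconds;
  let accepted = 0;
  for (let i = 0; i < total; i++) {
    const nowMs = (i / offeredRps) * 1000;
    const bucket = buckets[shared ? 0 : i % nodes];
    if (bucket.tryTake(nowMs)) accepted++;
  }
  return accepted / seconds;
}

const sharedRps = acceptedPerSecond(5, true);   // roughly 110: 100 steady state plus the initial burst
const perNodeRps = acceptedPerSecond(5, false); // roughly 550: about 5x the shared figure
console.log({ sharedRps, perNodeRps });
```

Both figures sit slightly above the nominal rate because every bucket starts full; the ratio between them is what matters, and it tracks the node count.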

[Figure: In-memory versus Redis-backed token bucket across five nodes. Five nodes each holding a local 100 rps bucket sum to an effective 500 rps (5× overshoot), while a shared Redis bucket holds the global limit at 100 rps.]

Decision matrix

Criterion | In-memory bucket | Redis-backed bucket
Accuracy across nodes | Per-process only; N× overshoot | Global, exact within one round-trip
Added per-request latency | ~0 (local memory) | ~0.2–1 ms intra-AZ; more cross-AZ
Throughput ceiling | CPU-bound, millions/s | Redis QPS + network bound
Failure blast radius | None (no dependency) | Redis outage stalls every limit check
State on deploy/restart | Lost; buckets reset, brief over-allow | Survives; keys persist until their TTL expires
Operational cost | Nil | Redis cluster to run, monitor, scale
Best fit | Single node, or coarse per-node caps | Multi-node global quotas, billing-critical limits

Selection rules:

  • Use in-memory when you run a single instance, when “per-node” is an acceptable definition of the limit, or as a fast local pre-filter that rejects obvious floods before they reach Redis.
  • Use Redis when the advertised limit is global, when correctness is billing- or compliance-critical, or when autoscaling makes the node count unpredictable (a per-node cap drifts every time you scale).
  • Use both (tiered): a generous in-memory bucket absorbs bursts and shields Redis, while the Redis counter architecture holds the authoritative global limit. This is the standard high-traffic pattern.

Step-by-step: a hybrid local + global bucket

The robust production design is a two-tier check: a local bucket that acts as a pre-filter and as the fallback when Redis is unreachable, and a global Redis bucket that is authoritative. Implement it in order.

  • Define the limit once in config (capacity, refill_rate
  • Emit X-RateLimit-* and Retry-After

// Two-tier token bucket: local pre-filter + authoritative Redis bucket.
import Redis from "ioredis";
const redis = new Redis(process.env.REDIS_URL!);

// Atomic refill-and-consume; returns [allowed, remaining]. One round-trip, no race.
const BUCKET_LUA = `
local key      = KEYS[1]
local capacity = tonumber(ARGV[1])
local rate     = tonumber(ARGV[2])     -- tokens per second
local now      = tonumber(ARGV[3])     -- ms
local cost     = tonumber(ARGV[4])
local b = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(b[1]) or capacity
local ts     = tonumber(b[2]) or now
tokens = math.min(capacity, tokens + (now - ts) / 1000 * rate)
local allowed = 0
if tokens >= cost then tokens = tokens - cost; allowed = 1 end
redis.call('HSET', key, 'tokens', tokens, 'ts', now)
redis.call('PEXPIRE', key, math.ceil(capacity / rate * 1000) + 1000)
return { allowed, math.floor(tokens) }`;

const local = new Map<string, { tokens: number; ts: number }>();
function localAllow(key: string, cap: number, rate: number): boolean {
  const now = Date.now();
  const b = local.get(key) ?? { tokens: cap, ts: now };
  b.tokens = Math.min(cap, b.tokens + ((now - b.ts) / 1000) * rate);
  b.ts = now;
  if (b.tokens < 1) { local.set(key, b); return false; } // shed before hitting Redis
  b.tokens -= 1; local.set(key, b); return true;
}

export async function allow(key: string): Promise<{ ok: boolean; remaining: number }> {
  // Local pre-filter sized at 2x the fair per-node share: 100 rps / 5 nodes = 20 rps,
  // so 40 rps local refill with an 80-token burst. It consumes a local token per call.
  if (!localAllow(`local:${key}`, 80, 40)) return { ok: false, remaining: 0 };
  try {
    const [ok, remaining] = (await redis.eval(
      BUCKET_LUA, 1, `rl:${key}`, 100, 100, Date.now(), 1,
    )) as [number, number];
    return { ok: ok === 1, remaining };
  } catch {
    return { ok: true, remaining: -1 }; // fail-open: Redis down, lean on local bucket
  }
}
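
The last step in the list, emitting headers, can be sketched as a small pure function over the result that allow() returns. This is an illustration under assumptions: rateLimitHeaders and Decision are hypothetical names, X-RateLimit-Limit and X-RateLimit-Remaining follow the common convention rather than a formal standard, and Retry-After is sent as whole seconds.

```typescript
// Shape returned by allow() above; remaining === -1 marks the fail-open path.
interface Decision { ok: boolean; remaining: number }

// Hypothetical helper: turns a limiter decision into response headers.
function rateLimitHeaders(
  d: Decision,
  capacity: number,
  ratePerSec: number,
  cost = 1,
): Record<string, string> {
  const headers: Record<string, string> = { "X-RateLimit-Limit": String(capacity) };
  // On fail-open the true token count is unknown, so do not report one.
  if (d.remaining >= 0) headers["X-RateLimit-Remaining"] = String(d.remaining);
  if (!d.ok) {
    // Seconds until enough tokens refill to cover `cost`, rounded up, at least 1.
    const deficit = Math.max(0, cost - Math.max(0, d.remaining));
    headers["Retry-After"] = String(Math.max(1, Math.ceil(deficit / ratePerSec)));
  }
  return headers;
}

const rejected = rateLimitHeaders({ ok: false, remaining: 0 }, 100, 100);
const failOpen = rateLimitHeaders({ ok: true, remaining: -1 }, 100, 100);
console.log(rejected, failOpen);
```

At 100 tokens per second the real wait is a few milliseconds, so rounding up to one second is conservative; clients that retry sooner will simply be checked again.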

Gotchas & edge cases

  • In-memory overshoot is silent. Nothing logs the 5× breach; you only see it as downstream saturation. If you advertise a global number, a per-node bucket is the wrong tool, not a cheaper one.
  • Fail-open vs fail-closed is a business decision. Failing open during a Redis outage protects availability but lets traffic through; failing closed protects the backend but returns 429s during your incident. Choose deliberately and alert on the fallback path.
  • Clock skew skews refill. The Lua script above takes now from the caller, so passing each node’s Date.now() means clock skew between nodes perturbs refill: a node whose clock lags the last writer produces a negative elapsed time, which subtracts tokens. Prefer redis.call('TIME') inside the script for a single clock source.
  • Restarts reset in-memory state. A rolling deploy briefly resets local buckets, allowing a short over-allow window. Usually fine; not fine for hard quotas.
  • Hot keys concentrate load. One abusive key funnels every request to a single Redis slot. A local pre-filter is what saves Redis here.
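
The clock-skew gotcha is easy to reproduce without Redis. The sketch below mirrors the refill arithmetic of the Lua script in plain TypeScript and compares it with a clamped variant; refillUnclamped, refillClamped and the 500 ms skew are illustrative assumptions, and the clamp is a mitigation to consider alongside a single clock source, not a replacement for one.

```typescript
interface BucketState { tokens: number; ts: number }

// Refill step as written in the Lua script: elapsed time is not clamped.
function refillUnclamped(s: BucketState, nowMs: number, capacity: number, rate: number): BucketState {
  const tokens = Math.min(capacity, s.tokens + ((nowMs - s.ts) / 1000) * rate);
  return { tokens, ts: nowMs };
}

// Variant that treats a backwards clock step as zero elapsed time and never rewinds ts.
function refillClamped(s: BucketState, nowMs: number, capacity: number, rate: number): BucketState {
  const elapsed = Math.max(0, nowMs - s.ts);
  const tokens = Math.min(capacity, s.tokens + (elapsed / 1000) * rate);
  return { tokens, ts: Math.max(s.ts, nowMs) };
}

// Node A wrote the bucket at t = 10000 ms; node B's clock runs 500 ms behind.
const afterA: BucketState = { tokens: 50, ts: 10000 };
const skewedNow = 9500;

const unclamped = refillUnclamped(afterA, skewedNow, 100, 100); // tokens: 0  (50 + -0.5 s * 100/s)
const clamped = refillClamped(afterA, skewedNow, 100, 100);     // tokens: 50 (no negative refill)
console.log({ unclamped, clamped });
```

In the unclamped version a single lagging node wipes out 50 tokens that the key was entitled to, which shows up as spurious rejections rather than as an error.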

Verification & testing

Drive concurrent load from multiple workers and confirm the aggregate accepted rate matches the limit, not a multiple of it.

# 5 parallel workers, same key, 3s. Expect ~100 rps accepted in steady state with Redis
# (plus the initial full-bucket burst), ~500 rps with a pure in-memory bucket.
# Requests/sec counts every response; accepted traffic is the [200] line.
seq 5 | xargs -P5 -I{} sh -c \
  'hey -z 3s -c 20 -H "X-API-Key: acct_42" https://api.example.com/v1/search' \
  | grep -E "Requests/sec|\[[0-9]{3}\]"

Watch rate_limit_redis_latency_seconds and the accepted-vs-rejected ratio while the test runs; see Prometheus metrics for rate limiting for the instrumentation.
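
When you read the results, it helps to know the ceiling a short window allows. The following is a back-of-the-envelope sketch, assuming every bucket starts full and the load saturates it for the whole window; maxAccepted is a hypothetical helper, not part of any tool.

```typescript
// Upper bound on requests that token buckets can accept in a test window:
// each bucket starts full (capacity) and then refills at ratePerSec.
function maxAccepted(capacity: number, ratePerSec: number, seconds: number, buckets = 1): number {
  return buckets * (capacity + ratePerSec * seconds);
}

const sharedRedis = maxAccepted(100, 100, 3);  // 400 requests in 3 s with one shared bucket
const perNode = maxAccepted(100, 100, 3, 5);   // 2000 requests in 3 s with five local buckets
console.log({ sharedRedis, perNode });
```

Over a 3 s run the initial burst lifts the average above the steady-state rate, so compare the total of accepted responses against these ceilings, or run longer so the burst matters less.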

Frequently Asked Questions

Is the Redis round-trip latency worth it?

For a global limit, yes — 0.2–1 ms intra-AZ is negligible next to typical API latency, and it is the only way to hold one number across many nodes. If your limit is genuinely per-node, the round-trip buys you nothing and an in-memory bucket is correct.

Can I avoid Redis and still get a global limit?

Only approximately. Gossip or CRDT-based counters spread state across nodes without a central store, but they trade exactness for availability and are far more complex to operate. For most teams a single Redis (with replicas) is simpler and accurate enough.

How big should the local pre-filter bucket be?

Larger than the fair per-node share so it never rejects legitimate traffic that Redis would allow, but small enough to shed obvious floods. Roughly 1.5–2× the per-node share (global limit ÷ node count) is a sensible default.
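
As a worked example of that rule of thumb, the sketch below derives the local bucket parameters from the global limit. sizeLocalPrefilter is a hypothetical helper, and burstSeconds (how many seconds of local refill the bucket may hold) is an assumption that is not prescribed above.

```typescript
interface LocalBucketConfig { capacity: number; ratePerSec: number }

// headroom is the 1.5-2x multiplier over the fair per-node share;
// burstSeconds is an assumed burst allowance expressed in seconds of refill.
function sizeLocalPrefilter(
  globalRps: number,
  nodes: number,
  headroom = 2,
  burstSeconds = 2,
): LocalBucketConfig {
  if (nodes < 1) throw new RangeError("nodes must be >= 1");
  const fairShare = globalRps / nodes;      // e.g. 100 / 5 = 20 rps
  const ratePerSec = fairShare * headroom;  // e.g. 20 * 2 = 40 rps
  return { capacity: ratePerSec * burstSeconds, ratePerSec };
}

const cfg = sizeLocalPrefilter(100, 5); // { capacity: 80, ratePerSec: 40 }
console.log(cfg);
```

With the defaults this reproduces the 80-token, 40 rps local bucket used in the hybrid example. The sizing assumes the balancer spreads a key's traffic evenly; with sticky routing one node can legitimately see the whole 100 rps, and the pre-filter would need to be sized at the global limit instead.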

What happens to in-flight limits during a deploy?

In-memory buckets reset to full on restart, so a rolling deploy briefly over-allows. Redis-backed buckets persist through the deploy because state lives outside the process. For hard quotas, keep the authoritative state in Redis.