Per-Tier Quota Enforcement With Redis

This guide builds the runnable path from an incoming API key to an enforced per-tier decision in Redis: resolve the key to a tier, load that tier’s limit, consume an atomic token bucket keyed by (account, tier), check the monthly quota, and emit the headers a client needs. It is the implementation companion to Tiered Access & Quota Enforcement, narrowed to one concrete build you can paste and run. The whole decision is a single Redis round-trip via a token bucket Lua script so it stays correct across stateless nodes.

The problem in concrete numbers

Suppose three plans share one fleet of 6 nodes behind a load balancer: free = 10 rps / 50k per month, pro = 100 rps / 5M per month, enterprise = 1,000 rps / uncapped. A pro key sending a 250-request burst should be allowed (its capacity is 300 tokens) but a free key sending the same burst should be cut at ~20. If the limiter resolved every key to one global bucket, or kept per-node buckets, a single pro key spread over 6 nodes could spend 600 rps — 6× its contract. Keying the bucket by (account, tier) in Redis and consuming atomically holds each account to exactly its tier’s number regardless of which node answers.
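To make the arithmetic concrete, here is a minimal sketch of the numbers above. The helper names `burst_admitted` and `sustained_rps` are hypothetical, and the burst is treated as instantaneous, so refill during the burst is ignored.

```python
# Illustrative arithmetic only; helper names are hypothetical.
TIERS = {"free": {"rate": 10, "burst": 20}, "pro": {"rate": 100, "burst": 300}}
NODES = 6

def burst_admitted(tier: str, requests: int) -> int:
    """Requests admitted from an instantaneous burst against a full bucket."""
    return min(requests, TIERS[tier]["burst"])

def sustained_rps(tier: str, shared: bool) -> int:
    """Steady-state rps one account can reach across the whole fleet."""
    rate = TIERS[tier]["rate"]
    return rate if shared else rate * NODES

print(burst_admitted("pro", 250))          # 250: fits inside the 300-token capacity
print(burst_admitted("free", 250))         # 20: cut at the free burst capacity
print(sustained_rps("pro", shared=False))  # 600: per-node buckets, 6x the contract
print(sustained_rps("pro", shared=True))   # 100: one shared bucket in Redis
```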

Decision table: how to key and enforce

Choice         | Option A                 | Option B             | Use
---------------|--------------------------|----------------------|-----------------------------------------------------------
Bucket key     | per API key              | per (account, tier)  | (account, tier) so all of an account’s keys share one limit
Atomicity      | INCR + EXPIRE (two calls)| single Lua script    | Lua: no read-modify-write race, one round-trip
Rate algorithm | fixed window             | token bucket         | token bucket: smooth bursts up to burst
Quota counter  | sliding log              | INCR with month TTL  | INCR for the gate; sliding log only if it bills (see below)
Unknown tier   | allow                    | clamp to smallest    | clamp to free: never grant the largest reservoir by accident
Redis down     | fail-closed              | fail-open on rate    | fail-open rate, fail-closed quota
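One possible reading of the last row, offered as a sketch rather than a prescription: when the Redis call itself fails, an uncapped tier is admitted because only the rate check is lost, while a tier with a monthly cap is rejected because its quota cannot be verified. The names `decide_on_outage` and `enforce_with_fallback`, and the `UNAVAILABLE` decision, are assumptions made for illustration.

```python
from typing import Callable, Tuple, Type

def decide_on_outage(policy: dict) -> str:
    """Decision to use when the Redis call fails (assumed policy, see lead-in)."""
    if policy["quota"] < 0:
        return "OK"            # uncapped tier: only the rate check is lost, fail open
    return "UNAVAILABLE"       # capped tier: quota unknown, fail closed (e.g. 503)

def enforce_with_fallback(run_script: Callable[[], str], policy: dict,
                          outage_errors: Tuple[Type[Exception], ...]) -> str:
    """Run the Redis call; fall back to the outage policy on the given errors."""
    try:
        return run_script()
    except outage_errors:
        return decide_on_outage(policy)
```

With redis-py, `outage_errors` would typically be `(redis.exceptions.ConnectionError, redis.exceptions.TimeoutError)`; keeping it a parameter lets the fallback be unit-tested without a Redis server.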

Step-by-step implementation

Build it in this order; each step is independently testable.

  • Load tier policies (rate, burst, quota)
  • Resolve the incoming API key to {account, tier}
  • Compute the bucket key (account, tier) and the quota key (account, billing_month)
  • Run one atomic Lua script that refills and consumes the token bucket, then checks the monthly quota
  • Map the returned decision to 200 / 429 / 402

1–3. Resolve the key and compute keys

# Python (redis-py). Resolve key -> tier, cache briefly, derive Redis keys.
import time, datetime, redis
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

POLICIES = {
    "free":       {"rate": 10,   "burst": 20,   "quota": 50_000},
    "pro":        {"rate": 100,  "burst": 300,  "quota": 5_000_000},
    "enterprise": {"rate": 1000, "burst": 2000, "quota": -1},   # -1 = uncapped
}

_cache: dict[str, tuple[str, str, float]] = {}   # api_key -> (account, tier, exp)

def resolve_tier(api_key: str) -> tuple[str, str]:
    hit = _cache.get(api_key)
    if hit and hit[2] > time.time():
        return hit[0], hit[1]
    account, tier = lookup_account(api_key)        # your DB / auth-service call
    _cache[api_key] = (account, tier, time.time() + 30)
    return account, tier

def billing_keys(account: str, tier: str) -> tuple[str, str, int]:
    # Read the clock once so the month label and the TTL cannot straddle a rollover.
    now = datetime.datetime.now(datetime.timezone.utc)
    month = now.strftime("%Y-%m")
    nxt = datetime.datetime(now.year + (now.month == 12),
                            (now.month % 12) + 1, 1, tzinfo=datetime.timezone.utc)
    # Seconds to month reset; at least 1 because EXPIRE with 0 deletes the key.
    ttl = max(1, int((nxt - now).total_seconds()))
    return f"rl:rate:{account}:{tier}", f"rl:quota:{account}:{month}", ttl
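The month-rollover arithmetic is easiest to check when the clock is passed in rather than read inside the function. The sketch below isolates it in a hypothetical helper, `seconds_to_month_reset`, which is not part of the code above and assumes a timezone-aware UTC datetime.

```python
import datetime

UTC = datetime.timezone.utc

def seconds_to_month_reset(now: datetime.datetime) -> int:
    """Seconds from `now` (aware, UTC) to 00:00 UTC on the 1st of next month."""
    # (now.month == 12) is a bool; adding it bumps the year only in December.
    nxt = datetime.datetime(now.year + (now.month == 12),
                            (now.month % 12) + 1, 1, tzinfo=UTC)
    return int((nxt - now).total_seconds())

# December rolls over into January of the next year.
print(seconds_to_month_reset(datetime.datetime(2024, 12, 31, 23, 59, 0, tzinfo=UTC)))  # 60
# 2024 is a leap year, so Feb 28 00:00 still has two full days left.
print(seconds_to_month_reset(datetime.datetime(2024, 2, 28, 0, 0, 0, tzinfo=UTC)))     # 172800
```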

4. Atomic Lua: token bucket + quota in one round-trip

-- KEYS[1]=rate bucket  KEYS[2]=quota counter
-- ARGV: cap, rate(tok/s), quota(-1=uncapped), now_ms, quota_ttl_s
-- Returns: {decision, rate_remaining, quota_remaining}
local cap   = tonumber(ARGV[1])
local rate  = tonumber(ARGV[2])
local quota = tonumber(ARGV[3])
local now   = tonumber(ARGV[4])
local qttl  = tonumber(ARGV[5])

local b = redis.call('HMGET', KEYS[1], 'tokens', 'ts')
local tokens = tonumber(b[1]) or cap
local ts     = tonumber(b[2]) or now
-- refill since last touch, capped at burst capacity
tokens = math.min(cap, tokens + (now - ts) / 1000 * rate)

if tokens < 1 then                                   -- rate exceeded
  redis.call('HSET', KEYS[1], 'tokens', tokens, 'ts', now)
  redis.call('PEXPIRE', KEYS[1], math.ceil(cap / rate * 1000) + 1000)
  return { 'RATE', math.floor(tokens), -1 }
end

local q_remaining = -1
if quota >= 0 then
  local used = redis.call('INCR', KEYS[2])
  if used == 1 then redis.call('EXPIRE', KEYS[2], qttl) end
  if used > quota then                               -- quota exhausted
    return { 'QUOTA', math.floor(tokens), 0 }
  end
  q_remaining = quota - used
end

tokens = tokens - 1                                  -- consume one token
redis.call('HSET', KEYS[1], 'tokens', tokens, 'ts', now)
redis.call('PEXPIRE', KEYS[1], math.ceil(cap / rate * 1000) + 1000)
return { 'OK', math.floor(tokens), q_remaining }
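For unit tests that should not need a Redis server, the same branch order can be modeled in plain Python. This is a sketch: the `model` function and its `state` dict (standing in for the two Redis keys) are hypothetical, and key expiry is not modeled.

```python
import math

def model(state: dict, cap: float, rate: float, quota: int, now_ms: int):
    """Pure-Python model of the script's branches; `state` stands in for Redis."""
    tokens = state.get("tokens", cap)          # a missing bucket starts full
    ts = state.get("ts", now_ms)
    tokens = min(cap, tokens + (now_ms - ts) / 1000 * rate)   # refill, capped
    if tokens < 1:                             # rate exceeded: quota untouched
        state.update(tokens=tokens, ts=now_ms)
        return ("RATE", math.floor(tokens), -1)
    q_remaining = -1
    if quota >= 0:
        state["used"] = state.get("used", 0) + 1              # mirrors INCR
        if state["used"] > quota:
            return ("QUOTA", math.floor(tokens), 0)
        q_remaining = quota - state["used"]
    tokens -= 1                                # consume one token
    state.update(tokens=tokens, ts=now_ms)
    return ("OK", math.floor(tokens), q_remaining)

s = {}
burst = [model(s, cap=20, rate=10, quota=50_000, now_ms=0)[0] for _ in range(30)]
print(burst.count("OK"), burst.count("RATE"))      # 20 10
print(model(s, 20, 10, 50_000, now_ms=500))        # ('OK', 4, 49979)
```

The second call shows the refill: 500 ms at 10 tokens per second restores 5 tokens, one is consumed, and the rejected requests in between did not touch the quota counter.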

5. Call the script and map the decision to headers

ENFORCE = r.register_script(LUA_SOURCE)   # LUA_SOURCE = the script above

def enforce(api_key: str) -> dict:
    account, tier = resolve_tier(api_key)
    p = POLICIES.get(tier, POLICIES["free"])         # unknown tier -> smallest
    rate_key, quota_key, qttl = billing_keys(account, tier)
    decision, rate_rem, quota_rem = ENFORCE(
        keys=[rate_key, quota_key],
        args=[p["burst"], p["rate"], p["quota"], int(time.time() * 1000), qttl],
    )
    return {"decision": decision, "tier": tier,
            "rate_remaining": rate_rem, "quota_remaining": quota_rem,
            "quota_reset": qttl, "limit": p["rate"]}

# Framework layer (FastAPI-style) mapping the decision to status + headers.
from fastapi import Response, HTTPException

def apply(res: Response, d: dict):
    headers = {"RateLimit-Limit": str(d["limit"]),
               "RateLimit-Remaining": str(max(0, d["rate_remaining"]))}
    if d["quota_remaining"] >= 0:
        headers["X-Quota-Remaining"] = str(d["quota_remaining"])
        headers["X-Quota-Reset"] = str(d["quota_reset"])
    # Headers set on the injected Response are not sent when an exception is
    # raised, so the error paths hand them to HTTPException explicitly.
    if d["decision"] == "RATE":
        headers["Retry-After"] = "1"
        raise HTTPException(429, "rate_limited", headers=headers)
    if d["decision"] == "QUOTA":
        raise HTTPException(402, "quota_exceeded", headers=headers)
    for name, value in headers.items():          # success path
        res.headers[name] = value

Figure: Single atomic Lua script enforcing rate then quota. The request resolves a tier, then one Lua script refills and consumes the token bucket and increments the quota, returning OK (200), RATE (429), or QUOTA (402). The rate is checked first and the quota only if the rate passes; the keys are (account, tier) and (account, month).
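The code above hardcodes Retry-After to 1. As a sketch of where that value comes from: the wait until the bucket holds one token again is the token deficit divided by the refill rate, rounded up because Retry-After takes whole seconds. The helper `retry_after_seconds` is hypothetical; for every tier in this guide (10 rps and up) it returns 1.

```python
import math

def retry_after_seconds(rate: float, tokens: float = 0.0) -> int:
    """Whole seconds until the bucket holds one token again (never below 1)."""
    deficit = max(0.0, 1.0 - tokens)           # tokens still missing
    return max(1, math.ceil(deficit / rate))

print(retry_after_seconds(10))     # 1: the free tier refills a token every 0.1 s
print(retry_after_seconds(0.5))    # 2: a hypothetical tier refilling every 2 s
```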

Gotchas & edge cases

  • Sharing one bucket across an account’s keys. Keying by (account, tier) means all of an account’s API keys draw from one reservoir — usually what you want for billing fairness. If keys must be independent, scope per key instead; see API key scoping & rate limits.
  • Quota consumed before the response is sent. The Lua INCR charges the quota at admission, so a request that later fails with a 5xx is still counted. That is acceptable for gating but wrong for metered billing, where you need idempotency keys and reconciliation.
  • Stale tier on upgrade. A 30 s cache TTL means a freshly upgraded customer stays on the old tier for up to 30 s unless you publish an invalidation event.
  • Month boundary race. Two requests crossing midnight on the 1st can land in different YYYY-MM keys; this is harmless for gating but matters if you reconcile against billing.
  • PEXPIRE keeps idle buckets cheap. Without the TTL, every key an account ever used lingers in Redis memory.
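For the stale-tier case, one option is a cache that supports explicit invalidation alongside the TTL. The sketch below is an assumption-laden illustration: the `TierCache` class and its method names are hypothetical, and on a multi-node fleet the invalidation would still have to reach every node (for example through a pub/sub channel), which is not shown.

```python
import time
from typing import Dict, Optional, Tuple

class TierCache:
    """api_key -> (account, tier) with a TTL and explicit invalidation."""

    def __init__(self, ttl_s: float = 30.0):
        self.ttl_s = ttl_s
        self._items: Dict[str, Tuple[str, str, float]] = {}

    def get(self, api_key: str, now: Optional[float] = None) -> Optional[Tuple[str, str]]:
        now = time.time() if now is None else now
        hit = self._items.get(api_key)
        if hit and hit[2] > now:               # entry present and not expired
            return hit[0], hit[1]
        return None

    def put(self, api_key: str, account: str, tier: str,
            now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._items[api_key] = (account, tier, now + self.ttl_s)

    def invalidate_account(self, account: str) -> int:
        """Drop every cached key of an account, e.g. from an upgrade event handler."""
        stale = [k for k, v in self._items.items() if v[0] == account]
        for k in stale:
            del self._items[k]
        return len(stale)
```

After `invalidate_account("acct_1")`, the next lookup for any of that account's keys misses the cache and reads the new tier from the source of truth instead of waiting out the 30 s TTL.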

Verification & testing

# Free key: capacity 20, refill 10 rps. Fire 30 requests at once -> ~20 pass, rest 429.
# (A sequential loop can be slower than the 10 rps refill, so every request would pass.)
seq 1 30 | xargs -P30 -I{} curl -s -o /dev/null -w "%{http_code} " \
  -H "X-API-Key: free_demo" https://api.example.com/v1/ping; echo

# Aggregate test: 6 workers, one pro key, 3s. Expect roughly 600 accepted in TOTAL
# (300 burst + 100 rps x 3 s), not ~3,600, which proves the (account, tier) bucket
# is global, not per-node.
seq 6 | xargs -P6 -I{} sh -c \
  'hey -z 3s -c 20 -H "X-API-Key: pro_demo" https://api.example.com/v1/ping' \
  | grep -E "Requests/sec|Status code distribution" -A4

Watch the accepted-vs-rejected ratio and Redis call latency while the test runs; wire it up per Prometheus metrics for rate limiting.

Frequently Asked Questions

Why key the bucket by account and tier instead of by API key?

Billing and fairness are per account, not per key. If an account rotates or issues several keys, keying by (account, tier) holds them all to one reservoir. Key per API key only when each key is sold as its own independent limit.

Should the rate check or the quota check run first?

Rate first. It is cheaper and rejects floods before they touch the monthly counter, so a client hammering you in a tight loop does not burn monthly quota on requests that are rejected anyway. The Lua script above only increments the quota once the rate bucket has tokens.

How do I reset the monthly quota?

Don't reset it explicitly. Key the counter by account:YYYY-MM and set EXPIRE to seconds-until-month-end on first write. The next month uses a new key that starts at zero, and Redis expires the old one automatically, so no cron sweep is needed.

What status code should an exhausted monthly quota return?

402 Payment Required (or a 403 with a quota_exceeded reason). A spent quota is not a transient condition, so returning 429 wrongly tells clients to retry. Reserve 429 for the per-second rate limit and pair it with Retry-After.