Per-Tier Quota Enforcement With Redis

This guide builds the runnable path from an incoming API key to an enforced per-tier decision in Redis: resolve the key to a tier, load that tier’s limit, consume an atomic token bucket keyed by (account, tier), check the monthly quota, and emit the headers a client needs. It is the implementation companion to Tiered Access & Quota Enforcement, narrowed to one concrete build you can paste and run. The whole decision is a single Redis round-trip via a token bucket Lua script so it stays correct across stateless nodes.

The problem in concrete numbers

Suppose three plans share one fleet of 6 nodes behind a load balancer: free = 10 rps / 50k per month, pro = 100 rps / 5M per month, enterprise = 1,000 rps / uncapped. A pro key sending a 250-request burst should be allowed (its capacity is 300 tokens) but a free key sending the same burst should be cut at ~20. If the limiter resolved every key to one global bucket, or kept per-node buckets, a single pro key spread over 6 nodes could spend 600 rps — 6× its contract. Keying the bucket by (account, tier) in Redis and consuming atomically holds each account to exactly its tier’s number regardless of which node answers.
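
The arithmetic above can be checked in a few lines of Python. This is a sketch for illustration, not part of the limiter: the `TIERS` table copies the rate and burst figures used in this guide, and the helper names `admitted_from_burst` and `sustained_rps` are invented for the example.

```python
# (rate in requests/second, burst capacity in tokens) per tier, as used in this guide.
TIERS = {"free": (10, 20), "pro": (100, 300), "enterprise": (1000, 2000)}


def admitted_from_burst(tier: str, burst_size: int) -> int:
    """Requests admitted from an instantaneous burst against a full bucket."""
    _, capacity = TIERS[tier]
    return min(burst_size, capacity)


def sustained_rps(tier: str, nodes: int, shared_bucket: bool) -> int:
    """Long-run accepted rate with one shared bucket vs. one bucket per node."""
    rate, _ = TIERS[tier]
    return rate if shared_bucket else rate * nodes


print(admitted_from_burst("pro", 250))                 # 250: fits in 300 tokens
print(admitted_from_burst("free", 250))                # 20: cut at the free capacity
print(sustained_rps("pro", 6, shared_bucket=False))    # 600: per-node buckets
print(sustained_rps("pro", 6, shared_bucket=True))     # 100: one bucket in Redis
```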

Decision table: how to key and enforce

Choice	Option A	Option B	Use
Bucket key	per API key	per `(account, tier)`	`(account, tier)` so all of an account’s keys share one limit
Atomicity	INCR + EXPIRE (two calls)	single Lua script	Lua — no read-modify-write race, one round-trip
Rate algorithm	fixed window	token bucket	token bucket — smooth bursts up to `burst`
Quota counter	sliding log	INCR with month TTL	INCR for the gate; sliding log only if it bills (see below)
Unknown tier	allow	clamp to smallest	clamp to `free` — never grant the largest reservoir by accident
Redis down	fail-closed	fail-open on rate	fail-open rate, fail-closed quota

Step-by-step implementation

Build it in this order; each step is independently testable.

1. Load tier policies (`rate`, `burst`, `quota`) from config into an in-memory table.
2. Resolve the incoming API key to `{account, tier}` with a short-TTL cache.
3. Compute the bucket key `(account, tier)` and the quota key `(account, billing_month)`.
4. Run one atomic Lua script: refill and consume the token bucket, then `INCR` the quota.
5. Map the returned decision to `200` / `429` / `402` and set headers.
6. Verify with curl and a concurrent load test that the aggregate accepted rate matches the tier.
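
Step 1 could look roughly like the sketch below. The guide does not prescribe a config format, so the JSON layout, `EXAMPLE_CONFIG`, `load_policies`, and `policy_for` are all assumptions made for illustration; only the field names `rate`, `burst`, `quota` and the "unknown tier falls back to `free`" rule come from the text.

```python
import json

REQUIRED_FIELDS = ("rate", "burst", "quota")

# Hypothetical config file contents; -1 means an uncapped monthly quota.
EXAMPLE_CONFIG = """
{
  "free":       {"rate": 10,   "burst": 20,   "quota": 50000},
  "pro":        {"rate": 100,  "burst": 300,  "quota": 5000000},
  "enterprise": {"rate": 1000, "burst": 2000, "quota": -1}
}
"""


def load_policies(config_text: str) -> dict[str, dict[str, int]]:
    """Parse and validate tier policies into an in-memory table."""
    raw = json.loads(config_text)
    policies: dict[str, dict[str, int]] = {}
    for tier, fields in raw.items():
        missing = [name for name in REQUIRED_FIELDS if name not in fields]
        if missing:
            raise ValueError(f"tier {tier!r} is missing {missing}")
        if fields["rate"] <= 0 or fields["burst"] < 1:
            raise ValueError(f"tier {tier!r} needs rate > 0 and burst >= 1")
        policies[tier] = {name: int(fields[name]) for name in REQUIRED_FIELDS}
    if "free" not in policies:
        raise ValueError("a 'free' tier is required as the fallback")
    return policies


def policy_for(policies: dict[str, dict[str, int]], tier: str) -> dict[str, int]:
    """Unknown tiers clamp to the smallest plan instead of failing open."""
    return policies.get(tier, policies["free"])


POLICIES = load_policies(EXAMPLE_CONFIG)
print(policy_for(POLICIES, "pro")["burst"])        # 300
print(policy_for(POLICIES, "platinum")["rate"])    # 10, clamped to free
```

Validating at load time keeps a malformed tier (for example a zero rate, which would divide by zero in a TTL calculation) from reaching the request path.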

1–3. Resolve the key and compute keys

# Python (redis-py). Resolve key -> tier, cache briefly, derive Redis keys.
import time, datetime, redis
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

POLICIES = {
    "free":       {"rate": 10,   "burst": 20,   "quota": 50_000},
    "pro":        {"rate": 100,  "burst": 300,  "quota": 5_000_000},
    "enterprise": {"rate": 1000, "burst": 2000, "quota": -1},   # -1 = uncapped
}

_cache: dict[str, tuple[str, str, float]] = {}   # api_key -> (account, tier, exp)

def resolve_tier(api_key: str) -> tuple[str, str]:
    hit = _cache.get(api_key)
    if hit and hit[2] > time.time():
        return hit[0], hit[1]
    account, tier = lookup_account(api_key)        # your DB / auth-service call
    _cache[api_key] = (account, tier, time.time() + 30)
    return account, tier

def billing_keys(account: str, tier: str) -> tuple[str, str, int]:
    # Read the clock once so the month label and the TTL agree at a boundary.
    now = datetime.datetime.now(datetime.timezone.utc)
    month = now.strftime("%Y-%m")
    nxt = datetime.datetime(now.year + (now.month == 12),
                            (now.month % 12) + 1, 1,
                            tzinfo=datetime.timezone.utc)
    # Seconds to month reset; floor at 1 because EXPIRE with 0 deletes the key.
    ttl = max(1, int((nxt - now).total_seconds()))
    return f"rl:rate:{account}:{tier}", f"rl:quota:{account}:{month}", ttl
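
The month arithmetic is easiest to check when the clock is passed in as an argument. The sketch below is a test-friendly variant, not the guide's function: `quota_key_and_ttl` and the account id `acct_42` are hypothetical names, and it assumes billing months roll over at midnight UTC.

```python
from datetime import datetime, timezone


def quota_key_and_ttl(account: str, now: datetime) -> tuple[str, int]:
    """Return the monthly quota key and the seconds left until the month resets."""
    month = now.strftime("%Y-%m")
    # December rolls over to January of the following year.
    next_month = datetime(now.year + (now.month == 12), now.month % 12 + 1, 1,
                          tzinfo=timezone.utc)
    ttl = max(1, int((next_month - now).total_seconds()))
    return f"rl:quota:{account}:{month}", ttl


# Thirty seconds before the year ends: the key is still December's.
print(quota_key_and_ttl("acct_42", datetime(2024, 12, 31, 23, 59, 30, tzinfo=timezone.utc)))
# ('rl:quota:acct_42:2024-12', 30)

# Start of a leap-year February: 29 days * 86400 seconds.
print(quota_key_and_ttl("acct_42", datetime(2024, 2, 1, tzinfo=timezone.utc)))
# ('rl:quota:acct_42:2024-02', 2505600)
```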

4. Atomic Lua: token bucket + quota in one round-trip

-- KEYS[1]=rate bucket  KEYS[2]=quota counter
-- ARGV: cap, rate(tok/s), quota(-1=uncapped), now_ms, quota_ttl_s
-- Returns: {decision, rate_remaining, quota_remaining}
local cap   = tonumber(ARGV[1])
local rate  = tonumber(ARGV[2])
local quota = tonumber(ARGV[3])
local now   = tonumber(ARGV[4])
local qttl  = tonumber(ARGV[5])

local b = redis.call('HMGET', KEYS[1], 'tokens', 'ts')
local tokens = tonumber(b[1]) or cap
local ts     = tonumber(b[2]) or now
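-- refill since last touch, capped at burst capacity; elapsed time is clamped
-- at 0 so a node whose clock lags the stored ts cannot drain tokens
tokens = math.min(cap, tokens + math.max(0, now - ts) / 1000 * rate)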

if tokens < 1 then                                   -- rate exceeded
  redis.call('HSET', KEYS[1], 'tokens', tokens, 'ts', now)
  redis.call('PEXPIRE', KEYS[1], math.ceil(cap / rate * 1000) + 1000)
  return { 'RATE', math.floor(tokens), -1 }
end

local q_remaining = -1
if quota >= 0 then
  local used = redis.call('INCR', KEYS[2])
  if used == 1 then redis.call('EXPIRE', KEYS[2], qttl) end
  if used > quota then                               -- quota exhausted
    return { 'QUOTA', math.floor(tokens), 0 }
  end
  q_remaining = quota - used
end

tokens = tokens - 1                                  -- consume one token
redis.call('HSET', KEYS[1], 'tokens', tokens, 'ts', now)
redis.call('PEXPIRE', KEYS[1], math.ceil(cap / rate * 1000) + 1000)
return { 'OK', math.floor(tokens), q_remaining }
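
For unit tests that should not need a Redis server, the script's decision logic can be mirrored in plain Python. This is a sketch, not a replacement for the script: `ModelStore` and `decide` are hypothetical names, key expiry is left out, and it assumes timestamps never go backwards.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ModelStore:
    """In-memory stand-in for the two Redis keys the script touches."""
    buckets: dict = field(default_factory=dict)   # rate key -> (tokens, ts_ms)
    quotas: dict = field(default_factory=dict)    # quota key -> requests counted


def decide(store: ModelStore, rate_key: str, quota_key: str,
           cap: int, rate: float, quota: int, now_ms: int) -> tuple[str, int, int]:
    """Return (decision, rate_remaining, quota_remaining) like the Lua script."""
    tokens, ts = store.buckets.get(rate_key, (float(cap), now_ms))
    # Refill for the elapsed time, capped at the burst capacity.
    tokens = min(cap, tokens + (now_ms - ts) / 1000 * rate)

    if tokens < 1:                                # rate exceeded, quota untouched
        store.buckets[rate_key] = (tokens, now_ms)
        return ("RATE", math.floor(tokens), -1)

    quota_remaining = -1
    if quota >= 0:                                # -1 means uncapped
        used = store.quotas.get(quota_key, 0) + 1
        store.quotas[quota_key] = used
        if used > quota:                          # quota exhausted, no token spent
            return ("QUOTA", math.floor(tokens), 0)
        quota_remaining = quota - used

    tokens -= 1                                   # consume one token
    store.buckets[rate_key] = (tokens, now_ms)
    return ("OK", math.floor(tokens), quota_remaining)


store = ModelStore()
free = dict(cap=20, rate=10, quota=50_000)
burst = [decide(store, "rate:a", "quota:a", now_ms=0, **free)[0] for _ in range(30)]
print(burst.count("OK"), burst.count("RATE"))     # 20 10
print(decide(store, "rate:a", "quota:a", now_ms=1000, **free))   # ('OK', 9, 49979)
```

Because the rate check returns before the quota counter is touched, the ten rejected requests in the burst cost no monthly quota: the counter stands at 20 after the burst and 21 after the request one second later.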

5. Call the script and map the decision to headers

ENFORCE = r.register_script(LUA_SOURCE)   # LUA_SOURCE = the script above

def enforce(api_key: str) -> dict:
    account, tier = resolve_tier(api_key)
    p = POLICIES.get(tier, POLICIES["free"])         # unknown tier -> smallest
    rate_key, quota_key, qttl = billing_keys(account, tier)
    decision, rate_rem, quota_rem = ENFORCE(
        keys=[rate_key, quota_key],
        args=[p["burst"], p["rate"], p["quota"], int(time.time() * 1000), qttl],
    )
    return {"decision": decision, "tier": tier,
            "rate_remaining": rate_rem, "quota_remaining": quota_rem,
            "quota_reset": qttl, "limit": p["rate"]}

# Framework layer (FastAPI-style) mapping the decision to status + headers.
from fastapi import Response, HTTPException

def apply(res: Response, d: dict) -> None:
    headers = {
        "RateLimit-Limit": str(d["limit"]),
        "RateLimit-Remaining": str(max(0, d["rate_remaining"])),
    }
    if d["quota_remaining"] >= 0:
        headers["X-Quota-Remaining"] = str(d["quota_remaining"])
        headers["X-Quota-Reset"] = str(d["quota_reset"])
    # Headers set on `res` are discarded when an HTTPException is raised,
    # so rejections carry them on the exception itself.
    if d["decision"] == "RATE":
        headers["Retry-After"] = "1"
        raise HTTPException(429, "rate_limited", headers=headers)
    if d["decision"] == "QUOTA":
        raise HTTPException(402, "quota_exceeded", headers=headers)
    for name, value in headers.items():              # allowed: decorate the response
        res.headers[name] = value
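
The fixed `Retry-After: 1` is safe for the tiers in this guide: every rate is at least 1 token per second, so a missing token always arrives within a second. If a tier ever refills more slowly than that, the wait could be derived from the token deficit instead. The helper below is a sketch under that assumption; `retry_after_seconds` is a hypothetical name, and it needs the fractional token count, which the script above floors before returning.

```python
import math


def retry_after_seconds(tokens: float, rate: float) -> int:
    """Whole seconds until the bucket holds one full token again (minimum 1)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    deficit = max(0.0, 1.0 - tokens)              # fraction of a token still missing
    return max(1, math.ceil(deficit / rate))


print(retry_after_seconds(0.0, 10))     # 1: free tier, next token in 0.1 s
print(retry_after_seconds(0.5, 100))    # 1: pro tier
print(retry_after_seconds(0.0, 0.5))    # 2: a hypothetical tier at one request per 2 s
```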

Gotchas & edge cases

Sharing one bucket across an account’s keys. Keying by (account, tier) means all of an account’s API keys draw from one reservoir — usually what you want for billing fairness. If keys must be independent, scope per key instead; see API key scoping & rate limits.
Quota consumed before the response is sent. The Lua INCR charges the quota at admission, so a request that later fails with a 5xx is still counted. That is acceptable for gating but wrong for metered billing; there, use idempotency keys and reconciliation.
Stale tier on upgrade. A 30 s cache TTL means a freshly upgraded customer stays on the old tier for up to 30 s unless you publish an invalidation event.
Month boundary race. Two requests that straddle midnight UTC on the 1st land in different YYYY-MM keys; this is harmless for gating but matters if you reconcile against billing.
PEXPIRE keeps idle buckets cheap. Without the TTL, every key an account ever used lingers in Redis memory.
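
For the stale-tier case, one option is a cache that can drop an account's entries when an upgrade event arrives. The sketch below is illustrative only: `TierCache` and `invalidate_account` are hypothetical names, the cache is per process, and on a multi-node fleet the invalidation event still has to reach every node by whatever channel you use.

```python
import time
from typing import Callable


class TierCache:
    """Short-TTL cache of api_key -> (account, tier) with explicit invalidation."""

    def __init__(self, lookup: Callable[[str], tuple[str, str]],
                 ttl_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup          # your DB / auth-service call
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: dict[str, tuple[str, str, float]] = {}

    def resolve(self, api_key: str) -> tuple[str, str]:
        now = self._clock()
        hit = self._entries.get(api_key)
        if hit and hit[2] > now:
            return hit[0], hit[1]
        account, tier = self._lookup(api_key)
        self._entries[api_key] = (account, tier, now + self._ttl_s)
        return account, tier

    def invalidate_account(self, account: str) -> int:
        """Drop every cached key for an account; returns how many were removed."""
        stale = [key for key, entry in self._entries.items() if entry[0] == account]
        for key in stale:
            del self._entries[key]
        return len(stale)


# Usage with an in-memory stand-in for the account database.
db = {"key_1": ("acct_1", "free")}
cache = TierCache(lookup=lambda api_key: db[api_key])
print(cache.resolve("key_1"))              # ('acct_1', 'free')
db["key_1"] = ("acct_1", "pro")            # customer upgrades
print(cache.resolve("key_1"))              # ('acct_1', 'free'), still cached
cache.invalidate_account("acct_1")         # upgrade event handler
print(cache.resolve("key_1"))              # ('acct_1', 'pro')
```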

Verification & testing

# Free key: capacity 20, refill 10 rps. Fire 30 requests back to back -> about 20 pass, rest 429.
# A slow loop lets tokens refill between requests, so more than 20 can pass.
for i in $(seq 1 30); do
  curl -s -o /dev/null -w "%{http_code} " -H "X-API-Key: free_demo" \
    https://api.example.com/v1/ping
done; echo

# Aggregate test: 6 workers, one pro key, 3s. A shared bucket admits about 600
# requests in total (300 burst + 3 s x 100 rps); per-node buckets would admit
# roughly 6x that. This shows the (account, tier) bucket is global, not per-node.
seq 6 | xargs -P6 -I{} sh -c \
  'hey -z 3s -c 20 -H "X-API-Key: pro_demo" https://api.example.com/v1/ping' \
  | grep -E "Requests/sec|Status code distribution" -A4

Watch the accepted-vs-rejected ratio and Redis call latency while the test runs; wire it up per Prometheus metrics for rate limiting.
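
When reading the load-test output, it helps to know how many requests a correct limiter can admit in the test window. Starting from a full bucket, that ceiling is the burst capacity plus the refill over the window, so short tests show an average accepted rate above the sustained rate. The helper below is a sketch; `expected_accepted` is a hypothetical name and the figures are the tier numbers from this guide.

```python
def expected_accepted(burst: int, rate: float, seconds: float, offered: int) -> int:
    """Upper bound on admitted requests for a bucket that starts full."""
    return min(offered, int(burst + rate * seconds))


# Pro key, 3-second test, far more traffic offered than the tier allows.
print(expected_accepted(burst=300, rate=100, seconds=3, offered=10_000))   # 600

# Free key, 30 requests fired at effectively the same instant.
print(expected_accepted(burst=20, rate=10, seconds=0, offered=30))         # 20

# What six independent per-node buckets would admit in the same 3 seconds.
print(6 * expected_accepted(burst=300, rate=100, seconds=3, offered=10_000))   # 3600
```

If the accepted count in the aggregate test is close to the first figure, the bucket is shared; a count several times higher points to per-node state.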

Frequently Asked Questions

Why key the bucket by account and tier instead of by API key?

Billing and fairness are per account, not per key. If an account rotates or issues several keys, keying by (account, tier) holds them all to one reservoir. Key per API key only when each key is sold as its own independent limit.

Should the rate check or the quota check run first?

Rate first. It is cheaper and rejects floods before they touch the monthly counter, so a client hammering the API in a tight loop does not burn monthly quota on requests that were going to be rejected anyway. The Lua script above only increments the quota once the rate bucket has tokens.

How do I reset the monthly quota?

Don't reset it explicitly. Key the counter by account:YYYY-MM and set EXPIRE to seconds-until-month-end on first write. The next month uses a new key that starts at zero, and Redis evicts the old one automatically — no cron sweep.

What status code should an exhausted monthly quota return?

402 Payment Required (or a 403 with a quota_exceeded reason). A spent quota is not a transient condition, so returning 429 wrongly tells clients to retry. Reserve 429 for the per-second rate limit and pair it with Retry-After.

Tiered Access & Quota Enforcement — the parent topic covering the two-layer model and response contract.
API Key Scoping & Rate Limits — when to scope per key, scope, or route instead of per account.
Billing-Critical Sliding-Log Usage — exact counters and idempotency when usage drives invoices.
Token Bucket Implementation — the algorithm behind the rate axis.
Redis Counter Architecture — atomic counters and key-expiration patterns.