Redis Cluster vs CRDT Rate Limiting

Q: How much does a CRDT counter over-admit during a partition?

In the worst case, up to N times the limit, where N is the number of mutually isolated partitions, for the duration of the split. You bound it by shortening the gossip interval, reserving headroom by enforcing at 80 to 90 percent of the true limit, or capping each node's slot at limit divided by node count.

Q: Does Redis Cluster mode give a global limit out of the box?

Yes, per key. Each key lives on one slot owned by one primary, so increments for that key are serialized and exact. The catch is that a multi-key Lua script for one limit must hash to a single slot via a hash tag, and a partition isolating that primary stalls or fails the affected checks.

Once a rate limit must hold across many nodes, you face one architectural fork: keep a single authoritative counter in Redis (Cluster mode), or let every node hold a partial counter and reconcile them with a CRDT or gossip protocol. This decision belongs to the Distributed Algorithm Sync parent topic, and it is fundamentally the CAP-theorem choice applied to a counter: under a network partition you can keep the count exact (and stall, sacrificing availability) or keep serving (and drift, sacrificing consistency). You cannot have both.

The problem in concrete numbers

Suppose you enforce 10,000 requests/minute per tenant across 8 gateway nodes in two regions. A centralized counter in Redis Cluster mode keeps one number per tenant, so the limit is exact — but every decision is a cross-node (sometimes cross-region) round-trip of roughly 0.3–1 ms intra-region and 30–80 ms cross-region, and a partition that isolates the Redis primary stalls or fails every check routed to it. A CRDT g-counter keeps a per-node sub-count and merges asynchronously every, say, 200 ms; no request ever waits on the network, and a partition only means each side counts independently. The price is over-admission: during a 5-second partition with traffic split 4/4 across nodes, each side can independently approach the full 10,000, so a tenant briefly sees up to ~2× the advertised limit.

Decision matrix

Criterion	Redis Cluster (centralized)	CRDT / gossip counters
Consistency model	Strong per key (single slot owner)	Eventual; converges after merge
Accuracy under partition	Exact on majority side; minority stalls	Over-admits up to ~N× across isolated sides
Added per-request latency	One round-trip: ~0.3–1 ms intra-region	~0 (local read); merge is off the request path
Availability during partition	Minority side fails or fails-open	Both sides keep serving
Throughput ceiling	Redis QPS + slot hotspots	Node-local, scales with fleet
Operational complexity	Moderate: run, shard, fail over Redis	High: merge logic, anti-entropy, convergence bugs
Recovery behavior	Immediate once primary reachable	Counts merge on heal; transient overshoot clears
Best fit	Hard/billing limits, modest node count	Huge fleets, soft limits, partition-prone topologies

Selection rules:

Use Redis Cluster mode when the limit is billing- or compliance-critical, when a brief stall is more acceptable than over-admission, and when your node count and round-trip budget make one authoritative counter cheap. This is the right default for most teams.
Use CRDT/gossip counters when availability outranks exactness (soft abuse limits, fairness shaping), when the fleet is large enough or partition-prone enough that a central store becomes a bottleneck or a liability, and when you can tolerate — and bound — transient overshoot.
Hybrid: run a generous CRDT/local layer that absorbs bursts and a periodic reconciliation against an authoritative counter in the Redis counter architecture for the hard ceiling. You get availability in the common case and a backstop on the number that matters.

Step-by-step: a CRDT g-counter for distributed rate limiting

A grow-only counter (g-counter) is the simplest CRDT that fits rate limiting within a fixed window: each node owns one slot in a vector, increments only its own slot, and the merged value is the sum of per-node maxima. Increments commute and are idempotent under merge, so order and duplication do not corrupt the count.

Assign every node a stable id (slot index) that survives restarts.
Keep a per-window vector {nodeId: localCount} Keep a per-window vector `{nodeId: localCount}`; reset it on window roll using a window-aligned key.
Increment only the local slot on each admitted request; never write another node's slot.
Gossip the vector to a few random peers on a fixed interval (e.g. 200 ms) and merge inbound vectors.
Compute the effective count as the sum over slots; deny when it reaches the limit.
Bound overshoot by lowering the gossip interval or reserving headroom (enforce at, say, 90% of the true limit).

import time, threading

class GCounter:
    """Grow-only CRDT counter keyed by node id, per fixed window."""
    def __init__(self, node_id: str, limit: int, window_s: int):
        self.node_id = node_id
        self.limit = limit
        self.window_s = window_s
        self._lock = threading.Lock()
        self._slots: dict[str, int] = {}     # nodeId -> count, this window
        self._window = self._current_window()

    def _current_window(self) -> int:
        return int(time.time()) // self.window_s

    def _roll_if_needed(self) -> None:
        w = self._current_window()
        if w != self._window:           # new window: g-counter is reset wholesale
            self._slots = {}
            self._window = w

    def value(self) -> int:
        # Effective count = sum of every node's local sub-count.
        return sum(self._slots.values())

    def try_acquire(self) -> bool:
        with self._lock:
            self._roll_if_needed()
            if self.value() >= self.limit:
                return False            # already at the global ceiling
            self._slots[self.node_id] = self._slots.get(self.node_id, 0) + 1
            return True                 # increment only our own slot

    def merge(self, other_slots: dict[str, int], other_window: int) -> None:
        # Idempotent, commutative merge: take the per-node maximum.
        with self._lock:
            self._roll_if_needed()
            if other_window != self._window:
                return                  # ignore stale-window gossip
            for nid, cnt in other_slots.items():
                self._slots[nid] = max(self._slots.get(nid, 0), cnt)

    def snapshot(self) -> tuple[dict[str, int], int]:
        with self._lock:
            return dict(self._slots), self._window

The merge takes the maximum of each node’s slot, not the sum, because two gossip messages may both carry the same node’s progress — taking the max makes re-delivery harmless. The effective count is then the sum across slots. This is why the counter over-admits during a partition: each isolated side advances its own slots freely, and only on heal does the merged sum reveal the true total.

Gotchas & edge cases

Window rolls must be deterministic. Both sides must agree on window boundaries or the merge silently drops counts. Anchor windows to a shared clock and watch clock skew — a node whose clock is fast rolls early and undercounts.
Over-admission is unbounded by node count, not by time. With 8 nodes fully partitioned into singletons, the worst case approaches 8× the limit for the partition’s duration. Reserve headroom or cap the per-node slot at limit / nodeCount if a hard ceiling matters.
g-counters only grow — they fit fixed windows, not token buckets. A refilling bucket needs a PN-counter or a different CRDT; do not bolt decrement onto a g-counter.
Redis Cluster keys must hash to one slot. A multi-key Lua script for one limit must use a hash tag (e.g. rl:{tenant42}:min) or Redis rejects the cross-slot operation. This is the most common Redis Cluster mode rate-limiting bug.
Fail-open vs fail-closed is still a decision on the Redis path. A partition that isolates the primary forces the question; decide deliberately and alert on the fallback.

Verification & testing

Partition the network mid-load and confirm each design behaves as the model predicts: Redis stalls the minority side and stays exact; the CRDT keeps serving and over-admits, then converges on heal.

# Drive 8 workers at the same tenant key for 30s, severing the link at t=10s.
seq 8 | xargs -P8 -I{} sh -c \
  'hey -z 30s -c 25 -H "X-Tenant: t42" https://gw-{}.internal/v1/run' \
  | grep -E "Requests/sec|Status code distribution"
# In another shell, drop traffic between the two AZ subnets for 5s:
sudo tc qdisc add dev eth0 root netem loss 100% ; sleep 5 ; sudo tc qdisc del dev eth0 root

Track accepted-vs-rejected totals and, for the CRDT, the peak effective count during the split versus after convergence; see Prometheus metrics for rate limiting for the instrumentation.

Frequently Asked Questions

Is CRDT-based rate limiting ever worth the complexity?

Only when availability genuinely outranks exactness and the fleet is large or partition-prone enough that a central counter is a bottleneck or single point of failure. For most teams a single Redis (Cluster mode with replicas) is simpler, accurate, and fast enough; reach for CRDTs for soft limits at very large scale.

How much does a CRDT counter over-admit during a partition?

In the worst case, up to N× the limit, where N is the number of mutually isolated partitions, for the duration of the split. You bound it by shortening the gossip interval, reserving headroom (enforce at 80–90% of the true limit), or capping each node's slot at limit / nodeCount.

Does Redis Cluster mode give a global limit out of the box?

Yes, per key — each key lives on one slot owned by one primary, so increments for that key are serialized and exact. The catch is that a multi-key Lua script for one limit must hash to a single slot via a hash tag, and a partition isolating that primary stalls or fails the affected checks.

Why does the g-counter merge use max instead of sum?

Because gossip can deliver the same node's progress more than once. Taking the per-node maximum makes merges idempotent and commutative, so re-delivery and reordering never double-count. The global value is then the sum across nodes' slots.

Distributed Algorithm Sync — the parent topic on cross-node state coordination, clock skew, and partition tolerance.
Redis Counter Architecture — building and scaling the authoritative centralized store.
Redis vs In-Memory Token Bucket — the simpler per-node-versus-global version of this same tradeoff.
Algorithm Tradeoff Analysis — where distributed complexity sits among the other selection axes.