Implementing Sliding Window in Memory
1. Architectural Context: Why In-Memory Sliding Windows?
Naive fixed-window counters introduce boundary spikes where traffic doubles at window rollover, violating strict API SLAs. This guide shows how to build a single-node sliding window in process memory, the concrete implementation behind the precision advantage described in the parent Fixed Window vs Sliding Window guide. For low-latency API gateways, the operational gap between fixed counters and precise sliding logic is measured in sub-2ms P99 latency and strict heap allocation constraints.
When evaluating where this primitive fits, the in-memory sliding window emerges as the optimal compromise between distributed accuracy and local execution speed. By anchoring state to process memory, you bypass network round-trips and serialization overhead inherent in Redis-backed counters. However, this architecture demands rigorous memory management: every millisecond of GC pause or unbounded heap growth directly translates to dropped requests or cascading timeouts.
To guarantee deterministic performance, implementations must rely on monotonic clock sources rather than wall-clock time. Monotonic timers (process.hrtime.bigint() in Node.js; in Go, time.Now() embeds a monotonic reading since Go 1.9, so a t2.Sub(t1) duration stays correct across NTP steps) prevent negative deltas during clock adjustments and ensure window boundaries advance predictably under all system load conditions.
2. Core Data Structure: Bounded Circular Buffers & Sorted Timestamps
Tracking request timestamps efficiently requires a memory layout that avoids dynamic allocations during the hot path. Two primary approaches exist: dynamic sorted sets (e.g., TreeSet, BTree) and array-backed circular buffers. While sorted sets offer O(log N) lookups, they incur pointer chasing, node allocations, and unpredictable GC pressure. For high-throughput rate limiting, a pre-allocated circular buffer is strictly superior.
The diagram traces how headIndex and tailIndex move over a fixed-capacity ring: new timestamps are written at the head, expired ones are skipped by advancing the tail, and the live count is the gap between the two pointers.
Memory Layout & Pointer Arithmetic
The buffer stores raw timestamp integers (int64 for nanosecond precision). Two pointers govern traversal:
headIndex: Points to the next available slot for insertion.tailIndex: Points to the oldest valid timestamp within the active window.
Insertion is strictly O(1). When a new request arrives, the timestamp is written to data[headIndex], and headIndex advances modulo capacity. Boundary calculation avoids full array scans by lazily advancing tailIndex past expired entries. The active request count is derived as (headIndex - tailIndex + capacity) % capacity.
Garbage Collection Avoidance
Dynamic arrays trigger stop-the-world GC cycles when resized. By pre-allocating maxRequests * sizeof(int64) bytes during initialization and reusing the same memory block across the process lifecycle, you eliminate allocation churn. Timestamp precision should be capped at milliseconds for rate limiting; nanosecond granularity rarely impacts business logic but doubles storage footprint and comparison overhead.
3. Implementation Pattern: Thread-Safe Request Tracking
Concurrency control is the critical failure surface for in-memory counters. In multi-threaded or worker-threaded runtimes, unprotected writes to shared circular buffers cause double-counting or lost increments. The following implementation uses atomic pointer advancement and per-client mutex isolation to guarantee linearizability without global lock contention.
package ratelimit
import (
"sync"
"time"
)
// CircularBuffer holds pre-allocated timestamp slots
type CircularBuffer struct {
data []int64
capacity int
headIndex int
tailIndex int
count int
mu sync.Mutex
}
// InMemorySlidingWindowCounter implements thread-safe sliding window logic
type InMemorySlidingWindowCounter struct {
maxRequests int
windowMs int64
cleanupIntervalMs int64
clients sync.Map
lastCleanupTs int64
lockMutex sync.Mutex
}
// NewInMemorySlidingWindowCounter initializes the counter with bounded memory
func NewInMemorySlidingWindowCounter(maxRequests int, windowMs int64, cleanupIntervalMs int64) *InMemorySlidingWindowCounter {
return &InMemorySlidingWindowCounter{
maxRequests: maxRequests,
windowMs: windowMs,
cleanupIntervalMs: cleanupIntervalMs,
lastCleanupTs: time.Now().UnixMilli(),
}
}
// RecordRequest evaluates and tracks a client request. Returns true if allowed.
func (c *InMemorySlidingWindowCounter) RecordRequest(clientID string) bool {
buf, _ := c.clients.LoadOrStore(clientID, &CircularBuffer{
data: make([]int64, c.maxRequests),
capacity: c.maxRequests,
})
cb := buf.(*CircularBuffer)
cb.mu.Lock()
defer cb.mu.Unlock()
now := time.Now().UnixMilli()
cutoff := now - c.windowMs
// Advance tail past expired entries (amortized O(1))
for cb.count > 0 && cb.data[cb.tailIndex] <= cutoff {
cb.tailIndex = (cb.tailIndex + 1) % cb.capacity
cb.count--
}
if cb.count >= c.maxRequests {
return false
}
// Insert timestamp and advance head
cb.data[cb.headIndex] = now
cb.headIndex = (cb.headIndex + 1) % cb.capacity
cb.count++
return true
}
// GetCurrentUsage returns active requests within the sliding window
func (c *InMemorySlidingWindowCounter) GetCurrentUsage(clientID string) int {
buf, ok := c.clients.Load(clientID)
if !ok {
return 0
}
cb := buf.(*CircularBuffer)
cb.mu.Lock()
defer cb.mu.Unlock()
now := time.Now().UnixMilli()
cutoff := now - c.windowMs
active := 0
for i := 0; i < cb.count; i++ {
idx := (cb.tailIndex + i) % cb.capacity
if cb.data[idx] > cutoff {
active++
}
}
return active
}
// EvictStaleEntries runs periodic cleanup to reclaim memory from inactive clients
func (c *InMemorySlidingWindowCounter) EvictStaleEntries() {
c.lockMutex.Lock()
now := time.Now().UnixMilli()
if now-c.lastCleanupTs < c.cleanupIntervalMs {
c.lockMutex.Unlock()
return
}
c.lastCleanupTs = now
c.lockMutex.Unlock()
c.clients.Range(func(key, value interface{}) bool {
cb := value.(*CircularBuffer)
cb.mu.Lock()
if cb.count == 0 {
c.clients.Delete(key)
}
cb.mu.Unlock()
return true
})
}
// Reset clears all state for a specific client
func (c *InMemorySlidingWindowCounter) Reset(clientID string) {
c.clients.Delete(clientID)
}
Concurrency Strategy
- Lock-Free CAS: For single-writer scenarios,
sync/atomiccan replacesync.MutexonheadIndexandtailIndex. However, timestamp comparison requires sequential consistency, making mutex-protected critical sections safer for general API gateways. - Client Isolation: Using
sync.Mapwith per-client buffers prevents hot-key contention. Each clientβs rate limit state is evaluated independently, enabling dynamic weight calculation based on tier or subscription level.
4. Configuration Blueprint & Tuning Parameters
Configuration directly dictates memory footprint and CPU utilization. The schema below maps operational constraints to runtime behavior.
rate_limiting:
sliding_window:
window_size_ms: 60000 # 1-minute evaluation horizon
max_requests: 100 # Buffer capacity per client
cleanup_interval_ms: 10000 # Background eviction frequency
memory_limit_mb: 256 # Hard cap for process heap
eviction_policy: "lru" # Fallback when memory_limit_mb is reached
Mapping to System Resources
- Buffer Capacity:
max_requests * 8 bytes(int64) per client. At 10,000 concurrent clients withmax_requests: 100, memory consumption is ~7.6 MB. Exceedingmemory_limit_mbtriggers forced LRU eviction. - CPU Cycles:
cleanup_interval_mscontrols background goroutine frequency. Too low (<1s) causes thread contention; too high (>30s) allows stale client maps to bloat. - Trade-Off Analysis: As detailed in Fixed Window vs Sliding Window, sliding windows consume 3β5x more memory and CPU than fixed counters. When
max_requestsexceeds 10,000 orwindow_size_msdrops below 100ms, computational overhead justifies migrating to token-bucket approximations or distributed probabilistic counters.
5. Failure-Mode Analysis & Edge Case Mitigation
In-memory sliding windows are highly performant but introduce specific failure surfaces. Systematic mitigation is required for production API gateways.
| Failure Mode | Reproduction / Impact | Mitigation Strategy |
|---|---|---|
| Race Conditions | Concurrent writes to headIndex without atomic guards cause double-counting. |
Use sync.Mutex per client buffer or sync/atomic.CompareAndSwapInt64 for pointer advancement. Never mutate buffer state without holding the lock. |
| Memory Leaks | Stale client IDs persist indefinitely if EvictStaleEntries fails or tailIndex never advances. |
Implement periodic LRU sweep with hard memory_limit_mb enforcement. Force-delete clients exceeding TTL regardless of buffer state. |
| Clock Skew | NTP syncs or system hibernation cause now - window to jump backward, invalidating timestamps. |
Anchor all calculations to monotonic clocks. Reject wall-clock deltas >500ms as anomalous and trigger circuit breaker fallback. |
| Burst Overload | Sudden traffic spikes fill buffers faster than eviction runs, causing O(N) scan latency. | Implement token-bucket fallback for >2x baseline RPS. Queue excess requests with exponential backoff before hard rejection. |
| GC Pressure | High-frequency timestamp allocations trigger stop-the-world pauses >10ms. | Pre-allocate fixed-size buffers at startup. Use object pooling for map entries. Zero dynamic allocations in the RecordRequest hot path. |
Fallback Architecture
When memory limits are breached or GC pauses exceed 5ms, route traffic to a circuit breaker state. Return 429 Too Many Requests with Retry-After headers derived from window_size_ms. Log overflow events to metrics pipelines for capacity planning.
6. Performance Validation & Production Readiness
Deploying in-memory sliding windows requires rigorous load testing and observability integration. The following methodology ensures platform teams can validate latency and memory constraints before traffic exposure.
Benchmarking Framework Setup
- Toolchain:
k6orwrk2for sustained RPS injection. Target 50kβ100k concurrent virtual users. - Metrics Collection: Instrument P99 latency, heap allocation rate (
alloc_rate), and GC pause times (gc_pause_ns). - Success Criteria:
P99 latency < 2msunder 90% ofmax_requestsloadHeap growth < 5%over 15-minute steady stateGC pause < 5ms(99th percentile)
Heap Profiling Commands
# Capture allocation hotspots
go tool pprof -http=:8080 http://localhost:6060/debug/pprof/heap
# Trace GC pauses and scheduler latency
go tool trace -http=:8080 trace.out
# Monitor live memory pressure
watch -n 1 'ps -o pid,rss,vsz -p $(pgrep gateway)'
Production Rollout Checklist
- Canary Deployment: Route 1β5% of traffic to the new rate limiter. Compare
429 - Alert Thresholds: Configure alerts for
heap_usage > 80%,gc_pause_p99 > 5ms, andeviction_queue_depth > 1000 - Graceful Degradation
- State Persistence
- Documentation: Publish client-facing headers (
X-RateLimit-Limit,X-RateLimit-Remaining,X-RateLimit-Reset
When validated against these criteria, in-memory sliding windows deliver deterministic, sub-millisecond rate limiting suitable for high-throughput microservice architectures.
Frequently Asked Questions
When should I use a raw timestamp buffer versus the weighted approximation?
Use the raw circular buffer when you need exact counts and max_requests per client is small (say under a few hundred), since memory is max_requests Γ 8 bytes. Switch to the weighted two-counter approximation when limits are large or you have millions of keys β it is strict O(1) per client at the cost of a small approximation error near the boundary.
Why monotonic clocks instead of wall-clock time?
An NTP step or hibernation can move wall-clock time backward, producing negative deltas that corrupt the window boundary and let traffic through. Monotonic sources (process.hrtime.bigint() in Node, the monotonic reading embedded in Go's time.Now()) only ever move forward, so now - window stays correct.
Does an in-memory window hold a global limit across nodes?
No. Each process keeps its own buffer, so N nodes enforce up to N times the limit in aggregate β the same per-node overshoot that affects any local counter. If the advertised limit is global, back it with a shared store; an in-memory window is correct only for single-node deployments or coarse per-node caps.
How do I stop inactive clients from leaking memory?
Run a periodic sweep (the EvictStaleEntries routine) that deletes any client whose buffer count has drained to zero, and enforce a hard memory_limit_mb with LRU eviction as a backstop. Without eviction, the per-client map grows unbounded as one-off IPs accumulate.
Related
- Fixed Window vs Sliding Window β the parent guide and the boundary-spike motivation.
- Fixed Window Counter Drift Explained β the exact artifact this implementation removes.
- Sliding Log Counters β the distributed, exact-accounting counterpart.
- Redis vs In-Memory Token Bucket β when local state is enough versus a shared store.