Local Cache, Eventually Consistent with Redis

High-level description

When we want to reduce the latency that the rate limiter adds to each request (a Redis round-trip per check), we can keep a local cache inside the rate limiter instance itself. The local cache can be a simple ConcurrentHashMap.

The local cache syncs with Redis asynchronously: it pushes its local decrements and fetches the updated bucket value, so it is only eventually consistent with Redis.

graph TD
  Client -->|request| GW[API Gateway]
  GW -->|"check(userId)"| RL[Rate Limiter Instance]
  
  subgraph RL_Internal[Rate Limiter Instance]
    Check[Check Local Cache<br/>ConcurrentHashMap] -->|hit: decrement locally| Verdict[Return allow/deny]
    Check -->|miss: fetch from Redis| Redis_Fetch[Fetch bucket from Redis]
    Redis_Fetch --> Verdict
  end

  RL -->|verdict| GW
  
  LocalCache[Local Cache<br/>ConcurrentHashMap<br/>key: userId<br/>val: tokens remaining] -->|"async sync every N ms<br/>(push local decrements)"| Redis[(Redis: source of truth)]
  Redis -->|"async pull<br/>(fetch updated bucket values)"| LocalCache

The process is:

A request arrives and we check the ConcurrentHashMap for the user's bucket.
If the bucket is found locally, we decrement it in memory and return the verdict.
If it is not found, we fetch the bucket from Redis first (the miss path in the diagram).
A background sync will:
1. Push the local decrements to Redis so that other instances can see them.
2. Pull the updated token count, which also picks up any refill.
  - NOTE: Refill happens only on the Redis side.
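The request path above could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the names `LocalBucketCache`, `Bucket`, `load`, `tryAcquire` and `pendingDelta` are hypothetical, and the `pending` counter is an assumption added here so that the background sync has a delta to push.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: class and method names are illustrative, not from the notes.
class LocalBucketCache {
    // Per-user state held in memory by one rate limiter instance.
    static final class Bucket {
        final AtomicLong tokens;                      // tokens this instance believes remain
        final AtomicLong pending = new AtomicLong();  // consumed locally, not yet pushed to Redis
        Bucket(long tokens) { this.tokens = new AtomicLong(tokens); }
    }

    private final ConcurrentHashMap<String, Bucket> buckets = new ConcurrentHashMap<>();

    // Miss path: the caller fetched the bucket from Redis and caches it here.
    void load(String userId, long tokensFromRedis) {
        buckets.putIfAbsent(userId, new Bucket(tokensFromRedis));
    }

    boolean isCached(String userId) {
        return buckets.containsKey(userId);
    }

    // Hit path: decrement in memory only, no network call.
    boolean tryAcquire(String userId) {
        Bucket b = buckets.get(userId);
        if (b == null) return false;  // not cached: fetch from Redis and load() first
        while (true) {
            long current = b.tokens.get();
            if (current <= 0) return false;               // bucket empty: deny
            if (b.tokens.compareAndSet(current, current - 1)) {
                b.pending.incrementAndGet();              // remember the usage for the next push
                return true;
            }
        }
    }

    // Tokens consumed locally that the background sync has not pushed yet.
    long pendingDelta(String userId) {
        Bucket b = buckets.get(userId);
        return b == null ? 0 : b.pending.get();
    }

    public static void main(String[] args) {
        LocalBucketCache cache = new LocalBucketCache();
        cache.load("user123", 2);                          // pretend Redis returned 2 tokens
        System.out.println(cache.tryAcquire("user123"));   // true
        System.out.println(cache.tryAcquire("user123"));   // true
        System.out.println(cache.tryAcquire("user123"));   // false
        System.out.println(cache.pendingDelta("user123")); // 2
    }
}
```

The compare-and-set loop keeps the local count from going below zero when several request threads hit the same bucket at once.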

The trade-off:

| | Local cache | Direct Redis |
|---|---|---|
| Latency | ~microseconds (in-process HashMap) | ~1-2 ms (network round-trip) |
| Accuracy | Stale by up to one sync interval (bounded over-admission) | Perfectly accurate |
| Complexity | Background sync thread, conflict resolution | Simple Lua script |

[!NOTE]
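Bounded over-admission means a client can briefly go over the limit, but only by an amount we control: the excess is capped by how many tokens each instance can hand out from its local copy before the next sync.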
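As a rough worked bound, assume N rate limiter instances that all cached the same count T for a user at the last sync, that each instance denies once its local count reaches zero, and that no refill happens in between. N and T are symbols introduced here for illustration; they do not appear in the original notes.

```latex
\text{admitted before next sync} \le N \cdot T,
\qquad
\text{over-admission} \le N \cdot T - T = (N - 1)\,T
```

For example, with N = 3 instances and T = 10 cached tokens, up to 30 requests could be admitted before the next sync, which is 20 more than the limit allows.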

How it works

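The ConcurrentHashMap stores only the current token count per user. Sync happens per user key, not for the whole map, and uses a delta-based approach: each instance reports how many tokens it consumed since its last sync rather than its absolute count.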

Push (local → redis):

// Don't send "user has 47 tokens"
// Send "user consumed 3 tokens since last sync"
EVAL lua_script: DECRBY rl:user123 3

Pull (redis → local):

// After pushing your delta, read back the authoritative count
new_count = GET rl:user123
localCache.put("user123", new_count)

The push-then-pull happens atomically in one Lua script call:

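-- Lua script on Redis (atomic)
-- KEYS[1] = bucket key (e.g. rl:user123); ARGV[1] = tokens consumed since the last sync
local key = KEYS[1]
local local_decrements = tonumber(ARGV[1])
redis.call('DECRBY', key, local_decrements)  -- apply this instance's usage
local current = redis.call('GET', key)        -- read back global truth (GET returns a string)
return tonumber(current)                       -- send back to instance as an integer reply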
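On the instance side, one pass of the background sync might look like the sketch below. It is written under stated assumptions: `BucketStore`, `FakeRedis`, `LocalBucket` and `syncOnce` are hypothetical names, and `FakeRedis` is an in-memory stand-in for the Lua script so the example runs without a Redis server. It also handles one detail the pseudocode glosses over: requests admitted while the sync call is in flight.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: BucketStore, FakeRedis, LocalBucket and syncOnce are illustrative names.
class DeltaSync {
    // Stand-in for the Redis call that runs the Lua script (DECRBY, then GET).
    interface BucketStore {
        long applyDelta(String key, long consumed);
    }

    // In-memory fake so the example runs without a Redis server.
    static final class FakeRedis implements BucketStore {
        private final Map<String, Long> data = new HashMap<>();
        synchronized void set(String key, long tokens) { data.put(key, tokens); }
        @Override
        public synchronized long applyDelta(String key, long consumed) {
            long next = data.getOrDefault(key, 0L) - consumed;  // DECRBY
            data.put(key, next);
            return next;                                        // GET
        }
    }

    // One instance's cached view of one user's bucket.
    static final class LocalBucket {
        final AtomicLong tokens;
        final AtomicLong pending = new AtomicLong();
        LocalBucket(long tokens) { this.tokens = new AtomicLong(tokens); }
        void consume(long n) {            // simplified: no empty-bucket check
            tokens.addAndGet(-n);
            pending.addAndGet(n);
        }
    }

    // One background sync pass for one key: push the delta, then adopt the global count.
    static void syncOnce(String key, LocalBucket b, BucketStore store) {
        long delta = b.pending.getAndSet(0);           // drain usage since the last sync
        long global = store.applyDelta(key, delta);    // atomic push + pull on the Redis side
        // Usage recorded while the call was in flight is still in `pending`;
        // subtract it so those tokens are not handed out a second time.
        b.tokens.set(Math.max(0, global - b.pending.get()));
    }

    public static void main(String[] args) {
        FakeRedis redis = new FakeRedis();
        redis.set("rl:user123", 100);
        LocalBucket a = new LocalBucket(100);
        LocalBucket b = new LocalBucket(100);
        a.consume(3);
        b.consume(5);
        syncOnce("rl:user123", a, redis);
        syncOnce("rl:user123", b, redis);
        System.out.println(a.tokens.get());  // 97: A has not yet seen B's usage
        System.out.println(b.tokens.get());  // 92: B sees both deltas
    }
}
```

A narrow race remains between reading `pending` and overwriting `tokens`; the usage is not lost (it stays in `pending` for the next push), so it only adds to the bounded over-admission. In a real instance, `syncOnce` could be driven every N ms by a `ScheduledExecutorService` via `scheduleAtFixedRate`.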

| Step | Why it's safe |
|---|---|
| Push delta (not absolute value) | Two instances pushing "I used 3" and "I used 5" → Redis gets both, total decreases by 8. No conflict. |
| Lua script is atomic | The DECRBY + GET runs as one uninterruptible operation. No other command can sneak in between. |
| Pull after push | You read the global truth after your delta is applied, so your local cache is now accurate (until the next local decrement). |

Key point to avoid conflict

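We push only the operation ("consumed 3 tokens"), not the resulting value. This is the same idea behind CRDT (conflict-free replicated data type) counters: decrements commute, so they can be applied in any order and still give the same total. Pushing absolute counts would instead cause last-write-wins conflicts, where one instance's write erases another's usage.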

❌ Instance A: "user has 47 tokens" (overwrites B's work)
✅ Instance A: "user consumed 3 tokens" (combines with B's work)
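The difference can be illustrated with a small, deterministic simulation. This is only a sketch: `DeltaVsAbsolute` and its methods are hypothetical names, and a plain `long` stands in for the shared Redis counter.

```java
// Hypothetical sketch: compares the two sync strategies on a plain in-memory counter.
class DeltaVsAbsolute {
    // Last-write-wins: each instance overwrites the shared count with its own view.
    static long absoluteWrites(long start, long usedByA, long usedByB) {
        long viewA = start - usedByA;  // A computes from its own (stale) copy
        long viewB = start - usedByB;  // B does the same
        long shared = viewA;           // A writes "user has viewA tokens"
        shared = viewB;                // B overwrites it; A's usage is lost
        return shared;
    }

    // Delta-based: each instance subtracts only what it consumed.
    static long deltaWrites(long start, long usedByA, long usedByB) {
        long shared = start;
        shared -= usedByA;             // "A consumed usedByA tokens"
        shared -= usedByB;             // "B consumed usedByB tokens"
        return shared;
    }

    public static void main(String[] args) {
        System.out.println(absoluteWrites(50, 3, 5)); // 45: A's 3 tokens were erased
        System.out.println(deltaWrites(50, 3, 5));    // 42: both usages counted
        System.out.println(deltaWrites(50, 5, 3));    // 42: order does not matter
    }
}
```

With absolute writes the final count depends on who writes last; with deltas the result is the same regardless of order, which is why no conflict resolution is needed.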