Local Cache with Eventually-Consistent Redis
High-level description
If we want to reduce the latency the rate limiter adds to each request (the network round-trip to Redis), we can keep a local cache inside each rate limiter instance. The local cache can be a simple ConcurrentHashMap.
The local cache asynchronously pushes its usage to Redis and pulls the updated bucket values back from Redis, so it is eventually consistent with the shared state.
```mermaid
graph TD
    Client -->|request| GW[API Gateway]
    GW -->|"check(userId)"| RL[Rate Limiter Instance]
    subgraph RL_Internal[Rate Limiter Instance]
        Check[Check Local Cache<br/>ConcurrentHashMap] -->|hit: decrement locally| Verdict[Return allow/deny]
        Check -->|miss: fetch from Redis| Redis_Fetch[Fetch bucket from Redis]
        Redis_Fetch --> Verdict
    end
    RL -->|verdict| GW
    LocalCache[Local Cache<br/>ConcurrentHashMap<br/>key: userId<br/>val: tokens remaining] -->|"async sync every N ms<br/>(push local decrements)"| Redis[(Redis: source of truth)]
    Redis -->|"async pull<br/>(fetch updated bucket values)"| LocalCache
```
The process is:
- A request arrives and we check the ConcurrentHashMap for the user's token count.
  - If the user is found locally, we decrement the count in memory (allow if a token was left, deny otherwise).
  - If not, we fetch the bucket from Redis and cache it locally.
- A background sync task periodically:
  - pushes the local decrements to Redis so that other instances can see them;
  - pulls the updated token count, which also picks up any refill.
- NOTE: Refill happens only on the Redis side.
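The request path above can be sketched in Java as follows. This is a minimal sketch under assumptions: the class name `LocalTokenCache` and the `fetchFromRedis` hook are hypothetical (the hook stands in for a real Redis read), and the cache only holds the token count per user, as described above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.ToLongFunction;

// Hypothetical request-path sketch for one rate limiter instance.
class LocalTokenCache {
    private final ConcurrentHashMap<String, Long> tokens = new ConcurrentHashMap<>();
    // Hypothetical hook: in a real limiter this would read the user's bucket from Redis.
    private final ToLongFunction<String> fetchFromRedis;

    LocalTokenCache(ToLongFunction<String> fetchFromRedis) {
        this.fetchFromRedis = fetchFromRedis;
    }

    // Returns true (allow) if a token was available locally, false (deny) otherwise.
    boolean tryAcquire(String userId) {
        if (!tokens.containsKey(userId)) {
            // Miss: fetch outside computeIfPresent() so the slow fetch does not hold the map's per-key lock.
            // putIfAbsent keeps whichever value a concurrent request stored first.
            tokens.putIfAbsent(userId, fetchFromRedis.applyAsLong(userId));
        }
        AtomicBoolean allowed = new AtomicBoolean(false);
        // Hit: computeIfPresent is atomic per key, so two concurrent requests cannot both take the last token.
        tokens.computeIfPresent(userId, (id, count) -> {
            if (count > 0) {
                allowed.set(true);
                return count - 1; // decrement in memory
            }
            return count; // no tokens left locally: deny
        });
        return allowed.get();
    }

    // Called by the background sync with the authoritative count pulled from Redis.
    void refresh(String userId, long authoritativeCount) {
        tokens.put(userId, authoritativeCount);
    }

    Long peek(String userId) {
        return tokens.get(userId);
    }
}
```

With a fetch hook that returns 2, the first two `tryAcquire("user123")` calls allow and the third denies until a sync calls `refresh` with a new count.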
The trade-off (what you correctly identified in the interview):
| | Local cache | Direct Redis |
|---|---|---|
| Latency | ~microseconds (in-process HashMap) | ~1-2 ms (network round-trip) |
| Accuracy | Stale by up to one sync interval (bounded over-admission) | Perfectly accurate |
| Complexity | Background sync thread, conflict resolution | Simple Lua script |
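> [!NOTE]
> Bounded over-admission means a user may be let through past the limit, but only by an amount we control: the overshoot is limited by the sync interval and by how many instances are acting on stale local counts.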
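To make "bounded" concrete, here is a rough worst-case estimate under simplifying assumptions that are not part of the original design: N instances each start a sync interval with the same cached count c, no refill happens during the interval, and each instance keeps admitting requests until its own copy reaches zero.

$$
\text{admitted} \le N \cdot c
\quad\Rightarrow\quad
\text{over-admission} \le N \cdot c - c = (N - 1)\,c
$$

For example, with N = 3 instances and c = 10, up to 30 requests could pass against 10 real tokens, an overshoot of at most 20 until the next sync resets every instance to the global count.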
How it works
The ConcurrentHashMap stores only the current token count per user. We sync per user key rather than syncing the whole map, and we use a delta-based approach.
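Per-key delta tracking could look like the sketch below. The text says deltas are pushed per key but not where they are kept between syncs, so the separate `PendingDeltas` class and its method names are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;

// Hypothetical helper: tokens consumed per user since that user's last sync.
class PendingDeltas {
    private final ConcurrentHashMap<String, AtomicLong> pending = new ConcurrentHashMap<>();

    // Called for every locally admitted request.
    void recordConsumption(String userId, long tokens) {
        pending.computeIfAbsent(userId, k -> new AtomicLong()).addAndGet(tokens);
    }

    // Atomically takes this key's delta and resets it to 0, so consumption is never pushed twice or lost.
    long drain(String userId) {
        AtomicLong counter = pending.get(userId);
        return (counter == null) ? 0L : counter.getAndSet(0L);
    }

    // Keys with unsynced consumption; the sync loop pushes each one separately.
    Set<String> dirtyKeys() {
        return pending.entrySet().stream()
                .filter(e -> e.getValue().get() > 0)
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());
    }
}
```

Because `getAndSet(0)` and `addAndGet` are both atomic, a request that arrives while a key is being drained is simply counted in the next delta.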
Push (local → Redis):
```
// Don't send "user has 47 tokens"
// Send "user consumed 3 tokens since last sync"
EVAL <lua_script> 1 rl:user123 3   // the script below runs DECRBY rl:user123 3
```
Pull (Redis → local):
```
// After pushing your delta, read back the authoritative count
new_count = GET rl:user123
localCache.put("user123", new_count)
```
The push-then-pull happens atomically in one Lua script call:
```lua
-- Lua script on Redis (runs atomically)
local key = KEYS[1]
local local_decrements = tonumber(ARGV[1])
redis.call('DECRBY', key, local_decrements)  -- apply this instance's usage
local current = redis.call('GET', key)       -- read back the global truth
return tonumber(current)                     -- send back to the instance as an integer
```
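The instance side of the push-then-pull can be sketched as follows. To keep it runnable without a server, `InMemoryRedis` is an assumed stand-in for Redis whose `synchronized` method plays the role of the Lua script's atomicity; it does not refill. `LimiterInstance`, `consume`, `syncKey` and `startBackgroundSync` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Assumed in-memory stand-in for Redis; synchronized mimics the Lua script running atomically.
class InMemoryRedis {
    private final Map<String, Long> data = new HashMap<>();

    synchronized void set(String key, long value) {
        data.put(key, value);
    }

    // Mirrors the script: DECRBY then GET, with nothing able to run in between.
    synchronized long decrByAndGet(String key, long delta) {
        long updated = data.getOrDefault(key, 0L) - delta; // like DECRBY, a missing key starts at 0
        data.put(key, updated);
        return updated;
    }
}

// Hypothetical per-instance state: local counts plus consumption not yet pushed.
class LimiterInstance {
    final ConcurrentHashMap<String, Long> localCache = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, AtomicLong> pending = new ConcurrentHashMap<>();
    private final InMemoryRedis redis;

    LimiterInstance(InMemoryRedis redis) {
        this.redis = redis;
    }

    // Record n locally admitted requests: decrement the local count and remember the delta.
    void consume(String key, long n) {
        localCache.merge(key, -n, Long::sum);
        pending.computeIfAbsent(key, k -> new AtomicLong()).addAndGet(n);
    }

    // One sync step for one key: push the delta, then overwrite the local count with the global one.
    void syncKey(String key) {
        AtomicLong counter = pending.get(key);
        long delta = (counter == null) ? 0L : counter.getAndSet(0L);
        long global = redis.decrByAndGet(key, delta); // push-then-pull in one atomic call
        localCache.put(key, global);
    }

    // Runs syncKey for every key seen so far, every intervalMs milliseconds.
    ScheduledExecutorService startBackgroundSync(long intervalMs) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        exec.scheduleAtFixedRate(
                () -> pending.keySet().forEach(this::syncKey),
                intervalMs, intervalMs, TimeUnit.MILLISECONDS);
        return exec;
    }
}
```

Starting from 50 tokens, if instance A consumes 3 and instance B consumes 5, syncing A leaves Redis at 47 and syncing B leaves it at 42. A request admitted between `getAndSet` and `put` in `syncKey` is briefly missing from the local count, but its delta is still pushed on the next round, which fits the eventual-consistency trade-off.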
| Step | Why it's safe |
|---|---|
| Push delta (not absolute value) | Two instances pushing "I used 3" and "I used 5" → Redis gets both, total decreases by 8. No conflict. |
| Lua script is atomic | The DECRBY + GET runs as one uninterruptible operation. No other command can sneak in between. |
| Pull after push | You read the global truth after your delta is applied, so your local cache is accurate as of that moment (until the next local decrement). |
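Key point: avoiding conflicts
We push only the operation (the delta), not the absolute value. This is the same idea used by operation-based CRDTs (conflict-free replicated data types): decrements commute, so Redis can apply them in any order and every instance's usage is counted. Pushing absolute counts would cause last-write-wins conflicts: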
❌ Instance A: "user has 47 tokens" (overwrites B's work)
✅ Instance A: "user consumed 3 tokens" (combines with B's work)
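The two lines above can be checked with a toy calculation. The class `ConflictDemo` and its numbers are hypothetical: both instances start from a cached count of 50, A consumes 3 and B consumes 5.

```java
// Hypothetical toy comparison of absolute writes vs delta writes to a shared counter.
class ConflictDemo {
    // Last-write-wins: each instance overwrites the shared count with its own absolute value.
    static long absoluteWrites(long start, long usedByA, long usedByB) {
        long shared;
        shared = start - usedByA; // A writes its absolute count (47 when start=50, usedByA=3)
        shared = start - usedByB; // B writes its absolute count (45), overwriting A's write
        return shared;
    }

    // Delta-based: each instance sends only what it consumed; the store applies both.
    static long deltaWrites(long start, long usedByA, long usedByB) {
        long shared = start;
        shared -= usedByA; // A sends "consumed 3"
        shared -= usedByB; // B sends "consumed 5"
        return shared;
    }
}
```

`absoluteWrites(50, 3, 5)` returns 45, so A's 3 tokens are lost, and the result changes to 47 if the writes arrive in the other order. `deltaWrites` returns 42 in either order.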