Dealing With Redis Replica Lag

Sticky session

After a write request, the application forwards all of that user's subsequent reads to the Redis master, so the user always sees their own writes. This can be implemented with a session cookie.

Pros:

  • Simple to implement; a session cookie is enough

Cons:

  • Inconsistent for other users: they may still read stale data from the replica
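The pinning logic can be sketched in a few lines of Python. Names like `ReadRouter` and `PIN_SECONDS` are illustrative, not from any library; a real web app would carry the pin deadline in the session cookie rather than in process memory.

```python
import time

# Sketch of sticky-session read routing (illustrative names, assumed design):
# after a session writes, pin that session's reads to the master for a window
# that comfortably covers typical replication lag.
PIN_SECONDS = 5.0

class ReadRouter:
    def __init__(self):
        self._pinned_until = {}  # session_id -> unix timestamp

    def on_write(self, session_id):
        # In a web app this deadline would travel in the session cookie.
        self._pinned_until[session_id] = time.time() + PIN_SECONDS

    def pick_target(self, session_id):
        # Route to the master while the session is pinned, else the replica.
        if time.time() < self._pinned_until.get(session_id, 0.0):
            return "master"
        return "replica"

router = ReadRouter()
router.on_write("s1")
print(router.pick_target("s1"))  # master: s1 just wrote
print(router.pick_target("s2"))  # replica: other sessions are unaffected
```

Note the con above is visible here: session `s2` keeps reading the replica even while it lags behind `s1`'s write.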

Versioning

For each key, we also store a version number. The application compares the version it needs against what the replica currently has; if the replica is stale, it reads from the master directly.

Pros:

  • Precise: can detect exactly when a replica is out of sync
  • Scalable: only falls back to the master when the replica is lagging

Cons:

  • Increases development complexity

How would other readers keep track of the latest version of a key?

We need a global metadata store that keeps the version numbers. Every time we read a key, we first hit the Redis primary cluster to look up its current version.

sequenceDiagram
    participant Writer
    participant Primary as Redis Primary
    participant Replica as Redis Replica (Lagging)
    participant ReaderA as Reader A
    participant ReaderB as Reader B

    Note over Writer, Replica: 1. Update Process
    Writer->>Primary: SET user:123:data "New Data"
    Writer->>Primary: INCR user:123:version (returns v5)
    
    Note over Primary, Replica: Async Replication Lag (v5 not here yet)

    Note over ReaderA, ReaderB: 2. Reading Process
    ReaderA->>Primary: GET user:123:version
    Primary-->>ReaderA: v5
    
    ReaderA->>Replica: GET user:123:version
    Replica-->>ReaderA: v4 (stale)
    
    Note right of ReaderA: Reader A knows it's lagging.<br/>It fetches from Primary instead.
    ReaderA->>Primary: GET user:123:data
    Primary-->>ReaderA: "New Data"

    Note over ReaderB: Later, Replica catches up
    Replica->>Replica: Syncs v5 data
    
    ReaderB->>Primary: GET user:123:version
    Primary-->>ReaderB: v5
    ReaderB->>Replica: GET user:123:version
    Replica-->>ReaderB: v5
    ReaderB->>Replica: GET user:123:data
    Replica-->>ReaderB: "New Data"
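The flow in the diagram can be sketched in Python. `FakeRedis` is a toy in-memory stand-in for a real client, used only so the example is self-contained; with a real client such as redis-py the `get`/`set`/`incr` calls would look essentially the same.

```python
# Toy in-memory stand-ins for the primary and a lagging replica.
# FakeRedis is illustrative only, not a real Redis client.
class FakeRedis:
    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)

    def incr(self, key):
        self.data[key] = int(self.data.get(key, 0)) + 1
        return self.data[key]

primary, replica = FakeRedis(), FakeRedis()

def write(key, value):
    # 1. Update process: write the data, then bump the version on the primary.
    primary.set(f"{key}:data", value)
    return primary.incr(f"{key}:version")

def read(key):
    # 2. Reading process: fetch the authoritative version from the primary
    # (or a separate metadata cluster), then only trust the replica if it
    # has caught up to that version.
    expected = primary.get(f"{key}:version")
    if replica.get(f"{key}:version") == expected:
        return replica.get(f"{key}:data")
    return primary.get(f"{key}:data")  # replica is stale: fall back

write("user:123", "New Data")      # replication has not happened yet
print(read("user:123"))            # "New Data", served by the primary
replica.data.update(primary.data)  # simulate the replica catching up
print(read("user:123"))            # "New Data", now served by the replica
```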

But won't this increase the load on the primary cluster?

If that is a concern, we can move the metadata to a separate, highly sharded cluster that the application queries for version numbers.


The metadata Redis cluster would not have any replicas; it should be one cluster with many primary nodes. We can set up a master-only cluster:

graph TD
    Client[Client App] -->|get_version key| Slot{Master only redis cluster}
    Slot -->|0 to 5460| NodeA[Redis Primary Node A]
    Slot -->|5461 to 10922| NodeB[Redis Primary Node B]
    Slot -->|10923 to 16383| NodeC[Redis Primary Node C]
    
    subgraph Cluster
    NodeA
    NodeB
    NodeC
    end

redis-cli --cluster create 192.168.1.1:7000 192.168.1.2:7000 192.168.1.3:7000 --cluster-replicas 0
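Clients find the right shard by hashing the key: Redis Cluster maps each key to one of 16384 slots using CRC-16 (XMODEM variant). A minimal sketch of that mapping, which ignores the `{hash tag}` rule that real clients also apply:

```python
# CRC-16/XMODEM (poly 0x1021), the checksum Redis Cluster uses for key slots.
def crc16(data: bytes) -> int:
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    # Real clients first look for a {hash tag} inside the key and hash
    # only that part; this sketch skips that step.
    return crc16(key.encode()) % 16384

slot = key_slot("user:123:version")
print(slot)  # a slot in 0..16383, owned by Node A, B, or C above
```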

But if one shard goes down, do we lose the versioning?

If we need high availability, we can consider ZooKeeper. ZooKeeper uses a quorum system: each write is stored on a majority of the n nodes, so if we lose a node the data can still be recovered.
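The reason losing a node is safe can be illustrated with a toy majority-quorum store. This is a sketch of the quorum idea only, not ZooKeeper's actual ZAB protocol, and `QuorumStore` is a made-up name:

```python
# Toy majority-quorum store: a write is acknowledged by a majority of nodes,
# so any later majority read overlaps the write set and still finds the value.
class QuorumStore:
    def __init__(self, n=5):
        self.nodes = [dict() for _ in range(n)]
        self.quorum = n // 2 + 1  # majority: 3 of 5

    def write(self, key, version):
        # Replicate to a majority before acknowledging the write.
        for node in self.nodes[: self.quorum]:
            node[key] = version

    def read(self, key, alive):
        # Query a majority of the surviving nodes; any two majorities of
        # the same cluster intersect, so at least one node has the value.
        votes = (self.nodes[i].get(key) for i in alive[: self.quorum])
        return max((v for v in votes if v is not None), default=None)

store = QuorumStore()
store.write("user:123:version", 5)
# Nodes 0 and 1 crash; a majority of the survivors still recovers v5.
print(store.read("user:123:version", alive=[2, 3, 4]))  # 5
```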

| Feature     | Redis Shards (Master-Only)                          | ZooKeeper                                          |
| ----------- | --------------------------------------------------- | -------------------------------------------------- |
| Consistency | Strong-ish (no lag, but risk of data loss on crash) | Strictly consistent (uses a consensus algorithm)   |
| Performance | Blazing fast (sub-millisecond reads/writes)         | Slower (writes must be agreed upon by a majority)  |
| Scale       | Scales linearly as you add more master nodes        | Does not scale well for high-volume writes         |
| Reliability | If a node dies, that shard's data is offline        | Highly available (can lose nodes without downtime) |