Vector Clock
Note that vector clock does not resolve conflict, it only used for conflict detection.
In the case conflict happens, we have 2 options to resolve:
- Latest write win (problematic)
- System clock is not reliable, clock can Clock drift or skewed. Even using Network Time Protocol (NTP) we still have 10ms - 500ms inconsistency
- The only solution for it would be using TrueTime for true atomic clock. However complexity
- Return result back for user to resolve conflict (Safest)
See: Merkle Tree vs Vector Clock
Implementation
Is a [server, version] pair with a data item $D$ to detect conflicts in a distributed system. For example a data item can look like this:
D([S1, V1], [S2, V1], [S3, V2], ... [Sn,Vn])
This means the data is distributed equally to different servers. Each server keeps track of a version of the data. In this case, our data $D$ was distributed in $n$ different server. In which in $S_1$ it has version 1, and in $S_3$ it was version 2 - which means was modified twice.
Rules for writing data
When writing data $D$ to the server Sx. The server must then increase its corresponding Vi for [Sx, Vi] if its exists. Otherwiese it should add a new entry.
For example:
- Client write data
Dthat handled byS1. S1now addDas a vector clock with valueD([S1,V1])- Client read data
Dand write dataDthat again received byS1. S1now increment its vector clock inDwhich results intoD([S1,V2])- Client read data
Dand write dataDthat received byS2. S2now addDas a vector clock with valueD([S1,V2], [S2, V1])
Conflict detection
In order to resolve a conflict, we need to start code our own client to do it. However, Vector Clock provides an easy way to detect confliction. A conflict happens if you can detect different in version if your ancestor (the previous change) vector's clock has any participant that greater than your current vector clock.
For example, supposed we have
D1([S0, 1], [S1, 2]) -> D2([S0, 2], [S1, 1])
In this case, D1 is the parent of D2, we compare each element in D1 and make sure it's < than each corresponding element in D2.
[S0, 1]<[S0, 2]: No conflict[S1, 2]>[S1, 1]: Conflict
Therefore this 2 keypair in here is a conflicts. Of course now you need to know which one comes first. So in a real-world system, a timestamp or other criteria is necessary to be tie-breaking rule.
Example
Considering the following scenario:
- Client write data
Dthat served byS1. S1now addDas a vector clock with valueD([S1,V1])- Client write data
Dto the server.
Now for whatever reasons, network partition happens and the servers are not able to talk to eachother
S1picks up the write and updateD([S1, V2])S2also picks up the write and updateD([S1, V1], [S2, V1])
Now network has been recovered. and the servers can talk to each other again.
- Client read
D, however there is a conflict. Server doesn't know how to resolve this conflict and send both back to our client. Therefore our client see:D([S1, V2]): value for the data isfooD([S1, V1], [S2, V1]): value for the data isbar
- The client now needs to resolve this conflict. Let's supposed that the client decided to merge these changes and determined that the latest value is
bar. We have the final valueD([S1, V2], [S2, V1])with valuebar
- This value is then sent to the server let's say
S1takes this value. This value is then get updated and store in the server:D([S1, V3], [S2, V1])with valuebar
Note
For DynamoDB, conflict resolvation decided to resolve when there is a
clientthat reads the conflicted value. Otherwise, it stays conflicted — therefore eventually consistentcyDue to the keypair
[server, version]keeps growing, DynamoDB has set a threshold to remove the oldest pair when the size exceeded this limit. So far Amazon has not encountered any problem with this approach in production.
Resources: Vector Clocks and Conflicting Data - Grokking the Advanced System Design Interview (designgurus.io)