Erasure Coding

A technique commonly used in object storage (such as S3) to improve durability.

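Use erasure coding for blob storage, because it is much more storage-efficient than replication. The trade-off is slower data access: reads may have to gather chunks from several nodes (and reconstruct missing ones), and writes have to compute parity.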

Pasted image 20230906203850.png

In this case we look at a 4 + 2 configuration, which splits the data into 4 data chunks and computes 2 parity chunks for redundancy.

So the steps are as follows:

  1. Data is split into 4 chunks: d1, d2, d3 and d4
  2. We use these 4 chunks to compute the parity chunks p1 and p2, for example with these formulas:
    • p1 = d1 + 2 * d2 - d3 + 4 * d4
    • p2 = -d1 + 5 * d2 + d3 - 3 * d4
  3. Suppose d3 and d4 are lost (for example, the nodes holding them fail).
  4. We use p1 and p2 together with the surviving chunks d1 and d2 to reconstruct d3 and d4.
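The steps above can be sketched with ordinary integer arithmetic and the toy coefficients from the example. This is a minimal illustration, not how production systems work: real erasure codes (e.g. Reed-Solomon) do the arithmetic over a finite field and pick coefficients so that any 2 lost chunks can be rebuilt. The helper names `encode` and `recover_two` are hypothetical.

```python
from fractions import Fraction

# Toy coefficients from the example (hypothetical illustration, plain integers,
# not a finite field as in real Reed-Solomon codes).
P1_COEFFS = [1, 2, -1, 4]   # p1 = d1 + 2*d2 - d3 + 4*d4
P2_COEFFS = [-1, 5, 1, -3]  # p2 = -d1 + 5*d2 + d3 - 3*d4


def encode(data):
    """Compute the two parity values p1, p2 from 4 data values."""
    assert len(data) == 4
    p1 = sum(c * d for c, d in zip(P1_COEFFS, data))
    p2 = sum(c * d for c, d in zip(P2_COEFFS, data))
    return p1, p2


def recover_two(data, p1, p2, lost_i, lost_j):
    """Rebuild data[lost_i] and data[lost_j] from the surviving data and parity.

    Moves the known terms to the right-hand side, then solves the remaining
    2x2 linear system with Cramer's rule using exact Fraction arithmetic.
    """
    known = [k for k in range(4) if k not in (lost_i, lost_j)]
    r1 = p1 - sum(P1_COEFFS[k] * data[k] for k in known)
    r2 = p2 - sum(P2_COEFFS[k] * data[k] for k in known)
    a, b = P1_COEFFS[lost_i], P1_COEFFS[lost_j]
    c, d = P2_COEFFS[lost_i], P2_COEFFS[lost_j]
    det = a * d - b * c
    if det == 0:
        raise ValueError("this pair of chunks cannot be recovered with these coefficients")
    rebuilt = list(data)
    rebuilt[lost_i] = Fraction(r1 * d - b * r2, det)
    rebuilt[lost_j] = Fraction(a * r2 - c * r1, det)
    return rebuilt


# Usage: encode, lose d3 and d4, then rebuild them.
data = [3, 7, 2, 5]
p1, p2 = encode(data)
damaged = [3, 7, None, None]
assert recover_two(damaged, p1, p2, 2, 3) == data
```

With these toy coefficients not every pair of losses is recoverable: losing d1 and d3 gives a zero determinant, so `recover_two` raises. That is one reason real systems derive their coefficients from a construction such as Reed-Solomon instead of picking them by hand.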

Compared with 3-copy replication, this approach needs 50% less storage.

With erasure coding (4 + 2), storing 1 GB of data takes 1.5 GB (50% overhead).

Pasted image 20230906204504.png

With 3-copy replication, the same 1 GB takes 3 GB (200% overhead).

Pasted image 20230906204534.png
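The storage arithmetic generalizes to any k + m scheme: raw size = data × (k + m) / k, versus data × n for n-way replication. A small sketch, with hypothetical function names and 3-copy replication assumed as in the comparison above:

```python
def erasure_coded_size(data_gb, k, m):
    """Raw storage for k data chunks plus m parity chunks: data * (k + m) / k."""
    return data_gb * (k + m) / k


def replicated_size(data_gb, copies):
    """Raw storage when every byte is stored `copies` times."""
    return data_gb * copies


ec = erasure_coded_size(1.0, k=4, m=2)   # 1.5 GB for 1 GB of data
rep = replicated_size(1.0, copies=3)     # 3.0 GB for 1 GB of data
savings = 1 - ec / rep                   # 0.5, i.e. 50% less raw storage
```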