Erasure Coding
A technique commonly used in object storage systems such as S3 to improve durability.
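Use erasure coding for blob storage because it is much more storage-efficient than replication. The trade-off is slower data access: reading an object may require fetching chunks from several nodes, and rebuilding lost data requires extra computation.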
For example, here we look at a 4 + 2 setup, which splits the data into 4 data chunks and computes 2 parity chunks for redundancy.
So the steps are as follows:
- Data is split into 4 chunks: d1, d2, d3 and d4.
- We use these 4 chunks to calculate the parity chunks p1 and p2, for example with the following formulas:
    p1 = d1 + 2 * d2 - d3 + 4 * d4
    p2 = -d1 + 5 * d2 + d3 - 3 * d4
- Suppose d3 and d4 are lost (for example, because the nodes storing them fail).
- We use p1 and p2 together with the surviving chunks d1 and d2 to calculate d3 and d4.
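The steps above can be sketched in Python as follows. This is a minimal illustration, not how S3 implements it: it assumes each chunk is a list of integers (byte values) and applies the two example formulas element by element, so parity values may be negative or larger than 255 and are kept as plain Python ints. The helper names `split_into_chunks`, `encode` and `recover_d3_d4` are hypothetical. Production systems typically use Reed-Solomon codes over a finite field instead; note that with these particular example coefficients not every pair of lost chunks is recoverable (losing d1 and d3 leaves two equations that are multiples of each other).

```python
# Sketch of the 4 + 2 example using the illustrative integer formulas.
# Assumption: chunks are lists of ints; real systems use Reed-Solomon over a finite field.

def split_into_chunks(data: bytes, k: int = 4) -> list[list[int]]:
    """Split data into k equal-length chunks, zero-padding the last one."""
    size = -(-len(data) // k)  # ceiling division
    padded = data + bytes(size * k - len(data))
    return [list(padded[i * size:(i + 1) * size]) for i in range(k)]


def encode(d1, d2, d3, d4):
    """Compute parity chunks p1 and p2 element by element."""
    p1 = [a + 2 * b - c + 4 * d for a, b, c, d in zip(d1, d2, d3, d4)]
    p2 = [-a + 5 * b + c - 3 * d for a, b, c, d in zip(d1, d2, d3, d4)]
    return p1, p2


def recover_d3_d4(d1, d2, p1, p2):
    """Rebuild lost chunks d3 and d4 from the surviving d1, d2, p1 and p2."""
    d3, d4 = [], []
    for a, b, x, y in zip(d1, d2, p1, p2):
        s = x - a - 2 * b  # from p1: -d3 + 4*d4 = s
        t = y + a - 5 * b  # from p2:  d3 - 3*d4 = t
        v4 = s + t         # adding both equations eliminates d3
        v3 = t + 3 * v4    # substitute d4 back into the second equation
        d3.append(v3)
        d4.append(v4)
    return d3, d4


# Usage: encode, pretend d3 and d4 are lost, then rebuild them.
data = b"hello erasure coding!"
d1, d2, d3, d4 = split_into_chunks(data)
p1, p2 = encode(d1, d2, d3, d4)
r3, r4 = recover_d3_d4(d1, d2, p1, p2)
restored = bytes(d1 + d2 + r3 + r4)[:len(data)]  # drop the padding
print(restored)  # b'hello erasure coding!'
```

The sketch trims padding using the original length rather than stripping zero bytes, since the data itself could end in zeros; a real system would store that length as metadata.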
Compared with 3-copy replication, this approach uses 50% less storage. For example:
- 1 GB of data with (4 + 2) erasure coding takes 1.5 GB (50% overhead).
- 1 GB of data with 3-copy replication takes 3 GB (200% overhead).
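The storage figures above follow from simple arithmetic: with k data chunks and m parity chunks, raw storage is data * (k + m) / k, while replication stores data * copies. The sketch below assumes 3 copies for replication (implied by the 3 GB figure); the function names are hypothetical.

```python
def erasure_coding_storage(data_gb: float, k: int, m: int) -> float:
    """Raw storage needed for k data chunks plus m parity chunks."""
    return data_gb * (k + m) / k


def replication_storage(data_gb: float, copies: int) -> float:
    """Raw storage needed when every byte is stored `copies` times."""
    return data_gb * copies


def overhead_pct(raw_gb: float, data_gb: float) -> float:
    """Extra storage beyond the original data, as a percentage."""
    return (raw_gb - data_gb) / data_gb * 100


ec = erasure_coding_storage(1.0, k=4, m=2)  # 1.5 GB
rep = replication_storage(1.0, copies=3)    # 3.0 GB
print(f"EC (4 + 2): {ec} GB, {overhead_pct(ec, 1.0):.0f}% overhead")
print(f"3 copies: {rep} GB, {overhead_pct(rep, 1.0):.0f}% overhead")
print(f"EC uses {(1 - ec / rep) * 100:.0f}% less storage")
```

Raising k for a fixed m lowers the overhead further, at the cost of touching more nodes per read.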