Database Scaling
Read: Database sharding
Vertical scaling
Scale up a single database instance by adding more CPU, RAM, or storage.
Horizontal scaling (sharding)
Sharding separates a large database into smaller, more easily managed databases called shards
- Each shard has the same schema (although the data is unique for each shard)
So every time you query the database, the query needs to go to the correct shard.
For example, suppose user_id is the sharding key. We need a hash function that maps each key to a shard. Let's use $\text{hash} = \text{userId} \bmod 4$. If the result is 1, the query goes to shard 1. If the result is 0, it goes to shard 0, and so on.
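
A minimal sketch of the modulo routing above, assuming integer user IDs and four shards. The names `NUM_SHARDS` and `shard_for` are made up for illustration and do not come from any library.

```python
NUM_SHARDS = 4  # assumption: matches the "mod 4" in the example


def shard_for(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Return the index of the shard that should hold this user_id."""
    return user_id % num_shards


# Route a few example IDs to their shards.
for user_id in (8, 9, 10, 11):
    print(f"user_id {user_id} -> shard {shard_for(user_id)}")
```

Every read and write for a given user_id must go through the same function, otherwise the data will not be found on the shard where it was stored.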
[!note]
Sharding is difficult to get right, so scaling with read replicas is often the better first option.
Caveats
Resharding data:
- If a single shard grows too large
- Uneven data distribution (the sharding function might need to be updated)
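
To get a feel for why resharding is expensive, the sketch below counts how many keys change shard when a plain modulo function goes from 4 to 5 shards. The helper `moved_fraction` and the sample ID range are assumptions for illustration only.

```python
def moved_fraction(user_ids, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    ids = list(user_ids)
    moved = sum(1 for u in ids if u % old_shards != u % new_shards)
    return moved / len(ids)


# Assumption: 1000 sequential user IDs stand in for real data.
fraction = moved_fraction(range(1000), old_shards=4, new_shards=5)
print(f"{fraction:.0%} of keys would have to move")
```

With these inputs, 80% of the keys land on a different shard, so changing the sharding function usually means migrating most of the data.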
Celebrity problem
- A hotspot: a few very popular keys (celebrities) can overload a single shard. To solve this, we may need to allocate a dedicated shard to each celebrity
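
One possible way to handle a hotspot is to check an override table before falling back to the normal hash. This is only a sketch: the `DEDICATED_SHARDS` table, the user IDs in it, and the `route` function are hypothetical.

```python
NUM_SHARDS = 4  # regular shards are numbered 0..3

# Hypothetical override table: hot user_id -> its own shard index.
DEDICATED_SHARDS = {1001: 4, 1002: 5}


def route(user_id: int) -> int:
    """Send known hot keys to a dedicated shard, everything else by modulo."""
    if user_id in DEDICATED_SHARDS:
        return DEDICATED_SHARDS[user_id]
    return user_id % NUM_SHARDS


print(route(1001))  # celebrity, served by its dedicated shard
print(route(1003))  # regular user, served by a regular shard
```

The trade-off is that the override table has to be kept up to date as keys become hot or cool down.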
Join and de-normalisation
- It's harder to perform join operations across database shards
- A common workaround is to denormalise the database so that queries can be performed against a single table
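
As a rough illustration of denormalisation, the sketch below uses in-memory dictionaries in place of real shards. Orders are sharded by order_id, so an order may live on a different shard than its user. Copying the user's name into the order row at write time lets a read be answered from one shard. All names here (`shards`, `add_user`, `add_order`, `order_summary`) are assumptions for illustration.

```python
NUM_SHARDS = 4

# Each "shard" holds its own users table and orders table.
shards = [{"users": {}, "orders": {}} for _ in range(NUM_SHARDS)]


def shard_of(key: int) -> dict:
    return shards[key % NUM_SHARDS]


def add_user(user_id: int, name: str) -> None:
    shard_of(user_id)["users"][user_id] = {"name": name}


def add_order(order_id: int, user_id: int, item: str) -> None:
    # Denormalise: copy the user's name into the order row at write time.
    user_name = shard_of(user_id)["users"][user_id]["name"]
    shard_of(order_id)["orders"][order_id] = {
        "user_id": user_id,
        "user_name": user_name,
        "item": item,
    }


def order_summary(order_id: int) -> str:
    # Single-shard read: no lookup on the user's shard is needed.
    order = shard_of(order_id)["orders"][order_id]
    return f'{order["user_name"]} bought {order["item"]}'


add_user(5, "Ada")         # stored on shard 1
add_order(10, 5, "book")   # stored on shard 2
print(order_summary(10))
```

The cost is duplicated data: if the user's name changes, every copied value has to be updated as well.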