Database Scaling

Read: Database sharding

Vertical scaling

Scale up a single database instance by giving it more resources (CPU, RAM, disk).

Horizontal scaling (sharding)

Sharding separates a large database into smaller, more easily managed databases called shards.

  • Each shard has the same schema (although the data is unique for each shard)

So every time you query the database, the query needs to be routed to the correct shard.

For example, say we use user_id as the sharding key. We need a hash function that maps each key to a shard. Let's use $\text{hash} = \text{userId} \bmod 4$, which gives 4 shards (shard 0 to shard 3).

Pasted image 20221231173540.png

If the result is 1, the query goes to shard 1. If the result is 0, it goes to shard 0, and so on.

Pasted image 20221231173815.png
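
The routing rule above can be sketched in Python. This is only a minimal sketch: the names `shard_for`, `save_user` and `load_user` are made up for illustration, and plain dictionaries stand in for the four database instances.

```python
NUM_SHARDS = 4  # matches the "mod 4" in the example above

def shard_for(user_id: int) -> int:
    """Return the index of the shard that stores this user."""
    return user_id % NUM_SHARDS

# Assumption: in-memory dicts stand in for four separate database instances.
shards = [{} for _ in range(NUM_SHARDS)]

def save_user(user_id: int, record: dict) -> None:
    # Every write is routed to exactly one shard, chosen by the sharding key.
    shards[shard_for(user_id)][user_id] = record

def load_user(user_id: int):
    # Reads use the same function, so they land on the shard that holds the row.
    return shards[shard_for(user_id)].get(user_id)

save_user(5, {"name": "alice"})  # 5 mod 4 = 1, stored on shard 1
save_user(8, {"name": "bob"})    # 8 mod 4 = 0, stored on shard 0
```

Reads and writes must both go through the same function, otherwise a row could be written to one shard and looked up on another.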

[!note]
Sharding can be very difficult to get right. Therefore scaling with read replicas is often the better first option.

Caveats

Resharding data:

  • If a single shard grows too large
  • Uneven data distribution (might need to update the sharding function)
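
To see why updating the sharding function is costly, here is a small sketch that counts how many keys change shard when a modulo scheme goes from 4 to 5 shards. The helper names `shard_index` and `fraction_moved` and the sample IDs 0 to 19 are assumptions for illustration only.

```python
def shard_index(user_id: int, num_shards: int) -> int:
    """Modulo-based sharding function, as in the user_id example."""
    return user_id % num_shards

def fraction_moved(user_ids, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1
        for uid in user_ids
        if shard_index(uid, old_shards) != shard_index(uid, new_shards)
    )
    return moved / len(user_ids)

# Sample keys 0..19: only IDs 0, 1, 2 and 3 stay on the same shard.
print(fraction_moved(range(20), 4, 5))  # 0.8
```

With plain modulo hashing most rows have to be moved when the shard count changes, which is why resharding is listed as a caveat.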

Celebrity problem

  • Hotspot: a few very popular keys (celebrities) can overload the shard they land on. To solve this problem, we allocate a separate shard for each celebrity
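
One possible way to express this is a routing function with an override table for hot keys. The sketch below is hypothetical: the IDs, the shard names and the `DEDICATED_SHARDS` table are invented for illustration.

```python
NUM_SHARED_SHARDS = 4

# Hypothetical mapping from a celebrity's user_id to a dedicated shard.
DEDICATED_SHARDS = {
    1001: "celebrity-1001",
    1002: "celebrity-1002",
}

def route(user_id: int) -> str:
    """Return the name of the shard that should serve this user."""
    # Hot keys bypass the hash function and get a shard of their own.
    if user_id in DEDICATED_SHARDS:
        return DEDICATED_SHARDS[user_id]
    # Everyone else is spread over the shared shards by modulo.
    return f"shard-{user_id % NUM_SHARED_SHARDS}"

print(route(1001))  # celebrity-1001
print(route(5))     # shard-1
```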

Join and de-normalisation

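  • It's harder to perform join operations across database shards. A common workaround is to de-normalise the data so that a query can be answered from a single table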
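
A rough sketch of what a cross-shard join looks like when done in the application. Everything here is an assumption for illustration: the `users` and `orders` tables, the choice to shard orders by order_id, and the in-memory lists standing in for real databases.

```python
NUM_SHARDS = 4

# Each shard holds its own slice of the users and orders tables.
shards = [{"users": {}, "orders": []} for _ in range(NUM_SHARDS)]

def add_user(user_id: int, name: str) -> None:
    shards[user_id % NUM_SHARDS]["users"][user_id] = name

def add_order(order_id: int, user_id: int, item: str) -> None:
    # Assumption: orders are sharded by order_id, so an order can live
    # on a different shard than the user who placed it.
    row = {"order_id": order_id, "user_id": user_id, "item": item}
    shards[order_id % NUM_SHARDS]["orders"].append(row)

def orders_with_names():
    """Application-side join: read every shard, then match rows in memory."""
    names = {}
    for shard in shards:
        names.update(shard["users"])
    rows = []
    for shard in shards:
        for order in shard["orders"]:
            rows.append((order["order_id"], names.get(order["user_id"]), order["item"]))
    return sorted(rows)

add_user(1, "alice")
add_user(2, "bob")
add_order(10, 1, "book")
add_order(11, 2, "pen")
add_order(12, 1, "lamp")
print(orders_with_names())
```

A single-database `JOIN` becomes a read of every shard plus matching in application code. Storing the user name on each order row (de-normalising) would avoid the lookup across shards.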