Back-of-the-envelope Estimation
General advice:
- Disk is slow, memory is fast
- Avoid disk seeks
- Simple compression algorithms are fast
- Compress data before sending over the internet if possible
- Sending data between data centers takes time because they are in different regions
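As a rough illustration of the compression advice, here is a minimal sketch using Python's standard `zlib` module on a made-up, highly repetitive payload (the payload is an assumption for demonstration). Real-world ratios depend heavily on the data: text and JSON usually compress well, while already-compressed media such as JPEG or MP4 barely shrinks.

```python
import zlib

# Hypothetical payload: repetitive records, similar to JSON that repeats field names.
payload = b'{"user_id": 12345, "status": "active"}\n' * 1000

# Level 1 trades compression ratio for speed ("simple compression is fast").
compressed = zlib.compress(payload, level=1)

print(f"original:   {len(payload)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"ratio:      {len(payload) / len(compressed):.1f}x")

# Round trip: decompression restores exactly the original bytes.
restored = zlib.decompress(compressed)
print("round trip ok:", restored == payload)
```

Sending `compressed` instead of `payload` saves bandwidth at the cost of a little CPU on both ends, which is usually a good trade for cross-region transfers.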
Back-of-the-envelope calculation
Example: Twitter QPS and storage requirement
- 300 million monthly active users
- 50% of users use Twitter daily
- Each user posts 2 tweets per day on average
- 10% of tweets contain media
- Data is stored for 5 years
QPS estimation:
In a back-of-the-envelope calculation, we usually estimate QPS (queries per second).
In this case, we estimate the average Tweet QPS and the Peak QPS.
- Daily active users (DAU) = 300 million * 50% = 150 million
- Tweet QPS = 150 million * 2 (tweets per day) / 24 (hours) / 3600 (seconds) = ~3500
- Peak QPS = 2 * Tweet QPS (assuming peak traffic is twice the average) = ~7000
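The QPS arithmetic above can be written out as a small script. This is a sketch: the inputs come from the Twitter example, and `PEAK_FACTOR = 2` is the assumed peak-to-average ratio, not a measured value.

```python
# Inputs from the Twitter example.
MONTHLY_ACTIVE_USERS = 300_000_000
DAILY_ACTIVE_RATIO = 0.5        # 50% of users use Twitter daily
TWEETS_PER_USER_PER_DAY = 2
PEAK_FACTOR = 2                 # assumption: peak traffic is 2x the average
SECONDS_PER_DAY = 24 * 3600     # 86,400 seconds


def daily_active_users() -> float:
    return MONTHLY_ACTIVE_USERS * DAILY_ACTIVE_RATIO


def tweet_qps() -> float:
    """Average number of new tweets written per second."""
    return daily_active_users() * TWEETS_PER_USER_PER_DAY / SECONDS_PER_DAY


def peak_qps() -> float:
    return tweet_qps() * PEAK_FACTOR


print(f"Tweet QPS: ~{tweet_qps():,.0f}")  # → ~3,472
print(f"Peak QPS:  ~{peak_qps():,.0f}")   # → ~6,944
```

The exact results (~3,472 and ~6,944) round to the ~3500 and ~7000 used in the notes; at this level of estimation, rounding is expected.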
Storage Estimation
- Let's assume a tweet has these attributes:
  - tweet_id: 64 bytes
  - text: 140 bytes (140 characters)
  - media: 1 MB
- Media storage per day = 150 million * 2 (tweets per user) * 10% * 1 MB = 30 TB per day
- 5-year media storage = 30 TB * 365 (days) * 5 (years) = ~55 PB
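The storage estimate can be checked the same way. This sketch assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes, 1 PB = 10^15 bytes), which is the usual convention for rough estimates, and, like the notes, it counts only media.

```python
# Decimal (SI) units, as commonly used in rough estimates.
MB = 10**6
TB = 10**12
PB = 10**15

# Inputs from the Twitter example.
DAU = 150_000_000
TWEETS_PER_USER_PER_DAY = 2
MEDIA_RATIO = 0.10          # 10% of tweets contain media
MEDIA_SIZE = 1 * MB         # assumed size of one media item
RETENTION_DAYS = 365 * 5    # data is stored for 5 years


def media_bytes_per_day() -> float:
    return DAU * TWEETS_PER_USER_PER_DAY * MEDIA_RATIO * MEDIA_SIZE


def media_bytes_total() -> float:
    return media_bytes_per_day() * RETENTION_DAYS


print(f"Media per day: {media_bytes_per_day() / TB:.0f} TB")  # → 30 TB
print(f"5-year media:  {media_bytes_total() / PB:.0f} PB")    # → 55 PB
```

The exact 5-year figure is 54.75 PB, which the notes round to ~55 PB.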
Note
- Always write a unit with every number. For example, "5" on its own is ambiguous; write "5 MB" or "5 KB" instead.
- Commonly estimated quantities include:
  - QPS
  - Peak QPS
  - Storage
  - Cache
  - Number of servers
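To follow the advice about always writing a unit, a helper like the hypothetical `with_unit` below can format raw byte counts. It assumes decimal units (1 KB = 1000 bytes); some contexts use binary units instead (1 KiB = 1024 bytes), so it is worth stating which convention an estimate uses.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def with_unit(num_bytes: float) -> str:
    """Format a byte count with a decimal (SI) unit, e.g. 5_000_000 -> '5 MB'."""
    value = float(num_bytes)
    unit_index = 0
    # Divide by 1000 until the value fits the current unit or we run out of units.
    while value >= 1000 and unit_index < len(UNITS) - 1:
        value /= 1000
        unit_index += 1
    # Keep 3 significant digits, which is plenty for rough estimates.
    return f"{value:.3g} {UNITS[unit_index]}"


print(with_unit(5_000))           # → 5 KB
print(with_unit(5_000_000))       # → 5 MB
print(with_unit(30 * 10**12))     # → 30 TB
```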