Back-of-the-envelope Estimation

General advice:

  • Disk is slow; memory is fast
  • Avoid disk seeks where possible
  • Simple compression algorithms are fast
  • Compress data before sending it over the internet, if possible
  • Data centers are usually in different regions, so sending data between them takes time
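As a rough illustration of the compression advice, the sketch below uses Python's standard `zlib` module at level 1 (its fastest setting) on a hypothetical, repetitive JSON payload. The payload and the helper name `compress_for_transfer` are assumptions for illustration only; real compression ratios depend heavily on the data.

```python
import zlib


def compress_for_transfer(data: bytes) -> bytes:
    """Hypothetical helper: compress a payload with zlib's fastest level (1) before sending it."""
    return zlib.compress(data, 1)


# Hypothetical payload: repetitive JSON, which compresses well.
payload = b'{"user_id": 12345, "text": "hello world"}\n' * 1000
compressed = compress_for_transfer(payload)

# The receiver restores the original bytes with zlib.decompress.
assert zlib.decompress(compressed) == payload
print(f"original: {len(payload)} bytes, compressed: {len(compressed)} bytes")
```

Level 1 trades some compression ratio for speed, which fits the "simple compression is fast" point; higher levels (up to 9) compress more but cost more CPU.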

Back-of-the-envelope calculation

Example: Estimate Twitter QPS and storage requirements

  • 300 million monthly active users
  • 50% of users use Twitter daily
  • Each user posts 2 tweets per day on average
  • 10% of tweets contain media
  • Data is stored for 5 years

QPS Estimation
A back-of-the-envelope estimate usually starts with QPS (queries per second).

Here we calculate the average tweet QPS and the peak QPS.

  1. Daily active users (DAU) = 300 million * 50% = 150 million
  2. Tweet QPS = 150 million * 2 (tweets per day) / 24 (hours) / 3600 (seconds) = ~3500
  3. Peak QPS = 2 * Tweet QPS (assuming peak traffic is twice the average) = ~7000
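The steps above can be checked with a small Python sketch. It uses the assumptions listed for the Twitter example; `PEAK_FACTOR = 2` mirrors the doubling in step 3, and the constant and function names are illustrative, not from any real system.

```python
# Assumptions from the Twitter example above.
MONTHLY_ACTIVE_USERS = 300_000_000
DAILY_ACTIVE_PERCENT = 50          # 50% of users use Twitter daily
TWEETS_PER_USER_PER_DAY = 2
PEAK_FACTOR = 2                    # assumed: peak traffic is twice the average
SECONDS_PER_DAY = 24 * 3600


def estimate_qps():
    """Return (DAU, average tweet QPS, peak QPS) for the Twitter example."""
    dau = MONTHLY_ACTIVE_USERS * DAILY_ACTIVE_PERCENT // 100
    avg_qps = dau * TWEETS_PER_USER_PER_DAY / SECONDS_PER_DAY
    peak_qps = avg_qps * PEAK_FACTOR
    return dau, avg_qps, peak_qps


dau, avg_qps, peak_qps = estimate_qps()
print(f"DAU: {dau:,} users")
print(f"Average tweet QPS: {avg_qps:,.0f} tweets/s")
print(f"Peak QPS: {peak_qps:,.0f} tweets/s")
```

It prints about 3,472 average and 6,944 peak tweets per second, which the steps above round to ~3500 and ~7000.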

Storage Estimation

  • Assume a tweet has these attributes:
    • tweet_id: 64 bytes
    • text: 140 bytes (140 characters)
    • media: 1 MB
  • Media storage per day = 150 million * 2 (tweets) * 10% * 1 MB = 30 TB per day
  • 5-year media storage = 30 TB * 365 (days) * 5 (years) = ~55 PB
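The storage arithmetic above as a Python sketch, assuming decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes), which is what the ~55 PB figure implies. It also computes the ID and text bytes per day to show why the estimate focuses on media; the names are illustrative.

```python
# Decimal units, matching the arithmetic above.
MB, GB, TB, PB = 10**6, 10**9, 10**12, 10**15

# Assumptions from the Twitter example.
DAU = 150_000_000
TWEETS_PER_USER_PER_DAY = 2
MEDIA_PERCENT = 10                 # 10% of tweets contain media
MEDIA_SIZE = 1 * MB
TWEET_ID_SIZE = 64                 # bytes
TEXT_SIZE = 140                    # bytes
RETENTION_DAYS = 365 * 5           # data is stored for 5 years


def estimate_storage():
    """Return (media bytes/day, id+text bytes/day, media bytes over 5 years)."""
    tweets_per_day = DAU * TWEETS_PER_USER_PER_DAY
    media_per_day = tweets_per_day * MEDIA_PERCENT // 100 * MEDIA_SIZE
    text_per_day = tweets_per_day * (TWEET_ID_SIZE + TEXT_SIZE)
    media_total = media_per_day * RETENTION_DAYS
    return media_per_day, text_per_day, media_total


media_per_day, text_per_day, media_total = estimate_storage()
print(f"Media per day: {media_per_day / TB:.0f} TB")
print(f"IDs + text per day: {text_per_day / GB:.1f} GB")
print(f"Media over 5 years: {media_total / PB:.2f} PB")
```

IDs and text add only about 61.2 GB per day against 30 TB of media, so media dominates; the 5-year total comes to 54.75 PB, rounded to ~55 PB above.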

Note

  • Always attach a unit to every number. A bare "5" is ambiguous; write 5 MB or 5 KB
  • Quantities you commonly estimate:
    • QPS
    • Peak QPS
    • Storage
    • Cache
    • Number of servers, etc.
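One way to make sure every byte estimate carries a unit is to format it through a small helper. The sketch below assumes decimal units (1 KB = 1000 bytes), consistent with the storage example; `human_bytes` is a hypothetical name, not an existing library function.

```python
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB"]


def human_bytes(num_bytes: float) -> str:
    """Format a byte count with a decimal (power-of-1000) unit, e.g. '30.0 TB'."""
    value = float(num_bytes)
    # Step up one unit per factor of 1000 until the value is below 1000.
    for unit in UNITS[:-1]:
        if abs(value) < 1000:
            return f"{value:.1f} {unit}"
        value /= 1000
    # Anything larger is reported in the largest unit.
    return f"{value:.1f} {UNITS[-1]}"


print(human_bytes(30 * 10**12))      # media per day
print(human_bytes(54_750 * 10**12))  # media over 5 years
```

Using powers of 1000 keeps the output consistent with round-number estimates; switch to 1024 if you need binary units (KiB, MiB).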