Design a News Feed System

The core of a news feed system is feed distribution, which takes data from the Post database and the User database and delivers feeds to users.
When distributing feeds, we can serve them from a cache instead of the database, since the news feed changes daily.
We use a pull model here, since workers can generate the news feed dynamically at read time. A push model can achieve real-time delivery, but it is expensive.
Step 1: Understand the problem
Some important questions:
- Question: Mobile app, web app, or both?
  - For a mobile app, we might need to handle notifications
  - Answer: Both
- Question: What are the important features?
  - We need to check which features the design must support
  - Answer: See and post feeds
- Question: Is the news feed sorted by score or something else?
  - Answer: No, just sort by post time
- Question: How many friends can a user have?
  - Answer: 5000
- Question: What's the traffic volume?
  - Answer: 10 million DAU
- Question: Can the feed contain images and videos, or just text?
  - Answer: It can contain media files, both images and videos
Step 2: Propose high-level design
We have two tasks:
- Feed publishing:
  - When a user publishes a post, the data is written to the cache and the database. The post is then delivered to the friends' news feeds.
- Feed building:
  - The news feed is built by aggregating friends' posts from newest to oldest.
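The feed building step above can be sketched as a k-way merge. This is a minimal illustration, assuming each friend's posts are already available as a list of (timestamp, post_id) tuples sorted newest first; the `build_feed` name, the sample data, and the `limit` parameter are hypothetical.

```python
import heapq
from itertools import islice

def build_feed(posts_by_friend, limit=20):
    """Merge per-friend post lists into one feed, newest to oldest.

    Assumes every list is already sorted newest first.
    """
    merged = heapq.merge(
        *posts_by_friend.values(),
        key=lambda post: post[0],  # sort on the timestamp
        reverse=True,              # inputs and output are newest first
    )
    return [post_id for _, post_id in islice(merged, limit)]

# Hypothetical sample data: friend -> [(timestamp, post_id), ...]
posts_by_friend = {
    "alice": [(105, "a2"), (100, "a1")],
    "bob": [(103, "b1")],
}
feed = build_feed(posts_by_friend)
```

Because the merge is lazy, only the first `limit` entries are pulled from the per-friend lists, which matters when a user has up to 5000 friends.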
News feed API
Post to the feed
POST /v1/me/feed
- content: text content of the post (request body)
- auth_token: auth token to authenticate API request (request header)
Retrieve the feed
GET /v1/me/feed
- auth_token: auth token to authenticate API request (request header)
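To make the two endpoints concrete, here is a toy in-memory handler. It is only a sketch and not a real web framework: the `FeedApi` class, the status codes, the response shapes, and the token store are all assumptions. The GET handler simply returns every stored post newest first, whereas the real system would return the friends' posts.

```python
from dataclasses import dataclass, field

@dataclass
class FeedApi:
    """Toy stand-in for POST and GET /v1/me/feed."""
    tokens: dict                      # auth_token -> user_id (hypothetical auth store)
    posts: list = field(default_factory=list)

    def handle(self, method, path, headers, body=None):
        # Authenticate using the auth_token request header.
        user_id = self.tokens.get(headers.get("auth_token"))
        if user_id is None:
            return 401, {"error": "invalid auth_token"}
        if path != "/v1/me/feed":
            return 404, {"error": "not found"}
        if method == "POST":
            content = (body or {}).get("content")
            if not content:
                return 400, {"error": "content is required"}
            self.posts.append({"user_id": user_id, "content": content})
            return 201, {"post_id": len(self.posts)}
        if method == "GET":
            # Simplification: return all posts, newest first.
            return 200, {"feed": list(reversed(self.posts))}
        return 405, {"error": "method not allowed"}
```

For example, `FeedApi(tokens={"t1": "u1"}).handle("POST", "/v1/me/feed", {"auth_token": "t1"}, {"content": "hi"})` returns a 201 status with the new post ID.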
Feed publishing
Components:
- Web server: routes requests to the different services (acts as the orchestrator). When someone posts, it needs to trigger three services at the same time. It also handles authentication and, if needed, rate limiting.
- Fanout service: distributes your post to your friends' feeds
- Post service: persists your post in the database
- Notification service: pushes a notification to your friends' phones that you published a new post
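The orchestration described above, where the web server triggers the three services at the same time, might look like the following sketch. The dictionaries and lists are hypothetical in-memory stand-ins for the real services, and the function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stores standing in for the real services.
post_db = {}        # Post service: post_id -> post
feed_cache = {}     # Fanout service: user_id -> post_ids, newest first
notifications = []  # Notification service: (friend_id, post_id) pairs

def persist_post(post):
    post_db[post["post_id"]] = post

def fan_out(post, friend_ids):
    for friend_id in friend_ids:
        feed_cache.setdefault(friend_id, []).insert(0, post["post_id"])

def notify(post, friend_ids):
    notifications.extend((friend_id, post["post_id"]) for friend_id in friend_ids)

def publish(post, friend_ids):
    """Web server step: trigger the three services concurrently and wait."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [
            pool.submit(persist_post, post),
            pool.submit(fan_out, post, friend_ids),
            pool.submit(notify, post, friend_ids),
        ]
        for future in futures:
            future.result()  # re-raise any error from a service call

publish({"post_id": "p1", "user_id": "u1", "content": "hello"}, ["u2", "u3"])
```

In a production system these calls would more likely go through message queues so a slow service does not block the request, but that is beyond what these notes specify.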
Feed building
Components:
- Web server: routes requests to the different services (orchestrator); also handles authentication and, if needed, rate limiting
- News feed service: microservice that handles news feed fetching
- News feed cache: stores the feed IDs needed to render the news feed. Note that each user has their own cache
Step 3: Design deep dive
Feed publishing
The web server itself handles both rate limiting and authentication. This is a simplified design; for a real design, refer to Design rate limiter, where this is split out into a middle tier.
Note: Submitting to the fanout service here is the push model: we push the post directly into each friend's feed. We do this because friends need the latest information, so we cannot rely on pull and must push.
However, when the user is a celebrity with a lot of friends (or followers), we can't push to everyone. For those users, we need a pull model instead.
Feed retrieving
We first get all the feed entries for the current user. Each entry is a <post_id, user_id> pair, where user_id is the friend who wrote the post.
For each entry, we can:
- Get more detail about the user (e.g. user name, profile picture) from the User cache
- Get more detail about the post (e.g. post content, pictures) from the Post cache
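The lookup step above can be sketched as a small hydration function. The field names (`name`, `avatar`, `content`) and the choice to skip entries missing from a cache are assumptions made for illustration; a real system would likely fall back to the database on a cache miss.

```python
def hydrate_feed(feed_entries, user_cache, post_cache):
    """Turn <post_id, user_id> pairs into renderable feed items."""
    items = []
    for post_id, user_id in feed_entries:
        post = post_cache.get(post_id)
        user = user_cache.get(user_id)
        if post is None or user is None:
            # Assumption: skip entries that are missing from either cache.
            continue
        items.append({
            "post_id": post_id,
            "author": user["name"],
            "avatar": user["avatar"],
            "content": post["content"],
        })
    return items

# Hypothetical sample data
user_cache = {"u2": {"name": "Bob", "avatar": "bob.png"}}
post_cache = {"p1": {"content": "hello"}}
items = hydrate_feed([("p1", "u2"), ("p9", "u2")], user_cache, post_cache)
```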
Hybrid Model with Push & Pull
The model above is the push model: we push feed content directly to the user's home feed, and the user just reads it. However, for a celebrity user (one with 10M+ followers), we can't push to 10 million news feeds. That is very costly.
The solution is to introduce a pull model. For celebrities only, we pull from the celebrity's own post list at read time and append it to the current feed instead. This requires us to maintain a user_post_cache[user_id].
Publishing
Here, for each post handled by the fanout service, we also add the post_id to user_post_cache[user_id].
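A minimal sketch of hybrid publishing follows. The `publish_post` function and the follower-count cutoff are hypothetical; the notes only say that a celebrity has 10M+ followers, so the default threshold below mirrors that figure and can be lowered for experimentation.

```python
user_post_cache = {}  # author_id -> post_ids, newest first
news_feed_cache = {}  # follower_id -> post_ids, newest first

def publish_post(author_id, post_id, follower_ids, threshold=10_000_000):
    """Record the post, then push it only for non-celebrity authors."""
    # Every post is recorded in the author's own post list.
    user_post_cache.setdefault(author_id, []).insert(0, post_id)
    if len(follower_ids) >= threshold:
        return "pull"  # celebrity: followers fetch this post at read time
    for follower_id in follower_ids:
        news_feed_cache.setdefault(follower_id, []).insert(0, post_id)
    return "push"
```

The return value is only there to make the chosen path visible in this sketch.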
Retrieving
On retrieval, we fetch the posts of the celebrities the user follows and merge them with the news feed cache.
- NOTE: It makes no sense to write the merged result back to the news feed cache: that is the same as fanning out the post to 10M followers, just triggered on read. We also always need to refetch the celebrity post cache anyway.
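The read path can be sketched as below, assuming every cache entry is a (timestamp, post_id) tuple stored newest first. The `read_feed` function and the `followed_celebrities` mapping are hypothetical names. In line with the note above, the merged result is returned to the caller and never written back to the news feed cache.

```python
import heapq
from itertools import islice

def read_feed(user_id, news_feed_cache, user_post_cache, followed_celebrities, limit=20):
    """Merge the pushed feed with celebrity posts pulled at read time."""
    pushed = news_feed_cache.get(user_id, [])
    pulled = [
        user_post_cache.get(celebrity_id, [])
        for celebrity_id in followed_celebrities.get(user_id, [])
    ]
    merged = heapq.merge(pushed, *pulled, key=lambda entry: entry[0], reverse=True)
    # The merged view is not stored: the caches stay unchanged.
    return [post_id for _, post_id in islice(merged, limit)]

# Hypothetical sample data
news_feed_cache = {"u1": [(5, "p5"), (1, "p1")]}
user_post_cache = {"star": [(7, "s7"), (3, "s3")]}
followed_celebrities = {"u1": ["star"]}
result = read_feed("u1", news_feed_cache, user_post_cache, followed_celebrities)
```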
Cache architecture
- News feed: stores the ID of each feed item
- Content:
  - Hot cache: popular content
    - We separate this from the main cache to avoid the hot key problem
    - We can spend more RAM and computing power on the hot cache
    - Most hits go to the hot cache, so the normal cache is protected
  - Normal cache: stores post data
- Social graph: stores user relationship data (followers and following)
- Actions: liked, replied, others
- Counters: like counter, reply counter
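The hot and normal content caches can be illustrated with a two-tier lookup. The notes do not say how content becomes "popular", so promotion by read count is purely an assumption here, as are the class and parameter names.

```python
class TieredContentCache:
    """Two-tier content cache: hot items are served before the normal cache."""

    def __init__(self, hot_threshold=3):
        self.hot = {}      # popular content
        self.normal = {}   # all other post data
        self.hits = {}     # post_id -> number of reads served by the normal cache
        self.hot_threshold = hot_threshold  # hypothetical promotion rule

    def put(self, post_id, post):
        self.normal[post_id] = post

    def get(self, post_id):
        if post_id in self.hot:
            return self.hot[post_id]  # hot reads never touch the normal cache
        post = self.normal.get(post_id)
        if post is None:
            return None
        self.hits[post_id] = self.hits.get(post_id, 0) + 1
        if self.hits[post_id] >= self.hot_threshold:
            self.hot[post_id] = post  # promote: later reads are served from hot
        return post
```

Once a post is promoted, all further reads for that key are absorbed by the hot tier, which is the protection for the normal cache described above.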
For caching, we can use cache-aside and write-through, preferably write-through with invalidation.
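The strategies named above can be sketched on a toy store. The `PostStore` class is hypothetical, and "write-through with invalidation" is interpreted here as writing to the database and then dropping the cached entry so the next read reloads it; that reading is an assumption.

```python
class PostStore:
    """Toy store showing cache-aside reads and two write strategies."""

    def __init__(self):
        self.db = {}
        self.cache = {}
        self.db_reads = 0  # counts cache misses that reached the database

    def read(self, post_id):
        """Cache-aside: check the cache, fall back to the DB, then populate."""
        if post_id in self.cache:
            return self.cache[post_id]
        self.db_reads += 1
        post = self.db.get(post_id)
        if post is not None:
            self.cache[post_id] = post
        return post

    def write_through(self, post_id, post):
        """Write-through: update the database and the cache together."""
        self.db[post_id] = post
        self.cache[post_id] = post

    def write_with_invalidation(self, post_id, post):
        """Assumed variant: update the database, then drop the cached entry."""
        self.db[post_id] = post
        self.cache.pop(post_id, None)
```

Invalidating instead of updating avoids caching values that are never read again, at the cost of one extra database read on the next access.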