Design A Notification System

Step 1 — Understand the problem

Candidate: What types of notifications does the system support?
Interviewer: Push notification, SMS message, and email.

Candidate: Is it a real-time system?
Interviewer: Let us say it is a soft real-time system. We want a user to receive notifications as soon as possible. However, if the system is under a high workload, a slight delay is acceptable.

Candidate: What are the supported devices?
Interviewer: iOS devices, Android devices, and laptop/desktop.

Candidate: What triggers notifications?
Interviewer: Notifications can be triggered by client applications. They can also be scheduled on the server-side.

Candidate: Will users be able to opt out?
Interviewer: Yes, users who choose to opt out will no longer receive notifications.

Candidate: How many notifications are sent out each day?
Interviewer: 10 million mobile push notifications, 1 million SMS messages, and 5 million emails.

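So we have the following requirements:

  • Near real time (soft real-time; a slight delay under high load is acceptable)
  • Push notification, SMS, email
  • iOS, Android, laptop, desktop
  • Notifications triggered by client applications or scheduled on the server
  • Opt-out option
  • 10 million push notifications, 1 million SMS messages, and 5 million emails per day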
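
As a rough back-of-the-envelope check, the daily volumes above can be turned into average per-second rates. The sketch below assumes traffic is spread evenly over the day, which real traffic is not, so peak rates will be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Daily volumes stated by the interviewer.
daily_volume = {
    "push": 10_000_000,
    "sms": 1_000_000,
    "email": 5_000_000,
}


def average_rate_per_second(per_day: int) -> float:
    """Average messages per second, assuming an even spread over the day."""
    return per_day / SECONDS_PER_DAY


for channel, per_day in daily_volume.items():
    print(f"{channel}: ~{average_rate_per_second(per_day):.0f} per second")
```

This works out to roughly 116 push notifications, 12 SMS messages, and 58 emails per second on average.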

Step 2 – Propose a high-level solution

We first consider how each type of notification is delivered.

Pasted image 20230714081043.png

We also want to consider alternative third-party services, because the default service is not available in some countries. For example, we cannot use Firebase Cloud Messaging (FCM) in China.

Contact info gathering flow

To send notifications, we need:

  • Device tokens
  • Phone number
  • Email address

Therefore, when a customer signs up for our app, we need to store this contact info in the database.

Pasted image 20230714081319.png

A user can have multiple devices, so a push notification can be sent to all of the user's devices.

Pasted image 20230714081352.png
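
The relationship between a user and their devices can be sketched as a simple one-to-many data model. The class and field names below (`User`, `Device`, `platform`, `push_targets`) are hypothetical and only illustrate the idea; the actual schema is the one shown in the figures.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Device:
    device_token: str  # token issued by the push provider for this device
    platform: str      # assumed values: "ios" or "android"


@dataclass
class User:
    user_id: int
    email: Optional[str] = None
    phone_number: Optional[str] = None
    # One user can own several devices (one-to-many).
    devices: List[Device] = field(default_factory=list)


def push_targets(user: User) -> List[str]:
    """Return every device token a push notification should be sent to."""
    return [device.device_token for device in user.devices]


user = User(1, "ana@example.com", "+15550100",
            [Device("token-a", "ios"), Device("token-b", "android")])
print(push_targets(user))
```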

Notification sending / receiving flow

Pasted image 20230714081436.png

  • Service: Could be a microservice, a cron job, etc.
  • Notification system: For simplicity, we assume there is only one notification server. It pulls user data from the database and talks directly to APNS, FCM, …
  • iOS, Android, SMS, Email: The endpoints where users receive notifications
  • Third-party service: Needed when the default service is not available in some countries, as explained above.

Problems:

  • Single point of failure
    • A single notification server means a single point of failure
  • Hard to scale
    • One notification system handles every type of notification, so the components cannot be scaled independently
  • Performance bottleneck
    • We only have one server (the notification system) to process and send notifications. Long-running tasks, such as constructing HTML pages or waiting for responses from third parties, can overload the system

High-level design (improved)

Pasted image 20230714081959.png

  • Service 1 - N: Microservices or cron jobs
  • Notification servers:
    • Provide basic APIs for services to send notifications. These APIs are only accessible internally, to prevent spam and to control the types of notifications we send.
      • For example http://api.example.com/v/sms/send
    • Carry out basic validations, e.g. of emails, phone numbers, …
    • Query the database or cache to fetch the data needed to render a notification
    • Put the notification in a queue for parallel processing
  • Cache: User info, device info, and notification templates are cached
  • DB: Stores data about users and notification settings
  • Message queues:
    • Act as a buffer for high volumes.
    • Each notification type is assigned its own queue, so that an outage in one does not affect the others
  • Workers: Process the events from the message queues
  • Third-party services
  • iOS, Android, SMS, Email
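
The idea of one queue per notification type can be sketched as follows. This is a minimal illustration, not a production design: Python's in-process `queue.Queue` stands in for a real message broker, and the names `CHANNELS`, `enqueue`, and `drain` are hypothetical.

```python
import queue

# One independent queue per notification type, so a backlog or outage
# in one channel does not block the others.
CHANNELS = ("ios", "android", "sms", "email")
queues = {channel: queue.Queue() for channel in CHANNELS}


def enqueue(notification: dict) -> None:
    """Notification server side: route a notification to its channel queue."""
    channel = notification["channel"]
    if channel not in queues:
        raise ValueError(f"unsupported channel: {channel}")
    queues[channel].put(notification)


def drain(channel: str, send) -> int:
    """Worker side: send everything currently queued for one channel."""
    sent = 0
    q = queues[channel]
    while not q.empty():
        send(q.get())
        sent += 1
    return sent


enqueue({"channel": "sms", "to": "+15550100", "body": "Your code is 1234"})
enqueue({"channel": "email", "to": "ana@example.com", "body": "Welcome"})
print(drain("sms", send=print))  # only the SMS queue is processed here
```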

Step 3 – Design deep dive

Reliability

One of the requirements for the notification system is that it cannot lose data. Notifications can usually be delayed or reordered, but never lost.

Therefore, we need to persist the notification data in the database. To do this, we implement a notification log.

Pasted image 20230714083153.png
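
A notification log can be sketched as a table that records each notification before it is sent, so that unsent entries can be found again after a failure. The example below uses an in-memory SQLite database purely for illustration; the table name, columns, and status values are assumptions, not the schema from the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a real system would use a durable database
conn.execute(
    """CREATE TABLE notification_log (
           event_id TEXT PRIMARY KEY,
           channel  TEXT NOT NULL,
           payload  TEXT NOT NULL,
           status   TEXT NOT NULL
       )"""
)


def log_notification(event_id: str, channel: str, payload: str) -> None:
    """Persist the notification before any attempt to send it."""
    conn.execute(
        "INSERT INTO notification_log VALUES (?, ?, ?, 'pending')",
        (event_id, channel, payload),
    )
    conn.commit()


def mark_sent(event_id: str) -> None:
    conn.execute(
        "UPDATE notification_log SET status = 'sent' WHERE event_id = ?",
        (event_id,),
    )
    conn.commit()


def pending() -> list:
    """Event IDs that were logged but never confirmed as sent."""
    rows = conn.execute(
        "SELECT event_id FROM notification_log "
        "WHERE status = 'pending' ORDER BY event_id"
    )
    return [row[0] for row in rows]
```

After a crash, the entries returned by `pending()` are the ones that would need to be sent again.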

Deduplication

Recipients are not guaranteed to receive a notification exactly once. Duplicates can occur for various reasons:

  • Network partition
  • Failure to acknowledge
  • Workers not processing fast enough, so the message goes back into the queue.

Therefore, before sending a message, we first use its eventId to check whether it has already been sent.
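
The eventId check can be sketched as below. The in-memory set is an assumption made to keep the example self-contained; with multiple workers, the seen IDs would have to live in a store shared by all of them. The class name `Deduplicator` is hypothetical.

```python
class Deduplicator:
    """Remembers which event IDs have already been handled."""

    def __init__(self) -> None:
        self._seen = set()  # stand-in for a shared store of sent event IDs

    def should_send(self, event_id: str) -> bool:
        """Return True the first time an event ID is seen, False afterwards."""
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        return True


dedup = Deduplicator()
for event_id in ["evt-1", "evt-2", "evt-1"]:  # "evt-1" is redelivered
    if dedup.should_send(event_id):
        print("sending", event_id)
    else:
        print("skipping duplicate", event_id)
```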

Additional considerations

Message template

To avoid building every notification from scratch, we can use a message template and fill in its parameters.
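
A minimal sketch of this idea, using `string.Template` from the Python standard library. The template ID, the wording, and the parameter names are made up for illustration.

```python
from string import Template

# Hypothetical template store; in the design above, templates would be cached.
TEMPLATES = {
    "order_shipped": Template("Hi $name, your order $order_id has shipped."),
}


def render(template_id: str, **params: str) -> str:
    """Fill in a stored template; raises KeyError if a parameter is missing."""
    return TEMPLATES[template_id].substitute(**params)


print(render("order_shipped", name="Ana", order_id="A123"))
```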

Notification setting

We want to let users opt in to or out of notifications. Therefore, it is helpful to have the following fields in the database:

user_id bigInt
channel varChar
opt_in boolean
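
Before sending, the system can look up these fields to decide whether the user wants the notification. The sketch below keeps the settings in a dictionary keyed by `(user_id, channel)`. Treating a missing row as opted in is an assumption; the opposite default may be the right choice depending on the product.

```python
# Stand-in for the settings table: (user_id, channel) -> opt_in
settings = {
    (1, "email"): True,
    (1, "sms"): False,
}


def is_opted_in(user_id: int, channel: str, default: bool = True) -> bool:
    """Check the user's setting; `default` applies when no row exists."""
    return settings.get((user_id, channel), default)


for channel in ("email", "sms", "push"):
    print(channel, is_opted_in(1, channel))
```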

Rate limiting

We should limit the number of notifications a user can receive, to avoid overwhelming them. This is important because if we send too many notifications, the user may turn our notifications off completely.
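
One way to sketch a per-user limit is a sliding window that counts recent sends. The class name, the limit, and the window size below are illustrative assumptions, and the state is kept in memory only.

```python
import time
from collections import defaultdict, deque


class UserRateLimiter:
    """Allows at most `max_per_window` notifications per user per window."""

    def __init__(self, max_per_window: int, window_seconds: float) -> None:
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self._sent_at = defaultdict(deque)  # user_id -> timestamps of recent sends

    def allow(self, user_id: int, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        sent = self._sent_at[user_id]
        # Drop timestamps that have fallen out of the window.
        while sent and now - sent[0] >= self.window_seconds:
            sent.popleft()
        if len(sent) >= self.max_per_window:
            return False
        sent.append(now)
        return True


limiter = UserRateLimiter(max_per_window=2, window_seconds=60)
print([limiter.allow(1, now=t) for t in (0, 1, 2)])  # third send is rejected
```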

Retry mechanism

When a third-party service fails to send a notification, the notification is added back to the message queue to be retried.
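
The retry step can be sketched as a worker that puts a failed notification back on its queue. The attempt counter, the `MAX_ATTEMPTS` cap, and the `dead_letters` list for notifications that keep failing are assumptions added so the loop cannot retry forever; `queue.Queue` again stands in for a real message queue.

```python
import queue

MAX_ATTEMPTS = 3  # assumed cap on delivery attempts


def process_once(q: queue.Queue, send, dead_letters: list) -> bool:
    """Take one notification and try to send it; requeue it on failure."""
    notification = q.get()
    try:
        send(notification)
        return True
    except Exception:
        notification["attempts"] = notification.get("attempts", 0) + 1
        if notification["attempts"] < MAX_ATTEMPTS:
            q.put(notification)  # back into the queue for another try
        else:
            dead_letters.append(notification)  # give up and keep for inspection
        return False
```

A real worker would usually also wait between attempts, for example with an increasing delay, so that a struggling provider is not hit again immediately.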

Security

We can use appSecret and appKey to verify if the client is genuine.
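
The notes do not say how the appKey and appSecret pair is checked. One common approach, shown here as an assumption, is for the client to sign each request body with its appSecret using HMAC, and for the server to recompute the signature for the given appKey. The key and secret values below are placeholders.

```python
import hashlib
import hmac

# Hypothetical registry: appKey -> appSecret (known to the server and that client only).
APP_SECRETS = {"demo-app-key": b"demo-app-secret"}


def sign(app_secret: bytes, body: bytes) -> str:
    """HMAC-SHA256 signature of the request body."""
    return hmac.new(app_secret, body, hashlib.sha256).hexdigest()


def is_genuine(app_key: str, body: bytes, signature: str) -> bool:
    """Verify that the request was signed with the secret for this appKey."""
    secret = APP_SECRETS.get(app_key)
    if secret is None:
        return False
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(sign(secret, body), signature)


body = b'{"channel": "sms", "to": "+15550100"}'
signature = sign(b"demo-app-secret", body)  # computed on the client side
print(is_genuine("demo-app-key", body, signature))
```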

Monitor queued notification

We can monitor the number of queued notifications. If the number is too large, the workers cannot send notifications fast enough, and we need to increase the number of workers.
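
As a rough illustration, the queue depth can be turned into a worker count. The function below is a simplified sizing rule under assumed inputs (a fixed per-worker send rate and a target time to drain the backlog); it ignores new messages arriving while the backlog drains.

```python
import math


def workers_needed(queued: int, per_worker_rate: float,
                   target_drain_seconds: float) -> int:
    """Workers required to drain `queued` messages within the target time.

    per_worker_rate is the assumed number of messages one worker sends per second.
    """
    capacity_per_worker = per_worker_rate * target_drain_seconds
    return max(1, math.ceil(queued / capacity_per_worker))


# 12,000 queued messages, 50 messages/s per worker, drain within 60 s.
print(workers_needed(12_000, per_worker_rate=50, target_drain_seconds=60))
```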

Events tracking

For analytics purposes, we want to keep track of how many notifications have been sent and how many clicks / unsubscribes.

Updated design

Pasted image 20230714083940.png

Step 4 — Wrap up

Discuss any additional features that are needed.