Design YouTube

Quick notes

Video is streamed from the CDN to the client using HLS
Consider using Alexu System Design Interview/14. Design youtube/DAG Scheduler Pattern instead of Alexu System Design Interview/1. Scale from zero to millions of users/Message queue to have more control over the worker group
Videos are served incrementally via the CDN

High-level design overview

We basically follow this concept

Pasted image 20231102202610.png

This is because streaming video transfers a large amount of data, so caching is very important. As a result, we can use a CDN to save cost and deliver better performance.

General flow

Pasted image 20231102211622.png

It's important to keep in mind that the user uploads the video in a raw format. It needs to be encoded before it can be served by the streaming website.

Therefore we need batch job processing.

The user uploads the video metadata through our normal REST service, which stores it in a regular DB.
After the user has finished uploading the metadata, the metadata server returns a URL for uploading the raw video to the Raw video blob storage. (See more: The right way to UPLOAD a file using REST > 2. Upload the metadata first and then upload the file)
After the video is uploaded to the blob storage, the transcoding server can pick it up and add it to the transcoding queue. Since transcoding is heavy, we adopt a worker pattern.
- To know when a new file has been uploaded to our blob storage, we can use a notification system. For example, S3 can send event notifications to an SQS queue.
The transcoding server receives the status after all the workers have finished and uploads the transcoded video to the Transcoded video blob storage.
After that, the video is cached in the CDN.
The user can then watch the video from the CDN (CDN Consideration)
- The reason we use a CDN here is to give users a better connection by serving them from a nearby edge server.
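
As a rough illustration of the notification step in the flow above, the sketch below parses an S3 event notification as it might arrive in an SQS message body and turns it into transcoding jobs. The record layout (`Records[].s3.bucket.name`, `Records[].s3.object.key`) follows the S3 event message structure; the bucket name, object keys, and the `TranscodeJob` type are made up for the example.

```python
import json
from dataclasses import dataclass
from urllib.parse import unquote_plus


@dataclass(frozen=True)
class TranscodeJob:
    """Hypothetical job description handed to the transcoding queue."""
    bucket: str
    key: str


def jobs_from_s3_event(message_body: str) -> list[TranscodeJob]:
    """Turn one S3 event notification (JSON text) into transcoding jobs."""
    event = json.loads(message_body)
    jobs = []
    for record in event.get("Records", []):
        # Only react to newly created objects, e.g. "ObjectCreated:Put".
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        s3 = record["s3"]
        # Object keys are URL-encoded in the notification payload.
        key = unquote_plus(s3["object"]["key"])
        jobs.append(TranscodeJob(bucket=s3["bucket"]["name"], key=key))
    return jobs


# Sample payload with one upload and one deletion (illustrative values only).
sample_body = json.dumps({
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "raw-videos"},
                "object": {"key": "uploads/my+holiday.mov"},
            },
        },
        {
            "eventName": "ObjectRemoved:Delete",
            "s3": {
                "bucket": {"name": "raw-videos"},
                "object": {"key": "uploads/old.mov"},
            },
        },
    ]
})

jobs = jobs_from_s3_event(sample_body)
```

Only the `ObjectCreated` record produces a job; the deletion is ignored.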

Popular streaming protocol

Alexu System Design Interview/14. Design youtube/DASH
Apple HLS (good pick for iOS compatibility)
Microsoft Smooth Streaming
Adobe HTTP Dynamic Streaming (HDS)
RTSP (low-latency streaming)
RTMP (high quality streaming)

Design deep dive

Transcoding

Transcoding is needed to ensure the videos are compatible across devices and browsers.
We also want to transcode each video into different resolutions:
- Users with high network bandwidth are served a higher resolution
- Users with poor network bandwidth are served a lower resolution
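
The resolution choice above can be sketched as a lookup against a bitrate ladder. This is a minimal sketch: the ladder values and the `pick_rendition` name are assumptions for illustration, and in practice an HLS or DASH player makes this decision on the client from measured throughput.

```python
# Illustrative bitrate ladder: (label, minimum bandwidth in kbit/s).
# The numbers are assumptions for the example, not recommended values.
RENDITIONS = [
    ("1080p", 5000),
    ("720p", 2500),
    ("480p", 1000),
    ("360p", 500),
]


def pick_rendition(bandwidth_kbps: float) -> str:
    """Return the highest rendition whose bandwidth requirement is met."""
    for label, required_kbps in RENDITIONS:  # ordered from highest to lowest
        if bandwidth_kbps >= required_kbps:
            return label
    # Fall back to the lowest rendition on very poor connections.
    return RENDITIONS[-1][0]
```

For example, `pick_rendition(1200)` selects `"480p"`, while a client measuring only 100 kbit/s still gets the lowest rendition rather than nothing.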

A video file normally has 2 parts:

Container:
- Holds the video stream, audio stream, and metadata
- Identified by the file extension, e.g. .mov, .mp4, …
Codecs:
- Compression and decompression algorithms that reduce the video size while preserving quality
- Example: H.264, VP9, HEVC

Taking the above considerations, including transcoding, into account, let's revise our design as follows:

Pasted image 20231107163620.png

Pre-processing

Processing a video involves many tasks, for example:

Generating the thumbnail
Video transcoding
Watermarking
Audio encoding
Metadata processing, etc.

It's more efficient to process the video chunk by chunk. As a result, our pre-processor will:

Split the video into smaller chunks
Create the DAG config files
Cache the finished parts so that a retry does not have to redo them
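
A minimal sketch of the splitting and caching steps above, assuming chunks are described as time ranges in milliseconds. The function names are hypothetical, and a real splitter would cut on keyframe (GOP) boundaries rather than at fixed offsets.

```python
def split_into_chunks(duration_ms: int, chunk_ms: int) -> list[tuple[int, int]]:
    """Return (start, end) time ranges that cover the whole video."""
    if duration_ms <= 0 or chunk_ms <= 0:
        raise ValueError("duration and chunk length must be positive")
    return [
        (start, min(start + chunk_ms, duration_ms))
        for start in range(0, duration_ms, chunk_ms)
    ]


def chunks_to_process(all_chunks, finished):
    """Skip chunks whose output is already cached from an earlier attempt."""
    return [chunk for chunk in all_chunks if chunk not in finished]


chunks = split_into_chunks(25_000, 10_000)
# Pretend the first chunk was already processed before a failure.
remaining = chunks_to_process(chunks, finished={(0, 10_000)})
```

On a retry, only the chunks in `remaining` need to be processed again.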

DAG Scheduler

Takes the list of tasks described in the DAG config file and prioritises them.

Pasted image 20231107180127.png
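
One way to picture the scheduler is as a topological sort that groups tasks into stages. The sketch below uses Python's standard `graphlib` module; the task names and dependencies in `dag_config` are assumptions made up for the example, not the actual config format.

```python
from graphlib import TopologicalSorter

# Hypothetical DAG config: each task maps to the set of tasks it depends on.
dag_config = {
    "split": set(),
    "video_encoding": {"split"},
    "thumbnail": {"split"},
    "audio_encoding": {"split"},
    "watermark": {"video_encoding"},
    "merge": {"watermark", "audio_encoding"},
}


def schedule_stages(dag: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into stages; tasks in one stage can run in parallel."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the config has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = schedule_stages(dag_config)
```

Each inner list is one stage: every task in it has all of its dependencies finished, so the tasks of a stage can be pushed to the task queue together.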

Resource Manager

Pulls tasks from the Task Queue and workers from the Worker Queue, then selects the most suitable worker for each task.

This makes sure that we're not wasting any resources and that all the workers are kept as busy as possible.

Pasted image 20231107180926.png
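
The matching step can be sketched with two priority queues. This is only an illustration under assumptions: "most suitable" is taken to mean the least-loaded worker that supports the task's kind, and the `Task`, `Worker`, and `assign` names are hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Task:
    priority: int                            # lower number = more urgent
    name: str = field(compare=False)
    kind: str = field(compare=False)         # e.g. "encode", "thumbnail"


@dataclass(order=True)
class Worker:
    load: float                              # current utilisation, 0.0 to 1.0
    name: str = field(compare=False)
    kinds: frozenset = field(compare=False)  # task kinds this worker can run


def assign(task_queue: list, worker_queue: list):
    """Pop the most urgent task and the least-loaded worker that can run it."""
    if not task_queue:
        return None
    task = heapq.heappop(task_queue)
    skipped, chosen = [], None
    while worker_queue:
        worker = heapq.heappop(worker_queue)
        if task.kind in worker.kinds:
            chosen = worker
            break
        skipped.append(worker)
    # Put back the workers that could not take this task.
    for worker in skipped:
        heapq.heappush(worker_queue, worker)
    if chosen is None:
        heapq.heappush(task_queue, task)     # no suitable worker yet; requeue
        return None
    return task.name, chosen.name


tasks = [
    Task(2, "thumb-chunk-1", "thumbnail"),
    Task(1, "encode-chunk-1", "encode"),
]
workers = [
    Worker(0.1, "cpu-1", frozenset({"thumbnail"})),
    Worker(0.5, "gpu-1", frozenset({"encode"})),
]
heapq.heapify(tasks)
heapq.heapify(workers)
```

Calling `assign(tasks, workers)` repeatedly hands out the encoding task first because it has the higher priority, and skips `cpu-1` for it because that worker only handles thumbnails.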

When all the jobs are finished, the worker responsible for merging the chunks uploads the result to the Transcoded video blob storage.

Optimisation

Parallel video uploading

We can consider chunked uploading where possible, so that a client uploads multiple chunks to the Raw video blob storage in parallel instead of one big file.

This also enables resumable uploads: after a failure, only the missing chunks need to be sent again.

Pasted image 20231107181304.png
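
A minimal sketch of the bookkeeping behind a resumable upload, assuming the server tracks which chunk indexes it has received. `UploadSession` and its methods are hypothetical names for illustration.

```python
import math


class UploadSession:
    """Tracks which chunks of one file the server has already received."""

    def __init__(self, file_size: int, chunk_size: int):
        self.chunk_size = chunk_size
        self.total_chunks = math.ceil(file_size / chunk_size)
        self.received = set()

    def mark_received(self, index: int) -> None:
        if not 0 <= index < self.total_chunks:
            raise ValueError(f"chunk index {index} out of range")
        self.received.add(index)  # idempotent: re-sending a chunk is harmless

    def missing_chunks(self) -> list[int]:
        """Chunks the client still has to send, e.g. after a dropped connection."""
        return [i for i in range(self.total_chunks) if i not in self.received]

    def is_complete(self) -> bool:
        return len(self.received) == self.total_chunks


# A file slightly over 10 MiB split into 4 MiB chunks needs 3 chunks.
session = UploadSession(file_size=10 * 1024 * 1024 + 1, chunk_size=4 * 1024 * 1024)
session.mark_received(0)
session.mark_received(2)
```

After a reconnect the client asks for `session.missing_chunks()` and re-sends only those chunks.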

Speed optimisation

Use CDN locations close to the user as upload centers to improve the upload speed

Parallel system everywhere

We can parallelise our system chunk by chunk, for example:

Pasted image 20231107181609.png

As soon as a chunk has been uploaded, we can start processing it immediately instead of waiting for the whole video to be uploaded first.
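
This overlap can be sketched with a thread pool that receives each chunk as soon as it arrives. `transcode_chunk` is a placeholder for the real work, and the function names are assumptions for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable


def transcode_chunk(index: int, data: bytes) -> tuple[int, int]:
    """Placeholder for real transcoding; returns (chunk index, output size)."""
    return index, len(data)


def process_while_uploading(incoming: Iterable[bytes], max_workers: int = 4):
    """Submit each chunk for processing as soon as it arrives."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(transcode_chunk, index, data)
            for index, data in enumerate(incoming)  # incoming may be a lazy stream
        ]
        # Results are collected in chunk order, ready for merging.
        return [future.result() for future in futures]


results = process_while_uploading(iter([b"aa", b"bbb", b"c"]))
```

Because `incoming` is consumed lazily, earlier chunks are already being processed while later ones are still arriving.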

Pre-signed upload URL

If we follow The right way to UPLOAD a file using REST > 2. Upload the metadata first and then upload the file, we can generate a pre-signed upload URL so that the client can talk to AWS S3 directly and securely, without ever holding our credentials.

Pasted image 20231107181901.png
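
To show the idea behind a pre-signed URL, here is a simplified HMAC-based sketch using only the standard library. It is not the AWS signing algorithm: the query parameter names, the secret, and the host are assumptions for illustration. With S3, the SDK would generate the URL instead (for example boto3's `generate_presigned_url`).

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"demo-secret"  # hypothetical; a real key would never live in code


def _sign(method: str, path: str, expires_at: int) -> str:
    message = f"{method}\n{path}\n{expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def presign_upload_url(base_url: str, path: str, expires_at: int) -> str:
    """Build a URL that allows a PUT to `path` until `expires_at` (Unix time)."""
    query = urlencode({
        "expires": expires_at,
        "signature": _sign("PUT", path, expires_at),
    })
    return f"{base_url}{path}?{query}"


def verify_upload_url(url: str, now: int) -> bool:
    """Check on the storage side that the URL is untampered and not expired."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    expected = _sign("PUT", parsed.path, expires_at)
    return now <= expires_at and hmac.compare_digest(signature, expected)


url = presign_upload_url(
    "https://storage.example.com", "/raw/video-123.mov", expires_at=1_700_000_600
)
```

The client only ever sees the signed URL. Changing the path or using the URL after it expires makes verification fail, which is what limits the client to this one upload.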

Other optimisation

DRM (Digital rights management) protection
Cache only popular videos in the CDN; serve other videos from our own servers (cost-saving)
Retry when an error occurs (reschedule the DAG, retry transcoding, regenerate the DAG diagram, …)
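
The retry item above could look like the following sketch, which assumes exponential backoff between attempts. The `retry` helper and its parameters are hypothetical; the `sleep` argument exists only so the delay can be replaced in tests.

```python
import time


def retry(operation, max_attempts: int = 3, base_delay: float = 0.5, sleep=time.sleep):
    """Run `operation`, retrying on failure with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            # Wait 0.5s, 1s, 2s, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))


# Example: a transcoding call that fails twice before succeeding.
calls = {"count": 0}


def flaky_transcode():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("worker crashed")
    return "transcoded"


delays = []
result = retry(flaky_transcode, sleep=delays.append)
```

Here the operation succeeds on the third attempt, after two recorded backoff delays.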
