Design Rate Limiter

Why do we use rate limiting for APIs?

  • Mitigate denial-of-service (DoS/DDoS) attacks
  • Reduce cost
  • Prevent server overload

Designing a rate limiter step by step

Step 1. Establish design scope

  • Server-side or client-side?
  • Which properties does the rate limiter throttle on?
    • User ID
    • IP address, ...
  • Is the rate limiter a separate service, or is it implemented in the application code?
    • Note: if the interviewer says it's up to you, ask whether the rate limiter needs to be fault tolerant.
      • If so, it's better to separate the rate limiter from the API servers, so that a problem with an API server doesn't affect the rate limiter.

Step 2. Propose high-level design

Where to put our rate limiter?

  • Client-side: client requests are easy to forge, so a client-side limit is easy to bypass. We also don't always have control over the client implementation, so this is not recommended.
  • Server-side:
    • Rate limiter at the API server. Pasted image 20230303162506.png
    • Rate limiter as middleware. Pasted image 20230303162528.png
      • This is the more recommended option, because in the future we can extend this service to do SSL (Secure Sockets Layer) termination, IP whitelisting, serving static content, ...

Which algorithm should we use?

High level architecture

In a nutshell, we need to store counters that track the number of requests, so that we can decide whether to reject a request. A disk-based database is too slow and inefficient for this, so an in-memory store is a better fit.

Redis is a popular choice for implementing rate limiting, using two of its commands: INCR (increase the counter by 1) and EXPIRE (set a timeout after which the counter is deleted).

Pasted image 20230322104452.png

As a result, the rate limiter middleware keeps its counters in Redis and checks them there on each request, which is fast enough to sit in the request path.
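The INCR and EXPIRE flow above can be sketched as a fixed-window counter. This is a minimal sketch, not a definitive implementation: `InMemoryCounterStore` is a hypothetical in-process stand-in for Redis (so the example runs without a server), and `allow_request`, the `rate:` key prefix, and the parameters are assumptions made for illustration.

```python
import time


class InMemoryCounterStore:
    """Hypothetical stand-in for Redis with INCR- and EXPIRE-like calls."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> [count, expires_at or None]

    def _drop_if_expired(self, key):
        entry = self._data.get(key)
        if entry and entry[1] is not None and self._clock() >= entry[1]:
            del self._data[key]

    def incr(self, key):
        """Increase the counter by 1, starting from 0 if it does not exist."""
        self._drop_if_expired(key)
        entry = self._data.setdefault(key, [0, None])
        entry[0] += 1
        return entry[0]

    def expire(self, key, seconds):
        """Make the counter disappear after `seconds`."""
        if key in self._data:
            self._data[key][1] = self._clock() + seconds


def allow_request(store, client_id, limit, window_seconds):
    """Fixed-window check: count the request, reject once over the limit."""
    key = f"rate:{client_id}"
    count = store.incr(key)
    if count == 1:
        # The first request in a window starts the countdown for that window.
        store.expire(key, window_seconds)
    return count <= limit
```

With a real Redis server, the INCR and the EXPIRE would be two separate commands, so they would normally be grouped (for example in a transaction or script) to avoid leaving a counter without a timeout.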

Step 3. Design deep dive

Rate limiting rules

We can use YAML rules like the following. This example limits the number of marketing messages to 5 per day.

domain: messaging
descriptor: 
- key: message_type
  value: marketing
  rate_limit:
    unit: day
    requests_per_unit: 5

Or, in this example, we allow only 5 login authentication requests per minute.

domain: auth
descriptor: 
- key: auth_type
  value: login
  rate_limit:
    unit: minute
    requests_per_unit: 5
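To show how a limiter might consume such rules, here is a minimal sketch that looks up the limit for a request. The rules are written directly as Python dictionaries that mirror the two examples above, so that no YAML parser is needed; `RULES`, `UNIT_SECONDS`, and `find_limit` are hypothetical names, and the flattened rule layout is an assumption rather than a fixed format.

```python
# Rules mirroring the two examples above, flattened for simplicity.
RULES = [
    {"domain": "messaging", "key": "message_type", "value": "marketing",
     "unit": "day", "requests_per_unit": 5},
    {"domain": "auth", "key": "auth_type", "value": "login",
     "unit": "minute", "requests_per_unit": 5},
]

# Length of each supported time unit in seconds.
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def find_limit(domain, key, value, rules=RULES):
    """Return (max_requests, window_seconds) for the matching rule, or None."""
    for rule in rules:
        if (rule["domain"], rule["key"], rule["value"]) == (domain, key, value):
            return rule["requests_per_unit"], UNIT_SECONDS[rule["unit"]]
    return None  # no rule matches: the request is not rate limited
```

For example, `find_limit("auth", "auth_type", "login")` gives a limit of 5 requests per 60-second window.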

Exceeding rate limits

When a client exceeds the limit, the rate limiter returns HTTP 429 (Too Many Requests). To let the client know it is being throttled, we can use the following headers:

X-Ratelimit-Remaining: the remaining number of allowed requests within the time window
X-Ratelimit-Limit: how many calls the client can make per time window
X-Ratelimit-Retry-After: the number of seconds to wait before making another request without being throttled
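As an illustration of how these headers could be filled in, the sketch below builds the status code and headers from the current window usage. `throttle_headers` and its parameters are hypothetical; in particular, sending Retry-After only on throttled responses is an assumption made here, not something the headers themselves require.

```python
import math


def throttle_headers(limit, used, seconds_until_reset):
    """Return (status_code, headers) for the current request.

    `used` is the number of requests counted in the window so far,
    including the current one.
    """
    headers = {
        "X-Ratelimit-Limit": str(limit),
        "X-Ratelimit-Remaining": str(max(limit - used, 0)),
    }
    if used > limit:
        # Throttled: tell the client how long to wait, rounded up to seconds.
        headers["X-Ratelimit-Retry-After"] = str(math.ceil(seconds_until_reset))
        return 429, headers
    return 200, headers
```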

Detailed design:

Pasted image 20230322172729.png

Considerations in a distributed environment

  • Race condition:
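    • When different rate limiters read and update the same counter in Redis at the same time, there can be a race condition: each one reads the old value before the other writes, so an increment is lost. For example: Pasted image 20230322185553.png
    • Solution:
      • Lock (could slow down the system)
      • Use the Sorted Set data structure in Redis
        • Each individual sorted set command is atomic. To keep a sequence of several commands synchronised, run it in a MULTI/EXEC transaction or a Lua script.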
  • Synchronisation:
    • Because we have multiple rate limiters, client 1 may send requests to rate limiter 1 and client 2 to rate limiter 2, and either client may later be routed to the other limiter. Without synchronisation, rate limiting does not work properly, because rate limiter 1 has no data about the requests handled by rate limiter 2. Pasted image 20230322190113.png
    • Solution:
      • Sticky sessions (not scalable, not flexible)
      • A centralised data store such as Redis. Pasted image 20230322190227.png
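Both concerns above can be sketched together: several rate limiter instances share one store, and the store performs the read, compare, and increment as a single atomic step. This is a minimal sketch under stated assumptions. `SharedCounterStore` is a hypothetical in-process stand-in for a centralised store such as Redis, and the lock inside it only models the atomicity that Redis would provide through a single command or script. `RateLimiterNode` and `simulate` are also hypothetical names.

```python
import threading


class SharedCounterStore:
    """Hypothetical in-process stand-in for a centralised store."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def incr_if_below(self, key, limit):
        """Atomically read, compare and increment; True if the request fits."""
        with self._lock:
            current = self._counts.get(key, 0)
            if current >= limit:
                return False
            self._counts[key] = current + 1
            return True


class RateLimiterNode:
    """One rate limiter instance; it keeps no counters of its own."""

    def __init__(self, store, limit):
        self._store = store
        self._limit = limit

    def allow(self, client_id):
        return self._store.incr_if_below(f"rate:{client_id}", self._limit)


def simulate(total_requests=50, limit=10):
    """Send concurrent requests from one client through two limiter nodes."""
    store = SharedCounterStore()
    nodes = [RateLimiterNode(store, limit), RateLimiterNode(store, limit)]
    allowed = []  # list.append is thread-safe in CPython

    def send(i):
        # Alternate requests between the two limiter instances.
        if nodes[i % 2].allow("client-1"):
            allowed.append(i)

    threads = [threading.Thread(target=send, args=(i,))
               for i in range(total_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(allowed)
```

Because both nodes consult the same counter and the update is atomic, the number of allowed requests never exceeds the limit, regardless of which node a request lands on.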

Step 4. Extra

Performance Optimization

  • We can use a multi-data-center setup to address latency. Connections are automatically routed to the closest edge server.
  • Synchronise data between data centers with an eventual consistency model (NoSQL).

Monitoring

  • We need to capture logs to decide whether we need to change the rate limiter algorithm or its rules. We need to capture how many requests are dropped, and whether there is a sudden increase in traffic.

Hard or Soft rate limiting

  • Hard: number of requests cannot exceed the threshold
  • Soft: requests can exceed the threshold for a short period of time

Layer to rate limit

  • Besides HTTP (Layer 7), we can also apply rate limiting at other layers, for example by IP address at the network layer (Layer 3).

Client side: how to avoid being rate-limited

  • Use caching
  • Retry with backoff to avoid sending too many requests within a short time frame.
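A retry with backoff can be sketched as follows. This is a minimal sketch, assuming exponential backoff with random jitter, which is one common variant and not the only option. `RateLimited` and `call_with_backoff` are hypothetical names: the caller's own request function is assumed to raise `RateLimited` when it receives a 429 response, optionally carrying the server's retry hint in seconds.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error raised by the request function on HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # server hint in seconds, if any


def call_with_backoff(request, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call request() and retry with exponential backoff while throttled."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimited as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Double the upper bound each attempt, capped at max_delay.
            backoff = min(max_delay, base_delay * (2 ** attempt))
            # Random jitter spreads out clients that were throttled together;
            # the server's hint wins when it is longer than our own delay.
            delay = max(backoff * rng(), err.retry_after or 0)
            sleep(delay)
```

The `sleep` and `rng` parameters exist only so the waiting behaviour can be replaced in tests; normal callers would leave them at their defaults.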