Design Rate Limiter
Why do we rate limit an API?
- Prevent DoS/DDoS attacks
- Reduce cost
- Prevent server overload
Step by step to design a rate limiter
Step 1. Establish design scope
- Server-side or client-side
- Which properties does the rate limiter throttle on:
- User ID
- IP, ...
- Is the rate limiter a separate service, or is it implemented in the application code?
- Note: if the interviewer says it's up to you, ask whether the rate limiter needs to be fault tolerant.
- If so, it's better to run the rate limiter separately from the API servers, so that a problem on an API server doesn't affect the rate limiter.
Step 2. Propose high-level design
Where to put our rate limiter?
- Client-side: rate limiting on the client is easy to bypass, because client requests can be forged. We also don't always control the client implementation, so this is not recommended.
- Server-side:
- Rate limiter at the API server.
- Rate limiter as middleware
- This is the recommended option, because the middleware can later be extended to handle SSL (Secure Sockets Layer) termination, IP whitelisting, serving static content, ...
Which algorithm should we use?
- Stable traffic:
- Bursty traffic:
High level architecture
In a nutshell, we need counters that track the number of requests so we can decide whether to reject one. A disk-based database is too slow for this, because the counter is read and updated on every request.
Redis is a popular choice for implementing rate limiting, using two of its commands: INCR (increase the counter by 1) and EXPIRE (set a timeout after which the counter is deleted).
The rate limiter middleware therefore keeps its counters in Redis, an in-memory store, so each check stays fast.
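The INCR/EXPIRE idea can be sketched as a fixed-window counter. This is a minimal sketch, not a production implementation: `FakeRedis` is a hypothetical in-memory stand-in that only mimics the two commands, and `allow_request` and the `rate:` key prefix are illustrative names.

```python
import time


class FakeRedis:
    """Hypothetical in-memory stand-in that mimics Redis INCR and EXPIRE."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._values = {}    # key -> integer counter
        self._deadline = {}  # key -> absolute expiry time

    def _evict_if_expired(self, key):
        deadline = self._deadline.get(key)
        if deadline is not None and self._clock() >= deadline:
            self._values.pop(key, None)
            self._deadline.pop(key, None)

    def incr(self, key):
        # Like INCR: a missing (or expired) key starts from 0.
        self._evict_if_expired(key)
        self._values[key] = self._values.get(key, 0) + 1
        return self._values[key]

    def expire(self, key, seconds):
        # Like EXPIRE: set a time-to-live on an existing key.
        if key in self._values:
            self._deadline[key] = self._clock() + seconds


def allow_request(store, client_id, limit, window_seconds):
    """Fixed-window check: count requests per client, reset when the key expires."""
    key = f"rate:{client_id}"
    count = store.incr(key)
    if count == 1:
        # First request in this window: start the countdown.
        store.expire(key, window_seconds)
    return count <= limit


store = FakeRedis()
results = [allow_request(store, "user-42", limit=3, window_seconds=60) for _ in range(5)]
# results == [True, True, True, False, False]
```

Against a real Redis server, INCR followed by EXPIRE is two separate commands, so a crash between them would leave a counter that never expires. Running both in one Lua script or transaction is one way to close that gap.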
Step 3. Design deep dive
Rate limiting rules
We can use the following YAML rules. This example limits the number of marketing messages to 5 per day.
domain: messaging
descriptor:
  - key: message_type
    value: marketing
    rate_limit:
      unit: day
      requests_per_unit: 5
Or, with this example, we only allow 5 login attempts per minute:
domain: auth
descriptor:
  - key: auth_type
    value: login
    rate_limit:
      unit: minute
      requests_per_unit: 5
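To show how the middleware might use such rules, here is a small lookup sketch. The `Rule` class, `SECONDS_PER_UNIT`, and `find_rule` are hypothetical names, and the two rules are written out by hand so the example does not depend on a YAML parser.

```python
from dataclasses import dataclass

# Assumed set of supported units; only "day" and "minute" appear in the rules above.
SECONDS_PER_UNIT = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


@dataclass(frozen=True)
class Rule:
    domain: str
    key: str
    value: str
    unit: str
    requests_per_unit: int

    @property
    def window_seconds(self):
        return SECONDS_PER_UNIT[self.unit]


# The two example rules, transcribed field by field.
RULES = [
    Rule("messaging", "message_type", "marketing", "day", 5),
    Rule("auth", "auth_type", "login", "minute", 5),
]


def find_rule(rules, domain, descriptor):
    """Return the first rule whose domain and key/value pair match, else None."""
    for rule in rules:
        if rule.domain == domain and descriptor.get(rule.key) == rule.value:
            return rule
    return None


login_rule = find_rule(RULES, "auth", {"auth_type": "login"})
# login_rule.requests_per_unit == 5, login_rule.window_seconds == 60
```

A request that matches no rule returns `None` here. Whether that means "allow" or "apply a default limit" is a design decision the notes leave open.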
Exceeding rate limits
When a client exceeds its limit, the rate limiter returns HTTP 429 Too Many Requests. To let the client know that it is being throttled, we can use the following headers:
- X-Ratelimit-Remaining: the remaining number of allowed requests within the time window
- X-Ratelimit-Limit: how many calls the client can make per time window
- X-Ratelimit-Retry-After: the number of seconds to wait before making another request without being throttled
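A rough sketch of how these headers could be filled in. `build_response` and its parameters (`used`, `window_reset_at`, `now`) are hypothetical; it assumes the limiter already knows how many requests were counted in the current window and when that window resets.

```python
import math


def build_response(limit, used, window_reset_at, now):
    """Return (status, headers) for a request counted as number `used` in the window."""
    remaining = max(limit - used, 0)
    headers = {
        "X-Ratelimit-Limit": str(limit),
        "X-Ratelimit-Remaining": str(remaining),
    }
    if used > limit:
        # Throttled: tell the client how long to wait, rounded up to whole seconds.
        retry_after = max(math.ceil(window_reset_at - now), 0)
        headers["X-Ratelimit-Retry-After"] = str(retry_after)
        return 429, headers
    return 200, headers


status, headers = build_response(limit=5, used=6, window_reset_at=160.0, now=100.5)
# status == 429, headers["X-Ratelimit-Retry-After"] == "60"
```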
Detailed design:
Considerations in a distributed environment
- Race condition:
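- When several rate limiters read and update the same counter in Redis at the same time, there can be a race condition. For example, two limiters both read a counter value of 3, each adds 1 and writes back 4, so the counter ends at 4 instead of 5.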
- Solution:
- Lock (could slow down the system)
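- Use the sorted set data structure in Redis
- Each individual sorted set command is atomic, so concurrent updates cannot interleave inside a command. A sequence of several commands still has to be grouped, for example in a Lua script or a MULTI/EXEC transaction.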
- Synchronisation:
- Because we have multiple rate limiters, client 1 may send requests to rate limiter 1 while client 2 sends requests to rate limiter 2. Without shared state, rate limiter 1 knows nothing about the counts held by rate limiter 2, so the limits are not enforced correctly.
- Solution
- Sticky session (not scalable, not flexible)
- A centralized data store such as Redis, shared by all rate limiters
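The sorted set option from the race condition bullets is commonly used as a sliding window log. The sketch below is an in-memory approximation under stated assumptions: each client's sorted list stands in for a Redis sorted set, the comments name the Redis command each step would map to, and `SlidingWindowLog` is a hypothetical class that records only accepted requests.

```python
import bisect
from collections import defaultdict


class SlidingWindowLog:
    """In-memory sketch; each client's sorted list stands in for a Redis sorted set."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self._log = defaultdict(list)  # client_id -> ascending request timestamps

    def allow(self, client_id, now):
        timestamps = self._log[client_id]
        # Drop entries that fell out of the window (Redis: ZREMRANGEBYSCORE).
        cutoff = now - self.window_seconds
        del timestamps[:bisect.bisect_right(timestamps, cutoff)]
        # Count what is left (Redis: ZCARD) and reject if the window is full.
        if len(timestamps) >= self.limit:
            return False
        # Record the accepted request (Redis: ZADD with the timestamp as score).
        bisect.insort(timestamps, now)
        return True


limiter = SlidingWindowLog(limit=2, window_seconds=10)
decisions = [limiter.allow("client-1", t) for t in (0.0, 1.0, 2.0, 10.5)]
# decisions == [True, True, False, True]
```

With a real Redis server, the three steps would need to run as one unit (a Lua script or a MULTI/EXEC transaction). Otherwise two limiters could both pass the count check before either records its request.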
Step 4: Extra
Performance Optimization
- We can use a multi-data-center setup to address latency. Requests are automatically routed to the closest edge server.
- Synchronise data with an eventual consistency model (NoSQL).
Monitoring
- We need to capture logs to decide whether the rate limiting algorithm or rules need to change. We need to track how many requests are dropped, and whether there are sudden increases in traffic.
Hard or Soft rate limiting
- Hard: number of requests cannot exceed the threshold
- Soft: requests can exceed the threshold for a short period of time
Layer to rate limit
- Besides HTTP (Layer 7), we can also rate limit by IP address at the network layer (Layer 3)
Client side: how to avoid being rate-limited
- Use caching to avoid repeating the same API calls
- Back off between retries to avoid sending too many requests within a short time frame.
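A minimal client-side sketch of backoff-retry, assuming exponential backoff with jitter. `RateLimited`, `call_with_backoff`, and `send` are hypothetical names: `send` stands for whatever function issues the request and raises `RateLimited` when the server answers 429.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error a client wrapper raises when it sees HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds suggested by the server, if any


def call_with_backoff(send, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call send() until it succeeds, waiting longer after each 429."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimited as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            if err.retry_after is not None:
                delay = err.retry_after  # the server's hint wins
            else:
                # Exponential backoff with full jitter, capped at max_delay.
                delay = rng() * min(max_delay, base_delay * 2 ** attempt)
            sleep(delay)
```

The jitter spreads retries from many clients over time, so they do not all hit the server again at the same instant. When the server sends a retry hint such as X-Ratelimit-Retry-After, the sketch uses it instead of the computed delay.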