Rate Limiting in System Design



What is Rate Limiting?

Rate limiting is a technique used in system design to control the number of requests a user or client can make to a server within a specified time window. It protects your services from abuse, prevents system overload, and ensures fair usage among all clients.

Why Do We Need Rate Limiting?

Imagine a website that provides a public API. If one user starts making thousands of requests every second, it can slow down or even crash the server for everyone else. Rate limiting solves this by putting a cap on how many requests a user can make, such as 100 requests per minute.

Real-World Examples of Rate Limiting

Example 1: Login Attempts

Suppose a user is trying to log in to an application. If there is no limit on login attempts, a malicious actor could perform a brute-force attack by trying all possible passwords.

To avoid this, we can apply rate limiting: for example, only allow 5 login attempts per 10 minutes. After 5 failed attempts, block the user for 10 minutes.
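A minimal sketch of this check in Python, assuming a single server process with an in-memory store (a real deployment would keep the counts in a shared store such as Redis so all servers see the same numbers):

import time
from collections import defaultdict

MAX_ATTEMPTS = 5            # allowed failed logins per window
WINDOW_SECONDS = 10 * 60    # 10-minute window

# username -> timestamps of recent failed attempts (in-memory only)
failed_attempts = defaultdict(list)

def is_login_allowed(username):
    now = time.time()
    # keep only failures that happened inside the current window
    recent = [t for t in failed_attempts[username] if now - t < WINDOW_SECONDS]
    failed_attempts[username] = recent
    return len(recent) < MAX_ATTEMPTS

def record_failed_login(username):
    failed_attempts[username].append(time.time())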

Example 2: Public API Usage

Let’s say you’re offering a free weather API. You want to allow fair use but prevent abuse. You could define tiered limits, for example: anonymous users get 100 requests per day, registered free users get 1,000 requests per day, and paid users get a much higher quota.

This encourages users to upgrade while protecting your backend resources from being overwhelmed by free users.

Intuition-Building Question

Question: If an attacker sends 1000 requests in 1 second, and you limit to 10 requests per second, will the attacker still be able to slow your system down?

Answer: It depends on how the rate limiter is implemented. If requests beyond the limit are dropped early (e.g., at the API gateway), then your system will stay safe. But if all requests reach the application before being rejected, the system can still become overwhelmed.

How Rate Limiting Works

Rate limiting works by tracking the number of requests from a user (or IP address, API key, etc.) and comparing it to a pre-defined threshold. If the threshold is exceeded, the server blocks further requests until the time window resets.

Common Algorithms Used for Rate Limiting

1. Fixed Window

In this method, time is divided into fixed windows (e.g., 1 minute). If the limit is 100 requests per minute, the system allows 100 requests from a user in that minute. After the minute resets, the counter resets.

Problem: A user can send 100 requests at the end of one window and 100 at the start of the next—effectively sending 200 requests in a short burst.
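A minimal in-memory sketch of a fixed window counter in Python (assuming a single process; a distributed system would keep these counters in a shared store):

import time
from collections import defaultdict

LIMIT = 100    # requests allowed per window
WINDOW = 60    # window length in seconds

# (client_id, window_number) -> request count; old entries should be purged periodically
counters = defaultdict(int)

def allow_request(client_id):
    window_number = int(time.time() // WINDOW)   # identifies the current fixed window
    key = (client_id, window_number)
    if counters[key] >= LIMIT:
        return False                             # limit reached for this window
    counters[key] += 1
    return True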

2. Sliding Window Log

This approach logs the timestamp of each request and removes old ones that fall outside the current time window. It gives more accurate tracking of request rates but uses more memory and processing.
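A sketch of the same idea in Python, keeping one in-memory log of timestamps per client:

import time
from collections import defaultdict, deque

LIMIT = 100    # requests allowed within any sliding window
WINDOW = 60    # window length in seconds

# client_id -> timestamps of accepted requests
request_log = defaultdict(deque)

def allow_request(client_id):
    now = time.time()
    log = request_log[client_id]
    # drop timestamps that have fallen out of the sliding window
    while log and now - log[0] >= WINDOW:
        log.popleft()
    if len(log) >= LIMIT:
        return False
    log.append(now)
    return True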

3. Sliding Window Counter

A more efficient version of the sliding window log. It approximates the number of requests using two time buckets (e.g., current minute and previous minute), combining them based on how much of each overlaps with the actual time range.
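One common way to implement this approximation in Python (the exact weighting below is an assumption, since the article does not spell it out): weight the previous window's count by the fraction of it that still overlaps the sliding window, then add the current window's count.

import time
from collections import defaultdict

LIMIT = 100    # requests allowed per window
WINDOW = 60    # window length in seconds

# (client_id, window_number) -> request count
counters = defaultdict(int)

def allow_request(client_id):
    now = time.time()
    current_window = int(now // WINDOW)
    elapsed = (now % WINDOW) / WINDOW    # fraction of the current window already elapsed

    current_count = counters[(client_id, current_window)]
    previous_count = counters[(client_id, current_window - 1)]

    # weight the previous window by how much of it still overlaps the sliding window
    estimated = previous_count * (1 - elapsed) + current_count
    if estimated >= LIMIT:
        return False
    counters[(client_id, current_window)] += 1
    return True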

4. Token Bucket

This is one of the most commonly used algorithms. Here's how it works: a bucket holds tokens up to a fixed capacity, and new tokens are added at a constant refill rate. Each incoming request must consume one token to proceed; if the bucket is empty, the request is rejected (or queued) until more tokens arrive.

Advantage: This allows for some burst traffic, as tokens can be accumulated over time.
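A minimal single-process sketch in Python (the capacity and refill rate below are illustrative):

import time

class TokenBucket:
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity          # maximum tokens the bucket can hold
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = float(capacity)     # start with a full bucket
        self.last_refill = time.time()

    def allow_request(self):
        now = time.time()
        # add tokens for the time elapsed since the last check, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1    # spend one token on this request
            return True
        return False

# Allows 10 requests per second on average, with bursts of up to 20
bucket = TokenBucket(capacity=20, refill_rate=10)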

5. Leaky Bucket

Imagine a bucket with a hole in the bottom. Water (requests) enters the bucket at any rate, but leaks out at a steady rate. If the bucket is full, incoming requests overflow and are dropped.

This algorithm smooths out bursty traffic by ensuring a constant output rate.
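A minimal sketch of the counter-based variant in Python (assuming a single process; requests that would overflow the bucket are rejected rather than queued):

import time

class LeakyBucket:
    def __init__(self, capacity, leak_rate):
        self.capacity = capacity    # how many pending requests the bucket can hold
        self.leak_rate = leak_rate  # requests drained per second at a steady rate
        self.water = 0.0            # current "water level" (pending requests)
        self.last_check = time.time()

    def allow_request(self):
        now = time.time()
        # drain the bucket at the constant leak rate
        self.water = max(0.0, self.water - (now - self.last_check) * self.leak_rate)
        self.last_check = now
        if self.water < self.capacity:
            self.water += 1    # this request fits in the bucket
            return True
        return False           # bucket is full: the request overflows and is dropped

# Drains 10 requests per second, buffering at most 20 pending requests
bucket = LeakyBucket(capacity=20, leak_rate=10)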

Question to Deepen Understanding

Question: Which is better for APIs that can tolerate occasional bursts — Token Bucket or Leaky Bucket?

Answer: The Token Bucket is better for bursty traffic because it allows tokens to accumulate and supports short spikes. Leaky Bucket enforces a more consistent rate and discards bursts.

Where to Apply Rate Limiting?

Rate limiting can be enforced at several layers: at the API gateway or load balancer (so excess traffic is rejected before it reaches your application servers), inside the application itself (for per-user or per-endpoint rules), and even on the client side (to avoid hitting server limits in the first place).

HTTP Headers Used in Rate Limiting

Most modern APIs respond with rate limit headers, commonly: X-RateLimit-Limit (the maximum number of requests allowed in the current window), X-RateLimit-Remaining (how many requests the client can still make), and X-RateLimit-Reset (when the current window resets). When a request is rejected, servers often also send Retry-After to tell the client how long to wait.
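For example, a client using Python's requests library could inspect these headers like this (the URL and exact header names are illustrative, since they vary between APIs):

import requests

resp = requests.get("https://api.example.com/weather")   # hypothetical endpoint
print(resp.headers.get("X-RateLimit-Limit"))      # total requests allowed in the window
print(resp.headers.get("X-RateLimit-Remaining"))  # requests the client has left
print(resp.headers.get("X-RateLimit-Reset"))      # when the current window resets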

What Happens When the Limit is Exceeded?

When a client exceeds the rate limit, the server usually returns an HTTP status code 429 Too Many Requests. This is a signal to the client to wait and retry after some time.
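A simple client-side sketch in Python that respects this signal (assuming the requests library is available and that Retry-After carries a number of seconds):

import time
import requests

def get_with_retry(url, max_retries=3):
    # retry a GET request when the server answers 429 Too Many Requests
    for _ in range(max_retries):
        resp = requests.get(url)
        if resp.status_code != 429:
            return resp
        # honor Retry-After if the server sends it, otherwise back off for 1 second
        wait = int(resp.headers.get("Retry-After", 1))
        time.sleep(wait)
    return resp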

Best Practices

Common best practices include: communicate limits clearly (through documentation and response headers), return 429 with a Retry-After hint rather than silently dropping requests, choose the right key to limit on (a user ID or API key is usually fairer than an IP address), enforce limits as early as possible (ideally at the gateway), and monitor real traffic so limits can be tuned rather than guessed.

Conclusion

Rate limiting is a crucial component in system design, especially for scalable and resilient systems. It ensures that resources are used fairly and efficiently, prevents abuse, and keeps your services healthy under load.

As a beginner, you should start recognizing where and how to apply rate limiting in real-world systems such as login APIs, third-party services, or any shared resource.


