⬅ Previous Topic
CDN (Content Delivery Network)
You can support this website with a contribution of your choice.
When making a contribution, mention your name, and programguru.org in the message. Your name shall be displayed in the sponsors list.
Rate limiting is a technique used in system design to control the number of requests a user or client can make to a server within a specified time window. It protects your services from abuse, prevents system overload, and ensures fair usage among all clients.
Imagine a website that provides a public API. If one user starts making thousands of requests every second, it can slow down or even crash the server for everyone else. Rate limiting solves this by putting a cap on how many requests a user can make, such as 100 requests per minute.
Suppose a user is trying to log in to an application. If there is no limit on login attempts, a malicious actor could perform a brute-force attack by trying all possible passwords.
To avoid this, we can apply rate limiting: for example, only allow 5 login attempts per 10 minutes. After 5 failed attempts, block the user for 10 minutes.
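The login rule above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production design: the class name `LoginLimiter`, its method names, and the choice to pass the current time in as `now` (in seconds) are all assumptions made for the example.

```python
class LoginLimiter:
    """Hypothetical helper: lock a user out after too many failed logins."""

    def __init__(self, max_attempts=5, window_seconds=600, lockout_seconds=600):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self.lockout_seconds = lockout_seconds
        self.failures = {}       # username -> timestamps of recent failures
        self.blocked_until = {}  # username -> time when the lockout ends

    def allow_attempt(self, username, now):
        """Return True if the user may try to log in at time `now`."""
        return now >= self.blocked_until.get(username, float("-inf"))

    def record_failure(self, username, now):
        """Register a failed login and start a lockout once the limit is hit."""
        recent = [t for t in self.failures.get(username, [])
                  if now - t < self.window_seconds]
        recent.append(now)
        if len(recent) >= self.max_attempts:
            self.blocked_until[username] = now + self.lockout_seconds
            recent = []  # start counting afresh once the lockout expires
        self.failures[username] = recent
```

A real service would usually keep this state in a shared store rather than in process memory, and might also limit by IP address so that an attacker cannot lock out a legitimate user on purpose.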
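Let’s say you’re offering a free weather API. You want to allow fair use but prevent abuse. You could define separate limits for each tier, with a lower request quota for free users and a higher one for paying users.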
This encourages users to upgrade while protecting your backend resources from being overwhelmed by free users.
Question: If an attacker sends 1000 requests in 1 second, and you limit to 10 requests per second, will the attacker still be able to slow your system down?
Answer: It depends on how the rate limiter is implemented. If requests beyond the limit are dropped early (e.g., at the API gateway), then your system will stay safe. But if all requests reach the application before being rejected, the system can still become overwhelmed.
Rate limiting works by tracking the number of requests from a user (or IP address, API key, etc.) and comparing it to a pre-defined threshold. If the threshold is exceeded, the server blocks further requests until the time window resets.
In the fixed window method, time is divided into fixed windows (e.g., 1 minute). If the limit is 100 requests per minute, the system allows up to 100 requests from a user in each window. When the next window starts, the counter resets.
Problem: A user can send 100 requests at the end of one window and 100 at the start of the next, effectively sending 200 requests in a short burst, twice the intended limit.
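A fixed window counter and its boundary problem can be illustrated with a short sketch. The class name `FixedWindowLimiter` and the `client_id` and `now` parameters are assumptions for the example; timestamps are plain seconds passed in by the caller.

```python
class FixedWindowLimiter:
    """Illustrative fixed window counter, one counter per client."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.state = {}  # client_id -> (window_index, count)

    def allow(self, client_id, now):
        window_index = int(now // self.window_seconds)
        stored_index, count = self.state.get(client_id, (window_index, 0))
        if stored_index != window_index:
            count = 0  # a new window has started, so the counter resets
        if count >= self.limit:
            self.state[client_id] = (window_index, count)
            return False
        self.state[client_id] = (window_index, count + 1)
        return True


# Boundary burst: 100 requests just before the window ends, 100 just after.
limiter = FixedWindowLimiter(limit=100, window_seconds=60)
late = sum(limiter.allow("user-1", now=59.9) for _ in range(100))
early = sum(limiter.allow("user-1", now=60.0) for _ in range(100))
print(late + early)  # 200 requests accepted within 0.1 seconds
```

The counter itself is cheap (one integer per client), which is why this scheme is popular despite the burst at window boundaries.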
The sliding window log approach records the timestamp of each request and discards those that fall outside the current time window. It tracks request rates more accurately but uses more memory and processing.
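As a rough sketch of the timestamp log, the following keeps one queue of timestamps per client. The names `SlidingWindowLog`, `client_id`, and `now` are assumptions for illustration.

```python
from collections import deque


class SlidingWindowLog:
    """Illustrative sliding window log: one timestamp stored per accepted request."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.logs = {}  # client_id -> deque of request timestamps, oldest first

    def allow(self, client_id, now):
        log = self.logs.setdefault(client_id, deque())
        # Drop timestamps that are no longer inside the window ending at `now`.
        while log and log[0] <= now - self.window_seconds:
            log.popleft()
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True
```

The memory cost is visible here: each client can hold up to `limit` timestamps, whereas a fixed window counter needs only a single number.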
The sliding window counter is a more efficient version of the sliding window log. It approximates the number of requests using two time buckets (e.g., current minute and previous minute), weighting each by how much of it overlaps the actual time range.
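One common way to combine the two buckets is to count the current bucket in full and weight the previous bucket by the fraction of it that still overlaps the sliding window. The sketch below assumes that weighting, and its names (`SlidingWindowCounter`, `client_id`, `now`) are illustrative.

```python
class SlidingWindowCounter:
    """Illustrative two-bucket approximation of a sliding window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.state = {}  # client_id -> (window_index, current_count, previous_count)

    def allow(self, client_id, now):
        index = int(now // self.window_seconds)
        stored_index, current, previous = self.state.get(client_id, (index, 0, 0))
        if index == stored_index + 1:
            previous, current = current, 0  # moved into the next bucket
        elif index > stored_index + 1:
            previous, current = 0, 0        # idle for more than a full bucket
        # Fraction of the current bucket that has already elapsed.
        elapsed = (now % self.window_seconds) / self.window_seconds
        estimated = current + previous * (1.0 - elapsed)
        allowed = estimated < self.limit
        if allowed:
            current += 1
        self.state[client_id] = (index, current, previous)
        return allowed
```

For example, with a limit of 10 per minute, 10 requests in the previous minute, and 15 seconds into the current minute, the previous bucket contributes 10 × 0.75 = 7.5, so only 3 more requests fit before the estimate reaches the limit.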
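The token bucket is one of the most commonly used algorithms. Here's how it works: tokens are added to a bucket at a fixed rate, up to a maximum capacity; each request consumes one token; and when the bucket is empty, requests are rejected or delayed until new tokens arrive.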
Advantage: This allows for some burst traffic, as tokens can be accumulated over time.
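A minimal token bucket can be sketched as follows. It assumes tokens are refilled lazily, based on the time elapsed since the last call, rather than by a background timer; the names `TokenBucket`, `capacity`, and `refill_rate` are chosen for the example.

```python
class TokenBucket:
    """Illustrative token bucket with lazy refill."""

    def __init__(self, capacity, refill_rate, now=0.0):
        self.capacity = capacity        # maximum tokens the bucket can hold
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)   # start with a full bucket
        self.last_refill = now

    def allow(self, now, cost=1):
        # Add the tokens earned since the last call, capped at capacity.
        if now > self.last_refill:
            earned = (now - self.last_refill) * self.refill_rate
            self.tokens = min(self.capacity, self.tokens + earned)
            self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The `capacity` controls how large a burst is tolerated, while `refill_rate` sets the long-term average rate. A client that stays idle accumulates tokens up to `capacity` and can then spend them all at once.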
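The leaky bucket algorithm is best pictured as a bucket with a hole in the bottom. Water (requests) enters the bucket at any rate but leaks out at a steady rate. If the bucket is full, incoming requests overflow and are discarded.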
This algorithm smooths out bursty traffic by ensuring a constant output rate.
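The sketch below models the leaky bucket as a bounded queue that releases requests at a fixed rate. It is one of several ways to implement the idea; the names `LeakyBucket`, `submit`, and `drain` are assumptions, and the caller is expected to call `drain` periodically with the current time in seconds.

```python
from collections import deque


class LeakyBucket:
    """Illustrative leaky bucket: a bounded queue drained at a constant rate."""

    def __init__(self, capacity, leak_rate, now=0.0):
        self.capacity = capacity    # maximum number of queued requests
        self.leak_rate = leak_rate  # requests released per second
        self.queue = deque()
        self.last_leak = now

    def submit(self, request):
        """Queue a request, or return False if the bucket overflows."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(request)
        return True

    def drain(self, now):
        """Return the requests that have leaked out by time `now`."""
        slots = int((now - self.last_leak) * self.leak_rate)
        if slots <= 0:
            return []
        # Unused slots are not saved up, so the output rate stays steady.
        self.last_leak += slots / self.leak_rate
        released = []
        while slots > 0 and self.queue:
            released.append(self.queue.popleft())
            slots -= 1
        return released
```

A burst of five requests into a bucket of capacity three keeps the first three and drops the rest; the queued requests then leave one per second when `leak_rate` is 1.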
Question: Which is better for APIs that can tolerate occasional bursts — Token Bucket or Leaky Bucket?
Answer: The Token Bucket is better for bursty traffic because it allows tokens to accumulate and supports short spikes. Leaky Bucket enforces a more consistent rate and discards bursts.
Many APIs report the client's rate limit status in response headers:
X-RateLimit-Limit: Maximum number of requests allowed
X-RateLimit-Remaining: Remaining requests before hitting the limit
X-RateLimit-Reset: Time at which the limit resets
When a client exceeds the rate limit, the server usually returns an HTTP status code 429 Too Many Requests. This is a signal to the client to wait and retry after some time.
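As an illustration, a server could assemble these headers and the status code with a small helper like the one below. The function name `rate_limit_response` and its parameters are hypothetical. The sketch also assumes that `X-RateLimit-Reset` carries a Unix timestamp in seconds (some APIs send the number of seconds remaining instead) and adds the standard `Retry-After` header on rejection.

```python
import math


def rate_limit_response(allowed, limit, remaining, reset_epoch, now):
    """Return (status_code, headers) for a request checked by a rate limiter."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(int(reset_epoch)),  # assumed: Unix time in seconds
    }
    if allowed:
        return 200, headers
    # Tell the client how many seconds to wait before retrying.
    headers["Retry-After"] = str(max(0, math.ceil(reset_epoch - now)))
    return 429, headers
```

A well-behaved client reads `Retry-After` (or the reset time) on a 429 response and pauses for that long instead of retrying immediately.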
Rate limiting is an important component in system design, especially for scalable and resilient systems. It ensures that resources are used fairly and efficiently, prevents abuse, and keeps your services healthy under load.
As a beginner, you should start recognizing where and how to apply rate limiting in real-world systems such as login APIs, third-party services, or any shared resource.