
Latency vs Throughput in System Design



Understanding Latency and Throughput

In system design, two important metrics help evaluate the performance of a system: latency and throughput. These terms are often used in the context of network communication, APIs, databases, and distributed systems. While they sound related, they measure two very different aspects of performance. Let's explore what they mean, how they differ, and how they influence your design decisions.

What is Latency?

Latency is the time it takes for a request to travel from the client to the server and for the response to come back. In simpler terms, it's the delay between sending a request and receiving the first byte of the response (often called time to first byte).

Real-life analogy:

Imagine you're sending a letter to a friend in another city. Latency is the time it takes for the letter to reach your friend and for them to read it.

Example 1: API Call

A user sends a request to an API to fetch their profile details. The time from clicking the button to when the user sees the result on their screen is the latency.

- If this takes 100 milliseconds (ms), we say the API has a latency of 100 ms.
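To make this concrete, here is a minimal sketch of measuring that latency with a timer. The endpoint `https://example.com/api/profile` is a placeholder, not a real API:

```python
import time
import urllib.request

URL = "https://example.com/api/profile"   # hypothetical endpoint, for illustration only

start = time.perf_counter()                   # start a high-resolution timer
with urllib.request.urlopen(URL) as resp:     # send the request and wait for the response
    resp.read(1)                              # read the first byte of the body
latency_ms = (time.perf_counter() - start) * 1000
print(f"Latency: {latency_ms:.1f} ms")
```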

What is Throughput?

Throughput measures how many units of work a system can handle in a given time period. It's often measured in operations per second (ops/sec), requests per second (RPS), or transactions per second (TPS).

Real-life analogy:

Think of a water pipe. Throughput is how much water flows through the pipe per second. A big pipe (high throughput) can carry more water per second than a small pipe.

Example 2: Batch Processing System

Suppose a system is processing financial transactions. If it can process 10,000 transactions per second, that’s its throughput.

- Even if each transaction takes 1 second, the system can still handle many transactions at once, depending on its degree of parallelism.
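A short sketch of that idea: each simulated transaction below takes 1 second (its latency), yet a pool of workers pushes throughput far above 1 transaction per second. The pool size of 100 is an arbitrary illustration:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def process_transaction(tx_id: int) -> int:
    time.sleep(1)          # simulate 1 second of work per transaction (the latency)
    return tx_id

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=100) as pool:   # 100 transactions in flight at once
    results = list(pool.map(process_transaction, range(500)))
elapsed = time.perf_counter() - start

# 500 transactions in ~5 seconds ≈ 100 transactions per second,
# even though each individual transaction still takes 1 second.
print(f"Throughput: {len(results) / elapsed:.0f} transactions/sec")
```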

Latency vs Throughput: The Key Differences

| Metric           | Latency                  | Throughput                       |
|------------------|--------------------------|----------------------------------|
| What it measures | Time per operation       | Number of operations per second  |
| Unit             | Milliseconds (ms)        | Requests per second (RPS)        |
| Focus            | Responsiveness           | Capacity                         |
| User impact      | Affects perceived speed  | Affects system efficiency        |

Can a system have low latency but low throughput?

Yes. A system might respond quickly to individual requests (low latency) but can't handle many users at once (low throughput).

Question:

If an API responds in 100ms but crashes after 50 concurrent requests, is it high or low throughput?

Answer:

It is low throughput. Even though each request is fast (low latency), it cannot serve many users simultaneously.
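One way to probe this in practice is a small concurrency sweep: fire batches of simultaneous requests and watch where successes start dropping. A rough sketch, again against a placeholder URL:

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "https://example.com/api/profile"   # placeholder endpoint

def hit() -> bool:
    try:
        with urllib.request.urlopen(URL, timeout=2) as resp:
            return resp.status == 200
    except Exception:
        return False                       # timeout, refused connection, 5xx, ...

for concurrency in (10, 50, 100, 200):
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        ok = sum(pool.map(lambda _: hit(), range(concurrency)))
    print(f"{concurrency} concurrent requests -> {ok} succeeded")
```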

Can a system have high throughput but high latency?

Yes. Consider a batch system that processes 1 million jobs per hour, but each job takes 30 minutes to complete. It has high throughput (many jobs per hour) but high latency (long wait time for each job).
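These two numbers are linked by Little's Law: work in flight ≈ throughput × latency. Plugging in the figures above shows how much concurrency such a batch system needs, as this small worked calculation illustrates:

```python
# Little's Law: jobs in flight = throughput (jobs/sec) * latency (sec)
throughput = 1_000_000 / 3600      # 1 million jobs per hour ≈ 278 jobs/sec
latency = 30 * 60                  # 30 minutes per job, in seconds
in_flight = throughput * latency
print(f"~{in_flight:,.0f} jobs running concurrently")   # ~500,000
```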

Example 3: Restaurant Analogy

- Latency: The time between placing your order and getting your food.
- Throughput: The number of customers served per hour.

A small restaurant may serve one customer every 5 minutes (low latency), but it can only handle 12 customers an hour (low throughput). A buffet restaurant may take 10 minutes to serve each customer (higher latency), but with 5 serving lines it handles 30 customers an hour (higher throughput).

When to Optimize for Latency

- Real-time systems (e.g., chat apps, gaming)
- Interactive user experiences (e.g., UI loading)
- Financial trading platforms

When to Optimize for Throughput

- Data processing systems (e.g., ETL pipelines)
- Backend job queues (e.g., image processing, backups)
- Streaming platforms (e.g., video encoding)

Example 4: Web Server Design

Suppose you're designing a web server. Here are two situations:

- Goal 1: Serve pages as fast as possible to keep users happy → Optimize for **low latency**
- Goal 2: Handle millions of concurrent requests without downtime → Optimize for **high throughput**

Can you have both low latency and high throughput?

Yes, but it requires careful architecture, such as:

- Caching frequently accessed data
- Load balancing across multiple servers
- Using asynchronous queues and worker threads (see the sketch below)
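As a rough illustration of the last technique, here is a minimal sketch of an asynchronous queue with worker threads: the producer enqueues jobs and returns immediately (low latency for the caller), while a pool of workers drains the queue in parallel (high throughput). The queue sizes and worker counts are arbitrary:

```python
import queue
import threading
import time

jobs: queue.Queue = queue.Queue()

def worker() -> None:
    while True:
        job_id = jobs.get()        # block until work is available
        time.sleep(0.1)            # simulate processing the job
        print(f"finished job {job_id}")
        jobs.task_done()

# A pool of workers drains the queue in parallel (throughput).
for _ in range(4):
    threading.Thread(target=worker, daemon=True).start()

# The "request handler" just enqueues and returns immediately (latency).
for job_id in range(20):
    jobs.put(job_id)

jobs.join()                        # wait until every job has been processed
```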

Final Thoughts

Latency and throughput are both essential metrics. Depending on the use case, one might matter more than the other. A good system design balances both, aiming for low latency and high throughput wherever possible.

Check Your Understanding

Question:

You're designing a video streaming platform. What matters more: latency or throughput?

Answer:

Both are important, but throughput is often more critical because the platform must stream to millions of users simultaneously. However, latency affects buffering and playback start time.

Summary

- Latency = how fast a single request is completed.
- Throughput = how many requests can be completed per second.
- Systems should be evaluated against their performance goals: some need responsiveness, others need bulk handling.

Understanding both helps you make better architectural choices in system design.

