What is Capacity Estimation?
Capacity estimation is the process of calculating how much load your system needs to handle. It helps you estimate the required hardware, bandwidth, database size, and other resources based on expected usage.
This step is typically performed early in the system design phase, both in interviews and in real-world architecture planning. It helps ensure that the system will be scalable, responsive, and cost-effective.
Why is Capacity Estimation Important?
Without proper capacity estimation, a system might crash under high traffic or waste money on over-provisioned resources. Capacity estimation helps with:
- Planning infrastructure and budget
- Scaling the system appropriately
- Avoiding system failures under peak load
Example 1: Designing a URL Shortener - Estimating Capacity
Let’s say we are designing a URL shortener like Bit.ly. First, we must understand the usage pattern.
Assumptions:
- 100 million users
- 10% of users are active daily → 10 million daily active users (DAU)
- Each active user shortens 5 URLs per day
Step-by-Step Estimation:
- Total URLs created per day: 10M users × 5 = 50M URLs/day
- Storage for each URL:
  - Short URL: 8 characters = ~8 bytes
  - Long URL: average 100 bytes
  - Total = ~108 bytes
- Daily Storage Requirement: 50M × 108 bytes ≈ 5.4 GB/day
- Monthly Storage (30 days): 5.4 GB × 30 = 162 GB/month
Question:
If we wanted to support the service for 3 years without archiving any data, how much storage would we need?
Answer:
3 years = 36 months → 162 GB × 36 = 5,832 GB ≈ 5.8 TB. This covers the raw URL data only; indexes, metadata, and replication would add to it.
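The arithmetic for the URL shortener can be checked with a short script. This is a minimal sketch in Python, assuming decimal units (1 GB = 10^9 bytes) as in the figures above; the constant and function names are illustrative and not part of any real service.

```python
# Assumptions taken from the URL shortener example.
TOTAL_USERS = 100_000_000
DAILY_ACTIVE_PERCENT = 10        # 10% of users are active each day
URLS_PER_USER_PER_DAY = 5
SHORT_URL_BYTES = 8
LONG_URL_BYTES = 100

GB = 10**9   # decimal gigabyte
TB = 10**12  # decimal terabyte


def url_storage_bytes(days: int) -> int:
    """Raw storage needed for the URLs created over `days` days."""
    dau = TOTAL_USERS * DAILY_ACTIVE_PERCENT // 100
    urls_per_day = dau * URLS_PER_USER_PER_DAY
    bytes_per_url = SHORT_URL_BYTES + LONG_URL_BYTES
    return urls_per_day * bytes_per_url * days


if __name__ == "__main__":
    print(f"Daily:   {url_storage_bytes(1) / GB:.1f} GB")        # 5.4 GB
    print(f"Monthly: {url_storage_bytes(30) / GB:.1f} GB")       # 162.0 GB
    print(f"3 years: {url_storage_bytes(36 * 30) / TB:.1f} TB")  # 5.8 TB
```

Keeping every assumption as a named constant makes it easy to rerun the estimate when an interviewer changes one of the inputs.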
What Else Do We Estimate in Capacity Planning?
- Read/Write Traffic: How many reads/writes per second?
- Database QPS (Queries per second): Important to size DB clusters
- Bandwidth: Needed for upload/download, API traffic, video content, etc.
- Peak vs Average Load: Systems must be designed to handle peak load
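To illustrate the traffic items above, daily volume can be converted to an average per-second rate and then scaled to a peak. The sketch below is in Python; the function names are hypothetical, and the peak factor of 2 used in the example is purely an assumption for illustration (real traffic patterns should drive that number).

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_qps(requests_per_day: float) -> float:
    """Average requests per second, assuming traffic is spread evenly."""
    return requests_per_day / SECONDS_PER_DAY


def peak_qps(requests_per_day: float, peak_factor: float) -> float:
    """Peak requests per second for an assumed peak-to-average ratio."""
    return average_qps(requests_per_day) * peak_factor


if __name__ == "__main__":
    writes_per_day = 50_000_000  # e.g. 50M new URLs per day
    print(f"Average: {average_qps(writes_per_day):.0f} writes/s")
    # peak_factor=2.0 is an assumed value, not a measured one.
    print(f"Peak:    {peak_qps(writes_per_day, 2.0):.0f} writes/s")
```

Sizing for the peak figure rather than the average is what keeps the system stable during traffic spikes.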
Example 2: Estimating Capacity for a Video Streaming Service
Imagine you’re designing a basic video streaming service.
Assumptions:
- 5 million daily active users
- Each user watches 2 videos/day
- Average video length = 10 minutes
- Bitrate = 2 Mbps
Step-by-Step Estimation:
- Total video minutes per day: 5M × 2 × 10 = 100M minutes
- Data streamed per video:
  - 10 minutes = 600 seconds
  - 600 sec × 2 Mbps = 1,200 Mb = 1.2 Gb (per video)
  - 1.2 Gb ÷ 8 bits per byte = 150 MB per video
- Daily data transfer: 5M users × 2 videos × 150 MB = 1.5 PB (petabytes) per day
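The bits-to-bytes conversion is the step most likely to go wrong here, so it is worth spelling out. A minimal Python sketch of the streaming arithmetic follows, assuming decimal units (1 Mbps = 10^6 bits per second); the function names are illustrative.

```python
BITS_PER_BYTE = 8


def video_size_bytes(duration_seconds: int, bitrate_bps: int) -> int:
    """Size of one video stream in bytes at a constant bitrate."""
    return duration_seconds * bitrate_bps // BITS_PER_BYTE


def daily_streamed_bytes(dau: int, videos_per_user: int, video_bytes: int) -> int:
    """Total bytes streamed per day across all active users."""
    return dau * videos_per_user * video_bytes


if __name__ == "__main__":
    size = video_size_bytes(duration_seconds=600, bitrate_bps=2_000_000)
    total = daily_streamed_bytes(5_000_000, 2, size)
    print(f"Per video: {size / 10**6:.0f} MB")    # 150 MB
    print(f"Per day:   {total / 10**15:.1f} PB")  # 1.5 PB
```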
Question:
How many CDN servers would we need if each server can handle 5 Gbps of traffic?
Answer:
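- 1.5 PB/day ÷ 86,400 seconds/day ≈ 17.36 GB/s
- 17.36 GB/s × 8 = 138.88 Gbps
- 138.88 ÷ 5 ≈ 27.8, rounded up to 28 CDN servers minimum. This is for average traffic; peak load would require more.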
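The server count can be derived in one function, rounding up because a fraction of a server cannot be provisioned. This is a rough Python sketch; `cdn_servers_needed` and its `peak_factor` parameter are hypothetical names, and the default of 1.0 reproduces the average-load figure above.

```python
import math

SECONDS_PER_DAY = 86_400
BITS_PER_BYTE = 8


def cdn_servers_needed(
    daily_bytes: int,
    server_capacity_bps: int,
    peak_factor: float = 1.0,  # assumed peak-to-average ratio
) -> int:
    """Minimum number of servers to carry the given daily traffic."""
    average_bps = daily_bytes * BITS_PER_BYTE / SECONDS_PER_DAY
    required_bps = average_bps * peak_factor
    return math.ceil(required_bps / server_capacity_bps)


if __name__ == "__main__":
    daily_bytes = 1_500_000_000_000_000  # 1.5 PB per day
    capacity = 5_000_000_000             # 5 Gbps per server
    print(cdn_servers_needed(daily_bytes, capacity))  # 28
```

Passing a `peak_factor` greater than 1 shows how quickly the fleet grows once peak traffic is taken into account; the right value depends on the service's real traffic pattern.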
How to Approach Capacity Estimation in Interviews
Always follow a logical, layered approach:
- Start with the total user base
- Estimate daily active users (DAU)
- Calculate requests per second (QPS)
- Estimate data size and storage needs
- Project bandwidth and compute capacity
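The layered approach above can be sketched as a single pipeline in which each figure is derived from the previous one. The Python below is one possible arrangement, not a standard tool: the `Workload` class, its field names, and the `estimate` function are all hypothetical, and the traffic is assumed to be spread evenly over the day.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class Workload:
    total_users: int
    active_percent: int            # share of users active on a given day
    actions_per_user_per_day: int  # e.g. URLs shortened, messages sent
    bytes_per_action: int


def estimate(w: Workload) -> dict[str, float]:
    """Walk the layers: users -> DAU -> requests -> QPS -> storage -> bandwidth."""
    dau = w.total_users * w.active_percent // 100
    requests_per_day = dau * w.actions_per_user_per_day
    storage_per_day = requests_per_day * w.bytes_per_action
    return {
        "dau": dau,
        "requests_per_day": requests_per_day,
        "avg_qps": requests_per_day / SECONDS_PER_DAY,
        "storage_bytes_per_day": storage_per_day,
        "avg_bytes_per_second": storage_per_day / SECONDS_PER_DAY,
    }


if __name__ == "__main__":
    # URL shortener inputs: 100M users, 10% active, 5 URLs/day, 108 bytes each.
    for name, value in estimate(Workload(100_000_000, 10, 5, 108)).items():
        print(f"{name}: {value:,.1f}")
```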
Quick Tip:
Assume 1 month = 30 days and 1 year = 365 days unless specified otherwise.
Practice Question:
You are building a chat app. If 2 million users each send 20 messages per day (average message size 200 bytes), how much storage is needed per day?
Answer:
- 2M × 20 = 40M messages
- 40M × 200 bytes = 8,000,000,000 bytes = 8 GB/day
Key Takeaways
- Capacity estimation is about making realistic assumptions and stating them explicitly
- Break the problem down into users, data size, and frequency of actions
- Sound estimates help ensure the performance and scalability of your system