Availability refers to the ability of a system to be operational and accessible when it is needed. It is usually represented as a percentage of uptime over a specific period. A highly available system ensures that users can access it without significant downtime.
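Availability is commonly expressed in terms of "nines", where each additional nine cuts the allowed downtime by a factor of ten. For example, 99.9% ("three nines") allows about 8.76 hours of downtime per year, 99.99% ("four nines") about 52.6 minutes, and 99.999% ("five nines") about 5.3 minutes.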
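The downtime budget behind each level of "nines" is simple arithmetic. Here is a minimal sketch, assuming a 365-day year and that availability is measured over the whole year. The function name `allowed_downtime_seconds` is made up for this illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap, 365-day year


def allowed_downtime_seconds(availability_percent: float) -> float:
    """Return the maximum yearly downtime, in seconds, for an uptime percentage."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


levels = [
    ("two nines", 99.0),
    ("three nines", 99.9),
    ("four nines", 99.99),
    ("five nines", 99.999),
]

for label, percent in levels:
    minutes = allowed_downtime_seconds(percent) / 60
    print(f"{label} ({percent}%): about {minutes:.1f} minutes of downtime per year")
```

Each extra nine shrinks the budget tenfold, which is why moving from 99.9% to 99.99% usually calls for redundancy and automated failover rather than faster manual recovery.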
Imagine a basic e-commerce website. If this website is hosted on a single server and that server crashes or restarts, the site becomes unavailable to users. If we add a load balancer and deploy multiple server instances across data centers, the application becomes more available: even if one server fails, traffic is routed to another.
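The effect of adding redundant servers can be estimated with a short calculation. The sketch below is a simplification with two assumptions: server failures are independent, and the load balancer itself never fails. The function `combined_availability` is a hypothetical helper, not part of any library.

```python
def combined_availability(instance_availability: float, instances: int) -> float:
    """Availability of N redundant instances, assuming independent failures.

    The site is down only when every instance is down at the same time,
    so the combined unavailability is the product of the individual ones.
    """
    if instances < 1:
        raise ValueError("instances must be at least 1")
    return 1 - (1 - instance_availability) ** instances


# Each server is assumed to be up 99% of the time.
for n in (1, 2, 3):
    print(f"{n} server(s): {combined_availability(0.99, n):.6f}")
```

Under these assumptions, two 99% servers give roughly 99.99% combined availability. Real deployments tend to see less, because failures are often correlated (a shared network, a bad deploy, a regional outage).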
Reliability is the probability that a system will run without failure over a specific period. A reliable system performs its intended function correctly and consistently under expected conditions. While availability focuses on being accessible, reliability focuses on being correct.
Consider a chat application like WhatsApp. You may be able to open the app and send a message (availability), but if that message never reaches the recipient or arrives corrupted, the system is not reliable. A reliable messaging system ensures that once you send a message, it will reach the intended user exactly once, in the correct order, and without modification.
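One way a receiver can approximate "exactly once, in the correct order" is to tag every message with a sequence number, drop duplicates, and hold back messages that arrive early. The sketch below is an in-memory illustration only. The `OrderedReceiver` class and its fields are hypothetical and say nothing about how WhatsApp works.

```python
class OrderedReceiver:
    """Delivers each message once, in sequence order (illustrative only)."""

    def __init__(self):
        self.next_seq = 0    # sequence number we expect to deliver next
        self.pending = {}    # early arrivals, keyed by sequence number
        self.delivered = []  # messages handed to the user, in order

    def receive(self, seq, body):
        # Duplicate: either already delivered or already buffered.
        if seq < self.next_seq or seq in self.pending:
            return
        self.pending[seq] = body
        # Deliver every message that is now contiguous with what was delivered.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


receiver = OrderedReceiver()
receiver.receive(1, "world")  # arrives early, held back
receiver.receive(0, "hello")  # fills the gap, both are delivered
receiver.receive(1, "world")  # network retry, ignored
print(receiver.delivered)     # ['hello', 'world']
```

The sender would still need to retry until it receives an acknowledgement. The duplicate check is what makes those retries safe.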
Aspect | Availability | Reliability |
---|---|---|
Definition | System is up and reachable | System works correctly without failure |
Focus | Uptime | Correctness |
Example | App is accessible 24/7 | Data is consistently processed and delivered without loss |
Measurement | Uptime percentage (e.g. 99.99%) | Mean Time Between Failures (MTBF) |
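The two measurements in the last row are related. The sketch below rests on two assumptions that are not stated above: repairs take a known Mean Time To Repair (MTTR), and failures occur at a constant rate (the exponential model). Under those assumptions, steady-state availability is MTBF / (MTBF + MTTR), and the probability of running for a duration t without failure is exp(-t / MTBF).

```python
import math


def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time the system is up: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def reliability(duration_hours: float, mtbf_hours: float) -> float:
    """Probability of running duration_hours without failure.

    Assumes a constant failure rate, i.e. exponentially distributed
    time between failures.
    """
    return math.exp(-duration_hours / mtbf_hours)


# A system that fails on average every 999 hours and takes 1 hour to repair.
print(f"availability: {steady_state_availability(999, 1):.4f}")
print(f"chance of a failure-free 24 hours: {reliability(24, 999):.4f}")
```

Availability can be raised either by failing less often (higher MTBF) or by recovering faster (lower MTTR). Reliability improves only with the former.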
A system can be available without being reliable: it may be accessible (available) but deliver incorrect or inconsistent results (not reliable).
Example: A payment gateway is online and lets you make payments (available), but due to a bug, it charges the customer twice (not reliable).
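A common guard against double charges caused by retried requests is an idempotency key: the client sends the same key with every retry, and the server refuses to charge twice for that key. The sketch below is a hypothetical in-memory gateway, not a real payment API. The `PaymentGateway` class and its method names are assumptions made for illustration.

```python
class PaymentGateway:
    """Hypothetical in-memory gateway that charges at most once per key."""

    def __init__(self):
        self.charges = {}       # idempotency key -> amount already charged
        self.total_charged = 0  # running total, in cents

    def charge(self, idempotency_key, amount_cents):
        # A repeated key means this request was already processed.
        if idempotency_key in self.charges:
            return "duplicate_ignored"
        self.charges[idempotency_key] = amount_cents
        self.total_charged += amount_cents
        return "charged"


gateway = PaymentGateway()
print(gateway.charge("order-1001", 2500))  # charged
print(gateway.charge("order-1001", 2500))  # duplicate_ignored
print(gateway.total_charged)               # 2500
```

A production system would persist the keys and make the check and the insert atomic. The in-memory dictionary here only shows the idea.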
A system can also be reliable without being available: it may deliver correct results whenever it runs (reliable) but not be accessible all the time (not available).
Example: A data processing system that gives accurate results when run, but is only available 4 hours a day due to maintenance or resource constraints.
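Think of an ATM: it is available if it is powered on and accepts your card whenever you need cash, and it is reliable if it dispenses exactly the amount you requested and debits your account correctly.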
In system design interviews, you may be asked to build a highly available and reliable system. Clarify what the interviewer prioritizes:
"Is it more important that the system is always accessible, or that it never fails when processing data?"
Availability and reliability are two foundational pillars of system design. Both are important, but they solve different problems. Availability ensures users can access the system; reliability ensures the system works correctly. Depending on the system (e.g., banking vs social media), the balance between these two must be tailored carefully.