
Availability vs Reliability in System Design



What is Availability?

Availability refers to the ability of a system to be operational and accessible when it is needed. It is usually represented as a percentage of uptime over a specific period. A highly available system ensures that users can access it without significant downtime.

How is Availability Measured?

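Availability is commonly expressed in terms of "nines", meaning the number of nines in the uptime percentage. For example, 99.9% uptime is called "three nines" and 99.99% is called "four nines".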
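To make the "nines" concrete, the following minimal sketch converts an availability target into the downtime it allows per year. It assumes a 365-day year; the function name `downtime_minutes_per_year` is illustrative and not part of any library.

```python
# Allowed downtime per year for a given availability percentage.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the maximum yearly downtime, in minutes, that still meets the target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% -> {minutes:.1f} minutes/year")
```

Under these assumptions, three nines (99.9%) allows about 8.76 hours of downtime per year, while four nines (99.99%) allows under an hour.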

Example: Availability in a Web Application

Imagine a basic e-commerce website. If it is hosted on a single server and that server crashes or restarts, the site becomes unavailable to users. If we add a load balancer and deploy multiple server instances across data centers, the application becomes more available: even if one server fails, traffic is routed to another.
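The failover behavior described above can be sketched as a tiny round-robin load balancer that skips unhealthy servers. This is a simplified illustration, not how any particular load balancer product works; the `LoadBalancer` class and the server names are hypothetical, and real systems detect failures with periodic health checks rather than manual `mark_down` calls.

```python
from itertools import cycle


class LoadBalancer:
    """Round-robin balancer that skips servers marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)

    def mark_down(self, server):
        # Stand-in for a failed health check.
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def route(self):
        """Return the next healthy server, or raise if none is left."""
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = LoadBalancer(["dc1-web", "dc2-web"])  # hypothetical server names
lb.mark_down("dc1-web")                    # simulate a crash in one data center
print(lb.route())                          # traffic still reaches dc2-web
```

With a single server, one `mark_down` call would make `route()` fail; with two or more, requests keep flowing as long as at least one instance stays healthy.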

What is Reliability?

Reliability is the probability that a system will run without failure over a specific period. A reliable system performs its intended function correctly and consistently under expected conditions. While availability focuses on being accessible, reliability focuses on being correct.

Example: Reliability in a Messaging System

Consider a chat application like WhatsApp. You may be able to open the app and send a message (availability), but if that message never reaches the recipient or arrives corrupted, the system is not reliable. A reliable messaging system ensures that once you send a message, it will reach the intended user exactly once, in the correct order, and without modification.
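One common way to get "exactly once, in the correct order" behavior on the receiving side is to number messages and have the receiver drop duplicates and hold back out-of-order arrivals. The sketch below assumes the sender attaches a sequence number starting at 0; the `OrderedReceiver` class is hypothetical and does not describe WhatsApp's actual protocol. Detecting modification (for example with checksums) is out of scope here.

```python
class OrderedReceiver:
    """Delivers each message once and in sequence order, despite retries or reordering."""

    def __init__(self):
        self.next_seq = 0    # sequence number expected next
        self.pending = {}    # out-of-order messages waiting for earlier ones
        self.delivered = []  # what the user actually sees

    def receive(self, seq, text):
        if seq < self.next_seq or seq in self.pending:
            return  # duplicate (for example, a sender retry): ignore it
        self.pending[seq] = text
        # Release every message that is now contiguous with what was delivered.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


rx = OrderedReceiver()
# Messages arrive reordered, and message 1 arrives twice.
for seq, text in [(1, "world"), (0, "hello"), (1, "world"), (2, "!")]:
    rx.receive(seq, text)
print(rx.delivered)  # ['hello', 'world', '!']
```

The network may deliver a message zero, one, or many times; the sender retries until acknowledged, and the receiver's bookkeeping turns that into a single in-order delivery.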

Availability vs Reliability: Key Differences

Aspect | Availability | Reliability
Definition | System is up and reachable | System works correctly without failure
Focus | Uptime | Correctness
Example | App is accessible 24/7 | Data is consistently processed and delivered without loss
Measurement | Uptime percentage (e.g. 99.99%) | Mean Time Between Failures (MTBF)
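As a rough illustration of the two measurements, the sketch below computes both from the same incident data. The 30-day window, failure count, and repair times are made-up numbers, and the helper names `mtbf_hours` and `uptime_percent` are illustrative only.

```python
def mtbf_hours(total_operating_hours: float, failure_count: int) -> float:
    """Mean Time Between Failures: operating time divided by the number of failures."""
    if failure_count == 0:
        raise ValueError("MTBF is undefined when no failures were observed")
    return total_operating_hours / failure_count


def uptime_percent(uptime_hours: float, downtime_hours: float) -> float:
    """Availability as the share of the period during which the system was up."""
    return 100 * uptime_hours / (uptime_hours + downtime_hours)


# Hypothetical 30-day window (720 hours): 3 failures, 2 hours of repair each.
failures = 3
downtime = failures * 2
uptime = 720 - downtime

print(f"MTBF: {mtbf_hours(uptime, failures):.0f} hours")        # 238 hours
print(f"Availability: {uptime_percent(uptime, downtime):.2f}%")  # 99.17%
```

The two numbers answer different questions: MTBF says how often the system fails, while the uptime percentage also depends on how long each outage lasts.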

Question: Can a System be Available but Not Reliable?

Yes. A system may be accessible (available) but deliver incorrect or inconsistent results (not reliable).

Example: A payment gateway is online and lets you make payments (available), but due to a bug, it charges the customer twice (not reliable).
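A widely used guard against this kind of double charge is an idempotency key: the client sends the same key on every retry of one payment, and the server charges only the first time it sees that key. The following is a minimal in-memory sketch under that assumption; the `PaymentGateway` class, its fields, and the key format are hypothetical and not modeled on any real provider's API.

```python
class PaymentGateway:
    """Toy gateway that charges at most once per idempotency key."""

    def __init__(self):
        self.balances = {}    # customer -> total amount charged
        self._processed = {}  # idempotency key -> receipt of the first attempt

    def charge(self, idempotency_key, customer, amount):
        # A repeated key means a retry of the same payment: return the
        # original receipt instead of charging again.
        if idempotency_key in self._processed:
            return self._processed[idempotency_key]
        self.balances[customer] = self.balances.get(customer, 0) + amount
        receipt = {"customer": customer, "amount": amount, "status": "charged"}
        self._processed[idempotency_key] = receipt
        return receipt


gateway = PaymentGateway()
gateway.charge("order-1001", "alice", 50)
gateway.charge("order-1001", "alice", 50)  # client retry after a timeout
print(gateway.balances["alice"])           # 50, not 100
```

The gateway stays just as available as before; what changes is that a retried request can no longer produce an incorrect result.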

Question: Can a System be Reliable but Not Available?

Yes. A system may deliver correct results when it runs (reliable), but it may not be accessible all the time (not available).

Example: A data processing system gives accurate results whenever it runs, but it is only available 4 hours a day due to maintenance or resource constraints.

Real-World Analogy: ATM Machines

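Think of an ATM: it is available if it is powered on and accepts your card whenever you visit, and it is reliable if it dispenses the correct amount and updates your account accurately every time.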

Strategies to Improve Availability

Strategies to Improve Reliability

Interview Insight

In system design interviews, you may be asked to build a highly available and reliable system. Clarify what the interviewer prioritizes:

"Is it more important that the system is always accessible, or that it never fails when processing data?"

Conclusion

Availability and reliability are two foundational pillars of system design. Both are crucial, but they solve different problems. Availability ensures users can access the system; reliability ensures the system works correctly. Depending on the system (e.g., banking vs social media), the balance between these two must be tailored carefully.

Quick Recap


