Monitoring and logging are essential components of system design. They help engineers understand how a system behaves in real time, detect issues early, and troubleshoot problems effectively. For any large-scale or production application, observability is critical for reliability and user satisfaction.
Monitoring is the process of collecting, analyzing, and using metrics to track the performance, health, and availability of a system. It helps answer questions like:
Logging refers to the recording of events and messages that occur within an application or system. Logs contain detailed information such as error traces, user actions, system warnings, and other operational data that help in debugging and analysis.
Consider a web application that starts responding slowly or returns error pages. Without logs and monitoring data, it would be very difficult to understand what went wrong. But with proper logs and a monitoring dashboard, you can quickly identify:
Let’s say you’ve built a food delivery app like Zomato or Swiggy. Your backend has multiple components like:
To monitor the health of this system, you might want to track metrics like:
These metrics would typically be collected and visualized in dashboards using tools such as Prometheus with Grafana, or Datadog.
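As a rough sketch of what sits behind such a dashboard, the Python below keeps in-memory counters for one hypothetical metric source, payment attempts, and derives a success rate and an average latency from them. The class `PaymentMetrics` and its fields are illustrative assumptions; in practice these values would be exported to a tool such as Prometheus or Datadog rather than computed by hand.

```python
from dataclasses import dataclass


@dataclass
class PaymentMetrics:
    """Hypothetical in-memory counters for a single service (illustration only)."""

    total: int = 0
    succeeded: int = 0
    total_latency_ms: float = 0.0

    def record(self, success: bool, latency_ms: float) -> None:
        # Every payment attempt updates the counters exactly once.
        self.total += 1
        if success:
            self.succeeded += 1
        self.total_latency_ms += latency_ms

    def success_rate(self) -> float:
        # Percentage of successful attempts; an idle service reports 100%
        # (a design assumption so that "no traffic" is not treated as failure).
        return 100.0 if self.total == 0 else 100.0 * self.succeeded / self.total

    def avg_latency_ms(self) -> float:
        return 0.0 if self.total == 0 else self.total_latency_ms / self.total


if __name__ == "__main__":
    metrics = PaymentMetrics()
    for ok, latency in [(True, 120.0), (True, 80.0), (False, 5000.0), (True, 100.0)]:
        metrics.record(ok, latency)
    print(f"success rate: {metrics.success_rate():.1f}%")
    print(f"avg latency: {metrics.avg_latency_ms():.1f} ms")
```

A dashboard would plot numbers like these over time; a single slow, failed call (the 5000 ms one) is enough to pull both the success rate and the average latency visibly off their normal values.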
Suppose the payment success rate suddenly drops. A monitoring alert would be triggered once the success rate falls below a set threshold (say, 95%), and the operations team would investigate. Checking the Payment Service logs, they might find that a third-party API used for card payments is down. This quick diagnosis is only possible because of effective monitoring and logging.
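The alert rule in this scenario can be sketched as a threshold check over recent outcomes. This is a minimal illustration, not how any particular monitoring tool implements alerting: the 95% threshold comes from the text, while the class `SuccessRateAlert`, the 100-outcome window, and the `MIN_SAMPLES` guard are hypothetical choices.

```python
from collections import deque

SUCCESS_RATE_THRESHOLD = 95.0  # percent, the example threshold from the text
MIN_SAMPLES = 20  # hypothetical guard so a handful of requests cannot trigger a page


class SuccessRateAlert:
    """Hypothetical alert rule over the most recent `window` payment outcomes."""

    def __init__(self, window: int = 100) -> None:
        # Oldest outcomes fall off automatically once the deque is full.
        self.outcomes = deque(maxlen=window)

    def observe(self, success: bool) -> None:
        self.outcomes.append(success)

    def success_rate(self) -> float:
        if not self.outcomes:
            return 100.0
        return 100.0 * sum(self.outcomes) / len(self.outcomes)

    def should_alert(self) -> bool:
        # Only alert once enough data exists and the rate is below the threshold.
        if len(self.outcomes) < MIN_SAMPLES:
            return False
        return self.success_rate() < SUCCESS_RATE_THRESHOLD


if __name__ == "__main__":
    rule = SuccessRateAlert(window=100)
    for _ in range(90):
        rule.observe(True)
    for _ in range(10):
        rule.observe(False)  # e.g. the third-party card API starts failing
    if rule.should_alert():
        print(f"ALERT: payment success rate {rule.success_rate():.1f}% < {SUCCESS_RATE_THRESHOLD}%")
```

Evaluating over a sliding window rather than all-time totals means the alert reacts to a recent outage instead of being diluted by weeks of healthy history.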
For beginners, it’s helpful to understand what types of events should be logged. Examples:
Let’s say a customer places an order and the order fails. The logs should contain entries like:
[INFO] Order placed for userID=123 at 10:30:15
[ERROR] PaymentService timeout after 5000ms
[WARN] Retrying payment process
[ERROR] OrderService failed to confirm order for userID=123
These logs help engineers pinpoint where the issue occurred — whether in the PaymentService or OrderService — and take necessary actions.
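Entries like the ones above can be produced with Python's standard `logging` module. This is a minimal sketch, not the application's actual code: `place_order`, `charge`, and `PaymentTimeout` are hypothetical stand-ins, and the payment call is rigged to always time out so the failure path gets logged. Note that Python's built-in level name is `WARNING` rather than `WARN`.

```python
import logging
import sys

log = logging.getLogger("OrderService")
log.setLevel(logging.INFO)


class PaymentTimeout(Exception):
    """Hypothetical error raised when the payment provider does not answer in time."""


def charge(user_id: int) -> None:
    # Hypothetical stand-in for a real payment call; always times out in this demo.
    raise PaymentTimeout("PaymentService timeout after 5000ms")


def place_order(user_id: int, max_retries: int = 1) -> bool:
    log.info("Order placed for userID=%s", user_id)
    for attempt in range(max_retries + 1):
        try:
            charge(user_id)
            return True
        except PaymentTimeout as exc:
            log.error("%s", exc)
            if attempt < max_retries:
                log.warning("Retrying payment process")
    log.error("OrderService failed to confirm order for userID=%s", user_id)
    return False


if __name__ == "__main__":
    # "[LEVEL] message" mirrors the sample entries; add %(asctime)s for timestamps.
    logging.basicConfig(stream=sys.stdout, format="[%(levelname)s] %(message)s")
    place_order(123)
```

Because every step (order placed, timeout, retry, final failure) leaves a line at an appropriate level, reading the log top to bottom reconstructs exactly where the request broke down.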
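Should you log everything? No. Logging everything can hurt performance and make log storage unmanageable. Instead, log selectively, focusing on events that help with debugging and analysis, such as errors, warnings, and important user or system actions.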
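One common way to keep log volume under control is a level threshold. The sketch below uses Python's standard `logging` module with a hypothetical `checkout` logger set to `INFO`, so `DEBUG` records are discarded before they are formatted or stored; it also shows the `isEnabledFor` guard for skipping expensive message construction. The helper `expensive_cart_dump` is made up for illustration.

```python
import io
import logging

# Send records to an in-memory buffer so the effect of the level is easy to see.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("[%(levelname)s] %(message)s"))

log = logging.getLogger("checkout")
log.addHandler(handler)
log.propagate = False  # keep this demo independent of any root logger configuration
log.setLevel(logging.INFO)  # DEBUG records are dropped at this point


def expensive_cart_dump() -> str:
    # Hypothetical helper that would be costly to build on every request.
    return "cart=" + ",".join(str(i) for i in range(1000))


log.debug("Cache lookup for key=%s", "cart:123")  # filtered out by the INFO level
if log.isEnabledFor(logging.DEBUG):
    # The guard skips building the expensive message when DEBUG is disabled.
    log.debug(expensive_cart_dump())
log.info("Checkout started for userID=%s", 123)
log.error("Checkout failed for userID=%s", 123)

print(buffer.getvalue(), end="")
```

Only the `INFO` and `ERROR` lines reach the buffer. Passing arguments separately (`"%s", 123`) rather than pre-formatting the string also defers formatting until a record is known to be kept.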
For centralized, searchable logs, a common choice is the ELK stack (Elasticsearch, Logstash, Kibana) or its variant, the EFK stack (Elasticsearch, Fluentd, Kibana).
Imagine you're debugging why users are facing random logouts. You search the logs in Kibana using a query like:
level: "ERROR" AND message: "user session expired"
The search shows that a Redis instance (used to store sessions) had restarted, causing session loss. This root cause would be hard to find without centralized, searchable logs.
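To make the query's logic concrete, here is a rough in-memory analogue in Python over made-up structured log records. It is an approximation, not how Kibana works: Elasticsearch matches phrases on analyzed tokens rather than raw substrings. The `search` function, the record fields, and the sample entries are all hypothetical.

```python
from typing import Iterable, Iterator

Record = dict[str, str]


def search(records: Iterable[Record], level: str, phrase: str) -> Iterator[Record]:
    """Yield records at `level` whose message contains `phrase` (case-insensitive).

    A rough analogue of: level: "ERROR" AND message: "user session expired"
    """
    needle = phrase.lower()
    for record in records:
        if record.get("level") == level and needle in record.get("message", "").lower():
            yield record


# Made-up sample records for illustration only.
logs: list[Record] = [
    {"timestamp": "2025-05-03T09:59:58Z", "level": "WARN", "service": "Redis",
     "message": "Instance restarted"},
    {"timestamp": "2025-05-03T10:00:03Z", "level": "ERROR", "service": "AuthService",
     "message": "User session expired: session not found"},
    {"timestamp": "2025-05-03T10:00:04Z", "level": "INFO", "service": "AuthService",
     "message": "User session expired after idle timeout"},
]

for hit in search(logs, "ERROR", "user session expired"):
    print(hit["timestamp"], hit["service"], hit["message"])
```

Only the `AuthService` error matches. Lining its timestamp up against the Redis restart a few seconds earlier is the correlation step that points to the root cause.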
What is structured logging? Structured logging records log data in a consistent, machine-readable format (such as JSON), which makes logs easier to parse, filter, and search. For example:
{
  "timestamp": "2025-05-03T10:15:00Z",
  "level": "ERROR",
  "service": "PaymentService",
  "message": "Failed to process payment",
  "order_id": "ORD1007",
  "user_id": "U512"
}
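A JSON entry of this shape can be emitted with a custom formatter for Python's standard `logging` module. This is a minimal sketch under a few assumptions: the logger name doubles as the `service` field, and per-request fields such as `order_id` are passed through a hypothetical `context` key in `extra`. Exception tracebacks are not included here.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Renders each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc)
            .strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "service": record.name,  # assumption: logger name == service name
            "message": record.getMessage(),
        }
        # Extra fields arrive via extra={"context": {...}} (a convention assumed here).
        entry.update(getattr(record, "context", {}))
        return json.dumps(entry)


handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
log = logging.getLogger("PaymentService")
log.addHandler(handler)
log.setLevel(logging.INFO)

if __name__ == "__main__":
    log.error("Failed to process payment",
              extra={"context": {"order_id": "ORD1007", "user_id": "U512"}})
```

Because each line is valid JSON, a log shipper such as Logstash or Fluentd can index `order_id` and `user_id` as separate fields instead of parsing free text.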
Aspect | Monitoring | Logging |
---|---|---|
Purpose | Track system health and performance | Record detailed event history |
Data Type | Numerical metrics | Textual logs |
Examples | CPU usage, error rate | Stack trace, request logs |
Tools | Prometheus, Datadog | ELK Stack, Fluentd |
Monitoring and logging are foundational to maintaining healthy and reliable systems. For a beginner, understanding these tools early helps build better systems that are easier to debug and scale. Always start by defining what you want to measure and what kind of problems you want to catch — then build your logging and monitoring strategy around that.