What is Cache Invalidation?
Cache invalidation is the process of updating or removing stale data from the cache to ensure that users get the most accurate and up-to-date information. It plays a critical role in maintaining consistency between the cache and the underlying database or source of truth.
Why Do We Need Cache Invalidation?
When data changes in the database, the cache may still hold the old value. If the cache isn't updated or invalidated, users may see outdated information, leading to data inconsistency. Cache invalidation helps solve this problem.
Common Cache Invalidation Strategies
There are three widely used strategies:
- Write-Through Caching
- Write-Around Caching
- Write-Back Caching
1. Write-Through Caching
In write-through caching, every write goes to both the cache and the database as part of the same operation, and the write is acknowledged only after both have succeeded. As long as both writes succeed, the cache holds the same value as the database.
Example:
Let’s say we are building an e-commerce product catalog. When an admin updates the price of a product:
- The new price is written to the cache
- The same price is also written to the database
This keeps both layers in sync. When users query for the product price, they get the latest value directly from the cache.
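A minimal sketch of the write-through flow for the price update, assuming plain Python dictionaries stand in for the cache and the database. The class and method names (WriteThroughCache, set_price, get_price) are hypothetical and chosen only for illustration, and the order of the two writes is a choice made here, not something the strategy prescribes.

```python
class WriteThroughCache:
    """Write-through: every write goes to the database and the cache together."""

    def __init__(self, database):
        self.database = database   # stand-in for a real database (a dict here)
        self.cache = {}            # stand-in for a real cache server

    def set_price(self, product_id, price):
        self.database[product_id] = price   # write to the source of truth
        self.cache[product_id] = price      # write the same value to the cache

    def get_price(self, product_id):
        if product_id in self.cache:        # cache hit
            return self.cache[product_id]
        price = self.database[product_id]   # cache miss: load and remember
        self.cache[product_id] = price
        return price


catalog = WriteThroughCache(database={"sku-1": 100})
catalog.set_price("sku-1", 120)             # admin updates the price
current_price = catalog.get_price("sku-1")  # served from the cache
```

After set_price returns, both layers hold the new price, so the read that follows is a cache hit and never touches the database.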
Question:
What happens if the database write fails but the cache update succeeds?
Answer:
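This leads to inconsistency: the cache serves a value that the database never stored. Because most caches cannot take part in a database transaction, production systems typically guard against this by writing to the database first and updating the cache only after that write succeeds, by retrying failed writes, or by deleting the cache entry when the database write fails so that the next read reloads the stored value.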
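One way to avoid the failure described above can be sketched as follows: write to the database first, retry a few times, and touch the cache only once the database write has succeeded. FlakyDatabase and write_through are hypothetical names, and the simulated ConnectionError is an assumption standing in for whatever error a real database client would raise.

```python
class FlakyDatabase:
    """Hypothetical database stub whose first `failures` writes raise an error."""

    def __init__(self, failures):
        self.failures = failures
        self.rows = {}

    def write(self, key, value):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("simulated database write failure")
        self.rows[key] = value


def write_through(cache, database, key, value, max_attempts=3):
    """Update the cache only after the database write has succeeded."""
    for attempt in range(1, max_attempts + 1):
        try:
            database.write(key, value)
            break
        except ConnectionError:
            if attempt == max_attempts:
                raise   # cache is untouched, so it still matches the database
    cache[key] = value


cache = {"sku-1": 100}
db = FlakyDatabase(failures=2)
write_through(cache, db, "sku-1", 120)   # succeeds on the third attempt

broken_db = FlakyDatabase(failures=5)    # fails more often than we retry
try:
    write_through(cache, broken_db, "sku-2", 50)
    write_failed = False
except ConnectionError:
    write_failed = True
```

With this ordering, a failed database write leaves the cache unchanged, so the two layers never disagree about a value that was not stored.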
Use Case:
Write-through caching fits read-heavy workloads where the cache must stay up-to-date and the extra latency of writing to two places on every update is acceptable.
2. Write-Around Caching
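In write-around caching, data is written directly to the database, bypassing the cache. When a read request arrives, the system checks the cache first; on a miss, it fetches the value from the database and stores it in the cache. If the cache already holds an older copy of the data, that entry must be deleted on write or left to expire, otherwise reads keep returning the stale value.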
Example:
Continuing with our product catalog, when a product price is updated:
- The price is updated only in the database
- The cache is not updated immediately
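When a customer views the product later, the system checks the cache. If no entry for the product is cached (because it was never cached, was evicted, or was invalidated by the update), the read is a miss: the system fetches the new price from the database and stores it in the cache for next time.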
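The read and write paths of write-around caching can be sketched as below, again with dictionaries standing in for the cache and the database. The names (WriteAroundCache, db_reads) are hypothetical. The sketch assumes the write also deletes any previously cached copy; without that step, or a TTL, a product that was cached before the update would keep returning its old price.

```python
class WriteAroundCache:
    """Write-around: writes go to the database only; reads fill the cache."""

    def __init__(self, database):
        self.database = database   # stand-in for a real database
        self.cache = {}            # stand-in for a real cache server
        self.db_reads = 0          # counts how often reads fall through

    def set_price(self, product_id, price):
        self.database[product_id] = price   # the write bypasses the cache
        self.cache.pop(product_id, None)    # assumption: drop any stale copy

    def get_price(self, product_id):
        if product_id in self.cache:        # cache hit
            return self.cache[product_id]
        self.db_reads += 1                  # cache miss: go to the database
        price = self.database[product_id]
        self.cache[product_id] = price      # store for the next read
        return price


catalog = WriteAroundCache(database={"sku-1": 100})
first = catalog.get_price("sku-1")    # miss: loaded from the database
catalog.set_price("sku-1", 120)       # database only; cached copy dropped
second = catalog.get_price("sku-1")   # miss again: reloads the new price
third = catalog.get_price("sku-1")    # hit: served from the cache
```

Only the first read after each update pays the cost of a database round trip; later reads are served from the cache.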
Question:
Why would we not write to cache immediately?
Answer:
To reduce write load on the cache, especially when write operations are frequent and the data is not read immediately after the update.
Use Case:
Write-around is useful when writes are frequent and reads are less frequent. It saves cache space and reduces cache write overhead.
3. Write-Back (Write-Behind) Caching
In write-back caching, data is written to the cache first and the database is updated later asynchronously. This improves write performance, but until the database catches up the cache holds the only copy of the newest data, which adds complexity and risk.
Example:
Let’s say we have a user profile system. When a user updates their name:
- The new name is written to the cache
- A background process writes the change to the database after a short delay
This gives a fast response to the user, but the database will have stale data for a short period.
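A minimal sketch of the write-back flow, assuming dictionaries for the cache and the database. To keep the example deterministic, the background process is replaced by an explicit flush() call; a real system would run it on a timer or a worker thread. All names (WriteBackCache, dirty, flush) are hypothetical.

```python
class WriteBackCache:
    """Write-back: writes land in the cache and reach the database on flush."""

    def __init__(self, database):
        self.database = database   # stand-in for a real database
        self.cache = {}            # stand-in for a real cache server
        self.dirty = set()         # keys changed in the cache but not yet saved

    def set_name(self, user_id, name):
        self.cache[user_id] = name   # fast path: cache only
        self.dirty.add(user_id)      # remember that the database is behind

    def get_name(self, user_id):
        if user_id in self.cache:
            return self.cache[user_id]
        name = self.database[user_id]
        self.cache[user_id] = name
        return name

    def flush(self):
        """Stand-in for the background process that persists pending writes."""
        for user_id in sorted(self.dirty):
            self.database[user_id] = self.cache[user_id]
        self.dirty.clear()


profiles = WriteBackCache(database={"u1": "Ann"})
profiles.set_name("u1", "Anna")
name_seen_by_reader = profiles.get_name("u1")     # new name, from the cache
name_in_db_before_flush = profiles.database["u1"] # database is still behind
profiles.flush()
name_in_db_after_flush = profiles.database["u1"]  # database has caught up
```

Between set_name and flush, the new name exists only in the cache, which is exactly the window in which a crash would lose the update.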
Question:
What happens if the system crashes before writing to the database?
Answer:
Any changes that were still only in the cache are lost. That’s why write-back caching is usually used with a persistent cache and retry mechanisms to ensure durability.
Use Case:
Write-back caching is great for high-throughput systems where speed is critical and occasional delays in persistence are acceptable.
Which Strategy Should You Use?
It depends on your use case:
- Write-through: Use when consistency is the top priority.
- Write-around: Use when you want to reduce cache write load.
- Write-back: Use when write performance is critical and occasional delays are acceptable.
Bonus: Time-To-Live (TTL) and Expiry
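Many systems also use TTL (Time-To-Live) to auto-expire cache entries. For example, with a 10-minute TTL, an entry is discarded 10 minutes after it is cached, so the next read reloads it from the database and the cached value is never more than about 10 minutes out of date.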
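TTL-based expiry can be sketched with a small in-memory cache that stores an expiry time next to each value. TTLCache and its clock parameter are hypothetical; the clock is injectable only so the example can simulate the passage of 10 minutes without waiting. Real cache servers typically provide expiry as a built-in feature.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire ttl_seconds after being set."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock    # injectable so expiry can be simulated
        self.entries = {}     # key -> (value, expires_at)

    def set(self, key, value):
        self.entries[key] = (value, self.clock() + self.ttl_seconds)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None                  # never cached, or already removed
        value, expires_at = entry
        if self.clock() >= expires_at:   # too old: treat as a miss
            del self.entries[key]
            return None
        return value


now = [0.0]                                     # fake clock, in seconds
cache = TTLCache(ttl_seconds=600, clock=lambda: now[0])
cache.set("sku-1", 120)

now[0] = 599.0
before_expiry = cache.get("sku-1")              # still within the 10 minutes

now[0] = 600.0
after_expiry = cache.get("sku-1")               # expired: caller must reload
```

A miss after expiry is what forces the caller back to the database, which is how the refreshed value ends up in the cache.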
Question:
Is TTL a form of cache invalidation?
Answer:
Yes. TTL removes entries after a fixed time whether or not the underlying data has changed, which bounds how long stale data can be served. That makes it a passive cache invalidation strategy.
Summary
- Cache invalidation ensures that users get fresh data from the cache.
- Write-through keeps cache and DB in sync but is slower on write.
- Write-around skips the cache during writes and populates it on reads.
- Write-back writes to cache first and DB later for fast writes.
- TTL-based expiration is another way to invalidate stale data.
Interview Tip
In system design interviews, always discuss cache invalidation strategy when suggesting a caching solution. Show awareness of trade-offs between performance and consistency.