## 3.1 Caching Fundamentals
Caching is the practice of storing copies of frequently accessed data in a faster storage layer so your system does not have to fetch or compute it from scratch every time someone asks for it.
**Why Caching Matters: Reducing Latency and Load**
Your database might take 50 milliseconds to run a query. Your cache can return the same result in under 1 millisecond. That is a 50x speed improvement for a single request.
Multiply that across millions of requests per day, and caching becomes the difference between a system that feels instant and one that feels sluggish.
But speed is only half the story.
Caching also protects your backend from being crushed under load.
If your application serves 10,000 requests per second and 80% of those requests ask for the same popular data, you have two choices. You can let your database handle all 10,000 requests.
Or you can serve 8,000 of them from cache and let your database handle only 2,000.
The second option means your database handles only one-fifth of the load, needs fewer resources, costs less money, and stays healthy under traffic spikes.
This is why every major system you have ever used relies on caching. Your social media feed is cached. Your search results are cached.
The product page you browsed on an e-commerce site is cached.
Without caching, most of the internet would feel like it is running on dial-up.
**Cache Hit, Cache Miss, and Hit Ratios**
When a request comes in and the cache already has the answer, that is a cache hit. The data gets returned immediately from the fast layer, and the slower backend is never touched.
When the cache does not have the answer, that is a cache miss.
The system has to go to the original source (usually a database), fetch the data, return it to the caller, and typically store a copy in the cache so the next request for the same data is a hit.
The cache hit ratio is the percentage of requests served from cache.
If your cache serves 950 out of every 1,000 requests, your hit ratio is 95%. That number is one of the most revealing metrics in any system.
A high hit ratio (above 90%) means your cache is doing its job well.
A low hit ratio means you are paying for a cache that is not helping much, and you need to investigate why.
Common reasons for a low hit ratio include caching data that is rarely requested again, setting TTLs (time-to-live) too short so entries expire before they get reused, or having a cache that is too small to hold enough entries.
**Types of Caches: In-Process, Distributed, Multi-Tier**
Not all caches live in the same place or serve the same purpose.
An in-process cache lives inside the application's own memory. It is the fastest option because there is zero network overhead.
A HashMap or a library like Guava (Java) or lru-cache (Node.js) can serve as an in-process cache.
The limitation is scope: each application instance has its own cache, and they are not shared.
If you have 10 servers, each server maintains its own separate cache, which can lead to inconsistent data across instances and wasted memory storing duplicate entries.
A distributed cache runs as a separate service that all your application instances share. Redis and Memcached are the two most common choices.
Every server talks to the same cache, so data is consistent and memory is used efficiently.
The trade-off is a network hop: a distributed cache call takes 1 to 5 milliseconds instead of the microseconds you get with an in-process cache. But that is still dramatically faster than hitting a database.
A multi-tier cache combines both. Your application first checks its local in-process cache (microseconds).
If that misses, it checks the distributed cache (milliseconds).
If that misses too, it goes to the database (tens of milliseconds). Each tier acts as a filter, catching progressively more requests before they reach the slowest layer.
| Cache Type | Speed | Shared Across Instances? | Capacity | Best For |
|---|---|---|---|---|
| In-process | Microseconds | No | Limited to instance memory | Extremely hot data, config, small lookup tables |
| Distributed | 1-5 ms | Yes | Large (cluster of cache nodes) | Session data, API responses, database query results |
| Multi-tier | Fastest available | Partially | Combined | High-traffic systems needing both speed and consistency |
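The multi-tier lookup described above can be sketched in a few lines of Python. Plain dicts stand in for the local cache, the distributed cache, and the database (all names here are illustrative, not real services); `get` walks the tiers from fastest to slowest and backfills the faster tiers on the way out:

```python
# Tier 1: in-process (microseconds). Tier 2: stand-in for a distributed
# cache like Redis. Tier 3: stand-in for the database.
local_cache = {}
shared_cache = {}
database = {"user:1": {"name": "Ada"}}

def get(key):
    if key in local_cache:                 # tier 1 hit
        return local_cache[key]
    if key in shared_cache:                # tier 2 hit: also warm tier 1
        local_cache[key] = shared_cache[key]
        return local_cache[key]
    value = database.get(key)              # tier 3: slowest path
    if value is not None:
        shared_cache[key] = value          # populate both cache tiers
        local_cache[key] = value
    return value
```

Each tier that misses gets populated from the tier below it, which is exactly how the filtering effect described above builds up.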
**Interview-Style Question**
> Q: Your application has 20 instances behind a load balancer. You notice that each instance caches user profiles locally, but users report seeing stale data after updating their profile. What is happening and how do you fix it?
> A: The user's update request hits one server, which updates the database and its local cache. But the other 19 servers still have the old version in their local caches. Subsequent requests that land on those servers return stale data. The fix is to either move the user profile cache to a distributed cache (like Redis) that all instances share, or implement a cache invalidation broadcast so that when one instance updates a profile, it notifies all other instances to evict that entry from their local caches.
_Multi-Tier Caching Flow_
### KEY TAKEAWAYS
* Caching stores copies of frequently accessed data in a faster layer, reducing both latency and backend load.
* Cache hit ratio is your most telling metric. Aim for 90% or above. Investigate if it drops significantly.
* In-process caches are fastest but not shared. Distributed caches are shared but require a network hop. Multi-tier caches give you the best of both.
* Every major internet application depends on caching. If your system does not use caching yet, it is almost certainly leaving performance on the table.
## 3.2 Where to Cache
Caching is not a single layer you bolt onto your system in one place.
Effective caching happens at multiple points along the path a request travels, from the user's device all the way down to the database.
Each layer catches a different category of repeated work.
**Client-Side Caching (Browser, Mobile)**
The closest cache to your user is the one on their own device.
Browsers cache static assets like images, JavaScript files, and CSS stylesheets based on HTTP headers.
When you set a `Cache-Control: max-age=86400` header on a stylesheet, the browser stores that file locally for 24 hours. Any page that references the same stylesheet loads it from disk instead of re-downloading it from your server.
Mobile apps do the same thing, often with more control. You can cache API responses locally so the app remains usable even when the network drops.
A news app might cache the latest 50 articles so users can read them on the subway without a connection.
Client-side caching is powerful because it eliminates the network request entirely.
No DNS lookup, no TCP handshake, no server processing.
The data is already on the device. The trade-off is that you have limited control over the client's cache.
Once you send a `max-age` header, you cannot force the browser to fetch a new version until the TTL expires (though cache-busting techniques like adding version hashes to filenames help).
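The cache-busting technique mentioned above is easy to sketch: embed a short content hash in the asset filename so that changing the file changes its URL, which sidesteps any long-lived browser cache. This is a generic illustration, not tied to any particular build tool:

```python
import hashlib

def hashed_filename(name: str, content: bytes, length: int = 8) -> str:
    """Return a filename with a content hash, e.g. styles.css -> styles.ab12cd34.css."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    stem, dot, ext = name.rpartition(".")
    # Insert the hash before the extension; append it if there is no extension.
    return f"{stem}.{digest}.{ext}" if dot else f"{name}.{digest}"
```

Because the hashed filename only changes when the content changes, you can safely serve these files with a very long `max-age`.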
**CDN Caching**
A CDN (Content Delivery Network) caches your content on servers distributed across the globe. When a user in Tokyo requests an image hosted on your server in Virginia, the CDN serves a cached copy from a server in Tokyo instead.
The user gets the image in 20 milliseconds instead of 200.
CDNs handle static content brilliantly: images, videos, fonts, JavaScript bundles, CSS files. Modern CDNs can also cache API responses and even generate dynamic content at the edge using edge functions.
CDN caching is covered in depth in Part II, Lesson 5.
For now, think of the CDN as the first server-side layer that intercepts requests before they ever reach your infrastructure.
**Web Server / Reverse Proxy Caching**
A reverse proxy like Nginx or Varnish can cache HTTP responses and serve them directly without forwarding the request to your application servers.
If 500 users request the same product page within a minute, the reverse proxy fetches it from your app once, caches the response, and serves the other 499 requests from its own memory.
This works exceptionally well for content that is the same for every user: homepages, product listings, blog posts, public API endpoints. It works less well for personalized content where each user sees different data.
**Application-Level Caching (Redis, Memcached)**
This is the cache that most engineers think of first. Your application code explicitly stores and retrieves data from a cache service like Redis or Memcached.
Redis is the more feature-rich option. It supports strings, lists, sets, sorted sets, hashes, and more. It can persist data to disk, supports pub/sub messaging, and offers built-in TTL management.
Memcached is simpler and focused purely on key-value caching with exceptional performance for that specific use case.
Application-level caching gives you the most control. You decide exactly what to cache, when to cache it, and when to invalidate it. You might cache the result of an expensive database query, the output of a computation-heavy function, or an external API response that changes infrequently.
| Technology | Data Structures | Persistence | Best For |
|---|---|---|---|
| Redis | Strings, lists, sets, sorted sets, hashes, streams | Yes (optional) | Versatile caching, sessions, leaderboards, rate limiting |
| Memcached | Strings only (key-value) | No | Simple, high-throughput key-value caching |
**Database Query Caching**
Some databases have their own built-in cache. MySQL, for example, shipped a query cache that stored the results of SELECT queries: if the exact same query ran again and the underlying tables had not changed, the cached result was returned instantly. (That feature scaled so poorly under concurrent load that it was deprecated in MySQL 5.7 and removed entirely in 8.0.)
The usefulness of database query caching varies. It works well for read-heavy workloads with repetitive queries. It breaks down for write-heavy workloads because every write invalidates cached queries for the affected tables, leading to a cache that constantly fills and flushes.
Most production systems rely on application-level caching (Redis/Memcached) rather than database query caching because it gives finer control over what gets cached and when entries expire.
**Edge Caching**
Edge caching pushes computation and data storage closer to the user, often to the same locations where CDN nodes live.
Services like Cloudflare Workers and AWS Lambda@Edge let you run code at the edge, so you can cache and serve dynamic content from a location that is physically close to the user.
Edge caching is especially valuable for applications with a global user base where every millisecond of latency matters.
A gaming leaderboard, a personalized product recommendation, or a user's notification count can all be computed and cached at the edge.
### KEY TAKEAWAYS
* Client-side caching eliminates network requests entirely. Use HTTP cache headers and local storage strategically.
* CDN caching serves content from locations geographically close to the user. Ideal for static assets and increasingly for dynamic content.
* Reverse proxy caching handles identical requests at the server level, shielding your application from redundant work.
* Application-level caching with Redis or Memcached gives you the most control over what is cached and when.
* Edge caching brings computation and data closer to users for latency-sensitive global applications.
* The most effective systems cache at multiple layers simultaneously. Each layer catches traffic that the layers above it missed.
## 3.3 Cache Strategies
Knowing where to cache is only half the equation. You also need to decide how your cache interacts with your database.
When does data get loaded into the cache?
When does it get updated?
Who is responsible for keeping the cache and database in sync?
Five strategies answer these questions. Each one makes a different trade-off between read speed, write speed, data freshness, and implementation complexity.
**Cache-Aside (Lazy Loading)**
Cache-aside is the most commonly used caching strategy.
The application manages the cache directly. It works in three steps.
When a request comes in, the application checks the cache first.
If the data is there (cache hit), it returns the cached value. If the data is not there (cache miss), the application queries the database, stores the result in the cache, and then returns it.
The cache only gets populated on demand, which is why this pattern is also called lazy loading. Data enters the cache only when someone asks for it. This means the cache never fills up with data that nobody requests.
The downside is that the first request for any piece of data always hits the database.
If your application experiences a cold start (an empty cache after a restart or deployment), the initial wave of requests all miss the cache and hit the database simultaneously.
Cache-aside also puts the burden on your application code.
Every read path needs cache-check logic.
Every write path needs to decide whether to update or invalidate the cache.
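The three-step read path and the delete-on-write write path can be sketched directly in application code. Here `db` and `cache` are plain dicts standing in for a real database and a cache service such as Redis:

```python
db = {"product:7": {"price": 19.99}}   # stand-in for the database
cache = {}                             # stand-in for Redis/Memcached

def read(key):
    if key in cache:                   # 1. check the cache first
        return cache[key]
    value = db.get(key)                # 2. miss: fall back to the database
    if value is not None:
        cache[key] = value             # 3. populate the cache for next time
    return value

def write(key, value):
    db[key] = value                    # update the source of truth
    cache.pop(key, None)               # invalidate: next read refetches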
**Read-Through Cache**
Read-through looks similar to cache-aside from the application's perspective, but the cache itself is responsible for loading data on a miss.
The application always asks the cache. If the cache has the data, it returns it.
If not, the cache fetches it from the database, stores it, and returns it.
The difference is subtle but meaningful.
With cache-aside, your application code contains the logic for fetching from the database on a miss.
With read-through, that logic lives in the cache layer. Your application code only knows about the cache, not the database.
This simplifies your application but requires a cache layer that supports data-loading callbacks or has built-in database integration.
Some cache frameworks and managed services support this natively.
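One way to sketch the read-through pattern is a cache class that owns a loader callback, so the calling code never touches the database directly. The class and the `loader` callable are illustrative, not a specific framework's API:

```python
class ReadThroughCache:
    """Cache that fetches from the data source itself on a miss."""

    def __init__(self, loader):
        self._store = {}
        self._loader = loader          # any callable: key -> value

    def get(self, key):
        if key not in self._store:     # miss: the CACHE fetches, not the app
            self._store[key] = self._loader(key)
        return self._store[key]

db = {"cfg:theme": "dark"}             # stand-in data source
cache = ReadThroughCache(loader=db.get)
```

The application only ever calls `cache.get(...)`; the database-fetching logic lives entirely behind the cache interface.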
**Write-Through Cache**
Write-through ensures that every write goes to both the cache and the database.
When the application writes data, the cache receives the write first (or simultaneously), and then the data is written to the database.
No write is considered complete until both the cache and the database have been updated.
The advantage is consistency.
The cache is always up to date with the database. Read requests never return stale data because the cache was updated at write time.
The disadvantage is write latency.
Every write operation now has two stops instead of one, and the write is not acknowledged until both succeed.
For write-heavy systems, this extra latency can add up. Write-through also caches data that might never be read, wasting cache memory on entries nobody requests.
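A minimal write-through sketch looks like this, with a dict standing in for the database. The point is that the write method does not return until both stops have been made:

```python
class WriteThroughCache:
    """Every write updates the cache AND the database before acknowledging."""

    def __init__(self, db):
        self._store = {}
        self._db = db                  # stand-in for the real database

    def write(self, key, value):
        self._store[key] = value       # update the cache...
        self._db[key] = value          # ...and the database, synchronously

    def read(self, key):
        return self._store.get(key)    # cache is always current after writes

db = {}
cache = WriteThroughCache(db)
```

A production version would also need failure handling (for example, rolling back or retrying when one of the two writes fails), which is where most of the real complexity lives.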
**Write-Behind (Write-Back) Cache**
Write-behind flips the write-through model.
The application writes to the cache, and the cache acknowledges the write immediately. The cache then asynchronously writes the data to the database in the background, often batching multiple writes together for efficiency.
This dramatically reduces write latency because the application only waits for the cache write (microseconds), not the database write (milliseconds).
The batching also reduces the total number of database writes, which is valuable for systems with high write volume.
The risk is data loss.
If the cache crashes before it flushes writes to the database, those writes are gone. You are trading durability for speed. This trade-off is acceptable for data like view counts, analytics events, or session activity where losing a few seconds of data is tolerable. It is not acceptable for financial transactions or user-generated content.
**Refresh-Ahead Cache**
Refresh-ahead proactively reloads cache entries before they expire.
The cache tracks which entries are about to hit their TTL and fetches fresh data from the database in the background, so the next request gets a cache hit with up-to-date data instead of waiting for a miss-then-fetch cycle.
This eliminates the latency spike that happens when a popular cache entry expires and the next request has to wait for a database query.
The challenge is predicting which entries will actually be requested again.
If you refresh an entry that nobody asks for, you wasted a database read.
Refresh-ahead works best for data with predictable access patterns, like a dashboard that refreshes every 30 seconds or a homepage feed that millions of users load repeatedly.
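Refresh-ahead can be sketched by storing an expiry time with each entry and reloading any entry that a read finds inside its "refresh window." For simplicity this sketch refreshes synchronously inside `get`; a real implementation would do it in a background task. The `loader` callable is a stand-in for the database query:

```python
import time

class RefreshAheadCache:
    def __init__(self, loader, ttl=300, refresh_window=30):
        self._store = {}               # key -> (value, expires_at)
        self._loader = loader
        self._ttl = ttl
        self._refresh_window = refresh_window

    def get(self, key):
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is None or now >= entry[1]:          # miss or expired
            return self._load(key, now)
        if now >= entry[1] - self._refresh_window:    # nearing expiry:
            return self._load(key, now)               # refresh proactively
        return entry[0]

    def _load(self, key, now):
        value = self._loader(key)
        self._store[key] = (value, now + self._ttl)
        return value

db = {"feed:home": "v1"}
cache = RefreshAheadCache(loader=db.get, ttl=300, refresh_window=30)
```

Reads that land inside the last 30 seconds of an entry's TTL trigger a reload, so a steadily popular key never actually expires.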
| Strategy | Who Loads Cache? | Write Path | Read Latency | Write Latency | Risk |
|---|---|---|---|---|---|
| Cache-aside | Application (on miss) | App updates DB, invalidates cache | Miss on first request | Normal | Cold start stampede |
| Read-through | Cache (on miss) | App writes to DB separately | Miss on first request | Normal | Stale reads until invalidated |
| Write-through | Cache + DB on every write | Cache and DB updated together | Always fast (cache is current) | Slower (dual write) | Wasted cache space |
| Write-behind | Cache immediately, DB later | Cache writes to DB asynchronously | Always fast | Very fast | Data loss if cache crashes |
| Refresh-ahead | Cache (proactively before TTL) | Varies | Consistently fast | Varies | Wasted refreshes for unpopular data |
**Interview-Style Question**
> Q: You are building a product catalog for an e-commerce platform. Products are updated a few times per day but read millions of times. Which caching strategy would you use?
> A: Cache-aside is the best fit here. Products are read-heavy and write-light, so the majority of requests will hit the cache. On the rare occasion a product is updated, you invalidate the cache entry. The next read misses, fetches the fresh data from the database, and repopulates the cache. You could optionally combine this with refresh-ahead for the top 1,000 most-viewed products to eliminate any latency spike when their TTL expires.
### KEY TAKEAWAYS
* Cache-aside is the most common strategy. The application checks the cache, falls back to the database on a miss, and populates the cache afterward.
* Read-through moves the database-fetching logic into the cache layer itself, simplifying application code.
* Write-through keeps the cache and database perfectly in sync but adds latency to every write.
* Write-behind is the fastest for writes but risks data loss if the cache fails before flushing to the database.
* Refresh-ahead eliminates TTL-expiration latency spikes by proactively refreshing popular entries before they expire.
* Most production systems use cache-aside. The others are useful in specific scenarios where their trade-offs align with your requirements.
## 3.4 Cache Eviction Policies
Your cache has a fixed amount of memory.
Eventually, it fills up.
When a new entry needs to go in and there is no room, the cache has to decide which existing entry to remove. That decision is governed by the eviction policy.
Choosing the wrong eviction policy for your access pattern can tank your hit ratio.
Choosing the right one can keep your cache lean and effective.
**LRU (Least Recently Used)**
LRU evicts the entry that has not been accessed for the longest time.
The logic is simple: if something has not been read or written recently, it is probably not needed soon.
LRU is the default eviction policy in most cache systems, and for good reason. It works well for a wide variety of access patterns, especially when recent data is more likely to be requested again than old data.
A user's session data, the latest news articles, recently viewed product pages: all of these follow a "recent is relevant" pattern where LRU performs well.
The weakness of LRU is that it is vulnerable to scans.
If a batch job reads through a large set of data once (like a nightly report that touches every record), those one-time entries flood the cache and evict genuinely popular items that will be requested again.
After the scan finishes, the cache is full of data nobody will ask for again, and the hit ratio drops sharply.
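The classic way to sketch LRU in Python is with an `OrderedDict`: every access moves the entry to the "recent" end, and eviction pops from the "old" end:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self._store = OrderedDict()
        self._capacity = capacity

    def get(self, key):
        if key not in self._store:
            return None
        self._store.move_to_end(key)          # mark as most recently used
        return self._store[key]

    def put(self, key, value):
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = value
        if len(self._store) > self._capacity:
            self._store.popitem(last=False)   # evict least recently used
```

You can see the scan vulnerability directly in this sketch: a burst of `put` calls for one-time keys pushes every existing entry toward the eviction end, no matter how popular those entries were.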
**LFU (Least Frequently Used)**
LFU evicts the entry that has been accessed the fewest times overall. It tracks how often each entry is used and removes the least popular one.
LFU works well when some items are consistently popular over long periods.
A frequently accessed configuration setting, a viral social media post, or a top-selling product accumulates a high access count and stays in the cache even through short lulls in traffic, when LRU might have evicted it.
The downside is that LFU is slow to adapt. An entry that was extremely popular yesterday but is irrelevant today takes a long time to get evicted because its high historical count protects it.
New entries start with a low count and are vulnerable to immediate eviction, even if they are about to become popular.
Some implementations address this with a time-decay mechanism that gradually reduces access counts over time.
**FIFO (First In, First Out)**
FIFO evicts the oldest entry in the cache, regardless of how often or how recently it was accessed.
The first entry added is the first one removed.
FIFO is the simplest eviction policy to implement, and it makes sense when the age of data matters more than its popularity. Streaming data, time-ordered events, or rolling windows of recent logs are natural fits. It also has predictable behavior that makes capacity planning easier.
For most general-purpose caching, FIFO performs worse than LRU because it ignores access patterns entirely.
A heavily requested entry that was added early gets evicted even though many users still need it.
**TTL-Based Expiration**
TTL (Time to Live) is not technically an eviction policy in the same category as LRU or LFU. It is a complementary mechanism.
Every cache entry gets a timestamp, and when the TTL expires, the entry becomes eligible for removal regardless of how recently or frequently it was accessed.
TTL ensures that cached data does not grow infinitely stale. Even if a product's price changes in the database, the cached version will be replaced after the TTL expires. You set TTLs based on how stale you can tolerate the data being.
User session tokens might get a TTL of 30 minutes. Stock prices might get a TTL of 5 seconds. Static configuration data might get a TTL of 24 hours.
Most production caches combine TTL with another policy like LRU. Entries expire when their TTL runs out, and when the cache is full, LRU decides which non-expired entries to evict.
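The TTL mechanism can be illustrated by storing an expiry timestamp with each value and treating expired entries as misses on read. Real caches such as Redis manage expiry internally (via commands like `EXPIRE`); this sketch just shows the idea:

```python
import time

store = {}   # key -> (value, expires_at)

def set_with_ttl(key, value, ttl_seconds):
    store[key] = (value, time.monotonic() + ttl_seconds)

def get(key):
    entry = store.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if time.monotonic() >= expires_at:   # expired: evict lazily on read
        del store[key]
        return None
    return value
```

This is "lazy" expiration: an entry is only removed when someone reads it after expiry. Production caches typically combine lazy expiration with a background sweep so dead entries do not linger in memory.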
**Random Eviction**
Random eviction picks an entry at random and removes it. No access tracking, no age tracking, no overhead.
This sounds reckless, and for small caches it is. But research has shown that for very large caches, random eviction performs surprisingly close to LRU.
The math works out because in a large cache, a randomly chosen entry has a high probability of being something rarely accessed.
Random eviction's real advantage is zero bookkeeping.
LRU and LFU require data structures (linked lists, heaps, counters) to track access patterns.
Random eviction needs nothing. For systems where cache metadata overhead is a concern, random eviction is a legitimate option.
| Policy | Evicts | Strengths | Weaknesses | Best For |
|---|---|---|---|---|
| LRU | Least recently accessed | Works well for most patterns, simple | Vulnerable to scans | General-purpose caching |
| LFU | Least frequently accessed | Keeps consistently popular items | Slow to adapt to changing popularity | Stable popularity distributions |
| FIFO | Oldest entry | Simple, predictable | Ignores access patterns | Time-ordered data, rolling windows |
| TTL | Entries past expiration time | Prevents stale data | Does not handle cache fullness alone | All caches (combined with another policy) |
| Random | Any entry | Zero overhead | Unpredictable | Very large caches, low-overhead systems |
**Interview-Style Question**
> Q: Your cache has a 95% hit ratio, but after running a nightly analytics job that scans all user records, the hit ratio drops to 40% and takes an hour to recover. What is happening and how do you fix it?
> A: The analytics scan is flooding the cache with entries that are accessed exactly once, evicting the popular entries that drive the high hit ratio. Two fixes: first, configure the analytics job to bypass the cache entirely so it reads directly from the database. Second, if bypassing is not possible, consider switching from LRU to a scan-resistant policy like LRU-K or segmented LRU, which requires an entry to be accessed more than once before it gets promoted into the main cache. This way, one-time scan entries never displace genuinely popular data.
### KEY TAKEAWAYS
* LRU is the default for most use cases. It keeps recently accessed data and works well for typical access patterns.
* LFU keeps the most popular items but adapts slowly to shifting popularity. Use it when access frequency is more predictive than recency.
* FIFO is simple and works for time-ordered data, but it ignores how useful an entry actually is.
* TTL prevents stale data by expiring entries after a set time. It is almost always used alongside another eviction policy.
* Random eviction has near-zero overhead and performs surprisingly well for large caches.
* Scan-resistant policies protect your cache from one-time bulk reads that would otherwise flush valuable entries.
## 3.5 Cache Challenges
Caching sounds simple on the surface: store frequently accessed data in a fast layer and serve it from there.
In practice, caching introduces a set of problems that have tripped up engineering teams for decades.
If you want to use caching effectively, you need to understand these failure modes and know how to prevent them.
**Cache Invalidation: The Two Hard Problems in CS**
There is a famous saying in computer science: "There are only two hard things in computer science: cache invalidation and naming things."
Cache invalidation earns its place on that list.
The problem is this: when the source data changes, the cached copy becomes stale. You need a strategy for removing or updating stale entries.
This sounds easy, but in a distributed system with multiple services writing to the same data, multiple cache layers, and race conditions between reads and writes, invalidation becomes genuinely difficult.
Three approaches exist.
* Delete on write: when data changes in the database, immediately delete the cached entry so the next read fetches fresh data.
* Update on write: when data changes, update the cache with the new value at the same time.
* TTL-based expiration: do not actively invalidate at all; let entries expire naturally and accept some staleness.
Delete on write is the simplest and safest. It avoids the race condition where a cache update and a database write happen in a different order, leaving the cache with outdated data.
The cost is that the next read after a write will always be a cache miss.
For most applications, that one-miss penalty is far preferable to the risk of serving stale data.
**Cache Stampede (Thundering Herd)**
A cache stampede happens when a popular cache entry expires and hundreds or thousands of requests simultaneously discover the miss and all hit the database at the same time to rebuild the entry.
Instead of one request refilling the cache, you get a thousand requests all doing the same expensive database query concurrently. This can overwhelm your database in seconds.
The standard fix is a lock-based approach. When the first request discovers the cache miss, it acquires a lock. All subsequent requests for the same key see the lock and wait (or get a slightly stale value) instead of hitting the database.
Once the first request finishes and repopulates the cache, the lock releases, and all waiting requests get the fresh cached value.
Another approach is staggered TTLs. Instead of setting every entry to expire at the same time, add a small random offset to each TTL.
If your base TTL is 300 seconds, you might set individual entries to expire between 270 and 330 seconds.
This prevents synchronized expiration waves.
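Both defenses can be sketched in a few lines. The per-key lock ensures only one caller runs the expensive rebuild while the rest wait for the cached result, and the jittered TTL spreads out expirations. `rebuild` stands in for the expensive database query, and this single-process lock map is a simplification of what would be a distributed lock (for example, Redis `SET` with `NX`) in a multi-server deployment:

```python
import random
import threading

cache = {}
locks = {}
locks_guard = threading.Lock()

def jittered_ttl(base=300, spread=30):
    return base + random.uniform(-spread, spread)   # e.g. 270-330 seconds

def get_or_rebuild(key, rebuild):
    if key in cache:
        return cache[key]
    with locks_guard:                               # one lock object per key
        lock = locks.setdefault(key, threading.Lock())
    with lock:                                      # only first caller rebuilds
        if key not in cache:                        # re-check after acquiring
            cache[key] = rebuild(key)               # single DB hit for the herd
        return cache[key]
```

The re-check inside the lock is the crucial detail: every waiting caller that acquires the lock after the first one finds the cache already populated and skips the database entirely.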
**Cache Penetration and Cache Avalanche**
Cache penetration happens when requests come in for data that does not exist in the cache or the database. Every request misses the cache, hits the database, finds nothing, and the cache remains empty because there is nothing to store.
If an attacker sends millions of requests for nonexistent IDs, every single one bypasses your cache and hammers your database.
The fix is to cache the negative result. When a database query returns nothing, store a placeholder (like a null marker) in the cache with a short TTL. Subsequent requests for the same nonexistent key hit the cache and get the null result without touching the database.
Bloom filters are another defense: a probabilistic data structure that can quickly tell you "this key definitely does not exist" before you even check the cache.
Cache avalanche is a broader version of the stampede problem. It happens when a large number of cache entries expire at the same time, causing a massive surge of database queries.
This can occur after a cache restart (everything is cold) or when many entries share the same TTL. Staggered TTLs, cache pre-warming (loading popular data before traffic hits), and rate limiting database queries during recovery all help prevent avalanches.
**Hot Key Problem**
A hot key is a single cache entry that receives a disproportionate amount of traffic.
If a celebrity tweets something that goes viral, the cache entry for that tweet gets hammered by millions of requests per second. Even a fast cache like Redis can buckle under concentrated load on a single key, because that key lives on one specific node in the cluster and all traffic for it hits that single machine.
Strategies for handling hot keys include replicating the entry across multiple cache nodes (so traffic is distributed), adding a small random suffix to the key to create multiple copies (`tweet_123_v1`, `tweet_123_v2`, etc.), or using an in-process local cache on each application server as a first layer to absorb some of the load before it reaches the distributed cache.
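The key-suffix technique can be sketched as follows. In a real distributed cache each suffixed key would hash to a different node, spreading the load; here a single dict just illustrates the fan-out, and the key names are made up for the example:

```python
import random

REPLICAS = 4
cache = {}   # stand-in for a distributed cache; in reality each suffixed
             # key would land on a different node in the cluster

def put_hot(key, value):
    for i in range(REPLICAS):          # write the entry under every suffix
        cache[f"{key}#{i}"] = value

def get_hot(key):
    i = random.randrange(REPLICAS)     # each read picks a random replica
    return cache.get(f"{key}#{i}")
```

The cost is N writes instead of one and N copies of the data, which is why this treatment is reserved for genuinely hot keys rather than applied everywhere.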
**Cold vs. Warm Cache: Pre-Warming Strategies**
A cold cache is an empty cache. After a deployment, a restart, or a new cache node being added to the cluster, the cache starts with nothing. Every request is a miss, and your database bears the full load until the cache fills up organically.
For low-traffic systems, this is fine.
The cache fills within a few minutes and the brief period of increased database load is manageable.
For high-traffic systems, a cold cache can be catastrophic.
If your cache normally absorbs 90% of your traffic and it suddenly goes empty, your database sees a 10x spike in load. If the database is sized to handle only the normal 10% that usually reaches it, it will be overwhelmed.
Cache pre-warming solves this by proactively loading popular data into the cache before traffic arrives.
Before a deployment, you can run a script that fetches the top 10,000 most-requested items and stores them in the new cache. Some teams maintain a "warm-up" dataset based on recent access logs, replaying the most common queries against the new cache after a restart.
Another approach is gradual traffic shifting.
Instead of switching all traffic to a new cache node at once, you route a small percentage first, letting the cache warm up, and gradually increase the percentage as the hit ratio climbs.
**Distributed Cache Consistency**
When your cache runs on multiple nodes across a cluster, keeping those nodes consistent with each other and with the database is a real challenge.
If you update a cache entry on node A but node B still has the old version, requests routed to node B return stale data.
If you use consistent hashing to route keys to specific nodes, this is less of a problem because the same key always goes to the same node. But if a node fails and keys get redistributed, entries can be temporarily missing or duplicated.
For most applications, eventual consistency in the cache is perfectly acceptable. The cache is already an optimization layer, not a source of truth.
If a user sees a slightly stale version of their profile for 200 milliseconds until the cache catches up, that is usually fine.
For applications where even brief staleness is unacceptable, you can implement a write-through strategy where every write updates both the database and the cache atomically, or use a pub/sub mechanism where cache nodes subscribe to change events and invalidate affected entries immediately.
**Interview-Style Question**
> Q: You are designing a caching layer for a flash sale on an e-commerce platform. Millions of users will load the same product page at the same time. What caching challenges should you anticipate and how would you address them?
> A: Three main challenges. First, the hot key problem: the sale product's cache entry will receive extreme traffic concentrated on a single key. Replicate the entry across multiple cache nodes and add a local in-process cache on each application server to absorb load before it reaches Redis. Second, cache stampede: if the entry expires during the sale, thousands of requests will race to the database simultaneously. Use a distributed lock so only one request rebuilds the entry while others wait. Third, cold cache risk: if the cache restarts during the sale, the database will be overwhelmed. Pre-warm the cache with sale product data before the event starts. Combine all three defenses: local caching, locking on miss, and pre-warming before the event.
_Cache Failure Scenarios_
### KEY TAKEAWAYS
* Cache invalidation is the hardest part of caching. Prefer delete-on-write over update-on-write to avoid race conditions.
* Cache stampedes happen when popular entries expire and many requests hit the database at once. Use locks and staggered TTLs to prevent them.
* Cache penetration exploits requests for data that does not exist. Cache null results and consider Bloom filters.
* Hot keys concentrate traffic on a single cache node. Replicate the key across nodes or absorb load with local in-process caches.
* Cold caches are dangerous for high-traffic systems. Pre-warm the cache before deployments and major traffic events.
* Distributed cache consistency is usually eventual, and that is acceptable for most applications. The cache is an optimization layer, not the source of truth.
>Up Next: Your system can now find data fast thanks to caching. But what happens when the traffic hitting your servers doubles in an hour? Who decides which server handles which request? That is the job of a load balancer. Part II, Lesson 4 covers load balancing fundamentals, algorithms, and advanced patterns that keep your system responsive even under massive traffic spikes.