Cache stampede
A cache stampede, also known as the dogpile effect, cache miss storm, or cache choking, is a cascading failure in caching systems where the expiration of a popular cached item triggers multiple concurrent requests to regenerate it from an underlying data source, overwhelming the system and causing severe performance degradation.[1]
This issue commonly arises in high-concurrency environments like web servers, databases, and distributed applications using tools such as Memcached or Redis, where caches are employed to reduce load on slower backends by storing frequently accessed data.[1] When cache entries expire simultaneously—often due to fixed TTLs—numerous processes or threads experience misses and initiate expensive recomputations, leading to amplified database queries, increased latency, resource exhaustion, and potential outages.[2] The problem is exacerbated in massively parallel systems under high load: even a moderate request rate of 10 requests per second combined with a 3-second regeneration time keeps roughly 10 × 3 = 30 redundant recomputations in flight at once.[1]
Mitigation strategies focus on preventing concurrent recomputations, including probabilistic early expiration—where items expire slightly before their TTL using an optimal exponential distribution to limit stampede size to approximately 2.4 expected recomputations—locking mechanisms to serialize access, and lazy regeneration techniques like promises that allow waiting clients to reuse the result once computed.[1] These approaches, validated in both theoretical models and practical implementations, have been applied in production systems to maintain stability during cache cold starts or bursts.[2]
Caching Fundamentals
Core Concepts in Caching
Caching is a fundamental technique in computer systems that involves temporarily storing copies of frequently accessed data in a faster storage layer, such as main memory or solid-state drives (SSDs), to minimize latency and reduce the computational or I/O load on primary data sources like databases or remote servers.[3] This approach leverages the principle of locality of reference, where recently or repeatedly used data is likely to be requested again, allowing systems to serve such data directly from the cache rather than incurring the higher costs of retrieval from slower underlying storage.[4]
Key components of caching include the mechanisms for data retrieval and storage. A cache hit occurs when the requested data is found in the cache, enabling rapid access without backend involvement, while a cache miss happens when the data is absent, prompting the system to fetch it from the original source and often populate the cache with the result for subsequent requests.[3] Caches are typically structured as key-value stores, where data is indexed by unique keys for efficient lookups and updates; prominent implementations include Memcached, a distributed in-memory system designed for high-performance object caching, and Redis, an advanced key-value store supporting persistent storage and various data structures.[5]
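An illustrative sketch of this hit/miss read path, assuming generic KeyValueCache and Backend interfaces as stand-ins rather than a specific Memcached or Redis client API, might look as follows:
import java.time.Duration;
import java.util.Optional;

// Illustrative cache read: serve from the cache on a hit, otherwise fall back
// to the backend and populate the cache for subsequent requests.
public class CacheReader {
    private static final Duration DEFAULT_TTL = Duration.ofMinutes(5);   // illustrative TTL
    private final KeyValueCache cache;   // hypothetical key-value cache client
    private final Backend backend;       // hypothetical slower data source (e.g., a database DAO)

    public CacheReader(KeyValueCache cache, Backend backend) {
        this.cache = cache;
        this.backend = backend;
    }

    public String read(String key) {
        Optional<String> cached = cache.get(key);   // cache hit: no backend involvement
        if (cached.isPresent()) {
            return cached.get();
        }
        String value = backend.load(key);           // cache miss: fetch from the original source
        cache.put(key, value, DEFAULT_TTL);         // populate the cache for later requests
        return value;
    }

    // Minimal interfaces standing in for a real cache client and data source.
    public interface KeyValueCache {
        Optional<String> get(String key);
        void put(String key, String value, Duration ttl);
    }
    public interface Backend {
        String load(String key);
    }
}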
The benefits of caching are particularly evident in performance-critical applications. By serving data from cache on hits, systems achieve sub-millisecond response times compared to potentially seconds-long backend queries, substantially lowering overall latency.[6] This also decreases the query volume on backend databases, preventing overload and enabling better resource utilization, which in turn supports scalability for web applications handling thousands of concurrent users.[7] For example, a web server might cache user session data—such as authentication tokens or preferences—in memory to serve repeated page loads without querying the database each time, thereby improving user experience and throughput.[6]
Cache Expiration Mechanisms
Cache expiration mechanisms ensure that cached data remains relevant by systematically invalidating entries over time or in response to changes, preventing the accumulation of outdated information while optimizing resource usage. A primary policy is the time-to-live (TTL), which sets a fixed duration for each cache entry after insertion or last update, beyond which the entry is deemed invalid. For example, a news article might employ a TTL of 5 minutes to balance freshness with reduced backend queries during high-traffic periods.[8]
Invalidation approaches vary in timing and triggers to manage expiration efficiently. Lazy invalidation defers removal of expired entries until the next access attempt, allowing the cache to retain them temporarily without immediate cleanup, as implemented in systems like Redis where passive expiration checks occur on read operations.[9] In contrast, eager invalidation proactively removes or marks entries as invalid at the exact expiration moment, ensuring stricter enforcement but potentially increasing computational overhead for periodic sweeps.[10] Event-driven invalidation complements these by triggering removal upon specific data updates, such as database modifications notified via broadcast or pub/sub mechanisms, which is particularly effective in distributed environments to maintain consistency without relying solely on timers.[11]
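A minimal sketch of lazy invalidation, assuming a simple in-process map of entries stamped with an expiry time (LazyTtlCache and CacheEntry are illustrative names, not a particular system's API), might look as follows:
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Lazy invalidation: expired entries are only detected and removed when they are next read.
public class LazyTtlCache {
    private record CacheEntry(String value, Instant expiresAt) {}

    private final Map<String, CacheEntry> entries = new ConcurrentHashMap<>();

    public void put(String key, String value, Duration ttl) {
        entries.put(key, new CacheEntry(value, Instant.now().plus(ttl)));
    }

    // Returns null on a miss; an expired entry is evicted here, at read time,
    // rather than by a periodic sweep (which would be eager invalidation).
    public String get(String key) {
        CacheEntry entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (Instant.now().isAfter(entry.expiresAt())) {
            entries.remove(key, entry);   // passive cleanup on access
            return null;
        }
        return entry.value();
    }
}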
These mechanisms involve inherent trade-offs between data staleness—where users might receive slightly outdated information—and system performance, as longer TTLs boost cache hits but risk obsolescence. The hit ratio for an individual item under TTL policies can be approximated using renewal theory as \text{Hit Ratio} \approx 1 - e^{-\lambda \cdot \text{TTL}}, where \lambda represents the item's request arrival rate assuming Poisson arrivals; this formula highlights how higher request rates or longer TTLs improve hit probabilities, though aggregate cache performance depends on item popularity distributions.[12]
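As an illustrative application of this approximation (with hypothetical numbers): an item requested on average once per minute (\lambda = 1/60 per second) and cached with a 60-second TTL yields \text{Hit Ratio} \approx 1 - e^{-(1/60) \cdot 60} = 1 - e^{-1} \approx 0.63, whereas extending the TTL to 300 seconds yields 1 - e^{-5} \approx 0.99, quantifying how a longer TTL trades potential staleness for a substantially higher hit rate.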
Common implementations distinguish between soft and hard expiration to fine-tune behavior. Hard expiration strictly discards entries at TTL end, enforcing immediate invalidation without serving stale data, which aligns with conservative consistency needs. Soft expiration, often used in web caches, permits serving potentially stale entries while initiating background revalidation or refresh, reducing latency spikes; in HTTP caching this corresponds to the stale-while-revalidate directive, which lets a cache serve a stale response while it revalidates in the background, whereas must-revalidate forbids serving stale data without a successful recheck.[13] In cache-aside patterns, where applications explicitly read from and write to the cache alongside the primary store, expiration is managed by the application code setting TTLs during population and handling misses by fetching and re-caching fresh data with new timers.[14]
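A compact sketch of the soft/hard distinction, assuming each entry carries two timestamps (the Entry record, softExpiresAt, hardExpiresAt, and triggerBackgroundRefresh below are illustrative names rather than a particular cache library's API), might look as follows:
import java.time.Instant;

// Soft vs. hard expiration: fresh entries are served directly, entries past their
// soft expiry are served stale while a refresh is started, and entries past their
// hard expiry are treated as strict misses.
public class SoftHardExpiry {
    public record Entry(String value, Instant softExpiresAt, Instant hardExpiresAt) {}

    public String read(String key, Entry entry) {
        Instant now = Instant.now();
        if (entry == null || now.isAfter(entry.hardExpiresAt())) {
            return loadAndCache(key);          // hard-expired: recompute synchronously
        }
        if (now.isAfter(entry.softExpiresAt())) {
            triggerBackgroundRefresh(key);     // soft-expired: refresh asynchronously...
            return entry.value();              // ...but serve the stale value immediately
        }
        return entry.value();                  // still fresh
    }

    private String loadAndCache(String key) { /* fetch from backend, re-cache with new timers */ return ""; }
    private void triggerBackgroundRefresh(String key) { /* enqueue an asynchronous refresh task */ }
}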
The Cache Stampede Phenomenon
Definition and Triggers
A cache stampede, also known as the dogpile effect, occurs when a popular cache entry expires, resulting in a surge of simultaneous cache misses that trigger multiple redundant requests to the backend system for regeneration of the same data.[1] This phenomenon leads to excessive load on the underlying data sources, as uncoordinated client requests compete to recompute and repopulate the cache without awareness of each other's efforts.[15]
The primary triggers for a cache stampede involve the expiration of high-traffic cache items during periods of peak demand, such as when a trending resource's time-to-live (TTL) elapses and numerous concurrent users request it simultaneously.[1] Without mechanisms to coordinate or deduplicate these requests, the absence of the cached value causes an avalanche of backend queries, amplifying the issue in scenarios like web applications serving dynamic content.[16]
Contributing factors include monolithic backend architectures that cannot scale to handle sudden burst loads from uncoordinated requests, the lack of built-in request queuing or throttling in caching layers, and cold starts in distributed cache systems where nodes initialize without pre-warmed data, exacerbating miss rates under load.[16] These elements compound the problem by preventing efficient load distribution and allowing parallel regeneration attempts to overwhelm resource-constrained services.[17]
System Impacts and Examples
A cache stampede can cause immediate overload on backend systems, as numerous concurrent requests bypass the cache and flood the underlying data sources, leading to sharply increased latency—such as response times escalating from milliseconds to several seconds—along with elevated CPU and memory utilization.[16] This overload often triggers cascading failures, including database timeouts, where the backend becomes unresponsive under the sudden demand surge.[18]
Over the longer term, these events degrade overall service reliability, resulting in prolonged user experience issues like slow page loads and intermittent unavailability, which can erode trust in the platform.[19] In high-traffic environments such as e-commerce sites, such disruptions may translate to significant economic losses through downtime, including forfeited revenue from abandoned transactions during peak periods.[20]
A prominent real-world example occurred on September 23, 2010, when Facebook suffered a 2.5-hour global outage affecting hundreds of millions of users, triggered by a configuration change that invalidated cache entries for critical backend services, causing a stampede of requests that overwhelmed the databases.[18] Engineers reported that the incident stemmed from an automated error-handling mechanism exacerbating the miss rate, leading to near-total system failure until manual intervention restored caching stability.[21]
This phenomenon is closely analogous to the thundering herd problem, in which N concurrent cache misses generate N independent backend queries instead of a single one, amplifying the load on resources by a factor of N and potentially collapsing the system under otherwise manageable traffic.[22]
Mitigation Techniques
Synchronization-Based Methods
Synchronization-based methods employ locking primitives to coordinate access to cache repopulation, ensuring that only a single process or thread executes the costly backend operation while others either wait or fall back to serving stale content. This deterministic serialization prevents the thundering herd of concurrent requests from overwhelming the data source, a common issue in cache-aside patterns where application code explicitly manages cache interactions.[1] In large-scale systems like Google's Zanzibar authorization service, lock tables are used to track outstanding requests for a cache key, blocking duplicates until the primary request populates the cache and thereby mitigating flash hot spots from concurrent misses.[23]
A standard implementation involves attempting to acquire a mutex lock upon detecting a cache miss for a given key. If successful, the holder fetches the data from the backend, stores it in the cache, and releases the lock; concurrent requesters that fail to acquire the lock can either briefly wait, queue the request, or immediately return any available stale value to avoid blocking users. This "lock-on-miss" strategy is exemplified in languages like Java, where the ReentrantLock class enables non-blocking attempts via tryLock(), allowing implementers to avoid indefinite waits and handle contention gracefully.[1]
To further optimize for read-heavy workloads, advanced variants use read-write locks or semaphores that permit multiple concurrent reads of existing (potentially stale) cache entries while exclusively locking writes during repopulation. Read-write locks, for instance, allow all non-rebuilding requests to proceed in parallel under shared read access, reducing overall latency compared to exclusive mutexes, though they require careful management to prevent writer starvation in highly contended scenarios. Semaphores can similarly limit the number of concurrent rebuilders to one while queuing excess requests, providing bounded protection against stampedes.[1]
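A sketch of the read-write variant using Java's ReentrantReadWriteLock (the single-entry cache, the 60-second TTL, and the fetchFromBackend helper are simplifications for illustration) might look as follows:
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Read-write variant: many readers share the cached (possibly stale) value in parallel,
// while at most one thread at a time rebuilds the entry from the backend.
public class ReadWriteGuardedCache {
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    private volatile String cachedValue;    // simplified single-entry cache for illustration
    private volatile long expiresAtMillis;

    public String get() {
        rwLock.readLock().lock();
        try {
            if (cachedValue != null && System.currentTimeMillis() < expiresAtMillis) {
                return cachedValue;          // fast path: shared read access, no backend call
            }
        } finally {
            rwLock.readLock().unlock();
        }
        if (rwLock.writeLock().tryLock()) {  // only one thread becomes the rebuilder
            try {
                if (cachedValue == null || System.currentTimeMillis() >= expiresAtMillis) {  // re-check under the lock
                    cachedValue = fetchFromBackend();
                    expiresAtMillis = System.currentTimeMillis() + 60_000;  // illustrative 60 s TTL
                }
                return cachedValue;
            } finally {
                rwLock.writeLock().unlock();
            }
        }
        // Contenders do not block on the rebuild: they serve whatever is currently cached
        // (possibly stale, and possibly null on a true cold start, where a real
        // implementation would wait or retry instead).
        return cachedValue;
    }

    private String fetchFromBackend() { /* expensive recomputation */ return "value"; }
}
Here the write lock's tryLock() ensures at most one rebuilder, while contending readers fall back to the previously cached value rather than queuing behind the writer.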
These methods effectively reduce backend load by eliminating redundant fetches but introduce trade-offs: waiting requesters experience added latency, and if the locking process fails midway (e.g., due to a crash), the cache may remain invalid until the lock times out, potentially prolonging the stampede. Proper lock TTL tuning and fault-tolerant fallbacks, such as serving stale data universally during contention, are essential to balance protection and availability.[1]
Illustrative pseudocode for a basic lock-on-miss approach in a cache-aside pattern might appear as follows:
if (!cache.contains(key)) {
    if (lock.tryLock(timeout)) {
        try {
            if (!cache.contains(key)) {   // double-check after acquiring the lock
                value = backend.fetch(key);
                cache.put(key, value);
            }
        } finally {
            lock.unlock();
        }
    } else {
        // Handle contention: return a stale value if one is available, otherwise wait for the holder to finish.
        staleValue = cache.getStale(key);
        return (staleValue != null) ? staleValue : waitForCompletion(key);
    }
}
return cache.get(key);
This pattern ensures serialization while minimizing unnecessary backend calls.[1]
The challenges of concurrent cache misses and the need for synchronization to avoid them were highlighted early in the development of Memcached, the distributed caching system created by Brad Fitzpatrick, whose 2004 writings emphasized strategies for high-performance caching in web-scale environments like LiveJournal.[24]
Probabilistic and Early Revalidation Approaches
Probabilistic early expiration is a non-blocking mitigation strategy for cache stampedes that distributes cache misses over time by randomly expiring entries before their nominal time-to-live (TTL). Instead of all entries expiring simultaneously at the TTL boundary, each cache access independently decides whether to refresh the entry early based on a probabilistic mechanism, staggering regeneration requests and reducing the likelihood of concurrent misses. A simple implementation uses a uniform distribution to set an early expiration gap, e.g., an effective expiration time t = \text{TTL} - \xi U, where U \sim \text{Uniform}(0,1) and \xi is a small fraction of the TTL (e.g., \xi = \sqrt{n}/(2\lambda) for n processes), so that expirations are spread evenly across the interval [\text{TTL} - \xi, \text{TTL}]. This approach ensures that under high load, only a fraction of requests trigger regeneration, preventing overload on the backing store.[1]
More advanced variants optimize the probability distribution to minimize both stampede size and unnecessary early refreshes. For instance, using an exponential distribution Exp(\lambda) for the time until early expiration bounds the expected stampede size to E[S] \leq (e^\lambda - 1)(1/\lambda + 1/e) and the expected early gap to E[T] \leq (1/\lambda) \log n, where n is the number of concurrent processes. Simulations demonstrate that this outperforms uniform distributions, limiting stampede sizes to under 10 processes even for thousands of requests, compared to over 80 for uniform methods with similar parameters, achieving roughly a 90% reduction in stampede risk.[1]
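A sketch of the exponential variant, assuming each entry carries its nominal expiry timestamp and drawing the early gap by inverse-transform sampling (the class and method names are illustrative), might look as follows:
import java.time.Instant;
import java.util.concurrent.ThreadLocalRandom;

// Probabilistic early expiration: each read independently draws an exponentially
// distributed "early gap" and treats the entry as expired if that gap reaches past
// the nominal expiry, so regenerations are staggered instead of synchronized.
public class ProbabilisticExpiry {
    private final double lambda;   // rate of the Exp(lambda) distribution, in 1/seconds (tuning parameter)

    public ProbabilisticExpiry(double lambda) {
        this.lambda = lambda;
    }

    public boolean shouldRecomputeEarly(Instant nominalExpiry) {
        // Draw X ~ Exp(lambda) via inverse-transform sampling: X = -ln(U) / lambda,
        // with U drawn from (0, 1] to avoid log(0).
        double u = 1.0 - ThreadLocalRandom.current().nextDouble();
        double earlyGapSeconds = -Math.log(u) / lambda;
        Instant effectiveExpiry = nominalExpiry.minusMillis((long) (earlyGapSeconds * 1000));
        return Instant.now().isAfter(effectiveExpiry);   // expire "early" with small per-read probability
    }
}
On each read, the application would consult shouldRecomputeEarly before its normal TTL check, so that only the occasional request refreshes the entry early while the rest continue to be served from cache.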
Lazy regeneration, also known as background refresh or stale-while-revalidate, complements probabilistic methods by allowing the cache to serve slightly outdated data immediately while asynchronously updating the entry in the background. Upon a cache hit on a stale or near-expiring item, the request returns the existing value without delay, and a separate thread or process initiates regeneration to repopulate the cache for future accesses. This technique, formalized in HTTP caching directives, ensures zero added latency for users during potential stampede windows, as regeneration occurs opportunistically without blocking.[25]
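A sketch of background refresh, assuming a single-threaded executor and simplified in-process maps in place of a real cache client (a production implementation would write to a shared cache and handle failures of the refresh task), might look as follows:
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Stale-while-revalidate: serve the existing (possibly stale) value immediately and
// refresh the entry asynchronously so that future reads see fresh data.
public class StaleWhileRevalidateCache {
    private final Map<String, String> values = new ConcurrentHashMap<>();
    private final Map<String, Long> expiryMillis = new ConcurrentHashMap<>();
    private final Map<String, Boolean> refreshing = new ConcurrentHashMap<>();
    private final ExecutorService refresher = Executors.newSingleThreadExecutor();

    public String get(String key) {
        String value = values.get(key);
        Long expiry = expiryMillis.get(key);
        if (value == null) {
            return loadAndStore(key);                   // true cold miss: compute synchronously
        }
        if (expiry != null && System.currentTimeMillis() > expiry
                && refreshing.putIfAbsent(key, Boolean.TRUE) == null) {
            refresher.submit(() -> {                    // at most one background refresh per stale key
                try {
                    loadAndStore(key);
                } finally {
                    refreshing.remove(key);
                }
            });
        }
        return value;                                   // zero added latency: the stale value is served now
    }

    private String loadAndStore(String key) {
        String fresh = fetchFromBackend(key);           // expensive recomputation
        values.put(key, fresh);
        expiryMillis.put(key, System.currentTimeMillis() + 60_000);   // illustrative 60 s TTL
        return fresh;
    }

    private String fetchFromBackend(String key) { return "value-for-" + key; }
}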
These approaches trade short-term data staleness for improved system resilience, introducing minimal overhead since decisions are local to each cache instance and require no inter-process coordination. While probabilistic expiration may lead to more frequent backend calls under low load, it self-regulates under high traffic by naturally increasing refresh rates proportionally to request volume. Background refresh further mitigates freshness concerns by decoupling serving from updating, though it necessitates careful tuning of staleness tolerances to balance accuracy and performance.[1][26]
Distributed and External Strategies
In distributed systems, external recomputation addresses cache stampedes by offloading the responsibility of repopulating expired cache entries to a separate, asynchronous process or service, decoupling it from incoming user requests. This approach ensures that when multiple requests arrive simultaneously for a missed cache key, the application serves stale data or a placeholder while the external system handles the computationally expensive regeneration in the background, preventing a surge of backend queries. For instance, message queues such as Kafka can be used to trigger async rebuilds upon expiration detection, where cache misses enqueue tasks for a dedicated worker pool to process and update the cache without blocking the main application flow.[27][28]
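A sketch of this pattern, using an in-process BlockingQueue as a stand-in for a real broker such as Kafka (the class and helper names are illustrative), might look as follows:
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// External recomputation: cache misses only enqueue work; a separate worker performs
// the expensive regeneration, decoupling it from user-facing requests.
public class AsyncRebuildPipeline {
    private final BlockingQueue<String> rebuildQueue = new LinkedBlockingQueue<>();

    // Called from the request path on a miss or stale hit: never touches the backend directly.
    public String readOrEnqueue(String key, String staleOrPlaceholder) {
        rebuildQueue.offer(key);             // in production, a publish to a broker topic
        return staleOrPlaceholder;           // serve stale data (or a placeholder) without blocking
    }

    // Runs in a dedicated worker pool, analogous to a consumer group on the broker.
    public void workerLoop() throws InterruptedException {
        while (!Thread.currentThread().isInterrupted()) {
            String key = rebuildQueue.take();        // blocks until a rebuild task arrives
            String fresh = fetchFromBackend(key);    // expensive regeneration, off the request path
            storeInCache(key, fresh);                // repopulate the shared cache for future reads
        }
    }

    private String fetchFromBackend(String key) { return "value-for-" + key; }
    private void storeInCache(String key, String value) { /* write to Memcached/Redis, etc. */ }
}
A production pipeline would additionally deduplicate keys that are already queued, so that a burst of misses for the same key produces only one rebuild task.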
Distributed locking extends coordination beyond single nodes by employing external coordination services to ensure that only one instance across a cluster attempts cache repopulation for a given key. Tools like ZooKeeper or etcd provide atomic operations for acquiring locks distributed over multiple nodes, serializing the recomputation process while other nodes wait or serve existing data. In a cluster, a node encountering a cache miss attempts to acquire a distributed lock via ZooKeeper's ephemeral nodes or etcd's lease-based mechanisms; if successful, it performs the computation and releases the lock upon updating the shared cache, avoiding redundant efforts and load spikes. This method is particularly effective in horizontally scaled environments where local locks alone cannot synchronize across instances.[29][30]
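The coordination logic is largely independent of the lock provider; the sketch below codes against a hypothetical DistributedLock interface whose tryAcquire and release methods would, in practice, be backed by ZooKeeper ephemeral nodes or etcd leases:
import java.time.Duration;

// Cluster-wide coordination: only the node that wins the distributed lock recomputes the
// entry; other nodes serve existing data rather than piling onto the backend.
public class DistributedRebuildCoordinator {
    // Placeholder abstraction over a real coordination service (ZooKeeper, etcd, ...).
    public interface DistributedLock {
        boolean tryAcquire(String lockKey, Duration leaseTtl);   // hypothetical signature
        void release(String lockKey);
    }

    private final DistributedLock lock;

    public DistributedRebuildCoordinator(DistributedLock lock) {
        this.lock = lock;
    }

    public String handleMiss(String cacheKey, String staleValue) {
        String lockKey = "rebuild:" + cacheKey;
        if (lock.tryAcquire(lockKey, Duration.ofSeconds(30))) {   // lease bounds the lock if this node crashes
            try {
                String fresh = fetchFromBackend(cacheKey);        // exactly one node in the cluster does this
                writeToSharedCache(cacheKey, fresh);
                return fresh;
            } finally {
                lock.release(lockKey);
            }
        }
        return staleValue;   // losers serve existing data instead of issuing redundant backend queries
    }

    private String fetchFromBackend(String key) { return "value-for-" + key; }
    private void writeToSharedCache(String key, String value) { /* put into the distributed cache */ }
}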
Hybrid approaches integrate external recomputation with content delivery networks (CDNs) to absorb stampede loads at the edge, combining distributed caching with on-the-fly regeneration. In systems like Cloudflare, cache revalidation uses request collapsing and a per-asset cache lock at edge data centers: when multiple simultaneous requests hit a stale or missed entry, only one forwards to the origin server, while others are held and served from the resulting response, effectively serializing origin traffic without propagating the stampede upstream. This edge-level handling leverages the CDN's global distribution to mitigate high-traffic triggers, such as sudden popularity surges, by recomputing and caching at the network periphery before requests reach the core infrastructure.[31]
Scalability considerations in sharded environments highlight how stampedes can be managed distributively, as distributed locking or external queues ensure recomputation is coordinated cluster-wide, preventing overload on backend partitions by limiting regeneration to a single node per key while using consistent hashing to route subsequent requests efficiently. This approach maintains performance as the system scales horizontally, with tools like etcd providing the necessary cross-shard synchronization to avoid amplifying stampedes through partition boundaries.[32]