
Consistent hashing

Consistent hashing is a distributed hashing technique that maps keys and nodes to a fixed circular hash space, assigning each key to the nearest node in the clockwise direction, thereby minimizing the redistribution of keys when nodes are added or removed from the system. Introduced in 1997 by David Karger and colleagues, it was originally developed to address hot spots in web caching protocols, where high request volumes to popular content could overwhelm individual servers. The method ensures balance, distributing keys evenly across nodes so each handles approximately 1/|V| of the load, where |V| is the number of nodes; monotonicity, preventing existing keys from remapping among existing nodes when nodes are added; and properties like spread and load, which limit variations in key assignments across different node views to O(log |C|), where |C| is the number of caches. In practice, consistent hashing operates by applying a hash function to both keys and node identifiers, projecting them onto a circle; a key is then stored on the first node encountered moving clockwise from its position. This approach supports scalability in dynamic environments, such as peer-to-peer networks and distributed storage, by requiring the movement of only an O(1/|V|) fraction of keys per change on average. It has become foundational in modern distributed systems, including Amazon's Dynamo key-value store, which uses a variant for uniform load distribution across replicas, and Apache Cassandra, where it partitions data over a token ring to facilitate node additions or failures with minimal reorganization. Variants like virtual nodes further enhance load balancing by replicating node positions on the ring, improving uniformity in heterogeneous clusters.

History

Origins in Distributed Systems

Consistent hashing emerged in 1997 from research at the Massachusetts Institute of Technology (MIT) aimed at improving distributed caching mechanisms for the burgeoning World Wide Web. The technique was developed by David Karger, Eric Lehman, Tom Leighton, Matthew R. Levine, Daniel Lewin, and Rina Panigrahy to enable scalable, decentralized caching protocols that could adapt to dynamic network conditions without excessive reconfiguration. This invention was detailed in the seminal paper "Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web," presented at the 29th Annual ACM Symposium on Theory of Computing (STOC 1997). The work addressed the limitations of traditional hashing methods in distributed environments, where frequent changes in the set of available cache nodes—such as proxies joining or leaving—would otherwise require remapping nearly all keys, leading to high overhead and instability. By introducing a "smooth" hashing scheme, the authors ensured that only an expected fraction of keys proportional to the change in the system needed reassignment, specifically bounding the number of remapped keys to O(n/m), where n represents the total number of keys and m the number of nodes. The primary motivation stemmed from the problem of hot spots in early web infrastructure, where a surge in requests for popular content—such as the Jet Propulsion Laboratory's site following the 1994 Shoemaker-Levy 9 comet impact or IBM's pages during the 1997 Deep Blue chess matches—overloaded individual servers or shared proxy caches, causing widespread delays and failures. In distributed proxy systems, uneven load distribution exacerbated these issues, as conventional uniform hashing failed to balance effectively across varying numbers of caches, often concentrating requests on a few overloaded nodes. Consistent hashing provided a solution by mapping both keys and nodes to a fixed circular space, minimizing global recomputation and enabling local decision-making for cache placement, thus alleviating hot spots while supporting the Web's rapid growth.

Evolution and Key Milestones

Following its initial proposal, consistent hashing saw early adoption in 1998 by Akamai Technologies, founded by the technique's co-inventors, who applied it for load balancing in their content delivery network (CDN) to distribute web caching across global servers and minimize disruptions from node changes. This implementation addressed hot spots in web traffic, enabling scalable delivery of high-demand content like the 1999 Star Wars trailer, which handled millions of requests without overload. In the early 2000s, consistent hashing expanded into peer-to-peer (P2P) networks and distributed hash tables (DHTs), powering decentralized systems for efficient key lookups. A key milestone was its integration into Chord, a scalable P2P lookup protocol introduced in 2001, which used consistent hashing to map keys to nodes on a ring, supporting dynamic joins and departures with only an O(1/n) fraction of keys affected, where n is the number of nodes. Subsequent DHT variants, such as Pastry and Kademlia, built on this foundation, adapting consistent hashing for routing in large-scale, fault-tolerant overlays. By 2007, consistent hashing gained prominence in databases through Amazon's Dynamo, a highly available key-value store that employed it to partition data across nodes while prioritizing availability over strict consistency in distributed environments. Dynamo's design influenced subsequent systems by demonstrating how consistent hashing, combined with virtual nodes, could balance load and enable seamless scaling in production settings handling petabytes of data. The 2010s brought refinements for heterogeneous clusters, notably in Apache Cassandra, which from its 2008 origins drew from Dynamo but introduced virtual nodes (vnodes) in 2013 to approximate weighted consistent hashing. This allowed nodes with varying capacities to receive proportional token assignments on the hash ring, improving data distribution and reducing imbalances in multi-node setups without manual intervention. More recent milestones include its adoption in service meshes, such as Envoy proxy's ring hash load balancer, which uses consistent hashing for sticky sessions and fault-tolerant routing across upstream services. Similarly, in 2017, Discord scaled to 5 million concurrent users by implementing a custom consistent hashing scheme in Elixir for distributing chat state across nodes, ensuring low-latency presence updates during rapid growth. More recently, consistent hashing has been emerging in sharding protocols for decentralized storage, with weight-based variants proposed to allocate data across heterogeneous nodes, enhancing scalability in systems like hierarchical sharding frameworks.

Core Concepts

The Hash Ring and Mapping

In consistent hashing, the hash space is conceptualized as a fixed-size circular structure, often referred to as the hash ring, spanning from 0 to 2^{32} - 1 to accommodate standard 32-bit hash outputs, though the original formulation uses the unit interval [0, 1], scaled from a large integer range M. A hash function h maps both keys k (such as object identifiers) and nodes n (such as servers) to points on this ring, yielding an approximately uniform spread assuming h is a strong hash function such as SHA-1 or MD5 truncated to the ring size. This ring topology derives its consistency property from the circular arrangement, which minimizes disruptions during dynamic changes in the node set, building on basic hashing principles where collisions are resolved spatially rather than via traditional partitioning. Nodes are positioned on the ring at their hashed values h(n_i), sorted in clockwise order around the circle. For a given key k hashed to position h(k), the key is assigned to the successor node, defined as the first node encountered when traversing clockwise from h(k). Mathematically, the successor node s is
s = \arg\min_{n_i : \, h(n_i) \ge h(k)} h(n_i),
wrapping around to the node with the smallest hash value if no node lies at or beyond h(k) on the ring. This clockwise successor rule ensures that each key maps to exactly one node, partitioning the key space into contiguous arcs owned by each node. The ring's design achieves consistency by localizing remappings during node additions or removals. When a node is added, only the keys in the arc between the new node's position and its immediate predecessor are reassigned to the new node; similarly, removing a node affects only the arc it previously owned, which is redistributed to its successor. On average, this impacts O(n/m) keys, where n is the total number of keys and m the number of nodes, compared to the full remapping of O(n) keys in modulo-based schemes. This bounded disruption supports scalable distributed systems like caching protocols, where node churn is frequent.
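The clockwise successor rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any particular library's API: it truncates MD5 to a 32-bit ring, and the class name, method names, and node labels are hypothetical.
import hashlib
from bisect import bisect_left

RING_SIZE = 2**32  # illustrative 32-bit ring

def ring_hash(value: str) -> int:
    # Place keys and nodes on the circle by truncating MD5 to the ring size.
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % RING_SIZE

class HashRing:
    def __init__(self, nodes):
        # Sorted (position, node) pairs define the clockwise order of the ring.
        self.points = sorted((ring_hash(n), n) for n in nodes)

    def successor(self, key: str) -> str:
        # First node at or after h(k); wrap to the smallest position if none.
        h_k = ring_hash(key)
        i = bisect_left(self.points, (h_k,))
        return self.points[i % len(self.points)][1]

ring = HashRing(["node-A", "node-B", "node-C"])
print(ring.successor("user:42"))  # deterministic owner for this key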

Key-to-Node Assignment Process

In consistent hashing, the assignment of a key to a node begins with computing the hash of the key, which maps it to a position on the circular hash ring. The key is then assigned to the node whose hashed position is the closest successor in the clockwise direction from the key's position on the ring. This lookup process ensures that each key is deterministically routed to exactly one node without requiring global state synchronization among the nodes. When adding a new node, its identifier is hashed to determine its position on the ring, and it is inserted at that point. Only the keys whose positions fall within the arc between the new node's predecessor and the new node itself are remapped to the new node, as these are the keys for which the new node becomes the closest successor. This localized remapping minimizes disruption, with the expected fraction of affected keys being approximately 1/(N+1), where N is the current number of nodes. Node removal follows a symmetric process: the removed node's position is deleted from the ring, and its arc is merged with that of its successor. All keys previously assigned to the removed node are reassigned to this successor node. The expected fraction of keys remapped in this case is also approximately 1/N, where N is the number of nodes before removal. For instance, consider a system with three evenly spaced nodes A, B, and C on the ring. Adding a fourth node D at a random position will split one of the existing arcs, remapping approximately one-fourth of the keys to D. This assignment mechanism provides deterministic key-to-node mappings based on fixed hash positions, eliminating the need for state replication across nodes and thereby supporting partition tolerance in distributed systems by allowing operations to proceed with partial or inconsistent views of the node set while bounding load deviations.
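The bounded-remapping behavior can be checked empirically with a short standalone sketch. The node names, key labels, and MD5-based positions below are illustrative assumptions; the expected fraction of moved keys is about 1/(N+1), though any single run deviates somewhat because node positions are fixed by their hashes.
import hashlib

def pos(label: str) -> int:
    return int(hashlib.md5(label.encode()).hexdigest(), 16) % 2**32

def owner(key: str, nodes: dict) -> str:
    # Clockwise successor of the key's position; wrap around if needed.
    h_k = pos(key)
    candidates = [n for n, p in nodes.items() if p >= h_k]
    if candidates:
        return min(candidates, key=nodes.get)
    return min(nodes, key=nodes.get)

nodes = {name: pos(name) for name in ("A", "B", "C")}
keys = [f"key-{i}" for i in range(10_000)]
before = {k: owner(k, nodes) for k in keys}

nodes["D"] = pos("D")  # add a fourth node
moved = sum(1 for k in keys if owner(k, nodes) != before[k])
print(f"{moved / len(keys):.1%} of keys changed owner")  # expected around 25%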

Implementation

Algorithmic Procedures

The hash ring in consistent hashing is typically maintained as a sorted collection of hash values for nodes (or their virtual representatives), implemented using a balanced search tree such as a red-black tree or a sorted set structure, which enables O(log N) time for lookups and updates, where N denotes the number of points on the ring. This structure stores pairs of hash values and associated node identifiers, ordered by the hash values to facilitate efficient successor queries in the circular space. To construct the ring, each physical node generates multiple virtual node identifiers (e.g., by appending a sequence number to the node name), hashes each identifier using a hash function like MD5 or SHA-1 to produce a value over a large fixed space (e.g., 128 or 160 bits), applies modulo arithmetic to map values onto the ring (0 to 2^m - 1), and inserts the resulting points into the sorted structure while associating them with the original node. This process ensures even distribution and is performed once during initialization or when the node set changes. Key-to-node lookup proceeds by hashing the key with the same function to obtain h(k), then querying the sorted structure for the ceiling (successor) of h(k) to identify the responsible node; the ring's circularity is handled by wrapping to the minimum value if no successor exists. The following pseudocode illustrates this procedure:
function findNode(key):
    h_k = hash(key)  // e.g., MD5(key) modulo ring_size
    successor = bst.ceiling(h_k)  // Find smallest entry >= h_k
    if successor is null:
        successor = bst.minimum()  // Wrap around to first entry
    return successor.node  // Return associated node
This operation achieves O(log N) time due to the balanced tree properties. When adding or removing a node, the virtual node hashes for that node are inserted or deleted from the sorted structure, each in O(log N) time. Affected keys—those hashed into the arc between the changed node's predecessor and successor—are remapped to the new successor node, impacting on average O(K/N) keys (where K is the total number of keys across the system) for the reassignment step, plus O(log N) for the structural modification. In practice, this remapping can be deferred or handled lazily to minimize immediate overhead.
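A sorted-list version of these maintenance operations can be sketched as follows; it is a simplified illustration, not a production structure. The class name, label format, and the default of 100 virtual points per node are assumptions, Python's bisect.insort stands in for a balanced-tree insert (the list insertion itself is linear rather than logarithmic), and removal rebuilds the list for brevity where a real tree would delete each point in O(log N).
import hashlib
from bisect import insort, bisect_left

def point(label: str) -> int:
    return int(hashlib.md5(label.encode()).hexdigest(), 16) % 2**32

class Ring:
    def __init__(self, vnodes_per_node: int = 100):
        self.v = vnodes_per_node
        self.points = []  # sorted list of (position, node) pairs

    def add_node(self, node: str) -> None:
        # Insert v virtual points for the node; only keys hashing into the
        # arcs now owned by these points need to be remapped.
        for i in range(self.v):
            insort(self.points, (point(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        # Drop the node's points; its arcs merge into their successors.
        self.points = [p for p in self.points if p[1] != node]

    def find_node(self, key: str) -> str:
        # Same ceiling-with-wraparound lookup as the pseudocode above.
        i = bisect_left(self.points, (point(key),))
        return self.points[i % len(self.points)][1]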

Virtual Nodes for Balance

In consistent hashing, virtual nodes address the potential for uneven distribution of keys across physical nodes by replicating each physical node multiple times on the hash ring. Specifically, for a system with N physical nodes, each node n is represented by v virtual nodes, where the hash values for these virtual nodes are computed as h(n || i) for i = 1 to v, with || denoting concatenation. These vN points are then placed on the ring, effectively subdividing the responsibility of each physical node into smaller, more evenly spaced segments. This approach spreads the positions of a single physical node's responsibilities more uniformly around the ring, reducing the clustering that can occur with a single hash point per node. The primary benefit of virtual nodes is improved load balancing: the maximum load on any node, measured relative to the ideal share of 1/N, drops from O(\log N) with high probability in basic consistent hashing to O(1) as v grows sufficiently large, such as v = \Theta(\log N). This probabilistic guarantee ensures that no single node bears a disproportionately large share of keys, enhancing overall stability and performance under varying workloads. In practice, systems like Amazon's Dynamo employ virtual nodes to achieve near-uniform distribution, with empirical imbalance ratios as low as 10% even under heterogeneous capacities. Implementation involves generating the v hash points for each physical node and inserting them into a balanced binary search tree (BST) that represents the ordered ring, allowing efficient successor lookups for key assignments. For a key k, its position h(k) is located in the BST, and the closest virtual node clockwise determines the responsible physical node. This maintains the core ring mapping from basic consistent hashing while distributing assignments more finely. However, increasing v introduces trade-offs: while balance improves, storage requirements grow to O(N v) for maintaining the BST, and lookup times increase marginally from O(\log N) to O(\log (N v)). The expected load per physical node remains approximately 1/N, with load variance reduced to roughly 1/(N v), providing tighter concentration around the mean as v rises:
\text{Expected load per node} \approx \frac{1}{N}, \quad \text{variance} \approx \frac{1}{N v}
This formulation highlights how virtual nodes statistically average out irregularities in hash position distributions.
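The effect of v on balance can be observed with a quick simulation. The sketch below uses illustrative parameters (10 nodes, 100,000 synthetic keys, MD5-based positions) and simply reports the heaviest node's load relative to the ideal 1/N share, which should shrink toward 1 as v grows.
import hashlib
from bisect import bisect_left
from collections import Counter

def pos(label: str) -> int:
    return int(hashlib.md5(label.encode()).hexdigest(), 16) % 2**32

def imbalance(num_nodes: int, v: int, num_keys: int = 100_000) -> float:
    # Build a ring with v virtual points per node, assign keys, and report
    # the maximum-to-average load ratio across physical nodes.
    points = sorted((pos(f"n{n}#{i}"), f"n{n}")
                    for n in range(num_nodes) for i in range(v))
    loads = Counter()
    for k in range(num_keys):
        i = bisect_left(points, (pos(f"key-{k}"),))
        loads[points[i % len(points)][1]] += 1
    return max(loads.values()) / (num_keys / num_nodes)

for v in (1, 10, 100):
    print(f"v={v:4d}  max/avg load ratio: {imbalance(10, v):.2f}")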

Advanced Techniques

Load Balancing Methods

Consistent hashing relies on the uniform hashing assumption, where keys are distributed evenly across the hash space, enabling balanced load distribution among nodes without requiring knowledge of the total number of nodes. However, in practice, this assumption often fails due to non-uniform key distributions, such as skewed access patterns in real-world workloads, leading to load imbalances where certain nodes handle disproportionately more keys or requests. To enhance balance while supporting replication, consistent hashing assigns each key to multiple replicas by mapping it to the r nearest successor nodes in the clockwise direction on the hash ring, where r is the replication factor (typically 3 in production systems). This approach not only provides fault tolerance but also contributes to load balancing by spreading replicas across distinct nodes, reducing the impact of node failures on overall distribution. Virtual nodes, as introduced in practical extensions, improve balance by representing each physical node with multiple points on the ring, approximating uniform spacing and mitigating the effects of non-uniform hashing. Empirical tuning of the number of virtual nodes per physical node (v) is essential for practical deployment; for instance, with N=100 physical nodes, setting v between 100 and 1000 yields a maximum load less than 1.2 times the average load, balancing distribution quality against lookup overhead. Load balance is typically measured using the standard deviation of key counts across nodes or the ratio of maximum to average load; in evaluated systems, increasing virtual nodes reduces this standard deviation, achieving ratios as low as 1.18 with v=100.
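Replica selection can be sketched by walking the ring clockwise from the key's position and collecting the first r distinct physical nodes. The sketch below is a minimal illustration under stated assumptions (hypothetical node names, the virtual-node labelling convention used earlier), not any specific system's API.
import hashlib
from bisect import bisect_left

def pos(label: str) -> int:
    return int(hashlib.md5(label.encode()).hexdigest(), 16) % 2**32

def build_ring(nodes, v=100):
    return sorted((pos(f"{n}#{i}"), n) for n in nodes for i in range(v))

def replicas(key: str, points, r: int = 3) -> list:
    # Walk clockwise from h(k), skipping virtual points whose physical node
    # is already chosen, until r distinct nodes are collected.
    start = bisect_left(points, (pos(key),))
    chosen = []
    for step in range(len(points)):
        node = points[(start + step) % len(points)][1]
        if node not in chosen:
            chosen.append(node)
            if len(chosen) == r:
                break
    return chosen

points = build_ring(["A", "B", "C", "D", "E"])
print(replicas("user:42", points, r=3))  # three distinct physical nodes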

Mitigating Imbalances and Hotspots

In consistent hashing, hotspots arise when a small subset of popular keys, such as frequently accessed binary large objects (BLOBs) in content delivery networks, concentrate requests on a few nodes, leading to overload despite the use of virtual nodes for improved balance. This skew often follows Zipf-like distributions in real-world workloads, where a few items account for the majority of accesses, exacerbating load imbalances on the hash ring. To mitigate hotspots, systems employ dynamic reassignment techniques, where hot keys are identified through access monitoring and migrated to additional nodes or replicated across multiple replicas beyond the standard assignment. This reactive approach spreads request traffic, often combined with dedicated caching layers at the application level to absorb bursts without altering the hashing structure. For instance, requests for hot content can be routed via tree-based protocols overlaid on the hash ring, ensuring distribution to O(\log C) caches with high probability, where C is the number of caches. For known or anticipated hotspots, biased loading adjustments incorporate rendezvous hashing elements into the consistent hashing framework, applying rendezvous computations selectively to hot keys while retaining the ring for general traffic to maintain uniformity and minimize remapping. Rendezvous hashing, by selecting the node with the highest random weight for a key-node pair, provides finer-grained control over placement for skewed items without requiring ring modifications. Virtual nodes further reduce variance in load, with probabilistic guarantees ensuring balanced assignments. In practice, Akamai's CDN uses consistent hashing to balance load within clusters of servers.

Extensions

Weighted and Biased Variants

Weighted consistent hashing addresses scenarios where nodes possess heterogeneous capacities, ensuring that the load assigned to each node is proportional to its processing or storage capability. In this approach, each physical node i with capacity c_i is represented by a number of virtual nodes v_i on the hash ring, where
v_i = \left( \frac{c_i}{\sum_j c_j} \right) \times V
and V is the total number of virtual nodes in the system. These virtual nodes are placed by hashing the node's identifier concatenated with an index for each virtual instance, distributing them evenly across the ring to approximate uniform load per unit of capacity. This method builds on the use of virtual nodes for load balancing, adapting their count to reflect capacity differences rather than treating all nodes equally. A well-known implementation of weighted consistent hashing is the Ketama library, which employs a continuum-based technique to position weighted points smoothly on the ring. In Ketama, weights determine the density of points for each node—higher-capacity nodes receive more points, generated via hashing of the node name and point indices for even spacing. This ensures that key assignments favor higher-capacity nodes proportionally, minimizing load imbalances in production environments like distributed caches. Biased variants further modify the ring to handle non-uniform costs beyond simple capacity. In CDNs, this can involve popularity-aware replication, where frequently requested content is preferentially mapped to multiple servers to reduce the effective load on any single server. Despite these benefits, weighted and biased variants introduce drawbacks, particularly increased remapping complexity when node weights or biases change. Updating the ring requires recomputing and repositioning multiple virtual nodes, which can temporarily elevate coordination overhead and disrupt ongoing assignments in dynamic clusters.
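A sketch of the weighted allocation follows: given per-node capacities, compute v_i proportional to capacity and place that many points per node (rounding introduces small deviations). The capacities, point budget, and labelling scheme are illustrative assumptions, not Ketama's exact point-generation method.
import hashlib
from bisect import bisect_left

def pos(label: str) -> int:
    return int(hashlib.md5(label.encode()).hexdigest(), 16) % 2**32

def build_weighted_ring(capacities: dict, total_points: int = 1000):
    # v_i = (c_i / sum_j c_j) * V virtual points per node, rounded.
    total_capacity = sum(capacities.values())
    points = []
    for node, c in capacities.items():
        v_i = max(1, round(total_points * c / total_capacity))
        points.extend((pos(f"{node}#{i}"), node) for i in range(v_i))
    return sorted(points)

def owner(key: str, points) -> str:
    i = bisect_left(points, (pos(key),))
    return points[i % len(points)][1]

ring = build_weighted_ring({"big": 8, "medium": 4, "small": 2})
# "big" should end up owning roughly twice the keys of "medium".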

Integration in Modern Frameworks

Consistent hashing has been integrated into several open-source libraries that facilitate its use in distributed systems. The libketama library, implemented in C, provides a robust framework for consistent hashing with support for weighted node assignments, enabling efficient key distribution across servers in caching setups like Memcached. In Java, the Spymemcached client library incorporates consistent hashing for Memcached clusters, allowing seamless load balancing and minimal data remapping during node additions or failures. For Python developers, libraries such as hash_ring offer pure-Python implementations of consistent hashing, suitable for sharding in distributed caches and databases with low overhead. Modern frameworks have adopted consistent hashing for advanced load balancing scenarios. The Envoy proxy, which added hash-based stateful session support in its 2021 updates, employs ring-based consistent hashing for traffic routing, ensuring sticky sessions and efficient distribution across upstream hosts in service mesh environments. This approach minimizes disruptions in service meshes by maintaining request affinity even as cluster topology changes. Protocol extensions in content delivery networks (CDNs) leverage consistent hashing alongside emerging transport protocols. Post-2020 developments in QUIC-enabled CDNs utilize consistent hashing for server selection, combining QUIC's low-latency connection establishment with hash-based routing to direct client requests to optimal edge nodes while preserving affinity during dynamic scaling. As of 2025, consistent hashing trends toward deeper integration in serverless paradigms. In AWS Lambda, it partitions asynchronous invocations across execution environments using consistent hashing, enabling scalable handling of billions of requests with reduced placement churn and improved reliability for event-driven workloads. Customizations of consistent hashing extend to replication protocols for distributed storage; Raft-based systems incorporate consistent hashing variants to manage data placement and rebalancing during failures. This adaptation builds on weighted variants to balance loads in replicated logs, ensuring smooth recovery without full rehashing.

Analysis and Comparisons

Complexity Measures

In consistent hashing, the lookup operation to determine the successor node for a given key typically involves searching an ordered structure of positions on the hash circle. When implemented using a balanced binary search tree (BST) to store the hashed positions of nodes (or their virtual nodes), the lookup is O(\log N), where N is the number of physical nodes. Alternatively, maintaining a sorted array of these positions enables binary search for lookups in O(\log (N v)) time, where v is the number of virtual nodes per physical node. Node addition or removal requires updating the ordered structure by inserting or deleting the corresponding virtual positions, which takes O(\log (N v)) time in a BST or sorted-array implementation. Additionally, only a fraction of keys need remapping to new nodes, leading to an expected remapping cost of O(K / N) operations, where K is the total number of keys stored across the system. This arises because the addition or removal of a single node affects approximately 1/N of the hash circle, redistributing that share of keys. Formally, the expected fraction of keys remapped upon such a change is given by
\frac{|\Delta V|}{|V|} \approx \frac{1}{N},
where V is the current set of node positions on the circle and \Delta V is the change due to the node operation, assuming uniform hashing and balanced virtual node placement. The space complexity for maintaining the hash ring structure is O(N v), as each of the N nodes contributes v virtual positions that must be stored and ordered for efficient lookups. This overhead scales linearly with the number of virtual nodes used to improve load balance but remains modest compared to the total key storage. Overall, these measures render consistent hashing superior to simple modulo-based hashing schemes, where node changes necessitate remapping all O(K) keys, potentially causing significant disruption in large-scale systems.
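To make these bounds concrete, consider an illustrative cluster with K = 10^6 keys and N = 100 nodes (hypothetical figures): adding one node under consistent hashing moves roughly K/(N+1) keys, whereas rehashing with key mod (N+1) leaves only about a 1/(N+1) fraction of assignments unchanged.
\frac{K}{N+1} = \frac{10^6}{101} \approx 9{,}900 \text{ keys moved (consistent hashing)}, \qquad K\left(1 - \frac{1}{N+1}\right) \approx 990{,}000 \text{ keys moved (modulo rehashing)}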

Versus Alternative Hashing Schemes

Consistent hashing addresses key limitations of traditional simple modulo hashing, which assigns keys to nodes via a hash function modulo the current number of nodes n. When n changes—such as during node addition or failure—simple modulo hashing necessitates remapping nearly all K keys across the system, incurring O(K) computational cost and disrupting consistency by invalidating most prior assignments. In contrast, consistent hashing localizes disruptions, remapping only an expected O(K/n) keys on average (a 1/n fraction), thereby maintaining higher assignment stability and reducing overhead in dynamic clusters. Rendezvous hashing, proposed by Thaler and Ravishankar in 1996 as a name-based mapping scheme, provides an alternative to consistent hashing by selecting nodes via the highest random weight from pairwise key-node hash comparisons, typically using simple functions like XOR for efficiency. Unlike consistent hashing's ring-based structure, rendezvous hashing avoids maintaining a shared coordinate space, enabling stateless operation where each client independently computes assignments without global synchronization. This simplicity suits decentralized environments, though it demands O(n) hash evaluations per key lookup—scaling poorly with node count n—compared to consistent hashing's O(log n) lookup time when augmented with ordered node lists or trees. Consistent hashing also incorporates monotonicity as a core property: upon adding nodes, keys migrate only to new nodes and never rearrange among existing ones, preserving assignment stability. Relative to purely monotonic designs, consistent hashing trades some strictness for efficiency, using the ring to approximate balance with modest per-lookup cost while avoiding their full spatial overhead. Key trade-offs between consistent hashing and alternatives like rendezvous hashing center on coordination and performance: the ring structure in consistent hashing requires distributed agreement on node positions, introducing overhead in highly volatile networks, while rendezvous hashing's stateless design eliminates this at the cost of repeated computations per operation. Both achieve comparable load balance under uniform distributions, but consistent hashing better mitigates hotspots through virtual nodes, whereas rendezvous hashing relies on hash quality for evenness. Consistent hashing is typically chosen for structured systems like databases, where its ring enables scalable partitioning with minimal remapping during growth. Rendezvous hashing, conversely, excels in simplicity-driven scenarios such as IoT and sensor networks, where stateless coordination supports swarm-like, low-resource deployments without centralized state.
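For comparison, a minimal rendezvous (highest-random-weight) lookup needs no ring state at all: each lookup hashes the key against every node and picks the maximum score, which is what makes it stateless but O(n) per key. The sketch below uses MD5-based scores and hypothetical node names purely for illustration.
import hashlib

def score(key: str, node: str) -> int:
    # Pairwise weight for the (key, node) combination.
    return int(hashlib.md5(f"{node}:{key}".encode()).hexdigest(), 16)

def rendezvous_owner(key: str, nodes) -> str:
    # Highest random weight wins; every lookup scans all n nodes.
    return max(nodes, key=lambda n: score(key, n))

print(rendezvous_owner("user:42", ["A", "B", "C"]))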

Applications

In Distributed Storage Systems

Consistent hashing plays a pivotal role in distributed storage systems, particularly databases and key-value stores, by enabling efficient data partitioning across nodes in a cluster. It maps keys to positions on a virtual ring, assigning ownership to nodes based on their positions, which facilitates horizontal scaling and fault tolerance without requiring full data reshuffling during cluster changes. This approach is foundational in systems prioritizing availability and partition tolerance over strict consistency, as per the CAP theorem's implications in such architectures. Amazon Dynamo, introduced in 2007, employs consistent hashing for ring-based partitioning of its key space, using virtual nodes to achieve load balancing and incremental scalability. Each key is hashed to a point on the ring, with the successor node responsible for storage; to handle heterogeneous capacities, Dynamo assigns multiple virtual nodes per physical node, typically 100-1000 per machine. Replication is managed with N=3 replicas by default, where each write operation targets the primary node and its N-1 successors on the ring, ensuring data availability even during failures. This design minimizes the impact of node additions or removals, as only a fraction of keys (approximately 1/N) need remapping. Apache Cassandra, which became a top-level Apache project in 2010, builds on similar principles with a token-based ring structure, using tokens to partition data across nodes. Keys are hashed to tokens within a fixed range (0 to 2^{127} - 1 for the MD5-based RandomPartitioner used in early versions, as described in the 2009 paper); each node owns a contiguous range of tokens, and virtual nodes (vnodes) allow finer-grained distribution, with a default of 256 per node to balance load. Modern versions use the Murmur3 partitioner with a token range of -2^{63} to 2^{63}-1. Cassandra supports tunable consistency levels, such as ONE, QUORUM, or ALL, allowing applications to trade off between consistency and availability during reads and writes. Node additions or removals trigger seamless token range transfers between neighbors, preserving data locality and enabling linear scalability. Riak, another Dynamo-inspired system, utilizes consistent hashing on a ring for data distribution, with a default ring size of 64 partitions distributed across the physical nodes, resulting in approximately 64/N virtual nodes per physical node where N is the number of nodes, to mitigate hotspots and ensure even partitioning. Like Dynamo, it replicates data across N nodes (typically 3), but incorporates active anti-entropy mechanisms using Merkle trees to detect and resolve inconsistencies across replicas in the background, promoting eventual consistency without blocking operations. This approach supports high availability in dynamic environments, where node failures are common, by periodically comparing hash trees to repair divergent data. The primary benefits of consistent hashing in these systems include horizontal scalability, as clusters can expand by adding nodes that absorb only a small portion of the data (roughly 1/N of the keys), and minimized data movement during resizes, which reduces downtime and network overhead compared to traditional hashing schemes. It also enhances fault tolerance by localizing the effects of node changes to adjacent ring segments, allowing the system to maintain operations with minimal disruption. A key challenge is propagating ring membership changes across the cluster, addressed via gossip protocols in which nodes periodically exchange state information with random peers, achieving eventual consistency in membership awareness. While effective for decentralized coordination, gossip can introduce temporary inconsistencies in ring views during high churn, requiring careful tuning of gossip intervals to balance convergence speed and overhead.

In Content Delivery and Networking

Consistent hashing plays a crucial role in content delivery networks (CDNs) by enabling efficient distribution of user requests across edge servers, minimizing disruptions from server additions or failures. In Akamai's early implementation around 1998, consistent hashing was employed for selecting edge servers to serve cached content, which helps reduce cache misses by ensuring that requests for the same content are consistently routed to the same server when possible, thereby improving hit rates and overall performance. This approach addresses hot spots in web caching by mapping both content keys and server identifiers to a hash ring, allowing dynamic scaling without widespread remapping of requests. In modern real-time communication platforms, consistent hashing facilitates low-latency sharding of users across distributed servers. For instance, Discord adopted a custom consistent hashing scheme in 2017 to shard over 5 million concurrent users across voice servers, which routes users to nearby servers and significantly reduces join latency for voice channels. By hashing user identifiers onto the ring and assigning virtual nodes to servers, this method ensures balanced load distribution and minimal reshuffling during server changes, supporting seamless scaling for high-traffic voice interactions. For microservice architectures, weighted consistent hashing enhances routing in service meshes and proxies. Envoy Proxy utilizes a ring hash load balancer that incorporates endpoint weights to route requests across upstream services, enabling handling of thousands of requests per second (RPS) while preserving session affinity. This weighted variant adjusts server capacities on the hash ring, directing more traffic to higher-capacity nodes and maintaining consistency in environments like those using gRPC for inter-service communication. Other caching and proxy systems also leverage consistent hashing for upstream load balancing in networking contexts. Varnish Cache employs consistent hashing through its directors module to shard requests across backend servers, promoting even distribution and caching efficiency in high-throughput scenarios. Similarly, Nginx's upstream module supports ketama-based consistent hashing via the hash directive with the consistent parameter, which routes HTTP traffic to backends while minimizing key remappings upon failures. A key performance advantage in these networking applications is the reduction of remapped requests to less than 1% during node failures or additions in large clusters, as only a fraction proportional to 1/n (where n is the number of nodes) of the keys are affected, thereby enhancing system availability and reducing latency spikes. This property proves particularly valuable in CDNs and load balancers, where maintaining request affinity directly impacts cache hit rates and resource utilization.
