
# Cuckoo filter

A cuckoo filter is a probabilistic data structure designed for approximate set membership testing: it determines whether an element is likely present in a set, with a small probability of false positives but no false negatives. It builds on cuckoo hashing principles, storing compact fingerprints (short hash-derived bit strings representing items) within an array of buckets, enabling efficient insertions, lookups, and deletions. Introduced in 2014 by Bin Fan, David G. Andersen, Michael Kaminsky, and Michael D. Mitzenmacher at the ACM CoNEXT conference, the cuckoo filter addresses limitations of earlier structures like Bloom filters by supporting dynamic item removal without rebuilding the entire data structure.

Unlike Bloom filters, which use multiple hash functions and fixed bit positions, leading to variable cache misses during lookups, cuckoo filters organize fingerprints into a hash-table-like array where each bucket holds a small number of entries (typically 4 or 8), ensuring lookups involve at most two candidate buckets and thus predictable performance. The core operations are straightforward: insertion places a fingerprint in one of two possible buckets via partial-key cuckoo hashing, potentially relocating existing entries to maintain space; lookup checks both candidate buckets for a match; and deletion removes a matching fingerprint from one of the two locations. These features make cuckoo filters particularly suitable for applications requiring high-speed approximate membership queries, such as network packet processing, caching systems, and databases.

In terms of efficiency, cuckoo filters achieve higher space utilization (up to 95% table occupancy) and lower space overhead than Bloom filters for false positive rates below 3%, using approximately 12.60 bits per item versus 13.00 for Bloom filters at a 0.19% false positive rate. Performance evaluations show cuckoo filters delivering lookup throughputs of about 14 million operations per second, outperforming Bloom filters' 10 million, while construction speeds reach 5 million keys per second.
Subsequent research has refined the structure, including simplifications for easier analysis and variants like Morton filters for further space savings, but the original design remains influential for its balance of simplicity and practicality.

## Overview

### Definition and Purpose

A cuckoo filter is a space-efficient probabilistic data structure designed for approximate membership queries, which determine whether an element is likely present in a set. It operates by storing short fingerprints (fixed-length bit strings derived from hashes of the elements) in a table organized according to cuckoo hashing principles, enabling fast lookups with a bounded false positive rate. The primary purpose of a cuckoo filter is to support efficient set membership testing in scenarios where exact membership is not required, but space and query speed are critical, such as in networking, caching, or database indexing. Unlike traditional Bloom filters, it allows for dynamic insertions and deletions of elements without rebuilding the entire structure, making it suitable for evolving datasets. By using fingerprints instead of full keys, it minimizes storage overhead while preserving low error probabilities for membership tests. Key advantages include a tunable false positive rate that remains bounded regardless of the number of insertions, the ability to perform deletions accurately by removing specific fingerprints, and space efficiency comparable to or better than Bloom filters for false positive rates below 3%, often achieving up to 95% table occupancy. This design draws inspiration from cuckoo hashing's relocation mechanism to handle collisions, ensuring high performance in practice.

### History and Development

The cuckoo filter was first introduced in 2014 by Bin Fan, David G. Andersen, Michael Kaminsky, and Michael Mitzenmacher in their paper "Cuckoo Filter: Practically Better Than Bloom," presented at the ACM CoNEXT conference. The development was motivated by the limitations of Bloom filters, which do not support deletions without rebuilding the entire structure or risking false negatives, and by drawing on the relocation strategy of cuckoo hashing to enable dynamic operations. This approach aimed to provide a more flexible alternative for approximate membership queries in resource-constrained environments. The initial proposal positioned the cuckoo filter as a practical improvement over both standard Bloom filters and counting Bloom filters, offering better space efficiency and support for insertions and deletions, particularly suited for network applications such as packet processing and caching. Subsequent developments included open-source implementations, such as the reference C++ library released alongside the original paper, which facilitated adoption in various programming ecosystems. By 2018, discussions at RedisConf highlighted its integration potential, leading to official support in the RedisBloom module starting in version 2.2 in December 2019, enabling its use in production databases for probabilistic set membership testing.

## Algorithm

### Data Structure Components

The cuckoo filter is implemented as a fixed-size hash table consisting of $m$ buckets, where $m$ is typically chosen as a power of two ranging from $2^{15}$ to $2^{30}$ to accommodate the expected number of items while maintaining efficiency. Each bucket holds a fixed number of slots, denoted as $b$, with common values being 4 or 8 slots per bucket to allow for multiple candidate positions during item placement. Instead of storing full item representations, the filter records fingerprints, which are short, fixed-length hashes of the items, typically 8 to 16 bits long (more precisely, $f$ bits, where $f$ ranges from about 6 to 20 based on the desired false positive rate). These fingerprints serve as compact proxies for the items, enabling space savings over storing complete hashes or keys while supporting probabilistic membership queries. To determine candidate bucket locations, the cuckoo filter employs two hash functions inspired by partial-key cuckoo hashing: the primary index $h_1(x) = \text{hash}(x) \bmod m$, which maps an item $x$ to its initial bucket, and the alternate index $h_2(x) = (h_1(x) \oplus \text{hash}(f)) \bmod m$, where $f$ is the item's fingerprint, providing a secondary position for relocation if needed. Optionally, the structure includes parameters such as the target load factor $\alpha$, which represents the desired occupancy level (e.g., 95% for $b = 4$ or 98% for $b = 8$) to optimize the trade-off between space utilization and insertion performance.

### Insertion Operation

The insertion operation in a cuckoo filter begins by computing a compact fingerprint $f$ of the input item $x$, typically using a hash function to produce a fixed-size bit string (e.g., 8 or 16 bits). Two candidate bucket indices are then derived: the primary index $i_1 = \text{hash}(x)$ and the alternate index $i_2 = i_1 \oplus \text{hash}(f)$, where $\oplus$ denotes bitwise XOR. If either bucket $i_1$ or $i_2$ contains an empty entry, the fingerprint $f$ is placed in that entry, and the insertion succeeds. When both candidate buckets are full, the filter initiates a cuckoo relocation process to make space. One of the two buckets ($i_1$ or $i_2$) is selected at random as the starting point $i$, and an existing fingerprint $e$ is randomly chosen from that bucket. The new fingerprint $f$ is swapped with $e$ in the entry, effectively evicting $e$. The alternate bucket for the evicted fingerprint (now held in the variable $f$) is computed as $i = i \oplus \text{hash}(f)$. If this new bucket has an empty entry, the evicted fingerprint is inserted there; otherwise, the relocation repeats by evicting another fingerprint from the current bucket and continuing the process. This relocation is recursive and limited to a maximum number of attempts, known as MaxNumKicks, typically set to 500 to balance performance and success rate. Each kick involves swapping and relocating an evicted fingerprint to its alternate bucket, mimicking the cuckoo bird's nest-displacement behavior. The fingerprints serve as partial representations of the keys, enabling probabilistic collision handling during subsequent lookups by matching against stored values to detect mismatches without storing full keys. If the maximum number of kicks is exceeded without finding an empty slot, the insertion fails, indicating the filter is at or near capacity; this occurs rarely at load factors below 95%. The following pseudocode outlines the process:
```
Algorithm: Insert(x)
f ← fingerprint(x)
i₁ ← hash(x)
i₂ ← i₁ ⊕ hash(f)
if bucket[i₁] or bucket[i₂] has an empty entry then
    add f to that bucket
    return success
i ← randomly choose i₁ or i₂
for n = 0 to MaxNumKicks do
    randomly select entry e from bucket[i]
    swap f with the fingerprint stored in e
    i ← i ⊕ hash(f)
    if bucket[i] has an empty entry then
        add f to bucket[i]
        return success
return failure
```
[](https://www.cs.cmu.edu/~binfan/papers/conext14_cuckoofilter.pdf)
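The pseudocode above can be sketched as a runnable Python program. This is a minimal illustration, not the reference implementation: the SHA-256-based hash, 8-bit fingerprints, list-backed buckets, and table size are all assumed parameters chosen for demonstration.

```python
import hashlib
import random

NUM_BUCKETS = 1 << 12    # m buckets (power of two, illustrative)
SLOTS = 4                # b entries per bucket
MAX_KICKS = 500          # relocation bound suggested in the paper

table = [[] for _ in range(NUM_BUCKETS)]   # bucket -> list of fingerprints

def _h(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

def _fingerprint(item: str) -> int:
    return (_h(b"fp:" + item.encode()) & 0xFF) or 1   # nonzero 8-bit fingerprint

def _alt(i: int, fp: int) -> int:
    # Alternate bucket: i XOR hash(fp), as in partial-key cuckoo hashing.
    return (i ^ _h(bytes([fp]))) & (NUM_BUCKETS - 1)

def insert(item: str) -> bool:
    fp = _fingerprint(item)
    i1 = _h(item.encode()) & (NUM_BUCKETS - 1)
    i2 = _alt(i1, fp)
    for i in (i1, i2):
        if len(table[i]) < SLOTS:          # empty entry available
            table[i].append(fp)
            return True
    i = random.choice((i1, i2))            # both full: start kicking
    for _ in range(MAX_KICKS):
        j = random.randrange(SLOTS)
        table[i][j], fp = fp, table[i][j]  # evict a random resident fingerprint
        i = _alt(i, fp)                    # follow the evictee to its other bucket
        if len(table[i]) < SLOTS:
            table[i].append(fp)
            return True
    return False                           # table is effectively full

for k in range(10_000):                    # ~61% load: insertions should succeed
    assert insert(f"key-{k}")
```

At roughly 61% occupancy, relocation chains stay short and insertions essentially never fail; production implementations use fixed-size bucket arrays (and often semi-sorted buckets) rather than Python lists.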

### Lookup Operation

The lookup operation in a cuckoo filter determines whether a given item is likely present in the set by checking for the existence of its [fingerprint](/page/Fingerprint) in the filter's [buckets](/page/Bucket). To perform a lookup for an item $x$, the process begins by computing the fingerprint $f = \text{fingerprint}(x)$, which is a compact [hash](/page/Hash) representation of the item. Next, the primary [bucket](/page/Bucket) index $i_1 = h(x)$ is calculated using a [hash function](/page/Hash_function) $h$, which maps the item to one of the filter's [buckets](/page/Bucket). If the fingerprint $f$ exactly matches any entry in bucket $i_1$, the operation returns a positive response, indicating that the item is likely present.[](https://www.cs.cmu.edu/~binfan/papers/conext14_cuckoofilter.pdf)

If no match is found in the primary bucket, the alternate bucket index $i_2 = i_1 \oplus h'(f)$ is computed, where $h'$ is a secondary hash function and $\oplus$ denotes the bitwise XOR operation, providing the candidate location for relocation during insertion. The entries in bucket $i_2$ are then checked for an exact match with $f$. A positive response is returned only if the fingerprint is found in either the primary or alternate bucket, with no further relocation or probing involved in the lookup process itself. This ensures a straightforward query without modifying the filter.[](https://www.cs.cmu.edu/~binfan/papers/conext14_cuckoofilter.pdf)

A positive response during lookup signifies a probable membership, but it may result in a false positive if the matching [fingerprint](/page/Fingerprint) belongs to a different item due to hash collisions. Such probabilistic errors are inherent to the filter's [design](/page/Design) and are controlled by the choice of [fingerprint](/page/Fingerprint) size and load factor, though detailed bounds on this probability are analyzed separately. The [fingerprint](/page/Fingerprint) used in matching is the same fixed-size [hash](/page/Hash) computed for insertions, serving as a succinct [proxy](/page/Proxy) for the full item.[](https://www.cs.cmu.edu/~binfan/papers/conext14_cuckoofilter.pdf)

In terms of efficiency, the lookup operation achieves constant time complexity, $O(1)$ on average, as it requires examining at most two buckets, each containing a fixed number of slots (typically 4 or 8 fingerprints per bucket). This makes lookups fast and cache-friendly, involving only a small number of memory accesses regardless of the filter's size.[](https://www.cs.cmu.edu/~binfan/papers/conext14_cuckoofilter.pdf)
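The two-bucket check described above can be sketched in a few lines of Python; the hash and fingerprint helpers here are illustrative assumptions, with buckets modeled as lists of fingerprints:

```python
import hashlib

NUM_BUCKETS = 1 << 12    # m buckets (power of two, illustrative)

def _h(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

def _fingerprint(item: str) -> int:
    return (_h(b"fp:" + item.encode()) & 0xFF) or 1

def _indices(item: str, fp: int) -> tuple[int, int]:
    i1 = _h(item.encode()) & (NUM_BUCKETS - 1)
    return i1, (i1 ^ _h(bytes([fp]))) & (NUM_BUCKETS - 1)

def contains(table: list, item: str) -> bool:
    fp = _fingerprint(item)
    i1, i2 = _indices(item, fp)
    # Exactly two buckets are examined; the filter is never modified.
    return fp in table[i1] or fp in table[i2]

table = [[] for _ in range(NUM_BUCKETS)]
fp = _fingerprint("alice")
i1, _ = _indices("alice", fp)
table[i1].append(fp)             # simulate a prior insertion of "alice"
assert contains(table, "alice")  # inserted item: always found (no false negatives)
```

A query for an item that was never inserted would most likely return false here, but could return true if its fingerprint happens to collide in one of its two candidate buckets; that residual error is the false positive probability analyzed below.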

### Deletion Operation

The deletion operation in a cuckoo filter enables the removal of an item from the set while preserving the filter's integrity for remaining elements. To delete an item $x$, the process begins by computing its fingerprint $f = \text{fingerprint}(x)$ and the two candidate bucket indices $i_1 = h_1(x)$ and $i_2 = h_2(x, f)$, mirroring the lookup procedure.[](https://dl.acm.org/doi/10.1145/2674005.2674994) The filter then scans both candidate buckets for a fingerprint matching $f$; if a match is found in either bucket, one instance of that fingerprint is removed, and the operation succeeds.[](https://dl.acm.org/doi/10.1145/2674005.2674994) Unlike insertion, deletion requires no relocation of other fingerprints, simplifying the process and avoiding the need for counting mechanisms.[](https://dl.acm.org/doi/10.1145/2674005.2674994)

This arbitrary removal of a matching fingerprint—without distinguishing multiples due to the absence of counters—can potentially delete a colliding entry from another item if fingerprints overlap, though partial-key cuckoo hashing mitigates widespread issues by allowing such collisions to coexist in candidate buckets.[](https://dl.acm.org/doi/10.1145/2674005.2674994) Importantly, the false positive probability remains unchanged post-deletion, as the operation does not introduce new lookup errors.[](https://dl.acm.org/doi/10.1145/2674005.2674994) Compared to standard Bloom filters, which lack native deletion support and require variants like counting Bloom filters that consume 3–4 times more space, cuckoo filters provide efficient dynamic deletion without compromising non-deleted items' membership queries.[](https://dl.acm.org/doi/10.1145/2674005.2674994)

In the edge case where no matching fingerprint is found in the candidate buckets, the item is treated as absent, and the operation returns without error or modification.[](https://dl.acm.org/doi/10.1145/2674005.2674994) This design ensures safe, reversible set membership without false negatives for previously inserted elements, assuming the filter avoids bucket overflows.[](https://dl.acm.org/doi/10.1145/2674005.2674994)
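The deletion procedure, including the duplicate-insertion behavior and the absent-item no-op described above, can be illustrated with a short self-contained sketch (hash and fingerprint choices are again illustrative assumptions):

```python
import hashlib

NUM_BUCKETS = 1 << 12

def _h(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

def _fingerprint(item: str) -> int:
    return (_h(b"fp:" + item.encode()) & 0xFF) or 1

def _indices(item: str, fp: int) -> tuple[int, int]:
    i1 = _h(item.encode()) & (NUM_BUCKETS - 1)
    return i1, (i1 ^ _h(bytes([fp]))) & (NUM_BUCKETS - 1)

def delete(table: list, item: str) -> bool:
    fp = _fingerprint(item)
    for i in _indices(item, fp):
        if fp in table[i]:
            table[i].remove(fp)   # drop exactly one matching instance
            return True
    return False                  # not found: treated as absent, nothing changes

table = [[] for _ in range(NUM_BUCKETS)]
fp = _fingerprint("carol")
i1, _ = _indices("carol", fp)
table[i1] += [fp, fp]              # two prior insertions of "carol"
assert delete(table, "carol")      # removes one copy...
assert fp in table[i1]             # ...the other copy still answers lookups
assert delete(table, "carol")
assert not delete(table, "carol")  # now absent: deletion is a safe no-op
```

Note that, unlike insertion, no relocation loop appears anywhere: deletion only ever inspects the same two buckets a lookup would.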

## Theoretical Properties

### Space Complexity

The [space complexity](/page/Space_complexity) of a cuckoo filter arises from its underlying cuckoo hash table structure, which stores [fingerprints](/page/Fingerprint) of inserted items across a fixed number of buckets, each containing multiple slots.[](https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf) Each [bucket](/page/Bucket) consists of $ b $ slots, where $ b $ (typically 4) is the number of [fingerprints](/page/Fingerprint) it can hold, and each [fingerprint](/page/Fingerprint) is $ f $ bits long, resulting in $ b \times f $ bits per [bucket](/page/Bucket).[](https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf) The total table size in buckets is approximately $ \frac{n}{b \alpha} $, where $ n $ is the number of items and $ \alpha $ is the load factor (the fraction of total slots occupied).[](https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf)

The overall space requirement is thus $ \frac{f}{\alpha} n $ bits, where the average bits per item $ \frac{f}{\alpha} $ balances storage efficiency with insertion success probability.[](https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf) For practical deployments, an optimal fingerprint size of 8 bits is commonly selected to achieve low false positive rates while keeping overhead minimal, paired with a load factor of up to 95% (or 98% with larger $ b = 8 $) to maximize utilization without excessive relocation failures during insertions.[](https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf)

This configuration yields competitive space usage compared to Bloom filters; for instance, at a 1% [false positive rate](/page/False_positive_rate), a cuckoo filter requires approximately 10.5 bits per item (with $ f = 10 $, $ \alpha = 0.95 $), slightly more than the Bloom filter's 9.6 bits per item but with the added benefit of supporting deletions.[](https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf) In general, cuckoo filters exhibit lower space overhead than standard Bloom filters for false positive rates below 3%, as the fixed additive term in the fingerprint size becomes negligible relative to the logarithmic component for smaller error probabilities.[](https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf)
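These figures can be reproduced from the formulas above. The short script below is a back-of-envelope check, using the fingerprint sizing rule $f \geq \log_2(2b/\epsilon)$ from the original paper and the classic $1.44 \log_2(1/\epsilon)$ bits-per-item figure for Bloom filters:

```python
import math

eps = 0.01          # target false positive rate (1%)
b, alpha = 4, 0.95  # slots per bucket, load factor

# Cuckoo filter: f fingerprint bits stored at load factor alpha.
f = math.ceil(math.log2(2 * b / eps))   # minimum fingerprint size in bits
cuckoo_bits = f / alpha                 # amortized bits per item

# Bloom filter at the same target rate.
bloom_bits = 1.44 * math.log2(1 / eps)

print(f"fingerprint bits:   {f}")               # 10
print(f"cuckoo bits/item:   {cuckoo_bits:.1f}") # ~10.5
print(f"bloom  bits/item:   {bloom_bits:.1f}")  # ~9.6
```

The output matches the numbers quoted above: about 10.5 bits per item for the cuckoo filter versus about 9.6 for the Bloom filter at a 1% error rate, with the gap reversing once the target rate drops below roughly 3%.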

### False Positive Probability

The false positive probability (FPP) in a cuckoo filter is the likelihood that a lookup [operation](/page/Operation) incorrectly identifies a non-inserted item as present. Unlike traditional Bloom filters, where the FPP increases with the load factor, the cuckoo filter provides a stable upper bound on the FPP due to its fixed lookup locations and eviction mechanism.[](https://www.cs.cmu.edu/~binfan/papers/conext14_cuckoofilter.pdf)

The cuckoo filter's design yields a tighter upper bound of $ 1 - \left(1 - \frac{1}{2^f}\right)^{2b} \approx \frac{2b}{2^f} $, where $ f $ is the [fingerprint](/page/Fingerprint) size in bits and $ b $ is the number of slots per bucket; this bound holds independently of the load factor as long as insertions succeed without overflow.[](https://www.vldb.org/pvldb/vol12/p502-lang.pdf)[](https://www.cs.cmu.edu/~binfan/papers/conext14_cuckoofilter.pdf)

This stability arises from the cuckoo relocation process, which prevents bucket overflows and maintains high load factors (up to 95%) without significantly degrading accuracy; for instance, with $ b = 4 $ and $ f = 8 $, the FPP remains below 3% even at 95% load.[](https://www.cs.cmu.edu/~binfan/papers/conext14_cuckoofilter.pdf) By design, cuckoo filters produce no false negatives, as lookups check both possible bucket locations for an exact fingerprint match, ensuring that inserted items are always detected if the insertion succeeded.[](https://www.cs.cmu.edu/~binfan/papers/conext14_cuckoofilter.pdf)

Deletions do not alter the FPP, as they simply remove matching fingerprints from one of the candidate buckets without affecting other entries or requiring rehashing.[](https://www.cs.cmu.edu/~binfan/papers/conext14_cuckoofilter.pdf) Tuning the fingerprint size $ f $ directly impacts the FPP: smaller values (e.g., $ f = 6 $) increase the probability of collisions during lookups but allow for higher space efficiency, while the relocation mechanism ensures overall [stability](/page/Stability) compared to Bloom filters, where FPP degrades more sharply under increasing load.[](https://www.cs.cmu.edu/~binfan/papers/conext14_cuckoofilter.pdf)
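The bound from this section is easy to evaluate numerically; with the example parameters from the text ($b = 4$, $f = 8$), the exact expression and its approximation agree closely:

```python
b, f = 4, 8   # slots per bucket, fingerprint bits (values from the text)

# A query compares against at most 2b stored fingerprints, each matching
# an unrelated item's fingerprint with probability 1/2^f.
exact = 1 - (1 - 1 / 2**f) ** (2 * b)
approx = 2 * b / 2**f

print(f"exact bound:    {exact:.4f}")   # ~0.0308
print(f"approx 2b/2^f:  {approx:.5f}")  # 0.03125
```

Both values sit just above 3%, which is why 8-bit fingerprints with 4-slot buckets are quoted as a rough 3% configuration; each additional fingerprint bit halves the bound.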

### Time Complexity

The lookup and deletion operations in a cuckoo filter exhibit average O(1) [time complexity](/page/Time_complexity), involving a fixed number of [bucket](/page/Bucket) checks—typically two—via two [hash](/page/Hash) functions, independent of the total number of stored items.[](https://www.cs.cmu.edu/~binfan/papers/conext14_cuckoofilter.pdf) This fixed probing ensures predictable performance for membership queries and removals, with deletion requiring no relocations beyond the initial check.[](https://www.cs.cmu.edu/~binfan/papers/conext14_cuckoofilter.pdf)

Insertion achieves amortized O(1) time complexity but has a worst-case O(n) bound due to potential chains of relocations in the underlying partial-key [cuckoo hashing](/page/Cuckoo_hashing) mechanism.[](https://www.cs.cmu.edu/~binfan/papers/conext14_cuckoofilter.pdf) In practice, the number of relocations remains small and constant in expectation, with theoretical bounds showing O(log n) depth occurring only with exponentially low probability.[](https://www.brics.dk/RS/01/32/BRICS-RS-01-32.pdf) Implementations mitigate worst-case scenarios by capping relocation attempts at a small constant, such as 500, after which the table is expanded to maintain efficiency.[](https://www.cs.cmu.edu/~binfan/papers/conext14_cuckoofilter.pdf)

Cuckoo filters also benefit from improved cache performance compared to alternatives like Bloom filters, as fingerprints within buckets are stored sequentially in [memory](/page/Memory), typically incurring only two cache misses per lookup versus scattered accesses that can lead to more misses in bit-vector-based structures.[](https://www.cs.cmu.edu/~binfan/papers/conext14_cuckoofilter.pdf)

Empirical evaluations on modern hardware demonstrate high throughput, with lookups achieving 10–14 million operations per second and insertions around 5–10 million operations per second at 95% occupancy, outperforming Bloom filters in deletion-supporting scenarios.[](https://www.cs.cmu.edu/~binfan/papers/conext14_cuckoofilter.pdf)

## Comparisons with Other Filters

### Bloom Filters

Bloom filters, introduced by Burton Howard Bloom in 1970, are probabilistic data structures designed for efficient membership queries with a tunable false positive probability (FPP). Unlike cuckoo filters, standard Bloom filters do not support deletion operations natively, as removing an element would require unsetting specific bits, which may have been set by other elements due to hash collisions. This limitation necessitates rebuilding the entire filter or switching to space-intensive variants like counting Bloom filters for dynamic sets, leading to higher overall space consumption in scenarios involving insertions and deletions.[](https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf)

In terms of space efficiency and FPP, both structures achieve comparable performance, requiring approximately $1.44 \log_2(1/\epsilon)$ bits per item for a target FPP of $\epsilon$, though cuckoo filters can operate at higher load factors without immediate resizing, avoiding the need to rehash all elements during growth. Cuckoo filters also enable indirect support for counting occurrences through their deletion mechanism and optional extensions like larger fingerprints, whereas standard Bloom filters require additional bits per entry for counting. This makes cuckoo filters more adaptable for evolving datasets without proportional space penalties.

The insertion process in Bloom filters relies on multiple (typically k) independent hash functions to set corresponding bits in a [bit array](/page/Bit_array), distributing elements evenly but without relocation. In contrast, cuckoo filters employ two hash functions based on [cuckoo hashing](/page/Cuckoo_hashing), allowing for relocation of existing fingerprints to accommodate new insertions, which enhances load handling and maintains higher utilization rates under heavy insertion loads.[](https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf)

Cuckoo filters are preferable for applications involving deletable or dynamic sets, such as network caching or database indexing, where deletion support reduces space overhead. Bloom filters, however, remain ideal for static sets with simpler implementation needs, like spell-checkers or [URL](/page/URL) blacklists, due to their straightforward design and minimal computational overhead for pure membership testing.[](https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf)

### Counting Bloom Filters

The counting Bloom filter extends the standard [Bloom filter](/page/Bloom_filter) to support deletions by replacing each bit in the [bit array](/page/Bit_array) with a small [counter](/page/Counter), typically 4 bits wide, at each [hash](/page/Hash) position. For insertion, an [element](/page/Element) hashes to k positions where the corresponding counters are incremented; for deletion, the counters at those same positions are decremented, allowing the structure to track the presence or absence of elements dynamically without rebuilding the filter. This mechanism enables approximate membership queries with deletion support, but the counters introduce overhead since most positions remain at zero or low values.[](https://www.eecs.harvard.edu/~michaelm/postscripts/esa2006b.pdf)[](https://pages.cs.wisc.edu/~jussara/papers/00ton.pdf)

However, the space efficiency of counting Bloom filters is significantly reduced compared to standard Bloom filters, requiring approximately 4 to 8 times more space due to the counter bits: for instance, about 38 bits per item for a 1% false positive rate with 4-bit counters, versus 9–10 bits for the non-deletable version. This overhead arises because each of the m array positions must accommodate a [counter](/page/Counter) large enough to handle expected increments without frequent [overflow](/page/Overflow), often necessitating 4–8 bits per slot depending on the desired multiplicity range and load factor.[](https://www.usenix.org/system/files/nsdip13-paper6.pdf)[](https://www.eecs.harvard.edu/~michaelm/postscripts/esa2006b.pdf)

In contrast, the cuckoo filter supports deletions while maintaining space efficiency comparable to a standard [Bloom filter](/page/Bloom_filter) (around 9 bits per item for a 1% [false positive rate](/page/False_positive_rate)) by storing fixed-size [fingerprints](/page/Fingerprint)—short hashes of the elements—in a [hash table](/page/Hash_table) with multiple entries per [bucket](/page/Bucket), and using a [cuckoo](/page/Cuckoo) relocation strategy during insertions to resolve conflicts without counters. This approach avoids the per-position counter overhead entirely: deletions simply remove the matching [fingerprint](/page/Fingerprint) from one of its candidate buckets, with no relocation required. As a result, cuckoo filters achieve deletable approximate membership testing with lower [space](/page/Space) usage than counting Bloom filters, particularly when [false positive rates](/page/False_positive_rate) below 3% are targeted.[](https://www.usenix.org/system/files/nsdip13-paper6.pdf)

A key trade-off is that counting Bloom filters can support multiplicity by tracking counts greater than one per element, enabling approximate frequency queries, whereas cuckoo filters are designed for sets with at most one occurrence per element and do not maintain counts, though they handle single-instance deletions accurately without false negatives. For space-constrained applications requiring only single deletions without multiplicity support, such as dynamic set membership in networking or caching, cuckoo filters are generally preferred over counting Bloom filters due to their superior space efficiency and performance.[](https://www.usenix.org/system/files/nsdip13-paper6.pdf)
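The space gap discussed in this section follows directly from the per-slot counter width; the quick comparison below (using the standard Bloom bits-per-item formula and the fingerprint sizing rule, with assumed parameters $b = 4$, $\alpha = 0.95$, and 4-bit counters) reproduces the figures quoted above:

```python
import math

eps = 0.01                                  # 1% target false positive rate
bloom_bits = 1.44 * math.log2(1 / eps)      # standard Bloom, no deletions
cbf_bits = 4 * bloom_bits                   # 4-bit counters replace each bit
cuckoo_bits = math.ceil(math.log2(2 * 4 / eps)) / 0.95   # b = 4, alpha = 0.95

print(f"Bloom:          {bloom_bits:.1f} bits/item")   # ~9.6
print(f"Counting Bloom: {cbf_bits:.1f} bits/item")     # ~38.3
print(f"Cuckoo:         {cuckoo_bits:.1f} bits/item")  # ~10.5
```

The counting Bloom filter pays roughly a 4x penalty over the plain Bloom filter to gain deletions, while the cuckoo filter gains them for only about one extra bit per item at this error rate.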

### Quotient Filters

A quotient filter is a probabilistic data structure that represents a set using a single contiguous [hash table](/page/Hash_table), where each entry consists of a [quotient](/page/Quotient) (determining the slot position) and a [remainder](/page/Remainder) (serving as a short [fingerprint](/page/Fingerprint) of the element).[](https://arxiv.org/abs/1208.0290) For insertion, the hash of an element is split into the [quotient](/page/Quotient), which selects the initial slot, and the [remainder](/page/Remainder), which is stored there if the slot is empty; collisions are resolved by shifting subsequent entries to the right within a contiguous run of occupied slots.[](https://arxiv.org/abs/1208.0290) Deletions are supported by marking slots as empty and shifting entries leftward to maintain contiguity, avoiding the need for counters or additional space per element.[](https://arxiv.org/abs/1208.0290)

In contrast to the cuckoo filter's relocation mechanism, which uses multiple [hash](/page/Hash) functions to "kick out" existing entries to alternative slots during insertion, the quotient filter relies on sequential shifting within the table.[](https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf) This makes cuckoo filter insertions generally faster, as relocations are localized to candidate buckets, but they carry a risk of failure requiring table resizing if maximum relocations are exceeded; quotient filter shifts, while more predictable in avoiding such failures, can become slower due to potentially long chains of occupied slots at high load factors.[](https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf) Lookup in quotient filters involves scanning from the computed slot until a matching [remainder](/page/Remainder) is found or an empty slot/end-of-run marker is encountered, which is efficient at low to moderate loads but degrades beyond 75% occupancy.[](https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf)

Both structures achieve high space efficiency, requiring approximately 1-2 bits per item beyond the [fingerprint](/page/Fingerprint) size for low false positive rates, though the cuckoo filter maintains better [performance](/page/Performance) and [stability](/page/Stability) at load factors up to 95% without frequent resizing, while quotient filters may need 10-25% more space for [metadata](/page/Metadata) to handle shifts and runs.[](https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf) The quotient filter's design also enables straightforward merging of multiple filters by concatenating tables, a feature less native to cuckoo filters.[](https://arxiv.org/abs/1208.0290)

Cuckoo filters are preferable for high-throughput applications like [network packet](/page/Network_packet) processing, where insertion and lookup speed are critical and occasional resizes are tolerable.[](https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf) Quotient filters suit scenarios requiring ordered operations, such as [streaming data](/page/Streaming_data) or flash-optimized storage, where predictable shifting and dynamic resizing provide reliability without relocation overhead.[](https://arxiv.org/abs/1208.0290)

## Applications and Implementations

### Real-World Use Cases

Cuckoo filters are employed in [network packet](/page/Network_packet) filtering to detect heavy hitters or blacklist IP addresses without storing exact data, enabling efficient processing of high-volume traffic while minimizing memory usage. For instance, they facilitate [IP packet](/page/IP_packet) filtering by matching packet headers against predefined rules, outperforming traditional [hash](/page/Hash) tables in space efficiency for [security](/page/Security) monitoring tasks.[](https://beei.org/index.php/EEI/article/view/4202) In [software-defined networking](/page/Software-defined_networking) (SDN), optimized cuckoo filters support [network](/page/Network) caching by approximating membership queries for [multicast](/page/Multicast) and forwarding decisions, reducing overhead in distributed environments.[](https://par.nsf.gov/servlets/purl/10208434)

In caching systems, cuckoo filters provide approximate membership testing for large sets in LRU caches, allowing systems to evict items dynamically with support for deletions, which enhances performance over static structures like Bloom filters. This is particularly valuable in high-speed key-value stores and [web cache](/page/Web_cache) sharing protocols, where they offer better space efficiency than Bloom filters for false positive rates below 3%.[](https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf)

For database indexing, cuckoo filters optimize query processing by serving as lightweight secondary indexes, representing relationships between keys and data partitions to accelerate lookups in columnar stores. The RedisBloom module, integrated since version 2.2 in December 2019, incorporates cuckoo filters for approximate set membership in applications like [fraud](/page/Fraud) detection and [targeted advertising](/page/Targeted_advertising), enabling efficient filtering without full data storage.[](https://redis.io/docs/latest/operate/oss_and_stack/stack-with-enterprise/release-notes/redisbloom/)[](https://www.vldb.org/pvldb/vol13/p3559-kipf.pdf)

In web crawling, cuckoo filters help avoid revisiting URLs by maintaining a probabilistic set of crawled links, supporting dynamic updates to handle evolving crawl frontiers more effectively than non-deletable alternatives. This application leverages their space efficiency for managing billions of URLs in distributed [crawl](/page/Crawl)ers, similar to [Bloom filter](/page/Bloom_filter) uses but with added deletion capability for refined control.[](https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf)

### Software Implementations

Cuckoo filters have been integrated into [Redis](/page/Redis) through the RedisBloom module, providing native-like support for probabilistic set membership queries since version 2.2 of the module in December 2019, which aligns with broader adoption around [Redis](/page/Redis) 6.0 in 2020.[](https://redis.io/docs/latest/operate/oss_and_stack/stack-with-enterprise/release-notes/redisbloom/) This integration includes dedicated commands such as CF.RESERVE for initializing a filter with specified capacity and bucket size, CF.ADD for inserting items, CF.DEL for deletions, and CF.EXISTS for membership checks, enabling efficient use in distributed caching and database scenarios.[](https://redis.io/docs/latest/commands/cf.reserve/)[](https://redis.io/docs/latest/commands/cf.add/)

Several open-source libraries implement cuckoo filters across programming languages, facilitating their use in diverse applications. In [Python](/page/Python), the cuckoofilter package offers a thread-safe implementation supporting insertion, deletion, and lookup operations with configurable false positive rates.[](https://pypi.org/project/cuckoofilter/) For [Haskell](/page/Haskell), the cuckoo-filter package on Hackage provides both pure and impure variants of the [data structure](/page/Data_structure), emphasizing [functional programming](/page/Functional_programming) paradigms with support for approximate set membership and low false positive probabilities.[](https://hackage.haskell.org/package/cuckoo-filter) In [Java](/page/Java), while Google's [Guava](/page/Guava) library natively supports Bloom filters, third-party extensions like Guava-Probably and dedicated libraries such as CuckooFilter4J provide cuckoo filter functionality with interfaces similar to Guava's probabilistic structures, including dynamic resizing and high-performance hashing.[](http://bdupras.github.io/guava-probably/releases/snapshot/api/docs/com/duprasville/guava/probably/CuckooFilter.html)[](https://github.com/MGunlogson/CuckooFilter4J)

Performance optimizations, such as SIMD instructions for accelerated hashing and bucket lookups, appear in advanced implementations to enhance throughput on modern hardware. For instance, evaluations on platforms like Intel Skylake-X demonstrate that SIMD-optimized cuckoo filters can achieve higher lookup speeds compared to unoptimized variants, particularly at high load factors.[](https://www.vldb.org/pvldb/vol12/p502-lang.pdf)

A BSD-licensed [reference implementation](/page/Reference_implementation) in C++ was released by the original authors, Bin Fan, David G. Andersen, Michael Kaminsky, and Michael D. Mitzenmacher, consisting of approximately 500 lines of code and serving as a [baseline](/page/Baseline) for further development.[](https://www.cs.cmu.edu/~binfan/papers/login_cuckoofilter.pdf)[](https://github.com/efficient/cuckoofilter) It has also informed integration discussions in [Apache](/page/Apache) projects such as Kvrocks, which targets efficient key-value storage.[](https://lists.apache.org/thread/4c4cb1xjz8872bwmhh7r8k59jz05byrl)

## Limitations and Extensions

### Drawbacks

Cuckoo filters encounter insertion failures when operating at high load factors, typically exceeding 95% for common configurations with bucket size $ b = 4 $, at which point no suitable vacant position can be found after a limited number of relocation attempts, necessitating table resizing or [eviction](/page/Eviction) of existing items.[](https://www.cs.cmu.edu/~binfan/papers/conext14_cuckoofilter.pdf)

The structure does not natively track item counts: because all copies of an item map to the same two [bucket](/page/Bucket)s, each holding $b$ entries, at most $2b$ duplicates of one item can be stored before overflow occurs. This confines its applicability to approximate set membership testing without quantitative frequency information.[](https://www.cs.cmu.edu/~binfan/papers/conext14_cuckoofilter.pdf)

Although average insertion time remains efficient, worst-case insertions can trigger long chains of relocations during the cuckoo displacement process. Such chains become more probable as the load factor approaches its maximum, though they occur infrequently in practice.[](https://www.cs.cmu.edu/~binfan/papers/conext14_cuckoofilter.pdf)

Deletion is safe only for items that were actually inserted: attempting to remove a non-inserted item may evict a [fingerprint](/page/Fingerprint) belonging to a different, present item, producing false negatives. Legitimate deletions leave the false positive probability unchanged.[](https://www.cs.cmu.edu/~binfan/papers/conext14_cuckoofilter.pdf)
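The hazard can be demonstrated with a toy model (a hypothetical single-bucket sketch with 8-bit fingerprints, not the full two-bucket filter): brute-force two distinct keys whose fingerprint and bucket coincide, insert only the first, then delete the second.

```python
import hashlib

NUM_BUCKETS = 16  # deliberately tiny so a collision is found quickly

def signature(item):
    """(fingerprint, bucket) pair a tiny cuckoo-style filter would store.
    Toy parameters; only one candidate bucket is modeled for brevity."""
    h = hashlib.sha1(item.encode()).digest()
    return h[0] % 255 + 1, int.from_bytes(h[4:8], "big") % NUM_BUCKETS

# Find two distinct keys that collide on (fingerprint, bucket).
seen_sigs = {}
a = b = None
i = 0
while b is None:
    key = f"url-{i}"
    sig = signature(key)
    if sig in seen_sigs:
        a, b = seen_sigs[sig], key
    else:
        seen_sigs[sig] = key
    i += 1

bucket = [signature(a)[0]]          # filter state after inserting only `a`
bucket.remove(signature(b)[0])      # deleting the never-inserted `b` ...
print(signature(a)[0] in bucket)    # False: `a` now yields a false negative
```

Since fingerprints are the only stored evidence of membership, the filter cannot tell which original key a fingerprint belongs to, which is why deletions must be restricted to previously inserted items.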

### Variants and Improvements

Several variants of the cuckoo filter have been proposed to address limitations such as fixed capacity, lack of support for item multiplicity, and performance bottlenecks in lookups. One prominent extension is the dynamic cuckoo filter (DCF), which enables resizing and reliable deletions by organizing the structure into a [hierarchy](/page/Hierarchy) of smaller cuckoo filters that can be extended as needed. This approach allows the filter to adapt to changing set sizes without rebuilding from scratch, achieving up to 20% space savings compared to naive resizing methods while maintaining low false positive rates.[](https://cis.temple.edu/~wu/research/publications/Publication_files/Chen_ICNP_2017.pdf)

To support counting the multiplicity of items, the counting cuckoo filter (CCF) associates a counter with each fingerprint slot, allowing approximate tracking of duplicates at the cost of increased space overhead—typically 1.5 to 2 times that of the standard cuckoo filter. The CCF builds on the [cuckoo hashing](/page/Cuckoo_hashing) mechanism by enabling operations like incrementing and decrementing counts during insertions and deletions, which is particularly useful for [multiset](/page/Multiset) [synchronization](/page/Synchronization) in distributed systems. Evaluations show that CCFs achieve false positive rates under 1% for load factors up to 95%, with comparable insertion throughput to counting Bloom filters but improved space efficiency and [synchronization](/page/Synchronization) accuracy.[](https://arxiv.org/abs/2003.03801)
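The counter idea behind the CCF can be sketched as follows. This toy layout is hypothetical and heavily simplified: it uses a single candidate bucket, no relocation, and unbounded Python integers as counters, only to illustrate how fingerprints gain associated counts.

```python
from collections import Counter
import hashlib

class CountingBuckets:
    """Toy sketch of the counting idea: each bucket maps
    fingerprint -> count instead of storing bare fingerprints."""

    def __init__(self, num_buckets=256):
        self.num_buckets = num_buckets
        self.buckets = [Counter() for _ in range(num_buckets)]

    def _fp_and_index(self, item):
        h = hashlib.sha1(item.encode()).digest()
        fp = h[0] % 255 + 1                                   # 8-bit fingerprint
        return fp, int.from_bytes(h[4:8], "big") % self.num_buckets

    def add(self, item):
        fp, i = self._fp_and_index(item)
        self.buckets[i][fp] += 1      # increment instead of inserting a duplicate

    def remove(self, item):
        fp, i = self._fp_and_index(item)
        if self.buckets[i][fp] > 0:
            self.buckets[i][fp] -= 1  # decrement instead of evicting

    def count(self, item):
        fp, i = self._fp_and_index(item)
        return self.buckets[i][fp]    # approximate: fingerprint collisions overcount

cb = CountingBuckets()
cb.add("key")
cb.add("key")
print(cb.count("key"))  # 2
```

The counters are what make multiset synchronization possible: two nodes can compare per-fingerprint counts rather than mere presence bits, at the space cost noted above.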

The blocked cuckoo filter, as exemplified by the additive and subtractive cuckoo filter (ASCF), uses a blocked compact cuckoo hash table structure to improve space efficiency through additive and subtractive hashing techniques. ASCF matches the lookup throughput of standard cuckoo filters while reducing space by up to 1.9x for equivalent false positive rates. This variant is especially effective in high-throughput scenarios.[](https://yangtonghome.github.io/uploads/Additive_and_Subtractive_Cuckoo_Filters.pdf)

Other enhancements focus on lookup efficiency and on reducing false positives. The semi-sorted variant, described in the original paper, keeps the fingerprints within each bucket in sorted order and encodes them compactly, saving roughly one bit per item. Adaptive cuckoo filters adjust stored fingerprints to eliminate detected false positives on repeated queries, dynamically lowering the false positive rate, potentially by orders of magnitude in skewed workloads, while maintaining query performance similar to the original design. These improvements, detailed in work from 2017, have influenced subsequent probabilistic data structures for networking and storage applications.[](https://arxiv.org/abs/1704.06818)

## References

  1. Cuckoo Filter: Practically Better Than Bloom (PDF)
  2. Cuckoo Filter: Simplification and Analysis (arXiv, April 2016)
  3. Morton Filters: Faster, Space-Efficient Cuckoo Filters via Biasing ... (PDF)
  4. Cuckoo Filter: Practically Better Than Bloom (ACM Digital Library)
  5. efficient/cuckoofilter (GitHub)
  6. RedisConf18 - Implementing a New Data Structure for Redis
  7. RedisBloom release notes (Redis Docs)
  8. Cuckoo Filter: Practically Better Than Bloom (PDF)
  9. Performance-Optimal Filtering: Bloom Overtakes Cuckoo at High ... (PDF)
  10. Cuckoo Hashing (BRICS) (PDF)
  11. LNCS 4168 - An Improved Construction for Counting Bloom Filters (PDF)
  12. Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol (PDF)
  13. The Cuckoo Filter: It's Better Than Bloom (USENIX)
  14. [1208.0290] Don't Thrash: How to Cache Your Hash on Flash (arXiv, August 2012)
  15. Cuckoo filter based IP packet filtering using M-tree (Abdulhassan)
  16. Optimized Cuckoo Filters for Efficient Distributed SDN and NFV ... (PDF, January 2021)
  17. Cuckoo Index: A Lightweight Secondary Index Structure (PDF)
  18. CF.RESERVE (Redis Docs)
  19. CF.ADD (Redis Docs)
  20. cuckoofilter (PyPI)
  21. Pure and impure Cuckoo Filter (Hackage)
  22. CuckooFilter (Guava-Probably: Probabilistic Data Structures API docs)
  23. High performance Java implementation of a Cuckoo filter (GitHub)
  24. Cuckoo Filter: Better Than Bloom (CMU School of Computer Science)
  25. Re: [D] Cuckoo Filter Implementation Details (Apache Kvrocks discussion)
  26. The Dynamic Cuckoo Filter (Temple CIS) (PDF)
  27. [2003.03801] Multiset Synchronization with Counting Cuckoo Filters (arXiv, March 2020)
  28. Additive and Subtractive Cuckoo Filters (Tong Yang) (PDF)
  29. [1704.06818] Adaptive Cuckoo Filters (arXiv, April 2017)