
Hash function

A hash function is a mathematical function that takes an input (or key) of arbitrary size and produces a fixed-size output value, typically an integer used for indexing or as a compact representation of the input. In computer science, hash functions are fundamental to data structures such as hash tables, where they map keys to array indices to enable efficient average-case time complexity of O(1) for operations like insertion, deletion, and search. Good hash functions must be deterministic (producing the same output for the same input), computationally efficient, and designed to distribute outputs uniformly across the possible range to minimize collisions, where distinct inputs map to the same output. Common techniques for constructing hash functions include modular arithmetic for integers and polynomial rolling hashes for strings, often using a prime modulus to enhance uniformity. Beyond general-purpose hashing, cryptographic hash functions form a specialized class with enhanced security properties, mapping inputs to fixed-length bit strings while being computationally infeasible to reverse or find collisions under standard computational assumptions. These properties include preimage resistance (hard to find an input producing a given output), second preimage resistance (hard to find a different input producing the same output as a given input), and collision resistance (hard to find any two distinct inputs with the same output). Approved cryptographic hash functions, such as those in the SHA-2 family (e.g., SHA-256 producing 256-bit outputs), are standardized by NIST and widely used in protocols for digital signatures, password storage, and integrity verification. Hash functions also play critical roles in applications like data deduplication, caching, and message authentication, with ongoing research focusing on quantum-resistant variants to address emerging computational threats; as of August 2024, NIST has standardized post-quantum cryptographic algorithms including hash-based digital signature schemes such as SPHINCS+ (standardized as SLH-DSA).

Introduction

Definition

A hash function h is a mathematical function that maps data of arbitrary size to values of a fixed size, typically expressed as h: D \to R, where D represents the domain of possible inputs, such as binary strings of any length \{0,1\}^*, and R is a fixed range, often the set of k-bit strings \{0,1\}^k or integers in the range [0, 2^k - 1]. This mapping compresses the input into a shorter output, known as the hash value or digest, whose length is predetermined by the function and independent of the input size. Key characteristics of a hash function include its fixed output size, referred to as the hash length, which ensures consistent digest lengths regardless of input variability. The function is a many-to-one mapping, meaning that the original input cannot be uniquely reconstructed from the hash value alone, as multiple distinct inputs may produce the same output; this distinguishes it from reversible transformations. The digest serves as a compact representation or "fingerprint" of the input data, capturing essential identifying features in a succinct form. A simple illustrative example for integer inputs is the modular hash function h(x) = x \mod m, where m is a positive integer defining the output range size. This computes the remainder of x divided by m, yielding a value in [0, m-1], effectively folding larger integers into a small fixed range. Unlike encoding, which transforms data into a different format while preserving all information for exact reversal, or compression, which reduces size but often allows reconstruction of the original (in lossless cases), a hash function does not assume invertibility and prioritizes irreversible summarization over data preservation. Hash functions appear briefly here in connection with structures like hash tables for efficient key indexing; detailed uses are covered in the Applications section.

Motivation and Role in Computing

Hash functions serve as a cornerstone in computing by enabling efficient data access in scenarios involving vast amounts of information. Their primary motivation lies in facilitating rapid lookup, indexing, and retrieval for large datasets, where traditional search methods like linear scanning would be prohibitively slow. By transforming variable-length inputs into compact representations, hash functions allow systems to map complex keys directly to accessible storage locations, significantly reducing the time required for operations on extensive collections of records. In algorithmic contexts, hash functions play a pivotal role by supporting data structures that achieve average O(1) time complexity for searches, insertions, and deletions through direct mapping of keys to array indices. This efficiency stems from the ability to approximate a uniform distribution of outputs, ensuring that most operations avoid exhaustive traversals and instead perform constant-time probes. Early applications highlighted this utility, such as their use in compilers for managing symbol tables to quickly resolve variable names during compilation, a practice dating back to proposals in the 1950s. The benefits of hash functions include substantial space efficiency, as they produce fixed-size outputs regardless of input length, allowing for compact indexing without storing full data keys. Additionally, their computational simplicity (often involving basic arithmetic or bitwise operations) ensures high speed even on resource-constrained systems. However, this efficiency introduces trade-offs, such as the potential for collisions where distinct inputs map to the same output, necessitating strategies to resolve conflicts and maintain performance.

Fundamental Properties

Determinism and Defined Output Range

A hash function is fundamentally deterministic, meaning that it maps the same input to the identical output every time it is computed, without any influence from randomization or external factors in non-probabilistic settings. This property ensures that the function behaves as a fixed mathematical mapping, allowing consistent results across multiple invocations, which is a core requirement for reliable computational processes. The defined output range of a hash function specifies a fixed size for the result, regardless of the variable length of the input data. Mathematically, this is expressed as |h(x)| = k for all inputs x, where k is a constant bit length and h denotes the hash function. For instance, the SHA-256 algorithm produces outputs of exactly 256 bits. This fixed-length constraint contrasts with the arbitrary input size, compressing or projecting data into a uniform domain, often represented as an integer in the range [0, 2^k - 1]. These attributes enable key implications for practical use, such as reproducibility in verification and efficient comparison operations. Determinism guarantees that hash values can be recomputed identically for verification purposes, supporting applications like duplicate detection and integrity checks. The fixed output range facilitates standardized storage allocation and indexing in implementations, enhancing performance in hash-based structures. In edge cases, hash functions handle empty inputs deterministically; for example, the SHA-256 hash of an empty message (zero-length input) is e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855. For null or missing values, practical implementations often treat them equivalently to empty inputs or use predefined representations to maintain the fixed output size, ensuring no exceptions disrupt the defined range.
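
Determinism and the fixed output length can be demonstrated directly; the following is a minimal sketch, assuming OpenSSL is available (link with -lcrypto), that recomputes the SHA-256 digest of the empty message using the library's SHA256 convenience function. Any conforming implementation prints the same 64 hex digits quoted above.

#include <stdio.h>
#include <openssl/sha.h>

int main(void) {
    /* Hash the empty (zero-length) message; determinism means every run
       and every conforming implementation yields the same 32-byte digest. */
    unsigned char digest[SHA256_DIGEST_LENGTH];
    SHA256((const unsigned char *)"", 0, digest);
    for (int i = 0; i < SHA256_DIGEST_LENGTH; i++)
        printf("%02x", digest[i]);
    printf("\n");   /* expected: e3b0c442...7852b855 */
    return 0;
}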

Collision Resistance and Uniformity

A collision in a hash function occurs when two distinct inputs x \neq y produce the same output, i.e., h(x) = h(y). Due to the pigeonhole principle, collisions are inevitable for any hash function mapping an unbounded or large input domain to a finite output range of size 2^n, as hashing more than 2^n distinct inputs guarantees at least one pair sharing an output. While collisions cannot be eliminated, their properties are critical for hash function design, particularly in ensuring that they are computationally difficult to exploit. Collision resistance encompasses several related security notions, each addressing different ways adversaries might find colliding inputs. First-preimage resistance requires that, given an output value h(x), it is computationally infeasible to find any preimage x' such that h(x') = h(x). Second-preimage resistance stipulates that, given an input x and its hash h(x), finding a distinct x' \neq x with h(x') = h(x) is hard. Collision resistance, the strongest of these, demands that discovering any pair of distinct inputs x \neq y with h(x) = h(y) is infeasible, and it implies the other two properties under certain conditions. These levels ensure that hash functions behave like random oracles in practice, thwarting attacks that rely on predictable mappings. Beyond resistance to deliberate collision-finding, good hash functions exhibit uniformity, where outputs are distributed as evenly as possible across the output range, approximating a uniform random mapping. This property minimizes clustering in applications like hash tables and enhances unpredictability. Statistical tests, such as the chi-squared test, evaluate uniformity by comparing observed output frequencies against expected uniform probabilities across bins; a low chi-squared statistic indicates an even distribution. The avalanche effect complements these properties by ensuring sensitivity to input changes: a single-bit alteration in the input should, on average, flip approximately 50% of the output bits, propagating changes rapidly through the function. This diffusion property, observed in well-designed hashes such as the SHA-2 and SHA-3 families, makes outputs appear uncorrelated with inputs and resists differential attacks.
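
The avalanche criterion can be checked empirically by flipping individual input bits and counting how many output bits change. The sketch below is an illustration rather than a standard tool: it uses a simple FNV-style byte mixer as a stand-in for the hash under test and reports the average fraction of flipped output bits over all single-bit perturbations of one sample key; values near 0.5 indicate good diffusion.

#include <stdint.h>
#include <stdio.h>
#include <stddef.h>

/* Hash under test: a 32-bit FNV-style byte mixer (any 32-bit hash could be substituted). */
static uint32_t hash32(const unsigned char *data, size_t len) {
    uint32_t h = 2166136261U;
    for (size_t i = 0; i < len; i++) { h ^= data[i]; h *= 16777619U; }
    return h;
}

int main(void) {
    unsigned char key[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    uint32_t base = hash32(key, sizeof key);
    unsigned long flipped = 0, trials = 0;
    for (size_t byte = 0; byte < sizeof key; byte++) {
        for (int bit = 0; bit < 8; bit++) {
            key[byte] ^= (unsigned char)(1u << bit);        /* flip one input bit */
            uint32_t diff = base ^ hash32(key, sizeof key); /* differing output bits */
            key[byte] ^= (unsigned char)(1u << bit);        /* restore the input */
            for (; diff; diff >>= 1) flipped += diff & 1u;  /* count set bits */
            trials++;
        }
    }
    printf("average fraction of output bits flipped: %.3f\n", flipped / (32.0 * trials));
    return 0;
}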

Efficiency and Performance Metrics

The computational efficiency of hash functions is a critical factor in their practical deployment, as they must process inputs rapidly while maintaining reliability in resource-constrained environments. The time complexity for computing a hash is typically linear in the input size, O(n) where n represents the length of the input in bits or bytes, due to the need to iterate over each element of the input data. This linearity arises from sequential processing steps, such as mixing and compression, but implementations leverage fast bitwise operations (shifts, XORs, and rotations) to minimize constant factors and achieve high speeds on commodity hardware. Space efficiency is another hallmark of hash functions, requiring only minimal additional memory during computation, typically O(1) beyond the input buffer and a fixed-size output for the hash digest. This low overhead stems from the stateless nature of most hash algorithms, which avoid storing intermediate states across multiple invocations unless explicitly designed for streaming inputs, making them suitable for memory-limited applications like embedded systems. Performance is commonly evaluated using metrics such as throughput (e.g., gigabytes processed per second) and latency (time for a single computation). For instance, on modern CPUs, non-cryptographic hashes like xxHash can achieve throughputs exceeding 10 GB/s for large inputs, while cryptographic ones like SHA-256 typically range from 0.5 to 2 GB/s depending on input size and hardware. Latency varies with input length but is often sub-microsecond for short messages, influenced by factors such as cache efficiency and hardware support. Hardware acceleration further boosts these metrics; Intel's SHA Extensions, introduced in 2013, provide dedicated instructions for SHA-1 and SHA-256, yielding significant speedup over software implementations by optimizing round computations. A key trade-off exists between non-cryptographic and cryptographic hash functions: the former emphasize speed for general-purpose tasks like hash table indexing, often using simpler mixing to attain 5-10 times higher throughput than cryptographic variants, while the latter incorporate additional operations for security properties, increasing computational cost. This balance ensures non-cryptographic hashes excel in high-volume scenarios without needing provable resistance to attacks, whereas cryptographic ones prioritize security at the expense of speed.

Types of Hash Functions

Non-Cryptographic Hash Functions

Non-cryptographic hash functions prioritize computational speed and even distribution to achieve low collision probabilities for typical inputs, without incorporating mechanisms to withstand adversarial attacks. These functions are optimized for performance in resource-constrained environments, often producing fixed-size outputs through simple operations like multiplication, XOR, and bit shifting, enabling rapid processing of large datasets. Unlike their cryptographic counterparts, they make no guarantees against deliberate collision generation, focusing instead on efficiency for non-security-critical tasks. Prominent examples include the Fowler–Noll–Vo (FNV) hash, MurmurHash, and CityHash. FNV, developed by Glenn Fowler, Landon Curt Noll, and Phong Vo, is a simple iterative algorithm emphasizing good dispersion for nearly identical inputs. The widely used FNV-1a variant initializes the hash to an offset basis and processes each input byte with an XOR followed by multiplication by a prime:
#include <stdint.h>
#include <stddef.h>
uint32_t fnv1a_32(const unsigned char *data, size_t len) {
    uint32_t hash = 2166136261U;   /* FNV offset basis (32-bit) */
    for (size_t i = 0; i < len; i++) {
        hash ^= data[i];           /* XOR in the next byte */
        hash *= 16777619U;         /* multiply by the 32-bit FNV prime */
    }
    return hash;
}
MurmurHash, created by Austin Appleby in 2008, employs a multiply-and-rotate mixing function with repeated avalanche steps to ensure high-quality output distribution, making it suitable for general-purpose lookups. CityHash, released by Google in 2011, targets string hashing with variants producing 64- or 128-bit values, optimized for high throughput on modern processors. These functions find primary use in internal implementations of data structures such as hash tables and in caching mechanisms to enable quick key-value mappings and lookups. However, their lack of resistance to targeted inputs renders them susceptible to hash flooding attacks, where attackers generate collisions to degrade performance, potentially enabling denial-of-service (DoS) scenarios like resource exhaustion in web servers.

Cryptographic Hash Functions

Cryptographic hash functions are specialized hash functions engineered to provide security guarantees essential for protecting data integrity, authenticity, and non-repudiation in cryptographic protocols. Unlike general-purpose hash functions, they are designed under the assumption of computational hardness, making them suitable as building blocks for digital signatures, message authentication, and blockchain technologies. These functions aim to behave as ideal one-way functions, where computing the inverse is infeasible for adversaries with limited resources. The core security properties of cryptographic hash functions are preimage resistance, second-preimage resistance, and collision resistance. Preimage resistance ensures that, given a hash output h, it is computationally infeasible to find any input m such that H(m) = h. Second-preimage resistance requires that, for a given input m, finding a distinct m' with H(m') = H(m) is infeasible. Collision resistance is the strongest property, demanding that discovering any two distinct inputs m and m' where H(m) = H(m') is computationally hard; this implies second-preimage resistance, which in turn implies preimage resistance. These properties collectively prevent attacks like forgery or reversal, with collision resistance typically requiring an output size of at least 256 bits for current security levels. Prominent standards for cryptographic hash functions include the Secure Hash Algorithm (SHA) family, developed and published by the National Institute of Standards and Technology (NIST). SHA-1, producing 160-bit outputs, was deprecated by NIST in 2011 due to collision vulnerabilities and is being phased out for all security applications, with full retirement mandated by 2030. SHA-2 variants, such as SHA-256 with 256-bit outputs, remain widely adopted for their proven resistance to known attacks and are recommended for ongoing use in federal systems. SHA-3, standardized in 2015, offers a parallel family with variants like SHA3-256, providing equivalent security to SHA-2 but with a different internal structure to mitigate potential weaknesses in older designs. For high-performance needs, BLAKE3, introduced in 2020, delivers cryptographic security comparable to SHA-2 while achieving significantly higher throughput through parallel processing, making it suitable for both high-volume and resource-constrained environments. Common construction techniques for cryptographic hash functions include the Merkle–Damgård construction and the sponge construction. The Merkle–Damgård construction processes variable-length inputs by iteratively applying a compression function to fixed-size blocks, with a finalization step to produce the output; it underpins MD5 (collision attacks demonstrated in 2004), SHA-1 (practical collisions in 2017), and the still-secure SHA-2 family. However, its vulnerability to length-extension attacks has led to its avoidance in newer designs. In contrast, the sponge construction, used in SHA-3 (based on the Keccak permutation), alternates between absorbing input into an internal state (preserving a hidden capacity) and squeezing output bits, enabling flexible output lengths and resistance to length-extension without relying on a dedicated compression function. This approach provides provable security bounds when the underlying permutation behaves like a random permutation. As of 2025, developments in post-quantum cryptography emphasize hash functions resilient to quantum attacks, with NIST incorporating SHA-3's extendable-output variants like SHAKE-256 into standards for signature schemes such as SLH-DSA and FN-DSA. These functions retain adequate security against quantum search (Grover's algorithm) by using enlarged output sizes for equivalent security levels, ensuring their viability in quantum-safe protocols.
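
To make the iterative structure concrete, the sketch below applies the Merkle–Damgård pattern with a deliberately toy compression function: the message is split into fixed-size blocks, each block is folded into a running chaining value, and a final block encoding the message length is processed last. This is an illustration of the skeleton only; real designs such as SHA-256 use the same outline with a cryptographically strong compression function and standardized padding.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOCK 8   /* toy block size in bytes */

/* Toy compression function: folds an 8-byte block into a 64-bit chaining value.
   Purely illustrative; it has none of the strength of a real compression function. */
static uint64_t compress(uint64_t chain, const unsigned char *block) {
    for (int i = 0; i < BLOCK; i++)
        chain = (chain ^ block[i]) * 0x100000001b3ULL;   /* FNV-style mixing */
    return chain ^ (chain >> 29);
}

/* Merkle–Damgård iteration: process full blocks, then a zero-padded final block,
   then a block encoding the message length (a simplified length-strengthening step). */
static uint64_t md_hash(const unsigned char *msg, size_t len) {
    uint64_t chain = 0xcbf29ce484222325ULL;              /* fixed initialization value */
    unsigned char block[BLOCK];
    size_t i = 0;
    for (; i + BLOCK <= len; i += BLOCK)
        chain = compress(chain, msg + i);
    memset(block, 0, BLOCK);
    memcpy(block, msg + i, len - i);                     /* last (possibly empty) partial block */
    chain = compress(chain, block);
    memset(block, 0, BLOCK);
    memcpy(block, &len, sizeof len < (size_t)BLOCK ? sizeof len : (size_t)BLOCK);
    return compress(chain, block);
}

int main(void) {
    const char *m = "message to be hashed";
    printf("%016llx\n", (unsigned long long)md_hash((const unsigned char *)m, strlen(m)));
    return 0;
}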

Applications

Data Structures and Retrieval

Hash tables are fundamental data structures that utilize hash functions to enable efficient storage and retrieval of key-value pairs. In a hash table, an array of fixed size m serves as the underlying storage, where a hash function h maps each key k to an index h(k) in the range [0, m-1], allowing direct access to the corresponding slot. This mapping supports average-case constant-time operations for insertion, deletion, and search, making hash tables ideal for applications requiring fast lookups, such as databases and caches. The efficiency depends on the hash function's ability to distribute keys uniformly across slots, minimizing clustering. The load factor α, defined as the ratio of the number of stored items n to the number of slots m (α = n/m), measures the table's utilization and influences performance. Keeping α below 0.7–0.8 typically ensures low collision rates and maintains efficient operation, as higher values increase the probability of conflicts. Collisions occur when distinct keys hash to the same slot, an inevitability addressed through resolution strategies. With simple uniform hashing, the expected number of collisions remains proportional to α, supporting predictable performance. Collision resolution techniques fall into two primary categories: separate chaining and open addressing. In separate chaining, each array slot points to a linked list (or other structure) holding all keys that hash to that slot; insertions append to the list, and searches traverse it until the key is found or the end is reached. This method tolerates high load factors but incurs overhead from pointer traversals. Open addressing resolves collisions by probing alternative slots within the array for an empty one during insertion, with retrieval following the same probe sequence. Linear probing, a common variant, uses the sequence h(k, i) = (h(k) + i) mod m for i = 0, 1, 2, ..., where i is the probe number, though it can lead to primary clustering. Variants of hash tables address limitations in static sizing or collision handling. Cuckoo hashing employs two hash functions and two tables (or arrays), relocating conflicting keys via a "cuckoo" eviction process to achieve worst-case O(1) lookups with high probability, at the cost of potential insertion loops resolved by rehashing. Extendible hashing dynamically adjusts the effective table size by using a directory of pointers to buckets, prefixed by a varying number of hash bits, allowing growth without full rehashing and supporting O(1) access in practice for dynamic files. Overall, with a well-designed hash function providing good uniformity, hash tables achieve average O(1) time for core operations, outperforming tree-based structures like balanced search trees for unordered access patterns.
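
As an illustration of open addressing, the following sketch (an author's example, not drawn from any particular library) implements a small fixed-size table of non-negative integer keys with linear probing: insertion probes h(k), h(k)+1, ... modulo m until a free slot is found, and search follows the same probe sequence, stopping early at an empty slot. Deletion is omitted, since open addressing requires tombstone markers to delete safely.

#include <stdio.h>

#define M 11                 /* table size (a prime) */
#define EMPTY (-1)

static int table[M];

static int hash(int key) { return key % M; }

/* Insert with linear probing; returns the slot used, or -1 if the table is full. */
static int insert(int key) {
    for (int i = 0; i < M; i++) {
        int slot = (hash(key) + i) % M;       /* probe sequence h(k), h(k)+1, ... */
        if (table[slot] == EMPTY || table[slot] == key) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;
}

/* Search follows the same probe sequence; returns the slot or -1 if absent. */
static int search(int key) {
    for (int i = 0; i < M; i++) {
        int slot = (hash(key) + i) % M;
        if (table[slot] == EMPTY) return -1;  /* empty slot: key cannot be present */
        if (table[slot] == key)  return slot;
    }
    return -1;
}

int main(void) {
    for (int i = 0; i < M; i++) table[i] = EMPTY;
    insert(15); insert(26); insert(37);       /* all three hash to 4 mod 11, forcing probes */
    printf("37 found at slot %d\n", search(37));
    return 0;
}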

Security and Verification

Hash functions are essential in cryptographic protocols for verifying the integrity of data, ensuring that transmitted or stored information has not been altered. In this context, a hash digest serves as a fixed-length fingerprint of the data, allowing efficient comparison to detect tampering. Unlike simple checksums designed primarily for error detection in non-adversarial environments, cryptographic digests must resist intentional modifications by attackers. For instance, the cyclic redundancy check (CRC), introduced as a polynomial-based error-detecting code, excels at identifying accidental bit errors in transmission over noisy channels, such as in network protocols, but offers no protection against deliberate alterations since its computation is fully deterministic and keyless. In contrast, mechanisms like the hash-based message authentication code (HMAC) incorporate a secret key to produce an authenticated digest, providing both integrity and authenticity by preventing an attacker from forging a valid tag without the key. HMAC, defined using iterative hash functions like SHA-256, processes the message and key through nested hashing to mitigate weaknesses in the underlying hash construction. A prominent application of hash functions in security is the digital signature paradigm, where long messages are first hashed to a fixed-size value before signing, known as the hash-then-sign approach. This method reduces the computational load on the signing algorithm while leveraging the collision resistance of the hash to ensure the signature binds to the entire message. For example, in the RSA signature scheme standardized by NIST, the message is hashed using SHA-256 to produce a 256-bit digest, which is then processed with the signer's private key to form the signature; verification uses the public key to recover and compare the recomputed hash. This combination, specified in the Digital Signature Standard, enables secure authentication in protocols like TLS and code signing, as long as the hash function remains collision-resistant. In blockchain systems, hash functions underpin verification through structures like Merkle trees, which efficiently prove the inclusion and unaltered state of transactions within a block. Each transaction is hashed, and these hashes are pairwise combined up to a root hash, which is included in the block header and linked via a chain of hashes to previous blocks. This allows light clients to verify a transaction's inclusion by recomputing the path from the leaf to the root without downloading the full dataset, as demonstrated in Bitcoin, where double SHA-256 secures the tree against modifications. Any change to a transaction would alter its hash and propagate up the tree, invalidating the root and breaking the chain's cryptographic link. Despite their strengths, certain hash functions are vulnerable to length extension attacks, where an attacker appends data to a message and computes a valid digest without knowing the original secret prefix, exploiting the Merkle–Damgård construction used in MD5 and SHA-1. In this attack, the adversary uses the public digest of a secret-keyed message (e.g., a naive MAC computed as H(key || message)), appends padding and new data, and derives the extended digest by continuing the compression from the exposed internal state. MD5 and SHA-1, both based on this iterative scheme, allow such extensions because the output reveals the internal state after processing the original message plus padding. This vulnerability has led to practical exploits, such as forging authentication tokens, and contributed to the deprecation of these hashes in security-critical applications.
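
In practice HMAC is obtained from a vetted library rather than built by hand; the following is a minimal sketch assuming OpenSSL is available (link with -lcrypto), computing an HMAC-SHA-256 tag over a message with a secret key via the library's HMAC convenience function. Because the key enters the nested hashing, knowledge of the tag alone does not permit the length-extension forgery described above.

#include <stdio.h>
#include <openssl/evp.h>
#include <openssl/hmac.h>

int main(void) {
    const unsigned char key[] = "secret key";
    const unsigned char msg[] = "message to authenticate";
    unsigned char tag[EVP_MAX_MD_SIZE];
    unsigned int tag_len = 0;

    /* HMAC-SHA-256: nested keyed hashing of the message per RFC 2104. */
    HMAC(EVP_sha256(), key, (int)(sizeof key - 1),
         msg, sizeof msg - 1, tag, &tag_len);

    for (unsigned int i = 0; i < tag_len; i++)
        printf("%02x", tag[i]);
    printf("\n");
    return 0;
}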

Other Computational Uses

Bloom filters are probabilistic data structures that use multiple independent hash functions to represent a set of elements in a bit array, allowing membership queries with a controlled false positive rate but no false negatives. Introduced by Burton Howard Bloom in 1970, they employ k hash functions to map each element to k positions in an m-bit array, setting those bits to 1 upon insertion; queries check if all corresponding bits are 1. This approach trades space efficiency for potential errors, making it suitable for applications like spell-checkers and network routing caches where exact membership is not critical. The false positive probability p for a Bloom filter is approximated by the formula p \approx \left(1 - e^{-kn/m}\right)^k where n is the number of elements inserted, m is the array size in bits, and k is the number of hash functions, with optimal k around (m/n) ln 2 to minimize p. Locality-sensitive hashing (LSH) extends hashing to high-dimensional spaces by using families of hash functions that preserve locality, such that similar items are hashed to the same bucket with high probability while dissimilar items are separated. Pioneered by Piotr Indyk and Rajeev Motwani in 1998, LSH enables efficient approximate nearest-neighbor search in applications like recommendation systems and near-duplicate detection, avoiding the exhaustive comparisons that suffer from the curse of dimensionality. For example, random projection-based LSH maps vectors to lower dimensions while approximating distances such as cosine or Euclidean distance. In storage systems, hash functions facilitate data deduplication by generating fixed-size fingerprints for data chunks, enabling rapid identification and elimination of duplicates to optimize space usage. A 2011 empirical study across diverse datasets showed deduplication ratios exceeding 50% in file system images and backups, using cryptographic hashes such as SHA-256, which provide strong collision resistance to enable reliable duplicate detection for large chunks without practical risk of collisions. This technique underpins deduplicating file systems and commercial backup solutions, balancing computational overhead with storage savings. As of 2025, hash functions play a growing role in machine learning, particularly through feature hashing to manage high-dimensional sparse data in models. In retrieval-augmented generation (RAG) frameworks, deep hashing integrates with vector databases to accelerate similarity searches for large language models, reducing latency while maintaining semantic accuracy in tasks like question answering. This approach, as demonstrated in Hash-RAG, combines hashing with fine-grained retrieval to handle billion-scale corpora efficiently.
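
A Bloom filter needs k hash positions per element; a common implementation trick, used below purely as an illustration with arbitrarily chosen m and k, derives them from two base hashes via double hashing, g_i(x) = (h_1(x) + i·h_2(x)) mod m. The sketch stores short strings and answers membership queries with possible false positives but no false negatives.

#include <stdint.h>
#include <stdio.h>

#define M_BITS 1024          /* filter size m in bits */
#define K      4             /* number of hash functions k */

static unsigned char bits[M_BITS / 8];

/* FNV-1a with a seed, used to derive the two base hashes h1 and h2. */
static uint32_t fnv1a(const char *s, uint32_t seed) {
    uint32_t h = 2166136261U ^ seed;
    while (*s) { h ^= (unsigned char)*s++; h *= 16777619U; }
    return h;
}

/* i-th derived hash: g_i(x) = (h1(x) + i * h2(x)) mod m  (double hashing). */
static uint32_t nth_hash(const char *s, int i) {
    uint32_t h1 = fnv1a(s, 0), h2 = fnv1a(s, 0x9e3779b9U) | 1u;
    return (h1 + (uint32_t)i * h2) % M_BITS;
}

static void bloom_add(const char *s) {
    for (int i = 0; i < K; i++) {
        uint32_t b = nth_hash(s, i);
        bits[b / 8] |= (unsigned char)(1u << (b % 8));   /* set the k bits */
    }
}

/* Returns 1 for "possibly present" (false positives possible), 0 for "definitely absent". */
static int bloom_query(const char *s) {
    for (int i = 0; i < K; i++) {
        uint32_t b = nth_hash(s, i);
        if (!(bits[b / 8] & (1u << (b % 8)))) return 0;
    }
    return 1;
}

int main(void) {
    bloom_add("alice"); bloom_add("bob");
    printf("alice: %d  carol: %d\n", bloom_query("alice"), bloom_query("carol"));
    return 0;
}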

Construction Techniques

Hashing Fixed-Length Data

Hashing fixed-length data, such as integers, forms the basis for many hash table implementations where inputs are of uniform size, typically machine words like 32-bit or 64-bit integers. These methods aim to map the input to an index over a range of slots, often using simple arithmetic operations to ensure efficiency on modern hardware. Common techniques include division, multiplication, mid-square, and specialized variants like Fibonacci hashing and Zobrist hashing, each with trade-offs in uniformity and computational cost. The division method is one of the simplest approaches for hashing, defined as h(x) = x \mod m, where x is the input and m is the number of slots, ideally chosen as a prime not close to a power of 2 to avoid clustering. This operation leverages integer remainder arithmetic, producing values in \{0, 1, \dots, m-1\}. However, its performance degrades if m is a power of 2, because the result then depends only on the low-order bits of the binary representation. Multiplicative hashing improves on division by multiplying with a constant to scramble the bits before reduction, given by h(x) = \left\lfloor \frac{(x \cdot A) \mod 2^w}{2^{w-r}} \right\rfloor, where A is an integer multiplier derived from a fractional constant close to the golden ratio conjugate (approximately 0.6180339887), w is the word size (e.g., 32 or 64 bits), and r is the number of output bits. For 32-bit words, Knuth recommends A = 2654435769, which is \left\lfloor 2^{32} \cdot \frac{\sqrt{5} - 1}{2} \right\rfloor, ensuring good bit mixing without pronounced clustering. This method is faster than division on most processors, as it replaces the division with a multiplication followed by a right shift. The mid-square method extracts bits from the middle of the squared input to produce the hash value, computed as h(x) = (x^2 \mod 2^{2w}) \gg w, where \gg denotes a right shift by w bits to isolate the central w bits of the 2w-bit square. This approach assumes x fits in w bits and aims to capture higher-order interactions from the cross terms implicit in squaring. While conceptually simple, it can suffer from short periods and poor distribution for certain inputs, making it less favored in modern use but illustrative of bit-extraction techniques. Fibonacci hashing is a refined form of multiplicative hashing that explicitly uses constants derived from the golden ratio for the multiplier, typically A = \frac{2^w}{\phi} where \phi = \frac{1 + \sqrt{5}}{2} \approx 1.618, followed by a shift to extract the high bits. This yields h(x) = \left\lfloor \frac{x \cdot A}{2^{w-r}} \right\rfloor \mod 2^r, promoting optimal spreading due to the irrational properties of \phi, which minimize correlations in the binary expansions. It is particularly effective for power-of-two table sizes, avoiding the division pitfalls. Zobrist hashing, designed for fixed-length representations like game board states, assigns a unique random 64-bit value to each possible feature (e.g., piece position) and computes the hash as the bitwise XOR of the values for active features: h(state) = \bigoplus_{i} Z_{feature_i}, where Z is a precomputed table of random keys. Updating the hash for a move involves XORing out the old feature and XORing in the new one, enabling incremental computation in O(1) time. This method excels in applications requiring fast equality checks and transposition detection, with low collision probability under uniform randomness.
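
The division and multiplicative schemes translate into a few lines of code; the sketch below (an illustration with arbitrarily chosen table sizes) shows the division method with a prime modulus and Knuth's multiplicative/Fibonacci variant with A = 2654435769 for a table of 2^r slots.

#include <stdint.h>
#include <stdio.h>

/* Division method: h(x) = x mod m, with m ideally a prime away from powers of 2. */
static uint32_t hash_div(uint32_t x, uint32_t m) {
    return x % m;
}

/* Multiplicative (Fibonacci) hashing for a table of 2^r slots:
   multiply by A = floor(2^32 * (sqrt(5)-1)/2) = 2654435769, let the product wrap
   modulo 2^32, and keep the top r bits. */
static uint32_t hash_mult(uint32_t x, unsigned r) {
    return (x * 2654435769U) >> (32 - r);
}

int main(void) {
    uint32_t keys[] = {12, 44, 13, 88, 23, 94, 11, 39};
    for (int i = 0; i < 8; i++)
        printf("key %2u -> div %2u, mult %2u\n",
               keys[i], hash_div(keys[i], 97), hash_mult(keys[i], 5));
    return 0;
}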

Hashing Variable-Length Data

Hashing variable-length data, such as strings or sequences, requires techniques that accommodate inputs of arbitrary lengths while producing fixed-size outputs suitable for indexing or comparison. These methods transform the entire input into a hash value without truncation or padding assumptions, ensuring uniformity across different lengths. Common approaches include polynomial-based hashing, folding techniques, perceptual hashing for multimedia, and radix conversion, each designed to handle variability efficiently while minimizing collisions. One widely used method for string hashing is the polynomial rolling hash, which treats the input string s = s[0] s[1] \dots s[n-1] as a polynomial evaluated at a base p, typically a prime, modulo a modulus m to bound the output size: h(s) = \left( \sum_{i=0}^{n-1} s[i] \cdot p^{n-1-i} \right) \mod m Here, each character s[i] is mapped to an integer (e.g., its ASCII value), and the powers of p weight positions from left to right, simulating a positional numeral system. This approach, introduced in the Rabin-Karp string matching algorithm, enables efficient computation for substrings via rolling updates, making it ideal for pattern matching in texts of varying lengths. The choice of p and m (often m a large prime like 10^9 + 7) reduces collision probability, with expected linear-time performance for most inputs. Folding methods simplify hashing by partitioning the variable-length input into fixed-size segments and combining them, often via summation modulo m. In character folding, the hash is computed as the sum of individual character codes (e.g., ASCII values) modulo m, such as h(s) = \left( \sum_{i=0}^{n-1} \text{ASCII}(s[i]) \right) \mod m, which treats each character equally regardless of position or length. This is straightforward but prone to collisions for similar-length strings with balanced character distributions. Length folding extends this by dividing the key into groups of fixed length (e.g., 2 or 3 characters), converting each group to a number, and summing or XORing them modulo m; for instance, for a string longer than the group size, the last partial group is folded in. These techniques, common in early implementations, prioritize speed over distribution quality for alphanumeric keys. Perceptual or fuzzy hashing adapts to variable-length multimedia data, such as images or audio, by focusing on content similarity rather than exact matches, producing hashes robust to minor alterations like resizing or compression. A representative example is pHash, which processes images by resizing to 32x32 pixels, applying a discrete cosine transform (DCT) to extract low-frequency coefficients, and generating a 64-bit hash by comparing these coefficients against their mean. This method, implemented in open-source libraries, yields Hamming distances under 10 for visually similar images, enabling applications like duplicate detection in large datasets. Unlike exact hashes, perceptual ones tolerate variability in length or format while preserving semantic invariance. Radix conversion treats the variable-length key as a number in a positional system with radix b (often 256 for byte strings or 128 for printable characters), computing h(s) = \left( \sum_{i=0}^{n-1} s[i] \cdot b^{n-1-i} \right) \mod m, which is mathematically equivalent to the polynomial rolling hash with p = b.
This interpretation highlights its roots in numeral systems, allowing direct computation for short strings but requiring modular reduction for long ones to prevent overflow. It is particularly effective for ordered alphabets, providing good distribution when b and m are coprime, though it shares the collision risks of polynomial methods for adversarial inputs.
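
A direct rendering of the polynomial rolling hash (a sketch with the illustrative choices p = 31 and m = 10^9 + 7) evaluates the polynomial by Horner's rule, multiplying the running value by the base and adding each character modulo m.

#include <stdint.h>
#include <stdio.h>

/* Polynomial rolling hash: h(s) = sum s[i] * p^(n-1-i) mod m,
   computed incrementally by Horner's rule. */
static uint64_t poly_hash(const char *s, uint64_t p, uint64_t m) {
    uint64_t h = 0;
    for (size_t i = 0; s[i] != '\0'; i++)
        h = (h * p + (unsigned char)s[i]) % m;
    return h;
}

int main(void) {
    const uint64_t P = 31, M = 1000000007ULL;
    printf("%llu\n", (unsigned long long)poly_hash("hashing", P, M));
    printf("%llu\n", (unsigned long long)poly_hash("hashing", P, M)); /* deterministic */
    return 0;
}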

Specialized and Dynamic Hash Functions

Perfect hashing is a technique designed for static sets of keys, where the hash function or scheme ensures no collisions within a predefined universe, enabling constant-time lookups without additional probing. This is particularly useful for applications requiring fast, collision-free access, such as compiler symbol tables or static databases. The seminal Fredman-Komlós-Szemerédi (FKS) scheme constructs such a perfect hash table using a two-level structure: a first-level hash function maps the n keys to n buckets, aiming to keep the sum of squared bucket sizes bounded (e.g., \sum b_i^2 \leq \alpha n with \alpha \geq 2), followed by independent second-level injective hash functions for each bucket into arrays of size proportional to the square of the bucket size. This approach guarantees no collisions for the static set and achieves expected linear construction time O(n) with O(1) worst-case query time. Universal hashing addresses the limitations of fixed hash functions by employing a family of functions chosen randomly at runtime, providing probabilistic guarantees against worst-case collisions even for adversarial inputs. A family H of functions from a universe U to a range R is universal if, for any distinct x, y \in U, the collision probability satisfies \Pr_{h \in H}[h(x) = h(y)] \leq 1/|R|. Introduced by Carter and Wegman, this framework ensures input-independent average-case linear time for insertions and lookups in hash tables, making it robust for dynamic environments where key distributions are unknown. The construction often uses modular arithmetic over a prime, such as h(x) = ((a x + b) \mod p) \mod m with random a, b, balancing simplicity and performance. Dynamic hash functions extend these ideas to evolving datasets, prioritizing schemes that support insertions, deletions, and resizes with minimal key movements to avoid costly full rehashing. Cuckoo hashing, developed by Pagh and Rodler, uses two hash functions and two parallel tables, where each key has exactly two possible locations; insertions proceed by evicting occupying keys (cuckoo-like) to their alternate position until a free slot is found or a rehash is triggered after a bounded number of attempts (e.g., O(\log n)). This yields worst-case O(1) lookup time via at most two probes and amortized expected O(1) updates, with space efficiency comparable to search trees (about 3 words per key at load factors up to 50%). Robin Hood hashing, proposed by Celis and Larson, refines open addressing by "robbing" slots from keys with shorter probe sequences during insertion to balance probe distances, reducing the maximum probe length and variance in search times; this minimizes disruptions during table resizing, as keys are redistributed with low relocation costs, achieving near-optimal performance under high loads. Fuzzy or perceptual hash functions are specialized for domains where exact matches are insufficient, instead detecting approximate similarities such as near-duplicate documents or images by tolerating minor variations. SimHash, introduced for large-scale web crawling, generates compact fingerprints (e.g., 64 bits) from feature vectors by projecting onto random hyperplanes and signing the results, ensuring that similar inputs produce hashes with small Hamming distances. For instance, documents differing in only a few features yield fingerprints differing in proportionally few bits, allowing efficient near-duplicate detection via thresholding (e.g., Hamming distance \leq 3 for billions of pages).
This locality-sensitive property contrasts with cryptographic hashes, prioritizing perceptual similarity over collision resistance, and has been validated in production systems for clustering near-duplicate web pages.
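
The Carter–Wegman family mentioned above is straightforward to instantiate; the sketch below is an illustration that uses the Mersenne prime p = 2^31 - 1 and the C library's rand() for brevity rather than a high-quality randomness source. It draws random parameters a and b once and then hashes 32-bit keys with h(x) = ((a·x + b) mod p) mod m.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define P 2147483647ULL      /* prime 2^31 - 1, larger than any 31-bit key */

static uint64_t a, b;        /* random parameters chosen once per table */

static void universal_init(void) {
    srand((unsigned)time(NULL));
    a = (uint64_t)(rand() % (P - 1)) + 1;   /* a in [1, p-1] */
    b = (uint64_t)rand() % P;               /* b in [0, p-1] */
}

/* Carter–Wegman universal hash: ((a*x + b) mod p) mod m. */
static uint32_t universal_hash(uint32_t x, uint32_t m) {
    return (uint32_t)(((a * x + b) % P) % m);
}

int main(void) {
    universal_init();
    for (uint32_t k = 100; k < 105; k++)
        printf("h(%u) = %u\n", k, universal_hash(k, 64));
    return 0;
}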

Analysis and Evaluation

Theoretical Analysis

The theoretical analysis of hash functions relies on probabilistic models and complexity analysis to evaluate their collision behavior and performance properties. A key concept is the birthday paradox, which quantifies the likelihood of collisions when hashing multiple items into a fixed number of slots. For n distinct items hashed into m possible values, the probability of at least one collision is approximately 1 - e^{-n^2 / (2m)}, assuming independent and uniformly random hashing; this approximation holds well when n is much smaller than m, highlighting how collisions become likely after roughly \sqrt{2m \ln 2} items, or about 1.18\sqrt{m} for a 50% probability. Universal hashing provides a rigorous framework for ensuring low collision probabilities regardless of input distribution. Introduced by Carter and Wegman, a universal hash family is defined such that for any two distinct keys x and y, the probability that h(x) = h(y) is at most 1/m, where h is chosen uniformly at random from the family and m is the output size. Their construction, based on polynomial evaluation over finite fields, achieves this bound exactly for appropriate parameters, enabling provably efficient storage and retrieval with expected constant-time operations in hash tables. Strongly universal variants further ensure that the hash values for distinct keys are uniformly distributed and independent, strengthening proofs for applications like authentication. In dynamic settings, amortized analysis evaluates the average performance over a sequence of operations, accounting for costly rehashing. For resizable hash tables that double in size when the load factor exceeds a threshold (e.g., 0.5) and halve when it falls below another (e.g., 0.25), the amortized time per insertion or deletion is O(1), as the total cost of resizing over n operations is O(n) despite occasional linear-time rebuilds. This holds under uniform hashing assumptions, where each resize amortizes the work across exponentially growing table sizes, yielding expected constant time without probabilistic failure. The computational complexity of finding collisions underscores the hardness of strong hash functions. For certain cryptographic constructions, such as those based on NP-hard problems like syndrome decoding in coding theory, finding a pair of inputs with the same output is as hard as solving an instance of the underlying intractable problem. This provable hardness supports the security of such functions against collision attacks in theoretical models.
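
The birthday approximation can be evaluated numerically; the short sketch below (link with -lm) computes 1 - e^{-n^2/(2m)} for a 64-bit output space at a few item counts, illustrating that the collision probability passes 50% near \sqrt{2m \ln 2}, roughly 5.1 × 10^9 items.

#include <math.h>
#include <stdio.h>

int main(void) {
    double m = pow(2.0, 64);                        /* size of a 64-bit output space */
    double counts[] = {1e6, 1e9, 5.1e9, 1e10};      /* numbers of hashed items n */
    for (int i = 0; i < 4; i++) {
        double n = counts[i];
        double p = 1.0 - exp(-(n * n) / (2.0 * m)); /* birthday approximation */
        printf("n = %.1e  ->  collision probability ~ %.4f\n", n, p);
    }
    return 0;
}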

Practical Testing and Measurement

Practical testing and measurement of hash functions involve empirical evaluations to assess their uniformity, speed, and resistance to attacks in real-world scenarios. These methods complement theoretical analysis by applying standardized suites and tools to generate quantifiable metrics, such as pass/fail rates in randomness tests or collision probabilities under simulated loads. Such evaluations are essential for validating hash functions in applications like data structures and cryptographic protocols, ensuring they meet practical requirements for uniformity, speed, and security. Statistical tests are fundamental for verifying the randomness and distribution properties of hash outputs. The Dieharder suite, an enhanced version of George Marsaglia's original Diehard battery, includes over 30 tests to evaluate sequences produced by hash functions treated as pseudorandom number generators, such as the birthday spacings test for collision behavior and the runs test for bit-level randomness. For cryptographic hash functions, the NIST Statistical Test Suite (SP 800-22) provides 15 core tests, including frequency, runs, and serial tests, applied to binary sequences derived from hash digests to confirm unpredictability against statistical biases. Additionally, the chi-squared test assesses the uniformity of hash value distributions by comparing observed frequencies in bins against expected uniform probabilities, with a statistic close to its expected value indicating good uniformity for non-cryptographic hashes. Benchmarking tools measure computational efficiency and quality metrics such as throughput and collision counts. SMHasher, a widely used suite for non-cryptographic hash functions, evaluates speed in gigabytes per second on various inputs and detects quality issues through tests like the "Sanity" set for basic distribution and the "Sparse" test for low-entropy inputs, reporting metrics such as maximum collision count. The avalanche effect, which quantifies how input bit changes propagate to output bits, is tested by flipping single input bits and measuring the resulting changes in outputs; visualizations like bit-flip probability matrices (ideally showing a 50% flip rate per output bit) highlight strengths and weaknesses, as implemented in tools deriving from Bob Jenkins' avalanche tester. Security audits focus on vulnerability to collision-finding and preimage attacks, simulating adversarial conditions. Rainbow tables, precomputed chains of hash reductions, accelerate offline searches for the preimages of unsalted password hashes, reducing time relative to brute force by orders of magnitude, as demonstrated in attacks on legacy systems where tables of terabytes enable rapid reversals. GPU-accelerated attacks exploit massive parallelism to brute-force or dictionary-search hash spaces, achieving speeds of billions of hashes per second on modern hardware for fast functions like MD5, underscoring the need for audits in protocol implementations. Dedicated tools facilitate these audits through brute-force and hybrid attacks. Hashcat, an open-source password recovery utility, supports over 300 hash algorithms and modes like straight brute-force or mask attacks, with its 2025 v7.0.0 update enhancing kernel optimizations for multi-GPU setups and new hash modes for emerging standards, enabling practical measurement of cracking times under resource constraints.
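
A basic uniformity check of the kind these suites automate can be sketched directly; the illustration below hashes a stream of integer keys (using an FNV-style mixer as the hash under test) into a small number of buckets and computes the chi-squared statistic against the uniform expectation. A statistic far above the number of buckets suggests a skewed distribution.

#include <stdint.h>
#include <stdio.h>

#define BUCKETS 64
#define KEYS    100000

/* Hash under test: 32-bit FNV-1a over the four bytes of an integer key. */
static uint32_t hash32(uint32_t x) {
    uint32_t h = 2166136261U;
    for (int i = 0; i < 4; i++) { h ^= (x >> (8 * i)) & 0xff; h *= 16777619U; }
    return h;
}

int main(void) {
    unsigned long count[BUCKETS] = {0};
    for (uint32_t k = 0; k < KEYS; k++)
        count[hash32(k) % BUCKETS]++;

    /* Chi-squared statistic: sum of (observed - expected)^2 / expected over all buckets. */
    double expected = (double)KEYS / BUCKETS, chi2 = 0.0;
    for (int i = 0; i < BUCKETS; i++) {
        double d = count[i] - expected;
        chi2 += d * d / expected;
    }
    printf("chi-squared over %d buckets: %.1f (about %d expected for a uniform hash)\n",
           BUCKETS, chi2, BUCKETS - 1);
    return 0;
}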

History and Evolution

The concept of hash functions traces its roots to the early days of computing, emerging as a solution for efficient data organization and retrieval. Precursors can be found in Herman Hollerith's punch card systems developed in the 1890s for the U.S. Census, which enabled key-to-address mapping for data storage. In January 1953, IBM researcher Hans Peter Luhn authored an internal memorandum proposing the use of hashing to transform keys into array indices for rapid searching in large document collections, particularly for keyword-in-context (KWIC) indexing systems. This is widely regarded as the first formal description of hashing in computing. Luhn demonstrated a practical KWIC implementation in 1958 at the International Conference on Scientific Information. His work laid the foundation for hash tables, emphasizing uniform key distribution to minimize collisions. The term "hashing" and its theoretical underpinnings were further developed in the mid-1950s. In 1956, Arnold Dumey published a paper on "hash coding" for direct-access files, exploring collision resolution techniques. By 1958, researchers at IBM and elsewhere had implemented early prototypes. Donald Knuth's seminal work, The Art of Computer Programming (first volume published in 1968), formalized hashing analysis, crediting Luhn as the originator and discussing optimality criteria for hash functions. Cryptographic hash functions evolved separately in the late 1970s, driven by the need for message authentication and integrity in public-key cryptography. Early descriptions appear in late-1970s work on digital signatures, which used hash functions to condense messages before signing. The first dedicated cryptographic hash designs emerged in the late 1980s, such as the MDC-2 and MDC-4 constructions based on block ciphers. The 1990s marked rapid standardization. Ronald Rivest introduced MD4 in 1990 and MD5 in 1991, which became widely used despite later vulnerabilities. In 1993, the National Security Agency (NSA) developed SHA-0, followed by SHA-1 in 1995, published as Federal Information Processing Standard (FIPS) 180-1 by NIST. Concerns over SHA-1's security led to the SHA-2 family (SHA-224, SHA-256, etc.) in 2002 (FIPS 180-2). In response to cryptanalytic advances, NIST initiated a competition in 2007 for a new hash standard, culminating in the selection of Keccak (basis for SHA-3) in 2012, finalized in FIPS 202 in 2015. SHA-3 introduced a sponge construction, differing from the Merkle–Damgård structure of prior SHA functions. As of 2025, research continues on quantum-resistant hash-based schemes, with NIST standardizing post-quantum cryptography that includes the hash-based signature scheme SLH-DSA, derived from SPHINCS+ (FIPS 205, 2024).
