Fact-checked by Grok 2 weeks ago

Birthday attack

In cryptography, a birthday attack is a type of collision-finding attack on hash functions that leverages the birthday paradox—a probabilistic phenomenon where the likelihood of at least one shared birthday in a group of randomly selected people rises sharply with group size—to identify two distinct inputs producing the same hash output with far fewer computations than exhaustive search would demand.^[1] For a hash function with an output space of size n (e.g., 2^b for b-bit hashes), the attack requires approximately √(2 * n * ln(2)) ≈ 1.177 √n hash evaluations to achieve a 50% probability of collision, reducing the effort from O(n) to O(√n).^[2] This makes it a significant threat to systems relying on collision resistance, such as digital signatures and integrity checks, as demonstrated in early proposals like Gideon Yuval's 1979 method to forge signatures in Rabin's cryptosystem by generating colliding message pairs. The birthday paradox itself, which underpins the attack, illustrates that in a room of just 23 people, the probability of at least two sharing a birthday exceeds 50%, calculated as 1 minus the product of (365 - i)/365 for i from 0 to 22, yielding approximately 0.507.^[1] This result stems from the quadratic growth in pairwise comparisons (k choose 2 for k samples), making collisions inevitable sooner than intuition suggests in uniform random distributions.^[3] In cryptographic contexts, the paradox translates directly to hash outputs treated as "birthdays," enabling attackers to precompute and store many hashes until a match emerges, often using storage-optimized variants like Pollard's rho algorithm for efficiency.^[2] Historically, the birthday attack gained prominence through Yuval's work, which highlighted its potential against naive hash-based authentication by allowing an adversary to craft equivalent messages indistinguishable to the verifier. While pure birthday attacks on strong modern hashes like SHA-256 (requiring ~2^128 operations) remain computationally infeasible, they set the baseline for security analysis and have informed vulnerabilities in weaker functions, such as MD5, where collisions were found in 2^64 effort bounds, though actual breaks used more sophisticated differential cryptanalysis.^[4] Defenses include using larger hash outputs, randomized salting, or target-specific collision resistance in protocols like digital signatures to mitigate the attack's impact.^[5]

Background and Paradox

The Birthday Paradox

The birthday paradox, also known as the birthday problem, describes the counterintuitive probability that, in a random group of 23 people assuming 365 possible birthdays and uniform distribution, there is approximately a 50% chance that at least two individuals share the same birthday.^[6] This result defies common intuition, which might suggest a much larger group size is needed given the 365 possible days, but it highlights how probabilities of shared events accumulate rapidly in moderate-sized sets.^[7] The intuitive explanation centers on pairwise comparisons rather than sequential checks against all possible dates. In a group of n people, there are \binom{n}{2} = \frac{n(n-1)}{2} unique pairs, and each pair has a \frac{1}{365} probability of matching birthdays under the uniform assumption, ignoring leap years and exact date distributions.^[6] As n increases, the number of pairs grows quadratically, amplifying the likelihood of at least one match even though individual pair probabilities remain small.^[7] The exact probability P of at least one shared birthday in a group of n people is given by
P(n) = 1 - \frac{_{365}P_n}{365^n},
where _{365}P_n = \frac{365!}{(365-n)!} is the number of permutations of 365 days taken n at a time, valid for n ≤ 365.^[7] For n ≪ 365, this simplifies computationally to the product form
P(\text{no shared}) = \prod_{k=1}^{n-1} \left(1 - \frac{k}{365}\right),
so P(at least one shared) = 1 minus that value; for example, at n=23, P ≈ 0.507, and at n=57, P ≈ 0.991.^[6] The problem was first published in 1939 by Austrian mathematician Richard von Mises, though its conceptual roots lie in earlier work on probability coincidences.^[6] It was popularized in the United States through Martin Gardner's "Mathematical Games" column in Scientific American, where he discussed it as a striking example of probabilistic surprise.^[8] This paradox extends analogously to scenarios like finding collisions in cryptographic hash functions, where output values behave like birthdays.^[7]

Relation to Balls and Bins

The balls and bins model serves as a foundational abstraction for analyzing collision probabilities in discrete uniform distributions. In this setup, m balls are thrown independently and uniformly at random into n bins, with a collision defined as at least one bin receiving two or more balls. The probability of at least one such collision is given by the complement of the probability that all balls land in distinct bins, which approximates to $1 - e^{-m(m-1)/(2n)} when m is much smaller than n.^[9] This framework maps directly to the birthday paradox by treating the n bins as the possible birthdays (typically n = 365 days, ignoring leap years) and the m balls as individuals. The approximation shows that a collision probability of roughly 50% arises when m ≈ √(2n ln 2), or approximately √(2n) for rough estimation. For the standard case of n = 365, m ≈ 23 yields a collision probability exceeding 50%.^[9] The model extends beyond birthdays to any uniform random mapping from a domain to a codomain of size n, such as random selections or functions over finite sets.^[10] The core insight lies in the quadratic scaling of pairwise comparisons: with m elements, there are \binom{m}{2} ≈ m²/2 possible pairs, each colliding with probability 1/n, yielding an expected number of collisions of m²/(2n). Collisions thus become probable after Θ(√n) trials, rather than Θ(n), due to this pairwise quadratic growth.^[9]

Cryptographic Application

Hash Function Collisions

A hash collision occurs when two distinct inputs produce the same output value from a hash function.^[11] In cryptographic contexts, ideal hash functions are assumed to behave like random functions that map inputs uniformly to a fixed-size output space of $2^k possible values, where k is the bit length of the hash.^[12] This uniformity assumption underpins security analyses, treating the hash as unpredictable and evenly distributed across its range. Under this model, the birthday paradox implies that finding a collision requires evaluating approximately \sqrt{2^k} = 2^{k/2} distinct inputs to achieve about a 50% probability of success.^[4] This effort is significantly less than the exhaustive $2^k trials needed to cover the entire output space, highlighting the practical vulnerability of hash functions to collision searches.^[4] Unlike a second preimage attack, which seeks a different input that matches the hash of a specific given input, a collision attack aims only to identify any pair of distinct inputs sharing the same hash value, without targeting a particular output. This distinction makes collision resistance a stricter requirement, as it demands security against finding arbitrary pairs rather than targeted inversions. The cryptographic relevance of such collisions was first noted in 1976 by Diffie and Hellman, who discussed the challenges of constructing collision-resistant one-way functions for authentication.^[13] The birthday attack itself was formalized in 1979 by Yuval, who demonstrated its application to forging digital signatures by exploiting hash collisions.^[14]

Attack Execution

The primary goal of a birthday attack is to generate two distinct messages that produce the same hash value, enabling an attacker to forge data or circumvent integrity verification mechanisms in cryptographic systems.^[15] The procedure involves selecting a hash function with an output size of k bits, yielding $2^k possible hash values, and then generating approximately $2^{k/2} random input messages. For each input, compute its hash and store the pairs of (hash, input) in a data structure such as a hash table, sorted list, or tree to facilitate quick lookups. Continue until a duplicate hash is detected, indicating a collision between two different inputs. This process exploits the probabilistic nature of hash distributions to find matches with high likelihood after roughly \sqrt{2 \cdot 2^k} trials.^[15] The computational complexity of this attack is O(2^{k/2}) in both time and space under the standard approach, as it requires evaluating the hash function that many times and storing the results. For instance, with k = 80, the attack demands about $2^{40} operations, which is feasible with modern hardware, whereas for k = 256, the $2^{128} scale renders it practically impossible.^[15] To address the high storage demands, optimizations like Floyd's cycle detection algorithm (also known as the tortoise-and-hare method) can be applied, treating the hash chain as a functional graph and detecting cycles with constant O(1) space by advancing two pointers at different speeds through iterated hash applications until they collide. Parallel computation across multiple machines can further distribute the workload, reducing effective memory per node while maintaining the overall time complexity.^[15] A hypothetical pure birthday attack on MD5, which has k = 128, would require approximately $2^{64} operations, though in 2004, Wang et al. demonstrated practical MD5 collisions using a differential cryptanalysis method with far lower complexity of about $2^{39} evaluations, highlighting the function's vulnerability to collision attacks.^[15]^[16]

Mathematical Foundations

Probability Derivation

The probability of no collision after m trials in a space of n possible outputs is given exactly by the product

\prod_{i=1}^{m-1} \left(1 - \frac{i}{n}\right) = \frac{n!}{(n-m)! n^{m-1}},

assuming m \leq n; this follows from the sequential probability that each new output differs from all previous ones, starting with the first trial having probability 1, the second having (n-1)/n, the third (n-2)/n, and so on up to the m-th having (n-m+1)/n.^[17] To derive this, consider the outputs as independent and uniformly distributed over \{1, 2, \dots, n\}, akin to drawing balls with replacement from an urn of n distinct balls; the probability of no repeated ball in m draws is the number of injective functions from m elements to n elements divided by the total n^m possible outcomes, yielding the falling factorial form above.^[17] Under the assumption of pairwise independence (a weaker condition than full independence for collision probabilities), the pairwise collision probability for any two distinct trials is exactly $1/n, leading to the expected number of collisions E = \binom{m}{2} \cdot (1/n) = m(m-1)/(2n).^[17] For approximation, take the natural logarithm of the exact no-collision probability:

\ln \left( \prod_{i=1}^{m-1} \left(1 - \frac{i}{n}\right) \right) = \sum_{i=1}^{m-1} \ln \left(1 - \frac{i}{n}\right).

When m \ll n, use the Taylor expansion \ln(1 - x) \approx -x - x^2/2 - \cdots for small x = i/n, truncating at the linear term to obtain \sum_{i=1}^{m-1} (-i/n) = -m(m-1)/(2n); exponentiating yields the approximation

\prod_{i=1}^{m-1} \left(1 - \frac{i}{n}\right) \approx e^{-m(m-1)/(2n)},

so the collision probability is approximately $1 - e^{-m(m-1)/(2n)}.^[17] This Poisson-like approximation holds well when m = O(\sqrt{n}), as higher-order terms in the expansion become negligible.^[17] Solving for m such that the collision probability equals p, set $1 - e^{-m(m-1)/(2n)} = p, or e^{-m(m-1)/(2n)} = 1-p; taking logs gives m(m-1)/(2n) \approx \ln(1/(1-p)), and for small p or large n, m \approx \sqrt{2n \ln(1/(1-p))}.^[17] For a 50% chance (p=0.5), \ln(2) \approx 0.693, so E \approx 0.693 and m \approx \sqrt{2n \ln 2} \approx 1.177 \sqrt{n}.^[17] The assumptions of uniform randomness and independence are critical; if outputs are not uniformly distributed, collisions may occur sooner (if biased toward certain values) or later, potentially invalidating the model—real hash functions aim to behave like random oracles to satisfy these.^[17] For the exact probability when m/n is small, the binomial expansion of (1 - i/n) can refine the approximation, but the exponential form dominates for practical collision thresholds.^[17] This discrete model parallels the balls-and-bins paradigm in probability theory.^[17]

Approximation Methods

The Poisson approximation provides a practical method for estimating the probability of a collision in the birthday attack without computing the exact combinatorial formula. Under this approximation, the number of potential pairwise collisions follows a Poisson distribution with parameter \lambda = \frac{m(m-1)}{2n} \approx \frac{m^2}{2n}, where m is the number of trials and n is the size of the output space. The probability of at least one collision is then p \approx 1 - e^{-\lambda}.^[17] To find the number of trials m required for a specific success probability p, solve $1 - e^{-\lambda} = p, yielding \lambda = -\ln(1-p) and thus m \approx \sqrt{2n \lambda} = \sqrt{-2n \ln(1-p)}. This closed-form expression allows quick estimates of attack feasibility. For example, a 50% success probability (p = 0.5) requires \lambda = \ln 2 \approx 0.693, so m \approx \sqrt{1.386 n} \approx 1.177 \sqrt{n}.^[17] For higher success probabilities, the required trials scale accordingly. Achieving p = 0.99 demands \lambda \approx 4.605, resulting in m \approx \sqrt{9.21 n} \approx 3.035 \sqrt{n}. These heuristics, derived from the Poisson model, enable cryptanalysts to assess vulnerability based on hash output length without exhaustive computation.^[17] In cryptographic contexts, such approximations highlight the impracticality of generic birthday attacks against secure hash functions. For SHA-1 with a 160-bit output (n = 2^{160}), approximately $1.177 \times 2^{80} \approx 2^{80} trials are needed for a 50% collision chance. Although the raw computational effort of $2^{80} hash evaluations is large—requiring on the order of years for a large cluster at rates around $10^{18} hashes per second as of 2025—the vanilla attack also demands infeasible storage of \approx 2^{80} hash values (on the order of yottabytes). Space-efficient methods, such as Pollard's rho algorithm, reduce storage to O(1) or O(\sqrt{n}) with parallelization but retain the O(\sqrt{n}) computational cost, keeping generic attacks on 160-bit hashes challenging for non-nation-state adversaries.^[17] This approximation assumes m \ll n to neglect higher-order effects like multiple collisions in the same output value, breaking down when m approaches n and the space nears full occupancy, where the exact probability saturates at 1 but the model overestimates intermediate risks.^[17]

Vulnerabilities and Examples

Digital Signature Weaknesses

Digital signature schemes that employ a hash-then-sign paradigm are particularly susceptible to birthday attacks, as these attacks exploit the reduced computational effort required to find hash collisions compared to exhaustive search. In such schemes, the signer computes the hash of the message and signs the hash value using a public-key algorithm like RSA, rather than the entire message. An attacker can generate numerous message variants until two messages, one benign (m1) and one malicious (m2), produce the same hash output, h(m1) = h(m2). By tricking the signer into producing a valid signature on m1, the attacker can then attach that signature to m2, forging authentication without the signer's knowledge or consent. This mechanism was first outlined in the context of Rabin's signature scheme, highlighting how the birthday paradox enables efficient collision discovery to undermine signature integrity.^[14] Vulnerable schemes include classic constructions like RSA-PKCS#1 v1.5 with short hash functions such as MD5 or SHA-1, where collisions in the hash directly compromise the signature's validity for altered content. If an attacker forges a collision before the signing process, the signature remains cryptographically valid across both messages, bypassing checks that assume hash uniqueness. Historical demonstrations underscore this weakness: in 2004, Wang et al. revealed practical collisions for MD5, enabling theoretical forgery in MD5-based signatures with feasible computation. Building on this, Stevens, Lenstra, and de Weger in 2007 extended the attack to chosen-prefix collisions for MD5, allowing targeted forgeries like colliding X.509 certificates for different identities, further exposing signature protocols reliant on weak hashes.^[18]^[19] The impact of these attacks severely erodes the non-repudiation property of digital signatures, as the signer cannot plausibly deny authoring the forged message since the signature verifies correctly against it. To counter this, signers must rigorously verify the full message content prior to signing, rather than trusting the hash alone, though this adds operational overhead. A prominent real-world exploitation occurred in 2012 with the Flame malware, which leveraged a sophisticated MD5 chosen-prefix collision— a variant of techniques from prior research—to forge Microsoft Windows update certificates. This allowed the malware to propagate as legitimate signed software on infected systems, evading detection and highlighting the practical dangers of legacy hash functions in signature chains.^[20]^[21]

Other Cryptographic Contexts

The birthday attack extends to X.509 certificates, where collisions in certificate hashes can enable the creation of forged certificates that appear valid to relying parties. In 2017, the SHAttered attack demonstrated the first practical collision for full SHA-1, raising immediate concerns about its use in certificate signing, as such collisions could allow attackers to generate rogue certificates mimicking legitimate ones issued by trusted authorities, similar to prior MD5-based forgeries that compromised HTTPS security.^[22] Although modern browsers have phased out SHA-1 support, legacy systems and certain TLS deployments remain vulnerable to chosen-prefix collisions, which build on birthday attack principles to target structured certificate data like serial numbers.^[23] In password storage, unsalted hashes are susceptible to collision searches via birthday attacks, particularly with short output lengths that reduce the effective search space. For instance, the legacy LAN Manager (LM) hash, with its 128-bit output, allows attackers to find collisions after approximately 2^{64} attempts in theory; however, practical vulnerabilities arise more from other design flaws, such as unsalted storage and a reduced effective input space due to uppercase conversion and length limits up to 14 characters, enabling efficient preimage and dictionary attacks rather than birthday collisions.^[24] This vulnerability is exacerbated in unsalted environments, where precomputed tables like rainbow tables can exploit hash chains, but direct collision attacks further undermine security by allowing multiple weak passwords to map to identical hashes.^[25] Weak random number generators (RNGs) with limited entropy pools heighten the risk of key collisions in cryptographic protocols, as the birthday paradox implies that even modestly sized key sets can produce duplicates when drawn from a reduced output space. In systems relying on pseudorandom number generation, insufficient entropy—such as from predictable seeds—effectively shrinks the key space, making it feasible for adversaries to encounter collisions after generating on the order of the square root of the entropy bits, compromising confidentiality in applications like session keys or nonces.^[26] In blockchain and Merkle trees, short hash functions introduce potential transaction collisions during proof-of-work mining, where the birthday paradox halves the effective security margin of the hash output length. For Merkle trees aggregating transaction hashes into a root commitment, collisions in intermediate nodes could enable double-spending or invalid block inclusions if the hash size is insufficient, as attackers might find matching hashes for conflicting transactions after roughly 2^{n/2} trials for an n-bit hash.^[27] As of 2025, post-quantum cryptography increasingly accounts for birthday bounds in designing lattice-based hash functions, ensuring collision resistance against both classical and quantum adversaries in protocols like Fiat-Shamir signatures. Lattice-based constructions, such as those in NIST-standardized schemes, parameterize hash outputs to withstand birthday attacks up to security levels of 2^{128} or higher, incorporating the paradox's probabilistic scaling to maintain integrity in quantum-resistant environments.^[28]

Variations and Extensions

Yuval's birthday attack

Yuval's birthday attack is a variant of the birthday attack that finds a collision between two families of related inputs, such as variants of a legitimate message m_1 and a fraudulent message m_2, such that there exist m_1' and m_2' with \hash(m_1') = \hash(m_2'), by exploiting the birthday paradox through targeted message modifications. This approach adapts the standard collision-finding technique to scenarios where the attacker controls variations around specific inputs, making it particularly relevant for applications like digital signature forgery or bypassing verification mechanisms that rely on unique hashes. First described by Gideon Yuval in 1979, the attack highlights vulnerabilities in hash-based systems where minor message tweaks can be used to align hashes probabilistically.^[14] In Yuval's method, the attacker starts with a legitimate target message m_1 and a desired fraudulent message m_2. The process generates approximately t = 2^{n/2} minor modifications of m_1 (e.g., changing a few bits or padding subtly to preserve semantic meaning), computes the hash of each, and stores the results indexed by the first n/2 bits of the hash value, where n is the hash output length. Then, the attacker generates minor modifications of m_2, computes their hashes, and checks if the leading n/2 bits match any stored entry from the m_1 variations. Due to the birthday paradox, a match in the full hash is expected after generating about $2^{n/2} such modifications for m_2. Upon finding a match, the corresponding full messages m_1' and m_2' collide under the hash, allowing the legitimate signature (or verification) for m_1' to apply to m_2'. This requires O(2^{n/2}) hash computations and storage, similar to a standard birthday attack but with roughly twice the constant factor due to the dual-set generation.^[14] (citing Yuval's method in context of hash security) The attack's complexity remains O(2^{n/2}) overall, offering a significant improvement over the generic collision search in some contexts, though the higher constants and need for message modifications limit its efficiency for arbitrary forgeries. It is most effective against hash functions with weak collision resistance, such as early designs lacking strong padding or strengthening. For instance, the attack has been applied to broken hash functions like HAVAL (with 128-bit output), where practical collision vulnerabilities (e.g., 5-pass HAVAL collisions in $2^{123} work) enable the targeted variant to succeed within feasible bounds, facilitating verification bypass in legacy systems. Unlike the standard birthday attack, which seeks any collision without prior knowledge of inputs, Yuval's variant requires target-specific adaptations, rendering it less versatile for blind forgery but valuable in scenarios demanding collisions tied to known data.

Generalized Collision Attacks

Generalized collision attacks extend the birthday paradox principle to scenarios involving multiple inputs colliding on the same output or hybrid resource tradeoffs, enabling more sophisticated exploits against hash-based structures. Multi-collision attacks aim to find sets of $2^t distinct messages that all map to the identical hash value under a function with k-bit outputs. Unlike pairwise collisions, these require identifying larger clusters in the hash space. A generic approach leverages the structure of iterated hash functions, building collisions incrementally by appending colliding suffixes to prefixes, achieving a complexity of approximately $2^{k/2 + t - 1} compression function evaluations using techniques that exploit cycles and trees in the functional graph of the hash iteration. In 2004, Jacques Joux introduced an efficient method for constructing such multi-collisions in iterated hash functions, demonstrating that finding an r-collision (with r = 2^t) requires only about r \cdot 2^{k/2} evaluations, far less than the naive r^2 \cdot 2^k expected for independent searches. This technique iteratively finds collisions at each level of message expansion, allowing the attacker to "peel" the hash computation to reveal multiple paths to the same output. For t = k/2, yielding r = 2^{k/2} colliding messages, the attack runs in O(2^k) time—equivalent to a preimage search—effectively breaking the security of tree-based hash constructions like Merkle trees, where an attacker can select from a massive set of colliding inputs to forge valid proofs across branches.^[29] Hellman-Merkle time-memory tradeoffs adapt the classic birthday attack for resource-constrained environments, such as partial searches where full space enumeration is infeasible. Originally proposed by Martin Hellman in 1980 for key recovery, the method precomputes chains of function iterations from starting points, storing endpoints in tables to trade memory M for online time T, satisfying T M^{1/3} \approx 2^k for a k-bit search space. In collision contexts, this enables finding hash collisions by treating hash outputs as chain endpoints and using multiple reduction functions to cover the space probabilistically, reducing the effective birthday bound when memory is limited (e.g., achieving near-$2^{k/2} collisions with T = M = 2^{2k/3}). Ralph Merkle refined this in 1980 with distinguished points to optimize chain storage, making it practical for partial birthday attacks in low-memory settings like embedded devices.^[30] These attacks have practical applications beyond pure cryptanalysis, such as in denial-of-service (DoS) via hash flooding, where attackers generate colliding inputs to degrade hash table performance in network devices. In the 2010s, vulnerabilities in routers and servers exploited weak hash functions (e.g., in Apache and Nginx implementations), causing quadratic resizing and memory exhaustion with minimal traffic, as demonstrated in algorithmic complexity attacks that force worst-case behavior through collisions. Similar issues affected routing protocols, amplifying DoS impact in BGP sessions. In digital forensics, multi-collisions aid file carving by identifying multiple candidate fragments matching the same hash signature, though this requires careful validation to avoid false positives. Modern extensions consider quantum threats, where Grover's algorithm, combined with the Brassard-Høyer-Tapp (BHT) variant, reduces the classical birthday bound from O(2^{k/2}) to O(2^{k/3}) for collision searches. The BHT method uses quantum superposition to simulate multiple parallel birthday trials, querying the hash oracle $2^{k/3} times to find a collision with high probability, halving the effective security of symmetric hashes like SHA-256 against quantum adversaries. As of 2025, with advancing quantum hardware, this prompts recommendations for larger hash outputs (e.g., 384+ bits) in post-quantum designs to restore $2^{k/2} classical-equivalent security.

Countermeasures

Hash Length Recommendations

To achieve collision resistance against birthday attacks, the output length k of a hash function must be selected such that the expected number of operations required, approximately $2^{k/2}, exceeds the computational resources available to potential attackers. NIST recommends aligning the hash function's security strength with the overall system requirements, where the security strength for collision resistance is typically k/2 bits.^[31] For a desired security level of s bits, a hash output length of k = 2s bits is thus advised to ensure the birthday bound provides at least s-bit resistance.^[31] NIST guidelines specify that for 128-bit security—suitable for most modern cryptographic applications such as digital signatures and key derivation—a minimum hash output length of 256 bits is required, corresponding to the birthday bound of $2^{128} operations. Examples include SHA-256 and SHA3-256, both providing 128-bit collision resistance under classical computing models. SHA-256 remains secure against practical birthday attacks as of 2025, as the $2^{128} computational cost far exceeds the capabilities of current classical supercomputers, which perform on the order of $10^{18} operations per second.^[31]^[32]^[33] Longer hash outputs enhance resistance but introduce tradeoffs, including increased computational overhead for hashing and storage. For instance, SHA-512 or SHA3-512 offers 256-bit security but requires roughly twice the processing time of 256-bit variants on typical hardware. To balance security and performance while future-proofing against potential advances in collision-finding techniques, NIST-standardized SHA-3 (based on the Keccak sponge construction) is recommended over SHA-2 family members for new designs, as it provides equivalent security levels with a different internal structure less vulnerable to certain differential attacks.^[34] Additionally, BLAKE3—a parallelizable hash function derived from BLAKE2—offers 256-bit output with superior speed on multi-core systems (up to 10-20 times faster than SHA-256 for large inputs) while maintaining provable collision resistance under the same birthday bounds, making it suitable for high-throughput applications.^[35] Historical shifts in hash adoption underscore the importance of these recommendations. Following the practical collision attack on MD5 demonstrated in 2005, which reduced its effective security below 64 bits using differential cryptanalysis, widespread migration occurred to SHA-256 for 128-bit security. Similarly, after Google and CWI researchers produced the first practical SHA-1 collision in 2017—confirming its 80-bit security was inadequate—NIST deprecated SHA-1 for all uses, accelerating adoption of longer hashes like SHA-256 and SHA-3.^[36]

Alternative Security Measures

One approach to mitigating birthday attacks involves employing authenticated encryption with associated data (AEAD) schemes or hash-then-MAC constructions, which integrate message authentication directly into the encryption process without relying solely on unauthenticated hashes for verification. These methods prevent collision-based forgery by ensuring that any alteration in the input, including those induced by hash collisions, invalidates the authentication tag, thereby protecting against unauthorized modifications in protocols like secure messaging. For instance, AEAD modes such as those analyzed in generic constructions provide security against forgery even when underlying primitives approach their birthday bounds, offering robustness in scenarios where direct hash signing is avoided.^[37]^[38] Randomized hashing techniques further enhance resistance by incorporating salts or nonces into the hashing process for each unique computation, effectively expanding the input domain and forcing attackers to generate distinct collision sets for every instance rather than reusing precomputed ones. In digital signature schemes, this is exemplified by the Probabilistic Signature Scheme (PSS), where a random salt is appended to the message before hashing, randomizing the input and thwarting existential forgery attacks that exploit hash collisions across multiple messages. This randomization ensures that even if a collision exists in the base hash function, the probabilistic padding makes it computationally infeasible to craft colliding padded messages that align with valid signatures.^[39]^[40] To achieve provable security against collision-based attacks, cryptographers often model hash functions as random oracles, an idealized framework where the hash behaves as a truly random function, providing collision resistance up to the birthday bound with high probability. Under this model, schemes like hash-and-sign signatures can be proven secure assuming the underlying trapdoor permutation (e.g., RSA) is hard, as the random oracle prevents efficient collision finding beyond the square root of the output size. This approach underpins widely adopted standards, ensuring that protocols remain secure as long as the hash instantiation approximates the ideal oracle behavior. Detection mechanisms, such as hash tree verification using Merkle trees, enable efficient integrity checks in high-volume systems by allowing parties to verify subsets of data without recomputing all hashes, potentially identifying anomalous collisions if they propagate to inconsistent tree roots. In such structures, each leaf holds a data hash, and internal nodes aggregate pairwise hashes, providing a compact proof of inclusion or alteration; if a birthday-induced collision affects a leaf, the verifier can traverse the path to detect discrepancies during audits. Anomaly detection complements this by monitoring hash distributions in large-scale deployments, flagging unusual collision rates that may indicate an attack. These methods are particularly useful in distributed systems like blockchains, where verifying vast datasets against collisions is essential. Post-quantum cryptographic alternatives, such as lattice-based digital signature schemes like CRYSTALS-Dilithium, offer inherent resistance to birthday bounds by basing security on the hardness of lattice problems rather than solely on hash collision resistance, while still incorporating secure hashes for transforms like Fiat-Shamir. Dilithium's module-lattice construction achieves chosen-message security with parameters tuned to withstand both classical and quantum adversaries, including those exploiting quantum-accelerated birthday searches on auxiliary hashes, ensuring the scheme remains viable even as classical hash lengths alone prove insufficient against advanced threats. This shift to non-hash-centric primitives provides long-term protection in environments transitioning to quantum-resistant cryptography.^[41]

References

[1]
[PDF] birthday attack - Columbia Math Department
The birthday attack is a method to find collisions in a cryptographic hash function, based on the birthday paradox. Having the same birthday is the analogue of ...
[2]
Birthday Attack - an overview | ScienceDirect Topics
A birthday attack is a cryptographic attack where the probability of two or more people sharing a birthday is greater than 50% in a room with 23 or more people.
[3]
https://www.sciencedirect.com/science/article/pii/B9781597495660000035
[4]
[PDF] Hash Function Balance and its Impact on Birthday Attacks - CSE Home
In practice, hash functions need to be designed not only to resist the birthday attack but also to resist cryptanalytic attacks, and for this, random behavior ...
[5]
https://www.sciencedirect.com/science/article/pii/B9780128053911000031
[6]
Birthday problem | Definition, Solution, Equations, & Facts - Britannica
Oct 3, 2025 · Although there is debate over who came up with the original birthday problem, a version was first published by Austrian mathematician Richard ...
[7]
Birthday Problem -- from Wolfram MathWorld
Consider the probability Q_1(n,d) that no two people out of a group of n will have matching birthdays out of d equally possible birthdays.Missing: source | Show results with:source
[8]
[PDF] Hexaflexagons and Other Mathematical Diversions
Hexaflexagons and other mathematical diversions : the first Scientific American book of puzzles & games : with a new afterword 1 by Martin Gardner. -University ...
[9]
[PDF] Lecture 5 - Independence 1 Independent Events - csail
3 The Birthday Paradox. Suppose that there are 23 people in a room. ... I throw at random into a collection of 1,000,000 bins before the probability that some bin.
[10]
[PDF] Lecture 3: Balls in Bins
Theorem 15 (Birthday Paradox). If you gather 23 people in a room, then with probability 50% there will be two sharing a birthday. ... balls fall in this bin ...
[11]
collision - Glossary | CSRC
Definitions: An event in which two different messages have the same message digest. Sources: NIST SP 800-106 under Collision. NIST SP 800-107 Rev.
[12]
[PDF] Random Oracles are Practical: A Paradigm for Designing Efficient ...
For simplicity assume it is uniform; see below for how to patch things for standard ones. Let H: {0,1}∗ → {0,1}k denote as usual a random hash function. Let G ...
[13]
[PDF] New Directions in Cryptography - Stanford Electrical Engineering
Diffie and M. E. Hellman, “Multiuser cryptographic techniques,” presented at National Computer Conference, New York, June 7-10,. 1976. [6] D. Knuth, The Art of ...
[14]
HOW TO SWINDLE RABIN
HOW TO SWINDLE RABIN. Gideon Yuval. CRYPTOLOGIA. Abstract: a computational short-cut is shown, which can compromise the secur- ity of Rabin's digital signature ...
[15]
[PDF] Notes #12: Collision Finding 12.1 Birthday Attack
The birthday attack finds collisions by computing H(k, x) for different x values until a collision is found, exploiting the birthday paradox.
[16]
[PDF] How to Break MD5 and Other Hash Functions - Merlot
In this paper we present a new powerful attack on MD5 which allows us to find collisions efficiently. We used this attack to find collisions of MD5 in about 15 ...
[17]
None
### Summary of Birthday Attacks and Probability of Collision from Chapter 2 of Handbook of Applied Cryptography
[18]
[PDF] SHA-1 is a Shambles∗ - Cryptology ePrint Archive
Our actual attack required two months of computations using 900 Nvidia GTX 1060 GPUs (we paid US$ 75k because GPU prices were higher, and we wasted some time ...
[19]
[PDF] Collisions for Hash Functions MD4, MD5, HAVAL-128 and RIPEMD
Aug 17, 2004 · MD5 is the hash function designed by Ron Rivest [9] as a strengthened version of MD4 [8]. In 1993 Bert den. Boer and Antoon Bosselaers [1] ...
[20]
[PDF] Chosen-prefix Collisions for MD5 and Colliding X.509 Certificates ...
We describe how chosen-prefix collisions for MD5 can be constructed, and show that our method is practical by constructing two MD5 based X.509 certificates with ...
[21]
Flame malware collision attack explained - Microsoft
Jun 6, 2012 · On systems that pre-date Windows Vista, an attack is possible without an MD5 hash collision. This certificate and all certificates from the ...
[22]
[PDF] Reverse-engineering of the cryptanalytic attack used in the Flame ...
Sep 7, 2015 · Stevens recently proposed to counter these threats using counter-cryptanalysis. [Ste13], specifically a collision detection algorithm, i.e., an ...
[23]
[PDF] The first collision for full SHA-1 - SHAttered.io
Feb 23, 2017 · As it turned out, it used a forged signature to infect Windows machines via a man-in-the-middle attack on. Windows Update. Using a new technique ...
[24]
[PDF] First Chosen-Prefix Collision on SHA-1 and Application to the PGP ...
Aug 12, 2020 · When renting cheap GPUs, this translates to a cost of US$ 11k for a collision, and US$ 45k for a chosen-prefix collision, within the means of ...<|control11|><|separator|>
[25]
Of History & Hashes: A Brief History of Password… - TrustedSec
May 30, 2015 · This is the classic “Birthday Attack”. ... The two cipher texts are then concatenated to produce the LM hash and store the hash in the SAM file.
[26]
Birthday Attacks, Collisions, And Password Strength - Auth0
Mar 23, 2021 · The answer to this question is better illustrated by a famous problem in probability theory known as the Birthday Problem.
[27]
[PDF] Cryptanalytic Attacks on Pseudorandom Number Generators
A permanent compromise attack occurs if, once an attacker compromises S at time t, all future and past. S values are vulnerable to attack. (c) Iterative ...<|separator|>
[28]
[PDF] Merkle Trees in Blockchain: A Study of Collision Probability ... - arXiv
Therefore, the "birthday paradox" strategy leads to a halving of the "effective" hash code length. In the context of our problem, this means that a ...
[29]
A post-quantum lattice-based lightweight anonymous authentication ...
Apr 7, 2025 · However, this scheme is not secure against the quantum attack. An ... Utilizing the birthday paradox [16], we get the following result:.
[30]
Multicollisions in Iterated Hash Functions. Application to Cascaded ...
Advances in Cryptology – CRYPTO 2004 (CRYPTO 2004). Multicollisions in ... We also discuss the potential impact of our attack on several published schemes.
[31]
[PDF] A Cryptanalytic Time - Memory Trade-Off
They think of a cipher- text-only attack in which the cryptanalyst possesses only ciphertext and some statistical knowledge of the plaintext. Aside from the ...
[32]
[PDF] Recommendation for Key Management: Part 1 - General
May 5, 2020 · NIST SP 800-57 PART 1 REV. 5. RECOMMENDATION FOR KEY MANAGEMENT: PART ... The security strength of a hash function is determined by the properties ...
[33]
https://csrc.nist.gov/projects/hash-functions
[34]
Hash Functions | CSRC - NIST Computer Security Resource Center
Jan 4, 2017 · Collision resistance: It is computationally infeasible to find two different inputs to the hash function that have the same hash value.NIST Policy · News & Updates · Events · SHA-3 Standardization
[35]
https://raw.githubusercontent.com/BLAKE3-team/BLAKE3-specs/master/blake3.pdf
[36]
[PDF] BLAKE3 - GitHub
We present BLAKE3, an evolution of the BLAKE2 cryptographic hash that is both faster and also more consistently fast across different platforms and input ...<|control11|><|separator|>
[37]
Improved Collision Attack on Hash Function MD5
In this paper, we present a fast attack algorithm to find two-block collision of hash function MD5. The algorithm is based on the two-block collision ...
[38]
[PDF] A Simple and Generic Construction of Authenticated Encryption With ...
We describe a simple and generic construction of an AEAD protocol from an AE protocol and a collision resistant hash function. This is done in two simple steps.
[39]
[PDF] GLEVIAN and VIGORNIAN: Robust beyond-birthday AEAD modes
Sep 20, 2023 · GLEVIAN and VIGORNIAN are AEAD modes with beyond-birthday security, security against nonce misuse, and against the release of unverified ...
[40]
6.8. Using a Nonce to Protect Against Birthday Attacks - O'Reilly
Use a nonce or salt before and after your message (preferably a securely generated random salt), padding the nonce to the internal block size of the hash ...
[41]
[PDF] Exact Security Analysis of Hash-then-Mask Type Probabilistic MAC ...
Thus, an appropriate usage of salt can enhance the security to go beyond birthday barrier without requiring larger domain PRF. Different Types of Hash-then-MAC ...Missing: prevent | Show results with:prevent
[42]
Dilithium - CRYSTALS
Feb 16, 2021 · Dilithium is a digital signature scheme that is strongly secure under chosen message attacks based on the hardness of lattice problems over module lattices.Missing: birthday | Show results with:birthday