Fact-checked by Grok 2 weeks ago

SipHash

SipHash is a family of lightweight pseudorandom functions (PRFs) developed by Jean-Philippe Aumasson and in 2012, optimized for short inputs and 128-bit keys to enable fast, secure hashing primarily for hash-table lookups and network traffic authentication. It employs an add-rotate-XOR (ARX) construction with a simple state of four 64-bit words, processing data in 64-bit blocks through configurable rounds of a core "SipRound" operation consisting of additions, XORs, and rotations. The design emphasizes key agility, simplicity, and resistance to side-channel attacks, with no reliance on external cryptographic primitives. Variants such as SipHash-2-4 (two compression rounds and four finalization rounds) and SipHash-4-8 balance security and performance, producing a 64-bit output suitable for message authentication codes (MACs) but not for collision-resistant hashing. Security proofs establish it as a strong PRF, secure against key recovery (requiring approximately 2^{127} evaluations), state recovery, and differential attacks up to the 128-bit key size, though its 64-bit output limits it against exhaustive forgery guesses of 2^{64}. Performance benchmarks highlight its efficiency: on an AMD FX-8150 processor, SipHash-2-4 processes a 16-byte input in 140 cycles, outperforming HMAC-SHA-1 for short messages and competing with non-cryptographic hashes like CityHash while providing cryptographic guarantees. It achieves about 1.96 cycles per byte for longer inputs on similar hardware, making it ideal for high-throughput scenarios. SipHash addresses vulnerabilities like hash-flooding denial-of-service (DoS) attacks by using a secret key to produce unpredictable hash values, preventing attackers from forcing collisions in hash tables. It has been widely adopted in production software: since Python 3.4, it serves as the default hash algorithm for built-in types like dictionaries to mitigate collision attacks, with randomization enabled by default. The Linux kernel integrated SipHash and a half-sized variant (HalfSipHash) starting in version 4.10 (2017) for secure sequence numbers and hash tables, replacing slower options like MD5. OpenBSD incorporated it from version 5.7 (2015) for hash table protection and other keyed hashing needs.

Background and Development

Origins

SipHash was developed by cryptographers Jean-Philippe Aumasson and as a family of pseudorandom functions designed for efficiency with short inputs. The algorithm was first introduced in their 2012 research paper titled "SipHash: a fast short-input PRF," published on the IACR ePrint archive in June 2012 and later presented at the INDOCRYPT conference in December of that year. The creation of SipHash was prompted by vulnerabilities in non-cryptographic hash functions exposed during demonstrations of hash-flooding denial-of-service (DoS) attacks in 2011 and 2012. These attacks targeted hash tables commonly used in web applications and programming languages, where adversaries could craft inputs to generate collisions, forcing all elements into the same bucket and degrading average-case O(1) lookup performance to worst-case O(n²) complexity. A seminal demonstration occurred at the 28th in December 2011, where researchers Alexander Klink and Julian Wälde detailed practical exploits against platforms like , , and , showing how small inputs (e.g., a few megabytes) could consume excessive . Follow-up analyses in 2012, including at the 29th , further highlighted the widespread impact on systems relying on predictable hash functions. The primary motivation for SipHash was to provide a keyed, cryptographically secure alternative to vulnerable hash functions, particularly for protecting hash-table lookups against such flooding attacks while maintaining high performance for short inputs typical of keys like strings or integers in network packets and data structures. By incorporating a secret key, SipHash ensures that outputs are unpredictable without key knowledge, thwarting collision-based DoS without the overhead of slower cryptographic primitives. This addressed a critical gap identified in earlier work, such as the 2003 analysis of algorithmic complexity attacks by Crosby and Wallach, but tailored specifically to the real-world threats emerging in web infrastructure.

Design Goals

SipHash was designed primarily to serve as a fast, secure keyed pseudorandom function (PRF) optimized for short inputs, typically ranging from 8 to 64 bytes, making it particularly suitable for protecting hash tables against collision-based denial-of-service (DoS) attacks. This focus addressed vulnerabilities exposed in 2011, where attackers exploited weak hash functions in software to degrade performance through crafted inputs. By providing a cryptographic PRF, SipHash ensures that hash values behave indistinguishably from random outputs, thereby resisting deliberate collisions while maintaining efficiency in resource-constrained environments. Performance targets emphasized high speed on modern CPUs through the use of simple ARX (addition, rotation, XOR) operations, which allow for rapid computation without relying on complex instructions or large code footprints. The design aims for minimal memory usage, with a small state size and straightforward iteration, enabling it to process short inputs significantly faster than prior cryptographic MACs like while remaining competitive with non-cryptographic hashes. For instance, on contemporary , SipHash achieves processing times around 140 cycles for a 16-byte input, establishing its viability for real-time applications. Security objectives center on delivering a 128-bit level against brute-force and collision attacks, leveraging a 128-bit key to produce 64-bit outputs that meet the highest standards for PRF distinguishability. Unlike more elaborate alternatives such as , SipHash prioritizes simplicity to reduce implementation errors and auditing complexity, while still providing robust protection for its intended scope. This balance makes it ideal for target use cases like network traffic and maintaining integrity in software interpreters and operating system kernels.

Technical Description

Algorithm Structure

SipHash is a keyed pseudorandom (PRF) that accepts a 128-bit secret key and a variable-length byte-string input , the in 64-bit (8-byte) blocks to generate a 64-bit output. The algorithm operates on a 256-bit internal state composed of four 64-bit words (v0, v1, v2, v3), employing addition, bitwise XOR, and word rotations (ARX operations) as its core primitives. The algorithm proceeds in three primary phases: state initialization, message absorption, and finalization. Initialization begins by splitting the 128-bit key into two 64-bit little-endian halves k0 and k1, then setting the as follows:
v0 ← k0 ⊕ 0x736f6d6570736575,
v1 ← k1 ⊕ 0x646f72616e646f6d,
v2 ← k0 ⊕ 0x6c7967656e657261,
v3 ← k1 ⊕ 0x7465646279746573. These constants derive from ASCII representations of strings ("somepseu", "dorandom", "lygenera", "tedbytes") to aid readability and debugging.
In the absorption phase, the input message of b bytes is parsed into w = ⌈(b + 1)/8⌉ little-endian 64-bit words m_i (i = 0 to w-1). For each word m_i, the state is updated by XORing m_i into v3, applying c iterations of the compression , and then XORing m_i into v0. The final block m_{w-1} incorporates any remaining message bytes (0 to 7) in the low-order byte positions, zero-padding to 7 bytes, and places the message length b mod 256 in the most significant byte; for instance, a 1-byte input 0xab yields m_0 = 0x01000000000000ab. This padding rule appends the length byte to distinguish messages of lengths congruent modulo . The SipRound compression function applies two half-rounds sequentially to mix the state. The first half-round performs:
v0 ← v0 + v1
v2 ← v2 + v3
v1 ← ROTL(v1, 13)
v3 ← ROTL(v3, 16)
v1 ← v1 ⊕ v0
v3 ← v3 ⊕ v2
v0 ← ROTL(v0, 32)
The second half-round continues with:
v2 ← v2 + v1
v0 ← v0 + v3
v1 ← ROTL(v1, 17)
v3 ← ROTL(v3, 21)
v1 ← v1 ⊕ v2
v3 ← v3 ⊕ v0
v2 ← ROTL(v2, 32)
All operations use 64-bit (additions wrap around at 2^{64}). Finalization XORs 0xff into v2, applies d iterations of SipRound, and outputs the 64-bit value v0 ⊕ v1 ⊕ v2 ⊕ v3.

Parameters and Variants

SipHash is parameterized by two integers, c and d, denoting the number of compression rounds and finalization rounds, respectively, in the notation SipHash-c-d). These parameters allow tuning the balance between security and performance, with higher values providing greater diffusion at the cost of computation time. The standard variant is SipHash-2-4, which employs c=2 compression rounds and d=4 finalization rounds, offering a balanced profile suitable for most applications such as hash-table lookups and network . A more conservative option, SipHash-4-8, doubles the rounds to c=4 and d=8, enhancing resistance to cryptanalytic attacks while approximately halving the speed. The shorthand SipHash24 often refers to the default 2-4 configuration. The uses a 128-bit secret , split into two 64-bit halves k_0 and k_1, without further expansion. These are XORed with fixed 64-bit initialization constants to form the initial state words v_0 to v_3: \begin{align*} v_0 &= k_0 \oplus 0x736f6d6570736575, \\ v_1 &= k_1 \oplus 0x646f72616e646f6d, \\ v_2 &= k_0 \oplus 0x6c7967656e657261, \\ v_3 &= k_1 \oplus 0x7465646279746573. \end{align*} These constants derive from the ASCII representations of strings "somepseu", "dorandom", "lygenera", and "tedbytes", ensuring a structured starting point for the ARX operations. Increasing c and d improves avalanche and diffusion properties but reduces throughput; for instance, the SipHash-2-4 variant processes an 8-byte input in approximately 124–135 cycles on contemporary 64-bit CPUs like the FX-8150 or Quad Q6600. This trade-off makes SipHash-2-4 preferable for speed-critical environments, while SipHash-4-8 is recommended where marginal security gains justify the overhead. SipHash primarily outputs 64 bits, aligning with its design for short-input pseudorandom functions. For scenarios requiring smaller outputs, a 32-bit adaptable variant known as HalfSipHash employs 32-bit words, a -bit key, and similar round parameters (e.g., HalfSipHash-2-4), producing 32-bit tags while maintaining core security principles.

Security Analysis

Cryptographic Properties

SipHash is constructed as a family of pseudorandom functions (PRFs), designed with security bounds supporting its use as a keyed function up to the birthday bound of its -bit output. The 128-bit key length ensures resistance to brute-force key recovery attacks with complexity 2^{128}, while the -bit output limits practical distinguishers to up to approximately 2^{64} queries before potential output collisions undermine further security analysis. This design positions SipHash as a strong PRF for short inputs, with no practical distinguishers identified despite extensive testing. In the context of hash table applications, SipHash provides effective by producing outputs indistinguishable from those of a for inputs under 256 bits, without key knowledge. This prevents denial-of-service attacks that exploit predictable collisions, as an adversary would need roughly 2^{32} chosen inputs to achieve a birthday collision in the 64-bit output space. The keyed nature further elevates security beyond schemes, ensuring resistant to targeted manipulation. The algorithm demonstrates robust and properties through its add-rotate-XOR (ARX) operations across multiple rounds. A single-bit change in the input or key leads to approximately 50% of the output bits flipping, indicative of ideal behavior, with full mixing achieved after just a few SipRounds due to the 32-bit rotations that propagate changes rapidly across the . As a , SipHash provides forgery resistance, requiring approximately 2^{64} work to produce a valid for a new message after observing others, limited by its 64-bit output. As of 2025, no practical breaks compromising these properties have been discovered for the standard SipHash-2-4 variant. Theoretical security relies on ARX provable bounds, including resistance to rotational and attacks from rapid algebraic degree growth, and cryptanalysis results showing low probabilities for high-weight s—the best truncated over 4 rounds has probability 2^{-134}, far below practical thresholds.

Known Limitations and Attacks

SipHash, while effective for its intended short-input applications, has inherent limitations stemming from its design parameters. Its 64-bit output size makes it susceptible to birthday attacks in scenarios involving very large hash tables exceeding approximately 2^{32} elements, where collisions become probable due to the . Additionally, SipHash is inefficient for long inputs beyond 64 bytes, as its performance degrades to around 1.96 cycles per byte on certain hardware, making it less suitable compared to alternatives optimized for extended messages. Theoretical attacks on SipHash remain impractical for the full-round variants used in practice. Key recovery requires brute-forcing the 128-bit key, demanding approximately 2^{127} evaluations, while state recovery has a complexity of approximately 2^{191} evaluations and internal collisions require around 2^{128} queries. Differential , as explored in subsequent analyses, identifies characteristics with probabilities as low as 2^{-236.3} for the standard SipHash-2-4 configuration, far exceeding the security margin needed for real-world threats and posing no practical risk. Related ARX-based functions like have undergone bias analysis, but no such exploitable biases have been found in full-round SipHash. More recent analyses, including cryptanalysis of reduced-round variants in 2019 and forgery attacks on weakened configurations in 2025, continue to affirm the of full-round SipHash-2-4 with no practical exploits identified. As of 2025, SipHash remains unbroken against practical cryptanalytic attacks, with no reported vulnerabilities compromising its core properties in deployed systems. However, minor side-channel concerns exist in software implementations, particularly timing variations on variable-length inputs if not implemented in constant time; these are readily mitigated through careful coding practices that avoid data-dependent branches. For applications requiring heightened security, higher-round variants such as SipHash-4-8 are recommended to provide additional margins against theoretical attacks, though the standard SipHash-2-4 suffices for most uses. SipHash is not intended as a general-purpose (MAC) for arbitrary-length messages, where its fixed structure limits efficiency. In comparison to AES-based MACs, SipHash is weaker for long inputs due to its short-input optimization but offers superior speed for brief messages, achieving up to 10 times faster performance in lookups.

Implementations and Adoption

Reference Implementation

The reference implementation of SipHash is a compact C codebase released in 2012 by its designers, Jean-Philippe Aumasson and , consisting of approximately 140 lines in the core siphash.c file, with additional files for testing and variants like HalfSipHash. This implementation is optimized for clarity, debugging, and performance on 64-bit architectures, leveraging efficient bit operations on uint64_t types to achieve low cycle counts per byte, such as around 2 cycles per byte on contemporary hardware at the time of release. The code is hosted on the veorq/SipHash repository and serves as the canonical portable version for verifying compliance in other implementations. Portability is a core design principle, achieved through exclusive use of add-rotate-XOR (ARX) operations—additions, rotations, and XORs on 64-bit integers—without reliance on external cryptographic libraries or platform-specific primitives beyond standard C headers like <stdint.h>. This allows straightforward compilation and execution on any system supporting 64-bit integers, including both 32-bit and 64-bit CPUs from vendors like and , with minimal adaptations needed for handling via macros. The absence of dependencies facilitates integration into diverse environments, from embedded systems to high-performance servers. Key features include built-in support for compile-time configuration of round counts (e.g., via -DcROUNDS=2 -DdROUNDS=4 for the SipHash-2-4 ) and a comprehensive in test.c with reference vectors per to validate outputs. For instance, using the (bytes 0 through 15) and empty input, the SipHash-2-4 function produces the output 0x726fdb47dd0e0e31, which implementations must match for correctness. These vectors cover inputs from 0 to 63 bytes, ensuring thorough of the compression and finalization phases. The implementation is released under permissive licenses including CC0 1.0 ( dedication) and , promoting unrestricted adoption and modification without proprietary constraints. Since its initial release, the core and reference code have remained unchanged, with repository updates limited to minor bug fixes, notices extending to 2023, and clarifications in , preserving for all dependent projects.

Use in Software and Systems

SipHash has seen widespread adoption in programming languages for securing hash-based data structures against denial-of-service (DoS) attacks through hash flooding. In , it became the default hash algorithm for dictionaries and sets starting with version 3.4, as specified in PEP 456, to replace the vulnerable FNV-based hashing and provide cryptographic resistance to collision attacks while maintaining performance for short inputs like strings and keys. Similarly, Ruby's CRuby interpreter integrates SipHash for hash table operations, 5 uses it as the default for internal hashing to mitigate DoS risks, and Rust's employs SipHash-1-3 in its HashMap and HashSet implementations for secure, randomized hashing without performance degradation on typical workloads. In operating systems, SipHash is employed in kernel-level hash tables to enhance for network processing. The introduced SipHash as a cryptographically secure pseudorandom function (PRF) in version 4.11, replacing weaker algorithms like jhash in components such as inet_hashtables for and connection tracking, thereby preventing remote attackers from exploiting collisions to degrade . followed suit by incorporating SipHash into its subsystem, with implementations visible in versions like 13-STABLE for similar network hash protections. Beyond languages and kernels, SipHash finds use in cryptographic libraries and other systems requiring lightweight keyed hashing. version 3.0 and later supports SipHash through its EVP_MAC API for computing message authentication codes (MACs), allowing configurable rounds (default 2-4) and integration into broader TLS and workflows. In cryptocurrency software, SipHash serves as a PRF in various implementations for generating nonces or keys, leveraging its efficiency for short inputs in resource-constrained environments. Performance-wise, SipHash's design emphasizes speed on short inputs, with the 2-4 variant offering approximately 3-5 times higher throughput than the more secure 4-8 variant for messages under 64 bytes, making it suitable for real-time hashing without introducing bottlenecks. This efficiency, combined with its resistance via per-session keys, renders hash flooding attacks negligible in production systems, often reducing amplification factors from to constant time. Adoption accelerated rapidly after SipHash's publication due to its simplicity and drop-in compatibility with existing hash tables, evolving from early integrations in and to becoming the in modern implementations by 2025.

References

  1. [1]
    None
    ### Detailed Summary of SipHash Paper
  2. [2]
    PEP 456 – Secure and interchangeable hash algorithm
    PEP 456 proposes SipHash as the default, secure, and interchangeable hash algorithm for Python, fixing hash randomization and making it easily replaceable.
  3. [3]
    SipHash in the kernel - LWN.net
    Jan 10, 2017 · SipHash is the creation of Jean-Philippe Aumasson and (inevitably) Daniel J. Bernstein; readers interested in the details can find them in this ...
  4. [4]
    SipHash: a fast short-input PRF - Cryptology ePrint Archive
    Jun 22, 2012 · SipHash is a family of pseudorandom functions optimized for short inputs. Target applications include network traffic authentication and hash-table lookups.Missing: original | Show results with:original
  5. [5]
    SipHash: A Fast Short-Input PRF - SpringerLink
    Aumasson, JP., Bernstein, D.J. (2012). SipHash: A Fast Short-Input PRF. In: Galbraith, S., Nandi, M. (eds) Progress in Cryptology - INDOCRYPT 2012 ...
  6. [6]
  7. [7]
    [PDF] Hash-flooding DoS reloaded: attacks and defenses
    Jun 28, 2025 · Peer-reviewed research paper (A., Bernstein). published at DIAC 2012, INDOCRYPT 2012. Page 147. SipHash initialization. 256-bit state v0 v1 v2 ...
  8. [8]
    [PDF] SipHash: a fast short-input PRF
    Sep 18, 2012 · This paper introduces the SipHash family of hash functions to address the ... Jean-Philippe Aumasson and Daniel J. Bernstein function and ` is a ...
  9. [9]
    veorq/SipHash: High-speed secure pseudorandom function ... - GitHub
    Security is limited by the key size (128 bits for SipHash), such that attackers searching 2s keys have chance 2s−128 of finding the SipHash key. Security is ...<|control11|><|separator|>
  10. [10]
    [PDF] Differential Cryptanalysis of SipHash - Cryptology ePrint Archive
    In this paper, we provide the first published third-party cryptanalysis of SipHash regarding differential cryptanalysis. We use existing auto- matic tools to ...Missing: properties original
  11. [11]
    [PDF] SipHash: a fast short-input PRF D. J. Bernstein, University of Illinois ...
    Dec 14, 2012 · Positive SipHash reception: many third-party implementations; now used for hash tables in Ruby,. Redis, Rust, OpenDNS, Perl 5.
  12. [12]
    siphasher - Rust - Docs.rs
    This crates implements SipHash-2-4 and SipHash-1-3 in Rust. It is based on the original implementation from rust-core and exposes the same API.
  13. [13]
    SipHash - a short input PRF - The Linux Kernel documentation
    There are two variants of the function, one that takes a list of integers, and one that takes a buffer: u64 siphash(const void *data, size_t len, const ...
  14. [14]
  15. [15]
    EVP_MAC-Siphash - OpenSSL Documentation
    DESCRIPTION¶. Support for computing Siphash MACs through the EVP_MAC API. Identity¶. This implementation is identified with this name and properties, to be used ...Missing: PostgreSQL Bitcoin
  16. [16]
    bitcoin-siphash 0.1.18 - Docs.rs
    SipHash was introduced in 2012 by Jean-Philippe Aumasson and Daniel J. Bernstein as a fast and secure hash function for use in various applications. What is the ...Missing: timeline | Show results with:timeline