SipHash

SipHash is a family of lightweight pseudorandom functions (PRFs) developed by Jean-Philippe Aumasson and Daniel J. Bernstein in 2012, optimized for short inputs and 128-bit keys to enable fast, secure hashing primarily for hash-table lookups and network traffic authentication.^[1] It employs an add-rotate-XOR (ARX) construction with a simple state of four 64-bit words, processing data in 64-bit blocks through configurable rounds of a core "SipRound" operation consisting of additions, XORs, and rotations.^[1] The design emphasizes key agility, simplicity, and resistance to side-channel attacks, with no reliance on external cryptographic primitives.^[1] Variants such as SipHash-2-4 (two compression rounds and four finalization rounds) and SipHash-4-8 balance security and performance, producing a 64-bit output suitable for message authentication codes (MACs) but not for collision-resistant hashing.^[1] Security proofs establish it as a strong PRF, secure against key recovery (requiring approximately 2^{127} evaluations), state recovery, and differential attacks up to the 128-bit key size, though its 64-bit output limits it against exhaustive forgery guesses of 2^{64}.^[1] Performance benchmarks highlight its efficiency: on an AMD FX-8150 processor, SipHash-2-4 processes a 16-byte input in 140 cycles, outperforming HMAC-SHA-1 for short messages and competing with non-cryptographic hashes like CityHash while providing cryptographic guarantees.^[1] It achieves about 1.96 cycles per byte for longer inputs on similar hardware, making it ideal for high-throughput scenarios.^[1] SipHash addresses vulnerabilities like hash-flooding denial-of-service (DoS) attacks by using a secret key to produce unpredictable hash values, preventing attackers from forcing collisions in hash tables.^[1] It has been widely adopted in production software: since Python 3.4, it serves as the default hash algorithm for built-in types like dictionaries to mitigate collision attacks, with randomization enabled by default.^[2] The Linux kernel integrated SipHash and a half-sized variant (HalfSipHash) starting in version 4.10 (2017) for secure sequence numbers and hash tables, replacing slower options like MD5.^[3] OpenBSD incorporated it from version 5.7 (2015) for hash table protection and other keyed hashing needs.

Background and Development

Origins

SipHash was developed by cryptographers Jean-Philippe Aumasson and Daniel J. Bernstein as a family of pseudorandom functions designed for efficiency with short inputs.^[4] The algorithm was first introduced in their 2012 research paper titled "SipHash: a fast short-input PRF," published on the IACR ePrint archive in June 2012 and later presented at the INDOCRYPT conference in December of that year.^[4]^[5] The creation of SipHash was prompted by vulnerabilities in non-cryptographic hash functions exposed during demonstrations of hash-flooding denial-of-service (DoS) attacks in 2011 and 2012. These attacks targeted hash tables commonly used in web applications and programming languages, where adversaries could craft inputs to generate collisions, forcing all elements into the same bucket and degrading average-case O(1) lookup performance to worst-case O(n²) complexity.^[4] A seminal demonstration occurred at the 28th Chaos Communication Congress in December 2011, where researchers Alexander Klink and Julian Wälde detailed practical exploits against platforms like PHP, ASP.NET, and Java, showing how small inputs (e.g., a few megabytes) could consume excessive CPU time.^[6] Follow-up analyses in 2012, including at the 29th Chaos Communication Congress, further highlighted the widespread impact on systems relying on predictable hash functions.^[7] The primary motivation for SipHash was to provide a keyed, cryptographically secure alternative to vulnerable hash functions, particularly for protecting hash-table lookups against such flooding attacks while maintaining high performance for short inputs typical of keys like strings or integers in network packets and data structures.^[4] By incorporating a secret key, SipHash ensures that outputs are unpredictable without key knowledge, thwarting collision-based DoS without the overhead of slower cryptographic primitives.^[4] This addressed a critical gap identified in earlier work, such as the 2003 analysis of algorithmic complexity attacks by Crosby and Wallach, but tailored specifically to the real-world threats emerging in web infrastructure.^[4]

Design Goals

SipHash was designed primarily to serve as a fast, secure keyed pseudorandom function (PRF) optimized for short inputs, typically ranging from 8 to 64 bytes, making it particularly suitable for protecting hash tables against collision-based denial-of-service (DoS) attacks. This focus addressed vulnerabilities exposed in 2011, where attackers exploited weak hash functions in software to degrade performance through crafted inputs. By providing a cryptographic PRF, SipHash ensures that hash values behave indistinguishably from random outputs, thereby resisting deliberate collisions while maintaining efficiency in resource-constrained environments.^[4] Performance targets emphasized high speed on modern CPUs through the use of simple ARX (addition, rotation, XOR) operations, which allow for rapid computation without relying on complex instructions or large code footprints. The design aims for minimal memory usage, with a small state size and straightforward iteration, enabling it to process short inputs significantly faster than prior cryptographic MACs like HMAC while remaining competitive with non-cryptographic hashes. For instance, on contemporary hardware, SipHash achieves processing times around 140 cycles for a 16-byte input, establishing its viability for real-time applications.^[4] Security objectives center on delivering a 128-bit security level against brute-force and collision attacks, leveraging a 128-bit key to produce 64-bit outputs that meet the highest standards for PRF distinguishability. Unlike more elaborate alternatives such as HMAC, SipHash prioritizes simplicity to reduce implementation errors and auditing complexity, while still providing robust protection for its intended scope. This balance makes it ideal for target use cases like network traffic authentication and maintaining hash table integrity in software interpreters and operating system kernels.^[4]

Technical Description

Algorithm Structure

SipHash is a keyed pseudorandom function (PRF) that accepts a 128-bit secret key and a variable-length byte-string input message, processing the message in 64-bit (8-byte) blocks to generate a 64-bit output.^[8] The algorithm operates on a 256-bit internal state composed of four 64-bit words (v0, v1, v2, v3), employing addition, bitwise XOR, and word rotations (ARX operations) as its core primitives.^[8] The algorithm proceeds in three primary phases: state initialization, message absorption, and finalization. Initialization begins by splitting the 128-bit key into two 64-bit little-endian halves k0 and k1, then setting the state as follows:
v0 ← k0 ⊕ 0x736f6d6570736575,
v1 ← k1 ⊕ 0x646f72616e646f6d,
v2 ← k0 ⊕ 0x6c7967656e657261,
v3 ← k1 ⊕ 0x7465646279746573.^[8] These constants derive from ASCII representations of strings ("somepseu", "dorandom", "lygenera", "tedbytes") to aid readability and debugging.^[8] In the absorption phase, the input message of b bytes is parsed into w = ⌈(b + 1)/8⌉ little-endian 64-bit words m_i (i = 0 to w-1). For each word m_i, the state is updated by XORing m_i into v3, applying c iterations of the SipRound compression function, and then XORing m_i into v0. The final block m_{w-1} incorporates any remaining message bytes (0 to 7) in the low-order byte positions, zero-padding to 7 bytes, and places the message length b mod 256 in the most significant byte; for instance, a 1-byte input 0xab yields m_0 = 0x01000000000000ab.^[8] This padding rule appends the length byte to distinguish messages of lengths congruent modulo 256.^[8] The SipRound compression function applies two half-rounds sequentially to mix the state. The first half-round performs:

v0 ← v0 + v1
v2 ← v2 + v3
v1 ← ROTL(v1, 13)
v3 ← ROTL(v3, 16)
v1 ← v1 ⊕ v0
v3 ← v3 ⊕ v2
v0 ← ROTL(v0, 32)
v0 ← v0 + v1
v2 ← v2 + v3
v1 ← ROTL(v1, 13)
v3 ← ROTL(v3, 16)
v1 ← v1 ⊕ v0
v3 ← v3 ⊕ v2
v0 ← ROTL(v0, 32)

The second half-round continues with:

v2 ← v2 + v1
v0 ← v0 + v3
v1 ← ROTL(v1, 17)
v3 ← ROTL(v3, 21)
v1 ← v1 ⊕ v2
v3 ← v3 ⊕ v0
v2 ← ROTL(v2, 32)
v2 ← v2 + v1
v0 ← v0 + v3
v1 ← ROTL(v1, 17)
v3 ← ROTL(v3, 21)
v1 ← v1 ⊕ v2
v3 ← v3 ⊕ v0
v2 ← ROTL(v2, 32)

All operations use 64-bit modular arithmetic (additions wrap around at 2^{64}).^[8] Finalization XORs 0xff into v2, applies d iterations of SipRound, and outputs the 64-bit value v0 ⊕ v1 ⊕ v2 ⊕ v3.^[8]

Parameters and Variants

SipHash is parameterized by two integers, c and d, denoting the number of compression rounds and finalization rounds, respectively, in the notation SipHash-c-d).^[8] These parameters allow tuning the balance between security and performance, with higher values providing greater diffusion at the cost of computation time.^[8] The standard variant is SipHash-2-4, which employs c=2 compression rounds and d=4 finalization rounds, offering a balanced profile suitable for most applications such as hash-table lookups and network authentication.^[8] A more conservative option, SipHash-4-8, doubles the rounds to c=4 and d=8, enhancing resistance to cryptanalytic attacks while approximately halving the speed.^[8] The shorthand SipHash24 often refers to the default 2-4 configuration.^[9] The key schedule uses a 128-bit secret key, split into two 64-bit halves k_0 and k_1, without further expansion. These are XORed with fixed 64-bit initialization constants to form the initial state words v_0 to v_3:

\begin{align*} v_0 &= k_0 \oplus 0x736f6d6570736575, \\ v_1 &= k_1 \oplus 0x646f72616e646f6d, \\ v_2 &= k_0 \oplus 0x6c7967656e657261, \\ v_3 &= k_1 \oplus 0x7465646279746573. \end{align*}

These constants derive from the ASCII representations of strings "somepseu", "dorandom", "lygenera", and "tedbytes", ensuring a structured starting point for the ARX operations.^[8] Increasing c and d improves avalanche and diffusion properties but reduces throughput; for instance, the SipHash-2-4 variant processes an 8-byte input in approximately 124–135 cycles on contemporary 64-bit CPUs like the AMD FX-8150 or Intel Core 2 Quad Q6600.^[8] This trade-off makes SipHash-2-4 preferable for speed-critical environments, while SipHash-4-8 is recommended where marginal security gains justify the overhead.^[8] SipHash primarily outputs 64 bits, aligning with its design for short-input pseudorandom functions. For scenarios requiring smaller outputs, a 32-bit adaptable variant known as HalfSipHash employs 32-bit words, a 64-bit key, and similar round parameters (e.g., HalfSipHash-2-4), producing 32-bit tags while maintaining core security principles.^[9]

Security Analysis

Cryptographic Properties

SipHash is constructed as a family of pseudorandom functions (PRFs), designed with security bounds supporting its use as a keyed function up to the birthday bound of its 64-bit output. The 128-bit key length ensures resistance to brute-force key recovery attacks with complexity 2^{128}, while the 64-bit output limits practical distinguishers to up to approximately 2^{64} queries before potential output collisions undermine further security analysis. This design positions SipHash as a strong PRF for short inputs, with no practical distinguishers identified despite extensive testing.^[8] In the context of hash table applications, SipHash provides effective collision resistance by producing outputs indistinguishable from those of a random oracle for inputs under 256 bits, without key knowledge. This prevents denial-of-service attacks that exploit predictable collisions, as an adversary would need roughly 2^{32} chosen inputs to achieve a birthday collision in the 64-bit output space. The keyed nature further elevates security beyond universal hashing schemes, ensuring uniform distribution resistant to targeted manipulation.^[8] The algorithm demonstrates robust diffusion and avalanche properties through its add-rotate-XOR (ARX) operations across multiple rounds. A single-bit change in the input or key leads to approximately 50% of the output bits flipping, indicative of ideal avalanche behavior, with full mixing achieved after just a few SipRounds due to the 32-bit rotations that propagate changes rapidly across the state.^[8] As a MAC, SipHash provides forgery resistance, requiring approximately 2^{64} work to produce a valid tag for a new message after observing others, limited by its 64-bit output. As of 2025, no practical breaks compromising these properties have been discovered for the standard SipHash-2-4 variant. Theoretical security relies on ARX provable bounds, including resistance to rotational and cube attacks from rapid algebraic degree growth, and differential cryptanalysis results showing low probabilities for high-weight differentials—the best truncated differential over 4 rounds has probability 2^{-134}, far below practical thresholds.^[8]^[10]

Known Limitations and Attacks

SipHash, while effective for its intended short-input applications, has inherent limitations stemming from its design parameters. Its 64-bit output size makes it susceptible to birthday attacks in scenarios involving very large hash tables exceeding approximately 2^{32} elements, where collisions become probable due to the pigeonhole principle.^[8] Additionally, SipHash is inefficient for long inputs beyond 64 bytes, as its performance degrades to around 1.96 cycles per byte on certain hardware, making it less suitable compared to alternatives optimized for extended messages.^[8] Theoretical attacks on SipHash remain impractical for the full-round variants used in practice. Key recovery requires brute-forcing the 128-bit key, demanding approximately 2^{127} evaluations, while state recovery has a complexity of approximately 2^{191} evaluations and internal collisions require around 2^{128} queries.^[8] Differential cryptanalysis, as explored in subsequent analyses, identifies characteristics with probabilities as low as 2^{-236.3} for the standard SipHash-2-4 configuration, far exceeding the security margin needed for real-world threats and posing no practical risk.^[10] Related ARX-based functions like ChaCha have undergone bias analysis, but no such exploitable biases have been found in full-round SipHash.^[8] More recent analyses, including cryptanalysis of reduced-round variants in 2019 and forgery attacks on weakened configurations in 2025, continue to affirm the security of full-round SipHash-2-4 with no practical exploits identified.^[11]^[12] As of 2025, SipHash remains unbroken against practical cryptanalytic attacks, with no reported vulnerabilities compromising its core security properties in deployed systems.^[10] However, minor side-channel concerns exist in software implementations, particularly timing variations on variable-length inputs if not implemented in constant time; these are readily mitigated through careful coding practices that avoid data-dependent branches.^[8] For applications requiring heightened security, higher-round variants such as SipHash-4-8 are recommended to provide additional margins against theoretical attacks, though the standard SipHash-2-4 suffices for most uses.^[8] SipHash is not intended as a general-purpose message authentication code (MAC) for arbitrary-length messages, where its fixed structure limits efficiency. In comparison to AES-based MACs, SipHash is weaker for long inputs due to its short-input optimization but offers superior speed for brief messages, achieving up to 10 times faster performance in hash table lookups.^[8]

Implementations and Adoption

Reference Implementation

The reference implementation of SipHash is a compact C codebase released in 2012 by its designers, Jean-Philippe Aumasson and Daniel J. Bernstein, consisting of approximately 140 lines in the core siphash.c file, with additional files for testing and variants like HalfSipHash. This implementation is optimized for clarity, debugging, and performance on 64-bit architectures, leveraging efficient bit operations on uint64_t types to achieve low cycle counts per byte, such as around 2 cycles per byte on contemporary hardware at the time of release. The code is hosted on the veorq/SipHash GitHub repository and serves as the canonical portable version for verifying compliance in other implementations.^[9]^[4] Portability is a core design principle, achieved through exclusive use of add-rotate-XOR (ARX) operations—additions, rotations, and XORs on 64-bit integers—without reliance on external cryptographic libraries or platform-specific primitives beyond standard C headers like <stdint.h>. This allows straightforward compilation and execution on any system supporting 64-bit integers, including both 32-bit and 64-bit CPUs from vendors like AMD and Intel, with minimal adaptations needed for endianness handling via macros. The absence of dependencies facilitates integration into diverse environments, from embedded systems to high-performance servers.^[4]^[9] Key features include built-in support for compile-time configuration of round counts (e.g., via -DcROUNDS=2 -DdROUNDS=4 for the standard SipHash-2-4 variant) and a comprehensive test suite in test.c with 64 reference vectors per variant to validate outputs. For instance, using the standard test key (bytes 0 through 15) and empty input, the SipHash-2-4 function produces the output 0x726fdb47dd0e0e31, which implementations must match for correctness. These vectors cover inputs from 0 to 63 bytes, ensuring thorough verification of the compression and finalization phases.^[9] The implementation is released under permissive licenses including CC0 1.0 (public domain dedication) and MIT, promoting unrestricted adoption and modification without proprietary constraints. Since its initial 2012 release, the core algorithm and reference code have remained unchanged, with repository updates limited to minor bug fixes, copyright notices extending to 2023, and clarifications in documentation, preserving backward compatibility for all dependent projects.^[9]

Use in Software and Systems

SipHash has seen widespread adoption in programming languages for securing hash-based data structures against denial-of-service (DoS) attacks through hash flooding. In Python, it became the default hash algorithm for dictionaries and sets starting with version 3.4, as specified in PEP 456, to replace the vulnerable FNV-based hashing and provide cryptographic resistance to collision attacks while maintaining performance for short inputs like strings and keys.^[2] Similarly, Ruby's CRuby interpreter integrates SipHash for hash table operations, Perl 5 uses it as the default for internal hashing to mitigate DoS risks, and Rust's standard library employs SipHash-1-3 in its HashMap and HashSet implementations for secure, randomized hashing without performance degradation on typical workloads.^[13]^[14] In operating systems, SipHash is employed in kernel-level hash tables to enhance security for network processing. The Linux kernel introduced SipHash as a cryptographically secure pseudorandom function (PRF) in version 4.11, replacing weaker algorithms like jhash in components such as inet_hashtables for IPv4 and IPv6 connection tracking, thereby preventing remote attackers from exploiting hash collisions to degrade performance.^[3]^[15] FreeBSD followed suit by incorporating SipHash into its kernel crypto subsystem, with implementations visible in versions like 13-STABLE for similar network hash protections.^[16] Beyond languages and kernels, SipHash finds use in cryptographic libraries and other systems requiring lightweight keyed hashing. OpenSSL version 3.0 and later supports SipHash through its EVP_MAC API for computing message authentication codes (MACs), allowing configurable rounds (default 2-4) and integration into broader TLS and protocol workflows.^[17] In cryptocurrency software, SipHash serves as a PRF in various implementations for generating nonces or keys, leveraging its efficiency for short inputs in resource-constrained environments.^[18] Performance-wise, SipHash's design emphasizes speed on short inputs, with the 2-4 variant offering approximately 3-5 times higher throughput than the more secure 4-8 variant for messages under 64 bytes, making it suitable for real-time hashing without introducing bottlenecks.^[8] This efficiency, combined with its DoS resistance via per-session keys, renders hash flooding attacks negligible in production systems, often reducing amplification factors from quadratic to constant time.^[15] Adoption accelerated rapidly after SipHash's 2012 publication due to its simplicity and drop-in compatibility with existing hash tables, evolving from early integrations in Redis and OpenDNS to becoming the de facto standard in modern implementations by 2025.^[13]^[9]