
One-way compression function

In cryptography, a one-way compression function is a deterministic mathematical function that takes two fixed-length inputs—typically an initial value or chaining variable of length n bits and a message block of length m bits (where m > n)—and produces a fixed-length output of n bits, designed to be efficiently computable in the forward direction while being computationally infeasible to invert on a significant fraction of its outputs or to find collisions (distinct inputs yielding the same output). This ensures data compression without loss of essential security properties, making it a core component for building cryptographic hash functions that process arbitrary-length inputs. The foundational role of one-way compression functions was established independently in 1989 by Ralph Merkle and Ivan Damgård, who demonstrated that a collision-resistant compression function can be iterated to construct a collision-resistant hash function for messages of any length. In the Merkle–Damgård construction, the input message is padded to a multiple of the block size, divided into blocks, and processed sequentially: each block is combined with the output (chaining value) from the previous compression, starting from a fixed initial value, with the final output serving as the hash digest. This iterative approach preserves collision resistance from the compression function to the overall hash, assuming the message length is appended to prevent certain attacks. Essential security properties of a one-way compression function include preimage resistance (given an output, finding any valid input is hard, with success probability negligible short of roughly 2^n operations), second-preimage resistance (given an input and its output, finding a different input with the same output is hard), and collision resistance (finding any two distinct inputs with the same output requires approximately 2^{n/2} effort under the birthday bound).
While these properties hold ideally for the Merkle–Damgård paradigm, subsequent cryptanalytic advances have identified limitations, such as length-extension attacks (where an attacker can append data to a hash without knowing the original message) and multicollision vulnerabilities (finding 2^k colliding inputs at a cost of only about k · 2^{n/2}), prompting mitigations like domain separation or alternative constructions in modern designs. One-way compression functions are integral to widely adopted standards, including the Secure Hash Algorithm (SHA) family in FIPS 180-4, where SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, and SHA-512/256 employ Merkle–Damgård iterations of compression functions built from cipher-like operations for applications in digital signatures, message authentication codes, and integrity protection. However, due to evolving threats, NIST standardized SHA-3 in FIPS 202 using a sponge construction that absorbs and squeezes data without a traditional compression function, offering broader resistance to certain structural attacks while maintaining compatibility with legacy systems.

Fundamentals

Definition

In cryptography, a one-way compression function is a mathematical function f: \{0,1\}^{n + b} \to \{0,1\}^n that takes two fixed-length inputs—a chaining value of n bits and a message block of b bits, where typically b \geq n—and produces a fixed-length output of n bits, effectively compressing the combined input while preserving essential information for subsequent iterations. This structure ensures the output size remains constant regardless of variations in the message block, facilitating efficient processing of larger data. The concept was independently introduced by Ralph Merkle and Ivan Damgård in 1989 (published in 1990) as a foundational primitive for building secure hash functions from shorter underlying components, such as block ciphers, assuming those primitives exhibit suitable one-way properties. Merkle's work demonstrated constructions using DES as the base, while Damgård formalized a design principle showing that a collision-free compression function could yield a collision-free hash function for arbitrary-length inputs. As the iterative core in domain extension algorithms like the Merkle–Damgård construction, the one-way compression function enables hash functions to process messages of arbitrary length by chaining: the output of one application becomes the chaining value for the next, starting from an initial value (IV) and incorporating padded message blocks sequentially until a final digest is produced. This chaining mechanism, combined with the function's compression and one-way properties, ensures the overall hash resists collisions and preimages while handling variable inputs efficiently. A basic illustration of its application in a single iteration is given by the following pseudocode, where H_{i-1} is the previous chaining value and m_i is the i-th message block:
H_i = f(H_{i-1}, m_i)
This step is repeated for each block, with the final H_k serving as the hash output.
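The chaining step above can be sketched in Python. As a stand-in for a real compression function f, this toy uses truncated SHA-256; the parameters N and B and the helper names are illustrative assumptions, not part of any standard:

```python
import hashlib

N = 16   # n: chaining-value length in bytes (toy parameter)
B = 32   # b: message-block length in bytes (toy parameter, b > n)

def f(h: bytes, m: bytes) -> bytes:
    """Toy compression function: maps an (n + b)-bit input to n bits."""
    assert len(h) == N and len(m) == B
    return hashlib.sha256(h + m).digest()[:N]

def iterate(blocks, iv: bytes) -> bytes:
    """Chain f over the message blocks: H_i = f(H_{i-1}, m_i)."""
    h = iv
    for m in blocks:
        h = f(h, m)
    return h

digest = iterate([b"A" * B, b"B" * B], iv=bytes(N))
assert len(digest) == N   # output stays n bits long, however many blocks
```

However many blocks are absorbed, the chaining value—and therefore the final digest—keeps the fixed length n.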

Compression Property

The compression property of a one-way compression function refers to its ability to map an input consisting of a fixed-length chaining value of n bits and a message block of b bits—a total input length of n + b bits—to a fixed-length output of exactly n bits. This mechanism ensures that the output matches the size of the chaining value, allowing it to be fed directly into the next iteration without size accumulation. The primary purpose of this property is to enable the processing of messages of arbitrary length in cryptographic hash functions. By dividing the input into fixed-size blocks of b bits and iteratively applying the function—starting from an initial chaining value—the overall computation produces a constant-size output regardless of the input length, thus preventing growth in output size. This iterative reduction facilitates efficient chaining, where each compressed output serves as the input chaining value for the subsequent block. A key implication of the fixed output size n is that it determines the length of the final digest, which must balance security strength against computational cost; for instance, larger n enhances resistance to brute-force attacks but increases processing overhead. Typically, a ratio b/n > 1 allows more input bits to be processed per iteration, improving throughput. In the case of SHA-1, the compression function takes a 160-bit chaining value and a 512-bit message block as input, producing a 160-bit output to maintain the chaining value size across iterations. Higher compression ratios improve efficiency by processing larger blocks relative to the output size, but they can amplify vulnerabilities, such as increased susceptibility to collisions if the underlying primitive lacks sufficient diffusion and mixing; analyses of high-rate block-cipher-based compression functions show that pushing the rate too high can fail to preserve collision resistance, underscoring the need for careful parameter selection to mitigate these risks.
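The block-counting arithmetic implied by these parameters can be checked directly. The sketch below assumes SHA-1-style padding (a '1' bit, zero fill, and a 64-bit length field); the function name is illustrative:

```python
# SHA-1-style parameters from the text: n = 160-bit chain, b = 512-bit block
N_BITS, B_BITS = 160, 512

def compression_calls(msg_bits: int, length_field: int = 64) -> int:
    """Number of compression-function calls for a msg_bits-bit message
    after '1'-bit + zero padding + a 64-bit length field."""
    padded = msg_bits + 1 + length_field
    padded += (-padded) % B_BITS          # round up to a multiple of b
    return padded // B_BITS

assert compression_calls(0) == 1
assert compression_calls(447) == 1    # 447 + 1 + 64 exactly fills one block
assert compression_calls(448) == 2    # the length field forces a second block
```

Whatever the message length, each call still emits the same fixed n = 160-bit chaining value.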

One-way Property

The one-way property of a one-way compression function, also known as preimage resistance, ensures that it is computationally infeasible to reverse the function to recover an input given its output. Formally, for a function f: \{0,1\}^{m+n} \to \{0,1\}^n taking an m-bit message block and an n-bit chaining value to produce an n-bit output, the property requires that for a randomly selected input x \in \{0,1\}^{m+n}, given y = f(x), no probabilistic polynomial-time (PPT) adversary can find any preimage x' such that f(x') = y with more than negligible probability. The adversary's advantage is defined as \mathrm{Adv}^{\mathrm{pre}}_f(\mathcal{A}) = \Pr\left[ f(\mathcal{A}(y)) = y \;\middle|\; U \leftarrow \{0,1\}^{m+n},\ y = f(U) \right], and f is (t, \epsilon)-preimage resistant if for all adversaries \mathcal{A} running in time at most t, \mathrm{Adv}^{\mathrm{pre}}_f(\mathcal{A}) \leq \epsilon. This definition captures the essence of one-wayness, making the function suitable as a building block for cryptographic primitives like hash functions where invertibility must be hard. Closely related to preimage resistance are second-preimage resistance and collision resistance, which strengthen the one-way property for broader applications. Second-preimage resistance states that, given an input x, it is computationally infeasible to find a distinct x' \neq x such that f(x') = f(x), with adversary advantage \mathrm{Adv}^{\mathrm{sec}}_f(\mathcal{A}) = \Pr\left[ x' \neq x \wedge f(x') = f(x) \;\middle|\; x \leftarrow \{0,1\}^{m+n},\ x' \leftarrow \mathcal{A}(x) \right]. Collision resistance goes further, requiring that finding any pair x_1 \neq x_2 with f(x_1) = f(x_2) is hard, formalized by \mathrm{Adv}^{\mathrm{coll}}_f(\mathcal{A}) = \Pr\left[ f(x_1) = f(x_2) \;\middle|\; (x_1, x_2) \leftarrow \mathcal{A},\ x_1 \neq x_2 \right]. Collision resistance implies second-preimage resistance, but neither property implies preimage resistance in general; the three properties are related but must be analyzed separately for secure constructions.
The fixed output length n from the compression property gives adversaries a concrete target, but the one-way barrier prevents efficient inversion. The security of the one-way property assumes the underlying primitive, such as a block cipher, is secure, with strength measured in bits of security. For an n-bit output, preimage resistance ideally provides n bits of security, as inverting requires on the order of 2^n operations in the worst case, while collision resistance is bounded by the birthday paradox at roughly n/2 bits, allowing collisions after about 2^{n/2} evaluations. In the random oracle model, where the compression function is treated as an ideal random function, a generic security bound shows that collision resistance holds up to approximately 2^{n/2} queries: an adversary querying the function adaptively will encounter a collision with high probability only after this threshold, by the birthday paradox, while each preimage attempt succeeds with probability only 2^{-n}. This model provides a proof framework demonstrating how the one-way property underpins resistance to collision attacks within practical query limits. Structural weaknesses can still affect applications relying on one-way compression functions, such as length-extension vulnerabilities if not mitigated. In such cases, an attacker knowing y = H(x) for some input x may compute H(x \| \mathrm{pad} \| z) for the padding \mathrm{pad} and appended data z without recovering x, exploiting the iterative nature to forge extended inputs. This undermines security in protocols such as naive message authentication where hash outputs are used directly, highlighting the need for careful design to preserve the one-way property's implications.
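The birthday bound is easy to observe experimentally on a scaled-down model. The sketch below truncates SHA-256 to an n = 24-bit output (an assumed toy model, not a real compression function) and finds a collision after roughly 2^{12} = 4096 queries on average:

```python
import hashlib
import itertools

N_BITS = 24  # toy output size; about 2^(n/2) = 4096 queries expected

def toy_f(x: bytes) -> int:
    """Model of an n-bit compression-function output (truncated SHA-256)."""
    return int.from_bytes(hashlib.sha256(x).digest()[:3], "big")  # 24 bits

def find_collision():
    """Query toy_f on distinct inputs until two outputs coincide."""
    seen = {}
    for i in itertools.count():
        x = i.to_bytes(8, "big")
        y = toy_f(x)
        if y in seen:
            return seen[y], x, i + 1   # colliding inputs, queries used
        seen[y] = x

a, b, queries = find_collision()
assert a != b and toy_f(a) == toy_f(b)
# `queries` lands near 2**12, as the birthday bound predicts
```

Scaling n back up to 256 bits turns the same experiment into the ~2^{128}-query effort that makes generic collision search infeasible.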

General Constructions

Merkle–Damgård Construction

The Merkle–Damgård construction is a domain extension method that transforms a fixed-input-length one-way compression function into a variable-input-length hash function by iteratively applying the compression function to successive blocks of the input message. The process begins with an initial chain value H_0, typically a fixed constant known as the initialization vector (IV), and proceeds by processing the message in blocks of length b bits, where the compression function f takes an n-bit chain value and a b-bit block to produce an n-bit output. For a message divided into k blocks m_1, m_2, \dots, m_k, the chain values are computed as H_i = f(H_{i-1}, m_i) for i = 1 to k, with the final digest being H_k. To handle messages of arbitrary length, the input is first padded to ensure its length is a multiple of the block size b. The standard padding scheme appends a single '1' bit, followed by zeros, and concludes with the binary representation of the original message length (in bits) encoded in a fixed-length field, typically 64 bits. This padding prevents ambiguities in message recovery and ensures that messages of different lengths do not collide after padding, as the length suffix distinguishes them even if the preceding bits match. The final padded block m_k incorporates this padding and the length, so the digest is computed as H(m) = f(H_{k-1}, m_k). The construction inherits collision resistance from the underlying compression function: if f is collision-resistant, then the resulting H is also collision-resistant, as a collision in H would imply a collision in f via the iterative chain. Independently proposed by Ralph Merkle and Ivan Damgård in 1989, the paradigm provides a foundational proof that collision resistance in the compression function suffices for the full hash under this iteration. This construction forms the basis for several widely adopted hash functions, including MD5, SHA-1, and the SHA-2 family (SHA-224, SHA-256, SHA-384, SHA-512), which all iterate a compression function over message blocks starting from a fixed IV.
Despite its security proofs, the Merkle–Damgård construction is vulnerable to length extension attacks, where an attacker knowing H(m) and the length of m can compute H(m \| \mathrm{pad} \| m') for an appended message m' without knowing m, because the final chaining value is exposed as the digest and the padding is predictable. Additionally, Joux's multicollision attack allows finding 2^k colliding messages with only k \cdot 2^{n/2} effort—roughly k times the cost of a single collision—by building a chain of collisions across intermediate chaining values, weakening the structure for applications requiring second preimage resistance or relying on cascaded hashes.
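Both the construction and its length-extension weakness can be demonstrated end to end on a toy instantiation. Everything below (the truncated-SHA-256 compression function, the 16/32-byte parameters, and the helper names) is an assumed illustration of the scheme described above, not a standardized design:

```python
import hashlib

N, B = 16, 32                       # bytes: chain value size, block size (toy)
IV = bytes(N)                       # fixed initialization vector

def f(h: bytes, m: bytes) -> bytes:
    """Toy compression function (truncated SHA-256)."""
    return hashlib.sha256(h + m).digest()[:N]

def pad(msg_len: int) -> bytes:
    """MD strengthening: a '1' bit (0x80), zero fill, 64-bit bit length."""
    p = b"\x80"
    p += b"\x00" * ((-(msg_len + len(p) + 8)) % B)
    return p + (msg_len * 8).to_bytes(8, "big")

def md_hash(msg: bytes, h: bytes = IV, already_hashed: int = 0) -> bytes:
    """Merkle-Damgard iteration; resumable from a known chaining value."""
    data = msg + pad(already_hashed + len(msg))
    for i in range(0, len(data), B):
        h = f(h, data[i:i + B])
    return h

# Length extension: from H(m) and len(m) alone, predict H(m || pad || ext)
m, ext = b"secret message", b"appended"
h_m = md_hash(m)                    # attacker sees only this digest
glue = pad(len(m))                  # padding is predictable from the length
forged = md_hash(ext, h=h_m, already_hashed=len(m) + len(glue))
assert forged == md_hash(m + glue + ext)   # matches the honest digest
```

The final assertion is the attack: the digest of the extended message is computed without ever touching the original `m`, which is why Merkle–Damgård outputs must not be used directly as MACs.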

Sponge Construction

The sponge construction provides an alternative approach to building cryptographic hash functions without relying on a traditional one-way compression function. It iteratively applies a fixed-width permutation to a state divided into two parts: the rate r, which handles input and output, and the capacity c, which contributes to security. The total state size is b = r + c bits, with the permutation \pi operating on the entire state. Input data is absorbed into the state r bits at a time by XORing it into the rate portion, followed by applying \pi; output is later generated by reading r bits from the rate portion, then applying \pi again. This enables the construction to function as a variable-input-length hash that supports arbitrary output lengths while deriving its one-way behavior from the assumed strength of \pi. The input message is first padded using a multi-rate padding scheme, such as appending a '1' bit followed by zeros to reach a multiple of r, ensuring the final block is properly delimited. In the absorption phase, each r-bit block m_i (padded as needed) is XORed into the first r bits of the current state S_{i-1}, and the permutation is applied to the full state: S_i = \pi\left( S_{i-1} \oplus (m_i \| 0^c) \right), where \| denotes concatenation and 0^c denotes c zero bits. Once absorption is complete, the squeezing phase begins: the first r bits of the current state are read off as the output block z_j, and \pi is applied to update the state whenever more output is required. This operation allows efficient handling of variable output lengths without recomputing prior steps. The compression property is achieved because the rate r is smaller than the full state size b, so input is effectively compressed into the internal state. Security in the sponge construction derives primarily from the capacity c, providing up to approximately c/2 bits, as generic attacks are bounded by collisions in the inner (capacity) part of the state.
Unlike some iterative constructions, the sponge avoids length-extension vulnerabilities because the output does not directly reveal the internal state, so an attacker cannot resume the computation to append hidden data. Introduced by Bertoni, Daemen, Peeters, and Van Assche in 2007, the sponge construction forms the basis for the Keccak hash function, which NIST selected as the core of SHA-3 in 2012. Key advantages include full-state diffusion via the permutation \pi, which ensures changes in the input propagate across the entire state, enhancing resistance to structural attacks. While the basic sponge processes blocks sequentially, tree-based variants allow parallel absorption of input blocks to accelerate computation on multi-core processors.
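A minimal sponge sketch makes the two phases concrete. The "permutation" `pi` below is a stand-in built from SHA-256 (a random-looking function, not a true permutation), and the rate/capacity split and padding details are simplified assumptions, not the Keccak parameters:

```python
import hashlib

R, C = 8, 24          # rate and capacity in bytes; state b = r + c = 32
B = R + C

def pi(state: bytes) -> bytes:
    """Stand-in for the fixed-width permutation (illustrative only:
    SHA-256 is not a permutation of the state space)."""
    return hashlib.sha256(state).digest()[:B]

def sponge(msg: bytes, out_len: int) -> bytes:
    # Simplified multi-rate-style padding: 0x01, zero fill, final 0x80.
    p = msg + b"\x01" + b"\x00" * ((-(len(msg) + 2)) % R) + b"\x80"
    state = bytes(B)
    for i in range(0, len(p), R):              # absorbing phase
        block = p[i:i + R] + bytes(C)          # m_i || 0^c
        state = pi(bytes(a ^ b for a, b in zip(state, block)))
    out = b""
    while len(out) < out_len:                  # squeezing phase
        out += state[:R]                       # read r bytes of the rate
        state = pi(state)                      # permute before next squeeze
    return out[:out_len]

assert len(sponge(b"hello", 40)) == 40
```

Note how a longer output is a prefix-consistent extension of a shorter one: extra output comes from extra squeezes, not from reprocessing the input.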

Block Cipher-Based Constructions

Davies–Meyer Construction

The Davies–Meyer construction is a method for building a one-way compression function from a block cipher, transforming an n-bit chaining value and an n-bit message block into an n-bit output. It operates by treating the message block as the key to the cipher and XORing the encryption result with the chaining value, yielding compression when the key size equals the block size. The structure is defined as follows: for a block cipher E: \{0,1\}^n \times \{0,1\}^n \to \{0,1\}^n, the compression function is f(H, m) = E_m(H) \oplus H, where H \in \{0,1\}^n is the chaining input and m \in \{0,1\}^n is the message block used as the key. In an iterated hash, this yields the update rule H_i = f(H_{i-1}, m_i) = E_{m_i}(H_{i-1}) \oplus H_{i-1}, processing the message in sequential blocks starting from an initial value H_0. This design leverages the pseudorandom permutation (PRP) properties of the block cipher: the message-dependent keying introduces variability while the XOR feedforward prevents straightforward reversal. The rationale behind the construction lies in exploiting the block cipher's security to achieve one-wayness: the output's dependence on both the input and the cipher's internal mixing makes inversion computationally infeasible without breaking the cipher. Specifically, given y = f(H, m), recovering H requires approximately 2^{n-1} evaluations of E in a black-box (ideal-cipher) model. Thus, the one-way property holds if E behaves as a secure PRP under adversarially chosen keys. The construction is conventionally attributed to Donald Davies and Carl Meyer, from unpublished work of the late 1970s, and it laid the groundwork for subsequent hash designs, including the compression functions of MD4 and MD5, which adapt it using custom round-based "ciphers" with 512-bit message blocks and 128-bit chaining values.
Despite its strengths, the Davies–Meyer construction has an easily computed fixed point: since f(H, m) = E_m(H) \oplus H, the value H = E_m^{-1}(0) satisfies f(H, m) = H for any message block m and can be found with a single decryption. When the function is iterated without careful chaining or domain separation, such fixed points enable meet-in-the-middle and long-message second-preimage attacks that push the effective resistance well below the ideal 2^n bound, with time and memory on the order of 2^{n/2} for an n-bit output in some attack variants.
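A Davies–Meyer round is straightforward to sketch. The 4-round Feistel "cipher" below is a stand-in for a real block cipher E; its name, round count, and SHA-256-based round function are all assumptions for illustration:

```python
import hashlib

BLOCK = 8  # n = 64-bit block; the message block doubles as the cipher key

def toy_encrypt(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel network standing in for a real block cipher E.
    Feistel structure guarantees this is invertible for a fixed key."""
    L, R = block[:4], block[4:]
    for i in range(4):
        rk = hashlib.sha256(key + bytes([i]) + R).digest()[:4]
        L, R = R, bytes(a ^ b for a, b in zip(L, rk))
    return L + R

def davies_meyer(h: bytes, m: bytes) -> bytes:
    """f(H, m) = E_m(H) XOR H: encrypt the chain value under the message,
    then feed the chain value forward."""
    return bytes(a ^ b for a, b in zip(toy_encrypt(m, h), h))

h = bytes(BLOCK)                              # fixed IV
for block in [b"block001", b"block002"]:      # H_i = f(H_{i-1}, m_i)
    h = davies_meyer(h, block)
assert len(h) == BLOCK
```

The XOR feedforward is what blocks trivial inversion: without it, an attacker holding the output and the message block could simply decrypt to recover the previous chaining value.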

Matyas–Meyer–Oseas Construction

The Matyas–Meyer–Oseas construction is a method for building a one-way compression function from a block cipher, where both the chaining value H and the message block m have the block length n. The compression function is defined as f(H, m) = E_{g(H)}(m) \oplus m, where E_k(\cdot) denotes block cipher encryption under key k and g is a function (often the identity) that converts the chaining value into a valid cipher key. In the context of an iterated hash, this yields the update rule H_i = E_{g(H_{i-1})}(m_i) \oplus m_i, with H_0 as the initial value and the final H_k as the output after processing k message blocks. Introduced by Stephen M. Matyas, Carl H. Meyer, and John Oseas in 1985, this construction was proposed as a way to generate strong one-way functions using cryptographic algorithms such as DES. It influenced early designs, serving as the basis for the single-block-length generic scheme in ISO/IEC 10118-2. It is the dual of the Davies–Meyer construction: where Davies–Meyer keys the cipher with the message block, Matyas–Meyer–Oseas keys it with the chaining value, so an attacker controls the plaintext rather than the key, sidestepping related-key concerns. The security of the construction, particularly its collision resistance, reduces to the ideal behavior of the underlying cipher E; it is provably secure up to the birthday bound if E behaves as an ideal cipher. In the ideal-cipher model, the construction (one of the 12 secure schemes among the 64 analyzed by Preneel, Govaerts, and Vandewalle) resists collisions with advantage at most about q^2 / 2^{n+1} for q queries. A practical drawback is that the cipher key changes with every block, so the key schedule must be recomputed for each message block, which is costly for ciphers with expensive key setup.
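In the standard single-call form of Matyas–Meyer–Oseas, f(H, m) = E_{g(H)}(m) \oplus m, the chaining value keys the cipher; writing it next to Davies–Meyer in code makes the duality visible. The Feistel toy cipher is an illustrative stand-in (all names and parameters are assumptions), with g taken as the identity:

```python
import hashlib

def toy_encrypt(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel network standing in for a real block cipher E."""
    L, R = block[:4], block[4:]
    for i in range(4):
        rk = hashlib.sha256(key + bytes([i]) + R).digest()[:4]
        L, R = R, bytes(a ^ b for a, b in zip(L, rk))
    return L + R

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def matyas_meyer_oseas(h: bytes, m: bytes) -> bytes:
    """f(H, m) = E_{g(H)}(m) XOR m, g = identity:
    the chaining value keys the cipher, the message is the plaintext."""
    return xor(toy_encrypt(h, m), m)

def davies_meyer(h: bytes, m: bytes) -> bytes:
    """For contrast: the message keys the cipher, H is the plaintext."""
    return xor(toy_encrypt(m, h), h)

out = matyas_meyer_oseas(bytes(8), b"msgblock")
assert len(out) == 8
```

Swapping which input becomes the key is the entire difference between the two modes; the feedforward XOR of the plaintext input is common to both.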

Miyaguchi–Preneel Construction

The Miyaguchi–Preneel construction transforms a block cipher into a one-way compression function by symmetrically incorporating both the previous chaining value and the message block into the output, ensuring balanced influence from each input. This approach fits within the broader paradigm of block cipher-based constructions, where the compression function maps a 2n-bit input (chaining value H of length n and message block m of length n) to an n-bit output. The structure is given by f(H, m) = E_H(m) \oplus H \oplus m, where E_K(P) denotes the block cipher encryption of plaintext P under key K. In an iterated hash using the Merkle–Damgård construction, the computation proceeds as H_i = E_{H_{i-1}}(m_i) \oplus H_{i-1} \oplus m_i, with an initial value H_0 (often a fixed constant) and the final hash derived from H_l after processing all l message blocks. The rationale for this design lies in its feedforward mechanism, which makes the output depend on the full pair of inputs: the post-encryption XOR of both H_{i-1} and m_i propagates changes throughout the computation, enhancing diffusion and preventing localized attacks. This symmetric treatment distinguishes it from asymmetric modes, promoting uniform sensitivity to input alterations. The construction was independently proposed by Shoji Miyaguchi in 1990 as part of the N-Hash function based on the FEAL cipher, and by Bart Preneel around the same period; it received formal analysis and security evaluation in a 1993 paper by Preneel, Govaerts, and Vandewalle that systematized block-cipher-based hash designs. Under the ideal cipher model—treating the block cipher as a family of independent random permutations—the Miyaguchi–Preneel function is provably secure, with preimage resistance requiring approximately 2^{n-1} evaluations to find a valid input for a given output, and it resists direct, permutation, forward, backward, and fixed-point attacks. Compared to single-XOR modes like Davies–Meyer, it offers improved resilience to certain differential attacks, as the dual XOR provides stronger algebraic separation between inputs and outputs.
Key advantages include a strong avalanche effect, where a one-bit change in the input typically flips about half the output bits, ensuring rapid diffusion, and potential for parallelization in multi-branch or wide-pipe extensions where independent iterations can be computed concurrently. The design's efficiency (rate-1, with one encryption per block) and its reliance on mature block ciphers make it practical to implement; the Whirlpool hash function, for example, uses a Miyaguchi–Preneel mode built on an AES-like cipher.
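The avalanche behavior can be spot-checked on a toy instantiation. The Feistel "cipher" and the 64-bit parameters are illustrative assumptions, so the exact bit count depends on the stand-in cipher rather than on Miyaguchi–Preneel itself:

```python
import hashlib

def toy_encrypt(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel network standing in for a real block cipher E."""
    L, R = block[:4], block[4:]
    for i in range(4):
        rk = hashlib.sha256(key + bytes([i]) + R).digest()[:4]
        L, R = R, bytes(a ^ b for a, b in zip(L, rk))
    return L + R

def miyaguchi_preneel(h: bytes, m: bytes) -> bytes:
    """f(H, m) = E_H(m) XOR H XOR m: both inputs feed forward."""
    c = toy_encrypt(h, m)
    return bytes(x ^ y ^ z for x, y, z in zip(c, h, m))

# Avalanche spot check: flip one message bit, count differing output bits.
h0, m0 = bytes(8), b"msgblock"
m1 = bytes([m0[0] ^ 0x01]) + m0[1:]
d0 = int.from_bytes(miyaguchi_preneel(h0, m0), "big")
d1 = int.from_bytes(miyaguchi_preneel(h0, m1), "big")
flipped = bin(d0 ^ d1).count("1")   # typically lands near 32 of 64 bits
assert 1 <= flipped <= 64
```

With a well-diffusing cipher the count concentrates around half the output width; a count that stays near 1 would indicate the underlying cipher is failing to mix its inputs.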

Hirose Construction

The Hirose construction is a block-cipher-based approach to building one-way compression functions, proposed by Shoichi Hirose in 2006. It is a double-block-length (DBL) scheme: from a block cipher with an n-bit block and a key longer than n bits (for example a 2n-bit key, or an n-bit tweakable block cipher whose tweak absorbs part of the input), it builds a compression function with a 2n-bit chaining value, so that collision resistance beyond the 2^{n/2} birthday bound of a single cipher block becomes achievable. Writing the chaining value as the pair (G_{i-1}, H_{i-1}), the update makes two cipher calls under the shared key H_{i-1} \| m_i: G_i = E_{H_{i-1} \| m_i}(G_{i-1}) \oplus G_{i-1} and H_i = E_{H_{i-1} \| m_i}(G_{i-1} \oplus c) \oplus G_{i-1} \oplus c, where c is a fixed nonzero constant that keeps the two branches distinct. Because both calls use the same key, only one key schedule is needed per compression step, which makes the construction comparatively efficient among DBL schemes and attractive for lightweight hashing in resource-constrained environments such as RFID tags, for example when instantiated with the PRESENT cipher. In the ideal cipher model, the construction achieves provable collision resistance up to on the order of 2^n queries for its 2n-bit output, close to the birthday bound for that output size, together with strong preimage resistance; the shared-key structure also mitigates related-key concerns that could undermine the underlying cipher. Tweakable-block-cipher variants of Hirose's compression function have been adopted in later designs, such as the leakage-resistant mode Triplex.
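One common presentation of Hirose's double-block-length compression function makes two cipher calls under a shared key; the sketch below follows that form. The wide-key Feistel toy cipher (8-byte block, 16-byte key) and the constant are illustrative assumptions:

```python
import hashlib

C_CONST = b"\xff" * 8   # fixed nonzero constant c separating the two branches

def toy_encrypt(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel network standing in for a block cipher with an
    8-byte block and a 16-byte key (chaining half || message block)."""
    L, R = block[:4], block[4:]
    for i in range(4):
        rk = hashlib.sha256(key + bytes([i]) + R).digest()[:4]
        L, R = R, bytes(a ^ b for a, b in zip(L, rk))
    return L + R

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def hirose(g: bytes, h: bytes, m: bytes):
    """One Hirose DBL step: two cipher calls share the key H || m,
    so a single key schedule covers both. Returns (G_i, H_i)."""
    key = h + m
    g_new = xor(toy_encrypt(key, g), g)        # G branch with feedforward
    gc = xor(g, C_CONST)                       # offset input for the H branch
    h_new = xor(toy_encrypt(key, gc), gc)
    return g_new, h_new

g, h = bytes(8), bytes(8)                      # 2n-bit initial value
for block in [b"block001", b"block002"]:
    g, h = hirose(g, h, block)
assert len(g) == 8 and len(h) == 8             # 2n = 128-bit chaining state
```

The 2n-bit chaining pair is the point of the exercise: with an 8-byte cipher block alone, generic collisions would cost only about 2^{32} work, whereas the doubled state targets roughly 2^{64}.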
