Fact-checked by Grok 2 weeks ago

MD4

MD4 (Message-Digest Algorithm 4) is a cryptographic function that produces a 128-bit value, known as a message digest, from an input message of arbitrary length. Developed by Ronald L. Rivest of the Laboratory for and RSA Data Security, Inc., it was published in 1990 as a faster and simpler successor to the earlier MD2 algorithm, with the primary goal of enabling secure compression of large files for digital signature applications under public-key cryptosystems like . The algorithm processes input in 512-bit blocks through three rounds of 16 operations each, using a 128-bit buffer initialized with fixed constants and incorporating bitwise functions (f, g, h) along with modular additions and rotations optimized for 32-bit processors. At the time of its release, MD4 was conjectured to offer strong security properties, including —making it computationally infeasible (requiring more than 264 operations) to find two distinct messages with the same digest—and —requiring more than 2128 operations to find a message matching a given digest. It was specified in RFC 1320 in 1992 and implemented in various systems for checks, such as early versions of Microsoft's authentication protocol. However, subsequent revealed significant weaknesses: a full was demonstrated in 1996 by Hans Dobbertin with 220 MD4 computations, and it was further improved in 2004 by Xiaoyun Wang and colleagues with a complexity of just 28 hash operations (verifiable manually). Further advances included a first in 2008 with 2100 effort and improvements reducing second preimage attacks to 269.4 complexity, along with key recovery vulnerabilities in HMAC-MD4 requiring 277 computations. Due to these flaws, MD4 provides no meaningful and has been deprecated since the mid-1990s by its creators at , with the IETF formally moving it to Historic status in 2011 to discourage any further use in cryptographic protocols. It served as the basis for subsequent hash functions like (1991) and influenced the design of , but modern applications have shifted to secure alternatives such as SHA-256 from the family. Despite its obsolescence, MD4 remains of historical and academic interest for studying design and differential techniques.

Overview and History

Description and Purpose

MD4 is a developed by Ronald Rivest in 1990. It takes an input message of arbitrary length and produces a fixed 128-bit output, typically represented as a 32-character digest. The primary purpose of MD4 is to provide a secure and efficient method for compressing , particularly for use in applications where large files must be "compressed" before with a private key. It was designed to support checks by generating a unique that verifies whether a has been altered. Optimized for fast computation on 32-bit processors, MD4 achieves high throughput, such as approximately 1,450,000 bytes per second on a SUN Sparc workstation. MD4 processes input messages in 512-bit blocks and maintains an internal state consisting of four 32-bit words, labeled A, B, C, and D. These state variables are initialized to fixed constants: A = 0x67452301, B = 0xefcdab89, C = 0x98badcfe, and D = 0x10325476. Compared to its predecessor MD2, which was optimized for 8-bit processors, MD4 offers significant improvements in speed and simplicity while maintaining the 128-bit output size and targeting enhanced security against collision attacks at the time of its release. Rivest later developed MD5 in 1991 as a strengthened successor to address emerging concerns with MD4.

Development and Key Milestones

MD4 was developed by Ronald L. Rivest, a prominent cryptographer known for co-inventing the public-key cryptosystem in 1977 alongside and , and for creating the message-digest algorithm in 1989 as an early optimized for 8-bit processors. Building on these foundations, Rivest designed MD4 in late 1990 as a faster and more secure successor to MD2, specifically tailored for 32-bit architectures to address the growing need for efficient one-way hash functions in digital signature schemes. The primary motivation for MD4 stemmed from the requirements of emerging protocols and applications, such as (PGP), which demanded a secure method to compress arbitrary-length messages into a fixed 128-bit digest for integrity verification and signing with public-key systems like , without relying on large lookup tables that could hinder performance. Rivest placed the algorithm in the to encourage review and potential standardization, reflecting the collaborative ethos of early cryptographic development in the era. Key milestones include the first public description in RFC 1186 in October 1990, which outlined an initial version of the algorithm, followed by the definitive specification in RFC 1320 published by the Internet Engineering Task Force (IETF) in April 1992, providing a portable reference implementation. MD4 saw rapid adoption shortly thereafter, notably in the initial release of PGP software in 1991, where it served as the core hashing mechanism for message authentication, and in various early cryptographic libraries by 1993, underscoring its role in facilitating secure email and data exchange protocols. This widespread integration was short-lived, however, as emerging security concerns prompted Rivest to introduce MD5 in 1991 as a strengthened variant.

Algorithm Design

Message Preparation and Padding

In the MD4 algorithm, the input message is first prepared by appending padding bits to ensure it can be divided into 512-bit blocks for processing. This padding scheme begins with the addition of a single '1' bit to the end of the message, followed by enough '0' bits to make the total length congruent to 448 512 bits. The purpose of this padding is to align the message length with MD4's block-based design, facilitating the compression function's operation on fixed-size inputs. Padding is applied regardless of the message's original length; if the length is already congruent to 448 modulo 512 bits, a full 512 bits of padding (starting with the '1' bit) are still appended to create an additional block. At least one bit is always added, and the maximum padding added in a single step is 512 bits. Following the bit padding, a 64-bit little-endian representing the original length in bits is appended as the final 64 bits, resulting in a total length that is a multiple of 512 bits. For messages of 0 to 447 bits, the results in exactly one 512-bit ; for 448 to 511 bits, it results in two 512-bit s. This ensures all inputs are processed through the compression function. MD4 supports a maximum original message length of $2^{64} - 1 bits to avoid overflow in the 64-bit length field; if the message exceeds this, only the low-order 64 bits of the length are used, though such cases are not standard.

Initialization and State Management

The MD4 hash function begins with the initialization of a state consisting of four 32-bit registers, denoted as A, B, C, and D. These registers are set to specific initial values: A = 0x67452301, B = 0xefcdab89, C = 0x98badcfe, and D = 0x10325476. These constants are chosen arbitrarily but fixed to ensure consistent hashing across implementations. The state is maintained as a chain of these four 32-bit words throughout the computation. For each 512-bit message block, the current values of A, B, C, and D serve as the input to the compression function. To preserve the chaining, the pre-compression values are temporarily saved (as AA, BB, CC, and DD), the compression is performed to update A, B, C, and D, and then the saved values are added to the updated registers: A ← AA + A, B ← BB + B, C ← CC + C, and D ← DD + D. This modular addition (modulo 2^32) ensures that the output state of one block directly becomes the input state for the subsequent block, enabling the processing of messages longer than 512 bits. MD4 operates on data in little-endian byte order, where the least significant byte of each 32-bit word is stored first. This convention applies to both the representation of the state registers and the parsing of message into 32-bit words, ensuring portability across different system architectures.

Compression Function Steps

The MD4 compression function processes each 512-bit message to update the 128-bit variables, producing an intermediate value that is carried forward. Each is divided into 16 32-bit words, denoted as M through M, which serve as inputs to a series of 48 computational steps organized into three rounds of 16 steps each. These steps perform nonlinear transformations, modular additions, and bit rotations to achieve and mixing across the state variables A, B, C, and D (initialized from the chaining values). All arithmetic operations are performed 2^{32}, ensuring the values remain within 32 bits. In the first (steps to ), the F is applied, defined as F(X, Y, Z) = (X \land Y) \lor (\lnot X \land Z), which selectively combines bits based on the value of X. Each step updates one of the state variables according to the formula: a = (a + F(b, c, d) + M) \ll s where k indexes the message word (following a specific sequence: ,,,,,,,,,,,,,,,), and s is the left amount, through , , , , , 19, , , , , , 19, , , , for the respective steps. No constant is added in this round (equivalent to adding ). The state variables are cyclically permuted after each update: for example, after updating A, the registers become D, A, B, C for the next step. This round emphasizes bitwise selection to propagate changes from the message into the state. The second round (steps 16 to 31) employs the function G, defined as G(X, Y, Z) = (X \land Y) \lor (X \land Z) \lor (Y \land Z), which provides a more balanced mixing by including majority-like operations among the inputs. The update formula becomes: a = (a + G(b, c, d) + M + 0x5A827999) \ll s Here, k follows the sequence 0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15, and rotation amounts s cycle through 3, 5, 9, 13, 3, 5, 9, 13, 3, 5, 9, 13, 3, 5, 9, 13. The constant 0x5A827999, derived from the sine of fractions of π to promote avalanche effects, is added to each step. State permutation follows the same cyclic pattern as in the first round, enhancing nonlinear diffusion. The third round (steps 32 to 47) uses the function H, defined as H(X, Y, Z) = X \oplus Y \oplus Z, a linear XOR operation that further spreads bits across the state for parity-like mixing. The formula is: a = (a + H(b, c, d) + M + 0x6ED9EBA1) \ll s with k sequencing as 0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15, and s values of 3, 9, 11, 15, 3, 9, 11, 15, 3, 9, 11, 15, 3, 9, 11, 15. The constant 0x6ED9EBA1, similarly based on trigonometric values, is incorporated. Cyclic permutation of the state continues, completing the block's transformation. Upon completing the 48 steps, the updated state values are added to the initial chaining variables: A \leftarrow A + AA, B \leftarrow B + BB, C \leftarrow C + CC, D \leftarrow D + DD, where AA through DD are the saved copies from the block's start. This produces the value for the next block or the final 128-bit if it is the last block. The design of these rounds, with varying functions, word orders, and rotations, aims to ensure thorough and irreversibility in the .

Security Analysis

Known Weaknesses and Theoretical Flaws

MD4 employs the Merkle-Damgård construction, which appends the message length after padding to the input blocks processed by the compression function; this structural choice renders it vulnerable to length extension attacks, where an adversary can extend a known hash value to forge a valid hash for a longer message without knowledge of the original secret. The early rounds of MD4 exhibit weak diffusion properties, with insufficient bit mixing that allows differences introduced in the input to propagate predictably through the state registers, facilitating the application of differential cryptanalysis techniques. This limited avalanche in the initial steps enables attackers to construct low-probability differential paths that exploit the linear nature of the operations, compromising the function's resistance to chosen-difference manipulations. MD4's compression function consists of 48 steps across three rounds, a choice that was deemed inadequate for providing full 128-bit even by mid-1990s standards, as it failed to achieve the expected level of non-linearity and mixing required to withstand emerging cryptanalytic methods. The reliance on relatively simple functions and fixed shift operations in these steps contributes to predictable behavior under analysis, falling short of the robustness needed for long-term . Theoretically, MD4 was intended to offer on the order of 2^{64} operations due to its 128-bit output size, but the specific choices in its nonlinear functions and round structure result in a to attain this bound, allowing collisions to be found with significantly lower complexity through structured differences. In comparison to its successor , MD4's three-round structure lacks the additional fourth round introduced in MD5, which enhances the by further propagating bit changes across the state and improving overall to better resist differential attacks. This simpler round design in MD4 prioritizes computational efficiency over security margins, contributing to its earlier vulnerability revelations.

Practical Attacks and Exploits

The first practical on MD4 was demonstrated by Hans Dobbertin in 1996, utilizing differential cryptanalysis to identify two distinct 512-bit blocks that produce identical hash outputs. This attack targets the full compression function and requires roughly 2^{20} evaluations of the MD4 function, executable in seconds on contemporary 1990s hardware such as a standard PC. Building on earlier partial attacks, including those on reduced rounds by den Boer and Bosselaers from , Dobbertin's method extended the approach to the complete , confirming MD4's vulnerability to collisions with high probability. The feasibility of generating such collisions in minimal time underscored MD4's inadequacy for collision-resistant applications, prompting immediate scrutiny of its deployment in protocols like digital signatures and integrity checks. In 2004, Xiaoyun Wang and colleagues further advanced collision attacks on MD4, achieving practical results with a complexity of 2^8 MD4 computations (about 256 operations, verifiable by hand) through refined paths. This improvement rendered collisions trivial to compute even on modest resources, amplifying concerns over MD4's in systems. Regarding second-preimage , weaknesses were first exploited in 2005 by Hongbo Yu, Gaoli Wang, and others, who developed a second-preimage attack with complexity reduced compared to the brute-force 2^{128}. For first preimage , Gaëtan Leurent presented an attack in 2008 with complexity approximately 2^{102}. These were further improved in 2010 by Jian Guo et al., reducing first preimage attacks to 2^{78.4} and second preimage attacks to 2^{69.4} with pre-computations. Although not as inexpensive as collisions, these attacks demonstrated that finding colliding messages is feasible with significant but achievable resources, further eroding trust in MD4 for purposes. Although no high-profile malware like (which targeted weaknesses for certificate forgery in 2012) directly leveraged MD4, its vulnerabilities have contributed to exploits in outdated protocols, such as legacy authentication in Windows environments where MD4 hashing enables offline attacks on captured challenges. MD4's demonstrated flaws led to its deprecation by NIST in 2011 via RFC 6150, which retired the algorithm to historic status and explicitly advised against its use in any new or existing security contexts due to the ease of practical attacks. Today, MD4 is considered obsolete, with modern systems migrating to secure alternatives like SHA-256 to mitigate collision and preimage risks.

Examples and Implementations

Standard Test Vectors

Standard test vectors for the MD4 algorithm are defined in RFC 1320 to validate the correctness of implementations by comparing computed hashes against expected outputs for specific inputs. These vectors form part of a comprehensive , ensuring and adherence to the algorithm's specifications across different computing environments. The following table presents key test vectors from RFC 1320, including the empty message, a single-byte input, and a 62-byte message:
InputDescriptionMD4 Hash (hexadecimal)
(empty string)No input bytes31d6cfe0d16ae931b73c59d7e0c089c0
"a"Single ASCII byte (0x61)bde52cb31de33e46245e05fbdbd6fb24
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"62 ASCII bytes covering uppercase, lowercase letters, and digits043f8582f241db351ce627e153e7f0e4
These vectors account for MD4's message preparation, including with a single '1' bit followed by zeros to reach a congruent to 448 512 bits, and appending the 64-bit . Implementations must process data in little-endian byte order for 32-bit words to achieve reproducible results matching these hashes.

Collision Demonstration

A prominent demonstration of MD4's collision vulnerability was provided by Hans Dobbertin in 1996, who developed an attack to construct colliding s for the full MD4 algorithm. The attack exploits structural weaknesses through differential , allowing collisions to be found with low complexity (around 220 operations) in seconds on contemporary hardware. Dobbertin's method adapts techniques from attacking related functions like , propagating differences through MD4's rounds to produce distinct messages with the same hash output. This example highlights MD4's lack of , as small input modifications can lead to identical digests, undermining its use in integrity checks.

References

  1. [1]
    RFC 1320 - The MD4 Message-Digest Algorithm - IETF Datatracker
    Network Working Group R. Rivest Request for Comments: 1320 MIT Laboratory for Computer Science Obsoletes: RFC 1186 and RSA Data Security, Inc. April 1992 ...
  2. [2]
    [PDF] THE MD4 MESSAGE DIGEST ALGORITHM - DSpace@MIT
    Abstract. The MD4 message digest algorithm takes an input message of arbitrary length and produces an output 128-bit "fingerprint" or "message digest", in.Missing: original | Show results with:original
  3. [3]
    RFC 6150 - MD4 to Historic Status - IETF Datatracker
    MD4 to Historic Status · RFC - Informational March 2011. View errata Report errata IPR. Obsoletes RFC 1320. Was draft-turner-md4-to-historic (individual in gen ...
  4. [4]
    RFC 1321 MD5 Message-Digest Algorithm - IETF
    The MD5 algorithm is intended for digital signature applications, where a large file must be "compressed" in a secure manner before being encrypted with a ...
  5. [5]
    Ronald L. Rivest: Publications and Talks - People | MIT CSAIL
    Pogue and Ronald L. Rivest. U.S. Patent 5,144,667. Issued December 20, 1990. (1990-12-20) bib pdf google-patent-page link. 112. [KPR90] Learning time-varying ...Missing: invention date<|separator|>
  6. [6]
    RFC 1319: The MD2 Message-Digest Algorithm
    This document describes the MD2 message-digest algorithm. The algorithm takes as input a message of arbitrary length and produces as output a 128-bit ...
  7. [7]
    None
    - **Date of Technical Memo**: October 1990
  8. [8]
    RFC 1186: MD4 Message Digest Algorithm
    This note describes the MD4 message digest algorithm. The algorithm takes as input an input message of arbitrary length and produces as output a 128-bit ...Missing: date | Show results with:date
  9. [9]
    [PDF] PGP User's Guide - Tech Insider
    The message digest algorithm used here is the MD4 Message Digest. Algorithm, placed in the public domain by RSA Data Security, Inc. Documents are signed by ...
  10. [10]
  11. [11]
  12. [12]
  13. [13]
    RFC 1320: The MD4 Message-Digest Algorithm
    RFC 1320 MD4 Message-Digest Algorithm April 1992 ; 3.1 Step 1. Append Padding Bits ; 3.2 Step 2. Append Length ; 3.3 Step 3. Initialize MD Buffer ...
  14. [14]
    [PDF] Cryptanalysis of MD4
    There is an algorithm such that the finding of collisions for MD4 requires the same computational effort as about 220 computations of the MD4-compression ...
  15. [15]
    Cryptanalysis of MD4 | Journal of Cryptology
    Sep 1, 1998 · ... full MD4, while previously only partial attacks were known. An implementation of our attack allows us to find collisions for MD4 in a few ...
  16. [16]
    [PDF] MD4 is Not One-Way
    Collision resistance is the strongest no- tion, so most constructions use a collision resistant hash function, and most cryptanalysis target collision attack.
  17. [17]
    Flame malware collision attack explained - Microsoft
    Jun 6, 2012 · On systems that pre-date Windows Vista, an attack is possible without an MD5 hash collision. This certificate and all certificates from the ...Missing: MD4 | Show results with:MD4
  18. [18]
    The Renaissance of NTLM Relay Attacks: Everything You Need to ...
    Apr 8, 2025 · Relay attacks are the easiest way to compromise domain-joined hosts nowadays, paving a path for lateral movement and privilege escalation.
  19. [19]
    Cryptanalysis of MD4 - SpringerLink
    Dobbertin, H.: The compress function of extended MD4 is not collision-free, preprint. Google Scholar. Dobbertin, H., Bosselaers, A., Preneel, B.: RIPEMD-160 ...