PBKDF2
PBKDF2 (Password-Based Key Derivation Function 2) is a cryptographic key derivation function designed to produce a cryptographic key from a password, salt, and other parameters, intentionally slowing down the process through multiple iterations of a pseudorandom function (PRF) to resist brute-force and dictionary attacks.[1] Defined in the Public-Key Cryptography Standards (PKCS #5) version 2.1 and published as RFC 8018 by the Internet Engineering Task Force (IETF), PBKDF2 applies the PRF—typically HMAC with an approved hash function such as SHA-256—iteratively to derive keys of variable length, making it suitable for password-based encryption and authentication schemes.[1][2] The function takes five inputs: a password P, a salt S (with at least 128 bits of randomness recommended by NIST to prevent precomputation attacks), an iteration count c (a positive integer that determines the computational cost, with higher values increasing security), a desired derived key length dkLen in octets, and the PRF.[1][2][3] The algorithm computes the derived key DK by dividing it into blocks of the PRF output length hLen, then for each block i, it generates a chain of c PRF values starting from U_1 = PRF(P, S || INT(i)), followed by U_j = PRF(P, U_{j-1}) for j = 2 to c, and XORs them to form F(P, S, c, i) = U_1 \oplus U_2 \oplus \cdots \oplus U_c; the full key is the concatenation of these blocks.[1] This iterative chaining ensures that each output block depends on the previous computations, amplifying the work factor against parallel attacks.[1] PBKDF2 is recommended by standards bodies like the National Institute of Standards and Technology (NIST) for password-based key derivation when used with HMAC and an approved underlying hash function, as outlined in Special Publication 800-132, though it notes that the iteration count should be tuned to require approximately 0.25 to 1 second of computation on target hardware, depending on the application, to balance security and usability. Modern guidelines, such as OWASP's as of 2023, recommend at least 600,000 iterations for PBKDF2-HMAC-SHA256.[2][4] It supersedes the earlier PBKDF1 from PKCS #5 v1.5 (RFC 2898), which is retained only for backward compatibility and not advised for new implementations due to limitations like fixed output length.[1] Widely adopted in various cryptographic protocols and systems for secure password storage (e.g., generating hashes resistant to rainbow tables via the salt), PBKDF2 remains a foundational tool in modern cryptography despite the emergence of memory-hard alternatives like scrypt or Argon2 for enhanced protection against GPU-accelerated attacks.[2][1]Introduction
Purpose and Applications
PBKDF2, or Password-Based Key Derivation Function 2, is a deterministic function that derives a cryptographically secure pseudorandom key from a password by applying a pseudorandom function, such as HMAC, iteratively based on inputs including the password, a salt, an iteration count, and the desired key length.[5] This process transforms low-entropy passwords into keys suitable for encryption, message authentication, or other cryptographic operations, ensuring the output appears random even from predictable inputs.[6] The primary applications of PBKDF2 include password-based encryption schemes, such as PBES2, where it generates keys to encrypt data protected by user passwords.[7] It is also widely used for secure password storage in authentication systems, where the derived key serves as a hashed verifier for login credentials, as recommended by NIST for storing password verifiers.[8] Examples include disk encryption tools like Microsoft's BitLocker, which employs PBKDF2 to derive keys from passphrases for full-volume protection. A core feature of PBKDF2 is key stretching, achieved through a high number of iterations that deliberately increase the computational cost of each key derivation, thereby slowing down brute-force and dictionary attacks on compromised passwords.[5] This makes offline attacks on leaked password databases significantly more time-intensive, as each password guess requires repeating the full iteration process.[4]High-Level Operation
PBKDF2 operates by transforming a human-memorable password into a cryptographically secure key through a deliberate, resource-intensive process designed to thwart unauthorized access. The primary inputs include the password, a secret string easy for users to remember; a salt, which is a randomly generated value that ensures each derivation is unique and prevents precomputation attacks such as rainbow tables; an iteration count, representing the number of repeated hashing operations to impose a computational burden; and a specified output length, determining the size of the resulting key in bits or bytes.[2] At a conceptual level, the derivation begins with combining the password and salt as the initial input to a pseudorandom function, typically HMAC based on a secure hash like SHA-256. This function is then applied iteratively—reusing the output of each round as input to the next—for the full iteration count, effectively "stretching" the weak password input into a robust intermediate value. If the desired output exceeds the length of a single iteration's result, multiple such blocks are generated and concatenated to achieve the required key size.[2] The final output is a fixed-length bit string suitable for use as a symmetric encryption key, such as for algorithms like AES. For example, deriving a 256-bit AES key from a weak password like "password123" involves pairing it with a 128-bit random salt and performing 100,000 iterations; this process completes in seconds on standard hardware for authorized access but significantly slows potential attackers.[2]Algorithm Details
Parameters
PBKDF2 requires five primary input parameters to derive a cryptographic key from a password: the password itself, a salt, an iteration count, the desired key length, and a pseudorandom function (PRF).[1] The password (P) is a variable-length octet string serving as the base input, typically derived from user-provided text encoded in UTF-8 to ensure consistent byte representation across systems. It must be handled securely in memory and transmission to prevent leakage, as exposure could undermine the derivation process.[2] The pseudorandom function (PRF) is a keyed function that takes the password as the key and produces a pseudorandom output from an input string; it is typically HMAC with an approved hash function such as SHA-256, with output length hLen in octets.[1] The salt (S) is an octet string of random data, with a minimum length of 128 bits (16 bytes) to provide sufficient entropy and uniqueness for each key derivation.[2] This randomness thwarts precomputation attacks, such as rainbow tables, by ensuring that identical passwords yield different outputs when paired with different salts; the salt is regenerated uniquely for every derivation and stored alongside the derived key for later verification.[1] The iteration count (c) is a positive integer that dictates the number of HMAC computations performed during derivation, with a minimum value of 1 but a recommended baseline of at least 1,000 to increase computational resistance against brute-force attempts.[1] For modern hardware, values of 100,000 or higher—such as 600,000 iterations when using HMAC-SHA256—are advised to balance security against usability, with the exact number tunable based on the threat model, such as higher counts for server-side applications facing offline attacks.[4][2] The desired key length (dkLen) specifies the output size in octets, a positive integer up to (2^32 - 1) times the PRF's output block length, allowing flexibility for various cryptographic needs.[1] Common values include 32 bytes to match the key size for AES-256 encryption.[2]Derivation Process
The PBKDF2 derivation process generates a cryptographic key of specified length from a password by applying a pseudo-random function, typically HMAC, in an iterative manner to increase computational cost and resist brute-force attacks. When the desired key length exceeds the output size of the pseudo-random function, known as hLen, the process computes multiple independent blocks, each of length hLen, and concatenates them, truncating to the exact required length if necessary. The number of such blocks, denoted l, is the ceiling of dkLen divided by hLen. Each block is produced through a chained iteration of the pseudo-random function, followed by bitwise XOR combination of the results.[1] The procedure begins by determining l as described. For each block index i ranging from 1 to l, the process computes a value F specific to that block. To do so, it first generates an initial chain value U1 by applying the pseudo-random function to the password using as input the concatenation of the salt and the four-octet big-endian integer representation of i. Subsequent chain values are then derived iteratively: for each iteration j from 2 to the iteration count c, Uj is obtained by applying the pseudo-random function to the password using the previous chain value U_{j-1} as input. The block Ti is the result of bitwise XORing all c chain values, U1 through Uc.[1] Once all blocks are computed, the full intermediate output T is the concatenation of T1 through Tl. The final derived key consists of the first dkLen octets of T. Notably, the iteration for each block is performed independently, ensuring that the salt and block index uniquely initialize each chain, while the XOR operation aggregates the iterated computations within a block.[1] A high-level outline of the per-block F computation can be described as follows: initialize the result and the first U value using the pseudo-random function on the password with salt concatenated to the block index encoding; then, loop c-1 times, updating U with the pseudo-random function on the password and the prior U, and accumulating the XOR of the new U into the result. The overall derivation then concatenates these F results across blocks before truncation. This structure, defined in the original specification, ensures deterministic output given the inputs while amplifying the effort required for key recovery.[1]Mathematical Formulation
The Password-Based Key Derivation Function 2 (PBKDF2) is formally defined as a function that takes a password P, a salt S, an iteration count c, and a desired key length dkLen in octets, producing a derived key DK of length dkLen. Specifically, \text{PBKDF2}(P, S, c, dkLen) = T_1 || T_2 || \cdots || T_l \quad \text{(first } dkLen \text{ octets)}, where l = \lceil dkLen / hLen \rceil, || denotes concatenation, and hLen is the output block length of the underlying pseudorandom function (PRF) in octets.[1] Each block T_i for i = 1, 2, \dots, l is computed as T_i = F(P, S, c, i), where F is an auxiliary function that applies the PRF iteratively with XOR combinations.[1] The auxiliary function F is defined as F(P, S, c, i) = U_1 \oplus U_2 \oplus \cdots \oplus U_c, with \oplus denoting bitwise XOR, U_1 = \text{PRF}(P, S || \text{INT}(i)), and U_k = \text{PRF}(P, U_{k-1}) for each k = 2, \dots, c. Here, \text{INT}(i) represents the four-octet encoding of the positive integer i in big-endian byte order (most significant octet first).[1] The PRF is a keyed pseudorandom function that takes a secret key and an input string, producing a fixed-length output; by default, it is the HMAC keyed with P and using SHA-1 as the underlying hash function, yielding hLen = 20 octets.[1] However, the definition is extensible to other PRFs, provided their output length hLen is known and they meet the requirements for a secure pseudorandom function.[1] In the edge case where c = 1, the function F(P, S, 1, i) simplifies to a single application of the PRF: F(P, S, 1, i) = \text{PRF}(P, S || \text{INT}(i)), reducing the iteration to a direct computation without subsequent XORs.[1] The resulting DK is obtained by truncating the concatenated blocks to exactly dkLen octets, ensuring the output matches the requested length while maintaining the security properties derived from the iterative mixing.[1]Security Analysis
Resistance to Attacks
PBKDF2 provides strong resistance to brute-force attacks through its iterative application of a pseudorandom function (PRF), typically HMAC, which forces an attacker to perform a computationally intensive process for each password guess.[9] With a high iteration count, such as 600,000 for PBKDF2-HMAC-SHA256, each trial requires the equivalent of approximately 0.1 seconds on a single modern CPU core, significantly slowing down exhaustive searches.[10] This design ensures that the computational cost scales linearly with the number of iterations, allowing defenders to adjust the parameter based on available hardware to maintain a practical verification time while imposing a heavy burden on offline attackers. The mandatory inclusion of a unique, randomly generated salt—typically at least 128 bits long—prevents dictionary attacks and the use of precomputed rainbow tables, as an attacker must generate separate tables for every possible salt value, rendering such optimizations infeasible.[9][4] This salting mechanism ensures that even identical passwords produce distinct derived keys, eliminating the efficiency gains from precomputation across multiple users or entries. PBKDF2 offers partial resistance to time-memory tradeoff attacks due to its iteration-based structure, which limits the benefits of parallelization despite its low memory footprint; however, this makes it susceptible to acceleration on GPUs and ASICs, where thousands of cores can perform iterations concurrently.[11] To achieve effective protection against offline brute-force attacks, guidelines recommend at least 600,000 iterations for PBKDF2-HMAC-SHA256, calibrated to take roughly 100-500 milliseconds on defender hardware.[4] Although PBKDF2's core design does not inherently protect against side-channel attacks, software implementations are vulnerable to timing and cache-based leaks unless constructed with constant-time operations to ensure uniform execution regardless of input.[4]HMAC and Collision Vulnerabilities
PBKDF2 was originally specified in 2000 using HMAC-SHA1 as its default pseudorandom function (PRF), where the key derivation process iteratively applies HMAC with SHA-1 as the underlying hash. This design leverages HMAC's keyed construction to produce pseudorandom output from the password and salt, with iterations intended to increase computational cost against brute-force attacks. In 2017, researchers from Google and the CWI Amsterdam demonstrated the first practical collision attack on SHA-1, achieving two distinct inputs with the same hash output after approximately 2^{63} SHA-1 computations, far below the ideal 2^{80} complexity for a 160-bit hash. While SHA-1's collision resistance is now broken, this does not directly compromise HMAC-SHA1 in PBKDF2, as HMAC security relies primarily on the PRF properties rather than full collision resistance of the hash alone.[12] Theoretically, an attacker could attempt to forge HMAC outputs by exploiting collisions in the inner or outer padded messages, but this requires control over the secret key (the password in PBKDF2), which is infeasible without already knowing it. PBKDF2's salting and high iteration counts further mitigate potential collision-related risks, as each derivation is unique per salt and the iterations chain multiple HMAC calls, preventing direct reuse of precomputed collisions.[13] Nonetheless, SHA-1's degraded security margin—reducing effective collision effort to around 2^{61} in optimized attacks—prompts recommendations against using HMAC-SHA1 for new PBKDF2 implementations. Although the 2000 specification defaults to HMAC-SHA1, a 2011 update to related standards permitted alternative PRFs like HMAC-SHA-256, and post-2017 guidance emphasizes migration to SHA-256 or SHA-512 for PBKDF2 to restore full security margins. NIST's SP 800-131A, revised in 2011 and reaffirmed in subsequent guidance, deprecates SHA-1 for most cryptographic uses, including recommending stronger hashes for key derivation functions like PBKDF2, with a full phase-out targeted by 2030.[14] No complete break of HMAC-SHA1 has been achieved, but its use in legacy PBKDF2 systems warrants caution and eventual replacement to avoid reduced long-term security.Best Practices
When implementing PBKDF2 for password-derived keys, select an iteration count that balances security and usability, aiming for a minimum of 600,000 iterations when using HMAC-SHA-256 to ensure resistance to brute-force attacks on modern hardware.[4] Profile the iteration count on target hardware to achieve a derivation time of 100-500 milliseconds, providing a practical delay for legitimate users while significantly slowing offline attacks.[4] Generate salts of at least 16 bytes (128 bits) using a cryptographically secure pseudorandom number generator (CSPRNG) to prevent rainbow table attacks and ensure uniqueness per derivation. Store the salt alongside the derived key in a modular crypt format, such as$pbkdf2$iterations$salt$key, to facilitate secure retrieval and verification without exposing components.
Choose HMAC-SHA-256 or HMAC-SHA-512 as the underlying pseudorandom function, as these provide stronger collision resistance than HMAC-SHA-1, which should be avoided for new implementations due to known weaknesses; MD5 is entirely unsuitable.[4] Implementations must employ constant-time HMAC operations to mitigate timing attacks that could leak information through execution time variations.[15] Avoid naive parallelization of iterations, as it can undermine security unless performed with hardware-accelerated primitives validated for constant-time behavior.
Be aware that GPU acceleration dramatically reduces the effectiveness of low iteration counts, enabling attackers to perform up to 10^9 guesses per second on optimized hardware for modest configurations, underscoring the need for high iterations.[15] To maintain security as hardware advances, periodically re-derive keys for existing users by increasing the iteration count during password changes or maintenance windows, doubling it approximately every two years.[10]