Chosen-plaintext attack
A chosen-plaintext attack (CPA) is a cryptanalytic attack model in which an adversary can select arbitrary plaintext messages of their own choosing and obtain the corresponding ciphertexts produced by the target encryption scheme under a fixed secret key, with the goal of deriving information about the key or decrypting other ciphertexts.[1] This attack is more powerful than a known-plaintext attack, where the adversary is limited to analyzing pre-existing plaintext-ciphertext pairs, as it allows the attacker to tailor inputs to probe the system's weaknesses systematically.[2] CPAs are classified into non-adaptive and adaptive variants: in the non-adaptive case, all chosen plaintexts are submitted in advance without feedback, while in the adaptive chosen-plaintext attack, the adversary can dynamically adjust subsequent plaintext choices based on the ciphertexts received from prior queries.[2] In modern cryptography, resistance to chosen-plaintext attacks forms a foundational security notion, exemplified by indistinguishability under chosen-plaintext attack (IND-CPA), which requires that even with oracle access to encrypt chosen plaintexts, a probabilistic polynomial-time adversary cannot distinguish the encryption of one target plaintext from another with more than negligible advantage.[3] This model underpins the design of secure symmetric and public-key encryption schemes, ensuring they leak no useful information about plaintexts or keys despite active adversary interaction.[3] Although mounting a full CPA in practice is challenging—often requiring access to an encryption oracle, such as through a compromised device or service—it serves as a rigorous benchmark for evaluating encryption strength, originating from early discussions in public-key cryptography where it was recognized as a critical threat surpassing ciphertext-only attacks.[1] Notable vulnerabilities under CPA models have influenced standards like AES and RSA-OAEP, emphasizing the need for probabilistic encryption to achieve security.[3]Fundamentals
Definition
A chosen-plaintext attack (CPA) is a model of cryptanalysis in which an adversary can select arbitrary plaintext messages and obtain the corresponding ciphertexts produced by an encryption scheme under an unknown secret key, with the objective of recovering the key or enabling decryption of additional ciphertexts.[4] This attack assumes active interaction with the encryption process, distinguishing it from passive eavesdropping, where the adversary only observes existing ciphertexts without influencing the inputs.[1] Key terms in this context include plaintext, which refers to the original, unencrypted message; ciphertext, the encrypted output; and encryption oracle, a conceptual black-box interface that encrypts chosen plaintexts without disclosing the underlying key.[5] The adversary, denoted as \mathcal{A}, interacts with this oracle by submitting plaintexts m and receiving ciphertexts c = E_k(m), where E is the encryption function and k is the secret key hidden from \mathcal{A}. The concept of chosen-plaintext attacks was first formalized in the 1970s amid the development of public-key cryptography, notably in the seminal work by Diffie and Hellman, who highlighted such attacks as a critical threat to encryption systems beyond mere ciphertext-only analysis.[1]Adversarial Model
In the chosen-plaintext attack (CPA) model, the adversary is classified as active, interacting directly with the encryption system by submitting plaintexts of its choice to obtain corresponding ciphertexts, in contrast to passive adversaries who only observe existing communications. This active role enables the attacker to gather targeted information about the encryption process. Adversaries are further distinguished as adaptive or non-adaptive: adaptive adversaries may choose subsequent plaintexts based on responses to prior queries, allowing dynamic strategy refinement, while non-adaptive (or batch) adversaries submit all plaintexts in advance without feedback. The adaptive variant is the standard assumption in formal CPA definitions, as it captures more powerful attacks.[6] Adversaries in the CPA model are constrained to be probabilistic polynomial-time (PPT) algorithms, meaning they run in time polynomial in the security parameter n, and their behavior incorporates randomness to model realistic computational limitations. They are permitted a polynomial number q(n) of queries to the encryption oracle, ensuring the attack remains feasible only under bounded resources. These bounds reflect the foundational assumption in modern cryptography that no efficient algorithm can break secure systems with non-negligible probability.[7][6] The primary goal of a CPA adversary is to distinguish between encryptions of two distinct messages or, more broadly, to recover partial information about plaintexts, such as satisfying a predicate on the message. Success is quantified by the distinguishing advantage \epsilon, defined as the absolute difference between the probability of correctly guessing the encrypted message and random guessing (\frac{1}{2}), where a secure scheme ensures \epsilon is negligible in n for all PPT adversaries. Secondary goals like full key recovery imply CPA vulnerability but are not the minimal requirement for the model.[7][6] The adversary gains black-box access to an encryption oracle \text{Enc}_k(\cdot), which computes ciphertexts for chosen plaintexts under a secret key k without revealing k or internal details of the algorithm. This oracle simulates real-world scenarios where an attacker might control plaintext submission, such as in a compromised encryption device or service, but lacks decryption capabilities. In the indistinguishability formulation, the oracle may be augmented to a left-or-right variant that encrypts one of two submitted messages, formalizing the distinguishing challenge.[6]Attack Methods
General Procedure
A chosen-plaintext attack (CPA) follows a structured methodology that leverages access to an encryption oracle, allowing the adversary to select plaintexts and observe corresponding ciphertexts under a fixed secret key. This general procedure applies across various cryptosystems, particularly block ciphers, and aims to recover the key or forge new ciphertexts by exploiting structural weaknesses. The process is divided into distinct phases to systematically probe and analyze the encryption function.[8] The initialization phase involves setting up the adversarial environment, including establishing access to the encryption oracle and understanding the target cryptosystem's parameters, such as block size, key length, and round structure. During this setup, the adversary may perform preliminary tests with simple plaintexts to confirm oracle functionality and identify any observable behaviors, such as error messages or timing leaks, without yet committing to a full attack strategy. This phase ensures efficient resource allocation for subsequent steps, as seen in standard cryptanalytic frameworks for symmetric ciphers.[9] In the query phase, the adversary adaptively or non-adaptively selects plaintexts to submit to the oracle, receiving the resulting ciphertexts. Plaintext selection strategies are tailored to the anticipated weakness; for instance, choosing all-zero plaintexts can probe substitution patterns in the cipher's S-boxes, revealing key-dependent mappings, while structured inputs like those with specific bit differences target differential properties. The number of queries varies by attack type—representative examples include 2^{43} plaintexts for linear cryptanalysis on DES or fewer for simpler systems—but the goal is to generate plaintext-ciphertext pairs that amplify exploitable biases or correlations. This phase may involve multiple iterations, with each new plaintext potentially informed by prior responses in adaptive scenarios.[9] The analysis phase processes the collected ciphertexts using techniques such as statistical analysis to detect non-random distributions, linear approximations to approximate key bits via high-probability equations over plaintext-ciphertext pairs, or exhaustive search over reduced key spaces derived from partial information. For example, in linear cryptanalysis, the adversary computes correlations between linear combinations of input and output bits across queries to hypothesize subkey values, verifying them against observed data. These methods reduce the effective key space, enabling feasible computation even for large keys.[9] Finally, the exploitation phase utilizes insights from analysis to achieve the attack's objective, such as full key recovery, decryption of arbitrary ciphertexts, or forgery of valid encryptions. Success is measured by the attack's ability to outperform random guessing, typically achieving key recovery with probability significantly greater than 1/2^{key length}, often exceeding 90% in optimized cases like linear attacks on reduced-round ciphers. The attack breaks the system if this advantage allows practical decryption or impersonation.[9] To illustrate the query phase's iterative nature, the following pseudocode depicts a basic loop for submitting chosen plaintexts and collecting responses:This loop can be adapted for adaptive queries by incorporating prior CT values into m_i selection.[8]initialize empty list PT = [], CT = [] for i from 1 to num_queries: select plaintext m_i based on strategy (e.g., all zeros or [differential](/page/Differential) pair) query [oracle](/page/Oracle): c_i = Enc_k(m_i) append m_i to PT append c_i to CT return PT, CT for analysisinitialize empty list PT = [], CT = [] for i from 1 to num_queries: select plaintext m_i based on strategy (e.g., all zeros or [differential](/page/Differential) pair) query [oracle](/page/Oracle): c_i = Enc_k(m_i) append m_i to PT append c_i to CT return PT, CT for analysis
Oracle-Based Techniques
In chosen-plaintext attacks (CPAs), oracle-based techniques exploit the encryption oracle by submitting carefully crafted plaintexts to obtain corresponding ciphertexts, thereby inferring key material or structural weaknesses in the cipher. These methods prioritize strategic query selection to maximize information gain per interaction, distinguishing them from brute-force or passive approaches. Seminal works emphasize the encryption oracle's role in enabling controlled experimentation, as seen in differential cryptanalysis where specific plaintext differences are chosen to propagate through the cipher rounds. Query optimization forms a core element of these techniques, particularly through adaptive querying, which refines subsequent plaintext choices based on prior oracle responses. This iterative refinement allows attackers to efficiently explore the key space or plaintext dependencies, reducing overall query needs compared to non-adaptive strategies. A representative example occurs in cryptanalysis of order-preserving encryption (OPE) schemes, where an adversary uses binary search on the plaintext domain to recover the ordering of encrypted values. Starting with a midpoint plaintext p = \frac{m+1}{2} (where m is the domain size), the attacker queries its encryption and compares it against known points; depending on the result, the search interval halves, achieving key dependency isolation in O(\log N) queries, with N the domain size.[10] Efficiency metrics underscore the practicality of oracle-based techniques, with query complexity often expressed as O(q), where q is the number of required plaintext-ciphertext pairs. In differential cryptanalysis, for instance, breaking the full 16-round DES demands approximately $2^{47} chosen plaintexts, paired with a time complexity of $2^{37} equivalent encryptions for analysis, illustrating how query volume trades against computational resources. Space requirements scale linearly with q for storing pairs, while adaptive optimizations like binary search minimize q (e.g., logarithmic reductions) at the expense of heightened per-query processing. These trade-offs are critical, as excessive queries risk detection, whereas intensive computation may exceed feasible limits.[11] Common pitfalls arise from the encryption oracle's inherent constraints in pure CPA scenarios, which limit outputs to ciphertexts alone, excluding decryption access or feedback on validity—unlike chosen-ciphertext settings that enable padding oracle exploitation through error signals. Consequently, techniques simulating padding-like leaks via edge-case plaintexts (e.g., malformed inputs causing internal validation failures) remain constrained, often yielding incomplete information. To mitigate detection, attackers adopt avoidance strategies such as randomizing query patterns or mimicking legitimate plaintext distributions, ensuring queries blend with normal oracle usage without triggering rate limits or logging anomalies.[12]Historical and Illustrative Examples
Caesar Cipher
The Caesar cipher, dating to the 1st century BCE, was employed by Julius Caesar for securing military and official communications, as documented in ancient accounts by Suetonius and later analyzed in comprehensive histories of cryptography. This simple substitution cipher uses a fixed rotation amount k (typically between 1 and 25) as the key, where encryption shifts each plaintext letter by k positions in the alphabet, mathematically expressed as c = (p + k) \mod 26, with letters represented numerically (A = 0, B = 1, ..., Z = 25).[13] The Caesar cipher exemplifies a basic vulnerability to chosen-plaintext attacks (CPAs) due to its deterministic encryption and limited key space of 25 possibilities. In a CPA scenario, the attacker accesses an encryption oracle to submit chosen plaintexts and receive corresponding ciphertexts, enabling key recovery without needing intercepted messages.[13] To execute the attack, the attacker begins by querying the oracle with the single-character plaintext "A" (p = 0). The resulting ciphertext reveals the key directly: if the output is "D" (c = 3), then k = 3, as (0 + 3) \mod 26 = 3. With the key known, the attacker can compute the decryption function p = (c - k) \mod 26 for any ciphertext, fully compromising the system. For a step-by-step illustration assuming the attacker verifies the mapping across the alphabet (though unnecessary for key recovery), subsequent queries could include "B" (expected ciphertext "E" for k=3), "C" ("F"), and so on up to "Z" ("C"), confirming the uniform shift and exposing the entire substitution table:A→D, B→E, C→F, ..., W→Z, X→A, Y→B, Z→C. In the worst case, exhaustively querying all 26 letters maps the full permutation, but a single targeted query like "A" suffices in practice.[13] This vulnerability arises from the cipher's linear, key-dependent structure, which allows complete key exposure through minimal chosen plaintexts, rendering it insecure against even basic adversarial access to the encryption process. Early 20th-century cryptography texts, such as those by William Friedman, first detailed such systematic attacks on classical ciphers like the Caesar in pedagogical contexts.
One-Time Pad
The one-time pad is a symmetric-key encryption system in which the plaintext message m is encrypted by bitwise XOR with a secret key k of equal length, producing the ciphertext c = m \oplus k. Decryption reverses the process using the same key, as m = c \oplus k, since XOR is its own inverse. This construction requires the key to be generated from a truly uniform random source and used only once for a single message of matching length.[14] Theoretically, the one-time pad provides perfect secrecy against chosen-plaintext attacks, meaning that even an adversary with unlimited computational resources and access to an encryption oracle cannot gain any information about the plaintext from the ciphertext. Claude Shannon proved in 1949 that this security holds if the key is truly random, uniformly distributed over all possible bit strings of the message length, and never reused, ensuring that every possible plaintext is equally likely given any ciphertext. In a CPA setting, unlimited queries to the oracle yield only random-looking ciphertexts indistinguishable from encryptions of arbitrary chosen plaintexts, preserving the key's secrecy.[14] In practice, however, implementation flaws such as key reuse introduce severe vulnerabilities exploitable via chosen-plaintext techniques. During World War II, the Venona project revealed how Soviet intelligence reused one-time pads under wartime production pressures, allowing U.S. cryptanalysts to break numerous diplomatic and espionage messages. By identifying reused pads and employing cribs—guessed common phrases or structures in plaintexts akin to chosen-plaintext submissions—the analysts recovered keys and full messages. A classic two-time pad attack demonstrates this: if the same key k encrypts two distinct messages m_1 and m_2 to yield c_1 = m_1 \oplus k and c_2 = m_2 \oplus k, then XORing the ciphertexts gives c_1 \oplus c_2 = (m_1 \oplus k) \oplus (m_2 \oplus k) = m_1 \oplus m_2. Guessing one plaintext segment (e.g., via a crib) allows recovery of the other, cascading to the full messages and key.[15] Similar reuse issues compromised U.S. diplomatic communications in the 1940s, where the State Department repeatedly used daily keys an average of nine times across messages, enabling enemy cryptanalysts to exploit patterns and recover sensitive content through methods analogous to CPA recoveries. These breaches led to significant intelligence leaks, underscoring how deviations from one-time use undermine the system's theoretical invulnerability.[16]Security Notions and Implications
Indistinguishability under CPA
Indistinguishability under chosen-plaintext attack (IND-CPA) is a fundamental security notion in modern cryptography that formalizes the requirement for an encryption scheme to resist adversaries who can obtain encryptions of chosen plaintexts.[6] In this model, the adversary interacts with an encryption oracle during a query phase, submitting any plaintexts of its choice to receive corresponding ciphertexts, but without access to the secret key.[17] Following this phase, the adversary selects two distinct plaintext messages m_0 and m_1 of equal length and receives the encryption of one of them, chosen uniformly at random (say m_b where b \in \{0,1\}). The adversary then outputs a guess b' for b, aiming to determine which message was encrypted based solely on the challenge ciphertext and prior oracle queries.[6] The IND-CPA game proceeds in two main phases: the query phase, where the adversary \mathcal{A} makes polynomially many encryption queries to the oracle \mathcal{O}, and the challenge phase, where \mathcal{A} receives \text{Enc}(pk, m_b) and attempts to guess b. The advantage of \mathcal{A} in this game is defined as \text{Adv}^{\text{IND-CPA}}_{\mathcal{A},\Pi}(\lambda) = \left| \Pr[b' = b] - \frac{1}{2} \right|, where \lambda is the security parameter and \Pi is the encryption scheme. An encryption scheme \Pi is IND-CPA secure if, for all probabilistic polynomial-time adversaries \mathcal{A}, this advantage is negligible in \lambda.[6] To establish IND-CPA security, cryptosystems are typically proven secure via reductions to well-studied hardness assumptions. For the ElGamal encryption scheme, security reduces to the decisional Diffie-Hellman (DDH) assumption in a cyclic group of prime order. The proof proceeds by a hybrid argument over game hops, where the real IND-CPA game is shown to be indistinguishable from a game where the challenge ciphertext uses a random group element instead of the true encryption. Specifically, consider the following games for an adversary \mathcal{A}:- Game 0: The standard IND-CPA game, where \mathcal{A} wins if b' = b.
- Game 1: Identical to Game 0, except the second component of the challenge ciphertext c_2 = g^{ab} \cdot m_b is replaced by a random r \in [G](/page/G) (independent of m_b).