Format-preserving encryption

Format-preserving encryption (FPE) is a cryptographic method that encrypts plaintext data into ciphertext while preserving the original format of the data, including properties such as length, character set, and structural constraints.^[1] This technique ensures that the ciphertext remains compatible with systems expecting data in a specific format, such as encrypting a 16-digit credit card number into another valid 16-digit number.^[2] The formalization of FPE began with a seminal 2009 paper by Mihir Bellare, Thomas Ristenpart, Phillip Rogaway, and Till Stegers, which defined the security model for FPE schemes and proposed constructions using block ciphers to achieve format preservation through methods such as unbalanced Feistel networks and cycle-walking.^[1] This work established FPE as a way to provide semantic security indistinguishable from a random permutation over the message space under chosen-plaintext attacks.^[1] In 2010, the FFX mode of operation was introduced as a flexible framework for building FPE algorithms, emphasizing efficiency and security for practical applications.^[3] NIST further advanced the field in 2016 by publishing Special Publication 800-38G, which initially recommended two approved FPE modes—FF1 and FF3—for use with underlying block ciphers, providing standardized methods to encrypt domain-specific data like social security numbers or proprietary identifiers without altering storage schemas.^[2] In its 2025 revision (Rev. 1), FF3 was deprecated due to a security vulnerability, leaving FF1 as the approved mode.^[4] These modes leverage tweakable block ciphers to support variable-length inputs and ensure reversibility, making FPE suitable for protecting personally identifiable information (PII) in databases and legacy systems.^[2] While FPE offers strong confidentiality, its security bounds are tied to the size of the format domain, potentially requiring larger key sizes for equivalent protection compared to unrestricted block ciphers.^[1]

Overview

Definition

Format-preserving encryption (FPE) is a cryptographic technique that encrypts plaintext data while preserving its original format and structure, ensuring the ciphertext belongs to the same domain as the plaintext.^[5] In essence, FPE defines a permutation on a finite domain D, where the encryption function \Enc_k: D \to D maps elements of D to other elements within the same domain, and the decryption function \Dec_k: D \to D serves as its inverse, such that \Dec_k(\Enc_k(m)) = m for any plaintext m \in D.^[5] This bijectivity ensures that encryption is reversible and that every possible plaintext has a unique ciphertext counterpart, maintaining the full size of the domain |D| without loss or duplication.^[1] The domain D typically consists of strings of fixed length over a specified alphabet, such as decimal digits or alphanumeric characters, allowing FPE to operate on structured data like identifiers or codes. For a fixed key k from the key space \mathcal{K}, the keyed function \Enc_k must be a bijection on D, providing a one-to-one correspondence that preserves the cardinality of the domain.^[5] This setup contrasts with traditional block ciphers, which often require padding to fit fixed block sizes and produce ciphertexts in a binary format that may not align with the input's structure, potentially necessitating additional formatting layers in applications.^[1] In FPE, no such expansion or reformatting occurs; the ciphertext directly substitutes the plaintext while adhering to the same constraints, such as length and valid character sets.^[5] Practical examples illustrate this preservation: encrypting a 16-digit credit card number (from the domain of all valid 16-digit strings) yields another 16-digit number, ensuring compatibility with systems expecting that format.^[1] Similarly, FPE can encrypt alphanumeric license plates, mapping a string like "ABC123" to another valid plate format such as "XYZ789" under the key k, without altering the overall structure or requiring post-encryption adjustments.^[5] These properties make FPE particularly suited for domains where format integrity is essential, though its security relies on the underlying construction achieving strong pseudorandom permutation behavior.^[1]

Historical Development

The origins of format-preserving encryption (FPE) trace back to the late 1970s and early 1980s, when early efforts focused on adapting block ciphers like the Data Encryption Standard (DES) to handle data in non-binary formats without altering their structure. In 1981, the U.S. National Bureau of Standards (now NIST) published FIPS PUB 74, which included guidelines for implementing DES that described a method to encipher strings over a fixed alphabet, such as decimal digits, by treating the plaintext as a number in that base and performing modular arithmetic with DES outputs to produce a ciphertext in the same domain.^[6] This approach, while innovative for its time, was ad-hoc and lacked formal security analysis, serving primarily as an implementation aid for DES in legacy systems.^[1] By the mid-1990s, interest in preserving data formats grew with the rise of databases and structured information systems, leading to further informal proposals. In 1997, Peter Gutmann discussed a modulo-n addition scheme using a block cipher to encrypt values within restricted ranges, such as credit card numbers or identifiers, in a posting to the sci.crypt newsgroup. That same year, Michael Brightwell and H. Smith introduced the concept of "datatype-preserving encryption" in a presentation at the 20th National Information Systems Security Conference, emphasizing its practical utility for encrypting database fields like Social Security numbers while maintaining compatibility with existing software.^[7] These ideas highlighted the challenges of format preservation but remained largely heuristic, without rigorous cryptographic foundations. The shift toward provably secure FPE began in the early 2000s, driven by the need for constructions that could handle arbitrary domains while offering strong security guarantees. In 2002, John Black and Phillip Rogaway formalized FPE for integers, proposing cycle-walking as a method to map outputs back into the desired domain using a block cipher, though they noted its inefficiency for large domains. This laid groundwork for more systematic approaches. A major milestone came in 2009 with the work of Mihir Bellare, Thomas Ristenpart, Phillip Rogaway, and Till Stegers, who introduced a framework for building efficient, provably secure FPE schemes based on tweakable block ciphers, including unbalanced Feistel networks that support domains of any size while achieving indistinguishability from random permutations under standard assumptions.^[1] Rogaway's 2011 synopsis further synthesized these developments, outlining FPE's security models and constructions.^[5] Standardization efforts accelerated around 2010, with proposals like the FFX mode, developed by Rogaway and colleagues, providing a Feistel-based construction for encrypting variable-length strings over arbitrary alphabets.^[3] In 2016, NIST formalized FPE in Special Publication 800-38G, approving the FF1 and FF3 algorithms as modes of operation for approved block ciphers like AES, enabling secure encryption of formatted data such as payment card numbers.^[8] However, shortly after publication, cryptanalytic results emerged; in 2017, Betül Durak and Serge Vaudenay demonstrated message-recovery attacks on Feistel-based FPE schemes like FF3, exploiting structural weaknesses in small domains to recover plaintexts with practical complexities.^[9] Subsequent analyses reinforced these vulnerabilities. This led to NIST's revision of SP 800-38G in 2025, withdrawing FF3 entirely while retaining FF1 with strengthened parameters to ensure security for domains of at least one million elements.^[10] Overall, the evolution of FPE has progressed from rudimentary, DES-era techniques to modern, provably secure schemes integrated into standards, reflecting a post-2000s emphasis on cryptographic rigor amid growing demands for data privacy in formatted environments.

Motivation and Applications

Reasons for Format Preservation

Format-preserving encryption (FPE) addresses critical constraints in legacy systems, where data structures often feature fixed field lengths that cannot accommodate the variable or expanded outputs produced by traditional encryption schemes, such as padded block ciphers. For instance, mainframe databases commonly enforce rigid formats like 8-character fields, making standard ciphertexts incompatible without extensive refactoring. By mapping plaintexts to ciphertexts of identical length and character set, FPE enables retrofitting of encryption into these environments without disrupting operational workflows or requiring hardware upgrades.^[11]^[12] Another primary motivation stems from format-specific requirements that ensure data validity for downstream applications, particularly where built-in checks like the Luhn algorithm must remain intact. Credit card numbers, for example, include a check digit computed via the Luhn method to verify integrity; conventional encryption would typically produce outputs failing this validation, halting processing in payment gateways or verification systems. FPE constructions preserve such structural invariants by transforming valid inputs into equally valid outputs, maintaining seamless interoperability with existing validation logic.^[13] FPE also alleviates integration challenges in contemporary systems, such as SQL databases and APIs that rely on exact data formats for queries, joins, and serialization. Altering field lengths or types via traditional encryption could necessitate schema modifications, API revisions, or custom parsing layers, introducing complexity and potential points of failure. With FPE, sensitive data can be encrypted in place, preserving referential integrity and allowing applications to process ciphertexts as if they were plaintexts, thus minimizing deployment friction.^[12] In terms of efficiency, FPE eliminates the need for ancillary format conversions or additional storage to handle ciphertext expansion, which is especially advantageous in high-volume transactional systems where even minor overheads can accumulate significantly. This direct mapping reduces both computational load and database footprint, enabling scalable protection without performance degradation.^[12] Finally, FPE supports regulatory compliance by facilitating the encryption of personally identifiable information within unaltered data pipelines, easing tokenization and auditing processes under standards like PCI DSS. This approach allows organizations to meet security mandates for protecting formats like credit card numbers while avoiding pipeline disruptions that could complicate verification or reporting.^[13]

Real-World Use Cases

In the financial sector, format-preserving encryption (FPE) is widely used to encrypt primary account numbers (PANs) and other account identifiers while maintaining their original digit length, ensuring compliance with Payment Card Industry Data Security Standard (PCI DSS) requirements.^[14]^[15] This approach allows financial institutions to perform fraud detection, transaction auditing, and reporting without altering database schemas or disrupting legacy payment processing systems.^[14] For instance, credit card numbers can be encrypted such that a 16-digit input yields a 16-digit ciphertext, preserving compatibility with validation algorithms and point-of-sale integrations.^[16] In healthcare, FPE facilitates the tokenization of sensitive patient identifiers, such as Social Security numbers (SSNs) and medical record numbers, within electronic health record (EHR) systems, thereby supporting Health Insurance Portability and Accountability Act (HIPAA) compliance.^[14]^[15] By retaining the alphanumeric format, encrypted data remains usable for queries, joins, and reporting in clinical workflows, avoiding disruptions to search functionalities or interoperability standards like HL7.^[17] Examples include encrypting nine-digit SSNs to produce similarly structured tokens, enabling secure analysis of patient data in quality assurance and research environments without exposing personally identifiable information (PII).^[16] For data analytics, FPE enables the secure sharing of de-identified datasets in multi-tenant cloud environments, particularly for AI and machine learning (ML) training where format preservation ensures referential integrity across distributed systems.^[18] In platforms like Snowflake or AWS, encrypted fields such as emails or phone numbers maintain their structural properties, allowing joins, filters, and aggregations without decryption, thus supporting privacy-preserving analytics on sensitive spans of data.^[17] This is especially valuable in multi-tenant SaaS applications, where tenant-specific keys isolate encrypted data while permitting collaborative ML model development on anonymized datasets.^[17] In government and identity management, FPE protects sensitive identifiers in systems handling voter information and official documents, maintaining alphanumeric structures essential for operational continuity.^[14]^[15] For example, it secures Social Security numbers in human resources databases and similar IDs in voter registration systems, enabling secure data sharing with partners while complying with privacy regulations like GDPR equivalents, without requiring schema modifications in legacy COBOL-based infrastructures.^[14] This preserves functionality for identity verification processes, such as those in passport databases, where format changes could invalidate cross-referencing protocols.^[15] As of 2025, notable developments include the integration of FPE for PII protection in SnapLogic's data integration platform, where the new FPE Snap Pack employs the NIST-approved FF1 algorithm to encrypt fields like SSNs and credit card numbers in real-time pipelines, enhancing compliance in hybrid cloud environments.^[16] Similarly, ALTR's solutions leverage FPE in multi-tenant environments to dynamically encrypt sensitive data for analytics and sharing, supporting deterministic operations across tenants without performance overhead.^[17] These advancements underscore FPE's role in enabling operations akin to homomorphic encryption, such as range queries and pattern matching on ciphertext, without altering data formats or necessitating full decryption.^[18]^[14]

Theoretical Foundations

Comparison to Random Permutations

An ideal random permutation on a domain D is a uniformly chosen bijection from D to itself, offering perfect secrecy as it maps every element equally likely to any other without patterns or biases.^[13] This construction provides the theoretical gold standard for format-preserving encryption (FPE), ensuring that no information about the plaintext leaks through the ciphertext beyond the format preservation itself. However, generating and storing such a permutation becomes computationally infeasible for large domains, as it requires precomputing and maintaining a table of size |D|, each entry specifying the image of an input under the bijection.^[19] For practical domains in FPE applications, such as credit card numbers forming a set of size $10^{16}, storing a random permutation table would demand approximately $10^{16} entries (e.g., 64 bits each), equating to roughly 80 petabytes of storage—far beyond current practical capabilities and economic feasibility.^[13] Even smaller domains, like $10^{10} elements, render table-based approaches like the prefix cipher impractical due to the linear space and time requirements for initialization.^[19] In contrast, the total number of possible permutations, |D|!, grows factorially, making enumeration or random selection without storage impossible even for modest |D|, as it exceeds any polynomial-time computation.^[13] FPE schemes address this by constructing keyed pseudorandom permutations (PRPs) that mimic the behavior of ideal random permutations under computational constraints, using efficient algorithms like Feistel networks to achieve bijections on arbitrary finite domains.^[19] These PRPs are deterministic functions keyed with a secret value, computable in polynomial time relative to the security parameter, avoiding the exponential storage and time costs of true random permutations.^[13] The efficiency stems from leveraging underlying block ciphers or hash functions to build the permutation iteratively, enabling encryption and decryption in constant or logarithmic time per operation, scalable to large domains like $2^{128}.^[19] The primary security goal of FPE is computational indistinguishability from a random permutation, analogous to IND-CPA security but adapted for permutations: an adversary with access to an encryption oracle (chosen-plaintext queries) should not distinguish the FPE scheme from a truly random bijection with more than negligible advantage, bounded by the number of queries and domain size.^[13] This PRP security ensures that for practical query limits (e.g., far below \sqrt{|D|}), the permutation appears uniformly random, providing strong protection against pattern-revealing attacks while maintaining format preservation.^[19]

Comparison to Block Ciphers

Block ciphers can be viewed as a special case of format-preserving encryption (FPE) when operating on their native domain, such as AES-128 mapping bit strings from the set {0,1}^128 to itself in a bijective manner.^[13] However, this preservation is limited to binary formats of fixed length matching the block size, rendering standard block ciphers unsuitable for arbitrary formats like decimal strings or credit card numbers without additional processing.^[13] Common modes of operation for block ciphers highlight further limitations in achieving format preservation. The Electronic Codebook (ECB) mode preserves the input format by encrypting each block independently, but it is insecure due to its susceptibility to pattern-revealing attacks, such as those exploiting repeated plaintext blocks.^[20] In contrast, modes like Cipher Block Chaining (CBC) and Output Feedback (OFB) typically alter or expand the format through chaining or feedback mechanisms, often requiring padding that disrupts the original structure and prevents direct preservation.^[21] Adapting block ciphers to non-power-of-two domains, such as strings of length n over a decimal alphabet (domain size 10^n), introduces significant challenges. Direct mapping to the cipher's block size can lead to inefficiencies, such as excessive padding or truncation, or introduce information leakage if the mapping is not perfectly uniform.^[13] These issues arise because block ciphers are optimized for power-of-two bit lengths, making bijections over irregular domains computationally expensive without specialized techniques like ranking functions.^[13] FPE addresses these shortcomings by directly handling arbitrary domains without necessitating binary conversion or format-altering padding, thereby maintaining compatibility with existing systems.^[13] Moreover, FPE constructions often integrate underlying block ciphers, such as AES, within structures like Feistel networks to leverage their proven security while achieving preservation over custom formats.^[13] Historically, early block ciphers like DES provided incidental format preservation over its 64-bit domain, as referenced in early standards like FIPS PUB 74.^[13] However, with modern data formats exceeding such fixed block sizes—such as 16-digit credit card numbers—the limitations of these early approaches became evident, prompting the development of dedicated FPE methods, including the datatype-preserving encryption introduced by Brightwell and Smith in 1997.^[22]

Security Concepts

Indistinguishability Security

While the ideal security notion for format-preserving encryption (FPE) is computational indistinguishability from a random permutation on the message domain—mirroring the pseudorandom permutation (PRP) security definition used for block ciphers—practical FPE schemes often target weaker but sufficient notions such as single-point indistinguishability (SPI).^[13] In the PRP framework, an efficient adversary with oracle access to the encryption function under a secret key cannot distinguish it from a randomly chosen permutation with more than negligible probability.^[13] Formally, for an adversary A making up to q queries, the distinguishing advantage is given by

\Adv^{\FPE}(A) = \left| \Pr\left[A^{\Enc_k}(\cdot) = 1\right] - \Pr\left[A^{\pi}(\cdot) = 1\right] \right|,

where \Enc_k denotes the keyed FPE encryption oracle, \pi is a random permutation from the domain to itself, and the probabilities are taken over the random key k and random choice of \pi.^[13] An FPE scheme achieves PRP security if \Adv^{\FPE}(A) is negligible as a function of the security parameter \lambda for all probabilistic polynomial-time adversaries A.^[13] For SPI security, which suffices for many applications by ensuring single encryptions are indistinguishable from random, typical bounds scale with terms like q / M, where M is the minimum domain size across slices; for domains of size approximately $2^n, this provides security up to queries limited by slice sizes, with full PRP approximations holding up to roughly q \approx 2^{n/2} before birthday attacks degrade distinguishability.^[13] A key variant is strong PRP (SPRP) security, which extends the notion to handle adaptive chosen-ciphertext queries via a decryption oracle, providing indistinguishability under IND-[CCA](/page/CCA) for deterministic schemes like FPE.^[13] This stronger model ensures robustness even if the adversary can query both encryptions and decryptions adaptively, excluding the test challenge itself.^[13] FPE indistinguishability relies fundamentally on the computational security of an underlying PRP or PRF (e.g., AES) incorporated into the construction, as no information-theoretic security is possible for large finite domains without exponential-time assumptions.^[13] Thus, the overall security inherits and amplifies the base primitive's PRP advantage, with reductions showing that breaking FPE security implies breaking the underlying primitive.^[13]

Adversarial Models and Proofs

In format-preserving encryption (FPE), adversarial models define the capabilities of attackers seeking to distinguish the scheme from a random permutation over the domain. The standard models include chosen-plaintext attacks (CPA), where the adversary queries the encryption oracle with chosen plaintexts, and chosen-ciphertext attacks (CCA), which additionally allow decryption queries. These models are adapted for FPE's domain-specific nature, often incorporating tweaks to model attacks on fixed formats like credit card numbers or social security identifiers. Tweakable FPE (tFPE) extends these models by including a tweak parameter, typically derived from domain metadata, which acts as an additional input to the encryption function. This tweakability mitigates risks in multi-domain deployments by ensuring that encryptions under different tweaks behave as independent permutations, limiting the adversary's ability to correlate queries across domains. In CPA and CCA settings, tFPE security is proven under the assumption that the underlying block cipher is a strong pseudorandom permutation (PRP), with the tweak space expanding the effective key diversity.^[3] Proof techniques for FPE security heavily rely on the Luby-Rackoff theorem, which establishes that a balanced Feistel network with independent pseudorandom round functions yields a PRP after three rounds under CPA and a strong PRP (secure against CCA) after four rounds, assuming ideal round functions. For FPE, this is extended to unbalanced Feistel networks to handle arbitrary domain sizes, where security proofs bound the adversary's distinguishing advantage (typically for SPI) by the pseudorandom function (PRF) security of the round functions plus terms scaling with the number of queries q and domain parameters. Specifically, for an r-round unbalanced Feistel like FE2, the SPI advantage is at most $2 \cdot \Adv_{\PRF}(q) + 2q / a_m, where a_m is the minimum slice size and n relates to overall bit length; full PRP approximations add birthday terms like q^2 / 2^{n}, achieving security up to roughly q \approx \min(2^{n/2}, a_m) queries before limitations dominate.^[13] In small domains common to FPE applications, such as numeric strings of length 16, the birthday bound restricts security to about \sqrt{|D|} queries, where |D| is the domain size, necessitating additional rounds or tweaks to push attacks beyond practical feasibility. Tweakability further alleviates this by treating each tweak as a separate instance, effectively multiplying the query bound by the tweak space size.^[3] Seminal constructions like those by Black and Rogaway, improved in subsequent work, demonstrate SPI security for FPE under CPA by basing unbalanced Feistels on AES as the round function, with proofs reducing security to the PRF advantage of AES plus terms like q / a_m: \Adv_{\SPI} \leq 2 \cdot \Adv_{\PRF}^{\AES}(q) + 2q / a_m. These proofs assume an adaptive adversary but hold for up to q \ll \min(a_m, 2^{n/2}) queries, confirming that AES-derived FPE resists domain-specific attacks when sufficiently many rounds (e.g., 10–12) are used.^[13]

Algorithms and Constructions

Black and Rogaway Methods

The foundational methods for format-preserving encryption (FPE) were introduced by Black and Rogaway in their 2002 work on ciphers for arbitrary finite domains, later extended and analyzed by Black, Rogaway, and Shrimpton in 2009. These constructions transform a standard block cipher, such as AES-128 serving as a pseudorandom permutation (PRP), into an FPE scheme over a domain D of arbitrary size n by encoding domain elements as integers in [0, n-1]. The methods ensure bijectivity to support decryption while preserving the input format, with security reductions to the underlying PRP's properties. The prefix cipher method encrypts by prefixing the plaintext value to an input for the block cipher and adjusting the output to fit the domain size n. Specifically, for a plaintext x \in D, the scheme prepends a distinguishing prefix (e.g., a counter or fixed string) to x to form a full block input, applies the PRP to produce an output block, extracts the relevant portion, and maps it modulo n or via rejection to yield a ciphertext in D. This ensures the mapping is a permutation equivalent in security to the block cipher, but it requires up to n PRP evaluations in the worst case for precomputation or per-encryption, making it practical only for small domains (e.g., n \leq 2^{20}).^[23] Cycle walking addresses domain mismatch by starting with the encoded plaintext x \in [0, n-1], applying the PRP to obtain y = E_k(x), and if y \notin [0, n-1], iteratively reapplying the PRP (i.e., y \leftarrow E_k(y)) until the output falls in the domain. Bijectivity follows from the PRP's permutation property, which decomposes the space into cycles; walking ensures each starting point in D maps uniquely to another point in D without fixed points outside. The expected number of PRP calls is $2^{b}/n, where b is the block size (e.g., 128 for AES), so it suits small-to-medium domains (e.g., n \approx 2^{32}) but becomes inefficient for very small n due to frequent iterations. No timing attacks from iteration counts are feasible under standard PRP assumptions.^[23]^[13] The Feistel network method builds a multi-round Feistel cipher directly over the numeric domain, using the PRP as the round function to split and recombine values via modular arithmetic. For an unbalanced variant suitable for non-power-of-two n = ab with a, b \approx \sqrt{n}, the input is partitioned into left (L \in \mathbb{Z}_a) and right (R \in \mathbb{Z}_b) parts; each round computes L' = R, R' = L \oplus F(R), where F is the PRP applied to R (padded if needed), with 4-6 rounds typically sufficient for security. Balanced Feistels can be used when n allows even splits. This yields a constant number of PRP calls (e.g., 4-6), making it efficient for large domains (e.g., n > 2^{64}), and inherits provable security from classical Feistel analyses under PRP assumptions.^[23]^[13]

Thorp Shuffle

The Thorp shuffle is a format-preserving encryption construction inspired by a card shuffling technique originally developed by Edward O. Thorp for analyzing blackjack probabilities.^[24] Thorp introduced the method in 1973 to model efficient deck mixing, and it was later analyzed for cryptographic use in the context of unbalanced Feistel networks, notably by Stefan Lucks in 1996.^[24] This adaptation leverages the shuffle's ability to generate pseudorandom permutations over small domains, making it suitable for encrypting structured data without altering its format. The construction operates as a maximally unbalanced Feistel network over the input domain [N] (representing values from 0 to N-1), treated as a "deck" of cards in binary representation. In each round, the network processes the domain using bit-wise splits (e.g., 1 bit versus n-1 bits for an n-bit domain), where a pseudorandom function (PRF), such as one derived from a block cipher like AES, generates bits that determine whether to interleave or position elements by deciding swaps or placements in the permutation. Multiple rounds (typically 6) compose to form the full permutation, ensuring the output remains in the same domain.^[24] Security analyses show that 6 rounds of the Thorp shuffle achieve pseudorandom permutation (PRP) security, with an adversary advantage bounded by approximately $6q^2 / [N](/page/N+) queries under the PRF security assumption, where q is the number of queries and N is the domain size.^[24] This makes it particularly efficient for numeric domains such as dates (e.g., YYYY-MM-DD strings) or identifiers (e.g., employee IDs), requiring only O(\log [N](/page/N+)) PRF evaluations per encryption in practice, often accelerated by processing multiple bits per call. For example, to encrypt a 4-digit year like 2025 in the domain [1000, 9999] (N = 9000), the value is mapped to an index in [0, 8999], then subjected to 6 rounds of shuffling using a keyed PRF; the resulting index is remapped back to a 4-digit year, preserving the numeric format while randomizing the output.^[24] Key advantages include its simplicity—requiring no complex state management—and the fact that it generates a full permutation without needing cycle detection, unlike cycle-walking approaches that may loop indefinitely on small domains.^[24] As a specialized unbalanced Feistel variant, it builds on general Feistel principles but optimizes for minimal round functions to mix the domain rapidly.^[24]

Hasty Pudding Cipher

The Hasty Pudding Cipher (HPC) was proposed by Richard Schroeppel in June 1998 as a candidate in the Advanced Encryption Standard (AES) competition organized by NIST, with a revised specification released in May 1999.^[25] Designed as a versatile block cipher, it was later recognized for its suitability in format-preserving encryption (FPE) due to its inherent support for arbitrary input sizes without requiring padding or length expansion.^[5] Unlike traditional fixed-block ciphers, HPC enables direct encryption of data in small domains, making it one of the earliest constructions adaptable for FPE applications.^[3] HPC operates as a variable-block-size cipher that accepts plaintext blocks and keys of any length (0 or more bits), processing them without fixed boundaries.^[25] It employs five specialized sub-ciphers tailored to different block size ranges: HPC-Tiny for 0-35 bits, HPC-Short for 36-64 bits, HPC-Medium for 65-128 bits, HPC-Long for 129-512 bits, and HPC-Extended for over 513 bits, allowing padding-free encryption for domains up to 256 bits in typical FPE scenarios.^[25] The cipher supports variable key sizes and incorporates a tweakable mechanism via an optional "spice" input (a 512-bit secondary key) to enhance variability, functioning in a manner that blends block and stream cipher characteristics for flexible data handling.^[26] At its core, HPC's structure revolves around key expansion to generate 256-entry tables (KX tables) of 64-bit words, initialized with mathematical constants like approximations of π, e, and the golden ratio, then stirred using a lossy mixing function involving XOR, addition, subtraction, and bit shifts modulo 2^64.^[25] Data is processed as arrays of 64-bit words, with a central "stirring" operation that mixes state variables (s0 through s7) through multiple passes of table lookups, arithmetic operations, and permutations to achieve diffusion and confusion, effectively combining substitution-like table accesses with permutation via word rearrangements.^[25] This format-aware design directly maps inputs to outputs of the same length and type, such as numeric fields or short strings, without altering the underlying data format.^[5] HPC is particularly well-suited for encrypting short fields, such as 4-8 character alphanumeric strings (typically 32-64 bits), where preserving exact length and domain is critical, and has been incorporated into some legacy cryptographic systems from the late 1990s AES era.^[25] However, despite its innovative flexibility, HPC has received less rigorous security analysis compared to modern FPE schemes like those in the FFX family, with limited formal proofs of security and no widespread adoption following its elimination from the AES selection process.^[5] Its use today is confined to niche or historical contexts, as contemporary standards favor more thoroughly vetted constructions.^[3]

FFX Family Schemes

The FFX (Format-preserving, Feistel-based eXtension) mode of operation provides a tweakable construction for format-preserving encryption, enabling the encryption of plaintexts from arbitrary domains while preserving the original format.^[3] Introduced by Mihir Bellare, Phillip Rogaway, and Terence Spies in 2010, FFX builds on Feistel networks to support flexible parameters such as radix (alphabet size), message length, and tweak values, allowing it to handle diverse data formats like numeric strings or binary fields without requiring domain-specific adjustments.^[3] The scheme operates over a domain D of size up to $2^{128}, using an underlying pseudorandom permutation (PRP) like AES-128 in a balanced or unbalanced Feistel structure with a variable number of rounds.^[3] FF1, a specific instantiation of the FFX framework, employs a 10-round Feistel network with AES-128 as the round function PRP.^[8] It accommodates radices from 2 to $2^{16} and lengths from 2 up to nearly $2^{32} symbols, with a minimum domain size of 1 million as of the 2025 revision, making it suitable for encrypting large datasets like credit card numbers while maintaining full NIST approval as a general-purpose FPE method.^[8] The construction ensures tweakable security, where the tweak incorporates domain parameters to prevent cross-domain attacks. As of November 2025, FF1 is the only NIST-approved FPE method following the withdrawal of FF3 and FF3-1.^[10] FF3, another FFX-based scheme, uses 8 rounds with AES-128 but limits domain sizes more conservatively, with lengths up to \lfloor \log_{\text{radix}}(2^{96}) \rfloor symbols and a minimum radix-length product of 100.^[8] Initially approved by NIST in 2016 alongside FF1, FF3 was withdrawn on February 3, 2025, in the second public draft (2PD) of SP 800-38G Revision 1, due to cryptanalytic attacks including 2017 slide attacks exploiting poor domain separation in small domains (e.g., sizes around 100 to 1 million) and a weakness in the tweak schedule identified by Beyne et al.^[10]^[27]^[28] These vulnerabilities render FF3 unsuitable for general use, with NIST recommending migration to FF1. FF3-1, introduced in the 2019 initial public draft of NIST SP 800-38G Revision 1 as a revision of FF3, featured modifications such as a reduced 56-bit tweak and a mandatory minimum domain size of 1 million to mitigate small-domain attacks.^[29] Retaining the 8-round Feistel structure with AES-128, FF3-1 aimed to enhance security through better round-key derivation but was also withdrawn in the February 3, 2025, 2PD due to the same tweak schedule weakness affecting FF3.^[10]^[28] In all FFX variants, the core round function follows a standard Feistel update:

\begin{align*} F_i(P_L, P_R) &= \text{PRP}_k(P_L \parallel \text{tweak}_i) \mod |D_R| \end{align*}

where P_L and P_R are the left and right halves of the plaintext, k is the key, \text{tweak}_i is the round-specific tweak, and |D_R| is the size of the right-half domain.^[3] This modular reduction ensures the output fits the target domain, preserving format integrity across rounds.

Other Constructions

Variable Input Length (VIL) mode provides a method for format-preserving encryption (FPE) over domains of arbitrary length, particularly suited for power-of-two sizes, by constructing a variable-input-length tweakable cipher from a fixed-input-length block cipher.^[13] It employs a rank-then-encipher paradigm, where the plaintext is ranked within its domain to an integer index, encrypted using an underlying block cipher in a Feistel network, and then unranked to produce the ciphertext.^[13] For power-of-two domains like binary strings of length n, VIL uses unbalanced Feistel constructions (e.g., type-1 or type-2) with pseudorandom functions (PRFs) derived from smaller block ciphers such as AES, applying modular arithmetic in round functions to ensure security against chosen-plaintext attacks with sufficient rounds (e.g., at least \lceil \log_2 n \rceil + 1 for type-1 Feistels).^[13] This approach supports efficient encryption for non-standard lengths while preserving the input format, with security proofs for standard permutation indistinguishability models.^[13] The Format-preserving Number-theoretic mode (FNR), or Flexible Naor-Reingold, is a block cipher designed for small-domain FPE, particularly numeric formats like credit card numbers or IP addresses, using modular arithmetic over finite fields. It builds on the Naor-Reingold construction by employing random invertible matrices in GF(2 for pairwise-independent permutations, processing inputs of arbitrary lengths from 32 to 128 bits through a series of linear transformations and key-dependent mixing steps. FNR preserves the exact length and numeric structure of the input, making it suitable for network protocols or database fields, with efficiency gains over Feistel-based methods for domains smaller than 128 bits. Security analyses confirm resistance to distinguishing attacks up to certain query complexities, though it has faced cryptanalytic scrutiny for reduced-round variants.^[30] Format-preserving encryption for JPEG 2000 images addresses syntax constraints by avoiding forbidden byte sequences, such as 0xFF as a standalone marker or in pairs exceeding 0xFF8F, using stream ciphers or cycle walking on byte streams.^[31] In the stream cipher approach, a keystream (e.g., from RC4) discards 0xFF values before XORing with plaintext bytes, leaving 0xFF bytes unchanged to maintain compliance, which secures approximately 99% of image data with minimal overhead.^[31] Alternatively, cycle walking applies a block cipher (e.g., AES) to selected non-0xFF bytes and re-encrypts if invalid sequences appear, ensuring the codestream remains decodable by standard JPEG 2000 parsers while providing semantic security.^[31] These methods integrate with JPSEC standards for selective encryption of packet bodies, balancing security and format fidelity.^[31] FEA-1 and FEA-2 are South Korean national standards for FPE, tailored for formats like resident registration numbers, using Feistel networks built from tweakable Even-Mansour block ciphers to ensure length and character set preservation.^[32] Both schemes employ multi-round Feistels with dedicated round functions incorporating S-boxes and diffusion layers, similar to DES, operating on fixed-length inputs (e.g., 13 digits for IDs) via tweakable primitives that adjust per domain.^[32] FEA-1 uses a balanced structure for general numeric domains, while FEA-2 optimizes for unbalanced partitions, achieving security against linear and differential attacks with 8-10 rounds for 128-bit keys.^[33] Recent cryptanalysis in 2025 has improved square attacks, enabling key recovery on 5 rounds for FEA-1 and 7 rounds for FEA-2, though full-round instances remain secure; these findings underscore the need for ongoing evaluation.^[33] They support efficient hardware implementation for national ID systems, with proven bounds under tweakable strong pseudorandom permutation models.^[32] Among miscellaneous constructions, Peter Gutmann's early method for file encryption uses modular addition to map data within restricted ranges, preserving file formats by adding a key modulo the domain size (e.g., for printable characters).

Standardization and Acceptance

NIST Standards

The National Institute of Standards and Technology (NIST) has played a pivotal role in standardizing format-preserving encryption (FPE) through its Special Publication (SP) 800-38G, initially released in March 2016 following drafts dating back to 2012. This document specifies two approved FPE methods, FF1 and FF3, as modes of operation for underlying approved block ciphers like AES, designed to encrypt data while preserving its format, such as numeric strings or identifiers. A key security requirement in the original specification is a minimum domain size of 100 for both methods, though NIST recommends at least 1,000,000 to ensure adequate security margins against potential attacks.^[34] In response to emerging cryptanalytic concerns, NIST issued a draft revision of SP 800-38G in February 2019, introducing FF3-1 as a modified version of FF3 with a shortened 56-bit tweak to mitigate known vulnerabilities, while retaining FF1. This revision strengthened the domain size requirement to a mandatory minimum of 1,000,000 for both FF1 and FF3-1, reflecting analysis that smaller domains could compromise security. Implementations must use keys of 128, 192, or 256 bits from approved block ciphers, and tweaks—additional inputs that can incorporate context like timestamps or user IDs—are required for all encryptions to enhance security by preventing identical plaintexts from producing identical ciphertexts under the same key. Validation of FPE implementations occurs through NIST's Cryptographic Algorithm Validation Program (CAVP), which tests conformance to SP 800-38G specifications, ensuring interoperability and security for certified modules.^[35] Further updates culminated in the second public draft of SP 800-38G Revision 1 on February 3, 2025, which fully withdrew FF3 and FF3-1 due to practical cryptanalytic attacks, such as linear cryptanalysis, that demonstrated vulnerabilities on domains smaller than 10^8, rendering these methods unsuitable for general use. The revised standard now endorses only FF1, with the elevated domain size requirement upheld to maintain 128-bit security levels. For U.S. federal systems handling formatted sensitive data, such as social security numbers or financial identifiers, adherence to these NIST-approved FPE methods is mandatory under Federal Information Processing Standards (FIPS), promoting secure data protection without altering downstream processes.^[36]

International and Other Standards

South Korea has developed national standards for format-preserving encryption through the Format Encryption Algorithms (FEA). FEA-1 and FEA-2 are Feistel-based constructions designed for secure encryption of structured data, such as resident registration numbers, which require preservation of numeric formats for compliance and usability in government systems.^[37]^[33] These algorithms employ tweakable block ciphers in multi-round Feistel networks to ensure the ciphertext matches the plaintext domain exactly, facilitating seamless integration into legacy databases handling personally identifiable information.^[36] The ISO/IEC 27033 series offers comprehensive guidelines on network security, emphasizing encryption techniques to protect data in transit and at rest, where format-preserving methods can support format retention without dedicated specifications for FPE itself. This framework indirectly aids FPE adoption by outlining controls for secure information flows, allowing organizations to apply format-preserving schemes in networked environments while aligning with broader information security management practices.^[38] In the payment industry, the Payment Card Industry Data Security Standard (PCI DSS) version 4.0, released in 2022, implicitly supports format-preserving encryption within its tokenization guidelines for protecting cardholder data. Specifically, PCI DSS recognizes reversible encryption methods like FPE for generating tokens from primary account numbers (PANs), noting that such tokens—derived mathematically from the original data—must address additional compliance considerations to minimize scope and risk exposure.^[39] Beyond these, the European Telecommunications Standards Institute (ETSI) incorporates encryption requirements in telecom security specifications, such as those for pervasive network protection, where format-preserving techniques help maintain data integrity for communication identifiers and logs.^[40] In the European Union, post-GDPR developments promote FPE as part of data masking strategies to safeguard personally identifiable information (PII) while preserving usability in processing pipelines, aligning with privacy-by-design principles under evolving regulatory guidance.^[41] As of 2025, format-preserving encryption sees increasing adoption for international compliance, particularly in cross-border cloud data sharing, where it enables secure collaboration without disrupting data pipelines or violating residency rules.^[42] This trend reflects a broader shift toward layered privacy tools, with FPE complementing general encryption mandates in global frameworks influenced by NIST's foundational work.^[16]

Implementations

Open-Source Libraries

Several open-source libraries provide implementations of format-preserving encryption (FPE) algorithms, focusing on NIST-approved methods such as FF1 and FF3-1, with many updating their support in light of the 2025 withdrawal of FF3 from the NIST standard.^[10] These libraries typically include features like tweakable encryption parameters, domain size validation to ensure format preservation, and integration with underlying block ciphers such as AES.^[43] They are hosted on platforms like GitHub, where active maintenance includes security patches and compatibility updates as of 2025.^[44] The Bouncy Castle library, a widely used Java cryptography suite, includes support for FF1 and FF3-1 through its FPEEngine and FPEFF3_1Engine classes, enabling format-preserving operations on strings and numeric domains while adhering to NIST SP 800-38G specifications.^[45] It supports tweakable modes for contextual encryption and validates input domains to prevent format violations, making it suitable for enterprise Java applications.^[45] In Go, the capitalone/fpe package offers an implementation of FF1 and FF3 algorithms, providing functions for encrypting and decrypting byte slices while preserving length and format, with built-in support for radix-based domains and tweak values.^[43] This library was last updated in 2022 and archived since 2021, with forks available for continued use.^[43] It includes examples for numeric and alphanumeric preservation, and it validates domain sizes per NIST guidelines to avoid insecure small domains.^[43] For Python, the ff3 package implements FF3 and FF3-1 for format-preserving encryption of strings, supporting custom alphabets and tweakable keys, with domain validation to ensure ciphertext matches plaintext format exactly.^[46] Similarly, the mysto/python-fpe library provides FF3-based FPE with updates noting the 2025 FF3 withdrawal, recommending migration to FF1-compatible modes, and includes tests for NIST compliance.^[44] These extend the core cryptography.io library by adding FPE primitives without altering its API, allowing seamless integration for data tokenization tasks.^[44] C implementations include the 0NG/Format-Preserving-Encryption library, which supports FF1 and FF3 for efficient, portable encryption on arbitrary-length domains, featuring tweak support and validation routines to maintain format integrity.^[47] The libffx library, primarily in Python but with C underpinnings, implements the broader FFX mode for versatile FPE, including parameter sets A2 and A10, and enforces domain checks for security.^[48] These libraries facilitate integration into database systems, such as PostgreSQL, where custom extensions or user-defined functions can leverage them for column-level FPE on sensitive data like credit card numbers, preserving SQL query compatibility and format constraints without schema changes.^[49] For instance, Python-based libraries like ff3 can be wrapped in PL/Python functions within PostgreSQL to apply tweakable FPE during queries, with active GitHub repositories ensuring ongoing support and patches for post-2025 standards.^[46] In April 2025, NIST published the final revision of SP 800-38G (Rev. 1), confirming the removal of FF3-1 and recommending exclusive use of FF1 for new implementations, prompting further updates in open-source libraries to align with the revised standard.^[50]

Commercial and Enterprise Solutions

Commercial and enterprise solutions for format-preserving encryption (FPE) primarily consist of proprietary platforms designed for large-scale data protection in regulated environments, emphasizing seamless integration with existing infrastructures without requiring schema changes. OpenText's Data Privacy & Protection Foundation, formerly known as Voltage SecureData from Micro Focus, provides FPE capabilities tailored for database encryption, utilizing NIST-approved AES FF1 algorithms to safeguard sensitive fields while maintaining data format, length, and usability for analytics. This solution supports in-place encryption across structured data stores, enabling organizations to protect information without disrupting applications or workflows.^[51]^[52] Thales' CipherTrust Data Security Platform incorporates format-aware tokenization and FPE to encrypt data in databases and files, preserving original formats to avoid modifications to database schemas or business logic. It leverages FPE algorithms compliant with standards like NIST SP 800-38G, allowing reversible encryption for operational needs while ensuring non-disruptive protection. The platform extends to vaultless tokenization modes, which enhance performance by eliminating centralized vault dependencies.^[53]^[54] IBM Guardium Data Encryption integrates FPE directly into database environments, offering NIST-approved capabilities that encrypt sensitive records without altering their format or length, thus supporting compliance-driven pseudonymization in live systems. This facilitates field-level protection for assets in cloud, on-premises, or hybrid setups, with centralized key management to streamline policy enforcement across multiple databases.^[55]^[56] These solutions often include hardware acceleration through integration with hardware security modules (HSMs), such as Entrust HSMs paired with OpenText Voltage for high-throughput FPE operations in enterprise-scale deployments. They provide built-in compliance reporting tools to audit encryption usage and generate evidence for standards like PCI DSS and GDPR, including pseudonymization features that align with data protection regulations by minimizing breach impacts. Scalability for big data is achieved via optimized FPE implementations, such as OpenText's support for Hadoop ecosystems, where encrypted data can be ingested and analyzed without re-encryption, handling petabyte-scale volumes efficiently.^[57]^[58]^[59] Adoption of these FPE solutions is prominent in banking for tokenized transactions, where platforms like Thales CipherTrust and OpenText Voltage secure payment card and account details to meet PCI DSS requirements without altering transaction processing flows. In healthcare, they enable electronic health record (EHR) pseudonymization, as seen with Thales CipherTrust's field-level FPE and masking for protected health information (PHI), ensuring HIPAA compliance while preserving data utility for clinical analytics. AWS environments also leverage FPE recommendations within their security frameworks for protecting sensitive data in services like S3, with 2025 guidance emphasizing its role in tokenization and masking for analytics workloads.^[60]^[61]^[62] Pricing for these enterprise FPE offerings typically follows subscription-based models, often bundled with broader data loss prevention (DLP) or key management suites to provide comprehensive security, with costs varying based on deployment scale and features.^[63]^[64]^[56]

Limitations and Considerations

Known Vulnerabilities

Format-preserving encryption (FPE) schemes, being deterministic, are susceptible to frequency analysis attacks when the same key is used to encrypt multiple similar plaintexts, as the ciphertext distribution mirrors the plaintext frequency without the diffusion provided by probabilistic encryption methods.^[65] This vulnerability arises because identical plaintexts always produce identical ciphertexts, enabling adversaries to infer statistical patterns in the underlying data.^[66] A prominent example of small-domain vulnerabilities is the 2017 attack on the FF3 scheme, which uses differential analysis to recover keys in O(N^5) time and O(N^{11/6}) chosen plaintexts, where N approximates the square root of the domain size |D|; this becomes feasible (under $2^{32} operations) for |D| \leq 10^6.^[67] The attack exploits poor domain separation in FF3's tweak design, reducing the 8-round Feistel network to an effectively weaker 4-round structure via a slide attack.^[67] Subsequent improvements, such as left-half differential attacks, achieve message recovery with $2^{20} to $2^{23} queries for domains like 100 or 1,000 elements.^[68] The original FF3 scheme was deprecated by NIST in 2017 following the slide attack, and fully removed in the February 2025 revision of SP 800-38G. The revised FF3-1, introduced as a replacement, was also withdrawn in February 2025 due to cryptanalytic advances, including linear cryptanalysis enabling distinguishing and message-recovery attacks in \tilde{O}(N^{r/2 - 1.5}) time for r=8 rounds and N < 2^{19}, affecting implementations with domains as small as $10^6.^[69]^[70] These attacks highlight FF3-1's failure to maintain 128-bit security across configurations.^[70] In general, FPE faces birthday attacks when the number of encryptions q > \sqrt{|D|}, leading to collisions that reveal information about the plaintext space due to determinism. Additionally, cycle-walking mechanisms, used to reject out-of-domain outputs, introduce side-channel leaks through variable execution time or memory access patterns, potentially exposing plaintext bits via timing analysis.^[71] NIST has addressed some risks by mandating a minimum domain size of $10^6 for the remaining FF1 scheme to mitigate small-domain attacks.^[69] As of 2025, emerging post-quantum concerns focus on potential Grover algorithm impacts on underlying symmetric primitives like AES, though 256-bit keys provide adequate resistance; research into quantum-safe FPE constructions is ongoing.^[72]

Best Practices and Domain Requirements

When deploying format-preserving encryption (FPE), proper domain sizing is essential to ensure security against guessing and related attacks. The National Institute of Standards and Technology (NIST) requires that the domain size for the approved scheme FF1 be at least 1,000,000 elements, calculated as radix raised to the minimum length (radix^minlen ≥ 10^6), to provide adequate resistance to brute-force attempts on small subsets of the domain.^[69] For applications involving sub-domains or partitioned data sets, such as specific fields in a database, incorporating unique tweaks can effectively expand the perceived domain size by deriving distinct permutations without altering the underlying key, thereby avoiding vulnerabilities associated with isolated small domains.^[5] Effective key management is critical for maintaining the integrity of FPE systems. Keys should be unique to each data field or format type, with tweaks used to differentiate encryptions across related but distinct domains, such as different columns in a relational database, to prevent correlation attacks that could link ciphertexts across contexts.^[5] Regular key rotation is recommended, aligned with general cryptographic guidelines, to limit the impact of potential key compromise; for instance, keys should be rotated at intervals based on usage volume and threat models, ensuring re-encryption of affected data during transitions.^[73] Reuse of keys or tweaks across incompatible formats must be avoided, as it can undermine the scheme's security properties and enable format-recovery attacks.^[74] Implementation of FPE requires careful attention to operational details for robustness. Inputs must be validated to confirm they fall within the specified domain and radix before encryption, preventing errors or exploits from malformed data. The approved mode FF1 should be used, with implementations subjected to regular audits for side-channel vulnerabilities, including timing and power analysis, through techniques like masking or constant-time operations.^[69]^[75] Performance considerations in FPE balance security and efficiency, particularly in high-volume environments. The approved scheme FF1 employs a fixed number of rounds (10) to achieve provable security, but implementers should select underlying block ciphers like AES-128 or AES-256 that support hardware acceleration via instructions such as AES-NI to handle large-scale deployments without excessive latency.^[69] As of 2025, organizations using legacy FPE implementations are advised to migrate from the original FF3 or revised FF3-1 to the sole approved standard FF1, which maintains stricter domain requirements. For high-risk data environments, such as those involving personally identifiable information under stringent regulations, hybrid approaches combining FPE with tokenization are increasingly recommended to enhance protection while preserving usability, allowing reversible encryption for internal processing alongside irreversible tokens for external sharing.^[69]^[76]