X.690
X.690 is an ITU-T Recommendation that specifies the Basic Encoding Rules (BER), Canonical Encoding Rules (CER), and Distinguished Encoding Rules (DER) for Abstract Syntax Notation One (ASN.1), providing standardized transfer syntaxes for encoding and decoding ASN.1 data values to ensure interoperability in information technology and telecommunications systems.[1] The purpose of X.690 is to define a set of encoding rules applicable to all ASN.1 types, allowing complex data structures to be represented in a machine-independent format suitable for transmission and storage.[2] BER offers the sender flexibility in choosing encoding options, such as primitive or constructed forms and definite or indefinite lengths, making it suitable for general-purpose applications.[2] In contrast, CER restricts BER by requiring the indefinite-length form for constructed encodings together with specific canonical choices, which is more appropriate for large encoded values, while DER mandates definite-length encoding with minimal octet representations, ensuring a unique encoding that is ideal for small values and for applications requiring unambiguous representations, such as digital signatures.[2] Originally approved in July 1994 by ITU-T Study Group 17, X.690 has undergone multiple revisions and amendments to incorporate new ASN.1 features; the current version is dated February 2021, with an erratum issued in September 2021, and it is also published as the international standard ISO/IEC 8825-1.[1] Key features include detailed rules for encoding primitive types—such as BOOLEAN (encoded as a single octet with all bits set to 1 for TRUE under DER and CER), INTEGER (using two's complement with minimal octets), BIT STRING (including an unused-bits count), and OCTET STRING (direct octet sequences)—as well as support for constructed types, relative object identifiers, and time types such as UTCTime and GeneralizedTime.[2] These rules are foundational for protocols in the X.500 series, including X.509 public-key infrastructure, and for broader applications in network management and data interchange.[1]
Introduction
Overview
X.690 is the ITU-T Recommendation that specifies a set of encoding rules—Basic Encoding Rules (BER), Canonical Encoding Rules (CER), and Distinguished Encoding Rules (DER)—for values of types defined using Abstract Syntax Notation One (ASN.1).[3] These rules provide transfer syntaxes that enable the machine-independent encoding of ASN.1 data structures, facilitating their transmission across networks or storage in files without dependence on specific hardware or software implementations.[4] The primary purpose of X.690 is to ensure interoperability in telecommunications and information technology applications by defining unambiguous ways to serialize complex data hierarchies into binary octet sequences.[3]

ASN.1, a prerequisite for X.690, is a formal notation standardized in ITU-T X.680 for describing the abstract syntax of data, independent of any particular encoding or implementation language. It allows the definition of data types such as INTEGER, SEQUENCE, and OCTET STRING, along with tags that identify their roles within structured information objects, enabling clear specification of protocols and data formats used in standards like X.509 for digital certificates or SNMP for network management.[5]

At the core of all encoding rules in X.690 is the Tag-Length-Value (TLV) format, in which each encoded value consists of identifier octets (the tag) indicating the data type, length octets specifying the size of the contents, and contents octets holding the actual value representation.[4] This foundational structure supports both primitive (simple) and constructed (composite) encodings, with BER serving as the flexible baseline that allows multiple valid representations, while CER and DER impose constraints for canonical uniqueness.[3] X.690 is identical in content to the international standard ISO/IEC 8825-1, ensuring global harmonization of these encoding practices.[4]
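As an illustrative sketch rather than anything defined by the Recommendation, the following Python fragment separates a simple DER encoding into its tag, length, and contents fields; the helper name split_tlv is hypothetical, and only single-octet identifiers with short-form lengths are handled.

```python
def split_tlv(encoded: bytes):
    """Return (tag, length, contents) for a simple TLV encoding."""
    tag = encoded[0]                  # identifier octet
    length = encoded[1]               # short-form length octet (<= 127)
    if length >= 0x80:
        raise ValueError("long or indefinite form not handled in this sketch")
    return tag, length, encoded[2:2 + length]

# DER encoding of the INTEGER 5: tag 0x02, length 1, contents 0x05
tag, length, contents = split_tlv(bytes([0x02, 0x01, 0x05]))
print(hex(tag), length, contents.hex())   # 0x2 1 05
```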
History and Versions
X.690 originated from earlier CCITT standards developed under the Open Systems Interconnection (OSI) model, where Abstract Syntax Notation One (ASN.1) was initially defined in Recommendation X.208 (1988) for specifying data structures in telecommunications protocols. The Basic Encoding Rules (BER) for ASN.1 were first detailed in CCITT Recommendation X.209 (1988), providing a foundational transfer syntax for encoding ASN.1 data values to ensure interoperability across diverse systems.[6] These rules built upon prior work in X.409 (1984), which introduced encoding concepts for message handling systems but was limited in scope.[7]

In 1994, the standards evolved under the ITU-T, with X.690 published as the initial edition specifying BER, the Canonical Encoding Rules (CER), and the Distinguished Encoding Rules (DER) for ASN.1.[8] This marked a shift from CCITT to ITU-T nomenclature and incorporated refinements for broader OSI alignment. Subsequent major revisions addressed technical corrigenda, amendments, and harmonization with international standards: the 1997 edition (X.690, 12/1997) updated the encoding specifications; the 2002 edition (X.690, 07/2002) provided clarifications and corrections to prior defects; the 2008 edition (X.690, 11/2008) refined rules for consistency with evolving ASN.1 notations; the 2015 edition (X.690, 08/2015) incorporated further amendments for precision in encoding application; and the current 2021 edition (X.690, 02/2021) with Erratum 1 (09/2021) includes errata resolutions while maintaining the core structures.[8][9]

The evolution of X.690 was driven by the requirement for unambiguous, canonical encoding forms—particularly DER for applications like digital signatures in public key infrastructure—and by the need for robust interoperability in heterogeneous networks.[2] X.690 remains an active ITU-T Recommendation, identical to ISO/IEC 8825-1:2021, ensuring its continued role in global standards for data encoding.[8]
Basic Encoding Rules (BER)
General Structure
The Basic Encoding Rules (BER) for Abstract Syntax Notation One (ASN.1), as specified in ITU-T Recommendation X.690, organize encoded data units using a tag-length-value (TLV) triplet as the core building block for representing all ASN.1 types. This structure ensures a consistent, extensible format for serializing complex data hierarchies into octet streams suitable for transmission or storage. Each encoded data element comprises identifier octets defining the type, length octets specifying the extent of the contents, and contents octets holding the value, with end-of-contents octets appended in certain cases.

Primitive types, such as INTEGER or BOOLEAN, encode their values directly within the contents octets as a self-contained representation. In contrast, constructed types, such as SEQUENCE or SET, encode their contents as a sequence of zero or more nested TLV triplets, allowing hierarchical nesting without fixed boundaries for the overall structure. For indefinite-length forms, the contents conclude with end-of-contents octets consisting of two zero octets (00 00), signaling the termination of the encoding.

CHOICE types in ASN.1 are encoded according to the rules of the selected alternative, adopting its identifier and contents structure to unambiguously represent the chosen option. Optional components within SEQUENCE or SET types are omitted from the encoding if absent and included if present; components equal to their default value may, at the encoder's discretion, be either explicitly encoded or omitted, with decoders substituting the predefined default when absent.

As an illustrative outline, a simple primitive INTEGER value, such as 127, follows the TLV pattern: the identifier designates the universal INTEGER class, the length specifies the number of contents octets (here, one), and the contents provide the binary octet representation of 127, forming a compact, self-describing unit without nesting.
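The INTEGER example above can be sketched in Python as follows, assuming short-form definite lengths throughout; encode_tlv is a hypothetical helper, not an interface defined by X.690.

```python
def encode_tlv(tag: int, contents: bytes) -> bytes:
    """Build identifier octet + short-form length octet + contents."""
    if len(contents) >= 0x80:
        raise ValueError("short definite form only in this sketch")
    return bytes([tag, len(contents)]) + contents

# Primitive: INTEGER 127 -> 02 01 7f
inner = encode_tlv(0x02, bytes([0x7F]))

# Constructed: a SEQUENCE (identifier 0x30) nesting the INTEGER as a
# complete inner TLV -> 30 03 02 01 7f
outer = encode_tlv(0x30, inner)
print(inner.hex(), outer.hex())
```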
Identifier Octets
In the Basic Encoding Rules (BER) defined by X.690, the identifier octets form the initial part of the Type-Length-Value (TLV) encoding structure, specifying the tag associated with the data value's type.[1] The identifier encodes the ASN.1 tag, which consists of the class and the tag number, along with an indication of whether the encoding is primitive or constructed.[1]

The first octet of the identifier has a fixed bit structure. Bits 8 and 7 (the two most significant bits) indicate the tag class: 00 for Universal, 01 for Application, 10 for Context-specific, and 11 for Private. This classification is detailed in Table 1 of Clause 8.1.2.2.[1] Bit 6, known as the primitive/constructed (P/C) bit, is set to 0 for primitive encodings (simple types without embedded values) and to 1 for constructed encodings (composite types containing other values).[1] For tag numbers from 0 to 30, bits 5 through 1 of this single octet encode the tag number as a 5-bit binary integer, with bit 5 being the most significant.[1]

For tag numbers greater than or equal to 31, the identifier requires multiple octets. The leading octet follows the same structure for bits 8-6 (class and P/C), but bits 5-1 are set to 11111 (decimal 31) to signal the extended format.[1] Subsequent octets then encode the full tag number in a base-128 representation: each octet's bits 7-1 hold 7 bits of the number (bit 7 most significant), and bit 8 is set to 1 in all but the final octet (where it is 0) to indicate continuation. The encoding uses the minimal number of octets, so bits 7-1 of the first subsequent octet must not all be zero.[1]

Universal class tags, which apply to standard ASN.1 types, provide common examples within the single-octet range. For instance, the INTEGER type uses tag number 2 (encoded as 02 in hexadecimal), while the SEQUENCE type uses tag number 16 (encoded as 30 in hexadecimal, since the constructed bit is also set).[1] Regarding optional components in ASN.1 structures, BER prohibits encoding those that are absent; such components are simply omitted from the output, so identifier octets appear only for elements that are present.[1]
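A minimal Python sketch of these identifier rules, covering both the low and high tag number forms; encode_identifier is a hypothetical helper, and the class codes follow the bit patterns listed above.

```python
def encode_identifier(tag_class: int, constructed: bool, number: int) -> bytes:
    """Class in bits 8-7, P/C flag in bit 6, tag number in bits 5-1."""
    first = (tag_class << 6) | (0x20 if constructed else 0x00)
    if number <= 30:                            # low-tag-number form
        return bytes([first | number])
    # High-tag-number form: bits 5-1 all ones, then base-128 octets
    # with bit 8 set on every octet except the last.
    parts = [number & 0x7F]
    number >>= 7
    while number:
        parts.append(0x80 | (number & 0x7F))
        number >>= 7
    return bytes([first | 0x1F]) + bytes(reversed(parts))

print(encode_identifier(0, False, 2).hex())    # universal INTEGER: 02
print(encode_identifier(0, True, 16).hex())    # universal SEQUENCE: 30
print(encode_identifier(2, False, 200).hex())  # context-specific 200: 9f8148
```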
Length Octets
In the Basic Encoding Rules (BER) of ASN.1 as specified in ITU-T X.690, the length octets immediately follow the identifier octets and indicate the number of contents octets in the encoding. There are two forms for these length octets: the definite form and the indefinite form.[1]

The definite form specifies the exact length of the contents octets and may be used for both primitive and constructed types.[1] In the short definite form, a single octet is used in which the most significant bit (bit 8) is 0 and the remaining bits (7-1) encode an integer from 0 to 127, giving the number of contents octets.[1] For lengths exceeding 127 octets, the long definite form is employed: the first octet has bit 8 set to 1 and bits 7-1 indicating the number of subsequent octets (from 1 to 126, since the value 127 is reserved) that encode the full length value in binary. The sender may use more octets than the minimum required, though the shortest possible representation is preferred to minimize encoding size.[1] For zero-length contents, the definite form encodes the length as a single octet with value 00.[1]

The indefinite form, denoted by a single length octet with value 80 (binary 10000000), does not specify a length but instead indicates that the contents octets are terminated by a pair of end-of-contents (EOC) octets with value 00 00.[1] This form is permitted only for constructed types, allowing the sender flexibility when not all contents data is immediately available during encoding.[1] When constructed types using the indefinite form are nested, each inner indefinite-length encoding must be terminated with its own EOC octets (00 00), ensuring unambiguous parsing, with the outermost EOC placed after all nested contents.[1]
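The definite forms can be sketched as follows in Python; encode_length is a hypothetical helper that always emits the shortest representation, which BER merely prefers rather than requires.

```python
def encode_length(n: int) -> bytes:
    """Definite-form length octets in the shortest representation."""
    if n < 0x80:                   # short form: bit 8 zero, value in bits 7-1
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body   # long form

print(encode_length(38).hex())     # 26       (short form)
print(encode_length(201).hex())    # 81c9     (long form, one value octet)
print(encode_length(435).hex())    # 8201b3   (long form, two value octets)
# The indefinite form is the single octet 80, with the contents
# terminated by the end-of-contents octets 00 00.
```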
Contents Octets
In the Basic Encoding Rules (BER) of ASN.1, as specified in ITU-T Recommendation X.690, the contents octets encode the data value of the type, with the form depending on whether the encoding is primitive or constructed.[1] Primitive encodings directly represent the value in the contents octets without nested structures, while constructed encodings contain zero or more complete inner encodings of data values.[1] The size of these contents octets is delimited by the preceding length octets.[1]

For primitive types, the contents octets provide a direct binary representation tailored to the type. An INTEGER, for instance, is encoded as a two's complement binary number equal to the integer value, using the minimal number of octets required; if more than one octet is needed, the bits of the first octet and bit 8 of the second octet shall not all be ones or all be zeros.[1] The NULL type has empty contents octets, containing no data.[1] Special cases include the BIT STRING, which in primitive form begins with a single octet specifying the number of unused bits (0 through 7) in the final data octet, followed by the bit string itself with bits numbered from most significant to least significant; the encoding may alternatively be constructed.[1] Similarly, the OCTET STRING in primitive form consists directly of the sequence of octets representing the string value, or it may be constructed as a concatenation of inner octet string segments.[1]

Constructed types encode their values through nested structures within the contents octets. For a SEQUENCE or SEQUENCE OF, the contents octets comprise the complete encodings of the components or elements in the order they appear in the ASN.1 definition, with optional or default components potentially omitted.[1] A SET or SET OF follows a similar approach but allows the sender to order the components or elements arbitrarily.[1]

Tagged types modify the encoding of an underlying type via implicit or explicit tagging. In explicit tagging, the encoding is always constructed, with the contents octets holding the complete encoding (including identifier and length octets) of the underlying value.[1] Implicit tagging preserves the primitive or constructed nature of the underlying type, using its contents octets directly without additional nesting.[1] For open types, such as those in CHOICE alternatives, the contents octets contain the complete encoding of the selected specific type.[1]
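A brief Python sketch of the minimal two's complement rule for INTEGER contents octets; integer_contents is a hypothetical helper that simply grows the octet count until the signed value fits.

```python
def integer_contents(value: int) -> bytes:
    """Minimal two's-complement contents octets for an INTEGER."""
    length = 1
    while True:
        try:
            return value.to_bytes(length, "big", signed=True)
        except OverflowError:
            length += 1            # grow only until the signed value fits

print(integer_contents(127).hex())    # 7f
print(integer_contents(128).hex())    # 0080 (leading 00 carries the sign)
print(integer_contents(-128).hex())   # 80
print(integer_contents(-129).hex())   # ff7f
```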
Canonical Encoding Rules (CER)
Key Constraints from BER
The Canonical Encoding Rules (CER) build directly upon the Basic Encoding Rules (BER) by inheriting the fundamental Tag-Length-Value (TLV) structure while imposing restrictions to produce a unique, canonical representation of ASN.1 data values, particularly useful for applications requiring verifiable encodings in protocols that support indefinite-length forms, such as certain telecommunications exchanges. This inheritance ensures that every encoded value consists of identifier octets specifying the tag, a length field indicating the size of the contents, and the contents octets themselves, but CER eliminates BER's flexibility in length determination and component selection so that a given value cannot have multiple possible encodings.[10]

A primary constraint is the mandatory use of the definite length form for primitive encodings, where the length octets state the exact number of contents octets in the shortest possible representation. For constructed encodings, CER requires the indefinite length form, delimited by end-of-contents (EOC) octets (0x00 0x00), which accommodates nested structures while maintaining predictability. This hybrid approach contrasts with BER's allowance of either form in both cases and ensures that CER encodings are self-delimiting without ambiguity.[10]

To prevent fragmentation and ensure complete, non-partial representations, CER bans indefinite lengths for all primitive types and prohibits stray or unused EOC octets outside properly nested constructed forms; all contents must be fully enclosed within their defined boundaries, with no allowance for BER-style optional fragmentation that could lead to decoding variations.[10]

For CHOICE types without explicit tags, CER restricts encoding to a single alternative: the one with the lowest tag number among the possible options, eliminating BER's freedom to select any valid alternative and thereby guaranteeing uniqueness. This rule applies recursively to nested choices, prioritizing the minimal tag to canonicalize the structure.[10]

The encoding of REAL types in CER adheres to BER's primitive formats—either binary (base-2) or decimal (base-10)—but enforces consistency through restrictions such as a normalized mantissa and a zero binary scaling factor, prohibiting the alternative bases 8 and 16 that BER permits. This ensures that equivalent real values always produce identical octet sequences.[10]

Finally, CER mandates omission of unneeded components: optional elements are absent if not present, and default values are not encoded when the actual value matches the default, reducing the encoding to its minimal form without loss of information. Whereas BER leaves such inclusion optional, CER makes omission obligatory for canonicity, preventing redundant octets that could vary across implementations.[10]
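These paired length rules can be sketched in Python as follows; cer_primitive and cer_constructed are hypothetical helpers limited to single-octet tags and short-form lengths.

```python
def cer_primitive(tag: int, contents: bytes) -> bytes:
    """Primitive encoding: definite (here short-form) length."""
    if len(contents) >= 0x80:
        raise ValueError("sketch handles short-form lengths only")
    return bytes([tag, len(contents)]) + contents

def cer_constructed(tag: int, inner: bytes) -> bytes:
    """Constructed encoding: indefinite length, closed by EOC 00 00."""
    return bytes([tag | 0x20, 0x80]) + inner + bytes([0x00, 0x00])

boolean = cer_primitive(0x01, b"\xff")   # BOOLEAN TRUE: 01 01 ff
seq = cer_constructed(0x10, boolean)     # SEQUENCE: 30 80 01 01 ff 00 00
print(seq.hex())
```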
Additional Canonical Requirements
In addition to the constraints inherited from the Basic Encoding Rules (BER), the Canonical Encoding Rules (CER) impose further restrictions to ensure a unique, canonical representation of ASN.1 values, particularly suited to applications requiring equivalence testing or relay-safe processing. These rules eliminate ambiguities arising from encoding choices, such as padding or formatting options, by mandating specific octet-level details that produce identical bit patterns for the same abstract value across all compliant encoders.[10]

For BIT STRING and OCTET STRING types, CER requires that unused bits in the final octet of a BIT STRING be set to zero, preventing any non-zero padding that could vary between encoders. Additionally, any trailing zero bits in the bit string value must be removed prior to encoding, avoiding optional extensions that could alter the length without changing the value. This ensures no internal fragmentation or variable padding, promoting a minimal and deterministic form. OCTET STRING follows similar primitive encoding preferences but without bit-level padding concerns.[10]

The encoding of REAL types under CER prioritizes consistency by specifying distinct formats for binary and decimal representations. For base-2 (binary) REALs, the encoding uses a binary form with an odd mantissa (if non-zero) and a binary scaling factor of zero, minimizing octets while avoiding decimal approximations. For base-10 (decimal) REALs, the NR3 character form is required, with no spaces, leading zeros, or trailing zeros in the mantissa, ensuring a standardized textual representation. These rules favor binary encoding where applicable for precision in computational contexts, though both are supported with fixed constraints.[10]

SEQUENCE and SET types maintain defined component ordering to eliminate sorting variations. In SEQUENCE encodings, components are always presented in the exact order specified in the ASN.1 type definition, preserving the structural intent without reordering. For SET types, components are ordered by ascending tag as defined in ITU-T Rec. X.680, with untagged CHOICE components positioned according to the smallest tag among their alternatives; no arbitrary ordering is permitted. This tag-based ordering for SETs, combined with SEQUENCE's fixed order, contributes to the overall canonical uniqueness.[10]

Tag handling in CER adheres to the general BER rules but integrates with the ordering constraints, where tags determine the sequence of SET components without introducing ambiguities from implicit or explicit choices. Explicit tagging results in a constructed encoding containing the full base encoding, while implicit tagging aligns the form with the base type (primitive or constructed), but both must follow the canonical length and content rules to yield identical outputs. This emphasis on tag-driven consistency ensures that the resulting bit patterns are reproducible regardless of the tagging style used in the ASN.1 schema.[10]

Collectively, these CER-specific requirements guarantee that all valid encodings of an identical ASN.1 abstract value produce the exact same sequence of octets, removing encoder-dependent options and enabling reliable comparison and processing in distributed systems. This is essential for security protocols where encoding equivalence must be verifiable without decoding.[10]
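A minimal Python sketch of the SET ordering rule, assuming components whose encodings begin with single-octet universal tags; sorting on the first identifier octet is a simplification of the full tag-ordering rules.

```python
# Each component is given as its complete, already-encoded TLV.
components = [
    bytes([0x02, 0x01, 0x05]),   # INTEGER 5        (tag 0x02)
    bytes([0x01, 0x01, 0xFF]),   # BOOLEAN TRUE     (tag 0x01)
    bytes([0x04, 0x01, 0x41]),   # OCTET STRING "A" (tag 0x04)
]

# Concatenate in ascending order of the identifier octet.
ordered = b"".join(sorted(components, key=lambda tlv: tlv[0]))
print(ordered.hex())   # 0101ff020105040141 -- BOOLEAN sorts first
```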
Distinguished Encoding Rules (DER)
Subset of BER Rules
The Distinguished Encoding Rules (DER) form a strict subset of the Basic Encoding Rules (BER) as specified in ITU-T Recommendation X.690, inheriting all BER encoding principles while imposing tighter constraints to produce a unique, unambiguous representation for each ASN.1 data value. This subset relationship eliminates the encoding choices available in BER, ensuring that DER encodings are deterministic and suitable for applications requiring verifiable uniqueness, such as digital signatures and certificates.[2]

A core restriction in DER is the mandatory use of the definite length form for all length octets, disallowing the indefinite length option permitted in BER.[2] The definite length must be encoded in the minimum number of octets required to represent the value, preventing unnecessary padding or extended forms.[2]

DER further enforces minimal representations across type encodings to avoid ambiguity. For instance, INTEGER values are encoded in the fewest octets their two's complement representation requires, without superfluous leading octets, and length octets use the shortest possible form.[2] Similarly, primitive types such as OCTET STRING and BIT STRING are encoded in their minimal octet lengths without superfluous constructed wrappers.[2]

In handling CHOICE types, DER mandates the selection of the lowest-numbered alternative based on the sequential order of components in the ASN.1 type definition, rejecting other alternatives even if they could produce equivalent BER encodings.[2]

DER aligns with the Canonical Encoding Rules (CER) in providing canonical properties for equivalence testing but diverges by insisting on definite lengths and stricter minimalism, underscoring its focus on absolute uniqueness rather than mere canonical comparability.[2]
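The strictness of DER length handling can be sketched as a validating reader in Python; read_der_length is a hypothetical helper illustrating which forms a conforming DER decoder rejects.

```python
def read_der_length(data: bytes):
    """Return (length, octets consumed), rejecting non-DER forms."""
    first = data[0]
    if first < 0x80:
        return first, 1                     # short form
    if first == 0x80:
        raise ValueError("indefinite length is not allowed in DER")
    count = first & 0x7F
    body = data[1:1 + count]
    # Minimal form: no leading zero octet, and no long form where
    # the short form would have sufficed.
    if body[0] == 0x00 or (count == 1 and body[0] < 0x80):
        raise ValueError("length is not in its minimal form")
    return int.from_bytes(body, "big"), 1 + count

print(read_der_length(bytes([0x7F])))        # (127, 1)
print(read_der_length(bytes([0x81, 0xC9])))  # (201, 2)
# read_der_length(bytes([0x81, 0x05])) raises: 5 fits the short form
```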
Unambiguous Encoding Mandates
The Distinguished Encoding Rules (DER) establish stringent constraints to guarantee that every ASN.1 abstract value yields precisely one bit string encoding, a property vital for cryptographic applications where encoding ambiguity could compromise integrity and verifiability. By restricting the choices available under the Basic Encoding Rules (BER), DER eliminates variation in octet representation, length determination, and structural ordering.

For INTEGER types, DER mandates encoding in two's complement binary representation using the minimal number of octets necessary. Positive integers exclude superfluous leading zero octets, and the value zero is encoded as a single octet with value zero; this prevents equivalent values from having multiple octet sequences. Negative integers follow the same minimal-length rule, with the most significant bit set to one in the first octet.[2]

String types under DER require primitive encoding (no constructed form) and the shortest possible octet representation without trailing padding. Restricted character string types, such as PrintableString and VisibleString, are encoded as octet strings using their specified character-to-octet mappings, with no reordering of characters.[2]

BIT STRING encodings in DER consist of an initial octet specifying the number of unused bits (0-7) in the final data octet, followed by the bit field in successive octets, with any padding bits in the last octet set exclusively to zero. This construction avoids ambiguity in bit alignment and length while minimizing the overall octet count.[2]

SEQUENCE and SET types adhere to fixed component ordering: SEQUENCE components follow the exact sequence defined in the ASN.1 module, while SET components are sorted in ascending order of their tags (universal before application, then context-specific, then private). Absent optional components are omitted entirely, and components equal to their default values must not be encoded, ensuring deterministic serialization.[2]

DER supports both explicit and implicit tagging while maintaining uniqueness. For explicit tagging, a constructed encoding is used, with the tag and length octets preceding the complete encoding of the inner type. For implicit tagging, the tag and length octets precede the contents octets of the inner type's encoding (omitting the inner type's identifier), and all other DER rules apply to the base type encoding.[2]
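A short Python sketch of the BIT STRING construction described above; der_bit_string_contents is a hypothetical helper that produces only the contents octets, not the full TLV.

```python
def der_bit_string_contents(bits: str) -> bytes:
    """Unused-bits octet followed by the bit field, zero-padded."""
    unused = (8 - len(bits) % 8) % 8
    padded = bits + "0" * unused          # padding bits must be zero
    data = bytes(int(padded[i:i + 8], 2)
                 for i in range(0, len(padded), 8))
    return bytes([unused]) + data

# Six bits '101100': two unused bits, final octet padded with zeros
print(der_bit_string_contents("101100").hex())   # 02b0
```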
Comparisons and Considerations
Differences Across Rules
The Basic Encoding Rules (BER), Canonical Encoding Rules (CER), and Distinguished Encoding Rules (DER) defined in ITU-T X.690 represent a hierarchy of constraints applied to the core Type-Length-Value (TLV) structure for ASN.1 data encoding: BER offers the broadest flexibility, while CER and DER impose increasing restrictions to achieve canonical or unique representations.[2] BER permits multiple valid encodings of the same value, supporting both indefinite and definite length forms, optional padding in certain types like bit strings, and sender discretion in selections such as choice alternatives, which eases implementation but can lead to interoperability issues.[2] In contrast, CER and DER mandate definite-length encodings for primitive types and minimal octet representations without padding, ensuring more predictable bitstreams, though CER allows indefinite lengths for constructed types under specific fragmentation rules (e.g., strings fragmented in 1000-octet segments).[2]

Regarding canonicality, CER enforces rules that produce identical bit patterns for semantically identical values—such as ordering SET components by ascending tag and selecting the lowest-numbered tag for untagged choices—making it suitable for applications requiring verifiable equivalence without ambiguity, such as certain protocol negotiations.[2] DER builds on this by further restricting encodings to a single unique bit string per value, prohibiting indefinite lengths entirely and disallowing constructed encodings for primitive types like strings, which ensures the absolute uniqueness critical in security contexts such as digital signatures in X.509 certificates.[2]

Type-specific differences highlight these constraints: in BER, choices can use any valid tag, allowing variability; CER requires the lowest possible tag for such selections; and DER extends this minimalism to all representations, such as using the shortest possible encoding for integers and booleans.[2] These differences influence use cases: BER is favored for general-purpose, flexible serialization in protocols like LDAP where multiple encodings are tolerable; CER is applied where canonical equivalence is needed for comparison or hashing, as in some cryptographic protocols; and DER is mandated for unambiguous, non-repudiable encodings in standards like PKIX for public key infrastructure.[2] The table below summarizes the principal differences, and a short sketch following the table illustrates them.
| Feature | BER | CER | DER |
|---|---|---|---|
| Multiple Encodings per Value | Allowed (e.g., indefinite lengths, padding, variable choices) | Prohibited; one equivalent encoding enforced | Prohibited; one unique encoding enforced |
| Length Forms | Indefinite and definite supported | Definite for primitives; indefinite for constructed (with limits) | Definite only |
| Padding Rules | Non-zero padding bits permitted in bit strings | Prohibited; unused bits set to zero | Prohibited; unused bits set to zero |
| Choice/Tag Selection | Sender's discretion (any valid tag) | Lowest numbered tag required | Lowest numbered tag; minimal overall representation |
| Set Component Ordering | Arbitrary | By ascending tag numbers | By ascending tag numbers |
| String Encoding | Constructed or primitive; variable fragments | Primitive if ≤1000 octets, otherwise constructed in 1000-octet fragments | Primitive only; no constructed encodings |
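As an illustration of the table, the following Python sketch lists several encodings of BOOLEAN TRUE that are all legal under BER but of which only one is legal under DER; the byte values follow the rules described above, and the snippet is illustrative rather than drawn from any standard's code.

```python
ber_variants = [
    bytes([0x01, 0x01, 0xFF]),        # minimal form (also the DER form)
    bytes([0x01, 0x01, 0x01]),        # BER: any non-zero octet is TRUE
    bytes([0x01, 0x81, 0x01, 0xFF]),  # long-form length where short fits
]
der_form = bytes([0x01, 0x01, 0xFF])  # DER admits exactly this encoding

for variant in ber_variants:
    print(variant.hex(), variant == der_form)
```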