External Data Representation
External Data Representation (XDR) is a standard for the description and encoding of data, enabling the portable transfer of information between diverse computer architectures, such as those from Sun Microsystems, VAX, IBM-PC, and Cray systems.[1] Developed originally by Sun Microsystems in 1987, XDR provides a language-like notation—similar to C but not a programming language—for specifying data formats, ensuring interoperability without loss of meaning across hardware platforms.[2] It operates within the presentation layer of the ISO OSI model, analogous to the X.409 standard but employing implicit typing rather than explicit tagging.[1] Key features of XDR include support for fundamental data types such as 32-bit integers, IEEE 754 floating-point numbers, enumerated types, strings, arrays, structures, and discriminated unions, all encoded in a canonical big-endian byte order using 4-byte alignment blocks with padding as needed for portability.[1] The encoding assumes an 8-bit byte as the basic unit, which is portable across transmission media, and handles both fixed-length and variable-length data through explicit length prefixes where required.[1] This neutral representation avoids architecture-specific details like endianness or padding conventions that could otherwise cause compatibility issues.[2] XDR has been integral to several network protocols, notably the Open Network Computing Remote Procedure Call (ONC RPC) and the Network File System (NFS), where it standardizes the marshaling and unmarshaling of procedure arguments and results.[1] Its specification has evolved through multiple RFCs: initially defined in RFC 1014 (1987), revised in RFC 1832 (1995), and updated to its current form in RFC 4506 (2006), which remains the deployed standard.[1] Later extensions appear in protocol-specific documents, such as RFC 5662 for NFS version 4 minor version 1, but the core XDR remains unchanged for broad applicability.

Overview
Definition and Purpose
External Data Representation (XDR) is a standard for the description and encoding of data, designed to facilitate the transfer of information between diverse computer architectures, such as those from Sun, VAX, IBM-PC, and Cray systems.[2] Developed as part of the Open Network Computing (ONC) protocol suite, XDR provides a platform-independent serialization format that ensures data can be exchanged without dependency on the specifics of the host machine's internal representation.[2] The primary purpose of XDR is to encode data in a canonical form that abstracts away architectural differences, including variations in byte order (endianness), data alignment, and size representations across heterogeneous environments.[2] By assuming a universal 8-bit byte structure and using a fixed, big-endian byte order, XDR enables seamless interoperability in network protocols, fitting into the presentation layer of the ISO reference model similar to standards like X.409 but with implicit typing for efficiency.[2] XDR's design emphasizes simplicity, allowing data formats to be described using a concise language akin to C, which minimizes complexity in implementation and parsing.[2] It prioritizes efficiency for network transmission through the use of fixed-length fields and 4-byte alignment blocks where possible, avoiding intricate variable-length parsing while balancing compactness and compatibility.[2] Originally created to support remote procedure calls (RPC) in distributed systems, XDR addresses the challenges of data exchange in multi-vendor networks by providing a neutral, architecture-independent representation.[2]

Key Principles
External Data Representation (XDR) is designed around several core principles that ensure reliable and efficient data interchange across heterogeneous systems. These principles emphasize uniformity, self-describing structures, performance optimization, and independence from specific implementation environments. By adhering to these guidelines, XDR provides a standardized method for encoding data that can be decoded without prior knowledge of the producing system's architecture.[2]

A fundamental principle of XDR is the use of a canonical representation through big-endian byte order, also known as network byte order. This approach mandates that all multi-byte data values, such as integers, are encoded with the most significant byte first, regardless of the native byte order of the source or destination machine. The rationale is to eliminate ambiguities arising from differing endianness in computer architectures; for instance, a 32-bit integer value of 1 is always represented as the byte sequence 0x00 0x00 0x00 0x01. This uniformity allows any compliant system to interpret the data correctly without byte-swapping conversions, simplifying interoperability.[3]

Another key principle is the discrimination of data types, particularly for variable-length constructs, which enables decoders to parse streams without relying on external schema information. For types like strings or opaque byte arrays, XDR prepends an explicit unsigned integer indicating the exact length of the data, followed by the payload itself, and concludes with padding to maintain alignment. This length prefix serves as a clear boundary marker, allowing sequential decoding even in concatenated messages. For example, a string of "hello" (5 characters) begins with the 32-bit value 5, followed by the ASCII bytes and three padding zeros to reach the next 4-byte boundary. This self-descriptive mechanism enhances robustness in network transmissions where partial data might arrive.[4]

Efficiency is prioritized through the preference for fixed-size primitives and enforced 4-byte alignment. Primitive types, such as 32-bit integers or floating-point numbers, are represented with consistent, fixed lengths to minimize encoding overhead and facilitate direct memory mapping. All encoded units, including aggregates like structures and arrays, are padded with zero bytes to multiples of four bytes, aligning data on natural word boundaries for faster processing on most hardware. This alignment reduces the computational cost of parsing and improves throughput in high-volume data exchanges, though it introduces minor space overhead for smaller items.[5]

XDR maintains neutrality toward programming languages by defining itself strictly as a transfer format, untethered to any particular language's memory layout or conventions. While its type system draws analogies to languages like C, facilitating straightforward mapping for C programmers, it avoids language-specific features such as raw memory pointers and undiscriminated unions that could introduce platform dependencies. This design philosophy positions XDR as a portable intermediate representation, suitable for use in diverse environments from embedded systems to distributed applications, with implementations available in multiple languages to handle encoding and decoding.[3]
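To make these principles concrete, the following minimal C sketch hand-encodes a 32-bit integer and the string "hello" exactly as described above, without any XDR library; the helper names and buffer size are illustrative only.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Write a 32-bit value in XDR's canonical big-endian (network) order. */
static size_t put_uint32(uint8_t *out, uint32_t v) {
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)(v);
    return 4;
}

/* Encode an XDR string: 4-byte length prefix, the bytes, then zero
 * padding up to the next 4-byte boundary. Returns bytes written.     */
static size_t put_string(uint8_t *out, const char *s) {
    uint32_t len = (uint32_t)strlen(s);
    size_t n = put_uint32(out, len);
    memcpy(out + n, s, len);
    n += len;
    while (n % 4 != 0)              /* 0-3 padding bytes */
        out[n++] = 0;
    return n;
}

int main(void) {
    uint8_t buf[64];
    size_t n = put_uint32(buf, 1);        /* int 1 -> 00 00 00 01 */
    n += put_string(buf + n, "hello");    /* length 5, bytes, 3 pad bytes */
    for (size_t i = 0; i < n; i++)
        printf("%02x ", buf[i]);
    printf("\n");  /* 00 00 00 01 00 00 00 05 68 65 6c 6c 6f 00 00 00 */
    return 0;
}
```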
History

Origins and Development
External Data Representation (XDR) was developed in 1987 by Sun Microsystems as a key component of the Open Network Computing (ONC) architecture, specifically to support remote procedure calls (RPC) in distributed computing environments.[6] The effort was led by Sun engineers, with Bob Lyon playing a prominent role in advancing ONC RPC technologies during the 1980s, addressing the challenges of data exchange in heterogeneous Unix-based systems.[7] This development arose from the need to enable seamless communication across diverse hardware architectures prevalent in early networked setups, such as those using 4.2BSD Unix variants.[8] The primary technical motivations for XDR centered on resolving interoperability issues in data marshalling for RPC, particularly discrepancies in byte ordering (endianness) and data alignment between machines like Digital Equipment Corporation's VAX systems (little-endian) and Sun's own Motorola 68000-based workstations (big-endian).[6] By defining a canonical, machine-independent format using big-endian byte order and fixed 4-byte alignment with zero padding, XDR ensured that binary data could be reliably transmitted and reconstructed without architecture-specific conversions, facilitating the growth of local area network applications.[8] This approach was integral to Sun's RPC implementation, where XDR routines handled the encoding and decoding of procedure arguments and results.[9] XDR's initial release occurred alongside Sun's RPC package in 1987, documented in RFC 1014 as a proposed standard for external data representation.[6] It was quickly integrated into SunOS, Sun's Unix operating system, to underpin network services like the Network File System (NFS), where it resolved persistent issues with data portability in multi-vendor environments.[10] Early adoption by Sun and its partners demonstrated XDR's effectiveness in promoting interoperability, laying the groundwork for broader use in distributed systems before its formal standardization.[9]

Standardization
The External Data Representation (XDR) standard was initially formalized by the Internet Engineering Task Force (IETF) through RFC 1014, published in June 1987, which defined XDR as a method for representing data in a portable, network-transmissible format independent of the host architecture.[11] This document, authored by Sun Microsystems, established the core encoding rules and data types for XDR, marking its transition from a proprietary tool to an open specification.[12] In August 1995, RFC 1832 updated and obsoleted RFC 1014, refining the specification for greater clarity while introducing minor extensions such as the 128-bit quadruple-precision floating-point type, without altering the fundamental big-endian, fixed-length encoding principles.[13] These changes addressed evolving needs in networked applications while preserving compatibility with existing implementations.[14] RFC 4506, published in May 2006, further obsoleted RFC 1832 and elevated XDR to Internet Standard status (STD 67), though it introduced no substantive technical modifications, focusing instead on editorial improvements and confirmation of the protocol's stability.[15] Since then, XDR has seen no major revisions, underscoring its design robustness and the emphasis on backward compatibility in its evolution.[16] XDR's standardization has influenced IETF protocols requiring straightforward, architecture-neutral serialization, such as those in the Open Network Computing (ONC) family, by providing a reliable foundation that ensures interoperability across diverse systems without necessitating version-specific overhauls.[15]

Specification
Encoding Rules
External Data Representation (XDR) employs a standardized procedure to serialize data into a portable byte stream, ensuring interoperability across diverse computer architectures. The core encoding process converts data into a sequence of 8-bit octets, where multi-octet values such as integers are represented in big-endian byte order, with the most significant byte first, to eliminate dependence on host-specific endianness. This canonical format facilitates transmission over networks without requiring architecture-specific adjustments.[1] Explicit type discrimination is integral to the encoding, particularly for composite structures; for instance, in discriminated unions, the encoding begins with the discriminant value, followed by the serialized representation of the corresponding arm, allowing decoders to identify and process the appropriate data variant.[1] Alignment constraints are enforced throughout the stream to optimize performance and portability, with every encoded item, including primitives, structures, and arrays, padded to a multiple of four bytes using zero bytes as fillers. Variable-length data, such as strings, exemplifies this: the length is first encoded as a 4-byte unsigned integer (indicating the number of bytes in the string), followed by the actual bytes, and then padded with up to three additional zero bytes if necessary to reach the next 4-byte boundary. This padding mechanism ensures that subsequent items start on word-aligned boundaries, reducing fragmentation in the byte stream.[1] The serialization operates through an abstraction layer known as the XDR stream, which models the byte stream as a sequential input/output buffer for encoding and decoding operations. Implementations provide library routines to interact with this stream, such as xdr_int(), which serializes a 32-bit signed integer by converting it to big-endian octets and writing it to the stream while adhering to alignment rules; similar primitives exist for other basic types, abstracting the low-level byte manipulation. These routines return a boolean indicating success or failure, enabling robust error detection during the process.[1]
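As a sketch of the stream abstraction, the following C program uses the classic Sun RPC routines (xdrmem_create, xdr_int) to encode an integer into a memory buffer and decode it back; header and library locations vary by platform, and modern Linux systems typically need libtirpc.

```c
#include <stdio.h>
#include <rpc/rpc.h>   /* classic Sun RPC / XDR API; on modern Linux this
                          typically lives in libtirpc (link with -ltirpc) */

int main(void) {
    char buf[16];
    XDR enc, dec;
    int out = 0x12345678, in = 0;

    /* Attach an encoding stream to a memory buffer and serialize. */
    xdrmem_create(&enc, buf, sizeof(buf), XDR_ENCODE);
    if (!xdr_int(&enc, &out))            /* writes 12 34 56 78 (big-endian) */
        return 1;
    printf("encoded %u bytes\n", xdr_getpos(&enc));   /* 4 */

    /* Attach a decoding stream to the same bytes and read them back. */
    xdrmem_create(&dec, buf, sizeof(buf), XDR_DECODE);
    if (!xdr_int(&dec, &in))
        return 1;
    printf("decoded 0x%x\n", in);        /* 0x12345678 on any architecture */

    xdr_destroy(&enc);
    xdr_destroy(&dec);
    return 0;
}
```

The same filter routine works in both directions because the operation (XDR_ENCODE or XDR_DECODE) is a property of the stream rather than of the routine.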
For handling undecoded or implementation-specific data, XDR supports opaque byte sequences treated as uninterpreted blocks, either fixed-length or variable-length with an explicit length prefix, each padded to four-byte multiples; this promotes forward compatibility in evolving protocols by allowing newer decoders to safely skip unrecognized sections without corrupting the stream.[1]
XDR Description Language
The XDR Description Language is a simple, declarative language resembling C, designed for specifying the format of data structures to be serialized using the External Data Representation standard. It allows developers to define schemas that describe how data should be encoded and decoded for transfer between heterogeneous systems, ensuring platform independence without requiring a full programming environment. This language focuses on type definitions and compositions, producing output that can be compiled into routines for handling serialization.[1] Core syntax elements include declarations for basic constructs such as structures, type aliases, and enumerations, with extensions for RPC contexts. A struct declaration defines a composite type with named components, such as struct example { int value; };, where components are encoded sequentially and padded to multiples of four bytes. The typedef keyword creates aliases for existing types, for instance, typedef int count;, enabling reuse without altering the underlying representation. Enumerations are specified via enum, like enum status { SUCCESS = 0, FAILURE = 1 };, mapping symbolic names to integer constants for compact encoding. In RPC scenarios, a program declaration outlines service definitions, such as program myservice { version myversion { int MYPROC(args) = 1; } = 1; } = 100000;, integrating XDR types with procedure calls.[1][17]
The compilation process involves using the rpcgen tool to process XDR description files, typically with a .x extension, generating C code for encoding and decoding. Input files contain the language definitions, and rpcgen produces header files for type declarations, along with source files containing XDR routines prefixed with xdr_, such as bool_t xdr_example(XDR *xdrs, example *obj);, which handle the serialization of defined structures into byte streams. This automated generation ensures that the resulting code adheres to XDR encoding rules, facilitating integration into RPC clients and servers.[17][18]
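The sketch below shows roughly what rpcgen might generate for the struct example declaration above; the exact file layout and declarations vary by rpcgen version and platform, so treat this as illustrative rather than authoritative output.

```c
/* example.h -- roughly what "rpcgen example.x" emits for
 *   struct example { int value; };
 * (exact output varies by rpcgen version and platform)            */
#include <rpc/rpc.h>

struct example {
    int value;
};
typedef struct example example;

extern bool_t xdr_example(XDR *, example *);

/* example_xdr.c -- the generated filter simply chains the primitive
 * XDR routines for each component, in declaration order.           */
bool_t
xdr_example(XDR *xdrs, example *objp)
{
    if (!xdr_int(xdrs, &objp->value))
        return FALSE;
    return TRUE;
}
```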
For example, consider a basic structure for a record:
struct record { int id; string name<32>; };

This defines a structure with a 32-bit integer id followed by a variable-length string name limited to 32 characters. When compiled via rpcgen, it generates an xdr_record routine. In encoding, if id = 42 and name = "Test", the output byte stream would be: four bytes for id (0x0000002A in big-endian), four bytes for the string length (0x00000004), followed by the four ASCII bytes for "Test" (0x54657374), with no additional padding required, resulting in a 12-byte total for the structure. This mapping ensures consistent representation across architectures.[1]
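A hedged C sketch of how such a filter could be used: xdr_record is written out by hand here in the form rpcgen typically emits, and the program reports the expected 12-byte encoding for id = 42 and name = "Test" (buffer size and helper names are illustrative).

```c
#include <stdio.h>
#include <rpc/rpc.h>   /* may require libtirpc on modern systems */

/* C mapping of: struct record { int id; string name<32>; }; */
struct record {
    int   id;
    char *name;
};

/* Hand-written equivalent of the rpcgen-generated xdr_record filter:
 * components are serialized in declaration order.                    */
static bool_t xdr_record(XDR *xdrs, struct record *objp) {
    if (!xdr_int(xdrs, &objp->id))
        return FALSE;
    if (!xdr_string(xdrs, &objp->name, 32))
        return FALSE;
    return TRUE;
}

int main(void) {
    char buf[64];
    XDR xdrs;
    struct record r = { 42, "Test" };

    xdrmem_create(&xdrs, buf, sizeof(buf), XDR_ENCODE);
    if (!xdr_record(&xdrs, &r))
        return 1;

    /* 4 (id) + 4 (length) + 4 ("Test", no padding needed) = 12 bytes:
     * 00 00 00 2a 00 00 00 04 54 65 73 74                              */
    printf("encoded %u bytes\n", xdr_getpos(&xdrs));
    xdr_destroy(&xdrs);
    return 0;
}
```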
Data Types
Primitive Types
External Data Representation (XDR) defines a set of primitive data types that serve as the foundational building blocks for encoding basic values in a platform-independent manner. These types are designed to be simple, atomic units that can be directly serialized into byte streams without internal structure beyond their bit representation. All primitive types in XDR are aligned on 4-byte boundaries in the encoded stream, with padding bytes (set to zero) added if necessary to achieve this alignment, ensuring consistent parsing across different host architectures.[19] The integer primitives include both 32-bit and 64-bit variants in signed and unsigned forms, all represented in big-endian byte order (network byte order). A signed 32-bit integer, declared as int in the XDR language, occupies 4 bytes and represents values in the range [-2^31, 2^31 - 1] using two's complement notation, with the most significant byte encoded first.[20] Similarly, an unsigned 32-bit integer, declared as unsigned int, spans 4 bytes and covers [0, 2^32 - 1] in binary notation, also big-endian.[21] For larger values, the signed 64-bit hyper integer (hyper) and unsigned 64-bit hyper integer (unsigned hyper) each use 8 bytes, maintaining the same two's complement or binary encoding and big-endian order, with the most significant byte first.[22] This uniform big-endian approach avoids endianness issues common in heterogeneous networks.[19]
The enumeration type, declared as enum { identifier [= integer], ... } identifier;, is encoded identically to a signed 32-bit integer, where the value corresponds to one of the defined enum constants (defaulting to sequential integers starting from 0 if not specified). It occupies 4 bytes in big-endian order.[23]
Floating-point primitives adhere strictly to IEEE 754 standards for portability. The 32-bit float type, declared as float, encodes a single-precision value using 4 bytes: 1 sign bit (S), 8 exponent bits (E, biased by 127), and 23 mantissa bits (F), following the formula (-1)^S × 2^(E-127) × (1.F) for normalized numbers.[24] The 64-bit double type, declared as double, uses 8 bytes with 1 sign bit, 11 exponent bits (biased by 1023), and 52 mantissa bits, computed as (-1)^S × 2^(E-1023) × (1.F).[25] XDR also supports 128-bit quadruple-precision floating-point, declared as quadruple, using 16 bytes: 1 sign bit, 15 exponent bits (biased by 16383), and 112 mantissa bits per IEEE 754, including special values.[26] These representations preserve the exact bit layout from the IEEE standard, including special cases like NaN, infinity, and denormals, to ensure identical decoding on any compliant system.[24]
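The following small C program illustrates this correspondence under the assumption that the host's float type is IEEE 754 single precision (true on virtually all current systems): the XDR encoding of 1.5 is simply its IEEE bit pattern 0x3FC00000 emitted most significant byte first.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    /* 1.5 = (-1)^0 x 2^(127-127) x 1.5  ->  S=0, E=127, F=0x400000 */
    float f = 1.5f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof(bits));     /* raw IEEE 754 bit pattern */

    /* XDR transmits exactly these 32 bits, most significant byte first. */
    uint8_t wire[4] = {
        (uint8_t)(bits >> 24), (uint8_t)(bits >> 16),
        (uint8_t)(bits >> 8),  (uint8_t)(bits)
    };
    printf("%02x %02x %02x %02x\n",      /* 3f c0 00 00 */
           wire[0], wire[1], wire[2], wire[3]);
    return 0;
}
```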
The boolean type, declared as bool, is encoded as a 32-bit unsigned integer value: 0 for false and 1 for true, occupying 4 bytes in big-endian order.[27] This mapping simplifies implementation by reusing the unsigned integer machinery while restricting the effective range to these two values.[27]
For handling arbitrary binary data, XDR provides the opaque type, which treats sequences of bytes as uninterpreted data. A fixed-length opaque value, declared as opaque identifier[n], consists of exactly n bytes followed by 0-3 padding bytes to reach a multiple of 4.[28] Variable-length opaque data, declared as opaque identifier<m> (with an upper bound m) or opaque identifier<> (unbounded), is prefixed by a 32-bit unsigned integer indicating the actual length in bytes (up to 2^32 - 1), followed by that many bytes and padding to the next 4-byte boundary.[29] This length-prefixing enables efficient transmission of non-structured blobs, such as raw files or buffers, without requiring type-specific parsing.[29]
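As an illustration, this hedged C sketch encodes a five-byte blob with the standard xdr_bytes filter and reports the 12 bytes produced on the wire (4-byte length, 5 data bytes, 3 padding bytes); the buffer size and blob contents are arbitrary.

```c
#include <stdio.h>
#include <rpc/rpc.h>   /* may require libtirpc on modern systems */

int main(void) {
    char buf[64];
    XDR xdrs;

    unsigned char blob[5] = { 0xDE, 0xAD, 0xBE, 0xEF, 0x42 };
    char *p = (char *)blob;
    u_int len = sizeof(blob);

    xdrmem_create(&xdrs, buf, sizeof(buf), XDR_ENCODE);

    /* Variable-length opaque: 4-byte length, the bytes, zero padding. */
    if (!xdr_bytes(&xdrs, &p, &len, 64))
        return 1;

    /* 4 (length = 5) + 5 data bytes + 3 pad bytes = 12 bytes on the wire. */
    printf("encoded %u bytes\n", xdr_getpos(&xdrs));
    xdr_destroy(&xdrs);
    return 0;
}
```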
| Type | Declaration | Size (bytes) | Representation | Range/Notes |
|---|---|---|---|---|
| Signed Integer | int | 4 | Two's complement, big-endian | [-2^31, 2^31 - 1] |
| Unsigned Integer | unsigned int | 4 | Unsigned binary, big-endian | [0, 2^32 - 1] |
| Enumeration | enum | 4 | Signed integer, big-endian | Enum values as integers |
| Hyper Integer | hyper | 8 | Two's complement, big-endian | [-2^63, 2^63 - 1] |
| Unsigned Hyper | unsigned hyper | 8 | Unsigned binary, big-endian | [0, 2^64 - 1] |
| Float | float | 4 | IEEE 754 single-precision | As per IEEE standard |
| Double | double | 8 | IEEE 754 double-precision | As per IEEE standard |
| Quadruple | quadruple | 16 | IEEE 754 quadruple-precision | As per IEEE standard |
| Boolean | bool | 4 | Unsigned int (0/1), big-endian | False (0), True (1) |
| Opaque (variable) | opaque<> | 4 + length + padding | Length prefix + bytes + padding | Length up to 2^32 - 1 bytes |
Constructed Types
In XDR, constructed types enable the composition of primitive data types into more complex structures, ensuring platform-independent serialization through canonical encoding rules. These types include structures, unions, arrays, and mechanisms for optional data, all of which aggregate primitives while adhering to a uniform 4-byte alignment to facilitate efficient transfer across heterogeneous systems.[1] Structures represent fixed sequences of typed fields, declared in the XDR language as struct { component-declaration-A; component-declaration-B; ... } identifier;, where each component can be a primitive or another constructed type. For instance, a structure might be defined as struct { int version; string name<>; } example;, encoding the integer field first followed by the variable-length string, with any necessary padding bytes added to align the overall structure to a multiple of 4 bytes. During encoding, the fields are serialized in declaration order, and each field's representation is padded with zero to three zero bytes so that the total length is a multiple of four, preventing misalignment in byte streams.[30][19]
Discriminated unions allow for variant data based on a tag, typically an enumeration or integer, and are specified as union switch (discriminant-declaration) { case constant-A: arm-declaration-A; ... default: default-declaration; } identifier;. An example is union switch (int choice) { case 1: int value; case 2: float number; } variant;, where the discriminant occupies 4 bytes, followed by the selected arm's encoding, which is then padded to a 4-byte boundary. This ensures that only the relevant variant is included in the serialized form, with the tag determining the interpretation at decode time.[31][19]
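The sketch below shows a hand-written C filter for the variant example above, mirroring the form a generated routine would take: the 4-byte discriminant is serialized first, then only the arm it selects; the struct layout and names are assumptions for illustration.

```c
#include <stdio.h>
#include <rpc/rpc.h>   /* may require libtirpc on modern systems */

/* C mapping of the example:
 *   union switch (int choice) { case 1: int value; case 2: float number; } variant; */
struct variant {
    int choice;
    union {
        int   value;    /* arm for choice == 1 */
        float number;   /* arm for choice == 2 */
    } u;
};

/* The discriminant goes first, then only the selected arm is encoded. */
bool_t xdr_variant(XDR *xdrs, struct variant *objp) {
    if (!xdr_int(xdrs, &objp->choice))
        return FALSE;
    switch (objp->choice) {
    case 1:
        return xdr_int(xdrs, &objp->u.value);
    case 2:
        return xdr_float(xdrs, &objp->u.number);
    default:
        return FALSE;   /* no default arm was declared */
    }
}

int main(void) {
    char buf[16];
    XDR xdrs;
    struct variant v;
    v.choice = 2;
    v.u.number = 3.5f;

    xdrmem_create(&xdrs, buf, sizeof(buf), XDR_ENCODE);
    if (!xdr_variant(&xdrs, &v))
        return 1;
    printf("encoded %u bytes\n", xdr_getpos(&xdrs));   /* 4 + 4 = 8 */
    return 0;
}
```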
Arrays in XDR come in fixed-length and variable-length forms to handle collections of homogeneous elements. Fixed-length arrays are declared as type-name identifier[n];, such as int counts[5];, where the n elements are encoded sequentially from index 0 to n-1, with the entire array padded to a multiple of 4 bytes. Variable-length arrays use type-name identifier<m>; (bounded) or type-name identifier<>; (unbounded), exemplified by string items<100>;; their encoding prepends a 4-byte unsigned integer indicating the actual count k (where 0 ≤ k ≤ m or unbounded), followed by the k elements, and padded so the total length is a multiple of 4.[32][33][19]
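For illustration, the following C sketch uses the stock xdr_vector and xdr_array filters to encode a fixed-length and a variable-length integer array, showing that only the latter carries a 4-byte element count; the element types and counts are arbitrary choices.

```c
#include <stdio.h>
#include <rpc/rpc.h>   /* may require libtirpc on modern systems */

int main(void) {
    char buf[256];
    XDR xdrs;
    xdrmem_create(&xdrs, buf, sizeof(buf), XDR_ENCODE);

    /* Fixed-length array, e.g. int counts[5]: five elements back to back,
     * no length prefix (the count is known from the declaration).         */
    int counts[5] = { 1, 2, 3, 4, 5 };
    if (!xdr_vector(&xdrs, (char *)counts, 5, sizeof(int), (xdrproc_t)xdr_int))
        return 1;
    printf("fixed array: %u bytes\n", xdr_getpos(&xdrs));   /* 20 */

    /* Variable-length array, e.g. int values<10>: a 4-byte element count
     * followed by that many encoded elements.                             */
    int values[3] = { 7, 8, 9 };
    int *vp = values;
    u_int n = 3;
    if (!xdr_array(&xdrs, (caddr_t *)&vp, &n, 10, sizeof(int), (xdrproc_t)xdr_int))
        return 1;
    printf("total: %u bytes\n", xdr_getpos(&xdrs));          /* 20 + 4 + 12 = 36 */

    xdr_destroy(&xdrs);
    return 0;
}
```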
Optional data is not supported as a native type but is instead realized through discriminated unions or a boolean flag combined with the data type, as in bool present; type-name value; or the shorthand type-name *identifier;, which expands to a union with a boolean discriminant. For example, string *optionalName; encodes a 4-byte boolean (1 for present, 0 for absent), followed by the string if present, with appropriate padding to maintain 4-byte alignment. This approach allows conditional inclusion without dedicated optional semantics.[34][19]
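A hand-rolled C sketch of this idiom for string *optionalName; is shown below; rpcgen normally produces the equivalent through its generated union and pointer machinery, so this is illustrative rather than the generated code.

```c
#include <stdio.h>
#include <rpc/rpc.h>   /* may require libtirpc on modern systems */

/* Hand-rolled encoding of the optional-data idiom for
 *   string *optionalName;
 * a 4-byte boolean flag, followed by the string only when present. */
bool_t xdr_optional_name(XDR *xdrs, char **namep) {
    bool_t present = (*namep != NULL);

    if (!xdr_bool(xdrs, &present))          /* 0 = absent, 1 = present */
        return FALSE;
    if (!present) {
        if (xdrs->x_op == XDR_DECODE)
            *namep = NULL;
        return TRUE;                        /* nothing else on the wire */
    }
    return xdr_string(xdrs, namep, ~0u);    /* length, bytes, padding */
}

int main(void) {
    char buf[32];
    XDR xdrs;
    char *name = "Hi";

    xdrmem_create(&xdrs, buf, sizeof(buf), XDR_ENCODE);
    if (!xdr_optional_name(&xdrs, &name))
        return 1;
    /* 4 (flag) + 4 (length 2) + 2 bytes "Hi" + 2 pad bytes = 12 bytes */
    printf("encoded %u bytes\n", xdr_getpos(&xdrs));
    return 0;
}
```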
In practice, encoding a constructed type like a structure involves concatenating the serialized fields with inter-field and post-field padding; for the example struct { int a; string b<>; } s; with a=1 and b="hello", the stream would be the 4-byte integer (0x00000001), followed by the string's 4-byte length (5), the 5 bytes of "hello", and three padding bytes (0x00 0x00 0x00), totaling 16 bytes aligned to 4. Such aggregation builds upon primitive types like integers and strings to form hierarchical data representations suitable for network protocols.[30][19]
Applications
In ONC RPC and NFS
External Data Representation (XDR) serves as the foundational data encoding mechanism in Sun Microsystems' Open Network Computing Remote Procedure Call (ONC RPC), where it is used to marshal arguments and results for remote procedure calls, ensuring platform-independent transmission over networks. In ONC RPC, procedures are defined using the RPC description language, an extension of XDR, typically in files with a .x extension; these definitions specify program, version, and procedure numbers along with parameter structures, which are then compiled into stubs that handle XDR encoding and decoding. For instance, a simple RPC program might declare a procedure such as int ADD(add_args) = 1; inside a version block, where add_args is an XDR structure holding the operands; the client marshals the input arguments into an XDR stream before transmission, and the server unmarshals them for execution. This integration, as specified in the ONC RPC protocol, allows RPC calls to operate transparently across heterogeneous systems without requiring host-specific data conversions.[35][36]
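The following hedged C sketch shows the client side of such an integration: XDR filter routines are passed to clnt_call, which uses them to marshal the arguments and unmarshal the result. The program, version, and procedure numbers, the hostname, and the add_args structure are hypothetical.

```c
#include <stdio.h>
#include <rpc/rpc.h>   /* may require libtirpc on modern systems */

/* Hypothetical program/version/procedure numbers and argument struct,
 * as they might appear in a header generated from an add.x file.      */
#define ADD_PROG 0x20000001
#define ADD_VERS 1
#define ADD_PROC 1

struct add_args { int x; int y; };

bool_t xdr_add_args(XDR *xdrs, struct add_args *a) {
    return xdr_int(xdrs, &a->x) && xdr_int(xdrs, &a->y);
}

int main(void) {
    struct add_args args = { 2, 3 };
    int result = 0;
    struct timeval timeout = { 5, 0 };

    CLIENT *clnt = clnt_create("server.example.com", ADD_PROG, ADD_VERS, "udp");
    if (clnt == NULL)
        return 1;

    /* The XDR filters passed here marshal the arguments on the way out
     * and unmarshal the result on the way back.                         */
    if (clnt_call(clnt, ADD_PROC,
                  (xdrproc_t)xdr_add_args, (caddr_t)&args,
                  (xdrproc_t)xdr_int,      (caddr_t)&result,
                  timeout) != RPC_SUCCESS) {
        clnt_perror(clnt, "call failed");
        clnt_destroy(clnt);
        return 1;
    }
    printf("2 + 3 = %d\n", result);
    clnt_destroy(clnt);
    return 0;
}
```

Because the server-side stub applies the same xdr_add_args filter, both ends agree on the wire format without sharing any memory-layout assumptions.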
In the Network File System (NFS), XDR plays a central role in encoding protocol messages for Versions 2 and later, facilitating the representation of file system objects such as handles, attributes, and directory entries to enable cross-platform mounting and access. NFS Version 2, formally specified in 1989, relies on XDR to define structures like the file handle (fhandle, an opaque 32-byte array) and file attributes (fattr, including fields for type, mode, size, and timestamps), which are serialized in RPC calls for operations such as getattr and setattr. Similarly, NFS Version 3 extends this with larger structures, such as the variable-length nfs_fh3 file handle of up to 64 bytes and the fattr3 attribute struct (encompassing file type, ownership, size, and time metadata), alongside directory entries in readdir operations that include file identifiers and names. These XDR-defined elements ensure that NFS clients and servers, often running on different architectures, can consistently interpret file system data without endianness or alignment issues.[37][38]
XDR's fixed, canonical encoding format contributes to the performance of NFS operations in ONC RPC environments by minimizing overhead in data serialization and deserialization, particularly over UDP or TCP transports in early distributed file systems. This efficiency arises from XDR's avoidance of variable-length headers and its standardized handling of primitives like integers (always big-endian, 4-byte aligned), which reduces parsing latency and enables stateless, high-throughput file access without per-client customizations. For example, in NFS read/write procedures, XDR's opaque data handling for file contents allows direct network transfer with minimal processing, supporting the protocol's design goals for simplicity and speed in local area networks.[6][37][38]