CBOR
Concise Binary Object Representation (CBOR) is a binary data serialization format designed to achieve small code size and message size while supporting extensibility without the need for version negotiation or schemas. It is based on an extended version of the JSON data model, incorporating additional types such as binary strings, and enables straightforward conversion between CBOR and JSON representations. Specified as an Internet Standard in RFC 8949, which obsoletes the earlier RFC 7049, CBOR is particularly suited for use in constrained devices and networks, such as those in Internet of Things (IoT) applications, by prioritizing compactness and efficiency in encoding common data structures like integers, strings, arrays, and maps.[1]

CBOR's encoding scheme is structured around an initial byte that encodes both a major type (using the three most significant bits) and additional information (using the five least significant bits), allowing for definite-length items with explicit sizes or indefinite-length items terminated by a "break" stop code. The eight major types are unsigned integers (type 0), negative integers (type 1), byte strings (type 2), text strings in UTF-8 (type 3), arrays of data items (type 4), maps of key-value pairs (type 5), semantic tags for extensibility (type 6), and simple values and floating-point numbers, such as true, false, null, and IEEE 754 floats (type 7). This design makes encoded data self-describing without external schemas, so CBOR is schema-less like JSON while being more compact for binary data.[1]
Developed initially by Carsten Bormann and Paul Hoffman within the IETF's Constrained RESTful Environments (CoRE) working group, CBOR addresses limitations of text-based formats like JSON in resource-constrained settings by providing a binary alternative that supports the same interoperability goals. It has been extended through additional RFCs, including RFC 8610 for the Concise Data Definition Language (CDDL), a notation for specifying CBOR and JSON data structures, and RFC 8742 for CBOR Sequences, which allow concatenation of multiple CBOR items. CBOR's diagnostic notation, similar to JSON but with extensions for binary data (e.g., using h'..' for hex strings), facilitates human-readable representations for debugging.[1]
CBOR is integral to several IETF protocols optimized for low-power and lossy networks, including the Constrained Application Protocol (CoAP, RFC 7252) where it serves as a compact payload format alongside JSON, and CBOR Object Signing and Encryption (COSE, RFC 9052) for lightweight cryptographic operations in IoT security. Other applications include Sensor Measurement Lists (SenML, RFC 8428) for encoding sensor data and Authentication and Authorization for Constrained Environments (ACE, RFC 9200) for secure resource access. Its adoption extends to deterministic encoding modes for canonical representations in security contexts, ensuring reproducibility across implementations.[2]
Overview
Definition and History
CBOR, or Concise Binary Object Representation, is a binary data serialization format designed for encoding structured data in a compact and efficient manner, serving as a binary alternative to JSON while supporting a broader range of data types and optimized for resource-constrained environments such as Internet of Things (IoT) devices.[3] It enables unambiguous encoding of common Internet data formats, with goals including small code size for encoders and decoders, schema-less decoding, and extensibility without version negotiation.[3]

Developed by Carsten Bormann and Paul Hoffman, CBOR was initially specified in RFC 7049, published in October 2013, as a response to the limitations of text-based formats like JSON, which are verbose and inefficient in low-bandwidth or memory-limited scenarios.[4] The format was motivated by the need for a general-purpose binary encoding that supports JSON-like data models but offers greater compactness and applicability to both constrained nodes and high-volume applications, drawing on lessons from formats like ASN.1 and MessagePack without their complexities.[4]

In December 2020, RFC 7049 was obsoleted by RFC 8949, which refined the specification with editorial improvements, clarifications on streaming use, diagnostic notations, and floating-point handling, while preserving full backward compatibility and designating CBOR as an IETF Internet Standard (STD 94).[3] Related extensions include RFC 8746 (February 2020), which introduced tags for typed arrays to enhance support for numeric data structures like those in JavaScript and Web applications, and RFC 9277 (August 2022), which defined conventions for stable file storage of CBOR items to improve interoperability with file recognition tools.[5][6]

Design Principles
CBOR's design prioritizes efficiency in resource-constrained environments, with core goals centered on minimizing implementation footprint, optimizing data compactness, accelerating parsing, and ensuring self-description without external dependencies. Specifically, the format aims for compact encoder and decoder codebases suitable for devices with limited memory and processing power, such as those in Internet of Things (IoT) applications.[7] Message representations are kept reasonably compact relative to equivalent JSON encodings by avoiding verbose text overhead.[8] Parsing is streamlined through a simple structure that reduces computational demands, and no schema is required for decoding, allowing data to be self-contained and interpretable directly from the encoded bytes.[7]

Key features of CBOR further support these goals. The specification defines deterministic encoding rules, under which identical data always produces the same byte sequence, for example by using the shortest possible forms and sorting map keys.[9] It enables streaming via indefinite-length items for arrays, maps, and strings, which is essential for handling large or dynamically generated data without buffering entire payloads.[10] Extensibility is achieved through semantic tags that allow layering additional meaning, such as timestamps or big integers, without breaking compatibility for unaware decoders.[11] While aligned with the JSON data model to support seamless conversion between the formats (including all standard JSON types), CBOR extends this model natively with binary strings and integer ranges beyond JSON's 53-bit limit, accommodating arbitrary-precision values via tags.[7][11]

In comparison to JSON, CBOR's binary nature yields significant size efficiencies for typical structured data, often resulting in more compact representations.[8] This compactness stems from short item heads, lack of whitespace, and direct binary encoding of numbers and strings, while still preserving JSON compatibility for interoperability. CBOR avoids full schema dependence by relying on tags for semantic enrichment, enabling self-descriptive data that conveys intent without predefined contracts.[12]

The design emphasizes long-term stability and evolvability, with RFC 8949 providing full backward compatibility to its predecessor (RFC 7049) to support ongoing adoption across protocols and applications without disruptive changes.[3] This approach is intended to keep CBOR viable for decades, accommodating extensions through reserved spaces while maintaining core determinism and simplicity.[13]

Encoding Specification
Data Item Structure
Every CBOR data item begins with an initial byte that encodes both the major type and the initial additional information in a compact 8-bit value, where the high-order 3 bits represent the major type (ranging from 0 to 7) and the low-order 5 bits provide the initial additional information.[14] This structure allows for efficient serialization by combining type identification and basic length or value hints in a single byte.[15] The overall structure of a CBOR data item consists of a head, which includes the initial byte plus any additional bytes needed to fully specify the argument (such as length), followed by the content or payload bytes if applicable.[15]

CBOR supports both definite-length items, where the length is explicitly provided in the head, and indefinite-length items, particularly for arrays, maps, byte strings, and text strings, which allow streaming by deferring the total length determination.[10] Indefinite-length items are delimited by a break stop byte (0xFF), which signals the end of the item without requiring prior knowledge of its size; a break appearing anywhere else, where a data item is expected, is not well-formed, which keeps parsing unambiguous.[16] For error handling, CBOR data is considered not well-formed if an unexpected break stop appears or if lengths are incomplete or invalid, in which case decoders are required to fail without producing an invalid item.[10]

In terms of byte layout, a simple unsigned integer like 10 can be encoded in a single byte (0x0A), where the major type 0 and value 10 are combined directly in the initial byte.[17] For larger values, such as 500, the encoding uses multiple bytes: the initial byte (0x19) indicates major type 0 with additional information 25 (signifying a 2-byte unsigned integer follows), followed by the two payload bytes (0x01 0xF4).[17] This prefix mechanism extends the structure for scalability while maintaining compactness.[18]

Field Encoding Methods
In CBOR, the additional information field within the initial byte of a data item encodes an unsigned integer argument that specifies details such as the value of small integers or the length of subsequent data. This argument is represented using one of several methods, prioritizing compactness by embedding small values directly and extending to additional bytes for larger ones. The methods ensure efficient serialization while maintaining a fixed structure for decoding.[19]

The primary method, known as tiny encoding, applies to argument values from 0 to 23. Here, the 5-bit additional information field directly holds the value without requiring extra bytes, resulting in a single-byte data item head for these cases. This approach minimizes overhead for common short lengths in arrays or strings and small integer values. For example, an argument of 5 would use additional information 5, forming a head byte where the major type bits precede the literal 5.[19]

For larger arguments, short encoding extends the initial byte with fixed-length additional bytes, using additional information values 24 through 27 to indicate the byte count. Specifically, additional information 24 is followed by one byte (an 8-bit unsigned integer), supporting values up to 255; 25 adds two bytes (16 bits, up to 65,535); 26 adds four bytes (32 bits, up to 4,294,967,295); and 27 adds eight bytes (64 bits, up to 18,446,744,073,709,551,615). These encodings allow the argument to reach up to 2^64-1, covering practical needs for lengths or integers without variable-length overhead. Values 28 through 30 are reserved and result in ill-formed CBOR in the current specification.[19]

Indefinite-length encoding, signaled by additional information 31, omits an argument value entirely and is used for streaming or variable-size items like byte strings, text strings, arrays, or maps. Such items begin with a head byte indicating the major type combined with 31 and end with a break stop code (major type 7, additional information 31), allowing content to be appended dynamically without predefined lengths. This method supports applications requiring partial or incremental processing.[20]

CBOR's preferred serialization uses the shortest possible encoding for any argument to optimize size; for instance, a value of 100 uses additional information 24 followed by the byte 100 (0x64), not a longer form such as 25 followed by 0x0064. Longer forms with leading zero bytes remain well-formed, but deterministic encoding rules exclude them to ensure uniqueness and prevent ambiguity. All multi-byte arguments employ big-endian byte order, with the most significant byte first. To reconstruct the argument value, decoders interpret the additional bytes directly as an unsigned integer: for tiny encoding, it equals the additional information; for short encodings, it equals the big-endian value of the following bytes. No variable-length "long" encoding with arbitrary byte strings is defined for the argument itself, as the fixed short forms suffice up to 64 bits.[21][22]

| Additional Information | Bytes Following | Argument Value Range | Total Bits |
|---|---|---|---|
| 0–23 | 0 | 0–23 | 5 |
| 24 | 1 | 0–255 | 13 |
| 25 | 2 | 0–65,535 | 21 |
| 26 | 4 | 0–2^32-1 | 37 |
| 27 | 8 | 0–2^64-1 | 69 |
| 31 | 0 | Indefinite | N/A |
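The argument encodings in the table can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `encode_head` and `decode_head` are hypothetical, the encoder always picks the shortest form, and the decoder does not check whether indefinite length is permitted for the given major type.

```python
def encode_head(major_type: int, argument: int) -> bytes:
    """Encode a definite-length head using the shortest available form."""
    if not 0 <= major_type <= 7:
        raise ValueError("major type must be in 0..7")
    if not 0 <= argument < 1 << 64:
        raise ValueError("argument must fit in 64 bits")
    prefix = major_type << 5
    if argument <= 23:
        # Argument fits directly in the 5-bit additional information.
        return bytes([prefix | argument])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([prefix | info]) + argument.to_bytes(size, "big")
    raise AssertionError("unreachable: argument was range-checked above")


def decode_head(data: bytes, offset: int = 0):
    """Return (major_type, argument, next_offset).

    The argument is None when additional information 31 is used.
    """
    initial = data[offset]
    major_type, info = initial >> 5, initial & 0x1F
    offset += 1
    if info <= 23:
        return major_type, info, offset
    if info <= 27:
        size = 1 << (info - 24)  # 1, 2, 4, or 8 following bytes
        if offset + size > len(data):
            raise ValueError("truncated head")
        argument = int.from_bytes(data[offset:offset + size], "big")
        return major_type, argument, offset + size
    if info == 31:
        return major_type, None, offset
    raise ValueError("additional information 28..30 is reserved")
```

For instance, `encode_head(0, 500)` produces the bytes `19 01 f4` discussed earlier, and `decode_head` applied to those bytes recovers major type 0 and the argument 500.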
Major Types Summary
CBOR defines eight major types, each identified by a 3-bit major type value in the initial byte of a data item. These types provide the foundational structure for representing diverse data forms in a compact binary format, drawing inspiration from JSON while extending to binary and semantic elements. The major types are encoded using an "additional information" field that specifies length or value details, allowing for efficient variable-length representations.[24] The following table summarizes the major types, their identifiers, and high-level roles:

| Major Type | Identifier (Binary Prefix) | Description and Role |
|---|---|---|
| 0 | 0b000 | Unsigned integers, ranging from 0 to 2^64-1; used for non-negative whole numbers.[25] |
| 1 | 0b001 | Negative integers, ranging from -2^64 to -1, with the argument holding -1 minus the value (the absolute value minus 1); used for negative whole numbers.[25] |
| 2 | 0b010 | Byte strings, representing arbitrary binary data of specified length; suitable for non-textual octet sequences and supports indefinite lengths.[25] |
| 3 | 0b011 | Text strings, consisting of UTF-8 encoded Unicode characters of specified length; intended for human-readable text and supports indefinite lengths.[25] |
| 4 | 0b100 | Arrays, denoting ordered sequences of data items with a specified or indefinite number of elements; used to group homogeneous or heterogeneous items.[25] |
| 5 | 0b101 | Maps, representing unordered collections of key-value pairs where the number of pairs is specified or indefinite; employed for associative data structures.[25] |
| 6 | 0b110 | Semantic tags, pairing an integer tag number with an enclosed data item to impart additional meaning; enables extension of the data model without altering core types.[25] |
| 7 | 0b111 | Simple values and floating-point numbers, including false, true, null, undefined, half-precision (additional information 25), single-precision (26), and double-precision (27) floats; additional information 28–30 is reserved, and 31 is the "break" stop code for indefinite-length items. Simple values carry no further content.[25] |
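To show how the eight major types fit together, the following is a minimal decoder sketch that dispatches on the major type. It is illustrative only: `decode_item`, `BREAK`, and the tuple representations for tags and unassigned simple values are choices made for this example, map keys are assumed to be hashable Python values, and no recursion limit or duplicate-key check is applied.

```python
import struct

BREAK = object()  # sentinel returned for the 0xff "break" stop code
SIMPLE = {20: False, 21: True, 22: None}
FLOAT_FORMATS = {25: ">e", 26: ">f", 27: ">d"}


def decode_item(data: bytes, pos: int = 0):
    """Decode one data item starting at pos; return (value, next_pos)."""
    initial = data[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    raw = b""
    if info < 24:
        arg = info
    elif info < 28:
        size = 1 << (info - 24)
        raw = data[pos:pos + size]
        if len(raw) != size:
            raise ValueError("truncated head")
        arg = int.from_bytes(raw, "big")
        pos += size
    elif info == 31:
        arg = None  # indefinite length (or break, for major type 7)
    else:
        raise ValueError("additional information 28..30 is reserved")

    if major == 7:
        if info in SIMPLE:
            return SIMPLE[info], pos
        if info in FLOAT_FORMATS:
            return struct.unpack(FLOAT_FORMATS[info], raw)[0], pos
        if info == 31:
            return BREAK, pos
        return ("simple", arg), pos  # e.g. undefined (23)
    if arg is None and major in (0, 1, 6):
        raise ValueError("indefinite length not allowed for this major type")
    if major == 0:
        return arg, pos
    if major == 1:
        return -1 - arg, pos
    if major in (2, 3):
        if arg is None:
            chunks = []
            while data[pos] != 0xFF:
                if data[pos] >> 5 != major or (data[pos] & 0x1F) == 31:
                    raise ValueError("chunk must be a definite-length string of the same type")
                chunk, pos = decode_item(data, pos)
                chunks.append(chunk)
            return (b"" if major == 2 else "").join(chunks), pos + 1
        payload = data[pos:pos + arg]
        if len(payload) != arg:
            raise ValueError("truncated string")
        return (payload if major == 2 else payload.decode("utf-8")), pos + arg
    if major == 4:
        items = []
        while arg is None or len(items) < arg:
            value, pos = decode_item(data, pos)
            if value is BREAK:
                if arg is None:
                    break
                raise ValueError("unexpected break in definite-length array")
            items.append(value)
        return items, pos
    if major == 5:
        result, pairs = {}, 0
        while arg is None or pairs < arg:
            key, pos = decode_item(data, pos)
            if key is BREAK:
                if arg is None:
                    break
                raise ValueError("unexpected break in definite-length map")
            value, pos = decode_item(data, pos)
            if value is BREAK:
                raise ValueError("break between key and value")
            result[key] = value
            pairs += 1
        return result, pos
    # major == 6: a tag number wrapping one enclosed item
    value, pos = decode_item(data, pos)
    if value is BREAK:
        raise ValueError("break where a tagged item was expected")
    return ("tag", arg, value), pos
```

As a usage example, `decode_item(bytes.fromhex("a1616101"))` yields the map `{"a": 1}` together with the offset 4, and `decode_item(bytes.fromhex("9f0102ff"))` yields the list `[1, 2]`.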
Data Types
Integers (Types 0 and 1)
In CBOR, integers are represented using major types 0 and 1, providing a compact binary encoding for non-negative and negative values whose argument fits in 64 bits. These types leverage the general field encoding structure, where the first 3 bits indicate the major type and the remaining 5 bits form the additional information, which either directly specifies small values or indicates the length of subsequent bytes for larger ones. This design ensures minimal byte usage, making integers efficient for constrained environments.[24] Unsigned integers (major type 0) cover non-negative values from 0 to 2^64 - 1 (18,446,744,073,709,551,615). The encoding uses the additional information to represent the value directly for small integers (0 to 23) in a single byte, or to specify the number of following bytes (1, 2, 4, or 8) for larger values, always in big-endian order. For example, the value 23 is encoded as the single byte 0x17 (binary 000_10111, where the first 3 bits are 000 for type 0 and the next 5 bits are 10111 for 23), while 24 is encoded as 0x1818 (type 0 with additional info 24 indicating 1 byte follows, containing the value 24). This minimal-length encoding prevents unnecessary bytes, with values up to 255 using 2 bytes in total, up to 65,535 using 3 bytes, up to 2^32 - 1 using 5 bytes, and the remainder of the range using 9 bytes. Major types 0 and 1 cannot represent integers outside this range; larger magnitudes are expressed with the bignum tags (2 and 3) applied to byte strings.[25][27]
Negative integers (major type 1) represent values from -1 to -2^64 (down to -18,446,744,073,709,551,616), encoded by carrying the absolute value minus 1 as an unsigned argument. Specifically, a negative integer n (where n < 0) is encoded with major type 1 and an argument v = -1 - n, so the encoded bytes for v follow the same length rules as type 0. For decoding, the value is computed as:
n = -1 - v
or equivalently,
n = -(v + 1)
where v is the unsigned integer value from the additional information and following bytes. For instance, -1 is encoded as 0x20 (type 1 with additional info 0, decoding to -1 - 0 = -1), and -24 as 0x37 (type 1 with additional info 23, decoding to -1 - 23 = -24). Larger negative values, such as -100, use 0x3863 (type 1 with 1-byte length, followed by 99, decoding to -1 - 99 = -100). Like unsigned integers, signed encodings are limited to 64 bits and use the minimal number of bytes.[25][27][28]
For interoperability, CBOR recommends canonicalization by always using the shortest possible encoding for integers, avoiding padded representations (e.g., encoding 23 as 0x17 rather than 0x190017). This "preferred serialization" is particularly important in applications like digital signing, where deterministic encoding ensures consistent byte streams across implementations. Decoders must support all valid encodings but may reject non-minimal ones in strict modes.[29][30]
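The integer rules above can be checked with a short Python sketch. The names `encode_int` and `decode_int` are hypothetical helpers for this illustration; the encoder always emits the shortest form, while the decoder also accepts longer, non-preferred forms.

```python
def encode_int(n: int) -> bytes:
    """Encode an integer with major type 0 or 1, using the shortest form."""
    # Negative values carry the argument v = -1 - n under major type 1.
    major, arg = (0, n) if n >= 0 else (1, -1 - n)
    if arg >= 1 << 64:
        raise OverflowError("outside the range of major types 0 and 1")
    if arg <= 23:
        return bytes([(major << 5) | arg])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < 1 << (8 * size):
            return bytes([(major << 5) | info]) + arg.to_bytes(size, "big")
    raise AssertionError("unreachable: argument was range-checked above")


def decode_int(data: bytes) -> int:
    """Decode a major type 0 or 1 item found at the start of data."""
    major, info = data[0] >> 5, data[0] & 0x1F
    if major not in (0, 1):
        raise ValueError("not an integer item")
    if info <= 23:
        arg = info
    elif info <= 27:
        size = 1 << (info - 24)
        if len(data) < 1 + size:
            raise ValueError("truncated integer")
        arg = int.from_bytes(data[1:1 + size], "big")
    else:
        raise ValueError("invalid additional information for an integer")
    return arg if major == 0 else -1 - arg
```

With these helpers, `encode_int(-100)` gives `38 63`, and `decode_int(bytes.fromhex("190017"))` returns 23 even though that three-byte form is not the preferred serialization.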
Strings (Types 2 and 3)
In CBOR, strings are represented using major types 2 and 3 to encode binary and textual data, respectively, allowing for efficient serialization of arbitrary octet sequences or Unicode text.[15] These types support both definite and indefinite length encodings, where the length is specified in bytes rather than characters or code points, ensuring compact representation suitable for constrained environments.[17]

Byte strings, encoded with major type 2 (initial byte values 0x40 to 0x5b for definite lengths), represent arbitrary sequences of zero or more bytes.[17] The additional information in the initial byte determines the length: values 0x40 to 0x57 indicate lengths from 0 to 23 bytes directly, while 0x58, 0x59, 0x5a, and 0x5b precede 1-byte, 2-byte, 4-byte, or 8-byte unsigned integers for longer lengths up to 2^64-1 bytes.[17] The payload follows immediately as the exact number of bytes specified, with no further interpretation.[17] For indefinite-length byte strings, the initial byte is 0x5f, followed by zero or more definite-length byte string chunks (all of major type 2), and terminated by a break symbol (0xFF).[31] This form is particularly useful for streaming applications where the total length is unknown in advance.[31]

Text strings, encoded with major type 3 (initial byte values 0x60 to 0x7b for definite lengths), represent Unicode text as well-formed UTF-8 encoded byte sequences, excluding any byte order mark (BOM).[10] The length determination follows the same structure as byte strings, but the payload must consist of valid UTF-8 bytes, with the length counting bytes rather than Unicode code points.[10] Indefinite-length text strings use initial byte 0x7f, followed by definite-length text string chunks (all of major type 3), each of which must itself be valid UTF-8 so that no multi-byte code point is split across chunks, and end with the break symbol (0xFF).[31] A text string whose content is not well-formed UTF-8 per RFC 3629 is invalid; how a decoder handles such errors (e.g., rejection or substitution) is application-defined.[10][32]

For canonicalization in diagnostic or deterministic contexts, text strings should use the shortest possible UTF-8 encoding for each code point, and indefinite-length forms are typically avoided except for streaming scenarios.[33] Byte strings have no such normalization requirements, as they are opaque octet sequences.[17]

Examples illustrate these encodings. An empty byte string is encoded as 0x40.[34] A byte string containing the four bytes 01 02 03 04 is 0x4401020304, where 0x44 indicates major type 2 with a 4-byte length.[34] For text strings, an empty string is 0x60, and the string "IETF" (UTF-8 bytes 49 45 54 46) is 0x6449455446.[35] An indefinite-length text string for "streaming" might be 0x7f657374726561646d696e67ff, comprising the chunks "strea" (0x657374726561) and "ming" (0x646d696e67) followed by the break.[35]
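The string examples above can be reproduced with a small Python sketch. The helper names (`_head`, `encode_bytes`, `encode_text`, `encode_text_chunks`) are assumptions made for this illustration and do not correspond to any particular library.

```python
def _head(major_type: int, length: int) -> bytes:
    """Shortest definite-length head for the given major type and length."""
    prefix = major_type << 5
    if length <= 23:
        return bytes([prefix | length])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if length < 1 << (8 * size):
            return bytes([prefix | info]) + length.to_bytes(size, "big")
    raise ValueError("length does not fit in 64 bits")


def encode_bytes(value: bytes) -> bytes:
    """Definite-length byte string (major type 2)."""
    return _head(2, len(value)) + value


def encode_text(value: str) -> bytes:
    """Definite-length text string (major type 3); length counts UTF-8 bytes."""
    utf8 = value.encode("utf-8")
    return _head(3, len(utf8)) + utf8


def encode_text_chunks(chunks) -> bytes:
    """Indefinite-length text string: 0x7f, definite-length chunks, 0xff."""
    return b"\x7f" + b"".join(encode_text(chunk) for chunk in chunks) + b"\xff"


print(encode_text("IETF").hex())
print(encode_text_chunks(["strea", "ming"]).hex())
```

Because each chunk is encoded from a complete Python string, no chunk can end in the middle of a multi-byte code point, which matches the constraint on indefinite-length text strings.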
Arrays and Maps (Types 4 and 5)
In CBOR, major type 4 represents arrays, which are ordered sequences of zero or more data items of any type. The encoding begins with an initial byte where the three most significant bits are 100, and the five least significant bits form the additional information specifying the number of data items (N) in the array. For definite-length arrays, if N is between 0 and 23, the additional information directly encodes N; for larger values, it indicates the length of the subsequent unsigned integer encoding N (using 1, 2, 4, or 8 bytes for additional information values 24 through 27, respectively). This is followed by exactly N encoded data items, which may include nested arrays or other types.[17]

For indefinite-length arrays, the additional information is set to 31 (encoded as 0x9f), allowing the array to be streamed without knowing N in advance. In this case, data items are encoded sequentially until terminated by a break stop code (0xff), which signals the end of the array without counting elements. Nesting is supported in both definite and indefinite forms, with each level independently terminated in indefinite cases, though practical implementations impose recursion depth limits based on stack size or memory constraints. The total size in bytes for an array is the size of the head byte (plus any length-encoding bytes) plus the sum of the sizes of its data items, plus one byte for the break in the indefinite-length form.[10]

Major type 5 encodes maps, which are unordered collections of key-value pairs where keys are distinct data items and values are arbitrary data items. The initial byte has the three most significant bits as 101, with the additional information specifying the number of pairs (M) in a manner identical to arrays: direct for 0-23, or followed by 1, 2, 4, or 8 bytes for larger M. This is followed by 2M data items, alternating as key1, value1, ..., keyM, valueM, with no semantic meaning to the order of pairs. A map with duplicate keys is well-formed but invalid, and how a decoder treats it is not determined by the encoding itself. Keys can be of any type, though text strings are common for interoperability.[17][10]

Indefinite-length maps use additional information 31 (0xbf) and consist of alternating key-value pairs until a break stop code (0xff); a break following a key without its value is not well-formed. Like arrays, nesting is permitted within keys or values, subject to practical depth limits. The total byte size follows the same formula as arrays: head (plus length bytes) plus the sum of all key and value item sizes. For deterministic encoding, such as in canonical CBOR, maps must use definite lengths, and pairs must be ordered by the bytewise lexicographic sort of their encoded keys to ensure consistent representations across encoders.[10][9]

For example, an empty array encodes as 0x80 (major type 4 with additional information 0). A map with a single pair, such as the key "a" (text string) mapping to the integer 1, encodes in definite length as 0xa1616101, where 0xa1 indicates type 5 with one pair, 0x6161 encodes the one-byte string "a" (head 0x61 followed by the byte 0x61), and 0x01 is the integer value. Indefinite forms would use 0x9f ... ff for arrays and 0xbf ... ff for maps, with elements or pairs in between.[36]

Semantic Tags (Type 6)
Semantic tags in CBOR are encoded using major type 6: the initial byte carries major type 6 (binary 110) together with an argument that encodes a non-negative integer tag number (ranging from 0 to 2^64-1) using the standard argument encoding, and is followed by the tagged item itself, which can be any valid CBOR data item.[37] This structure allows the tag to wrap a single enclosed data item without modifying the underlying CBOR format, providing a mechanism to attach interpretive semantics to the content. The purpose of these tags is to convey additional meaning to the enclosed data, such as interpreting a byte string as a big integer, a text string as a date, or an array as a decimal fraction, while leaving the core encoding intact; decoders that do not recognize a tag can still parse the enclosed item and pass it to the application along with the tag number.[37]

Nesting of tags is permitted, enabling multiple layers of semantics where an outer tag applies to the result of an inner tagged item; for instance, a big integer (tag 2) could itself be tagged with a higher-level semantic.[37] The tag number itself is always encoded with a definite length, but the enclosed tagged item may use indefinite length encoding if it is a construct like an indefinite-length array or map.[38] For example, tag 1 (epoch date/time) might enclose an unsigned integer representing seconds since the epoch, encoded as 0xc1 (tag number 1 in a single byte) followed by 0x1a47d60b80 (a 32-bit unsigned integer for the timestamp value), while tag 4 encloses an array of two integers to represent a decimal fraction (mantissa × 10^exponent).[37]

In terms of canonicalization and equivalence, semantic tags are generally ignored during value comparisons unless the specific tag semantics define otherwise; for instance, two items are considered equal if they are both untagged or both carry the same tag number with equal enclosed content, but the presence of unrecognized tags does not affect basic equality.[39] Deterministic encoding rules for CBOR do not mandate including tags unless required by the application, prioritizing the shortest possible representation for the underlying data item.[22] This design ensures that tags enhance portability and extensibility without compromising the format's conciseness or compatibility.[37]

Simple and Floating-Point Values (Type 7)
Major type 7 in CBOR is used to encode both simple values and floating-point numbers according to the IEEE 754 standard.[40] These data items are represented with a single leading byte for simple values or a leading byte followed by the appropriate number of bytes for floating-point values, all in big-endian byte order.[40] Unlike other major types, type 7 does not support indefinite-length encodings; all items have a definite length determined by the additional information in the head.[40]

The four assigned simple values are encoded as single bytes with no additional data, using additional information values 20 through 23.[40] The value false is encoded as 0xf4, true as 0xf5, null as 0xf6, and undefined as 0xf7.[40] These encodings directly map to their semantic meanings without further interpretation, providing compact representations for common constants in data interchange.[40]
Floating-point values are encoded using additional information values 25, 26, and 27 for half-precision (16 bits), single-precision (32 bits), and double-precision (64 bits), respectively.[40] The leading byte is 0xf9 for half-precision followed by two bytes, 0xfa for single-precision followed by four bytes, and 0xfb for double-precision followed by eight bytes.[40] To decode these, the subsequent bytes are interpreted directly as the corresponding IEEE 754 binary format in big-endian order:
\text{value} = \text{IEEE 754 interpretation of the bytes}
For example, the single-precision encoding of 1.0 is 0xfa 3f 80 00 00, where the bytes 3f 80 00 00 represent the IEEE 754 binary32 value.[41]
Special floating-point values are handled per IEEE 754 conventions within these formats.[40] Infinity uses an exponent of all ones and a zero fraction, with the sign bit determining positive (+∞) or negative (-∞); for instance, half-precision positive infinity is 0xf9 7c 00.[41] NaN (Not a Number) is encoded with the exponent all ones and a non-zero fraction, typically as a quiet NaN, preserving any payload in the significand bits and allowing the sign bit to be set or cleared as needed.[40] Negative zero (-0.0) is distinguished from positive zero by its sign bit, such as 0xf9 80 00 in half-precision.[41] RFC 8949 clarifies that NaN payloads are retained during encoding and decoding to maintain fidelity, while the sign bit for special values follows IEEE 754 rules without additional restrictions.[40]
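The floating-point encodings described in this section map directly onto the big-endian `e`, `f`, and `d` formats of Python's standard `struct` module. The sketch below is a minimal illustration under that assumption; `encode_float` and `decode_float` are hypothetical names, the caller selects the precision explicitly, and NaN payload handling is left to whatever `struct` produces.

```python
import math
import struct

# Leading byte and struct format for each precision (big-endian).
FLOAT_LAYOUT = {
    "half": (0xF9, ">e"),
    "single": (0xFA, ">f"),
    "double": (0xFB, ">d"),
}


def encode_float(value: float, precision: str = "double") -> bytes:
    """Encode value as a major type 7 float at the requested precision."""
    lead, fmt = FLOAT_LAYOUT[precision]
    return bytes([lead]) + struct.pack(fmt, value)


def decode_float(data: bytes) -> float:
    """Decode a major type 7 float found at the start of data."""
    for lead, fmt in FLOAT_LAYOUT.values():
        if data[0] == lead:
            size = struct.calcsize(fmt)
            if len(data) < 1 + size:
                raise ValueError("truncated float")
            return struct.unpack(fmt, data[1:1 + size])[0]
    raise ValueError("not a floating-point item")


print(encode_float(1.0, "single").hex())
print(encode_float(math.inf, "half").hex())
print(encode_float(-0.0, "half").hex())
```

A production encoder would usually pick the shortest precision that preserves the value exactly; this sketch leaves that choice to the caller to keep the mapping to the three formats explicit.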
Semantic Tags
Registry Management
The CBOR semantic tags registry is maintained by the Internet Assigned Numbers Authority (IANA) and is accessible at the "CBOR Tags" registry page.[42] This registry serves as the authoritative source for all registered tags, ensuring global uniqueness and interoperability across CBOR implementations.[43] Assignment policies for new tags are defined in RFC 8949, which obsoletes the initial policies in RFC 7049, and follow the general guidelines in RFC 8126 for IANA registries.[11] Tags in the range 0–23 require Standards Action, typically through an IETF standards process.[43] Tags 24–255 and 256–32767 fall under Specification Required, involving expert review by designated experts (Christian Amsüss, Thomas Fossati, Francesca Palombini, or Carsten Bormann as backup) to verify the accompanying specification.[42][43] The range 32768–18,446,744,073,709,551,615 is allocated on a First Come, First Served basis, with registrations needing a point of contact but no formal specification.[43] All registered tags are global in scope and must specify the type(s) of data items they tag, along with their semantics, to enable consistent interpretation.[11]

The registry was initially established in RFC 7049 in 2013, which defined the first set of tags and basic allocation rules.[44] It was expanded and refined in RFC 8949 in 2020 to address errata, improve clarity, and update policies for better extensibility.[3] Other additions include tags 64–79 for typed arrays of integers, registered via RFC 8746 in 2020, which supports efficient encoding of numeric arrays in constrained environments.[5] Subsequent RFCs like RFC 9164 (2021) and RFC 9090 (2021) have added tags for IP addresses/prefixes and object identifiers, respectively. Some tags, such as 260 and 261, have been deprecated per RFC 9164. The registry continues to grow through ongoing IETF contributions, with the last update as of November 2025.[42]

For private use, the range 65,536–18,446,744,073,709,551,615 is reserved, allowing implementers to define tags locally without registration or collision avoidance, though documentation is encouraged for interoperability.[11] Errata and clarifications are tracked through the IETF's RFC errata system.

Common Tags and Examples
Semantic tags in CBOR extend the basic data model by associating specific meanings with tagged items, enabling representations of complex concepts like dates, identifiers, and high-precision numbers. The most frequently used tags are defined in the initial registry established by RFC 8949, with additional common ones added in subsequent specifications such as RFC 8746 for typed arrays. These tags are encoded as major type 6 items, where the tag number precedes the content item, and their semantics are standardized to ensure interoperability across implementations.[3]

Tag 1 represents an epoch-based date/time as seconds relative to 1970-01-01T00:00:00Z, tagging an integer for whole seconds or a floating-point number for subsecond precision. For example, the date 2025-11-10T00:00:00Z encodes as tag 1 applied to the unsigned integer 1762732800, yielding the diagnostic notation 1(1762732800). This tag supports subsecond resolution when using IEEE 754 floating-point values, allowing fractional seconds beyond integer boundaries.[3][45]
Tag 24 denotes embedded CBOR, indicating that the tagged content is another complete CBOR item, often used for indefinite-length or nested structures without altering the outer encoding. This facilitates modular data composition, such as embedding diagnostic information or partial streams within a larger CBOR object.[3]
Tags 4 and 5 represent high-precision numbers as decimal fractions and bigfloats, respectively. Tag 4 specifies a decimal fraction as an array [exponent, mantissa] whose value is mantissa × 10^exponent; both elements are integers, and the mantissa may be negative. Tag 5 defines a binary bigfloat in the same array form, with the value mantissa × 2^exponent. These tags are essential for applications requiring exact decimal or binary arithmetic beyond CBOR's native floating-point types.[3]
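The arithmetic behind both tags can be illustrated with exact number types from the Python standard library. The sketch below assumes the [exponent, mantissa] pair has already been decoded into two Python integers; the function names are illustrative only.

```python
from decimal import Decimal
from fractions import Fraction


def decimal_fraction_value(exponent: int, mantissa: int) -> Decimal:
    """Value of a tag 4 item: mantissa * 10**exponent, computed exactly."""
    sign = 1 if mantissa < 0 else 0
    digits = tuple(int(digit) for digit in str(abs(mantissa)))
    # Building the Decimal from its parts avoids any context rounding.
    return Decimal((sign, digits, exponent))


def bigfloat_value(exponent: int, mantissa: int) -> Fraction:
    """Value of a tag 5 item: mantissa * 2**exponent, as an exact fraction."""
    return mantissa * Fraction(2) ** exponent


print(decimal_fraction_value(-2, 27315))  # 273.15
print(bigfloat_value(-1, 3))              # 3/2
```

In diagnostic notation these two inputs correspond to 4([-2, 27315]) and 5([-1, 3]).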
Tag 32 identifies a URI, tagging a text string encoded in UTF-8 per RFC 3986, allowing compact representation of web addresses and resource locators. For instance, the URI "https://example.com" would be tagged as 32("https://example.com").[3][46]
Tag 37 denotes a UUID (Universally Unique Identifier), tagging a 16-byte byte string as defined in RFC 4122, providing a standardized way to embed unique identifiers in CBOR data. This is commonly used in distributed systems for entity referencing without string overhead.[3]
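As a rough illustration of the saving, the sketch below wraps a UUID in tag 37 by hand. The function names are illustrative, and the decoder accepts only the exact 19-byte shortest-form encoding.

```python
import uuid

# 0xd8 0x25: tag head with a one-byte argument, tag number 37.
# 0x50: byte string head (major type 2) with length 16.
UUID_PREFIX = b"\xd8\x25\x50"


def encode_tagged_uuid(value: uuid.UUID) -> bytes:
    """Encode a UUID as tag 37 over its 16 raw bytes."""
    return UUID_PREFIX + value.bytes


def decode_tagged_uuid(data: bytes) -> uuid.UUID:
    """Decode the exact encoding produced by encode_tagged_uuid."""
    if len(data) != 19 or not data.startswith(UUID_PREFIX):
        raise ValueError("not a tag 37 item over a 16-byte string")
    return uuid.UUID(bytes=data[3:])


identifier = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")
encoded = encode_tagged_uuid(identifier)
print(len(encoded), len(str(identifier)))  # 19 36
```

The tagged binary form takes 19 bytes, against 36 characters for the usual hyphenated text form.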
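Tag 55799 marks self-described CBOR. It can enclose any data item and adds no semantics of its own: the enclosed item is interpreted exactly as if the tag were absent. Its usefulness lies in its serialization, the three bytes 0xd9d9f7, which act as a magic number at the start of a file or message so that tools can distinguish CBOR from other formats. The byte sequence is not a distinguishing mark of any frequently used file type and is not a valid start of Unicode text when followed by a CBOR data item.[3]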
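Because a tag head is simply prepended to its content item, the three-byte encoding of tag 55799 (0xd9 0xd9 0xf7) can be added or detected with plain byte operations. The following is a minimal sketch with illustrative helper names; it does not validate the item that follows the prefix.

```python
# Major type 6 with additional info 25 gives 0xd9, followed by the
# two-byte tag number 55799 = 0xd9f7.
SELF_DESCRIBED_PREFIX = bytes([(6 << 5) | 25]) + (55799).to_bytes(2, "big")


def add_self_described_tag(encoded_item: bytes) -> bytes:
    """Prefix an already encoded CBOR item with tag 55799."""
    return SELF_DESCRIBED_PREFIX + encoded_item


def looks_like_self_described_cbor(data: bytes) -> bool:
    """Check only the leading bytes; the rest is not validated."""
    return data.startswith(SELF_DESCRIBED_PREFIX)


item = bytes.fromhex("83010203")  # the array [1, 2, 3]
wrapped = add_self_described_tag(item)
print(wrapped.hex())  # d9d9f783010203
```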
Tags 64 through 79, introduced in RFC 8746, represent typed arrays of integers encoded as byte strings, mirroring JavaScript TypedArray semantics for efficient numeric array handling; the same RFC assigns tags 80 through 87 to floating-point arrays. For example, tag 64 marks a byte string as a uint8 array (unsigned 8-bit integers) and tag 72 as a sint8 array (signed 8-bit integers), while other tags such as 66 (uint32, big-endian) and 70 (uint32, little-endian) fix the element width and byte order. These tags are particularly useful for binary data interchange in performance-sensitive environments, such as numerical computations or sensor data streams.[5]
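Interpreting the content of a typed-array tag amounts to unpacking fixed-width integers in the stated byte order. The sketch below covers only the four tags named above and assumes the tag number and the byte string content have already been extracted by a CBOR decoder; the table and function names are illustrative.

```python
import struct

# Tag number -> struct format for one element (byte order + type code).
# Only the four tags discussed in the text are covered in this sketch.
TYPED_ARRAY_FORMATS = {
    64: ">B",  # uint8 (byte order is irrelevant for single bytes)
    66: ">I",  # uint32, big-endian
    70: "<I",  # uint32, little-endian
    72: ">b",  # sint8
}


def decode_typed_array(tag: int, content: bytes) -> list[int]:
    """Unpack the byte string content of a typed-array tag into integers."""
    element_format = TYPED_ARRAY_FORMATS[tag]
    element_size = struct.calcsize(element_format)
    if len(content) % element_size:
        raise ValueError("content length is not a multiple of the element size")
    count = len(content) // element_size
    byte_order, type_code = element_format[0], element_format[1]
    return list(struct.unpack(f"{byte_order}{count}{type_code}", content))


raw = bytes.fromhex("0100000002000000")
print(decode_typed_array(70, raw))  # [1, 2]
print(decode_typed_array(66, raw))  # [16777216, 33554432]
```

The same eight bytes yield different values under tags 70 and 66, which is why the byte order is part of the tag's meaning.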
Applications
Cryptographic Uses
CBOR plays a central role in cryptographic protocols through the CBOR Object Signing and Encryption (COSE) framework, which defines structures for signatures, message authentication codes, and encryption using CBOR's binary encoding for compactness and efficiency.[47] Originally specified in RFC 8152, COSE was updated in RFC 9052 (published August 2022) to refine message structures and CBOR integration, while RFC 9053 details initial algorithms such as ECDSA, EdDSA, HMAC, and AES-based encryption that rely on CBOR for headers and payloads.[47][48] This enables secure object handling in resource-constrained settings. A single-signer signed object, for example, is a CBOR array consisting of a protected header (a serialized CBOR map of authenticated parameters, wrapped in a byte string), an unprotected header (a CBOR map of non-authenticated parameters), the payload (a byte string, or nil when the content is carried separately), and the signature (a byte string).[49]
Another key application is in token formats, exemplified by CBOR Web Tokens (CWT) as defined in RFC 8392, which provide a binary alternative to JSON Web Tokens (JWT) by representing claims in a compact CBOR map with integer or string keys (e.g., key 1 for issuer, key 2 for subject).[50] CWTs are secured using COSE for signing or encryption, with claims avoiding additional CBOR tags to maintain simplicity, though the overall CWT may use tag 61 for identification.[51] This design ensures interoperability while reducing overhead compared to text-based JWTs.[52]
CBOR's advantages in cryptography stem from its small message size and support for deterministic encoding, which produces identical byte sequences for equivalent data items, a property essential for consistent hashing and verification in protocols like TLS extensions or IoT security.[9] For instance, a simple COSE_Sign1 object (a single-signature variant) follows the array format [protected, unprotected, payload, signature], where the protected header might encode algorithm and key identifiers in a compact CBOR map.[49]
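The four-element layout can be made concrete by assembling the bytes of a COSE_Sign1 skeleton by hand. This is only a structural sketch under several assumptions: the helper names are illustrative, the protected header holds a single parameter (label 1, the algorithm, set to -7 for ES256), the message is wrapped in the COSE_Sign1 tag 18, and the signature is a 64-byte placeholder of zeros rather than a real signature, so the result would not verify.

```python
def encode_head(major_type: int, argument: int) -> bytes:
    """Encode a CBOR initial byte plus its argument in the shortest form."""
    if argument < 24:
        return bytes([(major_type << 5) | argument])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            first = bytes([(major_type << 5) | additional_info])
            return first + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_int(value: int) -> bytes:
    """Major type 0 for non-negative integers, major type 1 for negative."""
    if value >= 0:
        return encode_head(0, value)
    return encode_head(1, -1 - value)


def encode_bytes(value: bytes) -> bytes:
    return encode_head(2, len(value)) + value


def encode_int_map(entries: dict[int, int]) -> bytes:
    """Encode a map whose keys and values are all integers."""
    body = b"".join(encode_int(k) + encode_int(v) for k, v in entries.items())
    return encode_head(5, len(entries)) + body


def build_cose_sign1(protected: dict[int, int], payload: bytes,
                     signature: bytes) -> bytes:
    """Lay out tag 18 over [protected, unprotected, payload, signature]."""
    protected_bstr = encode_bytes(encode_int_map(protected))  # map inside bstr
    unprotected = encode_int_map({})                          # empty map
    return (encode_head(6, 18) + encode_head(4, 4) + protected_bstr
            + unprotected + encode_bytes(payload) + encode_bytes(signature))


placeholder_signature = bytes(64)  # NOT a real signature
message = build_cose_sign1({1: -7}, b"hello", placeholder_signature)
print(message[:13].hex())  # d28443a10126a04568656c6c6f
```

Note that the protected header appears as a byte string (0x43 followed by the three-byte map a10126), while the unprotected header is an ordinary map (0xa0). A real implementation would compute the signature over a separate signing structure defined by COSE, which this sketch omits.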
Enhancements in RFC 8949 (the current CBOR standard, published December 2020) further bolster cryptographic determinism by specifying rules for encoding floats (using the shortest form that preserves the value, such as binary16 or binary32) and for handling NaN values (settling on a single representation, typically the half-precision encoding 0xf97e00), preventing variations that could undermine security processes like digital signatures.[53][54] These rules ensure reproducible encodings across implementations, making CBOR suitable for high-stakes cryptographic applications.[9]
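The shortest-form rule for floats can be sketched with Python's struct module, which supports half, single, and double precision. The function name is illustrative, and collapsing every NaN to 0xf97e00 is an assumption that suits protocols with no need for NaN payloads.

```python
import math
import struct


def encode_float_shortest(value: float) -> bytes:
    """Encode a float in the shortest CBOR form that preserves its value."""
    if math.isnan(value):
        # Assumption: all NaNs map to the single half-precision NaN.
        return bytes.fromhex("f97e00")
    # Try binary16 (0xf9), then binary32 (0xfa); fall back to binary64 (0xfb).
    for initial_byte, fmt in ((0xF9, ">e"), (0xFA, ">f")):
        try:
            packed = struct.pack(fmt, value)
        except OverflowError:
            continue  # magnitude too large for this width
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([initial_byte]) + packed
    return b"\xfb" + struct.pack(">d", value)


for sample in (1.5, 100000.0, 1.1):
    print(sample, encode_float_shortest(sample).hex())
# 1.5 f93e00
# 100000.0 fa47c35000
# 1.1 fb3ff199999999999a
```

Two encoders that both follow this rule emit the same bytes for the same value, which is what makes hashes and signatures over the encoding reproducible.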
Constrained Environments and IoT
CBOR's design emphasizes compactness and simplicity, making it particularly suitable for Internet of Things (IoT) applications where devices often operate under severe resource constraints, such as limited memory, processing power, and bandwidth. In sensor networks, for instance, CBOR enables efficient serialization of measurement data with low overhead. The Sensor Measurement Lists (SenML) specification, defined in RFC 8428, provides a CBOR representation for simple sensor readings and device parameters in which binary data values are carried as native byte strings rather than base64url text, while units are encoded as UTF-8 text strings (e.g., "Cel" for degrees Celsius). This approach allows IoT devices to transmit structured data like temperature, humidity, or location readings in a binary format that minimizes transmission costs while remaining extensible.[55]
On constrained devices, CBOR implementations are notably lightweight, facilitating deployment on microcontrollers with minimal resources. Full CBOR decoders can be as small as 880 bytes of ARM code, well under 10 KB, enabling real-time processing without straining limited RAM or flash storage. In protocols like the Constrained Application Protocol (CoAP), defined in RFC 7252, CBOR serves as an efficient payload format for resource representations, allowing constrained nodes to exchange data over UDP with reduced overhead compared to text-based alternatives. Larger payloads can be sent using CoAP block-wise transfers (RFC 7959, extended in RFC 9177), which helps on the unreliable networks typical of IoT deployments.[56][57][58]
To enhance interoperability in such environments, CBOR is complemented by the Concise Data Definition Language (CDDL) for schema description, as specified in RFC 8610. CDDL provides a human-readable notation to define expected CBOR structures, aiding validation and generation of IoT messages without requiring verbose grammars. Additionally, RFC 9277 describes how to label CBOR data items and CBOR Sequences kept in stable storage, building on the self-described CBOR tag 55799, so that files begin with recognizable bytes and can be identified by tools like the Unix file command, which is useful for logging or archiving IoT data streams.[59][60]
Practical examples of CBOR in IoT include smart home protocols, where it underpins efficient communication in systems like CoAP-based gateways for device discovery and control, and firmware updates, which leverage CBOR's compact encoding to bundle metadata, checksums, and binary images for over-the-air (OTA) delivery on resource-limited hardware. For semantic interoperability, CBOR integrates with JSON-LD through the W3C's JSON-LD 1.1 in CBOR specification (2024 note), mapping JSON-LD's linked data model to CBOR's binary structure—such as serializing objects as maps (major type 5) and binaries as byte strings (major type 2)—to enable lightweight semantic web applications in the Web of Things.[61][62]
A key benefit of CBOR in these contexts is its bandwidth efficiency, with payloads often 40-60% smaller than equivalent JSON representations for typical IoT data like sensor arrays or configuration maps, as demonstrated in evaluations of time-series encodings and protocol benchmarks. CBOR payloads are carried under IANA-registered media types such as application/cbor, allowing seamless integration with HTTP and CoAP, and the smaller messages conserve energy on battery-powered devices. Overall, CBOR's adoption in IoT standards underscores its role in enabling scalable, low-power ecosystems.[63]
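The size difference is easy to observe on a small record. The following sketch hand-encodes a reading with a deliberately tiny encoder (non-negative integers, text strings, lists, and string-keyed maps only) and compares it with compact JSON. The sample record and all helper names are hypothetical, and the ratio for real data depends heavily on its shape.

```python
import json


def encode_head(major_type: int, argument: int) -> bytes:
    """Encode a CBOR initial byte plus its argument in the shortest form."""
    if argument < 24:
        return bytes([(major_type << 5) | argument])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            first = bytes([(major_type << 5) | additional_info])
            return first + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_item(value) -> bytes:
    """Encode a small subset of the CBOR data model."""
    if isinstance(value, int) and not isinstance(value, bool) and value >= 0:
        return encode_head(0, value)
    if isinstance(value, str):
        text = value.encode("utf-8")
        return encode_head(3, len(text)) + text
    if isinstance(value, list):
        return encode_head(4, len(value)) + b"".join(map(encode_item, value))
    if isinstance(value, dict):
        body = b"".join(encode_item(k) + encode_item(v) for k, v in value.items())
        return encode_head(5, len(value)) + body
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


# Hypothetical sensor record used only for the comparison.
record = {"id": 17, "readings": [21, 22, 22, 23], "unit": "Cel"}
cbor_bytes = encode_item(record)
json_bytes = json.dumps(record, separators=(",", ":")).encode("utf-8")
saving = 100 * (1 - len(cbor_bytes) / len(json_bytes))
print(f"CBOR {len(cbor_bytes)} bytes, JSON {len(json_bytes)} bytes, "
      f"{saving:.0f}% smaller")
```

Most of the saving in this record comes from small integers fitting in a single byte and from the absence of quotes, colons, and commas.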