CBOR

Concise Binary Object Representation (CBOR) is a binary data serialization format designed to achieve small code size and small message size while supporting extensibility without the need for version negotiation or schemas. It is based on an extended version of the JSON data model, incorporating additional types such as binary strings, and enables straightforward conversion between CBOR and JSON representations. Specified as an Internet Standard in RFC 8949, which obsoletes the earlier RFC 7049, CBOR is particularly suited for use in constrained devices and networks, such as those in Internet of Things (IoT) applications, because it prioritizes compactness and efficiency in encoding common data structures like integers, strings, arrays, and maps.

CBOR's encoding scheme is structured around an initial byte that encodes both a major type (the three most significant bits) and additional information (the five least significant bits), allowing for definite-length items with explicit sizes or indefinite-length items terminated by a "break" stop code. The eight major types are unsigned integers (type 0), negative integers (type 1), byte strings (type 2), text strings in UTF-8 (type 3), arrays of data items (type 4), maps of key-value pairs (type 5), semantic tags for extensibility (type 6), and simple values or floating-point numbers such as true, false, null, and floats (type 7). This design makes encoded data self-describing, so it can be decoded without an external schema, as with JSON, while being more compact.

Developed initially by Carsten Bormann and Paul Hoffman within the IETF's Constrained RESTful Environments (CoRE) working group, CBOR addresses limitations of text-based formats like JSON in resource-constrained settings by providing a binary alternative that supports the same interoperability goals. It has been extended through additional RFCs, including RFC 8610 for the Concise Data Definition Language (CDDL), a notation for specifying CBOR and JSON data structures, and RFC 8742 for CBOR Sequences, which allow concatenation of multiple CBOR items. CBOR's diagnostic notation, similar to JSON but with extensions for binary data (e.g., h'..' for byte strings), provides human-readable representations for debugging and documentation.

CBOR is integral to several IETF protocols optimized for low-power and lossy networks, including the Constrained Application Protocol (CoAP, RFC 7252), where it serves as a compact payload format alongside JSON, and CBOR Object Signing and Encryption (COSE, RFC 9052) for lightweight cryptographic operations. Other applications include Sensor Measurement Lists (SenML, RFC 8428) for encoding sensor data and Authentication and Authorization for Constrained Environments (ACE, RFC 9200) for secure resource access. Its deterministic encoding modes give equivalent data a single byte representation in security contexts, ensuring consistency across implementations.

Overview

Definition and History

CBOR, or Concise Binary Object Representation, is a serialization format designed for encoding structured data in a compact and efficient manner, serving as a binary alternative to JSON while supporting a broader range of data types and targeting resource-constrained environments such as Internet of Things (IoT) devices. It enables unambiguous encoding of common data formats, with goals including small code size for encoders and decoders, schema-less decoding, and extensibility without version negotiation.

Developed by Carsten Bormann and Paul Hoffman, CBOR was initially specified in RFC 7049, published in October 2013, as a response to the limitations of text-based formats like JSON, which are verbose and inefficient in low-bandwidth or memory-limited scenarios. The format was motivated by the need for a general-purpose binary encoding that supports JSON-like data models but offers greater compactness and applicability to both constrained nodes and high-volume applications, drawing on lessons from earlier binary formats without their complexities.

In December 2020, RFC 7049 was obsoleted by RFC 8949, which refined the specification with editorial improvements and clarifications on streaming use, diagnostic notation, and floating-point handling, while preserving full wire-format compatibility and designating CBOR as an Internet Standard (STD 94). Key extensions include RFC 8746 (February 2020), which introduced tags for typed arrays to improve support for numeric data structures like those in JavaScript and Web applications, and RFC 9277 (August 2022), which defined conventions for stable file storage of CBOR items to improve interoperability with file recognition tools.

Design Principles

CBOR's design prioritizes efficiency in resource-constrained environments, with core goals centered on minimizing implementation footprint, keeping data compact, making parsing cheap, and ensuring self-description without external dependencies. Specifically, the format aims for compact encoder and decoder codebases suitable for devices with limited memory and processing power, such as those in Internet of Things (IoT) applications. Message representations are kept small relative to equivalent JSON encodings by avoiding verbose text overhead. Parsing is streamlined through a simple structure that reduces computational demands, and no schema is required for decoding, so data is self-contained and interpretable directly from the encoded bytes.

Key features of CBOR further support these goals. It supports deterministic encoding, in which identical data always produces the same byte sequence under defined rules such as using the shortest possible forms and sorting map keys. It enables streaming via indefinite-length items for arrays, maps, and strings, which is useful for handling large or dynamically generated data without buffering entire payloads. Extensibility is achieved through semantic tags that layer additional meaning, such as timestamps or bignums, without breaking compatibility for decoders that do not know the tag. While aligned with the JSON data model to support conversion between the formats, including all standard JSON types, CBOR extends this model natively with byte strings and with 64-bit integer ranges beyond the 53-bit precision of JavaScript numbers, and it accommodates arbitrary-precision values via tags.

In comparison to JSON, CBOR's binary nature typically yields more compact representations of structured data. This compactness stems from short binary heads, the absence of whitespace and delimiters, and direct binary encoding of numbers and string lengths, while JSON compatibility is preserved. CBOR avoids schema dependence by relying on tags for semantic enrichment, enabling self-descriptive data that conveys intent without predefined contracts.

The design emphasizes long-term stability and evolvability, with RFC 8949 remaining fully compatible with its predecessor (RFC 7049) to support ongoing adoption across protocols and applications without disruptive changes. This approach is intended to keep CBOR viable for decades, accommodating extensions through reserved code points while leaving the core format unchanged.
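The size difference can be illustrated with one small value. The following is a hand-assembled sketch, not a benchmark: the CBOR bytes are written out manually (they match the encoding of {"a": 1, "b": [2, 3]} listed among the examples in RFC 8949), and the variable names are chosen here for illustration only.

```python
import json

value = {"a": 1, "b": [2, 3]}

# Compact JSON text with no optional whitespace.
json_bytes = json.dumps(value, separators=(",", ":")).encode("utf-8")

# The same value as CBOR, assembled by hand from its parts.
cbor_bytes = bytes.fromhex(
    "a2"    # map with 2 key-value pairs
    "6161"  # text string "a"
    "01"    # unsigned integer 1
    "6162"  # text string "b"
    "82"    # array with 2 elements
    "02"    # unsigned integer 2
    "03"    # unsigned integer 3
)

print(len(json_bytes))  # 17
print(len(cbor_bytes))  # 9
```

The ratio depends heavily on the data; long text strings, for instance, cost nearly the same in both formats.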

Encoding Specification

Data Item Structure

Every CBOR data item begins with an initial byte that encodes both the major type and the additional information in a compact 8-bit value, where the high-order 3 bits represent the major type (ranging from 0 to 7) and the low-order 5 bits provide the additional information. This structure allows for efficient serialization by combining type identification and basic length or value hints in a single byte.

The overall structure of a CBOR data item consists of a head, which includes the initial byte plus any additional bytes needed to fully specify the argument (such as a length), followed by the content or payload bytes if applicable. CBOR supports both definite-length items, where the length is explicitly provided in the head, and indefinite-length items (arrays, maps, byte strings, and text strings), which allow streaming by deferring the total length determination. Indefinite-length items are delimited by a break stop byte (0xFF), which signals the end of the item without requiring prior knowledge of its size. A break is only permitted in that position; it must not appear anywhere else a data item is expected, which keeps parsing unambiguous.

For error handling, CBOR data is considered not well-formed if an unexpected break stop code appears or if lengths are incomplete or invalid, in which case decoders are required to fail rather than produce a data item.

In terms of byte layout, a small unsigned integer like 10 can be encoded in a single byte (0x0A), where the major type 0 and the value 10 are combined directly in the initial byte. For larger values, such as 500, the encoding uses multiple bytes: the initial byte (0x19) indicates major type 0 with additional information 25 (signifying that a 2-byte unsigned integer follows), followed by the two bytes 0x01 0xF4. This prefix mechanism lets the structure scale while remaining compact.
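The split of the initial byte into its two fields can be sketched in a few lines. The function name `split_initial_byte` is hypothetical and used here only for illustration.

```python
from typing import Tuple


def split_initial_byte(b: int) -> Tuple[int, int]:
    """Return (major_type, additional_info) for one initial byte."""
    if not 0 <= b <= 0xFF:
        raise ValueError("initial byte must fit in 8 bits")
    major_type = b >> 5         # high-order 3 bits
    additional_info = b & 0x1F  # low-order 5 bits
    return major_type, additional_info


print(split_initial_byte(0x0A))  # (0, 10): unsigned integer 10
print(split_initial_byte(0x19))  # (0, 25): unsigned integer, 2-byte argument follows
print(split_initial_byte(0xFF))  # (7, 31): the break stop code
```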

Field Encoding Methods

In CBOR, the additional information field within the initial byte of a data item encodes an unsigned argument that specifies details such as the value of a small integer or the length of the subsequent content. This argument is represented using one of several methods, prioritizing compactness by embedding small values directly and extending to additional bytes for larger ones. The methods ensure efficient serialization while maintaining a fixed structure for decoding.

The primary method, referred to here as tiny encoding, applies to argument values from 0 to 23. The 5-bit additional information field directly holds the value without requiring extra bytes, resulting in a single-byte item head. This approach minimizes overhead for common short lengths in arrays or strings and for small integer values. For example, an argument of 5 uses additional information 5, forming a head byte where the major type bits precede the literal 5.

For larger arguments, short encoding extends the initial byte with fixed-length additional bytes, using additional information values 24 through 27 to indicate the byte count. Specifically, additional information 24 is followed by one byte (an 8-bit unsigned integer), supporting values up to 255; 25 adds two bytes (16 bits, up to 65,535); 26 adds four bytes (32 bits, up to 4,294,967,295); and 27 adds eight bytes (64 bits, up to 18,446,744,073,709,551,615). These encodings allow the argument to reach 2^64-1, covering practical needs for lengths and integer values without variable-length overhead. Values 28 through 30 are reserved, and their use makes the encoding not well-formed under the current specification.

Indefinite-length encoding, signaled by additional information 31, omits an argument value entirely and is used for streaming or variable-size items: byte strings, text strings, arrays, and maps. Such items begin with a head byte combining the major type with 31 and end with a break stop code (major type 7, additional information 31), allowing content to be appended dynamically without a predefined length. This method supports applications requiring partial or incremental processing.

CBOR's preferred serialization uses the shortest possible encoding for any argument to optimize size; for instance, a value of 100 is encoded as additional information 24 followed by the byte 100 (0x64), not as a longer form such as 25 with a leading zero byte. Longer forms are still well-formed and must be accepted by generic decoders, but deterministic encoding prohibits them to ensure that each value has exactly one representation. All multi-byte arguments use big-endian byte order, with the most significant byte first.

To reconstruct the argument, decoders interpret the bytes directly as an unsigned integer: for tiny encoding, the argument equals the additional information; for short encodings, it equals the big-endian value of the following bytes. No variable-length "long" encoding is defined for the argument itself, as the fixed short forms suffice up to 64 bits.
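| Additional Information | Bytes Following | Argument Value Range | Total Bits (5 + 8 × bytes following) |
|---|---|---|---|
| 0–23 | 0 | 0–23 | 5 |
| 24 | 1 | 0–255 | 13 |
| 25 | 2 | 0–65,535 | 21 |
| 26 | 4 | 0–2^32-1 | 37 |
| 27 | 8 | 0–2^64-1 | 69 |
| 31 | 0 | Indefinite | N/A |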
These methods apply uniformly to encode lengths for major types like arrays or strings, ensuring consistent parsing across CBOR implementations.
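The argument encoding described above can be sketched as a pair of helper functions. This is a minimal illustration, assuming shortest-form output; the names `encode_head` and `decode_head` are hypothetical, and the decoder does not check whether additional information 31 is legal for the given major type.

```python
from typing import Optional, Tuple


def encode_head(major_type: int, argument: int) -> bytes:
    """Encode a head, choosing the shortest form for the argument."""
    if not 0 <= major_type <= 7:
        raise ValueError("major type must be 0..7")
    if not 0 <= argument < (1 << 64):
        raise ValueError("argument must fit in 64 bits")
    prefix = major_type << 5
    if argument < 24:
        return bytes([prefix | argument])  # value sits in the initial byte
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < (1 << (8 * size)):
            return bytes([prefix | info]) + argument.to_bytes(size, "big")
    raise AssertionError("unreachable: argument was range-checked above")


def decode_head(data: bytes) -> Tuple[int, Optional[int], int]:
    """Return (major_type, argument, head_length); argument is None for 31."""
    if not data:
        raise ValueError("empty input")
    major_type, info = data[0] >> 5, data[0] & 0x1F
    if info < 24:
        return major_type, info, 1
    if info == 31:
        return major_type, None, 1  # indefinite length or break
    if info > 27:
        raise ValueError("additional information 28..30 is reserved")
    size = 1 << (info - 24)  # 1, 2, 4, or 8 bytes
    if len(data) < 1 + size:
        raise ValueError("truncated head")
    return major_type, int.from_bytes(data[1:1 + size], "big"), 1 + size


print(encode_head(0, 100).hex())            # 1864
print(encode_head(0, 500).hex())            # 1901f4
print(decode_head(bytes.fromhex("1901f4")))  # (0, 500, 3)
```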

Major Types Summary

CBOR defines eight major types, each identified by a 3-bit major type value in the initial byte of a data item. These types provide the foundational structure for representing diverse forms of data in a compact binary format, drawing inspiration from JSON while extending to binary and semantic elements. The major types are encoded using an "additional information" field that specifies length or value details, allowing for efficient variable-length representations. The following table summarizes the major types, their identifiers, and high-level roles:

| Major Type | Identifier (Binary Prefix) | Description and Role |
|---|---|---|
| 0 | 0b000 | Unsigned integers, ranging from 0 to 2^64-1; used for non-negative whole numbers. |
| 1 | 0b001 | Negative integers, ranging from -2^64 to -1, encoded as -1 minus the value (that is, the absolute value minus 1); used for negative whole numbers. |
| 2 | 0b010 | Byte strings, representing arbitrary binary data of specified length; suitable for non-textual octet sequences and supports indefinite lengths. |
| 3 | 0b011 | Text strings, consisting of UTF-8 encoded Unicode characters of specified length in bytes; intended for human-readable text and supports indefinite lengths. |
| 4 | 0b100 | Arrays, denoting ordered sequences of data items with a specified or indefinite number of elements; used to group homogeneous or heterogeneous items. |
| 5 | 0b101 | Maps, representing unordered collections of key-value pairs where the number of pairs is specified or indefinite; employed for associative data structures. |
| 6 | 0b110 | Semantic tags, pairing an integer tag number with an enclosed data item to impart additional meaning; enables extension of the data model without altering core types. |
| 7 | 0b111 | Simple values and floating-point numbers, including false, true, null, and undefined (additional information 20 to 23), a simple value in the following byte (24), half-precision (25), single-precision (26), and double-precision (27) floats; values 28 to 30 are reserved, and 31 is the "break" stop code for indefinite-length items. |

Indefinite lengths, indicated by an additional information value of 31, apply to major types 2 through 5, allowing streaming of content terminated by the break code from type 7. Type 7 itself has no length argument: simple values carry no content, and floating-point values are followed by a fixed number of bytes.

Data Types

Integers (Types 0 and 1)

In CBOR, integers are represented using major types 0 and 1, providing a compact encoding for both non-negative and negative values with arguments of up to 64 bits. These types use the general field encoding structure, where the first 3 bits indicate the major type and the remaining 5 bits form the additional information, which either directly specifies small values or indicates the length of subsequent bytes for larger ones. This design keeps byte usage minimal, making integers efficient for constrained environments.

Unsigned integers (major type 0) cover non-negative values from 0 to 2^64 - 1 (18,446,744,073,709,551,615). The additional information represents the value directly for small integers (0 to 23) in a single byte, or specifies the byte length (1, 2, 4, or 8 bytes) for larger values, always in big-endian order. For example, the value 23 is encoded as the single byte 0x17 (binary 000_10111, where the first 3 bits are 000 for type 0 and the next 5 bits are 10111 for 23), while 24 is encoded as 0x1818 (type 0 with additional information 24 indicating that 1 byte follows, containing the value 24). With this minimal-length encoding, values up to 255 use 2 bytes in total, values up to 65,535 use 3 bytes, and so on, up to 9 bytes for the full range. Major types 0 and 1 do not support integers beyond 64 bits; larger values are represented with the bignum tags (tags 2 and 3).

Negative integers (major type 1) represent values from -1 down to -2^64 (-18,446,744,073,709,551,616). A negative integer n (where n < 0) is encoded with major type 1 and an argument v = -1 - n, and the encoded bytes for v follow the same length rules as type 0. For decoding, the value is computed as n = -1 - v, or equivalently n = -(v + 1), where v is the unsigned integer value taken from the additional information and following bytes.

For instance, -1 is encoded as 0x20 (type 1 with additional information 0, decoding to -1 - 0 = -1), and -24 as 0x37 (type 1 with additional information 23, decoding to -1 - 23 = -24). Larger negative values, such as -100, use 0x3863 (type 1 with a 1-byte argument containing 99, decoding to -1 - 99 = -100). Like unsigned integers, negative integers are limited to 64-bit arguments and use the minimal number of bytes.

For interoperability, CBOR recommends preferred serialization, which always uses the shortest possible encoding for integers and avoids padded representations (e.g., encoding 23 as 0x17 rather than 0x190017). This preferred serialization is particularly important in applications like digital signing, where deterministic encoding ensures consistent byte streams across implementations. Decoders must support all well-formed encodings but may reject non-minimal ones in strict modes.
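The integer rules above can be sketched as follows. This is a minimal illustration under the assumption that only major types 0 and 1 are handled (no bignum tags); `encode_int` and `decode_int` are hypothetical names.

```python
def encode_int(n: int) -> bytes:
    """Encode an integer in the range -2**64 .. 2**64 - 1, shortest form."""
    major_type, argument = (0, n) if n >= 0 else (1, -1 - n)
    if argument >= (1 << 64):
        raise OverflowError("outside the range of major types 0 and 1")
    prefix = major_type << 5
    if argument < 24:
        return bytes([prefix | argument])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < (1 << (8 * size)):
            return bytes([prefix | info]) + argument.to_bytes(size, "big")
    raise AssertionError("unreachable: argument was range-checked above")


def decode_int(data: bytes) -> int:
    """Decode one integer item from the start of data."""
    major_type, info = data[0] >> 5, data[0] & 0x1F
    if major_type not in (0, 1):
        raise ValueError("not an integer item")
    if info < 24:
        argument = info
    elif info <= 27:
        size = 1 << (info - 24)
        if len(data) < 1 + size:
            raise ValueError("truncated integer")
        argument = int.from_bytes(data[1:1 + size], "big")
    else:
        raise ValueError("not well-formed for an integer")
    # Major type 1 stores v = -1 - n, so n = -1 - v.
    return argument if major_type == 0 else -1 - argument


for n in (23, 24, -1, -24, -100):
    print(n, encode_int(n).hex())
# 23 17, 24 1818, -1 20, -24 37, -100 3863
```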

Strings (Types 2 and 3)

In CBOR, strings are represented using major types 2 and 3 to encode binary and textual data, respectively, allowing for efficient serialization of arbitrary octet sequences or text. These types support both definite and indefinite length encodings, where the length is specified in bytes rather than characters or code points.

Byte strings, encoded with major type 2 (initial byte values 0x40 to 0x5b for definite lengths), represent arbitrary sequences of zero or more bytes. The additional information in the initial byte determines the length: values 0x40 to 0x57 indicate lengths from 0 to 23 bytes directly, while 0x58, 0x59, 0x5a, and 0x5b precede 1-byte, 2-byte, 4-byte, or 8-byte unsigned integers for longer lengths up to 2^64-1 bytes. The payload follows immediately as the exact number of bytes specified, with no further interpretation. For indefinite-length byte strings, the initial byte is 0x5f, followed by zero or more definite-length byte string chunks (all of major type 2), and terminated by a break symbol (0xFF). This form is particularly useful for streaming applications where the total length is unknown in advance.

Text strings, encoded with major type 3 (initial byte values 0x60 to 0x7b for definite lengths), represent text as well-formed UTF-8 encoded byte sequences, excluding any byte order mark (BOM). The length determination follows the same structure as byte strings, but the payload must consist of valid UTF-8 bytes, with the length counting bytes rather than Unicode code points. Indefinite-length text strings use initial byte 0x7f, followed by definite-length text string chunks (all of major type 3) that must not split multi-byte code points across chunks, and end with the break symbol (0xFF). A validating decoder checks that the entire string is well-formed per RFC 3629; ill-formed sequences render the item invalid, though handling of such errors is application-defined (e.g., rejection or substitution).

For canonicalization in diagnostic or deterministic contexts, text strings should use the shortest possible UTF-8 encoding for each code point, and indefinite-length forms are typically avoided except for streaming scenarios. Byte strings have no such normalization requirements, as they are opaque octet sequences.

Examples illustrate these encodings. An empty byte string is encoded as 0x40. A byte string containing the four bytes 01 02 03 04 is 0x4401020304, where 0x44 indicates major type 2 with a 4-byte length. For text strings, an empty string is 0x60, and the string "IETF" (UTF-8 bytes 49 45 54 46) is 0x6449455446. An indefinite-length text string for "streaming" might be 0x7f657374726561646d696e67ff, comprising the chunks "strea" (0x657374726561) and "ming" (0x646d696e67) followed by the break.
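The string encodings can be reproduced with a short sketch. The function names are hypothetical, and the indefinite-length helper assumes the caller supplies chunks as Python strings, so each chunk is encoded as whole code points and never splits a multi-byte sequence.

```python
from typing import Iterable


def encode_head(major_type: int, argument: int) -> bytes:
    """Shortest-form head for a major type and a length argument."""
    prefix = major_type << 5
    if argument < 24:
        return bytes([prefix | argument])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < (1 << (8 * size)):
            return bytes([prefix | info]) + argument.to_bytes(size, "big")
    raise OverflowError("length does not fit in 64 bits")


def encode_bytes(payload: bytes) -> bytes:
    """Definite-length byte string (major type 2)."""
    return encode_head(2, len(payload)) + payload


def encode_text(text: str) -> bytes:
    """Definite-length text string (major type 3); length counts UTF-8 bytes."""
    payload = text.encode("utf-8")
    return encode_head(3, len(payload)) + payload


def encode_text_indefinite(chunks: Iterable[str]) -> bytes:
    """Indefinite-length text string: 0x7f, definite chunks, then break."""
    return b"\x7f" + b"".join(encode_text(c) for c in chunks) + b"\xff"


print(encode_bytes(bytes([1, 2, 3, 4])).hex())            # 4401020304
print(encode_text("IETF").hex())                          # 6449455446
print(encode_text_indefinite(["strea", "ming"]).hex())    # 7f657374726561646d696e67ff
```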

Arrays and Maps (Types 4 and 5)

In CBOR, major type 4 represents arrays, which are ordered sequences of zero or more data items of any type. The encoding begins with an initial byte where the three most significant bits are 100, and the five least significant bits form the additional information specifying the number of data items (N) in the array. For definite-length arrays, if N is between 0 and 23, the additional information directly encodes N; for larger values, it indicates the length of the subsequent unsigned integer encoding N (using 1, 2, 4, or 8 bytes for additional information values 24 through 27, respectively). This is followed by exactly N encoded data items, which may include nested arrays or other types.

For indefinite-length arrays, the additional information is set to 31 (initial byte 0x9f), allowing the array to be streamed without knowing N in advance. In this case, items are encoded sequentially until terminated by a break stop code (0xff), which signals the end of the array without counting elements. Nesting is supported in both definite and indefinite forms, with each level independently terminated in indefinite cases, though practical implementations impose depth limits based on stack size or memory constraints. The total size in bytes for an array is the size of the head (initial byte plus any length-encoding bytes) plus the sum of the sizes of its items.

Major type 5 encodes maps, which are unordered collections of key-value pairs where keys are distinct data items and values are arbitrary data items. The initial byte has the three most significant bits as 101, with the additional information specifying the number of pairs (M) in a manner identical to arrays: directly for 0 to 23, or followed by 1, 2, 4, or 8 bytes for larger M. This is followed by 2M data items, alternating as key1, value1, ..., keyM, valueM, with no semantic meaning attached to the order of pairs. A map with duplicate keys is well-formed but not valid; decoders may reject it or treat its content as indeterminate. Keys can be of any type, though text strings are common for JSON compatibility.

Indefinite-length maps use additional information 31 (initial byte 0xbf) and consist of alternating keys and values until a break stop code (0xff); a break following a key without its value is not well-formed. Like arrays, nesting is permitted within keys or values, subject to practical depth limits. The total byte size follows the same formula as arrays: head (plus length bytes) plus the sum of all key and value item sizes. For deterministic encoding, maps must use definite lengths, and pairs must be ordered by the bytewise lexicographic sort of their encoded keys to ensure consistent representations across encoders.

For example, an empty array encodes as 0x80 (major type 4 with additional information 0). A map with a single pair, such as the key "a" (text string) mapping to the integer 1, encodes in definite length as 0xa1616101, where 0xa1 indicates type 5 with one pair, 0x6161 encodes the one-byte string "a", and 0x01 is the value. Indefinite forms would use 0x9f ... ff for arrays and 0xbf ... ff for maps, with elements or pairs in between.
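A small recursive encoder illustrates how arrays and maps nest and how deterministic key ordering works. This is a sketch under stated assumptions: it handles only Python `int`, `str`, `list`, and `dict` values, always emits definite lengths, and the `deterministic` flag is a hypothetical parameter that sorts pairs by their encoded key bytes.

```python
def encode_head(major_type: int, argument: int) -> bytes:
    """Shortest-form head for a major type and its argument."""
    prefix = major_type << 5
    if argument < 24:
        return bytes([prefix | argument])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < (1 << (8 * size)):
            return bytes([prefix | info]) + argument.to_bytes(size, "big")
    raise OverflowError("argument does not fit in 64 bits")


def encode(obj, deterministic: bool = False) -> bytes:
    """Encode ints, strings, lists, and dicts as definite-length CBOR."""
    if isinstance(obj, bool):
        raise TypeError("booleans are outside the scope of this sketch")
    if isinstance(obj, int):
        return encode_head(0, obj) if obj >= 0 else encode_head(1, -1 - obj)
    if isinstance(obj, str):
        payload = obj.encode("utf-8")
        return encode_head(3, len(payload)) + payload
    if isinstance(obj, list):
        items = b"".join(encode(item, deterministic) for item in obj)
        return encode_head(4, len(obj)) + items
    if isinstance(obj, dict):
        pairs = [(encode(k, deterministic), encode(v, deterministic))
                 for k, v in obj.items()]
        if deterministic:
            # Bytewise lexicographic order of the *encoded* keys.
            pairs.sort(key=lambda pair: pair[0])
        return encode_head(5, len(pairs)) + b"".join(k + v for k, v in pairs)
    raise TypeError(f"unsupported type: {type(obj).__name__}")


print(encode([]).hex())                                   # 80
print(encode({"a": 1}).hex())                             # a1616101
print(encode([1, [2, 3]]).hex())                          # 8201820203
print(encode({"aa": 1, "b": 2}, deterministic=True).hex())  # a261620262616101
```

In the last example the key "b" sorts before "aa" because its encoded form (0x6162) compares lower than 0x626161: the length in the head byte is part of the comparison.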

Semantic Tags (Type 6)

Semantic tags in CBOR are encoded using major type 6. A tag consists of a head with major type 6 (binary 110) whose argument is a non-negative tag number (ranging from 0 to 2^64-1), encoded using the standard field encoding for unsigned integers, followed by the tagged item itself, which can be any valid CBOR data item. This structure allows the tag to wrap a single enclosed data item without modifying the underlying CBOR format, providing a mechanism to attach interpretive semantics to the content.

The purpose of these tags is to convey additional meaning to the enclosed data, such as interpreting a byte string as a bignum, a text string as a URI, or an array as a decimal fraction, while leaving the core encoding intact. Tags are optional, and decoders that do not recognize a tag may ignore it and treat the item as untagged. Nesting of tags is permitted, enabling multiple layers of semantics where an outer tag applies to the result of an inner tagged item; for instance, a bignum (tag 2) could itself be tagged with a higher-level semantic. The tag number itself is always encoded with a definite length, but the enclosed item may use indefinite-length encoding if it is a construct like an indefinite-length array or map.

For example, tag 1 (epoch-based date/time) might enclose an unsigned integer representing seconds since the epoch, encoded as 0xc1 (tag number 1 in a single initial byte) followed by 0x1a 47d60b80 (a 32-bit unsigned integer for the value), while tag 4 encloses an array of two integers to represent a decimal fraction (mantissa × 10^exponent).

In terms of comparison and equivalence, semantic tags are generally ignored during value comparisons unless the specific tag semantics define otherwise; for instance, two items are considered equal if they are both untagged or both carry the same tag number with equal enclosed content, but the presence of unrecognized tags does not affect basic equality. Deterministic encoding rules for CBOR do not mandate including tags unless required by the application, prioritizing the shortest possible representation for the underlying data item. This design lets tags add portability and extensibility without compromising the format's conciseness or compatibility.
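Tagging can be sketched as a head with major type 6 placed in front of an already encoded item. The helper names below are hypothetical; the two sample values, 1(1363896240) and 4([-2, 27315]), are taken from the examples in RFC 8949.

```python
def encode_head(major_type: int, argument: int) -> bytes:
    """Shortest-form head for a major type and its argument."""
    prefix = major_type << 5
    if argument < 24:
        return bytes([prefix | argument])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < (1 << (8 * size)):
            return bytes([prefix | info]) + argument.to_bytes(size, "big")
    raise OverflowError("argument does not fit in 64 bits")


def encode_int(n: int) -> bytes:
    """Integer as major type 0 (n >= 0) or major type 1 (n < 0)."""
    return encode_head(0, n) if n >= 0 else encode_head(1, -1 - n)


def tagged(tag_number: int, encoded_item: bytes) -> bytes:
    """Prefix an already encoded data item with a tag (major type 6)."""
    return encode_head(6, tag_number) + encoded_item


# Tag 1: epoch-based date/time wrapping an unsigned integer.
epoch = tagged(1, encode_int(1363896240))

# Tag 4: decimal fraction [exponent, mantissa] = 27315 x 10^-2 = 273.15.
fraction = tagged(4, encode_head(4, 2) + encode_int(-2) + encode_int(27315))

# Tags nest: tag 55799 wrapped around the tag 1 item.
nested = tagged(55799, epoch)

print(epoch.hex())     # c11a514b67b0
print(fraction.hex())  # c48221196ab3
print(nested.hex())    # d9d9f7c11a514b67b0
```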

Simple and Floating-Point Values (Type 7)

Major type 7 in CBOR is used to encode both simple values and floating-point numbers according to the IEEE 754 standard. These data items are represented with a single leading byte for simple values or a leading byte followed by the appropriate number of bytes for floating-point values, all in big-endian byte order. Unlike major types 2 through 5, type 7 does not support indefinite-length encodings; all items have a definite length determined by the additional information in the head.

The four assigned simple values are encoded as single bytes with no additional data, using additional information values 20 through 23. The value false is encoded as 0xf4, true as 0xf5, null as 0xf6, and undefined as 0xf7. These encodings map directly to their semantic meanings without further interpretation, providing compact representations for common constants in data interchange.

Floating-point values are encoded using additional information values 25, 26, and 27 for half-precision (16 bits), single-precision (32 bits), and double-precision (64 bits), respectively. The leading byte is 0xf9 for half-precision followed by two bytes, 0xfa for single-precision followed by four bytes, and 0xfb for double-precision followed by eight bytes. To decode these, the subsequent bytes are interpreted directly as the corresponding IEEE 754 binary format in big-endian order. For example, the single-precision encoding of 1.0 is 0xfa 3f 80 00 00, where the bytes 3f 80 00 00 represent the IEEE 754 binary32 value.

Special floating-point values are handled per IEEE 754 conventions within these formats. Infinity uses an exponent of all ones and a zero fraction, with the sign bit selecting positive (+∞) or negative (-∞); for instance, half-precision positive infinity is 0xf9 7c 00. NaN (Not a Number) is encoded with the exponent all ones and a non-zero fraction, preserving any payload in the significand bits and allowing the sign bit to be set or cleared as needed. Negative zero (-0.0) is distinguished from positive zero by its sign bit, such as 0xf9 80 00 in half-precision. RFC 8949 clarifies that NaN payloads are retained during encoding and decoding to maintain fidelity, while the sign bit for special values follows IEEE 754 rules without additional restrictions.
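The floating-point encodings can be reproduced with Python's `struct` module, whose `e`, `f`, and `d` format codes correspond to IEEE 754 binary16, binary32, and binary64. The sketch below picks the shortest width that round-trips the value exactly; `encode_float` is a hypothetical name, and mapping every NaN to the half-precision quiet NaN 0xf97e00 is a simplifying assumption that discards NaN payloads.

```python
import math
import struct

# Single-byte encodings of the assigned simple values.
SIMPLE_VALUES = {False: b"\xf4", True: b"\xf5", None: b"\xf6"}


def encode_float(value: float) -> bytes:
    """Encode a float in the shortest IEEE 754 width that preserves it."""
    if math.isnan(value):
        return bytes.fromhex("f97e00")  # assumption: one canonical NaN
    for initial_byte, fmt in ((0xF9, ">e"), (0xFA, ">f")):
        try:
            packed = struct.pack(fmt, value)
        except OverflowError:
            continue  # magnitude too large for this width
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([initial_byte]) + packed
    return b"\xfb" + struct.pack(">d", value)


print(encode_float(1.0).hex())           # f93c00
print(encode_float(100000.0).hex())      # fa47c35000
print(encode_float(1.1).hex())           # fb3ff199999999999a
print(encode_float(float("inf")).hex())  # f97c00
print(encode_float(-0.0).hex())          # f98000
```

A fixed-width encoder would skip the loop and always prepend 0xfa or 0xfb; for example, `b"\xfa" + struct.pack(">f", 1.0)` yields the 0xfa3f800000 form shown above.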

Semantic Tags

Registry Management

The CBOR semantic tags registry is maintained by the Internet Assigned Numbers Authority (IANA) and is accessible at the "CBOR Tags" registry page. This registry serves as the authoritative source for all registered tags, ensuring global uniqueness and interoperability across CBOR implementations.

Assignment policies for new tags are defined in RFC 8949, which obsoletes the initial policies in RFC 7049, and follow the general guidelines in RFC 8126 for IANA registries. Tags in the range 0–23 require Standards Action, typically through an IETF standards process. Tags in the range 24–32767 fall under Specification Required, involving review by designated experts (Christian Amsüss, Thomas Fossati, Francesca Palombini, or Carsten Bormann as backup) to verify the accompanying specification. The range 32768–18,446,744,073,709,551,615 is allocated on a First Come First Served basis, with registrations needing a point of contact but no formal specification. All registered tags are global in scope and must specify the type(s) of data items they tag, along with their semantics, to enable consistent interpretation.

The registry was initially established in RFC 7049 in 2013, which defined the first set of tags and basic allocation rules. It was refined in RFC 8949 in 2020 to address errata, improve clarity, and update policies for better extensibility. Later additions include tags 64–79 for typed arrays of integers, registered via RFC 8746 in 2020, which supports efficient encoding of numeric arrays in constrained environments. Subsequent RFCs like RFC 9164 (2021) and RFC 9090 (2021) have added tags for IP addresses/prefixes and object identifiers, respectively. Some tags, such as 260 and 261, have been deprecated per RFC 9164. The registry continues to grow through ongoing IETF contributions, with the last update as of November 2025.

RFC 8949 does not set aside a dedicated private-use range in the tag number space; implementers who need their own tags are expected to register them, typically in the First Come First Served range, and to document them for interoperability. Errata and clarifications are tracked through the IETF's RFC errata system.

Common Tags and Examples

Semantic tags in CBOR extend the basic data model by associating specific meanings with tagged items, enabling representations of complex concepts like dates, identifiers, and high-precision numbers. The most frequently used tags are defined in the initial registry established by RFC 7049 and carried forward in RFC 8949, with additional common ones added in subsequent specifications such as RFC 8746 for typed arrays. These tags are encoded as major type 6 items, where the tag number precedes the content item, and their semantics are standardized to ensure interoperability across implementations.

Tag 1 represents an epoch-based date/time as seconds since 1970-01-01T00:00:00Z, tagging an integer for whole seconds or a floating-point number for subsecond precision. For example, the instant 2025-11-10T00:00:00Z encodes as tag 1 applied to the unsigned integer 1762732800, yielding the diagnostic notation 1(1762732800). Subsecond resolution is available when the tagged item is a floating-point number.

Tag 24 denotes embedded CBOR: it tags a byte string whose content is itself a complete encoded CBOR data item. This allows a nested item to be carried opaquely and decoded later, which facilitates modular data composition, such as embedding diagnostic information or separately produced structures within a larger CBOR object.

Tags 4 and 5 provide representations for decimal fractions and bigfloats. Tag 4 specifies a decimal fraction in the form of an array [exponent, mantissa], where the value is mantissa × 10^exponent, with both elements being integers (the mantissa may be negative). Tag 5 similarly defines a binary bigfloat as mantissa × 2^exponent. These tags are useful for applications requiring exact decimal values or precision beyond the native CBOR floating-point types.

Tag 32 identifies a URI, tagging a text string that contains a URI per RFC 3986, allowing web addresses and resource locators to be marked as such. For instance, the URI "https://example.com" would be tagged as 32("https://example.com").

Tag 37 denotes a UUID (Universally Unique Identifier), tagging a 16-byte byte string as defined in RFC 4122, providing a standardized way to embed unique identifiers in CBOR data. This is commonly used in distributed systems for entity referencing without the overhead of the textual form.

Tag 55799 marks self-described CBOR. It can wrap any data item and adds no semantics of its own; its encoding, 0xd9d9f7, serves as a recognizable prefix (a magic number) that lets tools identify the enclosed content as CBOR.

Tags 64 through 79, introduced in RFC 8746, represent typed arrays of integers encoded as byte strings, mirroring JavaScript TypedArray semantics for efficient numeric handling. For example, tag 64 tags a byte string as a uint8 array (unsigned 8-bit integers), tag 72 as a sint8 array (signed 8-bit integers), and tags like 66 (uint32, big-endian) or 70 (uint32, little-endian) specify wider elements and their byte order. These tags are particularly useful for interchange in performance-sensitive environments, such as numerical computations or data streams.
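A few of the tags above can be illustrated by assembling their encodings by hand. This is a sketch with hypothetical helper names; the UUID value is an arbitrary placeholder, and the tag 69 line assumes that tag denotes uint16 little-endian typed arrays, as listed in RFC 8746.

```python
import struct
import uuid


def encode_head(major_type: int, argument: int) -> bytes:
    """Shortest-form head for a major type and its argument."""
    prefix = major_type << 5
    if argument < 24:
        return bytes([prefix | argument])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < (1 << (8 * size)):
            return bytes([prefix | info]) + argument.to_bytes(size, "big")
    raise OverflowError("argument does not fit in 64 bits")


def encode_bytes(payload: bytes) -> bytes:
    return encode_head(2, len(payload)) + payload


def encode_text(text: str) -> bytes:
    payload = text.encode("utf-8")
    return encode_head(3, len(payload)) + payload


def tagged(tag_number: int, encoded_item: bytes) -> bytes:
    return encode_head(6, tag_number) + encoded_item


# Tag 1: 2025-11-10T00:00:00Z as seconds since the epoch.
timestamp = tagged(1, encode_head(0, 1762732800))
# Tag 32: URI carried in a text string.
uri = tagged(32, encode_text("https://example.com"))
# Tag 37: UUID carried as its 16 raw bytes (placeholder value).
ident = tagged(37, encode_bytes(uuid.UUID("00112233-4455-6677-8899-aabbccddeeff").bytes))
# Tag 64: uint8 typed array; tag 69: uint16 little-endian typed array.
u8_array = tagged(64, encode_bytes(bytes([1, 2, 3])))
u16le_array = tagged(69, encode_bytes(struct.pack("<3H", 1, 2, 3)))

print(timestamp.hex())    # c11a69112b00
print(u8_array.hex())     # d84043010203
print(u16le_array.hex())  # d84546010002000300
```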

Applications

Cryptographic Uses

CBOR plays a central role in cryptographic protocols through the CBOR Object Signing and Encryption (COSE) framework, which defines structures for signatures, message authentication codes, and encryption using CBOR's binary encoding for compactness and efficiency. Originally specified in RFC 8152, COSE was updated in RFC 9052 (published 2022) to refine message structures and CBOR integration, while RFC 9053 details initial algorithms such as ECDSA, EdDSA, HMAC, and AES-based encryption that rely on CBOR for headers and payloads. This enables secure object handling in resource-constrained settings, where COSE structures such as signed objects are encoded as CBOR arrays consisting of a protected header (a CBOR map of authenticated parameters, serialized and wrapped in a byte string), an unprotected header (a CBOR map of non-authenticated parameters), the payload (a byte string, or nil when the content is carried separately), and the signature (a byte string).
Another key application is in token formats, exemplified by CBOR Web Tokens (CWT) as defined in RFC 8392, which provide a compact binary alternative to JSON Web Tokens (JWT) by representing claims in a CBOR map with integer or string keys (e.g., key 1 for issuer, key 2 for subject). CWTs are secured using COSE for signing or encryption. The claims themselves avoid additional CBOR tags to keep the format simple, though the overall CWT may be wrapped in tag 61 for identification. This design ensures interoperability while reducing overhead compared to text-based JWTs.
CBOR's advantages in cryptography stem from its small message size and its support for deterministic encoding, which produces identical byte sequences for equivalent data items, a property that is essential for consistent hashing and signature verification. For instance, a simple COSE_Sign1 object (the single-signature variant) follows the array format [protected, unprotected, payload, signature], where the protected header might encode algorithm and key identifiers in a compact CBOR map.
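The COSE_Sign1 layout described above can be sketched by hand in Python. This is only a structural illustration: the helper names are hypothetical, the key identifier and payload are made-up sample values, and the signature is a zero-filled 64-byte placeholder because no cryptography is performed. Header label 1 (alg) with value -7 (ES256) and header label 4 (kid) follow the COSE registries; tag 18 identifies COSE_Sign1.

```python
def head(major_type: int, argument: int) -> bytes:
    """CBOR initial byte plus the shortest argument encoding."""
    if argument < 24:
        return bytes([(major_type << 5) | argument])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([(major_type << 5) | additional_info]) + argument.to_bytes(size, "big")
    raise ValueError("argument too large")


def cbor_int(value: int) -> bytes:
    return head(0, value) if value >= 0 else head(1, -1 - value)


def bstr(data: bytes) -> bytes:
    return head(2, len(data)) + data


def tstr(text: str) -> bytes:
    data = text.encode("utf-8")
    return head(3, len(data)) + data


def array(items: list[bytes]) -> bytes:
    return head(4, len(items)) + b"".join(items)


def cbor_map(pairs: list[tuple[bytes, bytes]]) -> bytes:
    return head(5, len(pairs)) + b"".join(key + value for key, value in pairs)


ALG, KID = 1, 4   # COSE header parameter labels
ES256 = -7        # ECDSA with SHA-256

protected = cbor_map([(cbor_int(ALG), cbor_int(ES256))])    # {1: -7}
unprotected = cbor_map([(cbor_int(KID), bstr(b"11"))])      # {4: h'3131'}, sample key id
payload = b"This is the content."                           # sample payload

# Bytes a signer would actually sign: the Sig_structure with empty external data.
sig_structure = array([tstr("Signature1"), bstr(protected), bstr(b""), bstr(payload)])

# Placeholder only: a real implementation signs `sig_structure` with a private key.
placeholder_signature = bytes(64)

cose_sign1 = head(6, 18) + array([
    bstr(protected),            # protected header, wrapped in a byte string
    unprotected,                # unprotected header, a plain map
    bstr(payload),
    bstr(placeholder_signature),
])

print(cose_sign1[:12].hex())    # d28443a10126a10442313154
```

Wrapping the protected header in a byte string means the exact serialized bytes are what gets authenticated, so a verifier never has to re-encode the map.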
RFC 8949 (the current CBOR standard, published December 2020) further strengthens cryptographic determinism by specifying rules for encoding floating-point numbers (using the shortest form that preserves the value, such as binary16 or binary32) and for handling NaN values (shortening a NaN when zero-padding the shorter form reproduces the original), preventing variations that could undermine security processes like digital signatures. These rules ensure reproducible encodings across implementations, making CBOR suitable for high-stakes cryptographic applications.
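The shortest-form rule for floating-point values can be sketched in a few lines of Python. The function name `encode_float_shortest` is hypothetical, and the sketch simplifies NaN handling by always emitting the single half-precision NaN 0xf97e00, which discards any NaN payload.

```python
import math
import struct


def encode_float_shortest(value: float) -> bytes:
    """Encode a float as binary16, binary32, or binary64 (major type 7),
    picking the shortest width that round-trips to exactly the same value."""
    if math.isnan(value):
        # Simplifying assumption: collapse every NaN to one canonical encoding.
        return b"\xf9\x7e\x00"
    for initial_byte, fmt in ((0xF9, ">e"), (0xFA, ">f")):
        try:
            packed = struct.pack(fmt, value)
        except OverflowError:
            continue  # magnitude too large for this width, try the next one
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([initial_byte]) + packed
    return b"\xfb" + struct.pack(">d", value)


print(encode_float_shortest(1.5).hex())       # f93e00
print(encode_float_shortest(100000.0).hex())  # fa47c35000
print(encode_float_shortest(1.1).hex())       # fb3ff199999999999a
```

Because every conforming encoder would pick the same width for a given value, two parties hashing or signing the same data arrive at the same bytes.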

Constrained Environments and IoT

CBOR's design emphasizes compactness and simplicity, making it particularly suitable for IoT applications, where devices often operate under severe resource constraints such as limited memory, processing power, and bandwidth. In sensor networks, for instance, CBOR enables efficient serialization of measurement data with low overhead. The Sensor Measurement Lists (SenML) specification, defined in RFC 8428, uses CBOR to represent simple sensor readings and device parameters, replacing JSON field names with integer map labels and carrying binary data values as byte strings rather than base64url text, while units are encoded as text strings (e.g., "Cel" for degrees Celsius). This approach allows devices to transmit structured data such as temperature, humidity, or location readings in a format that minimizes transmission costs while preserving extensibility.
On constrained devices, CBOR implementations are notably lightweight, facilitating deployment on microcontrollers with minimal resources. A full CBOR decoder can be as small as 880 bytes of ARM code, well under 10 KB, enabling real-time processing without straining limited RAM or flash storage. In protocols like the Constrained Application Protocol (CoAP), defined in RFC 7252, CBOR serves as an efficient payload format for resource representations, allowing constrained nodes to exchange data over UDP with less overhead than text-based alternatives. CoAP also supports block-wise transfers for larger payloads, as extended in RFC 9177, which helps maintain reliability in the lossy networks typical of IoT deployments.
To enhance interoperability in such environments, CBOR-based protocols can use the Concise Data Definition Language (CDDL), specified in RFC 8610, to describe their data structures. CDDL provides a human-readable notation for defining expected CBOR structures, aiding validation and generation of messages without requiring verbose grammars.
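To make the SenML mapping mentioned earlier in this section concrete, the following Python sketch converts a SenML record from its JSON field names to the integer labels used in the CBOR representation and then encodes it. The label table reflects RFC 8428; the encoder itself, its function names, and the sample record are illustrative assumptions, and floats are always written as 64-bit values for simplicity.

```python
import struct

# Integer labels that replace JSON field names in SenML's CBOR form (RFC 8428).
SENML_CBOR_LABELS = {
    "bver": -1, "bn": -2, "bt": -3, "bu": -4, "bv": -5,
    "n": 0, "u": 1, "v": 2, "vs": 3, "vb": 4, "t": 6,
}


def head(major_type: int, argument: int) -> bytes:
    if argument < 24:
        return bytes([(major_type << 5) | argument])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([(major_type << 5) | additional_info]) + argument.to_bytes(size, "big")
    raise ValueError("argument too large")


def encode(item) -> bytes:
    """Minimal recursive encoder covering the types used by the sample record."""
    if isinstance(item, bool):                 # must be checked before int
        return b"\xf5" if item else b"\xf4"
    if isinstance(item, int):
        return head(0, item) if item >= 0 else head(1, -1 - item)
    if isinstance(item, float):
        return b"\xfb" + struct.pack(">d", item)   # simplification: always binary64
    if isinstance(item, str):
        data = item.encode("utf-8")
        return head(3, len(data)) + data
    if isinstance(item, bytes):
        return head(2, len(item)) + item
    if isinstance(item, list):
        return head(4, len(item)) + b"".join(encode(element) for element in item)
    if isinstance(item, dict):
        return head(5, len(item)) + b"".join(encode(k) + encode(v) for k, v in item.items())
    raise TypeError(f"unsupported type: {type(item).__name__}")


def senml_to_cbor(records: list[dict]) -> bytes:
    """Swap JSON field names for integer labels, then encode the pack as CBOR."""
    labeled = [{SENML_CBOR_LABELS[name]: value for name, value in record.items()}
               for record in records]
    return encode(labeled)


pack = [{"n": "urn:dev:ow:10e2073a01080063", "u": "Cel", "v": 23.1}]
encoded = senml_to_cbor(pack)
print(encoded.hex())
```

Each field name shrinks from a quoted string to a single byte, which is where much of the saving over the JSON form comes from.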
RFC 9277 additionally defines stable storage formats for CBOR data items, covering single items as well as CBOR Sequences for streaming multiple items, and uses self-describing tags (e.g., tag 55799) so that stored files can be recognized by tools like the Unix file command, which is useful for logging or archiving data streams.
Practical examples of CBOR in IoT include smart home protocols, where it underpins efficient communication in systems such as CoAP-based gateways for device discovery and control, and firmware updates, which use CBOR's compact encoding to bundle metadata, checksums, and binary images for over-the-air (OTA) delivery on resource-limited hardware. For semantic interoperability, CBOR integrates with linked data through the W3C's JSON-LD 1.1 in CBOR specification (2024 note), which maps JSON-LD's data model to CBOR's binary structure, for example serializing objects as maps (major type 5) and binary values as byte strings (major type 2), to enable lightweight applications.
A key benefit of CBOR in these contexts is its bandwidth efficiency, with payloads often 40-60% smaller than equivalent JSON representations for typical data like arrays or maps, as reported in evaluations of time-series encodings and benchmarks. CBOR payloads are identified by media types (e.g., application/cbor) registered with IANA, allowing seamless integration with HTTP and CoAP while conserving energy on battery-powered devices. Overall, CBOR's adoption in IoT standards underscores its role in enabling scalable, low-power ecosystems.
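The size difference between JSON and CBOR can be checked with a small Python sketch. The encoder below is a minimal, hypothetical one written for this comparison, and the sample reading is invented; actual savings depend heavily on the shape of the data, so the result should not be read as a benchmark.

```python
import json
import struct


def head(major_type: int, argument: int) -> bytes:
    if argument < 24:
        return bytes([(major_type << 5) | argument])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([(major_type << 5) | additional_info]) + argument.to_bytes(size, "big")
    raise ValueError("argument too large")


def encode(item) -> bytes:
    """Encode JSON-compatible Python values as definite-length CBOR."""
    if item is None:
        return b"\xf6"
    if isinstance(item, bool):                 # must be checked before int
        return b"\xf5" if item else b"\xf4"
    if isinstance(item, int):
        return head(0, item) if item >= 0 else head(1, -1 - item)
    if isinstance(item, float):
        return b"\xfb" + struct.pack(">d", item)
    if isinstance(item, str):
        data = item.encode("utf-8")
        return head(3, len(data)) + data
    if isinstance(item, list):
        return head(4, len(item)) + b"".join(encode(element) for element in item)
    if isinstance(item, dict):
        return head(5, len(item)) + b"".join(encode(k) + encode(v) for k, v in item.items())
    raise TypeError(f"unsupported type: {type(item).__name__}")


# Invented sample reading, used only to compare sizes.
reading = {"device": "sensor-7", "seq": 1024, "readings": [21, 22, 22, 23, 21], "ok": True}

json_bytes = json.dumps(reading, separators=(",", ":")).encode("utf-8")
cbor_bytes = encode(reading)

saving = 1 - len(cbor_bytes) / len(json_bytes)
print(f"JSON: {len(json_bytes)} bytes, CBOR: {len(cbor_bytes)} bytes, saving: {saving:.0%}")
```

Small integers, booleans, and structural delimiters account for most of the difference: each takes a single byte in CBOR, whereas JSON spends extra characters on quotes, commas, colons, and brackets.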
