# Specification: DAG-CBOR

Status: Descriptive - Draft

DAG-CBOR supports the full IPLD Data Model.

DAG-CBOR uses the Concise Binary Object Representation (CBOR) (opens new window) data format, which natively supports all IPLD Data Model Kinds.

# Format

The CBOR IPLD format is called DAG-CBOR to disambiguate it from regular CBOR. Most simple CBOR objects are valid DAG-CBOR. The primary differences are:

  • tag 42 interpreted as CIDs, no other tags are supported
  • maps may only be keyed by strings
  • additional strictness requirements are applied to ensure canonical data encoding forms

As with all IPLD formats, DAG-CBOR must be able to encode Links. In DAG-CBOR, links are the binary form of a [CID] encoded using the raw-binary identity Multibase (opens new window). That is, the Multibase identity prefix (0x00) is prepended to the binary form of a CID and this new byte array is encoded into CBOR as a byte-string (major type 2), with the tag 42.

The inclusion of the Multibase prefix exists for historical reasons and the identity prefix must not be omitted.

# Map Keys

In DAG-CBOR, map keys must be strings, as defined by the IPLD Data Model. Other map keys, such as ints, are not supported and should be rejected when encountered.

# Strictness

DAG-CBOR requires that there exist a single, canonical way of encoding any given object, and that encoded forms contain no superfluous data that may be ignored or lost in a round-trip decode/encode.

Therefore the DAG-CBOR codec must:

  1. Use no tags other than the CID tag (42). A valid DAG-CBOR encoder must not encode using any additional tags and a valid DAG-CBOR decoder must reject objects containing additional tags as invalid.
  2. The only usable major type 7 minor types are those for encoding Floats (25, 26, 27), True (20), False (21) and Null (22).
    • "Simple values" are not supported. This includes all registered or unregistered simple values that are encoded with a major type 7.
    • Undefined (23) is not supported.
  3. Use the canonical CBOR encoding defined by the suggestions in section 3.9 of the CBOR specification (opens new window). A valid DAG-CBOR decoder should reject objects not following these restrictions as invalid. Specifically:
    • Integer encoding must be as short as possible.
    • The expression of lengths in major types 2 through 5 must be as short as possible.
    • The expression of tag numbers (specifically only 42) must be as short as possible for major type 6. Therefore, for valid DAG-CBOR, the only tag token that can appear is 0xd82a - where 0xd8 is "major type 6 with 8-bit integer to follow" and 0x2a is the number 42.
    • The keys in every map must be sorted lowest value to highest. Sorting is performed on the bytes of the representation of the keys.
      • If two keys have different lengths, the shorter one sorts earlier;
      • If two keys have the same length, the one with the lower value in (byte-wise) lexical order sorts earlier.
    • Indefinite-length items are not supported, only definite-length items are usable. This includes strings, bytes, lists and maps. The "break" token is also not supported.
  4. Encode and decode a single top-level CBOR object and not allow back-to-back concatenated objects, as suggested by section 3.1 of the CBOR specification (opens new window) for streaming applications. All bytes of an encoded DAG-CBOR object must decode to a single object. Extraneous bytes, whether valid or invalid CBOR, should fail validation.
  5. Floating point values are always encoded in 64-bit, double-precision form, regardless of whether they can be represented as half (16) or single (32) precision.
  6. IEEE 754 special values NaN, Infinity and -Infinity should not be accepted as they do not appear in the IPLD Data Model. Therefore, tokens 0xf97c00 (Infinity), 0xf97e00 (NaN) and 0xf9fc00 (-Infinity) and their 32-bit and 64-bit variants, should not appear, or be accepted in DAG-CBOR binary form.

# Implementations

# JavaScript

dag-cbor (opens new window), used by ipld (opens new window) and @ipld/block (opens new window) adheres to this specification, with the following caveats:

  • Strictness is not yet enforced on decode, blocks encoded that don't follow the strictness rules are not rejected
  • Floating point values are encoded as their smallest form rather than always double-precision.
  • Many additional object types outside of the Data Model are currently accepted for encoding.
  • IEEE 754 special values NaN, Infinity and -Infinity are accepted for decode and encode.

# Go

ipld-cbor (opens new window) and ipld-prime (opens new window) adhere to this specification, with the following caveats:

  • Strictness is not yet enforced on decode, blocks encoded that don't follow the strictness rules are not rejected
  • IEEE 754 special values NaN, Infinity and -Infinity are accepted for decode and encode.

# Java

java ipld from Peergos (opens new window) adhere to this specification, with the following caveats:

  • Strictness is not yet enforced on decode, blocks encoded that don't follow the strictness rules are not rejected
  • Floats are disabled

# Limitations

# JavaScript

Users of DAG-CBOR that expect their data may be consumed or produced by JavaScript at some point should be aware of limitations that the language imposes on its use of DAG-CBOR, specifically concerning numbers.

All JavaScript numbers, both floating point and integer, (using the Number (opens new window) primitive wrapper) are represented internally as 64-bit IEEE 754 (opens new window) floating-point values (i.e. double-precision). Some implications within JavaScript of this design choice are:

  • There is no clear differentiation between a pure integer type and a floating-point number where a developer may wish to have such a differentiation.
  • By convention, JavaScript engines and developers usually omit the decimal point when representing whole numbers, simulating integers where the number is not actually stored as an integer.
  • There are limits on maximum and minimum safe integer sizes representable in JavaScript that are more constrained than those of languages where there are 64-bit integer types. Numbers outside of the range of Number.MAX_SAFE_INTEGER (253- 1) and Number.MIN_SAFE_INTEGER (-(253- 1)) cannot be safely manipulated or inspected as they incur rounding effects imposed by the IEEE 754 representation.
  • Native bit-wise operations on "integers" are not able to be performed outside of the 32-bit range; larger numbers will be truncated.

The current CBOR encoder/decoder used by the primary JavaScript DAG-CBOR implementation uses the bignumber.js (opens new window) library to handle large numbers in some cases, although reliance on its wrapper type is not recommended by DAG-CBOR users.

The implications for DAG-CBOR of these limitaitons are:

  • Any Number serialized by the JavaScript CBOR encoder relies on a whole-number check (e.g. x % 1 === 0) to determine whether it should be encoded as an integer or a float.
  • Any float deserialized by the JavaScript CBOR decoder that does not have a fractional component will be indistinguishable from an integer to a JavaScript program.
  • Any Number greater than Number.MAX_SAFE_INTEGER or less than Number.MIN_SAFE_INTEGER cannot be properly inspected for its whole-number status and is therefore encoded by the JavaScript CBOR encoder as float regardless of whether it is a whole-number or has a fractional component.
  • Any integer deserialized by the JavaScript CBOR decoder greater than Number.MAX_SAFE_INTEGER or less than Number.MIN_SAFE_INTEGER will be returned as a bignumber.js wrapper type, which may be unexpected to users and have unexpected effects on downstream code.

A new BigInt (opens new window) built-in type is currently being adopted across JavaScript engines. Once support is widely available, it is expected that this type will assist with some of these challenges.