# Authoring IPLD Schemas

IPLD Schemas can be represented in a compact, human-friendly DSL. IPLD Schemas can also be naturally represented as an IPLD node graph, typically presented in JSON form. The human-friendly DSL compiles into this IPLD-native format.

# Basics

# Records: type and advanced

IPLD Schemas typically comprise a collection of optionally interdependent types. Each type definition starts with the keyword type at the beginning of a line, followed by the type's name and then its definition. One other style of record may optionally appear within an IPLD Schema: Advanced Data Layouts. These replace the type keyword with advanced and have specific rules about their contents. More on this below.

# Newlines and Whitespace

The DSL treats newlines as significant: they are used to break up records (type and advanced) and the descriptors within records. This is similar to programming languages that replace C-style ; statement terminators with significant newlines.

Multiple newline characters are folded into one during parsing, so newlines may be used for formatting and documentation purposes where appropriate. Records do not need to be separated by a specific number of newlines, although a single blank line is typical.

Whitespace characters (tab and space) are also folded into a single space during parsing, so they may be used for formatting and documentation purposes where appropriate. Most tokens that don't need to be preceded by a newline should be separated by at least one whitespace character. Some tokens don't strictly require any separation (e.g. {String:Int} for Map definitions, where 5 tokens may be conjoined but may also be separated: { String : Int }). Indenting is not strictly required for record component descriptors, but it is typical because it helps express intent.

type Foo struct {
  a   Int
  b   Int
  msg Message
}

type Message string

In this example:

  • The whitespace between non-punctuation tokens is required (typeMessagestring would be nonsense!)
  • At least one newline between each component of the Foo record is required, so that a, b, and msg are all on separate lines.
  • The indenting for a, b, and msg is optional but helps express the ownership of these items to the parent record.
  • The additional spaces between a, b and their Int type descriptors is optional and used as a formatting nicety to line up the types in a Struct.
  • The blank line between the close of Foo and Message is optional but is intended to help with readability.
  • Newline and space rules for { and } are lax but convention is to use the locations and spacing in this example.
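The folding rules above can be illustrated with a toy tokenizer. This is a minimal Python sketch, not the reference parser: the function name tokenize and the set of punctuation characters treated as standalone tokens are assumptions made for illustration, and quoted strings (such as rename parameters) are not handled.

```python
import re

# Assumed punctuation set: these characters always form their own token,
# so they may be written conjoined with their neighbours ({String:Int}).
_TOKEN_RE = re.compile(r"[{}\[\]:&=|()]|[^\s{}\[\]:&=|()]+")


def tokenize(source: str) -> list[list[str]]:
    """Return one list of tokens per non-blank line of Schema DSL text.

    Runs of spaces and tabs simply separate tokens, and blank lines produce
    no entry, mirroring the folding of whitespace and newlines.
    Quoted strings are NOT handled by this sketch.
    """
    lines = []
    for raw in source.splitlines():
        tokens = _TOKEN_RE.findall(raw)
        if tokens:  # blank (or whitespace-only) lines are folded away
            lines.append(tokens)
    return lines


# Conjoined and spaced-out Map shorthand produce the same five tokens.
assert tokenize("type M {String:Int}") == tokenize("type M { String : Int }")
print(tokenize("type Foo struct {\n  a   Int\n\n  b\tInt\n}"))
# → [['type', 'Foo', 'struct', '{'], ['a', 'Int'], ['b', 'Int'], ['}']]
```

Treating punctuation as standalone tokens is what lets {String:Int} and { String : Int } parse identically, while typeMessagestring remains a single meaningless word.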

# Comments

All characters on a line following a # character are ignored during parsing. This allows for full-line comments and comments trailing Schema DSL tokens:

#
# This is a (pseudo)block comment
#

type Foo struct {
  a Int # An inline comment
  b Int
  msg Message
}

# Another full-line comment
type Message string
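The comment rule can be expressed as a short Python sketch. The function name strip_comments is hypothetical, and the sketch follows the rule exactly as stated above, so a # inside a quoted representation parameter would also be cut off; a real parser may treat quoted strings differently.

```python
def strip_comments(source: str) -> str:
    """Remove everything from the first '#' to the end of each line.

    Simplification: a '#' inside a quoted string is also treated as the
    start of a comment here.
    """
    return "\n".join(line.split("#", 1)[0].rstrip() for line in source.splitlines())


schema = (
    "type Foo struct {\n"
    "  a Int # An inline comment\n"
    "}\n"
    "# Another full-line comment\n"
    "type Message string"
)
print(strip_comments(schema))
```

Full-line comments become blank lines, which the newline-folding rule then discards.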

# Schema Kinds

See IPLD Schema Kinds for more information on this topic.

The schema kinds have matching tokens that appear throughout IPLD Schemas. Depending on context, the tokens are either lower-case (e.g. int) or title-case (e.g. Int), or may be omitted entirely because they can be reliably inferred. This will become clear as we proceed.

  • Null: may appear as a typedef'd null but there is discussion regarding the possibility of changing the semantics of how Null is used in Schemas. It is not commonly useful outside of the nullable signifier for Struct fields.
  • Boolean: may appear as Bool for a component specifier or bool as a typedef.
  • Integer: May appear as Int for a component specifier or int as a typedef. There are no additional specifiers for integer size or signedness (although this may appear as adjuncts for codegen in the future).
  • Float: May appear as Float for a component specifier or float as a typedef. There are no additional specifiers for size or byte representation (although this may appear as adjuncts for codegen in the future).
  • String: May appear as String for a component specifier or string as a typedef. The Data Model assumes unicode. Specific string encodings also appear as representation forms, see below.
  • Bytes: May appear as Bytes for a component specifier or bytes as a typedef. There are no additional specifiers for byte array length and there is no way to specify a single byte. The byteprefix Union representation type is a special case in which a single leading byte dictates the type of the remaining bytes, see below.
  • List: Is inferred by the [Type] shorthand for both typedefs and inline component specification. The token "List" is not used in the Schema DSL and all Lists must have value type specified (although Unions allow for significant flexibility here).
  • Map: Is inferred by the {KeyType:ValueType} shorthand for both typedefs and inline component specification. The token "Map" is not used in the Schema DSL and all Maps must have key and value type specified (although Unions allow for significant flexibility here).
  • Link: The & token prefixing a type is used as a shorthand for links. A generic link to an untyped resource uses the special &Any, while a link where there is an expected type to be found uses that type name as a hinting mechanism, &Foo. See below and Links and IPLD Schemas for more information.
  • Union: Appears as union following type and the Union's type name.
  • Struct: Appears as struct following type and the Struct's type name.
  • Enum: Appears as enum following type and the Enum's type name.
  • Copy: Uses the shorthand = to indicate a copy type, as in type Foo = Bar. The token "Copy" does not directly appear in the Schema DSL.

# Naming Types

Type names must only contain alphanumeric ASCII characters and underscores, and must begin with a letter; by convention, the first character is a capital letter. Multiple connected underscores should be avoided (they should be reserved for codegen purposes). A strict regular expression for type names would be: [a-zA-Z][a-zA-Z0-9_]*. A regular expression following convention would be: [A-Z][a-zA-Z0-9_]* (disregarding the multiple-underscore rule for simplicity).

Camel case with an upper-case first character is recommended, and the underscore _ should be used sparingly: ThisIsRecommended follows the convention, This_Not_So_Much less so, and Thisisnotrecommended and neitherIsThis do not.

Type names are unique within a Schema and are ideally unique within related Schema documents; overlapping names are generally not ideal for documentation purposes. Certain Schema kind identifiers are forbidden as type names, and the related forms that are not forbidden should still be avoided to prevent confusion in documentation. That is, Null, Boolean, Int, Float, String and Bytes are strictly not allowed as type names (they are already implicit type names), while their lower-case counterparts and the names of the additional schema kinds should be avoided.
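The naming rules can be checked mechanically. This Python sketch uses the two regular expressions quoted above and the list of implicit names that are not allowed; the function check_type_name and its result labels are hypothetical, and it cannot judge camel case (Thisisnotrecommended would pass as "ok").

```python
import re

# The two expressions quoted in the text above.
STRICT_TYPE_NAME = re.compile(r"[a-zA-Z][a-zA-Z0-9_]*")
CONVENTIONAL_TYPE_NAME = re.compile(r"[A-Z][a-zA-Z0-9_]*")
# Implicit type names listed above as strictly disallowed.
RESERVED_TYPE_NAMES = {"Null", "Boolean", "Int", "Float", "String", "Bytes"}


def check_type_name(name: str) -> str:
    """Classify a name as 'invalid', 'reserved', 'unconventional' or 'ok'."""
    if not STRICT_TYPE_NAME.fullmatch(name):
        return "invalid"
    if name in RESERVED_TYPE_NAMES:
        return "reserved"
    if not CONVENTIONAL_TYPE_NAME.fullmatch(name) or "__" in name:
        return "unconventional"
    return "ok"


for candidate in ["ThisIsRecommended", "neitherIsThis", "Foo__Bar", "Int", "9Lives"]:
    print(candidate, check_type_name(candidate))
```

Using fullmatch rather than match ensures the whole name is checked, so a name such as Foo-Bar is rejected instead of matching its Foo prefix.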

Type names should be used as a documentation tool. They don't need to be short if long names are more helpfully descriptive.

# Named Scalar Types (typedefs)

The non-recursive (scalar) Schema kinds (Boolean, Integer, Float, String, Bytes, Link) may all appear as typedef'd types. That is, a unique name may be assigned to a kind and that name may be used in place of the kind later in the schema. Multiple unique type names may share the same kind.

type Foo string
type Bar int
type Boom {Foo:Bar}

In terms of data layout, this is equivalent to:

type Boom {String:Int}

(Note that even though the Data Model only allows for string keys of maps, the indirection through type Foo is allowed since it has a string representation.)

There are a number of reasons to typedef a scalar Schema kind:

  • Documentation: A stand-alone type can be more easily documented in the Schema DSL. This may be helpful where there are additional rules that surround a type that are not expressible in the DSL but readers of the Schema may need to be aware of. You will find a lot of such typedefs in the schema-schema.
  • Highlighting re-use: Where the re-use of a particular Schema kind is noteworthy, naming it may help in expressing intent.
  • Codegen: the use of named types will have implications for codegen tools. It may be desirable for code generated from a Schema to have recognizable type names in certain positions.

Links in IPLD Schemas are a special-case. The Data Model kind "Link" is expressed by a token prefixed with the & character. The remainder of the token should be Any or the name of a type.

Links can be typedef'd, type Foo &Bar or can appear inline: type Baz {String:&Bang}.

Further, the type name is not a strict assertion that can be directly tested against the underlying data; it is simply a hint regarding what should be found when following the link identified by the CID at the position indicated by the Schema link. Strict assertions of this expected type may be applied at layers above the Schema validation layer when the link is resolved and the node decoded.

For more information about Links in Schemas, see Links and IPLD Schemas.

# Inline Recursive Types

The scalar types (Boolean, Integer, Float, String, Bytes, Link) may appear inline or be typedef'd. In addition, both Map and List types may appear either inline or as their own named type. The additional Schema kinds (Struct, Enum, Union, Copy) do not have an inline variant.

type IntList [Int]

type MapOfIntLists {String:IntList}

type Foo struct {
  id Int
  data MapOfIntLists
}

is equivalent to:

type Foo struct {
  id Int
  data {String:[Int]}
}

As with typedef'd scalar kinds, this has implications for codegen and other API interactions with Schema types. Rather than having explicit names (MapOfIntLists and IntList), auto-generated names may be applied to Foo->data and to the type of the List nodes found within that Map (e.g. perhaps Foo__dataType, Foo__data__valueType).

The inline facility is provided for convenience but explicitness is always recommended above expedience, including this case, in order to improve the documentation role of Schemas. By naming Map and List elements the author can express intent to the user and provide clarity through Schema-consuming tools.

# Representations

The concept of "representations" is a key component of IPLD Schemas and should be understood in order to create and read effective IPLD Schemas.

In the Data Model there are only 9 kinds (Null, Boolean, Integer, Float, String, Bytes, List, Map & Link). The Schema layer adds 4 more (Union, Struct, Enum & Copy). These aren't present in the Data Model and are opaque to serialization formats. Instead, they must be "represented" as a base Data Model kind. Each data type at the Schema layer, therefore, has a "representation kind". Scalar kinds are represented as the same kind at the Data Model layer (except in the case of Advanced Data Layouts, see below).

A Struct is represented as a Map by default when serialized and deserialized. The Struct adds the ability to apply additional constraints about the keys, the types found when consuming the value nodes of the Map, whether certain keys must be present and what to do when they aren't present. Enums also have a default representation; when one is not specified, they are assumed to be represented as Strings when serialized or deserialized, but with constraints about valid strings for the node(s) where the Enum appears.

A Copy type is a special case, it copies all properties of the copied type other than its name, including the representation.

Unions don't have a default representation as they express a concept that is commonly represented in a number of ways, so a representation must be supplied when defining a Union type.

Some Schema kinds have alternative representation "strategies" that dictate how a type is to be represented in serialized form. Most of these strategies change the representation kind of the type, but some retain the same kind and simply alter how the type is encoded within that kind. The stringjoin and stringpairs representation strategies for Struct types both change the representation kind of a Struct from the default Map to a String, but they encode to that String differently. A stringjoin strategy joins the field values in order, separated by a delimiter (e.g. "v1,v2"), while a stringpairs strategy includes the field names, requiring a delimiter between each name and its value as well as a delimiter between entries (e.g. "f1=v1,f2=v2"). Similarly, the listpairs and tuple Struct representations both use a List representation kind but use different strategies to encode within a List.

To specify a type's representation, the keyword representation is supplied after the main type definition and is followed by a representation strategy name valid for that type.

For example, consider this Struct:

type Foo struct {
  fieldOne nullable String
  fieldTwo Bool
}

We could decode the following JSON (using the DAG-JSON codec) into a Foo type:

{
  "fieldOne": "This is field one of Foo",
  "fieldTwo": false
}

A Struct can also have the default representation expressed explicitly:

type Foo struct {
  fieldOne nullable String
  fieldTwo Bool
} representation map

These two descriptors of Foo are identical when parsed, as the representation map is implicit for Structs when a representation is not supplied.

The Struct can also be represented as a List when we supply the tuple representation type:

type Foo struct {
  fieldOne nullable String
  fieldTwo Bool
} representation tuple

When a Map is encountered at the Data Model layer where this variant of Foo is expected, an error or failed validation would occur. Instead, the data for this Struct is a simple List of two elements, the first one a String and the second a Bool. In JSON this may look like:

[ "This is field one of Foo", false ]

A full list of the available representation strategies and their kinds that can be supplied for various Schema kinds can be found in Representations of IPLD Schema Kinds.

# Representation Parameters

Some representation strategies accept optional parameters, and some have required parameters that are needed to properly shape the type representation. Representation parameters are supplied in one of two ways: general parameters go within the representation block, while parameters specific to individual fields go inline in parens, adjacent to those fields.

# General Representation Parameters

Our Foo struct with a tuple representation may be serialized in an alternate field order by supplying the general fieldOrder parameter:

type Foo struct {
  fieldOne nullable String
  fieldTwo Bool
} representation tuple {
  fieldOrder ["fieldTwo", "fieldOne"]
}

Serialization of such a type in JSON may appear as:

[ false, "This is field one of Foo" ]
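The tuple strategy with fieldOrder amounts to listing field values in a chosen order. This Python sketch models a Struct value as a dict and fieldOrder as a list of field names; the function names encode_tuple and decode_tuple are hypothetical, not part of any IPLD library.

```python
import json


def encode_tuple(value: dict, field_order: list[str]) -> list:
    """Serialize a struct (modelled as a dict) as a list in field_order."""
    return [value[name] for name in field_order]


def decode_tuple(items: list, field_order: list[str]) -> dict:
    """Rebuild the struct from a list, checking the element count."""
    if len(items) != len(field_order):
        raise ValueError(f"expected {len(field_order)} elements, got {len(items)}")
    return dict(zip(field_order, items))


foo = {"fieldOne": "This is field one of Foo", "fieldTwo": False}
print(json.dumps(encode_tuple(foo, ["fieldTwo", "fieldOne"])))
# → [false, "This is field one of Foo"]
```

Without a fieldOrder parameter, the order would simply be the order in which the fields are declared in the Struct, giving the earlier [ "This is field one of Foo", false ] form.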

The stringjoin representation for Structs has a required parameter, join. There is no default for this parameter, so a Schema specifying a stringjoin Struct without it is invalid:

type Foo struct {
  fieldOne nullable String
  fieldTwo Bool
} representation stringjoin {
  join ":"
}

This representation for Foo would serialize into a single String node:

"This is field one of Foo:false"

This representation for Structs has limitations as there is no escaping mechanism for the join character, so it should be used with caution. Similar restrictions apply to the stringpairs Map representation. See Representations of IPLD Schema Kinds for more details on such restrictions.
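The stringjoin encoding and its missing escape mechanism can be seen in a few lines of Python. This is a sketch under stated assumptions: the helper names are hypothetical, and writing a Bool as "true"/"false" is assumed here only to match the example output above.

```python
def to_text(value) -> str:
    # Assumption: a Bool is written as "true"/"false", matching the example.
    return str(value).lower() if isinstance(value, bool) else str(value)


def encode_stringjoin(values: list, join: str) -> str:
    """Join field values (already in field order) with the join delimiter."""
    return join.join(to_text(v) for v in values)


def decode_stringjoin(text: str, join: str, field_count: int) -> list[str]:
    """Split a stringjoin String back into per-field texts."""
    parts = text.split(join)
    if len(parts) != field_count:
        raise ValueError(f"expected {field_count} fields, got {len(parts)}")
    return parts


print(encode_stringjoin(["This is field one of Foo", False], ":"))
# → This is field one of Foo:false

# No escaping: a value containing the join character cannot round-trip.
broken = encode_stringjoin(["one:two", False], ":")
try:
    decode_stringjoin(broken, ":", 2)
except ValueError as err:
    print("cannot decode:", err)
```

The second call shows why caution is needed: "one:two:false" splits into three parts, so a two-field Struct cannot be recovered.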

# Field-specific Representation Parameters

The content in the main type declaration block (between the opening { and closing }) is intended to represent the type as a user-facing concept, including the cardinality of the fields. However, content in parens ((, )) next to individual fields is an exception to this rule: it holds field-specific representation parameters. That is, these parameters would ordinarily belong below in the representation block, because they concern the interaction with the serialized form. They are placed next to the fields primarily to avoid re-declaring the fields in the representation block.

Two common field-specific representation parameters for Structs are implicit and rename:

type Foo struct {
  fieldOne nullable String (rename "one")
  fieldTwo Bool (rename "two" implicit "false")
}

A cleaner declaration that separates type declaration from serialized form representation details might present this as:

# This is not valid IPLD Schema but is presented to illustrate the additional verbosity being avoided

type Foo struct {
  fieldOne nullable String
  fieldTwo Bool
} representation map {
  fields {
    fieldOne rename "one"
    fieldTwo rename "two" implicit "false"
  }
}

In our example we can see that nullable is a distinct parameter for the field compared to rename and implicit. This is because nullable impacts the shape of the user-facing API for Foo, whereas rename and implicit only impact the serialization (representation) of Foo and so are effectively hidden from the user.

See Value Type Modifiers for a discussion on such matters as well as the impacts on value cardinality.

A rename parameter specifies that, at serialization and deserialization, a field has a different name from the one used in the Schema. An implicit parameter specifies the value the field should take when it is not present in the serialized form.

Recall our original serialized form for Foo:

{
  "fieldOne": "This is field one of Foo",
  "fieldTwo": false
}

With the rename and implicit parameters above, this same data would be serialized as:

{
  "one": "This is field one of Foo"
}

See Fields with Implicit Values for more information on implicit. In the same document you will also find a discussion regarding combining nullable, optional and implicit and the limitations thereof.

Whenever a value appears in a representation parameter, it must be quoted, regardless of type. In our example above, implicit "false" quoted a Bool parameter. This will be interpreted appropriately depending on context; in this case it is clear that the type of the quoted value should be a Bool.
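The effect of rename and implicit on a map-represented Struct can be sketched in Python. The FIELDS table and the encode_map/decode_map functions are hypothetical illustrations, not an IPLD API; the quoted implicit "false" is stored already interpreted as a Bool, reflecting the context-dependent interpretation described above.

```python
# Hypothetical field parameters for the Foo example: the quoted implicit "false"
# is stored already interpreted as a Bool, as the context dictates.
FIELDS = {
    "fieldOne": {"rename": "one"},
    "fieldTwo": {"rename": "two", "implicit": False},
}


def encode_map(value: dict, fields: dict) -> dict:
    """Serialize a struct (modelled as a dict) using rename and implicit parameters."""
    out = {}
    for name, params in fields.items():
        v = value[name]
        if "implicit" in params and v == params["implicit"]:
            continue  # equal to the implicit value, so it is left out
        out[params.get("rename", name)] = v
    return out


def decode_map(data: dict, fields: dict) -> dict:
    """Deserialize, filling in implicit values for absent keys.

    Unknown keys are not rejected by this sketch.
    """
    out = {}
    for name, params in fields.items():
        key = params.get("rename", name)
        if key in data:
            out[name] = data[key]
        elif "implicit" in params:
            out[name] = params["implicit"]
        else:
            raise ValueError(f"missing required key {key!r}")
    return out


foo = {"fieldOne": "This is field one of Foo", "fieldTwo": False}
print(encode_map(foo, FIELDS))
# → {'one': 'This is field one of Foo'}
```

Decoding that output with decode_map restores fieldTwo as False, which is why the serialized form can omit it.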

Another example of field parameters is the int representation for Enums, where the field parameter is mandatory:

type Status enum {
  | Nope  ("0")
  | Yep   ("1")
  | Maybe ("100")
} representation int

In this case we are mapping Int values in the serialized form to the three Enum values. Note also that the values are again quoted, but will be interpreted appropriately as integers because the context makes that clear.

# Structs

The basic DSL form of a Struct has the following structure:

type TypeName struct {
  field1Name Field1Type
  field2Name Field2Type
  ... etc.
}

Where TypeName is a unique name for the type and follows the naming rules above. Field names follow the same rules as for type naming except that a lower-case first character is allowed and is encouraged as the conventional form. All fields have a type and the type should be one of the existing implicit Schema types (Int, String etc.) or be present as a named type elsewhere within the document. Field types can be recursive in that they can refer to the parent type, indicating a nested data structure (obviously such a nested data structure must have nullable or optional elements that prevent it from being necessarily infinitely recursive).

Structs must always have a body, enclosed by {, }. Fields must be newline-delimited and should be indented for clarity.

The representation strategy for Structs is map by default, so it may be omitted. Additional representation strategies are available; see Representations of IPLD Schema Kinds for more details on them.

Field representation parameters are presented in parens, and representation strategies requiring additional general parameters present them in a separate representation block enclosed by {, }. For example, a Struct with both field representation parameters and general representation parameters:

type Foo struct {
  fieldOne nullable String (rename "one")
  fieldTwo Bool (rename "two" implicit "false")
} representation stringpairs {
  innerDelim "="
  entryDelim ","
}

Leading to a serialized form such as:

"one=This is field one of Foo,two=true"

More details regarding stringpairs can be found below and in Representations of IPLD Schema Kinds.
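The stringpairs form combines the field parameters with the two delimiters. This Python sketch reproduces the serialized form above; the helper names and the FIELDS table are hypothetical, writing a Bool as "true"/"false" is an assumption, and decoding only splits the String into raw key/value texts without converting them to field types.

```python
# Hypothetical field parameters mirroring the Foo example above.
FIELDS = {
    "fieldOne": {"rename": "one"},
    "fieldTwo": {"rename": "two", "implicit": False},
}


def to_text(value) -> str:
    # Assumption: a Bool is written as "true"/"false".
    return str(value).lower() if isinstance(value, bool) else str(value)


def encode_stringpairs(value: dict, fields: dict, inner_delim: str, entry_delim: str) -> str:
    """Write each present field as name<inner_delim>value, joined by entry_delim."""
    entries = []
    for name, params in fields.items():
        v = value[name]
        if "implicit" in params and v == params["implicit"]:
            continue  # implicit values are left out of the serialized form
        entries.append(params.get("rename", name) + inner_delim + to_text(v))
    return entry_delim.join(entries)


def decode_stringpairs(text: str, inner_delim: str, entry_delim: str) -> dict:
    """Split into raw key/value texts; converting them to field types is left out."""
    pairs = {}
    for entry in text.split(entry_delim):
        key, sep, raw = entry.partition(inner_delim)
        if not sep:
            raise ValueError(f"entry {entry!r} has no {inner_delim!r}")
        pairs[key] = raw
    return pairs


foo = {"fieldOne": "This is field one of Foo", "fieldTwo": True}
print(encode_stringpairs(foo, FIELDS, "=", ","))
# → one=This is field one of Foo,two=true
```

As with stringjoin, nothing escapes the delimiters, so a value containing "," or "=" would not split back correctly.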

Valid representation strategies for Structs are:

  • map
  • tuple
  • stringpairs
  • stringjoin
  • listpairs

More details about these representation strategies, including their various parameters and their representation kinds can be found in Representations of IPLD Schema Kinds.

# Enums

Enums are used to indicate a distinct, fixed list of values. Enums in IPLD Schemas have a String representation kind, using the value token as the serialized value by default.

type Status enum {
  | Nope
  | Yep
  | Maybe
}

type Response struct {
  timestamp Int
  status Status
}

In this example, where Status is used as the status field in the Response Struct, we expect to find a String in the serialized form that is one of "Nope", "Yep" or "Maybe". This string value is not presented via an API that interacts with the data through this Schema; rather, the special tokens Nope, Yep and Maybe may be used instead. Codegen would present these values as distinct types that can be passed to a struct / class implementing Response when interacting with the status field.

The serialized strings may differ from the value tokens:

type Status enum {
  | Nope ("Nay")
  | Yep  ("Yay")
  | Maybe
}

This creates a distinction between the Strings at the Data Model layer and the tokens that an API may use at the Schema layer.

An alternate representation strategy for Enums may be specified: int. With an int representation strategy, the values are serialized and deserialized as Data Model Ints but the Enum value tokens are presented at the Schema Layer:

type Status enum {
  | Nope  ("0")
  | Yep   ("1")
  | Maybe ("100")
} representation int

Note again that the Int values are quoted in the field representation parens; they will be interpreted and validated as integers when parsing, since the context of an int representation strategy makes this clear.
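Both Enum representations come down to a two-way mapping between value tokens and serialized values. In this Python sketch, the mapping tables and the encode_enum/decode_enum names are hypothetical; the quoted "0", "1" and "100" are stored as Python ints, reflecting their interpretation in the int context.

```python
# Serialized forms for the Status examples above.
STATUS_AS_STRING = {"Nope": "Nay", "Yep": "Yay", "Maybe": "Maybe"}
STATUS_AS_INT = {"Nope": 0, "Yep": 1, "Maybe": 100}


def encode_enum(member: str, mapping: dict):
    """Map an Enum value token to its serialized Data Model value."""
    return mapping[member]


def decode_enum(raw, mapping: dict) -> str:
    """Map a serialized value back to its value token, rejecting anything else."""
    for member, serialized in mapping.items():
        # Exact type check, so True is not accepted as 1 and "1" is not accepted as 1.
        if type(serialized) is type(raw) and serialized == raw:
            return member
    raise ValueError(f"{raw!r} is not a valid serialized Status")


print(decode_enum(100, STATUS_AS_INT), decode_enum("Nay", STATUS_AS_STRING))
# → Maybe Nope
```

With the renamed string representation, the token "Nope" itself is not a valid serialized value; only "Nay" is.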

More details about these representation strategies can be found in Representations of IPLD Schema Kinds.

# Unions

# Introduction to Unions: Kinded Unions

IPLD Schema Unions describe various means for nodes that may be one of a number of kinds or forms. Consider a node that contains the following data, perhaps as part of a signalling protocol:

{
  "msg": "Something bad happened",
  "payload": "ERROR"
}

And an alternative form that is also acceptable but signals a different state and meaning:

{
  "msg": "All good",
  "payload": {
    "percent": 0.6,
    "last": "61626378797a"
  }
}

In this example, we have a Map with a fixed set of two fields, so it can be represented as a Struct, but the payload field doesn't have a stable kind, so we can't use any of the existing Schema types to describe it. Instead, we can introduce a Union, which can take different forms depending on the acceptable forms of the data.

IPLD Schemas are intended to be efficient, so the ability to discriminate on Union types is limited to what we can find at the current node. That is, we can't inspect whether a node has a child that takes a particular form and use that as a discriminator (such as inspecting the keys or values of a Map). A Schema must be able to fail validation at a node being inspected where the data does not match the expected form.

In our example, the discriminator for the type found at payload is the kind of node present. It is either a String kind or a Map kind. We can make an immediate determination of type based on this piece of information.

Our Schema for this data could be written as:

type Message struct {
  msg String
  payload Payload
}

type Payload union {
  | Error string
  | Progress map
} representation kinded

type Error string

type Progress struct {
  percent Float
  last String
}

Our Payload Union can be read as "one of Error or Progress" and could have additional members if there are other forms that a "payload" could take. All Unions require a representation strategy to be stated; there is no default strategy. In this case we are specifying the kinded strategy, so we are opting to discriminate the type by inspecting the kind present at the Data Model layer. If we find a String at the Data Model layer then we can safely assume it is an Error. If we find a Map then we assume it's a Progress and proceed to validate it against Progress, checking whether the Map has the two required fields. At that point, however, the validation job of Payload is already done: it only needs to check for the presence of a String or a Map.
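Kinded discrimination needs nothing but the kind of the current node. This Python sketch assumes the data has been decoded from JSON into plain Python values; the kind-name mapping and the function names are assumptions for illustration, and Links are not modelled.

```python
def data_model_kind(node) -> str:
    """Map a decoded Python value onto a Data Model kind name (assumed mapping)."""
    if node is None:
        return "null"
    if isinstance(node, bool):  # check before int: bool subclasses int in Python
        return "bool"
    if isinstance(node, int):
        return "int"
    if isinstance(node, float):
        return "float"
    if isinstance(node, str):
        return "string"
    if isinstance(node, bytes):
        return "bytes"
    if isinstance(node, list):
        return "list"
    if isinstance(node, dict):
        return "map"
    raise TypeError(f"no Data Model kind for {type(node).__name__}")


# Kind -> member type name, as declared in the kinded Payload union above.
PAYLOAD_MEMBERS = {"string": "Error", "map": "Progress"}


def discriminate_kinded(node, members: dict):
    """Pick the member type from the node's kind alone; its contents are not inspected."""
    kind = data_model_kind(node)
    if kind not in members:
        raise ValueError(f"{kind} is not a valid kind for this union")
    # Validating the node against the chosen type (e.g. Progress fields) is a separate step.
    return members[kind], node


print(discriminate_kinded({"percent": 0.6, "last": "61626378797a"}, PAYLOAD_MEMBERS)[0])
# → Progress
```

Note that the Map is handed to Progress without looking inside it; that is exactly the property the following section shows breaking down.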

# Limitations of Union Discrimination

Authoring Unions in IPLD Schemas helps expose some of the limitations of quickly validating data that is allowed to vary. If we extend our example and introduce another acceptable form of "payload" we can see how this ability to quickly discriminate breaks down and introduces the need to do child-contents checking to discriminate:

{
  "msg": "Ping",
  "payload": {
    "ts": 1572935564043,
    "nonce": "424f524b"
  }
}

We've introduced a new message type but lost the ability to discriminate based on kind, as our new type is also a Map. A Schema that accommodates this additional payload type is possible, but it pushes the burden of discrimination onto the consumer of the data, along with some additional validation burden:

type Message struct {
  msg String
  payload Payload
}

type Payload union {
  | Error string
  | ProgressOrPing map
} representation kinded

type Error string

type ProgressOrPing struct {
  percent optional Float
  last optional String
  ts optional Int
  nonce optional String
}

Now the user of such a Schema must do their own field inspection to determine whether a ProgressOrPing is a progress message or a ping. Additionally, the burden of ensuring that both percent and last are present or ts and nonce are present is left to the user, the Schema layer can't help here. The trade-off present in this scenario regards validation of a node by inspection of its child nodes. This type of data is common in the real world but IPLD Schemas encourage better data shape design to allow for fast validation through clear discrimination where such variance exists.

# Alternative Discrimination Strategies

If we are designing the data layout for our example protocol (rather than consuming something whose design we have no control over), we could choose an alternate strategy that would allow more efficient discrimination. Unions allow for five different representation strategies that allow for different kinds of discrimination.

# Keyed

By making our "payload" object contain a specific key that discriminates the type of the payload, we could use a keyed Union:

{
  "msg": "Something bad happened",
  "payload": {
    "error": "ERROR"
  }
}
{
  "msg": "All good",
  "payload": {
    "progress": {
      "percent": 0.6,
      "last": "61626378797a"
    }
  }
}
{
  "msg": "Ping",
  "payload": {
    "ping": {
      "ts": 1572935564043,
      "nonce": "424f524b"
    }
  }
}

We can now easily handle this data with the following Schema:

type Message struct {
  msg String
  payload Payload
}

type Payload union {
  | Error "error"
  | Progress "progress"
  | Ping "ping"
} representation keyed

type Error string

type Progress struct {
  percent Float
  last String
}

type Ping struct {
  ts Int
  nonce String
}

Our Payload union now has the keyed representation strategy. This strategy means the Payload will have a Map representation kind, and that map will be required to have exactly one of the various keys that are used to discriminate the type present. Syntactically, in the Schema DSL Payload now lists quoted string keys next to the types, rather than the kinds of the previous kinded Union; these are the discriminant values that will be seen in the map.

Validation of such data can now check that exactly one of these keys is present, and then hand off validation to the expected type at the node found in the value of that key. If an "error" key is found, it will proceed to validate Error, which expects the node to be a String. If a "progress" key is found, it will proceed to validate that it finds a Map at the value node and that it matches the Progress type, and so on.
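The keyed check described above fits in a small Python function. This sketch works on JSON-decoded Python values; the names PAYLOAD_KEYS and discriminate_keyed are hypothetical, and rejecting any map with more than one entry is this sketch's reading of "exactly one of the various keys".

```python
# Discriminant key -> member type name, from the keyed Payload union above.
PAYLOAD_KEYS = {"error": "Error", "progress": "Progress", "ping": "Ping"}


def discriminate_keyed(node, members: dict):
    """Return (member type name, inner node) for a keyed union representation."""
    if not isinstance(node, dict):
        raise ValueError("a keyed union is represented as a map")
    if len(node) != 1:
        # Assumption of this sketch: no entries besides the single discriminant key.
        raise ValueError(f"expected exactly one key, found {len(node)}")
    [(key, inner)] = node.items()
    if key not in members:
        raise ValueError(f"unknown discriminant key {key!r}")
    # The inner node is then validated against the chosen type (e.g. Error as a String).
    return members[key], inner


print(discriminate_keyed({"ping": {"ts": 1572935564043, "nonce": "424f524b"}}, PAYLOAD_KEYS))
```

Unlike the kinded version, Progress and Ping no longer collide, because the key rather than the kind carries the discriminant.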

# Envelope

A strategy similar to keyed, but more explicit and allowing for the retention of the "payload" node is the envelope representation strategy. With this strategy we expect that the type will be present as the value of a fixed key of a Map ("payload"), but we can discriminate the type of data to be found by inspecting the value of another key in the Map:

{
  "msg": "Something bad happened",
  "envelope": {
    "tag": "error",
    "payload": "ERROR"
  }
}
{
  "msg": "All good",
  "envelope": {
    "tag": "progress",
    "payload": {
      "percent": 0.6,
      "last": "61626378797a"
    }
  }
}
{
  "msg": "Ping",
  "envelope": {
    "tag": "ping",
    "payload": {
      "ts": 1572935564043,
      "nonce": "424f524b"
    }
  }
}

This strategy results in the payload data being in a predictable position in the document, as well as the discriminator value being in a predictable position in the document, but the structure in the payload part of the document varies.

Our Schema can now take the following form:

type Message struct {
  msg String
  envelope Payload
}

type Payload union {
  | Error "error"
  | Progress "progress"
  | Ping "ping"
} representation envelope {
  discriminantKey "tag"
  contentKey "payload"
}

type Error string

type Progress struct {
  percent Float
  last String
}

type Ping struct {
  ts Int
  nonce String
}

This envelope representation strategy requires the parameters discriminantKey and contentKey. The discriminantKey tells the Schema which key holds the discriminator value, and the contentKey tells it which key holds the payload itself. The discriminator values are listed next to the types of the Union (in this case, the same values we used in the keyed Union example above).
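Envelope discrimination reads the tag from one fixed key and the content from another. This Python sketch takes discriminantKey and contentKey as arguments; the function name is hypothetical, and rejecting any keys beyond those two is an assumption of the sketch rather than something stated above.

```python
PAYLOAD_TAGS = {"error": "Error", "progress": "Progress", "ping": "Ping"}


def discriminate_envelope(node, members: dict, discriminant_key: str, content_key: str):
    """Return (member type name, content node) for an envelope union representation."""
    if not isinstance(node, dict):
        raise ValueError("an envelope union is represented as a map")
    if set(node) != {discriminant_key, content_key}:
        # Assumption of this sketch: exactly these two keys, nothing else.
        raise ValueError(f"expected keys {discriminant_key!r} and {content_key!r}")
    tag = node[discriminant_key]
    if not isinstance(tag, str) or tag not in members:
        raise ValueError(f"unknown discriminant {tag!r}")
    return members[tag], node[content_key]


envelope = {"tag": "progress", "payload": {"percent": 0.6, "last": "61626378797a"}}
print(discriminate_envelope(envelope, PAYLOAD_TAGS, "tag", "payload"))
```

Because both keys are fixed, a consumer always knows where to find the tag and the payload, even though the payload's own shape varies.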

# Inline

An inline representation strategy pulls nested structures up into the current node, rather than navigating down to a child node to interpret the constituent type as the previous Union representation strategies do. Discrimination between types uses a discriminantKey, also in the current node. This necessarily means that the current node must have a map representation kind and the constituent types of the Union must also have map representation kinds.

Our example must be extended so that the Error type can be extracted from a map representation:

{
  "msg": "Something bad happened",
  "union": {
    "tag": "error",
    "message": "ERROR"
  }
}
{
  "msg": "All good",
  "union": {
    "tag": "progress",
    "percent": 0.6,
    "last": "61626378797a"
  }
}
{
  "msg": "Ping",
  "union": {
    "tag": "ping",
    "ts": 1572935564043,
    "nonce": "424f524b"
  }
}

For types in the union that are a struct with only one field (like the first example data above), this looks very similar to an envelope union, except that there's no contentKey in our union's representation definition, so the other map key in that example ("message") comes from the struct's field name! The behavior of an inline union becomes clearer as the contained types get more fields: the tag field always sits alongside the other map keys.

type Message struct {
  msg String
  union Payload
}

type Payload union {
  | Error "error"
  | Progress "progress"
  | Ping "ping"
} representation inline {
  discriminantKey "tag"
}

type Error struct {
  message String
}

type Progress struct {
  percent Float
  last String
}

type Ping struct {
  ts Int
  nonce String
}

The interface presented by this Schema is adjusted in comparison to the previous Unions as Error is now a Struct with a message field.
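Inline discrimination removes the discriminant entry and treats the remaining entries as the member Struct's map. This Python sketch uses hypothetical names and JSON-decoded Python values; validating the remaining fields against Error, Progress or Ping is left to a later step.

```python
PAYLOAD_TAGS = {"error": "Error", "progress": "Progress", "ping": "Ping"}


def discriminate_inline(node, members: dict, discriminant_key: str):
    """Return (member type name, remaining fields) for an inline union representation."""
    if not isinstance(node, dict):
        raise ValueError("an inline union is represented as a map")
    tag = node.get(discriminant_key)
    if not isinstance(tag, str) or tag not in members:
        raise ValueError(f"missing or unknown discriminant {tag!r}")
    # Every other entry belongs to the member struct's own map representation.
    fields = {k: v for k, v in node.items() if k != discriminant_key}
    return members[tag], fields


print(discriminate_inline({"tag": "error", "message": "ERROR"}, PAYLOAD_TAGS, "tag"))
```

This is also why every member type must have a map representation kind: the remaining entries are read directly as that member's map.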

# Byteprefix Unions for Bytes

A special-case union exists for handling Bytes kinds. Where a node contains a byte array (Bytes kind), we may want to discriminate between two different uses of that byte array at the application layer. For example, consider two different encoding schemes that each store a "key" field distinct to that scheme. For practical purposes both are byte arrays, but at the application layer it helps to separate them into distinct forms, perhaps so we can make simple assertions that we received the expected key type for the given encoding scheme. Extracting distinct forms and naming them in a Schema also brings documentation clarity benefits that may factor into such a decision.

type Authorization struct {
  key PublicKey
  keySize Int
}

type PublicKey union {
  | RsaPubkey 0
  | Ed25519Pubkey 1
} representation byteprefix

type RsaPubkey bytes
type Ed25519Pubkey bytes

By declaring a byteprefix union, we specify that the first byte of the byte array found at the key node of Authorization will discriminate which type the public key is. That first byte will be sliced off and expected to be either 0x0 or 0x1, then the remainder of the byte array will be extracted and encapsulated inside either RsaPubkey or Ed25519Pubkey depending on the discriminator byte.
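The slicing described above can be sketched in a few lines of Python. The helper names `decode_byteprefix` and `encode_byteprefix` are hypothetical; the prefix table mirrors the PublicKey union declared above.

```python
from typing import Tuple

# Mirrors the PublicKey union above: discriminator byte -> member type name.
BYTEPREFIX_MEMBERS = {0: "RsaPubkey", 1: "Ed25519Pubkey"}

def decode_byteprefix(data: bytes) -> Tuple[str, bytes]:
    """Hypothetical helper: slice off the first byte and pick the member type."""
    if len(data) == 0:
        raise ValueError("a byteprefix union requires at least one byte")
    prefix, remainder = data[0], data[1:]  # indexing bytes yields an int
    if prefix not in BYTEPREFIX_MEMBERS:
        raise ValueError(f"unknown prefix byte: {prefix:#04x}")
    return BYTEPREFIX_MEMBERS[prefix], remainder

def encode_byteprefix(type_name: str, payload: bytes) -> bytes:
    """Inverse operation: prepend the discriminator byte for the given member."""
    for prefix, name in BYTEPREFIX_MEMBERS.items():
        if name == type_name:
            return bytes([prefix]) + payload
    raise ValueError(f"not a member of the union: {type_name}")
```

Decoding `b"\x01abc"` yields `("Ed25519Pubkey", b"abc")`, and encoding reverses the process by prepending the prefix byte.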

# Copy

The Copy Schema kind is a special case that provides a mechanism for copying the definition of one named type into a new name. It uses the = token after the new type's name, followed by the name of the type being copied. It is not possible to copy an unnamed (anonymous) type.

type Ping struct {
  ts Int
  nonce String
}

type Pong = Ping

This example is strictly equivalent to the following in terms of the interaction above the Schema layer:

type Ping struct {
  ts Int
  nonce String
}

type Pong struct {
  ts Int
  nonce String
}

The Schema tooling and the reified form of the Schema retain a copy kind marker, but tooling that consumes Schemas is expected to treat this marker as an indirection to the named type being copied, and to copy the entirety of that type's definition to the new name.
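The following Python sketch shows one way consuming tooling might expand copy markers. It uses a deliberately simplified, hypothetical dict format (`{"kind": "copy", "fromType": ...}`) as a stand-in for the reified Schema form, and it allows a copy of a copy only as an assumption of this sketch.

```python
import copy
from typing import Any, Dict

def resolve_copies(types: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: replace each copy marker with the copied type's definition."""
    resolved = {}
    for name in types:
        seen = set()
        current = name
        # Follow copy markers until a concrete definition is reached.
        while types[current].get("kind") == "copy":
            if current in seen:
                raise ValueError(f"copy cycle involving {current}")
            seen.add(current)
            current = types[current]["fromType"]
            if current not in types:
                raise ValueError(f"copy of unknown type {current}")
        # Deep-copy so later edits to one definition do not leak into the other.
        resolved[name] = copy.deepcopy(types[current])
    return resolved
```

With `Ping` defined as a struct and `Pong` as a copy of it, the resolved output contains two identical but independent struct definitions, matching the "strictly equivalent" expansion shown above.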

The Copy type is provided for convenience and should also prove beneficial in pointing out relationships between types.

# Advanced Data Layouts

Advanced Data Layouts (ADL) are a mechanism for breaking out of Schema processing into custom logic where such logic cannot be expressed in Schemas but where connection with Schema kinds may be beneficial.

ADLs are not considered types in the Schema sense; rather, they masquerade as types or, more specifically, can masquerade as Schema kinds when used in certain conditions.

Declaration of an ADL is similar to declaring a type but only requires a name:

advanced ROT13

Once declared as an entity in the Schema, the name (ROT13 in this case) may be used as a representation elsewhere in the Schema. We do this with representation advanced followed by the name:

type MyString string representation advanced ROT13

Coupling this type and the advanced definition, we are declaring that there exists above the Schema layer some logic labelled ROT13 that is able to interact with the Data Model layer on behalf of MyString and present a standard String kind interface for such a purpose.
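As a toy illustration of "presenting a standard String kind interface", here is a Python sketch of what the logic labelled ROT13 might look like. The class `Rot13String` and its methods are hypothetical (as is the String ADL itself); it only relies on Python's built-in `rot_13` text codec.

```python
import codecs

class Rot13String:
    """Hypothetical ADL: the Data Model holds ROT13-scrambled text, while code
    above the Schema layer only ever sees the plain String."""

    def __init__(self, stored: str) -> None:
        self._stored = stored  # the raw Data Model node (scrambled text)

    @classmethod
    def from_plain(cls, plain: str) -> "Rot13String":
        # Encoding direction: plain text -> scrambled Data Model value.
        return cls(codecs.encode(plain, "rot_13"))

    def as_string(self) -> str:
        # Decoding direction: the String that MyString presents to its users.
        return codecs.decode(self._stored, "rot_13")

    def data_model_node(self) -> str:
        return self._stored
```

The point of the design is that users of MyString call only `as_string()`; the scrambled value stays an implementation detail between the ADL and the Data Model.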

How the ADL logic is wired in to the Schema tooling will be language- and tooling-specific. For the purpose of Schema authoring, an advanced definition and its usage can be considered a mechanism to break out of the standard Data-Model-to-Schema processing and instead insert custom logic into that flow for the particular node in question, so that it becomes Data-Model-to-ADL-to-Schema.

The interaction with the Data Model is also left up to the ADL, so it is not limited to consuming a single node. Rather, it can consume any number of nodes (or no nodes!) and even traverse links in an opaque fashion. Another ADL example illustrates this. Here we declare a sharded Map kind, which may be used to scale to Maps of very large size that therefore span multiple independent blocks:

advanced ShardedMap

type MyMap { String : &Any } representation advanced ShardedMap

In this case, we declare a MyMap type that is considered a Map kind for the purpose of the rest of the Schema and presents as such above the Schema layer. Meanwhile we have inserted custom logic, labelled ShardedMap, that takes care of the decode/encode and traversal required to present a standard Map kind to the user of such a Schema.
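The Python sketch below illustrates the idea of presenting many blocks as one ordinary Map. Everything in it is an assumption for illustration: the in-memory `store` dict stands in for a block store, the `block-N` strings stand in for real links (CIDs), and the single-level "hash the key, pick a shard" scheme is invented here; it is not the layout of any real sharded-map ADL such as a HAMT.

```python
import hashlib
from collections.abc import Mapping
from typing import Any, Dict, Iterator, List

def _shard_index(key: str, shard_count: int) -> int:
    # Deterministically pick a shard from a hash of the key.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return digest[0] % shard_count

def build_sharded(entries: Dict[str, Any], shard_count: int,
                  store: Dict[str, Dict[str, Any]]) -> List[str]:
    """Write entries into shard blocks in `store`; return the root's list of links."""
    shards: List[Dict[str, Any]] = [{} for _ in range(shard_count)]
    for key, value in entries.items():
        shards[_shard_index(key, shard_count)][key] = value
    links = []
    for i, shard in enumerate(shards):
        link = f"block-{i}"  # stand-in for a real CID
        store[link] = shard
        links.append(link)
    return links

class ShardedMapView(Mapping):
    """Presents the shard blocks as one ordinary Map, loading blocks via links."""

    def __init__(self, root_links: List[str], store: Dict[str, Dict[str, Any]]) -> None:
        self._links = root_links
        self._store = store

    def _shard_for(self, key: str) -> Dict[str, Any]:
        return self._store[self._links[_shard_index(key, len(self._links))]]

    def __getitem__(self, key: str) -> Any:
        return self._shard_for(key)[key]  # raises KeyError if absent

    def __iter__(self) -> Iterator[str]:
        for link in self._links:
            yield from self._store[link]

    def __len__(self) -> int:
        return sum(len(self._store[link]) for link in self._links)
```

Code that receives a `ShardedMapView` can use `len`, iteration, lookups and `in` as with any Map, without knowing that the entries are spread over several blocks.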

representation advanced is currently only available for Map, List and Bytes kinds. Additional use cases (such as the hypothetical String kind above) may be considered in the future.

See Advanced Layouts for IPLD Schemas for more details regarding Advanced Data Layouts.

# Schemas in Markdown

IPLD Schemas are intended to serve a documentation role as well as a programmatic declarative role. In this documentation role, inline comments (#) can be helpful to expand on declarations with explanations, and the role can be extended further by embedding IPLD Schemas in Markdown. When a Schema is embedded in Markdown code blocks with the right language marker, IPLD Schema tooling can accept the Markdown files, extract only the IPLD Schema portions it finds, and use them in place of a stand-alone Schema file.

When embedding IPLD Schema declarations in Markdown, use code blocks with the language marker ipldsch, for example:


```ipldsch
type Foo struct {
  a   Int
  b   Int
  msg Message
}

type Message string
```

Any such block found in a Markdown document will be extracted and stitched together to form a single Schema document.

Additionally, this process can be performed across multiple Markdown documents for sufficiently complex Schema declarations. When the IPLD Schema tooling is given a list of Markdown files, it will extract the ipldsch blocks from all of them, stitch them together, and treat the result as a single stand-alone Schema document.
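The extraction step can be approximated with a short Python sketch. The function `extract_schema` is hypothetical, and it assumes a simple rule: a block opens with a line beginning with three backticks followed by `ipldsch` and closes at the next line that begins with three backticks. Real tooling may handle edge cases (indented fences, longer fences) differently.

```python
import re
from typing import Iterable

FENCE = "`" * 3  # built this way so the pattern is easy to read in Markdown

# Captures the body of each ipldsch block; MULTILINE lets ^ and $ match per line,
# DOTALL lets the lazy body span multiple lines.
_IPLDSCH_BLOCK = re.compile(
    "^" + FENCE + r"ipldsch[ \t]*\n(.*?)^" + FENCE + r"[ \t]*$",
    re.MULTILINE | re.DOTALL,
)

def extract_schema(markdown_texts: Iterable[str]) -> str:
    """Collect every ipldsch block, in document order, into one Schema text."""
    blocks = []
    for text in markdown_texts:
        for match in _IPLDSCH_BLOCK.finditer(text):
            blocks.append(match.group(1).strip("\n"))
    # Separate stitched blocks with a blank line, as is typical between records.
    return ("\n\n".join(blocks) + "\n") if blocks else ""
```

Blocks with any other language marker are ignored, so prose and unrelated code samples in the same files never reach the Schema parser.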