Codecs
Encode and decode anything
Introduction
Kit includes a powerful serialisation system called Codecs. Whether you're working with account data, instruction arguments, or custom binary layouts, Codecs give you the tools to transform structured data into bytes — and back again.
Codecs are composable, type-safe, and environment-agnostic. They are designed to provide a flexible and consistent foundation for handling binary data across the Solana stack.
Installation
Codecs are included within the @solana/kit library, but you may also install them using their standalone package.
Note that the @solana/codecs package itself is composed of several smaller packages, each providing a different set of codec helpers. Here's the list of all packages containing codecs, should you need to install them individually:
Package | Description |
---|---|
@solana/kit | Includes @solana/codecs. |
@solana/codecs | Includes all codecs packages below. |
@solana/codecs-core | Core types and utilities for building codecs. |
@solana/codecs-numbers | Codecs for numbers of various sizes and characteristics. |
@solana/codecs-strings | Codecs for strings of various encodings and size strategies. |
@solana/codecs-data-structures | Codecs for a variety of data structures such as objects, enums, arrays, maps, etc. |
@solana/options | Codecs for Rust-like Options in JavaScript. |
What is a Codec?
A Codec is an object that knows how to encode a value of a given type into a Uint8Array and how to decode a Uint8Array back into that value.
No matter which serialization strategy we use, Codecs abstract away its implementation and offer a simple encode and decode interface. They are also highly composable, allowing us to build complex data structures from simple building blocks.
Here's a quick example that encodes and decodes a simple Person
type.
Composing codecs
The easiest way to create your own codecs is to compose the various codecs at your disposal.
For instance, consider the following codecs available:
- getStructCodec: Creates a codec for objects with named fields.
- getU32Codec: Creates a codec for unsigned 32-bit integers.
- getUtf8Codec: Creates a codec for UTF-8 strings.
- addCodecSizePrefix: Creates a codec that prefixes the encoded data with its length.
- getBooleanCodec: Creates a codec for booleans using a single byte.
By combining them together we can create a custom codec for the following Person type.
This function returns a Codec object which contains both an encode and decode function that can be used to convert a Person type to and from a Uint8Array.
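To make the idea of composition concrete without relying on any package, here is a minimal, dependency-free sketch. The MiniCodec type and the u32Mini, boolMini, sizePrefixedUtf8 and personCodec names are hypothetical stand-ins invented for this illustration, and the Person shape (name, age, verified) is an assumption as well. It only mimics the idea and is not Kit's implementation or API.

```typescript
// Hypothetical stand-in for a codec; not Kit's actual Codec type.
type MiniCodec<T> = {
    encode: (value: T) => Uint8Array;
    // Returns the decoded value and the offset of the next unread byte.
    read: (bytes: Uint8Array, offset: number) => [T, number];
};

// Concatenates several byte arrays into one.
function concatBytes(chunks: Uint8Array[]): Uint8Array {
    const out = new Uint8Array(chunks.reduce((total, chunk) => total + chunk.length, 0));
    let cursor = 0;
    for (const chunk of chunks) {
        out.set(chunk, cursor);
        cursor += chunk.length;
    }
    return out;
}

// Little-endian unsigned 32-bit integer.
const u32Mini: MiniCodec<number> = {
    encode: value => {
        const bytes = new Uint8Array(4);
        new DataView(bytes.buffer).setUint32(0, value, true);
        return bytes;
    },
    read: (bytes, offset) => [
        new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength).getUint32(offset, true),
        offset + 4,
    ],
};

// Boolean stored as a single byte (0 or 1).
const boolMini: MiniCodec<boolean> = {
    encode: value => Uint8Array.from([value ? 1 : 0]),
    read: (bytes, offset) => [bytes[offset] === 1, offset + 1],
};

// UTF-8 string preceded by its byte length, stored as a u32.
const sizePrefixedUtf8: MiniCodec<string> = {
    encode: value => {
        const text = new TextEncoder().encode(value);
        return concatBytes([u32Mini.encode(text.length), text]);
    },
    read: (bytes, offset) => {
        const [size, start] = u32Mini.read(bytes, offset);
        return [new TextDecoder().decode(bytes.slice(start, start + size)), start + size];
    },
};

// Assumed shape, for illustration only.
type Person = { name: string; age: number; verified: boolean };

// The Person codec is built purely from the smaller codecs above.
const personCodec: MiniCodec<Person> = {
    encode: person =>
        concatBytes([
            sizePrefixedUtf8.encode(person.name),
            u32Mini.encode(person.age),
            boolMini.encode(person.verified),
        ]),
    read: (bytes, offset) => {
        const [name, afterName] = sizePrefixedUtf8.read(bytes, offset);
        const [age, afterAge] = u32Mini.read(bytes, afterName);
        const [verified, afterVerified] = boolMini.read(bytes, afterAge);
        return [{ name, age, verified }, afterVerified];
    },
};

const personBytes = personCodec.encode({ name: 'John', age: 42, verified: true });
const [decodedPerson, personEnd] = personCodec.read(personBytes, 0);
```

Each field codec only needs to know its own layout. The composed codec threads the offset from one field to the next, and Kit's composable helpers let us express the same thing declaratively.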
There is a significant library of composable codecs at your disposal, enabling you to compose complex types. Check out the available codecs section for more information. If you need a custom codec that cannot be composed from existing ones, you can always create your own as we will see in the "Creating custom codecs" section below.
Separate encoders and decoders
Whilst Codecs can both encode and decode, it is possible to only focus on encoding or decoding data, enabling the unused logic to be tree-shaken. For instance, here's our previous example using Encoders only to encode a Person
type.
The same can be done for decoding the Person
type by using Decoders like so.
Combining encoders and decoders
Separating Codecs into Encoders and Decoders is particularly good practice for library maintainers as it allows their users to tree-shake any of the encoders and/or decoders they don't need. However, we may still want to offer a codec helper for users who need both for convenience.
That's why this library offers a combineCodec
helper that creates a Codec
instance from a matching Encoder
and Decoder
.
This means library maintainers can offer Encoders, Decoders and Codecs for all their types whilst staying efficient and tree-shakeable. In summary, we recommend the following pattern when creating codecs for library types.
Different From and To types
When creating codecs, the encoded type is allowed to be looser than the decoded type. A good example of that is the u64 number codec:
As you can see, the first type parameter is looser since it accepts numbers or big integers, whereas the second type parameter only accepts big integers. That's because when encoding a u64 number, you may provide either a bigint
or a number
for convenience. However, when you decode a u64 number, you will always get a bigint
because not all u64 values can fit in a JavaScript number
type.
This relationship between the type we encode “From” and decode “To” can be generalized in TypeScript as To extends From
.
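The To extends From relationship can be sketched with a small, dependency-free type. LooseCodec and u64Sketch below are hypothetical names used only for this illustration; Kit's own Codec type has more members than shown here.

```typescript
// Hypothetical shape: the decoded type must be assignable to the encoded type.
type LooseCodec<TFrom, TTo extends TFrom = TFrom> = {
    encode: (value: TFrom) => Uint8Array;
    decode: (bytes: Uint8Array) => TTo;
};

// Accepts number or bigint when encoding, always returns a bigint when decoding.
const u64Sketch: LooseCodec<number | bigint, bigint> = {
    encode: value => {
        const bytes = new Uint8Array(8);
        // BigInt(...) normalises both accepted input types; `true` means little-endian.
        new DataView(bytes.buffer).setBigUint64(0, BigInt(value), true);
        return bytes;
    },
    decode: bytes =>
        new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength).getBigUint64(0, true),
};

const decodedFromNumber = u64Sketch.decode(u64Sketch.encode(42));
const decodedFromBigInt = u64Sketch.decode(u64Sketch.encode(BigInt(42)));
```

Both calls yield the same bigint, which is why the decoded type can be narrower than the type accepted for encoding.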
Here's another example using an object with default values. You can read more about the transformCodec helper below.
Fixed-size and variable-size codecs
It is also worth noting that Codecs can either be of fixed size or variable size.
FixedSizeCodecs
have a fixedSize
number attribute that tells us exactly how big their encoded data is in bytes.
On the other hand, VariableSizeCodecs do not know the size of their encoded data in advance. Instead, they obtain that information either from the provided encoded data or from the value to encode. For the former, we can simply access the length of the Uint8Array. For the latter, the codec provides a getSizeFromValue function that tells us the encoded byte size of the provided value.
Also note that, if the VariableSizeCodec is bounded by a maximum size, it can be provided as a maxSize number attribute.
The following type guards are available to identify and/or assert the size of codecs: isFixedSize, isVariableSize, assertIsFixedSize and assertIsVariableSize.
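As a rough illustration of the distinction, the sketch below models the two kinds of size information with simplified, hypothetical types (FixedSizeLike, VariableSizeLike) and a guard named isFixedSizeSketch. It is not Kit's implementation of isFixedSize, only one plausible way such a guard could work.

```typescript
// Hypothetical, simplified shapes; Kit's real types carry more members.
type FixedSizeLike = { fixedSize: number };
type VariableSizeLike<T> = { getSizeFromValue: (value: T) => number; maxSize?: number };

// Treats a codec as fixed-size when it exposes a numeric fixedSize attribute.
function isFixedSizeSketch<T>(codec: FixedSizeLike | VariableSizeLike<T>): codec is FixedSizeLike {
    return 'fixedSize' in codec && typeof codec.fixedSize === 'number';
}

// Works for both kinds: read fixedSize, or ask the codec about the value.
function encodedSize<T>(codec: FixedSizeLike | VariableSizeLike<T>, value: T): number {
    return isFixedSizeSketch(codec) ? codec.fixedSize : codec.getSizeFromValue(value);
}

const u32Like: FixedSizeLike = { fixedSize: 4 };
const asciiLike: VariableSizeLike<string> = { getSizeFromValue: value => value.length };

const fixedByteCount = encodedSize(u32Like, 123);
const variableByteCount = encodedSize(asciiLike, 'hello');
```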
Finally, note that the same is true for Encoders and Decoders.
- A FixedSizeEncoder has a fixedSize number attribute.
- A VariableSizeEncoder has a getSizeFromValue function and an optional maxSize number attribute.
- A FixedSizeDecoder has a fixedSize number attribute.
- A VariableSizeDecoder has an optional maxSize number attribute.
Creating custom codecs
If composing codecs isn't enough for you, you may implement your own codec logic by using the createCodec
function. This function requires an object with a read
and a write
function telling us how to read from and write to an existing byte array.
The read function accepts the bytes to decode from and the offset at which we should start reading. It returns an array with two items:
- The first item should be the decoded value.
- The second item should be the next offset to read from.
Reciprocally, the write
function accepts the value
to encode, the array of bytes
to write the encoded value to and the offset
at which it should be written. It should encode the given value, insert it in the byte array, and provide the next offset to write to as the return value.
Additionally, we must specify the size of the codec. If we are defining a FixedSizeCodec
, we must simply provide the fixedSize
number attribute. For VariableSizeCodecs
, we must provide the getSizeFromValue
function as described in the previous section.
Here's a concrete example of a custom codec that encodes any unsigned integer in a single byte. Since a single byte can only store integers from 0 to 255, any other integer provided is reduced modulo 256 so that it fits in a single byte. Because it always requires a single byte, that codec is a FixedSizeCodec of size 1.
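The write and read pair for such a codec can be sketched as two plain functions. This is a standalone illustration with no Kit imports. The names moduloFixedSize, writeModuloU8 and readModuloU8 are hypothetical, and in Kit an object of this shape would be handed to createCodec as described above.

```typescript
// The codec always uses exactly one byte.
const moduloFixedSize = 1;

// Writes `value % 256` at the given offset and returns the next offset to write to.
function writeModuloU8(value: number, bytes: Uint8Array, offset: number): number {
    bytes.set([value % 256], offset);
    return offset + moduloFixedSize;
}

// Reads one byte at the given offset and returns [decoded value, next offset].
function readModuloU8(bytes: Uint8Array, offset: number): [number, number] {
    return [bytes[offset] as number, offset + moduloFixedSize];
}

const moduloBuffer = new Uint8Array(moduloFixedSize);
const nextWriteOffset = writeModuloU8(300, moduloBuffer, 0); // 300 % 256 = 44
const [moduloValue, nextReadOffset] = readModuloU8(moduloBuffer, 0);
```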
Note that it is also possible to create custom encoders and decoders separately by using the createEncoder and createDecoder functions respectively, and then to use the combineCodec function on them just as we did with composed codecs.
This approach is recommended to library maintainers as it allows their users to tree-shake any of the encoders and/or decoders they don't need.
Here's our previous modulo u8 example but split into separate Encoder
, Decoder
and Codec
instances.
Here's another example returning a VariableSizeCodec
. This one transforms a simple string composed of characters from a
to z
to a buffer of numbers from 1
to 26
where 0
bytes are spaces.
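The three pieces such a variable-size codec needs (a size function, a write function and a read function) can be sketched as follows. This is a dependency-free illustration. The names cipherAlphabet, getCipherSize, writeCipher and readCipher are hypothetical, and the read function assumes the string occupies the rest of the buffer.

```typescript
// Index 0 is the space; indices 1 to 26 are the letters a to z.
const cipherAlphabet = ' abcdefghijklmnopqrstuvwxyz';

// One byte per character, so the encoded size depends on the value.
const getCipherSize = (value: string): number => value.length;

// Writes one byte per character and returns the next offset to write to.
function writeCipher(value: string, bytes: Uint8Array, offset: number): number {
    const codes = Array.from(value).map(char => {
        const index = cipherAlphabet.indexOf(char);
        if (index < 0) throw new Error(`Unsupported character: ${char}`);
        return index;
    });
    bytes.set(codes, offset);
    return offset + codes.length;
}

// Reads every remaining byte and returns [decoded string, next offset].
function readCipher(bytes: Uint8Array, offset: number): [string, number] {
    const text = Array.from(bytes.slice(offset))
        .map(byte => cipherAlphabet.charAt(byte))
        .join('');
    return [text, bytes.length];
}

const cipherInput = 'hello world';
const cipherBytes = new Uint8Array(getCipherSize(cipherInput));
writeCipher(cipherInput, cipherBytes, 0);
const [cipherText] = readCipher(cipherBytes, 0);
```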
Available codecs
Core utilities
addCodecSentinel
addCodecSizePrefix
containsBytes
fixBytes
fixCodecSize
mergeBytes
offsetCodec
padBytes
padLeftCodec
padRightCodec
resizeCodec
reverseCodec
transformCodec
Numbers
getI8Codec
getI16Codec
getI32Codec
getI64Codec
getI128Codec
getF32Codec
getF64Codec
getShortU16Codec
getU8Codec
getU16Codec
getU32Codec
getU64Codec
getU128Codec
Strings
getBase10Codec
getBase16Codec
getBase58Codec
getBase64Codec
getBaseXCodec
getBaseXResliceCodec
getUtf8Codec
Data structures
getArrayCodec
getBitArrayCodec
getBooleanCodec
getBytesCodec
getConstantCodec
getDiscriminatedUnionCodec
getEnumCodec
getHiddenPrefixCodec
getHiddenSuffixCodec
getLiteralUnionCodec
getMapCodec
getNullableCodec
getOptionCodec
getSetCodec
getStructCodec
getTupleCodec
getUnionCodec
getUnitCodec
Core utilities listing
addCodecSentinel
One way of delimiting the size of a codec is to use sentinels. The addCodecSentinel
function allows us to add a sentinel to the end of the encoded data and to read until that sentinel is found when decoding. It accepts any codec and a Uint8Array
sentinel responsible for delimiting the encoded data.
Note that, for this to work, the sentinel must not appear in the bytes produced by the wrapped codec when encoding, and it must be present in the bytes being decoded. If either condition is not met, a dedicated error will be thrown.
Separate addEncoderSentinel
and addDecoderSentinel
functions are also available.
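The sentinel strategy itself can be sketched independently of Kit. The functions below (indexOfBytes, encodeWithSentinel, decodeWithSentinel) are hypothetical helpers operating on raw bytes; they illustrate the two rules about where the sentinel may and must appear, not the library's actual code or error types.

```typescript
// Returns the index of the first occurrence of needle in haystack, or -1.
function indexOfBytes(haystack: Uint8Array, needle: Uint8Array): number {
    for (let i = 0; i + needle.length <= haystack.length; i++) {
        if (needle.every((byte, j) => haystack[i + j] === byte)) return i;
    }
    return -1;
}

// Appends the sentinel; the payload itself must not contain it.
function encodeWithSentinel(payload: Uint8Array, sentinel: Uint8Array): Uint8Array {
    if (indexOfBytes(payload, sentinel) !== -1) throw new Error('Payload contains the sentinel');
    const out = new Uint8Array(payload.length + sentinel.length);
    out.set(payload, 0);
    out.set(sentinel, payload.length);
    return out;
}

// Reads up to the first sentinel; returns the payload and the offset just past the sentinel.
function decodeWithSentinel(bytes: Uint8Array, sentinel: Uint8Array): [Uint8Array, number] {
    const end = indexOfBytes(bytes, sentinel);
    if (end === -1) throw new Error('Sentinel not found');
    return [bytes.slice(0, end), end + sentinel.length];
}

const sentinelBytes = Uint8Array.from([255, 255]);
const withSentinel = encodeWithSentinel(Uint8Array.from([1, 2, 3]), sentinelBytes);
// Trailing data after the sentinel is left for whatever is decoded next.
const [sentinelPayload, afterSentinel] = decodeWithSentinel(
    Uint8Array.from([1, 2, 3, 255, 255, 9]),
    sentinelBytes,
);
```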
addCodecSizePrefix
The addCodecSizePrefix
function allows us to store the byte size of any codec as a number prefix, enabling us to contain variable-size codecs to their actual size.
When encoding, the size of the encoded data is stored before the encoded data itself. When decoding, the size is read first to know how many bytes to read next.
For example, say we want to represent a variable-size base-58 string using a u32
size prefix. Here's how we can use the addCodecSizePrefix
function to achieve that.
You may also use the addEncoderSizePrefix
and addDecoderSizePrefix
functions to separate your codec logic like so:
containsBytes
Checks if a Uint8Array
contains another Uint8Array
at a given offset.
fixBytes
Pads or truncates a Uint8Array
so it has the specified length.
fixCodecSize
The fixCodecSize
function allows us to bind the size of a given codec to the given fixed size.
For instance, say we wanted to represent a base-58 string that uses exactly 32 bytes when decoded. Here's how we can use the fixCodecSize
helper to achieve that.
You may also use the fixEncoderSize
and fixDecoderSize
functions to separate your codec logic like so:
mergeBytes
Concatenates an array of Uint8Arrays
into a single Uint8Array
.
offsetCodec
The offsetCodec
function is a powerful codec primitive that allows us to move the offset of a given codec forward or backwards. It accepts one or two functions that take the current offset and return a new offset.
To understand how this works, let's take the following biggerU32Codec
example which encodes a u32
number inside an 8-byte buffer by using the resizeCodec helper.
Now, let's say we want to move the offset of that codec 2 bytes forward so that the encoded number sits in the middle of the buffer. To achieve this, we can use the offsetCodec helper and provide a preOffset function that moves the "pre-offset" of the codec 2 bytes forward.
We refer to this offset as the "pre-offset" because, once the inner codec is encoded or decoded, an additional offset will be returned which we refer to as the "post-offset". That "post-offset" is important as, unless we are reaching the end of our codec, it will be used by any further codecs to continue encoding or decoding data.
By default, that "post-offset" is simply the addition of the "pre-offset" and the size of the encoded or decoded inner data.
However, you may also provide a postOffset
function to adjust the "post-offset". For instance, let's push the "post-offset" 2 bytes forward as well such that any further codecs will start doing their job at the end of our 8-byte u32
number.
Both the preOffset and postOffset functions offer the following attributes:
- bytes: The entire byte array being encoded or decoded.
- preOffset: The original and unaltered pre-offset.
- wrapBytes: A helper function that wraps the given offset around the byte array length. E.g. wrapBytes(-1) will refer to the last byte of the byte array.
Additionally, the post-offset function also provides the following attributes:
- newPreOffset: The new pre-offset after the pre-offset function has been applied.
- postOffset: The original and unaltered post-offset.
Note that you may also decide to ignore these attributes to achieve absolute offsets. However, relative offsets are usually recommended as they won't break your codecs when composed with other codecs.
Also note that any negative offset or offset that exceeds the size of the byte array will throw a SolanaError
of code SOLANA_ERROR__CODECS__OFFSET_OUT_OF_RANGE
.
To avoid this, you may use the wrapBytes
function to wrap the offset around the byte array length. For instance, here's how we can use the wrapBytes
function to move the pre-offset 4 bytes from the end of the byte array.
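The wrapping behaviour can be pictured as modular arithmetic over the byte array length. The wrapOffset function below is a hypothetical, standalone sketch of that idea. It is consistent with wrapBytes(-1) referring to the last byte, but how Kit treats other inputs (for example, large positive offsets) is an assumption here.

```typescript
// Wraps any integer offset into the range [0, length).
function wrapOffset(offset: number, length: number): number {
    if (length === 0) return 0;
    // The double modulo keeps the result non-negative for negative offsets.
    return ((offset % length) + length) % length;
}

const wrappedLength = 8;
const lastByteIndex = wrapOffset(-1, wrappedLength); // last byte of an 8-byte array
const fourFromEnd = wrapOffset(-4, wrappedLength); // 4 bytes from the end
const pastTheEnd = wrapOffset(10, wrappedLength); // wraps back to the start
```

With an 8-byte array, moving the pre-offset 4 bytes from the end therefore lands on index 4, which stays within range.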
As you can see, the offsetCodec
helper allows you to jump all over the place with your codecs. This non-linear approach to encoding and decoding data allows you to achieve complex serialization strategies that would otherwise be impossible.
The offsetEncoder
and offsetDecoder
functions can also be used to split your codec logic into tree-shakeable functions.
padBytes
Pads a Uint8Array
with zeroes (to the right) to the specified length.
padLeftCodec
The padLeftCodec
helper can be used to add padding to the left of a given codec. It accepts an offset
number that tells us how big the padding should be.
Note that the padLeftCodec
function is a simple wrapper around the offsetCodec
and resizeCodec
functions. For more complex padding strategies, you may want to use the offsetCodec and resizeCodec functions directly instead.
Encoder-only and decoder-only helpers are available for these padding functions.
padRightCodec
The padRightCodec
helper can be used to add padding to the right of a given codec. It accepts an offset
number that tells us how big the padding should be.
Note that the padRightCodec
function is a simple wrapper around the offsetCodec
and resizeCodec
functions. For more complex padding strategies, you may want to use the offsetCodec and resizeCodec functions directly instead.
Encoder-only and decoder-only helpers are available for these padding functions.
resizeCodec
The resizeCodec
helper re-defines the size of a given codec by accepting a function that takes the current size of the codec and returns a new size. This works for both fixed-size and variable-size codecs.
Note that the resizeCodec
function doesn't change any encoded or decoded bytes, it merely tells the encode
and decode
functions how big the Uint8Array
should be before delegating to their respective write
and read
functions. In fact, this is completely bypassed when using the write
and read
functions directly. For instance:
So when would it make sense to use the resizeCodec
function? This function is particularly useful when combined with the offsetCodec function. Whilst offsetCodec
may help us push the offset forward — e.g. to skip some padding — it won't change the size of the encoded data which means the last bytes will be truncated by how much we pushed the offset forward. The resizeCodec
function can be used to fix that. For instance, here's how we can use the resizeCodec
and the offsetCodec
functions together to create a struct codec that includes some padding.
Note that this can be achieved using the padLeftCodec helper which is implemented that way.
The resizeEncoder
and resizeDecoder
functions can also be used to split your codec logic into tree-shakeable functions.
reverseCodec
The reverseCodec
helper reverses the bytes of the provided FixedSizeCodec
.
Note that number codecs can already do that for you via their endian
option.
The reverseEncoder
and reverseDecoder
functions can also be used to achieve that.
transformCodec
It is possible to transform a Codec<T>
to a Codec<U>
by providing two mapping functions: one that goes from T
to U
and one that does the opposite.
For instance, here's how you would map a u32
integer into a string
representation of that number.
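The mechanics of such a transformation can be sketched without any dependency. PairCodec, transformSketch, u32Sketch and stringU32 are hypothetical names for this illustration; Kit's transformCodec has its own signature and this is not its implementation.

```typescript
// Hypothetical minimal codec shape with a single type for both directions.
type PairCodec<T> = {
    encode: (value: T) => Uint8Array;
    decode: (bytes: Uint8Array) => T;
};

// Builds a codec for U from a codec for T and two mapping functions.
function transformSketch<T, U>(
    codec: PairCodec<T>,
    toInner: (value: U) => T, // applied before encoding
    fromInner: (value: T) => U, // applied after decoding
): PairCodec<U> {
    return {
        encode: value => codec.encode(toInner(value)),
        decode: bytes => fromInner(codec.decode(bytes)),
    };
}

// Little-endian unsigned 32-bit integer.
const u32Sketch: PairCodec<number> = {
    encode: value => {
        const bytes = new Uint8Array(4);
        new DataView(bytes.buffer).setUint32(0, value, true);
        return bytes;
    },
    decode: bytes =>
        new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength).getUint32(0, true),
};

// A codec that accepts and returns the number as a string.
const stringU32 = transformSketch(
    u32Sketch,
    (text: string) => Number.parseInt(text, 10),
    (integer: number) => integer.toString(),
);

const stringU32Bytes = stringU32.encode('42');
const stringU32Back = stringU32.decode(stringU32Bytes);
```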
If a Codec
has different From and To types, say Codec<OldFrom, OldTo>
, and we want to map it to Codec<NewFrom, NewTo>
, we must provide functions that map from NewFrom
to OldFrom
and from OldTo
to NewTo
.
To illustrate that, let's take our previous getStringU32Codec
example but make it use a getU64Codec
codec instead as it returns a Codec<number | bigint, bigint>
. Additionally, let's make it so our getStringU64Codec
function returns a Codec<number | string, string>
so that it also accepts numbers when encoding values. Here's what our mapping functions look like:
Note that the second function that maps the decoded type is optional. That means, you can omit it to simply update or loosen the type to encode whilst keeping the decoded type the same.
This is particularly useful to provide default values to object structures. For instance, here's how we can map a Person
codec to give a default value to its age
attribute.
Similar helpers exist to map Encoder
and Decoder
instances allowing you to separate your codec logic into tree-shakeable functions. Here's our getStringU32Codec
written that way.
Numbers listing
getI8Codec
Encodes and decodes signed 8-bit integers. It supports values from -128 (-2^7) to 127 (2^7 - 1).
Values can be provided as either number
or bigint
, but the decoded value is always a number
.
getI8Encoder
and getI8Decoder
functions are also available.
getI16Codec
Encodes and decodes signed 16-bit integers. It supports values from -32,768 (-2^15
) to 32,767 (2^15 - 1
).
Values can be provided as either number
or bigint
, but the decoded value is always a number
. Endianness can be specified using the endian
option. The default is Endian.Little
.
getI16Encoder
and getI16Decoder
functions are also available.
getI32Codec
Encodes and decodes signed 32-bit integers. It supports values from -2,147,483,648 (-2^31
) to 2,147,483,647 (2^31 - 1
).
Values can be provided as either number
or bigint
, but the decoded value is always a number
. Endianness can be specified using the endian
option. The default is Endian.Little
.
getI32Encoder
and getI32Decoder
functions are also available.
getI64Codec
Encodes and decodes signed 64-bit integers. It supports values from -2^63
to 2^63 - 1
.
Values can be provided as either number
or bigint
, but the decoded value is always a bigint
. Endianness can be specified using the endian
option. The default is Endian.Little
.
getI64Encoder
and getI64Decoder
functions are also available.
getI128Codec
Encodes and decodes signed 128-bit integers. It supports values from -2^127
to 2^127 - 1
.
Values can be provided as either number
or bigint
, but the decoded value is always a bigint
. Endianness can be specified using the endian
option. The default is Endian.Little
.
getI128Encoder
and getI128Decoder
functions are also available.
getF32Codec
Encodes and decodes 32-bit floating-point numbers. Due to the IEEE 754 floating-point representation, some precision loss may occur.
Values can be provided as either number
or bigint
, but the decoded value is always a number
. Endianness can be specified using the endian
option. The default is Endian.Little
.
getF32Encoder
and getF32Decoder
functions are also available.
getF64Codec
Encodes and decodes 64-bit floating-point numbers. Due to the IEEE 754 floating-point representation, some precision loss may occur.
Values can be provided as either number
or bigint
, but the decoded value is always a number
. Endianness can be specified using the endian
option. The default is Endian.Little
.
getF64Encoder
and getF64Decoder
functions are also available.
getShortU16Codec
Encodes and decodes an unsigned integer using 1 to 3 bytes based on the encoded value. It supports values from 0 to 4,194,303 (2^22 - 1).
The larger the value, the more bytes it uses.
- If the value is <= 0x7f (127), it is stored in a single byte and the first bit is set to 0 to indicate the end of the value.
- Otherwise, the first bit is set to 1 to indicate that the value continues in the next byte, which follows the same pattern.
- This process repeats until the value is fully encoded in up to 3 bytes. The third and last byte, if needed, uses all 8 bits to store the remaining value.
In other words, the encoding scheme follows this structure:
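One way to picture that structure is the plain TypeScript sketch below. It follows the byte layout as this section describes it (7 value bits, then 7 value bits, then 8 value bits) and is not Kit's implementation; encodeShortU16 and decodeShortU16 are hypothetical names, and the accepted range is taken from the text above.

```typescript
// Encodes an integer using 1 to 3 bytes, 7 bits at a time for the first two bytes.
function encodeShortU16(value: number): Uint8Array {
    if (!Number.isInteger(value) || value < 0 || value > 0x3fffff) {
        throw new RangeError('Value out of range for this sketch');
    }
    const out: number[] = [];
    let rest = value;
    for (let i = 0; i < 3; i++) {
        if (i === 2) {
            out.push(rest & 0xff); // the third byte uses all 8 bits
            break;
        }
        const lowSevenBits = rest & 0x7f;
        rest >>>= 7;
        if (rest === 0) {
            out.push(lowSevenBits); // first bit 0: the value ends here
            break;
        }
        out.push(lowSevenBits | 0x80); // first bit 1: the value continues
    }
    return Uint8Array.from(out);
}

// Decodes from the given offset and returns [value, next offset].
function decodeShortU16(bytes: Uint8Array, offset = 0): [number, number] {
    let value = 0;
    let cursor = offset;
    for (let i = 0; i < 3; i++) {
        const byte = bytes[cursor];
        if (byte === undefined) throw new RangeError('Unexpected end of bytes');
        cursor += 1;
        if (i === 2) {
            value |= byte << 14;
            break;
        }
        value |= (byte & 0x7f) << (7 * i);
        if ((byte & 0x80) === 0) break;
    }
    return [value, cursor];
}

const oneByte = encodeShortU16(127);
const twoBytes = encodeShortU16(128);
const threeBytes = encodeShortU16(16384);
```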
Values can be provided as either number
or bigint
, but the decoded value is always a number
.
getShortU16Encoder
and getShortU16Decoder
functions are also available.
getU8Codec
Encodes and decodes unsigned 8-bit integers. It supports values from 0 to 255 (2^8 - 1
).
Values can be provided as either number
or bigint
, but the decoded value is always a number
.
getU8Encoder
and getU8Decoder
functions are also available.
getU16Codec
Encodes and decodes unsigned 16-bit integers. It supports values from 0 to 65,535 (2^16 - 1
).
Values can be provided as either number
or bigint
, but the decoded value is always a number
. Endianness can be specified using the endian
option. The default is Endian.Little
.
getU16Encoder
and getU16Decoder
functions are also available.
getU32Codec
Encodes and decodes unsigned 32-bit integers. It supports values from 0 to 4,294,967,295 (2^32 - 1
).
Values can be provided as either number
or bigint
, but the decoded value is always a number
. Endianness can be specified using the endian
option. The default is Endian.Little
.
getU32Encoder
and getU32Decoder
functions are also available.
getU64Codec
Encodes and decodes unsigned 64-bit integers. It supports values from 0 to 2^64 - 1
.
Values can be provided as either number
or bigint
, but the decoded value is always a bigint
. Endianness can be specified using the endian
option. The default is Endian.Little
.
getU64Encoder
and getU64Decoder
functions are also available.
getU128Codec
Encodes and decodes unsigned 128-bit integers. It supports values from 0 to 2^128 - 1
.
Values can be provided as either number
or bigint
, but the decoded value is always a bigint
. Endianness can be specified using the endian
option. The default is Endian.Little
.
getU128Encoder
and getU128Decoder
functions are also available.
Strings listing
getBase10Codec
Encodes and decodes Base 10 strings.
This codec does not enforce a size boundary. It will encode and decode all bytes necessary to represent the string. To add size constraints to your codec, you may use utility functions such as fixCodecSize
, addCodecSizePrefix
or addCodecSentinel
.
getBase10Encoder
and getBase10Decoder
functions are also available.
getBase16Codec
Encodes and decodes Base 16 strings.
This codec does not enforce a size boundary. It will encode and decode all bytes necessary to represent the string. To add size constraints to your codec, you may use utility functions such as fixCodecSize
, addCodecSizePrefix
or addCodecSentinel
.
getBase16Encoder
and getBase16Decoder
functions are also available.
getBase58Codec
Encodes and decodes Base 58 strings.
This codec does not enforce a size boundary. It will encode and decode all bytes necessary to represent the string. To add size constraints to your codec, you may use utility functions such as fixCodecSize
, addCodecSizePrefix
or addCodecSentinel
.
getBase58Encoder
and getBase58Decoder
functions are also available.
getBase64Codec
Encodes and decodes Base 64 strings.
This codec does not enforce a size boundary. It will encode and decode all bytes necessary to represent the string. To add size constraints to your codec, you may use utility functions such as fixCodecSize
, addCodecSizePrefix
or addCodecSentinel
.
getBase64Encoder
and getBase64Decoder
functions are also available.
getBaseXCodec
The getBaseXCodec
accepts a custom alphabet
of X
characters and creates a base-X codec using that alphabet. It does so by iteratively dividing by X
and handling leading zeros.
This codec does not enforce a size boundary. It will encode and decode all bytes necessary to represent the string. To add size constraints to your codec, you may use utility functions such as fixCodecSize
, addCodecSizePrefix
or addCodecSentinel
.
getBaseXEncoder
and getBaseXDecoder
functions are also available.
getBaseXResliceCodec
The getBaseXResliceCodec
accepts a custom alphabet
of X
characters and creates a base-X codec using that alphabet.
It does so by re-slicing bytes into custom chunks of bits that are then mapped to the provided alphabet
. The number of bits per chunk is also provided as the second argument and should typically be set to log2(alphabet.length)
.
This is typically used to create codecs whose alphabet's length is a power of 2 such as base-16 or base-64.
This codec does not enforce a size boundary. It will encode and decode all bytes necessary to represent the string. To add size constraints to your codec, you may use utility functions such as fixCodecSize
, addCodecSizePrefix
or addCodecSentinel
.
getBaseXResliceEncoder
and getBaseXResliceDecoder
functions are also available.
getUtf8Codec
Encodes and decodes UTF-8 strings.
This codec does not enforce a size boundary. It will encode and decode all bytes necessary to represent the string. To add size constraints to your codec, you may use utility functions such as fixCodecSize
, addCodecSizePrefix
or addCodecSentinel
.
getUtf8Encoder
and getUtf8Decoder
functions are also available.
Data structures listing
getArrayCodec
The getArrayCodec
function accepts any codec of type T
and returns a codec of type Array<T>
.
By default, the size of the array is stored as a u32
prefix before encoding the items.
However, you may use the size
option to configure this behaviour. It can be one of the following three strategies:
- Codec<number>: When a number codec is provided, that codec will be used to encode and decode the size prefix.
- number: When a number is provided, the codec will expect a fixed number of items in the array. An error will be thrown when trying to encode an array of a different length.
- "remainder": When the string "remainder" is passed as a size, the codec will use the remainder of the bytes to encode/decode its items. This means the size is not stored or known in advance but simply inferred from the rest of the buffer. For instance, if we have an array of u16 numbers and 10 bytes remaining, we know there are 5 items in this array.
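The "remainder" inference from the last strategy above can be sketched with a small standalone function. readU16Remainder is a hypothetical helper, not part of Kit; it assumes little-endian u16 items and that the array runs to the end of the buffer.

```typescript
// Decodes little-endian u16 items until the buffer is exhausted.
function readU16Remainder(bytes: Uint8Array, offset: number): number[] {
    const itemSize = 2;
    const remaining = bytes.length - offset;
    if (remaining % itemSize !== 0) {
        throw new Error('Remaining bytes are not a multiple of the item size');
    }
    const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    const items: number[] = [];
    for (let cursor = offset; cursor < bytes.length; cursor += itemSize) {
        items.push(view.getUint16(cursor, true));
    }
    return items;
}

// 10 remaining bytes divided by 2 bytes per item gives 5 items.
const tenBytes = Uint8Array.from([1, 0, 2, 0, 3, 0, 4, 0, 5, 0]);
const u16Items = readU16Remainder(tenBytes, 0);
```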
getArrayEncoder
and getArrayDecoder
functions are also available.
getBitArrayCodec
The getBitArrayCodec function returns a codec that encodes and decodes an array of booleans such that each boolean is represented by a single bit. It requires the size of the codec in bytes and accepts an optional backward flag that can be used to reverse the order of the bits.
getBitArrayEncoder
and getBitArrayDecoder
functions are also available.
getBooleanCodec
The getBooleanCodec
function returns a Codec<boolean>
that stores the boolean as 0
or 1
using a u8
number by default.
You may configure that behaviour by providing an explicit number codec as the size
option of the getBooleanCodec
function. That number codec will then be used to encode and decode the values 0
and 1
accordingly.
getBooleanEncoder
and getBooleanDecoder
functions are also available.
getBytesCodec
The getBytesCodec
function returns a Codec<Uint8Array>
meaning it converts Uint8Arrays
to and from… Uint8Arrays
! Whilst this might seem a bit useless, it can be useful when composed into other codecs. For example, you could use it in a struct codec to say that a particular field should be left unserialised.
The getBytesCodec
function will encode and decode Uint8Arrays
using as many bytes as necessary. If you'd like to restrict the number of bytes used by this codec, you may combine it with utilities such as fixCodecSize
, addCodecSizePrefix
or addCodecSentinel
.
getBytesEncoder
and getBytesDecoder
functions are also available.
getConstantCodec
The getConstantCodec
function accepts any Uint8Array
and returns a Codec<void>
. When encoding, it will set the provided Uint8Array
as-is. When decoding, it will assert that the next bytes contain the provided Uint8Array
and move the offset forward.
getConstantEncoder
and getConstantDecoder
functions are also available.
getDiscriminatedUnionCodec
In Rust, enums are powerful data types whose variants can be one of the following:
- An empty variant, e.g. enum Message { Quit }.
- A tuple variant, e.g. enum Message { Write(String) }.
- A struct variant, e.g. enum Message { Move { x: i32, y: i32 } }.
Whilst we do not have such powerful enums in JavaScript, we can emulate them in TypeScript using a union of objects such that each object is differentiated by a specific field. We call this a discriminated union.
We use a special field named __kind
to distinguish between the different variants of a discriminated union. Additionally, since all variants are objects, we can use a fields
property to wrap the array of tuple variants. Here is an example.
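A discriminated union following these conventions can be written in plain TypeScript as shown below. The Message type mirrors the three Rust variants listed earlier; describeMessage is a hypothetical helper added only to show how TypeScript narrows each variant from its __kind field.

```typescript
type Message =
    | { __kind: 'Quit' } // Empty variant.
    | { __kind: 'Write'; fields: [string] } // Tuple variant wrapped in `fields`.
    | { __kind: 'Move'; x: number; y: number }; // Struct variant.

// TypeScript narrows the variant inside each case from its __kind value.
function describeMessage(message: Message): string {
    switch (message.__kind) {
        case 'Quit':
            return 'quit';
        case 'Write':
            return `write ${message.fields[0]}`;
        case 'Move':
            return `move to ${message.x},${message.y}`;
    }
}

const sampleMessages: Message[] = [
    { __kind: 'Quit' },
    { __kind: 'Write', fields: ['Hi'] },
    { __kind: 'Move', x: 5, y: 6 },
];
const messageDescriptions = sampleMessages.map(describeMessage);
```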
The getDiscriminatedUnionCodec
function helps us encode and decode these discriminated unions.
It requires the discriminator and codec of each variant as a first argument. Similarly to the getStructCodec, these are defined as an array of variant tuples where the first item is the discriminator of the variant and the second item is its codec. Since empty variants do not have data to encode, they simply use the getUnitCodec which does nothing.
Here is how we can create a discriminated union codec for our previous example.
And here's how we can use such a codec to encode discriminated unions. Notice that by default, they use a u8
number prefix to distinguish between the different types of variants.
However, you may provide a number codec as the size
option of the getDiscriminatedUnionCodec
function to customise that behaviour.
You may also customize the discriminator property — which defaults to __kind
— by providing the desired property name as the discriminator
option like so:
Note that the discriminator value of a variant may be any scalar value, such as a number, a bigint, a boolean or a JavaScript enum. For instance, the following is also valid:
getDiscriminatedUnionEncoder
and getDiscriminatedUnionDecoder
functions are also available.
getEnumCodec
The getEnumCodec
function accepts a JavaScript enum constructor and returns a codec for encoding and decoding values of that enum.
When encoding an enum, you may either provide the value of the enum variant — e.g. Direction.Left
— or its key — e.g. 'Left'
.
By default, a u8 number is used to store the enum value. However, a number codec may be passed as the size option to configure that behaviour.
This function also works with lexical enums — e.g. enum Direction { Left = '←' }
— explicit numerical enums — e.g. enum Speed { Left = 50 }
— and hybrid enums with a mix of both.
Notice how, by default, the index of the enum variant is used to encode the value of the enum. For instance, in the example above, Numbers.Five
is encoded as 0x01
even though its value is 5
. This is also true for lexical enums.
However, when dealing with numerical enums that have explicit values, you may use the useValuesAsDiscriminators
option to encode the value of the enum variant instead of its index.
Note that when using the useValuesAsDiscriminators
option on an enum that contains a lexical value, an error will be thrown.
getEnumEncoder
and getEnumDecoder
functions are also available.
getHiddenPrefixCodec
The getHiddenPrefixCodec function allows us to prepend a list of hidden Codec<void> codecs to a given codec.
When encoding, the hidden codecs will be encoded before the main codec and the offset will be moved accordingly. When decoding, the hidden codecs will be decoded but only the result of the main codec will be returned. This is particularly helpful when creating data structures that include constant values that should not be included in the final type.
getHiddenPrefixEncoder and getHiddenPrefixDecoder functions are also available.
getHiddenSuffixCodec
The getHiddenSuffixCodec function allows us to append a list of hidden Codec<void> codecs to a given codec.
When encoding, the hidden codecs will be encoded after the main codec and the offset will be moved accordingly. When decoding, the hidden codecs will be decoded but only the result of the main codec will be returned. This is particularly helpful when creating data structures that include constant values that should not be included in the final type.
getHiddenSuffixEncoder and getHiddenSuffixDecoder functions are also available.
getLiteralUnionCodec
The getLiteralUnionCodec function accepts an array of literal values (such as string, number, boolean, etc.) and returns a codec that encodes and decodes such values by using their index in the array. It uses TypeScript unions to represent all the possible values.
It uses a u8 number by default to store the index of the value. However, you may provide a number codec as the size option of the getLiteralUnionCodec function to customise that behaviour.
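The index-based encoding can be modelled in a few lines of plain TypeScript. This is only a sketch of the idea with the default u8 index, not the library code. The sizes array and the encodeSize and decodeSize helpers are hypothetical.

```typescript
// Hypothetical list of literals; the union type is derived from the array.
const sizes = ['small', 'medium', 'large'] as const;
type Size = (typeof sizes)[number]; // 'small' | 'medium' | 'large'

// Encode a literal as the u8 index it has in the array.
function encodeSize(value: Size): Uint8Array {
    return new Uint8Array([sizes.indexOf(value)]);
}

// Decode the u8 index back into the literal stored at that position.
function decodeSize(bytes: Uint8Array): Size {
    const index = bytes[0];
    const value: Size | undefined = sizes[index];
    if (value === undefined) throw new Error(`Index out of range: ${index}`);
    return value;
}

const mediumBytes = encodeSize('medium'); // one byte: 0x01
const largeValue = decodeSize(new Uint8Array([2])); // 'large'
```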
getLiteralUnionEncoder and getLiteralUnionDecoder functions are also available.
getMapCodec
The getMapCodec function accepts two codecs of type K and V and returns a codec of type Map<K, V>.
Each entry (key/value pair) is encoded one after the other with the key first and the value next. By default, the size of the map is stored as a u32 prefix before encoding the entries.
However, you may use the size option to configure this behaviour. It can be one of the following three strategies:
- Codec<number>: When a number codec is provided, that codec will be used to encode and decode the size prefix.
- number: When a number is provided, the codec will expect a fixed number of entries in the map. An error will be thrown when trying to encode a map of a different length.
- "remainder": When the string "remainder" is passed as a size, the codec will use the remainder of the bytes to encode/decode its entries. This means the size is not stored or known in advance but simply inferred from the rest of the buffer. For instance, if we have a map of u16 keys and u16 values and 12 bytes remaining, we know there are 3 entries in this map.
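The default layout (a u32 size prefix followed by each key and value in turn) can be sketched by hand as follows. This is an illustrative model rather than the library's implementation. It assumes little-endian u8 keys and u16 values, and encodeMap and decodeMap are hypothetical names.

```typescript
// Layout: u32 entry count (little-endian), then u8 key + u16 value per entry.
function encodeMap(map: Map<number, number>): Uint8Array {
    const bytes = new Uint8Array(4 + map.size * 3);
    const view = new DataView(bytes.buffer);
    view.setUint32(0, map.size, true);
    let offset = 4;
    for (const [key, value] of map) {
        view.setUint8(offset, key); // key first
        view.setUint16(offset + 1, value, true); // value next
        offset += 3;
    }
    return bytes;
}

// Read the count, then read that many key/value pairs.
function decodeMap(bytes: Uint8Array): Map<number, number> {
    const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    const count = view.getUint32(0, true);
    const map = new Map<number, number>();
    for (let i = 0; i < count; i++) {
        const offset = 4 + i * 3;
        map.set(view.getUint8(offset), view.getUint16(offset + 1, true));
    }
    return map;
}

const bytes = encodeMap(new Map([[1, 500], [2, 42]]));
// bytes: 02 00 00 00 | 01 f4 01 | 02 2a 00
const roundTrip = decodeMap(bytes);
```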
getMapEncoder and getMapDecoder functions are also available.
getNullableCodec
The getNullableCodec function accepts a codec of type T and returns a codec of type T | null. It stores whether or not the item exists as a boolean prefix using a u8 by default.
You may provide a number codec as the prefix option of the getNullableCodec function to configure how to store the boolean prefix.
Additionally, if the item is a FixedSizeCodec, you may set the noneValue option to "zeroes" to also make the returned nullable codec a FixedSizeCodec. To do so, it will pad null values with zeroes to match the length of existing values.
The noneValue option can also be set to an explicit byte array to use as the padding for null values. Note that, in this case, the returned codec will not be a FixedSizeCodec as the byte array representing null values may be of any length.
The prefix option of the getNullableCodec function can also be set to null, meaning no prefix will be used to determine whether the item exists. In this case, the codec will rely on the noneValue option to determine whether the item is null.
Note that if prefix is set to null and no noneValue is provided, the codec assumes that the item exists if and only if some remaining bytes are available to decode. This could be useful to describe data structures that may or may not have additional data at the end of the buffer.
To recap, here are all the possible configurations of the getNullableCodec function, using a u16 codec as an example.
encode(42) / encode(null) | No noneValue (default) | noneValue: "zeroes" | Custom noneValue (0xff) |
---|---|---|---|
u8 prefix (default) | 0x012a00 / 0x00 | 0x012a00 / 0x000000 | 0x012a00 / 0x00ff |
Custom prefix (u16) | 0x01002a00 / 0x0000 | 0x01002a00 / 0x00000000 | 0x01002a00 / 0x0000ff |
No prefix | 0x2a00 / 0x | 0x2a00 / 0x0000 | 0x2a00 / 0xff |
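The cells of the table above can be reproduced with a small hand-written encoder. The sketch below only models the encoding side for a little-endian u16 item and is not the library's implementation. The encodeNullableU16 function and its string-based prefix argument are hypothetical simplifications of the prefix and noneValue options.

```typescript
type Prefix = 'u8' | 'u16' | null;
type NoneValue = 'zeroes' | Uint8Array;

const hex = (bytes: Uint8Array): string =>
    '0x' + Array.from(bytes, b => b.toString(16).padStart(2, '0')).join('');

// Little-endian u16, the item codec used in the table above.
const u16 = (value: number): number[] => [value & 0xff, (value >> 8) & 0xff];

function encodeNullableU16(value: number | null, prefix: Prefix, noneValue?: NoneValue): string {
    // The boolean flag is 1 when the item exists and 0 otherwise.
    const flag = value === null ? 0 : 1;
    const prefixBytes = prefix === 'u8' ? [flag] : prefix === 'u16' ? u16(flag) : [];
    let body: number[];
    if (value !== null) body = u16(value);
    else if (noneValue === 'zeroes') body = [0, 0]; // pad to the item size
    else if (noneValue instanceof Uint8Array) body = Array.from(noneValue);
    else body = []; // nothing is written for null
    return hex(new Uint8Array([...prefixBytes, ...body]));
}

const withDefaultPrefix = encodeNullableU16(42, 'u8'); // '0x012a00'
const nullWithZeroes = encodeNullableU16(null, 'u8', 'zeroes'); // '0x000000'
const nullCustomNoPrefix = encodeNullableU16(null, null, new Uint8Array([0xff])); // '0xff'
```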
Note that you might be interested in the Rust-like alternative version of nullable codecs, available as the getOptionCodec function.
getNullableEncoder and getNullableDecoder functions are also available.
getOptionCodec
The getOptionCodec function accepts a codec of type T and returns a codec of type Option<T>, as defined in the @solana/options package. Note that, when encoding, T or null may also be provided directly as input and will be interpreted as Some(T) or None respectively. However, when decoding, the output will always be an Option<T> type.
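The asymmetry between the accepted input and the decoded output can be sketched without the library. In the model below, the Option shape (a tagged object with an __option field) and the some and none helpers are assumptions written locally for illustration, and encodeOptionU16 and decodeOptionU16 are hypothetical functions using a u8 prefix and a little-endian u16 item.

```typescript
// Assumed shape of a Rust-like Option, defined locally for this sketch.
type Option<T> = { __option: 'Some'; value: T } | { __option: 'None' };

function some<T>(value: T): Option<T> {
    return { __option: 'Some', value };
}
function none<T>(): Option<T> {
    return { __option: 'None' };
}

// Encoding accepts an Option<number>, a bare number, or null.
function encodeOptionU16(input: Option<number> | number | null): Uint8Array {
    const option: Option<number> =
        input === null ? none<number>() : typeof input === 'number' ? some(input) : input;
    if (option.__option === 'None') return new Uint8Array([0]);
    return new Uint8Array([1, option.value & 0xff, (option.value >> 8) & 0xff]);
}

// Decoding always yields an Option<number>, never a bare number or null.
function decodeOptionU16(bytes: Uint8Array): Option<number> {
    if (bytes[0] === 0) return none<number>();
    return some(bytes[1] | (bytes[2] << 8));
}

const fromBare = encodeOptionU16(42); // bytes: 01 2a 00
const fromSome = encodeOptionU16(some(42)); // bytes: 01 2a 00
const fromNull = encodeOptionU16(null); // bytes: 00
const decodedOption = decodeOptionU16(fromBare); // Some(42)
```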
It stores whether or not the item exists as a boolean prefix using a u8 by default.
You may provide a number codec as the prefix option of the getOptionCodec function to configure how to store the boolean prefix.
Additionally, if the item is a FixedSizeCodec, you may set the noneValue option to "zeroes" to also make the returned Option codec a FixedSizeCodec. To do so, it will pad None values with zeroes to match the length of existing values.
The noneValue option can also be set to an explicit byte array to use as the padding for None values. Note that, in this case, the returned codec will not be a FixedSizeCodec as the byte array representing None values may be of any length.
The prefix option of the getOptionCodec function can also be set to null, meaning no prefix will be used to determine whether the item exists. In this case, the codec will rely on the noneValue option to determine whether the item is None.
Note that if prefix is set to null and no noneValue is provided, the codec assumes that the item exists if and only if some remaining bytes are available to decode. This could be useful to describe data structures that may or may not have additional data at the end of the buffer.
To recap, here are all the possible configurations of the getOptionCodec function, using a u16 codec as an example.
encode(some(42)) / encode(none()) | No noneValue (default) | noneValue: "zeroes" | Custom noneValue (0xff) |
---|---|---|---|
u8 prefix (default) | 0x012a00 / 0x00 | 0x012a00 / 0x000000 | 0x012a00 / 0x00ff |
Custom prefix (u16) | 0x01002a00 / 0x0000 | 0x01002a00 / 0x00000000 | 0x01002a00 / 0x0000ff |
No prefix | 0x2a00 / 0x | 0x2a00 / 0x0000 | 0x2a00 / 0xff |
getOptionEncoder and getOptionDecoder functions are also available.
getSetCodec
The getSetCodec function accepts any codec of type T and returns a codec of type Set<T>.
By default, the size of the set is stored as a u32 prefix before encoding the items.
However, you may use the size option to configure this behaviour. It can be one of the following three strategies:
- Codec<number>: When a number codec is provided, that codec will be used to encode and decode the size prefix.
- number: When a number is provided, the codec will expect a fixed number of items in the set. An error will be thrown when trying to encode a set of a different length.
- "remainder": When the string "remainder" is passed as a size, the codec will use the remainder of the bytes to encode/decode its items. This means the size is not stored or known in advance but simply inferred from the rest of the buffer. For instance, if we have a set of u16 numbers and 10 bytes remaining, we know there are 5 items in this set.
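The "remainder" arithmetic from the last strategy can be checked with a short hand-written decoder. This is a sketch of the inference only, assuming little-endian u16 items, and not the library's code. decodeRemainderU16Set is a hypothetical name, and rejecting a leftover odd byte is an assumption of this model.

```typescript
// Decode a Set<number> of u16 values from everything left in the buffer.
function decodeRemainderU16Set(bytes: Uint8Array, offset: number): Set<number> {
    const remaining = bytes.length - offset;
    if (remaining % 2 !== 0) throw new Error('Remaining bytes are not a multiple of 2');
    const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    const items = new Set<number>();
    // The item count is inferred: remaining bytes divided by the item size.
    for (let i = 0; i < remaining / 2; i++) {
        items.add(view.getUint16(offset + i * 2, true));
    }
    return items;
}

// 10 bytes of little-endian u16 values: 1, 2, 3, 4, 5.
const buffer = new Uint8Array([1, 0, 2, 0, 3, 0, 4, 0, 5, 0]);
const items = decodeRemainderU16Set(buffer, 0); // a set of 5 items
```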
getSetEncoder and getSetDecoder functions are also available.
getStructCodec
The getStructCodec function accepts any number of field codecs and returns a codec for an object containing all these fields. Each provided field is an array such that the first item is the name of the field and the second item is the codec used to encode and decode that field type.
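To show how a list of [name, codec] pairs maps onto bytes, here is a dependency-free model in which each field is written in the order it was declared. It is a sketch, not the library's implementation. The FieldCodec type, the u8 and u32 helpers, and the personFields list are all hypothetical and limited to fixed-size numeric fields.

```typescript
// Minimal fixed-size numeric field codec used only by this sketch.
type FieldCodec = {
    size: number;
    write: (view: DataView, offset: number, value: number) => void;
    read: (view: DataView, offset: number) => number;
};

const u8: FieldCodec = { size: 1, write: (v, o, x) => v.setUint8(o, x), read: (v, o) => v.getUint8(o) };
const u32: FieldCodec = {
    size: 4,
    write: (v, o, x) => v.setUint32(o, x, true),
    read: (v, o) => v.getUint32(o, true),
};

// Each field is a [name, codec] pair; fields are laid out in the order given.
const personFields: ReadonlyArray<readonly [string, FieldCodec]> = [
    ['age', u8],
    ['points', u32],
];

function encodeStruct(value: Record<string, number>): Uint8Array {
    const total = personFields.reduce((sum, [, codec]) => sum + codec.size, 0);
    const bytes = new Uint8Array(total);
    const view = new DataView(bytes.buffer);
    let offset = 0;
    for (const [name, codec] of personFields) {
        codec.write(view, offset, value[name]);
        offset += codec.size;
    }
    return bytes;
}

function decodeStruct(bytes: Uint8Array): Record<string, number> {
    const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    const result: Record<string, number> = {};
    let offset = 0;
    for (const [name, codec] of personFields) {
        result[name] = codec.read(view, offset);
        offset += codec.size;
    }
    return result;
}

const personBytes = encodeStruct({ age: 42, points: 1000 }); // bytes: 2a | e8 03 00 00
const person = decodeStruct(personBytes); // { age: 42, points: 1000 }
```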
getStructEncoder and getStructDecoder functions are also available.
getTupleCodec
The getTupleCodec function accepts any number of codecs (T, U, V, etc.) and returns a tuple codec of type [T, U, V, …] such that each item is in the order of the provided codecs.
getTupleEncoder and getTupleDecoder functions are also available.
getUnionCodec
The getUnionCodec function is a lower-level codec helper that can be used to encode and decode any TypeScript union.
It accepts the following arguments:
- An array of codecs, each defining a variant of the union.
- A getIndexFromValue function which, given a value of the union, returns the index of the codec that should be used to encode that value.
- A getIndexFromBytes function which, given the byte array to decode at a given offset, returns the index of the codec that should be used to decode the next bytes.
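The way these three arguments cooperate can be modelled in plain TypeScript. The sketch below handles a number | boolean union with two hand-written variants and is not the library's implementation. The Variant type and the encodeUnion and decodeUnion helpers are hypothetical, and telling the variants apart by the number of remaining bytes is just one possible choice for getIndexFromBytes.

```typescript
type Value = number | boolean;
type Variant = {
    encode: (value: Value) => Uint8Array;
    decode: (bytes: Uint8Array, offset: number) => Value;
};

const variants: Variant[] = [
    {
        // Index 0: a little-endian u16 number.
        encode: value => {
            const n = Number(value);
            return new Uint8Array([n & 0xff, (n >> 8) & 0xff]);
        },
        decode: (bytes, offset) => bytes[offset] | (bytes[offset + 1] << 8),
    },
    {
        // Index 1: a boolean stored as a single byte.
        encode: value => new Uint8Array([value ? 1 : 0]),
        decode: (bytes, offset) => bytes[offset] === 1,
    },
];

// Pick the variant from the value being encoded.
const getIndexFromValue = (value: Value): number => (typeof value === 'number' ? 0 : 1);

// Pick the variant from the bytes: more than one byte left means a u16.
const getIndexFromBytes = (bytes: Uint8Array, offset: number): number =>
    bytes.length - offset > 1 ? 0 : 1;

function encodeUnion(value: Value): Uint8Array {
    return variants[getIndexFromValue(value)].encode(value);
}

function decodeUnion(bytes: Uint8Array, offset = 0): Value {
    return variants[getIndexFromBytes(bytes, offset)].decode(bytes, offset);
}

const numberBytes = encodeUnion(42); // bytes: 2a 00
const booleanBytes = encodeUnion(true); // bytes: 01
```

Unlike a discriminated union, nothing is written to mark which variant was used, so getIndexFromBytes has to recover that information from the bytes themselves.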
getUnionEncoder and getUnionDecoder functions are also available.
getUnitCodec
The getUnitCodec function returns a Codec<void> that encodes undefined into an empty Uint8Array and returns undefined without consuming any bytes when decoding. This is more of a low-level codec that can be used internally by other codecs. For instance, this is how getDiscriminatedUnionCodec describes the codecs of empty variants.
getUnitEncoder and getUnitDecoder functions are also available.