MemoryPack Binary Format for Ring Kernels

This document describes the exact binary format used by MemoryPack serialization in the DotCompute Ring Kernel system, enabling accurate CUDA deserializer generation.

Overview

MemoryPack is a high-performance binary serializer for .NET that uses a "zero-encoding" approach, copying C# memory structures as directly as possible. Unlike self-describing formats (JSON, MessagePack), MemoryPack requires a schema for deserialization.

Format Specification

Endianness

Little-endian (matches x86/x64 architecture)
No cross-endian support

Object Header

For classes/reference types with [MemoryPackable]:

┌─────────────────────────────────────────────────────────┐
│ Member Count (1 byte)                                   │
│   0-249: Number of serialized members                   │
│   255: Object is null                                   │
├─────────────────────────────────────────────────────────┤
│ Member 1 data (variable size based on type)             │
├─────────────────────────────────────────────────────────┤
│ Member 2 data (variable size based on type)             │
├─────────────────────────────────────────────────────────┤
│ ... (members in declaration order)                      │
└─────────────────────────────────────────────────────────┘

Important: Members are serialized in declaration order in the C# source file.

Primitive Types

All primitives use fixed-size little-endian encoding:

C# Type	Size (bytes)	CUDA Type	Notes
byte	1	uint8_t	Direct copy
sbyte	1	int8_t	Direct copy
bool	1	bool	0 = false, 1 = true
short	2	int16_t	Little-endian
ushort	2	uint16_t	Little-endian
int	4	int32_t	Little-endian
uint	4	uint32_t	Little-endian
long	8	int64_t	Little-endian
ulong	8	uint64_t	Little-endian
float	4	float	IEEE 754 single
double	8	double	IEEE 754 double
Guid	16	uint8_t[16]	Byte-by-byte (see below)

Guid Serialization

System.Guid is serialized as 16 consecutive bytes using Guid.ToByteArray() ordering:

Guid: 12345678-90AB-CDEF-1234-567890ABCDEF

Bytes (indices 0-15):
  0-3:   12 34 56 78  (Data1 - little-endian int32)
  4-5:   90 AB        (Data2 - little-endian int16)
  6-7:   CD EF        (Data3 - little-endian int16)
  8-15:  12 34 56 78 90 AB CD EF  (Data4 - byte array)

CUDA equivalent:

struct guid_t {
    uint8_t bytes[16];
};

Nullable Types

Nullable value types (T?) use a presence byte prefix:

┌─────────────────────────────────────────────────────────┐
│ Has Value (1 byte): 0 = null, non-zero = has value      │
├─────────────────────────────────────────────────────────┤
│ Value (if has_value != 0): type-specific encoding       │
└─────────────────────────────────────────────────────────┘

Example: Guid?

Scenario	Bytes
null	`[0x00]` (1 byte)
has value	`[0x01][16 bytes Guid data]` (17 bytes)

CUDA equivalent:

struct nullable_guid {
    uint8_t has_value;  // 0 = null, 1 = present
    uint8_t value[16];  // Only valid if has_value != 0
};

Collections

Collections use a 4-byte signed length prefix:

┌─────────────────────────────────────────────────────────┐
│ Count (4 bytes, signed int32)                           │
│   -1: Collection is null                                │
│   0+: Number of elements                                │
├─────────────────────────────────────────────────────────┤
│ Element 0 data                                          │
├─────────────────────────────────────────────────────────┤
│ Element 1 data                                          │
├─────────────────────────────────────────────────────────┤
│ ... (count elements)                                    │
└─────────────────────────────────────────────────────────┘

Note: For primitive arrays, elements are packed without padding.

Strings

Strings use UTF-8 encoding with a 4-byte length prefix:

┌─────────────────────────────────────────────────────────┐
│ Byte Count (4 bytes, signed int32)                      │
│   -1: String is null                                    │
│   0+: Number of UTF-8 bytes                             │
├─────────────────────────────────────────────────────────┤
│ UTF-8 bytes (no null terminator)                        │
└─────────────────────────────────────────────────────────┘

Ring Kernel Message Examples

VectorAddRequest

C# Definition:

[MemoryPackable]
public partial class VectorAddRequest : IRingKernelMessage
{
    public Guid MessageId { get; set; }           // 16 bytes
    public string MessageType => "VectorAddRequest"; // NOT serialized (read-only)
    public byte Priority { get; set; }            // 1 byte
    public Guid? CorrelationId { get; set; }      // 1 + 16 = 17 bytes
    public float A { get; set; }                  // 4 bytes
    public float B { get; set; }                  // 4 bytes
}

MemoryPack Binary Format (43 bytes total):

Offset  Size  Field           Description
──────  ────  ──────────────  ─────────────────────────────────────
0       1     [Header]        Member count = 5 (0x05)
1       16    MessageId       Guid bytes [0-15]
17      1     Priority        uint8_t value
18      1     CorrelationId   has_value flag
19      16    CorrelationId   Guid value (if has_value)
35      4     A               float (IEEE 754)
39      4     B               float (IEEE 754)
──────  ────
Total:  43 bytes

CUDA Struct:

struct vector_add_request {
    uint8_t message_id[16];       // Offset: 0
    uint8_t priority;             // Offset: 16
    struct {
        uint8_t has_value;        // Offset: 17
        uint8_t value[16];        // Offset: 18
    } correlation_id;
    float a;                      // Offset: 34
    float b;                      // Offset: 38
};

// Note: Struct size = 42 bytes (excludes 1-byte header)
// Buffer offset 1 is where struct data starts (after header)

CUDA Deserializer:

__device__ bool deserialize_vector_add_request(
    const uint8_t* buffer,
    int buffer_size,
    vector_add_request* out)
{
    // Buffer layout: [header:1][struct_data:42]
    if (buffer_size < 43) return false;

    // Check header (member count must be 5)
    if (buffer[0] != 5) return false;

    // Skip header byte
    const uint8_t* data = buffer + 1;

    // MessageId: 16 bytes at offset 0
    #pragma unroll
    for (int i = 0; i < 16; i++) {
        out->message_id[i] = data[i];
    }

    // Priority: 1 byte at offset 16
    out->priority = data[16];

    // CorrelationId: nullable Guid at offset 17
    out->correlation_id.has_value = data[17] != 0;
    if (out->correlation_id.has_value) {
        #pragma unroll
        for (int i = 0; i < 16; i++) {
            out->correlation_id.value[i] = data[18 + i];
        }
    }

    // A: float at offset 34
    out->a = *reinterpret_cast<const float*>(&data[34]);

    // B: float at offset 38
    out->b = *reinterpret_cast<const float*>(&data[38]);

    return true;
}

VectorAddResponse

C# Definition:

[MemoryPackable]
public partial class VectorAddResponse : IRingKernelMessage
{
    public Guid MessageId { get; set; }           // 16 bytes
    public string MessageType => "VectorAddResponse"; // NOT serialized
    public byte Priority { get; set; }            // 1 byte
    public Guid? CorrelationId { get; set; }      // 1 + 16 = 17 bytes
    public float Result { get; set; }             // 4 bytes
}

MemoryPack Binary Format (39 bytes total):

Offset  Size  Field           Description
──────  ────  ──────────────  ─────────────────────────────────────
0       1     [Header]        Member count = 4 (0x04)
1       16    MessageId       Guid bytes [0-15]
17      1     Priority        uint8_t value
18      1     CorrelationId   has_value flag
19      16    CorrelationId   Guid value (if has_value)
35      4     Result          float (IEEE 754)
──────  ────
Total:  39 bytes

Key Differences from IRingKernelMessage.Serialize()

The manual IRingKernelMessage.Serialize() implementation does NOT include the 1-byte member count header. This creates a format mismatch:

Aspect	MemoryPack Format	Manual Serialize()
Header	1-byte member count	None
VectorAddRequest	43 bytes	42 bytes
VectorAddResponse	39 bytes	38 bytes

Resolution: The CUDA deserializer must be aware of which format is being used and handle accordingly.

Debug Byte Layout

For debugging message format issues, use this format to log byte contents:

Message: VectorAddRequest (43 bytes)
─────────────────────────────────────────────────────────
Offset   Hex                                        ASCII
0000:    05                                         .
         ^^-- Member count header (5 members)
0001:    12 34 56 78 90 AB CD EF 12 34 56 78 90 AB ............
000F:    CD EF                                      ..
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^-- MessageId
0011:    80                                         .
         ^^-- Priority (128)
0012:    01                                         .
         ^^-- CorrelationId.has_value (true)
0013:    AA BB CC DD EE FF 00 11 22 33 44 55 66 77 ..........
0021:    88 99                                      ..
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^-- CorrelationId value
0023:    00 00 20 41                                .. A
         ^^^^^^^^^^-- A = 10.0f (0x41200000 in IEEE 754)
0027:    00 00 A0 41                                ...A
         ^^^^^^^^^^-- B = 20.0f (0x41A00000 in IEEE 754)

CUDA Memory Alignment Considerations

For optimal GPU performance, consider alignment when accessing struct fields:

4-byte aligned access: Use reinterpret_cast<const float*> only on 4-byte aligned addresses
Unaligned access: Use byte-by-byte copy for Guid and other non-aligned fields
Coalesced access: When processing message batches, ensure messages are aligned to warp boundaries

Example of safe unaligned float read:

__device__ float read_float_unaligned(const uint8_t* ptr) {
    float result;
    memcpy(&result, ptr, sizeof(float));
    return result;
}

Version Compatibility

MemoryPack format is not version-tolerant by default. Adding, removing, or reordering members will break compatibility.

For Ring Kernels, use:

Fixed message schemas - Define message types once, never modify
Versioned message types - Create VectorAddRequestV2 instead of modifying original
Schema registry - Track message type versions at runtime

Table of Contents