Class PinnedStagingBuffer

Namespace
DotCompute.Core.Messaging
Assembly
DotCompute.Core.dll

Lock-free pinned memory buffer for staging messages before GPU transfer.

public sealed class PinnedStagingBuffer : IDisposable
Inheritance
object
PinnedStagingBuffer
Implements
IDisposable

Remarks

This buffer uses pinned (non-movable) memory to enable zero-copy DMA transfers to the GPU via CUDA, OpenCL, or Metal. It implements a lock-free multi-producer/single-consumer (MPSC) ring buffer, so many producer threads can stage messages while a single pump thread drains them.

Performance Characteristics:
- Enqueue: O(1) amortized, using lock-free CAS operations
- Dequeue: O(1), single consumer (the pump thread)
- Memory: pinned and allocated outside the GC heap (take care with large capacities)
- Latency: sub-microsecond for cache-resident operations

Usage Pattern:

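using var buffer = new PinnedStagingBuffer(capacity: 4096, messageSize: 256);

// Producer threads (lock-free); messageBytes must be exactly buffer.MessageSize bytes
if (buffer.TryEnqueue(messageBytes))
{
    // Message staged successfully
}
else
{
    // Buffer is full: retry later or drop the message
}

// Consumer thread (pump service)
const int batchSize = 16; // illustrative value
Span<byte> batch = stackalloc byte[batchSize * buffer.MessageSize];
int count = buffer.DequeueBatch(batch, batchSize);
// Transfer the 'count' dequeued messages in 'batch' to the GPU
// via cuMemcpyHtoD / clEnqueueWriteBuffer / a Metal buffer copy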

Constructors

PinnedStagingBuffer(int, int)

Initializes a new instance of the PinnedStagingBuffer class.

public PinnedStagingBuffer(int capacity, int messageSize)

Parameters

capacity int

Maximum number of messages the buffer can hold (must be power of 2).

messageSize int

Fixed size of each message in bytes.

Exceptions

ArgumentException

Thrown if capacity is not a power of 2 or less than 2.

ArgumentOutOfRangeException

Thrown if messageSize is less than 1.
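The documented argument checks can be sketched as follows. This is a minimal illustration, not the library's code: `StagingBufferArgs`, `Validate`, and `SlotIndex` are hypothetical names. The source does not say why the capacity must be a power of 2; a common reason in ring buffers, assumed here, is that a slot index can then be computed with a bit mask instead of a modulo.

```csharp
using System;
using System.Numerics;

// Hypothetical helper mirroring the documented constructor checks.
public static class StagingBufferArgs
{
    public static void Validate(int capacity, int messageSize)
    {
        // BitOperations.IsPow2 returns false for zero and negative values.
        if (capacity < 2 || !BitOperations.IsPow2(capacity))
        {
            throw new ArgumentException(
                "Capacity must be a power of 2 and at least 2.", nameof(capacity));
        }

        if (messageSize < 1)
        {
            throw new ArgumentOutOfRangeException(
                nameof(messageSize), "Message size must be at least 1 byte.");
        }
    }

    // Assumption: with a power-of-2 capacity, an ever-increasing sequence
    // number maps to a slot with one bitwise AND.
    public static int SlotIndex(long sequence, int capacity)
        => (int)(sequence & (capacity - 1));
}
```

Because ArgumentOutOfRangeException derives from ArgumentException, a caller that catches ArgumentException handles both failure cases.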

Properties

BufferPointer

Gets a pointer to the pinned buffer for direct GPU access.

public nint BufferPointer { get; }

Property Value

nint

Remarks

This pointer remains valid for the lifetime of the PinnedStagingBuffer. Use it for zero-copy DMA transfers to the GPU via:
- CUDA: cuMemcpyHtoD(devicePtr, BufferPointer, size)
- OpenCL: clEnqueueWriteBuffer(queue, buffer, CL_TRUE, 0, size, BufferPointer, ...)
- Metal: copy from BufferPointer into the memory returned by [mtlBuffer contents], then call didModifyRange: if the buffer uses managed storage

Safety: Do not dereference this pointer after Dispose() is called.
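To illustrate why the pointer is stable and why it must not be used after disposal, here is a minimal sketch of a fixed-address allocation. The source does not say how PinnedStagingBuffer allocates its memory; this sketch assumes unmanaged memory from Marshal.AllocHGlobal, and `NativeStagingMemory` is a hypothetical type, not part of DotCompute.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

// Hypothetical sketch: unmanaged memory that the GC never moves.
public sealed class NativeStagingMemory : IDisposable
{
    private IntPtr _pointer;

    public int Length { get; }

    public NativeStagingMemory(int capacity, int messageSize)
    {
        // checked: fail loudly if capacity * messageSize overflows int.
        Length = checked(capacity * messageSize);
        _pointer = Marshal.AllocHGlobal(Length);
    }

    // Refuses to hand out the address once the memory has been freed.
    public IntPtr Pointer => _pointer != IntPtr.Zero
        ? _pointer
        : throw new ObjectDisposedException(nameof(NativeStagingMemory));

    public void Dispose()
    {
        // Exchange makes a second Dispose call a no-op instead of a double free.
        IntPtr p = Interlocked.Exchange(ref _pointer, IntPtr.Zero);
        if (p != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(p);
        }
    }
}
```

A copy of the address taken before Dispose() still dangles afterwards, which is the hazard the safety note describes; the guard only protects callers who go through the property.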

Capacity

Gets the maximum number of messages the buffer can hold.

public int Capacity { get; }

Property Value

int

Count

Gets the current number of messages in the buffer.

public int Count { get; }

Property Value

int

Remarks

The value is approximate because producers and the consumer can modify the buffer concurrently while it is read. Use it for monitoring only, not for control flow.

IsEmpty

Gets a value indicating whether the buffer is empty.

public bool IsEmpty { get; }

Property Value

bool

IsFull

Gets a value indicating whether the buffer is full.

public bool IsFull { get; }

Property Value

bool

MessageSize

Gets the fixed size of each message in bytes.

public int MessageSize { get; }

Property Value

int

Methods

Clear()

Clears all messages from the buffer.

public void Clear()

Remarks

This operation is NOT thread-safe with concurrent enqueue/dequeue operations. Use only when the buffer is idle (e.g., during shutdown or reset).

DequeueBatch(Span<byte>, int)

Dequeues a batch of messages from the staging buffer.

public int DequeueBatch(Span<byte> destination, int maxMessages)

Parameters

destination Span<byte>

The destination buffer to write messages to (must be at least maxMessages * MessageSize bytes).

maxMessages int

Maximum number of messages to dequeue.

Returns

int

The actual number of messages dequeued (0 if buffer is empty).

Remarks

This method is designed for single-consumer use (the pump thread). It is NOT thread-safe for multiple concurrent consumers.

Batching reduces per-message overhead for GPU transfers. Typical batch sizes:
- Low-latency: 1-8 messages (minimize latency)
- Balanced: 16-64 messages (balance latency against throughput)
- High-throughput: 128-512 messages (maximize PCIe bandwidth)
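The sizing rule for the destination span, and one way to walk a dequeued batch, can be sketched as below. This assumes that messages are packed back to back in dequeue order, which the source implies through the maxMessages * MessageSize requirement but does not state outright. `BatchLayout` and its members are hypothetical helpers.

```csharp
using System;

// Hypothetical helpers for sizing and slicing a dequeued batch.
public static class BatchLayout
{
    // Minimum destination length for a batch, per the documented contract.
    public static int RequiredBytes(int maxMessages, int messageSize)
        => checked(maxMessages * messageSize);

    // Assumption: message i occupies bytes [i * messageSize, (i + 1) * messageSize).
    public static ReadOnlySpan<byte> MessageAt(
        ReadOnlySpan<byte> batch, int index, int messageSize)
        => batch.Slice(index * messageSize, messageSize);
}
```

With the documented return value, a consumer would only inspect indices from 0 up to, but not including, the count returned by DequeueBatch.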

Exceptions

ArgumentException

Thrown if destination is too small.

ObjectDisposedException

Thrown if the buffer has been disposed.

Dispose()

Releases the pinned memory and any other unmanaged resources held by this buffer.

public void Dispose()

~PinnedStagingBuffer()

Finalizer to ensure pinned memory is released.

~PinnedStagingBuffer()

GetBuffer()

Gets a read-only span of the pinned buffer for direct GPU read access.

public ReadOnlySpan<byte> GetBuffer()

Returns

ReadOnlySpan<byte>

Remarks

Use this for zero-copy reads when the GPU can directly access host pinned memory. The span remains valid until Dispose() is called.

TryEnqueue(ReadOnlySpan<byte>)

Attempts to enqueue a message into the staging buffer.

public bool TryEnqueue(ReadOnlySpan<byte> message)

Parameters

message ReadOnlySpan<byte>

The message bytes to enqueue (must be exactly MessageSize bytes).

Returns

bool

true if the message was successfully enqueued; false if the buffer is full.

Remarks

This method is lock-free and thread-safe for multiple concurrent producers. Uses Compare-And-Swap (CAS) to ensure only one producer claims each slot.
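The slot-claim step described above can be sketched with Interlocked.CompareExchange. This is a simplified illustration under stated assumptions, not the DotCompute implementation: `SlotClaimer` is a hypothetical type, the head and tail counters are assumed to be ever-increasing sequence numbers, and the capacity is assumed to be a power of 2.

```csharp
using System;
using System.Threading;

// Hypothetical sketch of the claim step only; it copies no message bytes.
public sealed class SlotClaimer
{
    private readonly int _capacity; // assumed to be a power of 2
    private long _head;             // sequence of the next slot the consumer frees
    private long _tail;             // sequence of the next slot a producer may claim

    public SlotClaimer(int capacity) => _capacity = capacity;

    public bool TryClaim(out int slotIndex)
    {
        while (true)
        {
            long tail = Volatile.Read(ref _tail);
            long head = Volatile.Read(ref _head);

            // Full: every slot between head and tail is still in use.
            if (tail - head >= _capacity)
            {
                slotIndex = -1;
                return false;
            }

            // Only the producer whose CAS succeeds owns sequence 'tail';
            // the others observe a changed value and retry.
            if (Interlocked.CompareExchange(ref _tail, tail + 1, tail) == tail)
            {
                slotIndex = (int)(tail & (_capacity - 1));
                return true;
            }
        }
    }

    // Called by the single consumer after it has drained one slot.
    public void Release() => Interlocked.Increment(ref _head);
}
```

A complete queue also needs a way for a producer to publish a slot after writing it, so that the consumer never reads a slot that has been claimed but not yet filled; that part is omitted here.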

Exceptions

ArgumentException

Thrown if message length does not match MessageSize.

ObjectDisposedException

Thrown if the buffer has been disposed.