Class GpuRingBuffer<T>

Namespace
DotCompute.Backends.CUDA.RingKernels
Assembly
DotCompute.Backends.CUDA.dll

Manages GPU-resident ring buffer memory for message passing.

public sealed class GpuRingBuffer<T> : IGpuRingBuffer, IDisposable where T : IRingKernelMessage

Type Parameters

T

Message type implementing IRingKernelMessage.

Inheritance
object
GpuRingBuffer<T>
Implements
IGpuRingBuffer
IDisposable

Remarks

Allocates and manages GPU device memory for lock-free message queues:

  • Message buffer (serialized MemoryPack data)
  • Head/tail atomic counters for lock-free coordination

Supports two allocation modes:

  • Unified Memory (cudaMallocManaged) for non-WSL2 systems
  • Device Memory (cudaMalloc) for WSL2 systems
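
The mode can be chosen once at startup and passed to the constructor as useUnifiedMemory. The following is a minimal sketch, not part of DotCompute: the helper name LooksLikeWsl is hypothetical, and it assumes that WSL can be recognized by the WSL_DISTRO_NAME environment variable or by a kernel release string containing "microsoft". It does not distinguish WSL1 from WSL2.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of DotCompute.
// Assumption: WSL sets WSL_DISTRO_NAME and reports a kernel release containing "microsoft".
static bool LooksLikeWsl(string? wslDistroName, string? kernelRelease) =>
    !string.IsNullOrEmpty(wslDistroName) ||
    (kernelRelease?.Contains("microsoft", StringComparison.OrdinalIgnoreCase) ?? false);

string? distro = Environment.GetEnvironmentVariable("WSL_DISTRO_NAME");
string? release = File.Exists("/proc/sys/kernel/osrelease")
    ? File.ReadAllText("/proc/sys/kernel/osrelease")
    : null;

// Unified memory (cudaMallocManaged) outside WSL, device memory (cudaMalloc) under WSL.
bool useUnifiedMemory = !LooksLikeWsl(distro, release);
Console.WriteLine($"useUnifiedMemory = {useUnifiedMemory}");
```

The resulting flag would then be passed as the fourth constructor argument.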

Constructors

GpuRingBuffer(int, int, int, bool, ILogger?)

Initializes a new instance of the GpuRingBuffer<T> class.

public GpuRingBuffer(int deviceId, int capacity, int messageSize, bool useUnifiedMemory, ILogger? logger = null)

Parameters

deviceId int

CUDA device ID.

capacity int

Ring buffer capacity (must be a power of 2).

messageSize int

Size of each message in bytes.

useUnifiedMemory bool

True to use unified memory (non-WSL2), false for device memory (WSL2).

logger ILogger

Optional logger for diagnostics.

Exceptions

ArgumentException

Thrown when capacity is not a power of 2.

InvalidOperationException

Thrown when GPU allocation fails.
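
Because a capacity that is not a power of 2 is rejected with ArgumentException, callers may want to normalize a requested size first. The sketch below uses System.Numerics.BitOperations (.NET 6 or later). The helper ToValidCapacity is hypothetical and not part of DotCompute, and its upper bound of 2^30 is an assumption chosen so that the result still fits in an int.

```csharp
using System;
using System.Numerics;

// Hypothetical helper, not part of DotCompute: rounds a requested capacity
// up to the next power of 2 so that it satisfies the constructor's requirement.
static int ToValidCapacity(int requested)
{
    if (requested <= 0 || requested > (1 << 30))
        throw new ArgumentOutOfRangeException(nameof(requested));
    return (int)BitOperations.RoundUpToPowerOf2((uint)requested);
}

Console.WriteLine(BitOperations.IsPow2(1000));  // False
Console.WriteLine(ToValidCapacity(1000));       // 1024
Console.WriteLine(ToValidCapacity(4096));       // 4096
```

The returned value can be passed as the capacity argument. Allocation failures (InvalidOperationException) still have to be handled separately.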

Properties

Capacity

Gets the capacity of the ring buffer (power of 2).

public int Capacity { get; }

Property Value

int

DeviceBufferPtr

Gets the device pointer to the message buffer.

public nint DeviceBufferPtr { get; }

Property Value

nint

DeviceHeadPtr

Gets the device pointer to the head atomic counter.

public nint DeviceHeadPtr { get; }

Property Value

nint

DeviceTailPtr

Gets the device pointer to the tail atomic counter.

public nint DeviceTailPtr { get; }

Property Value

nint

IsUnifiedMemory

Gets whether unified memory is being used.

public bool IsUnifiedMemory { get; }

Property Value

bool

MessageSize

Gets the size of each message in bytes.

public int MessageSize { get; }

Property Value

int

Methods

Dispose()

Releases the GPU memory held by the ring buffer (the message buffer and the head/tail counters).

public void Dispose()

ReadHead()

Reads the current head counter value from the GPU.

public uint ReadHead()

Returns

uint

Remarks

For unified memory with system-scope atomics, uses Volatile.Read for CPU-GPU coherent atomic reads.

ReadMessage(int, CancellationToken)

Reads a message from the GPU buffer at the specified index.

public T ReadMessage(int index, CancellationToken cancellationToken = default)

Parameters

index int

Index in the ring buffer (0 to Capacity-1).

cancellationToken CancellationToken

Cancellation token.

Returns

T

The deserialized message.
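
The index passed to ReadMessage and WriteMessage must lie in the range 0 to Capacity-1. The sketch below shows one common way to derive it from the values returned by ReadHead and ReadTail. It assumes that head and tail are free-running uint counters rather than pre-wrapped indices, which this page does not state. The helpers SlotIndex and Pending are hypothetical and not part of DotCompute.

```csharp
using System;

// Assumption: counters increase monotonically and wrap at 2^32.
// With a power-of-2 capacity, masking maps a counter to a slot index.
static int SlotIndex(uint counter, int capacity) => (int)(counter & (uint)(capacity - 1));

// Number of messages written but not yet consumed; correct across counter wrap-around.
static uint Pending(uint head, uint tail) => unchecked(tail - head);

const int capacity = 8;                // must be a power of 2
uint head = uint.MaxValue - 1;         // consumer position, about to wrap
uint tail = unchecked(head + 3);       // producer is 3 messages ahead and has wrapped to 1

Console.WriteLine(Pending(head, tail));        // 3
Console.WriteLine(SlotIndex(head, capacity));  // 6
Console.WriteLine(SlotIndex(tail, capacity));  // 1
```

Under that assumption, a host-side consumer would read the message at SlotIndex(head, Capacity) and then publish head + 1 with WriteHead.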

ReadTail()

Reads the current tail counter value from the GPU.

public uint ReadTail()

Returns

uint

Remarks

For unified memory with system-scope atomics, uses Volatile.Read for CPU-GPU coherent atomic reads.

WriteHead(uint)

Writes the head counter value to the GPU.

public void WriteHead(uint value)

Parameters

value uint

Head counter value to write.

Remarks

For unified memory with system-scope atomics, uses Interlocked.Exchange for CPU-GPU coherent atomic writes.

WriteMessage(T, int, CancellationToken)

Writes a message to the GPU buffer at the specified index.

public void WriteMessage(T message, int index, CancellationToken cancellationToken = default)

Parameters

message T

Message to write.

index int

Index in the ring buffer (0 to Capacity-1).

cancellationToken CancellationToken

Cancellation token.

WriteTail(uint)

Writes the tail counter value to the GPU.

public void WriteTail(uint value)

Parameters

value uint

Tail counter value to write.

Remarks

For unified memory with system-scope atomics (cuda::atomic<T, thread_scope_system>), uses Interlocked.Exchange, which provides:

  1. Atomic write semantics compatible with CUDA system-scope atomics
  2. Full memory barrier ensuring visibility across CPU-GPU boundary
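
The host-side semantics of the two .NET primitives named in these remarks can be illustrated without a GPU. In the sketch below an ordinary local variable stands in for the counter that the real class keeps in unified memory. WriteCounter and ReadCounter are hypothetical names, and the sketch does not claim to show how GpuRingBuffer<T> addresses that memory internally.

```csharp
using System;
using System.Threading;

// Stand-in for a counter in unified memory (assumption: a managed variable
// is used here only so that the sketch runs on any machine).
uint counter = 0;

// Atomic store with a full memory barrier; returns the previous value.
uint WriteCounter(uint value) => Interlocked.Exchange(ref counter, value);

// Volatile load: later reads and writes are not reordered before it.
uint ReadCounter() => Volatile.Read(ref counter);

uint previous = WriteCounter(42);
Console.WriteLine(previous);        // 0
Console.WriteLine(ReadCounter());   // 42
```

Interlocked.Exchange is used for writes rather than a plain assignment because it both stores the value atomically and acts as a full fence, which matches the two guarantees listed above.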