Class GpuRingBuffer<T>
- Namespace
- DotCompute.Backends.CUDA.RingKernels
- Assembly
- DotCompute.Backends.CUDA.dll
Manages GPU-resident ring buffer memory for message passing.
public sealed class GpuRingBuffer<T> : IGpuRingBuffer, IDisposable where T : IRingKernelMessage
Type Parameters
T
Message type implementing IRingKernelMessage.
- Inheritance
- object
- GpuRingBuffer<T>
- Implements
- IGpuRingBuffer
- IDisposable
Remarks
Allocates and manages GPU device memory for lock-free message queues:
- Message buffer (serialized MemoryPack data)
- Head/tail atomic counters for lock-free coordination
Supports two allocation modes:
- Unified Memory (cudaMallocManaged) - CPU and GPU share a single address space
- Device Memory (cudaMalloc) - explicit DMA transfers via cudaMemcpy
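As a hedged usage sketch of the two allocation modes (the message type, device ID, and sizes below are illustrative assumptions, not values prescribed by the library):

```csharp
using DotCompute.Backends.CUDA.RingKernels;

// MyMessage is a hypothetical type implementing IRingKernelMessage.
// Capacity must be a power of 2 (here 1024), or the constructor throws.
using var unified = new GpuRingBuffer<MyMessage>(
    deviceId: 0,
    capacity: 1024,
    messageSize: 256,
    useUnifiedMemory: true);   // cudaMallocManaged: CPU and GPU share one address space

using var device = new GpuRingBuffer<MyMessage>(
    deviceId: 0,
    capacity: 1024,
    messageSize: 256,
    useUnifiedMemory: false);  // cudaMalloc: explicit DMA transfers via cudaMemcpy
```

Unified memory simplifies host-side reads/writes of the counters; device memory trades that convenience for explicit, batched transfers.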
Constructors
GpuRingBuffer(int, int, int, bool, ILogger?)
Initializes a new instance of the GpuRingBuffer<T> class.
public GpuRingBuffer(int deviceId, int capacity, int messageSize, bool useUnifiedMemory, ILogger? logger = null)
Parameters
deviceId (int): CUDA device ID.
capacity (int): Ring buffer capacity (must be a power of 2).
messageSize (int): Size of each message in bytes.
useUnifiedMemory (bool): True to use unified memory (cudaMallocManaged); false to use device memory (cudaMalloc) with explicit DMA.
logger (ILogger?): Optional logger for diagnostics.
Exceptions
- ArgumentException
Thrown when capacity is not a power of 2.
- InvalidOperationException
Thrown when GPU allocation fails.
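The power-of-2 requirement can be checked with a standard bit trick (a sketch of one common validation, not necessarily the library's exact code):

```csharp
// A positive power of 2 has exactly one set bit, so clearing the lowest
// set bit (capacity & (capacity - 1)) must yield zero.
static bool IsPowerOfTwo(int capacity)
    => capacity > 0 && (capacity & (capacity - 1)) == 0;

// IsPowerOfTwo(1024) is true; IsPowerOfTwo(1000) is false, so a capacity
// of 1000 would trigger the ArgumentException described above.
```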
Properties
Capacity
Gets the capacity of the ring buffer (power of 2).
public int Capacity { get; }
Property Value
int
DeviceBufferPtr
Gets the device pointer to the message buffer.
public nint DeviceBufferPtr { get; }
Property Value
nint
DeviceHeadPtr
Gets the device pointer to the head atomic counter.
public nint DeviceHeadPtr { get; }
Property Value
nint
DeviceTailPtr
Gets the device pointer to the tail atomic counter.
public nint DeviceTailPtr { get; }
Property Value
nint
IsUnifiedMemory
Gets whether unified memory is being used.
public bool IsUnifiedMemory { get; }
Property Value
bool
MessageSize
Gets the size of each message in bytes.
public int MessageSize { get; }
Property Value
int
Methods
Dispose()
Performs application-defined tasks associated with freeing, releasing, or resetting unmanaged resources.
public void Dispose()
~GpuRingBuffer()
Finalizer: defense-in-depth release of the GPU buffer, head, and tail allocations in case Dispose was never called. Best-effort; swallows driver errors.
protected ~GpuRingBuffer()
ReadHead()
Reads the current head counter value from the GPU.
public uint ReadHead()
Returns
- uint
The current head counter value.
Remarks
For unified memory with system-scope atomics, uses Volatile.Read for CPU-GPU coherent atomic reads.
ReadMessage(int, CancellationToken)
Reads a message from the GPU buffer at the specified index.
public T ReadMessage(int index, CancellationToken cancellationToken = default)
Parameters
index (int): Index in the ring buffer (0 to Capacity - 1).
cancellationToken (CancellationToken): Cancellation token.
Returns
- T
The deserialized message.
ReadTail()
Reads the current tail counter value from the GPU.
public uint ReadTail()
Returns
- uint
The current tail counter value.
Remarks
For unified memory with system-scope atomics, uses Volatile.Read for CPU-GPU coherent atomic reads.
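ReadHead and ReadTail together let the host observe queue occupancy. A sketch, assuming the counters are monotonically increasing uint values rather than pre-masked indices (an assumption; the docs only state they are atomic):

```csharp
// With monotonic uint counters, unsigned subtraction handles wraparound.
uint head = ring.ReadHead();   // consumer position
uint tail = ring.ReadTail();   // producer position
uint pending = tail - head;    // messages written but not yet consumed

// Capacity is a power of 2, so masking maps a counter to a ring slot.
int slot = (int)(head & (uint)(ring.Capacity - 1));
```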
WriteHead(uint)
Writes the head counter value to the GPU.
public void WriteHead(uint value)
Parameters
value (uint): New head counter value.
Remarks
For unified memory with system-scope atomics, uses Interlocked.Exchange for CPU-GPU coherent atomic writes.
WriteMessage(T, int, CancellationToken)
Writes a message to the GPU buffer at the specified index.
public void WriteMessage(T message, int index, CancellationToken cancellationToken = default)
Parameters
message (T): Message to write.
index (int): Index in the ring buffer (0 to Capacity - 1).
cancellationToken (CancellationToken): Cancellation token.
WriteTail(uint)
Writes the tail counter value to the GPU.
public void WriteTail(uint value)
Parameters
value (uint): New tail counter value.
Remarks
For unified memory with system-scope atomics (cuda::atomic<T, thread_scope_system>), we use Interlocked.Exchange which provides:
- Atomic write semantics compatible with CUDA system-scope atomics
- Full memory barrier ensuring visibility across CPU-GPU boundary
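These barriers support the usual publish pattern for a CPU producer: write the payload first, then advance the tail. A hedged sketch (variable names and the wrap-around masking are illustrative assumptions):

```csharp
// Producer side: the full barrier in WriteTail (Interlocked.Exchange)
// ensures the GPU consumer observes the message contents before it
// observes the advanced tail value.
uint tail = ring.ReadTail();
int slot = (int)(tail & (uint)(ring.Capacity - 1)); // Capacity is a power of 2
ring.WriteMessage(message, slot);
ring.WriteTail(tail + 1);   // publish the message to the GPU consumer
```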