Class GpuRingBuffer<T>
- Namespace: DotCompute.Backends.CUDA.RingKernels
- Assembly: DotCompute.Backends.CUDA.dll
Manages GPU-resident ring buffer memory for message passing.
public sealed class GpuRingBuffer<T> : IGpuRingBuffer, IDisposable where T : IRingKernelMessage
Type Parameters
T: Message type implementing IRingKernelMessage.
- Inheritance: object → GpuRingBuffer<T>
- Implements: IGpuRingBuffer, IDisposable
Remarks
Allocates and manages GPU device memory for lock-free message queues:
- Message buffer (serialized MemoryPack data)
- Head/tail atomic counters for lock-free coordination
Supports two allocation modes:
- Unified Memory (cudaMallocManaged) for non-WSL2 systems
- Device Memory (cudaMalloc) for WSL2 systems
Constructors
GpuRingBuffer(int, int, int, bool, ILogger?)
Initializes a new instance of the GpuRingBuffer<T> class.
public GpuRingBuffer(int deviceId, int capacity, int messageSize, bool useUnifiedMemory, ILogger? logger = null)
Parameters
deviceId (int): CUDA device ID.
capacity (int): Ring buffer capacity (must be a power of 2).
messageSize (int): Size of each message in bytes.
useUnifiedMemory (bool): True to use unified memory (non-WSL2), false for device memory (WSL2).
logger (ILogger): Optional logger for diagnostics.
Exceptions
- ArgumentException
Thrown when capacity is not a power of 2.
- InvalidOperationException
Thrown when GPU allocation fails.
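The power-of-2 requirement on capacity can be checked with a single bit test. The following is a minimal sketch of such a check, not the library's actual validation code; `RingCapacity`, `IsPowerOfTwo`, and `Validate` are hypothetical names introduced for illustration.

```csharp
using System;

// Hypothetical helper, not part of DotCompute.
public static class RingCapacity
{
    // True when value is positive and has exactly one bit set.
    public static bool IsPowerOfTwo(int value) =>
        value > 0 && (value & (value - 1)) == 0;

    // Returns capacity unchanged, or throws ArgumentException
    // when it is not a power of 2.
    public static int Validate(int capacity)
    {
        if (!IsPowerOfTwo(capacity))
        {
            throw new ArgumentException(
                $"Capacity must be a power of 2, got {capacity}.",
                nameof(capacity));
        }

        return capacity;
    }
}
```

Under this sketch, `RingCapacity.Validate(1024)` returns 1024, while `RingCapacity.Validate(1000)` throws.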
Properties
Capacity
Gets the capacity of the ring buffer (always a power of 2).
public int Capacity { get; }
Property Value
- int
DeviceBufferPtr
Gets the device pointer to the message buffer.
public nint DeviceBufferPtr { get; }
Property Value
- nint
DeviceHeadPtr
Gets the device pointer to the head atomic counter.
public nint DeviceHeadPtr { get; }
Property Value
- nint
DeviceTailPtr
Gets the device pointer to the tail atomic counter.
public nint DeviceTailPtr { get; }
Property Value
- nint
IsUnifiedMemory
Gets whether unified memory is being used.
public bool IsUnifiedMemory { get; }
Property Value
- bool
MessageSize
Gets the size of each message in bytes.
public int MessageSize { get; }
Property Value
- int
Methods
Dispose()
Performs application-defined tasks associated with freeing, releasing, or resetting unmanaged resources.
public void Dispose()
ReadHead()
Reads the current head counter value from the GPU.
public uint ReadHead()
Returns
- uint
The current head counter value.
Remarks
For unified memory with system-scope atomics, uses Volatile.Read for CPU-GPU coherent atomic reads.
ReadMessage(int, CancellationToken)
Reads a message from the GPU buffer at the specified index.
public T ReadMessage(int index, CancellationToken cancellationToken = default)
Parameters
index (int): Index in the ring buffer (0 to Capacity-1).
cancellationToken (CancellationToken): Cancellation token.
Returns
- T
The deserialized message.
ReadTail()
Reads the current tail counter value from the GPU.
public uint ReadTail()
Returns
- uint
The current tail counter value.
Remarks
For unified memory with system-scope atomics, uses Volatile.Read for CPU-GPU coherent atomic reads.
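A host-side caller would typically combine the two counter reads to decide whether the buffer holds messages. The sketch below assumes a common convention that this page does not confirm: the producer advances the tail, the consumer advances the head, and both are free-running uint counters that wrap at 2^32. `RingCounters` and its members are hypothetical names.

```csharp
// Hypothetical helper, not part of DotCompute.
// Assumption: tail is advanced by the producer, head by the consumer,
// and both counters increase monotonically modulo 2^32.
public static class RingCounters
{
    // Messages written but not yet consumed. Unchecked subtraction
    // stays correct after the counters wrap past uint.MaxValue.
    public static uint Pending(uint head, uint tail) =>
        unchecked(tail - head);

    // Empty when both counters point at the same position.
    public static bool IsEmpty(uint head, uint tail) => head == tail;

    // Full when the pending count has reached the capacity.
    public static bool IsFull(uint head, uint tail, int capacity) =>
        Pending(head, tail) >= (uint)capacity;
}
```

With a real buffer, the inputs would be the values returned by `ReadHead()` and `ReadTail()`. If the library stores already-masked indices rather than free-running counters, the arithmetic would differ.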
WriteHead(uint)
Writes the head counter value to the GPU.
public void WriteHead(uint value)
Parameters
value (uint): Head counter value to write.
Remarks
For unified memory with system-scope atomics, uses Interlocked.Exchange for CPU-GPU coherent atomic writes.
WriteMessage(T, int, CancellationToken)
Writes a message to the GPU buffer at the specified index.
public void WriteMessage(T message, int index, CancellationToken cancellationToken = default)
Parameters
message (T): Message to write.
index (int): Index in the ring buffer (0 to Capacity-1).
cancellationToken (CancellationToken): Cancellation token.
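Both ReadMessage and WriteMessage take a slot index in the range 0 to Capacity-1. The sketch below shows how a caller might derive that index from a running sequence number, and where the slot would start in the buffer. It assumes fixed-size slots laid out contiguously, which this page does not state. `RingSlots` and its members are hypothetical names.

```csharp
// Hypothetical helper, not part of DotCompute.
public static class RingSlots
{
    // Maps a free-running sequence number onto a slot index.
    // The bit mask is only valid because capacity is a power of 2.
    public static int SlotIndex(uint sequence, int capacity) =>
        (int)(sequence & (uint)(capacity - 1));

    // Byte offset of a slot from the start of the message buffer,
    // assuming contiguous slots of messageSize bytes each.
    // Computed as long to avoid int overflow on large buffers.
    public static long ByteOffset(int index, int messageSize) =>
        (long)index * messageSize;
}
```

For a capacity of 8, sequence number 10 maps to slot 2; with 256-byte messages, slot 3 would start at byte offset 768.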
WriteTail(uint)
Writes the tail counter value to the GPU.
public void WriteTail(uint value)
Parameters
value (uint): Tail counter value to write.
Remarks
For unified memory with system-scope atomics (cuda::atomic<T, thread_scope_system>), uses Interlocked.Exchange, which provides:
- Atomic write semantics compatible with CUDA system-scope atomics
- Full memory barrier ensuring visibility across CPU-GPU boundary
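The read and write pattern described in these remarks can be illustrated on ordinary managed memory. The sketch below is not the library's implementation: `HostCounter` is a hypothetical class whose field stands in for a counter that would really live in unified memory. It assumes .NET 5 or later, where the `Interlocked.Exchange(ref uint, uint)` overload is available.

```csharp
using System.Threading;

// Hypothetical stand-in for a head or tail counter.
// The real counter would live in unified memory, not in a managed field.
public sealed class HostCounter
{
    private uint _value;

    // Volatile read: observes the most recently published value
    // and is not reordered with later memory operations.
    public uint Read() => Volatile.Read(ref _value);

    // Atomic write with a full memory barrier.
    // The previous value returned by Exchange is discarded.
    public void Write(uint value) => Interlocked.Exchange(ref _value, value);
}
```

`Interlocked.Exchange` is used for the write rather than `Volatile.Write` because it issues a full fence, which is the property the remarks above rely on.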