Class GpuRingBuffer<T>
- Namespace
- DotCompute.Backends.CUDA.RingKernels
- Assembly
- DotCompute.Backends.CUDA.dll
Manages GPU-resident ring buffer memory for message passing.
public sealed class GpuRingBuffer<T> : IGpuRingBuffer, IDisposable where T : IRingKernelMessage
Type Parameters
T
Message type implementing IRingKernelMessage.
- Inheritance
- object
- GpuRingBuffer<T>
- Implements
- IGpuRingBuffer
- IDisposable
Remarks
Allocates and manages GPU device memory for lock-free message queues:
- Message buffer (serialized MemoryPack data)
- Head/tail atomic counters for lock-free coordination
Supports two allocation modes:
- Unified Memory (cudaMallocManaged) - CPU and GPU share a single address space
- Device Memory (cudaMalloc) - explicit DMA transfers via cudaMemcpy
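As a hedged usage sketch of the two allocation modes (the message type, device ID, and sizes below are illustrative assumptions, not values prescribed by the library):

```csharp
using DotCompute.Backends.CUDA.RingKernels;

// MyMessage is a hypothetical type implementing IRingKernelMessage.
// Capacity must be a power of 2 (here 1024), or the constructor throws.
using var unified = new GpuRingBuffer<MyMessage>(
    deviceId: 0,
    capacity: 1024,
    messageSize: 256,
    useUnifiedMemory: true);   // cudaMallocManaged: CPU and GPU share one address space

using var device = new GpuRingBuffer<MyMessage>(
    deviceId: 0,
    capacity: 1024,
    messageSize: 256,
    useUnifiedMemory: false);  // cudaMalloc: explicit DMA transfers via cudaMemcpy
```

Unified memory simplifies host-side reads/writes of the counters; device memory trades that convenience for explicit, batched transfers.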
Constructors
GpuRingBuffer(int, int, int, bool, ILogger?)
Initializes a new instance of the GpuRingBuffer<T> class.
public GpuRingBuffer(int deviceId, int capacity, int messageSize, bool useUnifiedMemory, ILogger? logger = null)
Parameters
deviceId (int): CUDA device ID.
capacity (int): Ring buffer capacity (must be a power of 2).
messageSize (int): Size of each message in bytes.
useUnifiedMemory (bool): True to use unified memory (cudaMallocManaged); false to use device memory (cudaMalloc) with explicit DMA.
logger (ILogger?): Optional logger for diagnostics.
Exceptions
- ArgumentException
Thrown when capacity is not a power of 2.
- InvalidOperationException
Thrown when GPU allocation fails.
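The power-of-2 requirement can be checked with a standard bit trick (a sketch of one common validation, not necessarily the library's exact code):

```csharp
// A positive power of 2 has exactly one set bit, so clearing the lowest
// set bit (capacity & (capacity - 1)) must yield zero.
static bool IsPowerOfTwo(int capacity)
    => capacity > 0 && (capacity & (capacity - 1)) == 0;

// IsPowerOfTwo(1024) is true; IsPowerOfTwo(1000) is false, so a capacity
// of 1000 would trigger the ArgumentException described above.
```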
Properties
Capacity
Gets the capacity of the ring buffer (power of 2).
public int Capacity { get; }
Property Value
int
DeviceBufferPtr
Gets the device pointer to the message buffer.
public nint DeviceBufferPtr { get; }
Property Value
nint
DeviceHeadPtr
Gets the device pointer to the head atomic counter.
public nint DeviceHeadPtr { get; }
Property Value
nint
DeviceTailPtr
Gets the device pointer to the tail atomic counter.
public nint DeviceTailPtr { get; }
Property Value
nint
IsUnifiedMemory
Gets whether unified memory is being used.
public bool IsUnifiedMemory { get; }
Property Value
bool
MessageSize
Gets the size of each message in bytes.
public int MessageSize { get; }
Property Value
int
Methods
Dispose()
Performs application-defined tasks associated with freeing, releasing, or resetting unmanaged resources.
public void Dispose()
~GpuRingBuffer()
Finalizer: defense-in-depth release of the GPU buffer, head, and tail allocations in case Dispose was never called. Best-effort; swallows driver errors.
protected ~GpuRingBuffer()
ReadHead()
Reads the current head counter value from the GPU.
public uint ReadHead()
Returns
- uint
The current head counter value.
Remarks
For unified memory with system-scope atomics, uses Volatile.Read for CPU-GPU coherent atomic reads.
ReadMessage(int, CancellationToken)
Reads a message from the GPU buffer at the specified index.
public T ReadMessage(int index, CancellationToken cancellationToken = default)
Parameters
index (int): Index in the ring buffer (0 to Capacity - 1).
cancellationToken (CancellationToken): Cancellation token.
Returns
- T
The deserialized message.
ReadTail()
Reads the current tail counter value from the GPU.
public uint ReadTail()
Returns
- uint
The current tail counter value.
Remarks
For unified memory with system-scope atomics, uses Volatile.Read for CPU-GPU coherent atomic reads.
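ReadHead and ReadTail together let the host observe queue occupancy. A sketch, assuming the counters are monotonically increasing uint values rather than pre-masked indices (an assumption; the docs only state they are atomic):

```csharp
// With monotonic uint counters, unsigned subtraction handles wraparound.
uint head = ring.ReadHead();   // consumer position
uint tail = ring.ReadTail();   // producer position
uint pending = tail - head;    // messages written but not yet consumed

// Capacity is a power of 2, so masking maps a counter to a ring slot.
int slot = (int)(head & (uint)(ring.Capacity - 1));
```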
WriteHead(uint)
Writes the head counter value to the GPU.
public void WriteHead(uint value)
Parameters
value (uint): New head counter value.
Remarks
For unified memory with system-scope atomics, uses Interlocked.Exchange for CPU-GPU coherent atomic writes.
WriteMessage(T, int, CancellationToken)
Writes a message to the GPU buffer at the specified index.
public void WriteMessage(T message, int index, CancellationToken cancellationToken = default)
Parameters
message (T): Message to write.
index (int): Index in the ring buffer (0 to Capacity - 1).
cancellationToken (CancellationToken): Cancellation token.
WriteTail(uint)
Writes the tail counter value to the GPU.
public void WriteTail(uint value)
Parameters
value (uint): New tail counter value.
Remarks
For unified memory with system-scope atomics (cuda::atomic<T, thread_scope_system>), we use Interlocked.Exchange which provides:
- Atomic write semantics compatible with CUDA system-scope atomics
- Full memory barrier ensuring visibility across CPU-GPU boundary
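These barriers support the usual publish pattern for a CPU producer: write the payload first, then advance the tail. A hedged sketch (variable names and the wrap-around masking are illustrative assumptions):

```csharp
// Producer side: the full barrier in WriteTail (Interlocked.Exchange)
// ensures the GPU consumer observes the message contents before it
// observes the advanced tail value.
uint tail = ring.ReadTail();
int slot = (int)(tail & (uint)(ring.Capacity - 1)); // Capacity is a power of 2
ring.WriteMessage(message, slot);
ring.WriteTail(tail + 1);   // publish the message to the GPU consumer
```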