Interface IBarrierProvider
- Namespace
- DotCompute.Abstractions.Barriers
- Assembly
- DotCompute.Abstractions.dll
Provides GPU-native barrier synchronization primitives for coordinating thread execution.
public interface IBarrierProvider
Remarks
The barrier provider enables creation and management of hardware-accelerated synchronization barriers for GPU kernels. Barriers are essential for algorithms requiring phased computation where threads must coordinate their execution and ensure memory consistency.
Platform Support:
- CUDA (CC 1.0+): Thread-block barriers via __syncthreads()
- CUDA (CC 6.0+): Grid-wide barriers via Cooperative Groups
- CUDA (CC 7.0+): Named barriers (up to 16 per block)
- CUDA (CC 9.0+): Async barriers with transactions
- OpenCL 2.0+: Work-group barriers
Barrier Types:
| Scope | Characteristics |
|---|---|
| ThreadBlock | Fast (~10ns), limited to a single block, most common |
| Grid | Slower (~1-10μs), spans all blocks, requires cooperative launch |
| Warp | Fastest (~1ns), 32-thread SIMD groups |
| Tile | Flexible (~20ns), arbitrary thread subsets |
Usage Example:
var provider = accelerator.GetBarrierProvider();
if (provider != null)
{
    // Create a thread-block barrier for 256 threads
    using var barrier = provider.CreateBarrier(BarrierScope.ThreadBlock, capacity: 256);
    // In kernel: barrier.Sync(); // All threads synchronize here
}
Properties
ActiveBarrierCount
Gets the number of currently active (non-disposed) barriers.
int ActiveBarrierCount { get; }
Property Value
- int
The number of barriers that have been created and not yet disposed.
Remarks
Use this property to monitor barrier resource usage. Most algorithms require 1-4 barriers, but complex kernels may use up to the hardware limit (typically 16).
Resource Limits:
- Thread-block barriers: Up to 16 per block (named barriers)
- Grid barriers: Limited by device memory, typically hundreds
IsCooperativeLaunchEnabled
Gets whether cooperative kernel launch is currently enabled.
bool IsCooperativeLaunchEnabled { get; }
Property Value
- bool
True if cooperative launch is enabled, false otherwise.
Remarks
This property reflects the current configuration set via EnableCooperativeLaunch(bool). When true, all kernel launches will use cooperative mode, enabling grid-wide barriers.
Methods
CreateBarrier(BarrierScope, int, string?)
Creates a new barrier with the specified scope and capacity.
IBarrierHandle CreateBarrier(BarrierScope scope, int capacity, string? name = null)
Parameters
- scope BarrierScope
The synchronization scope determining which threads participate.
- capacity int
Maximum number of threads that will synchronize on this barrier. Must match the actual number of threads calling sync.
- name string
Optional barrier name for debugging and named barrier support (CC 7.0+). If specified, enables multiple distinct barriers within the same kernel.
Returns
- IBarrierHandle
A new barrier handle configured for the specified scope and capacity. The caller is responsible for disposing the handle when done.
Examples
// Simple thread-block barrier
using var blockBarrier = provider.CreateBarrier(BarrierScope.ThreadBlock, capacity: 512);
// Named barriers for multi-phase algorithms
using var phase1 = provider.CreateBarrier(BarrierScope.ThreadBlock, capacity: 512, name: "phase1");
using var phase2 = provider.CreateBarrier(BarrierScope.ThreadBlock, capacity: 512, name: "phase2");
// Grid-wide barrier (requires cooperative launch)
using var gridBarrier = provider.CreateBarrier(BarrierScope.Grid, capacity: gridSize * blockSize);
Remarks
Capacity Requirements:
- ThreadBlock: capacity ≤ block size (check device limits)
- Grid: capacity = grid size × block size (all threads in the kernel)
- Warp: capacity must be 32 (fixed warp size)
- Tile: capacity ≤ block size, any value
Named Barriers (CC 7.0+): Up to 16 named barriers can exist per thread block. Names enable multiple synchronization points in complex kernels without barrier ID conflicts.
Performance:
- Barrier creation: ~1μs overhead (once per kernel)
- Barrier sync: 1ns to 10μs depending on scope
- Named barriers: No performance penalty vs anonymous
Exceptions
- ArgumentOutOfRangeException
Thrown when:
- capacity ≤ 0
- capacity exceeds device limits for the specified scope
- Warp scope with capacity != 32
- InvalidOperationException
Thrown when:
- Grid scope on device with CC < 6.0
- Named barriers on device with CC < 7.0
- Maximum barrier count exceeded (16 per block)
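The capacity rules and the ArgumentOutOfRangeException conditions above can be checked before calling CreateBarrier. The following is a minimal sketch, not the library's implementation: the BarrierScope enum is redeclared locally as a stand-in so the snippet compiles on its own, BarrierCapacityCheck and its parameters are hypothetical names, and the Grid rule assumes capacity equals grid size × block size, as in the `gridSize * blockSize` example above.

```csharp
#nullable enable
using System;

// Local stand-in for the library's BarrierScope enum (assumption: same member names).
public enum BarrierScope { ThreadBlock, Grid, Warp, Tile }

// Hypothetical helper, not part of DotCompute.
public static class BarrierCapacityCheck
{
    public const int WarpSize = 32;

    // Returns null when the capacity satisfies the documented rules,
    // otherwise a short description of the rule that was violated.
    public static string? Validate(BarrierScope scope, int capacity, int blockSize, int gridSize)
    {
        if (capacity <= 0)
            return "capacity must be positive";

        switch (scope)
        {
            case BarrierScope.Warp:
                return capacity == WarpSize ? null : "Warp scope requires capacity == 32";

            case BarrierScope.ThreadBlock:
            case BarrierScope.Tile:
                return capacity <= blockSize ? null : "capacity exceeds block size";

            case BarrierScope.Grid:
                // Widen to long so large launches cannot overflow the product.
                long totalThreads = (long)gridSize * blockSize;
                return capacity == totalThreads
                    ? null
                    : "Grid scope requires capacity == gridSize * blockSize";

            default:
                throw new ArgumentOutOfRangeException(nameof(scope));
        }
    }
}
```

Returning a message instead of throwing keeps the check usable in configuration code that wants to report every problem at once; the device-specific limits (maximum block size, named barrier count) still have to come from the device itself.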
EnableCooperativeLaunch(bool)
Enables cooperative kernel launch mode for grid-wide barriers.
void EnableCooperativeLaunch(bool enable = true)
Parameters
- enable bool
True to enable cooperative launch, false to disable.
Remarks
Cooperative launch is required for Grid barriers. When enabled, kernels are launched using cudaLaunchCooperativeKernel, which guarantees that all thread blocks of the grid are resident on the device at the same time.
Requirements:
- Compute Capability 6.0+ (Pascal or newer)
- Grid size ≤ maximum cooperative grid size (see GetMaxCooperativeGridSize())
- Single GPU only (no multi-GPU cooperative launch yet)
Performance Impact: Cooperative launch may slightly increase kernel launch overhead (~10-50μs) but enables powerful grid-wide synchronization patterns.
Device Limits: Query the device's cooperative launch attribute (cudaDevAttrCooperativeLaunch in the CUDA runtime) to determine whether the device supports cooperative launch, and maxThreadsPerMultiProcessor to calculate the maximum concurrent grid size.
Exceptions
- NotSupportedException
Thrown when device does not support cooperative launch (CC < 6.0).
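Callers that prefer a boolean result over catching NotSupportedException can wrap the call in a guard. The sketch below is illustrative only: ICooperativeLaunchControl is a hypothetical stand-in that copies just the three IBarrierProvider members documented on this page, and FakeProvider is a hypothetical in-memory fake so the snippet runs without a GPU.

```csharp
using System;

// Hypothetical stand-in exposing only the members used here; the real interface is IBarrierProvider.
public interface ICooperativeLaunchControl
{
    bool IsCooperativeLaunchEnabled { get; }
    void EnableCooperativeLaunch(bool enable = true);
    int GetMaxCooperativeGridSize();
}

public static class CooperativeLaunchGuard
{
    // Tries to enable cooperative launch for a kernel with the given total thread count.
    // Returns false instead of throwing when the grid cannot be launched cooperatively.
    public static bool TryEnable(ICooperativeLaunchControl provider, int totalThreads)
    {
        int max = provider.GetMaxCooperativeGridSize();
        if (max == 0 || totalThreads > max)
            return false; // unsupported device, or grid too large to be resident at once

        try
        {
            provider.EnableCooperativeLaunch();
            return provider.IsCooperativeLaunchEnabled;
        }
        catch (NotSupportedException)
        {
            return false;
        }
    }
}

// Hypothetical fake used only to exercise the guard.
public sealed class FakeProvider : ICooperativeLaunchControl
{
    private readonly int _max;

    public FakeProvider(int maxCooperativeGridSize) => _max = maxCooperativeGridSize;

    public bool IsCooperativeLaunchEnabled { get; private set; }

    public void EnableCooperativeLaunch(bool enable = true)
    {
        if (_max == 0)
            throw new NotSupportedException("Cooperative launch is not supported.");
        IsCooperativeLaunchEnabled = enable;
    }

    public int GetMaxCooperativeGridSize() => _max;
}
```

Checking GetMaxCooperativeGridSize() first avoids using an exception for the common "older device" case, while the catch block still covers providers that report a non-zero size but reject the call.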
ExecuteWithBarrierAsync(ICompiledKernel, IBarrierHandle, object, object[], CancellationToken)
Launches a kernel with barrier support, automatically handling cooperative launch when needed.
Task ExecuteWithBarrierAsync(ICompiledKernel kernel, IBarrierHandle barrier, object config, object[] arguments, CancellationToken ct = default)
Parameters
- kernel ICompiledKernel
The compiled kernel to execute.
- barrier IBarrierHandle
The barrier to use for synchronization.
- config object
Launch configuration specifying grid/block dimensions. For CUDA, this should be a LaunchConfiguration object. The type is backend-specific.
- arguments object[]
Kernel arguments (the barrier handle is prepended automatically).
- ct CancellationToken
Cancellation token.
Returns
- Task
A task representing the asynchronous kernel execution.
Examples
// Thread-block barrier (standard launch)
using var blockBarrier = provider.CreateBarrier(BarrierScope.ThreadBlock, capacity: 256);
var config = new LaunchConfiguration
{
    GridSize = new Dim3(10, 1, 1),
    BlockSize = new Dim3(256, 1, 1)
};
await provider.ExecuteWithBarrierAsync(kernel, blockBarrier, config, args);
// Grid barrier (cooperative launch); capacity = 10 blocks × 256 threads
using var gridBarrier = provider.CreateBarrier(BarrierScope.Grid, capacity: 2560);
await provider.ExecuteWithBarrierAsync(kernel, gridBarrier, config, args);
Remarks
This convenience method simplifies kernel execution with barriers by:
- Automatically enabling cooperative launch for Grid barriers
- Validating barrier capacity matches launch configuration
- Prepending barrier handle as first kernel parameter
- Handling argument marshaling and cleanup
Barrier Scope Handling:
| Scope | Launch mode and validation |
|---|---|
| ThreadBlock | Standard launch, validates capacity ≤ block size |
| Grid | Cooperative launch required, validates capacity = grid × block |
| Warp | Standard launch, validates capacity = 32 |
| Tile | Standard launch, validates capacity ≤ block size |
Performance: Grid barriers incur ~10-50μs cooperative launch overhead but enable powerful grid-wide synchronization patterns. Use thread-block barriers when possible.
Exceptions
- ArgumentNullException
Thrown when kernel, barrier, or config is null.
- ArgumentException
Thrown when config is not of the expected backend-specific type.
- InvalidOperationException
Thrown when:
- Grid barrier used without cooperative launch support
- Barrier capacity doesn't match launch configuration
- Grid size exceeds maximum cooperative size
GetBarrier(string)
Gets an existing barrier by name, or null if not found.
IBarrierHandle? GetBarrier(string name)
Parameters
- name string
The barrier name specified during creation.
Returns
- IBarrierHandle
The barrier handle if found, otherwise null. Handle lifetime is managed by the creator; this method provides read-only access.
Remarks
This method enables barrier sharing across different parts of kernel code. Useful for complex algorithms where multiple functions need to synchronize on the same barrier.
Thread Safety: Multiple threads can safely call this method concurrently. The returned handle is thread-safe for sync operations.
GetMaxCooperativeGridSize()
Gets the maximum number of threads that can participate in a grid-wide barrier.
int GetMaxCooperativeGridSize()
Returns
- int
Maximum grid size for cooperative launch, or 0 if not supported. Value depends on device compute capability and available SM resources.
Remarks
This limit is computed from device properties:
maxGridSize = multiProcessorCount * maxThreadsPerMultiProcessor
Typical Values:
- Pascal (CC 6.0): ~50,000 threads
- Volta (CC 7.0): ~80,000 threads
- Ampere (CC 8.0): ~100,000+ threads
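The formula above can be worked through with concrete numbers. The sketch below is an illustration only: CooperativeGridMath is a hypothetical helper, and the inputs (20 multiprocessors, 2,048 resident threads per multiprocessor, 256 threads per block) are illustrative values rather than figures for any particular device. The product is an upper bound; register and shared memory usage of the kernel can lower the number of blocks that are actually resident.

```csharp
// Hypothetical helper, not part of DotCompute.
public static class CooperativeGridMath
{
    // Upper bound on concurrently resident threads:
    // multiprocessor count times maximum resident threads per multiprocessor.
    public static int MaxCooperativeThreads(int multiProcessorCount, int maxThreadsPerMultiProcessor)
        => checked(multiProcessorCount * maxThreadsPerMultiProcessor);

    // Largest number of whole blocks of the given size that fits under that bound.
    public static int MaxCooperativeBlocks(int multiProcessorCount, int maxThreadsPerMultiProcessor, int blockSize)
        => MaxCooperativeThreads(multiProcessorCount, maxThreadsPerMultiProcessor) / blockSize;
}
```

With the illustrative inputs, MaxCooperativeThreads(20, 2048) gives 40,960 threads, and MaxCooperativeBlocks(20, 2048, 256) gives 160 blocks, so a cooperative launch of 10 blocks of 256 threads would fit comfortably under this bound.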
ResetAllBarriers()
Destroys all barriers and resets the provider to initial state.
void ResetAllBarriers()
Remarks
⚠️ WARNING: This method forcefully destroys all barriers, including those that may have waiting threads. Only call this during cleanup or error recovery.
After reset, all existing IBarrierHandle references become invalid and will throw ObjectDisposedException if used.