Interface IBarrierProvider

Namespace: DotCompute.Abstractions.Barriers

Assembly: DotCompute.Abstractions.dll

Provides GPU-native barrier synchronization primitives for coordinating thread execution.

public interface IBarrierProvider

Remarks

The barrier provider enables creation and management of hardware-accelerated synchronization barriers for GPU kernels. Barriers are essential for algorithms requiring phased computation where threads must coordinate their execution and ensure memory consistency.

Platform Support:

CUDA (CC 1.0+): Thread-block barriers via __syncthreads()
CUDA (CC 6.0+): Grid-wide barriers via Cooperative Groups
CUDA (CC 7.0+): Named barriers (up to 16 per block)
CUDA (CC 9.0+): Async barriers with transactions
OpenCL 2.0+: Work-group barriers
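
The CUDA thresholds listed above can be expressed as a small feature check. The following is only a sketch: `BarrierFeatures`, `BarrierSupport`, and `FeaturesFor` are hypothetical names that are not part of DotCompute, and the thresholds are copied directly from the list above.

```csharp
using System;

// Hypothetical flags mirroring the CUDA rows of the platform support list.
[Flags]
public enum BarrierFeatures
{
    None = 0,
    ThreadBlock = 1, // CC 1.0+: __syncthreads()
    Grid = 2,        // CC 6.0+: Cooperative Groups
    Named = 4,       // CC 7.0+: named barriers
    Async = 8        // CC 9.0+: async barriers with transactions
}

public static class BarrierSupport
{
    // Maps a CUDA compute capability (major.minor) to the barrier features
    // the list attributes to it. Only the major version is inspected because
    // every threshold in the list falls on an x.0 boundary.
    public static BarrierFeatures FeaturesFor(int major, int minor)
    {
        if (major < 1)
            throw new ArgumentOutOfRangeException(nameof(major), "Major version must be at least 1.");
        if (minor < 0)
            throw new ArgumentOutOfRangeException(nameof(minor), "Minor version cannot be negative.");

        var features = BarrierFeatures.ThreadBlock;
        if (major >= 6) features |= BarrierFeatures.Grid;
        if (major >= 7) features |= BarrierFeatures.Named;
        if (major >= 9) features |= BarrierFeatures.Async;
        return features;
    }
}
```

Under these assumptions, `BarrierSupport.FeaturesFor(7, 5)` includes `Grid` and `Named` but not `Async`.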

Barrier Types:

ThreadBlock: Fastest (~10 ns), limited to a single block, most common
Grid: Slower (~1-10 μs), spans all blocks, requires cooperative launch
Warp: Ultra-fast (~1 ns), 32-thread SIMD groups
Tile: Flexible (~20 ns), arbitrary thread subsets

Usage Example:

var provider = accelerator.GetBarrierProvider();
if (provider != null)
{
    // Create thread-block barrier for 256 threads
    using var barrier = provider.CreateBarrier(BarrierScope.ThreadBlock, capacity: 256);

    // In kernel: barrier.Sync(); // All threads synchronize here
}

Properties

ActiveBarrierCount

Gets the number of currently active (non-disposed) barriers.

int ActiveBarrierCount { get; }

Property Value

int

Remarks

Use this property to monitor barrier resource usage. Most algorithms require 1-4 barriers, but complex kernels may use up to the hardware limit (typically 16).

Resource Limits:

Thread-block barriers: Up to 16 per block (named barriers)
Grid barriers: Limited by device memory, typically hundreds

IsCooperativeLaunchEnabled

Gets whether cooperative kernel launch is currently enabled.

bool IsCooperativeLaunchEnabled { get; }

Property Value

bool: True if cooperative launch is enabled, false otherwise.

Remarks

This property reflects the current configuration set via EnableCooperativeLaunch(bool). When true, all kernel launches will use cooperative mode, enabling grid-wide barriers.

Methods

CreateBarrier(BarrierScope, int, string?)

Creates a new barrier with the specified scope and capacity.

IBarrierHandle CreateBarrier(BarrierScope scope, int capacity, string? name = null)

Parameters

scope BarrierScope: The synchronization scope determining which threads participate.
capacity int: Maximum number of threads that will synchronize on this barrier. Must match the actual number of threads calling sync.
name string: Optional barrier name for debugging and named barrier support (CC 7.0+). If specified, enables multiple distinct barriers within the same kernel.

Returns

IBarrierHandle: A new barrier handle configured for the specified scope and capacity. The caller is responsible for disposing the handle when done.

Examples

// Simple thread-block barrier
using var blockBarrier = provider.CreateBarrier(BarrierScope.ThreadBlock, capacity: 512);

// Named barriers for multi-phase algorithms
using var phase1 = provider.CreateBarrier(BarrierScope.ThreadBlock, capacity: 512, name: "phase1");
using var phase2 = provider.CreateBarrier(BarrierScope.ThreadBlock, capacity: 512, name: "phase2");

// Grid-wide barrier (requires cooperative launch)
using var gridBarrier = provider.CreateBarrier(BarrierScope.Grid, capacity: gridSize * blockSize);

Remarks

Capacity Requirements:

ThreadBlock: capacity ≤ block size (check device limits)
Grid: capacity = grid size × block size (all threads in the kernel)
Warp: capacity must be 32 (fixed warp size)
Tile: capacity ≤ block size, any value
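
The capacity rules above can be checked before calling CreateBarrier. The sketch below is illustrative, not the library's validation logic: `Scope` is a hypothetical local stand-in for BarrierScope, `CapacityRules` is a made-up helper, and the Grid rule is interpreted as the total thread count (`gridSize * blockSize`), as in the grid-wide example above.

```csharp
using System;

// Hypothetical stand-in for DotCompute's BarrierScope, declared locally so
// the sketch compiles on its own.
public enum Scope { ThreadBlock, Grid, Warp, Tile }

public static class CapacityRules
{
    public const int WarpSize = 32;

    // Returns true when `capacity` satisfies the rule listed for `scope`.
    // blockSize is the number of threads per block; gridSize is the number of blocks.
    public static bool IsValid(Scope scope, int capacity, int blockSize, int gridSize)
    {
        if (capacity <= 0) return false;

        switch (scope)
        {
            case Scope.ThreadBlock:
            case Scope.Tile:
                return capacity <= blockSize;
            case Scope.Grid:
                // Widen to long so large launches cannot overflow the product.
                return capacity == (long)gridSize * blockSize;
            case Scope.Warp:
                return capacity == WarpSize;
            default:
                throw new ArgumentOutOfRangeException(nameof(scope));
        }
    }
}
```

A check like this only mirrors the documented rules; device-specific limits (for example, the maximum block size) still need to be queried from the device.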

Named Barriers (CC 7.0+): Up to 16 named barriers can exist per thread block. Names enable multiple synchronization points in complex kernels without barrier ID conflicts.

Performance:

Barrier creation: ~1μs overhead (once per kernel)
Barrier sync: 1ns to 10μs depending on scope
Named barriers: No performance penalty vs anonymous

Exceptions

ArgumentOutOfRangeException

Thrown when:

capacity ≤ 0
capacity exceeds device limits for the specified scope
Warp scope with capacity != 32

InvalidOperationException

Thrown when:

Grid scope on device with CC < 6.0
Named barriers on device with CC < 7.0
Maximum barrier count exceeded (16 per block)

EnableCooperativeLaunch(bool)

Enables cooperative kernel launch mode for grid-wide barriers.

void EnableCooperativeLaunch(bool enable = true)

Parameters

enable bool: True to enable cooperative launch, false to disable.

Remarks

Cooperative launch is required for Grid barriers. When enabled, kernels are launched using cudaLaunchCooperativeKernel, which guarantees that all thread blocks of the grid are resident on the device at the same time.

Requirements:

Compute Capability 6.0+ (Pascal or newer)
Grid size ≤ maximum cooperative grid size (see GetMaxCooperativeGridSize())
Single GPU only (no multi-GPU cooperative launch yet)

Performance Impact: Cooperative launch may slightly increase kernel launch overhead (~10-50μs) but enables powerful grid-wide synchronization patterns.

Device Limits: Query the device's cooperativeLaunch attribute to determine whether it supports cooperative launch, and use multiProcessorCount and maxThreadsPerMultiProcessor to calculate the maximum concurrent grid size.

Exceptions

NotSupportedException: Thrown when device does not support cooperative launch (CC < 6.0).

ExecuteWithBarrierAsync(ICompiledKernel, IBarrierHandle, object, object[], CancellationToken)

Launches a kernel with barrier support, automatically handling cooperative launch when needed.

Task ExecuteWithBarrierAsync(ICompiledKernel kernel, IBarrierHandle barrier, object config, object[] arguments, CancellationToken ct = default)

Parameters

kernel ICompiledKernel: The compiled kernel to execute.
barrier IBarrierHandle: The barrier to use for synchronization.
config object: Launch configuration specifying grid/block dimensions. For CUDA, this should be a LaunchConfiguration object. The type is backend-specific.
arguments object[]: Kernel arguments (barrier handle will be prepended automatically).
ct CancellationToken: Cancellation token.

Returns

Task: A task representing the asynchronous kernel execution.

Examples

// Thread-block barrier (standard launch)
using var blockBarrier = provider.CreateBarrier(BarrierScope.ThreadBlock, capacity: 256);
var config = new LaunchConfiguration
{
    GridSize = new Dim3(10, 1, 1),
    BlockSize = new Dim3(256, 1, 1)
};
await provider.ExecuteWithBarrierAsync(kernel, blockBarrier, config, args);

// Grid barrier (cooperative launch); capacity = 10 blocks × 256 threads
using var gridBarrier = provider.CreateBarrier(BarrierScope.Grid, capacity: 2560);
await provider.ExecuteWithBarrierAsync(kernel, gridBarrier, config, args);

Remarks

This convenience method simplifies kernel execution with barriers by:

Automatically enabling cooperative launch for Grid barriers
Validating barrier capacity matches launch configuration
Prepending barrier handle as first kernel parameter
Handling argument marshaling and cleanup

Barrier Scope Handling:

ThreadBlock: Standard launch, validates capacity ≤ block size
Grid: Cooperative launch required, validates capacity = grid × block
Warp: Standard launch, validates capacity = 32
Tile: Standard launch, validates capacity ≤ block size
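
To make the scope handling above concrete, the sketch below derives the launch mode and the thread counts that the capacity checks compare against. It is an assumption-laden illustration, not the method's actual implementation: `Dims`, `LaunchScope`, and `LaunchPlanning` are hypothetical local stand-ins for the backend's Dim3, BarrierScope, and internal logic.

```csharp
using System;

// Hypothetical stand-in for a 3-component launch dimension such as Dim3.
public readonly struct Dims
{
    public Dims(int x, int y, int z) { X = x; Y = y; Z = z; }

    public int X { get; }
    public int Y { get; }
    public int Z { get; }

    // Number of elements covered by the three dimensions.
    public long Volume => (long)X * Y * Z;
}

// Hypothetical stand-in for BarrierScope.
public enum LaunchScope { ThreadBlock, Grid, Warp, Tile }

public static class LaunchPlanning
{
    // Only Grid barriers need a cooperative launch; all other scopes
    // use a standard launch.
    public static bool NeedsCooperativeLaunch(LaunchScope scope) =>
        scope == LaunchScope.Grid;

    // Threads in one block: the upper bound for ThreadBlock and Tile capacity.
    public static long ThreadsPerBlock(Dims block) => block.Volume;

    // Threads in the whole launch: the value a Grid barrier's capacity must equal.
    public static long TotalThreads(Dims grid, Dims block) =>
        grid.Volume * block.Volume;
}
```

With a grid of (10, 1, 1) and a block of (256, 1, 1), `TotalThreads` evaluates to 2560, which is the Grid capacity used in the Examples section.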

Performance: Grid barriers incur ~10-50μs cooperative launch overhead but enable powerful grid-wide synchronization patterns. Use thread-block barriers when possible.

Exceptions

ArgumentNullException

Thrown when kernel, barrier, or config is null.

ArgumentException

Thrown when config is not of the expected backend-specific type.

InvalidOperationException

Thrown when:

Grid barrier used without cooperative launch support
Barrier capacity doesn't match launch configuration
Grid size exceeds maximum cooperative size

GetBarrier(string)

Gets an existing barrier by name, or null if not found.

IBarrierHandle? GetBarrier(string name)

Parameters

name string: The barrier name specified during creation.

Returns

IBarrierHandle: The barrier handle if found, otherwise null. Handle lifetime is managed by the creator; this method provides read-only access.

Remarks

This method enables barrier sharing across different parts of kernel code. Useful for complex algorithms where multiple functions need to synchronize on the same barrier.

Thread Safety: Multiple threads can safely call this method concurrently. The returned handle is thread-safe for sync operations.

GetMaxCooperativeGridSize()

Gets the maximum number of threads that can participate in a grid-wide barrier.

int GetMaxCooperativeGridSize()

Returns

int: Maximum number of threads in a cooperative launch, or 0 if cooperative launch is not supported. The value depends on device compute capability and available SM resources.

Remarks

This limit is computed from device properties:

maxGridSize = multiProcessorCount * maxThreadsPerMultiProcessor
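
As a worked illustration of the formula above, the following sketch computes the bound and checks whether a launch fits under it. `CooperativeLimits` and its members are hypothetical helpers, not DotCompute APIs, and the sketch returns `long` rather than `int` only to avoid overflow in the multiplication.

```csharp
using System;

public static class CooperativeLimits
{
    // Upper bound on concurrently resident threads:
    // SM count × maximum resident threads per SM.
    public static long MaxCooperativeThreads(int multiProcessorCount, int maxThreadsPerMultiProcessor)
    {
        if (multiProcessorCount < 0)
            throw new ArgumentOutOfRangeException(nameof(multiProcessorCount));
        if (maxThreadsPerMultiProcessor < 0)
            throw new ArgumentOutOfRangeException(nameof(maxThreadsPerMultiProcessor));

        return (long)multiProcessorCount * maxThreadsPerMultiProcessor;
    }

    // True when a launch of gridBlocks × threadsPerBlock threads fits under
    // the bound. A bound of 0 means cooperative launch is not supported.
    public static bool Fits(int gridBlocks, int threadsPerBlock, long maxCooperativeThreads) =>
        maxCooperativeThreads > 0 &&
        (long)gridBlocks * threadsPerBlock <= maxCooperativeThreads;
}
```

For a hypothetical device with 20 SMs and 2048 resident threads per SM, the bound is 40,960 threads, so a launch of 10 blocks × 256 threads fits. On real hardware the limit for a particular kernel can be lower, because register and shared-memory usage may reduce how many blocks are resident per SM.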

Typical Values (approximate; the exact limit depends on the specific device):

Pascal (CC 6.0): ~50,000 threads
Volta (CC 7.0): ~80,000 threads
Ampere (CC 8.0): ~100,000+ threads

ResetAllBarriers()

Destroys all barriers and resets the provider to initial state.

void ResetAllBarriers()

Remarks

⚠️ WARNING: This method forcefully destroys all barriers, including those that may have waiting threads. Only call this during cleanup or error recovery.

After reset, all existing IBarrierHandle references become invalid and will throw ObjectDisposedException if used.
