Interface IBarrierProvider

Namespace
DotCompute.Abstractions.Barriers
Assembly
DotCompute.Abstractions.dll

Provides GPU-native barrier synchronization primitives for coordinating thread execution.

public interface IBarrierProvider

Remarks

The barrier provider enables creation and management of hardware-accelerated synchronization barriers for GPU kernels. Barriers are essential for algorithms requiring phased computation where threads must coordinate their execution and ensure memory consistency.

Platform Support:

  • CUDA (CC 1.0+): Thread-block barriers via __syncthreads()
  • CUDA (CC 6.0+): Grid-wide barriers via Cooperative Groups
  • CUDA (CC 7.0+): Named barriers (up to 16 per block)
  • CUDA (CC 9.0+): Async barriers with transactions
  • OpenCL 2.0+: Work-group barriers

Barrier Types:

  • ThreadBlock: Fastest (~10ns), limited to single block, most common
  • Grid: Slower (~1-10μs), spans all blocks, requires cooperative launch
  • Warp: Ultra-fast (~1ns), 32-thread SIMD groups
  • Tile: Flexible (~20ns), arbitrary thread subsets

Usage Example:

var provider = accelerator.GetBarrierProvider();
if (provider != null)
{
    // Create thread-block barrier for 256 threads
    using var barrier = provider.CreateBarrier(BarrierScope.ThreadBlock, capacity: 256);

    // In kernel: barrier.Sync(); // All threads synchronize here
}

Properties

ActiveBarrierCount

Gets the number of currently active (non-disposed) barriers.

int ActiveBarrierCount { get; }

Property Value

int

Remarks

Use this property to monitor barrier resource usage. Most algorithms require 1-4 barriers, but complex kernels may use up to the hardware limit (typically 16).

Resource Limits:

  • Thread-block barriers: Up to 16 per block (named barriers)
  • Grid barriers: Limited by device memory, typically hundreds

IsCooperativeLaunchEnabled

Gets whether cooperative kernel launch is currently enabled.

bool IsCooperativeLaunchEnabled { get; }

Property Value

bool

True if cooperative launch is enabled, false otherwise.

Remarks

This property reflects the current configuration set via EnableCooperativeLaunch(bool). When true, all kernel launches will use cooperative mode, enabling grid-wide barriers.

Methods

CreateBarrier(BarrierScope, int, string?)

Creates a new barrier with the specified scope and capacity.

IBarrierHandle CreateBarrier(BarrierScope scope, int capacity, string? name = null)

Parameters

scope BarrierScope

The synchronization scope determining which threads participate.

capacity int

Number of threads that will synchronize on this barrier. Must match the actual number of threads calling sync, and must not exceed the limit for the chosen scope.

name string

Optional barrier name for debugging and named barrier support (CC 7.0+). If specified, enables multiple distinct barriers within the same kernel.

Returns

IBarrierHandle

A new barrier handle configured for the specified scope and capacity. The caller is responsible for disposing the handle when done.

Examples

// Simple thread-block barrier
using var blockBarrier = provider.CreateBarrier(BarrierScope.ThreadBlock, capacity: 512);

// Named barriers for multi-phase algorithms
using var phase1 = provider.CreateBarrier(BarrierScope.ThreadBlock, capacity: 512, name: "phase1");
using var phase2 = provider.CreateBarrier(BarrierScope.ThreadBlock, capacity: 512, name: "phase2");

// Grid-wide barrier (requires cooperative launch)
using var gridBarrier = provider.CreateBarrier(BarrierScope.Grid, capacity: gridSize * blockSize);

Remarks

Capacity Requirements:

  • ThreadBlock: capacity ≤ block size (check device limits)
  • Grid: capacity = grid size × block size (all threads in the kernel)
  • Warp: capacity must be 32 (fixed warp size)
  • Tile: capacity ≤ block size, any value
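
The capacity rules above can be written as a small pre-flight check. This is a minimal sketch, not DotCompute code: the Scope enum is a hypothetical stand-in for BarrierScope so the example compiles on its own, CapacityRules.IsValid is an illustrative helper, and the Grid rule assumes capacity equals grid size × block size as in the grid-barrier example above. The provider still performs its own validation when CreateBarrier is called.

```csharp
using System;

// Hypothetical stand-in for BarrierScope so the sketch compiles on its own.
public enum Scope { ThreadBlock, Grid, Warp, Tile }

public static class CapacityRules
{
    public const int WarpSize = 32;

    // Returns true when capacity satisfies the documented rule for the scope.
    // blockSize = threads per block, gridSize = blocks per grid.
    public static bool IsValid(Scope scope, int capacity, int blockSize, int gridSize)
    {
        if (capacity <= 0) return false;

        return scope switch
        {
            Scope.ThreadBlock => capacity <= blockSize,
            Scope.Tile => capacity <= blockSize,
            Scope.Warp => capacity == WarpSize,
            // long arithmetic avoids int overflow for very large launches
            Scope.Grid => capacity == (long)gridSize * blockSize,
            _ => false,
        };
    }
}
```

For a launch of 10 blocks with 256 threads each, IsValid(Scope.Grid, 2560, 256, 10) returns true, while IsValid(Scope.Warp, 64, 256, 10) returns false because warp barriers require exactly 32 threads.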

Named Barriers (CC 7.0+): Up to 16 named barriers can exist per thread block. Names enable multiple synchronization points in complex kernels without barrier ID conflicts.

Performance:

  • Barrier creation: ~1μs overhead (once per kernel)
  • Barrier sync: 1ns to 10μs depending on scope
  • Named barriers: No performance penalty vs anonymous

Exceptions

ArgumentOutOfRangeException

Thrown when:

  • capacity ≤ 0
  • capacity exceeds device limits for the specified scope
  • Warp scope with capacity != 32

InvalidOperationException

Thrown when:

  • Grid scope on device with CC < 6.0
  • Named barriers on device with CC < 7.0
  • Maximum barrier count exceeded (16 per block)

EnableCooperativeLaunch(bool)

Enables cooperative kernel launch mode for grid-wide barriers.

void EnableCooperativeLaunch(bool enable = true)

Parameters

enable bool

True to enable cooperative launch, false to disable.

Remarks

Cooperative launch is required for Grid barriers. When enabled, kernels are launched using cudaLaunchCooperativeKernel, which guarantees that all thread blocks of the grid are resident on the device at the same time so that every thread can reach the barrier.

Requirements:

  • Compute Capability 6.0+ (Pascal or newer)
  • Grid size ≤ maximum cooperative grid size for the device (see GetMaxCooperativeGridSize())
  • Single GPU only (no multi-GPU cooperative launch yet)

Performance Impact: Cooperative launch may slightly increase kernel launch overhead (~10-50μs) but enables powerful grid-wide synchronization patterns.

Device Limits: Query the device's cooperative launch attribute (cudaDevAttrCooperativeLaunch in the CUDA runtime) to determine whether it supports cooperative launch, and use multiProcessorCount and maxThreadsPerMultiProcessor to calculate the maximum concurrent grid size.
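
The requirements above suggest checking the device limit before switching cooperative mode on. The following is a minimal sketch under stated assumptions: ICooperativeProvider is a trimmed, hypothetical stand-in that copies only three members documented on this page so the example compiles without the DotCompute package, and CooperativeSetup.TryEnable is an illustrative helper, not part of the library. Real code would pass an IBarrierProvider instead.

```csharp
using System;

// Trimmed, hypothetical stand-in: only the members this sketch needs,
// with the signatures documented for IBarrierProvider on this page.
public interface ICooperativeProvider
{
    bool IsCooperativeLaunchEnabled { get; }
    int GetMaxCooperativeGridSize();
    void EnableCooperativeLaunch(bool enable = true);
}

public static class CooperativeSetup
{
    // Returns true if cooperative launch ends up enabled and a launch of
    // totalThreads (grid size x block size) fits within the device limit.
    public static bool TryEnable(ICooperativeProvider provider, int totalThreads)
    {
        if (provider is null) throw new ArgumentNullException(nameof(provider));

        // Documented behaviour: 0 means cooperative launch is not supported.
        int max = provider.GetMaxCooperativeGridSize();
        if (max == 0 || totalThreads <= 0 || totalThreads > max)
            return false;

        if (!provider.IsCooperativeLaunchEnabled)
            provider.EnableCooperativeLaunch();

        return provider.IsCooperativeLaunchEnabled;
    }
}
```

Checking GetMaxCooperativeGridSize() first avoids calling EnableCooperativeLaunch on a device where it would throw NotSupportedException, and lets the caller fall back to thread-block barriers when the grid is too large.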

Exceptions

NotSupportedException

Thrown when device does not support cooperative launch (CC < 6.0).

ExecuteWithBarrierAsync(ICompiledKernel, IBarrierHandle, object, object[], CancellationToken)

Launches a kernel with barrier support, automatically handling cooperative launch when needed.

Task ExecuteWithBarrierAsync(ICompiledKernel kernel, IBarrierHandle barrier, object config, object[] arguments, CancellationToken ct = default)

Parameters

kernel ICompiledKernel

The compiled kernel to execute.

barrier IBarrierHandle

The barrier to use for synchronization.

config object

Launch configuration specifying grid/block dimensions. For CUDA, this should be a LaunchConfiguration object. The type is backend-specific.

arguments object[]

Kernel arguments (barrier handle will be prepended automatically).

ct CancellationToken

Cancellation token.

Returns

Task

A task representing the asynchronous kernel execution.

Examples

// Thread-block barrier (standard launch)
using var blockBarrier = provider.CreateBarrier(BarrierScope.ThreadBlock, capacity: 256);
var config = new LaunchConfiguration
{
    GridSize = new Dim3(10, 1, 1),
    BlockSize = new Dim3(256, 1, 1)
};
await provider.ExecuteWithBarrierAsync(kernel, blockBarrier, config, args);

// Grid barrier (cooperative launch); capacity = 10 blocks × 256 threads = 2560
using var gridBarrier = provider.CreateBarrier(BarrierScope.Grid, capacity: 2560);
await provider.ExecuteWithBarrierAsync(kernel, gridBarrier, config, args);

Remarks

This convenience method simplifies kernel execution with barriers by:

  • Automatically enabling cooperative launch for Grid barriers
  • Validating barrier capacity matches launch configuration
  • Prepending barrier handle as first kernel parameter
  • Handling argument marshaling and cleanup

Barrier Scope Handling:

  • ThreadBlock: Standard launch, validates capacity ≤ block size
  • Grid: Cooperative launch required, validates capacity = grid × block
  • Warp: Standard launch, validates capacity = 32
  • Tile: Standard launch, validates capacity ≤ block size

Performance: Grid barriers incur ~10-50μs cooperative launch overhead but enable powerful grid-wide synchronization patterns. Use thread-block barriers when possible.

Exceptions

ArgumentNullException

Thrown when kernel, barrier, or config is null.

ArgumentException

Thrown when config is not of the expected backend-specific type.

InvalidOperationException

Thrown when:

  • Grid barrier used without cooperative launch support
  • Barrier capacity doesn't match launch configuration
  • Grid size exceeds maximum cooperative size

GetBarrier(string)

Gets an existing barrier by name, or null if not found.

IBarrierHandle? GetBarrier(string name)

Parameters

name string

The barrier name specified during creation.

Returns

IBarrierHandle

The barrier handle if found, otherwise null. Handle lifetime is managed by the creator; this method provides read-only access.

Remarks

This method enables barrier sharing across different parts of kernel code. Useful for complex algorithms where multiple functions need to synchronize on the same barrier.

Thread Safety: Multiple threads can safely call this method concurrently. The returned handle is thread-safe for sync operations.

GetMaxCooperativeGridSize()

Gets the maximum number of threads that can participate in a grid-wide barrier.

int GetMaxCooperativeGridSize()

Returns

int

Maximum total number of threads (grid size × block size) for a cooperative launch, or 0 if cooperative launch is not supported. The value depends on device compute capability and available SM resources.

Remarks

This limit is computed from device properties:

maxGridSize = multiProcessorCount * maxThreadsPerMultiProcessor

Typical Values:

  • Pascal (CC 6.0): ~50,000 threads
  • Volta (CC 7.0): ~80,000 threads
  • Ampere (CC 8.0): ~100,000+ threads
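
As a worked example of the formula above, the sketch below computes the limit and checks whether a launch fits. It is illustrative only: CooperativeLimit and its methods are hypothetical helpers, and the device figures in the usage note (20 multiprocessors, 2,048 threads per multiprocessor) are assumed values, not measurements from a specific GPU. The result is an upper bound, since the remarks note that the real limit also depends on available SM resources.

```csharp
using System;

public static class CooperativeLimit
{
    // Formula from the remarks:
    // maxGridSize = multiProcessorCount * maxThreadsPerMultiProcessor
    public static long MaxGridThreads(int multiProcessorCount, int maxThreadsPerMultiProcessor)
        => (long)multiProcessorCount * maxThreadsPerMultiProcessor;

    // True when gridSize blocks of blockSize threads stay within the limit.
    public static bool Fits(int gridSize, int blockSize, long maxGridThreads)
        => (long)gridSize * blockSize <= maxGridThreads;
}
```

With the assumed figures, MaxGridThreads(20, 2048) gives 40,960 threads. A launch of 10 blocks × 256 threads (2,560 threads) fits, while 200 blocks × 256 threads (51,200 threads) does not.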

ResetAllBarriers()

Destroys all barriers and resets the provider to initial state.

void ResetAllBarriers()

Remarks

⚠️ WARNING: This method forcefully destroys all barriers, including those that may have waiting threads. Only call this during cleanup or error recovery.

After reset, all existing IBarrierHandle references become invalid and will throw ObjectDisposedException if used.