Enum FenceType

Namespace: DotCompute.Abstractions.Memory

Assembly: DotCompute.Abstractions.dll

Specifies the scope of memory fence operations for GPU synchronization.

public enum FenceType

Fields

Device = 1

Device-wide fence ensuring memory consistency across all thread blocks on a single GPU.

CUDA: __threadfence()

Visibility: Writes made by the calling thread before the fence are observed by all threads on the same GPU as occurring before any writes it makes after the fence. Does not guarantee visibility to the host CPU or other GPUs.

Use Cases:

Grid-wide producer-consumer patterns
Device-global data structure updates
Inter-block communication via global memory

Performance: ~100ns latency (medium overhead).

System = 2

System-wide fence ensuring memory consistency across CPU, GPU, and all devices.

CUDA: __threadfence_system()

Visibility: Writes made by the calling thread before the fence are observed by all processors in the system (CPU and all GPUs) as occurring before any writes it makes after the fence. Strongest consistency guarantee, highest overhead.

Use Cases:

GPU-CPU communication via mapped/pinned memory
Multi-GPU synchronization
System-wide distributed data structures
Causal message passing in Orleans.GpuBridge.Core

Performance: ~200ns latency (slowest, strongest guarantee).

Requirements: Requires unified virtual addressing (UVA) on CUDA.

ThreadBlock = 0

Thread-block scope fence ensuring memory consistency within a single thread block.

CUDA: __threadfence_block()

Visibility: Writes made by the calling thread before the fence are observed by all threads in the same thread block as occurring before any writes it makes after the fence. Does not guarantee visibility to threads in other blocks.

Use Cases:

Producer-consumer patterns within a block
Shared memory synchronization
Block-local data structure updates

Performance: ~10ns latency (fastest fence type).

Remarks

Memory fences control the order in which a thread's memory operations become visible across different levels of the GPU memory hierarchy. The fence type selects the scope of that guarantee: the threads in one thread block, all threads on one device, or every processor in the system.
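
As a host-side analogy only, the ordering that a fence provides can be sketched in standard C++. This is not DotCompute or CUDA code: std::atomic_thread_fence stands in for a call such as __threadfence(), and the names payload, ready, producer and consumer are hypothetical. The producer writes data, issues a fence, then sets a flag. A consumer that sees the flag and issues its own fence is guaranteed to see the data.

```cpp
#include <atomic>

// Hypothetical shared state for the sketch.
int payload = 0;                 // ordinary data written by the producer
std::atomic<bool> ready{false};  // flag that publishes the data

// Producer: write the data, fence, then raise the flag.
void producer() {
    payload = 42;
    // Orders the write to payload before the flag store below.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

// Consumer: wait for the flag, fence, then read the data.
int consumer() {
    while (!ready.load(std::memory_order_relaxed)) {
        // Spin until the producer raises the flag.
    }
    // Orders the flag load above before the read of payload below.
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;
}
```

In practice producer and consumer would run on different threads. On a GPU, the fence scope decides how far apart those two threads may be (same block, same device, or anywhere in the system) for the guarantee to hold.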

Platform Mapping:

CUDA: __threadfence_block(), __threadfence(), __threadfence_system()
OpenCL: mem_fence(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE)
Metal: threadgroup_barrier(), device_barrier(), system_barrier()
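
The CUDA row of this mapping can be written as a small lookup. The sketch below is a hypothetical C++ mirror of the enum, not the DotCompute implementation. The enum values and intrinsic names are taken from the Fields section, and cuda_intrinsic is an illustrative helper name.

```cpp
#include <string>

// Hypothetical C++ mirror of FenceType; values match the Fields section.
enum class FenceType { ThreadBlock = 0, Device = 1, System = 2 };

// Returns the name of the CUDA intrinsic documented for each fence scope.
std::string cuda_intrinsic(FenceType type) {
    switch (type) {
        case FenceType::ThreadBlock: return "__threadfence_block()";
        case FenceType::Device:      return "__threadfence()";
        case FenceType::System:      return "__threadfence_system()";
    }
    return "";  // Only reached for values outside the enum.
}
```

A code generator could use a lookup of this shape to emit the fence call for a kernel. Whether DotCompute does so is an assumption.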

Performance vs. Scope Trade-off:

FenceType	Relative cost	Visibility
ThreadBlock	Fastest (~10ns)	Limited to one thread block
Device	Medium (~100ns)	Intra-device
System	Slowest (~200ns)	Inter-device (CPU and all GPUs)
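
One way to apply this trade-off is to pick the narrowest scope that still covers the farthest observer of the writes. The sketch below assumes a hypothetical Observer enum and helper names (narrowest_fence, approx_latency_ns). The latency figures are the approximate values quoted on this page, not measurements.

```cpp
// Hypothetical C++ mirror of FenceType; values match the Fields section.
enum class FenceType { ThreadBlock = 0, Device = 1, System = 2 };

// Hypothetical description of the farthest party that must see the writes.
enum class Observer { SameBlock, OtherBlockSameGpu, HostOrOtherGpu };

// Chooses the cheapest fence whose scope still reaches the observer.
FenceType narrowest_fence(Observer farthest) {
    switch (farthest) {
        case Observer::SameBlock:         return FenceType::ThreadBlock;
        case Observer::OtherBlockSameGpu: return FenceType::Device;
        case Observer::HostOrOtherGpu:    return FenceType::System;
    }
    return FenceType::System;  // Fall back to the strongest guarantee.
}

// Approximate latency in nanoseconds, as quoted in the documentation.
int approx_latency_ns(FenceType type) {
    switch (type) {
        case FenceType::ThreadBlock: return 10;
        case FenceType::Device:      return 100;
        case FenceType::System:      return 200;
    }
    return 200;  // Treat unknown values as the most expensive scope.
}
```

For example, a kernel that only passes data between threads of one block would get ThreadBlock here and avoid the roughly tenfold cost of a Device fence.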
