
Enum MemoryConsistencyModel

Namespace: DotCompute.Abstractions.Memory
Assembly: DotCompute.Abstractions.dll

Defines the memory consistency model for GPU kernel execution.

public enum MemoryConsistencyModel

Fields

Relaxed = 0

Relaxed memory consistency: no ordering guarantees between threads.

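In the relaxed model, threads may observe memory operations in any order unless they are explicitly synchronized with fences or with atomic operations that carry ordering semantics (a relaxed atomic is indivisible but does not order surrounding accesses). This is the default GPU memory model.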

Example:

Initially: A = 0; B = 0;
Thread 1: A = 1; B = 2;
Thread 2: r1 = B; r2 = A;  // May see r1=2, r2=0 (reordering)

Performance: 1.0× baseline (no overhead).

Use When: Data-parallel algorithms with independent operations, no inter-thread communication, or manual fence management.
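To illustrate the "independent operations" case, here is a minimal CPU-side sketch. It assumes C++ `std::memory_order_relaxed` as an analogue of the Relaxed model; it is not DotCompute's GPU implementation, and `run_relaxed_map`, `kThreads`, and `kItemsPerThread` are hypothetical names. Each thread writes a disjoint slice, so no ordering is needed, and the shared counter only has to be atomic, not ordered.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical sizes for the sketch.
constexpr int kThreads = 4;
constexpr int kItemsPerThread = 1000;

// Each thread owns a disjoint slice of `out` (no inter-thread communication),
// and counts its work with a relaxed atomic: indivisible, but unordered.
int run_relaxed_map(std::vector<int>& out) {
    std::atomic<int> processed{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < kThreads; ++t) {
        workers.emplace_back([&out, &processed, t] {
            for (int i = 0; i < kItemsPerThread; ++i) {
                int idx = t * kItemsPerThread + i;
                out[idx] = idx * 2;  // independent write, no fence required
                processed.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    // join() synchronizes with each worker, so all writes are visible here.
    for (auto& w : workers) w.join();
    return processed.load(std::memory_order_relaxed);
}
```

The final count is always exact because `fetch_add` is atomic; relaxed ordering only means other threads could observe the counter and the `out` writes in differing orders while the workers are still running.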

ReleaseAcquire = 1

Release-Acquire memory consistency: causal ordering for synchronized operations.

Release-Acquire semantics ensure that:

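  • Release Store: All writes that precede the store in program order become visible to any thread that observes the store with an acquire load
  • Acquire Load: Reads and writes that follow the load cannot be reordered before it, so they see at least the values published by the matching release store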
  • Causality: If Thread A releases X and Thread B acquires X, all of A's prior writes are visible to B

Example:

Thread 1: A = 1; B = 2; release_store(&flag, 1);  // Release
Thread 2: if (acquire_load(&flag)) r1 = A;       // Acquire, sees A=1
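The message-passing example above can be sketched with C++ `std::atomic`, used here as an assumed CPU-side analogue of the ReleaseAcquire model (DotCompute's GPU lowering may differ). `Message` and `message_passing_demo` are hypothetical names. Unlike the one-shot `if` in the pseudocode, the consumer spins until the flag is set, so the read of `A` always happens after synchronization.

```cpp
#include <atomic>
#include <thread>

struct Message {
    int a = 0;                 // plain payload, like A
    int b = 0;                 // plain payload, like B
    std::atomic<int> flag{0};  // synchronization variable
};

int message_passing_demo() {
    Message m;
    int r1 = -1;
    std::thread producer([&m] {
        m.a = 1;
        m.b = 2;
        m.flag.store(1, std::memory_order_release);  // publishes a and b
    });
    std::thread consumer([&m, &r1] {
        // Spin until the acquire load observes the release store.
        while (m.flag.load(std::memory_order_acquire) != 1) {
        }
        r1 = m.a;  // the acquire synchronizes-with the release, so this reads 1
    });
    producer.join();
    consumer.join();
    return r1;
}
```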

Implementation:

  • Release: Fence before atomic store
  • Acquire: Fence after atomic load
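The fence placement described above corresponds to C++'s standalone fences around relaxed atomics. The sketch below assumes `std::atomic_thread_fence` as an analogue of the GPU fences; it is not DotCompute's generated code, and `fenced_message_passing` is a hypothetical name. A release fence before a relaxed store, paired with an acquire fence after a relaxed load that reads that store, gives the same guarantee as a release store and an acquire load.

```cpp
#include <atomic>
#include <thread>

int fenced_message_passing() {
    int payload = 0;  // non-atomic data to publish
    std::atomic<int> flag{0};
    int observed = -1;

    std::thread producer([&] {
        payload = 42;
        std::atomic_thread_fence(std::memory_order_release);  // Release: fence BEFORE the store
        flag.store(1, std::memory_order_relaxed);
    });
    std::thread consumer([&] {
        while (flag.load(std::memory_order_relaxed) != 1) {
        }
        std::atomic_thread_fence(std::memory_order_acquire);  // Acquire: fence AFTER the load
        observed = payload;  // ordered after the producer's write by the fence pair
    });
    producer.join();
    consumer.join();
    return observed;
}
```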

Performance: 0.85× baseline (15% overhead from fences).

Use When: Producer-consumer patterns, message passing, distributed data structures, actor systems (Orleans.GpuBridge.Core).

Sequential = 2

Sequential consistency: total order visible to all threads.

Sequential consistency (SC) provides the strongest guarantee: all threads observe memory operations in the same global order, as if operations were interleaved on a single processor.

Example:

Initially: A = 0; B = 0;
Thread 1: A = 1; B = 2;
Thread 2: r1 = B; r2 = A;
// SC guarantees: if r1=2, then r2=1 (never r2=0)
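The example above does not by itself separate Sequential from ReleaseAcquire: if `B = 2` were a release store and `r1 = B` an acquire load, Release-Acquire would already guarantee `r2 = 1`. The classic litmus test that does separate them is store buffering, sketched here with C++ `std::memory_order_seq_cst` as an assumed analogue (`store_buffering_seq_cst` is a hypothetical name). Under SC the outcome `r1 == 0 && r2 == 0` is forbidden, because some store must come first in the single global order; with release/acquire or relaxed ordering that outcome is permitted.

```cpp
#include <atomic>
#include <thread>

// Returns true when the run produced an outcome allowed under SC.
bool store_buffering_seq_cst() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;
    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();
    // With seq_cst, at least one thread must see the other's store.
    return !(r1 == 0 && r2 == 0);
}
```

Swapping the orders to `release`/`acquire` makes the forbidden outcome legal, though whether a given run actually shows it depends on the hardware and timing.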

Implementation: Fence before and after every memory operation.

Performance: 0.60× baseline (40% overhead from pervasive fencing).

Use When: Algorithm correctness requires total order visibility, performance is secondary, or debugging relaxed-model race conditions.

⚠️ Warning: Sequential consistency significantly impacts performance. Only use when absolutely necessary. Consider Release-Acquire first.

Remarks

Memory consistency models specify the ordering guarantees for memory operations performed by different threads. Stronger models provide more intuitive semantics but impose higher performance costs.

Model Comparison:

  • Relaxed: No ordering guarantees; threads may observe operations in any order. Maximum performance (1.0× baseline), minimal synchronization.
  • ReleaseAcquire: Causal ordering; release stores become visible to acquire loads. Good balance (0.85× baseline, 15% overhead), suitable for most algorithms.
  • Sequential: Total order; all threads observe operations in the same order. Strongest guarantees (0.60× baseline, 40% overhead), simplest reasoning.

Choosing a Model:

  • Relaxed: Data-parallel algorithms with no inter-thread dependencies
  • ReleaseAcquire: Producer-consumer patterns, message passing (recommended default)
  • Sequential: Complex algorithms requiring total order visibility
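As a rough guide to how the three models line up with a familiar memory-model vocabulary, the sketch below maps each value to the nearest C++ `std::memory_order` pair (store order, load order). The C++ `MemoryConsistencyModel` mirror, `orders_for`, and `publish` are hypothetical and not part of DotCompute; the mapping is an assumed analogy, not a statement of how the library lowers each model on a GPU.

```cpp
#include <atomic>
#include <utility>

// Hypothetical C++ mirror of the C# enum, with the same underlying values.
enum class MemoryConsistencyModel : int { Relaxed = 0, ReleaseAcquire = 1, Sequential = 2 };

// Nearest C++ orders for each model, as (store order, load order).
std::pair<std::memory_order, std::memory_order> orders_for(MemoryConsistencyModel m) {
    switch (m) {
        case MemoryConsistencyModel::Relaxed:
            return {std::memory_order_relaxed, std::memory_order_relaxed};
        case MemoryConsistencyModel::ReleaseAcquire:
            return {std::memory_order_release, std::memory_order_acquire};
        case MemoryConsistencyModel::Sequential:
        default:
            return {std::memory_order_seq_cst, std::memory_order_seq_cst};
    }
}

// Usage: publish a value with whichever model the caller selected.
void publish(std::atomic<int>& flag, int value, MemoryConsistencyModel m) {
    flag.store(value, orders_for(m).first);
}
```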