Class SharedMemoryAttribute
- Namespace: DotCompute.Abstractions.Attributes
- Assembly: DotCompute.Abstractions.dll
Declares a shared memory allocation for a kernel method. Shared memory (`threadgroup` memory in Metal, `__shared__` in CUDA, `__local` in OpenCL) is fast on-chip memory shared between the threads of a thread block/workgroup.
[AttributeUsage(AttributeTargets.Method, AllowMultiple = true, Inherited = false)]
public sealed class SharedMemoryAttribute : Attribute
- Inheritance: object → Attribute → SharedMemoryAttribute
Examples
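```csharp
// Declare 256 floats of shared memory for a block-level sum reduction.
// Assumes the kernel is launched with 256 threads per block.
[Kernel(Backends = KernelBackends.CUDA | KernelBackends.Metal)]
[SharedMemory(typeof(float), "shared_data", Size = 256)]
public static void Reduction(ReadOnlySpan<float> input, Span<float> output)
{
    int tid = Kernel.ThreadId.X;   // index within the block
    int gid = Kernel.GlobalId.X;   // index within the whole grid

    // Load into shared memory; pad with 0 (the identity for +) past the end of input
    Kernel.SharedMemory<float>("shared_data")[tid] = gid < input.Length ? input[gid] : 0f;
    Kernel.Barrier();

    // Tree reduction in shared memory: halve the active range each step
    for (int stride = 128; stride > 0; stride >>= 1)
    {
        if (tid < stride)
            Kernel.SharedMemory<float>("shared_data")[tid] +=
                Kernel.SharedMemory<float>("shared_data")[tid + stride];
        Kernel.Barrier(); // every thread must reach the barrier, so it stays outside the if
    }

    // Thread 0 writes this block's partial sum
    if (tid == 0)
        output[Kernel.BlockId.X] = Kernel.SharedMemory<float>("shared_data")[0];
}
```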
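The barrier structure of this kernel can be checked on the CPU before targeting a GPU. The sketch below is not part of DotCompute: `BlockReductionEmulator` and `ReduceBlocks` are hypothetical names, each block's shared memory is modeled as a plain `float[]`, and each `Kernel.Barrier()` becomes the end of a sequential pass over the block's threads. Like the kernel, it assumes a power-of-two block size.

```csharp
using System;

public static class BlockReductionEmulator
{
    // Emulates the Reduction kernel on the CPU, one block at a time.
    // blockSize must be a power of two (256 in the kernel example).
    public static float[] ReduceBlocks(float[] input, int blockSize)
    {
        if (blockSize <= 0 || (blockSize & (blockSize - 1)) != 0)
            throw new ArgumentException("blockSize must be a power of two.", nameof(blockSize));

        int blockCount = (input.Length + blockSize - 1) / blockSize;
        var output = new float[blockCount];

        for (int block = 0; block < blockCount; block++)
        {
            // Stand-in for this block's shared memory allocation.
            var shared = new float[blockSize];

            // Load phase: each "thread" copies one element, 0 past the end of input.
            for (int tid = 0; tid < blockSize; tid++)
            {
                int gid = block * blockSize + tid;
                shared[tid] = gid < input.Length ? input[gid] : 0f;
            }
            // Barrier: all loads complete before the reduction starts.

            for (int stride = blockSize / 2; stride > 0; stride >>= 1)
            {
                for (int tid = 0; tid < stride; tid++)
                    shared[tid] += shared[tid + stride];
                // Barrier: this stride finishes before the next one begins.
            }

            output[block] = shared[0];
        }
        return output;
    }
}
```

Running each pass sequentially is equivalent to the parallel kernel here because, within a pass, each thread writes only its own slot `shared[tid]` and reads `shared[tid + stride]`, which no thread writes during that pass.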
Remarks
Shared memory is significantly faster than global memory (typically 10-100x lower latency, provided accesses avoid bank conflicts) and is essential for many GPU optimization patterns:
- Reduction operations (sum, min, max across threads)
- Tiled matrix multiplication (loading tiles into shared memory)
- Stencil computations (caching neighbor values)
- Inter-thread communication within a block
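Tiled matrix multiplication shows the shared-memory workflow most directly: load a tile cooperatively, synchronize, compute from the tile, and synchronize again before the tile is overwritten. The CPU sketch below models that flow under stated assumptions; it is not DotCompute code, `TiledMatMulEmulator`, `Multiply`, and the `tile` parameter are hypothetical, and the two tile arrays stand in for two `[SharedMemory]` allocations of `tile * tile` elements each.

```csharp
using System;

public static class TiledMatMulEmulator
{
    // C = A * B for square n x n row-major matrices, computed tile by tile.
    // Each tile x tile block of C plays the role of one thread block.
    public static float[] Multiply(float[] a, float[] b, int n, int tile)
    {
        var c = new float[n * n];
        var tileA = new float[tile * tile]; // stand-in for a shared tile of A
        var tileB = new float[tile * tile]; // stand-in for a shared tile of B

        for (int blockRow = 0; blockRow < n; blockRow += tile)
        for (int blockCol = 0; blockCol < n; blockCol += tile)
        {
            var acc = new float[tile * tile]; // one accumulator per "thread"
            for (int k0 = 0; k0 < n; k0 += tile)
            {
                // Load phase: thread (ty, tx) copies one element of each tile,
                // writing 0 outside the matrix so partial tiles are safe.
                for (int ty = 0; ty < tile; ty++)
                for (int tx = 0; tx < tile; tx++)
                {
                    int row = blockRow + ty, col = blockCol + tx;
                    tileA[ty * tile + tx] = row < n && k0 + tx < n ? a[row * n + k0 + tx] : 0f;
                    tileB[ty * tile + tx] = k0 + ty < n && col < n ? b[(k0 + ty) * n + col] : 0f;
                }
                // Barrier: both tiles are fully loaded before any thread reads them.

                for (int ty = 0; ty < tile; ty++)
                for (int tx = 0; tx < tile; tx++)
                    for (int k = 0; k < tile; k++)
                        acc[ty * tile + tx] += tileA[ty * tile + k] * tileB[k * tile + tx];
                // Barrier: all reads finish before the next tiles overwrite them.
            }

            // Write phase: only threads that map inside C store a result.
            for (int ty = 0; ty < tile; ty++)
            for (int tx = 0; tx < tile; tx++)
            {
                int row = blockRow + ty, col = blockCol + tx;
                if (row < n && col < n) c[row * n + col] = acc[ty * tile + tx];
            }
        }
        return c;
    }
}
```

Each element of A and B is read from the "global" arrays once per tile pass instead of once per output element, which is the saving shared memory provides on a GPU.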
Backend Translation:
| Backend | Translation |
|---|---|
| CUDA | `__shared__ T name[Size];` |
| Metal | `threadgroup T* name [[threadgroup(N)]];` |
| OpenCL | `__local T name[Size];` |
| CPU | Thread-local storage or stack allocation |
Limitations:
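- The total size of all shared memory allocations is limited per thread block, commonly 32 KB to 48 KB by default; some recent NVIDIA GPUs allow larger dynamic allocations when the kernel opts in
- Only accessible within a single thread block/workgroup
- Threads must synchronize with a barrier (e.g., `Kernel.Barrier()`) before reading values written by other threads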
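Because the attribute allows multiple declarations per kernel (`AllowMultiple = true`), the per-block limit applies to the sum of all of a kernel's allocations, not to each one separately. A minimal budget check, sketched under the assumption of a 48 KB limit; `SharedMemoryBudget` and `Fits` are hypothetical names, and the real limit should be queried from the target device.

```csharp
using System;

public static class SharedMemoryBudget
{
    // Assumed default limit; real limits vary by device and backend.
    public const int AssumedLimitBytes = 48 * 1024;

    // Sums elementSizeBytes * count over all allocations and compares
    // the total with the per-block limit.
    public static bool Fits(int limitBytes, params (int ElementSizeBytes, int Count)[] allocations)
    {
        long total = 0; // long avoids int overflow for large requests
        foreach (var (elementSize, count) in allocations)
        {
            if (elementSize <= 0 || count < 0)
                throw new ArgumentOutOfRangeException(nameof(allocations));
            total += (long)elementSize * count;
        }
        return total <= limitBytes;
    }
}
```

For example, two 256-element `float` arrays need 2,048 bytes and fit easily, while a single `float` array of 12,289 elements (49,156 bytes) exceeds the assumed 49,152-byte limit.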
Constructors
SharedMemoryAttribute(Type, string)
Initializes a new instance of the SharedMemoryAttribute class.
public SharedMemoryAttribute(Type elementType, string name)
Parameters
- elementType (Type): The type of elements in the shared memory array.
- name (string): The name used to reference this shared memory in the kernel.
Properties
Alignment
Gets or sets the memory alignment requirement in bytes.
public int Alignment { get; set; }
Property Value
- int
The required memory alignment in bytes. Default is 0 (use natural alignment).
Remarks
Proper alignment can improve memory access performance:
- 16 bytes - Optimal for float4/int4 vector types
- 32 bytes - Cache line alignment on some GPUs
- 128 bytes - Memory transaction alignment on NVIDIA GPUs
Backend Translation:
- CUDA: `__align__(N)`
- Metal: `alignas(N)`
- OpenCL: Uses aligned allocation
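How a backend combines several aligned allocations is not specified here. One plausible scheme, shown only as a sketch (the names `SharedMemoryLayout`, `AlignUp`, and `Pack` are hypothetical), packs the allocations into one shared buffer and rounds each start offset up to its `Alignment`. An `Alignment` of 0 would first be replaced by the element type's natural alignment before calling it.

```csharp
using System;

public static class SharedMemoryLayout
{
    // Rounds offset up to the next multiple of alignment (a power of two).
    public static int AlignUp(int offset, int alignment)
    {
        if (alignment <= 0 || (alignment & (alignment - 1)) != 0)
            throw new ArgumentException("alignment must be a power of two.", nameof(alignment));
        return (offset + alignment - 1) & ~(alignment - 1);
    }

    // Places allocations back to back in one buffer, padding each start
    // offset to its alignment. Returns the start offsets and the total size.
    public static (int[] Offsets, int TotalBytes) Pack((int SizeBytes, int Alignment)[] allocations)
    {
        var offsets = new int[allocations.Length];
        int cursor = 0;
        for (int i = 0; i < allocations.Length; i++)
        {
            cursor = AlignUp(cursor, allocations[i].Alignment);
            offsets[i] = cursor;
            cursor += allocations[i].SizeBytes;
        }
        return (offsets, cursor);
    }
}
```

Packing a 12-byte, a 256-byte (16-byte aligned), and a 4-byte (128-byte aligned) allocation places them at offsets 0, 16, and 384, for 388 bytes in total; the padding is the price of the stricter alignments.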
ElementType
Gets the element type for the shared memory array.
public Type ElementType { get; }
Property Value
- Type
The type of each element in the shared memory. Must be an unmanaged type (e.g., int, float, double, or an unmanaged struct).
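Because `ElementType` is a `System.Type` rather than a generic parameter, the compiler's `unmanaged` constraint cannot enforce this requirement. A reflection-based check is one way to validate it at run time. The sketch below is hypothetical (`ElementTypeCheck.IsUnmanaged` is not a DotCompute API) and approximates the C# rule rather than reproducing it exactly.

```csharp
using System;
using System.Reflection;

public static class ElementTypeCheck
{
    // Approximates the C# "unmanaged" rule for a runtime Type: primitives,
    // enums, pointers, and structs whose instance fields are all unmanaged.
    public static bool IsUnmanaged(Type type)
    {
        if (type.IsPrimitive || type.IsEnum || type.IsPointer)
            return true;
        if (!type.IsValueType || type.ContainsGenericParameters)
            return false;

        // Only instance fields count; static fields are not part of the layout.
        const BindingFlags instanceFields =
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;
        foreach (FieldInfo field in type.GetFields(instanceFields))
        {
            if (!IsUnmanaged(field.FieldType))
                return false;
        }
        return true;
    }
}
```

Under this check `float`, `decimal`, and `DateTime` pass, while `string` and a struct containing a `string` field (such as `KeyValuePair<int, string>`) are rejected.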
MetalBindingIndex
Gets or sets the threadgroup binding index for Metal shaders.
public int MetalBindingIndex { get; set; }
Property Value
- int
The threadgroup buffer binding index for Metal (0-based). Default is -1 (auto-assign based on declaration order).
Remarks
Metal requires explicit binding indices for threadgroup memory passed as kernel arguments:
```metal
threadgroup float* shared_data [[threadgroup(0)]];
threadgroup int* shared_indices [[threadgroup(1)]];
```
When set to -1, binding indices are auto-assigned in declaration order. Set explicitly when multiple kernels share consistent binding layouts.
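The text above states only that automatic indices follow declaration order. How explicit and automatic indices interact when both appear on one kernel is an assumption in the sketch below: explicit indices are reserved first, and each automatic one takes the lowest index still free. `MetalBindingAssigner.Resolve` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class MetalBindingAssigner
{
    // Resolves binding indices in declaration order. Explicit indices (>= 0)
    // are kept; each negative value (such as the default -1) receives the
    // lowest index not already taken.
    public static int[] Resolve(IReadOnlyList<int> requested)
    {
        // Reserve explicit indices first and reject duplicates.
        var used = new HashSet<int>();
        foreach (int index in requested)
        {
            if (index >= 0 && !used.Add(index))
                throw new ArgumentException($"Duplicate threadgroup binding index {index}.");
        }

        var resolved = new int[requested.Count];
        int next = 0;
        for (int i = 0; i < requested.Count; i++)
        {
            if (requested[i] >= 0)
            {
                resolved[i] = requested[i];
                continue;
            }
            while (used.Contains(next)) next++;
            resolved[i] = next;
            used.Add(next);
        }
        return resolved;
    }
}
```

With this policy, declarations requesting `-1, 0, -1` resolve to `1, 0, 2`: the explicit 0 is honored and the automatic ones fill the remaining slots in order.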
Name
Gets the name used to reference this shared memory allocation in the kernel code.
public string Name { get; }
Property Value
- string
A unique identifier for this shared memory block within the kernel. Used with `Kernel.SharedMemory<T>(name)` to access the memory.
Size
Gets or sets the number of elements in the shared memory array.
public int Size { get; set; }
Property Value
- int
The size of the shared memory array in elements (not bytes). Default is 0, which indicates dynamic sizing based on thread block size.
Remarks
When Size is 0 (dynamic), the shared memory is typically sized to match the thread block size, or its size can be specified at kernel launch time.
Memory Calculation: Total bytes = Size * sizeof(ElementType)
Common Sizes:
- 256 - Standard block size
- 512/1024 - Larger blocks for higher occupancy
- 32 - Warp-sized for warp-level operations
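The memory calculation can be worked through for these sizes, assuming `Size` is set explicitly rather than left at the dynamic default of 0. `SharedMemorySize.TotalBytes` is a hypothetical helper; `Unsafe.SizeOf<T>()` returns the size of `T`, which for unmanaged types matches `sizeof(T)`.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class SharedMemorySize
{
    // Total bytes = Size * sizeof(ElementType), computed in long to avoid overflow.
    public static long TotalBytes<T>(int size) where T : unmanaged
    {
        if (size < 0) throw new ArgumentOutOfRangeException(nameof(size));
        return (long)size * Unsafe.SizeOf<T>();
    }
}
```

For example, 256 `float` elements occupy 1,024 bytes, 1,024 `double` elements occupy 8,192 bytes, and a warp-sized array of 32 `int` elements occupies 128 bytes.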
ZeroInitialize
Gets or sets whether to zero-initialize the shared memory.
public bool ZeroInitialize { get; set; }
Property Value
- bool
`true` to zero-initialize the shared memory before use; `false` for uninitialized memory (faster).
Remarks
Zero initialization adds overhead but ensures predictable initial values. Only enable when the algorithm requires known initial state.
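Performance: Disabling it avoids the initialization stores and any barrier needed after them; the saving is small (roughly 1-2 clock cycles per element) but can matter in short kernels.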
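On GPUs, shared memory is usually cleared cooperatively, so each thread performs roughly ceil(Size / block size) stores rather than one store per element, followed by a barrier. The sketch below emulates that common pattern on the CPU; whether DotCompute generates exactly this loop is an assumption, and `CooperativeZeroFill.Fill` is a hypothetical name.

```csharp
using System;

public static class CooperativeZeroFill
{
    // Emulates the strided clearing pattern: thread tid writes elements
    // tid, tid + blockSize, tid + 2 * blockSize, ...
    // Returns the largest number of stores any single thread performs.
    public static int Fill(float[] shared, int blockSize)
    {
        if (blockSize <= 0) throw new ArgumentOutOfRangeException(nameof(blockSize));
        int maxStores = 0;
        for (int tid = 0; tid < blockSize; tid++)
        {
            int stores = 0;
            for (int i = tid; i < shared.Length; i += blockSize)
            {
                shared[i] = 0f;
                stores++;
            }
            maxStores = Math.Max(maxStores, stores);
        }
        // On a GPU a barrier follows here so no thread reads before the fill completes.
        return maxStores;
    }
}
```

Clearing 1,000 elements with 256 threads costs at most 4 stores per thread; when `Size` equals the block size, each thread performs a single store.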