Class SharedMemoryAttribute

Namespace: DotCompute.Abstractions.Attributes

Assembly: DotCompute.Abstractions.dll

Declares a shared memory allocation for a kernel method. Shared memory (threadgroup memory in Metal, __shared__ in CUDA, __local in OpenCL) is fast on-chip memory shared between the threads of a thread block/workgroup.

[AttributeUsage(AttributeTargets.Method, AllowMultiple = true, Inherited = false)]
public sealed class SharedMemoryAttribute : Attribute

Inheritance: object → Attribute → SharedMemoryAttribute

Inherited Members: Attribute.Equals(object)

Attribute.GetCustomAttribute(Assembly, Type)

Attribute.GetCustomAttribute(Assembly, Type, bool)

Attribute.GetCustomAttribute(MemberInfo, Type)

Attribute.GetCustomAttribute(MemberInfo, Type, bool)

Attribute.GetCustomAttribute(Module, Type)

Attribute.GetCustomAttribute(Module, Type, bool)

Attribute.GetCustomAttribute(ParameterInfo, Type)

Attribute.GetCustomAttribute(ParameterInfo, Type, bool)

Attribute.GetCustomAttributes(Assembly)

Attribute.GetCustomAttributes(Assembly, bool)

Attribute.GetCustomAttributes(Assembly, Type)

Attribute.GetCustomAttributes(Assembly, Type, bool)

Attribute.GetCustomAttributes(MemberInfo)

Attribute.GetCustomAttributes(MemberInfo, bool)

Attribute.GetCustomAttributes(MemberInfo, Type)

Attribute.GetCustomAttributes(MemberInfo, Type, bool)

Attribute.GetCustomAttributes(Module)

Attribute.GetCustomAttributes(Module, bool)

Attribute.GetCustomAttributes(Module, Type)

Attribute.GetCustomAttributes(Module, Type, bool)

Attribute.GetCustomAttributes(ParameterInfo)

Attribute.GetCustomAttributes(ParameterInfo, bool)

Attribute.GetCustomAttributes(ParameterInfo, Type)

Attribute.GetCustomAttributes(ParameterInfo, Type, bool)

Attribute.GetHashCode()

Attribute.IsDefaultAttribute()

Attribute.IsDefined(Assembly, Type)

Attribute.IsDefined(Assembly, Type, bool)

Attribute.IsDefined(MemberInfo, Type)

Attribute.IsDefined(MemberInfo, Type, bool)

Attribute.IsDefined(Module, Type)

Attribute.IsDefined(Module, Type, bool)

Attribute.IsDefined(ParameterInfo, Type)

Attribute.IsDefined(ParameterInfo, Type, bool)

Attribute.Match(object)

Attribute.TypeId

object.Equals(object, object)

object.GetType()

object.ReferenceEquals(object, object)

object.ToString()

Examples

// Declare 256 floats of shared memory for a reduction
[Kernel(Backends = KernelBackends.CUDA | KernelBackends.Metal)]
[SharedMemory(typeof(float), "shared_data", Size = 256)]
public static void Reduction(ReadOnlySpan<float> input, Span<float> output)
{
    int tid = Kernel.ThreadId.X;
    int gid = Kernel.GlobalId.X;

    // Load into shared memory
    Kernel.SharedMemory<float>("shared_data")[tid] = input[gid];
    Kernel.Barrier();

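    // Parallel reduction in shared memory. The initial stride of 128 is Size / 2,
    // so this loop assumes a thread block of exactly 256 threads.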
    for (int stride = 128; stride > 0; stride >>= 1)
    {
        if (tid < stride)
            Kernel.SharedMemory<float>("shared_data")[tid] +=
                Kernel.SharedMemory<float>("shared_data")[tid + stride];
        Kernel.Barrier();
    }

    if (tid == 0)
        output[Kernel.BlockId.X] = Kernel.SharedMemory<float>("shared_data")[0];
}
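To make the data flow of the kernel above easier to follow, the same tree reduction can be traced on the CPU. This is only an illustrative sketch, not part of DotCompute: ReductionSketch and ReduceBlock are hypothetical names, the array stands in for "shared_data", and the inner loop runs sequentially where the GPU would run one thread per iteration.

```csharp
using System;

public static class ReductionSketch
{
    // CPU stand-in for a single thread block (hypothetical helper).
    // 'shared' plays the role of the "shared_data" allocation.
    public static float ReduceBlock(ReadOnlySpan<float> input)
    {
        int blockSize = input.Length;
        if (blockSize == 0 || (blockSize & (blockSize - 1)) != 0)
            throw new ArgumentException("Block size must be a nonzero power of two.", nameof(input));

        // Corresponds to the "load into shared memory" step.
        float[] shared = input.ToArray();

        // Each pass halves the number of active "threads".
        for (int stride = blockSize / 2; stride > 0; stride >>= 1)
        {
            // On the GPU, every tid < stride executes this line concurrently.
            for (int tid = 0; tid < stride; tid++)
                shared[tid] += shared[tid + stride];
            // A barrier would separate the passes here.
        }

        // Thread 0 would write this value to output[blockId].
        return shared[0];
    }
}
```

Starting the stride at half the block size, rather than a fixed 128, keeps the loop correct for any power-of-two block size.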

Remarks

Shared memory is significantly faster than global memory (typically 10-100x lower latency) and is essential for many GPU optimization patterns:

Reduction operations (sum, min, max across threads)
Tiled matrix multiplication (loading tiles into shared memory)
Stencil computations (caching neighbor values)
Inter-thread communication within a block

Backend Translation:

Backend	Translation
CUDA	`__shared__ T name[Size];`
Metal	`threadgroup T* name [[threadgroup(N)]];`
OpenCL	`__local T name[Size];`
CPU	Thread-local storage or stack allocation

Limitations:

Total shared memory is limited per thread block (typically 32-48 KB by default on modern GPUs)
Only accessible within a single thread block/workgroup
Must use barriers for synchronization when sharing data

Constructors

SharedMemoryAttribute(Type, string)

Initializes a new instance of the SharedMemoryAttribute class.

public SharedMemoryAttribute(Type elementType, string name)

Parameters

elementType Type: The type of elements in the shared memory array.
name string: The name used to reference this shared memory in the kernel.
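Because the attribute is declared with AllowMultiple = true, a kernel method can carry several named allocations. The sketch below illustrates that pattern and how the declarations could be read back through reflection. To stay self-contained it defines a local stand-in for SharedMemoryAttribute that mirrors only the documented constructor and the ElementType, Name, and Size members; the Kernels and SharedMemoryInspector types and the "tile_a" and "indices" names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Local stand-in mirroring the documented shape, so the sample
// compiles without referencing DotCompute.Abstractions.
[AttributeUsage(AttributeTargets.Method, AllowMultiple = true, Inherited = false)]
public sealed class SharedMemoryAttribute : Attribute
{
    public SharedMemoryAttribute(Type elementType, string name)
    {
        ElementType = elementType;
        Name = name;
    }

    public Type ElementType { get; }
    public string Name { get; }
    public int Size { get; set; }
}

public static class Kernels
{
    // Two independent allocations on one (hypothetical) kernel method.
    [SharedMemory(typeof(float), "tile_a", Size = 256)]
    [SharedMemory(typeof(int), "indices", Size = 32)]
    public static void TiledKernel() { }
}

public static class SharedMemoryInspector
{
    // Reflection does not guarantee attribute order, so sort by name
    // to get a stable listing.
    public static string[] Describe(MethodInfo method) =>
        method.GetCustomAttributes<SharedMemoryAttribute>()
              .OrderBy(a => a.Name, StringComparer.Ordinal)
              .Select(a => $"{a.Name}: {a.ElementType.Name}[{a.Size}]")
              .ToArray();
}
```

Each declaration needs its own name, since the name is what the kernel body uses to look the allocation up.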

Properties

Alignment

Gets or sets the memory alignment requirement in bytes.

public int Alignment { get; set; }

Property Value

int: The required memory alignment in bytes. Default is 0 (use natural alignment).

Remarks

Proper alignment can improve memory access performance:

16 bytes - Optimal for float4/int4 vector types
32 bytes - Cache line alignment on some GPUs
128 bytes - Memory transaction alignment on NVIDIA GPUs

Backend Translation:

CUDA: __align__(N)
Metal: alignas(N)
OpenCL: Uses aligned allocation
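As a rough illustration of what an alignment requirement means for layout, the sketch below rounds a byte offset up to the next aligned boundary. It is not DotCompute code: AlignmentSketch, IsValid, and AlignUp are hypothetical names, and treating 0 as "leave the offset unchanged" is a simplification of natural alignment made for this example.

```csharp
using System;

public static class AlignmentSketch
{
    // 0 means "natural alignment"; any other value is assumed to be
    // a positive power of two, as __align__ and alignas require.
    public static bool IsValid(int alignment) =>
        alignment == 0 || (alignment > 0 && (alignment & (alignment - 1)) == 0);

    // Rounds 'offset' (in bytes) up to the next multiple of 'alignment'.
    public static int AlignUp(int offset, int alignment)
    {
        if (offset < 0)
            throw new ArgumentOutOfRangeException(nameof(offset));
        if (!IsValid(alignment))
            throw new ArgumentException("Alignment must be 0 or a power of two.", nameof(alignment));
        if (alignment == 0)
            return offset; // simplification: no extra padding requested

        return (offset + alignment - 1) & ~(alignment - 1);
    }
}
```

For example, an allocation that would otherwise start at byte 100 moves to byte 112 under a 16-byte requirement, so the padding counts against the per-block shared memory budget.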

ElementType

Gets the element type for the shared memory array.

public Type ElementType { get; }

Property Value

Type: The type of each element in the shared memory. Must be an unmanaged type (e.g., int, float, double, or an unmanaged struct).

MetalBindingIndex

Gets or sets the threadgroup binding index for Metal shaders.

public int MetalBindingIndex { get; set; }

Property Value

int: The threadgroup buffer binding index for Metal (0-based). Default is -1 (auto-assign based on declaration order).

Remarks

Metal requires explicit binding indices for threadgroup memory:

threadgroup float* shared_data [[threadgroup(0)]];
threadgroup int* shared_indices [[threadgroup(1)]];

When set to -1, binding indices are auto-assigned in declaration order. Set the index explicitly when multiple kernels must share a consistent binding layout.
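One plausible reading of auto-assignment is sketched below. This is an assumption for illustration and not the documented DotCompute algorithm: explicit indices are kept as written, and each -1 entry receives the lowest index not yet taken, in declaration order. MetalBindingSketch and Resolve are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class MetalBindingSketch
{
    // 'declared' holds MetalBindingIndex values in declaration order,
    // with -1 meaning "auto-assign". Returns the resolved indices.
    public static int[] Resolve(IReadOnlyList<int> declared)
    {
        var used = new HashSet<int>();
        foreach (int index in declared)
        {
            if (index < -1)
                throw new ArgumentOutOfRangeException(nameof(declared), "Index must be -1 or non-negative.");
            if (index >= 0 && !used.Add(index))
                throw new ArgumentException($"Binding index {index} is declared twice.", nameof(declared));
        }

        var resolved = new int[declared.Count];
        int next = 0;
        for (int i = 0; i < declared.Count; i++)
        {
            if (declared[i] >= 0)
            {
                resolved[i] = declared[i]; // explicit index wins
                continue;
            }

            // Assumed policy: take the lowest free index.
            while (used.Contains(next)) next++;
            resolved[i] = next;
            used.Add(next);
        }
        return resolved;
    }
}
```

Under this assumed policy, two auto-assigned declarations resolve to indices 0 and 1, which matches the shared_data and shared_indices example above.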

Name

Gets the name used to reference this shared memory allocation in the kernel code.

public string Name { get; }

Property Value

string: A unique identifier for this shared memory block within the kernel. Used with Kernel.SharedMemory<T>(name) to access the memory.

Size

Gets or sets the number of elements in the shared memory array.

public int Size { get; set; }

Property Value

int: The size of the shared memory array in elements (not bytes). Default is 0, which indicates dynamic sizing based on thread block size.

Remarks

When Size is 0 (dynamic), the shared memory is typically sized to match the thread block size or can be specified at kernel launch time.

Memory Calculation: Total bytes = Size * sizeof(ElementType)
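The formula can be checked ahead of a launch with a small helper such as the sketch below. The names SharedMemoryBudget, TotalBytes, and Fits are hypothetical, and the 48 KB default is an assumed limit chosen for illustration; the real limit should come from the target device.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class SharedMemoryBudget
{
    // Assumed per-block limit for illustration only.
    public const int AssumedLimitBytes = 48 * 1024;

    // Total bytes = Size * sizeof(ElementType)
    public static long TotalBytes<T>(int size) where T : unmanaged
    {
        if (size < 0)
            throw new ArgumentOutOfRangeException(nameof(size));
        return (long)size * Unsafe.SizeOf<T>();
    }

    // True when the allocation fits within the given per-block limit.
    public static bool Fits<T>(int size, int limitBytes = AssumedLimitBytes) where T : unmanaged
        => TotalBytes<T>(size) <= limitBytes;
}
```

For instance, 256 floats occupy 1024 bytes, while 8192 doubles (65536 bytes) would exceed the assumed 48 KB limit.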

Common Sizes:

256 - Standard block size
512/1024 - Larger blocks for higher occupancy
32 - Warp-sized for warp-level operations

ZeroInitialize

Gets or sets whether to zero-initialize the shared memory.

public bool ZeroInitialize { get; set; }

Property Value

bool: true to zero-initialize the shared memory before use; false for uninitialized memory (faster).

Remarks

Zero initialization adds overhead but ensures predictable initial values. Only enable when the algorithm requires known initial state.

Performance: Disabling zero-initialization saves roughly 1-2 clock cycles per element.
