Class HardwareInfo

Namespace: DotCompute.Algorithms
Assembly: DotCompute.Algorithms.dll

GPU hardware information for kernel optimization.

[SuppressMessage("Design", "CA1034:Nested types should not be visible", Justification = "Type made public to fix CA0050/CA0051 accessibility warnings. Used in public method signatures.")]
public class HardwareInfo
Inheritance
object → HardwareInfo

Remarks

Hardware characteristics queried from GPU device for adaptive kernel configuration. Used to tune work group sizes, memory usage, and algorithm selection based on device capabilities.

Obtained via platform-specific APIs (CUDA: cudaDeviceGetAttribute, OpenCL: clGetDeviceInfo, Metal: MTLDevice properties).

Properties

ComputeUnits

Gets or sets the number of compute units (SMs/CUs).

public int ComputeUnits { get; set; }

Property Value

int

Remarks

Number of parallel processing units on GPU. Used to determine optimal grid size and occupancy targets.

Examples:

  • RTX 4090: 128 SMs
  • RTX 2000 Ada: 28 SMs
  • A100: 108 SMs
  • RX 7900 XTX: 96 CUs

Grid Sizing: Launch roughly 2-4 work groups per compute unit to keep every unit occupied
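The grid-sizing heuristic can be sketched as a small helper fed by ComputeUnits. The helper name and the default factor of 3 (a middle value in the 2-4x range) are illustrative, not part of the library:

```csharp
using System;

// Sketch of the grid-sizing heuristic above: launch roughly 2-4 work groups
// per compute unit so every SM/CU has work queued.
Console.WriteLine(SuggestGridSize(128));    // 384 work groups for 128 SMs (RTX 4090)
Console.WriteLine(SuggestGridSize(96, 2));  // 192 for 96 CUs with a conservative factor

static int SuggestGridSize(int computeUnits, int occupancyFactor = 3)
    => computeUnits * occupancyFactor;
```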

GlobalMemorySize

Gets or sets the global memory size in bytes.

public long GlobalMemorySize { get; set; }

Property Value

long

Remarks

Total GPU device memory (VRAM) available. Determines maximum matrix sizes and whether out-of-core algorithms are needed.

Typical Values:

  • Consumer GPUs: 4-24 GB
  • Professional GPUs: 16-80 GB
  • Data Center GPUs: 40-80 GB (A100, H100)
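As a sketch of the out-of-core decision the remarks describe, the check below (hypothetical helper, not library API) tests whether a dense float32 GEMM C = A * B fits in GlobalMemorySize, with ~10% headroom assumed for runtime allocations:

```csharp
using System;

// Does a dense float32 matrix multiply fit in device memory, or is an
// out-of-core/tiled algorithm required? Counts the A, B, and C buffers.
long eightGiB = 8L << 30;
Console.WriteLine(FitsInDeviceMemory(eightGiB, 8192, 8192, 8192));    // True
Console.WriteLine(FitsInDeviceMemory(1L << 30, 16384, 16384, 16384)); // False

static bool FitsInDeviceMemory(long globalMemorySize, long m, long n, long k)
{
    long bytes = sizeof(float) * (m * k + k * n + m * n); // A, B, and C buffers
    return bytes <= (long)(globalMemorySize * 0.9);       // keep ~10% headroom
}
```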

MaxWorkGroupSize

Gets or sets the maximum work group size (threads per block).

public int MaxWorkGroupSize { get; set; }

Property Value

int

Remarks

Maximum number of threads that can execute concurrently in a single work group. Constrains LocalWorkSize configuration.

Typical Values: 256-1024 threads

NVIDIA: 1024 threads per block (most architectures)

AMD: 256-1024 threads per workgroup
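Because MaxWorkGroupSize constrains LocalWorkSize, launch code typically clamps any requested size against it. A minimal sketch (helper name is hypothetical):

```csharp
using System;

// Clamp a requested local work size to the device's MaxWorkGroupSize so a
// kernel launch never exceeds the hardware limit.
Console.WriteLine(ClampWorkGroupSize(2048, 1024)); // 1024 on typical NVIDIA hardware
Console.WriteLine(ClampWorkGroupSize(256, 1024));  // 256 (already within the limit)

static int ClampWorkGroupSize(int requested, int maxWorkGroupSize)
    => Math.Min(requested, maxWorkGroupSize);
```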

PreferredWorkGroupSizeMultiple

Gets or sets the preferred work group size multiple (warp/wavefront size).

public int PreferredWorkGroupSizeMultiple { get; set; }

Property Value

int

Remarks

SIMD width for optimal execution efficiency. Work group sizes should be multiples of this value to avoid partial warps/wavefronts.

NVIDIA: 32 (warp size)

AMD: 64 (wavefront size)

Intel: 8-32 depending on architecture

Apple Silicon: 32 (SIMD group size)
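The "multiples of this value" rule translates to rounding a candidate size up to the nearest multiple of PreferredWorkGroupSizeMultiple. A minimal sketch (helper name is illustrative):

```csharp
using System;

// Round a candidate work group size up to the nearest multiple of the
// preferred multiple so no warp/wavefront runs partially filled.
Console.WriteLine(RoundToSimdMultiple(100, 32)); // 128 on NVIDIA (warp = 32)
Console.WriteLine(RoundToSimdMultiple(100, 64)); // 128 on AMD (wavefront = 64)

static int RoundToSimdMultiple(int size, int multiple)
    => ((size + multiple - 1) / multiple) * multiple;
```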

SharedMemorySize

Gets or sets the shared memory size per work group in bytes.

public int SharedMemorySize { get; set; }

Property Value

int

Remarks

Fast on-chip memory shared within a work group (CUDA: shared memory, OpenCL: local memory, Metal: threadgroup memory).

NVIDIA: 48-164 KB per SM depending on architecture

AMD: 64 KB LDS per CU

Apple Silicon: 32-64 KB threadgroup memory
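SharedMemorySize typically bounds tile dimensions in tiled kernels. The sketch below (illustrative, assuming a tiled float32 matrix multiply that stages one tile each of A and B) derives the largest square tile that fits:

```csharp
using System;

// Largest square float32 tile for a tiled matrix multiply that keeps two
// tiles (one of A, one of B) resident in shared/local/threadgroup memory.
Console.WriteLine(MaxTileDim(48 * 1024)); // 78 with 48 KB of shared memory
Console.WriteLine(MaxTileDim(64 * 1024)); // 90 with 64 KB (e.g. AMD LDS)

static int MaxTileDim(int sharedMemorySize)
{
    // 2 tiles * dim^2 elements * sizeof(float) bytes must fit:
    return (int)Math.Sqrt(sharedMemorySize / (2.0 * sizeof(float)));
}
```

In practice the result would also be rounded down to a multiple of the warp/wavefront size before use.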

SupportsDoublePrecision

Gets or sets whether the hardware supports double precision (FP64).

public bool SupportsDoublePrecision { get; set; }

Property Value

bool

Remarks

Indicates native hardware support for 64-bit floating-point operations. Consumer GPUs often ship with sharply reduced FP64 throughput (commonly 1/32 of FP32 or lower).

Full FP64: Professional/datacenter GPUs (Tesla, Instinct)

Reduced FP64: Consumer GPUs (GeForce, Radeon)
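A typical use of this flag is algorithm selection: run in FP64 only when the device supports it natively and the caller needs the accuracy. The enum and helper below are illustrative, not part of the library:

```csharp
using System;

// Prefer FP64 only when the device reports native support and the caller
// asked for high accuracy; otherwise fall back to FP32.
Console.WriteLine(SelectPrecision(true, true));  // Float64
Console.WriteLine(SelectPrecision(false, true)); // Float32

static ComputePrecision SelectPrecision(bool supportsDoublePrecision, bool needsHighAccuracy)
    => supportsDoublePrecision && needsHighAccuracy
        ? ComputePrecision.Float64
        : ComputePrecision.Float32;

enum ComputePrecision { Float32, Float64 }
```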

SupportsTensorCores

Gets or sets whether the hardware supports tensor cores.

public bool SupportsTensorCores { get; set; }

Property Value

bool

Remarks

Indicates availability of specialized hardware for mixed-precision matrix multiplication (e.g., NVIDIA Tensor Cores, AMD Matrix Cores).

NVIDIA: Volta, Turing, Ampere, Ada, Hopper architectures

AMD: RDNA 3 and CDNA architectures
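Algorithm selection on this flag usually means routing matrix multiplication to a mixed-precision path when one exists. A sketch, with hypothetical kernel names:

```csharp
using System;

// Route GEMM to a tensor-core path only when the hardware advertises one
// and the inputs are already half precision; names are placeholders.
Console.WriteLine(SelectGemmKernel(true, true));  // gemm_tensorcore_fp16
Console.WriteLine(SelectGemmKernel(false, true)); // gemm_simt_fp32

static string SelectGemmKernel(bool supportsTensorCores, bool inputsAreFp16)
    => supportsTensorCores && inputsAreFp16
        ? "gemm_tensorcore_fp16"
        : "gemm_simt_fp32";
```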

WarpSize

Gets or sets the warp/wavefront size for the hardware.

public int? WarpSize { get; set; }

Property Value

int?

Remarks

SIMD execution width. Same as PreferredWorkGroupSizeMultiple, but exposed with CUDA-style naming for convenience. Nullable: may be unset when the platform does not report a warp size.

NVIDIA: 32 (warp)

AMD: 64 (wavefront)
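Since WarpSize is nullable, a natural consumption pattern (illustrative, given that the two properties carry the same SIMD width) is to coalesce it with PreferredWorkGroupSizeMultiple:

```csharp
using System;

// Fall back to the preferred work group size multiple when no explicit
// warp/wavefront size was reported by the platform.
Console.WriteLine(EffectiveSimdWidth(32, 64));   // 32: explicit warp size wins
Console.WriteLine(EffectiveSimdWidth(null, 64)); // 64: fall back to the multiple

static int EffectiveSimdWidth(int? warpSize, int preferredMultiple)
    => warpSize ?? preferredMultiple;
```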