Class HardwareInfo
Namespace: DotCompute.Algorithms
Assembly: DotCompute.Algorithms.dll
GPU hardware information for kernel optimization.
[SuppressMessage("Design", "CA1034:Nested types should not be visible", Justification = "Type made public to fix CA0050/CA0051 accessibility warnings. Used in public method signatures.")]
public class HardwareInfo
- Inheritance
- object → HardwareInfo
Remarks
Hardware characteristics queried from GPU device for adaptive kernel configuration. Used to tune work group sizes, memory usage, and algorithm selection based on device capabilities.
Obtained via platform-specific APIs (CUDA: cudaDeviceGetAttribute, OpenCL: clGetDeviceInfo, Metal: MTLDevice properties).
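A minimal sketch of constructing a HardwareInfo by hand. The numeric values are the RTX 4090 figures quoted in the property remarks; the SharedMemorySize figure is an assumption, and a real backend would fill every field from the platform APIs named above rather than hard-coding them:

```csharp
// Stand-in copy of the documented class so this sketch compiles on its own.
public class HardwareInfo
{
    public int ComputeUnits { get; set; }
    public long GlobalMemorySize { get; set; }
    public int MaxWorkGroupSize { get; set; }
    public int PreferredWorkGroupSizeMultiple { get; set; }
    public int SharedMemorySize { get; set; }
    public bool SupportsDoublePrecision { get; set; }
    public bool SupportsTensorCores { get; set; }
    public int? WarpSize { get; set; }
}

public static class Example
{
    // Hypothetical: values hard-coded for an RTX 4090. In practice they come
    // from cudaDeviceGetAttribute, clGetDeviceInfo, or MTLDevice properties.
    public static HardwareInfo Rtx4090() => new HardwareInfo
    {
        ComputeUnits = 128,                             // 128 SMs
        GlobalMemorySize = 24L * 1024 * 1024 * 1024,    // 24 GB VRAM
        MaxWorkGroupSize = 1024,                        // threads per block
        PreferredWorkGroupSizeMultiple = 32,            // warp size
        SharedMemorySize = 100 * 1024,                  // assumed ~100 KB per block on Ada
        SupportsDoublePrecision = true,                 // reduced-rate FP64
        SupportsTensorCores = true,                     // Ada tensor cores
        WarpSize = 32,
    };
}
```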
Properties
ComputeUnits
Gets or sets the number of compute units (SMs/CUs).
public int ComputeUnits { get; set; }
Property Value
- int
Remarks
Number of parallel processing units on GPU. Used to determine optimal grid size and occupancy targets.
Examples:
- RTX 4090: 128 SMs
- RTX 2000 Ada: 28 SMs
- A100: 108 SMs
- RX 7900 XTX: 96 CUs
Grid Sizing: Typically launch 2-4x the number of compute units in blocks so each unit has several blocks to hide latency and sustain occupancy
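The grid-sizing rule above can be sketched as a hypothetical helper (GridSize is not part of the library): aim for a few blocks per compute unit, but never launch more blocks than there is work for.

```csharp
public static class GridSizing
{
    // Hypothetical helper: target blocksPerUnit blocks per SM/CU (the 2-4x
    // rule), capped by the ceiling-divided number of blocks the data needs.
    public static int GridSize(int computeUnits, int totalItems, int blockSize,
                               int blocksPerUnit = 4)
    {
        int neededBlocks = (totalItems + blockSize - 1) / blockSize; // ceil-div
        return System.Math.Min(computeUnits * blocksPerUnit, neededBlocks);
    }
}
```

For a large launch the occupancy target dominates; for a small one the data size does.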
GlobalMemorySize
Gets or sets the global memory size in bytes.
public long GlobalMemorySize { get; set; }
Property Value
- long
Remarks
Total GPU device memory (VRAM) available. Determines maximum matrix sizes and whether out-of-core algorithms are needed.
Typical Values:
- Consumer GPUs: 4-24 GB
- Professional GPUs: 16-80 GB
- Data Center GPUs: 40-80 GB (A100, H100)
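The out-of-core decision described above can be sketched as a hypothetical fit check (names and the 10% reserve are assumptions, not library API):

```csharp
public static class MemoryPlanning
{
    // Hypothetical check: does an n x n double matrix fit in device memory,
    // after reserving a fraction of VRAM for workspace and runtime overhead?
    // If not, an out-of-core (tiled host/device streaming) algorithm is needed.
    public static bool FitsInDeviceMemory(long globalMemorySize, long n,
                                          double reserveFraction = 0.1)
    {
        long bytesNeeded = checked(n * n * sizeof(double));
        long usable = (long)(globalMemorySize * (1.0 - reserveFraction));
        return bytesNeeded <= usable;
    }
}
```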
MaxWorkGroupSize
Gets or sets the maximum work group size (threads per block).
public int MaxWorkGroupSize { get; set; }
Property Value
- int
Remarks
Maximum number of threads that can execute concurrently in a single work group. Constrains LocalWorkSize configuration.
Typical Values: 256-1024 threads
NVIDIA: 1024 threads per block (most architectures)
AMD: 256-1024 threads per workgroup
PreferredWorkGroupSizeMultiple
Gets or sets the preferred work group size multiple (warp/wavefront size).
public int PreferredWorkGroupSizeMultiple { get; set; }
Property Value
- int
Remarks
SIMD width for optimal execution efficiency. Work group sizes should be multiples of this value to avoid partial warps/wavefronts.
NVIDIA: 32 (warp size)
AMD: 64 (wavefront size)
Intel: 8-32 depending on architecture
Apple Silicon: 32 (SIMD group size)
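The two constraints above (stay within MaxWorkGroupSize, stay a multiple of PreferredWorkGroupSizeMultiple) can be combined in one hypothetical helper:

```csharp
public static class WorkGroupSizing
{
    // Hypothetical helper: clamp a requested work group size to the device
    // maximum, then round down to a whole number of warps/wavefronts so no
    // partially filled SIMD group is scheduled.
    public static int PickLocalSize(int requested, int maxWorkGroupSize,
                                    int preferredMultiple)
    {
        int clamped = System.Math.Min(requested, maxWorkGroupSize);
        int rounded = (clamped / preferredMultiple) * preferredMultiple;
        return System.Math.Max(rounded, preferredMultiple); // at least one warp
    }
}
```

On NVIDIA (multiple 32) a request of 200 rounds to 192; on AMD (multiple 64) the same request also rounds to 192.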
SharedMemorySize
Gets or sets the shared memory size per work group in bytes.
public int SharedMemorySize { get; set; }
Property Value
- int
Remarks
Fast on-chip memory shared within a work group (CUDA: shared memory, OpenCL: local memory, Metal: threadgroup memory).
NVIDIA: 48-164 KB per SM depending on architecture
AMD: 64 KB LDS per CU
Apple Silicon: 32-64 KB threadgroup memory
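One common use of this limit is sizing the tiles of a tiled matrix multiply, where two operand tiles must fit in shared memory at once. A hypothetical sketch (not library API):

```csharp
public static class TileSizing
{
    // Hypothetical helper: largest square tile of float elements such that
    // two tiles (the A and B operands of a tiled matmul) fit in shared memory:
    //   2 * tile * tile * sizeof(float) <= sharedMemorySize
    // A real kernel would then round down to a power of two or warp multiple.
    public static int MaxTileDim(int sharedMemorySize)
        => (int)System.Math.Sqrt(sharedMemorySize / (2.0 * sizeof(float)));
}
```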
SupportsDoublePrecision
Gets or sets whether the hardware supports double precision (FP64).
public bool SupportsDoublePrecision { get; set; }
Property Value
- bool
Remarks
Indicates native hardware support for 64-bit floating-point operations. Consumer GPUs often have sharply reduced FP64 throughput (commonly 1/32 to 1/64 of FP32).
Full FP64: Professional/datacenter GPUs (Tesla, Instinct)
Reduced FP64: Consumer GPUs (GeForce, Radeon)
SupportsTensorCores
Gets or sets whether the hardware supports tensor cores.
public bool SupportsTensorCores { get; set; }
Property Value
- bool
Remarks
Indicates availability of specialized hardware for mixed-precision matrix multiplication (e.g., NVIDIA Tensor Cores, AMD Matrix Cores).
NVIDIA: Volta, Turing, Ampere, Ada, Hopper architectures
AMD: RDNA 3 and CDNA architectures
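Together, the two capability flags above drive algorithm selection. A hypothetical dispatch sketch (kernel names are illustrative, not part of the library):

```csharp
public static class KernelSelection
{
    // Hypothetical dispatch combining the two capability flags:
    // - FP64 work needs native double-precision support; otherwise fall back
    //   to FP32, e.g. with iterative refinement to recover accuracy.
    // - Mixed-precision GEMM can use matrix units when the hardware has them.
    public static string PickGemmKernel(bool supportsDoublePrecision,
                                        bool supportsTensorCores,
                                        bool needsFp64)
    {
        if (needsFp64)
            return supportsDoublePrecision ? "fp64_gemm" : "fp32_refined_gemm";
        return supportsTensorCores ? "tensor_core_gemm" : "fp32_simt_gemm";
    }
}
```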
WarpSize
Gets or sets the warp/wavefront size for the hardware.
public int? WarpSize { get; set; }
Property Value
- int?
Remarks
SIMD execution width. Same as PreferredWorkGroupSizeMultiple but exposed with CUDA-style naming for convenience.
NVIDIA: 32 (warp)
AMD: 64 (wavefront)
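Since WarpSize is nullable, callers typically fall back to PreferredWorkGroupSizeMultiple when a backend has not filled it in. A one-line hypothetical accessor:

```csharp
public static class WarpSizeLookup
{
    // Hypothetical accessor: prefer the CUDA-style WarpSize when the backend
    // set it, otherwise fall back to the portable SIMD-width multiple.
    public static int EffectiveWarpSize(int? warpSize, int preferredMultiple)
        => warpSize ?? preferredMultiple;
}
```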