Enum AcceleratorFeature

Namespace: DotCompute.Abstractions.Accelerators
Assembly: DotCompute.Abstractions.dll

Defines hardware and software features that may be supported by compute accelerators.

[Flags]
public enum AcceleratorFeature

Fields

AtomicOperations = 128

Support for atomic operations on global and shared memory.

Essential for implementing thread-safe data structures and algorithms that require synchronization between threads.

Bfloat16 = 256

Support for Brain Floating Point 16-bit format (bfloat16).

A 16-bit format that maintains the same exponent range as float32, popular in machine learning for its balance of range and precision.

CooperativeGroups = 32

Support for cooperative groups and grid synchronization.

Enables synchronization across multiple thread blocks, allowing more complex parallel algorithms to be implemented.

DoublePrecision = 2

Support for 64-bit floating-point (double-precision) operations.

Essential for scientific computing applications requiring high numerical precision.

DynamicParallelism = 64

Support for dynamic parallelism (nested kernel launches).

Allows kernels to launch other kernels directly from device code, enabling recursive and adaptive algorithms.

Float16 = 1

Support for 16-bit floating-point (half-precision) operations.

This feature enables faster computation for workloads that don't require full precision, such as certain machine learning inference tasks.

LongInteger = 4

Support for 64-bit integer operations.

Required for applications working with large integer values or pointers on 64-bit systems.

MixedPrecision = 1024

Support for mixed-precision operations within a single kernel.

Allows different precision levels to be combined within a single computation, trading off performance against accuracy.

None = 0

No special features are supported.

SignedByte = 512

Support for signed 8-bit integer operations.

Enables efficient quantized integer operations, commonly used in optimized neural network inference.

TensorCores = 8

Support for Tensor Core operations (NVIDIA) or equivalent matrix acceleration units.

Provides significant acceleration for matrix multiplication and convolution operations, particularly beneficial for deep learning workloads.

UnifiedMemory = 16

Support for unified memory between host and device.

Allows automatic memory migration between CPU and GPU, simplifying memory management at the potential cost of performance.
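
Taken together, the field values above correspond to a definition along the following lines. This is a sketch reconstructed from the documented values, not the actual source; the underlying type and member ordering are assumptions.

[Flags]
public enum AcceleratorFeature
{
    None               = 0,
    Float16            = 1 << 0,   // 1
    DoublePrecision    = 1 << 1,   // 2
    LongInteger        = 1 << 2,   // 4
    TensorCores        = 1 << 3,   // 8
    UnifiedMemory      = 1 << 4,   // 16
    CooperativeGroups  = 1 << 5,   // 32
    DynamicParallelism = 1 << 6,   // 64
    AtomicOperations   = 1 << 7,   // 128
    Bfloat16           = 1 << 8,   // 256
    SignedByte         = 1 << 9,   // 512
    MixedPrecision     = 1 << 10   // 1024
}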

Remarks

This enumeration is marked with FlagsAttribute, so multiple features can be combined into a single value. Use bitwise operations (or Enum.HasFlag) to test whether one or more features are supported, as shown in the sketch below.
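
For example, a caller might combine the features an algorithm needs and test them against the features an accelerator reports. In this sketch, supportedFeatures is a hypothetical value standing in for whatever an accelerator exposes; only the enum itself comes from this assembly.

// Hypothetical capability value reported by an accelerator (not part of this API).
AcceleratorFeature supportedFeatures =
    AcceleratorFeature.Float16 |
    AcceleratorFeature.TensorCores |
    AcceleratorFeature.UnifiedMemory;

// Features an algorithm requires.
AcceleratorFeature required =
    AcceleratorFeature.Float16 | AcceleratorFeature.TensorCores;

// Bitwise check: all required flags must be present.
bool canRun = (supportedFeatures & required) == required;

// Equivalent check using Enum.HasFlag.
bool canRunAlt = supportedFeatures.HasFlag(required);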