Enum DeviceFeatures
- Namespace: DotCompute.Abstractions.Models.Device
- Assembly: DotCompute.Abstractions.dll
Defines feature flags representing various capabilities supported by compute devices. These flags can be combined using bitwise operations to represent multiple features.
[Flags]
public enum DeviceFeatures
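As a minimal sketch of the bitwise combination described above: the enum is redeclared locally, mirroring the field values documented on this page, so that the snippet compiles without the DotCompute assembly. The `FeatureFlagsSketch` class and its method names are hypothetical and not part of the library.

```csharp
using System;

// Local mirror of the documented enum so the sketch is self-contained.
// Real code would reference DotCompute.Abstractions.Models.Device.DeviceFeatures instead.
[Flags]
public enum DeviceFeatures
{
    None = 0,
    DoublePrecision = 1,
    HalfPrecision = 2,
    Atomics = 4,
    LocalMemory = 8,
    Images = 16,
    Images3D = 32,
    UnifiedMemory = 64,
    DynamicParallelism = 128,
    TensorCores = 256
}

// Hypothetical helper class, for illustration only.
public static class FeatureFlagsSketch
{
    // Combine two feature sets with bitwise OR.
    public static DeviceFeatures Combine(DeviceFeatures a, DeviceFeatures b) => a | b;

    // Test whether a single flag is set, using the built-in Enum.HasFlag.
    public static bool Has(DeviceFeatures features, DeviceFeatures flag) => features.HasFlag(flag);

    // Clear a flag with bitwise AND and complement.
    public static DeviceFeatures Remove(DeviceFeatures features, DeviceFeatures flag) => features & ~flag;
}
```

For example, `FeatureFlagsSketch.Combine(DeviceFeatures.DoublePrecision, DeviceFeatures.Atomics)` yields a value with both bits set (numeric value 5), and `Has` on that value returns `true` for `Atomics` and `false` for `TensorCores`.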
Fields
Atomics = 4
Device supports atomic operations for thread-safe memory access.
Enables lock-free algorithms, reduction operations, and safe concurrent memory modifications across multiple work items. Essential for algorithms requiring synchronization between parallel threads without explicit locks.
DoublePrecision = 1
Device supports double-precision (64-bit) floating-point operations.
Enables high-precision mathematical computations required for scientific applications, financial calculations, and scenarios where floating-point accuracy is critical. Some devices omit double-precision support because of hardware limitations or performance trade-offs.
DynamicParallelism = 128
Device supports dynamic parallelism for nested kernel launches.
Allows kernels to launch other kernels dynamically, enabling recursive algorithms, adaptive parallelization, and complex control flow patterns. Particularly useful for irregular problems where the amount of work is not known until runtime.
HalfPrecision = 2
Device supports half-precision (16-bit) floating-point operations.
Provides memory-efficient computations with reduced precision, commonly used in machine learning, graphics, and applications where memory bandwidth is more important than precision. Offers significant performance benefits for suitable workloads.
Images = 16
Device supports image objects and texture operations.
Enables specialized image processing operations with hardware-accelerated filtering, interpolation, and format conversion. Supports various image formats and provides optimized memory access patterns for 2D data.
Images3D = 32
Device supports three-dimensional image objects.
Extends image support to 3D volumes, enabling volumetric rendering, 3D convolutions, and scientific visualization applications. Provides hardware-accelerated 3D interpolation and filtering capabilities.
LocalMemory = 8
Device supports local (shared) memory for work-group communication.
Provides high-speed memory shared among work items in the same work group. Local memory enables efficient data sharing, reduction operations, and cache-like behavior for frequently accessed data. Critical for optimizing memory-intensive algorithms.
None = 0
No special features are supported beyond basic compute capability.
Represents a minimal compute device with only basic integer and single-precision floating-point operations. This is the baseline capability that all devices must support.
TensorCores = 256
Device includes tensor processing units for accelerated AI workloads.
Provides specialized hardware for matrix operations, convolutions, and other AI/ML primitives. Tensor cores can dramatically accelerate deep learning training and inference through mixed-precision operations and optimized matrix multiplication algorithms.
UnifiedMemory = 64
Device supports unified memory addressing between host and device.
Enables seamless memory access where the same pointers can be used on both host and device. This simplifies the programming model, enables automatic memory migration based on access patterns, and reduces the need for explicit memory transfers.
Remarks
Device features determine which operations, data types, and programming constructs are available for kernel development. The framework uses these flags to enable conditional compilation, optimize kernels, and validate compatibility. Features are discovered during device initialization and do not change afterward.
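The compatibility validation mentioned above could look roughly like the following. This is a sketch under stated assumptions, not the framework's actual validation logic: the `FeatureValidation` class and its methods are hypothetical, and the enum is redeclared locally with the values documented on this page so the snippet stands alone.

```csharp
using System;

// Local mirror of the documented enum; real code would use the DotCompute type.
[Flags]
public enum DeviceFeatures
{
    None = 0, DoublePrecision = 1, HalfPrecision = 2, Atomics = 4,
    LocalMemory = 8, Images = 16, Images3D = 32, UnifiedMemory = 64,
    DynamicParallelism = 128, TensorCores = 256
}

// Hypothetical helper, for illustration only.
public static class FeatureValidation
{
    // True when every flag in 'required' is also present in 'available'.
    public static bool Supports(DeviceFeatures available, DeviceFeatures required)
        => (available & required) == required;

    // The subset of 'required' that 'available' lacks (None if fully supported).
    public static DeviceFeatures Missing(DeviceFeatures available, DeviceFeatures required)
        => required & ~available;

    // Throws if the device cannot run a kernel that needs 'required'.
    public static void EnsureSupported(DeviceFeatures available, DeviceFeatures required)
    {
        DeviceFeatures missing = Missing(available, required);
        if (missing != DeviceFeatures.None)
        {
            throw new NotSupportedException($"Device lacks required features: {missing}");
        }
    }
}
```

Because the flags are fixed once the device is initialized, a check like `EnsureSupported` only needs to run once per device and kernel pairing, for instance before compilation, rather than on every launch.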