Enum DeviceFeatures

Namespace
DotCompute.Abstractions.Models.Device
Assembly
DotCompute.Abstractions.dll

Defines feature flags representing various capabilities supported by compute devices. These flags can be combined using bitwise operations to represent multiple features.

[Flags]
public enum DeviceFeatures
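
A minimal sketch of how the flags might be combined and queried. It declares a local mirror of the enum, using the field values documented on this page, so that it compiles without a reference to DotCompute.Abstractions.dll. The `FeatureFlagsDemo` class and its methods are hypothetical helpers for illustration, not part of the library.

```csharp
using System;

// Local mirror of the documented enum so the sketch compiles on its own;
// real code would reference DotCompute.Abstractions.dll instead.
[Flags]
public enum DeviceFeatures
{
    None = 0,
    DoublePrecision = 1,
    HalfPrecision = 2,
    Atomics = 4,
    LocalMemory = 8,
    Images = 16,
    Images3D = 32,
    UnifiedMemory = 64,
    DynamicParallelism = 128,
    TensorCores = 256
}

// Hypothetical helper class, not part of DotCompute.
public static class FeatureFlagsDemo
{
    // Combine several capabilities into a single value with bitwise OR.
    public static DeviceFeatures Combine() =>
        DeviceFeatures.DoublePrecision | DeviceFeatures.Atomics | DeviceFeatures.LocalMemory;

    // Query one capability; Enum.HasFlag performs the bitwise AND comparison.
    public static bool Supports(DeviceFeatures available, DeviceFeatures feature) =>
        available.HasFlag(feature);
}
```

Because each field occupies a distinct bit, the combined value above is 1 + 4 + 8 = 13, and individual features can be tested independently of one another.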

Fields

Atomics = 4

Device supports atomic operations for thread-safe memory access.

Enables lock-free algorithms, reduction operations, and safe concurrent memory modifications across multiple work items. Essential for algorithms requiring synchronization between parallel threads without explicit locks.

DoublePrecision = 1

Device supports double-precision (64-bit) floating-point operations.

Enables high-precision mathematical computations required for scientific applications, financial calculations, and scenarios where floating-point accuracy is critical. Some devices omit double-precision support because of hardware limitations or performance trade-offs.

DynamicParallelism = 128

Device supports dynamic parallelism for nested kernel launches.

Allows kernels to launch other kernels dynamically, enabling recursive algorithms, adaptive parallelization, and complex control flow patterns. Particularly useful for irregular problems where the amount of work is not known until runtime.

HalfPrecision = 2

Device supports half-precision (16-bit) floating-point operations.

Provides memory-efficient computations with reduced precision, commonly used in machine learning, graphics, and applications where memory bandwidth matters more than precision. Each value takes half the storage of a single-precision value, which can improve performance for workloads that tolerate the lower precision.

Images = 16

Device supports image objects and texture operations.

Enables specialized image processing operations with hardware-accelerated filtering, interpolation, and format conversion. Supports various image formats and provides optimized memory access patterns for 2D data.

Images3D = 32

Device supports three-dimensional image objects.

Extends image support to 3D volumes, enabling volumetric rendering, 3D convolutions, and scientific visualization applications. Provides hardware-accelerated 3D interpolation and filtering capabilities.

LocalMemory = 8

Device supports local (shared) memory for work-group communication.

Provides high-speed memory shared among work items in the same work group. Local memory enables efficient data sharing, reduction operations, and cache-like behavior for frequently accessed data. Critical for optimizing memory-intensive algorithms.

None = 0

No special features are supported beyond basic compute capability.

Represents a minimal compute device with only basic integer and single-precision floating-point operations. This is the baseline capability that all devices must support.
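
Because `None` has the value zero, it cannot be detected with a bit test: in .NET, `Enum.HasFlag` returns true for a zero-valued flag regardless of which bits are set. A small sketch of an equality-based check follows. The enum is a partial local mirror of the documented values, and `BaselineCheck` is a hypothetical helper, not part of DotCompute.

```csharp
using System;

// Partial local mirror of the documented enum (only the members used here).
[Flags]
public enum DeviceFeatures
{
    None = 0,
    DoublePrecision = 1,
    Atomics = 4
}

// Hypothetical helper, not part of DotCompute.
public static class BaselineCheck
{
    // True only when no feature bits are set at all.
    public static bool IsBaselineOnly(DeviceFeatures features) =>
        features == DeviceFeatures.None;
}
```

With this helper, a device reporting `Atomics` is not treated as baseline-only, even though `DeviceFeatures.Atomics.HasFlag(DeviceFeatures.None)` evaluates to true.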

TensorCores = 256

Device includes tensor processing units for accelerated AI workloads.

Provides specialized hardware for matrix operations, convolutions, and other AI/ML primitives. Tensor cores can dramatically accelerate deep learning training and inference through mixed-precision operations and optimized matrix multiplication algorithms.

UnifiedMemory = 64

Device supports unified memory addressing between host and device.

Enables seamless memory access where the same pointers can be used on both host and device. Simplifies the programming model and enables automatic memory migration based on access patterns. Reduces the need for explicit memory transfers.

Remarks

Device features determine which operations, data types, and programming constructs are available for kernel development. The framework uses these flags to enable conditional compilation, optimize kernels, and validate compatibility. Features are discovered during device initialization and do not change afterward.
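
The compatibility validation mentioned above can be illustrated with plain bitwise operations. This is a sketch under stated assumptions, not the framework's actual validation logic: the enum is a partial local mirror of the documented values, and `KernelCompatibility`, `Missing`, and `IsCompatible` are hypothetical names introduced for illustration.

```csharp
using System;

// Partial local mirror of the documented enum (only the members used here).
[Flags]
public enum DeviceFeatures
{
    None = 0,
    DoublePrecision = 1,
    HalfPrecision = 2,
    Atomics = 4,
    LocalMemory = 8,
    TensorCores = 256
}

// Hypothetical helper, not part of DotCompute.
public static class KernelCompatibility
{
    // Features the kernel requires but the device does not report.
    public static DeviceFeatures Missing(DeviceFeatures available, DeviceFeatures required) =>
        required & ~available;

    // A kernel is compatible when no required feature is missing.
    public static bool IsCompatible(DeviceFeatures available, DeviceFeatures required) =>
        Missing(available, required) == DeviceFeatures.None;
}
```

For a device reporting `Atomics | LocalMemory` and a kernel requiring `Atomics | DoublePrecision`, `Missing` yields `DoublePrecision`, which a caller could surface in an error message or use to select a single-precision fallback.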