Class NvidiaOpenCLAdapter
- Namespace
- DotCompute.Backends.OpenCL.Vendor
- Assembly
- DotCompute.Backends.OpenCL.dll
Vendor adapter for NVIDIA GPUs with CUDA-specific optimizations.
public sealed class NvidiaOpenCLAdapter : IOpenCLVendorAdapter
- Inheritance
- object
- NvidiaOpenCLAdapter
- Implements
- IOpenCLVendorAdapter
Remarks
NVIDIA's OpenCL implementation is built on top of CUDA, so the CUDA execution and memory model carries over:
- Warp-based execution (32 threads execute in lockstep)
- 48KB shared memory (local memory in OpenCL terms) per SM, configurable against L1 cache
- Excellent support for out-of-order queues
- Strong FP64 support on compute-oriented cards
- Coalesced memory access is critical for performance
Optimization priorities:
- Align work groups to warp boundaries (multiples of 32)
- Use 128-byte alignment for global memory buffers
- Enable aggressive math optimizations
- Leverage out-of-order execution
Properties
Vendor
Gets the vendor type this adapter handles.
public OpenCLVendor Vendor { get; }
Property Value
- OpenCLVendor
VendorName
Gets the vendor's display name.
public string VendorName { get; }
Property Value
- string
Methods
ApplyVendorOptimizations(QueueProperties, OpenCLDeviceInfo)
Applies vendor-specific queue properties.
public QueueProperties ApplyVendorOptimizations(QueueProperties properties, OpenCLDeviceInfo device)
Parameters
- properties QueueProperties
The base queue properties.
- device OpenCLDeviceInfo
The device the queue will operate on.
Returns
- QueueProperties
Modified queue properties optimized for this vendor.
Remarks
Some vendors benefit from out-of-order execution, while others work better with in-order queues depending on the workload characteristics.
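As an illustration of the out-of-order choice described above, the following is a minimal sketch and not the adapter's actual code. The `SketchQueueFlags` enum and the `PreferOutOfOrder` helper are hypothetical stand-ins, since the shape of `QueueProperties` is not shown on this page; only the two bit values mirror the standard OpenCL `cl_command_queue_properties` constants.

```csharp
using System;

// Hypothetical stand-in for the library's QueueProperties type.
// The bit values mirror the standard OpenCL constants named in the comments.
[Flags]
public enum SketchQueueFlags : ulong
{
    None = 0,
    OutOfOrderExecution = 1UL << 0, // CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
    Profiling = 1UL << 1            // CL_QUEUE_PROFILING_ENABLE
}

public static class QueueTuningSketch
{
    // Adds the out-of-order bit only when the device reports support for it;
    // every other requested flag is passed through unchanged.
    public static SketchQueueFlags PreferOutOfOrder(
        SketchQueueFlags baseFlags, bool deviceSupportsOutOfOrder)
    {
        return deviceSupportsOutOfOrder
            ? baseFlags | SketchQueueFlags.OutOfOrderExecution
            : baseFlags;
    }
}
```

With an out-of-order queue, command ordering is no longer implied by submission order, so the caller has to express dependencies explicitly through events or barriers.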
CanHandle(OpenCLPlatformInfo)
Determines if this adapter can handle the specified platform.
public bool CanHandle(OpenCLPlatformInfo platform)
Parameters
- platform OpenCLPlatformInfo
The OpenCL platform to evaluate.
Returns
- bool
true if this adapter can handle the platform; otherwise, false.
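One plausible way to make this decision is to look for "NVIDIA" in the platform's vendor or name string. The sketch below assumes exactly that; the `LooksLikeNvidia` helper and its two string parameters are hypothetical, and the real adapter may inspect `OpenCLPlatformInfo` differently.

```csharp
using System;

// Hypothetical helper; the real CanHandle takes an OpenCLPlatformInfo,
// whose members are not documented on this page.
public static class NvidiaPlatformMatchSketch
{
    // Returns true when either string mentions NVIDIA, ignoring case.
    public static bool LooksLikeNvidia(string platformVendor, string platformName)
    {
        return MentionsNvidia(platformVendor) || MentionsNvidia(platformName);
    }

    private static bool MentionsNvidia(string value)
    {
        // Null or empty strings never match.
        return !string.IsNullOrEmpty(value)
            && value.IndexOf("NVIDIA", StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

A case-insensitive substring match is used here because vendor strings are free-form text and may differ between driver releases.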
GetCompilerOptions(bool)
Gets vendor-specific compiler options.
public string GetCompilerOptions(bool enableOptimizations)
Parameters
- enableOptimizations bool
Whether to enable aggressive optimizations.
Returns
- string
Compiler options string suitable for clBuildProgram.
Remarks
Vendors support different compiler flags:
- Common: -cl-mad-enable, -cl-fast-relaxed-math
- NVIDIA: -cl-denorms-are-zero
- AMD: -cl-unsafe-math-optimizations
- Intel: Conservative optimizations for better compatibility
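To show how the flags listed above could be combined into a single options string, here is a minimal sketch. The `NvidiaCompilerOptionsSketch` class is hypothetical and the exact string the adapter returns is an assumption; only the flag names themselves are standard OpenCL build options.

```csharp
using System.Collections.Generic;

// Hypothetical sketch; the string returned by the real GetCompilerOptions may differ.
public static class NvidiaCompilerOptionsSketch
{
    public static string Build(bool enableOptimizations)
    {
        // Assumption: with aggressive optimizations off, no extra flags are passed.
        if (!enableOptimizations)
        {
            return string.Empty;
        }

        var flags = new List<string>
        {
            "-cl-mad-enable",         // allow a * b + c to be fused into a mad
            "-cl-fast-relaxed-math",  // relax IEEE 754 compliance for speed
            "-cl-denorms-are-zero"    // flush denormal numbers to zero
        };

        // clBuildProgram expects the options as one space-separated string.
        return string.Join(" ", flags);
    }
}
```

These flags trade numerical accuracy for speed, so kernels that depend on strict IEEE 754 behavior should be built with `enableOptimizations` set to false.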
GetOptimalLocalMemorySize(OpenCLDeviceInfo)
Gets the optimal local memory size for this vendor.
public long GetOptimalLocalMemorySize(OpenCLDeviceInfo device)
Parameters
- device OpenCLDeviceInfo
The device to query.
Returns
- long
The recommended local memory size in bytes.
Remarks
Local memory (shared memory in CUDA terms) has vendor-specific limits:
- NVIDIA: 48KB per SM (configurable with L1 cache)
- AMD: 64KB per CU
- Intel: 64KB per subslice
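The NVIDIA figure above can be treated as an upper bound on whatever the device reports. The sketch below assumes that policy; `LocalMemorySizingSketch` and its `deviceReportedBytes` parameter are hypothetical, and the real method may derive its result differently.

```csharp
using System;

// Hypothetical sketch of a clamping policy; not the adapter's actual logic.
public static class LocalMemorySizingSketch
{
    // 48KB, the per-SM figure quoted in the remarks above.
    public const long NvidiaLocalMemoryBytes = 48 * 1024;

    // Never recommends more than the device reports, and never more than 48KB.
    // A non-positive reported value is treated as "unknown" and falls back to 48KB.
    public static long Recommend(long deviceReportedBytes)
    {
        if (deviceReportedBytes <= 0)
        {
            return NvidiaLocalMemoryBytes;
        }

        return Math.Min(deviceReportedBytes, NvidiaLocalMemoryBytes);
    }
}
```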
GetOptimalWorkGroupSize(OpenCLDeviceInfo, int)
Gets the optimal work group size for a kernel on this vendor's hardware.
public int GetOptimalWorkGroupSize(OpenCLDeviceInfo device, int defaultSize)
Parameters
- device OpenCLDeviceInfo
The device to optimize for.
- defaultSize int
The default work group size to use as a baseline.
Returns
- int
The optimal work group size for this vendor.
Remarks
Work group sizing is critical for GPU performance:
- NVIDIA: Prefer multiples of 32 (warp size)
- AMD: Prefer multiples of 32 or 64 (wavefront size, depending on the architecture)
- Intel: Prefer multiples of 16 (SIMD width)
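For the NVIDIA case, aligning to warp boundaries amounts to rounding the requested size up to a multiple of 32 without exceeding the device limit. A minimal sketch follows; `WarpSizingSketch` and its `maxWorkGroupSize` parameter are hypothetical, and the real method reads its limits from `OpenCLDeviceInfo`.

```csharp
using System;

// Hypothetical helper; not part of the documented API.
public static class WarpSizingSketch
{
    public const int WarpSize = 32;

    // Rounds requestedSize up to the next multiple of 32, then caps it at the
    // largest multiple of 32 that fits within maxWorkGroupSize.
    public static int RoundToWarpMultiple(int requestedSize, int maxWorkGroupSize)
    {
        if (requestedSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(requestedSize));
        }
        if (maxWorkGroupSize < WarpSize)
        {
            throw new ArgumentOutOfRangeException(nameof(maxWorkGroupSize));
        }

        // Use long arithmetic so the round-up cannot overflow int.
        long rounded = ((long)requestedSize + WarpSize - 1) / WarpSize * WarpSize;
        int limit = maxWorkGroupSize / WarpSize * WarpSize;
        return (int)Math.Min(rounded, limit);
    }
}
```

For example, a requested size of 100 with a device limit of 1024 yields 128, so no warp is left partially filled.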
GetRecommendedBufferAlignment(OpenCLDeviceInfo)
Gets recommended buffer alignment for optimal memory access.
public int GetRecommendedBufferAlignment(OpenCLDeviceInfo device)
Parameters
- device OpenCLDeviceInfo
The device to optimize for.
Returns
- int
The recommended alignment in bytes.
Remarks
Proper alignment ensures coalesced memory access:
- NVIDIA: 128-byte alignment for coalescing
- AMD: 256-byte alignment for optimal access
- Intel: 64-byte alignment (cache line size)
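A caller would typically use the returned alignment to pad buffer sizes or offsets. The sketch below shows the usual power-of-two round-up; `BufferAlignmentSketch.AlignUp` is a hypothetical helper and not part of the documented API.

```csharp
using System;

// Hypothetical helper that applies an alignment such as the 128 bytes
// recommended for NVIDIA above.
public static class BufferAlignmentSketch
{
    // Rounds sizeInBytes up to the next multiple of alignment.
    // alignment must be a positive power of two for the bit mask to be valid.
    public static long AlignUp(long sizeInBytes, int alignment)
    {
        if (sizeInBytes < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(sizeInBytes));
        }
        if (alignment <= 0 || (alignment & (alignment - 1)) != 0)
        {
            throw new ArgumentException("Alignment must be a power of two.", nameof(alignment));
        }

        long mask = alignment - 1;
        return (sizeInBytes + mask) & ~mask;
    }
}
```

For example, a 1000-byte request with 128-byte alignment is padded to 1024 bytes.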
IsExtensionReliable(string, OpenCLDeviceInfo)
Checks if a specific extension is reliably supported by this vendor. Some vendors report extensions but have bugs/limitations.
public bool IsExtensionReliable(string extension, OpenCLDeviceInfo device)
Parameters
- extension string
The extension name (e.g., "cl_khr_fp64").
- device OpenCLDeviceInfo
The device to check.
Returns
- bool
true if the extension is reliably supported; otherwise, false.
Remarks
Not all advertised extensions work correctly on all hardware. This method allows vendors to blacklist problematic extensions.
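The check described above can be sketched as "advertised by the device and not on a denylist". Everything in this example is hypothetical: the `ExtensionReliabilitySketch` class, the `advertisedExtensions` parameter, and in particular the denylist entry, which is a made-up placeholder and not a real extension known to be unreliable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch; the adapter's real list of problematic extensions
// is not documented on this page.
public static class ExtensionReliabilitySketch
{
    // Placeholder entry for illustration only; not a real extension name.
    private static readonly HashSet<string> Denylist =
        new HashSet<string>(StringComparer.Ordinal)
        {
            "cl_example_unreliable_extension"
        };

    // An extension counts as reliable when the device advertises it
    // and it does not appear on the denylist.
    public static bool IsReliable(string extension, IEnumerable<string> advertisedExtensions)
    {
        if (string.IsNullOrWhiteSpace(extension) || advertisedExtensions == null)
        {
            return false;
        }

        bool advertised = advertisedExtensions.Any(
            e => string.Equals(e, extension, StringComparison.Ordinal));
        return advertised && !Denylist.Contains(extension);
    }
}
```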
SupportsPersistentKernels(OpenCLDeviceInfo)
Indicates if this vendor benefits from persistent kernels.
public bool SupportsPersistentKernels(OpenCLDeviceInfo device)
Parameters
- device OpenCLDeviceInfo
The device to check.
Returns
- bool
true if persistent kernels are beneficial; otherwise, false.
Remarks
Persistent kernels keep work groups alive across multiple kernel invocations, which can improve performance for streaming workloads on high-end GPUs.