Class CudaAccelerator

Namespace
DotCompute.Backends.CUDA
Assembly
DotCompute.Backends.CUDA.dll

CUDA accelerator implementation providing high-performance GPU compute operations.

public sealed class CudaAccelerator : BaseAccelerator, IAccelerator, IAsyncDisposable
Inheritance
object → BaseAccelerator → CudaAccelerator

Implements
IAccelerator, IAsyncDisposable

Examples

// Create accelerator for default GPU device
using var accelerator = new CudaAccelerator();

// Get detailed device information
var deviceInfo = accelerator.GetDeviceInfo();
Console.WriteLine($"GPU: {deviceInfo.Name}, CC: {deviceInfo.ComputeCapability}");

// Compile and execute kernel
var kernel = await accelerator.CompileKernelAsync(kernelDef, options);
await accelerator.ExecuteKernelAsync(kernel, args);
await accelerator.SynchronizeAsync();

Remarks

The CUDA accelerator is the primary interface for executing compute kernels on NVIDIA GPUs. It provides automatic kernel compilation, memory management, and execution optimization.

Supported GPU Architectures:

  • Compute Capability 5.0+ (Maxwell through Ada Lovelace/Hopper)
  • RTX 2000 Ada Generation (Compute Capability 8.9) with optimizations
  • Automatic detection and optimization for specific GPU models

Key Features:

  • Zero-copy unified memory support
  • CUDA graph execution for optimized kernel launches (CC 10.0+)
  • Automatic kernel compilation with NVRTC
  • Device memory pooling for reduced allocation overhead

Constructors

CudaAccelerator(int, ILogger<CudaAccelerator>?)

Initializes a new CUDA accelerator instance for GPU compute operations.

public CudaAccelerator(int deviceId = 0, ILogger<CudaAccelerator>? logger = null)

Parameters

deviceId int

Zero-based index of the CUDA device to use (default: 0 for first GPU). Use cudaGetDeviceCount(out int) to enumerate available devices.

logger ILogger<CudaAccelerator>

Optional logger for diagnostics and performance monitoring. If null, a null logger is used (no logging output).

Examples

// Use default GPU (device 0)
using var gpu0 = new CudaAccelerator();

// Use second GPU with logging
var logger = loggerFactory.CreateLogger<CudaAccelerator>();
using var gpu1 = new CudaAccelerator(deviceId: 1, logger: logger);

Remarks

The accelerator automatically:

  • Detects GPU compute capability and optimizes accordingly
  • Initializes memory pools for efficient allocation
  • Creates CUDA graph manager if CC 10.0+ is detected
  • Sets up NVRTC compiler for runtime kernel compilation

Multi-GPU Systems:
Create separate accelerator instances for each GPU device to enable parallel execution.
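As a sketch of that multi-GPU pattern (here deviceCount stands in for whatever device enumeration you use, e.g. the cudaGetDeviceCount(out int) helper noted above, and the kernels/args arrays are assumed to be prepared per device):

```csharp
// One accelerator instance per GPU; each can execute kernels independently.
var accelerators = new List<CudaAccelerator>();
for (int id = 0; id < deviceCount; id++)
{
    accelerators.Add(new CudaAccelerator(deviceId: id));
}

// Run a kernel on every GPU in parallel, then wait for each device.
await Task.WhenAll(accelerators.Select(async (acc, i) =>
{
    await acc.ExecuteKernelAsync(kernels[i], args[i]);
    await acc.SynchronizeAsync();
}));

// Dispose each accelerator when done.
foreach (var acc in accelerators)
{
    await acc.DisposeAsync();
}
```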

Exceptions

InvalidOperationException

Thrown when CUDA initialization fails, device doesn't exist, or driver is incompatible.

Properties

Device

Gets the underlying CUDA device providing hardware information and capabilities.

public CudaDevice Device { get; }

Property Value

CudaDevice

The CUDA device instance for this accelerator.

DeviceId

Gets the CUDA device identifier used for multi-GPU scenarios.

public int DeviceId { get; }

Property Value

int

Zero-based device index (0 = first GPU, 1 = second GPU, etc.).

GraphManager

Gets the CUDA graph manager for optimized repeated kernel execution patterns.

public CudaGraphManager? GraphManager { get; }

Property Value

CudaGraphManager

Graph manager instance if supported (Compute Capability 10.0+), otherwise null. CUDA graphs capture kernel launch sequences for faster replay with reduced CPU overhead.

Remarks

Available only on GPUs with compute capability 10.0 or higher. Use graphs for repetitive execution patterns to reduce launch overhead by up to 50%.
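Because the property is null on unsupported devices, null-check before taking the graph path. A minimal sketch (the capture/launch calls inside the branch depend on the CudaGraphManager API and are left as comments):

```csharp
// GraphManager is null on devices below the required compute capability.
if (accelerator.GraphManager is { } graphs)
{
    // Use graph-based execution for a hot loop of repeated launches.
    // (Capture and replay calls depend on the CudaGraphManager API.)
}
else
{
    // Fall back to individual kernel launches.
    await accelerator.ExecuteKernelAsync(kernel, args);
    await accelerator.SynchronizeAsync();
}
```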

Methods

CompileKernelCoreAsync(KernelDefinition, CompilationOptions, CancellationToken)

Core kernel compilation logic for the CUDA backend; overrides the base-class compilation hook to compile the kernel definition at runtime with NVRTC.

protected override ValueTask<ICompiledKernel> CompileKernelCoreAsync(KernelDefinition definition, CompilationOptions options, CancellationToken cancellationToken)

Parameters

definition KernelDefinition
options CompilationOptions
cancellationToken CancellationToken

Returns

ValueTask<ICompiledKernel>

DisposeCoreAsync()

Core disposal logic for the CUDA backend; overrides the base-class hook to release device resources.

protected override ValueTask DisposeCoreAsync()

Returns

ValueTask

GetDeviceInfo()

Retrieves comprehensive device information including hardware specifications and capabilities.

public DeviceInfo GetDeviceInfo()

Returns

DeviceInfo

Detailed device information including compute capability, memory configuration, multiprocessor count, and architecture-specific features like RTX Ada detection.

Examples

var info = accelerator.GetDeviceInfo();
Console.WriteLine($"GPU: {info.Name}");
Console.WriteLine($"Compute Capability: {info.ComputeCapability}");
Console.WriteLine($"Memory: {info.GlobalMemoryBytes / (1024 * 1024 * 1024)} GB");
Console.WriteLine($"CUDA Cores: ~{info.EstimatedCudaCores}");
Console.WriteLine($"Memory Bandwidth: {info.MemoryBandwidthGBps:F1} GB/s");

if (info.IsRTX2000Ada)
{
    Console.WriteLine("Detected RTX 2000 Ada - enabling Ada optimizations");
}

Remarks

The returned DeviceInfo includes:

  • Hardware: GPU name, compute capability, SM count, CUDA core estimate
  • Memory: Total/available memory, bandwidth, L2 cache size
  • Capabilities: Unified memory, concurrent kernels, ECC support
  • Limits: Max threads per block, shared memory, warp size
  • Architecture: Generation detection (Ampere, Ada, Hopper, etc.)

RTX 2000 Ada Detection:
The IsRTX2000Ada property specifically identifies the RTX 2000 Ada Generation (Compute Capability 8.9) for architecture-specific optimizations.

Exceptions

ObjectDisposedException

Thrown if the accelerator has been disposed.

GetHealthSnapshotAsync(CancellationToken)

Gets a comprehensive health snapshot of the CUDA device.

public override ValueTask<DeviceHealthSnapshot> GetHealthSnapshotAsync(CancellationToken cancellationToken = default)

Parameters

cancellationToken CancellationToken

Cancellation token.

Returns

ValueTask<DeviceHealthSnapshot>

A task containing the device health snapshot.

Remarks

This method queries the NVIDIA Management Library (NVML) for real-time GPU metrics, including:

  • Temperature, power consumption, and fan speed
  • GPU and memory utilization percentages
  • Clock frequencies (graphics and memory)
  • Memory usage statistics
  • PCIe throughput measurements
  • Throttling status and reasons

Performance: Typically takes 2-5ms to collect all metrics via NVML. Results are collected synchronously but wrapped in ValueTask for consistency.

Requirements: NVIDIA driver with NVML support. Falls back to unavailable snapshot if NVML is not available or initialization fails.
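A hedged usage sketch (the DeviceHealthSnapshot member names used below — TemperatureCelsius, PowerWatts, GpuUtilizationPercent — are illustrative assumptions; consult the DeviceHealthSnapshot reference for the actual members):

```csharp
// Poll real-time device health via NVML.
var health = await accelerator.GetHealthSnapshotAsync();
Console.WriteLine($"Temperature: {health.TemperatureCelsius} °C");
Console.WriteLine($"Power draw: {health.PowerWatts} W");
Console.WriteLine($"GPU load: {health.GpuUtilizationPercent}%");
```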

GetProfilingMetricsAsync(CancellationToken)

Gets current profiling metrics from the CUDA device.

public override ValueTask<IReadOnlyList<ProfilingMetric>> GetProfilingMetricsAsync(CancellationToken cancellationToken = default)

Parameters

cancellationToken CancellationToken

Cancellation token.

Returns

ValueTask<IReadOnlyList<ProfilingMetric>>

A task containing the collection of profiling metrics.
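A minimal sketch of dumping the collected metrics (the ProfilingMetric member names Name, Value, and Unit are assumptions for illustration):

```csharp
// Enumerate all currently collected profiling metrics.
var metrics = await accelerator.GetProfilingMetricsAsync();
foreach (var metric in metrics)
{
    Console.WriteLine($"{metric.Name}: {metric.Value} {metric.Unit}");
}
```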

GetProfilingSnapshotAsync(CancellationToken)

Gets a comprehensive profiling snapshot of the CUDA device.

public override ValueTask<ProfilingSnapshot> GetProfilingSnapshotAsync(CancellationToken cancellationToken = default)

Parameters

cancellationToken CancellationToken

Cancellation token.

Returns

ValueTask<ProfilingSnapshot>

A task containing the profiling snapshot.

Remarks

CUDA profiling provides detailed performance metrics using CUDA Events for precise timing. This implementation tracks kernel execution statistics, memory operations, and device utilization.

Available Metrics:

  • Kernel execution statistics (average, min, max, median, P95, P99)
  • Memory transfer statistics (bandwidth, transfer counts)
  • Device utilization (estimated from execution patterns)
  • Performance trends and bottleneck identification

Profiling Overhead: Minimal (<0.5%) as timing data is collected passively. CUDA Events provide hardware-accurate timing without CPU synchronization overhead.

Performance: Typically less than 1ms to collect and aggregate metrics.
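A hedged sketch of taking a snapshot after a workload run (the statistic names AverageKernelTimeMs and P95KernelTimeMs are illustrative; see the ProfilingSnapshot reference for the actual members):

```csharp
// Drain pending work, then capture aggregated timing statistics.
await accelerator.SynchronizeAsync();
var profile = await accelerator.GetProfilingSnapshotAsync();
Console.WriteLine($"Avg kernel time: {profile.AverageKernelTimeMs:F3} ms");
Console.WriteLine($"P95 kernel time: {profile.P95KernelTimeMs:F3} ms");
```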

GetSensorReadingsAsync(CancellationToken)

Gets current sensor readings from the CUDA device.

public override ValueTask<IReadOnlyList<SensorReading>> GetSensorReadingsAsync(CancellationToken cancellationToken = default)

Parameters

cancellationToken CancellationToken

Cancellation token.

Returns

ValueTask<IReadOnlyList<SensorReading>>

A task containing the collection of sensor readings.
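A minimal sketch of enumerating the readings (the SensorReading member names Sensor, Value, and Unit are assumptions for illustration):

```csharp
// List current hardware sensor readings (temperature, fan, power, etc.).
var readings = await accelerator.GetSensorReadingsAsync();
foreach (var reading in readings)
{
    Console.WriteLine($"{reading.Sensor}: {reading.Value} {reading.Unit}");
}
```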

InitializeCore()

Core initialization logic for the CUDA backend; overrides the base-class hook to set up the device context.

protected override object? InitializeCore()

Returns

object

Initialization result (typically null or a status object).

Reset()

Resets the CUDA device to a clean state, clearing all memory allocations and reinitializing the context.

public void Reset()

Examples

// Reset device after encountering errors
try
{
    await accelerator.ExecuteKernelAsync(kernel, args);
}
catch (CudaException ex) when (ex.Error == CudaError.IllegalAddress)
{
    accelerator.Reset(); // Clean slate
    // Retry operation...
}

Remarks

This operation:

  • Frees all device memory allocations managed by this accelerator
  • Destroys all CUDA contexts on this device
  • Resets the device to its initial state
  • Reinitializes the accelerator context

Warning: This is a heavyweight operation that affects all CUDA contexts on the device, not just this accelerator instance. Use sparingly, primarily for:

  • Recovering from device errors
  • Cleaning up after memory leaks during development
  • Benchmarking scenarios requiring pristine device state

Exceptions

ObjectDisposedException

Thrown if the accelerator has been disposed.

InvalidOperationException

Thrown if device reset fails due to driver error or pending operations.

ResetAsync(ResetOptions?, CancellationToken)

Resets the CUDA device to a clean state.

public override ValueTask<ResetResult> ResetAsync(ResetOptions? options = null, CancellationToken cancellationToken = default)

Parameters

options ResetOptions
cancellationToken CancellationToken

Returns

ValueTask<ResetResult>
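A hedged usage sketch (the ResetResult member names Success and Message are assumptions for illustration; check the ResetResult reference for the actual shape):

```csharp
// Async reset with default options; inspect the result rather than
// relying on an exception.
var result = await accelerator.ResetAsync();
if (!result.Success)
{
    Console.Error.WriteLine($"Device reset failed: {result.Message}");
}
```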

SynchronizeCoreAsync(CancellationToken)

Core synchronization logic for the CUDA backend; overrides the base-class hook to wait for all pending device operations to complete.

protected override ValueTask SynchronizeCoreAsync(CancellationToken cancellationToken)

Parameters

cancellationToken CancellationToken

Returns

ValueTask