Class CudaAccelerator
- Namespace
- DotCompute.Backends.CUDA
- Assembly
- DotCompute.Backends.CUDA.dll
CUDA accelerator implementation providing high-performance GPU compute operations.
public sealed class CudaAccelerator : BaseAccelerator, IAccelerator, IAsyncDisposable
- Inheritance
- BaseAccelerator → CudaAccelerator
- Implements
- IAccelerator, IAsyncDisposable
Examples
// Create accelerator for default GPU device
using var accelerator = new CudaAccelerator();
// Get detailed device information
var deviceInfo = accelerator.GetDeviceInfo();
Console.WriteLine($"GPU: {deviceInfo.Name}, CC: {deviceInfo.ComputeCapability}");
// Compile and execute kernel
var kernel = await accelerator.CompileKernelAsync(kernelDef, options);
await accelerator.ExecuteKernelAsync(kernel, args);
await accelerator.SynchronizeAsync();
Remarks
The CUDA accelerator is the primary interface for executing compute kernels on NVIDIA GPUs. It provides automatic kernel compilation, memory management, and execution optimization.
Supported GPU Architectures:
- Compute Capability 5.0+ (Maxwell through Ada Lovelace/Hopper)
- RTX 2000 Ada Generation (Compute Capability 8.9) with optimizations
- Automatic detection and optimization for specific GPU models
Key Features:
- Zero-copy unified memory support
- CUDA graph execution for optimized kernel launches (CC 10.0+)
- Automatic kernel compilation with NVRTC
- Device memory pooling for reduced allocation overhead
Constructors
CudaAccelerator(int, ILogger<CudaAccelerator>?)
Initializes a new CUDA accelerator instance for GPU compute operations.
public CudaAccelerator(int deviceId = 0, ILogger<CudaAccelerator>? logger = null)
Parameters
deviceId int
Zero-based index of the CUDA device to use (default: 0 for the first GPU). Use cudaGetDeviceCount(out int) to enumerate available devices.
logger ILogger<CudaAccelerator>?
Optional logger for diagnostics and performance monitoring. If null, a null logger is used (no logging output).
Examples
// Use default GPU (device 0)
using var gpu0 = new CudaAccelerator();
// Use second GPU with logging
var logger = loggerFactory.CreateLogger<CudaAccelerator>();
using var gpu1 = new CudaAccelerator(deviceId: 1, logger: logger);
Remarks
The accelerator automatically:
- Detects GPU compute capability and optimizes accordingly
- Initializes memory pools for efficient allocation
- Creates CUDA graph manager if CC 10.0+ is detected
- Sets up NVRTC compiler for runtime kernel compilation
Multi-GPU Systems:
Create separate accelerator instances for each GPU device to enable parallel execution.
Exceptions
- InvalidOperationException
Thrown when CUDA initialization fails, device doesn't exist, or driver is incompatible.
Properties
Device
Gets the underlying CUDA device providing hardware information and capabilities.
public CudaDevice Device { get; }
Property Value
- CudaDevice
The CUDA device instance for this accelerator.
DeviceId
Gets the CUDA device identifier used for multi-GPU scenarios.
public int DeviceId { get; }
Property Value
- int
Zero-based device index (0 = first GPU, 1 = second GPU, etc.).
GraphManager
Gets the CUDA graph manager for optimized repeated kernel execution patterns.
public CudaGraphManager? GraphManager { get; }
Property Value
- CudaGraphManager
Graph manager instance if supported (Compute Capability 10.0+), otherwise null. CUDA graphs capture kernel launch sequences for faster replay with reduced CPU overhead.
Remarks
Available only on GPUs with compute capability 10.0 or higher (Blackwell architecture and newer). Use graphs for repetitive execution patterns to reduce launch overhead by up to 50%.
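Because GraphManager is null on devices below the required compute capability, callers can branch on its availability. A minimal sketch using only the members documented on this page:

```csharp
// Branch on graph support: GraphManager is null when the device's
// compute capability does not support graph execution.
using var accelerator = new CudaAccelerator();

if (accelerator.GraphManager is { } graphs)
{
    // Graph-based replay is available; capture and reuse launch sequences here.
    Console.WriteLine("CUDA graphs supported; using graph execution.");
}
else
{
    // Fall back to individual kernel launches.
    Console.WriteLine("CUDA graphs unavailable; using direct launches.");
}
```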
Methods
CompileKernelCoreAsync(KernelDefinition, CompilationOptions, CancellationToken)
Core kernel compilation logic to be implemented by derived classes.
protected override ValueTask<ICompiledKernel> CompileKernelCoreAsync(KernelDefinition definition, CompilationOptions options, CancellationToken cancellationToken)
Parameters
definition KernelDefinition
options CompilationOptions
cancellationToken CancellationToken
Returns
- ValueTask<ICompiledKernel>
DisposeCoreAsync()
Core disposal logic to be implemented by derived classes.
protected override ValueTask DisposeCoreAsync()
Returns
- ValueTask
GetDeviceInfo()
Retrieves comprehensive device information including hardware specifications and capabilities.
public DeviceInfo GetDeviceInfo()
Returns
- DeviceInfo
Detailed device information including compute capability, memory configuration, multiprocessor count, and architecture-specific features like RTX Ada detection.
Examples
var info = accelerator.GetDeviceInfo();
Console.WriteLine($"GPU: {info.Name}");
Console.WriteLine($"Compute Capability: {info.ComputeCapability}");
Console.WriteLine($"Memory: {info.GlobalMemoryBytes / (1024 * 1024 * 1024)} GB");
Console.WriteLine($"CUDA Cores: ~{info.EstimatedCudaCores}");
Console.WriteLine($"Memory Bandwidth: {info.MemoryBandwidthGBps:F1} GB/s");
if (info.IsRTX2000Ada)
{
Console.WriteLine("Detected RTX 2000 Ada - enabling Ada optimizations");
}
Remarks
The returned DeviceInfo includes:
- Hardware: GPU name, compute capability, SM count, CUDA core estimate
- Memory: Total/available memory, bandwidth, L2 cache size
- Capabilities: Unified memory, concurrent kernels, ECC support
- Limits: Max threads per block, shared memory, warp size
- Architecture: Generation detection (Ampere, Ada, Hopper, etc.)
RTX 2000 Ada Detection:
The IsRTX2000Ada property specifically identifies the RTX 2000 Ada Generation
(Compute Capability 8.9) for architecture-specific optimizations.
Exceptions
- ObjectDisposedException
Thrown if the accelerator has been disposed.
GetHealthSnapshotAsync(CancellationToken)
Gets a comprehensive health snapshot of the CUDA device.
public override ValueTask<DeviceHealthSnapshot> GetHealthSnapshotAsync(CancellationToken cancellationToken = default)
Parameters
cancellationToken CancellationToken
Cancellation token.
Returns
- ValueTask<DeviceHealthSnapshot>
A task containing the device health snapshot.
Remarks
This method queries the NVIDIA Management Library (NVML) for real-time GPU metrics, including:
- Temperature, power consumption, and fan speed
- GPU and memory utilization percentages
- Clock frequencies (graphics and memory)
- Memory usage statistics
- PCIe throughput measurements
- Throttling status and reasons
Performance: Typically takes 2-5ms to collect all metrics via NVML. Results are collected synchronously but wrapped in ValueTask for consistency.
Requirements: NVIDIA driver with NVML support. Falls back to unavailable snapshot if NVML is not available or initialization fails.
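A usage sketch for periodic health polling. The snapshot property names below (Temperature, PowerWatts, GpuUtilization) are illustrative assumptions, not confirmed members of DeviceHealthSnapshot:

```csharp
// Sketch: poll GPU health via the NVML-backed snapshot.
// NOTE: Temperature, PowerWatts, and GpuUtilization are assumed property
// names for illustration; consult DeviceHealthSnapshot for the actual API.
using var accelerator = new CudaAccelerator();
var health = await accelerator.GetHealthSnapshotAsync();
Console.WriteLine($"Temp: {health.Temperature} C");
Console.WriteLine($"Power: {health.PowerWatts} W");
Console.WriteLine($"Utilization: {health.GpuUtilization}%");
```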
GetProfilingMetricsAsync(CancellationToken)
Gets current profiling metrics from the CUDA device.
public override ValueTask<IReadOnlyList<ProfilingMetric>> GetProfilingMetricsAsync(CancellationToken cancellationToken = default)
Parameters
cancellationToken CancellationToken
Cancellation token.
Returns
- ValueTask<IReadOnlyList<ProfilingMetric>>
A task containing the collection of profiling metrics.
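A sketch iterating the returned metrics. The ProfilingMetric members used here (Name, Value, Unit) are assumptions for illustration:

```csharp
// Sketch: enumerate current profiling metrics.
// NOTE: Name, Value, and Unit are assumed ProfilingMetric members,
// shown for illustration only.
using var accelerator = new CudaAccelerator();
var metrics = await accelerator.GetProfilingMetricsAsync();
foreach (var metric in metrics)
{
    Console.WriteLine($"{metric.Name}: {metric.Value} {metric.Unit}");
}
```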
GetProfilingSnapshotAsync(CancellationToken)
Gets a comprehensive profiling snapshot of the CUDA device.
public override ValueTask<ProfilingSnapshot> GetProfilingSnapshotAsync(CancellationToken cancellationToken = default)
Parameters
cancellationToken CancellationToken
Cancellation token.
Returns
- ValueTask<ProfilingSnapshot>
A task containing the profiling snapshot.
Remarks
CUDA profiling provides detailed performance metrics using CUDA Events for precise timing. This implementation tracks kernel execution statistics, memory operations, and device utilization.
Available Metrics:
- Kernel execution statistics (average, min, max, median, P95, P99)
- Memory transfer statistics (bandwidth, transfer counts)
- Device utilization (estimated from execution patterns)
- Performance trends and bottleneck identification
Profiling Overhead: Minimal (<0.5%) as timing data is collected passively. CUDA Events provide hardware-accurate timing without CPU synchronization overhead.
Performance: Typically less than 1ms to collect and aggregate metrics.
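A sketch capturing a snapshot after a workload completes. The snapshot members shown (AverageKernelTimeMs, DeviceUtilization) are illustrative assumptions, not confirmed members of ProfilingSnapshot:

```csharp
// Sketch: capture a profiling snapshot after running kernels.
// NOTE: AverageKernelTimeMs and DeviceUtilization are assumed members
// of ProfilingSnapshot, shown for illustration only.
using var accelerator = new CudaAccelerator();
await accelerator.ExecuteKernelAsync(kernel, args);
await accelerator.SynchronizeAsync();

var snapshot = await accelerator.GetProfilingSnapshotAsync();
Console.WriteLine($"Avg kernel time: {snapshot.AverageKernelTimeMs} ms");
Console.WriteLine($"Device utilization: {snapshot.DeviceUtilization}%");
```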
GetSensorReadingsAsync(CancellationToken)
Gets current sensor readings from the CUDA device.
public override ValueTask<IReadOnlyList<SensorReading>> GetSensorReadingsAsync(CancellationToken cancellationToken = default)
Parameters
cancellationToken CancellationToken
Cancellation token.
Returns
- ValueTask<IReadOnlyList<SensorReading>>
A task containing the collection of sensor readings.
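A sketch enumerating the readings. The SensorReading members used (Name, Value, Unit) are assumptions for illustration:

```csharp
// Sketch: enumerate current sensor readings.
// NOTE: Name, Value, and Unit are assumed SensorReading members,
// shown for illustration only.
using var accelerator = new CudaAccelerator();
var readings = await accelerator.GetSensorReadingsAsync();
foreach (var reading in readings)
{
    Console.WriteLine($"{reading.Name}: {reading.Value} {reading.Unit}");
}
```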
InitializeCore()
Core initialization logic to be implemented by derived classes.
protected override object? InitializeCore()
Returns
- object
Initialization result (typically null or status object)
Reset()
Resets the CUDA device to a clean state, clearing all memory allocations and reinitializing the context.
public void Reset()
Examples
// Reset device after encountering errors
try
{
await accelerator.ExecuteKernelAsync(kernel, args);
}
catch (CudaException ex) when (ex.Error == CudaError.IllegalAddress)
{
accelerator.Reset(); // Clean slate
// Retry operation...
}
Remarks
This operation:
- Frees all device memory allocations managed by this accelerator
- Destroys all CUDA contexts on this device
- Resets the device to its initial state
- Reinitializes the accelerator context
Warning: This is a heavyweight operation that affects all CUDA contexts on the device, not just this accelerator instance. Use sparingly, primarily for:
- Recovering from device errors
- Cleaning up after memory leaks during development
- Benchmarking scenarios requiring pristine device state
Exceptions
- ObjectDisposedException
Thrown if the accelerator has been disposed.
- InvalidOperationException
Thrown if device reset fails due to driver error or pending operations.
ResetAsync(ResetOptions?, CancellationToken)
Resets the CUDA device to a clean state.
public override ValueTask<ResetResult> ResetAsync(ResetOptions? options = null, CancellationToken cancellationToken = default)
Parameters
options ResetOptions
Optional reset options; if null, defaults are used.
cancellationToken CancellationToken
Cancellation token.
Returns
- ValueTask<ResetResult>
A task containing the reset result.
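A sketch of an asynchronous reset with default options. ResetResult.Success is an assumed member, shown for illustration:

```csharp
// Sketch: asynchronous device reset with default options.
// NOTE: ResetResult.Success is an assumed member, shown for illustration;
// consult ResetResult for the actual API.
using var accelerator = new CudaAccelerator();
var result = await accelerator.ResetAsync();
if (!result.Success)
{
    Console.WriteLine("Device reset failed; inspect the result for details.");
}
```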
SynchronizeCoreAsync(CancellationToken)
Core synchronization logic to be implemented by derived classes.
protected override ValueTask SynchronizeCoreAsync(CancellationToken cancellationToken)
Parameters
cancellationToken CancellationToken
Cancellation token.