Backend Selection
This module covers DotCompute's backend system and how to target specific hardware for optimal performance.
Available Backends
DotCompute supports multiple compute backends:
| Backend | Vendor | Status | Use Case |
|---|---|---|---|
| CUDA | NVIDIA | ✅ Production | Production GPU computing |
| CPU | All | ✅ Production | Development, fallback, SIMD optimization |
| Metal | Apple | ⚠️ Experimental | macOS/iOS applications |
| OpenCL | Cross-vendor | ⚠️ Experimental | AMD, Intel, cross-platform |
Automatic Backend Selection
By default, DotCompute selects the best available backend automatically:
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using DotCompute.Runtime;
using DotCompute.Abstractions.Interfaces;
using DotCompute.Abstractions.Factories;
// Build host with DotCompute services
var host = Host.CreateApplicationBuilder(args);
host.Services.AddDotComputeRuntime(); // Registers all services
var app = host.Build();
// Get orchestrator - handles automatic backend selection
var orchestrator = app.Services.GetRequiredService<IComputeOrchestrator>();
// Execute kernel (orchestrator selects best backend)
await orchestrator.ExecuteKernelAsync(
kernelName: "VectorAdd",
args: new object[] { a, b, result }
);
Selection priority:
- CUDA (if NVIDIA GPU available)
- Metal (if Apple Silicon available)
- OpenCL (if supported GPU available)
- CPU (always available)
Device Discovery
Use IUnifiedAcceleratorFactory to enumerate available devices:
var factory = app.Services.GetRequiredService<IUnifiedAcceleratorFactory>();
// Get all available devices
var devices = await factory.GetAvailableDevicesAsync();
Console.WriteLine($"Found {devices.Count} device(s):");
foreach (var device in devices)
{
Console.WriteLine($" - {device.Name} ({device.DeviceType})");
Console.WriteLine($" Memory: {device.TotalMemory / (1024.0 * 1024 * 1024):F2} GB");
Console.WriteLine($" Compute Units: {device.MaxComputeUnits}");
Console.WriteLine();
}
Manual Backend Selection
Specify Preferred Backend
You can request a specific backend when executing kernels:
var orchestrator = app.Services.GetRequiredService<IComputeOrchestrator>();
// Request CUDA backend explicitly
await orchestrator.ExecuteAsync<object>(
"VectorAdd",
"CUDA", // Preferred backend
a, b, result
);
// Request CPU backend (useful for debugging)
await orchestrator.ExecuteAsync<object>(
"VectorAdd",
"CPU",
a, b, result
);
Create Accelerator for Specific Device
For more control, create an accelerator for a specific device:
var factory = app.Services.GetRequiredService<IUnifiedAcceleratorFactory>();
var devices = await factory.GetAvailableDevicesAsync();
// Find CUDA device
var cudaDevice = devices.FirstOrDefault(d => d.DeviceType == "CUDA");
if (cudaDevice != null)
{
// Create accelerator for this specific device
using var accelerator = await factory.CreateAsync(cudaDevice);
Console.WriteLine($"Using: {cudaDevice.Name}");
// Execute with specific accelerator
await orchestrator.ExecuteAsync<object>(
"VectorAdd",
accelerator,
a, b, result
);
}
Backend Fallback Chain
Implement your own fallback logic:
var factory = app.Services.GetRequiredService<IUnifiedAcceleratorFactory>();
var devices = await factory.GetAvailableDevicesAsync();
// Priority order: CUDA > OpenCL > CPU
var device = devices.FirstOrDefault(d => d.DeviceType == "CUDA")
?? devices.FirstOrDefault(d => d.DeviceType == "OpenCL")
?? devices.First(); // CPU is always available
Console.WriteLine($"Selected device: {device.Name} ({device.DeviceType})");
using var accelerator = await factory.CreateAsync(device);
Querying Device Capabilities
var factory = app.Services.GetRequiredService<IUnifiedAcceleratorFactory>();
var devices = await factory.GetAvailableDevicesAsync();
foreach (var device in devices)
{
Console.WriteLine($"Device: {device.Name}");
Console.WriteLine($" Type: {device.DeviceType}");
Console.WriteLine($" Memory: {device.TotalMemory / (1024*1024*1024.0):F1} GB");
Console.WriteLine($" Compute Units: {device.MaxComputeUnits}");
Console.WriteLine();
}
Troubleshooting
CUDA Not Detected
# Check NVIDIA driver
nvidia-smi
# Check CUDA toolkit
nvcc --version
# WSL2: Set library path (CRITICAL for Windows developers)
export LD_LIBRARY_PATH="/usr/lib/wsl/lib:$LD_LIBRARY_PATH"
OpenCL Not Detected
# List OpenCL platforms (Linux)
clinfo
# Install OpenCL runtime (Intel)
sudo apt install intel-opencl-icd
Metal Not Available
Metal requires macOS 10.14+ or iOS 12+. Verify in code:
if (OperatingSystem.IsMacOS())
{
// Metal should be available
}
Performance Comparison
Typical performance relative to CPU baseline:
| Operation | CPU (SIMD) | CUDA | Metal | OpenCL |
|---|---|---|---|---|
| VectorAdd 1M | 3.7x | 92x | 75x | 60x |
| MatMul 1K×1K | 4x | 85x | 70x | 55x |
| Reduction 10M | 3x | 45x | 40x | 35x |
Note: Actual performance depends on hardware and workload.
Exercises
Exercise 1: Backend Enumeration
List all available backends and their properties on your system.
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.DependencyInjection;
using DotCompute.Runtime;
using DotCompute.Abstractions.Factories;
var host = Host.CreateApplicationBuilder(args);
host.Services.AddDotComputeRuntime();
var app = host.Build();
var factory = app.Services.GetRequiredService<IUnifiedAcceleratorFactory>();
var devices = await factory.GetAvailableDevicesAsync();
foreach (var device in devices)
{
Console.WriteLine($"{device.DeviceType}: {device.Name}");
Console.WriteLine($" Memory: {device.TotalMemory / 1e9:F2} GB");
}
Exercise 2: Performance Comparison
Run the same kernel on each available backend and compare execution times.
Exercise 3: Fallback Chain
Implement a solution that uses CUDA if available, falls back to CPU otherwise.
Key Takeaways
- DotCompute auto-selects the best available backend
- Use
IComputeOrchestratorfor kernel execution with automatic selection - Use
IUnifiedAcceleratorFactoryfor device discovery - CUDA provides highest performance on NVIDIA hardware (Production)
- CPU backend is always available for development and fallback (Production)
- Metal and OpenCL are experimental - use for testing only
Path Complete
Congratulations! You've completed the Beginner Learning Path.
What you learned:
- GPU computing fundamentals
- Writing kernels with
[Kernel]attribute - Memory management and transfers
- Backend selection and configuration
Next steps:
- Intermediate Path - Memory optimization and performance
- Ring Kernels Guide - Persistent GPU computation
- Examples - Real-world code samples