Troubleshooting Guide
Common issues, diagnostic approaches, and solutions for DotCompute applications.
Quick Diagnostic Steps
When encountering an issue:
- Check Exception Message: Often contains specific error code and context
- Review Logs: Enable logging for detailed information
- Verify Installation: Ensure required backends are available
- Test with CPU Backend: Isolates GPU-specific issues
- Run Diagnostics: Use built-in diagnostic tools
// Enable detailed logging
services.AddLogging(builder =>
{
builder.AddConsole();
builder.SetMinimumLevel(LogLevel.Debug);
});
// Run diagnostics
var diagnostics = await orchestrator.RunDiagnosticsAsync();
Console.WriteLine(diagnostics.Summary);
Installation and Setup Issues
Issue: "DotCompute runtime not registered"
Symptom:
System.InvalidOperationException: Unable to resolve service for type 'DotCompute.Abstractions.IComputeOrchestrator'
Cause: AddDotComputeRuntime() not called
Solution:
var builder = Host.CreateDefaultBuilder(args)
.ConfigureServices(services =>
{
services.AddDotComputeRuntime(); // Add this
});
Issue: "No suitable backend found"
Symptom:
DotCompute.Exceptions.NoBackendAvailableException: No suitable backend found for execution
Cause: No backends available or all backends disabled
Diagnosis:
var devices = await orchestrator.GetAvailableDevicesAsync();
if (devices.Count == 0)
{
Console.WriteLine("No devices available");
}
else
{
foreach (var device in devices)
{
Console.WriteLine($"Device: {device.Name}, Type: {device.Type}");
}
}
Solutions:
- Enable CPU Fallback:
services.AddDotComputeRuntime(options =>
{
options.EnableCpuFallback = true; // Always have fallback
});
- Check Backend Installation:
# CUDA
nvidia-smi
nvcc --version
# Metal (macOS)
system_profiler SPDisplaysDataType
# OpenCL
clinfo
- Verify Package References:
<PackageReference Include="DotCompute.Core" Version="0.2.0-alpha" />
<PackageReference Include="DotCompute.Backends.CPU" Version="0.2.0-alpha" />
<PackageReference Include="DotCompute.Backends.CUDA" Version="0.2.0-alpha" />
Issue: "CUDA runtime not found"
Symptom:
DotCompute.Backends.CUDA.CudaException: CUDA runtime library not found
Cause: CUDA Toolkit not installed or not in PATH
Solutions:
- Install CUDA Toolkit:
# Ubuntu/Debian
sudo apt install nvidia-cuda-toolkit
# Windows
# Download from https://developer.nvidia.com/cuda-downloads
# Verify
nvidia-smi
- Set CUDA_HOME (if needed):
export CUDA_HOME=/usr/local/cuda
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
- Check Compute Capability:
var cudaBackend = orchestrator.GetBackend(BackendType.CUDA);
var capability = await cudaBackend.GetComputeCapabilityAsync();
Console.WriteLine($"Compute Capability: {capability}");
// Minimum: 5.0
if (capability < 5.0)
{
Console.WriteLine("Warning: Old GPU, performance may be limited");
}
Compilation Issues
Issue: "Kernel compilation failed"
Symptom:
DotCompute.Exceptions.KernelCompilationException: Failed to compile kernel 'MyKernel'
NVRTC: identifier "xyz" is undefined
Cause: Syntax error in kernel code
Diagnosis:
try
{
await orchestrator.ExecuteKernelAsync("MyKernel", params);
}
catch (KernelCompilationException ex)
{
Console.WriteLine($"Compilation error: {ex.Message}");
Console.WriteLine($"Compiler output:\n{ex.CompilerOutput}");
}
Common Causes:
- Undefined Variables:
// ❌ Wrong
[Kernel]
public static void MyKernel(ReadOnlySpan<float> input, Span<float> output)
{
int idx = Kernel.ThreadId.X;
if (idx < length) // 'length' not defined
{
output[idx] = input[idx];
}
}
// ✅ Correct
[Kernel]
public static void MyKernel(ReadOnlySpan<float> input, Span<float> output)
{
int idx = Kernel.ThreadId.X;
if (idx < output.Length) // Use output.Length
{
output[idx] = input[idx];
}
}
- Unsupported Types:
// ❌ Wrong: strings not supported
[Kernel]
public static void MyKernel(string text) // Error!
// ✅ Correct: use supported types
[Kernel]
public static void MyKernel(ReadOnlySpan<byte> text)
- Missing Bounds Check:
// ❌ Diagnostic DC006: Missing bounds check
[Kernel]
public static void MyKernel(Span<float> output)
{
int idx = Kernel.ThreadId.X;
output[idx] = 0; // May be out of bounds!
}
// ✅ Correct
[Kernel]
public static void MyKernel(Span<float> output)
{
int idx = Kernel.ThreadId.X;
if (idx < output.Length)
{
output[idx] = 0;
}
}
Issue: Analyzer Warnings
Symptom: IDE shows diagnostic warnings DC001-DC012
Solutions: Use automated code fixes (Ctrl+. in Visual Studio)
| Diagnostic | Issue | Auto-Fix |
|---|---|---|
| DC001 | Method should be kernel | Add [Kernel] attribute |
| DC002 | Wrong return type | Change to void |
| DC003 | Instance method | Change to static |
| DC004 | Unsupported parameter type | Change to Span<T> or scalar |
| DC005 | Write to ReadOnlySpan | Change to Span<T> |
| DC006 | Missing bounds check | Add if (idx < length) |
| DC007 | Non-linear array access | Simplify indexing |
| DC008 | Async in kernel | Remove async/await |
| DC009 | Thread ID not used | Add thread indexing |
| DC010 | Incorrect threading pattern | Fix thread access |
| DC011 | Performance issue | Apply suggested optimization |
| DC012 | Missing documentation | Generate XML doc |
Runtime Errors
Issue: "Index out of range"
Symptom:
System.IndexOutOfRangeException: Index was outside the bounds of the array
at MyKernel execution
Cause: Thread index exceeds buffer size
Diagnosis:
// Enable debug validation
services.AddDotComputeRuntime()
.AddDevelopmentDebugging();
// Run kernel
await orchestrator.ExecuteKernelAsync("MyKernel", params);
// Debug service will report: "Thread 1024 accessed index 1000 in buffer of size 1000"
Solution: Always bounds-check
[Kernel]
public static void MyKernel(Span<float> output)
{
int idx = Kernel.ThreadId.X;
if (idx < output.Length) // Critical!
{
output[idx] = 0;
}
}
Issue: "Incorrect results"
Symptom: Kernel produces wrong output
Diagnosis: Use cross-backend validation
services.AddDotComputeRuntime()
.AddDevelopmentDebugging();
var result = await orchestrator.ExecuteKernelAsync("MyKernel", params);
// Automatically validates against CPU backend
// Reports differences if found
Common Causes:
- Race Conditions:
// ❌ Wrong: Data race
[Kernel]
public static void SumReduction(Span<float> data)
{
int idx = Kernel.ThreadId.X;
if (idx < data.Length)
{
data[0] += data[idx]; // Race! Multiple threads write to data[0]
}
}
// ✅ Correct: Use atomic operations or proper reduction
[Kernel]
public static void SumReduction(ReadOnlySpan<float> input, Span<float> output)
{
int idx = Kernel.ThreadId.X;
if (idx < input.Length)
{
// Each thread writes to unique location
output[idx] = input[idx];
}
// Separate reduction step needed
}
- Floating-Point Precision:
// Different backends may have different precision
var cpuResult = await ExecuteOnCpu(); // 3.14159265
var gpuResult = await ExecuteOnGpu(); // 3.14159274 (slightly different)
// Solution: Use tolerance in comparisons
Assert.Equal(cpuResult, gpuResult, precision: 5); // Compare to 5 decimal places
- Non-Deterministic Execution:
// Test determinism
services.AddDevelopmentDebugging();
var results = new List<float[]>();
for (int i = 0; i < 10; i++)
{
var result = new float[1000];
await orchestrator.ExecuteKernelAsync("MyKernel", new { output = result });
results.Add(result);
}
// Check if all runs produce same result
var determinismResult = await debugService.TestDeterminismAsync("MyKernel", params);
if (!determinismResult.IsDeterministic)
{
Console.WriteLine("Warning: Kernel has non-deterministic behavior");
}
Issue: "Intermittent failures"
Symptom: Kernel sometimes succeeds, sometimes fails
Causes and Solutions:
- Synchronization Issues:
// ✅ Always synchronize between dependent kernels
await orchestrator.ExecuteKernelAsync("Kernel1", params1);
await orchestrator.SynchronizeDeviceAsync(); // Critical!
await orchestrator.ExecuteKernelAsync("Kernel2", params2);
- Memory Not Initialized:
// ❌ Wrong: Uninitialized memory
var buffer = await memoryManager.AllocateAsync<float>(1000);
await orchestrator.ExecuteKernelAsync("Kernel", new { data = buffer });
// May contain garbage values
// ✅ Correct: Initialize memory
var buffer = await memoryManager.AllocateAsync<float>(1000);
await buffer.FillAsync(0.0f); // Clear to zero
await orchestrator.ExecuteKernelAsync("Kernel", new { data = buffer });
- Resource Contention:
// ✅ Limit concurrent executions
var semaphore = new SemaphoreSlim(maxConcurrency: 4);
await semaphore.WaitAsync();
try
{
await orchestrator.ExecuteKernelAsync("Kernel", params);
}
finally
{
semaphore.Release();
}
Memory Issues
Issue: "Out of memory"
Symptom:
DotCompute.Exceptions.OutOfMemoryException: Failed to allocate 4096 MB on device 0
Diagnosis:
var available = await memoryManager.GetAvailableMemoryAsync(deviceId: 0);
var total = await memoryManager.GetTotalMemoryAsync(deviceId: 0);
Console.WriteLine($"Available: {available / (1024 * 1024)} MB");
Console.WriteLine($"Total: {total / (1024 * 1024)} MB");
Console.WriteLine($"Used: {(total - available) / (1024 * 1024)} MB");
Solutions:
- Reduce Batch Size:
// Process in chunks
int maxElements = (int)(available / sizeof(float) * 0.8); // Use 80% of available
int chunkSize = Math.Min(requestedSize, maxElements);
for (int i = 0; i < data.Length; i += chunkSize)
{
var chunk = data[i..Math.Min(i + chunkSize, data.Length)];
await ProcessChunkAsync(chunk);
}
- Enable Memory Pooling:
services.AddDotComputeRuntime(options =>
{
options.MemoryPooling.Enabled = true;
options.MemoryPooling.TrimInterval = TimeSpan.FromMinutes(1); // Aggressive trimming
});
- Dispose Buffers Promptly:
// ✅ Use await using for automatic disposal
await using var buffer = await memoryManager.AllocateAsync<float>(size);
// Disposed immediately after scope exit
- Use Streaming:
// For datasets larger than GPU memory
var options = new ExecutionOptions
{
UseStreaming = true,
ChunkSize = 100_000_000 // 100M elements per chunk
};
await orchestrator.ExecuteKernelAsync("ProcessHugeDataset", params, options);
// Automatically streams data in chunks
Issue: "Memory leak"
Symptom: Memory usage grows over time
Diagnosis:
// Track active allocations
var initialStats = await memoryManager.GetPoolStatisticsAsync();
Console.WriteLine($"Active allocations: {initialStats.ActiveAllocations}");
// Run operations...
var finalStats = await memoryManager.GetPoolStatisticsAsync();
Console.WriteLine($"Active allocations: {finalStats.ActiveAllocations}");
if (finalStats.ActiveAllocations > initialStats.ActiveAllocations + 10)
{
Console.WriteLine("Warning: Possible memory leak");
}
Common Causes:
- Forgetting to Dispose:
// ❌ Leak
public async Task ProcessData()
{
var buffer = await memoryManager.AllocateAsync<float>(1000);
// Forgot to dispose
}
// ✅ No leak
public async Task ProcessData()
{
await using var buffer = await memoryManager.AllocateAsync<float>(1000);
// Disposed automatically
}
- Exception Before Disposal:
// ❌ Leak on exception
var buffer = await memoryManager.AllocateAsync<float>(1000);
// Exception thrown here
await buffer.DisposeAsync(); // Never reached
// ✅ Disposed even on exception
await using var buffer = await memoryManager.AllocateAsync<float>(1000);
// Always disposed
Performance Issues
Issue: "Slow execution"
Symptom: Kernel takes longer than expected
Diagnosis:
// Enable profiling
await orchestrator.EnableProfilingAsync();
var stopwatch = Stopwatch.StartNew();
await orchestrator.ExecuteKernelAsync("MyKernel", params);
stopwatch.Stop();
var profile = await orchestrator.GetProfileAsync("MyKernel");
Console.WriteLine($"Total time: {stopwatch.ElapsedMilliseconds}ms");
Console.WriteLine($"Compute time: {profile.ComputeTime}ms");
Console.WriteLine($"Transfer time: {profile.TransferTime}ms");
Console.WriteLine($"Overhead: {profile.OverheadTime}ms");
Common Causes:
- Transfer Overhead Dominates:
// If transfer_time > compute_time, problem is data movement
// ✅ Solution: Keep data on GPU
await orchestrator.ExecuteKernelAsync("Kernel1", params1);
// Don't transfer intermediate result to CPU
await orchestrator.ExecuteKernelAsync("Kernel2", params2); // Reuse GPU data
- Small Workload:
// GPU not efficient for small data
if (dataSize < 10_000)
{
// Use CPU instead
options.PreferredBackend = BackendType.CPU;
}
- CPU Backend Instead of GPU:
// Check which backend was used
var executionInfo = await orchestrator.GetLastExecutionInfoAsync();
Console.WriteLine($"Backend used: {executionInfo.BackendUsed}");
if (executionInfo.BackendUsed == BackendType.CPU && gpuAvailable)
{
// Force GPU
options.PreferredBackend = BackendType.CUDA;
options.EnableCpuFallback = false;
}
- Unoptimized Memory Access:
// ❌ Non-coalesced access (slow on GPU)
[Kernel]
public static void Transpose(ReadOnlySpan<float> input, Span<float> output, int width, int height)
{
int idx = Kernel.ThreadId.X;
int row = idx / width;
int col = idx % width;
output[col * height + row] = input[row * width + col]; // Strided access
}
// ✅ Tiled transpose (faster)
// See performance-tuning.md for optimized version
- Not Using SIMD on CPU:
// Enable SIMD intrinsics
services.AddDotComputeRuntime(options =>
{
options.CPU.EnableSIMD = true; // AVX2/AVX512
options.CPU.VectorWidth = 8; // 8 floats per vector (AVX2)
});
Issue: "Poor multi-GPU scaling"
Symptom: 2 GPUs only 1.3x faster than 1 GPU
Diagnosis:
var profile = await orchestrator.GetMultiGpuProfileAsync();
Console.WriteLine($"GPU 0 time: {profile.DeviceTimes[0]}ms");
Console.WriteLine($"GPU 1 time: {profile.DeviceTimes[1]}ms");
Console.WriteLine($"Transfer time: {profile.TransferTime}ms");
Console.WriteLine($"Sync time: {profile.SyncTime}ms");
// If transfer_time + sync_time > 50% of total: overhead problem
Solutions:
- Enable P2P:
if (await orchestrator.CanEnablePeerAccessAsync(0, 1))
{
await memoryManager.EnablePeerAccessAsync(0, 1);
// Direct GPU-GPU transfers (2x faster)
}
- Minimize Synchronization:
// ❌ Sync after every kernel
for (int i = 0; i < 100; i++)
{
await orchestrator.ExecuteKernelAsync("Kernel", params);
await orchestrator.SynchronizeAllDevicesAsync(); // Overhead!
}
// ✅ Batch and sync once
var tasks = new Task[100];
for (int i = 0; i < 100; i++)
{
tasks[i] = orchestrator.ExecuteKernelAsync("Kernel", params);
}
await Task.WhenAll(tasks);
await orchestrator.SynchronizeAllDevicesAsync();
- Use Dynamic Load Balancing:
var options = new ExecutionOptions
{
LoadBalancingStrategy = LoadBalancingStrategy.Dynamic
};
// Automatically distributes work based on GPU performance
Platform-Specific Issues
Windows
Issue: "CUDA driver version mismatch"
Solution:
# Update NVIDIA driver
# Download from https://www.nvidia.com/Download/index.aspx
# Or use GeForce Experience
Issue: "Visual Studio can't find DotCompute"
Solution:
- Rebuild solution: Ctrl+Shift+B
- Clean solution: Build → Clean Solution
- Delete
binandobjfolders - Restore packages:
dotnet restore
Linux
Issue: "libcuda.so.1: cannot open shared object file"
Solution:
# Install NVIDIA drivers
sudo apt install nvidia-driver-535 # Or latest version
# Add to library path
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
# Or system-wide
sudo ldconfig
Issue: "Permission denied" when accessing GPU
Solution:
# Add user to video group
sudo usermod -a -G video $USER
# Logout and login again
macOS
Issue: "Metal backend not available"
Cause: Running on Intel Mac or macOS < 11.0
Solution:
- Apple Silicon required for Metal backend
- Use CPU backend on Intel Macs
Issue: "Kernel compilation slow on first run"
Cause: Metal shader compilation and caching
Solution:
- First run is slower (compiles and caches)
- Subsequent runs are fast (uses cached shaders)
- This is expected behavior
Debugging Tools
Built-in Diagnostics
// Run full system diagnostics
var diagnostics = await orchestrator.RunDiagnosticsAsync();
Console.WriteLine($"Status: {diagnostics.Status}");
Console.WriteLine($"Available backends: {string.Join(", ", diagnostics.AvailableBackends)}");
Console.WriteLine($"Memory available: {diagnostics.TotalAvailableMemory / (1024 * 1024)} MB");
Console.WriteLine($"Issues found: {diagnostics.Issues.Count}");
foreach (var issue in diagnostics.Issues)
{
Console.WriteLine($" {issue.Severity}: {issue.Message}");
Console.WriteLine($" Recommendation: {issue.Recommendation}");
}
Logging
services.AddLogging(builder =>
{
builder.AddConsole();
builder.AddDebug();
builder.SetMinimumLevel(LogLevel.Trace); // Verbose logging
});
// Log categories
builder.AddFilter("DotCompute.Core", LogLevel.Debug);
builder.AddFilter("DotCompute.Backends", LogLevel.Trace);
builder.AddFilter("DotCompute.Memory", LogLevel.Information);
Cross-Backend Validation
services.AddDotComputeRuntime()
.AddDevelopmentDebugging();
// Automatically validates GPU results against CPU
var result = await orchestrator.ExecuteKernelAsync("MyKernel", params);
// Check validation result
var validation = await debugService.GetLastValidationResultAsync();
if (!validation.IsValid)
{
Console.WriteLine($"Validation failed! Max difference: {validation.MaxDifference}");
Console.WriteLine($"First mismatch at index: {validation.FirstMismatchIndex}");
}
Performance Profiling
await orchestrator.EnableProfilingAsync();
// Execute kernels...
var profile = await orchestrator.GetProfileAsync();
Console.WriteLine($"Total executions: {profile.TotalExecutions}");
Console.WriteLine($"Average compute time: {profile.AverageComputeTime}ms");
Console.WriteLine($"Average transfer time: {profile.AverageTransferTime}ms");
Console.WriteLine($"Average overhead: {profile.AverageOverheadTime}ms");
// Detailed breakdown
foreach (var kernel in profile.KernelProfiles)
{
Console.WriteLine($"Kernel: {kernel.Name}");
Console.WriteLine($" Executions: {kernel.ExecutionCount}");
Console.WriteLine($" Average time: {kernel.AverageTime}ms");
Console.WriteLine($" Backend: {kernel.Backend}");
}
Getting Help
Collect Diagnostic Information
// Generate diagnostic report
var report = await orchestrator.GenerateDiagnosticReportAsync();
// Save to file
await File.WriteAllTextAsync("dotcompute-diagnostic.txt", report);
// Include in bug report:
// - DotCompute version
// - .NET version
// - OS and version
// - GPU model and driver version
// - Diagnostic report
// - Minimal reproduction code
Enable Debug Compilation
services.AddDotComputeRuntime(options =>
{
options.Debug.GenerateDebugInfo = true; // Include debug symbols
options.Debug.KeepIntermediateFiles = true; // Keep generated code
options.Debug.OutputDirectory = "./debug-output";
});
// Generated CUDA kernels saved to ./debug-output/*.cu
// Can inspect and debug manually
Community Support
- GitHub Issues: https://github.com/yourusername/DotCompute/issues
- Discussions: https://github.com/yourusername/DotCompute/discussions
- Documentation: https://dotcompute.dev/docs
When reporting issues, include:
- DotCompute version
- .NET version
- Operating system
- GPU model and driver version
- Diagnostic report
- Minimal code that reproduces the issue
- Expected vs actual behavior
Further Reading
- Debugging Guide - Detailed debugging techniques
- Performance Tuning - Optimization strategies
- Memory Management - Memory best practices
- Backend Selection - Choosing optimal backends
Troubleshooting • Diagnostics • Problem Solving • Production Ready