Getting Started with DotCompute
This guide helps you install DotCompute and write your first GPU-accelerated kernel in under 10 minutes, using the correct API patterns for v0.4.1-rc2.
📖 See Also: Working Reference Example for tested, working code examples.
Prerequisites
Required
- .NET 9.0 SDK or later (download)
- C# 13 compatible IDE (Visual Studio 2022 17.12+, VS Code, or Rider)
Optional (for GPU acceleration)
- NVIDIA GPU with Compute Capability 5.0+ for CUDA support
- CUDA Toolkit 12.0+ (download)
- Apple Silicon Mac (M1/M2/M3) for Metal support
- AMD GPU for OpenCL support (foundation complete)
Verification
Check your setup:
# Verify .NET version
dotnet --version
# Should show 9.0.0 or later
# Check for NVIDIA GPU (optional)
nvidia-smi
# Should show GPU details if CUDA available
# Check CUDA version (optional)
nvcc --version
# Should show CUDA 12.0+ if installed
Installation
1. Create a New Project
# Create console application
dotnet new console -n MyFirstKernel
cd MyFirstKernel
2. Install DotCompute Packages
# Core runtime (required)
dotnet add package DotCompute.Core
dotnet add package DotCompute.Abstractions
dotnet add package DotCompute.Memory
# CPU backend (always available)
dotnet add package DotCompute.Backends.CPU
# GPU backends (optional, install what you need)
dotnet add package DotCompute.Backends.CUDA # For NVIDIA GPUs
dotnet add package DotCompute.Backends.Metal # For Apple Silicon
# Source generators for [Kernel] attribute support
dotnet add package DotCompute.Generators
# Runtime services
dotnet add package DotCompute.Runtime
3. Enable C# 13 (if not already)
Edit your .csproj file:
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>net9.0</TargetFramework>
<LangVersion>13.0</LangVersion>
<Nullable>enable</Nullable>
</PropertyGroup>
<!-- DotCompute packages -->
<ItemGroup>
<PackageReference Include="DotCompute.Core" Version="0.4.1-rc2" />
<PackageReference Include="DotCompute.Abstractions" Version="0.4.1-rc2" />
<PackageReference Include="DotCompute.Backends.CPU" Version="0.4.1-rc2" />
<PackageReference Include="DotCompute.Generators" Version="0.4.1-rc2" />
<PackageReference Include="DotCompute.Runtime" Version="0.4.1-rc2" />
</ItemGroup>
</Project>
Your First Kernel
Let's write a simple vector addition kernel that runs on both CPU and GPU.
Step 1: Write the Kernel
Create a file Kernels.cs:
using DotCompute;
using System;
namespace MyFirstKernel;
public static class Kernels
{
/// <summary>
/// Adds two vectors element-wise: result[i] = a[i] + b[i]
/// </summary>
[Kernel]
public static void VectorAdd(
ReadOnlySpan<float> a,
ReadOnlySpan<float> b,
Span<float> result)
{
// Get thread ID (works on both CPU and GPU)
int idx = Kernel.ThreadId.X;
// Bounds check (important for safety)
if (idx < result.Length)
{
result[idx] = a[idx] + b[idx];
}
}
}
Key Points:
- The [Kernel] attribute marks the method for GPU acceleration
- Must be static and return void
- Use ReadOnlySpan<T> for inputs, Span<T> for outputs
- Kernel.ThreadId.X provides the thread index
- Always include bounds checking
Step 2: Set Up Dependency Injection
Edit Program.cs:
using DotCompute;
using DotCompute.Abstractions;
using DotCompute.Abstractions.Interfaces;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using DotCompute.Runtime;
namespace MyFirstKernel;
class Program
{
static async Task Main(string[] args)
{
// Build host with DotCompute services
var host = Host.CreateDefaultBuilder(args)
.ConfigureServices((context, services) =>
{
// Add DotCompute runtime (includes orchestration, memory, backends)
services.AddDotComputeRuntime();
})
.Build();
// Get orchestrator from DI container
var orchestrator = host.Services.GetRequiredService<IComputeOrchestrator>();
// Create input data
const int size = 1_000_000;
var a = Enumerable.Range(0, size).Select(i => (float)i).ToArray();
var b = Enumerable.Range(0, size).Select(i => (float)i * 2).ToArray();
var result = new float[size];
Console.WriteLine($"Adding {size:N0} elements...");
// Execute kernel (automatic backend selection)
var stopwatch = System.Diagnostics.Stopwatch.StartNew();
await orchestrator.ExecuteKernelAsync(
kernelName: "VectorAdd",
args: new object[] { a, b, result }
);
stopwatch.Stop();
// Verify results (first 5 elements)
Console.WriteLine("\nResults (first 5 elements):");
for (int i = 0; i < 5; i++)
{
Console.WriteLine($" result[{i}] = {result[i]} (expected: {a[i] + b[i]})");
}
Console.WriteLine($"\nExecution time: {stopwatch.Elapsed.TotalMilliseconds:F2}ms");
Console.WriteLine($"Throughput: {size / stopwatch.Elapsed.TotalSeconds / 1_000_000:F2}M elements/sec");
}
}
Step 3: Build and Run
# Build project (source generator runs automatically)
dotnet build
# Run application
dotnet run
Expected Output:
Adding 1,000,000 elements...
Results (first 5 elements):
result[0] = 0 (expected: 0)
result[1] = 3 (expected: 3)
result[2] = 6 (expected: 6)
result[3] = 9 (expected: 9)
result[4] = 12 (expected: 12)
Execution time: 2.34ms
Throughput: 427.35M elements/sec
Understanding What Happened
1. Source Generator Created Code
When you built the project, DotCompute.Generators automatically generated:
- CPU SIMD implementation using Vector<T> for hardware acceleration
- CUDA kernel (if the CUDA backend is installed)
- Metal shader (if the Metal backend is installed)
- Registration code to make the kernel discoverable at runtime
You can see the generated code in your IDE's solution explorer under "Dependencies → Analyzers → DotCompute.Generators".
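For intuition, here is a rough, hand-written equivalent of the SIMD-widened CPU path, using the standard System.Numerics.Vector<T> API. This is an illustrative sketch only; the code DotCompute.Generators actually emits is internal and may differ.

```csharp
using System;
using System.Numerics;

static class SimdSketch
{
    // Hand-written approximation of a SIMD-widened VectorAdd —
    // NOT the actual generated code, just the general shape of it.
    public static void VectorAdd(ReadOnlySpan<float> a, ReadOnlySpan<float> b, Span<float> result)
    {
        int width = Vector<float>.Count; // e.g. 8 floats per vector with AVX2
        int i = 0;
        for (; i <= result.Length - width; i += width)
        {
            var va = new Vector<float>(a.Slice(i, width));
            var vb = new Vector<float>(b.Slice(i, width));
            (va + vb).CopyTo(result.Slice(i, width));
        }
        for (; i < result.Length; i++) // scalar tail for the remainder
            result[i] = a[i] + b[i];
    }
}
```

Each pass of the first loop adds Vector<float>.Count elements at once; the scalar loop cleans up any leftover tail shorter than one vector.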
2. Automatic Backend Selection
The orchestrator automatically selected the best backend:
- Small data (< 10,000 elements): CPU (no transfer overhead)
- Large data + GPU available: CUDA or Metal
- Large data + no GPU: CPU with SIMD
3. Zero-Copy on CPU
When running on CPU, Span<T> enables zero-copy access:
- No intermediate arrays allocated
- Direct memory access
- Minimal overhead (< 50 µs)
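The zero-copy behavior follows from how spans work in .NET: a Span<T> is a view over existing memory, so slicing never allocates or copies. A quick standalone illustration:

```csharp
using System;

float[] data = new float[8];

// Both spans are views over the same array — no allocation, no copy.
Span<float> firstHalf = data.AsSpan(0, 4);
Span<float> secondHalf = data.AsSpan(4);

firstHalf[0] = 42f;
Console.WriteLine(data[0]); // 42 — the span wrote straight into the array
```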
Next Steps
Add More Complex Kernels
Try a matrix multiplication:
[Kernel]
public static void MatrixMultiply(
ReadOnlySpan<float> a,
ReadOnlySpan<float> b,
Span<float> result,
int width)
{
int row = Kernel.ThreadId.Y;
int col = Kernel.ThreadId.X;
if (row < width && col < width)
{
float sum = 0;
for (int k = 0; k < width; k++)
{
sum += a[row * width + k] * b[k * width + col];
}
result[row * width + col] = sum;
}
}
Force Specific Backend
// Force CUDA execution
await orchestrator.ExecuteAsync<object>(
kernelName: "VectorAdd",
preferredBackend: "CUDA",
args: new object[] { a, b, result }
);
Enable Performance Profiling
services.AddDotComputeRuntime(options =>
{
options.EnableTelemetry = true;
});
Enable Debug Validation
#if DEBUG
services.AddProductionDebugging(options =>
{
options.Profile = DebugProfile.Development;
options.ValidateAllExecutions = true;
});
#endif
Troubleshooting
"Kernel not found" Error
Problem: ExecuteKernelAsync throws "Kernel 'VectorAdd' not found"
Solution: Ensure the source generator ran:
- Clean and rebuild: dotnet clean && dotnet build
- Check that the DotCompute.Generators package is installed
- Verify the kernel method is static and has the [Kernel] attribute
CUDA "Device not found" Error
Problem: CUDA backend fails to initialize
Solution:
- Verify the GPU is visible: nvidia-smi
- Check the CUDA version: nvcc --version
- Install CUDA Toolkit 12.0+ if it is missing
- Restart IDE after CUDA installation
Slow Performance on First Run
Observation: First execution is slower than subsequent runs
Explanation: This is normal:
- First run: Kernel compilation (~50-200ms)
- Subsequent runs: Cached kernel (< 1ms overhead)
- Memory pool warm-up (first allocation)
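If the first-run cost matters (for example, when benchmarking), you can pay the compilation cost up front with a small warm-up call. This sketch assumes the orchestrator and the ExecuteKernelAsync signature from Step 2 above:

```csharp
// Warm-up: trigger kernel compilation once, before any timed runs.
var warmA = new float[16];
var warmB = new float[16];
var warmOut = new float[16];
await orchestrator.ExecuteKernelAsync("VectorAdd", new object[] { warmA, warmB, warmOut });
// Later calls reuse the cached, already-compiled kernel.
```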
"Assembly could not be found" Error
Problem: Runtime error about missing DotCompute assemblies
Solution: Ensure all required packages are installed:
dotnet add package DotCompute.Core
dotnet add package DotCompute.Runtime
dotnet add package DotCompute.Backends.CPU
Common Patterns
Pattern 1: Async Kernel Execution
// Execute multiple kernels concurrently
var task1 = orchestrator.ExecuteKernelAsync("Kernel1", params1);
var task2 = orchestrator.ExecuteKernelAsync("Kernel2", params2);
await Task.WhenAll(task1, task2);
Pattern 2: Device-Specific Execution
var factory = host.Services.GetRequiredService<IUnifiedAcceleratorFactory>();
// Get specific device (e.g., CUDA)
var devices = await factory.GetAvailableDevicesAsync();
var cudaDevice = devices.FirstOrDefault(d => d.DeviceType == "CUDA");
if (cudaDevice != null)
{
// Create accelerator for this device
using var accelerator = await factory.CreateAsync(cudaDevice);
// Use orchestrator with specific accelerator
await orchestrator.ExecuteAsync<object>(
"MyKernel",
accelerator,
new object[] { inputData, outputData }
);
}
Pattern 3: Runtime Configuration
using DotCompute.Runtime.Configuration;
services.Configure<DotComputeRuntimeOptions>(options =>
{
options.ValidateCapabilities = true;
options.AcceleratorLifetime = ServiceLifetime.Transient;
});
services.AddDotComputeRuntime();
services.AddProductionOptimization(); // Enable ML-based backend selection
services.AddProductionDebugging(); // Enable cross-backend validation
Performance Tips
1. Use Appropriate Data Types
- Use Span<T> for zero-copy CPU operations
- Prefer float for most GPU operations (single precision is faster)
- Use double only when precision is critical
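To see why the float/double choice matters beyond speed, note that single precision carries only about 7 significant decimal digits. A small standalone check:

```csharp
using System;

float f = 16_777_216f;  // 2^24 — the largest integer float represents exactly
double d = 16_777_216d;

Console.WriteLine((f + 1f) == f); // True  — the +1 is lost in single precision
Console.WriteLine((d + 1d) == d); // False — double still has room to spare
```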
2. Batch Operations
// ❌ Bad: multiple small kernel calls
for (int i = 0; i < 1000; i++)
{
await orchestrator.ExecuteKernelAsync("SmallKernel", smallData[i]);
}
// ✅ Good: a single large kernel call
await orchestrator.ExecuteKernelAsync("BatchKernel", allData);
3. Reuse Buffers
// Allocate once, reuse many times
var buffer = await memoryManager.AllocateAsync<float>(1_000_000);
for (int i = 0; i < iterations; i++)
{
await buffer.CopyFromAsync(inputData[i]);
await orchestrator.ExecuteKernelAsync("MyKernel", new object[] { buffer });
await buffer.CopyToAsync(outputData[i]);
}
await buffer.DisposeAsync(); // Return to pool
IDE Integration
Visual Studio
DotCompute analyzers provide real-time feedback:
- Error Squiggles: DC001-DC012 diagnostics
- Quick Fixes: Lightbulb actions for common issues
- IntelliSense: Full support for generated code
VS Code
Install C# Dev Kit for full analyzer support:
# Install C# Dev Kit extension
code --install-extension ms-dotnettools.csdevkit
Example Projects
Console Application (Simple)
MyKernelApp/
├── MyKernelApp.csproj
├── Program.cs
└── Kernels.cs
Web API with Compute (Advanced)
ComputeApi/
├── ComputeApi.csproj
├── Program.cs
├── Controllers/
│   └── ComputeController.cs
├── Kernels/
│   ├── ImageProcessing.cs
│   └── DataAnalysis.cs
└── appsettings.json
Further Reading
- Kernel Development Guide - Write efficient kernels
- Backend Selection Guide - Choose optimal backends
- Performance Tuning - Optimize execution
- Architecture Overview - Understand the system design
- API Reference - Complete API documentation
Getting Help
- Documentation: https://dotcompute.dev
- GitHub Issues: Report bugs or request features
- Discussions: Ask questions
Welcome to DotCompute! • Write Once • Run Anywhere • Run Fast