Ring Kernel Fundamentals
This module introduces Ring Kernels - persistent GPU-resident computations that use actor-style message passing for continuous processing.
What Are Ring Kernels?
Traditional GPU kernels:
- Launch, execute, terminate
- Overhead on each launch
- Stateless between invocations
Ring Kernels:
- Launch once, run continuously
- No kernel-launch overhead after the initial launch
- Maintain state between messages
- Actor-style message processing (see the host-side sketch below)
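To make the contrast concrete, here is a minimal host-side sketch. The traditional launch call is a generic stand-in (the exact API depends on the backend), while LaunchAsync and SendAsync are the Ring Kernel calls covered in full later in this module.
// Traditional pattern: every invocation pays kernel-launch and dispatch overhead.
// (ExecuteKernelAsync is a stand-in here, not a specific DotCompute API.)
foreach (var batch in batches)
{
    await accelerator.ExecuteKernelAsync("vector_add", batch);
}
// Ring Kernel pattern: launch once, then stream messages to the resident kernel.
var kernel = await ringKernelService.LaunchAsync<VectorAddRequest, VectorAddResponse>(
    MyRingKernels.ProcessVectorAdd,
    new RingKernelLaunchOptions());
foreach (var request in requests)
{
    await kernel.SendAsync(request); // no per-message kernel launch
}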
Ring Kernel Architecture
┌───────────────────────────────────────────────────────────┐
│                        GPU Memory                          │
│  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐   │
│  │ Input Queue │────▶│ Ring Kernel │────▶│Output Queue │   │
│  └─────────────┘     │  (Running)  │     └─────────────┘   │
│         ▲            └─────────────┘            │          │
│         │                                       │          │
└─────────┼───────────────────────────────────────┼──────────┘
          │                                       │
    Host Enqueue                            Host Dequeue
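The "ring" refers to the fixed-capacity circular queues on either side of the kernel. The sketch below illustrates the head/tail index arithmetic behind such a queue; it is an illustration only, not DotCompute's implementation, and omits the atomic head/tail updates a real host/GPU queue requires.
// Minimal circular queue illustrating head/tail index arithmetic.
// Illustration only: not thread-safe and not DotCompute's implementation.
public sealed class RingQueue<T>
{
    private readonly T[] _slots;
    private long _head;   // next slot to read
    private long _tail;   // next slot to write

    public RingQueue(int capacity) => _slots = new T[capacity];

    public bool TryEnqueue(T item)
    {
        if (_tail - _head == _slots.Length) return false;   // queue full
        _slots[_tail % _slots.Length] = item;
        _tail++;
        return true;
    }

    public bool TryDequeue(out T item)
    {
        if (_tail == _head) { item = default!; return false; }   // queue empty
        item = _slots[_head % _slots.Length];
        _head++;
        return true;
    }
}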
Your First Ring Kernel
Step 1: Define the Kernel
using DotCompute.Generators.Kernel.Attributes;
using MemoryPack;
// Define message types
[MemoryPackable]
public partial struct VectorAddRequest : IRingKernelMessage
{
public uint MessageId { get; set; }
public ulong Timestamp { get; set; }
public int Index { get; set; }
public float ValueA { get; set; }
public float ValueB { get; set; }
}
[MemoryPackable]
public partial struct VectorAddResponse : IRingKernelMessage
{
public uint MessageId { get; set; }
public ulong Timestamp { get; set; }
public int Index { get; set; }
public float Result { get; set; }
}
// Define the Ring Kernel
public static partial class MyRingKernels
{
[RingKernel(
KernelId = "vector-add",
InputMessageType = typeof(VectorAddRequest),
OutputMessageType = typeof(VectorAddResponse),
Capacity = 1024,
ProcessingMode = RingProcessingMode.Continuous)]
public static VectorAddResponse ProcessVectorAdd(VectorAddRequest request)
{
return new VectorAddResponse
{
MessageId = request.MessageId,
Timestamp = request.Timestamp,
Index = request.Index,
Result = request.ValueA + request.ValueB
};
}
}
Step 2: Launch the Ring Kernel
var ringKernelService = provider.GetRequiredService<IRingKernelService>();
// Launch the Ring Kernel
var kernel = await ringKernelService.LaunchAsync<VectorAddRequest, VectorAddResponse>(
MyRingKernels.ProcessVectorAdd,
new RingKernelLaunchOptions
{
QueueCapacity = 1024,
BackpressureStrategy = BackpressureStrategy.Block
});
Console.WriteLine($"Ring Kernel launched: {kernel.KernelId}");
Step 3: Send and Receive Messages
// Send messages to the kernel
for (int i = 0; i < 1000; i++)
{
await kernel.SendAsync(new VectorAddRequest
{
MessageId = (uint)i,
Index = i,
ValueA = i * 1.0f,
ValueB = i * 2.0f
});
}
// Receive responses
int received = 0;
while (received < 1000)
{
    // Note: an async method cannot use an 'out' parameter, so the response is
    // assumed to come back as a nullable return value (null = nothing ready yet).
    var response = await kernel.TryReceiveAsync();
    if (response is { } msg)
    {
        Console.WriteLine($"Result[{msg.Index}] = {msg.Result}");
        received++;
    }
}
Step 4: Shutdown
// Graceful shutdown
await kernel.DeactivateAsync();
await kernel.TerminateAsync();
Processing Modes
Continuous Mode
Process messages as they arrive:
[RingKernel(ProcessingMode = RingProcessingMode.Continuous)]
public static Response Process(Request request)
{
// Called continuously while messages are available
return ProcessMessage(request);
}
Batch Mode
Process messages in batches:
[RingKernel(
ProcessingMode = RingProcessingMode.Batch,
MaxMessagesPerIteration = 100)]
public static Response Process(Request request)
{
// Called for each message in batch
return ProcessMessage(request);
}
Adaptive Mode
Automatically adjust between continuous and batch based on load:
[RingKernel(ProcessingMode = RingProcessingMode.Adaptive)]
public static Response Process(Request request)
{
// System optimizes processing strategy
return ProcessMessage(request);
}
Message Serialization
Ring Kernels use MemoryPack for efficient GPU-compatible serialization.
Message Requirements
[MemoryPackable]
public partial struct MyMessage : IRingKernelMessage
{
// Required fields
public uint MessageId { get; set; }
public ulong Timestamp { get; set; }
// Your data fields
public int CustomField1 { get; set; }
public float CustomField2 { get; set; }
// Arrays supported (fixed size recommended)
[MemoryPackInclude]
public float[] Data { get; set; }
}
Serialization Performance
| Data Size | Serialization | Deserialization |
|---|---|---|
| 32 bytes | ~100 ns | ~80 ns |
| 256 bytes | ~200 ns | ~150 ns |
| 1 KB | ~500 ns | ~400 ns |
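These figures are indicative. For a rough number on your own hardware, a quick Stopwatch loop over MemoryPackSerializer is enough (a sketch; a proper benchmark would use BenchmarkDotNet and your actual message types).
using System.Diagnostics;
using MemoryPack;

// Rough round-trip timing for the VectorAddRequest defined earlier.
var message = new VectorAddRequest { MessageId = 1, Index = 42, ValueA = 1f, ValueB = 2f };
const int iterations = 1_000_000;

var sw = Stopwatch.StartNew();
byte[] bytes = Array.Empty<byte>();
for (int i = 0; i < iterations; i++)
{
    bytes = MemoryPackSerializer.Serialize(message);
}
sw.Stop();
Console.WriteLine($"Serialize:   ~{sw.Elapsed.TotalMilliseconds * 1_000_000 / iterations:F0} ns");

sw.Restart();
for (int i = 0; i < iterations; i++)
{
    _ = MemoryPackSerializer.Deserialize<VectorAddRequest>(bytes);
}
sw.Stop();
Console.WriteLine($"Deserialize: ~{sw.Elapsed.TotalMilliseconds * 1_000_000 / iterations:F0} ns");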
Kernel-to-Kernel Communication
Ring Kernels can communicate directly on the GPU:
[RingKernel(
KernelId = "producer",
PublishesToKernels = new[] { "consumer" })]
public static ProducerMessage Produce(InputMessage input)
{
// Output goes to consumer kernel
return new ProducerMessage { Data = Process(input) };
}
[RingKernel(
KernelId = "consumer",
SubscribesToKernels = new[] { "producer" })]
public static OutputMessage Consume(ProducerMessage message)
{
// Receives from producer kernel
return new OutputMessage { Result = FinalProcess(message) };
}
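Host-side wiring for a two-stage pipeline is sketched below. It reuses the LaunchAsync, SendAsync, and TryReceiveAsync calls from the earlier steps and assumes the runtime routes producer output to the consumer based on the attributes above; the message field shown is illustrative.
// Launch both stages; the Publishes/Subscribes attributes describe the GPU-side
// routing, so the host only feeds the head of the pipeline and drains the tail.
var producer = await ringKernelService.LaunchAsync<InputMessage, ProducerMessage>(
    Produce, new RingKernelLaunchOptions { QueueCapacity = 1024 });
var consumer = await ringKernelService.LaunchAsync<ProducerMessage, OutputMessage>(
    Consume, new RingKernelLaunchOptions { QueueCapacity = 1024 });

await producer.SendAsync(new InputMessage { MessageId = 1 });

var final = await consumer.TryReceiveAsync();
if (final is { } result)
{
    Console.WriteLine($"Pipeline output: {result.Result}");
}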
Ring Kernel Lifecycle
┌───────────┐     ┌──────────┐  Deactivate  ┌──────────┐     ┌────────────┐
│ Launched  │────▶│  Active  │─────────────▶│ Inactive │────▶│ Terminated │
└───────────┘     └──────────┘              └──────────┘     └────────────┘
                       ▲                         │
                       │        Activate         │
                       └─────────────────────────┘
States:
- Launched: Kernel code loaded, not processing
- Active: Processing messages continuously
- Inactive: Paused, can be reactivated
- Terminated: Resources freed
// Lifecycle control
await kernel.ActivateAsync(); // Start processing
await kernel.DeactivateAsync(); // Pause processing
await kernel.ActivateAsync(); // Resume processing
await kernel.TerminateAsync(); // Cleanup
Telemetry and Monitoring
// Get kernel metrics
var metrics = await kernel.GetMetricsAsync();
Console.WriteLine($"Messages processed: {metrics.MessagesProcessed}");
Console.WriteLine($"Messages pending: {metrics.MessagesPending}");
Console.WriteLine($"Average latency: {metrics.AverageLatencyMicroseconds} µs");
Console.WriteLine($"Throughput: {metrics.MessagesPerSecond} msg/s");
Error Handling
[RingKernel(KernelId = "robust-processor")]
public static OutputMessage ProcessWithErrorHandling(InputMessage input)
{
try
{
return new OutputMessage
{
Success = true,
Result = ProcessSafely(input)
};
}
catch
{
return new OutputMessage
{
Success = false,
ErrorCode = ErrorCodes.ProcessingFailed
};
}
}
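On the host side, the receive loop can then branch on the Success flag the kernel sets (a sketch reusing the OutputMessage fields above and the receive pattern from Step 3).
// Branch on the Success/ErrorCode fields populated by the kernel above.
var response = await kernel.TryReceiveAsync();
if (response is { } msg)
{
    if (msg.Success)
    {
        Console.WriteLine($"Result: {msg.Result}");
    }
    else
    {
        Console.WriteLine($"Processing failed, error code: {msg.ErrorCode}");
    }
}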
Best Practices
1. Keep Messages Small
// GOOD: Small message with indices
[MemoryPackable]
public partial struct SmallMessage : IRingKernelMessage
{
public uint MessageId { get; set; }
public ulong Timestamp { get; set; }
public int DataIndex { get; set; } // Reference to larger data
}
// AVOID: Large embedded data
[MemoryPackable]
public partial struct LargeMessage : IRingKernelMessage
{
public uint MessageId { get; set; }
public ulong Timestamp { get; set; }
public float[] LargeArray { get; set; } // 10K elements
}
2. Use Appropriate Queue Capacity
// High throughput, bursty traffic
new RingKernelLaunchOptions { QueueCapacity = 4096 };
// Low latency, steady traffic
new RingKernelLaunchOptions { QueueCapacity = 256 };
3. Handle Backpressure
// Block when queue full (default)
new RingKernelLaunchOptions { BackpressureStrategy = BackpressureStrategy.Block };
// Drop oldest messages
new RingKernelLaunchOptions { BackpressureStrategy = BackpressureStrategy.DropOldest };
// Reject new messages
new RingKernelLaunchOptions { BackpressureStrategy = BackpressureStrategy.Reject };
Exercises
Exercise 1: Echo Kernel
Create a Ring Kernel that echoes messages back with a sequence number.
Exercise 2: Stateful Counter
Implement a Ring Kernel that maintains a running count across messages.
Exercise 3: Pipeline
Create a two-stage pipeline with producer and consumer Ring Kernels.
Key Takeaways
- Ring Kernels run persistently - zero launch overhead after initial setup
- Actor model on GPU - message-passing concurrency
- MemoryPack serialization - efficient GPU-compatible format
- Multiple processing modes - continuous, batch, or adaptive
- Full lifecycle control - launch, activate, deactivate, terminate
Next Module
Learn barrier synchronization and memory ordering for complex GPU coordination.
Further Reading
- Ring Kernels Guide - Comprehensive reference
- MemoryPack Format - Serialization details
- Telemetry API - Monitoring and metrics