Struct RingKernelTelemetry
- Namespace
- DotCompute.Abstractions.RingKernels
- Assembly
- DotCompute.Abstractions.dll
Ring kernel telemetry data collected on the GPU and polled by the CPU. This struct is cache-line aligned (64 bytes) for optimal GPU memory access. All fields use atomic operations for thread-safe GPU updates.
[SuppressMessage("Design", "CA1815:Override equals and operator equals on value types", Justification = "Telemetry is mutable state container, not a value type for comparison")]
public struct RingKernelTelemetry
Remarks
Ring kernels run indefinitely in infinite loops, making traditional debugging impossible. Telemetry enables real-time monitoring of kernel health, message throughput, and latency.
Usage Pattern:
// GPU side (auto-injected by source generator):
telemetry[0].MessagesProcessed++;
telemetry[0].LastProcessedTimestamp = GetGpuTimestamp();
// CPU side (polling):
var telemetry = await runtime.GetTelemetryAsync(kernelId);
Console.WriteLine($"Throughput: {telemetry.MessagesProcessed / uptime} msg/s");
Performance:
- GPU update overhead: <50ns per message (atomic increment)
- CPU polling latency: <1μs (zero-copy pinned host memory)
- Memory footprint: 64 bytes per kernel
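The 64-byte, cache-line-aligned layout can be sketched as an explicit-size struct. The type and class names below are illustrative stand-ins, not the actual DotCompute definition; the field list mirrors the fields documented in this page, which occupy 56 bytes, with the declared Size padding the remainder to a full cache line.

```csharp
using System.Runtime.InteropServices;

// Hypothetical mirror of RingKernelTelemetry's documented fields.
// Size = 64 pads the 56 bytes of fields out to one cache line.
[StructLayout(LayoutKind.Sequential, Size = 64)]
public struct TelemetrySketch
{
    public ulong MessagesProcessed;      // 8 bytes
    public ulong MessagesDropped;        // 8 bytes
    public ulong TotalLatencyNanos;      // 8 bytes
    public ulong MinLatencyNanos;        // 8 bytes
    public ulong MaxLatencyNanos;        // 8 bytes
    public long LastProcessedTimestamp;  // 8 bytes
    public int QueueDepth;               // 4 bytes
    public ushort ErrorCode;             // 2 bytes
    public ushort Reserved;              // 2 bytes
}

public static class LayoutCheck
{
    // Marshaled size reflects the explicit Size, i.e. 64 bytes.
    public static int Size() => Marshal.SizeOf<TelemetrySketch>();
}
```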
Constructors
RingKernelTelemetry()
Initializes a new instance of RingKernelTelemetry with default values. Sets MinLatencyNanos to ulong.MaxValue (will be updated on first message).
public RingKernelTelemetry()
Fields
ErrorCode
Last error code reported by the ring kernel (0 = no error). Custom error codes defined by application (e.g., 1 = OOM, 2 = invalid message).
public ushort ErrorCode
Field Value
Remarks
GPU kernel can set this field when encountering errors:
if (outOfMemory)
{
telemetry[0].ErrorCode = 1;
return; // Early exit
}
CPU can poll for errors:
if (telemetry.ErrorCode != 0)
{
logger.LogError($"Kernel error: {telemetry.ErrorCode}");
}
LastProcessedTimestamp
GPU timestamp (nanoseconds) of the last successfully processed message. Obtained from ITimingProvider.GetTimestampAsync() (Phase 1 timing API).
public long LastProcessedTimestamp
Field Value
Remarks
On CUDA Compute Capability 6.0+: 1ns resolution via globaltimer. On CUDA CC 5.0: 1μs resolution via CUDA events. On OpenCL/Metal: implementation-dependent resolution.
Use for stuck kernel detection: if (currentTime - LastProcessedTimestamp > timeout) { /* kernel stuck */ }
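The stuck-kernel check above can be wrapped in a small helper; the class name and the 1-second default are assumptions for illustration, matching the default used by IsHealthy(long, long) below.

```csharp
public static class StuckDetection
{
    // Returns true when no message has completed within the allowed
    // window. Both timestamps are GPU nanoseconds from the timing API.
    public static bool IsStuck(long currentTimeNanos,
                               long lastProcessedNanos,
                               long timeoutNanos = 1_000_000_000) // 1 s (assumed default)
        => currentTimeNanos - lastProcessedNanos > timeoutNanos;
}
```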
MaxLatencyNanos
Peak message processing latency observed in nanoseconds. Updated when a message's latency exceeds current max.
public ulong MaxLatencyNanos
Field Value
Remarks
Useful for detecting outliers and tail latency issues. High MaxLatencyNanos may indicate:
- GPU memory contention
- Complex message processing
- Context switches (if sharing GPU with other kernels)
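The update rule above (write only when the new sample exceeds the current peak) is typically a compare-and-swap loop. A CPU-side sketch of that pattern, mirroring the atomic max a GPU kernel would use; the helper name is hypothetical:

```csharp
using System.Threading;

public static class LatencyMax
{
    // Atomically raises 'max' to 'sample' if the sample is larger.
    public static void UpdateMax(ref long max, long sample)
    {
        long observed = Volatile.Read(ref max);
        while (sample > observed)
        {
            long previous = Interlocked.CompareExchange(ref max, sample, observed);
            if (previous == observed) return; // our write landed
            observed = previous;              // lost the race; retry with the new value
        }
    }
}
```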
MessagesDropped
Total number of messages dropped due to backpressure or validation failures. Incremented when queue is full (BackpressureStrategy.DropOldest/DropNew) or when messages fail validation before enqueuing to dead letter queue.
public ulong MessagesDropped
Field Value
MessagesProcessed
Total number of messages successfully processed since kernel launch. Updated atomically on GPU via atomic_add or Interlocked.Increment.
public ulong MessagesProcessed
Field Value
Remarks
Use this field to calculate throughput: MessagesProcessed / uptime. For stuck kernel detection: if this value doesn't change for N seconds, the kernel may be deadlocked or idle.
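Both uses described above reduce to simple arithmetic on two polls of the counter; a self-contained sketch with hypothetical helper names:

```csharp
public static class ProcessedStats
{
    // Throughput in messages per second; guards the zero-uptime case.
    public static double Throughput(ulong messagesProcessed, double uptimeSeconds)
        => uptimeSeconds <= 0 ? 0 : messagesProcessed / uptimeSeconds;

    // Stuck/idle heuristic: the counter did not advance between two polls.
    public static bool CounterStalled(ulong previousCount, ulong currentCount)
        => currentCount == previousCount;
}
```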
MinLatencyNanos
Minimum message processing latency observed in nanoseconds. Represents best-case performance under ideal conditions.
public ulong MinLatencyNanos
Field Value
Remarks
Compare with MaxLatencyNanos to understand latency variance. Large variance (MaxLatencyNanos / MinLatencyNanos > 10) suggests:
- Inconsistent message complexity
- GPU thermal throttling
- External system interference
QueueDepth
Current depth of the input message queue (number of pending messages). Updated on each kernel iteration to reflect queue size.
public int QueueDepth
Field Value
Remarks
Use for backpressure monitoring:
- Low values (<10% capacity): Kernel is keeping up with message rate
- High values (>80% capacity): Risk of queue overflow, consider scaling
- Full capacity: Backpressure strategy is actively dropping/rejecting messages
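The thresholds above can be expressed as a simple classifier; the capacity parameter and category labels are assumptions for illustration, since the queue capacity is not part of this struct:

```csharp
using System;

public static class QueuePressure
{
    // Classifies queue occupancy against the documented QueueDepth
    // thresholds. 'capacity' is the ring buffer's slot count.
    public static string Classify(int queueDepth, int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        double fill = (double)queueDepth / capacity;
        if (queueDepth >= capacity) return "full"; // backpressure actively dropping/rejecting
        if (fill > 0.8) return "high";             // risk of overflow, consider scaling
        if (fill < 0.1) return "low";              // kernel is keeping up
        return "normal";
    }
}
```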
Reserved
Reserved for future expansion (maintains 64-byte alignment). Do not use in application code.
public ushort Reserved
Field Value
TotalLatencyNanos
Cumulative processing latency in nanoseconds across all processed messages. Sum of (dequeue timestamp - enqueue timestamp) for each message.
public ulong TotalLatencyNanos
Field Value
Remarks
Calculate average latency: TotalLatencyNanos / MessagesProcessed. For detailed P50/P99 metrics, enable the TrackLatency attribute (Phase 2.2).
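The average above needs a zero guard when no messages have been processed, matching the documented behavior of the AverageLatencyNanos property; a minimal sketch with a hypothetical helper name:

```csharp
public static class AvgLatency
{
    // Average = total / count, returning 0 when nothing was processed.
    public static ulong Average(ulong totalLatencyNanos, ulong messagesProcessed)
        => messagesProcessed == 0 ? 0 : totalLatencyNanos / messagesProcessed;
}
```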
Properties
AverageLatencyNanos
Calculates the average message processing latency in nanoseconds. Returns 0 if no messages have been processed.
public readonly ulong AverageLatencyNanos { get; }
Property Value
Methods
GetLatencyVariance()
Calculates the latency variance (MaxLatencyNanos / MinLatencyNanos ratio). High variance (>10) indicates inconsistent performance.
public readonly double GetLatencyVariance()
Returns
- double
Variance ratio, or 0 if MinLatencyNanos is still at initial value.
GetThroughput(double)
Gets the current message throughput in messages per second. Requires uptime in seconds to calculate.
public readonly double GetThroughput(double uptimeSeconds)
Parameters
uptimeSeconds (double): Kernel uptime in seconds.
Returns
- double
Messages per second, or 0 if uptime is 0.
IsHealthy(long, long)
Indicates whether the kernel is healthy (processing messages and no errors). A kernel is considered stuck if it hasn't processed messages for too long.
public readonly bool IsHealthy(long currentTimestamp, long stuckThresholdNanos = 1000000000)
Parameters
currentTimestamp (long): Current GPU timestamp in nanoseconds.
stuckThresholdNanos (long): Threshold in nanoseconds (default: 1 second).
Returns
- bool
True if kernel is healthy; false if stuck or errored.
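The documented contract (no error set and the last message within the threshold) can be sketched as a standalone function; the class name is hypothetical and the actual implementation may apply additional checks:

```csharp
public static class HealthSketch
{
    // Mirrors the documented IsHealthy contract: healthy means the
    // error code is clear and the last processed message is recent.
    public static bool IsHealthy(ushort errorCode,
                                 long lastProcessedTimestamp,
                                 long currentTimestamp,
                                 long stuckThresholdNanos = 1_000_000_000)
        => errorCode == 0
           && currentTimestamp - lastProcessedTimestamp <= stuckThresholdNanos;
}
```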