Struct RingKernelTelemetry
- Namespace
- DotCompute.Abstractions.RingKernels
- Assembly
- DotCompute.Abstractions.dll
Ring kernel telemetry data collected on the GPU and polled by the CPU. This struct is cache-line aligned (64 bytes) for optimal GPU memory access. All fields use atomic operations for thread-safe GPU updates.
[SuppressMessage("Design", "CA1815:Override equals and operator equals on value types", Justification = "Telemetry is mutable state container, not a value type for comparison")]
public struct RingKernelTelemetry
Remarks
Ring kernels run indefinitely in infinite loops, making traditional debugging impossible. Telemetry enables real-time monitoring of kernel health, message throughput, and latency.
Usage Pattern:
// GPU side (auto-injected by source generator):
telemetry[0].MessagesProcessed++;
telemetry[0].LastProcessedTimestamp = GetGpuTimestamp();
// CPU side (polling):
var telemetry = await runtime.GetTelemetryAsync(kernelId);
Console.WriteLine($"Throughput: {telemetry.MessagesProcessed / uptime} msg/s");
Performance:
- GPU update overhead: <50ns per message (atomic increment)
- CPU polling latency: <1μs (zero-copy pinned host memory)
- Memory footprint: 64 bytes per kernel
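The 64-byte, cache-line-aligned layout can be sketched as an explicit-size struct. The type and class names below are illustrative stand-ins, not the actual DotCompute definition; the field list mirrors the fields documented in this page, which occupy 56 bytes, with the declared Size padding the remainder to a full cache line.

```csharp
using System.Runtime.InteropServices;

// Hypothetical mirror of RingKernelTelemetry's documented fields.
// Size = 64 pads the 56 bytes of fields out to one cache line.
[StructLayout(LayoutKind.Sequential, Size = 64)]
public struct TelemetrySketch
{
    public ulong MessagesProcessed;      // 8 bytes
    public ulong MessagesDropped;        // 8 bytes
    public ulong TotalLatencyNanos;      // 8 bytes
    public ulong MinLatencyNanos;        // 8 bytes
    public ulong MaxLatencyNanos;        // 8 bytes
    public long LastProcessedTimestamp;  // 8 bytes
    public int QueueDepth;               // 4 bytes
    public ushort ErrorCode;             // 2 bytes
    public ushort Reserved;              // 2 bytes
}

public static class LayoutCheck
{
    // Marshaled size reflects the explicit Size, i.e. 64 bytes.
    public static int Size() => Marshal.SizeOf<TelemetrySketch>();
}
```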
Constructors
RingKernelTelemetry()
Initializes a new instance of RingKernelTelemetry with default values. Sets MinLatencyNanos to ulong.MaxValue (will be updated on first message).
public RingKernelTelemetry()
Fields
ErrorCode
Last error code reported by the ring kernel (0 = no error). Custom error codes defined by application (e.g., 1 = OOM, 2 = invalid message).
public ushort ErrorCode
Field Value
Remarks
GPU kernel can set this field when encountering errors:
if (outOfMemory)
{
telemetry[0].ErrorCode = 1;
return; // Early exit
}
CPU can poll for errors:
if (telemetry.ErrorCode != 0)
{
logger.LogError($"Kernel error: {telemetry.ErrorCode}");
}
LastProcessedTimestamp
GPU timestamp (nanoseconds) of the last successfully processed message. Obtained from ITimingProvider.GetTimestampAsync() (Phase 1 timing API).
public long LastProcessedTimestamp
Field Value
Remarks
On CUDA Compute Capability 6.0+: 1ns resolution via globaltimer. On CUDA CC 5.0: 1μs resolution via CUDA events. On OpenCL/Metal: implementation-dependent resolution.
Use for stuck kernel detection: if (currentTime - LastProcessedTimestamp > timeout) { /* kernel stuck */ }
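The stuck-kernel check above can be wrapped in a small helper; the class name and the 1-second default are assumptions for illustration, matching the default used by IsHealthy(long, long) below.

```csharp
public static class StuckDetection
{
    // Returns true when no message has completed within the allowed
    // window. Both timestamps are GPU nanoseconds from the timing API.
    public static bool IsStuck(long currentTimeNanos,
                               long lastProcessedNanos,
                               long timeoutNanos = 1_000_000_000) // 1 s (assumed default)
        => currentTimeNanos - lastProcessedNanos > timeoutNanos;
}
```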
MaxLatencyNanos
Peak message processing latency observed in nanoseconds. Updated when a message's latency exceeds current max.
public ulong MaxLatencyNanos
Field Value
Remarks
Useful for detecting outliers and tail latency issues. High MaxLatencyNanos may indicate:
- GPU memory contention
- Complex message processing
- Context switches (if sharing GPU with other kernels)
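The update rule above (write only when the new sample exceeds the current peak) is typically a compare-and-swap loop. A CPU-side sketch of that pattern, mirroring the atomic max a GPU kernel would use; the helper name is hypothetical:

```csharp
using System.Threading;

public static class LatencyMax
{
    // Atomically raises 'max' to 'sample' if the sample is larger.
    public static void UpdateMax(ref long max, long sample)
    {
        long observed = Volatile.Read(ref max);
        while (sample > observed)
        {
            long previous = Interlocked.CompareExchange(ref max, sample, observed);
            if (previous == observed) return; // our write landed
            observed = previous;              // lost the race; retry with the new value
        }
    }
}
```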
MessagesDropped
Total number of messages dropped due to backpressure or validation failures. Incremented when queue is full (BackpressureStrategy.DropOldest/DropNew) or when messages fail validation before enqueuing to dead letter queue.
public ulong MessagesDropped
Field Value
MessagesProcessed
Total number of messages successfully processed since kernel launch. Updated atomically on GPU via atomic_add or Interlocked.Increment.
public ulong MessagesProcessed
Field Value
Remarks
Use this field to calculate throughput: MessagesProcessed / uptime. For stuck kernel detection: if this value doesn't change for N seconds, the kernel may be deadlocked or idle.
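Both uses described above reduce to simple arithmetic on two polls of the counter; a self-contained sketch with hypothetical helper names:

```csharp
public static class ProcessedStats
{
    // Throughput in messages per second; guards the zero-uptime case.
    public static double Throughput(ulong messagesProcessed, double uptimeSeconds)
        => uptimeSeconds <= 0 ? 0 : messagesProcessed / uptimeSeconds;

    // Stuck/idle heuristic: the counter did not advance between two polls.
    public static bool CounterStalled(ulong previousCount, ulong currentCount)
        => currentCount == previousCount;
}
```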
MinLatencyNanos
Minimum message processing latency observed in nanoseconds. Represents best-case performance under ideal conditions.
public ulong MinLatencyNanos
Field Value
Remarks
Compare with MaxLatencyNanos to understand latency variance. Large variance (MaxLatencyNanos / MinLatencyNanos > 10) suggests:
- Inconsistent message complexity
- GPU thermal throttling
- External system interference
QueueDepth
Current depth of the input message queue (number of pending messages). Updated on each kernel iteration to reflect queue size.
public int QueueDepth
Field Value
Remarks
Use for backpressure monitoring:
- Low values (<10% capacity): Kernel is keeping up with message rate
- High values (>80% capacity): Risk of queue overflow, consider scaling
- Full capacity: Backpressure strategy is actively dropping/rejecting messages
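The thresholds above can be expressed as a simple classifier; the capacity parameter and category labels are assumptions for illustration, since the queue capacity is not part of this struct:

```csharp
using System;

public static class QueuePressure
{
    // Classifies queue occupancy against the documented QueueDepth
    // thresholds. 'capacity' is the ring buffer's slot count.
    public static string Classify(int queueDepth, int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        double fill = (double)queueDepth / capacity;
        if (queueDepth >= capacity) return "full"; // backpressure actively dropping/rejecting
        if (fill > 0.8) return "high";             // risk of overflow, consider scaling
        if (fill < 0.1) return "low";              // kernel is keeping up
        return "normal";
    }
}
```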
Reserved
Reserved for future expansion (maintains 64-byte alignment). Do not use in application code.
public ushort Reserved
Field Value
TotalLatencyNanos
Cumulative processing latency in nanoseconds across all processed messages. Sum of (dequeue timestamp - enqueue timestamp) for each message.
public ulong TotalLatencyNanos
Field Value
Remarks
Calculate average latency: TotalLatencyNanos / MessagesProcessed. For detailed P50/P99 metrics, enable the TrackLatency attribute (Phase 2.2).
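The average above needs a zero guard when no messages have been processed, matching the documented behavior of the AverageLatencyNanos property; a minimal sketch with a hypothetical helper name:

```csharp
public static class AvgLatency
{
    // Average = total / count, returning 0 when nothing was processed.
    public static ulong Average(ulong totalLatencyNanos, ulong messagesProcessed)
        => messagesProcessed == 0 ? 0 : totalLatencyNanos / messagesProcessed;
}
```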
Properties
AverageLatencyNanos
Calculates the average message processing latency in nanoseconds. Returns 0 if no messages have been processed.
public readonly ulong AverageLatencyNanos { get; }
Property Value
Methods
GetLatencyVariance()
Calculates the latency variance (MaxLatencyNanos / MinLatencyNanos ratio). High variance (>10) indicates inconsistent performance.
public readonly double GetLatencyVariance()
Returns
- double
Variance ratio, or 0 if MinLatencyNanos is still at initial value.
GetThroughput(double)
Gets the current message throughput in messages per second. Requires uptime in seconds to calculate.
public readonly double GetThroughput(double uptimeSeconds)
Parameters
uptimeSeconds (double): Kernel uptime in seconds.
Returns
- double
Messages per second, or 0 if uptime is 0.
IsHealthy(long, long)
Indicates whether the kernel is healthy (processing messages and no errors). A kernel is considered stuck if it hasn't processed messages for too long.
public readonly bool IsHealthy(long currentTimestamp, long stuckThresholdNanos = 1000000000)
Parameters
currentTimestamp (long): Current GPU timestamp in nanoseconds.
stuckThresholdNanos (long): Threshold in nanoseconds (default: 1 second).
Returns
- bool
True if kernel is healthy; false if stuck or errored.
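The documented contract (no error set and the last message within the threshold) can be sketched as a standalone function; the class name is hypothetical and the actual implementation may apply additional checks:

```csharp
public static class HealthSketch
{
    // Mirrors the documented IsHealthy contract: healthy means the
    // error code is clear and the last processed message is recent.
    public static bool IsHealthy(ushort errorCode,
                                 long lastProcessedTimestamp,
                                 long currentTimestamp,
                                 long stuckThresholdNanos = 1_000_000_000)
        => errorCode == 0
           && currentTimestamp - lastProcessedTimestamp <= stuckThresholdNanos;
}
```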