Class RingKernelLaunchOptions

Namespace
DotCompute.Abstractions.RingKernels
Assembly
DotCompute.Abstractions.dll

Configuration options for launching a ring kernel.

public sealed class RingKernelLaunchOptions
Inheritance
RingKernelLaunchOptions

Remarks

Ring kernels are persistent GPU kernels that process messages from input queues and produce results in output queues. This class provides comprehensive configuration for queue sizing, deduplication, backpressure, and performance tuning.

Default Values

  • QueueCapacity: 4096 messages (optimized for high throughput)
  • DeduplicationWindowSize: 1024 messages (maximum validated size)
  • BackpressureStrategy: Block (wait for space)
  • EnablePriorityQueue: false (FIFO ordering)
  • StreamPriority: Normal (default priority)

Fields

DefaultDeduplicationWindowSize

Default deduplication window size (1024 messages).

public const int DefaultDeduplicationWindowSize = 1024

Field Value

int

Remarks

1024 is the maximum validated size that balances:

  • Memory usage: ~32KB deduplication cache per queue
  • Coverage: Detects duplicates within last 1024 messages
  • Performance: O(1) lookup via hash table

DefaultQueueCapacity

Default queue capacity for ring kernel message queues (4096 messages).

public const int DefaultQueueCapacity = 4096

Field Value

int

Remarks

4096 provides a good balance between memory usage and throughput:

  • Memory per queue: ~128KB for IRingKernelMessage types
  • Supports 2M+ messages/s throughput with 100-500ns latency
  • Power-of-2 for optimal modulo operations
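
A power-of-2 capacity lets a ring buffer replace the modulo with a bitwise AND. The following is a minimal sketch of that index arithmetic, not DotCompute's implementation; `RingIndexSketch` and its members are hypothetical names introduced for illustration, and sequence numbers are assumed to be non-negative.

```csharp
using System;

// Hypothetical helper for illustration; not part of DotCompute.
public static class RingIndexSketch
{
    // True when value is a positive power of 2 (exactly one bit set).
    public static bool IsPowerOfTwo(int value) => value > 0 && (value & (value - 1)) == 0;

    // Maps a non-negative, monotonically increasing sequence number to a slot index.
    // For power-of-2 capacities, (sequence & (capacity - 1)) equals sequence % capacity.
    public static int Wrap(long sequence, int capacity)
    {
        if (!IsPowerOfTwo(capacity))
            throw new ArgumentOutOfRangeException(nameof(capacity), "Capacity must be a power of 2.");
        return (int)(sequence & (capacity - 1));
    }
}
```

With the default capacity of 4096, `RingIndexSketch.Wrap(4097, 4096)` yields slot 1, the same result as `4097 % 4096`.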

Properties

BackpressureStrategy

Gets or sets the backpressure strategy when queues are full.

public BackpressureStrategy BackpressureStrategy { get; set; }

Property Value

BackpressureStrategy

The backpressure strategy. Default is Block.

Remarks

Strategy Comparison

Strategy      Behavior
Block         Wait for space (best for guaranteed delivery)
Reject        Return false immediately (best for latency-sensitive)
DropOldest    Overwrite oldest message (best for real-time streams)
DropNew       Discard new message (best for preserving historical data)

Production Recommendation: Use Block for Orleans.GpuBridge to ensure actor requests are not lost during GPU computation.
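
To make the four strategies concrete, here is a minimal host-side sketch of a bounded queue that applies each one when full. It is an illustration only and does not reflect how DotCompute implements its queues; `SketchBackpressure` and `BoundedQueueSketch<T>` are hypothetical stand-ins, and this page does not say how the library reports a DropNew discard to the caller, so the sketch simply returns false for both Reject and DropNew.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for the documented strategy names.
public enum SketchBackpressure { Block, Reject, DropOldest, DropNew }

// Hypothetical bounded queue for illustration; not part of DotCompute.
public sealed class BoundedQueueSketch<T>
{
    private readonly Queue<T> _items = new Queue<T>();
    private readonly object _gate = new object();
    private readonly int _capacity;
    private readonly SketchBackpressure _strategy;

    public BoundedQueueSketch(int capacity, SketchBackpressure strategy)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
        _strategy = strategy;
    }

    // Returns true if the item was stored, false if it was rejected or discarded.
    public bool TryEnqueue(T item)
    {
        lock (_gate)
        {
            if (_items.Count >= _capacity)
            {
                switch (_strategy)
                {
                    case SketchBackpressure.Block:
                        // Wait until a consumer on another thread frees a slot.
                        while (_items.Count >= _capacity) Monitor.Wait(_gate);
                        break;
                    case SketchBackpressure.Reject:
                    case SketchBackpressure.DropNew:
                        // Assumption: both leave the queue unchanged and report false.
                        return false;
                    case SketchBackpressure.DropOldest:
                        _items.Dequeue(); // make room by discarding the oldest item
                        break;
                }
            }
            _items.Enqueue(item);
            return true;
        }
    }

    public bool TryDequeue(out T item)
    {
        lock (_gate)
        {
            if (_items.Count == 0) { item = default(T); return false; }
            item = _items.Dequeue();
            Monitor.PulseAll(_gate); // wake producers waiting under Block
            return true;
        }
    }
}
```

Under Block, a producer calling `TryEnqueue` on a full queue waits until another thread dequeues, which is why that strategy suits workloads where losing a request is worse than stalling the sender.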

DeduplicationWindowSize

Gets or sets the number of recent messages to check for duplicates.

public int DeduplicationWindowSize { get; set; }

Property Value

int

The deduplication window size in messages. Default is 1024. Valid range: 16-1024 (enforced by MessageQueueOptions.Validate()).

Remarks

Deduplication Behavior

  • Messages with duplicate MessageId within window are rejected
  • Implemented via circular buffer hash table (O(1) lookup)
  • Window size affects memory: ~32 bytes × window size per queue

Sizing Trade-offs

  • Smaller window (16-256): Lower memory and faster, but duplicates that arrive after the window has moved on may pass
  • Larger window (512-1024): Higher memory, better duplicate detection

Note: Deduplication window size is clamped to QueueCapacity if QueueCapacity < DeduplicationWindowSize. For queues with a capacity above 1024, deduplication covers at most the most recent 1024 messages.
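
The windowed behavior described above can be sketched with a circular buffer that records arrival order plus a hash set for O(1) membership checks. This is an illustrative sketch under assumptions, not the library's data structure: `DedupWindowSketch` is a hypothetical name, and the use of `Guid` for the message identifier is an assumption, since this page does not state the type of MessageId.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sliding-window deduplicator for illustration; not part of DotCompute.
public sealed class DedupWindowSketch
{
    private readonly Guid[] _ring;          // arrival order of the last N accepted ids
    private readonly HashSet<Guid> _seen;   // O(1) membership test over the same ids
    private int _next;                      // slot that will be overwritten next
    private int _count;                     // number of occupied slots

    public DedupWindowSketch(int windowSize)
    {
        // Same bounds as the documented valid range (16-1024).
        if (windowSize < 16 || windowSize > 1024)
            throw new ArgumentOutOfRangeException(nameof(windowSize));
        _ring = new Guid[windowSize];
        _seen = new HashSet<Guid>();
    }

    // Returns false when messageId was already accepted within the current window.
    public bool TryAccept(Guid messageId)
    {
        if (_seen.Contains(messageId)) return false;

        if (_count == _ring.Length)
            _seen.Remove(_ring[_next]); // window is full: forget the oldest id
        else
            _count++;

        _ring[_next] = messageId;
        _seen.Add(messageId);
        _next = (_next + 1) % _ring.Length;
        return true;
    }
}
```

An identifier is rejected only while it is still inside the window. Once enough newer messages have been accepted to evict it, the same identifier is accepted again, which is the trade-off a smaller window makes.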

EnablePriorityQueue

Gets or sets whether to use priority-based message ordering.

public bool EnablePriorityQueue { get; set; }

Property Value

bool

true to enable priority queue; false for FIFO. Default is false.

Remarks

Priority Queue Behavior

  • Messages dequeued in priority order (0 = highest, 255 = lowest)
  • Same-priority messages dequeued in FIFO order
  • Slight performance overhead: ~10-20% vs FIFO

Use Cases

  • Enable: Critical actor requests need priority over batch operations
  • Disable: Uniform priority, maximize throughput
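
The ordering rule (lower number first, FIFO among equal priorities) can be sketched on the host with .NET's `PriorityQueue<TElement, TPriority>` (.NET 6 or later). That type does not preserve insertion order for equal priorities by itself, so the sketch adds a sequence number as a tie-breaker. `PriorityFifoSketch<T>` is a hypothetical name, and this says nothing about how the GPU-side queue is actually implemented.

```csharp
using System.Collections.Generic;

// Hypothetical host-side illustration of the ordering rule; not part of DotCompute.
public sealed class PriorityFifoSketch<T>
{
    // Tuples compare by Priority first, then by Sequence, so equal priorities stay FIFO.
    private readonly PriorityQueue<T, (byte Priority, long Sequence)> _queue = new();
    private long _sequence;

    // priority: 0 = highest, 255 = lowest, matching the documented convention.
    public void Enqueue(T item, byte priority) => _queue.Enqueue(item, (priority, _sequence++));

    public bool TryDequeue(out T item) => _queue.TryDequeue(out item, out _);
}
```

Enqueuing two batch items at priority 200 around one critical item at priority 0 dequeues the critical item first, followed by the batch items in their original order.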

QueueCapacity

Gets or sets the maximum number of messages each queue can hold.

public int QueueCapacity { get; set; }

Property Value

int

The queue capacity in messages. Default is 4096. Must be a power of 2 between 16 and 1048576 (16, 32, 64, ..., 1048576); Validate() throws otherwise.

Remarks

Sizing Guidelines

  • Low latency (sub-microsecond): 256-1024
  • Balanced (production default): 4096
  • High throughput (batch processing): 16384-65536

Larger queues consume more memory but provide better burst handling. Memory usage: ~32 bytes × capacity for IRingKernelMessage types.
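
As a quick worked example of the memory figure above, the following sketch multiplies a capacity by the approximate 32 bytes per message. `QueueMemorySketch` is a hypothetical helper, and the result is only as accurate as that approximate figure.

```csharp
// Hypothetical helper; 32 bytes is the approximate per-message cost quoted above.
public static class QueueMemorySketch
{
    public const int ApproxBytesPerMessage = 32;

    // Approximate bytes consumed by one queue of the given capacity.
    public static long EstimateBytes(int capacity) => (long)capacity * ApproxBytesPerMessage;
}
```

This gives roughly 8 KB for a capacity of 256, 128 KB for the default 4096, and 2 MB for 65536, per queue.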

StreamPriority

Gets or sets the CUDA stream priority for ring kernel execution.

public RingKernelStreamPriority StreamPriority { get; set; }

Property Value

RingKernelStreamPriority

The stream priority level. Default is Normal.

Remarks

Stream Priority Behavior

  • High: GPU scheduler prioritizes this kernel for low-latency responses (use for critical operations)
  • Normal: Default priority for typical workloads
  • Low: Deprioritized for background processing that can tolerate higher latency

Use Cases

  • High: Actor request processing, real-time data streams, latency-sensitive operations
  • Normal: General purpose computation, balanced workloads
  • Low: Batch processing, background analytics, non-critical tasks

Note: Stream priority affects GPU scheduling but does not guarantee execution order. Higher priority streams get preferential access to GPU resources when multiple streams compete.

Methods

HighThroughputDefaults()

Creates a new instance optimized for high-throughput batch processing.

public static RingKernelLaunchOptions HighThroughputDefaults()

Returns

RingKernelLaunchOptions

A new RingKernelLaunchOptions with high-throughput defaults.

Remarks

High-Throughput Defaults

  • QueueCapacity: 16384 (large burst buffer)
  • DeduplicationWindowSize: 1024 (maximum window)
  • BackpressureStrategy: Block (no loss)
  • EnablePriorityQueue: false (maximize throughput)

Use for batch data processing where high memory usage is acceptable for throughput gains.

LowLatencyDefaults()

Creates a new instance optimized for low-latency scenarios (sub-microsecond).

public static RingKernelLaunchOptions LowLatencyDefaults()

Returns

RingKernelLaunchOptions

A new RingKernelLaunchOptions with low-latency defaults.

Remarks

Low-Latency Defaults

  • QueueCapacity: 256 (minimal memory footprint)
  • DeduplicationWindowSize: 256 (proportional to capacity)
  • BackpressureStrategy: Reject (fail-fast)
  • EnablePriorityQueue: false (FIFO is fastest)

Use for latency-critical applications where a temporary backoff is acceptable when the queue is full.

ProductionDefaults()

Creates a new instance with default values optimized for Orleans.GpuBridge production use.

public static RingKernelLaunchOptions ProductionDefaults()

Returns

RingKernelLaunchOptions

A new RingKernelLaunchOptions with production defaults.

Remarks

Production Defaults

  • QueueCapacity: 4096 (handles burst traffic, 2M+ msg/s)
  • DeduplicationWindowSize: 1024 (covers recent messages)
  • BackpressureStrategy: Block (no message loss)
  • EnablePriorityQueue: false (maximize throughput)

These defaults are validated for:

  • 100-500ns latency targets
  • 2M+ messages/s throughput
  • Sub-10ms startup times
  • RTX 2000 Ada GPU (CC 8.9)

ToMessageQueueOptions()

Creates a MessageQueueOptions instance from these launch options.

public MessageQueueOptions ToMessageQueueOptions()

Returns

MessageQueueOptions

A new MessageQueueOptions with values from this instance.

Remarks

This method is used internally by ring kernel runtimes to create message queues with the configured options. It ensures consistent translation from launch options to queue options.

Validate()

Validates the launch options and throws if any values are invalid.

public void Validate()

Remarks

This method performs comprehensive validation before kernel launch:

  1. Queue Capacity: 16 ≤ capacity ≤ 1048576 (1M), power of 2
  2. Deduplication Window: 16 ≤ window ≤ 1024
  3. Consistency: Window ≤ capacity (auto-clamped)

Note: DeduplicationWindowSize is automatically clamped to QueueCapacity if QueueCapacity < DeduplicationWindowSize. This ensures smaller queues have proportional deduplication windows.

Exceptions

ArgumentOutOfRangeException

Thrown if:

  • QueueCapacity is less than 16 or greater than 1048576 (1M)
  • QueueCapacity is not a power of 2
  • DeduplicationWindowSize is less than 16 or greater than 1024
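
The documented rules can be restated as a short standalone sketch. It mirrors the checks and the clamp listed above but is not the library's source; `LaunchOptionsValidationSketch` and `ValidateAndClamp` are hypothetical names, and the order in which the real Validate() performs its checks is an assumption.

```csharp
using System;

// Hypothetical restatement of the documented rules; not part of DotCompute.
public static class LaunchOptionsValidationSketch
{
    // Throws on out-of-range input and returns the effective deduplication window.
    public static int ValidateAndClamp(int queueCapacity, int deduplicationWindowSize)
    {
        if (queueCapacity < 16 || queueCapacity > 1_048_576)
            throw new ArgumentOutOfRangeException(nameof(queueCapacity),
                "Capacity must be between 16 and 1048576.");

        if ((queueCapacity & (queueCapacity - 1)) != 0)
            throw new ArgumentOutOfRangeException(nameof(queueCapacity),
                "Capacity must be a power of 2.");

        if (deduplicationWindowSize < 16 || deduplicationWindowSize > 1024)
            throw new ArgumentOutOfRangeException(nameof(deduplicationWindowSize),
                "Window must be between 16 and 1024.");

        // Consistency rule: the window never exceeds the queue capacity.
        return Math.Min(deduplicationWindowSize, queueCapacity);
    }
}
```

For example, a capacity of 256 with a requested window of 1024 passes the range checks and yields an effective window of 256, while a capacity of 100 is rejected because it is not a power of 2.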