Ring Kernels Documentation

Ring Kernels are a programming model for persistent, GPU-resident computation with actor-style message passing: a kernel is launched once, stays resident on the device, and processes messages as they arrive. This section documents how to develop, deploy, and optimize Ring Kernel applications.

Getting Started

Document         Description
Overview         Introduction to Ring Kernels and their benefits
Architecture     System architecture and design principles
Migration Guide  Migrating to the unified Ring Kernel system

Core Concepts

Document               Description
Telemetry              Real-time GPU health monitoring with <1 µs latency
Messaging & Telemetry  Message queue integration and telemetry patterns
MemoryPack Format      Binary serialization format for GPU messages
Compilation Pipeline   How Ring Kernels are compiled for GPU execution

Synchronization & Coordination

Document                     Description
Barriers                     Thread-block, grid, and warp barrier synchronization
Memory Ordering              Causal consistency and memory fence operations
Phase 3: Coordination        Multi-kernel coordination primitives
Phase 4: Temporal Causality  Hybrid Logical Clocks and advanced coordination
Health Monitoring            GPU health and failure detection
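
For readers new to Hybrid Logical Clocks, the sketch below shows the standard HLC update rule in plain C++. It is a host-side illustration only: the names HlcTimestamp, HybridLogicalClock, tick, and receive are hypothetical and are not taken from the Ring Kernels API, and physical time is passed in as a plain integer so the behavior is deterministic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>

// Hypothetical timestamp layout; not the Ring Kernels wire format.
struct HlcTimestamp {
    std::uint64_t wall;     // largest physical time observed so far
    std::uint32_t logical;  // counter that orders events sharing the same wall value
};

// Timestamps compare by wall time first, then by logical counter.
inline bool operator<(const HlcTimestamp& a, const HlcTimestamp& b) {
    return std::tie(a.wall, a.logical) < std::tie(b.wall, b.logical);
}

class HybridLogicalClock {
public:
    // Local or send event: advance to the physical clock if it moved forward,
    // otherwise bump the logical counter.
    HlcTimestamp tick(std::uint64_t physical_now) {
        const std::uint64_t prev = now_.wall;
        now_.wall = std::max(prev, physical_now);
        now_.logical = (now_.wall == prev) ? now_.logical + 1 : 0;
        return now_;
    }

    // Receive event: merge the sender's timestamp so the result is strictly
    // later than both the local clock and the incoming message.
    HlcTimestamp receive(const HlcTimestamp& msg, std::uint64_t physical_now) {
        const std::uint64_t prev = now_.wall;
        now_.wall = std::max({prev, msg.wall, physical_now});
        if (now_.wall == prev && now_.wall == msg.wall) {
            now_.logical = std::max(now_.logical, msg.logical) + 1;
        } else if (now_.wall == prev) {
            now_.logical += 1;
        } else if (now_.wall == msg.wall) {
            now_.logical = msg.logical + 1;
        } else {
            now_.logical = 0;  // physical time is ahead of both clocks
        }
        return now_;
    }

private:
    HlcTimestamp now_{0, 0};
};
```

The property worth noting is that timestamps stay monotonic even when the physical clock stalls or runs backwards, which is what lets causally related messages be ordered across kernels.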

Advanced Topics

Document Description
Advanced Programming Complex patterns and production deployment

Examples

Document Description
VectorAdd Example Complete reference implementation
PageRank Example Distributed actor implementation of PageRank

Quick Reference

Key Features:

  • Zero kernel launch overhead after initial launch
  • Actor-style message passing on GPU
  • Sub-microsecond telemetry polling
  • Cross-kernel coordination
  • Hybrid Logical Clock support
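
The first two features can be pictured with a CPU-thread analogy: a worker that is started once and then loops forever, draining a ring buffer of messages. The sketch below is not the Ring Kernels API and does not run on a GPU; Message, SpscRing, and sum_via_ring are hypothetical names, and a host thread stands in for the persistent kernel.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>

// Hypothetical message type; the real Ring Kernel message layout is not shown here.
struct Message {
    std::int32_t payload;
    bool stop;  // sentinel telling the worker to leave its loop
};

// Single-producer, single-consumer ring buffer with a power-of-two capacity.
template <std::size_t Capacity>
class SpscRing {
    static_assert((Capacity & (Capacity - 1)) == 0, "Capacity must be a power of two");

public:
    bool try_push(const Message& m) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;  // full
        slots_[head & (Capacity - 1)] = m;
        head_.store(head + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    bool try_pop(Message& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return false;  // empty
        out = slots_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release);  // free the slot
        return true;
    }

private:
    std::array<Message, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};  // next slot to write (producer side)
    std::atomic<std::size_t> tail_{0};  // next slot to read (consumer side)
};

// Starts one long-lived worker, sends it 1..n, and returns the sum it accumulated.
inline std::int64_t sum_via_ring(int n) {
    SpscRing<64> ring;
    std::int64_t total = 0;

    std::thread worker([&] {
        Message m{};
        for (;;) {  // persistent loop: started once, never relaunched per message
            if (!ring.try_pop(m)) { std::this_thread::yield(); continue; }
            if (m.stop) break;
            total += m.payload;
        }
    });

    for (int i = 1; i <= n; ++i) {
        while (!ring.try_push(Message{i, false})) std::this_thread::yield();
    }
    while (!ring.try_push(Message{0, true})) std::this_thread::yield();

    worker.join();  // join synchronizes, so reading total afterwards is safe
    return total;
}
```

Calling sum_via_ring(100) sends one hundred messages through a single worker launch; the per-message cost is a queue push rather than a new launch, which is the idea behind the "zero kernel launch overhead" claim.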

Supported Backends:

  • CUDA (compute capability 5.0+)
  • Metal (Apple Silicon)
  • OpenCL 1.2+
  • CPU (fallback)

Ring Kernels v0.5.0 - Production Ready