Advanced Learning Path

Master advanced GPU programming patterns including Ring Kernels, synchronization primitives, multi-GPU coordination, and performance profiling.

Prerequisites

  • Completed Intermediate Path
  • Strong understanding of GPU memory and kernel optimization
  • Experience building multi-kernel pipelines

Learning Objectives

By completing this path, you will:

  1. Build persistent GPU computations with Ring Kernels
  2. Implement thread-safe synchronization patterns
  3. Scale applications across multiple GPUs
  4. Profile and optimize kernels using GPU-native timing APIs

Modules

Module 1: Ring Kernel Fundamentals

Duration: 90-120 minutes

Learn persistent GPU computation with actor-style message passing.
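
As a preview of the idea, the sketch below is a host-side C++ analogy and not DotCompute code. It assumes the simplest case of one producer and one consumer: a long-lived consumer loop stands in for the persistent kernel, and a fixed-size ring buffer carries the messages. The names `MessageRing`, `try_push`, `try_pop`, and `run_ring_demo` are hypothetical and chosen only for illustration.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <thread>

// Hypothetical illustration only: a fixed-capacity single-producer,
// single-consumer ring buffer. This is not DotCompute's API.
template <typename T, std::size_t Capacity>
class MessageRing {
    static_assert(Capacity >= 2, "one slot stays empty to tell full from empty");

public:
    // Producer side: returns false when the ring is full.
    bool try_push(const T& msg) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % Capacity;
        if (next == tail_.load(std::memory_order_acquire)) return false;
        slots_[head] = msg;
        head_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer side: returns false when the ring is empty.
    bool try_pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return false;
        out = slots_[tail];
        tail_.store((tail + 1) % Capacity, std::memory_order_release);  // free the slot
        return true;
    }

private:
    std::array<T, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};  // next slot to write
    std::atomic<std::size_t> tail_{0};  // next slot to read
};

// The consumer thread plays the role of the persistent kernel: it stays
// alive and sums incoming values until it receives a negative sentinel.
inline long run_ring_demo(int count) {
    MessageRing<int, 8> ring;
    long sum = 0;
    std::thread consumer([&] {
        for (;;) {
            int msg = 0;
            if (!ring.try_pop(msg)) { std::this_thread::yield(); continue; }
            if (msg < 0) break;  // sentinel: shut down
            sum += msg;
        }
    });
    for (int i = 1; i <= count; ++i)
        while (!ring.try_push(i)) std::this_thread::yield();
    while (!ring.try_push(-1)) std::this_thread::yield();
    consumer.join();  // join makes the final sum visible to this thread
    return sum;
}
```

Calling `run_ring_demo(100)` sends the values 1 through 100 and returns their sum. The consumer is started once and fed messages, rather than being relaunched for each batch of work, which is the pattern the module applies on the GPU.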

Start Module 1 →

Module 2: Synchronization Patterns

Duration: 60-90 minutes

Master barriers, memory ordering, and multi-kernel coordination.
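
To make the barrier idea concrete, here is a minimal host-side C++ sketch; it is an analogy and not the DotCompute API. Each worker writes its own slot, waits at a barrier, and only then reads a neighbor's slot, so every phase-one write is visible before any phase-two read. `SimpleBarrier` and `run_barrier_demo` are hypothetical names introduced for this example.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical illustration only: a reusable barrier built from a mutex
// and a condition variable. A generation counter lets it be reused.
class SimpleBarrier {
public:
    explicit SimpleBarrier(std::size_t parties) : parties_(parties) {}

    // Blocks until `parties` threads have called this function.
    void arrive_and_wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const std::size_t gen = generation_;
        if (++waiting_ == parties_) {
            waiting_ = 0;
            ++generation_;  // release everyone waiting on this generation
            cv_.notify_all();
        } else {
            cv_.wait(lock, [&] { return generation_ != gen; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    const std::size_t parties_;
    std::size_t waiting_ = 0;
    std::size_t generation_ = 0;
};

// Phase 1: each worker writes its own slot.
// Phase 2: each worker reads its neighbor's slot.
// The barrier between the phases guarantees the neighbor's write is done.
inline int run_barrier_demo(int workers) {
    std::vector<int> values(workers, 0);
    std::vector<int> results(workers, 0);
    SimpleBarrier barrier(static_cast<std::size_t>(workers));
    std::vector<std::thread> threads;
    for (int i = 0; i < workers; ++i) {
        threads.emplace_back([&, i] {
            values[i] = i + 1;                       // phase 1
            barrier.arrive_and_wait();
            results[i] = values[(i + 1) % workers];  // phase 2
        });
    }
    for (auto& t : threads) t.join();
    return std::accumulate(results.begin(), results.end(), 0);
}
```

Without the barrier, a worker could read its neighbor's slot before that neighbor has written it. The module covers the same hazard between GPU threads and between kernels.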

Start Module 2 →

Module 3: Multi-GPU Programming

Duration: 90-120 minutes

Scale applications across multiple GPUs with P2P transfers.

Start Module 3 →

Module 4: Performance Profiling

Duration: 60-90 minutes

Use GPU timing APIs for precise performance measurement and optimization.

Start Module 4 →

Completion Checklist

  • [ ] Create a Ring Kernel with message processing
  • [ ] Implement barrier-based synchronization
  • [ ] Configure P2P memory transfers
  • [ ] Profile kernel execution with GPU timestamps

Next Steps

After completing this path, continue to the Contributor Path to learn how to extend DotCompute, or explore the comprehensive Ring Kernels Guide.


Estimated total duration: 6-8 hours