CUDA Programming Guide

Comprehensive guide for CUDA kernel development and optimization with DotCompute.

🚧 Documentation In Progress - CUDA programming guide is being developed.

Overview

DotCompute's CUDA backend supports:

  • NVIDIA GPUs with Compute Capability 5.0-8.9
  • CUDA 12.0+ toolkit
  • Both CUBIN and PTX compilation paths
  • NVRTC just-in-time compilation

CUDA Kernel Basics

Kernel Syntax

TODO: Document CUDA C kernel syntax
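
As an illustration of the syntax this section covers, here is a minimal sketch of a CUDA C++ kernel and its launch. It uses the plain CUDA runtime API rather than DotCompute's own API, the names vectorAdd and addOnDevice are hypothetical, both inputs are assumed to be non-empty and of equal length, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// __global__ marks a function that runs on the GPU and is launched from the host.
__global__ void vectorAdd(const float* a, const float* b, float* out, int n)
{
    // One thread per element; the guard covers n not divisible by the block size.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = a[i] + b[i];
    }
}

// Hypothetical host helper: copy inputs to the device, launch, copy the result back.
std::vector<float> addOnDevice(const std::vector<float>& a, const std::vector<float>& b)
{
    int n = static_cast<int>(a.size());
    size_t bytes = n * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dOut = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // round up so every element is covered
    vectorAdd<<<blocks, threads>>>(dA, dB, dOut, n);

    std::vector<float> out(n);
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dOut);
    return out;
}
```

The `<<<blocks, threads>>>` pair is the launch configuration: the number of blocks in the grid and the number of threads in each block.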

Thread Hierarchy

TODO: Explain thread blocks and grids
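
To make the grid/block/thread relationship concrete, the sketch below launches a 2D grid of 16x16 blocks over a width x height array and has every thread record the linear index of the block it belongs to. The names recordBlockIds and blockIdsFor are hypothetical, the code uses the plain CUDA runtime API, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

__global__ void recordBlockIds(int* out, int width, int height)
{
    // Global coordinates = block offset + position inside the block.
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        // Row-major index of this thread's block within the grid.
        out[y * width + x] = blockIdx.y * gridDim.x + blockIdx.x;
    }
}

// Hypothetical host helper returning the block id seen by each element.
std::vector<int> blockIdsFor(int width, int height)
{
    size_t count = static_cast<size_t>(width) * height;
    int* dOut = nullptr;
    cudaMalloc(&dOut, count * sizeof(int));

    dim3 block(16, 16);  // 256 threads per block
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    recordBlockIds<<<grid, block>>>(dOut, width, height);

    std::vector<int> out(count);
    cudaMemcpy(out.data(), dOut, count * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return out;
}
```

For a 40x20 array this produces a 3x2 grid, so block ids range from 0 to 5; threads in the partially filled edge blocks are masked out by the bounds check.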

Memory Model

TODO: Document CUDA memory model:

  • Global memory
  • Shared memory
  • Registers and local memory
  • Constant memory
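
The memory spaces listed above can all appear in one small kernel. The following is a sketch only: kScale, scaleAndReverseInBlock, and scaleAndReverse are hypothetical names, the input length is assumed to be a positive multiple of 128, and error checking is omitted. Each block scales its slice of the input by a constant and writes it back reversed within the block.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kBlockSize = 128;

// Constant memory: read-only inside kernels, cached, written from the host.
__constant__ float kScale;

__global__ void scaleAndReverseInBlock(const float* in, float* out)
{
    // Shared memory: one tile per block, visible to every thread in that block.
    __shared__ float tile[kBlockSize];

    // Scalars like these normally live in registers, private to each thread.
    // (The compiler may spill to per-thread local memory if registers run out.)
    int local = threadIdx.x;
    int global = blockIdx.x * blockDim.x + local;

    // Global memory: the in/out buffers allocated with cudaMalloc.
    tile[local] = in[global] * kScale;
    __syncthreads();  // the whole tile must be written before anyone reads it

    out[global] = tile[blockDim.x - 1 - local];
}

// Hypothetical host helper; in.size() must be a positive multiple of kBlockSize.
std::vector<float> scaleAndReverse(const std::vector<float>& in, float scale)
{
    int n = static_cast<int>(in.size());
    size_t bytes = n * sizeof(float);
    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, in.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(kScale, &scale, sizeof(float));

    scaleAndReverseInBlock<<<n / kBlockSize, kBlockSize>>>(dIn, dOut);

    std::vector<float> out(n);
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return out;
}
```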

Compute Capability

Detection and Targeting

TODO: Cover compute capability detection:

  • Architecture-specific optimization
  • Feature availability
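
With the plain CUDA runtime API, detection usually has two sides: asking the driver what the installed GPU supports, and asking the compiler which architecture the device code was built for. The sketch below shows both. computeCapabilityOf, reportCompiledArch, and compiledArchOnDevice are hypothetical names, and how DotCompute exposes this information is not shown here.

```cuda
#include <cuda_runtime.h>

// Runtime side: compute capability of an installed device, encoded as
// major * 10 + minor (for example 86 for CC 8.6). Returns -1 on failure.
int computeCapabilityOf(int device)
{
    cudaDeviceProp prop{};
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return -1;
    }
    return prop.major * 10 + prop.minor;
}

// Compile-time side: __CUDA_ARCH__ is defined only while compiling device code
// and holds the targeted architecture, for example 800 for compute_80.
__global__ void reportCompiledArch(int* arch)
{
#ifdef __CUDA_ARCH__
    *arch = __CUDA_ARCH__;
#endif
}

// Hypothetical host helper that reads the value back; 0 means the kernel did not run.
int compiledArchOnDevice()
{
    int* dArch = nullptr;
    int arch = 0;
    cudaMalloc(&dArch, sizeof(int));
    cudaMemset(dArch, 0, sizeof(int));
    reportCompiledArch<<<1, 1>>>(dArch);
    cudaMemcpy(&arch, dArch, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dArch);
    return arch;
}
```

Architecture-specific code paths are typically selected with `#if __CUDA_ARCH__ >= ...` inside device functions, while the runtime value decides which compiled variant to load or whether a feature can be used at all.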

Capability Levels

TODO: Document different compute capabilities:

  • CC 5.0-5.3 (Maxwell) features
  • CC 6.0-6.2 (Pascal) features
  • CC 7.0-7.5 (Volta/Turing) features
  • CC 8.0-8.9 (Ampere/Ada) features
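
The grouping above can be written down as a small host-side lookup. This is only a sketch with a hypothetical function name; it covers the compute capabilities listed in this guide and makes the splits inside the 7.x range (Volta versus Turing) and the 8.x range (Ampere versus Ada) explicit.

```cuda
#include <string>

// Hypothetical helper: map a compute capability to its architecture family.
std::string architectureName(int major, int minor)
{
    switch (major) {
        case 5: return "Maxwell";                            // 5.0, 5.2, 5.3
        case 6: return "Pascal";                             // 6.0, 6.1, 6.2
        case 7: return (minor >= 5) ? "Turing" : "Volta";    // 7.5 vs 7.0, 7.2
        case 8: return (minor == 9) ? "Ada" : "Ampere";      // 8.9 vs 8.0, 8.6, 8.7
        default: return "Unknown";                           // outside the listed range
    }
}
```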

Optimization Techniques

Memory Coalescing

TODO: Explain global memory coalescing patterns
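
A common way to see coalescing is to compare two reductions over the same row-major matrix. In the sketch below (hypothetical names, plain CUDA runtime API, no error checking), columnSums assigns one thread per column, so at every loop step neighbouring threads read neighbouring floats and the hardware can merge those reads into a few wide transactions. rowSums assigns one thread per row, so neighbouring threads read addresses that are a full row apart, which generally costs far more memory transactions for the same amount of data.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Coalesced: thread t reads m[r * cols + t]; adjacent threads, adjacent addresses.
__global__ void columnSums(const float* m, float* sums, int rows, int cols)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (col >= cols) return;
    float acc = 0.0f;
    for (int r = 0; r < rows; ++r) acc += m[r * cols + col];
    sums[col] = acc;
}

// Strided: thread t reads m[t * cols + c]; adjacent threads are cols floats apart.
__global__ void rowSums(const float* m, float* sums, int rows, int cols)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= rows) return;
    float acc = 0.0f;
    for (int c = 0; c < cols; ++c) acc += m[row * cols + c];
    sums[row] = acc;
}

// Hypothetical host helper: runs one of the two kernels on a row-major matrix.
std::vector<float> sumsOnDevice(const std::vector<float>& m, int rows, int cols,
                                bool perColumn)
{
    int count = perColumn ? cols : rows;
    float *dM = nullptr, *dSums = nullptr;
    cudaMalloc(&dM, m.size() * sizeof(float));
    cudaMalloc(&dSums, count * sizeof(float));
    cudaMemcpy(dM, m.data(), m.size() * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (count + threads - 1) / threads;
    if (perColumn) columnSums<<<blocks, threads>>>(dM, dSums, rows, cols);
    else           rowSums<<<blocks, threads>>>(dM, dSums, rows, cols);

    std::vector<float> sums(count);
    cudaMemcpy(sums.data(), dSums, count * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dM);
    cudaFree(dSums);
    return sums;
}
```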

Shared Memory Usage

TODO: Document shared memory optimization:

  • Bank conflicts
  • Memory access patterns
  • Synchronization
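
The three points above come together in the classic tiled matrix transpose, sketched here with hypothetical names and without error checking. Each block stages a 32x32 tile in shared memory so that both the global read and the global write are coalesced. The extra padding column shifts each tile row by one bank, so the column-wise reads after the barrier do not all hit the same shared memory bank.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kTile = 32;

__global__ void transposeTiled(const float* in, float* out, int rows, int cols)
{
    // +1 column of padding avoids bank conflicts on the column-wise reads below.
    __shared__ float tile[kTile][kTile + 1];

    int x = blockIdx.x * kTile + threadIdx.x;  // input column
    int y = blockIdx.y * kTile + threadIdx.y;  // input row
    if (x < cols && y < rows) {
        tile[threadIdx.y][threadIdx.x] = in[y * cols + x];
    }
    __syncthreads();  // reached by every thread: kept outside the conditionals

    int tx = blockIdx.y * kTile + threadIdx.x;  // output column (an input row)
    int ty = blockIdx.x * kTile + threadIdx.y;  // output row (an input column)
    if (tx < rows && ty < cols) {
        out[ty * rows + tx] = tile[threadIdx.x][threadIdx.y];
    }
}

// Hypothetical host helper: transposes a row-major rows x cols matrix.
std::vector<float> transposeOnDevice(const std::vector<float>& in, int rows, int cols)
{
    size_t bytes = in.size() * sizeof(float);
    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(kTile, kTile);
    dim3 grid((cols + kTile - 1) / kTile, (rows + kTile - 1) / kTile);
    transposeTiled<<<grid, block>>>(dIn, dOut, rows, cols);

    std::vector<float> out(in.size());
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return out;
}
```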

Register Pressure

TODO: Cover register usage optimization
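
One standard lever for register usage is the `__launch_bounds__` qualifier, which tells the compiler the largest block size a kernel will be launched with and, optionally, how many blocks should fit on one multiprocessor. The sketch below (hypothetical kernel and helper names) applies it and then reads back the register count the compiler actually chose through cudaFuncGetAttributes.

```cuda
#include <cuda_runtime.h>

// At most 256 threads per block, and at least 2 resident blocks per SM.
// With 64K registers per SM this limits each thread to 65536 / (256 * 2) = 128.
__global__ void __launch_bounds__(256, 2)
polynomial(const float* x, float* y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = x[i];
        y[i] = ((0.5f * v + 1.0f) * v + 2.0f) * v + 3.0f;  // Horner evaluation
    }
}

// Hypothetical helper: report what the compiler allocated for the kernel.
// numRegs is registers per thread; localSizeBytes > 0 indicates spilling
// (or other per-thread local memory use).
cudaFuncAttributes describePolynomialKernel()
{
    cudaFuncAttributes attr{};
    cudaFuncGetAttributes(&attr, polynomial);
    return attr;
}
```

Tightening the bounds trades registers per thread for more resident threads; if the limit is too tight, the compiler spills to local memory, which shows up as a non-zero localSizeBytes.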

Occupancy

TODO: Explain occupancy calculations and optimization
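
Occupancy is the ratio of threads (or warps) resident on a multiprocessor to the maximum it can hold. The CUDA runtime can compute it for a given kernel, as in the following sketch. The kernel scaleInPlace and the helper estimateOccupancy are hypothetical names, no dynamic shared memory is assumed, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

__global__ void scaleInPlace(float* data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: fraction of an SM's thread slots that resident blocks of
// scaleInPlace would fill when launched with the runtime's suggested block size.
float estimateOccupancy(int device)
{
    cudaSetDevice(device);

    // Block size that maximizes occupancy for this kernel on this device.
    int minGridSize = 0;
    int blockSize = 0;
    cudaOccupancyMaxPotentialBlockSize(&minGridSize, &blockSize, scaleInPlace, 0, 0);

    // How many blocks of that size fit on one SM at the same time.
    int blocksPerSm = 0;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSm, scaleInPlace,
                                                  blockSize, 0);

    cudaDeviceProp prop{};
    cudaGetDeviceProperties(&prop, device);
    return static_cast<float>(blocksPerSm * blockSize) /
           static_cast<float>(prop.maxThreadsPerMultiProcessor);
}
```

Registers per thread, shared memory per block, and block size all limit how many blocks fit on an SM, so the same calculation gives different answers for different kernels on the same GPU.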

Synchronization

Warp-Level Operations

TODO: Document warp-level primitives:

  • Shuffle operations
  • Cooperative groups
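
As an example of a shuffle operation, the sketch below sums 32 values inside a single warp with `__shfl_down_sync`, exchanging registers directly between lanes without any shared memory. The names warpReduceSum, sumOneWarp, and sumWarpOnDevice are hypothetical, the input is assumed to hold exactly 32 floats, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each step adds the value held by the lane `offset` positions higher.
// After log2(32) = 5 steps, lane 0 holds the sum of the whole warp.
__device__ float warpReduceSum(float v)
{
    for (int offset = warpSize / 2; offset > 0; offset /= 2) {
        v += __shfl_down_sync(0xffffffffu, v, offset);  // full mask: all 32 lanes
    }
    return v;
}

__global__ void sumOneWarp(const float* in, float* out)
{
    float v = warpReduceSum(in[threadIdx.x]);
    if (threadIdx.x == 0) *out = v;  // only lane 0 has the complete sum
}

// Hypothetical host helper; `in` must contain exactly 32 values.
float sumWarpOnDevice(const std::vector<float>& in)
{
    float *dIn = nullptr, *dOut = nullptr;
    float result = 0.0f;
    cudaMalloc(&dIn, 32 * sizeof(float));
    cudaMalloc(&dOut, sizeof(float));
    cudaMemcpy(dIn, in.data(), 32 * sizeof(float), cudaMemcpyHostToDevice);

    sumOneWarp<<<1, 32>>>(dIn, dOut);

    cudaMemcpy(&result, dOut, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```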

Block-Level Synchronization

TODO: Explain block-level barriers
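
The block-level barrier in CUDA is `__syncthreads()`: no thread in the block continues past it until every thread in the block has reached it. The sketch below uses it in a shared memory tree reduction. blockSum and sumBlockOnDevice are hypothetical names, the input is assumed to hold between 1 and 256 values, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kBlock = 256;

__global__ void blockSum(const float* in, float* out, int n)
{
    __shared__ float scratch[kBlock];
    int t = threadIdx.x;
    scratch[t] = (t < n) ? in[t] : 0.0f;  // pad with zeros past the input
    __syncthreads();                      // all loads done before any add

    // Halve the active range each round; the barrier separates the rounds.
    for (int stride = kBlock / 2; stride > 0; stride /= 2) {
        if (t < stride) scratch[t] += scratch[t + stride];
        __syncthreads();  // outside the if: every thread must reach the barrier
    }
    if (t == 0) *out = scratch[0];
}

// Hypothetical host helper; in.size() must be between 1 and kBlock.
float sumBlockOnDevice(const std::vector<float>& in)
{
    int n = static_cast<int>(in.size());
    float *dIn = nullptr, *dOut = nullptr;
    float result = 0.0f;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, sizeof(float));
    cudaMemcpy(dIn, in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    blockSum<<<1, kBlock>>>(dIn, dOut, n);

    cudaMemcpy(&result, dOut, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

Placing `__syncthreads()` inside a branch that only some threads of the block take leads to undefined behavior, which is why the barrier sits outside the `if` in the loop.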

Global Synchronization

TODO: Cover global synchronization patterns
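
The simplest grid-wide barrier is a kernel boundary: kernels launched into the same stream run in order, so a second kernel sees everything the first one wrote. The sketch below relies on that. The second kernel reads elements written by other blocks, which would be a race inside a single ordinary kernel. All names are hypothetical and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

__global__ void doubleValues(const int* in, int* tmp, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tmp[i] = 2 * in[i];
}

// Reads an element that, in general, a different block wrote in phase one.
__global__ void readMirrored(const int* tmp, int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tmp[n - 1 - i];
}

// Hypothetical host helper running both phases on the default stream.
std::vector<int> doubleThenMirror(const std::vector<int>& in)
{
    int n = static_cast<int>(in.size());
    size_t bytes = n * sizeof(int);
    int *dIn = nullptr, *dTmp = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dTmp, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, in.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    doubleValues<<<blocks, threads>>>(dIn, dTmp, n);
    // Same stream: phase two starts only after every block of phase one is done.
    readMirrored<<<blocks, threads>>>(dTmp, dOut, n);

    std::vector<int> out(n);
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dTmp);
    cudaFree(dOut);
    return out;
}
```

Synchronizing the whole grid inside one kernel is also possible with a cooperative launch and a grid-level group from the cooperative groups API, on devices that support it; that variant is not shown here.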

Advanced Features

Cooperative Groups

TODO: Document cooperative groups usage
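
Cooperative groups give names to the sets of threads that synchronize and exchange data together. As a sketch (hypothetical names, no error checking), the kernel below splits a thread block into tiles of 8 threads and sums the 8 values held by each tile with the tile's own shuffle operation. The input length is assumed to be a multiple of 8 and at most 1024, so that a single block covers it.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <vector>

namespace cg = cooperative_groups;

constexpr int kTileSize = 8;

__global__ void tileSums(const float* in, float* out)
{
    cg::thread_block block = cg::this_thread_block();
    cg::thread_block_tile<kTileSize> tile = cg::tiled_partition<kTileSize>(block);

    float v = in[block.thread_rank()];
    // Shuffles stay inside the tile: rank 0 ends up with the tile's sum.
    for (int offset = kTileSize / 2; offset > 0; offset /= 2) {
        v += tile.shfl_down(v, offset);
    }
    if (tile.thread_rank() == 0) {
        out[block.thread_rank() / kTileSize] = v;
    }
}

// Hypothetical host helper; in.size() must be a multiple of 8 and at most 1024.
std::vector<float> tileSumsOnDevice(const std::vector<float>& in)
{
    int n = static_cast<int>(in.size());
    int tiles = n / kTileSize;
    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, tiles * sizeof(float));
    cudaMemcpy(dIn, in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    tileSums<<<1, n>>>(dIn, dOut);  // one block, one thread per input value

    std::vector<float> out(tiles);
    cudaMemcpy(out.data(), dOut, tiles * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return out;
}
```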

Dynamic Parallelism

TODO: Explain dynamic kernel launches
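
With dynamic parallelism, a kernel launches other kernels from the device using the same `<<<...>>>` syntax as the host. The sketch below uses hypothetical names and omits error checking. With nvcc it has to be built with relocatable device code (`-rdc=true`) and linked against the device runtime; whether and how DotCompute's NVRTC path supports this is not covered here and should be treated as an open question.

```cuda
#include <cuda_runtime.h>
#include <vector>

__global__ void childFill(int* data, int n, int value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

// Parent kernel: a single thread decides the child's launch configuration.
__global__ void parentLaunch(int* data, int n, int value)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        childFill<<<blocks, threads>>>(data, n, value);  // device-side launch
    }
}

// Hypothetical host helper: returns n copies of `value` written by the child grid.
std::vector<int> fillViaChildKernel(int n, int value)
{
    int* dData = nullptr;
    cudaMalloc(&dData, n * sizeof(int));
    cudaMemset(dData, 0, n * sizeof(int));

    parentLaunch<<<1, 1>>>(dData, n, value);
    // The parent grid is not complete until its child grids have finished,
    // so synchronizing on the parent also waits for the child.
    cudaDeviceSynchronize();

    std::vector<int> out(n);
    cudaMemcpy(out.data(), dData, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dData);
    return out;
}
```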

Atomic Operations

TODO: Cover atomic memory operations
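
Atomic operations let many threads update the same memory location without losing updates. A histogram is the usual illustration: several threads may hit the same bin at once, and `atomicAdd` makes each increment indivisible. The names below are hypothetical, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

__global__ void histogram(const int* values, int n, unsigned int* bins, int binCount)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int bin = values[i];
        if (bin >= 0 && bin < binCount) {
            atomicAdd(&bins[bin], 1u);  // a plain bins[bin]++ would race
        }
    }
}

// Hypothetical host helper: counts how often each value in [0, binCount) occurs.
std::vector<unsigned int> histogramOnDevice(const std::vector<int>& values, int binCount)
{
    int n = static_cast<int>(values.size());
    int* dValues = nullptr;
    unsigned int* dBins = nullptr;
    cudaMalloc(&dValues, n * sizeof(int));
    cudaMalloc(&dBins, binCount * sizeof(unsigned int));
    cudaMemcpy(dValues, values.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dBins, 0, binCount * sizeof(unsigned int));  // counts start at zero

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    histogram<<<blocks, threads>>>(dValues, n, dBins, binCount);

    std::vector<unsigned int> bins(binCount);
    cudaMemcpy(bins.data(), dBins, binCount * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    cudaFree(dValues);
    cudaFree(dBins);
    return bins;
}
```

Heavy contention on a few global bins serializes the updates; a common refinement is to accumulate per-block histograms in shared memory first and merge them afterwards.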

Debugging

CUDA Debugging Tools

TODO: List debugging tools:

  • cuda-gdb
  • Nsight Visual Studio Code Edition
  • Compute Sanitizer
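
The tools above are most useful when the program itself reports CUDA errors as early as possible. A common pattern in plain CUDA C++ code is an error-checking macro, sketched below. CUDA_CHECK, writeIndex, and runChecked are hypothetical names, and the macro as written can only be used inside functions that return bool.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Print the failing call's location and error string, then return false.
// Resources allocated earlier are not released on failure in this sketch.
#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                         cudaGetErrorString(err_));                        \
            return false;                                                  \
        }                                                                  \
    } while (0)

__global__ void writeIndex(int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

bool runChecked(int n)
{
    int* dOut = nullptr;
    CUDA_CHECK(cudaMalloc(&dOut, n * sizeof(int)));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    writeIndex<<<blocks, threads>>>(dOut, n);
    CUDA_CHECK(cudaGetLastError());       // invalid launch configuration, etc.
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CUDA_CHECK(cudaFree(dOut));
    return true;
}
```

For standalone CUDA programs, building with nvcc's `-g -G` flags adds host and device debug information for cuda-gdb, and running the executable under `compute-sanitizer` reports out-of-bounds and misaligned accesses with the offending kernel and thread.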

Performance Profiling

Nsight Tools

TODO: Document profiling tools:

  • Nsight Systems
  • Nsight Compute

Examples

TODO: Provide CUDA kernel examples

See Also