Enum CudaKernelType

Namespace: DotCompute.Backends.CUDA.Types
Assembly: DotCompute.Backends.CUDA.dll

Categorizes CUDA kernels by their computational pattern. Used for optimization decisions and kernel fusion strategies.

public enum CudaKernelType

Fields

Custom = 4

Custom kernel implementation. User-defined kernel with arbitrary computation pattern. Optimization strategies must be determined case-by-case.

ElementWise = 0

Element-wise operation kernel. Each thread operates on independent data elements. Examples: vector addition, scalar multiplication, activation functions. High potential for fusion and memory bandwidth optimization.
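To make the pattern concrete, here is a minimal CPU sketch of two element-wise operations (vector addition and a ReLU activation); the function names are illustrative, not part of DotCompute, and on the GPU each loop iteration would map to one thread:

```cpp
#include <vector>
#include <algorithm>
#include <cstddef>

// CPU sketch of the element-wise pattern: each output element depends
// only on the input element(s) at the same index, so all iterations are
// independent -- on the GPU, one thread per element.
std::vector<float> vectorAdd(const std::vector<float>& a,
                             const std::vector<float>& b) {
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = a[i] + b[i];              // no cross-element dependency
    return out;
}

// ReLU activation: another element-wise operation, and a natural fusion
// candidate with the addition above (one pass instead of two).
std::vector<float> relu(const std::vector<float>& x) {
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        out[i] = std::max(0.0f, x[i]);
    return out;
}
```

Because the iterations share no state, such kernels are limited mainly by memory bandwidth, which is why fusing them pays off.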

Fused = 5

Fused kernel combining multiple operations. Produced by the kernel fusion optimization. Reduces memory traffic by executing the combined operations in a single pass over the data.
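A minimal sketch of what fusion buys (the function name is illustrative, not DotCompute API): an addition and a ReLU computed in one pass, so the intermediate sum never round-trips through memory:

```cpp
#include <vector>
#include <algorithm>
#include <cstddef>

// Sketch of a fused add + ReLU: one pass over the data instead of two.
// The intermediate sum stays in a register rather than being written to
// and re-read from global memory.
std::vector<float> fusedAddRelu(const std::vector<float>& a,
                                const std::vector<float>& b) {
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = std::max(0.0f, a[i] + b[i]);  // both ops in one iteration
    return out;
}
```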

MatrixMultiply = 1

Matrix multiplication kernel. Performs matrix-matrix or matrix-vector operations. Benefits from Tensor Core acceleration on supported hardware. Critical for deep learning and linear algebra workloads.
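For reference, a naive CPU sketch of the operation (row-major layout; the function name is illustrative). The GPU version is typically tiled through shared memory and, on supported hardware, mapped onto Tensor Core instructions, but the arithmetic is the same:

```cpp
#include <vector>
#include <cstddef>

// CPU sketch of matrix multiply: C[m x n] = A[m x k] * B[k x n],
// all matrices stored row-major in flat vectors.
std::vector<float> matMul(const std::vector<float>& A,
                          const std::vector<float>& B,
                          std::size_t m, std::size_t k, std::size_t n) {
    std::vector<float> C(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t p = 0; p < k; ++p)
                C[i * n + j] += A[i * k + p] * B[p * n + j];
    return C;
}
```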

Reduction = 2

Reduction operation kernel. Combines multiple values into a single result. Examples: sum, max, min, dot product. Requires careful synchronization and often uses shared memory.
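The synchronization requirement comes from the tree-shaped combining step. Below is a CPU sketch of the tree reduction a CUDA block performs in shared memory (illustrative name; assumes a power-of-two input length for simplicity):

```cpp
#include <vector>
#include <cstddef>

// CPU sketch of a tree reduction: at each step, pairs of values are
// combined and the active count halves, so n values reduce in log2(n)
// steps. On the GPU, each step is done by parallel threads in shared
// memory, with __syncthreads() between steps.
float reduceSum(std::vector<float> data) {
    for (std::size_t stride = data.size() / 2; stride > 0; stride /= 2) {
        for (std::size_t i = 0; i < stride; ++i)
            data[i] += data[i + stride];   // barrier between steps on GPU
    }
    return data.empty() ? 0.0f : data[0];
}
```

Replacing `+=` with `max`/`min` or a multiply-accumulate yields the other reductions named above.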

Transpose = 3

Matrix transpose kernel. Rearranges matrix data layout in memory. Memory bandwidth bound, benefits from coalesced access patterns. Often combined with other operations for efficiency.
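A CPU sketch of the data movement (row-major; illustrative function name). Note that one side of the copy is always strided, which is why the GPU version stages tiles through shared memory so both the reads and the writes hit coalesced addresses:

```cpp
#include <vector>
#include <cstddef>

// CPU sketch of matrix transpose: a (rows x cols) row-major matrix
// becomes (cols x rows). No arithmetic, pure data movement -- the cost
// is entirely memory traffic.
std::vector<float> transpose(const std::vector<float>& in,
                             std::size_t rows, std::size_t cols) {
    std::vector<float> out(in.size());
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            out[c * rows + r] = in[r * cols + c];  // strided write, linear read
    return out;
}
```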