Enum CudaKernelType
- Namespace: DotCompute.Backends.CUDA.Types
- Assembly: DotCompute.Backends.CUDA.dll
Categorizes CUDA kernels by their computational pattern. Used for optimization decisions and kernel fusion strategies.
public enum CudaKernelType
Fields
Custom = 4
Custom kernel implementation. User-defined kernel with an arbitrary computation pattern. Optimization strategies must be determined case by case.

ElementWise = 0
Element-wise operation kernel. Each thread operates on independent data elements. Examples: vector addition, scalar multiplication, activation functions. High potential for fusion and memory bandwidth optimization.

Fused = 5
Fused kernel combining multiple operations. Result of kernel fusion optimization. Reduces memory traffic by combining multiple operations in a single pass.

MatrixMultiply = 1
Matrix multiplication kernel. Performs matrix-matrix or matrix-vector operations. Benefits from Tensor Core acceleration on supported hardware. Critical for deep learning and linear algebra workloads.

Reduction = 2
Reduction operation kernel. Combines multiple values into a single result. Examples: sum, max, min, dot product. Requires careful synchronization and often uses shared memory.

Transpose = 3
Matrix transpose kernel. Rearranges matrix data layout in memory. Memory-bandwidth bound; benefits from coalesced access patterns. Often combined with other operations for efficiency.
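Example
The kernel type is typically consumed by optimization code that branches on the computational pattern. The sketch below is illustrative only: the SelectLaunchStrategy method and the strategy names it returns are assumptions for demonstration, not part of the DotCompute API.

using System;
using DotCompute.Backends.CUDA.Types;

public static class KernelTuning
{
    // Illustrative sketch: maps each kernel pattern to a hypothetical tuning strategy name.
    public static string SelectLaunchStrategy(CudaKernelType kernelType) => kernelType switch
    {
        // Independent per-element work: prime candidate for fusion and bandwidth tuning.
        CudaKernelType.ElementWise => "fuse-and-coalesce",
        // GEMM-style work: prefer Tensor Core paths on supported hardware.
        CudaKernelType.MatrixMultiply => "tensor-core-tiling",
        // Many-to-one combine: shared-memory tree reduction with careful synchronization.
        CudaKernelType.Reduction => "shared-memory-reduction",
        // Pure layout change: tile for coalesced reads and writes.
        CudaKernelType.Transpose => "coalesced-tiled-transpose",
        // Arbitrary user kernel: no pattern-specific assumptions; profile case by case.
        CudaKernelType.Custom => "profile-case-by-case",
        // Already the result of fusion; launch as-is.
        CudaKernelType.Fused => "launch-as-is",
        _ => throw new ArgumentOutOfRangeException(nameof(kernelType))
    };
}

A real optimizer would return richer configuration (block size, shared memory budget, fusion candidates) rather than a string, but the branching on CudaKernelType follows the same shape.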