Namespace DotCompute.Backends.CUDA.Memory

Classes

CacheLine

Cache-line padded wrappers for atomic counters in lock-free queues.

CudaAsyncMemoryManagerAdapter

Adapter that wraps CudaMemoryManager for async operations. Bridges the CUDA memory manager with the unified memory interface.

CudaContextMemoryManager

CUDA context-specific memory manager that wraps CudaMemoryManager.

CudaMemoryBuffer

Represents a CUDA memory buffer allocated on the GPU device.

CudaMemoryBuffer<T>

Represents a generic CUDA memory buffer allocated on the GPU device.

CudaMemoryManager

High-level analytics and cleanup helpers for CudaMemoryManager. These methods were previously exposed by a parallel facade (Integration/Components/CudaMemoryManager) that has since been removed as part of the v1.0 deduplication pass. They are implemented here so the single canonical CudaMemoryManager remains the only public CUDA memory-manager type.

CudaMemoryOrderingProvider

CUDA-specific implementation of memory ordering primitives.

CudaMemoryPoolManager

Manages memory pools for efficient allocation and reuse of CUDA memory. Reduces allocation overhead and memory fragmentation.

CudaMemoryPrefetcher

Manages memory prefetching for unified memory to optimize data movement. Uses cudaMemPrefetchAsync to proactively move data between host and device.

CudaPinnedMemoryAllocator

Manages pinned (page-locked) host memory for high-bandwidth host-device transfers. Pinned memory provides up to a 10x bandwidth improvement (20 GB/s vs. 2 GB/s).

CudaRawMemoryBuffer

Raw untyped CUDA memory buffer for byte-level operations.

MemoryPoolStatistics

Statistics for memory pool usage.

OptimizedCudaMemoryPrefetcher

Advanced CUDA memory prefetcher with intelligent pattern recognition:

  • Predictive prefetching based on access patterns
  • Multi-level prefetch strategies (L1, L2, global memory)
  • Adaptive prefetch distance based on bandwidth utilization
  • NUMA-aware prefetching for multi-GPU systems
  • Asynchronous prefetch operations with minimal overhead
  • Cache pollution avoidance with smart eviction policies

Target: 30-50% improvement in memory-bound kernel performance.

PinnedMemoryStatistics

Statistics for pinned memory usage.

PoolSizeStatistics

Statistics for a specific pool size.

PrefetchRequest

Request for batch prefetch operation.

PrefetchStatistics

Statistics for prefetch operations.

PrefetcherConfiguration

Configuration for the memory prefetcher.

SimpleCudaUnifiedMemoryBuffer<T>

Simple CUDA unified memory buffer implementation for the memory adapter. This is a lightweight version that does not depend on CudaUnifiedMemoryManagerProduction.

Structs

PaddedInt

A 32-bit counter padded to its own cache line.

PaddedLong

A 64-bit counter padded to its own cache line to prevent false sharing. Pass ref counter.Value to Interlocked.* APIs.

PrefetcherStatistics

Performance statistics for the prefetcher.
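
The PaddedLong note above ("Pass ref counter.Value to Interlocked.* APIs") can be illustrated with a minimal sketch. PaddedLongSketch below is a hypothetical stand-in, not the library's type: this page does not show PaddedLong's actual layout, so the 128-byte size and the offset of 64 are assumptions chosen only to keep the value away from neighboring data. PaddedCounterDemo and CountInParallel are likewise illustrative names.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for PaddedLong. The real type's layout is not shown
// on this page; Size = 128 and FieldOffset(64) are assumptions that leave
// at least 64 bytes of padding on each side of the value.
[StructLayout(LayoutKind.Explicit, Size = 128)]
public struct PaddedLongSketch
{
    [FieldOffset(64)]
    public long Value;
}

public static class PaddedCounterDemo
{
    // Increments one shared padded counter from several workers and
    // returns the final total.
    public static long CountInParallel(int workers, int incrementsPerWorker)
    {
        // Array elements are variables, so ref counters[0].Value refers to
        // the stored field itself and not to a copy of the struct.
        var counters = new PaddedLongSketch[1];

        Parallel.For(0, workers, _ =>
        {
            for (int i = 0; i < incrementsPerWorker; i++)
            {
                // Pass the field by reference, as the PaddedLong summary advises.
                Interlocked.Increment(ref counters[0].Value);
            }
        });

        return Interlocked.Read(ref counters[0].Value);
    }

    public static void Main()
    {
        Console.WriteLine(CountInParallel(4, 1000)); // prints 4000
    }
}
```

Passing the field by reference matters because the padded counter is a struct: copying it into a local and incrementing the copy would leave the shared value unchanged.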

Interfaces

IPinnedMemoryBuffer<T>

Interface for pinned memory buffers.

IPinnedMemoryRegistration

Interface for pinned memory registration.

IPooledMemoryBuffer

Interface for pooled memory buffers.

Enums

CacheLevel

A cache level enumeration.

CudaHostAllocFlags

Flags for pinned memory allocation.

CudaHostRegisterFlags

Flags for host memory registration.

MemoryAccessHint

A memory access hint enumeration.

MemoryAccessType

A memory access type enumeration.

PrefetchPriority

A prefetch priority enumeration.

PrefetchStrategy

A prefetch strategy enumeration.

PrefetchTarget

Target location for prefetch operation.