Class DotComputeMemorySettings

Namespace: Orleans.GpuBridge.Backends.DotCompute.Configuration

Assembly: Orleans.GpuBridge.Backends.DotCompute.dll

Memory management configuration settings for the DotCompute backend.

public class DotComputeMemorySettings

Inheritance: object

DotComputeMemorySettings

Inherited Members: object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Examples

// Configuration for large dataset processing
var memorySettings = new DotComputeMemorySettings
{
    EnableMemoryPooling = true,
    InitialPoolSize = 2L * 1024 * 1024 * 1024, // 2 GB
    MaxPoolSize = 16L * 1024 * 1024 * 1024,    // 16 GB
    AllocationAlignment = 4096,                 // 4KB alignment
    EnableDefragmentation = true,
    DefragmentationThreshold = 0.15             // Defrag at 15% fragmentation
};

// Configuration for memory-constrained environments
var constrainedSettings = new DotComputeMemorySettings
{
    EnableMemoryPooling = true,
    InitialPoolSize = 128 * 1024 * 1024,       // 128 MB
    MaxPoolSize = 1024 * 1024 * 1024,          // 1 GB
    EnableDefragmentation = true,
    DefragmentationThreshold = 0.10,           // Aggressive defrag at 10%
    PreferUnifiedMemory = false                // Avoid unified memory overhead
};

Remarks

This class provides comprehensive configuration options for GPU memory management in the DotCompute backend, including memory pooling, allocation strategies, defragmentation, and platform-specific optimizations.

Proper memory configuration is critical for GPU performance. Poor memory management can lead to: - Frequent allocations causing performance bottlenecks - Memory fragmentation reducing available memory - Excessive memory transfers between CPU and GPU - Out-of-memory errors in compute-intensive workloads

The default settings are optimized for typical GPU compute workloads with moderate memory usage patterns. Adjust settings based on your specific workload characteristics: - Large dataset processing: Increase pool sizes and alignment - Frequent small allocations: Enable pooling and defragmentation - Memory-constrained systems: Reduce pool sizes and enable aggressive cleanup - High-performance computing: Enable unified memory and pinned memory features

Properties

AllocationAlignment

Gets or sets the memory allocation alignment in bytes.

public int AllocationAlignment { get; set; }

Property Value

int: The alignment requirement in bytes. Default is 256 bytes. Must be a power of 2 and at least 1.

Examples

// High-performance configuration for float4 data
memorySettings.AllocationAlignment = 512;  // 512-byte alignment

// Memory-efficient configuration
memorySettings.AllocationAlignment = 128;  // 128-byte alignment

// Conservative configuration for mixed workloads
memorySettings.AllocationAlignment = 256;  // 256-byte alignment (default)

Remarks

Memory alignment ensures that allocated memory addresses meet hardware requirements for optimal performance. Different GPU architectures and data types have varying alignment requirements for maximum memory bandwidth utilization.

Alignment benefits: - Optimal memory access patterns and bandwidth - Efficient vectorized operations and coalesced access - Compatibility with hardware-specific optimizations - Reduced memory access penalties and cache misses

Platform-specific recommendations: - CUDA: 256-512 bytes (matches warp/coalescing requirements) - OpenCL: 128-256 bytes (platform dependent) - DirectCompute: 256 bytes (typical D3D11/12 alignment) - Metal: 256 bytes (typical Metal buffer alignment) - Vulkan: 256 bytes (typical Vulkan buffer alignment)

Data type considerations: - Single precision (float): 128-256 byte alignment - Double precision (double): 256-512 byte alignment - Mixed data types: Use largest required alignment - Structured data: Align to largest member + padding

Higher alignment values may waste some memory due to padding but can significantly improve performance for memory-intensive kernels.

Exceptions

ArgumentOutOfRangeException: Thrown when the value is not a power of 2 or is less than 1.

DefragmentationThreshold

Gets or sets the fragmentation threshold that triggers automatic defragmentation.

public double DefragmentationThreshold { get; set; }

Property Value

double: The fragmentation threshold as a percentage (0.0 to 1.0). Default is 0.25 (25%).

Remarks

This threshold determines when automatic defragmentation is triggered. The value represents the percentage of memory pool space that is fragmented (unusable for contiguous allocations due to fragmentation).

The fragmentation metric considers: - Free memory blocks too small for typical allocations - Scattered free regions preventing large allocations - Overall efficiency of memory space utilization - Impact on allocation success rates

Threshold selection guidelines: - Lower values (0.10-0.20): More frequent defrag, better memory efficiency - Medium values (0.20-0.30): Balanced performance and efficiency (recommended) - Higher values (0.30-0.50): Less frequent defrag, may impact large allocations - Very high values (>0.50): Minimal defrag, risk of allocation failures

Workload-specific recommendations: - Uniform allocation sizes: 0.30-0.40 (fragmentation less problematic) - Mixed allocation sizes: 0.20-0.30 (fragmentation more problematic) - Memory-constrained: 0.10-0.20 (aggressive defragmentation) - Performance-critical: 0.25-0.35 (balance efficiency and performance)

This setting is only effective when EnableDefragmentation is true.

Exceptions

ArgumentOutOfRangeException: Thrown when the value is less than 0.0 or greater than 1.0.

EnableDefragmentation

Gets or sets whether to enable automatic memory defragmentation.

public bool EnableDefragmentation { get; set; }

Property Value

bool: true to enable automatic defragmentation; otherwise, false. Default is true.

Remarks

Automatic defragmentation periodically reorganizes memory pool allocations to reduce fragmentation and maximize available contiguous memory blocks. This helps maintain allocation performance over time, especially for long-running applications.

Defragmentation benefits: - Maintains allocation performance over time - Reduces memory waste from fragmentation - Enables larger contiguous allocations - Improves memory utilization efficiency - Prevents gradual performance degradation

Defragmentation costs: - Temporary performance impact during defrag operations - Memory copying overhead for active allocations - Potential kernel execution delays during defrag - CPU processing time for fragmentation analysis

Defragmentation is triggered when fragmentation exceeds the threshold specified by DefragmentationThreshold. The operation is performed during idle periods when possible to minimize performance impact.

Recommended for most workloads, especially: - Long-running applications - Workloads with variable allocation sizes - Applications with frequent allocation/deallocation cycles - Memory-constrained environments

EnableMemoryPooling

Gets or sets whether to enable memory pooling for GPU allocations.

public bool EnableMemoryPooling { get; set; }

Property Value

bool: true to enable memory pooling; otherwise, false. Default is true.

Remarks

Memory pooling pre-allocates blocks of GPU memory and reuses them for subsequent allocations, significantly reducing allocation overhead and memory fragmentation. This is especially beneficial for workloads with frequent memory allocations.

Benefits of memory pooling: - Faster allocation and deallocation (10-100x speedup) - Reduced memory fragmentation - More predictable memory usage patterns - Better utilization of large memory blocks - Reduced GPU driver overhead

Considerations: - Initial memory overhead from pre-allocated pools - May hold memory longer than necessary - Requires tuning of pool sizes for optimal efficiency - Most effective for repeated allocation patterns

Recommended to enable for most production workloads unless memory is severely constrained or allocation patterns are highly irregular.

InitialPoolSize

Gets or sets the initial memory pool size per GPU device in bytes.

public long InitialPoolSize { get; set; }

Property Value

long: The initial pool size in bytes. Default is 512 MB (536,870,912 bytes). Must be a positive value.

Examples

// For 8GB GPU, allocate 1GB initially
memorySettings.InitialPoolSize = 1024 * 1024 * 1024; // 1 GB

// For memory-constrained 4GB GPU
memorySettings.InitialPoolSize = 256 * 1024 * 1024;  // 256 MB

Remarks

This setting determines the amount of GPU memory pre-allocated for the memory pool when the backend is initialized. The pool will grow up to MaxPoolSize as needed, but starts with this initial allocation.

Sizing considerations: - Should accommodate typical working set size - Larger initial size reduces early allocations and fragmentation - Too large may waste memory or cause initialization failures - Should be significantly smaller than total GPU memory - Consider multiple GPU scenarios (allocation per device)

Recommended sizing guidelines: - Development: 128-256 MB - Small workloads: 256-512 MB - Medium workloads: 512 MB - 2 GB - Large workloads: 2-8 GB (depending on GPU memory capacity) - Memory-constrained: 64-128 MB

This setting is only effective when EnableMemoryPooling is true.

Exceptions

ArgumentOutOfRangeException: Thrown when the value is less than or equal to zero, or exceeds MaxPoolSize.

MaxPoolSize

Gets or sets the maximum memory pool size per GPU device in bytes.

public long MaxPoolSize { get; set; }

Property Value

long: The maximum pool size in bytes. Default is 4 GB (4,294,967,296 bytes). Must be greater than or equal to InitialPoolSize.

Remarks

This setting defines the upper limit for memory pool growth. When the pool reaches this size, new allocations will either reuse existing pool memory or fall back to direct GPU memory allocation if pooling is insufficient.

The maximum pool size serves as a safety mechanism to: - Prevent unbounded memory growth - Reserve GPU memory for other applications - Avoid out-of-memory conditions - Maintain system stability under load

Sizing guidelines: - Should be 60-80% of total GPU memory for dedicated workloads - Should be 40-60% of total GPU memory for shared environments - Consider OS and driver memory overhead (typically 200-500 MB) - Account for concurrent applications using GPU memory - Leave buffer for temporary allocations and driver operations

Common configurations by GPU memory: - 4 GB GPU: 2-3 GB max pool size - 8 GB GPU: 5-7 GB max pool size - 16 GB GPU: 10-14 GB max pool size - 32+ GB GPU: 20-28 GB max pool size

This setting is only effective when EnableMemoryPooling is true.

Exceptions

ArgumentOutOfRangeException: Thrown when the value is less than InitialPoolSize.

PreferUnifiedMemory

Gets or sets whether to prefer unified memory when available on the platform.

public bool PreferUnifiedMemory { get; set; }

Property Value

bool: true to prefer unified memory; otherwise, false. Default is true.

Remarks

Unified memory (also known as managed memory) provides a single address space shared between CPU and GPU, allowing automatic data migration and simplified memory management. This feature is available on select platforms and hardware.

Unified memory benefits: - Simplified programming model (no explicit transfers) - Automatic data migration between CPU and GPU - Reduced memory duplication for shared data - Better memory utilization on memory-constrained systems - Easier debugging and memory profiling

Unified memory limitations: - Platform availability (CUDA 6.0+, some OpenCL implementations) - Performance overhead from automatic migration - Potential page faults during GPU execution - Limited control over data placement and timing - May not be optimal for all memory access patterns

Platform support: - CUDA: Unified Memory available on Pascal+ architectures (GTX 10xx+) - OpenCL: Limited support in OpenCL 2.0+ (implementation dependent) - DirectCompute: Not typically supported - Metal: Shared memory on Apple Silicon (M1/M2 series) - Vulkan: Limited through extensions (implementation dependent)

When unified memory is not available or not beneficial for the workload, the backend will fall back to discrete memory management with explicit transfers.

UsePinnedMemory

Gets or sets whether to use pinned (page-locked) memory for CPU-GPU transfers.

public bool UsePinnedMemory { get; set; }

Property Value

bool: true to use pinned memory; otherwise, false. Default is true.

Remarks

Pinned memory (also called page-locked or non-pageable memory) is CPU memory that is locked in physical RAM and cannot be swapped to disk. This enables faster and more efficient transfers between CPU and GPU memory.

Pinned memory benefits: - Significantly faster CPU-GPU transfer speeds (2-3x improvement) - Enables asynchronous memory transfers - Reduced CPU usage during memory operations - More predictable transfer performance - Better overlap of computation and communication

Pinned memory costs: - Consumes system RAM that cannot be swapped - Limited resource (typically 50-75% of system RAM) - Slower allocation/deallocation compared to pageable memory - May impact system performance if overused - Not suitable for very large or temporary buffers

Usage recommendations: - Enable for frequently transferred data - Use for streaming or pipeline workloads - Ideal for intermediate-sized buffers (MB to GB range) - Avoid for very large datasets that exceed system RAM - Consider system RAM capacity and other applications

Platform considerations: - CUDA: cudaMallocHost() for pinned allocations - OpenCL: CL_MEM_ALLOC_HOST_PTR flag - DirectCompute: D3D11_USAGE_STAGING with appropriate flags - Metal: Shared memory pools on macOS/iOS - Vulkan: Host-visible memory with coherent flag

Table of Contents

Class DotComputeMemorySettings

Examples

Remarks

Properties

AllocationAlignment

Property Value

Examples

Remarks

Exceptions

DefragmentationThreshold

Property Value

Remarks

Exceptions

EnableDefragmentation

Property Value

Remarks

EnableMemoryPooling

Property Value

Remarks

InitialPoolSize

Property Value

Examples

Remarks

Exceptions

MaxPoolSize

Property Value

Remarks

Exceptions

PreferUnifiedMemory

Property Value

Remarks

UsePinnedMemory

Property Value

Remarks