Class AdvancedSimdKernels
- Namespace
- DotCompute.Backends.CPU.Kernels
- Assembly
- DotCompute.Backends.CPU.dll
Advanced SIMD kernel implementations covering fused multiply-add (FMA), integer SIMD, ARM NEON, and other modern vectorization techniques.
public static class AdvancedSimdKernels
- Inheritance
- object → AdvancedSimdKernels
Methods
OptimizedMatrixMultiplyFloat32(float*, float*, float*, int, int, int)
Cache-friendly blocked matrix multiplication with FMA optimization. Essential for linear algebra workloads.
public static void OptimizedMatrixMultiplyFloat32(float* a, float* b, float* c, int m, int n, int k)
Parameters
- a float*
- b float*
- c float*
- m int
- n int
- k int
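As a rough illustration of what "cache-friendly blocked" means here, the sketch below multiplies matrices tile by tile so that each tile is reused while it is still in cache. It is a managed reference version, not the library's implementation. The row-major layout, the reading of m, n, k as A being m×k, B being k×n and C being m×n, the helper name BlockedMatMul, and the blockSize parameter are all assumptions that this page does not confirm.

```csharp
using System;

// Reference sketch of cache-blocked matrix multiplication, C = A * B.
// Assumed (not confirmed by the docs): row-major storage,
// A is m x k, B is k x n, C is m x n. blockSize is a hypothetical tuning knob.
static void BlockedMatMul(float[] a, float[] b, float[] c, int m, int n, int k, int blockSize = 32)
{
    Array.Clear(c, 0, m * n);
    for (int i0 = 0; i0 < m; i0 += blockSize)
    {
        int iMax = Math.Min(i0 + blockSize, m);
        for (int p0 = 0; p0 < k; p0 += blockSize)
        {
            int pMax = Math.Min(p0 + blockSize, k);
            for (int j0 = 0; j0 < n; j0 += blockSize)
            {
                int jMax = Math.Min(j0 + blockSize, n);
                // Multiply one tile of A by one tile of B, accumulating into C.
                for (int i = i0; i < iMax; i++)
                {
                    for (int p = p0; p < pMax; p++)
                    {
                        float aip = a[i * k + p];
                        for (int j = j0; j < jMax; j++)
                        {
                            // Fused multiply-add: c += aip * b, rounded once.
                            c[i * n + j] = MathF.FusedMultiplyAdd(aip, b[p * n + j], c[i * n + j]);
                        }
                    }
                }
            }
        }
    }
}

float[] matA = { 1, 2, 3, 4, 5, 6 };      // 2 x 3
float[] matB = { 7, 8, 9, 10, 11, 12 };   // 3 x 2
float[] matC = new float[4];              // 2 x 2
BlockedMatMul(matA, matB, matC, m: 2, n: 2, k: 3, blockSize: 2);
// matC now holds { 58, 64, 139, 154 }
```

The innermost loop walks a row of B and a row of C contiguously, which is the access pattern a SIMD version would vectorize.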
VectorAddInt16(short*, short*, short*, long)
Vectorized 16-bit integer addition (16-bit data is common in image processing).
public static void VectorAddInt16(short* a, short* b, short* result, long elementCount)
Parameters
- a short*
- b short*
- result short*
- elementCount long
VectorAddInt32(int*, int*, int*, long)
Vectorized 32-bit integer addition with full SIMD support.
public static void VectorAddInt32(int* a, int* b, int* result, long elementCount)
Parameters
- a int*
- b int*
- result int*
- elementCount long
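The general shape of a vectorized integer addition can be sketched with the portable System.Numerics.Vector<T> type: a main loop that adds one full vector per iteration, followed by a scalar loop for the leftover elements. This is only an illustration using managed arrays. The helper name AddInt32 is hypothetical, and the wrap-around overflow behaviour shown is an assumption, since this page does not say how the kernel treats overflow.

```csharp
using System;
using System.Numerics;

// Element-wise int32 addition with System.Numerics.Vector<int>.
// Assumes a and b are at least as long as result.
static void AddInt32(int[] a, int[] b, int[] result)
{
    int width = Vector<int>.Count;   // lanes per vector, hardware dependent
    int i = 0;

    // Main loop: one full vector per iteration.
    for (; i <= result.Length - width; i += width)
    {
        var sum = new Vector<int>(a, i) + new Vector<int>(b, i);
        sum.CopyTo(result, i);
    }

    // Tail loop: remaining elements that do not fill a vector.
    // Overflow wraps here (assumption, not documented for the real kernel).
    for (; i < result.Length; i++)
    {
        result[i] = unchecked(a[i] + b[i]);
    }
}

int[] left = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 };
int[] right = { 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110 };
int[] added = new int[left.Length];
AddInt32(left, right, added);
// added now holds { 11, 22, 33, ..., 121 }
```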
VectorAdvancedNeonFloat32(float*, float*, float*, float*, long, NeonOperation)
Comprehensive ARM NEON floating-point operations; the operation parameter selects which one is applied.
public static void VectorAdvancedNeonFloat32(float* a, float* b, float* c, float* result, long elementCount, NeonOperation operation)
Parameters
- a float*
- b float*
- c float*
- result float*
- elementCount long
- operation NeonOperation
VectorConditionalSelect(float*, float*, float*, float*, long, float)
Conditional selection: result[i] = condition[i] ? a[i] : b[i]. Uses SIMD masking instead of per-element branches.
public static void VectorConditionalSelect(float* condition, float* a, float* b, float* result, long count, float threshold)
Parameters
- condition float*
- a float*
- b float*
- result float*
- count long
- threshold float
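Mask-based selection can be sketched with System.Numerics as follows: a comparison produces a per-lane mask, and Vector.ConditionalSelect picks each lane from one of two inputs without branching. The page does not say how threshold is used, so the test condition[i] > threshold below is an assumption, as is the helper name SelectByThreshold.

```csharp
using System;
using System.Numerics;

// Branch-free selection: result[i] = condition[i] > threshold ? a[i] : b[i].
// The "> threshold" comparison is an assumption, not documented behaviour.
static void SelectByThreshold(float[] condition, float[] a, float[] b, float[] result, float threshold)
{
    int width = Vector<float>.Count;
    var vThreshold = new Vector<float>(threshold);
    int i = 0;

    for (; i <= result.Length - width; i += width)
    {
        // Each mask lane is all ones where the comparison holds, all zeros otherwise.
        var mask = Vector.GreaterThan(new Vector<float>(condition, i), vThreshold);
        var picked = Vector.ConditionalSelect(mask, new Vector<float>(a, i), new Vector<float>(b, i));
        picked.CopyTo(result, i);
    }

    // Scalar tail with the same semantics.
    for (; i < result.Length; i++)
    {
        result[i] = condition[i] > threshold ? a[i] : b[i];
    }
}

float[] flags = { 0f, 1f, 0f, 1f, 1f, 0f, 0f, 1f, 0f, 1f };
float[] whenTrue = { 1f, 2f, 3f, 4f, 5f, 6f, 7f, 8f, 9f, 10f };
float[] whenFalse = { -1f, -2f, -3f, -4f, -5f, -6f, -7f, -8f, -9f, -10f };
float[] chosen = new float[flags.Length];
SelectByThreshold(flags, whenTrue, whenFalse, chosen, threshold: 0.5f);
// chosen now holds { -1, 2, -3, 4, 5, -6, -7, 8, -9, 10 }
```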
VectorFmaFloat32(float*, float*, float*, float*, long)
Vectorized fused multiply-add (FMA): result[i] = a[i] * b[i] + c[i], computed with hardware FMA instructions. FMA rounds once instead of twice, which benefits both accuracy and throughput in scientific computing.
public static void VectorFmaFloat32(float* a, float* b, float* c, float* result, long elementCount)
Parameters
- a float*
- b float*
- c float*
- result float*
- elementCount long
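To make the element-wise semantics and the precision benefit concrete, here is a scalar sketch built on MathF.FusedMultiplyAdd from the .NET base library. It is not the kernel itself: the helper name FmaReference is hypothetical and managed arrays stand in for the raw pointers. The second half shows a case where the fused result keeps a small term that separate multiply and add steps round away.

```csharp
using System;

// Scalar reference for result[i] = a[i] * b[i] + c[i] with a single rounding.
static void FmaReference(float[] a, float[] b, float[] c, float[] result)
{
    for (int i = 0; i < result.Length; i++)
    {
        result[i] = MathF.FusedMultiplyAdd(a[i], b[i], c[i]);
    }
}

float[] xs = { 1f, 2f, 3f };
float[] ys = { 4f, 5f, 6f };
float[] zs = { 0.5f, 0.5f, 0.5f };
float[] fmaOut = new float[3];
FmaReference(xs, ys, zs, fmaOut);
// fmaOut now holds { 4.5, 10.5, 18.5 }

// Why a single rounding matters:
// p = 1 + 2^-23, so p * p = 1 + 2^-22 + 2^-46 exactly.
float p = MathF.BitIncrement(1f);
float q = -(1f + 2f * (p - 1f));                  // -(1 + 2^-22)
float fused = MathF.FusedMultiplyAdd(p, p, q);    // keeps the 2^-46 term
float separate = (float)(p * p) + q;              // product rounded first, term lost
// fused is a tiny positive number, separate is exactly 0
```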
VectorFmaFloat64(double*, double*, double*, double*, long)
Double-precision FMA operation: result = a * b + c.
public static void VectorFmaFloat64(double* a, double* b, double* c, double* result, long elementCount)
Parameters
- a double*
- b double*
- c double*
- result double*
- elementCount long
VectorGatherFloat32(float*, int*, float*, int)
Gather operation: loads elements from memory using indices. Critical for sparse data and indirect memory access patterns.
public static void VectorGatherFloat32(float* basePtr, int* indices, float* result, int count)
Parameters
- basePtr float*
- indices int*
- result float*
- count int
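In scalar terms, a gather reads result[i] = basePtr[indices[i]]. The sketch below spells that out with managed arrays in place of pointers. The helper name GatherReference is hypothetical, and the real kernel's handling of out-of-range indices is not documented here.

```csharp
using System;

// Scalar reference for a gather: result[i] = source[indices[i]].
// Indices are assumed to be valid positions in source.
static void GatherReference(float[] source, int[] indices, float[] result)
{
    for (int i = 0; i < result.Length; i++)
    {
        result[i] = source[indices[i]];
    }
}

float[] table = { 10f, 20f, 30f, 40f, 50f };
int[] picks = { 4, 0, 2, 2 };
float[] gathered = new float[picks.Length];
GatherReference(table, picks, gathered);
// gathered now holds { 50, 10, 30, 30 }
```

The same index may appear more than once, which is typical when reading the values of a sparse structure.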
VectorHorizontalSum(float*, long)
Optimized horizontal sum reduction with SIMD.
public static float VectorHorizontalSum(float* data, long count)
Parameters
- data float*
- count long
Returns
- float
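A horizontal sum is usually done in two phases: accumulate lane-wise partial sums in a vector register, then add the lanes together once at the end. The sketch below shows that pattern with System.Numerics. It assumes .NET 6 or later for Vector.Sum, and the helper name HorizontalSumReference is hypothetical. Because the additions happen in a different order than a plain loop, results for general floating-point data may differ slightly from a sequential sum.

```csharp
using System;
using System.Numerics;

// Two-phase reduction: per-lane accumulation, then one cross-lane sum.
static float HorizontalSumReference(float[] data)
{
    int width = Vector<float>.Count;
    var acc = Vector<float>.Zero;
    int i = 0;

    // Phase 1: each lane accumulates every width-th element.
    for (; i <= data.Length - width; i += width)
    {
        acc += new Vector<float>(data, i);
    }

    // Phase 2: collapse the lanes into a single scalar.
    float sum = Vector.Sum(acc);

    // Tail: elements that did not fill a whole vector.
    for (; i < data.Length; i++)
    {
        sum += data[i];
    }
    return sum;
}

float[] samples = new float[100];
for (int s = 0; s < samples.Length; s++)
{
    samples[s] = s + 1;   // 1, 2, ..., 100
}
float reduced = HorizontalSumReference(samples);
// reduced is 5050 (all intermediate sums are small integers, so this is exact)
```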
VectorMultiplyInt64(long*, long*, long*, long)
Vectorized 64-bit integer multiplication.
public static void VectorMultiplyInt64(long* a, long* b, long* result, long elementCount)
Parameters
- a long*
- b long*
- result long*
- elementCount long
VectorScatterFloat32(float*, int*, float*, int)
Scatter operation: stores elements to memory using indices.
public static void VectorScatterFloat32(float* values, int* indices, float* basePtr, int count)
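Scatter is the mirror image of gather: in scalar terms, basePtr[indices[i]] = values[i]. The sketch below shows this with managed arrays. The helper name ScatterReference is hypothetical. With duplicate indices the sequential loop lets the last write win, but this page does not say what the real kernel does in that case, so treat that as an assumption.

```csharp
using System;

// Scalar reference for a scatter: destination[indices[i]] = values[i].
// Indices are assumed to be valid positions in destination.
static void ScatterReference(float[] values, int[] indices, float[] destination)
{
    for (int i = 0; i < values.Length; i++)
    {
        destination[indices[i]] = values[i];
    }
}

float[] payload = { 1.5f, 2.5f, 3.5f };
int[] slots = { 3, 0, 4 };
float[] target = new float[5];
ScatterReference(payload, slots, target);
// target now holds { 2.5, 0, 0, 1.5, 3.5 }
```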