Class SimdHorizontalOperations

Namespace
DotCompute.Backends.CPU.Kernels.Simd
Assembly
DotCompute.Backends.CPU.dll

Horizontal operations for SIMD vectors: operations that combine the elements (lanes) of a single vector into one scalar result. Provides optimized horizontal sum, min, max, and product operations.

public static class SimdHorizontalOperations
Inheritance
SimdHorizontalOperations

Methods

HorizontalMax(Vector128<float>)

Computes the horizontal maximum of a 128-bit float vector: the largest of its 4 lanes.

public static float HorizontalMax(Vector128<float> vector)

Parameters

vector Vector128<float>

Returns

float
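To illustrate what a horizontal max computes, here is a minimal sketch built only on the `System.Runtime.Intrinsics` cross-platform helpers (.NET 7 or later). The class and method names are hypothetical and this is not DotCompute's implementation; it only shows the common shuffle-and-compare pattern. A horizontal min follows the same shape with `Vector128.Min`.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical helper for illustration; not part of DotCompute.
public static class HorizontalMaxSketch
{
    public static float Max4(Vector128<float> v)
    {
        // Swap the two 64-bit halves: (x0, x1, x2, x3) -> (x2, x3, x0, x1).
        Vector128<float> swapped = Vector128.Shuffle(v, Vector128.Create(2, 3, 0, 1));
        Vector128<float> m = Vector128.Max(v, swapped);

        // Swap neighbouring lanes: (m0, m1, m2, m3) -> (m1, m0, m3, m2).
        Vector128<float> rotated = Vector128.Shuffle(m, Vector128.Create(1, 0, 3, 2));

        // Every lane now holds the overall maximum; read lane 0.
        return Vector128.Max(m, rotated).ToScalar();
    }
}
```

For the input `(1, 7, 3, 5)` the first step yields `(3, 7, 3, 7)` and the second yields `7` in every lane.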

HorizontalMax(Vector256<float>)

Computes the horizontal maximum of a 256-bit float vector: the largest of its 8 lanes.

public static float HorizontalMax(Vector256<float> vector)

Parameters

vector Vector256<float>

Returns

float

HorizontalMax(Vector512<float>)

Computes the horizontal maximum of a 512-bit float vector: the largest of its 16 lanes.

public static float HorizontalMax(Vector512<float> vector)

Parameters

vector Vector512<float>

Returns

float

HorizontalMaxNeon(Vector128<float>)

Performs horizontal max of a 128-bit float vector using ARM NEON.

public static float HorizontalMaxNeon(Vector128<float> vector)

Parameters

vector Vector128<float>

Returns

float

HorizontalMin(Vector128<float>)

Computes the horizontal minimum of a 128-bit float vector: the smallest of its 4 lanes.

public static float HorizontalMin(Vector128<float> vector)

Parameters

vector Vector128<float>

Returns

float

HorizontalMin(Vector256<float>)

Computes the horizontal minimum of a 256-bit float vector: the smallest of its 8 lanes.

public static float HorizontalMin(Vector256<float> vector)

Parameters

vector Vector256<float>

Returns

float

HorizontalMin(Vector512<float>)

Computes the horizontal minimum of a 512-bit float vector: the smallest of its 16 lanes.

public static float HorizontalMin(Vector512<float> vector)

Parameters

vector Vector512<float>

Returns

float

HorizontalMinNeon(Vector128<float>)

Performs horizontal min of a 128-bit float vector using ARM NEON.

public static float HorizontalMinNeon(Vector128<float> vector)

Parameters

vector Vector128<float>

Returns

float

HorizontalProduct(Vector128<float>)

Computes the horizontal product of a 128-bit float vector: all 4 lanes multiplied together.

public static float HorizontalProduct(Vector128<float> vector)

Parameters

vector Vector128<float>

Returns

float
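As a rough sketch of the idea, a 4-lane product can be computed by multiplying the two 64-bit halves lane-wise and then multiplying the two remaining lanes. The helper below is hypothetical, uses only `System.Runtime.Intrinsics` (.NET 7 or later), and is not taken from DotCompute.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical helper for illustration; not part of DotCompute.
public static class HorizontalProductSketch
{
    public static float Product4(Vector128<float> v)
    {
        // Lane-wise multiply of the halves: (x0 * x2, x1 * x3).
        Vector64<float> halves = v.GetLower() * v.GetUpper();

        // Multiply the two partial products to get x0 * x1 * x2 * x3.
        return halves.GetElement(0) * halves.GetElement(1);
    }
}
```

For `(1, 2, 3, 4)` the partial products are `(3, 8)` and the result is `24`.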

HorizontalProduct(Vector256<float>)

Computes the horizontal product of a 256-bit float vector: all 8 lanes multiplied together.

public static float HorizontalProduct(Vector256<float> vector)

Parameters

vector Vector256<float>

Returns

float

HorizontalProduct(Vector512<float>)

Computes the horizontal product of a 512-bit float vector: all 16 lanes multiplied together.

public static float HorizontalProduct(Vector512<float> vector)

Parameters

vector Vector512<float>

Returns

float

HorizontalSum(Vector128<double>)

Performs horizontal sum of a 128-bit double vector.

public static double HorizontalSum(Vector128<double> vector)

Parameters

vector Vector128<double>

Returns

double

HorizontalSum(Vector128<int>)

Performs horizontal sum of a 128-bit integer vector.

public static int HorizontalSum(Vector128<int> vector)

Parameters

vector Vector128<int>

Returns

int

HorizontalSum(Vector128<float>)

Performs horizontal sum of a 128-bit float vector.

public static float HorizontalSum(Vector128<float> vector)

Parameters

vector Vector128<float>

Returns

float

HorizontalSum(Vector256<double>)

Performs horizontal sum of a 256-bit double vector.

public static double HorizontalSum(Vector256<double> vector)

Parameters

vector Vector256<double>

Returns

double

HorizontalSum(Vector256<int>)

Performs horizontal sum of a 256-bit integer vector.

public static int HorizontalSum(Vector256<int> vector)

Parameters

vector Vector256<int>

Returns

int

HorizontalSum(Vector256<float>)

Performs horizontal sum of a 256-bit float vector.

public static float HorizontalSum(Vector256<float> vector)

Parameters

vector Vector256<float>

Returns

float
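A common way to sum a 256-bit vector is to fold the upper half onto the lower half and then reduce the remaining 128 bits. The sketch below assumes .NET 7 or later and uses a hypothetical class name; it shows the folding pattern, not necessarily how DotCompute implements this overload.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical helper for illustration; not part of DotCompute.
public static class HorizontalSumSketch
{
    public static float Sum8(Vector256<float> v)
    {
        // Fold the upper 128 bits onto the lower 128 bits:
        // 8 lanes become 4 partial sums (x0 + x4, x1 + x5, x2 + x6, x3 + x7).
        Vector128<float> folded = v.GetLower() + v.GetUpper();

        // Vector128.Sum adds the four remaining lanes into one scalar.
        return Vector128.Sum(folded);
    }
}
```

Because the lanes are added in a different order than a left-to-right scalar loop, results for general floating-point inputs can differ in the last bits.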

HorizontalSum(Vector512<float>)

Performs horizontal sum of a 512-bit float vector.

public static float HorizontalSum(Vector512<float> vector)

Parameters

vector Vector512<float>

Returns

float

HorizontalSumCrossPlatform(Vector128<float>)

Performs horizontal sum using the best available SIMD implementation for the current platform. Automatically selects AVX/SSE on x86 or NEON on ARM.

public static float HorizontalSumCrossPlatform(Vector128<float> vector)

Parameters

vector Vector128<float>

Returns

float
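Platform selection of this kind is usually written with `IsSupported` checks, which the JIT treats as constants so only one branch survives on a given machine. The following sketch is an assumption about the general pattern, with a hypothetical class name; the actual branches DotCompute takes are not documented here.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper for illustration; not part of DotCompute.
public static class CrossPlatformSumSketch
{
    public static float Sum(Vector128<float> v)
    {
        if (Sse3.IsSupported)
        {
            // x86: two horizontal adds leave the total in every lane.
            Vector128<float> h = Sse3.HorizontalAdd(v, v); // (x0+x1, x2+x3, x0+x1, x2+x3)
            h = Sse3.HorizontalAdd(h, h);
            return h.ToScalar();
        }

        if (AdvSimd.Arm64.IsSupported)
        {
            // ARM64: two pairwise adds do the same job with NEON.
            Vector128<float> p = AdvSimd.Arm64.AddPairwise(v, v);
            p = AdvSimd.Arm64.AddPairwise(p, p);
            return p.ToScalar();
        }

        // Anything else: let the runtime pick (software fallback if needed).
        return Vector128.Sum(v);
    }
}
```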

HorizontalSumNeon(Vector128<double>)

Performs horizontal sum of a 128-bit double vector using ARM NEON.

public static double HorizontalSumNeon(Vector128<double> vector)

Parameters

vector Vector128<double>

Returns

double

HorizontalSumNeon(Vector128<int>)

Performs horizontal sum of a 128-bit integer vector using ARM NEON.

public static int HorizontalSumNeon(Vector128<int> vector)

Parameters

vector Vector128<int>

Returns

int

HorizontalSumNeon(Vector128<float>)

Performs horizontal sum of a 128-bit float vector using ARM NEON. Uses Vector64 pairwise operations for maximum compatibility.

public static float HorizontalSumNeon(Vector128<float> vector)

Parameters

vector Vector128<float>

Returns

float
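The Vector64 pairwise approach mentioned above can be sketched as follows. This is a guess at the shape of such a routine, using the real `AdvSimd.Add` and `AdvSimd.AddPairwise` intrinsics on 64-bit vectors; the class name is hypothetical and a fallback is included so the code also runs on non-ARM hardware.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

// Hypothetical helper for illustration; not part of DotCompute.
public static class NeonSumSketch
{
    public static float Sum(Vector128<float> v)
    {
        if (AdvSimd.IsSupported)
        {
            // Add the two 64-bit halves lane-wise: (x0 + x2, x1 + x3).
            Vector64<float> halves = AdvSimd.Add(v.GetLower(), v.GetUpper());

            // Pairwise add collapses the two lanes; both lanes hold the total.
            Vector64<float> total = AdvSimd.AddPairwise(halves, halves);
            return total.ToScalar();
        }

        // Not running on ARM: fall back to the portable helper.
        return Vector128.Sum(v);
    }
}
```

These Vector64 intrinsics live in `AdvSimd` rather than `AdvSimd.Arm64`, so they are also available on 32-bit ARM, which is one plausible reading of "maximum compatibility".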

HorizontalSumPortable<T>(Vector<T>)

Performs horizontal sum using Vector<T>, whose width is chosen by the runtime to match the SIMD capabilities of the current hardware. This makes the method portable across all platforms.

public static T HorizontalSumPortable<T>(Vector<T> vector) where T : struct, INumber<T>

Parameters

vector Vector<T>

Returns

T

Type Parameters

T
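Because `Vector<T>` has a hardware-dependent lane count, a portable horizontal sum has to be written against `Vector<T>.Count` rather than a fixed width. The sketch below is a minimal generic version with the same constraint as the signature above; the class name is hypothetical, and the built-in `Vector.Sum(vector)` is an existing alternative that does the same reduction.

```csharp
using System;
using System.Numerics;

// Hypothetical helper for illustration; not part of DotCompute.
public static class PortableSumSketch
{
    public static T Sum<T>(Vector<T> v) where T : struct, INumber<T>
    {
        T total = T.Zero;

        // Vector<T>.Count is 4, 8, or 16 for float depending on the hardware,
        // so the loop bound must not be hard-coded.
        for (int i = 0; i < Vector<T>.Count; i++)
        {
            total += v[i];
        }

        return total;
    }
}
```

Usage: `PortableSumSketch.Sum(new Vector<int>(3))` returns `3 * Vector<int>.Count`, whatever the lane count happens to be.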

ParallelTreeReduction(float[])

Sums an array with a parallel tree reduction that combines multithreading and SIMD operations. Intended for large arrays, where there is enough work to offset the cost of scheduling threads.

public static float ParallelTreeReduction(float[] data)

Parameters

data float[]

The input array to reduce.

Returns

float

The sum of all elements.

Remarks

Uses a three-phase approach for optimal performance:

  1. SIMD reduction phase: processes Vector256 chunks in parallel
  2. Horizontal reduction: combines SIMD results per thread
  3. Final reduction: combines thread results
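The three phases above can be sketched with `Parallel.For` and a per-thread `Vector256<float>` accumulator. This is an illustrative reconstruction under stated assumptions (hypothetical class name, a lock for the final combine, a scalar loop for the tail), not DotCompute's actual code; it requires .NET 7 or later.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of DotCompute.
public static class TreeReductionSketch
{
    public static float Sum(float[] data)
    {
        int width = Vector256<float>.Count;   // 8 floats per chunk
        int chunks = data.Length / width;
        object gate = new object();
        float total = 0f;

        Parallel.For(0, chunks,
            // Each worker thread starts with its own zeroed accumulator.
            () => Vector256<float>.Zero,
            // Phase 1: add one Vector256 chunk into the thread's accumulator.
            (i, _, acc) => acc + Vector256.Create(data, i * width),
            acc =>
            {
                // Phase 2: horizontal sum of this thread's accumulator.
                float partial = Vector256.Sum(acc);
                // Phase 3: combine the per-thread results.
                lock (gate) { total += partial; }
            });

        // Elements that do not fill a whole chunk are added one by one.
        for (int i = chunks * width; i < data.Length; i++)
        {
            total += data[i];
        }

        return total;
    }
}
```

Since the order of additions depends on how work is split across threads, the result for general floating-point data can vary slightly between runs and from a sequential sum.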

ParallelTreeReductionDouble(double[])

Performs a parallel tree reduction for double precision.

public static double ParallelTreeReductionDouble(double[] data)

Parameters

data double[]

Returns

double

ParallelTreeReductionInt(int[])

Performs a parallel tree reduction for integers. The result is returned as a long, which is wider than the int element type.

public static long ParallelTreeReductionInt(int[] data)

Parameters

data int[]

Returns

long
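The `long` return type presumably exists so that large inputs do not overflow a 32-bit total. One way to keep the accumulation itself in 64 bits is to widen each `Vector<int>` before adding, as in the sequential sketch below. It is an assumption about the technique, uses a hypothetical class name, and omits the parallel phase for brevity.

```csharp
using System;
using System.Numerics;

// Hypothetical helper for illustration; not part of DotCompute.
public static class WideningSumSketch
{
    public static long Sum(int[] data)
    {
        int width = Vector<int>.Count;
        Vector<long> acc = Vector<long>.Zero;
        int i = 0;

        for (; i + width <= data.Length; i += width)
        {
            // Widen int lanes to long lanes so the running sums cannot
            // wrap around at 32 bits.
            Vector.Widen(new Vector<int>(data, i), out Vector<long> low, out Vector<long> high);
            acc += low + high;
        }

        long total = Vector.Sum(acc);

        // Remaining elements that do not fill a whole vector.
        for (; i < data.Length; i++)
        {
            total += data[i];
        }

        return total;
    }
}
```

Summing 16 copies of `int.MaxValue` this way gives `16L * int.MaxValue`, a value that does not fit in an `int`.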