validation#

C API#

_validation#

class onnx_extended.validation.cpu._validation.ElementTime#
onnx_extended.validation.cpu._validation.benchmark_cache(size: int, verbose: bool = True) float#

Runs a benchmark to measure cache performance. The function measures the time for N random accesses in an array of size N and returns the time divided by N. It copies random elements taken from an array of size elements to random positions in another array of the same size, repeats that size times, and returns the average time per move. See example Measuring CPU performance.

Parameters:

size – array size

Returns:

average time per move
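
A minimal usage sketch (not part of the original documentation), assuming the compiled CPU extension is installed; it prints the average move time for growing array sizes.

    import onnx_extended.validation.cpu._validation as _validation

    # Average time per random move for increasing array sizes:
    # once the array no longer fits in the CPU caches, the time per move grows.
    for size in (2**10, 2**15, 2**20, 2**24):
        t = _validation.benchmark_cache(size, verbose=False)
        print(f"size={size:>10} average time per move={t:.3g}")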

onnx_extended.validation.cpu._validation.benchmark_cache_tree(n_rows: int = 100000, n_features: int = 50, n_trees: int = 200, tree_size: int = 4096, max_depth: int = 10, search_step: int = 64) List[onnx_extended.validation.cpu._validation.ElementTime]#

Simulates the prediction of a random forest. Returns the time taken by every row for a function doing random additions between an element from a short buffer and another one taken from a list of trees. See example Measuring CPU performance.

Parameters:
  • n_rows – number of rows of the whole batch

  • n_features – number of features

  • n_trees – number of trees

  • tree_size – size of a tree (= number of nodes * sizeof(node) / sizeof(float))

  • max_depth – depth of a tree

  • search_step – evaluate every…

Returns:

array of times, one for every row
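
A hedged sketch of how the returned list might be inspected; the attributes exposed by ElementTime are not documented above, so the example only looks at the list itself.

    from onnx_extended.validation.cpu._validation import benchmark_cache_tree

    # Simulate a small random forest prediction; one ElementTime per measurement.
    times = benchmark_cache_tree(
        n_rows=10_000, n_features=50, n_trees=100, tree_size=4096,
        max_depth=10, search_step=64,
    )
    print(len(times), "measurements of type", type(times[0]).__name__)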

onnx_extended.validation.cpu._validation.vector_add(v1: numpy.ndarray[numpy.float32], v2: numpy.ndarray[numpy.float32]) numpy.ndarray[numpy.float32]#

Computes the addition of two vectors of any dimension. It assumes both vectors have the same shape (no broadcasting).

Parameters:
  • v1 – first vector

  • v2 – second vector

Returns:

new vector
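
A short sketch, assuming the extension is built; since there is no broadcasting, the result should match numpy's elementwise addition.

    import numpy as np
    from onnx_extended.validation.cpu._validation import vector_add

    v1 = np.random.rand(4, 5).astype(np.float32)
    v2 = np.random.rand(4, 5).astype(np.float32)
    res = vector_add(v1, v2)
    # Same shapes, no broadcasting: identical to numpy elementwise addition.
    assert np.allclose(res, v1 + v2)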

onnx_extended.validation.cpu._validation.vector_sum(n_columns: int, values: List[float], by_rows: bool) float#

Computes the sum of all elements in an array by rows or by columns. This function is slower than vector_sum_array because it copies the data from the array into a std::vector. This copy (and the allocation it requires) costs more than the computation itself.

Parameters:
  • n_columns – number of columns

  • values – all values in an array

  • by_rows – by rows or by columns

Returns:

sum of all elements
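
A minimal sketch; it assumes values is the flattened array in row-major order and that n_columns gives the row length (suggested by the signature, not stated explicitly above).

    import numpy as np
    from onnx_extended.validation.cpu._validation import vector_sum

    array = np.random.rand(8, 4).astype(np.float32)
    # The binding expects a plain list of floats plus the number of columns.
    total = vector_sum(4, array.ravel().tolist(), True)  # by_rows=True
    assert abs(total - float(array.sum())) < 1e-3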

onnx_extended.validation.cpu._validation.vector_sum_array(n_columns: int, values: numpy.ndarray[numpy.float32], by_rows: bool) float#

Computes the sum of all elements in an array by rows or by columns.

Parameters:
  • n_columns – number of columns

  • values – all values in an array

  • by_rows – by rows or by columns

Returns:

sum of all elements

onnx_extended.validation.cpu._validation.vector_sum_array_parallel(n_columns: int, values: numpy.ndarray[numpy.float32], by_rows: bool) float#

Computes the sum of all elements in an array by rows or by columns. The computation is parallelized.

Parameters:
  • n_columns – number of columns

  • values – all values in an array

  • by_rows – by rows or by columns

Returns:

sum of all elements

onnx_extended.validation.cpu._validation.vector_sum_array_avx(n_columns: int, values: numpy.ndarray[numpy.float32]) float#

Computes the sum of all elements in an array by rows or by columns. The computation uses AVX instructions (see AVX API).

Parameters:
  • n_columns – number of columns

  • values – all values in an array

Returns:

sum of all elements

onnx_extended.validation.cpu._validation.vector_sum_array_avx_parallel(n_columns: int, values: numpy.ndarray[numpy.float32]) float#

Computes the sum of all elements in an array by rows or by columns. The computation uses AVX instructions and parallelization (see AVX API).

Parameters:
  • n_columns – number of columns

  • values – all values in an array

Returns:

sum of all elements
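
The four array variants share the same contract, so a single sketch can compare them; note the AVX versions take no by_rows argument. This is an illustration under the same row-major assumption as above, not a benchmark from the original documentation.

    import numpy as np
    from onnx_extended.validation.cpu._validation import (
        vector_sum_array,
        vector_sum_array_parallel,
        vector_sum_array_avx,
        vector_sum_array_avx_parallel,
    )

    n_columns = 128
    values = np.random.rand(10_000 * n_columns).astype(np.float32)

    s_plain = vector_sum_array(n_columns, values, True)
    s_par = vector_sum_array_parallel(n_columns, values, True)
    s_avx = vector_sum_array_avx(n_columns, values)            # no by_rows argument
    s_avx_par = vector_sum_array_avx_parallel(n_columns, values)
    # All four should agree up to float32 rounding differences.
    print(s_plain, s_par, s_avx, s_avx_par)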

cuda_example_py#

onnx_extended.validation.cuda.cuda_example_py.vector_add(v1: numpy.ndarray[numpy.float32], v2: numpy.ndarray[numpy.float32], cuda_device: int = 0) numpy.ndarray[numpy.float32]#

Computes the addition of two vectors of the same size with CUDA.

Parameters:
  • v1 – array

  • v2 – array

  • cuda_device – device id (if multiple devices are available)

Returns:

addition of the two arrays
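
A sketch assuming a CUDA-enabled build and at least one visible GPU; cuda_device keeps its default value of 0.

    import numpy as np
    from onnx_extended.validation.cuda.cuda_example_py import vector_add

    v1 = np.random.rand(2**16).astype(np.float32)
    v2 = np.random.rand(2**16).astype(np.float32)
    res = vector_add(v1, v2, cuda_device=0)
    # The GPU result should match the CPU addition up to float32 precision.
    assert np.allclose(res, v1 + v2, atol=1e-5)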

onnx_extended.validation.cuda.cuda_example_py.vector_sum0(vect: numpy.ndarray[numpy.float32], max_threads: int = 256, cuda_device: int = 0) float#

Computes the sum of all coefficients with CUDA. Naive method.

Parameters:
  • vect – array

  • max_threads – number of threads to use (it must be a power of 2)

  • cuda_device – device id (if multiple devices are available)

Returns:

sum

onnx_extended.validation.cuda.cuda_example_py.vector_sum6(vect: numpy.ndarray[numpy.float32], max_threads: int = 256, cuda_device: int = 0) float#

Computes the sum of all coefficients with CUDA. More efficient method.

Parameters:
  • vect – array

  • max_threads – number of threads to use (it must be a power of 2)

  • cuda_device – device id (if multiple devices are available)

Returns:

sum
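
A comparison sketch for the two CUDA reductions, again assuming a CUDA-enabled build; both should return the same value, vector_sum6 being the more efficient kernel.

    import numpy as np
    from onnx_extended.validation.cuda.cuda_example_py import vector_sum0, vector_sum6

    vect = np.random.rand(2**20).astype(np.float32)
    s_naive = vector_sum0(vect, max_threads=256, cuda_device=0)
    s_fast = vector_sum6(vect, max_threads=256, cuda_device=0)
    # Both reductions compute the same sum; only the kernel strategy differs.
    print(s_naive, s_fast, float(vect.sum()))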

vector_function_cy#

onnx_extended.validation.cython.vector_function_cy.vector_add_c()#

Computes the addition of two tensors of the same shape.

Parameters:
  • v1 – first tensor

  • v2 – second tensor

Returns:

result.
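
The Cython signature is not rendered above; based on the documented parameters, the sketch below assumes two float32 tensors of the same shape.

    import numpy as np
    from onnx_extended.validation.cython.vector_function_cy import vector_add_c

    v1 = np.random.rand(3, 4).astype(np.float32)
    v2 = np.random.rand(3, 4).astype(np.float32)
    res = vector_add_c(v1, v2)
    assert np.allclose(res, v1 + v2)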