module testing.einsum.einsum_fct#

Inheritance diagram of mlprodict.testing.einsum.einsum_fct

Short summary#

module mlprodict.testing.einsum.einsum_fct

Main functions decomposing einsum computation into simpler functions.

source on GitHub

Classes#

class

truncated documentation

CachedEinsum

Stores all the necessary information to cache the preprocessing of an einsum equation.

Functions#

function

truncated documentation

_einsum

einsum

Proposes a new implementation of numpy.einsum. It does not allow expressions using the ellipsis (...) and expects a right …

enumerate_cached_einsum

Enumerates all cached einsum functions.

optimize_decompose_einsum_equation

Proposes a new implementation of numpy.einsum. It does not allow expressions using the ellipsis (...) and expects a right …

Static Methods#

staticmethod

truncated documentation

build_einsum

Creates an instance of CachedEinsum.

Methods#

method

truncated documentation

__call__

Calls the runtime self.runtime_.

__init__

__repr__

usual

_build_optimize

_build_optimize_ml

build

Preprocesses the equation and builds whatever is necessary to compute the result of the einsum equation.

build_onnx_einsum

Builds an ONNX graph with a single einsum operator.

build_runtime

Builds the runtime associated to the equation self.equation_.

default_inputs

Returns default inputs (reshaped numpy.arange + 0.7i).

Documentation#

Main functions decomposing einsum computation into simpler functions.

source on GitHub

class mlprodict.testing.einsum.einsum_fct.CachedEinsum(equation, runtime='batch_dot', opset=None, optimize=False, dtype=<class 'numpy.float64'>, decompose=True, strategy=None, verbose=None, key=None)#

Bases: object

Stores all the necessary information to cache the preprocessing of an einsum equation.

Parameters
  • equation – numpy equation

  • runtime – see einsum

  • opset – ONNX opset

  • optimize – finds the best letter permutation

  • dtype – dtype

  • decompose – to decompose Einsum operator or to keep it as is

  • key – key used to cache this class

  • strategy – optimization strategy

  • verbose – displays progress information

The class creates the following attributes:

  • equation_ corresponding to the best equivalent equation

  • graph_: the corresponding graph returned by the function decompose_einsum_equation

  • onnx_: if a conversion to onnx is used, stores the onnx graph

  • runtime_: a function used by __call__, calls the runtime

source on GitHub

__call__(*inputs)#

Calls the runtime self.runtime_.

source on GitHub

__init__(equation, runtime='batch_dot', opset=None, optimize=False, dtype=<class 'numpy.float64'>, decompose=True, strategy=None, verbose=None, key=None)#
__repr__()#

usual

_build_optimize()#
_build_optimize_ml()#
build()#

Preprocesses the equation and builds whatever is necessary to compute the result of the einsum equation.

source on GitHub

static build_einsum(equation, runtime, opset, optimize, dtype, decompose=True, strategy=None, verbose=None, key=None)#

Creates an instance of CachedEinsum.

source on GitHub

build_onnx_einsum(input_names)#

Builds an ONNX graph with a single einsum operator.

source on GitHub

build_runtime()#

Builds the runtime associated to the equation self.equation_.

source on GitHub

default_inputs(N=None)#

Returns default inputs (reshaped numpy.arange + 0.7i).

Parameters

N – dimension (all dimensions have the same size)

If N is None, N is given a size depending on the number of letters to avoid spending too much time on optimization.

source on GitHub

mlprodict.testing.einsum.einsum_fct._einsum(equation, dtype, optimize=False, runtime='batch_dot', cache=True, opset=None, decompose=True, strategy=None, verbose=None)#
mlprodict.testing.einsum.einsum_fct.einsum(equation, *inputs, optimize=False, runtime='batch_dot', cache=True, opset=None, decompose=True, strategy=None, verbose=None)#

Proposes a new implementation of numpy.einsum. It does not allow expressions using the ellipsis (...) and expects a right member.

Parameters
  • equation – einsum equation

  • inputs – inputs

  • optimize – permutes all letters to find the best permutation

  • runtime – runtime used to compute the results once the computation graph is produced (see below)

  • cache – if True, the function stores the preprocessing done for a specific equation; the second call with the same equation is much faster

  • opset – ONNX opset to use for some runtimes

  • decompose – by default, the function decomposes the equation into simpler operators, but it can also keep the original ONNX Einsum operator.

  • strategy – optimisation strategy (see below)

  • verbose – display progress if optimize is True

Returns

einsum result

The available runtimes are:

  • batch_dot: the runtime is apply_einsum_sequence,

  • python: one ONNX graph executed with a python runtime,

  • onnxruntime1: one ONNX graph executed with onnxruntime.

The optimisation strategy can be:

  • None: the same runtime is used to find the best permutation of letters

  • ‘ml’: a machine learned model is used to predict the best permutation of letters; this model comes from the notebook Infer operator computation cost.

The function works in two steps:

  • the first step analyses the equation to produce a computation graph; this graph can also be converted into ONNX,

  • the second step runs the graph, whatever form the graph takes.

Further details are available in the documentation of function optimize_decompose_einsum_equation. The function works the same way as numpy.einsum:

<<<

import numpy
from mlprodict.testing.einsum import einsum

equation = "abc,cd->abd"

m1 = numpy.random.randn(2, 2, 2)
m2 = numpy.random.randn(2, 2)

np = numpy.einsum(equation, m1, m2)
print('numpy.einsum')
print(np)

print('mlprodict.testing.einsum')
mp = einsum(equation, m1, m2)
print(mp)

>>>

    numpy.einsum
    [[[-2.499  0.046]
      [-0.885 -0.078]]
    
     [[ 2.338 -0.146]
      [-1.562 -0.154]]]
    mlprodict.testing.einsum
    [[[-2.499  0.046]
      [-0.885 -0.078]]
    
     [[ 2.338 -0.146]
      [-1.562 -0.154]]]

In some cases, the einsum implementation can be optimized by looping over possible permutations:

<<<

import timeit
import numpy
from mlprodict.testing.einsum import einsum
from mlprodict.testing.einsum.einsum_fct import enumerate_cached_einsum

equation = "cab,cd->ad"

m1 = numpy.random.randn(20, 20, 20)
m2 = numpy.random.randn(20, 20)

print('numpy.einsum',
      timeit.timeit('numpy.einsum(equation, m1, m2)',
                    number=200,
                    globals=globals()))

einsum(equation, m1, m2)
print('einsum',
      timeit.timeit('einsum(equation, m1, m2)',
                    number=200,
                    globals=globals()))

einsum(equation, m1, m2, runtime='python')
print('einsum-python',
      timeit.timeit('einsum(equation, m1, m2, runtime="python")',
                    number=200,
                    globals=globals()))

einsum(equation, m1, m2, runtime='onnxruntime1')
print('einsum-onnxruntime1',
      timeit.timeit('einsum(equation, m1, m2, runtime="onnxruntime1")',
                    number=200,
                    globals=globals()))

einsum(equation, m1, m2, runtime='onnxruntime1', optimize=True, verbose=1)
print('einsum-onnxruntime1',
      timeit.timeit('einsum(equation, m1, m2, runtime="onnxruntime1", optimize=True)',
                    number=200,
                    globals=globals()))

print("list of cached einsum equations")
for k, v in enumerate_cached_einsum():
    print(k, v.equation, v.equation_)

>>>

    numpy.einsum 0.1758663970977068
    einsum 0.18937530741095543
    einsum-python 0.27371818013489246
    einsum-onnxruntime1 0.40259831584990025
    einsum-onnxruntime1 0.3921410646289587
    list of cached einsum equations
    ('cab,cd->ad', 'batch_dot', None, False, dtype('float64'), True, None) cab,cd->ad cab,cd->ad
    ('cab,cd->ad', 'python', None, False, dtype('float64'), True, None) cab,cd->ad cab,cd->ad
    ('cab,cd->ad', 'onnxruntime1', None, False, dtype('float64'), True, None) cab,cd->ad cab,cd->ad
    ('cab,cd->ad', 'onnxruntime1', None, True, dtype('float64'), True, None) cab,cd->ad dcb,da->ca
    [runpythonerror: tqdm progress output from the optimization over 25 permutations, converging to rtbest='dcb,da->ca']

The last example shows the time taken by every function:

<<<

import os
from pyquickhelper.pycode.profiling import profile
import numpy
from mlprodict.testing.einsum import einsum
from mlprodict.testing.einsum.einsum_fct import enumerate_cached_einsum
from mlprodict import __file__ as path

root = os.path.dirname(path)

equation = "cab,cd->ad"

m1 = numpy.random.randn(200, 20, 20)
m2 = numpy.random.randn(200, 20)


def clean(txt):
    txt = txt.replace(root, "mlprodict")
    return "\n".join(txt.split("\n")[:30])


def fct1():
    for i in range(100):
        einsum(equation, m1, m2, cache=False)


print("Profile cache with default runtime.")
res = profile(fct1)
print(root)
print(clean(res[1]))


def fct2():
    for i in range(100):
        einsum(equation, m1, m2, cache=False, runtime='python')


print("Profile cache with runtime='python'.")
res = profile(fct2)
print(root)
print(clean(res[1]))


def fct3():
    for i in range(100):
        einsum(equation, m1, m2, cache=True)


einsum(equation, m1, m2, cache=True)
print("Profile execution with default runtime.")
res = profile(fct3)
print(root)
print(clean(res[1]))


def fct4():
    for i in range(100):
        einsum(equation, m1, m2, cache=True, runtime='python')


einsum(equation, m1, m2, cache=True, runtime='python')
print("Profile execution with runtime='python'.")
res = profile(fct4)
print(root)
print(clean(res[1]))


def fct5():
    for i in range(100):
        einsum(equation, m1, m2, cache=True, runtime='onnxruntime1')


einsum(equation, m1, m2, cache=True, runtime='onnxruntime1')
print("Profile execution with runtime='onnxruntime1'.")
res = profile(fct5)
print(root)
print(clean(res[1]))

>>>

    Profile cache with default runtime.
    /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_doc/sphinxdoc/source/mlprodict
             133202 function calls (133002 primitive calls) in 0.703 seconds
    
       Ordered by: cumulative time
    
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.001    0.001    0.703    0.703 <stdin>:27(fct1)
          100    0.002    0.000    0.702    0.007 mlprodict/testing/einsum/einsum_fct.py:457(einsum)
          100    0.000    0.000    0.503    0.005 mlprodict/testing/einsum/einsum_fct.py:379(optimize_decompose_einsum_equation)
          100    0.000    0.000    0.502    0.005 mlprodict/testing/einsum/einsum_fct.py:357(_einsum)
          100    0.001    0.000    0.502    0.005 mlprodict/testing/einsum/einsum_fct.py:339(build_einsum)
          100    0.001    0.000    0.500    0.005 mlprodict/testing/einsum/einsum_fct.py:109(build)
          100    0.001    0.000    0.499    0.005 mlprodict/testing/einsum/einsum_fct.py:275(build_runtime)
          100    0.004    0.000    0.498    0.005 mlprodict/testing/einsum/einsum_impl.py:85(decompose_einsum_equation)
          100    0.063    0.001    0.434    0.004 mlprodict/testing/einsum/einsum_impl.py:411(_decompose_einsum_equation_simple)
          100    0.000    0.000    0.197    0.002 mlprodict/testing/einsum/einsum_fct.py:327(__call__)
          100    0.001    0.000    0.196    0.002 mlprodict/testing/einsum/einsum_fct.py:287(<lambda>)
          100    0.001    0.000    0.195    0.002 mlprodict/testing/einsum/einsum_impl.py:165(apply_einsum_sequence)
          100    0.009    0.000    0.194    0.002 mlprodict/testing/einsum/einsum_impl_classes.py:1217(apply_sequence)
         1200    0.012    0.000    0.184    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:611(apply)
         1200    0.025    0.000    0.175    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:334(compute_output_row)
         4800    0.019    0.000    0.098    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:22(single_axes)
         1600    0.009    0.000    0.096    0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
         3800    0.078    0.000    0.078    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:38(<listcomp>)
         1900    0.073    0.000    0.073    0.000 {method 'reduce' of 'numpy.ufunc' objects}
          100    0.010    0.000    0.068    0.001 mlprodict/testing/einsum/einsum_impl_classes.py:504(_apply_batch_dot)
          500    0.008    0.000    0.058    0.000 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py:69(_wrapreduction)
          500    0.040    0.000    0.057    0.000 mlprodict/testing/einsum/einsum_impl.py:227(_apply_transpose_reshape)
          100    0.002    0.000    0.042    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:573(_apply_reduce_sum)
          100    0.001    0.000    0.038    0.000 <__array_function__ internals>:2(sum)
          100    0.001    0.000    0.037    0.000 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py:2123(sum)
    Profile cache with runtime='python'.
    /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_doc/sphinxdoc/source/mlprodict
             924505 function calls (915071 primitive calls) in 4.645 seconds
    
       Ordered by: cumulative time
    
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.001    0.001    4.659    4.659 <stdin>:36(fct2)
          100    0.003    0.000    4.658    0.047 mlprodict/testing/einsum/einsum_fct.py:457(einsum)
          100    0.000    0.000    4.363    0.044 mlprodict/testing/einsum/einsum_fct.py:379(optimize_decompose_einsum_equation)
          100    0.001    0.000    4.363    0.044 mlprodict/testing/einsum/einsum_fct.py:357(_einsum)
          100    0.002    0.000    4.362    0.044 mlprodict/testing/einsum/einsum_fct.py:339(build_einsum)
          100    0.001    0.000    4.360    0.044 mlprodict/testing/einsum/einsum_fct.py:109(build)
          100    0.022    0.000    4.359    0.044 mlprodict/testing/einsum/einsum_fct.py:275(build_runtime)
          100    0.003    0.000    2.667    0.027 mlprodict/onnxrt/onnx_inference.py:103(__init__)
          100    0.077    0.001    2.663    0.027 mlprodict/onnxrt/onnx_inference.py:180(_init)
         2800    0.054    0.000    1.422    0.001 mlprodict/onnxrt/onnx_inference_node.py:186(setup_runtime)
         2800    0.051    0.000    1.320    0.000 mlprodict/onnxrt/ops.py:9(load_op)
          100    0.040    0.000    1.166    0.012 mlprodict/testing/einsum/einsum_impl_classes.py:1476(to_onnx)
        171/1    0.003    0.000    0.882    0.882 <frozen importlib._bootstrap>:1002(_find_and_load)
        171/1    0.003    0.000    0.882    0.882 <frozen importlib._bootstrap>:967(_find_and_load_unlocked)
        171/1    0.003    0.000    0.881    0.881 <frozen importlib._bootstrap>:659(_load_unlocked)
        158/1    0.001    0.000    0.881    0.881 <frozen importlib._bootstrap_external>:784(exec_module)
        185/1    0.000    0.000    0.881    0.881 <frozen importlib._bootstrap>:220(_call_with_frames_removed)
        159/1    0.001    0.000    0.881    0.881 {built-in method builtins.exec}
            1    0.000    0.000    0.881    0.881 mlprodict/onnxrt/ops_cpu/__init__.py:2(<module>)
            1    0.006    0.006    0.860    0.860 mlprodict/onnxrt/ops_cpu/_op_list.py:3(<module>)
          100    0.218    0.002    0.757    0.008 mlprodict/onnxrt/onnx_inference.py:510(to_sequence)
         5000    0.040    0.000    0.523    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:964(to_onnx)
    9000/8102    0.045    0.000    0.520    0.000 {method 'join' of 'str' objects}
          100    0.004    0.000    0.503    0.005 mlprodict/testing/einsum/einsum_impl.py:85(decompose_einsum_equation)
          151    0.008    0.000    0.483    0.003 mlprodict/onnxrt/doc/doc_helper.py:152(get_rst_doc)
    Profile execution with default runtime.
    /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_doc/sphinxdoc/source/mlprodict
             35402 function calls in 0.196 seconds
    
       Ordered by: cumulative time
    
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.001    0.001    0.196    0.196 <stdin>:46(fct3)
          100    0.002    0.000    0.195    0.002 mlprodict/testing/einsum/einsum_fct.py:457(einsum)
          100    0.000    0.000    0.192    0.002 mlprodict/testing/einsum/einsum_fct.py:327(__call__)
          100    0.001    0.000    0.191    0.002 mlprodict/testing/einsum/einsum_fct.py:287(<lambda>)
          100    0.001    0.000    0.190    0.002 mlprodict/testing/einsum/einsum_impl.py:165(apply_einsum_sequence)
          100    0.009    0.000    0.189    0.002 mlprodict/testing/einsum/einsum_impl_classes.py:1217(apply_sequence)
         1200    0.011    0.000    0.179    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:611(apply)
         1400    0.006    0.000    0.091    0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
          100    0.010    0.000    0.067    0.001 mlprodict/testing/einsum/einsum_impl_classes.py:504(_apply_batch_dot)
          500    0.008    0.000    0.060    0.000 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py:69(_wrapreduction)
          500    0.049    0.000    0.049    0.000 {method 'reduce' of 'numpy.ufunc' objects}
          100    0.002    0.000    0.043    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:573(_apply_reduce_sum)
          100    0.001    0.000    0.039    0.000 <__array_function__ internals>:2(sum)
          100    0.001    0.000    0.038    0.000 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py:2123(sum)
          400    0.002    0.000    0.029    0.000 <__array_function__ internals>:2(prod)
          400    0.003    0.000    0.026    0.000 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py:2933(prod)
          200    0.004    0.000    0.024    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:418(_apply_expand_dims)
          300    0.001    0.000    0.019    0.000 <__array_function__ internals>:2(expand_dims)
          400    0.005    0.000    0.019    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:430(_apply_transpose)
          100    0.017    0.000    0.018    0.000 mlprodict/testing/einsum/blas_lapack.py:96(gemm_dot)
          300    0.006    0.000    0.016    0.000 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/numpy/lib/shape_base.py:512(expand_dims)
          400    0.001    0.000    0.008    0.000 <__array_function__ internals>:2(transpose)
         1300    0.004    0.000    0.008    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:379(_get_data)
          100    0.002    0.000    0.006    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:598(_apply_squeeze)
         2000    0.006    0.000    0.006    0.000 {built-in method builtins.getattr}
    Profile execution with runtime='python'.
    /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_doc/sphinxdoc/source/mlprodict
             33702 function calls in 0.284 seconds
    
       Ordered by: cumulative time
    
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.001    0.001    0.284    0.284 <stdin>:58(fct4)
          100    0.002    0.000    0.283    0.003 mlprodict/testing/einsum/einsum_fct.py:457(einsum)
          100    0.001    0.000    0.279    0.003 mlprodict/testing/einsum/einsum_fct.py:327(__call__)
          100    0.002    0.000    0.279    0.003 mlprodict/testing/einsum/einsum_fct.py:303(<lambda>)
          100    0.001    0.000    0.277    0.003 mlprodict/onnxrt/onnx_inference.py:781(run)
          100    0.002    0.000    0.276    0.003 mlprodict/onnxrt/onnx_inference.py:299(_run_sequence_runtime_compiled)
          100    0.013    0.000    0.274    0.003 <string>:1(compiled_run)
         2100    0.028    0.000    0.097    0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
          600    0.026    0.000    0.059    0.000 mlprodict/onnxrt/ops_cpu/op_gather.py:29(_run)
          100    0.003    0.000    0.045    0.000 mlprodict/onnxrt/ops_cpu/op_reduce_sum.py:64(_run)
          100    0.001    0.000    0.041    0.000 <__array_function__ internals>:2(sum)
          100    0.001    0.000    0.040    0.000 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py:2123(sum)
          100    0.002    0.000    0.038    0.000 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py:69(_wrapreduction)
          100    0.036    0.000    0.036    0.000 {method 'reduce' of 'numpy.ufunc' objects}
          300    0.001    0.000    0.035    0.000 mlprodict/onnxrt/ops_cpu/op_reshape.py:38(_run)
          300    0.013    0.000    0.034    0.000 mlprodict/onnxrt/ops_cpu/op_reshape.py:15(reshape_reference_implementation)
          600    0.009    0.000    0.033    0.000 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/numpy/core/_dtype.py:34(__str__)
          400    0.003    0.000    0.033    0.000 mlprodict/onnxrt/ops_cpu/op_identity.py:18(_run)
          200    0.009    0.000    0.029    0.000 mlprodict/onnxrt/ops_cpu/op_unsqueeze.py:65(_run)
          400    0.029    0.000    0.029    0.000 {method 'copy' of 'numpy.ndarray' objects}
          600    0.008    0.000    0.024    0.000 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/numpy/core/_dtype.py:321(_name_get)
          200    0.002    0.000    0.019    0.000 <__array_function__ internals>:2(expand_dims)
          100    0.000    0.000    0.018    0.000 mlprodict/onnxrt/ops_cpu/op_gemm.py:57(_run)
          100    0.000    0.000    0.018    0.000 mlprodict/onnxrt/ops_cpu/op_gemm.py:27(<lambda>)
          100    0.003    0.000    0.017    0.000 mlprodict/onnxrt/ops_cpu/op_gemm.py:36(_gemm01)
    Profile execution with runtime='onnxruntime1'.
    /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_doc/sphinxdoc/source/mlprodict
             2202 function calls in 0.287 seconds
    
       Ordered by: cumulative time
    
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.001    0.001    0.287    0.287 <stdin>:69(fct5)
          100    0.003    0.000    0.286    0.003 mlprodict/testing/einsum/einsum_fct.py:457(einsum)
          100    0.001    0.000    0.282    0.003 mlprodict/testing/einsum/einsum_fct.py:327(__call__)
          100    0.002    0.000    0.281    0.003 mlprodict/testing/einsum/einsum_fct.py:303(<lambda>)
          100    0.001    0.000    0.279    0.003 mlprodict/onnxrt/onnx_inference.py:781(run)
          100    0.003    0.000    0.278    0.003 mlprodict/onnxrt/onnx_inference.py:1183(_run_whole_runtime)
          100    0.275    0.003    0.275    0.003 mlprodict/onnxrt/ops_whole/session.py:98(run)
          100    0.000    0.000    0.001    0.000 mlprodict/testing/einsum/einsum_fct.py:379(optimize_decompose_einsum_equation)
          100    0.001    0.000    0.001    0.000 mlprodict/testing/einsum/einsum_fct.py:357(_einsum)
          100    0.000    0.000    0.000    0.000 mlprodict/onnxrt/onnx_inference.py:1255(<dictcomp>)
          300    0.000    0.000    0.000    0.000 mlprodict/testing/einsum/einsum_fct.py:655(<genexpr>)
          100    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
          200    0.000    0.000    0.000    0.000 {built-in method builtins.hasattr}
          100    0.000    0.000    0.000    0.000 mlprodict/testing/einsum/einsum_fct.py:304(<dictcomp>)
          200    0.000    0.000    0.000    0.000 {built-in method builtins.len}
          100    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
          100    0.000    0.000    0.000    0.000 {method 'values' of 'dict' objects}
          100    0.000    0.000    0.000    0.000 {built-in method builtins.iter}
          100    0.000    0.000    0.000    0.000 {built-in method builtins.next}
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

source on GitHub

mlprodict.testing.einsum.einsum_fct.enumerate_cached_einsum()#

Enumerates all cached einsum functions.

source on GitHub

mlprodict.testing.einsum.einsum_fct.optimize_decompose_einsum_equation(equation, dtype, optimize=False, runtime='batch_dot', cache=True, opset=None, decompose=True, strategy=None, verbose=None)#

Proposes a new implementation of numpy.einsum. It does not allow expressions using the ellipsis (...) and expects a right member.

Parameters
  • equation – einsum equation

  • dtype – dtype

  • optimize – permutes all letters to find the best permutation

  • runtime – runtime used to compute the results once the computation graph is produced (see below)

  • cache – if True, the function stores the preprocessing done for a specific equation; the second call with the same equation is much faster

  • opset – ONNX opset to use for some runtimes

  • decompose – by default, the function decomposes the equation into simpler operators, but it can also keep the original ONNX Einsum operator.

  • strategy – optimisation strategy (see below)

  • verbose – display progress if optimize is True

Returns

an instance of CachedEinsum

The available runtimes are:

  • batch_dot: the runtime is apply_einsum_sequence,

  • python: one ONNX graph executed with a python runtime,

  • onnxruntime1: one ONNX graph executed with onnxruntime.

The optimisation strategy can be:

  • None: the same runtime is used to find the best permutation of letters

  • ‘ml’: a machine learned model is used to predict the best permutation of letters; this model comes from the notebook Infer operator computation cost.

The function works in two steps:

  • the first step analyses the equation to produce a computation graph; this graph can also be converted into ONNX,

  • the second step runs the graph, whatever form the graph takes.

The function returns an object of type CachedEinsum which has the following members after optimization:

  • equation_ corresponding to the best equivalent equation

  • graph_: the corresponding graph returned by the function decompose_einsum_equation

  • onnx_: if a conversion to onnx is used, stores the onnx graph

  • runtime_: a function used by __call__, calls the runtime

  • oinf_: an object of type OnnxInference

  • timed_permutations_: memorizes the results of the optimization

<<<

import numpy
from mlprodict.testing.einsum import optimize_decompose_einsum_equation

seq_opt = optimize_decompose_einsum_equation(
    "bsnh,btnh->bnts", numpy.float64, strategy='ml', verbose=1,
    runtime="python", optimize=True)

print("best equation:", seq_opt.equation_)

>>>

    
    [stderr: tqdm progress output from the 'ml' optimization over 121 permutations, converging to mlbest='bhts,bnts->btnh']
    best equation: bhts,bnts->btnh

source on GitHub