module testing.einsum.einsum_fct
Short summary#
module mlprodict.testing.einsum.einsum_fct
Main functions decomposing einsum computation into simpler functions.
Classes#
class | truncated documentation
---|---
CachedEinsum | Stores all the necessary information to cache the preprocessing of an einsum equation.
Functions#
function | truncated documentation
---|---
einsum | Proposes a new implementation of numpy.einsum. It does not allow expression using … and expects a right …
enumerate_cached_einsum | Enumerates all cached einsum functions.
optimize_decompose_einsum_equation | Proposes a new implementation of numpy.einsum. It does not allow expression using … and expects a right …
Static Methods#
staticmethod | truncated documentation
---|---
build_einsum | Creates an instance of CachedEinsum.
Methods#
method | truncated documentation
---|---
__call__ | Calls the runtime self.runtime_.
__repr__ | usual
build | Preprocesses the equation and builds whatever is necessary to compute the result of the einsum equation.
build_onnx_einsum | Builds an ONNX graph with a single einsum operator.
build_runtime | Builds the runtime associated to the equation self.equation_.
default_inputs | Returns default inputs (reshaped numpy.arange + 0.7i).
Documentation#
Main functions decomposing einsum computation into simpler functions.
- class mlprodict.testing.einsum.einsum_fct.CachedEinsum(equation, runtime='batch_dot', opset=None, optimize=False, dtype=<class 'numpy.float64'>, decompose=True, strategy=None, verbose=None, key=None)#
Bases:
object
Stores all the necessary information to cache the preprocessing of an einsum equation.
- Parameters
equation – numpy equation
runtime – see
einsum
opset – ONNX opset
optimize – finds the best letter permutation
dtype – dtype
decompose – to decompose Einsum operator or to keep it as is
key – key used to cache this class
strategy – optimization strategy
verbose – displays progress information
The class creates the following attributes:
- equation_: corresponding to the best equivalent equation
- graph_: the corresponding graph returned by function :func:`decompose_einsum_equation <mlprodict.testing.einsum.einsum_impl.decompose_einsum_equation>`
- onnx_: if a conversion to onnx is used, stores the onnx graph
- runtime_: a function used by __call__, calls the runtime
- __call__(*inputs)#
Calls the runtime self.runtime_.
- __init__(equation, runtime='batch_dot', opset=None, optimize=False, dtype=<class 'numpy.float64'>, decompose=True, strategy=None, verbose=None, key=None)#
- __repr__()#
usual
- _build_optimize()#
- _build_optimize_ml()#
- build()#
Preprocesses the equation and builds whatever is necessary to compute the result of the einsum equation.
- static build_einsum(equation, runtime, opset, optimize, dtype, decompose=True, strategy=None, verbose=None, key=None)#
Creates an instance of CachedEinsum.
- build_onnx_einsum(input_names)#
Builds an ONNX graph with a single einsum operator.
- build_runtime()#
Builds the runtime associated to the equation self.equation_.
- default_inputs(N=None)#
Returns default inputs (reshaped numpy.arange + 0.7i).
- Parameters
N – dimension (all dimensions have the same size)
If N is None, N is given a size depending on the number of letters to avoid spending too much time on optimization.
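A hypothetical reconstruction of what "reshaped numpy.arange + 0.7i" may mean, based only on the docstring (the helper name and shift are assumptions, not the library code):

```python
import numpy

# Hypothetical sketch of default_inputs (an assumption based on the docstring
# "reshaped numpy.arange + 0.7i"): input i is an arange reshaped to the
# desired shape and shifted by 0.7 * i so the inputs differ.
def make_default_input(shape, i):
    n = int(numpy.prod(shape))
    return (numpy.arange(n).astype(numpy.float64) + 0.7 * i).reshape(shape)

# two inputs for an equation such as "ab,bc->ac" with N=4
m1 = make_default_input((4, 4), 0)
m2 = make_default_input((4, 4), 1)
print(m1[0, :2], m2[0, :2])
```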
- mlprodict.testing.einsum.einsum_fct._einsum(equation, dtype, optimize=False, runtime='batch_dot', cache=True, opset=None, decompose=True, strategy=None, verbose=None)#
- mlprodict.testing.einsum.einsum_fct.einsum(equation, *inputs, optimize=False, runtime='batch_dot', cache=True, opset=None, decompose=True, strategy=None, verbose=None)#
Proposes a new implementation of numpy.einsum. It does not allow expressions using … and expects the equation to have a right member (an explicit output after ->).
- Parameters
equation – einsum equation
inputs – inputs
optimize – permutes all letters to find the best permutation
runtime – runtime used to compute the results once the computation graph is produced (see below)
cache – if True, the function stores the preprocessing done for a specific equation, the second call with the same equation is much faster
opset – ONNX opset to use for some runtimes
decompose – by default, the function decomposes the equation into more simple operators but it can keep the original ONNX einsum operator.
strategy – optimisation strategy (see below)
verbose – display progress if optimize is True
- Returns
einsum result
The available runtimes are:
- batch_dot: the runtime is apply_einsum_sequence,
- python: one ONNX graph executed with a python runtime,
- onnxruntime1: one ONNX graph executed with onnxruntime.
The optimisation strategy can be:
- None: the same runtime is used to find the best permutation of letters,
- 'ml': a machine learned model is used to predict the best permutation of letters; this model comes from notebook Infer operator computation cost.
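The permutation search behind optimize=True can be sketched as follows. This is an illustration of the idea only, not the library's code: every renaming of the equation's letters is generated as a candidate, and the fastest one would be kept.

```python
import itertools

# Sketch of the brute-force letter-permutation search (illustration only):
# rename the letters of the equation in every possible way; the optimizer
# would then time each candidate and keep the fastest.
equation = "cab,cd->ad"
letters = sorted(set(c for c in equation if c.isalpha()))
candidates = ["".join(dict(zip(letters, perm)).get(c, c) for c in equation)
              for perm in itertools.permutations(letters)]
print(len(candidates), candidates[0])
```

With four distinct letters there are 4! = 24 candidate equations to time.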
The function works in two steps:
- the first step analyses the equation to produce a computation graph; this graph can also be converted into ONNX,
- the second step runs the graph, whatever the graph is.
Further details are available in the documentation of function optimize_decompose_einsum_equation. The function works the same way as numpy.einsum:
<<<
import numpy
from mlprodict.testing.einsum import einsum

equation = "abc,cd->abd"
m1 = numpy.random.randn(2, 2, 2)
m2 = numpy.random.randn(2, 2)
np = numpy.einsum(equation, m1, m2)
print('numpy.einsum')
print(np)
print('mlprodict.testing.einsum')
mp = einsum(equation, m1, m2)
print(mp)
>>>
numpy.einsum
[[[-2.499 0.046] [-0.885 -0.078]] [[ 2.338 -0.146] [-1.562 -0.154]]]
mlprodict.testing.einsum
[[[-2.499 0.046] [-0.885 -0.078]] [[ 2.338 -0.146] [-1.562 -0.154]]]
In some cases, the einsum implementation can be optimized by looping on possible permutations:
<<<
import timeit
import numpy
from mlprodict.testing.einsum import einsum
from mlprodict.testing.einsum.einsum_fct import enumerate_cached_einsum

equation = "cab,cd->ad"
m1 = numpy.random.randn(20, 20, 20)
m2 = numpy.random.randn(20, 20)

print('numpy.einsum', timeit.timeit(
    'numpy.einsum(equation, m1, m2)', number=200, globals=globals()))

einsum(equation, m1, m2)
print('einsum', timeit.timeit(
    'einsum(equation, m1, m2)', number=200, globals=globals()))

einsum(equation, m1, m2, runtime='python')
print('einsum-python', timeit.timeit(
    'einsum(equation, m1, m2, runtime="python")', number=200, globals=globals()))

einsum(equation, m1, m2, runtime='onnxruntime1')
print('einsum-onnxruntime1', timeit.timeit(
    'einsum(equation, m1, m2, runtime="onnxruntime1")', number=200, globals=globals()))

einsum(equation, m1, m2, runtime='onnxruntime1', optimize=True, verbose=1)
print('einsum-onnxruntime1', timeit.timeit(
    'einsum(equation, m1, m2, runtime="onnxruntime1", optimize=True)',
    number=200, globals=globals()))

print("list of cached einsum equations")
for k, v in enumerate_cached_einsum():
    print(k, v.equation, v.equation_)
>>>
numpy.einsum 0.1758663970977068
einsum 0.18937530741095543
einsum-python 0.27371818013489246
einsum-onnxruntime1 0.40259831584990025
einsum-onnxruntime1 0.3921410646289587
list of cached einsum equations
('cab,cd->ad', 'batch_dot', None, False, dtype('float64'), True, None) cab,cd->ad cab,cd->ad
('cab,cd->ad', 'python', None, False, dtype('float64'), True, None) cab,cd->ad cab,cd->ad
('cab,cd->ad', 'onnxruntime1', None, False, dtype('float64'), True, None) cab,cd->ad cab,cd->ad
('cab,cd->ad', 'onnxruntime1', None, True, dtype('float64'), True, None) cab,cd->ad dcb,da->ca
The last example shows the time taken by every function:
<<<
import os
from pyquickhelper.pycode.profiling import profile
import numpy
from mlprodict.testing.einsum import einsum
from mlprodict.testing.einsum.einsum_fct import enumerate_cached_einsum
from mlprodict import __file__ as path

root = os.path.dirname(path)

equation = "cab,cd->ad"
m1 = numpy.random.randn(200, 20, 20)
m2 = numpy.random.randn(200, 20)

def clean(txt):
    txt = txt.replace(root, "mlprodict")
    return "\n".join(txt.split("\n")[:30])

def fct1():
    for i in range(100):
        einsum(equation, m1, m2, cache=False)

print("Profile cache with default runtime.")
res = profile(fct1)
print(root)
print(clean(res[1]))

def fct2():
    for i in range(100):
        einsum(equation, m1, m2, cache=False, runtime='python')

print("Profile cache with runtime='python'.")
res = profile(fct2)
print(root)
print(clean(res[1]))

def fct3():
    for i in range(100):
        einsum(equation, m1, m2, cache=True)

einsum(equation, m1, m2, cache=True)
print("Profile execution with default runtime.")
res = profile(fct3)
print(root)
print(clean(res[1]))

def fct4():
    for i in range(100):
        einsum(equation, m1, m2, cache=True, runtime='python')

einsum(equation, m1, m2, cache=True, runtime='python')
print("Profile execution with runtime='python'.")
res = profile(fct4)
print(root)
print(clean(res[1]))

def fct5():
    for i in range(100):
        einsum(equation, m1, m2, cache=True, runtime='onnxruntime1')

einsum(equation, m1, m2, cache=True, runtime='onnxruntime1')
print("Profile execution with runtime='onnxruntime1'.")
res = profile(fct5)
print(root)
print(clean(res[1]))
>>>
Profile cache with default runtime.
/var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_doc/sphinxdoc/source/mlprodict
133202 function calls (133002 primitive calls) in 0.703 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 0.703 0.703 <stdin>:27(fct1)
100 0.002 0.000 0.702 0.007 mlprodict/testing/einsum/einsum_fct.py:457(einsum)
100 0.000 0.000 0.503 0.005 mlprodict/testing/einsum/einsum_fct.py:379(optimize_decompose_einsum_equation)
100 0.000 0.000 0.502 0.005 mlprodict/testing/einsum/einsum_fct.py:357(_einsum)
100 0.001 0.000 0.502 0.005 mlprodict/testing/einsum/einsum_fct.py:339(build_einsum)
100 0.001 0.000 0.500 0.005 mlprodict/testing/einsum/einsum_fct.py:109(build)
100 0.001 0.000 0.499 0.005 mlprodict/testing/einsum/einsum_fct.py:275(build_runtime)
100 0.004 0.000 0.498 0.005 mlprodict/testing/einsum/einsum_impl.py:85(decompose_einsum_equation)
...

Profile cache with runtime='python'.
/var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_doc/sphinxdoc/source/mlprodict
924505 function calls (915071 primitive calls) in 4.645 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 4.659 4.659 <stdin>:36(fct2)
100 0.003 0.000 4.658 0.047 mlprodict/testing/einsum/einsum_fct.py:457(einsum)
100 0.000 0.000 4.363 0.044 mlprodict/testing/einsum/einsum_fct.py:379(optimize_decompose_einsum_equation)
100 0.001 0.000 4.363 0.044 mlprodict/testing/einsum/einsum_fct.py:357(_einsum)
100 0.002 0.000 4.362 0.044 mlprodict/testing/einsum/einsum_fct.py:339(build_einsum)
100 0.001 0.000 4.360 0.044 mlprodict/testing/einsum/einsum_fct.py:109(build)
100 0.022 0.000 4.359 0.044 mlprodict/testing/einsum/einsum_fct.py:275(build_runtime)
100 0.003 0.000 2.667 0.027 mlprodict/onnxrt/onnx_inference.py:103(__init__)
...

Profile execution with default runtime.
/var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_doc/sphinxdoc/source/mlprodict
35402 function calls in 0.196 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 0.196 0.196 <stdin>:46(fct3)
100 0.002 0.000 0.195 0.002 mlprodict/testing/einsum/einsum_fct.py:457(einsum)
100 0.000 0.000 0.192 0.002 mlprodict/testing/einsum/einsum_fct.py:327(__call__)
100 0.001 0.000 0.191 0.002 mlprodict/testing/einsum/einsum_fct.py:287(<lambda>)
100 0.001 0.000 0.190 0.002 mlprodict/testing/einsum/einsum_impl.py:165(apply_einsum_sequence)
100 0.009 0.000 0.189 0.002 mlprodict/testing/einsum/einsum_impl_classes.py:1217(apply_sequence)
...

Profile execution with runtime='python'.
/var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_doc/sphinxdoc/source/mlprodict
33702 function calls in 0.284 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 0.284 0.284 <stdin>:58(fct4)
100 0.002 0.000 0.283 0.003 mlprodict/testing/einsum/einsum_fct.py:457(einsum)
100 0.001 0.000 0.279 0.003 mlprodict/testing/einsum/einsum_fct.py:327(__call__)
100 0.002 0.000 0.279 0.003 mlprodict/testing/einsum/einsum_fct.py:303(<lambda>)
100 0.001 0.000 0.277 0.003 mlprodict/onnxrt/onnx_inference.py:781(run)
100 0.002 0.000 0.276 0.003 mlprodict/onnxrt/onnx_inference.py:299(_run_sequence_runtime_compiled)
...

Profile execution with runtime='onnxruntime1'.
/var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_doc/sphinxdoc/source/mlprodict
2202 function calls in 0.287 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 0.287 0.287 <stdin>:69(fct5)
100 0.003 0.000 0.286 0.003 mlprodict/testing/einsum/einsum_fct.py:457(einsum)
100 0.001 0.000 0.282 0.003 mlprodict/testing/einsum/einsum_fct.py:327(__call__)
100 0.002 0.000 0.281 0.003 mlprodict/testing/einsum/einsum_fct.py:303(<lambda>)
100 0.001 0.000 0.279 0.003 mlprodict/onnxrt/onnx_inference.py:781(run)
100 0.003 0.000 0.278 0.003 mlprodict/onnxrt/onnx_inference.py:1183(_run_whole_runtime)
100 0.275 0.003 0.275 0.003 mlprodict/onnxrt/ops_whole/session.py:98(run)
...
- mlprodict.testing.einsum.einsum_fct.enumerate_cached_einsum()#
Enumerates all cached einsum functions.
- mlprodict.testing.einsum.einsum_fct.optimize_decompose_einsum_equation(equation, dtype, optimize=False, runtime='batch_dot', cache=True, opset=None, decompose=True, strategy=None, verbose=None)#
Proposes a new implementation of numpy.einsum. It does not allow expressions using … and expects the equation to have a right member (an explicit output after ->).
- Parameters
equation – einsum equation
dtype – dtype
optimize – permutes all letters to find the best permutation
runtime – runtime used to compute the results once the computation graph is produced (see below)
cache – if True, the function stores the preprocessing done for a specific equation, the second call with the same equation is much faster
opset – ONNX opset to use for some runtimes
decompose – by default, the function decomposes the equation into more simple operators but it can keep the original ONNX einsum operator.
strategy – optimisation strategy (see below)
verbose – display progress if optimize is True
- Returns
an instance of CachedEinsum holding the preprocessed equation
The available runtimes are:
- batch_dot: the runtime is apply_einsum_sequence,
- python: one ONNX graph executed with a python runtime,
- onnxruntime1: one ONNX graph executed with onnxruntime.
The optimisation strategy can be:
- None: the same runtime is used to find the best permutation of letters,
- 'ml': a machine learned model is used to predict the best permutation of letters; this model comes from notebook Infer operator computation cost.
The function works in two steps:
- the first step analyses the equation to produce a computation graph; this graph can also be converted into ONNX,
- the second step runs the graph, whatever the graph is.
The function returns an object of type CachedEinsum which has the following members after optimization:
- equation_: corresponding to the best equivalent equation
- graph_: the corresponding graph returned by function :func:`decompose_einsum_equation <mlprodict.testing.einsum.einsum_impl.decompose_einsum_equation>`
- onnx_: if a conversion to onnx is used, stores the onnx graph
- runtime_: a function used by __call__, calls the runtime
- oinf_: an object of type OnnxInference
- timed_permutations_: memorizes the results of the optimization
<<<
import numpy
from mlprodict.testing.einsum import optimize_decompose_einsum_equation

seq_opt = optimize_decompose_einsum_equation(
    "bsnh,btnh->bnts", numpy.float64, strategy='ml', verbose=1,
    runtime="python", optimize=True)
print("best equation:", seq_opt.equation_)
>>>
best equation: bhts,bnts->btnh