module testing.einsum.einsum_fct
Short summary#
module mlprodict.testing.einsum.einsum_fct
Main functions decomposing einsum computation into simpler functions.
Classes#
class | truncated documentation
---|---
CachedEinsum | Stores all the necessary information to cache the preprocessing of an einsum equation.
Functions#
function | truncated documentation
---|---
einsum | Proposes a new implementation of numpy.einsum. It does not allow expression using … and expects a right …
enumerate_cached_einsum | Enumerates all cached einsum functions.
optimize_decompose_einsum_equation | Proposes a new implementation of numpy.einsum. It does not allow expression using … and expects a right …
Static Methods#
staticmethod | truncated documentation
---|---
build_einsum | Creates an instance of CachedEinsum.
Methods#
method | truncated documentation
---|---
__call__ | Calls the runtime self.runtime_.
__repr__ | usual
build | Preprocesses the equation and builds whatever is necessary to compute the result of the einsum equation.
build_onnx_einsum | Builds an ONNX graph with a single einsum operator.
build_runtime | Builds the runtime associated to the equation self.equation_.
default_inputs | Returns default inputs (reshaped numpy.arange + 0.7i).
Documentation#
Main functions decomposing einsum computation into simpler functions.
- class mlprodict.testing.einsum.einsum_fct.CachedEinsum(equation, runtime='batch_dot', opset=None, optimize=False, dtype=<class 'numpy.float64'>, decompose=True, strategy=None, verbose=None, key=None)#
Bases:
object
Stores all the necessary information to cache the preprocessing of an einsum equation.
- Parameters
equation – numpy equation
runtime – see
einsum
opset – ONNX opset
optimize – finds the best letter permutation
dtype – dtype
decompose – to decompose Einsum operator or to keep it as is
key – key used to cache this class
strategy – optimization strategy
verbose – displays progress information
The class creates the following attributes:
- equation_: corresponding to the best equivalent equation
- graph_: the corresponding graph returned by function :func:`decompose_einsum_equation <mlprodict.testing.einsum.einsum_impl.decompose_einsum_equation>`
- onnx_: if a conversion to onnx is used, stores the onnx graph
- runtime_: a function used by __call__, calls the runtime
- __call__(*inputs)#
Calls the runtime self.runtime_.
- __init__(equation, runtime='batch_dot', opset=None, optimize=False, dtype=<class 'numpy.float64'>, decompose=True, strategy=None, verbose=None, key=None)#
- __repr__()#
usual
- _build_optimize()#
- _build_optimize_ml()#
- build()#
Preprocesses the equation and builds whatever is necessary to compute the result of the einsum equation.
- static build_einsum(equation, runtime, opset, optimize, dtype, decompose=True, strategy=None, verbose=None, key=None)#
Creates an instance of CachedEinsum.
- build_onnx_einsum(input_names)#
Builds an ONNX graph with a single einsum operator.
- build_runtime()#
Builds the runtime associated to the equation self.equation_.
- default_inputs(N=None)#
Returns default inputs (reshaped numpy.arange + 0.7i).
- Parameters
N – dimension (all dimensions have the same size)
If N is None, N is given a size depending on the number of letters to avoid spending too much time on optimization.
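A hypothetical reconstruction of what "reshaped numpy.arange + 0.7i" may mean, based only on the docstring (the helper name and shift are assumptions, not the library code):

```python
import numpy

# Hypothetical sketch of default_inputs (an assumption based on the docstring
# "reshaped numpy.arange + 0.7i"): input i is an arange reshaped to the
# desired shape and shifted by 0.7 * i so the inputs differ.
def make_default_input(shape, i):
    n = int(numpy.prod(shape))
    return (numpy.arange(n).astype(numpy.float64) + 0.7 * i).reshape(shape)

# two inputs for an equation such as "ab,bc->ac" with N=4
m1 = make_default_input((4, 4), 0)
m2 = make_default_input((4, 4), 1)
print(m1[0, :2], m2[0, :2])
```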
- mlprodict.testing.einsum.einsum_fct._einsum(equation, dtype, optimize=False, runtime='batch_dot', cache=True, opset=None, decompose=True, strategy=None, verbose=None)#
- mlprodict.testing.einsum.einsum_fct.einsum(equation, *inputs, optimize=False, runtime='batch_dot', cache=True, opset=None, decompose=True, strategy=None, verbose=None)#
Proposes a new implementation of numpy.einsum. It does not allow expressions using … and expects the equation to have a right member (an explicit output after ->).
- Parameters
equation – einsum equation
inputs – inputs
optimize – permutes all letters to find the best permutation
runtime – runtime used to compute the results once the computation graph is produced (see below)
cache – if True, the function stores the preprocessing done for a specific equation, the second call with the same equation is much faster
opset – ONNX opset to use for some runtimes
decompose – by default, the function decomposes the equation into more simple operators but it can keep the original ONNX einsum operator.
strategy – optimisation strategy (see below)
verbose – display progress if optimize is True
- Returns
einsum result
The available runtimes are:
- batch_dot: the runtime is apply_einsum_sequence,
- python: one ONNX graph executed with a python runtime,
- onnxruntime1: one ONNX graph executed with onnxruntime.
The optimisation strategy can be:
- None: the same runtime is used to find the best permutation of letters,
- 'ml': a machine learned model is used to predict the best permutation of letters; this model comes from notebook Infer operator computation cost.
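The permutation search behind optimize=True can be sketched as follows. This is an illustration of the idea only, not the library's code: every renaming of the equation's letters is generated as a candidate, and the fastest one would be kept.

```python
import itertools

# Sketch of the brute-force letter-permutation search (illustration only):
# rename the letters of the equation in every possible way; the optimizer
# would then time each candidate and keep the fastest.
equation = "cab,cd->ad"
letters = sorted(set(c for c in equation if c.isalpha()))
candidates = ["".join(dict(zip(letters, perm)).get(c, c) for c in equation)
              for perm in itertools.permutations(letters)]
print(len(candidates), candidates[0])
```

With four distinct letters there are 4! = 24 candidate equations to time.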
The function works in two steps:
- the first step analyses the equation to produce a computation graph; this graph can also be converted into ONNX,
- the second step runs the graph, whatever the graph is.
Further details are available in the documentation of function optimize_decompose_einsum_equation. The function works the same way as numpy.einsum:
<<<
import numpy
from mlprodict.testing.einsum import einsum

equation = "abc,cd->abd"
m1 = numpy.random.randn(2, 2, 2)
m2 = numpy.random.randn(2, 2)
np = numpy.einsum(equation, m1, m2)
print('numpy.einsum')
print(np)
print('mlprodict.testing.einsum')
mp = einsum(equation, m1, m2)
print(mp)
>>>
numpy.einsum
[[[-2.499 0.046] [-0.885 -0.078]] [[ 2.338 -0.146] [-1.562 -0.154]]]
mlprodict.testing.einsum
[[[-2.499 0.046] [-0.885 -0.078]] [[ 2.338 -0.146] [-1.562 -0.154]]]
In some cases, the einsum implementation can be optimized by looping on possible permutations:
<<<
import timeit
import numpy
from mlprodict.testing.einsum import einsum
from mlprodict.testing.einsum.einsum_fct import enumerate_cached_einsum

equation = "cab,cd->ad"
m1 = numpy.random.randn(20, 20, 20)
m2 = numpy.random.randn(20, 20)

print('numpy.einsum', timeit.timeit(
    'numpy.einsum(equation, m1, m2)', number=200, globals=globals()))

einsum(equation, m1, m2)
print('einsum', timeit.timeit(
    'einsum(equation, m1, m2)', number=200, globals=globals()))

einsum(equation, m1, m2, runtime='python')
print('einsum-python', timeit.timeit(
    'einsum(equation, m1, m2, runtime="python")', number=200, globals=globals()))

einsum(equation, m1, m2, runtime='onnxruntime1')
print('einsum-onnxruntime1', timeit.timeit(
    'einsum(equation, m1, m2, runtime="onnxruntime1")', number=200, globals=globals()))

einsum(equation, m1, m2, runtime='onnxruntime1', optimize=True, verbose=1)
print('einsum-onnxruntime1', timeit.timeit(
    'einsum(equation, m1, m2, runtime="onnxruntime1", optimize=True)',
    number=200, globals=globals()))

print("list of cached einsum equations")
for k, v in enumerate_cached_einsum():
    print(k, v.equation, v.equation_)
>>>
numpy.einsum 0.1758663970977068
einsum 0.18937530741095543
einsum-python 0.27371818013489246
einsum-onnxruntime1 0.40259831584990025
einsum-onnxruntime1 0.3921410646289587
list of cached einsum equations
('cab,cd->ad', 'batch_dot', None, False, dtype('float64'), True, None) cab,cd->ad cab,cd->ad
('cab,cd->ad', 'python', None, False, dtype('float64'), True, None) cab,cd->ad cab,cd->ad
('cab,cd->ad', 'onnxruntime1', None, False, dtype('float64'), True, None) cab,cd->ad cab,cd->ad
('cab,cd->ad', 'onnxruntime1', None, True, dtype('float64'), True, None) cab,cd->ad dcb,da->ca
The last example shows the time taken by every function:
<<<
import os
from pyquickhelper.pycode.profiling import profile
import numpy
from mlprodict.testing.einsum import einsum
from mlprodict.testing.einsum.einsum_fct import enumerate_cached_einsum
from mlprodict import __file__ as path

root = os.path.dirname(path)

equation = "cab,cd->ad"
m1 = numpy.random.randn(200, 20, 20)
m2 = numpy.random.randn(200, 20)

def clean(txt):
    txt = txt.replace(root, "mlprodict")
    return "\n".join(txt.split("\n")[:30])

def fct1():
    for i in range(100):
        einsum(equation, m1, m2, cache=False)

print("Profile cache with default runtime.")
res = profile(fct1)
print(root)
print(clean(res[1]))

def fct2():
    for i in range(100):
        einsum(equation, m1, m2, cache=False, runtime='python')

print("Profile cache with runtime='python'.")
res = profile(fct2)
print(root)
print(clean(res[1]))

def fct3():
    for i in range(100):
        einsum(equation, m1, m2, cache=True)

einsum(equation, m1, m2, cache=True)
print("Profile execution with default runtime.")
res = profile(fct3)
print(root)
print(clean(res[1]))

def fct4():
    for i in range(100):
        einsum(equation, m1, m2, cache=True, runtime='python')

einsum(equation, m1, m2, cache=True, runtime='python')
print("Profile execution with runtime='python'.")
res = profile(fct4)
print(root)
print(clean(res[1]))

def fct5():
    for i in range(100):
        einsum(equation, m1, m2, cache=True, runtime='onnxruntime1')

einsum(equation, m1, m2, cache=True, runtime='onnxruntime1')
print("Profile execution with runtime='onnxruntime1'.")
res = profile(fct5)
print(root)
print(clean(res[1]))
>>>
Profile cache with default runtime.
/var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_doc/sphinxdoc/source/mlprodict
133202 function calls (133002 primitive calls) in 0.703 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 0.703 0.703 <stdin>:27(fct1)
100 0.002 0.000 0.702 0.007 mlprodict/testing/einsum/einsum_fct.py:457(einsum)
100 0.000 0.000 0.503 0.005 mlprodict/testing/einsum/einsum_fct.py:379(optimize_decompose_einsum_equation)
100 0.000 0.000 0.502 0.005 mlprodict/testing/einsum/einsum_fct.py:357(_einsum)
100 0.001 0.000 0.502 0.005 mlprodict/testing/einsum/einsum_fct.py:339(build_einsum)
100 0.001 0.000 0.500 0.005 mlprodict/testing/einsum/einsum_fct.py:109(build)
100 0.001 0.000 0.499 0.005 mlprodict/testing/einsum/einsum_fct.py:275(build_runtime)
100 0.004 0.000 0.498 0.005 mlprodict/testing/einsum/einsum_impl.py:85(decompose_einsum_equation)
...

Profile cache with runtime='python'.
/var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_doc/sphinxdoc/source/mlprodict
924505 function calls (915071 primitive calls) in 4.645 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 4.659 4.659 <stdin>:36(fct2)
100 0.003 0.000 4.658 0.047 mlprodict/testing/einsum/einsum_fct.py:457(einsum)
100 0.000 0.000 4.363 0.044 mlprodict/testing/einsum/einsum_fct.py:379(optimize_decompose_einsum_equation)
100 0.001 0.000 4.363 0.044 mlprodict/testing/einsum/einsum_fct.py:357(_einsum)
100 0.002 0.000 4.362 0.044 mlprodict/testing/einsum/einsum_fct.py:339(build_einsum)
100 0.001 0.000 4.360 0.044 mlprodict/testing/einsum/einsum_fct.py:109(build)
100 0.022 0.000 4.359 0.044 mlprodict/testing/einsum/einsum_fct.py:275(build_runtime)
100 0.003 0.000 2.667 0.027 mlprodict/onnxrt/onnx_inference.py:103(__init__)
...

Profile execution with default runtime.
/var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_doc/sphinxdoc/source/mlprodict
35402 function calls in 0.196 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 0.196 0.196 <stdin>:46(fct3)
100 0.002 0.000 0.195 0.002 mlprodict/testing/einsum/einsum_fct.py:457(einsum)
100 0.000 0.000 0.192 0.002 mlprodict/testing/einsum/einsum_fct.py:327(__call__)
100 0.001 0.000 0.191 0.002 mlprodict/testing/einsum/einsum_fct.py:287(<lambda>)
100 0.001 0.000 0.190 0.002 mlprodict/testing/einsum/einsum_impl.py:165(apply_einsum_sequence)
100 0.009 0.000 0.189 0.002 mlprodict/testing/einsum/einsum_impl_classes.py:1217(apply_sequence)
...

Profile execution with runtime='python'.
/var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_doc/sphinxdoc/source/mlprodict
33702 function calls in 0.284 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 0.284 0.284 <stdin>:58(fct4)
100 0.002 0.000 0.283 0.003 mlprodict/testing/einsum/einsum_fct.py:457(einsum)
100 0.001 0.000 0.279 0.003 mlprodict/testing/einsum/einsum_fct.py:327(__call__)
100 0.002 0.000 0.279 0.003 mlprodict/testing/einsum/einsum_fct.py:303(<lambda>)
100 0.001 0.000 0.277 0.003 mlprodict/onnxrt/onnx_inference.py:781(run)
100 0.002 0.000 0.276 0.003 mlprodict/onnxrt/onnx_inference.py:299(_run_sequence_runtime_compiled)
...

Profile execution with runtime='onnxruntime1'.
/var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_doc/sphinxdoc/source/mlprodict
2202 function calls in 0.287 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 0.287 0.287 <stdin>:69(fct5)
100 0.003 0.000 0.286 0.003 mlprodict/testing/einsum/einsum_fct.py:457(einsum)
100 0.001 0.000 0.282 0.003 mlprodict/testing/einsum/einsum_fct.py:327(__call__)
100 0.002 0.000 0.281 0.003 mlprodict/testing/einsum/einsum_fct.py:303(<lambda>)
100 0.001 0.000 0.279 0.003 mlprodict/onnxrt/onnx_inference.py:781(run)
100 0.003 0.000 0.278 0.003 mlprodict/onnxrt/onnx_inference.py:1183(_run_whole_runtime)
100 0.275 0.003 0.275 0.003 mlprodict/onnxrt/ops_whole/session.py:98(run)
...
- mlprodict.testing.einsum.einsum_fct.enumerate_cached_einsum()#
Enumerates all cached einsum functions.
- mlprodict.testing.einsum.einsum_fct.optimize_decompose_einsum_equation(equation, dtype, optimize=False, runtime='batch_dot', cache=True, opset=None, decompose=True, strategy=None, verbose=None)#
Proposes a new implementation of numpy.einsum. It does not allow expressions using … and expects the equation to have a right member (an explicit output after ->).
- Parameters
equation – einsum equation
dtype – dtype
optimize – permutes all letters to find the best permutation
runtime – runtime used to compute the results once the computation graph is produced (see below)
cache – if True, the function stores the preprocessing done for a specific equation, the second call with the same equation is much faster
opset – ONNX opset to use for some runtimes
decompose – by default, the function decomposes the equation into more simple operators but it can keep the original ONNX einsum operator.
strategy – optimisation strategy (see below)
verbose – display progress if optimize is True
- Returns
an instance of CachedEinsum holding the preprocessed equation
The available runtimes are:
- batch_dot: the runtime is apply_einsum_sequence,
- python: one ONNX graph executed with a python runtime,
- onnxruntime1: one ONNX graph executed with onnxruntime.
The optimisation strategy can be:
- None: the same runtime is used to find the best permutation of letters,
- 'ml': a machine learned model is used to predict the best permutation of letters; this model comes from notebook Infer operator computation cost.
The function works in two steps:
- the first step analyses the equation to produce a computation graph; this graph can also be converted into ONNX,
- the second step runs the graph, whatever the graph is.
The function returns an object of type CachedEinsum which has the following members after optimization:
- equation_: corresponding to the best equivalent equation
- graph_: the corresponding graph returned by function :func:`decompose_einsum_equation <mlprodict.testing.einsum.einsum_impl.decompose_einsum_equation>`
- onnx_: if a conversion to onnx is used, stores the onnx graph
- runtime_: a function used by __call__, calls the runtime
- oinf_: an object of type OnnxInference
- timed_permutations_: memorizes the results of the optimization
<<<
import numpy
from mlprodict.testing.einsum import optimize_decompose_einsum_equation

seq_opt = optimize_decompose_einsum_equation(
    "bsnh,btnh->bnts", numpy.float64, strategy='ml', verbose=1,
    runtime="python", optimize=True)
print("best equation:", seq_opt.equation_)
>>>
best equation: bhts,bnts->btnh