module `onnx_conv.convert`

Short summary

module mlprodict.onnx_conv.convert

Overloads a conversion function.

Functions

function	truncated documentation
`_fix_opset_skl2onnx`
`_replace_tensor_type`
`convert_scorer`	Converts a scorer into ONNX assuming there exists a converter associated to it. The function wraps the function …
`get_inputs_from_data`	Produces input data for onnx runtime.
`guess_initial_types`	Guesses initial types from an array or a dataframe.
`guess_schema_from_data`	Guesses initial types from a dataset.
`guess_schema_from_model`	Guesses initial types from a model.
`to_onnx`	Converts a model using sklearn-onnx.

Documentation

Overloads a conversion function.

source on GitHub

mlprodict.onnx_conv.convert._fix_opset_skl2onnx()

mlprodict.onnx_conv.convert._replace_tensor_type(schema, tensor_type)

mlprodict.onnx_conv.convert.convert_scorer(fct, initial_types, name=None, target_opset=None, options=None, custom_conversion_functions=None, custom_shape_calculators=None, custom_parsers=None, white_op=None, black_op=None, final_types=None, verbose=0)

Converts a scorer into ONNX assuming there exists a converter associated to it. The function wraps the function into a custom transformer, then calls function convert_sklearn from sklearn-onnx.

Parameters

fct – function to convert (or a scorer from scikit-learn)
initial_types – types information
name – name of the produced model
target_opset – target opset to use for the conversion
options – additional parameters for the conversion
custom_conversion_functions – a dictionary specifying user-defined conversion functions; it takes precedence over registered converters
custom_shape_calculators – a dictionary specifying user-defined shape calculators; it takes precedence over registered shape calculators
custom_parsers – parsers determine which outputs are expected for a particular task; default parsers are defined for classifiers, regressors and pipelines but they can be rewritten. custom_parsers is a dictionary { type: fct_parser(scope, model, inputs, custom_parsers=None) }
white_op – white list of ONNX nodes allowed while converting a pipeline; if empty, all are allowed
black_op – black list of ONNX nodes forbidden while converting a pipeline; if empty, none are blacklisted
final_types – a Python list. It works the same way as initial_types but is not mandatory; it is used to overwrite the type (if the type is not None) and the name of every output.
verbose – displays information while converting

Returns

ONNX graph
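
The white_op and black_op parameters can be read as a filter on ONNX operator types. The sketch below only illustrates the rule stated in the parameter descriptions: `is_op_allowed` is a hypothetical helper, it is not part of mlprodict or sklearn-onnx, and the converters apply both lists internally.

```python
def is_op_allowed(op_type, white_op=None, black_op=None):
    """Hypothetical helper: tells whether an ONNX operator type may be used.

    An empty or missing white list allows every operator,
    an empty or missing black list forbids none.
    """
    if white_op and op_type not in white_op:
        return False
    if black_op and op_type in black_op:
        return False
    return True


# No list: every operator is allowed.
print(is_op_allowed("Gemm"))
# White list: only the listed operators are allowed.
print(is_op_allowed("Gemm", white_op={"Add", "MatMul"}))
# Black list: the listed operators are rejected.
print(is_op_allowed("Gemm", black_op={"Gemm"}))
```

With such a rule, a conversion restricted to `white_op={"Add", "MatMul"}` would be expected to fail as soon as a converter needs another operator.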

mlprodict.onnx_conv.convert.get_inputs_from_data(X, schema=None)

Produces input data for onnx runtime.

Parameters

X – data
schema – schema; if None, the schema is guessed with guess_schema_from_data

Returns

input data
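
The example given for to_onnx feeds the runtime a dictionary with one entry per dataframe column, each reshaped into a single column. A pure Python sketch of that layout follows. `columns_to_inputs` is a hypothetical helper written for illustration: it assumes the data is a list of rows and the schema a list of (name, type) pairs, and it does not reproduce the implementation of get_inputs_from_data.

```python
def columns_to_inputs(rows, schema):
    """Hypothetical helper: turns rows into a dictionary of named column inputs.

    *rows* is a list of rows (lists of values), *schema* a list of
    (name, type) pairs with one pair per column.
    """
    if rows and len(rows[0]) != len(schema):
        raise ValueError(
            "schema has %d columns but data has %d"
            % (len(schema), len(rows[0])))
    # Every input is a column of shape (n_rows, 1).
    return {name: [[row[i]] for row in rows]
            for i, (name, _) in enumerate(schema)}


schema = [("alcohol", "float"), ("color", "string")]
rows = [[9.4, "red"], [9.8, "red"]]
inputs = columns_to_inputs(rows, schema)
print(inputs)
```

Each key of the result is an input name taken from the schema, which is how the runtime matches the data to the inputs of the converted model.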

mlprodict.onnx_conv.convert.guess_initial_types(X, initial_types)

Guesses initial types from an array or a dataframe.

Parameters

X – array or dataframe
initial_types – hints about X

Returns

data types

mlprodict.onnx_conv.convert.guess_schema_from_data(X, tensor_type=None, schema=None)

Guesses initial types from a dataset.

Parameters

X – dataset (dataframe, array)
tensor_type – if not None, replaces every FloatTensorType or DoubleTensorType by this one
schema – known schema

Returns

schema (list of typed and named columns)
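
The effect of tensor_type can be pictured on a small schema. The sketch below uses stand-in classes named after the sklearn-onnx tensor types so that it runs without any dependency. `replace_float_types` is a hypothetical helper, not the private function _replace_tensor_type, whose implementation is not shown on this page.

```python
class _TensorType:
    """Stand-in base class, only stores a shape."""

    def __init__(self, shape=None):
        self.shape = shape


class FloatTensorType(_TensorType):
    """Stand-in for the sklearn-onnx type of the same name."""


class DoubleTensorType(_TensorType):
    """Stand-in for the sklearn-onnx type of the same name."""


class StringTensorType(_TensorType):
    """Stand-in for the sklearn-onnx type of the same name."""


def replace_float_types(schema, tensor_type):
    """Hypothetical helper: swaps float and double types for *tensor_type*."""
    if tensor_type is None:
        return schema
    return [
        (name, tensor_type(typ.shape))
        if isinstance(typ, (FloatTensorType, DoubleTensorType))
        else (name, typ)
        for name, typ in schema
    ]


schema = [("alcohol", DoubleTensorType([None, 1])),
          ("color", StringTensorType([None, 1]))]
new_schema = replace_float_types(schema, FloatTensorType)
print([(name, type(typ).__name__) for name, typ in new_schema])
```

Only the float and double columns change type; the string column and every shape are left untouched.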

mlprodict.onnx_conv.convert.guess_schema_from_model(model, tensor_type=None, schema=None)

Guesses initial types from a model.

Parameters

model – model
tensor_type – if not None, replaces every FloatTensorType or DoubleTensorType by this one
schema – known schema

Returns

schema (list of typed and named columns)

mlprodict.onnx_conv.convert.to_onnx(model, X=None, name=None, initial_types=None, target_opset=None, options=None, rewrite_ops=False, white_op=None, black_op=None, final_types=None, rename_strategy=None, verbose=0)

Converts a model using sklearn-onnx.

Parameters

model – model to convert or a function wrapped into _PredictScorer with function make_scorer
X – training set (at least one row), can be None; it is used to infer the input types (initial_types)
initial_types – if X is None, then initial_types must be defined
name – name of the produced model
target_opset – target opset to use for the conversion
options – additional parameters for the conversion
rewrite_ops – rewrites some existing converters, the changes are permanent
white_op – white list of ONNX nodes allowed while converting a pipeline; if empty, all are allowed
black_op – black list of ONNX nodes forbidden while converting a pipeline; if empty, none are blacklisted
final_types – a Python list. It works the same way as initial_types but is not mandatory; it is used to overwrite the type (if the type is not None) and the name of every output.
rename_strategy – rename any name in the graph, select shorter names, see onnx_rename_names
verbose – display information while converting the model

Returns

converted model

The function rewrites function to_onnx from sklearn-onnx but may change a few converters if rewrite_ops is True. For example, ONNX only supports TreeEnsembleRegressor for floats, not for doubles. A version for doubles becomes available if rewrite_ops=True.

How to deal with a dataframe as input?

Each column of the dataframe is considered a named input. The first step is to make sure that every column type is correct. pandas tends to select the least generic type able to hold the content of a column. ONNX does not automatically cast the data it receives. The data must have the same type when the model is converted and when the converted model receives the data to predict.

<<<

from io import StringIO
from textwrap import dedent
import numpy
import pandas
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from mlprodict.onnx_conv import to_onnx
from mlprodict.onnxrt import OnnxInference

text = dedent('''
    __SCHEMA__
    7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5,red
    7.8,0.88,0.0,2.6,0.098,25.0,67.0,0.9968,3.2,0.68,9.8,5,red
    7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.997,3.26,0.65,9.8,5,red
    11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.998,3.16,0.58,9.8,6,red
    ''')
text = text.replace(
    "__SCHEMA__",
    "fixed_acidity,volatile_acidity,citric_acid,residual_sugar,chlorides,"
    "free_sulfur_dioxide,total_sulfur_dioxide,density,pH,sulphates,"
    "alcohol,quality,color")

X_train = pandas.read_csv(StringIO(text))
for c in X_train.columns:
    if c != 'color':
        X_train[c] = X_train[c].astype(numpy.float32)
numeric_features = [c for c in X_train if c != 'color']

pipe = Pipeline([
    ("prep", ColumnTransformer([
        ("color", Pipeline([
            ('one', OneHotEncoder()),
            ('select', ColumnTransformer(
                [('sel1', 'passthrough', [0])]))
        ]), ['color']),
        ("others", "passthrough", numeric_features)
    ])),
])

pipe.fit(X_train)
pred = pipe.transform(X_train)
print(pred)

model_onnx = to_onnx(pipe, X_train, target_opset=12)
oinf = OnnxInference(model_onnx)

# The dataframe is converted into a dictionary,
# each key is a column name, each value is a numpy array.
inputs = {c: X_train[c].values for c in X_train.columns}
inputs = {c: v.reshape((v.shape[0], 1)) for c, v in inputs.items()}

onxp = oinf.run(inputs)
print(onxp)

>>>

    [[1.000e+00 7.400e+00 7.000e-01 0.000e+00 1.900e+00 7.600e-02 1.100e+01
      3.400e+01 9.978e-01 3.510e+00 5.600e-01 9.400e+00 5.000e+00]
     [1.000e+00 7.800e+00 8.800e-01 0.000e+00 2.600e+00 9.800e-02 2.500e+01
      6.700e+01 9.968e-01 3.200e+00 6.800e-01 9.800e+00 5.000e+00]
     [1.000e+00 7.800e+00 7.600e-01 4.000e-02 2.300e+00 9.200e-02 1.500e+01
      5.400e+01 9.970e-01 3.260e+00 6.500e-01 9.800e+00 5.000e+00]
     [1.000e+00 1.120e+01 2.800e-01 5.600e-01 1.900e+00 7.500e-02 1.700e+01
      6.000e+01 9.980e-01 3.160e+00 5.800e-01 9.800e+00 6.000e+00]]
    {'transformed_column': array([[1.000e+00, 7.400e+00, 7.000e-01, 0.000e+00, 1.900e+00, 7.600e-02,
            1.100e+01, 3.400e+01, 9.978e-01, 3.510e+00, 5.600e-01, 9.400e+00,
            5.000e+00],
           [1.000e+00, 7.800e+00, 8.800e-01, 0.000e+00, 2.600e+00, 9.800e-02,
            2.500e+01, 6.700e+01, 9.968e-01, 3.200e+00, 6.800e-01, 9.800e+00,
            5.000e+00],
           [1.000e+00, 7.800e+00, 7.600e-01, 4.000e-02, 2.300e+00, 9.200e-02,
            1.500e+01, 5.400e+01, 9.970e-01, 3.260e+00, 6.500e-01, 9.800e+00,
            5.000e+00],
           [1.000e+00, 1.120e+01, 2.800e-01, 5.600e-01, 1.900e+00, 7.500e-02,
            1.700e+01, 6.000e+01, 9.980e-01, 3.160e+00, 5.800e-01, 9.800e+00,
            6.000e+00]], dtype=float32)}
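
Because ONNX does not cast its inputs, it can help to compare the types used at conversion time with the types of the data sent for prediction. The sketch below is an illustration only: it uses arrays from the Python standard library as a stand-in for numpy arrays (type code 'f' is a 32-bit float, 'd' a 64-bit float), and `find_type_mismatches` is a hypothetical helper, not a function of mlprodict.

```python
from array import array


def find_type_mismatches(expected, inputs):
    """Hypothetical helper: lists inputs whose type differs from the expected one.

    *expected* maps an input name to a type code,
    *inputs* maps an input name to an array.
    """
    return sorted(
        name for name, values in inputs.items()
        if values.typecode != expected[name])


# Types seen when the model was converted.
expected = {"alcohol": "f", "density": "f"}
# Data sent for prediction: density was left as a 64-bit float.
inputs = {"alcohol": array("f", [9.4, 9.8]),
          "density": array("d", [0.9978, 0.9968])}
print(find_type_mismatches(expected, inputs))
```

With numpy arrays the same comparison would be made on `dtype`, and a mismatching column would be cast with `astype` before calling the runtime, as done for X_train in the example above.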

Changed in version 0.7: Parameter rename_strategy was added.
