module onnx_conv.convert#
Short summary#
module mlprodict.onnx_conv.convert
Overloads a conversion function.
Functions#
function | truncated documentation |
---|---|
convert_scorer | Converts a scorer into ONNX assuming there exists a converter associated to it. The function wraps the function … |
get_inputs_from_data | Produces input data for onnx runtime. |
guess_initial_types | Guesses initial types from an array or a dataframe. |
guess_schema_from_data | Guesses initial types from a dataset. |
guess_schema_from_model | Guesses initial types from a model. |
to_onnx | Converts a model using sklearn-onnx. |
Documentation#
Overloads a conversion function.
- mlprodict.onnx_conv.convert._fix_opset_skl2onnx()#
- mlprodict.onnx_conv.convert._replace_tensor_type(schema, tensor_type)#
- mlprodict.onnx_conv.convert.convert_scorer(fct, initial_types, name=None, target_opset=None, options=None, custom_conversion_functions=None, custom_shape_calculators=None, custom_parsers=None, white_op=None, black_op=None, final_types=None, verbose=0)#
Converts a scorer into ONNX assuming there exists a converter associated to it. The function wraps the function into a custom transformer, then calls function convert_sklearn from sklearn-onnx.
- Parameters
fct – function to convert (or a scorer from scikit-learn)
initial_types – types information
name – name of the produced model
target_opset – opset to target for the conversion, if different from the default one
options – additional parameters for the conversion
custom_conversion_functions – a dictionary of user-defined conversion functions; it takes precedence over registered converters
custom_shape_calculators – a dictionary of user-defined shape calculators; it takes precedence over registered shape calculators
custom_parsers – parsers determine which outputs are expected for a particular task; default parsers are defined for classifiers, regressors and pipelines, but they can be overridden. custom_parsers is a dictionary { type: fct_parser(scope, model, inputs, custom_parsers=None) }
white_op – white list of ONNX nodes allowed while converting a pipeline, if empty, all are allowed
black_op – black list of ONNX nodes forbidden while converting a pipeline; if empty, none are blacklisted
final_types – a python list. It works the same way as initial_types but is not mandatory; it is used to overwrite the type (if type is not None) and the name of every output.
verbose – displays information while converting
- Returns
ONNX graph
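The semantics of white_op and black_op described above can be sketched in plain Python. This is only an illustration of the documented rule (an empty white list allows everything, an empty black list forbids nothing); the helper `is_op_allowed` is hypothetical and is not part of mlprodict or sklearn-onnx.

```python
def is_op_allowed(op_type, white_op=None, black_op=None):
    """Hypothetical helper: tells whether an ONNX operator type passes both lists."""
    # An empty or missing white list places no restriction.
    if white_op and op_type not in white_op:
        return False
    # An empty or missing black list forbids nothing.
    if black_op and op_type in black_op:
        return False
    return True


ops = ["MatMul", "Add", "ZipMap"]

# Only ZipMap is blacklisted.
print([op for op in ops if is_op_allowed(op, black_op={"ZipMap"})])

# Only MatMul and Add are whitelisted.
print([op for op in ops if is_op_allowed(op, white_op={"MatMul", "Add"})])
```

Both calls keep MatMul and Add and drop ZipMap, each for a different reason.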
- mlprodict.onnx_conv.convert.get_inputs_from_data(X, schema=None)#
Produces input data for onnx runtime.
- Parameters
X – data
schema – schema; if None, it is guessed with guess_schema_from_data
- Returns
input data
- mlprodict.onnx_conv.convert.guess_initial_types(X, initial_types)#
Guesses initial types from an array or a dataframe.
- Parameters
X – array or dataframe
initial_types – hints about X
- Returns
data types
- mlprodict.onnx_conv.convert.guess_schema_from_data(X, tensor_type=None, schema=None)#
Guesses initial types from a dataset.
- Parameters
X – dataset (dataframe, array)
tensor_type – if not None, replaces every FloatTensorType or DoubleTensorType by this one
schema – known schema
- Returns
schema (list of typed and named columns)
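To make the notion of a schema more concrete, here is a minimal sketch of how one could be derived from a dataframe. It is not the implementation of guess_schema_from_data: the helper `sketch_schema`, the dtype mapping, the use of type names as strings, and the `[None, 1]` shape per column are all assumptions made for illustration.

```python
import numpy
import pandas

# Assumed mapping from numpy dtypes to the names of sklearn-onnx tensor types.
_DTYPE_TO_TENSOR = {
    numpy.dtype("float32"): "FloatTensorType",
    numpy.dtype("float64"): "DoubleTensorType",
    numpy.dtype("int64"): "Int64TensorType",
}


def sketch_schema(df, tensor_type=None):
    """Hypothetical helper: returns a list of (column name, type name, shape)."""
    schema = []
    for name in df.columns:
        # Simplification: any dtype missing from the mapping is treated as a string.
        type_name = _DTYPE_TO_TENSOR.get(df[name].dtype, "StringTensorType")
        # Mirrors the documented tensor_type parameter: it replaces
        # every float or double tensor type by the requested one.
        if tensor_type is not None and type_name in (
                "FloatTensorType", "DoubleTensorType"):
            type_name = tensor_type
        schema.append((name, type_name, [None, 1]))
    return schema


df = pandas.DataFrame({
    "a": numpy.array([1.0, 2.0], dtype=numpy.float32),
    "b": [1.5, 2.5],
    "c": ["x", "y"],
})
print(sketch_schema(df))
print(sketch_schema(df, tensor_type="FloatTensorType"))
```

With `tensor_type="FloatTensorType"`, column `b` is reported as a float tensor even though pandas stores it as float64, which is the replacement the tensor_type parameter describes.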
- mlprodict.onnx_conv.convert.guess_schema_from_model(model, tensor_type=None, schema=None)#
Guesses initial types from a model.
- Parameters
model – model
tensor_type – if not None, replaces every FloatTensorType or DoubleTensorType by this one
schema – known schema
- Returns
schema (list of typed and named columns)
- mlprodict.onnx_conv.convert.to_onnx(model, X=None, name=None, initial_types=None, target_opset=None, options=None, rewrite_ops=False, white_op=None, black_op=None, final_types=None, rename_strategy=None, verbose=0)#
Converts a model using sklearn-onnx.
- Parameters
model – model to convert or a function wrapped into _PredictScorer with function make_scorer
X – training set (at least one row), can be None; it is used to infer the input types (initial_types)
initial_types – if X is None, then initial_types must be defined
name – name of the produced model
target_opset – opset to target for the conversion, if different from the default one
options – additional parameters for the conversion
rewrite_ops – rewrites some existing converters, the changes are permanent
white_op – white list of ONNX nodes allowed while converting a pipeline, if empty, all are allowed
black_op – black list of ONNX nodes forbidden while converting a pipeline; if empty, none are blacklisted
final_types – a python list. It works the same way as initial_types but is not mandatory; it is used to overwrite the type (if type is not None) and the name of every output.
rename_strategy – renames any name in the graph, selects shorter names, see onnx_rename_names
verbose – display information while converting the model
- Returns
converted model
The function rewrites function to_onnx from sklearn-onnx but may change a few converters if rewrite_ops is True. For example, ONNX only supports TreeEnsembleRegressor for float but not for double. It becomes available if rewrite_ops=True.
How to deal with a dataframe as input?
Each column of the dataframe is considered as a named input. The first step is to make sure that every column type is correct. pandas tends to select the least generic type able to hold the content of a column. ONNX does not automatically cast the data it receives. The data must have the same type when the model is converted and when the converted model receives data to predict.
<<<
from io import StringIO
from textwrap import dedent
import numpy
import pandas
from pyquickhelper.pycode import ExtTestCase
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from mlprodict.onnx_conv import to_onnx
from mlprodict.onnxrt import OnnxInference

text = dedent('''
    __SCHEMA__
    7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5,red
    7.8,0.88,0.0,2.6,0.098,25.0,67.0,0.9968,3.2,0.68,9.8,5,red
    7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.997,3.26,0.65,9.8,5,red
    11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.998,3.16,0.58,9.8,6,red
    ''')
text = text.replace(
    "__SCHEMA__",
    "fixed_acidity,volatile_acidity,citric_acid,residual_sugar,chlorides,"
    "free_sulfur_dioxide,total_sulfur_dioxide,density,pH,sulphates,"
    "alcohol,quality,color")

X_train = pandas.read_csv(StringIO(text))
for c in X_train.columns:
    if c != 'color':
        X_train[c] = X_train[c].astype(numpy.float32)
numeric_features = [c for c in X_train if c != 'color']

pipe = Pipeline([
    ("prep", ColumnTransformer([
        ("color", Pipeline([
            ('one', OneHotEncoder()),
            ('select', ColumnTransformer(
                [('sel1', 'passthrough', [0])]))
        ]), ['color']),
        ("others", "passthrough", numeric_features)
    ])),
])

pipe.fit(X_train)
pred = pipe.transform(X_train)
print(pred)

model_onnx = to_onnx(pipe, X_train, target_opset=12)
oinf = OnnxInference(model_onnx)

# The dataframe is converted into a dictionary,
# each key is a column name, each value is a numpy array.
inputs = {c: X_train[c].values for c in X_train.columns}
inputs = {c: v.reshape((v.shape[0], 1)) for c, v in inputs.items()}
onxp = oinf.run(inputs)
print(onxp)
>>>
[[1.000e+00 7.400e+00 7.000e-01 0.000e+00 1.900e+00 7.600e-02 1.100e+01 3.400e+01 9.978e-01 3.510e+00 5.600e-01 9.400e+00 5.000e+00]
 [1.000e+00 7.800e+00 8.800e-01 0.000e+00 2.600e+00 9.800e-02 2.500e+01 6.700e+01 9.968e-01 3.200e+00 6.800e-01 9.800e+00 5.000e+00]
 [1.000e+00 7.800e+00 7.600e-01 4.000e-02 2.300e+00 9.200e-02 1.500e+01 5.400e+01 9.970e-01 3.260e+00 6.500e-01 9.800e+00 5.000e+00]
 [1.000e+00 1.120e+01 2.800e-01 5.600e-01 1.900e+00 7.500e-02 1.700e+01 6.000e+01 9.980e-01 3.160e+00 5.800e-01 9.800e+00 6.000e+00]]
{'transformed_column': array([[1.000e+00, 7.400e+00, 7.000e-01, 0.000e+00, 1.900e+00, 7.600e-02, 1.100e+01, 3.400e+01, 9.978e-01, 3.510e+00, 5.600e-01, 9.400e+00, 5.000e+00],
       [1.000e+00, 7.800e+00, 8.800e-01, 0.000e+00, 2.600e+00, 9.800e-02, 2.500e+01, 6.700e+01, 9.968e-01, 3.200e+00, 6.800e-01, 9.800e+00, 5.000e+00],
       [1.000e+00, 7.800e+00, 7.600e-01, 4.000e-02, 2.300e+00, 9.200e-02, 1.500e+01, 5.400e+01, 9.970e-01, 3.260e+00, 6.500e-01, 9.800e+00, 5.000e+00],
       [1.000e+00, 1.120e+01, 2.800e-01, 5.600e-01, 1.900e+00, 7.500e-02, 1.700e+01, 6.000e+01, 9.980e-01, 3.160e+00, 5.800e-01, 9.800e+00, 6.000e+00]], dtype=float32)}
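Because ONNX does not cast its inputs, it can help to compare column types between the data used for the conversion and the data sent for prediction. The sketch below only relies on pandas and numpy; the helper `dtype_mismatches` and the three-column sample are hypothetical and do not belong to mlprodict.

```python
from io import StringIO
import numpy
import pandas

csv = "alcohol,quality,color\n9.4,5,red\n9.8,6,red\n"
df = pandas.read_csv(StringIO(csv))

# pandas infers float64 for 'alcohol' and int64 for 'quality'.
print(df.dtypes)


def dtype_mismatches(reference, candidate):
    """Hypothetical helper: columns whose dtype differs between two dataframes."""
    return {c: (reference[c].dtype, candidate[c].dtype)
            for c in reference.columns
            if reference[c].dtype != candidate[c].dtype}


# The dataframe used for the conversion is cast to float32...
train = df.copy()
for c in ("alcohol", "quality"):
    train[c] = train[c].astype(numpy.float32)

# ...so the raw dataframe no longer matches it on the numeric columns.
print(dtype_mismatches(train, df))
```

An empty dictionary means every column has the same type in both dataframes; any other result lists the columns to cast before they are passed to the runtime.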
Changed in version 0.7: Parameter rename_strategy was added.