Converters with options#
Some converters have options to change the way a specific operator is converted. The whole list is described at Converters with options.
Option cdist for GaussianProcessRegressor#
The notebook Pairwise distances with ONNX (pdist) shows how much slower an ONNX implementation of function cdist is: from 3 to 10 times slower.
One way to optimize the converted model is to create dedicated operators such as the one for function cdist. A GaussianProcessRegressor can first be converted into standard ONNX (see also CDist).
The new model is then converted with the operator CDist. The only change is the parameter options, set to options={GaussianProcessRegressor: {'optim': 'cdist'}}. It tells the conversion function that every model of type sklearn.gaussian_process.GaussianProcessRegressor must be converted with the option optim='cdist'. The converter of this model checks that option and uses the custom operator CDist instead of its standard implementation based on operator Scan.
Section GaussianProcess shows how the gain varies with the number of observations for this example.
Other models supporting cdist#
Pairwise distances also appear in all nearest neighbours models. The same cdist option is supported for these models.
Option zipmap for classifiers#
By default, the library sklearn-onnx produces a list of dictionaries {label: prediction}, but this data structure takes a significant time to build. The converted model can stick to matrices by removing operator ZipMap. This is done by using option {'zipmap': False}.
Option raw_scores for classifiers#
By default, the library sklearn-onnx produces probabilities whenever it is possible for a classifier. Raw scores can usually still be obtained by using option {'raw_scores': True}.
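To make the distinction concrete, the sketch below uses the binary logistic case, where a probability is the sigmoid of the raw score. It is plain Python for illustration only and does not call sklearn-onnx; the score values are made up.

```python
import math


def sigmoid(score):
    """Map a raw score (any real number) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


raw_scores = [-2.0, 0.0, 3.0]  # made-up values
probabilities = [sigmoid(s) for s in raw_scores]

for score, proba in zip(raw_scores, probabilities):
    print(f"raw score {score:+.1f} -> probability {proba:.3f}")
```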
Picklability and Pipeline#
The proposed way to specify options is not always picklable. The value of id(model) depends on the execution, and mapping an option to a class may not be enough to customize the conversion. However, it is possible to specify an option the same way parameters are referenced in a scikit-learn pipeline with method get_params.
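The issue with id(model) can be reproduced in plain Python. In the sketch below, a SimpleNamespace is a hypothetical stand-in for a fitted model; no sklearn-onnx call is involved.

```python
import pickle
from types import SimpleNamespace

# Hypothetical stand-in for a fitted model.
model = SimpleNamespace(name="classifier")

# Options keyed by id(model) only match this exact object in this process.
options_by_id = {id(model): {"zipmap": False}}

# A pickle round trip creates a new object with a different id.
restored = pickle.loads(pickle.dumps(model))

print(id(model) in options_by_id)     # True
print(id(restored) in options_by_id)  # False: the key no longer matches

# Options keyed by strings survive the same round trip unchanged.
options_by_name = {"classifier__zipmap": False}
print(pickle.loads(pickle.dumps(options_by_name)) == options_by_name)  # True
```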
The following syntaxes are supported:
pipe = Pipeline([('pca', PCA()), ('classifier', LogisticRegression())])
options = {'classifier': {'zipmap': False}}
Or
options = {'classifier__zipmap': False}
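The equivalence of the two syntaxes can be illustrated with a small plain-Python sketch. The helper normalize_options is hypothetical and is not the library's implementation; it only shows how a flat key such as 'classifier__zipmap' maps to the nested form.

```python
def normalize_options(options, step_names):
    """Hypothetical helper: rewrite 'step__option' keys as {'step': {'option': value}}."""
    normalized = {}
    for key, value in options.items():
        if key in step_names:
            # Already in the nested form {'step': {option: value}}.
            normalized.setdefault(key, {}).update(value)
        else:
            # Flat form: the text after the last '__' is the option name.
            step, sep, option = key.rpartition("__")
            if not sep or step not in step_names:
                raise ValueError(f"unknown step in option key {key!r}")
            normalized.setdefault(step, {})[option] = value
    return normalized


step_names = {"pca", "classifier"}
nested = normalize_options({"classifier": {"zipmap": False}}, step_names)
flat = normalize_options({"classifier__zipmap": False}, step_names)
print(nested == flat)  # True
```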
Options are applied to one model, not to a pipeline, as the converter replaces the pipeline structure by a single ONNX graph. Following that rule, option zipmap would not have any impact if applied to a pipeline and not to the last step of the pipeline. However, because there is no ambiguity about what the conversion should be, for options zipmap and nocl, the following options would have the same effect:
pipe = Pipeline([('pca', PCA()), ('classifier', LogisticRegression())])
options = {id(pipe.steps[-1][1]): {'zipmap': False}}
options = {id(pipe): {'zipmap': False}}
options = {'classifier': {'zipmap': False}}
options = {'classifier__zipmap': False}