scikit-learn Converters and Benchmarks
sklearn-onnx converts many scikit-learn models into ONNX. Each of them is tested against several runtimes. The following pages show which models are correctly converted and compare the predictions obtained with every runtime (see Runtimes for ONNX). They also display figures showing how each runtime compares with scikit-learn in terms of processing speed. The benchmark evaluates every model on a dataset inspired by the Iris dataset, with four features and different numbers of observations N = 1, 10, 100, 1,000, 10,000 and 100,000. Measurements for large values of N may be missing when the measurements for smaller N already took too long.
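The measurement protocol above can be sketched in plain Python: the same data goes to every implementation, N increases, and larger N are dropped once runs get too slow. This is not the benchmark code used for these pages. `make_iris_like`, `measure`, `benchmark` and the two predict functions are hypothetical. The random data only mimics the shape of Iris (four features), and the time budget is an assumed mechanism for the missing measurements.

```python
import random
import time


def make_iris_like(n, seed=0):
    """Hypothetical stand-in for the Iris-inspired data: n rows, 4 features."""
    rng = random.Random(seed)
    return [[rng.uniform(4.0, 8.0), rng.uniform(2.0, 4.5),
             rng.uniform(1.0, 7.0), rng.uniform(0.1, 2.5)]
            for _ in range(n)]


def measure(predict, X, repeat=5):
    """Best wall-clock time, in seconds, over `repeat` calls of predict(X)."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        predict(X)
        best = min(best, time.perf_counter() - start)
    return best


def benchmark(reference, runtime, sizes=(1, 10, 100, 1000, 10000, 100000),
              budget=1.0):
    """Return {N: r} with r = time(runtime) / time(reference).

    Larger N are skipped once a measurement exceeds `budget` seconds,
    so the result may not contain every size.
    """
    ratios = {}
    for n in sizes:
        X = make_iris_like(n)
        t_ref = measure(reference, X)
        t_rt = measure(runtime, X)
        ratios[n] = t_rt / t_ref if t_ref > 0 else float("nan")
        if max(t_ref, t_rt) > budget:
            break
    return ratios


# Two hypothetical implementations of the same linear prediction.
WEIGHTS = [0.2, -0.1, 0.4, 0.3]


def predict_reference(X):
    return [sum(w * x for w, x in zip(WEIGHTS, row)) for row in X]


def predict_other(X):
    out = []
    for row in X:
        acc = 0.0
        for w, x in zip(WEIGHTS, row):
            acc += w * x
        out.append(acc)
    return out


print(benchmark(predict_reference, predict_other, sizes=(1, 10, 100)))
```

Taking the best of several repetitions reduces noise from the operating system. In a real comparison, the reference would be scikit-learn's prediction method and the other function a runtime's inference call.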
Another benchmark based on asv is available. It shows similar results and also measures memory peaks: ASV Benchmark.
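asv discovers benchmarks by method name. Methods starting with `time_` are timed, and methods starting with `peakmem_` report peak memory. The `params` and `param_names` attributes run each benchmark for several values. The suite below is a hypothetical minimal example, not the suite used by this project. It measures a pure-Python linear prediction on random four-feature data for several values of N.

```python
import random


class LinearPredictSuite:
    """Hypothetical asv suite; asv selects benchmarks by method prefix."""

    # asv runs every benchmark once per value listed in `params`.
    params = [1, 10, 100, 1000]
    param_names = ["N"]

    def setup(self, N):
        # Iris-like input: N rows, 4 random features (illustration only).
        rng = random.Random(0)
        self.X = [[rng.random() for _ in range(4)] for _ in range(N)]
        self.weights = [0.2, -0.1, 0.4, 0.3]

    def _predict(self):
        # Leading underscore: not a benchmark, just a shared helper.
        return [sum(w * x for w, x in zip(self.weights, row)) for row in self.X]

    def time_predict(self, N):
        # `time_` prefix: asv reports the execution time.
        self._predict()

    def peakmem_predict(self, N):
        # `peakmem_` prefix: asv reports the peak memory used while it runs.
        self._predict()
```

When saved in the benchmark directory of an asv project, this suite would be picked up by `asv run`. Because `setup` runs before the measurement, data creation is excluded from the timings.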
Visual Representations
sklearn-onnx converts many scikit-learn models to ONNX by rewriting their prediction functions with ONNX Operators and ONNX ML Operators. This package, mlprodict, implements a Python runtime for ONNX operators.
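The toy sketch below shows what rewriting a prediction function with operators, and evaluating it with a Python runtime, means in practice. A linear classifier is expressed as a small graph of MatMul, Add and ArgMax nodes. A dictionary of pure-Python operator implementations then evaluates the graph node by node. This is not mlprodict's implementation or API. The tuple-based graph format and all names are hypothetical, and the operators are simplified versions of their ONNX counterparts.

```python
# Toy operator implementations working on nested lists (rows of floats).
def matmul(a, b):
    # (n, k) x (k, m) -> (n, m)
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def add(a, bias):
    # adds a bias vector to every row (a simplified broadcast)
    return [[x + y for x, y in zip(row, bias)] for row in a]


def argmax(a):
    # index of the largest value in each row
    # (simplified: the ONNX ArgMax operator also has axis/keepdims attributes)
    return [max(range(len(row)), key=row.__getitem__) for row in a]


OPERATORS = {"MatMul": matmul, "Add": add, "ArgMax": argmax}


def run_graph(nodes, feeds):
    """Evaluate (op_type, input names, output name) nodes in order."""
    values = dict(feeds)
    for op_type, inputs, output in nodes:
        values[output] = OPERATORS[op_type](*(values[name] for name in inputs))
    return values


# A linear classifier rewritten as a graph: label = ArgMax(X @ W + B).
NODES = [
    ("MatMul", ["X", "W"], "scores_raw"),
    ("Add", ["scores_raw", "B"], "scores"),
    ("ArgMax", ["scores"], "label"),
]
FEEDS = {
    "X": [[1.0, 2.0], [3.0, -1.0]],
    "W": [[1.0, 0.0], [0.0, 1.0]],
    "B": [0.0, 0.5],
}

print(run_graph(NODES, FEEDS)["label"])  # [1, 0]
```

The nodes are assumed to be listed in topological order, which ONNX also requires for the nodes of a graph. This lets the runtime evaluate them in a single pass.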
- ONNX New Converters
- Visual Representation of scikit-learn models
- calibration
- cluster
- compose
- covariance
- cross_decomposition
- decomposition
- discriminant_analysis
- ensemble
- feature_extraction
- feature_selection
- gaussian_process
- impute
- isotonic
- kernel_approximation
- kernel_ridge
- linear_model
- mixture
- mlprodict.onnx_conv
- model_selection
- multiclass
- multioutput
- naive_bayes
- neighbors
- neural_network
- preprocessing
- random_projection
- semi_supervised
- svm
- tree
- Availability of scikit-learn model for runtime python_compiled
- …NB
- AdaBoost
- AdditiveChi2Sampler
- AffinityPropagation
- Bagging
- Bayesian…
- Binarizer
- Birch
- Booster
- Calibrated
- ClassifierChain
- CountVectorizer
- DictVectorizer
- DictionaryLearning
- EllipticEnvelope
- FactorAnalysis
- FastICA
- Feature…
- FunctionTransformer
- Gamma
- GaussianMixture
- GaussianProcess
- GaussianRandomProjection
- GenericUnivariateSelect
- GradientBoosting
- GridSearch
- HashingVectorizer
- HistGradientBoosting
- IncrementalPCA
- IsotonicRegression
- IterativeImputer
- KBinsDiscretizer
- KMeans
- KNNImputer
- KernelCenterer
- KernelPCA
- Label…
- LatentDirichletAllocation
- Linear
- LinearDiscriminantAnalysis
- LocalOutlierFactor
- MeanShift
- MiniBatch…
- MissingIndicator
- MultiLabelBinarizer
- MultiOutput
- NearestCentroid
- NeighborhoodComponentsAnalysis
- Neighbors
- Nystroem
- OneHotEncoder
- OneVs…
- OrdinalEncoder
- OrthogonalMatchingPursuit
- OutputCode
- PLS…
- PassiveAggressive
- Perceptron
- Poisson
- PolynomialCountSketch
- PolynomialFeatures
- PowerTransformer
- QuadraticDiscriminantAnalysis
- Quantile
- QuantileTransformer
- RBFSampler
- RandomizedSearch
- RegressorChain
- Scaler
- Select…
- SelfTraining
- SequentialFeatureSelector
- SimpleImputer
- SkewedChi2Sampler
- Sparse…
- SplineTransformer
- Stacking
- Tfidf…
- TransferTransformer
- TransformedTarget
- Trees
- TruncatedSVD
- Tweedie
- VarianceThreshold
- Voting
- WOETransformer
- Availability of scikit-learn model for runtime onnxruntime1
- …NB
- AdaBoost
- AdditiveChi2Sampler
- AffinityPropagation
- Bagging
- Bayesian…
- Binarizer
- Birch
- Booster
- Calibrated
- ClassifierChain
- CountVectorizer
- DictVectorizer
- DictionaryLearning
- EllipticEnvelope
- FactorAnalysis
- FastICA
- Feature…
- FunctionTransformer
- Gamma
- GaussianMixture
- GaussianProcess
- GaussianRandomProjection
- GenericUnivariateSelect
- GradientBoosting
- GridSearch
- HashingVectorizer
- HistGradientBoosting
- IncrementalPCA
- IsotonicRegression
- IterativeImputer
- KBinsDiscretizer
- KMeans
- KNNImputer
- KernelCenterer
- KernelPCA
- Label…
- LatentDirichletAllocation
- Linear
- LinearDiscriminantAnalysis
- LocalOutlierFactor
- MeanShift
- MiniBatch…
- MissingIndicator
- MultiLabelBinarizer
- MultiOutput
- NearestCentroid
- NeighborhoodComponentsAnalysis
- Neighbors
- Nystroem
- OneHotEncoder
- OneVs…
- OrdinalEncoder
- OrthogonalMatchingPursuit
- OutputCode
- PLS…
- PassiveAggressive
- Perceptron
- Poisson
- PolynomialCountSketch
- PolynomialFeatures
- PowerTransformer
- QuadraticDiscriminantAnalysis
- Quantile
- QuantileTransformer
- RBFSampler
- RandomizedSearch
- RegressorChain
- Scaler
- Select…
- SelfTraining
- SequentialFeatureSelector
- SimpleImputer
- SkewedChi2Sampler
- Sparse…
- SplineTransformer
- Stacking
- Tfidf…
- TransferTransformer
- TransformedTarget
- Trees
- TruncatedSVD
- Tweedie
- VarianceThreshold
- Voting
- WOETransformer
All results were obtained with the following module versions:
from mlprodict.onnxrt.validate.validate_helper import modules_list
from pyquickhelper.pandashelper import df2rst
from pandas import DataFrame
# one row per module: name and installed version
print(df2rst(DataFrame(modules_list())))
name        | version
mlprodict   | 0.8.1762
numpy       | 1.21.5
onnx        | 1.11.0
onnxmltools | 1.11.0
onnxruntime | 1.11.0
pandas      | 1.4.2
scipy       | 1.8.0
skl2onnx    | 1.11.1
sklearn     | 1.0.2
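A similar version table can be produced with the standard library alone, for example when mlprodict is not installed. This sketch assumes Python 3.8 or later (`importlib.metadata`). It is not what generated the table above, and `collect_versions` is a hypothetical helper whose package list simply mirrors that table.

```python
from importlib import metadata

# Distribution names as published on PyPI; sklearn is distributed
# as "scikit-learn".
PACKAGES = ["mlprodict", "numpy", "onnx", "onnxmltools", "onnxruntime",
            "pandas", "scipy", "skl2onnx", "scikit-learn"]


def collect_versions(packages=PACKAGES):
    """Return (name, version) pairs; version is None when not installed."""
    rows = []
    for name in packages:
        try:
            rows.append((name, metadata.version(name)))
        except metadata.PackageNotFoundError:
            rows.append((name, None))
    return rows


for name, version in collect_versions():
    print(f"{name:<14}| {version or 'not installed'}")
```

Distribution names do not always match import names. For instance, onnxruntime may be installed as a variant such as onnxruntime-gpu, which this lookup would report as missing.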
onnxruntime is compiled with the following options:
Supported models
Every model is tested on a defined list of standard problems created from the Iris dataset. A dedicated function describes the list of problems considered.
from mlprodict.onnxrt.validate.validate import sklearn_operators, find_suitable_problem
from pyquickhelper.pandashelper import df2rst
from pandas import DataFrame
res = sklearn_operators(extended=True)
rows = []
for model in res:
    name = model['name']
    row = dict(name=name)
    try:
        prob = find_suitable_problem(model['cl'])
        if prob is None:
            # no standard problem applies to this model, skip it
            continue
        for p in prob:
            row[p] = 'X'
    except RuntimeError:
        # keep the model in the table, without any mark
        pass
    rows.append(row)
df = DataFrame(rows).set_index('name')
df = df.sort_index()
cols = list(sorted(df.columns))
df = df[cols]
print(df2rst(df, index=True))
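The pandas code above boils down to a pivot from "model → supported problems" to a table of marks. The dependency-free sketch below shows that step. `support_matrix` is a hypothetical helper, and the model names and problem lists in the example stand in for the output of `find_suitable_problem`. They are not real results.

```python
def support_matrix(problems_by_model):
    """Build a model x problem table of 'X' marks.

    problems_by_model maps a model name to the problems it supports,
    or to None when no problem applies (such models are skipped).
    """
    rows = {name: set(probs) for name, probs in problems_by_model.items()
            if probs is not None}
    # Sorted columns; '~' sorts after letters, so '~' variants come last.
    columns = sorted({p for probs in rows.values() for p in probs})
    table = [["name"] + columns]
    for name in sorted(rows):
        table.append([name] + ["X" if c in rows[name] else "" for c in columns])
    return table


# Hypothetical input, for illustration only.
example = {
    "ModelA": ["b-reg", "m-reg"],
    "ModelB": ["b-cl"],
    "ModelC": None,
}
for line in support_matrix(example):
    print(" | ".join(line))
```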
name |
b-cl |
b-reg |
bow |
cluster |
int-col |
key-int-col |
key-str-col |
m-cl |
m-reg |
mix |
num+y-tr |
num+y-tr-cl |
num-tr |
num-tr-pos |
one-hot |
outlier |
text-col |
~b-cl-64 |
~b-cl-dec |
~b-cl-f100 |
~b-cl-nan |
~b-cl-nop |
~b-cl-nop-64 |
~b-clu-64 |
~b-reg-1d |
~b-reg-64 |
~b-reg-NF-64 |
~b-reg-NF-cov-64 |
~b-reg-NF-std-64 |
~b-reg-NSV-64 |
~b-reg-cov-64 |
~b-reg-f100 |
~b-reg-nan |
~b-reg-nan-64 |
~b-reg-std-NSV-64 |
~m-cl-dec |
~m-cl-nop |
~m-label |
~m-reg-64 |
~mix-64 |
~num+y-tr-1d |
~num-tr-clu |
~num-tr-clu-64 |
ARDRegression |
X |
X |
AdaBoostClassifier |
X |
X |
X |
AdaBoostRegressor |
X |
X |
AdditiveChi2Sampler |
X |
AffinityPropagation |
X |
X |
BaggingClassifier |
X |
X |
BaggingRegressor |
X |
X |
X |
X |
BayesianGaussianMixture |
X |
X |
BayesianRidge |
X |
X |
BernoulliNB |
X |
X |
BernoulliRBM |
X |
Binarizer |
X |
Birch |
X |
X |
X |
X |
Booster |
X |
X |
X |
X |
X |
CalibratedClassifierCV |
X |
X |
CategoricalNB |
X |
X |
X |
X |
ClassifierChain |
X |
X |
X |
X |
ComplementNB |
X |
X |
CountVectorizer |
X |
DecisionTreeClassifier |
X |
X |
X |
X |
X |
DecisionTreeRegressor |
X |
X |
X |
X |
X |
DictVectorizer |
X |
DictionaryLearning |
X |
ElasticNet |
X |
X |
X |
X |
ElasticNetCV |
X |
X |
EllipticEnvelope |
X |
ExtraTreeClassifier |
X |
X |
X |
X |
X |
ExtraTreeRegressor |
X |
X |
X |
X |
ExtraTreesClassifier |
X |
X |
X |
X |
ExtraTreesRegressor |
X |
X |
X |
X |
FactorAnalysis |
X |
FastICA |
X |
FeatureHasher |
X |
FunctionTransformer |
X |
GammaRegressor |
X |
X |
X |
X |
GaussianMixture |
X |
X |
GaussianNB |
X |
X |
GaussianProcessClassifier |
X |
X |
X |
GaussianProcessRegressor |
X |
X |
X |
X |
X |
X |
X |
X |
GaussianRandomProjection |
X |
GenericUnivariateSelect |
X |
GradientBoostingClassifier |
X |
X |
GradientBoostingRegressor |
X |
X |
GridSearchCV |
X |
X |
X |
X |
X |
X |
X |
X |
X |
HashingVectorizer |
X |
HistGradientBoostingClassifier |
X |
X |
X |
X |
HistGradientBoostingRegressor |
X |
X |
X |
X |
HuberRegressor |
X |
X |
IncrementalPCA |
X |
IsolationForest |
X |
IsotonicRegression |
X |
X |
IterativeImputer |
X |
KBinsDiscretizer |
X |
KMeans |
X |
X |
X |
X |
KNNImputer |
X |
KNeighborsClassifier |
X |
X |
X |
X |
KNeighborsRegressor |
X |
X |
X |
X |
KNeighborsTransformer |
X |
KernelCenterer |
X |
KernelPCA |
X |
KernelRidge |
X |
X |
X |
X |
LGBMClassifier |
X |
X |
X |
LGBMRegressor |
X |
X |
LabelBinarizer |
X |
LabelEncoder |
X |
LabelPropagation |
X |
X |
LabelSpreading |
X |
X |
Lars |
X |
X |
X |
X |
LarsCV |
X |
X |
Lasso |
X |
X |
X |
X |
LassoCV |
X |
X |
LassoLars |
X |
X |
X |
X |
LassoLarsCV |
X |
X |
LassoLarsIC |
X |
X |
LatentDirichletAllocation |
X |
LinearDiscriminantAnalysis |
X |
X |
LinearRegression |
X |
X |
X |
X |
LinearSVC |
X |
X |
LinearSVR |
X |
X |
LocalOutlierFactor |
X |
LogisticRegression |
X |
X |
X |
X |
X |
LogisticRegressionCV |
X |
X |
MLPClassifier |
X |
X |
X |
X |
MLPRegressor |
X |
X |
X |
X |
MaxAbsScaler |
X |
MeanShift |
X |
X |
MinMaxScaler |
X |
MiniBatchDictionaryLearning |
X |
MiniBatchKMeans |
X |
X |
X |
X |
MiniBatchSparsePCA |
X |
MissingIndicator |
X |
MultiLabelBinarizer |
X |
MultiOutputClassifier |
X |
X |
MultiOutputRegressor |
X |
MultiTaskElasticNet |
X |
MultiTaskElasticNetCV |
X |
MultiTaskLasso |
X |
MultiTaskLassoCV |
X |
MultinomialNB |
X |
X |
X |
NearestCentroid |
X |
X |
NeighborhoodComponentsAnalysis |
X |
Normalizer |
X |
X |
X |
X |
X |
X |
X |
Nystroem |
X |
OneClassSVM |
X |
OneHotEncoder |
X |
OneVsOneClassifier |
X |
X |
OneVsRestClassifier |
X |
X |
X |
X |
OrdinalEncoder |
X |
OrthogonalMatchingPursuit |
X |
X |
X |
X |
OrthogonalMatchingPursuitCV |
X |
X |
OutputCodeClassifier |
X |
X |
X |
PLSCanonical |
X |
X |
X |
X |
X |
PLSRegression |
X |
X |
X |
X |
X |
X |
PassiveAggressiveClassifier |
X |
X |
PassiveAggressiveRegressor |
X |
X |
Perceptron |
X |
X |
X |
X |
PoissonRegressor |
X |
X |
X |
X |
PolynomialCountSketch |
X |
PolynomialFeatures |
X |
PowerTransformer |
X |
QuadraticDiscriminantAnalysis |
X |
X |
QuantileRegressor |
X |
X |
X |
X |
QuantileTransformer |
X |
RANSACRegressor |
X |
X |
X |
X |
RBFSampler |
X |
X |
X |
RadiusNeighborsClassifier |
X |
X |
RadiusNeighborsRegressor |
X |
X |
X |
X |
RadiusNeighborsTransformer |
X |
RandomForestClassifier |
X |
X |
X |
X |
RandomForestRegressor |
X |
X |
X |
X |
RandomTreesEmbedding |
X |
RandomizedSearchCV |
X |
X |
RegressorChain |
X |
X |
X |
X |
Ridge |
X |
X |
X |
X |
RidgeCV |
X |
X |
X |
X |
RidgeClassifier |
X |
X |
X |
RidgeClassifierCV |
X |
X |
X |
RobustScaler |
X |
SGDClassifier |
X |
X |
X |
X |
SGDOneClassSVM |
X |
SGDRegressor |
X |
X |
X |
X |
X |
X |
X |
X |
SelectFdr |
X |
SelectFpr |
X |
SelectFromModel |
X |
SelectFwe |
X |
SelectKBest |
X |
SelectPercentile |
X |
SelfTrainingClassifier |
X |
X |
X |
X |
SequentialFeatureSelector |
X |
SimpleImputer |
X |
SkewedChi2Sampler |
X |
SparseCoder |
X |
SparsePCA |
X |
SparseRandomProjection |
X |
SplineTransformer |
X |
StackingClassifier |
X |
StackingRegressor |
X |
StandardScaler |
X |
TfidfTransformer |
X |
TfidfVectorizer |
X |
TheilSenRegressor |
X |
X |
TransferTransformer |
X |
TransformedTargetRegressor |
X |
X |
X |
X |
TruncatedSVD |
X |
TweedieRegressor |
X |
X |
X |
X |
VarianceThreshold |
X |
VotingClassifier |
X |
X |
VotingRegressor |
X |
X |
X |
X |
WOETransformer |
X |
XGBClassifier |
X |
X |
X |
XGBRegressor |
X |
X |
Summary graph
The following graph summarizes the performance of every supported model and compares the Python runtime and onnxruntime with scikit-learn under the same conditions. It displays the ratio r between the runtime's processing time and scikit-learn's. Above 1, the runtime is r times slower than scikit-learn; below 1, it is 1/r times faster.
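A small helper makes the ratio easier to read. The functions below are illustrative and not part of mlprodict. `summarize` mirrors the avg_ and med_ rows that the plotting code adds to the graph with mean() and median().

```python
import statistics


def speed_ratio(runtime_seconds, sklearn_seconds):
    """r = runtime time / scikit-learn time, as plotted in the summary graph."""
    return runtime_seconds / sklearn_seconds


def describe(r):
    """Turn a ratio into the wording used for the graph."""
    if r > 1:
        return f"{r:g} times slower than scikit-learn"
    if r < 1:
        return f"{1 / r:g} times faster than scikit-learn"
    return "as fast as scikit-learn"


def summarize(ratios):
    """Mean and median of a list of ratios."""
    return statistics.mean(ratios), statistics.median(ratios)


print(describe(speed_ratio(2.0, 1.0)))
print(summarize([0.5, 1.0, 4.0]))
```

A few very slow models can pull the mean well above 1, while the median is less sensitive to them. This is why showing both gives a fairer picture.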
import pandas
import matplotlib.pyplot as plt
import numpy
from mlprodict.plotting.plotting_validate_graph import _model_name
df1 = pandas.read_excel("bench_sum_python_compiled.xlsx")
df2 = pandas.read_excel("bench_sum_onnxruntime1.xlsx")
if 'n_features' not in df1.columns:
    df1["n_features"] = 4
if 'n_features' not in df2.columns:
    df2["n_features"] = 4
df1['optim'] = df1['optim'].fillna('')
df2['optim'] = df2['optim'].fillna('')
# Keep only the most recent opset tested (columns are named opset<N>).
last_opset = max(int(_[5:]) for _ in list(df1.columns) if _.startswith("opset"))
opset_col = 'opset%d' % last_opset
df1['opset'] = df1[opset_col].fillna('')
df2['opset'] = df2[opset_col].fillna('')
# The label shows the opset only when the conversion succeeded ("OK <opset>").
df1['opset'] = df1['opset'].apply(lambda x: str(last_opset) if "OK %d" % last_opset in x else "")
df2['opset'] = df2['opset'].apply(lambda x: str(last_opset) if "OK %d" % last_opset in x else "")
fmt = "{} [{}-{}|{}] D{}-o{}"
df1["label"] = df1.apply(
lambda row: fmt.format(
row["name"], row["problem"], row["scenario"], row["optim"],
row["n_features"], row["opset"]).replace("-default|", "-*]"), axis=1)
df2["label"] = df2.apply(
lambda row: fmt.format(
row["name"], row["problem"], row["scenario"], row["optim"],
row["n_features"], row["opset"]).replace("-default|", "-*]"), axis=1)
indices = ['label']
values = ['RT/SKL-N=1', 'N=10', 'N=100', 'N=1000', 'N=10000']
df1 = df1[indices + values]
df2 = df2[indices + values]
df = df1.merge(df2, on="label", suffixes=("__pyrtc", "__ort"), how='outer')
na = df["RT/SKL-N=1__pyrtc"].isnull() & df["RT/SKL-N=1__ort"].isnull()
dfp = df[~na].sort_values("label", ascending=False).reset_index(drop=True)
# dfp = dfp[-10:]
# We add the runtime name as model.
ncol = (dfp.shape[1] - 1) // 2
dfp_legend = dfp.iloc[:3, :].copy()
dfp_legend.iloc[:, 1:] = numpy.nan
dfp_legend.iloc[1, 1:1+ncol] = dfp.iloc[:, 1:1+ncol].mean()
dfp_legend.iloc[2, 1+ncol:] = dfp.iloc[:, 1+ncol:].mean()
dfp_legend.iloc[1, 0] = "avg_" + dfp_legend.columns[1].split('__')[-1]
dfp_legend.iloc[2, 0] = "avg_" + dfp_legend.columns[1+ncol].split('__')[-1]
dfp_legend.iloc[0, 0] = "------"
rleg = dfp_legend.iloc[::-1, :].copy()
rleg.iloc[1, 1:1+ncol] = dfp.iloc[:, 1:1+ncol].median()
rleg.iloc[0, 1+ncol:] = dfp.iloc[:, 1+ncol:].median()
rleg.iloc[1, 0] = "med_" + dfp_legend.columns[1].split('__')[-1]
rleg.iloc[0, 0] = "med_" + dfp_legend.columns[1+ncol].split('__')[-1]
# draw lines between models
dfp = dfp.sort_values('label', ascending=False).copy()
vals = dfp.iloc[:, 1:].values.ravel()
# nanmin/nanmax ignore the missing measurements left by the outer merge
xlim = [max(1e-3, min(0.5, numpy.nanmin(vals))), min(1000, max(2, numpy.nanmax(vals)))]
i = 0
while i < dfp.shape[0] - 1:
    i += 1
    label = dfp.iloc[i, 0]
    if '[' not in label:
        continue
    prev = dfp.iloc[i-1, 0]
    if '[' not in prev:
        continue
    label = label.split()[0]
    prev = prev.split()[0]
    if _model_name(label) == _model_name(prev):
        continue
    # insert a separator row between two different models
    blank = dfp.iloc[:1, :].copy()
    blank.iloc[0, 0] = '------'
    blank.iloc[0, 1:] = xlim[0]
    dfp = pandas.concat([dfp[:i], blank, dfp[i:]])
    i += 1
dfp = dfp.reset_index(drop=True).copy()
# add exhaustive statistics
dfp = pandas.concat([rleg, dfp, dfp_legend]).reset_index(drop=True)
dfp["x"] = numpy.arange(0, dfp.shape[0])
# plot
total = dfp.shape[0] * 0.5
fig = plt.figure(figsize=(14, total))
ax = list(None for c in range((dfp.shape[1]-1) // 2))
p = 1.2
b = 0.35
for i in range(len(ax)):
    # horizontal position of the i-th panel (one panel per value of N)
    x1 = i * 1. / len(ax)
    x2 = (i + 0.95) * 1. / len(ax)
    x1 = x1 ** p
    x2 = x2 ** p
    x1 = b + (0.99 - b) * x1
    x2 = b + (0.99 - b) * x2
    bo = [x1, 0.1, x2 - x1, 0.8]
    if True or i == 0:
        ax[i] = fig.add_axes(bo)
    else:
        # Does not work because all graphs show the same
        # labels.
        ax[i] = fig.add_axes(bo, sharey=ax[i-1])
# fig, ax = plt.subplots(1, (dfp.shape[1]-1) // 2, figsize=(14, total),
#                        sharex=False, sharey=True)
x = dfp['x']
height = total / dfp.shape[0] * 0.65
for c in df.columns[1:]:
    # columns are named <N column>__<runtime>, e.g. 'N=10__ort'
    place, runtime = c.split('__')
    dec = {'pyrtc': 1, 'ort': -1}
    index = values.index(place)
    yl = dfp.loc[:, c].fillna(0)
    # shift each runtime's bars up or down so they do not overlap
    xl = x + dec[runtime] * height / 2
    ax[index].barh(xl, yl, label=runtime, height=height)
for i in range(len(ax)):
    # reference lines: r = 1 (same speed), r = 2 and r = 5 (slower)
    ax[i].plot([1, 1], [min(x), max(x)], 'g-')
    ax[i].plot([2, 2], [min(x), max(x)], 'r--')
    ax[i].plot([5, 5], [min(x), max(x)], 'r--', lw=3)
    ax[i].set_ylim([min(x) - 2, max(x) + 1])
for i in range(1, len(ax)):