.. _l-howto-sklearn:

scikit-learn
============

This page answers common *"how do I…"* questions for converting
:epkg:`scikit-learn` estimators and pipelines to ONNX with
:func:`yobx.sklearn.to_onnx`.

----

How to convert a single estimator
----------------------------------

Train a :epkg:`scikit-learn` estimator, then pass it together with a
representative dummy input (one row is enough) to
:func:`yobx.sklearn.to_onnx`:

.. runpython::
    :showcode:

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from yobx.helpers.onnx_helper import pretty_onnx
    from yobx.sklearn import to_onnx

    rng = np.random.default_rng(0)
    X = rng.standard_normal((20, 4)).astype(np.float32)
    scaler = StandardScaler().fit(X)

    onx = to_onnx(scaler, (X[:1],))
    print(pretty_onnx(onx))

The dummy input controls the **dtype** and the **number of features** of the
generated ONNX graph; its batch dimension is automatically replaced by a
symbolic (dynamic) axis.

----

How to convert a Pipeline
--------------------------

:func:`yobx.sklearn.to_onnx` handles :class:`~sklearn.pipeline.Pipeline`
natively: each step is converted in sequence and the resulting ONNX nodes are
chained together:

.. runpython::
    :showcode:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from yobx.helpers.onnx_helper import pretty_onnx
    from yobx.sklearn import to_onnx

    rng = np.random.default_rng(0)
    X = rng.standard_normal((80, 4)).astype(np.float32)
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    pipe = Pipeline(
        [("scaler", StandardScaler()), ("clf", LogisticRegression())]
    ).fit(X, y)

    onx = to_onnx(pipe, (X[:1],))
    print(f"ONNX opset : {onx.opset_import[0].version}")
    print(pretty_onnx(onx))

.. seealso::

    :ref:`l-plot-sklearn-pipeline`: a full runnable gallery example with
    output verification.

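
As a mental model of the chained graph, the same two steps can be written by
hand in numpy. The sketch below relies only on standard fitted scikit-learn
attributes (``mean_``, ``scale_``, ``coef_``, ``intercept_``, ``classes_``)
and covers the binary case only. The helper ``reference_forward`` is
hypothetical, and the sketch does not show which ONNX operators
:func:`yobx.sklearn.to_onnx` actually emits.

.. code-block:: python

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.standard_normal((80, 4)).astype(np.float32)
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    pipe = Pipeline(
        [("scaler", StandardScaler()), ("clf", LogisticRegression())]
    ).fit(X, y)
    scaler = pipe.named_steps["scaler"]
    clf = pipe.named_steps["clf"]


    def reference_forward(X):
        """Hypothetical helper: the two pipeline steps written in plain numpy."""
        # Step 1, StandardScaler: subtract each column's mean, divide by its scale
        Xs = (X - scaler.mean_) / scaler.scale_
        # Step 2, binary LogisticRegression: one linear score per row ...
        score = Xs @ clf.coef_.ravel() + clf.intercept_[0]
        # ... and the label is the second class whenever the score is positive
        label = clf.classes_[(score > 0).astype(int)]
        return label, score


    label_ref, score_ref = reference_forward(X)
    print("scores agree:", np.allclose(score_ref, pipe.decision_function(X), atol=1e-4))
    print("labels agree:", bool((label_ref == pipe.predict(X)).all()))

Since the converted model is expected to compute the same scores up to float32
rounding, a reference like this can help narrow down which step is
responsible when ONNX outputs and scikit-learn predictions disagree.
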

----

How to run the exported ONNX model
------------------------------------

Use :epkg:`onnxruntime` to run the converted model and compare its outputs
with :epkg:`scikit-learn`'s own predictions:

.. runpython::
    :showcode:

    import numpy as np
    import onnxruntime
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from yobx.sklearn import to_onnx

    rng = np.random.default_rng(0)
    X_train = rng.standard_normal((80, 4)).astype(np.float32)
    y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

    pipe = Pipeline(
        [("scaler", StandardScaler()), ("clf", LogisticRegression())]
    ).fit(X_train, y_train)
    onx = to_onnx(pipe, (X_train[:1],))

    # Run with onnxruntime
    X_test = rng.standard_normal((20, 4)).astype(np.float32)
    sess = onnxruntime.InferenceSession(
        onx.SerializeToString(), providers=["CPUExecutionProvider"]
    )
    input_name = sess.get_inputs()[0].name
    label_onnx, proba_onnx = sess.run(None, {input_name: X_test})

    # Compare with scikit-learn
    label_sk = pipe.predict(X_test)
    assert (label_sk == label_onnx).all(), "Label mismatch!"
    print("Labels match ✓")
    print(f"First 5 labels (sklearn): {label_sk[:5]}")
    print(f"First 5 labels (ONNX)   : {label_onnx[:5]}")

----

How to control dynamic shapes
-------------------------------

By default the batch dimension (axis 0) of every input is made dynamic.
Pass ``dynamic_shapes`` to name that axis explicitly or to mark additional
axes as symbolic:

.. code-block:: python

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from yobx.sklearn import to_onnx

    rng = np.random.default_rng(0)
    X = rng.standard_normal((20, 4)).astype(np.float32)
    scaler = StandardScaler().fit(X)

    # axis 0 is dynamic and named "batch"
    onx = to_onnx(scaler, (X[:1],), dynamic_shapes=({0: "batch"},))

Pass an empty tuple (``dynamic_shapes=()``) to produce a fully **static**
graph where every dimension is fixed at conversion time:

.. code-block:: python

    onx_static = to_onnx(scaler, (X[:1],), dynamic_shapes=())

----

How to inspect the ONNX graph
------------------------------

Print a compact text representation of the model with
:func:`~yobx.helpers.onnx_helper.pretty_onnx`:

.. runpython::
    :showcode:

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from yobx.sklearn import to_onnx
    from yobx.helpers.onnx_helper import pretty_onnx

    rng = np.random.default_rng(0)
    X = rng.standard_normal((10, 4)).astype(np.float32)
    scaler = StandardScaler().fit(X)

    onx = to_onnx(scaler, (X[:1],))
    print(pretty_onnx(onx))

----

How to save and reload the ONNX model
---------------------------------------

The :class:`~yobx.container.ExportArtifact` returned by
:func:`yobx.sklearn.to_onnx` can be serialised directly to disk and loaded
again later:

.. code-block:: python

    import numpy as np
    import onnx
    from sklearn.preprocessing import StandardScaler
    from yobx.sklearn import to_onnx

    rng = np.random.default_rng(0)
    X = rng.standard_normal((20, 4)).astype(np.float32)
    scaler = StandardScaler().fit(X)
    onx = to_onnx(scaler, (X[:1],))

    # Save
    onnx.save(onx, "scaler.onnx")

    # Reload
    onx_loaded = onnx.load("scaler.onnx")

.. seealso::

    :ref:`l-sklearn-converter`: full reference for the scikit-learn
    converter, including the converter registry and how to add support for
    custom estimators.

    :ref:`l-plot-sklearn-pipeline`: runnable gallery example.

    :ref:`l-plot-sklearn-function-options`: exporting each pipeline step as a
    separate ONNX local function.

----

How to export a custom estimator
----------------------------------

There are two ways to make :func:`~yobx.sklearn.to_onnx` work with an
estimator that has no built-in converter.

**Option 1: TraceableMixin (numpy-based transformers)**

If the ``transform`` method uses only standard :epkg:`numpy` operations,
inherit from :class:`~yobx.sklearn.TraceableMixin` together with the usual
sklearn base classes.
The framework traces the method automatically, so no converter function is
needed:

.. runpython::
    :showcode:

    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin
    from yobx.helpers.onnx_helper import pretty_onnx
    from yobx.sklearn import to_onnx, TraceableMixin


    class LogNormTransformer(BaseEstimator, TransformerMixin, TraceableMixin):
        def fit(self, X, y=None):
            self.scale_ = np.abs(X).mean(axis=0, keepdims=True).astype(np.float32)
            return self

        def transform(self, X):
            return np.log(np.abs(X) / self.scale_ + np.float32(1))


    rng = np.random.default_rng(0)
    X = rng.standard_normal((20, 4)).astype(np.float32)
    est = LogNormTransformer().fit(X)

    onx = to_onnx(est, (X[:1],))
    print(pretty_onnx(onx))

**Option 2: extra_converters (full control)**

For estimators whose logic cannot be expressed as plain numpy operations, or
when you need fine-grained control over the ONNX graph, write a converter
function and pass it via ``extra_converters``:

.. runpython::
    :showcode:

    import numpy as np
    import onnxruntime
    from sklearn.base import BaseEstimator, TransformerMixin
    from yobx.helpers.onnx_helper import pretty_onnx, tensor_dtype_to_np_dtype
    from yobx.sklearn import to_onnx


    class ClipTransformer(BaseEstimator, TransformerMixin):
        def __init__(self, clip_min=0.0, clip_max=1.0):
            self.clip_min = clip_min
            self.clip_max = clip_max

        def fit(self, X, y=None):
            return self

        def transform(self, X):
            return np.clip(X, self.clip_min, self.clip_max)


    def convert_clip(g, sts, outputs, estimator, X, name="clip"):
        # Build the bounds with the same dtype as the graph input
        dtype = tensor_dtype_to_np_dtype(g.get_type(X))
        low = np.array(estimator.clip_min, dtype=dtype)
        high = np.array(estimator.clip_max, dtype=dtype)
        res = g.op.Clip(X, low, high, outputs=outputs, name=name)
        # Clip is element-wise: the output keeps the input's type and shape
        g.set_type_shape_unary_op(res, X)
        return res


    rng = np.random.default_rng(0)
    X = rng.standard_normal((20, 4)).astype(np.float32)
    transformer = ClipTransformer(clip_min=-0.5, clip_max=0.5).fit(X)

    onx = to_onnx(
        transformer,
        (X[:1],),
        extra_converters={ClipTransformer: convert_clip},
    )
    print(pretty_onnx(onx))

    sess = onnxruntime.InferenceSession(
        onx.SerializeToString(), providers=["CPUExecutionProvider"]
    )
    X_test = rng.standard_normal((5, 4)).astype(np.float32)
    input_name = sess.get_inputs()[0].name
    (clipped,) = sess.run(None, {input_name: X_test})
    expected = transformer.transform(X_test)
    assert np.allclose(clipped, expected, atol=1e-6)
    print("Results match ✓")

.. seealso::

    :ref:`l-plot-sklearn-custom-converter-options`: a full gallery example
    showing a custom converter with optional extra outputs.

    :ref:`l-sklearn-converter`: converter registry and how to write a
    converter for any estimator.

----

How to export with FunctionTransformer
----------------------------------------

:class:`~sklearn.preprocessing.FunctionTransformer` wraps any numpy function
as a scikit-learn transformer. Because ``func`` is a plain numpy function,
:func:`~yobx.sklearn.to_onnx` converts it via numpy tracing; no custom
converter is required.

**Basic usage**

.. runpython::
    :showcode:

    import numpy as np
    import onnxruntime
    from sklearn.preprocessing import FunctionTransformer
    from yobx.helpers.onnx_helper import pretty_onnx
    from yobx.sklearn import to_onnx


    def log1p_abs(X):
        return np.log1p(np.abs(X))


    rng = np.random.default_rng(0)
    X = rng.standard_normal((20, 4)).astype(np.float32)
    transformer = FunctionTransformer(func=log1p_abs).fit(X)

    onx = to_onnx(transformer, (X[:1],))
    print(pretty_onnx(onx))

    sess = onnxruntime.InferenceSession(
        onx.SerializeToString(), providers=["CPUExecutionProvider"]
    )
    X_test = rng.standard_normal((5, 4)).astype(np.float32)
    input_name = sess.get_inputs()[0].name
    (onnx_out,) = sess.run(None, {input_name: X_test})
    expected = transformer.transform(X_test).astype(np.float32)
    assert np.allclose(onnx_out, expected, atol=1e-5)
    print("Results match ✓")

**Passing keyword arguments with kw_args**

Constants can be forwarded to the function via ``kw_args``; the converter
folds them into the ONNX graph as initializers:

.. runpython::
    :showcode:

    import numpy as np
    from sklearn.preprocessing import FunctionTransformer
    from yobx.helpers.onnx_helper import pretty_onnx
    from yobx.sklearn import to_onnx


    def scale_shift(X, scale=np.float32(1), shift=np.float32(0)):
        return X * scale + shift


    rng = np.random.default_rng(0)
    X = rng.standard_normal((20, 4)).astype(np.float32)
    transformer = FunctionTransformer(
        func=scale_shift,
        kw_args={"scale": np.float32(2.0), "shift": np.float32(1.0)},
    ).fit(X)

    onx = to_onnx(transformer, (X[:1],))
    print(pretty_onnx(onx))

**Identity transformer (func=None)**

When ``func=None`` the transformer is a pass-through: the input is forwarded
to the output unchanged. The optimizer removes redundant intermediate nodes,
so the resulting graph is minimal:

.. runpython::
    :showcode:

    import numpy as np
    from sklearn.preprocessing import FunctionTransformer
    from yobx.helpers.onnx_helper import pretty_onnx
    from yobx.sklearn import to_onnx

    X = np.ones((5, 3), dtype=np.float32)
    identity_tf = FunctionTransformer(func=None).fit(X)

    onx = to_onnx(identity_tf, (X[:1],))
    print(pretty_onnx(onx))

.. seealso::

    :ref:`l-plot-sklearn-function-transformer`: a full gallery example that
    also shows standalone numpy tracing and pipeline embedding.

    :ref:`l-design-function-transformer-tracing`: design doc explaining the
    numpy tracing mechanism.
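
A :class:`~sklearn.preprocessing.FunctionTransformer` is an ordinary
scikit-learn step, so it can also sit inside a
:class:`~sklearn.pipeline.Pipeline`; the gallery example referenced above
covers exporting such a pipeline. The sketch below stays on the scikit-learn
side: it builds an illustrative two-step pipeline (``log1p_abs`` followed by
:class:`~sklearn.preprocessing.StandardScaler`, a combination chosen here,
not taken from the gallery) and checks with numpy what it computes, which is
the reference an exported graph would be compared against.

.. code-block:: python

    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import FunctionTransformer, StandardScaler


    def log1p_abs(X):
        return np.log1p(np.abs(X))


    rng = np.random.default_rng(0)
    X = rng.standard_normal((20, 4)).astype(np.float32)

    # Illustrative pipeline: a numpy function followed by a fitted scaler
    pipe = Pipeline(
        [("log", FunctionTransformer(func=log1p_abs)), ("scaler", StandardScaler())]
    ).fit(X)

    # The same computation by hand: apply the function, then standardize with
    # the statistics the scaler learned from the transformed training data
    scaler = pipe.named_steps["scaler"]
    manual = (log1p_abs(X) - scaler.mean_) / scaler.scale_

    print("pipeline == manual:", np.allclose(pipe.transform(X), manual, atol=1e-5))

Because the scaler is fitted on the *output* of ``log1p_abs``, its ``mean_``
and ``scale_`` describe the transformed features, not the raw input.
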