.. _l-design-sklearn-custom-converter:

================
Custom Converter
================

The built-in converter registry covers estimators that ship with
:epkg:`scikit-learn`. When you train a **custom estimator**, or want to
override how a built-in estimator is translated, you can supply your own
converter without touching the package source.

There are two ways:

* **Ad-hoc** via the ``extra_converters`` parameter of :func:`to_onnx`,
  useful for one-off conversions or during development.
* **Permanent** via the :func:`register_sklearn_converter` decorator,
  the right choice once a converter is stable and reusable.

Writing a converter function
============================

A converter follows the same contract as all built-in ones:

``(g, sts, outputs, estimator, *input_names, name) → output_name(s)``

================ =====================================================
Parameter        Description
================ =====================================================
``g``            :class:`GraphBuilder`; call ``g.op.(…)`` to emit
                 ONNX nodes.
``sts``          ``Dict`` of metadata (empty ``{}`` in the default
                 path; reserved for future shape propagation).
``outputs``      ``List[str]`` of pre-allocated output tensor names
                 that the converter **must** write to.
``estimator``    The fitted :epkg:`scikit-learn` object.
``*input_names`` One positional ``str`` per graph input tensor.
``name``         String prefix for unique node-name generation.
================ =====================================================

Ad-hoc conversion with ``extra_converters``
===========================================

Pass a ``{EstimatorClass: converter_function}`` mapping to the
``extra_converters`` keyword argument. Entries in that mapping take
**priority** over built-in converters, so you can also override an existing
converter this way.

The example below defines a custom ``ScaleByConstant`` transformer and its
corresponding ONNX converter, then converts an instance to ONNX and prints
the resulting graph.

.. runpython::
    :showcode:

    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin
    from yobx.sklearn import to_onnx
    from yobx.helpers.onnx_helper import pretty_onnx

    # ── 1. Custom estimator ────────────────────────────────────────────

    class ScaleByConstant(TransformerMixin, BaseEstimator):
        """Multiplies every feature by a fixed scalar constant."""

        def __init__(self, scale=2.0):
            self.scale = scale

        def fit(self, X, y=None):
            return self

        def transform(self, X):
            return X * self.scale

    # ── 2. Converter function ──────────────────────────────────────────

    def convert_scale_by_constant(g, sts, outputs, estimator, X, name="scale"):
        """Emits a single ``Mul`` node: output = X * estimator.scale."""
        scale = np.array([estimator.scale], dtype=np.float32)
        result = g.op.Mul(X, scale, name=name, outputs=outputs)
        return result

    rng = np.random.default_rng(0)
    X = rng.standard_normal((5, 3)).astype(np.float32)
    est = ScaleByConstant(scale=3.0).fit(X)

    onx = to_onnx(est, (X,), extra_converters={ScaleByConstant: convert_scale_by_constant})
    print(pretty_onnx(onx))

Validate numerically
====================

The same conversion is repeated below, and the ONNX output is compared with
the output of ``transform``:

.. runpython::
    :showcode:

    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin
    from yobx.sklearn import to_onnx
    from yobx.reference import ExtendedReferenceEvaluator

    class ScaleByConstant(TransformerMixin, BaseEstimator):
        def __init__(self, scale=2.0):
            self.scale = scale

        def fit(self, X, y=None):
            return self

        def transform(self, X):
            return X * self.scale

    def convert_scale_by_constant(g, sts, outputs, estimator, X, name="scale"):
        scale = np.array([estimator.scale], dtype=np.float32)
        result = g.op.Mul(X, scale, name=name, outputs=outputs)
        return result

    rng = np.random.default_rng(0)
    X = rng.standard_normal((5, 3)).astype(np.float32)
    est = ScaleByConstant(scale=3.0).fit(X)

    onx = to_onnx(est, (X,), extra_converters={ScaleByConstant: convert_scale_by_constant})

    ref = ExtendedReferenceEvaluator(onx)
    onnx_output = ref.run(None, {"X": X})[0]
    sklearn_output = est.transform(X).astype(np.float32)
    print("max absolute difference:", np.abs(onnx_output - sklearn_output).max())

Overriding a built-in converter
===============================

Because ``extra_converters`` entries take priority, you can also replace the
converter for a built-in estimator. The snippet below replaces the standard
:class:`sklearn.preprocessing.StandardScaler` converter with a trivial
identity, purely to illustrate the override mechanism:

.. runpython::
    :showcode:

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from yobx.sklearn import to_onnx
    from yobx.helpers.onnx_helper import pretty_onnx

    def identity_scaler(g, sts, outputs, estimator, X, name="scaler"):
        """Pass-through: return the input unchanged."""
        result = g.op.Identity(X, name=name, outputs=outputs)
        return result

    rng = np.random.default_rng(1)
    X = rng.standard_normal((4, 2)).astype(np.float32)
    ss = StandardScaler().fit(X)

    # The custom converter overrides the built-in one
    onx = to_onnx(ss, (X,), extra_converters={StandardScaler: identity_scaler})
    print(pretty_onnx(onx))

Permanent registration
======================

Once your converter is stable, promote it from an ad-hoc function to a
first-class entry in the registry by using the
:func:`register_sklearn_converter` decorator. You then no longer have to
pass ``extra_converters`` at every call site:

.. code-block:: python

    # myproject/onnx_converters.py
    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin
    from yobx.sklearn.register import register_sklearn_converter
    from yobx.typing import GraphBuilderExtendedProtocol
    from yobx.xbuilder import GraphBuilder

    class ScaleByConstant(TransformerMixin, BaseEstimator):
        def __init__(self, scale=2.0):
            self.scale = scale

        def fit(self, X, y=None):
            return self

        def transform(self, X):
            return X * self.scale

    @register_sklearn_converter(ScaleByConstant)
    def convert_scale_by_constant(
        g: GraphBuilderExtendedProtocol,
        sts: dict,
        outputs: list,
        estimator: ScaleByConstant,
        X: str,
        name: str = "scale",
    ) -> str:
        scale = np.array([estimator.scale], dtype=np.float32)
        result = g.op.Mul(X, scale, name=name, outputs=outputs)
        return result

Once this module is imported, the converter is available globally and
:func:`to_onnx` uses it automatically:

.. code-block:: python

    import numpy as np
    from myproject.onnx_converters import ScaleByConstant  # importing the module registers the converter
    from yobx.sklearn import to_onnx

    X = np.random.default_rng(0).standard_normal((5, 3)).astype(np.float32)
    onx = to_onnx(ScaleByConstant(scale=3.0).fit(X), (X,))  # no extra_converters needed

Using convert options in a custom converter
===========================================

The ``convert_options`` parameter of :func:`to_onnx` lets callers request
**optional extra outputs** from a converter without changing the converter
signature. This pattern is used by the built-in tree converters
(``decision_path``, ``decision_leaf``) and is fully available to custom
converters.

How it works
------------

1. **Define an options class**: any object that satisfies the
   :class:`~yobx.typing.ConvertOptionsProtocol` (two methods:
   ``available_options()`` and ``has()``).
2. ``available_options()`` returns the list of option names your class
   recognises. The framework iterates over this list before calling the
   converter and pre-allocates one extra slot in ``outputs`` for every
   option for which ``has()`` returns ``True``.
3. ``has(option_name, piece, name=None)`` returns ``True`` when the option
   is active for the estimator *piece*. The optional *name* argument carries
   the pipeline step name, so you can enable an option only for a specific
   named step in a :class:`~sklearn.pipeline.Pipeline`.
4. **Check inside the converter**: use ``g.convert_options.has(...)`` to
   decide whether to emit the optional nodes and fill ``outputs[extra_idx]``.

Minimal example
---------------

The snippet below defines a ``ClipTransformer`` whose ONNX converter
optionally emits a boolean clip-mask output controlled by a custom options
class:

.. code-block:: python

    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin
    from yobx.sklearn import to_onnx
    from yobx.helpers.onnx_helper import tensor_dtype_to_np_dtype
    from yobx.typing import GraphBuilderExtendedProtocol

    class ClipTransformer(TransformerMixin, BaseEstimator):
        def __init__(self, clip_min=0.0, clip_max=1.0):
            self.clip_min = clip_min
            self.clip_max = clip_max

        def fit(self, X, y=None):
            return self

        def transform(self, X):
            return np.clip(X, self.clip_min, self.clip_max)

    class ClipOptions:
        def __init__(self, clip_mask=False):
            self.clip_mask = clip_mask

        def available_options(self):
            return ["clip_mask"]

        def has(self, option_name, piece, name=None):
            if option_name == "clip_mask":
                return bool(self.clip_mask) and hasattr(piece, "clip_min")
            return False

    def convert_clip_transformer(
        g: GraphBuilderExtendedProtocol,
        sts: dict,
        outputs: list,
        estimator: ClipTransformer,
        X: str,
        name: str = "clip",
    ):
        itype = g.get_type(X)
        dtype = tensor_dtype_to_np_dtype(itype)
        low = np.array(estimator.clip_min, dtype=dtype)
        high = np.array(estimator.clip_max, dtype=dtype)

        # Main output: the clipped tensor goes into the first pre-allocated slot.
        clipped = g.op.Clip(X, low, high, name=name, outputs=outputs[:1])

        # Optional output: True wherever the input fell outside [low, high].
        if g.convert_options.has("clip_mask", estimator, name):
            below = g.op.Less(X, low, name=f"{name}_below")
            above = g.op.Greater(X, high, name=f"{name}_above")
            mask = g.op.Or(below, above, name=f"{name}_mask", outputs=outputs[1:2])

        return outputs[0] if len(outputs) == 1 else tuple(outputs)

    X = np.random.default_rng(0).standard_normal((10, 4)).astype(np.float32)
    transformer = ClipTransformer(clip_min=-0.5, clip_max=0.5).fit(X)

    # Without options: single output
    onx = to_onnx(
        transformer,
        (X,),
        extra_converters={ClipTransformer: convert_clip_transformer},
    )

    # With clip_mask: two outputs
    onx_with_mask = to_onnx(
        transformer,
        (X,),
        extra_converters={ClipTransformer: convert_clip_transformer},
        convert_options=ClipOptions(clip_mask=True),
    )
    print([o.name for o in onx_with_mask.graph.output])  # ['Y', 'clip_mask']

.. seealso::

    :ref:`l-plot-sklearn-custom-converter-options`, a full runnable gallery
    example with numerical validation.

    :ref:`l-plot-sklearn-convert-options`, how the built-in
    ``decision_path`` and ``decision_leaf`` options work for tree and
    ensemble models.

    :ref:`l-design-expected-api`, the full ``convert_options`` protocol
    contract and the built-in :class:`~yobx.sklearn.ConvertOptions`
    reference.

Multi-output converters with ``NoKnownOutputMixin``
===================================================

By default the framework infers the expected ONNX output names from the
estimator type (see :ref:`l-design-sklearn-converter`, *Output naming*
section). When a custom estimator produces outputs that don't fit those
heuristics, for example an arbitrary set of named columns, the automatic
inference gets in the way.

Inheriting from :class:`NoKnownOutputMixin` tells :func:`get_output_names`
to return ``None``, which causes :func:`to_onnx` to skip pre-allocating
output tensor names and hand full control to the converter. The converter is
then free to call ``g.op.*`` and return as many (or as few) output names as
it needs.

.. seealso::

    :ref:`l-design-sklearn-converter`, an overview of the converter
    registry, the built-in converters, and how to add a new converter to the
    package itself.
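
The mechanism described in the ``NoKnownOutputMixin`` section can be pictured
with a small toy model. The sketch below does not use the package at all:
``NoKnownOutputMarker``, ``infer_output_names`` and ``convert`` are
hypothetical stand-ins for the mixin, for ``get_output_names`` and for the
dispatch step inside ``to_onnx``. The default name ``"Y"`` and the choice of
passing ``None`` as ``outputs`` are assumptions made for illustration, not
confirmed behaviour of the library.

.. code-block:: python

    # Toy model only: none of these names come from the package itself.

    class NoKnownOutputMarker:
        """Hypothetical stand-in for the mixin: a marker with no behaviour."""


    def infer_output_names(estimator):
        """Hypothetical stand-in for output-name inference.

        Returns None for marked estimators, otherwise one default name.
        """
        if isinstance(estimator, NoKnownOutputMarker):
            return None
        return ["Y"]  # assumed default name, for illustration only


    def convert(estimator, converter, input_name="X"):
        """Hypothetical stand-in for the dispatch step of the conversion."""
        outputs = infer_output_names(estimator)
        returned = converter(outputs, estimator, input_name)
        # Normalise a single name or a tuple of names to a list.
        return [returned] if isinstance(returned, str) else list(returned)


    class PlainTransformer:
        """Unmarked estimator: output names are pre-allocated for it."""


    class ColumnSplitter(NoKnownOutputMarker):
        """Marked estimator producing one output per named column."""

        def __init__(self, columns):
            self.columns = columns


    def plain_converter(outputs, estimator, input_name):
        # Pre-allocated names are available and must be written to.
        return outputs[0]


    def splitter_converter(outputs, estimator, input_name):
        # Nothing was pre-allocated, so the converter picks its own names.
        assert outputs is None
        return tuple(f"{input_name}_{c}" for c in estimator.columns)


    plain_names = convert(PlainTransformer(), plain_converter)
    split_names = convert(ColumnSplitter(["a", "b", "c"]), splitter_converter)
    print(plain_names)
    print(split_names)

In this model the only difference between the two paths is whether the
converter receives names chosen for it or chooses them itself, which is the
trade-off the mixin is described as controlling.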