.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/core/plot_onnx_time.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_core_plot_onnx_time.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_core_plot_onnx_time.py:


.. _l-example-plot-onnx-time:

Measures loading and saving time for an ONNX model
====================================================

This script builds a small ONNX model and benchmarks the time to load
and save it using :mod:`onnx`, :mod:`onnx_light.onnx`, and
:mod:`onnxruntime`.
When the standalone C++ example executables ``load_onnx_time``,
``load_onnx_light_time``, and
``save_onnx_light_time`` are available, it also includes their timing output.
The model structure is identical in all cases.

Use ``--model <path>`` on the command line to benchmark an existing ONNX file
instead of the default synthetic model.  The script also prints a short
statistics block (node count, initializer count, total tensor size, etc.)
for whichever model is used.

The ``onnx_light.onnx`` implementation does not depend on protobuf and
therefore avoids the overhead of the protobuf serialization layer.
It also supports parallel loading of tensor weights through the
``num_threads`` keyword and loading models stored with external data.

When loading a single-file model, ``onnx_light.onnx`` memory-maps the
``.onnx`` file (``mmap`` on POSIX, ``CreateFileMapping`` on Windows) and
parses directly out of the mapped region, with no double-buffered
``ifstream`` plus read-ahead step on top of it.  The same memory-mapping
strategy is used for the *external weights* file when a model is stored
with external data: each weights file is mapped once into a shared buffer
that all tensors point into.
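
The general technique can be illustrated with Python's standard
:mod:`mmap` module.  This is only a sketch, not the ``onnx_light``
implementation: the file layout, the ``table`` of offsets, and the tensor
names are hypothetical and exist only for the illustration.

.. code-block:: Python

    import mmap
    import os
    import struct
    import tempfile

    # Hypothetical weights: three float32 "tensors" stored back to back.
    tensors = {"W0": [1.0, 2.0], "W1": [3.0, 4.0, 5.0], "W2": [6.0]}

    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "weights.bin")
        table = {}  # name -> (offset, size in bytes), hypothetical layout
        with open(path, "wb") as f:
            for name, values in tensors.items():
                blob = struct.pack(f"<{len(values)}f", *values)
                table[name] = (f.tell(), len(blob))
                f.write(blob)

        with open(path, "rb") as f:
            # Map the file once; every view below points into this mapping.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
                shared = memoryview(mapped)
                views = {
                    name: shared[off : off + size]
                    for name, (off, size) in table.items()
                }
                decoded = {
                    name: list(struct.unpack(f"<{len(view) // 4}f", view))
                    for name, view in views.items()
                }
                # All views must be released before the mapping is closed.
                for view in views.values():
                    view.release()
                shared.release()

    print(decoded)

Each entry of ``views`` is a slice of the same mapping, so no tensor owns
a private copy of its bytes until ``struct.unpack`` decodes it.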

Memory mapping brings ``load/1filex1/onnxlight-cpp`` close to (or ahead of)
``load/1filex1/onnx-cpp`` on parser-bound models with many small
initializers.  When ``no_copy=True`` is requested with a single-file
model, the loader still copies inline ``raw_data`` so that the parsed
``ModelProto`` does not depend on the lifetime of the mmap region.
Zero-copy of inline raw data is supported only for ``bytes`` inputs and
for external weights files.

One key advantage over the ``onnx`` package is zero-copy parsing:
when ``no_copy=True`` is passed to :func:`onnx_light.onnx.load` (or via
:class:`~onnx_light.onnx.ParseOptions`), tensor ``raw_data`` blobs are
**not** copied into new buffers.  Instead each ``TensorProto`` stores a
direct pointer into the serialized bytes.  This eliminates one
``malloc + memcpy`` per tensor initializer and is therefore especially
beneficial for models with many large weight tensors.

For models stored with external data, ``no_copy=True`` enables a related
fast path: each external weights file is read once into a shared buffer,
and every tensor points into that shared storage instead of owning a
separate copy.

.. warning::
   When ``no_copy=True`` is used with an in-memory :class:`bytes` object,
   the caller must keep that original buffer alive for as long as the
   parsed model is in use.  External-data files do not have that
   lifetime constraint because ``onnx_light`` keeps the shared file
   buffers alive.
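
The difference between a copying and a zero-copy read can be sketched
with plain Python objects.  The example below does not use ``onnx_light``;
the ``payload`` layout (an 8-byte header followed by float32 values) is a
hypothetical stand-in for a serialized model containing ``raw_data``.

.. code-block:: Python

    import struct

    # Hypothetical serialized payload: 8-byte header, then four float32 values.
    header = b"HDR\x00\x00\x00\x00\x00"
    payload = header + struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)

    # Copying path: slicing a bytes object allocates a new object.
    copied = payload[len(header) :]

    # Zero-copy path: a memoryview slice only records an offset and a length
    # into the original buffer.
    view = memoryview(payload)[len(header) :]

    same_buffer = view.obj is payload
    same_content = copied == bytes(view)
    values = struct.unpack("<4f", view)
    print(same_buffer, same_content, values)

A :class:`memoryview` holds a reference to ``payload`` and therefore keeps
it alive.  According to the warning above, a model parsed with
``no_copy=True`` from :class:`bytes` offers no such guarantee, which is
why the caller has to retain the buffer.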

For ``onnxruntime``, the session is created with all graph optimizations
disabled (``ORT_DISABLE_ALL``) so that the measurement reflects only the
model loading overhead rather than compilation or fusion costs.

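Benchmark names combine the following tokens, for example
``load/1filex4/onnxlight``:

* ``onnx``, ``onnxlight``, ``ort``: the library used, ``onnx``, ``onnx-light``, or ``onnxruntime``
* ``1filex1``: the model is stored in a single file and processed with 1 thread
* ``1filex4``: the model is stored in a single file and processed with 4 threads
* ``2filex1``: the model is stored in one file plus a second file for external data, processed with 1 thread
* ``2filex4``: the model is stored in one file plus a second file for external data, processed with 4 threads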
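
These names can be split back into their parts when post-processing the
results.  The helper below is a hypothetical sketch (``BenchmarkKey`` and
``parse_benchmark_key`` are not part of the example script) that handles
names containing a ``<n>filex<t>`` token.

.. code-block:: Python

    import re
    from typing import NamedTuple


    class BenchmarkKey(NamedTuple):
        """Hypothetical container for the parts of a benchmark name."""

        scenario: str
        n_files: int
        n_threads: int
        library: str


    # Matches names such as "load/2filex4/onnxlight".
    _KEY_PATTERN = re.compile(
        r"^(?P<scenario>\w+)/(?P<files>\d+)filex(?P<threads>\d+)/(?P<lib>[\w-]+)$"
    )


    def parse_benchmark_key(name: str) -> BenchmarkKey:
        """Splits a benchmark name into scenario, file count, threads, library."""
        match = _KEY_PATTERN.match(name)
        if match is None:
            raise ValueError(f"unexpected benchmark name {name!r}")
        return BenchmarkKey(
            match["scenario"],
            int(match["files"]),
            int(match["threads"]),
            match["lib"],
        )


    print(parse_benchmark_key("load/1filex4/onnxlight"))
    print(parse_benchmark_key("load/1filex1/onnxlight-cpp-nocopy"))

Names from the ``serialize`` and ``parse`` scenarios, such as
``serialize/x1/onnx`` or ``parse/nc/onnxlight``, use a different middle
token and are rejected by this pattern.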

Selectable benchmark scenarios (via ``--scenario``):
``load``, ``save``, ``serialize``, ``parse``, ``cpp``, ``all``.
The ``cpp`` scenario runs the standalone C++ timing executables
(``load_onnx_time``, ``load_onnx_light_time``, ``save_onnx_light_time``)
when they are available. The executable discovery automatically skips
them when the ``CI`` environment variable is set, so no results are
produced in CI environments where the executables have not been built.

Use ``--model <path>`` to supply an existing single-file ONNX model.
When provided the synthetic model is not created, and the supplied file
is used directly as the benchmark target.  The external-data variant
(used for ``2file`` benchmarks) is still derived from the loaded model
and written to the temporary directory.

Alternatively, use ``--model-id <huggingface_repo_id>`` to download an
ONNX model from the `Hugging Face Hub <https://huggingface.co>`_ and
benchmark it.  For example, ``--model-id onnx-community/Qwen3-0.6B-ONNX``
downloads `onnx-community/Qwen3-0.6B-ONNX
<https://huggingface.co/onnx-community/Qwen3-0.6B-ONNX>`_.  The specific
file to download inside the repository can be selected with
``--model-file`` (default ``onnx/model.onnx``).  When the download
fails (for example due to a connectivity issue) the script prints a
warning and falls back to the default synthetic model so the example
can still run in offline environments.

.. GENERATED FROM PYTHON SOURCE LINES 96-128

.. code-block:: Python


    import argparse
    import math
    import os
    import pathlib
    import re
    import shutil
    import tempfile
    import time
    import urllib.error
    import urllib.request

    import numpy as np
    import pandas
    import onnx
    import onnx.helper as oh
    import onnx.numpy_helper as onh

    import onnxruntime as ort

    _ort_sess_opts = ort.SessionOptions()
    _ort_sess_opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL

    import onnx_light.onnx as onnxl
    import onnx_light.onnx_lib.helper as onnxlh
    from onnx_light.doc import (
        find_standalone_executable,
        get_processor_name,
        get_total_memory_gb,
        measure_cpp_with_example,
    )


.. GENERATED FROM PYTHON SOURCE LINES 129-135

Setup
-----

Define benchmark parameters and command-line argument parsers.
Use ``--model <path>`` to benchmark an existing ONNX file instead of the
default synthetic model built by ``make_model()``.

.. GENERATED FROM PYTHON SOURCE LINES 135-333

.. code-block:: Python


    N_INIT = 40
    DIM = 256 if os.environ.get("UNITTEST_GOING") == "1" else 2048
    BENCHMARK_SCENARIOS = ("load", "save", "serialize", "parse", "cpp")


    def _parse_benchmark_scenarios(args=None) -> set[str]:
        """Parses command-line arguments and returns the selected benchmark scenarios."""
        parser = argparse.ArgumentParser(
            description="Runs one or several benchmark scenarios for plot_onnx_time.py."
        )
        parser.add_argument(
            "--scenario",
            dest="scenarios",
            action="append",
            choices=(*BENCHMARK_SCENARIOS, "all"),
            help=(
                "Scenario to execute. May be specified multiple times. "
                "Supported values: load, save, serialize, parse, cpp, all."
            ),
        )
        parsed, _ = parser.parse_known_args(args=args)
        values = parsed.scenarios or ["all"]
        if "all" in values:
            return set(BENCHMARK_SCENARIOS)
        return set(values)


    def _parse_model_path(args=None) -> str | None:
        """Parses the ``--model`` command-line argument and returns the path.

        Returns:
            Path to an existing ONNX model file, or ``None`` if not provided.
        """
        parser = argparse.ArgumentParser(add_help=False)
        parser.add_argument(
            "--model",
            dest="model_path",
            default=None,
            help=(
                "Path to an existing single-file ONNX model to benchmark "
                "instead of the default synthetic model."
            ),
        )
        parsed, _ = parser.parse_known_args(args=args)
        return parsed.model_path


    def _parse_model_id(args=None) -> tuple[str | None, str]:
        """Parses the ``--model-id`` and ``--model-file`` command-line arguments.

        Returns:
            A tuple ``(model_id, model_file)`` where ``model_id`` is the
            Hugging Face repository identifier (or ``None`` when not given)
            and ``model_file`` is the path within the repository of the ONNX
            file to download (defaults to ``onnx/model.onnx``).
        """
        parser = argparse.ArgumentParser(add_help=False)
        parser.add_argument(
            "--model-id",
            dest="model_id",
            default=None,
            help=(
                "Hugging Face repository id (e.g. onnx-community/Qwen3-0.6B-ONNX) "
                "from which to download an ONNX model to benchmark."
            ),
        )
        parser.add_argument(
            "--model-file",
            dest="model_file",
            default="onnx/model.onnx",
            help=(
                "Path within the Hugging Face repository of the ONNX file to "
                "download when --model-id is provided. Defaults to onnx/model.onnx."
            ),
        )
        parsed, _ = parser.parse_known_args(args=args)
        return parsed.model_id, parsed.model_file


    def _download_hf_model(model_id: str, model_file: str, dest_dir: str) -> str | None:
        """Downloads an ONNX model file from the Hugging Face Hub.

        The file is fetched from
        ``https://huggingface.co/{model_id}/resolve/main/{model_file}`` and
        written under *dest_dir*.  Any download failure (network error,
        HTTP error, OS error, ...) is caught and reported with a warning;
        the function then returns ``None`` so that callers can fall back to
        a default model.

        Args:
            model_id: Hugging Face repository identifier.
            model_file: Path of the ONNX file inside the repository.
            dest_dir: Directory in which to write the downloaded file.

        Returns:
            Absolute path to the downloaded file, or ``None`` when the
            download failed.
        """
        url = f"https://huggingface.co/{model_id}/resolve/main/{model_file}"
        local_path = os.path.abspath(os.path.join(dest_dir, os.path.basename(model_file)))
        os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
        print(f"Downloading {url} -> {local_path}")
        try:
            urllib.request.urlretrieve(url, local_path)  # noqa: S310
        except (urllib.error.URLError, urllib.error.HTTPError, OSError, ValueError) as exc:
            print(
                f"WARNING: failed to download {url}: {exc}. "
                "Falling back to the default synthetic model."
            )
            if os.path.exists(local_path):
                try:
                    os.remove(local_path)
                except OSError:
                    pass
            return None
        return local_path


    SELECTED_SCENARIOS = _parse_benchmark_scenarios()
    _CLI_MODEL_PATH = _parse_model_path()
    _CLI_MODEL_ID, _CLI_MODEL_FILE = _parse_model_id()


    def _run_scenario(name: str) -> bool:
        """Checks whether the given scenario name is selected for execution."""
        return name in SELECTED_SCENARIOS


    def make_model(n_init: int = N_INIT, dim: int = DIM) -> onnx.ModelProto:
        """Returns a synthetic ONNX model with *n_init* Gemm initializers of size *dim*."""
        initializers = []
        nodes = []
        inputs = [oh.make_tensor_value_info("X", onnx.TensorProto.FLOAT, [None, dim])]

        prev = "X"
        for i in range(n_init):
            weight_name = f"W{i}"
            out_name = f"Y{i}"
            w = np.random.randn(dim, dim).astype(np.float32)
            initializers.append(onh.from_array(w, name=weight_name))
            nodes.append(oh.make_node("Gemm", [prev, weight_name], [out_name], transB=1))
            prev = out_name

        outputs = [oh.make_tensor_value_info(prev, onnx.TensorProto.FLOAT, [None, dim])]
        graph = oh.make_graph(nodes, "bench_graph", inputs, outputs, initializer=initializers)
        model = oh.make_model(graph, opset_imports=[oh.make_opsetid("", 18)], ir_version=9)
        return model


    def _tensor_data_bytes(tensor: onnx.TensorProto) -> int:
        """Returns the in-memory byte count of a TensorProto's stored data.

        Uses :func:`onnx_light.onnx_lib.helper.tensor_dtype_to_np_dtype` to map
        the element type to a numpy dtype and derives the byte count from the
        tensor dimensions, avoiding a full array materialisation.

        Returns:
            Byte count of the tensor's data, or ``0`` when it cannot be determined.
        """
        if tensor.raw_data:
            return len(tensor.raw_data)
        if tensor.data_type not in onnxlh.TENSOR_TYPE_MAP:
            return 0
        np_dtype = onnxlh.tensor_dtype_to_np_dtype(tensor.data_type)
        n_elements = math.prod(tensor.dims) if tensor.dims else 1
        return int(np_dtype.itemsize * n_elements)


    def print_model_stats(model: onnx.ModelProto, file_path: str | None = None) -> None:
        """Prints summary statistics for *model* to stdout.

        Args:
            model: The ONNX model to inspect.
            file_path: Optional path to the model file on disk; when given the
                file size is included in the output.
        """
        graph = model.graph
        n_nodes = len(graph.node)
        n_initializers = len(graph.initializer)
        n_inputs = len(graph.input)
        n_outputs = len(graph.output)
        total_tensor_bytes = sum(_tensor_data_bytes(t) for t in graph.initializer)
        opsets = ", ".join(f"{op.domain or 'ai.onnx'}={op.version}" for op in model.opset_import)
        print("Model statistics")
        print("----------------")
        print(f"  IR version              : {model.ir_version}")
        print(f"  Opset(s)                : {opsets}")
        print(f"  Number of nodes         : {n_nodes}")
        print(f"  Number of inputs        : {n_inputs}")
        print(f"  Number of outputs       : {n_outputs}")
        print(f"  Number of initializers  : {n_initializers}")
        print(f"  Total initializer size  : {total_tensor_bytes / 2 ** 20:.3f} MB")
        if file_path and os.path.exists(file_path):
            print(f"  File size               : {os.path.getsize(file_path) / 2 ** 20:.3f} MB")
        print(f"  Serialized model size   : {model.ByteSize() / 2 ** 20:.3f} MB")


.. GENERATED FROM PYTHON SOURCE LINES 334-339

Model setup
-----------

Either load an existing model supplied via ``--model`` or build the default
synthetic one and write it to a temporary directory.

.. GENERATED FROM PYTHON SOURCE LINES 339-377

.. code-block:: Python


    tmp_dir = "temp_plot_onnx_time"
    os.makedirs(tmp_dir, exist_ok=True)

    if _CLI_MODEL_PATH is not None:
        onnx_path = os.path.abspath(_CLI_MODEL_PATH)
        model = onnx.load(onnx_path)
        print(f"Using provided model: {onnx_path}")
    elif _CLI_MODEL_ID is not None:
        downloaded = _download_hf_model(_CLI_MODEL_ID, _CLI_MODEL_FILE, tmp_dir)
        if downloaded is not None:
            onnx_path = downloaded
            model = onnx.load(onnx_path)
            print(f"Using model from Hugging Face id {_CLI_MODEL_ID!r}: {onnx_path}")
        else:
            model = make_model()
            onnx_path = os.path.join(tmp_dir, "bench.onnx")
            onnx.save(model, onnx_path)
    else:
        model = make_model()
        onnx_path = os.path.join(tmp_dir, "bench.onnx")
        onnx.save(model, onnx_path)

    size_bytes = model.ByteSize()
    print(f"Model size: {size_bytes / 2 ** 20:.3f} MB")

    file_size = os.path.getsize(onnx_path)
    print(f"File size : {file_size / 2 ** 20:.3f} MB")

    onx = onnx.load(onnx_path)
    onxl = onnxl.load(onnx_path)
    onxl_x4 = onnxl.load(onnx_path, num_threads=4)

    ext_load_onnx = os.path.abspath(os.path.join(tmp_dir, "ext_load.onnx"))
    ext_load_data = os.path.abspath(os.path.join(tmp_dir, "ext_load.onnx.data"))
    onnxl.save(onxl, ext_load_onnx, location=ext_load_data)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Model size: 640.002 MB
    File size : 640.002 MB


.. GENERATED FROM PYTHON SOURCE LINES 378-383

Model statistics
----------------

Print a summary of the model: number of nodes, initializers (tensors),
total weight size, file size, and serialized size.

.. GENERATED FROM PYTHON SOURCE LINES 383-386

.. code-block:: Python


    print_model_stats(model, onnx_path)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Model statistics
    ----------------
      IR version              : 9
      Opset(s)                : ai.onnx=18
      Number of nodes         : 40
      Number of inputs        : 1
      Number of outputs       : 1
      Number of initializers  : 40
      Total initializer size  : 640.000 MB
      File size               : 640.002 MB
      Serialized model size   : 640.002 MB


.. GENERATED FROM PYTHON SOURCE LINES 387-388

Benchmark helper.

.. GENERATED FROM PYTHON SOURCE LINES 388-638

.. code-block:: Python


    MIN_TIME_THRESHOLD = 1e-9
    CPP_LOAD_METRIC_PATTERN = re.compile(
        r"^\s*(Average|Median|Min|Max|Std|Standard deviation) load \(ms\)\s*:\s*([0-9.eE+-]+)\s*$"
    )
    CPP_SAVE_METRIC_PATTERN = re.compile(
        r"^\s*(Average|Median|Min|Max|Std|Standard deviation) save \(ms\)\s*:\s*([0-9.eE+-]+)\s*$"
    )
    WINDOWS_BUILD_CONFIGS = ("Release", "RelWithDebInfo", "Debug", "MinSizeRel")


    def measure(name: str, fn, n: int = 5, warmup: int = 1) -> dict:
        """
        Executes *fn* with warm-up iterations and records timing statistics.

        Args:
            name: Benchmark name.
            fn: Callable to execute.
            n: Number of measured iterations.
            warmup: Number of non-measured warm-up iterations.

        Returns:
            A dictionary containing name, median, avg, min, max, and std.
        """
        for _ in range(max(0, warmup)):
            fn()
        times = []
        for _ in range(n):
            t0 = time.perf_counter()
            fn()
            times.append(time.perf_counter() - t0)
        arr = np.array(times)
        return {
            "name": name,
            "median": float(np.median(arr)),
            "avg": float(np.mean(arr)),
            "min": float(np.min(arr)),
            "max": float(np.max(arr)),
            "std": float(np.std(arr)),
        }


    def _flush_file(path: str) -> None:
        """Flushes one file descriptor so benchmark timing includes write-back."""
        with open(path, "r+b") as stream:
            stream.flush()
            os.fsync(stream.fileno())


    def print_stats(name: str, stats: dict) -> None:
        """Prints timing statistics (average, median, max, and standard deviation) in milliseconds."""
        print(
            f"{name:<35} avg={stats['avg'] * 1e3:.1f} ms"
            f" median={stats['median'] * 1e3:.1f} ms"
            f" max={stats['max'] * 1e3:.1f} ms"
            f" std={stats['std'] * 1e3:.1f} ms"
        )


    def _find_load_onnx_time_executable() -> str | None:
        """Locates the standalone C++ timing executable.

        Returns:
            The path to ``load_onnx_time`` if available, otherwise ``None``.
        """
        return find_standalone_executable(
            "load_onnx_time",
            [
                pathlib.Path("build/load-onnx-time-example/load_onnx_time"),
                pathlib.Path("build/examples/load_onnx_time/load_onnx_time"),
                pathlib.Path("build-load-onnx-time/load_onnx_time"),
            ],
            script_file=globals().get("__file__"),
            windows_build_configs=WINDOWS_BUILD_CONFIGS,
        )


    def _find_load_onnx_light_time_executable() -> str | None:
        """Locates the standalone ``load_onnx_light_time`` executable.

        Returns:
            The path to ``load_onnx_light_time`` if available, otherwise ``None``.
        """
        return find_standalone_executable(
            "load_onnx_light_time",
            [
                pathlib.Path("build/load-onnx-light-time-example/load_onnx_light_time"),
                pathlib.Path("build/examples/load_onnx_light_time/load_onnx_light_time"),
                pathlib.Path("build-load-onnx-light-time/load_onnx_light_time"),
            ],
            script_file=globals().get("__file__"),
            windows_build_configs=WINDOWS_BUILD_CONFIGS,
        )


    def _measure_cpp_load_with_example(
        onnx_file: str,
        n: int = 20,
        num_threads: int = 1,
        executable_name: str = "load_onnx_light_time",
        file_count: int = 1,
        no_copy: bool = False,
        touch_raw_data_pages: bool = False,
    ) -> dict | None:
        """Measures C++ loading performance through a standalone executable.

        Args:
            onnx_file: Model path to pass to the standalone executable.
            n: Number of iterations to pass to the standalone executable.
            num_threads: Number of loading threads to pass to the standalone executable.
            executable_name: Executable selector to use:
                ``"load_onnx_time"`` or ``"load_onnx_light_time"``.
            file_count: Number of files involved in the benchmark key.
            no_copy: Whether to request ``no_copy`` mode from ``load_onnx_light_time``.
            touch_raw_data_pages: Whether to request page touching during no-copy loading
                from ``load_onnx_light_time``.

        Returns:
            A benchmark dictionary matching :func:`measure` output keys if successful,
            otherwise ``None``.
        """
        if file_count <= 0:
            raise ValueError(f"file_count must be positive, got {file_count!r}")
        if executable_name == "load_onnx_time":
            if no_copy:
                raise ValueError("no_copy is only supported with 'load_onnx_light_time'")
            executable = _find_load_onnx_time_executable()
            result_name = f"load/{file_count}filex{num_threads}/onnx-cpp"
        elif executable_name == "load_onnx_light_time":
            executable = _find_load_onnx_light_time_executable()
            lib_name = "onnxlight-cpp-nocopy" if no_copy else "onnxlight-cpp"
            result_name = f"load/{file_count}filex{num_threads}/{lib_name}"
        else:
            raise ValueError(
                "executable_name must be 'load_onnx_time' or "
                f"'load_onnx_light_time', got {executable_name!r}"
            )
        args = [onnx_file, str(n), str(num_threads)]
        if no_copy:
            args.append("nocopy_touch" if touch_raw_data_pages else "nocopy")
        return measure_cpp_with_example(
            executable=executable,
            args=args,
            metric_pattern=CPP_LOAD_METRIC_PATTERN,
            result_name=result_name,
            executable_name=executable_name,
        )


    def _find_save_onnx_light_time_executable() -> str | None:
        """Locates the standalone C++ save-timing executable.

        Returns:
            The path to ``save_onnx_light_time`` if available, otherwise ``None``.
        """
        return find_standalone_executable(
            "save_onnx_light_time",
            [
                pathlib.Path("build/save-onnx-light-time-example/save_onnx_light_time"),
                pathlib.Path("build/examples/save_onnx_light_time/save_onnx_light_time"),
                pathlib.Path("build-save-onnx-light-time/save_onnx_light_time"),
            ],
            script_file=globals().get("__file__"),
            windows_build_configs=WINDOWS_BUILD_CONFIGS,
        )


    def _measure_cpp_save_with_example(
        onnx_file: str, n: int = 20, num_threads: int = 1
    ) -> dict | None:
        """Measures C++ one-file save performance through ``save_onnx_light_time``.

        Returns:
            A benchmark dictionary matching :func:`measure` output keys if successful,
            otherwise ``None``.
        """
        executable = _find_save_onnx_light_time_executable()
        if executable is None:
            return None
        with tempfile.TemporaryDirectory() as tmp_save_dir:
            return measure_cpp_with_example(
                executable=executable,
                args=[onnx_file, tmp_save_dir, str(n), str(num_threads), "onefile"],
                metric_pattern=CPP_SAVE_METRIC_PATTERN,
                result_name=f"save/1filex{num_threads}/onnxlight-cpp",
                executable_name="save_onnx_light_time",
            )


    # Load scenarios
    # --------------

    data = []
    if _run_scenario("load"):
        # %%
        # Load with onnx.

        data.append(measure("load/1filex1/onnx", lambda: onnx.load(onnx_path)))
        print_stats("load/1filex1/onnx", data[-1])

        # %%
        # Load with ``onnx_light.onnx``.

        data.append(measure("load/1filex1/onnxlight", lambda: onnxl.load(onnx_path)))
        print_stats("load/1filex1/onnxlight", data[-1])

        # %%
        # Load with ``onnx_light.onnx`` using parallel tensor loading.

        data.append(measure("load/1filex4/onnxlight", lambda: onnxl.load(onnx_path, num_threads=4)))
        print_stats("load/1filex4/onnxlight", data[-1])

        # %%
        # Compare the two file-backed stream implementations explicitly:
        # ``FileLoadMode.MMAP`` memory-maps the ``.onnx`` file (``mmap`` on POSIX,
        # ``CreateFileMapping`` on Windows) and parses directly out of the mapped
        # region, while ``FileLoadMode.IFSTREAM`` forces the buffered
        # ``std::ifstream``-based reader.  The default ``FileLoadMode.AUTO``
        # behaves like ``MMAP`` for single-file models when ``no_copy`` is not
        # requested; running both modes side by side highlights the gain (or
        # cost) of memory mapping on the current platform/filesystem.

        data.append(
            measure(
                "load/1filex1/onnxlight-mmap", lambda: onnxl.load(onnx_path, file_load_mode="MMAP")
            )
        )
        print_stats("load/1filex1/onnxlight-mmap", data[-1])

        data.append(
            measure(
                "load/1filex1/onnxlight-ifstream",
                lambda: onnxl.load(onnx_path, file_load_mode="IFSTREAM"),
            )
        )
        print_stats("load/1filex1/onnxlight-ifstream", data[-1])

        # %%
        # Load with ``onnxruntime`` (all optimizations disabled).
        # ``InferenceSession`` is created with ``ORT_DISABLE_ALL`` so the
        # measurement captures only model loading overhead, not graph optimization.

        data.append(
            measure(
                "load/1filex1/ort",
                lambda: ort.InferenceSession(onnx_path, sess_options=_ort_sess_opts),
            )
        )
        print_stats("load/1filex1/ort", data[-1])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    load/1filex1/onnx                   avg=166.7 ms median=166.8 ms max=167.8 ms std=0.9 ms
    load/1filex1/onnxlight              avg=94.8 ms median=94.8 ms max=95.4 ms std=0.4 ms
    load/1filex4/onnxlight              avg=53.7 ms median=53.8 ms max=55.3 ms std=1.3 ms
    load/1filex1/onnxlight-mmap         avg=96.0 ms median=95.9 ms max=96.8 ms std=0.6 ms
    load/1filex1/onnxlight-ifstream     avg=108.7 ms median=108.6 ms max=108.9 ms std=0.1 ms
    load/1filex1/ort                    avg=381.1 ms median=384.1 ms max=387.5 ms std=5.5 ms
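
As a worked reading of the figures above, the following sketch computes
speed-up ratios relative to ``load/1filex1/onnx`` from the medians printed
in this run.  The ``medians_ms`` and ``speedups`` names are introduced only
for this illustration, and the ratios are specific to the machine that
produced the output.

.. code-block:: Python

    # Median load times in milliseconds, copied from the output above.
    medians_ms = {
        "load/1filex1/onnx": 166.8,
        "load/1filex1/onnxlight": 94.8,
        "load/1filex4/onnxlight": 53.8,
        "load/1filex1/onnxlight-mmap": 95.9,
        "load/1filex1/onnxlight-ifstream": 108.6,
        "load/1filex1/ort": 384.1,
    }

    baseline = medians_ms["load/1filex1/onnx"]
    # A ratio above 1 means faster than the onnx baseline.
    speedups = {name: baseline / value for name, value in medians_ms.items()}
    for name, ratio in speedups.items():
        print(f"{name:<35} x{ratio:.2f}")

On this run, ``onnx_light.onnx`` loads the model about 1.76 times faster
than ``onnx`` with one thread and about 3.1 times faster with four threads.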


.. GENERATED FROM PYTHON SOURCE LINES 639-641

Serialize and Parse benchmarks
------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 641-674

.. code-block:: Python


    def _serialize_onnx() -> bytes:
        """Serializes the ONNX model to bytes."""
        return onx.SerializeToString()


    def _serialize_onnxlight() -> bytes:
        """Serializes the onnx_light model to bytes."""
        return onxl.SerializeToString()


    def _serialize_onnxlight_x4() -> bytes:
        """Serializes the onnx_light model in parallel to bytes."""
        return onxl.SerializeToString(opts_serial_x4)


    if _run_scenario("serialize"):
        opts_serial_x4 = onnxl.SerializeOptions()
        opts_serial_x4.num_threads = 4

        assert len(_serialize_onnx()) > 0
        assert len(_serialize_onnxlight()) > 0
        assert len(_serialize_onnxlight_x4()) > 0

        data.append(measure("serialize/x1/onnx", _serialize_onnx))
        print_stats("serialize/x1/onnx", data[-1])
        data.append(measure("serialize/x1/onnxlight", _serialize_onnxlight))
        print_stats("serialize/x1/onnxlight", data[-1])
        data.append(measure("serialize/x4/onnxlight", _serialize_onnxlight_x4))
        print_stats("serialize/x4/onnxlight", data[-1])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    serialize/x1/onnx                   avg=384.1 ms median=383.7 ms max=386.9 ms std=1.6 ms
    serialize/x1/onnxlight              avg=388.4 ms median=388.2 ms max=389.7 ms std=0.7 ms
    serialize/x4/onnxlight              avg=396.6 ms median=393.6 ms max=406.3 ms std=6.9 ms


.. GENERATED FROM PYTHON SOURCE LINES 675-676

``ParseFromString`` comparison between ``onnx`` and ``onnx_light.onnx``.

.. GENERATED FROM PYTHON SOURCE LINES 676-763

.. code-block:: Python


    def _parse_onnx() -> onnx.ModelProto:
        """Parses ONNX bytes into a ModelProto."""
        parsed = onnx.ModelProto()
        parsed.ParseFromString(serialized_onnx)
        return parsed


    def _parse_onnxlight() -> onnxl.ModelProto:
        """Parses onnx_light bytes into a ModelProto."""
        parsed = onnxl.ModelProto()
        parsed.ParseFromString(serialized_onnxlight)
        return parsed


    def _parse_onnxlight_x4() -> onnxl.ModelProto:
        """Parses onnx_light bytes in parallel into a ModelProto."""
        parsed = onnxl.ModelProto()
        parsed.ParseFromString(serialized_onnxlight, opts_parse_x4)
        return parsed


    def _parse_onnxlight_nc() -> onnxl.ModelProto:
        """Parses onnx_light bytes without copying raw tensor data (zero-copy)."""
        parsed = onnxl.ModelProto()
        parsed.ParseFromString(serialized_onnxlight, opts_parse_nc)
        return parsed


    def _parse_onnxlight_nc_x4() -> onnxl.ModelProto:
        """Parses onnx_light bytes in parallel without copying raw tensor data (zero-copy, 4 t)."""
        parsed = onnxl.ModelProto()
        parsed.ParseFromString(serialized_onnxlight, opts_parse_nc_x4)
        return parsed


    if _run_scenario("parse"):
        serialized_onnx = onx.SerializeToString()
        serialized_onnxlight = onxl.SerializeToString()
        opts_parse_x4 = onnxl.ParseOptions()
        opts_parse_x4.num_threads = 4
        opts_parse_nc = onnxl.ParseOptions()
        opts_parse_nc.no_copy = True
        opts_parse_nc_x4 = onnxl.ParseOptions()
        opts_parse_nc_x4.no_copy = True
        opts_parse_nc_x4.num_threads = 4

        parsed_onnx = _parse_onnx()
        assert parsed_onnx.ir_version == onx.ir_version
        assert len(parsed_onnx.graph.node) == len(onx.graph.node)
        parsed_onnxlight = _parse_onnxlight()
        assert parsed_onnxlight.ir_version == onxl.ir_version
        assert len(parsed_onnxlight.graph.node) == len(onxl.graph.node)
        parsed_onnxlight_x4 = _parse_onnxlight_x4()
        assert parsed_onnxlight_x4.ir_version == onxl.ir_version
        assert len(parsed_onnxlight_x4.graph.node) == len(onxl.graph.node)
        parsed_onnxlight_nc = _parse_onnxlight_nc()
        assert parsed_onnxlight_nc.ir_version == onxl.ir_version
        assert len(parsed_onnxlight_nc.graph.node) == len(onxl.graph.node)
        parsed_onnxlight_nc_x4 = _parse_onnxlight_nc_x4()
        assert parsed_onnxlight_nc_x4.ir_version == onxl.ir_version
        assert len(parsed_onnxlight_nc_x4.graph.node) == len(onxl.graph.node)

        data.append(measure("parse/x1/onnx", _parse_onnx))
        print_stats("parse/x1/onnx", data[-1])
        data.append(measure("parse/x1/onnxlight", _parse_onnxlight))
        print_stats("parse/x1/onnxlight", data[-1])
        data.append(measure("parse/x4/onnxlight", _parse_onnxlight_x4))
        print_stats("parse/x4/onnxlight", data[-1])

        # %%
        # Parse with zero-copy (``no_copy=True``): raw tensor data is not copied.
        # The pointer inside each TensorProto points directly into ``serialized_onnxlight``.
        # The bytes object **must** remain alive for as long as the parsed model is used.

        data.append(measure("parse/nc/onnxlight", _parse_onnxlight_nc))
        print_stats("parse/nc/onnxlight", data[-1])

        # %%
        # Parse with zero-copy **and** parallel tensor reads (``no_copy=True, num_threads=4``).
        # Combines the allocation savings of zero-copy with multi-threaded tensor reads.
        # With no data left to copy, the threads may cost more than they save.

        data.append(measure("parse/ncx4/onnxlight", _parse_onnxlight_nc_x4))
        print_stats("parse/ncx4/onnxlight", data[-1])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    parse/x1/onnx                       avg=96.9 ms median=95.1 ms max=102.7 ms std=3.1 ms
    parse/x1/onnxlight                  avg=97.5 ms median=97.6 ms max=98.3 ms std=0.8 ms
    parse/x4/onnxlight                  avg=54.0 ms median=54.4 ms max=54.9 ms std=1.0 ms
    parse/nc/onnxlight                  avg=0.0 ms median=0.0 ms max=0.0 ms std=0.0 ms
    parse/ncx4/onnxlight                avg=0.2 ms median=0.2 ms max=0.2 ms std=0.0 ms
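
The lifetime constraint of zero-copy parsing can be illustrated with plain
Python objects. The sketch below is only an analogy and does not call
``onnx_light``: it uses ``memoryview`` to contrast a copied slice with a
borrowed view into the same buffer. The names ``payload``, ``copied`` and
``borrowed`` are illustrative and do not come from the benchmark script.

.. code-block:: Python

    # Analogy only: a mutable buffer stands in for the serialized model bytes.
    payload = bytearray(b"header" + bytes(range(16)))

    # Copying: the slice owns its own memory, independent of ``payload``.
    copied = bytes(payload[6:])

    # Borrowing: the view points into ``payload`` and allocates no new data.
    borrowed = memoryview(payload)[6:]

    # A change in the underlying buffer is visible through the view
    # but not through the copy.
    payload[6] = 255
    print(copied[0], borrowed[0])  # prints: 0 255

A borrowed view is cheap to create because no tensor data moves, which is
consistent with the ``parse/nc`` rows being far below the copying variants.
The price is that the view is only valid while the buffer it points into
stays alive and unchanged.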


.. GENERATED FROM PYTHON SOURCE LINES 764-773

Save benchmarks
---------------

The model is first saved once with external data using ``onnx_light.onnx``;
this save is not benchmarked. ``onnx_light.onnx`` leaves the in-memory model
unmodified because ``ClearExternalData`` restores it after the C++ write.
Absolute paths ensure that ``onnx_light.onnx`` stores only the basename of
the data file in the ``.onnx`` metadata, which lets both ``onnx.load`` and
``onnxl.load`` resolve the data file automatically.
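
The role of the basename can be sketched with the standard library alone.
The ONNX format interprets an external data location relative to the
directory of the model file, so a loader can rebuild the full path from the
model path and the stored basename. The helper
``resolve_external_location`` and the paths below are hypothetical and only
illustrate that rule; they are not part of ``onnx`` or ``onnx_light``.

.. code-block:: Python

    import os


    def resolve_external_location(model_path: str, location: str) -> str:
        """Returns the path of a weights file stored next to the model."""
        # Assumption: ``location`` is relative to the directory of the model.
        model_dir = os.path.dirname(os.path.abspath(model_path))
        return os.path.join(model_dir, location)


    # Hypothetical paths, no file is created.
    model_path = os.path.join("work", "out_ext.onnx")
    data_path = os.path.abspath(model_path + ".data")

    # Only the basename would be recorded in the model metadata ...
    location = os.path.basename(data_path)
    # ... and a loader rebuilds the full path from the model directory.
    resolved = resolve_external_location(model_path, location)
    print(location, resolved == data_path)  # prints: out_ext.onnx.data True

Storing the basename keeps the pair of files relocatable: moving both files
to another directory does not invalidate the reference.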

.. GENERATED FROM PYTHON SOURCE LINES 773-865

.. code-block:: Python


    if _run_scenario("save"):
        # %%
        # Save with ``onnx``.

        out_onnx = os.path.join(tmp_dir, "out_onnx.onnx")
        data.append(measure("save/1filex1/onnx", lambda: onnx.save(onx, out_onnx)))
        print_stats("save/1filex1/onnx", data[-1])

        # %%
        # Save with ``onnx`` using external data.
        # This is the slow path: Python iterates every tensor, creates a numpy
        # intermediate, and calls Python I/O for each weight blob.

        out_onnx_ext = os.path.join(tmp_dir, "out_onnx_ext.onnx")
        out_onnx_ext_location = "out_onnx_ext.data"
        out_onnx_ext_data = os.path.join(tmp_dir, out_onnx_ext_location)

        def _save_onnx_external_with_flush() -> None:
            onnx.save_model(
                onx,
                out_onnx_ext,
                save_as_external_data=True,
                all_tensors_to_one_file=True,
                location=out_onnx_ext_location,
            )
            _flush_file(out_onnx_ext_data)
            _flush_file(out_onnx_ext)

        data.append(measure("save/2filex1/onnx", _save_onnx_external_with_flush, n=1, warmup=0))
        print_stats("save/2filex1/onnx", data[-1])

        # %%
        # ``onnx.save_model`` modified the in-memory model ``onx`` so that its
        # tensors reference the external data file.
        # Set it to ``None`` to make sure it is not used again.
        onx = None

        # %%
        # Save with ``onnx_light.onnx``.

        out_onnxl = os.path.join(tmp_dir, "out_onnxlight.onnx")
        data.append(measure("save/1filex1/onnxlight", lambda: onnxl.save(onxl, out_onnxl)))
        print_stats("save/1filex1/onnxlight", data[-1])

        # %%
        # Save with ``onnx_light.onnx`` parallelized.

        out_onnxl_x4 = os.path.join(tmp_dir, "out_onnxlight_x4.onnx")
        data.append(
            measure(
                "save/1filex4/onnxlight", lambda: onnxl.save(onxl_x4, out_onnxl_x4, num_threads=4)
            )
        )
        print_stats("save/1filex4/onnxlight", data[-1])

        # %%
        # Save with ``onnx_light.onnx`` using external data.
        # All work is done in C++: ``PopulateExternalData`` attaches metadata once,
        # ``SerializeToStream`` routes large ``raw_data`` blobs directly to the
        # weights file via ``TwoFilesWriteStream``, and ``ClearExternalData``
        # restores the in-memory model.  No numpy arrays are created.
        # As for the ``onnx`` row, the two output files are explicitly ``fsync``-ed
        # so both benchmarks include descriptor flush/write-back costs.
        # The main ``.onnx`` structure is accumulated in a ``StringWriteStream``
        # (memory buffer) and flushed to disk in a single write after all tensor
        # data has been written, mirroring the sequential I/O pattern used by
        # ``onnx.save_model`` and allowing OS-level write coalescing.

        out_ext = os.path.join(tmp_dir, "out_ext.onnx")
        out_ext_data = out_ext + ".data"

        def _save_onnxlight_external_with_flush() -> None:
            onnxl.save(onxl, out_ext, location=out_ext_data)
            _flush_file(out_ext_data)
            _flush_file(out_ext)

        data.append(measure("save/2filex1/onnxlight", _save_onnxlight_external_with_flush))
        print_stats("save/2filex1/onnxlight", data[-1])

        # %%
        # Save with ``onnx_light.onnx`` using external data parallelized.

        out_ext_x4 = os.path.join(tmp_dir, "out_ext_x4.onnx")
        out_ext_x4_data = out_ext_x4 + ".data"
        data.append(
            measure(
                "save/2filex4/onnxlight",
                lambda: onnxl.save(onxl, out_ext_x4, location=out_ext_x4_data, num_threads=4),
            )
        )
        print_stats("save/2filex4/onnxlight", data[-1])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    save/1filex1/onnx                   avg=1810.3 ms median=1653.6 ms max=2156.6 ms std=262.6 ms
    save/2filex1/onnx                   avg=2100.3 ms median=2100.3 ms max=2100.3 ms std=0.0 ms
    save/1filex1/onnxlight              avg=1477.0 ms median=1592.0 ms max=1742.6 ms std=297.4 ms
    save/1filex4/onnxlight              avg=256.2 ms median=208.2 ms max=449.0 ms std=96.4 ms
    save/2filex1/onnxlight              avg=1716.4 ms median=1644.1 ms max=2092.0 ms std=189.4 ms
    save/2filex4/onnxlight              avg=208.1 ms median=206.6 ms max=214.4 ms std=3.2 ms
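
Both external-data rows call ``_flush_file`` on the two output files. Its
definition is not shown in this section, so the helper ``flush_file`` below
is a hypothetical stand-in: a minimal sketch, assuming the helper reopens
the file and calls ``os.fsync`` on its descriptor.

.. code-block:: Python

    import os
    import tempfile


    def flush_file(path: str) -> None:
        """Asks the OS to write the cached data of ``path`` to the storage device."""
        # Hypothetical stand-in for ``_flush_file``, not the script's own code.
        fd = os.open(path, os.O_RDWR)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)


    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "weights.data")
        with open(path, "wb") as f:
            f.write(b"\x00" * 1024)
        # Without this call, the write may still sit in the OS page cache
        # when the timer stops.
        flush_file(path)
        size = os.path.getsize(path)
    print(size)  # prints: 1024

Including the flush in the timed function makes the comparison between
libraries less sensitive to how much data each one leaves in the page cache.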


.. GENERATED FROM PYTHON SOURCE LINES 866-873

C++ benchmarks
--------------

Run the standalone C++ benchmark executables when available.
These scenarios measure the same operations as ``load`` and ``save``
but use the compiled C++ timing executables directly, bypassing the
Python interpreter overhead entirely.
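
The "when available" behavior can be sketched as follows. The script
includes the timing output of the executables; since that output format is
not shown here, this simplified sketch times the whole process instead,
which also counts process start-up. The function ``time_executable`` is
hypothetical and is not one of the helpers used by the script.

.. code-block:: Python

    import shutil
    import subprocess
    import sys
    import time
    from typing import Optional


    def time_executable(executable: str, *args: str) -> Optional[float]:
        """Returns the wall-clock duration of one run, or None when unavailable."""
        path = shutil.which(executable)
        if path is None:
            # The executable is missing: the caller skips the benchmark.
            return None
        begin = time.perf_counter()
        proc = subprocess.run([path, *args], capture_output=True, text=True, check=False)
        duration = time.perf_counter() - begin
        # A failing run is reported like a missing executable.
        return duration if proc.returncode == 0 else None


    missing = time_executable("surely_not_an_installed_benchmark_binary")
    found = time_executable(sys.executable, "-c", "pass")
    print(missing, found is not None)

Returning ``None`` for both a missing and a failing executable matches the
"not found (or failed)" messages printed by the script.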

.. GENERATED FROM PYTHON SOURCE LINES 873-935

.. code-block:: Python


    if _run_scenario("cpp"):
        # %%
        # Load with standalone C++ ``load_onnx_light_time`` example when available.
        # The executable uses ``FileStream`` as well, so this row measures the same
        # file-backed parsing path as ``onnxl.load(onnx_path)``.

        cpp_load_x1 = _measure_cpp_load_with_example(onnx_path, num_threads=1)
        if cpp_load_x1 is not None:
            data.append(cpp_load_x1)
            print_stats(cpp_load_x1["name"], cpp_load_x1)
        else:
            print(
                "load_onnx_light_time executable not found (or failed), skipping C++ load benchmark."
            )

        cpp_load_x4 = _measure_cpp_load_with_example(onnx_path, num_threads=4)
        if cpp_load_x4 is not None:
            data.append(cpp_load_x4)
            print_stats(cpp_load_x4["name"], cpp_load_x4)

        # %%
        # Load an external-data model with standalone C++ ``load_onnx_light_time``
        # using ``no_copy`` shared external buffers.

        cpp_load_ext_nc = _measure_cpp_load_with_example(
            ext_load_onnx, num_threads=1, file_count=2, no_copy=True, touch_raw_data_pages=True
        )
        if cpp_load_ext_nc is not None:
            data.append(cpp_load_ext_nc)
            print_stats(cpp_load_ext_nc["name"], cpp_load_ext_nc)

        # %%
        # Load with standalone C++ ``load_onnx_time`` example when available.
        # The executable uses the standard onnx protobuf library for loading.

        cpp_load_onnx_x1 = _measure_cpp_load_with_example(
            onnx_path, num_threads=1, executable_name="load_onnx_time"
        )
        if cpp_load_onnx_x1 is not None:
            data.append(cpp_load_onnx_x1)
            print_stats(cpp_load_onnx_x1["name"], cpp_load_onnx_x1)
        else:
            print("load_onnx_time executable not found (or failed), skipping C++ load benchmark.")

        # %%
        # Save with standalone C++ ``save_onnx_light_time`` example when available.

        cpp_save_x1 = _measure_cpp_save_with_example(onnx_path, num_threads=1)
        if cpp_save_x1 is not None:
            data.append(cpp_save_x1)
            print_stats(cpp_save_x1["name"], cpp_save_x1)
        else:
            print(
                "save_onnx_light_time executable not found (or failed), skipping C++ save benchmark."
            )

        cpp_save_x4 = _measure_cpp_save_with_example(onnx_path, num_threads=4)
        if cpp_save_x4 is not None:
            data.append(cpp_save_x4)
            print_stats(cpp_save_x4["name"], cpp_save_x4)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    load_onnx_light_time executable not found (or failed), skipping C++ load benchmark.
    load_onnx_time executable not found (or failed), skipping C++ load benchmark.
    save_onnx_light_time executable not found (or failed), skipping C++ save benchmark.


.. GENERATED FROM PYTHON SOURCE LINES 936-940

Load with ``onnx`` using external data
--------------------------------------

Reload the model previously saved with external data using ``onnx.load``.

.. GENERATED FROM PYTHON SOURCE LINES 940-998

.. code-block:: Python


    if _run_scenario("load"):
        data.append(
            measure("load/2filex1/onnx", lambda: onnx.load(ext_load_onnx, load_external_data=True))
        )
        print_stats("load/2filex1/onnx", data[-1])

        # %%
        # Load with ``onnx_light.onnx`` using external data.
        # Reload the same external-data model using ``onnxl.load``.

        data.append(
            measure(
                "load/2filex1/onnxlight", lambda: onnxl.load(ext_load_onnx, location=ext_load_data)
            )
        )
        print_stats("load/2filex1/onnxlight", data[-1])

        # %%
        # Load with ``onnx_light.onnx`` using external data and shared no-copy buffers.
        # Each external weights file is read once, then every tensor borrows a view
        # into that shared buffer.

        data.append(
            measure(
                "load/2filex1/onnxlight-nocopy",
                lambda: onnxl.load(
                    ext_load_onnx, location=ext_load_data, no_copy=True, touch_raw_data_pages=True
                ),
            )
        )
        print_stats("load/2filex1/onnxlight-nocopy", data[-1])

        # %%
        # Load with ``onnx_light.onnx`` using external data and parallel tensor loading.
        # Combine external-data loading with ``num_threads > 1`` for maximum throughput.

        data.append(
            measure(
                "load/2filex4/onnxlight",
                lambda: onnxl.load(ext_load_onnx, location=ext_load_data, num_threads=4),
            )
        )
        print_stats("load/2filex4/onnxlight", data[-1])

        # %%
        # Load with ``onnxruntime`` using external data (all optimizations disabled).
        # Reload the external-data model with ``onnxruntime``, keeping
        # ``ORT_DISABLE_ALL`` so only loading overhead is measured.

        data.append(
            measure(
                "load/2filex1/ort",
                lambda: ort.InferenceSession(ext_load_onnx, sess_options=_ort_sess_opts),
            )
        )
        print_stats("load/2filex1/ort", data[-1])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    load/2filex1/onnx                   avg=83.5 ms median=83.6 ms max=84.3 ms std=0.5 ms
    load/2filex1/onnxlight              avg=52.0 ms median=52.0 ms max=52.3 ms std=0.3 ms
    load/2filex1/onnxlight-nocopy       avg=4.7 ms median=4.7 ms max=4.7 ms std=0.0 ms
    load/2filex4/onnxlight              avg=37.6 ms median=37.5 ms max=38.3 ms std=0.4 ms
    load/2filex1/ort                    avg=261.3 ms median=260.9 ms max=262.6 ms std=0.8 ms
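
The shared no-copy buffer can be sketched with the standard ``mmap`` module.
This is an illustration under stated assumptions, not the ``onnx_light``
implementation: the weights file is mapped once, and each tensor is a
``memoryview`` slice into that mapping. The tensor names, offsets and
lengths are made up. The loop that reads one byte per page is one plausible
reading of the ``touch_raw_data_pages`` option, which presumably forces the
mapped pages to be loaded during the measurement.

.. code-block:: Python

    import mmap
    import os
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "weights.data")
        with open(path, "wb") as f:
            f.write(bytes(range(256)) * 64)  # 16 KiB of fake weights

        # Map the whole file once; no tensor data is copied at this point.
        with open(path, "rb") as f:
            mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        buffer = memoryview(mapped)

        # Hypothetical (offset, end) pairs: every tensor borrows a view.
        tensors = {"w0": buffer[0:1000], "w1": buffer[1000:16384]}

        # Read one byte per page so that page faults happen now, not later.
        checksum = sum(buffer[i] for i in range(0, len(buffer), mmap.PAGESIZE))

        first_w1 = tensors["w1"][0]
        total = sum(len(t) for t in tensors.values())

        # Views must be released before the mapping can be closed.
        for view in tensors.values():
            view.release()
        buffer.release()
        mapped.close()

    print(first_w1, total)  # prints: 232 16384

Without the page-touching loop, a mapped load could return before any weight
has actually been read from disk, which would make the timing look better
than the cost paid later on first access.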


.. GENERATED FROM PYTHON SOURCE LINES 999-1001

Results
--------

.. GENERATED FROM PYTHON SOURCE LINES 1001-1006

.. code-block:: Python


    df = pandas.DataFrame(data).set_index("name").sort_index()
    print(df)
    df = df.sort_index(ascending=False)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

                                       median       avg  ...       max       std
    name                                                 ...                    
    load/1filex1/onnx                0.166848  0.166655  ...  0.167779  0.000875
    load/1filex1/onnxlight           0.094797  0.094782  ...  0.095382  0.000440
    load/1filex1/onnxlight-ifstream  0.108639  0.108650  ...  0.108851  0.000148
    load/1filex1/onnxlight-mmap      0.095863  0.095963  ...  0.096754  0.000585
    load/1filex1/ort                 0.384123  0.381123  ...  0.387492  0.005454
    load/1filex4/onnxlight           0.053849  0.053733  ...  0.055260  0.001320
    load/2filex1/onnx                0.083553  0.083499  ...  0.084313  0.000499
    load/2filex1/onnxlight           0.051952  0.051971  ...  0.052336  0.000295
    load/2filex1/onnxlight-nocopy    0.004669  0.004671  ...  0.004701  0.000019
    load/2filex1/ort                 0.260949  0.261341  ...  0.262581  0.000751
    load/2filex4/onnxlight           0.037528  0.037553  ...  0.038307  0.000433
    parse/nc/onnxlight               0.000029  0.000029  ...  0.000031  0.000001
    parse/ncx4/onnxlight             0.000206  0.000211  ...  0.000239  0.000015
    parse/x1/onnx                    0.095135  0.096881  ...  0.102680  0.003102
    parse/x1/onnxlight               0.097615  0.097489  ...  0.098325  0.000785
    parse/x4/onnxlight               0.054402  0.053965  ...  0.054913  0.000976
    save/1filex1/onnx                1.653635  1.810276  ...  2.156589  0.262568
    save/1filex1/onnxlight           1.591998  1.476984  ...  1.742602  0.297447
    save/1filex4/onnxlight           0.208170  0.256172  ...  0.448991  0.096421
    save/2filex1/onnx                2.100343  2.100343  ...  2.100343  0.000000
    save/2filex1/onnxlight           1.644106  1.716432  ...  2.091961  0.189380
    save/2filex4/onnxlight           0.206572  0.208063  ...  0.214393  0.003181
    serialize/x1/onnx                0.383723  0.384051  ...  0.386870  0.001564
    serialize/x1/onnxlight           0.388207  0.388419  ...  0.389651  0.000650
    serialize/x4/onnxlight           0.393573  0.396580  ...  0.406335  0.006873

    [25 rows x 5 columns]


.. GENERATED FROM PYTHON SOURCE LINES 1007-1014

Plot the results.
The average and median are shown for each operation. The average bar is
annotated with its value and a 95% interval computed as ±1.96 times the
measured standard deviation.
Bars are colored by library: blue family for ``onnx``, orange family for
``onnx_light``, green family for ``onnxruntime``.  Solid shades represent
the average; lighter shades the median.
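
The statistics behind each row and the annotation next to each average bar
can be sketched as follows. The function ``measure_sketch`` is a
hypothetical stand-in for the ``measure`` helper, whose definition is not
shown in this section; the use of the population standard deviation is an
assumption, chosen because it yields ``std=0.0`` for a single run.

.. code-block:: Python

    import statistics
    import time
    from typing import Callable


    def measure_sketch(name: str, fct: Callable[[], object], n: int = 5, warmup: int = 1) -> dict:
        """Times ``fct`` ``n`` times after ``warmup`` untimed calls."""
        for _ in range(warmup):
            fct()
        durations = []
        for _ in range(n):
            begin = time.perf_counter()
            fct()
            durations.append(time.perf_counter() - begin)
        return {
            "name": name,
            "avg": statistics.fmean(durations),
            "median": statistics.median(durations),
            "max": max(durations),
            # Assumption: population standard deviation, 0.0 when n == 1.
            "std": statistics.pstdev(durations),
        }


    row = measure_sketch("sum/x1/python", lambda: sum(range(10_000)))
    # Same formula as the plot annotation: average ± 1.96 * std, in milliseconds.
    half_width = 1.96 * row["std"]
    label = f" {row['avg'] * 1e3:.1f} ±{half_width * 1e3:.1f} ms"
    print(label)

    single = measure_sketch("single-run", lambda: None, n=1, warmup=0)

With a single run the interval collapses to zero width, so rows measured
with ``n=1`` carry no information about run-to-run variability.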

.. GENERATED FROM PYTHON SOURCE LINES 1014-1093

.. code-block:: Python

    import matplotlib.patches as mpatches

    _onnx_avg = "steelblue"
    _onnx_med = "lightsteelblue"
    _onnx_light_avg = "darkorange"
    _onnx_light_med = "moccasin"
    _ort_avg = "seagreen"
    _ort_med = "lightgreen"


    processor_name = get_processor_name()
    total_memory_gb = get_total_memory_gb()
    memory_str = f"{total_memory_gb:.1f} GB" if total_memory_gb is not None else "unknown"
    cpu_count = os.cpu_count() or 0

    ax = df[["avg", "median"]].plot.barh(
        title=(
            f"onnx vs onnx_light vs ort load/save (s), size={file_size / 2 ** 20:.2f} MB "
            f"(lower is better)\n"
            f"CPU: {processor_name} ({cpu_count} cores), RAM: {memory_str}\n"
            f"benchmark key: <op>/<files>x<threads>/<lib>\n"
            f"op=load|save|parse|serialize, files=1|2, threads=1|4, "
            f"lib=onnx|onnx-cpp|onnxlight|onnxlight-cpp|onnxlight-cpp-nocopy|"
            f"onnxlight-nocopy|ort"
        ),
        xlabel="seconds",
        legend=False,
        figsize=(12, 8),
    )

    # Row names use "onnxlight" / "ort" as recorded during benchmarking.
    row_names = df.index.tolist()
    for container, col in zip(ax.containers, ["avg", "median"]):
        for bar, name in zip(container, row_names):
            if "onnxlight" in name:
                if col == "avg":
                    bar.set_facecolor(_onnx_light_avg)
                elif col == "median":
                    bar.set_facecolor(_onnx_light_med)
            elif "/ort" in name:
                if col == "avg":
                    bar.set_facecolor(_ort_avg)
                elif col == "median":
                    bar.set_facecolor(_ort_med)
            else:
                if col == "avg":
                    bar.set_facecolor(_onnx_avg)
                elif col == "median":
                    bar.set_facecolor(_onnx_med)

    first_container = ax.containers[0]
    for bar, name in zip(first_container, row_names):
        avg = df.loc[name, "avg"]
        std = df.loc[name, "std"]
        if not np.isfinite(avg):
            continue
        if np.isfinite(std):
            ci = 1.96 * std
            label = f" {avg * 1e3:.1f} ±{ci * 1e3:.1f} ms"
        else:
            label = f" {avg * 1e3:.1f} ms"
        ax.text(bar.get_width(), bar.get_y() + bar.get_height() / 2.0, label, va="center", ha="left")

    legend_handles = [
        mpatches.Patch(color=_onnx_avg, label="onnx avg"),
        mpatches.Patch(color=_onnx_med, label="onnx median"),
        mpatches.Patch(color=_onnx_light_avg, label="onnx_light avg"),
        mpatches.Patch(color=_onnx_light_med, label="onnx_light median"),
        mpatches.Patch(color=_ort_avg, label="ort avg"),
        mpatches.Patch(color=_ort_med, label="ort median"),
    ]
    ax.legend(handles=legend_handles)
    ax.grid(axis="x")
    for label in ax.get_yticklabels():
        label.set_horizontalalignment("left")
    ax.tick_params(axis="y", pad=160)
    ax.figure.tight_layout()
    ax.figure.savefig("plot_onnx_time.png")


.. image-sg:: /auto_examples/core/images/sphx_glr_plot_onnx_time_001.png
   :alt: onnx vs onnx_light vs ort load/save (s), size=640.00 MB (lower is better) CPU: AMD EPYC 7763 64-Core Processor (4 cores), RAM: 15.6 GB benchmark key: <op>/<files>x<threads>/<lib> op=load|save|parse|serialize, files=1|2, threads=1|4, lib=onnx|onnx-cpp|onnxlight|onnxlight-cpp|onnxlight-cpp-nocopy|onnxlight-nocopy|ort
   :srcset: /auto_examples/core/images/sphx_glr_plot_onnx_time_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 1094-1097

Cleanup
--------

Remove all temporary files created during the benchmark.

.. GENERATED FROM PYTHON SOURCE LINES 1097-1099

.. code-block:: Python


    shutil.rmtree(tmp_dir, ignore_errors=True)


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 57.269 seconds)


.. _sphx_glr_download_auto_examples_core_plot_onnx_time.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_onnx_time.ipynb <plot_onnx_time.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_onnx_time.py <plot_onnx_time.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_onnx_time.zip <plot_onnx_time.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_