.. _l-design-differences:

Differences between ``onnx`` and ``onnx_light``
===============================================

This section explains the internal design of *onnx-light* and how it differs
from the reference :epkg:`onnx` package. Both packages share the same on-disk
format (binary protobuf-encoded ``.onnx`` files) and expose very similar Python
APIs, so ``onnx_light`` can act as a near-drop-in replacement for common model
loading and inspection tasks. The key differences lie in how the serialization
layer is implemented.

.. contents::
   :local:
   :depth: 2

----

No protobuf dependency
----------------------

``onnx`` wraps the official Protocol Buffers (protobuf) runtime. Every message
class is auto-generated by the ``protoc`` compiler from ``.proto`` schema
files, and the resulting Python objects delegate all parsing and serialization
to the ``libprotobuf`` C++ library.

``onnx_light`` ships its own hand-written parser and serializer implemented
entirely in C++ (see ``onnx_light/onnx_proto/``). There is **no dependency on
protobuf at compile time or at runtime**.

Design implications:

* The C++ shared object (``_onnxpy.so``) built by ``onnx_light`` is smaller
  because it does not statically link any portion of ``libprotobuf``.
* All parsing and serialization code lives in a single self-contained library
  that can be consumed by other C++ projects without installing protobuf
  (see :epkg:`C++ onnx-light examples`).
* C++ projects that only need protobuf-compatible message types can link
  ``onnx_light::lib_onnx_proto`` directly; ``onnx_light::onnx_light`` is only
  needed when features tied to operator notions are required (checker, schema
  lookup, shape inference, version conversion, ...).
* The wire format produced by ``onnx_light`` is 100 % compatible with the
  official ONNX binary format, so models can be freely exchanged between the
  two libraries.

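
The wire-format compatibility mentioned above rests on a small set of protobuf
encoding rules: each field starts with a varint tag that packs the field number
and a wire type, followed by the payload. The pure-Python sketch below walks
the top-level fields of an encoded buffer to illustrate what a hand-written
parser has to do. It is only an illustration and not the ``onnx_light`` C++
implementation; ``read_varint`` and ``iter_fields`` are hypothetical helper
names introduced here.

.. code-block:: python

    from typing import Iterator, Tuple, Union


    def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
        """Decode a base-128 varint starting at ``pos``; return (value, next_pos)."""
        result = 0
        shift = 0
        while True:
            byte = buf[pos]
            pos += 1
            result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
            if not byte & 0x80:  # high bit clear: last byte of the varint
                return result, pos
            shift += 7


    def iter_fields(buf: bytes) -> Iterator[Tuple[int, int, Union[int, bytes]]]:
        """Yield (field_number, wire_type, value) for each top-level field."""
        pos = 0
        while pos < len(buf):
            tag, pos = read_varint(buf, pos)
            field_number, wire_type = tag >> 3, tag & 0x7
            if wire_type == 0:  # varint
                value, pos = read_varint(buf, pos)
            elif wire_type == 1:  # fixed 64-bit
                value, pos = buf[pos:pos + 8], pos + 8
            elif wire_type == 2:  # length-delimited (strings, bytes, sub-messages)
                size, pos = read_varint(buf, pos)
                value, pos = buf[pos:pos + size], pos + size
            elif wire_type == 5:  # fixed 32-bit
                value, pos = buf[pos:pos + 4], pos + 4
            else:
                raise ValueError(f"unsupported wire type {wire_type}")
            yield field_number, wire_type, value


    # Field 1 encoded as the varint 8, then field 2 as the 4-byte string b"demo".
    payload = bytes([0x08, 0x08, 0x12, 0x04]) + b"demo"
    fields = list(iter_fields(payload))

Because Python integers are unbounded, the sketch has no size limit of its own;
a C++ parser gets the same property by tracking offsets in 64-bit integers.
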

----

Files larger than 2 GB
----------------------

The protobuf C++ runtime enforces a hard **2 GB message-size limit** at
parsing time. Loading a model larger than that threshold with the standard
``onnx`` package raises a ``DecodeError``, and saving one fails as well.

``onnx_light`` imposes no such limit. Internally it tracks byte counts with
64-bit unsigned integers throughout the parsing and serialization path, so
models of arbitrary size are supported.

----

Buffered file I/O
-----------------

For file-based loading, ``onnx_light`` uses ``FileStream`` (``stream.h`` /
``stream.cc``), a buffered binary reader that opens the file with
``std::ifstream`` and reads ahead in 4096-byte chunks. On POSIX platforms a
second file descriptor is opened for parallel block reads via ``pread``.

The ``onnx`` package reads the whole file into a Python ``bytes`` object first
and then passes it to protobuf, which copies it again internally.

----

Parallel tensor loading
-----------------------

Large ONNX models contain hundreds or thousands of initializers (tensor
weights). Parsing these sequentially is often the dominant cost when loading a
model.

``onnx_light`` exposes a ``num_threads`` option that distributes the
initializer parsing across a thread pool:

.. code-block:: python

    import onnx_light.onnx as onnxl

    model = onnxl.load("model.onnx", num_threads=4)

On the C++ side the thread pool is implemented in ``thread_pool.h`` /
``thread_pool.cc``. Each worker independently parses a slice of the
initializer list, so wall-clock loading time can drop as more hardware threads
become available.

The standard ``onnx`` package is single-threaded; it offers no built-in
parallel loading mechanism.

----

Zero-copy parsing
-----------------

When the full model bytes are already in memory (e.g. downloaded into a
``bytes`` object), ``onnx_light`` can skip the ``malloc + memcpy`` that would
normally be used to copy each tensor's raw data into an owned buffer:

.. code-block:: python

    import onnx_light.onnx as onnxl

    # 'serialized' must stay alive for as long as 'model' is in use.
    with open("model.onnx", "rb") as f:
        serialized = f.read()

    model = onnxl.load(serialized, no_copy=True)
    # tensor.raw_data now points directly into 'serialized'

Internally, each ``TensorProto`` stores a non-owning ``ByteSpan`` (from
``simple_span.h``) that borrows the bytes from the source buffer. The span's
``is_borrowed()`` predicate can be used to check whether raw data is owned or
borrowed.

.. warning::

   The source ``bytes`` object **must** remain alive for as long as the model
   is in use. Freeing it while ``raw_data`` fields still point into it causes
   undefined behavior. This constraint does not exist in the standard ``onnx``
   package.

----

C++ class generation via macros
-------------------------------

The ``onnx`` package generates Python message classes from ``.proto`` schema
files using ``protoc``. ``onnx_light`` takes a different approach: message
classes are generated **at compile time** from a small set of C++ macros
defined in ``stream_class.h``:

* ``BEGIN_PROTO(cls, doc)`` / ``END_PROTO()``: open and close a message class.
* ``FIELD(type, name, order, doc)``: declare a scalar field with typed
  accessors ``ref_()``, ``has_()``, ``set_()``.
* ``FIELD_STR(name, order, doc)``: shorthand for ``utils::String`` fields that
  also accept ``std::string``.
* ``FIELD_REPEATED(type, name, order, doc)``: declare a repeated (list) field.
* ``SERIALIZATION_METHOD()``: inject ``ParseFromString``,
  ``SerializeToString``, ``ParseFromStream``, and ``SerializeToStream``
  declarations.

The resulting classes in ``onnx.h`` closely mirror the protobuf-generated
classes so that code originally written for ``onnx`` can be adapted with
minimal changes.

----

External-data / multi-file models
---------------------------------

Large ONNX models can be split across two files: a small ``.onnx`` file that
holds the graph structure and a separate binary blob (the *external data
file*) that holds the raw tensor weights.
This layout allows the structural metadata to be inspected quickly without
loading the weights and makes it possible to memory-map only the weight
region.

Saving with external data
~~~~~~~~~~~~~~~~~~~~~~~~~

Pass a ``location`` argument to ``onnxl.save`` to route tensor weights to a
separate file:

.. code-block:: python

    import onnx_light.onnx as onnxl

    # model.onnx      – graph structure only
    # model.onnx.data – all tensor weights
    onnxl.save(model, "model.onnx", location="model.onnx.data")

Serializing to two files does **not** mutate the in-memory ``ModelProto``.
``onnx_light`` applies external-data metadata on a temporary copy while
writing and keeps the original model unchanged.

The ``location`` value stored inside the ``.onnx`` metadata is automatically
reduced to a *relative* path (just the file name) when an absolute path is
provided, so the two files can be moved together without breaking the
reference.

Loading with external data
~~~~~~~~~~~~~~~~~~~~~~~~~~

When the ``.onnx`` file already references an external data file through its
tensor metadata, ``onnxl.load`` can discover and load the weights
automatically:

.. code-block:: python

    import onnx_light.onnx as onnxl

    model = onnxl.load("model.onnx", load_external_data=True)

To override the data-file location (for example when the file has been
moved), pass ``location`` explicitly:

.. code-block:: python

    model = onnxl.load("model.onnx", location="/data/weights.bin",
                       load_external_data=True)

When ``no_copy=True`` is combined with external data, ``onnx_light`` reads
each external weights file once into a shared model-owned buffer and every
tensor points into that shared storage. This avoids one allocation and copy
per tensor while still handling split external-data files transparently.

Splitting external data across multiple files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For very large models it can be useful to cap the size of each external
weight file.
Set ``max_external_file_size`` (in bytes) and ``onnxl.save`` will
automatically open a new file once the limit is reached, appending ``.1``,
``.2``, … suffixes to the base name:

.. code-block:: python

    import onnx_light.onnx as onnxl

    # Produces: model.onnx, model.onnx.data, model.onnx.data.1, …
    onnxl.save(
        model,
        "model.onnx",
        location="model.onnx.data",
        max_external_file_size=2 * 1024 ** 3,  # 2 GB per file
    )

When loading, only the primary location (``model.onnx.data``) needs to be
specified; the loader automatically opens ``model.onnx.data.1``,
``model.onnx.data.2``, … as required.

All I/O is performed in C++ via ``TwoFilesWriteStream`` / ``TwoFilesStream``,
so no Python overhead is incurred per tensor.

----

Encrypted model save / load
---------------------------

``onnx_light`` optionally supports saving and loading models in an
**AES-256-CBC encrypted** binary format (extension ``.onnxc``). The standard
``onnx`` package offers no equivalent functionality.

The feature is available only when ``onnx_light`` is built with OpenSSL
(``-DONNX_LIGHT_HAS_OPENSSL``); when OpenSSL is absent the helpers raise
``NotImplementedError`` with a clear message.

File format
~~~~~~~~~~~

The encrypted file is a compact, self-contained binary:

.. code-block:: text

    Offset  Size  Field
    ------  ----  -----
    0       8     Magic: "ONNXCRY1"
    8       16    Random PBKDF2 salt
    24      16    Random AES-CBC initialisation vector
    40      N     AES-256-CBC ciphertext (PKCS#7-padded protobuf payload)

Key derivation uses **PBKDF2-HMAC-SHA256** with 100 000 iterations, which
makes brute-force attacks on the passphrase computationally expensive.

Python API (file-based)
~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    import onnx_light.onnx as onnxl

    # Save an encrypted model to a file
    onnxl.save_encrypted(model, "model.onnxc", key="my_passphrase")

    # Load and decrypt from a file
    model = onnxl.load_encrypted("model.onnxc", key="my_passphrase")

    # A wrong key raises RuntimeError
    try:
        onnxl.load_encrypted("model.onnxc", key="wrong")
    except RuntimeError as exc:
        print(f"decryption failed: {exc}")

Python API (in-memory / bytes)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When no file I/O is desired, the model can be encrypted to a ``bytes`` object
and decrypted back directly:

.. code-block:: python

    import onnx_light.onnx as onnxl

    # Encrypt to bytes (no file written)
    blob: bytes = onnxl.save_encrypted_string(model, key="my_passphrase")

    # Decrypt from bytes
    model = onnxl.load_encrypted_string(blob, key="my_passphrase")

The ``bytes`` object produced by :func:`save_encrypted_string` is in the same
``ONNXCRY1`` format as the file produced by :func:`save_encrypted`, so the two
forms are interchangeable.

C++ API
~~~~~~~

.. code-block:: cpp

    #include "onnx_crypt.h"

    // File-based
    ONNX_LIGHT_NAMESPACE::SaveEncryptedModel(model, "model.onnxc", "passphrase");
    ONNX_LIGHT_NAMESPACE::LoadEncryptedModel(model, "model.onnxc", "passphrase");

    // In-memory
    std::string blob =
        ONNX_LIGHT_NAMESPACE::SaveEncryptedModelToString(model, "passphrase");
    ONNX_LIGHT_NAMESPACE::LoadEncryptedModelFromString(model, blob, "passphrase");

See :ref:`l-api-onnx-onnx-proto-onnx-crypt` for the full C++ API reference.

----

API compatibility
-----------------

``onnx_light`` aims to cover a functional subset of the ``onnx`` Python API
for the most common operations, and adds a few extensions of its own:

.. list-table::
   :header-rows: 1
   :widths: 40 30 30

   * - Operation
     - ``onnx``
     - ``onnx_light``
   * - Load from file
     - ``onnx.load(path)``
     - ``onnxl.load(path)``
   * - Load from bytes
     - ``onnx.load_from_string(b)``
     - ``onnxl.load(b)``
   * - Save to file
     - ``onnx.save(model, path)``
     - ``onnxl.save(model, path)``
   * - Save with external data
     - ``onnx.save_model(model, path, save_as_external_data=True, location=loc)``
     - ``onnxl.save(model, path, location=loc)``
   * - Save external data with aligned tensor offsets
     - not supported
     - ``opts = onnxl.SerializeOptions(); opts.alignment = 4096; model.SerializeToFile(path, opts, loc)``
   * - Load with external data
     - ``onnx.load(path, load_external_data=True)``
     - ``onnxl.load(path, load_external_data=True)``
   * - Load external data with shared no-copy buffers
     - not supported
     - ``onnxl.load(path, load_external_data=True, no_copy=True)``
   * - Split external data
     - not supported
     - ``onnxl.save(model, path, location=loc, max_external_file_size=N)``
   * - Save encrypted to file
     - not supported
     - ``onnxl.save_encrypted(model, path, key=k)``
   * - Load encrypted from file
     - not supported
     - ``onnxl.load_encrypted(path, key=k)``
   * - Save encrypted to bytes
     - not supported
     - ``onnxl.save_encrypted_string(model, key=k)``
   * - Load encrypted from bytes
     - not supported
     - ``onnxl.load_encrypted_string(blob, key=k)``
   * - Parse a message
     - ``msg.ParseFromString(b)``
     - ``msg.ParseFromString(b)``
   * - Serialize a message
     - ``msg.SerializeToString()``
     - ``msg.SerializeToString()``
   * - Parallel load
     - not supported
     - ``onnxl.load(path, num_threads=N)``
   * - Zero-copy parse
     - not supported
     - ``onnxl.load(b, no_copy=True)``
   * - File size limit
     - 2 GB (protobuf)
     - unlimited

Some helper utilities present in ``onnx`` (shape inference, model checker,
etc.) are not yet implemented in ``onnx_light``, which focuses on fast,
dependency-free loading and saving.

----

Summary
-------

.. list-table::
   :header-rows: 1
   :widths: 35 30 35

   * - Aspect
     - ``onnx``
     - ``onnx_light``
   * - Serialization runtime
     - Google protobuf
     - Custom C++ (no protobuf)
   * - Max model size
     - 2 GB
     - Unlimited
   * - File I/O
     - Read-into-bytes
     - Buffered stream (``FileStream``, 4096-byte chunks)
   * - Tensor loading
     - Single-threaded
     - Optional parallel (thread pool)
   * - Raw-data copying
     - Always copied
     - Zero-copy option (``no_copy=True``)
   * - External data (2-file)
     - Yes (``save_model`` / ``load``)
     - Yes (``save`` / ``load``)
   * - External data no-copy shared buffers
     - No
     - Yes (``load(..., no_copy=True)``)
   * - Split external data (N files)
     - No
     - Yes (``max_external_file_size``)
   * - Tensor offset alignment in external files
     - No
     - Yes (``SerializeOptions.alignment``)
   * - Standalone C++ library
     - Yes
     - Yes (``onnx_light::lib_onnx_proto`` for proto-only code,
       ``onnx_light::onnx_light`` when operator-aware APIs are needed)
   * - Wire format
     - ONNX binary protobuf
     - ONNX binary protobuf (identical)
   * - Encrypted save / load
     - No
     - Yes (AES-256-CBC, requires OpenSSL)
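
As a closing illustration of the ``ONNXCRY1`` layout described in the
encrypted-model section, the sketch below splits a blob into its salt,
initialisation vector, and ciphertext, and derives a 32-byte AES-256 key with
PBKDF2-HMAC-SHA256 using only the Python standard library. It is a sketch
under stated assumptions and not the ``onnx_light`` implementation:
``split_encrypted_blob`` and ``derive_key`` are hypothetical helpers, and the
UTF-8 encoding of the passphrase is an assumption. The standard library has no
AES primitive, so the ciphertext is not decrypted here.

.. code-block:: python

    import hashlib
    from typing import NamedTuple

    MAGIC = b"ONNXCRY1"   # 8-byte magic at offset 0
    SALT_SIZE = 16        # PBKDF2 salt at offset 8
    IV_SIZE = 16          # AES-CBC initialisation vector at offset 24
    HEADER_SIZE = len(MAGIC) + SALT_SIZE + IV_SIZE  # ciphertext starts at 40


    class EncryptedParts(NamedTuple):
        salt: bytes
        iv: bytes
        ciphertext: bytes


    def split_encrypted_blob(blob: bytes) -> EncryptedParts:
        """Split a blob into (salt, iv, ciphertext) after checking the magic."""
        if len(blob) < HEADER_SIZE or blob[:len(MAGIC)] != MAGIC:
            raise ValueError("not an ONNXCRY1 blob")
        salt = blob[len(MAGIC):len(MAGIC) + SALT_SIZE]
        iv = blob[len(MAGIC) + SALT_SIZE:HEADER_SIZE]
        return EncryptedParts(salt, iv, blob[HEADER_SIZE:])


    def derive_key(passphrase: str, salt: bytes) -> bytes:
        """Derive a 32-byte key; UTF-8 passphrase encoding is an assumption."""
        return hashlib.pbkdf2_hmac(
            "sha256", passphrase.encode("utf-8"), salt, 100_000, dklen=32
        )


    # Synthetic blob for demonstration only: magic + salt + IV + two zero blocks.
    fake_blob = MAGIC + bytes(range(16)) + bytes(range(16, 32)) + b"\x00" * 32
    parts = split_encrypted_blob(fake_blob)
    key = derive_key("my_passphrase", parts.salt)
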