Detailed Differences between `onnx` and `onnx_light`#

This section explains the internal design of onnx_light and how it differs from the reference onnx package. Both packages share the same on-disk format (binary protobuf-encoded .onnx files) and expose very similar Python APIs, so onnx_light can act as a near-drop-in replacement for common model loading and inspection tasks. The key differences lie in how the serialization layer is implemented.

No protobuf dependency#

onnx wraps the official Protocol Buffers (protobuf) runtime. Every message class is auto-generated by the protoc compiler from .proto schema files, and the resulting Python objects delegate all parsing and serialization to the libprotobuf C++ library. onnx_light ships its own hand-written parser and serializer implemented entirely in C++ (see onnx_light/onnx_proto/). There is no dependency on protobuf at compile time or at runtime.

The C++ shared object built by onnx_light is smaller because it does not statically link any portion of libprotobuf.
All parsing and serialization code lives in a single self-contained library that can be consumed by other C++ projects without installing protobuf (see C++ onnx-light examples).
C++ projects that only need protobuf-compatible message types can link onnx_light::lib_onnx_proto directly; onnx_light::onnx_light is only needed when features tied to operator notions are required (checker, schema lookup, shape inference, version conversion, …).
The wire format produced by onnx_light is 100 % compatible with the official ONNX binary format, so models can be freely exchanged between the two libraries.
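
To make "hand-written parser" more concrete, the sketch below shows the two primitives that any protobuf-compatible reader needs: decoding a base-128 varint and splitting a field key into its field number and wire type. It is illustrative Python, not the C++ code shipped by onnx_light, and the names read_varint and read_key are hypothetical.

```python
def read_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift   # low 7 bits carry the payload
        if not byte & 0x80:               # high bit clear: last byte of the varint
            return value, pos
        shift += 7
        if shift >= 64:
            raise ValueError("varint longer than 10 bytes")


def read_key(buf: bytes, pos: int = 0) -> tuple[int, int, int]:
    """Decode a field key; return (field_number, wire_type, next_pos)."""
    key, pos = read_varint(buf, pos)
    return key >> 3, key & 0x7, pos


# ModelProto.ir_version is field 1 (a varint), so "08 08" encodes ir_version = 8.
field, wire_type, pos = read_key(b"\x08\x08")
value, pos = read_varint(b"\x08\x08", pos)
```

A full parser repeats this key/value loop and dispatches on the wire type (varint, fixed-width, or length-delimited) for each field.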

No serialize / parse round-trip when calling C++ tools#

In the official onnx package the Python message classes (ModelProto, GraphProto, NodeProto, TensorProto, …) are generated by protoc and are owned by the protobuf Python runtime (its native accelerator when one is available, the pure-Python implementation otherwise). They are not the same objects as the onnx::ModelProto C++ instances used by the bundled native tools (checker, shape inference, inliner, version converter, …).

Every call from Python into one of those C++ helpers therefore needs a full serialize / parse round-trip:

# onnx/inliner.py – simplified
result = C.inline_local_functions(model.SerializeToString(), convert_version)
inlined_model = ModelProto()
inlined_model.ParseFromString(result)

The same pattern shows up in onnx.checker (C.check_model(model.SerializeToString(), ...)), onnx.shape_inference (model.SerializeToString() then ParseFromString() on the result), and the version converter. For large models this serialize/parse pair can dominate the wall-clock time of the call, even when the C++ work itself is cheap.

onnx_light does not have this split: the Python ModelProto exposed by onnx_light.onnx is the C++ ModelProto, bound through nanobind (see onnx_light/onnx_py/_onnxpy_lib.cc). The bundled tools take the object by reference and work on the C++ data directly:

// onnx_light/onnx_py/_onnxpy_lib.cc – inliner binding
inliner_mod.def(
    "inline_local_functions",
    [](const ModelProto &model, bool convert_version) {
      inliner::CheckFunctionCallCycles(model);
      ModelProto copy = model;
      inliner::InlineLocalFunctions(copy, convert_version);
      return copy;
    });

No bytes are produced and no parser runs. The inliner still makes an internal C++ copy of the ModelProto so that the caller’s model is left unchanged, but that copy is a structural deep copy of C++ objects rather than a serialize + parse round-trip: it is typically far cheaper than the protobuf path, and no temporary bytes object is materialised on the Python side. The same direct-call pattern is used by onnx_light.onnx.checker, onnx_light.onnx.shape_inference, and the version converter, so invoking any of these helpers on a model already loaded by onnx_light costs little beyond the work of the algorithm itself.

Files larger than 2 GB#

The protobuf C++ runtime enforces a hard 2 GB message-size limit at parsing time. With the standard onnx package, loading or saving a model whose serialized message exceeds that threshold fails (a DecodeError when parsing) unless the weights are moved to external data files. onnx_light imposes no such limit. Internally it tracks byte counts with 64-bit unsigned integers throughout the parsing and serialization path, so model size is bounded by available memory and storage rather than by the serialization layer.
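
As a rough worked example of what the 2 GB ceiling means in practice, the snippet below converts a byte limit into a maximum number of tensor elements. It assumes the limit corresponds to the largest value of a signed 32-bit size (2**31 - 1 bytes) and ignores the small overhead of field keys and graph metadata; the helper name max_elements is hypothetical.

```python
INT32_MAX = 2**31 - 1    # largest byte count a signed 32-bit size can represent
UINT64_MAX = 2**64 - 1   # largest byte count a 64-bit unsigned counter can represent


def max_elements(limit_bytes: int, itemsize: int) -> int:
    """Upper bound on the number of elements of a given byte width."""
    return limit_bytes // itemsize


float32_params = max_elements(INT32_MAX, 4)   # 4 bytes per float32 weight
float16_params = max_elements(INT32_MAX, 2)   # 2 bytes per float16 weight
print(f"float32: {float32_params:,} parameters, float16: {float16_params:,} parameters")
```

Under these assumptions a single message tops out at roughly half a billion float32 weights, which is why larger models need either external data or a serializer with 64-bit counters.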

Buffered file I/O#

For file-based loading, onnx_light uses FileStream (stream.h / stream.cc), a buffered binary reader that opens the file with std::ifstream and reads ahead in 4096-byte chunks. On POSIX platforms a second file descriptor is opened for parallel block reads via pread. The onnx package reads the whole file into a Python bytes object first and then passes it to protobuf, which copies it again internally.
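
The two access patterns described above can be sketched in Python as follows: a sequential reader that pulls fixed-size chunks, and a positional read on a separate descriptor that does not disturb the sequential offset. This is only an illustration of the idea, not a port of FileStream; the function names are hypothetical, and the fallback branch for platforms without pread is an assumption.

```python
import os
import tempfile

CHUNK_SIZE = 4096  # read-ahead size quoted above


def iter_chunks(path, chunk_size=CHUNK_SIZE):
    """Sequential buffered read: yield the file in fixed-size chunks."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk


def read_block(path, offset, size):
    """Positional read on a separate descriptor (pread where available)."""
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_BINARY", 0))
    try:
        if hasattr(os, "pread"):
            return os.pread(fd, size, offset)
        os.lseek(fd, offset, os.SEEK_SET)  # fallback for platforms without pread
        return os.read(fd, size)
    finally:
        os.close(fd)


def demo():
    """Write 10 000 bytes to a temporary file and read them back both ways."""
    data = bytes(i % 251 for i in range(10_000))
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
    try:
        sizes = [len(c) for c in iter_chunks(tmp.name)]
        block = read_block(tmp.name, 5000, 16)
    finally:
        os.remove(tmp.name)
    return sizes, block, data[5000:5016]
```

Because a positional read carries its own offset, several workers can fetch different blocks of the same file without coordinating a shared file position.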

Parallel tensor loading and saving#

Large ONNX models contain hundreds or thousands of initializers (tensor weights). Parsing or serializing these sequentially is the dominant cost when loading or saving a model. onnx_light exposes a num_threads option on both onnx_light.onnx.load() and onnx_light.onnx.save() that distributes the initializer parsing / writing across a thread pool:

import onnx_light.onnx as onnxl

model = onnxl.load("model.onnx", num_threads=4)
onnxl.save(model, "model.onnx", num_threads=4)

On the C++ side the thread pool is implemented in thread_pool.h / thread_pool.cc. Each worker independently parses (or writes) a slice of the initializer list, so wall-clock load and save time tends to drop as more hardware threads are used. In practice loading or saving a large model is significantly faster than with the single-threaded path (see Number of threads used to load and save ONNX models for a detailed benchmark). The standard onnx package is single-threaded; it offers no built-in parallel loading or saving mechanism.
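
The slicing strategy can be illustrated with a small Python sketch that splits a list of raw tensor payloads into contiguous slices and hands each slice to a worker. It is a stand-in for the idea only: parse_tensor and parse_all are hypothetical names, the float32 decoding replaces real TensorProto parsing, and Python threads will not show the speed-up of the C++ pool because of the interpreter lock.

```python
import struct
from concurrent.futures import ThreadPoolExecutor


def parse_tensor(raw: bytes) -> tuple:
    """Stand-in for per-initializer parsing: decode little-endian float32 values."""
    count = len(raw) // 4
    return struct.unpack(f"<{count}f", raw[: count * 4])


def parse_all(blobs, num_threads=4):
    """Parse every blob, giving each worker one contiguous slice of the list."""
    n = len(blobs)
    num_threads = max(1, min(num_threads, n or 1))
    step = -(-n // num_threads) if n else 1            # ceiling division
    slices = [range(i, min(i + step, n)) for i in range(0, n, step)]
    results = [None] * n

    def work(indices):
        # Each worker writes only to its own indices, so no locking is needed.
        for i in indices:
            results[i] = parse_tensor(blobs[i])

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(work, slices))                    # wait for all workers
    return results


blobs = [struct.pack("<2f", i, i + 0.5) for i in range(10)]
parsed = parse_all(blobs, num_threads=3)
```

Contiguous slices keep the per-worker bookkeeping trivial; a work-stealing queue would balance uneven tensor sizes better, at the cost of more synchronisation.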

Zero-copy parsing#

When the full model bytes are already in memory (e.g. downloaded into a bytes object), onnx_light can skip the malloc + memcpy that would normally be used to copy each tensor’s raw data into an owned buffer:

import onnx_light.onnx as onnxl

# Read the whole file; 'serialized' must outlive 'model' (see the warning below).
with open("model.onnx", "rb") as f:
    serialized = f.read()
model = onnxl.load(serialized, no_copy=True)
# tensor.raw_data now points directly into 'serialized'

Internally, each TensorProto stores a non-owning ByteSpan (from simple_span.h) that borrows the bytes from the source buffer. The span’s is_borrowed() predicate reports whether the raw data is owned or borrowed.

Warning

The source bytes object must remain alive for as long as the model is in use. Freeing it while raw_data fields still point into it causes undefined behavior. This constraint does not exist in the standard onnx package.
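
The difference between a borrowed view and an owned copy can be demonstrated with Python's built-in memoryview, used here as an analogy for the non-owning span. This is not onnx_light code, and one important difference is labelled in the comments: a memoryview keeps its source alive, whereas a raw C++ span does not, which is exactly why the warning above matters.

```python
# 'source' stands in for the serialized model held by the caller.
source = bytearray(b"headerWEIGHTStrailer")

borrowed = memoryview(source)[6:13]   # non-owning view: no bytes are copied
owned = bytes(source[6:13])           # independent copy of the same region

# Changing the source is visible through the view but not through the copy.
source[6] = ord("w")

shares_memory = bytes(borrowed) == b"wEIGHTS"
copy_unchanged = owned == b"WEIGHTS"

# Unlike a C++ span, the memoryview holds a reference to its source object,
# so Python cannot free 'source' while 'borrowed' is still in use.
keeps_source_alive = borrowed.obj is source
```

The borrowed view costs no allocation or copy, but its contents are only as stable as the buffer it points into.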

C++ class generation via macros#

The onnx package generates Python message classes from .proto schema files using protoc. onnx_light takes a different approach: message classes are generated at compile time from a small set of C++ macros defined in stream_class.h:

BEGIN_PROTO(cls, doc) / END_PROTO() — open/close a message class.
FIELD(type, name, order, doc) — declare a scalar field with typed accessors ref_<name>(), has_<name>(), set_<name>().
FIELD_STR(name, order, doc) — shorthand for String fields that also accepts std::string.
FIELD_REPEATED(type, name, order, doc) — declare a repeated (list) field.
SERIALIZATION_METHOD() — inject ParseFromString(), SerializeToString(), ParseFromStream(), and SerializeToStream() declarations.

The resulting classes in onnx.h closely mirror the protobuf-generated classes so that code originally written for onnx can be adapted with minimal changes.

External-data / multi-file models#

Large ONNX models can be split across two files: a small .onnx file that holds the graph structure and a separate binary blob (the external data file) that holds the raw tensor weights. This layout allows the structural metadata to be inspected quickly without loading the weights and makes it possible to memory-map only the weight region.

Saving with external data#

Pass a location argument to onnx_light.onnx.save() to route tensor weights to a separate file:

import onnx_light.onnx as onnxl

# model.onnx – graph structure only
# model.onnx.data – all tensor weights
onnxl.save(model, "model.onnx", location="model.onnx.data")

Serializing to two files does not mutate the in-memory ModelProto. onnx_light applies external-data metadata on a temporary copy while writing and keeps the original model unchanged. The location value stored inside the .onnx metadata is automatically reduced to a relative path (just the file name) when an absolute path is provided, so the two files can be moved together without breaking the reference.
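
The path reduction described above amounts to keeping only the file name of an absolute location. A minimal sketch, using POSIX path rules for determinism; stored_location is a hypothetical name, and leaving relative paths untouched is an assumption, since only the absolute case is documented here.

```python
import posixpath


def stored_location(location: str) -> str:
    """Return the value that would be recorded in the .onnx metadata."""
    if posixpath.isabs(location):
        return posixpath.basename(location)   # absolute path: keep the file name only
    return location                           # assumption: relative paths are kept as given


absolute = stored_location("/tmp/export/model.onnx.data")
relative = stored_location("model.onnx.data")
```

Storing only the file name is what lets model.onnx and its data file be moved to another directory together.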

Loading with external data#

When the .onnx file already references an external data file through its tensor metadata, onnx_light.onnx.load() can discover and load the weights automatically:

import onnx_light.onnx as onnxl

model = onnxl.load("model.onnx", load_external_data=True)

To override the data-file location (for example when the file has been moved), pass location explicitly:

model = onnxl.load("model.onnx", location="/data/weights.bin",
                    load_external_data=True)

When no_copy=True is combined with external data, onnx_light reads each external weights file once into a shared model-owned buffer and every tensor points into that shared storage. This avoids one allocation and copy per tensor while still handling split external-data files transparently.

Splitting external data across multiple files#

For very large models it can be useful to cap the size of each external weight file. Set max_external_file_size (in bytes) and onnx_light.onnx.save() will automatically open a new file once the limit is reached, appending .1, .2, … suffixes to the base name:

import onnx_light.onnx as onnxl

# Produces: model.onnx, model.onnx.data, model.onnx.data.1, …
onnxl.save(
    model,
    "model.onnx",
    location="model.onnx.data",
    max_external_file_size=2 * 1024 ** 3,  # 2 GiB per file
)

When loading, only the primary location (model.onnx.data) needs to be specified; the loader automatically opens model.onnx.data.1, model.onnx.data.2, … as required. All I/O is performed in C++ via TwoFilesWriteStream / TwoFilesStream, so no Python overhead is incurred per tensor.
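
The file naming scheme can be sketched as a small planning function that assigns each tensor to a file and an offset. The suffix convention (.1, .2, …) comes from the description above; the rollover rule used here (start a new file when the next tensor would not fit, and let an oversized tensor occupy a file of its own) is an assumption, and plan_external_files is a hypothetical name.

```python
def plan_external_files(tensor_sizes, base_name, max_file_size):
    """Return [(file_name, offset, size), ...], one entry per tensor."""
    plan = []
    index, used = 0, 0
    for size in tensor_sizes:
        # Roll over when the tensor does not fit, unless the current file is
        # still empty (an oversized tensor has to be written somewhere).
        if used > 0 and used + size > max_file_size:
            index, used = index + 1, 0
        name = base_name if index == 0 else f"{base_name}.{index}"
        plan.append((name, used, size))
        used += size
    return plan


plan = plan_external_files([600, 300, 200, 1500, 100], "model.onnx.data", 1000)
for name, offset, size in plan:
    print(f"{name}: offset={offset} size={size}")
```

With a 1000-byte cap, the first two tensors share model.onnx.data and each of the remaining three starts a new file.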

Encrypted model save / load#

onnx_light optionally supports saving and loading models in an AES-256-CBC encrypted binary format (extension .onnxc). The standard onnx package offers no equivalent functionality. The feature is available only when onnx_light is built with OpenSSL (-DONNX_LIGHT_HAS_OPENSSL); when OpenSSL is absent the helpers raise NotImplementedError with a clear message.

File format#

The encrypted file is a compact, self-contained binary:

Offset  Size  Field
------  ----  -----
     0     8  Magic: "ONNXCRY1"
     8    16  Random PBKDF2 salt
    24    16  Random AES-CBC initialisation vector
    40     N  AES-256-CBC ciphertext (PKCS#7-padded protobuf payload)

Key derivation uses PBKDF2-HMAC-SHA256 with 100 000 iterations, which makes brute-force attacks on the passphrase computationally expensive.
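
The layout above can be read with a few lines of standard-library Python. The sketch below splits a blob into salt, IV and ciphertext, and derives a key with PBKDF2-HMAC-SHA256. The UTF-8 encoding of the passphrase and the 32-byte derived key length are assumptions that follow from AES-256 but are not stated in the format description; the function names are hypothetical, and decryption itself is left out because it needs a third-party AES implementation.

```python
import hashlib
import struct

MAGIC = b"ONNXCRY1"
HEADER = struct.Struct("<8s16s16s")   # magic, salt, IV: 40 bytes in total
AES_BLOCK = 16


def split_encrypted_blob(blob: bytes):
    """Validate the header and return (salt, iv, ciphertext)."""
    if len(blob) < HEADER.size:
        raise ValueError("blob is shorter than the 40-byte header")
    magic, salt, iv = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("bad magic, not an ONNXCRY1 blob")
    ciphertext = blob[HEADER.size:]
    # PKCS#7 always pads, so valid ciphertext is a non-empty multiple of 16 bytes.
    if not ciphertext or len(ciphertext) % AES_BLOCK:
        raise ValueError("ciphertext length is not a positive multiple of 16")
    return salt, iv, ciphertext


def derive_key(passphrase: str, salt: bytes) -> bytes:
    """PBKDF2-HMAC-SHA256, 100 000 iterations (assumed: UTF-8 input, 32-byte key)."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt, 100_000, dklen=32)


# Synthetic blob for illustration only; real files carry random salt and IV.
example = MAGIC + bytes(16) + bytes(range(16)) + bytes(32)
salt, iv, ciphertext = split_encrypted_blob(example)
key = derive_key("my_passphrase", salt)
```

Because the salt is stored in the file, the same passphrase yields a different key for every saved model.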

Python API (file-based)#

import onnx_light.onnx as onnxl

# Save an encrypted model to a file
onnxl.save_encrypted(model, "model.onnxc", key="my_passphrase")

# Load and decrypt from a file
model = onnxl.load_encrypted("model.onnxc", key="my_passphrase")

# A wrong key raises RuntimeError
try:
    onnxl.load_encrypted("model.onnxc", key="wrong")
except RuntimeError as exc:
    print(f"decryption failed: {exc}")

Python API (in-memory / bytes)#

When no file I/O is desired, the model can be encrypted to a bytes object and decrypted back directly:

import onnx_light.onnx as onnxl

# Encrypt to bytes (no file written)
blob: bytes = onnxl.save_encrypted_string(model, key="my_passphrase")

# Decrypt from bytes
model = onnxl.load_encrypted_string(blob, key="my_passphrase")

The bytes object produced by onnx_light.onnx.save_encrypted_string() is in the same ONNXCRY1 format as the file produced by onnx_light.onnx.save_encrypted(), so the two forms are interchangeable.

C++ API#

#include "onnx_crypt.h"

// File-based
ONNX_LIGHT_NAMESPACE::SaveEncryptedModel(model, "model.onnxc", "passphrase");
ONNX_LIGHT_NAMESPACE::LoadEncryptedModel(model, "model.onnxc", "passphrase");

// In-memory
std::string blob = ONNX_LIGHT_NAMESPACE::SaveEncryptedModelToString(model, "passphrase");
ONNX_LIGHT_NAMESPACE::LoadEncryptedModelFromString(model, blob, "passphrase");

See onnx_crypt.h for the full C++ API reference.

No checked-in markdown files for operators#

The official onnx repository tracks operator metadata in a set of markdown files (docs/Operators.md, docs/Operators-ml.md, docs/Changelog.md, docs/Changelog-ml.md, docs/TestCoverage.md, docs/TestCoverage-ml.md). These files are generated from the C++ schemas and the Python test suite, and the onnx CI fails if a change to an operator schema is not accompanied by a regeneration of the corresponding markdown files.

onnx_light does not ship any equivalent set of generated markdown files, and adding a new operator (or modifying an existing schema, opset history, kernel, shape-inference rule, backend test case, …) never requires editing or regenerating any .md file in this repository.

All operator documentation is produced on the fly by the Sphinx build from the live LightOpSchema objects (see Backend test-case coverage), so the single source of truth is the C++ code itself. Contributors should therefore:

never commit a regenerated Operators.md / Changelog.md / TestCoverage.md (or any -ml variant);
never add a new .md file under docs/ to describe a new operator — the operator pages are generated automatically.

API compatibility#

onnx_light aims to cover a functional subset of the onnx Python API for the most common operations, and adds a few capabilities that onnx does not offer:

Operation	`onnx`	`onnx_light`
Load from file	`onnx.load(path)`	`onnxl.load(path)`
Load from bytes	`onnx.load_from_string(b)`	`onnxl.load(b)`
Save to file	`onnx.save(model, path)`	`onnxl.save(model, path)`
Save with external data	`onnx.save_model(model, path, save_as_external_data=True, location=loc)`	`onnxl.save(model, path, location=loc)`
Save external data with aligned tensor offsets	not supported	`opts = onnxl.SerializeOptions(); opts.alignment = 4096; model.SerializeToFile(path, opts, loc)`
Load with external data	`onnx.load(path, load_external_data=True)`	`onnxl.load(path, load_external_data=True)`
Load external data with shared no-copy buffers	not supported	`onnxl.load(path, load_external_data=True, no_copy=True)`
Split external data	not supported	`onnxl.save(model, path, location=loc, max_external_file_size=N)`
Save encrypted to file	not supported	`onnxl.save_encrypted(model, path, key=k)`
Load encrypted from file	not supported	`onnxl.load_encrypted(path, key=k)`
Save encrypted to bytes	not supported	`onnxl.save_encrypted_string(model, key=k)`
Load encrypted from bytes	not supported	`onnxl.load_encrypted_string(blob, key=k)`
Parse a message	`msg.ParseFromString(b)`	`msg.ParseFromString(b)`
Serialize a message	`msg.SerializeToString()`	`msg.SerializeToString()`
Parallel load	not supported	`onnxl.load(path, num_threads=N)`
Zero-copy parse	not supported	`onnxl.load(b, no_copy=True)`
File size limit	2 GB (protobuf)	unlimited

onnx_light does not aim to reproduce every helper utility found in onnx. The tools discussed above (checker, shape inference, inliner, version converter) are available, but other helpers may be missing, because onnx_light focuses on fast, protobuf-free loading and saving.

Summary#

Aspect	`onnx`	`onnx_light`
Serialization runtime	Google protobuf	Custom C++ (no protobuf)
Max model size	2 GB	Unlimited
File I/O	Read-into-bytes	Buffered stream reads (`FileStream`, 4096-byte chunks)
Tensor loading	Single-threaded	Optional parallel (thread pool)
Raw-data copying	Always copied	Zero-copy option (`no_copy=True`)
External data (2-file)	Yes (`save_model` / `load`)	Yes (`save` / `load`)
External data no-copy shared buffers	No	Yes (`load(..., no_copy=True)`)
Split external data (N files)	No	Yes (`max_external_file_size`)
Tensor offset alignment in external files	No	Yes (`SerializeOptions.alignment`)
Standalone C++ library	Yes	Yes (`onnx_light::lib_onnx_proto` for proto-only code, `onnx_light::onnx_light` when operator-aware APIs are needed)
Wire format	ONNX binary protobuf	ONNX binary protobuf (identical)
Encrypted save / load	No	Yes (AES-256-CBC, requires OpenSSL)
Python ↔ C++ tool calls (checker, inliner, shape inference, …)	Serialize + parse round-trip per call	Direct call (Python object is the C++ object)
