Differences between onnx and onnx_light#

This section explains the internal design of onnx_light and how it differs from the reference onnx package. Both packages share the same on-disk format (binary protobuf-encoded .onnx files) and expose very similar Python APIs, so onnx_light can act as a near-drop-in replacement for common model loading and inspection tasks. The key differences lie in how the serialization layer is implemented.


No protobuf dependency#

onnx wraps the official Protocol Buffers (protobuf) runtime. Every message class is auto-generated by the protoc compiler from .proto schema files, and the resulting Python objects delegate all parsing and serialization to the libprotobuf C++ library.

onnx_light ships its own hand-written parser and serializer implemented entirely in C++ (see onnx_light/onnx_proto/). There is no dependency on protobuf at compile time or at runtime.

Design implications:

  • The C++ shared object (_onnxpy.so) built by onnx_light is smaller because it does not statically link any portion of libprotobuf.

  • All parsing and serialization code lives in a single self-contained library that can be consumed by other C++ projects without installing protobuf (see C++ onnx-light examples).

  • C++ projects that only need protobuf-compatible message types can link onnx_light::lib_onnx_proto directly; onnx_light::onnx_light is only needed when features tied to operator notions are required (checker, schema lookup, shape inference, version conversion, …).

  • The wire format produced by onnx_light is 100 % compatible with the official ONNX binary format, so models can be freely exchanged between the two libraries.
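To make the idea of a hand-written parser more concrete, here is a minimal pure-Python sketch that decodes the two wire types that dominate an ONNX file: varints and length-delimited fields. It illustrates the protobuf wire format only and is not the code found in onnx_light/onnx_proto/; the names read_varint and iter_fields are hypothetical.

```python
from typing import Iterator, Tuple, Union

WIRE_VARINT = 0   # int32/int64/bool/enum values
WIRE_LEN = 2      # strings, bytes, nested messages, packed repeated fields


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:  # high bit clear: last byte of the varint
            return value, pos
        shift += 7


def iter_fields(buf: bytes) -> Iterator[Tuple[int, int, Union[int, bytes]]]:
    """Yield (field_number, wire_type, value) for each top-level field."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == WIRE_VARINT:
            value, pos = read_varint(buf, pos)
        elif wire_type == WIRE_LEN:
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        yield field_number, wire_type, value


# A tiny ModelProto: ir_version (field 1) = 8, producer_name (field 2) = "demo".
payload = b"\x08\x08" + b"\x12\x04demo"
fields = list(iter_fields(payload))
print(fields)
```

A complete parser would also handle the fixed 32-bit and 64-bit wire types and recurse into nested messages such as GraphProto; the sketch stops at the top level to stay short.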


Files larger than 2 GB#

The protobuf C++ runtime enforces a hard 2 GB message-size limit at parsing time. Loading or saving a model larger than that threshold with the standard onnx package raises a DecodeError.

onnx_light imposes no such limit. Internally it tracks byte counts with 64-bit unsigned integers throughout the parsing and serialization path, so models of arbitrary size are supported.
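The role of 64-bit counters can be illustrated with the length prefix itself. In the protobuf wire format a length is stored as a varint, which has no 2 GB ceiling of its own; the limit comes from the integer type used to track sizes. The sketch below, with a hypothetical encode_varint helper, shows that a 3 GiB length fits in a five-byte prefix while overflowing a signed 32-bit integer.

```python
INT32_MAX = 2**31 - 1   # largest value a signed 32-bit size can hold
UINT64_MAX = 2**64 - 1  # range available to a 64-bit unsigned byte counter


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if not 0 <= value <= UINT64_MAX:
        raise ValueError("value does not fit in 64 bits")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)
            return bytes(out)


three_gib = 3 * 1024**3
length_prefix = encode_varint(three_gib)

print(len(length_prefix))      # number of bytes in the length prefix
print(three_gib > INT32_MAX)   # whether the size overflows a signed 32-bit int
```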


Buffered file I/O#

For file-based loading, onnx_light uses FileStream (stream.h / stream.cc), a buffered binary reader that opens the file with std::ifstream and reads ahead in 4096-byte chunks. On POSIX platforms a second file descriptor is opened for parallel block reads via pread.

The onnx package reads the whole file into a Python bytes object first and then passes it to protobuf, which copies it again internally.
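The two access patterns described above can be sketched in Python with the standard library. This is an illustration of chunked sequential reads and positioned block reads, not a port of FileStream; read_in_chunks and read_block are hypothetical names, and the fallback for platforms without os.pread is an assumption of the sketch.

```python
import os
import tempfile
from typing import Iterator

CHUNK_SIZE = 4096  # read-ahead size mentioned above


def read_in_chunks(path: str, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield the file content in fixed-size chunks instead of one big read."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk


def read_block(path: str, offset: int, size: int) -> bytes:
    """Read size bytes at offset through a separate descriptor.

    os.pread does not move a shared file position, which is what makes it
    suitable for concurrent block reads. It only exists on POSIX, so the
    sketch falls back to seek + read elsewhere.
    """
    if hasattr(os, "pread"):
        fd = os.open(path, os.O_RDONLY)
        try:
            return os.pread(fd, size, offset)
        finally:
            os.close(fd)
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)


# Demonstration on a temporary 10 000-byte file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(250)) * 40)
    demo_path = tmp.name

chunk_sizes = [len(c) for c in read_in_chunks(demo_path)]
block = read_block(demo_path, offset=4096, size=16)
os.remove(demo_path)
print(chunk_sizes, block.hex())
```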


Parallel tensor loading#

Large ONNX models contain hundreds or thousands of initializers (tensor weights). Parsing these sequentially is the dominant cost when loading a model.

onnx_light exposes a num_threads option that distributes the initializer parsing across a thread pool:

import onnx_light.onnx as onnxl

model = onnxl.load("model.onnx", num_threads=4)

On the C++ side the thread pool is implemented in thread_pool.h / thread_pool.cc. Each worker independently parses a slice of the initializer list, so wall-clock loading time decreases as threads are added, up to the number of hardware threads available.

The standard onnx package is single-threaded; it offers no built-in parallel loading mechanism.
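The partitioning scheme described above (each worker parses a contiguous slice of the initializer list) can be sketched as follows. The code is a stand-alone illustration and does not call onnx_light; parse_tensor is a stand-in that decodes float32 payloads, and all function names are hypothetical. Pure-Python threads are limited by the interpreter lock for CPU-bound work, so the sketch shows the structure rather than the speed-up a C++ thread pool can deliver.

```python
import struct
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def parse_tensor(raw: bytes) -> Tuple[float, ...]:
    """Stand-in for initializer parsing: decode little-endian float32 values."""
    count = len(raw) // 4
    return struct.unpack(f"<{count}f", raw)


def parse_slice(blobs: Sequence[bytes]) -> List[Tuple[float, ...]]:
    """Work done by one worker: parse a contiguous slice of the list."""
    return [parse_tensor(b) for b in blobs]


def parse_parallel(blobs: Sequence[bytes], num_threads: int) -> List[Tuple[float, ...]]:
    """Split blobs into contiguous slices and parse the slices concurrently."""
    if not blobs:
        return []
    num_threads = max(1, min(num_threads, len(blobs)))
    step = -(-len(blobs) // num_threads)  # ceiling division
    slices = [blobs[i:i + step] for i in range(0, len(blobs), step)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        parsed_slices = list(pool.map(parse_slice, slices))  # order is preserved
    return [tensor for part in parsed_slices for tensor in part]


# Eight fake initializers, each holding three float32 values.
initializers = [struct.pack("<3f", i, i + 0.5, i + 1.0) for i in range(8)]
parallel = parse_parallel(initializers, num_threads=4)
sequential = parse_slice(initializers)
print(parallel == sequential)
```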


Zero-copy parsing#

When the full model bytes are already in memory (e.g. downloaded into a bytes object), onnx_light can skip the malloc + memcpy that would normally be used to copy each tensor’s raw data into an owned buffer:

import onnx_light.onnx as onnxl

# Read the whole file into memory; 'serialized' must stay alive (see the warning below).
with open("model.onnx", "rb") as f:
    serialized = f.read()

model = onnxl.load(serialized, no_copy=True)
# tensor.raw_data now points directly into 'serialized'

Internally, each TensorProto stores a non-owning ByteSpan (from simple_span.h) that borrows the bytes from the source buffer. The span's is_borrowed() predicate reports whether the raw data is owned or borrowed.

Warning

The source bytes object must remain alive for as long as the model is in use. Freeing it while raw_data fields still point into it causes undefined behavior. This constraint does not exist in the standard onnx package.
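The difference between an owned copy and a borrowed view can be reproduced with Python's built-in memoryview, which plays a role loosely comparable to a non-owning span. The sketch below does not use onnx_light. One important difference from the C++ case: a memoryview keeps a reference to its source object, so Python itself prevents the source from being freed, whereas a raw span offers no such protection.

```python
serialized = bytes(range(16))   # stands in for the full model bytes

# Slicing bytes allocates a new object and copies the data.
copied = serialized[4:8]

# Slicing a memoryview creates a view over the same memory: no copy.
borrowed = memoryview(serialized)[4:8]

print(borrowed.obj is serialized)   # the view still refers to the source object
print(bytes(borrowed) == copied)    # same content, different ownership

# With a mutable source, a later change is visible through the view
# but not through a copy taken earlier.
source = bytearray(b"abcdefgh")
view = memoryview(source)[2:4]
snapshot = bytes(source[2:4])
source[2] = ord("X")
print(bytes(view), snapshot)
```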


C++ class generation via macros#

The onnx package generates Python message classes from .proto schema files using protoc. onnx_light takes a different approach: message classes are generated at compile time from a small set of C++ macros defined in stream_class.h:

  • BEGIN_PROTO(cls, doc) / END_PROTO() — open/close a message class.

  • FIELD(type, name, order, doc) — declare a scalar field with typed accessors ref_<name>(), has_<name>(), set_<name>().

  • FIELD_STR(name, order, doc) — shorthand for utils::String fields that also accepts std::string.

  • FIELD_REPEATED(type, name, order, doc) — declare a repeated (list) field.

  • SERIALIZATION_METHOD() — inject ParseFromString, SerializeToString, ParseFromStream, and SerializeToStream declarations.

The resulting classes in onnx.h closely mirror the protobuf-generated classes so that code originally written for onnx can be adapted with minimal changes.
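As a rough analogy only, the effect of a FIELD declaration (one declaration producing ref_<name>(), has_<name>() and set_<name>() accessors) can be mimicked in Python by attaching generated methods to a class. Nothing below reflects the actual contents of stream_class.h; the field helper, the TinyModel class and the FIELDS registry are hypothetical, and ref_<name>() returns a value here rather than a C++ reference.

```python
from typing import Any, Dict


def field(cls: type, name: str, order: int, default: Any, doc: str = "") -> None:
    """Attach ref_<name>, has_<name> and set_<name> accessors to cls."""
    slot = f"_{name}"

    def ref(self):
        # Return the stored value, or the default when the field is unset.
        return getattr(self, slot, default)

    def has(self):
        # A field counts as present once it has been set explicitly.
        return hasattr(self, slot)

    def set_(self, value):
        setattr(self, slot, value)

    ref.__doc__ = doc
    setattr(cls, f"ref_{name}", ref)
    setattr(cls, f"has_{name}", has)
    setattr(cls, f"set_{name}", set_)
    cls.FIELDS[order] = name  # field number -> field name


class TinyModel:
    """Hypothetical message with two scalar fields."""
    FIELDS: Dict[int, str] = {}


field(TinyModel, "ir_version", 1, default=0, doc="IR version of the model.")
field(TinyModel, "producer_name", 2, default="", doc="Tool that produced the model.")

m = TinyModel()
before = m.has_ir_version()
m.set_ir_version(8)
print(before, m.has_ir_version(), m.ref_ir_version())
```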


External-data / multi-file models#

Large ONNX models can be split across two files: a small .onnx file that holds the graph structure and a separate binary blob (the external data file) that holds the raw tensor weights. This layout allows the structural metadata to be inspected quickly without loading the weights and makes it possible to memory-map only the weight region.

Saving with external data#

Pass a location argument to onnxl.save to route tensor weights to a separate file:

import onnx_light.onnx as onnxl

# model.onnx – graph structure only
# model.onnx.data – all tensor weights
onnxl.save(model, "model.onnx", location="model.onnx.data")

Serializing to two files does not mutate the in-memory ModelProto. onnx_light applies external-data metadata on a temporary copy while writing and keeps the original model unchanged.

The location value stored inside the .onnx metadata is automatically reduced to a relative path (just the file name) when an absolute path is provided, so the two files can be moved together without breaking the reference.
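The path reduction described above can be sketched with pathlib. The helper stored_location is hypothetical and only mirrors the documented behavior for absolute paths; leaving relative values untouched is an assumption of the sketch.

```python
from pathlib import PurePosixPath, PureWindowsPath


def stored_location(location: str) -> str:
    """Return the value to record in the model metadata for location.

    Absolute paths (POSIX or Windows style) are reduced to their file name.
    Other values are returned unchanged, which is an assumption.
    """
    for flavour in (PurePosixPath, PureWindowsPath):
        path = flavour(location)
        if path.is_absolute():
            return path.name
    return location


print(stored_location("/data/run1/model.onnx.data"))
print(stored_location("model.onnx.data"))
```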

Loading with external data#

When the .onnx file already references an external data file through its tensor metadata, onnxl.load can discover and load the weights automatically:

import onnx_light.onnx as onnxl

model = onnxl.load("model.onnx", load_external_data=True)

To override the data-file location (for example when the file has been moved), pass location explicitly:

model = onnxl.load("model.onnx", location="/data/weights.bin",
                    load_external_data=True)

When no_copy=True is combined with external data, onnx_light reads each external weights file once into a shared model-owned buffer and every tensor points into that shared storage. This avoids one allocation and copy per tensor while still handling split external-data files transparently.

Splitting external data across multiple files#

For very large models it can be useful to cap the size of each external weight file. Set max_external_file_size (in bytes) and onnxl.save will automatically open a new file once the limit is reached, appending .1, .2, … suffixes to the base name:

import onnx_light.onnx as onnxl

# Produces: model.onnx, model.onnx.data, model.onnx.data.1, …
onnxl.save(
    model,
    "model.onnx",
    location="model.onnx.data",
    max_external_file_size=2 * 1024 ** 3,  # 2 GB per file
)

When loading, only the primary location (model.onnx.data) needs to be specified; the loader automatically opens model.onnx.data.1, model.onnx.data.2, … as required.

All I/O is performed in C++ via TwoFilesWriteStream / TwoFilesStream, so no Python overhead is incurred per tensor.
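The naming scheme and the size cap can be sketched as a small planning function that assigns each tensor a file and an offset. It is not the logic of TwoFilesWriteStream; plan_external_files and external_file_name are hypothetical names, and the sketch assumes that a tensor is never split across files, so a tensor larger than the cap gets a file of its own.

```python
from typing import List, Tuple


def external_file_name(base: str, index: int) -> str:
    """First file keeps the base name; later files get .1, .2, ... suffixes."""
    return base if index == 0 else f"{base}.{index}"


def plan_external_files(
    tensor_sizes: List[int], base: str, max_file_size: int
) -> List[Tuple[str, int]]:
    """Return (file_name, offset) for each tensor, in order."""
    placements = []
    index = 0
    offset = 0
    for size in tensor_sizes:
        if offset > 0 and offset + size > max_file_size:
            index += 1   # current file is full: start the next one
            offset = 0
        placements.append((external_file_name(base, index), offset))
        offset += size
    return placements


# Tensor sizes in bytes, with a deliberately small cap of 1000 bytes per file.
plan = plan_external_files([600, 300, 500, 2000, 100], "model.onnx.data", 1000)
for file_name, offset in plan:
    print(file_name, offset)
```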


Encrypted model save / load#

onnx_light optionally supports saving and loading models in an AES-256-CBC encrypted binary format (extension .onnxc). The standard onnx package offers no equivalent functionality.

The feature is available only when onnx_light is built with OpenSSL (-DONNX_LIGHT_HAS_OPENSSL); when OpenSSL is absent the helpers raise NotImplementedError with a clear message.

File format#

The encrypted file is a compact, self-contained binary:

Offset  Size  Field
------  ----  -----
     0     8  Magic: "ONNXCRY1"
     8    16  Random PBKDF2 salt
    24    16  Random AES-CBC initialisation vector
    40     N  AES-256-CBC ciphertext (PKCS#7-padded protobuf payload)

Key derivation uses PBKDF2-HMAC-SHA256 with 100 000 iterations, which makes brute-force attacks on the passphrase computationally expensive.
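The header layout and the key derivation step can be reproduced with the Python standard library alone. The sketch below parses the 40-byte header and derives a 32-byte key; it deliberately stops before AES decryption, which the standard library does not provide. parse_header and derive_key are hypothetical helpers, and encoding the passphrase as UTF-8 is an assumption, since the text above does not state the encoding.

```python
import hashlib
import os
import struct
from typing import NamedTuple

MAGIC = b"ONNXCRY1"
HEADER = struct.Struct("<8s16s16s")  # magic, salt, IV: 40 bytes in total
PBKDF2_ITERATIONS = 100_000


class EncryptedHeader(NamedTuple):
    salt: bytes
    iv: bytes
    ciphertext: bytes


def parse_header(blob: bytes) -> EncryptedHeader:
    """Split an ONNXCRY1 blob into salt, IV and ciphertext."""
    if len(blob) < HEADER.size:
        raise ValueError("blob is shorter than the 40-byte header")
    magic, salt, iv = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not an ONNXCRY1 blob")
    return EncryptedHeader(salt, iv, blob[HEADER.size:])


def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Derive a 32-byte AES-256 key with PBKDF2-HMAC-SHA256.

    UTF-8 encoding of the passphrase is an assumption of this sketch.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, PBKDF2_ITERATIONS, dklen=32
    )


# Build a fake blob: real header layout, placeholder ciphertext.
fake_blob = MAGIC + os.urandom(16) + os.urandom(16) + b"\x00" * 32
header = parse_header(fake_blob)
key = derive_key("my_passphrase", header.salt)
print(len(header.salt), len(header.iv), len(header.ciphertext), len(key))
```

Because the salt is random and stored in the file, the same passphrase yields a different key for every saved file, while decryption can always recompute the key from the header.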

Python API (file-based)#

import onnx_light.onnx as onnxl

# Save an encrypted model to a file
onnxl.save_encrypted(model, "model.onnxc", key="my_passphrase")

# Load and decrypt from a file
model = onnxl.load_encrypted("model.onnxc", key="my_passphrase")

# A wrong key raises RuntimeError
try:
    onnxl.load_encrypted("model.onnxc", key="wrong")
except RuntimeError as exc:
    print(f"decryption failed: {exc}")

Python API (in-memory / bytes)#

When no file I/O is desired, the model can be encrypted to a bytes object and decrypted back directly:

import onnx_light.onnx as onnxl

# Encrypt to bytes (no file written)
blob: bytes = onnxl.save_encrypted_string(model, key="my_passphrase")

# Decrypt from bytes
model = onnxl.load_encrypted_string(blob, key="my_passphrase")

The bytes object produced by save_encrypted_string() is in the same ONNXCRY1 format as the file produced by save_encrypted(), so the two forms are interchangeable.

C++ API#

#include "onnx_crypt.h"

// File-based
ONNX_LIGHT_NAMESPACE::SaveEncryptedModel(model, "model.onnxc", "passphrase");
ONNX_LIGHT_NAMESPACE::LoadEncryptedModel(model, "model.onnxc", "passphrase");

// In-memory
std::string blob = ONNX_LIGHT_NAMESPACE::SaveEncryptedModelToString(model, "passphrase");
ONNX_LIGHT_NAMESPACE::LoadEncryptedModelFromString(model, blob, "passphrase");

See onnx_crypt.h for the full C++ API reference.


API compatibility#

onnx_light aims to cover the most common operations of the onnx Python API, and adds a few features that onnx does not offer:

Operation | onnx | onnx_light
--------- | ---- | ----------
Load from file | onnx.load(path) | onnxl.load(path)
Load from bytes | onnx.load_from_string(b) | onnxl.load(b)
Save to file | onnx.save(model, path) | onnxl.save(model, path)
Save with external data | onnx.save_model(model, path, save_as_external_data=True, location=loc) | onnxl.save(model, path, location=loc)
Save external data with aligned tensor offsets | not supported | opts = onnxl.SerializeOptions(); opts.alignment = 4096; model.SerializeToFile(path, opts, loc)
Load with external data | onnx.load(path, load_external_data=True) | onnxl.load(path, load_external_data=True)
Load external data with shared no-copy buffers | not supported | onnxl.load(path, load_external_data=True, no_copy=True)
Split external data | not supported | onnxl.save(model, path, location=loc, max_external_file_size=N)
Save encrypted to file | not supported | onnxl.save_encrypted(model, path, key=k)
Load encrypted from file | not supported | onnxl.load_encrypted(path, key=k)
Save encrypted to bytes | not supported | onnxl.save_encrypted_string(model, key=k)
Load encrypted from bytes | not supported | onnxl.load_encrypted_string(blob, key=k)
Parse a message | msg.ParseFromString(b) | msg.ParseFromString(b)
Serialize a message | msg.SerializeToString() | msg.SerializeToString()
Parallel load | not supported | onnxl.load(path, num_threads=N)
Zero-copy parse | not supported | onnxl.load(b, no_copy=True)
File size limit | 2 GB (protobuf) | unlimited

Not every helper utility available in onnx has an onnx_light counterpart; onnx_light focuses on fast, dependency-free loading and saving.


Summary#

Aspect | onnx | onnx_light
------ | ---- | ----------
Serialization runtime | Google protobuf | Custom C++ (no protobuf)
Max model size | 2 GB | Unlimited
File I/O | Read-into-bytes | Buffered stream (FileStream, 4096-byte read-ahead)
Tensor loading | Single-threaded | Optional parallel (thread pool)
Raw-data copying | Always copied | Zero-copy option (no_copy=True)
External data (2-file) | Yes (save_model / load) | Yes (save / load)
External data no-copy shared buffers | No | Yes (load(..., no_copy=True))
Split external data (N files) | No | Yes (max_external_file_size)
Tensor offset alignment in external files | No | Yes (SerializeOptions.alignment)
Standalone C++ library | Yes | Yes (onnx_light::lib_onnx_proto for proto-only code, onnx_light::onnx_light when operator-aware APIs are needed)
Wire format | ONNX binary protobuf | ONNX binary protobuf (identical)
Encrypted save / load | No | Yes (AES-256-CBC, requires OpenSSL)