Differences between onnx and onnx_light#
This section explains the internal design of onnx_light and how it differs from
the reference onnx package.
Both packages share the same on-disk format (binary protobuf-encoded .onnx
files) and expose very similar Python APIs, so onnx_light can act as a
near-drop-in replacement for common model loading and inspection tasks.
The key differences lie in how the serialization layer is implemented.
No protobuf dependency#
onnx wraps the official Protocol Buffers (protobuf) runtime. Every message class is
auto-generated by the protoc compiler from .proto schema files, and
the resulting Python objects delegate all parsing and serialization to the
libprotobuf C++ library.
onnx_light ships its own hand-written parser and serializer implemented
entirely in C++ (see onnx_light/onnx_proto/). There is no
dependency on protobuf at compile time or at runtime.
Design implications:
- The C++ shared object (_onnxpy.so) built by onnx_light is smaller because it does not statically link any portion of libprotobuf.
- All parsing and serialization code lives in a single self-contained library that can be consumed by other C++ projects without installing protobuf (see C++ onnx-light examples).
- C++ projects that only need protobuf-compatible message types can link onnx_light::lib_onnx_proto directly; onnx_light::onnx_light is only needed when features tied to operator notions are required (checker, schema lookup, shape inference, version conversion, …).
- The wire format produced by onnx_light is 100 % compatible with the official ONNX binary format, so models can be freely exchanged between the two libraries.
Files larger than 2 GB#
The protobuf C++ runtime enforces a hard 2 GB message-size limit. With the
standard onnx package, loading a model larger than that threshold raises a
DecodeError, and saving one fails as well.
onnx_light imposes no such limit. Internally it tracks byte counts with
64-bit unsigned integers throughout the parsing and serialization path, so
model size is limited in practice only by available memory and disk space.
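To see why 64-bit counters matter, recall that the protobuf wire format prefixes every length-delimited field with a varint-encoded byte count. The following is a minimal pure-Python sketch, not onnx_light's actual parser; `encode_varint` and `decode_varint` are hypothetical helper names. It decodes a length prefix that does not fit into a signed 32-bit integer.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, next_position) for the varint starting at buf[pos]."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
        if shift >= 64:
            raise ValueError("varint longer than 64 bits")


INT32_MAX = 2**31 - 1
three_gib = 3 * 1024**3          # a 3 GiB payload length

prefix = encode_varint(three_gib)
length, consumed = decode_varint(prefix)
print(length > INT32_MAX)        # the length overflows a signed 32-bit counter
```

A parser that stores `length` in a 32-bit signed field cannot represent this value, which is the kind of limit a 64-bit counter avoids.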
Buffered file I/O#
For file-based loading, onnx_light uses FileStream
(stream.h / stream.cc), a buffered binary reader that opens the
file with std::ifstream and reads ahead in 4096-byte chunks. On
POSIX platforms a second file descriptor is opened for parallel block reads
via pread.
The onnx package reads the whole file into a Python bytes object first
and then passes it to protobuf, which copies it again internally.
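The read-ahead idea can be illustrated with a small pure-Python sketch. It is not the FileStream implementation: the class name `ChunkedReader` and its `refills` counter are hypothetical, and the sketch leaves out the pread-based parallel path. It only shows how small reads are served from a buffer that is refilled in 4096-byte chunks.

```python
import io


class ChunkedReader:
    """Serve small reads from a buffer refilled in fixed-size chunks."""

    def __init__(self, raw, chunk_size: int = 4096):
        self._raw = raw              # any binary file-like object
        self._chunk_size = chunk_size
        self._buffer = b""
        self._pos = 0                # read position inside self._buffer
        self.refills = 0             # number of non-empty reads from the file

    def read(self, n: int) -> bytes:
        """Return up to n bytes, refilling the buffer when it runs dry."""
        parts = []
        while n > 0:
            if self._pos == len(self._buffer):
                self._buffer = self._raw.read(self._chunk_size)
                self._pos = 0
                if not self._buffer:
                    break            # end of file
                self.refills += 1
            take = min(n, len(self._buffer) - self._pos)
            parts.append(self._buffer[self._pos:self._pos + take])
            self._pos += take
            n -= take
        return b"".join(parts)


source = io.BytesIO(bytes(range(256)) * 40)   # 10240 bytes of sample data
reader = ChunkedReader(source)
header = reader.read(10)       # served from the first 4096-byte chunk
body = reader.read(5000)       # spans the first and second chunks
```

Many small field reads therefore translate into only a few reads against the underlying file.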
Parallel tensor loading#
Large ONNX models contain hundreds or thousands of initializers (tensor weights). Parsing these sequentially is often the dominant cost when loading a model.
onnx_light exposes a num_threads option that distributes the initializer
parsing across a thread pool:
import onnx_light.onnx as onnxl
model = onnxl.load("model.onnx", num_threads=4)
On the C++ side the thread pool is implemented in thread_pool.h /
thread_pool.cc. Each worker independently parses a slice of the
initializer list, so wall-clock loading time can drop as more hardware
threads are used.
The standard onnx package is single-threaded; it offers no built-in
parallel loading mechanism.
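The slicing strategy can be sketched with the standard library. This is an illustration of the partitioning only, under stated assumptions: `parse_tensor`, `parse_slice`, and `parallel_parse` are hypothetical names, the "parsing" is a stand-in, and pure-Python threads in a standard CPython build will not show the speed-up that native C++ workers can achieve.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_tensor(blob: bytes) -> int:
    """Stand-in for real tensor parsing: just return the payload size."""
    return len(blob)


def parse_slice(blobs: list[bytes]) -> list[int]:
    """Parse one contiguous slice of the initializer list."""
    return [parse_tensor(b) for b in blobs]


def parallel_parse(blobs: list[bytes], num_threads: int) -> list[int]:
    """Split blobs into at most num_threads slices and parse them concurrently."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    step = max(1, -(-len(blobs) // num_threads))  # ceiling division
    slices = [blobs[i:i + step] for i in range(0, len(blobs), step)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() preserves slice order, so results line up with the input.
        parsed = list(pool.map(parse_slice, slices))
    return [size for chunk in parsed for size in chunk]


initializers = [bytes(n) for n in range(10)]   # ten fake tensors
sizes = parallel_parse(initializers, num_threads=4)
print(sizes)
```

Because each slice is independent, no locking is needed while parsing; results are merged only once all workers have finished.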
Zero-copy parsing#
When the full model bytes are already in memory (e.g. downloaded into a
bytes object), onnx_light can skip the malloc + memcpy that would
normally be used to copy each tensor’s raw data into an owned buffer:
import onnx_light.onnx as onnxl
# Read the file and close the handle; 'serialized' must stay alive!
with open("model.onnx", "rb") as f:
    serialized = f.read()
model = onnxl.load(serialized, no_copy=True)
# tensor.raw_data now points directly into 'serialized'
Internally, each TensorProto stores a non-owning ByteSpan (from
simple_span.h) that borrows the bytes from the source buffer. The
span’s is_borrowed() predicate reports whether the raw data is owned or
borrowed.
Warning
The source bytes object must remain alive for as long as the model
is in use. Freeing it while raw_data fields still point into it
causes undefined behavior. This constraint does not exist in the standard
onnx package.
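The difference between a borrowed view and an owned copy can be shown with Python's built-in memoryview. This is only an analogy for ByteSpan, not its implementation. Unlike a non-owning C++ span, a memoryview keeps its source object alive, so the sketch demonstrates the aliasing but cannot reproduce the dangling-pointer hazard.

```python
# Stands in for the serialized model: a 6-byte header plus 16 payload bytes.
serialized = bytearray(b"HEADER" + bytes(range(16)))

# Borrowed view: no bytes are copied, the view references 'serialized'.
borrowed = memoryview(serialized)[6:22]

# Owned copy: independent of the source buffer.
owned = bytes(borrowed)

serialized[6] = 0xFF      # mutate the source buffer in place

print(borrowed[0])        # the view observes the change
print(owned[0])           # the copy does not
```

The borrowed view costs no allocation, but its contents are only as stable as the buffer it points into.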
C++ class generation via macros#
The onnx package generates Python message classes from .proto schema
files using protoc. onnx_light takes a different approach: message
classes are generated at compile time from a small set of C++ macros
defined in stream_class.h:
- BEGIN_PROTO(cls, doc) / END_PROTO(): open/close a message class.
- FIELD(type, name, order, doc): declare a scalar field with typed accessors ref_<name>(), has_<name>(), set_<name>().
- FIELD_STR(name, order, doc): shorthand for utils::String fields that also accepts std::string.
- FIELD_REPEATED(type, name, order, doc): declare a repeated (list) field.
- SERIALIZATION_METHOD(): inject ParseFromString, SerializeToString, ParseFromStream, and SerializeToStream declarations.
The resulting classes in onnx.h closely mirror the protobuf-generated
classes so that code originally written for onnx can be adapted with
minimal changes.
External-data / multi-file models#
Large ONNX models can be split across two files: a small .onnx file that
holds the graph structure and a separate binary blob (the external data file)
that holds the raw tensor weights. This layout allows the structural metadata
to be inspected quickly without loading the weights and makes it possible to
memory-map only the weight region.
Saving with external data#
Pass a location argument to onnxl.save to route tensor weights to a
separate file:
import onnx_light.onnx as onnxl
# model.onnx – graph structure only
# model.onnx.data – all tensor weights
onnxl.save(model, "model.onnx", location="model.onnx.data")
Serializing to two files does not mutate the in-memory ModelProto.
onnx_light applies external-data metadata on a temporary copy while writing
and keeps the original model unchanged.
The location value stored inside the .onnx metadata is automatically
reduced to a relative path (just the file name) when an absolute path is
provided, so the two files can be moved together without breaking the
reference.
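The path reduction described above can be approximated as follows. This is a sketch under stated assumptions rather than the library's code: `stored_location` is a hypothetical helper, and it treats the input as a POSIX-style path.

```python
from pathlib import PurePosixPath


def stored_location(location: str) -> str:
    """Return the value that would be recorded in the tensor metadata."""
    path = PurePosixPath(location)
    # Absolute paths are reduced to the bare file name so that the .onnx
    # file and its data file can be moved together.
    return path.name if path.is_absolute() else location


print(stored_location("/data/export/model.onnx.data"))
print(stored_location("model.onnx.data"))
```

Both calls yield the same file name, which is why the pair of files stays valid after being copied to another directory.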
Loading with external data#
When the .onnx file already references an external data file through its
tensor metadata, onnxl.load can discover and load the weights
automatically:
import onnx_light.onnx as onnxl
model = onnxl.load("model.onnx", load_external_data=True)
To override the data-file location (for example when the file has been moved),
pass location explicitly:
model = onnxl.load("model.onnx", location="/data/weights.bin",
                   load_external_data=True)
When no_copy=True is combined with external data, onnx_light reads
each external weights file once into a shared model-owned buffer and every
tensor points into that shared storage. This avoids one allocation and copy
per tensor while still handling split external-data files transparently.
Splitting external data across multiple files#
For very large models it can be useful to cap the size of each external weight
file. Set max_external_file_size (in bytes) and onnxl.save will
automatically open a new file once the limit is reached, appending .1,
.2, … suffixes to the base name:
import onnx_light.onnx as onnxl
# Produces: model.onnx, model.onnx.data, model.onnx.data.1, …
onnxl.save(
    model,
    "model.onnx",
    location="model.onnx.data",
    max_external_file_size=2 * 1024 ** 3,  # 2 GB per file
)
When loading, only the primary location (model.onnx.data) needs to be
specified; the loader automatically opens model.onnx.data.1,
model.onnx.data.2, … as required.
All I/O is performed in C++ via TwoFilesWriteStream /
TwoFilesStream, so no Python overhead is incurred per tensor.
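The suffix naming scheme for split files can be sketched in a few lines. The roll-over policy here is an assumption made for illustration (start a new file when the next tensor would exceed the cap); the actual behaviour of TwoFilesWriteStream may differ. `external_file_name` and `plan_split` are hypothetical names.

```python
def external_file_name(base: str, index: int) -> str:
    """First file keeps the base name, later files get .1, .2, ... suffixes."""
    return base if index == 0 else f"{base}.{index}"


def plan_split(sizes: list[int], base: str, max_file_size: int) -> list[tuple[str, int]]:
    """Assign each tensor (given by byte size) a (file_name, offset) pair."""
    plan = []
    index, used = 0, 0
    for size in sizes:
        # Assumption: roll over when the next tensor would exceed the cap;
        # a tensor larger than the cap gets a file of its own.
        if used > 0 and used + size > max_file_size:
            index += 1
            used = 0
        plan.append((external_file_name(base, index), used))
        used += size
    return plan


plan = plan_split([400, 500, 300, 900], "model.onnx.data", max_file_size=1000)
for file_name, offset in plan:
    print(file_name, offset)
```

A loader only needs the base name: it can derive every follow-up file name from the index, which matches the behaviour described for the primary location.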
Encrypted model save / load#
onnx_light optionally supports saving and loading models in an
AES-256-CBC encrypted binary format (extension .onnxc). The
standard onnx package offers no equivalent functionality.
The feature is available only when onnx_light is built with OpenSSL
(-DONNX_LIGHT_HAS_OPENSSL); when OpenSSL is absent the helpers raise
NotImplementedError with a clear message.
File format#
The encrypted file is a compact, self-contained binary:
Offset  Size  Field
------  ----  -----
0       8     Magic: "ONNXCRY1"
8       16    Random PBKDF2 salt
24      16    Random AES-CBC initialisation vector
40      N     AES-256-CBC ciphertext (PKCS#7-padded protobuf payload)
Key derivation uses PBKDF2-HMAC-SHA256 with 100 000 iterations, which makes brute-force attacks on the passphrase computationally expensive.
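The layout and key derivation above can be reproduced with the Python standard library. The sketch below only splits the header and derives the key; it does not decrypt, since AES is not part of the standard library. `split_encrypted_blob` and `derive_key` are hypothetical helpers, and encoding the passphrase as UTF-8 is an assumption.

```python
import hashlib
import os

MAGIC = b"ONNXCRY1"
SALT_SIZE = 16
IV_SIZE = 16
HEADER_SIZE = len(MAGIC) + SALT_SIZE + IV_SIZE   # 40 bytes
PBKDF2_ITERATIONS = 100_000


def split_encrypted_blob(blob: bytes) -> tuple[bytes, bytes, bytes]:
    """Return (salt, iv, ciphertext) after validating the magic and size."""
    if len(blob) < HEADER_SIZE or blob[:8] != MAGIC:
        raise ValueError("not an ONNXCRY1 blob")
    salt = blob[8:24]
    iv = blob[24:40]
    ciphertext = blob[40:]
    # PKCS#7-padded CBC output is always a non-empty multiple of 16 bytes.
    if len(ciphertext) == 0 or len(ciphertext) % 16 != 0:
        raise ValueError("ciphertext is not a whole number of AES blocks")
    return salt, iv, ciphertext


def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Derive a 32-byte AES-256 key with PBKDF2-HMAC-SHA256."""
    # Assumption: the passphrase is encoded as UTF-8 before derivation.
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, PBKDF2_ITERATIONS, dklen=32
    )


# Synthetic blob: random salt and IV plus two dummy 16-byte "ciphertext" blocks.
blob = MAGIC + os.urandom(SALT_SIZE) + os.urandom(IV_SIZE) + bytes(32)
salt, iv, ciphertext = split_encrypted_blob(blob)
key = derive_key("my_passphrase", salt)
print(len(key))
```

Because the salt is random per file, the same passphrase yields a different key for every encrypted model.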
Python API (file-based)#
import onnx_light.onnx as onnxl
# Save an encrypted model to a file
onnxl.save_encrypted(model, "model.onnxc", key="my_passphrase")
# Load and decrypt from a file
model = onnxl.load_encrypted("model.onnxc", key="my_passphrase")
# A wrong key raises RuntimeError
try:
    onnxl.load_encrypted("model.onnxc", key="wrong")
except RuntimeError as exc:
    print(f"decryption failed: {exc}")
Python API (in-memory / bytes)#
When no file I/O is desired, the model can be encrypted to a bytes
object and decrypted back directly:
import onnx_light.onnx as onnxl
# Encrypt to bytes (no file written)
blob: bytes = onnxl.save_encrypted_string(model, key="my_passphrase")
# Decrypt from bytes
model = onnxl.load_encrypted_string(blob, key="my_passphrase")
The bytes object produced by save_encrypted_string() is in the
same ONNXCRY1 format as the file produced by save_encrypted(),
so the two forms are interchangeable.
C++ API#
#include "onnx_crypt.h"
// File-based
ONNX_LIGHT_NAMESPACE::SaveEncryptedModel(model, "model.onnxc", "passphrase");
ONNX_LIGHT_NAMESPACE::LoadEncryptedModel(model, "model.onnxc", "passphrase");
// In-memory
std::string blob = ONNX_LIGHT_NAMESPACE::SaveEncryptedModelToString(model, "passphrase");
ONNX_LIGHT_NAMESPACE::LoadEncryptedModelFromString(model, blob, "passphrase");
See onnx_crypt.h for the full C++ API reference.
API compatibility#
onnx_light aims to be a functional subset of the onnx Python API for
the most common operations:
| Operation | onnx | onnx_light |
|---|---|---|
| Load from file | | |
| Load from bytes | | |
| Save to file | | |
| Save with external data | | |
| Save external data with aligned tensor offsets | not supported | |
| Load with external data | | |
| Load external data with shared no-copy buffers | not supported | |
| Split external data | not supported | |
| Save encrypted to file | not supported | |
| Load encrypted from file | not supported | |
| Save encrypted to bytes | not supported | |
| Load encrypted from bytes | not supported | |
| Parse a message | | |
| Serialize a message | | |
| Parallel load | not supported | |
| Zero-copy parse | not supported | |
| File size limit | 2 GB (protobuf) | unlimited |
Some helper utilities present in onnx (shape inference, model checker,
etc.) are not yet implemented in onnx_light, which focuses on fast,
dependency-free loading and saving.
Summary#
| Aspect | onnx | onnx_light |
|---|---|---|
| Serialization runtime | Google protobuf | Custom C++ (no protobuf) |
| Max model size | 2 GB | Unlimited |
| File I/O | Read-into-bytes | Buffered stream reader (FileStream) |
| Tensor loading | Single-threaded | Optional parallel (thread pool) |
| Raw-data copying | Always copied | Zero-copy option |
| External data (2-file) | Yes | Yes |
| External data no-copy shared buffers | No | Yes |
| Split external data (N files) | No | Yes |
| Tensor offset alignment in external files | No | Yes |
| Standalone C++ library | Yes | Yes |
| Wire format | ONNX binary protobuf | ONNX binary protobuf (identical) |
| Encrypted save / load | No | Yes (AES-256-CBC, requires OpenSSL) |