onnx-light#

onnx-light started from the upstream ONNX pull request onnx/onnx#7208, the code base from which this project diverged.

onnx without protobuf and with more freedom#

Files larger than 2 GB – protobuf enforces a 2 GB message-size limit. onnx-light does not have this constraint.
External-data / multi-file models – external files are supported natively in C++.
Parallel loading and saving – onnx_light.onnx.load() and onnx_light.onnx.save() are parallelized. In practice, loading or saving large models is significantly faster (see the threads benchmark example).
Zero-copy parsing – when parsing from an in-memory bytes buffer, the no_copy=True option makes each tensor’s raw_data point directly into the source bytes instead of allocating an extra copy. This eliminates one malloc + memcpy per tensor initializer.
Encrypted save / load – Models can be encrypted with AES-256-CBC (ONNXCRY1) or ChaCha20-Poly1305 (ONNXCRY2), both using PBKDF2-HMAC-SHA256 key derivation, and saved to a single self-contained .onnxc file, or serialized to an in-memory bytes object.
No serialize/parse round-trip for C++ tools – the Python ModelProto exposed by onnx_light.onnx is the C++ ModelProto (bound through nanobind). No serialization is needed to pass a model from Python to C++.
Two file formats – both the protobuf (onnx) and flatbuffers (onnxruntime) formats are supported.
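
The zero-copy idea from the list above can be illustrated with plain Python. The sketch below does not use onnx-light at all: the buffer layout and variable names are made up for the example, and a memoryview stands in for a tensor's raw_data pointing into the source bytes. It only shows the difference between copying a payload out of a buffer and viewing it in place.

```python
# Plain-Python illustration of copy vs. zero-copy access; this is not the onnx_light API.
# The buffer layout (6-byte header, 16-byte payload) is made up for the example.
source = bytearray(b"HEADER" + bytes(16) + b"TRAILER")
offset, length = 6, 16

# Copying: slicing and converting allocates a new object and duplicates the payload.
copied = bytes(source[offset:offset + length])

# Zero-copy: a memoryview slice only records where the payload starts and ends.
view = memoryview(source)[offset:offset + length]

# Changing the source shows which of the two shares memory with it.
source[offset] = 0xFF
print(copied[0])           # 0: the copy is independent of the source
print(view[0])             # 255: the view reads the source buffer directly
print(view.obj is source)  # True: no second buffer was allocated for the view
```

In this sketch the view keeps the source buffer alive through a reference. How onnx-light manages the lifetime of the source bytes when no_copy=True is used is not covered here.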

Modular C++ libraries#

The C++ code is shipped as several small libraries so that downstream projects can link only what they need:

onnx_light::lib_onnx_proto – protobuf-compatible message types, parser / serializer, external data, optional encrypted save / load (AES-256-CBC or ChaCha20-Poly1305).
onnx_light::lib_onnx_op – lightweight LightOpSchema registrations for ONNX operator domains, with no shape inference.
onnx_light::onnx_manipulations – graph-manipulation helpers (text parser / printer, attribute and tensor proto helpers, data-type name utilities, graph-input collection); depends only on lib_onnx_proto.
onnx_light::onnx_light – full ONNX-compatible schemas (with history), checker, inliner, shape inference and version converter.
onnx_light::lib_onnx_optim – shape-inference dispatch table, expression engine and graph optimization helpers.
onnx_light::onnx_kernels – C++ kernels: a C++ reference implementation used to generate the expected outputs for the backend tests.
onnx_light::onnx_backend_test – C++ backend test infrastructure and reference operator kernels.

In addition, onnx_light::onnx_lib replicates the current C++ API of the onnx package. See "How the C++ libraries are split" for the detailed breakdown of each assembly and "Linking onnx-light in C++" for the matching CMake usage.

Kernels#

The kernels are a C++ reference implementation used to generate the expected outputs for the backend tests. Parallelization is allowed except where it would change the order of floating-point accumulation: operators that accumulate internally (reductions, MatMul, Gemm, Attention, …) stay sequential on the accumulated axis to enforce reproducibility. See "C++ Kernels" for details.
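
The reason accumulation order matters is that floating-point addition is not associative. The following sketch is plain Python, not kernel code, and its function names are hypothetical. It compares a strictly sequential sum with a chunked sum, the kind of split a parallel reduction would introduce along the accumulated axis.

```python
def sequential_sum(values):
    """Add the values left to right, one at a time."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, chunk_size):
    """Sum fixed-size chunks independently, then add the partial sums.

    This mimics a reduction split across workers on the accumulated axis.
    """
    partials = [
        sequential_sum(values[i:i + chunk_size])
        for i in range(0, len(values), chunk_size)
    ]
    return sequential_sum(partials)


# Doubles near 1e16 are spaced 2 apart, so adding 1.0 to 1e16 is rounded away.
values = [1e16, 1.0, 1.0, 1.0]

print(sequential_sum(values))   # 1e+16
print(chunked_sum(values, 2))   # 1.0000000000000002e+16
```

Both results are legitimate roundings of the same mathematical sum, but they differ in the last bits. Keeping the accumulated axis sequential means the expected outputs do not depend on how the work is split.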

Backend tests#

The backend tests are written entirely in C++ and can be called from any language. Every expected output is generated by a C++ implementation of the operator. The kernels can be used without the backend tests, but the backend tests rely on the kernels to produce the expected outputs.

Contents