.. _l-design-protobuf-format:
Protobuf format applied to ONNX
================================
ONNX models are serialized using `Protocol Buffers
`_ (protobuf), Google's compact binary encoding
format. An ``.onnx`` file is just the binary serialization of a
``ModelProto`` message defined in `onnx.proto
`_. This page
explains the relevant subset of the protobuf wire format and how it maps
to ONNX message types. It is meant to help readers understand the
low-level layout that :epkg:`onnx` produces and that ``onnx_light``
parses and writes without depending on ``libprotobuf``.
----
The wire format in a nutshell
-----------------------------
A protobuf-encoded message is a flat concatenation of *(tag, value)*
pairs. There is no framing, no length prefix at the top level, and no
field ordering requirement. A consumer reads bytes from start to end,
decodes each tag, and dispatches the following bytes according to the
*wire type* embedded in the tag.
Tag encoding
~~~~~~~~~~~~
Each field starts with a tag encoded as a *varint* (see below). The
tag combines the field number declared in the ``.proto`` file with a
3-bit wire type:
.. code-block:: text
tag = (field_number << 3) | wire_type
The wire types relevant to ONNX are:
.. list-table::
:header-rows: 1
:widths: 10 20 70
* - Value
- Name
- Meaning
* - 0
- ``VARINT``
- Variable-length integer (``int32``, ``int64``, ``uint32``,
``uint64``, ``bool``, ``enum``, ``sint32``, ``sint64``).
* - 1
- ``I64``
- Fixed 8 bytes, little-endian (``fixed64``, ``sfixed64``,
``double``).
* - 2
- ``LEN``
- Length-prefixed payload: a varint *length* followed by *length*
bytes (``string``, ``bytes``, embedded messages, packed
repeated fields).
* - 5
- ``I32``
- Fixed 4 bytes, little-endian (``fixed32``, ``sfixed32``,
``float``).
Wire types 3 and 4 (start-group / end-group) are deprecated and not
used by ONNX.
The decoding side appears in ``onnx_light/onnx_proto/stream.cc``: the
function ``BinaryStream::next_field()`` reads the tag varint and splits
it into ``field_number = tag >> 3`` and ``wire_type = tag & 0x07``.
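The tag arithmetic above can be sketched in a few lines of Python. This is only an illustration of the formula, not the ``onnx_light`` implementation; the helper names ``make_tag`` and ``split_tag`` are hypothetical.

```python
# Wire types relevant to ONNX, as listed in the table above.
WIRE_VARINT, WIRE_I64, WIRE_LEN, WIRE_I32 = 0, 1, 2, 5


def make_tag(field_number, wire_type):
    """Combine a field number and a 3-bit wire type into a tag value."""
    if field_number < 1:
        raise ValueError("field numbers start at 1")
    if wire_type not in (WIRE_VARINT, WIRE_I64, WIRE_LEN, WIRE_I32):
        raise ValueError("wire type not used by ONNX")
    return (field_number << 3) | wire_type


def split_tag(tag):
    """Return (field_number, wire_type) for an already decoded tag varint."""
    return tag >> 3, tag & 0x07


graph_tag = make_tag(7, WIRE_LEN)   # ModelProto.graph is field 7, type LEN
print(hex(graph_tag))               # 0x3a
print(split_tag(graph_tag))         # (7, 2)
```

The tag value is itself written as a varint, so field 16 and above need at least two tag bytes.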
Varints
~~~~~~~
A *varint* (variable-length integer) encodes an unsigned integer using
1 to 10 bytes. Each byte stores 7 payload bits in its low bits and
uses the most significant bit (``0x80``) as a *continuation* flag:
* ``0x80`` set: more bytes follow.
* ``0x80`` clear: this is the last byte.
The payload bits are stored *little-endian* (least-significant 7 bits
first). For example the integer ``300`` encodes as ``0xAC 0x02``:
.. code-block:: text
byte 0: 1010 1100 -> continuation=1, payload=0101100 (low 7 bits)
byte 1: 0000 0010 -> continuation=0, payload=0000010 (next 7 bits)
value = (0000010 << 7) | 0101100 = 300
Because a 64-bit value contains at most ``ceil(64 / 7) = 10`` payload
groups, a varint never exceeds 10 bytes. Field numbers from 1 to 15
fit in a single tag byte, which is why frequently used fields in
ONNX are assigned small numbers.
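A minimal Python sketch of the varint rules described above follows. The function names are illustrative and do not correspond to the ``onnx_light`` API; the decoder returns the position of the next byte so that callers can keep reading.

```python
def encode_varint(value):
    """Encode an unsigned integer (< 2**64) as a protobuf varint."""
    if not 0 <= value < 1 << 64:
        raise ValueError("varint value must fit in 64 unsigned bits")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)   # continuation flag set: more bytes follow
        else:
            out.append(low7)          # continuation flag clear: last byte
            return bytes(out)


def decode_varint(data, pos=0):
    """Decode one varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if shift >= 70:
            raise ValueError("varint longer than 10 bytes")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift   # least-significant group first
        shift += 7
        if not byte & 0x80:
            return result, pos


print(encode_varint(300).hex())            # ac02
print(decode_varint(b"\xac\x02"))          # (300, 2)
print(len(encode_varint((1 << 64) - 1)))   # 10
```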
ZigZag encoding
~~~~~~~~~~~~~~~
The protobuf types ``sint32`` and ``sint64`` apply *ZigZag* mapping
before varint encoding so that small negative numbers do not require
the full 10 bytes. The mapping interleaves positive and negative
values:
.. code-block:: text
0 -> 0
-1 -> 1
1 -> 2
-2 -> 3
2 -> 4
...
It is implemented in ``onnx_light/onnx_proto/stream.h`` by
``encodeZigZag64`` / ``decodeZigZag64``. Note that the plain
``int32`` and ``int64`` ONNX fields do **not** use ZigZag; they use the
two's-complement representation directly, which is why a negative
``int64`` always takes 10 bytes.
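The size difference can be checked with a short Python sketch. The function names below are illustrative Python counterparts, not the C++ ``encodeZigZag64`` / ``decodeZigZag64`` helpers themselves.

```python
MASK64 = (1 << 64) - 1


def encode_zigzag64(n):
    """Map a signed 64-bit integer to an unsigned one (0, -1, 1, -2, ...)."""
    return ((n << 1) ^ (n >> 63)) & MASK64


def decode_zigzag64(z):
    """Inverse of encode_zigzag64."""
    return (z >> 1) ^ -(z & 1)


def varint_size(value):
    """Number of bytes needed to write an unsigned value as a varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


print([encode_zigzag64(n) for n in (0, -1, 1, -2, 2)])   # [0, 1, 2, 3, 4]
# Plain int64: -1 is written as its 64-bit two's-complement pattern.
print(varint_size(-1 & MASK64))                           # 10
# sint64: -1 is first mapped to 1 and fits in one byte.
print(varint_size(encode_zigzag64(-1)))                   # 1
```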
Length-prefixed values
~~~~~~~~~~~~~~~~~~~~~~
For wire type ``LEN`` the encoder writes a varint *length* followed by
*length* raw bytes. This single mechanism is reused for:
* UTF-8 strings (``string``);
* arbitrary byte blobs (``bytes``), including tensor ``raw_data``;
* nested messages (such as ``GraphProto`` inside ``ModelProto``);
* packed repeated fields of scalar types.
Embedded messages are simply written out as their own bytestream and
prefixed by their total size, so the parser can either descend into
the substream or skip the whole region. ``onnx_light`` exposes this
pattern through ``BinaryStream::LimitToNext()`` and ``Restore()``,
which push and pop a temporary read limit corresponding to the
length-prefixed substream.
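The descend-or-skip choice can be illustrated with a small Python sketch that slices a ``memoryview`` instead of pushing a read limit. It is a simplified analogue of the pattern, not the ``BinaryStream`` code; ``read_varint`` and ``read_len_payload`` are hypothetical helpers.

```python
def read_varint(buf, pos):
    """Decode one varint starting at pos; return (value, next_pos)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def read_len_payload(buf, pos):
    """Read a varint length, then return (payload_view, next_pos)."""
    length, pos = read_varint(buf, pos)
    end = pos + length
    if end > len(buf):
        raise ValueError("length prefix points past the end of the buffer")
    return buf[pos:end], end   # slicing a memoryview does not copy


# Outer message: field 1 = nested message {field 1 = 150}, field 2 = "hi".
data = bytes([0x0A, 0x03, 0x08, 0x96, 0x01, 0x12, 0x02]) + b"hi"
buf = memoryview(data)

tag, pos = read_varint(buf, 0)          # 0x0A -> field 1, wire type LEN
inner, pos = read_len_payload(buf, pos)
print(bytes(inner).hex())               # 089601 (a substream to descend into)

tag, pos = read_varint(buf, pos)        # 0x12 -> field 2, wire type LEN
text, pos = read_len_payload(buf, pos)
print(bytes(text))                      # b'hi'
```

Skipping the nested message amounts to ignoring ``inner`` and continuing from the returned position.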
Packed repeated fields
~~~~~~~~~~~~~~~~~~~~~~
Repeated scalar fields can be encoded in two ways:
* **Unpacked** – each element is written with its own tag, repeating
the field number once per value. This is the only legal encoding
for repeated ``string``, ``bytes``, and message fields, and the
default encoding for proto2 scalar fields unless packing is requested.
* **Packed** – all elements are concatenated into a single
length-prefixed block (wire type ``LEN``) with a single tag.
ONNX uses packed encoding for scalar arrays such as
``TensorProto.float_data``, ``TensorProto.int32_data``, and
``TensorProto.dims``. A conformant parser must support both
representations on read, even when it only emits the packed form on
write.
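A reader that accepts both representations only needs to branch on the wire type of each occurrence. The sketch below assumes a message that contains a single repeated varint field (such as ``dims``); the function names are hypothetical.

```python
def read_varint(data, pos):
    """Decode one varint starting at pos; return (value, next_pos)."""
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def read_repeated_varints(data, field_number):
    """Collect a repeated varint field written packed, unpacked, or mixed."""
    values = []
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if number != field_number:
            raise ValueError("this sketch handles a single field only")
        if wire_type == 0:                       # unpacked: one value per tag
            value, pos = read_varint(data, pos)
            values.append(value)
        elif wire_type == 2:                     # packed: one LEN block
            length, pos = read_varint(data, pos)
            end = pos + length
            while pos < end:
                value, pos = read_varint(data, pos)
                values.append(value)
        else:
            raise ValueError("unexpected wire type for a varint field")
    return values


packed = bytes([0x0A, 0x02, 0x02, 0x03])     # tag, length 2, values 2 and 3
unpacked = bytes([0x08, 0x02, 0x08, 0x03])   # tag, 2, tag, 3
print(read_repeated_varints(packed, 1))      # [2, 3]
print(read_repeated_varints(unpacked, 1))    # [2, 3]
```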
Default values and unknown fields
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Every field in ``onnx.proto`` is either optional or repeated, so any
field may be absent from the wire: unset fields and empty repeated
fields are simply omitted, and the reader substitutes the default
value (``0``, empty string, empty message). Unknown fields encountered during parsing must
be skipped according to their wire type and not treated as errors,
which allows files produced by a newer ONNX version to still be read
by older tools. ``onnx_light`` skips unknown fields by consulting the
3-bit wire type in the tag and reading the appropriate number of bytes
(a varint, 8 bytes for ``I64``, 4 bytes for ``I32``, or *length* bytes
for ``LEN``).
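The skipping rule can be sketched as a single function keyed on the wire type. This is an illustrative Python version, not the ``onnx_light`` code; ``skip_field`` and ``list_fields`` are hypothetical names, and field number 99 stands in for a field the reader does not know.

```python
def read_varint(data, pos):
    """Decode one varint starting at pos; return (value, next_pos)."""
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def skip_field(data, pos, wire_type):
    """Return the position just past a field value of the given wire type."""
    if wire_type == 0:                    # VARINT
        _, pos = read_varint(data, pos)
    elif wire_type == 1:                  # I64
        pos += 8
    elif wire_type == 2:                  # LEN
        length, pos = read_varint(data, pos)
        pos += length
    elif wire_type == 5:                  # I32
        pos += 4
    else:
        raise ValueError("unsupported wire type %d" % wire_type)
    if pos > len(data):
        raise ValueError("field runs past the end of the buffer")
    return pos


def list_fields(data):
    """Walk a message without interpreting it; return (number, wire_type) pairs."""
    fields, pos = [], 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        fields.append((tag >> 3, tag & 0x07))
        pos = skip_field(data, pos, tag & 0x07)
    return fields


# field 1 = 9, field 99 (unknown) = "xyz", field 2 = "ok"
data = bytes([0x08, 0x09, 0x9A, 0x06, 0x03]) + b"xyz" + bytes([0x12, 0x02]) + b"ok"
print(list_fields(data))   # [(1, 0), (99, 2), (2, 2)]
```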
How ONNX uses the wire format
-----------------------------
ONNX defines its messages in ``onnx.proto``. The top-level
``ModelProto`` aggregates metadata (``ir_version``,
``producer_name``, ...), the opset imports, and the model graph.
A simplified view of the layout for a tiny model is:

.. code-block:: text

    ModelProto (root)
      field 1 (ir_version)        VARINT
      field 2 (producer_name)     LEN "..."
      field 7 (graph)             LEN -> GraphProto
        field 1 (node)            LEN -> NodeProto [repeated]
          field 1 (input)         LEN "..." [repeated]
          field 2 (output)        LEN "..." [repeated]
          field 4 (op_type)       LEN "..."
          field 5 (attribute)     LEN -> AttributeProto [repeated]
        field 5 (initializer)     LEN -> TensorProto [repeated]
        field 11 (input)          LEN -> ValueInfoProto [repeated]
        field 12 (output)         LEN -> ValueInfoProto [repeated]

Each nested message is a self-contained length-prefixed substream, so
``onnx_light`` can parse them independently and even read tensor
payloads in parallel using a thread pool.
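To make the nesting concrete, the sketch below hand-encodes a few fields of a ``ModelProto`` and walks them recursively. The ``SCHEMA`` table only covers the field numbers shown in the layout above, the byte string is deliberately incomplete (it is not a valid ONNX model), and the helpers ``len_field`` and ``walk`` are hypothetical.

```python
# (field name, nested message or None) for the fields in the layout above.
SCHEMA = {
    "ModelProto": {1: ("ir_version", None), 2: ("producer_name", None),
                   7: ("graph", "GraphProto")},
    "GraphProto": {1: ("node", "NodeProto")},
    "NodeProto": {1: ("input", None), 2: ("output", None), 4: ("op_type", None)},
}


def read_varint(data, pos):
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def len_field(number, payload):
    """Encode a LEN field; valid only for number <= 15 and short payloads."""
    assert number <= 15 and len(payload) < 128
    return bytes([(number << 3) | 2, len(payload)]) + payload


def walk(data, message, depth=0, out=None):
    """Return (depth, field name, value) triples in wire order."""
    out = [] if out is None else out
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        name, nested = SCHEMA[message].get(tag >> 3, ("?", None))
        if tag & 0x07 == 0:
            value, pos = read_varint(data, pos)
            out.append((depth, name, value))
        elif tag & 0x07 == 2:
            length, pos = read_varint(data, pos)
            payload, pos = data[pos:pos + length], pos + length
            out.append((depth, name, nested or payload))
            if nested:
                walk(payload, nested, depth + 1, out)   # descend into substream
        else:
            raise ValueError("wire type not handled in this sketch")
    return out


node = len_field(1, b"x") + len_field(2, b"y") + len_field(4, b"Relu")
model = bytes([0x08, 0x08]) + len_field(2, b"demo") + len_field(7, len_field(1, node))
for depth, name, value in walk(model, "ModelProto"):
    print("  " * depth + name, value)
```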
Tensors and ``raw_data``
~~~~~~~~~~~~~~~~~~~~~~~~
``TensorProto`` is the central message used everywhere a tensor value
appears: graph initializers, attribute values of type ``TENSOR``
(including the value of ``Constant`` nodes), and external-data
references. Its on-wire layout
illustrates almost every feature of the protobuf format described
above. The fields most commonly seen are:
.. list-table::
:header-rows: 1
:widths: 8 22 12 58
* - #
- Field
- Wire type
- Description
* - 1
- ``dims`` (``repeated int64``)
- ``LEN`` (packed)
- Tensor shape. Encoded as a single length-prefixed block of
varints. A scalar (rank-0) tensor has zero ``dims`` entries;
because an empty repeated field writes nothing, the ``dims``
field is then absent from the wire altogether.
* - 2
- ``data_type`` (``int32`` enum)
- ``VARINT``
- Element type from ``TensorProto.DataType``
(1 = ``FLOAT``, 7 = ``INT64``, 11 = ``DOUBLE``, ...). Required
for any tensor that carries data.
* - 3
- ``segment`` (``Segment``)
- ``LEN``
- Optional ``{begin, end}`` pair used by legacy chunked tensors;
rarely populated by modern producers.
* - 4
- ``float_data`` (``repeated float``)
- ``LEN`` (packed)
- Typed payload for ``FLOAT``. Each element is 4 little-endian
bytes; the packed block size therefore equals
``4 * numel(tensor)``.
* - 5
- ``int32_data`` (``repeated int32``)
- ``LEN`` (packed)
- Typed payload for ``INT32``, ``UINT8``, ``INT8``, ``UINT16``,
``INT16``, ``BOOL``, ``FLOAT16``, ``BFLOAT16`` and the small
``FLOAT8`` / ``INT4`` / ``UINT4`` types (each element widened
to a varint).
* - 6
- ``string_data`` (``repeated bytes``)
- ``LEN``
- Typed payload for ``STRING``. Each element is its own
length-prefixed block, so this field is **unpacked** (one tag
per element); the order matches the row-major iteration of
``dims``.
* - 7
- ``int64_data`` (``repeated int64``)
- ``LEN`` (packed)
- Typed payload for ``INT64``.
* - 8
- ``name`` (``string``)
- ``LEN``
- Tensor name; for an initializer it is matched against the
``input`` names used by the nodes of the enclosing graph.
* - 9
- ``raw_data`` (``bytes``)
- ``LEN``
- Single contiguous blob of element bytes in little-endian order
and the native binary representation of ``data_type``
(4 bytes per ``FLOAT``, 8 bytes per ``INT64``, 2 bytes per
``FLOAT16``, packed nibbles for ``INT4`` / ``UINT4``, ...).
Mutually exclusive with the typed ``*_data`` fields.
* - 10
- ``double_data`` (``repeated double``)
- ``LEN`` (packed)
- Typed payload for ``DOUBLE`` (8 bytes per element on the wire).
* - 11
- ``uint64_data`` (``repeated uint64``)
- ``LEN`` (packed)
- Typed payload for ``UINT32`` and ``UINT64``.
* - 12
- ``doc_string`` (``string``)
- ``LEN``
- Optional human-readable description.
* - 13
- ``external_data`` (``repeated StringStringEntryProto``)
- ``LEN``
- Key/value pairs (``location``, ``offset``, ``length``,
``checksum``, ...) used when ``data_location`` is set to
``EXTERNAL``.
* - 14
- ``data_location`` (``Location`` enum)
- ``VARINT``
- ``DEFAULT`` (0) when the payload is inline (in ``raw_data`` or
a typed field), or ``EXTERNAL`` (1) when it lives in a
companion file pointed to by ``external_data``.
* - 16
- ``metadata_props`` (``repeated StringStringEntryProto``)
- ``LEN``
- Free-form key/value annotations attached to the tensor.
In the simplest case a ``FLOAT`` initializer of shape ``[2, 3]``
containing six values is written, in order, as:
.. code-block:: text
TensorProto
field 1 (dims) LEN 2 bytes -> packed varints: 2, 3
field 2 (data_type) VARINT -> 1 (FLOAT)
field 8 (name) LEN N bytes -> "weight"
field 9 (raw_data) LEN 24 bytes -> six little-endian float32 values
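The same tensor can be assembled by hand in Python to check the byte counts. This is a sketch of the wire layout only; the helpers ``varint_field`` and ``len_field`` are hypothetical and the six values are arbitrary.

```python
import struct


def encode_varint(value):
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def varint_field(number, value):
    """Tag with wire type VARINT (0) followed by the value."""
    return encode_varint(number << 3) + encode_varint(value)


def len_field(number, payload):
    """Tag with wire type LEN (2), varint length, then the payload bytes."""
    return encode_varint((number << 3) | 2) + encode_varint(len(payload)) + payload


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
dims = b"".join(encode_varint(d) for d in (2, 3))   # packed varints: 2, 3
raw = struct.pack("<6f", *values)                    # little-endian float32

tensor = (
    len_field(1, dims)            # dims
    + varint_field(2, 1)          # data_type = FLOAT
    + len_field(8, b"weight")     # name
    + len_field(9, raw)           # raw_data
)
print(len(dims), len(raw), len(tensor))   # 2 24 40
print(tensor[:4].hex())                   # 0a020203
```

If the ``onnx`` package is installed, these bytes should be parseable with ``onnx.TensorProto.FromString(tensor)``, which gives a convenient cross-check.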
The typed scalar arrays (``float_data`` / ``int32_data`` /
``int64_data`` / ``double_data`` / ``uint64_data``) and ``raw_data``
are **mutually exclusive**: a parser must read whichever one is
present and use ``data_type`` to interpret the bytes. Modern
producers almost always use ``raw_data`` because:
* it is a single ``LEN`` payload whose byte size is known up front,
making memory pre-allocation trivial;
* the bytes are already in the on-disk layout, so they can be
``memcpy``-ed (or, with ``no_copy=True`` in
:ref:`l-howto-load-save-onnx-files`, simply pointed at) into the
destination buffer;
* ``onnx_light`` can hand the block to a worker thread and keep
parsing the rest of the message in parallel.
The ``raw_data`` encoding has two notable subtleties: it is always
little-endian regardless of the host byte order, and sub-byte types
(``INT4``, ``UINT4``, the ``FLOAT4E2M1`` family) are
packed two values per byte in the order specified by ``onnx.proto``.
``BOOL``, by contrast, occupies one full byte per element.
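Both subtleties can be illustrated in Python. The nibble order used below (first element in the low 4 bits, second element in the high 4 bits) is stated here as an assumption about what ``onnx.proto`` specifies and should be checked against the schema comments; ``pack_uint4`` and ``unpack_uint4`` are hypothetical helpers.

```python
import struct

# Little-endian float32: the same bytes on any host.
print(struct.pack("<f", 1.0).hex())   # 0000803f


def pack_uint4(values):
    """Pack 4-bit values two per byte, first element in the low nibble (assumed)."""
    out = bytearray()
    for i in range(0, len(values), 2):
        low = values[i] & 0x0F
        high = values[i + 1] & 0x0F if i + 1 < len(values) else 0  # zero padding
        out.append(low | (high << 4))
    return bytes(out)


def unpack_uint4(data, count):
    """Inverse of pack_uint4; count drops the padding nibble if any."""
    values = []
    for byte in data:
        values.append(byte & 0x0F)
        values.append(byte >> 4)
    return values[:count]


packed = pack_uint4([1, 2, 3])
print(packed.hex())                 # 2103
print(unpack_uint4(packed, 3))      # [1, 2, 3]
```

The element count cannot be recovered from the byte length alone when it is odd, which is why a reader still needs ``dims``.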
Sparse tensors are carried by ``SparseTensorProto`` (used in
``GraphProto.sparse_initializer``), which simply embeds two
``TensorProto`` substreams (one for the non-zero values and one for
the coordinate indices) alongside the dense ``dims``.
External data
~~~~~~~~~~~~~
For large models, in particular those that would exceed the 2 GB
protobuf limit, ONNX supports storing tensor
payloads in companion files referenced from ``TensorProto`` via the
``external_data`` field (a repeated ``StringStringEntryProto``) and
the ``data_location`` enum. The ``.onnx`` file then only carries the
metadata (shape, type, ``location``, ``offset``, ``length``, ...) for
those tensors; the actual bytes live in one or more separate files
referenced by ``location``. ``onnx_light`` implements both sides of
this convention and can additionally split very large initializers
across multiple data files automatically (see the ``location`` and
``max_external_file_size`` options on :func:`onnx_light.onnx.save`).
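Resolving such a reference amounts to a seek and a read. The sketch below assumes that ``location`` is a path relative to the directory of the ``.onnx`` file, that ``offset`` defaults to 0, and that a missing ``length`` means "until the end of the file"; ``read_external_payload`` is a hypothetical helper, not an ``onnx_light`` function.

```python
import os
import struct
import tempfile


def read_external_payload(model_dir, entries):
    """Read tensor bytes described by external_data key/value strings."""
    path = os.path.join(model_dir, entries["location"])
    offset = int(entries.get("offset", "0"))   # values are stored as strings
    length = entries.get("length")
    with open(path, "rb") as handle:
        handle.seek(offset)
        return handle.read() if length is None else handle.read(int(length))


with tempfile.TemporaryDirectory() as model_dir:
    # Companion file: 16 bytes belonging to another tensor, then 3 floats.
    with open(os.path.join(model_dir, "weights.bin"), "wb") as handle:
        handle.write(bytes(16) + struct.pack("<3f", 1.5, 2.5, 3.5))

    entries = {"location": "weights.bin", "offset": "16", "length": "12"}
    payload = read_external_payload(model_dir, entries)

values = struct.unpack("<3f", payload)
print(values)   # (1.5, 2.5, 3.5)
```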
The 2 GB protobuf limit
~~~~~~~~~~~~~~~~~~~~~~~
``libprotobuf`` enforces a hard 2 GB limit on a single message,
because internal offsets are stored in 32-bit signed integers.
This is the main reason large ONNX models must use external data when
serialized through the standard ``onnx`` package. ``onnx_light``
uses 64-bit offsets throughout its reader and writer, so it can
produce and consume single ``.onnx`` files larger than 2 GB while
remaining wire-compatible with the protobuf format.
Further reading
---------------
* `Protocol Buffers wire format
`_ – the
authoritative description of varints, wire types, and packed
encoding.
* `onnx.proto schema
`_ – the
``.proto`` schema that defines every ONNX message and field number.
* :ref:`l-design-differences` – how ``onnx_light`` implements this
format without depending on ``libprotobuf``.