Protobuf format applied to ONNX#

ONNX models are serialized using Protocol Buffers (protobuf), Google’s compact binary encoding format. An .onnx file is just the binary serialization of a ModelProto message defined in onnx.proto. This page explains the relevant subset of the protobuf wire format and how it maps to ONNX message types. It is meant to help readers understand the low-level layout that onnx produces and that onnx_light parses and writes without depending on libprotobuf.


The wire format in a nutshell#

A protobuf-encoded message is a flat concatenation of (tag, value) pairs. There is no framing, no length prefix at the top level, and no field ordering requirement. A consumer reads bytes from start to end, decodes each tag, and dispatches the following bytes according to the wire type embedded in the tag.

Tag encoding#

Each field starts with a tag encoded as a varint (see below). The tag combines the field number declared in the .proto file with a 3-bit wire type:

tag = (field_number << 3) | wire_type

The wire types relevant to ONNX are:

| Value | Name | Meaning |
|---|---|---|
| 0 | VARINT | Variable-length integer (int32, int64, uint32, uint64, bool, enum, sint32, sint64). |
| 1 | I64 | Fixed 8 bytes, little-endian (fixed64, sfixed64, double). |
| 2 | LEN | Length-prefixed payload: a varint length followed by length bytes (string, bytes, embedded messages, packed repeated fields). |
| 5 | I32 | Fixed 4 bytes, little-endian (fixed32, sfixed32, float). |

Wire types 3 and 4 (start-group / end-group) are deprecated and not used by ONNX.

The decoding side appears in onnx_light/onnx_proto/stream.cc: the function BinaryStream::next_field() reads the tag varint and splits it into field_number = tag >> 3 and wire_type = tag & 0x07.
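The tag arithmetic can be sketched in a few lines of Python. The helper names `make_tag` and `split_tag` are hypothetical and only mirror the formula above; they are not part of onnx_light.

```python
# Wire type constants from the table above.
VARINT, I64, LEN, I32 = 0, 1, 2, 5


def make_tag(field_number: int, wire_type: int) -> int:
    """Combine a field number and a 3-bit wire type into a tag value."""
    return (field_number << 3) | wire_type


def split_tag(tag: int):
    """Return (field_number, wire_type) for an already decoded tag value."""
    return tag >> 3, tag & 0x07


# A field numbered 7 with wire type LEN: (7 << 3) | 2 = 58 = 0x3A.
print(hex(make_tag(7, LEN)))  # 0x3a
print(split_tag(0x3A))        # (7, 2)
```

The tag itself is stored as a varint, so field numbers above 15 need more than one byte on the wire.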

Varints#

A varint (variable-length integer) encodes an unsigned integer using 1 to 10 bytes. Each byte stores 7 payload bits in its low bits and uses the most significant bit (0x80) as a continuation flag:

  • 0x80 set: more bytes follow.

  • 0x80 clear: this is the last byte.

The payload bits are stored little-endian (least-significant 7 bits first). For example the integer 300 encodes as 0xAC 0x02:

byte 0: 1010 1100   -> continuation=1, payload=0101100  (low 7 bits)
byte 1: 0000 0010   -> continuation=0, payload=0000010  (next 7 bits)
value = (0000010 << 7) | 0101100 = 300

Because a 64-bit value contains at most ceil(64 / 7) = 10 payload groups, a varint never exceeds 10 bytes. Field numbers from 1 to 15 fit in a single tag byte, which is why frequently used fields in ONNX are assigned small numbers.
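A minimal Python sketch of the varint scheme described above follows. The function names are illustrative and are not taken from onnx_light; the decoder returns the value together with the position of the next unread byte.

```python
def encode_varint(value: int) -> bytes:
    """Encode an unsigned 64-bit integer as a protobuf varint."""
    if not 0 <= value < (1 << 64):
        raise ValueError("value must fit in an unsigned 64-bit integer")
    out = bytearray()
    while True:
        byte = value & 0x7F          # low 7 payload bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation flag: more bytes follow
        else:
            out.append(byte)         # last byte, flag clear
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0):
    """Decode one varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result & 0xFFFFFFFFFFFFFFFF, pos
        shift += 7
        if shift >= 70:
            raise ValueError("varint longer than 10 bytes")


print(encode_varint(300).hex())       # ac02
print(decode_varint(b"\xac\x02"))     # (300, 2)
print(len(encode_varint(2**64 - 1)))  # 10
```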

ZigZag encoding#

The protobuf types sint32 and sint64 apply ZigZag mapping before varint encoding so that small negative numbers do not require the full 10 bytes. The mapping interleaves positive and negative values:

 0 -> 0
-1 -> 1
 1 -> 2
-2 -> 3
 2 -> 4
 ...

It is implemented in onnx_light/onnx_proto/stream.h by encodeZigZag64 / decodeZigZag64. Note that the plain int32 and int64 ONNX fields do not use ZigZag; they use the two’s-complement representation directly. A negative value is sign-extended to 64 bits before varint encoding, which is why a negative int32 or int64 always takes 10 bytes.
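The mapping can be written compactly with shifts. The Python functions below are a sketch of the standard ZigZag formula for 64-bit values; their names are hypothetical and they are not a transcription of the C++ helpers mentioned above.

```python
MASK64 = (1 << 64) - 1


def zigzag_encode64(n: int) -> int:
    """Map a signed 64-bit integer to an unsigned one (0, -1, 1, -2 -> 0, 1, 2, 3)."""
    return ((n << 1) ^ (n >> 63)) & MASK64


def zigzag_decode64(z: int) -> int:
    """Inverse mapping: recover the signed value from its ZigZag form."""
    return (z >> 1) ^ -(z & 1)


def varint_size(unsigned_value: int) -> int:
    """Number of bytes a varint needs for an unsigned value."""
    return max(1, (unsigned_value.bit_length() + 6) // 7)


print([zigzag_encode64(n) for n in (0, -1, 1, -2, 2)])  # [0, 1, 2, 3, 4]
print(varint_size(zigzag_encode64(-1)))  # 1  (sint64 with ZigZag)
print(varint_size(-1 & MASK64))          # 10 (plain int64, two's complement)
```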

Length-prefixed values#

For wire type LEN the encoder writes a varint length followed by length raw bytes. This single mechanism is reused for:

  • UTF-8 strings (string);

  • arbitrary byte blobs (bytes), including tensor raw_data;

  • nested messages (such as GraphProto inside ModelProto);

  • packed repeated fields of scalar types.

Embedded messages are simply written out as their own bytestream and prefixed by their total size, so the parser can either descend into the substream or skip the whole region. onnx_light exposes this pattern through BinaryStream::LimitToNext() and Restore(), which push and pop a temporary read limit corresponding to the length-prefixed substream.
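The push/pop idea can be illustrated with a small Python cursor that keeps a stack of read limits. This is only a sketch of the pattern: the `Reader` class and its method names are hypothetical and do not reproduce the actual BinaryStream interface.

```python
class Reader:
    """Tiny cursor over a bytes object with a stack of read limits (illustrative only)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0
        self.limits = [len(data)]

    def at_end(self) -> bool:
        return self.pos >= self.limits[-1]

    def read_varint(self) -> int:
        result, shift = 0, 0
        while True:
            if self.at_end():
                raise ValueError("truncated varint")
            byte = self.data[self.pos]
            self.pos += 1
            result |= (byte & 0x7F) << shift
            if not byte & 0x80:
                return result
            shift += 7

    def push_limit(self) -> None:
        """Read a LEN prefix and restrict further reads to that payload."""
        length = self.read_varint()
        end = self.pos + length
        if end > self.limits[-1]:
            raise ValueError("length prefix exceeds the enclosing message")
        self.limits.append(end)

    def pop_limit(self) -> None:
        """Leave the substream, skipping whatever was not consumed."""
        self.pos = self.limits.pop()

    def read_bytes(self) -> bytes:
        """Read a whole LEN payload as raw bytes."""
        self.push_limit()
        payload = self.data[self.pos:self.limits[-1]]
        self.pop_limit()
        return payload


# Outer message: field 7 (LEN) wrapping an inner message whose field 2 is "abc".
inner = bytes([0x12, 3]) + b"abc"          # tag (2 << 3) | 2, length 3
outer = bytes([0x3A, len(inner)]) + inner  # tag (7 << 3) | 2, length 5

r = Reader(outer)
outer_tag = r.read_varint()  # 0x3A
r.push_limit()               # descend into the nested message
inner_tag = r.read_varint()  # 0x12
text = r.read_bytes().decode("utf-8")
inner_done = r.at_end()      # True: the nested message is fully consumed
r.pop_limit()
print(text)  # abc
```

Calling `pop_limit()` without reading the payload skips the whole nested region, which is the "skip" half of the descend-or-skip choice.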

Packed repeated fields#

Repeated scalar fields can be encoded in two ways:

  • Unpacked – each element is written with its own tag, repeating the field number once per value. This is the only legal encoding for repeated message, string, and bytes fields, and the legacy encoding for proto2 scalar fields.

  • Packed – all elements are concatenated into a single length-prefixed block (wire type LEN) with a single tag.

ONNX uses packed encoding for scalar arrays such as TensorProto.float_data, TensorProto.int32_data, and TensorProto.dims. A conformant parser must support both representations on read, even when it only emits the packed form on write.
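A reader that accepts both encodings only needs to branch on the wire type of each occurrence. The sketch below handles a repeated varint field such as a list of int64 values; `read_repeated_varint` is a hypothetical helper, and it ignores the fixed-width wire types for brevity.

```python
def decode_varint(buf: bytes, pos: int):
    """Decode one varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_repeated_varint(buf: bytes, field_number: int) -> list:
    """Collect a repeated varint field, whether it is packed, unpacked, or mixed."""
    values = []
    pos = 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # VARINT: a single unpacked element
            value, pos = decode_varint(buf, pos)
            if number == field_number:
                values.append(value)
        elif wire_type == 2:  # LEN: a packed block of elements
            length, pos = decode_varint(buf, pos)
            end = pos + length
            if number == field_number:
                while pos < end:
                    value, pos = decode_varint(buf, pos)
                    values.append(value)
            pos = end
        else:
            raise ValueError("wire type not handled in this sketch")
    return values


unpacked = bytes([0x08, 2, 0x08, 3])  # field 1, one tag per element
packed = bytes([0x0A, 2, 2, 3])       # field 1, single LEN block of 2 bytes

print(read_repeated_varint(unpacked, 1))  # [2, 3]
print(read_repeated_varint(packed, 1))    # [2, 3]
```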

Default values and unknown fields#

Protobuf (proto3) treats every field as optional: fields that hold their default value (0, empty string, empty message) are omitted from the wire format. Unknown fields encountered during parsing must be skipped according to their wire type and not treated as errors, which allows files produced by a newer ONNX version to still be read by older tools. onnx_light skips unknown fields by consulting the wire type stored in the tag and reading past the appropriate number of bytes (one varint, 4 bytes, 8 bytes, or the length prefix plus that many bytes for LEN).
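The skipping rule can be sketched as a function that advances a position according to the wire type. The names `skip_field` and `read_known_varints` are hypothetical, and the field numbers used in the demo (99 and 100) are made up to stand in for fields the reader does not know.

```python
def encode_varint(value: int) -> bytes:
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def decode_varint(buf: bytes, pos: int):
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def skip_field(buf: bytes, pos: int, wire_type: int) -> int:
    """Return the position just past a field value of the given wire type."""
    if wire_type == 0:    # VARINT: read until the continuation flag clears
        _, new_pos = decode_varint(buf, pos)
    elif wire_type == 1:  # I64
        new_pos = pos + 8
    elif wire_type == 5:  # I32
        new_pos = pos + 4
    elif wire_type == 2:  # LEN: length prefix, then that many bytes
        length, pos = decode_varint(buf, pos)
        new_pos = pos + length
    else:
        raise ValueError(f"unsupported wire type {wire_type}")
    if new_pos > len(buf):
        raise ValueError("field runs past the end of the buffer")
    return new_pos


def read_known_varints(buf: bytes, known: set) -> dict:
    """Keep VARINT fields whose number is in `known`; skip everything else."""
    found, pos = {}, 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if number in known and wire_type == 0:
            found[number], pos = decode_varint(buf, pos)
        else:
            pos = skip_field(buf, pos, wire_type)
    return found


message = (
    encode_varint((1 << 3) | 0) + encode_varint(7)                # known field 1
    + encode_varint((99 << 3) | 2) + encode_varint(3) + b"xyz"    # unknown LEN field
    + encode_varint((100 << 3) | 5) + b"\x00\x00\x80\x3f"         # unknown I32 field
    + encode_varint((2 << 3) | 0) + encode_varint(1)              # known field 2
)
print(read_known_varints(message, {1, 2}))  # {1: 7, 2: 1}
```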

How ONNX uses the wire format#

ONNX defines its messages in onnx.proto. The top-level ModelProto aggregates metadata (ir_version, producer_name, …), the opset imports, and the model graph. A simplified view of the layout for a tiny model is:

ModelProto (root)
  field 1  (ir_version)        VARINT
  field 2  (producer_name)     LEN  "..."
  field 7  (graph)             LEN  -> GraphProto
    field 1 (node)             LEN  -> NodeProto    [repeated]
      field 1 (input)          LEN  "..."           [repeated]
      field 2 (output)         LEN  "..."           [repeated]
      field 4 (op_type)        LEN  "..."
      field 5 (attribute)      LEN  -> AttributeProto [repeated]
    field 5 (initializer)      LEN  -> TensorProto   [repeated]
    field 11 (input)           LEN  -> ValueInfoProto [repeated]
    field 12 (output)          LEN  -> ValueInfoProto [repeated]

Each nested message is a self-contained length-prefixed substream, so onnx_light can parse them independently and even read tensor payloads in parallel using a thread pool.
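To make the nesting concrete, the sketch below builds a tiny byte string by hand that follows the layout above (ir_version, producer_name, and a graph with one node) and walks it one level at a time. The helpers `iter_fields` and `len_field`, and the sample values (ir_version 8, producer "demo", op_type "Relu"), are assumptions for illustration; the resulting bytes are not a complete, valid ONNX model.

```python
def decode_varint(buf: bytes, pos: int):
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_fields(buf: bytes):
    """Yield (field_number, wire_type, value) for each top-level field of a message."""
    pos = 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 1:
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 5:
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield number, wire_type, value


def len_field(number: int, payload: bytes) -> bytes:
    """Build a LEN field; valid only for field numbers < 16 and payloads < 128 bytes."""
    return bytes([(number << 3) | 2, len(payload)]) + payload


node = len_field(4, b"Relu")   # NodeProto.op_type
graph = len_field(1, node)     # GraphProto.node
model = bytes([0x08, 8]) + len_field(2, b"demo") + len_field(7, graph)

top_level = [(n, w) for n, w, _ in iter_fields(model)]
graph_bytes = next(v for n, _, v in iter_fields(model) if n == 7)
node_bytes = next(v for n, _, v in iter_fields(graph_bytes) if n == 1)
op_type = next(v for n, _, v in iter_fields(node_bytes) if n == 4).decode("utf-8")

print(top_level)  # [(1, 0), (2, 2), (7, 2)]
print(op_type)    # Relu
```

Because each LEN value is returned as an independent slice, the nested payloads could be handed to separate workers without any shared cursor state.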

Tensors and raw_data#

TensorProto is the central message used everywhere a tensor value appears: graph initializers, attribute defaults of type TENSOR, constant nodes, and external-data references. Its on-wire layout illustrates almost every feature of the protobuf format described above. The fields most commonly seen are:

| # | Field | Wire type | Description |
|---|---|---|---|
| 1 | dims (repeated int64) | LEN (packed) | Tensor shape. Encoded as a single length-prefixed block of varints; an unranked tensor has no dims field at all (different from a scalar, which has zero dims entries inside an empty packed block). |
| 2 | data_type (int32 enum) | VARINT | Element type from TensorProto.DataType (1 = FLOAT, 7 = INT64, 11 = DOUBLE, …). Required for any tensor that carries data. |
| 3 | segment (Segment) | LEN | Optional {begin, end} pair used by legacy chunked tensors; rarely populated by modern producers. |
| 4 | float_data (repeated float) | LEN (packed) | Typed payload for FLOAT. Each element is 4 little-endian bytes; the packed block size therefore equals 4 * numel(tensor). |
| 5 | int32_data (repeated int32) | LEN (packed) | Typed payload for INT32, UINT8, INT8, UINT16, INT16, BOOL, FLOAT16, BFLOAT16 and the small FLOAT8 / INT4 / UINT4 types (each element widened to a varint). |
| 6 | string_data (repeated bytes) | LEN | Typed payload for STRING. Each element is its own length-prefixed block, so this field is unpacked (one tag per element); the order matches the row-major iteration of dims. |
| 7 | int64_data (repeated int64) | LEN (packed) | Typed payload for INT64. |
| 8 | name (string) | LEN | Tensor name; matched against input/output names in the enclosing graph and against external_data keys. |
| 9 | raw_data (bytes) | LEN | Single contiguous blob of element bytes in little-endian order and the native binary representation of data_type (4 bytes per FLOAT, 8 bytes per INT64, 2 bytes per FLOAT16, packed nibbles for INT4 / UINT4, …). Mutually exclusive with the typed *_data fields. |
| 10 | double_data (repeated double) | LEN (packed) | Typed payload for DOUBLE (8 bytes per element on the wire). |
| 11 | uint64_data (repeated uint64) | LEN (packed) | Typed payload for UINT32 and UINT64. |
| 12 | doc_string (string) | LEN | Optional human-readable description. |
| 13 | external_data (repeated StringStringEntryProto) | LEN | Key/value pairs (location, offset, length, checksum, …) used when data_location is set to EXTERNAL. |
| 14 | data_location (Location enum) | VARINT | DEFAULT (0) when the payload is inline (in raw_data or a typed field), or EXTERNAL (1) when it lives in a companion file pointed to by external_data. |
| 16 | metadata_props (repeated StringStringEntryProto) | LEN | Free-form key/value annotations attached to the tensor. |

In the simplest case a FLOAT initializer of shape [2, 3] containing six values is written, in order, as:

TensorProto
  field 1  (dims)       LEN  2 bytes  -> packed varints: 2, 3
  field 2  (data_type)  VARINT        -> 1   (FLOAT)
  field 8  (name)       LEN  N bytes  -> "weight"
  field 9  (raw_data)   LEN  24 bytes -> six little-endian float32 values
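The listing can be reproduced byte for byte with a few helper functions. This is a hand-rolled sketch that follows the listing literally (dims written as one packed block); the helper names and the six sample values are assumptions, and a real producer may order or encode the fields differently.

```python
import struct


def encode_varint(value: int) -> bytes:
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def varint_field(number: int, value: int) -> bytes:
    """Tag with wire type 0 followed by the varint value."""
    return encode_varint((number << 3) | 0) + encode_varint(value)


def len_field(number: int, payload: bytes) -> bytes:
    """Tag with wire type 2, varint length, then the payload bytes."""
    return encode_varint((number << 3) | 2) + encode_varint(len(payload)) + payload


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
raw = struct.pack("<6f", *values)  # 24 bytes, little-endian float32

tensor = (
    len_field(1, encode_varint(2) + encode_varint(3))  # dims = [2, 3], packed
    + varint_field(2, 1)                               # data_type = 1 (FLOAT)
    + len_field(8, b"weight")                          # name
    + len_field(9, raw)                                # raw_data
)

print(tensor[:14].hex(" "))  # 0a 02 02 03 10 01 42 06 77 65 69 67 68 74
print(len(tensor))           # 40
```

The 40 bytes break down as 4 for dims, 2 for data_type, 8 for name, and 26 for raw_data (tag, length, and 24 payload bytes).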

The typed scalar arrays (float_data / int32_data / int64_data / double_data / uint64_data) and raw_data are mutually exclusive: a parser must read whichever one is present and use data_type to interpret the bytes. Modern producers almost always use raw_data because:

  • it is a single LEN payload — its byte size is known up front, making memory pre-allocation trivial;

  • the on-disk bytes already match the in-memory layout on little-endian hosts, so they can be memcpy-ed into the destination buffer (or, with no_copy=True, simply referenced in place);

  • onnx_light can hand the block to a worker thread and keep parsing the rest of the message in parallel.

The raw_data encoding has two notable subtleties: it is always little-endian regardless of the host byte order, and sub-byte types (INT4, UINT4, the FLOAT4E2M1 family) are packed two values per byte in the order specified by onnx.proto. BOOL is not a sub-byte type in raw_data: each element occupies one full byte.
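Both subtleties can be shown with the standard library alone. In the sketch below, the explicit "<" in the struct format forces little-endian decoding on any host. The nibble order (first element in the low 4 bits, second in the high 4 bits) is an assumption based on my reading of onnx.proto and should be checked against the version in use; the function names are hypothetical.

```python
import struct


def unpack_float32(raw: bytes) -> list:
    """Decode raw_data holding FLOAT elements, always as little-endian."""
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


def unpack_uint4(raw: bytes, count: int) -> list:
    """Decode packed UINT4 values; assumes the first element is in the low nibble."""
    out = []
    for byte in raw:
        out.append(byte & 0x0F)  # first element of the pair
        out.append(byte >> 4)    # second element of the pair
    return out[:count]           # drop the padding nibble for odd counts


def unpack_int4(raw: bytes, count: int) -> list:
    """Same layout as UINT4, with each nibble read as a 4-bit two's-complement value."""
    return [v - 16 if v >= 8 else v for v in unpack_uint4(raw, count)]


print(unpack_float32(b"\x00\x00\x80\x3f\x00\x00\x20\xc0"))  # [1.0, -2.5]
print(unpack_uint4(b"\x21\x03", 3))                        # [1, 2, 3]
print(unpack_int4(b"\xf7", 2))                             # [7, -1]
```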

Sparse tensors are carried by SparseTensorProto (used in GraphProto.sparse_initializer), which simply embeds two TensorProto substreams — one for the non-zero values and one for the coordinate indices — alongside the dense dims.

External data#

For large models, in particular those whose serialized size would exceed the 2 GB protobuf limit, ONNX supports storing tensor payloads in companion files referenced from TensorProto via the external_data field (a repeated StringStringEntryProto) and the data_location enum. The .onnx file then only carries the metadata (shape, type, location, offset, length, …) for those tensors; the actual bytes live in one or more separate files referenced by location. onnx_light implements both sides of this convention and can additionally split very large initializers across multiple data files automatically (see the location and max_external_file_size options on onnx_light.onnx.save()).
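Resolving such a reference amounts to opening the companion file, seeking to the offset, and reading the stated number of bytes. The sketch below assumes the external_data entries have already been collected into a plain dict of strings, and that location is a path relative to the directory containing the .onnx file; `read_external_payload` and the file name `weights.bin` are hypothetical.

```python
import os
import struct
import tempfile


def read_external_payload(entries: dict, base_dir: str) -> bytes:
    """Read a tensor payload described by location / offset / length entries."""
    path = os.path.join(base_dir, entries["location"])
    offset = int(entries.get("offset", "0"))  # values are stored as strings
    length = entries.get("length")
    with open(path, "rb") as f:
        f.seek(offset)
        # Without an explicit length, read to the end of the file.
        return f.read(int(length)) if length is not None else f.read()


# Demo: a companion file holding 8 bytes of unrelated data, then three float32 values.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "weights.bin"), "wb") as f:
        f.write(b"\x00" * 8)
        f.write(struct.pack("<3f", 1.0, 2.0, 3.0))

    entries = {"location": "weights.bin", "offset": "8", "length": "12"}
    payload = read_external_payload(entries, tmp)

print(struct.unpack("<3f", payload))  # (1.0, 2.0, 3.0)
```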

The 2 GB protobuf limit#

libprotobuf enforces a hard 2 GB limit on a single message, because internal offsets are stored in 32-bit signed integers. This is the main reason large ONNX models must use external data when serialized through the standard onnx package. onnx_light uses 64-bit offsets throughout its reader and writer, so it can produce and consume single .onnx files larger than 2 GB while remaining wire-compatible with the protobuf format.

Further reading#