ModelProto creation and no-copy ownership
This page explains exactly who owns tensor raw_data when no_copy=True is
enabled, when ownership is transferred, and when memory is released. It also
documents the in-place tensor consolidation function
onnx_light.onnx.consolidate_tensors_to_buffer()
(C++: ConsolidateTensorsToBuffer) that produces the same kind of shared-buffer
ownership after loading.
Options class hierarchy
Buffer-related options (alignment, size threshold) are shared across several operations and are factored into a common base class:
TensorBufferOptions – base class with raw_data_threshold (default: 0) and alignment (default: 0).
ParseOptions inherits TensorBufferOptions; its raw_data_threshold defaults to 1024 bytes.
SerializeOptions inherits TensorBufferOptions; its raw_data_threshold defaults to kSmallTensorDataThresholdBytes (64 bytes).
Any function that accepts a TensorBufferOptions reference also accepts
ParseOptions or SerializeOptions objects.
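The hierarchy and its defaults can be pictured with a small Python sketch. The class names ending in `Model` and the `describe` helper are hypothetical stand-ins written for illustration; they are not the onnx_light classes, and only the default values are taken from the description above.

```python
from dataclasses import dataclass

# Hypothetical stand-in for kSmallTensorDataThresholdBytes (64 bytes per the text).
SMALL_TENSOR_DATA_THRESHOLD_BYTES = 64


@dataclass
class TensorBufferOptionsModel:
    """Illustrative base class: both options default to 0."""
    raw_data_threshold: int = 0
    alignment: int = 0


@dataclass
class ParseOptionsModel(TensorBufferOptionsModel):
    """Illustrative parse options: only the threshold default changes."""
    raw_data_threshold: int = 1024


@dataclass
class SerializeOptionsModel(TensorBufferOptionsModel):
    """Illustrative serialize options: threshold defaults to 64 bytes."""
    raw_data_threshold: int = SMALL_TENSOR_DATA_THRESHOLD_BYTES


def describe(opts: TensorBufferOptionsModel) -> tuple:
    """Accepts the base type, and therefore any derived options object."""
    return (opts.raw_data_threshold, opts.alignment)
```

Calling `describe(ParseOptionsModel())` in this sketch yields the parse-specific threshold together with the inherited alignment default, which is the substitution property the paragraph above describes.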
Core objects and where ownership lives
Every tensor stores bytes in TensorProto::raw_data (type utils::ByteSpan).
That ByteSpan object is a member of the TensorProto instance, so its
lifetime is tied to the model object graph (ModelProto -> GraphProto -> TensorProto).
ByteSpan has two storage modes:
Owned mode: it owns an internal byte buffer.
Borrowed mode: it stores a pointer plus an optional std::shared_ptr<void> owner token.
When the borrowed mode also carries a shared owner token, the backing storage
remains alive as long as the corresponding TensorProto (or a copy of it) is
alive.
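As a rough mental model, the two storage modes can be sketched in Python. `ByteSpanModel` and its methods are hypothetical and do not reproduce the real utils::ByteSpan interface. One caveat: a Python `memoryview` always keeps its underlying object alive, so this sketch cannot reproduce a dangling borrowed pointer; it only shows which mode copies and which mode aliases.

```python
class ByteSpanModel:
    """Hypothetical model of the two storage modes (not the real utils::ByteSpan)."""

    def __init__(self):
        self._owned = None   # owned mode: a private copy of the bytes
        self._view = None    # borrowed mode: a view into memory held elsewhere
        self._owner = None   # optional owner token for the borrowed memory

    @classmethod
    def owned(cls, data):
        span = cls()
        span._owned = bytes(data)  # copy, so later changes to `data` are not seen
        return span

    @classmethod
    def borrowed(cls, view, owner=None):
        span = cls()
        span._view = view          # no copy: aliases the caller's memory
        span._owner = owner        # when set, models the shared owner token
        return span

    @property
    def is_owned(self):
        return self._owned is not None

    @property
    def has_owner_token(self):
        return self._owner is not None

    def tobytes(self):
        return self._owned if self.is_owned else bytes(self._view)
```

In this model, mutating the source buffer after construction changes what a borrowed span reads but leaves an owned span untouched.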
When ownership is assigned during parsing
Ownership is assigned while parsing each tensor:
Inline raw_data in the protobuf payload:
    no_copy=False: bytes are copied into ByteSpan owned mode.
    no_copy=True (from in-memory bytes): ByteSpan borrows from the input bytes buffer. No shared owner token is attached, so the caller owns the input bytes lifetime.
External-data tensors (data_location=EXTERNAL):
    no_copy=False: bytes are copied into ByteSpan owned mode.
    no_copy=True: TwoFilesStream memory-maps (or file-maps on Windows) the weights file once, returns a slice pointer and a shared_ptr owner, and ByteSpan stores both in borrowed mode.
In other words, external-data no-copy transfers lifetime management to shared ownership held by each tensor, while inline-bytes no-copy keeps lifetime management with the caller.
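The external-data no-copy pattern (map the file once, hand out slices that jointly keep the mapping alive) can be approximated with Python's standard `mmap` and `memoryview`. This is a sketch of the idea only: `map_weight_slices` and `demo` are hypothetical helpers, not part of onnx_light, and they say nothing about how TwoFilesStream is actually implemented.

```python
import mmap
import os
import tempfile


def map_weight_slices(path, ranges):
    """Map `path` once; return one read-only view per (offset, length) pair.

    Each returned view keeps the mapping alive, so no separate handle
    to the mapping has to be retained by the caller.
    """
    with open(path, "rb") as f:
        mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    whole = memoryview(mapping)
    return [whole[off:off + length] for off, length in ranges]


def demo():
    # Write a tiny stand-in "weights file" holding two tensors back to back.
    fd, path = tempfile.mkstemp(suffix=".weights")
    try:
        os.write(fd, b"AAAABBBBBBCC")
    finally:
        os.close(fd)
    try:
        views = map_weight_slices(path, [(0, 4), (4, 6)])
        second = bytes(views[1])
        first_view = views[0]
        del views                  # one remaining view still pins the mapping
        first = bytes(first_view)
        first_view.release()       # last reference gone: mapping can be freed
        del first_view
    finally:
        os.remove(path)
    return [first, second]
```

The design point mirrored here is that the file is mapped a single time and lifetime follows the slices, not the object that created the mapping.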
In-place consolidation with ConsolidateTensorsToBuffer
The function ConsolidateTensorsToBuffer(ModelProto &model, const TensorBufferOptions &opts)
(Python: onnx_light.onnx.consolidate_tensors_to_buffer) takes an already-loaded
model and moves all qualifying tensor payloads into a single contiguous buffer,
reproducing the shared-buffer ownership pattern of the no-copy external-data
loading scenario:
All tensors whose raw_data.size() >= opts.raw_data_threshold are selected.
A single buffer is allocated. If opts.alignment > 0, each tensor’s offset within the buffer is rounded up to the nearest multiple of alignment bytes, and the buffer start itself is aligned to the same boundary.
Each tensor’s bytes are copied into the buffer at the computed offset.
Each tensor’s raw_data is switched to borrowed mode with a shared owner token pointing to the new buffer, so the buffer stays alive as long as any tensor references it.
The function returns a std::shared_ptr<uint8_t[]> (Python: the function returns
None; the buffer lifetime is managed by the tensors). Tensors that are smaller
than the threshold remain in their original owned or borrowed state.
This is useful for:
Reducing memory fragmentation after loading a model that was parsed without the no-copy option.
Enabling memory-mapping of all tensor weights after the fact.
Preparing a model for inference runtimes that benefit from a single contiguous tensor weight region.
Usage example (C++):
#include "onnx_helper.h"
// Load a model normally.
ModelProto model;
utils::FileStream stream("model.onnx");
ParseOptions parse_opts;
ParseProtoFromStream(model, stream, parse_opts);
// Consolidate all tensors into a single 64-byte-aligned buffer.
TensorBufferOptions buf_opts;
buf_opts.alignment = 64;
auto buf = ConsolidateTensorsToBuffer(model, buf_opts);
// buf is now the shared buffer; the model's tensors borrow from it.
Usage example (Python):
import onnx_light.onnx as onnxl
model = onnxl.load("model.onnx")
opts = onnxl.TensorBufferOptions()
opts.alignment = 64 # optional: align each tensor to 64-byte boundaries
opts.raw_data_threshold = 1024 # optional: only consolidate tensors >= 1 KB
onnxl.consolidate_tensors_to_buffer(model, opts)
# After this call, large tensors borrow from a single shared buffer.
Loading scenarios summary
| Load scenario | | | Who must keep backing memory alive |
|---|---|---|---|
| | | Owned copy | |
| | | Owned copy (file stream cannot borrow inline payload) | |
| | | Owned copy | |
| | | Borrowed pointer into | Caller (must keep |
| | | Owned copy | |
| | | Borrowed pointer + shared owner token | Shared ownership via |
| | n/a | Borrowed pointer + shared owner token | Shared ownership via |
When memory is released
Owned mode memory is released when ByteSpan is destroyed.
Copy scenarios (no_copy=False) always use owned storage; memory is released when each ByteSpan is destroyed with the model/tensor object.
No-copy + external-data stores borrowed pointers with a shared owner token; mapped/shared weights are released only when the last referencing ByteSpan is destroyed.
No-copy + inline bytes stores borrowed pointers without owner token; tensors are valid only while the caller-managed input bytes object exists.
ConsolidateTensorsToBuffer creates a single shared buffer and stores a shared owner token in each tensor’s ByteSpan; the buffer is released when all referencing ByteSpan objects (and any external shared_ptr returned by the C++ function) are destroyed.
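The "released when the last reference goes away" rule for shared buffers can be illustrated with reference counting in Python. `SharedBuffer`, `TensorModel`, and `demo_release_order` are hypothetical names; the sketch relies on CPython's immediate reference-count cleanup and only approximates how shared_ptr-based ownership behaves in the C++ code.

```python
import gc
import weakref


class SharedBuffer:
    """Hypothetical stand-in for the consolidated (or memory-mapped) buffer."""

    def __init__(self, size):
        self.data = bytearray(size)


class TensorModel:
    """Hypothetical tensor holding an owner token plus its slice bounds."""

    def __init__(self, owner, offset, length):
        self.owner = owner      # shared owner token
        self.offset = offset
        self.length = length


def demo_release_order():
    events = []
    buf = SharedBuffer(16)
    # Record the moment the buffer object is actually destroyed.
    weakref.finalize(buf, events.append, "buffer released")
    tensors = [TensorModel(buf, 0, 8), TensorModel(buf, 8, 8)]

    del buf                      # like dropping the returned shared_ptr
    events.append("handle dropped")
    tensors.pop()                # one tensor destroyed, one still references it
    events.append("one tensor destroyed")
    tensors.pop()                # last reference gone
    gc.collect()                 # harmless on CPython; helps other runtimes
    return events
```

In this sketch the release event is recorded only after both the external handle and every tensor are gone, which matches the rule stated for ConsolidateTensorsToBuffer and for no-copy external-data loading.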
Model copy/move behavior
Moving model/tensor objects preserves ByteSpan ownership state:
owned buffers remain owned by the destination object,
borrowed pointers remain borrowed,
shared owner tokens (when present in no-copy external-data) move with the tensors.
This means:
In copy scenarios, model data remains owned by model objects.
In no-copy external-data scenarios, data remains valid after the TwoFilesStream parser object is destroyed because each tensor keeps a shared owner token for the mapped buffer.
In no-copy inline-bytes scenarios, tensors still depend on the original caller-provided bytes object lifetime.
After ConsolidateTensorsToBuffer, tensors remain valid regardless of whether the caller retains the returned shared_ptr because each tensor holds its own owner token.