.. _l-design-no-copy-ownership:

ModelProto creation and no-copy ownership
==========================================

This page explains exactly who owns the bytes behind each tensor's ``raw_data``
when ``no_copy=True`` is set, when ownership is transferred, and when memory is
released.  It also documents the in-place tensor consolidation function
:func:`onnx_light.onnx.consolidate_tensors_to_buffer`
(C++: ``ConsolidateTensorsToBuffer``), which gives an already-loaded model the
same kind of shared-buffer ownership.

Options class hierarchy
-----------------------

Buffer-related options (alignment, size threshold) are shared across several
operations and are factored into a common base class:

* ``TensorBufferOptions`` – base class with ``raw_data_threshold`` (default: 0)
  and ``alignment`` (default: 0).
* ``ParseOptions`` inherits ``TensorBufferOptions``; its ``raw_data_threshold``
  defaults to 1024 bytes.
* ``SerializeOptions`` inherits ``TensorBufferOptions``; its ``raw_data_threshold``
  defaults to ``kSmallTensorDataThresholdBytes`` (64 bytes).

Any function that accepts a ``TensorBufferOptions`` reference also accepts
``ParseOptions`` or ``SerializeOptions`` objects.
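
The defaults above can be summarized in a small C++ sketch.  Only the class
names and default values come from this page; the ``std::size_t`` field types,
the constructor-based defaults and the ``EffectiveThreshold`` helper are
hypothetical, and the real onnx_light declarations may carry extra members::

    #include <cassert>
    #include <cstddef>

    // Hypothetical mirror of the documented defaults; the real onnx_light
    // declarations may differ in field types and carry extra members.
    constexpr std::size_t kSmallTensorDataThresholdBytes = 64;

    struct TensorBufferOptions {
      std::size_t raw_data_threshold = 0;  // minimum raw_data size in bytes
      std::size_t alignment = 0;           // 0 disables alignment
    };

    struct ParseOptions : TensorBufferOptions {
      ParseOptions() { raw_data_threshold = 1024; }
    };

    struct SerializeOptions : TensorBufferOptions {
      SerializeOptions() { raw_data_threshold = kSmallTensorDataThresholdBytes; }
    };

    // Hypothetical helper written against the base class, so it also accepts
    // ParseOptions and SerializeOptions objects.
    std::size_t EffectiveThreshold(const TensorBufferOptions& opts) {
      return opts.raw_data_threshold;
    }

Because the derived classes only override a default, a ``ParseOptions`` object
binds directly to a ``const TensorBufferOptions &`` parameter.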

Core objects and where ownership lives
--------------------------------------

Every tensor stores bytes in ``TensorProto::raw_data`` (type ``utils::ByteSpan``).
That ``ByteSpan`` object is a member of the ``TensorProto`` instance, so its
lifetime is tied to the model object graph (``ModelProto -> GraphProto -> TensorProto``).

``ByteSpan`` has two storage modes:

* **Owned mode**: it owns an internal byte buffer.
* **Borrowed mode**: it stores a pointer plus an optional ``std::shared_ptr<void>``
  owner token.

When a borrowed ``ByteSpan`` also carries an owner token, the backing storage
remains alive as long as the corresponding ``TensorProto`` (or any copy of it) is
alive.  Without a token, whoever provided the memory must keep it alive.
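
The two storage modes can be pictured with the following minimal sketch.
``SketchByteSpan`` is a hypothetical, simplified class written for this page,
not the real ``utils::ByteSpan`` API; it only shows how owned storage, a
borrowed pointer and an optional ``std::shared_ptr<void>`` owner token fit
together::

    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <memory>
    #include <utility>
    #include <vector>

    // Hypothetical, simplified stand-in for utils::ByteSpan (not its real API).
    class SketchByteSpan {
     public:
      // Owned mode: copy the bytes into internal storage.
      void SetOwned(const uint8_t* data, std::size_t size) {
        owned_.assign(data, data + size);
        borrowed_ = nullptr;
        borrowed_size_ = 0;
        owner_.reset();
        is_owned_ = true;
      }

      // Borrowed mode: keep a pointer plus an optional owner token that keeps
      // the backing storage alive while this object (or a copy) exists.
      void SetBorrowed(const uint8_t* data, std::size_t size,
                       std::shared_ptr<void> owner = nullptr) {
        owned_.clear();
        borrowed_ = data;
        borrowed_size_ = size;
        owner_ = std::move(owner);
        is_owned_ = false;
      }

      const uint8_t* data() const { return is_owned_ ? owned_.data() : borrowed_; }
      std::size_t size() const { return is_owned_ ? owned_.size() : borrowed_size_; }
      bool is_owned() const { return is_owned_; }
      bool has_owner_token() const { return owner_ != nullptr; }

     private:
      bool is_owned_ = true;
      std::vector<uint8_t> owned_;         // used only in owned mode
      const uint8_t* borrowed_ = nullptr;  // used only in borrowed mode
      std::size_t borrowed_size_ = 0;
      std::shared_ptr<void> owner_;        // empty unless a token was given
    };

Selecting the data pointer from a mode flag (rather than caching a pointer into
``owned_``) keeps copies of an owned span valid, since each copy points at its
own vector.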

When ownership is assigned during parsing
-----------------------------------------

Ownership is assigned while parsing each tensor:

* Inline ``raw_data`` in the protobuf payload:

  * ``no_copy=False``: bytes are copied into ``ByteSpan`` owned mode.
  * ``no_copy=True`` (parsing from in-memory bytes): ``ByteSpan`` borrows a pointer
    into the input bytes buffer.  No owner token is attached, so the caller must
    keep the input bytes alive for as long as the model is in use.

* External-data tensors (``data_location=EXTERNAL``):

  * ``no_copy=False``: bytes are copied into ``ByteSpan`` owned mode.
  * ``no_copy=True``: ``TwoFilesStream`` memory-maps the weights file once (using a
    file mapping on Windows), returns a slice pointer together with a ``shared_ptr``
    owner for each tensor, and ``ByteSpan`` stores both in borrowed mode.

In other words, no-copy loading of external data hands lifetime management to
the tensors themselves through shared ownership, whereas no-copy parsing of
inline bytes leaves lifetime management with the caller.
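
The following C++ sketch contrasts the two no-copy cases.  ``BorrowedBytes``,
``BorrowInline`` and ``FakeWeightsStream`` are hypothetical names, and a
heap-allocated ``std::vector`` stands in for the memory-mapped weights file so
that the example runs without touching the file system; the real
``TwoFilesStream`` interface is not shown here::

    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <memory>
    #include <utility>
    #include <vector>

    // Hypothetical borrowed view: pointer, size and an optional owner token.
    struct BorrowedBytes {
      const uint8_t* data = nullptr;
      std::size_t size = 0;
      std::shared_ptr<void> owner;  // stays empty for inline no-copy parsing
    };

    // Inline raw_data with no_copy=True: point into the caller's bytes and
    // attach no token, so the caller must keep `input` alive.
    BorrowedBytes BorrowInline(const std::vector<uint8_t>& input,
                               std::size_t offset, std::size_t size) {
      assert(offset + size <= input.size());
      return BorrowedBytes{input.data() + offset, size, nullptr};
    }

    // Hypothetical stand-in for an external-data stream.  A heap vector plays
    // the role of the weights file that the real stream memory-maps once.
    class FakeWeightsStream {
     public:
      explicit FakeWeightsStream(std::vector<uint8_t> file_bytes)
          : mapping_(std::make_shared<std::vector<uint8_t>>(std::move(file_bytes))) {}

      // External data with no_copy=True: slice pointer plus shared owner token.
      BorrowedBytes Slice(std::size_t offset, std::size_t size) const {
        assert(offset + size <= mapping_->size());
        return BorrowedBytes{mapping_->data() + offset, size, mapping_};
      }

     private:
      std::shared_ptr<std::vector<uint8_t>> mapping_;
    };

Once a ``FakeWeightsStream`` is destroyed, the slices it produced stay valid
because each one holds a copy of the owner token; an inline view has no such
protection and dangles as soon as the caller's vector goes away.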

In-place consolidation with ConsolidateTensorsToBuffer
------------------------------------------------------

The function ``ConsolidateTensorsToBuffer(ModelProto &model, const TensorBufferOptions &opts)``
(Python: ``onnx_light.onnx.consolidate_tensors_to_buffer``) takes an already-loaded
model, copies every qualifying tensor payload into a single contiguous buffer, and
repoints those tensors at it.  This reproduces the shared-buffer ownership pattern
of no-copy external-data loading:

1. All tensors whose ``raw_data.size() >= opts.raw_data_threshold`` are selected.
2. A single buffer is allocated.  If ``opts.alignment > 0``, each tensor's offset
   within the buffer is rounded up to the nearest multiple of ``alignment`` bytes,
   and the buffer start itself is aligned to the same boundary.
3. Each tensor's bytes are copied into the buffer at the computed offset.
4. Each tensor's ``raw_data`` is switched to **borrowed mode** with a shared owner
   token pointing to the new buffer, so the buffer stays alive as long as any
   tensor references it.

In C++, ``ConsolidateTensorsToBuffer`` returns the consolidated buffer as a
``std::shared_ptr<uint8_t[]>``; the Python binding returns ``None`` because the
tensors already manage the buffer lifetime.  Tensors smaller than the threshold
keep their original owned or borrowed state.
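
A compact C++ sketch of steps 1 to 4 follows.  It is not the onnx_light
implementation: ``ToyTensor``, ``RoundUp`` and ``Consolidate`` are hypothetical,
the buffer is a ``std::vector`` created with ``std::make_shared``, and the sketch
returns a ``std::shared_ptr<uint8_t>`` built with the aliasing constructor
(pointing at the aligned start) instead of a ``std::shared_ptr<uint8_t[]>``::

    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <memory>
    #include <vector>

    // Hypothetical tensor payload: owns its bytes until `borrowed` is set.
    struct ToyTensor {
      std::vector<uint8_t> owned;         // owned mode storage
      const uint8_t* borrowed = nullptr;  // borrowed mode pointer
      std::size_t borrowed_size = 0;
      std::shared_ptr<void> owner;        // owner token in borrowed mode
      const uint8_t* data() const { return borrowed ? borrowed : owned.data(); }
      std::size_t size() const { return borrowed ? borrowed_size : owned.size(); }
    };

    // Round `value` up to a multiple of `alignment` (0 means no alignment).
    std::size_t RoundUp(std::size_t value, std::size_t alignment) {
      return alignment == 0 ? value : (value + alignment - 1) / alignment * alignment;
    }

    std::shared_ptr<uint8_t> Consolidate(std::vector<ToyTensor>& tensors,
                                         std::size_t threshold,
                                         std::size_t alignment) {
      // Steps 1-2: select tensors and compute each aligned offset.
      std::vector<std::size_t> selected, offsets;
      std::size_t total = 0;
      for (std::size_t i = 0; i < tensors.size(); ++i) {
        if (tensors[i].size() < threshold) continue;
        total = RoundUp(total, alignment);
        selected.push_back(i);
        offsets.push_back(total);
        total += tensors[i].size();
      }
      // Over-allocate by `alignment` bytes so an aligned start address exists.
      auto storage = std::make_shared<std::vector<uint8_t>>(total + alignment);
      auto base = reinterpret_cast<std::uintptr_t>(storage->data());
      uint8_t* start = storage->data() + (RoundUp(base, alignment) - base);
      // Aliasing constructor: shares ownership of `storage`, points at `start`.
      std::shared_ptr<uint8_t> buffer(storage, start);
      for (std::size_t k = 0; k < selected.size(); ++k) {
        ToyTensor& t = tensors[selected[k]];
        const std::size_t n = t.size();
        // Step 3: copy the payload to its offset in the shared buffer.
        if (n > 0) std::memcpy(start + offsets[k], t.data(), n);
        // Step 4: switch to borrowed mode with a shared owner token.
        t.borrowed = start + offsets[k];
        t.borrowed_size = n;
        t.owner = buffer;
        t.owned.clear();
        t.owned.shrink_to_fit();
      }
      return buffer;
    }

With ``alignment = 64``, a 10-byte tensor followed by a 100-byte tensor lands at
offsets 0 and 64.  Over-allocating is only one way to align the start; an
aligned allocator would work as well.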

This is useful for:

* Reducing memory fragmentation after loading a model that was parsed without the
  no-copy option.
* Enabling memory-mapping of all tensor weights after the fact.
* Preparing a model for inference runtimes that benefit from a single contiguous
  tensor weight region.

Usage example (C++)::

    #include "onnx_helper.h"

    // Load a model normally.
    ModelProto model;
    utils::FileStream stream("model.onnx");
    ParseOptions parse_opts;
    ParseProtoFromStream(model, stream, parse_opts);

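    // Consolidate tensors into a single 64-byte-aligned buffer.  The default
    // raw_data_threshold is 0, so every tensor qualifies.
    TensorBufferOptions buf_opts;
    buf_opts.alignment = 64;
    auto buf = ConsolidateTensorsToBuffer(model, buf_opts);
    // buf is the shared buffer and the model's tensors borrow from it.  Each
    // tensor also holds its own owner token, so buf may be released early.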

Usage example (Python)::

    import onnx_light.onnx as onnxl

    model = onnxl.load("model.onnx")

    opts = onnxl.TensorBufferOptions()
    opts.alignment = 64        # optional: align each tensor to 64-byte boundaries
    opts.raw_data_threshold = 1024  # optional: only consolidate tensors >= 1 KB
    onnxl.consolidate_tensors_to_buffer(model, opts)
    # After this call, large tensors borrow from a single shared buffer.

Loading scenarios summary
-------------------------

.. list-table::
   :header-rows: 1
   :widths: 30 20 25 25

   * - Load scenario
     - ``no_copy``
     - ``TensorProto::raw_data`` storage
     - Who must keep backing memory alive
   * - ``onnxl.load("model.onnx")`` (single-file)
     - ``False`` (default)
     - Owned copy
     - ``TensorProto`` / model
   * - ``onnxl.load("model.onnx", no_copy=True)`` (single-file)
     - ``True``
     - Owned copy (file stream cannot borrow inline payload)
     - ``TensorProto`` / model
   * - ``onnxl.load(model_bytes, no_copy=False)``
     - ``False``
     - Owned copy
     - ``TensorProto`` / model
   * - ``onnxl.load(model_bytes, no_copy=True)``
     - ``True``
     - Borrowed pointer into ``model_bytes``
     - **Caller** (must keep ``model_bytes`` alive)
   * - ``onnxl.load("model.onnx", load_external_data=True, no_copy=False)``
     - ``False``
     - Owned copy
     - ``TensorProto`` / model
   * - ``onnxl.load("model.onnx", load_external_data=True, no_copy=True)``
     - ``True``
     - Borrowed pointer + shared owner token
     - Shared ownership via ``ByteSpan`` in model tensors
   * - ``onnxl.consolidate_tensors_to_buffer(model)`` (post-load)
     - n/a
     - Borrowed pointer + shared owner token
     - Shared ownership via ``ByteSpan`` in model tensors

When memory is released
-----------------------

* Owned mode memory is released when ``ByteSpan`` is destroyed.
* **Copy scenarios** (``no_copy=False``) always use owned storage; memory is
  released when each ``ByteSpan`` is destroyed with the model/tensor object.
* **No-copy + external-data** stores borrowed pointers with a shared owner
  token; mapped/shared weights are released only when the last referencing
  ``ByteSpan`` is destroyed.
* **No-copy + inline bytes** stores borrowed pointers without an owner token;
  tensors are valid only while the caller-managed input bytes object exists.
* **ConsolidateTensorsToBuffer** creates a single shared buffer and stores a
  shared owner token in each tensor's ``ByteSpan``; the buffer is released when
  all referencing ``ByteSpan`` objects (and any external ``shared_ptr`` returned
  by the C++ function) are destroyed.
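
The release rule for shared owner tokens can be checked with a small C++ sketch.
``TensorView`` and ``MakeTrackedBuffer`` are hypothetical; the custom deleter
only records when the buffer is freed, standing in for unmapping a weights file
or freeing a consolidated buffer::

    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <memory>
    #include <vector>

    // Hypothetical tensor view: borrowed pointer plus shared owner token.
    struct TensorView {
      const uint8_t* data = nullptr;
      std::shared_ptr<void> owner;
    };

    // Allocate a zeroed shared buffer whose deleter sets *released once the
    // last owner token is gone, standing in for unmapping or freeing weights.
    std::shared_ptr<uint8_t> MakeTrackedBuffer(std::size_t size, bool* released) {
      return std::shared_ptr<uint8_t>(new uint8_t[size](), [released](uint8_t* p) {
        delete[] p;
        *released = true;
      });
    }

Dropping the creator's handle (the parser's or the one returned by the C++
consolidation function) does not free anything while a tensor still holds a
token; the deleter runs only when the last token is destroyed.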

Model copy/move behavior
------------------------

Moving model/tensor objects preserves ``ByteSpan`` ownership state:

* owned buffers remain owned by the destination object,
* borrowed pointers remain borrowed,
* shared owner tokens (present after no-copy external-data loading or
  ``ConsolidateTensorsToBuffer``) move with the tensors.

This means:

* In **copy scenarios**, model data remains owned by model objects.
* In **no-copy external-data scenarios**, data remains valid after the
  ``TwoFilesStream`` parser object is destroyed because each tensor keeps a
  shared owner token for the mapped buffer.
* In **no-copy inline-bytes scenarios**, tensors still depend on the original
  caller-provided bytes object lifetime.
* After **ConsolidateTensorsToBuffer**, tensors remain valid regardless of
  whether the caller retains the returned ``shared_ptr`` because each tensor
  holds its own owner token.
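
These guarantees can be illustrated with another hypothetical C++ sketch.
``ToyRawData`` and ``ParseNoCopyExternal`` are invented for this page, and a
``std::vector`` again stands in for the mapped weights.  The stream-side handle
disappears when parsing returns, while moved or copied tensors keep the storage
alive through their tokens::

    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <memory>
    #include <utility>
    #include <vector>

    // Hypothetical simplified raw_data holder (not the real TensorProto).
    struct ToyRawData {
      std::vector<uint8_t> owned;         // owned mode storage
      const uint8_t* borrowed = nullptr;  // borrowed mode pointer
      std::shared_ptr<void> owner;        // optional owner token
    };

    // Simulated no-copy external-data parse: `mapping` plays the role of the
    // stream's mapped weights and goes out of scope when the function returns.
    std::vector<ToyRawData> ParseNoCopyExternal(std::size_t num_tensors) {
      auto mapping = std::make_shared<std::vector<uint8_t>>(num_tensors * 4, 42);
      std::vector<ToyRawData> tensors(num_tensors);
      for (std::size_t i = 0; i < num_tensors; ++i) {
        tensors[i].borrowed = mapping->data() + 4 * i;
        tensors[i].owner = mapping;  // each tensor keeps the buffer alive
      }
      return tensors;
    }

Moving the returned vector transfers the tokens without touching the reference
count, while copying a tensor adds one more owner.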