Using backend tests to evaluate a runtime#

This page explains how to use the backend test suite shipped with onnx-light to validate that a custom ONNX runtime produces correct numerical results.

The backend test infrastructure is located in onnx_light.backend.test.case and mirrors the structure of the official ONNX backend test suite. The registered node test cases are generated by the C++ lib_onnx_backend_test library and exposed to Python through collect_test_case(). Downstream code can still register additional Python-only test cases by subclassing Base and calling the expect() helper. The make_test_class() function then turns those test cases into a standard unittest.TestCase subclass that calls into a user-supplied runtime function.


Defining a runtime function#

The only requirement for plugging in a runtime is to write a callable with the following signature:

def my_runtime(model, *inputs: np.ndarray) -> list[np.ndarray]:
    ...

where

  • model is an onnx_light.onnx.ModelProto (the ONNX model for the test case),

  • *inputs are numpy.ndarray objects corresponding to the model’s graph inputs in order, and

  • the return value is a list of numpy.ndarray objects corresponding to the model’s graph outputs in order.

The runtime may serialize the ModelProto to bytes, pass it to any ONNX-compatible engine, and return the results.
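
The serialize, load, run sequence described above can be sketched as a small adapter. This is only an illustration: make_runtime, load_session and run_session are hypothetical names that are not part of onnx-light, and _FakeModel is a stand-in for a real ModelProto. The only call taken from this page is SerializeToString().

```python
from typing import Any, Callable, Sequence


def make_runtime(
    load_session: Callable[[bytes], Any],
    run_session: Callable[[Any, Sequence[Any]], Sequence[Any]],
) -> Callable[..., list]:
    """Build a runtime callable from two engine-specific hooks (both hypothetical)."""

    def runtime(model, *inputs):
        # Serialize the model so that any ONNX-compatible engine can load it.
        model_bytes = model.SerializeToString()
        session = load_session(model_bytes)
        # Inputs arrive in graph-input order; outputs are returned as a list
        # in whatever order the engine hook produced them.
        outputs = run_session(session, inputs)
        return list(outputs)

    return runtime


class _FakeModel:
    """Stand-in for a ModelProto, used only to exercise the adapter."""

    def SerializeToString(self) -> bytes:
        return b"fake-model"


# A dummy "engine" that simply echoes its inputs back as outputs.
echo_runtime = make_runtime(
    load_session=lambda model_bytes: len(model_bytes),
    run_session=lambda session, inputs: tuple(inputs),
)
result = echo_runtime(_FakeModel(), 1, 2)
```

With a real engine, the two hooks would wrap that engine's own loading and inference calls, and the inputs and outputs would be numpy.ndarray objects.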


Generating a test class#

Call make_test_class() with the runtime callable to obtain an ExtTestCase subclass with one test method per registered test case:

import unittest
import numpy as np
from onnx_light.backend.test.case import make_test_class


def my_runtime(model, *inputs: np.ndarray) -> list[np.ndarray]:
    # replace with the actual engine call
    raise NotImplementedError


MyBackendTests = make_test_class(my_runtime)

if __name__ == "__main__":
    unittest.main(verbosity=2)

Running the file with python or through any unittest-compatible runner (pytest, etc.) will execute every registered node test case and report failures when the runtime output differs from the expected output.


Filtering tests#

Two optional parameters let you restrict which test cases are executed.

include_regex

A list of regular-expression patterns. Only test cases whose name matches at least one pattern are kept.

exclude_regex

A list of regular-expression patterns. Test cases whose name matches at least one pattern are discarded. Exclusions are evaluated before include_regex, so a name matched by both lists is dropped.

Example — run only tests related to element-wise arithmetic:

ArithmeticTests = make_test_class(
    my_runtime,
    include_regex=[r"^test_add", r"^test_sub", r"^test_mul", r"^test_div"],
)

Example — run everything except the quantization operators:

NoQuantTests = make_test_class(
    my_runtime,
    exclude_regex=[r"quantize", r"dequantize"],
)
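
The interaction between the two filters can be sketched in plain Python. This is not the library's implementation: select_names is a hypothetical helper, the test names are illustrative, and the sketch assumes patterns are applied with re.search (which the unanchored patterns in the examples above suggest but do not confirm).

```python
import re


def select_names(names, include_regex=None, exclude_regex=None):
    """Return the names that survive exclusion first, then inclusion."""
    selected = []
    for name in names:
        # Exclusion is checked first: any match discards the name.
        if exclude_regex and any(re.search(p, name) for p in exclude_regex):
            continue
        # If include patterns are given, at least one of them must match.
        if include_regex and not any(re.search(p, name) for p in include_regex):
            continue
        selected.append(name)
    return selected


# Illustrative names only; the real registry is much larger.
names = [
    "test_add",
    "test_add_uint8",
    "test_quantizelinear",
    "test_dequantizelinear",
    "test_abs",
]
arithmetic = select_names(names, include_regex=[r"^test_add"])
no_quant = select_names(names, exclude_regex=[r"quantize"])
```

Under the re.search assumption, the single pattern r"quantize" already matches names containing "dequantize", so listing both patterns is harmless but redundant.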

Adjusting numerical tolerances#

By default each test case uses atol=1e-7 and rtol=1e-3. These values can be overridden for individual test cases through the atols and rtols dictionaries, which map a test-case name to the tolerance to use:

MyBackendTests = make_test_class(
    my_runtime,
    atols={"test_cast_FLOAT_to_FLOAT16": 1e-3},
    rtols={"test_cast_FLOAT_to_FLOAT16": 1e-2},
)
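
To get a feel for what these values mean, here is a minimal sketch of a tolerance lookup followed by a scalar comparison. The comparison rule is not spelled out on this page; the sketch assumes the numpy.allclose-style criterion |actual - expected| <= atol + rtol * |expected|. tolerances_for and is_close are hypothetical helpers, not onnx-light functions.

```python
DEFAULT_ATOL = 1e-7
DEFAULT_RTOL = 1e-3


def tolerances_for(name, atols=None, rtols=None):
    """Look up per-test overrides, falling back to the documented defaults."""
    atol = (atols or {}).get(name, DEFAULT_ATOL)
    rtol = (rtols or {}).get(name, DEFAULT_RTOL)
    return atol, rtol


def is_close(actual, expected, atol, rtol):
    """Assumed criterion, in the style of numpy.allclose, for a single scalar."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


default = tolerances_for("test_abs")
overridden = tolerances_for(
    "test_cast_FLOAT_to_FLOAT16",
    atols={"test_cast_FLOAT_to_FLOAT16": 1e-3},
    rtols={"test_cast_FLOAT_to_FLOAT16": 1e-2},
)

within_default = is_close(1.0005, 1.0, *default)    # error of 5e-4
outside_default = is_close(1.005, 1.0, *default)    # error of 5e-3
within_override = is_close(1.005, 1.0, *overridden)
```

Under this assumption, an error of 5e-3 on an expected value of 1.0 is rejected with the defaults but accepted once the looser float16 tolerances apply.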

Filtering test cases by operator and opset#

The helper get_test_cases_for_op() returns the subset of collected backend test cases whose model contains a node with a given op_type (and optionally a given domain / opset_version). This is convenient when a backend wants to focus on a single operator (and version) at a time:

from onnx_light.backend.test.case import get_test_cases_for_op

# All cases that exercise Abs in the default ai.onnx domain.
abs_cases = get_test_cases_for_op("Abs")

# Cases that import ai.onnx at exactly version 13 and use Abs.
abs_v13 = get_test_cases_for_op("Abs", opset_version=13)

# Cases that use Abs from a custom domain.
custom = get_test_cases_for_op("Abs", domain="my.custom.domain")

When called without test_cases, the helper calls collect_test_case() internally. A precomputed mapping can be passed via the test_cases argument to avoid recollecting test cases on repeated lookups.
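
The selection rule can be illustrated with a self-contained sketch. Everything here is a stand-in: filter_cases_for_op is a hypothetical function, each case is reduced to a plain dictionary with a list of (domain, op_type) pairs and an opset mapping, the default domain is written as the empty string, and the case names are made up. The real helper works on the collected test cases and its return type may differ.

```python
def filter_cases_for_op(test_cases, op_type, domain="", opset_version=None):
    """Keep cases using (domain, op_type), optionally at an exact opset version."""
    selected = {}
    for name, case in test_cases.items():
        uses_op = any(
            node_domain == domain and node_op == op_type
            for node_domain, node_op in case["nodes"]
        )
        if not uses_op:
            continue
        # When a version is requested, the case must import the domain at
        # exactly that version.
        if opset_version is not None and case["opsets"].get(domain) != opset_version:
            continue
        selected[name] = case
    return selected


# Made-up cases in a simplified layout, used only for this illustration.
cases = {
    "test_abs": {"nodes": [("", "Abs")], "opsets": {"": 13}},
    "test_abs_old": {"nodes": [("", "Abs")], "opsets": {"": 6}},
    "test_add": {"nodes": [("", "Add")], "opsets": {"": 14}},
    "test_custom_abs": {
        "nodes": [("my.custom.domain", "Abs")],
        "opsets": {"my.custom.domain": 1},
    },
}

abs_cases = filter_cases_for_op(cases, "Abs")
abs_v13 = filter_cases_for_op(cases, "Abs", opset_version=13)
custom = filter_cases_for_op(cases, "Abs", domain="my.custom.domain")
```

Because the mapping is built once and passed in, repeated lookups in this sketch do not trigger a new collection, which is the same motivation as the test_cases argument described above.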


Full example: ONNXRuntime backend#

The file unittests/backend/test_backend_with_onnxruntime.py in the repository is a ready-to-run example that exercises every registered backend test case through ONNXRuntime:

import unittest
import numpy as np
import onnxruntime as ort
from onnx_light.backend.test.case import make_test_class


def onnxruntime_backend(model, *inputs: np.ndarray) -> list[np.ndarray]:
    """
    Runs an ONNX model using ONNXRuntime.

    Args:
        model: The ONNX model (onnx_light.ModelProto) to run
        *inputs: Input arrays for the model

    Returns:
        List of output arrays from the model
    """
    # Serialize the model to bytes
    model_bytes = model.SerializeToString()

    # Create an ONNXRuntime inference session
    sess = ort.InferenceSession(model_bytes, providers=["CPUExecutionProvider"])

    # Get input names from the session
    input_names = [inp.name for inp in sess.get_inputs()]

    # Create input dictionary
    input_dict = dict(zip(input_names, inputs))

    # Run inference
    outputs = sess.run(None, input_dict)
    return outputs


# Backend test cases that ONNXRuntime cannot run as-is:
#   * ``test_cc_roialign_max`` — ORT's RoiAlign max-mode implementation does
#     not match the ONNX reference (ORT emits a warning on session creation).
#   * ``test_cc_flex_attention_*`` — ORT does not register the
#     ``ai.onnx.preview`` domain, so these models fail to load with
#     "ai.onnx.preview:FlexAttention(-1) is not a registered function/op".
#   * ``test_cc_adam_*``, ``test_adam`` and ``test_adam_multiple`` — ORT does
#     not register the ``ai.onnx.preview.training`` domain, so these models
#     fail to load with
#     "ai.onnx.preview.training:Adam(-1) is not a registered function/op".
#   * ``test_cc_binarizer_int64`` — ORT only registers a ``float`` kernel for
#     ``ai.onnx.ml::Binarizer``, so the ``int64`` variant fails with
#     "Could not find an implementation for Binarizer(1) node". The
#     ``float`` variant (``test_cc_binarizer_float``) is still exercised.
# These cases remain covered by the reference backend tests.
ORT_EXCLUDE_REGEX = [
    r"^test_cc_roialign_max$",
    r"^test_cc_flex_attention_",
    r"^test_cc_adam_",
    r"^test_adam$",
    r"^test_adam_multiple$",
    r"^test_cc_binarizer_int64$",
]

TestOrtBackend = make_test_class(onnxruntime_backend, exclude_regex=ORT_EXCLUDE_REGEX)


if __name__ == "__main__":
    unittest.main(verbosity=2)

The runtime function serializes the ModelProto to bytes with SerializeToString(), creates an onnxruntime.InferenceSession on the CPU execution provider, maps the positional inputs onto the session's input names, and returns the inference outputs.

Run it with:

python -m pytest unittests/backend/test_backend_with_onnxruntime.py -v

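or, to run only the test cases whose name contains abs (pytest's -k option matches substrings case-insensitively, so this selects the Abs cases):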

python -m pytest unittests/backend/test_backend_with_onnxruntime.py -v -k abs

How test cases are collected#

collect_test_case() first collects every node test case registered by the C++ lib_onnx_backend_test library (exposed through the onnx_light.onnx_py._onnxpy.backend_test Python bindings). It then runs every export_* class method declared on any user-defined subclass of Base; each call to expect() appends one TestCase to the global ALL_TESTS dictionary. Python-defined cases take precedence over C++ cases with the same name.

make_test_class() calls collect_test_case() internally, so tests are always re-collected from scratch when the function is called.
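
The precedence rule amounts to a dictionary merge in which the Python-defined entries are applied last. The sketch below uses plain strings as stand-ins for TestCase objects and a hypothetical merge_cases helper; it illustrates the rule and is not the code behind collect_test_case().

```python
def merge_cases(cpp_cases, python_cases):
    """Merge two name-to-case mappings; Python-defined cases win on name clashes."""
    merged = dict(cpp_cases)      # start from the C++ registry
    merged.update(python_cases)   # Python cases overwrite entries with the same name
    return merged


# Stand-in registries; real entries would be TestCase objects.
cpp_cases = {"test_abs": "abs from C++", "test_add": "add from C++"}
python_cases = {"test_add": "add from Python", "test_my_op": "custom Python case"}

all_tests = merge_cases(cpp_cases, python_cases)
```

Here test_add resolves to the Python-defined entry, while test_abs and test_my_op are kept unchanged.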


Running backend tests in C++#

The exact same node test cases are also available directly from C++ via the lib_onnx_backend_test static library, with no dependency on Python. The library lives in onnx_light/onnx_backend_test/ and only depends on lib_onnx_proto. It exposes:

  • a runtime onnx::onnx_backend_test::Tensor (distinct from onnx::TensorProto) that stores raw element bytes,

  • an onnx::onnx_backend_test::TestCase bundle of onnx::ModelProto and expected input/output data sets,

  • the onnx::onnx_backend_test::Expect() helper used by every RegisterXxxCases function to register a single-node model, and

  • onnx::onnx_backend_test::CollectTestCases(), which returns the full registry of node test cases (the same registry that the Python bindings expose through onnx_light.onnx_py._onnxpy.backend_test).

Per-operator cases are organised under onnx_light/onnx_backend_test/cases/<group>/ (math, logical, nn, tensor, …) and the expected outputs are computed with the reference kernels under onnx_light/onnx_backend_test/kernels/<group>/ so the registry is fully self-contained and deterministic.

A minimal C++ runtime evaluator therefore looks like:

#include <vector>

#include "onnx_backend_test/test_case.h"

using namespace onnx::onnx_backend_test;

int main() {
  std::vector<TestCase> cases = CollectTestCases();
  for (const TestCase &tc : cases) {
    // Serialize tc.model and run it through your engine, then
    // compare against tc.data_sets[*].outputs using tc.atol / tc.rtol.
  }
  return 0;
}

The library ships its own GoogleTest-based unit tests under unittests/cc_onnx_backend_test/. To build and run them, configure the project with ONNX_LIGHT_BUILD_TESTS=ON and use ctest:

cmake -S . -B build -DONNX_LIGHT_BUILD_TESTS=ON
cmake --build build -j
ctest --test-dir build -R Backend --output-on-failure

The -R regex can be tightened (for example -R BackendKernelClass) to focus on a single test group.


See also#