Using backend tests to evaluate a runtime#
This page explains how to use the backend test suite shipped with onnx-light to validate that a custom ONNX runtime produces correct numerical results.
The backend test infrastructure is located in
onnx_light.backend.test.case and mirrors the structure of the
official ONNX backend test suite. The registered node test cases are
generated by the C++ lib_onnx_backend_test library and exposed to
Python through collect_test_case().
Downstream code can still register additional Python-only test cases
by subclassing Base and
calling the expect() helper.
The make_test_class() function
then turns those test cases into a standard unittest.TestCase
subclass that calls into a user-supplied runtime function.
Defining a runtime function#
The only requirement for plugging in a runtime is to write a callable with the following signature:
def my_runtime(model, *inputs: np.ndarray) -> list[np.ndarray]:
    ...
where
model is an onnx_light.onnx.ModelProto (the ONNX model for the test case),
*inputs are numpy.ndarray objects corresponding to the model’s graph inputs in order, and
the return value is a list of numpy.ndarray objects corresponding to the model’s graph outputs in order.
The runtime may serialize the ModelProto to bytes, pass it to any
ONNX-compatible engine, and return the results.
Generating a test class#
Call make_test_class() with
the runtime callable to obtain an
ExtTestCase subclass with one test
method per registered test case:
import unittest
import numpy as np
from onnx_light.backend.test.case import make_test_class

def my_runtime(model, *inputs: np.ndarray) -> list[np.ndarray]:
    # replace with the actual engine call
    raise NotImplementedError

MyBackendTests = make_test_class(my_runtime)

if __name__ == "__main__":
    unittest.main(verbosity=2)
Running the file with python or through any unittest-compatible
runner (pytest, etc.) will execute every registered node test case and
report failures when the runtime output differs from the expected output.
Filtering tests#
Two optional parameters let you restrict which test cases are executed.
include_regex: A list of regular-expression patterns. Only test cases whose name matches at least one pattern are kept.
exclude_regex: A list of regular-expression patterns. Test cases whose name matches at least one pattern are discarded (evaluated before include_regex).
Example — run only tests related to element-wise arithmetic:
ArithmeticTests = make_test_class(
    my_runtime,
    include_regex=[r"^test_add", r"^test_sub", r"^test_mul", r"^test_div"],
)
Example — run everything except the quantization operators:
NoQuantTests = make_test_class(
    my_runtime,
    exclude_regex=[r"quantize", r"dequantize"],
)
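The order of evaluation described above (exclusion first, then inclusion) can be illustrated with a small standalone sketch. The helper filter_names is hypothetical and is not part of onnx-light; it also assumes that patterns are applied with re.search, whereas the library may anchor them differently.

```python
import re


def filter_names(names, include_regex=None, exclude_regex=None):
    """Drop excluded names first, then keep only the included ones."""
    kept = []
    for name in names:
        # Exclusion wins: a name matching any exclude pattern is discarded.
        if exclude_regex and any(re.search(p, name) for p in exclude_regex):
            continue
        # With include patterns, a name must match at least one of them.
        if include_regex and not any(re.search(p, name) for p in include_regex):
            continue
        kept.append(name)
    return kept


names = ["test_add", "test_add_uint8", "test_sub", "test_quantizelinear"]
selected = filter_names(
    names,
    include_regex=[r"^test_add", r"^test_sub"],
    exclude_regex=[r"uint8"],
)
print(selected)  # ['test_add', 'test_sub']
```

Under these assumptions, a name that matches both lists is discarded, which is why test_add_uint8 does not survive even though it matches ^test_add.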
Adjusting numerical tolerances#
By default each test case uses atol=1e-7 and rtol=1e-3. These
values can be overridden per test-case name via the atols
and rtols dictionaries:
MyBackendTests = make_test_class(
    my_runtime,
    atols={"test_cast_FLOAT_to_FLOAT16": 1e-3},
    rtols={"test_cast_FLOAT_to_FLOAT16": 1e-2},
)
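To get a feel for what these numbers mean, the sketch below combines the two tolerances the way numpy.allclose does, |actual - expected| <= atol + rtol * |expected|. Whether onnx-light combines them in exactly this way is an assumption, and the helpers is_close and tolerances_for are hypothetical names used only for illustration.

```python
import math

# Defaults quoted in the text above.
DEFAULT_ATOL = 1e-7
DEFAULT_RTOL = 1e-3


def is_close(actual, expected, atol, rtol):
    """numpy.allclose-style check: |a - e| <= atol + rtol * |e|."""
    return math.fabs(actual - expected) <= atol + rtol * math.fabs(expected)


def tolerances_for(name, atols=None, rtols=None):
    """Look up per-test overrides, falling back to the defaults."""
    atol = (atols or {}).get(name, DEFAULT_ATOL)
    rtol = (rtols or {}).get(name, DEFAULT_RTOL)
    return atol, rtol


atols = {"test_cast_FLOAT_to_FLOAT16": 1e-3}
rtols = {"test_cast_FLOAT_to_FLOAT16": 1e-2}

strict = tolerances_for("test_abs", atols, rtols)
loose = tolerances_for("test_cast_FLOAT_to_FLOAT16", atols, rtols)

# A value that is off by 0.005 around 1.0.
print(is_close(1.005, 1.0, *strict))  # False
print(is_close(1.005, 1.0, *loose))   # True
```

With the defaults, the allowed deviation around 1.0 is roughly 0.001, so an error of 0.005 fails; with the overrides it grows to 0.011 and the same value passes.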
Filtering test cases by operator and opset#
The helper
get_test_cases_for_op() returns
the subset of collected backend test cases whose model contains a node
with a given op_type (and optionally a given domain /
opset_version). This is convenient when a backend wants to focus on
a single operator (and version) at a time:
from onnx_light.backend.test.case import get_test_cases_for_op
# All cases that exercise Abs in the default ai.onnx domain.
abs_cases = get_test_cases_for_op("Abs")
# Cases that import ai.onnx at exactly version 13 and use Abs.
abs_v13 = get_test_cases_for_op("Abs", opset_version=13)
# Cases that use Abs from a custom domain.
custom = get_test_cases_for_op("Abs", domain="my.custom.domain")
When called without test_cases, the helper calls
collect_test_case() internally.
A precomputed mapping can be passed via the test_cases argument to
avoid recollecting test cases on repeated lookups.
Full example: ONNXRuntime backend#
The file unittests/backend/test_backend_with_onnxruntime.py in the
repository is a ready-to-run example that exercises every registered
backend test case through
ONNXRuntime:
import unittest
import numpy as np
import onnxruntime as ort
from onnx_light.backend.test.case import make_test_class
def onnxruntime_backend(model, *inputs: np.ndarray) -> list[np.ndarray]:
    """
    Runs an ONNX model using ONNXRuntime.
    Args:
        model: The ONNX model (onnx_light.ModelProto) to run
        *inputs: Input arrays for the model
    Returns:
        List of output arrays from the model
    """
    # Serialize the model to bytes
    model_bytes = model.SerializeToString()
    # Create an ONNXRuntime inference session
    sess = ort.InferenceSession(model_bytes, providers=["CPUExecutionProvider"])
    # Get input names from the session
    input_names = [inp.name for inp in sess.get_inputs()]
    # Create input dictionary
    input_dict = dict(zip(input_names, inputs))
    # Run inference
    outputs = sess.run(None, input_dict)
    return outputs
# Backend test cases that ONNXRuntime cannot run as-is:
# * ``test_cc_roialign_max`` — ORT's RoiAlign max-mode implementation does
# not match the ONNX reference (ORT emits a warning on session creation).
# * ``test_cc_flex_attention_*`` — ORT does not register the
# ``ai.onnx.preview`` domain, so these models fail to load with
# "ai.onnx.preview:FlexAttention(-1) is not a registered function/op".
# * ``test_cc_adam_*``, ``test_adam`` and ``test_adam_multiple`` — ORT does
# not register the ``ai.onnx.preview.training`` domain, so these models
# fail to load with
# "ai.onnx.preview.training:Adam(-1) is not a registered function/op".
# * ``test_cc_binarizer_int64`` — ORT only registers a ``float`` kernel for
# ``ai.onnx.ml::Binarizer``, so the ``int64`` variant fails with
# "Could not find an implementation for Binarizer(1) node". The
# ``float`` variant (``test_cc_binarizer_float``) is still exercised.
# These cases remain covered by the reference backend tests.
ORT_EXCLUDE_REGEX = [
    r"^test_cc_roialign_max$",
    r"^test_cc_flex_attention_",
    r"^test_cc_adam_",
    r"^test_adam$",
    r"^test_adam_multiple$",
    r"^test_cc_binarizer_int64$",
]

TestOrtBackend = make_test_class(onnxruntime_backend, exclude_regex=ORT_EXCLUDE_REGEX)

if __name__ == "__main__":
    unittest.main(verbosity=2)
The runtime function serialises the ModelProto
to bytes with SerializeToString(),
creates an onnxruntime.InferenceSession, and returns the inference
outputs.
Run it with:
python -m pytest unittests/backend/test_backend_with_onnxruntime.py -v
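or, to run only the test cases whose name contains abs (pytest matches -k expressions case-insensitively), such as those for the Abs operator: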
python -m pytest unittests/backend/test_backend_with_onnxruntime.py -v -k abs
How test cases are collected#
collect_test_case() first
collects every node test case registered by the C++
lib_onnx_backend_test library (exposed through the
onnx_light.onnx_py._onnxpy.backend_test Python bindings). It then
runs every export_* class method declared on any user-defined
subclass of Base; each call
to expect() appends one
TestCase to the global
ALL_TESTS dictionary. Python-defined cases take precedence over
C++ cases with the same name.
make_test_class() calls
collect_test_case() internally,
so tests are always re-collected from scratch when the function is called.
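The precedence rule can be pictured as a dictionary merge in which Python-defined entries overwrite C++ entries with the same key. The function merge_cases and the string values below are hypothetical placeholders; the real registry stores TestCase objects and may be built differently.

```python
def merge_cases(cpp_cases, python_cases):
    """Merge two name -> case mappings; Python cases win on name clashes."""
    merged = dict(cpp_cases)      # start from the C++ registry
    merged.update(python_cases)   # same-named Python cases overwrite it
    return merged


cpp_cases = {"test_abs": "cpp abs case", "test_add": "cpp add case"}
python_cases = {"test_add": "python add case", "test_custom": "python custom case"}

all_tests = merge_cases(cpp_cases, python_cases)
print(sorted(all_tests))      # ['test_abs', 'test_add', 'test_custom']
print(all_tests["test_add"])  # python add case
```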
Running backend tests in C++#
The exact same node test cases are also available directly from C++ via
the lib_onnx_backend_test static library, with no dependency on
Python. The library lives in onnx_light/onnx_backend_test/ and only
depends on lib_onnx_proto. It exposes:
a runtime onnx::onnx_backend_test::Tensor (distinct from onnx::TensorProto) that stores raw element bytes,
an onnx::onnx_backend_test::TestCase bundle of onnx::ModelProto and expected input/output data sets,
the onnx::onnx_backend_test::Expect() helper used by every RegisterXxxCases function to register a single-node model, and
onnx::onnx_backend_test::CollectTestCases(), which returns the full registry of node test cases (the same registry that the Python bindings expose through onnx_light.onnx_py._onnxpy.backend_test).
Per-operator cases are organised under
onnx_light/onnx_backend_test/cases/<group>/ (math, logical,
nn, tensor, …) and the expected outputs are computed with the
reference kernels under
onnx_light/onnx_backend_test/kernels/<group>/ so the registry is
fully self-contained and deterministic.
A minimal C++ runtime evaluator therefore looks like:
#include <vector>

#include "onnx_backend_test/test_case.h"

using namespace onnx::onnx_backend_test;

int main() {
    std::vector<TestCase> cases = CollectTestCases();
    for (const TestCase &tc : cases) {
        // Serialize tc.model and run it through your engine, then
        // compare against tc.data_sets[*].outputs using tc.atol / tc.rtol.
    }
    return 0;
}
The library ships its own GoogleTest-based unit tests under
unittests/cc_onnx_backend_test/. To build and run them, configure
the project with ONNX_LIGHT_BUILD_TESTS=ON and use ctest:
cmake -S . -B build -DONNX_LIGHT_BUILD_TESTS=ON
cmake --build build -j
ctest --test-dir build -R Backend --output-on-failure
The -R regex can be tightened (for example -R BackendKernelClass)
to focus on a single test group.
See also#
Python API: onnx_light.backend — Python API reference for the backend module.
onnx_backend_test — C++ API reference for the
lib_onnx_backend_test library.