QuantizeLinear - version 10

This page documents version 10 of operator QuantizeLinear. See QuantizeLinear for the latest version (since version 25).

  • Domain: ai.onnx

  • Since version: 10

The linear per-tensor/layer quantization operator. It consumes a high-precision tensor, a scale, and a zero point to compute the low-precision (quantized) tensor. The quantization formula is y = saturate((x / y_scale) + y_zero_point). The result saturates to [0, 255] for uint8 or to [-128, 127] for int8. The quotient (x / y_scale) is rounded to the nearest integer, with ties going to the nearest even value. Refer to https://en.wikipedia.org/wiki/Rounding for details. ‘y_zero_point’ and ‘y’ must have the same type.
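The formula can be sketched in plain Python as follows. This is an illustrative sketch only, not the ONNX reference implementation: the function name `quantize_linear`, the `RANGES` table, and the `dtype` string argument are hypothetical, and the input is a flat list of floats rather than an N-D tensor.

```python
# Saturation ranges for the two output types allowed in version 10.
# (RANGES and the dtype strings are illustrative names, not part of ONNX.)
RANGES = {"uint8": (0, 255), "int8": (-128, 127)}


def quantize_linear(x, y_scale, y_zero_point=0, dtype="uint8"):
    """Quantize a flat list of floats with a scalar scale and zero point."""
    lo, hi = RANGES[dtype]
    out = []
    for v in x:
        # Python's built-in round() sends ties to the nearest even integer,
        # which matches the rounding rule described above.
        q = round(v / y_scale) + y_zero_point
        # Saturate to the range of the output type.
        out.append(max(lo, min(hi, q)))
    return out


print(quantize_linear([0.0, 1.0, 3.0, 5.0, 600.0, -4.0], 2.0, 128))
# prints [128, 128, 130, 130, 255, 126]
```

In this run, 1.0 / 2.0 = 0.5 rounds to 0 and 5.0 / 2.0 = 2.5 rounds to 2 (ties to even), while 600.0 maps to 428 before saturation and is clamped to 255.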

Inputs

  • x (T1): N-D full-precision input tensor to be quantized.

  • y_scale (tensor(float)): Scale for doing quantization to get ‘y’. It’s a scalar, which means a per-tensor/layer quantization.

  • y_zero_point (T2): Zero point for doing quantization to get ‘y’. It’s a scalar, which means a per-tensor/layer quantization. If not specified, it defaults to 0 with type uint8.
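One consequence of the default zero point is that omitting ‘y_zero_point’ yields a uint8 output, so negative inputs saturate to 0. The sketch below illustrates this under the assumption of a flat list input; `quantize_default_zero_point` is a hypothetical helper name, not an ONNX API.

```python
def quantize_default_zero_point(x, y_scale):
    """Quantize with y_zero_point omitted, i.e. treated as uint8 0."""
    # With a uint8 zero point of 0, the output saturates to [0, 255].
    return [max(0, min(255, round(v / y_scale))) for v in x]


print(quantize_default_zero_point([-1.0, 0.25, 0.75, 200.0], 0.5))
# prints [0, 0, 2, 255]
```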

Outputs

  • y (T2): N-D quantized output tensor. It has the same shape as input ‘x’.

Type Constraints

  • T1: Constrain ‘x’ to a float or int32 tensor. Allowed types: tensor(float), tensor(int32).

  • T2: Constrain ‘y_zero_point’ and ‘y’ to an 8-bit integer tensor. Allowed types: tensor(int8), tensor(uint8).