DynamicQuantizeLinear#

DynamicQuantizeLinear - 11#

Version

This version of the operator has been available since version 11.

Summary

A Function to fuse calculation for Scale, Zero Point and FP32->8Bit convertion of FP32 Input data. Outputs Scale, ZeroPoint and Quantized Input for a given FP32 Input. Scale is calculated as:

y_scale = (max(x) - min(x))/(qmax - qmin)
* where qmax and qmin are max and min values for quantization range .i.e [0, 255] in case of uint8
* data range is adjusted to include 0.

Zero point is calculated as:

intermediate_zero_point = qmin - min(x)/y_scale
y_zero_point = cast(round(saturate(itermediate_zero_point)))
* where qmax and qmin are max and min values for quantization range .i.e [0, 255] in case of uint8
* for saturation, it saturates to [0, 255] if it's uint8, or [-127, 127] if it's int8. Right now only uint8 is supported.
* rounding to nearest ties to even.

Data quantization formula is:

y = saturate (round (x / y_scale) + y_zero_point)
* for saturation, it saturates to [0, 255] if it's uint8, or [-127, 127] if it's int8. Right now only uint8 is supported.
* rounding to nearest ties to even.

Inputs

  • x (heterogeneous) - T1: Input tensor

Outputs

  • y (heterogeneous) - T2: Quantized output tensor

  • y_scale (heterogeneous) - tensor(float): Output scale. It’s a scalar, which means a per-tensor/layer quantization.

  • y_zero_point (heterogeneous) - T2: Output zero point. It’s a scalar, which means a per-tensor/layer quantization.

Type Constraints

  • T1 in ( tensor(float) ): Constrain ‘x’ to float tensor.

  • T2 in ( tensor(uint8) ): Constrain ‘y_zero_point’ and ‘y’ to 8-bit unsigned integer tensor.

Examples