DynamicQuantizeLinear

DynamicQuantizeLinear - 11

Version

This version of the operator has been available since version 11.

Summary

A Function to fuse calculation for Scale, Zero Point and FP32->8Bit convertion of FP32 Input data. Outputs Scale, ZeroPoint and Quantized Input for a given FP32 Input. Scale is calculated as:

y_scale = (max(x) - min(x))/(qmax - qmin)
* where qmax and qmin are max and min values for quantization range .i.e [0, 255] in case of uint8
* data range is adjusted to include 0.

Zero point is calculated as:

intermediate_zero_point = qmin - min(x)/y_scale
y_zero_point = cast(round(saturate(itermediate_zero_point)))
* where qmax and qmin are max and min values for quantization range .i.e [0, 255] in case of uint8
* for saturation, it saturates to [0, 255] if it's uint8, or [-127, 127] if it's int8. Right now only uint8 is supported.
* rounding to nearest ties to even.

Data quantization formula is:

y = saturate (round (x / y_scale) + y_zero_point)
* for saturation, it saturates to [0, 255] if it's uint8, or [-127, 127] if it's int8. Right now only uint8 is supported.
* rounding to nearest ties to even.

Inputs

  • x (heterogeneous) - T1: Input tensor

Outputs

  • y (heterogeneous) - T2: Quantized output tensor

  • y_scale (heterogeneous) - tensor(float): Output scale. It’s a scalar, which means a per-tensor/layer quantization.

  • y_zero_point (heterogeneous) - T2: Output zero point. It’s a scalar, which means a per-tensor/layer quantization.

Type Constraints

  • T1 in ( tensor(float) ): Constrain ‘x’ to float tensor.

  • T2 in ( tensor(uint8) ): Constrain ‘y_zero_point’ and ‘y’ to 8-bit unsigned integer tensor.

Examples

default

import numpy as np
import onnx

node = onnx.helper.make_node(
    "DynamicQuantizeLinear",
    inputs=["x"],
    outputs=["y", "y_scale", "y_zero_point"],
)

# expected scale 0.0196078438 and zero point 153
X = np.array([0, 2, -3, -2.5, 1.34, 0.5]).astype(np.float32)
x_min = np.minimum(0, np.min(X))
x_max = np.maximum(0, np.max(X))
Y_Scale = np.float32((x_max - x_min) / (255 - 0))  # uint8 -> [0, 255]
Y_ZeroPoint = np.clip(round((0 - x_min) / Y_Scale), 0, 255).astype(np.uint8)
Y = np.clip(np.round(X / Y_Scale) + Y_ZeroPoint, 0, 255).astype(np.uint8)

expect(
    node,
    inputs=[X],
    outputs=[Y, Y_Scale, Y_ZeroPoint],
    name="test_dynamicquantizelinear",
)

# expected scale 0.0156862754 and zero point 255
X = np.array([-1.0, -2.1, -1.3, -2.5, -3.34, -4.0]).astype(np.float32)
x_min = np.minimum(0, np.min(X))
x_max = np.maximum(0, np.max(X))
Y_Scale = np.float32((x_max - x_min) / (255 - 0))  # uint8 -> [0, 255]
Y_ZeroPoint = np.clip(round((0 - x_min) / Y_Scale), 0, 255).astype(np.uint8)
Y = np.clip(np.round(X / Y_Scale) + Y_ZeroPoint, 0, 255).astype(np.uint8)

expect(
    node,
    inputs=[X],
    outputs=[Y, Y_Scale, Y_ZeroPoint],
    name="test_dynamicquantizelinear_max_adjusted",
)

X = (
    np.array([1, 2.1, 1.3, 2.5, 3.34, 4.0, 1.5, 2.6, 3.9, 4.0, 3.0, 2.345])
    .astype(np.float32)
    .reshape((3, 4))
)

# expected scale 0.0156862754 and zero point 0
x_min = np.minimum(0, np.min(X))
x_max = np.maximum(0, np.max(X))
Y_Scale = np.float32((x_max - x_min) / (255 - 0))  # uint8 -> [0, 255]
Y_ZeroPoint = np.clip(round((0 - x_min) / Y_Scale), 0, 255).astype(np.uint8)
Y = np.clip(np.round(X / Y_Scale) + Y_ZeroPoint, 0, 255).astype(np.uint8)

expect(
    node,
    inputs=[X],
    outputs=[Y, Y_Scale, Y_ZeroPoint],
    name="test_dynamicquantizelinear_min_adjusted",
)