DynamicQuantizeLinear
DynamicQuantizeLinear - 11
Version
domain: main
since_version: 11
function: True
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 11.
Summary
A function that fuses the computation of the scale, the zero point, and the FP32-to-8-bit conversion of FP32 input data. It outputs the scale, the zero point, and the quantized input for a given FP32 input. The scale is calculated as:
y_scale = (max(x) - min(x)) / (qmax - qmin)
* where qmax and qmin are the maximum and minimum values of the quantization range, i.e. [0, 255] for uint8
* the data range is adjusted so that it always includes 0 (see the sketch below)
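A minimal NumPy sketch of the scale computation (assuming the uint8 range [0, 255], the only range currently supported; the zero-point and quantization sketches further below continue from it):

import numpy as np

x = np.array([0, 2, -3, -2.5, 1.34, 0.5], dtype=np.float32)

qmin, qmax = 0, 255  # uint8 quantization range (only uint8 is supported)
# Adjust the data range so that it always includes 0.
x_min = min(0.0, float(x.min()))
x_max = max(0.0, float(x.max()))
y_scale = (x_max - x_min) / (qmax - qmin)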
The zero point is calculated as:
intermediate_zero_point = qmin - min(x) / y_scale
y_zero_point = cast(round(saturate(intermediate_zero_point)))
* where qmax and qmin are the maximum and minimum values of the quantization range, i.e. [0, 255] for uint8
* saturation clamps to [0, 255] for uint8, or [-127, 127] for int8; right now only uint8 is supported
* rounding is to the nearest value, with ties rounded to even
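Continuing the scale sketch above, the zero point follows the two formulas directly (np.rint rounds to nearest with ties to even, matching the spec):

# Saturate first, then round ties-to-even, then cast, as in the spec.
intermediate_zero_point = qmin - x_min / y_scale
y_zero_point = np.uint8(np.rint(np.clip(intermediate_zero_point, qmin, qmax)))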
The data quantization formula is:
y = saturate(round(x / y_scale) + y_zero_point)
* saturation clamps to [0, 255] for uint8, or [-127, 127] for int8; right now only uint8 is supported
* rounding is to the nearest value, with ties rounded to even
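Completing the sketch, the quantized output applies the same saturate-and-round pattern to the data itself:

# Quantize the input data.
y = np.clip(np.rint(x / y_scale) + y_zero_point, qmin, qmax).astype(np.uint8)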
Inputs
x (heterogeneous) - T1: Input tensor
Outputs
y (heterogeneous) - T2: Quantized output tensor
y_scale (heterogeneous) - tensor(float): Output scale. It’s a scalar, which means a per-tensor/layer quantization.
y_zero_point (heterogeneous) - T2: Output zero point. It’s a scalar, which means a per-tensor/layer quantization.
Type Constraints
T1 in ( tensor(float) ): Constrain ‘x’ to float tensor.
T2 in ( tensor(uint8) ): Constrain ‘y_zero_point’ and ‘y’ to 8-bit unsigned integer tensor.
Examples
default
import numpy as np
import onnx

# `expect` below is the ONNX backend-test helper used throughout these
# examples (defined in onnx.backend.test.case.node).

node = onnx.helper.make_node(
    "DynamicQuantizeLinear",
    inputs=["x"],
    outputs=["y", "y_scale", "y_zero_point"],
)

# expected scale 0.0196078438 and zero point 153
X = np.array([0, 2, -3, -2.5, 1.34, 0.5]).astype(np.float32)
x_min = np.minimum(0, np.min(X))
x_max = np.maximum(0, np.max(X))
Y_Scale = np.float32((x_max - x_min) / (255 - 0))  # uint8 -> [0, 255]
Y_ZeroPoint = np.clip(round((0 - x_min) / Y_Scale), 0, 255).astype(np.uint8)
Y = np.clip(np.round(X / Y_Scale) + Y_ZeroPoint, 0, 255).astype(np.uint8)

expect(
    node,
    inputs=[X],
    outputs=[Y, Y_Scale, Y_ZeroPoint],
    name="test_dynamicquantizelinear",
)
# expected scale 0.0156862754 and zero point 255
X = np.array([-1.0, -2.1, -1.3, -2.5, -3.34, -4.0]).astype(np.float32)
x_min = np.minimum(0, np.min(X))
x_max = np.maximum(0, np.max(X))
Y_Scale = np.float32((x_max - x_min) / (255 - 0))  # uint8 -> [0, 255]
Y_ZeroPoint = np.clip(round((0 - x_min) / Y_Scale), 0, 255).astype(np.uint8)
Y = np.clip(np.round(X / Y_Scale) + Y_ZeroPoint, 0, 255).astype(np.uint8)

expect(
    node,
    inputs=[X],
    outputs=[Y, Y_Scale, Y_ZeroPoint],
    name="test_dynamicquantizelinear_max_adjusted",
)
X = (
    np.array([1, 2.1, 1.3, 2.5, 3.34, 4.0, 1.5, 2.6, 3.9, 4.0, 3.0, 2.345])
    .astype(np.float32)
    .reshape((3, 4))
)

# expected scale 0.0156862754 and zero point 0
x_min = np.minimum(0, np.min(X))
x_max = np.maximum(0, np.max(X))
Y_Scale = np.float32((x_max - x_min) / (255 - 0))  # uint8 -> [0, 255]
Y_ZeroPoint = np.clip(round((0 - x_min) / Y_Scale), 0, 255).astype(np.uint8)
Y = np.clip(np.round(X / Y_Scale) + Y_ZeroPoint, 0, 255).astype(np.uint8)

expect(
    node,
    inputs=[X],
    outputs=[Y, Y_Scale, Y_ZeroPoint],
    name="test_dynamicquantizelinear_min_adjusted",
)
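To run the operator end to end outside the test harness, one option is onnx's reference evaluator (onnx.reference.ReferenceEvaluator, shipped with recent onnx releases). The sketch below builds a one-node model and evaluates it; the graph and model construction details (value-info shapes, graph name, opset 11) are illustrative assumptions, not part of the spec:

import numpy as np
import onnx
from onnx import TensorProto, helper
from onnx.reference import ReferenceEvaluator  # assumption: requires a recent onnx

node = helper.make_node(
    "DynamicQuantizeLinear",
    inputs=["x"],
    outputs=["y", "y_scale", "y_zero_point"],
)
graph = helper.make_graph(
    [node],
    "dql_demo",  # hypothetical graph name
    [helper.make_tensor_value_info("x", TensorProto.FLOAT, [None])],
    [
        helper.make_tensor_value_info("y", TensorProto.UINT8, [None]),
        helper.make_tensor_value_info("y_scale", TensorProto.FLOAT, []),
        helper.make_tensor_value_info("y_zero_point", TensorProto.UINT8, []),
    ],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 11)])

sess = ReferenceEvaluator(model)
y, y_scale, y_zero_point = sess.run(
    None, {"x": np.array([0, 2, -3, -2.5, 1.34, 0.5], dtype=np.float32)}
)
# matches the first example above: scale ~0.0196078438, zero point 153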