QuantizeLinear
QuantizeLinear - 13
Version
name: QuantizeLinear (GitHub)
domain: main
since_version: 13
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 13.
Summary
The linear quantization operator. It consumes a high-precision tensor, a scale, and a zero point to compute the low-precision / quantized tensor. The scale factor and zero point must have the same shape, and can be either a scalar for per-tensor / per-layer quantization, or a 1-D tensor for per-axis quantization. The quantization formula is y = saturate((x / y_scale) + y_zero_point). For saturation, it saturates to [0, 255] if it’s uint8, or [-128, 127] if it’s int8. For (x / y_scale), rounding is to nearest, ties to even. Refer to https://en.wikipedia.org/wiki/Rounding for details. ‘y_zero_point’ and ‘y’ must have the same type.
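The formula above can be sketched in NumPy. This is an illustrative helper (quantize_linear is not an ONNX API), assuming uint8 output: np.rint implements round-to-nearest, ties-to-even, and np.clip implements saturation to [0, 255].

```python
import numpy as np

def quantize_linear(x, y_scale, y_zero_point):
    # y = saturate(round(x / y_scale) + y_zero_point)
    # np.rint rounds to nearest, ties to even; np.clip saturates to uint8 range.
    y = np.rint(x / y_scale) + y_zero_point
    return np.clip(y, 0, 255).astype(np.uint8)

x = np.array([0.0, 1.0, 2.5, 3.5, 1000.0], dtype=np.float32)
print(quantize_linear(x, np.float32(2.0), np.uint8(128)))
# [128 128 129 130 255]
# 1/2 = 0.5 rounds to 0 (ties to even); 1000/2 + 128 saturates to 255.
```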
Attributes
axis: (Optional) The axis of the quantization dimension of the input tensor. Ignored for per-tensor quantization. Negative value means counting dimensions from the back. Accepted range is [-r, r-1] where r = rank(input). Default value is 1.
Inputs
Between 2 and 3 inputs.
x (heterogeneous) - T1: N-D full precision Input tensor to be quantized.
y_scale (heterogeneous) - tensor(float): Scale for doing quantization to get ‘y’. It can be a scalar, which means per-tensor/layer quantization, or a 1-D Tensor for per-axis quantization.
y_zero_point (optional, heterogeneous) - T2: Zero point for doing quantization to get ‘y’. Shape must match y_scale. Default is uint8 with zero point of 0 if it’s not specified.
Outputs
y (heterogeneous) - T2: N-D quantized output tensor. It has the same shape as input ‘x’.
Type Constraints
T1 in ( tensor(float), tensor(int32) ): Constrain ‘x’ to float or int32 tensor.
T2 in ( tensor(int8), tensor(uint8) ): Constrain ‘y_zero_point’ and ‘y’ to 8-bit integer tensor.
Examples
axis
import numpy as np
import onnx

node = onnx.helper.make_node(
    'QuantizeLinear',
    inputs=['x', 'y_scale', 'y_zero_point'],
    outputs=['y'],
)

x = np.array([[[[-162, 10],
                [-100, 232],
                [-20, -50]],
               [[-76, 0],
                [0, 252],
                [32, -44]],
               [[245, -485],
                [-960, -270],
                [-375, -470]]]], dtype=np.float32)
y_scale = np.array([2, 4, 5], dtype=np.float32)
y_zero_point = np.array([84, 24, 196], dtype=np.uint8)
y = (x / y_scale.reshape(1, 3, 1, 1)
     + y_zero_point.reshape(1, 3, 1, 1)).astype(np.uint8)

expect(node, inputs=[x, y_scale, y_zero_point], outputs=[y],
       name='test_quantizelinear_axis')
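The backend test above computes the expected output with a plain astype cast. A sketch that follows the summary's round-to-nearest-ties-to-even and saturation more literally could look like this (quantize_per_axis is a hypothetical helper, not part of onnx):

```python
import numpy as np

def quantize_per_axis(x, y_scale, y_zero_point, axis=1):
    # Reshape the 1-D scale/zero-point so they broadcast along `axis`.
    shape = [1] * x.ndim
    shape[axis] = -1
    s = y_scale.reshape(shape)
    zp = y_zero_point.astype(np.float32).reshape(shape)
    # Round ties to even, then saturate to the uint8 range [0, 255].
    return np.clip(np.rint(x / s) + zp, 0, 255).astype(np.uint8)

x = np.array([[[-162.0, 10.0], [-100.0, 232.0]],
              [[-76.0, 0.0], [0.0, 252.0]]], dtype=np.float32)[np.newaxis]
y = quantize_per_axis(x, np.array([2, 4], dtype=np.float32),
                      np.array([84, 24], dtype=np.uint8))
print(y.tolist())
# [[[[3, 89], [34, 200]], [[5, 24], [24, 87]]]]
```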
Differences
This section summarizes the changes from QuantizeLinear-10 to QuantizeLinear-13.

Version 13 generalizes the operator from per-tensor/layer quantization only to also support per-axis quantization: the scale factor and zero point may now be 1-D tensors rather than only scalars, and they must have the same shape.

Version 13 adds the optional axis attribute, which selects the quantization dimension of the input tensor (default value 1). Negative values count dimensions from the back, the accepted range is [-r, r-1] where r = rank(input), and the attribute is ignored for per-tensor quantization.

The descriptions of y_scale and y_zero_point are updated accordingly; the quantization formula, inputs, outputs, and type constraints are otherwise unchanged.
QuantizeLinear - 10
Version
name: QuantizeLinear (GitHub)
domain: main
since_version: 10
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 10.
Summary
The linear per-tensor/layer quantization operator. It consumes a high-precision tensor, a scale, and a zero point to compute the low-precision / quantized tensor. The quantization formula is y = saturate((x / y_scale) + y_zero_point). For saturation, it saturates to [0, 255] if it’s uint8, or [-128, 127] if it’s int8. For (x / y_scale), rounding is to nearest, ties to even. Refer to https://en.wikipedia.org/wiki/Rounding for details. ‘y_zero_point’ and ‘y’ must have the same type.
Inputs
Between 2 and 3 inputs.
x (heterogeneous) - T1: N-D full precision Input tensor to be quantized.
y_scale (heterogeneous) - tensor(float): Scale for doing quantization to get ‘y’. It is a scalar, which means per-tensor/layer quantization.
y_zero_point (optional, heterogeneous) - T2: Zero point for doing quantization to get ‘y’. It is a scalar, which means per-tensor/layer quantization. Defaults to uint8-typed 0 if not specified.
Outputs
y (heterogeneous) - T2: N-D quantized output tensor. It has the same shape as input ‘x’.
Type Constraints
T1 in ( tensor(float), tensor(int32) ): Constrain ‘x’ to float or int32 tensor.
T2 in ( tensor(int8), tensor(uint8) ): Constrain ‘y_zero_point’ and ‘y’ to 8-bit integer tensor.
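Since opset 10 only supports scalar scale and zero point, a per-tensor sketch targeting int8 output illustrates the [-128, 127] saturation branch (quantize_linear_int8 is an illustrative helper, not an ONNX API):

```python
import numpy as np

def quantize_linear_int8(x, y_scale, y_zero_point):
    # Per-tensor quantization: round ties to even, then saturate to [-128, 127].
    y = np.rint(x / y_scale) + y_zero_point
    return np.clip(y, -128, 127).astype(np.int8)

x = np.array([-520.0, -3.0, 0.0, 3.0, 520.0], dtype=np.float32)
print(quantize_linear_int8(x, np.float32(4.0), np.int8(10)))
# [-120    9   10   11  127]
# -520/4 + 10 = -120 stays in range; 520/4 + 10 = 140 saturates to 127.
```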