QuantizeLinear - 10 vs 13

The next section compares an older version of the same operator to a newer one after both definitions are converted into markdown text. An addition to the newer version is shown in green (prefixed with +), a deletion in red (prefixed with -). Anything else is unchanged.

QuantizeLinear10 → QuantizeLinear13 RENAMED
@@ -1 +1 @@
- The linear quantization operator. It consumes a high precision tensor, a scale, and a zero point to compute the low precision / quantized tensor.
+ The linear per-tensor/layer quantization operator. It consumes a high precision tensor, a scale, a zero point to compute the low precision / quantized tensor.
+ The quantization formula is y = saturate ((x / y_scale) + y_zero_point). For saturation, it saturates to [0, 255] if it's uint8, or [-128, 127] if it's int8.
- The scale factor and zero point must have same shape, and can be either a scalar for per-tensor / per layer quantization, or a 1-D tensor for per-axis quantization.
- The quantization formula is y = saturate ((x / y_scale) + y_zero_point).
- For saturation, it saturates to [0, 255] if it's uint8, or [-128, 127] if it's int8.
  For (x / y_scale), it's rounding to nearest ties to even. Refer to https://en.wikipedia.org/wiki/Rounding for details. 'y_zero_point' and 'y' must have same type.
-
- **Attributes**
-
- * **axis**:
- (Optional) The axis of the quantization dimension of the input
- tensor. Ignored for per-tensor quantization. Negative value means
- counting dimensions from the back. Accepted range is [-r, r-1] where
- r = rank(input).
  **Inputs**
  Between 2 and 3 inputs.
  * **x** (heterogeneous) - **T1**:
  N-D full precision Input tensor to be quantized.
  * **y_scale** (heterogeneous) - **tensor(float)**:
- Scale for doing quantization to get 'y'. It can be a scalar, which
+ Scale for doing quantization to get 'y'. It's a scalar, which means
+ a per-tensor/layer quantization.
- means per-tensor/layer quantization, or a 1-D Tensor for per-axis
- quantization.
  * **y_zero_point** (optional, heterogeneous) - **T2**:
- Zero point for doing quantization to get 'y'. Shape must match
+ Zero point for doing quantization to get 'y'. It's a scalar, which
- y_scale. Default is uint8 with zero point of 0 if it's not
+ means a per-tensor/layer quantization. Default value is uint8 typed
- specified.
+ 0 if it's not specified.
  **Outputs**
  * **y** (heterogeneous) - **T2**:
  N-D quantized output tensor. It has same shape as input 'x'.
  **Type Constraints**
  * **T1** in (
  tensor(float),
  tensor(int32)
  ):
  Constrain 'x' to float or int32 tensor.
  * **T2** in (
  tensor(int8),
  tensor(uint8)
  ):
  Constrain 'y_zero_point' and 'y' to 8-bit integer tensor.
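
To make the quantization formula above concrete, the following is a minimal NumPy sketch of the per-tensor/layer case, y = saturate(round(x / y_scale) + y_zero_point), with round-to-nearest-even and saturation to the 8-bit range. The function name `quantize_linear` and the sample values are illustrative only, not part of the ONNX API.

```python
import numpy as np

def quantize_linear(x, y_scale, y_zero_point=np.uint8(0)):
    """Per-tensor/layer linear quantization following the formula above."""
    dtype = y_zero_point.dtype
    # Saturation range depends on the zero point's type:
    # [0, 255] for uint8, [-128, 127] for int8.
    lo, hi = (0, 255) if dtype == np.uint8 else (-128, 127)
    # np.rint rounds to nearest with ties to even, matching the spec's rounding rule.
    scaled = np.rint(np.asarray(x, dtype=np.float32) / np.float32(y_scale))
    return np.clip(scaled + int(y_zero_point), lo, hi).astype(dtype)

x = np.array([0.0, 0.5, 1.0, 200.0, 300.0], dtype=np.float32)
print(quantize_linear(x, y_scale=2.0, y_zero_point=np.uint8(128)))
# [128 128 128 228 255] -- 0.5/2 and 1.0/2 both round to 0; 300/2 + 128 saturates at 255.
```

For the per-axis form described in the diff, 'y_scale' and 'y_zero_point' are 1-D tensors that would be broadcast along the 'axis' dimension before the same rounding and saturation are applied.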