QuantizeLinear

QuantizeLinear - 13
QuantizeLinear - 10

QuantizeLinear - 13

Version

name: QuantizeLinear
domain: main
since_version: 13
function: False
support_level: SupportType.COMMON
shape inference: True

This version of the operator has been available since version 13 of the default ONNX operator set.

Summary

The linear quantization operator. It consumes a high-precision tensor, a scale, and a zero point to compute the low-precision (quantized) tensor. The scale factor and zero point must have the same shape, and can be either a scalar for per-tensor (per-layer) quantization or a 1-D tensor for per-axis quantization. The quantization formula is y = saturate((x / y_scale) + y_zero_point). Saturation clamps the result to [0, 255] for uint8 or to [-128, 127] for int8. The quotient (x / y_scale) is rounded to the nearest integer, with ties rounded to even; see https://en.wikipedia.org/wiki/Rounding for details. ‘y_zero_point’ and ‘y’ must have the same type.
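As a rough illustration of the formula above, the following sketch quantizes a single value in plain Python. The helper name `quantize_scalar` and its parameters are made up for this example and are not part of ONNX. It relies on Python's built-in `round`, which rounds ties to even.

```python
def quantize_scalar(x, y_scale, y_zero_point=0, signed=False):
    """Illustrative only: y = saturate(round(x / y_scale) + y_zero_point)."""
    lo, hi = (-128, 127) if signed else (0, 255)
    # Built-in round() uses round-half-to-even, matching the rounding rule above.
    q = round(x / y_scale) + y_zero_point
    # Saturate to the range of the 8-bit output type.
    return max(lo, min(hi, q))


print(quantize_scalar(5.0, 2.0))                   # 2.5 is a tie, rounds to 2
print(quantize_scalar(7.0, 2.0))                   # 3.5 is a tie, rounds to 4
print(quantize_scalar(1000.0, 2.0))                # 500 saturates to 255
print(quantize_scalar(-1000.0, 2.0, signed=True))  # -500 saturates to -128
```

Both ties land on the even neighbour, which is what distinguishes this rule from the more familiar round-half-up.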

Attributes

axis: (Optional) The axis of the quantization dimension of the input tensor. Ignored for per-tensor quantization. A negative value counts dimensions from the back. The accepted range is [-r, r-1], where r = rank(input). The default value is 1.
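One way to picture the axis attribute is as the position where a 1-D scale or zero point is lined up against the input before broadcasting. The sketch below assumes NumPy. `reshape_for_axis` is a hypothetical helper written for this illustration, not an ONNX function.

```python
import numpy as np


def reshape_for_axis(param, rank, axis=1):
    """Illustrative helper: make a 1-D per-axis parameter broadcastable."""
    if not -rank <= axis < rank:
        raise ValueError(f"axis {axis} is outside [-{rank}, {rank - 1}]")
    axis = axis % rank  # negative values count from the back
    shape = [1] * rank
    shape[axis] = -1    # keep the parameter's length on the quantization axis
    return np.asarray(param).reshape(shape)


x = np.zeros((1, 3, 3, 2), dtype=np.float32)
y_scale = np.array([2, 4, 5], dtype=np.float32)

print(reshape_for_axis(y_scale, x.ndim, axis=1).shape)   # (1, 3, 1, 1)
print(reshape_for_axis(y_scale, x.ndim, axis=-3).shape)  # (1, 3, 1, 1)
```

For a rank-4 input, axis=-3 and axis=1 select the same dimension, so both calls produce the same broadcast shape.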

Inputs

Between 2 and 3 inputs.

x (heterogeneous) - T1: N-D full-precision input tensor to be quantized.
y_scale (heterogeneous) - tensor(float): Scale used to quantize ‘x’ into ‘y’. It can be a scalar, which means per-tensor (per-layer) quantization, or a 1-D tensor for per-axis quantization.
y_zero_point (optional, heterogeneous) - T2: Zero point used to quantize ‘x’ into ‘y’. Its shape must match that of y_scale. If it is not specified, the default is a uint8 zero point of 0.
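The handling of the optional third input can be sketched as follows, assuming NumPy. `resolve_zero_point` is a hypothetical helper for this illustration. It applies the documented default (uint8, value 0) and checks the shape and type rules stated above.

```python
import numpy as np


def resolve_zero_point(y_scale, y_zero_point=None):
    """Illustrative helper: apply the documented default and shape rule."""
    y_scale = np.asarray(y_scale, dtype=np.float32)
    if y_zero_point is None:
        # Omitted input: uint8 output with a zero point of 0.
        return np.zeros(y_scale.shape, dtype=np.uint8)
    y_zero_point = np.asarray(y_zero_point)
    if y_zero_point.dtype not in (np.dtype(np.uint8), np.dtype(np.int8)):
        raise TypeError("y_zero_point must be uint8 or int8")
    if y_zero_point.shape != y_scale.shape:
        raise ValueError("y_zero_point must have the same shape as y_scale")
    return y_zero_point


zp = resolve_zero_point([2.0, 4.0, 5.0])
print(zp.dtype, zp.shape)  # uint8 (3,)
```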

Outputs

y (heterogeneous) - T2: N-D quantized output tensor. It has the same shape as input ‘x’.

Type Constraints

T1 in ( tensor(float), tensor(int32) ): Constrain ‘x’ to a float or int32 tensor.
T2 in ( tensor(int8), tensor(uint8) ): Constrain ‘y_zero_point’ and ‘y’ to an 8-bit integer tensor.
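Because T2 is shared by y_zero_point and y, the zero point's type also decides the output type and its saturation range. The NumPy sketch below is only an illustration of that relationship, using an int32 input and an int8 zero point. The variable names are chosen for this example.

```python
import numpy as np

x = np.array([-1000, -3, 0, 3, 1000], dtype=np.int32)  # T1 = tensor(int32)
y_scale = np.float32(2.0)
y_zero_point = np.int8(-1)                              # T2 = tensor(int8)

# The output type follows the zero point's type, and so do the saturation bounds.
info = np.iinfo(y_zero_point.dtype)
# Do the arithmetic in floating point so the int8 zero point cannot overflow.
shifted = np.rint(x / y_scale) + float(y_zero_point)
y = np.clip(shifted, info.min, info.max).astype(y_zero_point.dtype)

print(y.dtype, y.tolist())  # int8 [-128, -3, -1, 1, 127]
```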

Examples

axis

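import numpy as np
import onnx

# No 'axis' attribute is given, so the default axis=1 applies.
node = onnx.helper.make_node('QuantizeLinear',
                             inputs=['x', 'y_scale', 'y_zero_point'],
                             outputs=['y'],)

x = np.array([[[[-162, 10],
                [-100, 232],
                [-20, -50]],

               [[-76, 0],
                [0, 252],
                [32, -44]],

               [[245, -485],
                [-960, -270],
                [-375, -470]], ], ], dtype=np.float32)
# One scale and one zero point per slice along axis 1 (length 3).
y_scale = np.array([2, 4, 5], dtype=np.float32)
y_zero_point = np.array([84, 24, 196], dtype=np.uint8)
# Reshape to (1, 3, 1, 1) so the parameters broadcast along axis 1. Every
# quotient here is an integer and every result fits in uint8, so this data
# needs no explicit rounding or saturation step.
y = (x / y_scale.reshape(1, 3, 1, 1) + y_zero_point.reshape(1, 3, 1, 1)).astype(np.uint8)

# expect() comes from the ONNX node test suite and is not defined here.
expect(node, inputs=[x, y_scale, y_zero_point], outputs=[y],
       name='test_quantizelinear_axis')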
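To make the expected output of this example concrete, the arithmetic can be replayed channel by channel in plain Python. This is an illustrative check only. The `channels` and `quantized` names are made up here, and each channel's values are flattened in row-major order.

```python
# (values, scale, zero_point) for each of the three channels along axis 1.
channels = [
    ([-162, 10, -100, 232, -20, -50], 2, 84),
    ([-76, 0, 0, 252, 32, -44], 4, 24),
    ([245, -485, -960, -270, -375, -470], 5, 196),
]

quantized = []
for values, scale, zero_point in channels:
    # round() rounds ties to even; min/max saturate to the uint8 range.
    quantized.append(
        [max(0, min(255, round(v / scale) + zero_point)) for v in values]
    )

for row in quantized:
    print(row)
# [3, 89, 34, 200, 74, 59]
# [5, 24, 24, 87, 32, 13]
# [245, 99, 4, 142, 121, 102]
```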

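Differences

Side-by-side comparison of QuantizeLinear - 10 (left) with QuantizeLinear - 13 (right). The first two columns give the line index in each version; an empty cell means the line is absent from that version.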

`0`	`0`	`The linear per-tensor/layer quantization operator. It consumes a high precision tensor, a scale, a zero point to compute the low precision / quantized tensor.`	`The linear quantization operator. It consumes a high precision tensor, a scale, and a zero point to compute the low precision / quantized tensor.`
	`1`		`The scale factor and zero point must have same shape, and can be either a scalar for per-tensor / per layer quantization, or a 1-D tensor for per-axis quantization.`
`1`	`2`	`The quantization formula is y = saturate ((x / y_scale) + y_zero_point). For saturation, it saturates to [0, 255] if it's uint8, or [-128, 127] if it's int8.`	`The quantization formula is y = saturate ((x / y_scale) + y_zero_point).`
	`3`		`For saturation, it saturates to [0, 255] if it's uint8, or [-128, 127] if it's int8.`
`2`	`4`	`For (x / y_scale), it's rounding to nearest ties to even. Refer to https://en.wikipedia.org/wiki/Rounding for details. 'y_zero_point' and 'y' must have same type.`	`For (x / y_scale), it's rounding to nearest ties to even. Refer to https://en.wikipedia.org/wiki/Rounding for details. 'y_zero_point' and 'y' must have same type.`
`3`	`5`
	`6`		`Attributes`
	`7`
	`8`		`* axis:`
	`9`		`(Optional) The axis of the quantization dimension of the input`
	`10`		`tensor. Ignored for per-tensor quantization. Negative value means`
	`11`		`counting dimensions from the back. Accepted range is [-r, r-1] where`
	`12`		`r = rank(input). Default value is 1.`
	`13`
`4`	`14`	`Inputs`	`Inputs`
`5`	`15`
`6`	`16`	`Between 2 and 3 inputs.`	`Between 2 and 3 inputs.`
`7`	`17`
`8`	`18`	`* x (heterogeneous) - T1:`	`* x (heterogeneous) - T1:`
`9`	`19`	`N-D full precision Input tensor to be quantized.`	`N-D full precision Input tensor to be quantized.`
`10`	`20`	`* y_scale (heterogeneous) - tensor(float):`	`* y_scale (heterogeneous) - tensor(float):`
`11`	`21`	`Scale for doing quantization to get 'y'. It's a scalar, which means`	`Scale for doing quantization to get 'y'. It can be a scalar, which`
`12`	`22`	`a per-tensor/layer quantization.`	`means per-tensor/layer quantization, or a 1-D Tensor for per-axis`
	`23`		`quantization.`
`13`	`24`	`* y_zero_point (optional, heterogeneous) - T2:`	`* y_zero_point (optional, heterogeneous) - T2:`
`14`	`25`	`Zero point for doing quantization to get 'y'. It's a scalar, which`	`Zero point for doing quantization to get 'y'. Shape must match`
`15`		`means a per-tensor/layer quantization. Default value is uint8 typed`
`16`	`26`	`0 if it's not specified.`	`y_scale. Default is uint8 with zero point of 0 if it's not`
	`27`		`specified.`
`17`	`28`
`18`	`29`	`Outputs`	`Outputs`
`19`	`30`
`20`	`31`	`* y (heterogeneous) - T2:`	`* y (heterogeneous) - T2:`
`21`	`32`	`N-D quantized output tensor. It has same shape as input 'x'.`	`N-D quantized output tensor. It has same shape as input 'x'.`
`22`	`33`
`23`	`34`	`Type Constraints`	`Type Constraints`
`24`	`35`
`25`	`36`	`* T1 in (`	`* T1 in (`
`26`	`37`	`tensor(float),`	`tensor(float),`
`27`	`38`	`tensor(int32)`	`tensor(int32)`
`28`	`39`	`):`	`):`
`29`	`40`	`Constrain 'x' to float or int32 tensor.`	`Constrain 'x' to float or int32 tensor.`
`30`	`41`	`* T2 in (`	`* T2 in (`
`31`	`42`	`tensor(int8),`	`tensor(int8),`
`32`	`43`	`tensor(uint8)`	`tensor(uint8)`
`33`	`44`	`):`	`):`
`34`	`45`	`Constrain 'y_zero_point' and 'y' to 8-bit integer tensor.`	`Constrain 'y_zero_point' and 'y' to 8-bit integer tensor.`

QuantizeLinear - 10

Version

name: QuantizeLinear
domain: main
since_version: 10
function: False
support_level: SupportType.COMMON
shape inference: True

This version of the operator has been available since version 10 of the default ONNX operator set.

Summary

The linear per-tensor/layer quantization operator. It consumes a high-precision tensor, a scale, and a zero point to compute the low-precision (quantized) tensor. The quantization formula is y = saturate((x / y_scale) + y_zero_point). Saturation clamps the result to [0, 255] for uint8 or to [-128, 127] for int8. The quotient (x / y_scale) is rounded to the nearest integer, with ties rounded to even; see https://en.wikipedia.org/wiki/Rounding for details. ‘y_zero_point’ and ‘y’ must have the same type.