.. _op_ai_onnx_QuantizeLinear:

QuantizeLinear
==============

- **Domain**: ``ai.onnx``
- **Since version**: 25

The linear quantization operator consumes a high-precision tensor, a scale, and a zero point to compute the
low-precision/quantized tensor. The scale factor and zero point must have the same shape, determining the quantization
granularity. The quantization formula is ``y = saturate((x / y_scale) + y_zero_point)``.

Saturation is done according to:

- uint16: [0, 65535]
- int16: [-32768, 32767]
- uint8: [0, 255]
- int8: [-128, 127]
- uint4: [0, 15]
- int4: [-8, 7]
- uint2: [0, 3]
- int2: [-2, 1]
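
The integer ranges above match the usual unsigned and two's-complement limits for each bit width. The following is a minimal sketch that reproduces them; the helper name ``saturation_range`` is hypothetical and is not part of ONNX.

.. code-block:: python

    def saturation_range(bits: int, signed: bool) -> tuple:
        """Return the inclusive (min, max) of an integer type with the given width."""
        if signed:
            # Two's complement: one more negative value than positive.
            return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
        return 0, (1 << bits) - 1

    for bits in (16, 8, 4, 2):
        print(f"uint{bits}: {saturation_range(bits, signed=False)}")
        print(f"int{bits}: {saturation_range(bits, signed=True)}")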

``(x / y_scale)`` is rounded to the nearest integer, with ties rounded to the nearest even value. Refer to https://en.wikipedia.org/wiki/Rounding for details.
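
As an illustration of the formula and the rounding rule, here is a rough NumPy sketch of per-tensor quantization to uint8. It is not the reference implementation: the function name ``quantize_linear_uint8`` is hypothetical, the division is assumed to run in float32, and only the uint8 range is handled.

.. code-block:: python

    import numpy as np

    def quantize_linear_uint8(x, y_scale, y_zero_point=0):
        """Per-tensor sketch of y = saturate(round(x / y_scale) + y_zero_point)."""
        x = np.asarray(x, dtype=np.float32)
        # np.rint rounds halfway cases to the nearest even integer.
        rounded = np.rint(x / np.float32(y_scale))
        shifted = rounded + y_zero_point
        # Saturate to the uint8 range [0, 255] before casting.
        return np.clip(shifted, 0, 255).astype(np.uint8)

    # 0.5 and 2.5 are ties and round to the even neighbours 0 and 2;
    # -3.0 and 300.0 fall outside [0, 255] and are saturated.
    print(quantize_linear_uint8([0.5, 1.5, 2.5, -3.0, 300.0], y_scale=1.0))

With a non-zero zero point, the shift is applied after rounding and before saturation, so ``quantize_linear_uint8([3.0], y_scale=2.0, y_zero_point=128)`` rounds ``1.5`` to ``2`` and then adds ``128``.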

``y_zero_point`` and ``y`` must have the same type. ``y_zero_point`` is usually not used for quantization to float8 and 4-bit types, but the quantization
formula remains the same for consistency, and the type of the input ``y_zero_point`` still determines the quantization type.
``x`` and ``y_scale`` are allowed to have different types. The type of ``y_scale`` determines the precision of the division operation between ``x`` and
``y_scale``, unless the ``precision`` attribute is specified.

There are three supported quantization granularities, determined by the shape of ``y_scale``.
In all cases, ``y_zero_point`` must have the same shape as ``y_scale``.

- Per-tensor (per-layer) quantization: ``y_scale`` is a scalar.
- Per-axis quantization: ``y_scale`` is a 1-D tensor whose length equals the size of the quantization axis. For an input shape
  ``(D0, ..., Di, ..., Dn)`` and ``axis=i``, ``y_scale`` is a 1-D tensor of length ``Di``.
- Blocked quantization: The scale's shape is identical to the input's shape, except for one dimension, in which
  blocking is performed. Given ``x`` shape ``(D0, ..., Di, ..., Dn)``, ``axis=i``, and block size ``B``: ``y_scale`` shape is
  ``(D0, ..., ceil(Di/B), ..., Dn)``.
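
The three granularities differ only in how ``y_scale`` is expanded to the shape of ``x`` before the element-wise division. The sketch below shows one way to do that expansion in NumPy. The helper ``expand_scale`` and its parameter defaults are hypothetical choices made for this example, not part of the operator definition.

.. code-block:: python

    import numpy as np

    def expand_scale(y_scale, x_shape, axis=0, block_size=0):
        """Expand y_scale to x_shape for per-tensor, per-axis, or blocked quantization."""
        y_scale = np.asarray(y_scale, dtype=np.float32)
        if y_scale.ndim == 0:
            # Per-tensor: a single scalar applies to every element.
            return np.broadcast_to(y_scale, x_shape)
        if y_scale.ndim == 1 and block_size == 0:
            # Per-axis: one scale per index along `axis`.
            shape = [1] * len(x_shape)
            shape[axis] = x_shape[axis]
            return np.broadcast_to(y_scale.reshape(shape), x_shape)
        # Blocked: each scale covers `block_size` consecutive elements along `axis`.
        repeated = np.repeat(y_scale, block_size, axis=axis)
        # Trim the last block when Di is not a multiple of the block size.
        return np.take(repeated, np.arange(x_shape[axis]), axis=axis)

    # x has shape (2, 5); with axis=1 and B=2 the scale has shape (2, ceil(5/2)) = (2, 3).
    blocked = expand_scale([[1, 2, 3], [4, 5, 6]], (2, 5), axis=1, block_size=2)
    print(blocked.shape)

The same expansion applies to ``y_zero_point``, since it must have the same shape as ``y_scale``.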

**Inputs**

- **x** (*T1*): N-D full-precision input tensor to be quantized.
- **y_scale** (*T2*): Scale used to quantize ``x`` into ``y``. For per-tensor (per-layer) quantization the scale is a scalar; for per-axis quantization it is a 1-D tensor; for blocked quantization it has the same shape as the input, except for the one dimension in which blocking is performed.
- **y_zero_point** (*T3*): Zero point used to quantize ``x`` into ``y``. Its shape must match ``y_scale``. If not specified, it defaults to a uint8 zero point of 0.

**Outputs**

- **y** (*T3*): N-D quantized output tensor. It has the same shape as the input ``x``.

**Type Constraints**

- **T1**: The type of the input ``x``.
  Allowed types: tensor(bfloat16), tensor(float), tensor(float16), tensor(int32).
- **T2**: The type of the input ``y_scale``.
  Allowed types: tensor(bfloat16), tensor(float), tensor(float16), tensor(float8e8m0), tensor(int32).
- **T3**: The type of the input ``y_zero_point`` and the output ``y``.
  Allowed types: tensor(float4e2m1), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int2), tensor(int4), tensor(int8), tensor(uint16), tensor(uint2), tensor(uint4), tensor(uint8).

Differences with previous version (24)
--------------------------------------

**SchemaDiff**: ``QuantizeLinear`` (domain ``'ai.onnx'``)

* old version: 24
* new version: 25
* breaking: no

**Type constraints:**

* changed 'T3': added types: ['tensor(int2)', 'tensor(uint2)']

**Documentation:**

* line similarity: 0.97 (+2/-0 lines)

.. code-block:: diff

    --- QuantizeLinear v24
    +++ QuantizeLinear v25
    @@ -10,6 +10,8 @@
     - int8: [-128, 127]
     - uint4: [0, 15]
     - int4: [-8, 7]
    +- uint2: [0, 3]
    +- int2: [-2, 1]
     
     For `(x / y_scale)`, it rounds to the nearest even. Refer to https://en.wikipedia.org/wiki/Rounding for details.
     

Version History
---------------

- :doc:`Version 24 <QuantizeLinear-24>`
- :doc:`Version 23 <QuantizeLinear-23>`
- :doc:`Version 21 <QuantizeLinear-21>`
- :doc:`Version 19 <QuantizeLinear-19>`
- :doc:`Version 13 <QuantizeLinear-13>`
- :doc:`Version 10 <QuantizeLinear-10>`