.. _op_ai_onnx_AveragePool-19:

AveragePool - version 19
========================

This page documents version **19** of operator **AveragePool**. See :doc:`AveragePool` for the latest version (since version 22).

- **Domain**: ``ai.onnx``
- **Since version**: 19

AveragePool consumes an input tensor X and applies average pooling across
the tensor according to kernel sizes, stride sizes, and pad lengths.
Average pooling computes the average of all values in a
subset of the input tensor selected by the kernel size and downsamples the
data into the output tensor Y for further processing. The output spatial shape is calculated differently
depending on whether explicit padding (the ``pads`` attribute) or auto padding (the ``auto_pad`` attribute) is used.
With explicit padding (https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html?highlight=maxpool#torch.nn.MaxPool2d):

.. code-block::

    output_spatial_shape[i] = floor((input_spatial_shape[i] + pad_shape[i] - dilations[i] * (kernel_spatial_shape[i] - 1) - 1) / strides_spatial_shape[i] + 1)

or

.. code-block::

    output_spatial_shape[i] = ceil((input_spatial_shape[i] + pad_shape[i] - dilations[i] * (kernel_spatial_shape[i] - 1) - 1) / strides_spatial_shape[i] + 1)

if ceil_mode is enabled. ``pad_shape[i]`` is the sum of pads along axis ``i``.
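
As a quick check of the explicit-padding formula, the following sketch evaluates it for a single spatial axis. The function name ``pooled_size`` and its parameter names are illustrative only and are not part of the ONNX API; the code transcribes the formula as written and nothing more.

.. code-block:: python

    import math

    def pooled_size(input_size, kernel, stride=1, pad_total=0, dilation=1, ceil_mode=False):
        """Output size along one spatial axis, per the explicit-padding formula."""
        # pad_total plays the role of pad_shape[i]: begin pad plus end pad on this axis.
        span = input_size + pad_total - dilation * (kernel - 1) - 1
        rounding = math.ceil if ceil_mode else math.floor
        return rounding(span / stride + 1)

    print(pooled_size(4, kernel=3, stride=2, ceil_mode=True))  # 2
    print(pooled_size(4, kernel=3, pad_total=4))               # 6

The two calls use the settings of ``test_cc_averagepool_2d_ceil`` and ``test_cc_averagepool_2d_pads`` in the Examples section, whose outputs have spatial sizes 2 and 6. For ``test_cc_averagepool_2d_ceil_last_window_starts_on_pad`` the formula as written yields 2 while the listed output has size 1; as the test name suggests, a final window that would start inside the padding appears to be dropped, which this sketch does not model.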

``auto_pad`` is a DEPRECATED attribute. If it is still in use, the output spatial shape is as follows when ceil_mode is enabled:

.. code-block::

    VALID: output_spatial_shape[i] = ceil((input_spatial_shape[i] - ((kernel_spatial_shape[i] - 1) * dilations[i] + 1) + 1) / strides_spatial_shape[i])
    SAME_UPPER or SAME_LOWER: output_spatial_shape[i] = ceil(input_spatial_shape[i] / strides_spatial_shape[i])

or when ceil_mode is disabled (https://www.tensorflow.org/api_docs/python/tf/keras/layers/AveragePooling2D):

.. code-block::

    VALID: output_spatial_shape[i] = floor((input_spatial_shape[i] - ((kernel_spatial_shape[i] - 1) * dilations[i] + 1)) / strides_spatial_shape[i]) + 1
    SAME_UPPER or SAME_LOWER: output_spatial_shape[i] = floor((input_spatial_shape[i] - 1) / strides_spatial_shape[i]) + 1

The pad shape for ``SAME_UPPER`` or ``SAME_LOWER`` is:

.. code-block::

    pad_shape[i] = (output_spatial_shape[i] - 1) * strides_spatial_shape[i] + ((kernel_spatial_shape[i] - 1) * dilations[i] + 1) - input_spatial_shape[i]
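
The ``auto_pad`` formulas can be sketched for a single axis as well. The helper name ``auto_pad_size`` is hypothetical. Returning a pad of 0 for ``VALID`` reflects that mode's meaning of "no padding", and the ``SAME_*`` pad total is returned exactly as the formula gives it, without clamping.

.. code-block:: python

    import math

    def auto_pad_size(input_size, kernel, stride=1, dilation=1, mode="VALID", ceil_mode=False):
        """Return (output_size, pad_total) along one axis for an auto_pad mode."""
        effective_kernel = (kernel - 1) * dilation + 1
        if mode == "VALID":
            if ceil_mode:
                out = math.ceil((input_size - effective_kernel + 1) / stride)
            else:
                out = math.floor((input_size - effective_kernel) / stride) + 1
            return out, 0  # VALID means no padding
        if mode in ("SAME_UPPER", "SAME_LOWER"):
            if ceil_mode:
                out = math.ceil(input_size / stride)
            else:
                out = math.floor((input_size - 1) / stride) + 1
            # Total padding needed so that `out` windows fit on this axis.
            pad_total = (out - 1) * stride + effective_kernel - input_size
            return out, pad_total
        raise ValueError(f"unsupported auto_pad mode: {mode}")

    print(auto_pad_size(5, kernel=3, stride=2, mode="SAME_UPPER"))  # (3, 2)
    print(auto_pad_size(5, kernel=3, stride=2, mode="VALID"))       # (2, 0)

How the total is split between the beginning and end of the axis is what distinguishes ``SAME_UPPER`` from ``SAME_LOWER``; the sketch leaves that split out.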

The output of each pooling window is divided by the number of elements (excluding padding when attribute count_include_pad is zero).
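
The effect of ``count_include_pad`` on the divisor can be illustrated with a small NumPy sketch. It is a simplified illustration, not the reference implementation: it assumes 2D pooling in floor mode without dilations, and it assumes ``pads`` is ordered ``[begin_h, begin_w, end_h, end_w]``. The function name ``average_pool_2d`` is made up for this example.

.. code-block:: python

    import numpy as np

    def average_pool_2d(x, kernel_shape, strides=(1, 1), pads=(0, 0, 0, 0), count_include_pad=0):
        """Average-pool x of shape (N, C, H, W); floor mode, no dilations."""
        n, c, h, w = x.shape
        kh, kw = kernel_shape
        sh, sw = strides
        top, left, bottom, right = pads  # assumed order: begins first, then ends
        padded = np.pad(x, ((0, 0), (0, 0), (top, bottom), (left, right)))
        # mask is 1 on real input positions and 0 on padding.
        mask = np.pad(np.ones((h, w), dtype=x.dtype), ((top, bottom), (left, right)))
        out_h = (h + top + bottom - kh) // sh + 1
        out_w = (w + left + right - kw) // sw + 1
        y = np.empty((n, c, out_h, out_w), dtype=x.dtype)
        for i in range(out_h):
            for j in range(out_w):
                r, q = i * sh, j * sw
                window = padded[:, :, r:r + kh, q:q + kw]
                if count_include_pad:
                    count = kh * kw                          # padding counts as elements
                else:
                    count = mask[r:r + kh, q:q + kw].sum()   # real elements only
                y[:, :, i, j] = window.sum(axis=(2, 3)) / count
        return y

    x = np.arange(1, 26, dtype=np.float32).reshape(1, 1, 5, 5)
    y = average_pool_2d(x, (3, 3), pads=(1, 1, 1, 1), count_include_pad=1)
    print(y[0, 0, 0, 0])  # corner window: (1 + 2 + 6 + 7) / 9

With ``count_include_pad=0`` the same corner window would be divided by 4 instead of 9, since only four of its nine positions hold real input values.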

**Inputs**

- **X** (*T*): Input data tensor from the previous operator; dimensions for the image case are (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data. For the non-image case, the dimensions are in the form of (N x C x D1 x D2 ... Dn), where N is the batch size. Optionally, if dimension denotation is in effect, the operation expects the input data tensor to arrive with the dimension denotation of [DATA_BATCH, DATA_CHANNEL, DATA_FEATURE, DATA_FEATURE ...].

**Outputs**

- **Y** (*T*): Output data tensor from average pooling across the input tensor. Dimensions vary with the kernel, stride, and pad sizes. The floor of the computed dimension is used unless ceil_mode is enabled.

**Type Constraints**

- **T**: Constrain input and output types to float tensors.
  Allowed types: tensor(double), tensor(float), tensor(float16).

Examples
--------

**test_cc_averagepool_1d_default**

.. code-block:: text

    Attributes:
      kernel_shape = [2]

.. code-block:: text

    Inputs:
      x: shape=(1, 1, 8), dtype=float32
        [[[1., 2., 3., 4., 5., 6., 7., 8.]]]

    Outputs:
      y: shape=(1, 1, 7), dtype=float32
        [[[1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5]]]
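
This default case (stride 1, no padding) can be reproduced in a few lines of NumPy. The sketch below uses ``sliding_window_view`` as a convenient stand-in and does not show how an ONNX runtime computes the result.

.. code-block:: python

    import numpy as np
    from numpy.lib.stride_tricks import sliding_window_view

    x = np.arange(1, 9, dtype=np.float32).reshape(1, 1, 8)
    # Every length-2 window along the last axis, at stride 1 and without padding.
    windows = sliding_window_view(x, window_shape=2, axis=-1)  # shape (1, 1, 7, 2)
    y = windows.mean(axis=-1)
    print(y.shape)  # (1, 1, 7)
    print(y)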

**test_cc_averagepool_2d_ceil**

.. code-block:: text

    Attributes:
      kernel_shape = [3, 3]
      strides = [2, 2]
      ceil_mode = 1

.. code-block:: text

    Inputs:
      x: shape=(1, 1, 4, 4), dtype=float32
        [[[[ 1.,  2.,  3.,  4.],
           [ 5.,  6.,  7.,  8.],
           [ 9., 10., 11., 12.],
           [13., 14., 15., 16.]]]]

    Outputs:
      y: shape=(1, 1, 2, 2), dtype=float32
        [[[[ 6. ,  7.5],
           [12. , 13.5]]]]
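
The right-hand and bottom values above are consistent with windows that overhang the input being clipped to it and averaged over the in-bounds elements only. The sketch below checks that reading against this example; it hard-codes the output size and works on the spatial part of the tensor alone.

.. code-block:: python

    import numpy as np

    x = np.arange(1, 17, dtype=np.float32).reshape(4, 4)  # spatial part only
    kernel, stride = 3, 2
    out_size = 2  # ceil((4 - 3) / 2 + 1)
    y = np.empty((out_size, out_size), dtype=np.float32)
    for i in range(out_size):
        for j in range(out_size):
            r, q = i * stride, j * stride
            # NumPy slicing clips at the array edge, so overhanging windows shrink.
            window = x[r:r + kernel, q:q + kernel]
            y[i, j] = window.mean()
    print(y)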

**test_cc_averagepool_2d_ceil_last_window_starts_on_pad**

.. code-block:: text

    Attributes:
      kernel_shape = [3, 3]
      strides = [3, 3]
      pads = [1, 1, 1, 1]
      ceil_mode = 1
      count_include_pad = 1

.. code-block:: text

    Inputs:
      x: shape=(1, 1, 2, 2), dtype=float32
        [[[[1., 2.],
           [3., 4.]]]]

    Outputs:
      y: shape=(1, 1, 1, 1), dtype=float32
        [[[[1.1111112]]]]

**test_cc_averagepool_2d_default**

.. code-block:: text

    Attributes:
      kernel_shape = [2, 2]

.. code-block:: text

    Inputs:
      x: shape=(1, 1, 4, 4), dtype=float32
        [[[[ 1.,  2.,  3.,  4.],
           [ 5.,  6.,  7.,  8.],
           [ 9., 10., 11., 12.],
           [13., 14., 15., 16.]]]]

    Outputs:
      y: shape=(1, 1, 3, 3), dtype=float32
        [[[[ 3.5,  4.5,  5.5],
           [ 7.5,  8.5,  9.5],
           [11.5, 12.5, 13.5]]]]

**test_cc_averagepool_2d_pads**

.. code-block:: text

    Attributes:
      kernel_shape = [3, 3]
      pads = [2, 2, 2, 2]

.. code-block:: text

    Inputs:
      x: shape=(1, 1, 4, 4), dtype=float32
        [[[[ 1.,  2.,  3.,  4.],
           [ 5.,  6.,  7.,  8.],
           [ 9., 10., 11., 12.],
           [13., 14., 15., 16.]]]]

    Outputs:
      y: shape=(1, 1, 6, 6), dtype=float32
        [[[[ 1. ,  1.5,  2. ,  3. ,  3.5,  4. ],
           [ 3. ,  3.5,  4. ,  5. ,  5.5,  6. ],
           [ 5. ,  5.5,  6. ,  7. ,  7.5,  8. ],
           [ 9. ,  9.5, 10. , 11. , 11.5, 12. ],
           [11. , 11.5, 12. , 13. , 13.5, 14. ],
           [13. , 13.5, 14. , 15. , 15.5, 16. ]]]]

**test_cc_averagepool_2d_pads_count_include_pad**

.. code-block:: text

    Attributes:
      kernel_shape = [3, 3]
      pads = [1, 1, 1, 1]
      count_include_pad = 1

.. code-block:: text

    Inputs:
      x: shape=(1, 1, 5, 5), dtype=float32
        [[[[ 1.,  2.,  3.,  4.,  5.],
           [ 6.,  7.,  8.,  9., 10.],
           [11., 12., 13., 14., 15.],
           [16., 17., 18., 19., 20.],
           [21., 22., 23., 24., 25.]]]]

    Outputs:
      y: shape=(1, 1, 5, 5), dtype=float32
        [[[[ 1.7777778,  3.       ,  3.6666667,  4.3333335,  3.1111112],
           [ 4.3333335,  7.       ,  8.       ,  9.       ,  6.3333335],
           [ 7.6666665, 12.       , 13.       , 14.       ,  9.666667 ],
           [11.       , 17.       , 18.       , 19.       , 13.       ],
           [ 8.444445 , 13.       , 13.666667 , 14.333333 ,  9.777778 ]]]]

**test_cc_averagepool_2d_precomputed_pads**

.. code-block:: text

    Attributes:
      kernel_shape = [5, 5]
      pads = [2, 2, 2, 2]

.. code-block:: text

    Inputs:
      x: shape=(1, 1, 5, 5), dtype=float32
        [[[[ 1.,  2.,  3.,  4.,  5.],
           [ 6.,  7.,  8.,  9., 10.],
           [11., 12., 13., 14., 15.],
           [16., 17., 18., 19., 20.],
           [21., 22., 23., 24., 25.]]]]

    Outputs:
      y: shape=(1, 1, 5, 5), dtype=float32
        [[[[ 7. ,  7.5,  8. ,  8.5,  9. ],
           [ 9.5, 10. , 10.5, 11. , 11.5],
           [12. , 12.5, 13. , 13.5, 14. ],
           [14.5, 15. , 15.5, 16. , 16.5],
           [17. , 17.5, 18. , 18.5, 19. ]]]]

**test_cc_averagepool_2d_precomputed_pads_count_include_pad**

.. code-block:: text

    Attributes:
      kernel_shape = [5, 5]
      pads = [2, 2, 2, 2]
      count_include_pad = 1

.. code-block:: text

    Inputs:
      x: shape=(1, 1, 5, 5), dtype=float32
        [[[[ 1.,  2.,  3.,  4.,  5.],
           [ 6.,  7.,  8.,  9., 10.],
           [11., 12., 13., 14., 15.],
           [16., 17., 18., 19., 20.],
           [21., 22., 23., 24., 25.]]]]

    Outputs:
      y: shape=(1, 1, 5, 5), dtype=float32
        [[[[ 2.52,  3.6 ,  4.8 ,  4.08,  3.24],
           [ 4.56,  6.4 ,  8.4 ,  7.04,  5.52],
           [ 7.2 , 10.  , 13.  , 10.8 ,  8.4 ],
           [ 6.96,  9.6 , 12.4 , 10.24,  7.92],
           [ 6.12,  8.4 , 10.8 ,  8.88,  6.84]]]]

**test_cc_averagepool_2d_precomputed_strides**

.. code-block:: text

    Attributes:
      kernel_shape = [2, 2]
      strides = [2, 2]

.. code-block:: text

    Inputs:
      x: shape=(1, 1, 5, 5), dtype=float32
        [[[[ 1.,  2.,  3.,  4.,  5.],
           [ 6.,  7.,  8.,  9., 10.],
           [11., 12., 13., 14., 15.],
           [16., 17., 18., 19., 20.],
           [21., 22., 23., 24., 25.]]]]

    Outputs:
      y: shape=(1, 1, 2, 2), dtype=float32
        [[[[ 4.,  6.],
           [14., 16.]]]]

**test_cc_averagepool_2d_strides**

.. code-block:: text

    Attributes:
      kernel_shape = [3, 3]
      strides = [2, 2]

.. code-block:: text

    Inputs:
      x: shape=(1, 1, 5, 5), dtype=float32
        [[[[ 1.,  2.,  3.,  4.,  5.],
           [ 6.,  7.,  8.,  9., 10.],
           [11., 12., 13., 14., 15.],
           [16., 17., 18., 19., 20.],
           [21., 22., 23., 24., 25.]]]]

    Outputs:
      y: shape=(1, 1, 2, 2), dtype=float32
        [[[[ 7.,  9.],
           [17., 19.]]]]

**test_cc_averagepool_3d_default**

.. code-block:: text

    Attributes:
      kernel_shape = [2, 2, 2]

.. code-block:: text

    Inputs:
      x: shape=(1, 1, 3, 3, 3), dtype=float32
        [[[[[ 1.,  2.,  3.],
            [ 4.,  5.,  6.],
            [ 7.,  8.,  9.]],
        
           [[10., 11., 12.],
            [13., 14., 15.],
            [16., 17., 18.]],
        
           [[19., 20., 21.],
            [22., 23., 24.],
            [25., 26., 27.]]]]]

    Outputs:
      y: shape=(1, 1, 2, 2, 2), dtype=float32
        [[[[[ 7.5,  8.5],
            [10.5, 11.5]],
        
           [[16.5, 17.5],
            [19.5, 20.5]]]]]

Differences with previous version (11)
--------------------------------------

**SchemaDiff**: ``AveragePool`` (domain ``'ai.onnx'``)

* old version: 11
* new version: 19
* breaking: no

**Documentation:**

* line similarity: 0.63 (+13/-10 lines)

.. code-block:: diff

    --- AveragePool v11
    +++ AveragePool v19
    @@ -3,28 +3,31 @@
      the tensor according to kernel sizes, stride sizes, and pad lengths.
      average pooling consisting of computing the average on all values of a
      subset of the input tensor according to the kernel size and downsampling the
    - data into the output tensor Y for further processing. The output spatial shape will be following:
    + data into the output tensor Y for further processing. The output spatial shape is calculated differently
    + depending on whether explicit padding is used, where pads is employed, or auto padding is used, where auto_pad is utilized.
    + With explicit padding (https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html?highlight=maxpool#torch.nn.MaxPool2d):
      ```
    - output_spatial_shape[i] = floor((input_spatial_shape[i] + pad_shape[i] - kernel_spatial_shape[i]) / strides_spatial_shape[i] + 1)
    + output_spatial_shape[i] = floor((input_spatial_shape[i] + pad_shape[i] - dilation[i] * (kernel_shape[i] - 1) - 1) / strides_spatial_shape[i] + 1)
      ```
      or
      ```
    - output_spatial_shape[i] = ceil((input_spatial_shape[i] + pad_shape[i] - kernel_spatial_shape[i]) / strides_spatial_shape[i] + 1)
    + output_spatial_shape[i] = ceil((input_spatial_shape[i] + pad_shape[i] - dilation[i] * (kernel_shape[i] - 1) - 1) / strides_spatial_shape[i] + 1)
      ```
    - if ceil_mode is enabled
    + if ceil_mode is enabled. `pad_shape[i]` is the sum of pads along axis `i`.
     
    + `auto_pad` is a DEPRECATED attribute. If you are using them currently, the output spatial shape will be following when ceil_mode is enabled:
      ```
    - * pad_shape[i] is sum of pads along axis i
    + VALID: output_spatial_shape[i] = ceil((input_spatial_shape[i] - ((kernel_spatial_shape[i] - 1) * dilations[i] + 1) + 1) / strides_spatial_shape[i])
    + SAME_UPPER or SAME_LOWER: output_spatial_shape[i] = ceil(input_spatial_shape[i] / strides_spatial_shape[i])
      ```
    -
    - `auto_pad` is a DEPRECATED attribute. If you are using them currently, the output spatial shape will be following:
    + or when ceil_mode is disabled (https://www.tensorflow.org/api_docs/python/tf/keras/layers/AveragePooling2D):
      ```
    - VALID: output_spatial_shape[i] = ceil((input_spatial_shape[i] - kernel_spatial_shape[i] + 1) / strides_spatial_shape[i])
    - SAME_UPPER or SAME_LOWER: output_spatial_shape[i] = ceil(input_spatial_shape[i] / strides_spatial_shape[i])
    + VALID: output_spatial_shape[i] = floor((input_spatial_shape[i] - ((kernel_spatial_shape[i] - 1) * dilations[i] + 1)) / strides_spatial_shape[i]) + 1
    + SAME_UPPER or SAME_LOWER: output_spatial_shape[i] = floor((input_spatial_shape[i] - 1) / strides_spatial_shape[i]) + 1
      ```
      And pad shape will be following if `SAME_UPPER` or `SAME_LOWER`:
      ```
    - pad_shape[i] = (output_spatial_shape[i] - 1) * strides_spatial_shape[i] + kernel_spatial_shape[i] - input_spatial_shape[i]
    + pad_shape[i] = (output_spatial_shape[i] - 1) * strides_spatial_shape[i] + ((kernel_spatial_shape[i] - 1) * dilations[i] + 1) - input_spatial_shape[i]
      ```
      The output of each pooling window is divided by the number of elements (exclude pad when attribute count_include_pad is zero).