.. _op_ai_onnx_GRU-14:

GRU - version 14
================

This page documents version **14** of operator **GRU**. See :doc:`GRU` for the latest version (since version 22).

- **Domain**: ``ai.onnx``
- **Since version**: 14

Computes a one-layer GRU. This operator is usually supported via a custom
implementation such as cuDNN.

Notations:

* ``X`` - input tensor
* ``z`` - update gate
* ``r`` - reset gate
* ``h`` - hidden gate
* ``t`` - time step (t-1 means previous time step)
* ``W[zrh]`` - W parameter weight matrix for update, reset, and hidden gates
* ``R[zrh]`` - R recurrence weight matrix for update, reset, and hidden gates
* ``Wb[zrh]`` - W bias vectors for update, reset, and hidden gates
* ``Rb[zrh]`` - R bias vectors for update, reset, and hidden gates
* ``WB[zrh]`` - W parameter weight matrix for backward update, reset, and hidden gates
* ``RB[zrh]`` - R recurrence weight matrix for backward update, reset, and hidden gates
* ``WBb[zrh]`` - W bias vectors for backward update, reset, and hidden gates
* ``RBb[zrh]`` - R bias vectors for backward update, reset, and hidden gates
* ``H`` - Hidden state
* ``num_directions`` - 2 if direction == bidirectional else 1

Activation functions:

* Relu(x)                - max(0, x)
* Tanh(x)                - (1 - e^{-2x})/(1 + e^{-2x})
* Sigmoid(x)             - 1/(1 + e^{-x})

NOTE: The activation functions below are optional.

* Affine(x)              - alpha \* x + beta
* LeakyRelu(x)           - x if x >= 0 else alpha \* x
* ThresholdedRelu(x)     - x if x >= alpha else 0
* ScaledTanh(x)          - alpha \* Tanh(beta \* x)
* HardSigmoid(x)         - min(max(alpha \* x + beta, 0), 1)
* Elu(x)                 - x if x >= 0 else alpha \* (e^x - 1)
* Softsign(x)            - x/(1 + ``|x|``)
* Softplus(x)            - log(1 + e^x)
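
The definitions above translate almost directly into NumPy. The following is a minimal sketch rather than a reference implementation: the function names are illustrative, and ``alpha`` and ``beta`` are taken as explicit arguments because this section does not state their defaults.

.. code-block:: python

    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)

    def tanh(x):
        # Written as in the list above; np.tanh is the numerically safer choice.
        return (1.0 - np.exp(-2.0 * x)) / (1.0 + np.exp(-2.0 * x))

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def leaky_relu(x, alpha):
        return np.where(x >= 0, x, alpha * x)

    def hard_sigmoid(x, alpha, beta):
        return np.minimum(np.maximum(alpha * x + beta, 0.0), 1.0)

    def softsign(x):
        return x / (1.0 + np.abs(x))

    # Small sample input used to compare against NumPy's built-in tanh.
    x = np.array([-2.0, 0.0, 3.0])
    assert np.allclose(tanh(x), np.tanh(x))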

Equations (Default: f=Sigmoid, g=Tanh), where ``(.)`` denotes element-wise multiplication:

* zt = f(Xt\*(Wz^T) + Ht-1\*(Rz^T) + Wbz + Rbz)
* rt = f(Xt\*(Wr^T) + Ht-1\*(Rr^T) + Wbr + Rbr)
* ht = g(Xt\*(Wh^T) + (rt (.) Ht-1)\*(Rh^T) + Rbh + Wbh) # default, when linear_before_reset = 0
* ht = g(Xt\*(Wh^T) + (rt (.) (Ht-1\*(Rh^T) + Rbh)) + Wbh) # when linear_before_reset != 0
* Ht = (1 - zt) (.) ht + zt (.) Ht-1
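
The equations above can be sketched as a single time step for one direction. This is an illustrative NumPy version using the default activations (f=Sigmoid, g=Tanh); the helper name ``gru_step`` and its argument layout are assumptions made for this example, not part of the operator definition.

.. code-block:: python

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def gru_step(x_t, h_prev, W, R, Wb, Rb, linear_before_reset=0):
        """One GRU time step for a single direction (hypothetical helper).

        x_t: [batch_size, input_size], h_prev: [batch_size, hidden_size]
        W: [3*hidden_size, input_size], R: [3*hidden_size, hidden_size]
        Wb, Rb: [3*hidden_size], each ordered as z, r, h.
        """
        Wz, Wr, Wh = np.split(W, 3, axis=0)
        Rz, Rr, Rh = np.split(R, 3, axis=0)
        Wbz, Wbr, Wbh = np.split(Wb, 3)
        Rbz, Rbr, Rbh = np.split(Rb, 3)

        z = sigmoid(x_t @ Wz.T + h_prev @ Rz.T + Wbz + Rbz)
        r = sigmoid(x_t @ Wr.T + h_prev @ Rr.T + Wbr + Rbr)
        if linear_before_reset:
            # Reset gate applied after the recurrent linear transform.
            h = np.tanh(x_t @ Wh.T + r * (h_prev @ Rh.T + Rbh) + Wbh)
        else:
            # Default: reset gate applied to Ht-1 before the transform.
            h = np.tanh(x_t @ Wh.T + (r * h_prev) @ Rh.T + Rbh + Wbh)
        return (1.0 - z) * h + z * h_prev

    # With all-zero weights, z = 0.5 and h = 0, so Ht = 0.5 * Ht-1.
    x_t = np.ones((2, 3))
    h_prev = np.ones((2, 4))
    h_next = gru_step(x_t, h_prev, np.zeros((12, 3)), np.zeros((12, 4)),
                      np.zeros(12), np.zeros(12))

The two branches differ only in where the reset gate is applied, which is the distinction ``linear_before_reset`` controls.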

**Inputs**

- **X** (*T*): The input sequences packed (and potentially padded) into one 3-D tensor with the shape of ``[seq_length, batch_size, input_size]``.
- **W** (*T*): The weight tensor for the gates. Concatenation of ``W[zrh]`` and ``WB[zrh]`` (if bidirectional) along dimension 0. This tensor has shape ``[num_directions, 3*hidden_size, input_size]``.
- **R** (*T*): The recurrence weight tensor. Concatenation of ``R[zrh]`` and ``RB[zrh]`` (if bidirectional) along dimension 0. This tensor has shape ``[num_directions, 3*hidden_size, hidden_size]``.
- **B** (*T*): The bias tensor for the gates. Concatenation of ``[Wb[zrh], Rb[zrh]]`` and ``[WBb[zrh], RBb[zrh]]`` (if bidirectional) along dimension 0. This tensor has shape ``[num_directions, 6*hidden_size]``. Optional: if not specified, it is assumed to be 0.
- **sequence_lens** (*T1*): Optional tensor specifying the lengths of the sequences in a batch. If not specified, all sequences in the batch are assumed to have length ``seq_length``. It has shape ``[batch_size]``.
- **initial_h** (*T*): Optional initial value of the hidden state. If not specified, it is assumed to be 0. It has shape ``[num_directions, batch_size, hidden_size]``.
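
As a quick check of the shapes listed above, the following sketch allocates each input for a bidirectional GRU and unpacks the forward-direction slices. The concrete sizes are arbitrary values chosen for illustration.

.. code-block:: python

    import numpy as np

    # Arbitrary example sizes (assumptions for illustration only).
    seq_length, batch_size, input_size, hidden_size = 5, 2, 3, 4
    num_directions = 2  # direction == bidirectional

    X = np.zeros((seq_length, batch_size, input_size), dtype=np.float32)
    W = np.zeros((num_directions, 3 * hidden_size, input_size), dtype=np.float32)
    R = np.zeros((num_directions, 3 * hidden_size, hidden_size), dtype=np.float32)
    B = np.zeros((num_directions, 6 * hidden_size), dtype=np.float32)
    sequence_lens = np.full((batch_size,), seq_length, dtype=np.int32)
    initial_h = np.zeros((num_directions, batch_size, hidden_size), dtype=np.float32)

    # Index 0 along dimension 0 holds the forward weights W[zrh];
    # index 1 holds the backward weights WB[zrh].
    Wz, Wr, Wh = np.split(W[0], 3, axis=0)  # each [hidden_size, input_size]
    Rz, Rr, Rh = np.split(R[0], 3, axis=0)  # each [hidden_size, hidden_size]
    Wb, Rb = np.split(B[0], 2)              # each [3*hidden_size]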

**Outputs**

- **Y** (*T*): A tensor that concatenates all the intermediate output values of the hidden state. It has shape ``[seq_length, num_directions, batch_size, hidden_size]``.
- **Y_h** (*T*): The last output value of the hidden state. It has shape ``[num_directions, batch_size, hidden_size]``.
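
To make the relationship between ``Y`` and ``Y_h`` concrete, here is a minimal NumPy sketch of the forward direction only. It assumes ``num_directions == 1``, the default activations, ``linear_before_reset = 0``, and that every sequence has the full length ``seq_length``. The function name ``gru_forward`` is hypothetical.

.. code-block:: python

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def gru_forward(X, W, R, B=None, initial_h=None):
        """Forward-only GRU sketch; returns (Y, Y_h)."""
        seq_length, batch_size, _ = X.shape
        hidden_size = R.shape[-1]
        if B is None:
            B = np.zeros((1, 6 * hidden_size), dtype=X.dtype)
        if initial_h is None:
            initial_h = np.zeros((1, batch_size, hidden_size), dtype=X.dtype)

        Wz, Wr, Wh = np.split(W[0], 3, axis=0)
        Rz, Rr, Rh = np.split(R[0], 3, axis=0)
        # B packs [Wb[zrh], Rb[zrh]] for each direction.
        Wbz, Wbr, Wbh, Rbz, Rbr, Rbh = np.split(B[0], 6)

        H = initial_h[0]
        outputs = []
        for t in range(seq_length):
            z = sigmoid(X[t] @ Wz.T + H @ Rz.T + Wbz + Rbz)
            r = sigmoid(X[t] @ Wr.T + H @ Rr.T + Wbr + Rbr)
            h = np.tanh(X[t] @ Wh.T + (r * H) @ Rh.T + Rbh + Wbh)
            H = (1.0 - z) * h + z * H
            outputs.append(H)

        Y = np.stack(outputs)[:, np.newaxis, :, :]  # add num_directions axis
        Y_h = H[np.newaxis, :, :]
        return Y, Y_h

    rng = np.random.default_rng(0)
    X = rng.standard_normal((5, 2, 3)).astype(np.float32)
    W = rng.standard_normal((1, 12, 3)).astype(np.float32)
    R = rng.standard_normal((1, 12, 4)).astype(np.float32)
    Y, Y_h = gru_forward(X, W, R)

Under these assumptions ``Y_h`` equals the last time slice of ``Y``, that is ``Y[-1]``.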

**Type Constraints**

- **T**: Constrain input and output types to float tensors.
  Allowed types: tensor(double), tensor(float), tensor(float16).
- **T1**: Constrain ``sequence_lens`` to integer tensors.
  Allowed types: tensor(int32).

Differences with previous version (7)
-------------------------------------

**SchemaDiff**: ``GRU`` (domain ``'ai.onnx'``)

* old version: 7
* new version: 14
* breaking: no