GRU - version 3

This page documents version 3 of operator GRU. See GRU for the latest version (since version 22).

Computes a one-layer GRU. This operator is usually supported via a custom implementation such as CuDNN.

Notations:

X - input tensor
z - update gate
r - reset gate
h - hidden gate
t - time step (t-1 means previous time step)
W[zrh] - W parameter weight matrix for update, reset, and hidden gates
R[zrh] - R recurrence weight matrix for update, reset, and hidden gates
Wb[zrh] - W bias vectors for update, reset, and hidden gates
Rb[zrh] - R bias vectors for update, reset, and hidden gates
WB[zrh] - W parameter weight matrix for backward update, reset, and hidden gates
RB[zrh] - R recurrence weight matrix for backward update, reset, and hidden gates
WBb[zrh] - W bias vectors for backward update, reset, and hidden gates
RBb[zrh] - R bias vectors for backward update, reset, and hidden gates
H - Hidden state
num_directions - 2 if direction == bidirectional else 1

Activation functions:

NOTE: Below are optional

Equations (Default: f=Sigmoid, g=Tanh):

zt = f(Xt*(Wz^T) + Ht-1*(Rz^T) + Wbz + Rbz)
rt = f(Xt*(Wr^T) + Ht-1*(Rr^T) + Wbr + Rbr)
ht = g(Xt*(Wh^T) + (rt (.) Ht-1)*(Rh^T) + Rbh + Wbh) # default, when linear_before_reset = 0
ht = g(Xt*(Wh^T) + (rt (.) (Ht-1*(Rh^T) + Rbh)) + Wbh) # when linear_before_reset != 0
Ht = (1 - zt) (.) ht + zt (.) Ht-1
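
As an illustration of the equations above, here is a minimal NumPy sketch of a single time step for one direction, using the default activations (f=Sigmoid, g=Tanh). The function names `sigmoid` and `gru_step` are hypothetical helpers, not part of ONNX, and the gate order z, r, h inside each stacked matrix is an assumption read off the W[zrh] notation.

```python
import numpy as np

def sigmoid(x):
    # Default activation f for the update and reset gates.
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W, R, Wb, Rb, linear_before_reset=0):
    """One GRU time step for a single direction (illustrative only).

    x_t:    [batch_size, input_size]
    h_prev: [batch_size, hidden_size]
    W:      [3*hidden_size, input_size]   (assumed gate order: z, r, h)
    R:      [3*hidden_size, hidden_size]
    Wb, Rb: [3*hidden_size]
    """
    W_z, W_r, W_h = np.split(W, 3, axis=0)
    R_z, R_r, R_h = np.split(R, 3, axis=0)
    Wb_z, Wb_r, Wb_h = np.split(Wb, 3)
    Rb_z, Rb_r, Rb_h = np.split(Rb, 3)

    z = sigmoid(x_t @ W_z.T + h_prev @ R_z.T + Wb_z + Rb_z)
    r = sigmoid(x_t @ W_r.T + h_prev @ R_r.T + Wb_r + Rb_r)
    if linear_before_reset:
        # Reset gate applied after the recurrent linear transform.
        h_cand = np.tanh(x_t @ W_h.T + r * (h_prev @ R_h.T + Rb_h) + Wb_h)
    else:
        # Default: reset gate applied to Ht-1 before the linear transform.
        h_cand = np.tanh(x_t @ W_h.T + (r * h_prev) @ R_h.T + Rb_h + Wb_h)
    # Ht = (1 - zt) (.) ht + zt (.) Ht-1
    return (1.0 - z) * h_cand + z * h_prev
```

With all weights and biases set to zero, both gates evaluate to 0.5 and the candidate state to 0, so the step returns half of the previous hidden state, which makes a convenient sanity check.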

Inputs

X (T): The input sequences packed (and potentially padded) into one 3-D tensor with the shape of [seq_length, batch_size, input_size].
W (T): The weight tensor for the gates. Concatenation of W[zrh] and WB[zrh] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 3*hidden_size, input_size].
R (T): The recurrence weight tensor. Concatenation of R[zrh] and RB[zrh] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 3*hidden_size, hidden_size].
B (T): The bias tensor for the gates. Concatenation of [Wb[zrh], Rb[zrh]] and [WBb[zrh], RBb[zrh]] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 6*hidden_size]. Optional: If not specified, assumed to be 0.
sequence_lens (T1): Optional tensor specifying lengths of the sequences in a batch. If not specified, all sequences in the batch are assumed to have length seq_length. It has shape [batch_size].
initial_h (T): Optional initial value of the hidden state. If not specified, assumed to be 0. It has shape [num_directions, batch_size, hidden_size].
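
To make the input shapes concrete, the following sketch builds placeholder tensors for a bidirectional GRU and slices out the per-direction pieces. The sizes are arbitrary example values, the variable names ending in `_fwd` and `_bwd` are hypothetical, and the data is random rather than taken from any real model.

```python
import numpy as np

# Arbitrary example sizes (assumptions for illustration).
seq_length, batch_size, input_size, hidden_size = 5, 2, 4, 3
num_directions = 2  # direction == bidirectional

rng = np.random.default_rng(0)
X = rng.standard_normal((seq_length, batch_size, input_size)).astype(np.float32)
W = rng.standard_normal((num_directions, 3 * hidden_size, input_size)).astype(np.float32)
R = rng.standard_normal((num_directions, 3 * hidden_size, hidden_size)).astype(np.float32)
B = rng.standard_normal((num_directions, 6 * hidden_size)).astype(np.float32)
sequence_lens = np.full((batch_size,), seq_length, dtype=np.int32)
initial_h = np.zeros((num_directions, batch_size, hidden_size), dtype=np.float32)

# Index 0 along dimension 0 holds the forward parameters, index 1 the backward ones.
W_fwd, R_fwd = W[0], R[0]            # W[zrh], R[zrh]
W_bwd, R_bwd = W[1], R[1]            # WB[zrh], RB[zrh]

# Each row of B is [Wb[zrh], Rb[zrh]], so it splits into two halves.
Wb_fwd, Rb_fwd = np.split(B[0], 2)   # each of shape [3*hidden_size]
Wb_bwd, Rb_bwd = np.split(B[1], 2)   # WBb[zrh], RBb[zrh]
```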

Outputs

Y (T): A tensor that concatenates all the intermediate output values of the hidden state. It has shape [seq_length, num_directions, batch_size, hidden_size]. It is optional if output_sequence is 0.
Y_h (T): The last output value of the hidden state. It has shape [num_directions, batch_size, hidden_size].
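
The relationship between Y and Y_h can be sketched with a small NumPy reference loop. This is an unofficial illustration under several assumptions: default activations, linear_before_reset = 0, all sequences of full length (sequence_lens is ignored), gate order z, r, h, and index 1 along the direction axis treated as the backward pass with its outputs stored at the original time positions. The names `gru_direction` and `gru_forward` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_direction(X, W, R, B, h0):
    """Run one direction over X of shape [seq_length, batch_size, input_size]."""
    W_z, W_r, W_h = np.split(W, 3, axis=0)
    R_z, R_r, R_h = np.split(R, 3, axis=0)
    Wb_z, Wb_r, Wb_h, Rb_z, Rb_r, Rb_h = np.split(B, 6)
    h, outputs = h0, []
    for x_t in X:
        z = sigmoid(x_t @ W_z.T + h @ R_z.T + Wb_z + Rb_z)
        r = sigmoid(x_t @ W_r.T + h @ R_r.T + Wb_r + Rb_r)
        h_cand = np.tanh(x_t @ W_h.T + (r * h) @ R_h.T + Rb_h + Wb_h)
        h = (1.0 - z) * h_cand + z * h
        outputs.append(h)
    return np.stack(outputs), h

def gru_forward(X, W, R, B=None, initial_h=None):
    num_directions, hidden_size = W.shape[0], R.shape[-1]
    batch_size = X.shape[1]
    if B is None:  # optional input, assumed to be 0
        B = np.zeros((num_directions, 6 * hidden_size), dtype=X.dtype)
    if initial_h is None:  # optional input, assumed to be 0
        initial_h = np.zeros((num_directions, batch_size, hidden_size), dtype=X.dtype)
    Ys, Y_hs = [], []
    for d in range(num_directions):
        backward = d == 1  # assumption: second slice is the backward pass
        Y_d, h_last = gru_direction(X[::-1] if backward else X,
                                    W[d], R[d], B[d], initial_h[d])
        Ys.append(Y_d[::-1] if backward else Y_d)
        Y_hs.append(h_last)
    Y = np.stack(Ys, axis=1)      # [seq_length, num_directions, batch_size, hidden_size]
    Y_h = np.stack(Y_hs, axis=0)  # [num_directions, batch_size, hidden_size]
    return Y, Y_h

# Usage with arbitrary example sizes.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 2, 3)).astype(np.float32)
W = rng.standard_normal((2, 15, 3)).astype(np.float32)
R = rng.standard_normal((2, 15, 5)).astype(np.float32)
Y, Y_h = gru_forward(X, W, R)
print(Y.shape, Y_h.shape)  # (4, 2, 2, 5) (2, 2, 5)
```

Under these assumptions, the forward entry of Y_h equals the last time step of Y for direction 0, and the backward entry equals the first time step of Y for direction 1.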

Type Constraints

T: Constrain input and output types to float tensors. Allowed types: tensor(double), tensor(float), tensor(float16).
T1: Constrain sequence_lens to an integer tensor. Allowed types: tensor(int32).

Differences with previous version (1)

SchemaDiff: GRU (domain 'ai.onnx')