GRU - version 7
This page documents version 7 of operator GRU. See GRU for the latest version (since version 22).
Domain: ai.onnx
Since version: 7
Computes a one-layer GRU. This operator is usually supported via a custom implementation such as cuDNN.
Notations:
X - input tensor
z - update gate
r - reset gate
h - hidden gate
t - time step (t-1 means previous time step)
W[zrh] - W parameter weight matrix for update, reset, and hidden gates
R[zrh] - R recurrence weight matrix for update, reset, and hidden gates
Wb[zrh] - W bias vectors for update, reset, and hidden gates
Rb[zrh] - R bias vectors for update, reset, and hidden gates
WB[zrh] - W parameter weight matrix for backward update, reset, and hidden gates
RB[zrh] - R recurrence weight matrix for backward update, reset, and hidden gates
WBb[zrh] - W bias vectors for backward update, reset, and hidden gates
RBb[zrh] - R bias vectors for backward update, reset, and hidden gates
H - Hidden state
num_directions - 2 if direction == bidirectional else 1
Activation functions:
Relu(x) - max(0, x)
Tanh(x) - (1 - e^{-2x})/(1 + e^{-2x})
Sigmoid(x) - 1/(1 + e^{-x})
NOTE: The activation functions below are optional.
Affine(x) - alpha * x + beta
LeakyRelu(x) - x if x >= 0 else alpha * x
ThresholdedRelu(x) - x if x >= alpha else 0
ScaledTanh(x) - alpha * Tanh(beta * x)
HardSigmoid(x) - min(max(alpha * x + beta, 0), 1)
Elu(x) - x if x >= 0 else alpha * (e^x - 1)
Softsign(x) - x/(1 + |x|)
Softplus(x) - log(1 + e^x)
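The activation formulas above can be written out directly as scalar Python functions. This is only an illustrative sketch: the function names are hypothetical, and alpha and beta are passed explicitly because this section does not state their default values.

```python
import math

# Required activations, transcribed from the formulas above.
def relu(x):
    return max(0.0, x)

def tanh(x):
    # Literal transcription of the formula; math.tanh(x) is the numerically
    # safer choice for large |x|.
    return (1.0 - math.exp(-2.0 * x)) / (1.0 + math.exp(-2.0 * x))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Optional activations. alpha and beta have no defaults here on purpose:
# their default values are not given in this section.
def affine(x, alpha, beta):
    return alpha * x + beta

def leaky_relu(x, alpha):
    return x if x >= 0 else alpha * x

def thresholded_relu(x, alpha):
    return x if x >= alpha else 0.0

def scaled_tanh(x, alpha, beta):
    return alpha * tanh(beta * x)

def hard_sigmoid(x, alpha, beta):
    return min(max(alpha * x + beta, 0.0), 1.0)

def elu(x, alpha):
    return x if x >= 0 else alpha * (math.exp(x) - 1.0)

def softsign(x):
    return x / (1.0 + abs(x))

def softplus(x):
    return math.log(1.0 + math.exp(x))
```

The Tanh formula is algebraically the standard hyperbolic tangent, so `tanh(x)` should agree with `math.tanh(x)` up to rounding for moderate inputs.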
Equations (Default: f=Sigmoid, g=Tanh):
zt = f(Xt*(Wz^T) + Ht-1*(Rz^T) + Wbz + Rbz)
rt = f(Xt*(Wr^T) + Ht-1*(Rr^T) + Wbr + Rbr)
ht = g(Xt*(Wh^T) + (rt (.) Ht-1)*(Rh^T) + Rbh + Wbh) # default, when linear_before_reset = 0
ht = g(Xt*(Wh^T) + (rt (.) (Ht-1*(Rh^T) + Rbh)) + Wbh) # when linear_before_reset != 0
Ht = (1 - zt) (.) ht + zt (.) Ht-1
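To make the equations concrete, here is a minimal pure-Python sketch of one time step for the forward direction and a single batch element, using the default activations (f=Sigmoid, g=Tanh). The names `gru_step` and `matvec` are hypothetical helpers, not part of ONNX. The sketch assumes the z, r, h blocks are stacked in that order along the 3*hidden_size axis of W and R, and that B is laid out as [Wbz, Wbr, Wbh, Rbz, Rbr, Rbh].

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(M, v):
    # Computes v * M^T for a row vector v, i.e. one dot product per row of M.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def gru_step(x_t, h_prev, W, R, hidden_size, B=None, linear_before_reset=0):
    """One GRU step. W: [3*hs][input_size], R: [3*hs][hs], B: [6*hs] or None."""
    hs = hidden_size
    if B is None:
        B = [0.0] * (6 * hs)  # unspecified bias is assumed to be 0
    # Split the stacked parameters into per-gate blocks (assumed order: z, r, h).
    Wz, Wr, Wh = W[0:hs], W[hs:2 * hs], W[2 * hs:3 * hs]
    Rz, Rr, Rh = R[0:hs], R[hs:2 * hs], R[2 * hs:3 * hs]
    Wbz, Wbr, Wbh = B[0:hs], B[hs:2 * hs], B[2 * hs:3 * hs]
    Rbz, Rbr, Rbh = B[3 * hs:4 * hs], B[4 * hs:5 * hs], B[5 * hs:6 * hs]

    xz, xr, xh = matvec(Wz, x_t), matvec(Wr, x_t), matvec(Wh, x_t)
    hz, hr = matvec(Rz, h_prev), matvec(Rr, h_prev)

    # zt and rt: update and reset gates.
    z = [sigmoid(a + b + c + d) for a, b, c, d in zip(xz, hz, Wbz, Rbz)]
    r = [sigmoid(a + b + c + d) for a, b, c, d in zip(xr, hr, Wbr, Rbr)]

    if linear_before_reset:
        # ht = g(Xt*(Wh^T) + (rt (.) (Ht-1*(Rh^T) + Rbh)) + Wbh)
        rec = matvec(Rh, h_prev)
        h_cand = [math.tanh(a + ri * (b + rb) + wb)
                  for a, ri, b, rb, wb in zip(xh, r, rec, Rbh, Wbh)]
    else:
        # ht = g(Xt*(Wh^T) + (rt (.) Ht-1)*(Rh^T) + Rbh + Wbh)
        rec = matvec(Rh, [ri * hi for ri, hi in zip(r, h_prev)])
        h_cand = [math.tanh(a + b + rb + wb)
                  for a, b, rb, wb in zip(xh, rec, Rbh, Wbh)]

    # Ht = (1 - zt) (.) ht + zt (.) Ht-1
    return [(1.0 - zi) * hc + zi * hi for zi, hc, hi in zip(z, h_cand, h_prev)]
```

With all weights and biases set to zero, both gates evaluate to 0.5 and the candidate to 0, so each step simply halves the previous hidden state. Calling `gru_step` once per time step and feeding each result back in as `h_prev` produces the sequence of hidden states.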
Inputs
X (T): The input sequences packed (and potentially padded) into one 3-D tensor with the shape of [seq_length, batch_size, input_size].
W (T): The weight tensor for the gates. Concatenation of W[zrh] and WB[zrh] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 3*hidden_size, input_size].
R (T): The recurrence weight tensor. Concatenation of R[zrh] and RB[zrh] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 3*hidden_size, hidden_size].
B (T): The bias tensor for the gates. Concatenation of [Wb[zrh], Rb[zrh]] and [WBb[zrh], RBb[zrh]] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 6*hidden_size]. Optional: if not specified, it is assumed to be 0.
sequence_lens (T1): Optional tensor specifying lengths of the sequences in a batch. If not specified, all sequences in the batch are assumed to have length seq_length. It has shape [batch_size].
initial_h (T): Optional initial value of the hidden state. If not specified, it is assumed to be 0. It has shape [num_directions, batch_size, hidden_size].
Outputs
Y (T): A tensor that concatenates all the intermediate output values of the hidden state. It has shape [seq_length, num_directions, batch_size, hidden_size].
Y_h (T): The last output value of the hidden state. It has shape [num_directions, batch_size, hidden_size].
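As a quick cross-check of the shapes listed under Inputs and Outputs, the sketch below computes them from the sequence and layer sizes. `gru_v7_shapes` is a hypothetical helper written for illustration; it only restates the shapes given in this page and is not part of any ONNX API.

```python
def gru_v7_shapes(seq_length, batch_size, input_size, hidden_size,
                  direction="forward"):
    """Return the documented input/output shapes as a dict of lists."""
    # num_directions - 2 if direction == bidirectional else 1
    num_directions = 2 if direction == "bidirectional" else 1
    return {
        "X": [seq_length, batch_size, input_size],
        "W": [num_directions, 3 * hidden_size, input_size],
        "R": [num_directions, 3 * hidden_size, hidden_size],
        "B": [num_directions, 6 * hidden_size],
        "sequence_lens": [batch_size],
        "initial_h": [num_directions, batch_size, hidden_size],
        "Y": [seq_length, num_directions, batch_size, hidden_size],
        "Y_h": [num_directions, batch_size, hidden_size],
    }

shapes = gru_v7_shapes(seq_length=5, batch_size=2, input_size=4,
                       hidden_size=3, direction="bidirectional")
for name, shape in shapes.items():
    print(name, shape)
```

Note that Y carries num_directions as its second axis, whereas Y_h and initial_h carry it as the first.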
Type Constraints
T: Constrain input and output types to float tensors. Allowed types: tensor(double), tensor(float), tensor(float16).
T1: Constrain seq_lens to integer tensor. Allowed types: tensor(int32).
Differences with previous version (3)
SchemaDiff: GRU (domain 'ai.onnx')
old version: 3
new version: 7
breaking: no