GRU

GRU - 14

Version

  • name: GRU (GitHub)

  • domain: main

  • since_version: 14

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 14.

Summary

Computes a one-layer GRU. This operator is usually supported via some custom implementation such as CuDNN.

Notations:

X - input tensor

z - update gate

r - reset gate

h - hidden gate

t - time step (t-1 means previous time step)

W[zrh] - W parameter weight matrix for update, reset, and hidden gates

R[zrh] - R recurrence weight matrix for update, reset, and hidden gates

Wb[zrh] - W bias vectors for update, reset, and hidden gates

Rb[zrh] - R bias vectors for update, reset, and hidden gates

WB[zrh] - W parameter weight matrix for backward update, reset, and hidden gates

RB[zrh] - R recurrence weight matrix for backward update, reset, and hidden gates

WBb[zrh] - W bias vectors for backward update, reset, and hidden gates

RBb[zrh] - R bias vectors for backward update, reset, and hidden gates

H - Hidden state

num_directions - 2 if direction == bidirectional else 1

Activation functions:

Relu(x) - max(0, x)

Tanh(x) - (1 - e^{-2x})/(1 + e^{-2x})

Sigmoid(x) - 1/(1 + e^{-x})

(NOTE: Below are optional)

Affine(x) - alpha*x + beta

LeakyRelu(x) - x if x >= 0 else alpha * x

ThresholdedRelu(x) - x if x >= alpha else 0

ScaledTanh(x) - alpha*Tanh(beta*x)

HardSigmoid(x) - min(max(alpha*x + beta, 0), 1)

Elu(x) - x if x >= 0 else alpha*(e^x - 1)

Softsign(x) - x/(1 + |x|)

Softplus(x) - log(1 + e^x)

Equations (Default: f=Sigmoid, g=Tanh):

  • zt = f(Xt*(Wz^T) + Ht-1*(Rz^T) + Wbz + Rbz)

  • rt = f(Xt*(Wr^T) + Ht-1*(Rr^T) + Wbr + Rbr)

  • ht = g(Xt*(Wh^T) + (rt (.) Ht-1)*(Rh^T) + Rbh + Wbh) # default, when linear_before_reset = 0

  • ht = g(Xt*(Wh^T) + (rt (.) (Ht-1*(Rh^T) + Rbh)) + Wbh) # when linear_before_reset != 0

  • Ht = (1 - zt) (.) ht + zt (.) Ht-1
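
To make the default equations concrete, here is a minimal, illustrative NumPy sketch of a single forward time step for one direction (the function name gru_step and its argument layout are assumptions for illustration, not part of the ONNX API):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W, R, Wb, Rb, linear_before_reset=0):
    # x_t: [batch_size, input_size], h_prev: [batch_size, hidden_size]
    # W: [3*hidden_size, input_size], R: [3*hidden_size, hidden_size], gate order [z, r, h]
    # Wb, Rb: [3*hidden_size]
    Wz, Wr, Wh = np.split(W, 3, axis=0)
    Rz, Rr, Rh = np.split(R, 3, axis=0)
    Wbz, Wbr, Wbh = np.split(Wb, 3)
    Rbz, Rbr, Rbh = np.split(Rb, 3)
    z = sigmoid(x_t @ Wz.T + h_prev @ Rz.T + Wbz + Rbz)
    r = sigmoid(x_t @ Wr.T + h_prev @ Rr.T + Wbr + Rbr)
    if linear_before_reset:
        h_tilde = np.tanh(x_t @ Wh.T + r * (h_prev @ Rh.T + Rbh) + Wbh)
    else:
        h_tilde = np.tanh(x_t @ Wh.T + (r * h_prev) @ Rh.T + Rbh + Wbh)
    return (1 - z) * h_tilde + z * h_prev  # Ht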

This operator has optional inputs/outputs. See the ONNX IR documentation (https://github.com/onnx/onnx/blob/master/docs/IR.md) for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument’s name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.

Attributes

  • activation_alpha: Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as for the corresponding ONNX operators. For example, with LeakyRelu the default alpha is 0.01.

  • activation_beta: Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as for the corresponding ONNX operators.

  • activations: A list of 2 (or 4 if bidirectional) activation functions for update, reset, and hidden gates. The activation functions must be one of the activation functions specified above. Optional: See the equations for default if not specified.

  • clip: Cell clip threshold. Clipping bounds the elements of a tensor in the range of [-threshold, +threshold] and is applied to the input of activations. No clip if not specified.

  • direction: Specify if the RNN is forward, reverse, or bidirectional. Must be one of forward (default), reverse, or bidirectional. Default value is 'forward'.

  • hidden_size: Number of neurons in the hidden layer

  • layout: The shape format of inputs X, initial_h and outputs Y, Y_h. If 0, the following shapes are expected: X.shape = [seq_length, batch_size, input_size], Y.shape = [seq_length, num_directions, batch_size, hidden_size], initial_h.shape = Y_h.shape = [num_directions, batch_size, hidden_size]. If 1, the following shapes are expected: X.shape = [batch_size, seq_length, input_size], Y.shape = [batch_size, seq_length, num_directions, hidden_size], initial_h.shape = Y_h.shape = [batch_size, num_directions, hidden_size]. Default value is 0. (See the shape sketch after this list.)

  • linear_before_reset: When computing the output of the hidden gate, apply the linear transformation before multiplying by the output of the reset gate. Default value is 0.
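
The following short NumPy sketch illustrates how the two layout settings relate. The shapes match the layout description above, and the transposes simply reorder the same data (variable names are illustrative only):

import numpy as np

seq_length, batch_size, input_size = 4, 2, 3
num_directions, hidden_size = 1, 5

# layout = 0 (default): time-major tensors.
X0 = np.zeros((seq_length, batch_size, input_size), dtype=np.float32)
Y0 = np.zeros((seq_length, num_directions, batch_size, hidden_size), dtype=np.float32)

# layout = 1: batch-major tensors holding the same data.
X1 = X0.transpose(1, 0, 2)      # [batch_size, seq_length, input_size]
Y1 = Y0.transpose(2, 0, 1, 3)   # [batch_size, seq_length, num_directions, hidden_size]

assert X1.shape == (batch_size, seq_length, input_size)
assert Y1.shape == (batch_size, seq_length, num_directions, hidden_size)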

Inputs

Between 3 and 6 inputs.

  • X (heterogeneous) - T: The input sequences packed (and potentially padded) into one 3-D tensor with the shape of [seq_length, batch_size, input_size].

  • W (heterogeneous) - T: The weight tensor for the gates. Concatenation of W[zrh] and WB[zrh] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 3*hidden_size, input_size].

  • R (heterogeneous) - T: The recurrence weight tensor. Concatenation of R[zrh] and RB[zrh] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 3*hidden_size, hidden_size].

  • B (optional, heterogeneous) - T: The bias tensor for the gates. Concatenation of [Wb[zrh], Rb[zrh]] and [WBb[zrh], RBb[zrh]] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 6*hidden_size]. Optional: If not specified - assumed to be 0

  • sequence_lens (optional, heterogeneous) - T1: Optional tensor specifying lengths of the sequences in a batch. If not specified - assumed all sequences in the batch to have length seq_length. It has shape [batch_size].

  • initial_h (optional, heterogeneous) - T: Optional initial value of the hidden. If not specified - assumed to be 0. It has shape [num_directions, batch_size, hidden_size].

Outputs

Between 0 and 2 outputs.

  • Y (optional, heterogeneous) - T: A tensor that concats all the intermediate output values of the hidden. It has shape [seq_length, num_directions, batch_size, hidden_size].

  • Y_h (optional, heterogeneous) - T: The last output value of the hidden. It has shape [num_directions, batch_size, hidden_size].

Type Constraints

  • T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.

  • T1 in ( tensor(int32) ): Constrain seq_lens to integer tensor.

Examples

defaults

import numpy as np
import onnx

# GRU_Helper and expect are helpers from the ONNX node test suite.
input = np.array([[[1., 2.], [3., 4.], [5., 6.]]]).astype(np.float32)

input_size = 2
hidden_size = 5
weight_scale = 0.1
number_of_gates = 3

node = onnx.helper.make_node(
    'GRU',
    inputs=['X', 'W', 'R'],
    outputs=['', 'Y_h'],
    hidden_size=hidden_size
)

W = weight_scale * np.ones((1, number_of_gates * hidden_size, input_size)).astype(np.float32)
R = weight_scale * np.ones((1, number_of_gates * hidden_size, hidden_size)).astype(np.float32)

gru = GRU_Helper(X=input, W=W, R=R)
_, Y_h = gru.step()
expect(node, inputs=[input, W, R], outputs=[Y_h.astype(np.float32)], name='test_gru_defaults')

initial_bias

input = np.array([[[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]]]).astype(np.float32)

input_size = 3
hidden_size = 3
weight_scale = 0.1
custom_bias = 0.1
number_of_gates = 3

node = onnx.helper.make_node(
    'GRU',
    inputs=['X', 'W', 'R', 'B'],
    outputs=['', 'Y_h'],
    hidden_size=hidden_size
)

W = weight_scale * np.ones((1, number_of_gates * hidden_size, input_size)).astype(np.float32)
R = weight_scale * np.ones((1, number_of_gates * hidden_size, hidden_size)).astype(np.float32)

# Adding custom bias
W_B = custom_bias * np.ones((1, number_of_gates * hidden_size)).astype(np.float32)
R_B = np.zeros((1, number_of_gates * hidden_size)).astype(np.float32)
B = np.concatenate((W_B, R_B), axis=1)

gru = GRU_Helper(X=input, W=W, R=R, B=B)
_, Y_h = gru.step()
expect(node, inputs=[input, W, R, B], outputs=[Y_h.astype(np.float32)], name='test_gru_with_initial_bias')

seq_length

input = np.array([[[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]],
                  [[10., 11., 12.], [13., 14., 15.], [16., 17., 18.]]]).astype(np.float32)

input_size = 3
hidden_size = 5
number_of_gates = 3

node = onnx.helper.make_node(
    'GRU',
    inputs=['X', 'W', 'R', 'B'],
    outputs=['', 'Y_h'],
    hidden_size=hidden_size
)

W = np.random.randn(1, number_of_gates * hidden_size, input_size).astype(np.float32)
R = np.random.randn(1, number_of_gates * hidden_size, hidden_size).astype(np.float32)

# Adding custom bias
W_B = np.random.randn(1, number_of_gates * hidden_size).astype(np.float32)
R_B = np.random.randn(1, number_of_gates * hidden_size).astype(np.float32)
B = np.concatenate((W_B, R_B), axis=1)

gru = GRU_Helper(X=input, W=W, R=R, B=B)
_, Y_h = gru.step()
expect(node, inputs=[input, W, R, B], outputs=[Y_h.astype(np.float32)], name='test_gru_seq_length')

batchwise

input = np.array([[[1., 2.]], [[3., 4.]], [[5., 6.]]]).astype(np.float32)

input_size = 2
hidden_size = 6
number_of_gates = 3
weight_scale = 0.2
layout = 1

node = onnx.helper.make_node(
    'GRU',
    inputs=['X', 'W', 'R'],
    outputs=['Y', 'Y_h'],
    hidden_size=hidden_size,
    layout=layout
)

W = weight_scale * np.ones((1, number_of_gates * hidden_size, input_size)).astype(np.float32)
R = weight_scale * np.ones((1, number_of_gates * hidden_size, hidden_size)).astype(np.float32)

gru = GRU_Helper(X=input, W=W, R=R, layout=layout)
Y, Y_h = gru.step()
expect(node, inputs=[input, W, R], outputs=[Y.astype(np.float32), Y_h.astype(np.float32)], name='test_gru_batchwise')
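
As noted above, GRU_Helper and expect are ONNX node-test helpers that are not defined on this page. Purely as an illustration of what the unidirectional, layout=0 case computes, a rough reference loop might look like the sketch below, which reuses the gru_step sketch from the Summary; the name gru_reference and its defaults are assumptions, not the ONNX test implementation (sequence_lens and bidirectional execution are omitted):

import numpy as np

def gru_reference(X, W, R, B=None, initial_h=None, linear_before_reset=0):
    # X: [seq_length, batch_size, input_size]
    # W: [1, 3*hidden_size, input_size], R: [1, 3*hidden_size, hidden_size]
    # B: [1, 6*hidden_size] or None (treated as zero)
    # initial_h: [1, batch_size, hidden_size] or None (treated as zero)
    seq_length, batch_size, _ = X.shape
    hidden_size = R.shape[-1]
    if B is None:
        B = np.zeros((1, 6 * hidden_size), dtype=X.dtype)
    Wb, Rb = np.split(B[0], 2)
    h = np.zeros((batch_size, hidden_size), dtype=X.dtype) if initial_h is None else initial_h[0]
    Y = np.zeros((seq_length, 1, batch_size, hidden_size), dtype=X.dtype)
    for t in range(seq_length):
        # gru_step: the single-time-step sketch shown under the Summary equations.
        h = gru_step(X[t], h, W[0], R[0], Wb, Rb, linear_before_reset)
        Y[t, 0] = h
    return Y, h[np.newaxis]  # Y: all hidden states, Y_h: last hidden state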

Differences

Compared with GRU-7:

  • The layout attribute is added. It allows a batch-major layout ([batch_size, seq_length, input_size] for X, [batch_size, seq_length, num_directions, hidden_size] for Y, and [batch_size, num_directions, hidden_size] for initial_h and Y_h) in addition to the default time-major layout.

All other notations, equations, attributes, inputs, outputs, and type constraints are unchanged.

GRU - 7

Version

  • name: GRU (GitHub)

  • domain: main

  • since_version: 7

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 7.

Summary

Computes a one-layer GRU. This operator is usually supported via some custom implementation such as CuDNN.

Notations:

X - input tensor

z - update gate

r - reset gate

h - hidden gate

t - time step (t-1 means previous time step)

W[zrh] - W parameter weight matrix for update, reset, and hidden gates

R[zrh] - R recurrence weight matrix for update, reset, and hidden gates

Wb[zrh] - W bias vectors for update, reset, and hidden gates

Rb[zrh] - R bias vectors for update, reset, and hidden gates

WB[zrh] - W parameter weight matrix for backward update, reset, and hidden gates

RB[zrh] - R recurrence weight matrix for backward update, reset, and hidden gates

WBb[zrh] - W bias vectors for backward update, reset, and hidden gates

RBb[zrh] - R bias vectors for backward update, reset, and hidden gates

H - Hidden state

num_directions - 2 if direction == bidirectional else 1

Activation functions:

Relu(x) - max(0, x)

Tanh(x) - (1 - e^{-2x})/(1 + e^{-2x})

Sigmoid(x) - 1/(1 + e^{-x})

(NOTE: Below are optional)

Affine(x) - alpha*x + beta

LeakyRelu(x) - x if x >= 0 else alpha * x

ThresholdedRelu(x) - x if x >= alpha else 0

ScaledTanh(x) - alpha*Tanh(beta*x)

HardSigmoid(x) - min(max(alpha*x + beta, 0), 1)

Elu(x) - x if x >= 0 else alpha*(e^x - 1)

Softsign(x) - x/(1 + |x|)

Softplus(x) - log(1 + e^x)

Equations (Default: f=Sigmoid, g=Tanh):

  • zt = f(Xt*(Wz^T) + Ht-1*(Rz^T) + Wbz + Rbz)

  • rt = f(Xt*(Wr^T) + Ht-1*(Rr^T) + Wbr + Rbr)

  • ht = g(Xt*(Wh^T) + (rt (.) Ht-1)*(Rh^T) + Rbh + Wbh) # default, when linear_before_reset = 0

  • ht = g(Xt*(Wh^T) + (rt (.) (Ht-1*(Rh^T) + Rbh)) + Wbh) # when linear_before_reset != 0

  • Ht = (1 - zt) (.) ht + zt (.) Ht-1

This operator has optional inputs/outputs. See the ONNX IR documentation (https://github.com/onnx/onnx/blob/master/docs/IR.md) for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument’s name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.

Attributes

  • activation_alpha: Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as for the corresponding ONNX operators. For example, with LeakyRelu the default alpha is 0.01.

  • activation_beta: Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as for the corresponding ONNX operators.

  • activations: A list of 2 (or 4 if bidirectional) activation functions for update, reset, and hidden gates. The activation functions must be one of the activation functions specified above. Optional: See the equations for default if not specified.

  • clip: Cell clip threshold. Clipping bounds the elements of a tensor in the range of [-threshold, +threshold] and is applied to the input of activations. No clip if not specified.

  • direction: Specify if the RNN is forward, reverse, or bidirectional. Must be one of forward (default), reverse, or bidirectional. Default value is 'forward'.

  • hidden_size: Number of neurons in the hidden layer

  • linear_before_reset: When computing the output of the hidden gate, apply the linear transformation before multiplying by the output of the reset gate. Default value is 0.

Inputs

Between 3 and 6 inputs.

  • X (heterogeneous) - T: The input sequences packed (and potentially padded) into one 3-D tensor with the shape of [seq_length, batch_size, input_size].

  • W (heterogeneous) - T: The weight tensor for the gates. Concatenation of W[zrh] and WB[zrh] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 3*hidden_size, input_size].

  • R (heterogeneous) - T: The recurrence weight tensor. Concatenation of R[zrh] and RB[zrh] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 3*hidden_size, hidden_size].

  • B (optional, heterogeneous) - T: The bias tensor for the gates. Concatenation of [Wb[zrh], Rb[zrh]] and [WBb[zrh], RBb[zrh]] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 6*hidden_size]. Optional: If not specified - assumed to be 0

  • sequence_lens (optional, heterogeneous) - T1: Optional tensor specifying lengths of the sequences in a batch. If not specified - assumed all sequences in the batch to have length seq_length. It has shape [batch_size].

  • initial_h (optional, heterogeneous) - T: Optional initial value of the hidden. If not specified - assumed to be 0. It has shape [num_directions, batch_size, hidden_size].

Outputs

Between 0 and 2 outputs.

  • Y (optional, heterogeneous) - T: A tensor that concats all the intermediate output values of the hidden. It has shape [seq_length, num_directions, batch_size, hidden_size].

  • Y_h (optional, heterogeneous) - T: The last output value of the hidden. It has shape [num_directions, batch_size, hidden_size].

Type Constraints

  • T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.

  • T1 in ( tensor(int32) ): Constrain seq_lens to integer tensor.

Differences

Compared with GRU-3:

  • The recurrence terms in the equations now use the transposed recurrence weights, e.g. Ht-1*(Rz^T) instead of Ht-1*Rz (and likewise for Rr and Rh).

  • The paragraph describing optional inputs/outputs (empty-string placeholders and omitted trailing optional arguments) is added.

  • The output_sequence attribute is removed, and the note that Y "is optional if output_sequence is 0" is dropped from the Y output description.

All other attributes, inputs, outputs, and type constraints are unchanged.

GRU - 3

Version

  • name: GRU (GitHub)

  • domain: main

  • since_version: 3

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 3.

Summary

Computes a one-layer GRU. This operator is usually supported via some custom implementation such as CuDNN.

Notations:

X - input tensor

z - update gate

r - reset gate

h - hidden gate

t - time step (t-1 means previous time step)

W[zrh] - W parameter weight matrix for update, reset, and hidden gates

R[zrh] - R recurrence weight matrix for update, reset, and hidden gates

Wb[zrh] - W bias vectors for update, reset, and hidden gates

Rb[zrh] - R bias vectors for update, reset, and hidden gates

WB[zrh] - W parameter weight matrix for backward update, reset, and hidden gates

RB[zrh] - R recurrence weight matrix for backward update, reset, and hidden gates

WBb[zrh] - W bias vectors for backward update, reset, and hidden gates

RBb[zrh] - R bias vectors for backward update, reset, and hidden gates

H - Hidden state

num_directions - 2 if direction == bidirectional else 1

Activation functions:

Relu(x) - max(0, x)

Tanh(x) - (1 - e^{-2x})/(1 + e^{-2x})

Sigmoid(x) - 1/(1 + e^{-x})

(NOTE: Below are optional)

Affine(x) - alpha*x + beta

LeakyRelu(x) - x if x >= 0 else alpha * x

ThresholdedRelu(x) - x if x >= alpha else 0

ScaledTanh(x) - alpha*Tanh(beta*x)

HardSigmoid(x) - min(max(alpha*x + beta, 0), 1)

Elu(x) - x if x >= 0 else alpha*(e^x - 1)

Softsign(x) - x/(1 + |x|)

Softplus(x) - log(1 + e^x)

Equations (Default: f=Sigmoid, g=Tanh):

  • zt = f(Xt*(Wz^T) + Ht-1*Rz + Wbz + Rbz)

  • rt = f(Xt*(Wr^T) + Ht-1*Rr + Wbr + Rbr)

  • ht = g(Xt*(Wh^T) + (rt (.) Ht-1)*Rh + Rbh + Wbh) # default, when linear_before_reset = 0

  • ht = g(Xt*(Wh^T) + (rt (.) (Ht-1*Rh + Rbh)) + Wbh) # when linear_before_reset != 0

  • Ht = (1 - zt) (.) ht + zt (.) Ht-1

Attributes

  • activation_alpha: Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as for the corresponding ONNX operators. For example, with LeakyRelu the default alpha is 0.01.

  • activation_beta: Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as for the corresponding ONNX operators.

  • activations: A list of 2 (or 4 if bidirectional) activation functions for update, reset, and hidden gates. The activation functions must be one of the activation functions specified above. Optional: See the equations for default if not specified.

  • clip: Cell clip threshold. Clipping bounds the elements of a tensor in the range of [-threshold, +threshold] and is applied to the input of activations. No clip if not specified.

  • direction: Specify if the RNN is forward, reverse, or bidirectional. Must be one of forward (default), reverse, or bidirectional. Default value is 'forward'.

  • hidden_size: Number of neurons in the hidden layer

  • linear_before_reset: When computing the output of the hidden gate, apply the linear transformation before multiplying by the output of the reset gate. Default value is 0.

  • output_sequence: The sequence output Y for the hidden is optional if 0. Default value is 0.

Inputs

Between 3 and 6 inputs.

  • X (heterogeneous) - T: The input sequences packed (and potentially padded) into one 3-D tensor with the shape of [seq_length, batch_size, input_size].

  • W (heterogeneous) - T: The weight tensor for the gates. Concatenation of W[zrh] and WB[zrh] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 3*hidden_size, input_size].

  • R (heterogeneous) - T: The recurrence weight tensor. Concatenation of R[zrh] and RB[zrh] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 3*hidden_size, hidden_size].

  • B (optional, heterogeneous) - T: The bias tensor for the gates. Concatenation of [Wb[zrh], Rb[zrh]] and [WBb[zrh], RBb[zrh]] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 6*hidden_size]. Optional: If not specified - assumed to be 0

  • sequence_lens (optional, heterogeneous) - T1: Optional tensor specifying lengths of the sequences in a batch. If not specified - assumed all sequences in the batch to have length seq_length. It has shape [batch_size].

  • initial_h (optional, heterogeneous) - T: Optional initial value of the hidden. If not specified - assumed to be 0. It has shape [num_directions, batch_size, hidden_size].

Outputs

Between 0 and 2 outputs.

  • Y (optional, heterogeneous) - T: A tensor that concats all the intermediate output values of the hidden. It has shape [seq_length, num_directions, batch_size, hidden_size]. It is optional if output_sequence is 0.

  • Y_h (optional, heterogeneous) - T: The last output value of the hidden. It has shape [num_directions, batch_size, hidden_size].

Type Constraints

  • T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.

  • T1 in ( tensor(int32) ): Constrain seq_lens to integer tensor.

Differences

Compared with GRU-1:

  • The activation_alpha and activation_beta descriptions now state that default values are the same as for the corresponding ONNX operators (for example, with LeakyRelu the default alpha is 0.01).

  • The linear_before_reset attribute is added.

  • The default value of the direction attribute is corrected from 'foward' to 'forward'.

  • The Outputs section now states "Between 0 and 2 outputs.", and Y_h becomes optional instead of required.

All other notations, equations, inputs, and type constraints are unchanged.

GRU - 1

Version

  • name: GRU (GitHub)

  • domain: main

  • since_version: 1

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: False

This version of the operator has been available since version 1.

Summary

Computes a one-layer GRU. This operator is usually supported via some custom implementation such as CuDNN.

Notations:

X - input tensor

z - update gate

r - reset gate

h - hidden gate

t - time step (t-1 means previous time step)

W[zrh] - W parameter weight matrix for update, reset, and hidden gates

R[zrh] - R recurrence weight matrix for update, reset, and hidden gates

Wb[zrh] - W bias vectors for update, reset, and hidden gates

Rb[zrh] - R bias vectors for update, reset, and hidden gates

WB[zrh] - W parameter weight matrix for backward update, reset, and hidden gates

RB[zrh] - R recurrence weight matrix for backward update, reset, and hidden gates

WBb[zrh] - W bias vectors for backward update, reset, and hidden gates

RBb[zrh] - R bias vectors for backward update, reset, and hidden gates

H - Hidden state

num_directions - 2 if direction == bidirectional else 1

Activation functions:

Relu(x) - max(0, x)

Tanh(x) - (1 - e^{-2x})/(1 + e^{-2x})

Sigmoid(x) - 1/(1 + e^{-x})

(NOTE: Below are optional)

Affine(x) - alpha*x + beta

LeakyRelu(x) - x if x >= 0 else alpha * x

ThresholdedRelu(x) - x if x >= alpha else 0

ScaledTanh(x) - alpha*Tanh(beta*x)

HardSigmoid(x) - min(max(alpha*x + beta, 0), 1)

Elu(x) - x if x >= 0 else alpha*(e^x - 1)

Softsign(x) - x/(1 + |x|)

Softplus(x) - log(1 + e^x)

Equations (Default: f=Sigmoid, g=Tanh):

  • zt = f(Xt*(Wz^T) + Ht-1*Rz + Wbz + Rbz)

  • rt = f(Xt*(Wr^T) + Ht-1*Rr + Wbr + Rbr)

  • ht = g(Xt*(Wh^T) + (rt (.) Ht-1)*Rh + Rbh + Wbh) # default, when linear_before_reset = 0

  • ht = g(Xt*(Wh^T) + (rt (.) (Ht-1*Rh + Rbh)) + Wbh) # when linear_before_reset != 0

  • Ht = (1 - zt) (.) ht + zt (.) Ht-1

Attributes

  • activation_alpha: Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM.

  • activation_beta: Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM.

  • activations: A list of 2 (or 4 if bidirectional) activation functions for update, reset, and hidden gates. The activation functions must be one of the activation functions specified above. Optional: See the equations for default if not specified.

  • clip: Cell clip threshold. Clipping bounds the elements of a tensor in the range of [-threshold, +threshold] and is applied to the input of activations. No clip if not specified.

  • direction: Specify if the RNN is forward, reverse, or bidirectional. Must be one of forward (default), reverse, or bidirectional. Default value is 'foward'.

  • hidden_size: Number of neurons in the hidden layer

  • output_sequence: The sequence output Y for the hidden is optional if 0. Default value is 0.

Inputs

Between 3 and 6 inputs.

  • X (heterogeneous) - T: The input sequences packed (and potentially padded) into one 3-D tensor with the shape of [seq_length, batch_size, input_size].

  • W (heterogeneous) - T: The weight tensor for the gates. Concatenation of W[zrh] and WB[zrh] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 3*hidden_size, input_size].

  • R (heterogeneous) - T: The recurrence weight tensor. Concatenation of R[zrh] and RB[zrh] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 3*hidden_size, hidden_size].

  • B (optional, heterogeneous) - T: The bias tensor for the gates. Concatenation of [Wb[zrh], Rb[zrh]] and [WBb[zrh], RBb[zrh]] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 6*hidden_size]. Optional: If not specified - assumed to be 0

  • sequence_lens (optional, heterogeneous) - T1: Optional tensor specifying lengths of the sequences in a batch. If not specified - assumed all sequences in the batch to have length seq_length. It has shape [batch_size].

  • initial_h (optional, heterogeneous) - T: Optional initial value of the hidden. If not specified - assumed to be 0. It has shape [num_directions, batch_size, hidden_size].

Outputs

  • Y (optional, heterogeneous) - T: A tensor that concats all the intermediate output values of the hidden. It has shape [seq_length, num_directions, batch_size, hidden_size]. It is optional if output_sequence is 0.

  • Y_h (heterogeneous) - T: The last output value of the hidden. It has shape [num_directions, batch_size, hidden_size].

Type Constraints

  • T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.

  • T1 in ( tensor(int32) ): Constrain seq_lens to integer tensor.