GRU - 1 vs 14

The next section compares an older version of the operator to a newer one, after both definitions are converted into markdown text. Green means an addition to the newer version, red means a deletion, and anything else is unchanged.

Files changed (1)
  1. GRU1 → GRU14 +10 -27
GRU1 → GRU14 RENAMED
@@ -1 +1 @@
  Computes an one-layer GRU. This operator is usually supported via some custom
  implementation such as CuDNN.
  Notations:
  X - input tensor
  z - update gate
  r - reset gate
  h - hidden gate
  t - time step (t-1 means previous time step)
  W[zrh] - W parameter weight matrix for update, reset, and hidden gates
  R[zrh] - R recurrence weight matrix for update, reset, and hidden gates
  Wb[zrh] - W bias vectors for update, reset, and hidden gates
  Rb[zrh] - R bias vectors for update, reset, and hidden gates
  WB[zrh] - W parameter weight matrix for backward update, reset, and hidden gates
  RB[zrh] - R recurrence weight matrix for backward update, reset, and hidden gates
  WBb[zrh] - W bias vectors for backward update, reset, and hidden gates
  RBb[zrh] - R bias vectors for backward update, reset, and hidden gates
  H - Hidden state
  num_directions - 2 if direction == bidirectional else 1
  Activation functions:
  Relu(x) - max(0, x)
  Tanh(x) - (1 - e^{-2x})/(1 + e^{-2x})
  Sigmoid(x) - 1/(1 + e^{-x})
  (NOTE: Below are optional)
  Affine(x) - alpha*x + beta
  LeakyRelu(x) - x if x >= 0 else alpha * x
  ThresholdedRelu(x) - x if x >= alpha else 0
  ScaledTanh(x) - alpha*Tanh(beta*x)
  HardSigmoid(x) - min(max(alpha*x + beta, 0), 1)
  Elu(x) - x if x >= 0 else alpha*(e^x - 1)
  Softsign(x) - x/(1 + |x|)
  Softplus(x) - log(1 + e^x)
  Equations (Default: f=Sigmoid, g=Tanh):
- - zt = f(Xt*(Wz^T) + Ht-1*(Rz^T) + Wbz + Rbz)
+ - zt = f(Xt*(Wz^T) + Ht-1*Rz + Wbz + Rbz)
- - rt = f(Xt*(Wr^T) + Ht-1*(Rr^T) + Wbr + Rbr)
+ - rt = f(Xt*(Wr^T) + Ht-1*Rr + Wbr + Rbr)
- - ht = g(Xt*(Wh^T) + (rt (.) Ht-1)*(Rh^T) + Rbh + Wbh) # default, when linear_before_reset = 0
+ - ht = g(Xt*(Wh^T) + (rt (.) Ht-1)*Rh + Rbh + Wbh) # default, when linear_before_reset = 0
- - ht = g(Xt*(Wh^T) + (rt (.) (Ht-1*(Rh^T) + Rbh)) + Wbh) # when linear_before_reset != 0
+ - ht = g(Xt*(Wh^T) + (rt (.) (Ht-1*Rh + Rbh) + Wbh) # when linear_before_reset != 0
  - Ht = (1 - zt) (.) ht + zt (.) Ht-1
- This operator has **optional** inputs/outputs. See ONNX <https://github.com/onnx/onnx/blob/master/docs/IR.md>_ for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument's name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.
  **Attributes**
  * **activation_alpha**:
  Optional scaling values used by some activation functions. The
  values are consumed in the order of activation functions, for
+ example (f, g, h) in LSTM.
- example (f, g, h) in LSTM. Default values are the same as of
- corresponding ONNX operators.For example with LeakyRelu, the default
- alpha is 0.01.
  * **activation_beta**:
  Optional scaling values used by some activation functions. The
  values are consumed in the order of activation functions, for
+ example (f, g, h) in LSTM.
- example (f, g, h) in LSTM. Default values are the same as of
- corresponding ONNX operators.
  * **activations**:
  A list of 2 (or 4 if bidirectional) activation functions for update,
  reset, and hidden gates. The activation functions must be one of the
  activation functions specified above. Optional: See the equations
  for default if not specified.
  * **clip**:
  Cell clip threshold. Clipping bounds the elements of a tensor in the
  range of [-threshold, +threshold] and is applied to the input of
  activations. No clip if not specified.
  * **direction**:
  Specify if the RNN is forward, reverse, or bidirectional. Must be
  one of forward (default), reverse, or bidirectional.
  * **hidden_size**:
  Number of neurons in the hidden layer
+ * **output_sequence**:
+ The sequence output for the hidden is optional if 0. Default 0.
- * **layout**:
- The shape format of inputs X, initial_h and outputs Y, Y_h. If 0,
- the following shapes are expected: X.shape = [seq_length,
- batch_size, input_size], Y.shape = [seq_length, num_directions,
- batch_size, hidden_size], initial_h.shape = Y_h.shape =
- [num_directions, batch_size, hidden_size]. If 1, the following
- shapes are expected: X.shape = [batch_size, seq_length, input_size],
- Y.shape = [batch_size, seq_length, num_directions, hidden_size],
- initial_h.shape = Y_h.shape = [batch_size, num_directions,
- hidden_size].
- * **linear_before_reset**:
- When computing the output of the hidden gate, apply the linear
- transformation before multiplying by the output of the reset gate.
  **Inputs**
  Between 3 and 6 inputs.
  * **X** (heterogeneous) - **T**:
  The input sequences packed (and potentially padded) into one 3-D
  tensor with the shape of [seq_length, batch_size, input_size].
  * **W** (heterogeneous) - **T**:
  The weight tensor for the gates. Concatenation of W[zrh] and
  WB[zrh] (if bidirectional) along dimension 0. This tensor has
  shape [num_directions, 3*hidden_size, input_size].
  * **R** (heterogeneous) - **T**:
  The recurrence weight tensor. Concatenation of R[zrh] and
  RB[zrh] (if bidirectional) along dimension 0. This tensor has
  shape [num_directions, 3*hidden_size, hidden_size].
  * **B** (optional, heterogeneous) - **T**:
  The bias tensor for the gates. Concatenation of [Wb[zrh], Rb[zrh]]
  and [WBb[zrh], RBb[zrh]] (if bidirectional) along dimension 0.
  This tensor has shape [num_directions, 6*hidden_size]. Optional:
  If not specified - assumed to be 0
  * **sequence_lens** (optional, heterogeneous) - **T1**:
  Optional tensor specifying lengths of the sequences in a batch. If
  not specified - assumed all sequences in the batch to have length
  seq_length. It has shape [batch_size].
  * **initial_h** (optional, heterogeneous) - **T**:
  Optional initial value of the hidden. If not specified - assumed to
  be 0. It has shape [num_directions, batch_size, hidden_size].
  **Outputs**
- Between 0 and 2 outputs.
-
  * **Y** (optional, heterogeneous) - **T**:
  A tensor that concats all the intermediate output values of the
  hidden. It has shape [seq_length, num_directions, batch_size,
- hidden_size].
+ hidden_size]. It is optional if output_sequence is 0.
- * **Y_h** (optional, heterogeneous) - **T**:
+ * **Y_h** (heterogeneous) - **T**:
  The last output value of the hidden. It has shape [num_directions,
  batch_size, hidden_size].
  **Type Constraints**
  * **T** in (
  tensor(double),
  tensor(float),
  tensor(float16)
  ):
  Constrain input and output types to float tensors.
  * **T1** in (
  tensor(int32)
  ):
  Constrain seq_lens to integer tensor.
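
To make the equations and tensor shapes above concrete, here is a minimal NumPy sketch of a single forward-direction GRU pass with the default activations (f = Sigmoid, g = Tanh) and the default hidden-gate equation (linear_before_reset = 0). It uses the transposed recurrence form Ht-1*(R^T); the untransposed form shown in the added lines differs only in how R is stored. For simplicity the leading num_directions dimension is dropped, and the helper name gru_forward is illustrative rather than part of either operator version.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_forward(X, W, R, B=None, initial_h=None):
    """One forward-direction GRU pass (sketch, num_directions dropped).

    X: [seq_length, batch_size, input_size]
    W: [3*hidden_size, input_size]   (z, r, h stacked along dimension 0)
    R: [3*hidden_size, hidden_size]  (z, r, h stacked along dimension 0)
    B: [6*hidden_size]               (Wb[zrh] then Rb[zrh]); zeros if None
    initial_h: [batch_size, hidden_size]; zeros if None
    """
    seq_length, batch_size, _ = X.shape
    hidden_size = R.shape[-1]
    if B is None:
        B = np.zeros(6 * hidden_size)
    # Split the stacked per-gate parameters in z, r, h order.
    Wz, Wr, Wh = np.split(W, 3, axis=0)
    Rz, Rr, Rh = np.split(R, 3, axis=0)
    Wbz, Wbr, Wbh, Rbz, Rbr, Rbh = np.split(B, 6)
    H = np.zeros((batch_size, hidden_size)) if initial_h is None else initial_h
    Y = np.empty((seq_length, batch_size, hidden_size))
    for t in range(seq_length):
        x = X[t]
        z = sigmoid(x @ Wz.T + H @ Rz.T + Wbz + Rbz)        # update gate
        r = sigmoid(x @ Wr.T + H @ Rr.T + Wbr + Rbr)        # reset gate
        h = np.tanh(x @ Wh.T + (r * H) @ Rh.T + Wbh + Rbh)  # hidden gate
        H = (1.0 - z) * h + z * H                           # Ht
        Y[t] = H
    return Y, H  # Y: all intermediate hidden states, H: last hidden state (Y_h)
```

For example, `gru_forward(np.random.randn(5, 2, 4), np.random.randn(3 * 8, 4), np.random.randn(3 * 8, 8))` returns Y with shape (5, 2, 8); adding back the num_directions axis gives the documented Y shape [seq_length, num_directions, batch_size, hidden_size].

A node using this operator can be assembled with the standard ONNX helper, as in the sketch below; the tensor names are placeholders, and an empty string skips the optional sequence_lens input.

```python
from onnx import helper

# Placeholder tensor names; this builds only the node, not a full graph.
gru_node = helper.make_node(
    "GRU",
    inputs=["X", "W", "R", "B", "", "initial_h"],  # "" skips sequence_lens
    outputs=["Y", "Y_h"],
    hidden_size=8,
    direction="forward",
    activations=["Sigmoid", "Tanh"],
)
```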