GRU - 1 vs 3

Files changed (1)
  1. GRU1 → GRU3 +11 -3
GRU1 → GRU3 RENAMED
@@ -1 +1 @@
  Computes a one-layer GRU. This operator is usually supported via some custom
  implementation such as CuDNN.
  Notations:
  X - input tensor
  z - update gate
  r - reset gate
  h - hidden gate
  t - time step (t-1 means previous time step)
  W[zrh] - W parameter weight matrix for update, reset, and hidden gates
  R[zrh] - R recurrence weight matrix for update, reset, and hidden gates
  Wb[zrh] - W bias vectors for update, reset, and hidden gates
  Rb[zrh] - R bias vectors for update, reset, and hidden gates
  WB[zrh] - W parameter weight matrix for backward update, reset, and hidden gates
  RB[zrh] - R recurrence weight matrix for backward update, reset, and hidden gates
  WBb[zrh] - W bias vectors for backward update, reset, and hidden gates
  RBb[zrh] - R bias vectors for backward update, reset, and hidden gates
  H - Hidden state
  num_directions - 2 if direction == bidirectional else 1
  Activation functions:
  Relu(x) - max(0, x)
  Tanh(x) - (1 - e^{-2x})/(1 + e^{-2x})
  Sigmoid(x) - 1/(1 + e^{-x})
  (NOTE: Below are optional)
  Affine(x) - alpha*x + beta
  LeakyRelu(x) - x if x >= 0 else alpha * x
  ThresholdedRelu(x) - x if x >= alpha else 0
  ScaledTanh(x) - alpha*Tanh(beta*x)
  HardSigmoid(x) - min(max(alpha*x + beta, 0), 1)
  Elu(x) - x if x >= 0 else alpha*(e^x - 1)
  Softsign(x) - x/(1 + |x|)
  Softplus(x) - log(1 + e^x)
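For reference, the listed activations map directly onto elementwise NumPy expressions. The sketch below is illustrative only; the `alpha`/`beta` arguments correspond to the `activation_alpha`/`activation_beta` attributes, and the default values shown (other than LeakyRelu's 0.01, which this document states) are assumptions taken from the corresponding standalone ONNX operators, not from this spec.

```python
import numpy as np

# Illustrative NumPy versions of the activation functions listed above.
# Defaults other than LeakyRelu's 0.01 are assumptions, not part of this spec.
ACTIVATIONS = {
    "Relu":            lambda x, alpha=None, beta=None: np.maximum(0.0, x),
    "Tanh":            lambda x, alpha=None, beta=None: np.tanh(x),
    "Sigmoid":         lambda x, alpha=None, beta=None: 1.0 / (1.0 + np.exp(-x)),
    "Affine":          lambda x, alpha=1.0, beta=0.0: alpha * x + beta,
    "LeakyRelu":       lambda x, alpha=0.01, beta=None: np.where(x >= 0, x, alpha * x),
    "ThresholdedRelu": lambda x, alpha=1.0, beta=None: np.where(x >= alpha, x, 0.0),
    "ScaledTanh":      lambda x, alpha=1.0, beta=1.0: alpha * np.tanh(beta * x),
    "HardSigmoid":     lambda x, alpha=0.2, beta=0.5: np.clip(alpha * x + beta, 0.0, 1.0),
    "Elu":             lambda x, alpha=1.0, beta=None: np.where(x >= 0, x, alpha * (np.exp(x) - 1.0)),
    "Softsign":        lambda x, alpha=None, beta=None: x / (1.0 + np.abs(x)),
    "Softplus":        lambda x, alpha=None, beta=None: np.log(1.0 + np.exp(x)),
}
```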
  Equations (Default: f=Sigmoid, g=Tanh):
  - zt = f(Xt*(Wz^T) + Ht-1*Rz + Wbz + Rbz)
  - rt = f(Xt*(Wr^T) + Ht-1*Rr + Wbr + Rbr)
  - ht = g(Xt*(Wh^T) + (rt (.) Ht-1)*Rh + Rbh + Wbh) # default, when linear_before_reset = 0
  - ht = g(Xt*(Wh^T) + rt (.) (Ht-1*Rh + Rbh) + Wbh) # when linear_before_reset != 0
  - Ht = (1 - zt) (.) ht + zt (.) Ht-1
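As a concrete reading of these equations, here is a minimal single-direction, single-time-step sketch in NumPy. It assumes the z, r, h gate blocks are stacked along the first axis of W, R, and the biases, and applies the R blocks transposed, mirroring W; it is an illustration of the equations, not an official reference implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(Xt, Ht_1, W, R, Wb, Rb, linear_before_reset=0, f=sigmoid, g=np.tanh):
    """One forward GRU time step for a single direction (illustrative only).

    Assumed shapes: Xt [batch, input_size], Ht_1 [batch, hidden_size],
    W [3*hidden_size, input_size], R [3*hidden_size, hidden_size],
    Wb and Rb [3*hidden_size], with the z, r, h blocks stacked along axis 0.
    """
    H = Ht_1.shape[1]
    Wz, Wr, Wh = W[:H], W[H:2 * H], W[2 * H:]
    Rz, Rr, Rh = R[:H], R[H:2 * H], R[2 * H:]
    Wbz, Wbr, Wbh = Wb[:H], Wb[H:2 * H], Wb[2 * H:]
    Rbz, Rbr, Rbh = Rb[:H], Rb[H:2 * H], Rb[2 * H:]

    zt = f(Xt @ Wz.T + Ht_1 @ Rz.T + Wbz + Rbz)              # update gate
    rt = f(Xt @ Wr.T + Ht_1 @ Rr.T + Wbr + Rbr)              # reset gate
    if linear_before_reset:
        ht = g(Xt @ Wh.T + rt * (Ht_1 @ Rh.T + Rbh) + Wbh)   # second ht equation
    else:
        ht = g(Xt @ Wh.T + (rt * Ht_1) @ Rh.T + Rbh + Wbh)   # default ht equation
    return (1.0 - zt) * ht + zt * Ht_1                       # Ht
```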
  **Attributes**
  * **activation_alpha**:
  Optional scaling values used by some activation functions. The
  values are consumed in the order of activation functions, for
- example (f, g, h) in LSTM.
+ example (f, g, h) in LSTM. Default values are the same as those of the
+ corresponding ONNX operators. For example, with LeakyRelu the default
+ alpha is 0.01.
  * **activation_beta**:
  Optional scaling values used by some activation functions. The
  values are consumed in the order of activation functions, for
- example (f, g, h) in LSTM.
+ example (f, g, h) in LSTM. Default values are the same as those of the
+ corresponding ONNX operators.
  * **activations**:
  A list of 2 (or 4 if bidirectional) activation functions for update,
  reset, and hidden gates. The activation functions must be one of the
  activation functions specified above. Optional: See the equations
  for default if not specified.
  * **clip**:
  Cell clip threshold. Clipping bounds the elements of a tensor in the
  range of [-threshold, +threshold] and is applied to the input of
  activations. No clip if not specified.
  * **direction**:
  Specify if the RNN is forward, reverse, or bidirectional. Must be
  one of forward (default), reverse, or bidirectional.
  * **hidden_size**:
  Number of neurons in the hidden layer
+ * **linear_before_reset**:
+ When computing the output of the hidden gate, apply the linear
+ transformation before multiplying by the output of the reset gate.
  * **output_sequence**:
  The sequence output for the hidden is optional if 0. Default 0.
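A hedged sketch of how these attributes are typically set when constructing the node with the ONNX Python helpers; the attribute values chosen here (activations, clip, and so on) are arbitrary illustrations, not defaults.

```python
from onnx import helper

# Arbitrary attribute values chosen to illustrate the fields above.
gru_node = helper.make_node(
    "GRU",
    inputs=["X", "W", "R", "B", "sequence_lens", "initial_h"],
    outputs=["Y", "Y_h"],
    hidden_size=3,
    direction="bidirectional",
    activations=["Sigmoid", "Tanh", "Sigmoid", "Tanh"],  # 4 entries when bidirectional
    clip=5.0,                      # pre-activation values bounded to [-5, 5]
    linear_before_reset=1,         # use the second ht equation above
)
```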
  **Inputs**
  Between 3 and 6 inputs.
  * **X** (heterogeneous) - **T**:
  The input sequences packed (and potentially padded) into one 3-D
  tensor with the shape of [seq_length, batch_size, input_size].
  * **W** (heterogeneous) - **T**:
  The weight tensor for the gates. Concatenation of W[zrh] and
  WB[zrh] (if bidirectional) along dimension 0. This tensor has
  shape [num_directions, 3*hidden_size, input_size].
  * **R** (heterogeneous) - **T**:
  The recurrence weight tensor. Concatenation of R[zrh] and
  RB[zrh] (if bidirectional) along dimension 0. This tensor has
  shape [num_directions, 3*hidden_size, hidden_size].
  * **B** (optional, heterogeneous) - **T**:
  The bias tensor for the gates. Concatenation of [Wb[zrh], Rb[zrh]]
  and [WBb[zrh], RBb[zrh]] (if bidirectional) along dimension 0.
  This tensor has shape [num_directions, 6*hidden_size]. Optional:
  If not specified - assumed to be 0
  * **sequence_lens** (optional, heterogeneous) - **T1**:
  Optional tensor specifying lengths of the sequences in a batch. If
  not specified - assumed all sequences in the batch to have length
  seq_length. It has shape [batch_size].
  * **initial_h** (optional, heterogeneous) - **T**:
  Optional initial value of the hidden. If not specified - assumed to
  be 0. It has shape [num_directions, batch_size, hidden_size].
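To make the layouts above concrete, the following sketch builds dummy inputs with the documented shapes. The sizes are arbitrary and only the forward direction is shown; the gate matrices are stacked in z, r, h order along the gate axis.

```python
import numpy as np

# Arbitrary sizes, purely for illustration (forward direction only).
seq_length, batch_size, input_size, hidden_size = 5, 2, 4, 3
num_directions = 1

rng = np.random.default_rng(0)

X = rng.standard_normal((seq_length, batch_size, input_size)).astype(np.float32)

# Per-gate matrices in z, r, h order, concatenated along the gate axis,
# then given a leading num_directions axis to match the documented layouts.
W = np.concatenate(
    [rng.standard_normal((hidden_size, input_size)) for _ in "zrh"], axis=0
).astype(np.float32)[np.newaxis]            # [num_directions, 3*hidden_size, input_size]
R = np.concatenate(
    [rng.standard_normal((hidden_size, hidden_size)) for _ in "zrh"], axis=0
).astype(np.float32)[np.newaxis]            # [num_directions, 3*hidden_size, hidden_size]
B = np.zeros((num_directions, 6 * hidden_size), dtype=np.float32)   # [Wb[zrh], Rb[zrh]]

sequence_lens = np.full((batch_size,), seq_length, dtype=np.int32)  # all full length
initial_h = np.zeros((num_directions, batch_size, hidden_size), dtype=np.float32)
```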
  **Outputs**
+ Between 0 and 2 outputs.
+
  * **Y** (optional, heterogeneous) - **T**:
  A tensor that concatenates all the intermediate output values of the
  hidden. It has shape [seq_length, num_directions, batch_size,
  hidden_size]. It is optional if output_sequence is 0.
- * **Y_h** (heterogeneous) - **T**:
+ * **Y_h** (optional, heterogeneous) - **T**:
  The last output value of the hidden. It has shape [num_directions,
  batch_size, hidden_size].
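For orientation, the two outputs are related: with a forward direction and no padding (every entry of sequence_lens equal to seq_length), Y_h is simply the last time step of Y. A small hedged check, assuming Y and Y_h are NumPy arrays with the shapes documented above:

```python
import numpy as np

# Assumes Y has shape [seq_length, num_directions, batch_size, hidden_size],
# Y_h has shape [num_directions, batch_size, hidden_size], direction == forward,
# and all sequences have full length seq_length.
def last_step_matches(Y: np.ndarray, Y_h: np.ndarray) -> bool:
    return np.allclose(Y[-1], Y_h)
```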
  **Type Constraints**
  * **T** in (
  tensor(double),
  tensor(float),
  tensor(float16)
  ):
  Constrain input and output types to float tensors.
  * **T1** in (
  tensor(int32)
  ):
  Constrain seq_lens to integer tensor.
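Putting the pieces together, here is a self-contained sketch that wires a GRU node into a minimal ONNX graph using the shapes and type constraints documented above, then validates it with the checker. It is illustrative only; the graph name, producer name, and sizes are arbitrary, and if you target a newer opset you should consult that opset's GRU definition (for example, output_sequence is absent there).

```python
from onnx import TensorProto, checker, helper

# Arbitrary sizes for illustration.
seq_length, batch_size, input_size, hidden_size, num_directions = 5, 2, 4, 3, 1

gru = helper.make_node(
    "GRU",
    inputs=["X", "W", "R", "B"],
    outputs=["Y", "Y_h"],
    hidden_size=hidden_size,
    direction="forward",
    linear_before_reset=1,
)

graph = helper.make_graph(
    [gru],
    "gru_example",
    inputs=[
        helper.make_tensor_value_info(
            "X", TensorProto.FLOAT, [seq_length, batch_size, input_size]),
        helper.make_tensor_value_info(
            "W", TensorProto.FLOAT, [num_directions, 3 * hidden_size, input_size]),
        helper.make_tensor_value_info(
            "R", TensorProto.FLOAT, [num_directions, 3 * hidden_size, hidden_size]),
        helper.make_tensor_value_info(
            "B", TensorProto.FLOAT, [num_directions, 6 * hidden_size]),
    ],
    outputs=[
        helper.make_tensor_value_info(
            "Y", TensorProto.FLOAT,
            [seq_length, num_directions, batch_size, hidden_size]),
        helper.make_tensor_value_info(
            "Y_h", TensorProto.FLOAT, [num_directions, batch_size, hidden_size]),
    ],
)

model = helper.make_model(graph, producer_name="gru-sketch")
checker.check_model(model)
```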