GRU - 1 vs 3

The next section compares an older version of the operator to a newer one after both definitions are converted into markdown text. Green means an addition to the newer version, red means a deletion. Anything else is unchanged.

Files changed (1)
  1. GRU1 → GRU3 +3 -11
GRU1 → GRU3 RENAMED
@@ -1 +1 @@
  Computes an one-layer GRU. This operator is usually supported via some custom
  implementation such as CuDNN.
  Notations:
  X - input tensor
  z - update gate
  r - reset gate
  h - hidden gate
  t - time step (t-1 means previous time step)
  W[zrh] - W parameter weight matrix for update, reset, and hidden gates
  R[zrh] - R recurrence weight matrix for update, reset, and hidden gates
  Wb[zrh] - W bias vectors for update, reset, and hidden gates
  Rb[zrh] - R bias vectors for update, reset, and hidden gates
  WB[zrh] - W parameter weight matrix for backward update, reset, and hidden gates
  RB[zrh] - R recurrence weight matrix for backward update, reset, and hidden gates
  WBb[zrh] - W bias vectors for backward update, reset, and hidden gates
  RBb[zrh] - R bias vectors for backward update, reset, and hidden gates
  H - Hidden state
  num_directions - 2 if direction == bidirectional else 1
  Activation functions:
  Relu(x) - max(0, x)
  Tanh(x) - (1 - e^{-2x})/(1 + e^{-2x})
  Sigmoid(x) - 1/(1 + e^{-x})
  (NOTE: Below are optional)
  Affine(x) - alpha*x + beta
  LeakyRelu(x) - x if x >= 0 else alpha * x
  ThresholdedRelu(x) - x if x >= alpha else 0
  ScaledTanh(x) - alpha*Tanh(beta*x)
  HardSigmoid(x) - min(max(alpha*x + beta, 0), 1)
  Elu(x) - x if x >= 0 else alpha*(e^x - 1)
  Softsign(x) - x/(1 + |x|)
  Softplus(x) - log(1 + e^x)
  Equations (Default: f=Sigmoid, g=Tanh):
  - zt = f(Xt*(Wz^T) + Ht-1*Rz + Wbz + Rbz)
  - rt = f(Xt*(Wr^T) + Ht-1*Rr + Wbr + Rbr)
  - ht = g(Xt*(Wh^T) + (rt (.) Ht-1)*Rh + Rbh + Wbh) # default, when linear_before_reset = 0
  - ht = g(Xt*(Wh^T) + (rt (.) (Ht-1*Rh + Rbh) + Wbh) # when linear_before_reset != 0
  - Ht = (1 - zt) (.) ht + zt (.) Ht-1
  **Attributes**
  * **activation_alpha**:
  Optional scaling values used by some activation functions. The
  values are consumed in the order of activation functions, for
+ example (f, g, h) in LSTM.
- example (f, g, h) in LSTM. Default values are the same as of
- corresponding ONNX operators.For example with LeakyRelu, the default
- alpha is 0.01.
  * **activation_beta**:
  Optional scaling values used by some activation functions. The
  values are consumed in the order of activation functions, for
+ example (f, g, h) in LSTM.
- example (f, g, h) in LSTM. Default values are the same as of
- corresponding ONNX operators.
  * **activations**:
  A list of 2 (or 4 if bidirectional) activation functions for update,
  reset, and hidden gates. The activation functions must be one of the
  activation functions specified above. Optional: See the equations
  for default if not specified.
  * **clip**:
  Cell clip threshold. Clipping bounds the elements of a tensor in the
  range of [-threshold, +threshold] and is applied to the input of
  activations. No clip if not specified.
  * **direction**:
  Specify if the RNN is forward, reverse, or bidirectional. Must be
  one of forward (default), reverse, or bidirectional.
  * **hidden_size**:
  Number of neurons in the hidden layer
- * **linear_before_reset**:
- When computing the output of the hidden gate, apply the linear
- transformation before multiplying by the output of the reset gate.
  * **output_sequence**:
  The sequence output for the hidden is optional if 0. Default 0.
  **Inputs**
  Between 3 and 6 inputs.
  * **X** (heterogeneous) - **T**:
  The input sequences packed (and potentially padded) into one 3-D
  tensor with the shape of [seq_length, batch_size, input_size].
  * **W** (heterogeneous) - **T**:
  The weight tensor for the gates. Concatenation of W[zrh] and
  WB[zrh] (if bidirectional) along dimension 0. This tensor has
  shape [num_directions, 3*hidden_size, input_size].
  * **R** (heterogeneous) - **T**:
  The recurrence weight tensor. Concatenation of R[zrh] and
  RB[zrh] (if bidirectional) along dimension 0. This tensor has
  shape [num_directions, 3*hidden_size, hidden_size].
  * **B** (optional, heterogeneous) - **T**:
  The bias tensor for the gates. Concatenation of [Wb[zrh], Rb[zrh]]
  and [WBb[zrh], RBb[zrh]] (if bidirectional) along dimension 0.
  This tensor has shape [num_directions, 6*hidden_size]. Optional:
  If not specified - assumed to be 0
  * **sequence_lens** (optional, heterogeneous) - **T1**:
  Optional tensor specifying lengths of the sequences in a batch. If
  not specified - assumed all sequences in the batch to have length
  seq_length. It has shape [batch_size].
  * **initial_h** (optional, heterogeneous) - **T**:
  Optional initial value of the hidden. If not specified - assumed to
  be 0. It has shape [num_directions, batch_size, hidden_size].
  **Outputs**
- Between 0 and 2 outputs.
-
  * **Y** (optional, heterogeneous) - **T**:
  A tensor that concats all the intermediate output values of the
  hidden. It has shape [seq_length, num_directions, batch_size,
  hidden_size]. It is optional if output_sequence is 0.
- * **Y_h** (optional, heterogeneous) - **T**:
+ * **Y_h** (heterogeneous) - **T**:
  The last output value of the hidden. It has shape [num_directions,
  batch_size, hidden_size].
  **Type Constraints**
  * **T** in (
  tensor(double),
  tensor(float),
  tensor(float16)
  ):
  Constrain input and output types to float tensors.
  * **T1** in (
  tensor(int32)
  ):
  Constrain seq_lens to integer tensor.
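
For readers who want to sanity-check the equations and tensor shapes quoted above, here is a minimal NumPy sketch of one forward direction with the default activations (f=Sigmoid, g=Tanh) and the linear_before_reset = 0 form of the hidden-gate equation. The sizes, random parameters, and zero bias are assumptions made for the example; this is an illustration, not the ONNX reference implementation.

```python
import numpy as np

# Assumed sizes for the illustration only.
seq_length, batch_size, input_size, hidden_size = 4, 2, 3, 5
num_directions = 1  # forward

rng = np.random.default_rng(0)
X = rng.standard_normal((seq_length, batch_size, input_size)).astype(np.float32)
W = rng.standard_normal((num_directions, 3 * hidden_size, input_size)).astype(np.float32)   # W[zrh]
R = rng.standard_normal((num_directions, 3 * hidden_size, hidden_size)).astype(np.float32)  # R[zrh]
B = np.zeros((num_directions, 6 * hidden_size), dtype=np.float32)  # [Wb[zrh], Rb[zrh]], spec default is 0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Split the stacked parameters into per-gate blocks in the documented order z, r, h.
Wz, Wr, Wh = np.split(W[0], 3, axis=0)
Rz, Rr, Rh = np.split(R[0], 3, axis=0)
Wbz, Wbr, Wbh, Rbz, Rbr, Rbh = np.split(B[0], 6)

H = np.zeros((batch_size, hidden_size), dtype=np.float32)  # initial_h defaults to 0
Y = []
for t in range(seq_length):
    Xt = X[t]                                              # [batch_size, input_size]
    z = sigmoid(Xt @ Wz.T + H @ Rz + Wbz + Rbz)            # zt
    r = sigmoid(Xt @ Wr.T + H @ Rr + Wbr + Rbr)            # rt
    h = np.tanh(Xt @ Wh.T + (r * H) @ Rh + Rbh + Wbh)      # ht, linear_before_reset = 0
    H = (1.0 - z) * h + z * H                              # Ht
    Y.append(H)

Y = np.stack(Y)[:, np.newaxis]   # [seq_length, num_directions, batch_size, hidden_size]
Y_h = H[np.newaxis]              # [num_directions, batch_size, hidden_size]
print(Y.shape, Y_h.shape)
```

Assuming the usual `onnx.helper` API, a node carrying this operator can be assembled as below. The tensor names and attribute values are placeholders chosen for the example, and the optional inputs B, sequence_lens, and initial_h are omitted.

```python
from onnx import helper

gru_node = helper.make_node(
    "GRU",
    inputs=["X", "W", "R"],   # optional B, sequence_lens, initial_h omitted
    outputs=["Y", "Y_h"],
    hidden_size=5,
    direction="forward",
)
print(gru_node)
```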