RNN - 1 vs 7

RNN1 → RNN7 (renamed): 3 lines added, 4 lines removed. Lines present only in RNN1 are prefixed with "-", lines added in RNN7 with "+".
Computes an one-layer simple RNN. This operator is usually supported
via some custom implementation such as CuDNN.

Notations:

X - input tensor
i - input gate
t - time step (t-1 means previous time step)
Wi - W parameter weight matrix for input gate
Ri - R recurrence weight matrix for input gate
Wbi - W parameter bias vector for input gate
Rbi - R parameter bias vector for input gate
WBi - W parameter weight matrix for backward input gate
RBi - R recurrence weight matrix for backward input gate
WBbi - WR bias vectors for backward input gate
RBbi - RR bias vectors for backward input gate
H - Hidden state
num_directions - 2 if direction == bidirectional else 1

Activation functions:

Relu(x) - max(0, x)
Tanh(x) - (1 - e^{-2x})/(1 + e^{-2x})
Sigmoid(x) - 1/(1 + e^{-x})

(NOTE: Below are optional)

Affine(x) - alpha*x + beta
LeakyRelu(x) - x if x >= 0 else alpha * x
ThresholdedRelu(x) - x if x >= alpha else 0
ScaledTanh(x) - alpha*Tanh(beta*x)
HardSigmoid(x) - min(max(alpha*x + beta, 0), 1)
Elu(x) - x if x >= 0 else alpha*(e^x - 1)
Softsign(x) - x/(1 + |x|)
Softplus(x) - log(1 + e^x)
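For readers who prefer code to formulas, the activation functions above translate directly into NumPy. This is a minimal sketch: the Python names are illustrative, and alpha and beta stand for the per-function values supplied through the activation_alpha and activation_beta attributes described below.

```python
import numpy as np

# Built-in activations
relu    = lambda x: np.maximum(0, x)
tanh    = np.tanh                                   # (1 - e^{-2x}) / (1 + e^{-2x})
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Optional activations; alpha/beta come from activation_alpha / activation_beta
affine           = lambda x, alpha, beta: alpha * x + beta
leaky_relu       = lambda x, alpha=0.01: np.where(x >= 0, x, alpha * x)
thresholded_relu = lambda x, alpha: np.where(x >= alpha, x, 0.0)
scaled_tanh      = lambda x, alpha, beta: alpha * np.tanh(beta * x)
hard_sigmoid     = lambda x, alpha, beta: np.clip(alpha * x + beta, 0.0, 1.0)
elu              = lambda x, alpha: np.where(x >= 0, x, alpha * (np.exp(x) - 1.0))
softsign         = lambda x: x / (1.0 + np.abs(x))
softplus         = lambda x: np.log(1.0 + np.exp(x))
```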
Equations (Default: f=Tanh):

- - Ht = f(Xt*(Wi^T) + Ht-1*Ri + Wbi + Rbi)
+ - Ht = f(Xt*(Wi^T) + Ht-1*(Ri^T) + Wbi + Rbi)

+ This operator has **optional** inputs/outputs. See ONNX <https://github.com/onnx/onnx/blob/master/docs/IR.md>_ for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument's name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.
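To make the updated recurrence concrete, here is a minimal NumPy sketch of a single forward direction (num_directions = 1) with the default f = Tanh and no clipping. The function name and shape handling are illustrative assumptions rather than part of the specification; the shapes follow the Inputs/Outputs sections below.

```python
import numpy as np

def simple_rnn_forward(X, W, R, B=None, initial_h=None):
    """One-layer, forward-direction RNN following
    Ht = Tanh(Xt*(Wi^T) + Ht-1*(Ri^T) + Wbi + Rbi).

    X: [seq_length, batch_size, input_size]
    W: [1, hidden_size, input_size]        (num_directions = 1)
    R: [1, hidden_size, hidden_size]
    B: [1, 2*hidden_size] = concat(Wbi, Rbi), or None (treated as 0)
    initial_h: [1, batch_size, hidden_size], or None (treated as 0)
    Returns Y [seq_length, 1, batch_size, hidden_size] and Y_h [1, batch_size, hidden_size].
    """
    seq_length, batch_size, _ = X.shape
    hidden_size = W.shape[1]

    Wi, Ri = W[0], R[0]                       # [hidden, input], [hidden, hidden]
    if B is None:
        B = np.zeros((1, 2 * hidden_size), dtype=X.dtype)
    Wbi, Rbi = B[0, :hidden_size], B[0, hidden_size:]

    Ht = (np.zeros((batch_size, hidden_size), dtype=X.dtype)
          if initial_h is None else initial_h[0])

    Y = np.empty((seq_length, 1, batch_size, hidden_size), dtype=X.dtype)
    for t in range(seq_length):
        # Ht = f(Xt*(Wi^T) + Ht-1*(Ri^T) + Wbi + Rbi), with f = Tanh
        Ht = np.tanh(X[t] @ Wi.T + Ht @ Ri.T + Wbi + Rbi)
        Y[t, 0] = Ht

    return Y, Ht[np.newaxis]                  # Y, Y_h
```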
**Attributes**

* **activation_alpha**:
  Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as of corresponding ONNX operators.For example with LeakyRelu, the default alpha is 0.01.
* **activation_beta**:
  Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as of corresponding ONNX operators.
* **activations**:
  One (or two if bidirectional) activation function for input gate. The activation function must be one of the activation functions specified above. Optional: Default Tanh if not specified.
* **clip**:
  Cell clip threshold. Clipping bounds the elements of a tensor in the range of [-threshold, +threshold] and is applied to the input of activations. No clip if not specified.
* **direction**:
  Specify if the RNN is forward, reverse, or bidirectional. Must be one of forward (default), reverse, or bidirectional.
* **hidden_size**:
  Number of neurons in the hidden layer
- * **output_sequence**:
-   The sequence output for the hidden is optional if 0. Default 0.
**Inputs**

Between 3 and 6 inputs.

* **X** (heterogeneous) - **T**:
  The input sequences packed (and potentially padded) into one 3-D tensor with the shape of [seq_length, batch_size, input_size].
* **W** (heterogeneous) - **T**:
  The weight tensor for input gate. Concatenation of Wi and WBi (if bidirectional). The tensor has shape [num_directions, hidden_size, input_size].
* **R** (heterogeneous) - **T**:
  The recurrence weight tensor. Concatenation of Ri and RBi (if bidirectional). The tensor has shape [num_directions, hidden_size, hidden_size].
* **B** (optional, heterogeneous) - **T**:
  The bias tensor for input gate. Concatenation of [Wbi, Rbi] and [WBbi, RBbi] (if bidirectional). The tensor has shape [num_directions, 2*hidden_size]. Optional: If not specified - assumed to be 0.
* **sequence_lens** (optional, heterogeneous) - **T1**:
  Optional tensor specifying lengths of the sequences in a batch. If not specified - assumed all sequences in the batch to have length seq_length. It has shape [batch_size].
* **initial_h** (optional, heterogeneous) - **T**:
  Optional initial value of the hidden. If not specified - assumed to be 0. It has shape [num_directions, batch_size, hidden_size].
**Outputs**

Between 0 and 2 outputs.

* **Y** (optional, heterogeneous) - **T**:
  A tensor that concats all the intermediate output values of the hidden. It has shape [seq_length, num_directions, batch_size, hidden_size]. (The final line of this item differs only in whitespace between RNN1 and RNN7.)
* **Y_h** (optional, heterogeneous) - **T**:
  The last output value of the hidden. It has shape [num_directions, batch_size, hidden_size].
**Type Constraints**

* **T** in ( tensor(double), tensor(float), tensor(float16) ):
  Constrain input and output types to float tensors.
* **T1** in ( tensor(int32) ):
  Constrain seq_lens to integer tensor.
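As a usage sketch, the snippet below wires up the attributes, inputs, and outputs described above with onnx.helper. The tensor names, the hidden_size/input_size values, and the decision to omit the optional B, sequence_lens, and initial_h inputs are illustrative assumptions, not requirements of the operator.

```python
from onnx import helper, TensorProto

hidden_size, input_size = 8, 4

# Trailing optional inputs (B, sequence_lens, initial_h) are simply omitted,
# as permitted by the optional-argument note above.
rnn_node = helper.make_node(
    "RNN",
    inputs=["X", "W", "R"],
    outputs=["Y", "Y_h"],
    hidden_size=hidden_size,
    direction="forward",
    activations=["Tanh"],          # default; one entry per direction
)

X = helper.make_tensor_value_info("X", TensorProto.FLOAT, ["seq_length", "batch_size", input_size])
W = helper.make_tensor_value_info("W", TensorProto.FLOAT, [1, hidden_size, input_size])
R = helper.make_tensor_value_info("R", TensorProto.FLOAT, [1, hidden_size, hidden_size])
Y = helper.make_tensor_value_info("Y", TensorProto.FLOAT, ["seq_length", 1, "batch_size", hidden_size])
Y_h = helper.make_tensor_value_info("Y_h", TensorProto.FLOAT, [1, "batch_size", hidden_size])

graph = helper.make_graph([rnn_node], "rnn_example", inputs=[X, W, R], outputs=[Y, Y_h])
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 7)])
```

Passing W and R as graph inputs keeps the example short; in a real model they would typically be initializers.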