# LSTM - 7 vs 14
The next section compares an older to a newer version of the same operator after both definitions are converted into markdown text. Green means an addition to the newer version, red means a deletion. Anything else is unchanged.
**LSTM7 → LSTM14** (renamed; +0 lines added, -11 lines removed)
```diff
@@ -1,137 +1,126 @@
 Computes a one-layer LSTM. This operator is usually supported via some
 custom implementation such as CuDNN.
 Notations:
 X - input tensor
 i - input gate
 o - output gate
 f - forget gate
 c - cell gate
 t - time step (t-1 means previous time step)
 W[iofc] - W parameter weight matrix for input, output, forget, and cell gates
 R[iofc] - R recurrence weight matrix for input, output, forget, and cell gates
 Wb[iofc] - W bias vectors for input, output, forget, and cell gates
 Rb[iofc] - R bias vectors for input, output, forget, and cell gates
 P[iof] - P peephole weight vector for input, output, and forget gates
 WB[iofc] - W parameter weight matrix for backward input, output, forget, and cell gates
 RB[iofc] - R recurrence weight matrix for backward input, output, forget, and cell gates
 WBb[iofc] - W bias vectors for backward input, output, forget, and cell gates
 RBb[iofc] - R bias vectors for backward input, output, forget, and cell gates
 PB[iof] - P peephole weight vector for backward input, output, and forget gates
 H - Hidden state
 num_directions - 2 if direction == bidirectional else 1
 Activation functions:
 Relu(x) - max(0, x)
 Tanh(x) - (1 - e^{-2x})/(1 + e^{-2x})
 Sigmoid(x) - 1/(1 + e^{-x})
 (NOTE: Below are optional)
 Affine(x) - alpha*x + beta
 LeakyRelu(x) - x if x >= 0 else alpha * x
 ThresholdedRelu(x) - x if x >= alpha else 0
 ScaledTanh(x) - alpha*Tanh(beta*x)
 HardSigmoid(x) - min(max(alpha*x + beta, 0), 1)
 Elu(x) - x if x >= 0 else alpha*(e^x - 1)
 Softsign(x) - x/(1 + |x|)
 Softplus(x) - log(1 + e^x)
 Equations (Default: f=Sigmoid, g=Tanh, h=Tanh):
 - it = f(Xt*(Wi^T) + Ht-1*(Ri^T) + Pi (.) Ct-1 + Wbi + Rbi)
 - ft = f(Xt*(Wf^T) + Ht-1*(Rf^T) + Pf (.) Ct-1 + Wbf + Rbf)
 - ct = g(Xt*(Wc^T) + Ht-1*(Rc^T) + Wbc + Rbc)
 - Ct = ft (.) Ct-1 + it (.) ct
 - ot = f(Xt*(Wo^T) + Ht-1*(Ro^T) + Po (.) Ct + Wbo + Rbo)
 - Ht = ot (.) h(Ct)
 This operator has **optional** inputs/outputs. See ONNX <https://github.com/onnx/onnx/blob/master/docs/IR.md>_ for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument's name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.
 **Attributes**
 * **activation_alpha**:
   Optional scaling values used by some activation functions. The
   values are consumed in the order of activation functions, for
   example (f, g, h) in LSTM. Default values are the same as for the
   corresponding ONNX operators. For example, with LeakyRelu the default
   alpha is 0.01.
 * **activation_beta**:
   Optional scaling values used by some activation functions. The
   values are consumed in the order of activation functions, for
   example (f, g, h) in LSTM. Default values are the same as for the
   corresponding ONNX operators.
 * **activations**:
   A list of 3 (or 6 if bidirectional) activation functions for input,
   output, forget, cell, and hidden. The activation functions must be
   one of the activation functions specified above. Optional: See the
   equations for default if not specified.
 * **clip**:
   Cell clip threshold. Clipping bounds the elements of a tensor in the
   range of [-threshold, +threshold] and is applied to the input of
   activations. No clip if not specified.
 * **direction**:
   Specify if the RNN is forward, reverse, or bidirectional. Must be
   one of forward (default), reverse, or bidirectional.
 * **hidden_size**:
   Number of neurons in the hidden layer.
 * **input_forget**:
   Couple the input and forget gates if 1.
-* **layout**:
-  The shape format of inputs X, initial_h, initial_c and outputs Y,
-  Y_h, Y_c. If 0, the following shapes are expected: X.shape =
-  [seq_length, batch_size, input_size], Y.shape = [seq_length,
-  num_directions, batch_size, hidden_size], initial_h.shape =
-  Y_h.shape = initial_c.shape = Y_c.shape = [num_directions,
-  batch_size, hidden_size]. If 1, the following shapes are expected:
-  X.shape = [batch_size, seq_length, input_size], Y.shape =
-  [batch_size, seq_length, num_directions, hidden_size],
-  initial_h.shape = Y_h.shape = initial_c.shape = Y_c.shape =
-  [batch_size, num_directions, hidden_size].
 **Inputs**
 Between 3 and 8 inputs.
 * **X** (heterogeneous) - **T**:
   The input sequences packed (and potentially padded) into one 3-D
   tensor with the shape of [seq_length, batch_size, input_size].
 * **W** (heterogeneous) - **T**:
   The weight tensor for the gates. Concatenation of W[iofc] and
   WB[iofc] (if bidirectional) along dimension 0. The tensor has
   shape [num_directions, 4*hidden_size, input_size].
 * **R** (heterogeneous) - **T**:
   The recurrence weight tensor. Concatenation of R[iofc] and
   RB[iofc] (if bidirectional) along dimension 0. This tensor has
   shape [num_directions, 4*hidden_size, hidden_size].
 * **B** (optional, heterogeneous) - **T**:
   The bias tensor for input gate. Concatenation of [Wb[iofc],
   Rb[iofc]] and [WBb[iofc], RBb[iofc]] (if bidirectional) along
   dimension 0. This tensor has shape [num_directions,
   8*hidden_size]. Optional: If not specified - assumed to be 0.
 * **sequence_lens** (optional, heterogeneous) - **T1**:
   Optional tensor specifying lengths of the sequences in a batch. If
   not specified, all sequences in the batch are assumed to have length
   seq_length. It has shape [batch_size].
 * **initial_h** (optional, heterogeneous) - **T**:
   Optional initial value of the hidden. If not specified - assumed to
   be 0. It has shape [num_directions, batch_size, hidden_size].
 * **initial_c** (optional, heterogeneous) - **T**:
   Optional initial value of the cell. If not specified - assumed to be
   0. It has shape [num_directions, batch_size, hidden_size].
 * **P** (optional, heterogeneous) - **T**:
   The weight tensor for peepholes. Concatenation of P[iof] and
   PB[iof] (if bidirectional) along dimension 0. It has shape
   [num_directions, 3*hidden_size]. Optional: If not specified -
   assumed to be 0.
 **Outputs**
 Between 0 and 3 outputs.
 * **Y** (optional, heterogeneous) - **T**:
   A tensor that concats all the intermediate output values of the
   hidden. It has shape [seq_length, num_directions, batch_size,
   hidden_size].
 * **Y_h** (optional, heterogeneous) - **T**:
   The last output value of the hidden. It has shape [num_directions,
   batch_size, hidden_size].
 * **Y_c** (optional, heterogeneous) - **T**:
   The last output value of the cell. It has shape [num_directions,
   batch_size, hidden_size].
 **Type Constraints**
 * **T** in (
   tensor(double),
   tensor(float),
   tensor(float16)
   ):
   Constrain input and output types to float tensors.
 * **T1** in (
   tensor(int32)
   ):
   Constrain seq_lens to integer tensor.
```
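
The only difference the diff surfaces between the two versions is the `layout` attribute, which opset 14 introduced; the rest of the definition is identical. To make the equations above concrete, here is a minimal NumPy sketch of a single forward LSTM step for one direction with the default activations (f=Sigmoid, g=Tanh, h=Tanh). It is an illustration of the formulas, not a normative implementation, and all names in it are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, R, Wb, Rb, P):
    """One forward LSTM step for a single direction.

    Shapes (single direction, num_directions squeezed out):
      x_t:            [batch_size, input_size]
      h_prev, c_prev: [batch_size, hidden_size]
      W:  [4*hidden_size, input_size]    # W[iofc] stacked along dim 0
      R:  [4*hidden_size, hidden_size]   # R[iofc] stacked along dim 0
      Wb, Rb: [4*hidden_size]            # Wb[iofc], Rb[iofc]
      P:  [3*hidden_size]                # P[iof]
    """
    gates = x_t @ W.T + h_prev @ R.T + Wb + Rb       # [batch_size, 4*hidden_size]
    g_i, g_o, g_f, g_c = np.split(gates, 4, axis=1)  # iofc order, per the spec
    p_i, p_o, p_f = np.split(P, 3)                   # iof order, per the spec

    i_t = sigmoid(g_i + p_i * c_prev)  # it = f(Xt*Wi^T + Ht-1*Ri^T + Pi (.) Ct-1 + Wbi + Rbi)
    f_t = sigmoid(g_f + p_f * c_prev)  # ft, peephole on the previous cell state
    c_hat = np.tanh(g_c)               # ct = g(...), cell candidate (no peephole)
    c_t = f_t * c_prev + i_t * c_hat   # Ct = ft (.) Ct-1 + it (.) ct
    o_t = sigmoid(g_o + p_o * c_t)     # ot, peephole on the *updated* cell state
    h_t = o_t * np.tanh(c_t)           # Ht = ot (.) h(Ct)
    return h_t, c_t
```

A quick sanity check: with all weights, biases, and peepholes set to zero and zero initial states, every gate evaluates to 0.5 and the step returns h_t = c_t = 0.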
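The input and output shapes listed above can also be checked end to end by building a small model with the `onnx.helper` API. The following sketch assumes opset 7 (so no `layout` attribute) and a unidirectional LSTM; the graph name and dimension values are arbitrary choices for illustration.

```python
import onnx
from onnx import helper, TensorProto

seq_length, batch_size, input_size, hidden_size = 5, 2, 3, 4
num_directions = 1  # direction="forward"

# The optional inputs (B, sequence_lens, initial_h, initial_c, P) are trailing,
# so everything after B can simply be omitted here.
node = helper.make_node(
    "LSTM",
    inputs=["X", "W", "R", "B"],
    outputs=["Y", "Y_h", "Y_c"],
    hidden_size=hidden_size,
    direction="forward",
)

graph = helper.make_graph(
    [node],
    "lstm7_shape_check",
    inputs=[
        helper.make_tensor_value_info("X", TensorProto.FLOAT,
            [seq_length, batch_size, input_size]),
        helper.make_tensor_value_info("W", TensorProto.FLOAT,
            [num_directions, 4 * hidden_size, input_size]),
        helper.make_tensor_value_info("R", TensorProto.FLOAT,
            [num_directions, 4 * hidden_size, hidden_size]),
        helper.make_tensor_value_info("B", TensorProto.FLOAT,
            [num_directions, 8 * hidden_size]),  # [Wb[iofc], Rb[iofc]] concatenated
    ],
    outputs=[
        helper.make_tensor_value_info("Y", TensorProto.FLOAT,
            [seq_length, num_directions, batch_size, hidden_size]),
        helper.make_tensor_value_info("Y_h", TensorProto.FLOAT,
            [num_directions, batch_size, hidden_size]),
        helper.make_tensor_value_info("Y_c", TensorProto.FLOAT,
            [num_directions, batch_size, hidden_size]),
    ],
)

model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 7)])
onnx.checker.check_model(model)  # validates the shapes against the opset-7 signature
```

Under opset 14 with `layout=1`, the same tensors switch to batch-first forms (for example, X becomes [batch_size, seq_length, input_size]), exactly as described in the `layout` attribute text highlighted in the diff.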