LSTM

LSTM - 14

Version

  • name: LSTM (GitHub)

  • domain: main

  • since_version: 14

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 14.

Summary

Computes a one-layer LSTM. This operator is usually supported via some custom implementation such as CuDNN.

Notations:

X - input tensor

i - input gate

o - output gate

f - forget gate

c - cell gate

t - time step (t-1 means previous time step)

W[iofc] - W parameter weight matrix for input, output, forget, and cell gates

R[iofc] - R recurrence weight matrix for input, output, forget, and cell gates

Wb[iofc] - W bias vectors for input, output, forget, and cell gates

Rb[iofc] - R bias vectors for input, output, forget, and cell gates

P[iof] - P peephole weight vector for input, output, and forget gates

WB[iofc] - W parameter weight matrix for backward input, output, forget, and cell gates

RB[iofc] - R recurrence weight matrix for backward input, output, forget, and cell gates

WBb[iofc] - W bias vectors for backward input, output, forget, and cell gates

RBb[iofc] - R bias vectors for backward input, output, forget, and cell gates

PB[iof] - P peephole weight vector for backward input, output, and forget gates

H - Hidden state

num_directions - 2 if direction == bidirectional else 1

Activation functions:

Relu(x) - max(0, x)

Tanh(x) - (1 - e^{-2x})/(1 + e^{-2x})

Sigmoid(x) - 1/(1 + e^{-x})

(NOTE: Below are optional)

Affine(x) - alpha*x + beta

LeakyRelu(x) - x if x >= 0 else alpha * x

ThresholdedRelu(x) - x if x >= alpha else 0

ScaledTanh(x) - alpha*Tanh(beta*x)

HardSigmoid(x) - min(max(alpha*x + beta, 0), 1)

Elu(x) - x if x >= 0 else alpha*(e^x - 1)

Softsign(x) - x/(1 + |x|)

Softplus(x) - log(1 + e^x)
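
As an illustration only (not part of the specification), these activations can be written as NumPy callables; the alpha/beta arguments below stand in for the values supplied through the activation_alpha and activation_beta attributes, and the defaults shown are illustrative, not normative:

import numpy as np

ACTIVATIONS = {
    "Relu": lambda x: np.maximum(0.0, x),
    "Tanh": np.tanh,
    "Sigmoid": lambda x: 1.0 / (1.0 + np.exp(-x)),
    "Affine": lambda x, alpha, beta: alpha * x + beta,
    "LeakyRelu": lambda x, alpha=0.01: np.where(x >= 0, x, alpha * x),
    "ThresholdedRelu": lambda x, alpha=1.0: np.where(x >= alpha, x, 0.0),
    "ScaledTanh": lambda x, alpha, beta: alpha * np.tanh(beta * x),
    "HardSigmoid": lambda x, alpha, beta: np.clip(alpha * x + beta, 0.0, 1.0),
    "Elu": lambda x, alpha=1.0: np.where(x >= 0, x, alpha * (np.exp(x) - 1.0)),
    "Softsign": lambda x: x / (1.0 + np.abs(x)),
    "Softplus": lambda x: np.log1p(np.exp(x)),
}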

Equations (Default: f=Sigmoid, g=Tanh, h=Tanh):

  • it = f(Xt*(Wi^T) + Ht-1*(Ri^T) + Pi (.) Ct-1 + Wbi + Rbi)

  • ft = f(Xt*(Wf^T) + Ht-1*(Rf^T) + Pf (.) Ct-1 + Wbf + Rbf)

  • ct = g(Xt*(Wc^T) + Ht-1*(Rc^T) + Wbc + Rbc)

  • Ct = ft (.) Ct-1 + it (.) ct

  • ot = f(Xt*(Wo^T) + Ht-1*(Ro^T) + Po (.) Ct + Wbo + Rbo)

  • Ht = ot (.) h(Ct)
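
A minimal NumPy sketch of one forward step with the default activations follows. It covers only the per-direction, per-time-step math above (no sequence loop, no bidirectional handling, no clipping), and the helper name lstm_step is illustrative, not part of ONNX:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(Xt, H_prev, C_prev, W, R, Wb, Rb, P):
    # One direction, one time step; gate order along the first axis is [i, o, f, c].
    hs = R.shape[1]
    Wi, Wo, Wf, Wc = (W[k * hs:(k + 1) * hs] for k in range(4))
    Ri, Ro, Rf, Rc = (R[k * hs:(k + 1) * hs] for k in range(4))
    Wbi, Wbo, Wbf, Wbc = (Wb[k * hs:(k + 1) * hs] for k in range(4))
    Rbi, Rbo, Rbf, Rbc = (Rb[k * hs:(k + 1) * hs] for k in range(4))
    Pi, Po, Pf = (P[k * hs:(k + 1) * hs] for k in range(3))

    i = sigmoid(Xt @ Wi.T + H_prev @ Ri.T + Pi * C_prev + Wbi + Rbi)
    f = sigmoid(Xt @ Wf.T + H_prev @ Rf.T + Pf * C_prev + Wbf + Rbf)
    c = np.tanh(Xt @ Wc.T + H_prev @ Rc.T + Wbc + Rbc)
    C = f * C_prev + i * c
    o = sigmoid(Xt @ Wo.T + H_prev @ Ro.T + Po * C + Wbo + Rbo)
    H = o * np.tanh(C)
    return H, C

# Illustrative usage with random weights.
batch, input_size, hidden_size = 2, 3, 4
rng = np.random.default_rng(0)
H, C = lstm_step(
    rng.standard_normal((batch, input_size)),
    np.zeros((batch, hidden_size)),
    np.zeros((batch, hidden_size)),
    rng.standard_normal((4 * hidden_size, input_size)),
    rng.standard_normal((4 * hidden_size, hidden_size)),
    np.zeros(4 * hidden_size),
    np.zeros(4 * hidden_size),
    np.zeros(3 * hidden_size),
)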

This operator has optional inputs/outputs. See the ONNX IR specification (https://github.com/onnx/onnx/blob/master/docs/IR.md) for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument's name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.

Attributes

  • activation_alpha: Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as for the corresponding ONNX operators. For example, with LeakyRelu the default alpha is 0.01.

  • activation_beta: Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as for the corresponding ONNX operators.

  • activations: A list of 3 (or 6 if bidirectional) activation functions for input, output, forget, cell, and hidden. Each must be one of the activation functions specified above. Optional: see the equations for the defaults if not specified.

  • clip: Cell clip threshold. Clipping bounds the elements of a tensor in the range of [-threshold, +threshold] and is applied to the input of activations. No clip if not specified.

  • direction: Specify if the RNN is forward, reverse, or bidirectional. Must be one of forward (default), reverse, or bidirectional. Default value is 'forward'.

  • hidden_size: Number of neurons in the hidden layer.

  • input_forget: Couple the input and forget gates if 1. Default value is 0.

  • layout: The shape format of inputs X, initial_h, initial_c and outputs Y, Y_h, Y_c. If 0, the following shapes are expected: X.shape = [seq_length, batch_size, input_size], Y.shape = [seq_length, num_directions, batch_size, hidden_size], initial_h.shape = Y_h.shape = initial_c.shape = Y_c.shape = [num_directions, batch_size, hidden_size]. If 1, the following shapes are expected: X.shape = [batch_size, seq_length, input_size], Y.shape = [batch_size, seq_length, num_directions, hidden_size], initial_h.shape = Y_h.shape = initial_c.shape = Y_c.shape = [batch_size, num_directions, hidden_size]. Default value is 0.
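
To make the two layouts concrete, converting tensors from layout 0 to layout 1 is a transpose of the leading axes; a small sketch with illustrative shapes:

import numpy as np

seq_length, batch_size, input_size = 5, 2, 3
num_directions, hidden_size = 1, 4

# layout=0 -> layout=1 for X: [seq, batch, input] -> [batch, seq, input]
X0 = np.zeros((seq_length, batch_size, input_size), dtype=np.float32)
X1 = np.transpose(X0, (1, 0, 2))
assert X1.shape == (batch_size, seq_length, input_size)

# layout=0 -> layout=1 for Y: [seq, dirs, batch, hidden] -> [batch, seq, dirs, hidden]
Y0 = np.zeros((seq_length, num_directions, batch_size, hidden_size), dtype=np.float32)
Y1 = np.transpose(Y0, (2, 0, 1, 3))
assert Y1.shape == (batch_size, seq_length, num_directions, hidden_size)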

Inputs

Between 3 and 8 inputs.

  • X (heterogeneous) - T: The input sequences packed (and potentially padded) into one 3-D tensor with the shape of [seq_length, batch_size, input_size].

  • W (heterogeneous) - T: The weight tensor for the gates. Concatenation of W[iofc] and WB[iofc] (if bidirectional) along dimension 0. The tensor has shape [num_directions, 4*hidden_size, input_size].

  • R (heterogeneous) - T: The recurrence weight tensor. Concatenation of R[iofc] and RB[iofc] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 4*hidden_size, hidden_size].

  • B (optional, heterogeneous) - T: The bias tensor for the gates. Concatenation of [Wb[iofc], Rb[iofc]] and [WBb[iofc], RBb[iofc]] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 8*hidden_size]. Optional: if not specified, assumed to be 0.

  • sequence_lens (optional, heterogeneous) - T1: Optional tensor specifying lengths of the sequences in a batch. If not specified, all sequences in the batch are assumed to have length seq_length. It has shape [batch_size].

  • initial_h (optional, heterogeneous) - T: Optional initial value of the hidden state. If not specified, assumed to be 0. It has shape [num_directions, batch_size, hidden_size].

  • initial_c (optional, heterogeneous) - T: Optional initial value of the cell state. If not specified, assumed to be 0. It has shape [num_directions, batch_size, hidden_size].

  • P (optional, heterogeneous) - T: The weight tensor for peepholes. Concatenation of P[iof] and PB[iof] (if bidirectional) along dimension 0. It has shape [num_directions, 3*hidden_size]. Optional: if not specified, assumed to be 0.
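
To make the concatenation conventions above concrete, here is a sketch of assembling the input tensors for a bidirectional LSTM (sizes are illustrative):

import numpy as np

hidden_size, input_size = 3, 2
num_directions = 2  # direction == 'bidirectional'

rng = np.random.default_rng(0)
# Per-direction blocks use gate order [i, o, f, c] along the 4*hidden_size axis.
W_fw = rng.standard_normal((4 * hidden_size, input_size)).astype(np.float32)
W_bw = rng.standard_normal((4 * hidden_size, input_size)).astype(np.float32)
W = np.stack([W_fw, W_bw])  # [num_directions, 4*hidden_size, input_size]

R = rng.standard_normal(
    (num_directions, 4 * hidden_size, hidden_size)).astype(np.float32)
B = np.zeros((num_directions, 8 * hidden_size), np.float32)  # [Wb | Rb] per direction
P = np.zeros((num_directions, 3 * hidden_size), np.float32)  # [Pi | Po | Pf] per direction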

Outputs

Between 0 and 3 outputs.

  • Y (optional, heterogeneous) - T: A tensor that concatenates all the intermediate output values of the hidden state. It has shape [seq_length, num_directions, batch_size, hidden_size].

  • Y_h (optional, heterogeneous) - T: The last output value of the hidden state. It has shape [num_directions, batch_size, hidden_size].

  • Y_c (optional, heterogeneous) - T: The last output value of the cell state. It has shape [num_directions, batch_size, hidden_size].
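
As a consequence of these definitions, for a forward-only LSTM with all sequences of full length, Y_h is simply the last time step of Y; a sketch of that relationship:

import numpy as np

seq_length, num_directions, batch_size, hidden_size = 5, 1, 2, 3
Y = np.zeros((seq_length, num_directions, batch_size, hidden_size), np.float32)

# With direction == 'forward' and no sequence shorter than seq_length,
# the last hidden state equals the final slice of Y.
Y_h = Y[-1]
assert Y_h.shape == (num_directions, batch_size, hidden_size)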

Type Constraints

  • T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.

  • T1 in ( tensor(int32) ): Constrain seq_lens to an integer tensor.

Examples

defaults

import numpy as np
import onnx

# LSTM_Helper and expect are reference helpers from the ONNX test suite and
# are assumed to be in scope for these examples.

input = np.array([[[1., 2.], [3., 4.], [5., 6.]]]).astype(np.float32)

input_size = 2
hidden_size = 3
weight_scale = 0.1
number_of_gates = 4

node = onnx.helper.make_node(
    'LSTM',
    inputs=['X', 'W', 'R'],
    outputs=['', 'Y_h'],
    hidden_size=hidden_size
)

W = weight_scale * np.ones((1, number_of_gates * hidden_size, input_size)).astype(np.float32)
R = weight_scale * np.ones((1, number_of_gates * hidden_size, hidden_size)).astype(np.float32)

lstm = LSTM_Helper(X=input, W=W, R=R)
_, Y_h = lstm.step()
expect(node, inputs=[input, W, R], outputs=[Y_h.astype(np.float32)], name='test_lstm_defaults')

initial_bias

input = np.array([[[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]]]).astype(np.float32)

input_size = 3
hidden_size = 4
weight_scale = 0.1
custom_bias = 0.1
number_of_gates = 4

node = onnx.helper.make_node(
    'LSTM',
    inputs=['X', 'W', 'R', 'B'],
    outputs=['', 'Y_h'],
    hidden_size=hidden_size
)

W = weight_scale * np.ones((1, number_of_gates * hidden_size, input_size)).astype(np.float32)
R = weight_scale * np.ones((1, number_of_gates * hidden_size, hidden_size)).astype(np.float32)

# Adding custom bias
W_B = custom_bias * np.ones((1, number_of_gates * hidden_size)).astype(np.float32)
R_B = np.zeros((1, number_of_gates * hidden_size)).astype(np.float32)
B = np.concatenate((W_B, R_B), 1)

lstm = LSTM_Helper(X=input, W=W, R=R, B=B)
_, Y_h = lstm.step()
expect(node, inputs=[input, W, R, B], outputs=[Y_h.astype(np.float32)], name='test_lstm_with_initial_bias')

peepholes

input = np.array([[[1., 2., 3., 4.], [5., 6., 7., 8.]]]).astype(np.float32)

input_size = 4
hidden_size = 3
weight_scale = 0.1
number_of_gates = 4
number_of_peepholes = 3

node = onnx.helper.make_node(
    'LSTM',
    inputs=['X', 'W', 'R', 'B', 'sequence_lens', 'initial_h', 'initial_c', 'P'],
    outputs=['', 'Y_h'],
    hidden_size=hidden_size
)

# Initializing Inputs
W = weight_scale * np.ones((1, number_of_gates * hidden_size, input_size)).astype(np.float32)
R = weight_scale * np.ones((1, number_of_gates * hidden_size, hidden_size)).astype(np.float32)
B = np.zeros((1, 2 * number_of_gates * hidden_size)).astype(np.float32)
seq_lens = np.repeat(input.shape[0], input.shape[1]).astype(np.int32)
init_h = np.zeros((1, input.shape[1], hidden_size)).astype(np.float32)
init_c = np.zeros((1, input.shape[1], hidden_size)).astype(np.float32)
P = weight_scale * np.ones((1, number_of_peepholes * hidden_size)).astype(np.float32)

lstm = LSTM_Helper(X=input, W=W, R=R, B=B, P=P, initial_c=init_c, initial_h=init_h)
_, Y_h = lstm.step()
expect(node, inputs=[input, W, R, B, seq_lens, init_h, init_c, P], outputs=[Y_h.astype(np.float32)],
       name='test_lstm_with_peepholes')

batchwise

input = np.array([[[1., 2.]], [[3., 4.]], [[5., 6.]]]).astype(np.float32)

input_size = 2
hidden_size = 7
weight_scale = 0.3
number_of_gates = 4
layout = 1

node = onnx.helper.make_node(
    'LSTM',
    inputs=['X', 'W', 'R'],
    outputs=['Y', 'Y_h'],
    hidden_size=hidden_size,
    layout=layout
)

W = weight_scale * np.ones((1, number_of_gates * hidden_size, input_size)).astype(np.float32)
R = weight_scale * np.ones((1, number_of_gates * hidden_size, hidden_size)).astype(np.float32)

lstm = LSTM_Helper(X=input, W=W, R=R, layout=layout)
Y, Y_h = lstm.step()
expect(node, inputs=[input, W, R], outputs=[Y.astype(np.float32), Y_h.astype(np.float32)], name='test_lstm_batchwise')

Differences

This diff compares LSTM-14 with the previous version, LSTM-7. The only change is the addition of the layout attribute, which selects the shape format of the inputs X, initial_h, initial_c and the outputs Y, Y_h, Y_c: with layout = 0 (the default) the sequence dimension comes first (for example, X.shape = [seq_length, batch_size, input_size]), and with layout = 1 the batch dimension comes first (X.shape = [batch_size, seq_length, input_size]). The summary, equations, remaining attributes, inputs, outputs, and type constraints are unchanged.

LSTM - 7

Version

  • name: LSTM (GitHub)

  • domain: main

  • since_version: 7

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 7.

Summary

Computes a one-layer LSTM. This operator is usually supported via some custom implementation such as CuDNN.

Notations:

X - input tensor

i - input gate

o - output gate

f - forget gate

c - cell gate

t - time step (t-1 means previous time step)

W[iofc] - W parameter weight matrix for input, output, forget, and cell gates

R[iofc] - R recurrence weight matrix for input, output, forget, and cell gates

Wb[iofc] - W bias vectors for input, output, forget, and cell gates

Rb[iofc] - R bias vectors for input, output, forget, and cell gates

P[iof] - P peephole weight vector for input, output, and forget gates

WB[iofc] - W parameter weight matrix for backward input, output, forget, and cell gates

RB[iofc] - R recurrence weight matrix for backward input, output, forget, and cell gates

WBb[iofc] - W bias vectors for backward input, output, forget, and cell gates

RBb[iofc] - R bias vectors for backward input, output, forget, and cell gates

PB[iof] - P peephole weight vector for backward input, output, and forget gates

H - Hidden state

num_directions - 2 if direction == bidirectional else 1

Activation functions:

Relu(x) - max(0, x)

Tanh(x) - (1 - e^{-2x})/(1 + e^{-2x})

Sigmoid(x) - 1/(1 + e^{-x})

(NOTE: Below are optional)

Affine(x) - alpha*x + beta

LeakyRelu(x) - x if x >= 0 else alpha * x

ThresholdedRelu(x) - x if x >= alpha else 0

ScaledTanh(x) - alpha*Tanh(beta*x)

HardSigmoid(x) - min(max(alpha*x + beta, 0), 1)

Elu(x) - x if x >= 0 else alpha*(e^x - 1)

Softsign(x) - x/(1 + |x|)

Softplus(x) - log(1 + e^x)

Equations (Default: f=Sigmoid, g=Tanh, h=Tanh):

  • it = f(Xt*(Wi^T) + Ht-1*(Ri^T) + Pi (.) Ct-1 + Wbi + Rbi)

  • ft = f(Xt*(Wf^T) + Ht-1*(Rf^T) + Pf (.) Ct-1 + Wbf + Rbf)

  • ct = g(Xt*(Wc^T) + Ht-1*(Rc^T) + Wbc + Rbc)

  • Ct = ft (.) Ct-1 + it (.) ct

  • ot = f(Xt*(Wo^T) + Ht-1*(Ro^T) + Po (.) Ct + Wbo + Rbo)

  • Ht = ot (.) h(Ct)

This operator has optional inputs/outputs. See the ONNX IR specification (https://github.com/onnx/onnx/blob/master/docs/IR.md) for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument's name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.

Attributes

  • activation_alpha: Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as for the corresponding ONNX operators. For example, with LeakyRelu the default alpha is 0.01.

  • activation_beta: Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as for the corresponding ONNX operators.

  • activations: A list of 3 (or 6 if bidirectional) activation functions for input, output, forget, cell, and hidden. Each must be one of the activation functions specified above. Optional: see the equations for the defaults if not specified.

  • clip: Cell clip threshold. Clipping bounds the elements of a tensor in the range of [-threshold, +threshold] and is applied to the input of activations. No clip if not specified.

  • direction: Specify if the RNN is forward, reverse, or bidirectional. Must be one of forward (default), reverse, or bidirectional. Default value is 'forward'.

  • hidden_size: Number of neurons in the hidden layer.

  • input_forget: Couple the input and forget gates if 1. Default value is 0.

Inputs

Between 3 and 8 inputs.

  • X (heterogeneous) - T: The input sequences packed (and potentially padded) into one 3-D tensor with the shape of [seq_length, batch_size, input_size].

  • W (heterogeneous) - T: The weight tensor for the gates. Concatenation of W[iofc] and WB[iofc] (if bidirectional) along dimension 0. The tensor has shape [num_directions, 4*hidden_size, input_size].

  • R (heterogeneous) - T: The recurrence weight tensor. Concatenation of R[iofc] and RB[iofc] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 4*hidden_size, hidden_size].

  • B (optional, heterogeneous) - T: The bias tensor for the gates. Concatenation of [Wb[iofc], Rb[iofc]] and [WBb[iofc], RBb[iofc]] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 8*hidden_size]. Optional: if not specified, assumed to be 0.

  • sequence_lens (optional, heterogeneous) - T1: Optional tensor specifying lengths of the sequences in a batch. If not specified, all sequences in the batch are assumed to have length seq_length. It has shape [batch_size].

  • initial_h (optional, heterogeneous) - T: Optional initial value of the hidden state. If not specified, assumed to be 0. It has shape [num_directions, batch_size, hidden_size].

  • initial_c (optional, heterogeneous) - T: Optional initial value of the cell state. If not specified, assumed to be 0. It has shape [num_directions, batch_size, hidden_size].

  • P (optional, heterogeneous) - T: The weight tensor for peepholes. Concatenation of P[iof] and PB[iof] (if bidirectional) along dimension 0. It has shape [num_directions, 3*hidden_size]. Optional: if not specified, assumed to be 0.

Outputs

Between 0 and 3 outputs.

  • Y (optional, heterogeneous) - T: A tensor that concatenates all the intermediate output values of the hidden state. It has shape [seq_length, num_directions, batch_size, hidden_size].

  • Y_h (optional, heterogeneous) - T: The last output value of the hidden state. It has shape [num_directions, batch_size, hidden_size].

  • Y_c (optional, heterogeneous) - T: The last output value of the cell state. It has shape [num_directions, batch_size, hidden_size].

Type Constraints

  • T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.

  • T1 in ( tensor(int32) ): Constrain seq_lens to an integer tensor.

Differences

This diff compares LSTM-7 with LSTM-1. The changes are:

  • The recurrence terms in the equations now carry an explicit transpose: Ht-1*Ri, Ht-1*Rf, Ht-1*Rc, and Ht-1*Ro become Ht-1*(Ri^T), Ht-1*(Rf^T), Ht-1*(Rc^T), and Ht-1*(Ro^T).

  • The summary gains the paragraph describing optional inputs/outputs and how missing arguments are represented.

  • The output_sequence attribute is removed, and the clause "It is optional if output_sequence is 0." is dropped from the description of the Y output.

  • The description of input_forget is shortened from "Couple the input and forget gates if 1, default 0." to "Couple the input and forget gates if 1."

Everything else is unchanged.

LSTM - 1

Version

  • name: LSTM (GitHub)

  • domain: main

  • since_version: 1

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 1.

Summary

Computes a one-layer LSTM. This operator is usually supported via some custom implementation such as CuDNN.

Notations:

X - input tensor

i - input gate

o - output gate

f - forget gate

c - cell gate

t - time step (t-1 means previous time step)

W[iofc] - W parameter weight matrix for input, output, forget, and cell gates

R[iofc] - R recurrence weight matrix for input, output, forget, and cell gates

Wb[iofc] - W bias vectors for input, output, forget, and cell gates

Rb[iofc] - R bias vectors for input, output, forget, and cell gates

P[iof] - P peephole weight vector for input, output, and forget gates

WB[iofc] - W parameter weight matrix for backward input, output, forget, and cell gates

RB[iofc] - R recurrence weight matrix for backward input, output, forget, and cell gates

WBb[iofc] - W bias vectors for backward input, output, forget, and cell gates

RBb[iofc] - R bias vectors for backward input, output, forget, and cell gates

PB[iof] - P peephole weight vector for backward input, output, and forget gates

H - Hidden state

num_directions - 2 if direction == bidirectional else 1

Activation functions:

Relu(x) - max(0, x)

Tanh(x) - (1 - e^{-2x})/(1 + e^{-2x})

Sigmoid(x) - 1/(1 + e^{-x})

(NOTE: Below are optional)

Affine(x) - alpha*x + beta

LeakyRelu(x) - x if x >= 0 else alpha * x

ThresholdedRelu(x) - x if x >= alpha else 0

ScaledTanh(x) - alpha*Tanh(beta*x)

HardSigmoid(x) - min(max(alpha*x + beta, 0), 1)

Elu(x) - x if x >= 0 else alpha*(e^x - 1)

Softsign(x) - x/(1 + |x|)

Softplus(x) - log(1 + e^x)

Equations (Default: f=Sigmoid, g=Tanh, h=Tanh):

  • it = f(Xt*(Wi^T) + Ht-1*Ri + Pi (.) Ct-1 + Wbi + Rbi)

  • ft = f(Xt*(Wf^T) + Ht-1*Rf + Pf (.) Ct-1 + Wbf + Rbf)

  • ct = g(Xt*(Wc^T) + Ht-1*Rc + Wbc + Rbc)

  • Ct = ft (.) Ct-1 + it (.) ct

  • ot = f(Xt*(Wo^T) + Ht-1*Ro + Po (.) Ct + Wbo + Rbo)

  • Ht = ot (.) h(Ct)

Attributes

  • activation_alpha: Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as for the corresponding ONNX operators. For example, with LeakyRelu the default alpha is 0.01.

  • activation_beta: Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as for the corresponding ONNX operators.

  • activations: A list of 3 (or 6 if bidirectional) activation functions for input, output, forget, cell, and hidden. Each must be one of the activation functions specified above. Optional: see the equations for the defaults if not specified.

  • clip: Cell clip threshold. Clipping bounds the elements of a tensor in the range of [-threshold, +threshold] and is applied to the input of activations. No clip if not specified.

  • direction: Specify if the RNN is forward, reverse, or bidirectional. Must be one of forward (default), reverse, or bidirectional. Default value is 'forward'.

  • hidden_size: Number of neurons in the hidden layer.

  • input_forget: Couple the input and forget gates if 1. Default value is 0.

  • output_sequence: The sequence output for the hidden state is optional if 0. Default value is 0.

Inputs

Between 3 and 8 inputs.

  • X (heterogeneous) - T: The input sequences packed (and potentially padded) into one 3-D tensor with the shape of [seq_length, batch_size, input_size].

  • W (heterogeneous) - T: The weight tensor for the gates. Concatenation of W[iofc] and WB[iofc] (if bidirectional) along dimension 0. The tensor has shape [num_directions, 4*hidden_size, input_size].

  • R (heterogeneous) - T: The recurrence weight tensor. Concatenation of R[iofc] and RB[iofc] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 4*hidden_size, hidden_size].

  • B (optional, heterogeneous) - T: The bias tensor for the gates. Concatenation of [Wb[iofc], Rb[iofc]] and [WBb[iofc], RBb[iofc]] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 8*hidden_size]. Optional: if not specified, assumed to be 0.

  • sequence_lens (optional, heterogeneous) - T1: Optional tensor specifying lengths of the sequences in a batch. If not specified, all sequences in the batch are assumed to have length seq_length. It has shape [batch_size].

  • initial_h (optional, heterogeneous) - T: Optional initial value of the hidden state. If not specified, assumed to be 0. It has shape [num_directions, batch_size, hidden_size].

  • initial_c (optional, heterogeneous) - T: Optional initial value of the cell state. If not specified, assumed to be 0. It has shape [num_directions, batch_size, hidden_size].

  • P (optional, heterogeneous) - T: The weight tensor for peepholes. Concatenation of P[iof] and PB[iof] (if bidirectional) along dimension 0. It has shape [num_directions, 3*hidden_size]. Optional: if not specified, assumed to be 0.

Outputs

Between 0 and 3 outputs.

  • Y (optional, heterogeneous) - T: A tensor that concatenates all the intermediate output values of the hidden state. It has shape [seq_length, num_directions, batch_size, hidden_size]. It is optional if output_sequence is 0.

  • Y_h (optional, heterogeneous) - T: The last output value of the hidden state. It has shape [num_directions, batch_size, hidden_size].

  • Y_c (optional, heterogeneous) - T: The last output value of the cell state. It has shape [num_directions, batch_size, hidden_size].

Type Constraints

  • T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.

  • T1 in ( tensor(int32) ): Constrain seq_lens to an integer tensor.