LSTM - version 1#

This page documents version 1 of operator LSTM. See LSTM for the latest version (since version 22).

Computes an one-layer LSTM. This operator is usually supported via some custom implementation such as CuDNN.

Notations:

X - input tensor
i - input gate
o - output gate
f - forget gate
c - cell gate
t - time step (t-1 means previous time step)
W[iofc] - W parameter weight matrix for input, output, forget, and cell gates
R[iofc] - R recurrence weight matrix for input, output, forget, and cell gates
Wb[iofc] - W bias vectors for input, output, forget, and cell gates
Rb[iofc] - R bias vectors for input, output, forget, and cell gates
P[iof] - P peephole weight vector for input, output, and forget gates
WB[iofc] - W parameter weight matrix for backward input, output, forget, and cell gates
RB[iofc] - R recurrence weight matrix for backward input, output, forget, and cell gates
WBb[iofc] - W bias vectors for backward input, output, forget, and cell gates
RBb[iofc] - R bias vectors for backward input, output, forget, and cell gates
PB[iof] - P peephole weight vector for backward input, output, and forget gates
H - Hidden state
num_directions - 2 if direction == bidirectional else 1

Activation functions:

NOTE: Below are optional

Equations (Default: f=Sigmoid, g=Tanh, h=Tanh):

Inputs

X (T): The input sequences packed (and potentially padded) into one 3-D tensor with the shape of [seq_length, batch_size, input_size].
W (T): The weight tensor for the gates. Concatenation of W[iofc] and WB[iofc] (if bidirectional) along dimension 0. The tensor has shape [num_directions, 4*hidden_size, input_size].
R (T): The recurrence weight tensor. Concatenation of R[iofc] and RB[iofc] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 4*hidden_size, hidden_size].
B (T): The bias tensor for input gate. Concatenation of [Wb[iofc], Rb[iofc]], and [WBb[iofc], RBb[iofc]] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 8*hidden_size]. Optional: If not specified - assumed to be 0.
sequence_lens (T1): Optional tensor specifying lengths of the sequences in a batch. If not specified - assumed all sequences in the batch to have length seq_length. It has shape [batch_size].
initial_h (T): Optional initial value of the hidden. If not specified - assumed to be 0. It has shape [num_directions, batch_size, hidden_size].
initial_c (T): Optional initial value of the cell. If not specified - assumed to be 0. It has shape [num_directions, batch_size, hidden_size].
P (T): The weight tensor for peepholes. Concatenation of P[iof] and PB[iof] (if bidirectional) along dimension 0. It has shape [num_directions, 3*hidde_size]. Optional: If not specified - assumed to be 0.

Outputs

Y (T): A tensor that concats all the intermediate output values of the hidden. It has shape [seq_length, num_directions, batch_size, hidden_size]. It is optional if output_sequence is 0.
Y_h (T): The last output value of the hidden. It has shape [num_directions, batch_size, hidden_size].
Y_c (T): The last output value of the cell. It has shape [num_directions, batch_size, hidden_size].

Type Constraints

T: Constrain input and output types to float tensors. Allowed types: tensor(double), tensor(float), tensor(float16).
T1: Constrain seq_lens to integer tensor. Allowed types: tensor(int32).