LSTM - version 1
This page documents version 1 of operator LSTM. See LSTM for the latest version (since version 22).
Domain: ai.onnx
Since version: 1
Computes a one-layer LSTM. This operator is usually supported via some custom implementation such as cuDNN.
Notations:
X - input tensor
i - input gate
o - output gate
f - forget gate
c - cell gate
t - time step (t-1 means previous time step)
W[iofc] - W parameter weight matrix for input, output, forget, and cell gates
R[iofc] - R recurrence weight matrix for input, output, forget, and cell gates
Wb[iofc] - W bias vectors for input, output, forget, and cell gates
Rb[iofc] - R bias vectors for input, output, forget, and cell gates
P[iof] - P peephole weight vector for input, output, and forget gates
WB[iofc] - W parameter weight matrix for backward input, output, forget, and cell gates
RB[iofc] - R recurrence weight matrix for backward input, output, forget, and cell gates
WBb[iofc] - W bias vectors for backward input, output, forget, and cell gates
RBb[iofc] - R bias vectors for backward input, output, forget, and cell gates
PB[iof] - P peephole weight vector for backward input, output, and forget gates
H - Hidden state
num_directions - 2 if direction == bidirectional else 1
Activation functions:
Relu(x) - max(0, x)
Tanh(x) - (1 - e^{-2x})/(1 + e^{-2x})
Sigmoid(x) - 1/(1 + e^{-x})
NOTE: The activation functions below are optional.
Affine(x) - alpha*x + beta
LeakyRelu(x) - x if x >= 0 else alpha * x
ThresholdedRelu(x) - x if x >= alpha else 0
ScaledTanh(x) - alpha*Tanh(beta*x)
HardSigmoid(x) - min(max(alpha*x + beta, 0), 1)
Elu(x) - x if x >= 0 else alpha*(e^x - 1)
Softsign(x) - x/(1 + |x|)
Softplus(x) - log(1 + e^x)
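The definitions above translate almost directly into NumPy. The following is a minimal sketch, not the reference implementation: the function names are chosen here for illustration, and alpha and beta are passed as explicit arguments rather than read from operator attributes.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def tanh(x):
    # Written exactly as in the definition above; np.tanh is the usual choice.
    e = np.exp(-2.0 * x)
    return (1.0 - e) / (1.0 + e)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def leaky_relu(x, alpha):
    return np.where(x >= 0, x, alpha * x)

def hard_sigmoid(x, alpha, beta):
    return np.minimum(np.maximum(alpha * x + beta, 0.0), 1.0)

def elu(x, alpha):
    return np.where(x >= 0, x, alpha * (np.exp(x) - 1.0))

def softsign(x):
    return x / (1.0 + np.abs(x))

def softplus(x):
    return np.log(1.0 + np.exp(x))
```

These direct forms are fine for moderate inputs; for large magnitudes of x the exponentials can overflow, so a production implementation would normally use numerically stable variants.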
Equations (Default: f=Sigmoid, g=Tanh, h=Tanh):
it = f(Xt*(Wi^T) + Ht-1*(Ri^T) + Pi (.) Ct-1 + Wbi + Rbi)
ft = f(Xt*(Wf^T) + Ht-1*(Rf^T) + Pf (.) Ct-1 + Wbf + Rbf)
ct = g(Xt*(Wc^T) + Ht-1*(Rc^T) + Wbc + Rbc)
Ct = ft (.) Ct-1 + it (.) ct
ot = f(Xt*(Wo^T) + Ht-1*(Ro^T) + Po (.) Ct + Wbo + Rbo)
Ht = ot (.) h(Ct)
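As an illustration of the equations above, here is a minimal NumPy sketch of a single time step for one direction, using the default activations (f=Sigmoid, g=Tanh, h=Tanh). The function name lstm_step and its argument layout are hypothetical. The sketch assumes the gates are stacked in the order i, o, f, c, as the W[iofc] notation suggests.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, R, Wb, Rb, P):
    """One LSTM time step for a single direction (illustrative sketch).

    x_t:    [batch_size, input_size]
    h_prev: [batch_size, hidden_size]
    c_prev: [batch_size, hidden_size]
    W:      [4*hidden_size, input_size]   (assumed gate order: i, o, f, c)
    R:      [4*hidden_size, hidden_size]
    Wb, Rb: [4*hidden_size]
    P:      [3*hidden_size]               (assumed gate order: i, o, f)
    """
    Wi, Wo, Wf, Wc = np.split(W, 4, axis=0)
    Ri, Ro, Rf, Rc = np.split(R, 4, axis=0)
    Wbi, Wbo, Wbf, Wbc = np.split(Wb, 4)
    Rbi, Rbo, Rbf, Rbc = np.split(Rb, 4)
    Pi, Po, Pf = np.split(P, 3)

    # Input and forget gates peek at the previous cell state.
    i = sigmoid(x_t @ Wi.T + h_prev @ Ri.T + Pi * c_prev + Wbi + Rbi)
    f = sigmoid(x_t @ Wf.T + h_prev @ Rf.T + Pf * c_prev + Wbf + Rbf)
    # Candidate cell value has no peephole term.
    c_tilde = np.tanh(x_t @ Wc.T + h_prev @ Rc.T + Wbc + Rbc)
    c_t = f * c_prev + i * c_tilde
    # The output gate peeks at the updated cell state.
    o = sigmoid(x_t @ Wo.T + h_prev @ Ro.T + Po * c_t + Wbo + Rbo)
    h_t = o * np.tanh(c_t)
    return h_t, c_t
```

Here (.) from the equations becomes the element-wise `*`, and the peephole vectors broadcast across the batch dimension.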
Inputs
X (T): The input sequences packed (and potentially padded) into one 3-D tensor with the shape of [seq_length, batch_size, input_size].
W (T): The weight tensor for the gates. Concatenation of W[iofc] and WB[iofc] (if bidirectional) along dimension 0. The tensor has shape [num_directions, 4*hidden_size, input_size].
R (T): The recurrence weight tensor. Concatenation of R[iofc] and RB[iofc] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 4*hidden_size, hidden_size].
B (T): The bias tensor for the gates. Concatenation of [Wb[iofc], Rb[iofc]], and [WBb[iofc], RBb[iofc]] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 8*hidden_size]. Optional: if not specified, assumed to be 0.
sequence_lens (T1): Optional tensor specifying lengths of the sequences in a batch. If not specified, all sequences in the batch are assumed to have length seq_length. It has shape [batch_size].
initial_h (T): Optional initial value of the hidden state. If not specified, assumed to be 0. It has shape [num_directions, batch_size, hidden_size].
initial_c (T): Optional initial value of the cell state. If not specified, assumed to be 0. It has shape [num_directions, batch_size, hidden_size].
P (T): The weight tensor for peepholes. Concatenation of P[iof] and PB[iof] (if bidirectional) along dimension 0. It has shape [num_directions, 3*hidden_size]. Optional: if not specified, assumed to be 0.
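To make the shapes concrete, the following sketch allocates one NumPy array per input for a small bidirectional case. The dimension values are arbitrary example numbers, and the split of B into its W-bias and R-bias halves assumes the [Wb[iofc], Rb[iofc]] layout described above.

```python
import numpy as np

# Arbitrary example dimensions (not taken from the specification).
seq_length, batch_size, input_size, hidden_size = 5, 2, 3, 4
num_directions = 2  # bidirectional

rng = np.random.default_rng(0)
X = rng.standard_normal((seq_length, batch_size, input_size)).astype(np.float32)
W = rng.standard_normal((num_directions, 4 * hidden_size, input_size)).astype(np.float32)
R = rng.standard_normal((num_directions, 4 * hidden_size, hidden_size)).astype(np.float32)

# Optional inputs, filled with the values the operator assumes when they are omitted.
B = np.zeros((num_directions, 8 * hidden_size), dtype=np.float32)
sequence_lens = np.full((batch_size,), seq_length, dtype=np.int32)
initial_h = np.zeros((num_directions, batch_size, hidden_size), dtype=np.float32)
initial_c = np.zeros((num_directions, batch_size, hidden_size), dtype=np.float32)
P = np.zeros((num_directions, 3 * hidden_size), dtype=np.float32)

# For direction 0, the first half of B holds Wb[iofc] and the second half Rb[iofc].
Wb_fwd, Rb_fwd = np.split(B[0], 2)
```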
Outputs
Y (T): A tensor that concatenates all the intermediate output values of the hidden state. It has shape [seq_length, num_directions, batch_size, hidden_size]. It is optional if output_sequence is 0.
Y_h (T): The last output value of the hidden state. It has shape [num_directions, batch_size, hidden_size].
Y_c (T): The last output value of the cell state. It has shape [num_directions, batch_size, hidden_size].
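The sketch below shows one way the three outputs could be assembled from a loop over time steps. It is a simplified illustration, not the reference implementation. It uses the default activations, ignores sequence_lens (every sequence is assumed to have full length), and assumes direction index 0 runs forward in time and index 1 runs backward, with gates stacked in the order i, o, f, c. The name lstm_forward is hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(X, W, R, B=None, initial_h=None, initial_c=None, P=None):
    seq_length, batch_size, _ = X.shape
    num_directions, four_h, _ = W.shape
    hidden_size = four_h // 4
    state_shape = (num_directions, batch_size, hidden_size)

    # Omitted optional inputs default to zeros.
    if B is None:
        B = np.zeros((num_directions, 8 * hidden_size), X.dtype)
    if P is None:
        P = np.zeros((num_directions, 3 * hidden_size), X.dtype)
    if initial_h is None:
        initial_h = np.zeros(state_shape, X.dtype)
    if initial_c is None:
        initial_c = np.zeros(state_shape, X.dtype)

    Y = np.zeros((seq_length, num_directions, batch_size, hidden_size), X.dtype)
    Y_h = np.zeros(state_shape, X.dtype)
    Y_c = np.zeros(state_shape, X.dtype)

    for d in range(num_directions):
        bias = B[d, :4 * hidden_size] + B[d, 4 * hidden_size:]  # Wb + Rb
        p_i, p_o, p_f = np.split(P[d], 3)
        H, C = initial_h[d], initial_c[d]
        # Assumption: direction 1 walks the sequence in reverse.
        steps = range(seq_length) if d == 0 else reversed(range(seq_length))
        for t in steps:
            # All four gate pre-activations at once: [batch_size, 4*hidden_size].
            gates = X[t] @ W[d].T + H @ R[d].T + bias
            a_i, a_o, a_f, a_c = np.split(gates, 4, axis=1)
            i = sigmoid(a_i + p_i * C)      # uses the previous cell state
            f = sigmoid(a_f + p_f * C)      # uses the previous cell state
            C = f * C + i * np.tanh(a_c)
            o = sigmoid(a_o + p_o * C)      # uses the updated cell state
            H = o * np.tanh(C)
            Y[t, d] = H
        Y_h[d], Y_c[d] = H, C
    return Y, Y_h, Y_c
```

With this layout, Y_h for the forward direction equals the last time slice of Y, while for the backward direction it equals the first time slice, since that is the last step the backward pass processes.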
Type Constraints
T: Constrain input and output types to float tensors. Allowed types: tensor(double), tensor(float), tensor(float16).
T1: Constrain sequence_lens to integer tensors. Allowed types: tensor(int32).