LSTM

  • Domain: ai.onnx

  • Since version: 22

Computes a one-layer LSTM. This operator is usually supported via a custom implementation such as cuDNN.

Notations:

  • X - input tensor

  • i - input gate

  • o - output gate

  • f - forget gate

  • c - cell gate

  • t - time step (t-1 means previous time step)

  • W[iofc] - W parameter weight matrix for input, output, forget, and cell gates

  • R[iofc] - R recurrence weight matrix for input, output, forget, and cell gates

  • Wb[iofc] - W bias vectors for input, output, forget, and cell gates

  • Rb[iofc] - R bias vectors for input, output, forget, and cell gates

  • P[iof] - P peephole weight vector for input, output, and forget gates

  • WB[iofc] - W parameter weight matrix for backward input, output, forget, and cell gates

  • RB[iofc] - R recurrence weight matrix for backward input, output, forget, and cell gates

  • WBb[iofc] - W bias vectors for backward input, output, forget, and cell gates

  • RBb[iofc] - R bias vectors for backward input, output, forget, and cell gates

  • PB[iof] - P peephole weight vector for backward input, output, and forget gates

  • H - Hidden state

  • num_directions - 2 if direction == bidirectional else 1

Activation functions:

  • Relu(x) - max(0, x)

  • Tanh(x) - (1 - e^{-2x})/(1 + e^{-2x})

  • Sigmoid(x) - 1/(1 + e^{-x})

NOTE: The activation functions below are optional.

  • Affine(x) - alpha*x + beta

  • LeakyRelu(x) - x if x >= 0 else alpha * x

  • ThresholdedRelu(x) - x if x >= alpha else 0

  • ScaledTanh(x) - alpha*Tanh(beta*x)

  • HardSigmoid(x) - min(max(alpha*x + beta, 0), 1)

  • Elu(x) - x if x >= 0 else alpha*(e^x - 1)

  • Softsign(x) - x/(1 + |x|)

  • Softplus(x) - log(1 + e^x)
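The scalar definitions above can be sketched in plain Python as follows. This is an illustrative sketch only: the function names and the numerically stable forms used for Sigmoid and Softplus are choices made here, not part of the operator definition, and a real runtime would apply these element-wise to tensors.

```python
import math

def relu(x):
    return max(0.0, x)

def tanh(x):
    # Equal to (1 - e^{-2x}) / (1 + e^{-2x}); math.tanh avoids overflow for large |x|.
    return math.tanh(x)

def sigmoid(x):
    # 1 / (1 + e^{-x}), evaluated so that exp() never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

# Optional activations; alpha and beta are the per-function parameters.
def affine(x, alpha, beta):
    return alpha * x + beta

def leaky_relu(x, alpha):
    return x if x >= 0 else alpha * x

def thresholded_relu(x, alpha):
    return x if x >= alpha else 0.0

def scaled_tanh(x, alpha, beta):
    return alpha * math.tanh(beta * x)

def hard_sigmoid(x, alpha, beta):
    return min(max(alpha * x + beta, 0.0), 1.0)

def elu(x, alpha):
    return x if x >= 0 else alpha * (math.exp(x) - 1.0)

def softsign(x):
    return x / (1.0 + abs(x))

def softplus(x):
    # log(1 + e^x), rewritten as max(x, 0) + log(1 + e^{-|x|}) to avoid overflow.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```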

Equations (Default: f=Sigmoid, g=Tanh, h=Tanh; (.) denotes element-wise multiplication):

  • it = f(Xt*(Wi^T) + Ht-1*(Ri^T) + Pi (.) Ct-1 + Wbi + Rbi)

  • ft = f(Xt*(Wf^T) + Ht-1*(Rf^T) + Pf (.) Ct-1 + Wbf + Rbf)

  • ct = g(Xt*(Wc^T) + Ht-1*(Rc^T) + Wbc + Rbc)

  • Ct = ft (.) Ct-1 + it (.) ct

  • ot = f(Xt*(Wo^T) + Ht-1*(Ro^T) + Po (.) Ct + Wbo + Rbo)

  • Ht = ot (.) h(Ct)
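As a minimal sketch of one time step for a single direction, the equations above can be written with NumPy as shown below. It assumes the default activations (f=Sigmoid, g=Tanh, h=Tanh) and weights already sliced to one direction; the function name `lstm_step` and its argument layout are hypothetical and chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, R, Wb, Rb, P):
    """One LSTM time step for one direction (illustrative sketch).

    x_t:    [batch_size, input_size]
    h_prev: [batch_size, hidden_size]   (Ht-1)
    c_prev: [batch_size, hidden_size]   (Ct-1)
    W:      [4*hidden_size, input_size]   gates stacked in i, o, f, c order
    R:      [4*hidden_size, hidden_size]  gates stacked in i, o, f, c order
    Wb, Rb: [4*hidden_size]
    P:      [3*hidden_size]               peepholes stacked in i, o, f order
    """
    # Shared part of every gate: Xt*(W^T) + Ht-1*(R^T) + Wb + Rb.
    z = x_t @ W.T + h_prev @ R.T + Wb + Rb
    z_i, z_o, z_f, z_c = np.split(z, 4, axis=1)
    p_i, p_o, p_f = np.split(P, 3)

    i_t = sigmoid(z_i + p_i * c_prev)      # input gate
    f_t = sigmoid(z_f + p_f * c_prev)      # forget gate
    c_tilde = np.tanh(z_c)                 # candidate cell value (ct)
    c_t = f_t * c_prev + i_t * c_tilde     # new cell state (Ct)
    o_t = sigmoid(z_o + p_o * c_t)         # output gate, peephole uses the new Ct
    h_t = o_t * np.tanh(c_t)               # new hidden state (Ht)
    return h_t, c_t
```

Note that the input and forget gates read the previous cell state Ct-1 through their peepholes, while the output gate reads the freshly computed Ct.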

Inputs

  • X (T): The input sequences packed (and potentially padded) into one 3-D tensor with the shape of [seq_length, batch_size, input_size].

  • W (T): The weight tensor for the gates. Concatenation of W[iofc] and WB[iofc] (if bidirectional) along dimension 0. The tensor has shape [num_directions, 4*hidden_size, input_size].

  • R (T): The recurrence weight tensor. Concatenation of R[iofc] and RB[iofc] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 4*hidden_size, hidden_size].

  • B (T): The bias tensor for the gates. Concatenation of [Wb[iofc], Rb[iofc]], and [WBb[iofc], RBb[iofc]] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 8*hidden_size]. Optional: If not specified - assumed to be 0.

  • sequence_lens (T1): Optional tensor specifying lengths of the sequences in a batch. If not specified - all sequences in the batch are assumed to have length seq_length. It has shape [batch_size].

  • initial_h (T): Optional initial value of the hidden state. If not specified - assumed to be 0. It has shape [num_directions, batch_size, hidden_size].

  • initial_c (T): Optional initial value of the cell state. If not specified - assumed to be 0. It has shape [num_directions, batch_size, hidden_size].

  • P (T): The weight tensor for peepholes. Concatenation of P[iof] and PB[iof] (if bidirectional) along dimension 0. It has shape [num_directions, 3*hidden_size]. Optional: If not specified - assumed to be 0.
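To make the packed layouts concrete, the sketch below builds randomly filled inputs with the shapes listed above for a bidirectional case and slices one direction back into per-gate pieces. The sizes, the random values, and the helper names `rand` and `split_gates` are assumptions made for illustration only.

```python
import numpy as np

seq_length, batch_size, input_size, hidden_size = 5, 2, 3, 4
num_directions = 2  # direction == bidirectional

rng = np.random.default_rng(0)

def rand(*shape):
    # Hypothetical helper: random float32 tensor of the given shape.
    return rng.standard_normal(shape).astype(np.float32)

X = rand(seq_length, batch_size, input_size)
W = rand(num_directions, 4 * hidden_size, input_size)
R = rand(num_directions, 4 * hidden_size, hidden_size)
B = rand(num_directions, 8 * hidden_size)
P = rand(num_directions, 3 * hidden_size)
sequence_lens = np.array([5, 3], dtype=np.int32)  # one length per batch entry
initial_h = np.zeros((num_directions, batch_size, hidden_size), dtype=np.float32)
initial_c = np.zeros_like(initial_h)

def split_gates(W, R, B, P, d):
    """Slice direction d (0 = forward, 1 = backward) into per-gate pieces."""
    Wb, Rb = np.split(B[d], 2)  # first half is Wb[iofc], second half is Rb[iofc]
    return {
        "W": dict(zip("iofc", np.split(W[d], 4, axis=0))),
        "R": dict(zip("iofc", np.split(R[d], 4, axis=0))),
        "Wb": dict(zip("iofc", np.split(Wb, 4))),
        "Rb": dict(zip("iofc", np.split(Rb, 4))),
        "P": dict(zip("iof", np.split(P[d], 3))),
    }

backward = split_gates(W, R, B, P, 1)
```

Here `backward["W"]["o"]` is the backward output-gate weight matrix with shape [hidden_size, input_size]. Weights exported from a framework that stacks gates in a different order would need to be reordered to i, o, f, c before being packed this way.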

Outputs

  • Y (T): A tensor that concatenates all the intermediate output values of the hidden state. It has shape [seq_length, num_directions, batch_size, hidden_size].

  • Y_h (T): The last output value of the hidden state. It has shape [num_directions, batch_size, hidden_size].

  • Y_c (T): The last output value of the cell state. It has shape [num_directions, batch_size, hidden_size].
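The relationship between the three outputs can be illustrated with a small NumPy reference loop. This is a sketch under simplifying assumptions, not the operator's normative behavior: it handles only the forward direction (num_directions = 1), uses the default activations, and ignores sequence_lens by treating every sequence as having length seq_length. The function name `lstm_forward` is hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(X, W, R, B=None, initial_h=None, initial_c=None, P=None):
    """Forward-direction LSTM over a whole sequence (illustrative sketch)."""
    seq_length, batch_size, _ = X.shape
    hidden_size = R.shape[-1]
    assert W.shape[0] == 1, "this sketch covers num_directions == 1 only"

    # Optional inputs default to 0, as stated in the input descriptions.
    if B is None:
        B = np.zeros((1, 8 * hidden_size), X.dtype)
    if P is None:
        P = np.zeros((1, 3 * hidden_size), X.dtype)
    h = np.zeros((batch_size, hidden_size), X.dtype) if initial_h is None else initial_h[0]
    c = np.zeros((batch_size, hidden_size), X.dtype) if initial_c is None else initial_c[0]

    Wb, Rb = np.split(B[0], 2)
    p_i, p_o, p_f = np.split(P[0], 3)

    Y = np.empty((seq_length, 1, batch_size, hidden_size), X.dtype)
    for t in range(seq_length):
        z = X[t] @ W[0].T + h @ R[0].T + Wb + Rb
        z_i, z_o, z_f, z_c = np.split(z, 4, axis=1)  # gate order i, o, f, c
        i = sigmoid(z_i + p_i * c)
        f = sigmoid(z_f + p_f * c)
        c = f * c + i * np.tanh(z_c)
        o = sigmoid(z_o + p_o * c)
        h = o * np.tanh(c)
        Y[t, 0] = h  # every intermediate hidden state is stored in Y

    Y_h = h[np.newaxis]  # [num_directions, batch_size, hidden_size]
    Y_c = c[np.newaxis]
    return Y, Y_h, Y_c
```

With full-length sequences, the last time slice of Y equals Y_h for the forward direction, while Y_c has no counterpart in Y because intermediate cell states are not returned.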

Type Constraints

  • T: Constrain input and output types to float tensors. Allowed types: tensor(bfloat16), tensor(double), tensor(float), tensor(float16).

  • T1: Constrain sequence_lens to integer tensor. Allowed types: tensor(int32).

Differences with previous version (14)

SchemaDiff: LSTM (domain 'ai.onnx')

  • old version: 14

  • new version: 22

  • breaking: no

Type constraints:

  • changed ‘T’: added types: [‘tensor(bfloat16)’]

Version History