LSTM#

Domain: ai.onnx
Since version: 22

Computes a one-layer LSTM. This operator is usually supported via a custom implementation such as cuDNN.

Notations:

X - input tensor
i - input gate
o - output gate
f - forget gate
c - cell gate
t - time step (t-1 means previous time step)
W[iofc] - W parameter weight matrix for input, output, forget, and cell gates
R[iofc] - R recurrence weight matrix for input, output, forget, and cell gates
Wb[iofc] - W bias vectors for input, output, forget, and cell gates
Rb[iofc] - R bias vectors for input, output, forget, and cell gates
P[iof] - P peephole weight vector for input, output, and forget gates
WB[iofc] - W parameter weight matrix for backward input, output, forget, and cell gates
RB[iofc] - R recurrence weight matrix for backward input, output, forget, and cell gates
WBb[iofc] - W bias vectors for backward input, output, forget, and cell gates
RBb[iofc] - R bias vectors for backward input, output, forget, and cell gates
PB[iof] - P peephole weight vector for backward input, output, and forget gates
H - Hidden state
num_directions - 2 if direction == bidirectional else 1

Activation functions:

Relu(x) - max(0, x)
Tanh(x) - (1 - e^{-2x})/(1 + e^{-2x})
Sigmoid(x) - 1/(1 + e^{-x})

NOTE: The activation functions below are optional.

Affine(x) - alpha*x + beta
LeakyRelu(x) - x if x >= 0 else alpha * x
ThresholdedRelu(x) - x if x >= alpha else 0
ScaledTanh(x) - alpha*Tanh(beta*x)
HardSigmoid(x) - min(max(alpha*x + beta, 0), 1)
Elu(x) - x if x >= 0 else alpha*(e^x - 1)
Softsign(x) - x/(1 + |x|)
Softplus(x) - log(1 + e^x)
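
The definitions above map directly onto scalar functions. The following is a minimal sketch in plain Python, not the code any runtime uses; the function names are illustrative, and alpha and beta are left as required arguments because this section does not state their defaults.

```python
import math

def relu(x):
    return max(0.0, x)

def tanh(x):
    # Written out to mirror the formula above. math.tanh(x) gives the same
    # value and does not overflow for large negative x.
    return (1 - math.exp(-2 * x)) / (1 + math.exp(-2 * x))

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def affine(x, alpha, beta):
    return alpha * x + beta

def leaky_relu(x, alpha):
    return x if x >= 0 else alpha * x

def thresholded_relu(x, alpha):
    return x if x >= alpha else 0.0

def scaled_tanh(x, alpha, beta):
    return alpha * math.tanh(beta * x)

def hard_sigmoid(x, alpha, beta):
    return min(max(alpha * x + beta, 0.0), 1.0)

def elu(x, alpha):
    return x if x >= 0 else alpha * (math.exp(x) - 1)

def softsign(x):
    return x / (1 + abs(x))

def softplus(x):
    return math.log(1 + math.exp(x))
```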

Equations (Default: f=Sigmoid, g=Tanh, h=Tanh), where * is matrix multiplication and (.) is element-wise multiplication:

it = f(Xt*(Wi^T) + Ht-1*(Ri^T) + Pi (.) Ct-1 + Wbi + Rbi)
ft = f(Xt*(Wf^T) + Ht-1*(Rf^T) + Pf (.) Ct-1 + Wbf + Rbf)
ct = g(Xt*(Wc^T) + Ht-1*(Rc^T) + Wbc + Rbc)
Ct = ft (.) Ct-1 + it (.) ct
ot = f(Xt*(Wo^T) + Ht-1*(Ro^T) + Po (.) Ct + Wbo + Rbo)
Ht = ot (.) h(Ct)
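
To make the data flow concrete, the equations can be sketched as a single forward time step in NumPy. This is an illustration under stated assumptions rather than a reference implementation: the helper name `lstm_step` and its argument layout are hypothetical, only the default activations are used, and the weights are passed for one direction with the gates stacked in i, o, f, c order. The inputs are taken from test_cc_lstm_defaults.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, R, Wb, Rb, P):
    """One forward-direction time step with the default activations.

    x_t: [batch_size, input_size]; h_prev, c_prev: [batch_size, hidden_size]
    W: [4*hidden_size, input_size]; R: [4*hidden_size, hidden_size]
    Wb, Rb: [4*hidden_size]; P: [3*hidden_size]
    """
    # Shared part of all four pre-activations, stacked in i, o, f, c order.
    gates = x_t @ W.T + h_prev @ R.T + Wb + Rb
    i, o, f, c = np.split(gates, 4, axis=-1)
    p_i, p_o, p_f = np.split(P, 3)

    i_t = sigmoid(i + p_i * c_prev)   # input gate, peephole on C_{t-1}
    f_t = sigmoid(f + p_f * c_prev)   # forget gate, peephole on C_{t-1}
    c_t = np.tanh(c)                  # candidate cell value
    C_t = f_t * c_prev + i_t * c_t    # new cell state
    o_t = sigmoid(o + p_o * C_t)      # output gate, peephole on the new C_t
    H_t = o_t * np.tanh(C_t)          # new hidden state
    return H_t, C_t

# Values from test_cc_lstm_defaults: one time step, batch of 3, hidden_size 3.
hidden_size = 3
x = np.array([[1., 2.], [3., 4.], [5., 6.]])
W = np.full((4 * hidden_size, 2), 0.1)
R = np.full((4 * hidden_size, hidden_size), 0.1)
zero_bias = np.zeros(4 * hidden_size)   # B omitted, so both biases are 0
P = np.zeros(3 * hidden_size)           # P omitted, so peepholes are 0
h0 = np.zeros((3, hidden_size))
c0 = np.zeros((3, hidden_size))

H, C = lstm_step(x, h0, c0, W, R, zero_bias, zero_bias, P)
print(H)
```

With these inputs, H should agree with the Y_h listed for test_cc_lstm_defaults up to float32 rounding.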

Inputs

X (T): The input sequences packed (and potentially padded) into one 3-D tensor with the shape of [seq_length, batch_size, input_size].
W (T): The weight tensor for the gates. Concatenation of W[iofc] and WB[iofc] (if bidirectional) along dimension 0. The tensor has shape [num_directions, 4*hidden_size, input_size].
R (T): The recurrence weight tensor. Concatenation of R[iofc] and RB[iofc] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 4*hidden_size, hidden_size].
B (T): The bias tensor for the gates. Concatenation of [Wb[iofc], Rb[iofc]], and [WBb[iofc], RBb[iofc]] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 8*hidden_size]. Optional: If not specified - assumed to be 0.
sequence_lens (T1): Optional tensor specifying lengths of the sequences in a batch. If not specified, all sequences in the batch are assumed to have length seq_length. It has shape [batch_size].
initial_h (T): Optional initial value of the hidden state. If not specified - assumed to be 0. It has shape [num_directions, batch_size, hidden_size].
initial_c (T): Optional initial value of the cell state. If not specified - assumed to be 0. It has shape [num_directions, batch_size, hidden_size].
P (T): The weight tensor for peepholes. Concatenation of P[iof] and PB[iof] (if bidirectional) along dimension 0. It has shape [num_directions, 3*hidden_size]. Optional: If not specified - assumed to be 0.
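
The packed layout of B and P can be unpacked with plain slicing. The sketch below assumes a forward-only LSTM (num_directions = 1) and rebuilds the B tensor from test_cc_lstm_with_initial_bias; the per-gate variable names follow the notation list and are otherwise illustrative.

```python
import numpy as np

hidden_size = 4
num_directions = 1

# B as in test_cc_lstm_with_initial_bias: W biases are 0.1, R biases are 0.
B = np.concatenate(
    [
        np.full((num_directions, 4 * hidden_size), 0.1, dtype=np.float32),
        np.zeros((num_directions, 4 * hidden_size), dtype=np.float32),
    ],
    axis=1,
)
assert B.shape == (num_directions, 8 * hidden_size)

# For the forward direction, the first half is Wb[iofc], the second Rb[iofc].
Wb, Rb = np.split(B[0], 2)

# Each half is itself the four gates concatenated in i, o, f, c order.
Wbi, Wbo, Wbf, Wbc = np.split(Wb, 4)
Rbi, Rbo, Rbf, Rbc = np.split(Rb, 4)

# P holds three peephole vectors per direction, in i, o, f order.
P = np.full((num_directions, 3 * hidden_size), 0.1, dtype=np.float32)
Pi, Po, Pf = np.split(P[0], 3)

print(Wbi.shape, Rbc.shape, Pi.shape)
```

W and R split the same way along their second axis, for example `np.split(W[0], 4, axis=0)` yields the four [hidden_size, input_size] gate matrices.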

Outputs

Y (T): A tensor that concatenates all the intermediate output values of the hidden state. It has shape [seq_length, num_directions, batch_size, hidden_size].
Y_h (T): The last output value of the hidden state. It has shape [num_directions, batch_size, hidden_size].
Y_c (T): The last output value of the cell state. It has shape [num_directions, batch_size, hidden_size].
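
The relationship between the three outputs is easiest to see in a loop over time steps. The following is a simplified sketch, assuming the forward direction only, no B, P, or sequence_lens inputs, and the default activations; `lstm_forward` is a hypothetical helper, and the random inputs are there only to exercise the shapes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(X, W, R, hidden_size):
    """Forward-direction LSTM without bias, peepholes, or sequence_lens.

    X: [seq_length, batch_size, input_size]
    W: [1, 4*hidden_size, input_size]; R: [1, 4*hidden_size, hidden_size]
    """
    seq_length, batch_size, _ = X.shape
    H = np.zeros((batch_size, hidden_size), dtype=X.dtype)
    C = np.zeros((batch_size, hidden_size), dtype=X.dtype)
    steps = []
    for t in range(seq_length):
        gates = X[t] @ W[0].T + H @ R[0].T
        i, o, f, c = np.split(gates, 4, axis=-1)  # i, o, f, c order
        C = sigmoid(f) * C + sigmoid(i) * np.tanh(c)
        H = sigmoid(o) * np.tanh(C)
        steps.append(H)
    # Insert the num_directions axis (size 1 for a forward-only LSTM).
    Y = np.stack(steps)[:, np.newaxis]
    return Y, H[np.newaxis], C[np.newaxis]

rng = np.random.default_rng(0)
hidden_size = 4
X = rng.standard_normal((5, 2, 3)).astype(np.float32)
W = rng.standard_normal((1, 4 * hidden_size, 3)).astype(np.float32)
R = rng.standard_normal((1, 4 * hidden_size, hidden_size)).astype(np.float32)

Y, Y_h, Y_c = lstm_forward(X, W, R, hidden_size)
print(Y.shape, Y_h.shape, Y_c.shape)  # (5, 1, 2, 4) (1, 2, 4) (1, 2, 4)
```

In this forward-only, full-length case Y_h is simply the last time slice of Y, that is `Y[-1]`.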

Type Constraints

T: Constrain input and output types to float tensors. Allowed types: tensor(bfloat16), tensor(double), tensor(float), tensor(float16).
T1: Constrain sequence_lens to integer tensor. Allowed types: tensor(int32).

Examples#

test_cc_lstm_batchwise

Node:
  LSTM(X, W, R) -> (Y, Y_h)
  Attributes:
    hidden_size = 7
    layout = 1

Inputs:
  X: shape=(3, 1, 2), dtype=float32
    [[[1., 2.]],

     [[3., 4.]],

     [[5., 6.]]]
  W: shape=(1, 28, 2), dtype=float32
    [[[0.3, 0.3],
      [0.3, 0.3],
      [0.3, 0.3],
      [0.3, 0.3],
      [0.3, 0.3],
      [0.3, 0.3],
      [0.3, 0.3],
      [0.3, 0.3],
      [0.3, 0.3],
      [0.3, 0.3],
      [0.3, 0.3],
      [0.3, 0.3],
      [0.3, 0.3],
      [0.3, 0.3],
      [0.3, 0.3],
      [0.3, 0.3],
      [0.3, 0.3],
      [0.3, 0.3],
      [0.3, 0.3],
      [0.3, 0.3],
      [0.3, 0.3],
      [0.3, 0.3],
      [0.3, 0.3],
      [0.3, 0.3],
      [0.3, 0.3],
      [0.3, 0.3],
      [0.3, 0.3],
      [0.3, 0.3]]]
  R: shape=(1, 28, 7), dtype=float32
    [[[0.3, 0.3, 0.3, ..., 0.3, 0.3, 0.3],
      [0.3, 0.3, 0.3, ..., 0.3, 0.3, 0.3],
      [0.3, 0.3, 0.3, ..., 0.3, 0.3, 0.3],
      ...,
      [0.3, 0.3, 0.3, ..., 0.3, 0.3, 0.3],
      [0.3, 0.3, 0.3, ..., 0.3, 0.3, 0.3],
      [0.3, 0.3, 0.3, ..., 0.3, 0.3, 0.3]]]

Outputs:
  Y: shape=(3, 1, 1, 7), dtype=float32
    [[[[0.3336926 , 0.3336926 , 0.3336926 , 0.3336926 , 0.3336926 , 0.3336926 ,
        0.3336926 ]]],


     [[[0.6223932 , 0.6223932 , 0.6223932 , 0.6223932 , 0.6223932 , 0.6223932 ,
        0.6223932 ]]],


     [[[0.71857905, 0.71857905, 0.71857905, 0.71857905, 0.71857905, 0.71857905,
        0.71857905]]]]
  Y_h: shape=(3, 1, 7), dtype=float32
    [[[0.3336926 , 0.3336926 , 0.3336926 , 0.3336926 , 0.3336926 , 0.3336926 ,
       0.3336926 ]],

     [[0.6223932 , 0.6223932 , 0.6223932 , 0.6223932 , 0.6223932 , 0.6223932 ,
       0.6223932 ]],

     [[0.71857905, 0.71857905, 0.71857905, 0.71857905, 0.71857905, 0.71857905,
       0.71857905]]]
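
The shapes in this example differ from those in the Inputs and Outputs lists because of layout = 1. The attribute is not described in this section, so the sketch below relies on an assumption taken from the ONNX operator specification: layout = 1 puts the batch axis first, giving X as [batch_size, seq_length, input_size], Y as [batch_size, seq_length, num_directions, hidden_size], and Y_h as [batch_size, num_directions, hidden_size]. The output arrays are zero-filled placeholders used only to show the axis permutations.

```python
import numpy as np

# Layout-1 input from test_cc_lstm_batchwise: [batch_size, seq_length, input_size].
X_batchwise = np.array([[[1., 2.]], [[3., 4.]], [[5., 6.]]], dtype=np.float32)

# Convert to layout 0: [seq_length, batch_size, input_size].
X = np.swapaxes(X_batchwise, 0, 1)

# Placeholder layout-0 outputs with the right shapes (values are not computed).
seq_length, batch_size, hidden_size, num_directions = 1, 3, 7, 1
Y = np.zeros((seq_length, num_directions, batch_size, hidden_size), dtype=np.float32)
Y_h = np.zeros((num_directions, batch_size, hidden_size), dtype=np.float32)

# Convert the outputs back to layout 1.
Y_batchwise = np.transpose(Y, (2, 0, 1, 3))    # [batch, seq, directions, hidden]
Y_h_batchwise = np.transpose(Y_h, (1, 0, 2))   # [batch, directions, hidden]

print(X.shape, Y_batchwise.shape, Y_h_batchwise.shape)  # (1, 3, 2) (3, 1, 1, 7) (3, 1, 7)
```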

test_cc_lstm_defaults

Node:
  LSTM(X, W, R) -> ("", Y_h)
  Attributes:
    hidden_size = 3

Inputs:
  X: shape=(1, 3, 2), dtype=float32
    [[[1., 2.],
      [3., 4.],
      [5., 6.]]]
  W: shape=(1, 12, 2), dtype=float32
    [[[0.1, 0.1],
      [0.1, 0.1],
      [0.1, 0.1],
      [0.1, 0.1],
      [0.1, 0.1],
      [0.1, 0.1],
      [0.1, 0.1],
      [0.1, 0.1],
      [0.1, 0.1],
      [0.1, 0.1],
      [0.1, 0.1],
      [0.1, 0.1]]]
  R: shape=(1, 12, 3), dtype=float32
    [[[0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1]]]

Outputs:
  Y_h: shape=(1, 3, 3), dtype=float32
    [[[0.09524119, 0.09524119, 0.09524119],
      [0.25606444, 0.25606444, 0.25606444],
      [0.40323776, 0.40323776, 0.40323776]]]

test_cc_lstm_with_initial_bias

Node:
  LSTM(X, W, R, B) -> ("", Y_h)
  Attributes:
    hidden_size = 4

Inputs:
  X: shape=(1, 3, 3), dtype=float32
    [[[1., 2., 3.],
      [4., 5., 6.],
      [7., 8., 9.]]]
  W: shape=(1, 16, 3), dtype=float32
    [[[0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1]]]
  R: shape=(1, 16, 4), dtype=float32
    [[[0.1, 0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1, 0.1]]]
  B: shape=(1, 32), dtype=float32
    [[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1,
      0.1, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
      0. , 0. ]]

Outputs:
  Y_h: shape=(1, 3, 4), dtype=float32
    [[[0.25606444, 0.25606444, 0.25606444, 0.25606444],
      [0.5367278 , 0.5367278 , 0.5367278 , 0.5367278 ],
      [0.6672132 , 0.6672132 , 0.6672132 , 0.6672132 ]]]

test_cc_lstm_with_peepholes

Node:
  LSTM(X, W, R, B, sequence_lens, initial_h, initial_c, P) -> ("", Y_h)
  Attributes:
    hidden_size = 3

Inputs:
  X: shape=(1, 2, 4), dtype=float32
    [[[1., 2., 3., 4.],
      [5., 6., 7., 8.]]]
  W: shape=(1, 12, 4), dtype=float32
    [[[0.1, 0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1, 0.1]]]
  R: shape=(1, 12, 3), dtype=float32
    [[[0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1]]]
  B: shape=(1, 24), dtype=float32
    [[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
      0., 0., 0., 0., 0.]]
  sequence_lens: shape=(2,), dtype=int32
    [1, 1]
  initial_h: shape=(1, 2, 3), dtype=float32
    [[[0., 0., 0.],
      [0., 0., 0.]]]
  initial_c: shape=(1, 2, 3), dtype=float32
    [[[0., 0., 0.],
      [0., 0., 0.]]]
  P: shape=(1, 9), dtype=float32
    [[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]]

Outputs:
  Y_h: shape=(1, 2, 3), dtype=float32
    [[[0.3750691 , 0.3750691 , 0.3750691 ],
      [0.68013096, 0.68013096, 0.68013096]]]
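
Because every weight in this example has the same value, each hidden unit performs the same scalar computation, which makes the arithmetic easy to follow by hand. The sketch below is a worked calculation under the example's conditions (zero initial states, zero biases, one time step, default activations); `one_unit` is a hypothetical helper, not part of any ONNX API.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def one_unit(x_row, w=0.1, p=0.1):
    """Scalar view of one hidden unit for a single time step.

    Assumes zero initial hidden and cell states and zero biases,
    as in test_cc_lstm_with_peepholes.
    """
    pre = w * sum(x_row)           # Xt*(W^T); the Ht-1 and bias terms vanish
    c_prev = 0.0
    i = sigmoid(pre + p * c_prev)  # peephole term is 0 because C_{t-1} = 0
    f = sigmoid(pre + p * c_prev)
    c = math.tanh(pre)
    C = f * c_prev + i * c
    o = sigmoid(pre + p * C)       # the only peephole that contributes here
    return o * math.tanh(C)

for row in ([1., 2., 3., 4.], [5., 6., 7., 8.]):
    print(one_unit(row))
```

The two printed values should land close to the Y_h entries listed above, 0.3750691 and 0.68013096. Since sequence_lens is [1, 1] and seq_length is 1, the sequence lengths do not change the result.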

Differences with previous version (14)#

SchemaDiff: LSTM (domain 'ai.onnx')

old version: 14
new version: 22
breaking: no

Type constraints:

changed ‘T’: added types: [‘tensor(bfloat16)’]

Version History#