GRU - 1 vs 14
The next section compares an older with a newer version of the same operator after both definitions are converted into markdown text. Green means an addition to the newer version, red means a deletion. Anything else is unchanged.
- GRU1 → GRU14 +10 -27
GRU1 → GRU14
RENAMED
@@ -1 +1 @@
  Computes an one-layer GRU. This operator is usually supported via some custom
  implementation such as CuDNN.
  Notations:
  X - input tensor
  z - update gate
  r - reset gate
  h - hidden gate
  t - time step (t-1 means previous time step)
  W[zrh] - W parameter weight matrix for update, reset, and hidden gates
  R[zrh] - R recurrence weight matrix for update, reset, and hidden gates
  Wb[zrh] - W bias vectors for update, reset, and hidden gates
  Rb[zrh] - R bias vectors for update, reset, and hidden gates
  WB[zrh] - W parameter weight matrix for backward update, reset, and hidden gates
  RB[zrh] - R recurrence weight matrix for backward update, reset, and hidden gates
  WBb[zrh] - W bias vectors for backward update, reset, and hidden gates
  RBb[zrh] - R bias vectors for backward update, reset, and hidden gates
  H - Hidden state
  num_directions - 2 if direction == bidirectional else 1
  Activation functions:
  Relu(x) - max(0, x)
  Tanh(x) - (1 - e^{-2x})/(1 + e^{-2x})
  Sigmoid(x) - 1/(1 + e^{-x})
  (NOTE: Below are optional)
  Affine(x) - alpha*x + beta
  LeakyRelu(x) - x if x >= 0 else alpha * x
  ThresholdedRelu(x) - x if x >= alpha else 0
  ScaledTanh(x) - alpha*Tanh(beta*x)
  HardSigmoid(x) - min(max(alpha*x + beta, 0), 1)
  Elu(x) - x if x >= 0 else alpha*(e^x - 1)
  Softsign(x) - x/(1 + |x|)
  Softplus(x) - log(1 + e^x)
  Equations (Default: f=Sigmoid, g=Tanh):
- - zt = f(Xt*(Wz^T) + Ht-1*(Rz^T) + Wbz + Rbz)
+ - zt = f(Xt*(Wz^T) + Ht-1*Rz + Wbz + Rbz)
- - rt = f(Xt*(Wr^T) + Ht-1*(Rr^T) + Wbr + Rbr)
+ - rt = f(Xt*(Wr^T) + Ht-1*Rr + Wbr + Rbr)
- - ht = g(Xt*(Wh^T) + (rt (.) Ht-1)*(Rh^T) + Rbh + Wbh) # default, when linear_before_reset = 0
+ - ht = g(Xt*(Wh^T) + (rt (.) Ht-1)*Rh + Rbh + Wbh) # default, when linear_before_reset = 0
- - ht = g(Xt*(Wh^T) + (rt (.) (Ht-1*(Rh^T) + Rbh) + Wbh) # when linear_before_reset != 0
+ - ht = g(Xt*(Wh^T) + (rt (.) (Ht-1*Rh + Rbh) + Wbh) # when linear_before_reset != 0
  - Ht = (1 - zt) (.) ht + zt (.) Ht-1
- This operator has **optional** inputs/outputs. See ONNX <https://github.com/onnx/onnx/blob/master/docs/IR.md>_ for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument's name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.
  **Attributes**
  * **activation_alpha**:
  Optional scaling values used by some activation functions. The
  values are consumed in the order of activation functions, for
+ example (f, g, h) in LSTM.
- example (f, g, h) in LSTM. Default values are the same as of
- corresponding ONNX operators.For example with LeakyRelu, the default
- alpha is 0.01.
  * **activation_beta**:
  Optional scaling values used by some activation functions. The
  values are consumed in the order of activation functions, for
+ example (f, g, h) in LSTM.
- example (f, g, h) in LSTM. Default values are the same as of
- corresponding ONNX operators.
  * **activations**:
  A list of 2 (or 4 if bidirectional) activation functions for update,
  reset, and hidden gates. The activation functions must be one of the
  activation functions specified above. Optional: See the equations
  for default if not specified.
  * **clip**:
  Cell clip threshold. Clipping bounds the elements of a tensor in the
  range of [-threshold, +threshold] and is applied to the input of
  activations. No clip if not specified.
  * **direction**:
  Specify if the RNN is forward, reverse, or bidirectional. Must be
  one of forward (default), reverse, or bidirectional.
  * **hidden_size**:
  Number of neurons in the hidden layer
+ * **output_sequence**:
+ The sequence output for the hidden is optional if 0. Default 0.
- * **layout**:
- The shape format of inputs X, initial_h and outputs Y, Y_h. If 0,
- the following shapes are expected: X.shape = [seq_length,
- batch_size, input_size], Y.shape = [seq_length, num_directions,
- batch_size, hidden_size], initial_h.shape = Y_h.shape =
- [num_directions, batch_size, hidden_size]. If 1, the following
- shapes are expected: X.shape = [batch_size, seq_length, input_size],
- Y.shape = [batch_size, seq_length, num_directions, hidden_size],
- initial_h.shape = Y_h.shape = [batch_size, num_directions,
- hidden_size].
- * **linear_before_reset**:
- When computing the output of the hidden gate, apply the linear
- transformation before multiplying by the output of the reset gate.
  **Inputs**
  Between 3 and 6 inputs.
  * **X** (heterogeneous) - **T**:
  The input sequences packed (and potentially padded) into one 3-D
  tensor with the shape of [seq_length, batch_size, input_size].
  * **W** (heterogeneous) - **T**:
  The weight tensor for the gates. Concatenation of W[zrh] and
  WB[zrh] (if bidirectional) along dimension 0. This tensor has
  shape [num_directions, 3*hidden_size, input_size].
  * **R** (heterogeneous) - **T**:
  The recurrence weight tensor. Concatenation of R[zrh] and
  RB[zrh] (if bidirectional) along dimension 0. This tensor has
  shape [num_directions, 3*hidden_size, hidden_size].
  * **B** (optional, heterogeneous) - **T**:
  The bias tensor for the gates. Concatenation of [Wb[zrh], Rb[zrh]]
  and [WBb[zrh], RBb[zrh]] (if bidirectional) along dimension 0.
  This tensor has shape [num_directions, 6*hidden_size]. Optional:
  If not specified - assumed to be 0
  * **sequence_lens** (optional, heterogeneous) - **T1**:
  Optional tensor specifying lengths of the sequences in a batch. If
  not specified - assumed all sequences in the batch to have length
  seq_length. It has shape [batch_size].
  * **initial_h** (optional, heterogeneous) - **T**:
  Optional initial value of the hidden. If not specified - assumed to
  be 0. It has shape [num_directions, batch_size, hidden_size].
  **Outputs**
- Between 0 and 2 outputs.
-
  * **Y** (optional, heterogeneous) - **T**:
  A tensor that concats all the intermediate output values of the
  hidden. It has shape [seq_length, num_directions, batch_size,
- hidden_size].
+ hidden_size]. It is optional if output_sequence is 0.
- * **Y_h** (optional, heterogeneous) - **T**:
+ * **Y_h** (heterogeneous) - **T**:
94
|
The last output value of the hidden. It has shape [num_directions,
|
112
95
|
batch_size, hidden_size].
|
113
96
|
**Type Constraints**
|
114
97
|
* **T** in (
|
115
98
|
tensor(double),
|
116
99
|
tensor(float),
|
117
100
|
tensor(float16)
|
118
101
|
):
|
119
102
|
Constrain input and output types to float tensors.
|
120
103
|
* **T1** in (
|
121
104
|
tensor(int32)
|
122
105
|
):
|
123
106
|
Constrain seq_lens to integer tensor.
|
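To make the equations in the diff concrete, below is a minimal NumPy sketch of one forward-direction GRU time step. It uses the transposed-recurrence notation (Ht-1*(Rz^T)) and covers both linear_before_reset settings; the function name gru_step, the variable names, and the shapes in the usage snippet are illustrative assumptions, not part of the ONNX API.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W, R, B, linear_before_reset=0):
    """One GRU step for a single direction (a sketch, not the reference ONNX kernel).

    x_t    : [batch_size, input_size]      -- Xt
    h_prev : [batch_size, hidden_size]     -- Ht-1
    W      : [3*hidden_size, input_size]   -- W[zrh] stacked along dim 0
    R      : [3*hidden_size, hidden_size]  -- R[zrh] stacked along dim 0
    B      : [6*hidden_size]               -- [Wb[zrh], Rb[zrh]]
    """
    Wz, Wr, Wh = np.split(W, 3, axis=0)
    Rz, Rr, Rh = np.split(R, 3, axis=0)
    Wbz, Wbr, Wbh, Rbz, Rbr, Rbh = np.split(B, 6)

    z = sigmoid(x_t @ Wz.T + h_prev @ Rz.T + Wbz + Rbz)  # update gate zt
    r = sigmoid(x_t @ Wr.T + h_prev @ Rr.T + Wbr + Rbr)  # reset gate rt
    if linear_before_reset:
        # recurrence transform applied to Ht-1 first, then gated by rt
        h_tilde = np.tanh(x_t @ Wh.T + r * (h_prev @ Rh.T + Rbh) + Wbh)
    else:
        # default: gate Ht-1 by rt before the recurrence transform
        h_tilde = np.tanh(x_t @ Wh.T + (r * h_prev) @ Rh.T + Rbh + Wbh)
    return (1.0 - z) * h_tilde + z * h_prev  # Ht

# Illustrative shapes only: batch_size=2, input_size=4, hidden_size=3.
rng = np.random.default_rng(0)
h = gru_step(rng.standard_normal((2, 4)), np.zeros((2, 3)),
             rng.standard_normal((9, 4)), rng.standard_normal((9, 3)),
             rng.standard_normal(18))
print(h.shape)  # (2, 3)
```

A full GRU evaluation would additionally loop over seq_length, slice the per-direction weights out of the [num_directions, ...] tensors, and honor sequence_lens; the sketch omits those steps.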