Softmax - 11 vs 13

Files changed (1): Softmax11 → Softmax13 (renamed, +10 -19)
@@ -1 +1 @@
- The operator computes the softmax (normalized exponential) values for each layer in the batch
+ The operator computes the normalized exponential values for the given input:
+ Softmax(input, axis) = Exp(input) / ReduceSum(Exp(input), axis=axis, keepdims=1)
+
- of the given input.
+ The "axis" attribute indicates the dimension along which Softmax
- The input does not need to explicitly be a 2D vector; rather, it will be
- coerced into one. For an arbitrary n-dimensional tensor
- input in [a_0, a_1, ..., a_{k-1}, a_k, ..., a_{n-1}] and k is
- the axis provided, then input will be coerced into a 2-dimensional tensor with
- dimensions [a_0 * ... * a_{k-1}, a_k * ... * a_{n-1}]. For the default
- case where axis=1, this means the input tensor will be coerced into a 2D tensor
- of dimensions [a_0, a_1 * ... * a_{n-1}], where a_0 is often the batch size.
- In this situation, we must have a_0 = N and a_1 * ... * a_{n-1} = D.
- Each of these dimensions must be matched correctly, or else the operator
- will throw errors. The output tensor has the same shape
+ will be performed. The output tensor has the same shape
- and contains the softmax values of the corresponding input.
+ and contains the Softmax values of the corresponding input.
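The practical effect of this change is that Softmax-11 normalizes over every dimension from `axis` onward (after coercing the input to 2D), while Softmax-13 normalizes along the single dimension `axis`. The sketch below illustrates both behaviours in pure Python, assuming a tensor stored as a flat row-major list plus a shape. The function names `softmax_v11` and `softmax_v13` are hypothetical and are not part of the ONNX or ONNX Runtime API; subtracting the maximum before exponentiating is a standard stability trick that is not in the spec text but cancels out in the ratio.

```python
import math
from typing import List, Sequence


def _prod(dims: Sequence[int]) -> int:
    """Product of dimension sizes; 1 for an empty sequence."""
    p = 1
    for d in dims:
        p *= d
    return p


def _softmax_1d(xs: List[float]) -> List[float]:
    """Softmax of one vector. Subtracting the max cancels in the ratio,
    so the result equals Exp(x) / Sum(Exp(x))."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_v11(data: List[float], shape: Sequence[int], axis: int) -> List[float]:
    """Hypothetical Softmax-11 sketch: coerce to 2D
    [a_0*...*a_{k-1}, a_k*...*a_{n-1}] and normalize each row,
    i.e. over every dimension from `axis` onward."""
    if axis < 0:
        axis += len(shape)
    rows = _prod(shape[:axis])
    cols = _prod(shape[axis:])
    out: List[float] = []
    for r in range(rows):
        out.extend(_softmax_1d(data[r * cols:(r + 1) * cols]))
    return out


def softmax_v13(data: List[float], shape: Sequence[int], axis: int) -> List[float]:
    """Hypothetical Softmax-13 sketch: normalize along the single dimension
    `axis`, i.e. Exp(input) / ReduceSum(Exp(input), axis=axis, keepdims=1)."""
    if axis < 0:
        axis += len(shape)
    outer = _prod(shape[:axis])
    size = shape[axis]
    inner = _prod(shape[axis + 1:])
    out = [0.0] * len(data)
    for o in range(outer):
        for i in range(inner):
            # Row-major flat indices of the `size` elements along `axis`.
            idx = [(o * size + k) * inner + i for k in range(size)]
            vals = _softmax_1d([data[t] for t in idx])
            for j, v in zip(idx, vals):
                out[j] = v
    return out


# Rank-3 input of shape [1, 2, 2], all zeros, axis=1.
x = [0.0] * 4
print(softmax_v11(x, [1, 2, 2], axis=1))  # [0.25, 0.25, 0.25, 0.25]
print(softmax_v13(x, [1, 2, 2], axis=1))  # [0.5, 0.5, 0.5, 0.5]
```

For a rank-2 input with axis=1 the two sketches agree, because "every dimension from axis onward" is then just one dimension; the results only diverge for inputs of rank 3 or higher, as in the example above.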
  **Attributes**
  * **axis**:
+ Describes the dimension Softmax will be performed on. Negative
- Describes the axis of the inputs when coerced to 2D; defaults to one
- because the 0th axis most likely describes the batch_size. Negative
  value means counting dimensions from the back. Accepted range is
  [-r, r-1] where r = rank(input).
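Both versions accept a negative `axis` that counts from the back, restricted to [-r, r-1]. A minimal sketch of how an implementation might validate such a value and map it to a non-negative dimension index follows; `normalize_axis` is a hypothetical helper, not a function from the ONNX API.

```python
def normalize_axis(axis: int, rank: int) -> int:
    """Hypothetical helper: map an axis in [-rank, rank-1] to [0, rank-1]."""
    if not -rank <= axis <= rank - 1:
        raise ValueError(f"axis {axis} is outside [{-rank}, {rank - 1}] for rank {rank}")
    # A negative axis counts from the back: -1 is the last dimension.
    return axis + rank if axis < 0 else axis


print(normalize_axis(-1, 4))  # 3
print(normalize_axis(2, 4))   # 2
```

Rejecting out-of-range values, rather than wrapping them with a modulo, keeps the check aligned with the accepted range stated above.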
  **Inputs**
  * **input** (heterogeneous) - **T**:
+ The input tensor of rank >= axis.
- The input tensor that's coerced into a 2D matrix of size (NxD) as
- described above.
  **Outputs**
  * **output** (heterogeneous) - **T**:
- The output values with the same shape as input tensor (the original
+ The output values with the same shape as the input tensor.
- size without coercion).
  **Type Constraints**
  * **T** in (
+ tensor(bfloat16),
  tensor(double),
  tensor(float),
  tensor(float16)
  ):
  Constrain input and output types to float tensors.