SoftmaxCrossEntropyLoss - 12 vs 13

The next section compares an older version of this operator with a newer one, after both definitions are converted into markdown text. Green means an addition to the newer version, red means a deletion. Anything else is unchanged.

SoftmaxCrossEntropyLoss12 → SoftmaxCrossEntropyLoss13 RENAMED
```diff
@@ -1 +1 @@
  Loss function that measures the softmax cross entropy
  between 'scores' and 'labels'.
  This operator first computes a loss tensor whose shape is identical to the labels input.
  If the input is 2-D with shape (N, C), the loss tensor may be an N-element vector L = (l_1, l_2, ..., l_N).
  If the input is an N-D tensor with shape (N, C, D1, D2, ..., Dk),
  the loss tensor L may have (N, D1, D2, ..., Dk) as its shape, and L[i][j_1][j_2]...[j_k] denotes a scalar element in L.
  After L is available, this operator can optionally apply a reduction to it.
  shape(scores): (N, C) where C is the number of classes, or (N, C, D1, D2, ..., Dk),
  with K >= 1 in case of K-dimensional loss.
  shape(labels): (N) where each value is 0 <= labels[i] <= C-1, or (N, D1, D2, ..., Dk),
  with K >= 1 in case of K-dimensional loss.
  The loss for one sample, l_i, can be calculated as follows:
  l[i][d1][d2]...[dk] = -y[i][c][d1][d2]...[dk], where c is the class index,
  or
  l[i][d1][d2]...[dk] = -y[i][c][d1][d2]...[dk] * weights[c], if 'weights' is provided.
  The loss is zero when the label value equals ignore_index:
  l[i][d1][d2]...[dk] = 0, when labels[i][d1][d2]...[dk] = ignore_index
  where:
  p = Softmax(scores)
  y = Log(p)
  c = labels[i][d1][d2]...[dk]
  Finally, L is optionally reduced:
  If reduction = 'none', the output is L with shape (N, D1, D2, ..., Dk).
  If reduction = 'sum', the output is the scalar Sum(L).
  If reduction = 'mean', the output is the scalar ReduceMean(L), or, if 'weights' is provided, ReduceSum(L) / ReduceSum(W),
  where tensor W has shape (N, D1, D2, ..., Dk) and W[i][d1][d2]...[dk] = weights[labels[i][d1][d2]...[dk]].
  **Attributes**
  * **ignore_index**:
  Specifies a target value that is ignored and does not contribute to
  the input gradient. It is an optional value.
  * **reduction**:
  Type of reduction to apply to the loss: none, sum, mean (default).
  'none': no reduction will be applied. 'sum': the output will be
  summed. 'mean': the sum of the output will be divided by the number
  of elements in the output.
  **Inputs**
  Between 2 and 3 inputs.
  * **scores** (heterogeneous) - **T**:
  The predicted outputs with shape [batch_size, class_size], or
  [batch_size, class_size, D1, D2, ..., Dk], where K is the number of
  dimensions.
  * **labels** (heterogeneous) - **Tind**:
  The ground truth output tensor, with shape [batch_size], or
  [batch_size, D1, D2, ..., Dk], where K is the number of dimensions.
  Label element values shall be in the range [0, C). If ignore_index is
  specified, each label value should either be in the range [0, C) or
  equal ignore_index (which may lie outside [0, C)).
  * **weights** (optional, heterogeneous) - **T**:
  A manual rescaling weight given to each class. If given, it has to
  be a 1-D tensor assigning a weight to each of the classes. Otherwise,
  it is treated as if having all ones.
  **Outputs**
  Between 1 and 2 outputs.
  * **output** (heterogeneous) - **T**:
  Weighted loss float tensor. If reduction is 'none', this has the
  shape [batch_size], or [batch_size, D1, D2, ..., Dk] in the case of
  K-dimensional loss. Otherwise, it is a scalar.
  * **log_prob** (optional, heterogeneous) - **T**:
  Log probability tensor. If the output of softmax is prob, its value
  is log(prob).
  **Type Constraints**
  * **T** in (
- tensor(bfloat16),
  tensor(double),
  tensor(float),
  tensor(float16)
  ):
  Constrain input and output types to float tensors.
  * **Tind** in (
  tensor(int32),
  tensor(int64)
  ):
  Constrain target to integer types.
```
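
The formulas quoted in the diff can be made concrete with a small reference sketch. The NumPy function below is not the normative definition of the operator, just a minimal illustration of the text above: the function name is invented here, and the way 'mean' interacts with ignore_index (dividing by the sum of the effective weights W) is an assumption, since the spec text only spells that formula out for the case where 'weights' is provided.

```python
import numpy as np

def softmax_cross_entropy_loss(scores, labels, weights=None,
                               reduction="mean", ignore_index=None):
    """Sketch of SoftmaxCrossEntropyLoss semantics (name is illustrative).

    scores: float array of shape (N, C) or (N, C, D1, ..., Dk)
    labels: int array of shape (N,) or (N, D1, ..., Dk)
    Returns (loss, log_prob), mirroring the operator's two outputs.
    """
    # p = Softmax(scores); y = Log(p). Computed as a log-softmax over the
    # class axis (axis 1), with the max-shift for numerical stability.
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    mask = np.zeros(labels.shape, dtype=bool)
    safe_labels = labels
    if ignore_index is not None:
        # Ignored positions may hold values outside [0, C); remap them to 0
        # so the gather below stays in range, then zero their loss.
        mask = labels == ignore_index
        safe_labels = np.where(mask, 0, labels)

    # l[i][d1]...[dk] = -y[i][c][d1]...[dk], with c = labels[i][d1]...[dk]
    loss = -np.take_along_axis(
        log_prob, np.expand_dims(safe_labels, axis=1), axis=1
    ).squeeze(axis=1)

    # W[i][d1]...[dk] = weights[labels[i][d1]...[dk]]; all ones when no
    # 'weights' input, and 0 where the label equals ignore_index.
    if weights is None:
        w = np.ones(loss.shape, dtype=scores.dtype)
    else:
        w = weights[safe_labels]
    w = np.where(mask, 0.0, w)
    loss = loss * w

    if reduction == "none":
        return loss, log_prob
    if reduction == "sum":
        return loss.sum(), log_prob
    # 'mean': ReduceSum(L) / ReduceSum(W) per the spec text when 'weights'
    # is given; applying the same formula when ignore_index zeroes elements
    # is this sketch's assumption.
    if weights is not None or ignore_index is not None:
        return loss.sum() / w.sum(), log_prob
    return loss.mean(), log_prob
```

Under these assumptions, `softmax_cross_entropy_loss(scores, labels, reduction="none")` returns the per-sample tensor L together with log_prob, matching the two outputs listed above. A node for this operator can be built with `onnx.helper`, for example:

```python
from onnx import helper

# The 'reduction' attribute is a string; ignore_index would be passed the
# same way as an optional int attribute.
node = helper.make_node(
    "SoftmaxCrossEntropyLoss",
    inputs=["scores", "labels", "weights"],  # 'weights' is optional
    outputs=["output", "log_prob"],          # 'log_prob' is optional
    reduction="mean",
)
```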