SoftmaxCrossEntropyLoss - 12 vs 13
The following section compares an older version of the operator with a newer version after both definitions have been converted to markdown text. Lines prefixed with `+` are additions in the newer version, lines prefixed with `-` are deletions. Everything else is unchanged.
SoftmaxCrossEntropyLoss12 → SoftmaxCrossEntropyLoss13
```diff
@@ -1,72 +1,73 @@
 Loss function that measures the softmax cross entropy
 between 'scores' and 'labels'.
 This operator first computes a loss tensor whose shape is identical to the labels input.
 If the input is 2-D with shape (N, C), the loss tensor may be an N-element vector L = (l_1, l_2, ..., l_N).
 If the input is an N-D tensor with shape (N, C, D1, D2, ..., Dk),
 the loss tensor L may have (N, D1, D2, ..., Dk) as its shape and L[i][j_1][j_2]...[j_k] denotes a scalar element in L.
 After L is available, this operator can optionally apply a reduction to it.
 shape(scores): (N, C) where C is the number of classes, or (N, C, D1, D2, ..., Dk),
 with K >= 1 in case of K-dimensional loss.
 shape(labels): (N) where each value is 0 <= labels[i] <= C-1, or (N, D1, D2, ..., Dk),
 with K >= 1 in case of K-dimensional loss.
 The loss for one sample, l_i, can be calculated as follows:
 l[i][d1][d2]...[dk] = -y[i][c][d1][d2]...[dk], where c is the class index,
 or
 l[i][d1][d2]...[dk] = -y[i][c][d1][d2]...[dk] * weights[c], if 'weights' is provided.
 The loss is zero when the label value equals ignore_index:
 l[i][d1][d2]...[dk] = 0, when labels[i][d1][d2]...[dk] = ignore_index
 where:
 p = Softmax(scores)
 y = Log(p)
 c = labels[i][d1][d2]...[dk]
 Finally, L is optionally reduced:
 If reduction = 'none', the output is L with shape (N, D1, D2, ..., Dk).
 If reduction = 'sum', the output is the scalar Sum(L).
 If reduction = 'mean', the output is the scalar ReduceMean(L), or, if 'weights' is provided, ReduceSum(L) / ReduceSum(W),
 where tensor W is of shape (N, D1, D2, ..., Dk) and W[n][d1][d2]...[dk] = weights[labels[n][d1][d2]...[dk]].
 **Attributes**
 * **ignore_index**:
 Specifies a target value that is ignored and does not contribute to
 the input gradient. It is an optional value.
 * **reduction**:
 Type of reduction to apply to the loss: none, sum, mean (default).
 'none': no reduction will be applied. 'sum': the output will be
 summed. 'mean': the sum of the output will be divided by the number
 of elements in the output.
 **Inputs**
 Between 2 and 3 inputs.
 * **scores** (heterogeneous) - **T**:
 The predicted outputs with shape [batch_size, class_size], or
 [batch_size, class_size, D1, D2, ..., Dk], where K is the number of
 dimensions.
 * **labels** (heterogeneous) - **Tind**:
 The ground truth output tensor, with shape [batch_size], or
 [batch_size, D1, D2, ..., Dk], where K is the number of dimensions.
 Label element values shall be in the range [0, C). If ignore_index is
 specified, labels may additionally take that value, i.e. each label
 value should either be in the range [0, C) or equal ignore_index.
 * **weights** (optional, heterogeneous) - **T**:
 A manual rescaling weight given to each class. If given, it has to
 be a 1-D Tensor assigning a weight to each of the classes. Otherwise,
 it is treated as if having all ones.
 **Outputs**
 Between 1 and 2 outputs.
 * **output** (heterogeneous) - **T**:
 Weighted loss float Tensor. If reduction is 'none', this has the
 shape of [batch_size], or [batch_size, D1, D2, ..., Dk] in case of
 K-dimensional loss. Otherwise, it is a scalar.
 * **log_prob** (optional, heterogeneous) - **T**:
 Log probability tensor. If the output of softmax is prob, its value
 is log(prob).
 **Type Constraints**
 * **T** in (
+tensor(bfloat16),
 tensor(double),
 tensor(float),
 tensor(float16)
 ):
 Constrain input and output types to float tensors.
 * **Tind** in (
 tensor(int32),
 tensor(int64)
 ):
 Constrain target to integer types.
```
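The formulas above are easy to sanity-check with a short reference computation. Below is a minimal NumPy sketch of the 2-D case (scores of shape (N, C), labels of shape (N,)), covering the optional weights, ignore_index, and the three reduction modes; the function name and its internal structure are illustrative assumptions, not part of the ONNX specification.

```python
# Minimal NumPy sketch of the computation described above, 2-D case only.
# Illustrative, not taken from the ONNX specification or a runtime.
import numpy as np

def softmax_cross_entropy_loss(scores, labels, weights=None,
                               ignore_index=None, reduction="mean"):
    N, C = scores.shape
    # p = Softmax(scores), y = Log(p), computed in a numerically stable way
    shifted = scores - scores.max(axis=1, keepdims=True)
    y = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Rows whose label equals ignore_index contribute zero loss.
    valid = np.ones(N, dtype=bool) if ignore_index is None \
        else labels != ignore_index
    c = np.where(valid, labels, 0)  # safe class index for ignored rows

    # Per-sample weight W[i] = weights[c], all ones when 'weights' is absent;
    # forced to zero for ignored rows so they drop out of every reduction.
    w = np.ones(C) if weights is None else np.asarray(weights)
    W = np.where(valid, w[c], 0.0)

    # l[i] = -y[i][c] * weights[c]
    L = -y[np.arange(N), c] * W

    if reduction == "none":
        return L
    if reduction == "sum":
        return L.sum()
    # 'mean': ReduceSum(L) / ReduceSum(W); with all-ones weights and no
    # ignored rows this equals the ReduceMean(L) from the text.
    return L.sum() / W.sum()

scores = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([0, 1])
print(softmax_cross_entropy_loss(scores, labels, reduction="mean"))
```

Note that the only change visible in the diff is the addition of tensor(bfloat16) to the **T** type constraint, so the numeric behavior sketched here applies equally to both versions.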