NegativeLogLikelihoodLoss - 12 vs 13
The next section compares an older version of the operator with a newer one after both definitions are converted into markdown text. Green means an addition in the newer version, red means a deletion. Anything else is unchanged.
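As a rough illustration of the kind of comparison described above, the sketch below diffs two short markdown snippets with Python's standard `difflib` module. How this page actually produces its comparison is not stated, so the use of `difflib` and the two sample strings (`old_doc`, `new_doc`) are assumptions made only for illustration.

```python
import difflib

# Hypothetical excerpts of the two operator descriptions; the older one
# contains a blank line that the newer one drops.
old_doc = "The loss is computed as:\n\nloss = -input[n][c]\n"
new_doc = "The loss is computed as:\nloss = -input[n][c]\n"

diff_lines = list(
    difflib.unified_diff(
        old_doc.splitlines(),
        new_doc.splitlines(),
        fromfile="NegativeLogLikelihoodLoss12",
        tofile="NegativeLogLikelihoodLoss13",
        lineterm="",
    )
)

# Lines starting with "-" exist only in the older text, lines starting
# with "+" only in the newer text, and lines starting with a space are unchanged.
for line in diff_lines:
    print(line)
```

In this sketch a removed blank line shows up as a lone `-`, which is the same marker the comparison uses for its deletions.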
NegativeLogLikelihoodLoss12 → NegativeLogLikelihoodLoss13
RENAMED
@@ -1 +1 @@
A NegativeLogLikelihoodLoss operator computes (weighted) negative log likelihood loss.
Its "input" tensor has the shape of (N, C, d1, d2, ..., dk) where k >= 0.
The "input" tensor contains log-probabilities for input[n, :, d_1, d_2,..., d_k] being in a class of [0, C).
The operator's "target" input tensor has the shape of (N, d1, d2, ..., dk). It encodes class labels (one of C classes)
or it may contain a special value (indicated by an attribute ignore_index) for N x d1 x d2 x ... x dk samples.
The loss value for input[n, :, d_1, d_2,...d_k] being classified as class c = target[n][d_1][d_2]...[d_k] is computed as:
-
loss[n][d_1][d_2]...[d_k] = -input[n][c][d_1][d_2]...[d_k].
-
When an optional "weight" is provided, the sample loss is calculated as:
-
loss[n][d_1][d_2]...[d_k] = -input[n][c][d_1][d_2]...[d_k] * weight[c].
-
loss is zero for the case when target-value equals ignore_index.
loss[n][d_1][d_2]...[d_k] = 0, when target[n][d_1][d_2]...[d_k] = ignore_index
-
If "reduction" attribute is set to "none", the operator's output will be the above loss with shape (N, d1, d2, ..., dk).
If "reduction" attribute is set to "mean" (the default attribute value), the output loss is (weight) averaged:
-
mean(loss), if "weight" is not provided,
-
or if weight is provided,
-
sum(loss) / sum(weight[target[n][d_1][d_2]...[d_k]]), for all samples.
-
If "reduction" attribute is set to "sum", the output is a scalar:
sum(loss).
-
See also https://pytorch.org/docs/stable/nn.html#torch.nn.NLLLoss.
-
Example 1:
-
// negative log likelihood loss, "none" reduction
N, C, d1 = 2, 3, 2
input = [[[1.0, 2.0], [2.0, 2.0], [3.0, 2.0]],
         [[0.0, 1.0], [2.0, 2.0], [1.0, 2]]]
target = [[2, 1], [0, 2]]
-
loss = np.zeros((N, d1))
for n in range(N):
    for d_1 in range(d1):
        c = target[n][d_1]
        loss[n][d_1] = -input[n][c][d_1]
-
// print(loss)
// [[-3. -2.]
// [-0. -2.]]
-
Example 2:
-
// weighted negative log likelihood loss, sum reduction
N, C, d1 = 2, 3, 2
input = [[[1.0, 2.0], [2.0, 2.0], [3.0, 2.0]],
         [[0.0, 1.0], [2.0, 2.0], [1.0, 2]]]
target = [[2, 1], [0, 2]]
weight = [0.2, 0.3, 0.1]
loss = np.zeros((N, d1))
for n in range(N):
    for d_1 in range(d1):
        c = target[n][d_1]
        loss[n][d_1] = -input[n][c][d_1] * weight[c]
-
loss = np.sum(loss)
// print(loss)
// -1.1
-
Example 3:
-
// weighted negative log likelihood loss, mean reduction
N, C, d1 = 2, 3, 2
input = [[[1.0, 2.0], [2.0, 2.0], [3.0, 2.0]],
         [[0.0, 1.0], [2.0, 2.0], [1.0, 2]]]
target = [[2, 1], [0, 2]]
weight = [0.2, 0.3, 0.1]
loss = np.zeros((N, d1))
weight_total = 0
for n in range(N):
    for d_1 in range(d1):
        c = target[n][d_1]
        loss[n][d_1] = -input[n][c][d_1] * weight[c]
        weight_total = weight_total + weight[c]
-
loss = np.sum(loss) / weight_total
// print(loss)
// -1.57
**Attributes**
* **ignore_index**:
  Specifies a target value that is ignored and does not contribute to
  the input gradient. It's an optional value.
* **reduction**:
  Type of reduction to apply to loss: none, sum, mean (default).
  'none': the output is the loss for each sample. 'sum': the output
  will be summed. 'mean': the sum of the output will be divided by the
  sum of applied weights.
**Inputs**
Between 2 and 3 inputs.
* **input** (heterogeneous) - **T**:
  Input tensor of shape (N, C) or (N, C, d1, d2, ..., dk).
* **target** (heterogeneous) - **Tind**:
  Target tensor of shape (N) or (N, d1, d2, ..., dk). Target element
  value shall be in range of [0, C). If ignore_index is specified, it
  may have a value outside [0, C) and the target values should either
  be in the range [0, C) or have the value ignore_index.
* **weight** (optional, heterogeneous) - **T**:
  Optional rescaling weight tensor. If given, it has to be a tensor of
  size C. Otherwise, it is treated as if having all ones.
**Outputs**
* **loss** (heterogeneous) - **T**:
  The negative log likelihood loss
**Type Constraints**
* **T** in (
  tensor(double),
  tensor(float),
  tensor(float16)
  ):
  Constrain input, weight, and output types to floating-point tensors.
* **Tind** in (
  tensor(int32),
  tensor(int64)
  ):
  Constrain target to integer types
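The loss, weighting, `ignore_index`, and reduction rules described above can be pulled together in one small function. The following is a minimal pure-Python sketch, not the operator's reference implementation: the name `nll_loss` and its parameters are hypothetical, it only handles an input of shape (N, C, d1) with a target of shape (N, d1), and it assumes that ignored positions are left out of the "mean" denominator (the text above only states that their loss is zero).

```python
def nll_loss(inp, target, weight=None, ignore_index=None, reduction="mean"):
    """Sketch of the loss for input of shape (N, C, d1) and target of shape (N, d1)."""
    if reduction not in ("none", "sum", "mean"):
        raise ValueError(f"unknown reduction: {reduction!r}")
    loss = [[0.0] * len(row) for row in target]
    weight_total = 0.0  # sum of the weights actually applied
    for n, row in enumerate(target):
        for d_1, c in enumerate(row):
            if ignore_index is not None and c == ignore_index:
                # Loss stays 0; assumption: nothing is added to the denominator.
                continue
            w = 1.0 if weight is None else weight[c]
            loss[n][d_1] = -inp[n][c][d_1] * w
            weight_total += w
    if reduction == "none":
        return loss
    total = sum(sum(row) for row in loss)
    if reduction == "sum":
        return total
    # "mean": without weights and without ignored targets this is mean(loss).
    # No guard is included for the case where every target is ignored.
    return total / weight_total


# Data taken from Examples 1 to 3 above.
example_input = [[[1.0, 2.0], [2.0, 2.0], [3.0, 2.0]],
                 [[0.0, 1.0], [2.0, 2.0], [1.0, 2.0]]]
example_target = [[2, 1], [0, 2]]
example_weight = [0.2, 0.3, 0.1]

per_sample = nll_loss(example_input, example_target, reduction="none")
weighted_sum = nll_loss(example_input, example_target, example_weight, reduction="sum")
weighted_mean = nll_loss(example_input, example_target, example_weight, reduction="mean")
print(per_sample, round(weighted_sum, 2), round(weighted_mean, 2))
```

With the data from the three examples, this sketch gives the same per-sample losses as Example 1 and, after rounding to two decimals, the -1.1 and -1.57 of Examples 2 and 3.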