NegativeLogLikelihoodLoss - 12 vs 13
NegativeLogLikelihoodLoss12 → NegativeLogLikelihoodLoss13
RENAMED
@@ -1 +1 @@
A NegativeLogLikelihoodLoss operator computes (weighted) negative log likelihood loss.
Its "input" tensor has the shape of (N, C, d1, d2, ..., dk) where k >= 0.
The "input" tensor contains log-probabilities for input[n, :, d_1, d_2,..., d_k] being in a class of [0, C).
The operator's "target" input tensor has the shape of (N, d1, d2, ..., dk). It encodes class labels (one of C classes)
or it may contain a special value (indicated by an attribute ignore_index) for N x d1 x d2 x ... x dk samples.
The loss value for input[n, :, d_1, d_2,...d_k] being classified as class c = target[n][d_1][d_2]...[d_k] is computed as:

    loss[n][d_1][d_2]...[d_k] = -input[n][c][d_1][d_2]...[d_k].

When an optional "weight" is provided, the sample loss is calculated as:

    loss[n][d_1][d_2]...[d_k] = -input[n][c][d_1][d_2]...[d_k] * weight[c].

The loss is zero when the target value equals ignore_index:

    loss[n][d_1][d_2]...[d_k] = 0, when target[n][d_1][d_2]...[d_k] = ignore_index
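The three per-sample rules above (plain, weighted, and ignored) can be sketched in vectorized NumPy. This is a minimal illustration, not the operator's reference implementation; the function name `nll_per_sample` and its parameter names are hypothetical.

```python
import numpy as np


def nll_per_sample(log_probs, target, weight=None, ignore_index=None):
    """Per-sample loss for log_probs of shape (N, C, d1, ..., dk) (hypothetical helper)."""
    log_probs = np.asarray(log_probs, dtype=np.float64)
    target = np.asarray(target, dtype=np.int64)

    # Mark samples whose target equals ignore_index; their loss is zero.
    ignored = np.zeros(target.shape, dtype=bool)
    if ignore_index is not None:
        ignored = target == ignore_index
    # Swap ignored labels for a valid class so the gather below stays in range.
    safe_target = np.where(ignored, 0, target)

    # picked[n][d_1]...[d_k] = input[n][c][d_1]...[d_k], with c = target[n][d_1]...[d_k].
    picked = np.take_along_axis(log_probs, safe_target[:, np.newaxis], axis=1)
    picked = np.squeeze(picked, axis=1)

    # A missing weight is treated as all ones.
    if weight is None:
        applied_weight = np.ones(target.shape)
    else:
        applied_weight = np.asarray(weight, dtype=np.float64)[safe_target]

    return np.where(ignored, 0.0, -picked * applied_weight)
```

Called with the input and target from Example 1 and no weight, this sketch returns the same values as the explicit loops there.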

If the "reduction" attribute is set to "none", the operator's output is the per-sample loss above, with shape (N, d1, d2, ..., dk).
If the "reduction" attribute is set to "mean" (the default), the output is a scalar, the (weighted) average of the loss:

    mean(loss), if "weight" is not provided,

or, if "weight" is provided,

    sum(loss) / sum(weight[target[n][d_1][d_2]...[d_k]]), for all samples.

If the "reduction" attribute is set to "sum", the output is a scalar:

    sum(loss).
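The three reduction modes can be sketched as one small function. This is an illustration under assumptions: `reduce_nll` and `applied_weight` are hypothetical names, and `applied_weight` is expected to hold weight[target[n][d_1]...[d_k]] for every sample. How ignored samples enter the denominator is not spelled out in this passage, so the sketch leaves that to the caller (for instance, by passing a zero weight for them).

```python
import numpy as np


def reduce_nll(loss, applied_weight=None, reduction="mean"):
    """Reduce a per-sample loss array of shape (N, d1, ..., dk) (hypothetical helper)."""
    loss = np.asarray(loss, dtype=np.float64)
    if reduction == "none":
        return loss
    if reduction == "sum":
        return loss.sum()
    if reduction == "mean":
        if applied_weight is None:
            # No weight provided: plain average over all samples.
            return loss.mean()
        # Weight provided: divide by the sum of the per-sample weights.
        return loss.sum() / np.sum(applied_weight)
    raise ValueError(f"unknown reduction: {reduction!r}")


# Values taken from the weighted examples in this document.
weight = np.array([0.2, 0.3, 0.1])
target = np.array([[2, 1], [0, 2]])
weighted_loss = np.array([[-0.3, -0.6], [0.0, -0.2]])
print(reduce_nll(weighted_loss, weight[target], reduction="mean"))
```

Indexing `weight[target]` gathers one weight per sample, so its sum is the denominator of the weighted mean.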

See also https://pytorch.org/docs/stable/nn.html#torch.nn.NLLLoss.

Example 1:

    # negative log likelihood loss, "none" reduction
    import numpy as np

    N, C, d1 = 2, 3, 2
    input = [[[1.0, 2.0], [2.0, 2.0], [3.0, 2.0]],
             [[0.0, 1.0], [2.0, 2.0], [1.0, 2.0]]]
    target = [[2, 1], [0, 2]]

    loss = np.zeros((N, d1))
    for n in range(N):
        for d_1 in range(d1):
            c = target[n][d_1]
            loss[n][d_1] = -input[n][c][d_1]

    # print(loss)
    # [[-3. -2.]
    #  [-0. -2.]]

Example 2:

    # weighted negative log likelihood loss, sum reduction
    import numpy as np

    N, C, d1 = 2, 3, 2
    input = [[[1.0, 2.0], [2.0, 2.0], [3.0, 2.0]],
             [[0.0, 1.0], [2.0, 2.0], [1.0, 2.0]]]
    target = [[2, 1], [0, 2]]
    weight = [0.2, 0.3, 0.1]
    loss = np.zeros((N, d1))
    for n in range(N):
        for d_1 in range(d1):
            c = target[n][d_1]
            loss[n][d_1] = -input[n][c][d_1] * weight[c]

    loss = np.sum(loss)
    # print(loss)
    # -1.1 (up to floating-point rounding)

Example 3:

    # weighted negative log likelihood loss, mean reduction
    import numpy as np

    N, C, d1 = 2, 3, 2
    input = [[[1.0, 2.0], [2.0, 2.0], [3.0, 2.0]],
             [[0.0, 1.0], [2.0, 2.0], [1.0, 2.0]]]
    target = [[2, 1], [0, 2]]
    weight = [0.2, 0.3, 0.1]
    loss = np.zeros((N, d1))
    weight_total = 0
    for n in range(N):
        for d_1 in range(d1):
            c = target[n][d_1]
            loss[n][d_1] = -input[n][c][d_1] * weight[c]
            weight_total = weight_total + weight[c]

    loss = np.sum(loss) / weight_total
    # print(loss)
    # -1.57 (rounded to two decimals)

**Attributes**

* **ignore_index**:
  Specifies a target value that is ignored and does not contribute to
  the input gradient. This attribute is optional.
* **reduction**:
  Type of reduction to apply to loss: none, sum, mean (default).
  'none': the output is the loss for each sample. 'sum': the per-sample
  losses are summed. 'mean': the summed loss is divided by the
  sum of the applied weights.

**Inputs**

Between 2 and 3 inputs.

* **input** (heterogeneous) - **T**:
  Input tensor of shape (N, C) or (N, C, d1, d2, ..., dk).
* **target** (heterogeneous) - **Tind**:
  Target tensor of shape (N) or (N, d1, d2, ..., dk). Each target
  value shall be in the range [0, C). If ignore_index is specified, it
  may have a value outside [0, C), and each target value must either
  be in the range [0, C) or equal ignore_index.
* **weight** (optional, heterogeneous) - **T**:
  Optional rescaling weight tensor. If given, it has to be a tensor of
  size C. Otherwise, it is treated as if having all ones.
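The shape and value rules listed for the inputs can be checked up front. The following is a minimal sketch; `check_nll_inputs` is a hypothetical helper, not part of any ONNX API, and it only mirrors the constraints stated above.

```python
import numpy as np


def check_nll_inputs(log_probs, target, weight=None, ignore_index=None):
    """Raise ValueError if the arrays violate the input rules (hypothetical helper)."""
    log_probs = np.asarray(log_probs)
    target = np.asarray(target)
    if log_probs.ndim < 2:
        raise ValueError("input must have shape (N, C) or (N, C, d1, ..., dk)")

    # target drops the class axis: (N) or (N, d1, ..., dk).
    n, c = log_probs.shape[:2]
    expected = (n,) + log_probs.shape[2:]
    if target.shape != expected:
        raise ValueError(f"target shape {target.shape} != expected {expected}")
    if not np.issubdtype(target.dtype, np.integer):
        raise ValueError("target must hold integer class labels")

    # Every label is in [0, C) or equals ignore_index (which may lie outside [0, C)).
    valid = (target >= 0) & (target < c)
    if ignore_index is not None:
        valid = valid | (target == ignore_index)
    if not valid.all():
        raise ValueError("target values must be in [0, C) or equal ignore_index")

    # weight, when given, holds one entry per class.
    if weight is not None and np.asarray(weight).shape != (c,):
        raise ValueError(f"weight must have shape ({c},)")
```

Running such a check before the gather step turns an out-of-range label into a clear error instead of an index failure.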

**Outputs**

* **loss** (heterogeneous) - **T**:
  The negative log likelihood loss.

**Type Constraints**

* **T** in (tensor(double), tensor(float), tensor(float16)):
  Constrain input, weight, and output types to floating-point tensors.
* **Tind** in (tensor(int32), tensor(int64)):
  Constrain target to integer types.
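The type constraints can be mirrored with NumPy dtypes, where tensor(float) corresponds to float32 and tensor(double) to float64. This is a sketch under assumptions: `nll_dtypes_ok` is a hypothetical helper, and it reads the shared type variable T as requiring input and weight to have the same dtype.

```python
import numpy as np

# NumPy equivalents of the allowed tensor element types.
ALLOWED_T = (np.dtype("float16"), np.dtype("float32"), np.dtype("float64"))
ALLOWED_TIND = (np.dtype("int32"), np.dtype("int64"))


def nll_dtypes_ok(log_probs, target, weight=None):
    """Return True if the arrays satisfy the T / Tind constraints (hypothetical helper)."""
    log_probs = np.asarray(log_probs)
    target = np.asarray(target)
    if log_probs.dtype not in ALLOWED_T or target.dtype not in ALLOWED_TIND:
        return False
    # input and weight share the type variable T, so their dtypes must match.
    return weight is None or np.asarray(weight).dtype == log_probs.dtype
```

For example, a float32 input with an int64 target passes, while an int16 target or a float64 weight paired with a float32 input does not.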