SoftmaxCrossEntropyLoss#
SoftmaxCrossEntropyLoss - 13#
Version
domain: main
since_version: 13
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 13.
Summary
Loss function that measures the softmax cross entropy between 'scores' and 'labels'. This operator first computes a loss tensor whose shape is identical to that of the labels input. If the input is 2-D with shape (N, C), the loss tensor may be an N-element vector L = (l_1, l_2, …, l_N). If the input is an N-D tensor with shape (N, C, D1, D2, …, Dk), the loss tensor L may have (N, D1, D2, …, Dk) as its shape and L[i,][j_1][j_2]…[j_k] denotes a scalar element in L. After L is available, this operator can optionally apply a reduction to it.
- shape(scores): (N, C) where C is the number of classes, or (N, C, D1, D2,…, Dk),
with K >= 1 in case of K-dimensional loss.
- shape(labels): (N) where each value is 0 <= labels[i] <= C-1, or (N, D1, D2,…, Dk),
with K >= 1 in case of K-dimensional loss.
- The loss for one sample, l_i, can be calculated as follows:
l[i][d1][d2]…[dk] = -y[i][c][d1][d2]…[dk], where c is the index of the target class (defined below).
- or
l[i][d1][d2]…[dk] = -y[i][c][d1][d2]…[dk] * weights[c], if 'weights' is provided.
- The loss is zero when the label value equals ignore_index:
l[i][d1][d2]…[dk] = 0, when labels[i][d1][d2]…[dk] = ignore_index
- where:
p = Softmax(scores)
y = Log(p)
c = labels[i][d1][d2]…[dk]
Finally, L is optionally reduced:
- If reduction = 'none', the output is L with shape (N, D1, D2, …, Dk).
- If reduction = 'sum', the output is the scalar Sum(L).
- If reduction = 'mean', the output is the scalar ReduceMean(L), or, if 'weights' is provided, ReduceSum(L) / ReduceSum(W), where tensor W has shape (N, D1, D2, …, Dk) and W[i][d1][d2]…[dk] = weights[labels[i][d1][d2]…[dk]].
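To tie the definitions above together, here is a minimal NumPy sketch of the whole computation. It is an illustrative reference, not the official implementation; the function name is our own, and the treatment of ignore_index under 'mean' (dividing by the sum of non-ignored weights) is an assumption:

```python
import numpy as np

def softmax_cross_entropy_loss(scores, labels, weights=None,
                               reduction="mean", ignore_index=None):
    # scores: (N, C) or (N, C, D1, ..., Dk); labels: (N,) or (N, D1, ..., Dk)
    # Numerically stable log-softmax over the class axis (axis 1).
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Mask positions whose label equals ignore_index (their loss is zero).
    mask = np.ones(labels.shape, dtype=bool)
    if ignore_index is not None:
        mask = labels != ignore_index
    safe_labels = np.where(mask, labels, 0)  # placeholder class for masked entries

    # l[i][d1]...[dk] = -y[i][c][d1]...[dk] with c = labels[i][d1]...[dk]
    gathered = np.take_along_axis(log_prob,
                                  np.expand_dims(safe_labels, axis=1), axis=1)
    loss = -gathered.squeeze(1)

    # W[i][d1]...[dk] = weights[labels[i][d1]...[dk]] (all ones if no weights)
    w = np.ones(labels.shape) if weights is None else weights[safe_labels]
    w = w * mask
    loss = loss * w

    if reduction == "none":
        return loss
    if reduction == "sum":
        return loss.sum()
    return loss.sum() / w.sum()  # 'mean': ReduceSum(L) / ReduceSum(W)
```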
Attributes
ignore_index: Specifies a target value that is ignored and does not contribute to the input gradient. This attribute is optional.
reduction: Type of reduction to apply to the loss: none, sum, mean (default). 'none': no reduction will be applied, 'sum': the output will be summed, 'mean': the sum of the output will be divided by the number of elements in the output. Default value is 'mean'.
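As a quick illustration of the three modes, using the hypothetical softmax_cross_entropy_loss sketch from the Summary above:

```python
import numpy as np

scores = np.array([[3.0, 1.0, 0.2],
                   [0.5, 2.0, 0.3]], dtype=np.float32)
labels = np.array([0, 1])

softmax_cross_entropy_loss(scores, labels, reduction="none")  # shape (2,): per-sample losses
softmax_cross_entropy_loss(scores, labels, reduction="sum")   # scalar: sum of the two losses
softmax_cross_entropy_loss(scores, labels, reduction="mean")  # scalar: their average
```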
Inputs
Between 2 and 3 inputs.
scores (heterogeneous) - T: The predicted outputs with shape [batch_size, class_size], or [batch_size, class_size, D1, D2 , …, Dk], where K is the number of dimensions.
labels (heterogeneous) - Tind: The ground truth output tensor, with shape [batch_size], or [batch_size, D1, D2, …, Dk], where K is the number of dimensions. Label values shall be in the range [0, C). If ignore_index is specified, label values may additionally take the value ignore_index, which may lie outside [0, C).
weights (optional, heterogeneous) - T: A manual rescaling weight given to each class. If given, it has to be a 1D Tensor assigning weight to each of the classes. Otherwise, it is treated as if having all ones.
Outputs
Between 1 and 2 outputs.
output (heterogeneous) - T: Weighted loss float Tensor. If reduction is ‘none’, this has the shape of [batch_size], or [batch_size, D1, D2, …, Dk] in case of K-dimensional loss. Otherwise, it is a scalar.
log_prob (optional, heterogeneous) - T: Log probability tensor. If the output of softmax is prob, its value is log(prob).
Type Constraints
T in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
Tind in ( tensor(int32), tensor(int64) ): Constrain target to integer types.
Examples
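The upstream documentation renders runnable examples here. A minimal sketch of how this operator is typically exercised from Python, using the standard onnx.helper API (graph and tensor names are chosen for illustration):

```python
import onnx
from onnx import TensorProto, helper

# Single-node graph: scores for 3 samples over 5 classes, int64 labels,
# mean-reduced loss plus the optional log_prob output.
node = helper.make_node(
    "SoftmaxCrossEntropyLoss",
    inputs=["scores", "labels"],
    outputs=["loss", "log_prob"],
    reduction="mean",
)
graph = helper.make_graph(
    [node],
    "sce_example",
    inputs=[
        helper.make_tensor_value_info("scores", TensorProto.FLOAT, [3, 5]),
        helper.make_tensor_value_info("labels", TensorProto.INT64, [3]),
    ],
    outputs=[
        helper.make_tensor_value_info("loss", TensorProto.FLOAT, []),  # scalar
        helper.make_tensor_value_info("log_prob", TensorProto.FLOAT, [3, 5]),
    ],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])
onnx.checker.check_model(model)
```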
Differences
The version-12 and version-13 specifications are otherwise identical; the only change in version 13 is the addition of tensor(bfloat16) to the type constraint **T**.
SoftmaxCrossEntropyLoss - 12#
Version
domain: main
since_version: 12
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 12.
Summary
Loss function that measures the softmax cross entropy between 'scores' and 'labels'. This operator first computes a loss tensor whose shape is identical to that of the labels input. If the input is 2-D with shape (N, C), the loss tensor may be an N-element vector L = (l_1, l_2, …, l_N). If the input is an N-D tensor with shape (N, C, D1, D2, …, Dk), the loss tensor L may have (N, D1, D2, …, Dk) as its shape and L[i,][j_1][j_2]…[j_k] denotes a scalar element in L. After L is available, this operator can optionally apply a reduction to it.
- shape(scores): (N, C) where C is the number of classes, or (N, C, D1, D2,…, Dk),
with K >= 1 in case of K-dimensional loss.
- shape(labels): (N) where each value is 0 <= labels[i] <= C-1, or (N, D1, D2,…, Dk),
with K >= 1 in case of K-dimensional loss.
- The loss for one sample, l_i, can be calculated as follows:
l[i][d1][d2]…[dk] = -y[i][c][d1][d2]…[dk], where c is the index of the target class (defined below).
- or
l[i][d1][d2]…[dk] = -y[i][c][d1][d2]…[dk] * weights[c], if 'weights' is provided.
- The loss is zero when the label value equals ignore_index:
l[i][d1][d2]…[dk] = 0, when labels[i][d1][d2]…[dk] = ignore_index
- where:
p = Softmax(scores)
y = Log(p)
c = labels[i][d1][d2]…[dk]
Finally, L is optionally reduced:
- If reduction = 'none', the output is L with shape (N, D1, D2, …, Dk).
- If reduction = 'sum', the output is the scalar Sum(L).
- If reduction = 'mean', the output is the scalar ReduceMean(L), or, if 'weights' is provided, ReduceSum(L) / ReduceSum(W), where tensor W has shape (N, D1, D2, …, Dk) and W[i][d1][d2]…[dk] = weights[labels[i][d1][d2]…[dk]].
Attributes
ignore_index: Specifies a target value that is ignored and does not contribute to the input gradient. This attribute is optional.
reduction: Type of reduction to apply to the loss: none, sum, mean (default). 'none': no reduction will be applied, 'sum': the output will be summed, 'mean': the sum of the output will be divided by the number of elements in the output. Default value is 'mean'.
Inputs
Between 2 and 3 inputs.
scores (heterogeneous) - T: The predicted outputs with shape [batch_size, class_size], or [batch_size, class_size, D1, D2 , …, Dk], where K is the number of dimensions.
labels (heterogeneous) - Tind: The ground truth output tensor, with shape [batch_size], or [batch_size, D1, D2, …, Dk], where K is the number of dimensions. Label values shall be in the range [0, C). If ignore_index is specified, label values may additionally take the value ignore_index, which may lie outside [0, C).
weights (optional, heterogeneous) - T: A manual rescaling weight given to each class. If given, it has to be a 1D Tensor assigning weight to each of the classes. Otherwise, it is treated as if having all ones.
Outputs
Between 1 and 2 outputs.
output (heterogeneous) - T: Weighted loss float Tensor. If reduction is ‘none’, this has the shape of [batch_size], or [batch_size, D1, D2, …, Dk] in case of K-dimensional loss. Otherwise, it is a scalar.
log_prob (optional, heterogeneous) - T: Log probability tensor. If the output of softmax is prob, its value is log(prob).
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
Tind in ( tensor(int32), tensor(int64) ): Constrain target to integer types.