SoftmaxCrossEntropyLoss

SoftmaxCrossEntropyLoss - 13

Version

This version of the operator has been available since version 13.

Summary

Loss function that measures the softmax cross entropy between ‘scores’ and ‘labels’. This operator first computes a loss tensor whose shape is identical to the labels input. If the input is 2-D with shape (N, C), the loss tensor may be an N-element vector L = (l_1, l_2, …, l_N). If the input is an N-D tensor with shape (N, C, D1, D2, …, Dk), the loss tensor L may have shape (N, D1, D2, …, Dk), and L[i][j_1][j_2]…[j_k] denotes a scalar element in L. After L is computed, this operator can optionally apply a reduction.

shape(scores): (N, C) where C is the number of classes, or (N, C, D1, D2,…, Dk),

with K >= 1 in case of K-dimensional loss.

shape(labels): (N) where each value is 0 <= labels[i] <= C-1, or (N, D1, D2,…, Dk),

with K >= 1 in case of K-dimensional loss.

The loss for one sample, l_i, can be calculated as follows:

l[i][d1][d2]…[dk] = -y[i][c][d1][d2]…[dk], where i is the sample index and c is the class index given by the corresponding label (see below).

or

l[i][d1][d2]…[dk] = -y[i][c][d1][d2]…[dk] * weights[c], if ‘weights’ is provided.

The loss is zero when the label value equals ignore_index:

l[i][d1][d2]…[dk] = 0, when labels[i][d1][d2]…[dk] = ignore_index

where:

p = Softmax(scores)
y = Log(p)
c = labels[i][d1][d2]…[dk]

Finally, L is optionally reduced:

  • If reduction = ‘none’, the output is L with shape (N, D1, D2, …, Dk).

  • If reduction = ‘sum’, the output is the scalar Sum(L).

  • If reduction = ‘mean’, the output is the scalar ReduceMean(L), or, if ‘weights’ is provided, ReduceSum(L) / ReduceSum(W), where tensor W has shape (N, D1, D2, …, Dk) and W[n][d1][d2]…[dk] = weights[labels[n][d1][d2]…[dk]].
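
The formulas above can be checked with a short NumPy sketch. This is an illustrative reference, not the official implementation; the function name softmax_cross_entropy and its argument defaults are assumptions made here.

```python
import numpy as np

def softmax_cross_entropy(scores, labels, weights=None,
                          reduction="mean", ignore_index=None):
    # Illustrative sketch of the formulas above (not the official
    # implementation). scores: (N, C) or (N, C, D1, ..., Dk);
    # labels: (N) or (N, D1, ..., Dk).
    # y = Log(Softmax(scores)) along the class axis, computed stably.
    m = scores.max(axis=1, keepdims=True)
    y = scores - m - np.log(np.exp(scores - m).sum(axis=1, keepdims=True))

    # Elements whose label equals ignore_index contribute zero loss.
    mask = np.ones(labels.shape, dtype=bool) if ignore_index is None \
        else labels != ignore_index
    safe = np.where(mask, labels, 0)  # avoid out-of-range gathers

    # l[i][d1]...[dk] = -y[i][c][d1]...[dk], c = labels[i][d1]...[dk]
    loss = -np.take_along_axis(y, np.expand_dims(safe, 1), axis=1).squeeze(1)

    # W[n][d1]...[dk] = weights[labels[n][d1]...[dk]] (all ones if absent)
    w = (np.ones_like(loss) if weights is None else weights[safe]) * mask
    loss = loss * w

    if reduction == "none":
        return loss
    if reduction == "sum":
        return loss.sum()
    # 'mean': ReduceSum(L) / ReduceSum(W); with unit weights and nothing
    # ignored, this equals ReduceMean(L).
    return loss.sum() / w.sum()
```

As a sanity check, softmax_cross_entropy(np.zeros((2, 4), np.float32), np.array([1, 3])) returns log(4) ≈ 1.386, the loss of a uniform prediction over four classes.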

Attributes

  • ignore_index: Specifies a target value that is ignored and does not contribute to the input gradient. It’s an optional value.

  • reduction: Type of reduction to apply to the loss: none, sum, or mean (default). ‘none’: no reduction is applied; ‘sum’: the output is summed; ‘mean’: the sum of the output is divided by the number of elements in the output. Default value is ‘mean’.
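
Both attributes are set directly on the node. A minimal sketch with onnx.helper follows; the tensor names and the ignore_index value are illustrative choices, not part of the spec.

```python
from onnx import helper

node = helper.make_node(
    "SoftmaxCrossEntropyLoss",
    inputs=["scores", "labels"],   # placeholder tensor names
    outputs=["loss"],
    reduction="none",              # one of 'none', 'sum', 'mean' (default)
    ignore_index=-1,               # optional; -1 is an arbitrary example value
)
```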

Inputs

Between 2 and 3 inputs.

  • scores (heterogeneous) - T: The predicted outputs with shape [batch_size, class_size], or [batch_size, class_size, D1, D2, …, Dk], where K is the number of dimensions.

  • labels (heterogeneous) - Tind: The ground truth output tensor, with shape [batch_size], or [batch_size, D1, D2, …, Dk], where K is the number of dimensions. Label values shall be in the range [0, C). If ignore_index is specified, each label value must either be in the range [0, C) or equal ignore_index, which may lie outside [0, C).

  • weights (optional, heterogeneous) - T: A manual rescaling weight given to each class. If given, it has to be a 1D Tensor assigning weight to each of the classes. Otherwise, it is treated as if having all ones.

Outputs

Between 1 and 2 outputs.

  • output (heterogeneous) - T: Weighted loss float Tensor. If reduction is ‘none’, this has the shape of [batch_size], or [batch_size, D1, D2, …, Dk] in case of K-dimensional loss. Otherwise, it is a scalar.

  • log_prob (optional, heterogeneous) - T: Log probability tensor. If the output of softmax is prob, its value is log(prob).

Type Constraints

  • T in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.

  • Tind in ( tensor(int32), tensor(int64) ): Constrain target to integer types.

Examples
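
A minimal end-to-end sketch, assuming onnx >= 1.13 so that onnx.reference.ReferenceEvaluator is available; the tensor names and sizes are illustrative. It builds a one-node model and requests both the loss and the optional log_prob output.

```python
import numpy as np
from onnx import TensorProto, helper
from onnx.reference import ReferenceEvaluator

N, C = 3, 5
node = helper.make_node(
    "SoftmaxCrossEntropyLoss",
    inputs=["scores", "labels"],
    outputs=["loss", "log_prob"],  # the second output is optional
    reduction="mean",
)
graph = helper.make_graph(
    [node], "sce_example",
    [helper.make_tensor_value_info("scores", TensorProto.FLOAT, [N, C]),
     helper.make_tensor_value_info("labels", TensorProto.INT64, [N])],
    [helper.make_tensor_value_info("loss", TensorProto.FLOAT, []),
     helper.make_tensor_value_info("log_prob", TensorProto.FLOAT, [N, C])],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])

sess = ReferenceEvaluator(model)
scores = np.random.randn(N, C).astype(np.float32)
labels = np.array([0, 4, 2], dtype=np.int64)
loss, log_prob = sess.run(None, {"scores": scores, "labels": labels})
print(loss)            # scalar, because reduction='mean'
print(log_prob.shape)  # (3, 5)
```

The same model should run unchanged under a production runtime such as onnxruntime through its InferenceSession API.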

Differences

The only change from version 12 is in the type constraints: version 13 adds tensor(bfloat16) to T. The summary, attributes, inputs, and outputs are otherwise identical.

SoftmaxCrossEntropyLoss - 12

Version

This version of the operator has been available since version 12.

Summary

Loss function that measures the softmax cross entropy between ‘scores’ and ‘labels’. This operator first computes a loss tensor whose shape is identical to the labels input. If the input is 2-D with shape (N, C), the loss tensor may be an N-element vector L = (l_1, l_2, …, l_N). If the input is an N-D tensor with shape (N, C, D1, D2, …, Dk), the loss tensor L may have shape (N, D1, D2, …, Dk), and L[i][j_1][j_2]…[j_k] denotes a scalar element in L. After L is computed, this operator can optionally apply a reduction.

shape(scores): (N, C) where C is the number of classes, or (N, C, D1, D2,…, Dk),

with K >= 1 in case of K-dimensional loss.

shape(labels): (N) where each value is 0 <= labels[i] <= C-1, or (N, D1, D2,…, Dk),

with K >= 1 in case of K-dimensional loss.

The loss for one sample, l_i, can be calculated as follows:

l[i][d1][d2]…[dk] = -y[i][c][d1][d2]…[dk], where i is the sample index and c is the class index given by the corresponding label (see below).

or

l[i][d1][d2]…[dk] = -y[i][c][d1][d2]…[dk] * weights[c], if ‘weights’ is provided.

The loss is zero when the label value equals ignore_index:

l[i][d1][d2]…[dk] = 0, when labels[i][d1][d2]…[dk] = ignore_index

where:

p = Softmax(scores)
y = Log(p)
c = labels[i][d1][d2]…[dk]

Finally, L is optionally reduced:

  • If reduction = ‘none’, the output is L with shape (N, D1, D2, …, Dk).

  • If reduction = ‘sum’, the output is the scalar Sum(L).

  • If reduction = ‘mean’, the output is the scalar ReduceMean(L), or, if ‘weights’ is provided, ReduceSum(L) / ReduceSum(W), where tensor W has shape (N, D1, D2, …, Dk) and W[n][d1][d2]…[dk] = weights[labels[n][d1][d2]…[dk]].

Attributes

  • ignore_index: Specifies a target value that is ignored and does not contribute to the input gradient. It’s an optional value.

  • reduction: Type of reduction to apply to the loss: none, sum, or mean (default). ‘none’: no reduction is applied; ‘sum’: the output is summed; ‘mean’: the sum of the output is divided by the number of elements in the output. Default value is ‘mean’.

Inputs

Between 2 and 3 inputs.

  • scores (heterogeneous) - T: The predicted outputs with shape [batch_size, class_size], or [batch_size, class_size, D1, D2, …, Dk], where K is the number of dimensions.

  • labels (heterogeneous) - Tind: The ground truth output tensor, with shape [batch_size], or [batch_size, D1, D2, …, Dk], where K is the number of dimensions. Label values shall be in the range [0, C). If ignore_index is specified, each label value must either be in the range [0, C) or equal ignore_index, which may lie outside [0, C).

  • weights (optional, heterogeneous) - T: A manual rescaling weight given to each class. If given, it has to be a 1D Tensor assigning weight to each of the classes. Otherwise, it is treated as if having all ones.

Outputs

Between 1 and 2 outputs.

  • output (heterogeneous) - T: Weighted loss float Tensor. If reduction is ‘none’, this has the shape of [batch_size], or [batch_size, D1, D2, …, Dk] in case of K-dimensional loss. Otherwise, it is a scalar.

  • log_prob (optional, heterogeneous) - T: Log probability tensor. If the output of softmax is prob, its value is log(prob).

Type Constraints

  • T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.

  • Tind in ( tensor(int32), tensor(int64) ): Constrain target to integer types.