BatchNormalization - 6 vs 7

The next section compares an older version with a newer version of the same operator after both definitions are converted into markdown text. Lines marked `+` (green) are additions in the newer version, and lines marked `-` (red) are deletions. Anything else is unchanged.

BatchNormalization6 → BatchNormalization7 +25 -22

@@ -1 +1 @@
  Carries out batch normalization as described in the paper
  https://arxiv.org/abs/1502.03167. Depending on the mode it is being run,
  there are multiple cases for the number of outputs, which we list below:
  Output case #1: Y, mean, var, saved_mean, saved_var (training mode)
  Output case #2: Y (test mode)
-     This operator has **optional** inputs/outputs. See ONNX (https://github.com/onnx/onnx/blob/master/docs/IR.md) for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument's name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.
  **Attributes**
  * **epsilon**:
-   The epsilon value to use to avoid division by zero.
+   The epsilon value to use to avoid division by zero, default is
+   1e-5f.
+ * **is_test**:
+   If set to nonzero, run spatial batch normalization in test mode,
+   default is 0.
  * **momentum**:
    Factor used in computing the running mean and variance, e.g.,
-   running_mean = running_mean * momentum + mean * (1 - momentum).
+   running_mean = running_mean * momentum + mean * (1 - momentum),
+   default is 0.9f.
  * **spatial**:
-   If true, compute the mean and variance across per activation. If
+   If true, compute the mean and variance across all spatial elements.
-   false, compute the mean and variance across per feature over each
+   If false, compute the mean and variance across per feature. Default
-   mini-batch.
+   is 1.
  **Inputs**
  * **X** (heterogeneous) - **T**:
    Input data tensor from the previous operator; dimensions for image
    case are (N x C x H x W), where N is the batch size, C is the number
    of channels, and H and W are the height and the width of the data.
    For non image case, the dimensions are in the form of (N x C x D1 x
    D2 ... Dn), where N is the batch size.
  * **scale** (heterogeneous) - **T**:
-   If spatial is true, the dimension of scale is (C). If spatial is
-   false, the dimensions of scale are (C x D1 x ... x Dn)
+   The scale as a 1-dimensional tensor of size C to be applied to the
+   output.
  * **B** (heterogeneous) - **T**:
-   If spatial is true, the dimension of bias is (C). If spatial is
-   false, the dimensions of bias are (C x D1 x ... x Dn)
+   The bias as a 1-dimensional tensor of size C to be applied to the
+   output.
  * **mean** (heterogeneous) - **T**:
-   If spatial is true, the dimension of the running mean (training) or
-   the estimated mean (testing) is (C). If spatial is false, the
-   dimensions of the running mean (training) or the estimated mean
+   The running mean (training) or the estimated mean (testing) as a
-   (testing) are (C x D1 x ... x Dn).
+   1-dimensional tensor of size C.
  * **var** (heterogeneous) - **T**:
+   The running variance (training) or the estimated variance (testing)
+   as a 1-dimensional tensor of size C.
-   If spatial is true, the dimension of the running variance(training)
-   or the estimated variance (testing) is (C). If spatial is false, the
-   dimensions of the running variance(training) or the estimated
-   variance (testing) are (C x D1 x ... x Dn).
  **Outputs**
  Between 1 and 5 outputs.
  * **Y** (heterogeneous) - **T**:
-   The output tensor of the same shape as X
+   The output tensor of the same shape as X.
  * **mean** (optional, heterogeneous) - **T**:
-   The running mean after the BatchNormalization operator.
+   The running mean after the BatchNormalization operator. Must be
+   in-place with the input mean. Should not be used for testing.
  * **var** (optional, heterogeneous) - **T**:
-   The running variance after the BatchNormalization operator.
+   The running variance after the BatchNormalization operator. Must be
+   in-place with the input var. Should not be used for testing.
  * **saved_mean** (optional, heterogeneous) - **T**:
    Saved mean used during training to speed up gradient computation.
+   Should not be used for testing.
  * **saved_var** (optional, heterogeneous) - **T**:
    Saved variance used during training to speed up gradient
-   computation.
+   computation. Should not be used for testing.
  **Type Constraints**
  * **T** in (
    tensor(double),
    tensor(float),
    tensor(float16)
    ):
    Constrain input and output types to float tensors.
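
The operator text above names the inputs (X, scale, B, mean, var), the `epsilon` attribute, and the running-mean update rule, but it does not spell out the normalization itself. The sketch below is a minimal, dependency-free illustration of test-mode output (case #2) and of the `momentum` update, assuming the standard formula from the cited paper, `Y = scale * (X - mean) / sqrt(var + epsilon) + B`, with per-channel statistics of size C. The function names `batch_norm_test_mode` and `update_running_stat` are hypothetical, and the input layout `[N][C][S]` (spatial dimensions flattened into one axis of nested lists) is an assumption made only to keep the example short. It is not the reference implementation of the operator.

```python
import math


def batch_norm_test_mode(x, scale, bias, mean, var, epsilon=1e-5):
    """Normalize x (nested lists shaped [N][C][S]) with per-channel stats of size C.

    Assumed formula: y = scale * (x - mean) / sqrt(var + epsilon) + bias.
    """
    out = []
    for sample in x:
        normalized_sample = []
        for c, channel in enumerate(sample):
            # epsilon keeps the denominator away from zero when var[c] is 0
            inv_std = 1.0 / math.sqrt(var[c] + epsilon)
            normalized_sample.append(
                [scale[c] * (v - mean[c]) * inv_std + bias[c] for v in channel]
            )
        out.append(normalized_sample)
    return out


def update_running_stat(running, batch_stat, momentum=0.9):
    """Apply running = running * momentum + batch_stat * (1 - momentum) per channel."""
    return [r * momentum + b * (1.0 - momentum) for r, b in zip(running, batch_stat)]


# One sample, two channels, two spatial elements per channel.
x = [[[1.0, 3.0], [2.0, 2.0]]]
y = batch_norm_test_mode(
    x,
    scale=[1.0, 1.0],
    bias=[0.0, 0.5],
    mean=[2.0, 2.0],
    var=[1.0, 4.0],
)
new_running_mean = update_running_stat([0.0, 1.0], [10.0, 1.0])
```

With the values above, the first channel is centered and scaled to roughly -1 and +1, and the second channel, whose elements equal its mean, collapses to the bias value. Both versions of the operator describe the same update rule for the running statistics, so `update_running_stat` applies to either side of the diff.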