BatchNormalization - 6 vs 7

BatchNormalization6 → BatchNormalization7 RENAMED
@@ -1 +1 @@
  Carries out batch normalization as described in the paper
  https://arxiv.org/abs/1502.03167. Depending on the mode it is being run,
  there are multiple cases for the number of outputs, which we list below:
  Output case #1: Y, mean, var, saved_mean, saved_var (training mode)
  Output case #2: Y (test mode)
+ This operator has **optional** inputs/outputs. See ONNX <https://github.com/onnx/onnx/blob/master/docs/IR.md>_ for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument's name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.
  **Attributes**
  * **epsilon**:
- The epsilon value to use to avoid division by zero, default is
+ The epsilon value to use to avoid division by zero.
- 1e-5f.
- * **is_test**:
- If set to nonzero, run spatial batch normalization in test mode,
- default is 0.
  * **momentum**:
  Factor used in computing the running mean and variance.e.g.,
- running_mean = running_mean * momentum + mean * (1 - momentum),
+ running_mean = running_mean * momentum + mean * (1 - momentum).
- default is 0.9f.
  * **spatial**:
- If true, compute the mean and variance across all spatial elements
+ If true, compute the mean and variance across per activation. If
- If false, compute the mean and variance across per feature.Default
+ false, compute the mean and variance across per feature over each
- is 1.
+ mini-batch.
  **Inputs**
  * **X** (heterogeneous) - **T**:
  Input data tensor from the previous operator; dimensions for image
  case are (N x C x H x W), where N is the batch size, C is the number
  of channels, and H and W are the height and the width of the data.
  For non image case, the dimensions are in the form of (N x C x D1 x
  D2 ... Dn), where N is the batch size.
  * **scale** (heterogeneous) - **T**:
+ If spatial is true, the dimension of scale is (C). If spatial is
- The scale as a 1-dimensional tensor of size C to be applied to the
+ false, the dimensions of scale are (C x D1 x ... x Dn)
- output.
  * **B** (heterogeneous) - **T**:
+ If spatial is true, the dimension of bias is (C). If spatial is
- The bias as a 1-dimensional tensor of size C to be applied to the
+ false, the dimensions of bias are (C x D1 x ... x Dn)
- output.
  * **mean** (heterogeneous) - **T**:
+ If spatial is true, the dimension of the running mean (training) or
+ the estimated mean (testing) is (C). If spatial is false, the
- The running mean (training) or the estimated mean (testing) as a
+ dimensions of the running mean (training) or the estimated mean
- 1-dimensional tensor of size C.
+ (testing) are (C x D1 x ... x Dn).
  * **var** (heterogeneous) - **T**:
+ If spatial is true, the dimension of the running variance(training)
+ or the estimated variance (testing) is (C). If spatial is false, the
- The running variance (training) or the estimated variance (testing)
+ dimensions of the running variance(training) or the estimated
- as a 1-dimensional tensor of size C.
+ variance (testing) are (C x D1 x ... x Dn).
  **Outputs**
  Between 1 and 5 outputs.
  * **Y** (heterogeneous) - **T**:
- The output tensor of the same shape as X.
+ The output tensor of the same shape as X
  * **mean** (optional, heterogeneous) - **T**:
- The running mean after the BatchNormalization operator. Must be in-
+ The running mean after the BatchNormalization operator.
- place with the input mean. Should not be used for testing.
  * **var** (optional, heterogeneous) - **T**:
- The running variance after the BatchNormalization operator. Must be
+ The running variance after the BatchNormalization operator.
- in-place with the input var. Should not be used for testing.
  * **saved_mean** (optional, heterogeneous) - **T**:
  Saved mean used during training to speed up gradient computation.
- Should not be used for testing.
  * **saved_var** (optional, heterogeneous) - **T**:
  Saved variance used during training to speed up gradient
- computation. Should not be used for testing.
+ computation.
  **Type Constraints**
  * **T** in (
  tensor(double),
  tensor(float),
  tensor(float16)
  ):
  Constrain input and output types to float tensors.
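
To make the attribute and output semantics above concrete, here is a minimal NumPy sketch of the spatial=1, 4-D (N x C x H x W) case. It assumes the standard batch-normalization formula from the paper cited above; the function name, signature, and NumPy axes are illustrative and not part of the ONNX API, and only the running-statistics update quoted under **momentum** is taken directly from the spec text.

```python
# Illustrative sketch only, not the ONNX reference implementation.
import numpy as np

def batch_norm_v7(X, scale, B, mean, var, epsilon=1e-5, momentum=0.9,
                  training=False):
    """spatial=1 case: X is (N, C, H, W); scale, B, mean, var are (C,)."""
    if training:
        # Per-channel statistics over the batch and spatial dimensions.
        saved_mean = X.mean(axis=(0, 2, 3))
        saved_var = X.var(axis=(0, 2, 3))
        # Running-statistics update quoted under **momentum**:
        # running_mean = running_mean * momentum + mean * (1 - momentum)
        running_mean = mean * momentum + saved_mean * (1 - momentum)
        running_var = var * momentum + saved_var * (1 - momentum)
        use_mean, use_var = saved_mean, saved_var
    else:
        use_mean, use_var = mean, var

    c = (1, -1, 1, 1)  # broadcast the (C,) tensors against (N, C, H, W)
    Y = (scale.reshape(c) * (X - use_mean.reshape(c))
         / np.sqrt(use_var.reshape(c) + epsilon) + B.reshape(c))

    if not training:
        return Y  # output case #2 (test mode): Y only
    # Output case #1 (training mode): Y, mean, var, saved_mean, saved_var.
    return Y, running_mean, running_var, saved_mean, saved_var
```

Calling the sketch with training=False mirrors output case #2 (a single Y), while training=True mirrors output case #1 with all five outputs.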
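The optional inputs/outputs note added in opset 7 means the trailing training outputs can simply be dropped when only Y is needed. Below is a hedged sketch of building such a node with onnx.helper.make_node; the attribute values shown are assumptions for illustration, chosen to match the defaults quoted in the opset-6 text above.

```python
# Sketch of an opset-7 BatchNormalization node used inference-style:
# the removed is_test attribute is no longer set, and the trailing
# optional outputs (mean, var, saved_mean, saved_var) are omitted.
from onnx import helper

bn_node = helper.make_node(
    "BatchNormalization",
    inputs=["X", "scale", "B", "mean", "var"],
    outputs=["Y"],      # trailing optional outputs simply left out
    epsilon=1e-5,       # matches the default quoted in the opset-6 text
    momentum=0.9,
    spatial=1,
)
```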