# BatchNormalization - 14 vs 15

The next section compares an older and a newer version of the same operator after both definitions are converted into markdown text. Green means an addition in the newer version, red means a deletion. Anything else is unchanged.

BatchNormalization14 → BatchNormalization15 RENAMED
```diff
@@ -1 +1 @@
  Carries out batch normalization as described in the paper
  https://arxiv.org/abs/1502.03167. Depending on the mode it is being run,
  There are five required inputs 'X', 'scale', 'B', 'input_mean' and
  'input_var'.
  Note that 'input_mean' and 'input_var' are expected to be the estimated
  statistics in inference mode (training_mode=False, default),
  and the running statistics in training mode (training_mode=True).
  There are multiple cases for the number of outputs, which we list below:
  Output case #1: Y, running_mean, running_var (training_mode=True)
  Output case #2: Y (training_mode=False)
  When training_mode=False, extra outputs are invalid.
  The outputs are updated as follows when training_mode=True:
  ::
  running_mean = input_mean * momentum + current_mean * (1 - momentum)
  running_var = input_var * momentum + current_var * (1 - momentum)
  Y = (X - current_mean) / sqrt(current_var + epsilon) * scale + B
  where:
  current_mean = ReduceMean(X, axis=all_except_channel_index)
  current_var = ReduceVar(X, axis=all_except_channel_index)
  Notice that ReduceVar refers to the population variance, and it equals to
  sum(sqrd(x_i - x_avg)) / N
  where N is the population size (this formula does not use sample size N - 1).
- The computation of ReduceMean and ReduceVar uses float to avoid overflow for float16 inputs.
-
  When training_mode=False:
  ::
  Y = (X - input_mean) / sqrt(input_var + epsilon) * scale + B
  For previous (depreciated) non-spatial cases, implementors are suggested
  to flatten the input shape to (N x C * D1 * D2 * ... * Dn) before a BatchNormalization Op.
  This operator has **optional** inputs/outputs. See ONNX <https://github.com/onnx/onnx/blob/master/docs/IR.md>_ for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument's name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.
  **Attributes**
  * **epsilon**:
  The epsilon value to use to avoid division by zero.
  * **momentum**:
  Factor used in computing the running mean and variance.e.g.,
  running_mean = running_mean * momentum + mean * (1 - momentum).
  * **training_mode**:
  If set to true, it indicates BatchNormalization is being used for
  training, and outputs 1, 2, 3, and 4 would be populated.
  **Inputs**
  * **X** (heterogeneous) - **T**:
  Input data tensor from the previous operator; dimensions are in the
  form of (N x C x D1 x D2 ... Dn), where N is the batch size, C is
  the number of channels. Statistics are computed for every channel of
  C over N and D1 to Dn dimensions. For image data, input dimensions
  become (N x C x H x W). The op also accepts single dimension input
  of size N in which case C is assumed to be 1
- * **scale** (heterogeneous) - **T1**:
+ * **scale** (heterogeneous) - **T**:
  Scale tensor of shape (C).
- * **B** (heterogeneous) - **T1**:
+ * **B** (heterogeneous) - **T**:
  Bias tensor of shape (C).
- * **input_mean** (heterogeneous) - **T2**:
+ * **input_mean** (heterogeneous) - **U**:
  running (training) or estimated (testing) mean tensor of shape (C).
- * **input_var** (heterogeneous) - **T2**:
+ * **input_var** (heterogeneous) - **U**:
  running (training) or estimated (testing) variance tensor of shape
  (C).
  **Outputs**
  Between 1 and 3 outputs.
  * **Y** (heterogeneous) - **T**:
  The output tensor of the same shape as X
- * **running_mean** (optional, heterogeneous) - **T2**:
+ * **running_mean** (optional, heterogeneous) - **U**:
  The running mean after the BatchNormalization operator.
- * **running_var** (optional, heterogeneous) - **T2**:
+ * **running_var** (optional, heterogeneous) - **U**:
  The running variance after the BatchNormalization operator. This op
  uses the population size (N) for calculating variance, and not the
  sample size N-1.
  **Type Constraints**
  * **T** in (
  tensor(bfloat16),
  tensor(double),
  tensor(float),
  tensor(float16)
  ):
  Constrain input and output types to float tensors.
- * **T1** in (
+ * **U** in (
  tensor(bfloat16),
  tensor(double),
  tensor(float),
  tensor(float16)
  ):
- Constrain scale and bias types to float tensors.
- * **T2** in (
- tensor(bfloat16),
- tensor(double),
- tensor(float),
- tensor(float16)
- ):
- Constrain mean and variance types to float tensors.
+ Constrain mean and variance types to float tensors. It allows all
+ float type for U.
```
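The formulas quoted in the diff above (running-statistics update in training mode, normalization with the provided estimates in inference mode) can be sketched in NumPy. This is an illustrative reference, not the official ONNX implementation; the function name and keyword defaults are assumptions for the example, and it omits details such as the float32 accumulation for float16 inputs mentioned in the version-14 text.

```python
import numpy as np

def batch_norm(X, scale, B, input_mean, input_var,
               epsilon=1e-5, momentum=0.9, training_mode=False):
    """Sketch of BatchNormalization semantics for X of shape (N, C, D1, ..., Dn)."""
    # Reshape per-channel (C,) parameters so they broadcast over X.
    shape = [1, -1] + [1] * (X.ndim - 2)
    scale_b, B_b = scale.reshape(shape), B.reshape(shape)

    if training_mode:
        # Statistics over every axis except the channel axis (axis=1);
        # np.var divides by the population size N, matching ReduceVar.
        axes = tuple(i for i in range(X.ndim) if i != 1)
        current_mean = X.mean(axis=axes)
        current_var = X.var(axis=axes)
        # Running statistics updated as in the spec text.
        running_mean = input_mean * momentum + current_mean * (1 - momentum)
        running_var = input_var * momentum + current_var * (1 - momentum)
        Y = (X - current_mean.reshape(shape)) / np.sqrt(
            current_var.reshape(shape) + epsilon) * scale_b + B_b
        return Y, running_mean, running_var

    # Inference: normalize with the provided estimated statistics;
    # the extra outputs are invalid in this mode.
    Y = (X - input_mean.reshape(shape)) / np.sqrt(
        input_var.reshape(shape) + epsilon) * scale_b + B_b
    return Y
```

In training mode the returned Y is normalized with the *current* batch statistics, so its per-channel mean is (up to epsilon) zero, while the running statistics are what a framework would feed back in as `input_mean`/`input_var` on the next step.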