BatchNormalization - 14 vs 15
BatchNormalization14 → BatchNormalization15
RENAMED
@@ -1 +1 @@
Carries out batch normalization as described in the paper
https://arxiv.org/abs/1502.03167. Depending on the mode it is being run,
There are five required inputs 'X', 'scale', 'B', 'input_mean' and
'input_var'.
Note that 'input_mean' and 'input_var' are expected to be the estimated
statistics in inference mode (training_mode=False, default),
and the running statistics in training mode (training_mode=True).
There are multiple cases for the number of outputs, which we list below:

Output case #1: Y, running_mean, running_var (training_mode=True)
Output case #2: Y (training_mode=False)

When training_mode=False, extra outputs are invalid.
The outputs are updated as follows when training_mode=True:
::

    running_mean = input_mean * momentum + current_mean * (1 - momentum)
    running_var = input_var * momentum + current_var * (1 - momentum)
    Y = (X - current_mean) / sqrt(current_var + epsilon) * scale + B

where:

    current_mean = ReduceMean(X, axis=all_except_channel_index)
    current_var = ReduceVar(X, axis=all_except_channel_index)

Notice that ReduceVar refers to the population variance, and it equals to
sum(sqrd(x_i - x_avg)) / N
where N is the population size (this formula does not use sample size N - 1).
+ The computation of ReduceMean and ReduceVar uses float to avoid overflow for float16 inputs.
+
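For illustration, the training-mode update above can be sketched in plain Python. This is a simplified 2-D (N x C) layout with no spatial dimensions; the function name and list-of-rows layout are illustrative choices, not part of the spec.

```python
import math

def batchnorm_train(X, scale, B, input_mean, input_var,
                    momentum=0.9, epsilon=1e-5):
    """Sketch of training-mode BatchNormalization for a 2-D (N x C) input."""
    N, C = len(X), len(X[0])
    # current_mean / current_var reduce over every axis except the channel axis.
    current_mean = [sum(row[c] for row in X) / N for c in range(C)]
    # Population variance: divide by N, not the sample size N - 1.
    current_var = [sum((row[c] - current_mean[c]) ** 2 for row in X) / N
                   for c in range(C)]
    # Running statistics blend the incoming stats with the batch stats.
    running_mean = [input_mean[c] * momentum + current_mean[c] * (1 - momentum)
                    for c in range(C)]
    running_var = [input_var[c] * momentum + current_var[c] * (1 - momentum)
                   for c in range(C)]
    # Normalize with the *current* batch statistics, then scale and shift.
    Y = [[(row[c] - current_mean[c]) / math.sqrt(current_var[c] + epsilon)
          * scale[c] + B[c] for c in range(C)] for row in X]
    return Y, running_mean, running_var

Y, rm, rv = batchnorm_train([[1.0], [3.0]], scale=[1.0], B=[0.0],
                            input_mean=[0.0], input_var=[1.0])
# current_mean = [2.0], current_var = [1.0]; rm = [0.2], rv = [1.0]
```

Note that Y uses the batch statistics while the returned running statistics are what a framework would feed back in as 'input_mean'/'input_var' on the next step.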
When training_mode=False:
::

    Y = (X - input_mean) / sqrt(input_var + epsilon) * scale + B

For previous (deprecated) non-spatial cases, implementors are suggested
to flatten the input shape to (N x C * D1 * D2 * ... * Dn) before a BatchNormalization Op.
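The inference path can be sketched the same way (same illustrative 2-D layout): the stored statistics are used directly, there is no reduction over the batch, and only Y is produced.

```python
import math

def batchnorm_infer(X, scale, B, input_mean, input_var, epsilon=1e-5):
    """Sketch of inference-mode BatchNormalization: only Y is produced."""
    C = len(scale)
    # Normalize with the stored (estimated) statistics; no batch reduction.
    return [[(row[c] - input_mean[c]) / math.sqrt(input_var[c] + epsilon)
             * scale[c] + B[c] for c in range(C)] for row in X]

Y = batchnorm_infer([[1.0], [3.0]], scale=[2.0], B=[0.5],
                    input_mean=[1.0], input_var=[4.0])
# Y[0][0] = (1 - 1) / sqrt(4 + eps) * 2 + 0.5 ≈ 0.5
```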
This operator has **optional** inputs/outputs. See `ONNX <https://github.com/onnx/onnx/blob/master/docs/IR.md>`_ for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument's name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.
**Attributes**

* **epsilon**:
  The epsilon value to use to avoid division by zero.
* **momentum**:
  Factor used in computing the running mean and variance, e.g.,
  running_mean = running_mean * momentum + mean * (1 - momentum).
* **training_mode**:
  If set to true, it indicates BatchNormalization is being used for
  training, and outputs 1, 2, 3, and 4 would be populated.

**Inputs**

* **X** (heterogeneous) - **T**:
  Input data tensor from the previous operator; dimensions are in the
  form of (N x C x D1 x D2 ... Dn), where N is the batch size, C is
  the number of channels. Statistics are computed for every channel of
  C over N and D1 to Dn dimensions. For image data, input dimensions
  become (N x C x H x W). The op also accepts single dimension input
  of size N in which case C is assumed to be 1
- * **scale** (heterogeneous) - **T**:
+ * **scale** (heterogeneous) - **T1**:
  Scale tensor of shape (C).
- * **B** (heterogeneous) - **T**:
+ * **B** (heterogeneous) - **T1**:
  Bias tensor of shape (C).
- * **input_mean** (heterogeneous) - **U**:
+ * **input_mean** (heterogeneous) - **T2**:
  running (training) or estimated (testing) mean tensor of shape (C).
- * **input_var** (heterogeneous) - **U**:
+ * **input_var** (heterogeneous) - **T2**:
  running (training) or estimated (testing) variance tensor of shape
  (C).

**Outputs**

Between 1 and 3 outputs.

* **Y** (heterogeneous) - **T**:
  The output tensor of the same shape as X
- * **running_mean** (optional, heterogeneous) - **U**:
+ * **running_mean** (optional, heterogeneous) - **T2**:
  The running mean after the BatchNormalization operator.
- * **running_var** (optional, heterogeneous) - **U**:
+ * **running_var** (optional, heterogeneous) - **T2**:
  The running variance after the BatchNormalization operator. This op
  uses the population size (N) for calculating variance, and not the
  sample size N-1.

**Type Constraints**

* **T** in (
  tensor(bfloat16),
  tensor(double),
  tensor(float),
  tensor(float16)
  ):
  Constrain input and output types to float tensors.
- * **U** in (
+ * **T1** in (
  tensor(bfloat16),
  tensor(double),
  tensor(float),
  tensor(float16)
  ):
+ Constrain scale and bias types to float tensors.
+ * **T2** in (
+ tensor(bfloat16),
+ tensor(double),
+ tensor(float),
+ tensor(float16)
+ ):
- Constrain mean and variance types to float tensors. It allows all float type for U.
+ Constrain mean and variance types to float tensors.