BatchNormalization - 7 vs 9
The next section compares an older version of the operator with a newer one, after both definitions are converted into markdown text. In the diff below, lines prefixed with + (rendered green on the original page) are additions in the newer version, lines prefixed with - (rendered red) are deletions, and unprefixed lines are unchanged.
BatchNormalization7 → BatchNormalization9 (RENAMED)
```diff
@@ -1 +1 @@
 Carries out batch normalization as described in the paper
 https://arxiv.org/abs/1502.03167. Depending on the mode it is being run,
 there are multiple cases for the number of outputs, which we list below:
 Output case #1: Y, mean, var, saved_mean, saved_var (training mode)
 Output case #2: Y (test mode)
-
-For previous (depreciated) non-spatial cases, implementors are suggested
-to flatten the input shape to (N x C*D1*D2 ..*Dn) before a BatchNormalization Op.
-This operator has **optional** inputs/outputs. See ONNX <https://github.com/onnx/onnx/blob/master/docs/IR.md>_ for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument's name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.
+This operator has **optional** inputs/outputs. See ONNX <https://github.com/onnx/onnx/blob/master/docs/IR.md>_ for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument's name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.
 **Attributes**
 * **epsilon**:
   The epsilon value to use to avoid division by zero.
 * **momentum**:
   Factor used in computing the running mean and variance.e.g.,
   running_mean = running_mean * momentum + mean * (1 - momentum).
+* **spatial**:
+  If true, compute the mean and variance across per activation. If
+  false, compute the mean and variance across per feature over each
+  mini-batch.
 **Inputs**
 * **X** (heterogeneous) - **T**:
-  Input data tensor from the previous operator; dimensions
+  Input data tensor from the previous operator; dimensions for image
+  case are (N x C x H x W), where N is the batch size, C is the number
+  of channels, and H and W are the height and the width of the data.
+  For non image case, the dimensions are in the form of (N x C x D1 x
+  D2 ... Dn), where N is the batch size.
-  form of (N x C x D1 x D2 ... Dn), where N is the batch size, C is
-  the number of channels. Statistics are computed for every channel of
-  C over N and D1 to Dn dimensions. For image data, input dimensions
-  become (N x C x H x W). The op also accepts single dimension input
-  of size N in which case C is assumed to be 1
 * **scale** (heterogeneous) - **T**:
+  If spatial is true, the dimension of scale is (C). If spatial is
-
+  false, the dimensions of scale are (C x D1 x ... x Dn)
 * **B** (heterogeneous) - **T**:
+  If spatial is true, the dimension of bias is (C). If spatial is
-
+  false, the dimensions of bias are (C x D1 x ... x Dn)
 * **mean** (heterogeneous) - **T**:
+  If spatial is true, the dimension of the running mean (training) or
+  the estimated mean (testing) is (C). If spatial is false, the
-  running (training) or estimated
+  dimensions of the running mean (training) or the estimated mean
+  (testing) are (C x D1 x ... x Dn).
 * **var** (heterogeneous) - **T**:
+  If spatial is true, the dimension of the running variance(training)
+  or the estimated variance (testing) is (C). If spatial is false, the
-  running (training) or estimated
+  dimensions of the running variance(training) or the estimated
-  (C).
+  variance (testing) are (C x D1 x ... x Dn).
 **Outputs**
 Between 1 and 5 outputs.
 * **Y** (heterogeneous) - **T**:
   The output tensor of the same shape as X
 * **mean** (optional, heterogeneous) - **T**:
   The running mean after the BatchNormalization operator.
 * **var** (optional, heterogeneous) - **T**:
   The running variance after the BatchNormalization operator.
 * **saved_mean** (optional, heterogeneous) - **T**:
   Saved mean used during training to speed up gradient computation.
 * **saved_var** (optional, heterogeneous) - **T**:
   Saved variance used during training to speed up gradient
   computation.
 **Type Constraints**
 * **T** in (
   tensor(double),
   tensor(float),
   tensor(float16)
 ):
   Constrain input and output types to float tensors.
```
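As a reading aid, here is a minimal NumPy sketch of the semantics the two definitions describe. This is not ONNX's reference implementation: the function name `batch_norm`, the `training` flag, and the defaults `epsilon=1e-5` / `momentum=0.9` are illustrative assumptions. What it shows is how the `spatial` attribute changes the reduction axes and the expected parameter shapes, plus the running-statistics update quoted under the `momentum` attribute.

```python
import numpy as np

def batch_norm(X, scale, B, input_mean, input_var,
               epsilon=1e-5, momentum=0.9, spatial=True, training=False):
    """Sketch of BatchNormalization over X of shape (N, C, D1, ..., Dn)."""
    # spatial=True pools statistics over N and D1..Dn (one value per
    # channel, shape (C,)); spatial=False pools over the mini-batch
    # dimension only (one value per activation, shape (C, D1, ..., Dn)).
    axes = ((0,) + tuple(range(2, X.ndim))) if spatial else (0,)

    def bcast(p):
        # Reshape (C,) parameters to (C, 1, ..., 1) so they broadcast
        # against (N, C, D1, ..., Dn); per-activation parameters of shape
        # (C, D1, ..., Dn) already broadcast and are left unchanged.
        return p.reshape(p.shape + (1,) * (X.ndim - 1 - p.ndim))

    if training:
        saved_mean = X.mean(axis=axes)
        saved_var = X.var(axis=axes)
        # Running-statistics update, as stated in the momentum attribute:
        # running_mean = running_mean * momentum + mean * (1 - momentum)
        running_mean = input_mean * momentum + saved_mean * (1 - momentum)
        running_var = input_var * momentum + saved_var * (1 - momentum)
        mean, var = saved_mean, saved_var
    else:
        mean, var = input_mean, input_var

    Y = bcast(scale) * (X - bcast(mean)) / np.sqrt(bcast(var) + epsilon) + bcast(B)
    if training:
        # Output case #1: Y, mean, var, saved_mean, saved_var (training mode)
        return Y, running_mean, running_var, saved_mean, saved_var
    return Y  # Output case #2: Y (test mode)

# Test mode with spatial statistics: all parameters have shape (C,).
X = np.random.randn(2, 3, 4, 4).astype(np.float32)
C = X.shape[1]
Y = batch_norm(X, scale=np.ones(C), B=np.zeros(C),
               input_mean=np.zeros(C), input_var=np.ones(C))
assert Y.shape == X.shape
```

With `spatial=False`, the same call would expect `scale`, `B`, `input_mean`, and `input_var` of shape (C x D1 x ... x Dn), matching the per-activation shapes listed in the diff above.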