BatchNormalization - 1 vs 6
BatchNormalization1 → BatchNormalization6 (renamed)
```diff
@@ -1,60 +1,62 @@
 Carries out batch normalization as described in the paper
 https://arxiv.org/abs/1502.03167. Depending on the mode it is being run,
 there are multiple cases for the number of outputs, which we list below:
 Output case #1: Y, mean, var, saved_mean, saved_var (training mode)
 Output case #2: Y (test mode)
 **Attributes**
-* **consumed_inputs** (required):
-  legacy optimization attribute.
 * **epsilon**:
   The epsilon value to use to avoid division by zero, default is
   1e-5f.
 * **is_test**:
   If set to nonzero, run spatial batch normalization in test mode,
   default is 0.
 * **momentum**:
   Factor used in computing the running mean and variance.e.g.,
   running_mean = running_mean * momentum + mean * (1 - momentum),
   default is 0.9f.
 * **spatial**:
   If true, compute the mean and variance across all spatial elements
   If false, compute the mean and variance across per feature.Default
   is 1.
 **Inputs**
 * **X** (heterogeneous) - **T**:
-
+  Input data tensor from the previous operator; dimensions for image
+  case are (N x C x H x W), where N is the batch size, C is the number
+  of channels, and H and W are the height and the width of the data.
+  For non image case, the dimensions are in the form of (N x C x D1 x
+  D2 ... Dn), where N is the batch size.
 * **scale** (heterogeneous) - **T**:
   The scale as a 1-dimensional tensor of size C to be applied to the
   output.
 * **B** (heterogeneous) - **T**:
   The bias as a 1-dimensional tensor of size C to be applied to the
   output.
 * **mean** (heterogeneous) - **T**:
   The running mean (training) or the estimated mean (testing) as a
   1-dimensional tensor of size C.
 * **var** (heterogeneous) - **T**:
   The running variance (training) or the estimated variance (testing)
   as a 1-dimensional tensor of size C.
 **Outputs**
 Between 1 and 5 outputs.
 * **Y** (heterogeneous) - **T**:
-  The output
+  The output tensor of the same shape as X.
 * **mean** (optional, heterogeneous) - **T**:
   The running mean after the BatchNormalization operator. Must be in-
   place with the input mean. Should not be used for testing.
 * **var** (optional, heterogeneous) - **T**:
   The running variance after the BatchNormalization operator. Must be
   in-place with the input var. Should not be used for testing.
 * **saved_mean** (optional, heterogeneous) - **T**:
   Saved mean used during training to speed up gradient computation.
   Should not be used for testing.
 * **saved_var** (optional, heterogeneous) - **T**:
   Saved variance used during training to speed up gradient
   computation. Should not be used for testing.
 **Type Constraints**
 * **T** in (
   tensor(double),
   tensor(float),
   tensor(float16)
 ):
   Constrain input and output types to float tensors.
```