BatchNormalization - 1 vs 6

BatchNormalization1 → BatchNormalization6 RENAMED
@@ -1 +1 @@
  Carries out batch normalization as described in the paper
  https://arxiv.org/abs/1502.03167. Depending on the mode in which it is run,
  there are multiple cases for the number of outputs, which we list below:
  Output case #1: Y, mean, var, saved_mean, saved_var (training mode)
  Output case #2: Y (test mode)
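The two output cases differ in where the statistics come from. In test mode, Y is computed from the supplied mean and var inputs using the normalization from the cited paper, Y = scale * (X - mean) / sqrt(var + epsilon) + B. The following is a minimal pure-Python sketch of output case #2 only. The nested-list layout x[n][c][s] (all spatial positions flattened into s) and the function name `batch_norm_test_mode` are illustrative assumptions, not part of the operator definition.

```python
import math
from typing import List

# Assumed layout: x[n][c][s] = batch index, channel, flattened spatial position.
Tensor3 = List[List[List[float]]]


def batch_norm_test_mode(
    x: Tensor3,
    scale: List[float],
    bias: List[float],
    est_mean: List[float],
    est_var: List[float],
    epsilon: float = 1e-5,
) -> Tensor3:
    """Output case #2: only Y, normalized with the supplied estimated statistics."""
    y: Tensor3 = []
    for sample in x:
        out_sample = []
        for c, channel in enumerate(sample):
            # One divisor per channel; epsilon guards against division by zero.
            inv_std = 1.0 / math.sqrt(est_var[c] + epsilon)
            out_sample.append(
                [scale[c] * (v - est_mean[c]) * inv_std + bias[c] for v in channel]
            )
        y.append(out_sample)
    return y


if __name__ == "__main__":
    x = [[[1.0, 3.0], [10.0, 20.0]]]  # N=1, C=2, two spatial positions
    y = batch_norm_test_mode(
        x, scale=[1.0, 2.0], bias=[0.0, 1.0],
        est_mean=[2.0, 15.0], est_var=[1.0, 25.0], epsilon=0.0,
    )
    print(y)  # [[[-1.0, 1.0], [-1.0, 3.0]]]
```

The usage line sets epsilon to 0.0 only to keep the arithmetic exact; the attribute's documented default is 1e-5.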
  **Attributes**
- * **consumed_inputs** (required):
- Legacy optimization attribute.
  * **epsilon**:
  The epsilon value to use to avoid division by zero, default is
  1e-5f.
  * **is_test**:
  If set to nonzero, run spatial batch normalization in test mode,
  default is 0.
  * **momentum**:
  Factor used in computing the running mean and variance, e.g.,
  running_mean = running_mean * momentum + mean * (1 - momentum),
  default is 0.9f.
  * **spatial**:
  If true, compute the mean and variance across all spatial elements.
  If false, compute the mean and variance per feature. Default
  is 1.
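The spatial and momentum attributes control how batch statistics are gathered and how the running estimates are blended during training. The sketch below, under the same assumed x[n][c][s] layout, shows which elements each statistic is reduced over and applies the momentum formula quoted above in place. The helper names `batch_stats` and `update_running` are hypothetical, and the biased (divide-by-count) variance is an assumption, since this version of the text does not say which variance estimator is used.

```python
from typing import List, Tuple

Tensor3 = List[List[List[float]]]  # assumed layout x[n][c][s]


def batch_stats(x: Tensor3, spatial: bool) -> Tuple[List[float], List[float]]:
    """Return batch mean and biased variance.

    spatial=True:  one value per channel, reduced over N and every spatial position.
    spatial=False: one value per (channel, position) pair, reduced over N only,
                   returned flattened in channel-major order.
    """
    n_batch, n_chan, n_pos = len(x), len(x[0]), len(x[0][0])
    if spatial:
        groups = [[x[n][c][s] for n in range(n_batch) for s in range(n_pos)]
                  for c in range(n_chan)]
    else:
        groups = [[x[n][c][s] for n in range(n_batch)]
                  for c in range(n_chan) for s in range(n_pos)]
    means = [sum(g) / len(g) for g in groups]
    # Assumption: divide by the element count (biased estimator).
    variances = [sum((v - m) ** 2 for v in g) / len(g)
                 for g, m in zip(groups, means)]
    return means, variances


def update_running(running: List[float], batch: List[float],
                   momentum: float = 0.9) -> None:
    """running = running * momentum + batch * (1 - momentum), modified in place."""
    for i, b in enumerate(batch):
        running[i] = running[i] * momentum + b * (1.0 - momentum)


if __name__ == "__main__":
    x = [[[1.0, 3.0]], [[5.0, 7.0]]]       # N=2, C=1, two positions
    print(batch_stats(x, spatial=True))    # ([4.0], [5.0])
    print(batch_stats(x, spatial=False))   # ([3.0, 5.0], [4.0, 4.0])
    running_mean = [0.0]
    update_running(running_mean, [4.0])    # approximately [0.4]
    print(running_mean)
```

In this sketch, spatial=False yields one statistic per (channel, position) pair, which is more than the size-C vectors listed for the mean and var inputs; the text of this version does not spell out how that case is shaped.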
  **Inputs**
  * **X** (heterogeneous) - **T**:
- The input 4-dimensional tensor of shape NCHW.
+ Input data tensor from the previous operator; dimensions for the image
+ case are (N x C x H x W), where N is the batch size, C is the number
+ of channels, and H and W are the height and the width of the data.
+ For the non-image case, the dimensions are in the form of (N x C x D1 x
+ D2 ... Dn), where N is the batch size.
  * **scale** (heterogeneous) - **T**:
  The scale as a 1-dimensional tensor of size C to be applied to the
  output.
  * **B** (heterogeneous) - **T**:
  The bias as a 1-dimensional tensor of size C to be applied to the
  output.
  * **mean** (heterogeneous) - **T**:
  The running mean (training) or the estimated mean (testing) as a
  1-dimensional tensor of size C.
  * **var** (heterogeneous) - **T**:
  The running variance (training) or the estimated variance (testing)
  as a 1-dimensional tensor of size C.
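Version 6 relaxes X from a fixed NCHW tensor to (N x C x D1 x ... x Dn), while scale, B, mean, and var stay 1-dimensional of size C. That works because the channel of any element depends only on the flattened spatial size, not on how many spatial dimensions there are. A minimal sketch, assuming a row-major flat buffer (a memory-layout assumption the operator text does not state); `channel_of` and `per_channel_mean` are hypothetical helpers.

```python
from math import prod
from typing import List, Sequence


def channel_of(flat_index: int, shape: Sequence[int]) -> int:
    """Channel of an element in a row-major buffer of shape (N, C, D1, ..., Dn)."""
    channels = shape[1]
    spatial_size = prod(shape[2:])  # D1 * ... * Dn (1 if there are none)
    return (flat_index // spatial_size) % channels


def per_channel_mean(data: List[float], shape: Sequence[int]) -> List[float]:
    """Mean over the batch and all spatial dims, whatever their number."""
    channels = shape[1]
    sums = [0.0] * channels
    for i, value in enumerate(data):
        sums[channel_of(i, shape)] += value
    count = len(data) // channels  # elements per channel: N * D1 * ... * Dn
    return [s / count for s in sums]


if __name__ == "__main__":
    image_like = [float(i) for i in range(8)]         # shape (N=2, C=2, H=1, W=2)
    print(per_channel_mean(image_like, (2, 2, 1, 2)))  # [2.5, 4.5]
    sequence_like = [float(i) for i in range(6)]      # shape (N=1, C=3, D1=2)
    print(per_channel_mean(sequence_like, (1, 3, 2)))  # [0.5, 2.5, 4.5]
```

The same index arithmetic is what lets a size-C vector such as scale or B be applied to X of any rank.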
  **Outputs**
  Between 1 and 5 outputs.
  * **Y** (heterogeneous) - **T**:
- The output 4-dimensional tensor of the same shape as X.
+ The output tensor of the same shape as X.
  * **mean** (optional, heterogeneous) - **T**:
  The running mean after the BatchNormalization operator. Must be
  in-place with the input mean. Should not be used for testing.
  * **var** (optional, heterogeneous) - **T**:
  The running variance after the BatchNormalization operator. Must be
  in-place with the input var. Should not be used for testing.
  * **saved_mean** (optional, heterogeneous) - **T**:
  Saved mean used during training to speed up gradient computation.
  Should not be used for testing.
  * **saved_var** (optional, heterogeneous) - **T**:
  Saved variance used during training to speed up gradient
  computation. Should not be used for testing.
  **Type Constraints**
  * **T** in (
  tensor(double),
  tensor(float),
  tensor(float16)
  ):
  Constrain input and output types to float tensors.