BatchNormalization - 1 vs 6

The next section compares an older version of the same operator with a newer one after both definitions are converted into markdown text. Green (lines prefixed with +) means an addition to the newer version, red (lines prefixed with -) means a deletion. Anything else is unchanged.

BatchNormalization1 → BatchNormalization6 RENAMED
@@ -1 +1 @@
  Carries out batch normalization as described in the paper
  https://arxiv.org/abs/1502.03167. Depending on the mode it is being run,
  there are multiple cases for the number of outputs, which we list below:
  Output case #1: Y, mean, var, saved_mean, saved_var (training mode)
  Output case #2: Y (test mode)
  **Attributes**
+ * **consumed_inputs** (required):
+ legacy optimization attribute.
  * **epsilon**:
  The epsilon value to use to avoid division by zero, default is
  1e-5f.
  * **is_test**:
  If set to nonzero, run spatial batch normalization in test mode,
  default is 0.
  * **momentum**:
  Factor used in computing the running mean and variance.e.g.,
  running_mean = running_mean * momentum + mean * (1 - momentum),
  default is 0.9f.
  * **spatial**:
  If true, compute the mean and variance across all spatial elements
  If false, compute the mean and variance across per feature.Default
  is 1.
  **Inputs**
  * **X** (heterogeneous) - **T**:
+ The input 4-dimensional tensor of shape NCHW.
- Input data tensor from the previous operator; dimensions for image
- case are (N x C x H x W), where N is the batch size, C is the number
- of channels, and H and W are the height and the width of the data.
- For non image case, the dimensions are in the form of (N x C x D1 x
- D2 ... Dn), where N is the batch size.
  * **scale** (heterogeneous) - **T**:
  The scale as a 1-dimensional tensor of size C to be applied to the
  output.
  * **B** (heterogeneous) - **T**:
  The bias as a 1-dimensional tensor of size C to be applied to the
  output.
  * **mean** (heterogeneous) - **T**:
  The running mean (training) or the estimated mean (testing) as a
  1-dimensional tensor of size C.
  * **var** (heterogeneous) - **T**:
  The running variance (training) or the estimated variance (testing)
  as a 1-dimensional tensor of size C.
  **Outputs**
  Between 1 and 5 outputs.
  * **Y** (heterogeneous) - **T**:
- The output tensor of the same shape as X.
+ The output 4-dimensional tensor of the same shape as X.
  * **mean** (optional, heterogeneous) - **T**:
  The running mean after the BatchNormalization operator. Must be in-
  place with the input mean. Should not be used for testing.
  * **var** (optional, heterogeneous) - **T**:
  The running variance after the BatchNormalization operator. Must be
  in-place with the input var. Should not be used for testing.
  * **saved_mean** (optional, heterogeneous) - **T**:
  Saved mean used during training to speed up gradient computation.
  Should not be used for testing.
  * **saved_var** (optional, heterogeneous) - **T**:
  Saved variance used during training to speed up gradient
  computation. Should not be used for testing.
  **Type Constraints**
  * **T** in (
  tensor(double),
  tensor(float),
  tensor(float16)
  ):
  Constrain input and output types to float tensors.
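
The differences above touch only the descriptions of X and Y and the legacy consumed_inputs attribute; the computation itself is unchanged between the two versions. As a rough illustration of that computation, below is a minimal NumPy sketch of test mode (output case #2) and of the running-statistics update quoted in the momentum attribute. It assumes the standard per-channel formula from the referenced paper, Y = scale * (X - mean) / sqrt(var + epsilon) + B, applied over an NCHW input with spatial set to 1; the same momentum form is assumed for the variance. Function and variable names are illustrative only and are not part of the ONNX API.

```python
import numpy as np


def batch_norm_test_mode(X, scale, B, mean, var, epsilon=1e-5):
    """Output case #2: Y only (test mode), using per-channel statistics of size C."""
    # Reshape the 1-dimensional, C-sized parameters so they broadcast over (N, C, H, W).
    shape = (1, -1) + (1,) * (X.ndim - 2)
    s = scale.reshape(shape)
    b = B.reshape(shape)
    m = mean.reshape(shape)
    v = var.reshape(shape)
    return s * (X - m) / np.sqrt(v + epsilon) + b


def update_running_stats(running_mean, running_var, batch_mean, batch_var, momentum=0.9):
    """Running-statistics update as described for the momentum attribute
    (the variance update mirrors the documented mean update; an assumption here)."""
    new_mean = running_mean * momentum + batch_mean * (1 - momentum)
    new_var = running_var * momentum + batch_var * (1 - momentum)
    return new_mean, new_var


# Example: a 4-dimensional NCHW input with C = 3.
X = np.random.randn(2, 3, 4, 4).astype(np.float32)
scale = np.ones(3, dtype=np.float32)
B = np.zeros(3, dtype=np.float32)
mean = X.mean(axis=(0, 2, 3))   # per-channel mean (spatial = 1)
var = X.var(axis=(0, 2, 3))     # per-channel variance (spatial = 1)
Y = batch_norm_test_mode(X, scale, B, mean, var)
```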