# BatchNormalization - 7 vs 9

The next section compares an older version of the operator with a newer one, after both definitions are converted into markdown text. Green means an addition to the newer version, red means a deletion. Anything else is unchanged.

BatchNormalization7 → BatchNormalization9 RENAMED
```diff
@@ -1 +1 @@
  Carries out batch normalization as described in the paper
  https://arxiv.org/abs/1502.03167. Depending on the mode it is being run,
  there are multiple cases for the number of outputs, which we list below:
  Output case #1: Y, mean, var, saved_mean, saved_var (training mode)
  Output case #2: Y (test mode)
-
- For previous (depreciated) non-spatial cases, implementors are suggested
- to flatten the input shape to (N x C*D1*D2 ..*Dn) before a BatchNormalization Op.
- This operator has **optional** inputs/outputs. See ONNX <https://github.com/onnx/onnx/blob/master/docs/IR.md>_ for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument's name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.
+ This operator has **optional** inputs/outputs. See ONNX <https://github.com/onnx/onnx/blob/master/docs/IR.md>_ for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument's name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.
  **Attributes**
  * **epsilon**:
  The epsilon value to use to avoid division by zero.
  * **momentum**:
  Factor used in computing the running mean and variance.e.g.,
  running_mean = running_mean * momentum + mean * (1 - momentum).
+ * **spatial**:
+ If true, compute the mean and variance across per activation. If
+ false, compute the mean and variance across per feature over each
+ mini-batch.
  **Inputs**
  * **X** (heterogeneous) - **T**:
- Input data tensor from the previous operator; dimensions are in the
- form of (N x C x D1 x D2 ... Dn), where N is the batch size, C is
- the number of channels. Statistics are computed for every channel of
- C over N and D1 to Dn dimensions. For image data, input dimensions
- become (N x C x H x W). The op also accepts single dimension input
- of size N in which case C is assumed to be 1
+ Input data tensor from the previous operator; dimensions for image
+ case are (N x C x H x W), where N is the batch size, C is the number
+ of channels, and H and W are the height and the width of the data.
+ For non image case, the dimensions are in the form of (N x C x D1 x
+ D2 ... Dn), where N is the batch size.
  * **scale** (heterogeneous) - **T**:
- Scale tensor of shape (C).
+ If spatial is true, the dimension of scale is (C). If spatial is
+ false, the dimensions of scale are (C x D1 x ... x Dn)
  * **B** (heterogeneous) - **T**:
- Bias tensor of shape (C).
+ If spatial is true, the dimension of bias is (C). If spatial is
+ false, the dimensions of bias are (C x D1 x ... x Dn)
  * **mean** (heterogeneous) - **T**:
- running (training) or estimated (testing) mean tensor of shape (C).
+ If spatial is true, the dimension of the running mean (training) or
+ the estimated mean (testing) is (C). If spatial is false, the
+ dimensions of the running mean (training) or the estimated mean
+ (testing) are (C x D1 x ... x Dn).
  * **var** (heterogeneous) - **T**:
- running (training) or estimated (testing) variance tensor of shape
- (C).
+ If spatial is true, the dimension of the running variance (training)
+ or the estimated variance (testing) is (C). If spatial is false, the
+ dimensions of the running variance (training) or the estimated
+ variance (testing) are (C x D1 x ... x Dn).
  **Outputs**
  Between 1 and 5 outputs.
  * **Y** (heterogeneous) - **T**:
  The output tensor of the same shape as X
  * **mean** (optional, heterogeneous) - **T**:
  The running mean after the BatchNormalization operator.
  * **var** (optional, heterogeneous) - **T**:
  The running variance after the BatchNormalization operator.
  * **saved_mean** (optional, heterogeneous) - **T**:
  Saved mean used during training to speed up gradient computation.
  * **saved_var** (optional, heterogeneous) - **T**:
  Saved variance used during training to speed up gradient
  computation.
  **Type Constraints**
  * **T** in (
  tensor(double),
  tensor(float),
  tensor(float16)
  ):
  Constrain input and output types to float tensors.
```
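As a rough numpy sketch (not the normative ONNX reference implementation), the spatial=1 behavior described above can be illustrated with per-channel statistics broadcast over the batch and spatial dimensions, together with the running-mean update formula quoted in the momentum attribute. The function names here are made up for illustration:

```python
import numpy as np

def batch_norm_test_mode(X, scale, B, mean, var, epsilon=1e-5):
    """Illustrative test-mode batch normalization with spatial=1.

    X has shape (N, C, D1, ..., Dn); scale, B, mean, and var have
    shape (C,), matching the spatial=true case in the spec above.
    """
    # Reshape the (C,) parameters to (1, C, 1, ..., 1) so they
    # broadcast over N and the spatial dimensions.
    shape = (1, X.shape[1]) + (1,) * (X.ndim - 2)
    s, b = scale.reshape(shape), B.reshape(shape)
    m, v = mean.reshape(shape), var.reshape(shape)
    return s * (X - m) / np.sqrt(v + epsilon) + b

def update_running_mean(running_mean, batch_mean, momentum=0.9):
    """The update quoted in the momentum attribute description:
    running_mean = running_mean * momentum + mean * (1 - momentum)."""
    return running_mean * momentum + batch_mean * (1 - momentum)

# Example: image-case input (N=2, C=3, H=2, W=2).
X = np.random.randn(2, 3, 2, 2).astype(np.float32)
scale = np.ones(3, dtype=np.float32)
B = np.zeros(3, dtype=np.float32)
mean = X.mean(axis=(0, 2, 3))  # per-channel mean (spatial=1)
var = X.var(axis=(0, 2, 3))    # per-channel variance (spatial=1)
Y = batch_norm_test_mode(X, scale, B, mean, var)
```

Because `Y` is normalized with the exact batch statistics here, each channel of `Y` comes out with mean close to zero, and `Y` keeps the shape of `X` as the Outputs section requires.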