# Conv - 1 vs 11
The next section compares an older with a newer version of the same operator after both definitions are converted into markdown text. Green means an addition to the newer version, red means a deletion. Anything else is unchanged.
- Conv1 → Conv11 +9 -16

## Conv1 → Conv11 (RENAMED)
```diff
@@ -1,73 +1,66 @@
 The convolution operator consumes an input tensor and a filter, and
 computes the output.
 **Attributes**
 * **auto_pad**:
   auto_pad must be either NOTSET, SAME_UPPER, SAME_LOWER or VALID.
   Where default value is NOTSET, which means explicit padding is used.
-  SAME_UPPER or SAME_LOWER mean pad the input so that output_shape[i]
-  = ceil(input_shape[i] / strides[i]) for each axis i. The padding
-  is split between the two sides equally or almost equally (depending
-  on whether it is even or odd). In case the padding is an odd number,
-  the extra padding is added at the end for SAME_UPPER and at the
-  beginning for SAME_LOWER.
+  SAME_UPPER or SAME_LOWER mean pad the input so that the output
+  spatial size match the input. In case of odd number add the extra
+  padding at the end for SAME_UPPER and at the beginning for
+  SAME_LOWER. VALID mean no padding.
 * **dilations**:
-  dilation value along each spatial axis of the filter. If not
-  present, the dilation defaults is 1 along each spatial axis.
+  dilation value along each spatial axis of the filter.
 * **group**:
   number of groups input channels and output channels are divided
   into.
 * **kernel_shape**:
   The shape of the convolution kernel. If not present, should be
   inferred from input W.
 * **pads**:
   Padding for the beginning and ending along each spatial axis, it can
   take any value greater than or equal to 0. The value represent the
   number of pixels added to the beginning and end part of the
   corresponding axis. pads format should be as follow [x1_begin,
   x2_begin...x1_end, x2_end,...], where xi_begin the number of pixels
   added at the beginning of axis i and xi_end, the number of pixels
   added at the end of axis i. This attribute cannot be used
   simultaneously with auto_pad attribute. If not present, the padding
   defaults to 0 along start and end of each spatial axis.
 * **strides**:
-  Stride along each spatial axis. If not present, the stride defaults
-  is 1 along each spatial axis.
+  Stride along each spatial axis.
 **Inputs**
 Between 2 and 3 inputs.
 * **X** (heterogeneous) - **T**:
   Input data tensor from previous layer; has size (N x C x H x W),
   where N is the batch size, C is the number of channels, and H and W
   are the height and width. Note that this is for the 2D image.
   Otherwise the size is (N x C x D1 x D2 ... x Dn). Optionally, if
   dimension denotation is in effect, the operation expects input data
   tensor to arrive with the dimension denotation of [DATA_BATCH,
   DATA_CHANNEL, DATA_FEATURE, DATA_FEATURE ...].
 * **W** (heterogeneous) - **T**:
   The weight tensor that will be used in the convolutions; has size (M
   x C/group x kH x kW), where C is the number of channels, and kH and
   kW are the height and width of the kernel, and M is the number of
   feature maps. For more than 2 dimensions, the kernel shape will be
   (M x C/group x k1 x k2 x ... x kn), where (k1 x k2 x ... kn) is the
   dimension of the kernel. Optionally, if dimension denotation is in
   effect, the operation expects the weight tensor to arrive with the
   dimension denotation of [FILTER_OUT_CHANNEL, FILTER_IN_CHANNEL,
-  FILTER_SPATIAL, FILTER_SPATIAL ...]. Assuming zero based indices for
-  the shape array, X.shape[1] == (W.shape[1] * group) == C and
-  W.shape[0] mod G == 0. Or in other words FILTER_IN_CHANNEL
-  multiplied by the number of groups should be equal to DATA_CHANNEL
-  and the number of feature maps M should be a multiple of the number
-  of groups G.
+  FILTER_SPATIAL, FILTER_SPATIAL ...]. X.shape[1] == (W.shape[1] *
+  group) == C (assuming zero based indices for the shape array). Or in
+  other words FILTER_IN_CHANNEL should be equal to DATA_CHANNEL.
 * **B** (optional, heterogeneous) - **T**:
   Optional 1D bias to be added to the convolution, has size of M.
 **Outputs**
 * **Y** (heterogeneous) - **T**:
   Output data tensor that contains the result of the convolution. The
   output dimensions are functions of the kernel size, stride size, and
   pad lengths.
 **Type Constraints**
 * **T** in (
   tensor(double),
   tensor(float),
   tensor(float16)
 ):
   Constrain input and output types to float tensors.
```
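The auto_pad behavior described in the diff — compute `output_shape[i] = ceil(input_shape[i] / strides[i])`, split the total padding between the two sides, and put the extra pixel at the end for SAME_UPPER or at the beginning for SAME_LOWER — can be sketched in plain Python. `same_pads` below is a hypothetical helper for illustration, not part of the `onnx` package; it also emits the spec's pads layout `[x1_begin, x2_begin, ..., x1_end, x2_end, ...]`.

```python
import math


def same_pads(input_shape, kernel_shape, strides, dilations, mode):
    """Sketch of SAME_UPPER / SAME_LOWER padding for one Conv call.

    For each spatial axis i, the target output size is
    ceil(input_shape[i] / strides[i]); the total padding needed to reach
    it is split almost equally, with the odd extra pixel going to the
    end for SAME_UPPER and to the beginning for SAME_LOWER.
    """
    begins, ends = [], []
    for d, k, s, r in zip(input_shape, kernel_shape, strides, dilations):
        out = math.ceil(d / s)
        eff_k = (k - 1) * r + 1                 # effective kernel size with dilation
        total = max(0, (out - 1) * s + eff_k - d)  # total padding for this axis
        small = total // 2
        big = total - small
        if mode == "SAME_UPPER":
            begins.append(small)
            ends.append(big)
        else:  # SAME_LOWER
            begins.append(big)
            ends.append(small)
    # pads layout per the spec: [x1_begin, x2_begin, ..., x1_end, x2_end, ...]
    return begins + ends
```

For an even total the two modes agree (e.g. a length-5 axis with a 3-wide kernel and stride 1 needs 1 pixel on each side); they only differ when the total is odd, e.g. a 4-wide kernel on the same axis yields begin 1 / end 2 for SAME_UPPER and begin 2 / end 1 for SAME_LOWER.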