Conv
Conv - 11
Version
name: Conv (GitHub)
domain: main
since_version: 11
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 11.
Summary
The convolution operator consumes an input tensor and a filter, and computes the output.
Attributes
auto_pad: auto_pad must be either NOTSET, SAME_UPPER, SAME_LOWER or VALID. The default value NOTSET means explicit padding is used. SAME_UPPER or SAME_LOWER mean pad the input so that output_shape[i] = ceil(input_shape[i] / strides[i]) for each axis i. The padding is split between the two sides equally or almost equally (depending on whether the total is even or odd). If the total padding is odd, the extra pixel is added at the end for SAME_UPPER and at the beginning for SAME_LOWER. VALID means no padding. Default value is 'NOTSET'.
dilations: Dilation value along each spatial axis of the filter. If not present, the dilation defaults to 1 along each spatial axis.
group: Number of groups input channels and output channels are divided into. Default value is 1.
kernel_shape: The shape of the convolution kernel. If not present, it should be inferred from input W.
pads: Padding for the beginning and end of each spatial axis; it can take any value greater than or equal to 0. The values represent the number of pixels added to the beginning and end of the corresponding axis. The pads format should be [x1_begin, x2_begin, …, x1_end, x2_end, …], where xi_begin is the number of pixels added at the beginning of axis i and xi_end is the number of pixels added at the end of axis i. This attribute cannot be used simultaneously with the auto_pad attribute. If not present, the padding defaults to 0 at the start and end of each spatial axis.
strides: Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.
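The auto_pad rules above can be turned into an explicit pads value. The following is a minimal plain-Python sketch, not the ONNX implementation: the function name auto_pads is hypothetical, input_shape is assumed to hold only the spatial dimensions, and folding dilations into an effective kernel size of (k - 1) * d + 1 is an assumption that goes beyond the attribute text. The result uses the pads layout [x1_begin, x2_begin, …, x1_end, x2_end, …].

```python
from typing import List, Sequence


def auto_pads(auto_pad: str,
              input_shape: Sequence[int],
              kernel_shape: Sequence[int],
              strides: Sequence[int],
              dilations: Sequence[int]) -> List[int]:
    """Explicit pads [x1_begin, x2_begin, ..., x1_end, x2_end, ...] for an auto_pad mode.

    input_shape holds only the spatial sizes D1 ... Dn (not N or C).
    """
    if auto_pad == "VALID":
        return [0] * (2 * len(input_shape))       # VALID means no padding
    if auto_pad not in ("SAME_UPPER", "SAME_LOWER"):
        raise ValueError("NOTSET means the explicit pads attribute is used instead")
    begins: List[int] = []
    ends: List[int] = []
    for size, k, s, d in zip(input_shape, kernel_shape, strides, dilations):
        out = (size + s - 1) // s                 # output_shape[i] = ceil(input_shape[i] / strides[i])
        effective_k = (k - 1) * d + 1             # kernel extent after dilation (assumption)
        total = max(0, (out - 1) * s + effective_k - size)
        half = total // 2
        if auto_pad == "SAME_UPPER":
            begins.append(half)                   # odd extra pixel goes at the end
            ends.append(total - half)
        else:
            begins.append(total - half)           # odd extra pixel goes at the beginning
            ends.append(half)
    return begins + ends
```

For the conv_with_autopad_same example (5 x 5 input, 3 x 3 kernel, strides of 2) this gives [1, 1, 1, 1], the same padding as pads=[1, 1, 1, 1] in conv_with_strides, which is why both examples produce the same first two output rows.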
Inputs
Between 2 and 3 inputs.
X (heterogeneous) - T: Input data tensor from the previous layer; has size (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and width. This is the 2D image case; in general the size is (N x C x D1 x D2 … x Dn). Optionally, if dimension denotation is in effect, the operation expects the input data tensor to arrive with the dimension denotation of [DATA_BATCH, DATA_CHANNEL, DATA_FEATURE, DATA_FEATURE …].
W (heterogeneous) - T: The weight tensor used in the convolution; has size (M x C/group x kH x kW), where C is the number of channels, kH and kW are the height and width of the kernel, and M is the number of feature maps. For more than 2 dimensions, the kernel shape is (M x C/group x k1 x k2 x … x kn), where (k1 x k2 x … x kn) is the dimension of the kernel. Optionally, if dimension denotation is in effect, the operation expects the weight tensor to arrive with the dimension denotation of [FILTER_OUT_CHANNEL, FILTER_IN_CHANNEL, FILTER_SPATIAL, FILTER_SPATIAL …]. Assuming zero-based indices for the shape array, X.shape[1] == (W.shape[1] * group) == C and W.shape[0] mod G == 0, where G is the value of group. In other words, FILTER_IN_CHANNEL multiplied by the number of groups must equal DATA_CHANNEL, and the number of feature maps M must be a multiple of the number of groups G.
B (optional, heterogeneous) - T: Optional 1D bias added to the convolution; has size M.
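The shape rules for X, W and B can be checked before a node is built. This is a minimal sketch of the constraints stated above; check_conv_shapes is a hypothetical helper (not part of the onnx package), and shapes are passed as plain integer sequences.

```python
from typing import Optional, Sequence


def check_conv_shapes(x_shape: Sequence[int],
                      w_shape: Sequence[int],
                      b_shape: Optional[Sequence[int]] = None,
                      group: int = 1) -> None:
    """Raise ValueError if X, W and B violate the Conv input shape rules."""
    # X is (N x C x D1 ... Dn) and W is (M x C/group x k1 ... kn), so both share one rank.
    if len(x_shape) < 3 or len(w_shape) != len(x_shape):
        raise ValueError("X must be (N x C x D1 ... Dn) and W must have the same rank")
    c = x_shape[1]
    m = w_shape[0]
    # FILTER_IN_CHANNEL * group must equal DATA_CHANNEL.
    if w_shape[1] * group != c:
        raise ValueError(f"W.shape[1] * group = {w_shape[1] * group}, expected C = {c}")
    # The M feature maps are split evenly across the groups.
    if m % group != 0:
        raise ValueError(f"M = {m} is not a multiple of group = {group}")
    # The optional bias holds one value per feature map.
    if b_shape is not None and tuple(b_shape) != (m,):
        raise ValueError(f"B must be 1D of size M = {m}, got {tuple(b_shape)}")
```

For example, check_conv_shapes([1, 4, 8, 8], [6, 2, 3, 3], [6], group=2) passes, while a W of shape [5, 2, 3, 3] with group=2 is rejected because 5 feature maps cannot be split across 2 groups.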
Outputs
Y (heterogeneous) - T: Output data tensor that contains the result of the convolution. The output dimensions are functions of the kernel size, stride size, and pad lengths.
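The text above leaves the exact dependence open. The usual convolution arithmetic, given here as an assumption that matches the example output shapes rather than as normative ONNX text, is out_i = floor((in_i + pad_begin_i + pad_end_i - ((k_i - 1) * d_i + 1)) / s_i) + 1, where d_i is the dilation and s_i the stride along axis i. A plain-Python sketch follows; the name conv_output_shape is hypothetical.

```python
from typing import List, Sequence


def conv_output_shape(x_shape: Sequence[int],
                      w_shape: Sequence[int],
                      pads: Sequence[int],
                      strides: Sequence[int],
                      dilations: Sequence[int]) -> List[int]:
    """Shape of Y for X of shape (N x C x D1 ... Dn) and W of shape (M x C/group x k1 ... kn)."""
    spatial = x_shape[2:]
    kernel = w_shape[2:]
    r = len(spatial)
    out = [x_shape[0], w_shape[0]]                   # N from X, M feature maps from W
    for i in range(r):
        effective_k = (kernel[i] - 1) * dilations[i] + 1
        padded = spatial[i] + pads[i] + pads[i + r]  # begin pad at index i, end pad at index i + r
        out.append((padded - effective_k) // strides[i] + 1)
    return out
```

For the first conv_with_strides case (input (1, 1, 7, 5), pads=[1, 1, 1, 1], strides=[2, 2]) it returns [1, 1, 4, 3], matching the shape noted on y_with_padding.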
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
Examples
conv_with_strides
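import numpy as np
import onnx
from onnx.backend.test.case.node import expect  # test helper from the ONNX backend test suite

x = np.array([[[[0., 1., 2., 3., 4.], # (1, 1, 7, 5) input tensor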
[5., 6., 7., 8., 9.],
[10., 11., 12., 13., 14.],
[15., 16., 17., 18., 19.],
[20., 21., 22., 23., 24.],
[25., 26., 27., 28., 29.],
[30., 31., 32., 33., 34.]]]]).astype(np.float32)
W = np.array([[[[1., 1., 1.], # (1, 1, 3, 3) tensor for convolution weights
[1., 1., 1.],
[1., 1., 1.]]]]).astype(np.float32)
# Convolution with strides=2 and padding
node_with_padding = onnx.helper.make_node(
'Conv',
inputs=['x', 'W'],
outputs=['y'],
kernel_shape=[3, 3],
pads=[1, 1, 1, 1],
strides=[2, 2], # Default values for other attributes: dilations=[1, 1], group=1
)
y_with_padding = np.array([[[[12., 27., 24.], # (1, 1, 4, 3) output tensor
[63., 108., 81.],
[123., 198., 141.],
[112., 177., 124.]]]]).astype(np.float32)
expect(node_with_padding, inputs=[x, W], outputs=[y_with_padding],
name='test_conv_with_strides_padding')
# Convolution with strides=2 and no padding
node_without_padding = onnx.helper.make_node(
'Conv',
inputs=['x', 'W'],
outputs=['y'],
kernel_shape=[3, 3],
pads=[0, 0, 0, 0],
strides=[2, 2], # Default values for other attributes: dilations=[1, 1], group=1
)
y_without_padding = np.array([[[[54., 72.], # (1, 1, 3, 2) output tensor
[144., 162.],
[234., 252.]]]]).astype(np.float32)
expect(node_without_padding, inputs=[x, W], outputs=[y_without_padding],
name='test_conv_with_strides_no_padding')
# Convolution with strides=2 and padding only along one dimension (the H dimension in NxCxHxW tensor)
node_with_asymmetric_padding = onnx.helper.make_node(
'Conv',
inputs=['x', 'W'],
outputs=['y'],
kernel_shape=[3, 3],
pads=[1, 0, 1, 0],
strides=[2, 2], # Default values for other attributes: dilations=[1, 1], group=1
)
y_with_asymmetric_padding = np.array([[[[21., 33.], # (1, 1, 4, 2) output tensor
[99., 117.],
[189., 207.],
[171., 183.]]]]).astype(np.float32)
expect(node_with_asymmetric_padding, inputs=[x, W], outputs=[y_with_asymmetric_padding],
name='test_conv_with_strides_and_asymmetric_padding')
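To see where the expected outputs come from, the following plain-Python sketch computes a direct 2D convolution with explicit pads, strides, dilations, group and an optional bias, treating padded positions as zeros. It is an illustration written for this page (conv2d is a hypothetical name, and tensors are nested lists indexed [n][c][h][w]), not the ONNX reference implementation, and it is far too slow for real workloads.

```python
from typing import List, Optional, Sequence

# Nested lists indexed [n][c][h][w] stand in for a 4D tensor.
Tensor4 = List[List[List[List[float]]]]


def conv2d(x: Tensor4,
           w: Tensor4,
           b: Optional[Sequence[float]] = None,
           pads: Sequence[int] = (0, 0, 0, 0),
           strides: Sequence[int] = (1, 1),
           dilations: Sequence[int] = (1, 1),
           group: int = 1) -> Tensor4:
    """Direct 2D convolution; pads uses the layout [h_begin, w_begin, h_end, w_end]."""
    n_batch = len(x)
    height, width = len(x[0][0]), len(x[0][0][0])
    m, c_per_group = len(w), len(w[0])        # W is (M x C/group x kH x kW)
    kh, kw = len(w[0][0]), len(w[0][0][0])
    m_per_group = m // group
    out_h = (height + pads[0] + pads[2] - ((kh - 1) * dilations[0] + 1)) // strides[0] + 1
    out_w = (width + pads[1] + pads[3] - ((kw - 1) * dilations[1] + 1)) // strides[1] + 1

    y = [[[[0.0] * out_w for _ in range(out_h)] for _ in range(m)] for _ in range(n_batch)]
    for n in range(n_batch):
        for oc in range(m):
            g = oc // m_per_group                 # group that output channel oc belongs to
            for oh in range(out_h):
                for ow in range(out_w):
                    acc = b[oc] if b is not None else 0.0
                    for ic in range(c_per_group):
                        c = g * c_per_group + ic  # matching channel index in the full X
                        for i in range(kh):
                            for j in range(kw):
                                # Window origin is shifted by -pad_begin into the unpadded input.
                                row = oh * strides[0] - pads[0] + i * dilations[0]
                                col = ow * strides[1] - pads[1] + j * dilations[1]
                                # Positions outside X fall in the zero padding and add nothing.
                                if 0 <= row < height and 0 <= col < width:
                                    acc += x[n][c][row][col] * w[oc][ic][i][j]
                    y[n][oc][oh][ow] = acc
    return y
```

Called with x.tolist() and W.tolist() from the example, conv2d(x, W, pads=(1, 1, 1, 1), strides=(2, 2)) reproduces y_with_padding, and pads=(1, 0, 1, 0) reproduces y_with_asymmetric_padding.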
conv_with_autopad_same
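import numpy as np
import onnx
from onnx.backend.test.case.node import expect  # test helper from the ONNX backend test suite

x = np.array([[[[0., 1., 2., 3., 4.], # (1, 1, 5, 5) input tensor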
[5., 6., 7., 8., 9.],
[10., 11., 12., 13., 14.],
[15., 16., 17., 18., 19.],
[20., 21., 22., 23., 24.]]]]).astype(np.float32)
W = np.array([[[[1., 1., 1.], # (1, 1, 3, 3) tensor for convolution weights
[1., 1., 1.],
[1., 1., 1.]]]]).astype(np.float32)
# Convolution with auto_pad='SAME_LOWER' and strides=2
node = onnx.helper.make_node(
'Conv',
inputs=['x', 'W'],
outputs=['y'],
auto_pad='SAME_LOWER',
kernel_shape=[3, 3],
strides=[2, 2],
)
y = np.array([[[[12., 27., 24.], # (1, 1, 3, 3) output tensor
[63., 108., 81.],
[72., 117., 84.]]]]).astype(np.float32)
expect(node, inputs=[x, W], outputs=[y],
name='test_conv_with_autopad_same')
Differences
Between version 1 and version 11, the Summary, group, kernel_shape, pads, X, B, Y and Type Constraints text is unchanged, as is the sentence "VALID mean no padding. Default value is 'NOTSET'." The differences are:
auto_pad: Version 1 says SAME_UPPER or SAME_LOWER pad the input so that the output spatial size matches the input, with the extra padding for an odd amount added at the end for SAME_UPPER and at the beginning for SAME_LOWER. Version 11 instead pads so that output_shape[i] = ceil(input_shape[i] / strides[i]) for each axis i, splits the padding between the two sides equally or almost equally, and adds the odd extra at the end for SAME_UPPER and at the beginning for SAME_LOWER.
dilations: Version 11 adds that, if not present, the dilation defaults to 1 along each spatial axis.
strides: Version 11 adds that, if not present, the stride defaults to 1 along each spatial axis.
W: Version 1 states X.shape[1] == (W.shape[1] * group) == C (assuming zero-based indices), "or in other words FILTER_IN_CHANNEL should be equal to DATA_CHANNEL". Version 11 additionally requires W.shape[0] mod G == 0 and restates the rule as: FILTER_IN_CHANNEL multiplied by the number of groups should equal DATA_CHANNEL, and the number of feature maps M should be a multiple of the number of groups G.
Conv - 1
Version
name: Conv (GitHub)
domain: main
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1.
Summary
The convolution operator consumes an input tensor and a filter, and computes the output.
Attributes
auto_pad: auto_pad must be either NOTSET, SAME_UPPER, SAME_LOWER or VALID. The default value NOTSET means explicit padding is used. SAME_UPPER or SAME_LOWER mean pad the input so that the output spatial size matches the input. If the padding is odd, the extra pixel is added at the end for SAME_UPPER and at the beginning for SAME_LOWER. VALID means no padding. Default value is 'NOTSET'.
dilations: Dilation value along each spatial axis of the filter.
group: Number of groups input channels and output channels are divided into. Default value is 1.
kernel_shape: The shape of the convolution kernel. If not present, it should be inferred from input W.
pads: Padding for the beginning and end of each spatial axis; it can take any value greater than or equal to 0. The values represent the number of pixels added to the beginning and end of the corresponding axis. The pads format should be [x1_begin, x2_begin, …, x1_end, x2_end, …], where xi_begin is the number of pixels added at the beginning of axis i and xi_end is the number of pixels added at the end of axis i. This attribute cannot be used simultaneously with the auto_pad attribute. If not present, the padding defaults to 0 at the start and end of each spatial axis.
strides: Stride along each spatial axis.
Inputs
Between 2 and 3 inputs.
X (heterogeneous) - T: Input data tensor from the previous layer; has size (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and width. This is the 2D image case; in general the size is (N x C x D1 x D2 … x Dn). Optionally, if dimension denotation is in effect, the operation expects the input data tensor to arrive with the dimension denotation of [DATA_BATCH, DATA_CHANNEL, DATA_FEATURE, DATA_FEATURE …].
W (heterogeneous) - T: The weight tensor used in the convolution; has size (M x C/group x kH x kW), where C is the number of channels, kH and kW are the height and width of the kernel, and M is the number of feature maps. For more than 2 dimensions, the kernel shape is (M x C/group x k1 x k2 x … x kn), where (k1 x k2 x … x kn) is the dimension of the kernel. Optionally, if dimension denotation is in effect, the operation expects the weight tensor to arrive with the dimension denotation of [FILTER_OUT_CHANNEL, FILTER_IN_CHANNEL, FILTER_SPATIAL, FILTER_SPATIAL …]. X.shape[1] == (W.shape[1] * group) == C (assuming zero-based indices for the shape array). In other words, FILTER_IN_CHANNEL should be equal to DATA_CHANNEL.
B (optional, heterogeneous) - T: Optional 1D bias added to the convolution; has size M.
Outputs
Y (heterogeneous) - T: Output data tensor that contains the result of the convolution. The output dimensions are functions of the kernel size, stride size, and pad lengths.
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.