RoiAlign - 10 vs 16
The next section compares an older and a newer version of the same operator after both definitions are converted into markdown text. Green (marked `+`) means an addition to the newer version, red (marked `-`) means a deletion. Anything else is unchanged.
- RoiAlign10 → RoiAlign16 +0 -6
@@ -1 +1 @@
| Old | New | Change | Text |
| --- | --- | --- | --- |
| 1 | 1 | | Region of Interest (RoI) align operation described in the |
| 2 | 2 | | [Mask R-CNN paper](https://arxiv.org/abs/1703.06870). |
| 3 | 3 | | RoiAlign consumes an input tensor X and region of interests (rois) |
| 4 | 4 | | to apply pooling across each RoI; it produces a 4-D tensor of shape |
| 5 | 5 | | (num_rois, C, output_height, output_width). |
| 6 | 6 | | RoiAlign is proposed to avoid the misalignment by removing |
| 7 | 7 | | quantizations while converting from original image into feature |
| 8 | 8 | | map and from feature map into RoI feature; in each ROI bin, |
| 9 | 9 | | the value of the sampled locations are computed directly |
| 10 | 10 | | through bilinear interpolation. |
| 11 | 11 | | **Attributes** |
| 12 | | - | * **coordinate_transformation_mode**: |
| 13 | | - | Allowed values are 'half_pixel' and 'output_half_pixel'. Use the |
| 14 | | - | value 'half_pixel' to pixel shift the input coordinates by -0.5 (the |
| 15 | | - | recommended behavior). Use the value 'output_half_pixel' to omit the |
| 16 | | - | pixel shift for the input (use this for a backward-compatible |
| 17 | | - | behavior). |
| 18 | 12 | | * **mode**: |
| 19 | 13 | | The pooling method. Two modes are supported: 'avg' and 'max'. |
| 20 | 14 | | Default is 'avg'. |
| 21 | 15 | | * **output_height**: |
| 22 | 16 | | default 1; Pooled output Y's height. |
| 23 | 17 | | * **output_width**: |
| 24 | 18 | | default 1; Pooled output Y's width. |
| 25 | 19 | | * **sampling_ratio**: |
| 26 | 20 | | Number of sampling points in the interpolation grid used to compute |
| 27 | 21 | | the output value of each pooled output bin. If > 0, then exactly |
| 28 | 22 | | sampling_ratio x sampling_ratio grid points are used. If == 0, then |
| 29 | 23 | | an adaptive number of grid points are used (computed as |
| 30 | 24 | | ceil(roi_width / output_width), and likewise for height). Default is |
| 31 | 25 | | 0. |
| 32 | 26 | | * **spatial_scale**: |
| 33 | 27 | | Multiplicative spatial scale factor to translate ROI coordinates |
| 34 | 28 | | from their input spatial scale to the scale used when pooling, i.e., |
| 35 | 29 | | spatial scale of the input feature map X relative to the input |
| 36 | 30 | | image. E.g.; default is 1.0f. |
| 37 | 31 | | **Inputs** |
| 38 | 32 | | * **X** (heterogeneous) - **T1**: |
| 39 | 33 | | Input data tensor from the previous operator; 4-D feature map of |
| 40 | 34 | | shape (N, C, H, W), where N is the batch size, C is the number of |
| 41 | 35 | | channels, and H and W are the height and the width of the data. |
| 42 | 36 | | * **rois** (heterogeneous) - **T1**: |
| 43 | 37 | | RoIs (Regions of Interest) to pool over; rois is 2-D input of shape |
| 44 | 38 | | (num_rois, 4) given as [[x1, y1, x2, y2], ...]. The RoIs' |
| 45 | 39 | | coordinates are in the coordinate system of the input image. Each |
| 46 | 40 | | coordinate set has a 1:1 correspondence with the 'batch_indices' |
| 47 | 41 | | input. |
| 48 | 42 | | * **batch_indices** (heterogeneous) - **T2**: |
| 49 | 43 | | 1-D tensor of shape (num_rois,) with each element denoting the index |
| 50 | 44 | | of the corresponding image in the batch. |
| 51 | 45 | | **Outputs** |
| 52 | 46 | | * **Y** (heterogeneous) - **T1**: |
| 53 | 47 | | RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, |
| 54 | 48 | | output_width). The r-th batch element Y[r-1] is a pooled feature map |
| 55 | 49 | | corresponding to the r-th RoI X[r-1]. |
| 56 | 50 | | **Type Constraints** |
| 57 | 51 | | * **T1** in ( |
| 58 | 52 | | tensor(double), |
| 59 | 53 | | tensor(float), |
| 60 | 54 | | tensor(float16) |
| 61 | 55 | | ): |
| 62 | 56 | | Constrain types to float tensors. |
| 63 | 57 | | * **T2** in ( |
| 64 | 58 | | tensor(int64) |
| 65 | 59 | | ): |
| 66 | 60 | | Constrain types to int tensors. |
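To make the attribute descriptions in the diff above more concrete, here is a minimal NumPy sketch of the 'avg' pooling mode. It is an illustration only, not the reference implementation: the function names `bilinear` and `roi_align_avg` are hypothetical, samples outside the feature map are simply clamped to the border, and the clamp of the RoI size to at least 1.0 in 'output_half_pixel' mode is an assumption borrowed from common implementations rather than something stated in the text above.

```python
import math
import numpy as np


def bilinear(fmap, y, x):
    """Bilinearly interpolate a 2-D map at (y, x); coordinates are clamped
    to the border (a simplification made for this sketch)."""
    h, w = fmap.shape
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    ly, lx = y - y0, x - x0
    top = (1 - lx) * fmap[y0, x0] + lx * fmap[y0, x1]
    bottom = (1 - lx) * fmap[y1, x0] + lx * fmap[y1, x1]
    return (1 - ly) * top + ly * bottom


def roi_align_avg(X, rois, batch_indices, output_height=1, output_width=1,
                  sampling_ratio=0, spatial_scale=1.0,
                  coordinate_transformation_mode="half_pixel"):
    """Hypothetical helper: average-mode RoiAlign over X of shape (N, C, H, W)."""
    half_pixel = coordinate_transformation_mode == "half_pixel"
    offset = 0.5 if half_pixel else 0.0  # 'half_pixel' shifts coordinates by -0.5
    num_rois, channels = rois.shape[0], X.shape[1]
    Y = np.zeros((num_rois, channels, output_height, output_width), dtype=X.dtype)
    for r in range(num_rois):
        # Scale RoI corners from image coordinates to feature-map coordinates.
        x1, y1, x2, y2 = (float(v) * spatial_scale - offset for v in rois[r])
        roi_w, roi_h = x2 - x1, y2 - y1
        if not half_pixel:
            # Assumption: the backward-compatible mode forces a minimum RoI size.
            roi_w, roi_h = max(roi_w, 1.0), max(roi_h, 1.0)
        bin_w, bin_h = roi_w / output_width, roi_h / output_height
        # Fixed grid if sampling_ratio > 0, otherwise the adaptive ceil(...) rule.
        # The max(1, ...) guard avoids an empty grid for degenerate RoIs.
        grid_w = sampling_ratio if sampling_ratio > 0 else max(1, math.ceil(roi_w / output_width))
        grid_h = sampling_ratio if sampling_ratio > 0 else max(1, math.ceil(roi_h / output_height))
        for c in range(channels):
            fmap = X[batch_indices[r], c]
            for ph in range(output_height):
                for pw in range(output_width):
                    total = 0.0
                    for iy in range(grid_h):
                        yy = y1 + ph * bin_h + (iy + 0.5) * bin_h / grid_h
                        for ix in range(grid_w):
                            xx = x1 + pw * bin_w + (ix + 0.5) * bin_w / grid_w
                            total += bilinear(fmap, yy, xx)
                    Y[r, c, ph, pw] = total / (grid_h * grid_w)
    return Y


# Usage: a 4x4 feature map whose value equals its column index.
X = np.tile(np.arange(4, dtype=np.float32), (1, 1, 4, 1))  # shape (1, 1, 4, 4)
rois = np.array([[0, 0, 4, 4]], dtype=np.float32)           # [[x1, y1, x2, y2]]
batch_indices = np.array([0], dtype=np.int64)
Y = roi_align_avg(X, rois, batch_indices, output_height=2, output_width=2,
                  sampling_ratio=2)
print(Y.shape)  # (num_rois, C, output_height, output_width)
print(Y[0, 0])
```

With 'half_pixel' the sample points in this example fall exactly on pixel centers; passing `coordinate_transformation_mode="output_half_pixel"` moves every sample by half a pixel, which is the behavioral difference the attribute controls.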