RoiAlign - 10 vs 16

The next section compares an older version of the operator with a newer one, after both definitions have been converted into markdown text. Green marks an addition in the newer version, red marks a deletion. Anything else is unchanged.

Files changed (1)
  1. RoiAlign10 → RoiAlign16 +0 -6
RoiAlign10 → RoiAlign16 RENAMED
@@ -1 +1 @@
  Region of Interest (RoI) align operation described in the
  [Mask R-CNN paper](https://arxiv.org/abs/1703.06870).
  RoiAlign consumes an input tensor X and region of interests (rois)
  to apply pooling across each RoI; it produces a 4-D tensor of shape
  (num_rois, C, output_height, output_width).
  RoiAlign is proposed to avoid the misalignment by removing
  quantizations while converting from original image into feature
  map and from feature map into RoI feature; in each ROI bin,
  the value of the sampled locations are computed directly
  through bilinear interpolation.
  **Attributes**
- * **coordinate_transformation_mode**:
- Allowed values are 'half_pixel' and 'output_half_pixel'. Use the
- value 'half_pixel' to pixel shift the input coordinates by -0.5 (the
- recommended behavior). Use the value 'output_half_pixel' to omit the
- pixel shift for the input (use this for a backward-compatible
- behavior).
  * **mode**:
  The pooling method. Two modes are supported: 'avg' and 'max'.
  Default is 'avg'.
  * **output_height**:
  default 1; Pooled output Y's height.
  * **output_width**:
  default 1; Pooled output Y's width.
  * **sampling_ratio**:
  Number of sampling points in the interpolation grid used to compute
  the output value of each pooled output bin. If > 0, then exactly
  sampling_ratio x sampling_ratio grid points are used. If == 0, then
  an adaptive number of grid points are used (computed as
  ceil(roi_width / output_width), and likewise for height). Default is
  0.
  * **spatial_scale**:
  Multiplicative spatial scale factor to translate ROI coordinates
  from their input spatial scale to the scale used when pooling, i.e.,
  spatial scale of the input feature map X relative to the input
  image. E.g.; default is 1.0f.
  **Inputs**
  * **X** (heterogeneous) - **T1**:
  Input data tensor from the previous operator; 4-D feature map of
  shape (N, C, H, W), where N is the batch size, C is the number of
  channels, and H and W are the height and the width of the data.
  * **rois** (heterogeneous) - **T1**:
  RoIs (Regions of Interest) to pool over; rois is 2-D input of shape
  (num_rois, 4) given as [[x1, y1, x2, y2], ...]. The RoIs'
  coordinates are in the coordinate system of the input image. Each
  coordinate set has a 1:1 correspondence with the 'batch_indices'
  input.
  * **batch_indices** (heterogeneous) - **T2**:
  1-D tensor of shape (num_rois,) with each element denoting the index
  of the corresponding image in the batch.
  **Outputs**
  * **Y** (heterogeneous) - **T1**:
  RoI pooled output, 4-D tensor of shape (num_rois, C, output_height,
  output_width). The r-th batch element Y[r-1] is a pooled feature map
  corresponding to the r-th RoI X[r-1].
  **Type Constraints**
  * **T1** in (
  tensor(double),
  tensor(float),
  tensor(float16)
  ):
  Constrain types to float tensors.
  * **T2** in (
  tensor(int64)
  ):
  Constrain types to int tensors.
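
The only difference between the two definitions is the coordinate_transformation_mode attribute, so a small numeric illustration may help. The sketch below is not taken from any operator implementation. The helper names `transform_roi` and `grid_points` are hypothetical, and the order of operations (scale by spatial_scale first, then shift by 0.5) is an assumption; the text above only states that 'half_pixel' shifts the input coordinates by -0.5 and that 'output_half_pixel' omits the shift. The second helper restates the sampling_ratio rule from the attribute description.

```python
import math


def transform_roi(x1, y1, x2, y2, spatial_scale=1.0, mode="half_pixel"):
    """Map RoI corners from image coordinates to feature-map coordinates.

    Hypothetical helper. Assumption: scaling is applied before the shift.
    """
    if mode not in ("half_pixel", "output_half_pixel"):
        raise ValueError(f"unsupported coordinate_transformation_mode: {mode!r}")
    # 'half_pixel' shifts by -0.5; 'output_half_pixel' leaves coordinates unshifted.
    offset = 0.5 if mode == "half_pixel" else 0.0
    return tuple(c * spatial_scale - offset for c in (x1, y1, x2, y2))


def grid_points(roi_extent, output_extent, sampling_ratio=0):
    """Number of sampling points per bin along one axis.

    A positive sampling_ratio is used as-is; 0 selects the adaptive
    count ceil(roi_extent / output_extent) described above.
    """
    if sampling_ratio > 0:
        return sampling_ratio
    return math.ceil(roi_extent / output_extent)


roi = (2.0, 4.0, 10.0, 12.0)
shifted = transform_roi(*roi, spatial_scale=0.5, mode="half_pixel")
unshifted = transform_roi(*roi, spatial_scale=0.5, mode="output_half_pixel")
print(shifted)    # corners with the -0.5 shift applied
print(unshifted)  # corners without the shift
print(grid_points(7.0, 2))                    # adaptive count along one axis
print(grid_points(7.0, 2, sampling_ratio=2))  # fixed count along one axis
```

Under these assumptions the two modes differ by a constant half-pixel offset in feature-map coordinates, which is why 'output_half_pixel' is described as the backward-compatible choice.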