RoiAlign - 10 vs 16

Files changed (1)
  1. RoiAlign10 → RoiAlign16 +6 -0
RoiAlign10 → RoiAlign16 RENAMED
@@ -1 +1 @@
  Region of Interest (RoI) align operation described in the
  [Mask R-CNN paper](https://arxiv.org/abs/1703.06870).
  RoiAlign consumes an input tensor X and regions of interest (rois)
  to apply pooling across each RoI; it produces a 4-D tensor of shape
  (num_rois, C, output_height, output_width).
  RoiAlign was proposed to avoid misalignment by removing the
  quantization applied when converting from the original image to the
  feature map and from the feature map to the RoI feature; in each RoI
  bin, the values of the sampled locations are computed directly
  through bilinear interpolation.
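
As an illustration of the bilinear interpolation step mentioned above, here is a minimal sketch in plain Python. The function name `bilinear_interpolate` is hypothetical, and clamping the sample point to the feature-map border is a simplifying assumption; actual implementations may treat out-of-range samples differently.

```python
import math


def bilinear_interpolate(feature, y, x):
    """Sample a 2-D feature map (list of rows) at fractional position (y, x)."""
    height = len(feature)
    width = len(feature[0])

    # Assumption: clamp the sample point into the valid index range.
    y = min(max(y, 0.0), height - 1)
    x = min(max(x, 0.0), width - 1)

    # Integer corners surrounding the sample point.
    y0 = int(math.floor(y))
    x0 = int(math.floor(x))
    y1 = min(y0 + 1, height - 1)
    x1 = min(x0 + 1, width - 1)

    # Fractional distances to the top-left corner and their complements.
    ly, lx = y - y0, x - x0
    hy, hx = 1.0 - ly, 1.0 - lx

    # Weighted sum of the four neighbouring values.
    return (hy * hx * feature[y0][x0] + hy * lx * feature[y0][x1]
            + ly * hx * feature[y1][x0] + ly * lx * feature[y1][x1])


feature = [[0.0, 1.0],
           [2.0, 3.0]]
print(bilinear_interpolate(feature, 0.5, 0.5))  # centre of the four values
```

Because no coordinate is rounded to an integer before sampling, the pooled value changes smoothly as the RoI moves, which is the property the description attributes to removing quantization.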
  **Attributes**
+ * **coordinate_transformation_mode**:
+ Allowed values are 'half_pixel' and 'output_half_pixel'. Use
+ 'half_pixel' to shift the input coordinates by -0.5 pixels (the
+ recommended behavior). Use 'output_half_pixel' to omit the pixel
+ shift for the input (this keeps the backward-compatible
+ behavior).
  * **mode**:
  The pooling method. Two modes are supported: 'avg' and 'max'.
  Default is 'avg'.
  * **output_height**:
  Pooled output Y's height. Default is 1.
  * **output_width**:
  Pooled output Y's width. Default is 1.
  * **sampling_ratio**:
  Number of sampling points in the interpolation grid used to compute
  the output value of each pooled output bin. If > 0, then exactly
  sampling_ratio x sampling_ratio grid points are used. If == 0, then
  an adaptive number of grid points is used (computed as
  ceil(roi_width / output_width), and likewise for height). Default is
  0.
  * **spatial_scale**:
  Multiplicative spatial scale factor to translate RoI coordinates
  from their input spatial scale to the scale used when pooling, i.e.,
  the spatial scale of the input feature map X relative to the input
  image. Default is 1.0f.
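
The way spatial_scale, coordinate_transformation_mode, and sampling_ratio combine can be sketched along a single axis as follows. This is a hedged illustration, not the reference implementation: the function name `axis_sample_coords` is hypothetical, the placement of sample points at the centres of equal sub-cells within each bin is an assumption modeled on common RoIAlign implementations, and the guard that forces at least one grid point is an added assumption to avoid division by zero on degenerate RoIs.

```python
import math


def axis_sample_coords(start, end, output_size, spatial_scale=1.0,
                       sampling_ratio=0,
                       coordinate_transformation_mode="half_pixel"):
    """Return, for each output bin along one axis, its sample coordinates."""
    if coordinate_transformation_mode == "half_pixel":
        offset = 0.5  # shift input coordinates by -0.5
    elif coordinate_transformation_mode == "output_half_pixel":
        offset = 0.0  # no pixel shift (backward-compatible behavior)
    else:
        raise ValueError("unsupported coordinate_transformation_mode")

    # Translate RoI coordinates from image scale to feature-map scale.
    roi_start = start * spatial_scale - offset
    roi_end = end * spatial_scale - offset
    roi_size = roi_end - roi_start
    bin_size = roi_size / output_size

    if sampling_ratio > 0:
        grid = sampling_ratio
    else:
        # Adaptive count: ceil(roi_size / output_size).
        # Assumption: use at least one point for degenerate RoIs.
        grid = max(1, math.ceil(roi_size / output_size))

    # Assumption: sample at the centre of each of `grid` equal sub-cells.
    return [[roi_start + b * bin_size + (i + 0.5) * bin_size / grid
             for i in range(grid)]
            for b in range(output_size)]


print(axis_sample_coords(0, 4, 2, sampling_ratio=2))
print(axis_sample_coords(0, 4, 2, sampling_ratio=2,
                         coordinate_transformation_mode="output_half_pixel"))
```

Under these assumptions, the two calls differ by exactly 0.5 in every sample coordinate, which is the pixel shift that distinguishes 'half_pixel' from 'output_half_pixel'.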
  **Inputs**
  * **X** (heterogeneous) - **T1**:
  Input data tensor from the previous operator; 4-D feature map of
  shape (N, C, H, W), where N is the batch size, C is the number of
  channels, and H and W are the height and the width of the data.
  * **rois** (heterogeneous) - **T1**:
  RoIs (Regions of Interest) to pool over; rois is a 2-D input of shape
  (num_rois, 4) given as [[x1, y1, x2, y2], ...]. The RoIs'
  coordinates are in the coordinate system of the input image. Each
  coordinate set has a 1:1 correspondence with the 'batch_indices'
  input.
  * **batch_indices** (heterogeneous) - **T2**:
  1-D tensor of shape (num_rois,) with each element denoting the index
  of the corresponding image in the batch.
  **Outputs**
  * **Y** (heterogeneous) - **T1**:
  RoI pooled output, 4-D tensor of shape (num_rois, C, output_height,
  output_width). The r-th batch element Y[r-1] is a pooled feature map
  corresponding to the r-th RoI rois[r-1].
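
The shape relationship between the inputs and Y can be checked with a small helper. This is a sketch only; `roi_align_output_shape` is a hypothetical name, and the validation it performs simply restates the shape requirements listed above.

```python
def roi_align_output_shape(x_shape, rois_shape, batch_indices_shape,
                           output_height=1, output_width=1):
    """Derive the shape of Y from the shapes of X, rois and batch_indices."""
    if len(x_shape) != 4:
        raise ValueError("X must be 4-D: (N, C, H, W)")
    if len(rois_shape) != 2 or rois_shape[1] != 4:
        raise ValueError("rois must have shape (num_rois, 4)")

    num_rois = rois_shape[0]
    # batch_indices pairs each RoI with one image of the batch.
    if tuple(batch_indices_shape) != (num_rois,):
        raise ValueError("batch_indices must have shape (num_rois,)")

    channels = x_shape[1]
    return (num_rois, channels, output_height, output_width)


# Example: 5 RoIs pooled to 7x7 from a batch of 2 feature maps.
print(roi_align_output_shape((2, 256, 50, 50), (5, 4), (5,), 7, 7))
```

Note that the leading dimension of Y is num_rois rather than the batch size N, since each RoI produces its own pooled feature map.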
  **Type Constraints**
  * **T1** in (
  tensor(double),
  tensor(float),
  tensor(float16)
  ):
  Constrain types to float tensors.
  * **T2** in (
  tensor(int64)
  ):
  Constrain types to int tensors.