RoiAlign - version 16

This page documents version 16 of operator RoiAlign. See RoiAlign for the latest version (since version 22).

Domain: ai.onnx
Since version: 16

Region of Interest (RoI) align operation described in the Mask R-CNN paper. RoiAlign consumes an input tensor X and regions of interest (rois) and applies pooling across each RoI; it produces a 4-D tensor of shape (num_rois, C, output_height, output_width).

RoiAlign was proposed to avoid misalignment by removing the quantization steps applied when mapping from the original image to the feature map and from the feature map to the RoI feature; in each RoI bin, the values at the sampled locations are computed directly through bilinear interpolation.
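
As an illustration of the interpolation step, the following minimal sketch evaluates a 2-D grid at a fractional location. The function name bilinear_sample is hypothetical, and the sketch assumes the location lies inside the grid; border handling is omitted.

```python
import math


def bilinear_sample(feature_map, y, x):
    """Interpolate a 2-D grid (list of rows) at a fractional location (y, x).

    Assumes 0 <= y <= H - 1 and 0 <= x <= W - 1 (no border handling).
    """
    h, w = len(feature_map), len(feature_map[0])
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    # Clamp the far corners so that y == H - 1 or x == W - 1 stays in range.
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    ly, lx = y - y0, x - x0  # fractional offsets inside the cell
    top = (1 - lx) * feature_map[y0][x0] + lx * feature_map[y0][x1]
    bottom = (1 - lx) * feature_map[y1][x0] + lx * feature_map[y1][x1]
    return (1 - ly) * top + ly * bottom


grid = [[0.0, 1.0],
        [2.0, 3.0]]
center = bilinear_sample(grid, 0.5, 0.5)  # midpoint of the four corners
```

Because the location is never rounded to an integer index, a small shift in the RoI produces a small change in the sampled value, which is the property the paragraph above describes.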

Inputs

X (T1): Input data tensor from the previous operator; 4-D feature map of shape (N, C, H, W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data.
rois (T1): RoIs (Regions of Interest) to pool over; rois is a 2-D input of shape (num_rois, 4) given as [[x1, y1, x2, y2], …]. The RoIs’ coordinates are in the coordinate system of the input image. Each coordinate set has a 1:1 correspondence with the ‘batch_indices’ input.
batch_indices (T2): 1-D tensor of shape (num_rois,) with each element denoting the index of the corresponding image in the batch.

Outputs

Y (T1): RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element Y[r-1] is a pooled feature map corresponding to the r-th RoI, rois[r-1].
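
The shape contract between the inputs and the output can be sketched as a small check. The helper name roialign_output_shape is hypothetical and the function only illustrates the shapes stated above; it is not part of the ONNX API.

```python
def roialign_output_shape(x_shape, rois_shape, batch_indices_shape,
                          output_height, output_width):
    """Return the shape of Y implied by the input shapes described above."""
    if len(x_shape) != 4:
        raise ValueError("X must be 4-D: (N, C, H, W)")
    _, channels, _, _ = x_shape
    if len(rois_shape) != 2 or rois_shape[1] != 4:
        raise ValueError("rois must have shape (num_rois, 4)")
    num_rois = rois_shape[0]
    # One batch index per RoI (the 1:1 correspondence noted for rois).
    if tuple(batch_indices_shape) != (num_rois,):
        raise ValueError("batch_indices must have shape (num_rois,)")
    return (num_rois, channels, output_height, output_width)


# Shapes taken from the first example on this page.
shape = roialign_output_shape((1, 1, 10, 10), (2, 4), (2,), 5, 5)
```

Note that the leading dimension of Y is num_rois, not the batch size N.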

Type Constraints

T1: Constrain types to float tensors. Allowed types: tensor(bfloat16), tensor(double), tensor(float), tensor(float16).
T2: Constrain types to int tensors. Allowed types: tensor(int64).

Examples

test_cc_roialign

Node:
  RoiAlign(X, rois, batch_indices) -> (Y)
  Attributes:
    mode = "avg"
    output_height = 5
    output_width = 5
    sampling_ratio = 2
    spatial_scale = 1.0

Inputs:
  X: shape=(1, 1, 10, 10), dtype=float32
    [[[[0.  , 0.01, 0.02, ..., 0.07, 0.08, 0.09],
       [0.1 , 0.11, 0.12, ..., 0.17, 0.18, 0.19],
       [0.2 , 0.21, 0.22, ..., 0.27, 0.28, 0.29],
       ...,
       [0.7 , 0.71, 0.72, ..., 0.77, 0.78, 0.79],
       [0.8 , 0.81, 0.82, ..., 0.87, 0.88, 0.89],
       [0.9 , 0.91, 0.92, ..., 0.97, 0.98, 0.99]]]]
  rois: shape=(2, 4), dtype=float32
    [[0., 0., 9., 9.],
     [2., 2., 7., 7.]]
  batch_indices: shape=(2,), dtype=int64
    [0, 0]

Outputs:
  Y: shape=(2, 1, 5, 5), dtype=float32
    [[[[0.04674999, 0.0645    , 0.0825    , 0.10049999, 0.11849999],
       [0.22424999, 0.24199998, 0.26      , 0.278     , 0.296     ],
       [0.40425003, 0.422     , 0.44      , 0.458     , 0.47599998],
       [0.58425   , 0.60199994, 0.61999995, 0.63799995, 0.6559999 ],
       [0.7642499 , 0.78199995, 0.7999999 , 0.81799996, 0.8359999 ]]],


     [[[0.22      , 0.22999999, 0.24      , 0.25      , 0.26      ],
       [0.32      , 0.32999998, 0.33999997, 0.35000002, 0.36      ],
       [0.42      , 0.43      , 0.44      , 0.45      , 0.45999998],
       [0.52      , 0.53      , 0.53999996, 0.5500001 , 0.56      ],
       [0.62      , 0.63      , 0.64      , 0.65      , 0.65999997]]]]
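
The outputs above can be cross-checked with a small pure-Python sketch of average-mode RoiAlign. This is not the ONNX reference implementation. It assumes that the elided entries of X follow the visible pattern X[0, 0, r, c] = 0.1 * r + 0.01 * c, and that coordinate_transformation_mode is "half_pixel" when the attribute is not given, so each RoI coordinate is shifted by -0.5. The helper names bilinear and roi_align_avg are hypothetical.

```python
def bilinear(fm, y, x):
    """Bilinear interpolation on a 2-D list, with simple border handling."""
    h, w = len(fm), len(fm[0])
    # Samples more than one pixel outside the map contribute zero.
    if y < -1.0 or y > h or x < -1.0 or x > w:
        return 0.0
    y, x = max(y, 0.0), max(x, 0.0)
    y0, x0 = int(y), int(x)
    if y0 >= h - 1:
        y0 = y1 = h - 1
        y = float(y0)
    else:
        y1 = y0 + 1
    if x0 >= w - 1:
        x0 = x1 = w - 1
        x = float(x0)
    else:
        x1 = x0 + 1
    ly, lx = y - y0, x - x0
    hy, hx = 1.0 - ly, 1.0 - lx
    return (hy * hx * fm[y0][x0] + hy * lx * fm[y0][x1]
            + ly * hx * fm[y1][x0] + ly * lx * fm[y1][x1])


def roi_align_avg(fm, roi, out_h, out_w, sampling_ratio, spatial_scale=1.0):
    """Average-mode pooling of one RoI over one channel (half_pixel assumed).

    Assumes sampling_ratio > 0, i.e. a fixed sampling grid per bin.
    """
    x1, y1, x2, y2 = (c * spatial_scale - 0.5 for c in roi)
    bin_h, bin_w = (y2 - y1) / out_h, (x2 - x1) / out_w
    out = []
    for ph in range(out_h):
        row = []
        for pw in range(out_w):
            total = 0.0
            for iy in range(sampling_ratio):
                y = y1 + ph * bin_h + (iy + 0.5) * bin_h / sampling_ratio
                for ix in range(sampling_ratio):
                    x = x1 + pw * bin_w + (ix + 0.5) * bin_w / sampling_ratio
                    total += bilinear(fm, y, x)
            row.append(total / (sampling_ratio * sampling_ratio))
        out.append(row)
    return out


# Assumed reconstruction of the 10 x 10 channel shown in the example.
X = [[0.1 * r + 0.01 * c for c in range(10)] for r in range(10)]
y_first = roi_align_avg(X, [0.0, 0.0, 9.0, 9.0], 5, 5, 2)
y_second = roi_align_avg(X, [2.0, 2.0, 7.0, 7.0], 5, 5, 2)
```

Under these assumptions the sketch gives approximately 0.04675 and 0.0645 for the first two entries of the first RoI and 0.22 for the first entry of the second RoI, in line with the documented values up to float32 rounding.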

test_cc_roialign_aligned_false

Node:
  RoiAlign(X, rois, batch_indices) -> (Y)
  Attributes:
    coordinate_transformation_mode = "output_half_pixel"
    output_height = 5
    output_width = 5
    sampling_ratio = 2
    spatial_scale = 1.0

Inputs:
  X: shape=(1, 1, 10, 10), dtype=float32
    [[[[0.2764, 0.715 , 0.1958, ..., 0.6518, 0.4856, 0.725 ],
       [0.9637, 0.0895, 0.2919, ..., 0.5324, 0.8992, 0.4467],
       [0.3265, 0.8479, 0.9698, ..., 0.4308, 0.34  , 0.2162],
       ...,
       [0.1366, 0.3671, 0.7011, ..., 0.5609, 0.8788, 0.9928],
       [0.5697, 0.8511, 0.6711, ..., 0.1049, 0.1559, 0.2514],
       [0.7012, 0.4056, 0.7879, ..., 0.3727, 0.5482, 0.0502]]]]
  rois: shape=(3, 4), dtype=float32
    [[0., 0., 9., 9.],
     [0., 5., 4., 9.],
     [5., 5., 9., 9.]]
  batch_indices: shape=(3,), dtype=int64
    [0, 0, 0]

Outputs:
  Y: shape=(3, 1, 5, 5), dtype=float32
    [[[[0.46642143, 0.44655263, 0.34052122, 0.56884855, 0.6067808 ],
       [0.37137935, 0.429572  , 0.38352   , 0.5562415 , 0.35105002],
       [0.27680248, 0.48828623, 0.52220017, 0.55277014, 0.41705737],
       [0.4712407 , 0.4844096 , 0.69045746, 0.49203938, 0.87739855],
       [0.6238897 , 0.712462  , 0.62892646, 0.335504  , 0.34946907]]],


     [[[0.30218   , 0.4304639 , 0.469586  , 0.39774403, 0.54226   ],
       [0.36555204, 0.704924  , 0.516482  , 0.317132  , 0.7014441 ],
       [0.29123998, 0.50589806, 0.6476109 , 0.6234899 , 0.82988   ],
       [0.591568  , 0.73885995, 0.704826  , 0.83714795, 0.889316  ],
       [0.62268007, 0.6152761 , 0.709714  , 0.615356  , 0.45852405]]],


     [[[0.23845196, 0.33795202, 0.3716939 , 0.6099999 , 0.76005995],
       [0.376724  , 0.37853205, 0.7146899 , 0.92430794, 0.97278404],
       [0.57490396, 0.58262396, 0.5709361 , 0.761904  , 0.87699807],
       [0.53550804, 0.25658002, 0.21409804, 0.27960402, 0.36      ],
       [0.43648803, 0.350428  , 0.28875598, 0.36613995, 0.23492002]]]]

test_cc_roialign_aligned_true

Node:
  RoiAlign(X, rois, batch_indices) -> (Y)
  Attributes:
    coordinate_transformation_mode = "half_pixel"
    output_height = 5
    output_width = 5
    sampling_ratio = 2
    spatial_scale = 1.0

Inputs:
  X: shape=(1, 1, 10, 10), dtype=float32
    [[[[0.2764, 0.715 , 0.1958, ..., 0.6518, 0.4856, 0.725 ],
       [0.9637, 0.0895, 0.2919, ..., 0.5324, 0.8992, 0.4467],
       [0.3265, 0.8479, 0.9698, ..., 0.4308, 0.34  , 0.2162],
       ...,
       [0.1366, 0.3671, 0.7011, ..., 0.5609, 0.8788, 0.9928],
       [0.5697, 0.8511, 0.6711, ..., 0.1049, 0.1559, 0.2514],
       [0.7012, 0.4056, 0.7879, ..., 0.3727, 0.5482, 0.0502]]]]
  rois: shape=(3, 4), dtype=float32
    [[0., 0., 9., 9.],
     [0., 5., 4., 9.],
     [5., 5., 9., 9.]]
  batch_indices: shape=(3,), dtype=int64
    [0, 0, 0]

Outputs:
  Y: shape=(3, 1, 5, 5), dtype=float32
    [[[[0.517783  , 0.343411  , 0.32290465, 0.44736183, 0.63437545],
       [0.40308002, 0.53664714, 0.44279063, 0.486144  , 0.40231282],
       [0.25119427, 0.40015405, 0.5155241 , 0.6953686 , 0.34653684],
       [0.33503968, 0.4600988 , 0.58806926, 0.34386316, 0.6849323 ],
       [0.49319047, 0.7140578 , 0.82174444, 0.47193527, 0.40394643]]],


     [[[0.3069545 , 0.218678  , 0.33369   , 0.48800054, 0.48696172],
       [0.18709001, 0.49142003, 0.55611   , 0.41916698, 0.36860803],
       [0.1432775 , 0.4608351 , 0.59712505, 0.53095996, 0.49820745],
       [0.2788185 , 0.43856898, 0.60220003, 0.700038  , 0.752436  ],
       [0.5773852 , 0.70238346, 0.7250975 , 0.7337535 , 0.81630385]]],


     [[[0.23933008, 0.40751356, 0.33789265, 0.25252149, 0.4743352 ],
       [0.36707455, 0.270168  , 0.41050994, 0.64189005, 0.83077747],
       [0.55564   , 0.4542949 , 0.5564499 , 0.75015   , 0.9299975 ],
       [0.6625695 , 0.561664  , 0.48127502, 0.495449  , 0.66630596],
       [0.66357267, 0.37210703, 0.20560265, 0.19277613, 0.24784915]]]]
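
The two preceding examples share their inputs and differ only in coordinate_transformation_mode. The sketch below shows how that attribute is understood to move the sampling locations along one axis: "half_pixel" shifts the RoI coordinates by -0.5, while "output_half_pixel" applies no shift and, as assumed here, forces a minimum RoI extent of 1. The function name sample_coords is hypothetical, and sampling_ratio > 0 is assumed.

```python
def sample_coords(lo, hi, out_size, sampling_ratio, mode, spatial_scale=1.0):
    """Return, per output bin, the sample coordinates along one axis."""
    half_pixel = mode == "half_pixel"
    offset = 0.5 if half_pixel else 0.0
    start = lo * spatial_scale - offset
    end = hi * spatial_scale - offset
    extent = end - start
    if not half_pixel:
        extent = max(extent, 1.0)  # assumed legacy rule: at least 1 x 1
    bin_size = extent / out_size
    return [
        [start + b * bin_size + (i + 0.5) * bin_size / sampling_ratio
         for i in range(sampling_ratio)]
        for b in range(out_size)
    ]


# x-range of the third RoI, [5., 5., 9., 9.], with the example's attributes.
aligned = sample_coords(5.0, 9.0, 5, 2, "half_pixel")
legacy = sample_coords(5.0, 9.0, 5, 2, "output_half_pixel")
```

For this RoI the first bin is sampled at roughly 4.7 and 5.1 under "half_pixel" and at roughly 5.2 and 5.6 under "output_half_pixel"; every sample differs by the same half-pixel shift, which is why the two output tensors differ even though the inputs are identical.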

test_cc_roialign_max

Node:
  RoiAlign(X, rois, batch_indices) -> (Y)
  Attributes:
    mode = "max"
    coordinate_transformation_mode = "output_half_pixel"
    output_height = 5
    output_width = 5
    sampling_ratio = 2
    spatial_scale = 1.0

Inputs:
  X: shape=(1, 1, 10, 10), dtype=float32
    [[[[0.  , 0.01, 0.02, ..., 0.07, 0.08, 0.09],
       [0.1 , 0.11, 0.12, ..., 0.17, 0.18, 0.19],
       [0.2 , 0.21, 0.22, ..., 0.27, 0.28, 0.29],
       ...,
       [0.7 , 0.71, 0.72, ..., 0.77, 0.78, 0.79],
       [0.8 , 0.81, 0.82, ..., 0.87, 0.88, 0.89],
       [0.9 , 0.91, 0.92, ..., 0.97, 0.98, 0.99]]]]
  rois: shape=(2, 4), dtype=float32
    [[0., 0., 9., 9.],
     [2., 2., 7., 7.]]
  batch_indices: shape=(2,), dtype=int64
    [0, 0]

Outputs:
  Y: shape=(2, 1, 5, 5), dtype=float32
    [[[[0.04777499, 0.07182502, 0.09262499, 0.08839995, 0.07604997],
       [0.17127506, 0.23842509, 0.282625  , 0.26009986, 0.20994991],
       [0.31492496, 0.42797494, 0.4963748 , 0.4521996 , 0.3581497 ],
       [0.34612483, 0.4653748 , 0.53437454, 0.48449937, 0.38024953],
       [0.34222484, 0.4585748 , 0.5248746 , 0.4751494 , 0.37179956]]],


     [[[0.185625  , 0.19125   , 0.19687499, 0.20250002, 0.208125  ],
       [0.24187501, 0.2475    , 0.25312498, 0.25875   , 0.264375  ],
       [0.29812497, 0.30375   , 0.30937502, 0.315     , 0.320625  ],
       [0.354375  , 0.35999998, 0.365625  , 0.37125   , 0.376875  ],
       [0.410625  , 0.41625   , 0.421875  , 0.4275    , 0.433125  ]]]]
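
The values above are not the maxima of bilinearly interpolated samples. They are consistent with taking, within each bin, the maximum over the individually weighted corner contributions of every sample. The sketch below illustrates that reading for a single bin. It assumes the elided entries of X follow the visible pattern 0.1 * r + 0.01 * c and that all samples fall strictly inside the feature map, so border handling is omitted. The name max_pool_bin is hypothetical.

```python
def max_pool_bin(fm, y_start, x_start, bin_h, bin_w, grid):
    """Max over the weighted corner terms of all grid x grid samples in a bin.

    Assumes every sample lies strictly inside the feature map.
    """
    best = None
    for iy in range(grid):
        y = y_start + (iy + 0.5) * bin_h / grid
        for ix in range(grid):
            x = x_start + (ix + 0.5) * bin_w / grid
            y0, x0 = int(y), int(x)
            ly, lx = y - y0, x - x0
            hy, hx = 1.0 - ly, 1.0 - lx
            # Each corner value is scaled by its bilinear weight, and the
            # terms are compared individually rather than summed.
            candidates = (
                hy * hx * fm[y0][x0],
                hy * lx * fm[y0][x0 + 1],
                ly * hx * fm[y0 + 1][x0],
                ly * lx * fm[y0 + 1][x0 + 1],
            )
            local = max(candidates)
            best = local if best is None else max(best, local)
    return best


# Assumed reconstruction of the 10 x 10 channel shown in the example.
X = [[0.1 * r + 0.01 * c for c in range(10)] for r in range(10)]

# With "output_half_pixel" the RoI coordinates are used without a shift.
# RoI [2, 2, 7, 7]: bins of size 1 starting at 2; RoI [0, 0, 9, 9]: size 1.8.
second_roi_first_bin = max_pool_bin(X, 2.0, 2.0, 1.0, 1.0, 2)
first_roi_first_bin = max_pool_bin(X, 0.0, 0.0, 1.8, 1.8, 2)
```

Under these assumptions the sketch gives 0.185625 for the first entry of the second RoI and approximately 0.047775 for the first entry of the first RoI, in line with the documented outputs up to float32 rounding.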

test_cc_roialign_mode_max

Node:
  RoiAlign(X, rois, batch_indices) -> (Y)
  Attributes:
    mode = "max"
    coordinate_transformation_mode = "output_half_pixel"
    output_height = 5
    output_width = 5
    sampling_ratio = 2
    spatial_scale = 1.0

Inputs:
  X: shape=(1, 1, 10, 10), dtype=float32
    [[[[0.2764, 0.715 , 0.1958, ..., 0.6518, 0.4856, 0.725 ],
       [0.9637, 0.0895, 0.2919, ..., 0.5324, 0.8992, 0.4467],
       [0.3265, 0.8479, 0.9698, ..., 0.4308, 0.34  , 0.2162],
       ...,
       [0.1366, 0.3671, 0.7011, ..., 0.5609, 0.8788, 0.9928],
       [0.5697, 0.8511, 0.6711, ..., 0.1049, 0.1559, 0.2514],
       [0.7012, 0.4056, 0.7879, ..., 0.3727, 0.5482, 0.0502]]]]
  rois: shape=(3, 4), dtype=float32
    [[0., 0., 9., 9.],
     [0., 5., 4., 9.],
     [5., 5., 9., 9.]]
  batch_indices: shape=(3,), dtype=int64
    [0, 0, 0]

Outputs:
  Y: shape=(3, 1, 5, 5), dtype=float32
    [[[[0.3445228 , 0.37310338, 0.37865096, 0.446696  , 0.37991184],
       [0.4133513 , 0.5455125 , 0.6651902 , 0.55805874, 0.27110294],
       [0.21223956, 0.40924096, 0.8417618 , 0.792561  , 0.37196714],
       [0.46835402, 0.39741728, 0.8012819 , 0.4969306 , 0.5495158 ],
       [0.3595896 , 0.5196813 , 0.5403741 , 0.23814403, 0.19992709]]],


     [[[0.30517197, 0.5086199 , 0.3189761 , 0.4054401 , 0.47630402],
       [0.50862   , 0.8477    , 0.37808004, 0.24936005, 0.79384017],
       [0.17620805, 0.29368007, 0.44870415, 0.4987201 , 0.63148826],
       [0.51066005, 0.8511    , 0.5368801 , 0.9406    , 0.70008016],
       [0.4487681 , 0.51066035, 0.5042561 , 0.5643603 , 0.42004836]]],


     [[[0.21062402, 0.3510401 , 0.37416005, 0.5967599 , 0.46507207],
       [0.32336006, 0.31180006, 0.6236001 , 0.9946    , 0.7751202 ],
       [0.35744014, 0.5588001 , 0.35897616, 0.7030401 , 0.6353923 ],
       [0.5996801 , 0.27940005, 0.17948808, 0.35152006, 0.31769615],
       [0.3598083 , 0.40752012, 0.2385281 , 0.43856013, 0.26313624]]]]

Differences with previous version (10)

SchemaDiff: RoiAlign (domain 'ai.onnx')

old version: 10
new version: 16
breaking: no