RoiAlign - version 10#

This page documents version 10 of operator RoiAlign. See RoiAlign for the latest version (since version 22).

  • Domain: ai.onnx

  • Since version: 10

Region of Interest (RoI) align operation described in the Mask R-CNN paper. RoiAlign consumes an input tensor X and region of interests (rois) to apply pooling across each RoI; it produces a 4-D tensor of shape (num_rois, C, output_height, output_width).

RoiAlign is proposed to avoid the misalignment by removing quantizations while converting from original image into feature map and from feature map into RoI feature; in each ROI bin, the value of the sampled locations are computed directly through bilinear interpolation.

Inputs

  • X (T1): Input data tensor from the previous operator; 4-D feature map of shape (N, C, H, W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data.

  • rois (T1): RoIs (Regions of Interest) to pool over; rois is 2-D input of shape (num_rois, 4) given as [[x1, y1, x2, y2], …]. The RoIs’ coordinates are in the coordinate system of the input image. Each coordinate set has a 1:1 correspondence with the ‘batch_indices’ input.

  • batch_indices (T2): 1-D tensor of shape (num_rois,) with each element denoting the index of the corresponding image in the batch.

Outputs

  • Y (T1): RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element Y[r-1] is a pooled feature map corresponding to the r-th RoI X[r-1].

Type Constraints

  • T1: Constrain types to float tensors. Allowed types: tensor(bfloat16), tensor(double), tensor(float), tensor(float16).

  • T2: Constrain types to int tensors. Allowed types: tensor(int64).