com.microsoft - DecoderMaskedMultiheadAttention#

DecoderMaskedMultiheadAttention - 1#

Version

This version of the operator has been available since version 1 of domain com.microsoft.

Summary

Attributes

  • num_heads - INT (required) : Number of attention heads

  • past_present_share_buffer - INT : Corresponding past and present are same tensor, its size is (2, batch_size, num_heads, max_sequence_length, head_size)

  • scale - FLOAT : Custom scale will be used if specified. Default value is 1/sqrt(head_size)

Inputs

Between 3 and 7 inputs.

  • input (heterogeneous) - T:

  • weights (heterogeneous) - T:

  • bias (heterogeneous) - T:

  • mask_index (optional, heterogeneous) - M:

  • past (optional, heterogeneous) - T:

  • relative_position_bias (optional, heterogeneous) - T:

  • past_sequence_length (optional, heterogeneous) - M:

Outputs

Between 1 and 2 outputs.

  • output (heterogeneous) - T:

  • present (optional, heterogeneous) - T:

Type Constraints

  • T in ( tensor(float), tensor(float16) ): Constrain input and output types to float tensors.

  • M in ( tensor(int32) ): Constrain mask index to integer types

Examples