com.microsoft - MultiHeadAttention

MultiHeadAttention - 1

Version

  • name: MultiHeadAttention

  • domain: com.microsoft

  • since_version: 1

  • function:

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 1 of domain com.microsoft.

Summary

Attributes

  • mask_filter_value - FLOAT : The value used to fill masked-out positions of the attention scores. Default value is -10000.0f

  • num_heads - INT (required) : Number of attention heads

  • scale - FLOAT : Custom scaling factor for the attention scores, used if specified. Default value is 1/sqrt(head_size); see the sketch after this list.

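These attributes parameterize the standard scaled dot-product attention computation. The NumPy sketch below is an illustration only, not the kernel's implementation; the (batch_size, num_heads, sequence_length, head_size) layout and the convention that a key_padding_mask value of 0 marks a padded key are assumptions.

```python
# Illustrative sketch of scaled dot-product attention, showing where the
# `scale` and `mask_filter_value` attributes enter the computation.
import numpy as np

def scaled_dot_product_attention(q, k, v, key_padding_mask=None,
                                 scale=None, mask_filter_value=-10000.0):
    # q, k, v: (batch, num_heads, seq_len, head_size) -- assumed layout
    head_size = q.shape[-1]
    if scale is None:
        scale = 1.0 / np.sqrt(head_size)            # default per the `scale` attribute
    scores = scale * (q @ k.transpose(0, 1, 3, 2))  # (batch, heads, q_len, k_len)
    if key_padding_mask is not None:
        # Positions where the mask is 0 (assumed to mean "padded key") receive
        # mask_filter_value, so they contribute near-zero weight after the softmax.
        scores = np.where(key_padding_mask[:, None, None, :] == 0,
                          mask_filter_value, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the key axis
    return weights @ v
```
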
Inputs

Between 1 and 8 inputs.

  • query (heterogeneous) - T:

  • key (optional, heterogeneous) - T:

  • value (optional, heterogeneous) - T:

  • bias (optional, heterogeneous) - T:

  • key_padding_mask (optional, heterogeneous) - M:

  • relative_position_bias (optional, heterogeneous) - T:

  • past_key (optional, heterogeneous) - T:

  • past_value (optional, heterogeneous) - T:

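When building the node with onnx.helper, unused optional inputs are passed as empty strings so that later inputs keep their positions (trailing empty strings may simply be omitted). A minimal sketch, with the graph tensor names chosen here purely for illustration:

```python
# Hedged sketch: wiring the optional inputs listed above.
from onnx import helper

node = helper.make_node(
    "MultiHeadAttention",
    inputs=[
        "query",             # required
        "key",               # optional
        "value",             # optional
        "",                  # bias (omitted)
        "key_padding_mask",  # optional, int32 (type constraint M)
        "",                  # relative_position_bias (omitted)
        "",                  # past_key (omitted)
        "",                  # past_value (omitted)
    ],
    outputs=["output"],
    domain="com.microsoft",
    num_heads=8,
)
```
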
Outputs

Between 1 and 3 outputs.

  • output (heterogeneous) - T:

  • present_key (optional, heterogeneous) - T:

  • present_value (optional, heterogeneous) - T:

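present_key and present_value are typically fed back as past_key and past_value on the next decoding step (a KV cache). The shape arithmetic below assumes the common (batch_size, num_heads, sequence_length, head_size) layout for these tensors, which is not stated on this page:

```python
# Illustrative KV-cache shape arithmetic (assumed layout, not from this page).
import numpy as np

batch, num_heads, head_size = 1, 8, 64
past_len, new_len = 10, 1

past_key = np.zeros((batch, num_heads, past_len, head_size), dtype=np.float32)
new_key = np.zeros((batch, num_heads, new_len, head_size), dtype=np.float32)

# present_key covers the past plus the new tokens along the sequence axis.
present_key = np.concatenate([past_key, new_key], axis=2)
print(present_key.shape)  # (1, 8, 11, 64)
```
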
Type Constraints

  • T in ( tensor(float), tensor(float16) ): Constrain input and output to float tensors.

  • M in ( tensor(int32) ): Constrain mask to integer types.

Examples
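
A minimal end-to-end sketch: build a one-node model and run it with onnxruntime. The tensor shapes follow the usual multi-head attention convention (batch_size, sequence_length, hidden_size) with hidden_size = num_heads * head_size; the shapes, the default-domain opset version, and the availability of a kernel for this contrib op on the chosen execution provider are assumptions, not guarantees from this page.

```python
# Hedged sketch: run com.microsoft::MultiHeadAttention through onnxruntime.
import numpy as np
from onnx import TensorProto, helper
import onnxruntime as ort

batch, seq_len, num_heads, head_size = 1, 4, 2, 8
hidden = num_heads * head_size

node = helper.make_node(
    "MultiHeadAttention",
    inputs=["query", "key", "value"],   # optional inputs omitted
    outputs=["output"],
    domain="com.microsoft",
    num_heads=num_heads,
)

graph = helper.make_graph(
    [node],
    "mha_example",
    inputs=[
        helper.make_tensor_value_info("query", TensorProto.FLOAT, [batch, seq_len, hidden]),
        helper.make_tensor_value_info("key", TensorProto.FLOAT, [batch, seq_len, hidden]),
        helper.make_tensor_value_info("value", TensorProto.FLOAT, [batch, seq_len, hidden]),
    ],
    outputs=[
        helper.make_tensor_value_info("output", TensorProto.FLOAT, [batch, seq_len, hidden]),
    ],
)

model = helper.make_model(
    graph,
    opset_imports=[
        helper.make_opsetid("", 17),              # default ONNX domain (assumed version)
        helper.make_opsetid("com.microsoft", 1),  # contrib domain providing this op
    ],
)

sess = ort.InferenceSession(model.SerializeToString(),
                            providers=["CPUExecutionProvider"])
rng = np.random.default_rng(0)
feeds = {
    "query": rng.standard_normal((batch, seq_len, hidden), dtype=np.float32),
    "key": rng.standard_normal((batch, seq_len, hidden), dtype=np.float32),
    "value": rng.standard_normal((batch, seq_len, hidden), dtype=np.float32),
}
(out,) = sess.run(None, feeds)
print(out.shape)  # expected: (1, 4, 16)
```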