com.microsoft - Attention#
Attention - 1#
Version
name: Attention (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Attributes
do_rotary - INT : Whether to use rotary position embedding. Default value is 0.
mask_filter_value - FLOAT : The value to be filled in the attention mask. Default value is -10000.0f
num_heads - INT (required) : Number of attention heads
past_present_share_buffer - INT : Corresponding past and present are same tensor, its size is (2, batch_size, num_heads, max_sequence_length, head_size)
qkv_hidden_sizes - INTS : Hidden dimension of Q, K, V: hidden_size, hidden_size and v_hidden_size
scale - FLOAT : Custom scale will be used if specified. Default value is 1/sqrt(head_size)
unidirectional - INT : Whether every token can only attend to previous tokens. Default value is 0.
Inputs
Between 2 and 7 inputs.
input (heterogeneous) - T:
weights (heterogeneous) - T:
bias (optional, heterogeneous) - T:
mask_index (optional, heterogeneous) - M:
past (optional, heterogeneous) - T:
relative_position_bias (optional, heterogeneous) - T:
past_sequence_length (optional, heterogeneous) - M:
Outputs
Between 1 and 2 outputs.
output (heterogeneous) - T:
present (optional, heterogeneous) - T:
Type Constraints
T in ( tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
M in ( tensor(int32) ): Constrain mask index to integer types
Examples