com.microsoft - DecoderAttention#
DecoderAttention - 1#
Version
domain: com.microsoft
since_version: 1
function:
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Attributes
mask_filter_value - FLOAT : The value to be filled in the attention mask. Default value is -10000.0f
num_heads - INT (required) : Number of attention heads
Inputs
query (heterogeneous) - T:
key (heterogeneous) - T:
q_weight (heterogeneous) - T:
kv_weight (heterogeneous) - T:
bias (heterogeneous) - T:
key_padding_mask (optional, heterogeneous) - B:
key_cache (optional, heterogeneous) - T:
value_cache (optional, heterogeneous) - T:
static_kv (heterogeneous) - B:
use_past (heterogeneous) - B:
has_layer_state (heterogeneous) - B:
has_key_padding_mask (heterogeneous) - B:
Outputs
Between 1 and 3 outputs.
output (heterogeneous) - T:
new_key_cache (optional, heterogeneous) - T:
new_value_cache (optional, heterogeneous) - T:
Type Constraints
T in ( tensor(float), tensor(float16) ): Constrain input and output types to float and float16 tensors.
B in ( tensor(bool) ): Constrain key_padding_mask to bool tensors.
Examples