com.microsoft - DecoderAttention#

DecoderAttention - 1#

Version

  • name: DecoderAttention (GitHub)

  • domain: com.microsoft

  • since_version: 1

  • function:

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 1 of domain com.microsoft.

Summary

Attributes

  • mask_filter_value - FLOAT : The value to be filled in the attention mask. Default value is -10000.0f

  • num_heads - INT (required) : Number of attention heads

Inputs

  • query (heterogeneous) - T:

  • key (heterogeneous) - T:

  • q_weight (heterogeneous) - T:

  • kv_weight (heterogeneous) - T:

  • bias (heterogeneous) - T:

  • key_padding_mask (optional, heterogeneous) - B:

  • key_cache (optional, heterogeneous) - T:

  • value_cache (optional, heterogeneous) - T:

  • static_kv (heterogeneous) - B:

  • use_past (heterogeneous) - B:

  • has_layer_state (heterogeneous) - B:

  • has_key_padding_mask (heterogeneous) - B:

Outputs

Between 1 and 3 outputs.

  • output (heterogeneous) - T:

  • new_key_cache (optional, heterogeneous) - T:

  • new_value_cache (optional, heterogeneous) - T:

Type Constraints

  • T in ( tensor(float), tensor(float16) ): Constrain input and output types to float and float16 tensors.

  • B in ( tensor(bool) ): Constrain key_padding_mask to bool tensors.

Examples