DisentangledAttention_TRT

DisentangledAttention_TRT - 1

Version

This version of the operator has been available since version 1.

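Summary

Disentangled Attention TensorRT plugin. Combines the content-to-content, content-to-position, and position-to-content attention tensors into a single scaled attention tensor.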

Attributes

  • factor - FLOAT (required) : Scaling factor applied to the attention values, 1/sqrt(3d), where d = H/N is the hidden size per head, H is the hidden size, and N is the number of heads.

  • span - INT (required) : Maximum relative distance, k.
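As a rough illustration of the two attributes, the sketch below computes factor from H and N and shows how a maximum relative distance k is commonly turned into a clamped index in the range [0, 2k - 1]. The helper names attention_scale_factor and relative_bucket are hypothetical, and the clamping rule assumes the DeBERTa-style relative position definition; neither is taken from this operator's specification.

```python
import math


def attention_scale_factor(hidden_size: int, num_heads: int) -> float:
    """Return 1/sqrt(3d), where d = hidden_size / num_heads (hypothetical helper)."""
    if hidden_size % num_heads != 0:
        raise ValueError("hidden_size must be divisible by num_heads")
    d = hidden_size // num_heads  # hidden size per head
    return 1.0 / math.sqrt(3 * d)


def relative_bucket(i: int, j: int, span: int) -> int:
    """Clamp the relative distance i - j into [0, 2*span - 1].

    Assumption: DeBERTa-style relative positions, not confirmed by this page.
    """
    return min(max(i - j + span, 0), 2 * span - 1)


# Example: H = 768, N = 12 gives d = 64, so factor = 1/sqrt(192).
factor = attention_scale_factor(hidden_size=768, num_heads=12)

# Example: with span = 4, distances saturate at the ends of [0, 7].
near = relative_bucket(5, 5, span=4)       # same position
far_left = relative_bucket(0, 10, span=4)  # query far before key
far_right = relative_bucket(10, 0, span=4) # query far after key
```

Under these assumptions, any pair of positions further apart than k shares the first or last index, which is why span bounds the width of the position-dependent inputs.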

Inputs

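  • c2c_attention (heterogeneous) - T: Content-to-content attention tensor.

  • c2p_attention (heterogeneous) - T: Content-to-position attention tensor.

  • p2c_attention (heterogeneous) - T: Position-to-content attention tensor.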

Outputs

  • disentangled_attention (heterogeneous) - T: The disentangled attention output tensor.

Type Constraints

  • T in ( tensor(float), tensor(float16) ): Constrain input and output types to float tensors.

Examples
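
The following pure-Python sketch shows one plausible reading of how the three inputs and two attributes combine. It is not the plugin's implementation. It assumes c2c_attention has layout [batch][seq][seq], that c2p_attention and p2c_attention have layout [batch][seq][2*span], and that the position-dependent terms are gathered with a clamped relative index and the p2c term is transposed, following the pattern used in the public DeBERTa reference code. The function names are hypothetical.

```python
def relative_index(i, j, span):
    """Clamped relative index in [0, 2*span - 1] (assumed DeBERTa-style)."""
    return min(max(i - j + span, 0), 2 * span - 1)


def disentangled_attention_reference(c2c, c2p, p2c, factor, span):
    """Reference sketch: factor * (c2c + gathered c2p + gathered, transposed p2c).

    Assumed layouts (not stated on this page):
      c2c: [batch][seq][seq]
      c2p: [batch][seq][2*span]
      p2c: [batch][seq][2*span]
    """
    result = []
    for c2c_b, c2p_b, p2c_b in zip(c2c, c2p, p2c):
        seq_len = len(c2c_b)
        out_b = [[0.0] * seq_len for _ in range(seq_len)]
        for i in range(seq_len):
            for j in range(seq_len):
                idx = relative_index(i, j, span)
                # c2p is read along query row i; p2c is read along key row j,
                # which is the effect of gathering and then transposing.
                total = c2c_b[i][j] + c2p_b[i][idx] + p2c_b[j][idx]
                out_b[i][j] = factor * total
        result.append(out_b)
    return result


# Tiny example: batch of 1, sequence length 2, span 1 (so 2 position columns).
c2c = [[[1.0, 2.0], [3.0, 4.0]]]
c2p = [[[10.0, 20.0], [30.0, 40.0]]]
p2c = [[[100.0, 200.0], [300.0, 400.0]]]
out = disentangled_attention_reference(c2c, c2p, p2c, factor=0.5, span=1)
```

With these inputs the sketch yields [[110.5, 156.0], [121.5, 222.0]] for the single batch entry. A fused kernel would compute the index on the fly to avoid materializing the gathered tensors, which is the usual motivation for providing this step as a plugin.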