DisentangledAttention_TRT#

DisentangledAttention_TRT - 1#

Version

name: DisentangledAttention_TRT (GitHub)
domain: main
since_version: 1
function:
support_level: SupportType.COMMON
shape inference: True

This version of the operator has been available since version 1.

Summary

Attributes

factor - FLOAT (required) : Scaling factor applied to attention values, 1/sqrt(3d). d is hidden size per head = H/N. H is hidden size, N is number of heads.
span - INT (required) : Maximum relative distance, k.

Inputs

c2c_attention (heterogeneous) - T:
c2p_attention (heterogeneous) - T:
p2c_attention (heterogeneous) - T:

Outputs

disentangled_attention (heterogeneous) - T:

Type Constraints

T in ( tensor(float), tensor(float16) ): Constrain input and output types to float tensors.

Examples