DisentangledAttention_TRT
DisentangledAttention_TRT - 1
Version
domain: main
since_version: 1
function:
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1.
Summary
Attributes
factor - FLOAT (required) : Scaling factor applied to the attention values, 1/sqrt(3d), where d = H/N is the hidden size per head, H is the hidden size, and N is the number of heads.
span - INT (required) : Maximum relative distance, k.
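The factor attribute follows directly from the model dimensions given above. A minimal sketch of the arithmetic, assuming the hidden size divides evenly by the number of heads; the helper name and the values H = 768, N = 12 are illustrative and do not come from this page:

```python
import math


def disentangled_attention_factor(hidden_size: int, num_heads: int) -> float:
    """Return 1/sqrt(3d), where d = hidden_size / num_heads (hypothetical helper)."""
    if hidden_size % num_heads != 0:
        raise ValueError("hidden_size must be divisible by num_heads")
    head_dim = hidden_size // num_heads  # d = H / N
    return 1.0 / math.sqrt(3 * head_dim)


# Illustrative values only: H = 768 and N = 12 give d = 64.
factor = disentangled_attention_factor(768, 12)
```

The resulting float would be passed as the factor attribute when the node is created.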
Inputs
c2c_attention (heterogeneous) - T:
c2p_attention (heterogeneous) - T:
p2c_attention (heterogeneous) - T:
Outputs
disentangled_attention (heterogeneous) - T:
Type Constraints
T in ( tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
Examples
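This page does not describe how the three inputs are combined. The sketch below is a plain-Python reference for a single attention head, based on the disentangled attention formulation in the DeBERTa paper rather than on this operator's implementation. It assumes c2c_attention is an S x S matrix, that c2p_attention and p2c_attention are S x 2k matrices indexed by a clamped relative distance, and that factor is applied to the sum. The function names, the tensor layout, and the indexing convention are all assumptions; batch and head dimensions are omitted.

```python
def relative_index(i: int, j: int, span: int) -> int:
    """Clamp the relative distance i - j into [0, 2*span - 1] (assumed convention)."""
    distance = i - j
    if distance <= -span:
        return 0
    if distance >= span:
        return 2 * span - 1
    return distance + span


def disentangled_attention(c2c, c2p, p2c, factor, span):
    """Combine the three score matrices for one head (hypothetical reference).

    c2c is S x S; c2p and p2c are S x 2*span (assumed layout), as nested lists.
    """
    seq_len = len(c2c)
    out = [[0.0] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in range(seq_len):
            score = (
                c2c[i][j]
                + c2p[i][relative_index(i, j, span)]  # content-to-position term
                + p2c[j][relative_index(j, i, span)]  # position-to-content term
            )
            out[i][j] = score * factor
    return out


# Tiny illustrative input: S = 2, span = 1, so the relative matrices have 2 columns.
result = disentangled_attention(
    c2c=[[1.0, 2.0], [3.0, 4.0]],
    c2p=[[10.0, 20.0], [30.0, 40.0]],
    p2c=[[100.0, 200.0], [300.0, 400.0]],
    factor=0.5,
    span=1,
)
```

The nested loops favor readability over speed; a fused kernel would typically perform the same gather and sum in a single pass.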