Loop

Loop - 16

Version

  • name: Loop (GitHub)

  • domain: main

  • since_version: 16

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 16.

Summary

Generic Looping construct. This loop has multiple termination conditions:

  1. Trip count. Iteration count specified at runtime. Set by specifying the input M. Optional. Set to empty string to omit. Note that a static trip count (specified at graph construction time) can be specified by passing in a constant node for input M.

  2. Loop termination condition. This is an input to the op that determines whether to run the first iteration and also a loop-carried dependency for the body graph. The body graph must yield a value for the condition variable, whether this input is provided or not.

This table summarizes the operating modes of this operator with equivalent C-style code:

Operator inputs defined as (max_trip_count, condition_var).

input ("", ""):
    for (int i=0; ; ++i) {
      cond = ...; // Note this value is ignored, but is required in the body
    }

input ("", cond) // Note this is analogous to a while loop
    bool cond = ...;
    for (int i=0; cond; ++i) {
      cond = ...;
    }

input ("", 1) // Note this is analogous to a do-while loop
    bool cond = true;
    for (int i=0; cond; ++i) {
      cond = ...;
    }

input (trip_count, "") // Note this is analogous to a for loop
    int trip_count = ...;
    for (int i=0; i < trip_count; ++i) {
      cond = ...; // ignored
    }

input (trip_count, cond)
    int trip_count = ...;
    bool cond = ...;
    for (int i=0; i < trip_count && cond; ++i) {
      cond = ...;
    }
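The five modes above differ only in which inputs are present. A minimal Python sketch of the shared termination rule follows; it assumes a hypothetical `body` callable that takes `(i, cond, *carried)` and returns `(cond_out, carried_out)`, uses `None` for an omitted (empty-string) input, and leaves out scan outputs. It illustrates the table and is not the reference implementation.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

# Hypothetical body signature: (iteration, cond, *carried) -> (cond_out, carried_out)
Body = Callable[..., Tuple[bool, List[Any]]]


def run_loop(
    max_trip_count: Optional[int],
    cond: Optional[bool],
    v_initial: Sequence[Any],
    body: Body,
) -> List[Any]:
    """Return the final loop-carried values; None models an omitted input."""
    carried = list(v_initial)
    # The cond input decides whether the first iteration runs at all.
    keep_going = True if cond is None else cond
    i = 0
    # With both inputs omitted this never exits, as in the ("", "") mode.
    while keep_going and (max_trip_count is None or i < max_trip_count):
        cond_out, carried = body(i, keep_going, *carried)
        if cond is not None:
            # The body's condition output is only used when cond was supplied.
            keep_going = cond_out
        i += 1
    return carried


# For-loop mode: the body's condition output (False) is ignored.
print(run_loop(5, None, [0], lambda i, c, acc: (False, [acc + i])))  # [10]
# While-loop mode: keep doubling while the result stays below 100.
print(run_loop(None, True, [1], lambda i, c, v: (v * 2 < 100, [v * 2])))  # [128]
```

Passing `cond=False` runs zero iterations and returns the initial values unchanged, which is the behavior the `(trip_count, cond)` row implies.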

Sample usage - cond as well as trip count

graph predict-net {
  %a = Constant[value = <Scalar Tensor [3]>]()
  %b = Constant[value = <Scalar Tensor [6]>]()
  %keepgoing = Constant[value = <Scalar Tensor [1]>]()
  %max_trip_count = Constant[value = <Scalar Tensor [10]>]()
  %b_out, %user_defined_vals = Loop[body = <graph body-net>](%max_trip_count, %keepgoing, %b)
  return
}

graph body-net (
  %i[INT64, scalar]           // iteration number
  %keepgoing_in[BOOL, scalar] // incoming loop-termination-condition; not used
  %b_in[INT32, scalar]        // incoming value of loop-carried-dependency b
) {
  %my_local = Add(%a, %b_in)
  %b_out = Sub(%a, %b_in)                     // outgoing value of loop-carried-dependency b
  %keepgoing_out = Greater(%my_local, %b_out) // outgoing loop-termination-condition
  %user_defined_val = Add(%b_in, %b_in)       // scan-output value to be accumulated
  return %keepgoing_out, %b_out, %user_defined_val
}

Sample equivalent C code

{
  /* User-defined code (enclosing scope) */
  int a = 3, b = 6;
  bool keepgoing = true; // Analogous to input cond
  /* End user-defined code */

  /* Implicitly-defined code */
  const int max_trip_count = 10; // Analogous to input M
  int user_defined_vals[]; // Imagine this is resizable
  /* End implicitly-defined code */
  /* initialize loop-carried variables and scan-output variables */
  bool keepgoing_out = keepgoing;
  int b_out = b;

  for (int i=0; i < max_trip_count && keepgoing_out; ++i) {
    /* Implicitly-defined code: bind actual parameter values
       to formal parameter variables of loop-body */
    bool keepgoing_in = keepgoing_out;
    int b_in = b_out;

    /* User-defined code (loop body) */
    int my_local = a + b_in; // Reading value "a" from the enclosing scope is fine
    b_out = a - b_in;
    keepgoing_out = my_local > b_out;
    int user_defined_val = b_in + b_in; // b_in and b_out are different variables
    /* End user-defined code */

    /* Implicitly-defined code */
    user_defined_vals[i] = user_defined_val; // accumulate scan-output values
  }
  // int t = my_local; // Can't do this. my_local is not accessible here.

  // The values below are bound to the output variables of the loop and therefore accessible
  // b_out; user_defined_vals;
}

There are several things of note in this code snippet:

  1. Values from the enclosing scope (here, variable “a”) are visible inside the loop body and can be used as inputs to its nodes.

  2. Any values computed in the loop body that need to be used in a subsequent iteration or after the loop are modelled using a pair of variables in the loop body, consisting of an input variable (e.g., b_in) and an output variable (e.g., b_out). These are referred to as loop-carried dependences. The Loop node supplies the value of the input variable for the first iteration, and returns the value of the output variable produced by the final iteration.

  3. Scan-output variables are used to implicitly concatenate values computed across all the iterations. In the above example, the values of user_defined_val computed over all iterations are concatenated and returned as the value of user_defined_vals after the loop.

  4. Values created in the body cannot be accessed in the enclosing scope, except through the mechanisms described above.
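These points can be checked by tracing the sample. The sketch below is a direct Python transcription of the C code above, written only to make the trace executable; the function name `sample_loop` is hypothetical. `b_in`/`b_out` carry the dependency from one iteration to the next, and the scan output gains one value per iteration.

```python
from typing import List, Tuple


def sample_loop() -> Tuple[int, List[int], bool]:
    """Python transcription of the sample equivalent C code."""
    a, b = 3, 6                         # enclosing scope
    keepgoing = True                    # analogous to input cond
    max_trip_count = 10                 # analogous to input M
    user_defined_vals: List[int] = []   # scan output, grows each iteration

    keepgoing_out, b_out = keepgoing, b  # initialize loop-carried variables
    i = 0
    while i < max_trip_count and keepgoing_out:
        b_in = b_out                      # bind the loop-carried input
        my_local = a + b_in               # reads "a" from the enclosing scope
        b_out = a - b_in
        keepgoing_out = my_local > b_out
        user_defined_vals.append(b_in + b_in)  # accumulate the scan output
        i += 1
    return b_out, user_defined_vals, keepgoing_out


print(sample_loop())  # (6, [12, -6], False)
```

In the second iteration `my_local` is 0 and `b_out` is 6, so `keepgoing_out` becomes false and the loop stops after two iterations, well before `max_trip_count`.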

Note that the semantics of this op support “diagonal” or “wavefront” execution (see Step 3 of https://devblogs.nvidia.com/optimizing-recurrent-neural-networks-cudnn-5/ for an example). Frontends should emit multi-layer RNNs as a series of Loop operators (with time being the inner looping dimension), with each successive layer consuming the scan_outputs from the previous layer, possibly going through several point-wise operators (e.g. dropout, residual connections, linear layer).

Inputs and outputs of the body subgraph are matched to those of the Loop node by position, not by name; implementations derive the correspondence from this order.

Attributes

  • body (required): The graph run each iteration. It has 2+N inputs: (iteration_num, condition, loop carried dependencies…). It has 1+N+K outputs: (condition, loop carried dependencies…, scan_outputs…). Each scan_output is created by concatenating the value of the specified output value at the end of each iteration of the loop. It is an error if the dimensions or data type of these scan_outputs change across loop iterations.
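The arity rules and the positional matching can be sketched together. The helper below is hypothetical (not part of any ONNX API); it takes plain name lists, where an omitted optional input such as M is the empty string and still occupies its position, checks the 2+N / 1+N+K counts, and pairs names purely by position.

```python
from typing import Dict, Sequence, Tuple


def bind_by_position(
    node_inputs: Sequence[str],
    node_outputs: Sequence[str],
    body_inputs: Sequence[str],
    body_outputs: Sequence[str],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Pair Loop-node names with body names purely by position.

    node_inputs  = (M, cond, v_initial_1..N)
    body_inputs  = (iteration_num, condition, carried_1..N)   -> 2+N names
    body_outputs = (condition, carried_1..N, scan_1..K)       -> 1+N+K names
    node_outputs = (v_final_1..N, scan_1..K)                  -> N+K names
    """
    n = len(node_inputs) - 2
    k = len(body_outputs) - 1 - n
    if n < 0 or k < 0:
        raise ValueError("too few Loop inputs or body outputs")
    if len(body_inputs) != 2 + n:
        raise ValueError(f"body needs {2 + n} inputs, got {len(body_inputs)}")
    if len(node_outputs) != n + k:
        raise ValueError(f"Loop node needs {n + k} outputs, got {len(node_outputs)}")
    # Carried body inputs receive the node's v_initial values on the first iteration.
    inputs = dict(zip(body_inputs[2:], node_inputs[2:]))
    # Body outputs after the condition surface as the node's outputs, in order.
    outputs = dict(zip(body_outputs[1:], node_outputs))
    return inputs, outputs


print(bind_by_position(["", "cond", "b"], ["b_final", "vals"],
                       ["i", "keepgoing_in", "b_in"],
                       ["keepgoing_out", "b_out", "user_defined_val"]))
# ({'b_in': 'b'}, {'b_out': 'b_final', 'user_defined_val': 'vals'})
```

Note that the body's condition output has no counterpart among the node's outputs, which is why the node has N+K outputs rather than 1+N+K.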

Inputs

Between 2 and 2147483647 inputs.

  • M (optional, heterogeneous) - I: A maximum trip-count for the loop specified at runtime. Optional. Pass empty string to skip.

  • cond (optional, heterogeneous) - B: A boolean termination condition. Optional. Pass empty string to skip.

  • v_initial (variadic) - V: The initial values of any loop-carried dependencies (values that change across loop iterations)

Outputs

Between 1 and 2147483647 outputs.

  • v_final_and_scan_outputs (variadic) - V: Final N loop carried dependency values then K scan_outputs. Scan outputs must be Tensors.
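Scan outputs are stacked along a new leading axis, so T iterations of a value with shape S yield a tensor of shape (T, *S); this is why the loop_11 example below returns a (5, 1) result from a [1]-shaped per-iteration value. A minimal sketch, assuming tensors are modelled as nested Python lists; `shape_of` and `stack_scan_output` are hypothetical helpers, and only shapes are checked.

```python
from typing import Any, List, Sequence, Tuple


def shape_of(value: Any) -> Tuple[int, ...]:
    """Shape of a nested-list 'tensor'; a plain scalar has shape ()."""
    if isinstance(value, list):
        return (len(value),) + (shape_of(value[0]) if value else ())
    return ()


def stack_scan_output(per_iteration: Sequence[Any]) -> List[Any]:
    """Stack one scan output's per-iteration values along a new leading axis.

    Raises ValueError if the per-iteration shape changes between iterations.
    Only shapes are checked here; the spec also forbids data-type changes.
    """
    if per_iteration:
        expected = shape_of(per_iteration[0])
        for step, value in enumerate(per_iteration):
            if shape_of(value) != expected:
                raise ValueError(f"iteration {step}: shape {shape_of(value)} != {expected}")
    # The stacked result has shape (number of iterations, *expected).
    return list(per_iteration)


stacked = stack_scan_output([[-1.0], [1.0], [4.0], [8.0], [13.0]])
print(shape_of(stacked))  # (5, 1)
```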

Type Constraints

  • V in ( optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(complex128))), optional(seq(tensor(complex64))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(complex128)), optional(tensor(complex64)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8)), seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(complex128)), seq(tensor(complex64)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(complex128), tensor(complex64), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): All Tensor, Sequence(Tensor), Optional(Tensor), and Optional(Sequence(Tensor)) types

  • I in ( tensor(int64) ): tensor of int64, which should be a scalar.

  • B in ( tensor(bool) ): tensor of bool, which should be a scalar.

Examples

loop_11

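import numpy as np
import onnx
from onnx.backend.test.case.node import expect  # test helper from the ONNX backend test suite

# Given a tensor x of values [x1, ..., xN], and initial tensor y
# sum up its elements using a scan
# returning the final state (y+x1+x2+...+xN) as well as the scan_output
# [y+x1, y+x1+x2, ..., y+x1+x2+...+xN]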

y_in = onnx.helper.make_tensor_value_info('y_in', onnx.TensorProto.FLOAT, [1])
y_out = onnx.helper.make_tensor_value_info('y_out', onnx.TensorProto.FLOAT, [1])
scan_out = onnx.helper.make_tensor_value_info('scan_out', onnx.TensorProto.FLOAT, [1])
cond_in = onnx.helper.make_tensor_value_info('cond_in', onnx.TensorProto.BOOL, [])
cond_out = onnx.helper.make_tensor_value_info('cond_out', onnx.TensorProto.BOOL, [])
iter_count = onnx.helper.make_tensor_value_info('iter_count', onnx.TensorProto.INT64, [])

x = np.array([1, 2, 3, 4, 5]).astype(np.float32)
y = np.array([-2]).astype(np.float32)

x_const_node = onnx.helper.make_node(
    'Constant',
    inputs=[],
    outputs=['x'],
    value=onnx.helper.make_tensor(
        name='const_tensor_x',
        data_type=onnx.TensorProto.FLOAT,
        dims=x.shape,
        vals=x.flatten().astype(float),
    )
)

one_const_node = onnx.helper.make_node(
    'Constant',
    inputs=[],
    outputs=['one'],
    value=onnx.helper.make_tensor(
        name='const_tensor_one',
        data_type=onnx.TensorProto.INT64,
        dims=(),
        vals=[1]
    )
)

i_add_node = onnx.helper.make_node(
    'Add',
    inputs=['iter_count', 'one'],
    outputs=['end']
)

start_unsqueeze_node = onnx.helper.make_node(
    'Unsqueeze',
    inputs=['iter_count'],
    outputs=['slice_start'],
    axes=[0]
)

end_unsqueeze_node = onnx.helper.make_node(
    'Unsqueeze',
    inputs=['end'],
    outputs=['slice_end'],
    axes=[0]
)

slice_node = onnx.helper.make_node(
    'Slice',
    inputs=['x', 'slice_start', 'slice_end'],
    outputs=['slice_out']
)

y_add_node = onnx.helper.make_node(
    'Add',
    inputs=['y_in', 'slice_out'],
    outputs=['y_out']
)

identity_node = onnx.helper.make_node(
    'Identity',
    inputs=['cond_in'],
    outputs=['cond_out']
)

scan_identity_node = onnx.helper.make_node(
    'Identity',
    inputs=['y_out'],
    outputs=['scan_out']
)

loop_body = onnx.helper.make_graph(
    [identity_node, x_const_node, one_const_node, i_add_node,
     start_unsqueeze_node, end_unsqueeze_node, slice_node, y_add_node,
     scan_identity_node],
    'loop_body',
    [iter_count, cond_in, y_in],
    [cond_out, y_out, scan_out]
)

node = onnx.helper.make_node(
    'Loop',
    inputs=['trip_count', 'cond', 'y'],
    outputs=['res_y', 'res_scan'],
    body=loop_body
)

trip_count = np.array(5).astype(np.int64)
res_y = np.array([13]).astype(np.float32)
cond = np.array(1).astype(bool)
res_scan = np.array([-1, 1, 4, 8, 13]).astype(np.float32).reshape((5, 1))
expect(node, inputs=[trip_count, cond, y], outputs=[res_y, res_scan],
       name='test_loop11', opset_imports=[onnx.helper.make_opsetid("", 11)])
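The expected outputs in this test are a running sum starting from y. A quick pure-Python check, independent of any ONNX runtime, reproduces the rows of `res_scan` and the value of `res_y`; the variable names here are illustrative.

```python
from itertools import accumulate

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = -2.0
# Scan output: y + x1, y + x1 + x2, ... (drop the leading initial value y).
scan = list(accumulate(x, initial=y))[1:]
print(scan)      # [-1.0, 1.0, 4.0, 8.0, 13.0]
print(scan[-1])  # 13.0, the final loop-carried value (res_y)
```

Each of the five values is produced as a [1]-shaped tensor in one iteration, which is why `res_scan` is reshaped to (5, 1).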

loop_13

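from typing import Any, List

import numpy as np
import onnx
from onnx.backend.test.case.node import expect  # test helper from the ONNX backend test suite

# Given a tensor x of values [x1, ..., xN],
# Return a sequence of tensors of
#   [[x1], [x1, x2], ..., [x1, ..., xN]]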

seq_in = onnx.helper.make_tensor_sequence_value_info('seq_in', onnx.TensorProto.FLOAT, None)
seq_out = onnx.helper.make_tensor_sequence_value_info('seq_out', onnx.TensorProto.FLOAT, None)
cond_in = onnx.helper.make_tensor_value_info('cond_in', onnx.TensorProto.BOOL, [])
cond_out = onnx.helper.make_tensor_value_info('cond_out', onnx.TensorProto.BOOL, [])
iter_count = onnx.helper.make_tensor_value_info('iter_count', onnx.TensorProto.INT64, [])

x = np.array([1, 2, 3, 4, 5]).astype(np.float32)

x_const_node = onnx.helper.make_node(
    'Constant',
    inputs=[],
    outputs=['x'],
    value=onnx.helper.make_tensor(
        name='const_tensor_x',
        data_type=onnx.TensorProto.FLOAT,
        dims=x.shape,
        vals=x.flatten().astype(float),
    )
)

one_const_node = onnx.helper.make_node(
    'Constant',
    inputs=[],
    outputs=['one'],
    value=onnx.helper.make_tensor(
        name='const_tensor_one',
        data_type=onnx.TensorProto.INT64,
        dims=(),
        vals=[1]
    )
)

zero_const_node = onnx.helper.make_node(
    'Constant',
    inputs=[],
    outputs=['slice_start'],
    value=onnx.helper.make_tensor(
        name='const_tensor_zero',
        data_type=onnx.TensorProto.INT64,
        dims=(1,),
        vals=[0]
    )
)

axes_node = onnx.helper.make_node(
    'Constant',
    inputs=[],
    outputs=['axes'],
    value=onnx.helper.make_tensor(
        name='const_tensor_axes',
        data_type=onnx.TensorProto.INT64,
        dims=(),
        vals=[0]
    )
)

add_node = onnx.helper.make_node(
    'Add',
    inputs=['iter_count', 'one'],
    outputs=['end']
)

end_unsqueeze_node = onnx.helper.make_node(
    'Unsqueeze',
    inputs=['end', 'axes'],
    outputs=['slice_end']
)

slice_node = onnx.helper.make_node(
    'Slice',
    inputs=['x', 'slice_start', 'slice_end'],
    outputs=['slice_out']
)

insert_node = onnx.helper.make_node(
    'SequenceInsert',
    inputs=['seq_in', 'slice_out'],
    outputs=['seq_out']
)

identity_node = onnx.helper.make_node(
    'Identity',
    inputs=['cond_in'],
    outputs=['cond_out']
)

loop_body = onnx.helper.make_graph(
    [identity_node, x_const_node, one_const_node, zero_const_node, add_node,
     axes_node, end_unsqueeze_node, slice_node, insert_node],
    'loop_body',
    [iter_count, cond_in, seq_in],
    [cond_out, seq_out]
)

node = onnx.helper.make_node(
    'Loop',
    inputs=['trip_count', 'cond', 'seq_empty'],
    outputs=['seq_res'],
    body=loop_body
)

trip_count = np.array(5).astype(np.int64)
seq_empty: List[Any] = []
seq_res = [x[:int(i)] for i in x]
cond = np.array(1).astype(bool)
expect(node, inputs=[trip_count, cond, seq_empty], outputs=[seq_res],
       name='test_loop13_seq', opset_imports=[onnx.helper.make_opsetid("", 13)],
       input_type_protos=[onnx.helper.make_tensor_type_proto(onnx.TensorProto.INT64, trip_count.shape),
                          onnx.helper.make_tensor_type_proto(onnx.TensorProto.BOOL, cond.shape),
                          onnx.helper.make_sequence_type_proto(
                              onnx.helper.make_tensor_type_proto(onnx.TensorProto.FLOAT, []))])
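The expected value `seq_res = [x[:int(i)] for i in x]` slices by the values of `x`, not by the iteration number, so it matches what the loop body does (append `x[0:i+1]` in iteration i) only because `x` holds 1 through 5. The pure-Python comparison below, with plain lists standing in for NumPy arrays, makes that explicit.

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
trip_count = 5

# What the loop body computes: iteration i appends the prefix x[0:i+1].
by_iteration = [x[: i + 1] for i in range(trip_count)]
# What the test writes: prefixes indexed by the values stored in x.
by_value = [x[: int(v)] for v in x]

print(by_iteration == by_value)  # True, only because x == [1, 2, ..., N]
```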

loop_16_none

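from typing import Any, List

import numpy as np
import onnx
from onnx.backend.test.case.node import expect  # test helper from the ONNX backend test suite

# Given a tensor of values [x1, ..., xN] and an initial optional sequence of tensors [x0],
# return the concatenated sequence of tensors
#   [x0, [x1], [x1, x2], ..., [x1, ..., xN]]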

ten_in_tp = onnx.helper.make_tensor_type_proto(onnx.TensorProto.FLOAT, [])
seq_in_tp = onnx.helper.make_sequence_type_proto(ten_in_tp)
opt_in_tp = onnx.helper.make_optional_type_proto(seq_in_tp)
opt_in = onnx.helper.make_value_info('opt_seq_in', opt_in_tp)
seq_out = onnx.helper.make_tensor_sequence_value_info('seq_out', onnx.TensorProto.FLOAT, [])
cond_in = onnx.helper.make_tensor_value_info('cond_in', onnx.TensorProto.BOOL, [])
cond_out = onnx.helper.make_tensor_value_info('cond_out', onnx.TensorProto.BOOL, [])
iter_count = onnx.helper.make_tensor_value_info('iter_count', onnx.TensorProto.INT64, [])

x0 = np.array(0).astype(np.float32)
x = np.array([1, 2, 3, 4, 5]).astype(np.float32)

optional_has_elem_node = onnx.helper.make_node(
    'OptionalHasElement',
    inputs=['opt_seq_in'],
    outputs=['optional_has_elem']
)

optional_is_none = onnx.helper.make_node(
    'Not',
    inputs=['optional_has_elem'],
    outputs=['optional_is_none']
)

optional_get_elem = onnx.helper.make_node(
    'OptionalGetElement',
    inputs=['opt_seq_in'],
    outputs=['seq_in']
)

constant_in = onnx.helper.make_node(
    'Constant',
    inputs=[],
    outputs=['constant_in'],
    value=onnx.helper.make_tensor(
        name='const_tensor',
        data_type=onnx.TensorProto.FLOAT,
        dims=(),
        vals=[0]
    )
)

seq_const_in = onnx.helper.make_node(
    'SequenceConstruct',
    inputs=['constant_in'],
    outputs=['init_seq_in']
)

then_seq_out = onnx.helper.make_tensor_sequence_value_info('init_seq_in', onnx.TensorProto.FLOAT, [])
then_body = onnx.helper.make_graph(
    [constant_in, seq_const_in],
    'then_body',
    [],
    [then_seq_out]
)

else_seq_out = onnx.helper.make_tensor_sequence_value_info('seq_in', onnx.TensorProto.FLOAT, [])
else_body = onnx.helper.make_graph(
    [optional_get_elem],
    'else_body',
    [],
    [else_seq_out]
)

if_node = onnx.helper.make_node(
    'If',
    inputs=['optional_is_none'],
    outputs=['sequence'],
    then_branch=then_body,
    else_branch=else_body
)

x_const_node = onnx.helper.make_node(
    'Constant',
    inputs=[],
    outputs=['x'],
    value=onnx.helper.make_tensor(
        name='const_tensor_x',
        data_type=onnx.TensorProto.FLOAT,
        dims=x.shape,
        vals=x.flatten().astype(float),
    )
)

one_const_node = onnx.helper.make_node(
    'Constant',
    inputs=[],
    outputs=['one'],
    value=onnx.helper.make_tensor(
        name='const_tensor_one',
        data_type=onnx.TensorProto.INT64,
        dims=(),
        vals=[1]
    )
)

zero_const_node = onnx.helper.make_node(
    'Constant',
    inputs=[],
    outputs=['slice_start'],
    value=onnx.helper.make_tensor(
        name='const_tensor_zero',
        data_type=onnx.TensorProto.INT64,
        dims=(1,),
        vals=[0]
    )
)

axes_node = onnx.helper.make_node(
    'Constant',
    inputs=[],
    outputs=['axes'],
    value=onnx.helper.make_tensor(
        name='const_tensor_axes',
        data_type=onnx.TensorProto.INT64,
        dims=(),
        vals=[0]
    )
)

add_node = onnx.helper.make_node(
    'Add',
    inputs=['iter_count', 'one'],
    outputs=['end']
)

end_unsqueeze_node = onnx.helper.make_node(
    'Unsqueeze',
    inputs=['end', 'axes'],
    outputs=['slice_end']
)

slice_node = onnx.helper.make_node(
    'Slice',
    inputs=['x', 'slice_start', 'slice_end'],
    outputs=['slice_out']
)

insert_node = onnx.helper.make_node(
    'SequenceInsert',
    inputs=['sequence', 'slice_out'],
    outputs=['seq_out']
)

identity_node = onnx.helper.make_node(
    'Identity',
    inputs=['cond_in'],
    outputs=['cond_out']
)

loop_body = onnx.helper.make_graph(
    [identity_node, optional_has_elem_node, optional_is_none, if_node, x_const_node, one_const_node,
     zero_const_node, add_node, axes_node, end_unsqueeze_node, slice_node, insert_node],
    'loop_body',
    [iter_count, cond_in, opt_in],
    [cond_out, seq_out]
)

node = onnx.helper.make_node(
    'Loop',
    inputs=['trip_count', 'cond', 'opt_seq'],
    outputs=['seq_res'],
    body=loop_body
)

trip_count = np.array(5).astype(np.int64)
cond = np.array(1).astype(bool)
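# compute_loop_outputs is a helper from the ONNX backend test code that is not shown here;
# it returns the expected sequence described in the comment at the top of this example.
seq_res = compute_loop_outputs(x, [x0], trip_count)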
opt_seq_in: List[Any] = [x0]
expect(node, inputs=[trip_count, cond, opt_seq_in], outputs=[seq_res],
       name='test_loop16_seq_none', opset_imports=[onnx.helper.make_opsetid("", 16)],
       input_type_protos=[onnx.helper.make_tensor_type_proto(onnx.TensorProto.INT64, trip_count.shape),
                          onnx.helper.make_tensor_type_proto(onnx.TensorProto.BOOL, cond.shape),
                          opt_in_tp])
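The body's If node makes the optional input harmless: an empty optional is replaced by a sequence holding the scalar 0, which equals `x0`, so the expected result is the same whether the loop starts from `[x0]` or from an empty optional. The pure-Python sketch below mirrors the body one iteration at a time, with floats and lists standing in for tensors; `expected_loop16` is a hypothetical name and is not the `compute_loop_outputs` helper used by the test.

```python
from typing import List, Optional, Union

Value = Union[float, List[float]]  # scalar or 1-D tensor, modelled with Python types


def expected_loop16(
    x: List[float], opt_seq: Optional[List[Value]], trip_count: int
) -> Optional[List[Value]]:
    """Mirror of the loop_16_none body, iteration by iteration."""
    carried: Optional[List[Value]] = opt_seq
    for i in range(trip_count):
        # If node: an empty optional becomes a sequence holding the scalar 0.0;
        # otherwise OptionalGetElement unwraps the contained sequence.
        seq = [0.0] if carried is None else list(carried)
        # Slice + SequenceInsert: append the prefix x[0:i+1].
        seq.append(x[: i + 1])
        carried = seq
    return carried


print(expected_loop16([1.0, 2.0, 3.0], [0.0], 3))
# [0.0, [1.0], [1.0, 2.0], [1.0, 2.0, 3.0]]
print(expected_loop16([1.0, 2.0, 3.0], None, 3) == expected_loop16([1.0, 2.0, 3.0], [0.0], 3))
# True
```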

Differences

Compared with the previous version of the operator, the summary, the C-style table, the sample code, the notes, and the descriptions of the attributes, inputs, and outputs are unchanged. The type constraint V gains the following types:

  • optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(complex128))), optional(seq(tensor(complex64))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8)))

  • optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(complex128)), optional(tensor(complex64)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8))

  • seq(tensor(bfloat16))

It retains the types allowed by the previous version, including seq(tensor(bool)), seq(tensor(complex128)), seq(tensor(complex64)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), and seq(tensor(int64)).
180213 seq(tensor(int8)), seq(tensor(int8)),
181214 seq(tensor(string)), seq(tensor(string)),
182215 seq(tensor(uint16)), seq(tensor(uint16)),
183216 seq(tensor(uint32)), seq(tensor(uint32)),
184217 seq(tensor(uint64)), seq(tensor(uint64)),
185218 seq(tensor(uint8)), seq(tensor(uint8)),
219 tensor(bfloat16),
186220 tensor(bool), tensor(bool),
187221 tensor(complex128), tensor(complex128),
188222 tensor(complex64), tensor(complex64),
189223 tensor(double), tensor(double),
190224 tensor(float), tensor(float),
191225 tensor(float16), tensor(float16),
192226 tensor(int16), tensor(int16),
193227 tensor(int32), tensor(int32),
194228 tensor(int64), tensor(int64),
195229 tensor(int8), tensor(int8),
196230 tensor(string), tensor(string),
197231 tensor(uint16), tensor(uint16),
198232 tensor(uint32), tensor(uint32),
199233 tensor(uint64), tensor(uint64),
200234 tensor(uint8) tensor(uint8)
201235 ): ):
202236 All Tensor and Sequence types All Tensor, Sequence(Tensor), Optional(Tensor), and
237 Optional(Sequence(Tensor)) types
203238* **I** in (* **I** in (
204239 tensor(int64) tensor(int64)
205240 ): ):
206241 tensor of int64, which should be a scalar. tensor of int64, which should be a scalar.
207242* **B** in (* **B** in (
208243 tensor(bool) tensor(bool)
209244 ): ):
210245 tensor of bool, which should be a scalar. tensor of bool, which should be a scalar.

Loop - 13

Version

  • name: Loop (GitHub)

  • domain: main

  • since_version: 13

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 13.

Summary

Generic looping construct. This loop has multiple termination conditions:

  1. Trip count. Iteration count specified at runtime. Set by specifying the input M. Optional. Set to empty string to omit. Note that a static trip count (specified at graph construction time) can be specified by passing in a constant node for input M.

  2. Loop termination condition. This is an input to the op that determines whether to run the first iteration and also a loop-carried dependency for the body graph. The body graph must yield a value for the condition variable, whether this input is provided or not.

This table summarizes the operating modes of this operator with equivalent C-style code:

Operator inputs defined as (max_trip_count, condition_var).

    input ("", ""):
        for (int i=0; ; ++i) {
          cond = ...; // Note this value is ignored, but is required in the body
        }

    input ("", cond) // Note this is analogous to a while loop
        bool cond = ...;
        for (int i=0; cond; ++i) {
          cond = ...;
        }

    input ("", 1) // Note this is analogous to a do-while loop
        bool cond = true;
        for (int i=0; cond; ++i) {
          cond = ...;
        }

    input (trip_count, "") // Note this is analogous to a for loop
        int trip_count = ...;
        for (int i=0; i < trip_count; ++i) {
          cond = ...; // ignored
        }

    input (trip_count, cond)
        int trip_count = ...;
        bool cond = ...;
        for (int i=0; i < trip_count && cond; ++i) {
          cond = ...;
        }
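The table can be made concrete with a minimal Python sketch of the same control flow. This is a hypothetical reference model written for illustration, not an ONNX or onnxruntime API. The names `run_loop` and `count_down`, the `body` callback signature, and the use of `None` for an omitted ("") input are all assumptions.

In this model, M caps the number of iterations. cond is checked before every iteration, including the first, but only when it is provided.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical body signature (not an ONNX API):
#   body(iteration_num, cond_in, *v) -> (cond_out, new_v, scan_values)
Body = Callable[..., Tuple[bool, Sequence[int], Sequence[int]]]


def run_loop(m: Optional[int], cond: Optional[bool], body: Body,
             v_initial: Sequence[int]) -> Tuple[List[int], List[List[int]], int]:
    """Model the (M, cond) operating modes; None stands for an omitted ("") input.

    Returns (v_final, scan_outputs, iterations_run). With m=None and
    cond=None this loops forever, like the ("", "") row of the table.
    """
    v = list(v_initial)
    scans: List[List[int]] = []
    keep_going = True if cond is None else cond
    i = 0
    # M caps the trip count; cond (only when provided) is checked before
    # every iteration, including the first one.
    while (m is None or i < m) and (cond is None or keep_going):
        cond_out, new_v, scan_values = body(i, keep_going, *v)
        v = list(new_v)
        if not scans:
            scans = [[] for _ in scan_values]  # one accumulator per scan output
        for acc, value in zip(scans, scan_values):
            acc.append(value)
        if cond is not None:
            keep_going = cond_out  # when cond is "", the body's cond output is ignored
        i += 1
    return v, scans, i


# Hypothetical body for the while-style rows: decrement n, continue while n > 0.
def count_down(i: int, cond_in: bool, n: int):
    return n - 1 > 0, [n - 1], []
```

Two calls show the contrast between rows:

- `run_loop(5, None, body, [0])` behaves like the for-loop row and ignores whatever condition the body returns.
- `run_loop(None, True, count_down, [3])` behaves like the while-loop row and stops once the body yields false.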

Sample usage - cond as well as trip count

    graph predict-net {
      %a = Constant[value = <Scalar Tensor [3]>]()
      %b = Constant[value = <Scalar Tensor [6]>]()
      %keepgoing = Constant[value = <Scalar Tensor [1]>]()
      %max_trip_count = Constant[value = <Scalar Tensor [10]>]()
      %keepgoing_out, %b_out, %user_defined_vals = Loop[body = <graph body-net>](%max_trip_count, %keepgoing, %b)
      return
    }

    graph body-net (
      %i[INT32, scalar]           // iteration number
      %keepgoing_in[BOOL, scalar] // incoming loop-termination-condition; not used
      %b_in[INT32, scalar]        // incoming value of loop-carried-dependency b
    ) {
      %my_local = Add(%a, %b_in)
      %b_out = Sub(%a, %b_in)                     // outgoing value of loop-carried-dependency b
      %keepgoing_out = Greater(%my_local, %b_out) // outgoing loop-termination-condition
      %user_defined_val = Add(%b_in, %b_in)       // scan-output value to be accumulated
      return %keepgoing_out, %b_out, %user_defined_val
    }

Sample equivalent C code

    {
      /* User-defined code (enclosing scope) */
      int a = 3, b = 6;
      bool keepgoing = true; // Analogous to input cond
      /* End user-defined code */

      /* Implicitly-defined code */
      const int max_trip_count = 10; // Analogous to input M
      int user_defined_vals[]; // Imagine this is resizable
      /* End implicitly-defined code */
      /* initialize loop-carried variables and scan-output variables */
      bool keepgoing_out = keepgoing;
      int b_out = b;

      for (int i=0; i < max_trip_count && keepgoing_out; ++i) {
        /* Implicitly-defined code: bind actual parameter values
           to formal parameter variables of loop-body */
        bool keepgoing_in = keepgoing_out;
        int b_in = b_out;

        /* User-defined code (loop body) */
        int my_local = a + b_in; // Reading value "a" from the enclosing scope is fine
        b_out = a - b_in;
        keepgoing_out = my_local > b_out;
        int user_defined_val = b_in + b_in; // b_in and b_out are different variables
        /* End user-defined code */

        /* Implicitly-defined code */
        user_defined_vals[i] = user_defined_val; // accumulate scan-output values
      }
      // int t = my_local; // Can't do this. my_local is not accessible here.

      // The values below are bound to the output variables of the loop and therefore accessible
      // b_out; user_defined_vals; keepgoing_out;
    }
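The sample above can be transcribed into Python to check what the Loop yields. This sketch mirrors the C-style code line by line. `sample_loop` and its default arguments (taken from the constants in predict-net) are illustrative only.

```python
from typing import List, Tuple


def sample_loop(a: int = 3, b: int = 6, keepgoing: bool = True,
                max_trip_count: int = 10) -> Tuple[bool, int, List[int]]:
    """Python transcription of the C-style sample (illustrative only)."""
    # Initialize loop-carried variables and the scan-output accumulator.
    keepgoing_out, b_out = keepgoing, b
    user_defined_vals: List[int] = []
    i = 0
    while i < max_trip_count and keepgoing_out:
        # Bind this iteration's body inputs (keepgoing_in is unused, as in body-net).
        keepgoing_in, b_in = keepgoing_out, b_out
        my_local = a + b_in               # reads "a" from the enclosing scope
        b_out = a - b_in                  # outgoing loop-carried value
        keepgoing_out = my_local > b_out  # outgoing loop-termination condition
        user_defined_vals.append(b_in + b_in)  # accumulate the scan output
        i += 1
    return keepgoing_out, b_out, user_defined_vals
```

With the defaults, `sample_loop()` returns `(False, 6, [12, -6])`. The second iteration computes `my_local = 0` and `b_out = 6`, so `keepgoing_out` becomes false and the loop stops well before `max_trip_count`.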

There are several things of note in this code snippet:

  1. Values from the enclosing scope (i.e. variable “a” here) are in scope and can be referenced by nodes inside the loop body.

  2. Any values computed in the loop body that need to be used in a subsequent iteration or after the loop are modelled using a pair of variables in the loop body, consisting of an input variable (e.g., b_in) and an output variable (e.g., b_out). These are referred to as loop-carried dependencies. The Loop node supplies the value of the input variable for the first iteration, and returns the value of the output variable produced by the final iteration.

  3. Scan_output variables are used to implicitly concatenate values computed across all the iterations. In the above example, the values of user_defined_val computed over all iterations are concatenated and returned as the value of user_defined_vals after the loop.

  4. Values created in the body cannot be accessed in the enclosing scope, except through the mechanism described above.
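Point 3 says that scan outputs are concatenated across iterations. The body attribute adds that it is an error if their dimensions change. Here is a minimal sketch of that accumulation. It assumes each per-iteration value is stacked along a new leading axis, so T iterations of a length-L vector give a T x L result; verify this against your runtime. `stack_scan_output` is a hypothetical helper, and it checks only the shape rule, not the data-type rule.

```python
from typing import List, Optional, Sequence


def stack_scan_output(per_iteration: Sequence[Sequence[float]]) -> List[List[float]]:
    """Stack one scan output's per-iteration values (1-D lists here).

    Result shape is [num_iterations, value_length]; a length change between
    iterations raises, mirroring the "dimensions must not change" rule.
    """
    result: List[List[float]] = []
    expected_len: Optional[int] = None
    for step, value in enumerate(per_iteration):
        if expected_len is None:
            expected_len = len(value)
        elif len(value) != expected_len:
            raise ValueError(f"scan_output shape changed at iteration {step}")
        result.append(list(value))
    return result
```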

Note that the semantics of this op support “diagonal” or “wavefront” execution. (See Step 3 here for an example: https://devblogs.nvidia.com/optimizing-recurrent-neural-networks-cudnn-5/). Frontends should emit multi-layer RNNs as a series of Loop operators (with time being the inner looping dimension), with each successive layer consuming the scan_outputs from the previous layer, possibly passing through several point-wise operators (e.g. dropout, residual connections, linear layer).
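The recommended layering can be illustrated with a toy example under stated assumptions:

- Each `rnn_layer` call stands in for one Loop over time.
- The hidden state `h` is the loop-carried dependency, and every `h_t` is collected like a scan_output.
- The second layer consumes the first layer's scan outputs after a point-wise residual add.

The scalar tanh cell and its weights are made up for illustration. The sketch runs the layers one after the other and does not demonstrate wavefront scheduling.

```python
import math
from typing import List, Sequence


def rnn_layer(xs: Sequence[float], w: float, u: float, h0: float = 0.0) -> List[float]:
    """One 'Loop over time': h is loop-carried, each h_t is a scan output."""
    h = h0
    hs: List[float] = []
    for x in xs:                          # trip count = sequence length
        h = math.tanh(w * x + u * h)      # toy recurrent cell (assumed)
        hs.append(h)                      # accumulated like a scan_output
    return hs


def two_layer_rnn(xs: Sequence[float]) -> List[float]:
    """Stack two toy layers; layer 2 consumes layer 1's scan outputs."""
    layer1 = rnn_layer(xs, w=0.5, u=0.1)
    residual = [h + x for h, x in zip(layer1, xs)]  # point-wise op between layers
    return rnn_layer(residual, w=0.3, u=0.2)
```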

The inputs and outputs of the body subgraph are matched to those of the Loop node by position, not by name. The implementation derives the names from this order.
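Because matching is positional, the body's own input names do not matter; only their order does. Below is a hypothetical sketch of how an implementation might bind one iteration's values. `bind_body_inputs` and the example names are assumptions, not an ONNX API.

```python
from typing import Dict, List, Sequence


def bind_body_inputs(body_input_names: Sequence[str], iteration_num: int,
                     cond: bool, v_values: Sequence[object]) -> Dict[str, object]:
    """Bind (iteration_num, cond, v...) to the body's inputs by position only."""
    values: List[object] = [iteration_num, cond, *v_values]
    if len(values) != len(body_input_names):
        raise ValueError(
            f"body expects {len(body_input_names)} inputs, got {len(values)}")
    return dict(zip(body_input_names, values))
```

For example, binding `["i", "keepgoing_in", "b_in"]` or `["x", "y", "z"]` gives the same positional assignment. The first name always receives the iteration number and the second the condition.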

Attributes

  • body (required): The graph run each iteration. It has 2+N inputs: (iteration_num, condition, loop carried dependencies…). It has 1+N+K outputs: (condition, loop carried dependencies…, scan_outputs…). Each scan_output is created by concatenating the value of the specified output value at the end of each iteration of the loop. It is an error if the dimensions or data type of these scan_outputs change across loop iterations.
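The arity rule in the body attribute determines N and K from the body signature alone. The small sketch below assumes only the counts stated above: 2+N inputs and 1+N+K outputs. `loop_arity` is a hypothetical helper.

```python
from typing import Tuple


def loop_arity(num_body_inputs: int, num_body_outputs: int) -> Tuple[int, int]:
    """Return (N, K) for a body with 2+N inputs and 1+N+K outputs."""
    n = num_body_inputs - 2
    if n < 0:
        raise ValueError("body needs at least iteration_num and condition inputs")
    k = num_body_outputs - 1 - n
    if k < 0:
        raise ValueError("body must output the condition plus all N loop-carried values")
    return n, k
```

For example, a body with 3 inputs and 3 outputs has N = 1 loop-carried value and K = 1 scan output. Per the Outputs section, the node then produces N + K values: the final carried values first, then the scan outputs.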

Inputs

Between 2 and 2147483647 inputs.

  • M (optional, heterogeneous) - I: A maximum trip-count for the loop specified at runtime. Optional. Pass empty string to skip.

  • cond (optional, heterogeneous) - B: A boolean termination condition. Optional. Pass empty string to skip.

  • v_initial (variadic) - V: The initial values of any loop-carried dependencies (values that change across loop iterations).
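M and cond are positional, so omitting one still leaves its slot in the node's input list as an empty string. Below is a hypothetical sketch of splitting a Loop node's input names accordingly. `split_loop_inputs` is illustrative and not part of the onnx package.

```python
from typing import List, Optional, Sequence, Tuple


def split_loop_inputs(node_inputs: Sequence[str]) -> Tuple[Optional[str], Optional[str], List[str]]:
    """Split Loop input names into (M, cond, v_initial); "" means omitted."""
    if len(node_inputs) < 2:
        raise ValueError("Loop takes at least 2 inputs (M and cond, possibly empty)")
    m, cond, *v_initial = node_inputs
    # An empty string marks a skipped optional input; map it to None.
    return (m or None), (cond or None), list(v_initial)
```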

Outputs

Between 1 and 2147483647 outputs.

  • v_final_and_scan_outputs (variadic) - V: Final N loop carried dependency values then K scan_outputs. Scan outputs must be Tensors.

Type Constraints

  • V in ( seq(tensor(bool)), seq(tensor(complex128)), seq(tensor(complex64)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bool), tensor(complex128), tensor(complex64), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): All Tensor and Sequence types

  • I in ( tensor(int64) ): tensor of int64, which should be a scalar.

  • B in ( tensor(bool) ): tensor of bool, which should be a scalar.

Differences

 Generic Looping construct. This loop has multiple termination conditions:

 1) Trip count. Iteration count specified at runtime. Set by
    specifying the input M. Optional. Set to empty string to omit.
    Note that a static trip count (specified at graph construction time) can be
    specified by passing in a constant node for input M.
 2) Loop termination condition. This is an input to the op that determines
    whether to run the first iteration and also a loop-carried dependency for
    the body graph. The body graph must yield a value for the condition variable,
    whether this input is provided or not.

 This table summarizes the operating modes of this operator with equivalent
 C-style code:

     Operator inputs defined as (max_trip_count, condition_var).

     input ("", ""):
         for (int i=0; ; ++i) {
           cond = ... // Note this value is ignored, but is required in the body
         }

     input ("", cond) // Note this is analogous to a while loop
         bool cond = ...;
         for (int i=0; cond; ++i) {
           cond = ...;
         }

     input ("", 1) // Note this is analogous to a do-while loop
         bool cond = true
         for (int i=0; cond; ++i) {
           cond = ...;
         }

     input (trip_count, "") // Note this is analogous to a for loop
         int trip_count = ...
         for (int i=0; i < trip_count; ++i) {
           cond = ...; // ignored
         }

     input (trip_count, cond)
         int trip_count = ...;
         bool cond = ...;
         for (int i=0; i < trip_count && cond; ++i) {
           cond = ...;
         }

 *Sample usage - cond as well as trip count*

     graph predict-net {
       %a = Constant[value = <Scalar Tensor [3]>]()
       %b = Constant[value = <Scalar Tensor [6]>]()
       %keepgoing = Constant[value = <Scalar Tensor [1]>]()
       %max_trip_count = Constant[value = <Scalar Tensor [10]>]()
       %keepgoing_out, %b_out, %user_defined_vals = Loop[body = <graph body-net>](%max_trip_count, %keepgoing, %b)
       return
     }

     graph body-net (
       %i[INT32, scalar]           // iteration number
       %keepgoing_in[BOOL, scalar] // incoming loop-termination-condition; not used
       %b_in[INT32, scalar]        // incoming value of loop-carried-dependency b
     ) {
       %my_local = Add(%a, %b_in)
       %b_out = Sub(%a, %b_in) // outgoing value of loop-carried-dependency b
       %keepgoing_out = Greater(%my_local, %b_out) // outgoing loop-termination-condition
       %user_defined_val = Add(%b_in, %b_in) // scan-output value to be accumulated
       return %keepgoing_out, %b_out, %user_defined_val
     }

 *Sample equivalent C code*

     {
       /* User-defined code (enclosing scope) */
       int a = 3, b = 6;
       bool keepgoing = true; // Analogous to input cond
       /* End user-defined code */

       /* Implicitly-defined code */
       const int max_trip_count = 10; // Analogous to input M
       int user_defined_vals[]; // Imagine this is resizable
       /* End implicitly-defined code */
       /* initialize loop-carried variables and scan-output variables */
       bool keepgoing_out = keepgoing
       int b_out = b

       for (int i=0; i < max_trip_count && keepgoing_out; ++i) {
         /* Implicitly-defined code: bind actual parameter values
            to formal parameter variables of loop-body */
         bool keepgoing_in = keepgoing_out;
         bool b_in = b_out;

         /* User-defined code (loop body) */
         int my_local = a + b_in; // Reading value "a" from the enclosing scope is fine
         b_out = a - b_in;
         keepgoing_out = my_local > b_out;
         user_defined_val = b_in + b_in; // b_in and b_out are different variables
         /* End user-defined code */

         /* Implicitly defined-code */
         user_defined_vals[i] = user_defined_val // accumulate scan-output values
       }
       // int t = my_local; // Can't do this. my_local is not accessible here.

       // The values below are bound to the output variables of the loop and therefore accessible
       // b_out; user_defined_vals; keepgoing_out;
     }

 There are several things of note in this code snippet:

 1) Values from the enclosing scope (i.e. variable "a" here) are in scope and can
    be referenced in the inputs of the loop.
 2) Any values computed in the loop body that needs to be used in a subsequent
    iteration or after the loop are modelled using a pair of variables in the loop-body,
    consisting of an input variable (eg., b_in) and an output variable (eg., b_out).
    These are referred to as loop-carried dependences. The loop operation node
    supplies the input value of the input variable for the first iteration, and
    returns the output value of the output variable produced by the final
    iteration.
 3) Scan_output variables are used to implicitly concatenate values computed across
    all the iterations. In the above example, the value of user_defined_val computed
    over all iterations are concatenated and returned as the value of user_defined_vals
    after the loop.
 4) Values created in the body cannot be accessed in the enclosing scope,
    except using the mechanism described above.

 Note that the semantics of this op support "diagonal" or "wavefront" execution.
 (See Step 3 here for an example:
 https://devblogs.nvidia.com/optimizing-recurrent-neural-networks-cudnn-5/).
 Frontends should emit multi-layer RNNs as a series of While operators (with
 time being the inner looping dimension), with each successive layer consuming
 the scan_outputs from the previous layer, possibly going through several
 point-wise operators (e.g. dropout, residual connections, linear layer).

+The input/output of subgraph (produced by loop node) matching is based on order instead of name. The implementation will figure out the names based on this order.
+
 **Attributes**

 * **body** (required):
   The graph run each iteration. It has 2+N inputs: (iteration_num,
   condition, loop carried dependencies...). It has 1+N+K outputs:
   (condition, loop carried dependencies..., scan_outputs...). Each
   scan_output is created by concatenating the value of the specified
   output value at the end of each iteration of the loop. It is an
   error if the dimensions or data type of these scan_outputs change
   across loop iterations.

 **Inputs**

 Between 2 and 2147483647 inputs.

 * **M** (optional, heterogeneous) - **I**:
   A maximum trip-count for the loop specified at runtime. Optional.
   Pass empty string to skip.
 * **cond** (optional, heterogeneous) - **B**:
   A boolean termination condition. Optional. Pass empty string to
   skip.
 * **v_initial** (variadic) - **V**:
   The initial values of any loop-carried dependencies (values that
   change across loop iterations)

 **Outputs**

 Between 1 and 2147483647 outputs.

 * **v_final_and_scan_outputs** (variadic) - **V**:
-  Final N loop carried dependency values then K scan_outputs
+  Final N loop carried dependency values then K scan_outputs. Scan
+  outputs must be Tensors.

 **Type Constraints**

 * **V** in (
+    seq(tensor(bool)),
+    seq(tensor(complex128)),
+    seq(tensor(complex64)),
+    seq(tensor(double)),
+    seq(tensor(float)),
+    seq(tensor(float16)),
+    seq(tensor(int16)),
+    seq(tensor(int32)),
+    seq(tensor(int64)),
+    seq(tensor(int8)),
+    seq(tensor(string)),
+    seq(tensor(uint16)),
+    seq(tensor(uint32)),
+    seq(tensor(uint64)),
+    seq(tensor(uint8)),
     tensor(bool),
     tensor(complex128),
     tensor(complex64),
     tensor(double),
     tensor(float),
     tensor(float16),
     tensor(int16),
     tensor(int32),
     tensor(int64),
     tensor(int8),
     tensor(string),
     tensor(uint16),
     tensor(uint32),
     tensor(uint64),
     tensor(uint8)
   ):
-  All Tensor types
+  All Tensor and Sequence types
 * **I** in (
     tensor(int64)
   ):
   tensor of int64, which should be a scalar.
 * **B** in (
     tensor(bool)
   ):
   tensor of bool, which should be a scalar.

Loop - 11

Version

  • name: Loop (GitHub)

  • domain: main

  • since_version: 11

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 11.

Summary

Generic looping construct. This loop has multiple termination conditions:

  1. Trip count. Iteration count specified at runtime. Set by specifying the input M. Optional. Set to empty string to omit. Note that a static trip count (specified at graph construction time) can be specified by passing in a constant node for input M.

  2. Loop termination condition. This is an input to the op that determines whether to run the first iteration and also a loop-carried dependency for the body graph. The body graph must yield a value for the condition variable, whether this input is provided or not.

This table summarizes the operating modes of this operator with equivalent C-style code:

Operator inputs defined as (max_trip_count, condition_var).

    input ("", ""):
        for (int i=0; ; ++i) {
          cond = ...; // Note this value is ignored, but is required in the body
        }

    input ("", cond) // Note this is analogous to a while loop
        bool cond = ...;
        for (int i=0; cond; ++i) {
          cond = ...;
        }

    input ("", 1) // Note this is analogous to a do-while loop
        bool cond = true;
        for (int i=0; cond; ++i) {
          cond = ...;
        }

    input (trip_count, "") // Note this is analogous to a for loop
        int trip_count = ...;
        for (int i=0; i < trip_count; ++i) {
          cond = ...; // ignored
        }

    input (trip_count, cond)
        int trip_count = ...;
        bool cond = ...;
        for (int i=0; i < trip_count && cond; ++i) {
          cond = ...;
        }

Sample usage - cond as well as trip count

    graph predict-net {
      %a = Constant[value = <Scalar Tensor [3]>]()
      %b = Constant[value = <Scalar Tensor [6]>]()
      %keepgoing = Constant[value = <Scalar Tensor [1]>]()
      %max_trip_count = Constant[value = <Scalar Tensor [10]>]()
      %keepgoing_out, %b_out, %user_defined_vals = Loop[body = <graph body-net>](%max_trip_count, %keepgoing, %b)
      return
    }

    graph body-net (
      %i[INT32, scalar]           // iteration number
      %keepgoing_in[BOOL, scalar] // incoming loop-termination-condition; not used
      %b_in[INT32, scalar]        // incoming value of loop-carried-dependency b
    ) {
      %my_local = Add(%a, %b_in)
      %b_out = Sub(%a, %b_in)                     // outgoing value of loop-carried-dependency b
      %keepgoing_out = Greater(%my_local, %b_out) // outgoing loop-termination-condition
      %user_defined_val = Add(%b_in, %b_in)       // scan-output value to be accumulated
      return %keepgoing_out, %b_out, %user_defined_val
    }

Sample equivalent C code

    {
      /* User-defined code (enclosing scope) */
      int a = 3, b = 6;
      bool keepgoing = true; // Analogous to input cond
      /* End user-defined code */

      /* Implicitly-defined code */
      const int max_trip_count = 10; // Analogous to input M
      int user_defined_vals[]; // Imagine this is resizable
      /* End implicitly-defined code */
      /* initialize loop-carried variables and scan-output variables */
      bool keepgoing_out = keepgoing;
      int b_out = b;

      for (int i=0; i < max_trip_count && keepgoing_out; ++i) {
        /* Implicitly-defined code: bind actual parameter values
           to formal parameter variables of loop-body */
        bool keepgoing_in = keepgoing_out;
        int b_in = b_out;

        /* User-defined code (loop body) */
        int my_local = a + b_in; // Reading value "a" from the enclosing scope is fine
        b_out = a - b_in;
        keepgoing_out = my_local > b_out;
        int user_defined_val = b_in + b_in; // b_in and b_out are different variables
        /* End user-defined code */

        /* Implicitly-defined code */
        user_defined_vals[i] = user_defined_val; // accumulate scan-output values
      }
      // int t = my_local; // Can't do this. my_local is not accessible here.

      // The values below are bound to the output variables of the loop and therefore accessible
      // b_out; user_defined_vals; keepgoing_out;
    }

There are several things of note in this code snippet:

  1. Values from the enclosing scope (i.e. variable “a” here) are in scope and can be referenced by nodes inside the loop body.

  2. Any values computed in the loop body that need to be used in a subsequent iteration or after the loop are modelled using a pair of variables in the loop body, consisting of an input variable (e.g., b_in) and an output variable (e.g., b_out). These are referred to as loop-carried dependencies. The Loop node supplies the value of the input variable for the first iteration, and returns the value of the output variable produced by the final iteration.

  3. Scan_output variables are used to implicitly concatenate values computed across all the iterations. In the above example, the values of user_defined_val computed over all iterations are concatenated and returned as the value of user_defined_vals after the loop.

  4. Values created in the body cannot be accessed in the enclosing scope, except through the mechanism described above.

Note that the semantics of this op support “diagonal” or “wavefront” execution. (See Step 3 here for an example: https://devblogs.nvidia.com/optimizing-recurrent-neural-networks-cudnn-5/). Frontends should emit multi-layer RNNs as a series of Loop operators (with time being the inner looping dimension), with each successive layer consuming the scan_outputs from the previous layer, possibly passing through several point-wise operators (e.g. dropout, residual connections, linear layer).

Attributes

  • body (required): The graph run each iteration. It has 2+N inputs: (iteration_num, condition, loop carried dependencies…). It has 1+N+K outputs: (condition, loop carried dependencies…, scan_outputs…). Each scan_output is created by concatenating the value of the specified output value at the end of each iteration of the loop. It is an error if the dimensions or data type of these scan_outputs change across loop iterations.

Inputs

Between 2 and 2147483647 inputs.

  • M (optional, heterogeneous) - I: A maximum trip-count for the loop specified at runtime. Optional. Pass empty string to skip.

  • cond (optional, heterogeneous) - B: A boolean termination condition. Optional. Pass empty string to skip.

  • v_initial (variadic) - V: The initial values of any loop-carried dependencies (values that change across loop iterations).

Outputs

Between 1 and 2147483647 outputs.

  • v_final_and_scan_outputs (variadic) - V: Final N loop carried dependency values then K scan_outputs.

Type Constraints

  • V in ( tensor(bool), tensor(complex128), tensor(complex64), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): All Tensor types

  • I in ( tensor(int64) ): tensor of int64, which should be a scalar.

  • B in ( tensor(bool) ): tensor of bool, which should be a scalar.
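The V constraint widens across the versions documented on this page:

- Loop-11 accepts tensors only.
- Loop-13 adds seq(tensor(...)).
- Loop-16 adds optional(...) wrappers and bfloat16.

The sketch below encodes exactly those lists as a string check. `v_type_allowed` is a hypothetical helper, not the onnx checker, and it covers only Loop-11 and later.

```python
from typing import Tuple

# Element types listed for V in Loop-11 and Loop-13; Loop-16 also lists bfloat16.
_BASE_ELEMENTS = frozenset({
    "bool", "complex128", "complex64", "double", "float", "float16",
    "int16", "int32", "int64", "int8", "string",
    "uint16", "uint32", "uint64", "uint8",
})


def _unwrap(type_str: str, wrapper: str) -> Tuple[str, bool]:
    """Strip one 'wrapper(...)' layer; return (inner, was_wrapped)."""
    prefix = wrapper + "("
    if type_str.startswith(prefix) and type_str.endswith(")"):
        return type_str[len(prefix):-1], True
    return type_str, False


def v_type_allowed(since_version: int, type_str: str) -> bool:
    """Check a type string against the V list of Loop-11, Loop-13 or Loop-16."""
    if since_version < 11:
        raise ValueError("this sketch only covers Loop-11 and later")
    t = type_str.replace(" ", "")
    t, is_optional = _unwrap(t, "optional")   # optional(...) is the outermost layer
    t, is_seq = _unwrap(t, "seq")
    t, is_tensor = _unwrap(t, "tensor")
    if not is_tensor:
        return False
    elements = (_BASE_ELEMENTS | {"bfloat16"}) if since_version >= 16 else _BASE_ELEMENTS
    if t not in elements:
        return False
    if is_seq and since_version < 13:
        return False  # seq(tensor(...)) first appears in Loop-13
    if is_optional and since_version < 16:
        return False  # optional(...) first appears in Loop-16
    return True
```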

Differences

Lines prefixed with - come from the earlier text (the Loop - 1 version below) and lines prefixed with + from this version; unchanged passages are omitted.

@@ -58,11 +58,11 @@
   graph body-net (
-    %i[INT32, scalar]
-    %keepgoing[BOOL, scalar]
-    %b[INT32, scalar]
+    %i[INT32, scalar]           // iteration number
+    %keepgoing_in[BOOL, scalar] // incoming loop-termination-condition; not used
+    %b_in[INT32, scalar]        // incoming value of loop-carried-dependency b
   ) {
-    %my_local = Add(%a, %b)
-    %b_out = Sub(%a, %b)
-    %keepgoing_out = Greater(%my_local, %b_out)
-    %user_defined_vals = Add(%b, %b)
-    return %keepgoing_out, %b_out, %user_defined_vals
+    %my_local = Add(%a, %b_in)
+    %b_out = Sub(%a, %b_in)                     // outgoing value of loop-carried-dependency b
+    %keepgoing_out = Greater(%my_local, %b_out) // outgoing loop-termination-condition
+    %user_defined_val = Add(%b_in, %b_in)       // scan-output value to be accumulated
+    return %keepgoing_out, %b_out, %user_defined_val
   }
@@ -81,14 +81,26 @@
   /* End implicitly-defined code */
+  /* initialize loop-carried variables and scan-output variables */
+  bool keepgoing_out = keepgoing
+  int b_out = b
+
-  for (int i=0; i < max_trip_count && keepgoing; ++i) {
+  for (int i=0; i < max_trip_count && keepgoing_out; ++i) {
+    /* Implicitly-defined code: bind actual parameter values
+       to formal parameter variables of loop-body */
+    bool keepgoing_in = keepgoing_out;
+    bool b_in = b_out;
+
     /* User-defined code (loop body) */
-    int my_local = a + b; // Reading values in the enclosing scope is fine
-    b = a - b; // writes fine if we specify b as a loop-carried dependency
-    keepgoing = my_local > b; // keepgoing is a loop-carried dependency
-    user_defined_vals[i] = b + b;
-    /* End user-defined code */
-  }
-  // my_local = 123; // Can't do this. my_local was defined in the the body
+    int my_local = a + b_in; // Reading value "a" from the enclosing scope is fine
+    b_out = a - b_in;
+    keepgoing_out = my_local > b_out;
+    user_defined_val = b_in + b_in; // b_in and b_out are different variables
+    /* End user-defined code */
+
+    /* Implicitly defined-code */
+    user_defined_vals[i] = user_defined_val // accumulate scan-output values
+  }
+  // int t = my_local; // Can't do this. my_local is not accessible here.
 
-  // These below values are live-out from the loop and therefore accessible
-  b_out; user_defined_vals; keepgoing_out;
+  // The values below are bound to the output variables of the loop and therefore accessible
+  // b_out; user_defined_vals; keepgoing_out;
 }
@@ -97,9 +109,17 @@
 
-1) Values from the enclosing scope (i.e. variable a here) are in scope and can
+1) Values from the enclosing scope (i.e. variable "a" here) are in scope and can
    be referenced in the inputs of the loop.
-2) Any variables which you wish to make available in the enclosing scope (i.e.
-   the variables b and keepgoing) must be declared as either loop-carried
-   dependencies (both at the op inputs and output and at the body net input and
-   output) or scan_outputs.
-3) Values created in the body cannot be accessed in the enclosing scope.
+2) Any values computed in the loop body that needs to be used in a subsequent
+   iteration or after the loop are modelled using a pair of variables in the loop-body,
+   consisting of an input variable (eg., b_in) and an output variable (eg., b_out).
+   These are referred to as loop-carried dependences. The loop operation node
+   supplies the input value of the input variable for the first iteration, and
+   returns the output value of the output variable produced by the final
+   iteration.
+3) Scan_output variables are used to implicitly concatenate values computed across
+   all the iterations. In the above example, the value of user_defined_val computed
+   over all iterations are concatenated and returned as the value of user_defined_vals
+   after the loop.
+4) Values created in the body cannot be accessed in the enclosing scope,
+   except using the mechanism described above.
 
@@ -126,3 +146,3 @@
 
-Between 3 and 2147483647 inputs.
+Between 2 and 2147483647 inputs.
 
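The main change in the sample C code is that the body reads b_in and writes b_out instead of updating b in place. This affects the scan output: the body graph's Add(%b, %b) reads the body input, i.e. the incoming value of b, whereas the earlier in-place C code computed b + b after b had already been updated. The sketch below runs both variants for the sample values (a = 3, b = 6, max_trip_count = 10); the function names are hypothetical, and b_in is declared int here because it carries an integer value.

```c
#include <stdbool.h>

enum { MAX_TRIP_COUNT = 10 };

/* Newer form: separate b_in / b_out, matching the body graph.
   Returns the iteration count; scan[i] receives user_defined_val. */
int sample_with_b_in_b_out(int a, int b, bool keepgoing, int *b_final,
                           bool *keepgoing_final, int scan[]) {
    bool keepgoing_out = keepgoing;
    int b_out = b;
    int i;
    for (i = 0; i < MAX_TRIP_COUNT && keepgoing_out; ++i) {
        int b_in = b_out;          /* bind incoming loop-carried value */
        int my_local = a + b_in;
        b_out = a - b_in;
        keepgoing_out = my_local > b_out;
        scan[i] = b_in + b_in;     /* uses the incoming value of b */
    }
    *b_final = b_out;
    *keepgoing_final = keepgoing_out;
    return i;
}

/* Earlier form: b is updated in place before the scan value is computed. */
int sample_in_place(int a, int b, bool keepgoing, int scan[]) {
    int i;
    for (i = 0; i < MAX_TRIP_COUNT && keepgoing; ++i) {
        int my_local = a + b;
        b = a - b;
        keepgoing = my_local > b;
        scan[i] = b + b;           /* uses the already-updated b */
    }
    return i;
}
```

Both variants stop after two iterations with b = 6 and keepgoing false, but the scan values are 12, -6 for the b_in/b_out form and -6, 12 for the in-place form.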

Loop - 1#

Version

  • name: Loop (GitHub)

  • domain: main

  • since_version: 1

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 1.

Summary

Generic Looping construct. This loop has multiple termination conditions:

  1. Trip count. Iteration count specified at runtime. Set by specifying the input M. Optional. Set to empty string to omit. Note that a static trip count (specified at graph construction time) can be specified by passing in a constant node for input M.

  2. Loop termination condition. This is an input to the op that determines whether to run the first iteration and also a loop-carried dependency for the body graph. The body graph must yield a value for the condition variable, whether this input is provided or not.

This table summarizes the operating modes of this operator with equivalent C-style code:

Operator inputs defined as (max_trip_count, condition_var).

input ("", ""):
    for (int i=0; ; ++i) {
      cond = ...; // Note this value is ignored, but is required in the body
    }

input ("", cond) // Note this is analogous to a while loop
    bool cond = ...;
    for (int i=0; cond; ++i) {
      cond = ...;
    }

input ("", 1) // Note this is analogous to a do-while loop
    bool cond = true;
    for (int i=0; cond; ++i) {
      cond = ...;
    }

input (trip_count, "") // Note this is analogous to a for loop
    int trip_count = ...;
    for (int i=0; i < trip_count; ++i) {
      cond = ...; // ignored
    }

input (trip_count, cond)
    int trip_count = ...;
    bool cond = ...;
    for (int i=0; i < trip_count && cond; ++i) {
      cond = ...;
    }
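The five modes in the table differ only in which of the two checks is active. The sketch below counts the iterations each mode runs, assuming a hypothetical body whose condition output is i < 2 and passing NULL for an omitted input; the `cap` argument is an assumption of the sketch that stands in for the unbounded ("", "") case.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical body: yields cond = (i < 2), i.e. asks to stop after iteration 2. */
static bool body_cond(int64_t i) { return i < 2; }

/* `trip_count` / `cond_init` are NULL when the input is omitted (""). */
int64_t count_iterations(const int64_t *trip_count, const bool *cond_init, int64_t cap) {
    bool cond = cond_init ? *cond_init : true;  /* unused when cond is omitted */
    int64_t i = 0;
    while (i < cap
           && (trip_count == NULL || i < *trip_count)
           && (cond_init == NULL || cond)) {
        cond = body_cond(i);  /* the body always yields a condition */
        ++i;
    }
    return i;
}
```

With this body, ("", true) runs 3 iterations, ("", false) runs none, (5, "") runs 5, (2, true) runs 2, and (5, true) runs 3.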

Sample usage - cond as well as trip count

graph predict-net {
  %a = Constant[value = <Scalar Tensor [3]>]()
  %b = Constant[value = <Scalar Tensor [6]>]()
  %keepgoing = Constant[value = <Scalar Tensor [1]>]()
  %max_trip_count = Constant[value = <Scalar Tensor [10]>]()
  %keepgoing_out, %b_out, %user_defined_vals = Loop[body = <graph body-net>](%max_trip_count, %keepgoing, %b)
  return
}

graph body-net (
  %i[INT32, scalar]
  %keepgoing[BOOL, scalar]
  %b[INT32, scalar]
) {
  %my_local = Add(%a, %b)
  %b_out = Sub(%a, %b)
  %keepgoing_out = Greater(%my_local, %b_out)
  %user_defined_vals = Add(%b, %b)
  return %keepgoing_out, %b_out, %user_defined_vals
}

Sample equivalent C code

{
  /* User-defined code (enclosing scope) */
  int a = 3, b = 6;
  bool keepgoing = true; // Analogous to input cond
  /* End user-defined code */

  /* Implicitly-defined code */
  const int max_trip_count = 10; // Analogous to input M
  int user_defined_vals[]; // Imagine this is resizable
  /* End implicitly-defined code */
  for (int i=0; i < max_trip_count && keepgoing; ++i) {
    /* User-defined code (loop body) */
    int my_local = a + b; // Reading values in the enclosing scope is fine
    b = a - b; // writes fine if we specify b as a loop-carried dependency
    keepgoing = my_local > b; // keepgoing is a loop-carried dependency
    user_defined_vals[i] = b + b;
    /* End user-defined code */
  }
  // my_local = 123; // Can't do this. my_local was defined in the body

  // The values below are live-out from the loop and therefore accessible
  b_out; user_defined_vals; keepgoing_out;
}

There are several things of note in this code snippet:

  1. Values from the enclosing scope (i.e. variable a here) are in scope and can be referenced in the inputs of the loop.

  2. Any variables that you wish to make available in the enclosing scope (i.e. the variables b and keepgoing) must be declared either as loop-carried dependencies (at both the op inputs and outputs and the body net inputs and outputs) or as scan_outputs.

  3. Values created in the body cannot be accessed in the enclosing scope.

Note that the semantics of this op support “diagonal” or “wavefront” execution (see Step 3 of https://devblogs.nvidia.com/optimizing-recurrent-neural-networks-cudnn-5/ for an example). Frontends should emit multi-layer RNNs as a series of Loop operators (with time as the inner looping dimension), with each successive layer consuming the scan_outputs of the previous layer, possibly after passing through several point-wise operators (e.g. dropout, residual connections, a linear layer).

Attributes

  • body (required): The graph run on each iteration. It has 2+N inputs: (iteration_num, condition, loop-carried dependencies…). It has 1+N+K outputs: (condition, loop-carried dependencies…, scan_outputs…). Each scan_output is created by concatenating the value of the corresponding body output at the end of each iteration. It is an error if the dimensions or data type of a scan_output change across loop iterations.

Inputs

Between 3 and 2147483647 inputs.

  • M (optional, heterogeneous) - I: A maximum trip-count for the loop specified at runtime. Optional. Pass empty string to skip.

  • cond (optional, heterogeneous) - B: A boolean termination condition. Optional. Pass empty string to skip.

  • v_initial (variadic) - V: The initial values of any loop-carried dependencies (values that change across loop iterations).

Outputs

Between 1 and 2147483647 outputs.

  • v_final_and_scan_outputs (variadic) - V: The final values of the N loop-carried dependencies, followed by the K scan_outputs.

Type Constraints

  • V in ( tensor(bool), tensor(complex128), tensor(complex64), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): All Tensor types

  • I in ( tensor(int64) ): tensor of int64, which should be a scalar.

  • B in ( tensor(bool) ): tensor of bool, which should be a scalar.