Loop#
Loop - 16#
Version
name: Loop (GitHub)
domain: main
since_version: 16
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 16.
Summary
Generic Looping construct. This loop has multiple termination conditions:
1) Trip count. Iteration count specified at runtime. Set by specifying the input M. Optional. Set to empty string to omit. Note that a static trip count (specified at graph construction time) can be specified by passing in a constant node for input M.
2) Loop termination condition. This is an input to the op that determines whether to run the first iteration and also a loop-carried dependency for the body graph. The body graph must yield a value for the condition variable, whether this input is provided or not.
This table summarizes the operating modes of this operator with equivalent C-style code:
Operator inputs defined as (max_trip_count, condition_var).
- input ("", ""):
    for (int i=0; ; ++i) {
      cond = ... // Note this value is ignored, but is required in the body
    }

- input ("", cond) // Note this is analogous to a while loop
    bool cond = ...;
    for (int i=0; cond; ++i) {
      cond = ...;
    }

- input ("", 1) // Note this is analogous to a do-while loop
    bool cond = true
    for (int i=0; cond; ++i) {
      cond = ...;
    }

- input (trip_count, "") // Note this is analogous to a for loop
    int trip_count = ...
    for (int i=0; i < trip_count; ++i) {
      cond = ...; // ignored
    }

- input (trip_count, cond)
    int trip_count = ...;
    bool cond = ...;
    for (int i=0; i < trip_count && cond; ++i) {
      cond = ...;
    }
Sample usage - cond as well as trip count
graph predict-net {
  %a = Constant[value = <Scalar Tensor [3]>]()
  %b = Constant[value = <Scalar Tensor [6]>]()
  %keepgoing = Constant[value = <Scalar Tensor [1]>]()
  %max_trip_count = Constant[value = <Scalar Tensor [10]>]()
  %keepgoing_out, %b_out, %user_defined_vals = Loop[body = <graph body-net>](%max_trip_count, %keepgoing, %b)
  return
}

graph body-net (
  %i[INT32, scalar]           // iteration number
  %keepgoing_in[BOOL, scalar] // incoming loop-termination-condition; not used
  %b_in[INT32, scalar]        // incoming value of loop-carried-dependency b
) {
  %my_local = Add(%a, %b_in)
  %b_out = Sub(%a, %b_in) // outgoing value of loop-carried-dependency b
  %keepgoing_out = Greater(%my_local, %b_out) // outgoing loop-termination-condition
  %user_defined_val = Add(%b_in, %b_in) // scan-output value to be accumulated
  return %keepgoing_out, %b_out, %user_defined_val
}
Sample equivalent C code
{
  /* User-defined code (enclosing scope) */
  int a = 3, b = 6;
  bool keepgoing = true; // Analogous to input cond
  /* End user-defined code */

  /* Implicitly-defined code */
  const int max_trip_count = 10; // Analogous to input M
  int user_defined_vals[]; // Imagine this is resizable
  /* End implicitly-defined code */
  /* initialize loop-carried variables and scan-output variables */
  bool keepgoing_out = keepgoing
  int b_out = b

  for (int i=0; i < max_trip_count && keepgoing_out; ++i) {
    /* Implicitly-defined code: bind actual parameter values
       to formal parameter variables of loop-body */
    bool keepgoing_in = keepgoing_out;
    bool b_in = b_out;

    /* User-defined code (loop body) */
    int my_local = a + b_in; // Reading value "a" from the enclosing scope is fine
    b_out = a - b_in;
    keepgoing_out = my_local > b_out;
    user_defined_val = b_in + b_in; // b_in and b_out are different variables
    /* End user-defined code */

    /* Implicitly defined-code */
    user_defined_vals[i] = user_defined_val // accumulate scan-output values
  }
  // int t = my_local; // Can't do this. my_local is not accessible here.

  // The values below are bound to the output variables of the loop and therefore accessible
  // b_out; user_defined_vals; keepgoing_out;
}
There are several things of note in this code snippet:
1) Values from the enclosing scope (i.e. variable "a" here) are in scope and can be referenced in the inputs of the loop.
2) Any values computed in the loop body that need to be used in a subsequent iteration or after the loop are modelled using a pair of variables in the loop body, consisting of an input variable (e.g., b_in) and an output variable (e.g., b_out). These are referred to as loop-carried dependences. The loop operation node supplies the input value of the input variable for the first iteration, and returns the output value of the output variable produced by the final iteration.
3) Scan_output variables are used to implicitly concatenate values computed across all the iterations. In the above example, the values of user_defined_val computed over all iterations are concatenated and returned as the value of user_defined_vals after the loop.
4) Values created in the body cannot be accessed in the enclosing scope, except using the mechanism described above.
Note that the semantics of this op support “diagonal” or “wavefront” execution. (See Step 3 here for an example: https://devblogs.nvidia.com/optimizing-recurrent-neural-networks-cudnn-5/). Frontends should emit multi-layer RNNs as a series of While operators (with time being the inner looping dimension), with each successive layer consuming the scan_outputs from the previous layer, possibly going through several point-wise operators (e.g. dropout, residual connections, linear layer).
Input/output matching for the subgraph (produced by the loop node) is based on order instead of name; the implementation will figure out the names based on this order.
Attributes
body (required): The graph run each iteration. It has 2+N inputs: (iteration_num, condition, loop carried dependencies…). It has 1+N+K outputs: (condition, loop carried dependencies…, scan_outputs…). Each scan_output is created by concatenating the value of the specified output value at the end of each iteration of the loop. It is an error if the dimensions or data type of these scan_outputs change across loop iterations.
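For example, a body graph with one loop-carried dependency (N = 1) and one scan output (K = 1) has 3 inputs and 3 outputs. The following sketch (the names and the trivial ops are illustrative assumptions, not taken from the examples below) shows how those positions map when building such a graph with onnx.helper:

import onnx

# Body inputs:  (iteration_num, condition, b_in)  -> 2 + N = 3 inputs (N = 1)
# Body outputs: (condition, b_out, scan_out)      -> 1 + N + K = 3 outputs (K = 1)
iter_num = onnx.helper.make_tensor_value_info('iter_num', onnx.TensorProto.INT64, [])
cond_in = onnx.helper.make_tensor_value_info('cond_in', onnx.TensorProto.BOOL, [])
b_in = onnx.helper.make_tensor_value_info('b_in', onnx.TensorProto.FLOAT, [1])
cond_out = onnx.helper.make_tensor_value_info('cond_out', onnx.TensorProto.BOOL, [])
b_out = onnx.helper.make_tensor_value_info('b_out', onnx.TensorProto.FLOAT, [1])
scan_out = onnx.helper.make_tensor_value_info('scan_out', onnx.TensorProto.FLOAT, [1])
body = onnx.helper.make_graph(
    [
        onnx.helper.make_node('Identity', ['cond_in'], ['cond_out']),  # keep the condition unchanged
        onnx.helper.make_node('Add', ['b_in', 'b_in'], ['b_out']),     # update the loop-carried value
        onnx.helper.make_node('Identity', ['b_out'], ['scan_out']),    # emit a per-iteration scan value
    ],
    'body',
    [iter_num, cond_in, b_in],
    [cond_out, b_out, scan_out],
)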
Inputs
Between 2 and 2147483647 inputs.
M (optional, heterogeneous) - I: A maximum trip-count for the loop specified at runtime. Optional. Pass empty string to skip.
cond (optional, heterogeneous) - B: A boolean termination condition. Optional. Pass empty string to skip.
v_initial (variadic) - V: The initial values of any loop-carried dependencies (values that change across loop iterations)
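Because M and cond are optional, either one can be omitted when constructing the node by passing an empty string in that input position. A minimal sketch of the while-loop configuration (only cond supplied), reusing the hypothetical body graph from the sketch above:

# While-style Loop: the empty string '' omits the optional trip count M.
while_node = onnx.helper.make_node(
    'Loop',
    inputs=['', 'cond', 'b_init'],       # (M omitted, cond, initial loop-carried value)
    outputs=['b_final', 'scan_values'],  # (final loop-carried value, scan output)
    body=body,
)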
Outputs
Between 1 and 2147483647 outputs.
v_final_and_scan_outputs (variadic) - V: Final N loop carried dependency values then K scan_outputs. Scan outputs must be Tensors.
Type Constraints
V in ( optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(complex128))), optional(seq(tensor(complex64))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(complex128)), optional(tensor(complex64)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8)), seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(complex128)), seq(tensor(complex64)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(complex128), tensor(complex64), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): All Tensor, Sequence(Tensor), Optional(Tensor), and Optional(Sequence(Tensor)) types
I in ( tensor(int64) ): tensor of int64, which should be a scalar.
B in ( tensor(bool) ): tensor of bool, which should be a scalar.
Examples
loop_11
# Given a tensor x of values [x1, ..., xN], and initial tensor y
# sum up its elements using a scan
# returning the final state (y+x1+x2+...+xN) as well the scan_output
# [y+x1, y+x1+x2, ..., y+x1+x2+...+xN]
y_in = onnx.helper.make_tensor_value_info('y_in', onnx.TensorProto.FLOAT, [1])
y_out = onnx.helper.make_tensor_value_info('y_out', onnx.TensorProto.FLOAT, [1])
scan_out = onnx.helper.make_tensor_value_info('scan_out', onnx.TensorProto.FLOAT, [1])
cond_in = onnx.helper.make_tensor_value_info('cond_in', onnx.TensorProto.BOOL, [])
cond_out = onnx.helper.make_tensor_value_info('cond_out', onnx.TensorProto.BOOL, [])
iter_count = onnx.helper.make_tensor_value_info('iter_count', onnx.TensorProto.INT64, [])
x = np.array([1, 2, 3, 4, 5]).astype(np.float32)
y = np.array([-2]).astype(np.float32)
x_const_node = onnx.helper.make_node(
'Constant',
inputs=[],
outputs=['x'],
value=onnx.helper.make_tensor(
name='const_tensor_x',
data_type=onnx.TensorProto.FLOAT,
dims=x.shape,
vals=x.flatten().astype(float),
)
)
one_const_node = onnx.helper.make_node(
'Constant',
inputs=[],
outputs=['one'],
value=onnx.helper.make_tensor(
name='const_tensor_one',
data_type=onnx.TensorProto.INT64,
dims=(),
vals=[1]
)
)
i_add_node = onnx.helper.make_node(
'Add',
inputs=['iter_count', 'one'],
outputs=['end']
)
start_unsqueeze_node = onnx.helper.make_node(
'Unsqueeze',
inputs=['iter_count'],
outputs=['slice_start'],
axes=[0]
)
end_unsqueeze_node = onnx.helper.make_node(
'Unsqueeze',
inputs=['end'],
outputs=['slice_end'],
axes=[0]
)
slice_node = onnx.helper.make_node(
'Slice',
inputs=['x', 'slice_start', 'slice_end'],
outputs=['slice_out']
)
y_add_node = onnx.helper.make_node(
'Add',
inputs=['y_in', 'slice_out'],
outputs=['y_out']
)
identity_node = onnx.helper.make_node(
'Identity',
inputs=['cond_in'],
outputs=['cond_out']
)
scan_identity_node = onnx.helper.make_node(
'Identity',
inputs=['y_out'],
outputs=['scan_out']
)
loop_body = onnx.helper.make_graph(
[identity_node, x_const_node, one_const_node, i_add_node,
start_unsqueeze_node, end_unsqueeze_node, slice_node, y_add_node,
scan_identity_node],
'loop_body',
[iter_count, cond_in, y_in],
[cond_out, y_out, scan_out]
)
node = onnx.helper.make_node(
'Loop',
inputs=['trip_count', 'cond', 'y'],
outputs=['res_y', 'res_scan'],
body=loop_body
)
trip_count = np.array(5).astype(np.int64)
res_y = np.array([13]).astype(np.float32)
cond = np.array(1).astype(bool)
res_scan = np.array([-1, 1, 4, 8, 13]).astype(np.float32).reshape((5, 1))
expect(node, inputs=[trip_count, cond, y], outputs=[res_y, res_scan],
name='test_loop11', opset_imports=[onnx.helper.make_opsetid("", 11)])
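For reference, the expected outputs used above can be reproduced with a plain NumPy transcription of the loop body (a sketch for checking the numbers, not part of the test):

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=np.float32)
y_state = np.array([-2], dtype=np.float32)
scan = []
for i in range(5):                      # trip_count = 5, cond stays true
    y_state = y_state + x[i:i + 1]      # body: y_out = y_in + x[i : i+1]
    scan.append(y_state)                # scan_out copies y_out each iteration

assert np.array_equal(y_state, np.array([13], dtype=np.float32))       # res_y
assert np.array_equal(np.stack(scan),
                      np.array([[-1], [1], [4], [8], [13]], dtype=np.float32))  # res_scan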
loop_13
# Given a tensor x of values [x1, ..., xN],
# Return a sequence of tensors of
# [[x1], [x1, x2], ..., [x1, ..., xN]]
seq_in = onnx.helper.make_tensor_sequence_value_info('seq_in', onnx.TensorProto.FLOAT, None)
seq_out = onnx.helper.make_tensor_sequence_value_info('seq_out', onnx.TensorProto.FLOAT, None)
cond_in = onnx.helper.make_tensor_value_info('cond_in', onnx.TensorProto.BOOL, [])
cond_out = onnx.helper.make_tensor_value_info('cond_out', onnx.TensorProto.BOOL, [])
iter_count = onnx.helper.make_tensor_value_info('iter_count', onnx.TensorProto.INT64, [])
x = np.array([1, 2, 3, 4, 5]).astype(np.float32)
x_const_node = onnx.helper.make_node(
'Constant',
inputs=[],
outputs=['x'],
value=onnx.helper.make_tensor(
name='const_tensor_x',
data_type=onnx.TensorProto.FLOAT,
dims=x.shape,
vals=x.flatten().astype(float),
)
)
one_const_node = onnx.helper.make_node(
'Constant',
inputs=[],
outputs=['one'],
value=onnx.helper.make_tensor(
name='const_tensor_one',
data_type=onnx.TensorProto.INT64,
dims=(),
vals=[1]
)
)
zero_const_node = onnx.helper.make_node(
'Constant',
inputs=[],
outputs=['slice_start'],
value=onnx.helper.make_tensor(
name='const_tensor_zero',
data_type=onnx.TensorProto.INT64,
dims=(1,),
vals=[0]
)
)
axes_node = onnx.helper.make_node(
'Constant',
inputs=[],
outputs=['axes'],
value=onnx.helper.make_tensor(
name='const_tensor_axes',
data_type=onnx.TensorProto.INT64,
dims=(),
vals=[0]
)
)
add_node = onnx.helper.make_node(
'Add',
inputs=['iter_count', 'one'],
outputs=['end']
)
end_unsqueeze_node = onnx.helper.make_node(
'Unsqueeze',
inputs=['end', 'axes'],
outputs=['slice_end']
)
slice_node = onnx.helper.make_node(
'Slice',
inputs=['x', 'slice_start', 'slice_end'],
outputs=['slice_out']
)
insert_node = onnx.helper.make_node(
'SequenceInsert',
inputs=['seq_in', 'slice_out'],
outputs=['seq_out']
)
identity_node = onnx.helper.make_node(
'Identity',
inputs=['cond_in'],
outputs=['cond_out']
)
loop_body = onnx.helper.make_graph(
[identity_node, x_const_node, one_const_node, zero_const_node, add_node,
axes_node, end_unsqueeze_node, slice_node, insert_node],
'loop_body',
[iter_count, cond_in, seq_in],
[cond_out, seq_out]
)
node = onnx.helper.make_node(
'Loop',
inputs=['trip_count', 'cond', 'seq_empty'],
outputs=['seq_res'],
body=loop_body
)
trip_count = np.array(5).astype(np.int64)
seq_empty: List[Any] = []
seq_res = [x[:int(i)] for i in x]
cond = np.array(1).astype(bool)
expect(node, inputs=[trip_count, cond, seq_empty], outputs=[seq_res],
name='test_loop13_seq', opset_imports=[onnx.helper.make_opsetid("", 13)],
input_type_protos=[onnx.helper.make_tensor_type_proto(onnx.TensorProto.INT64, trip_count.shape),
onnx.helper.make_tensor_type_proto(onnx.TensorProto.BOOL, cond.shape),
onnx.helper.make_sequence_type_proto(
onnx.helper.make_tensor_type_proto(onnx.TensorProto.FLOAT, []))])
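The same computation written as ordinary Python over NumPy arrays (an illustrative sketch of what the body does, not test code):

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=np.float32)
seq_state = []                            # corresponds to seq_empty
for i in range(5):                        # trip_count = 5
    seq_state = seq_state + [x[0:i + 1]]  # SequenceInsert(seq_in, x[slice_start:slice_end])
# seq_state == [array([1.]), array([1., 2.]), ..., array([1., 2., 3., 4., 5.])]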
loop_16_none
# Given a tensor sequence of values [x1, ..., xN], and an initial optional sequence of tensors [x0],
# Return a concatenated sequence of tensors of
# [x0, [x1], [x1, x2], ..., [x1, ..., xN]]
ten_in_tp = onnx.helper.make_tensor_type_proto(onnx.TensorProto.FLOAT, [])
seq_in_tp = onnx.helper.make_sequence_type_proto(ten_in_tp)
opt_in_tp = onnx.helper.make_optional_type_proto(seq_in_tp)
opt_in = onnx.helper.make_value_info('opt_seq_in', opt_in_tp)
seq_out = onnx.helper.make_tensor_sequence_value_info('seq_out', onnx.TensorProto.FLOAT, [])
cond_in = onnx.helper.make_tensor_value_info('cond_in', onnx.TensorProto.BOOL, [])
cond_out = onnx.helper.make_tensor_value_info('cond_out', onnx.TensorProto.BOOL, [])
iter_count = onnx.helper.make_tensor_value_info('iter_count', onnx.TensorProto.INT64, [])
x0 = np.array(0).astype(np.float32)
x = np.array([1, 2, 3, 4, 5]).astype(np.float32)
optional_has_elem_node = onnx.helper.make_node(
'OptionalHasElement',
inputs=['opt_seq_in'],
outputs=['optional_has_elem']
)
optional_is_none = onnx.helper.make_node(
'Not',
inputs=['optional_has_elem'],
outputs=['optional_is_none']
)
optional_get_elem = onnx.helper.make_node(
'OptionalGetElement',
inputs=['opt_seq_in'],
outputs=['seq_in']
)
constant_in = onnx.helper.make_node(
'Constant',
inputs=[],
outputs=['constant_in'],
value=onnx.helper.make_tensor(
name='const_tensor',
data_type=onnx.TensorProto.FLOAT,
dims=(),
vals=[0]
)
)
seq_const_in = onnx.helper.make_node(
'SequenceConstruct',
inputs=['constant_in'],
outputs=['init_seq_in']
)
then_seq_out = onnx.helper.make_tensor_sequence_value_info('init_seq_in', onnx.TensorProto.FLOAT, [])
then_body = onnx.helper.make_graph(
[constant_in, seq_const_in],
'then_body',
[],
[then_seq_out]
)
else_seq_out = onnx.helper.make_tensor_sequence_value_info('seq_in', onnx.TensorProto.FLOAT, [])
else_body = onnx.helper.make_graph(
[optional_get_elem],
'else_body',
[],
[else_seq_out]
)
if_node = onnx.helper.make_node(
'If',
inputs=['optional_is_none'],
outputs=['sequence'],
then_branch=then_body,
else_branch=else_body
)
x_const_node = onnx.helper.make_node(
'Constant',
inputs=[],
outputs=['x'],
value=onnx.helper.make_tensor(
name='const_tensor_x',
data_type=onnx.TensorProto.FLOAT,
dims=x.shape,
vals=x.flatten().astype(float),
)
)
one_const_node = onnx.helper.make_node(
'Constant',
inputs=[],
outputs=['one'],
value=onnx.helper.make_tensor(
name='const_tensor_one',
data_type=onnx.TensorProto.INT64,
dims=(),
vals=[1]
)
)
zero_const_node = onnx.helper.make_node(
'Constant',
inputs=[],
outputs=['slice_start'],
value=onnx.helper.make_tensor(
name='const_tensor_zero',
data_type=onnx.TensorProto.INT64,
dims=(1,),
vals=[0]
)
)
axes_node = onnx.helper.make_node(
'Constant',
inputs=[],
outputs=['axes'],
value=onnx.helper.make_tensor(
name='const_tensor_axes',
data_type=onnx.TensorProto.INT64,
dims=(),
vals=[0]
)
)
add_node = onnx.helper.make_node(
'Add',
inputs=['iter_count', 'one'],
outputs=['end']
)
end_unsqueeze_node = onnx.helper.make_node(
'Unsqueeze',
inputs=['end', 'axes'],
outputs=['slice_end']
)
slice_node = onnx.helper.make_node(
'Slice',
inputs=['x', 'slice_start', 'slice_end'],
outputs=['slice_out']
)
insert_node = onnx.helper.make_node(
'SequenceInsert',
inputs=['sequence', 'slice_out'],
outputs=['seq_out']
)
identity_node = onnx.helper.make_node(
'Identity',
inputs=['cond_in'],
outputs=['cond_out']
)
loop_body = onnx.helper.make_graph(
[identity_node, optional_has_elem_node, optional_is_none, if_node, x_const_node, one_const_node,
zero_const_node, add_node, axes_node, end_unsqueeze_node, slice_node, insert_node],
'loop_body',
[iter_count, cond_in, opt_in],
[cond_out, seq_out]
)
node = onnx.helper.make_node(
'Loop',
inputs=['trip_count', 'cond', 'opt_seq'],
outputs=['seq_res'],
body=loop_body
)
trip_count = np.array(5).astype(np.int64)
cond = np.array(1).astype(bool)
seq_res = compute_loop_outputs(x, [x0], trip_count)
opt_seq_in: List[Any] = [x0]
expect(node, inputs=[trip_count, cond, opt_seq_in], outputs=[seq_res],
name='test_loop16_seq_none', opset_imports=[onnx.helper.make_opsetid("", 16)],
input_type_protos=[onnx.helper.make_tensor_type_proto(onnx.TensorProto.INT64, trip_count.shape),
onnx.helper.make_tensor_type_proto(onnx.TensorProto.BOOL, cond.shape),
opt_in_tp])
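compute_loop_outputs is a helper from the test harness and is not shown here; under the semantics described in the comment at the top of this example it could look roughly like the following (a hypothetical sketch, not the actual ONNX utility):

def compute_loop_outputs(x, seq, trip_count):
    # Start from the initial sequence (here [x0]) and append the prefix
    # x[0:i+1] on every iteration, mirroring SequenceInsert in the loop body.
    out = list(seq)
    for i in range(int(trip_count)):
        out.append(x[0:i + 1])
    return out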
Differences
Loop - 16 differs from Loop - 13 only in the type constraint V: every element type additionally admits optional(tensor(...)) and optional(seq(tensor(...))) forms, bfloat16 is added as a tensor and sequence element type, and the constraint description changes from "All Tensor and Sequence types" to "All Tensor, Sequence(Tensor), Optional(Tensor), and Optional(Sequence(Tensor)) types". The summary, attributes, inputs, outputs, and the I and B constraints are unchanged.
Loop - 13#
Version
name: Loop (GitHub)
domain: main
since_version: 13
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 13.
Summary
Generic Looping construct. This loop has multiple termination conditions:
1) Trip count. Iteration count specified at runtime. Set by specifying the input M. Optional. Set to empty string to omit. Note that a static trip count (specified at graph construction time) can be specified by passing in a constant node for input M.
2) Loop termination condition. This is an input to the op that determines whether to run the first iteration and also a loop-carried dependency for the body graph. The body graph must yield a value for the condition variable, whether this input is provided or not.
This table summarizes the operating modes of this operator with equivalent C-style code:
Operator inputs defined as (max_trip_count, condition_var).
- input ("", ""):
    for (int i=0; ; ++i) {
      cond = ... // Note this value is ignored, but is required in the body
    }

- input ("", cond) // Note this is analogous to a while loop
    bool cond = ...;
    for (int i=0; cond; ++i) {
      cond = ...;
    }

- input ("", 1) // Note this is analogous to a do-while loop
    bool cond = true
    for (int i=0; cond; ++i) {
      cond = ...;
    }

- input (trip_count, "") // Note this is analogous to a for loop
    int trip_count = ...
    for (int i=0; i < trip_count; ++i) {
      cond = ...; // ignored
    }

- input (trip_count, cond)
    int trip_count = ...;
    bool cond = ...;
    for (int i=0; i < trip_count && cond; ++i) {
      cond = ...;
    }
Sample usage - cond as well as trip count
graph predict-net {
  %a = Constant[value = <Scalar Tensor [3]>]()
  %b = Constant[value = <Scalar Tensor [6]>]()
  %keepgoing = Constant[value = <Scalar Tensor [1]>]()
  %max_trip_count = Constant[value = <Scalar Tensor [10]>]()
  %keepgoing_out, %b_out, %user_defined_vals = Loop[body = <graph body-net>](%max_trip_count, %keepgoing, %b)
  return
}

graph body-net (
  %i[INT32, scalar]           // iteration number
  %keepgoing_in[BOOL, scalar] // incoming loop-termination-condition; not used
  %b_in[INT32, scalar]        // incoming value of loop-carried-dependency b
) {
  %my_local = Add(%a, %b_in)
  %b_out = Sub(%a, %b_in) // outgoing value of loop-carried-dependency b
  %keepgoing_out = Greater(%my_local, %b_out) // outgoing loop-termination-condition
  %user_defined_val = Add(%b_in, %b_in) // scan-output value to be accumulated
  return %keepgoing_out, %b_out, %user_defined_val
}
Sample equivalent C code
{
  /* User-defined code (enclosing scope) */
  int a = 3, b = 6;
  bool keepgoing = true; // Analogous to input cond
  /* End user-defined code */

  /* Implicitly-defined code */
  const int max_trip_count = 10; // Analogous to input M
  int user_defined_vals[]; // Imagine this is resizable
  /* End implicitly-defined code */
  /* initialize loop-carried variables and scan-output variables */
  bool keepgoing_out = keepgoing
  int b_out = b

  for (int i=0; i < max_trip_count && keepgoing_out; ++i) {
    /* Implicitly-defined code: bind actual parameter values
       to formal parameter variables of loop-body */
    bool keepgoing_in = keepgoing_out;
    bool b_in = b_out;

    /* User-defined code (loop body) */
    int my_local = a + b_in; // Reading value "a" from the enclosing scope is fine
    b_out = a - b_in;
    keepgoing_out = my_local > b_out;
    user_defined_val = b_in + b_in; // b_in and b_out are different variables
    /* End user-defined code */

    /* Implicitly defined-code */
    user_defined_vals[i] = user_defined_val // accumulate scan-output values
  }
  // int t = my_local; // Can't do this. my_local is not accessible here.

  // The values below are bound to the output variables of the loop and therefore accessible
  // b_out; user_defined_vals; keepgoing_out;
}
There are several things of note in this code snippet:
1) Values from the enclosing scope (i.e. variable "a" here) are in scope and can be referenced in the inputs of the loop.
2) Any values computed in the loop body that need to be used in a subsequent iteration or after the loop are modelled using a pair of variables in the loop body, consisting of an input variable (e.g., b_in) and an output variable (e.g., b_out). These are referred to as loop-carried dependences. The loop operation node supplies the input value of the input variable for the first iteration, and returns the output value of the output variable produced by the final iteration.
3) Scan_output variables are used to implicitly concatenate values computed across all the iterations. In the above example, the values of user_defined_val computed over all iterations are concatenated and returned as the value of user_defined_vals after the loop.
4) Values created in the body cannot be accessed in the enclosing scope, except using the mechanism described above.
Note that the semantics of this op support “diagonal” or “wavefront” execution. (See Step 3 here for an example: https://devblogs.nvidia.com/optimizing-recurrent-neural-networks-cudnn-5/). Frontends should emit multi-layer RNNs as a series of While operators (with time being the inner looping dimension), with each successive layer consuming the scan_outputs from the previous layer, possibly going through several point-wise operators (e.g. dropout, residual connections, linear layer).
Input/output matching for the subgraph (produced by the loop node) is based on order instead of name; the implementation will figure out the names based on this order.
Attributes
body (required): The graph run each iteration. It has 2+N inputs: (iteration_num, condition, loop carried dependencies…). It has 1+N+K outputs: (condition, loop carried dependencies…, scan_outputs…). Each scan_output is created by concatenating the value of the specified output value at the end of each iteration of the loop. It is an error if the dimensions or data type of these scan_outputs change across loop iterations.
Inputs
Between 2 and 2147483647 inputs.
M (optional, heterogeneous) - I: A maximum trip-count for the loop specified at runtime. Optional. Pass empty string to skip.
cond (optional, heterogeneous) - B: A boolean termination condition. Optional. Pass empty string to skip.
v_initial (variadic) - V: The initial values of any loop-carried dependencies (values that change across loop iterations)
Outputs
Between 1 and 2147483647 outputs.
v_final_and_scan_outputs (variadic) - V: Final N loop carried dependency values then K scan_outputs. Scan outputs must be Tensors.
Type Constraints
V in ( seq(tensor(bool)), seq(tensor(complex128)), seq(tensor(complex64)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bool), tensor(complex128), tensor(complex64), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): All Tensor and Sequence types
I in ( tensor(int64) ): tensor of int64, which should be a scalar.
B in ( tensor(bool) ): tensor of bool, which should be a scalar.
Differences
0 | 0 | Generic Looping construct. This loop has multiple termination conditions: | Generic Looping construct. This loop has multiple termination conditions: |
1 | 1 |
|
|
2 | 2 | 1) Trip count. Iteration count specified at runtime. Set by | 1) Trip count. Iteration count specified at runtime. Set by |
3 | 3 | specifying the input M. Optional. Set to empty string to omit. | specifying the input M. Optional. Set to empty string to omit. |
4 | 4 | Note that a static trip count (specified at graph construction time) can be | Note that a static trip count (specified at graph construction time) can be |
5 | 5 | specified by passing in a constant node for input M. | specified by passing in a constant node for input M. |
6 | 6 | 2) Loop termination condition. This is an input to the op that determines | 2) Loop termination condition. This is an input to the op that determines |
7 | 7 | whether to run the first iteration and also a loop-carried dependency for | whether to run the first iteration and also a loop-carried dependency for |
8 | 8 | the body graph. The body graph must yield a value for the condition variable, | the body graph. The body graph must yield a value for the condition variable, |
9 | 9 | whether this input is provided or not. | whether this input is provided or not. |
10 | 10 |
|
|
11 | 11 | This table summarizes the operating modes of this operator with equivalent | This table summarizes the operating modes of this operator with equivalent |
12 | 12 | C-style code: | C-style code: |
13 | 13 |
|
|
14 | 14 | Operator inputs defined as (max_trip_count, condition_var). | Operator inputs defined as (max_trip_count, condition_var). |
15 | 15 |
|
|
16 | 16 | input ("", ""): | input ("", ""): |
17 | 17 | for (int i=0; ; ++i) { | for (int i=0; ; ++i) { |
18 | 18 | cond = ... // Note this value is ignored, but is required in the body | cond = ... // Note this value is ignored, but is required in the body |
19 | 19 | } | } |
20 | 20 |
|
|
21 | 21 | input ("", cond) // Note this is analogous to a while loop | input ("", cond) // Note this is analogous to a while loop |
22 | 22 | bool cond = ...; | bool cond = ...; |
23 | 23 | for (int i=0; cond; ++i) { | for (int i=0; cond; ++i) { |
24 | 24 | cond = ...; | cond = ...; |
25 | 25 | } | } |
26 | 26 |
|
|
27 | 27 | input ("", 1) // Note this is analogous to a do-while loop | input ("", 1) // Note this is analogous to a do-while loop |
28 | 28 | bool cond = true | bool cond = true |
29 | 29 | for (int i=0; cond; ++i) { | for (int i=0; cond; ++i) { |
30 | 30 | cond = ...; | cond = ...; |
31 | 31 | } | } |
32 | 32 |
|
|
33 | 33 | input (trip_count, "") // Note this is analogous to a for loop | input (trip_count, "") // Note this is analogous to a for loop |
34 | 34 | int trip_count = ... | int trip_count = ... |
35 | 35 | for (int i=0; i < trip_count; ++i) { | for (int i=0; i < trip_count; ++i) { |
36 | 36 | cond = ...; // ignored | cond = ...; // ignored |
37 | 37 | } | } |
38 | 38 |
|
|
39 | 39 | input (trip_count, cond) | input (trip_count, cond) |
40 | 40 | int trip_count = ...; | int trip_count = ...; |
41 | 41 | bool cond = ...; | bool cond = ...; |
42 | 42 | for (int i=0; i < trip_count && cond; ++i) { | for (int i=0; i < trip_count && cond; ++i) { |
43 | 43 | cond = ...; | cond = ...; |
44 | 44 | } | } |
45 | 45 |
|
|
46 | 46 | *Sample usage - cond as well as trip count* | *Sample usage - cond as well as trip count* |
47 | 47 |
|
|
48 | 48 | graph predict-net { | graph predict-net { |
49 | 49 | %a = Constant[value = | %a = Constant[value = |
50 | 50 | %b = Constant[value = | %b = Constant[value = |
51 | 51 | %keepgoing = Constant[value = | %keepgoing = Constant[value = |
52 | 52 | %max_trip_count = Constant[value = | %max_trip_count = Constant[value = |
53 | 53 | %keepgoing_out, %b_out, %user_defined_vals = Loop[body = | %keepgoing_out, %b_out, %user_defined_vals = Loop[body = |
54 | 54 | return | return |
55 | 55 | } | } |
56 | 56 |
|
|
57 | 57 | graph body-net ( | graph body-net ( |
58 | 58 | %i[INT32, scalar] // iteration number | %i[INT32, scalar] // iteration number |
59 | 59 | %keepgoing_in[BOOL, scalar] // incoming loop-termination-condition; not used | %keepgoing_in[BOOL, scalar] // incoming loop-termination-condition; not used |
60 | 60 | %b_in[INT32, scalar] // incoming value of loop-carried-dependency b | %b_in[INT32, scalar] // incoming value of loop-carried-dependency b |
61 | 61 | ) { | ) { |
62 | 62 | %my_local = Add(%a, %b_in) | %my_local = Add(%a, %b_in) |
63 | 63 | %b_out = Sub(%a, %b_in) // outgoing value of loop-carried-dependency b | %b_out = Sub(%a, %b_in) // outgoing value of loop-carried-dependency b |
64 | 64 | %keepgoing_out = Greater(%my_local, %b_out) // outgoing loop-termination-condition | %keepgoing_out = Greater(%my_local, %b_out) // outgoing loop-termination-condition |
65 | 65 | %user_defined_val = Add(%b_in, %b_in) // scan-output value to be accumulated | %user_defined_val = Add(%b_in, %b_in) // scan-output value to be accumulated |
66 | 66 | return %keepgoing_out, %b_out, %user_defined_val | return %keepgoing_out, %b_out, %user_defined_val |
67 | 67 | } | } |
68 | 68 |
|
|
69 | 69 | *Sample equivalent C code* | *Sample equivalent C code* |
70 | 70 |
|
|
71 | 71 | { | { |
72 | 72 | /* User-defined code (enclosing scope) */ | /* User-defined code (enclosing scope) */ |
73 | 73 | int a = 3, b = 6; | int a = 3, b = 6; |
74 | 74 | bool keepgoing = true; // Analogous to input cond | bool keepgoing = true; // Analogous to input cond |
75 | 75 | /* End user-defined code */ | /* End user-defined code */ |
76 | 76 |
|
|
77 | 77 | /* Implicitly-defined code */ | /* Implicitly-defined code */ |
78 | 78 | const int max_trip_count = 10; // Analogous to input M | const int max_trip_count = 10; // Analogous to input M |
79 | 79 | int user_defined_vals[]; // Imagine this is resizable | int user_defined_vals[]; // Imagine this is resizable |
80 | 80 | /* End implicitly-defined code */ | /* End implicitly-defined code */ |
81 | 81 | /* initialize loop-carried variables and scan-output variables */ | /* initialize loop-carried variables and scan-output variables */ |
82 | 82 | bool keepgoing_out = keepgoing | bool keepgoing_out = keepgoing |
83 | 83 | int b_out = b | int b_out = b |
84 | 84 |
|
|
85 | 85 | for (int i=0; i < max_trip_count && keepgoing_out; ++i) { | for (int i=0; i < max_trip_count && keepgoing_out; ++i) { |
86 | 86 | /* Implicitly-defined code: bind actual parameter values | /* Implicitly-defined code: bind actual parameter values |
87 | 87 | to formal parameter variables of loop-body */ | to formal parameter variables of loop-body */ |
88 | 88 | bool keepgoing_in = keepgoing_out; | bool keepgoing_in = keepgoing_out; |
89 | 89 | bool b_in = b_out; | bool b_in = b_out; |
90 | 90 |
|
|
91 | 91 | /* User-defined code (loop body) */ | /* User-defined code (loop body) */ |
92 | 92 | int my_local = a + b_in; // Reading value "a" from the enclosing scope is fine | int my_local = a + b_in; // Reading value "a" from the enclosing scope is fine |
93 | 93 | b_out = a - b_in; | b_out = a - b_in; |
94 | 94 | keepgoing_out = my_local > b_out; | keepgoing_out = my_local > b_out; |
95 | 95 | user_defined_val = b_in + b_in; // b_in and b_out are different variables | user_defined_val = b_in + b_in; // b_in and b_out are different variables |
96 | 96 | /* End user-defined code */ | /* End user-defined code */ |
97 | 97 |
|
|
98 | 98 | /* Implicitly defined-code */ | /* Implicitly defined-code */ |
99 | 99 | user_defined_vals[i] = user_defined_val // accumulate scan-output values | user_defined_vals[i] = user_defined_val // accumulate scan-output values |
100 | 100 | } | } |
101 | 101 | // int t = my_local; // Can't do this. my_local is not accessible here. | // int t = my_local; // Can't do this. my_local is not accessible here. |
102 | 102 |
|
|
103 | 103 | // The values below are bound to the output variables of the loop and therefore accessible | // The values below are bound to the output variables of the loop and therefore accessible |
104 | 104 | // b_out; user_defined_vals; keepgoing_out; | // b_out; user_defined_vals; keepgoing_out; |
105 | 105 | } | } |
106 | 106 |
|
|
107 | 107 | There are several things of note in this code snippet: | There are several things of note in this code snippet: |
108 | 108 |
|
|
109 | 109 | 1) Values from the enclosing scope (i.e. variable "a" here) are in scope and can | 1) Values from the enclosing scope (i.e. variable "a" here) are in scope and can |
110 | 110 | be referenced in the inputs of the loop. | be referenced in the inputs of the loop. |
111 | 111 | 2) Any values computed in the loop body that needs to be used in a subsequent | 2) Any values computed in the loop body that needs to be used in a subsequent |
112 | 112 | iteration or after the loop are modelled using a pair of variables in the loop-body, | iteration or after the loop are modelled using a pair of variables in the loop-body, |
113 | 113 | consisting of an input variable (eg., b_in) and an output variable (eg., b_out). | consisting of an input variable (eg., b_in) and an output variable (eg., b_out). |
114 | 114 | These are referred to as loop-carried dependences. The loop operation node | These are referred to as loop-carried dependences. The loop operation node |
115 | 115 | supplies the input value of the input variable for the first iteration, and | supplies the input value of the input variable for the first iteration, and |
116 | 116 | returns the output value of the output variable produced by the final | returns the output value of the output variable produced by the final |
117 | 117 | iteration. | iteration. |
118 | 118 | 3) Scan_output variables are used to implicitly concatenate values computed across | 3) Scan_output variables are used to implicitly concatenate values computed across |
119 | 119 | all the iterations. In the above example, the value of user_defined_val computed | all the iterations. In the above example, the value of user_defined_val computed |
120 | 120 | over all iterations are concatenated and returned as the value of user_defined_vals | over all iterations are concatenated and returned as the value of user_defined_vals |
121 | 121 | after the loop. | after the loop. |
122 | 122 | 4) Values created in the body cannot be accessed in the enclosing scope, | 4) Values created in the body cannot be accessed in the enclosing scope, |
123 | 123 | except using the mechanism described above. | except using the mechanism described above. |
124 | 124 |
|
|
125 | 125 | Note that the semantics of this op support "diagonal" or "wavefront" execution. | Note that the semantics of this op support "diagonal" or "wavefront" execution. |
126 | 126 | (See Step 3 here for an example: | (See Step 3 here for an example: |
127 | 127 | https://devblogs.nvidia.com/optimizing-recurrent-neural-networks-cudnn-5/). | https://devblogs.nvidia.com/optimizing-recurrent-neural-networks-cudnn-5/). |
128 | 128 | Frontends should emit multi-layer RNNs as a series of While operators (with | Frontends should emit multi-layer RNNs as a series of While operators (with |
129 | 129 | time being the inner looping dimension), with each successive layer consuming | time being the inner looping dimension), with each successive layer consuming |
130 | 130 | the scan_outputs from the previous layer, possibly going through several | the scan_outputs from the previous layer, possibly going through several |
131 | 131 | point-wise operators (e.g. dropout, residual connections, linear layer). | point-wise operators (e.g. dropout, residual connections, linear layer). |
132 | 132 |
|
|
133 | The input/output of subgraph (produced by loop node) matching is based on order instead of name. The implementation will figure out the names based on this order. | ||
134 |
| ||
133 | 135 | **Attributes** | **Attributes** |
134 | 136 |
|
|
135 | 137 | * **body** (required): | * **body** (required): |
136 | 138 | The graph run each iteration. It has 2+N inputs: (iteration_num, | The graph run each iteration. It has 2+N inputs: (iteration_num, |
137 | 139 | condition, loop carried dependencies...). It has 1+N+K outputs: | condition, loop carried dependencies...). It has 1+N+K outputs: |
138 | 140 | (condition, loop carried dependencies..., scan_outputs...). Each | (condition, loop carried dependencies..., scan_outputs...). Each |
139 | 141 | scan_output is created by concatenating the value of the specified | scan_output is created by concatenating the value of the specified |
140 | 142 | output value at the end of each iteration of the loop. It is an | output value at the end of each iteration of the loop. It is an |
141 | 143 | error if the dimensions or data type of these scan_outputs change | error if the dimensions or data type of these scan_outputs change |
142 | 144 | across loop iterations. | across loop iterations. |
143 | 145 |
|
|
144 | 146 | **Inputs** | **Inputs** |
145 | 147 |
|
|
146 | 148 | Between 2 and 2147483647 inputs. | Between 2 and 2147483647 inputs. |
147 | 149 |
|
|
148 | 150 | * **M** (optional, heterogeneous) - **I**: | * **M** (optional, heterogeneous) - **I**: |
149 | 151 | A maximum trip-count for the loop specified at runtime. Optional. | A maximum trip-count for the loop specified at runtime. Optional. |
150 | 152 | Pass empty string to skip. | Pass empty string to skip. |
151 | 153 | * **cond** (optional, heterogeneous) - **B**: | * **cond** (optional, heterogeneous) - **B**: |
152 | 154 | A boolean termination condition. Optional. Pass empty string to | A boolean termination condition. Optional. Pass empty string to |
153 | 155 | skip. | skip. |
154 | 156 | * **v_initial** (variadic) - **V**: | * **v_initial** (variadic) - **V**: |
155 | 157 | The initial values of any loop-carried dependencies (values that | The initial values of any loop-carried dependencies (values that |
156 | 158 | change across loop iterations) | change across loop iterations) |
157 | 159 |
|
|
158 | 160 | **Outputs** | **Outputs** |
159 | 161 |
|
|
160 | 162 | Between 1 and 2147483647 outputs. | Between 1 and 2147483647 outputs. |
161 | 163 |
|
|
162 | 164 | * **v_final_and_scan_outputs** (variadic) - **V**: | * **v_final_and_scan_outputs** (variadic) - **V**: |
163 | 165 | Final N loop carried dependency values then K scan_outputs | Final N loop carried dependency values then K scan_outputs. Scan |
| 166 | | outputs must be Tensors. |
164 | 167 |
|
|
165 | 168 | **Type Constraints** | **Type Constraints** |
166 | 169 |
|
|
167 | 170 | * **V** in ( | * **V** in ( |
| 171 | | seq(tensor(bool)), |
| 172 | | seq(tensor(complex128)), |
| 173 | | seq(tensor(complex64)), |
| 174 | | seq(tensor(double)), |
| 175 | | seq(tensor(float)), |
| 176 | | seq(tensor(float16)), |
| 177 | | seq(tensor(int16)), |
| 178 | | seq(tensor(int32)), |
| 179 | | seq(tensor(int64)), |
| 180 | | seq(tensor(int8)), |
| 181 | | seq(tensor(string)), |
| 182 | | seq(tensor(uint16)), |
| 183 | | seq(tensor(uint32)), |
| 184 | | seq(tensor(uint64)), |
| 185 | | seq(tensor(uint8)), |
168 | 186 | tensor(bool), | tensor(bool), |
169 | 187 | tensor(complex128), | tensor(complex128), |
170 | 188 | tensor(complex64), | tensor(complex64), |
171 | 189 | tensor(double), | tensor(double), |
172 | 190 | tensor(float), | tensor(float), |
173 | 191 | tensor(float16), | tensor(float16), |
174 | 192 | tensor(int16), | tensor(int16), |
175 | 193 | tensor(int32), | tensor(int32), |
176 | 194 | tensor(int64), | tensor(int64), |
177 | 195 | tensor(int8), | tensor(int8), |
178 | 196 | tensor(string), | tensor(string), |
179 | 197 | tensor(uint16), | tensor(uint16), |
180 | 198 | tensor(uint32), | tensor(uint32), |
181 | 199 | tensor(uint64), | tensor(uint64), |
182 | 200 | tensor(uint8) | tensor(uint8) |
183 | 201 | ): | ): |
184 | 202 | All Tensor types | All Tensor and Sequence types |
185 | 203 | * **I** in ( | * **I** in ( |
186 | 204 | tensor(int64) | tensor(int64) |
187 | 205 | ): | ): |
188 | 206 | tensor of int64, which should be a scalar. | tensor of int64, which should be a scalar. |
189 | 207 | * **B** in ( | * **B** in ( |
190 | 208 | tensor(bool) | tensor(bool) |
191 | 209 | ): | ): |
192 | 210 | tensor of bool, which should be a scalar. | tensor of bool, which should be a scalar. |
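The rows above capture the substance of the opset-16 revision relative to the previous version: the V type constraint now also admits sequences of tensors (so a loop-carried dependency may be a sequence, while scan outputs must still be tensors), and subgraph inputs/outputs are matched by position rather than by name. As a rough, unofficial sketch of what the relaxed constraint enables, the following builds a Loop whose single loop-carried dependency is a sequence, using onnx.helper; all graph and value names are illustrative, and it assumes a recent onnx package that provides helper.make_tensor_sequence_value_info.

# Hedged sketch (not from the spec): a Loop carrying a sequence of tensors,
# which the relaxed V constraint permits from opset 16.
from onnx import TensorProto, helper

body = helper.make_graph(
    nodes=[
        helper.make_node("Identity", ["cond_in"], ["cond_out"]),
        helper.make_node("Cast", ["iter_num"], ["elem"], to=TensorProto.FLOAT),
        # Append this iteration's value to the carried sequence.
        helper.make_node("SequenceInsert", ["seq_in", "elem"], ["seq_out"]),
    ],
    name="seq_loop_body",
    inputs=[
        helper.make_tensor_value_info("iter_num", TensorProto.INT64, []),
        helper.make_tensor_value_info("cond_in", TensorProto.BOOL, []),
        helper.make_tensor_sequence_value_info("seq_in", TensorProto.FLOAT, None),
    ],
    outputs=[
        helper.make_tensor_value_info("cond_out", TensorProto.BOOL, []),
        helper.make_tensor_sequence_value_info("seq_out", TensorProto.FLOAT, None),
    ],
)

# The sequence is carried as a loop-carried dependency rather than a scan
# output, since scan outputs must still be tensors.
loop_node = helper.make_node(
    "Loop", inputs=["M", "cond", "seq_init"], outputs=["seq_final"], body=body)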
Loop - 11#
Version
name: Loop (GitHub)
domain: main
since_version: 11
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 11.
Summary
Generic Looping construct. This loop has multiple termination conditions:
Trip count. Iteration count specified at runtime. Set by specifying the input M. Optional. Set to empty string to omit. Note that a static trip count (specified at graph construction time) can be specified by passing in a constant node for input M.
Loop termination condition. This is an input to the op that determines whether to run the first iteration and also a loop-carried dependency for the body graph. The body graph must yield a value for the condition variable, whether this input is provided or not.
This table summarizes the operating modes of this operator with equivalent C-style code:
Operator inputs defined as (max_trip_count, condition_var).
- input (“”, “”):
- for (int i=0; ; ++i) {
cond = … // Note this value is ignored, but is required in the body
}
- input (“”, cond) // Note this is analogous to a while loop
bool cond = …; for (int i=0; cond; ++i) {
cond = …;
}
- input (“”, 1) // Note this is analogous to a do-while loop
bool cond = true for (int i=0; cond; ++i) {
cond = …;
}
- input (trip_count, “”) // Note this is analogous to a for loop
int trip_count = … for (int i=0; i < trip_count; ++i) {
cond = …; // ignored
}
- input (trip_count, cond)
int trip_count = …; bool cond = …; for (int i=0; i < trip_count && cond; ++i) {
cond = …;
}
Sample usage - cond as well as trip count
graph predict-net {
    %a = Constant[value = <Scalar Tensor [3]>]()
    %b = Constant[value = <Scalar Tensor [6]>]()
    %keepgoing = Constant[value = <Scalar Tensor [1]>]()
    %max_trip_count = Constant[value = <Scalar Tensor [10]>]()
    %keepgoing_out, %b_out, %user_defined_vals = Loop[body = <graph body-net>](%max_trip_count, %keepgoing, %b)
    return
}

graph body-net (
    %i[INT32, scalar]           // iteration number
    %keepgoing_in[BOOL, scalar] // incoming loop-termination-condition; not used
    %b_in[INT32, scalar]        // incoming value of loop-carried-dependency b
) {
    %my_local = Add(%a, %b_in)
    %b_out = Sub(%a, %b_in)                     // outgoing value of loop-carried-dependency b
    %keepgoing_out = Greater(%my_local, %b_out) // outgoing loop-termination-condition
    %user_defined_val = Add(%b_in, %b_in)       // scan-output value to be accumulated
    return %keepgoing_out, %b_out, %user_defined_val
}
Sample equivalent C code
{
    /* User-defined code (enclosing scope) */
    int a = 3, b = 6;
    bool keepgoing = true; // Analogous to input cond
    /* End user-defined code */

    /* Implicitly-defined code */
    const int max_trip_count = 10; // Analogous to input M
    int user_defined_vals[]; // Imagine this is resizable
    /* End implicitly-defined code */
    /* initialize loop-carried variables and scan-output variables */
    bool keepgoing_out = keepgoing
    int b_out = b

    for (int i=0; i < max_trip_count && keepgoing_out; ++i) {
        /* Implicitly-defined code: bind actual parameter values
           to formal parameter variables of loop-body */
        bool keepgoing_in = keepgoing_out;
        bool b_in = b_out;

        /* User-defined code (loop body) */
        int my_local = a + b_in; // Reading value "a" from the enclosing scope is fine
        b_out = a - b_in;
        keepgoing_out = my_local > b_out;
        user_defined_val = b_in + b_in; // b_in and b_out are different variables
        /* End user-defined code */

        /* Implicitly defined-code */
        user_defined_vals[i] = user_defined_val // accumulate scan-output values
    }
    // int t = my_local; // Can't do this. my_local is not accessible here.

    // The values below are bound to the output variables of the loop and therefore accessible
    // b_out; user_defined_vals; keepgoing_out;
}
There are several things of note in this code snippet:
Values from the enclosing scope (i.e. variable “a” here) are in scope and can be referenced in the inputs of the loop.
Any values computed in the loop body that need to be used in a subsequent iteration or after the loop are modelled using a pair of variables in the loop body, consisting of an input variable (e.g., b_in) and an output variable (e.g., b_out). These are referred to as loop-carried dependencies. The loop operation node supplies the input value of the input variable for the first iteration, and returns the output value of the output variable produced by the final iteration.
Scan_output variables are used to implicitly concatenate values computed across all the iterations. In the above example, the values of user_defined_val computed over all iterations are concatenated and returned as the value of user_defined_vals after the loop.
Values created in the body cannot be accessed in the enclosing scope, except using the mechanism described above.
Note that the semantics of this op support “diagonal” or “wavefront” execution. (See Step 3 here for an example: https://devblogs.nvidia.com/optimizing-recurrent-neural-networks-cudnn-5/). Frontends should emit multi-layer RNNs as a series of While operators (with time being the inner looping dimension), with each successive layer consuming the scan_outputs from the previous layer, possibly going through several point-wise operators (e.g. dropout, residual connections, linear layer).
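To make the loop-carried dependency and scan-output mechanics concrete, here is a small plain-Python trace of the sample graphs above. It is an illustration only, not part of the specification; it mirrors the body-net arithmetic to show how b and user_defined_vals evolve.

# Plain-Python trace of the sample (illustrative, not normative).
a, b = 3, 6
keepgoing, max_trip_count = True, 10

keepgoing_out, b_out = keepgoing, b
user_defined_vals = []                      # scan output, one entry per executed iteration
for i in range(max_trip_count):
    if not keepgoing_out:
        break
    b_in = b_out                            # bind the loop-carried dependency
    my_local = a + b_in                     # Add(%a, %b_in)
    b_out = a - b_in                        # Sub(%a, %b_in)
    keepgoing_out = my_local > b_out        # Greater(%my_local, %b_out)
    user_defined_vals.append(b_in + b_in)   # scan-output value for this iteration

print(b_out, user_defined_vals)             # -> 6 [12, -6]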
Attributes
body (required): The graph run each iteration. It has 2+N inputs: (iteration_num, condition, loop carried dependencies…). It has 1+N+K outputs: (condition, loop carried dependencies…, scan_outputs…). Each scan_output is created by concatenating the value of the specified output value at the end of each iteration of the loop. It is an error if the dimensions or data type of these scan_outputs change across loop iterations.
Inputs
Between 2 and 2147483647 inputs.
M (optional, heterogeneous) - I: A maximum trip-count for the loop specified at runtime. Optional. Pass empty string to skip.
cond (optional, heterogeneous) - B: A boolean termination condition. Optional. Pass empty string to skip.
v_initial (variadic) - V: The initial values of any loop-carried dependencies (values that change across loop iterations)
Outputs
Between 1 and 2147483647 outputs.
v_final_and_scan_outputs (variadic) - V: Final N loop carried dependency values then K scan_outputs
Type Constraints
V in ( tensor(bool), tensor(complex128), tensor(complex64), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): All Tensor types
I in ( tensor(int64) ): tensor of int64, which should be a scalar.
B in ( tensor(bool) ): tensor of bool, which should be a scalar.
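To connect the attribute, input, and output descriptions above, the following is a hedged onnx.helper sketch (not taken from the operator definition) that builds the sample body graph and its Loop node. Names mirror the sample; %a is assumed to be produced in the enclosing graph and is resolved from the outer scope. Note that the Loop node itself yields only v_final_and_scan_outputs, i.e. the final value of b followed by the concatenated scan output (here N=1, K=1); the condition is an output of the body, not of the node.

# Hedged sketch: the sample body graph and Loop node built with onnx.helper.
from onnx import TensorProto, helper

body = helper.make_graph(
    nodes=[
        helper.make_node("Add", ["a", "b_in"], ["my_local"]),
        helper.make_node("Sub", ["a", "b_in"], ["b_out"]),
        helper.make_node("Greater", ["my_local", "b_out"], ["keepgoing_out"]),
        helper.make_node("Add", ["b_in", "b_in"], ["user_defined_val"]),
    ],
    name="body-net",
    # 2+N inputs: iteration number, incoming condition, N=1 loop-carried dependency
    inputs=[
        helper.make_tensor_value_info("i", TensorProto.INT64, []),
        helper.make_tensor_value_info("keepgoing_in", TensorProto.BOOL, []),
        helper.make_tensor_value_info("b_in", TensorProto.INT32, []),
    ],
    # 1+N+K outputs: outgoing condition, N=1 loop-carried dependency, K=1 scan output
    outputs=[
        helper.make_tensor_value_info("keepgoing_out", TensorProto.BOOL, []),
        helper.make_tensor_value_info("b_out", TensorProto.INT32, []),
        helper.make_tensor_value_info("user_defined_val", TensorProto.INT32, []),
    ],
)

# The node returns v_final_and_scan_outputs: final b, then the concatenated scan output.
loop_node = helper.make_node(
    "Loop",
    inputs=["max_trip_count", "keepgoing", "b"],
    outputs=["b_final", "user_defined_vals"],
    body=body,
)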
Differences
0 | 0 | Generic Looping construct. This loop has multiple termination conditions: | Generic Looping construct. This loop has multiple termination conditions: |
1 | 1 |
|
|
2 | 2 | 1) Trip count. Iteration count specified at runtime. Set by | 1) Trip count. Iteration count specified at runtime. Set by |
3 | 3 | specifying the input M. Optional. Set to empty string to omit. | specifying the input M. Optional. Set to empty string to omit. |
4 | 4 | Note that a static trip count (specified at graph construction time) can be | Note that a static trip count (specified at graph construction time) can be |
5 | 5 | specified by passing in a constant node for input M. | specified by passing in a constant node for input M. |
6 | 6 | 2) Loop termination condition. This is an input to the op that determines | 2) Loop termination condition. This is an input to the op that determines |
7 | 7 | whether to run the first iteration and also a loop-carried dependency for | whether to run the first iteration and also a loop-carried dependency for |
8 | 8 | the body graph. The body graph must yield a value for the condition variable, | the body graph. The body graph must yield a value for the condition variable, |
9 | 9 | whether this input is provided or not. | whether this input is provided or not. |
10 | 10 |
|
|
11 | 11 | This table summarizes the operating modes of this operator with equivalent | This table summarizes the operating modes of this operator with equivalent |
12 | 12 | C-style code: | C-style code: |
13 | 13 |
|
|
14 | 14 | Operator inputs defined as (max_trip_count, condition_var). | Operator inputs defined as (max_trip_count, condition_var). |
15 | 15 |
|
|
16 | 16 | input ("", ""): | input ("", ""): |
17 | 17 | for (int i=0; ; ++i) { | for (int i=0; ; ++i) { |
18 | 18 | cond = ... // Note this value is ignored, but is required in the body | cond = ... // Note this value is ignored, but is required in the body |
19 | 19 | } | } |
20 | 20 |
|
|
21 | 21 | input ("", cond) // Note this is analogous to a while loop | input ("", cond) // Note this is analogous to a while loop |
22 | 22 | bool cond = ...; | bool cond = ...; |
23 | 23 | for (int i=0; cond; ++i) { | for (int i=0; cond; ++i) { |
24 | 24 | cond = ...; | cond = ...; |
25 | 25 | } | } |
26 | 26 |
|
|
27 | 27 | input ("", 1) // Note this is analogous to a do-while loop | input ("", 1) // Note this is analogous to a do-while loop |
28 | 28 | bool cond = true | bool cond = true |
29 | 29 | for (int i=0; cond; ++i) { | for (int i=0; cond; ++i) { |
30 | 30 | cond = ...; | cond = ...; |
31 | 31 | } | } |
32 | 32 |
|
|
33 | 33 | input (trip_count, "") // Note this is analogous to a for loop | input (trip_count, "") // Note this is analogous to a for loop |
34 | 34 | int trip_count = ... | int trip_count = ... |
35 | 35 | for (int i=0; i < trip_count; ++i) { | for (int i=0; i < trip_count; ++i) { |
36 | 36 | cond = ...; // ignored | cond = ...; // ignored |
37 | 37 | } | } |
38 | 38 |
|
|
39 | 39 | input (trip_count, cond) | input (trip_count, cond) |
40 | 40 | int trip_count = ...; | int trip_count = ...; |
41 | 41 | bool cond = ...; | bool cond = ...; |
42 | 42 | for (int i=0; i < trip_count && cond; ++i) { | for (int i=0; i < trip_count && cond; ++i) { |
43 | 43 | cond = ...; | cond = ...; |
44 | 44 | } | } |
45 | 45 |
|
|
46 | 46 | *Sample usage - cond as well as trip count* | *Sample usage - cond as well as trip count* |
47 | 47 |
|
|
48 | 48 | graph predict-net { | graph predict-net { |
49 | 49 | %a = Constant[value = <Scalar Tensor [3]>]() | %a = Constant[value = <Scalar Tensor [3]>]() |
50 | 50 | %b = Constant[value = <Scalar Tensor [6]>]() | %b = Constant[value = <Scalar Tensor [6]>]() |
51 | 51 | %keepgoing = Constant[value = <Scalar Tensor [1]>]() | %keepgoing = Constant[value = <Scalar Tensor [1]>]() |
52 | 52 | %max_trip_count = Constant[value = <Scalar Tensor [10]>]() | %max_trip_count = Constant[value = <Scalar Tensor [10]>]() |
53 | 53 | %keepgoing_out, %b_out, %user_defined_vals = Loop[body = <graph body-net>](%max_trip_count, %keepgoing, %b) | %keepgoing_out, %b_out, %user_defined_vals = Loop[body = <graph body-net>](%max_trip_count, %keepgoing, %b) |
54 | 54 | return | return |
55 | 55 | } | } |
56 | 56 |
|
|
57 | 57 | graph body-net ( | graph body-net ( |
58 | 58 | %i[INT32, scalar] | %i[INT32, scalar] // iteration number |
59 | 59 | %keepgoing[BOOL, scalar] | %keepgoing_in[BOOL, scalar] // incoming loop-termination-condition; not used |
60 | 60 | %b[INT32, scalar] | %b_in[INT32, scalar] // incoming value of loop-carried-dependency b |
61 | 61 | ) { | ) { |
62 | 62 | %my_local = Add(%a, %b) | %my_local = Add(%a, %b_in) |
63 | 63 | %b_out = Sub(%a, %b) | %b_out = Sub(%a, %b_in) // outgoing value of loop-carried-dependency b |
64 | 64 | %keepgoing_out = Greater(%my_local, %b_out) | %keepgoing_out = Greater(%my_local, %b_out) // outgoing loop-termination-condition |
65 | 65 | %user_defined_vals = Add(%b, %b) | %user_defined_val = Add(%b_in, %b_in) // scan-output value to be accumulated |
66 | 66 | return %keepgoing_out, %b_out, %user_defined_vals | return %keepgoing_out, %b_out, %user_defined_val |
67 | 67 | } | } |
68 | 68 |
|
|
69 | 69 | *Sample equivalent C code* | *Sample equivalent C code* |
70 | 70 |
|
|
71 | 71 | { | { |
72 | 72 | /* User-defined code (enclosing scope) */ | /* User-defined code (enclosing scope) */ |
73 | 73 | int a = 3, b = 6; | int a = 3, b = 6; |
74 | 74 | bool keepgoing = true; // Analogous to input cond | bool keepgoing = true; // Analogous to input cond |
75 | 75 | /* End user-defined code */ | /* End user-defined code */ |
76 | 76 |
|
|
77 | 77 | /* Implicitly-defined code */ | /* Implicitly-defined code */ |
78 | 78 | const int max_trip_count = 10; // Analogous to input M | const int max_trip_count = 10; // Analogous to input M |
79 | 79 | int user_defined_vals[]; // Imagine this is resizable | int user_defined_vals[]; // Imagine this is resizable |
80 | 80 | /* End implicitly-defined code */ | /* End implicitly-defined code */ |
| 81 | | /* initialize loop-carried variables and scan-output variables */ |
| 82 | | bool keepgoing_out = keepgoing |
| 83 | | int b_out = b |
| 84 | | |
81 | 85 | for (int i=0; i < max_trip_count && keepgoing; ++i) { | for (int i=0; i < max_trip_count && keepgoing_out; ++i) { |
| 86 | | /* Implicitly-defined code: bind actual parameter values |
| 87 | | to formal parameter variables of loop-body */ |
| 88 | | bool keepgoing_in = keepgoing_out; |
| 89 | | bool b_in = b_out; |
| 90 | | |
82 | 91 | /* User-defined code (loop body) */ | /* User-defined code (loop body) */ |
83 | 92 | int my_local = a + b; // Reading values in the enclosing scope is fine | int my_local = a + b_in; // Reading value "a" from the enclosing scope is fine |
| 93 | | b_out = a - b_in; |
84 | 94 | b = a - b; // writes fine if we specify b as a loop-carried dependency | keepgoing_out = my_local > b_out; |
| 95 | | user_defined_val = b_in + b_in; // b_in and b_out are different variables |
| 96 | | /* End user-defined code */ |
| 97 | | |
85 | 98 | keepgoing = my_local > b; // keepgoing is a loop-carried dependency | /* Implicitly defined-code */ |
86 | 99 | user_defined_vals[i] = b + b; | user_defined_vals[i] = user_defined_val // accumulate scan-output values |
| 100 | | } |
87 | 101 | /* End user-defined code */ | // int t = my_local; // Can't do this. my_local is not accessible here. |
88 | | } | |
89 | | // my_local = 123; // Can't do this. my_local was defined in the the body | |
90 | 102 | | |
91 | 103 | // These below values are live-out from the loop and therefore accessible | // The values below are bound to the output variables of the loop and therefore accessible |
92 | 104 | b_out; user_defined_vals; keepgoing_out; | // b_out; user_defined_vals; keepgoing_out; |
93 | 105 | } | } |
94 | 106 |
|
|
95 | 107 | There are several things of note in this code snippet: | There are several things of note in this code snippet: |
96 | 108 |
|
|
97 | 109 | 1) Values from the enclosing scope (i.e. variable a here) are in scope and can | 1) Values from the enclosing scope (i.e. variable "a" here) are in scope and can |
98 | 110 | be referenced in the inputs of the loop. | be referenced in the inputs of the loop. |
| 111 | | 2) Any values computed in the loop body that needs to be used in a subsequent |
| 112 | | iteration or after the loop are modelled using a pair of variables in the loop-body, |
| 113 | | consisting of an input variable (eg., b_in) and an output variable (eg., b_out). |
| 114 | | These are referred to as loop-carried dependences. The loop operation node |
| 115 | | supplies the input value of the input variable for the first iteration, and |
| 116 | | returns the output value of the output variable produced by the final |
99 | 117 | 2) Any variables which you wish to make available in the enclosing scope (i.e. | iteration. |
100 | | the variables b and keepgoing) must be declared as either loop-carried | |
101 | | dependencies (both at the op inputs and output and at the body net input and | |
102 | 118 | output) or scan_outputs. | 3) Scan_output variables are used to implicitly concatenate values computed across |
| 119 | | all the iterations. In the above example, the value of user_defined_val computed |
| 120 | | over all iterations are concatenated and returned as the value of user_defined_vals |
| 121 | | after the loop. |
103 | 122 | 3) Values created in the body cannot be accessed in the enclosing scope. | 4) Values created in the body cannot be accessed in the enclosing scope, |
| 123 | | except using the mechanism described above. |
104 | 124 |
|
|
105 | 125 | Note that the semantics of this op support "diagonal" or "wavefront" execution. | Note that the semantics of this op support "diagonal" or "wavefront" execution. |
106 | 126 | (See Step 3 here for an example: | (See Step 3 here for an example: |
107 | 127 | https://devblogs.nvidia.com/optimizing-recurrent-neural-networks-cudnn-5/). | https://devblogs.nvidia.com/optimizing-recurrent-neural-networks-cudnn-5/). |
108 | 128 | Frontends should emit multi-layer RNNs as a series of While operators (with | Frontends should emit multi-layer RNNs as a series of While operators (with |
109 | 129 | time being the inner looping dimension), with each successive layer consuming | time being the inner looping dimension), with each successive layer consuming |
110 | 130 | the scan_outputs from the previous layer, possibly going through several | the scan_outputs from the previous layer, possibly going through several |
111 | 131 | point-wise operators (e.g. dropout, residual connections, linear layer). | point-wise operators (e.g. dropout, residual connections, linear layer). |
112 | 132 |
|
|
113 | 133 | **Attributes** | **Attributes** |
114 | 134 |
|
|
115 | 135 | * **body** (required): | * **body** (required): |
116 | 136 | The graph run each iteration. It has 2+N inputs: (iteration_num, | The graph run each iteration. It has 2+N inputs: (iteration_num, |
117 | 137 | condition, loop carried dependencies...). It has 1+N+K outputs: | condition, loop carried dependencies...). It has 1+N+K outputs: |
118 | 138 | (condition, loop carried dependencies..., scan_outputs...). Each | (condition, loop carried dependencies..., scan_outputs...). Each |
119 | 139 | scan_output is created by concatenating the value of the specified | scan_output is created by concatenating the value of the specified |
120 | 140 | output value at the end of each iteration of the loop. It is an | output value at the end of each iteration of the loop. It is an |
121 | 141 | error if the dimensions or data type of these scan_outputs change | error if the dimensions or data type of these scan_outputs change |
122 | 142 | across loop iterations. | across loop iterations. |
123 | 143 |
|
|
124 | 144 | **Inputs** | **Inputs** |
125 | 145 |
|
|
126 | 146 | Between 3 and 2147483647 inputs. | Between 2 and 2147483647 inputs. |
127 | 147 |
|
|
128 | 148 | * **M** (optional, heterogeneous) - **I**: | * **M** (optional, heterogeneous) - **I**: |
129 | 149 | A maximum trip-count for the loop specified at runtime. Optional. | A maximum trip-count for the loop specified at runtime. Optional. |
130 | 150 | Pass empty string to skip. | Pass empty string to skip. |
131 | 151 | * **cond** (optional, heterogeneous) - **B**: | * **cond** (optional, heterogeneous) - **B**: |
132 | 152 | A boolean termination condition. Optional. Pass empty string to | A boolean termination condition. Optional. Pass empty string to |
133 | 153 | skip. | skip. |
134 | 154 | * **v_initial** (variadic) - **V**: | * **v_initial** (variadic) - **V**: |
135 | 155 | The initial values of any loop-carried dependencies (values that | The initial values of any loop-carried dependencies (values that |
136 | 156 | change across loop iterations) | change across loop iterations) |
137 | 157 |
|
|
138 | 158 | **Outputs** | **Outputs** |
139 | 159 |
|
|
140 | 160 | Between 1 and 2147483647 outputs. | Between 1 and 2147483647 outputs. |
141 | 161 |
|
|
142 | 162 | * **v_final_and_scan_outputs** (variadic) - **V**: | * **v_final_and_scan_outputs** (variadic) - **V**: |
143 | 163 | Final N loop carried dependency values then K scan_outputs | Final N loop carried dependency values then K scan_outputs |
144 | 164 |
|
|
145 | 165 | **Type Constraints** | **Type Constraints** |
146 | 166 |
|
|
147 | 167 | * **V** in ( | * **V** in ( |
148 | 168 | tensor(bool), | tensor(bool), |
149 | 169 | tensor(complex128), | tensor(complex128), |
150 | 170 | tensor(complex64), | tensor(complex64), |
151 | 171 | tensor(double), | tensor(double), |
152 | 172 | tensor(float), | tensor(float), |
153 | 173 | tensor(float16), | tensor(float16), |
154 | 174 | tensor(int16), | tensor(int16), |
155 | 175 | tensor(int32), | tensor(int32), |
156 | 176 | tensor(int64), | tensor(int64), |
157 | 177 | tensor(int8), | tensor(int8), |
158 | 178 | tensor(string), | tensor(string), |
159 | 179 | tensor(uint16), | tensor(uint16), |
160 | 180 | tensor(uint32), | tensor(uint32), |
161 | 181 | tensor(uint64), | tensor(uint64), |
162 | 182 | tensor(uint8) | tensor(uint8) |
163 | 183 | ): | ): |
164 | 184 | All Tensor types | All Tensor types |
165 | 185 | * **I** in ( | * **I** in ( |
166 | 186 | tensor(int64) | tensor(int64) |
167 | 187 | ): | ): |
168 | 188 | tensor of int64, which should be a scalar. | tensor of int64, which should be a scalar. |
169 | 189 | * **B** in ( | * **B** in ( |
170 | 190 | tensor(bool) | tensor(bool) |
171 | 191 | ): | ): |
172 | 192 | tensor of bool, which should be a scalar. | tensor of bool, which should be a scalar. |
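One practical consequence of the change shown above from "Between 3" to "Between 2" inputs is that, from this version onward, M and cond can each be omitted; by ONNX convention an omitted optional input is written as an empty string in the node's input list. A brief, hedged sketch follows; it assumes `body` is a GraphProto built as in the earlier sketch after the Loop-11 type constraints, with one loop-carried dependency and one scan output.

# Hedged sketch: omitting optional Loop inputs by passing the empty string.
from onnx import helper

# while-style: condition only, trip count omitted -- the table's input ("", cond) case
while_loop = helper.make_node(
    "Loop", inputs=["", "keepgoing", "b"], outputs=["b_final_w", "scan_w"], body=body)

# for-style: trip count only, condition omitted -- the table's input (trip_count, "") case
for_loop = helper.make_node(
    "Loop", inputs=["max_trip_count", "", "b"], outputs=["b_final_f", "scan_f"], body=body)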
Loop - 1#
Version
name: Loop (GitHub)
domain: main
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1.
Summary
Generic Looping construct. This loop has multiple termination conditions:
Trip count. Iteration count specified at runtime. Set by specifying the input M. Optional. Set to empty string to omit. Note that a static trip count (specified at graph construction time) can be specified by passing in a constant node for input M.
Loop termination condition. This is an input to the op that determines whether to run the first iteration and also a loop-carried dependency for the body graph. The body graph must yield a value for the condition variable, whether this input is provided or not.
This table summarizes the operating modes of this operator with equivalent C-style code:
Operator inputs defined as (max_trip_count, condition_var).
- input (“”, “”):
- for (int i=0; ; ++i) {
cond = … // Note this value is ignored, but is required in the body
}
- input (“”, cond) // Note this is analogous to a while loop
bool cond = …; for (int i=0; cond; ++i) {
cond = …;
}
- input (“”, 1) // Note this is analogous to a do-while loop
bool cond = true for (int i=0; cond; ++i) {
cond = …;
}
- input (trip_count, “”) // Note this is analogous to a for loop
int trip_count = … for (int i=0; i < trip_count; ++i) {
cond = …; // ignored
}
- input (trip_count, cond)
int trip_count = …; bool cond = …; for (int i=0; i < trip_count && cond; ++i) {
cond = …;
}
Sample usage - cond as well as trip count
graph predict-net {
    %a = Constant[value = <Scalar Tensor [3]>]()
    %b = Constant[value = <Scalar Tensor [6]>]()
    %keepgoing = Constant[value = <Scalar Tensor [1]>]()
    %max_trip_count = Constant[value = <Scalar Tensor [10]>]()
    %keepgoing_out, %b_out, %user_defined_vals = Loop[body = <graph body-net>](%max_trip_count, %keepgoing, %b)
    return
}

graph body-net (
    %i[INT32, scalar]
    %keepgoing[BOOL, scalar]
    %b[INT32, scalar]
) {
    %my_local = Add(%a, %b)
    %b_out = Sub(%a, %b)
    %keepgoing_out = Greater(%my_local, %b_out)
    %user_defined_vals = Add(%b, %b)
    return %keepgoing_out, %b_out, %user_defined_vals
}
Sample equivalent C code
{
    /* User-defined code (enclosing scope) */
    int a = 3, b = 6;
    bool keepgoing = true; // Analogous to input cond
    /* End user-defined code */

    /* Implicitly-defined code */
    const int max_trip_count = 10; // Analogous to input M
    int user_defined_vals[]; // Imagine this is resizable
    /* End implicitly-defined code */
    for (int i=0; i < max_trip_count && keepgoing; ++i) {
        /* User-defined code (loop body) */
        int my_local = a + b; // Reading values in the enclosing scope is fine
        b = a - b; // writes fine if we specify b as a loop-carried dependency
        keepgoing = my_local > b; // keepgoing is a loop-carried dependency
        user_defined_vals[i] = b + b;
        /* End user-defined code */
    }
    // my_local = 123; // Can't do this. my_local was defined in the body

    // These below values are live-out from the loop and therefore accessible
    b_out; user_defined_vals; keepgoing_out;
}
There are several things of note in this code snippet:
Values from the enclosing scope (i.e. variable a here) are in scope and can be referenced in the inputs of the loop.
Any variables which you wish to make available in the enclosing scope (i.e. the variables b and keepgoing) must be declared as either loop-carried dependencies (both at the op inputs and output and at the body net input and output) or scan_outputs.
Values created in the body cannot be accessed in the enclosing scope.
Note that the semantics of this op support “diagonal” or “wavefront” execution. (See Step 3 here for an example: https://devblogs.nvidia.com/optimizing-recurrent-neural-networks-cudnn-5/). Frontends should emit multi-layer RNNs as a series of While operators (with time being the inner looping dimension), with each successive layer consuming the scan_outputs from the previous layer, possibly going through several point-wise operators (e.g. dropout, residual connections, linear layer).
Attributes
body (required): The graph run each iteration. It has 2+N inputs: (iteration_num, condition, loop carried dependencies…). It has 1+N+K outputs: (condition, loop carried dependencies…, scan_outputs…). Each scan_output is created by concatenating the value of the specified output value at the end of each iteration of the loop. It is an error if the dimensions or data type of these scan_outputs change across loop iterations.
Inputs
Between 3 and 2147483647 inputs.
M (optional, heterogeneous) - I: A maximum trip-count for the loop specified at runtime. Optional. Pass empty string to skip.
cond (optional, heterogeneous) - B: A boolean termination condition. Optional. Pass empty string to skip.
v_initial (variadic) - V: The initial values of any loop-carried dependencies (values that change across loop iterations)
Outputs
Between 1 and 2147483647 outputs.
v_final_and_scan_outputs (variadic) - V: Final N loop carried dependency values then K scan_outputs
Type Constraints
V in ( tensor(bool), tensor(complex128), tensor(complex64), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): All Tensor types
I in ( tensor(int64) ): tensor of int64, which should be a scalar.
B in ( tensor(bool) ): tensor of bool, which should be a scalar.
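As a closing aid, the termination modes from the summary table (trip count and/or condition, each optional) can be expressed as a short, unofficial Python reference. run_loop and its body_fn callback are illustrative names only, not part of ONNX or any runtime.

# Hedged reference of the Loop termination logic (illustrative, not normative).
def run_loop(body, v_initial, M=None, cond=None):
    """body(i, cond_in, *v_in) -> (cond_out, *v_out, *scan_values).

    Returns the final loop-carried values followed by one list per scan output.
    """
    carried = list(v_initial)
    scan_acc = None
    use_cond = cond is not None          # omitted cond => body's condition is ignored
    keep = cond if use_cond else True
    i = 0
    while (M is None or i < M) and keep:
        cond_out, *rest = body(i, keep, *carried)
        if use_cond:
            keep = cond_out
        carried = rest[:len(v_initial)]
        step = rest[len(v_initial):]
        scan_acc = [[s] for s in step] if scan_acc is None else [acc + [s] for acc, s in zip(scan_acc, step)]
        i += 1
    return carried + (scan_acc or [])

# The numeric sample from the summaries: a = 3, v_initial = [b] = [6].
body_fn = lambda i, kg, b: (3 + b > 3 - b, 3 - b, b + b)
print(run_loop(body_fn, [6], M=10, cond=True))   # -> [6, [12, -6]]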