BatchNormalization

BatchNormalization - 15
BatchNormalization - 14
BatchNormalization - 9
BatchNormalization - 7
BatchNormalization - 6
BatchNormalization - 1

BatchNormalization - 15

Version

name: BatchNormalization (GitHub)
domain: main
since_version: 15
function: False
support_level: SupportType.COMMON
shape inference: True

This version of the operator has been available since version 15.

Summary

Carries out batch normalization as described in the paper https://arxiv.org/abs/1502.03167. There are five required inputs: ‘X’, ‘scale’, ‘B’, ‘input_mean’ and ‘input_var’. Note that ‘input_mean’ and ‘input_var’ are expected to be the estimated statistics in inference mode (training_mode=False, default) and the running statistics in training mode (training_mode=True). Depending on the mode in which the operator is run, there are multiple cases for the number of outputs, which we list below:

Output case #1: Y, running_mean, running_var (training_mode=True)
Output case #2: Y (training_mode=False)

When training_mode=False, extra outputs are invalid. The outputs are updated as follows when training_mode=True:

running_mean = input_mean * momentum + current_mean * (1 - momentum)
running_var = input_var * momentum + current_var * (1 - momentum)

Y = (X - current_mean) / sqrt(current_var + epsilon) * scale + B

where:

current_mean = ReduceMean(X, axis=all_except_channel_index)
current_var = ReduceVar(X, axis=all_except_channel_index)

Notice that ReduceVar refers to the population variance, which equals
sum(sqrd(x_i - x_avg)) / N
where N is the population size (this formula does not use the sample size N - 1).
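The difference between population and sample variance can be checked with Python's standard `statistics` module: `pvariance` divides by N, as ReduceVar does here, while `variance` divides by N - 1. The channel values below are made up for illustration.

```python
import statistics

# Hypothetical values of one channel, gathered over the batch and spatial axes.
channel_values = [1.0, 2.0, 3.0, 4.0]

# Population variance: sum((x_i - x_avg)^2) / N, the definition ReduceVar uses.
population_var = statistics.pvariance(channel_values)

# Sample variance divides by N - 1 instead; the operator does NOT use this.
sample_var = statistics.variance(channel_values)

# The population variance written out explicitly.
x_avg = sum(channel_values) / len(channel_values)
manual_var = sum((x - x_avg) ** 2 for x in channel_values) / len(channel_values)

print(population_var, sample_var, manual_var)
```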

The computation of ReduceMean and ReduceVar uses float to avoid overflow for float16 inputs.
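To see why the wider accumulator matters, the sketch below emulates float16 rounding with Python's `struct` module (format code `e`) and compares a mean whose running sum is rounded to float16 after every addition against one whose sum is kept in a wider type. This is an illustration under stated assumptions: Python's double-precision float stands in for the operator's float accumulator, the helper names are hypothetical, and the inputs are made up. Float16 cannot represent finite values above 65504.

```python
import math
import struct


def to_float16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value; out-of-range values become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack finite values beyond the float16 range (max 65504).
        return math.copysign(math.inf, x)


def mean_with_float16_accumulator(values):
    """Mean where every partial sum is rounded back to float16."""
    acc = 0.0
    for v in values:
        acc = to_float16(acc + v)
    return to_float16(acc / len(values))


def mean_with_wide_accumulator(values):
    """Mean where the sum stays in a wider type and only the result is rounded to float16."""
    return to_float16(sum(values) / len(values))


# 1000 float16 inputs, each exactly 200.0 (chosen for illustration).
inputs = [to_float16(200.0)] * 1000
print(mean_with_float16_accumulator(inputs))  # partial sums exceed 65504, so the result is inf
print(mean_with_wide_accumulator(inputs))
```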

When training_mode=False:

Y = (X - input_mean) / sqrt(input_var + epsilon) * scale + B
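Both modes can be sketched in plain Python as follows. This is an illustrative sketch, not the ONNX reference implementation: the function name `batch_norm` and the nested-list layout `x[n][c][s]` (with the spatial dimensions D1 … Dn flattened into one index `s`) are assumptions made for the example, and `epsilon` and `momentum` default to 1e-5 and 0.9 as Python doubles rather than the float32 attribute values.

```python
import math


def batch_norm(x, scale, bias, input_mean, input_var,
               epsilon=1e-5, momentum=0.9, training_mode=False):
    """Sketch of BatchNormalization on nested lists.

    x is indexed as x[n][c][s]: batch index n, channel c, and s running over the
    flattened spatial positions D1 * ... * Dn. scale, bias, input_mean and
    input_var are per-channel lists of length C.
    Returns Y in inference mode, or (Y, running_mean, running_var) in training mode.
    """
    num_channels = len(scale)
    if training_mode:
        # Per-channel statistics over the batch and spatial axes (population variance).
        mean, var = [], []
        for c in range(num_channels):
            vals = [v for sample in x for v in sample[c]]
            m = sum(vals) / len(vals)
            mean.append(m)
            var.append(sum((v - m) ** 2 for v in vals) / len(vals))
    else:
        # Inference mode normalizes with the supplied estimated statistics.
        mean, var = input_mean, input_var

    y = [
        [
            [(v - mean[c]) / math.sqrt(var[c] + epsilon) * scale[c] + bias[c]
             for v in sample[c]]
            for c in range(num_channels)
        ]
        for sample in x
    ]
    if not training_mode:
        return y

    # Blend the incoming running statistics with the current batch statistics.
    running_mean = [im * momentum + cm * (1 - momentum)
                    for im, cm in zip(input_mean, mean)]
    running_var = [iv * momentum + cv * (1 - momentum)
                   for iv, cv in zip(input_var, var)]
    return y, running_mean, running_var
```

With a single channel holding 1.0, 2.0, 3.0 and 4.0, training mode normalizes with the batch mean 2.5 and population variance 1.25, and moves the running statistics 10% of the way from input_mean and input_var toward those values.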

For previous (deprecated) non-spatial cases, implementors are advised to flatten the input shape to (N x C * D1 * D2 * … * Dn) before a BatchNormalization Op.

This operator has optional inputs/outputs. See ONNX for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument’s name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.
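The flattening suggested for the deprecated non-spatial case turns every (C, D1, …, Dn) position into its own channel, so the operator then computes statistics per activation over the batch axis; scale, B, input_mean and input_var consequently need length C * D1 * … * Dn. A minimal sketch of the shape computation, using a hypothetical helper name:

```python
import math


def flatten_for_non_spatial(shape):
    """Return the (N, C * D1 * ... * Dn) shape suggested for the deprecated non-spatial case.

    shape is the original input shape (N, C, D1, ..., Dn).
    """
    if len(shape) < 2:
        raise ValueError("expected at least (N, C)")
    n, *rest = shape
    return (n, math.prod(rest))


print(flatten_for_non_spatial((8, 3, 32, 32)))
```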

Attributes

epsilon: The epsilon value to use to avoid division by zero. Default value is 9.999999747378752e-06.
momentum: Factor used in computing the running mean and variance, e.g., running_mean = running_mean * momentum + mean * (1 - momentum). Default value is 0.8999999761581421.
training_mode: If set to true, it indicates BatchNormalization is being used for training, and the optional outputs running_mean and running_var are populated. Default value is 0.
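The default values look unusual because ONNX stores float attributes as 32-bit floats: 9.999999747378752e-06 and 0.8999999761581421 are 1e-5 and 0.9 rounded to float32 and printed as doubles. The sketch below checks this with the standard `struct` module and applies the momentum update over a few hypothetical batch means; `as_float32` and `update_running_stat` are illustrative helper names, not ONNX API.

```python
import struct


def as_float32(value: float) -> float:
    """Round a Python float to IEEE 754 single precision and back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def update_running_stat(previous: float, current: float, momentum: float) -> float:
    """running = previous * momentum + current * (1 - momentum)."""
    return previous * momentum + current * (1 - momentum)


# The documented defaults are the float32 values of 1e-5 and 0.9.
print(as_float32(1e-5))  # 9.999999747378752e-06
print(as_float32(0.9))   # 0.8999999761581421

# Hypothetical sequence of per-batch means for one channel.
running_mean = 0.0
for batch_mean in [1.0, 1.0, 1.0]:
    running_mean = update_running_stat(running_mean, batch_mean, momentum=0.9)
print(running_mean)
```

A larger momentum keeps more of the previous running value. PyTorch's BatchNorm, for comparison, uses the opposite convention, where its default momentum of 0.1 weights the new batch statistic.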

Inputs

X (heterogeneous) - T: Input data tensor from the previous operator; dimensions are in the form of (N x C x D1 x D2 … Dn), where N is the batch size and C is the number of channels. Statistics are computed for every channel of C over the N and D1 to Dn dimensions. For image data, input dimensions become (N x C x H x W). The op also accepts a single-dimension input of size N, in which case C is assumed to be 1.
scale (heterogeneous) - T1: Scale tensor of shape (C).
B (heterogeneous) - T1: Bias tensor of shape (C).
input_mean (heterogeneous) - T2: Running (training) or estimated (testing) mean tensor of shape (C).
input_var (heterogeneous) - T2: Running (training) or estimated (testing) variance tensor of shape (C).
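The all_except_channel_index axes used in the formulas can be written out for an input of any rank. The helper below is hypothetical (it is not part of the ONNX API) and also covers the rank-1 case, where C is assumed to be 1 and every element belongs to the single channel.

```python
def stats_axes(rank: int) -> tuple:
    """Axes of X reduced when computing per-channel statistics.

    For (N, C, D1, ..., Dn) inputs this is every axis except the channel axis 1.
    A rank-1 input of size N has a single implicit channel, so its only axis is reduced.
    """
    if rank < 1:
        raise ValueError("input must have at least one dimension")
    if rank == 1:
        return (0,)
    return (0,) + tuple(range(2, rank))


print(stats_axes(4))  # image input (N x C x H x W)
```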

Outputs

Between 1 and 3 outputs.

Y (heterogeneous) - T: The output tensor, with the same shape as X.
running_mean (optional, heterogeneous) - T2: The running mean after the BatchNormalization operator.
running_var (optional, heterogeneous) - T2: The running variance after the BatchNormalization operator. This op uses the population size (N) for calculating variance, and not the sample size N-1.

Type Constraints

T in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
T1 in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16) ): Constrain scale and bias types to float tensors.
T2 in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16) ): Constrain mean and variance types to float tensors.


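Differences

Changes from version 14 (left) to version 15 (right). Each row lists the old line number, the new line number, the old text and the new text; rows with a single line number are lines that were only added or only removed.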

`0`	`0`	`Carries out batch normalization as described in the paper`	`Carries out batch normalization as described in the paper`
`1`	`1`	`https://arxiv.org/abs/1502.03167. Depending on the mode it is being run,`	`https://arxiv.org/abs/1502.03167. Depending on the mode it is being run,`
`2`	`2`	`There are five required inputs 'X', 'scale', 'B', 'input_mean' and`	`There are five required inputs 'X', 'scale', 'B', 'input_mean' and`
`3`	`3`	`'input_var'.`	`'input_var'.`
`4`	`4`	`Note that 'input_mean' and 'input_var' are expected to be the estimated`	`Note that 'input_mean' and 'input_var' are expected to be the estimated`
`5`	`5`	`statistics in inference mode (training_mode=False, default),`	`statistics in inference mode (training_mode=False, default),`
`6`	`6`	`and the running statistics in training mode (training_mode=True).`	`and the running statistics in training mode (training_mode=True).`
`7`	`7`	`There are multiple cases for the number of outputs, which we list below:`	`There are multiple cases for the number of outputs, which we list below:`
`8`	`8`
`9`	`9`	`Output case #1: Y, running_mean, running_var (training_mode=True)`	`Output case #1: Y, running_mean, running_var (training_mode=True)`
`10`	`10`	`Output case #2: Y (training_mode=False)`	`Output case #2: Y (training_mode=False)`
`11`	`11`
`12`	`12`	`When training_mode=False, extra outputs are invalid.`	`When training_mode=False, extra outputs are invalid.`
`13`	`13`	`The outputs are updated as follows when training_mode=True:`	`The outputs are updated as follows when training_mode=True:`
`14`	`14`	`::`	`::`
`15`	`15`
`16`	`16`	`running_mean = input_mean * momentum + current_mean * (1 - momentum)`	`running_mean = input_mean * momentum + current_mean * (1 - momentum)`
`17`	`17`	`running_var = input_var * momentum + current_var * (1 - momentum)`	`running_var = input_var * momentum + current_var * (1 - momentum)`
`18`	`18`
`19`	`19`	`Y = (X - current_mean) / sqrt(current_var + epsilon) * scale + B`	`Y = (X - current_mean) / sqrt(current_var + epsilon) * scale + B`
`20`	`20`
`21`	`21`	`where:`	`where:`
`22`	`22`
`23`	`23`	`current_mean = ReduceMean(X, axis=all_except_channel_index)`	`current_mean = ReduceMean(X, axis=all_except_channel_index)`
`24`	`24`	`current_var = ReduceVar(X, axis=all_except_channel_index)`	`current_var = ReduceVar(X, axis=all_except_channel_index)`
`25`	`25`
`26`	`26`	`Notice that ReduceVar refers to the population variance, and it equals to`	`Notice that ReduceVar refers to the population variance, and it equals to`
`27`	`27`	`sum(sqrd(x_i - x_avg)) / N`	`sum(sqrd(x_i - x_avg)) / N`
`28`	`28`	`where N is the population size (this formula does not use sample size N - 1).`	`where N is the population size (this formula does not use sample size N - 1).`
`29`	`29`
	`30`		`The computation of ReduceMean and ReduceVar uses float to avoid overflow for float16 inputs.`
	`31`
`30`	`32`	`When training_mode=False:`	`When training_mode=False:`
`31`	`33`	`::`	`::`
`32`	`34`
`33`	`35`	`Y = (X - input_mean) / sqrt(input_var + epsilon) * scale + B`	`Y = (X - input_mean) / sqrt(input_var + epsilon) * scale + B`
`34`	`36`
`35`	`37`	`For previous (depreciated) non-spatial cases, implementors are suggested`	`For previous (depreciated) non-spatial cases, implementors are suggested`
`36`	`38`	`to flatten the input shape to (N x C * D1 * D2 * ... * Dn) before a BatchNormalization Op.`	`to flatten the input shape to (N x C * D1 * D2 * ... * Dn) before a BatchNormalization Op.`
`37`	`39`	`This operator has optional inputs/outputs. See ONNX _ for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument's name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.`	`This operator has optional inputs/outputs. See ONNX _ for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument's name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.`
`38`	`40`
`39`	`41`	`Attributes`	`Attributes`
`40`	`42`
`41`	`43`	`* epsilon:`	`* epsilon:`
`42`	`44`	`The epsilon value to use to avoid division by zero. Default value is 9.999999747378752e-06.`	`The epsilon value to use to avoid division by zero. Default value is 9.999999747378752e-06.`
`43`	`45`	`* momentum:`	`* momentum:`
`44`	`46`	`Factor used in computing the running mean and variance.e.g.,`	`Factor used in computing the running mean and variance.e.g.,`
`45`	`47`	`running_mean = running_mean * momentum + mean * (1 - momentum). Default value is 0.8999999761581421.`	`running_mean = running_mean * momentum + mean * (1 - momentum). Default value is 0.8999999761581421.`
`46`	`48`	`* training_mode:`	`* training_mode:`
`47`	`49`	`If set to true, it indicates BatchNormalization is being used for`	`If set to true, it indicates BatchNormalization is being used for`
`48`	`50`	`training, and outputs 1, 2, 3, and 4 would be populated. Default value is 0.`	`training, and outputs 1, 2, 3, and 4 would be populated. Default value is 0.`
`49`	`51`
`50`	`52`	`Inputs`	`Inputs`
`51`	`53`
`52`	`54`	`* X (heterogeneous) - T:`	`* X (heterogeneous) - T:`
`53`	`55`	`Input data tensor from the previous operator; dimensions are in the`	`Input data tensor from the previous operator; dimensions are in the`
`54`	`56`	`form of (N x C x D1 x D2 ... Dn), where N is the batch size, C is`	`form of (N x C x D1 x D2 ... Dn), where N is the batch size, C is`
`55`	`57`	`the number of channels. Statistics are computed for every channel of`	`the number of channels. Statistics are computed for every channel of`
`56`	`58`	`C over N and D1 to Dn dimensions. For image data, input dimensions`	`C over N and D1 to Dn dimensions. For image data, input dimensions`
`57`	`59`	`become (N x C x H x W). The op also accepts single dimension input`	`become (N x C x H x W). The op also accepts single dimension input`
`58`	`60`	`of size N in which case C is assumed to be 1`	`of size N in which case C is assumed to be 1`
`59`	`61`	`* scale (heterogeneous) - T:`	`* scale (heterogeneous) - T1:`
`60`	`62`	`Scale tensor of shape (C).`	`Scale tensor of shape (C).`
`61`	`63`	`* B (heterogeneous) - T:`	`* B (heterogeneous) - T1:`
`62`	`64`	`Bias tensor of shape (C).`	`Bias tensor of shape (C).`
`63`	`65`	`* input_mean (heterogeneous) - U:`	`* input_mean (heterogeneous) - T2:`
`64`	`66`	`running (training) or estimated (testing) mean tensor of shape (C).`	`running (training) or estimated (testing) mean tensor of shape (C).`
`65`	`67`	`* input_var (heterogeneous) - U:`	`* input_var (heterogeneous) - T2:`
`66`	`68`	`running (training) or estimated (testing) variance tensor of shape`	`running (training) or estimated (testing) variance tensor of shape`
`67`	`69`	`(C).`	`(C).`
`68`	`70`
`69`	`71`	`Outputs`	`Outputs`
`70`	`72`
`71`	`73`	`Between 1 and 3 outputs.`	`Between 1 and 3 outputs.`
`72`	`74`
`73`	`75`	`* Y (heterogeneous) - T:`	`* Y (heterogeneous) - T:`
`74`	`76`	`The output tensor of the same shape as X`	`The output tensor of the same shape as X`
`75`	`77`	`* running_mean (optional, heterogeneous) - U:`	`* running_mean (optional, heterogeneous) - T2:`
`76`	`78`	`The running mean after the BatchNormalization operator.`	`The running mean after the BatchNormalization operator.`
`77`	`79`	`* running_var (optional, heterogeneous) - U:`	`* running_var (optional, heterogeneous) - T2:`
`78`	`80`	`The running variance after the BatchNormalization operator. This op`	`The running variance after the BatchNormalization operator. This op`
`79`	`81`	`uses the population size (N) for calculating variance, and not the`	`uses the population size (N) for calculating variance, and not the`
`80`	`82`	`sample size N-1.`	`sample size N-1.`
`81`	`83`
`82`	`84`	`Type Constraints`	`Type Constraints`
`83`	`85`
`84`	`86`	`* T in (`	`* T in (`
`85`	`87`	`tensor(bfloat16),`	`tensor(bfloat16),`
`86`	`88`	`tensor(double),`	`tensor(double),`
`87`	`89`	`tensor(float),`	`tensor(float),`
`88`	`90`	`tensor(float16)`	`tensor(float16)`
`89`	`91`	`):`	`):`
`90`	`92`	`Constrain input and output types to float tensors.`	`Constrain input and output types to float tensors.`
	`93`		`* T1 in (`
	`94`		`tensor(bfloat16),`
	`95`		`tensor(double),`
	`96`		`tensor(float),`
	`97`		`tensor(float16)`
	`98`		`):`
`91`	`99`	`* U in (`	`Constrain scale and bias types to float tensors.`
	`100`		`* T2 in (`
`92`	`101`	`tensor(bfloat16),`	`tensor(bfloat16),`
`93`	`102`	`tensor(double),`	`tensor(double),`
`94`	`103`	`tensor(float),`	`tensor(float),`
`95`	`104`	`tensor(float16)`	`tensor(float16)`
`96`	`105`	`):`	`):`
`97`	`106`	`Constrain mean and variance types to float tensors. It allows all`	`Constrain mean and variance types to float tensors.`
`98`		`float type for U.`

BatchNormalization - 14

Version

name: BatchNormalization (GitHub)
domain: main
since_version: 14
function: False
support_level: SupportType.COMMON
shape inference: True

This version of the operator has been available since version 14.

Summary

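Carries out batch normalization as described in the paper https://arxiv.org/abs/1502.03167. There are five required inputs: ‘X’, ‘scale’, ‘B’, ‘input_mean’ and ‘input_var’. Note that ‘input_mean’ and ‘input_var’ are expected to be the estimated statistics in inference mode (training_mode=False, default) and the running statistics in training mode (training_mode=True). Depending on the mode in which the operator is run, there are multiple cases for the number of outputs, which we list below:

Output case #1: Y, running_mean, running_var (training_mode=True)
Output case #2: Y (training_mode=False)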

When training_mode=False, extra outputs are invalid. The outputs are updated as follows when training_mode=True:

running_mean = input_mean * momentum + current_mean * (1 - momentum)
running_var = input_var * momentum + current_var * (1 - momentum)

Y = (X - current_mean) / sqrt(current_var + epsilon) * scale + B

where:

current_mean = ReduceMean(X, axis=all_except_channel_index)
current_var = ReduceVar(X, axis=all_except_channel_index)

Notice that ReduceVar refers to the population variance, which equals
sum(sqrd(x_i - x_avg)) / N
where N is the population size (this formula does not use the sample size N - 1).

When training_mode=False:

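Y = (X - input_mean) / sqrt(input_var + epsilon) * scale + B

For previous (deprecated) non-spatial cases, implementors are advised to flatten the input shape to (N x C * D1 * D2 * … * Dn) before a BatchNormalization Op.

This operator has optional inputs/outputs. See ONNX for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument’s name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.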

Attributes

epsilon: The epsilon value to use to avoid division by zero. Default value is 9.999999747378752e-06.
momentum: Factor used in computing the running mean and variance, e.g., running_mean = running_mean * momentum + mean * (1 - momentum). Default value is 0.8999999761581421.
training_mode: If set to true, it indicates BatchNormalization is being used for training, and the optional outputs running_mean and running_var are populated. Default value is 0.

Inputs

X (heterogeneous) - T: Input data tensor from the previous operator; dimensions are in the form of (N x C x D1 x D2 … Dn), where N is the batch size and C is the number of channels. Statistics are computed for every channel of C over the N and D1 to Dn dimensions. For image data, input dimensions become (N x C x H x W). The op also accepts a single-dimension input of size N, in which case C is assumed to be 1.
scale (heterogeneous) - T: Scale tensor of shape (C).
B (heterogeneous) - T: Bias tensor of shape (C).
input_mean (heterogeneous) - U: Running (training) or estimated (testing) mean tensor of shape (C).
input_var (heterogeneous) - U: Running (training) or estimated (testing) variance tensor of shape (C).

Outputs

Between 1 and 3 outputs.

Y (heterogeneous) - T: The output tensor, with the same shape as X.
running_mean (optional, heterogeneous) - U: The running mean after the BatchNormalization operator.
running_var (optional, heterogeneous) - U: The running variance after the BatchNormalization operator. This op uses the population size (N) for calculating variance, and not the sample size N-1.

Type Constraints

T in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
U in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16) ): Constrain mean and variance types to float tensors. It allows all float types for U.

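Differences

Changes from version 9 (left) to version 14 (right). Each row lists the old line number, the new line number, the old text and the new text; rows with a single line number are lines that were only added or only removed.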

`0`	`0`	`Carries out batch normalization as described in the paper`	`Carries out batch normalization as described in the paper`
`1`	`1`	`https://arxiv.org/abs/1502.03167. Depending on the mode it is being run,`	`https://arxiv.org/abs/1502.03167. Depending on the mode it is being run,`
	`2`		`There are five required inputs 'X', 'scale', 'B', 'input_mean' and`
	`3`		`'input_var'.`
	`4`		`Note that 'input_mean' and 'input_var' are expected to be the estimated`
	`5`		`statistics in inference mode (training_mode=False, default),`
	`6`		`and the running statistics in training mode (training_mode=True).`
`2`	`7`	`there are multiple cases for the number of outputs, which we list below:`	`There are multiple cases for the number of outputs, which we list below:`
`3`	`8`
	`9`		`Output case #1: Y, running_mean, running_var (training_mode=True)`
	`10`		`Output case #2: Y (training_mode=False)`
	`11`
	`12`		`When training_mode=False, extra outputs are invalid.`
	`13`		`The outputs are updated as follows when training_mode=True:`
	`14`		`::`
	`15`
	`16`		`running_mean = input_mean * momentum + current_mean * (1 - momentum)`
	`17`		`running_var = input_var * momentum + current_var * (1 - momentum)`
	`18`
	`19`		`Y = (X - current_mean) / sqrt(current_var + epsilon) * scale + B`
	`20`
	`21`		`where:`
	`22`
	`23`		`current_mean = ReduceMean(X, axis=all_except_channel_index)`
	`24`		`current_var = ReduceVar(X, axis=all_except_channel_index)`
	`25`
	`26`		`Notice that ReduceVar refers to the population variance, and it equals to`
`4`	`27`	`Output case #1: Y, mean, var, saved_mean, saved_var (training mode)`	`sum(sqrd(x_i - x_avg)) / N`
`5`	`28`	`Output case #2: Y (test mode)`	`where N is the population size (this formula does not use sample size N - 1).`
`6`	`29`
	`30`		`When training_mode=False:`
	`31`		`::`
	`32`
	`33`		`Y = (X - input_mean) / sqrt(input_var + epsilon) * scale + B`
	`34`
`7`	`35`	`For previous (depreciated) non-spatial cases, implementors are suggested`	`For previous (depreciated) non-spatial cases, implementors are suggested`
`8`	`36`	`to flatten the input shape to (N x CD1D2 ..*Dn) before a BatchNormalization Op.`	`to flatten the input shape to (N x C * D1 * D2 * ... * Dn) before a BatchNormalization Op.`
`9`	`37`	`This operator has optional inputs/outputs. See ONNX _ for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument's name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.`	`This operator has optional inputs/outputs. See ONNX _ for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument's name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.`
`10`	`38`
`11`	`39`	`Attributes`	`Attributes`
`12`	`40`
`13`	`41`	`* epsilon:`	`* epsilon:`
`14`	`42`	`The epsilon value to use to avoid division by zero. Default value is 9.999999747378752e-06.`	`The epsilon value to use to avoid division by zero. Default value is 9.999999747378752e-06.`
`15`	`43`	`* momentum:`	`* momentum:`
`16`	`44`	`Factor used in computing the running mean and variance.e.g.,`	`Factor used in computing the running mean and variance.e.g.,`
`17`	`45`	`running_mean = running_mean * momentum + mean * (1 - momentum). Default value is 0.8999999761581421.`	`running_mean = running_mean * momentum + mean * (1 - momentum). Default value is 0.8999999761581421.`
	`46`		`* training_mode:`
	`47`		`If set to true, it indicates BatchNormalization is being used for`
	`48`		`training, and outputs 1, 2, 3, and 4 would be populated. Default value is 0.`
`18`	`49`
`19`	`50`	`Inputs`	`Inputs`
`20`	`51`
`21`	`52`	`* X (heterogeneous) - T:`	`* X (heterogeneous) - T:`
`22`	`53`	`Input data tensor from the previous operator; dimensions are in the`	`Input data tensor from the previous operator; dimensions are in the`
`23`	`54`	`form of (N x C x D1 x D2 ... Dn), where N is the batch size, C is`	`form of (N x C x D1 x D2 ... Dn), where N is the batch size, C is`
`24`	`55`	`the number of channels. Statistics are computed for every channel of`	`the number of channels. Statistics are computed for every channel of`
`25`	`56`	`C over N and D1 to Dn dimensions. For image data, input dimensions`	`C over N and D1 to Dn dimensions. For image data, input dimensions`
`26`	`57`	`become (N x C x H x W). The op also accepts single dimension input`	`become (N x C x H x W). The op also accepts single dimension input`
`27`	`58`	`of size N in which case C is assumed to be 1`	`of size N in which case C is assumed to be 1`
`28`	`59`	`* scale (heterogeneous) - T:`	`* scale (heterogeneous) - T:`
`29`	`60`	`Scale tensor of shape (C).`	`Scale tensor of shape (C).`
`30`	`61`	`* B (heterogeneous) - T:`	`* B (heterogeneous) - T:`
`31`	`62`	`Bias tensor of shape (C).`	`Bias tensor of shape (C).`
`32`	`63`	`* mean (heterogeneous) - T:`	`* input_mean (heterogeneous) - U:`
`33`	`64`	`running (training) or estimated (testing) mean tensor of shape (C).`	`running (training) or estimated (testing) mean tensor of shape (C).`
`34`	`65`	`* var (heterogeneous) - T:`	`* input_var (heterogeneous) - U:`
`35`	`66`	`running (training) or estimated (testing) variance tensor of shape`	`running (training) or estimated (testing) variance tensor of shape`
`36`	`67`	`(C).`	`(C).`
`37`	`68`
`38`	`69`	`Outputs`	`Outputs`
`39`	`70`
`40`	`71`	`Between 1 and 5 outputs.`	`Between 1 and 3 outputs.`
`41`	`72`
`42`	`73`	`* Y (heterogeneous) - T:`	`* Y (heterogeneous) - T:`
`43`	`74`	`The output tensor of the same shape as X`	`The output tensor of the same shape as X`
`44`	`75`	`* mean (optional, heterogeneous) - T:`	`* running_mean (optional, heterogeneous) - U:`
`45`	`76`	`The running mean after the BatchNormalization operator.`	`The running mean after the BatchNormalization operator.`
`46`	`77`	`* var (optional, heterogeneous) - T:`	`* running_var (optional, heterogeneous) - U:`
`47`	`78`	`The running variance after the BatchNormalization operator.`	`The running variance after the BatchNormalization operator. This op`
	`79`		`uses the population size (N) for calculating variance, and not the`
`48`	`80`	`* saved_mean (optional, heterogeneous) - T:`	`sample size N-1.`
	`81`
`49`	`82`	`Saved mean used during training to speed up gradient computation.`	`Type Constraints`
	`83`
	`84`		`* T in (`
`50`	`85`	`* saved_var (optional, heterogeneous) - T:`	`tensor(bfloat16),`
`51`	`86`	`Saved variance used during training to speed up gradient`	`tensor(double),`
`52`		`computation.`
`53`
	`87`		`tensor(float),`
	`88`		`tensor(float16)`
	`89`		`):`
`54`	`90`	`Type Constraints`	`Constrain input and output types to float tensors.`
`55`
`56`	`91`	`* T in (`	`* U in (`
	`92`		`tensor(bfloat16),`
`57`	`93`	`tensor(double),`	`tensor(double),`
`58`	`94`	`tensor(float),`	`tensor(float),`
`59`	`95`	`tensor(float16)`	`tensor(float16)`
`60`	`96`	`):`	`):`
`61`	`97`	`Constrain input and output types to float tensors.`	`Constrain mean and variance types to float tensors. It allows all`
	`98`		`float type for U.`

BatchNormalization - 9

Version

name: BatchNormalization (GitHub)
domain: main
since_version: 9
function: False
support_level: SupportType.COMMON
shape inference: True

This version of the operator has been available since version 9.

Summary

Carries out batch normalization as described in the paper https://arxiv.org/abs/1502.03167. Depending on the mode in which the operator is run, there are multiple cases for the number of outputs, which we list below:

Output case #1: Y, mean, var, saved_mean, saved_var (training mode)
Output case #2: Y (test mode)

For previous (deprecated) non-spatial cases, implementors are advised to flatten the input shape to (N x C * D1 * D2 * … * Dn) before a BatchNormalization Op.

This operator has optional inputs/outputs. See ONNX for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument’s name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.

Attributes

epsilon: The epsilon value to use to avoid division by zero. Default value is 9.999999747378752e-06.
momentum: Factor used in computing the running mean and variance, e.g., running_mean = running_mean * momentum + mean * (1 - momentum). Default value is 0.8999999761581421.

Inputs

X (heterogeneous) - T: Input data tensor from the previous operator; dimensions are in the form of (N x C x D1 x D2 … Dn), where N is the batch size and C is the number of channels. Statistics are computed for every channel of C over the N and D1 to Dn dimensions. For image data, input dimensions become (N x C x H x W). The op also accepts a single-dimension input of size N, in which case C is assumed to be 1.
scale (heterogeneous) - T: Scale tensor of shape (C).
B (heterogeneous) - T: Bias tensor of shape (C).
mean (heterogeneous) - T: Running (training) or estimated (testing) mean tensor of shape (C).
var (heterogeneous) - T: Running (training) or estimated (testing) variance tensor of shape (C).

Outputs

Between 1 and 5 outputs.

Y (heterogeneous) - T: The output tensor, with the same shape as X.
mean (optional, heterogeneous) - T: The running mean after the BatchNormalization operator.
var (optional, heterogeneous) - T: The running variance after the BatchNormalization operator.
saved_mean (optional, heterogeneous) - T: Saved mean used during training to speed up gradient computation.
saved_var (optional, heterogeneous) - T: Saved variance used during training to speed up gradient computation.

Type Constraints

T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.

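Differences

Changes from version 7 (left) to version 9 (right). Each row lists the old line number, the new line number, the old text and the new text; rows with a single line number are lines that were only added or only removed.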

`0`	`0`	`Carries out batch normalization as described in the paper`	`Carries out batch normalization as described in the paper`
`1`	`1`	`https://arxiv.org/abs/1502.03167. Depending on the mode it is being run,`	`https://arxiv.org/abs/1502.03167. Depending on the mode it is being run,`
`2`	`2`	`there are multiple cases for the number of outputs, which we list below:`	`there are multiple cases for the number of outputs, which we list below:`
`3`	`3`
`4`	`4`	`Output case #1: Y, mean, var, saved_mean, saved_var (training mode)`	`Output case #1: Y, mean, var, saved_mean, saved_var (training mode)`
`5`	`5`	`Output case #2: Y (test mode)`	`Output case #2: Y (test mode)`
	`6`
	`7`		`For previous (depreciated) non-spatial cases, implementors are suggested`
	`8`		`to flatten the input shape to (N x CD1D2 ..*Dn) before a BatchNormalization Op.`
`6`	`9`	`This operator has optional inputs/outputs. See ONNX _ for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument's name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.`	`This operator has optional inputs/outputs. See ONNX _ for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument's name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.`
`7`	`10`
`8`	`11`	`Attributes`	`Attributes`
`9`	`12`
`10`	`13`	`* epsilon:`	`* epsilon:`
`11`	`14`	`The epsilon value to use to avoid division by zero. Default value is 9.999999747378752e-06.`	`The epsilon value to use to avoid division by zero. Default value is 9.999999747378752e-06.`
`12`	`15`	`* momentum:`	`* momentum:`
`13`	`16`	`Factor used in computing the running mean and variance.e.g.,`	`Factor used in computing the running mean and variance.e.g.,`
`14`	`17`	`running_mean = running_mean * momentum + mean * (1 - momentum). Default value is 0.8999999761581421.`	`running_mean = running_mean * momentum + mean * (1 - momentum). Default value is 0.8999999761581421.`
`15`		`* spatial:`
`16`		`If true, compute the mean and variance across per activation. If`
`17`		`false, compute the mean and variance across per feature over each`
`18`		`mini-batch. Default value is 1.`
`19`	`18`
`20`	`19`	`Inputs`	`Inputs`
`21`	`20`
`22`	`21`	`* X (heterogeneous) - T:`	`* X (heterogeneous) - T:`
`23`	`22`	`Input data tensor from the previous operator; dimensions for image`	`Input data tensor from the previous operator; dimensions are in the`
`24`	`23`	`case are (N x C x H x W), where N is the batch size, C is the number`	`form of (N x C x D1 x D2 ... Dn), where N is the batch size, C is`
`25`		`of channels, and H and W are the height and the width of the data.`
`26`		`For non image case, the dimensions are in the form of (N x C x D1 x`
`27`		`D2 ... Dn), where N is the batch size.`
	`24`		`the number of channels. Statistics are computed for every channel of`
	`25`		`C over N and D1 to Dn dimensions. For image data, input dimensions`
	`26`		`become (N x C x H x W). The op also accepts single dimension input`
	`27`		`of size N in which case C is assumed to be 1`
`28`	`28`	`* scale (heterogeneous) - T:`	`* scale (heterogeneous) - T:`
`29`	`29`	`If spatial is true, the dimension of scale is (C). If spatial is`	`Scale tensor of shape (C).`
`30`		`false, the dimensions of scale are (C x D1 x ... x Dn)`
`31`	`30`	`* B (heterogeneous) - T:`	`* B (heterogeneous) - T:`
`32`	`31`	`If spatial is true, the dimension of bias is (C). If spatial is`	`Bias tensor of shape (C).`
`33`		`false, the dimensions of bias are (C x D1 x ... x Dn)`
`34`	`32`	`* mean (heterogeneous) - T:`	`* mean (heterogeneous) - T:`
`35`		`If spatial is true, the dimension of the running mean (training) or`
`36`		`the estimated mean (testing) is (C). If spatial is false, the`
`37`	`33`	`dimensions of the running mean (training) or the estimated mean`	`running (training) or estimated (testing) mean tensor of shape (C).`
`38`		`(testing) are (C x D1 x ... x Dn).`
`39`	`34`	`* var (heterogeneous) - T:`	`* var (heterogeneous) - T:`
`40`		`If spatial is true, the dimension of the running variance(training)`
`41`	`35`	`or the estimated variance (testing) is (C). If spatial is false, the`	`running (training) or estimated (testing) variance tensor of shape`
`42`		`dimensions of the running variance(training) or the estimated`
`43`		`variance (testing) are (C x D1 x ... x Dn).`
	`36`		`(C).`
`44`	`37`
`45`	`38`	`Outputs`	`Outputs`
`46`	`39`
`47`	`40`	`Between 1 and 5 outputs.`	`Between 1 and 5 outputs.`
`48`	`41`
`49`	`42`	`* Y (heterogeneous) - T:`	`* Y (heterogeneous) - T:`
`50`	`43`	`The output tensor of the same shape as X`	`The output tensor of the same shape as X`
`51`	`44`	`* mean (optional, heterogeneous) - T:`	`* mean (optional, heterogeneous) - T:`
`52`	`45`	`The running mean after the BatchNormalization operator.`	`The running mean after the BatchNormalization operator.`
`53`	`46`	`* var (optional, heterogeneous) - T:`	`* var (optional, heterogeneous) - T:`
`54`	`47`	`The running variance after the BatchNormalization operator.`	`The running variance after the BatchNormalization operator.`
`55`	`48`	`* saved_mean (optional, heterogeneous) - T:`	`* saved_mean (optional, heterogeneous) - T:`
`56`	`49`	`Saved mean used during training to speed up gradient computation.`	`Saved mean used during training to speed up gradient computation.`
`57`	`50`	`* saved_var (optional, heterogeneous) - T:`	`* saved_var (optional, heterogeneous) - T:`
`58`	`51`	`Saved variance used during training to speed up gradient`	`Saved variance used during training to speed up gradient`
`59`	`52`	`computation.`	`computation.`
`60`	`53`
`61`	`54`	`Type Constraints`	`Type Constraints`
`62`	`55`
`63`	`56`	`* T in (`	`* T in (`
`64`	`57`	`tensor(double),`	`tensor(double),`
`65`	`58`	`tensor(float),`	`tensor(float),`
`66`	`59`	`tensor(float16)`	`tensor(float16)`
`67`	`60`	`):`	`):`
`68`	`61`	`Constrain input and output types to float tensors.`	`Constrain input and output types to float tensors.`

BatchNormalization - 7 #

Version

name: BatchNormalization (GitHub)
domain: main
since_version: 7
function: False
support_level: SupportType.COMMON
shape inference: True

This version of the operator has been available since version 7.

Summary

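Carries out batch normalization as described in the paper https://arxiv.org/abs/1502.03167. Depending on the mode in which it is run, there are multiple cases for the number of outputs, which we list below:

Output case #1: Y, mean, var, saved_mean, saved_var (training mode)
Output case #2: Y (test mode)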

This operator has optional inputs/outputs. See ONNX for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument’s name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.

Attributes

epsilon: The epsilon value used to avoid division by zero. Default value is 9.999999747378752e-06.
momentum: Factor used in computing the running mean and variance, e.g., running_mean = running_mean * momentum + mean * (1 - momentum). Default value is 0.8999999761581421.
spatial: If true, compute the mean and variance per activation. If false, compute the mean and variance per feature over each mini-batch. Default value is 1.
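The momentum update is an exponential moving average: each step keeps a fraction `momentum` of the old running value. The sketch below (plain Python; the helper name `running_stat_after` is hypothetical, and 0.9 stands in for the float32 default 0.8999999761581421) applies the update repeatedly and compares it with the closed form for a constant batch statistic `m`, namely `m + momentum**k * (initial - m)`.

```python
def running_stat_after(steps, initial, batch_stat, momentum=0.9):
    """Apply running = running * momentum + batch_stat * (1 - momentum) `steps` times.

    Hypothetical helper for illustration; batch_stat is held constant.
    """
    running = initial
    for _ in range(steps):
        running = running * momentum + batch_stat * (1 - momentum)
    return running


k, r0, m = 10, 0.0, 1.0
print(running_stat_after(k, r0, m))  # approximately 0.6513
print(m + 0.9 ** k * (r0 - m))       # closed form, same value up to rounding
```

With momentum 0.9, the initial value still carries a weight of 0.9**10 (about 0.35) after ten updates, so the running statistics adapt slowly to new batches.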

Inputs

X (heterogeneous) - T: Input data tensor from the previous operator; dimensions for the image case are (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data. For the non-image case, the dimensions are in the form of (N x C x D1 x D2 … Dn), where N is the batch size.
scale (heterogeneous) - T: If spatial is true, the dimension of scale is (C). If spatial is false, the dimensions of scale are (C x D1 x … x Dn).
B (heterogeneous) - T: If spatial is true, the dimension of bias is (C). If spatial is false, the dimensions of bias are (C x D1 x … x Dn).
mean (heterogeneous) - T: If spatial is true, the dimension of the running mean (training) or the estimated mean (testing) is (C). If spatial is false, the dimensions of the running mean (training) or the estimated mean (testing) are (C x D1 x … x Dn).
var (heterogeneous) - T: If spatial is true, the dimension of the running variance (training) or the estimated variance (testing) is (C). If spatial is false, the dimensions of the running variance (training) or the estimated variance (testing) are (C x D1 x … x Dn).
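The spatial attribute only changes how the parameter tensors are indexed. As a sketch, assuming the test-mode formula Y = (X - mean) / sqrt(var + epsilon) * scale + B from the paper, and assuming a nested-list layout x[n][c][d] in which the spatial dimensions D1 … Dn are flattened into a single index d (the function `batchnorm_test_mode` is hypothetical, not an ONNX API):

```python
import math


def batchnorm_test_mode(x, scale, bias, mean, var, epsilon=1e-5, spatial=True):
    """Test-mode batch normalization over x[n][c][d] (spatial dims flattened to d).

    spatial=True:  scale, bias, mean, var have shape (C) and are indexed [c].
    spatial=False: they have shape (C x D) and are indexed [c][d].
    """
    def param(p, c, d):
        # One value per channel if spatial, otherwise one per (channel, position).
        return p[c] if spatial else p[c][d]

    y = []
    for sample in x:
        out_sample = []
        for c, channel in enumerate(sample):
            out_channel = []
            for d, value in enumerate(channel):
                norm = (value - param(mean, c, d)) / math.sqrt(param(var, c, d) + epsilon)
                out_channel.append(norm * param(scale, c, d) + param(bias, c, d))
            out_sample.append(out_channel)
        y.append(out_sample)
    return y


x = [[[1.0, 3.0]]]  # N=1, C=1, two spatial positions
print(batchnorm_test_mode(x, [1.0], [0.0], [2.0], [1.0], epsilon=0.0))
# [[[-1.0, 1.0]]]
print(batchnorm_test_mode(x, [[2.0, 2.0]], [[0.5, 0.5]], [[1.0, 3.0]], [[1.0, 1.0]],
                          epsilon=0.0, spatial=False))
# [[[0.5, 0.5]]]
```

With spatial=True every position in a channel shares one set of statistics; with spatial=False each (channel, position) pair has its own.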

Outputs

Between 1 and 5 outputs.

Y (heterogeneous) - T: The output tensor of the same shape as X.
mean (optional, heterogeneous) - T: The running mean after the BatchNormalization operator.
var (optional, heterogeneous) - T: The running variance after the BatchNormalization operator.
saved_mean (optional, heterogeneous) - T: Saved mean used during training to speed up gradient computation.
saved_var (optional, heterogeneous) - T: Saved variance used during training to speed up gradient computation.

Type Constraints

T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.

Differences

`0`	`0`	`Carries out batch normalization as described in the paper`	`Carries out batch normalization as described in the paper`
`1`	`1`	`https://arxiv.org/abs/1502.03167. Depending on the mode it is being run,`	`https://arxiv.org/abs/1502.03167. Depending on the mode it is being run,`
`2`	`2`	`there are multiple cases for the number of outputs, which we list below:`	`there are multiple cases for the number of outputs, which we list below:`
`3`	`3`
`4`	`4`	`Output case #1: Y, mean, var, saved_mean, saved_var (training mode)`	`Output case #1: Y, mean, var, saved_mean, saved_var (training mode)`
`5`	`5`	`Output case #2: Y (test mode)`	`Output case #2: Y (test mode)`
`6`
`7`		`Attributes`
`8`
`9`		`* epsilon:`
`10`	`6`	`The epsilon value to use to avoid division by zero, default is`	`This operator has optional inputs/outputs. See ONNX <https://github.com/onnx/onnx/blob/master/docs/IR.md>_ for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument's name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.`
	`7`
`11`	`8`	`1e-5f. Default value is 9.999999747378752e-06.`	`Attributes`
`12`		`* is_test:`
	`9`
`13`	`10`	`If set to nonzero, run spatial batch normalization in test mode,`	`* epsilon:`
`14`	`11`	`default is 0. Default value is 0.`	`The epsilon value to use to avoid division by zero. Default value is 9.999999747378752e-06.`
`15`	`12`	`* momentum:`	`* momentum:`
`16`	`13`	`Factor used in computing the running mean and variance.e.g.,`	`Factor used in computing the running mean and variance.e.g.,`
`17`	`14`	`running_mean = running_mean * momentum + mean * (1 - momentum),`	`running_mean = running_mean * momentum + mean * (1 - momentum). Default value is 0.8999999761581421.`
`18`		`default is 0.9f. Default value is 0.8999999761581421.`
`19`	`15`	`* spatial:`	`* spatial:`
`20`	`16`	`If true, compute the mean and variance across all spatial elements`	`If true, compute the mean and variance across per activation. If`
`21`	`17`	`If false, compute the mean and variance across per feature.Default`	`false, compute the mean and variance across per feature over each`
`22`	`18`	`is 1. Default value is 1.`	`mini-batch. Default value is 1.`
`23`	`19`
`24`	`20`	`Inputs`	`Inputs`
`25`	`21`
`26`	`22`	`* X (heterogeneous) - T:`	`* X (heterogeneous) - T:`
`27`	`23`	`Input data tensor from the previous operator; dimensions for image`	`Input data tensor from the previous operator; dimensions for image`
`28`	`24`	`case are (N x C x H x W), where N is the batch size, C is the number`	`case are (N x C x H x W), where N is the batch size, C is the number`
`29`	`25`	`of channels, and H and W are the height and the width of the data.`	`of channels, and H and W are the height and the width of the data.`
`30`	`26`	`For non image case, the dimensions are in the form of (N x C x D1 x`	`For non image case, the dimensions are in the form of (N x C x D1 x`
`31`	`27`	`D2 ... Dn), where N is the batch size.`	`D2 ... Dn), where N is the batch size.`
`32`	`28`	`* scale (heterogeneous) - T:`	`* scale (heterogeneous) - T:`
`33`		`The scale as a 1-dimensional tensor of size C to be applied to the`
`34`		`output.`
	`29`		`If spatial is true, the dimension of scale is (C). If spatial is`
	`30`		`false, the dimensions of scale are (C x D1 x ... x Dn)`
`35`	`31`	`* B (heterogeneous) - T:`	`* B (heterogeneous) - T:`
`36`		`The bias as a 1-dimensional tensor of size C to be applied to the`
`37`		`output.`
	`32`		`If spatial is true, the dimension of bias is (C). If spatial is`
	`33`		`false, the dimensions of bias are (C x D1 x ... x Dn)`
`38`	`34`	`* mean (heterogeneous) - T:`	`* mean (heterogeneous) - T:`
	`35`		`If spatial is true, the dimension of the running mean (training) or`
	`36`		`the estimated mean (testing) is (C). If spatial is false, the`
`39`	`37`	`The running mean (training) or the estimated mean (testing) as a`	`dimensions of the running mean (training) or the estimated mean`
`40`		`1-dimensional tensor of size C.`
	`38`		`(testing) are (C x D1 x ... x Dn).`
`41`	`39`	`* var (heterogeneous) - T:`	`* var (heterogeneous) - T:`
	`40`		`If spatial is true, the dimension of the running variance(training)`
	`41`		`or the estimated variance (testing) is (C). If spatial is false, the`
`42`	`42`	`The running variance (training) or the estimated variance (testing)`	`dimensions of the running variance(training) or the estimated`
`43`		`as a 1-dimensional tensor of size C.`
	`43`		`variance (testing) are (C x D1 x ... x Dn).`
`44`	`44`
`45`	`45`	`Outputs`	`Outputs`
`46`	`46`
`47`	`47`	`Between 1 and 5 outputs.`	`Between 1 and 5 outputs.`
`48`	`48`
`49`	`49`	`* Y (heterogeneous) - T:`	`* Y (heterogeneous) - T:`
`50`	`50`	`The output tensor of the same shape as X.`	`The output tensor of the same shape as X`
`51`	`51`	`* mean (optional, heterogeneous) - T:`	`* mean (optional, heterogeneous) - T:`
`52`	`52`	`The running mean after the BatchNormalization operator. Must be in-`	`The running mean after the BatchNormalization operator.`
`53`		`place with the input mean. Should not be used for testing.`
`54`	`53`	`* var (optional, heterogeneous) - T:`	`* var (optional, heterogeneous) - T:`
`55`	`54`	`The running variance after the BatchNormalization operator. Must be`	`The running variance after the BatchNormalization operator.`
`56`		`in-place with the input var. Should not be used for testing.`
`57`	`55`	`* saved_mean (optional, heterogeneous) - T:`	`* saved_mean (optional, heterogeneous) - T:`
`58`	`56`	`Saved mean used during training to speed up gradient computation.`	`Saved mean used during training to speed up gradient computation.`
`59`		`Should not be used for testing.`
`60`	`57`	`* saved_var (optional, heterogeneous) - T:`	`* saved_var (optional, heterogeneous) - T:`
`61`	`58`	`Saved variance used during training to speed up gradient`	`Saved variance used during training to speed up gradient`
`62`	`59`	`computation. Should not be used for testing.`	`computation.`
`63`	`60`
`64`	`61`	`Type Constraints`	`Type Constraints`
`65`	`62`
`66`	`63`	`* T in (`	`* T in (`
`67`	`64`	`tensor(double),`	`tensor(double),`
`68`	`65`	`tensor(float),`	`tensor(float),`
`69`	`66`	`tensor(float16)`	`tensor(float16)`
`70`	`67`	`):`	`):`
`71`	`68`	`Constrain input and output types to float tensors.`	`Constrain input and output types to float tensors.`

BatchNormalization - 6 #

Version

name: BatchNormalization (GitHub)
domain: main
since_version: 6
function: False
support_level: SupportType.COMMON
shape inference: True

This version of the operator has been available since version 6.

Summary

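Carries out batch normalization as described in the paper https://arxiv.org/abs/1502.03167. Depending on the mode in which it is run, there are multiple cases for the number of outputs, which we list below:

Output case #1: Y, mean, var, saved_mean, saved_var (training mode)
Output case #2: Y (test mode)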

Attributes

epsilon: The epsilon value used to avoid division by zero. Default value is 9.999999747378752e-06 (i.e., 1e-5f).
is_test: If set to nonzero, run spatial batch normalization in test mode. Default value is 0.
momentum: Factor used in computing the running mean and variance, e.g., running_mean = running_mean * momentum + mean * (1 - momentum). Default value is 0.8999999761581421 (i.e., 0.9f).
spatial: If true, compute the mean and variance across all spatial elements. If false, compute the mean and variance per feature. Default value is 1.
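The is_test attribute decides which statistics normalize the input. The sketch below is plain Python under several stated assumptions: spatial=1 only, a nested-list layout x[n][c][d] with spatial dimensions flattened into d, training mode normalizing with the batch statistics as in the paper, and population variance (divide by the element count, as the later version-15 summary specifies). The function name `batchnorm_v6_sketch` is hypothetical, and saved_mean and saved_var are not modeled.

```python
import math


def batchnorm_v6_sketch(x, scale, bias, mean, var,
                        epsilon=1e-5, momentum=0.9, is_test=0):
    """Per-channel batch normalization over x[n][c][d].

    Returns (y, running_mean, running_var). In test mode the supplied
    statistics are used as-is and returned unchanged.
    """
    num_channels = len(scale)
    if is_test:
        # Test mode: normalize with the estimated statistics passed in.
        use_mean, use_var = list(mean), list(var)
        new_mean, new_var = list(mean), list(var)
    else:
        # Training mode: per-channel statistics over the batch and all positions.
        use_mean, use_var = [], []
        for c in range(num_channels):
            values = [v for sample in x for v in sample[c]]
            m = sum(values) / len(values)
            use_mean.append(m)
            # Population variance (divide by count, not count - 1): an assumption.
            use_var.append(sum((v - m) ** 2 for v in values) / len(values))
        # Momentum update of the running statistics, per the attribute description.
        new_mean = [r * momentum + b * (1 - momentum) for r, b in zip(mean, use_mean)]
        new_var = [r * momentum + b * (1 - momentum) for r, b in zip(var, use_var)]
    y = [
        [
            [(v - use_mean[c]) / math.sqrt(use_var[c] + epsilon) * scale[c] + bias[c]
             for v in channel]
            for c, channel in enumerate(sample)
        ]
        for sample in x
    ]
    return y, new_mean, new_var


x = [[[1.0, 3.0]], [[5.0, 7.0]]]  # N=2, C=1; batch mean 4.0, population variance 5.0
y_train, running_mean, running_var = batchnorm_v6_sketch(x, [1.0], [0.0], [0.0], [1.0], epsilon=0.0)
y_test, _, _ = batchnorm_v6_sketch(x, [1.0], [0.0], [4.0], [5.0], epsilon=0.0, is_test=1)
```

Feeding the batch statistics back in as the estimated mean and variance makes the test-mode output match the training-mode output, which is a quick consistency check on the two branches.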

Inputs

X (heterogeneous) - T: Input data tensor from the previous operator; dimensions for the image case are (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data. For the non-image case, the dimensions are in the form of (N x C x D1 x D2 … Dn), where N is the batch size.
scale (heterogeneous) - T: The scale as a 1-dimensional tensor of size C to be applied to the output.
B (heterogeneous) - T: The bias as a 1-dimensional tensor of size C to be applied to the output.
mean (heterogeneous) - T: The running mean (training) or the estimated mean (testing) as a 1-dimensional tensor of size C.
var (heterogeneous) - T: The running variance (training) or the estimated variance (testing) as a 1-dimensional tensor of size C.

Outputs

Between 1 and 5 outputs.

Y (heterogeneous) - T: The output tensor of the same shape as X.
mean (optional, heterogeneous) - T: The running mean after the BatchNormalization operator. Must be in-place with the input mean. Should not be used for testing.
var (optional, heterogeneous) - T: The running variance after the BatchNormalization operator. Must be in-place with the input var. Should not be used for testing.
saved_mean (optional, heterogeneous) - T: Saved mean used during training to speed up gradient computation. Should not be used for testing.
saved_var (optional, heterogeneous) - T: Saved variance used during training to speed up gradient computation. Should not be used for testing.

Type Constraints

T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.


BatchNormalization - 1 #

Version

name: BatchNormalization (GitHub)
domain: main
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: False

This version of the operator has been available since version 1.

Summary

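Carries out batch normalization as described in the paper https://arxiv.org/abs/1502.03167. Depending on the mode in which it is run, there are multiple cases for the number of outputs, which we list below:

Output case #1: Y, mean, var, saved_mean, saved_var (training mode)
Output case #2: Y (test mode)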

Attributes

consumed_inputs (required): legacy optimization attribute.
epsilon: The epsilon value used to avoid division by zero. Default value is 9.999999747378752e-06 (i.e., 1e-5f).
is_test: If set to nonzero, run spatial batch normalization in test mode. Default value is 0.
momentum: Factor used in computing the running mean and variance, e.g., running_mean = running_mean * momentum + mean * (1 - momentum). Default value is 0.8999999761581421 (i.e., 0.9f).
spatial: If true, compute the mean and variance across all spatial elements. If false, compute the mean and variance per feature. Default value is 1.
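The oddly precise default values are the float32 literals 1e-5f and 0.9f, printed at double precision. A quick check with Python's standard `struct` module, which round-trips a value through a 4-byte IEEE 754 single (the helper name `as_float32` is hypothetical):

```python
import struct


def as_float32(value):
    """Round a Python float (binary64) to the nearest float32 and widen it back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


print(as_float32(1e-5))  # 9.999999747378752e-06
print(as_float32(0.9))   # 0.8999999761581421
```

Neither 1e-5 nor 0.9 is exactly representable in binary, so the stored float32 value differs slightly from the decimal literal.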

Inputs

X (heterogeneous) - T: The input 4-dimensional tensor of shape NCHW.
scale (heterogeneous) - T: The scale as a 1-dimensional tensor of size C to be applied to the output.
B (heterogeneous) - T: The bias as a 1-dimensional tensor of size C to be applied to the output.
mean (heterogeneous) - T: The running mean (training) or the estimated mean (testing) as a 1-dimensional tensor of size C.
var (heterogeneous) - T: The running variance (training) or the estimated variance (testing) as a 1-dimensional tensor of size C.

Outputs

Between 1 and 5 outputs.

Y (heterogeneous) - T: The output 4-dimensional tensor of the same shape as X.
mean (optional, heterogeneous) - T: The running mean after the BatchNormalization operator. Must be in-place with the input mean. Should not be used for testing.
var (optional, heterogeneous) - T: The running variance after the BatchNormalization operator. Must be in-place with the input var. Should not be used for testing.
saved_mean (optional, heterogeneous) - T: Saved mean used during training to speed up gradient computation. Should not be used for testing.
saved_var (optional, heterogeneous) - T: Saved variance used during training to speed up gradient computation. Should not be used for testing.

Type Constraints

T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
