Adaptive Normalization

Adaptive Normalization, or AdaNorm, is a variation of Layer Normalization which replaces the bias and scaling terms with a scaling function, based on the statistics of the input. Furthermore the gradients of the scaling function are 'detached' (that is, not required) during backpropagation.