RMSNorm
Root mean square layer normalization (RMSNorm) is similar to layer normalization, but it skips the re-centering step. Instead of shifting the activations to zero mean and rescaling them to unit standard deviation, it divides the layer's activations by their root mean square, then multiplies the result elementwise by a learnable gain tensor.
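The computation described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name `rms_norm`, the parameter names, and the small constant `eps` (added inside the square root to avoid division by zero) are assumptions made for this example, and a real model would apply the same arithmetic to tensors along the feature dimension.

```python
import math


def rms_norm(x, gain, eps=1e-6):
    """Divide activations by their root mean square, then scale by gain.

    x    -- list of activations for one layer (a single feature vector)
    gain -- learnable per-element scale, same length as x
    eps  -- assumed stability constant; not part of the definition above
    """
    # Mean of the squared activations. No mean is subtracted first,
    # which is the difference from layer normalization.
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    # Normalize each activation, then apply the elementwise gain.
    return [g * v * inv_rms for v, g in zip(x, gain)]


x = [1.0, 2.0, 3.0, 4.0]
gain = [1.0] * len(x)  # gain of all ones: an assumed initial value
y = rms_norm(x, gain)
print(y)
```

With a gain of all ones, the output has a root mean square of approximately 1, but its mean is generally not 0, because the activations are rescaled without being re-centered.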