Masked Language Model

A masked language model is a machine learning model trained to "fill in the blanks" in text. That is, each training data point is a segment of text in which a percentage of tokens (i.e., words or parts of words) are "masked", and the modeling task is to predict the masked tokens from the non-masked ones. BERT (Bidirectional Encoder Representations from Transformers) is an example of such a model.
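For illustration, here is a minimal sketch of masked-token prediction with a pretrained BERT model, assuming the Hugging Face transformers library is available (the library, model name, and example sentence are illustrative assumptions, not part of the definition):

# Minimal sketch: predict a masked token with a pretrained BERT model.
# Assumes the Hugging Face `transformers` library is installed and the model can be downloaded.
from transformers import pipeline

# The fill-mask pipeline scores candidate tokens for the [MASK] position.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

for prediction in unmasker("The capital of France is [MASK]."):
    # Each prediction includes the filled-in token and its probability score.
    print(prediction["token_str"], round(prediction["score"], 3))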
Related concepts:
Language Model, Autoregressive Language Model
External reference:
https://arxiv.org/abs/1810.04805