Masked Autoencoder

Masked Autoencoder is a deep learning architecture used to build image representations for downstream recognition tasks. Training involves the following steps: (1) mask a random subset of image patches, (2) run the remaining visible patches through an encoder; (3) introduce mask tokens in the appropriate positions; (4) run these tokens plus the encoded patches through a decoder. Once the model is trained, an image representation is obtained by applying the encoder on all the patches, without masking.
Related concepts:
Encoder, Decoder, Autoencoder
Related video:
https://youtube.com/shorts/rA1AKIiHbeE
External reference:
https://arxiv.org/abs/2111.06377