Vision Transformer

The Vision Transformer (ViT) is an adaptation of the Transformer architecture to image classification: the image is split into fixed-size, non-overlapping patches, and each patch is treated as one input token.
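
The patch-to-token step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the reference implementation: the function name `patchify`, the nested-list image layout (height x width x channels), and the toy 4x4 image are all hypothetical choices made for this example.

```python
def patchify(image, patch_size):
    """Split an H x W x C image (nested lists) into flattened,
    non-overlapping patches, returned in row-major order.

    Each returned patch is a flat list of length patch_size * patch_size * C.
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch: rows, then pixels within a row, then channels.
            patch = [
                value
                for row in image[top:top + patch_size]
                for pixel in row[left:left + patch_size]
                for value in pixel
            ]
            patches.append(patch)
    return patches


# Toy 4x4 image with 3 channels; each pixel stores its (row, column, 0).
image = [[[r, c, 0] for c in range(4)] for r in range(4)]
tokens = patchify(image, patch_size=2)
print(len(tokens), len(tokens[0]))  # 4 patches, each of length 2 * 2 * 3 = 12
```

In the full model, each flattened patch would then be passed through a learned linear projection and combined with a position embedding before entering the Transformer encoder; those steps are omitted here. For scale, a 224x224 image cut into 16x16 patches yields 14 x 14 = 196 tokens.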
Related concepts:
Transformer
External reference:
https://arxiv.org/abs/2010.11929v2