Vision Transformer

The Vision Transformer (ViT) is an adaptation of the Transformer architecture for image classification: the image is split into fixed-size, non-overlapping patches, and each patch serves as one "token" in the input sequence.
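
As a rough illustration of how an image becomes a sequence of patch tokens, here is a minimal sketch in plain Python. The function name `patchify`, the nested-list image layout (height x width x channels), and the toy 4 x 4 image are assumptions made for this example and are not taken from the referenced paper. The sketch stops at flattening each patch; the learned linear projection and position embeddings that a full model applies afterwards are omitted.

```python
def patchify(image, patch_size):
    """Split an H x W x C image (nested lists) into flattened,
    non-overlapping patches, returned in row-major order."""
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")

    patches = []
    # Step over the image in strides of patch_size so patches never overlap.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = []
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    # Append all channel values of this pixel.
                    patch.extend(image[row][col])
            patches.append(patch)
    return patches


# Toy 4 x 4 single-channel image; each pixel value encodes its position.
image = [[[row * 4 + col] for col in range(4)] for row in range(4)]
tokens = patchify(image, patch_size=2)

print(len(tokens))  # 4 patches, i.e. a sequence of 4 tokens
print(tokens[0])    # [0, 1, 4, 5] (top-left 2 x 2 patch)
```

Each returned list plays the role of one token; the sequence length is (H / P) * (W / P) for patch size P, and each token has P * P * C values before any projection.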

Related concepts:

Transformer

External reference:

https://arxiv.org/abs/2010.11929v2