Transformer

The Transformer is a deep learning architecture designed to process sequential data, which features an attention mechanism that embeds context to each sequence element. While originally developed for text, it was soon adapted to other data modalities, including those where data has typically not been treated as sequential (e.g. images).