Attention Mechanism
The Attention Mechanism is a pooling function that performs a 'soft' look-up in a dictionary of vectors. Given a query Q, a conventional dictionary look-up would find the single key K closest to Q and return the corresponding value V. In the Attention Mechanism, however, every key is considered similar to the query to some degree, even if only by a negligible amount, and the returned value is a linear combination of all the values in the dictionary, with weights given by the similarity between the query and the corresponding keys. The Attention Mechanism is at the core of the Transformer, a popular deep learning architecture.
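As a concrete illustration, the sketch below implements one common instance of this idea: scaled dot-product attention, the variant used in the Transformer, where similarity is the dot product between the query and each key and a softmax normalizes those similarities into weights. The function name, the NumPy implementation, and the array shapes are illustrative choices, not details from the text above.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Soft dictionary look-up: return a weighted average of the rows of V,
    weighted by the softmax-normalized similarity between Q and each row of K.

    Q: (n_queries, d) array of queries
    K: (n_keys, d)    array of keys
    V: (n_keys, d_v)  array of values
    """
    d = Q.shape[-1]
    # Similarity between each query and each key: dot product, scaled by sqrt(d).
    scores = Q @ K.T / np.sqrt(d)                                # (n_queries, n_keys)
    # Softmax turns similarities into positive weights summing to 1 per query;
    # every key receives a nonzero weight, which makes the look-up 'soft'.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Linear combination of the values with those weights.
    return weights @ V                                           # (n_queries, d_v)

# Example: one query against a dictionary of three key/value pairs.
rng = np.random.default_rng(0)
Q = rng.normal(size=(1, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 2))
print(scaled_dot_product_attention(Q, K, V))  # one output row, shape (1, 2)
```

Note that even a key almost orthogonal to the query still contributes a small, nonzero weight after the softmax, in contrast with a hard look-up that would return only the single best-matching value.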