Grouped-Query Attention

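Grouped-query attention (GQA) is a variant of multi-head attention. Instead of learning separate query, key, and value projections for every attention head, it learns a smaller number of key/value heads and shares each one across a group of query heads. Queries are still learned independently per head, as in the original multi-head attention. With a single key/value head it reduces to multi-query attention; with one key/value head per query head it is ordinary multi-head attention.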
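The sharing scheme can be illustrated with a minimal NumPy sketch. It assumes a single unbatched sequence, no causal mask, and no output projection. The function name `grouped_query_attention` and its parameters (`num_q_heads`, `num_kv_heads`) are hypothetical, chosen only for illustration. Each key/value head is repeated so that every consecutive group of query heads attends with the same keys and values.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def grouped_query_attention(x, w_q, w_k, w_v, num_q_heads, num_kv_heads):
    """Hypothetical GQA sketch (no mask, no batch, no output projection).

    x:   (seq_len, d_model)
    w_q: (d_model, num_q_heads * head_dim)   one projection per query head
    w_k: (d_model, num_kv_heads * head_dim)  fewer key heads
    w_v: (d_model, num_kv_heads * head_dim)  fewer value heads
    Returns (seq_len, num_q_heads * head_dim).
    """
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a multiple of num_kv_heads")
    seq_len = x.shape[0]
    group_size = num_q_heads // num_kv_heads
    head_dim = w_q.shape[1] // num_q_heads

    # Project and split into heads: (heads, seq_len, head_dim).
    q = (x @ w_q).reshape(seq_len, num_q_heads, head_dim).transpose(1, 0, 2)
    k = (x @ w_k).reshape(seq_len, num_kv_heads, head_dim).transpose(1, 0, 2)
    v = (x @ w_v).reshape(seq_len, num_kv_heads, head_dim).transpose(1, 0, 2)

    # Share each K/V head with group_size consecutive query heads:
    # query head h uses key/value head h // group_size.
    k = np.repeat(k, group_size, axis=0)
    v = np.repeat(v, group_size, axis=0)

    # Standard scaled dot-product attention, computed per query head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    out = softmax(scores, axis=-1) @ v  # (num_q_heads, seq_len, head_dim)

    # Concatenate heads back into (seq_len, num_q_heads * head_dim).
    return out.transpose(1, 0, 2).reshape(seq_len, num_q_heads * head_dim)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len, d_model, n_q, n_kv, head_dim = 5, 16, 4, 2, 4
    x = rng.standard_normal((seq_len, d_model))
    w_q = rng.standard_normal((d_model, n_q * head_dim))
    w_k = rng.standard_normal((d_model, n_kv * head_dim))
    w_v = rng.standard_normal((d_model, n_kv * head_dim))
    print(grouped_query_attention(x, w_q, w_k, w_v, n_q, n_kv).shape)  # (5, 16)
```

Here 4 query heads share 2 key/value heads, so `w_k` and `w_v` each have half the columns that multi-head attention would need. Repeating the key/value heads keeps the per-head attention computation unchanged; only the number of learned key/value projections shrinks.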
Related concepts:
Multi-Query Attention
Multi-Head Attention
External reference:
https://arxiv.org/abs/2305.13245