One of the key elements of transformers is positional embeddings. You might ask: why? Because the self-attention mechanism in transformers is permutation-invariant: it computes the amount of `attention` each token in the input receives from the other tokens in the sequence, but it doesn't take the order of the tokens into account. In fact, the attention mechanism treats the sequence as a bag of tokens. Because of this, we need another component, called a positional embedding, which accounts for the order of tokens and influences the token embeddings. But what are the different types of positional embeddings, and how are they implemented?
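To make the permutation-invariance point concrete, here is a minimal sketch (assuming PyTorch) of a bare scaled dot-product self-attention with identity projections, fed the same tokens in two different orders. Reordering the inputs only reorders the outputs in the same way, so attention by itself sees nothing but a bag of tokens.

```python
# Minimal sketch (assuming PyTorch): self-attention without positional
# information is blind to token order.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
seq_len, d_model = 4, 8
x = torch.randn(seq_len, d_model)  # token embeddings, no positions added

def self_attention(x):
    # Identity projections keep the example small; real layers use learned W_q, W_k, W_v.
    scores = x @ x.T / d_model ** 0.5
    return F.softmax(scores, dim=-1) @ x

perm = torch.tensor([2, 0, 3, 1])  # an arbitrary reordering of the tokens

out = self_attention(x)
out_perm = self_attention(x[perm])

# Permuting the input just permutes the output rows accordingly:
print(torch.allclose(out[perm], out_perm))  # True
```

Adding a positional embedding to `x` before the attention call breaks this symmetry, which is exactly the job the rest of this post explores.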
In this post, we look at three major types of positional embeddings and dive deep into their implementation.
Here is the table of contents for this post:
1. Context and Background
2. Absolute Positional Embedding
- 2.1 Learned Approach
- 2.2 Fixed Approach (Sinusoidal)
- 2.3 Code Example: RoBERTa Implementation