In this article, we’ll embark on a playful journey through the world of transformers, unraveling the complexities of their architecture using Einstein notation.
Introduction:
Transformer models have revolutionized the field of natural language processing (and beyond), achieving state-of-the-art results on a wide variety of tasks. Their performance is impressive, but the underlying mathematical operations can be complex and difficult to grasp, especially without breaking down the individual layers. In this article, I propose using Einstein notation to express the mathematical operations inside a transformer model.
Note that Einstein notation is mostly used in Physics and Mathematics, for example in General Relativity, Electromagnetism, Quantum and Fluid Mechanics, but also in Linear Algebra to represent matrix operations in a more compact form.
The goal is to write the mathematical operations of every layer in a concise and elegant way. By leveraging implicit summation over repeated indices, Einstein notation can simplify the representation of tensor operations, making it (potentially) easier to understand and therefore implement the individual layers of transformer models…
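
To make "implicit summation over repeated indices" concrete, here is a minimal sketch using NumPy's `einsum`. It is only an illustration: the matrices `A` and `B`, their shapes, and the index letters are assumptions chosen for this example, not part of any transformer layer. In Einstein notation a matrix product is written C_ij = A_ik B_kj, where the repeated index k is summed over without writing the summation sign.

```python
import numpy as np

# Illustrative matrices (shapes chosen arbitrarily for this example).
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))  # indices i, k
B = rng.standard_normal((3, 4))  # indices k, j

# Einstein notation: C_ij = A_ik B_kj.
# The index k appears twice on the right-hand side, so it is summed over.
C_einsum = np.einsum("ik,kj->ij", A, B)

# The same operation with the summation over k written out explicitly.
C_loops = np.zeros((2, 4))
for i in range(2):
    for j in range(4):
        for k in range(3):
            C_loops[i, j] += A[i, k] * B[k, j]

# Both results agree with each other and with the usual matrix product.
print(np.allclose(C_einsum, C_loops))
print(np.allclose(C_einsum, A @ B))
```

The subscript string `"ik,kj->ij"` mirrors the written formula almost character for character, which is why this notation can map so directly onto an implementation.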