Meet GPT, The Decoder-Only Transformer | by Muhammad Ardi | Jan, 2025

Large Language Models (LLMs), such as ChatGPT, Gemini, Claude, etc., have been around for a while now, and I believe all of us have already used at least one of them. As this article is written, ChatGPT already implements the fourth generation of the GPT-based model, named GPT-4. But do you know what GPT actually is, and what the underlying neural network architecture looks like? In this article we are going to talk about GPT models, specifically GPT-1, GPT-2 and GPT-3. I will also demonstrate how to code them from scratch with PyTorch so that you can get a better understanding of the structure of these models.

A Brief History of GPT

Before we get into GPT, we first need to understand the original Transformer architecture. Generally speaking, a Transformer consists of two main components: the Encoder and the Decoder. The former is responsible for understanding the input sequence, while the latter is used for generating another sequence based on that input. For example, in a question answering task, the decoder produces an answer to the input sequence, whereas in a machine translation task it generates the translation of the input.
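To make this encoder-decoder split concrete, here is a minimal sketch using PyTorch's built-in `nn.Transformer` module; the layer counts, dimensions, and dummy tensors below are illustrative placeholders, not values taken from this article or the original paper.

```python
import torch
import torch.nn as nn

# Minimal encoder-decoder Transformer sketch (hyperparameters are illustrative).
model = nn.Transformer(
    d_model=512,            # embedding dimension
    nhead=8,                # number of attention heads
    num_encoder_layers=6,   # encoder stack: "understands" the input sequence
    num_decoder_layers=6,   # decoder stack: generates the output sequence
)

# Dummy token embeddings with shape (sequence length, batch size, d_model).
src = torch.randn(10, 1, 512)   # e.g., the sentence to be translated
tgt = torch.randn(12, 1, 512)   # e.g., the translation generated so far

out = model(src, tgt)           # decoder output, conditioned on the encoded input
print(out.shape)                # torch.Size([12, 1, 512])
```

The key point is that the decoder's output depends on both its own (shifted) target sequence and the encoder's representation of the input; GPT, as we will see, keeps only the decoder half of this picture.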