On this companion article, I’ll present my implementation for coaching from scratch a GPT-like mannequin, in Rust. No GPUs, solely CPUs, with a efficiency 30 occasions higher than the native C code.
In my final article, I launched the issue of matrix multiplication, how the eye algorithm makes use of matrix multiplication to carry out an averaging course of, and find out how to effectively implement — or at the very least, for me — a matrix multiplication perform in Rust with Blas.
On this new article, I need to present my first constructing block for implementing llm.c in Rust, specifically, coaching a GPT-like mannequin from scratch utilizing Rust. This has been my means of studying increasingly in regards to the Rust ecosystem and understanding how comparable is with C. Particularly, I need my code to have the ability to prepare a GPT-like mannequin, ranging from GPT weights, utilizing solely CPUs— so no GPUs or TPUs. My purpose is to know how a lot we are able to push these fashions on easy laptops, and the way a lot the Rust ecosystem can be utilized for this. Ultimately, this code may be helpful to fine-tune GPT fashions with a given enter corpus.
All of the related items of code might be discovered right here.