There’s a race toward language models with longer context windows. But how good are they, and how can we know?
This article was originally published on Art Fish Intelligence.
The context window of large language models (the amount of text they can process at once) has been growing at an exponential rate.
In 2018, language models like BERT, T5, and GPT-1 could take up to 512 tokens as input. Now, in the summer of 2024, this number has jumped to 2 million tokens (in publicly available LLMs). But what does this mean for us, and how do we evaluate these increasingly capable models?
The recently released Gemini 1.5 Pro model can take in up to 2 million tokens. But what does 2 million tokens even mean?
If we estimate 3 words to roughly equal about 4 tokens, it means that 2 million tokens can (almost) fit the entire Harry Potter and Lord of the Rings series combined.
(The total word count of all seven books in the Harry Potter series is 1,084,625. The total word count of the Lord of the Rings series is 481,103. 1,084,625 + 481,103 = 1,565,728 words.)
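As a rough sanity check, here is a minimal back-of-the-envelope sketch of that arithmetic, assuming the common heuristic of about 4 tokens per 3 words; the word counts are the ones quoted above, and the exact ratio varies by tokenizer.

```python
# Back-of-the-envelope estimate: do Harry Potter + Lord of the Rings
# fit in a 2-million-token context window?
# Assumption: ~4 tokens per 3 words (a rough heuristic, tokenizer-dependent).

HARRY_POTTER_WORDS = 1_084_625   # all seven Harry Potter books
LOTR_WORDS = 481_103             # the Lord of the Rings series
CONTEXT_WINDOW_TOKENS = 2_000_000

def words_to_tokens(words: int, tokens_per_word: float = 4 / 3) -> int:
    """Convert a word count to an approximate token count."""
    return round(words * tokens_per_word)

total_words = HARRY_POTTER_WORDS + LOTR_WORDS
total_tokens = words_to_tokens(total_words)

print(f"Total words:   {total_words:,}")     # 1,565,728
print(f"Approx tokens: {total_tokens:,}")    # ~2,087,637
print("Fits in a 2M-token window?", total_tokens <= CONTEXT_WINDOW_TOKENS)
```

Under this estimate the two series come out to roughly 2.1 million tokens, just over the 2-million-token window, which is why they only (almost) fit.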