In this article we are going to explore why 128K-token (and larger) context models can’t fully replace RAG.
We’ll start with a brief reminder of the problems RAG solves, before looking at the improvements in LLMs and their impact on the need to use RAG.
RAG isn’t actually new
The idea of injecting context to give a language model access to up-to-date data is quite “old” (at the LLM timescale). It was first introduced by Facebook AI/Meta researchers in the 2020 paper “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks”. By comparison, the first version of ChatGPT was only released in November 2022.
In that paper they distinguish two kinds of memory:
- the parametric memory, which is inherent to the LLM: what it learned while being fed lots and lots of text during training,
- the non-parametric memory, which is the memory you can provide by feeding context into the prompt.
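The non-parametric side can be illustrated with a minimal sketch: retrieved passages are simply concatenated into the prompt before it is sent to the model. The `build_prompt` helper, the prompt wording, and the sample passage are illustrative assumptions, not from the original paper; the retriever and the actual model call are omitted.

```python
def build_prompt(question: str, passages: list[str]) -> str:
    """Inject retrieved passages (non-parametric memory) into the prompt."""
    # Number each passage so the model (or a human) can trace answers back.
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_prompt(
    "When was RAG introduced?",
    ["RAG was introduced by Facebook AI researchers in a 2020 paper."],
)
print(prompt)
```

Whatever the retrieval method, the result is the same: the model answers from text placed in its context window rather than from its trained weights.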