Going into Google DeepMind’s “Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters”
Recently OpenAI unveiled their latest model, o1. Rather than highlight the parameter count of this model, OpenAI instead showcased that the model performs significantly better because it takes more time. When you ask the model a question, it will often take several seconds to respond, a far cry from the millisecond speeds most people now expect from Large Language Models (LLMs). Nevertheless, this extra time appears to pay off, as o1 scores significantly higher than other models on the LMSYS Chatbot Arena.
Given this leap in performance, the question everyone is asking is: how did they do it?
While OpenAI has not publicly stated how they achieved these results, a few recent papers are good candidates for what is happening behind the scenes. One such paper is “Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters”. This goes into how one can leverage…