LLM Routing — The Coronary heart of Any Sensible AI Chatbot Software | by Aliaksei Mikhailiuk | Jan, 2025

constructing dependable, scalable, and sturdy AI purposes— defined in 5 minutes.

Picture generated by the Writer with Imagen 3.

Larger and greater fashions, with extra capability and a bigger context window, beat all and everybody. Having constructed the second most used chatbot thus far — Snapchat’s My AI—I’ve first-handedly seen that sensible merchandise don’t want the very best however most related fashions.

Whereas one mannequin would possibly beat one other within the enviornment for superior Ph.D.-level math questions, the shedding mannequin can be extra appropriate on your particular request as a result of its response is shorter and to the purpose.

Selecting not the very best in high quality, however the correct mannequin on your utility additionally comes from the price and latency perspective — if nearly all of your requests are simply chit-chat — you’ll be losing sources by utilizing the most important fashions on replying to messages like “Hello! How are you?”.

Apart from, what if you wish to serve tens of millions of requests? Server capability is restricted, and in peak hours, you could be dealing with an elevated latency for essentially the most in-demand fashions. To not make your customers wait, you’ll be able to reply with a much less in-demand mannequin that might give an appropriate degree of high quality.