
Economics of Hosting Open Source LLMs | by Ida Silfverskiöld | Nov, 2024

November 12, 2024

Large Language Models in Production

Leveraging various deployment options

Not to scale*: Total Processing Time on GPU vs CPU | Image by author


If you’ve been experimenting with open-source models of different sizes, you’re probably asking yourself: what’s the most efficient way to deploy them?

What’s the pricing difference between on-demand and serverless providers, and is it really worth dealing with a player like AWS when there are dedicated LLM serving platforms?

I’ve decided to dive into this subject, comparing cloud vendors like AWS with newer alternatives like Modal, BentoML, Replicate, Hugging Face Endpoints, and Beam.

We’ll look at metrics such as processing time, cold start delays, and CPU, memory, and GPU costs to understand what’s most efficient and economical. We’ll also cover softer metrics like ease of deployment, developer experience, and community.
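The on-demand vs serverless question largely comes down to simple arithmetic, so here is a minimal sketch of that trade-off. It is not the author's benchmark. It rests on stated assumptions: serverless is assumed to bill per second for processing time plus any cold-start time, and on-demand is assumed to bill per hour whether the instance is busy or idle. Every price and timing is a hypothetical placeholder, not a real provider's rate, and the names (`Workload`, `serverless_daily_cost`, `on_demand_daily_cost`, `break_even_requests`) are illustrative only.

```python
from dataclasses import dataclass

# NOTE: every price and timing used here is a HYPOTHETICAL placeholder,
# not a quote from AWS, Modal, BentoML, Replicate, Hugging Face, or Beam.


@dataclass
class Workload:
    requests_per_day: int
    processing_seconds: float   # time to serve one request on a warm container
    cold_start_seconds: float   # extra boot time when a request hits a cold container
    cold_start_fraction: float  # share of requests that hit a cold start (0.0 to 1.0)


def billed_seconds_per_request(w: Workload) -> float:
    """Assumed serverless billing: processing time plus the average cold-start overhead."""
    return w.processing_seconds + w.cold_start_fraction * w.cold_start_seconds


def serverless_daily_cost(w: Workload, price_per_second: float) -> float:
    """Serverless: pay only for the seconds a container is actually running."""
    return w.requests_per_day * billed_seconds_per_request(w) * price_per_second


def on_demand_daily_cost(price_per_hour: float, hours_up: float = 24.0) -> float:
    """On-demand: pay for every hour the instance is up, busy or idle."""
    return price_per_hour * hours_up


def break_even_requests(w: Workload, price_per_second: float, price_per_hour: float) -> float:
    """Requests per day at which the two options cost the same.

    Simplification: cold_start_fraction is treated as fixed, although in practice
    it usually drops as traffic grows and containers stay warm.
    """
    cost_per_request = billed_seconds_per_request(w) * price_per_second
    return on_demand_daily_cost(price_per_hour) / cost_per_request


if __name__ == "__main__":
    # Hypothetical workload and prices, for illustration only.
    w = Workload(requests_per_day=1000, processing_seconds=2.0,
                 cold_start_seconds=30.0, cold_start_fraction=0.1)
    print(f"serverless/day: ${serverless_daily_cost(w, price_per_second=0.001):.2f}")
    print(f"on-demand/day:  ${on_demand_daily_cost(price_per_hour=1.0):.2f}")
    print(f"break-even:     {break_even_requests(w, 0.001, 1.0):.0f} requests/day")
```

With these made-up numbers, serverless stays cheaper until traffic reaches roughly 4,800 requests per day. The design point is that cold starts add billed time on the serverless side, while idle hours add cost on the on-demand side, which is why both processing time and cold-start delay matter in the comparison.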