Run and Serve Faster VLMs Like Pixtral and Phi-3.5 Vision with vLLM

Understanding how much memory you need to serve a VLM

An image encoded by Pixtral. Image by the author.

vLLM is currently one of the fastest inference engines for large language models (LLMs). It supports a wide range of model architectures and quantization methods.

vLLM also supports vision-language models (VLMs), taking multimodal inputs that contain both images and text prompts. For instance, vLLM can now serve models like Phi-3.5 Vision and Pixtral, which excel at tasks such as image captioning, optical character recognition (OCR), and visual question answering (VQA).
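To give a concrete idea of what this looks like, here is a minimal sketch of offline inference with Phi-3.5 Vision using vLLM's multimodal API. The model ID, prompt template, file name, and parameter values are assumptions for illustration and may differ from the code in the notebook.

```python
from vllm import LLM, SamplingParams
from PIL import Image

# Minimal sketch (assumed settings): load Phi-3.5 Vision.
# trust_remote_code is required because the model ships custom code on the Hub.
llm = LLM(
    model="microsoft/Phi-3.5-vision-instruct",
    trust_remote_code=True,
    max_model_len=4096,   # maximum sequence length (text tokens + image tokens)
    max_num_seqs=2,       # limits how many sequences are batched together
)

# Phi-3.5 Vision expects an <|image_1|> placeholder where the image goes.
prompt = "<|user|>\n<|image_1|>\nDescribe this image.<|end|>\n<|assistant|>\n"
image = Image.open("example.jpg")  # hypothetical local image

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```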

In this article, I'll show you how to use VLMs with vLLM, focusing on the key parameters that impact memory consumption. We'll see why VLMs consume much more memory than standard LLMs. We'll use Phi-3.5 Vision and Pixtral as case studies for a multimodal application that processes prompts containing text and images.
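As a preview of the parameters we will focus on, the sketch below loads Pixtral and makes the memory-related arguments explicit. The values are illustrative, not the notebook's settings; the parameter names (max_model_len, gpu_memory_utilization, limit_mm_per_prompt) come from vLLM's engine arguments, and the image URL is a placeholder.

```python
from vllm import LLM, SamplingParams

# Illustrative values only: each of these arguments affects GPU memory use.
llm = LLM(
    model="mistralai/Pixtral-12B-2409",
    tokenizer_mode="mistral",            # Pixtral ships a Mistral-format tokenizer
    max_model_len=8192,                  # caps the KV cache allocated per sequence
    gpu_memory_utilization=0.90,         # fraction of GPU memory vLLM may reserve
    limit_mm_per_prompt={"image": 1},    # maximum number of images per prompt
)

# Pixtral is served through the chat interface with image_url content parts.
messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
    ],
}]

outputs = llm.chat(messages, sampling_params=SamplingParams(max_tokens=128))
print(outputs[0].outputs[0].text)
```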

The code for running Phi-3.5 Vision and Pixtral with vLLM is available in this notebook:

Get the notebook (#105)

In transformer models, generating text token by token is slow because each prediction depends on all previous tokens…