Small Training Dataset? You Need SetFit | by Matt Chapman | Jan, 2025

The business-friendly way to train NLP classifiers with Python in 2025

Image by author

Data scarcity is a big problem for many data scientists.

That might sound ridiculous (“isn’t this the age of Big Data?”), but in many domains there simply isn’t enough labelled training data to train performant models using traditional ML approaches.

In classification tasks, the lazy approach to this problem is to “throw AI at it”: take an off-the-shelf pre-trained LLM, add a clever prompt, and Bob’s your uncle.

But LLMs aren’t always the best tool for the job. At scale, LLM pipelines can be slow, expensive, and unreliable.

An alternative is to use a fine-tuning/training technique that’s designed for few-shot scenarios (where there’s little training data).

In this article, I’ll introduce you to a favourite technique of mine: SetFit, a fine-tuning framework that can help you build highly performant NLP classifiers with as few as 8 labelled samples per class.
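
To make “8 labelled samples per class” concrete, here is a minimal sketch of how such a few-shot training set might be drawn from a larger labelled pool. It uses only the Python standard library and is not part of SetFit itself; the helper name `sample_few_shot` and the toy review data are hypothetical, chosen purely for illustration.

```python
import random
from collections import defaultdict


def sample_few_shot(examples, n_per_class=8, seed=42):
    """Pick up to n_per_class random (text, label) pairs for each label.

    Hypothetical helper for illustration; not a SetFit function.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible

    # Group the labelled examples by their class label.
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    # Draw the same number of examples from every class
    # (or all of them, if a class has fewer than n_per_class).
    few_shot = []
    for label in sorted(by_label):
        items = by_label[label]
        k = min(n_per_class, len(items))
        few_shot.extend(rng.sample(items, k))
    return few_shot


# Toy labelled pool (made-up data): 50 examples for each of two classes.
labelled_pool = (
    [(f"positive review {i}", "positive") for i in range(50)]
    + [(f"negative review {i}", "negative") for i in range(50)]
)

train_set = sample_few_shot(labelled_pool, n_per_class=8)
print(len(train_set))  # 16 (2 classes x 8 samples each)
```

With two classes, that is just 16 labelled examples in total, which is the scale of training data the rest of this article has in mind.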