Using Vector Steering to Improve Model Guidance | by Matthew Gunton | Oct, 2024

Large language models are complex and don’t always give perfect answers. To remedy this, people try many different techniques to guide the model’s output. We’ve seen pre-training on larger datasets, pre-training models with more parameters, and using a vector database (or some other form of lookup) to add relevant context to the LLM’s input. All of these bring some improvement, but no method today is foolproof.

One interesting way to guide the model is vector steering. A notable example of this is the Claude Golden Gate Bridge experiment. Here we see that no matter what the user asks, Claude will find some clever way to bring up its favorite topic: the Golden Gate Bridge.

Image from “Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet” showing Claude Sonnet’s behavior change with a steering vector

Today I’ll be going through the research done on this topic and also explaining Anastasia Borovykh’s excellent code implementation. If you’re interested in learning more about this topic, I highly recommend checking out her video.

Let’s dive in!