Contributions of This Work
This paper offers both an illuminating analysis of token-level training dynamics and a new method called Selective Language Modeling (SLM):
Token Loss Analysis:
They demonstrate that a majority of tokens contribute little beyond the initial training phase, while a small subset remains persistently high-loss.
SLM for Focused Learning:
By leveraging a reference model to gauge how “useful” each token is, they manage to drastically reduce the number of training tokens without sacrificing quality, in many cases even boosting downstream performance (a rough sketch of this selection step follows this summary).
Broad Demonstration of Effectiveness:
SLM works not only on math-specific tasks but also in more general domains, with either a meticulously curated reference dataset or a reference model drawn from the same large corpus.
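To make the token-selection idea concrete, here is a minimal PyTorch-style sketch. It assumes Hugging-Face-style model objects with a `.logits` output, uses excess loss (current-model loss minus reference-model loss) as the per-token score, and picks a `keep_ratio` of 0.6 purely for illustration; none of these specifics are the paper's published implementation.

```python
import torch
import torch.nn.functional as F

def selective_lm_loss(model, ref_model, input_ids, keep_ratio=0.6):
    """Compute a causal LM loss over only the 'useful' tokens.

    Assumed scoring rule: excess loss = current-model loss minus
    reference-model loss per token; keep the top `keep_ratio` fraction.
    """
    labels = input_ids[:, 1:]                     # next-token targets
    logits = model(input_ids).logits[:, :-1]      # predictions for those targets
    with torch.no_grad():
        ref_logits = ref_model(input_ids).logits[:, :-1]

    # Per-token cross-entropy, shape (batch, seq_len - 1), for both models
    loss = F.cross_entropy(logits.transpose(1, 2), labels, reduction="none")
    ref_loss = F.cross_entropy(ref_logits.transpose(1, 2), labels, reduction="none")

    # Tokens where the current model lags the reference the most score highest
    score = (loss - ref_loss).detach()
    k = max(1, int(keep_ratio * score.numel()))
    threshold = score.flatten().topk(k).values.min()
    mask = (score >= threshold).float()

    # Average the loss over the selected tokens only
    return (loss * mask).sum() / mask.sum()
```

Other scoring rules, such as using the reference loss on its own, would slot into the same mask-building step.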
Where Might This Go Next?
SLM opens up various potential directions for future research. For example:
Scaling Up Further:
Although the paper primarily focuses on models in the 1B to 7B parameter range, it remains an open question how SLM performs at the 30B, 70B, or 100B+ scale. If the token-level approach generalizes well, the cost savings could be enormous for truly massive LLMs.
Reference Models via API:
If you can’t gather curated data, you could perhaps use an API-based language model as your reference. That would make SLM more practical for smaller research teams that lack the resources to train a dedicated reference model.
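As a sketch of how that might plug in: assume a hypothetical `fetch_token_logprobs` helper (standing in for whatever per-token logprob endpoint a provider exposes, not a real client library). Its negated output is a per-token reference loss, and the keep-mask is built exactly as in the local-reference sketch above.

```python
from typing import List

def fetch_token_logprobs(text: str) -> List[float]:
    """Hypothetical stand-in for an API call returning the log probability a
    hosted reference model assigns to each token of `text`."""
    raise NotImplementedError

def keep_mask_from_api(current_losses: List[float], text: str,
                       keep_ratio: float = 0.6) -> List[bool]:
    """Select the tokens whose excess loss (current minus reference) is
    largest; only those would contribute to the training loss."""
    ref_losses = [-lp for lp in fetch_token_logprobs(text)]   # loss = -logprob
    scores = [c - r for c, r in zip(current_losses, ref_losses)]
    k = max(1, int(keep_ratio * len(scores)))
    threshold = sorted(scores, reverse=True)[k - 1]
    return [s >= threshold for s in scores]
```

One wrinkle this glosses over is tokenization: the hosted model's tokenizer may not match the local model's, so the per-token alignment would need care.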
Reinforcement Learning Extensions:
Imagine coupling SLM with reinforcement learning. The reference model could act as a “reward model,” and token selection might then be optimized via something akin to policy gradients.
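Very loosely, such a coupling could look like a REINFORCE-style objective in which the reference model's per-token scores play the role of rewards; this is purely illustrative of the direction, not something proposed in the paper.

```python
import torch

def reward_weighted_loss(logprobs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """REINFORCE-style sketch: `logprobs` are the policy's per-token log
    probabilities for a sampled sequence, `rewards` are per-token scores from
    the reference ('reward') model; both have shape (batch, seq_len)."""
    # Subtracting the mean reward is a simple, assumed baseline for variance reduction.
    advantages = (rewards - rewards.mean()).detach()
    # Minimizing this loss pushes probability mass toward high-reward tokens.
    return -(logprobs * advantages).mean()
```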
Multiple Reference Models:
Instead of a single reference model, you could train or gather several, each specializing in a different domain or style. Then, combine their token scores to produce a more robust multi-domain filtering system.
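One simple way the scores could be merged, assuming each reference model already produces a per-token score where higher means more useful (the `max`/`mean` choice below is an assumption, not from the paper):

```python
import torch

def combine_reference_scores(per_model_scores, mode="max"):
    """Merge per-token scores from several reference models.

    `per_model_scores`: list of tensors, each of shape (batch, seq_len).
    `mode="max"` keeps a token if any specialist values it;
    `mode="mean"` requires broader agreement across domains.
    """
    stacked = torch.stack(per_model_scores)   # (n_models, batch, seq_len)
    return stacked.max(dim=0).values if mode == "max" else stacked.mean(dim=0)
```

Taking the max lets a single domain specialist “rescue” a token, while the mean demands consensus; which is preferable would depend on how different the domains are.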
Alignment and Safety:
There is a growing trend toward factoring in alignment or truthfulness. One might train a reference model to give higher scores to well-supported statements and zero out tokens that look factually incorrect or harmful.