Visualizing Stochastic Regularization for Entity Embeddings | by Valerie Carey | Aug, 2024

A glimpse into how neural networks understand categoricals and their hierarchies

Photo by Rachael Crowe on Unsplash

Industry data often contains non-numeric data with many possible values, for example zip codes, medical diagnosis codes, or preferred footwear brand. These high-cardinality categorical features contain useful information, but incorporating them into machine learning models is a bit of an art form.

I’ve been writing a series of blog posts on methods for these features. Last episode, I showed how perturbed training data (stochastic regularization) in neural network models can dramatically reduce overfitting and improve performance on unseen categorical codes [1].

In fact, model performance for unseen codes can approach that of known codes when hierarchical information is used with stochastic regularization!
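
As a rough illustration of what perturbing training data can look like for a categorical feature, here is a minimal sketch. It assumes the perturbation replaces a random fraction of codes with a reserved “unseen” index before each training pass. The function name `perturb_codes`, the reserved index `UNSEEN = 0`, and the 25% rate are hypothetical choices for illustration, not details taken from the earlier post.

```python
import numpy as np

UNSEEN = 0  # hypothetical reserved index standing in for unknown / unseen codes


def perturb_codes(codes, rate, rng):
    """Return a copy of `codes` with a random fraction replaced by UNSEEN.

    codes: integer-encoded categorical values (known codes assumed > 0)
    rate:  probability that any single entry is replaced
    rng:   a numpy random Generator, passed in for reproducibility
    """
    codes = np.asarray(codes)
    # Draw one uniform number per entry; entries below `rate` are masked.
    mask = rng.random(codes.shape) < rate
    return np.where(mask, UNSEEN, codes)


rng = np.random.default_rng(42)
codes = np.array([3, 7, 7, 12, 5, 3, 9, 12])

# Each entry either keeps its code or becomes UNSEEN.
perturbed = perturb_codes(codes, rate=0.25, rng=rng)
```

Under this assumption, the embedding row for `UNSEEN` is trained on a random sample of real cases, so it is no longer an untrained vector when a truly new code shows up at prediction time.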

Here, I use visualizations and SHAP values to “look under the hood” and gain some insights into how entity embeddings respond to stochastic regularization. The pictures are pretty, and it’s cool to see plots shift as data is changed. Plus, the visualizations suggest model improvements and can identify groups that might be of interest to analysts.

NAICS Codes