How to Handle Imbalanced Datasets in Machine Learning Projects | by Jiayan Yin | Oct, 2024

Techniques for handling imbalanced datasets, with examples and Python snippets

Photo by Nick Fewings on Unsplash

Imagine that you've trained a predictive model with an accuracy score as high as 0.9. Other evaluation metrics, such as precision, recall, and F1-score, also look promising. But your experience and intuition tell you that something isn't right, so you investigate further and find this:

Image_1: Screenshot by the author

The model's seemingly strong performance is driven by the majority class 0 in its target variable. Because of the evident imbalance between the majority and minority classes, the model excels at predicting the majority class 0, while its performance on the minority class 1 is far from satisfactory. However, because class 1 represents a very small portion of the target variable, its performance has little impact on the overall scores of these evaluation metrics, which gives you the illusion that the model is strong.
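To make this effect concrete, here is a minimal sketch in plain Python. The labels and predictions are made-up numbers chosen for illustration (a 900/100 class split and an assumed pattern of errors); they are not the figures from the screenshot above. The helper `per_class_metrics` is a hypothetical name introduced only for this example.

```python
from collections import Counter

# Hypothetical ground truth: 900 samples of class 0 and 100 of class 1.
y_true = [0] * 900 + [1] * 100

# Hypothetical predictions: 890 of the 900 class-0 samples are predicted
# correctly, but only 20 of the 100 class-1 samples are.
y_pred = [0] * 890 + [1] * 10 + [0] * 80 + [1] * 20


def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1) for a single class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Overall accuracy: the share of all predictions that are correct.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Per-class metrics, plus an F1 score averaged by class size (support).
support = Counter(y_true)
metrics = {label: per_class_metrics(y_true, y_pred, label) for label in sorted(support)}
weighted_f1 = sum(metrics[label][2] * support[label] for label in support) / len(y_true)

print(f"accuracy:    {accuracy:.2f}")
print(f"weighted F1: {weighted_f1:.2f}")
for label, (precision, recall, f1) in metrics.items():
    print(f"class {label}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

With these assumed numbers, accuracy is 0.91 and the support-weighted F1 is roughly 0.89, yet recall for class 1 is only 0.20. The aggregate scores are dominated by the 900 majority samples, so they barely move even though the model misses most of the minority class.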

This is not a rare case. On the contrary, data scientists frequently encounter imbalanced datasets in real-world projects. An imbalanced dataset refers to a dataset where the classes or categories are not