Avoid These Easily Overlooked Mistakes in Machine Learning Workflows - Part 1 | by Thomas A Dorfer | Jan, 2025

Misusing identifiers, incorrect data splits, and ignoring rare feature values

A collage of the three errors this article focuses on: misusing identifiers, ignoring rare feature values, and incorrect data partitioning. Image by the author.

One tremendously rewarding thing about having been involved in the field of machine learning for as long as I have is the opportunity to always learn something new. That something new can be a new tool or method (given the rapid development in the machine learning landscape, there's never a shortage of those), but it can also be the discovery of erroneous processes in our work that we simply had never been aware of.

Some of these can be quite obscure and hard to spot at first glance. If these erroneous processes do slip into your model development, there's a good chance they will hurt the model's predictive power and thus its reliability and, ultimately, its applicability.

In this article, which is the first in a series exploring common pitfalls in machine learning, we'll focus on three data handling mistakes that can occur both during the preprocessing phase and during the modeling phase:

Using Numerical Identifiers as Features
Random Partitioning Instead of Group Partitioning
Including Feature Values with Insufficient Observations
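
To make the three mistakes listed above concrete before going into detail, here is a minimal sketch in plain Python. It is not the author's code: the toy records, the column names (`patient_id`, `site`, `age`, `label`), the helper functions, and the `min_count` threshold are all hypothetical and chosen only for illustration. It shows one way to keep an identifier out of the feature set, to split data so that each group stays on one side of the split, and to flag feature values that have too few observations.

```python
import random
from collections import Counter

# Hypothetical toy data: `patient_id` is an identifier, not a feature,
# and each patient contributes two rows (so rows are grouped).
records = [
    {"patient_id": 101, "site": "A", "age": 34, "label": 0},
    {"patient_id": 101, "site": "A", "age": 34, "label": 0},
    {"patient_id": 102, "site": "A", "age": 51, "label": 1},
    {"patient_id": 102, "site": "A", "age": 51, "label": 1},
    {"patient_id": 103, "site": "B", "age": 47, "label": 0},
    {"patient_id": 103, "site": "B", "age": 47, "label": 1},
    {"patient_id": 104, "site": "B", "age": 29, "label": 0},
    {"patient_id": 104, "site": "C", "age": 29, "label": 1},
]


def group_split(rows, group_key, test_fraction=0.25, seed=0):
    """Split rows so every group lands entirely in train or entirely in test."""
    groups = sorted({row[group_key] for row in rows})
    random.Random(seed).shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [row for row in rows if row[group_key] not in test_groups]
    test = [row for row in rows if row[group_key] in test_groups]
    return train, test


def drop_identifier(rows, id_key):
    """Return copies of the rows without the identifier column."""
    return [{k: v for k, v in row.items() if k != id_key} for row in rows]


def rare_values(rows, key, min_count=2):
    """Return the values of `key` observed fewer than `min_count` times."""
    counts = Counter(row[key] for row in rows)
    return {value for value, n in counts.items() if n < min_count}


# 1) Split by group first, while the identifier is still available.
train_rows, test_rows = group_split(records, "patient_id")

# 2) Only then remove the identifier so it cannot be used as a feature.
train_features = drop_identifier(train_rows, "patient_id")

# 3) Flag feature values with too few observations to learn from.
rare_sites = rare_values(records, "site", min_count=2)
```

The order matters in this sketch: the identifier is needed to define the groups for the split, so it is dropped only afterwards. In practice, a library routine for group-aware splitting would typically replace the hand-written `group_split` helper.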
