Outlier Detection Utilizing Random Forest Regressors: Leveraging Algorithm Strengths to Your Benefit | by Michael Zakhary

Utilizing a mannequin’s robustness to outliers to detect them

Picture by Will Myers on Unsplash

The issue of outlier detection could be tough, particularly if the bottom fact or the outline of what’s an outlier is ambiguous or based mostly upon a number of elements. Mathematically talking, an outlier could be outlined as knowledge factors greater than three customary deviations away from a imply. Nonetheless, in most real-life issues, not all knowledge factors away from a imply are of the identical significance, generally we require a bit extra nuance when flagging outliers.

Picture by Rohanukhade

Let’s take a fast instance:

We have now a dataset of water consumption per family. By analyzing the water consumption as an entire and isolating factors 3 customary deviations from the imply, we are able to shortly get the outliers that use probably the most water.

This nevertheless fails to have in mind the rationale behind the rise in consumption, i.e. there may very well be a number of the reason why the water consumption is excessive, some causes are of extra curiosity…