We've all been in that moment, right? Staring at a chart as if it's some ancient script, wondering how we're supposed to make sense of it all. That's exactly how I felt when I was asked to explain the AUC of the ROC curve at work recently.
Although I had a solid understanding of the math behind it, breaking it down into simple, digestible terms proved to be a challenge. I realized that if I was struggling with it, others probably were too. So, I decided to write this article to share an intuitive way to understand the AUC-ROC curve through a practical example. No dry definitions here, just clear, simple explanations focused on the intuition.
Here's the code [1] used in this article.
Every data scientist goes through a phase of evaluating classification models. Amid the array of evaluation metrics, the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC) are indispensable tools for gauging a model's performance. In this article, we will discuss the basic concepts and see them in action using our good old Titanic dataset [2].
Section 1: ROC Curve
At its core, the ROC curve visually portrays the delicate balance between a model's sensitivity and specificity across varying classification thresholds.
To fully grasp the ROC curve, let's go over the key concepts:
- Sensitivity/Recall (True Positive Rate): Sensitivity quantifies a model's ability to correctly identify positive instances. In our Titanic example, sensitivity corresponds to the proportion of actual survival cases that the model correctly labels as positive.

- Specificity (True Negative Rate): Specificity measures a model's ability to correctly identify negative instances. For our dataset, it represents the proportion of actual non-survival cases (Survival = 0) that the model correctly identifies as negative.

- False Positive Rate (FPR): FPR measures the proportion of negative instances that are incorrectly classified as positive by the model.

Notice that specificity and FPR are complementary to each other. While specificity focuses on the correct classification of negative instances, FPR focuses on the incorrect classification of negative instances as positive. Thus:

FPR = 1 − Specificity
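To make these definitions concrete, here is a tiny Python sketch (not from the article's notebook) that computes all three metrics from hypothetical confusion-matrix counts; the numbers are made up purely for illustration.

```python
# Hypothetical confusion-matrix counts, chosen only to illustrate the formulas.
tp, fn, fp, tn = 80, 20, 30, 120

sensitivity = tp / (tp + fn)   # True Positive Rate / Recall
specificity = tn / (tn + fp)   # True Negative Rate
fpr = fp / (fp + tn)           # False Positive Rate

print(sensitivity, specificity, fpr)
print(abs(fpr - (1 - specificity)) < 1e-12)   # FPR = 1 - Specificity
```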
Now that we know the definitions, let's work through an example. For the Titanic dataset, I've built a simple logistic regression model that predicts whether a passenger survived the shipwreck, using the following features: passenger class, sex, number of siblings/spouses aboard, passenger fare, and port of embarkation. Note that the model predicts the 'probability of survival'. The default threshold for logistic regression in sklearn is 0.5. However, this default threshold may not always make sense for the problem at hand, and we may need to play around with the probability threshold, i.e. if the predicted probability > threshold, the instance is predicted to be positive, otherwise negative.
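As a rough sketch of what such a model could look like (this is not the author's exact notebook code), the snippet below fits a logistic regression on the Kaggle Titanic training file, assuming the standard column names (Pclass, Sex, SibSp, Fare, Embarked, Survived) and a local file named train.csv.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("train.csv")
features = ["Pclass", "Sex", "SibSp", "Fare", "Embarked"]

# One-hot encode the categorical features and drop rows with missing values.
X = pd.get_dummies(df[features], columns=["Sex", "Embarked"], drop_first=True)
y = df["Survived"]
mask = X.notna().all(axis=1)
X, y = X[mask], y[mask]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# The model outputs a probability of survival; the class label depends on the chosen threshold.
proba_test = model.predict_proba(X_test)[:, 1]
```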
Now, let's revisit the definitions of sensitivity, specificity, and FPR above. Since the predicted binary class depends on the probability threshold, these three metrics will change for a given model based on the threshold we use. If we use a higher probability threshold, we will classify fewer cases as positive, i.e. our true positives will be fewer, resulting in lower sensitivity/recall. A higher probability threshold also means fewer false positives, so lower FPR. As such, increasing sensitivity/recall may come at the cost of increased FPR.
For our training data, we will use 10 different probability cutoffs, calculate sensitivity/TPR and FPR at each, and plot them in the chart below. Note that the size of the circles in the scatterplot corresponds to the probability threshold used for classification.

Well, that's it. The graph we created above, plotting Sensitivity (TPR) vs. FPR at various probability thresholds, IS the ROC curve!
In our experiment, we used 10 different probability cutoffs with an increment of 0.1, giving us 10 observations. If we use a smaller increment for the probability threshold, we will end up with more data points and the graph will look like the familiar ROC curve.
To confirm our understanding, for the model we built to predict passenger survival, we will loop through various predicted probability thresholds and calculate TPR and FPR on the test dataset (see the code sketch below). We then plot the results and compare the graph with the ROC curve plotted using sklearn's roc_curve [3].
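A minimal sketch of that loop is shown below; it reuses the hypothetical `X_test`, `y_test`, and `proba_test` from the earlier snippet, and the thresholds and plotting details are illustrative rather than the article's original code.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

thresholds = np.arange(0.0, 1.01, 0.02)
tpr_manual, fpr_manual = [], []

for t in thresholds:
    pred = (proba_test >= t).astype(int)
    tp = np.sum((pred == 1) & (y_test == 1))
    fn = np.sum((pred == 0) & (y_test == 1))
    fp = np.sum((pred == 1) & (y_test == 0))
    tn = np.sum((pred == 0) & (y_test == 0))
    tpr_manual.append(tp / (tp + fn))   # sensitivity at this threshold
    fpr_manual.append(fp / (fp + tn))   # false positive rate at this threshold

# Compare the manually computed curve against sklearn's roc_curve.
fpr_sk, tpr_sk, _ = roc_curve(y_test, proba_test)
plt.plot(fpr_manual, tpr_manual, "o-", label="manual thresholds")
plt.plot(fpr_sk, tpr_sk, "--", label=f"sklearn (AUC={roc_auc_score(y_test, proba_test):.2f})")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
```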

As we can see, the two curves are almost identical. Note that the AUC = 0.92 was calculated using the roc_auc_score [4] function. We'll discuss this AUC in a later part of this article.
To summarize, the ROC curve plots the model's TPR against FPR at various probability thresholds. Note that the actual probabilities are NOT displayed in the graph, but one can infer that the observations on the lower-left side of the curve correspond to higher probability thresholds (low TPR), and observations on the top-right side correspond to lower probability thresholds (high TPR).
To visualize this, refer to the chart below, where I've annotated TPR and FPR at different probability cutoffs.

Section 2: AUC
Now that we have developed some intuition around what the ROC curve is, the next step is to understand the Area Under the Curve (AUC). But before delving into the specifics, let's think about what a perfect classifier looks like. In the ideal case, we want the model to achieve perfect separation between positive and negative observations. In other words, the model assigns low probabilities to negative observations and high probabilities to positive observations, with no overlap. Thus, there will exist some probability cutoff such that all observations with predicted probability < cutoff are negative, and all observations with predicted probability >= cutoff are positive. When this happens, the True Positive Rate will be 1 and the False Positive Rate will be 0. So the ideal state to achieve is TPR = 1 and FPR = 0. In reality, this rarely happens, and a more practical goal is to maximize TPR while minimizing FPR.
Typically, as TPR increases with a decreasing probability threshold, the FPR also increases (see chart 1). We want TPR to be much higher than FPR. This is characterized by an ROC curve that bends toward the top-left corner. The following ROC space chart shows the perfect classifier as a blue circle (TPR = 1 and FPR = 0). Models whose ROC curves sit closer to the blue circle are better. Intuitively, it means the model is able to cleanly separate negative and positive observations. Among the ROC curves in the following chart, light blue is best, followed by green and orange. The dashed diagonal line represents random guessing (think of a coin flip).

Now that we understand that ROC curves skewed toward the top left are better, how do we quantify this? Mathematically, we can quantify it by calculating the Area Under the Curve. The AUC of the ROC curve is always between 0 and 1 because the ROC space is bounded between 0 and 1 on both axes. Among the ROC curves above, the model corresponding to the light blue curve is better than the green and orange ones because it has a higher AUC.
But how is AUC calculated? Computationally, AUC involves integrating the ROC curve. For models producing discrete predictions, AUC can be approximated using the trapezoidal rule [6]. In its simplest form, the trapezoidal rule works by approximating the region under the graph as a series of trapezoids and summing their areas. I'll probably discuss this in another article.
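For the curious, here is a hedged illustration of the idea, applied to the `fpr_sk` and `tpr_sk` arrays from the earlier sketch. It is a simplified approximation for intuition, not sklearn's exact implementation.

```python
import numpy as np

def trapezoidal_auc(fpr, tpr):
    """Approximate the area under the ROC curve by summing trapezoid areas."""
    order = np.argsort(fpr)                    # integrate left to right along the FPR axis
    fpr, tpr = np.asarray(fpr)[order], np.asarray(tpr)[order]
    widths = np.diff(fpr)                      # base of each trapezoid
    heights = (tpr[:-1] + tpr[1:]) / 2.0       # average of the two parallel sides
    return np.sum(widths * heights)

print(trapezoidal_auc(fpr_sk, tpr_sk))         # should be close to roc_auc_score's value
```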
This brings us to the last and most awaited part: how to intuitively make sense of AUC? Let's say you built a first version of a classification model with an AUC of 0.7, and you later fine-tuned the model. The revised model has an AUC of 0.9. We understand that the model with the higher AUC is better. But what does it really mean? What does it imply about our improved predictive power? Why does it matter? Well, there's a lot of literature explaining AUC and its interpretation. Some of it is too technical, some incomplete, and some outright incorrect! One interpretation that made the most sense to me is:
AUC is the probability that a randomly chosen positive instance has a higher predicted probability than a randomly chosen negative instance.
Let's verify this interpretation. For the simple logistic regression we built, we will visualize the predicted probabilities of the positive and negative classes (i.e. survived the shipwreck or not).
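One simple way to do this (a sketch under the same assumptions as the earlier snippets, not the article's original plotting code) is to overlay histograms of the predicted probabilities for the two classes.

```python
import matplotlib.pyplot as plt

# Reuses `proba_test` and `y_test` from the earlier sketch.
plt.hist(proba_test[y_test == 1], bins=25, alpha=0.6, label="Survived (positive)")
plt.hist(proba_test[y_test == 0], bins=25, alpha=0.6, label="Did not survive (negative)")
plt.xlabel("Predicted probability of survival")
plt.ylabel("Count")
plt.legend()
plt.show()
```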

We can see the model performs quite well in assigning a higher probability to survived cases than to those that did not survive. There is some overlap of probabilities in the middle section. The AUC calculated using sklearn's roc_auc_score function for our model on the test dataset is 0.92 (see chart 2). So, based on the above interpretation of AUC, if we randomly pick one positive instance and one negative instance, the probability that the positive instance has a higher predicted probability than the negative instance should be ~92%.
For this purpose, we will create pools of the predicted probabilities for the positive and negative outcomes. We then randomly select one observation from each pool and compare their predicted probabilities, repeating this 100K times. Finally, we calculate the percentage of times the predicted probability of the positive instance was greater than that of the negative instance. If our interpretation is correct, this should be equal to the AUC, i.e. ~0.92 (see the sketch below).
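Here is a minimal sketch of that simulation, again reusing `proba_test` and `y_test` from the earlier snippets; the random seed and trial count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)
pos_pool = np.asarray(proba_test)[np.asarray(y_test) == 1]   # predicted probabilities of positives
neg_pool = np.asarray(proba_test)[np.asarray(y_test) == 0]   # predicted probabilities of negatives

n_trials = 100_000
pos_draws = rng.choice(pos_pool, size=n_trials, replace=True)
neg_draws = rng.choice(neg_pool, size=n_trials, replace=True)

# Fraction of random pairs where the positive instance out-scores the negative one.
print(np.mean(pos_draws > neg_draws))   # should land near the AUC (~0.92)
```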

We did indeed get 0.92! Hope this helps.
Let me know your thoughts in the comments, and feel free to connect with me on LinkedIn.
Note: this article is a revised version of the original article that I wrote on Medium in 2023.
References:
1. https://github.com/Swpnilsp/ROC-AUC-Curve/blob/main/RoC_Curve_Analysis%20(2).ipynb
2. https://www.kaggle.com/competitions/titanic/data (License: CC0, Public Domain)
3. https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html
4. https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html
5. https://en.wikipedia.org/wiki/Receiver_operating_characteristic
6. https://en.wikipedia.org/wiki/Trapezoidal_rule