Your Classifier Is Broken, But It Is Still Useful | by David Lindelöf | Jan, 2025

When you run a binary classifier over a population you get an estimate of the proportion of true positives in that population. This is known as the prevalence.

Photo by Rod Long on Unsplash

But that estimate is biased, because no classifier is perfect. For example, if your classifier tells you that you have 20% positive cases, but its precision is known to be only 50%, you'd expect the true prevalence to be 0.2 × 0.5 = 0.1, i.e. 10%. But that assumes perfect recall (all true positives are flagged by the classifier). If the recall is less than 1, then you know the classifier missed some true positives, so you also need to normalize the prevalence estimate by the recall.

This leads to the common formula for getting the true prevalence Pr(y=1) from the positive prediction rate Pr(ŷ=1):
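Pr(y=1) = Pr(ŷ=1) × P / R

where P is the precision and R is the recall.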

But suppose that you want to run the classifier more than once. For example, you might want to do this at regular intervals to detect trends in the prevalence. You can't use this formula anymore, because precision depends on the prevalence. To use the formula above you would have to re-estimate the precision regularly (say, with human evaluation), but then you could just as well re-estimate the prevalence itself.

How do we get out of this circular reasoning? It turns out that binary classifiers have other performance metrics (besides precision) that don't depend on the prevalence. These include not only the recall R but also the specificity S, and these metrics can be used to adjust Pr(ŷ=1) to get an unbiased estimate of the true prevalence using this formula (sometimes called prevalence adjustment):
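Pr(y=1) = (Pr(ŷ=1) − (1 − S)) / (R − (1 − S))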

where:

  • Pr(y=1) is the true prevalence
  • S is the specificity
  • R is the sensitivity or recall
  • Pr(ŷ=1) is the proportion of positive predictions

The proof is simple:
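Pr(ŷ=1) = Pr(ŷ=1 | y=1) Pr(y=1) + Pr(ŷ=1 | y=0) Pr(y=0)
        = R × Pr(y=1) + (1 − S) × (1 − Pr(y=1))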

Solving for Pr(y=1) yields the formula above.

Notice that this formula breaks down when the denominator R − (1 − S) becomes 0, i.e. when the recall becomes equal to the false positive rate 1 − S. But remember what a typical ROC curve looks like:

From https://en.wikipedia.org/wiki/Receiver_operating_characteristic#/media/File:Roccurves.png

An ROC curve like this one plots the recall R (aka true positive rate) against the false positive rate 1 − S, so a classifier for which R = 1 − S is a classifier falling on the diagonal of the ROC diagram. This is a classifier that is, essentially, guessing randomly. True cases and false cases are equally likely to be labeled positive by this classifier, so the classifier is completely non-informative, and you can't learn anything from it, certainly not the true prevalence.
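To make this concrete, here is a minimal sketch with made-up numbers (the recall, specificity and flagged proportion below are purely illustrative): for a classifier on the diagonal, the adjustment collapses to 0/0.

# illustrative numbers for a classifier on the ROC diagonal: recall == 1 - specificity
recall      <- 0.6
specificity <- 0.4
p_hat       <- 0.6   # the proportion of cases such a classifier happens to flag

# the prevalence adjustment divides 0 by 0 and returns NaN
(p_hat - (1 - specificity)) / (recall - (1 - specificity))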

Enough theory, let's see if this works in practice:

# randomly draw some covariate
x <- runif(10000, -1, 1)

# compute the success probability (inverse logit) and draw the outcome
logit <- plogis(x)
y <- runif(10000) < logit

# fit a logistic regression model
m <- glm(y ~ x, family = binomial)

# make some predictions, using an absurdly low threshold
# (flag as positive when the fitted probability is below 0.3)
y_hat <- predict(m, type = "response") < 0.3

# get the recall (aka sensitivity) and specificity
cm <- caret::confusionMatrix(factor(y_hat), factor(y), positive = "TRUE")
recall <- unname(cm$byClass['Sensitivity'])
specificity <- unname(cm$byClass['Specificity'])

# get the adjusted prevalence
(mean(y_hat) - (1 - specificity)) / (recall - (1 - specificity))

# compare with the actual prevalence
mean(y)

In this simulation I get recall = 0.049 and specificity = 0.875. The predicted prevalence is a ridiculously biased 0.087, but the adjusted prevalence is essentially equal to the true prevalence (0.498).
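Plugging these numbers into the adjustment formula gives (0.087 − (1 − 0.875)) / (0.049 − (1 − 0.875)) = −0.038 / −0.076 ≈ 0.5, in close agreement with the true prevalence of 0.498.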

To sum up: this shows how, using a classifier's recall and specificity, you can adjust the predicted prevalence in order to track it over time, assuming that the recall and specificity are stable over time. You cannot do this using precision and recall, because precision depends on the prevalence, whereas recall and specificity don't.