Uncertainty Quantification in Machine Learning with an Easy Python Interface

Uncertainty quantification (UQ) in a machine learning (ML) model allows one to estimate the precision of its predictions. This is extremely important for using those predictions in real-world tasks. For instance, if a machine learning model is trained to predict a property of a material, a predicted value with a 20% uncertainty (error) is likely to be used very differently from a predicted value with a 5% uncertainty (error) in the overall decision-making process. Despite its importance, UQ capabilities aren't available with popular ML software in Python, such as scikit-learn, TensorFlow, and PyTorch.

Enter ML Uncertainty: a Python package designed to address this problem. Built on top of popular Python libraries such as SciPy and scikit-learn, ML Uncertainty provides a very intuitive interface to estimate uncertainties in ML predictions and, where possible, model parameters. Requiring only about four lines of code to perform these estimations, the package leverages powerful and theoretically rigorous mathematical methods in the background. It exploits the underlying statistical properties of the ML model in question, which makes the package computationally inexpensive. Moreover, this approach extends its applicability to real-world use cases where, often, only small amounts of data are available.

Motivation

I have been an avid Python user for the last 10 years. I love the wide variety of powerful libraries that have been created and maintained, and the community, which is very active. The idea for ML Uncertainty came to me when I was working on a hybrid ML problem. I had built an ML model to predict the stress-strain curves of some polymers. Stress-strain curves, an important property of polymers, obey certain physics-based rules; for instance, they have a linear region at low strain values, and the tensile modulus decreases with temperature.

From the literature, I found some non-linear models that describe these curves and behaviors, thereby reducing each stress-strain curve to a set of parameters, each with some physical meaning. Then, I trained an ML model to predict these parameters from some easily measurable polymer attributes. Notably, I only had a few hundred data points, as is quite common in scientific applications. Having trained the model, fine-tuned the hyperparameters, and performed the outlier analysis, one of the stakeholders asked me: "This is all good, but what are the error estimates on your predictions?" And I realized that there wasn't an elegant way to estimate this with Python. I also realized that this wasn't going to be the last time this problem would come up. And that led me down the path that culminated in this package.

Having spent some time studying statistics, I suspected that the math for this wasn't impossible, or even that hard. I began researching and reading books like Introduction to Statistical Learning and Elements of Statistical Learning1,2 and found some answers there. ML Uncertainty is my attempt at implementing some of those methods in Python to integrate statistics more tightly into machine learning. I believe that the future of machine learning depends on our ability to increase the reliability of predictions and the interpretability of models, and this is a small step towards that goal. Having developed this package, I have frequently used it in my work, and it has benefited me greatly.

This is an introduction to ML Uncertainty with an overview of the theories underpinning it. I have included some equations to explain the theory, but if these feel overwhelming, feel free to gloss over them. For every equation, I have stated the key idea it represents.

Getting started: An example

We often learn best by doing. So, before diving deeper, let's consider an example. Say we're working on a good old-fashioned linear regression problem where the model is trained with scikit-learn. We think that the model has been trained well, but we want more information. For instance, what are the prediction intervals for the outputs? With ML Uncertainty, this can be achieved in four lines, as sketched below and discussed in this example.
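Here is a minimal sketch of what those four lines can look like. The class ParametricModelInference and the method get_intervals are named in this article; the import path, the setup method, and the keyword arguments used below (set_up_model_inference, X_train, y_train, estimator, confidence_level) are assumptions for illustration and may differ from the package's actual API.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from ml_uncertainty.model_inference import ParametricModelInference  # import path assumed

# Toy data: y = 2*x0 + 3*x1 + noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, 3.0]) + rng.normal(scale=0.5, size=100)
regr = LinearRegression().fit(X, y)

# The four lines: wrap the fitted estimator, then request prediction intervals.
inf = ParametricModelInference()
inf.set_up_model_inference(X_train=X, y_train=y, estimator=regr)  # signature assumed
df_intervals = inf.get_intervals(X, confidence_level=90.0)        # kwargs assumed
print(df_intervals.head())
```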

Illustrating ML Uncertainty code (a) and plot (b) for linear regression. Image by author.

All examples for this package can be found here: https://github.com/architdatar/ml_uncertainty/tree/main/examples.

Delving deeper: A peek under the hood

ML Uncertainty performs these computations by having the ParametricModelInference class wrap around the LinearRegression estimator from scikit-learn to extract all the information it needs to perform the uncertainty calculations. It follows the standard procedure for uncertainty estimation, which is detailed in many a statistics textbook,2 an overview of which is shown below.

Since this is a linear model that can be expressed in terms of parameters (\( \beta \)) as \( y = X\beta \), ML Uncertainty first computes the degrees of freedom for the model (\( p \)), the error degrees of freedom (\( n - p - 1 \)), and the residual variance (\( \hat{\sigma}^2 \)). Then, it computes the uncertainty in the model parameters; i.e., the variance-covariance matrix.3

\( \text{Var}(\hat{\beta}) = \hat{\sigma}^2 (J^T J)^{-1} \)

where \( J \) is the Jacobian matrix for the parameters. For linear regression, this translates to:

\( \text{Var}(\hat{\beta}) = \hat{\sigma}^2 (X^T X)^{-1} \)

Finally, the get_intervals function computes the prediction intervals by propagating the uncertainties in both the inputs and the parameters. Thus, for data \( X^* \) where predictions and uncertainties are to be estimated, the predictions \( \hat{y}^* \) together with the \( (1 - \alpha) \times 100\% \) prediction interval are:

\( \hat{y}^* \pm t_{1 - \alpha/2,\, n - p - 1} \sqrt{\text{Var}(\hat{y}^*)} \)

where

\( \text{Var}(\hat{y}^*) = (\nabla_X f)(\delta X^*)^2(\nabla_X f)^T + (\nabla_\beta f)(\delta \hat{\beta})^2(\nabla_\beta f)^T + \hat{\sigma}^2 \)

In English, this means that the uncertainty in the output depends on the uncertainty in the inputs, the uncertainty in the parameters, and the residual uncertainty. Simplified for a multiple linear model, and assuming no uncertainty in the inputs, this translates to:

\( \text{Var}(\hat{y}^*) = \hat{\sigma}^2 \left(1 + X^* (X^T X)^{-1} X^{*T} \right) \)
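To make these formulas concrete, here is a small self-contained numpy/SciPy sketch (independent of the package) that computes an ordinary least squares prediction interval directly from the equations above; for simplicity it fits without an intercept, so the error degrees of freedom are \( n - p \).

```python
import numpy as np
from scipy import stats

# Synthetic data: y = 2*x0 - 1*x1 + noise.
rng = np.random.default_rng(1)
n, p = 50, 2
X = rng.normal(size=(n, p))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.3, size=n)

# OLS fit and residual variance estimate.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
dof = n - p                        # error degrees of freedom (no intercept)
sigma2_hat = resid @ resid / dof   # residual variance

# 95% prediction interval for a new point x_star.
x_star = np.array([1.0, 0.5])
y_hat = x_star @ beta_hat
var_y = sigma2_hat * (1 + x_star @ np.linalg.solve(X.T @ X, x_star))
half_width = stats.t.ppf(0.975, dof) * np.sqrt(var_y)
print(f"prediction: {y_hat:.3f} +/- {half_width:.3f}")
```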

Extensions to linear regression

So, this is what goes on under the hood when those four lines of code are executed for linear regression. But this isn't all. ML Uncertainty comes equipped with two more powerful capabilities:

  1. Regularization: ML Uncertainty supports L1, L2, and L1+L2 regularization. Combined with linear regression, this means that it can cater to LASSO, ridge, and elastic net regressions. Check out this example.
  2. Weighted least squares regression: Sometimes, not all observations are equal. We might want to give more weight to some observations and less weight to others. Commonly, this happens in science when some observations have a high amount of uncertainty while others are more precise. We want our regression to reflect the more precise ones, but we cannot fully discard the ones with high uncertainty. For such cases, weighted least squares regression is used.

Most importantly, a key assumption of linear regression is something known as homoscedasticity; i.e., that the samples of the response variables are drawn from populations with similar variances. If this is not the case, it is handled by assigning weights to responses depending on the inverse of their variance. This can be handled easily in ML Uncertainty by simply specifying the sample weights to be used during training in the y_train_weights parameter of the ParametricModelInference class, and the rest will be taken care of. An application of this is shown in this example, albeit for a non-linear regression case, and a short sketch is given below.
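A minimal sketch of the weighted workflow, under the same assumptions as the earlier snippet: y_train_weights is the parameter named in this article, while the surrounding setup method and keyword arguments are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from ml_uncertainty.model_inference import ParametricModelInference  # import path assumed

# Heteroscedastic data: noise grows with x, violating homoscedasticity.
rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(200, 1))
noise_sd = 0.2 + 0.1 * X[:, 0]
y = 1.5 * X[:, 0] + rng.normal(scale=noise_sd)

# Inverse-variance weights: precise observations count for more.
weights = 1.0 / noise_sd**2
regr = LinearRegression().fit(X, y, sample_weight=weights)

inf = ParametricModelInference()
inf.set_up_model_inference(X_train=X, y_train=y, estimator=regr,
                           y_train_weights=weights)  # setup signature assumed
df_intervals = inf.get_intervals(X[:5])
```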

Basis expansions

I am always fascinated by how much ML we can get done just by doing linear regression properly. Many types of data, such as trends, time series, audio, and images, can be represented by basis expansions. These representations behave like linear models and have many excellent properties. ML Uncertainty can be used to compute uncertainties for these models easily. Check out the examples called spline_synthetic_data, spline_wage_data, and fourier_basis.
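For instance, a noisy sine wave is non-linear in the raw input but linear in a B-spline basis. Here is a sketch using scikit-learn's SplineTransformer; the fitted linear model could then be wrapped for inference just as in the first snippet.

```python
import numpy as np
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import LinearRegression

# Noisy sine wave: non-linear in x, linear in the spline basis.
rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 2 * np.pi, 150)).reshape(-1, 1)
y = np.sin(x[:, 0]) + rng.normal(scale=0.2, size=150)

# Expand x into cubic B-spline features, then fit plain linear regression.
basis = SplineTransformer(degree=3, n_knots=8)
X_basis = basis.fit_transform(x)
regr = LinearRegression().fit(X_basis, y)

# (X_basis, y, regr) can now be passed to ParametricModelInference
# exactly as in the linear regression example above.
```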

Results of ML Uncertainty used for weighted least squares regression, B-spline basis with synthetic data, B-spline basis with wage data, and Fourier basis. Image by author.

Beyond linear regression

We often encounter situations where the underlying model cannot be expressed as a linear model. This commonly occurs in science, for instance, when complex reaction kinetics, transport phenomena, or process control problems are modeled. Standard Python packages like scikit-learn do not allow one to directly fit these non-linear models and perform uncertainty estimation on them. ML Uncertainty ships with a class called NonLinearRegression capable of handling non-linear models. The user can specify the model to be fit, and the class handles fitting with a scikit-learn-like interface that uses the SciPy least_squares function in the background. This can be easily integrated with the ParametricModelInference class for seamless uncertainty estimation. As with linear regression, we can handle weighted least squares and regularization for non-linear regression. Here is an example.
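A sketch of what fitting a non-linear model might look like. NonLinearRegression is the class named in this article; the import path, the constructor and fit signatures, the (X, params) model convention, and the coef_ attribute below are all assumptions for illustration.

```python
import numpy as np
from ml_uncertainty.non_linear_regression import NonLinearRegression  # import path assumed

# Model: exponential decay, y = a * exp(-b * x).
def model(X, params):
    a, b = params
    return a * np.exp(-b * X[:, 0])

# Synthetic data generated from the same model plus noise.
rng = np.random.default_rng(4)
X = rng.uniform(0, 5, size=(80, 1))
y = model(X, [2.0, 0.7]) + rng.normal(scale=0.05, size=80)

# Fit with a scikit-learn-like interface (signatures assumed).
nlr = NonLinearRegression(model=model)
nlr.fit(X, y, p0=np.array([1.0, 1.0]))  # initial parameter guess
print(nlr.coef_)  # fitted (a, b); attribute name assumed
```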

Random Forests

Random forests have gained significant popularity in the field. They operate by averaging the predictions of decision trees. Decision trees, in turn, identify a set of rules to divide the predictor variable space (input space) and assign a response value to each terminal node (leaf). The predictions from the decision trees are averaged to produce a prediction for the random forest.1 They are particularly useful because they can identify complex relationships in data, are accurate, and make fewer assumptions about the data than regressions do.

While they are implemented in popular ML libraries like scikit-learn, there is no simple way to estimate prediction intervals. This is particularly important for regression, as random forests, given their high flexibility, tend to overfit their training data. Since random forests do not have parameters the way traditional regression models do, uncertainty quantification must be performed differently.

We use the basic idea of estimating prediction intervals via bootstrapping as described by Hastie et al. in Chapter 7 of their book Elements of Statistical Learning.2 The central idea we can exploit is that the variance of a prediction \( S(Z) \) for some data \( Z \) can be estimated from the predictions on its bootstrap samples as follows:

\( \widehat{\text{Var}}[S(Z)] = \frac{1}{B - 1} \sum_{b=1}^{B} \left( S(Z^{*b}) - \bar{S}^{*} \right)^2 \)

where \( \bar{S}^{*} = \sum_b S(Z^{*b}) / B \). Bootstrap samples are samples drawn from the original dataset repeatedly and independently, allowing repetition. Luckily for us, random forests are trained using one bootstrap sample per decision tree. So, the predictions from the individual trees form a distribution whose variance gives us the variance of the prediction. But there is still one problem. Let's say we want to obtain the variance in the prediction for the \( i^{\text{th}} \) training sample. If we simply use the formula above, some predictions will come from trees that include the \( i^{\text{th}} \) sample in the bootstrap sample on which they were trained. This would lead to an unrealistically small variance estimate.

To solve this problem, the algorithm implemented in ML Uncertainty only considers predictions from trees that did not use the \( i^{\text{th}} \) sample for training. This results in an unbiased estimate of the variance.

The beautiful thing about this approach is that we don't need any additional re-training steps. Instead, the EnsembleModelInference class elegantly wraps around the RandomForestRegressor estimator in scikit-learn and obtains all the necessary information from it.
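To see the out-of-bag idea in isolation, here is a self-contained sketch that hand-rolls the bootstrap so the in-bag indices are known explicitly. EnsembleModelInference extracts the equivalent information directly from a fitted RandomForestRegressor, so none of this bookkeeping is needed in practice.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(5)
n = 200
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=n)

# Hand-rolled "forest" so we control the bootstrap indices.
B = 200
trees, in_bag = [], []
for _ in range(B):
    idx = rng.integers(0, n, size=n)  # bootstrap sample, drawn with replacement
    trees.append(DecisionTreeRegressor(max_depth=6).fit(X[idx], y[idx]))
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    in_bag.append(mask)

# Variance for training sample i: use only trees that never saw it.
i = 0
oob_preds = np.array([tree.predict(X[i:i + 1])[0]
                      for tree, mask in zip(trees, in_bag) if not mask[i]])
var_i = oob_preds.var(ddof=1)  # the (B - 1)-denominator estimate from above
print(f"Out-of-bag prediction variance for sample {i}: {var_i:.4f}")
```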

This method is benchmarked using the approach described in Zhang et al.,4 which states that a correct \( (1 - \alpha) \times 100\% \) prediction interval is one for which the probability that it contains the observed response is \( (1 - \alpha) \times 100\% \). Mathematically,

\( P(Y \in I_{\alpha}) \approx 1 - \alpha \)

Here is an example showing ML Uncertainty in action for random forest models.

Uncertainty propagation (error propagation)

How much does a given amount of uncertainty in the input variables and/or model parameters affect the uncertainty in the response variable? How does this (epistemic) uncertainty compare to the inherent uncertainty in the response variables (aleatoric uncertainty)? Often, it is important to answer these questions to decide on a course of action. For instance, if one finds that the uncertainty in the model parameters contributes heavily to the uncertainty in the predictions, one might collect more data or investigate other models to reduce this uncertainty. Conversely, if the epistemic uncertainty is smaller than the aleatoric uncertainty, trying to reduce it further might be pointless. With ML Uncertainty, these questions can be answered easily.

Given a model relating the predictor variables to the response variable, the ErrorPropagation class can easily compute the uncertainty in the responses. Say the responses (\( y \)) are related to the predictor variables (\( X \)) via some function (\( f \)) and some parameters (\( \beta \)), expressed as:

\( y = f(X, \beta) \).

We wish to obtain prediction intervals for the responses (\( \hat{y}^* \)) for some predictor data (\( X^* \)), with model parameters estimated as \( \hat{\beta} \). The uncertainties in \( X^* \) and \( \hat{\beta} \) are given by \( \delta X^* \) and \( \delta \hat{\beta} \), respectively. Then, the \( (1 - \alpha) \times 100\% \) prediction interval of the response variables is given by:

\( \hat{y}^* \pm t_{1 - \alpha/2,\, n - p - 1} \sqrt{\text{Var}(\hat{y}^*)} \)

where

\( \text{Var}(\hat{y}^*) = (\nabla_X f)(\delta X^*)^2(\nabla_X f)^T + (\nabla_\beta f)(\delta \hat{\beta})^2(\nabla_\beta f)^T + \hat{\sigma}^2 \)

The important thing to notice here is how the uncertainty in the predictions includes contributions from the inputs, the parameters, and the inherent uncertainty of the response.

The ability of the ML Uncertainty package to propagate both input and parameter uncertainties makes it very useful, particularly in science, where we care strongly about the error (uncertainty) in every value being predicted. Consider the often-mentioned concept of hybrid machine learning. Here, we model the known relationships in the data with first principles and the unknown ones with black-box models. Using ML Uncertainty, the uncertainties obtained from these different methods can be easily propagated through the computation graph.

A very simple example is the Arrhenius model for predicting reaction rate constants. The formula \( k = Ae^{-E_a / RT} \) is very well known. Say the parameters \( A \) and \( E_a \) have been predicted by some ML model and have an uncertainty of 5%. We wish to know how much error that translates to in the reaction rate constant.

This can be achieved very easily with ML Uncertainty, as shown in this example.
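For intuition, here is a plain numpy sketch of the first-order propagation formula above applied to the Arrhenius model, considering parameter uncertainty only and treating the errors in \( A \) and \( E_a \) as independent. The parameter values are made up for illustration; the ErrorPropagation class automates this computation, Jacobians included.

```python
import numpy as np

R = 8.314                       # gas constant, J / (mol K)
A, Ea = 1.0e7, 50_000.0         # parameter estimates (illustrative values)
dA, dEa = 0.05 * A, 0.05 * Ea   # 5% uncertainty in each parameter
T = 300.0                       # temperature, K

k = A * np.exp(-Ea / (R * T))

# Gradients of k with respect to the parameters.
dk_dA = np.exp(-Ea / (R * T))
dk_dEa = -k / (R * T)

# First-order propagation: Var(k) = (dk/dA)^2 dA^2 + (dk/dEa)^2 dEa^2.
dk = np.sqrt((dk_dA * dA) ** 2 + (dk_dEa * dEa) ** 2)
print(f"k = {k:.3e} +/- {dk:.3e} ({100 * dk / k:.0f}%)")
```

Note how the uncertainty in \( E_a \) dominates: because \( E_a \) sits in the exponent, a 5% error in it blows up to an error of roughly 100% in \( k \) at 300 K in this sketch.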

Illustration of uncertainty propagation through a computational graph. Image by author.

Limitations

As of v0.1.1, ML Uncertainty only works for ML models trained with scikit-learn. It supports the following ML models natively: random forest, linear regression, LASSO regression, ridge regression, elastic net, and regression splines. For any other model, the user can supply the model, the residual, the loss function, etc., as shown in the non-linear regression example. The package has not been tested with neural networks, transformers, or other deep learning models.

Contributions from the open-source ML community are welcome and highly appreciated. While there is much to be done, some key areas of effort are adapting ML Uncertainty to other frameworks such as PyTorch and TensorFlow, adding support for other ML models, highlighting issues, and improving documentation.

Benchmarking

The ML Uncertainty code has been benchmarked against the statsmodels package in Python. Specific cases can be found here.

Background

Uncertainty quantification in machine learning has been studied in the ML community, and there is growing interest in this area. However, as of now, the existing solutions are applicable only to very specific use cases and have key limitations.

For linear models, the statsmodels library provides UQ capabilities. While theoretically rigorous, it cannot handle non-linear models. Moreover, the model must be expressed in a format specific to the package. This means that the user cannot take advantage of the powerful preprocessing, training, visualization, and other capabilities provided by ML packages like scikit-learn. And while it can provide confidence intervals based on the uncertainty in the model parameters, it cannot propagate uncertainty in the predictor variables (input variables).

Another family of solutions is model-agnostic UQ. These solutions take subsamples of the training data, train the model repeatedly on them, and use the results to estimate prediction intervals. While often useful in the limit of large data, these methods may not provide accurate estimates for small training datasets, where the choice of samples can lead to significantly different estimates. Moreover, this is a computationally expensive exercise, since the model must be retrained many times. Some packages using this approach are MAPIE, PUNCC, UQPy, and ml_uncertainty by NIST (same name, different package), among many others.5–8

With ML Uncertainty, the goals have been to keep the training of the model separate from its UQ, to cater to more generic models beyond linear regression, to exploit the underlying statistics of the models, and to avoid retraining the model multiple times, keeping it computationally inexpensive.

Summary and future work

This was an introduction to ML Uncertainty, a Python software package for easily computing uncertainties in machine learning. The main features of this package have been introduced here, and some of the philosophy behind its development has been discussed. More detailed documentation and theory can be found in the docs. While this is only a start, there is immense scope to expand it. Questions, discussions, and contributions are always welcome. The code can be found on GitHub, and the package can be installed from PyPI. Give it a try with pip install ml-uncertainty.

References

(1) James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning; Springer US: New York, NY, 2021. https://doi.org/10.1007/978-1-0716-1418-1.

(2) Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning; Springer New York: New York, NY, 2009. https://doi.org/10.1007/978-0-387-84858-7.

(3) Börlin, N. Nonlinear Optimization. https://www8.cs.umu.se/kurser/5DA001/HT07/lectures/lsq-handouts.pdf.

(4) Zhang, H.; Zimmerman, J.; Nettleton, D.; Nordman, D. J. Random Forest Prediction Intervals. Am. Stat. 2020, 74 (4), 392–406. https://doi.org/10.1080/00031305.2019.1585288.

(5) Cordier, T.; Blot, V.; Lacombe, L.; Morzadec, T.; Capitaine, A.; Brunel, N. Flexible and Systematic Uncertainty Estimation with Conformal Prediction via the MAPIE Library. In Conformal and Probabilistic Prediction with Applications; 2023.

(6) Mendil, M.; Mossina, L.; Vigouroux, D. PUNCC: A Python Library for Predictive Uncertainty and Conformalization. In Proceedings of the Twelfth Symposium on Conformal and Probabilistic Prediction with Applications; Papadopoulos, H., Nguyen, K. A., Boström, H., Carlsson, L., Eds.; Proceedings of Machine Learning Research; PMLR, 2023; Vol. 204, pp 582–601.

(7) Tsapetis, D.; Shields, M. D.; Giovanis, D. G.; Olivier, A.; Novak, L.; Chakroborty, P.; Sharma, H.; Chauhan, M.; Kontolati, K.; Vandanapu, L.; Loukrezis, D.; Gardner, M. UQpy v4.1: Uncertainty Quantification with Python. SoftwareX 2023, 24, 101561. https://doi.org/10.1016/j.softx.2023.101561.

(8) Sheen, D. Machine Learning Uncertainty Estimation Toolbox. https://github.com/usnistgov/ml_uncertainty_py.
