lintsampler is a pure Python package that can easily and efficiently generate random samples from any probability distribution.
Full disclosure: I’m one of the authors of lintsampler.
We often find ourselves in situations where we have a probability distribution (PDF) and we need to draw random samples from it. For example, we might want to estimate some summary statistics or to create a population of particles for a simulation.
If the probability distribution is a standard one, such as a uniform distribution or a Gaussian (normal) distribution, then the numpy/scipy ecosystem provides us with some easy ways to draw these samples, via the numpy.random or scipy.stats modules.
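For reference, drawing from standard distributions looks something like the sketch below. The seed, sample size, and distribution parameters are arbitrary choices made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # arbitrary seed, only for reproducibility

# numpy.random: draw directly from a Generator object
uniform_draws = rng.uniform(low=0.0, high=1.0, size=1000)
gaussian_draws = rng.normal(loc=0.0, scale=1.0, size=1000)

# scipy.stats: build a frozen distribution, then call its rvs method
scipy_draws = stats.norm(loc=0.0, scale=1.0).rvs(size=1000, random_state=rng)
```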
However, out in the wild, we often encounter probability distributions that are not Gaussian. Sometimes, they are very not Gaussian. For example:
How would we draw samples from this distribution?
There are a few widely-used techniques to draw samples from arbitrary distributions like this, such as rejection sampling or Markov chain Monte Carlo (MCMC). These are excellent and reliable methods, with some useful Python implementations. For example, emcee is an MCMC sampler widely used in scientific applications.
The problem with these existing techniques is that they require a fair amount of setup and tuning. With rejection sampling, one has to choose a proposal distribution, and a poor choice can make the procedure very inefficient. With MCMC, one has to worry about whether the samples are converged, which typically requires some post-hoc testing to gauge.
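To make the point about proposal distributions concrete, here is a minimal rejection sampling sketch. The bimodal target, the uniform proposal, and the envelope height are all hypothetical choices for illustration; they are not taken from any particular library. The same target is sampled with a snug proposal and with a needlessly wide one, and the acceptance rate drops accordingly.

```python
import numpy as np

ENVELOPE = 1.05  # safe upper bound on the target's maximum (just above 1)

def target(x):
    # Hypothetical unnormalised bimodal density with peaks near x = -2 and x = +2
    return np.exp(-0.5 * (x - 2.0) ** 2) + np.exp(-0.5 * (x + 2.0) ** 2)

def rejection_sample(n, half_width, rng):
    """Draw n samples using a uniform proposal on [-half_width, half_width]."""
    batches, n_accepted, n_proposed = [], 0, 0
    while n_accepted < n:
        x = rng.uniform(-half_width, half_width, size=n)  # proposed points
        u = rng.uniform(0.0, ENVELOPE, size=n)            # heights under the envelope
        keep = x[u < target(x)]                           # accept points under the target
        batches.append(keep)
        n_accepted += keep.size
        n_proposed += n
    return np.concatenate(batches)[:n], n_accepted / n_proposed

rng = np.random.default_rng(0)
samples_snug, rate_snug = rejection_sample(20_000, half_width=6.0, rng=rng)
samples_wide, rate_wide = rejection_sample(20_000, half_width=60.0, rng=rng)
print(f"acceptance rate, snug proposal: {rate_snug:.3f}")
print(f"acceptance rate, wide proposal: {rate_wide:.3f}")
```

Both runs return valid samples, but the wide proposal throws away roughly ten times as many candidates, because the proposal covers ten times the width while the target's mass stays the same.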
Enter lintsampler. It’s as easy as:
from lintsampler import LintSampler
import numpy as np

x = np.linspace(xmin, xmax, ngrid)
y = np.linspace(ymin, ymax, ngrid)
sampler = LintSampler((x, y), pdf)
pts = sampler.sample(N=100000)
In this code snippet, we constructed 1D arrays along each of the two dimensions, then we fed them to the LintSampler object (imported from the lintsampler package) along with a pdf function representing the probability distribution we want to draw samples from. We didn’t spell out the pdf function in this snippet, but there are some fully self-contained examples in the docs.
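As a rough idea of what the missing pieces might look like, here is one hypothetical pdf function together with some grid settings. The two-blob density and all the numbers are invented for illustration. The sketch also assumes that the sampler calls pdf with a single coordinate vector (here a 2-element array) and accepts an unnormalised density; check the lintsampler docs for the exact calling convention before relying on this.

```python
import numpy as np

def pdf(point):
    # Hypothetical target: unnormalised mixture of two 2D Gaussian blobs.
    # Assumes `point` is a 2-element coordinate vector (x, y).
    x, y = point
    blob_a = np.exp(-0.5 * ((x - 1.5) ** 2 + (y - 1.0) ** 2) / 0.5 ** 2)
    blob_b = np.exp(-0.5 * ((x + 1.5) ** 2 + (y + 1.0) ** 2) / 1.0 ** 2)
    return blob_a + 0.5 * blob_b

# Hypothetical grid settings to go with it
xmin, xmax = -6.0, 6.0
ymin, ymax = -6.0, 6.0
ngrid = 128
```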
Now, pts is an array containing 100000 samples from the PDF. Here they are in a scatter plot:
The point of this example was to demonstrate how easy it is to set up and use lintsampler. In certain cases, it is also much faster and more efficient than MCMC and/or rejection sampling. If you’re interested to find out how lintsampler works under the hood, read on. Otherwise, visit the docs, where there are instructions describing how to install and use lintsampler, including example notebooks with 1D, 2D, and 3D use cases, as well as descriptions of some of lintsampler’s additional features: quasi Monte Carlo sampling (a.k.a. low discrepancy sequencing), and sampling on an adaptive tree structure. There is also a paper published in the Journal of Open Source Software (JOSS) describing lintsampler.
Underlying lintsampler is an algorithm we call linear interpolant sampling. The theory section of the docs gives a more detailed and more mathematical description of how the algorithm works, but here it is in short.
The example below illustrates what happens under the hood in lintsampler when you feed a PDF and a grid to the LintSampler class. We’ll take an easy example of a 2D Gaussian, but this methodology applies in any number of dimensions, and with much less friendly PDFs.
- First, the PDF gets evaluated on the grid. In the example below, the grid has uneven spacings, just for fun.
- Having evaluated the PDF on the grid in this way, we can estimate the total probability of each grid cell according to the trapezium rule (i.e., volume of the cell multiplied by the average of its corner densities).
- Within each grid cell, we can approximate the PDF with the bilinear interpolant between the cell corners:
- This linear approximation to the PDF can then be sampled very efficiently. Drawing a single sample is a two-step process, illustrated in the figure below. First, choose a random cell from the probability-weighted list of cells (left-hand panel). Next, sample a point within the cell via inverse transform sampling (right-hand panel).
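The steps above can be sketched in plain numpy as follows. This is a simplified 2D illustration written for this article, not lintsampler’s actual implementation. The function names, the Gaussian target, and the grid are all assumptions, and the sketch assumes the density is strictly positive at every grid node.

```python
import numpy as np

def gaussian_pdf(x, y):
    # Unnormalised 2D standard Gaussian; stands in for any density
    return np.exp(-0.5 * (x ** 2 + y ** 2))

def linear_quantile(u, f0, f1):
    # Inverse CDF on [0, 1] of a density proportional to f0 + (f1 - f0) * z
    return u * (f0 + f1) / (f0 + np.sqrt(f0 ** 2 + u * (f1 ** 2 - f0 ** 2)))

def sample_bilinear(x, y, pdf, n, rng):
    # Step 1: evaluate the PDF at every grid node
    X, Y = np.meshgrid(x, y, indexing="ij")
    f = pdf(X, Y)
    # Step 2: trapezium rule, cell mass = cell area * mean of corner densities
    area = np.outer(np.diff(x), np.diff(y))
    corner_mean = 0.25 * (f[:-1, :-1] + f[1:, :-1] + f[:-1, 1:] + f[1:, 1:])
    mass = area * corner_mean
    # Step 3a: choose cells with probability proportional to their mass
    flat = rng.choice(mass.size, size=n, p=mass.ravel() / mass.sum())
    i, j = np.unravel_index(flat, mass.shape)
    f00, f10, f01, f11 = f[i, j], f[i + 1, j], f[i, j + 1], f[i + 1, j + 1]
    # Step 3b: inverse transform sampling inside the unit cell.
    # The marginal along the first axis is linear, with end values
    # proportional to f00 + f01 and f10 + f11.
    zx = linear_quantile(rng.random(n), f00 + f01, f10 + f11)
    # At fixed zx, the conditional along the second axis is linear too
    g0 = f00 + (f10 - f00) * zx
    g1 = f01 + (f11 - f01) * zx
    zy = linear_quantile(rng.random(n), g0, g1)
    # Map unit-cell coordinates back onto the grid
    px = x[i] + zx * (x[i + 1] - x[i])
    py = y[j] + zy * (y[j + 1] - y[j])
    return np.column_stack([px, py])

# Uneven grid: finer near the centre, coarser in the tails
edges = 4.0 * np.sinh(np.linspace(-1.5, 1.5, 41)) / np.sinh(1.5)
rng = np.random.default_rng(1)
pts = sample_bilinear(edges, edges, gaussian_pdf, 50_000, rng)
```

Because the bilinear interpolant integrates to exactly the mean of the four corner values on a unit cell, the cell masses in step 2 and the within-cell sampling in step 3b describe the same piecewise approximation to the PDF.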
It’s worth understanding that the key step here is the linear approximation: we describe this, as well as more details of the inverse transform sampling process, in the lintsampler docs. Approximating the PDF with a linear function within each grid cell means it has a closed, analytic form for its quantile function (i.e., its inverse CDF), which means doing inverse transform sampling essentially boils down to drawing uniform samples and applying an algebraic function to them.
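In one dimension, that algebraic function can be written down in a couple of lines. The sketch below is an illustration under stated assumptions, not lintsampler’s code. It takes a density on the unit interval that is proportional to f0 + (f1 - f0) * z, where f0 and f1 are hypothetical endpoint densities, integrates it to get the CDF, and inverts the resulting quadratic. The quantile is written in a form that stays well behaved when f0 equals f1.

```python
import numpy as np

def linear_cdf(z, f0, f1):
    # CDF on [0, 1] of the density proportional to f0 + (f1 - f0) * z
    return (f0 * z + 0.5 * (f1 - f0) * z ** 2) / (0.5 * (f0 + f1))

def linear_quantile(u, f0, f1):
    # Solves linear_cdf(z) = u for z. Algebraically equal to the usual
    # quadratic-formula root, rearranged to avoid dividing by (f1 - f0).
    return u * (f0 + f1) / (f0 + np.sqrt(f0 ** 2 + u * (f1 ** 2 - f0 ** 2)))

rng = np.random.default_rng(0)
u = rng.random(200_000)                  # uniform draws
z = linear_quantile(u, f0=1.0, f1=3.0)   # samples from the linear density
```

Feeding the samples back through linear_cdf recovers the original uniform draws, which is a quick way to check that the quantile function really is the inverse of the CDF.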
The main thing the user needs to worry about is getting a decent grid resolution, so that the linear approximation is sufficient. What a good resolution is will vary from use case to use case, as demonstrated in some of the example notebooks in the lintsampler docs.
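One rough way to get a feel for resolution is to measure how far the linear interpolant strays from the true PDF as the grid is refined. The 1D sketch below does this for a standard Gaussian. The function names, grid range, and grid sizes are hypothetical choices for illustration, and this kind of check is only possible when the PDF is cheap to evaluate on a finer grid.

```python
import numpy as np

def gaussian(x):
    return np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)

def max_interp_error(ngrid, xmin=-5.0, xmax=5.0):
    # Largest gap between the true PDF and its piecewise-linear interpolant
    nodes = np.linspace(xmin, xmax, ngrid)
    fine = np.linspace(xmin, xmax, 2001)  # dense points for the comparison
    approx = np.interp(fine, nodes, gaussian(nodes))
    return np.max(np.abs(approx - gaussian(fine)))

for ngrid in (11, 41, 101):
    print(ngrid, max_interp_error(ngrid))
```

The error shrinks roughly with the square of the grid spacing, so a modest increase in resolution can pay off quickly.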
Happy sampling!