Probabilistic time series forecasting with compositional Bayesian neural networks

AutoBNN is based on a line of research that over the past decade has yielded improved predictive accuracy by modeling time series using GPs with learned kernel structures. The kernel function of a GP encodes assumptions about the function being modeled, such as the presence of trends, periodicity or noise. With learned GP kernels, the kernel function is defined compositionally: it is either a base kernel (such as Linear, Quadratic, Periodic, Matérn or ExponentiatedQuadratic) or a composite that combines two or more kernel functions using operators such as Addition, Multiplication, or ChangePoint. This compositional kernel structure serves two related purposes. First, it is simple enough that a user who is an expert about their data, but not necessarily about GPs, can construct a reasonable prior for their time series. Second, techniques like Sequential Monte Carlo can be used for discrete searches over small structures and can output interpretable results. A minimal sketch of this idea is shown below.
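The following sketch illustrates the compositional idea in plain NumPy. It is not the AutoBNN or TensorFlow Probability API; the kernel and operator names are chosen to mirror the terminology above, and the hyperparameter values are arbitrary.

```python
# Illustrative sketch of compositional GP kernels (not the AutoBNN API).
# Each kernel is a function k(x1, x2) -> covariance; operators combine
# existing kernels into new ones.
import numpy as np

def linear(variance=1.0):
    return lambda x1, x2: variance * x1 * x2

def exponentiated_quadratic(length_scale=1.0):
    return lambda x1, x2: np.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)

def periodic(period=1.0, length_scale=1.0):
    return lambda x1, x2: np.exp(
        -2.0 * np.sin(np.pi * np.abs(x1 - x2) / period) ** 2 / length_scale ** 2)

def add(k1, k2):
    return lambda x1, x2: k1(x1, x2) + k2(x1, x2)

def multiply(k1, k2):
    return lambda x1, x2: k1(x1, x2) * k2(x1, x2)

# A trend-plus-seasonality prior: Linear + (Periodic * ExponentiatedQuadratic).
kernel = add(linear(),
             multiply(periodic(period=12.0), exponentiated_quadratic(24.0)))
print(kernel(1.0, 3.0))
```

A domain expert who knows their series has a trend and a yearly cycle can write down a structure like the one above without ever reasoning about GP posteriors directly.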

AutoBNN improves upon these ideas, replacing the GP with Bayesian neural networks (BNNs) while retaining the compositional kernel structure. A BNN is a neural network with a probability distribution over weights rather than a fixed set of weights. This induces a distribution over outputs, capturing uncertainty in the predictions. BNNs bring the following advantages over GPs. First, training large GPs is computationally expensive, and traditional training algorithms scale as the cube of the number of data points in the time series. In contrast, for a fixed width, training a BNN will often be approximately linear in the number of data points. Second, BNNs lend themselves better to GPU and TPU hardware acceleration than GP training operations. Third, compositional BNNs can easily be combined with traditional deep BNNs, which have the ability to do feature discovery. One could imagine “hybrid” architectures, in which users specify a top-level structure of Add(Linear, Periodic, Deep), and the deep BNN is left to learn the contributions from potentially high-dimensional covariate information, as sketched below.
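To make the hybrid idea concrete, here is a conceptual sketch (again, not the AutoBNN implementation) of a single draw from the prior of Add(Linear, Periodic, Deep): each leaf is a small network with weights sampled from a prior, and the leaf outputs are summed. The leaf constructions, widths and priors below are illustrative assumptions.

```python
# Conceptual sketch of an additive "hybrid" model Add(Linear, Periodic, Deep).
import numpy as np

rng = np.random.default_rng(0)

def sample_linear_leaf(x):
    # Linear trend with weights drawn from a standard Gaussian prior.
    w, b = rng.normal(size=2)
    return w * x + b

def sample_periodic_leaf(x, period=12.0, width=10):
    # A layer of sinusoidal units with random phases models seasonality.
    phases = rng.uniform(0.0, 2.0 * np.pi, size=width)
    readout = rng.normal(scale=1.0 / np.sqrt(width), size=width)
    return np.sin(2.0 * np.pi * x[:, None] / period + phases) @ readout

def sample_deep_leaf(covariates, width=10):
    # Deep leaf: a small MLP over (possibly high-dimensional) covariates.
    w1 = rng.normal(scale=1.0 / np.sqrt(covariates.shape[1]),
                    size=(covariates.shape[1], width))
    w2 = rng.normal(scale=1.0 / np.sqrt(width), size=width)
    return np.tanh(covariates @ w1) @ w2

x = np.linspace(0.0, 48.0, 100)
covariates = rng.normal(size=(100, 5))
# One prior draw from the composite function.
f = sample_linear_leaf(x) + sample_periodic_leaf(x) + sample_deep_leaf(covariates)
```

The user-specified leaves encode interpretable structure (trend, seasonality), while the deep leaf is free to discover whatever signal the covariates carry.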

How might one translate a GP with compositional kernels into a BNN then? A single layer neural network will typically converge to a GP as the number of neurons (or “width”) goes to infinity. More recently, researchers have discovered a correspondence in the other direction: many popular GP kernels (such as Matérn, Exponentiated Quadratic, Polynomial or Periodic) can be obtained as infinite-width BNNs with appropriately chosen activation functions and weight distributions. Furthermore, these BNNs remain close to the corresponding GP even when the width is very much less than infinite. For example, the figures below show the difference in the covariance between pairs of observations, and regression results of the true GPs and their corresponding width-10 neural network versions.
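One well-known instance of this correspondence is the random Fourier feature construction of the Exponentiated Quadratic kernel: a one-hidden-layer network with cosine activations, Gaussian input weights and uniform phases has exactly that kernel as its prior covariance. The sketch below, with assumed widths and length scales, estimates the covariance of such a width-10 network from prior draws and compares it with the analytic kernel; it is meant only to illustrate the kind of correspondence described above, not the exact construction AutoBNN uses.

```python
# Sketch: a finite-width BNN whose prior covariance approximates the
# ExponentiatedQuadratic kernel (random Fourier feature construction).
import numpy as np

rng = np.random.default_rng(0)
length_scale = 1.0
width = 10          # same order as the width-10 networks mentioned above
n_samples = 20000   # prior draws used to estimate the covariance

def sample_bnn_function(x, width, length_scale):
    # f(x) = sqrt(2/width) * sum_i a_i * cos(w_i * x + b_i), with
    # w ~ N(0, 1/length_scale^2), b ~ Uniform(0, 2*pi), a ~ N(0, 1).
    w = rng.normal(scale=1.0 / length_scale, size=width)
    b = rng.uniform(0.0, 2.0 * np.pi, size=width)
    a = rng.normal(size=width)
    return np.sqrt(2.0 / width) * np.cos(np.outer(x, w) + b) @ a

x = np.array([0.0, 0.5, 1.0, 2.0])
draws = np.stack([sample_bnn_function(x, width, length_scale)
                  for _ in range(n_samples)])
empirical_cov = draws.T @ draws / n_samples

# Analytic ExponentiatedQuadratic kernel for comparison.
diff = x[:, None] - x[None, :]
analytic_cov = np.exp(-0.5 * (diff / length_scale) ** 2)
print(np.round(empirical_cov, 2))
print(np.round(analytic_cov, 2))
```

Even at width 10, the estimated covariance of the network's prior closely tracks the analytic kernel, which is the sense in which a narrow BNN can stand in for its corresponding GP.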