VAE for Time Series. Generate realistic sequential data with… | by David Kyle | Aug, 2024

Convolutions exploit the cyclical nature of the inputs to build higher-level latent features. Deconvolutions convert latent features into overlapping, repeating sequences to generate data with periodic patterns.

Flexible Time Dimension

Image-generating VAEs usually have thousands of images pre-processed to have a fixed width and height. The generated images will match the width and height of the training data.

For the Phoenix dataset, I only have one 50-year time series. To improve the training, I broke the data up into sequences, eventually settling on assigning a latent variable to each 96-hour period. However, I will want to generate time series that are longer than 4 days, and, ideally, the output should be smooth rather than having discrete 96-hour chunks in the simulations.

Fortunately, TensorFlow allows you to specify unconstrained dimensions in your neural network. In the same way that neural networks can handle any batch size, you can build your model to handle an arbitrary number of time steps. As a result, my latent variable also includes a time dimension which can vary. In my model, there is one time step in the latent space for every 96 hours in the inputs.

Generating new data is as simple as sampling latent variables from the prior, where you choose the number of steps you want to include in the time dimension.

VAEs with an unconstrained time dimension can generate data of any length.

The simulated output will have 4 days of data for each time step you sampled, and the results will appear smooth because the convolution layers allow inputs to spill over into neighboring time periods.

Seasonally dependent prior

In most VAEs, each component of the latent variable is assumed to follow a standard normal distribution. This distribution, often called the prior, is sampled, then decoded, to generate new data. In this case, I chose a slightly more complex prior that depends on the time of year.

Latent variables sampled from a seasonal prior will generate data with characteristics that vary by the time of year.

Under this prior, generated January data will look very different from July data, and generated data from the same month will share many of the same features.

I represented the time of year as an angle, θ, where 0° is January 1st, 180° is the beginning of July, and 360° is back to January again. The prior is a normal distribution whose mean and log-variance are a third-degree trigonometric polynomial of θ, where the coefficients of the polynomial are parameters learned during training alongside the encoder and decoder.

The prior distribution parameters are a periodic function of θ, and well-behaved periodic functions can be approximated to any level of accuracy given a trigonometric polynomial of sufficiently high degree. [5]
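Written out (using $a_n$, $b_n$, $c_n$, $d_n$ as illustrative names for the learned coefficients, and omitting the constant term to match the bias-free Dense layer shown later), the prior for each latent component is:

$$
m(\theta) = \sum_{n=1}^{3} \left(a_n \sin n\theta + b_n \cos n\theta\right),
\qquad
\log s^2(\theta) = \sum_{n=1}^{3} \left(c_n \sin n\theta + d_n \cos n\theta\right),
\qquad
Z \sim \mathcal{N}\!\left(m(\theta),\, s^2(\theta)\right)
$$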

left: visualization of θ | right: prior distribution of Z in terms of parameters m and s

The seasonal data is only used in the prior and does not influence the encoder or decoder. The full set of probabilistic dependencies is shown here graphically.

Probabilistic graphical model including the prior

I trained the model using TensorFlow in Python.

import tensorflow as tf
from tensorflow.keras import layers, models

Encoder

The input is defined with a flexible time dimension. In Keras, you specify an unconstrained dimension using None .

Using 'same' padding will pad the input with zeros such that the output length matches the input length divided by the stride.

inputs = layers.Input(shape=(None,))  # (N, 96*k)
x = layers.Reshape((-1, 1))(inputs)   # (N, 96*k, 1)

# Conv1D parameters: filters, kernel_size, strides, padding
x = layers.Conv1D(40, 5, 3, 'same', activation='relu')(x)  # (N, 32*k, 40)
x = layers.Conv1D(40, 3, 2, 'same', activation='relu')(x)  # (N, 16*k, 40)
x = layers.Conv1D(40, 3, 2, 'same', activation='relu')(x)  # (N, 8*k, 40)
x = layers.Conv1D(40, 3, 2, 'same', activation='relu')(x)  # (N, 4*k, 40)
x = layers.Conv1D(40, 3, 2, 'same', activation='relu')(x)  # (N, 2*k, 40)
x = layers.Conv1D(20, 3, 2, 'same')(x)                     # (N, k, 20)

z_mean = x[:, :, :10]                # (N, k, 10)
z_log_var = x[:, :, 10:]             # (N, k, 10)
z = Sampling()([z_mean, z_log_var])  # custom layer sampling from a gaussian

encoder = models.Model(inputs, [z_mean, z_log_var, z], name='encoder')

Sampling() is a custom layer that samples data from a normal distribution with the given mean and log variance.
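The exact implementation is in the repository linked below; a minimal sketch, assuming the standard reparameterization trick, could look like this:

class Sampling(layers.Layer):
    def call(self, inputs):
        z_mean, z_log_var = inputs
        # z = mean + std * epsilon, with epsilon ~ N(0, 1)
        epsilon = tf.random.normal(shape=tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon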

Decoder

Deconvolution is performed with Conv1DTranspose .

# input shape: (batch_size, time_length/96, latent_features)
inputs = layers.Input(shape=(None, 10))  # (N, k, 10)

# Conv1DTranspose parameters: filters, kernel_size, strides, padding
x = layers.Conv1DTranspose(40, 3, 2, 'same', activation='relu')(inputs)  # (N, 2*k, 40)
x = layers.Conv1DTranspose(40, 3, 2, 'same', activation='relu')(x)       # (N, 4*k, 40)
x = layers.Conv1DTranspose(40, 3, 2, 'same', activation='relu')(x)       # (N, 8*k, 40)
x = layers.Conv1DTranspose(40, 3, 2, 'same', activation='relu')(x)       # (N, 16*k, 40)
x = layers.Conv1DTranspose(40, 3, 2, 'same', activation='relu')(x)       # (N, 32*k, 40)
x = layers.Conv1DTranspose(1, 5, 3, 'same')(x)                           # (N, 96*k, 1)

outputs = layers.Reshape((-1,))(x)  # (N, 96*k)

decoder = models.Model(inputs, outputs, name='decoder')

Prior

The prior expects inputs already in the form [sin(θ), cos(θ), sin(2θ), cos(2θ), sin(3θ), cos(3θ)].
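One way to build these features is a small helper like the following (a hypothetical function, not from the original post, assuming θ = 2π · day_of_year / 365):

import numpy as np

def seasonal_features(day_of_year):
    # Returns [sin(θ), cos(θ), sin(2θ), cos(2θ), sin(3θ), cos(3θ)] for each day
    theta = 2 * np.pi * np.asarray(day_of_year) / 365.0
    return np.stack([f(n * theta) for n in (1, 2, 3) for f in (np.sin, np.cos)], axis=-1)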

The Dense layer has no bias term as a way of preventing the prior distribution from drifting too far from zero or having an overall variance that is too high or too small.

# seasonal inputs shape: (N, k, 6)
inputs = layers.Input(shape=(None, 2*3))

x = layers.Dense(20, use_bias=False)(inputs)  # (N, k, 20)
z_mean = x[:, :, :10]                         # (N, k, 10)
z_log_var = x[:, :, 10:]                      # (N, k, 10)
z = Sampling()([z_mean, z_log_var])           # (N, k, 10)

prior = models.Model(inputs, [z_mean, z_log_var, z], name='seasonal_prior')

Full Model

The loss function contains a reconstruction term and a latent regularization term.

The function log_lik_normal_sum is a custom function for calculating the normal log-likelihood of the observed data given the reconstructed output. Calculating the log-likelihood requires a noise distribution around the decoded output, which is assumed to be normal with log variance given by self.noise_log_var, learned during training.
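A minimal sketch of what log_lik_normal_sum might look like (the actual implementation is in the repository), assuming a Gaussian density with a shared learned log variance, summed over all observed values:

def log_lik_normal_sum(x, mean, log_var):
    # Gaussian log density with a shared log variance, summed over the batch
    log2pi = 1.8378770664093453  # log(2 * pi)
    return tf.reduce_sum(-0.5 * (log2pi + log_var + tf.square(x - mean) / tf.exp(log_var)))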

For the regularization term, kl_divergence_sum calculates the Kullback–Leibler divergence between two Gaussians, in this case the latent encoded and prior distributions.
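Because both distributions are diagonal Gaussians, the KL divergence has a closed form; a sketch of kl_divergence_sum under that assumption:

def kl_divergence_sum(mean_q, log_var_q, mean_p, log_var_p):
    # KL(N(mean_q, var_q) || N(mean_p, var_p)), summed over all latent dimensions
    return tf.reduce_sum(0.5 * (
        log_var_p - log_var_q
        + (tf.exp(log_var_q) + tf.square(mean_q - mean_p)) / tf.exp(log_var_p)
        - 1.0))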

class VAE(models.Model):
    def __init__(self, encoder, decoder, prior, **kwargs):
        super(VAE, self).__init__(**kwargs)
        self.encoder = encoder
        self.decoder = decoder
        self.prior = prior
        self.noise_log_var = self.add_weight(name='var', shape=(1,), initializer='zeros', trainable=True)

    @tf.function
    def vae_loss(self, data):
        values, seasonal = data
        z_mean, z_log_var, z = self.encoder(values)
        reconstructed = self.decoder(z)
        reconstruction_loss = -log_lik_normal_sum(values, reconstructed, self.noise_log_var)/INPUT_SIZE
        seasonal_z_mean, seasonal_z_log_var, _ = self.prior(seasonal)
        kl_loss_z = kl_divergence_sum(z_mean, z_log_var, seasonal_z_mean, seasonal_z_log_var)/INPUT_SIZE
        return reconstruction_loss, kl_loss_z

    def train_step(self, data):
        with tf.GradientTape() as tape:
            reconstruction_loss, kl_loss_z = self.vae_loss(data)
            total_loss = reconstruction_loss + kl_loss_z

        gradients = tape.gradient(total_loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))

        return {'loss': total_loss}
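Putting it together, training could look like this (a hypothetical usage example, not from the original post, where train_dataset yields (values, seasonal) batches and INPUT_SIZE is the number of values per sequence):

vae = VAE(encoder, decoder, prior)
vae.compile(optimizer='adam')
vae.fit(train_dataset, epochs=100)  # illustrative settings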

For the full implementation, visit my GitHub repository.

After training the model, the generated data matches the seasonal/diurnal profiles and autocorrelation of the original temperature data.
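As a sketch of how a simulation could then be produced (reusing the hypothetical seasonal_features helper from above; the step count and start date are illustrative), sample the prior for the desired number of latent steps and decode:

n_steps = 30                                            # 30 latent steps -> 120 days of data
days = np.arange(n_steps) * 4                           # one latent step per 96 hours
seasonal = seasonal_features(days)[np.newaxis].astype('float32')  # (1, 30, 6)
_, _, z = prior(seasonal)                               # sample latent variables from the seasonal prior
simulation = decoder(z)                                 # (1, 30 * 96) hourly values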