What’s Inside a Neural Network? Plotting the surface of error in 3D using…

First and foremost, we need synthetic data to work with. The data should exhibit some non-linear dependency. Let’s define it like this:

Image by author.

In Python, it will have the following form:

import numpy as np

np.random.seed(42)
X = np.random.normal(1, 4.5, 10000)
y = np.piecewise(
    X,
    [X < -2, (X >= -2) & (X < 2), X >= 2],
    [lambda X: 2*X + 5, lambda X: 7.3*np.sin(X), lambda X: -0.03*X**3 + 2]
) + np.random.normal(0, 1, X.shape)

After visualization:

Image by author.
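The plotting code for this figure isn’t included above, so here is a minimal matplotlib sketch of what it could look like (the styling choices are mine, not the author’s):

import matplotlib.pyplot as plt

plt.figure(figsize=(8, 4))
plt.scatter(X, y, s=1, alpha=0.3)  # raw synthetic samples
plt.xlabel('x')
plt.ylabel('y')
plt.title('Synthetic data')
plt.show()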

Since we’re visualizing a 3D space, our neural network will have only 2 weights. This means the ANN will consist of a single hidden neuron. Implementing this in PyTorch is quite intuitive:

import torch
import torch.nn as nn

class ANN(nn.Module):
    def __init__(self, input_size, N, output_size):
        super().__init__()
        self.net = nn.Sequential()
        self.net.add_module(name='Layer_1', module=nn.Linear(input_size, N, bias=False))
        self.net.add_module(name='Tanh', module=nn.Tanh())
        self.net.add_module(name='Layer_2', module=nn.Linear(N, output_size, bias=False))

    def forward(self, x):
        return self.net(x)

Important! Don’t forget to turn off the biases in your layers; otherwise you’ll end up with 2x more parameters.
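A quick sanity check (my addition, not from the original) confirms the network really holds just two trainable parameters:

# with bias=False in both layers, the model contains exactly two scalars: W1 and W2
tiny = ANN(1, 1, 1)
print(sum(p.numel() for p in tiny.parameters()))  # prints 2; with biases it would be 4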

Image by author.
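The loop below relies on model, loss and test_loader (and, later, train_loader), which aren’t defined in this section. Here is a minimal sketch of that setup; the batch size and the 80/20 split are my assumptions:

import torch
from torch.utils.data import DataLoader, TensorDataset

X_t = torch.tensor(X, dtype=torch.float32)
y_t = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)

n_train = int(0.8 * len(X_t))  # assumed 80/20 train/test split
train_loader = DataLoader(TensorDataset(X_t[:n_train], y_t[:n_train]), batch_size=64, shuffle=True)
test_loader = DataLoader(TensorDataset(X_t[n_train:], y_t[n_train:]), batch_size=64, shuffle=False)

loss = nn.MSELoss()
model = ANN(1, 1, 1)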

To build the error surface, we first need to create a grid of possible values for W1 and W2. Then, for each weight combination, we’ll update the parameters of the network and calculate the error:

W1, W2 = np.arange(-2, 2, 0.05), np.arange(-2, 2, 0.05)
LOSS = np.zeros((len(W1), len(W2)))
for i, w1 in enumerate(W1):
    # fix the first weight for this row of the grid
    model.net._modules['Layer_1'].weight.data = torch.tensor([[w1]], dtype=torch.float32)

    for j, w2 in enumerate(W2):
        # fix the second weight and measure the resulting test error
        model.net._modules['Layer_2'].weight.data = torch.tensor([[w2]], dtype=torch.float32)

        model.eval()
        total_loss = 0
        with torch.no_grad():
            for x, y in test_loader:
                preds = model(x.reshape(-1, 1))
                total_loss += loss(preds, y).item()

        LOSS[i, j] = total_loss / len(test_loader)

It might take a while. If you make the resolution of this grid too coarse (i.e., the step size between possible weight values), you might miss local minima and maxima. Remember how the learning rate is often scheduled to decrease over time? When we do that, the absolute change in weight values can be as small as 1e-3 or less. A grid with a 0.5 step simply won’t capture these fine details of the error surface!
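To get a feel for the cost, a quick back-of-the-envelope check:

# a 0.05 step over [-2, 2) gives 80 values per axis,
# i.e. 80 * 80 = 6,400 weight combinations, each evaluated on the full test set
print(len(np.arange(-2, 2, 0.05)) ** 2)  # 6400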

At this point, we don’t care at all about the quality of the trained model. However, we do want to pay attention to the learning rate, so let’s keep it between 1e-1 and 1e-2. We’ll simply collect the weight values and errors during the training process and store them in separate lists:

import torch.optim as optim
from tqdm import tqdm

model = ANN(1, 1, 1)
epochs = 25
lr = 1e-2

optimizer = optim.SGD(model.parameters(), lr=lr)

# start from a fixed point on the surface
model.net._modules['Layer_1'].weight.data = torch.tensor([[-1.0]], dtype=torch.float32)
model.net._modules['Layer_2'].weight.data = torch.tensor([[-1.0]], dtype=torch.float32)

errors, weights_1, weights_2 = [], [], []

# record the initial weights and test error before any training
model.eval()
with torch.no_grad():
    total_loss = 0
    for x, y in test_loader:
        preds = model(x.reshape(-1, 1))
        error = loss(preds, y)
        total_loss += error.item()
    weights_1.append(model.net._modules['Layer_1'].weight.data.item())
    weights_2.append(model.net._modules['Layer_2'].weight.data.item())
    errors.append(total_loss / len(test_loader))

for epoch in tqdm(range(epochs)):
    # one epoch of SGD on the training set
    model.train()
    for x, y in train_loader:
        pred = model(x.reshape(-1, 1))
        error = loss(pred, y)
        optimizer.zero_grad()
        error.backward()
        optimizer.step()

    # evaluate on the test set and log the new (w1, w2, error) triple
    model.eval()
    test_preds, true = [], []
    with torch.no_grad():
        total_loss = 0
        for x, y in test_loader:
            preds = model(x.reshape(-1, 1))
            error = loss(preds, y)
            test_preds.append(preds)
            true.append(y)
            total_loss += error.item()
        weights_1.append(model.net._modules['Layer_1'].weight.data.item())
        weights_2.append(model.net._modules['Layer_2'].weight.data.item())
        errors.append(total_loss / len(test_loader))

Image by author.
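The loop above collects test_preds and true but never uses them; if you’re curious what the model actually learned, a quick sketch (my own, not the author’s) is:

import matplotlib.pyplot as plt

preds_t = torch.cat(test_preds).squeeze()  # all test predictions in one tensor
true_t = torch.cat(true).squeeze()
plt.scatter(true_t, preds_t, s=2, alpha=0.3)
plt.xlabel('true y')
plt.ylabel('predicted y')
plt.show()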

Finally, we can visualize the data we’ve collected using plotly. The plot will have two parts: the surface and the SGD trajectory. One of the ways to do the first part is to create a figure with a plotly Surface. After that, we’ll style it a little by updating the layout.

The second part is as simple as it gets: just use the Scatter3d function and specify all three axes.

import plotly.graph_objects as go
import plotly.io as pio

plotly_template = pio.templates["plotly_dark"]
fig = go.Figure(data=[go.Surface(z=LOSS, x=W1, y=W2)])

fig.update_layout(
    title='Loss Surface',
    scene=dict(
        # note: Surface maps rows of z to the y-axis, so the rows of LOSS
        # (indexed by w1) end up on y and the columns (w2) on x
        xaxis_title='w2',
        yaxis_title='w1',
        zaxis_title='Loss',
        aspectmode='manual',
        aspectratio=dict(x=1, y=1, z=0.5),
        xaxis=dict(showgrid=False),
        yaxis=dict(showgrid=False),
        zaxis=dict(showgrid=False),
    ),
    width=800,
    height=800
)

# the SGD trajectory: x carries w2 and y carries w1, matching the surface orientation
fig.add_trace(go.Scatter3d(x=weights_2, y=weights_1, z=errors,
                           mode='lines+markers',
                           line=dict(color='pink', width=2),
                           marker=dict(size=4, color='yellow')))
fig.show()

Running it in Google Colab or locally in a Jupyter Notebook will let you inspect the error surface more closely. Honestly, I spent a bunch of time just playing with this figure :)

Image by author.
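By the way, if you’d like to share your surface in the comments (see below), one simple option (not from the original article) is to export the interactive figure to a standalone HTML file:

fig.write_html('loss_surface.html')  # a self-contained interactive figure you can share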

I’d love to see your surfaces, so please feel free to share them in the comments. I strongly believe that the more imperfect the surface is, the more interesting it is to analyze!

===========================================

All my publications on Medium are free and open-access, which is why I’d really appreciate it if you followed me here!

P.S. I’m extremely passionate about (Geo)Data Science, ML/AI and Climate Change. So if you want to work together on some project, please contact me on LinkedIn and check out my website!

🛰️Follow for more🛰️