A Practical Guide to Contrastive Learning | by Mengliu Zhao | Jul, 2024

Now it’s time for some contrastive learning. To mitigate the problem of insufficient annotation labels and fully utilize the large amount of unlabelled data, contrastive learning can be used to help the backbone learn data representations without a specific task. The backbone can then be frozen for a given downstream task, and only a shallow network needs to be trained on a limited annotated dataset to achieve satisfactory results.
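As a rough illustration of that last point, the sketch below freezes a pretrained backbone and trains only a small linear head on top of it. This is a hypothetical snippet: backbone, feature_dim, and num_classes are placeholders, not objects defined in this article.

import torch.nn as nn

def build_linear_probe(backbone: nn.Module, feature_dim: int, num_classes: int) -> nn.Module:
    # Freeze the pretrained backbone so only the shallow head is trained
    for p in backbone.parameters():
        p.requires_grad = False
    head = nn.Linear(feature_dim, num_classes)
    return nn.Sequential(backbone, nn.Flatten(), head)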

The most commonly used contrastive learning approaches include SimCLR, SimSiam, and MoCo (see my earlier article on MoCo). Here, we compare SimCLR and SimSiam.

SimCLR computes the loss over positive and negative pairs within the data batch, which requires hard negative mining, the NT-Xent loss (which extends the cosine similarity loss over a batch), and a large batch size. SimCLR also requires the LARS optimizer to accommodate the large batch size.
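To make the NT-Xent idea concrete, here is a minimal sketch (my own illustrative code, not taken from the SimCLR release), assuming z1 and z2 are the embeddings of two augmented views of the same N images:

import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    # z1, z2: (N, d) embeddings of two augmented views of the same N images
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, d), unit norm
    sim = z @ z.t() / temperature                        # scaled pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))                    # a sample is not its own positive
    n = z1.size(0)
    # the positive for view i is the other view of the same image: i <-> i + n
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)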

SimSiam, however, uses a Siamese architecture, which avoids using negative pairs and thus avoids the need for large batch sizes. The differences between SimSiam and SimCLR are given in the table below.

Comparison between SimCLR and SimSiam. Image by author.
The SimSiam architecture. Image source: https://arxiv.org/pdf/2011.10566

We can see from the figure above that the SimSiam architecture contains only two parts: the encoder/backbone and the predictor. During training, gradient propagation through one branch of the Siamese network is stopped (the stop-gradient operation), and the cosine similarity is calculated between the predictor output of one branch and the backbone output of the other.

So, how do we implement this architecture in practice? Continuing from the supervised classification design, we keep the backbone the same and only modify the MLP layer. In the supervised learning architecture, the MLP outputs a 10-element vector indicating the probabilities of the ten classes. For SimSiam, however, the goal is not to perform “classification” but to learn the “representation,” so the output needs to have the same dimension as the backbone output for the loss calculation. The negative cosine similarity with stop-gradient is given below:

import torch.nn as nn

class SimSiam(nn.Module):

    def __init__(self):
        super(SimSiam, self).__init__()

        # Backbone: three stride-2 conv blocks, same as the supervised design
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(64),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(128),
        )

        # Predictor MLP: output dimension matches the flattened backbone output
        self.prediction_mlp = nn.Sequential(
            nn.Linear(128 * 4 * 4, 64),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.Linear(64, 128 * 4 * 4),
        )

    def forward(self, x):
        x = self.backbone(x)
        x = x.view(-1, 128 * 4 * 4)          # flattened backbone output (projection)
        pred_output = self.prediction_mlp(x)
        return x, pred_output

cos = nn.CosineSimilarity(dim=1, eps=1e-6)

def negative_cosine_similarity_stopgradient(pred, proj):
    # Stop-gradient on the projection branch, as in the SimSiam paper
    return -cos(pred, proj.detach()).mean()
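
As a quick sanity check (a hypothetical snippet, assuming the 1×28×28 FashionMNIST inputs used throughout this article), the three stride-2 convolutions shrink 28 → 14 → 7 → 4, so both the flattened backbone output and the predictor output have dimension 128 × 4 × 4 = 2048:

import torch

model = SimSiam()
dummy = torch.randn(8, 1, 28, 28)     # a fake batch of FashionMNIST-sized images
proj, pred = model(dummy)
print(proj.shape, pred.shape)         # torch.Size([8, 2048]) torch.Size([8, 2048])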

The pseudo-code for training SimSiam is given in the original paper:

Training pseudo-code for SimSiam. Source: https://arxiv.org/pdf/2011.10566

And we convert it into real training code:

import tqdm

import torch
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision.transforms import RandAugment

import wandb

wandb_config = {
    "learning_rate": 0.0001,
    "architecture": "simsiam",
    "dataset": "FashionMNIST",
    "epochs": 100,
    "batch_size": 256,
}

wandb.init(
    # set the wandb project where this run will be logged
    project="simsiam",
    # track hyperparameters and run metadata
    config=wandb_config,
)

# Initialize model and optimizer
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

simsiam = SimSiam()
simsiam.to(device)

random_augmenter = RandAugment(num_ops=5)

optimizer = optim.SGD(simsiam.parameters(),
                      lr=wandb_config["learning_rate"],
                      momentum=0.9,
                      weight_decay=1e-5,
                      )

# train_dataset is the FashionMNIST training split defined earlier in the article
train_dataloader = DataLoader(train_dataset, batch_size=wandb_config["batch_size"], shuffle=True)

# Training loop
for epoch in range(wandb_config["epochs"]):
    simsiam.train()

    print(f"Epoch {epoch}")
    train_loss = 0
    for batch_idx, (image, _) in enumerate(tqdm.tqdm(train_dataloader, total=len(train_dataloader))):
        optimizer.zero_grad()

        # Two independent random augmentations of the same batch
        aug1 = random_augmenter((image * 255).to(dtype=torch.uint8)).to(dtype=torch.float32) / 255.0
        aug2 = random_augmenter((image * 255).to(dtype=torch.uint8)).to(dtype=torch.float32) / 255.0
        aug1, aug2 = aug1.to(device), aug2.to(device)

        proj1, pred1 = simsiam(aug1)
        proj2, pred2 = simsiam(aug2)

        # Symmetrized loss with stop-gradient on the projection branch
        loss = negative_cosine_similarity_stopgradient(pred1, proj2) / 2 + negative_cosine_similarity_stopgradient(pred2, proj1) / 2
        loss.backward()
        optimizer.step()

        wandb.log({"training loss": loss})

    if (epoch + 1) % 10 == 0:
        torch.save(simsiam.state_dict(), f"weights/simsiam_epoch{epoch+1}.pt")

We trained for 100 epochs as a fair comparison to the limited supervised training; the training loss is shown below. Note: due to its Siamese design, SimSiam can be very sensitive to hyperparameters like the learning rate and the MLP hidden layers. The original SimSiam paper provides a detailed configuration for the ResNet50 backbone. For a ViT-based backbone, we recommend reading the MoCo v3 paper, which adopts the SimSiam model in a momentum-update scheme.

Training loss for SimSiam. Image by author.
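
As one concrete example of the learning-rate choices mentioned in the note above, the original paper pairs SGD with a cosine learning-rate decay schedule; a minimal sketch using PyTorch’s built-in scheduler (reusing the optimizer and wandb_config defined earlier, not the authors’ exact ResNet50 recipe) would look like this:

from torch.optim.lr_scheduler import CosineAnnealingLR

# Decay the learning rate from its base value toward zero over all training epochs
scheduler = CosineAnnealingLR(optimizer, T_max=wandb_config["epochs"])

# inside the training loop, after the inner batch loop finishes:
#     scheduler.step()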

Then, we run the trained SimSiam on the test set and visualize the representations using UMAP reduction:

import tqdm
import numpy as np

import torch
from torch.utils.data import DataLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

simsiam = SimSiam()

# test_dataset is the FashionMNIST test split defined earlier in the article
test_dataloader = DataLoader(test_dataset, batch_size=32, shuffle=False)
simsiam.load_state_dict(torch.load("weights/simsiam_epoch100.pt"))

simsiam.eval()
simsiam.to(device)

features = []
labels = []
for batch_idx, (image, target) in enumerate(tqdm.tqdm(test_dataloader, total=len(test_dataloader))):

    with torch.no_grad():
        proj, pred = simsiam(image.to(device))

    features.extend(np.squeeze(pred.detach().cpu().numpy()).tolist())
    labels.extend(target.detach().cpu().numpy().tolist())

import plotly.express as px
import umap.umap_ as umap

# Reduce the learned representations with a cosine metric
reducer = umap.UMAP(n_components=3, n_neighbors=10, metric="cosine")
projections = reducer.fit_transform(np.array(features))

px.scatter(projections, x=0, y=1,
           color=labels, labels={"color": "Fashion MNIST Labels"},
           )

The UMAP of the SimSiam representation over the test set. Image by author.

It’s interesting to see that there are two small islands in the reduced-dimension map above: classes 5, 7, 8, and some of 9. If we pull up the FashionMNIST class list, we can see that these classes correspond to footwear and bags: “Sandal,” “Sneaker,” “Bag,” and “Ankle boot.” The large purple cluster corresponds to clothing classes like “T-shirt/top,” “Trouser,” “Pullover,” “Dress,” “Coat,” and “Shirt.” This shows that SimSiam can learn a meaningful representation in the vision domain.
