The Ultimate Guide to Vision Transformers | by François Porcher | Aug, 2024

A comprehensive guide to the Vision Transformer (ViT) that revolutionized computer vision

Hi everyone! For those who do not know me yet, my name is Francois, and I'm a Research Scientist at Meta. I have a passion for explaining advanced AI concepts and making them more accessible.

Today, let's dive into one of the most significant contributions in the field of Computer Vision: the Vision Transformer (ViT).

Converting an image into patches (image by author)

The Vision Transformer was introduced by Alexey Dosovitskiy et al. (Google Brain) in 2021 in the paper An Image is Worth 16×16 Words. At the time, Transformers, introduced in 2017 in the must-read paper Attention is All You Need, had proven to be the key to unlocking great performance on NLP tasks.

Between 2017 and 2021, there were several attempts to integrate the attention mechanism into Convolutional Neural Networks (CNNs). However, these were mostly hybrid models (combining CNN layers with attention layers) and lacked scalability. Google addressed this by eliminating convolutions entirely and leveraging its computational power to scale the model.