How the Vision Transformer (ViT) works in 10 minutes: an image is worth 16×16 words

28.01.2021

In this article, you will learn how the Vision Transformer (ViT) works for image classification. We distill all the important details you need to grasp, along with the reasons it can perform very well when given enough data for pretraining.

Nikolas Adaloglou