How Transformers Are Shaping the Future of Object Detection

by Ankit Sachan • July 8, 2024

The world of computer vision changed forever from 2011 onwards, when convolutional neural networks (CNNs) revolutionized object detection, providing a significant leap in accuracy and efficiency over earlier methods like the Viola-Jones framework, which relied primarily on handcrafted features and boosted classifiers. CNN-based models like Faster R-CNN, YOLO, and CenterNet brought about groundbreaking […]

Continue Reading

Technical overview of Image Synthesis: Stable Diffusion

by Ankit Sachan • March 2, 2023

Text-to-Image models like DALL-E, Imagen, and Stable Diffusion have recently attracted a lot of attention to image synthesis. These models can generate impressive-looking images from benign-looking prompts. Here are a few typical examples of images from Stable Diffusion. Looking under the hood […]

Continue Reading

MOTR: End-to-End Multi-Object Tracking with Transformers

by Ankit Sachan • January 15, 2023

MOTR is a state-of-the-art end-to-end multiple-object tracker that does not require any temporal association between objects in adjacent frames. It directly outputs the tracks of objects in a sequence of input images (a video). MOTR uses Deformable DETR for object detection on a single image. To understand the architecture of MOTR, it […]

Continue Reading

GhostNetV2: Enhance Cheap Operation with Long-Range Attention

by Ankit Sachan • November 15, 2022

GhostNetV2 is a recent SOTA architecture that brings long-range attention to the deep CNN frameworks used in various ML tasks such as image classification, object detection, and video analysis. GhostNetV2 proposes a new attention mechanism, called DFC attention, to capture long-range spatial information, and it does so while keeping the implementation […]

Continue Reading

Understanding CLIP by OpenAI

by Ankit Sachan • May 10, 2022

Nearly all state-of-the-art visual perception algorithms rely on the same formula: (1) pretrain a convolutional network on a large, manually annotated image classification dataset; (2) finetune the network on a smaller, task-specific dataset. This technique has been widely used for several years and has led to impressive improvements on numerous tasks. […]
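The two-step formula in the excerpt can be sketched roughly as follows. This is a toy stand-in, not CLIP's or OpenAI's code: the tiny backbone, its dimensions, and the helper name `make_finetune_model` are all invented for illustration. In practice the backbone would be a CNN pretrained on a large labeled dataset such as ImageNet.

```python
import torch
import torch.nn as nn

def make_finetune_model(backbone: nn.Module, feat_dim: int, num_task_classes: int) -> nn.Module:
    # Step (1) is assumed done: `backbone` stands in for a pretrained network.
    # Freeze it so step (2) trains only the new task-specific head.
    for p in backbone.parameters():
        p.requires_grad = False
    head = nn.Linear(feat_dim, num_task_classes)  # smaller, task-specific classifier
    return nn.Sequential(backbone, head)

# Stand-in "pretrained" backbone mapping 3x32x32 images to 128-d features.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
model = make_finetune_model(backbone, feat_dim=128, num_task_classes=10)

x = torch.randn(4, 3, 32, 32)   # a batch of 4 fake images
logits = model(x)
print(logits.shape)             # torch.Size([4, 10])
```

Freezing the backbone is one common finetuning choice; the alternative is to leave it trainable and update all weights at a small learning rate.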

Continue Reading