
This blog is a continuation of the previous article that introduced the Kalman Filter. We’ll start with the external influence, using the example of a self-driving car for clarity.

External Influence (known): While driving, the sensor information is used to control the movement. A nice example to quote here is Tesla’s Autopilot: it automatically issues a sudden brake when a pedestrian unexpectedly crosses. Based on the external activity, certain commands are issued (a brake in our case). This auxiliary knowledge can be treated as an adjustment to the initial prediction we made before.
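To make the role of the control term concrete, here is a minimal NumPy sketch of the Kalman prediction step with a known external control input. The state layout (position, velocity) and the matrices F, B, Q, as well as the braking command u, are illustrative assumptions rather than values from the article.

```python
import numpy as np

# State: [position, velocity]; dt is the time step (illustrative value).
dt = 0.1

F = np.array([[1.0, dt],
              [0.0, 1.0]])          # state-transition model
B = np.array([[0.5 * dt ** 2],
              [dt]])                # control-input model (maps acceleration into the state)
Q = np.eye(2) * 0.01                # process-noise covariance

x = np.array([[10.0],               # current position estimate (m)
              [5.0]])               # current velocity estimate (m/s)
P = np.eye(2)                       # current estimate covariance

u = np.array([[-3.0]])              # known external influence: braking deceleration (m/s^2)

# Prediction step with the control term: x' = F x + B u,  P' = F P F^T + Q
x_pred = F @ x + B @ u
P_pred = F @ P @ F.T + Q

print(x_pred.ravel())               # predicted [position, velocity] after the brake command
```

Without the B u term the filter would predict the car coasting at constant velocity; the control input folds the known braking action into that prediction.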



So far we’ve discussed diverse object detection architectures, and the next step is to track the detected objects. Object tracking is one of the evolving fields in computer vision, finding applications in varied domains including self-driving cars, traffic congestion monitoring and tracking bird migration. But before jumping into the tracking process, it is essential to gain an in-depth understanding of the maths behind the Kalman filter and the Hungarian algorithm. The focus of this post will be the Kalman filter along with its derivation.

Basic Intuition of the Kalman Filter: For instance, a wild explorer is on an…



Having discussed some of the nitty-gritty of the YOLOv4 architecture, let’s explore how the pre-trained YOLOv4 network can be leveraged for custom object detection. Similar to YOLOv5, every image is associated with a ‘.txt’ file that contains details in the form object-class, x_center, y_center, width, height. We’ll go through the 9 steps explained in the original doc.
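As an illustration of that label layout, here is a hypothetical ‘.txt’ line together with a small helper that converts an absolute pixel box into the normalized YOLO format; the image size and box values are made up for the example.

```python
# One line per object in the image's .txt file:
#   object-class x_center y_center width height   (all four values normalized to [0, 1])
# e.g. "0 0.375000 0.395833 0.375000 0.458333"

def to_yolo_format(cls_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert an absolute pixel box (x_min, y_min, x_max, y_max) into a YOLO label line."""
    x_center = (x_min + x_max) / 2.0 / img_w
    y_center = (y_min + y_max) / 2.0 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{cls_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# Hypothetical box in a 640x480 image belonging to class 0.
print(to_yolo_format(0, 120, 80, 360, 300, img_w=640, img_h=480))
```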

Step 1: Create the custom configuration files for our task. For training, download the pre-trained weights yolov4.conv.137; these will be used with the custom config file cfg/yolov4-custom.cfg.

Step 2: Now, we can make a copy of the custom configuration file and rename it as…



In the earlier post, we observed that the Bag of Specials for the detector includes the Mish activation. Let’s explore it in depth along with another activation called Swish.

One of the most popular activation functions in the DL community is ReLU. Even though other activation functions have been introduced over the years, none of them has displaced ReLU (because of its simplicity and reliability). But one of its main drawbacks is the dying ReLU problem, caused by the loss of gradient information when negative inputs are collapsed to zero.
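Both Mish and Swish address this by staying smooth and letting small negative values pass through. A minimal NumPy sketch of their standard definitions is below; Swish with beta = 1 is also known as SiLU.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softplus(x):
    # log(1 + e^x), computed in a numerically stable way
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0.0)

def swish(x, beta=1.0):
    # Swish: x * sigmoid(beta * x); beta = 1 gives SiLU
    return x * sigmoid(beta * x)

def mish(x):
    # Mish: x * tanh(softplus(x))
    return x * np.tanh(softplus(x))

x = np.linspace(-5, 5, 5)
print(swish(x))
print(mish(x))
```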

We already know that the prime job of the activation function is to introduce non-linearity into…


We’ve already discussed IoU and how it can be applied to compute the bounding box regression loss. The focus of this article will be Generalized Intersection over Union (GIoU), Distance IoU (DIoU) and Complete IoU (CIoU).

The major weakness of IoU is that it takes the value of zero when there is no overlap between the boxes, but it fails to indicate how far the boxes are separated from one another. In this scenario, the intersection between A and B, |A ∩ B| = 0, which makes it hard to represent their proximity. Because IoU = 0, the gradient vanishes and no learning happens. …
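A minimal sketch of IoU and GIoU for axis-aligned boxes (given as x1, y1, x2, y2) shows how the smallest enclosing box C keeps the signal informative even when the overlap is zero; the box values used here are only for illustration.

```python
def iou_and_giou(box_a, box_b):
    """Boxes are (x1, y1, x2, y2). Returns (IoU, GIoU)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection |A ∩ B|
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union

    # Smallest enclosing box C
    cx1, cy1 = min(ax1, bx1), min(ay1, by1)
    cx2, cy2 = max(ax2, bx2), max(ay2, by2)
    area_c = (cx2 - cx1) * (cy2 - cy1)

    giou = iou - (area_c - union) / area_c
    return iou, giou

# Two non-overlapping boxes: IoU is 0, but GIoU still reflects how far apart they are.
print(iou_and_giou((0, 0, 2, 2), (5, 5, 7, 7)))
```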


Let’s wind up the YOLOv4 discussion in this post. In the previous post, we covered some of the data augmentation techniques. Here, we’ll start with Self-Adversarial Training (SAT).

Self-Adversarial Training (SAT): It operates in two forward-backward stages. In the first stage, the network modifies the original image instead of the network weights. Through this approach, the network performs an adversarial attack on itself, altering the original image to create the deception that there is no intended object in the image. …
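The paper does not spell out the exact update, but the first stage can be sketched as a gradient step taken on the input rather than on the weights, in an FGSM-like manner. The PyTorch snippet below is only an assumption-laden illustration: the toy model, the loss and the step size epsilon are all hypothetical.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the detector and its loss (hypothetical, just to illustrate the idea).
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.Flatten(), nn.Linear(8 * 32 * 32, 10))
loss_fn = nn.CrossEntropyLoss()

image = torch.rand(1, 3, 32, 32, requires_grad=True)   # original image
target = torch.tensor([3])                              # its label
epsilon = 0.03                                          # perturbation strength (assumed)

# Stage 1: a forward-backward pass whose gradient updates the IMAGE, not the weights.
loss = loss_fn(model(image), target)
loss.backward()
with torch.no_grad():
    # Move the image in the direction that increases the loss: the network "attacks" itself.
    adversarial_image = image + epsilon * image.grad.sign()
    adversarial_image.clamp_(0.0, 1.0)

# Stage 2: an ordinary forward-backward pass on the perturbed image would then
# update the network weights (standard training step, omitted here).
```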


Bag of Specials: Unlike the Bag of Freebies, the modules that fall under the Bag of Specials incur some inference cost. But these techniques boost the accuracy of object detection. The principal functionalities of these plugins include enlarging the receptive field or strengthening the feature integration capability.

  • The widely used modules to improve the receptive field are SPP, ASPP and RFB. The SPP (Spatial Pyramid Pooling) module takes its idea from Spatial Pyramid Matching (SPM). …
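As an example of such a module, here is a minimal PyTorch sketch of an SPP block in the YOLO style, concatenating stride-1 max-pool outputs at several kernel sizes (5, 9, 13) with the input feature map; the channel count and feature-map size are illustrative.

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Spatial Pyramid Pooling block (YOLO-style): pool at several scales, then concatenate."""

    def __init__(self, kernel_sizes=(5, 9, 13)):
        super().__init__()
        # Stride-1 max pools with padding keep the spatial size; only the receptive field grows.
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in kernel_sizes
        )

    def forward(self, x):
        # Concatenate the original features with each pooled version along the channel axis.
        return torch.cat([x] + [pool(x) for pool in self.pools], dim=1)

features = torch.rand(1, 512, 13, 13)   # illustrative feature map
print(SPP()(features).shape)            # torch.Size([1, 2048, 13, 13])
```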


Notes: This series on YOLOv4 is an explanation of the original paper, and most of the content is drawn from it.

We’ve already seen the many upgrades ‘You Only Look Once’ has received over time. The next set of enhancements took the model to a completely new horizon in terms of speed, allowing it to run on conventional GPUs for real-time object detection. Version 4 was introduced by a completely different set of authors and, for the greater good, it was made open-source (with a free license).

Fig 1 — Comparison between YOLOv4 and other SOTA models, from the original paper

YOLOv4 runs twice as fast as EfficientDet…


This post explains some of the points to be taken care of while implementing YOLOv5 for simple custom object detection. YOLOv5 is one of the fastest object detection models (with competitive accuracy) and, for that very reason, it has become a go-to choice for detection use cases. The code is written in PyTorch and can be found at the Ultralytics link. The convolutional architecture (along with the anchor boxes) can be used directly, with minimal tweaks to the parameters to accommodate our requirements.

Table of contents:

  • Image set and labelling
  • Formatting the bounding box coordinates and labels
  • Splitting the dataset and…


The next version in the series of YOLO models is v3, which incorporated design changes in the network. The new modifications resulted in a bigger network compared to the earlier version, but also a more accurate one.

Bounding Box Prediction: It takes the same path as YOLO9000 for bounding box prediction (i.e. dimension clusters as anchor boxes). The four coordinates of the box correspond to tx, ty, tw and th. The offset of the cell from the top-left corner is (cx, cy), and the prior width and height are denoted by pw and ph. The predicted values are as below:

bx = σ(tx) + cx, by = σ(ty) + cy, bw = pw · e^tw, bh = ph · e^th (the prediction formulas from the original paper)
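Put as code, decoding a predicted box from (tx, ty, tw, th) with the cell offsets and priors defined above looks roughly like this NumPy sketch; the numeric values are illustrative.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """YOLOv3 box decoding: offsets are squashed with a sigmoid, sizes scale the priors."""
    bx = sigmoid(tx) + cx          # box center x (in grid-cell units)
    by = sigmoid(ty) + cy          # box center y (in grid-cell units)
    bw = pw * np.exp(tw)           # box width
    bh = ph * np.exp(th)           # box height
    return bx, by, bw, bh

# Illustrative prediction for the cell at offset (3, 4) with prior size (2.5, 3.0).
print(decode_box(0.2, -0.1, 0.4, 0.1, cx=3, cy=4, pw=2.5, ph=3.0))
```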
