Day 96(DL) — Object tracking based on Centroid

Photo by Jorge Pena on Unsplash

The simplest object tracking algorithm relies on the centroids of the objects detected in consecutive frames. Since detections come as bounding boxes, the centroid is the centre point of each bounding box. As a running example, let's take a 30-second video of rabbits in MP4 format.

Step1: Split the video into individual image frames and pass each frame to an object detection model (Faster R-CNN, Detectron or YOLO, for example). The output is a set of bounding boxes, one for every object detected in the frame.

Step2: Compute the centroid of each bounding box. Bounding boxes are commonly represented by 4 values, either (x-centre, y-centre, width, height) or (xmin, ymin, xmax, ymax). In the first format, the centre values already give the centroid. In the second, the centroid is ((xmin + xmax) / 2, (ymin + ymax) / 2), the mid-point of the bounding box.
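The two formats can be handled with a couple of small helpers. This is only a minimal sketch: the function names are made up for illustration, and boxes are assumed to be plain tuples of numbers in pixel coordinates.

```python
def centroid_from_center_format(box):
    """Box given as (x_centre, y_centre, width, height)."""
    x_centre, y_centre, _width, _height = box
    # The centre values are already the centroid.
    return (x_centre, y_centre)


def centroid_from_corner_format(box):
    """Box given as (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    # Mid-point of the two opposite corners.
    return ((xmin + xmax) / 2, (ymin + ymax) / 2)


# The same box written in both formats gives the same centroid.
print(centroid_from_center_format((50, 40, 20, 10)))  # (50, 40)
print(centroid_from_corner_format((40, 35, 60, 45)))  # (50.0, 40.0)
```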

Step3 (when no object disappears or is newly added between frames): Compute the Euclidean distance between every centroid in the previous frame and every centroid in the current frame. An object in the current frame inherits the id of the previous-frame object whose centroid is closest to it, on the assumption that an object moves only a short distance between consecutive frames.
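One possible way to do this matching is a greedy pass over all pairs, taking the closest pair first. The sketch below assumes centroids are (x, y) tuples. The function name `match_centroids` and the greedy strategy are illustrative choices, not the only way to implement the step.

```python
import math


def match_centroids(previous, current):
    """Greedily pair previous ids with current centroids, closest pairs first.

    previous: dict mapping object id -> (x, y) centroid from frame t-1
    current:  list of (x, y) centroids from frame t
    Returns a dict mapping object id -> index into `current`.
    """
    # Every (distance, id, index) combination, smallest distance first.
    pairs = sorted(
        (math.dist(prev_xy, cur_xy), obj_id, idx)
        for obj_id, prev_xy in previous.items()
        for idx, cur_xy in enumerate(current)
    )
    assignment, used = {}, set()
    for _dist, obj_id, idx in pairs:
        if obj_id in assignment or idx in used:
            continue  # this id or this detection is already matched
        assignment[obj_id] = idx
        used.add(idx)
    return assignment


previous = {0: (10, 10), 1: (100, 50)}
current = [(102, 53), (12, 11)]  # detections arrive in arbitrary order
print(match_centroids(previous, current))  # {0: 1, 1: 0}
```

Here id 0 is carried over to the second detection and id 1 to the first, because those are the closest pairs.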

Step4 (when an object is newly introduced): If the current frame contains more objects than the previous one, each detection left unmatched is assigned a new id.

Step5 (the case of object dropout): If an object is no longer detected, its id is eventually removed and not carried over to later frames. How long to wait before dropping an id is configurable. For example, an object that does not appear in 4 consecutive frames could be deregistered.
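Putting Steps 3 to 5 together, a tracker could look roughly like the sketch below. The class name, the method names and the `max_disappeared` parameter are assumptions made for illustration, and the matching is a simple greedy nearest-pair pass with no maximum-distance check.

```python
import math


class CentroidTracker:
    def __init__(self, max_disappeared=4):
        self.next_id = 0
        self.objects = {}      # id -> last known centroid
        self.disappeared = {}  # id -> consecutive frames without a match
        self.max_disappeared = max_disappeared

    def _register(self, centroid):
        self.objects[self.next_id] = centroid
        self.disappeared[self.next_id] = 0
        self.next_id += 1

    def _deregister(self, obj_id):
        del self.objects[obj_id]
        del self.disappeared[obj_id]

    def update(self, centroids):
        """Feed the centroids of one frame; returns id -> centroid."""
        # Step 3: match existing ids to detections, closest pairs first.
        pairs = sorted(
            (math.dist(xy, c), obj_id, idx)
            for obj_id, xy in self.objects.items()
            for idx, c in enumerate(centroids)
        )
        matched_ids, matched_idx = set(), set()
        for _dist, obj_id, idx in pairs:
            if obj_id in matched_ids or idx in matched_idx:
                continue
            self.objects[obj_id] = centroids[idx]
            self.disappeared[obj_id] = 0
            matched_ids.add(obj_id)
            matched_idx.add(idx)
        # Step 5: count missed frames and drop ids that were absent too long.
        for obj_id in list(self.objects):
            if obj_id not in matched_ids:
                self.disappeared[obj_id] += 1
                if self.disappeared[obj_id] >= self.max_disappeared:
                    self._deregister(obj_id)
        # Step 4: every unmatched detection gets a new id.
        for idx, c in enumerate(centroids):
            if idx not in matched_idx:
                self._register(c)
        return dict(self.objects)


tracker = CentroidTracker(max_disappeared=4)
print(tracker.update([(10, 10)]))            # {0: (10, 10)}
print(tracker.update([(12, 11), (80, 80)]))  # {0: (12, 11), 1: (80, 80)}
for _ in range(4):
    tracker.update([(13, 12)])  # the second rabbit is not detected
print(tracker.update([(14, 12)]))            # {0: (14, 12)}
```

In this run id 1 is dropped after being absent for 4 consecutive frames, while id 0 keeps following the first rabbit.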


Limitations:

  • Object detection (bounding boxes) has to run on every frame of the video, which costs both computation and processing time.
  • Ids are mapped using the Euclidean distance between centroids in frames t-1 and t. When objects overlap, their centroids can nearly coincide, which can cause the ids to switch.
  • Because the algorithm relies on centroid distance alone, overlapping objects affect it more than they affect other tracking algorithms.
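The id switch is easy to reproduce with two rabbits that cross paths between two frames. The coordinates below are invented for illustration, and the helper `nearest_assignment` is a hypothetical greedy nearest-pair matcher rather than any particular library's API.

```python
import math


def nearest_assignment(previous, current):
    """Pair each previous id with a current detection, closest pairs first."""
    pairs = sorted(
        (math.dist(prev_xy, cur_xy), obj_id, label)
        for obj_id, prev_xy in previous.items()
        for label, cur_xy in current.items()
    )
    assignment = {}
    for _dist, obj_id, label in pairs:
        if obj_id not in assignment and label not in assignment.values():
            assignment[obj_id] = label
    return assignment


# Frame t-1: rabbit A carries id 0, rabbit B carries id 1.
previous = {0: (48, 50), 1: (52, 50)}
# Frame t: the rabbits have passed each other (A moved right, B moved left).
current = {"A": (54, 50), "B": (46, 50)}

print(nearest_assignment(previous, current))  # {0: 'B', 1: 'A'}
```

Id 0 now follows rabbit B and id 1 follows rabbit A, because after the crossing each rabbit is closer to the other one's previous position than to its own.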

When to use: Best suited to cases where the points above are well handled.

  • When the object detector is fast enough to run in real time.
  • When objects rarely overlap.
