Day 89(DL) — YOLOv4: Optimal Speed and Accuracy of Object Detection — Part 3
Let’s wind up the YOLOv4 discussion in this post. In the previous post, we covered some of the data augmentation techniques. Here, we’ll start with Self-Adversarial Training (SAT).
Self-Adversarial Training (SAT): It operates in two forward-backward stages. In the first stage, the network modifies the original image instead of the network weights; in this way, the network performs an adversarial attack on itself, altering the image to create the deception that there is no desired object in it. In the second stage, the network is trained to detect an object in this modified image in the normal way.
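The two stages can be sketched on a toy "detector". This is a minimal illustration, not the paper's implementation: the logistic objectness model, the hand-derived gradients, and the step sizes are all assumptions made for clarity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sat_step(w, x, y, img_lr=0.5, w_lr=0.1):
    """One SAT iteration on a toy logistic "detector" p = sigmoid(w . x)."""
    # Stage 1: forward-backward w.r.t. the IMAGE, weights frozen.
    # For binary cross-entropy, dL/dx = (p - y) * w; ascending this gradient
    # pushes the image toward "no object present" from the network's view.
    p = sigmoid(w @ x)
    x_adv = x + img_lr * (p - y) * w          # adversarial image modification
    # Stage 2: forward-backward w.r.t. the WEIGHTS on the modified image,
    # training the network to still detect the object normally.
    p_adv = sigmoid(w @ x_adv)
    w_new = w - w_lr * (p_adv - y) * x_adv    # standard gradient descent step
    return w_new, x_adv
```

After stage 1, the network's confidence in the (still present) object drops, which is exactly the harder training signal stage 2 learns from.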
Cross mini-Batch Normalization (CmBN): Collects statistics only between the mini-batches within a single batch, rather than across batches.
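The accumulation idea can be sketched as follows. This is an illustrative simplification with assumed names: statistics are accumulated over the mini-batches of one batch and then reset, and the learnable scale/shift parameters of a real BN layer are omitted.

```python
import numpy as np

def cmbn_normalize(minibatches, eps=1e-5):
    """Normalize each mini-batch with stats accumulated so far in the batch."""
    count, sum_acc, sqsum_acc = 0, 0.0, 0.0
    outputs = []
    for mb in minibatches:                     # mini-batches of ONE batch
        count += mb.shape[0]
        sum_acc += mb.sum(axis=0)
        sqsum_acc += (mb ** 2).sum(axis=0)
        mean = sum_acc / count                 # stats over all mini-batches
        var = sqsum_acc / count - mean ** 2    # seen so far in this batch
        outputs.append((mb - mean) / np.sqrt(var + eps))
    return outputs                             # accumulators reset per batch
```

By the last mini-batch, the normalization uses statistics of the whole batch, which stabilizes training when individual mini-batches are small.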
SAM and PAN modifications: SAM is changed from spatial-wise attention to point-wise attention, and the shortcut connection in PAN is replaced with concatenation.
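Point-wise attention can be sketched as an element-wise sigmoid gate on the feature map, in contrast to the pooled spatial map of the original SAM. This is my reading of the modification; the 1x1-convolution mixing weights and the plain sigmoid gate are assumptions for illustration.

```python
import numpy as np

def pointwise_sam(features, conv1x1_w):
    """features: (C, H, W); conv1x1_w: (C, C) acting as a 1x1 convolution."""
    # A 1x1 conv is just a channel mix at every spatial position.
    mixed = np.tensordot(conv1x1_w, features, axes=([1], [0]))  # (C, H, W)
    attention = 1.0 / (1.0 + np.exp(-mixed))   # point-wise sigmoid in (0, 1)
    return features * attention                # refine every element
```

Because the gate is computed per element, no spatial pooling step is needed, and each position of each channel is re-weighted independently.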
YOLOv4 Structure: YOLOv4 comprises a Backbone (CSPDarknet53), a Neck (SPP, PAN) and a Head (YOLOv3).
Bag of freebies (backbone): CutMix and Mosaic data augmentation, DropBlock regularization and class label smoothing.
Bag of freebies (detector): CIoU loss, CmBN, DropBlock regularization, Mosaic data augmentation, Self-Adversarial Training, eliminating grid sensitivity, using multiple anchors for a single ground truth, a cosine annealing scheduler, optimal hyper-parameters and random training shapes.
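Of these, "eliminating grid sensitivity" is easy to show concretely. Plain YOLO decoding uses bx = sigmoid(tx) + cx, so placing a box center exactly on a cell border (bx = cx or cx + 1) requires tx to go to infinity. Scaling the sigmoid by a factor greater than 1 removes that sensitivity; the specific factor 1.1 below is an illustrative choice, not a value from the paper.

```python
import numpy as np

def decode_x(tx, cx, scale=1.1):
    """Decode a box-center x offset; scale > 1 lets bx cross the cell edges."""
    s = 1.0 / (1.0 + np.exp(-tx))
    # Centered scaling: output range becomes (-(scale-1)/2, 1 + (scale-1)/2).
    return s * scale - (scale - 1.0) / 2.0 + cx
```

With scale = 1, the two cell edges are only reachable asymptotically; with scale > 1, finite tx values already cover the full cell.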
Bag of Specials (backbone): Mish activation, Cross-stage partial connections (CSP), Multi-input weighted residual connections (MiWRC).
Bag of Specials(detector): Mish activation, SPP-block, SAM-block, PAN path-aggregation block, DIoU-NMS.
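DIoU-NMS is the one item here that changes inference directly: it is greedy NMS, but the suppression criterion is DIoU = IoU minus the squared center distance normalized by the squared diagonal of the smallest enclosing box, so two boxes with the same overlap but distant centers are suppressed less often. A minimal sketch, assuming (x1, y1, x2, y2) boxes and an illustrative threshold:

```python
import numpy as np

def diou(a, b):
    """DIoU between two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda t: (t[2] - t[0]) * (t[3] - t[1])
    iou = inter / (area(a) + area(b) - inter + 1e-9)
    # Squared distance between box centers.
    d2 = ((a[0] + a[2]) - (b[0] + b[2])) ** 2 / 4 \
        + ((a[1] + a[3]) - (b[1] + b[3])) ** 2 / 4
    # Squared diagonal of the smallest enclosing box.
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 \
        + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return iou - d2 / (c2 + 1e-9)

def diou_nms(boxes, scores, thresh=0.5):
    """Greedy NMS that suppresses on DIoU instead of plain IoU."""
    order = np.argsort(scores)[::-1]          # highest score first
    keep = []
    while len(order):
        i = order[0]
        keep.append(int(i))
        # Keep only boxes whose DIoU with the winner is below the threshold.
        order = np.array(
            [j for j in order[1:] if diou(boxes[i], boxes[j]) <= thresh],
            dtype=int,
        )
    return keep
```

For occluded objects whose boxes overlap heavily but have clearly separated centers, the distance penalty keeps both detections where plain IoU-NMS would drop one.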
Classification accuracy is tested on the ImageNet dataset, whereas detection accuracy is validated on the MS COCO dataset.
The table below shows the improvement in accuracy obtained by including the different add-ons.