Abstract:
We proposed a new object detection method based on the CSE-YOLOv5 (CBAM-SPPF-EIoU-YOLOv5) model for insufficient multi-scale feature learning ability and the difficulty of balancing detection accuracy and model parameter quantity in remote sensing image object detection algorithms in complex task scenarios. We built this method on the YOLOv5 model's backbone network framework and introduced a convolutional attention mechanism layer into the shallow layers to enhance the model's ability to extract refined features and suppress redundant information interference. In the deep layers, we constructed a spatial pyramid pooling fast (SPPF) with a tandem construction module and improved the statistical pooling method to fuse multi-scale key feature information from shallow to deep. In addition, we further enhanced the multi-scale feature learning ability by optimizing the anchor box mechanism and improving the loss function. The experimental results demonstrated the superior performance of the CSE-YOLOv5 series models on the publicly available datasets RSOD, DIOR, and DOTA. The average mean precisions (mAP@0.5) were 96.8%, 92.0%, and 71.0% for RSOD, DIOR, and DOTA, respectively. Furthermore, the average mAP@0.5:0.95 at a wider IoU range of 0.5 to 0.95 achieved 87.0%, 78.5%, and 61.9% on the same datasets. The inference speed of the model satisfied the real-time requirements. Compared to the YOLOv5 series models, the CSE-YOLOv5 model exhibited significant performance enhancements and surpassed other mainstream models in object detection.