Abstract:
To address the limited capability of the YOLOv8 model for detecting occluded and weak infrared targets under low signal-to-noise ratios and in complex task scenarios, an improved detection method, DCS-YOLOv8 (DCN_C2f-CA-SIoU-YOLOv8), is proposed. Building on the YOLOv8 framework, the backbone network incorporates a lightweight DCN_C2f module built on deformable convolutions, which adaptively adjusts the network's receptive field to enhance the multi-scale feature representation of objects. The feature fusion network introduces a coordinate attention (CA) module to capture spatial dependencies among multiple objects, thereby improving object localization accuracy. In addition, the position regression loss is enhanced with the Scylla IoU (SIoU) loss, which accounts for the direction of the relative displacement between the predicted and ground-truth boxes; this accelerates model convergence and further improves detection and localization accuracy. Experimental results show that, compared with the YOLOv8-n/s/m/l/x series models, DCS-YOLOv8 achieves significant average-precision gains on the FLIR, OTCBVS, and VEDAI test sets: the average mAP@0.5 improves by 6.8%, 0.6%, and 4.0%, reaching 86.5%, 99.0%, and 75.6%, respectively. Moreover, the model's inference speed satisfies the real-time requirements of infrared object detection tasks.
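The abstract names the SIoU regression loss only at a high level. As a rough, non-authoritative illustration of the underlying formulation (Gevorgyan's SIoU with its angle, distance, and shape costs; the hyper-parameters and implementation details here are assumptions, not necessarily the authors' exact code), a minimal PyTorch sketch might look like this:

```python
import torch

def siou_loss(pred, target, eps=1e-7, theta=4.0):
    """Sketch of the SIoU loss; boxes are (x1, y1, x2, y2) tensors.

    Assumed formulation: L = 1 - IoU + (distance_cost + shape_cost) / 2,
    with the distance cost modulated by an angle cost.
    """
    # Widths, heights, and centers of predicted and ground-truth boxes
    w1, h1 = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    w2, h2 = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    cx1, cy1 = (pred[..., 0] + pred[..., 2]) / 2, (pred[..., 1] + pred[..., 3]) / 2
    cx2, cy2 = (target[..., 0] + target[..., 2]) / 2, (target[..., 1] + target[..., 3]) / 2

    # Standard IoU term
    inter_w = (torch.min(pred[..., 2], target[..., 2]) - torch.max(pred[..., 0], target[..., 0])).clamp(0)
    inter_h = (torch.min(pred[..., 3], target[..., 3]) - torch.max(pred[..., 1], target[..., 1])).clamp(0)
    inter = inter_w * inter_h
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union

    # Smallest enclosing box (normalizes the center distance)
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0]) + eps
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1]) + eps

    # Angle cost: measures how far the center offset deviates from the x/y axes
    s_cw, s_ch = cx2 - cx1, cy2 - cy1
    sigma = torch.sqrt(s_cw ** 2 + s_ch ** 2) + eps
    sin_alpha = (torch.abs(s_ch) / sigma).clamp(-1, 1)
    angle = 1 - 2 * torch.sin(torch.arcsin(sin_alpha) - torch.pi / 4) ** 2

    # Distance cost, modulated by the angle cost via gamma
    gamma = 2 - angle
    rho_x, rho_y = (s_cw / cw) ** 2, (s_ch / ch) ** 2
    dist = (1 - torch.exp(-gamma * rho_x)) + (1 - torch.exp(-gamma * rho_y))

    # Shape cost: relative mismatch of widths and heights
    omega_w = torch.abs(w1 - w2) / torch.max(w1, w2).clamp(min=eps)
    omega_h = torch.abs(h1 - h2) / torch.max(h1, h2).clamp(min=eps)
    shape = (1 - torch.exp(-omega_w)) ** theta + (1 - torch.exp(-omega_h)) ** theta

    return 1 - iou + (dist + shape) / 2
```

The angle cost drives the predicted center toward the nearest coordinate axis of the ground-truth center, which is what the abstract refers to as matching the direction of the relative displacement between predicted and ground-truth boxes.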