Abstract:
To address the problems of low resolution, complex backgrounds, and difficult multi-scale target recognition in infrared pedestrian detection, this paper proposes an improved YOLOv8 pedestrian detection model for infrared scenes (IRD-YOLOv8n). First, RetinexFormer is adopted as an image pre-enhancement module to restore details in degraded images and improve image quality, enabling the subsequent network to extract more effective pedestrian features. Second, the standard convolutions in the backbone are replaced with receptive-field attention convolution (RFAConv) to strengthen the model's focusing ability and the diversity of its feature representations. Finally, an adaptive hierarchical feature fusion (HFF) module is introduced into the neck to achieve refined fusion of cross-layer features and improve the model's detection of multi-scale targets. Training and validation were conducted on the FLIR dataset. Experimental results show that the improved IRD-YOLOv8n model reduces the missed-detection rate by 5% compared with the original YOLOv8 model and improves mAP@0.5 and mAP@0.5:0.95 by 2.4% and 2.3%, respectively, demonstrating broad application prospects.