Abstract:
Harsh conditions such as strong light interference and low illumination are major challenges for target detection in underground mines. Detection based on visible light alone often produces missed or false detections, and existing algorithms for fusing visible and infrared images often fail to extract features effectively from both image types at the same time. To address this issue, we propose DF-YOLOv8, a multi-scale fusion target detection algorithm that combines visible and infrared images. A dual-stream feature extractor separately extracts features from low-resolution infrared and visible-light images. Feature maps are upsampled with bilinear interpolation and fused with a channel attention mechanism. A fusion weighted feature loss function and a consistency loss strategy improve the model's adaptability and robustness. Ablation and comparison experiments show that the proposed method achieves a mean average precision (mAP) of 87.9% on our self-built Coal-Mine Video dataset, an improvement of 6.0% and 5.7% over YOLOv7 and YOLOv8, respectively. With a detection speed of 67 FPS, it also meets real-time monitoring requirements.
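
As a rough illustration of the upsampling and fusion steps named in the abstract, the following Python sketch resizes a small feature map with bilinear interpolation and then reweights concatenated visible and infrared channels with a simple channel gate. It is not the DF-YOLOv8 implementation: the function names `bilinear_upsample` and `channel_attention_fuse` are hypothetical, nested lists stand in for tensors, the align-corners sampling convention is an assumption, and a parameter-free sigmoid gate replaces whatever learned channel attention module the model actually uses.

```python
import math


def bilinear_upsample(fmap, out_h, out_w):
    """Resize one 2-D feature map (list of rows) by bilinear interpolation.

    Assumption: align-corners convention, so the corner values of the
    input and output maps coincide.
    """
    in_h, in_w = len(fmap), len(fmap[0])
    out = []
    for i in range(out_h):
        # Fractional source row that output row i maps to.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            # Fractional source column that output column j maps to.
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
            bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


def channel_attention_fuse(vis_channels, ir_channels):
    """Concatenate two channel lists and rescale each channel by a gate.

    Assumption: the gate is sigmoid(global average of the channel), a
    parameter-free stand-in for a learned channel attention block.
    """
    channels = vis_channels + ir_channels  # concatenate along the channel axis
    fused = []
    for ch in channels:
        mean = sum(map(sum, ch)) / (len(ch) * len(ch[0]))  # squeeze step
        gate = 1.0 / (1.0 + math.exp(-mean))               # gating step
        fused.append([[v * gate for v in row] for row in ch])
    return fused


# Hypothetical toy inputs: a 2x2 infrared map upsampled to match a 3x3 visible map.
ir_small = [[0.0, 2.0], [4.0, 6.0]]
ir_up = bilinear_upsample(ir_small, 3, 3)
vis = [[1.0, 0.0, -1.0], [0.0, 0.0, 0.0], [-1.0, 0.0, 1.0]]
fused = channel_attention_fuse([vis], [ir_up])
```

In this toy setup the infrared stream is upsampled first so that both streams share a spatial size before fusion. A real detector would apply the same idea to multi-channel tensors at several scales, with attention weights learned during training.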