Abstract:
Object detection has long been a central research topic in computer vision, and the YOLO series of object detection models is widely used across many fields. However, most image data currently used for object detection come from a single type of sensor, which makes it difficult to fully characterize the imaged scene. As a result, the useful information available about each detected object is limited, especially in low illumination, at night, and in rain or fog. To improve nighttime object detection, this study proposes a multi-attention mechanism for infrared and visible images. The mechanism combines the Convolutional Block Attention Module (CBAM) with a Transformer to capture rich local and contextual information and to reduce false detections. To verify the effectiveness of the method, five current mainstream object detection algorithms were selected and tested on a public infrared object detection dataset. On this dataset, the proposed method raised the mAP from 62.6% for the original YOLOv7 to 71.5%, a gain of 8.9 percentage points. This study also produced an infrared–visible fusion dataset for nighttime object detection. On this dataset, the mAP rose from 79.90% for the original YOLOv7 to 94.80%, a gain of 14.90 percentage points.
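For readers unfamiliar with CBAM, the following is a minimal NumPy sketch of the standard CBAM formulation (channel attention followed by spatial attention), not the implementation used in this study. The function names, the reduction ratio `r`, the 7×7 kernel size, and the random weights are illustrative assumptions; biases are omitted, and the way the attention output is combined with the Transformer and YOLOv7 is not shown, since the abstract does not specify it.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Channel weights of shape (C, 1, 1) for a feature map of shape (C, H, W).

    w1: (C // r, C) and w2: (C, C // r) form a shared two-layer MLP
    (hypothetical weights; biases omitted for brevity).
    """
    avg = feat.mean(axis=(1, 2))  # global average pooling -> (C,)
    mx = feat.max(axis=(1, 2))    # global max pooling -> (C,)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # linear -> ReLU -> linear

    return sigmoid(mlp(avg) + mlp(mx))[:, None, None]

def spatial_attention(feat, kernel):
    """Spatial weights of shape (1, H, W); kernel has shape (2, k, k), k odd."""
    # Pool along the channel axis and stack the two maps -> (2, H, W).
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    h, w = feat.shape[1:]
    out = np.empty((h, w))
    # Naive "same" cross-correlation, written as loops for clarity.
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)[None]

def cbam(feat, w1, w2, kernel):
    """Apply channel attention, then spatial attention, to feat (C, H, W)."""
    refined = feat * channel_attention(feat, w1, w2)
    return refined * spatial_attention(refined, kernel)

# Toy example with random weights (illustrative values only).
rng = np.random.default_rng(0)
C, H, W, r = 8, 6, 6, 4
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1
out = cbam(feat, w1, w2, kernel)
```

Because both attention maps pass through a sigmoid, each output value is the corresponding input value scaled by a factor between 0 and 1, and the output keeps the shape of the input feature map.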