Citation: LI Ruihong, FU Zhitao, ZHANG Shaochen, ZHANG Jian, WANG Leiguang. Nighttime Object Detection in Infrared and Visible Images Based on Multi-Attention Mechanism[J]. Infrared Technology, 2024, 46(12): 1371-1379.
Object detection has long been a research hotspot in computer vision, and the YOLO series of object detection models is widely used across many fields. However, most image data currently used for object detection come from a single type of sensor, which makes it difficult to fully characterize the imaging scene. As a result, the detected objects carry limited useful information, especially at night and under low illumination, rain, or fog. To improve nighttime object detection, this study proposed a multi-attention mechanism for infrared and visible images. The mechanism combines the CBAM attention mechanism with a Transformer to capture rich local and contextual information and to reduce false detections. To verify the effectiveness of the method, five current mainstream object detection algorithms were selected and tested on a public infrared object detection dataset, where the proposed method raised the mAP from 62.6% for the original YOLOv7 to 71.5%. This study also produced an infrared–visible fusion dataset for nighttime object detection, on which the mAP rose from 79.90% for the original YOLOv7 to 94.80%.
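The abstract does not say how the CBAM and Transformer components are wired into YOLOv7, so the following is only a minimal NumPy sketch of the general idea, not the authors' implementation. It assumes CBAM-style channel and spatial gating followed by a single-head self-attention step with a residual connection (layer normalization and the feed-forward sublayer of a full Transformer block are omitted). All function names, shapes, and the random weights are hypothetical and chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """CBAM-style channel gate for a feature map x of shape (C, H, W)."""
    avg_desc = x.mean(axis=(1, 2))          # global average pooling, shape (C,)
    max_desc = x.max(axis=(1, 2))           # global max pooling, shape (C,)

    def shared_mlp(v):                      # same two-layer MLP for both descriptors
        return w2 @ np.maximum(w1 @ v, 0.0)

    gate = sigmoid(shared_mlp(avg_desc) + shared_mlp(max_desc))
    return x * gate[:, None, None]

def spatial_attention(x, kernel):
    """CBAM-style spatial gate: k x k convolution over channel-wise avg and max maps."""
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])   # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    _, h, w = x.shape
    logits = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            logits[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return x * sigmoid(logits)[None, :, :]

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the H*W positions."""
    c, h, w = x.shape
    tokens = x.reshape(c, h * w).T                        # (N, C), one token per pixel
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    out = tokens + weights @ v                            # residual connection
    return out.T.reshape(c, h, w)

def multi_attention(x, p):
    """Assumed ordering: local gating (CBAM) first, then global context (attention)."""
    x = channel_attention(x, p["w1"], p["w2"])
    x = spatial_attention(x, p["kernel"])
    return self_attention(x, p["wq"], p["wk"], p["wv"])

def random_params(c, reduction=4, k=7, seed=0):
    """Random stand-ins for weights that would be learned during training."""
    rng = np.random.default_rng(seed)
    r = max(c // reduction, 1)
    return {
        "w1": rng.normal(size=(r, c)) * 0.1,
        "w2": rng.normal(size=(c, r)) * 0.1,
        "kernel": rng.normal(size=(2, k, k)) * 0.1,
        "wq": rng.normal(size=(c, c)) * 0.1,
        "wk": rng.normal(size=(c, c)) * 0.1,
        "wv": rng.normal(size=(c, c)) * 0.1,
    }

features = np.random.default_rng(1).normal(size=(8, 6, 6))  # toy (C, H, W) feature map
out = multi_attention(features, random_params(8))
print(out.shape)
```

The CBAM stage reweights channels and pixel locations using pooled local statistics, while the self-attention stage lets every position attend to every other position. This is one plausible reading of the "local and contextual information" the abstract refers to.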