Multiscale Infrared Object Detection Network Based on YOLO-MIR Algorithm
Abstract: To address the low detection accuracy and poor robustness of object detection in infrared images compared with visible-light images, a multiscale infrared object detection network, YOLO-MIR (YOLO for Multiscale IR images), is proposed. First, to improve the network's adaptability to infrared images, the feature extraction and fusion modules are redesigned so that more infrared image detail is retained. Second, to strengthen the detection of objects at multiple scales, the scale of the fusion network is enlarged and the fusion of infrared image features is deepened. Finally, a data augmentation algorithm tailored to infrared images is designed to increase the robustness of the network. Ablation experiments evaluate the contribution of each method to network performance, and the results show a clear improvement on the infrared dataset. Compared with the mainstream YOLOv7 algorithm, the mean average precision improves by 3% with no increase in parameter count, improving the network's adaptability to infrared images and achieving accurate detection of objects at all scales.
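The abstract does not detail the infrared-specific augmentation pipeline. Purely as an illustration, a minimal sketch of the kind of single-channel augmentations commonly applied to IR imagery (contrast/brightness jitter, additive noise, horizontal flip) might look like the following; the function name, transform choices, and parameter ranges are hypothetical, not taken from the paper.

```python
import random

def augment_ir(image, rng=None):
    """Apply simple single-channel augmentations to an IR image.

    `image` is a list of rows of pixel intensities in [0, 255].
    The transforms and ranges below are illustrative guesses,
    not the augmentation algorithm described in the paper.
    """
    rng = rng or random.Random()
    gain = rng.uniform(0.8, 1.2)     # contrast jitter
    bias = rng.uniform(-15.0, 15.0)  # brightness jitter
    out = [
        [min(255.0, max(0.0, p * gain + bias + rng.gauss(0.0, 2.0)))
         for p in row]
        for row in image
    ]
    if rng.random() < 0.5:           # horizontal flip
        out = [row[::-1] for row in out]
    return out
```

Passing a seeded `random.Random` makes the augmentation reproducible across runs.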
Keywords:
- object detection
- deep learning
- infrared image
- YOLO
Table 1 Comparison of YOLOv7 data augmentation methods on different datasets
Table 2 YOLO-MIR ablation experiments on the FLIR dataset

YOLOv7 | Avg pooling | Data augmentation | Multi-scale fusion | mAP50/%
  √    |             |                   |                    |  90.0
  √    |      √      |                   |                    |  90.5
  √    |             |         √         |                    |  90.9
  √    |             |                   |         √          |  91.6
  √    |      √      |         √         |         √          |  92.7
Table 3 Comparison of YOLO-MIR with other networks on the FLIR dataset

Methods      | mAP/% | Person/% | Bicycle/% | Car/% | Parameters | FLOPs/B
Faster R-CNN | 79.2  |   76.4   |   72.5    | 88.4  |   41.2M    |  156.1
YOLOv4       | 79.3  |   76.2   |   75.1    | 87.3  |   63.9M    |  128.3
YOLOv5m      | 81.6  |   78.0   |   78.1    | 89.2  |   35.7M    |   50.2
SMG-Y[19]    | 77.0  |   78.5   |   65.8    | 86.6  |   43.8M    |   54.7
PMBW[20]     | 77.3  |   81.2   |   64.0    | 86.5  |   36.0M    |  120.0
RGBT[21]     | 82.9  |   80.1   |   76.7    | 91.8  |   82.7M    |  130.0
YOLO-ACN     | 82.1  |   79.1   |   57.9    | 85.1  |   34.5M    |  111.5
YOLOv7       | 89.7  |   88.6   |   87.2    | 92.8  |   36.9M    |  104.7
YOLO-MIR     | 92.7  |   91.1   |   91.0    | 97.2  |   37.0M    |  104.8
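The mAP50 figures above count a predicted box as correct when its intersection-over-union (IoU) with a ground-truth box is at least 0.5. As a reminder of the metric (not code from the paper), a minimal IoU helper, assuming boxes are (x1, y1, x2, y2) tuples:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

mAP50 then averages, over classes, the area under the precision-recall curve built from detections matched at this 0.5 threshold.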
[1] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE conference on computer vision and pattern recognition, 2014: 580-587.
[2] Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]//Proceedings of the IEEE conference on computer vision and pattern recognition, 2016: 779-788.
[3] Li Z, Zhou F. FSSD: feature fusion single shot multibox detector[J/OL]. arXiv preprint arXiv:1712.00960, 2017, https://arxiv.org/abs/1712.00960.
[4] Redmon J, Farhadi A. YOLOv3: an incremental improvement[J/OL]. arXiv preprint arXiv:1804.02767, 2018, https://arxiv.org/abs/1804.02767.
[5] Jocher G, Chaurasia A, Stoken A, et al. ultralytics/yolov5: v6.1 - TensorRT, TensorFlow Edge TPU and OpenVINO Export and Inference[Z/OL]. 2022, https://doi.org/10.5281/ZENODO.6222936.
[6] Bochkovskiy A, Wang C Y, Liao H Y M. YOLOv4: optimal speed and accuracy of object detection[J/OL]. arXiv preprint arXiv:2004.10934, 2020, https://arxiv.org/abs/2004.10934.
[7] Wang C Y, Bochkovskiy A, Liao H Y M. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[J/OL]. arXiv preprint arXiv:2207.02696, 2022, https://arxiv.org/abs/2207.02696.
[8] LIU S, QI L, QIN H, et al. Path aggregation network for instance segmentation[C]//Proceedings of the IEEE conference on computer vision and pattern recognition, 2018: 8759-8768.
[9] Redmon J, Farhadi A. YOLO9000: Better, Faster, Stronger[C]// Conference on Computer Vision & Pattern Recognition. IEEE, 2017: 6517-6525.
[10] Ren S, He K, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, 39(6): 1137-1149.
[11] He K, Gkioxari G, Dollár P, et al. Mask r-cnn[C]//Proceedings of the IEEE International Conference on Computer Vision, 2017: 2961-2969.
[12] Zheng Z, Wang P, Ren D, et al. Enhancing geometric factors in model learning and inference for object detection and instance segmentation[J]. IEEE Transactions on Cybernetics, 2021, 52(8): 8574-8586.
[13] Veit A, Matera T, Neumann L, et al. COCO-Text: dataset and benchmark for text detection and recognition in natural images[J/OL]. arXiv preprint arXiv:1601.07140, 2016, https://arxiv.org/abs/1601.07140.
[14] Smith A R. Color gamut transform pairs[J]. ACM Siggraph Computer Graphics, 1978, 12(3): 12-19. DOI: 10.1145/965139.807361
[15] Zhou Z, Cao J, Wang H, et al. Image denoising algorithm via doubly bilateral filtering[C]// International Conference on Information Engineering and Computer Science. IEEE, 2009: 1-4.
[16] Hoiem D, Divvala S K, Hays J H. Pascal VOC 2008 challenge[J]. Computer Science, 2009.
[17] ZHAO W Y. Discriminant component analysis for face recognition[C]//Proceedings 15th International Conference on Pattern Recognition, IEEE, 2000, 2: 818-821.
[18] Venkataraman V, FAN G, FAN X. Target tracking with online feature selection in FLIR imagery[C]// IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2007: 1-8.
[19] CHEN R, LIU S, MU J, et al. Borrow from source models: efficient infrared object detection with limited examples[J]. Applied Sciences, 2022, 12(4): 1896. DOI: 10.3390/app12041896
[20] Kera S B, Tadepalli A, Ranjani J J. A paced multi-stage block-wise approach for object detection in thermal images[J]. The Visual Computer, 2022, https://doi.org/10.1007/s00371-022-02445-x.
[21] Vadidar M, Kariminezhad A, Mayr C, et al. Robust environment perception for automated driving: a unified learning pipeline for visual-infrared object detection[C]// IEEE Intelligent Vehicles Symposium (IV). IEEE, 2022: 367-374.