Multiscale Infrared Object Detection Network Based on YOLO-MIR Algorithm
Abstract: To address the low detection accuracy and poor robustness of object detection in infrared images compared with visible-light images, a multiscale infrared object detection network, YOLO-MIR (YOLO for Multiscale IR images), is proposed. First, to improve the network's adaptability to infrared images, the feature extraction and fusion modules are redesigned so that more infrared image detail is retained. Second, to strengthen the detection of objects at multiple scales, the scale of the fusion network is enlarged and the fusion of infrared image features is deepened. Finally, a data augmentation algorithm tailored to infrared images is designed to increase the robustness of the network. Ablation experiments evaluate the contribution of each method to network performance, and the results show a clear improvement on the infrared dataset. Compared with the mainstream YOLOv7 algorithm, the mean average precision improves by 3% with no increase in parameter count, improving the network's adaptability to infrared images and achieving accurate detection of objects at all scales.
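The abstract does not detail the infrared-specific augmentation pipeline. Purely as an illustration, a minimal sketch of the kind of single-channel augmentations commonly applied to IR imagery (contrast/brightness jitter, additive noise, horizontal flip) might look like the following; the function name, transform choices, and parameter ranges are hypothetical, not taken from the paper.

```python
import random

def augment_ir(image, rng=None):
    """Apply simple single-channel augmentations to an IR image.

    `image` is a list of rows of pixel intensities in [0, 255].
    The transforms and ranges below are illustrative guesses,
    not the augmentation algorithm described in the paper.
    """
    rng = rng or random.Random()
    gain = rng.uniform(0.8, 1.2)     # contrast jitter
    bias = rng.uniform(-15.0, 15.0)  # brightness jitter
    out = [
        [min(255.0, max(0.0, p * gain + bias + rng.gauss(0.0, 2.0)))
         for p in row]
        for row in image
    ]
    if rng.random() < 0.5:           # horizontal flip
        out = [row[::-1] for row in out]
    return out
```

Passing a seeded `random.Random` makes the augmentation reproducible across runs.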
Keywords:
- object detection
- deep learning
- infrared image
- YOLO
Table 1 Comparison of YOLOv7 data augmentation methods on different datasets
Table 2 YOLO-MIR ablation experiments on the FLIR dataset

YOLOv7 | Avg pooling | Data augmentation | Multi-scale fusion | mAP50/%
  √    |             |                   |                    |  90.0
  √    |      √      |                   |                    |  90.5
  √    |             |         √         |                    |  90.9
  √    |             |                   |         √          |  91.6
  √    |      √      |         √         |         √          |  92.7
Table 3 Comparison of YOLO-MIR with other networks on the FLIR dataset

Methods      | mAP/% | Person/% | Bicycle/% | Car/% | Parameters | FLOPs/B
Faster R-CNN | 79.2  |   76.4   |   72.5    | 88.4  |   41.2M    |  156.1
YOLOv4       | 79.3  |   76.2   |   75.1    | 87.3  |   63.9M    |  128.3
YOLOv5m      | 81.6  |   78.0   |   78.1    | 89.2  |   35.7M    |   50.2
SMG-Y[19]    | 77.0  |   78.5   |   65.8    | 86.6  |   43.8M    |   54.7
PMBW[20]     | 77.3  |   81.2   |   64.0    | 86.5  |   36.0M    |  120.0
RGBT[21]     | 82.9  |   80.1   |   76.7    | 91.8  |   82.7M    |  130.0
YOLO-ACN     | 82.1  |   79.1   |   57.9    | 85.1  |   34.5M    |  111.5
YOLOv7       | 89.7  |   88.6   |   87.2    | 92.8  |   36.9M    |  104.7
YOLO-MIR     | 92.7  |   91.1   |   91.0    | 97.2  |   37.0M    |  104.8
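The mAP50 figures above count a predicted box as correct when its intersection-over-union (IoU) with a ground-truth box is at least 0.5. As a reminder of the metric (not code from the paper), a minimal IoU helper, assuming boxes are (x1, y1, x2, y2) tuples:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

mAP50 then averages, over classes, the area under the precision-recall curve built from detections matched at this 0.5 threshold.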
[1] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE conference on computer vision and pattern recognition, 2014: 580-587.
[2] Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]//Proceedings of the IEEE conference on computer vision and pattern recognition, 2016: 779-788.
[3] Li Z, Zhou F. FSSD: feature fusion single shot multibox detector[J/OL]. arXiv preprint arXiv:1712.00960, 2017, https://arxiv.org/abs/1712.00960.
[4] Redmon J, Farhadi A. YOLOv3: an incremental improvement[J/OL]. arXiv preprint arXiv:1804.02767, 2018, https://arxiv.org/abs/1804.02767.
[5] Jocher G, Chaurasia A, Stoken A, et al. ultralytics/yolov5: v6.1 - TensorRT, TensorFlow Edge TPU and OpenVINO Export and Inference[Z/OL]. 2022, https://doi.org/10.5281/ZENODO.6222936.
[6] Bochkovskiy A, Wang C Y, Liao H Y M. YOLOv4: optimal speed and accuracy of object detection[J/OL]. arXiv preprint arXiv:2004.10934, 2020, https://arxiv.org/abs/2004.10934.
[7] Wang C Y, Bochkovskiy A, Liao H Y M. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[J/OL]. arXiv preprint arXiv:2207.02696, 2022, https://arxiv.org/abs/2207.02696.
[8] LIU S, QI L, QIN H, et al. Path aggregation network for instance segmentation[C]//Proceedings of the IEEE conference on computer vision and pattern recognition, 2018: 8759-8768.
[9] Redmon J, Farhadi A. YOLO9000: Better, Faster, Stronger[C]// Conference on Computer Vision & Pattern Recognition. IEEE, 2017: 6517-6525.
[10] Ren S, He K, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, 39(6): 1137-1149.
[11] He K, Gkioxari G, Dollár P, et al. Mask r-cnn[C]//Proceedings of the IEEE International Conference on Computer Vision, 2017: 2961-2969.
[12] Zheng Z, Wang P, Ren D, et al. Enhancing geometric factors in model learning and inference for object detection and instance segmentation[J]. IEEE Transactions on Cybernetics, 2021, 52(8): 8574-8586.
[13] Veit A, Matera T, Neumann L, et al. COCO-Text: dataset and benchmark for text detection and recognition in natural images[J/OL]. arXiv preprint arXiv:1601.07140, 2016, https://arxiv.org/abs/1601.07140.
[14] Smith A R. Color gamut transform pairs[J]. ACM Siggraph Computer Graphics, 1978, 12(3): 12-19. DOI: 10.1145/965139.807361
[15] Zhou Z, Cao J, Wang H, et al. Image denoising algorithm via doubly bilateral filtering[C]// International Conference on Information Engineering and Computer Science. IEEE, 2009: 1-4.
[16] Hoiem D, Divvala S K, Hays J H. Pascal VOC 2008 challenge[J]. Computer Science, 2009.
[17] ZHAO W Y. Discriminant component analysis for face recognition[C]//Proceedings 15th International Conference on Pattern Recognition, IEEE, 2000, 2: 818-821.
[18] Venkataraman V, FAN G, FAN X. Target tracking with online feature selection in FLIR imagery[C]// IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2007: 1-8.
[19] CHEN R, LIU S, MU J, et al. Borrow from source models: efficient infrared object detection with limited examples[J]. Applied Sciences, 2022, 12(4): 1896. DOI: 10.3390/app12041896
[20] Kera S B, Tadepalli A, Ranjani J J. A paced multi-stage block-wise approach for object detection in thermal images[J]. The Visual Computer, 2022, https://doi.org/10.1007/s00371-022-02445-x.
[21] Vadidar M, Kariminezhad A, Mayr C, et al. Robust environment perception for automated driving: a unified learning pipeline for visual-infrared object detection[C]// IEEE Intelligent Vehicles Symposium (IV). IEEE, 2022: 367-374.