Infrared Target Detection Algorithm Based on Improved YOLOv8 in Complex Street Scenes
Abstract: Infrared images of complex street scenes often suffer from occlusion and a lack of texture detail, which lead to false and missed detections. To address this, this paper proposes an infrared target detection algorithm for complex street scenes, using YOLOv8n as the baseline model. First, a multi-branch convolutional structure is designed to strengthen feature extraction and representation. Structural reparameterization decouples the training and inference stages to increase inference speed, and global self-attention estimation is introduced to accelerate the attention computation, reducing its time complexity to O(n) and dynamically unifying convolution and attention. Second, combining the advantages of depthwise separable convolution and deformable convolution, a saliency-aware deformable convolution attention gating mechanism is applied after the upsampled features are fused with the output features of the backbone network, enriching the semantic information of the fused features. Finally, the localization loss is replaced with an efficient intersection-over-union (IoU) loss that computes the width and height influence factors of the predicted and ground-truth boxes separately, accelerating convergence. In validation experiments on the FLIR dataset, the improved algorithm reaches a mean average precision (mAP) of 79.5%, 3.9 percentage points higher than YOLOv8n, demonstrating its advantage for infrared target detection in complex street scenes.
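The first step relies on structural reparameterization: because convolution is linear in its weights, parallel branches used during training can be folded into a single kernel for inference. The paper does not list its branch configuration, so the sketch below assumes a RepVGG-style block (a 3x3 convolution, a 1x1 convolution and an identity shortcut, with equal input and output channels) and omits BatchNorm, which in practice is folded into each branch's weights and bias first. All function names are hypothetical.

```python
import numpy as np

def conv2d_same(x, w, b):
    """Naive stride-1 'same' convolution (cross-correlation, as in DL frameworks).
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k; b: (C_out,)."""
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    _, h, wd = x.shape
    y = np.zeros((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k) window
            y[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return y + b[:, None, None]

def multi_branch(x, w3, b3, w1, b1):
    """Training-time block (assumed): 3x3 branch + 1x1 branch + identity branch."""
    return conv2d_same(x, w3, b3) + conv2d_same(x, w1, b1) + x

def reparameterize(w3, b3, w1, b1):
    """Fold the three branches into one 3x3 kernel and one bias for inference."""
    c = w3.shape[0]
    # A 1x1 kernel is a 3x3 kernel that is zero everywhere except the centre.
    w1_as_3x3 = np.pad(w1, ((0, 0), (0, 0), (1, 1), (1, 1)))
    # The identity is a 3x3 kernel that copies channel c to channel c.
    identity = np.zeros_like(w3)
    identity[np.arange(c), np.arange(c), 1, 1] = 1.0
    return w3 + w1_as_3x3 + identity, b3 + b1

rng = np.random.default_rng(0)
c = 4
x = rng.standard_normal((c, 8, 8))
w3, b3 = rng.standard_normal((c, c, 3, 3)), rng.standard_normal(c)
w1, b1 = rng.standard_normal((c, c, 1, 1)), rng.standard_normal(c)
w, b = reparameterize(w3, b3, w1, b1)
print(np.allclose(multi_branch(x, w3, b3, w1, b1), conv2d_same(x, w, b)))  # True
```

Because the merged block is a single 3x3 convolution, inference evaluates one kernel instead of three branches, which is the usual source of the speed-up that structural reparameterization offers.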
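The abstract states that global self-attention estimation reduces the attention cost to O(n) but does not give the formula. One well-known way to obtain linear cost is kernelized (linear) attention: softmax(QKᵀ) is replaced by φ(Q)φ(K)ᵀ, and the product is reassociated as φ(Q)(φ(K)ᵀV), so the n×n matrix is never formed. The sketch below assumes that formulation and the elu(x)+1 feature map commonly used in linear Transformers; it illustrates the complexity argument and is not necessarily the paper's GSAE module.

```python
import numpy as np

def phi(x):
    """Positive feature map elu(x) + 1 (an assumed choice)."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention(q, k, v):
    """Kernelized attention in O(n) sequence length.
    q, k: (n, d); v: (n, d_v)."""
    pq, pk = phi(q), phi(k)
    kv = pk.T @ v          # (d, d_v): keys and values summarised once
    z = pk.sum(axis=0)     # (d,): shared normaliser
    return (pq @ kv) / (pq @ z)[:, None]

def quadratic_attention(q, k, v):
    """The same kernelized attention via the explicit n x n matrix, O(n^2)."""
    a = phi(q) @ phi(k).T
    return (a / a.sum(axis=1, keepdims=True)) @ v

rng = np.random.default_rng(0)
q, k = rng.standard_normal((256, 16)), rng.standard_normal((256, 16))
v = rng.standard_normal((256, 8))
print(np.allclose(linear_attention(q, k, v), quadratic_attention(q, k, v)))  # True
```

For a feature map flattened to n = H·W tokens, the quadratic version stores an n×n matrix, whereas the linear version never stores anything larger than d×d_v, which is how the cost falls from O(n²) to O(n) in the number of tokens.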
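The second step applies a deformable convolution attention gate to the fused features. The paper does not spell out the module, so the following is only a single-channel sketch of the two ingredients it names: a 3x3 deformable convolution, in which every kernel tap samples the input at a learned fractional offset through bilinear interpolation, and a sigmoid gate that rescales the fused feature map. The depthwise separable and saliency-aware parts are omitted, the offsets are taken as given rather than predicted, and the function names are hypothetical.

```python
import numpy as np

def bilinear(img, y, x):
    """Sample a 2-D map at fractional (y, x); positions outside the map read as zero."""
    h, w = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                val += img[yy, xx] * (1 - abs(y - yy)) * (1 - abs(x - xx))
    return val

# Regular 3x3 sampling grid, in the same order as the offset tensor's tap axis.
TAPS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def deform_conv3x3(img, kernel, offsets):
    """Single-channel 3x3 deformable convolution, stride 1.
    img: (H, W); kernel: (3, 3); offsets: (H, W, 9, 2) holding a (dy, dx) per tap."""
    h, w = img.shape
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            for t, (dy, dx) in enumerate(TAPS):
                oy, ox = offsets[i, j, t]
                out[i, j] += kernel[dy + 1, dx + 1] * bilinear(img, i + dy + oy, j + dx + ox)
    return out

def deformable_gate(fused, kernel, offsets):
    """Rescale the fused features by a sigmoid gate computed with the deformable conv."""
    gate = 1.0 / (1.0 + np.exp(-deform_conv3x3(fused, kernel, offsets)))
    return fused * gate
```

With all offsets set to zero the deformable convolution reduces to an ordinary zero-padded 3x3 convolution; learned offsets let each tap move toward informative regions, and the gate, being strictly between 0 and 1, can only attenuate each fused activation.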
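The final step replaces the localization loss with an efficient IoU loss that penalizes width and height differences separately. That description matches the published EIoU formulation, L = 1 − IoU + ρ²(b, b^gt)/c² + ρ²(w, w^gt)/C_w² + ρ²(h, h^gt)/C_h², where ρ is the Euclidean distance, b and b^gt are the box centres, c is the diagonal of the smallest box enclosing both boxes and C_w, C_h are its width and height. The keywords and Table 4 name WIoU, so the loss actually used may differ; the sketch below implements only these EIoU terms for one pair of (x1, y1, x2, y2) boxes.

```python
def eiou_loss(pred, gt, eps=1e-9):
    """EIoU-style loss for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Overlap term: 1 - IoU
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    iou = inter / (pw * ph + gw * gh - inter + eps)

    # Width and height of the smallest box enclosing both boxes
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)

    # Squared centre distance, normalised by the enclosing box's squared diagonal
    dx = (px1 + px2 - gx1 - gx2) / 2
    dy = (py1 + py2 - gy1 - gy2) / 2
    centre = (dx ** 2 + dy ** 2) / (cw ** 2 + ch ** 2 + eps)

    # Separate width and height penalties (the "influence factors" in the abstract)
    width = (pw - gw) ** 2 / (cw ** 2 + eps)
    height = (ph - gh) ** 2 / (ch ** 2 + eps)
    return 1.0 - iou + centre + width + height

print(eiou_loss((0, 0, 4, 2), (0, 0, 2, 2)))  # ≈ 0.8: IoU 0.5, centre term 0.05, width term 0.25
```

Unlike a single aspect-ratio term, the separate width and height penalties give a direct gradient on each side length, which is the mechanism usually credited for the faster convergence claimed above.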
Keywords:
- infrared targets
- street scenes
- WIoU
- global self-attention estimation
- deformable convolution
Table 1 Experimental environment configuration

| Name | Environment configuration |
|---|---|
| Operating system | Windows 10 |
| CPU | Intel 12400F |
| GPU | NVIDIA RTX 4070 12GB |
| Framework | PyTorch 1.9.0 + CUDA 12.2 + cuDNN 8.9.6 |
| Language | Python 3.9 |
Table 2 Comparison of experimental results
| Models | FLOPs/G | Size/MB | AP Car/% | AP Bicycle/% | AP Person/% | mAP(IoU=0.5)/% | FPS |
|---|---|---|---|---|---|---|---|
| YOLOv5s | 15.8 | 13.76 | 90.3 | 62.6 | 83.0 | 78.6 | 80.4 |
| YOLO-IDSTD[16] | 3.0 | 7.36 | 83.1 | 44.8 | 72.4 | 66.8 | - |
| FEID-YOLO[23] | - | 20.62 | 76.5 | 36.6 | 58.7 | 57.3 | - |
| YOLOv7-tiny | 13.0 | 11.72 | 90.1 | 61.5 | 83.8 | 78.5 | 108.2 |
| MSC-YOLO | 5.9 | 4.63 | 89.2 | 62.3 | 83.1 | 78.2 | 96.3 |
| FS-YOLOv5s[24] | - | 10.72 | 89.1 | 59.2 | 81.5 | 76.6 | - |
| YOLOv8n | 8.9 | 5.96 | 89.3 | 56.8 | 81.3 | 75.6 | 117.6 |
| IMPROVED-YOLOv8n | 9.6 | 6.52 | 90.2 | 66.3 | 82.1 | 79.5 | 114.1 |
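As a sanity check on Table 2, mAP(IoU=0.5) should equal the arithmetic mean of the three per-class APs (Car, Bicycle, Person). The short script below recomputes it from the table's own numbers. Every row agrees to within rounding of the per-class values except YOLOv8n, whose per-class APs average to 75.8 rather than the reported 75.6; the table alone does not show which figure is off, and the 3.9-point gain quoted in the abstract uses the reported 75.6.

```python
# Per-class AP@0.5 (%) for Car, Bicycle, Person and the reported mAP, copied from Table 2.
rows = {
    "YOLOv5s": ((90.3, 62.6, 83.0), 78.6),
    "YOLO-IDSTD": ((83.1, 44.8, 72.4), 66.8),
    "FEID-YOLO": ((76.5, 36.6, 58.7), 57.3),
    "YOLOv7-tiny": ((90.1, 61.5, 83.8), 78.5),
    "MSC-YOLO": ((89.2, 62.3, 83.1), 78.2),
    "FS-YOLOv5s": ((89.1, 59.2, 81.5), 76.6),
    "YOLOv8n": ((89.3, 56.8, 81.3), 75.6),
    "IMPROVED-YOLOv8n": ((90.2, 66.3, 82.1), 79.5),
}

def mean_ap(aps):
    """mAP as the arithmetic mean of the per-class APs."""
    return sum(aps) / len(aps)

for name, (aps, reported) in rows.items():
    print(f"{name:18s} recomputed {mean_ap(aps):5.2f}  reported {reported:4.1f}")

# Gain of the improved model over the baseline, in percentage points.
print(f"gain: {rows['IMPROVED-YOLOv8n'][1] - rows['YOLOv8n'][1]:.1f}")
```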
Table 3 Comparison results of different models on the VOC 2007 dataset
| Models | Input image size | Size/MB | mAP(IoU=0.5)/% | FPS |
|---|---|---|---|---|
| DPM-v5[25] | - | - | 32.1 | 0.7 |
| DPM-CF[26] | - | - | 30.6 | 5.2 |
| Fastest-DPM[27] | - | - | 30.4 | 28.6 |
| Faster R-CNN (VGG) | 600×1000 | 462 | 81.5 | 13.5 |
| SSD (VGG) | 512×512 | 105.8 | 77.2 | 49.5 |
| DSSD (ResNet101) | 321×321 | 490.3 | 78.4 | 9.5 |
| FSSD (VGG) | 300×300 | - | 78.6 | 68.5 |
| YOLOv5s | 544×544 | 28.8 | 73.5 | 76.2 |
| YOLOv8n | 512×640 | 5.96 | 76.8 | 104.3 |
| IMPROVED-YOLOv8n | 512×640 | 6.52 | 79.4 | 100.7 |
Table 4 Ablation experiment
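| Models | MBC-GSAE | DAC | WIoU | Car/% | Bicycle/% | Person/% | mAP(IoU=0.5)/% |
|---|---|---|---|---|---|---|---|
| YOLOv8n | | | | 89.3 | 56.8 | 81.3 | 75.6 |
| | √ | | | 89.6 | 61.7 | 81.6 | 77.6 |
| | √ | √ | | 89.8 | 64.9 | 81.8 | 78.8 |
| | √ | √ | √ | 90.2 | 66.3 | 82.1 | 79.5 |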
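Reading Table 4 as a cumulative ablation, in which MBC-GSAE, DAC and WIoU are added in turn on top of YOLOv8n (an assumption based on the order of the check marks), the contribution of each module is the mAP difference between consecutive rows, and the contributions sum to the 3.9-point total:

```python
# mAP@0.5 (%) after each cumulative step of the ablation in Table 4.
steps = [
    ("YOLOv8n baseline", 75.6),
    ("+ MBC-GSAE", 77.6),
    ("+ DAC", 78.8),
    ("+ WIoU", 79.5),
]

# Gain of each step over the previous one, rounded to the table's precision.
gains = [(name, round(m - prev, 1)) for (_, prev), (name, m) in zip(steps, steps[1:])]
total = round(steps[-1][1] - steps[0][1], 1)

for name, g in gains:
    print(f"{name}: {g:+.1f} points")
print(f"total: {total:+.1f} points")
```

Under this reading the multi-branch structure with global self-attention estimation contributes the largest share (2.0 points), most of it on the Bicycle class.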
[1] LOU Zhehang, LUO Suyun. Vehicle infrared target detection based on YOLOX and Swin Transformer[J]. Infrared Technology, 2022, 44(11): 1167-1175. http://hwjs.nvir.cn/article/id/3d31e429-9365-4797-ab65-60e06a4414d8
[2] DAI X, YUAN X, WEI X. TIRNet: Object detection in thermal infrared images for autonomous driving[J]. Applied Intelligence, 2020, 51(3): 1244-1261.
[3] YI Shi, LI Xinrong, WU Zhijuan, et al. Night hare detection method based on infrared thermal imaging and improved YOLOv3[J]. Transactions of the Chinese Society of Agricultural Engineering, 2019, 35(19): 223-229.
[4] LIU Xiaowen, ZENG Xueting, LI Tao, et al. Automatic detection method of body temperature in herd of pigs based on improved YOLOv7[J]. Transactions of the Chinese Society for Agricultural Machinery, 2023, 54(S1): 267-274. DOI: 10.6041/j.issn.1000-1298.2023.S1.029
[5] LIU Gang, FENG Yankun, KANG Xi. Detection method of pig ear root temperature based on improved YOLO v4[J]. Transactions of the Chinese Society for Agricultural Machinery, 2023, 54(2): 240-248.
[6] ZHANG H, LUO C, WANG Q, et al. A novel infrared video surveillance system using deep learning based techniques[J]. Multimedia Tools and Applications, 2018, 77(20): 26657-26676. DOI: 10.1007/s11042-018-5883-y
[7] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014: 580-587.
[8] GIRSHICK R. Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision, 2015: 1440-1448.
[9] REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149. DOI: 10.1109/TPAMI.2016.2577031
[10] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: Unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 779-788.
[11] REDMON J, FARHADI A. YOLO9000: Better, faster, stronger[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017: 6517-6525.
[12] REDMON J, FARHADI A. YOLOv3: An incremental improvement[J/OL]. arXiv preprint arXiv:1804.02767, 2018. https://arxiv.org/abs/1804.02767
[13] LIU W, ANGUELOV D, ERHAN D, et al. SSD: Single shot multibox detector[C]//Computer Vision – ECCV Proceedings, 2016: 21-37.
[14] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]//Proceedings of the IEEE International Conference on Computer Vision, 2017: 2980-2988.
[15] LI Qianglong, ZHOU Xinwen, WEI Meng'en, et al. Infrared target detection algorithm based on strip pooling and attention mechanism in street scene[J/OL]. Computer Engineering: 1-13 [2023-05-20]. DOI: 10.19678/j.issn.1000-3428.0065481
[16] JIANG Xinhao, CAI Wei, YANG Zhiyong, et al. Infrared dim and small target detection based on YOLO-IDSTD algorithm[J]. Infrared and Laser Engineering, 2022, 51(3): 502-511.
[17] CHEN Yonglin, WANG Hengtao, ZHANG Shang. Lightweight infrared target detection algorithm based on YOLOv7[J]. Infrared Technology, 2024, 46(12): 1380-1389. http://hwjs.nvir.cn/article/id/e476d956-cfb7-4f3a-aafb-2e7b5e7a7890
[18] CAI Wei, XU Peiwei, YANG Zhiyong, et al. Dim-small targets detection of infrared images in complex backgrounds[J]. Journal of Applied Optics, 2021, 42(4): 643-650.
[19] WU H, XIAO B, CODELLA N, et al. CvT: Introducing convolutions to vision transformers[J/OL]. arXiv:2103.15808, 2021. https://doi.org/10.48550/arXiv.2103.15808
[20] BELLO I, ZOPH B, VASWANI A, et al. Attention augmented convolutional networks[C]//Proceedings of the IEEE International Conference on Computer Vision, 2019: 3286-3295.
[21] ZHANG H, FROMONT E, LEFEVRE S, et al. Multispectral fusion for object detection with cyclic fuse-and-refine blocks[C]//IEEE International Conference on Image Processing, 2020: 276-280.
[22] DENG Shanshan, HUANG Hui, MA Yan. A small object detection algorithm based on improved Faster R-CNN[J]. Computer Engineering and Science, 2023, 45(5): 869-877. DOI: 10.3969/j.issn.1007-130X.2023.05.012
[23] GUO Yong, ZHANG Kai. Fast infrared object detection based on feature enhancement[J]. Radio Engineering, 2023, 53(1): 47-55.
[24] HUANG Lei, YANG Yuan, YANG Chengyu, et al. FS-YOLOv5: Lightweight infrared road target detection method[J]. Computer Engineering and Applications, 2023, 59(9): 215-224.
[25] FELZENSZWALB P F, GIRSHICK R B, MCALLESTER D, et al. Object detection with discriminatively trained part-based models[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(9): 1627-1645. DOI: 10.1109/TPAMI.2009.167
[26] PEDERSOLI M, VEDALDI A, GONZÀLEZ J, et al. A coarse-to-fine approach for fast deformable object detection[J]. Pattern Recognition, 2015, 48(5): 1844-1853.
[27] YAN J, LEI Z, WEN L, et al. The fastest deformable part model for object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014: 2497-2504.