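Infrared Target Detection Algorithm Based on Improved YOLOv8 in Complex Street Scenes

HONG Li, ZENG Xiangjin (corresponding author)

School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, Hubei, China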

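Funding:

National Natural Science Foundation of China, Grant 61502354

Innovation Fund of Hubei Three Gorges Laboratory, Hubei Province, Grant SC215001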

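Details

About the authors:
HONG Li (b. 1998), male, master's student. Research interest: machine vision. E-mail: 1292286139@qq.com

Corresponding author:
ZENG Xiangjin (b. 1977), male, Ph.D., associate professor, master's supervisor. Research interests: intelligent robot control, machine vision, and motion control. E-mail: xjzeng21@163.com

CLC number: TP391.4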
Publication history
- Received: 2023-12-27
- Revised: 2024-01-23
- Published online: 2025-05-27
- Issue date: 2025-05-19

Infrared Target Detection Algorithm Based on Improved YOLOv8 in Complex Street Scenes

HONG Li, ZENG Xiangjin (corresponding author)

School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, China

Abstract:
Infrared images of complex street scenes suffer from occlusion and a lack of texture detail, which leads to false detections and missed detections. To address this problem, this paper proposes an infrared target detection algorithm for complex street scenes, using YOLOv8n as the baseline model. First, a multi-branch convolutional structure is designed to strengthen feature extraction and feature representation; structural reparameterization decouples the training and inference stages to improve inference speed, and global self-attention estimation is introduced to accelerate the attention computation, reducing its time complexity to O(n) and dynamically unifying convolution-kernel attention. Second, combining the advantages of depthwise separable convolution and deformable convolution, a salient-information-aware deformable convolution attention gating mechanism is applied after the upsampled results are fused with the output features of the backbone network, which enriches the semantic information of the fused features. Finally, the localization loss function is replaced with an efficient intersection-over-union (IoU) loss that computes separate width and height influence factors for the predicted box and the ground-truth box, which accelerates convergence. In validation experiments on the FLIR dataset, the improved algorithm reaches a mean average precision (mAP) of 79.5%, 3.9 percentage points higher than YOLOv8n, which demonstrates its advantage for infrared target detection against complex street backgrounds.

Keywords:
- infrared targets
- street scenes
- WIoU
- global self-attention estimation
- deformable convolution
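The abstract describes a localization loss that treats the width and height of the predicted and ground-truth boxes as separate terms. A minimal sketch of that idea is given below. It assumes the commonly published EIoU formulation (1 − IoU plus center, width, and height penalties normalized by the smallest enclosing box); the function name `eiou_loss` and the `(x1, y1, x2, y2)` box layout are illustrative choices, and because the keywords and the ablation table name WIoU, the authors' actual loss may differ.

```python
def eiou_loss(pred, target, eps=1e-9):
    """EIoU-style loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    Assumption: this follows the generic EIoU formulation, not necessarily
    the exact loss used in the paper.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Widths and heights of both boxes.
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Smallest enclosing box, used to normalize the penalty terms.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared distance between the two box centers.
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Width and height penalties are computed separately.
    width_term = (pw - tw) ** 2 / (cw * cw + eps)
    height_term = (ph - th) ** 2 / (ch * ch + eps)

    return 1.0 - iou + center_term + width_term + height_term
```

Because the width and height differences are penalized directly, a box with the correct center but the wrong width still receives a gradient on its width, which is the usual argument for faster convergence with this family of losses.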


Figure 1. YOLOv8 network structure

Figure 2. Improved YOLOv8 network structure

Figure 3. COSA processing flow

Figure 4. MBC-GSAE structure
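The MBC-GSAE module presumably contains the multi-branch convolution that the abstract says is collapsed by structural reparameterization at inference time. The sketch below illustrates that general technique on a single channel in plain Python. It is an assumption-laden toy, not the authors' module: the branch set (3×3, 1×1, identity) is borrowed from the generic RepVGG-style recipe, batch normalization is omitted, and all function names are hypothetical.

```python
def conv2d_same(image, kernel):
    """Single-channel 3x3 cross-correlation with zero padding."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-1, 2):
                for dj in range(-1, 2):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di + 1][dj + 1]
            out[i][j] = acc
    return out


def multi_branch(image, k3x3, k1x1):
    """Training-time form: 3x3, 1x1, and identity branches summed."""
    y3 = conv2d_same(image, k3x3)
    h, w = len(image), len(image[0])
    return [
        [y3[i][j] + k1x1 * image[i][j] + image[i][j] for j in range(w)]
        for i in range(h)
    ]


def merge_branches(k3x3, k1x1):
    """Inference-time form: fold all three branches into one 3x3 kernel."""
    merged = [row[:] for row in k3x3]
    # The 1x1 weight and the identity both act on the center tap only.
    merged[1][1] += k1x1 + 1.0
    return merged


image = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [2.0, 1.0, 1.0]]
k3 = [[0.1, 0.0, -0.1], [0.2, 0.5, -0.2], [0.1, 0.0, -0.1]]
k1 = 0.3

train_out = multi_branch(image, k3, k1)
infer_out = conv2d_same(image, merge_branches(k3, k1))
max_diff = max(
    abs(a - b) for row_a, row_b in zip(train_out, infer_out) for a, b in zip(row_a, row_b)
)
print(f"max difference between the two forms: {max_diff:.2e}")
```

The two forms agree because convolution is linear in its kernel, so the extra branches cost nothing at inference once merged.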

Figure 5. DAC structure

Figure 6. Comparison of the original image, YOLOv8n and improved YOLOv8n detection results

Table 1 Experimental environment configuration

Name	Environment Configuration
Operating System	Windows 10
CPU	Intel 12400F
GPU	NVIDIA RTX 4070 12GB
Framework	PyTorch 1.9.0 + CUDA 12.2 + cuDNN 8.9.6
Language	Python 3.9


Table 2 Comparison of experimental results

Models	FLOPs/G	Size/MB	AP (Car)/%	AP (Bicycle)/%	AP (Person)/%	mAP(IoU=0.5)/%	FPS
YOLOv5s	15.8	13.76	90.3	62.6	83.0	78.6	80.4
YOLO-IDSTD [16]	3.0	7.36	83.1	44.8	72.4	66.8	-
FEID-YOLO [23]	-	20.62	76.5	36.6	58.7	57.3	-
YOLOv7-tiny	13.0	11.72	90.1	61.5	83.8	78.5	108.2
MSC-YOLO	5.9	4.63	89.2	62.3	83.1	78.2	96.3
FS-YOLOv5s [24]	-	10.72	89.1	59.2	81.5	76.6	-
YOLOv8n	8.9	5.96	89.3	56.8	81.3	75.6	117.6
IMPROVED-YOLOv8n	9.6	6.52	90.2	66.3	82.1	79.5	114.1
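To make the accuracy/cost trade-off in the last two rows easier to read, the following short sketch computes the absolute and relative change of each metric between YOLOv8n and the improved model. The numbers are copied from the table; the dictionary keys and the helper name `deltas` are illustrative only.

```python
# Values copied from the YOLOv8n and IMPROVED-YOLOv8n rows of Table 2.
baseline = {"flops_g": 8.9, "size_mb": 5.96, "map50": 75.6, "fps": 117.6}
improved = {"flops_g": 9.6, "size_mb": 6.52, "map50": 79.5, "fps": 114.1}


def deltas(base, new):
    """Return {metric: (absolute change, relative change in %)}, rounded for display."""
    return {
        key: (
            round(new[key] - base[key], 2),
            round(100 * (new[key] - base[key]) / base[key], 1),
        )
        for key in base
    }


changes = deltas(baseline, improved)
for key, (abs_change, rel_change) in changes.items():
    print(f"{key}: {abs_change:+} ({rel_change:+}%)")
```

Read this way, the mAP gain of 3.9 points comes with a modest increase in FLOPs and model size and a small drop in FPS.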


Table 3 Comparison results of different models on the VOC 2007 dataset

Models	Input image size	Size/MB	mAP(IoU=0.5)/%	FPS
DPM-v5 [25]	-	-	32.1	0.7
DPM-CF [26]	-	-	30.6	5.2
Fastest-DPM [27]	-	-	30.4	28.6
Faster R-CNN(VGG)	600×1000	462	81.5	13.5
SSD(VGG)	512×512	105.8	77.2	49.5
DSSD(ResNet101)	321×321	490.3	78.4	9.5
FSSD(VGG)	300×300	-	78.6	68.5
YOLOv5s	544×544	28.8	73.5	76.2
YOLOv8n	512×640	5.96	76.8	104.3
IMPROVED-YOLOv8n	512×640	6.52	79.4	100.7


Table 4 Ablation experiment

Models	MBC-GSAE	DAC	WIoU	Car/%	Bicycle/%	Person/%	mAP(IoU=0.5)/%
YOLOv8n				89.3	56.8	81.3	75.6
	√			89.6	61.7	81.6	77.6
	√	√		89.8	64.9	81.8	78.8
	√	√	√	90.2	66.3	82.1	79.5
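The contribution of each module can be read off the ablation table by differencing consecutive rows. The sketch below does that arithmetic on the mAP column; the row labels and the helper name `incremental_gains` are illustrative, and the values are copied from the table.

```python
# mAP(IoU=0.5) in %, in the order the modules are added in the ablation table.
ablation = [
    ("YOLOv8n baseline", 75.6),
    ("+ MBC-GSAE", 77.6),
    ("+ DAC", 78.8),
    ("+ WIoU", 79.5),
]


def incremental_gains(rows):
    """Gain of each configuration over the previous one, in percentage points."""
    return [
        (name, round(score - prev_score, 1))
        for (_, prev_score), (name, score) in zip(rows, rows[1:])
    ]


gains = incremental_gains(ablation)
total = round(sum(gain for _, gain in gains), 1)
for name, gain in gains:
    print(f"{name}: +{gain} points")
print(f"total: +{total} points")
```

The three increments sum to the overall improvement over the baseline reported in the abstract, with the largest single step coming from MBC-GSAE.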


References (27)

[1]	LOU Zhehang, LUO Suyun. Vehicle infrared target detection based on YOLOX and Swin Transformer[J]. Infrared Technology, 2022, 44(11): 1167-1175. http://hwjs.nvir.cn/article/id/3d31e429-9365-4797-ab65-60e06a4414d8
[2]	DAI X, YUAN X, WEI X. TIRNet: Object detection in thermal infrared images for autonomous driving[J]. Applied Intelligence, 2020, 51(3): 1244-1261.
[3]	YI Shi, LI Xinrong, WU Zhijuan, et al. Night hare detection method based on infrared thermal imaging and improved YOLOV3[J]. Transactions of the Chinese Society of Agricultural Engineering, 2019, 35(19): 223-229.
[4]	LIU Xiaowen, ZENG Xueting, LI Tao, et al. Automatic detection method of body temperature in herd of pigs based on improved YOLOv7[J]. Transactions of the Chinese Society for Agricultural Machinery, 2023, 54(S1): 267-274. DOI: 10.6041/j.issn.1000-1298.2023.S1.029
[5]	LIU Gang, FENG Yankun, KANG Xi. Detection method of pig ear root temperature based on improved YOLO v4[J]. Transactions of the Chinese Society for Agricultural Machinery, 2023, 54(2): 240-248.
[6]	ZHANG H, LUO C, WANG Q, et al. A novel infrared video surveillance system using deep learning based techniques[J]. Multimedia Tools and Applications, 2018, 77(20): 26657-26676. DOI: 10.1007/s11042-018-5883-y
[7]	Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014: 580-587.
[8]	Girshick R. Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision, 2015: 1440-1448.
[9]	REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149. DOI: 10.1109/TPAMI.2016.2577031
[10]	Redmon J, Divvala S, Girshick R, et al. You only look once: unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 779-788.
[11]	Redmon J, Farhadi A. YOLO9000: better, faster, stronger[C]// Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017: 6517-6525.
[12]	Redmon J, Farhadi A. Yolov3: An incremental improvement[J/OL]. arXiv preprint arXiv: 1804.02767, https://arxiv.org/abs/1804.02767.
[13]	LIU W, Anguelov D, Erhan D, et al. SSD: single shot multibox detector[C]//Computer Vision–ECCV Proceedings, 2016: 21-37.
[14]	LIN T Y, Goyal P, Girshick R, et al. Focal loss for dense object detection[C]//Proceedings of the IEEE International Conference on Computer Vision, 2017: 2980-2988.
[15]	LI Qianglong, ZHOU Xinwen, WEI Meng'en, et al. Infrared target detection algorithm based on strip pooling and attention mechanism in street scene[J/OL]. Computer Engineering: 1-13, [2023-05-20]. DOI: 10.19678/j.issn.1000-3428.0065481.
[16]	JIANG Xinhao, CAI Wei, YANG Zhiyong, et al. Infrared dim and small target detection based on YOLO-IDSTD algorithm[J]. Infrared and Laser Engineering, 2022, 51(3): 502-511.
[17]	CHEN Yonglin, WANG Hengtao, ZHANG Shang. Lightweight infrared target detection algorithm based on YOLOv7[J]. Infrared Technology, 2024, 46(12): 1380-1389. http://hwjs.nvir.cn/article/id/e476d956-cfb7-4f3a-aafb-2e7b5e7a7890
[18]	CAI Wei, XU Peiwei, YANG Zhiyong, et al. Dim-small targets detection of infrared images in complex backgrounds[J]. Journal of Applied Optics, 2021, 42(4): 643-650.
[19]	WU Haiping, XIAO Bin, Noel Codella, et al. CvT: Introducing convolutions to vision transformers[J/OL]. arXiv: 2103.15808, https://doi.org/10.48550/arXiv.2103.15808.
[20]	Irwan Bello, Barret Zoph, Quoc Le, et al. Attention augmented convolutional networks[C]// IEEE International Conference on Computer Vision, 2019: 3286-3295.
[21]	ZHANG H, Fromont E, Lefevre S, et al. Multispectral fusion for object detection with cyclic fuse-and-refine blocks[C]//IEEE International Conference on Image Processing, 2020: 276-280.
[22]	DENG Shanshan, HUANG Hui, MA Yan. A small object detection algorithm based on improved Faster R-CNN[J]. Computer Engineering and Science, 2023, 45(5): 869-877. DOI: 10.3969/j.issn.1007-130X.2023.05.012
[23]	GUO Yong, ZHANG Kai. Fast infrared object detection based on feature enhancement[J]. Radio Engineering, 2023, 53(1): 47-55.
[24]	HUANG Lei, YANG Yuan, YANG Chengyu, et al. FS-YOLOv5: lightweight infrared road target detection method[J]. Computer Engineering and Applications, 2023, 59(9): 215-224.
[25]	Girshick R, Felzenszwalb P F, McAllester D. Object detection with discriminatively trained part-based models[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(9): 1627-1645. DOI: 10.1109/TPAMI.2009.167
[26]	Pedersoli M, Vedaldi A, Gonzàlez J, et al. A coarse-to-fine approach for fast deformable object detection[J]. Pattern Recognition, 2015, 48(5): 1844-1853.
[27]	YAN J, LEI Z, WEN L, et al. The fastest deformable part model for object detection[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2014: 2497-2504.