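Spatially Adaptive and Content-Aware Infrared Small Target Detection

MIN Feng (闵锋); LIU Biao (刘彪); KUANG Yonggang (况永刚); MAO Yixin (毛一新); LIU Yuhui (刘煜晖)

Hubei Province Key Laboratory of Intelligent Robot, School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, China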

Funding: National Natural Science Foundation of China (62171328)

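Author biography:
MIN Feng (b. 1976), male, from Huanggang, Hubei; Ph.D., associate professor, master's supervisor. Main research interests: image processing and pattern recognition, computer vision.

Corresponding author:
LIU Biao (b. 2001), male, from Heze, Shandong; master's student. Main research interest: computer vision. E-mail: 445040158@qq.com

CLC number: TP391.4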
Publication history
- Received: 2023-07-02
- Revised: 2023-08-06
- Published online: 2024-07-24
- Issue date: 2024-07-19


Abstract:
Small targets in infrared street images occupy few pixels and carry limited color information, which leads to missed detections, false detections, and poor overall detection performance. To address these problems, a spatially adaptive and content-aware infrared small target detection algorithm is proposed. First, a spatially adaptive transformer is designed by stacking local attention and deformable attention, which strengthens the modeling of long-range dependencies and captures more spatial position information. Second, the content-aware reassembly of features (CARAFE) operator is used for feature upsampling; it aggregates contextual information within a large receptive field and uses shallow feature information to reassemble features adaptively. Finally, a 160×160 high-resolution prediction head is added so that the pixels of the input features map to finer detection regions, further improving small-target detection. Experimental results on the FLIR dataset show that the improved algorithm reaches a mean average precision (mAP) of 85.6%, which is 3.9 percentage points higher than YOLOX-s and demonstrates the advantage of the proposed algorithm for infrared small target detection.
Keywords:
- spatially adaptive
- content aware
- infrared target
- feature reassembly
- high-resolution prediction head
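
The effect of the extra prediction head on grid resolution can be illustrated with a little arithmetic. The sketch below is only an illustration: the 640×640 input size and the stride-4 placement of the new head are assumptions (they are consistent with a 160×160 head but are not stated above), and the function name `head_grid_sizes` is hypothetical. The strides 8, 16, and 32 are the default YOLOX head strides.

```python
def head_grid_sizes(input_size, strides):
    """Return the side length of the square prediction grid for each head."""
    sizes = []
    for stride in strides:
        if input_size % stride != 0:
            raise ValueError(f"input size {input_size} is not divisible by stride {stride}")
        sizes.append(input_size // stride)
    return sizes


INPUT_SIZE = 640                   # assumption: input resolution is not given in the text
BASELINE_STRIDES = (8, 16, 32)     # default YOLOX heads
IMPROVED_STRIDES = (4, 8, 16, 32)  # assumption: the added head sits at stride 4

baseline_grids = head_grid_sizes(INPUT_SIZE, BASELINE_STRIDES)
improved_grids = head_grid_sizes(INPUT_SIZE, IMPROVED_STRIDES)

# Each grid cell of a head covers stride x stride input pixels.
finest_cell_baseline = min(BASELINE_STRIDES) ** 2
finest_cell_improved = min(IMPROVED_STRIDES) ** 2

print("baseline grids:", baseline_grids)
print("improved grids:", improved_grids)
print("pixels per finest cell:", finest_cell_baseline, "->", finest_cell_improved)
```

Under these assumptions the finest grid cell shrinks from 8×8 to 4×4 input pixels, which is one way to read the statement that features are mapped to finer detection regions.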

Figure 1. Improved network structure diagram

Figure 2. Spatially adaptive transformer

Figure 3. Deformable attention

Figure 4. CARAFE upsampling operator flowchart

Figure 5. Size distribution of all category labels in the dataset

Figure 6. Comparison of detection results between YOLOv5s, YOLOX-s, and improved YOLOX-s
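
The reassembly step of the CARAFE operator (Figure 4) can be sketched in plain Python for a single-channel feature map. This is a simplified illustration rather than the authors' implementation: in CARAFE the per-location kernels come from a learned kernel prediction module, which is replaced here by a hypothetical stand-in (`center_peaked_logits`), and the 3×3 kernel size and upsampling factor of 2 are chosen only to keep the example small.

```python
import math


def softmax(logits):
    """Normalise kernel scores so that they sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def carafe_reassemble(feat, kernel_logits, scale=2, k_up=3):
    """Content-aware reassembly for one channel.

    feat: H x W nested list of floats.
    kernel_logits: (H*scale) x (W*scale) grid; each entry holds k_up*k_up
        unnormalised scores for that output location.
    """
    h, w = len(feat), len(feat[0])
    radius = k_up // 2
    out = [[0.0] * (w * scale) for _ in range(h * scale)]
    for oi in range(h * scale):
        for oj in range(w * scale):
            weights = softmax(kernel_logits[oi][oj])
            ci, cj = oi // scale, oj // scale  # source location of this output pixel
            acc = 0.0
            idx = 0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    si, sj = ci + di, cj + dj
                    if 0 <= si < h and 0 <= sj < w:  # zero padding outside the map
                        acc += weights[idx] * feat[si][sj]
                    idx += 1
            out[oi][oj] = acc
    return out


def center_peaked_logits(h, w, scale=2, k_up=3, peak=50.0):
    """Hypothetical stand-in for the learned kernel predictor.

    Every output location gets a kernel that puts almost all weight on the
    centre of its neighbourhood.
    """
    logits = [0.0] * (k_up * k_up)
    logits[(k_up * k_up) // 2] = peak
    return [[list(logits) for _ in range(w * scale)] for _ in range(h * scale)]


feat = [[1.0, 2.0], [3.0, 4.0]]
up = carafe_reassemble(feat, center_peaked_logits(2, 2))
for row in up:
    print([round(v, 3) for v in row])
```

With centre-peaked kernels the operator behaves like nearest-neighbour upsampling; a learned predictor would instead vary the weights with the local content, which is what makes the reassembly content-aware.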

Table 1 Hyperparameters for model training

Hyperparameter	Value
Maximum learning rate	1e-2
Minimum learning rate	1e-4 (1e-2 × 0.01)
Weight decay	5e-4
Epochs	300
Batch size	4
Freeze training	50
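
The settings in Table 1 could be collected in a small configuration object, as in the sketch below. The class and field names are hypothetical, and reading the "Freeze training" value of 50 as the number of initial epochs trained with a frozen backbone is an assumption, since the table does not give a unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Training hyperparameters taken from Table 1 (field names are illustrative)."""
    max_lr: float = 1e-2
    min_lr_ratio: float = 0.01   # minimum learning rate = max_lr * 0.01
    weight_decay: float = 5e-4
    epochs: int = 300
    batch_size: int = 4
    freeze_epochs: int = 50      # assumption: "Freeze training 50" counted in epochs

    @property
    def min_lr(self):
        return self.max_lr * self.min_lr_ratio

    def backbone_frozen(self, epoch):
        """Return True while the (zero-based) epoch is in the frozen phase."""
        return epoch < self.freeze_epochs


cfg = TrainConfig()
print("min lr:", cfg.min_lr)
print("frozen at epoch 49:", cfg.backbone_frozen(49))
print("frozen at epoch 50:", cfg.backbone_frozen(50))
```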


Table 2 Comparison of experimental results

Models	Backbone	Person AP₅₀/%	Bicycle AP₅₀/%	Car AP₅₀/%	mAP₅₀/%	Params/M	FPS
FCOS	ResNet50	67.7	52.4	73.6	64.6	32.1	71
Qin et al.[27]	EfficientNet	-	-	-	70.8	-	22
YOLOv5s	CSPDarknet-53	79.2	66.1	89.6	78.3	7.1	109
YOLOv5m	CSPDarknet-53	83.2	78.3	86.6	82.7	21.1	64
Li et al.[18]	CSPDarknet-53	84.8	67.1	90.5	80.7	8.1	-
LRAF-Net[28]	CSPDarknet-53	-	-	-	80.5	18.8	-
YOLOX-s	CSPDarknet-53	78.8	75.6	90.7	81.7	8.9	104
Improved model	SAT-CSPDarknet	86.6	80.1	90.3	85.6	11.6	95
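
As a quick consistency check on Table 2, mAP₅₀ can be recomputed as the unweighted mean of the three per-class AP₅₀ values. The sketch below uses only rows that list all three classes; the variable and function names are illustrative. Small gaps between the recomputed and reported values are expected because the per-class figures are already rounded to one decimal place.

```python
# Per-class AP50 (Person, Bicycle, Car) and reported mAP50, copied from Table 2.
PER_CLASS_AP50 = {
    "FCOS": (67.7, 52.4, 73.6),
    "YOLOv5s": (79.2, 66.1, 89.6),
    "YOLOv5m": (83.2, 78.3, 86.6),
    "Li et al. [18]": (84.8, 67.1, 90.5),
    "YOLOX-s": (78.8, 75.6, 90.7),
    "Improved model": (86.6, 80.1, 90.3),
}
REPORTED_MAP50 = {
    "FCOS": 64.6,
    "YOLOv5s": 78.3,
    "YOLOv5m": 82.7,
    "Li et al. [18]": 80.7,
    "YOLOX-s": 81.7,
    "Improved model": 85.6,
}


def mean_ap(per_class):
    """Unweighted mean of per-class average precision values."""
    return sum(per_class) / len(per_class)


recomputed = {name: mean_ap(aps) for name, aps in PER_CLASS_AP50.items()}
max_gap = max(abs(recomputed[name] - REPORTED_MAP50[name]) for name in recomputed)

# Gain of the improved model over the YOLOX-s baseline, in percentage points.
gain = round(REPORTED_MAP50["Improved model"] - REPORTED_MAP50["YOLOX-s"], 1)

for name, value in recomputed.items():
    print(f"{name}: recomputed {value:.2f}, reported {REPORTED_MAP50[name]}")
print("largest gap:", round(max_gap, 3))
print("gain over YOLOX-s:", gain)
```

The recomputed means agree with the reported column to within about 0.1, and the gain over YOLOX-s matches the 3.9 points quoted in the abstract.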


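Table 3 Results of the ablation experiments

Models	SAT	CARAFE	Head	Person AP₅₀/%	Bicycle AP₅₀/%	Car AP₅₀/%	mAP₅₀/%
YOLOX-s	-	-	-	78.8	75.6	90.7	81.7
YOLOX-s	√	-	-	78.6	77.2	91.2	82.3
YOLOX-s	√	√	-	82.1	78.6	91.3	84.0
YOLOX-s	√	√	√	86.6	80.1	90.3	85.6
Note: SAT, spatially adaptive transformer; Head, 160×160 high-resolution prediction head.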
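
The contribution of each component in Table 3 can be read off as the step-by-step change in mAP₅₀. The sketch below simply differences consecutive rows; the row labels and the function name `stepwise_gains` are illustrative.

```python
# mAP50 after each cumulative addition, copied from Table 3.
ABLATION_MAP50 = [
    ("YOLOX-s", 81.7),
    ("+ SAT", 82.3),
    ("+ CARAFE", 84.0),
    ("+ Head", 85.6),
]


def stepwise_gains(rows):
    """Return (label, gain in percentage points) for each added component."""
    gains = []
    for (_, previous), (label, current) in zip(rows, rows[1:]):
        gains.append((label, round(current - previous, 1)))
    return gains


gains = stepwise_gains(ABLATION_MAP50)
total_gain = round(sum(gain for _, gain in gains), 1)

for label, gain in gains:
    print(f"{label}: +{gain}")
print("total:", total_gain)
```

The three steps add up to the overall 3.9-point improvement. The table also shows that the per-class picture is less uniform: Car AP₅₀ drops from 91.3 to 90.3 when the extra head is added, while Person and Bicycle improve.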


References (28)

[1]	LOU Zhehang, LUO Suyun. Vehicle infrared target detection based on YOLOX and Swin Transformer[J]. Infrared Technology, 2022, 44(11): 1167-1175. http://hwjs.nvir.cn/cn/article/id/3d31e429-9365-4797-ab65-60e06a4414d8
[2]	Lowe D G. Distinctive image features from scale-invariant keypoints[J]. International Journal of Computer Vision, 2004, 60: 91-110. DOI: 10.1023/B:VISI.0000029664.99615.94
[3]	Viola P, Jones M. Rapid object detection using a boosted cascade of simple features[C]//Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2001, 1: I-I. DOI: 10.1109/CVPR.2001.990517.
[4]	Pal M, Foody G M. Feature selection for classification of hyperspectral data by SVM[J]. IEEE Transactions on Geoscience and Remote Sensing, 2010, 48(5): 2297-2307. DOI: 10.1109/TGRS.2009.2039484
[5]	DU Nini, SHAN Kaidong, WEI Shasha. LPformer: Laplacian pyramid multi-level transformer for infrared small target detection[J]. Infrared Technology, 2023, 45(6): 630-638. http://hwjs.nvir.cn/cn/article/id/ad309416-52b1-456f-b972-42f94c2aa3e1
[6]	WU Lianquan, CHU Xianteng, YANG Haitao, et al. X-ray detection of prohibited items based on improved YOLOX[J]. Infrared Technology, 2023, 45(4): 427-435. http://hwjs.nvir.cn/cn/article/id/7e45bcc9-aca9-49c9-8f88-0d8c22e5c7de
[7]	SU Haifeng, ZHAO Yan, WU Zejun, et al. Refined infrared object detection model for power equipment based on improved RetinaNet[J]. Infrared Technology, 2021, 43(11): 1104-1111. http://hwjs.nvir.cn/cn/article/id/3233a6a1-cbf0-4110-baa5-2a56e551f092
[8]	XU Wei, TANG Junwei, ZHANG Chi. Image segmentation method of liver cancer based on RA-UNet++ network[J/OL]. Software Guide: 1-6, [2023-06-28]. http://kns.cnki.net/kcms/detail/42.1671.TP.20230625.2233.048.html
[9]	LIU Weiguang, KONG Lingjun. A brachial plexus nerve ultrasonography segmentation network based on TransUnet[J/OL]. Radio Communications Technology: 1-8, [2023-06-28]. http://kns.cnki.net/kcms/detail/13.1099.TN.20230625.1719.020.html
[10]	Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014: 580-587.
[11]	Girshick R. Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision, 2015: 1440-1448.
[12]	REN S Q, HE K M, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149. DOI: 10.1109/TPAMI.2016.2577031
[13]	Redmon J, Divvala S, Girshick R, et al. You only look once: unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 779-788.
[14]	Redmon J, Farhadi A. YOLO9000: better, faster, stronger[C]// Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017: 6517-6525.
[15]	Redmon J, Farhadi A. YOLOv3: an incremental improvement[J/OL]. arXiv preprint arXiv:1804.02767, https://arxiv.org/abs/1804.02767.
[16]	LIU W, Anguelov D, Erhan D, et al. SSD: single shot multibox detector[C]//Computer Vision-ECCV Proceedings, 2016: 21-37.
[17]	LIN T Y, Goyal P, Girshick R, et al. Focal loss for dense object detection[C]//Proceedings of the IEEE International Conference on Computer Vision, 2017: 2980-2988.
[18]	LI Qianglong, ZHOU Xinwen, WEI Meng'en, et al. Infrared target detection algorithm based on strip pooling and attention mechanism in street scene[J/OL]. Computer Engineering: 1-13, [2023-05-20]. DOI: 10.19678/j.issn.1000-3428.0065481.
[19]	JIANG Xinhao, CAI Wei, YANG Zhiyong, et al. Infrared dim and small target detection based on YOLO-IDSTD algorithm[J]. Infrared and Laser Engineering, 2022, 51(3): 502-511. https://www.cnki.com.cn/Article/CJFDTOTAL-HWYJ202203045.htm
[20]	CAI Wei, XU Peiwei, YANG Zhiyong, et al. Dim-small targets detection of infrared images in complex backgrounds[J]. Journal of Applied Optics, 2021, 42(4): 643-650. https://www.cnki.com.cn/Article/CJFDTOTAL-YYGX202104012.htm
[21]	Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[J/OL]. Advances in Neural Information Processing Systems, 2017, https://arxiv.org/abs/1706.03762.
[22]	LIU Z, LIN Y, CAO Y, et al. Swin transformer: Hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021: 10012-10022.
[23]	GE Zheng, LIU Songtao, WANG Feng, et al. YOLOX: exceeding YOLO series in 2021[EB/OL]. (2021-07-06) [2023-09-27]. https://arxiv.org/abs/2107.08430.
[24]	WANG J, CHEN K, XU R, et al. CARAFE: content-aware reassembly of features[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019: 3007-3016.
[25]	WANG W, XIE E, LI X, et al. Pyramid vision transformer: a versatile backbone for dense prediction without convolutions[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021: 568-578.
[26]	XIA Z, PAN X, SONG S, et al. Vision transformer with deformable attention[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 4794-4803.
[27]	QIN Peng, TANG Chuanming, LIU Yunfeng, et al. Infrared target detection method based on improved YOLOv3[J]. Computer Engineering, 2022, 48(3): 211-219. DOI: 10.19678/j.issn.1000-3428.0060518.
[28]	FU H, WANG S, DUAN P, et al. LRAF-Net: long-range attention fusion network for visible-infrared object detection[J]. IEEE Transactions on Neural Networks and Learning Systems, 2023: 1-14. DOI: 10.1109/TNNLS.2023.3266452.
