Object Detection in Visible Light and Infrared Images Based on Adaptive Attention Mechanism
-
Abstract: To address the shortcomings of existing infrared and visible light object detection methods, this paper combines deep learning with multi-source object detection and proposes a detection method based on an adaptive attention mechanism. First, a dual-source feature extraction structure built around depthwise separable convolution extracts infrared and visible object features separately. Second, to fully exploit the complementary multimodal information of the object, an adaptive attention mechanism is designed that fuses infrared and visible features with data-driven weights, ensuring thorough feature fusion while suppressing noise interference. Finally, for multiscale object detection, the adaptive attention mechanism is combined with multiscale parameters to extract and fuse global and local object features, improving scale invariance. Experiments show that, compared with object detection algorithms of the same type, the proposed method achieves accurate and efficient object recognition and localization in complex scenes. In practical substation equipment detection, it also exhibits higher generalization and robustness, effectively assisting robots in object detection tasks.
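The data-driven weighted fusion described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: in the actual adaptive attention mechanism the modality scores are produced by a learned subnetwork from the features themselves, whereas here they are plain scalar inputs, and `adaptive_fuse` is a hypothetical name.

```python
import math

def adaptive_fuse(feat_ir, feat_vis, score_ir, score_vis):
    """Weighted fusion of an infrared and a visible 2-D feature map.

    Hypothetical simplification of the paper's adaptive attention
    mechanism: in the real AAM the two modality scores are data-driven
    (learned from the features); here they are plain inputs.
    """
    # Softmax over the two scores yields fusion weights that sum to 1,
    # so the more informative modality contributes more to the result.
    e_ir, e_vis = math.exp(score_ir), math.exp(score_vis)
    w_ir = e_ir / (e_ir + e_vis)
    w_vis = e_vis / (e_ir + e_vis)
    # Element-wise weighted sum of the two feature maps.
    fused = [[w_ir * a + w_vis * b for a, b in zip(ra, rb)]
             for ra, rb in zip(feat_ir, feat_vis)]
    return fused, (w_ir, w_vis)
```

With equal scores the two modalities are simply averaged; as one score grows, its branch dominates, which is the behavior the abstract describes as weighted fusion that reduces noise interference.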
-
表 1 特征提取支路
Table 1. Feature extraction branch
| Module | Layer | Repetitions | Output |
| --- | --- | --- | --- |
| Input | RGB, 3 | 1 | 512×448 |
| Init | Conv 3×3, 10; DWconv 3×3, 3 | 1 | 256×224 |
| Max pooling | 2×2, 3 | | |
| Block 1 | DWconv 3×3, 32; Residual | 1 | 128×112 |
| Block 2 | DWconv 3×3, 64; Residual | 2 | 64×56 |
| Block 3 | DWconv 3×3, 128; Residual | 3 | 32×28 |
| Block 4 | DWconv 3×3, 256; Residual | 3 | 16×14 |
| Block 5 | DWconv 3×3, 512; Residual | 2 | 8×7 |

表 2 网络训练超参及策略
Table 2. Network training hyperparameters and strategy
| Parameter | Value |
| --- | --- |
| Batch_Size | 4 |
| Base_Lr | 0.01 |
| Momentum | 0.95 |
| Weight_Decay | 0.0005 |
| Learning strategy | step |
| Optimization | Adam |
| Loss function | Cross Entropy |

表 3 单源网络测试对比
Table 3. Single source network test comparison
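The feature extraction branches in Table 1 are built around depthwise separable convolution. As a rough illustration of why this keeps the dual-source backbone lightweight, the weight counts of a standard convolution and its depthwise separable replacement can be compared for a Block-5-sized layer (bias terms and the residual path are ignored in this count):

```python
def conv_params(k, c_in, c_out):
    # Standard k×k convolution: every output channel mixes all input channels.
    return k * k * c_in * c_out

def dwconv_params(k, c_in, c_out):
    # Depthwise separable: one k×k filter per input channel (depthwise)
    # plus a 1×1 pointwise convolution that mixes the channels.
    return k * k * c_in + c_in * c_out

# A Block-5-sized layer from Table 1: 3×3 kernels, 256 -> 512 channels.
standard = conv_params(3, 256, 512)     # 1,179,648 weights
separable = dwconv_params(3, 256, 512)  # 133,376 weights, roughly 8.8x fewer
```

The saving grows with channel count, which matters most in the deep, wide blocks at the end of the backbone.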
表 4 双源特征融合测试对比
Table 4. Comparison of dual source feature fusion
All mAP columns report accuracy in %.

| Network | FPS | mAP | mAPs | mAPm | mAPl |
| --- | --- | --- | --- | --- | --- |
| Infrared branch | 120 | 62.1 | 43.8 | 65.9 | 72.8 |
| Visible branch | 121 | 69.3 | 48.1 | 71.9 | 79.3 |
| SE fusion | 89 | 71.2 | 51.5 | 74.7 | 82.4 |
| CBAM fusion | 87 | 72.0 | 52.4 | 75.2 | 83.1 |
| AAM fusion | 86 | 72.6 | 53.8 | 75.9 | 83.6 |

表 5 多尺度结构对比
Table 5. Multiscale structure comparison
All mAP columns report accuracy in %.

| Network | FPS | mAP | mAPs | mAPm | mAPl |
| --- | --- | --- | --- | --- | --- |
| Pyramid multiscale | 86 | 72.6 | 53.8 | 75.9 | 83.6 |
| AAM multiscale | 84 | 73.5 | 54.9 | 76.4 | 84.0 |

表 6 同类方法测试对比
Table 6. Test comparison of similar methods
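The mAPs/mAPm/mAPl columns in Tables 4 and 5 presumably follow the common COCO convention of bucketing objects by pixel area at 32² and 96² (small/medium/large). The excerpt does not state this explicitly, so the thresholds below are an assumption:

```python
def size_bucket(width, height):
    """Assign a ground-truth box to a small/medium/large mAP bucket.

    Assumption: the paper's mAPs/mAPm/mAPl use the COCO convention of
    splitting objects by pixel area at 32^2 and 96^2.
    """
    area = width * height
    if area < 32 ** 2:
        return "small"   # counted in mAPs
    if area < 96 ** 2:
        return "medium"  # counted in mAPm
    return "large"       # counted in mAPl
```

Under this reading, the lower mAPs values in Tables 4 and 5 reflect the usual difficulty of detecting small objects, which the multiscale structure targets.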
表 7 KAIST数据集测试对比
Table 7. Test comparison of KAIST dataset
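The multiscale structures compared in Table 5 extract and fuse global and local features. A hypothetical sketch of the pooling side is given below; fixed scales and the absence of the learned attention weighting are both simplifications of the paper's AAM multiscale design:

```python
def avg_pool(feat, s):
    """Average-pool a 2-D feature map with an s×s window and stride s."""
    return [[sum(feat[i * s + di][j * s + dj]
                 for di in range(s) for dj in range(s)) / (s * s)
             for j in range(len(feat[0]) // s)]
            for i in range(len(feat) // s)]

def multiscale_features(feat, scales=(1, 2, 4)):
    """Pool the same map at several scales so later fusion sees local
    detail (s=1) alongside increasingly global context (s=4).

    In the paper the contribution of each scale is weighted by the
    adaptive attention mechanism; that learned weighting is omitted here.
    """
    return [avg_pool(feat, s) for s in scales]
```

Fusing such a set of pooled maps is what gives the detector information at multiple receptive fields, which is the mechanism behind the scale-invariance improvement reported in Table 5.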
-