Object Detection in Visible Light and Infrared Images Based on Adaptive Attention Mechanism
-
Abstract: To address the shortcomings of existing infrared and visible light object detection methods, this paper combines deep learning with multi-source object detection and proposes a detection method based on an adaptive attention mechanism. First, a dual-source feature extraction structure built around depthwise separable convolution extracts infrared and visible object features separately. Second, to fully exploit the complementary multimodal information of the object, an adaptive attention mechanism is designed that fuses infrared and visible features with data-driven weights, ensuring thorough feature fusion while suppressing noise interference. Finally, for multiscale object detection, the adaptive attention mechanism is combined with multiscale parameters to extract and fuse global and local object features, improving scale invariance. Experiments show that, compared with object detection algorithms of the same type, the proposed method achieves accurate and efficient object recognition and localization in complex scenes. In practical substation equipment detection, it also exhibits higher generalization and robustness, effectively assisting robots in object detection tasks.
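The data-driven weighted fusion described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: in the actual adaptive attention mechanism the modality scores are produced by a learned subnetwork from the features themselves, whereas here they are plain scalar inputs, and `adaptive_fuse` is a hypothetical name.

```python
import math

def adaptive_fuse(feat_ir, feat_vis, score_ir, score_vis):
    """Weighted fusion of an infrared and a visible 2-D feature map.

    Hypothetical simplification of the paper's adaptive attention
    mechanism: in the real AAM the two modality scores are data-driven
    (learned from the features); here they are plain inputs.
    """
    # Softmax over the two scores yields fusion weights that sum to 1,
    # so the more informative modality contributes more to the result.
    e_ir, e_vis = math.exp(score_ir), math.exp(score_vis)
    w_ir = e_ir / (e_ir + e_vis)
    w_vis = e_vis / (e_ir + e_vis)
    # Element-wise weighted sum of the two feature maps.
    fused = [[w_ir * a + w_vis * b for a, b in zip(ra, rb)]
             for ra, rb in zip(feat_ir, feat_vis)]
    return fused, (w_ir, w_vis)
```

With equal scores the two modalities are simply averaged; as one score grows, its branch dominates, which is the behavior the abstract describes as weighted fusion that reduces noise interference.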
-
表 1 特征提取支路
Table 1. Feature extraction branch
| Module | Layer | Repetitions | Output |
| --- | --- | --- | --- |
| Input | RGB, 3 | 1 | 512×448 |
| Init | Conv 3×3, 10; DWconv 3×3, 3 | 1 | 256×224 |
| Max pooling | 2×2, 3 | | |
| Block 1 | DWconv 3×3, 32; Residual | 1 | 128×112 |
| Block 2 | DWconv 3×3, 64; Residual | 2 | 64×56 |
| Block 3 | DWconv 3×3, 128; Residual | 3 | 32×28 |
| Block 4 | DWconv 3×3, 256; Residual | 3 | 16×14 |
| Block 5 | DWconv 3×3, 512; Residual | 2 | 8×7 |

表 2 网络训练超参及策略
Table 2. Network training hyperparameters and strategy
| Parameter | Value |
| --- | --- |
| Batch_Size | 4 |
| Base_Lr | 0.01 |
| Momentum | 0.95 |
| Weight_Decay | 0.0005 |
| Learning strategy | step |
| Optimization | Adam |
| Loss function | Cross Entropy |

表 3 单源网络测试对比
Table 3. Single source network test comparison
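The feature extraction branches in Table 1 are built around depthwise separable convolution. As a rough illustration of why this keeps the dual-source backbone lightweight, the weight counts of a standard convolution and its depthwise separable replacement can be compared for a Block-5-sized layer (bias terms and the residual path are ignored in this count):

```python
def conv_params(k, c_in, c_out):
    # Standard k×k convolution: every output channel mixes all input channels.
    return k * k * c_in * c_out

def dwconv_params(k, c_in, c_out):
    # Depthwise separable: one k×k filter per input channel (depthwise)
    # plus a 1×1 pointwise convolution that mixes the channels.
    return k * k * c_in + c_in * c_out

# A Block-5-sized layer from Table 1: 3×3 kernels, 256 -> 512 channels.
standard = conv_params(3, 256, 512)     # 1,179,648 weights
separable = dwconv_params(3, 256, 512)  # 133,376 weights, roughly 8.8x fewer
```

The saving grows with channel count, which matters most in the deep, wide blocks at the end of the backbone.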
表 4 双源特征融合测试对比
Table 4. Comparison of dual source feature fusion
All mAP columns report accuracy in %.

| Network | FPS | mAP | mAPs | mAPm | mAPl |
| --- | --- | --- | --- | --- | --- |
| Infrared branch | 120 | 62.1 | 43.8 | 65.9 | 72.8 |
| Visible branch | 121 | 69.3 | 48.1 | 71.9 | 79.3 |
| SE fusion | 89 | 71.2 | 51.5 | 74.7 | 82.4 |
| CBAM fusion | 87 | 72.0 | 52.4 | 75.2 | 83.1 |
| AAM fusion | 86 | 72.6 | 53.8 | 75.9 | 83.6 |

表 5 多尺度结构对比
Table 5. Multiscale structure comparison
All mAP columns report accuracy in %.

| Network | FPS | mAP | mAPs | mAPm | mAPl |
| --- | --- | --- | --- | --- | --- |
| Pyramid multiscale | 86 | 72.6 | 53.8 | 75.9 | 83.6 |
| AAM multiscale | 84 | 73.5 | 54.9 | 76.4 | 84.0 |

表 6 同类方法测试对比
Table 6. Test comparison of similar methods
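The mAPs/mAPm/mAPl columns in Tables 4 and 5 presumably follow the common COCO convention of bucketing objects by pixel area at 32² and 96² (small/medium/large). The excerpt does not state this explicitly, so the thresholds below are an assumption:

```python
def size_bucket(width, height):
    """Assign a ground-truth box to a small/medium/large mAP bucket.

    Assumption: the paper's mAPs/mAPm/mAPl use the COCO convention of
    splitting objects by pixel area at 32^2 and 96^2.
    """
    area = width * height
    if area < 32 ** 2:
        return "small"   # counted in mAPs
    if area < 96 ** 2:
        return "medium"  # counted in mAPm
    return "large"       # counted in mAPl
```

Under this reading, the lower mAPs values in Tables 4 and 5 reflect the usual difficulty of detecting small objects, which the multiscale structure targets.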
表 7 KAIST数据集测试对比
Table 7. Test comparison of KAIST dataset
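The multiscale structures compared in Table 5 extract and fuse global and local features. A hypothetical sketch of the pooling side is given below; fixed scales and the absence of the learned attention weighting are both simplifications of the paper's AAM multiscale design:

```python
def avg_pool(feat, s):
    """Average-pool a 2-D feature map with an s×s window and stride s."""
    return [[sum(feat[i * s + di][j * s + dj]
                 for di in range(s) for dj in range(s)) / (s * s)
             for j in range(len(feat[0]) // s)]
            for i in range(len(feat) // s)]

def multiscale_features(feat, scales=(1, 2, 4)):
    """Pool the same map at several scales so later fusion sees local
    detail (s=1) alongside increasingly global context (s=4).

    In the paper the contribution of each scale is weighted by the
    adaptive attention mechanism; that learned weighting is omitted here.
    """
    return [avg_pool(feat, s) for s in scales]
```

Fusing such a set of pooled maps is what gives the detector information at multiple receptive fields, which is the mechanism behind the scale-invariance improvement reported in Table 5.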
-