Object Detection Algorithm Based on Infrared and Visible Light Images
Abstract: To address the shortcomings of existing visible-light-based object detection algorithms, an object detection method based on infrared and visible image fusion is proposed. The method combines depthwise separable convolutions with residual structures to build two parallel, high-efficiency feature extraction branches that extract object information from the infrared and visible images, respectively. An adaptive feature fusion module is then introduced to fuse the features of corresponding scales from the two branches through autonomous learning, so that the two types of image information complement each other. Finally, a feature pyramid structure fuses deep features with shallow features layer by layer, improving detection accuracy for objects of different scales. Experimental results show that the proposed network fully fuses the effective information in infrared and visible images and achieves object recognition and localization while maintaining both accuracy and efficiency. Moreover, in an actual substation equipment detection scenario, the network exhibits good robustness and generalization ability and completes the detection task efficiently.
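The efficiency claim above rests on depthwise separable convolution, which factors a standard k×k convolution into a per-channel (depthwise) k×k convolution followed by a 1×1 pointwise convolution. A minimal sketch of the resulting parameter-count saving; the layer shape used here (3×3, 128→256 channels) is illustrative and not taken from the paper:

```python
def conv_params(k, c_in, c_out):
    """Parameters of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def dwsep_params(k, c_in, c_out):
    """Depthwise k x k (one filter per input channel) plus 1x1 pointwise."""
    return k * k * c_in + c_in * c_out

# Illustrative layer: 3x3 convolution mapping 128 -> 256 channels.
standard = conv_params(3, 128, 256)    # 294912
separable = dwsep_params(3, 128, 256)  # 33920
print(standard, separable, round(standard / separable, 1))  # 294912 33920 8.7
```

The roughly 8–9× reduction in parameters (and, correspondingly, in multiply–accumulate operations) is what makes the two parallel branches affordable.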
Key words:
- object detection
- infrared and visible light image
- deep learning
- adaptive fusion
Table 1. Feature extraction structure

Stage     Layer structure                 Repetitions   Output size
Original  RGB, 3                          1             448×448
Init      Conv 3×3; Max pooling 2×2       1             224×224
Stage1    DWconv 3×3, 32; Residual        1             112×112
Stage2    DWconv 3×3, 64; Residual        2             56×56
Stage3    DWconv 3×3, 128; Residual       4             28×28
Stage4    DWconv 3×3, 256; Residual       4             14×14
Stage5    DWconv 3×3, 512; Residual       2             7×7
Table 2. Comparison of visible-light network test results
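The output sizes in Table 1 follow from six successive stride-2 reductions of the 448×448 input: the Init max pooling plus one downsampling step in each of Stage1–Stage5. A quick check:

```python
# Walk the stride-2 steps of Table 1, halving the spatial size each time.
size = 448
sizes = []
for stage in ["Init", "Stage1", "Stage2", "Stage3", "Stage4", "Stage5"]:
    size //= 2
    sizes.append((stage, size))
print(sizes)
# [('Init', 224), ('Stage1', 112), ('Stage2', 56), ('Stage3', 28),
#  ('Stage4', 14), ('Stage5', 7)]
```

The 7×7 map from Stage5 is the deepest level, from which the feature pyramid fuses back toward the shallower, higher-resolution maps.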
Table 3. Test results of different structures

Network                FPS   mAP/%   mAPs/%   mAPm/%   mAPl/%
Infrared branch        94    61.0    45.3     64.2     71.8
Visible-light branch   93    67.3    48.5     70.1     77.6
Eltwise fusion         81    70.4    51.1     73.8     80.1
Concat fusion          79    71.6    52.3     74.2     81.6
This paper             78    73.8    54.3     75.1     83.2

Table 4. Comparison of test results of networks of the same type
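Table 3 contrasts three ways of merging the two branches' feature maps: element-wise addition, channel concatenation, and the paper's adaptive fusion. A minimal NumPy sketch of the three schemes follows; note the adaptive weights are reduced here to fixed scalars normalized by softmax, which is an illustrative assumption — in the paper's module the fusion weights are learned during training:

```python
import numpy as np

def eltwise_fusion(a, b):
    """Element-wise sum of two same-shape feature maps (C x H x W)."""
    return a + b

def concat_fusion(a, b):
    """Concatenation along the channel axis (axis 0 here)."""
    return np.concatenate([a, b], axis=0)

def adaptive_fusion(a, b, w):
    """Weighted sum with softmax-normalized weights.
    Fixed scalar weights stand in for the paper's learned weights."""
    e = np.exp(w - np.max(w))
    w = e / e.sum()
    return w[0] * a + w[1] * b

ir = np.ones((64, 14, 14))       # hypothetical infrared feature map
vis = 3 * np.ones((64, 14, 14))  # hypothetical visible-light feature map

print(eltwise_fusion(ir, vis).shape)   # (64, 14, 14)
print(concat_fusion(ir, vis).shape)    # (128, 14, 14), channels doubled
print(adaptive_fusion(ir, vis, np.array([0.0, 0.0]))[0, 0, 0])  # 2.0
```

Concatenation preserves both branches but doubles the channel count of every subsequent layer, while element-wise addition keeps the channel count but weights both modalities equally; the adaptive scheme keeps the cheap output shape of addition while letting the network decide, per scale, how much to trust each modality — consistent with its higher mAP at a similar FPS in Table 3.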