Infrared and Visible Image Fusion Using Attention-Based Generative Adversarial Networks
-
Abstract: At present, deep learning-based fusion methods rely on convolutional kernels to extract local features, and the limitations of single-scale networks, convolutional kernel size, and network depth prevent them from capturing the multi-scale and global characteristics of images. Therefore, this paper proposes an infrared and visible image fusion method using attention-based generative adversarial networks. The method employs a generator consisting of an encoder and a decoder, together with two discriminators. A multi-scale module and a channel self-attention mechanism are designed in the encoder; they effectively extract multi-scale features and establish long-range dependencies between feature channels, thereby enhancing the global characteristics of the multi-scale features. In addition, two discriminators are constructed to establish adversarial relationships between the fused image and the two source images, preserving more detailed information. Experimental results demonstrate that the proposed method outperforms other typical methods in both subjective and objective evaluations.
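The excerpt does not include an implementation, so the following is a minimal PyTorch-style sketch of the kind of channel self-attention block described above, which models long-range dependencies across feature channels in the spirit of the channel attention module of [22]. The class name, parameterisation, and residual scaling are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class ChannelSelfAttention(nn.Module):
    """Channel self-attention: re-weights each channel by its affinity to all
    other channels, modelling long-range dependencies across feature channels.
    (Illustrative sketch, not the authors' released code.)"""

    def __init__(self):
        super().__init__()
        # Learnable scale, initialised to zero so the block starts as an identity mapping
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):                                     # x: (B, C, H, W)
        b, c, h, w = x.size()
        query = x.view(b, c, -1)                              # (B, C, N), N = H*W
        key = x.view(b, c, -1).permute(0, 2, 1)               # (B, N, C)
        affinity = torch.softmax(torch.bmm(query, key), -1)   # (B, C, C) channel affinity
        value = x.view(b, c, -1)                              # (B, C, N)
        out = torch.bmm(affinity, value).view(b, c, h, w)     # re-weighted channels
        return self.gamma * out + x                           # residual connection
```

Initialising gamma to zero lets the network learn how much channel re-weighting to apply, starting from the plain multi-scale features.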
-
Table 1. Parameter settings of the generator
Parts     Layer      Kernel size/stride   Input/Output channels, activation
Encoder   C0         3×3/1                1/16, LeakyReLU
          Res2Net1   -                    16/32, LeakyReLU
          Res2Net2   -                    32/64, LeakyReLU
Decoder   C1         3×3/1                128/64, LeakyReLU
          C2         3×3/1                64/32, LeakyReLU
          C3         3×3/1                32/16, LeakyReLU
          C4         3×3/1                16/1, Tanh
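A minimal PyTorch sketch of a generator wired according to Table 1 may help make the layer settings concrete. The Res2NetBlock stand-in (a plain 3×3 convolution here, in place of the multi-scale block of [21]) and the assumption that the 128-channel input to C1 comes from concatenating the 64-channel infrared and visible encoder features are illustrative, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class Res2NetBlock(nn.Module):
    """Stand-in for the multi-scale Res2Net block of Gao et al. [21];
    a single 3x3 convolution is used here only to keep the sketch runnable."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                                  nn.LeakyReLU(0.2))

    def forward(self, x):
        return self.body(x)

class Generator(nn.Module):
    """Encoder-decoder generator following the layer settings of Table 1."""
    def __init__(self):
        super().__init__()
        # Encoder: C0 -> Res2Net1 -> Res2Net2
        self.c0 = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.LeakyReLU(0.2))
        self.res2net1 = Res2NetBlock(16, 32)
        self.res2net2 = Res2NetBlock(32, 64)
        # Decoder: C1-C3 with LeakyReLU, C4 with Tanh (all 3x3, stride 1)
        self.c1 = nn.Sequential(nn.Conv2d(128, 64, 3, padding=1), nn.LeakyReLU(0.2))
        self.c2 = nn.Sequential(nn.Conv2d(64, 32, 3, padding=1), nn.LeakyReLU(0.2))
        self.c3 = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.LeakyReLU(0.2))
        self.c4 = nn.Sequential(nn.Conv2d(16, 1, 3, padding=1), nn.Tanh())

    def encode(self, x):
        return self.res2net2(self.res2net1(self.c0(x)))

    def forward(self, ir, vis):
        # Assumed fusion: concatenating the 64-channel infrared and visible
        # encoder features yields the 128 channels expected by C1.
        feat = torch.cat([self.encode(ir), self.encode(vis)], dim=1)
        return self.c4(self.c3(self.c2(self.c1(feat))))
```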
Table 2. Parameter settings of the discriminators
Layer        Kernel size/stride   Input/Output channels, activation
L1           3×3/2                1/16, LeakyReLU
L2           3×3/2                16/32, LeakyReLU
L3           3×3/2                32/64, LeakyReLU
L4           3×3/2                64/128, LeakyReLU
L5 (FC(1))   -                    128/1, Tanh
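Likewise, a hedged sketch of one discriminator following Table 2: four stride-2 3×3 convolutions with LeakyReLU followed by a fully connected layer FC(1) with Tanh. The global average pooling used to reach the fully connected layer is an assumption, since the excerpt does not state how the spatial dimensions are reduced. The two discriminators of the method would be two independent instances of this module, one judging the fused image against the infrared source and one against the visible source.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Discriminator following the layer settings of Table 2.
    The global average pooling before the fully connected layer is an
    assumption; the excerpt does not specify the spatial reduction."""
    def __init__(self):
        super().__init__()
        def block(in_ch, out_ch):
            return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
                                 nn.LeakyReLU(0.2))
        # L1-L4: stride-2 convolutions that progressively widen the channels
        self.features = nn.Sequential(block(1, 16), block(16, 32),
                                      block(32, 64), block(64, 128))
        self.fc = nn.Linear(128, 1)   # L5: FC(1) with Tanh

    def forward(self, x):
        f = self.features(x)          # (B, 128, H/16, W/16)
        f = f.mean(dim=(2, 3))        # assumed global average pooling -> (B, 128)
        return torch.tanh(self.fc(f)) # one score per image
```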
Table 3. Quantitative comparison of the ablation experiments
Methods      EN       SD        CC       SCD      MS-SSIM   VIFF
No-CA        7.2439   42.6525   0.6305   1.7705   0.9221    0.5149
No-Res2Net   7.3372   46.2277   0.6295   1.8453   0.9227    0.5572
Ours         7.3596   46.9659   0.6290   1.8494   0.9278    0.5683
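For context, EN and SD in Table 3 are conventionally computed as the Shannon entropy of the grey-level histogram and the standard deviation of pixel intensities of the fused image. A short NumPy sketch under that assumption (8-bit greyscale input assumed):

```python
import numpy as np

def entropy(img):
    """EN: Shannon entropy of the 8-bit grey-level histogram of the fused image."""
    hist, _ = np.histogram(img.ravel(), bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                              # drop empty bins to avoid log(0)
    return float(-(p * np.log2(p)).sum())

def standard_deviation(img):
    """SD: standard deviation of pixel intensities, an indicator of contrast."""
    return float(np.std(img.astype(np.float64)))
```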
Table 4. Comparison of time efficiency
Methods     TNO      Roadscene
CVT         1.33     0.92
ASR         332.21   165.23
WLS         2.23     1.17
DenseFuse   0.11     0.08
FusionGan   1.98     1.02
IFCNN       0.08     0.07
Ours        0.23     0.19
-
[1] MA J, MA Y, LI C. Infrared and visible image fusion methods and applications: a survey[J]. Information Fusion, 2019, 45: 153-178. doi: 10.1016/j.inffus.2018.02.004
[2] LI S, KANG X, FANG L, et al. Pixel-level image fusion: a survey of the state of the art[J]. Information Fusion, 2017, 33: 100-112. doi: 10.1016/j.inffus.2016.05.004
[3] LIU Y, CHEN X, WANG Z, et al. Deep learning for pixel-level image fusion: recent advances and future prospects[J]. Information Fusion, 2018, 42: 158-173. doi: 10.1016/j.inffus.2017.10.007
[4] LI S, YANG B, HU J. Performance comparison of different multi-resolution transforms for image fusion[J]. Information Fusion, 2011, 12(2): 74-84. doi: 10.1016/j.inffus.2010.03.002
[5] ZHANG Q, LIU Y, BLUM R S, et al. Sparse representation based multi-sensor image fusion for multi-focus and multi-modality images: a review[J]. Information Fusion, 2018, 40: 57-75. doi: 10.1016/j.inffus.2017.05.006
[6] ZHANG X, MA Y, ZHANG Y, et al. Infrared and visible image fusion via saliency analysis and local edge-preserving multi-scale decomposition[J]. Journal of the Optical Society of America A: Optics, Image Science, and Vision, 2017, 34(8): 1400-1410.
[7] LIU Y, LIU S, WANG Z. A general framework for image fusion based on multi-scale transform and sparse representation[J]. Information Fusion, 2015, 24: 147-164. doi: 10.1016/j.inffus.2014.09.004
[8] HAN J, PAUWELS E J, DE ZEEUW P. Fast saliency-aware multimodality image fusion[J]. Neurocomputing, 2013, 111: 70-80. doi: 10.1016/j.neucom.2012.12.015
[9] YIN H. Sparse representation with learned multiscale dictionary for image fusion[J]. Neurocomputing, 2015, 148: 600-610. doi: 10.1016/j.neucom.2014.07.003
[10] WANG Z, YANG F, PENG Z, et al. Multi-sensor image enhanced fusion algorithm based on NSST and top-hat transformation[J]. Optik - International Journal for Light and Electron Optics, 2015, 126(23): 4184-4190. doi: 10.1016/j.ijleo.2015.08.118
[11] CUI G, FENG H, XU Z, et al. Detail preserved fusion of visible and infrared images using regional saliency extraction and multi-scale image decomposition[J]. Optics Communications, 2015, 341: 199-209. doi: 10.1016/j.optcom.2014.12.032
[12] LI Q, LU L, LI Z, et al. Coupled GAN with relativistic discriminators for infrared and visible images fusion[J]. IEEE Sensors Journal, 2021, 21(6): 7458-7467. doi: 10.1109/JSEN.2019.2921803
[13] LIU Y, CHEN X, CHENG J, et al. Infrared and visible image fusion with convolutional neural networks[J]. International Journal of Wavelets, Multiresolution and Information Processing, 2018, 16(3): 1850018. doi: 10.1142/S0219691318500182
[14] LI H, WU X J. DenseFuse: a fusion approach to infrared and visible images[J]. IEEE Transactions on Image Processing, 2019, 28(5): 2614-2623. doi: 10.1109/TIP.2018.2887342
[15] XU H, MA J, JIANG J, et al. U2Fusion: a unified unsupervised image fusion network[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 44(1): 502-518.
[16] HOU R. VIF-Net: an unsupervised framework for infrared and visible image fusion[J]. IEEE Transactions on Computational Imaging, 2020, 6: 640-651. doi: 10.1109/TCI.2020.2965304
[17] LI H, WU X J, KITTLER J. RFN-Nest: an end-to-end residual fusion network for infrared and visible images[J]. Information Fusion, 2021, 73: 72-86. doi: 10.1016/j.inffus.2021.02.023
[18] MA J, YU W, LIANG P, et al. FusionGAN: a generative adversarial network for infrared and visible image fusion[J]. Information Fusion, 2019, 48: 11-26. doi: 10.1016/j.inffus.2018.09.004
[19] MA J, LIANG P, YU W, et al. Infrared and visible image fusion via detail preserving adversarial learning[J]. Information Fusion, 2020, 54: 85-98. doi: 10.1016/j.inffus.2019.07.005
[20] MA J, XU H, JIANG J, et al. DDcGAN: a dual-discriminator conditional generative adversarial network for multi-resolution image fusion[J]. IEEE Transactions on Image Processing, 2020, 29: 4980-4995. doi: 10.1109/TIP.2020.2977573
[21] GAO S, CHENG M M, ZHAO K, et al. Res2Net: a new multi-scale backbone architecture[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(2): 652-662. doi: 10.1109/TPAMI.2019.2938758
[22] FU J, LIU J, TIAN H, et al. Dual attention network for scene segmentation[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019. doi: 10.1109/CVPR.2019.00326
[23] NENCINI F, GARZELLI A, BARONTI S, et al. Remote sensing image fusion using the curvelet transform[J]. Information Fusion, 2007, 8(2): 143-156. doi: 10.1016/j.inffus.2006.02.001
[24] LIU Y, WANG Z. Simultaneous image fusion and denoising with adaptive sparse representation[J]. IET Image Processing, 2014, 9(5): 347-357.
[25] MA J, ZHOU Z, WANG B, et al. Infrared and visible image fusion based on visual saliency map and weighted least square optimization[J]. Infrared Physics & Technology, 2017, 82: 8-17.
[26] ZHANG Y, LIU Y, SUN P, et al. IFCNN: a general image fusion framework based on convolutional neural network[J]. Information Fusion, 2020, 54: 99-118. doi: 10.1016/j.inffus.2019.07.011