Infrared and Visible Image Fusion Based on Self-attention Learning
Abstract: To ensure that the salient regions of the source images remain salient in the fused image, a self-attention-guided infrared and visible image fusion method is proposed. First, the feature maps and self-attention maps of the source images are learned by a self-attention mechanism in the feature-learning layer. Because self-attention maps capture long-range dependencies within an image, they are used to design an average-weighted fusion strategy for the source feature maps. Finally, the fused feature maps are reconstructed to obtain the fused image. Feature encoding, self-attention learning, the fusion rule, and fused-feature decoding are all learned jointly within a generative adversarial network. Experiments on real-world TNO data show that the learned self-attention units highlight the salient regions of the images and effectively guide the design of the fusion rule. The proposed algorithm outperforms state-of-the-art infrared and visible image fusion algorithms in both objective and subjective evaluation, and it preserves the detail information of the visible images as well as the thermal-target information of the infrared images.
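The pipeline described above (self-attention maps learned over encoder feature maps, then an attention-guided average-weighted fusion) can be sketched as follows. This is a simplified, single-head NumPy illustration under assumed function names of my own choosing; it is not the paper's trained implementation, which learns these components inside a GAN.

```python
import numpy as np

def self_attention_map(feat):
    """Non-local self-attention over a feature map of shape (C, H, W).

    A minimal sketch: pairwise affinities between all spatial positions
    are softmax-normalized and used to aggregate long-range context.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)                  # flatten spatial dimensions
    energy = x.T @ x                            # (HW, HW) pairwise affinities
    energy -= energy.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(energy)
    attn /= attn.sum(axis=1, keepdims=True)     # row-wise softmax
    out = x @ attn.T                            # attend over all positions
    return out.reshape(c, h, w)

def attention_weighted_fusion(feat_ir, feat_vis, attn_ir, attn_vis):
    """Average-weighted fusion of two feature maps, guided per pixel by
    the (non-negative) attention/saliency maps of the two sources."""
    eps = 1e-8                                  # avoid division by zero
    w_ir = attn_ir / (attn_ir + attn_vis + eps)
    return w_ir * feat_ir + (1.0 - w_ir) * feat_vis
```

In the actual method the fused feature maps would then be passed to a learned decoder to reconstruct the fused image; the sketch only shows the attention and weighting steps.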
Keywords:
- image fusion
- self-attention
- generative adversarial network
- infrared images
- deep learning
Table 1. Objective evaluation results of different fusion methods (EN: entropy; SD: standard deviation; CC: correlation coefficient; SF: spatial frequency; MG: mean gradient; EI: edge intensity)

Method      EN     SD      CC     SF      MG      EI
CSR         6.52   28.30   0.51   17.25    8.02   53.69
GTF         6.80   39.33   0.33   15.32    7.72   48.29
DenseFuse   6.93   37.50   0.53   16.24    8.29   53.48
IFCNN       6.84   35.65   0.48   20.38   10.43   67.09
FusionGAN   6.75   33.59   0.43   10.76    5.57   39.83
DDcGAN      7.45   51.98   0.26   17.02    8.72   62.09
SEDRFuse    6.99   42.49   0.51   14.83    7.21   56.37
Ours        7.29   45.20   0.52   25.11   13.02   93.30
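The column abbreviations are standard no-reference fusion-quality metrics. As a reference for how three of them (EN, SD, SF) are commonly computed, here is a minimal NumPy sketch; the helper names are my own and the exact formulations used in the paper may differ.

```python
import numpy as np

def entropy(img):
    """EN: Shannon entropy of the grey-level histogram of an 8-bit image."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                         # drop empty bins (0*log 0 := 0)
    return float(-(p * np.log2(p)).sum())

def std_dev(img):
    """SD: standard deviation of pixel intensities, a proxy for contrast."""
    return float(np.std(img.astype(np.float64)))

def spatial_frequency(img):
    """SF: root of the combined row- and column-wise gradient energy."""
    img = img.astype(np.float64)
    rf = np.sqrt(np.mean((img[:, 1:] - img[:, :-1]) ** 2))  # row frequency
    cf = np.sqrt(np.mean((img[1:, :] - img[:-1, :]) ** 2))  # column frequency
    return float(np.sqrt(rf ** 2 + cf ** 2))
```

Higher values of all three indicate richer information, higher contrast, and sharper detail respectively, which is why "Ours" leading on SF, MG, and EI in Table 1 supports the claim of better detail preservation.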
[1] MA J Y, MA Y, LI C. Infrared and visible image fusion methods and applications: a survey[J]. Information Fusion, 2019, 45: 153-178. doi: 10.1016/j.inffus.2018.02.004
[2] YU X C, GAO G Y, XU J D, et al. Remote sensing image fusion based on sparse representation[C]//2014 IEEE Geoscience and Remote Sensing Symposium, 2014: 2858-2861.
[3] ZHAO W D, LU H C. Medical image fusion and denoising with alternating sequential filter and adaptive fractional order total variation[J]. IEEE Transactions on Instrumentation and Measurement, 2017, 66(9): 2283-2294. doi: 10.1109/TIM.2017.2700198
[4] LI Y S, TAO C, et al. Unsupervised multilayer feature learning for satellite image scene classification[J]. IEEE Geoscience and Remote Sensing Letters, 2016, 13(2): 157-161. doi: 10.1109/LGRS.2015.2503142
[5] JIN X, JIANG Q, et al. A survey of infrared and visual image fusion methods[J]. Infrared Physics & Technology, 2017, 85: 478-501.
[6] BAI X, ZHANG Y, ZHOU F, et al. Quadtree-based multi-focus image fusion using a weighted focus-measure[J]. Information Fusion, 2015, 22: 105-118. doi: 10.1016/j.inffus.2014.05.003
[7] BAI X Z. Infrared and visual image fusion through feature extraction by morphological sequential toggle operator[J]. Infrared Physics & Technology, 2015, 71: 77-86.
[8] LIU Y, CHEN X, PENG H, et al. Multi-focus image fusion with a deep convolutional neural network[J]. Information Fusion, 2017, 36: 191-207. doi: 10.1016/j.inffus.2016.12.001
[9] MA J Y, YU W, LIANG P W, et al. FusionGAN: a generative adversarial network for infrared and visible image fusion[J]. Information Fusion, 2019, 48: 11-26. doi: 10.1016/j.inffus.2018.09.004
[10] LI H, WU X J. DenseFuse: a fusion approach to infrared and visible images[J]. IEEE Transactions on Image Processing, 2019, 28(5): 2614-2623. doi: 10.1109/TIP.2018.2887342
[11] WANG X, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]//Computer Vision and Pattern Recognition, 2018: 7794-7803.
[12] ZHANG H, GOODFELLOW I, METAXAS D, et al. Self-attention generative adversarial networks[C]//International Conference on Machine Learning, 2019: 7354-7363.
[13] YANG X L, LIN S Z. Method for multi-band image feature-level fusion based on attention mechanism[J]. Journal of Xidian University, 2020, 47(1): 123-130. https://www.cnki.com.cn/Article/CJFDTOTAL-XDKD202001018.htm
[14] JIAN L, YANG X, LIU Z, et al. A symmetric encoder-decoder with residual block for infrared and visible image fusion[J]. IEEE Transactions on Instrumentation and Measurement, 2021, 70: 1-15.
[15] IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift[C]//Proceedings of the 32nd International Conference on Machine Learning, 2015, 37: 448-456.
[16] ZHANG Y, LIU Y, SUN P, et al. IFCNN: a general image fusion framework based on convolutional neural network[J]. Information Fusion, 2020, 54: 99-118. doi: 10.1016/j.inffus.2019.07.011
[17] YAN H, YU X, et al. Single image depth estimation with normal guided scale invariant deep convolutional fields[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2019, 29(1): 80-92.
[18] LIU Y, CHEN X, WARD R, et al. Image fusion with convolutional sparse representation[J]. IEEE Signal Processing Letters, 2016, 23(12): 1882-1886. doi: 10.1109/LSP.2016.2618776
[19] MA J Y, CHEN C, LI C, et al. Infrared and visible image fusion via gradient transfer and total variation minimization[J]. Information Fusion, 2016, 31: 100-109. doi: 10.1016/j.inffus.2016.02.001
[20] MA J Y, XU H, JIANG J, et al. DDcGAN: a dual-discriminator conditional generative adversarial network for multi-resolution image fusion[J]. IEEE Transactions on Image Processing, 2020, 29: 4980-4995. doi: 10.1109/TIP.2020.2977573