A Visible and Infrared Image Fusion Method based on Generative Adversarial Networks and Attention Mechanism
Abstract: A new fusion method for visible and infrared images based on generative adversarial networks (GANs) is proposed to address the difficulty of recognizing targets in low-light visible images; the method can be applied directly to the fusion of three-channel RGB visible images and single-channel infrared images. The generator adopts a U-Net structure with encoding and decoding layers, the discriminator adopts a Markovian discriminator, and an attention-mechanism module is introduced so that the fused image attends more to the high-intensity information in the infrared image. Experimental results show that the proposed method retains the detailed texture information of the visible image while introducing the main target information of the infrared image, producing fused images with good visual quality and high target discriminability, and it performs well on information entropy, structural similarity, and other objective metrics.
Table 1. The parameters of the generator

Convolution layer | Kernel size/stride | Padding | Input size | Output size
Conv1 | 4×4/2 | (1, 1) | 480×640×4 | 240×320×32
CBAM | 4×4/2 | (1, 1) | 240×320×32 | 240×320×32
Conv2 | 4×4/2 | (1, 1) | 240×320×32 | 120×160×64
Conv3 | 4×4/2 | (1, 1) | 120×160×64 | 60×80×128
Conv4 | 4×4/2 | (1, 1) | 60×80×128 | 30×40×256
Conv5 | 4×4/2 | (2, 1) | 30×40×256 | 16×20×512
Conv6 | 4×4/2 | (1, 1) | 16×20×512 | 8×10×512
Conv7 | 4×4/2 | (1, 2) | 8×10×512 | 4×6×512
Conv8 | 4×4/2 | (1, 1) | 4×6×512 | 2×3×512
ConvTrans8 | 4×4/2 | (1, 1) | 2×3×512 | 4×6×512
ConvTrans7 | 4×4/2 | (1, 2) | 4×6×1024 | 8×10×512
ConvTrans6 | 4×4/2 | (1, 1) | 8×10×1024 | 16×20×512
ConvTrans5 | 4×4/2 | (2, 1) | 16×20×1024 | 30×40×256
ConvTrans4 | 4×4/2 | (1, 1) | 30×40×512 | 60×80×128
ConvTrans3 | 4×4/2 | (1, 1) | 60×80×256 | 120×160×64
ConvTrans2 | 4×4/2 | (1, 1) | 120×160×128 | 240×320×32
ConvTrans1 | 4×4/2 | (1, 1) | 240×320×64 | 480×640×3
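The spatial sizes in Table 1 all follow the standard convolution arithmetic out = ⌊(in + 2·pad − kernel)/stride⌋ + 1, applied per dimension with the asymmetric paddings listed. A minimal sketch (function name is mine, not from the paper) checks a few rows:

```python
def conv_out(size, kernel=4, stride=2, pad=1):
    """Output length of one spatial dimension for a strided convolution."""
    return (size + 2 * pad - kernel) // stride + 1

# Conv1: 480×640 input, 4×4/2 kernel, padding (1, 1) -> 240×320
print(conv_out(480), conv_out(640))              # 240 320
# Conv5 uses asymmetric padding (2, 1): 30×40 -> 16×20
print(conv_out(30, pad=2), conv_out(40, pad=1))  # 16 20
```

The same arithmetic confirms that ConvTrans5 must take a 16×20 input (the Conv5 output, with channels doubled by the U-Net skip connection) to produce the listed 30×40 output.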
Table 2. The parameters of the discriminator

Convolution layer | Kernel size/stride | Padding | Output size
Conv1 | 4×4/2 | (1, 1) | 240×320×64
Conv2 | 4×4/2 | (1, 1) | 120×160×128
Conv3 | 4×4/2 | (1, 1) | 60×80×256
Conv4 | 4×4/2 | (1, 1) | 30×40×512
Conv5 | 4×4/2 | (1, 1) | 15×20×512
Conv6 | 1×1/1 | (0, 0) | 15×20×1
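The Markovian (PatchGAN-style) property of this discriminator comes from its 15×20×1 output map: each output unit scores only the image patch inside its receptive field rather than the whole image. The receptive field implied by Table 2's layers can be computed by walking backwards through the stack (a sketch under my own naming, not code from the paper):

```python
def receptive_field(layers):
    """Receptive field of one output unit; layers is a list of (kernel, stride)
    pairs in forward order. Walking backwards: rf = rf * stride + (kernel - stride)."""
    rf = 1
    for kernel, stride in reversed(layers):
        rf = rf * stride + (kernel - stride)
    return rf

# Table 2: five 4×4/2 convolutions followed by one 1×1/1 convolution
print(receptive_field([(4, 2)] * 5 + [(1, 1)]))  # 94
```

Each of the 15×20 output scores therefore judges a 94×94 patch of the input, which is what makes the discriminator "Markovian": realism is assessed locally, patch by patch.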
Table 3. The quantitative comparisons of fusion images

Fusion methods | EN | MI | FMI | SSIM | CC | PSNR
LP | 5.918 | 11.836 | 0.944 | 0.681 | 0.646 | 68.496
LP-SR | 6.393 | 12.785 | 0.945 | 0.823 | 0.566 | 67.801
NSCT | 5.821 | 11.643 | 0.942 | 0.671 | 0.652 | 68.575
NSCT-SR | 6.224 | 12.447 | 0.940 | 0.859 | 0.575 | 67.472
DTCWT | 5.804 | 11.608 | 0.942 | 0.670 | 0.647 | 68.570
DTCWT-SR | 6.455 | 12.910 | 0.945 | 0.782 | 0.525 | 67.338
DenseFuse | 6.036 | 12.071 | 0.939 | 0.631 | 0.684 | 67.319
CBAM-GAN | 5.918 | 11.836 | 0.928 | 0.796 | 0.649 | 68.751
Average | 6.111 | 12.223 | 0.941 | 0.740 | 0.606 | 67.967
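The EN column in Table 3 is the Shannon entropy of the fused image's gray-level histogram, in bits; larger values indicate that the fused image carries more information. A self-contained sketch of the computation (function name is mine; real evaluations would run on the 8-bit fused images):

```python
import math

def image_entropy(pixels, levels=256):
    """Shannon entropy (bits) of an image given as a flat list of 8-bit intensities:
    EN = -sum_i p_i * log2(p_i) over the normalized gray-level histogram."""
    counts = [0] * levels
    for v in pixels:
        counts[v] += 1
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

# A half-black, half-white image carries exactly 1 bit per pixel
print(image_entropy([0] * 100 + [255] * 100))  # 1.0
# A constant image carries no information
print(image_entropy([7] * 50))                 # 0.0
```

The other metrics follow the standard definitions: MI is mutual information between each source image and the fused result, SSIM is structural similarity, CC is the correlation coefficient, and PSNR is peak signal-to-noise ratio.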
[1] MA J, MA Y, LI C. Infrared and visible image fusion methods and applications: a survey[J]. Information Fusion, 2019, 45: 153-178. doi: 10.1016/j.inffus.2018.02.004
[2] Burt P J, Adelson E H. The Laplacian pyramid as a compact image code[J]. Readings in Computer Vision, 1987, 31(4): 671-679. https://www.sciencedirect.com/science/article/pii/B9780080515816500659
[3] Selesnick I W, Baraniuk R G, Kingsbury N C. The dual-tree complex wavelet transform[J]. IEEE Signal Processing Magazine, 2005, 22(6): 123-151. doi: 10.1109/MSP.2005.1550194
[4] da Cunha A L, Zhou J, Do M N. Nonsubsampled contourlet transform: filter design and applications in denoising[C]//IEEE International Conference on Image Processing, 2005: 749. doi: 10.1109/ICIP.2005.1529859
[5] Hariharan H, Koschan A, Abidi M. The direct use of curvelets in multifocus fusion[C]//16th IEEE International Conference on Image Processing (ICIP), 2009: 2185-2188. doi: 10.1109/ICIP.2009.5413840
[6] LI H. DenseFuse: a fusion approach to infrared and visible images[J]. IEEE Transactions on Image Processing, 2019, 28(5): 2614-2623. doi: 10.1109/TIP.2018.2887342
[7] MA J, YU W, LIANG P, et al. FusionGAN: a generative adversarial network for infrared and visible image fusion[J]. Information Fusion, 2019, 48: 11-26. doi: 10.1016/j.inffus.2018.09.004
[8] Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention, 2015: 234-241.
[9] Hwang S, Park J, Kim N, et al. Multispectral pedestrian detection: benchmark dataset and baseline[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015: 1037-1045.
[10] Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets[C]//Advances in Neural Information Processing Systems, 2014: 2672-2680.
[11] Radford A, Metz L, Chintala S. Unsupervised representation learning with deep convolutional generative adversarial networks[J/OL]. arXiv preprint arXiv:1511.06434, 2015. https://arxiv.org/abs/1511.06434v1
[12] MAO X, LI Q, XIE H, et al. Least squares generative adversarial networks[C]//2017 IEEE International Conference on Computer Vision (ICCV), 2017: 2813-2821. doi: 10.1109/ICCV.2017.304
[13] Isola P, ZHU J, ZHOU T, et al. Image-to-image translation with conditional adversarial networks[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017: 5967-5976. doi: 10.1109/CVPR.2017.632
[14] Jaderberg M, Simonyan K, Zisserman A. Spatial transformer networks[C]//Advances in Neural Information Processing Systems, 2015: 2017-2025.
[15] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 7132-7141.
[16] Woo S, Park J, Lee J Y, et al. CBAM: convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision (ECCV), 2018: 3-19.