Multi-scale Transformer Fusion Method for Infrared and Visible Images

CHEN Yanlin, WANG Zhishe, SHAO Wenyu, YANG Fan, SUN Jing

Citation: CHEN Yanlin, WANG Zhishe, SHAO Wenyu, YANG Fan, SUN Jing. Multi-scale Transformer Fusion Method for Infrared and Visible Images[J]. Infrared Technology, 2023, 45(3): 266-275.

Funding:

Fundamental Research Program of Shanxi Province (201901D111260)

Open Fund of Shanxi Key Laboratory of Information Detection and Processing (ISPT2020-4)

Details
    About the authors:

    CHEN Yanlin (1995-), male, master's degree candidate; research interests: image fusion and deep learning. E-mail: chentyust@163.com

    Corresponding author:

    WANG Zhishe (1982-), male, associate professor, Ph.D.; research interests: image fusion, deep learning, and machine vision. E-mail: wangzs@tyust.edu.cn

  • CLC number: TP391

  • Abstract: Current mainstream deep fusion methods rely solely on convolution operations to extract local image features. However, the interaction between an image and a convolution kernel is content-independent and cannot effectively model long-range feature dependencies, which inevitably loses contextual information and limits the fusion performance for infrared and visible images. To address this, this paper proposes a multi-scale Transformer fusion method for infrared and visible images. Using the Swin Transformer as a building component, a Conv Swin Transformer Block is constructed, in which convolution layers strengthen the representation of global image features. A multi-scale self-attention encoder-decoder network is built to extract and reconstruct global image features, and a feature-sequence fusion layer is designed that uses a SoftMax operation to compute attention weight coefficients for the feature sequences, highlighting the salient features of each source image and achieving end-to-end infrared and visible image fusion. Experimental results on the TNO and Roadscene datasets show that the proposed method outperforms other typical traditional and deep-learning fusion methods in both subjective visual assessment and objective metric evaluation. By combining the self-attention mechanism and using the Transformer to establish long-range dependencies within images, the proposed global-feature fusion model achieves better fusion performance and stronger generalization than other deep-learning fusion methods.
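As a concrete illustration of the fusion rule described in the abstract, the sketch below shows a SoftMax-weighted feature-sequence fusion of the kind the paper describes. It is a minimal sketch, not the authors' released code: the per-token L1-norm activity measure, the (batch, tokens, channels) tensor layout, and the function name `sequence_softmax_fusion` are assumptions made for illustration.

```python
# A minimal sketch (not the authors' released code) of a SoftMax-weighted
# feature-sequence fusion layer. The activity measure (per-token L1 norm)
# and the (B, N, C) tensor layout are assumptions.
import torch


def sequence_softmax_fusion(feat_ir: torch.Tensor,
                            feat_vis: torch.Tensor) -> torch.Tensor:
    """Fuse two feature sequences of shape (B, N, C) token by token.

    Each token's weight is obtained by applying SoftMax across the two
    sources to an L1-norm activity measure, so the more salient source
    dominates the fused token.
    """
    # Per-token activity measure: L1 norm over the channel dimension -> (B, N)
    act_ir = feat_ir.abs().sum(dim=-1)
    act_vis = feat_vis.abs().sum(dim=-1)

    # SoftMax across the two sources gives weights that sum to 1 per token
    weights = torch.softmax(torch.stack([act_ir, act_vis], dim=0), dim=0)
    w_ir, w_vis = weights[0].unsqueeze(-1), weights[1].unsqueeze(-1)

    # Weighted sum of the two feature sequences -> (B, N, C)
    return w_ir * feat_ir + w_vis * feat_vis


if __name__ == "__main__":
    ir = torch.randn(1, 64, 96)   # (batch, tokens, channels)
    vis = torch.randn(1, 64, 96)
    fused = sequence_softmax_fusion(ir, vis)
    print(fused.shape)            # torch.Size([1, 64, 96])
```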
  • Figure 1. Principle of multi-scale Transformer image fusion method

    Figure 2. Schematic diagram of fusion strategy

    Figure 3. The subjective comparison results of five fusion models

    Figure 4. The subjective comparison results of Nato_camp from the TNO dataset

    Figure 5. The subjective comparison results of Street from the TNO dataset

    Figure 6. The subjective comparison results of Bench from the TNO dataset

    Figure 7. The subjective comparison results of Kaptein_1123 from the TNO dataset

    Figure 8. The objective comparison results of EN, SD, MI, SF, NCIE and VIF of different fusion methods from the TNO dataset

    Figure 9. The subjective comparison results of FLIR_07210 from the Roadscene dataset

    Figure 10. The subjective comparison results of FLIR_08954 from the Roadscene dataset

    Figure 11. The objective comparison results of EN, SD, MI, SF, NCIE and VIF of different fusion methods from the Roadscene dataset
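For reference, two of the objective indices used in Figures 8 and 11, EN (information entropy) and SF (spatial frequency), can be computed from their standard definitions as in the sketch below. This is an illustrative sketch, not the paper's own evaluation code; the remaining indices (SD, MI, NCIE, VIF) follow analogous standard formulas not reproduced here.

```python
# Hedged sketch of two objective fusion indices (EN: information entropy,
# SF: spatial frequency) using their standard definitions.
import numpy as np


def entropy(img: np.ndarray) -> float:
    """EN: Shannon entropy (bits) of the 8-bit grayscale histogram."""
    hist, _ = np.histogram(img.astype(np.uint8), bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                                       # ignore empty bins
    return float(-(p * np.log2(p)).sum())


def spatial_frequency(img: np.ndarray) -> float:
    """SF: sqrt(RF^2 + CF^2) from row/column gradient energies."""
    img = img.astype(np.float64)
    rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))   # row frequency
    cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))   # column frequency
    return float(np.sqrt(rf ** 2 + cf ** 2))


if __name__ == "__main__":
    fused = np.random.rand(256, 256) * 255             # stand-in fused image
    print("EN =", entropy(fused), "SF =", spatial_frequency(fused))
```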

    Table 1. Objective comparison results of the five fusion models

    Models         Parameters   SF        VIF       EN        SD         MI        NCIE
    Fusion Layer   Add          9.51591   0.36018   7.22888   47.80041   2.55456   0.80587
                   Ours         9.84952   0.39648   7.23821   48.54178   2.61691   0.80607
    STL Number     5            9.31619   0.35836   7.20091   48.01473   2.57833   0.80592
                   6            9.84952   0.39648   7.23821   48.54178   2.61691   0.80607
                   7            9.40830   0.37508   7.22529   48.40002   2.49068   0.80564
    Conv Layer     No           9.39838   0.36835   7.21169   47.38488   2.45803   0.80556
                   Yes          9.84952   0.39648   7.23821   48.54178   2.61691   0.80607

    Table 2. The comparison results of computation efficiency for different fusion methods (unit: s)

    Method       TNO           Roadscene
    MDLatLRR     7.941×10^1    3.839×10^1
    IFCNN        4.554×10^-2   2.246×10^-2
    DenseFuse    8.509×10^-2   4.001×10^-2
    RFN-Nest     1.777×10^-1   8.609×10^-2
    FusionGAN    2.015         1.093
    GANMcC       4.21          2.195
    PMGI         5.445×10^-1   2.928×10^-1
    SwinFuse     2.145×10^-1   1.291×10^-1
    IFT          8.141×10^-1   4.025×10^-1
    Ours         5.091×10^-1   2.848×10^-1
  • [1] Paramanandham N, Rajendiran K. Multi sensor image fusion for surveillance applications using hybrid image fusion algorithm[J]. Multimedia Tools and Applications, 2018, 77(10): 12405-12436. doi:  10.1007/s11042-017-4895-3
    [2] ZHANG Xingchen, YE Ping, QIAO Dan, et al. Object fusion tracking based on visible and infrared images: a comprehensive review[J]. Information Fusion, 2020, 63: 166-187. doi:  10.1016/j.inffus.2020.05.002
    [3] TU Zhengzheng, LI Zhun, LI Chenglong, et al. Multi-interactive dual-decoder for RGB-thermal salient object detection[J]. IEEE Transactions on Image Processing, 2021, 30: 5678-5691. doi:  10.1109/TIP.2021.3087412
    [4] WANG Ronggui, WANG Jing, YANG Juan, et al. Random fusion feature pyramid for pedestrian re-identification based on infrared and visible modes[J]. Opto-Electronic Engineering, 2020, 47(12): 190669. doi:  10.12086/oee.2020.190669
    [5] WANG Zhishe, XU Jiawei, JIANG Xiaolin, et al. Infrared and visible image fusion via hybrid decomposition of NSCT and morphological sequential toggle operator[J]. Optik, 2020, 201: 163497. doi:  10.1016/j.ijleo.2019.163497
    [6] LI Hui, WU Xiaojun, Kittler J. MDLatLRR: a novel decomposition method for infrared and visible image fusion[J]. IEEE Transactions on Image Processing, 2020, 29: 4733-4746. doi:  10.1109/TIP.2020.2975984
    [7] SUN Bin, ZHUGE Wuwei, GAO Yunxiang, et al. Infrared and visible image fusion based on latent low-rank representation[J]. Infrared Technology, 2022, 44(8): 853-862. http://hwjs.nvir.cn/article/id/7fc3a60d-61bb-454f-ad00-e925eeb54576
    [8] MA Jinlei, ZHOU Zhiqiang, WANG Bo, et al. Infrared and visible image fusion based on visual saliency map and weighted least square optimization[J]. Infrared Physics & Technology, 2017, 82: 8-17.
    [9] KONG Weiwei, LEI Yang, ZHAO Huaixun. Adaptive fusion method of visible light and infrared images based on non-subsampled shearlet transform and fast non-negative matrix factorization[J]. Infrared Physics & Technology, 2014, 67: 161-172.
    [10] JIANG Mai, SHA Guijun, LI Ning. Infrared and inferior visible image fusion based on PUCS and DTCWT[J]. Infrared Technology, 2022, 44(7): 716-725. http://hwjs.nvir.cn/article/id/ee43f5b8-9a1f-441c-9d95-e339989d8954
    [11] WANG Zhishe, YANG Fengbao, PENG Zhihao, et al. Multi-sensor image enhanced fusion algorithm based on NSST and top-hat transformation[J]. Optik, 2015, 126(23): 4184-4190. doi:  10.1016/j.ijleo.2015.08.118
    [12] LIU Yu, CHEN Xun, PENG Hu, et al. Multi-focus image fusion with a deep convolutional neural network[J]. Information Fusion, 2017, 36: 191-207. doi:  10.1016/j.inffus.2016.12.001
    [13] ZHANG Hao, XU Han, TIAN Xin, et al. Image fusion meets deep learning: A survey and perspective[J]. Information Fusion, 2021, 76: 323-336. doi:  10.1016/j.inffus.2021.06.008
    [14] ZHANG Yu, LIU Yu, SUN Peng, et al. IFCNN: A general image fusion framework based on convolutional neural network[J]. Information Fusion, 2020, 54: 99-118. doi:  10.1016/j.inffus.2019.07.011
    [15] LI Hui, WU Xiaojun. DenseFuse: a fusion approach to infrared and visible images[J]. IEEE Transactions on Image Processing, 2019, 28(5): 2614- 2623. doi:  10.1109/TIP.2018.2887342
    [16] LI Hui, WU Xiaojun, Kittler J. RFN-Nest: An end-to-end residual fusion network for infrared and visible images[J]. Information Fusion, 2021, 73: 72-86. doi:  10.1016/j.inffus.2021.02.023
    [17] JIAN Lihua, YANG Xiaomin, LIU Zheng, et al. SEDRFuse: A symmetric encoder–decoder with residual block network for infrared and visible image fusion[J]. IEEE Transactions on Instrumentation and Measurement, 2020, 70: 1-15.
    [18] ZHANG Hao, XU Han, XIAO Yang, et al. Rethinking the image fusion: A fast unified image fusion network based on proportional maintenance of gradient and intensity[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 12797-12804.
    [19] WANG Zhishe, WANG Junyao, WU Yuanyuan, et al. UNFusion: a unified multi-scale densely connected network for infrared and visible image fusion[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(6): 3360- 3374.
    [20] WANG Zhishe, WU Yuanyuan, WANG Junyao, et al. Res2Fusion: infrared and visible image fusion based on dense Res2net and double non-local attention models[J]. IEEE Transactions on Instrumentation and Measurement, 2022, 71: 1-12.
    [21] MA Jiayi, YU Wei, LIANG Pengwei, et al. FusionGAN: a generative adversarial network for infrared and visible image fusion[J]. Information Fusion, 2019, 48: 11-26.
    [22] MA Jiayi, ZHANG Hao, SHAO Zhenfeng, et al. GANMcC: a generative adversarial network with multiclassification constraints for infrared and visible image fusion[J]. IEEE Transactions on Instrumentation and Measurement, 2021, 70: 1-14.
    [23] WANG Zhishe, SHAO Wenyu, YANG Fengbao, et al. Interactive attention-based generative adversarial fusion method for infrared and visible images[J]. Acta Photonica Sinica, 2022, 51(4): 318-328. https://www.cnki.com.cn/Article/CJFDTOTAL-GZXB202204029.htm
    [24] LI Jing, ZHU Jianming, LI Chang, et al. CGTF: convolution-guided transformer for infrared and visible image fusion[J]. IEEE Transactions on Instrumentation and Measurement, 2022, 71: 1-14.
    [25] RAO Dongyu, WU Xiaojun, XU Tianyang. TGFuse: an infrared and visible image fusion approach based on transformer and generative adversarial network[J/OL]. arXiv preprint arXiv: 2201.10147, 2022.
    [26] WANG Zhishe, CHEN Yanlin, SHAO Wenyu, et al. SwinFuse: a residual swin transformer fusion network for infrared and visible images[J/OL]. arXiv preprint arXiv: 2204.11436, 2022.
    [27] ZHAO Haibo, NIE Rencan. DNDT: infrared and visible image fusion via DenseNet and dual-transformer[C]// International Conference on Information Technology and Biomedical Engineering (ICITBE), 2021: 71-75.
    [28] VS V, Valanarasu J M J, Oza P, et al. Image fusion transformer[J/OL]. arXiv preprint arXiv: 2107.09011, 2021.
    [29] LIU Ze, LIN Yutong, CAO Yue, et al. Swin transformer: Hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021: 10012-10022.
    [30] TOET A. TNO Image Fusion Dataset[DB/OL]. [2014-04-26]. https://figshare.com/articles/TNImageFusionDataset/1008029.
    [31] XU Han. Roadscene Database[DB/OL]. [2020-08-07]. https://github.com/hanna-xu/RoadScene.
    [32] LI Hui, WU Xiaojun, Kittler J. MDLatLRR: a novel decomposition method for infrared and visible image fusion[J]. IEEE Transactions on Image Processing, 2020, 29: 4733-4746.
Publication history
  • Received: 2022-08-23
  • Revised: 2022-09-13
  • Published: 2023-03-20
