Citation: SUN Jing, WANG Zhishe, YANG Fan, YU Zhaofa. Multi-layer Perceptron Interactive Fusion Method for Infrared and Visible Images[J]. Infrared Technology, 2025, 47(5): 619-627.
Existing Transformer-based fusion methods employ a self-attention mechanism to model the global dependency of the image context, which yields superior fusion performance. However, the high complexity of attention-based models leads to low training efficiency, which limits the practical application of image fusion. Therefore, a multilayer perceptron interactive fusion method for infrared and visible images, called MLPFuse, is proposed. First, a lightweight multilayer perceptron network architecture is constructed that uses fully connected layers to establish global dependencies; this framework achieves high computational efficiency while retaining strong feature representation capability. Second, a cascaded token-wise and channel-wise interaction model is designed to realize feature interaction across different tokens and independent channels, thereby focusing on the inherent features of the source images and enhancing the feature complementarity of different modalities. Experimental results on the TNO and MSRS datasets and on an object detection task show that, compared with seven typical fusion methods, the proposed MLPFuse outperforms the others in both subjective visual comparison and objective metric evaluation. In summary, this method utilizes a multilayer perceptron to model the long-distance dependencies of images and constructs a cascaded token-wise and channel-wise interaction model to extract global features from the spatial and channel dimensions. Compared with other typical fusion methods, MLPFuse achieves remarkable fusion performance and competitive computational efficiency.
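The cascaded token-wise and channel-wise interaction described above is structurally similar to an MLP-Mixer-style block, in which one fully connected layer mixes information across spatial tokens and another mixes information across channels. The following is a minimal PyTorch sketch of such a block under that assumption; the class names, dimensions, and expansion factor are illustrative and do not represent the authors' released implementation.

```python
# Minimal sketch (PyTorch) of a cascaded token-wise / channel-wise MLP block.
# All names and dimensions are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Two-layer perceptron used for both token mixing and channel mixing."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x):
        return self.net(x)

class TokenChannelBlock(nn.Module):
    """Cascaded interaction: mix across tokens (spatial) first, then across channels."""
    def __init__(self, num_tokens, dim, expansion=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mlp = MLP(num_tokens, num_tokens * expansion)   # acts along the token axis
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = MLP(dim, dim * expansion)               # acts along the channel axis

    def forward(self, x):                            # x: (batch, num_tokens, dim)
        # Token-wise interaction: transpose so the fully connected layer spans all tokens.
        y = self.norm1(x).transpose(1, 2)            # (batch, dim, num_tokens)
        x = x + self.token_mlp(y).transpose(1, 2)    # residual connection
        # Channel-wise interaction applied to each token independently.
        x = x + self.channel_mlp(self.norm2(x))
        return x

# Example: features from a 16x16 grid of patch tokens with 64 channels.
feats = torch.randn(2, 256, 64)
block = TokenChannelBlock(num_tokens=256, dim=64)
print(block(feats).shape)   # torch.Size([2, 256, 64])
```

Because the block relies only on fully connected layers rather than pairwise attention, its cost grows linearly with the hidden sizes instead of quadratically with the attention map, which is consistent with the efficiency argument made in the abstract.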
[1] NING D H, ZHENG S. An object detection algorithm based on decision-level fusion of visible and infrared images[J]. Infrared Technology, 2023, 45(3): 282-291. http://hwjs.nvir.cn/article/id/5340b616-c317-4372-9776-a7c81ca2c729
[2] FENG Z, LAI J, XIE X. Learning modality-specific representations for visible-infrared person re-identification[J]. IEEE Transactions on Image Processing, 2020, 29: 579-590.
[3] ZHOU H B, HOU J L, WU W, et al. Infrared and visible image fusion based on semantic segmentation[J]. Journal of Computer Research and Development, 2021, 58(2): 436-443.
[4] WANG Z S, XU J W, JIANG X L, et al. Infrared and visible image fusion via hybrid decomposition of NSCT and morphological sequential toggle operator[J]. Optik, 2020, 201: 1-11.
[5] LI H, WU X J, KITTLER J. MDLatLRR: A novel decomposition method for infrared and visible image fusion[J]. IEEE Transactions on Image Processing, 2020, 29: 4733-4746. DOI: 10.1109/TIP.2020.2975984
[6] WANG Z S, WANG J Y, WU Y Y, et al. UNFusion: A unified multi-scale densely connected network for infrared and visible image fusion[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(6): 3360-3374. DOI: 10.1109/TCSVT.2021.3109895
[7] XU H, ZHANG H, MA J Y. Classification saliency-based rule for visible and infrared image fusion[J]. IEEE Transactions on Computational Imaging, 2021, 7: 824-836.
[8] YANG Y C, LI Y P, DANG J W, et al. Infrared and visible image fusion based on fast alternating guided filtering and CNN[J]. Optics and Precision Engineering, 2023, 31(10): 1548-1562. DOI: 10.37188/OPE.20233110.1548
[9] WANG Z S, WU Y Y, WANG J Y, et al. Res2Fusion: Infrared and visible image fusion based on dense Res2net and double non-local attention models[J]. IEEE Transactions on Instrumentation and Measurement, 2022, 71: 1-12.
[10] WANG Z S, YANG F, WANG J Y, et al. A dual-path residual attention fusion network for infrared and visible images[J]. Optik, 2023, 33(7): 3159-3172.
[11] XU H, MA J Y, JIANG J J, et al. U2Fusion: A unified unsupervised image fusion network[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(1): 502-518.
[12] LI H, WU X J, KITTLER J. RFN-Nest: An end-to-end residual fusion network for infrared and visible images[J]. Information Fusion, 2021, 73: 72-86.
[13] MA J Y, YU W, LIANG P W, et al. FusionGAN: A generative adversarial network for infrared and visible image fusion[J]. Information Fusion, 2019, 48: 11-26.
[14] MA J Y, ZHANG H, SHAO Z F, et al. GANMcC: A generative adversarial network with multiclassification constraints for infrared and visible image fusion[J]. IEEE Transactions on Instrumentation and Measurement, 2021, 70: 1-14.
[15] CHEN X. Infrared and visible image fusion using double attention generative adversarial networks[J]. Infrared Technology, 2023, 45(6): 639-648. http://hwjs.nvir.cn/article/id/a00923cc-937e-4dc6-893c-bd6e73ed3dc2
[16] WANG Z S, SHAO W Y, CHEN Y L, et al. Infrared and visible image fusion via interactive compensatory attention adversarial learning[J]. IEEE Transactions on Multimedia, 2023, 25: 7800-7813. DOI: 10.1109/TMM.2022.3228685
[17] WANG Z S, SHAO W Y, CHEN Y L, et al. A cross-scale iterative attentional adversarial fusion network for infrared and visible images[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(8): 3677-3688. DOI: 10.1109/TCSVT.2023.3239627
[18] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: Transformers for image recognition at scale[J]. arXiv preprint arXiv:2010.11929, 2020.
[19] WANG Z S, CHEN Y L, SHAO W Y, et al. SwinFuse: A residual swin transformer fusion network for infrared and visible images[J]. IEEE Transactions on Instrumentation and Measurement, 2022, 71: 1-12.
[20] TANG W, HE F Z, LIU Y. YDTR: Infrared and visible image fusion via Y-shape dynamic transformer[J]. IEEE Transactions on Multimedia, 2023, 25: 5413-5428. DOI: 10.1109/TMM.2022.3192661
[21] TOET A. TNO Image Fusion Dataset[DB/OL]. (2014) [2023-12-01]. https://figshare.com/articles/TNO_Image_Fusion_Dataset/1008029.
[22] TANG L F. MSRS Dataset[DB/OL]. (2022) [2023-12-01]. https://github.com/Linfeng-Tang/MSRS.
[23] LIU Z, FORSYTH D S, LAGANIÈRE R. A feature-based metric for the quantitative evaluation of pixel-level image fusion[J]. Computer Vision and Image Understanding, 2008, 109(1): 56-68. DOI: 10.1016/j.cviu.2007.04.003
[24] HAN Y, CAI Y Z, CAO Y, et al. A new image fusion performance metric based on visual information fidelity[J]. Information Fusion, 2013, 14: 127-135.
[25] ZHOU W, BOVIK A C, SHEIKH H R, et al. Image quality assessment: From error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 2004, 13(4): 600-612. DOI: 10.1109/TIP.2003.819861
[26] RAO Y J. In-fibre Bragg grating sensors[J]. Measurement Science and Technology, 1997, 8: 355-375.
[27] QU G H, ZHANG D L, YAN P F. Information measure for performance of image fusion[J]. Electronics Letters, 2002, 38(7): 313-315. DOI: 10.1049/el:20020212
[28] PIELLA G, HEIJMANS H. A new quality metric for image fusion[C]//Proceedings of the International Conference on Image Processing (ICIP), 2003, 3: 173-176.
[29] XYDEAS C, PETROVIC V. Objective image fusion performance measure[J]. Electronics Letters, 2000, 36(4): 308-309. DOI: 10.1049/el:20000267