Multi-layer Perceptron Interactive Fusion Method for Infrared and Visible Images

Sun Jing; Wang Zhishe; Yang Fan; Yu Chaofa

Abstract

Existing Transformer-based fusion methods use a self-attention mechanism to model global dependencies across the image context and thereby achieve strong fusion performance. However, the high model complexity associated with the attention mechanism lowers training efficiency and limits the practical application of image fusion. To address this, a multilayer perceptron interactive fusion method for infrared and visible images, called MLPFuse, is proposed. First, a lightweight multilayer perceptron network architecture is constructed that uses fully connected layers to establish global dependencies, achieving high computational efficiency while retaining strong feature representation capability. Second, a cascaded token-wise and channel-wise interaction model is designed to realize feature interaction across different tokens (spatial positions) and across independent channels, so that the network focuses on the intrinsic features of each source image and enhances the complementarity of features across modalities. Experiments on the TNO and MSRS datasets and on an object detection task, with comparisons against seven typical fusion methods, show that MLPFuse outperforms the other methods in both subjective visual quality and objective evaluation metrics. By modeling the long-range dependencies of images with multilayer perceptrons and extracting global features along the spatial and channel dimensions through the cascaded interaction model, MLPFuse achieves superior fusion performance and competitive computational efficiency compared with other typical fusion methods.
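As a rough illustration of the idea summarized in the abstract, the following minimal NumPy sketch applies fully connected layers first across tokens (spatial positions) and then across channels. It is not the authors' MLPFuse implementation: the function names, the residual connections, the GELU activation, the hidden width, and the weight initialization are all assumptions made for illustration, and normalization layers and the fusion step itself are omitted.

```python
import numpy as np


def gelu(x):
    # tanh approximation of the GELU activation (assumed choice of activation)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def mlp(x, w1, w2):
    # Two fully connected layers applied along the last axis of x.
    return gelu(x @ w1) @ w2


def token_channel_block(x, params):
    """Cascaded token-wise then channel-wise mixing (hypothetical sketch).

    x has shape (tokens, channels). Residual connections are an assumption.
    """
    # Token-wise interaction: transpose so the MLP mixes across spatial positions.
    x = x + mlp(x.T, params["tok_w1"], params["tok_w2"]).T
    # Channel-wise interaction: the MLP mixes across channels at each position.
    x = x + mlp(x, params["ch_w1"], params["ch_w2"])
    return x


def init_params(rng, tokens, channels, hidden, scale=0.02):
    # Small random weights; the sizes and scale are illustrative only.
    return {
        "tok_w1": scale * rng.standard_normal((tokens, hidden)),
        "tok_w2": scale * rng.standard_normal((hidden, tokens)),
        "ch_w1": scale * rng.standard_normal((channels, hidden)),
        "ch_w2": scale * rng.standard_normal((hidden, channels)),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens, channels, hidden = 16, 8, 32
    # Stand-ins for flattened infrared and visible feature maps.
    infrared = rng.standard_normal((tokens, channels))
    visible = rng.standard_normal((tokens, channels))
    # Each modality gets its own block so its intrinsic features are modeled separately.
    out_ir = token_channel_block(infrared, init_params(rng, tokens, channels, hidden))
    out_vis = token_channel_block(visible, init_params(rng, tokens, channels, hidden))
    print(out_ir.shape, out_vis.shape)
```

Because the token-wise layer is a dense matrix over all positions, every output position depends on every input position. This gives a global receptive field without computing an attention map, which is the motivation stated in the abstract for replacing self-attention.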
