Infrared and Visible Light Image Fusion Method Based on Swin Transformer and Hybrid Feature Aggregation

Abstract: The fusion of infrared and visible light images produces images that contain more information than either source image, better match human visual perception, and benefit downstream tasks. Traditional image fusion methods based on signal processing suffer from weak generalization and degraded performance when fusing complex images. Deep learning methods have strong feature-extraction capability and generally produce good results, but those results often lose texture detail and appear blurred. To address these problems, this study proposes an infrared and visible light image fusion network based on a multiscale Swin Transformer and an attention mechanism. The Swin Transformer extracts long-range semantic information from a multiscale perspective, and the attention mechanism suppresses unimportant features among the extracted features so that the main information is retained. In addition, this study proposes a new hybrid feature aggregation module: a brightness enhancement module and a detail retention module are designed for the respective characteristics of infrared and visible images, effectively retaining more texture detail and infrared target information. The fusion method consists of three parts: an encoder, feature aggregation, and a decoder. First, the source images are fed into the encoder to extract multiscale deep features; then, the feature aggregation module fuses the deep features at each scale; finally, a decoder based on nested connections reconstructs the fused image. Experimental results on public datasets show that the proposed method achieves better fusion performance than other state-of-the-art methods, obtaining the best scores on the EI, AG, QP, EN, and SD objective metrics. Subjectively, the proposed method preserves more edge detail in the fused results.
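The abstract describes the pipeline only at a high level. The following is a minimal PyTorch sketch of that three-stage structure (multiscale encoder, per-scale feature fusion, decoder with skip connections). Everything in it is an illustrative assumption rather than the authors' implementation: the encoder uses plain strided convolutions as stand-ins for Swin Transformer blocks, the fusion module uses a simple channel-attention rule in place of the paper's brightness-enhancement and detail-retention modules, and all names and channel sizes are hypothetical.

```python
# Minimal sketch of the encoder -> per-scale fusion -> decoder pipeline.
# All modules below are simplified stand-ins, not the paper's architecture.
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Conv + LeakyReLU stand-in for one encoder/decoder stage."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1),
            nn.LeakyReLU(0.1, inplace=True),
        )
    def forward(self, x):
        return self.body(x)

class ChannelAttentionFusion(nn.Module):
    """Fuse IR and visible features at one scale with channel attention."""
    def __init__(self, ch):
        super().__init__()
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * ch, ch, 1),
            nn.Sigmoid(),
        )
        self.proj = nn.Conv2d(2 * ch, ch, 1)
    def forward(self, f_ir, f_vis):
        cat = torch.cat([f_ir, f_vis], dim=1)
        w = self.attn(cat)            # per-channel weights in [0, 1]
        return self.proj(cat) * w     # weighted fused feature

class FusionNet(nn.Module):
    """Encoder -> per-scale fusion -> decoder with skip connections."""
    def __init__(self, chs=(16, 32, 64)):
        super().__init__()
        self.enc = nn.ModuleList()
        in_ch = 1
        for i, ch in enumerate(chs):
            self.enc.append(ConvBlock(in_ch, ch, stride=1 if i == 0 else 2))
            in_ch = ch
        self.fuse = nn.ModuleList(ChannelAttentionFusion(ch) for ch in chs)
        self.dec = nn.ModuleList(
            ConvBlock(chs[i] + chs[i - 1], chs[i - 1])
            for i in range(len(chs) - 1, 0, -1)
        )
        self.up = nn.Upsample(scale_factor=2, mode="bilinear",
                              align_corners=False)
        self.out = nn.Conv2d(chs[0], 1, 1)

    def encode(self, x):
        feats = []
        for stage in self.enc:
            x = stage(x)
            feats.append(x)
        return feats

    def forward(self, ir, vis):
        f_ir, f_vis = self.encode(ir), self.encode(vis)
        fused = [f(a, b) for f, a, b in zip(self.fuse, f_ir, f_vis)]
        x = fused[-1]                            # coarsest scale
        for dec, skip in zip(self.dec, reversed(fused[:-1])):
            x = dec(torch.cat([self.up(x), skip], dim=1))
        return torch.sigmoid(self.out(x))        # fused image in [0, 1]

ir = torch.rand(1, 1, 64, 64)    # dummy single-channel inputs
vis = torch.rand(1, 1, 64, 64)
print(FusionNet()(ir, vis).shape)  # torch.Size([1, 1, 64, 64])
```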

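For readers unfamiliar with the objective metrics listed above, the sketch below computes three of them (EN, SD, AG) using their common definitions in the image fusion literature: gray-level histogram entropy, intensity standard deviation, and the mean magnitude of local gradients. Exact formulations vary slightly between papers, so this is an assumption about the definitions used here, not the paper's evaluation code; EI (Sobel edge intensity) and QP (a phase-congruency-based quality metric) are omitted for brevity.

```python
# Common definitions of three fusion metrics; illustrative, not the
# paper's evaluation code.
import numpy as np

def entropy(img: np.ndarray, bins: int = 256) -> float:
    """EN: Shannon entropy of the gray-level histogram (bits)."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 255))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def std_dev(img: np.ndarray) -> float:
    """SD: standard deviation of pixel intensities (contrast)."""
    return float(img.std())

def avg_gradient(img: np.ndarray) -> float:
    """AG: mean magnitude of horizontal/vertical finite differences."""
    img = img.astype(np.float64)
    gx = np.diff(img, axis=1)[:-1, :]   # trim so gx and gy align
    gy = np.diff(img, axis=0)[:, :-1]
    return float(np.sqrt((gx**2 + gy**2) / 2).mean())

fused = np.random.randint(0, 256, (256, 256)).astype(np.float64)  # dummy image
print(entropy(fused), std_dev(fused), avg_gradient(fused))
```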
     
