SFMFusion: Infrared and Visible Image Fusion Based on Semantic Feature Mapping Autoencoder Network

Abstract: Previous infrared and visible image fusion methods often ignore the relationships among semantic features, so the information unique to the infrared image is insufficiently mined. To fully extract the semantic information and fine-grained discriminative features of the source images, this paper proposes an infrared and visible image fusion method based on a semantic feature mapping autoencoder network (SFMFusion). Because coarse- and fine-grained features emphasize different information, the method adopts a dual fusion strategy. For the shallow features that carry spatial texture details, a fusion rule based on content richness is designed. For the deep semantic features that carry the discriminative content of the images, a least-squares semantic feature mapping fusion rule is designed, which seeks the optimal feature mapping so as to preserve the unique information of the infrared image to the greatest extent. On this basis, to further enhance the contextual relevance of the fused semantic features, a multi-scale enhancement module is designed: several dilated convolutions with different dilation rates process the fused semantic features in parallel, learning feature information at different scales. Finally, under the layer-by-layer guidance of the shallow fused detail features, the final fused image is reconstructed from coarse to fine. Subjective and objective experiments on the standard TNO and RoadScene datasets, comparing against traditional and recent deep-learning fusion methods, show that the proposed method effectively preserves and fuses the complementary information of infrared and visible images and achieves good results in both visual perception and quantitative metrics.
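The abstract only names the shallow fusion rule, not its form. Below is a minimal PyTorch sketch of one plausible content-richness weighting, assuming richness is measured by a locally averaged l1 activity map; the function name `richness_weighted_fusion`, the window size, and the activity measure itself are illustrative assumptions, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def richness_weighted_fusion(feat_ir, feat_vis, window=3):
    """Fuse shallow detail features by a content-richness weighting.

    Hypothetical richness measure: the local average of absolute
    activations (an l1-style activity map), smoothed over a small
    window. The actual rule in the paper may differ.
    """
    # Activity maps: channel-wise l1 norm of each feature tensor.
    act_ir = feat_ir.abs().mean(dim=1, keepdim=True)
    act_vis = feat_vis.abs().mean(dim=1, keepdim=True)
    # Box-filter smoothing so the weights reflect a neighborhood.
    pad = window // 2
    act_ir = F.avg_pool2d(act_ir, window, stride=1, padding=pad)
    act_vis = F.avg_pool2d(act_vis, window, stride=1, padding=pad)
    # Normalized soft weights: richer regions dominate the fusion.
    w_ir = act_ir / (act_ir + act_vis + 1e-8)
    return w_ir * feat_ir + (1.0 - w_ir) * feat_vis
```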

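For the deep semantic features, the abstract describes a least-squares feature mapping that preserves infrared-unique information. The sketch below is one hedged reading of that idea using `torch.linalg.lstsq`: fit a linear map from the visible features toward the infrared features and treat the residual as the infrared-unique component. Both the linear decomposition and the fusion-by-residual step are assumptions, not the published formulation.

```python
import torch

def lstsq_semantic_fusion(feat_ir, feat_vis):
    """Least-squares semantic feature mapping fusion (illustrative).

    One plausible reading: solve min_W ||V W - I||_F over flattened
    deep features, so V @ W is the part of the infrared features
    explainable from the visible ones; the residual is treated as
    infrared-unique information and added back. The exact rule in
    the paper may differ.
    """
    b, c, h, w = feat_ir.shape
    I = feat_ir.reshape(b, c, h * w).transpose(1, 2)   # (b, hw, c)
    V = feat_vis.reshape(b, c, h * w).transpose(1, 2)  # (b, hw, c)
    # Batched least squares: W maps visible features toward infrared.
    W = torch.linalg.lstsq(V, I).solution               # (b, c, c)
    residual = I - V @ W                                # infrared-unique part
    fused = V + residual                                # visible base + unique IR info
    return fused.transpose(1, 2).reshape(b, c, h, w)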

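The multi-scale enhancement module is described as several parallel dilated convolutions with different dilation rates applied to the fused semantic features. A self-contained PyTorch sketch under that description follows; the specific rates (1, 2, 3), the ReLU activations, and the residual connection are assumptions beyond what the abstract states.

```python
import torch
import torch.nn as nn

class MultiScaleEnhance(nn.Module):
    """Multi-scale enhancement of the fused semantic features.

    Parallel 3x3 dilated convolutions with different dilation rates,
    concatenated and reduced by a 1x1 convolution.
    """

    def __init__(self, channels, rates=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                # padding = dilation keeps the spatial size for a 3x3 kernel
                nn.Conv2d(channels, channels, 3, padding=r, dilation=r),
                nn.ReLU(inplace=True),
            )
            for r in rates
        )
        self.reduce = nn.Conv2d(channels * len(rates), channels, 1)

    def forward(self, x):
        multi = torch.cat([b(x) for b in self.branches], dim=1)
        # Residual connection preserves the original fused semantics.
        return x + self.reduce(multi)
```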