Visible Light and Infrared Image Fusion Network Based on Multi-modal Feature Fusion and Scene Information Embedding

  • Abstract: To address the poor fusion quality of infrared and visible images caused by unclear targets and insufficient information extraction under complex weather conditions, a new infrared and visible image fusion method based on multi-modal feature fusion and scene information embedding is proposed. First, to strengthen image feature extraction in complex environments, a hybrid prior-knowledge module combining low-light enhancement and dark channel prior dehazing is designed. Second, to overcome the limited ability of existing methods to understand the scene semantics of visible images, the CLIP model is introduced and adapted with LoRA to effectively extract scene information from visible images. Finally, a collaborative fusion-and-segmentation network is designed to further improve the salient-target information and image quality of the fused results. Comparative experiments against nine other fusion methods on public datasets show that the proposed method achieves excellent fusion performance: on the general-scene LLVIP dataset, the AG, SF, SD, and VIF metrics improve by 145.79%, 116.77%, 78.77%, and 73.86%, respectively; on the MSRS dataset, dominated by extreme environments, AG, EN, SF, SD, VIF, and MI improve by 95.11%, 36.13%, 43.53%, 155.86%, 82.66%, and 67.22%, respectively; and on the M3FD dataset, characterized by dynamic blur and complex lighting, AG, SF, and SD improve by 81.45%, 90.49%, and 87.18%, respectively.
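The dehazing component of the hybrid prior module builds on the dark channel prior. As an illustration of that prior (the classic single-image formulation, not the paper's exact module), here is a minimal sketch; the patch size, `omega`, and `t0` values are the commonly used defaults, assumed rather than taken from the paper:

```python
import numpy as np

def dark_channel(img, patch=15):
    # img: HxWx3 float array in [0, 1].
    # Dark channel: per-pixel minimum over channels, then a minimum
    # filter over a local patch.
    mins = img.min(axis=2)
    pad = patch // 2
    padded = np.pad(mins, pad, mode="edge")
    h, w = mins.shape
    out = np.empty_like(mins)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

def estimate_atmosphere(img, dark, top=0.001):
    # Atmospheric light: mean color of the brightest dark-channel pixels.
    n = max(1, int(dark.size * top))
    idx = np.argsort(dark.ravel())[-n:]
    return img.reshape(-1, 3)[idx].mean(axis=0)

def dehaze(img, omega=0.95, t0=0.1, patch=15):
    # Haze model: I = J * t + A * (1 - t); recover J given estimated A and t.
    A = estimate_atmosphere(img, dark_channel(img, patch))
    t = 1.0 - omega * dark_channel(img / A, patch)  # transmission estimate
    t = np.clip(t, t0, 1.0)                         # avoid division blow-up
    return np.clip((img - A) / t[..., None] + A, 0.0, 1.0)
```

In practice the loop-based minimum filter would be replaced by a vectorized or guided-filter-refined version; the sketch only shows the three estimation steps the prior consists of.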

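The LoRA adaptation of CLIP mentioned above freezes the pretrained weights and learns a low-rank additive update. A minimal sketch of one such layer (the generic LoRA construction, with names `W`, `A`, `B`, `r`, `alpha` from the LoRA literature; how the paper attaches these adapters inside CLIP is not specified here):

```python
import numpy as np

class LoRALinear:
    """Frozen linear weight W plus a trainable low-rank update B @ A,
    scaled by alpha / r. With B initialized to zero, the layer starts
    out identical to the frozen pretrained layer."""

    def __init__(self, W, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = W.shape
        self.W = W                                          # frozen
        self.A = rng.normal(scale=0.01, size=(r, in_dim))   # trainable
        self.B = np.zeros((out_dim, r))                     # trainable, init 0
        self.scale = alpha / r

    def __call__(self, x):
        # y = x W^T + scale * (x A^T) B^T
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T
```

Only `A` and `B` (2 * r * dim parameters per layer) are updated during fine-tuning, which is why LoRA lets a large frozen model like CLIP be adapted to a new domain cheaply.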