Visible Light and Infrared Image Fusion Network Based on Multi-modal Feature Fusion and Scene Information Embedding
Abstract
To address the poor fusion quality of infrared and visible images under complex weather conditions, caused by indistinct image targets and insufficient information-extraction capability, a new infrared and visible image fusion method based on multi-modal feature fusion and scene information embedding is proposed. First, to strengthen image feature extraction in complex environments, a hybrid prior-knowledge module combining low-light illumination enhancement and dark-channel-prior defogging is designed. Second, to overcome the limited ability of existing methods to understand the semantic information of visible-light scenes, the CLIP model is introduced and fine-tuned with LoRA to extract scene information from visible images effectively. Finally, a collaborative network coupling fusion and segmentation further improves the salient-target information and image quality of the fused results. Comparative experiments against nine other fusion methods on public datasets show that the proposed method achieves excellent fusion performance: on the LLVIP dataset, the AG, SF, SD, and VIF metrics improve by 145.79%, 116.77%, 78.77%, and 73.86%, respectively; on the MSRS dataset, dominated by extreme environments, AG, EN, SF, SD, VIF, and MI improve by 95.11%, 36.13%, 43.53%, 155.86%, 82.66%, and 67.22%, respectively; and on the M3FD dataset, with dynamic blur and complex lighting, AG, SF, and SD improve by 81.45%, 90.49%, and 87.18%, respectively.
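The dark-channel-prior defogging mentioned in the abstract refers to the classical single-image dehazing idea: in haze-free outdoor images, most local patches contain some pixel that is dark in at least one color channel, so a bright dark channel signals haze. A minimal NumPy sketch of that classical formulation is given below; it is illustrative only, not the paper's implementation, and all function names and parameter values (patch size, omega, t0) are assumptions.

```python
import numpy as np

def dark_channel(img, patch=15):
    """Per-pixel minimum over color channels, then a patch-wise minimum filter."""
    dc = img.min(axis=2)
    pad = patch // 2
    padded = np.pad(dc, pad, mode="edge")
    h, w = dc.shape
    out = np.empty_like(dc)
    for i in range(h):                       # simple O(h*w*patch^2) min filter, kept for clarity
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

def estimate_atmosphere(img, dark, top=0.001):
    """Average the colors of the brightest pixels in the dark channel."""
    n = max(1, int(dark.size * top))
    idx = np.argsort(dark.ravel())[-n:]
    return img.reshape(-1, 3)[idx].mean(axis=0)

def dehaze(img, omega=0.95, t0=0.1, patch=15):
    """Recover the scene radiance J from hazy image I = J*t + A*(1-t)."""
    dark = dark_channel(img, patch)
    A = estimate_atmosphere(img, dark)
    t = 1.0 - omega * dark_channel(img / A, patch)   # transmission estimate
    t = np.maximum(t, t0)[..., None]                 # floor t to avoid amplifying noise
    J = (img - A) / t + A
    return np.clip(J, 0.0, 1.0)
```

In a fusion pipeline such a module would preprocess the visible-light branch so that downstream feature extraction sees a haze-reduced input; the paper combines it with a low-light enhancement component inside one hybrid prior-knowledge module.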
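The LoRA adaptation of CLIP referenced in the abstract works by freezing a pretrained weight matrix W and learning only a low-rank update (alpha/r) * B @ A, which keeps the number of trainable parameters small. The sketch below shows the core idea on a single linear layer; it is a generic illustration under that standard formulation, not the paper's code, and the class name and hyperparameters are hypothetical.

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer plus a trainable low-rank residual: y = x @ (W + (alpha/r) * B @ A).T"""
    def __init__(self, W, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = W                                            # frozen pretrained weight, shape (out, in)
        self.A = rng.normal(0.0, 0.02, size=(r, W.shape[1]))  # trainable down-projection
        self.B = np.zeros((W.shape[0], r))                    # trainable up-projection, zero-initialized
        self.scale = alpha / r

    def __call__(self, x):
        # B is zero at initialization, so the adapted layer starts identical to the frozen one
        return x @ (self.W + self.scale * self.B @ self.A).T
```

Because B starts at zero, fine-tuning begins exactly at the pretrained CLIP behavior and only the small A and B matrices are updated, which is what makes LoRA a lightweight way to specialize CLIP's scene understanding to visible-light imagery.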