Abstract:
Traditional fusion methods for infrared and visible images often ignore the relationships among semantic features, resulting in insufficient mining of the information unique to infrared images. To fully extract and exploit the semantic information and fine-grained discriminative features of images, this paper proposes an infrared and visible image fusion method based on a semantic feature mapping autoencoder network (SFMFusion). The method adopts two fusion strategies tailored to the different emphases of coarse- and fine-grained information. For shallow features, which carry the spatial details of image texture, a fusion rule based on information richness is designed. For deep semantic features, which carry the discriminative content of images, a least-squares-based semantic feature mapping fusion rule is designed to seek the optimal feature mapping, so that the information unique to the infrared image is retained to the greatest extent. On this basis, to further enhance the contextual relevance of the fused semantic features, a multi-scale enhancement module is designed, in which multiple dilated convolutions with different dilation rates process the fused semantic features in parallel to learn feature information at different scales. Finally, the fused image is reconstructed from coarse to fine under the guidance of the shallow fused details. Subjective and objective experiments were conducted on the standard TNO and RoadScene datasets, comparing the proposed method with both traditional and recent deep-learning fusion methods. The results show that the proposed method effectively retains and fuses the complementary information of infrared and visible images and achieves good results in both visual perception and quantitative metrics.