
Visible and Infrared Object Detection Based on Bidirectional Gated Cross-modal Fusion


    Abstract: Detection accuracy in complex scenes can be effectively improved by fusing complementary features from visible and infrared images. However, existing methods often fail to fully account for the modal differences between dual-modality features. To address this issue, this paper proposes a Bidirectional Gated Cross-enhanced Fusion Network (BGCF-Net), which achieves thorough inter-modal fusion via a bidirectional gating mechanism and cross-enhancement, thereby improving detection performance. First, a dual-stream CSPDarknet53 architecture is adopted to extract dual-modality features, and wavelet convolution is introduced to expand the receptive field of the shallow layers of the backbone. Second, a bidirectional gated interaction module is designed to effectively exploit the complementary information between modalities and reduce modal differences via a gating mechanism. In addition, a cross-attention enhanced fusion module is further proposed to improve the effectiveness of the fused features by exchanging long-range contextual information. Experimental results on the FLIR_Aligned and M3FD datasets demonstrate that the proposed method achieves mAP50 scores of 77.1% and 86.3%, respectively, outperforming several dual-modality detection algorithms and verifying the effectiveness of our approach.
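The bidirectional gated interaction described above can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's implementation: the projection matrices `W_v`, `W_i` and the residual gated-update form `f + g * f_other` are hypothetical stand-ins for whatever learnable gating BGCF-Net actually uses. It only shows the general idea of each modality absorbing gated complementary information from the other.

```python
import numpy as np

def sigmoid(x):
    """Numerically plain logistic gate in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def bidirectional_gated_fusion(f_vis, f_ir, W_v, W_i):
    """Toy bidirectional gated interaction (hypothetical form).

    f_vis, f_ir : (N, C) feature vectors from the two streams.
    W_v, W_i    : (2C, C) projection matrices producing per-channel gates.
    Each stream is updated with the other stream's features, scaled by a
    gate computed from the concatenation of both modalities.
    """
    joint = np.concatenate([f_vis, f_ir], axis=-1)   # (N, 2C) joint context
    g_v = sigmoid(joint @ W_v)                       # gate for the visible stream
    g_i = sigmoid(joint @ W_i)                       # gate for the infrared stream
    f_vis_out = f_vis + g_v * f_ir                   # visible absorbs gated IR info
    f_ir_out = f_ir + g_i * f_vis                    # infrared absorbs gated visible info
    return f_vis_out, f_ir_out
```

Because the gates lie strictly in (0, 1), each stream blends in only a fraction of the complementary modality, which is one simple way a gating mechanism can temper modal differences before fusion.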

