Visible and Infrared Object Detection
Based on Bidirectional Gated Cross-modal Fusion
Abstract
Fusing complementary features from visible and infrared images can effectively improve detection accuracy in complex scenes. However, existing methods often fail to fully account for the differences between the features of the two modalities. To address this issue, this paper proposes a Bidirectional Gated Cross-enhanced Fusion Network (BGCF-Net), which improves detection performance by fusing the two modalities through a bidirectional gating mechanism and cross-enhancement. First, a dual-stream CSPDarknet53 architecture is adopted to extract features from each modality, and wavelet convolution is introduced to expand the receptive field of the shallow backbone layers. Second, a bidirectional gated interaction module is designed to exploit the complementary information between modalities and to reduce modal differences via a gating mechanism. Third, a cross-attention enhanced fusion module is proposed to improve the quality of the fused features by exchanging long-range contextual information. Experimental results on the FLIR_Aligned and M3FD datasets show that the proposed method achieves mAP50 scores of 77.1% and 86.3%, respectively, outperforming several detection algorithms and verifying the effectiveness of the approach.
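To make the two fusion ideas named in the abstract concrete, the following is a minimal NumPy sketch of one plausible reading of them. It is an illustration under stated assumptions, not the BGCF-Net implementation: the function names (`bidirectional_gate`, `cross_attention`, `fuse`), the gate weights `w_vis` and `w_ir`, the residual connections, the single-head attention without learned projections, and the final element-wise sum are all hypothetical choices. Features are treated as flattened token matrices of shape (N, C).

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def bidirectional_gate(f_vis, f_ir, w_vis, w_ir):
    """Gated exchange in both directions (hypothetical formulation).

    f_vis, f_ir: (N, C) visible and infrared features.
    w_vis, w_ir: (2C, C) assumed gate weights, one set per direction.
    """
    joint = np.concatenate([f_vis, f_ir], axis=-1)  # (N, 2C)
    g_vis = sigmoid(joint @ w_vis)  # how much infrared enters the visible stream
    g_ir = sigmoid(joint @ w_ir)    # how much visible enters the infrared stream
    out_vis = f_vis + g_vis * f_ir
    out_ir = f_ir + g_ir * f_vis
    return out_vis, out_ir


def cross_attention(query_feat, context_feat):
    """Single-head cross-attention with a residual connection.

    Each query token attends to every context token, so information
    is exchanged across the whole feature map, not just locally.
    """
    d = query_feat.shape[-1]
    attn = softmax(query_feat @ context_feat.T / np.sqrt(d))  # (N, N)
    return query_feat + attn @ context_feat


def fuse(f_vis, f_ir, w_vis, w_ir):
    """Gate first, then cross-attend in both directions and sum."""
    v, r = bidirectional_gate(f_vis, f_ir, w_vis, w_ir)
    return cross_attention(v, r) + cross_attention(r, v)


# Usage: 6 tokens with 4 channels per modality, random gate weights.
rng = np.random.default_rng(0)
f_vis = rng.standard_normal((6, 4))
f_ir = rng.standard_normal((6, 4))
w_vis = rng.standard_normal((8, 4)) * 0.1
w_ir = rng.standard_normal((8, 4)) * 0.1
fused = fuse(f_vis, f_ir, w_vis, w_ir)
print(fused.shape)
```

In this sketch the sigmoid gate lets each stream decide, per channel and per position, how much of the other modality to admit, which is one simple way to limit the influence of mismatched features. A real detector would apply such modules to multi-scale convolutional feature maps and learn all weights end to end.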