Infrared-Visible Object Detection Integrating Global and Local Information

陈斌杰; 郑书航; 王珺; 张昀轩

Abstract

To improve multimodal object detection in complex scenes, an infrared-visible object detection model that fuses global and local features is proposed. The model uses a multi-source heterogeneous network architecture. For visible-light images, a convolutional neural network is built from grouped convolutions based on feature importance, and a sliding-window mechanism extracts fine texture details of objects along the local spatial dimension. For infrared images, a Vision Transformer network is built on an efficient multi-head self-attention mechanism, and global context modeling captures the salient features of objects. For infrared-visible feature interaction, a dynamic-weight adaptive fusion strategy is designed, which achieves cross-modal feature complementarity through key-feature enhancement and modality weight allocation. In addition, to alleviate information conflicts when fusing features of different scales, a spatial-channel multi-scale collaborative optimization module is proposed. It applies attention along the spatial and channel dimensions separately to highlight effective features and suppress noise, thereby reducing mutual interference between features. Experiments on multiple standard datasets show that the proposed method significantly improves multimodal feature extraction, cross-modal feature interaction, and multi-scale feature fusion. Compared with existing networks of the same type, it also achieves better detection performance, enabling accurate and efficient object detection in complex environments.
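
The abstract does not specify how the modality weights or the channel attention are computed, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical scoring rule (mean absolute activation per modality, passed through a softmax) for the dynamic weight allocation, and a parameter-free sigmoid gate as a stand-in for channel attention. All function names are illustrative, and feature maps are plain nested lists of shape channels x positions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def modality_score(feature_map):
    """Assumed saliency score: mean absolute activation of a C x N map."""
    values = [abs(v) for channel in feature_map for v in channel]
    return sum(values) / len(values)


def fuse(visible, infrared):
    """Weighted sum of two same-shaped C x N feature maps.

    The weights come from a softmax over the per-modality scores, so they
    are positive and sum to 1. The scoring rule is an assumption made for
    illustration; the paper's strategy may differ.
    """
    w_vis, w_ir = softmax([modality_score(visible), modality_score(infrared)])
    fused = [
        [w_vis * v + w_ir * r for v, r in zip(vis_ch, ir_ch)]
        for vis_ch, ir_ch in zip(visible, infrared)
    ]
    return fused, (w_vis, w_ir)


def channel_gate(feature_map):
    """Scale each channel by sigmoid(channel mean).

    A parameter-free stand-in for learned channel attention: channels with
    a larger mean response are kept closer to full strength.
    """
    gated = []
    for channel in feature_map:
        mean = sum(channel) / len(channel)
        gate = 1.0 / (1.0 + math.exp(-mean))
        gated.append([gate * v for v in channel])
    return gated
```

Calling `fuse(visible, infrared)` returns the fused map together with the two modality weights, and `channel_gate` can then be applied to the fused map. In a trained network the scores and gates would normally be produced by learned layers rather than fixed statistics; the fixed rules here only keep the sketch self-contained.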
