Receptive Field Fusion and Cross-Scale Global Modeling for Infrared and Visible Small Object Detection

唐逸凡; 胡旭冉; 罗熹; 黄娟娟; 戴超兰; 李岚婷

Abstract

Small objects in visible and infrared images captured by UAV platforms in complex environments typically exhibit drastic scale variation, weak thermal signals, and strong background interference. To address these problems, this paper proposes a dual-modality small object detection model that combines receptive field enhancement with global cross-scale modeling. The model is built on the YOLOv11 architecture and targets three key stages: feature extraction, feature fusion, and discrimination. First, a reparameterized receptive field attention convolution (RFAConv) module uses a dual-branch structure to model large receptive fields in the shallow layers, improving the spatial sensitivity and modality adaptability of the feature representation. Second, a Transformer-based, globally aware cross-scale feature fusion mechanism aligns multi-scale features non-locally at the semantic level, mitigating semantic drift as features propagate between scales. Finally, a mixed local channel attention mechanism in the detection head sharpens the focus on small-object regions and suppresses background interference. Experiments on the VisDrone2021 and HIT-UAV datasets show that the proposed method outperforms existing mainstream lightweight and Transformer-based detectors in detection accuracy, structural efficiency, and multimodal robustness.
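The abstract does not say how the dual-branch module is reparameterized. As a rough illustration of the general idea only, the sketch below assumes a RepVGG-style structural reparameterization: a training-time block with parallel 3x3 and 1x1 convolution branches is folded into a single 3x3 kernel for inference. The single-channel setting, the absence of batch normalization and attention weights, and all function names (`conv2d_same`, `merge_branches`, `two_branch_forward`) are simplifying assumptions for illustration, not the authors' RFAConv implementation.

```python
import random


def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding (odd square kernel)."""
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy, xx = y + i - pad, x + j - pad
                    # Positions outside the image contribute zero (zero padding).
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


def two_branch_forward(image, k3, k1):
    """Training-time form (assumed): sum of a 3x3 branch and a 1x1 branch."""
    a = conv2d_same(image, k3)
    b = conv2d_same(image, k1)
    return [[a[y][x] + b[y][x] for x in range(len(a[0]))] for y in range(len(a))]


def merge_branches(k3, k1):
    """Inference-time form: add the 1x1 weight to the centre of the 3x3 kernel."""
    merged = [row[:] for row in k3]
    merged[1][1] += k1[0][0]
    return merged


# Hypothetical toy data: a 5x6 image and random branch kernels.
rng = random.Random(0)
image = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(5)]
k3 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
k1 = [[rng.uniform(-1, 1)]]

train_out = two_branch_forward(image, k3, k1)
infer_out = conv2d_same(image, merge_branches(k3, k1))

# The two forms should agree up to floating-point rounding.
max_diff = max(
    abs(train_out[y][x] - infer_out[y][x]) for y in range(5) for x in range(6)
)
print("max abs difference:", max_diff)
```

Because convolution is linear, the merged kernel reproduces the two-branch output while needing only one convolution at inference time, which is the usual motivation for reparameterized blocks in lightweight detectors.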
