A Multi-Stage Feature Enhancement RGB-T Object Tracking Method Based on Transformer

丁明亮; 赵树飞; 宋娟; 郁章伟

Abstract

Visible-thermal infrared (RGB-T) object tracking aims to combine the complementary strengths of visible and thermal infrared imaging to achieve stable tracking performance. The prevailing approach assigns a weight to each modality, which does not fully exploit their complementarity. We propose a multi-stage, multi-modal fusion method for RGB-T object tracking. First, two independent feature extraction networks extract the features of each modality separately, which extends the single-modal tracker TransT to the RGB-T task. Second, we design intra-modal multi-scale feature aggregation and inter-modal feature modulation fusion. The former exploits the complementarity of multi-scale information within a single modality to improve the discriminative ability of single-modal features; the latter exploits the complementarity of multi-modal information to enable interaction between the features of the two modalities. The proposed method achieves a success rate of 57.3% and a precision of 71.2% on the GTOT dataset, and a success rate of 44.6% and a precision of 56.8% on the LasHeR dataset. These results indicate that the proposed tracker effectively improves tracking performance.
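For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how success rate and precision are commonly computed in tracking benchmarks. It is not the authors' evaluation code. The function names, the `(x, y, w, h)` box format, the 20-pixel default precision threshold, and the 21 evenly spaced overlap thresholds are all assumptions made for illustration; the abstract does not state which settings were used, and benchmarks with small targets often use a smaller pixel threshold.

```python
import math

def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def center_error(a, b):
    """Euclidean distance in pixels between the centers of two boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return math.hypot(ax + aw / 2 - (bx + bw / 2), ay + ah / 2 - (by + bh / 2))

def precision_rate(preds, gts, threshold=20.0):
    """Fraction of frames whose center error is within `threshold` pixels.

    The 20-pixel default is an assumed value, not taken from the paper.
    """
    errors = [center_error(p, g) for p, g in zip(preds, gts)]
    return sum(e <= threshold for e in errors) / len(errors)

def success_rate(preds, gts, steps=21):
    """Mean of the success curve over `steps` overlap thresholds in [0, 1].

    Each point of the curve is the fraction of frames whose IoU exceeds
    the threshold; averaging the points approximates the area under the curve.
    """
    overlaps = [iou(p, g) for p, g in zip(preds, gts)]
    thresholds = [i / (steps - 1) for i in range(steps)]
    curve = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(curve) / len(curve)

# Hypothetical three-frame sequence: exact hit, small drift, complete miss.
gts = [(10, 10, 20, 20)] * 3
preds = [(10, 10, 20, 20), (12, 10, 20, 20), (100, 100, 20, 20)]
precision = precision_rate(preds, gts)
success = success_rate(preds, gts)
```

In this toy sequence the complete miss in the third frame lowers both metrics, while the small drift in the second frame lowers only the success rate, which is why the two numbers are usually reported together.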
