RGBT Adaptive Fusion Visual Tracking with Transformer

Abstract: RGBT object tracking exploits the complementary information of the visible and thermal infrared modalities to improve tracking performance in scenarios such as cloud and fog occlusion and illumination variation. However, because visible and thermal infrared image features differ considerably, most tracking algorithms extract feature information insufficiently and produce excessive redundancy during feature fusion. To address these problems, an RGBT adaptive fusion tracking algorithm that introduces a Transformer, SiamTAF, is proposed. First, in the feature extraction stage, a Transformer is used to improve the last two layers of the AlexNet backbone in both the visible and thermal infrared branches, enabling the network to model contextual dependencies among features. Second, an adaptive fusion module combining cross-attention with a selection mechanism is proposed to promote complementary fusion of the two modal features. Finally, to enable the linear cross-correlation operation to capture nonlinear similar features, nonlinear gated attention is added to the cross-correlation operation. Experiments on the GTOT and RGBT234 benchmark datasets show that, compared with algorithms such as MANet, DAFNet, and DAPNet, SiamTAF is more robust under thermal crossover and illumination variation.
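The abstract does not give implementation details for the adaptive fusion module, only that it combines cross-attention with a selection mechanism. A minimal NumPy sketch of what such a fusion could look like is shown below, assuming token-level features per modality and a norm-based selection weight; the token layout, the residual form of the cross-attention, and the energy-based selection are all illustrative assumptions, not the paper's design.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feat, kv_feat):
    """Tokens of one modality attend to the other modality's tokens."""
    d = query_feat.shape[-1]
    attn = softmax(query_feat @ kv_feat.T / np.sqrt(d))
    return attn @ kv_feat

def adaptive_fuse(rgb, tir):
    """Hypothetical adaptive fusion: cross-attend each modality to the
    other, then select between the enhanced features with per-token
    weights (a stand-in for the paper's learned selection mechanism)."""
    rgb_enh = rgb + cross_attention(rgb, tir)  # RGB enriched by thermal
    tir_enh = tir + cross_attention(tir, rgb)  # thermal enriched by RGB
    # per-token selection weights from feature energy (assumption)
    w = softmax(np.stack([np.linalg.norm(rgb_enh, axis=-1),
                          np.linalg.norm(tir_enh, axis=-1)]), axis=0)
    return w[0][..., None] * rgb_enh + w[1][..., None] * tir_enh

rng = np.random.default_rng(0)
rgb = rng.standard_normal((16, 64))  # 16 tokens, 64-dim, visible branch
tir = rng.standard_normal((16, 64))  # thermal infrared branch
fused = adaptive_fuse(rgb, tir)
print(fused.shape)  # (16, 64)
```

A learned version would replace the norm-based weights with a small gating network; the sketch only shows how cross-attention lets each modality borrow complementary context from the other before selection.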
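The final point, adding nonlinear gated attention to the linear cross-correlation, can also be sketched. In Siamese trackers, the template feature map is correlated over the search-region feature map to produce a response map; the sketch below applies a sigmoid gate derived from the response to the response itself. The exact gating form used in SiamTAF is not specified in the abstract, so the normalization and gate here are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def xcorr(z, x):
    """Plain (linear) cross-correlation: slide template features z over
    search features x with valid padding, summing over all channels."""
    c, hz, wz = z.shape
    _, hx, wx = x.shape
    out = np.zeros((hx - hz + 1, wx - wz + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(z * x[:, i:i + hz, j:j + wz])
    return out

def gated_xcorr(z, x):
    """Assumed form of nonlinear gating: a sigmoid gate computed from
    the standardized response modulates the response, so the final map
    is no longer a purely linear function of the features."""
    r = xcorr(z, x)
    gate = sigmoid((r - r.mean()) / (r.std() + 1e-6))
    return gate * r

rng = np.random.default_rng(1)
z = rng.standard_normal((8, 6, 6))    # template features: 8 ch, 6x6
x = rng.standard_normal((8, 22, 22))  # search features: 8 ch, 22x22
resp = gated_xcorr(z, x)
print(resp.shape)  # (17, 17)
```

The gate suppresses low-similarity positions while keeping high-similarity peaks, which is one plausible way a nonlinear gate can sharpen a linear correlation response.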

     
