RGBT Adaptive Fusion Visual Tracking with Transformer
Abstract
RGBT object tracking exploits the complementarity of visible-light and thermal infrared modalities to improve tracking performance under conditions such as cloud and fog occlusion and illumination variation. However, because visible-light and thermal infrared image features differ significantly, most tracking algorithms cannot fully extract the feature information, leaving excessive redundancy after feature fusion. To address these problems, we propose SiamTAF, an RGBT adaptive-fusion tracking algorithm based on the Transformer. First, in the feature extraction stage, a Transformer enhances the last two layers of the visible-light and thermal infrared branches of an AlexNet backbone, enabling the feature extraction network to model contextual dependencies among features. Second, an adaptive fusion module combining cross-attention with a selection mechanism is proposed to promote complementary fusion of the two modal features. Finally, nonlinear gated attention is added to the linear cross-correlation operation so that it can capture nonlinear similarity between features. Experiments on the GTOT and RGBT234 benchmark datasets show that SiamTAF is more robust to thermal crossover and illumination variation than algorithms such as MANet, DAFNet, and DAPNet.
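The three components described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the identity Q/K/V projections in the cross-attention, the channel-wise sigmoid selection gate in `adaptive_fuse`, and the parameter-free sigmoid gate in `gated_response` are all simplifying assumptions made for illustration; the actual SiamTAF modules use learned weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_attention(q_feat, kv_feat):
    # One modality (queries) attends to the other (keys/values).
    # Identity Q/K/V projections are an illustration-only simplification.
    C, H, W = q_feat.shape
    Q = q_feat.reshape(C, -1).T   # (HW, C)
    K = kv_feat.reshape(C, -1).T  # (HW, C)
    attn = softmax(Q @ K.T / np.sqrt(C), axis=-1)
    return (attn @ K).T.reshape(C, H, W)

def adaptive_fuse(rgb, tir):
    # Each modality is enhanced by attending to the other; a channel-wise
    # sigmoid selection gate (an assumption here) then mixes the two maps.
    rgb_enh = rgb + cross_attention(rgb, tir)
    tir_enh = tir + cross_attention(tir, rgb)
    s = sigmoid(rgb_enh.mean(axis=(1, 2)) - tir_enh.mean(axis=(1, 2)))
    return s[:, None, None] * rgb_enh + (1.0 - s)[:, None, None] * tir_enh

def cross_correlation(search, template):
    # Plain (linear) valid-mode cross-correlation, summed over channels.
    C, Hs, Ws = search.shape
    _, Ht, Wt = template.shape
    out = np.zeros((Hs - Ht + 1, Ws - Wt + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(search[:, i:i + Ht, j:j + Wt] * template)
    return out

def gated_response(search, template):
    # A sigmoid gate over the linear response makes the similarity map a
    # nonlinear function of the features (a parameter-free stand-in for
    # the learned nonlinear gated attention).
    r = cross_correlation(search, template)
    return sigmoid(r) * r

# Toy tracking step on random fused features: 6x6 template, 16x16 search.
z = adaptive_fuse(rng.standard_normal((8, 6, 6)),
                  rng.standard_normal((8, 6, 6)))
x = adaptive_fuse(rng.standard_normal((8, 16, 16)),
                  rng.standard_normal((8, 16, 16)))
resp = gated_response(x, z)
print(resp.shape)  # (11, 11)
```

The response map peak would indicate the target location in the search region; the gate suppresses weakly correlated positions more strongly than a purely linear correlation would.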