Infrared Small Target Detection Method with Vision Transformer and Dual Decoder


    Abstract: Existing infrared small-target detection methods based on convolutional neural networks (CNNs) suffer from a limited receptive field in the encoder stage, and their decoders lack effective feature interaction when fusing multiscale features. To address these issues, this study proposes a new method based on an encoder–decoder structure. Specifically, a vision transformer is used as the encoder to extract multiscale features from infrared small-target images. The vision transformer is an emerging deep-learning architecture that uses a self-attention mechanism to capture the global relationships between all pixels in the input image, thereby effectively modeling long-range dependencies and contextual information. Furthermore, a dual-decoder module, comprising an interactive decoder and an auxiliary decoder, is proposed to improve the decoder's ability to reconstruct small infrared targets. The dual-decoder module makes full use of the complementary information between different features, promotes interaction between deep and shallow features, and better reconstructs small infrared targets by summing the outputs of the two decoders. Experimental results on widely used public datasets show that the proposed method outperforms the compared methods on two evaluation metrics: F1 and mIoU.
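The self-attention mechanism the abstract refers to can be illustrated with a minimal sketch. This is not the paper's implementation: a real vision transformer adds learned query/key/value projections, multiple heads, and positional embeddings; the patch size and dimensions below are arbitrary choices for the example. The point shown is that every output embedding is a weighted mixture of all patch embeddings, which is what gives the encoder a global receptive field.

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over patch embeddings x of
    shape (n_patches, dim). Simplified: queries, keys, and values are
    all x itself (no learned projections, single head)."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                    # (n, n) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ x                               # each patch attends to all patches

# A toy "image" flattened into four 4-dim patch embeddings.
rng = np.random.default_rng(0)
patches = rng.standard_normal((4, 4))
out = self_attention(patches)
print(out.shape)  # (4, 4): each output row mixes information from every patch
```

Because the attention weights form a full n×n matrix, even two patches at opposite corners of the image interact directly, in contrast to a CNN encoder, whose receptive field grows only gradually with depth.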

     
