HRformer: Infrared Small-Target Detection Based on a Hierarchical Regression Transformer Network

HRformer: Hierarchical Regression Transformer for Infrared Small-Target Detection

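  • Abstract: Infrared small-target detection aims to detect small targets in infrared images with low signal-to-noise ratios and complex backgrounds, and it has significant practical value in applications such as maritime rescue and traffic management. However, owing to low image resolution, small target size, and inconspicuous features, infrared targets are easily submerged in backgrounds containing noise and clutter, so accurately detecting the shape information of infrared small targets remains a challenge. To address these problems, an infrared small-target detection algorithm based on a hierarchical regression Transformer (HRformer) network is constructed. Specifically, first, to obtain multiscale information while avoiding loss of the original image information as far as possible, the PixelUnShuffle operation is used to downsample the original image and obtain the inputs of the different network levels, while a learnable PixelShuffle operation is used to upsample the output feature map of each level, which improves the flexibility of the network. Next, to enable information interaction between features at different levels of the network, a cross attention fusion (CAF) module comprising a spatial attention branch and a channel attention branch is designed to achieve efficient feature fusion and information complementarity. Finally, to further improve detection performance, a local-global Transformer (LGT) structure is proposed that combines the large receptive field of the ordinary Transformer structure with the lower computational complexity of the window-based Transformer structure; it models global dependencies while extracting local context information and also reduces computational cost. Experimental results show that, compared with several state-of-the-art infrared small-target detection algorithms, the proposed algorithm achieves higher detection accuracy with fewer parameters, making it more meaningful for solving practical problems.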

     

    Abstract: Infrared small-target detection refers to the detection of small targets in infrared images with low signal-to-noise ratios and complex backgrounds, and it is essential in applications such as maritime rescue and traffic management. However, because of low image resolution, small target size, and inconspicuous features, infrared targets are easily submerged in a background that contains noise and clutter, and accurately detecting the shape information of small infrared targets remains a challenge. To address these problems, an infrared small-target detection algorithm based on a hierarchical regression Transformer (HRformer) network is constructed. First, to obtain multiscale information while minimizing the loss of original image information, the PixelUnShuffle operation downsamples the original image to produce the inputs of the different network levels, and a learnable PixelShuffle operation upsamples the output feature map of each level, which improves the flexibility of the network. Next, to enable information interaction between features at different levels of the network, a cross-attention fusion (CAF) module with spatial and channel attention branches is designed to achieve efficient feature fusion and information complementarity. Finally, to further improve detection performance, a local-global Transformer (LGT) structure is proposed that combines the large receptive field of the ordinary Transformer structure with the lower computational complexity of the window-based Transformer structure. The proposed structure models global dependencies while extracting local context information, and it also reduces computational cost. The experimental results show that the proposed algorithm achieves higher detection accuracy with fewer parameters than some advanced infrared small-target detection algorithms, which makes it better suited to solving practical problems.
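
The claim that PixelUnShuffle downsampling limits information loss can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names pixel_unshuffle and pixel_shuffle, the nested-list [C][H][W] layout, and the channel ordering (chosen to mirror the convention used by PyTorch's PixelUnshuffle) are assumptions made for illustration, and the learnable part of the upsampling described in the abstract is not modeled here.

```python
def pixel_unshuffle(x, r):
    """Rearrange a [C][H][W] nested list into [C*r*r][H//r][W//r].

    Every pixel is kept; spatial detail is moved into extra channels,
    so the spatial size shrinks by r without discarding values.
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    if h % r or w % r:
        raise ValueError("H and W must be divisible by r")
    out = []
    for ch in range(c):
        for i in range(r):          # row offset inside each r x r block
            for j in range(r):      # column offset inside each r x r block
                out.append([[x[ch][y * r + i][z * r + j]
                             for z in range(w // r)]
                            for y in range(h // r)])
    return out


def pixel_shuffle(x, r):
    """Inverse rearrangement: [C*r*r][H][W] -> [C][H*r][W*r]."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r):
        raise ValueError("channel count must be divisible by r*r")
    c = c_in // (r * r)
    out = [[[None] * (w * r) for _ in range(h * r)] for _ in range(c)]
    for ch in range(c):
        for i in range(r):
            for j in range(r):
                plane = x[ch * r * r + i * r + j]
                for y in range(h):
                    for z in range(w):
                        out[ch][y * r + i][z * r + j] = plane[y][z]
    return out


# A single-channel 4x4 "image" with pixel values 0..15.
image = [[[row * 4 + col for col in range(4)] for row in range(4)]]
half = pixel_unshuffle(image, 2)   # 4 channels of size 2x2
restored = pixel_shuffle(half, 2)  # back to 1 channel of size 4x4
```

In this sketch the round trip reproduces the input exactly, which is the property that distinguishes this kind of rearrangement from pooling or strided subsampling, where some pixel values are dropped.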

     
