Abstract:
In recent years, detecting small infrared targets, which lack texture and shape information, against complex background clutter has become a significant challenge. Traditional model-driven approaches have limited feature-learning and representation capabilities and therefore adapt poorly to diverse scenarios. Most deep-learning-based detection methods rely on deep network architectures to extract features; such architectures tend to lose fine-grained texture information in deeper layers and are thus less effective for small infrared target detection. To address these challenges, we propose a frequency decomposition network (FDnet) that follows the design principle of decomposing an image in the frequency domain and processing the different frequency components separately. Specifically, FDnet first employs a high-frequency feature extraction module to decompose the input image into high- and low-frequency components. These components are then processed by two separate branches to extract boundary and semantic information, respectively. To facilitate interaction between the two branches, a spatial information aggregation (SIA) module is introduced, enabling high-frequency features to guide the low-frequency branch. Furthermore, considering the sparsity of the high-frequency components, a spatially sparse self-attention mechanism (SSAM) is incorporated into the high-frequency branch to better capture spatial attention, whereas a channel-wise attention mechanism (CAM) is embedded in the low-frequency branch to model global channel dependencies. These components operate collectively to enhance the network's perception of meaningful targets. Experimental results on public datasets demonstrate that the proposed method achieves high detection accuracy with significantly fewer parameters than other state-of-the-art approaches.
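To make the frequency-decomposition idea concrete, the sketch below shows one possible PyTorch realization of the core design principle: split the input into low-frequency (smooth) and high-frequency (residual) components and process them in two branches. This is a hypothetical illustration, not the actual FDnet implementation; the box-filter low-pass, the class names FrequencyDecomposition and DualBranchSketch, and all channel sizes are assumptions, and the SIA, SSAM, and CAM modules are omitted.

```python
# Minimal sketch of a frequency split followed by a dual-branch network (assumed design).
import torch
import torch.nn as nn


class FrequencyDecomposition(nn.Module):
    """Split an input into low-frequency (smooth) and high-frequency (residual) parts."""

    def __init__(self, kernel_size: int = 5):
        super().__init__()
        # A box blur acts as a simple low-pass filter; padding preserves spatial size.
        self.low_pass = nn.AvgPool2d(kernel_size, stride=1, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor):
        low = self.low_pass(x)   # smooth component: coarse structures and background
        high = x - low           # residual component: edges and small-target details
        return high, low


class DualBranchSketch(nn.Module):
    """Toy two-branch model: high-frequency branch for boundaries, low-frequency for semantics."""

    def __init__(self, in_ch: int = 1, feat_ch: int = 16):
        super().__init__()
        self.decompose = FrequencyDecomposition()
        self.high_branch = nn.Sequential(nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU())
        self.low_branch = nn.Sequential(nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(2 * feat_ch, 1, 1)  # fuse both branches into a target map

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        high, low = self.decompose(x)
        fused = torch.cat([self.high_branch(high), self.low_branch(low)], dim=1)
        return torch.sigmoid(self.head(fused))    # per-pixel target probability


if __name__ == "__main__":
    img = torch.randn(1, 1, 256, 256)             # single-channel infrared image
    print(DualBranchSketch()(img).shape)          # -> torch.Size([1, 1, 256, 256])
```

In this reading, the high-frequency residual carries the boundary cues of small targets while the smoothed component carries global context, which is why the two are routed to separate branches before fusion.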