Abstract:
In recent years, detecting small infrared targets, which lack texture and shape information, against complex background clutter has become a significant challenge. Traditional model-driven approaches have limited feature-learning and representation capabilities and therefore adapt poorly to diverse scenarios. Most deep-learning-based detection methods rely on deep network architectures to extract features; such architectures tend to lose fine-grained texture information in deeper layers and are thus less effective for small infrared target detection. To address these challenges, we propose a frequency decomposition network (FDnet) that follows the design principle of decomposing an image in the frequency domain and processing the different frequency components separately. Specifically, FDnet first employs a high-frequency feature extraction module to decompose the input image into high- and low-frequency components. These components are then processed by two separate branches to extract boundary and semantic information, respectively. To facilitate interaction between the two branches, a spatial information aggregation (SIA) module is introduced, enabling high-frequency features to guide the low-frequency branch. Furthermore, considering the sparsity of the high-frequency components, a spatially sparse self-attention mechanism (SSAM) is incorporated into the high-frequency branch to better capture spatial attention, whereas a channel-wise attention mechanism (CAM) is embedded in the low-frequency branch to model global channel dependencies. These components operate collectively to enhance the network's perception of meaningful targets. Experimental results on public datasets demonstrate that the proposed method achieves high detection accuracy with significantly fewer parameters than other state-of-the-art approaches.
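To make the frequency-decomposition idea concrete, the sketch below shows one possible PyTorch realization of the core design principle: split the input into low-frequency (smooth) and high-frequency (residual) components and process them in two branches. This is a hypothetical illustration, not the actual FDnet implementation; the box-filter low-pass, the class names FrequencyDecomposition and DualBranchSketch, and all channel sizes are assumptions, and the SIA, SSAM, and CAM modules are omitted.

```python
# Minimal sketch of a frequency split followed by a dual-branch network (assumed design).
import torch
import torch.nn as nn


class FrequencyDecomposition(nn.Module):
    """Split an input into low-frequency (smooth) and high-frequency (residual) parts."""

    def __init__(self, kernel_size: int = 5):
        super().__init__()
        # A box blur acts as a simple low-pass filter; padding preserves spatial size.
        self.low_pass = nn.AvgPool2d(kernel_size, stride=1, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor):
        low = self.low_pass(x)   # smooth component: coarse structures and background
        high = x - low           # residual component: edges and small-target details
        return high, low


class DualBranchSketch(nn.Module):
    """Toy two-branch model: high-frequency branch for boundaries, low-frequency for semantics."""

    def __init__(self, in_ch: int = 1, feat_ch: int = 16):
        super().__init__()
        self.decompose = FrequencyDecomposition()
        self.high_branch = nn.Sequential(nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU())
        self.low_branch = nn.Sequential(nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(2 * feat_ch, 1, 1)  # fuse both branches into a target map

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        high, low = self.decompose(x)
        fused = torch.cat([self.high_branch(high), self.low_branch(low)], dim=1)
        return torch.sigmoid(self.head(fused))    # per-pixel target probability


if __name__ == "__main__":
    img = torch.randn(1, 1, 256, 256)             # single-channel infrared image
    print(DualBranchSketch()(img).shape)          # -> torch.Size([1, 1, 256, 256])
```

In this reading, the high-frequency residual carries the boundary cues of small targets while the smoothed component carries global context, which is why the two are routed to separate branches before fusion.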