Diffusion Model for Infrared Small Target Detection

TU Chenhao, YE Wenya, DU Nini, ZHENG Binhao, XU Sheng

Citation: TU Chenhao, YE Wenya, DU Nini, ZHENG Binhao, XU Sheng. Diffusion Model for Infrared Small Target Detection[J]. Infrared Technology, 2025, 47(6): 757-764.


Funding:

Ningbo Transportation Science and Technology Plan Project, No. 202216

Ningbo Science and Technology Plan Project, No. 2024S076

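Details
    About the authors:

    TU Chenhao (b. 2002), male, undergraduate; main research interest: road and bridge engineering. E-mail: 956250283@qq.com

    Corresponding author:

    YE Wenya (b. 1974), female, senior engineer; main research interests: image detection and intelligent road systems. E-mail: 763425011@qq.com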

  • CLC number: TP753


  • Abstract:

    Infrared small-target detection is a complex and critical computer vision task that faces several challenges: tiny target sizes, low contrast, strong background noise, and scarce data. These factors significantly limit detection accuracy and real-time performance. Most existing deep learning-based algorithms follow the segmentation paradigm and generate segmentation masks with deep encoder-decoder networks; because their feature representation and learning capabilities are insufficient, their detection accuracy is low in complex scenes. Inspired by the success of diffusion models in artificial intelligence, this paper reframes infrared small-target detection as a generative task and proposes a conditional denoising network, diff-ISTD. Through progressive denoising and reconstruction, the network exploits the deep statistical properties of the image, which allows it to distinguish and capture weak, easily confused small-target features more precisely. Specifically, the network comprises a conditional branch, which extracts prior knowledge from the infrared image, and a denoising branch, which refines the noisy mask. In addition, a parallel dual-dimensional self-attention (PDSA) module is designed to combine spatial-wise and channel-wise analysis, strengthening the model's grasp of global structure and local detail and mitigating the target blurring caused by limited resolution and environmental variability. Experiments show that, under extreme detection conditions, diff-ISTD outperforms current state-of-the-art segmentation methods in both accuracy and detection efficiency, offering a promising direction for infrared small-target detection.
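To make the "noisy mask" idea concrete, the following is a minimal sketch of the standard DDPM forward (noising) step applied to a binary target mask, together with the closed-form recovery of the clean mask from a noise estimate. It is an illustration only, not the diff-ISTD implementation: the linear schedule, the 1000 steps, the [-1, 1] mask scaling, and all function names are assumptions, and the true noise stands in for the output of a trained conditional denoiser.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed choice, as in the original DDPM setup)."""
    return np.linspace(beta_start, beta_end, num_steps)

def q_sample(x0, t, alpha_bar, noise):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

def predict_x0(x_t, t, alpha_bar, eps_hat):
    """Invert the forward step given a noise estimate eps_hat."""
    return (x_t - np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha_bar[t])

rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal-retention factor

# Toy 16x16 ground-truth mask with a 2x2 "small target", scaled to [-1, 1].
mask = np.zeros((16, 16))
mask[7:9, 7:9] = 1.0
x0 = 2.0 * mask - 1.0

t = 500
noise = rng.standard_normal(x0.shape)
x_t = q_sample(x0, t, alpha_bar, noise)

# A trained denoiser, conditioned on the infrared image, would supply eps_hat;
# here the true noise is used as a stand-in to show the algebra is invertible.
x0_rec = predict_x0(x_t, t, alpha_bar, noise)
recovered_mask = (x0_rec > 0).astype(np.uint8)
```

In a conditional setup of this kind, the denoiser would receive both `x_t` and features of the infrared image, so that repeated denoising steps refine the mask toward the target locations.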

  • Figure 1. Illustration of the forward diffusion and reverse denoising processes.

    Figure 2. Overall framework of diff-ISTD. PDSA denotes the parallel dual-dimensional self-attention module, Res denotes the residual block, and MCA denotes the multi-head cross-attention module.

    Figure 3. Structure of the diff-ISTD submodules: (a) the PDSA module; (b) the spatial-wise self-attention module; (c) the channel-wise self-attention module.

    Note: The red T denotes the matrix transpose operation.

    Figure 4. Infrared image detection results of different algorithms on the NUAA-SIRST dataset.

    Figure 5. ROC curves of different algorithms on the NUAA-SIRST (solid lines) and IRSTD-1k (dashed lines) datasets.
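The difference between the two attention dimensions named in Figure 3 can be sketched as follows. This is a simplified illustration under stated assumptions, not the PDSA module itself: the learned query/key/value projections are omitted (identity projections), a single head is used, and the two branches are fused by summation with a residual connection; the function names are hypothetical. Spatial-wise attention builds an N x N map over pixel positions, while channel-wise attention transposes the token matrix and builds a C x C map over channels.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_self_attention(tokens):
    """tokens: (N, C) with N = H*W. Attention map is (N, N) over positions."""
    n, c = tokens.shape
    attn = softmax(tokens @ tokens.T / np.sqrt(c), axis=-1)
    return attn @ tokens  # (N, C)

def channel_self_attention(tokens):
    """Channels act as tokens: attention map is (C, C), built from tokens.T."""
    n, c = tokens.shape
    attn = softmax(tokens.T @ tokens / np.sqrt(n), axis=-1)
    return (attn @ tokens.T).T  # back to (N, C)

def parallel_dual_attention(feature):
    """Run both branches in parallel on an (H, W, C) map and sum them
    with a residual connection (the fusion rule is an assumption)."""
    h, w, c = feature.shape
    tokens = feature.reshape(h * w, c)
    fused = tokens + spatial_self_attention(tokens) + channel_self_attention(tokens)
    return fused.reshape(h, w, c)

rng = np.random.default_rng(0)
feature = rng.standard_normal((8, 8, 4))
out = parallel_dual_attention(feature)
```

The channel branch costs O(N·C²) rather than O(N²·C), which is one reason transposed attention is attractive for high-resolution feature maps.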

Table 1. Ablation study results

| Experiment | Model | IoU | nIoU | Pd | Fa |
|---|---|---|---|---|---|
| 1 | w/o diffusion+PDSA | 70.04 | 69.38 | 93.26 | 39.45 |
| 2 | w/o diffusion | 71.62 | 70.83 | 95.42 | 32.02 |
| 3 | w/o PDSA | 72.87 | 70.65 | 96.72 | 28.75 |
| 4 | diffusion+SE attention | 72.94 | 70.41 | 96.88 | 32.42 |
| 5 | diffusion+spatial-wise transformer | 73.16 | 71.06 | 97.45 | 28.07 |
| 6 | diffusion+channel-wise transformer | 72.89 | 70.82 | 97.78 | 26.92 |
| 7 | diffusion+spatial & channel-wise transformer | 74.01 | 71.98 | 98.33 | 25.53 |
| 8 | diff-ISTD | 74.45 | 72.81 | 98.52 | 20.13 |

Table 2. Comparison of different algorithms on the NUAA-SIRST and IRSTD-1k datasets

| Method | NUAA-SIRST IoU | NUAA-SIRST nIoU | NUAA-SIRST Pd | NUAA-SIRST Fa | IRSTD-1k IoU | IRSTD-1k nIoU | IRSTD-1k Pd | IRSTD-1k Fa | Params |
|---|---|---|---|---|---|---|---|---|---|
| WSLCM | 4.41 | 33.82 | 91.74 | 22593 | 3.45 | 0.68 | 72.44 | 6619 | - |
| TLLCM | 3.51 | 21.75 | 92.66 | 26498 | 3.31 | 0.78 | 77.39 | 6738 | - |
| IPI | 2.62 | 4.16 | 84.40 | 203.07 | 27.92 | 20.46 | 81.37 | 16.18 | - |
| NRAM | 45.68 | 55.49 | 85.32 | 161.15 | 15.25 | 9.90 | 70.68 | 16.93 | - |
| PSTNN | 51.95 | 62.66 | 82.57 | 394.29 | 24.57 | 17.93 | 71.99 | 35.25 | - |
| MSLSTIPT | 20.21 | 24.74 | 82.57 | 259.75 | 11.43 | 5.93 | 79.03 | 1524 | - |
| MDvsFA | 45.28 | 48.16 | 76.15 | 166.07 | 42.45 | 44.31 | 83.21 | 78.54 | 3.77 M |
| ACM | 67.96 | 71.05 | 97.25 | 72.92 | 58.64 | 56.94 | 90.42 | 23.57 | 0.39 M |
| ALCNet | 73.43 | 71.44 | 97.84 | 25.68 | 61.02 | 57.98 | 91.24 | 26.53 | 0.38 M |
| AGPCNet | 74.26 | 70.05 | 98.16 | 20.56 | 61.53 | 58.32 | 92.02 | 24.43 | 12.36 M |
| diff-ISTD | 74.45 | 72.81 | 98.52 | 20.13 | 62.65 | 60.18 | 93.53 | 21.03 | 0.29 M |
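For readers unfamiliar with the column headers, the sketch below shows one way the pixel-level metrics could be computed from binary masks. The definitions are assumptions based on common usage in the infrared small-target literature, since this page does not define them: IoU accumulates intersection and union over the whole dataset, nIoU averages the per-image IoU, and Fa is the fraction of false-positive pixels among all pixels. Pd (the fraction of targets detected) needs connected-component matching and is omitted here. All function names are hypothetical.

```python
import numpy as np

def image_iou(pred, gt):
    """IoU of one binary prediction against its ground-truth mask."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return np.logical_and(pred, gt).sum() / union

def dataset_iou(preds, gts):
    """IoU with intersection and union accumulated over all images."""
    inter = sum(np.logical_and(p.astype(bool), g.astype(bool)).sum()
                for p, g in zip(preds, gts))
    union = sum(np.logical_or(p.astype(bool), g.astype(bool)).sum()
                for p, g in zip(preds, gts))
    return inter / union

def normalized_iou(preds, gts):
    """nIoU: mean of the per-image IoU values."""
    return float(np.mean([image_iou(p, g) for p, g in zip(preds, gts)]))

def false_alarm_rate(preds, gts):
    """Fa: false-positive pixels divided by the total number of pixels."""
    fp = sum(np.logical_and(p.astype(bool), ~g.astype(bool)).sum()
             for p, g in zip(preds, gts))
    total = sum(g.size for g in gts)
    return fp / total

# Two toy 8x8 images, each containing one 2x2 target.
gt1 = np.zeros((8, 8), dtype=np.uint8); gt1[3:5, 3:5] = 1
pred1 = np.zeros((8, 8), dtype=np.uint8); pred1[3:5, 3:6] = 1  # 2 extra pixels
gt2 = np.zeros((8, 8), dtype=np.uint8); gt2[1:3, 1:3] = 1
pred2 = gt2.copy()                                             # exact match

preds, gts = [pred1, pred2], [gt1, gt2]
iou_value = dataset_iou(preds, gts)       # 8 / 10
niou_value = normalized_iou(preds, gts)   # (4/6 + 1) / 2
fa_value = false_alarm_rate(preds, gts)   # 2 / 128
```

The toy case shows why IoU and nIoU can diverge: the dataset-level IoU is dominated by images with larger targets, whereas nIoU weights every image equally.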
Publication history
  • Received: 2024-05-23
  • Revised: 2024-06-14
  • Published online: 2025-06-26
  • Issue date: 2025-06-19
