RGB-T Salient Object Detection: A Survey

WU Jintao, WANG Anzhi, REN Chunhong

Citation: WU Jintao, WANG Anzhi, REN Chunhong. RGB-T Salient Object Detection: A Survey[J]. Infrared Technology, 2025, 47(1): 1-9.

Funding:

National Natural Science Foundation of China (Regional Program, No. 62162013)

Guizhou Normal University Academic New Seedling Fund (Qianshixinmiao [2022] No. 30)

About the authors:

    WU Jintao (2000-), male, born in Ningbo, Zhejiang; master's student; research interest: salient object detection. E-mail: bigdatawujitnao@163.com

    Corresponding author:

    WANG Anzhi (1986-), male, born in Tongren, Guizhou; associate professor; research interests: artificial intelligence and computer vision. E-mail: cvmll6102@163.com

  • CLC number: TP391


  • Abstract:

    In addition to RGB images, thermal IR images can be used to extract salient information, which is crucial for salient object detection. With the development and popularization of IR sensing equipment, thermal IR images have become readily available, and RGB-T salient object detection has become a popular research topic. However, there is currently a lack of comprehensive surveys on the existing methods. First, we briefly introduce machine learning-based RGB-T salient object detection methods and then focus on two types of deep learning methods based on CNNs and vision transformers. Subsequently, relevant datasets and evaluation metrics are introduced, and both qualitative and quantitative comparative analyses are conducted on representative methods using these datasets. Finally, challenges and future development directions for RGB-T salient object detection are summarized and discussed.
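    To make the survey's taxonomy concrete, the sketch below illustrates the two-stream encoder-fusion-decoder pattern that most CNN-based RGB-T methods share: separate encoders per modality, a cross-modal fusion step, and a decoder that predicts a pixel-wise saliency map. This is a minimal illustrative assumption, not a reproduction of any specific method surveyed here; all layer choices and sizes are placeholders.

```python
# Minimal sketch of the two-stream encoder-fusion-decoder pattern common to
# CNN-based RGB-T SOD methods. All layer choices are illustrative assumptions,
# not any specific published architecture.
import torch
import torch.nn as nn

class TwoStreamRGBTSOD(nn.Module):
    def __init__(self):
        super().__init__()
        # Separate shallow encoders for the RGB and thermal modalities.
        self.rgb_encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.t_encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        # Cross-modal fusion: here simple concatenation + 1x1 conv; published
        # methods replace this with attention or interaction modules.
        self.fuse = nn.Conv2d(128, 64, 1)
        # Decoder upsamples back to input resolution and predicts a saliency map.
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False),
            nn.Conv2d(64, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, rgb, thermal):
        feats = torch.cat([self.rgb_encoder(rgb), self.t_encoder(thermal)], dim=1)
        return self.decoder(self.fuse(feats))

model = TwoStreamRGBTSOD()
saliency = model(torch.rand(1, 3, 224, 224), torch.rand(1, 1, 224, 224))
print(saliency.shape)  # torch.Size([1, 1, 224, 224])
```

    The ViT-based methods covered later follow the same overall pattern but swap the convolutional encoders for transformer backbones (Swin Transformer, PVTv2, HRFormer) and fuse features via attention.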

  • Figure 1. Classification of RGB-T salient object detection

    Figure 2. Development history of RGB-T salient object detection

    Figure 3. Inputs of different quality and their saliency predictions

    Figure 4. Visual comparison of RGB-T salient object detection methods

    Figure 5. Challenges faced by RGB-T salient object detection

    Table 1. RGB-T salient object detection datasets

    Name | Year | Image pairs | Camera equipment | Disadvantages
    VT821 | 2018 | 821 | FLIR A310, SONY TD-2073 | 1. Simple scenes lacking complexity and variety. 2. The cameras use different parameters when capturing the RGB and thermal images. 3. Extra whitespace is introduced when aligning the images.
    VT1000 | 2019 | 1000 | FLIR SC620 | 1. Potential errors, as the images are aligned manually. 2. Limited scene complexity and diversity.
    VT5000 | 2020 | 5000 | FLIR T640, FLIR T610 | 1. Images are affected by thermal crossover, making detection challenging.
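
    As a quick illustration of how such a dataset is typically consumed, the Python sketch below pairs RGB, thermal, and ground-truth images by filename. The RGB/T/GT directory layout and the file extensions are assumptions made for illustration only; check the actual VT821/VT1000/VT5000 releases for their real structure.

```python
# Sketch of pairing RGB, thermal, and ground-truth images for an RGB-T SOD
# dataset. The RGB/T/GT layout and extensions are illustrative assumptions.
from pathlib import Path

def load_pairs(root):
    root = Path(root)
    pairs = []
    for rgb_path in sorted((root / "RGB").glob("*.jpg")):
        t_path = root / "T" / (rgb_path.stem + ".jpg")    # thermal counterpart
        gt_path = root / "GT" / (rgb_path.stem + ".png")  # binary saliency mask
        if t_path.exists() and gt_path.exists():
            pairs.append((rgb_path, t_path, gt_path))
    return pairs

pairs = load_pairs("VT5000")  # hypothetical local path
print(f"{len(pairs)} aligned RGB-T pairs found")
```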

    Table 2. Quantitative comparison of machine learning-based RGB-T salient object detection methods

    Algorithm | VT821: S↑ F↑ E↑ MAE↓ | VT1000: S↑ F↑ E↑ MAE↓ | VT5000: S↑ F↑ E↑ MAE↓
    MTMR[7] | 0.725 0.662 0.815 0.108 | 0.706 0.715 0.836 0.119 | 0.680 0.595 0.795 0.114
    M3S-NIR[10] | 0.723 0.734 0.859 0.140 | 0.726 0.717 0.827 0.145 | 0.652 0.575 0.780 0.168
    LTCR[12] | 0.762 0.737 0.854 0.088 | 0.799 0.794 0.872 0.084 | - - - -
    MGFL[11] | 0.782 0.725 0.841 0.071 | 0.820 0.801 0.882 0.066 | 0.751 0.661 0.817 0.085
    Note: ↑ means higher is better and ↓ means lower is better. Bold and underline indicate the best and second-best results, respectively.
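
    For reference, the MAE and F-measure columns follow the standard definitions below, where P is the predicted saliency map, G the binary ground truth, N the number of pixels, and β² = 0.3 is the usual convention in the SOD literature; the S-measure and E-measure are the structural and enhanced-alignment measures defined in [39] and [40].

```latex
% Standard SOD metric definitions (MAE and F-measure).
\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \lvert P_i - G_i \rvert ,
\qquad
F_{\beta} = \frac{(1+\beta^{2})\,\mathrm{Precision}\cdot\mathrm{Recall}}
                 {\beta^{2}\,\mathrm{Precision} + \mathrm{Recall}},
\quad \beta^{2} = 0.3 .
```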

    Table 3. Quantitative comparison of deep learning-based RGB-T salient object detection methods

    Algorithm | Backbone | VT821: S↑ F↑ E↑ MAE↓ | VT1000: S↑ F↑ E↑ MAE↓ | VT5000: S↑ F↑ E↑ MAE↓
    CNN-based:
    FMCF[18] | VGG16 | 0.760 0.640 0.796 0.080 | 0.873 0.823 0.921 0.037 | 0.814 0.734 0.864 0.055
    SGDL[15] | VGG19 | 0.765 0.730 0.847 0.085 | 0.787 0.764 0.856 0.090 | 0.750 0.672 0.824 0.089
    ADFNet[21] | VGG16 | 0.810 0.716 0.842 0.077 | 0.910 0.847 0.921 0.034 | 0.863 0.778 0.891 0.048
    MIDD[22] | VGG16 | 0.871 0.804 0.895 0.045 | 0.915 0.882 0.933 0.027 | 0.867 0.801 0.897 0.043
    CGFNet[23] | VGG16 | 0.881 0.845 0.912 0.038 | 0.923 0.906 0.944 0.023 | 0.883 0.851 0.922 0.035
    CGMDRNet[25] | Res2Net-50 | 0.894 0.840 0.920 0.035 | 0.931 0.893 0.940 0.020 | 0.896 0.846 0.928 0.032
    TNet[27] | ResNet-50 | 0.898 0.841 0.919 0.030 | 0.928 0.889 0.937 0.021 | 0.894 0.847 0.927 0.033
    MIA_DPD[28] | ResNet-50 | 0.844 - 0.850 0.070 | 0.924 - 0.926 0.025 | 0.879 - 0.893 0.040
    MMNet[29] | ResNet-50 | 0.875 0.798 0.893 0.040 | 0.917 0.863 0.924 0.027 | 0.864 0.785 0.890 0.043
    CAVER[30] | ResNet-50 | 0.891 0.839 0.919 0.033 | 0.935 0.903 0.945 0.018 | 0.891 0.842 0.930 0.032
    CSRNet[31] | ESPNetv2 | 0.885 0.830 0.908 0.038 | 0.918 0.877 0.925 0.024 | 0.868 0.810 0.905 0.042
    ViT-based:
    SwinNet[35] | Swin Transformer | 0.904 0.847 0.926 0.030 | 0.938 0.896 0.947 0.018 | 0.912 0.865 0.942 0.026
    HRTransNet[37] | HRFormer | 0.906 0.853 0.929 0.026 | 0.938 0.900 0.945 0.017 | 0.912 0.871 0.945 0.025
    MITF-Net[36] | PVTv2 | 0.905 0.853 0.927 0.027 | 0.938 0.906 0.949 0.016 | 0.910 0.870 0.943 0.025
    Note: ↑ means higher is better and ↓ means lower is better. Bold and underline indicate the best and second-best results, respectively.
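
    A minimal sketch of how the MAE and F-measure columns are typically computed per image is given below. The adaptive threshold of twice the mean saliency used here is one common protocol, not necessarily the one used by every paper in the table; some report the maximum F-measure over all thresholds instead.

```python
# Sketch of per-image MAE and F-measure computation for a predicted saliency
# map and a binary ground-truth mask, both with values in [0, 1].
import numpy as np

def mae(pred, gt):
    return np.abs(pred - gt).mean()

def f_measure(pred, gt, beta2=0.3):
    thresh = min(2 * pred.mean(), 1.0)          # adaptive threshold (one common choice)
    binary = pred >= thresh
    tp = np.logical_and(binary, gt > 0.5).sum()
    precision = tp / (binary.sum() + 1e-8)
    recall = tp / ((gt > 0.5).sum() + 1e-8)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-8)

pred = np.random.rand(224, 224)                       # stand-in prediction
gt = (np.random.rand(224, 224) > 0.5).astype(float)   # stand-in ground truth
print(f"MAE={mae(pred, gt):.3f}, F={f_measure(pred, gt):.3f}")
```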
References

[1] XU H, ZHANG H, MA J Y. Classification saliency-based rule for visible and infrared image fusion[J]. IEEE Transactions on Computational Imaging, 2021, 7: 824-836. DOI: 10.1109/TCI.2021.3100986
[2] LI G Y, WANG Y K, LIU Z, et al. RGB-T semantic segmentation with location, activation, and sharpening[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(3): 1223-1235. DOI: 10.1109/TCSVT.2022.3208833
[3] HOU Yiwei, LI Linhan, WANG Yan. Intelligent equipment object recognition based on improved YOLO network guided by infrared saliency detection[J]. Infrared Technology, 2020, 42(7): 644-650. http://hwjs.nvir.cn/article/id/hwjs202007007
[4] Itti L, Koch C, Niebur E. A model of saliency-based visual attention for rapid scene analysis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, 20(11): 1254-1259. DOI: 10.1109/34.730558
[5] LI C L, CHENG H, HU S Y, et al. Learning collaborative sparse representation for grayscale-thermal tracking[J]. IEEE Transactions on Image Processing, 2016, 25(12): 5743-5756. DOI: 10.1109/TIP.2016.2614135
[6] ZHANG Jun, ZHANG Peng, ZHANG Zheng, et al. Similar HED-Net for salient human detection in thermal infrared images[J]. Infrared Technology, 2023, 45(6): 649-657. http://hwjs.nvir.cn/article/id/bc2b522e-24dc-4229-8ed3-0b973874e0f4
[7] WANG G Z, LI C L, MA Y P, et al. RGB-T saliency detection benchmark: dataset, baselines, analysis and a novel approach[C]//IGTA 2018: The 13th Academic Conference on Image Graphics Technology and Application, 2018: 359-369.
[8] MA Y, SUN D, MENG Q, et al. Learning multiscale deep features and SVM regressors for adaptive RGB-T saliency detection[C]//ISCID 2017: The 10th International Symposium on Computational Intelligence and Design, 2017: 389-392.
[9] ZHOU D Y, Weston J, Gretton A, et al. Ranking on data manifolds[C]//NIPS 2003: Advances in Neural Information Processing Systems, 2003: 169-176.
[10] TU Z Z, XIA T, LI C L, et al. M3S-NIR: multi-modal multi-scale noise-insensitive ranking for RGB-T saliency detection[C]//MIPR 2019: 2019 IEEE Conference on Multimedia Information Processing and Retrieval, 2019: 141-146.
[11] HUANG L M, SONG K C, WANG J, et al. Multi-graph fusion and learning for RGBT image saliency detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(3): 1366-1377. DOI: 10.1109/TCSVT.2021.3069812
[12] HUANG L M, SONG K C, GONG A J, et al. RGB-T saliency detection via low-rank tensor learning and unified collaborative ranking[J]. IEEE Signal Processing Letters, 2020, 27: 1585-1589. DOI: 10.1109/LSP.2020.3020735
[13] ZHANG D M, JIN G Q, DAI F, et al. Salient object detection based on deep fusion of hand-crafted features[J]. Chinese Journal of Computers, 2019, 42(9): 2076-2086.
[14] Sandler M, Howard A, ZHU M L, et al. MobileNetV2: inverted residuals and linear bottlenecks[C]//CVPR 2018: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018: 4510-4520.
[15] TU Z Z, XIA T, LI C L, et al. RGB-T image saliency detection via collaborative graph learning[J]. IEEE Transactions on Multimedia, 2020, 22(1): 160-173. DOI: 10.1109/TMM.2019.2924578
[16] PANG Y, WU H, WU C D. Cross-modal co-feedback cellular automata for RGB-T saliency detection[J]. Pattern Recognition, 2023, 135: 109138.
[17] LIU Z Y, HUANG X S, ZHANG G H, et al. Scribble-supervised RGB-T salient object detection[C]//ICME 2023: Proceedings of the IEEE International Conference on Multimedia and Expo, 2023: 2369-2374.
[18] ZHANG Q, HUANG N C, YAO L, et al. RGB-T salient object detection via fusing multi-level CNN features[J]. IEEE Transactions on Image Processing, 2020, 29: 3321-3335. DOI: 10.1109/TIP.2019.2959253
[19] ZHANG Q, HUANG N C, XIAO T, et al. Revisiting feature fusion for RGB-T salient object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 31(5): 1804-1818.
[20] BI H B, WU R W, LIU Z Q, et al. PSNet: parallel symmetric network for RGB-T salient object detection[J]. Neurocomputing, 2022, 511: 410-425. DOI: 10.1016/j.neucom.2022.09.052
[21] TU Z Z, MA Y, LI Z, et al. RGBT salient object detection: a large-scale dataset and benchmark[J]. IEEE Transactions on Multimedia, 2022, 25: 4163-4176.
[22] TU Z Z, LI Z, LI C L, et al. Multi-interactive dual-decoder for RGB-thermal salient object detection[J]. IEEE Transactions on Image Processing, 2021, 30: 5678-5691. DOI: 10.1109/TIP.2021.3087412
[23] WANG J, SONG K C, BAO Y Q, et al. CGFNet: cross-guided fusion network for RGB-T salient object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(5): 2949-2961. DOI: 10.1109/TCSVT.2021.3099120
[24] CHEN Q, LIU Z, ZHANG Y, et al. RGB-D salient object detection via 3D convolutional neural networks[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2022: 1063-1071.
[25] CHEN G, SHAO F, CHAI X L, et al. CGMDRNet: cross-guided modality difference reduction network for RGB-T salient object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(9): 6308-6323. DOI: 10.1109/TCSVT.2022.3166914
[26] LIAO G B, GAO W, LI G, et al. Cross-collaborative fusion-encoder network for robust RGB-thermal salient object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(11): 7646-7661. DOI: 10.1109/TCSVT.2022.3184840
[27] CONG R M, ZHANG K P, ZHANG C, et al. Does thermal really always matter for RGB-T salient object detection?[J]. IEEE Transactions on Multimedia, 2022, 25: 1-12.
[28] LIANG Y H, QIN G H, SUN M H, et al. Multi-modal interactive attention and dual progressive decoding network for RGB-D/T salient object detection[J]. Neurocomputing, 2022, 490: 132-145. DOI: 10.1016/j.neucom.2022.03.029
[29] GAO W, LIAO G B, MA S W, et al. Unified information fusion network for multi-modal RGB-D and RGB-T salient object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(4): 2091-2106. DOI: 10.1109/TCSVT.2021.3082939
[30] PANG Y W, ZHAO X Q, ZHANG L H, et al. CAVER: cross-modal view-mixed transformer for bi-modal salient object detection[J]. IEEE Transactions on Image Processing, 2023, 32: 892-904.
[31] ZHOU W J, GUO Q L, LEI J S, et al. ECFFNet: effective and consistent feature fusion network for RGB-T salient object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(3): 1224-1235. DOI: 10.1109/TCSVT.2021.3077058
[32] ZHOU W J, ZHU Y, LEI J S, et al. LSNet: lightweight spatial boosting network for detecting salient objects in RGB-thermal images[J]. IEEE Transactions on Image Processing, 2023, 32: 1329-1340. DOI: 10.1109/TIP.2023.3242775
[33] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]//NIPS 2017: Advances in Neural Information Processing Systems, 2017: 6000-6010.
[34] WANG W H, XIE E Z, LI X, et al. PVT v2: improved baselines with pyramid vision transformer[J]. Computational Visual Media, 2022, 8(3): 415-424.
[35] LIU Z Y, TAN Y C, HE Q, et al. SwinNet: swin transformer drives edge-aware RGB-D and RGB-T salient object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(7): 4486-4497. DOI: 10.1109/TCSVT.2021.3127149
[36] CHEN G, SHAO F, CHAI X L, et al. Modality-induced transfer-fusion network for RGB-D and RGB-T salient object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(4): 1787-1801.
[37] TANG B, LIU Z Y, TAN Y C, et al. HRTransNet: HRFormer-driven two-modality salient object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(2): 728-742.
[38] YUAN Y H, FU R, HUANG L, et al. HRFormer: high-resolution vision transformer for dense prediction[C]//NIPS 2021: Advances in Neural Information Processing Systems, 2021: 7281-7293.
[39] FAN D P, CHENG M M, LIU Y, et al. Structure-measure: a new way to evaluate foreground maps[C]//ICCV 2017: Proceedings of the IEEE International Conference on Computer Vision, 2017: 4558-4567.
[40] FAN D P, GONG C, CAO Y, et al. Enhanced-alignment measure for binary foreground map evaluation[C]//IJCAI 2018: The 27th International Joint Conference on Artificial Intelligence, 2018: 698-704.
[41] YAN Q, XU L, SHI J P, et al. Hierarchical saliency detection[C]//CVPR 2013: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013: 1155-1162.
[42] LI Y, HOU X D, Koch C, et al. The secrets of salient object segmentation[C]//CVPR 2014: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014: 280-287.

Publication history
  • Received: 2023-10-31
  • Revised: 2024-01-18
  • Published: 2025-01-19
