RGB-T Salient Object Detection: A Survey
Abstract: In addition to RGB images, thermal infrared (IR) images can be used to extract salient information that is crucial for salient object detection. With the development and popularization of IR sensing equipment, thermal IR images have become readily available, and RGB-T salient object detection has become a popular research topic. However, a comprehensive survey of existing methods is still lacking. First, we briefly introduce machine learning-based RGB-T salient object detection methods, and then focus on two types of deep learning-based methods: those based on convolutional neural networks (CNNs) and those based on vision transformers (ViTs). Subsequently, the relevant datasets and evaluation metrics are introduced, and qualitative and quantitative comparative analyses of representative methods are conducted on these datasets. Finally, the challenges and future development directions of RGB-T salient object detection are summarized and discussed.

Keywords: salient object detection; thermal infrared image; RGB-T salient object detection; deep learning
Table 1 The RGB-T salient object detection datasets

Name   | Year | Scale | Camera equipment        | Disadvantages
VT821  | 2018 | 821   | FLIR A310, SONY TD-2073 | 1. Simple scenes that lack complexity and variety. 2. The camera uses different parameters when capturing RGB and thermal images. 3. Additional whitespace is introduced when aligning the images.
VT1000 | 2019 | 1000  | FLIR SC620              | 1. Potential errors exist because the images are aligned manually. 2. Limited scene complexity and diversity.
VT5000 | 2020 | 5000  | FLIR T640, FLIR T610    | 1. Images are affected by thermal crossover, which makes detection challenging.
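Each of these benchmarks provides spatially aligned RGB/thermal image pairs with pixel-level ground-truth masks, so evaluation code only needs to match files across the three modalities by name. The Python sketch below illustrates this pairing; the folder names ("RGB", "T", "GT"), the .png mask extension, and the load_pairs helper are assumptions made for illustration, not the datasets' documented layout, so adjust them to the copy you download.

```python
from pathlib import Path

import numpy as np
from PIL import Image


def load_pairs(root: str):
    """Yield (name, rgb, thermal, gt) arrays for one dataset split.

    Hypothetical layout: <root>/RGB, <root>/T, and <root>/GT hold
    files that share a stem across modalities.
    """
    root = Path(root)
    for rgb_path in sorted((root / "RGB").iterdir()):
        name = rgb_path.stem
        rgb = np.asarray(Image.open(rgb_path).convert("RGB"))          # H x W x 3
        thermal = np.asarray(Image.open(root / "T" / rgb_path.name).convert("L"))      # H x W
        gt = np.asarray(Image.open(root / "GT" / f"{name}.png").convert("L")) / 255.0  # mask in [0, 1]
        yield name, rgb, thermal, gt
```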
Table 2 Quantitative comparison of machine learning-based RGB-T salient object detection methods

            |          VT821          |          VT1000         |          VT5000
Algorithms  | S↑    F↑    E↑    MAE↓  | S↑    F↑    E↑    MAE↓  | S↑    F↑    E↑    MAE↓
MTMR[7]     | 0.725 0.662 0.815 0.108 | 0.706 0.715 0.836 0.119 | 0.680 0.595 0.795 0.114
M3S-NIR[10] | 0.723 0.734 0.859 0.140 | 0.726 0.717 0.827 0.145 | 0.652 0.575 0.780 0.168
LTCR[12]    | 0.762 0.737 0.854 0.088 | 0.799 0.794 0.872 0.084 | -     -     -     -
MGFL[11]    | 0.782 0.725 0.841 0.071 | 0.820 0.801 0.882 0.066 | 0.751 0.661 0.817 0.085

Note: ↑ indicates that larger values are better and ↓ indicates that smaller values are better; "-" marks results that were not reported. Bold and underline indicate the best and second-best results, respectively.
Table 3 Quantitative comparison of deep learning-based RGB-T salient object detection methods

               |                  |          VT821          |          VT1000         |          VT5000
Algorithms     | Backbone         | S↑    F↑    E↑    MAE↓  | S↑    F↑    E↑    MAE↓  | S↑    F↑    E↑    MAE↓

CNN-based:
FMCF[18]       | VGG16            | 0.760 0.640 0.796 0.080 | 0.873 0.823 0.921 0.037 | 0.814 0.734 0.864 0.055
SGDL[15]       | VGG19            | 0.765 0.730 0.847 0.085 | 0.787 0.764 0.856 0.090 | 0.750 0.672 0.824 0.089
ADFNet[21]     | VGG16            | 0.810 0.716 0.842 0.077 | 0.910 0.847 0.921 0.034 | 0.863 0.778 0.891 0.048
MIDD[22]       | VGG16            | 0.871 0.804 0.895 0.045 | 0.915 0.882 0.933 0.027 | 0.867 0.801 0.897 0.043
CGFNet[23]     | VGG16            | 0.881 0.845 0.912 0.038 | 0.923 0.906 0.944 0.023 | 0.883 0.851 0.922 0.035
CGMDRNet[25]   | Res2Net-50       | 0.894 0.840 0.920 0.035 | 0.931 0.893 0.940 0.020 | 0.896 0.846 0.928 0.032
TNet[27]       | ResNet-50        | 0.898 0.841 0.919 0.030 | 0.928 0.889 0.937 0.021 | 0.894 0.847 0.927 0.033
MIA-DPD[28]    | ResNet-50        | 0.844 -     0.850 0.070 | 0.924 -     0.926 0.025 | 0.879 -     0.893 0.040
MMNet[29]      | ResNet-50        | 0.875 0.798 0.893 0.040 | 0.917 0.863 0.924 0.027 | 0.864 0.785 0.890 0.043
CAVER[30]      | ResNet-50        | 0.891 0.839 0.919 0.033 | 0.935 0.903 0.945 0.018 | 0.891 0.842 0.930 0.032
CSRNet[31]     | ESPNetv2         | 0.885 0.830 0.908 0.038 | 0.918 0.877 0.925 0.024 | 0.868 0.810 0.905 0.042

ViT-based:
SwinNet[35]    | Swin Transformer | 0.904 0.847 0.926 0.030 | 0.938 0.896 0.947 0.018 | 0.912 0.865 0.942 0.026
HRTransNet[37] | HRFormer         | 0.906 0.853 0.929 0.026 | 0.938 0.900 0.945 0.017 | 0.912 0.871 0.945 0.025
MITF-Net[36]   | PVTv2            | 0.905 0.853 0.927 0.027 | 0.938 0.906 0.949 0.016 | 0.910 0.870 0.943 0.025

Note: ↑ indicates that larger values are better and ↓ indicates that smaller values are better; "-" marks results that were not reported. Bold and underline indicate the best and second-best results, respectively.
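For reference, the two simplest metrics reported in Tables 2 and 3 can be computed as below. This is a minimal sketch assuming predictions and masks are float arrays in [0, 1]; the F-measure uses the common adaptive-threshold convention (β² = 0.3), although some papers instead report the maximum or mean F-measure over all thresholds, and the S-measure [39] and E-measure [40] involve structural and alignment terms that are not reproduced here.

```python
import numpy as np


def mae(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute error between a saliency map and the ground truth,
    both given as float arrays in [0, 1]; lower is better."""
    return float(np.mean(np.abs(pred - gt)))


def adaptive_f_measure(pred: np.ndarray, gt: np.ndarray, beta2: float = 0.3) -> float:
    """F-measure with the widely used adaptive threshold (twice the mean
    saliency value, clipped to 1) and beta^2 = 0.3; higher is better."""
    binary = pred >= min(2.0 * float(pred.mean()), 1.0)  # binarize the prediction
    gt_mask = gt > 0.5                                   # binarize the ground truth
    tp = float(np.logical_and(binary, gt_mask).sum())    # true positives
    precision = tp / (binary.sum() + 1e-8)
    recall = tp / (gt_mask.sum() + 1e-8)
    return (1.0 + beta2) * precision * recall / (beta2 * precision + recall + 1e-8)
```

Published numbers can therefore differ slightly between papers depending on which F-measure variant is reported and on how the saliency maps are resized and normalized before evaluation.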
References

[1] XU H, ZHANG H, MA J Y. Classification saliency-based rule for visible and infrared image fusion[J]. IEEE Transactions on Computational Imaging, 2021, 7: 824-836. DOI: 10.1109/TCI.2021.3100986
[2] LI G Y, WANG Y K, LIU Z, et al. RGB-T semantic segmentation with location, activation, and sharpening [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(3): 1223-1235. DOI: 10.1109/TCSVT.2022.3208833
[3] HOU Y W, LI L H, WANG Y. Intelligent equipment object recognition based on improved YOLO network guided by infrared saliency detection[J]. Infrared Technology, 2020, 42(7): 644-650. http://hwjs.nvir.cn/article/id/hwjs202007007
[4] ITTI L, KOCH C, NIEBUR E. A model of saliency-based visual attention for rapid scene analysis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, 20(11): 1254-1259. DOI: 10.1109/34.730558
[5] LI C L, CHENG H, HU S Y, et al. Learning collaborative sparse representation for grayscale-thermal tracking[J]. IEEE Transactions on Image Processing, 2016, 25(12): 5743-5756. DOI: 10.1109/TIP.2016.2614135
[6] ZHANG J, ZHANG P, ZHANG Z, et al. Similar HED-Net for salient human detection in thermal infrared images[J]. Infrared Technology, 2023, 45(6): 649-657. http://hwjs.nvir.cn/article/id/bc2b522e-24dc-4229-8ed3-0b973874e0f4
[7] WANG G Z, LI C L, MA Y P, et al. RGB-T saliency detection benchmark: dataset, baselines, analysis and a novel approach[C]//IGTA 2018: The 13th Academic Conference on Image Graphics Technology and Application, 2018: 359-369.
[8] MA Y, SUN D, MENG Q, et al. Learning multiscale deep features and SVM regressors for adaptive RGB-T saliency detection[C]//ISCID 2017: 2017 10th International Symposium on Computational Intelligence and Design, 2017: 389-392.
[9] ZHOU D Y, WESTON J, GRETTON A, et al. Ranking on data manifolds[C]//NIPS 2003: Advances in Neural Information Processing Systems, 2003: 169-176.
[10] TU Z Z, XIA T, LI C L, et al. M3S-NIR: multi-modal multi-scale noise-insensitive ranking for RGB-T saliency detection[C]// MIPR 2019: 2019 IEEE Conference on Multimedia Information Processing and Retrieval, 2019: 141-146.
[11] HUANG L M, SONG K C, WANG J, et al. Multi-graph fusion and learning for RGBT image saliency detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(3): 1366-1377. DOI: 10.1109/TCSVT.2021.3069812
[12] HUANG L M, SONG K C, GONG A J, et al. RGB-T saliency detection via low-rank tensor learning and unified collaborative ranking[J]. IEEE Signal Processing Letters, 2020, 27: 1585-1589. DOI: 10.1109/LSP.2020.3020735
[13] ZHANG D M, JIN G Q, DAI F, et al. Salient object detection based on deep fusion of hand-crafted features[J]. Chinese Journal of Computers, 2019, 42(9): 2076-2086.
[14] SANDLER M, HOWARD A, ZHU M L, et al. MobileNetV2: inverted residuals and linear bottlenecks[C]//CVPR 2018: Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018: 4510-4520.
[15] TU Z Z, XIA T, LI C L, et al. RGB-T image saliency detection via collaborative graph learning[J]. IEEE Transactions on Multimedia, 2020, 22(1): 160-173. DOI: 10.1109/TMM.2019.2924578
[16] PANG Y, WU H, WU C D. Cross-modal co-feedback cellular automata for RGB-T saliency detection[J]. Pattern Recognition, 2023, 135: 109138.
[17] LIU Z Y, HUANG X S, ZHANG G H, et al. Scribble-supervised RGB-T salient object detection[C]//ICME 2023: Proceedings of the IEEE International Conference on Multimedia and Expo, 2023: 2369-2374.
[18] ZHANG Q, HUANG N C, YAO L, et al. RGB-T salient object detection via fusing multi-level CNN features[J]. IEEE Transactions on Image Processing, 2020, 29: 3321-3335. DOI: 10.1109/TIP.2019.2959253
[19] ZHANG Q, HUANG N C, XIAO T, et al. Revisiting feature fusion for RGB-T salient object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 31(5): 1804-1818.
[20] BI H B, WU R W, LIU Z Q, et al. PSNet: parallel symmetric network for RGB-T salient object detection[J]. Neurocomputing, 2022, 511: 410-425. DOI: 10.1016/j.neucom.2022.09.052
[21] TU Z Z, MA Y, LI Z, et al. RGBT salient object detection: a large-scale dataset and benchmark[J]. IEEE Transactions on Multimedia, 2022, 25: 4163-4176.
[22] TU Z Z, LI Z, LI C L, et al. Multi-interactive dual-decoder for RGB-thermal salient object detection[J]. IEEE Transactions on Image Processing, 2021, 30: 5678-5691. DOI: 10.1109/TIP.2021.3087412
[23] WANG J, SONG K C, BAO Y Q, et al. CGFNet: cross-guided fusion network for RGB-T salient object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(5): 2949-2961. DOI: 10.1109/TCSVT.2021.3099120
[24] CHEN Q, LIU Z, ZHANG Y, et al. RGB-D salient object detection via 3D convolutional neural networks[C]//AAAI 2021: Proceedings of the AAAI Conference on Artificial Intelligence, 2021: 1063-1071.
[25] CHEN G, SHAO F, CHAI X L, et al. CGMDRNet: cross-guided modality difference reduction network for RGB-T salient object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(9): 6308-6323. DOI: 10.1109/TCSVT.2022.3166914
[26] LIAO G B, GAO W, LI G, et al. Cross-collaborative fusion-encoder network for robust rgb-thermal salient object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(11): 7646-7661. DOI: 10.1109/TCSVT.2022.3184840
[27] CONG R M, ZHANG K P, ZHANG C, et al. Does thermal really always matter for RGB-T salient object detection?[J]. IEEE Transactions on Multimedia, 2022, 25: 1-12.
[28] LIANG Y H, QIN G H, SUN M H, et al. Multi-modal interactive attention and dual progressive decoding network for RGB-D/T salient object detection[J]. Neurocomputing, 2022, 490: 132-145. DOI: 10.1016/j.neucom.2022.03.029
[29] GAO W, LIAO G B, MA S W, et al. Unified information fusion network for multi-modal RGB-D and RGB-T salient object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(4): 2091-2106. DOI: 10.1109/TCSVT.2021.3082939
[30] PANG Y W, ZHAO X Q, ZHANG L H, et al. CAVER: cross-modal view-mixed transformer for bi-modal salient object detection[J]. IEEE Transactions on Image Processing, 2023, 32: 892-904.
[31] ZHOU W J, GUO Q L, LEI J S, et al. ECFFNet: effective and consistent feature fusion network for RGB-T salient object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(3): 1224-1235. DOI: 10.1109/TCSVT.2021.3077058
[32] ZHOU W J, ZHU Y, LEI J S, et al. LSNet: lightweight spatial boosting network for detecting salient objects in RGB-thermal images[J]. IEEE Transactions on Image Processing, 2023, 32: 1329-1340. DOI: 10.1109/TIP.2023.3242775
[33] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//NIPS 2017: Advances in Neural Information Processing Systems, 2017: 6000-6010.
[34] WANG W H, XIE E Z, LI X, et al. PVTv2: improved baselines with pyramid vision transformer[J]. Computational Visual Media, 2022, 8(3): 415-424.
[35] LIU Z Y, TAN Y C, HE Q, et al. SwinNet: swin transformer drives edge-aware RGB-D and RGB-T salient object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(7): 4486-4497. DOI: 10.1109/TCSVT.2021.3127149
[36] CHEN G, SHAO F, CHAI X L, et al. Modality-induced transfer-fusion network for RGB-D and RGB-T salient object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(4): 1787-1801.
[37] TANG B, LIU Z Y, TAN Y C, et al. HRTransNet: HRFormer-driven two-modality salient object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(2): 728-742.
[38] YUAN Y H, FU R, HUANG L, et al. HRFormer: high-resolution vision transformer for dense prediction[C]//NIPS 2021: Advances in Neural Information Processing Systems, 2021: 7281-7293.
[39] FAN D P, CHENG M M, LIU Y, et al. Structure-measure: a new way to evaluate foreground maps[C]//ICCV 2017: Proceedings of the 2017 IEEE/CVF International Conference on Computer Vision, 2017: 4558-4567.
[40] FAN D P, GONG C, CAO Y, et al. Enhanced-alignment measure for binary foreground map evaluation[C]//IJCAI 2018: The 27th International Joint Conference on Artificial Intelligence, 2018: 698-704.
[41] YAN Q, XU L, SHI J P, et al. Hierarchical saliency detection[C]//CVPR 2013: Proceedings of the 2013 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2013: 1155-1162.
[42] LI Y, HOU X D, KOCH C, et al. The secrets of salient object segmentation[C]//CVPR 2014: Proceedings of the 2014 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2014: 280-287.