Two-Stream Residual Dilation Network Algorithm for Crowd Counting Based on RGB-T Images

YANG Peilong, CHEN Shuyue, YANG Shangyu, WANG Jiahong

Citation: YANG Peilong, CHEN Shuyue, YANG Shangyu, WANG Jiahong. Two-Stream Residual Dilation Network Algorithm for Crowd Counting Based on RGB-T Images[J]. Infrared Technology, 2023, 45(11): 1177-1186.


Funding:

Jiangsu Provincial Key Research and Development Program (BE2021012-5)

Details
    Author:

    YANG Peilong (1997-), male, M.S. candidate; main research interest: computer vision. E-mail: 2247291086@qq.com

    Corresponding author:

    CHEN Shuyue (1963-), male, professor; main research interests: computer vision and detection technology. E-mail: csyue2000@163.com

  • CLC number: TP391


  • Abstract: To address scale variation, uneven pedestrian distribution, and poor night-time imaging conditions in crowd counting, we propose a multimodal crowd counting algorithm based on RGB-Thermal (RGB-T) images, called the two-stream residual dilation network (TSRDNet). It consists of a front-end feature extraction network, multi-scale residual dilated convolution modules, and global attention modules. The front-end network extracts RGB and thermal features, the dilated convolution modules further extract pedestrian features at different scales, and the global attention modules establish dependencies among global features. In addition, a new multi-scale dissimilarity loss is introduced to improve the counting performance of the network. To evaluate the method, comparative experiments were conducted on the RGBT Crowd Counting (RGBT-CC) and DroneRGBT datasets. Compared with the cross-modal collaborative representation learning (CMCRL) algorithm on RGBT-CC, the grid average mean absolute error (GAME(0)) and root mean squared error (RMSE) of the proposed algorithm are reduced by 0.8 and 3.49, respectively; on DroneRGBT, they are reduced by 0.34 and 0.17 compared with the multimodal crowd counting network (MMCCN), indicating better counting performance.
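The abstract reports results in terms of GAME(l) and RMSE. As a reading aid, here is a minimal pure-Python sketch of these two metrics as conventionally defined for crowd counting; the toy density maps, grid handling, and function names are illustrative assumptions, not taken from the paper:

```python
import math

def game(pred, gt, level):
    """Grid Average Mean absolute Error for one image pair.

    Splits the two density maps into a 2**level x 2**level grid of
    cells, sums the predicted and ground-truth counts inside each
    cell, and adds up the absolute differences.  level = 0 reduces
    to the plain absolute error between the total counts.
    """
    h, w = len(pred), len(pred[0])
    n = 2 ** level
    err = 0.0
    for i in range(n):
        for j in range(n):
            r0, r1 = i * h // n, (i + 1) * h // n
            c0, c1 = j * w // n, (j + 1) * w // n
            p = sum(pred[r][c] for r in range(r0, r1) for c in range(c0, c1))
            g = sum(gt[r][c] for r in range(r0, r1) for c in range(c0, c1))
            err += abs(p - g)
    return err

def rmse(pred_counts, gt_counts):
    """Root mean squared error over per-image total counts."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred_counts, gt_counts))
                     / len(pred_counts))

# Toy 4x4 density maps: the totals agree (3 people each), but the
# predicted people sit in the wrong place in the top half.
pred = [[0.5, 0, 0, 0.5], [0, 0, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]
gt   = [[1.0, 0, 0, 0.0], [0, 0, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]

print(game(pred, gt, 0))  # 0.0 -- totals match, so GAME(0) misses the error
print(game(pred, gt, 1))  # 1.0 -- the 2x2 grid exposes the localization error
```

This is why GAME(1)-GAME(3) are reported alongside GAME(0) in Tables 1 and 2: finer grids penalize correct totals that are placed in the wrong regions.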
  • Figure 1. RGB images and thermal images under different illumination conditions

    Figure 2. Structure of the TSRDNet network. TSRDNet consists of two front-end networks, each built from the first 12 layers of VGG-19, four residual dilated convolution blocks (RDCB), four global attention modules, and a convolutional layer that regresses the density map

    Figure 3. Dilated convolution models with different dilation rates
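Figure 3 illustrates dilated convolutions with different dilation rates. Their effect on the receptive field follows the standard formula k + (k-1)(d-1); the sketch below illustrates that general property, not the exact RDCB configuration used in the paper (the layer stack shown is a hypothetical example):

```python
def effective_kernel(k, d):
    """A k x k convolution with dilation rate d spans
    k + (k - 1) * (d - 1) pixels along each axis."""
    return k + (k - 1) * (d - 1)

def stacked_receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions, given as
    (kernel_size, dilation) pairs; each layer adds k_eff - 1 pixels."""
    rf = 1
    for k, d in layers:
        rf += effective_kernel(k, d) - 1
    return rf

# A 3x3 kernel with dilation 2 covers a 5x5 window, with dilation 3 a 7x7:
print(effective_kernel(3, 2))  # 5
print(effective_kernel(3, 3))  # 7

# Three stacked 3x3 convolutions with dilation rates 1, 2, 3 see a
# 13x13 neighbourhood without any pooling or extra parameters:
print(stacked_receptive_field([(3, 1), (3, 2), (3, 3)]))  # 13
```

The design appeal is that the receptive field grows without losing spatial resolution, which is why dilated convolutions are a common choice for density-map regression (e.g. CSRNet [18]).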

    Figure 4. Structure of the global attention mechanism

    Figure 5. Structure of the channel attention sub-module

    Figure 6. Structure of the spatial attention sub-module
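Figures 5 and 6 show the channel and spatial attention sub-modules, but this excerpt does not give their equations. As a rough orientation only, here is a generic squeeze-and-excite style channel-attention sketch (global average pooling, a gating layer, sigmoid rescaling); the single linear layer is a stand-in assumption for whatever MLP the paper's module actually uses:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(feature_maps, weights, bias):
    """Generic squeeze-and-excite style channel attention.

    feature_maps: list of C channels, each an H x W nested list.
    weights:      C x C matrix of the gating layer; bias: length C.
    Returns the rescaled channels and the per-channel gate values.
    """
    # Squeeze: global average pooling turns each channel into a scalar.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
              for ch in feature_maps]
    # Excite: one linear layer plus a sigmoid produces a gate in (0, 1).
    gates = [sigmoid(sum(w * p for w, p in zip(wrow, pooled)) + b)
             for wrow, b in zip(weights, bias)]
    # Rescale every pixel of each channel by its gate.
    scaled = [[[v * g for v in row] for row in ch]
              for ch, g in zip(feature_maps, gates)]
    return scaled, gates

# Two constant 2x2 channels and an identity gating layer:
fm = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
out, gates = channel_attention(fm, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
print(gates)  # ≈ [0.731, 0.881]: the stronger channel gets the larger gate
```

A spatial attention sub-module works analogously but pools across channels instead, producing one H x W gate map rather than one gate per channel.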

    Figure 7. Selected test results of the proposed method

    Figure 8. Selected test results of the proposed method. The first and second columns show the RGB and thermal images, respectively; the third column shows the corresponding ground-truth crowd density maps; and the fourth column shows the predictions of our method

    Figure 9. Comparison of ablation experiment results

    Figure 10. Comparison of ablation experiment results for the parameter λ

    Table 1. Comparative experimental results on the RGBT-CC dataset

    | Methods | GAME(0) | GAME(1) | GAME(2) | GAME(3) | RMSE |
    |---|---|---|---|---|---|
    | UCNet[20] | 33.96 | 42.42 | 53.06 | 65.07 | 56.31 |
    | HDFNet[21] | 22.36 | 27.79 | 33.68 | 42.48 | 33.93 |
    | MCNN[22] | 21.89 | 25.70 | 30.22 | 37.19 | 37.44 |
    | SANet[23] | 21.99 | 24.76 | 28.52 | 34.25 | 41.60 |
    | CSRNet[18] | 20.40 | 23.58 | 28.03 | 35.51 | 35.26 |
    | BBSNet[24] | 19.56 | 25.07 | 31.25 | 39.24 | 32.48 |
    | MVMS[25] | 19.97 | 25.10 | 31.02 | 38.91 | 33.97 |
    | BL[26] | 18.70 | 22.55 | 26.83 | 34.62 | 32.67 |
    | CMCRL[13] | 15.61 | 19.95 | 24.69 | 32.89 | 28.18 |
    | TSRDNet (ours) | 14.81 | 18.77 | 23.04 | 28.76 | 24.69 |

    Table 2. Comparative experimental results under different illumination conditions on the RGBT-CC dataset

    | Illumination | Methods | GAME(0) | GAME(1) | GAME(2) | GAME(3) | RMSE |
    |---|---|---|---|---|---|---|
    | Brightness | CMCRL[13] | 20.36 | 23.57 | 28.49 | 36.29 | 32.57 |
    | Brightness | TSRDNet (ours) | 16.08 | 21.12 | 27.12 | 33.85 | 27.72 |
    | Darkness | CMCRL[13] | 15.44 | 19.23 | 23.79 | 30.28 | 29.11 |
    | Darkness | TSRDNet (ours) | 14.92 | 18.81 | 22.41 | 29.83 | 27.07 |

    Table 3. Comparative experimental results on the DroneRGBT dataset

    | Methods | GAME(0) | RMSE |
    |---|---|---|
    | MCNN[22] | 13.64 | 19.77 |
    | MSCNN[27] | 14.89 | 20.41 |
    | ACSCP[28] | 13.06 | 20.29 |
    | SANet[23] | 12.13 | 17.52 |
    | SANet++[10] | 11.68 | 17.07 |
    | CSRNet[18] | 8.91 | 13.80 |
    | BL[26] | 7.41 | 11.56 |
    | CMCRL[13] | 7.35 | 11.84 |
    | MMCCN[14] | 7.27 | 11.45 |
    | TSRDNet (ours) | 6.93 | 11.28 |

    Table 4. Comparative experiments of the global attention module and CBAM

    | Evaluation index | Global attention module | CBAM[19] |
    |---|---|---|
    | GAME(0) | 14.81 | 15.31 |
    | GAME(1) | 18.77 | 21.72 |
    | GAME(2) | 23.04 | 28.20 |
    | GAME(3) | 28.76 | 35.42 |
    | RMSE | 24.69 | 27.29 |
  • [1] ZHANG Yuqian, LI Guohui, LEI Jun, et al. FF-CAM: crowd counting based on front-end and back-end fusion of channel attention mechanism[J]. Chinese Journal of Computers, 2021, 44(2): 304-317. https://www.cnki.com.cn/Article/CJFDTOTAL-JSJX202102004.htm

    [2] YANG Z, WEN J, HUANG K. A method of pedestrian flow monitoring based on received signal strength[J]. EURASIP Journal on Wireless Communications and Networking, 2022, 2022(1): 1-17. DOI: 10.1186/s13638-021-02080-5

    [3] WANG Qu, ZHAO Weiqi, LUO Haiyong, et al. Review of research on crowd behavior analysis[J]. Journal of Computer-Aided Design & Computer Graphics, 2018, 30(12): 2353-2365. https://www.cnki.com.cn/Article/CJFDTOTAL-JSJF201812018.htm

    [4] JIANG Yi, HOU Liping, ZHANG Qiang. Research on infrared pedestrian action recognition based on an improved spatio-temporal two-stream network[J]. Infrared Technology, 2021, 43(9): 852-860. http://hwjs.nvir.cn/article/id/f44f08d7-9ff9-413b-938d-de049d8dc5a2

    [5] ZHAO Cairong, QI Ding, DOU Shuguang, et al. Key technologies for intelligent video surveillance: a review of person re-identification research[J]. Scientia Sinica Informationis, 2021, 51(12): 1979-2015. https://www.cnki.com.cn/Article/CJFDTOTAL-PZKX202112002.htm

    [6] ENZWEILER M, GAVRILA D M. Monocular pedestrian detection: survey and experiments[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(12): 2179-2195.

    [7] LI M, ZHANG Z, HUANG K, et al. Estimating the number of people in crowded scenes by MID based foreground segmentation and head-shoulder detection[C]//19th International Conference on Pattern Recognition (ICPR), 2008: 1-4.

    [8] CHEN K, LOY C C, GONG S, et al. Feature mining for localised crowd counting[C]//British Machine Vision Conference (BMVC), 2012: 3-12.

    [9] PHAM V Q, KOZAKAYA T, YAMAGUCHI O, et al. COUNT Forest: co-voting uncertain number of targets using random forest for crowd density estimation[C]//Proceedings of the IEEE International Conference on Computer Vision, 2015: 3253-3261.

    [10] PAN S, ZHAO Y, SU F, et al. SANet++: enhanced scale aggregation with densely connected feature fusion for crowd counting[C]//IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021: 1980-1984.

    [11] WU Qiyuan, WANG Xiaodong, ZHANG Lianjun, et al. Crowd counting network integrating attention mechanism and context density map[J]. Computer Engineering, 2022, 48(5): 235-241, 250. https://www.cnki.com.cn/Article/CJFDTOTAL-JSJC202205031.htm

    [12] TANG H, WANG Y, CHAU L-P. TAFNet: a three-stream adaptive fusion network for RGB-T crowd counting[J/OL]. arXiv preprint arXiv:2202.08517, 2022. https://doi.org/10.48550/arXiv.2202.08517

    [13] LIU L, CHEN J, WU H, et al. Cross-modal collaborative representation learning and a large-scale RGBT benchmark for crowd counting[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 4823-4833.

    [14] PENG T, LI Q, ZHU P. RGB-T crowd counting from drone: a benchmark and MMCCN network[C]//Computer Vision - ACCV 2020, 2021: 497-513.

    [15] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[C]//International Conference on Learning Representations (ICLR), 2015: 1-14.

    [16] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.

    [17] DAI F, LIU H, MA Y, et al. Dense scale network for crowd counting[C]//Proceedings of the 2021 International Conference on Multimedia Retrieval, 2021: 64-72.

    [18] LI Y, ZHANG X, CHEN D. CSRNet: dilated convolutional neural networks for understanding the highly congested scenes[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 1091-1100.

    [19] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision (ECCV), 2018: 3-19.

    [20] ZHANG J, FAN D P, DAI Y, et al. UC-Net: uncertainty inspired RGB-D saliency detection via conditional variational autoencoders[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 8582-8591.

    [21] PANG Y, ZHANG L, ZHAO X, et al. Hierarchical dynamic filtering network for RGB-D salient object detection[C]//European Conference on Computer Vision, 2020: 235-252.

    [22] ZHANG Y, ZHOU D, CHEN S, et al. Single-image crowd counting via multi-column convolutional neural network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 589-597.

    [23] CAO X, WANG Z, ZHAO Y, et al. Scale aggregation network for accurate and efficient crowd counting[C]//Proceedings of the European Conference on Computer Vision (ECCV), 2018: 734-750.

    [24] FAN D P, ZHAI Y, BORJI A, et al. BBS-Net: RGB-D salient object detection with a bifurcated backbone strategy network[C]//European Conference on Computer Vision, 2020: 275-292.

    [25] ZHANG Q, CHAN A B. Wide-area crowd counting via ground-plane density maps and multi-view fusion CNNs[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 8297-8306.

    [26] MA Z, WEI X, HONG X, et al. Bayesian loss for crowd count estimation with point supervision[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019: 6142-6151.

    [27] ZENG L, XU X, CAI B, et al. Multi-scale convolutional neural networks for crowd counting[C]//IEEE International Conference on Image Processing (ICIP), 2017: 465-469.

    [28] SHEN Z, XU Y, NI B, et al. Crowd counting via adversarial cross-scale consistency pursuit[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 5245-5254.


Publication history
  • Received: 2022-07-12
  • Revised: 2022-09-12
  • Published: 2023-11-19
