Two-Stream Residual Dilation Network Algorithm for Crowd Counting Based on RGB-T Images
Abstract: To address scale variation, uneven pedestrian distribution, and poor nighttime imaging conditions in crowd counting, we propose a multimodal crowd counting algorithm based on RGB-Thermal (RGB-T) images, called the two-stream residual dilation network (TSRDNet). It consists of a front-end feature extraction network, multi-scale residual dilated convolution modules, and global attention modules. The front-end network extracts RGB and thermal features, the dilated convolution modules further extract pedestrian features at different scales, and the global attention modules establish dependencies among global features. We also introduce a new multi-scale disparity loss to improve the counting performance of the network. To evaluate the method, we conducted comparative experiments on the RGBT Crowd Counting (RGBT-CC) and DroneRGBT datasets. The results show that, compared with the cross-modal collaborative representation learning (CMCRL) algorithm on the RGBT-CC dataset, the grid average mean absolute error (GAME(0)) and root mean squared error (RMSE) of our algorithm are reduced by 0.8 and 3.49, respectively; on the DroneRGBT dataset, they are reduced by 0.34 and 0.17, respectively, compared with the multimodal crowd counting network (MMCCN) algorithm, indicating better counting performance.
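The GAME and RMSE metrics reported above can be computed directly from predicted and ground-truth density maps. GAME(l) splits each map into 4^l non-overlapping cells and sums the absolute count error per cell, so GAME(0) reduces to the ordinary MAE over whole-image counts. A minimal NumPy sketch (function names are ours, not from the paper):

```python
import numpy as np

def game(pred, gt, level):
    """Grid Average Mean absolute Error for one image pair.

    The density maps are split into 4**level non-overlapping cells
    (2**level per axis); per-cell absolute count errors are summed.
    GAME(0) equals the whole-image absolute count error.
    """
    cells = 2 ** level
    h, w = pred.shape
    err = 0.0
    for i in range(cells):
        for j in range(cells):
            ys, ye = i * h // cells, (i + 1) * h // cells
            xs, xe = j * w // cells, (j + 1) * w // cells
            err += abs(pred[ys:ye, xs:xe].sum() - gt[ys:ye, xs:xe].sum())
    return err

def rmse(pred_counts, gt_counts):
    """Root mean squared error over per-image total counts."""
    d = np.asarray(pred_counts, dtype=float) - np.asarray(gt_counts, dtype=float)
    return float(np.sqrt((d ** 2).mean()))
```

Because higher levels evaluate the error locally, GAME(l) penalizes mislocalized density that GAME(0) would cancel out, which is why the reported GAME values grow with the level.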
Keywords:
- crowd counting
- RGB-T images
- dilated convolution
- global attention
- multi-scale disparity loss
Figure 2. Structure of the TSRDNet network. TSRDNet consists of two front-end networks, each made up of the first 12 layers of VGG-19, four residual dilated convolution blocks (RDCB), four global attention modules, and a convolutional layer for regressing the density map
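The two-stream layout in Figure 2 can be sketched in PyTorch. This is an illustrative skeleton under our own assumptions (the channel widths, the dilation rates 1/2/3, the shallow front ends, and the single RDCB are placeholders, and the paper's global attention module is omitted); it is not the authors' implementation:

```python
import torch
import torch.nn as nn

class RDCB(nn.Module):
    """Hypothetical residual dilated convolution block: parallel 3x3
    convolutions with different dilation rates enlarge the receptive
    field at several scales; their outputs are fused by a 1x1 conv and
    added to a shortcut (the exact layout in the paper may differ)."""
    def __init__(self, channels, rates=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r)
            for r in rates
        )
        self.fuse = nn.Conv2d(channels * len(rates), channels, 1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        y = torch.cat([b(x) for b in self.branches], dim=1)
        return self.relu(x + self.fuse(y))  # residual shortcut

class TSRDNetSketch(nn.Module):
    """Minimal two-stream sketch: separate VGG-style front ends for the
    RGB and thermal inputs, a shared RDCB on the concatenated features,
    and a 1x1 convolution regressing the density map."""
    def __init__(self, width=64):
        super().__init__()
        def frontend(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
                nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            )
        self.rgb = frontend(3)       # RGB stream
        self.thermal = frontend(1)   # thermal stream
        self.rdcb = RDCB(width * 2)
        self.head = nn.Conv2d(width * 2, 1, 1)  # density-map regression

    def forward(self, rgb, t):
        f = torch.cat([self.rgb(rgb), self.thermal(t)], dim=1)
        return self.head(self.rdcb(f))
```

Dilated convolutions grow the receptive field without extra pooling, which is why CSRNet-style back ends such as this preserve density-map resolution better than stacked stride-2 layers.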
Figure 8. Sample test results of the proposed method. The first and second columns show the RGB image and thermal image, respectively; the third column shows the corresponding ground-truth crowd density map, and the fourth column shows the prediction of our method
Table 1 Comparative experimental results on the RGBT-CC dataset
| Methods | GAME(0) | GAME(1) | GAME(2) | GAME(3) | RMSE |
|---|---|---|---|---|---|
| UCNet[20] | 33.96 | 42.42 | 53.06 | 65.07 | 56.31 |
| HDFNet[21] | 22.36 | 27.79 | 33.68 | 42.48 | 33.93 |
| MCNN[22] | 21.89 | 25.70 | 30.22 | 37.19 | 37.44 |
| SANet[23] | 21.99 | 24.76 | 28.52 | 34.25 | 41.60 |
| CSRNet[18] | 20.40 | 23.58 | 28.03 | 35.51 | 35.26 |
| BBSNet[24] | 19.56 | 25.07 | 31.25 | 39.24 | 32.48 |
| MVMS[25] | 19.97 | 25.10 | 31.02 | 38.91 | 33.97 |
| BL[26] | 18.70 | 22.55 | 26.83 | 34.62 | 32.67 |
| CMCRL[13] | 15.61 | 19.95 | 24.69 | 32.89 | 28.18 |
| TSRDNet (Ours) | 14.81 | 18.77 | 23.04 | 28.76 | 24.69 |
Table 2 Comparative experimental results under different lighting environments on RGBT-CC data
Table 3 Comparative experimental results on the DroneRGBT dataset
Table 4 Comparative experiments of the global attention module and CBAM
| Evaluation index | Global attention module | CBAM[19] |
|---|---|---|
| GAME(0) | 14.81 | 15.31 |
| GAME(1) | 18.77 | 21.72 |
| GAME(2) | 23.04 | 28.20 |
| GAME(3) | 28.76 | 35.42 |
| RMSE | 24.69 | 27.29 |
[1] ZHANG Yuqian, LI Guohui, LEI Jun, et al. FF-CAM: crowd counting based on front-end and back-end fusion of channel attention mechanism[J]. Chinese Journal of Computers, 2021, 44(2): 304-317. https://www.cnki.com.cn/Article/CJFDTOTAL-JSJX202102004.htm
[2] YANG Z, WEN J, HUANG K. A method of pedestrian flow monitoring based on received signal strength[J]. EURASIP Journal on Wireless Communications and Networking, 2022, 2022(1): 1-17. DOI: 10.1186/s13638-021-02080-5
[3] WANG Qu, ZHAO Weiqi, LUO Haiyong, et al. A survey of crowd behavior analysis[J]. Journal of Computer-Aided Design & Computer Graphics, 2018, 30(12): 2353-2365. https://www.cnki.com.cn/Article/CJFDTOTAL-JSJF201812018.htm
[4] JIANG Yi, HOU Liping, ZHANG Qiang. Infrared pedestrian action recognition based on an improved spatiotemporal two-stream network[J]. Infrared Technology, 2021, 43(9): 852-860. http://hwjs.nvir.cn/article/id/f44f08d7-9ff9-413b-938d-de049d8dc5a2
[5] ZHAO Cairong, QI Ding, DOU Shuguang, et al. Key technologies of intelligent video surveillance: a survey of person re-identification[J]. Scientia Sinica Informationis, 2021, 51(12): 1979-2015. https://www.cnki.com.cn/Article/CJFDTOTAL-PZKX202112002.htm
[6] ENZWEILER M, GAVRILA D M. Monocular pedestrian detection: survey and experiments[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 31(12): 2179-2195.
[7] LI M, ZHANG Z, HUANG K, et al. Estimating the number of people in crowded scenes by MID based foreground segmentation and head-shoulder detection[C]//2008 19th International Conference on Pattern Recognition, 2008: 1-4.
[8] CHEN K, LOY C C, GONG S, et al. Feature mining for localised crowd counting[C]//BMVC, 2012: 3-12.
[9] PHAM V Q, KOZAKAYA T, YAMAGUCHI O, et al. Count forest: co-voting uncertain number of targets using random forest for crowd density estimation[C]//Proceedings of the IEEE International Conference on Computer Vision, 2015: 3253-3261.
[10] PAN S, ZHAO Y, SU F, et al. SANet++: enhanced scale aggregation with densely connected feature fusion for crowd counting[C]//ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021: 1980-1984.
[11] WU Qiyuan, WANG Xiaodong, ZHANG Lianjun, et al. Crowd counting network integrating attention mechanism and context density map[J]. Computer Engineering, 2022, 48(5): 235-241, 250. https://www.cnki.com.cn/Article/CJFDTOTAL-JSJC202205031.htm
[12] TANG H, WANG Y, CHAU L-P. TAFNet: a three-stream adaptive fusion network for RGB-T crowd counting[J/OL]. arXiv preprint arXiv: 2202.08517, 2022. https://doi.org/10.48550/arXiv.2202.08517.
[13] LIU L, CHEN J, WU H, et al. Cross-modal collaborative representation learning and a large-scale rgbt benchmark for crowd counting[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 4823-4833.
[14] PENG T, LI Q, ZHU P. RGB-T crowd counting from drone: a benchmark and MMCCN network[C]//Computer Vision – ACCV 2020, 2021: 497-513.
[15] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[C]//International Conference on Learning Representations (ICLR), 2014: 1-14.
[16] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
[17] DAI F, LIU H, MA Y, et al. Dense scale network for crowd counting[C]//Proceedings of the 2021 International Conference on Multimedia Retrieval, 2021: 64-72.
[18] LI Y, ZHANG X, CHEN D. CSRNet: dilated convolutional neural networks for understanding the highly congested scenes[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 1091-1100.
[19] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision (ECCV), 2018: 3-19.
[20] ZHANG J, FAN D P, DAI Y, et al. UC-Net: uncertainty inspired RGB-D saliency detection via conditional variational autoencoders[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 8582-8591.
[21] PANG Y, ZHANG L, ZHAO X, et al. Hierarchical dynamic filtering network for rgb-d salient object detection[C]//European Conference on Computer Vision, 2020: 235-252.
[22] ZHANG Y, ZHOU D, CHEN S, et al. Single-image crowd counting via multi-column convolutional neural network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 589-597.
[23] CAO X, WANG Z, ZHAO Y, et al. Scale aggregation network for accurate and efficient crowd counting[C]//Proceedings of the European Conference on Computer Vision (ECCV), 2018: 734-750.
[24] FAN D P, ZHAI Y, BORJI A, et al. BBS-Net: RGB-D salient object detection with a bifurcated backbone strategy network[C]//European Conference on Computer Vision, 2020: 275-292.
[25] ZHANG Q, CHAN A B. Wide-area crowd counting via ground-plane density maps and multi-view fusion cnns[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 8297-8306.
[26] MA Z, WEI X, HONG X, et al. Bayesian loss for crowd count estimation with point supervision[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019: 6142-6151.
[27] ZENG L, XU X, CAI B, et al. Multi-scale convolutional neural networks for crowd counting[C]//IEEE International Conference on Image Processing (ICIP), 2017: 465-469.
[28] SHEN Z, XU Y, NI B, et al. Crowd counting via adversarial cross-scale consistency pursuit[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 5245-5254.