Two-Stream Residual Dilation Network Algorithm for Crowd Counting Based on RGB-T Images

YANG Peilong, CHEN Shuyue, YANG Shangyu, WANG Jiahong

Citation: YANG Peilong, CHEN Shuyue, YANG Shangyu, WANG Jiahong. Two-Stream Residual Dilation Network Algorithm for Crowd Counting Based on RGB-T Images[J]. Infrared Technology, 2023, 45(11): 1177-1186.


Funding:

Jiangsu Provincial Key Research and Development Program (BE2021012-5)

Details
    Author:

    YANG Peilong (1997-), male, M.S. candidate; main research interest: computer vision. E-mail: 2247291086@qq.com

    Corresponding author:

    CHEN Shuyue (1963-), male, professor; main research interests: computer vision and detection technology. E-mail: csyue2000@163.com

  • CLC number: TP391


  • Abstract: To address scale variation, uneven pedestrian distribution, and poor night-time imaging conditions in crowd counting, we propose a multimodal crowd counting algorithm based on RGB-Thermal (RGB-T) images, called the two-stream residual dilation network (TSRDNet). It consists of a front-end feature extraction network, multi-scale residual dilated convolution modules, and global attention modules. The front-end network extracts RGB and thermal features, the dilated convolution modules further extract pedestrian features at different scales, and the global attention modules establish dependencies among global features. In addition, a new multi-scale dissimilarity loss is introduced to improve the counting performance of the network. To evaluate the method, comparative experiments were conducted on the RGBT Crowd Counting (RGBT-CC) and DroneRGBT datasets. Compared with the cross-modal collaborative representation learning (CMCRL) algorithm on RGBT-CC, the grid average mean absolute error (GAME(0)) and root mean squared error (RMSE) of the proposed algorithm are reduced by 0.8 and 3.49, respectively; on DroneRGBT, they are reduced by 0.34 and 0.17 compared with the multimodal crowd counting network (MMCCN), indicating better counting performance.
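The abstract reports results in terms of GAME(l) and RMSE. As a reading aid, here is a minimal pure-Python sketch of these two metrics as conventionally defined for crowd counting; the toy density maps, grid handling, and function names are illustrative assumptions, not taken from the paper:

```python
import math

def game(pred, gt, level):
    """Grid Average Mean absolute Error for one image pair.

    Splits the two density maps into a 2**level x 2**level grid of
    cells, sums the predicted and ground-truth counts inside each
    cell, and adds up the absolute differences.  level = 0 reduces
    to the plain absolute error between the total counts.
    """
    h, w = len(pred), len(pred[0])
    n = 2 ** level
    err = 0.0
    for i in range(n):
        for j in range(n):
            r0, r1 = i * h // n, (i + 1) * h // n
            c0, c1 = j * w // n, (j + 1) * w // n
            p = sum(pred[r][c] for r in range(r0, r1) for c in range(c0, c1))
            g = sum(gt[r][c] for r in range(r0, r1) for c in range(c0, c1))
            err += abs(p - g)
    return err

def rmse(pred_counts, gt_counts):
    """Root mean squared error over per-image total counts."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred_counts, gt_counts))
                     / len(pred_counts))

# Toy 4x4 density maps: the totals agree (3 people each), but the
# predicted people sit in the wrong place in the top half.
pred = [[0.5, 0, 0, 0.5], [0, 0, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]
gt   = [[1.0, 0, 0, 0.0], [0, 0, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]

print(game(pred, gt, 0))  # 0.0 -- totals match, so GAME(0) misses the error
print(game(pred, gt, 1))  # 1.0 -- the 2x2 grid exposes the localization error
```

This is why GAME(1)-GAME(3) are reported alongside GAME(0) in Tables 1 and 2: finer grids penalize correct totals that are placed in the wrong regions.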
  • Figure 1. RGB images and thermal images under different illumination conditions

    Figure 2. Structure of the TSRDNet network. TSRDNet consists of two front-end networks, each built from the first 12 layers of VGG-19, four residual dilated convolution blocks (RDCB), four global attention modules, and a convolutional layer that regresses the density map

    Figure 3. Dilated convolution models with different dilation rates
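Figure 3 illustrates dilated convolutions with different dilation rates. Their effect on the receptive field follows the standard formula k + (k-1)(d-1); the sketch below illustrates that general property, not the exact RDCB configuration used in the paper (the layer stack shown is a hypothetical example):

```python
def effective_kernel(k, d):
    """A k x k convolution with dilation rate d spans
    k + (k - 1) * (d - 1) pixels along each axis."""
    return k + (k - 1) * (d - 1)

def stacked_receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions, given as
    (kernel_size, dilation) pairs; each layer adds k_eff - 1 pixels."""
    rf = 1
    for k, d in layers:
        rf += effective_kernel(k, d) - 1
    return rf

# A 3x3 kernel with dilation 2 covers a 5x5 window, with dilation 3 a 7x7:
print(effective_kernel(3, 2))  # 5
print(effective_kernel(3, 3))  # 7

# Three stacked 3x3 convolutions with dilation rates 1, 2, 3 see a
# 13x13 neighbourhood without any pooling or extra parameters:
print(stacked_receptive_field([(3, 1), (3, 2), (3, 3)]))  # 13
```

The design appeal is that the receptive field grows without losing spatial resolution, which is why dilated convolutions are a common choice for density-map regression (e.g. CSRNet [18]).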

    Figure 4. Structure of the global attention mechanism

    Figure 5. Structure of the channel attention sub-module

    Figure 6. Structure of the spatial attention sub-module
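Figures 5 and 6 show the channel and spatial attention sub-modules, but this excerpt does not give their equations. As a rough orientation only, here is a generic squeeze-and-excite style channel-attention sketch (global average pooling, a gating layer, sigmoid rescaling); the single linear layer is a stand-in assumption for whatever MLP the paper's module actually uses:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(feature_maps, weights, bias):
    """Generic squeeze-and-excite style channel attention.

    feature_maps: list of C channels, each an H x W nested list.
    weights:      C x C matrix of the gating layer; bias: length C.
    Returns the rescaled channels and the per-channel gate values.
    """
    # Squeeze: global average pooling turns each channel into a scalar.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
              for ch in feature_maps]
    # Excite: one linear layer plus a sigmoid produces a gate in (0, 1).
    gates = [sigmoid(sum(w * p for w, p in zip(wrow, pooled)) + b)
             for wrow, b in zip(weights, bias)]
    # Rescale every pixel of each channel by its gate.
    scaled = [[[v * g for v in row] for row in ch]
              for ch, g in zip(feature_maps, gates)]
    return scaled, gates

# Two constant 2x2 channels and an identity gating layer:
fm = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
out, gates = channel_attention(fm, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
print(gates)  # ≈ [0.731, 0.881]: the stronger channel gets the larger gate
```

A spatial attention sub-module works analogously but pools across channels instead, producing one H x W gate map rather than one gate per channel.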

    Figure 7. Selected test results of the proposed method

    Figure 8. Selected test results of the proposed method. The first and second columns show the RGB and thermal images, respectively; the third column shows the corresponding ground-truth crowd density maps; and the fourth column shows the predictions of our method

    Figure 9. Comparison of ablation experiment results

    Figure 10. Comparison of ablation experiment results for the parameter λ

    Table 1. Comparative experimental results on the RGBT-CC dataset

    | Methods | GAME(0) | GAME(1) | GAME(2) | GAME(3) | RMSE |
    |---|---|---|---|---|---|
    | UCNet[20] | 33.96 | 42.42 | 53.06 | 65.07 | 56.31 |
    | HDFNet[21] | 22.36 | 27.79 | 33.68 | 42.48 | 33.93 |
    | MCNN[22] | 21.89 | 25.70 | 30.22 | 37.19 | 37.44 |
    | SANet[23] | 21.99 | 24.76 | 28.52 | 34.25 | 41.60 |
    | CSRNet[18] | 20.40 | 23.58 | 28.03 | 35.51 | 35.26 |
    | BBSNet[24] | 19.56 | 25.07 | 31.25 | 39.24 | 32.48 |
    | MVMS[25] | 19.97 | 25.10 | 31.02 | 38.91 | 33.97 |
    | BL[26] | 18.70 | 22.55 | 26.83 | 34.62 | 32.67 |
    | CMCRL[13] | 15.61 | 19.95 | 24.69 | 32.89 | 28.18 |
    | TSRDNet (ours) | 14.81 | 18.77 | 23.04 | 28.76 | 24.69 |

    Table 2. Comparative experimental results under different illumination conditions on the RGBT-CC dataset

    | Illumination | Methods | GAME(0) | GAME(1) | GAME(2) | GAME(3) | RMSE |
    |---|---|---|---|---|---|---|
    | Brightness | CMCRL[13] | 20.36 | 23.57 | 28.49 | 36.29 | 32.57 |
    | Brightness | TSRDNet (ours) | 16.08 | 21.12 | 27.12 | 33.85 | 27.72 |
    | Darkness | CMCRL[13] | 15.44 | 19.23 | 23.79 | 30.28 | 29.11 |
    | Darkness | TSRDNet (ours) | 14.92 | 18.81 | 22.41 | 29.83 | 27.07 |

    Table 3. Comparative experimental results on the DroneRGBT dataset

    | Methods | GAME(0) | RMSE |
    |---|---|---|
    | MCNN[22] | 13.64 | 19.77 |
    | MSCNN[27] | 14.89 | 20.41 |
    | ACSCP[28] | 13.06 | 20.29 |
    | SANet[23] | 12.13 | 17.52 |
    | SANet++[10] | 11.68 | 17.07 |
    | CSRNet[18] | 8.91 | 13.80 |
    | BL[26] | 7.41 | 11.56 |
    | CMCRL[13] | 7.35 | 11.84 |
    | MMCCN[14] | 7.27 | 11.45 |
    | TSRDNet (ours) | 6.93 | 11.28 |

    Table 4. Comparative experiments of the global attention module and CBAM

    | Evaluation index | Global attention module | CBAM[19] |
    |---|---|---|
    | GAME(0) | 14.81 | 15.31 |
    | GAME(1) | 18.77 | 21.72 |
    | GAME(2) | 23.04 | 28.20 |
    | GAME(3) | 28.76 | 35.42 |
    | RMSE | 24.69 | 27.29 |
  • [1] ZHANG Yuqian, LI Guohui, LEI Jun, et al. FF-CAM: crowd counting based on front-end and back-end fusion of channel attention mechanism[J]. Chinese Journal of Computers, 2021, 44(2): 304-317. https://www.cnki.com.cn/Article/CJFDTOTAL-JSJX202102004.htm

    [2] YANG Z, WEN J, HUANG K. A method of pedestrian flow monitoring based on received signal strength[J]. EURASIP Journal on Wireless Communications and Networking, 2022, 2022(1): 1-17. DOI: 10.1186/s13638-021-02080-5

    [3] WANG Qu, ZHAO Weiqi, LUO Haiyong, et al. Review of research on crowd behavior analysis[J]. Journal of Computer-Aided Design & Computer Graphics, 2018, 30(12): 2353-2365. https://www.cnki.com.cn/Article/CJFDTOTAL-JSJF201812018.htm

    [4] JIANG Yi, HOU Liping, ZHANG Qiang. Research on infrared pedestrian action recognition based on an improved spatio-temporal two-stream network[J]. Infrared Technology, 2021, 43(9): 852-860. http://hwjs.nvir.cn/article/id/f44f08d7-9ff9-413b-938d-de049d8dc5a2

    [5] ZHAO Cairong, QI Ding, DOU Shuguang, et al. Key technologies for intelligent video surveillance: a review of person re-identification research[J]. Scientia Sinica Informationis, 2021, 51(12): 1979-2015. https://www.cnki.com.cn/Article/CJFDTOTAL-PZKX202112002.htm

    [6] ENZWEILER M, GAVRILA D M. Monocular pedestrian detection: survey and experiments[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(12): 2179-2195.

    [7] LI M, ZHANG Z, HUANG K, et al. Estimating the number of people in crowded scenes by MID based foreground segmentation and head-shoulder detection[C]//19th International Conference on Pattern Recognition (ICPR), 2008: 1-4.

    [8] CHEN K, LOY C C, GONG S, et al. Feature mining for localised crowd counting[C]//British Machine Vision Conference (BMVC), 2012: 3-12.

    [9] PHAM V Q, KOZAKAYA T, YAMAGUCHI O, et al. COUNT Forest: co-voting uncertain number of targets using random forest for crowd density estimation[C]//Proceedings of the IEEE International Conference on Computer Vision, 2015: 3253-3261.

    [10] PAN S, ZHAO Y, SU F, et al. SANet++: enhanced scale aggregation with densely connected feature fusion for crowd counting[C]//IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021: 1980-1984.

    [11] WU Qiyuan, WANG Xiaodong, ZHANG Lianjun, et al. Crowd counting network integrating attention mechanism and context density map[J]. Computer Engineering, 2022, 48(5): 235-241, 250. https://www.cnki.com.cn/Article/CJFDTOTAL-JSJC202205031.htm

    [12] TANG H, WANG Y, CHAU L-P. TAFNet: a three-stream adaptive fusion network for RGB-T crowd counting[J/OL]. arXiv preprint arXiv:2202.08517, 2022. https://doi.org/10.48550/arXiv.2202.08517

    [13] LIU L, CHEN J, WU H, et al. Cross-modal collaborative representation learning and a large-scale RGBT benchmark for crowd counting[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 4823-4833.

    [14] PENG T, LI Q, ZHU P. RGB-T crowd counting from drone: a benchmark and MMCCN network[C]//Computer Vision - ACCV 2020, 2021: 497-513.

    [15] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[C]//International Conference on Learning Representations (ICLR), 2015: 1-14.

    [16] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.

    [17] DAI F, LIU H, MA Y, et al. Dense scale network for crowd counting[C]//Proceedings of the 2021 International Conference on Multimedia Retrieval, 2021: 64-72.

    [18] LI Y, ZHANG X, CHEN D. CSRNet: dilated convolutional neural networks for understanding the highly congested scenes[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 1091-1100.

    [19] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision (ECCV), 2018: 3-19.

    [20] ZHANG J, FAN D P, DAI Y, et al. UC-Net: uncertainty inspired RGB-D saliency detection via conditional variational autoencoders[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 8582-8591.

    [21] PANG Y, ZHANG L, ZHAO X, et al. Hierarchical dynamic filtering network for RGB-D salient object detection[C]//European Conference on Computer Vision, 2020: 235-252.

    [22] ZHANG Y, ZHOU D, CHEN S, et al. Single-image crowd counting via multi-column convolutional neural network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 589-597.

    [23] CAO X, WANG Z, ZHAO Y, et al. Scale aggregation network for accurate and efficient crowd counting[C]//Proceedings of the European Conference on Computer Vision (ECCV), 2018: 734-750.

    [24] FAN D P, ZHAI Y, BORJI A, et al. BBS-Net: RGB-D salient object detection with a bifurcated backbone strategy network[C]//European Conference on Computer Vision, 2020: 275-292.

    [25] ZHANG Q, CHAN A B. Wide-area crowd counting via ground-plane density maps and multi-view fusion CNNs[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 8297-8306.

    [26] MA Z, WEI X, HONG X, et al. Bayesian loss for crowd count estimation with point supervision[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019: 6142-6151.

    [27] ZENG L, XU X, CAI B, et al. Multi-scale convolutional neural networks for crowd counting[C]//IEEE International Conference on Image Processing (ICIP), 2017: 465-469.

    [28] SHEN Z, XU Y, NI B, et al. Crowd counting via adversarial cross-scale consistency pursuit[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 5245-5254.


Publication history
  • Received: 2022-07-12
  • Revised: 2022-09-12
  • Published: 2023-11-19
