基于RGB-T图像的双流残差扩张网络人群计数算法

Two-Stream Residual Dilation Network Algorithm for Crowd Counting Based on RGB-T Images

  • 摘要: 在人群计数中,针对尺度变化、行人分布不均以及夜间较差成像条件,提出了一种基于RGB-T(RGB-Thermal)图像的多模态人群计数算法,称为双流残差扩张网络,它由前端特征提取网络、多尺度的残差扩张卷积模块和全局注意力模块所构成。其中,前端网络用来提取RGB特征和热特征,扩张卷积模块进一步提取不同尺度的行人特征信息,全局注意力模块用来建立全局特征之间的依赖关系。此外,还引入了一种新的多尺度差异性损失,以提高网络的计数性能。为评估该方法,在RGBT-CC(RGBT Crowd Counting)数据集和DroneRGBT数据集上进行了对比实验。实验结果表明,在RGBT-CC数据集上与CMCRL(Cross-modal Collaborative Representation Learning)算法相比该算法的GAME(0)(Grid Average Mean absolute Errors)和RMSE(Root Mean Squared Error)分别降低了0.8和3.49,在DroneRGBT数据集上与MMCCN(Multi-Modal Crowd Counting Network)算法比分别降低了0.34和0.17,表明具有较好的计数性能。

     

    Abstract: We proposed a multimodal crowd counting algorithm based on RGB-Thermal (RGB-T) images (two-stream residual expansion network) in crowd counting, given scale changes, uneven pedestrian distribution, and poor imaging conditions at night. It has a front-end feature extraction network, multi-scale residual dilation convolution, and global attention modules. We used the front-end network to extract RGB and thermal features, and the dilated convolution module further extracted pedestrian feature information at different scales and used the global attention module to establish dependencies between global features. We also introduced a new multi-scale dissimilarity loss method to improve the counting performance of the network and conducted comparative experiments on the RGBT crowd counting (RGBT-CC) and DroneRGBT datasets to evaluate the method. Experimental results showed that compared with the cross-modal collaborative representation learning (CMCRL) algorithm on the RGBT-CC dataset, the grid average mean absolute error (GAME (0)) and root mean squared error (RMSE) of this algorithm are reduced by 0.8 and 3.49, respectively. On the DroneRGBT dataset, the algorithm are reduced by 0.34 and 0.17, respectively, compared to the multimodal crowd counting network (MMCCN) algorithm, indicating better counting performance.

     

/

返回文章
返回