Infrared Pedestrian Action Recognition Based on Improved Spatial-temporal Two-stream Convolution Network

JIANG Yi, HOU Liping, ZHANG Qiang

Citation: JIANG Yi, HOU Liping, ZHANG Qiang. Infrared Pedestrian Action Recognition Based on Improved Spatial-temporal Two-stream Convolution Network[J]. Infrared Technology, 2021, 43(9): 852-860.


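Funding: Supported by an Equipment Pre-research Fund project
Details
    About the author:

    JIANG Yi (1983-), male, Han ethnicity, from Xinyang, Henan; bachelor's degree; lecturer. Main research interests: computer vision and infrared application technology. E-mail: 85112285@qq.com

  • CLC number: TP391.4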


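  • Abstract: To improve the accuracy of pedestrian action recognition in infrared sequences with complex backgrounds, this paper proposes an improved spatial-temporal two-stream network. The network first replaces the temporal-information network with a deep differential network, which improves both the representational power and the extraction efficiency of the spatio-temporal features. The model is then trained with a cost function based on a decision-level feature fusion mechanism, which preserves more of the inter-frame spatio-temporal features from the different networks and reflects the pedestrian's action category more faithfully. Simulation results show that the improved network achieves a recognition accuracy of 81% on a self-built infrared video dataset and improves computational efficiency by 25%, which gives it high value for engineering applications.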
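    The two ideas named in the abstract, feeding frame differences to the motion stream and fusing the two streams at the decision level, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the architecture of the deep differential network or the exact form of the fusion cost function. The sketch assumes plain absolute frame differencing as the motion input and a weighted average of per-stream softmax scores as the fusion rule; the function names and the `spatial_weight` parameter are hypothetical.

```python
import numpy as np


def frame_differences(frames: np.ndarray) -> np.ndarray:
    """Absolute difference between consecutive frames.

    frames: array of shape (T, H, W) holding T grayscale infrared frames.
    Returns an array of shape (T - 1, H, W).
    Assumption: simple differencing stands in for the input to the
    paper's deep differential network, whose details are not given.
    """
    # Cast first so that uint8 inputs do not wrap around on subtraction.
    frames = frames.astype(np.float32)
    return np.abs(frames[1:] - frames[:-1])


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def fuse_decisions(spatial_logits, diff_logits, spatial_weight=0.5):
    """Decision-level (late) fusion of two streams.

    Each stream is converted to class probabilities, and the two
    probability vectors are averaged with a hypothetical weight.
    """
    p_spatial = softmax(np.asarray(spatial_logits, dtype=np.float64))
    p_diff = softmax(np.asarray(diff_logits, dtype=np.float64))
    return spatial_weight * p_spatial + (1.0 - spatial_weight) * p_diff


def fused_cross_entropy(spatial_logits, diff_logits, label, spatial_weight=0.5):
    """Cross-entropy of the fused prediction for one sample.

    Assumption: an illustrative cost on the fused scores, not the
    cost function defined in the paper.
    """
    fused = fuse_decisions(spatial_logits, diff_logits, spatial_weight)
    return float(-np.log(fused[label] + 1e-12))


if __name__ == "__main__":
    # Toy clip: one bright pixel moving one column to the right per frame.
    clip = np.zeros((3, 4, 4), dtype=np.uint8)
    for t in range(3):
        clip[t, 1, t + 1] = 255
    print(frame_differences(clip).shape)
    print(fuse_decisions([2.0, 0.5, 0.1], [0.2, 1.5, 0.3]))
```

    Computing the loss on the fused scores, rather than on each stream separately, lets both streams receive a training signal from the combined decision, which is the general motivation for a decision-level fusion cost.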
  • Figure 1. Two-stream network structure

    Figure 2. LSTM network structure

    Figure 3. Improved two-stream network

    Figure 4. Differential keyframes and the corresponding optical-flow maps

    Figure 5. Model loss during training

    Figure 6. Model recognition accuracy during training

    Table 1. Dataset classes and sample counts

    No.  Category      Total
    1    Walk          152
    2    Stand         203
    3    Climb         186
    4    Jog           265
    5    Jump          174
    6    Punch         128
    7    Lying         295
    8    Wave1         168
    9    Wave2         177
    10   Crouch        312
    11   Sitting       268
    12   Handclapping  208
    13   Push          158
    14   Fight         119
    15   Handshake     134
    16   Hug           168

    Table 2. Performance analysis of different modules

    DDN  IS  DF  Pr/%   FPS
                 77.12  13.9
                 77.83  18.1
                 79.91  13.8
                 79.78  12.7
                 81.79  17.8
                 82.09  18.5
                 81.83  11.6
                 83.01  17.7

    Table 3. Performance analysis of the compared models

    Category      | IDT      | C3D      | SCNN-3G  | L-LSTM   | Ts-3D    | OFGF     | Ours
                  | Pr Mr Rr | Pr Mr Rr | Pr Mr Rr | Pr Mr Rr | Pr Mr Rr | Pr Mr Rr | Pr Mr Rr
    Walk          | 64 27 70 | 66 21 72 | 68 23 72 | 74 19 77 | 76 27 74 | 79 16 80 | 78 10 80
    Stand         | 72 20 75 | 76 19 77 | 76 19 74 | 82 19 87 | 84 20 75 | 84 16 85 | 85 20 86
    Climb         | 50 36 61 | 53 31 63 | 61 34 66 | 66 25 67 | 71 36 61 | 76 24 81 | 78 16 81
    Jog           | 66 28 70 | 68 23 75 | 70 23 70 | 67 28 76 | 71 28 70 | 76 19 78 | 86  8 90
    Jump          | 60 32 65 | 61 31 68 | 67 34 67 | 60 32 74 | 72 32 65 | 72 22 77 | 71 16 80
    Punch         | 41 50 44 | 41 40 43 | 46 51 48 | 51 40 58 | 60 50 64 | 61 30 64 | 67 22 69
    Lying         | 56 36 60 | 57 31 66 | 59 33 65 | 56 36 67 | 70 30 67 | 66 22 69 | 67 16 70
    Wave1         | 65 31 65 | 68 29 68 | 68 30 68 | 65 31 76 | 72 23 75 | 75 11 80 | 82 11 85
    Wave2         | 68 28 69 | 70 30 71 | 71 23 76 | 68 28 87 | 78 28 79 | 81 17 86 | 88  8 88
    Crouch        | 41 29 41 | 43 34 45 | 44 23 46 | 41 29 58 | 53 20 50 | 60 22 61 | 68 26 71
    Sitting       | 70 24 78 | 73 28 80 | 72 28 79 | 71 24 81 | 78 19 81 | 80 15 88 | 82 14 87
    Handclap      | 37 33 38 | 38 34 42 | 38 30 33 | 37 33 50 | 45 23 58 | 67 22 68 | 72 23 76
    Push          | 41 46 44 | 44 47 46 | 42 42 47 | 41 46 57 | 66 30 64 | 71 23 74 | 71 16 79
    Fight         | 53 35 57 | 58 30 58 | 56 31 58 | 53 35 67 | 67 29 67 | 63 15 77 | 80 13 80
    Handshake     | 62 29 67 | 65 31 70 | 66 26 70 | 62 29 76 | 71 20 77 | 75 19 87 | 76 22 81
    Hug           | 67 26 69 | 66 27 72 | 61 28 74 | 76 28 74 | 74 26 78 | 78 25 79 | 81 14 85
    Mixed dataset | 57 31 60 | 59 30 63 | 60 29 63 | 60 30 70 | 69 27 69 | 72 18 77 | 77 15 80
Publication history
  • Received:  2020-12-27
  • Revised:  2021-08-24
  • Published:  2021-09-20
