JIANG Yi, HOU Liping, ZHANG Qiang. Infrared Pedestrian Action Recognition Based on Improved Spatial-temporal Two-stream Convolution Network[J]. Infrared Technology, 2021, 43(9): 852-860.

Infrared Pedestrian Action Recognition Based on Improved Spatial-temporal Two-stream Convolution Network

More Information
  • Received Date: December 26, 2020
  • Revised Date: August 23, 2021
  • This study proposes an improved spatial-temporal two-stream network to raise the accuracy of pedestrian action recognition in infrared sequences with complex backgrounds. First, a deep differential network replaces the temporal stream to improve the representation ability and extraction efficiency of spatio-temporal features. Then, the model is trained with an improved softmax loss function built on a decision-level feature fusion mechanism, which better preserves the spatio-temporal characteristics shared across frames by the two streams and reflects the pedestrian's action category more faithfully. Simulation results show that the improved network achieves 87% recognition accuracy on a self-built infrared dataset while improving computational efficiency by 25%, indicating strong engineering application value.
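
The sketch below illustrates, in PyTorch, the general scheme the abstract describes: a spatial stream applied to raw infrared frames, a "differential" stream applied to frame differences in place of an optical-flow temporal stream, and decision-level fusion of the two streams' softmax scores. All layer sizes, the class count, the backbone, and the fusion weight are illustrative assumptions and not the authors' exact configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F


def simple_cnn(in_channels: int, num_classes: int) -> nn.Sequential:
    # Small stand-in backbone; the paper's networks would be deeper.
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
        nn.BatchNorm2d(32),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
        nn.Conv2d(32, 64, kernel_size=3, padding=1),
        nn.BatchNorm2d(64),
        nn.ReLU(inplace=True),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(64, num_classes),
    )


class TwoStreamDifferential(nn.Module):
    # Hypothetical module name; num_classes and fusion_weight are assumptions.
    def __init__(self, num_classes: int = 6, fusion_weight: float = 0.5):
        super().__init__()
        self.spatial = simple_cnn(1, num_classes)   # single-channel infrared frame
        self.temporal = simple_cnn(1, num_classes)  # frame-difference input
        self.fusion_weight = fusion_weight          # decision-level weighting

    def forward(self, frame_t: torch.Tensor, frame_t1: torch.Tensor) -> torch.Tensor:
        # "Deep differential" idea: the difference of consecutive frames stands in
        # for the optical-flow input of a classic temporal stream.
        diff = frame_t1 - frame_t
        spatial_scores = F.softmax(self.spatial(frame_t), dim=1)
        temporal_scores = F.softmax(self.temporal(diff), dim=1)
        # Decision-level fusion: weighted sum of per-stream class score vectors.
        w = self.fusion_weight
        return w * spatial_scores + (1.0 - w) * temporal_scores


if __name__ == "__main__":
    model = TwoStreamDifferential(num_classes=6)
    f_t = torch.randn(2, 1, 128, 128)   # batch of infrared frames at time t
    f_t1 = torch.randn(2, 1, 128, 128)  # corresponding frames at time t+1
    print(model(f_t, f_t1).shape)       # -> torch.Size([2, 6])

In this sketch the fusion happens on class scores rather than intermediate features, matching the decision-level fusion described in the abstract; the improved softmax loss itself would be applied to these fused scores during training.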
  • [1]
    Karpathy A, Toderici G, Shetty S, et al. Large- scale video classification with convolutional neural networks[C]// CVPR, 2014: 1725-1732.
    [2]
    Tran D, Bourdev L D, Fergus R, et al. Learning spatiotem-poral features with 3d convolutional networks[C]//ICCV, 2015: 4489-4497.
    [3]
    ZHANG B, WANG L, WANG Z, et al. Real-time action recognition with enhanced motion vector CNNs[C]//CVPR, 2016: 2718-2726.
    [4]
    Niebles J C, CHEN C W, LI F F. Modeling temporal structure of decomposable motion segments for activity classification[C]// ECCV, 2010: 392-405.
    [5]
    Tumas P, Nowosielski A, Serackis A. Pedestrian detection in severe weather conditions[J]. IEEE Access, 2020, 8: 62775-62784. DOI: 10.1109/ACCESS.2020.2982539
    [6]
    魏丽, 丁萌, 曾丽君. 红外图像中基于似物性与稀疏编码的行人检测[J]. 红外技术, 2016, 38(9): 752-757. http://hwjs.nvir.cn/article/id/hwjs201609007

    WEI Li, DING Meng, ZENG Lijun. Pedestrian Detection Based on Objectness and Sparse Coding in a Single Infrared Image[J]. Infrared Technology, 2016, 38(9): 752-757. http://hwjs.nvir.cn/article/id/hwjs201609007
    [7]
    Fernando B, Gavves E M, Ghodrati J O, et al. Modeling video evolution for action recognition[C]//CVPR, 2015: 5378-5387.
    [8]
    Varol G, Laptev I, Schmid C. Long-term temporal convolutions for action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 40(6): 1510-1517.
    [9]
    Donahue J, Anne Hendricks L, Guadarrama S, et al. Long-term Recurrent Convolutional Networks for Visual Recognition and Description[M]. Elsevier, 2015: 2625-2634.
    [10]
    Soomro K, Zamir A R, Shah M. A dataset of 101 human actions classes from videos in the wild[J/OL]. Computer Vision and Pattern Recognition, arXiv: 1212.0402, 2012.
    [11]
    Kuehne H, Jhuang H, Garrote E, et al. HMDB: A large video database for human motion recognition[C]//ICCV, 2011: 2556-2563.
    [12]
    Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[C]//ICML, 2015: 448-456.
    [13]
    WANG L, QIAO Y, TANG X. Video action detection with relational dynamic- poselets[C]//ECCV, 2014: 565-580.
    [14]
    GAN C, YAO T, YANG K, et a. You lead, we exceed: Labor-free video concept learning by jointly exploiting web videos and images[C]//CVPR, 2016: 923-932.
    [15]
    Simonyan K, Zisserman A. Two-Stream Convolutional Networks for Action Recognition in Videos[J]. Advances in Neural Information Processing Systems, 2014, 150: 109-125. http://de.arxiv.org/pdf/1406.2199
    [16]
    冉鹏, 王灵, 李昕, 等. 改进Softmax分类器的深度卷积神经网络及其在人脸识别中的应用[J]. 上海大学学报: 自然科学版, 2018, 24(3): 352-366. https://www.cnki.com.cn/Article/CJFDTOTAL-SDXZ201803004.htm

    RAN Peng, WANG Ling, LI Xin, et al. Deep convolution neural network based on improved softmax classifier and its application in face recognition[J]. Journal of Shanghai University: Natural Science Edition, 2018, 24(3): 352-366. https://www.cnki.com.cn/Article/CJFDTOTAL-SDXZ201803004.htm
    [17]
    Yasin H, Hussain M, Weber A. Keys for Action: An Efficient Keyframe-Based Approach for 3D Action Recognition Using a Deep Neural Network[J]. Sensors, 2020, 20(8): 2226. DOI: 10.3390/s20082226
    [18]
    GAO Chenqiang, DU Yinhe, LIU Jiang, et al. InfAR dataset: Infrared action recognition at different times[J]. Neurcomputing, 2016, 212: 36-47. DOI: 10.1016/j.neucom.2016.05.094
    [19]
    WANG H, SCHMID C. Action recognition with improved trajectories[C]//Proceedings of the 2013 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2013: 3551-3558.
    [20]
    Du Tran, Lubomir Bourdev, Rob Fergus, et al. Learning spatiotemporal features with 3D convolutional networks[C]//Proceedings of the 2015 IEEE, International Conference on Computer Vision. Piscataway: IEEE, 2015: 4489-4497.
    [21]
    杨天明, 陈志, 岳文静. 基于视频深度学习的时空双流人物动作识别模型[J]. 计算机应用, 2018, 38(3): 895-899. https://www.cnki.com.cn/Article/CJFDTOTAL-JSJY201803050.htm

    YANG T M, CHENG Z, YU, W J, et al. Spatio-temporal two-stream human action recognition model based on video deep learning[J]. Journal of Computer Applications, 2018, 38(3): 895-899, 915. https://www.cnki.com.cn/Article/CJFDTOTAL-JSJY201803050.htm
    [22]
    LIN S, JIA K, CHEN K, et al. Lattice long short-term memory for human action recognition[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2166-2175.
    [23]
    Carrlira J, Gisslrman A. Quo vadis. action recognition? A new model and the kinetics dataset[C]//Proceedings of the 2017 IEEE, Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 4724-4733.
    [24]
    SUN S, KUANG Z, SHENG L, et al. Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition[C]//The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018: 20118-20132.