
Infrared Vehicle Target Detection Based on Convolutional Neural Network without Pre-training

CHEN Gao, WANG Weihua, LIN Dandan

Citation: CHEN Gao, WANG Weihua, LIN Dandan. Infrared Vehicle Target Detection Based on Convolutional Neural Network without Pre-training[J]. Infrared Technology, 2021, 43(4): 342-348.


Article information
    About the author:

    CHEN Gao (b. 1996), male, from Anhui Province, master's student. Research interests: infrared image processing and object detection. E-mail: jzchenriver@163.com

  • CLC number: TP391


  • Abstract: To address the over-reliance of convolutional neural network based object detection algorithms on pre-trained weights, particularly for object detection in infrared scenes where data are scarce, this paper incorporates attention modules to mitigate the drop in detection performance caused by forgoing pre-training. Building on the YOLO v3 algorithm, SE and CBAM modules, which mimic the human attention mechanism, are embedded in the network to recalibrate the extracted features at the channel and spatial levels. Weights are assigned adaptively according to the importance of each feature, ultimately improving detection accuracy. On the infrared vehicle target dataset constructed for this study, the attention modules markedly improve the detection accuracy of the network trained without pre-training; the network with the CBAM module reaches 86.3 mAP. The experimental results show that attention modules strengthen the network's feature extraction capability and free it from over-reliance on pre-trained weights.
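The channel-level recalibration mentioned in the abstract is the squeeze-and-excitation (SE) mechanism. Below is a minimal PyTorch sketch of an SE block, not the authors' exact implementation; the reduction ratio of 16 is the usual default and an assumption here.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: reweight channels by learned importance."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global average per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # excitation: recalibrate each channel
```

Channels the excitation judges important keep their magnitude while uninformative ones are suppressed, which is the adaptive weighting the abstract refers to.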
  • Figure 1.  The prediction of the target location in YOLO v3

    Figure 2.  The structure of the SE module

    Figure 3.  The channel attention in CBAM

    Figure 4.  The spatial attention in CBAM

    Figure 5.  The structure of CBAM

    Figure 6.  The residual block before and after improvement

    Figure 7.  The results on the test dataset when epoch = 120

    Figure 8.  The results on the test dataset when epoch = 300

    Figure 9.  Part of the detection results on the test dataset
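Figures 3-5 outline CBAM's two sequential stages: channel attention computed from average- and max-pooled descriptors passed through a shared MLP, followed by spatial attention computed by a convolution over channel-wise average and max maps. A minimal PyTorch sketch follows; the reduction ratio of 16 and the 7×7 spatial kernel are CBAM's published defaults, assumed here rather than taken from this paper.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Figure 3: shared MLP over average- and max-pooled channel descriptors."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return torch.sigmoid(avg + mx)

class SpatialAttention(nn.Module):
    """Figure 4: convolution over channel-wise average and max maps."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = torch.mean(x, dim=1, keepdim=True)
        mx, _ = torch.max(x, dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    """Figure 5: channel attention first, then spatial attention."""
    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.ca(x)
        return x * self.sa(x)
```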

    Table 1.  The structure of the improved DarkNet-53

    Name     Type                                                        Num        Size
             Convolutional                                               32         3×3
             Convolutional                                               64         3×3, stride 2
    Stage 1  (Convolutional + Convolutional + Attention + Residual) × 1  32, 64     1×1, 3×3
             Convolutional                                               128        3×3, stride 2
    Stage 2  (Convolutional + Convolutional + Attention + Residual) × 2  64, 128    1×1, 3×3
             Convolutional                                               256        3×3, stride 2
    Stage 3  (Convolutional + Convolutional + Attention + Residual) × 8  128, 256   1×1, 3×3
             Convolutional                                               512        3×3, stride 2
    Stage 4  (Convolutional + Convolutional + Attention + Residual) × 8  256, 512   1×1, 3×3
             Convolutional                                               1024       3×3, stride 2
    Stage 5  (Convolutional + Convolutional + Attention + Residual) × 4  512, 1024  1×1, 3×3

    (Within each stage block, the 1×1 convolution uses the first filter count and the 3×3 convolution the second.)
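Table 1 (and Figure 6) place the attention step between the second convolution and the residual addition. Below is a minimal PyTorch sketch of one such stage block, assuming the standard DarkNet conv-BN-LeakyReLU unit; the attention argument can be the SEBlock or CBAM sketched above.

```python
import torch
import torch.nn as nn

def conv_bn_leaky(c_in: int, c_out: int, k: int, s: int = 1) -> nn.Sequential:
    """DarkNet-style unit: convolution + batch norm + LeakyReLU."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, stride=s, padding=k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1, inplace=True),
    )

class AttentionResidual(nn.Module):
    """One Table 1 stage block: 1x1 conv, 3x3 conv, attention, residual add."""
    def __init__(self, channels: int, attention: nn.Module):
        super().__init__()
        self.conv1 = conv_bn_leaky(channels, channels // 2, 1)  # e.g. 64 -> 32
        self.conv2 = conv_bn_leaky(channels // 2, channels, 3)  # e.g. 32 -> 64
        self.attention = attention  # SE or CBAM instance

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv2(self.conv1(x))
        out = self.attention(out)  # recalibrate before the skip connection
        return x + out             # residual addition
```

For example, Stage 2 would stack two AttentionResidual(128, CBAM(128)) blocks after the stride-2 convolution that raises the channel count to 128.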

    Table 2.  The structure of the improved prediction subnet

    Name        Type                                 Num                        Size
    Prediction  (Convolutional + Convolutional) × 1  N, 2N (N = 512, 256, 128)  1×1, 3×3
                Attention
                Convolutional                        4+1+N_cls                  1×1
                YOLO
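As a minimal sketch of one Table 2 branch: standard YOLO v3 predicts three anchor boxes per scale, so the final 1×1 convolution below outputs 3×(4+1+N_cls) channels; Table 2 lists the per-anchor count, and the anchor factor is an assumption carried over from YOLO v3 rather than stated in the table.

```python
import torch
import torch.nn as nn

def _cbl(c_in: int, c_out: int, k: int) -> nn.Sequential:
    # conv + batch norm + LeakyReLU, same unit as the backbone sketch above
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1, inplace=True),
    )

class PredictionBranch(nn.Module):
    """One Table 2 branch: (1x1, 3x3) convs, attention, then the detection conv."""
    def __init__(self, c_in: int, n: int, n_cls: int, attention: nn.Module,
                 n_anchors: int = 3):
        super().__init__()
        self.body = nn.Sequential(
            _cbl(c_in, n, 1),   # 1x1, N filters
            _cbl(n, 2 * n, 3),  # 3x3, 2N filters
            attention,          # attention module, as listed in Table 2
        )
        # 4 box offsets + 1 objectness score + N_cls class scores, per anchor
        self.detect = nn.Conv2d(2 * n, n_anchors * (4 + 1 + n_cls), 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.detect(self.body(x))
```

The three branches (N = 512, 256, 128) correspond to the three detection scales of YOLO v3.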

    Table 3.  The comparison of results when epoch = 120

    Epoch = 120                Precision/%   Recall/%   mAP@0.5/%
    YOLO v3-mscoco-pretrained  82.3          77.0       84.7
    YOLO v3-no-pretrained      65.3          22.4       38.9
    YOLO v3-SE                 68.9          36.2       51.7
    YOLO v3-CBAM               78.2          40.5       61.7

    Table 4.  The comparison of results when epoch = 300

    Epoch = 300             Precision/%   Recall/%   mAP@0.5/%
    YOLO v3-no-pretrained   83.7          65.3       80.6
    YOLO v3-SE              87.0          72.4       85.6
    YOLO v3-CBAM            87.8          75.8       86.3
Publication history
  • Received: 2020-07-17
  • Revised: 2020-07-30
  • Published: 2021-04-20
