
Multiscale Infrared Object Detection Network Based on YOLO-MIR Algorithm

ZHOU Jinjie, JI Li, ZHANG Qian, ZHANG Baohui, YUAN Xilin, LIU Yanqing, YUE Jiang

Citation: ZHOU Jinjie, JI Li, ZHANG Qian, ZHANG Baohui, YUAN Xilin, LIU Yanqing, YUE Jiang. Multiscale Infrared Object Detection Network Based on YOLO-MIR Algorithm[J]. Infrared Technology, 2023, 45(5): 506-512.


Details
    Author biography:

    ZHOU Jinjie (1998-), male, master's degree candidate, mainly engaged in research on infrared image processing. E-mail: 1943035411@qq.com

    Corresponding author:

    ZHANG Baohui (1984-), male, professor-level senior engineer, Ph.D., mainly engaged in research on infrared image processing. E-mail: zbhmatt@163.com

  • CLC number: TP391.4

Multiscale Infrared Object Detection Network Based on YOLO-MIR Algorithm

  • Abstract: To address the problem that detection accuracy on infrared images is lower and robustness poorer than on visible-light images, a YOLO-based multi-scale infrared object detection network, YOLO-MIR (YOLO for Multi-scale IR image), is proposed. First, to improve the network's adaptability to infrared images, the feature extraction and fusion modules are improved so that more infrared image detail is retained. Second, to enhance the detection of multi-scale targets, the scale range of the fusion network is enlarged to strengthen the further fusion of infrared image features. Finally, to increase the robustness of the network, a data augmentation algorithm tailored to infrared images is designed. Ablation experiments are conducted to evaluate the effect of the different methods on network performance, and the results show that performance on the infrared dataset is clearly improved. Compared with the mainstream YOLOv7 algorithm, the average detection precision is improved by 3% with the number of parameters unchanged, the adaptability of the network to infrared images is improved, and accurate detection of targets at all scales is achieved.
  • Figure 1. YOLO-MIR network structure: the Backbone performs feature extraction, the Neck performs feature fusion, and the Head performs classification and prediction.

    Figure 2. Pooling operation for single-channel IR images
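
    The exact configuration of this pooling stage is not given on this summary page; as a minimal sketch, the snippet below assumes a standard 2x2 average pooling that halves the spatial resolution of a single-channel infrared tensor (the batch size, image size, kernel size, and stride are illustrative assumptions, not the paper's settings).

        import torch
        import torch.nn as nn

        # Illustrative only: 2x2 average pooling applied to a single-channel IR tensor.
        ir_image = torch.rand(1, 1, 640, 640)            # (batch, channels=1, height, width)
        avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)
        pooled = avg_pool(ir_image)                       # -> torch.Size([1, 1, 320, 320])
        print(pooled.shape)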

    Figure 3. Multi-scale feature pyramid structure
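
    The enlarged fusion network used in YOLO-MIR is not specified in detail on this page; the following is a minimal FPN-style sketch of top-down multi-scale fusion. The class name TinyFPN, the channel widths, and the three-scale layout are assumptions made only for illustration.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class TinyFPN(nn.Module):
            """Illustrative top-down fusion of three backbone scales (not the paper's exact module)."""
            def __init__(self, channels=(256, 512, 1024), out_channels=256):
                super().__init__()
                # 1x1 lateral convolutions bring every scale to a common channel width.
                self.lateral = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in channels])

            def forward(self, c3, c4, c5):
                p5 = self.lateral[2](c5)
                p4 = self.lateral[1](c4) + F.interpolate(p5, scale_factor=2, mode="nearest")
                p3 = self.lateral[0](c3) + F.interpolate(p4, scale_factor=2, mode="nearest")
                return p3, p4, p5

        fpn = TinyFPN()
        c3 = torch.rand(1, 256, 80, 80)
        c4 = torch.rand(1, 512, 40, 40)
        c5 = torch.rand(1, 1024, 20, 20)
        p3, p4, p5 = fpn(c3, c4, c5)    # fused feature maps at strides 8, 16, and 32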

    Figure 4. CIOU schematic
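
    CIOU follows the standard definition of Zheng et al. [12]: the IoU term is penalized by a normalized center-distance term and an aspect-ratio consistency term. A framework-free sketch is given below; the (x1, y1, x2, y2) box format and the small epsilon are assumptions made only for this illustration.

        import math

        def ciou(box1, box2, eps=1e-9):
            """CIOU between two boxes in (x1, y1, x2, y2) form, after Zheng et al. [12]."""
            # Intersection and union areas (IoU term)
            ix1, iy1 = max(box1[0], box2[0]), max(box1[1], box2[1])
            ix2, iy2 = min(box1[2], box2[2]), min(box1[3], box2[3])
            inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
            w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
            w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
            iou = inter / (w1 * h1 + w2 * h2 - inter + eps)
            # Squared center distance, normalized by the enclosing-box diagonal
            rho2 = ((box1[0] + box1[2] - box2[0] - box2[2]) ** 2
                    + (box1[1] + box1[3] - box2[1] - box2[3]) ** 2) / 4
            ex1, ey1 = min(box1[0], box2[0]), min(box1[1], box2[1])
            ex2, ey2 = max(box1[2], box2[2]), max(box1[3], box2[3])
            c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps
            # Aspect-ratio consistency term
            v = (4 / math.pi ** 2) * (math.atan(w2 / (h2 + eps)) - math.atan(w1 / (h1 + eps))) ** 2
            alpha = v / (1 - iou + v + eps)
            return iou - rho2 / c2 - alpha * v   # the CIOU loss is 1 - ciou(...)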

    Figure 5. Visible image preprocessing algorithm
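
    The steps of the visible-light preprocessing are not detailed on this page; purely as an illustration of the kind of operations the reference list points to (color-space/grayscale conversion [14] and bilateral-filter denoising [15]), a hedged sketch with a synthetic stand-in image follows.

        import numpy as np
        import cv2

        # Hedged illustration only: not the paper's actual preprocessing pipeline.
        visible = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)  # stand-in RGB frame
        gray = cv2.cvtColor(visible, cv2.COLOR_RGB2GRAY)                          # drop color information
        smoothed = cv2.bilateralFilter(gray, d=9, sigmaColor=75, sigmaSpace=75)   # edge-preserving denoising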

    Figure 6. Grayscale inversion algorithm
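
    Grayscale inversion maps intensity v to 255 - v, which swaps the white-hot/black-hot polarity of a thermal image. A minimal sketch for 8-bit images follows; how and with what probability the paper applies it during training is not stated here.

        import numpy as np

        def invert_grayscale(ir_image: np.ndarray) -> np.ndarray:
            """Grayscale inversion of an 8-bit IR image: intensity v becomes 255 - v."""
            return 255 - ir_image

        img = np.random.randint(0, 256, size=(480, 640), dtype=np.uint8)  # stand-in IR frame
        augmented = invert_grayscale(img)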

    Figure 7. Training loss curves: curve (a), in red, uses the infrared data augmentation algorithm proposed in this paper; curve (b), in blue, uses the traditional data processing method.

    Figure 8. Comparison of prediction results of each network

    Table 1. Comparison of YOLOv7 data augmentation methods on different datasets

    Category   Dataset     mAP50/%, YOLOv7           mAP50/%, YOLOv7     Change
                           (crop, rotation, flip)    (inversion only)
    Visible    VOC[16]     84.0                      84.2                0.2↑
    Visible    COCO        69.7                      67.9                1.8↓
    Infrared   KAIST[17]   94.6                      97.1                2.5↑
    Infrared   FLIR        89.4                      90.9                1.5↑

    Table 2. YOLO-MIR ablation experiments on the FLIR dataset

    YOLOv7   Avg pooling   Data augmentation   Multi-scale integration   mAP50/%
    90.0
    90.5
    90.9
    91.6
    92.7

    Table 3. Comparison of YOLO-MIR with other networks on the FLIR dataset

    Method         mAP/%   Person/%   Bicycle/%   Car/%   Parameters   FLOPs/B
    Faster R-CNN   79.2    76.4       72.5        88.4    41.2M        156.1
    YOLOv4         79.3    76.2       75.1        87.3    63.9M        128.3
    YOLOv5m        81.6    78.0       78.1        89.2    35.7M        50.2
    SMG-Y[19]      77.0    78.5       65.8        86.6    43.8M        54.7
    PMBW[20]       77.3    81.2       64.0        86.5    36.0M        120.0
    RGBT[21]       82.9    80.1       76.7        91.8    82.7M        130.0
    YOLO-ACN       82.1    79.1       57.9        85.1    34.5M        111.5
    YOLOv7         89.7    88.6       87.2        92.8    36.9M        104.7
    YOLO-MIR       92.7    91.1       91.0        97.2    37.0M        104.8
  • [1] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014: 580-587.
    [2] Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 779-788.
    [3] LI Z, ZHOU F. FSSD: Feature fusion single shot multibox detector[J/OL]. arXiv preprint arXiv:1712.00960, 2017, https://arxiv.org/abs/1712.00960.
    [4] Redmon J, Farhadi A. YOLOv3: An incremental improvement[J/OL]. arXiv preprint arXiv:1804.02767, 2018, https://arxiv.org/abs/1804.02767.
    [5] Jocher G, Chaurasia A, Stoken A, et al. ultralytics/yolov5: v6.1 - TensorRT, TensorFlow Edge TPU and OpenVINO Export and Inference[Z/OL]. 2022, https://doi.org/10.5281/ZENODO.6222936.
    [6] Bochkovskiy A, WANG C Y, LIAO H Y M. YOLOv4: Optimal speed and accuracy of object detection[J/OL]. arXiv preprint arXiv:2004.10934, 2020, https://arxiv.org/abs/2004.10934.
    [7] WANG C Y, Bochkovskiy A, LIAO H Y M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[J/OL]. arXiv preprint arXiv:2207.02696, 2022, https://arxiv.org/abs/2207.02696.
    [8] LIU S, QI L, QIN H, et al. Path aggregation network for instance segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 8759-8768.
    [9] Redmon J, Farhadi A. YOLO9000: Better, faster, stronger[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 6517-6525.
    [10] REN S, HE K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, 39(6): 1137-1149.
    [11] HE K, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision, 2017: 2961-2969.
    [12] ZHENG Z, WANG P, REN D, et al. Enhancing geometric factors in model learning and inference for object detection and instance segmentation[J]. IEEE Transactions on Cybernetics, 2021, 52(8): 8574-8586.
    [13] Veit A, Matera T, Neumann L, et al. COCO-Text: Dataset and benchmark for text detection and recognition in natural images[J/OL]. arXiv preprint arXiv:1601.07140, 2016, https://arxiv.org/abs/1601.07140.
    [14] Smith A R. Color gamut transform pairs[J]. ACM SIGGRAPH Computer Graphics, 1978, 12(3): 12-19. doi: 10.1145/965139.807361
    [15] ZHOU Z, CAO J, WANG H, et al. Image denoising algorithm via doubly bilateral filtering[C]//International Conference on Information Engineering and Computer Science, IEEE, 2009: 1-4.
    [16] Hoiem D, Divvala S K, Hays J H. Pascal VOC 2008 challenge[J/OL]. Computer Science, 2009, https://www.semanticscholar.org/paper/Pascal-VOC-2008-Challenge-Hoiem-Divvala/9c327cf1bb8435a8fba27b6ace50bb907078d8d1.
    [17] ZHAO W Y. Discriminant component analysis for face recognition[C]//Proceedings 15th International Conference on Pattern Recognition, IEEE, 2000, 2: 818-821.
    [18] Venkataraman V, FAN G, FAN X. Target tracking with online feature selection in FLIR imagery[C]//IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2007: 1-8.
    [19] CHEN R, LIU S, MU J, et al. Borrow from source models: Efficient infrared object detection with limited examples[J]. Applied Sciences, 2022, 12(4): 1896. doi: 10.3390/app12041896
    [20] Kera S B, Tadepalli A, Ranjani J J. A paced multi-stage block-wise approach for object detection in thermal images[J]. The Visual Computer, 2022, https://doi.org/10.1007/s00371-022-02445-x.
    [21] Vadidar M, Kariminezhad A, Mayr C, et al. Robust environment perception for automated driving: A unified learning pipeline for visual-infrared object detection[C]//IEEE Intelligent Vehicles Symposium (IV), IEEE, 2022: 367-374.
Publication history
  • Received: 2023-02-06
  • Revised: 2023-03-31
  • Published: 2023-05-20
