LIU Fukuan, LUO Suyun, HE Jia, ZHA Chaoneng. FVIT-YOLO v8: Improved YOLO v8 Small Object Detection Based on Multi-scale Fusion Attention Mechanism[J]. Infrared Technology, 2024, 46(8): 912-922.

FVIT-YOLO v8: Improved YOLO v8 Small Object Detection Based on Multi-scale Fusion Attention Mechanism

More Information
  • Received Date: April 25, 2023
  • Revised Date: August 01, 2024
  • This study investigates small-target detection in remote sensing and drone aerial images. Such images are characterized by small target scales, dense target distributions, and complex backgrounds, all of which make feature extraction difficult. In pursuit of higher accuracy, most current small-target detection algorithms ignore the impact of parameter count and inference speed, which limits their practicality. To address these problems, this study proposes an improved YOLO v8 small-target detection algorithm based on a lightweight multiscale fusion attention mechanism. The algorithm adds an F operator to the FPN structure of YOLO v8 and designs a weighted fusion of multiscale features; it removes the P4 and P5 prediction layers and adds a P2 layer dedicated to small-target prediction; and it integrates a lightweight attention mechanism with improved grid segmentation of the image input, using it to replace the C2f module in the improved FPN. These changes give the algorithm better global perception while greatly reducing the parameter count. Compared with YOLO v8s, the proposed algorithm increases mAP on the DOTA dataset by 4.4%, reduces the network parameter count by 52%, and reaches 46 FPS. On the VisDrone dataset, it improves accuracy by 8.3%.
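The abstract mentions a weighted fusion of multiscale features inside the FPN. The paper's own F operator is not specified in the abstract, so the following is only a minimal sketch of what such a weighted fusion commonly looks like (fast normalized fusion in the style of BiFPN); the function name, shapes, and weight initialization here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def weighted_fusion(features, weights, eps=1e-4):
    """Fuse same-shape feature maps with non-negative, normalized weights.

    In a trained network the weights would be learnable parameters; here
    they are plain arrays so the sketch stays self-contained.
    """
    w = np.maximum(np.asarray(weights, dtype=np.float64), 0.0)  # clamp to >= 0
    w = w / (w.sum() + eps)                                     # normalize to ~1
    return sum(wi * f for wi, f in zip(w, features))

# Two feature maps at the same scale, e.g. a lateral P2 feature and an
# upsampled P3 feature (batch, channels, height, width):
p2_lateral = np.ones((1, 64, 80, 80))
p3_up = 2.0 * np.ones((1, 64, 80, 80))

fused = weighted_fusion([p2_lateral, p3_up], [1.0, 1.0])
```

With equal weights the result is approximately the element-wise mean of the inputs; during training, the weights shift to emphasize whichever scale carries more useful detail for small targets.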
