LIU Fukuan, LUO Suyun, HE Jia, ZHA Chaoneng. FVIT-YOLO v8: Improved YOLO v8 Small Object Detection Based on Multi-scale Fusion Attention Mechanism[J]. Infrared Technology, 2024, 46(8): 912-922.

FVIT-YOLO v8: Improved YOLO v8 Small Object Detection Based on Multi-scale Fusion Attention Mechanism

Abstract: This study investigates small-target detection in remote sensing and drone aerial images. Such images feature small target scales, dense target distributions, and complex backgrounds, all of which make feature extraction difficult. To improve accuracy, most current small-target detection algorithms ignore parameter count and inference speed, which makes them impractical. To address these problems, this study proposes an improved YOLO v8 small-target detection algorithm based on a lightweight multiscale fusion attention mechanism. The algorithm first adds the F operator to the FPN structure of YOLO v8 to implement weighted fusion of multiscale features. It then removes the P4 and P5 prediction layers from the network's prediction layers and adds a P2 layer for small-target prediction. Finally, it modifies a lightweight self-attention mechanism by partitioning the input image into grids and integrating them, and uses the modified mechanism to replace the C2f module in the FPN, giving the algorithm better global perception while greatly reducing the parameter count. Compared with YOLO v8s, the proposed algorithm improves mAP on the DOTA dataset by 4.4%, reduces the network parameter count by 52%, and reaches 46 frames/s. On the VisDrone dataset, it improves accuracy by 8.2%.
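The abstract does not define the F operator, so the following is only a minimal sketch of one common way to do weighted fusion of multiscale features: learnable non-negative weights normalized to sum to roughly one. The function name `fuse_weighted`, the `eps` parameter, and the flat-list representation of feature maps are all assumptions made for illustration, not the authors' implementation. In a real FPN the feature maps would first be resized to a common resolution before being fused.

```python
def fuse_weighted(features, raw_weights, eps=1e-4):
    """Fuse same-length feature vectors with normalized non-negative weights.

    Hypothetical helper: `features` is a list of equally sized flat lists,
    `raw_weights` holds one (learnable) scalar per feature map.
    """
    if len(features) != len(raw_weights):
        raise ValueError("need exactly one weight per feature map")
    size = len(features[0])
    if any(len(f) != size for f in features):
        raise ValueError("feature maps must share one shape before fusion")

    # Clamp at zero (a ReLU) so no branch can contribute negatively.
    weights = [max(w, 0.0) for w in raw_weights]
    # Normalize so the weights sum to about one; eps avoids division by zero.
    total = sum(weights) + eps
    normalized = [w / total for w in weights]

    # Element-wise weighted sum across the input feature maps.
    return [
        sum(w * f[i] for w, f in zip(normalized, features))
        for i in range(size)
    ]


# Two toy "feature maps" with equal raw weights average element-wise.
fused = fuse_weighted([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0], eps=0.0)
```

Normalizing the weights keeps the fused output on the same scale as its inputs, which is one reason this style of fusion is often chosen over an unweighted sum.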

     
