Abstract:
Aerial small object detection faces significant challenges from insufficient feature representation, background confusion, and dense object distributions. These issues are compounded when algorithms are deployed in environments with limited computational resources, where both accuracy and speed must be carefully optimized. To address these problems, this paper proposes a lightweight detection model, MSAF-YOLO, which integrates feature aggregation, feature enhancement, and spatial awareness to improve detection performance. The network structure is optimized by removing the P5 detection head, originally used for large-object detection, and adding a P2 detection head to strengthen the model's focus on small objects. MSAF-YOLO introduces three novel modules: the Multi-Scale Spatial Aggregation Module (MSAM), the Multi-Scale Edge Feature Enhancement Module (MEFEM), and the Channel-Spatial Attention Module (CSAM). These modules respectively enhance the model's capabilities in multi-scale feature fusion, local feature perception, and global cross-channel and cross-spatial correlation, without increasing model complexity. Experimental results on the VisDrone2019 dataset show that, compared with the baseline model YOLOv8s, MSAF-YOLO achieves an 8.2% improvement in mAP50 and a 5.9% improvement in mAP50-95, while reducing the number of parameters by 56.3%. The proposed method demonstrates superior performance on small object detection tasks, verifying its effectiveness and practical value.