Abstract:
Dense pedestrian detection involves a large number of targets, often with interference from occlusion and complex backgrounds, which readily leads to insufficient detection accuracy, missed detections, and false detections. To address these challenges, MSD-YOLO (C2f-MCLU, STNet, DyHead-DCNv4), a modified YOLOv10-n model, is proposed. First, a C2f-MCLU module is introduced that improves feature expression by establishing a close interdependence between the channel dimension and spatial position. Second, a bidirectional fusion pyramid structure with small-target enhancement is designed to reconstruct the neck network, so that the model can extract finer features. Third, a DyHead-DCNv4 detection head is constructed to further improve the recognition of severely occluded pedestrians. Experimental results show that, compared with YOLOv10-n, the improved model increases accuracy on the CrowdHuman and WiderPerson datasets by 3.3% and 1.5%, respectively, with only 3.0M parameters and a computational cost of 10.6 GFLOPs, which meets the requirements of high-accuracy deployment in low-compute environments.