Unmanned Aerial Vehicle Infrared Image Target Detection Method Based on Multi-Scale Attention

Zhu Lei; Zhao Xingrui; Li Guangjian

Abstract

Infrared images captured by unmanned aerial vehicles (UAVs) contain little texture information and have weak edge features, which makes it difficult for lightweight networks to improve detection accuracy for small-scale targets. To address this, UIDNet, a lightweight UAV infrared target detection network based on multi-scale attention, is proposed. Building on the YOLOv8 network, the method first prunes the last downsampling layer of the original network, which reduces the parameter count and mitigates the loss of small-target detail features caused by an overly deep convolutional neural network. Next, a feature-extraction module, C2f-EMA, is constructed from efficient multi-scale attention (EMA) with cross-spatial learning; through channel reshaping and dimension grouping, it suppresses background interference while preserving and highlighting small-target features as much as possible. Finally, the WIoU loss function replaces the CIoU loss function of the original network to adjust target weights dynamically and further improve small-target detection performance. Experimental results on the HIT-UAV dataset show that, compared with other advanced networks such as YOLOv8n and PiCoDet, UIDNet has a smaller model size and better detection performance. Relative to the original YOLOv8n model, UIDNet improves average detection accuracy by 1.7%, reduces the parameter count by 67.4%, and compresses the model size by 63.5%, to only 2.3 MB.
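To make the loss-function change more concrete, the following is a minimal sketch of a WIoU-style loss for a single pair of boxes. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which WIoU variant UIDNet uses, so only the basic distance-attention form (IoU loss scaled by an exponential of the normalized center distance) is shown, and the dynamic weighting mentioned in the abstract is not modeled. The function names `iou` and `wiou_v1` and the `(x1, y1, x2, y2)` box layout are hypothetical choices made for this example.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def wiou_v1(pred, gt):
    """Assumed basic WIoU form: (1 - IoU) scaled by a center-distance factor."""
    l_iou = 1.0 - iou(pred, gt)

    # Box centers.
    cx_p, cy_p = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    cx_g, cy_g = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2

    # Width and height of the smallest box enclosing both boxes.
    # In a training framework this normalizer would be treated as a constant
    # (detached from the gradient); with plain floats there is nothing to detach.
    w_enc = max(pred[2], gt[2]) - min(pred[0], gt[0])
    h_enc = max(pred[3], gt[3]) - min(pred[1], gt[1])

    dist_sq = (cx_p - cx_g) ** 2 + (cy_p - cy_g) ** 2
    scale = math.exp(dist_sq / (w_enc ** 2 + h_enc ** 2))
    return scale * l_iou


if __name__ == "__main__":
    print(wiou_v1((0, 0, 10, 10), (0, 0, 10, 10)))  # identical boxes
    print(wiou_v1((0, 0, 10, 10), (5, 0, 15, 10)))  # shifted prediction
```

Because the scale factor is at least 1 and grows with the center offset, a poorly localized prediction is penalized more than the plain IoU loss alone would penalize it, while a perfectly aligned box still yields zero loss.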
