Infrared Image Object Detection Method Based on DCS-YOLOv8 Model
-
Abstract: To address the YOLOv8 model's limited ability to detect occluded and dim small infrared targets in low signal-to-noise-ratio and complex task scenarios, an improved object detection method, DCS-YOLOv8 (DCN_C2f-CA-SIoU-YOLOv8), is proposed. Building on the YOLOv8 framework, the backbone network incorporates a lightweight DCN_C2f (Deformable Convolution Network C2f) module that adaptively adjusts the network's visual receptive field and strengthens multi-scale feature representation. The feature fusion network introduces a coordinate attention (CA) module, which captures spatial position dependencies among multiple targets to improve localization accuracy. The bounding-box regression loss is improved with SIoU (Scylla IoU), which matches the relative displacement direction between predicted and ground-truth boxes, accelerating convergence and raising detection and localization accuracy. Experimental results show that, compared with the YOLOv8-n/s/m/l/x series models, DCS-YOLOv8 improves the average mAP@0.5 on the FLIR, OTCBVS, and VEDAI test sets by 6.8%, 0.6%, and 4.0%, reaching 86.5%, 99.0%, and 75.6%, respectively. The model's inference speed also meets the real-time requirements of infrared object detection tasks.
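To make the three modifications concrete, minimal PyTorch sketches follow. They are illustrative reconstructions from the descriptions above, not the authors' released code. The first sketches the DCN_C2f idea, assuming a C2f-style split-and-concatenate layout and torchvision's DeformConv2d; the helper names (conv_bn_silu, DeformBottleneck) and the zero-initialized offset branch are our own choices.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


def conv_bn_silu(c1, c2, k=1):
    """Conv + BN + SiLU, mirroring YOLOv8's basic Conv block."""
    return nn.Sequential(
        nn.Conv2d(c1, c2, k, padding=k // 2, bias=False),
        nn.BatchNorm2d(c2),
        nn.SiLU(),
    )


class DeformBottleneck(nn.Module):
    """Bottleneck whose 3x3 convolution is swapped for a deformable one."""

    def __init__(self, c, shortcut=True):
        super().__init__()
        self.cv1 = conv_bn_silu(c, c, 3)
        # 2 offsets (dy, dx) per position of the 3x3 kernel -> 18 channels.
        self.offset = nn.Conv2d(c, 2 * 3 * 3, 3, padding=1)
        nn.init.zeros_(self.offset.weight)  # start as a regular conv
        nn.init.zeros_(self.offset.bias)
        self.dcn = DeformConv2d(c, c, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(c)
        self.act = nn.SiLU()
        self.add = shortcut

    def forward(self, x):
        y = self.cv1(x)
        # Sampling locations shift with the predicted offsets, letting the
        # effective receptive field adapt to the target's shape and scale.
        y = self.act(self.bn(self.dcn(y, self.offset(y))))
        return x + y if self.add else y


class DCN_C2f(nn.Module):
    """C2f-style block: split, stack bottlenecks, concatenate all branches."""

    def __init__(self, c1, c2, n=1):
        super().__init__()
        c = c2 // 2
        self.cv1 = conv_bn_silu(c1, 2 * c)
        self.m = nn.ModuleList(DeformBottleneck(c) for _ in range(n))
        self.cv2 = conv_bn_silu((2 + n) * c, c2)

    def forward(self, x):
        y = list(self.cv1(x).chunk(2, dim=1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, dim=1))
```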
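The coordinate attention module follows the published CA design (Hou et al., CVPR 2021): global pooling is factorized into two 1-D encodings along height and width, so each attention weight retains position information along one axis. A sketch, with the reduction ratio taken as the paper's common default (an assumption):

```python
import torch
import torch.nn as nn


class CoordinateAttention(nn.Module):
    """Coordinate attention: two 1-D pooled encodings gate the feature map."""

    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool over width  -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool over height -> (B, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        x_h = self.pool_h(x)                      # (B, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)  # (B, C, W, 1)
        # Joint transform over the concatenated directional encodings.
        y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # (B, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        return x * a_h * a_w  # position-aware channel reweighting
```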
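The SIoU loss (Gevorgyan, 2022) augments the IoU term with angle, distance, and shape costs, so the gradient also encodes the direction of the center offset between predicted and ground-truth boxes. A sketch under the usual formulation, with the shape exponent theta = 4 as a commonly used value (an assumption; this excerpt does not state it):

```python
import torch


def siou_loss(pred, target, theta=4.0, eps=1e-7):
    """SIoU bounding-box loss sketch. pred, target: (N, 4) in (cx, cy, w, h)."""
    px, py, pw, ph = pred.unbind(-1)
    tx, ty, tw, th = target.unbind(-1)

    # IoU term.
    inter_w = (torch.min(px + pw / 2, tx + tw / 2) -
               torch.max(px - pw / 2, tx - tw / 2)).clamp(min=0)
    inter_h = (torch.min(py + ph / 2, ty + th / 2) -
               torch.max(py - ph / 2, ty - th / 2)).clamp(min=0)
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter + eps
    iou = inter / union

    # Smallest enclosing box dimensions.
    cw = torch.max(px + pw / 2, tx + tw / 2) - torch.min(px - pw / 2, tx - tw / 2)
    ch = torch.max(py + ph / 2, ty + th / 2) - torch.min(py - ph / 2, ty - th / 2)

    # Angle cost: peaks when the center offset is at 45 degrees.
    dx, dy = tx - px, ty - py
    sigma = torch.sqrt(dx ** 2 + dy ** 2) + eps
    sin_alpha = (dy.abs() / sigma).clamp(max=1 - eps)
    angle = torch.sin(2 * torch.arcsin(sin_alpha))

    # Distance cost, modulated by the angle cost (gamma in [1, 2]).
    gamma = 2 - angle
    rho_x = (dx / (cw + eps)) ** 2
    rho_y = (dy / (ch + eps)) ** 2
    dist = (1 - torch.exp(-gamma * rho_x)) + (1 - torch.exp(-gamma * rho_y))

    # Shape cost: penalizes width/height mismatch.
    omega_w = (pw - tw).abs() / (torch.max(pw, tw) + eps)
    omega_h = (ph - th).abs() / (torch.max(ph, th) + eps)
    shape = (1 - torch.exp(-omega_w)) ** theta + (1 - torch.exp(-omega_h)) ** theta

    return 1 - iou + (dist + shape) / 2
```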
-
Table 1 Model training hyperparameter settings
| Hyperparameter | Setting |
|---|---|
| Input Resolution | 640×640 |
| Initial Learning Rate (lr0) | 0.01 |
| Final Learning Rate Factor (lrf) | 0.01 |
| Momentum | 0.937 |
| Weight Decay | 0.0005 |
| Batch Size | 4 |
| Epochs | 200 |
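The settings in Table 1 map one-to-one onto arguments of the Ultralytics YOLOv8 trainer. The snippet below is a convenience sketch, assuming the ultralytics package and a hypothetical dataset config flir.yaml; it is not the authors' training script, and the DCN_C2f/CA model changes would be made separately in the model definition.

```python
from ultralytics import YOLO

# Hedged sketch: reproduce Table 1's hyperparameters with the Ultralytics
# trainer. "flir.yaml" is a hypothetical dataset configuration file.
model = YOLO("yolov8n.yaml")  # base architecture before the paper's edits
model.train(
    data="flir.yaml",
    imgsz=640,            # Input Resolution
    lr0=0.01,             # Initial Learning Rate
    lrf=0.01,             # Final LR factor (final LR = lr0 * lrf)
    momentum=0.937,
    weight_decay=0.0005,
    batch=4,
    epochs=200,
)
```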
Table 2 Comparison of ablation experiment results on different datasets
| B | D | C | S | Params/M | GFLOPs | Precision/% (D1/D2/D3) | Recall/% (D1/D2/D3) | mAP@0.5/% (D1/D2/D3) |
|---|---|---|---|---|---|---|---|---|
| √ | | | | 3.2 | 8.2 | 74.5 / 94.1 / 73.2 | 68.6 / 90.0 / 43.5 | 77.2 / 97.6 / 60.5 |
| √ | √ | | | 3.4 | 8.3 | 80.1 / 94.5 / 74.4 | 74.3 / 90.2 / 43.9 | 79.5 / 98.0 / 61.3 |
| √ | | √ | | 3.2 | 8.2 | 80.0 / 94.4 / 80.1 | 73.1 / 93.3 / 49.6 | 78.0 / 97.9 / 62.8 |
| √ | | | √ | 3.2 | 8.2 | 80.3 / 95.7 / 73.8 | 75.5 / 94.7 / 68.1 | 80.8 / 97.8 / 64.3 |
| √ | √ | √ | | 3.4 | 8.3 | 80.5 / 94.3 / 71.7 | 75.2 / 93.3 / 69.8 | 80.5 / 98.2 / 67.6 |
| √ | √ | | √ | 3.4 | 8.3 | 80.8 / 98.5 / 69.3 | 75.5 / 96.3 / 68.0 | 81.5 / 98.3 / 68.1 |
| √ | | √ | √ | 3.2 | 8.2 | 81.2 / 99.5 / 69.5 | 75.6 / 95.4 / 72.1 | 82.0 / 98.0 / 70.5 |
| √ | √ | √ | √ | 3.4 | 8.3 | 81.1 / 99.3 / 73.5 | 75.7 / 95.9 / 70.5 | 83.1 / 98.5 / 71.3 |

Note: B = Base (YOLOv8n), D = DCN_C2f, C = CA, S = SIoU; D1 = FLIR, D2 = OTCBVS, D3 = VEDAI.
Table 3 Object detection results of different models
| Model | Params/M | GFLOPs | mAP@0.5/% (D1/D2/D3) | Inference time/ms (D1/D2/D3) |
|---|---|---|---|---|
| Faster R-CNN | 15.8 | 28.3 | 71.1 / 87.8 / 52.4 | 30.4 / 102.3 / 63.1 |
| YOLOv3_tiny | 8.7 | 13.0 | 74.2 / 90.5 / 58.1 | 12.6 / 37.1 / 21.3 |
| YOLOv5n | 7.0 | 16.0 | 75.1 / 95.8 / 59.3 | 6.9 / 25.1 / 11.7 |
| YOLOv8n | 3.2 | 8.2 | 77.2 / 97.6 / 67.5 | 7.1 / 23.7 / 9.9 |
| YOLOv8s | 11.2 | 28.8 | 79.3 / 98.1 / 71.5 | 10.8 / 29.8 / 12.3 |
| YOLOv8m | 25.9 | 79.1 | 81.5 / 98.5 / 72.6 | 20.5 / 41.0 / 15.2 |
| YOLOv8l | 43.6 | 165.4 | 82.7 / 98.9 / 74.8 | 35.1 / 52.5 / 19.5 |
| YOLOv8x | 68.2 | 258.1 | 84.5 / 99.1 / 76.9 | 47.5 / 70.6 / 27.1 |
| DCS-YOLOv8n | 3.4 | 8.3 | 83.1 / 98.5 / 72.5 | 7.1 / 22.9 / 10.6 |
| DCS-YOLOv8s | 11.3 | 29.2 | 85.2 / 98.9 / 73.8 | 10.9 / 28.7 / 13.1 |
| DCS-YOLOv8m | 25.9 | 79.5 | 87.4 / 99.2 / 75.9 | 20.6 / 38.1 / 16.4 |
| DCS-YOLOv8l | 43.8 | 165.8 | 88.1 / 99.3 / 77.2 | 35.3 / 50.4 / 21.0 |
| DCS-YOLOv8x | 69.1 | 258.5 | 88.6 / 99.3 / 78.6 | 47.9 / 62.7 / 29.1 |

Note: D1 = FLIR, D2 = OTCBVS, D3 = VEDAI.
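The summary figures quoted in the abstract are averages of Table 3 over the five model scales; the short script below reproduces the 86.5%, 99.0%, and 75.6% means (our reading of the abstract's aggregation, not code from the paper).

```python
# mAP@0.5 values for DCS-YOLOv8n/s/m/l/x, taken from Table 3.
dcs = {
    "FLIR":   [83.1, 85.2, 87.4, 88.1, 88.6],
    "OTCBVS": [98.5, 98.9, 99.2, 99.3, 99.3],
    "VEDAI":  [72.5, 73.8, 75.9, 77.2, 78.6],
}

for dataset, values in dcs.items():
    mean_map = sum(values) / len(values)
    print(f"{dataset}: mean mAP@0.5 over n/s/m/l/x = {mean_map:.1f}%")
# Prints 86.5% (FLIR), 99.0% (OTCBVS), 75.6% (VEDAI), matching the abstract.
```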