Infrared-Visible Image Feature Interaction-Guided Target Detection Method for Substation Equipment

  • Abstract: To address missed detections, false detections, and poor accuracy in single-modal target detection of substation equipment in complex scenes, this paper proposes a target detection method guided by infrared-visible image feature interaction. Based on the YOLOv8 framework, a dual-branch backbone network that takes infrared and visible images as simultaneous inputs is first constructed. Second, to tackle blurred target details, distant small targets, and occluded target points in infrared images, together with the weak separation between targets and background in visible images under low light, both of which degrade the network's ability to capture target features, a Channel-prior Convolutional Attention Module (CPCAM) is introduced into the dual-branch feature extraction network to strengthen feature extraction for key targets. Finally, to overcome the tendency of existing image fusion methods to let noise swamp target features during fusion, a Multi-modal Cross-guided Feature Module (MCFM) is proposed, which uses image brightness and contrast as predictive weights to drive bidirectional, adaptive cross-modal feature fusion through guided feature sharing. Experiments show that the proposed fusion-based detection method improves mAP@0.5 by 19.98% and 35.25% over the single-modal YOLOv8 network on visible and infrared image detection respectively, improves detection accuracy by 11.08% on average over mainstream image fusion methods, and exhibits no missed or false detections.
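
The abstract names CPCAM but gives no implementation detail. Below is a minimal sketch of the kind of channel-prior convolutional attention block the name suggests: channel attention computed from pooled descriptors, followed by a depthwise-convolutional spatial attention. All class and parameter names here are hypothetical illustrations, not the authors' released code.

```python
import torch
import torch.nn as nn

class ChannelPriorConvAttention(nn.Module):
    """Sketch of a channel-prior attention block: a channel prior from
    pooled global descriptors, then depthwise-conv spatial attention."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        # Shared bottleneck MLP over pooled descriptors yields the channel prior.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False),
        )
        # Depthwise convolution aggregates spatial context per channel.
        self.spatial = nn.Conv2d(channels, channels, kernel_size=5,
                                 padding=2, groups=channels, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        channel_prior = torch.sigmoid(
            self.mlp(self.avg_pool(x)) + self.mlp(self.max_pool(x)))
        x = x * channel_prior                        # re-weight channels
        spatial_map = torch.sigmoid(self.spatial(x)) # per-channel spatial attention
        return x * spatial_map
```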
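
The MCFM description is more concrete: per-image brightness and contrast act as predicted weights that guide bidirectional fusion of the two modality branches. The sketch below, assuming brightness is the image mean and contrast the image standard deviation, shows one plausible reading of that mechanism; the module, layer sizes, and fusion form are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class CrossGuidedFusion(nn.Module):
    """Sketch: map brightness/contrast statistics of both input images to a
    pair of modality weights, then fuse IR and visible features both ways."""
    def __init__(self, channels: int):
        super().__init__()
        # Maps 4 scalars (mean/std per modality) to 2 fusion weights.
        self.weight_head = nn.Sequential(
            nn.Linear(4, 16), nn.ReLU(inplace=True), nn.Linear(16, 2))
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)

    @staticmethod
    def _stats(img: torch.Tensor) -> torch.Tensor:
        # Per-image brightness (mean) and contrast (std) over all pixels.
        flat = img.flatten(1)
        return torch.stack([flat.mean(dim=1), flat.std(dim=1)], dim=1)

    def forward(self, feat_ir, feat_vis, img_ir, img_vis):
        stats = torch.cat([self._stats(img_ir), self._stats(img_vis)], dim=1)
        w = torch.softmax(self.weight_head(stats), dim=1)   # (B, 2)
        w_ir = w[:, 0].view(-1, 1, 1, 1)
        w_vis = w[:, 1].view(-1, 1, 1, 1)
        # Bidirectional guidance: each branch absorbs the other modality's
        # features, scaled by that modality's predicted reliability.
        guided_ir = feat_ir + w_vis * feat_vis
        guided_vis = feat_vis + w_ir * feat_ir
        return self.proj(torch.cat([guided_ir, guided_vis], dim=1))
```

Weighting by global image statistics rather than learned attention alone is what would let a low-light visible image or a washed-out infrared image contribute less, which matches the abstract's stated goal of keeping noise from swamping the fused target features.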

     
