基于分组特征提取的轻量型多源目标检测

万军; 周凯; 何文磊

基于分组特征提取的轻量型多源目标检测

1.
三门峡社会管理职业学院, 河南三门峡 472000
2.
西安工程大学电子信息学院, 陕西西安 710048
3.
中国科学院成都计算机应用研究所, 四川成都 610041

基金项目:

国家自然科学基金 62072362

详细信息

作者简介:
万军（1976-），男，汉族，河南洛阳人，讲师，研究方向：人工智能、模式识别、计算机技术。E-mail: lmclw13@sina.com

中图分类号: TP391.41
计量
- 文章访问数: 38
- HTML全文浏览量: 3
- PDF下载量: 12
出版历程
- 收稿日期: 2023-05-21
- 修回日期: 2023-07-02
- 刊出日期: 2025-03-19

Lightweight Multisource Object Detection Based on Group Feature Extraction

1.
Sanmenxia College of Social Administration, Sanmengxia 472000, China
2.
Electronic Information College, Xi'an Polytechnic University, Xi'an 710048, China
3.
Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu 610041, China

摘要

摘要:
为兼顾多源目标检测网络的精度与效率，将分组卷积作用于目标多模态特征中，并配合注意力多尺度结构以及改进的目标框筛选策略，设计了一种轻量级的红外与可见光目标检测模型。模型先以多种特征降维策略对输入图像进行采样，降低噪声及冗余信息的影响；其次，根据特征通道所属模态进行分组，并利用深度可分离卷积分别对红外特征、可见光特征以及融合特征进行提取，提升多源特征提取结构的多样性以及高效性；然后，针对各维度多模态特征，引入改进的注意力机制来增强关键特征，再结合邻域多尺度融合结构保障网络的尺度不变性；最后，利用优化后的非极大值抑制算法来综合各尺度目标预测结果，精确检测出各个目标。通过在KAIST、FLIR、RGBT公开数据集上的测试结果表明，所提模型有效提升了目标检测性能，并且相对于同类型多源目标检测方法，该模型也体现出较高的鲁棒性和泛化性，可以更好地实现目标检测。
- 多源目标检测 /
- 分组特征提取 /
- 注意力多尺度 /
- 非极大值抑制
Abstract:
To balance the accuracy and efficiency of multisource object detection networks, a lightweight infrared and visible light object detection model with a multiscale attention structure and an improved object-box filtering strategy was designed by applying group convolution to multimodal object features. First, multiple feature dimensionality reduction strategies were adopted to sample the input image and reduce the impact of noise and redundant information. Subsequently, feature grouping was performed based on the mode of the feature channel, and deep separable convolution was used to extract infrared, visible, and fused features, to enhance the diversity and efficiency of extracted multisource feature structures. Then, an improved attention mechanism was utilized to enhance key multimodal features in various dimensions, combining them with a neighborhood multiscale fusion structure to ensure scale invariance of the network. Finally, the optimized non-maximum suppression algorithm was used to synthesize the prediction results of objects at various scales for accurate detection of each object. Experimental results based on the KAIST, FLIR, and RGBT public thermal datasets show that the proposed model effectively improves object detection performance compared with the same type of multisource object detection methods.
- multi-source object detection /
- group feature extraction /
- attention multiscale /
- non-maximum suppression

HTML全文

图 1 多源目标检测整体结构

Figure 1. Overall structure of object detection network

下载: 全尺寸图片幻灯片

图 2 单支路分组特征提取结构

Figure 2. Single branch grouping feature extraction structure

下载: 全尺寸图片幻灯片

图 3 注意力多尺度结构

Figure 3. Attention multi-scale structure

下载: 全尺寸图片幻灯片

图 4 目标框筛选流程

Figure 4. Target box filtering process

下载: 全尺寸图片幻灯片

图 5 KAIST数据集目标检测效果对比

Figure 5. Comparison of object detection effects in KAIST dataset

下载: 全尺寸图片幻灯片

图 6 FLIR和RGBT数据集目标检测效果（前两行：FLIR；后两行：RGBT）

Figure 6. Object detection effect of FLIR and RGBT datasets (first two rows: FLIR; second two rows: RGBT)

下载: 全尺寸图片幻灯片

表 1 实验数据集构成

Table 1 Composition of experimental dataset

Configure	KAIST	FLIR	RGBT
Number of images	8600	11000	15000
Image size	512×512	512×512	640×480
Number of target categories	4	8	11
Train: Verification: Test	7:1:2

下载: 导出CSV

表 2 超参数设置

Table 2 Hyperparameter setting

Hyperparameter	Value
Batch size	4
Learning rate	0.01
Weight initialization	Xavier
Learning rate regulation	Multistep
Weight decay	0.005
Momentum	0.95
Weight adjustment strategy	Adam
Category loss calculation	Cross Entropy
Position loss calculation	CIoU

下载: 导出CSV

表 3 基础特征提取结构对比

Table 3 Comparison of basic feature extraction structures

Network	Efficiency/fps	Test accuracy/(%)
Network	Efficiency/fps	mAP	mAP_s	mAP_m	mAP_l
ShuffleNetv2^[18]	38	71.0	50.8	70.5	79.6
GhostNetv2^[19]	35	71.6	51.1	71.2	80.4
MobileNetv3^[20]	32	72.5	52.2	72.4	81.6
Our network	34	72.1	51.9	72.0	81.2

下载: 导出CSV

表 4 多源特征提取结构对比

Table 4 Comparison of multi-source feature extraction structures

Network	Efficiency/fps	Test accuracy/(%)
Network	Efficiency/fps	mAP	mAP_s	mAP_m	mAP_l
Dual branch extraction	17	76.2	57.1	76.5	85.3
Fusion extraction	33	74.9	55.6	75.1	82.8
Group extraction	30	77.5	58.3	77.8	86.9

下载: 导出CSV

表 5 注意力结构对比

Table 5 Comparison of attention structure

Network	Efficiency/fps	Test accuracy/(%)
Network	Efficiency/fps	mAP	mAP_s	mAP_m	mAP_l
No attention	30	77.5	58.3	77.8	86.9
ECANet^[21]	29	78.1	58.8	78.5	87.7
ViT^[22]	29	78.2	59.0	78.6	87.6
CBAM^[15]	26	78.8	59.7	79.2	88.3
Triplet^[23]	28	78.6	59.4	78.9	88.0
Our attention	28	79.2	60.1	79.5	88.9

下载: 导出CSV

表 6 多尺度特征融合结构对比

Table 6 Comparison of multi-scale feature fusion structures

Network	Efficiency/fps	Test accuracy/(%)
Network	Efficiency/fps	mAP	mAP_s	mAP_m	mAP_l
Upsampling^[13]	28	79.2	60.1	79.5	88.9
Adaptive^[14]	24	80.4	60.8	80.9	90.3
Gaussian^[19]	26	79.7	60.0	80.3	89.7
Ours Neighborhood	27	80.3	61.0	80.5	90.1

下载: 导出CSV

表 7 NMS改进前后对比

Table 7 Comparison of NMS before and after improvement

Network	Efficiency/fps	Test accuracy/(%)
Network	Efficiency/fps	AP	AP⁵⁰	AP⁷⁵
Before NMS optimization	27	60.5	87.5	60.4
After NMS optimization	27	61.0	88.8	61.3

下载: 导出CSV

表 8 同类型多源目标检测对比

Table 8 Comparison of same type multiple source object detection

Network	Efficiency/fps	Test accuracy/(%)
Network	Efficiency/fps	mAP	mAP_s	mAP_m	mAP_l
Literature [9]	16	77.9	60.1	77.7	88.5
Literature [10]	22	78.8	60.8	78.9	89.3
Literature [11]	26	78.3	60.2	78.3	88.7
Ours	27	80.6	61.4	80.8	90.3

下载: 导出CSV

表 9 FLIR数据集测试结果对比

Table 9 Comparison of FLIR dataset test results

Network	Efficiency/fps	Test accuracy/(%)
Network	Efficiency/fps	mAP	mAP_s	mAP_m	mAP_l
Literature [9]	15	75.3	58.2	74.9	86.1
Literature [10]	21	76.5	59.1	76.0	87.2
Literature [11]	25	76.2	58.7	75.6	86.8
Ours	26	79.1	60.6	78.8	88.7

下载: 导出CSV

表 10 RGBT数据集测试结果

Table 10 Comparison of RGBT dataset test results

Network	Efficiency/fps	Test accuracy/(%)
Network	Efficiency/fps	mAP	mAP_s	mAP_m	mAP_l
Literature [9]	14	70.3	52.8	70.5	82.1
Literature [10]	20	71.0	53.5	71.6	83.2
Literature [11]	24	70.6	53.0	71.2	82.9
Ours	25	72.4	54.7	72.8	84.3

下载: 导出CSV

参考文献(23)

[1]	杜紫薇, 周恒, 李承阳, 等. 面向深度卷积神经网络的小目标检测算法综述[J]. 计算机科学, 2022, 49(12): 205-218. DOI: 10.11896/jsjkx.220500260 DU Z W, ZHOU H, LI C Y, et al. A survey on small object detection algorithms for deep convolutional neural networks[J]. Computer Science, 2022, 49(12): 205-218. DOI: 10.11896/jsjkx.220500260
[2]	李科岑, 王晓强, 林浩, 等. 深度学习中的单阶段小目标检测方法综述[J]. 计算机科学与探索, 2022, 16(1): 41-58. LI K C, WANG X Q, LIN H, et al. A survey on single-stage small object detection methods in deep learning[J]. Journal of Computer Science and Exploration, 2022, 16(1): 41-58.
[3]	LIANG Y, QIN G, SUN M, et al. Multi-modal interactive attention and dual progressive decoding network for RGB-D/T salient object detection[J]. Neurocomputing, 2022, 490: 132-145. DOI: 10.1016/j.neucom.2022.03.029
[4]	宋文姝, 侯建民, 崔雨勇. 基于多源信息融合的智能目标检测技术[J]. 电视技术, 2021, 45(6): 101-105. SONG W S, HOU J M, CUI Y Y. Intelligent object detection technology based on multi-source information fusion[J]. Television Technology, 2021, 45(6): 101-105.
[5]	LIU J, FAN X, HUANG Z, et al. Target-aware dual adversarial learning and a multi-scenario multi-modality benchmark to fuse infrared and visible for object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 5802-5811.
[6]	吴泽, 缪小冬, 李伟文, 等. 基于红外可见光融合的低能见度道路目标检测算法[J]. 红外技术, 2022, 44(11): 1154-1160. http://hwjs.nvir.cn/article/id/4bac684b-eed1-4894-900f-ed97489995e6 WU Z, MIAO X D, LI W W, et al. Low-visibility road object detection algorithm based on infrared visible light fusion[J]. Infrared Technology, 2022, 44(11): 1154-1160. http://hwjs.nvir.cn/article/id/4bac684b-eed1-4894-900f-ed97489995e6
[7]	Afyouni I, Al Aghbari Z, Razack R A. Multi-feature, multi-modal, and multi-source social event detection: a comprehensive survey[J]. Information Fusion, 2022, 79: 279-308. DOI: 10.1016/j.inffus.2021.10.013
[8]	程腾, 孙磊, 侯登超, 等. 基于特征融合的多层次多模态目标检测[J]. 汽车工程, 2021, 43(11): 1602-1610. CHENG T, SUN L, HOU D C, et al. Multi-level multi-modal object detection based on feature fusion[J]. Automotive Engineering, 2021, 43(11): 1602-1610.
[9]	ZHANG L, WANG S, SUN H, et al. Research on dual mode target detection algorithm for embedded platform[J]. Complexity, 2021, 2021(8): 1-8. http://openurl.ebsco.com/contentitem/doi:10.1155%2F2024%2F9874354?sid=ebsco:plink:crawler&id=ebsco:doi:10.1155%2F2024%2F9874354
[10]	邝楚文, 何望. 基于红外与可见光图像的目标检测算法[J]. 红外技术, 2022, 44(9): 912-919. http://hwjs.nvir.cn/article/id/60c5ef39-1d9c-4918-842f-3d86b939f3a6 KUANG C W, HE W. Target detection algorithm based on infrared and visible light images[J]. Infrared Technology, 2022, 44(9): 912-919. http://hwjs.nvir.cn/article/id/60c5ef39-1d9c-4918-842f-3d86b939f3a6
[11]	马野, 吴振宇, 姜徐. 基于红外图像与可见光图像特征融合的目标检测算法[J]. 导弹与航天运载技术, 2022(5): 83-87. MA Y, WU Z Y, JIANG X. Target detection algorithm based on feature fusion of infrared and visible light images[J]. Missile and Space Vehicle Technology, 2022(5): 83-87.
[12]	ZHANG D, YE M, LIU Y, et al. Multi-source unsupervised domain adaptation for object detection[J]. Information Fusion, 2022, 78: 138-148. http://www.sciencedirect.com/science/article/pii/S1566253521001895
[13]	CHEN S, MA W, ZHANG L. Dual-bottleneck feature pyramid network for multiscale object detection[J]. Journal of Electronic Imaging, 2022, 31(1): 1-16. http://www.nstl.gov.cn/paper_detail.html?id=58c6ec4e74c19b48febf51e68105aea0
[14]	TANG B. ASFF-YOLOv5: Multielement detection method for road traffic in UAV images based on multiscale feature fusion[J]. Remote Sensing, 2022, 14(14): 3498-3499. http://www.mdpi.com/2072-4292/14/14/3498
[15]	Woo S, Park J, Lee J Y, et al. CBAM: Convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision (ECCV), 2018: 3-19.
[16]	LI C, LIANG X, LU Y, et al. RGB-T object tracking: benchmark and baseline[J]. Pattern Recognition, 2019, 96: 106977. http://doc.paperpass.com/patent/arXiv180508982.html
[17]	LIN T Y, Maire M, Belongie S, et al. Microsoft COCO: Common objects in context[C]//Computer Vision–ECCV 2014: 13th European Conference, 2014: 740-755.
[18]	MA N, ZHANG X, ZHENG H T, et al. Shufflenet v2: Practical guidelines for efficient cnn architecture design[C]//Proceedings of the European Conference on Computer Vision (ECCV), 2018: 116-131.
[19]	HAN K, WANG Y, XU C, et al. GhostNets on heterogeneous devices via cheap operations[J]. International Journal of Computer Vision, 2022, 130(4): 1050-1069. DOI: 10.1007/s11263-022-01575-y
[20]	Howard A, Sandler M, CHU G, et al. Searching for mobilenetv3[C]// Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019: 1314-1324.
[21]	WANG Q, WU B, ZHU P, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2020: 11534-11542.
[22]	Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16×16 words: transformers for image recognition at scale[C]//International Conference on Learning Representations, 2021: 331-368.
[23]	Misra D, Nalamada T, Arasanipalai A U, et al. Rotate to attend: convolutional triplet attention module[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2021: 3139-3148.

施引文献

资源附件(0)

图(6) / 表(10)

计量

文章访问数: 38
HTML全文浏览量: 3
PDF下载量: 12
被引次数: 0

基于分组特征提取的轻量型多源目标检测

作者简介: 万军（1976-），男，汉族，河南洛阳人，讲师，研究方向：人工智能、模式识别、计算机技术。E-mail: lmclw13@sina.com

计量

出版历程

Lightweight Multisource Object Detection Based on Group Feature Extraction

计量

出版历程

目录

作者简介:
万军（1976-），男，汉族，河南洛阳人，讲师，研究方向：人工智能、模式识别、计算机技术。E-mail: lmclw13@sina.com