Lightweight Multisource Object Detection Based on Group Feature Extraction

WAN Jun, ZHOU Kai, HE Wenlei

Citation: WAN Jun, ZHOU Kai, HE Wenlei. Lightweight Multisource Object Detection Based on Group Feature Extraction[J]. Infrared Technology, 2025, 47(3): 307-315.


Funding: National Natural Science Foundation of China (62072362)

Details
    Author biography:

    WAN Jun (1976-), male, Han, born in Luoyang, Henan; lecturer; research interests: artificial intelligence, pattern recognition, and computer technology. E-mail: lmclw13@sina.com

  • CLC number: TP391.41

Lightweight Multisource Object Detection Based on Group Feature Extraction


    Abstract:

    To balance the accuracy and efficiency of multisource object detection networks, a lightweight infrared and visible-light object detection model was designed by applying group convolution to multimodal object features, together with an attention-based multiscale structure and an improved target-box filtering strategy. First, multiple feature dimensionality-reduction strategies are used to downsample the input images, reducing the impact of noise and redundant information. Second, the feature channels are grouped by modality, and depthwise separable convolutions extract the infrared, visible, and fused features separately, improving both the diversity and the efficiency of the multisource feature-extraction structure. Then, an improved attention mechanism enhances the key multimodal features at each scale, and a neighborhood multiscale fusion structure preserves the scale invariance of the network. Finally, an optimized non-maximum suppression (NMS) algorithm combines the predictions from all scales to detect each target accurately. Test results on the public KAIST, FLIR, and RGBT datasets show that the proposed model effectively improves object detection performance and, compared with multisource object detection methods of the same type, exhibits higher robustness and generalization.
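The efficiency claim behind modality-grouped depthwise separable convolution can be checked with simple parameter counting. The sketch below (plain Python; the 256-channel configuration and the two-group infrared/visible split are illustrative assumptions, not the paper's actual layer sizes) compares a standard 3×3 convolution with a depthwise separable one and a grouped depthwise separable variant:

```python
def conv_params(c_in, c_out, k=3):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k=3):
    """Depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out

def grouped_ds_params(c_in, c_out, groups, k=3):
    """Split channels into modality groups and apply an independent
    depthwise separable convolution to each group."""
    return groups * depthwise_separable_params(c_in // groups, c_out // groups, k)

# Illustrative comparison: 256 input and output channels, 2 modality groups.
standard = conv_params(256, 256)                   # 589824 weights
separable = depthwise_separable_params(256, 256)   # 2304 + 65536 = 67840
grouped = grouped_ds_params(256, 256, 2)           # 2 * (1152 + 16384) = 35072
```

Each step (separable factorization, then grouping) cuts the weight count further, which is consistent with the lightweight design goal stated in the abstract.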

  • Figure 1.  Overall structure of the multisource object detection network

    Figure 2.  Single-branch group feature extraction structure

    Figure 3.  Attention multiscale structure

    Figure 4.  Target-box filtering process
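The filtering flow of Figure 4 builds on greedy non-maximum suppression. As background, here is a minimal plain-Python sketch of the standard, unoptimized baseline (the paper's improved variant is not specified here and is not reproduced):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and drop
    remaining boxes that overlap it beyond the IoU threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

# Two heavily overlapping boxes and one distant box: the weaker duplicate
# is suppressed, the distant box survives.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
kept = nms(boxes, [0.9, 0.8, 0.7])  # -> [0, 2]
```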

    Figure 5.  Comparison of object detection results on the KAIST dataset

    Figure 6.  Object detection results on the FLIR and RGBT datasets (first two rows: FLIR; last two rows: RGBT)

    Table 1.  Composition of the experimental datasets

    Configuration                  KAIST     FLIR      RGBT
    Number of images               8600      11000     15000
    Image size                     512×512   512×512   640×480
    Number of target categories    4         8         11
    Train : validation : test      7:1:2 (all three datasets)

    Table 2.  Hyperparameter settings

    Hyperparameter            Value
    Batch size                4
    Learning rate             0.01
    Weight initialization     Xavier
    Learning-rate schedule    MultiStep
    Weight decay              0.005
    Momentum                  0.95
    Optimizer                 Adam
    Classification loss       Cross-entropy
    Localization loss         CIoU

    Table 3.  Comparison of basic feature extraction structures

    Network            Efficiency/fps   mAP/%   mAPs/%   mAPm/%   mAPl/%
    ShuffleNetv2[18]   38               71.0    50.8     70.5     79.6
    GhostNetv2[19]     35               71.6    51.1     71.2     80.4
    MobileNetv3[20]    32               72.5    52.2     72.4     81.6
    Our network        34               72.1    51.9     72.0     81.2

    Table 4.  Comparison of multi-source feature extraction structures

    Network                  Efficiency/fps   mAP/%   mAPs/%   mAPm/%   mAPl/%
    Dual-branch extraction   17               76.2    57.1     76.5     85.3
    Fusion extraction        33               74.9    55.6     75.1     82.8
    Group extraction         30               77.5    58.3     77.8     86.9

    Table 5.  Comparison of attention structures

    Network         Efficiency/fps   mAP/%   mAPs/%   mAPm/%   mAPl/%
    No attention    30               77.5    58.3     77.8     86.9
    ECANet[21]      29               78.1    58.8     78.5     87.7
    ViT[22]         29               78.2    59.0     78.6     87.6
    CBAM[15]        26               78.8    59.7     79.2     88.3
    Triplet[23]     28               78.6    59.4     78.9     88.0
    Our attention   28               79.2    60.1     79.5     88.9
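For context on the channel-attention entries compared in Table 5, the following is a minimal SE-style channel-gating sketch in plain Python: each channel is squeezed to its global-average value, scored, and rescaled by a sigmoid gate. The scalar `weights` here are a hypothetical stand-in for the learned excitation layers; this is background on the general mechanism, not the paper's improved attention module.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(channels, weights):
    """SE-style channel attention. `channels` is a list of 2D feature maps
    (each a list of rows); `weights` holds one illustrative scoring scalar
    per channel, standing in for the learned excitation layers."""
    gates = []
    for ch, w in zip(channels, weights):
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))  # squeeze
        gates.append(sigmoid(w * mean))                              # excite
    # Rescale every value in each channel by that channel's gate.
    return [[[v * g for v in row] for row in ch] for ch, g in zip(channels, gates)]
```

A zero weight yields a neutral gate of 0.5, while a strongly positive score pushes the gate toward 1 and leaves the channel nearly unchanged.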

    Table 6.  Comparison of multi-scale feature fusion structures

    Network               Efficiency/fps   mAP/%   mAPs/%   mAPm/%   mAPl/%
    Upsampling[13]        28               79.2    60.1     79.5     88.9
    Adaptive[14]          24               80.4    60.8     80.9     90.3
    Gaussian[19]          26               79.7    60.0     80.3     89.7
    Neighborhood (ours)   27               80.3    61.0     80.5     90.1

    Table 7.  Comparison of NMS before and after improvement

    Network                   Efficiency/fps   AP/%   AP50/%   AP75/%
    Before NMS optimization   27               60.5   87.5     60.4
    After NMS optimization    27               61.0   88.8     61.3

    Table 8.  Comparison with multisource object detection methods of the same type

    Network           Efficiency/fps   mAP/%   mAPs/%   mAPm/%   mAPl/%
    Literature [9]    16               77.9    60.1     77.7     88.5
    Literature [10]   22               78.8    60.8     78.9     89.3
    Literature [11]   26               78.3    60.2     78.3     88.7
    Ours              27               80.6    61.4     80.8     90.3

    Table 9.  Comparison of FLIR dataset test results

    Network           Efficiency/fps   mAP/%   mAPs/%   mAPm/%   mAPl/%
    Literature [9]    15               75.3    58.2     74.9     86.1
    Literature [10]   21               76.5    59.1     76.0     87.2
    Literature [11]   25               76.2    58.7     75.6     86.8
    Ours              26               79.1    60.6     78.8     88.7

    Table 10.  Comparison of RGBT dataset test results

    Network           Efficiency/fps   mAP/%   mAPs/%   mAPm/%   mAPl/%
    Literature [9]    14               70.3    52.8     70.5     82.1
    Literature [10]   20               71.0    53.5     71.6     83.2
    Literature [11]   24               70.6    53.0     71.2     82.9
    Ours              25               72.4    54.7     72.8     84.3
  • [1] DU Z W, ZHOU H, LI C Y, et al. A survey on small object detection algorithms for deep convolutional neural networks[J]. Computer Science, 2022, 49(12): 205-218. DOI: 10.11896/jsjkx.220500260

    [2] LI K C, WANG X Q, LIN H, et al. A survey on single-stage small object detection methods in deep learning[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(1): 41-58.

    [3] LIANG Y, QIN G, SUN M, et al. Multi-modal interactive attention and dual progressive decoding network for RGB-D/T salient object detection[J]. Neurocomputing, 2022, 490: 132-145. DOI: 10.1016/j.neucom.2022.03.029

    [4] SONG W S, HOU J M, CUI Y Y. Intelligent object detection technology based on multi-source information fusion[J]. Video Engineering, 2021, 45(6): 101-105.

    [5] LIU J, FAN X, HUANG Z, et al. Target-aware dual adversarial learning and a multi-scenario multi-modality benchmark to fuse infrared and visible for object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022: 5802-5811.

    [6] WU Z, MIAO X D, LI W W, et al. Low-visibility road object detection algorithm based on infrared and visible light fusion[J]. Infrared Technology, 2022, 44(11): 1154-1160. http://hwjs.nvir.cn/article/id/4bac684b-eed1-4894-900f-ed97489995e6

    [7] AFYOUNI I, AL AGHBARI Z, RAZACK R A. Multi-feature, multi-modal, and multi-source social event detection: a comprehensive survey[J]. Information Fusion, 2022, 79: 279-308. DOI: 10.1016/j.inffus.2021.10.013

    [8] CHENG T, SUN L, HOU D C, et al. Multi-level multi-modal object detection based on feature fusion[J]. Automotive Engineering, 2021, 43(11): 1602-1610.

    [9] ZHANG L, WANG S, SUN H, et al. Research on dual mode target detection algorithm for embedded platform[J]. Complexity, 2021, 2021(8): 1-8.

    [10] KUANG C W, HE W. Target detection algorithm based on infrared and visible light images[J]. Infrared Technology, 2022, 44(9): 912-919. http://hwjs.nvir.cn/article/id/60c5ef39-1d9c-4918-842f-3d86b939f3a6

    [11] MA Y, WU Z Y, JIANG X. Target detection algorithm based on feature fusion of infrared and visible light images[J]. Missiles and Space Vehicles, 2022(5): 83-87.

    [12] ZHANG D, YE M, LIU Y, et al. Multi-source unsupervised domain adaptation for object detection[J]. Information Fusion, 2022, 78: 138-148.

    [13] CHEN S, MA W, ZHANG L. Dual-bottleneck feature pyramid network for multiscale object detection[J]. Journal of Electronic Imaging, 2022, 31(1): 1-16.

    [14] TANG B. ASFF-YOLOv5: multielement detection method for road traffic in UAV images based on multiscale feature fusion[J]. Remote Sensing, 2022, 14(14): 3498.

    [15] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision (ECCV), 2018: 3-19.

    [16] LI C, LIANG X, LU Y, et al. RGB-T object tracking: benchmark and baseline[J]. Pattern Recognition, 2019, 96: 106977.

    [17] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]//Computer Vision - ECCV 2014: 13th European Conference, 2014: 740-755.

    [18] MA N, ZHANG X, ZHENG H T, et al. ShuffleNet V2: practical guidelines for efficient CNN architecture design[C]//Proceedings of the European Conference on Computer Vision (ECCV), 2018: 116-131.

    [19] HAN K, WANG Y, XU C, et al. GhostNets on heterogeneous devices via cheap operations[J]. International Journal of Computer Vision, 2022, 130(4): 1050-1069. DOI: 10.1007/s11263-022-01575-y

    [20] HOWARD A, SANDLER M, CHU G, et al. Searching for MobileNetV3[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019: 1314-1324.

    [21] WANG Q, WU B, ZHU P, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020: 11534-11542.

    [22] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: transformers for image recognition at scale[C]//International Conference on Learning Representations (ICLR), 2021.

    [23] MISRA D, NALAMADA T, ARASANIPALAI A U, et al. Rotate to attend: convolutional triplet attention module[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2021: 3139-3148.

Publication history
  • Received: 2023-05-21
  • Revised: 2023-07-02
  • Published: 2025-03-19
