Civil Drone Detection Based on Deep Convolutional Neural Networks: A Survey
-
Abstract: Vision-based early warning against civil drones is crucial in the field of public security and remains a challenging problem in visual object detection. Because conventional detection methods built on handcrafted features are limited in extracting and representing high-level semantic information, methods based on deep convolutional neural networks (DCNNs) have become the mainstream approach to object detection in recent years. Focusing on the development of DCNN-based civil drone detection, this paper first reviews two-stage and one-stage DCNN object detection algorithms. It then summarizes existing drone-detection methods for still images and for video data separately, with particular attention to motion-information extraction for drone detection. Furthermore, the main bottlenecks in drone detection are discussed. Finally, potentially promising solutions and future development directions in the drone-detection field are presented.
-
0. Introduction
Polarization imaging[1] is a relatively new optical imaging technique that acquires images in multiple polarization directions. By analyzing the polarization information, images of the same scene taken in different polarization directions can be used to characterize the polarization properties of a target and thereby detect it. Polarization information analysis is a key step in polarization imaging, and many analysis methods have been proposed for different application scenarios[2-5]. Among them, SHEN Jie et al.[6], inspired by the mantis shrimp, whose compound eyes make full use of polarization information to capture prey in complex underwater environments, proposed a bionic polarization information analysis method based on an antagonistic mechanism: the 0°, 45°, 90°, and 135° polarization-direction images are divided into two pairs of antagonistic images, the weighting coefficients of the direction images are obtained by maximizing the antagonistic information entropy of the two pairs, and the resulting polarization-antagonistic images are used for underwater target detection. However, this method is essentially a linear operation between polarization-direction images, and it suffers from low computational efficiency, uncertain analysis results, and insufficiently prominent targets.
Deep learning has strong representation capability and can extract useful features from high-dimensional, complex, and nonlinear data; it is now widely applied in many fields, such as natural language processing[7], speech recognition[8], and image processing[9]. Li et al.[10] proposed a deep learning architecture based on convolutional neural networks (CNNs) and residual networks (ResNets) for fusing infrared and visible images; it highlights target information well and alleviates the degradation of feature information that occurs in conventional CNN-based methods as network depth increases.
The antagonistic processing of polarization images falls within the scope of polarization-direction image fusion. Taking full advantage of deep learning in image processing, this paper proposes a dual-branch antagonistic fusion network for polarization-direction images. The network consists of three modules: feature extraction, feature fusion, and feature transformation. The four polarization-direction images are fed into two branches: a low-frequency branch that reduces energy loss through image synthesis, and a high-frequency branch that highlights image details through difference images. The outputs of the two branches are then processed by the fusion network to obtain a better fused image and to improve subsequent target detection and recognition.
1. Basic principles
1.1 Division-of-focal-plane polarization imaging
Polarization imaging requires acquiring images in multiple polarization directions. The common acquisition schemes are division-of-time, division-of-amplitude, division-of-aperture, and division-of-focal-plane. Compared with the other schemes, division-of-focal-plane (DoFP) polarization imaging is small, lightweight, and low-cost, and it acquires multiple polarization-direction images simultaneously, so it has become the mainstream acquisition method. Its basic principle is to integrate micro-polarization analyzers on the detector chip (as shown in Fig. 1): each group of four detector pixels is coupled with linear analyzers oriented at 0°, 45°, 90°, and 135°, so that four photosensitive elements jointly collect the polarization information of one superpixel. Rearranging the detector output according to the layout of the micro-analyzers yields the four polarization-direction images at 0°, 45°, 90°, and 135°, as shown in Figs. 2(a)-(d).
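As an illustration of how the four direction images are recovered from the DoFP mosaic, the following NumPy sketch sub-samples a raw frame superpixel by superpixel. The 2×2 analyzer layout used here (0° and 45° in the first row, 135° and 90° in the second) is an assumption for illustration only; the actual layout must be taken from the sensor documentation.

```python
import numpy as np

def split_polarization_channels(raw):
    """Split a division-of-focal-plane (DoFP) raw frame into the four
    polarization-direction images by sub-sampling the 2x2 superpixels.

    Assumed (illustrative) superpixel layout:
        [[  0 deg,  45 deg],
         [135 deg,  90 deg]]
    Each returned image has half the raw resolution in each dimension.
    """
    i000 = raw[0::2, 0::2].astype(np.float32)
    i045 = raw[0::2, 1::2].astype(np.float32)
    i135 = raw[1::2, 0::2].astype(np.float32)
    i090 = raw[1::2, 1::2].astype(np.float32)
    return i000, i045, i090, i135

# Example: a 2048 x 2448 raw frame yields four 1024 x 1224 direction images.
raw = np.random.randint(0, 4096, size=(2048, 2448), dtype=np.uint16)
i000, i045, i090, i135 = split_polarization_channels(raw)
```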
According to polarization imaging theory, the synthesized intensity image I can be obtained[6], as shown in Fig. 2(e):
$$ I=I\left(0^{\circ}\right)+I\left(90^{\circ}\right) $$ (1)
1.2 Bionic polarization-antagonistic images
According to the polarization-antagonism mechanism of the mantis shrimp compound eye[6], a pair of orthogonally polarized images forms one antagonistic input, e.g., the 0° and 90° images or the 45° and 135° images. The acquired orthogonal polarization images are organized into four polarization-antagonistic channels, each consisting of a pair of mutually orthogonal polarization signals, and applying the antagonistic operation to each channel gives the polarization-antagonistic images as follows:
$$ S_{\mathrm{d}}=k_1 \times I\left(45^{\circ}\right)-k_2 \times I\left(135^{\circ}\right) $$ (2)
$$ S_{\mathrm{dd}}=k_3 \times I\left(135^{\circ}\right)-k_4 \times I\left(45^{\circ}\right) $$ (3)
$$ S_{\mathrm{h}}=k_5 \times I\left(0^{\circ}\right)-k_6 \times I\left(90^{\circ}\right) $$ (4)
$$ S_{\mathrm{v}}=k_7 \times I\left(90^{\circ}\right)-k_8 \times I\left(0^{\circ}\right) $$ (5)
where I(0°), I(90°), I(45°), and I(135°) are the 0°, 90°, 45°, and 135° polarization-direction images, and ki (i=1, …, 8) are antagonistic coefficients that enhance or suppress the images. Their ranges are set manually, with km ≥ 1 (m=1, 3, 5, 7) and 0 < kn ≤ 1 (n=2, 4, 6, 8). In reference [6], k is determined by traversing all admissible values and selecting those that maximize the information entropy of the antagonistic images; this is computationally inefficient and yields uncertain results, so the targets in the resulting antagonistic images may not be sufficiently prominent.
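The baseline processing of Eqs. (1)-(5) and the entropy-maximizing coefficient search attributed to reference [6] can be sketched as follows. The coefficient grids (`enhance`, `suppress`) are illustrative choices, not values from the original paper; the original method traverses all admissible values, which is what makes it slow.

```python
import numpy as np

def entropy(img, bins=256):
    """Information entropy (Eq. (8)) computed from the gray-level histogram."""
    hist, _ = np.histogram(img, bins=bins, range=(img.min(), img.max() + 1e-6))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def antagonistic_images(i000, i045, i090, i135, k):
    """Polarization-antagonistic images of Eqs. (2)-(5); k = (k1, ..., k8)."""
    s_d  = k[0] * i045 - k[1] * i135
    s_dd = k[2] * i135 - k[3] * i045
    s_h  = k[4] * i000 - k[5] * i090
    s_v  = k[6] * i090 - k[7] * i000
    return s_d, s_dd, s_h, s_v

def search_coefficients(i000, i045, i090, i135,
                        enhance=np.arange(1.0, 3.1, 0.5),
                        suppress=np.arange(0.2, 1.01, 0.2)):
    """Brute-force search: pick the (k_m, k_n) pair that maximizes the entropy
    of each antagonistic image over the (illustrative) candidate grids."""
    def best_pair(a, b):
        return max(((km, kn) for km in enhance for kn in suppress),
                   key=lambda p: entropy(p[0] * a - p[1] * b))
    k1, k2 = best_pair(i045, i135)
    k3, k4 = best_pair(i135, i045)
    k5, k6 = best_pair(i000, i090)
    k7, k8 = best_pair(i090, i000)
    return (k1, k2, k3, k4, k5, k6, k7, k8)

# Usage: intensity = i000 + i090                                   # Eq. (1)
#        k = search_coefficients(i000, i045, i090, i135)
#        s_d, s_dd, s_h, s_v = antagonistic_images(i000, i045, i090, i135, k)
```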
1.3 Image fusion based on deep learning
Image fusion processes the source images captured by different sensors, extracts useful information or features, and integrates them to improve image quality and clarity[11]. Traditional image fusion requires handcrafted features and manually specified fusion rules, whereas deep-learning-based fusion convolves the input images with a deep network to extract high-level target features and then uses further convolutions to convert the fused features back into a fused image, as shown in Fig. 3. Compared with supervised fusion methods, unsupervised methods constrain the similarity between the fused image and the source images, overcoming the common problem that most fusion tasks lack reference metrics.
2. DANet network design
2.1 Network architecture
According to the polarization imaging mechanism, each polarization-direction image loses half of the incident energy. To increase the brightness of the fused image, we design a low-frequency branch, as shown in Fig. 4, in which the four polarization-direction images are concatenated (Concat) as input to extract the low-frequency features of each direction image. Following Tyo's conclusion[12] that polarization-difference imaging highlights target details, we design a second, high-frequency branch in which the differences of the two antagonistic pairs are taken as input to extract the high-frequency target features of the difference images. Huang et al.[13] proposed a dense block structure with direct connections from any layer to all subsequent layers. This architecture preserves as much information as possible, improves the flow of information and gradients through the network, makes the network easy to train, and has a regularizing effect that reduces overfitting. Inspired by this, we add dense connections to the detail-feature extraction of the difference images to reduce the loss of detail information. The designed network consists of three modules: feature extraction, feature fusion, and feature transformation.
In Fig. 4, the feature fusion module fuses the feature maps extracted by the two branches pixel by pixel to obtain the fused features, and the feature transformation module integrates the fused features into the output image with a 1×1 convolution. Both the low-frequency and high-frequency branches contain three 3×3 convolutional layers; the network parameters are listed in Table 1.
Table 1 Network parameters
Module | Layer | Input channels | Output channels
Feature extraction (low frequency) | Conv1 | 4 | 128
Feature extraction (low frequency) | Conv2 | 128 | 64
Feature extraction (low frequency) | Conv3 | 64 | 50
Feature extraction (high frequency) | Conv4 | 2 | 16
Feature extraction (high frequency) | Conv5 | 18 | 16
Feature extraction (high frequency) | Conv6 | 34 | 50
Feature fusion | Fusion | 50 | 50
Feature transformation | Conv7 | 50 | 1
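The following PyTorch sketch instantiates the two branches with the channel counts of Table 1. Details that the text does not specify, such as the padding, the activation functions, the exact pairing of the two difference inputs, and element-wise addition as the fusion operation, are assumptions made here for illustration.

```python
import torch
import torch.nn as nn

class DANet(nn.Module):
    """Sketch of the dual-branch antagonistic fusion network following Table 1.
    Assumptions: 3x3 convolutions use padding=1 to preserve spatial size, ReLU
    follows every convolution, the high-frequency inputs are I(0)-I(90) and
    I(45)-I(135), and fusion is element-wise addition of the 50-channel maps."""

    def __init__(self):
        super().__init__()
        # Low-frequency branch: the four direction images concatenated (4 channels).
        self.low1 = nn.Sequential(nn.Conv2d(4, 128, 3, padding=1), nn.ReLU(inplace=True))
        self.low2 = nn.Sequential(nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True))
        self.low3 = nn.Sequential(nn.Conv2d(64, 50, 3, padding=1), nn.ReLU(inplace=True))
        # High-frequency branch: two difference images (2 channels), densely connected,
        # which yields the 2 -> 16, 18 -> 16, 34 -> 50 channel counts of Table 1.
        self.high1 = nn.Sequential(nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(inplace=True))
        self.high2 = nn.Sequential(nn.Conv2d(18, 16, 3, padding=1), nn.ReLU(inplace=True))
        self.high3 = nn.Sequential(nn.Conv2d(34, 50, 3, padding=1), nn.ReLU(inplace=True))
        # Feature transformation: 1x1 convolution maps the fused features to one image.
        self.out = nn.Conv2d(50, 1, 1)

    def forward(self, i000, i045, i090, i135):
        low = torch.cat([i000, i045, i090, i135], dim=1)      # N x 4 x H x W
        high = torch.cat([i000 - i090, i045 - i135], dim=1)   # N x 2 x H x W (assumed pairing)
        f_low = self.low3(self.low2(self.low1(low)))
        h1 = self.high1(high)
        h2 = self.high2(torch.cat([high, h1], dim=1))         # dense connections
        h3 = self.high3(torch.cat([high, h1, h2], dim=1))
        fused = f_low + h3                                     # feature fusion (element-wise)
        return self.out(fused)                                 # feature transformation
```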
2.2 Loss function
In obtaining the polarization-antagonistic images, maximum information entropy is used as the evaluation criterion[6], so an information entropy loss Lentropy is included in the loss function to increase the amount of information in the fused image. The structural similarity metric, one of the most widely used measures in the image domain, evaluates the similarity between images in terms of luminance, contrast, and structure, so a structural similarity loss Lssim is included to preserve the structural features of the fused image. Perceptual loss is commonly used in image reconstruction to produce visually pleasing results, so a perceptual loss Lperceptual is included to improve the visual quality of the fused image. The loss function L is therefore computed as:
$$ L=L_{\text{entropy}} \times \sigma+L_{\text{ssim}} \times \beta+L_{\text{perceptual}} \times \gamma $$ (6)
where σ, β, and γ are the weights of the respective losses, set to 0.1, 10, and 0.1 in this paper.
1) Information entropy loss Lentropy
The larger the information entropy (IE), the more information the image contains. To maximize the information entropy of the fused image, the information entropy loss Lentropy is defined as:
$$ {L_{\text{entropy}}} = \frac{1}{{\text{IE}} + \varepsilon } $$ (7)
$$ {\text{IE}} = - \sum\limits_{i = 1}^n {p\left( {x_i} \right)\log p\left( {x_i} \right)} $$ (8)
where ε is an infinitesimal constant, xi denotes a gray level, p(xi) is its probability, and n is the number of gray levels.
2) Structural similarity loss Lssim
The structural similarity index (SSIM) measures the similarity between two images. The structural similarity loss Lssim is defined as:
$$ L_{\text{ssim}}=1-\operatorname{SSIM}(\text{output}, I) $$ (9)
where SSIM(⋅) denotes the structural similarity operation, output is the output image, and I is the intensity image.
3) Perceptual loss Lperceptual
$$ {L_{\text{perceptual}}} = \left\| {\varPhi _i}\left( {\text{output}} \right) - {\varPhi _i}\left( I \right) \right\|_2^2 $$ (10)
where Φi(⋅) denotes the feature map of the i-th layer of the VGG16 network.
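A possible PyTorch implementation of the total loss in Eq. (6) is sketched below; inputs are assumed to be single-channel images scaled to [0, 1]. The differentiable soft-histogram entropy, the simplified single-window SSIM, and the choice of VGG16 layer are assumptions made here; the paper only specifies the three loss terms and their weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16

def soft_entropy(img, bins=64, eps=1e-6):
    """Differentiable surrogate for Eq. (8): a triangular-kernel soft histogram
    over `bins` gray levels (an assumption; for large images, compute on a crop
    or a downsampled copy to limit memory)."""
    x = img.flatten()
    centers = torch.linspace(0.0, 1.0, bins, device=img.device)
    width = 1.0 / (bins - 1)
    weights = torch.clamp(1.0 - (x.unsqueeze(1) - centers).abs() / width, min=0.0)
    p = weights.sum(dim=0)
    p = p / (p.sum() + eps)
    return -(p * torch.log(p + eps)).sum()

def global_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified single-window SSIM over the whole tensor; a windowed SSIM
    implementation can be substituted without changing the loss structure."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

class FusionLoss(nn.Module):
    """Weighted sum of Eq. (6) with sigma=0.1, beta=10, gamma=0.1."""
    def __init__(self, sigma=0.1, beta=10.0, gamma=0.1, vgg_layer=16):
        super().__init__()
        self.sigma, self.beta, self.gamma = sigma, beta, gamma
        self.features = vgg16(pretrained=True).features[:vgg_layer].eval()
        for p in self.features.parameters():
            p.requires_grad_(False)

    def forward(self, output, intensity):
        l_entropy = 1.0 / (soft_entropy(output) + 1e-6)            # Eq. (7)
        l_ssim = 1.0 - global_ssim(output, intensity)              # Eq. (9)
        # VGG16 expects 3-channel input; grayscale maps are repeated along channels.
        f_out = self.features(output.repeat(1, 3, 1, 1))
        f_ref = self.features(intensity.repeat(1, 3, 1, 1))
        l_perc = F.mse_loss(f_out, f_ref)                          # Eq. (10), mean-squared form
        return self.sigma * l_entropy + self.beta * l_ssim + self.gamma * l_perc
```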
3. Experiments and analysis
3.1 Experimental environment and data
The experimental environment is as follows: training and testing were carried out on a graphics workstation with an 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30 GHz processor, 32 GB of RAM, and an NVIDIA GeForce RTX 3080 GPU with 8 GB of video memory. The software environment was Python 3.9 with the Torch 1.11.0 framework, and CUDA 10.0 was configured for acceleration.
A division-of-focal-plane polarization camera from LUCID (model PHX050S-P), shown in Fig. 5, is used in this work. The camera simultaneously acquires the 0°, 45°, 90°, and 135° polarization-direction images at a resolution of 2448×2048. We captured 9320 groups of polarization-direction images covering targets in a variety of scenes to build the dataset used in this paper.
3.2 Evaluation metrics and training parameters
The fused images are evaluated comprehensively by combining subjective qualitative assessment with objective quantitative assessment. Subjective evaluation relies on visual inspection of image brightness and detail, while objective evaluation uses four metrics, namely average gradient[14], information entropy[15], spatial frequency[16], and image mean[17], to quantitatively assess the fusion results.
1) Average gradient (AG)
The average gradient effectively reflects the amount of gradation and detail in an image; the larger its value, the richer the image gradation. It is computed as:
$$ {\text{AG}} = \frac{1}{{\left( {M - 1} \right)\left( {N - 1} \right)}}\sum\limits_{i = 1}^{M - 1} {\sum\limits_{j = 1}^{N - 1} {\sqrt {\frac{{{{\left[ {F\left( {i, j} \right) - F\left( {i + 1, j} \right)} \right]}^2} + {{\left[ {F\left( {i, j} \right) - F\left( {i, j + 1} \right)} \right]}^2}}}{2}} } } $$ (11)
where F(i, j) is the gray value at row i and column j of the image, and M and N are the total numbers of rows and columns, respectively.
2) Spatial frequency (SF)
Spatial frequency is one of the classical image-quality criteria; the larger its value, the higher the image quality and the clearer the image. It is computed as:
$$ {\text{SF}} = \sqrt {{\text{RF}}^2 + {\text{CF}}^2} $$ (12)
$$ {\text{RF}} = \sqrt {\frac{1}{{MN}}\sum\limits_{i = 1}^M {\sum\limits_{j = 2}^N {{{\left[ {{I_{\text{p}}}\left( {i, j} \right) - {I_{\text{p}}}\left( {i, j - 1} \right)} \right]}^2}} } } $$ (13)
$$ {\text{CF}} = \sqrt {\frac{1}{{MN}}\sum\limits_{i = 2}^M {\sum\limits_{j = 1}^N {{{\left[ {{I_{\text{p}}}\left( {i, j} \right) - {I_{\text{p}}}\left( {i - 1, j} \right)} \right]}^2}} } } $$ (14)
where RF is the row frequency, CF is the column frequency, M and N are the numbers of rows and columns of the image, and Ip(i, j) is the pixel value at (i, j).
3) Image mean (IM)
The image mean is the average of the pixel values and reflects the average brightness of the image; the higher the average brightness, the higher the energy. It is computed as:
$$ {\text{IM}} = \sum\limits_{k = 0}^{L - 1} {{z_k}\,p\left( {{z_k}} \right)}, \quad p\left( {{z_k}} \right) = \frac{{{n_k}}}{{MN}} $$ (15)
where zk is the k-th gray level of the image, L is the number of gray levels, nk is the number of occurrences of zk in the image, and M and N are the image dimensions.
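For reference, the four objective metrics can be computed directly from their definitions. The NumPy sketch below assumes single-channel gray-level images (8-bit for the entropy histogram).

```python
import numpy as np

def average_gradient(f):
    """Average gradient (AG), Eq. (11)."""
    f = f.astype(np.float64)
    dx = f[:-1, :-1] - f[1:, :-1]   # F(i, j) - F(i+1, j)
    dy = f[:-1, :-1] - f[:-1, 1:]   # F(i, j) - F(i, j+1)
    return float(np.mean(np.sqrt((dx ** 2 + dy ** 2) / 2.0)))

def spatial_frequency(f):
    """Spatial frequency (SF), Eqs. (12)-(14)."""
    f = f.astype(np.float64)
    m, n = f.shape
    rf = np.sqrt(np.sum((f[:, 1:] - f[:, :-1]) ** 2) / (m * n))
    cf = np.sqrt(np.sum((f[1:, :] - f[:-1, :]) ** 2) / (m * n))
    return float(np.sqrt(rf ** 2 + cf ** 2))

def image_mean(f):
    """Image mean (IM), Eq. (15): average gray level."""
    return float(np.mean(f))

def information_entropy(f, levels=256):
    """Information entropy (IE), Eq. (8), assuming gray levels in [0, levels)."""
    hist, _ = np.histogram(f, bins=levels, range=(0, levels))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))
```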
The dataset contains 9320 groups in total, split into training and test sets at a ratio of 9:1. The model is trained with the Adam optimizer for 20 epochs with an initial learning rate of 1e-4, which is halved every 4 epochs. The detailed parameters are listed in Table 2.
Table 2 Training parameters
Parameter | Value
Training set | 8388
Testing set | 932
Training rounds (epochs) | 20
Learning-rate decay interval (epochs) | 4
Optimizer | Adam
Activation function | ReLU
Initial learning rate | 1e-4
Learning rate decay rate | 0.5 × lr every 4 rounds
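A minimal training-loop sketch consistent with Table 2 (Adam, initial learning rate 1e-4, 20 epochs, learning rate halved every 4 epochs) is given below. The batch size and the placeholder dataset are assumptions; `DANet` and `FusionLoss` refer to the sketches above, and in practice the placeholder tensors would be replaced by the real polarization dataset.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset of random tensors (i000, i045, i090, i135, intensity);
# replace with the real polarization-direction dataset.
n, h, w = 16, 128, 128
train_set = TensorDataset(*[torch.rand(n, 1, h, w) for _ in range(5)])

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = DANet().to(device)
criterion = FusionLoss().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# StepLR halves the learning rate every 4 epochs, matching Table 2.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=4, gamma=0.5)
loader = DataLoader(train_set, batch_size=8, shuffle=True)  # batch size is an assumption

for epoch in range(20):
    for i000, i045, i090, i135, intensity in loader:
        i000, i045, i090, i135, intensity = (
            t.to(device) for t in (i000, i045, i090, i135, intensity))
        fused = model(i000, i045, i090, i135)
        loss = criterion(fused, intensity)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```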
3.3 Analysis of experimental results
To verify the effectiveness of the proposed algorithm, four groups of data were randomly selected from the test set, each containing the 0°, 45°, 90°, and 135° polarization-direction images. The first group is a camouflage board on sand indoors, the second an indoor calibration device, the third a camouflage board on grass outdoors, and the fourth an underwater coral target, as shown in Fig. 6.
These data were fed into the trained model to obtain the corresponding fused images, and the synthesized intensity image I and the polarization-antagonistic images Sd, Sdd, Sh, and Sv were obtained from Eqs. (1)-(5), as shown in Fig. 7.
As can be seen from Fig. 7, the fused images produced by our method have the highest brightness and the largest energy, indicating that the low-frequency branch markedly boosts image energy and effectively mitigates the energy loss inherent in polarization imaging. In terms of detail, the fused images also show clear improvement: in the first group, the sand grains appear more granular and the edges of the camouflage board are more prominent; in the second group, the cables of the calibration device become visible and the lines of the background board are clearer, whereas they are less distinct in the other images; in the third group, all three camouflage boards can be distinguished from the background, while only some of them can be distinguished in the other images; and in the fourth group, the coral is brighter overall and its edges are sharper. Because the image energy is raised considerably, the contrast of the target may drop slightly; for example, the contrast of the camouflage board in the first group is lower than in the Sd and Sh images, but this does not affect the overall detection of the target.
The 932 image groups in the test set were fed into the model to obtain 932 fused images, and the corresponding 932 synthesized intensity images I and antagonistic images Sd, Sdd, Sh, and Sv were computed from Eqs. (1)-(5). The mean values of the average gradient (AG), information entropy (IE), spatial frequency (SF), and image mean (IM) were then calculated for evaluation, as listed in Table 3.
Table 3 Evaluation metrics of the output results
Metric | I | Sd | Sdd | Sh | Sv | DANet
AG | 0.0099 | 0.0128 | 0.0119 | 0.0144 | 0.0126 | 0.0185
IE | 6.06 | 6.18 | 6.08 | 6.15 | 6.39 | 7.04
SF | 0.35 | 0.49 | 0.40 | 0.46 | 0.45 | 0.64
IM | 41 | 49 | 47 | 46 | 57 | 93
As shown in Table 3, the proposed method scores highest on all four metrics. Relative to the other images, it improves the average gradient by at least 22.16% and at most 46.49%, the information entropy by at least 9.23% and at most 13.92%, the spatial frequency by at least 23.44% and at most 45.31%, and the image mean by at least 38.71% and at most 55.91%. The results show that the fused images obtained by our method are brighter, contain more information, and reveal more detail.
4. Conclusion
To address the limited effectiveness of polarization-direction image fusion, this paper proposes a polarization information analysis method based on a dual-branch antagonistic fusion network. It overcomes the low computational efficiency, uncertain results, and insufficiently prominent targets of existing antagonism-based bionic analysis methods and provides a new technical route for polarization information analysis. The proposed DANet consists of three modules: feature extraction, feature fusion, and feature transformation. First, the feature extraction module comprises a low-frequency branch and a high-frequency branch: the 0°, 45°, 90°, and 135° polarization-direction images are concatenated and fed into the low-frequency branch to extract energy features, while the differences of the two antagonistic pairs are fed into the high-frequency branch to extract detail features. Second, the energy features and detail features are fused. Finally, the fused features are transformed and integrated into the fused image. Experiments show that the fused images obtained by DANet improve markedly in both visual quality and objective metrics, with gains of at least 22.16%, 9.23%, 23.44%, and 38.71% in average gradient, information entropy, spatial frequency, and image mean, respectively. In future work, we will further optimize the network structure to balance the energy and detail branches and improve the contrast of the fused images, and we will increase the proportion of underwater polarization-direction images in the dataset.
-
Figure 11 Image examples demonstrating difficulties and bottlenecks in drone detection
Note: Row 1: targets that are small and weak in appearance information[47, 55, 62]; Row 2: targets in complex and diverse backgrounds[47-48]; Row 3: targets with heterogeneous scales[53]
Table 1 Summary of representative algorithms in the visual object detection field
Type | Model | Year | Backbone | Characteristics
Two-stage | R-CNN[15] | 2014 | AlexNet[16] | Integrates CNN classification with proposal generation; needs multi-stage training; time- and space-consuming.
Two-stage | SPPNet[17] | 2015 | ZFNet[19] | Introduces spatial pyramid pooling (SPP) into CNNs.
Two-stage | Fast R-CNN[18] | 2015 | AlexNet, VGG16[20] | Introduces the region-of-interest (RoI) pooling layer; difficult to achieve real-time detection.
Two-stage | Faster R-CNN[21] | 2015 | ZFNet, VGG | Introduces the region proposal network (RPN) to generate high-quality proposals; complex training procedure and poor real-time performance.
Two-stage | ION[22] | 2016 | IRNN[23] | Improves small-object detection by employing context and multi-scale skip pooling.
Two-stage | R-FCN[24] | 2016 | ResNet101[25] | Applies a fully convolutional network (FCN) to Faster R-CNN to share computation across the entire network, improving detection speed.
Two-stage | FPN[26] | 2017 | ResNet101 | Proposes a feature pyramid model to handle scale variation in object detection.
Two-stage | Mask R-CNN[27] | 2018 | ResNeXt[28], FPN | Adds parallel branches that extend Faster R-CNN to object segmentation; cannot detect in real time.
Two-stage | PANet[29] | 2018 | FPN | Introduces a bottom-up enhancement path and adaptive feature pooling.
Two-stage | TridentNet[30] | 2019 | ResNet101 | Elucidates the effect of the receptive field on objects of different sizes in detection tasks.
Two-stage | CPNDet[31] | 2020 | Hourglass104[32] | Generates anchor-free proposals; two-step classification for filtering proposals.
One-stage | YOLOv1[33] | 2016 | GoogLeNet[34] | End-to-end real-time detection without proposals, but detection accuracy is poor and small, clustered objects are hard to detect.
One-stage | SSD[35] | 2016 | VGG16 | Combines CNN and YOLOv1 ideas and detects on multi-scale layers; faster and more accurate than YOLOv1.
One-stage | YOLOv2[36] | 2016 | DarkNet19 | Proposes DarkNet19 to achieve high precision and high speed, but small objects remain difficult to detect.
One-stage | RetinaNet[37] | 2018 | ResNeXt101+FPN | Proposes the focal loss to address the extreme foreground-background class imbalance.
One-stage | YOLOv3[38] | 2018 | DarkNet53 | Improves performance on small objects through multi-scale detection.
One-stage | STDN[39] | 2018 | DenseNet169[40] | Handles multi-scale objects by employing a scale-transfer module.
One-stage | CornerNet[41] | 2019 | Hourglass104 | Treats detection as keypoint detection, inferring two keypoints (upper-left and lower-right corners) as the prediction box.
One-stage | YOLOv4[42] | 2020 | CSPDarknet53 | Faster and more accurate detection via mosaic data augmentation and self-adversarial training.
One-stage | DETR[43] | 2020 | ResNet101 | Introduces the transformer structure to object detection, but performance on small targets needs improvement.
[1] WANG J, LIU Y, SONG H. Counter-unmanned aircraft system (s)(C-UAS): State of the art, challenges, and future trends[J]. IEEE Aerospace and Electronic Systems Magazine, 2021, 36(3): 4-29.
[2] LI Xiaoping, LEI Songze, ZHANG Boxing, et al. Fast aerial UAV detection using improved inter-frame difference and SVM[C]//Journal of Physics: Conference Series. IOP Publishing, 2019, 1187(3): 032082.
[3] WANG C, WANG T, WANG E, et al. Flying small target detection for anti-UAV based on a Gaussian mixture model in a compressive sensing domain[J]. Sensors, 2019, 19(9): 2168. DOI: 10.3390/s19092168
[4] Seidaliyeva U, Akhmetov D, Ilipbayeva L, et al. Real-time and accurate drone detection in a video with a static background[J]. Sensors, 2020, 20(14): 3856. DOI: 10.3390/s20143856
[5] ZHAO W, CHEN X, CHENG J, et al. An application of scale-invariant feature transform in iris recognition[C]//Proceedings of the IEEE/ACIS 12th International Conference on Computer and Information Science, IEEE, 2013: 219-222.
[6] SHU C, DING X, FANG C. Histogram of the oriented gradient for face recognition[J]. Tsinghua Science and Technology, 2011, 16(2): 216-224. DOI: 10.1016/S1007-0214(11)70032-3
[7] SHEN Y K, CHIU C T. Local binary pattern orientation based face recognition[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, 2015: 1091-1095.
[8] YUAN Xiaofang, WANG Yaonan. Parameter selection of support vector machine for function approximation based on chaos optimization[J]. Journal of Systems Engineering and Electronics, 2008, 19(1): 191-197. DOI: 10.1016/S1004-4132(08)60066-3
[9] FENG J, WANG L, Sugiyama M, et al. Boosting and margin theory[J]. Frontiers of Electrical and Electronic Engineering, 2012, 7(1): 127-133. DOI: 10.1007/s11460-012-0188-9
[10] WEI L, HONG Z, Gui-Jin H. NMS-based blurred image sub-pixel registration[C]//Proceedings of the International Conference on Image Analysis and Signal Processing. IEEE, 2011: 98-101.
[11] 罗会兰, 陈鸿坤. 基于深度学习的目标检测研究综述[J]. 电子学报, 2020, 48(6): 1230-1239. DOI: 10.3969/j.issn.0372-2112.2020.06.026 LUO Huilan, CHEN Hongkun. Survey of object detection based on deep learning[J]. Acta Electronica Sinica, 2020, 48(6): 1230-1239. DOI: 10.3969/j.issn.0372-2112.2020.06.026
[12] Bosquet B, Mucientes M, Brea V M. STDNet: exploiting high resolution feature maps for small object detection[J]. Engineering Applications of Artificial Intelligence, 2020, 91: 103615. DOI: 10.1016/j.engappai.2020.103615
[13] SUN H, YANG J, SHEN J, et al. TIB-Net: Drone detection network with tiny iterative backbone[J]. IEEE Access, 2020, 8: 130697-130707. DOI: 10.1109/ACCESS.2020.3009518
[14] LIU L, OUYANG W, WANG X, et al. Deep learning for generic object detection: a survey[J]. International Journal of Computer Vision, 2020, 128(2): 261-318. DOI: 10.1007/s11263-019-01247-4
[15] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014: 580-587.
[16] Krizhevsky A, Sutskever I, Hinton G E. Imagenet classification with deep convolutional neural networks[C]//Proceedings of the Advances in Neural Information Processing Systems, 2012, 25: 1097-1105.
[17] HE K, ZHANG X, REN S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916. DOI: 10.1109/TPAMI.2015.2389824
[18] Girshick R. Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision, 2015: 1440-1448.
[19] Zeiler M D, Fergus R. Visualizing and understanding convolutional networks[C]//Proceedings of the European Conference on Computer Vision, 2014: 818-833.
[20] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[J/OL]. arXiv preprint arXiv: 1409.1556, 2014.
[21] REN S, HE K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 39(6): 1137-1149.
[22] Bell S, Lawrence Zitnick C, Bala K, et al. Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 2874-2883.
[23] LE Q V, Jaitly N, Hinton G E. A simple way to initialize recurrent networks of rectified linear units[J/OL]. arXiv preprint arXiv: 1504.00941, 2015.
[24] DAI J, LI Y, HE K, et al. R-FCN: Object detection via region-based fully convolutional networks[J/OL]. arXiv preprint arXiv: 1605.06409, 2016.
[25] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
[26] LIN T Y, Dollár P, Girshick R, et al. Feature pyramid networks for object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 2117-2125.
[27] He K, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision, 2017: 2961-2969.
[28] XIE S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 1492-1500.
[29] LIU S, QI L, QIN H, et al. Path aggregation network for instance segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 8759-8768.
[30] LI Y, CHEN Y, WANG N, et al. Scale-aware trident networks for object detection[C]//Proceedings of the IEEE International Conference on Computer Vision, 2019: 6054-6063.
[31] DUAN K, XIE L, QI H, et al. Corner proposal network for anchor-free, two-stage object detection[C]//European Conference on Computer Vision. Springer, Cham, 2020: 399-416.
[32] Newell A, YANG K, DENG J. Stacked hourglass networks for human pose estimation[C]//Proceedings of the European Conference on Computer Vision, 2016: 483-499.
[33] Redmon J, Divvala S, Girshick R, et al. You only look once: unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 779-788.
[34] Szegedy C, LIU W, JIA Y, et al. Going deeper with convolutions [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015: 1-9.
[35] LIU W, Anguelov D, Erhan D, et al. SSD: single shot multibox detector[C]//Proceedings of the European Conference on Computer Vision. Springer, 2016: 21-37.
[36] Redmon J, Farhadi A. YOLO9000: Better, faster, stronger[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 7263-7271.
[37] LIN T Y, Goyal P, Girshick R, et al. Focal loss for dense object detection[C]//Proceedings of the IEEE International Conference on Computer Vision, 2017: 2980-2988.
[38] Redmon J, Farhadi A. YOLOv3: An incremental improvement[J/OL]. arXiv preprint arXiv: 1804.02767, 2018.
[39] ZHOU P, NI B, GENG C, et al. Scale-transferrable object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 528-537.
[40] HUANG G, LIU Z, Van Der Maaten L, et al. Densely connected convolutional networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 4700-4708.
[41] LAW H, DENG J. Cornernet: Detecting objects as paired keypoints[C]//Proceedings of the European Conference on Computer Vision, 2018: 734-750.
[42] Bochkovskiy A, WANG C Y, LIAO H Y M. YOLOv4: Optimal speed and accuracy of object detection[J/OL]. arXiv preprint arXiv: 2004.10934, 2020.
[43] Carion N, Massa F, Synnaeve G, et al. End-to-end object detection with transformers[C]//European Conference on Computer Vision. Springer, Cham, 2020: 213-229.
[44] JIANG N, WANG K, PENG X, et al. Anti-UAV: A large multi-modal benchmark for UAV tracking[J]. arXiv preprint arXiv: 2101.08466, 2021.
[45] ZHAO J, WANG G, LI J, et al. The 2nd Anti-UAV Workshop & Challenge: Methods and results[J]. arXiv preprint arXiv: 2108.09909, 2021.
[46] Coluccia A, Fascista A, Schumann A, et al. Drone-vs-Bird detection challenge at IEEE AVSS2019[C]//Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance. IEEE, 2019: 1-7.
[47] WU M, XIE W, SHI X, et al. Real-time drone detection using deep learning approach[C]//Proceedings of the International Conference on Machine Learning and Intelligent Communications, 2018: 22-32.
[48] ZHAO W, ZHANG Q, LI H, et al. Low-altitude UAV detection method based on one-staged detection framework[C]//Proceedings of the International Conference on Advances in Computer Technology, Information Science and Communications IEEE, 2020: 112-117.
[49] Magoulianitis V, Ataloglou D, Dimou A, et al. Does deep super-resolution enhance UAV detection?[C]//Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance IEEE, 2019: 1-6.
[50] Kim J, Kwon Lee J, Mu Lee K. Accurate image super-resolution using very deep convolutional networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 1646-1654.
[51] Craye C, Ardjoune S. Spatio-temporal semantic segmentation for drone detection[C]//Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance. IEEE, 2019: 1-5.
[52] Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation[C]//Proceedings of the International Conference on Medical Image Computing and Computer-assisted Intervention, 2015: 234-241.
[53] Aker C. End-to-end Networks for Detection and Tracking of Micro Unmanned Aerial Vehicles[D]. Ankara, Turkey: Middle East Technical University, 2018.
[54] 张锡联, 段海滨. 一种基于Gabor深度学习的无人机目标检测算法[J]. 空间控制技术与应用, 2019, 45(4): 38-45. DOI: 10.3969/j.issn.1674-1579.2019.04.005 ZHANG X, DUAN H. A target detection algorithm for UAV based on Gabor deep learning[J]. Aerospace Control and Application, 2019, 45(4): 38-45. DOI: 10.3969/j.issn.1674-1579.2019.04.005
[55] 马旗, 朱斌, 张宏伟, 等. 基于优化YOLOv3的低空无人机检测识别方法[J]. 激光与光电子学进展, 2019, 56(20): 279-286. https://www.cnki.com.cn/Article/CJFDTOTAL-JGDJ201920027.htm MA Q, ZHU B, ZHANG H, et al. Low-Altitude UAV detection and recognition method based on optimized YOLOv3[J]. Laser & Optoelectronics Progress, 2019, 56(20): 279-286. https://www.cnki.com.cn/Article/CJFDTOTAL-JGDJ201920027.htm
[56] Cohen M B, Elder S, Musco C, et al. Dimensionality reduction for k-means clustering and low rank approximation[C]//Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, 2015: 163-172.
[57] Saqib M, Khan S D, Sharma N, et al. A study on detecting drones using deep convolutional neural networks[C]//Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance. IEEE, 2017: 1-5.
[58] Nalamati M, Kapoor A, Saqib M, et al. Drone detection in long-range surveillance videos[C]//Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance. IEEE, 2019: 1-6.
[59] Aker C, Kalkan S. Using deep networks for drone detection[C]//Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance. IEEE, 2017: 1-6.
[60] 张汝榛, 张建林, 祁小平, 等. 复杂场景下的红外目标检测[J]. 光电工程, 2020, 47(10): 128-137. https://www.cnki.com.cn/Article/CJFDTOTAL-GDGC202010010.htm ZHANG R, ZHANG J, QI X, et al. Infrared target detection and recognition in complex scene[J]. Opto-Electronic Engineering, 2020, 47(10): 128-137. https://www.cnki.com.cn/Article/CJFDTOTAL-GDGC202010010.htm
[61] 刘俊明, 孟卫华. 融合全卷积神经网络和视觉显著性的红外小目标检测[J]. 光子学报, 2020, 49(7): 46-56. https://www.cnki.com.cn/Article/CJFDTOTAL-GZXB202007006.htm LIU J, MENG W. Infrared small target detection based on fully convolutional neural network and visual saliency[J]. Acta Photonica Sinica, 2020, 49(7): 46-56. https://www.cnki.com.cn/Article/CJFDTOTAL-GZXB202007006.htm
[62] 马旗, 朱斌, 程正东, 等. 基于双通道的快速低空无人机检测识别方法[J]. 光学学报, 2019, 39(12): 105-115. https://www.cnki.com.cn/Article/CJFDTOTAL-GXXB201912012.htm MA Q, ZHU B, CHENG Z, et al. Detection and recognition method of fast low-altitude unmanned aerial vehicle based on dual channel[J]. Acta Optica Sinica, 2019, 39(12): 105-115. https://www.cnki.com.cn/Article/CJFDTOTAL-GXXB201912012.htm
[63] CUI Z, YANG J, JIANG S, et al. An infrared small target detection algorithm based on high-speed local contrast method[J]. Infrared Physics & Technology, 2016, 76: 474-481.
[64] ZHAO Y, PAN H, DU C, et al. Bilateral two-dimensional least mean square filter for infrared small target detection[J]. Infrared Physics & Technology, 2014, 65: 17-23.
[65] Lange H. Real-time contrasted target detection for IR imagery based on a multiscale top hat filter[C]//Signal Processing, Sensor Fusion, and Target Recognition VIII. International Society for Optics and Photonics, 1999, 3720: 214-226.
[66] BAI X, ZHOU F, ZHANG S, et al. Top-Hat by the reconstruction operation-based infrared small target detection[C]//Proceedings of the International Conference in Electrics, Communication and Automatic Control Proceedings, 2012: 867-873.
[67] 王刚, 陈永光, 杨锁昌, 等. 采用图像块对比特性的红外弱小目标检测[J]. 光学精密工程, 2015, 23(5): 1424-1433. https://www.cnki.com.cn/Article/CJFDTOTAL-GXJM201505029.htm WANG G, CHEN Y, YANG S, et al. Infrared dim and small target detection using image block contrast characteristics[J]. Optics and Precision Engineering, 2015, 23(5): 1424-1433. https://www.cnki.com.cn/Article/CJFDTOTAL-GXJM201505029.htm
[68] 吴双忱, 左峥嵘. 基于深度卷积神经网络的红外小目标检测[J]. 红外与毫米波学报, 2019, 38(3): 371-380. https://www.cnki.com.cn/Article/CJFDTOTAL-HWYH201903019.htm WU S, ZUO Z. Infrared small target detection based on deep convolutional neural network[J]. Journal of Infrared and Millimeter Waves, 2019, 38(3): 371-380. https://www.cnki.com.cn/Article/CJFDTOTAL-HWYH201903019.htm
[69] 李俊宏, 张萍, 王晓玮, 等. 红外弱小目标检测算法综述[J]. 中国图象图形学报, 2020, 25(9): 1739-1753. https://www.cnki.com.cn/Article/CJFDTOTAL-ZGTB202009002.htm LI J, ZHANG P, WANG X, et al. A survey of infrared dim target detection algorithms[J]. Journal of Image and Graphics, 2020, 25(9): 1739-1753. https://www.cnki.com.cn/Article/CJFDTOTAL-ZGTB202009002.htm
[70] Horn B K P, Schunck B G. Determining optical flow[C]//Techniques and Applications of Image Understanding. International Society for Optics and Photonics, 1981, 281: 319-331.
[71] Lucas B D, Kanade T. An iterative image registration technique with an application to stereo vision[C]//Proceedings of the International Joint Conference on Artificial Intelligence, 1981: 674-679.
[72] Dosovitskiy A, Fischer P, Ilg E, et al. Flownet: Learning optical flow with convolutional networks[C]//Proceedings of the IEEE International Conference on Computer Vision, 2015: 2758-2766.
[73] Ilg E, Mayer N, Saikia T, et al. FlowNet 2.0: Evolution of optical flow estimation with deep networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 2462-2470.
[74] Teed Z, Deng J. Raft: Recurrent all-pairs field transforms for optical flow[C]// Proceedings of the European Conference on Computer Vision, 2020: 402-419.
[75] ZHU X, XIONG Y, DAI J, et al. Deep feature flow for video recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 2349-2358.
[76] ZHU X, WANG Y, DAI J, et al. Flow-guided feature aggregation for video object detection[C]//Proceedings of the IEEE International Conference on Computer Vision, 2017: 408-417.
[77] Rozantsev A, Lepetit V, Fua P. Flying objects detection from a single moving camera[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015: 4128-4136.
[78] Bertinetto L, Valmadre J, Henriques J F, et al. Fully-convolutional siamese networks for object tracking[C]//Proceedings of the European Conference on Computer Vision, 2016: 850-865.
[79] Stewart R, Andriluka M, Ng A Y. End-to-end people detection in crowded scenes[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 2325-2333.
[80] ZHAO B, ZHAO B, TANG L, et al. Deep spatial-temporal joint feature representation for video object detection[J]. Sensors, 2018, 18(3): 774.
[81] 刘宜成, 廖鹭川, 张劲, 等. 基于轨迹和形态识别的无人机检测方法[J]. 计算机工程, 2020, 46(12): 283-289. https://www.cnki.com.cn/Article/CJFDTOTAL-JSJC202012038.htm LIU Y, LIAO L, ZHANG J, et al. UAV detection method based on trajectory and shape recognition[J]. Computer Engineering, 2020, 46(12): 283-289. https://www.cnki.com.cn/Article/CJFDTOTAL-JSJC202012038.htm
[82] 吴飞, 阳春华, 兰旭光, 等. 人工智能的回顾与展望[J]. 中国科学基金, 2018, 32(3): 243-250. https://www.cnki.com.cn/Article/CJFDTOTAL-ZKJJ201803002.htm WU F, YANG C H, LAN X, et al. Retrospect and prospect of artificial intelligence[J]. Bulletin of National Natural Science Foundation of China, 2018, 32(3): 243-250. https://www.cnki.com.cn/Article/CJFDTOTAL-ZKJJ201803002.htm