Abstract:
Infrared target detection on unmanned aerial vehicles (UAVs) has become a key capability for intelligent perception and autonomous decision-making in public security, border surveillance, and emergency response. With their end-to-end architectures and efficient feature learning, YOLO-family models have surpassed traditional handcrafted methods and now form the mainstream framework for infrared detection. Recent progress in anchor-free design, multi-scale fusion, attention mechanisms, and end-to-end inference has markedly improved the detection of small and low-contrast targets in complex scenes. In this review, we survey UAV-based infrared detection approaches built on YOLO models, synthesize the major enhancement strategies, evaluate their effects on weak-target recognition and real-time performance, and summarize representative datasets. We then analyze the remaining challenges, including weak-signal retention, cross-modal alignment, and spatiotemporal modeling, and outline future directions toward multi-source collaborative perception and onboard intelligent deployment.