Abstract:
Vehicle-mounted infrared imagery has inherently low contrast, which blurs target details and contours, causes small-target features to be lost, and raises misclassification rates. To address this, we present YOLO-OS (Object-Small), an infrared small-target detection framework. First, we add an ultra-small-target detection head that fuses deep semantic features, strengthening the network's ability to represent fine-grained information; we also adopt the ADown downsampling convolution to balance model compactness against accuracy. Second, we incorporate a C3SCC module, built on spatial-channel reconstruction convolutions, into the backbone to improve detection of small targets against complex backgrounds while keeping the parameter count low. Finally, we replace conventional upsampling with dynamic upsampling (Dysample) to improve interpolation precision during multi-scale feature aggregation. Experiments show that, relative to the YOLOv11n baseline, YOLO-OS reduces the parameter count by 38% and raises mAP from 51.5% to 57.5% (a gain of 6.0 percentage points), indicating stronger feature extraction and higher detection accuracy in infrared small-target scenarios.
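
The reported figures can be put on a common footing with a few lines of arithmetic. The sketch below is illustrative only: it uses just the three numbers stated above (51.5% and 57.5% mAP, 38% parameter reduction), and the function name `summarize_gains` and its argument names are hypothetical, not part of the YOLO-OS code.

```python
def summarize_gains(baseline_map, new_map, param_reduction):
    """Derive comparison figures from reported results.

    baseline_map, new_map: mAP values in percent.
    param_reduction: fractional parameter reduction (0.38 means 38%).
    Returns (absolute gain in percentage points,
             relative gain in percent,
             fraction of baseline parameters retained).
    """
    abs_gain = new_map - baseline_map            # percentage points
    rel_gain = abs_gain / baseline_map * 100.0   # percent of baseline mAP
    params_kept = 1.0 - param_reduction          # fraction of baseline size
    return abs_gain, rel_gain, params_kept


# Figures taken from the abstract; nothing else is assumed.
abs_gain, rel_gain, params_kept = summarize_gains(51.5, 57.5, 0.38)
print(f"absolute mAP gain: {abs_gain:.1f} points")
print(f"relative mAP gain: {rel_gain:.2f}%")
print(f"parameters retained: {params_kept:.0%} of baseline")
```

Read this way, the 6.0-point gain corresponds to a relative mAP improvement of roughly 11.7% over the baseline, obtained with about 62% of the baseline's parameters.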