Abstract:
Infrared target detection from drones has wide applications in national defense and emergency rescue. However, it faces challenges such as small target sizes and complex backgrounds, which make it easy to miss the fine-grained features of a target. To address these problems, a model built on detail enhancement and multiscale cross-space attention networks improves detection accuracy by enhancing low-resolution details during feature extraction and by capturing both local and global spatial semantic information during feature fusion. First, the detail enhancement network extracts feature maps at different scales from the input image to preserve as much detail from the original image as possible. Second, the multiscale cross-space attention network simultaneously captures global and local semantic features and performs multiscale feature fusion. Finally, the detection head uses a bounding box regression loss that measures the similarity between boxes by taking into account the distance between their vertices. Experimental results show that the proposed model, DM-IUAV(l), achieves an mAP@0.5 of 81.2% on the infrared portion of the Drone_Vehicle dataset and 74.4% on the HIT-UAV dataset.
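As a rough illustration of a vertex-distance-aware regression loss of the kind described above, the following is a minimal sketch assuming the common pattern of an IoU term combined with a normalized corner-distance penalty; the exact formulation, function names, and normalization used in DM-IUAV are not given in the abstract and are assumptions here.

```python
def vertex_distance_iou_loss(pred, target, img_w, img_h):
    """Illustrative loss: 1 - IoU plus penalties for the squared distances
    between corresponding box vertices, normalized by the image diagonal.
    Boxes are (x1, y1, x2, y2); img_w/img_h give the normalization scale.
    This is a hypothetical sketch, not the authors' implementation."""
    # Intersection rectangle between predicted and ground-truth boxes
    ix1, iy1 = max(pred[0], target[0]), max(pred[1], target[1])
    ix2, iy2 = min(pred[2], target[2]), min(pred[3], target[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    # Union area and IoU
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    union = area_p + area_t - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distances between top-left and bottom-right vertices,
    # normalized by the squared image diagonal
    d_tl = (pred[0] - target[0]) ** 2 + (pred[1] - target[1]) ** 2
    d_br = (pred[2] - target[2]) ** 2 + (pred[3] - target[3]) ** 2
    norm = img_w ** 2 + img_h ** 2

    # Loss shrinks as overlap grows and vertex distances decrease
    return 1.0 - iou + d_tl / norm + d_br / norm


# Example: a predicted box slightly offset from its ground-truth box
print(vertex_distance_iou_loss((10, 10, 50, 50), (12, 12, 52, 52), 640, 512))
```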