[1]王倩倩,赵海涛.基于深度CRF网络的单目红外场景深度估计[J].红外技术,2020,42(6):580-588.[doi:10.11846/j.issn.1001_8891.202006011]
 WANG Qianqian,ZHAO Haitao.Depth Estimation of Monocular Infrared Scene Based on Deep CRF Network[J].Infrared Technology,2020,42(6):580-588.[doi:10.11846/j.issn.1001_8891.202006011]

基于深度CRF网络的单目红外场景深度估计

《红外技术》[ISSN:1001-8891/CN:CN 53-1053/TN]

Volume:
42
Issue:
2020, No. 6
Pages:
580-588
Published:
2020-06-23

文章信息/Info

Title:
Depth Estimation of Monocular Infrared Scene Based on Deep CRF Network
文章编号:
1001-8891(2020)06-0580-09
作者:
王倩倩,赵海涛
华东理工大学 信息科学与工程学院,上海 200237
Author(s):
WANG Qianqian, ZHAO Haitao
School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
关键词:
红外图像;深度估计;条件随机场;有序约束
Keywords:
infrared image; depth estimation; conditional random field; ordered constraint
分类号:
TP391.9
DOI:
10.11846/j.issn.1001_8891.202006011
文献标志码:
A
摘要:
对单目红外图像进行深度估计,不仅有利于3D场景理解,而且有助于进一步推广和开发夜间视觉应用。针对红外图像无颜色、纹理不丰富、轮廓不清晰等缺点,本文提出一种新颖的深度条件随机场网络学习模型(deep conditional random field network, DCRFN)来估计红外图像的深度。首先,与传统条件随机场(conditional random field, CRF)模型不同,DCRFN不需预设成对特征,可通过一个浅层网络架构提取和优化模型的成对特征。其次,将传统单目图像深度回归问题转换为分类问题,在损失函数中考虑不同标签的有序信息,不仅加快了网络的收敛速度,而且有助于获得更优的解。最后,本文在DCRFN损失函数层计算不同空间尺度的成对项,使得预测深度图的景物轮廓信息相比于无尺度约束模型更加丰富。实验结果表明,本文提出的方法在红外数据集上优于现有的深度估计方法,在局部场景变化的预测中更加平滑。
Abstract:
 Depth estimation from monocular infrared images is required for understanding 3D scenes; moreover, it could be used to develop and promote night-vision applications. Owing to the shortcomings of infrared images, such as a lack of colors, poor textures, and unclear outlines, a novel deep conditional random field network (DCRFN) is proposed for estimating depth from infrared images. First, in contrast with the traditional CRF(conditional random field) model, DCRFN does not need to preset pairwise features. It can extract and optimize pairwise features through a shallow network architecture. Second, conventional monocular-image-based depth regression is replaced with multi-class classification, wherein the loss function considers information regarding the order of various labels. This conversion not only accelerates the convergence speed of the network but also yields a better solution. Finally, in the loss function layer of the DCRFN, pairwise terms of different spatial scales are computed; this makes the scene contour information in the depth map more abundant than that in the case of the scale-free model. The experimental results show that the proposed method outperforms other depth estimation methods with regard to the prediction of local scene changes.
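The "regression converted to ordered classification" idea in the abstract can be sketched in a few lines. This is an illustrative sketch under assumed details (a space-increasing discretization of the depth range and per-threshold binary probabilities, in the spirit of ordinal depth classification), not the paper's DCRFN implementation; all function names are hypothetical.

```python
import numpy as np

def sid_thresholds(alpha, beta, K):
    """Space-increasing discretization of [alpha, beta] into K ordered depth bins."""
    i = np.arange(K + 1)
    return np.exp(np.log(alpha) + i * np.log(beta / alpha) / K)

def depth_to_label(depth, thresholds):
    """Map a continuous depth value to the index of its ordered bin."""
    idx = np.searchsorted(thresholds, depth, side="right") - 1
    return int(np.clip(idx, 0, len(thresholds) - 2))

def ordinal_loss(probs, label, eps=1e-12):
    """Ordered-label loss for one pixel.

    probs[k] is the predicted probability that the depth exceeds threshold k.
    Each 'greater than bin k' decision is penalized separately, so a
    prediction far from the true bin incurs a larger loss than a nearby one
    -- this is the ordered information a plain cross-entropy loss ignores.
    """
    loss = 0.0
    for k in range(len(probs)):
        if k < label:
            loss -= np.log(probs[k] + eps)       # should predict "deeper than k"
        else:
            loss -= np.log(1.0 - probs[k] + eps)  # should predict "not deeper"
    return loss
```

With this loss, mislabeling a pixel by one bin flips only one binary term, while mislabeling by many bins flips many, which is one way an ordered loss can speed up convergence relative to treating bins as unrelated classes.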

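The multi-scale pairwise term described in the abstract can likewise be sketched as a smoothness energy summed over several downsampled versions of the predicted depth map. The pooling factors, weights, and function names below are assumptions for illustration; in the paper the pairwise features themselves are learned by a shallow network rather than fixed as squared differences.

```python
import numpy as np

def pairwise_energy(depth_map):
    """Sum of squared differences between horizontally and vertically adjacent predictions."""
    dh = np.diff(depth_map, axis=0)
    dw = np.diff(depth_map, axis=1)
    return float((dh ** 2).sum() + (dw ** 2).sum())

def downsample(depth_map, factor):
    """Average-pool by an integer factor (truncating rows/cols that do not divide evenly)."""
    h, w = depth_map.shape
    cropped = depth_map[:h - h % factor, :w - w % factor]
    return cropped.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def multiscale_pairwise(depth_map, scales=(1, 2, 4), weights=(1.0, 1.0, 1.0)):
    """Weighted sum of pairwise energies over several spatial scales.

    Coarse scales penalize disagreement between whole regions, which is what
    lets a multi-scale pairwise term preserve larger scene contours that a
    single-scale ("scale-free") smoothness term tends to blur.
    """
    return sum(w * pairwise_energy(downsample(depth_map, s))
               for s, w in zip(scales, weights))
```

A constant depth map has zero energy at every scale, while a sharp depth edge is penalized at all scales at once, so the optimizer trades off smoothness within regions against agreement across region boundaries.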
参考文献/References:

[1] Cordts M, Omran M, Ramos S, et al. The cityscapes dataset for semantic urban scene understanding[C]//IEEE Computer Vision and Pattern Recognition, 2016: 3213-3223.
[2] PENG X, SUN B, Ali K, et al. Learning deep object detectors from 3D models[C]//IEEE Computer Vision and Pattern Recognition, 2015: 1278-1286.
[3] Biswas J, Veloso M. Depth camera based indoor mobile robot localization and navigation[C]//IEEE Robotics and Automation, 2011: 1697-1702.
[4] Sivaraman S, Trivedi M M. Combining monocular and stereo-vision for real-time vehicle ranging and tracking on multilane highways[C]//IEEE Intelligent Transportation Systems, 2011: 1249-1254.
[5] Hedau V, Hoiem D, Forsyth D. Thinking inside the box: using appearance models and context based on room geometry[C]//European Conference on Computer Vision, 2010: 224-237.
[6] Saxena A, SUN M, Ng A Y. Make3D: learning 3D scene structure from a single still image[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2009, 31(5): 824-840.
[7] LIU B, Gould S, Koller D. Single image depth estimation from predicted semantic labels[C]//IEEE Computer Vision and Pattern Recognition, 2010: 1253-1260.
[8] Russell B C, Torralba A. Building a database of 3D scenes from user annotations[C]//IEEE Computer Vision and Pattern Recognition, 2009: 2711-2718.
[9] LIU M, Salzmann M, HE X. Discrete-continuous depth estimation from a single image[C]//IEEE Computer Vision and Pattern Recognition, 2014: 716-723.
[10] Karsch K, LIU C, KANG S B. Depth extraction from video using non-parametric sampling[C]//European Conference on Computer Vision, 2012: 775-788.
[11] Eigen D, Puhrsch C, Fergus R. Depth map prediction from a single image using a multi-scale deep network[C]//International Conference on Neural Information Processing Systems, 2014: 2366-2374.
[12] Laina I, Rupprecht C, Belagiannis V, et al. Deeper depth prediction with fully convolutional residual networks[C]//International Conference on 3D Vision, 2016: 239-248.
[13] 顾婷婷, 赵海涛, 孙韶媛. 基于金字塔型残差神经网络的红外图像深度估计[J]. 红外技术, 2018, 40(5): 21-27.
GU T T, ZHAO H T, SUN S Y. Depth estimation of infrared image based on pyramid residual neural networks[J]. Infrared Technology, 2018, 40(5): 21-27.
[14] WU S C, ZHAO H T, SUN S Y. Depth estimation from infrared video using local-feature-flow neural network[J/OL]. International Journal of Machine Learning and Cybernetics, 2018, doi: 10.1007/s13042-018-0891-9.
[15] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//IEEE Computer Vision and Pattern Recognition, 2016: 770-778.
[16] CHEN L C, Papandreou G, Kokkinos I, et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs[J]. Computer Science, 2015(4): 357-361.
[17] Krähenbühl P, Koltun V. Efficient inference in fully connected CRFs with Gaussian edge potentials[J]. In Advances in Neural Information Processing Systems, 2012(24): 109-117.
[18] LI B, SHEN C, DAI Y, et al. Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs[C]//IEEE Computer Vision and Pattern Recognition, 2015: 1119-1127.
[19] LIU F, SHEN C, LIN G. Deep convolutional neural fields for depth estimation from a single image[C]//IEEE Computer Vision and Pattern Recognition, 2015: 5162-5170.
[20] LIU F, SHEN C, LIN G, et al. Learning depth from single monocular images using deep convolutional neural fields[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(10): 2024-2039.
[21] XU D, WANG W, TANG H, et al. Structured attention guided convolutional neural fields for monocular depth estimation[C]//IEEE Computer Vision and Pattern Recognition, 2018: 3917-3925.
[22] Ibarra-Castanedo C, González D, Klein M, et al. Infrared image processing and data analysis[J]. Infrared Physics and Technology, 2004, 46(1-2): 75-83.
[23] 张蓓蕾, 孙韶媛, 武江伟. 基于DRF-MAP模型的单目图像深度估计的改进算法[J]. 红外技术, 2009, 31(12): 712-715.
ZHANG B L, SUN S Y, WU J W. Depth estimation from monocular images based on DRF-MAP model[J]. Infrared Technology, 2009, 31(12): 712-715.
[24] 席林, 孙韶媛, 李琳娜. 基于SVM模型的单目红外图像深度估计[J]. 激光与红外, 2012, 42(11): 1311-1315.
XI L, SUN S Y, LI L N. Depth estimation from monocular infrared images based on SVM model[J]. Laser and Infrared, 2012, 42(11): 1311-1315.
[25] HUANG G, LIU Z, Laurens V D M, et al. Densely connected convolutional networks[C]//IEEE Conference on Computer Vision and Pattern Recognition, 2017: 4700-4708.
[26] Noh H, HONG S, HAN B. Learning deconvolution network for semantic segmentation[C]//IEEE International Conference on Computer Vision, 2015: 1520-1528.
[27] CAO Y, WU Z, SHEN C. Estimating depth from monocular images as classification using deep fully convolutional residual networks[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2017(99): 1-1.
[28] FU H, GONG M,WANG C, et al. Deep ordinal regression network for monocular depth estimation[C]// IEEE Computer Vision and Pattern Recognition, 2018: 2002-2011.


备注/Memo

Received: 2019-05-29; revised: 2019-07-12.
About the author: WANG Qianqian (b. 1993), female, master's student; her research focuses on computer vision.
Corresponding author: ZHAO Haitao (b. 1974), male, Ph.D., professor; his research focuses on pattern recognition and computer vision. E-mail: haitaozhao@ecust.edu.cn.
Funding: National Natural Science Foundation of China (61375007); Shanghai Science and Technology Commission Basic Research Project (15JC1400600).
更新日期/Last Update: 2020-06-22