Welcome to Journal of Beijing Institute of Technology
Volume 29, Issue 1
Citation: Wenxia Bao, Yaping Yang, Dong Liang, Ming Zhu. Multi-Residual Module Stacked Hourglass Networks for Human Pose Estimation[J]. JOURNAL OF BEIJING INSTITUTE OF TECHNOLOGY, 2020, 29(1): 110-119. doi: 10.15918/j.jbit1004-0579.18151

Multi-Residual Module Stacked Hourglass Networks for Human Pose Estimation

doi:10.15918/j.jbit1004-0579.18151
  • Received Date: 2018-11-16
  • A multi-residual module stacked hourglass network (MRSH) is proposed to improve the accuracy and robustness of human pose estimation. The network combines multiple hourglass sub-networks with three new residual modules. Within each hourglass sub-network, the large receptive field residual module (LRFRM) and the multi-scale residual module (MSRM) are used first to learn the spatial relationships between features and body parts at various scales; at the lowest resolution, only the improved residual module (IRM) is used. The final network stacks four hourglass sub-networks, with intermediate supervision at the end of each hourglass, so that high-to-low and low-to-high resolution learning is repeated. The network was evaluated on two public datasets, Leeds Sports Pose (LSP) and MPII Human Pose. The experimental results show that the proposed network improves human pose estimation performance.
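The structure summarized in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the 64-pixel input resolution, the four downsampling steps, the mean-squared-error loss, and all function names are assumptions made for illustration, and the "LRFRM+MSRM" label is only a placeholder, since the abstract does not say how those two modules are arranged. The sketch shows the high-to-low and low-to-high resolution schedule of one hourglass, with the IRM reserved for the lowest resolution, and a summed loss with one term per stacked hourglass (intermediate supervision).

```python
# Illustrative sketch only. The resolution, depth, and MSE loss are assumptions,
# not values taken from the paper.

def hourglass_schedule(input_res=64, depth=4):
    """Return (resolution, module) pairs for one pass through an hourglass.

    High-low: halve the resolution `depth` times; low-high: double it back.
    """
    down = [input_res // (2 ** i) for i in range(depth + 1)]  # 64, 32, 16, 8, 4
    up = down[-2::-1]                                         # 8, 16, 32, 64
    min_res = down[-1]
    schedule = []
    for res in down + up:
        # Per the abstract, only the IRM is used at the minimum resolution.
        module = "IRM" if res == min_res else "LRFRM+MSRM"
        schedule.append((res, module))
    return schedule


def mse(pred, target):
    """Mean squared error between two equally sized, flattened heatmaps."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def intermediate_supervision_loss(stack_outputs, target):
    """Add one loss term per hourglass output so every stack is supervised."""
    return sum(mse(pred, target) for pred in stack_outputs)


NUM_STACKS = 4  # the abstract states four stacked hourglass sub-networks

schedule = hourglass_schedule()

# Toy flattened heatmaps: one prediction per stack, refined stack by stack.
target = [0.0, 1.0, 0.0, 0.0]
stack_outputs = [
    [0.5, 0.5, 0.0, 0.0],
    [0.25, 0.75, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]
total_loss = intermediate_supervision_loss(stack_outputs, target)
```

Because the loss sums over every stack output rather than only the last one, earlier hourglasses receive a direct training signal, which is the purpose of intermediate supervision.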
