Welcome to Journal of Beijing Institute of Technology
Volume 32, Issue 1
Feb. 2023
Ji Lai, Lixin Yang, Dejian Li, Chongfei Shen, Xi Feng, Jizeng Wei, Yu Liu. Design and Optimization of Winograd Convolution on Array Accelerator[J]. JOURNAL OF BEIJING INSTITUTE OF TECHNOLOGY, 2023, 32(1): 69-81. doi: 10.15918/j.jbit1004-0579.2022.094

Design and Optimization of Winograd Convolution on Array Accelerator

doi: 10.15918/j.jbit1004-0579.2022.094
Funds: This paper was supported by the Project of the State Grid Corporation of China in 2022 (No. 5700-201941501A-0-0-00) and the National Natural Science Foundation of China (No. U21B2031).
More Information
  • Author Bio:

    Ji Lai received his B.Eng. degree from the School of Microelectronics, Tianjin University, Tianjin, China, in 2020. He is currently pursuing a master’s degree at the School of Microelectronics, Tianjin University. His research interests include machine learning and the optimization of software and hardware system design

    Lixin Yang received his M.B.A. degree from the School of Economics and Management, University of Chinese Academy of Sciences, Beijing, China. He is now an engineer at Beijing Smart-chip Microelectronics Technology Company Ltd. His research interests include main control chip design, embedded software design and application scheme design

    Dejian Li received his M.E. degree from the Department of Electronic Engineering, Tsinghua University, Beijing, China. He is now an engineer at Beijing Smart-chip Microelectronics Technology Company Ltd. His research interests include VLSI design, master chip architecture design and low power circuit design

    Chongfei Shen received his M.E. degree from the Department of Biomedical Engineering, Tsinghua University, Beijing, China. He is now an engineer at Beijing Smart-chip Microelectronics Technology Company Ltd. His research interests include industrial control chip architecture research, industrial high-reliability software design and navigation algorithm research

    Xi Feng received his M.E. degree from the Department of Microelectronics and Nanoelectronics, Tsinghua University, Beijing, China. He is now an engineer at Beijing Smart-chip Microelectronics Technology Company Ltd. His research interests include VLSI design, security chip architecture design and master chip architecture design

    Jizeng Wei received the B.S. degree from the Harbin Institute of Technology in 2004, and the M.S. and Ph.D. degrees in computer science from Tianjin University, Tianjin, China, in 2007 and 2010, respectively. He is currently an associate professor with the College of Intelligence and Computing, Tianjin University. His research interests include computer architecture, heterogeneous processor design, AI accelerators, and embedded systems

    Yu Liu received his B.Eng. in electronic engineering from Tianjin University, Tianjin, China, in 1998. From 1998 to 2000, he worked as an electronic engineer in Shenzhen, China. In 2000, he returned to Tianjin University and received his M.Eng. in information and communication engineering and his Ph.D. in signal and information processing, both from Tianjin University, in 2002 and 2005, respectively. Currently, he is a full professor with the School of Microelectronics, Tianjin University. From 2011 to 2012, Dr. Liu was a visiting research fellow with the Department of Electrical Engineering, Princeton University, Princeton, NJ, USA. His research interests include sensing and applications in multimedia signal processing, compressed sensing, and machine intelligence

  • Corresponding author: weijizeng@tju.edu.cn
  • Received Date: 2022-08-31
  • Revised Date: 2022-10-29
  • Accepted Date: 2022-11-08
  • Publish Date: 2023-02-28
  • With the rapid development and popularization of artificial intelligence technology, convolutional neural networks (CNNs) are applied in many fields, beginning to replace most traditional algorithms and to be deployed on terminal devices. However, the huge data movement and computational complexity of CNNs pose severe power-consumption and performance challenges to the hardware, which hinders the application of CNNs in embedded devices such as smartphones and smart cars. This paper implements a convolutional neural network accelerator based on the Winograd convolution algorithm on a field-programmable gate array (FPGA). First, a convolution kernel decomposition method for Winograd convolution is proposed: kernels larger than 3×3 are divided into multiple 3×3 kernels whose partial results are accumulated, so that convolutions with different kernel sizes can be handled by a unified 3×3 datapath. Then, a Winograd convolution array is designed with configurable multipliers that flexibly support multiplication on data of different precisions. Experimental results on the VGG16 and AlexNet networks show that our accelerator achieves the highest energy efficiency, 101 times that of a CPU and 5.8 times that of a GPU, and also exceeds the energy efficiency of other convolutional neural network accelerators.
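    The Winograd algorithm at the core of the accelerator trades most of the multiplications in a small convolution for cheap additions in transform matrices. As an illustrative sketch only (not the paper's hardware design), the following NumPy code computes a single F(2×2, 3×3) output tile using the standard transform matrices from Lavin and Gray's fast-convolution work; the function names are our own.

    ```python
    import numpy as np

    # Winograd F(2x2, 3x3) transform matrices (standard, from Lavin & Gray)
    BT = np.array([[1,  0, -1,  0],
                   [0,  1,  1,  0],
                   [0, -1,  1,  0],
                   [0,  1,  0, -1]], dtype=np.float64)
    G = np.array([[1.0,  0.0, 0.0],
                  [0.5,  0.5, 0.5],
                  [0.5, -0.5, 0.5],
                  [0.0,  0.0, 1.0]], dtype=np.float64)
    AT = np.array([[1, 1,  1,  0],
                   [0, 1, -1, -1]], dtype=np.float64)

    def winograd_f2x2_3x3(d, g):
        """One output tile: 4x4 input tile d and 3x3 kernel g -> 2x2 output."""
        U = G @ g @ G.T      # transformed kernel (4x4)
        V = BT @ d @ BT.T    # transformed input tile (4x4)
        M = U * V            # 16 element-wise multiplies instead of 36
        return AT @ M @ AT.T # inverse transform to the 2x2 output tile

    def direct_conv(d, g):
        """Reference: valid 2D cross-correlation of a 4x4 input with a 3x3 kernel."""
        out = np.zeros((2, 2))
        for i in range(2):
            for j in range(2):
                out[i, j] = np.sum(d[i:i + 3, j:j + 3] * g)
        return out
    ```

    Each 2×2 output tile costs 16 element-wise multiplies versus the 36 of direct 3×3 convolution; the kernel decomposition described in the abstract lets the same 3×3 tile engine serve larger kernels by accumulating partial results.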
  • [1]
    G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. R. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury,“Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82-97, 2012. doi:10.1109/MSP.2012.2205597
    [2]
    D. Ciregan, U. Meier, and J. Schmidhuber, “Multi-column deep neural networks for image classification,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3642-3649, 2012.
    [3]
    J. Morajda, “Neural networks and their economic applications,” in Artificial Intelligence and Security in Computing Systems. Springer, 2003, pp. 53-62.
    [4]
    J. L. Patel and R. K. Goyal,“Applications of artificial neural networks in medical science,” Current Clinical Pharmacology, vol. 2, no. 3, pp. 217-226, 2007. doi:10.2174/157488407781668811
    [5]
    H. Malmgren, M. Borga, and L. Niklasson, “Artificial neural networks in medicine and biology,” in Proceedings of the ANNIMAB-1 Conference,Göteborg, Sweden, 13-16 May 2000. Springer Science & Business Media, 2012.
    [6]
    J. Albericio, P. Judd, T. Hetherington, T. Aamodt, N. E. Jerger, and A. Moshovos,“Cnvlutin: Ineffectual-neuron-free deep neural network computing,” ACM SIGARCH Computer Architecture News, vol. 44, no. 3, pp. 1-13, 2016. doi:10.1145/3007787.3001138
    [7]
    A. Krizhevsky, I. Sutskever, and G. E. Hinton,“Imagenet classification with deep convolutional neural networks,” Communications of the ACM, vol. 60, no. 6, pp. 84-90, 2017. doi:10.1145/3065386
    [8]
    K. Fukushima,“A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position,” Biol. Cybern., vol. 36, pp. 193-202, 1980. doi:10.1007/BF00344251
    [9]
    K. Simonyan and A. Zisserman,“Two-stream convolutional networks for action recognition in videos,” Advances in Neural Information Processing Systems, vol. 27, pp. 568-576, 2014.
    [10]
    A. Parashar, M. Rhu, A. Mukkara, A. Puglielli, R. Venkatesan, B. Khailany, J. Emer, S. W. Keckler, and W. J. Dally,“SCNN: An accelerator for compressed-sparse convolutional neural networks,” ACM SIGARCH Computer Architecture News, vol. 45, no. 2, pp. 27-40, 2017. doi:10.1145/3140659.3080254
    [11]
    Y. Cai, K. Zhou, X. Xue, M. Wang, and X. Zeng, “Nonvolatile binary CNN accelerator with extremely low standby power using rram for iot applications,” in 2019 IEEE 13th International Conference on ASIC (ASICON), pp. 1-4, 2019.
    [12]
    A. Lavin and S. Gray, “Fast algorithms for convolutional neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4013-4021, 2016.
    [13]
    Y. Cao, X. Wei, T. Qiao, and H. Chen, “FPGA-based accelerator for convolution operations,” in 2019 IEEE International Conference on Signal, Information and Data Processing (ICSIDP), pp. 1-5, 2019.
    [14]
    Y. Chen, T. Luo, S. Liu, S. Zhang, L. He, J. Wang, L. Li, T. Chen, Z. Xu, N. Sun, and O. Temam, “Dadiannao: A machine-learning supercomputer,” in 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 609-622, 2014.
    [15]
    B. Barabasz and D. Gregg, “Winograd convolution for DNNs: Beyond linear polynomials,” in International Conference of the Italian Association for Artificial Intelligence, pp. 307-320, 2019.
    [16]
    K. Vincent, K. Stephano, M. A. Frumkin, B. Ginsburg, and J. Demouth, “On improving the numerical stability of winograd convolutions,” in 5th International Conference on Learning Representations, ICLR 2017, pp. 1-4, 2017.
    [17]
    L. Meng and J. Brothers, “Efficient winograd convolution via integer arithmetic,” arXiv preprint,arXiv: 1901.01965, 2019.
    [18]
    L. Lu, Y. Liang, Q. Xiao, and S. Yan, “Evaluating fast algorithms for convolutional neural networks on FPGAs,” in 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), pp. 101-108, 2017.
    [19]
    D. Huang, X. Zhang, R. Zhang, T. Zhi, D. He, J. Guo, C. Liu, Q. Guo, Z. Du, S. Liu et al., “DWM: A decomposable winograd method for convolution acceleration,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 4, pp. 4174-4181, 2020.
    [20]
    Y. Umuroglu, N. J. Fraser, G. Gambardella, M. Blott, P. Leong, M. Jahre, and K. Vissers, “Finn: A framework for fast, scalable binarized neural network inference,” in Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 65-74, 2017.
    [21]
    K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, 2016.
    [22]
    O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei,“Imagenet large scale visual recognition challenge,” International Journal of Computer Vision, vol. 115, no. 3, pp. 211-252, 2015. doi:10.1007/s11263-015-0816-y
    [23]
    Y. Guo, A. Yao, and Y. Chen,“Dynamic network surgery for efficient DNNs,” Advances in Neural Information Processing Systems, vol. 29, pp. 1387-1395, 2016.

