Journal of Beijing Institute of Technology, Volume 26, Issue 4
Rui Shan, Lin Jiang, Junyong Deng, Xueting Li, Xubang Shen. Design and Implementation of Memory Access Fast Switching Structure in Cluster-Based Reconfigurable Array Processor[J]. JOURNAL OF BEIJING INSTITUTE OF TECHNOLOGY, 2017, 26(4): 494-504. doi: 10.15918/j.jbit1004-0579.201726.0409

Design and Implementation of Memory Access Fast Switching Structure in Cluster-Based Reconfigurable Array Processor

  • Received Date: 2016-12-16
  • Memory access fast switching structures within a cluster are studied, and three fast switching structures (FS, LR2SS, and LAPS) are proposed. A mixed simulation test bench is constructed and used to collect statistics on the data access delay of these three structures under various cases. Finally, the structures are implemented on a Xilinx FPGA development board, and the DCT, FFT, SAD, IME, FME, and de-blocking filtering algorithms are mapped onto them. Compared with existing architectures, the proposed structures achieve lower data access delay and smaller area.
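The abstract reports data access delay statistics but does not describe how FS, LR2SS, or LAPS arbitrate requests. As a generic illustration of what such a statistic measures, the sketch below assumes a cluster of PEs sharing single-ported memory banks, where each bank serves one request per cycle and arbitrates by earlier issue cycle first, then lower PE id. The request format and the names `access_delays` and `mean_access_delay` are hypothetical; this is a toy contention model, not a model of the paper's structures or test bench.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical request record: (issue_cycle, pe_id, bank_id).
Request = Tuple[int, int, int]


def access_delays(requests: List[Request]) -> List[int]:
    """Return the wait (in cycles) of every request.

    Assumption: each bank serves at most one request per cycle and
    grants requests in (issue_cycle, pe_id) order.
    """
    by_bank: Dict[int, List[Request]] = defaultdict(list)
    for req in requests:
        by_bank[req[2]].append(req)

    delays: List[int] = []
    for bank_reqs in by_bank.values():
        next_free = 0  # first cycle at which this bank is idle
        for issue, _pe, _bank in sorted(bank_reqs):
            service = max(issue, next_free)  # wait while the bank is busy
            delays.append(service - issue)
            next_free = service + 1
    return delays


def mean_access_delay(requests: List[Request]) -> float:
    """Average wait over all requests (0.0 if there are none)."""
    delays = access_delays(requests)
    return sum(delays) / len(delays) if delays else 0.0


if __name__ == "__main__":
    # Four PEs hit the same bank in cycle 0: waits are 0, 1, 2, 3.
    print(mean_access_delay([(0, pe, 0) for pe in range(4)]))  # 1.5
    # Four PEs hit four different banks in cycle 0: no conflicts.
    print(mean_access_delay([(0, pe, pe) for pe in range(4)]))  # 0.0
```

In this toy model, delay comes only from bank conflicts, so spreading accesses across banks drives the mean toward zero. A real switching structure adds its own routing and arbitration latency on top of this.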


    Corresponding author: Chen Bin, bchen63@163.com
    School of Materials Science and Engineering, Shenyang University of Chemical Technology, Shenyang 110142
