Rui Shan, Lin Jiang, Junyong Deng, Xueting Li, Xubang Shen. Design and Implementation of Memory Access Fast Switching Structure in Cluster-Based Reconfigurable Array Processor[J]. JOURNAL OF BEIJING INSTITUTE OF TECHNOLOGY, 2017, 26(4): 494-504. doi: 10.15918/j.jbit1004-0579.201726.0409

Design and Implementation of Memory Access Fast Switching Structure in Cluster-Based Reconfigurable Array Processor

doi:10.15918/j.jbit1004-0579.201726.0409

1. School of Micro-electronics, Xidian University, Xi'an 710071, China
2. School of Electronic Engineering, Xi'an University of Posts and Telecommunication, Xi'an 710121, China
3. School of Computer Science & Technology, Xi'an University of Posts and Telecommunication, Xi'an 710121, China

Received Date: 2016-12-16

Abstract

Memory access fast switching structures within a cluster are studied, and three fast switching structures (FS, LR2SS, and LAPS) are proposed. A mixed simulation test bench is constructed and used to collect data access delay statistics for the three structures in various cases. Finally, the structures are implemented on a Xilinx FPGA development board, and the DCT, FFT, SAD, IME, FME, and de-blocking filtering algorithms are mapped onto them. Compared with existing architectures, the proposed structures have lower data access delay and smaller area.

Keywords:
- array processor
- distributed memory
- memory access
- switching structure
