The MareNostrum Experimental Exascale Platform (MEEP)
DOI:
https://doi.org/10.14529/jsfi210105Abstract
Nascent Open Source Instruction Set Architectures such as OpenPOWER or RISC-V, allow software/hardware co-designers to fully utilize the underlying hardware, modify it or extend it based on their needs. In this paper, we introduce the vision of the MareNostrum Experimental Exascale Platform (MEEP), an Open Source platform enabling software and hardware stack experimentation targeting the High-Performance Computing (HPC) ecosystem. MEEP is built with state-of-the-art FPGAs that support PCIe and High Bandwidth Memory (HBM), making it ideal to emulate chiplet-based HPC accelerators such as ACME, at the chip, package, and/or system level. MEEP provides an FPGA Shell containing standardized interfaces (I/O and memory), enabling an emulated accelerator to communicate with the hardware of the FPGA and ensures quick integration. The first demonstration of MEEP is mapping a new accelerator, the Accelerated Compute and Memory Engine (ACME), on to this digital laboratory. This enables exploration of this novel disaggregated architecture, which separates the computation from the memory operations, optimizing the accelerator for both dense (compute-bound) as well as sparse (memory-bandwidth bound) workloads. Dense workloads focus on the computational capabilities of the engine, while dedicated processors for memory accesses optimize non-unit stride and/or random memory accesses required by sparse workloads. MEEP is an open source digital laboratory that can provide a future environment for full-stack co-design and pre-silicon exploration. MEEP invites software developers and hardware engineers to build the application, compiler, libraries and the hardware to solve future challenges in the HPC, AI, ML, and DL domains.References
Fact Sheet & Background: Roadrunner Smashes the Petaflop Barrier (2008), http://www-03.ibm.com/press/us/en/pressrelease/24405.wss
The Mont-Blanc Project (2020), https://www.montblanc-project.eu/
Basic Linear Algebra Subprograms (2021), http://www.netlib.org/blas/
The European Processor Initiative (2021), https://www.european-processor-initiative.eu/
Abadi, M., Agarwal, A., Barham, P., et al.: TensorFlow: Large-scale machine learning on heterogeneous systems (2015), https://www.tensorflow.org/
Altman, A., Arafa, M., Balasubramanian, K., et al.: Intel Optane Data Center Persistent Memory. In: 2019 IEEE Hot Chips 31 Symposium (HCS), 18-20 Aug. 2019, Cupertino, CA, USA. pp. i–xxv. IEEE (2019), DOI: 10.1109/HOTCHIPS.2019.8875668
Balkind, J., McKeown, M., Fu, Y., et al.: OpenPiton: an open source hardware platform for your research. Commun. ACM 62(12), 79–87 (2019), DOI: 10.1145/3366343
BSC: eProcessor: European, extendable, energy-efficient, energetic, embedded, extensible, Processor Ecosystem (2021), https://www.bsc.es/research-and-development/projects/eprocessor-european-extendable-energy-efficient-energetic-embedded
Chung, E.S., Nurvitadhi, E., Hoe, J.C., Falsafi, B., Mai, K.: PROToFLEX: FPGA-accelerated Hybrid Functional Simulator. In: 2007 IEEE International Parallel and Distributed Processing Symposium, 26-30 March 2007, Long Beach, CA, USA. pp. 1–6. IEEE (2007), DOI: 10.1109/IPDPS.2007.370516
Flich, J.: MANGO: Exploring Manycore Architectures for Next-GeneratiOn HPC Systems. In: 2017 Euromicro Conference on Digital System Design (DSD), 30 Aug.-1 Sept. 2017, Vienna, Austria. pp. 478–485. IEEE (2017), DOI: 10.1109/DSD.2017.51
Hashemi, M., Mutlu, O., Patt, Y.N.: Continuous runahead: Transparent hardware acceleration for memory intensive workloads. In: 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 15-19 Oct. 2016, Taipei, Taiwan. pp. 61:1–61:12. IEEE Computer Society (2016), DOI: 10.1109/MICRO.2016.7783764
Izraelevitz, J., Yang, J., Zhang, L., et al.: Basic Performance Measurements of the Intel Optane DC Persistent Memory Module. CoRR abs/1903.05714 (2019), http://arxiv.org/abs/1903.05714
Jiang, N., Becker, D.U., Michelogiannakis, G., et al.: A Detailed and Flexible Cycle-Accurate Network-on-Chip Simulator. In: 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 21-23 April 2013, Austin, TX, USA. pp. 86–96. IEEE (2013), DOI: 10.1109/ISPASS.2013.6557149
Lattner, C., Adve, V.: LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In: Proceedings of the 2004 International Symposium on Code Generation and Optimization, 20-24 March 2004, San Jose, CA, USA. pp. 75–86. IEEE (2004), DOI: 10.1109/CGO.2004.1281665
Leyva-Santes, N.I., Perez, I., Hernández-Calderón, C.A., et al.: Lagarto I RISC-V Multi-core: Research Challenges to Build and Integrate a Network-on-Chip. In: Supercomputing, 25-29 March 2019, Monterrey, Mexico,. pp. 237–248. Springer, Cham (2019), DOI: 10.1007/978-3-030-38043-4_20
Lordan, F., Tejedor, E., Ejarque, J., et al.: ServiceSs: An Interoperable Programming Framework for the Cloud. Journal of Grid Computing 12, 1–25 (2013), DOI: 10.1007/s10723-013-9272-5
Message Passing Interface Forum: MPI: A message-passing interface standard. Version 3.1 (2015), https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf, accessed: 2021-04-23
OpenMP Architecture Review Board: OpenMP Application Programming Interface 5.1 (2020), https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5-1.pdf, accessed: 2021-04-23
Perez, B., Fell, A., Davis, J.D.: Coyote: An Open Source Simulation Tool to Enable RISC-V in HPC. In: Design, Automation, and Test in Europe, DATE (2021)
Pillet, V., Labarta, J., Cortes, T., Girona, S.: PARAVER: A Tool to visualize and analyze parallel Code. WoTUG-18 44 (1995)
RISCV: The Spike RISC-V ISA Simulator (2021), https://github.com/riscv/riscv-isa-sim
Semidynamics: Open Vector Interface (2021), https://github.com/semidynamics/OpenVectorInterface
Si-Five: The Sparta Framework (2021), https://github.com/sparcians/map/tree/master/sparta
Smith, J.E.: Decoupled Access/Execute Computer Architectures. ACM Trans. of Computer Sys. 2(4), 289 (1984)
Srinivasan, V., Chowdhury, R.B.R., Rotenberg, E.: Slipstream Processors Revisited: Exploiting Branch Sets. In: 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), 30 May-3 June 2020, Valencia, Spain. pp. 105–117. IEEE (2020), DOI: 10.1109/ISCA45697.2020.00020
Tan, Z., Qian, Z., Chen, X., et al.: DIABLO: A Warehouse-Scale Computer Network Simulator Using FPGAs. In: Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 14-18 March 2015, Istanbul, Turkey. pp. 207–221. ACM, New York, NY, USA (2015), DOI: 10.1145/2694344.2694362
Wissolik, M., Zacher, D., Torza, A., Day, B.: Virtex UltraScale+ HBM FPGA: A Revolutionary Increase in Memory Performance (2019)
Xilinx: UltraScale Architecture GTY Transceivers (2017)
Xilinx: AXI High Bandwidth Memory Controller v1.0 LogiCORE IP Product Guide (2019)
Xilinx: Alveo U280 Data Center Accelerator Card (2020), https://www.xilinx.com/products/boards-and-kits/alveo/u280.html
Xilinx: Virtex UltraScale+ (2020), https://www.xilinx.com/products/silicon-devices/fpga/virtex-ultrascale-plus.html
Xilinx: UltraScale Devices Integrated 100G Ethernet Subsystem v2.6 (2021)
Yang, J., Kim, J., Hoseinzadeh, M., et al.: An Empirical Guide to the Behavior and Use of Scalable Persistent Memory. CoRR abs/1908.03583 (2019), http://arxiv.org/abs/1908.03583
Zaharia, M., Xin, R.S., Wendell, P., et al.: Apache Spark: A unified engine for big data processing. Communications of the ACM 59(11), 56–65 (2016), DOI: 10.1145/2934664
Downloads
Published
How to Cite
Issue
License
Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution-Non Commercial 3.0 License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.