The High Performance Interconnect Architecture for Supercomputers
DOI: https://doi.org/10.14529/jsfi230208

Keywords: interconnect, high performance computing, supercomputer, Angara

Abstract
In this paper, we introduce the design of an advanced high-performance interconnect architecture for supercomputers. In the first part of the paper, we consider the first-generation high-performance Angara interconnect (Angara G1). The Angara interconnect is based on a router ASIC that supports a 4D torus topology, deterministic and adaptive routing, and hardware support for RDMA. The host interface is PCI Express. The Angara G1 interconnect achieves an extremely low MPI communication latency of 850 ns and a link bandwidth of 75 Gbps. We present scalability results for the considered application problems on supercomputers equipped with the Angara G1 interconnect. In the second part of the paper, drawing on these research results and our operational experience, we present the architecture of an advanced interconnect for supercomputers (G2). The G2 architecture supports a 6D torus topology, advanced deterministic and zone-adaptive routing algorithms, and low-level interconnect operations, including acknowledgments and notifications. G2 also supports exceptions, performance counters, and SR-IOV virtualization. The G2 hardware is planned in the form factor of a 32-port switch with QSFP-DD connectors and a two-port low-profile PCI Express adapter; the switches can be combined into a 4D torus topology. We present a performance evaluation of an experimental FPGA prototype, which confirms the feasibility of implementing the proposed advanced high-performance interconnect architecture.
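To illustrate the deterministic routing scheme the abstract refers to, the sketch below implements classic dimension-order routing on a torus: each dimension is corrected in turn, taking the shorter ring direction. This is a minimal, generic model of the technique (going back to Dally and Seitz's torus routing chip), not the actual Angara router logic; the node coordinates and the 4x4x4x4 torus size are illustrative assumptions.

```python
def torus_hops(src, dst, dims):
    """Minimal hop count between two torus nodes: in each dimension,
    a ring allows travel in either direction, so take the shorter arc."""
    return sum(min(abs(s - d), n - abs(s - d))
               for s, d, n in zip(src, dst, dims))

def dimension_order_path(src, dst, dims):
    """Deterministic dimension-order route: correct the coordinates
    one dimension at a time, stepping along the shorter ring direction."""
    path = [tuple(src)]
    cur = list(src)
    for i, n in enumerate(dims):
        while cur[i] != dst[i]:
            fwd = (dst[i] - cur[i]) % n          # hops if we step forward
            step = 1 if fwd <= n - fwd else -1   # pick the shorter direction
            cur[i] = (cur[i] + step) % n
            path.append(tuple(cur))
    return path

dims = (4, 4, 4, 4)                  # a small 4D torus, 256 nodes
src, dst = (0, 0, 0, 0), (2, 3, 1, 2)
path = dimension_order_path(src, dst, dims)
# Dimension-order routing is minimal: the path length equals
# the shortest torus distance between the two nodes.
assert len(path) - 1 == torus_hops(src, dst, dims)
```

Because the dimension visiting order is fixed, every packet between a given pair of nodes follows the same path, which keeps the scheme deadlock-free with simple virtual-channel rules; adaptive routing, also supported by the Angara router, instead chooses among minimal paths based on link congestion.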