Performance Evaluation of Different Implementation Schemes of an Iterative Flow Solver on Modern Vector Machines

Kenta Yamaguchi; Takashi Soga; Yoichi Shimomura; Thorsten Reimann; Kazuhiko Komatsu; Ryusuke Egawa; Akihiro Musa; Hiroyuki Takizawa; Hiroaki Kobayashi

doi:10.14529/jsfi190106

Authors

Kenta Yamaguchi NEC Solution Innovators
Takashi Soga NEC Solution Innovators
Yoichi Shimomura NEC Solution Innovators
Thorsten Reimann Technische Universität Darmstadt
Kazuhiko Komatsu Tohoku University
Ryusuke Egawa Tohoku University
Akihiro Musa Tohoku University
Hiroyuki Takizawa Tohoku University
Hiroaki Kobayashi Tohoku University

Abstract

Modern supercomputers consist of multi-core processors, and these processors have recently adopted vector instructions, also called SIMD instructions, to improve performance. Numerical simulations need to be vectorized in order to achieve high performance on these processors. Legacy numerical simulation codes that have been in use for a long time often contain two versions of the source code: a non-vectorized version and a vectorized version that is optimized for old vector supercomputers. It is important to clarify which version performs better on modern supercomputers. In this paper, we evaluate the performance of a legacy fluid dynamics simulation code called FASTEST on modern supercomputers in order to provide a guideline for migrating such codes to these systems. The solver has a non-vectorized version and a vectorized version, and the latter uses the hyperplane ordering method for vectorization. For the evaluation, we also implement the red-black ordering method, which is another way to vectorize the solver. We then examine the performance on NEC SX-ACE, NEC SX-Aurora TSUBASA, Intel Xeon Gold, and Intel Xeon Phi. The results show that the shortest execution times are obtained with the red-black ordering method on SX-ACE and SX-Aurora TSUBASA, and with the non-vectorized version on Xeon Gold and Xeon Phi. Therefore, achieving high performance on multiple modern supercomputers may require maintaining multiple code versions. We also show that, of the two vectorization approaches, the red-black ordering method is the more promising for achieving high performance on modern supercomputers.
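
The two orderings named in the abstract can be illustrated with a minimal sketch. It is not taken from FASTEST. It assumes a structured 3D grid in which each point is coupled only to its six face neighbors, and the function names (hyperplanes, red_black, is_independent) are hypothetical. Both orderings split the grid into groups whose points do not depend on each other, so each group can be processed as one vector operation.

```python
from itertools import product

def hyperplanes(nx, ny, nz):
    """Group grid points by i + j + k (hyperplane ordering).

    In a lexicographic sweep, point (i, j, k) depends on (i-1, j, k),
    (i, j-1, k) and (i, j, k-1), which all lie on the previous hyperplane.
    """
    planes = [[] for _ in range(nx + ny + nz - 2)]
    for i, j, k in product(range(nx), range(ny), range(nz)):
        planes[i + j + k].append((i, j, k))
    return planes

def red_black(nx, ny, nz):
    """Split grid points into two colors by the parity of i + j + k."""
    colors = ([], [])
    for i, j, k in product(range(nx), range(ny), range(nz)):
        colors[(i + j + k) % 2].append((i, j, k))
    return colors

def is_independent(group):
    """Return True if no two points in the group are face neighbors."""
    members = set(group)
    offsets = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]  # symmetric, so one direction suffices
    return not any(
        (i + di, j + dj, k + dk) in members
        for (i, j, k) in group
        for (di, dj, dk) in offsets
    )

if __name__ == "__main__":
    planes = hyperplanes(4, 4, 4)
    red, black = red_black(4, 4, 4)
    print("hyperplane group sizes:", [len(p) for p in planes])
    print("red-black group sizes:", len(red), len(black))
```

In this sketch the hyperplane groups start and end with a single point and grow toward the middle of the grid, whereas the two red-black groups each hold half of all points. Red-black ordering therefore offers longer vectors, at the cost of changing the order in which points are updated relative to a lexicographic sweep.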
