High-performance Shallow Water Model for Use on Massively Parallel and Heterogeneous Computing Systems


  • Andrey V. Chaplygin, Marchuk Institute of Numerical Mathematics of the Russian Academy of Sciences
  • Anatoly V. Gusev, Marchuk Institute of Numerical Mathematics of the Russian Academy of Sciences
  • Nikolay A. Diansky, Lomonosov Moscow State University




Keywords: shallow water, supercomputer modeling, heterogeneous computing systems, MPI, OpenMP, CUDA


This paper presents a shallow water model derived from the ocean general circulation sigma model INMOM (Institute of Numerical Mathematics Ocean Model). The model is built on a software architecture that separates the physics-related code from the details of the parallel implementation, thereby simplifying the model’s support and development. As an improvement on the two-dimensional domain decomposition method, we present a block-based decomposition that provides load-balanced and cache-friendly calculations on CPUs. We propose several hybrid parallel programming patterns for efficient calculation on massively parallel and heterogeneous computing systems and evaluate their scaling performance on the Lomonosov-2 supercomputer. We demonstrate that performance per grid point on GPUs decreases dramatically for small grid sizes, starting from 2^19 points per node, while performance on CPUs scales well up to 2^17 points per node. Nevertheless, calculations on GPUs outperform calculations on CPUs by a factor of 4.7 on 30 nodes, using 60 GPUs and 360 CPU cores, at a grid size of 6100 × 4460. We demonstrate that overlapping kernel execution with data transfers on GPUs increases performance by 28%. Furthermore, we demonstrate the advantage of the load-balancing method in the Azov Sea model on both CPUs and GPUs.


How to Cite

Chaplygin, A. V., Gusev, A. V., & Diansky, N. A. (2022). High-performance Shallow Water Model for Use on Massively Parallel and Heterogeneous Computing Systems. Supercomputing Frontiers and Innovations, 8(4), 74–93. https://doi.org/10.14529/jsfi210407