Towards a performance portable, architecture agnostic implementation strategy for weather and climate models

Oliver Fuhrer, Carlos Osuna, Xavier Lapillonne, Tobias Gysi, Ben Cumming, Mauro Bianco, Andrea Arteaga, Thomas Christoph Schulthess

Abstract


We propose a software implementation strategy for complex weather and climate models that produces performance-portable, architecture-agnostic code. It relies on domain- and data-structure-specific tools that are usable within common model development frameworks (Fortran today and possibly high-level programming environments like Python in the future). We present the strategy through a refactoring project of the atmospheric model COSMO, in which we have rewritten the dynamical core and refactored the remaining Fortran code. The dynamical core is built on top of the domain-specific "Stencil Loop Language" for stencil computations on structured grids, a generic framework for halo exchange and boundary conditions, and a generic communication library that handles data exchange on a distributed-memory system. All these tools are implemented in C++, making extensive use of generic programming and template metaprogramming. The refactored code is shown to outperform the current production code and is performance portable to various hybrid CPU-GPU node architectures.
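To make the idea of a template-based stencil abstraction more concrete, the following is a minimal C++ sketch of how a stencil operator can be kept separate from the loop that applies it on a structured grid. It is an illustration only and does not reproduce the interface of the tools described in the abstract; all names (Field, Laplacian, RowMajorBackend, ColumnMajorBackend, apply_stencil, make_paraboloid) are hypothetical, and the two "backends" differ only in loop order, standing in for what would be CPU- and GPU-specific implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2D field on a structured grid, padded with a halo of width 1.
struct Field {
    int nx, ny;                // interior size
    std::vector<double> data;  // (nx + 2) * (ny + 2) values, row-major

    Field(int nx_, int ny_, double init = 0.0)
        : nx(nx_), ny(ny_),
          data(static_cast<std::size_t>((nx_ + 2) * (ny_ + 2)), init) {}

    // Valid indices: i in [-1, nx], j in [-1, ny] (halo included).
    std::size_t index(int i, int j) const {
        assert(i >= -1 && i <= nx && j >= -1 && j <= ny);
        return static_cast<std::size_t>((j + 1) * (nx + 2) + (i + 1));
    }
    double& operator()(int i, int j) { return data[index(i, j)]; }
    double operator()(int i, int j) const { return data[index(i, j)]; }
};

// The stencil describes the update at one grid point and nothing else.
struct Laplacian {
    static double apply(const Field& in, int i, int j) {
        return in(i + 1, j) + in(i - 1, j) + in(i, j + 1) + in(i, j - 1)
               - 4.0 * in(i, j);
    }
};

// A backend owns the loop structure. Here: j outer, i inner.
struct RowMajorBackend {
    template <class Stencil>
    static void run(const Field& in, Field& out) {
        for (int j = 0; j < in.ny; ++j)
            for (int i = 0; i < in.nx; ++i)
                out(i, j) = Stencil::apply(in, i, j);
    }
};

// A second backend with the loops interchanged: i outer, j inner.
struct ColumnMajorBackend {
    template <class Stencil>
    static void run(const Field& in, Field& out) {
        for (int i = 0; i < in.nx; ++i)
            for (int j = 0; j < in.ny; ++j)
                out(i, j) = Stencil::apply(in, i, j);
    }
};

// User code names the stencil and the backend; both are resolved at compile time.
template <class Stencil, class Backend>
void apply_stencil(const Field& in, Field& out) {
    Backend::template run<Stencil>(in, out);
}

// Test input f(i, j) = i^2 + j^2, filled on the interior and the halo.
Field make_paraboloid(int nx, int ny) {
    Field f(nx, ny);
    for (int j = -1; j <= ny; ++j)
        for (int i = -1; i <= nx; ++i)
            f(i, j) = static_cast<double>(i * i + j * j);
    return f;
}
```

Calling apply_stencil<Laplacian, RowMajorBackend>(in, out) and apply_stencil<Laplacian, ColumnMajorBackend>(in, out) on the same input yields the same interior values, because the stencil definition is untouched when the loop strategy changes. In this sketch the halo is simply filled by hand; in a distributed-memory setting it would be populated by a halo exchange before the stencil is applied.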






Publishing Center of South Ural State University (454080, Lenin prospekt, 76, Chelyabinsk, Russia)