The Parallel Performance of SLNE Atmosphere–Ocean–Sea Ice Coupled Model

The paper presents the first version of the SLNE coupled model. SL and NE here are the first two letters of the SLAV (Semi-Lagrangian, based on Absolute Vorticity equation) atmospheric model and the NEMO (Nucleus for European Modelling of the Ocean) ocean model, which have been coupled using the OASIS3-MCT software. SLAV uses a 0.9° × 0.72° regular lat-lon grid with 96 vertical levels. NEMO incorporates the SI3 sea ice model; both use the same ORCA025 tripolar grid. Flux adjustments to correct inconsistencies at the interface between the coupled atmosphere and ocean models have not been applied in SLNE. The model design and coupling particularities are described here in detail. A series of numerical experiments with the SLNE model was performed to measure its parallel performance. We also investigated the scalability of the SLNE model and its components in terms of simulation speed. Based on these results, optimum configurations of SLNE were identified. It was found that the coupled model showed a scaling efficiency of about 85% on 4000 computational cores of a Cray XC40-LC in comparison with the SLNE configuration running on 224 cores. Simulations with lead times ranging from a few days to several years showed that there are no significant systematic errors in the coupled model.


Introduction
Modern coupled models for simulating climate change were preceded by their earlier counterparts [12]. The history of the development of such models begins with conceptual models [5,50], followed by mathematical models of energy balance [4,53] and radiative transfer [7], as well as simple analog models [19,47]. Smagorinsky was probably the first researcher to realize the importance of atmosphere-ocean coupling for climate modelling. Under his leadership, the first coupled atmosphere-ocean general circulation model was created [41] at the Geophysical Fluid Dynamics Laboratory (GFDL). Concurrently with GFDL, work on the development of atmospheric general circulation models was carried out at the University of California, Los Angeles (UCLA) Department of Meteorology (known as the UCLA model series), and, a few years later, at the US National Center for Atmospheric Research (NCAR) [33].
Nikita Moiseev's model [45], developed at the Computer Centre of the Academy of Sciences, is apparently the first internationally recognized Soviet coupled model. More recent research by the Alexandrov-led team has focused on studies of global changes in the biosphere, including economic, social and demographic aspects [1]. One of the first models of global atmospheric circulation was developed by Gury Marchuk, who used numerical modelling of atmospheric processes for numerical weather forecasting. From 1973, Marchuk created "mathematic calculations of atmospheric-ocean dynamics" [9], which were later tested on a supercomputer.
Nowadays, software packages based on coupled models are widely used to assess climate change and to study the Earth's climate in the past [13]. Coupled models are also used for long-range weather prediction in leading World Meteorological Organization (WMO) meteorological centers: the Met Office Global Seasonal forecasting system (GloSea) [8,37] in the UK Meteorological Service, the Meteo-France seasonal forecasting system 8 (SYSTEM8) [25] in France, the Climate Forecast System Version 2 (CFSv2) [51] in the USA, the Canadian Seasonal to Interannual Prediction System Version 2 (CanSIPSv2) [36] in Canada, and the Japan Meteorological Agency/Meteorological Research Institute Coupled Prediction System Version 3 (JMA/MRI-CPS3) [28] in Japan. The typical horizontal mesh resolution of these models ranges from 25 to 80 km (in the atmospheric component).
The development and implementation of coupled models for global and regional medium-range weather forecasting (with lead times up to 10 days) is mainly driven by the idea that, by resolving ocean circulation, coastal ocean features and air-sea interaction, a more accurate representation of atmosphere, ocean and sea ice dynamics can be achieved. Another motivation for using coupled models is the idea of seamless prediction [29], whereby a single model family can be used for prediction across a range of timescales. Canada, the UK and the European Centre for Medium-Range Weather Forecasts (ECMWF) are currently using coupled models for medium-range weather forecasting [61]. A coupled model is planned for implementation for medium-range weather prediction in the USA in 2025. In [56] a statistically significant improvement is shown in the accuracy of medium-range weather prediction by the coupled Global Environmental Multiscale Model (GEM) [21] and the Global Ice Ocean Prediction System (GIOPS), which is based on the Nucleus for European Modelling of the Ocean (NEMO) model, in comparison with the operational forecast model of the Meteorological Center of Canada. Reference [40] discusses results on the accuracy of tropical cyclone intensity and trajectory predictions based on the coupled atmosphere-ocean model of ECMWF. Note that the Coupled Ocean-Atmosphere Mesoscale Prediction System-Tropical Cyclones (COAMPS-TC) [10] model is used for predicting the trajectory and intensity of tropical cyclones in the Atlantic Ocean and the Eastern Pacific Ocean in the USA. A coupled model [62] incorporates the ENDGame-based (Even Newer Dynamics for the General Atmospheric Modelling of the Environment) [76] Met Office Unified Model (MetUM) of the UK Meteorological Service coupled with the NEMO ocean model. This model is developed using the Ocean-Atmosphere-Sea-Ice-Soil (OASIS) [68] software to study and forecast typhoons in Southeast Asia (the Maritime Continent). The spatial resolution of this model is about 4.5 km.
In Russia, coupled models are also being developed. The Marchuk Institute of Numerical Mathematics Climate Model (INMCM) [71,72] participates in the Climate Model Intercomparison Project (CMIP), the objective of which is to better understand past, present and future Earth climate changes in a multi-model context [75]. INMCM is also used in the long-range forecasting system developed in the Hydrometeorological Research Center of the Russian Federation (Hydrometcenter of Russia). A team from the Institute of Computational Mathematics and Mathematical Geophysics of the Siberian Branch of the Russian Academy of Sciences (SB RAS) develops the Planet Simulator of the Institute of Computational Mathematics and Mathematical Geophysics (PlaSim-ICMMG) [49]. This is an intermediate-complexity climate model that includes an in-house ocean model [22,23], the Portable University Model of the Atmosphere (PUMA) [18] and the Los Alamos sea ice model CICE (Community Ice CodE) [30]. Since the proprietary Sib-CIOM Coupling Module (SCM) software [24] is used to couple these models, there is another configuration of the SB RAS climate model that incorporates INMCM's atmospheric model instead of PUMA: the Siberian coupled ice-ocean model (INMCM-SibCIOM). A spectral atmospheric model developed at the A.I. 
Voyeykov Main Geophysical Observatory (MGO) is currently used in the Hydrometcenter of Russia as one of the components for probabilistic long-range forecasting of weather anomalies. Based on this model, a coupled model is also being developed at MGO [42,43]. It is expected that, once the coupled model is finalized, it will be applied for long-range forecasting. Another coupled model for long-range weather prediction under development is the SLAV-INMIO-CICE model [14]. This model combines the SLAV atmospheric model [63], developed at the Marchuk Institute of Numerical Mathematics RAS (INM RAS) and the Hydrometcenter of Russia, the INMIO (Institute of Numerical Mathematics and Shirshov Institute of Oceanology RAS) ocean model [31], and the CICE sea ice model. The coupling is performed using the in-house Coupling Modeling Framework (CMF) [32].
Inspired by the idea of seamless prediction, a new coupled model, SLNE, was developed. SLNE is the acronym of the models to be coupled: SLAV and NEMO. Note that NEMO includes the SI3 sea ice model. Both the SLAV and NEMO models are applied over a wide range of time scales: medium-range weather forecasting, sub-seasonal prediction (with lead times from 2 to 6 weeks) and long-range ensemble prediction. Initial conditions for these models are provided by software developed at the Hydrometcenter of Russia: the 3dvar meteorological data assimilation system [67] is used for SLAV, and the oceanographic data assimilation system [58] is applied for NEMO. The OASIS3-MCT [68] software is used to couple SLAV and NEMO, and the Spherical Coordinate Remapping and Interpolation (SCRIP) [48,55] library is used for remapping the exchanged data. The SLNE coupled model has been developed for medium-range weather forecasting and long-range forecasting of weather anomalies.
Coupled models are computationally demanding. This is due not only to the sum of the costs of the individual model components, but also to the additional costs of the coupler, mapping procedures and load imbalances between the components. This paper focuses on SLNE computational efficiency and parallel scaling analysis in order to extend the model usability, improve its performance and focus future research on evaluating simulation accuracy. The paper is organized as follows. Section 1 presents the coupled model and its components. Section 2 describes the parallel structure of SLNE, the results of the parallel scalability study and optimal coupled model configurations. The results of numerical simulations are given in Section 3. Finally, we summarize the conclusions of the paper.

SLAV Atmospheric Model
The SLAV atmospheric model [64] was developed at the Marchuk Institute of Numerical Mathematics RAS (INM RAS) and the Hydrometcenter of Russia. SLAV20 and SLAV10 are used for medium-range numerical weather prediction with lead times up to 10 days at the Hydrometcenter of Russia [65]. SLAV10 was released in 2023. The features of this model are its high horizontal resolution (about 10 km over the Northern hemisphere, compared to 20 km in SLAV20), a more detailed description of the lower troposphere, and improved parameterizations of subgrid-scale processes. A coarser-resolution version of SLAV10, named SLAV072L96, has been applied for ensemble medium-range weather prediction since 2022. This model was also submitted for operational testing in 2022 for long-range prediction of weather anomalies at time scales of up to 4 months. The SLNE coupled model is based on the most recent version of SLAV072L96 [66], which differs from SLAV2008 in many aspects. The most notable difference between these models is the enhanced spatial resolution and the increased number of vertical levels, from 28 to 96. The top boundary of this model is located at 0.03 hPa. In the horizontal, SLAV072L96 uses a regular lat-lon grid with a 0.72° grid step in latitude and 0.9° resolution in longitude.
Together with the increase of the horizontal and vertical resolution of the SLAV model, the parameterizations of subgrid-scale processes have been significantly improved with respect to SLAV2008. The key changes include the methods of describing radiative transfer in the Earth's atmosphere (the CLIRAD SW [6] and RRTMG LW [44] packages are now used), the atmospheric boundary layer, and the land surface model, which was supplemented by a multilayer soil model [70] developed at INM RAS and the Research Computing Center of Lomonosov Moscow State University (RCC MSU). A detailed description of all improvements is the subject of a separate paper [66].
It should be noted that a substantial amount of time in the development of the new SLAV072L96 model version was devoted to model tuning in order to match the behavior of all parameterizations and the dynamical core [15]. Due to this work, the monthly and annually averaged characteristics of the model atmosphere (including surface heat fluxes) became close to the ERA5 reanalysis [27]. The annual surface energy budget imbalance is less than 1 W/m². The zonal wind speed bias at all heights, including the near-surface layer, was significantly reduced in SLAV072L96 due to a recent study [17] on improving the deep convection and wind gustiness parameterizations.
The vertical grid structure in SLAV072L96 is set to provide increased resolution in the lower troposphere and in the stratosphere. The non-uniform vertical grid spacing makes it possible to describe explicitly the atmospheric processes in the stratosphere. It is shown in [52] that SLAV072L96 successfully reproduces the mean zonal wind speed and temperature distribution in the winter and summer seasons in comparison with the ERA5 reanalysis [27]. The period and amplitude of the quasi-biennial equatorial wind oscillation are also close to ERA5. Explicit simulation of stratospheric dynamics is important for several reasons. First, [3] shows that the effect of the El Niño-Southern Oscillation on the eddy-driven jet during spring and early summer occurs via the stratosphere. Second, tropical variability associated with the Madden-Julian oscillation (MJO) has been shown to affect the circulation in the extratropical stratosphere during boreal winter [54] and can lead to extended predictability at the surface.

NEMO-SI3 Ocean-Sea Ice Model Configuration
The coupled SLNE model uses NEMO version 4.0.4 [38] with the SI3 sea ice model [69]. NEMO and SI3 use the same ORCA025 grid with a horizontal resolution of about 1/4°. The ORCA family is a series of global orthogonal curvilinear ocean meshes generated using a semi-analytical method [39]. This is a tripolar grid that has no singularity point inside the computational domain because the two north poles of the mesh are placed over land. The ORCA025 computational grid has 1442 nodes along the first horizontal dimension and 1021 nodes along the second dimension. The ocean model has 75 vertical levels. Data on the temperature and salinity of river runoff are read from a file; unfortunately, the river runoff temperature is not consistent with the temperature of the lower atmosphere. The coupled model does not include iceberg floats or a wave model, but an internal wave-driven mixing parameterization is used in NEMO.
SI3 simulates both ice dynamics (a two-dimensional continuum elastic-viscous-plastic formulation is used) and thermodynamics (via a one-dimensional approximation along the vertical coordinate). The number of ice categories in SI3 is 5, the number of ice layers is 2, and there is one snow layer over the sea ice.
The resolution and configuration of NEMO and SI3 are the same as those used at Hydrometcenter of Russia (see [57,58] for details).NEMO and SI3 share the same Message Passing Interface (MPI) communicator.

Coupled Model Design
The SLAV072L96 and NEMO-SI3 models are coupled using the OASIS3-MCT software, which performs parallel exchange of coupling data between SLAV and NEMO. OASIS3-MCT uses the MPI library for direct parallel communications between the components of the coupled model, and it is able to gather and scatter the arrays of coupling data. Since the computational meshes of SLAV and NEMO differ, OASIS3-MCT provides a service for run-time remapping of the exchanged data using pre-computed interpolation weights and addresses. The current OASIS3-MCT software internally uses the Model Coupling Toolkit (MCT) [34], which implements parallel remapping as a parallel matrix-vector multiplication. OASIS3-MCT supports coupling of 2D logically rectangular arrays; 3-dimensional arrays expressed on unstructured grids are also supported by the software using a one-dimension degeneration of the data structures.
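The remapping step can be illustrated with a minimal sketch (hypothetical names and toy data, not OASIS3-MCT or MCT code): each pre-computed interpolation weight links one source cell to one destination cell, so applying the mapping amounts to a sparse matrix-vector product.

```python
# Illustrative sketch of remapping as a sparse matrix-vector product.
# The weight triplets stand in for what a tool such as SCRIP pre-computes
# offline; all names and values here are hypothetical.

def apply_remap(weights, src_field, n_dst):
    """weights: list of (dst_index, src_index, w) triplets;
    src_field: flattened source array; returns the destination field."""
    dst_field = [0.0] * n_dst
    for dst, src, w in weights:
        dst_field[dst] += w * src_field[src]
    return dst_field

# Toy example: destination cell 0 averages source cells 0 and 1.
weights = [(0, 0, 0.5), (0, 1, 0.5), (1, 2, 1.0)]
print(apply_remap(weights, [10.0, 20.0, 30.0], 2))  # [15.0, 30.0]
```

In the parallel case, each MPI process holds only the weight triplets whose destination indices fall in its local partition, which is what makes the multiplication distributable.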
OASIS3-MCT is a library that is compiled and linked to the coupled model components. At runtime, all components as well as OASIS3-MCT are launched together. The MPI_COMM_WORLD communicator is split using the OASIS3-MCT Application Programming Interface (API) into "private" communicators for SLAV and for NEMO as part of the initialization procedure of the coupled model. All of the MPI data transfer within the components is then done via these "private" communicators, whereas the exchange of data between SLAV and NEMO is done via the OASIS3-MCT API. This means that components call the OASIS3-MCT API to send or receive data directly from within the component code. These OASIS3-MCT API calls are located in a part of the program known as the component interface.
OASIS3-MCT needs information about each component's computational grid structure, data decomposition and coupling field identifiers to perform remapping at run-time. This information is transferred to OASIS3-MCT at the component initialization stage. Information on the method and frequency (coupling period) of data exchange between components is specified in the OASIS3-MCT configuration file.
To interact with the rest of the coupled system, a SLAV interface for the OASIS3-MCT API has been developed. It includes the following stages:
• Preliminary initialization of SLAV as a component of the coupled model, including partition definition, time step and computational grid declaration. The partition definition implies that all the MPI processes of the component have to describe, in a global index space, the local partitioning of the computational grids on which the coupling data is expressed. OASIS3-MCT supports several partition types: serial, apple (each partition is a segment of the global domain, described by its global offset and its local size), box, orange (each partition is an ensemble of segments, each described by its global offset and its local extent in the global domain) and points. In SLNE the box partition approach is used, meaning that each partition is a rectangular region of the global domain.
• Component initialization: memory allocation, reading initial conditions, preliminary computations, etc.
• Definition phase: declaration of the coupling data arrays.
• Preliminary data exchange phase. The SLAV model component, receiving boundary conditions from the NEMO model, has to wait for the coupling data (sea surface temperature, sea ice temperature and concentration) before it can perform its own calculations. OASIS3-MCT supports only regular (once each coupling period) data exchange. Therefore, the preliminary data exchange and the appropriate pre-processing of the coupling data were implemented in the main code of SLAV and NEMO.
• Main time loop. In the component time step loop, each MPI process additionally performs its part of the coupling data sending and receiving via the OASIS3-MCT API (which internally uses non-blocking MPI requests). The sending (receiving) is performed if the actual time corresponds to a time at which it should be activated. This is controlled by OASIS3-MCT, to which the component passes its actual time.
• Termination. In total, six to eight OASIS3-MCT API routines have to be called by each component to get the local MPI communicator, to declare the component id and grid partitioning, to define, send and receive the coupling data and, finally, to close the MPI context at the end of the component run.
These actions are performed in the aforementioned component interface that was developed for the SLAV model. On the NEMO side, the coupling interface was created by the community and customized for use in SLNE. Within the SLAV model framework, a number of numerical methods were implemented to compute the lower-atmosphere properties and surface fluxes passed to NEMO and SI3.
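The gating of sends and receives in the main time loop can be sketched as follows (a hedged illustration with invented function names, not the OASIS3-MCT API): the component passes its actual model time, and an exchange is activated only when that time is a multiple of the coupling period.

```python
# Hedged sketch of the component-side coupling logic: inside the time loop,
# a send/receive fires only at coupling instants. `exchange` stands in for
# the coupler put/get calls; all names here are hypothetical.

def run_component(dt, n_steps, cpl_period, exchange):
    for step in range(n_steps):
        t = step * dt                 # actual model time in seconds
        if t % cpl_period == 0:       # coupling instant: put/get data
            exchange(t)
        # ... component's own dynamics and physics for this step ...

log = []
run_component(dt=1440, n_steps=6, cpl_period=2880, exchange=log.append)
print(log)  # [0, 2880, 5760]
```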
The time step in the SLAV atmospheric model is 1440 s, while NEMO uses a 720 s time step; the NEMO time step is thus half of SLAV's. Therefore, the coupling period used in SLNE is a multiple of the SLAV time step. The atmosphere and ocean models exchange two-dimensional data arrays only. Mapping of the coupling data is performed using bilinear interpolation from the SCRIP [48,55] library.
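As an illustration of the bilinear mapping, here is a minimal sketch of the four corner weights that a SCRIP-style mapping file encodes for one target point, given the target's normalized position (u, v) inside a source cell (the corner values below are hypothetical):

```python
# Bilinear weights for one target point inside a source cell.
# (u, v) are the normalized local coordinates in [0, 1]; the four
# weights always sum to one. Purely illustrative, not SCRIP code.

def bilinear_weights(u, v):
    return [(1 - u) * (1 - v), u * (1 - v), (1 - u) * v, u * v]

corners = [10.0, 20.0, 30.0, 40.0]   # hypothetical corner values
w = bilinear_weights(0.25, 0.5)
print(sum(w), sum(wi * c for wi, c in zip(w, corners)))  # 1.0 22.5
```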
The SLAV and NEMO models run in parallel as different binaries launched together within a single job on the Cray XC-40 (see Sec. 2.1). A peculiarity of the program design of the SLAV model is the use of MPI-based one-dimensional domain decomposition in latitude supplemented by OpenMP loop parallelization in longitude. Computation of the parameterizations of subgrid-scale processes is the most resource-demanding part of the SLAV model. Parameterizations are calculated independently for each vertical column, which makes the use of OpenMP important for SLAV, while NEMO does not support OpenMP. Therefore, the coupled model was assembled as two independent executable files, corresponding to the atmosphere and ocean-sea ice models, each executed with its own environment.

Physical Basis
In coupled mode, the SLAV model performs computations for a coupling period of time. It calculates the surface fluxes and accumulates precipitation. These data are then passed to NEMO, which uses them as the upper boundary condition while computing the ocean dynamics. After a coupling period, updated ocean data are sent to the SLAV model to be used as the lower boundary condition. The coupling period defines the moments of time at which the models are synchronized.
The data coupling between the SLAV and NEMO models is organized as follows. The atmospheric model sends 10 surface fields to the ocean model, including two heat fluxes (downward solar radiation and the sum of the net surface thermal radiation flux and the latent and sensible heat fluxes), wind stress (two components), the amount of precipitation during the coupling period (kg/m², separately for the liquid and solid phases), and the evaporation flux from the surface (kg/(m² s)). Note that the solar radiation, non-solar radiation and evaporation fluxes are computed and sent for water and ice-covered surfaces individually. All the heat fluxes are expressed in W/m². The surface heat fluxes are used to calculate the heat budget, while the wind stress is used as a direct momentum flux for NEMO to calculate the water motion. The evaporation and precipitation fluxes are used by the ocean model to compute the fresh water budget and adjust the salinity of the uppermost ocean layer. Flux adjustments to correct inconsistencies at the interface between the coupled atmosphere and ocean models have not been applied in SLNE.
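One unit-bookkeeping detail implied above can be sketched: precipitation is exchanged as an amount accumulated over the coupling period (kg/m²), while evaporation is already a flux (kg/(m² s)), so the receiving side can recover a mean precipitation flux by dividing by the coupling period. This is an illustrative sketch, not SLNE code:

```python
# Convert precipitation accumulated over one coupling period (kg/m²)
# into a mean flux (kg/(m² s)). Hypothetical helper, not SLNE code.

def precip_flux(accumulated_kg_m2, coupling_period_s):
    return accumulated_kg_m2 / coupling_period_s

print(precip_flux(288.0, 2880.0))  # 0.1 (kg/(m² s))
```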
NEMO sends seven data arrays to SLAV: sea surface temperature, ice concentration, sea ice surface temperature, sea ice top layer temperature, ice thickness, sea ice snow cover thickness and sea ice effective conductivity. The sea ice top layer temperature, thickness and effective conductivity are used in the SLAV model to calculate the heat flux between the lower atmosphere and the sea ice. The evolution of the sea ice surface temperature T_s is computed in SLAV using the following equation:

∂T_s/∂t = C_tc (F_atm - C_ice (T_s - T_tl)),   (1)

where C_tc is the thermic coefficient (K m² J⁻¹), F_atm is the surface net heat flux (W/m²), T_tl is the sea ice top layer temperature and C_ice is the sea ice effective conductivity. The first and second terms in parentheses on the right-hand side of (1) are the heat flux exchanged with the atmosphere and the heat flux into the sea ice, respectively. For simplicity, in Equation (1) we omit the snow-related variables and terms.
In the SLAV model, Equation (1) is incorporated into the vertical diffusion scheme of the planetary boundary layer (PBL) parameterization.
The turbulent flux F_ψ of a prognostic variable ψ in the vertical direction is calculated in SLAV using the following equation:

F_ψ = -ρ K ∂ψ/∂z,   (2)

where K is the diffusion coefficient. On the surface, Equation (2) takes the form

F_ψ = ρ C_dh U_L (ψ_L - ψ_s),   (3)

where ρ is the air density, C_dh is the surface exchange coefficient, U_L and ψ_L are the wind speed and the prognostic variable value at the lowest model level, and ψ_s is the surface value of ψ. The surface latent and sensible heat fluxes are calculated in SLAV by Equation (3) using the surface temperature T_s. Therefore, over land and sea ice, Equations (2) and (3) are solved together with Equation (1) using an implicit time scheme. The coupling within SLAV of locally one-dimensional models of the atmosphere and sea ice was performed using a numerical algorithm developed to describe the atmosphere-glacier interaction over land [16]. Over the water surface, Equation (1) is redundant because the water temperature is computed in the ocean model. The coupling of the atmosphere and sea ice models was one of the most time-consuming stages of SLNE model development. The relatively small heat capacity of the upper layer of sea ice and the large temporal variability of the heat fluxes on its surface can lead to numerical instability, expressed in typical two-time-step oscillations of the sea ice surface temperature. Special attention in the developed model was paid to the consistency of the evolution of the characteristics of the sea ice surface and the lower levels of the SLAV model in the case when the atmospheric model cell is not completely occupied by sea ice. To describe this case, we use the mosaic approach [46] implemented into the SLAV model earlier.
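The two-time-step oscillation and the implicit remedy can be illustrated with a toy integration of a surface-temperature relaxation of the form dT_s/dt = C_tc (F_atm - C_ice (T_s - T_tl)), written here as an assumption consistent with the terms of Equation (1); all coefficients below are made up for illustration and are not SLAV or SI3 values.

```python
# Toy illustration of why an implicit scheme is used for the sea ice
# surface temperature: with a small surface heat capacity (large C_tc),
# the explicit update oscillates and diverges, while the backward Euler
# (implicit) update relaxes monotonically toward equilibrium.
# All coefficients are invented for the demonstration.

def step_explicit(Ts, dt, Ctc, Fatm, Cice, Ttl):
    return Ts + dt * Ctc * (Fatm - Cice * (Ts - Ttl))

def step_implicit(Ts, dt, Ctc, Fatm, Cice, Ttl):
    # Backward Euler: solve Ts_new = Ts + dt*Ctc*(Fatm - Cice*(Ts_new - Ttl))
    return (Ts + dt * Ctc * (Fatm + Cice * Ttl)) / (1.0 + dt * Ctc * Cice)

Ts_e = Ts_i = 260.0
dt, Ctc, Fatm, Cice, Ttl = 1440.0, 1e-4, 0.0, 20.0, 270.0
for _ in range(4):
    Ts_e = step_explicit(Ts_e, dt, Ctc, Fatm, Cice, Ttl)
    Ts_i = step_implicit(Ts_i, dt, Ctc, Fatm, Cice, Ttl)
# Here dt*Ctc*Cice = 2.88 > 2, so the explicit solution oscillates with
# growing amplitude, while the implicit one stays between 260 K and the
# 270 K equilibrium.
```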

HPC System and Tools
The previously developed Parallel Profiler (ParProf) [59] software was used to analyze the parallel structure and scalability of the coupled model. This software makes it possible to measure the average computation time of a program code fragment of interest. The software was used in SLAV's interface file, as well as in the SLAV and NEMO main code. In the first case, the measurements allowed us to investigate the characteristics of the coupled model and its parallel efficiency. In the latter case, the obtained results allowed us to study the performance of the coupled model components individually.
A massively parallel Cray XC40-LC supercomputer installed at the Main Computer Center of the Federal Service for Hydrometeorology and Environmental Monitoring was used to study the parallel structure of the coupled model. The Cray XC40-LC consists of 976 compute nodes interconnected via Cray's proprietary Aries network. Each node has 128 GB of RAM and two Intel Xeon E5-2697v4 processors with 18 CPU cores and 45 MB of Intel Smart Cache per processor. The total number of computational cores is 35136. It is important to note that the user is able to use only 32 cores per node, since 4 cores per node are reserved to maintain the distributed Lustre file system.
The Cray XC40-LC job scheduling system allocates entire nodes to the user. This means that the number of cores used by the coupled model is a multiple of 32; this was taken into account in the study, and the number of computational cores delegated to the models was always set to a multiple of 32. The amount of computational resources available for the experiments with the coupled model was 157 nodes, or 5024 cores.
The SLAV, NEMO and SI3 models are implemented in Fortran. To compile the program code of these models, the Intel compiler version 19.1.3.304 was used. A study of model performance depending on compiler options was not performed, because it might produce variations in floating-point results. The compiler options that control optimization of the atmospheric model code were the same as in the SLAV-based long-range prediction system at the Hydrometcenter of Russia. The compiler options for the ocean and sea ice models were the same as in the NEMO-based oceanographic data assimilation system of the Hydrometcenter of Russia.
Note that the ParProf software output was independently verified using the average time step information of each SLNE component. The performance of the coupled model was also verified on the computational systems installed at the Marchuk Institute of Numerical Mathematics RAS and the Keldysh Institute of Applied Mathematics RAS.

Numerical Experiments Methodology
A number of numerical experiments were performed to study the parallel structure of the coupled model. All of them were organized in the same way. The start date of the coupled model computation corresponded to November 11, 2021. The initial data were prepared in advance using the 3dvar meteorological data assimilation system [67] and the oceanographic data assimilation system [58]. The assimilation system for the ocean and sea ice is based on a relaxation procedure for ice concentration in the SI3 model, which is applied along with the 3DVAR analysis for the temperature and salinity of sea water in the NEMO model. This approach guarantees consistent initial states of the ocean and sea ice in the coupled model. Unfortunately, the uncoupled assimilation technique leads to inconsistencies in the initial states of the atmosphere and ocean. The coupled model was integrated for 10 days in all numerical experiments to study the parallel performance of the model. This lead time corresponds to 600 time steps of the SLAV model and 1200 time steps of NEMO. In the following sections it is shown that the variance of the computation time of one time step in the experiments is relatively small, which means that 10 days is sufficient to evaluate the performance of the coupled model.
Measurements of the computation time of one step of the SLAV and NEMO models began after the first data exchange between the models was finalized; this first exchange synchronizes the models. We use this approach due to the significantly longer NEMO initialization time in comparison with SLAV. Checkpoints, diagnostic information and output data were not recorded in the experiments. No additional operations (transformations, accumulation, etc.) were performed on the exchanged data within OASIS3-MCT.
As noted in Sec. 2.1, each model was allocated a multiple of 32 computational cores. SLAV and NEMO used different communicators, while NEMO and SI3 shared a common MPI communicator. A feature of the SLAV model is its use of a one-dimensional MPI domain decomposition in latitude; for longitude, OpenMP is used in addition to MPI. Therefore, the best performance is achieved when the number of MPI processes is as close as possible to a divisor of the number of grid nodes in latitude. In SLAV072L96 the number of grid nodes in latitude is 251 (including the pole points). The number of OpenMP threads should be a divisor of the number of computational grid nodes in longitude, which equals 400. The optimal SLAV configurations calculated in this way are presented in Tab. 1. The first column in Tab. 1 corresponds to the SLAV configuration identifier, the second to the number of MPI processes, the third to the number of OpenMP threads and the fourth to the total number of computational cores used.
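The selection rule described above can be sketched as a small search (illustrative only; the actual Tab. 1 configurations were chosen by the authors): the OpenMP thread count should divide the 400 longitude points, the MPI rank count should not exceed the 251 latitude rows, and the product must be a multiple of 32 cores on this machine.

```python
# Illustrative enumeration of admissible (MPI, OpenMP, cores) triplets
# under the constraints stated in the text. Hypothetical helper, not the
# procedure actually used to build Tab. 1.

NLAT, NLON, CORES_PER_NODE = 251, 400, 32

def candidate_configs(max_cores):
    out = []
    for threads in (t for t in range(1, NLON + 1) if NLON % t == 0):
        for mpi in range(1, NLAT + 1):   # at most one latitude row per rank
            cores = mpi * threads
            if cores <= max_cores and cores % CORES_PER_NODE == 0:
                out.append((mpi, threads, cores))
    return out

cfgs = candidate_configs(1024)
print([c for c in cfgs if c[1] == 4][:3])  # [(8, 4, 32), (16, 4, 64), (24, 4, 96)]
```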
Note that a number of OpenMP threads equal to four is the best option. Several experiments were performed for SLAV individually and within the coupled model, where the size of the MPI communicator was increased (decreased) and the number of OpenMP threads was decreased (increased) by the same factor. All experiments showed either similar or worse performance of the atmospheric model compared to the version of SLAV with 4 OpenMP threads.
NEMO uses a two-dimensional MPI decomposition of the computational domain and data. According to the results of experiments (not presented in this paper), the performance of the NEMO and SI3 models degrades in the case of excessive stretching of the data along one of the coordinates. The optimal parallel configurations of the NEMO version used correspond to those where the computational domain is divided into approximately equal squares. Even so, NEMO has many admissible parallel configurations compared to SLAV.

SLNE Configurations
In the numerical experiments, the SLNE configurations presented in Tab. 2 were used. The first column of this table lists the model configuration identifier, which will be used later in the paper. The second and third columns give the coupling periods (in hours). The fourth column presents information about the time interval between computations of the radiative transfer model, the most resource-demanding parameterization in SLAV.
In the first SLNE configuration, o1a1, data exchange is performed every 1440 s (each SLAV time step). For computational efficiency reasons, the o1a1 model configuration is not considered a principal version of the SLNE model: computing the radiative transfer model at each model step increases its cost by 30-35% compared to o2a2, so the performance of this configuration has not been studied. o2a2 corresponds to data exchange between the models every second SLAV time step (every fourth NEMO time step). The third configuration, o3a3, corresponds to exchange between the models every third SLAV time step or, which is the same, every sixth NEMO time step.
Since ocean surface characteristics change much more slowly than the lower atmosphere, in the third o6a3 model configuration (fourth row) data are sent from SLAV to the ocean and sea ice models every third SLAV time step (as in o3a3), while NEMO sends data to the atmospheric model every sixth SLAV time step (every twelfth NEMO time step). The reduced number of data transfers and associated interpolations offers the hope of decreasing computation time. However, SLAV and SI3 do not exchange data directly. A disadvantage of this approach is therefore that the sea ice surface properties are held constant in the atmospheric model for six SLAV time steps. Since the thermal conductivity and heat capacity of sea ice differ significantly from those of sea water, this can reduce the accuracy of the heat flux computation in the lower atmosphere above sea ice. Data exchange between SLAV and NEMO with a period of about one hour is typical for coupled models in which the horizontal resolution of the atmospheric model is about 1° and that of the ocean model is about 1/4°.
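The o6a3 schedule can be written down explicitly. A minimal sketch, using the SLAV time step of 1440 s stated above and the NEMO time step of 720 s implied by the step ratios:

```python
SLAV_DT, NEMO_DT = 1440, 720  # s, model time steps taken from the text

def exchange_times(period_steps, dt, n):
    """Model times (s) at which a component reaches its first n coupling points."""
    return [k * period_steps * dt for k in range(1, n + 1)]

# o6a3: SLAV -> NEMO every 3rd SLAV step; NEMO -> SLAV every 12th NEMO step
atm_to_ocn = exchange_times(3, SLAV_DT, 4)    # [4320, 8640, 12960, 17280]
ocn_to_atm = exchange_times(12, NEMO_DT, 2)   # [8640, 17280]

# Every ocean-to-atmosphere exchange coincides with an atmosphere-to-ocean
# one, so the two schedules stay synchronized.
assert set(ocn_to_atm) <= set(atm_to_ocn)
```

The atmosphere thus receives ocean data only at every second of its own send points, which is exactly why the sea ice surface seen by SLAV stays frozen for six atmospheric steps.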
In SLAV072L96, the radiation fluxes in the atmosphere (shortwave and longwave spectrum) are computed every third time step of the model. At intermediate time moments, the radiation fluxes are extrapolated in time, taking into account the evolution of the solar zenith angle. This approach is motivated by the cost of the radiative transfer parameterization, which is comparable to all other calculations performed in one time step. Data exchange between the models and the radiative transfer computation are performed with the same period in order to balance the load between SLAV and NEMO.

Scalability of SLNE
Three series of experiments were performed to study the parallel efficiency of the coupled model. Figures 1a-1c illustrate the parallel scalability of SLNE on Cray XC40-LC. These figures show the performance (in simulated years per day) as a function of the number of computational cores used by SLNE. The dots indicate different parallel configurations of the model, the solid line corresponds to linear scaling, and the dashed curve corresponds to 80% performance relative to the optimal configuration with the minimal number of computational cores (the definition of the SLNE optimal configuration is given in the following section). Dots of the same color differ in the number of computational cores allocated for NEMO with a fixed number of cores used by SLAV. Figure 1a corresponds to the o2a2 SLNE configuration, Fig. 1b to o3a3, and Fig. 1c to o6a3.
Figures 1a-1c show that the SLNE model scales non-monotonically for each SLAV configuration. This is due to two reasons. First, the NEMO and SI3 models scale non-monotonically with increasing communicator size. The best performance of these models is achieved when the computational domain is partitioned into sub-domains whose shape is close to square; in other words, when the numbers of cells in the sub-domain along the two horizontal coordinates are close to each other. However, the number of computing cores for NEMO and SI3 is allocated in multiples of 32. Therefore, if the computational domain is partitioned into 48x48 boxes, the next "well-partitioned" configuration with a larger number of computational cores contains 50x48 rectangles. The first multiplier increases up to the configuration with 64x48 rectangles; after that, the second multiplier starts to increase.
The second explanation for the non-monotonic scaling of SLNE with a fixed number of computational cores allocated for SLAV is the heterogeneity of the computational system. To illustrate this assertion, together with measurements of the computation time of the SLAV time step, we measured the computation time of a time step of this model excluding the time spent on parallel exchanges with the ocean model, preparation of the relevant data, and other coupling overheads. The computation time of the SLAV time step is illustrated in Fig. 1d; the colors of the dots correspond to the colors in Fig. 1b. It can be seen that the average computation time of one SLAV time step is almost constant as the number of computational cores allocated for NEMO increases. Small deviations around this value can be interpreted as inhomogeneity of the computational system. The dispersion of these deviations increases with the number of computational cores allocated for SLAV. The results presented in Fig. 1d allow us to filter out spurious optimal configurations of the SLNE model, whose good performance is due to the heterogeneity of the computational system. The optimal configurations are discussed in the next section.

SLNE Optimal Configurations
Data exchange between SLNE components implies synchronization of SLAV and NEMO at certain moments of time. If one component performs its calculations too fast relative to the other, it will soon be idle, waiting for the data to be exchanged. Waiting for data means load imbalance between components and a waste of computational resources. Performance optimization of a coupled model relies on allocating an optimal amount of computational resources to each component, which can be achieved by balancing the load of SLAV and NEMO across the available computing resources.
The optimal configurations of the coupled model are summarized in Tab. 3. The first column in Tab. 3 is the SLNE configuration id; the second column is the number of computational cores used by the coupled model; the third column is the parallel performance of the model (the number of simulated years per day); the fourth column is the number of computational cores allocated for NEMO; the fifth column shows how the NEMO computational domain is partitioned; the sixth column is the number of computational cores allocated for SLAV; the seventh column contains the ratio of the computational cores allocated for NEMO and SLAV. The results presented in Tab. 3 show that the intensity of exchanges directly affects the performance of the model. For example, the o2a2 configuration achieves about 7.75 simulated years per day on 2208 computational cores, whereas the o3a3 configuration achieves 8.63 years per day on 2176 computational cores. The o3a3 and o6a3 configurations do not differ significantly. This can be seen in Fig. 2, which shows the parallel performance of the model on a logarithmic scale; linear scalability is shown by the solid line. The parallel efficiency of the o2a2 configuration is about 83% on 3000 computational cores with respect to the optimal configuration with the minimal number of computational cores. For the o3a3 and o6a3 configurations, the SLNE parallel efficiency is about 90% and 88%, respectively. Note that in our experiments we were limited to 5000 computational cores.
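The efficiency figures quoted here follow the standard definition of scaling efficiency relative to a reference run. A minimal sketch; the numeric values below are illustrative placeholders chosen to reproduce the headline figure, not measurements from Tab. 3:

```python
def scaling_efficiency(cores_ref, sypd_ref, cores, sypd):
    """Speed-up relative to the reference run, divided by the ideal
    (linear) speed-up implied by the core-count ratio."""
    return (sypd / sypd_ref) / (cores / cores_ref)

# Hypothetical throughputs (simulated years per day) for illustration only
eff = scaling_efficiency(224, 0.5, 4000, 7.59)
print(f"{eff:.0%}")  # -> 85%
```

An efficiency of 1.0 would mean perfectly linear scaling from the reference configuration.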
For all model configurations presented in Tab. 3, the value in the last column decreases as the total number of cores grows. This means that the atmospheric model scales worse than the ocean model: as the number of computational cores allocated for SLNE increases, the atmospheric model requires a larger share of resources to match the performance of NEMO. Possible reasons for this dependence are the one-dimensional MPI domain decomposition used in SLAV and the significantly higher resolution of NEMO.
Figures 1a-1c show that the scalability of the coupled model has a similar pattern for different configurations of the atmospheric model. With an increasing number of NEMO computational cores, SLNE first exhibits a so-called super-linear speed-up, followed by a low speed-up. Each of these segments illustrates the imbalance between the coupled model components. In these figures, the optimal SLNE configuration for each SLAV configuration is highlighted by a black border around the dot. In the optimal configurations, both components advance at about the same speed and cost. In the super-linear speed-up segment the faster model is SLAV, while in the low speed-up segment NEMO has to wait for the exchanged data before it can perform its own computations. In the optimal o2a2 SLNE configurations, the number of computational cores granted to NEMO is 5 to 6 times larger than the number allocated to SLAV. With less frequent exchanges between the atmosphere and ocean models, this ratio increases to 7 to 8, depending on the overall size of the MPI communicator. Therefore, the optimal performance of the coupled model is achieved around the configuration where the ocean and sea ice models do not wait for the exchanged data.
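The balance point described above can be sketched as a search over core splits. The sketch assumes each component's time per coupling interval scales as work/cores (perfect scaling, which overstates real scalability) and uses an illustrative NEMO:SLAV cost ratio of 6:

```python
def balanced_split(total_cores, work_atm, work_ocn, step=32):
    """Choose the SLAV/NEMO core split minimising the time of the
    slower component, i.e. the coupled model's critical path."""
    best = None
    for ocn in range(step, total_cores, step):   # NEMO cores in multiples of 32
        atm = total_cores - ocn
        t = max(work_atm / atm, work_ocn / ocn)  # slower component dominates
        if best is None or t < best[0]:
            best = (t, atm, ocn)
    return best[1], best[2]  # (SLAV cores, NEMO cores)

print(balanced_split(2240, 1.0, 6.0))  # -> (320, 1920), a 1:6 split
```

With real (sub-linear) scaling curves the optimum shifts, which is why the measured NEMO:SLAV ratio drifts from 5-6 to 7-8 as the communicator grows.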
Figures 3 and 4 illustrate the optimality of the SLNE configurations presented in Tab. 3. Figure 3 shows the relative cost of fragments of the atmosphere model code (Fig. 3a) and the ocean model code (Fig. 3b) as a function of the number of SLNE computational cores (given in the caption under each column). The number of computational cores allocated for SLAV in these figures is 128 (the 128 : 48x4 SLAV configuration). Bar segments of different colors correspond to different fragments of the program code; together, these fragments cover the entire time loop body of the corresponding model. The absolute computation time of one time step is given above each bar at the top of the figures. The proportion (in %) of the computation time of a code fragment relative to the computation time of one time step is given when it exceeds 5%. Both Fig. 3a and Fig. 3b correspond to the o2a2 SLNE configuration.
The following fragments of the SLAV program code are highlighted with colors in Fig. 3a: diag (blue bar) is the fragment responsible for diagnostic data calculation; get (yellow bar) receives data from NEMO; put (blue bar) sends data to NEMO; prep2send (green bar) prepares SLAV data to be sent to NEMO; step (red bar) calculates one time step (the dynamical core and parameterizations of sub-grid scale processes). Figure 3a shows that as the number of NEMO computing cores increases, the waiting time of SLAV decreases. In Fig. 3b, the time fraction of the code fragment sbc_rcv, responsible for receiving data from SLAV, is negligible for a small number of computational cores allocated for NEMO. However, as the number of these cores increases, the ocean model computes faster and the fraction of this fragment grows rapidly. Compared to SLAV, the ocean and sea ice models require significantly more computational resources; therefore, NEMO should not wait for data from SLAV. In contrast, if the atmospheric model spends part of its time waiting for data from NEMO, this is not essential, because SLAV uses a relatively small fraction of the computational resources (from 11% to 20%). Therefore, the optimal configuration of the coupled SLNE model is achieved when NEMO computes slightly slower than SLAV.
Another feature of the results presented in Fig. 3a and Fig. 3b is the following. The procedure transferring data from SLAV to NEMO requires less time than the data exchange from NEMO to SLAV. In the o2a2 configuration, as the number of computational cores granted to NEMO doubles (with the number of SLAV cores fixed at 128), the time fraction of the NEMO sbc_rcv procedure increases from about zero to 40%, while the time fraction of the SLAV get function decreases from 25% to 14%. A similar dependence can be seen in Fig. 4, where the o3a3 configuration is presented. Figures 4a and 4b differ in the number of computational cores allocated for SLAV: 128 cores in Fig. 4a and 512 in Fig. 4b.

SLNE Performance vs Number of Exchanged Arrays
This section presents the results of a performance study of the optimal o3a3 SLNE configuration as a function of the number of two-dimensional arrays exchanged between the atmosphere and ocean models. The numerical experiments were organized as follows. For each optimal SLNE configuration, three additional experiments were performed. In the first experiment, the number of data arrays sent from SLAV to NEMO and from NEMO to SLAV was increased by 10; in the second and third experiments it was increased by 25 and 50, respectively. The additional exchanges and interpolations were performed in both directions. Therefore, in the first experiment the total number of arrays exchanged between the models increased from 17 to 37; in the second and third experiments, to 67 and 117, respectively. Extra20, extra50 and extra100 will be used to denote these configurations. The additional data exchanged between the models corresponded to the solar radiation surface heat flux. The results of the experiments are presented in Tab. 4. The first row of the table lists the computational cores allocated for the SLNE model. The second row gives the performance (in simulated years per day) of the o3a3 model in the reference version with 17 exchanged arrays. The other rows give the performance of the extra20, extra50 and extra100 SLNE configurations with additional exchanges. The last column shows the average relative performance slowdown (in %) of the corresponding o3a3 model configuration compared to the optimal model version. As can be seen in Tab. 4, increasing the number of exchanged arrays by 20 leads to an insignificant slowdown of the model, by a few percent compared to the reference (optimal) version. When the number of exchanged arrays increases substantially, the slowdown is on average 5%, and for some configurations it reaches 7%. Note that only one experiment was performed for each configuration, so the results presented in the table may not be statistically significant. However, they show that the overheads associated with data exchange and interpolation are relatively small compared to the integration time of one component time step.
The results of model profiling using ParProf in the versions with an increased number of exchanged arrays are shown in Fig. 4. These configurations are highlighted by stripes over the bars. Stripes inclined to the right correspond to the extra20 configuration; the other patterns mark the extra50 (square tiles) and extra100 (triangular tiles) configurations. The bars without stripes correspond to the reference (optimal) o3a3 SLNE version with 1088 (Fig. 4a) and 4096 (Fig. 4b) computational cores.
The parallel profiles of the SLNE components shown in Fig. 3 and Fig. 4 give a better understanding of how load imbalance costs develop. Additional exchanges increase the time fraction of the put routine sending data from SLAV to NEMO. A similar result was obtained for the o2a2 SLNE configuration shown in Fig. 3: additional exchanges lead to extra costs in SLAV's put routine and NEMO's sbc_rcv routine. The change in the average computation time of one time step in this experiment is not substantial, about 3.1% (it increases from 1.095 s to 1.13 s).

Interpolation Accuracy
To evaluate the accuracy of interpolation of the data exchanged between SLNE components, we performed a so-called ping-pong test. In this test, the data are passed back and forth between SLAV and NEMO without being modified in either model. Each data exchange between the components involves a mapping operation that interpolates the data using a bilinear interpolation scheme. In the experiment we used the o1a1 version of the SLNE model, where data exchange is performed at every time step of both models. In the o1a1 configuration, 128 computational cores are granted to SLAV and 768 cores are allocated for NEMO. At the initial moment of time, the data exchanged by the models corresponded to the downward solar radiation surface flux (see Fig. 5a). This heat flux has a sharp structure because it depends on cloudiness, cloud liquid water content, surface albedo, aerosol mixing ratio, etc.
Figure 5b shows the corresponding heat flux after 840 exchanges from SLAV to NEMO and 840 exchanges in the opposite direction. Thus, the flux was interpolated 1680 times between the regular latitude-longitude grid and the tripolar ORCA025 grid. This number of exchanges corresponds to 14 simulated days.
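The exchange count follows directly from the o1a1 coupling period; a quick arithmetic check:

```python
SLAV_DT = 1440             # s, o1a1 coupling period (one SLAV time step)
SIM_SECONDS = 14 * 86400   # 14 simulated days

one_way = SIM_SECONDS // SLAV_DT  # exchanges in each direction
print(one_way, 2 * one_way)       # -> 840 1680
```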
It can be seen that the solution is significantly smoothed. Figures 5c and 5d show the difference of these fields after two (Fig. 5c) and 1680 (Fig. 5d) mapping procedures. After 1680 mapping procedures, no spurious extrema or abnormal anomalies due to the presence of the coastline or the water surface-sea ice interface appeared in the test data. The global mean downward surface solar radiation used as test data decreases from 171.64 to 171.17 W/m² (0.27%).
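The smoothing effect of repeated remapping can be reproduced in a toy setting. The sketch below is a one-dimensional analogue using linear interpolation between two staggered grids; it is not the actual 2-D bilinear remapping between the lat-lon and ORCA025 grids, and the grid sizes and test field are arbitrary choices:

```python
import numpy as np

xa = np.linspace(0.0, 1.0, 101)   # coarse "atmosphere" grid
xo = np.linspace(0.0, 1.0, 256)   # finer "ocean" grid (nodes mostly staggered)

field = np.full(101, 100.0)       # smooth background flux, W/m^2
field[45:56] = 200.0              # sharp localized feature

f = field.copy()
for _ in range(840):              # 840 exchanges each way = 1680 interpolations
    f_ocean = np.interp(xo, xa, f)   # atmosphere -> ocean
    f = np.interp(xa, xo, f_ocean)   # ocean -> atmosphere

# Linear interpolation cannot overshoot: values stay inside the original
# range, but the sharp feature is progressively smoothed out.
assert f.min() > 99.999 and f.max() < field.max()
```

As in the real ping-pong test, no spurious extrema appear: each remapping is a convex combination of neighboring values, so the field can only flatten, never overshoot.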

Results of Numerical Simulations
Concurrent execution of multiple components may induce numerical instability, which can be caused by an inconsistent coastal line between the atmosphere and ocean models, coupling data errors, an imbalance of physical processes at the component interface, etc. The development of numerical instability due to these errors, and the importance of correctly coupling the components, is shown in [2,26,35,74]. To understand the feedbacks between the components of the coupled system, a series of numerical experiments was performed. In the first experiment, the SLNE model was run for 14 days. Figure 6a shows the surface temperature after 14 days of SLNE integration. Figure 6b illustrates the surface temperature bias with respect to the atmospheric and ocean state corresponding to 00 UTC November 25, 2021, obtained using the assimilation technology developed at RMHS [58]. A similar methodology was used to prepare the initial state of the ocean and sea ice.
Figure 6b shows that the surface temperature over land differs significantly from the observed temperature. This is due to the nonlinear nature of the simulated processes: the lead time exceeds the hydrodynamic limit of predictability. Above sea ice, the bias is most noticeable near the water-sea ice boundary. The ocean surface temperature bias after 14 days of model integration reaches 2 °C in some regions but is, in general, not large. Numerical instability and large errors near the shoreline do not appear in the model. Errors near the sea ice boundary are probably due to an insufficiently accurate method for computing atmospheric surface heat fluxes. Error reduction can be expected with the implementation of the mosaic approach to representing surface heterogeneity.
The SLNE model in version o1a1 has been integrated for a period of up to 3 years. The simulations showed no significant systematic errors in the coupled model. The averaged sea ice extent in the Northern and Southern hemispheres remains approximately the same. The seasonal variations of the surface temperature of sea ice and of the tropical ocean are consistent with the reanalysis. A more detailed study of the medium-range and seasonal predictability of the model will be performed in future studies.

Conclusion
The paper presents the first version of the SLNE coupled model. The model consists of the SLAV072L96 atmospheric model, the NEMO4 ocean model and the SI3 sea ice model, coupled using the OASIS3-MCT software. We propose three SLNE configurations that differ in the coupling period.
The total duration of a coupled model simulation can be split, for each component, into a computing time and a waiting time during which the component waits for boundary conditions. Efficient use of computing resources requires the computational speeds of the components to be synchronized. To find such model configurations, we performed several experiments. It was found that in the optimal parallel configurations the SLAV model has to wait for data from NEMO; the fraction of SLAV's waiting time is about 10%. Nevertheless, this time is not significant because SLAV requires 6 to 7 times fewer computational resources than NEMO.
The optimal SLNE configurations run on 224 to 4096 computational cores of the Cray XC40-LC HPC system installed at the Main Computer Center of the Federal Service for Hydrometeorology and Environmental Monitoring. The coupled model scales quite well, with a scaling efficiency of about 85% on 4000 computational cores in comparison to the configuration running on 224 cores. Unfortunately, in our study we were limited to 5000 computational cores.
The results of simulations with lead times ranging from 14 to 1100 days showed the absence of numerical instability and of significant systematic errors in the surface fields. Further research will focus on tuning the coupled model and studying the accuracy of its simulations.

Figure 1. SLNE (a-c) and SLAV (d) computational performance with respect to the number of computational cores allocated for SLNE

Figure 3. o2a2 SLNE configuration: time step fraction of the SLAV (a) and NEMO (b) models with respect to the number of SLNE computational cores (captions under the columns); the SLAV model runs on 128 cores. Panel titles: (a) SLAV 128: 48x4 time loop profile; (b) SLAV 512: 128x4 time loop profile

Figure 4. SLAV time step fraction with respect to the number of o3a3 SLNE computational cores (captions under the columns); SLNE configurations with additional exchanges are highlighted by stripes over the bars

Figure 5. Results of the ping-pong test, where the test data correspond to the solar radiation heat flux at the Earth's surface (W/m²)

Figure 6. Test data array corresponding, at the initial moment of time, to the solar radiation at the Earth's surface (W/m²)

Table 1. SLAV optimal parallel configurations and their identifiers

Table 2. SLNE configurations: coupling periods of the components

Table 3. SLNE optimal parallel configurations and performance

Table 4. Parallel performance of optimal SLNE configurations in the reference version and in versions with an increased number of exchanged data arrays