TY  - JOUR
AU  - Kunkel, Julian M.
AU  - Pedro, Luciana R.
PY  - 2020/07/11
Y2  - 2024/03/29
TI  - Potential of I/O Aware Workflows in Climate and Weather
JF  - Supercomputing Frontiers and Innovations
JA  - superfri
VL  - 7
IS  - 2
SE  - Articles
DO  - 10.14529/jsfi200203
UR  - https://superfri.org/index.php/superfri/article/view/309
SP  - 35-53
AB  - The efficient, convenient, and robust execution of data-driven workflows and enhanced data management are essential for productivity in scientific computing. In HPC, the concerns of storage and computing are traditionally separated and optimised independently from each other and the needs of the end-to-end user. However, in complex workflows, this is becoming problematic. These problems are particularly acute in climate and weather workflows, which as well as becoming increasingly complex and exploiting deep storage hierarchies, can involve multiple data centres. The key contributions of this paper are: 1) A sketch of a vision for an integrated data-driven approach, with a discussion of the associated challenges and implications, and 2) An architecture and roadmap consistent with this vision that would allow a seamless integration into current climate and weather workflows as it utilises versions of existing tools (ESDM, Cylc, XIOS, and DDN’s IME). The vision proposed here is built on the belief that workflows composed of data, computing, and communication-intensive tasks should drive interfaces and hardware configurations to better support the programming models. When delivered, this work will increase the opportunity for smarter scheduling of computing by considering storage in heterogeneous storage systems. We illustrate the performance-impact on an example workload using a model built on measured performance data using ESDM at DKRZ.
ER  - 