InfiniCortex - From Proof-of-concept to Production

Gabriel Noaje, Alan Davis, Jonathan Low, Seng Lim, Geok Lian Tan, Łukasz Orłowski, Dominic Chien, Sing-Wu Liou, Tin Wee Tan, Yves Poppe, Kenneth Ban Hon Kim, Andrew Howard, David Southwell, Jason Gunthorpe, Marek Michalewicz

Abstract


The global effort to build ever more powerful supercomputers faces the challenge of scaling High Performance Computing systems to exascale capability while keeping the electrical power consumption of such a system below 20 MW. One possible solution, which bypasses this local energy limit, is to use distributed supercomputers to ease the intense power requirements at any single location. The other critical challenge faced by the global computer industry and international scientific collaborations is the need to stream colossal amounts of time-critical data. Examples abound: i) transfer of astrophysical data collected by the Square Kilometre Array to international partners, ii) streaming of experimental data from large facilities through the Pacific Research Platform collaboration of DoE, ESnet and other partners in the US and elsewhere, iii) the Superfacility vision expressed by DoE, iv) a new architecture for the CERN LHC data processing pipeline focussing on more powerful processing facilities connected by higher-throughput links.

The InfiniCortex project, led by the A*STAR Computational Resource Centre, demonstrates a worldwide InfiniBand fabric circumnavigating the globe and bringing together, as one concurrent globally distributed HPC system, several supercomputing facilities spanning four continents (Asia, Australia, Europe and North America). Using global-scale InfiniBand connections, with bandwidth utilisation approaching 98% of link capacity, we have established a new architectural approach which might lead to next-generation supercomputing systems capable of solving the most complex problems through the aggregation and parallelisation of many globally distributed supercomputers into a single hive-mind of enormous scale.






Publishing Center of South Ural State University (454080, Lenin prospekt, 76, Chelyabinsk, Russia)