Exascale Storage Systems -- An Analytical Study of Expenses

Julian Martin Kunkel, Michael Kuhn, Thomas Ludwig

Abstract


The computational power and storage capability of supercomputers are growing at different paces, with storage lagging behind; the widening gap necessitates new approaches to keep the investment and running costs for storage systems at bay. In this paper, we aim to unify previous models and compare different approaches for solving these problems.
By extrapolating the characteristics of the German Climate Computing Center's previous supercomputers to the future, we identify and quantify cost factors in order to foster adequate research and development.
Using models to estimate the execution costs of two prototypical use cases, we discuss the potential of three concepts: re-computation, data deduplication and data compression.







Publishing Center of South Ural State University (454080, Lenin prospekt, 76, Chelyabinsk, Russia)