Data Compression for Climate Data

Michael Kuhn, Julian Kunkel, Thomas Ludwig

Abstract


Computational power and storage capabilities of supercomputers grow at different rates, which turns data storage into a technical and economic problem. Because storage capabilities are lagging behind, investments and operational costs for storage systems have increased to keep up with the supercomputers' I/O requirements. One promising approach is to reduce the amount of data that is stored. In this paper, we examine the impact of compression on the performance and costs of high-performance systems. To this end, we analyze the applicability of compression on all layers of the I/O stack, that is, main memory, network and storage. Based on the Mistral system of the German Climate Computing Center (Deutsches Klimarechenzentrum, DKRZ), we illustrate potential performance improvements and cost savings. Making use of compression on a large scale can decrease investments and operational costs by 50% without negatively impacting performance. Additionally, we present ongoing work on supporting enhanced adaptive compression in the parallel distributed file system Lustre and on application-specific compression.







Publishing Center of South Ural State University (454080, Lenin prospekt, 76, Chelyabinsk, Russia)