Published: 2021-05-29

Size & Shape Matters: The Need of HPC Benchmarks of High Resolution Image Training for Deep Learning

Ferran Parés Pont, Pedro Megias, Dario Garcia-Gasulla, Marta Garcia-Gasulla, Eduard Ayguadé, Jesús Labarta


One of the purposes of HPC benchmarks is to identify limitations and bottlenecks in hardware. This functionality is particularly influential when assessing performance on emerging tasks, the nature and requirements of which may not yet be fully understood. In this setting, a proper benchmark can steer the design of next generation hardware by properly identifying said requirements, and quicken the deployment of novel solutions. With the increasing popularity of deep learning workloads, benchmarks for this family of tasks have been gaining popularity. Particularly for image based tasks, which rely on the most well established family of deep learning models: Convolutional Neural Networks. Significantly, most benchmarks for CNN use low-resolution and fixed-shape (LR&FS) images. While this sort of inputs have been very successful for certain purposes, they are insufficient for some domains of special interest (e.g., medical image diagnosis or autonomous driving) where one requires higher resolutions and variable-shape (HR&VS) images to avoid loss of information and deformation. As of today, it is still unclear how does image resolution and shape variability affect the nature of the problem from a computational perspective. In this paper we assess the differences between training with LR&FS and HR&VS, as means to justify the importance of building benchmarks specific for the latter. Our results on three different HPC clusters show significant variations in time, resources and memory management, highlighting the differences between LR&FS and HR&VS image deep learning.

Full Text:



Press Release 11/18/20: MLPerf Releases Inaugural Results for Leading High-Performance ML Training Systems (2020),

Advanced Simulation and Computing: CORAL Benchmarks., accessed: 2021-04-06

Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. CoRR abs/1409.0473 (2014),

Bailey, D., Harris, T., Saphir, W., Van Der Wijngaart, R., Woo, A., Yarrow, M.: The NAS parallel benchmarks 2.0. Tech. rep., Technical Report NAS-95-020, NASA Ames Research Center (1995)

Banchelli, F., Garcia-Gasulla, M., Houzeaux, G., Mantovani, F.: Benchmarking of state-ofthe-art HPC Clusters with a Production CFD Code. In: Proceedings of the Platform for Advanced Scientific Computing Conference, 29 June-1 July 2020, Geneva, Switzerland. pp. 1–11 (2020), DOI: 10.1145/3394277.3401847

Beaumont, O., Eyraud-Dubois, L., Shilova, A.: Optimal GPU-CPU Offloading Strategies for Deep Neural Network Training, pp. 151–166 (2020), DOI: 10.1007/978-3-030-57675-2_10

Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training deep nets with sublinear memory cost. CoRR abs/1604.06174 (2016),

Chen, X., Ma, H., Wan, J., Li, B., Xia, T.: Multi-view 3d object detection network for autonomous driving. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 21-26 July 2017, Honolulu, HI, USA. pp. 1907–1915. IEEE (2017), DOI: 10.1109/cvpr.2017.691

Criado, J., Garcia-Gasulla, M., Kumbhar, P., Awile, O., Magkanaris, I., Mantovani, F.: CoreNEURON: Performance and Energy Efficiency Evaluation on Intel and Arm CPUs. In: 2020 IEEE International Conference on Cluster Computing (CLUSTER), 14-17 Sept. 2020, Kobe, Japan. pp. 540–548. IEEE (2020), DOI: 10.1109/cluster49012.2020.00077

Dean, J., Corrado, G.S., Monga, R., et al.: Large scale distributed deep networks. In: Advances in Neural Information Processing Systems. vol. 25. Curran Associates, Inc. (2012)

Dongarra, J., Heroux, M.A., Luszczek, P.: HPCG benchmark: a new metric for ranking high performance computing systems. The International Journal of High Performance Computing Applications 30(1), 3–10 (2016), DOI: 10.1177/1094342015593158

Geras, K.J., Wolfson, S., Shen, Y., et al.: High-resolution breast cancer screening with multiview deep convolutional neural networks. CoRR abs/1703.07047 (2017),

He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE transactions on pattern analysis and machine intelligence 37(9), 1904–1916 (2015), DOI: 10.1007/978-3-319-10578-9_23

He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 27-30 June 2016, Las Vegas, NV, USA. pp. 770–778. IEEE (2016), DOI: 10.1109/cvpr.2016.90

Jiang, Z., Gao, W., Wang, L., et al.: HPC AI500: a benchmark suite for HPC AI systems. In: International Symposium on Benchmarking, Measuring and Optimization, 10-13 Dec. 2018, Seattle, WA, USA. pp. 10–22. Springer (2018), DOI: 10.1007/978-3-030-32813-9_2

Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. CoRR abs/1412.6980 (2014),

Kudo, S., Nitadori, K., Ina, T., Imamura, T.: Implementation and Numerical Techniques for One EFlop/s HPL-AI Benchmark on Fugaku. In: Proceedings of the 11th IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale, 13 Nov. 2020, GA, USA. pp. 69–76. IEEE (2020), DOI: 10.1109/ScalA51936.2020.00014

Kudo, S., Nitadori, K., Ina, T., Imamura, T.: Prompt report on Exa-scale HPL-AI benchmark. In: 2020 IEEE International Conference on Cluster Computing (CLUSTER), 14-17 Sept. 2020, Kobe, Japan. pp. 418–419. IEEE (2020), DOI: 10.1109/cluster49012.2020.00058

Kurth, T., Treichler, S., Romero, J., et al.: Exascale deep learning for climate analytics. In: SC18: International Conference for High Performance Computing, Networking, Storage and Analysis, 11-16 Nov. 2018, Dallas, TX, USA. pp. 649–660. IEEE (2018), DOI: 10.1109/sc.2018.00054

Liu, Y., Zhang, H., Zeng, L., Wu, W., Zhang, C.: MLbench: benchmarking machine learning services against human experts. Proceedings of the VLDB Endowment 11(10), 1220–1232 (2018), DOI: 10.14778/3231751.3231770

Lotter, W., Sorensen, G., Cox, D.: A multi-scale CNN and curriculum learning strategy for mammogram classification. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, 17 Sept. 2017, Québec City, QC, Canada, pp. 169–177. Springer (2017), DOI: 10.1007/978-3-319-67558-9_20

Mantovani, F., Garcia-Gasulla, M., Gracia, J., et al.: Performance and energy consumption of HPC workloads on a cluster based on Arm ThunderX2 CPU. Future Generation Computer Systems 112, 800–818 (2020), DOI: 10.1016/j.future.2020.06.033

Mathuriya, A., Bard, D., Mendygral, P., et al.: CosmoFlow: Using deep learning to learn the universe at scale. In: SC18: International Conference for High Performance Computing, Networking, Storage and Analysis, 11-16 Nov. 2018, Dallas, TX, USA. pp. 819–829. IEEE (2018), DOI: 10.1109/sc.2018.00068

Monnier, N., Lofstead, J., Lawson, M., Curry, M.: Profiling platform storage using IO500 and mistral. In: 2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW), 18 Nov. 2019, Denver, CO, USA. pp. 60–73. IEEE (2019), DOI: 10.1109/pdsw49588.2019.00011

Murphy, R.C., Wheeler, K.B., Barrett, B.W., Ang, J.A.: Introducing the Graph 500. Cray Users Group (CUG) 19, 45–74 (2010)

Parés, F., Arias-Duart, A., Garcia-Gasulla, D., et al.: A Closer Look at Art Mediums: The MAMe Image Classification Dataset. CoRR abs/2007.13693 (2020),

Ramirez-Gargallo, G., Garcia-Gasulla, M., Mantovani, F.: TensorFlow on state-of-the-art HPC clusters: a machine learning use case. In: 2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 14-17 May 2019, Larnaca, Cyprus. pp. 1–8. IEEE (2019), DOI: 10.1109/ccgrid.2019.00067

Reichstein, M., Camps-Valls, G., Stevens, B., et al.: Deep learning and process understanding for data-driven earth system science. Nature 566(7743), 195–204 (2019), DOI: 10.1038/s41586-019-0912-1

Sotiropoulos, I.N.: Handling variable shaped & high resolution images for multi-class classification problem. Master’s thesis, Universitat Politècnica de Catalunya (2020)

Treml, M., Arjona-Medina, J., Unterthiner, T., et al.: Speeding up semantic segmentation for autonomous driving. In: MLITS, NIPS Workshop. vol. 2, p. 7 (2016)

Wu, N., Phang, J., Park, J., et al.: Deep neural networks improve radiologists performance in breast cancer screening. IEEE transactions on medical imaging 39(4), 1184–1194 (2019), DOI: 10.1109/TMI.2019.2945514

Publishing Center of South Ural State University (454080, Lenin prospekt, 76, Chelyabinsk, Russia)