Autotuning Techniques for Performance-Portable Point Set Registration in 3D

Piotr Luszczek, Jakub Kurzak, Ichitaro Yamazaki, David Keffer, Vasileios Maroulas, Jack Dongarra


We present an autotuning approach applied to exhaustive performance engineering of the EM-ICP algorithm for the point set registration problem with a known reference. We were able to achieve progressively higher performance levels through a variety of code transformations and an automated procedure of generating a large number of implementation variants. Furthermore, we managed to exploit code patterns that are not common when only attempting manual optimization but which yielded in our tests better performance for the chosen registration algorithm. Finally, we also show how we maintained high levels of the performance rate in a portable fashion across a wide range of hardware platforms including multicore, manycore coprocessors, and accelerators. Each of these hardware classes is much different from the others and, consequently, cannot reliably be mastered by a single developer in a short time required to deliver a close-to-optimal implementation. We assert in our concluding remarks that our methodology as well as the presented tools provide a valid automation system for software optimization tasks on modern HPC hardware.

Full Text:



Bartok, A., Kondor, R., Csanyi, G.: On representing chemical environments. Physical Review B 87(18) (2013), DOI: 10.1103/PhysRevB.87.184115

Besl, P.J., McKay, N.D.: A method for registration of 3-D shapes. IEEE PAMI 14(2), 239–256 (1992), DOI: 10.1109/34.121791

Chui, H., Rangarajan, A.: A feature registration framework using mixture models. In: IEEE Workshop on MMBIA. pp. 190–197 (2000), DOI: 10.1109/MMBIA.2000.852377

Chui, H., Rangarajan, A.: A new algorithm for non-rigid point matching. In: CVPR. vol. 2, pp. 44–51. IEEE Press (2000), DOI: 10.1016/S1077-3142(03)00009-2

Chui, H., Rangarajan, A.: A new point matching algorithm for non-rigid registration. CVIU 89(2–3), 114–141 (2003), DOI: 10.1016/S1077-3142(03)00009-2

Du, S., Zheng, N., Xiong, L., Ying, S., Xue, J.: Scaling iterative closest point algorithm for registration of md point sets. Journal of Visual Communication and Image Representation 21(5–6), 442–452 (2010), DOI: 10.1016/j.jvcir.2010.02.005

Fitzgibbon, A.W.: Robust registration of 2D and 3D point sets. Image and Vision Computing 21, 1145–1153 (2003), DOI: 10.1016/j.imavis.2003.09.004

Gao, M.C., Yeh, J.W., Liaw, P.K., Zhang, Y.: High-entropy alloys: fundamentals and applications. Springer International Publishing Switzerland (2016), DOI: 10.1007/978-3-319-27013-5

Gold, S., Lu, C.P., Rangarajan, A., Pappu, S., Mjolsness, E.: New algorithms for 2D and 3D point matching: Pose estimation and corresp. In: NIPS. vol. 7, pp. 957–964. The MIT Press (1995), DOI: 10.1016/S0031-3203(98)80010-1

Granger, S., Pennec, X.: Multi-scale EM-ICP: A fast and robust approach for surface registration. In: et al., A.H. (ed.) ECCV 2002. pp. 418–432. LNCS 2353, (c) Springer-Verlag Berlin Heidelberg (2002), DOI: 10.1007/3-540-47979-1 28

Gupta, S., Agrawal, A., Gopalakrishnan, K., Narayanan, P.: Deep learning with limited numerical precision. CoRR abs/1502.02551 (2015),, accessed: 2018-08-01

Gupta, S., Agrawal, A., Gopalakrishnan, K., Narayanan, P.: Deep learning with limited numerical precision. In: Bach, F., Blei, D. (eds.) Proceedings of the 32nd International Conference onMachine Learning. Proceedings of Machine Learning Research, vol. 37, pp. 1737– 1746. PMLR, Lille, France (2015),, accessed: 2018-08-01

Hockney, R.W., Curington, I.J.: f1/2: A parameter to characterize memory and communication bottlenecks. Parallel Computing 10(3), 277–286 (1989), DOI: 10.1016/0167-8191(89)90100-2

Kim, Y.M., Morozovska, A., Eliseev, E., Oxley, M., Mishra, R., Selbach, S., Grande, T., Pantelides, S., Kalinin, S., Borisevich, A.: Direct observation of ferroelectric field effect and vacancy-controlled screening at the BiFeO3/LaxSr1−xMnO3 interface. Nature Materials 13(11), 1019–1025 (2014), DOI: 10.1038/nmat4058

Larson, D., Prosa, T., Ulfig, R., Geiser, B., Kelly, T.: Local Electrode Atom Probe Tomography: A User’s Guide. Springer-Verlag New York (2013), DOI: 10.1007/978-1-4614-8721-0

Luszczek, P., Gates, M., Kurzak, J., Danalis, A., Dongarra, J.: Search space generation and pruning system for autotuners. In: Proceedings of IPDPSW. pp. 1545–1554. The Eleventh International Workshop on Automatic Performance Tuning (iWAPT) 2016, IEEE, Chicago, IL, USA (2016), DOI: 10.1109/IPDPSW.2016.197

Luszczek, P., Kurzak, J., Yamazaki, I., Dongarra, J.: Towards numerical benchmark for halfprecision floating point arithmetic. In: 2017 IEEE High Performance Extreme Computing Conference (2017), DOI: 10.1109/HPEC.2017.8091031

Luszczek, P., Kurzak, J., Yamazaki, I., Keffer, D., Dongarra, J.J.: Scaling point set registration in 3D across thread counts on multicore and hardware accelerator platforms through autotuning for large scale analysis of scientific point clouds. In: 2017 IEEE International Conference on Big Data (Big Data). pp. 2893–2902. Boston, MA, USA (2017), DOI: 10.1109/BigData.2017.8258258

Miller, M.K., Forbes, R.G.: Atom-Probe Tomography: The Local Electrode Atom Probe. Springer US (2014), DOI: 10.1007/978-1-4899-7430-3

Myronenko, A., Song, X., Carreira-Perpi˜n´an, M.A.: Non-rigid point set registration: Coherent Point Drift. In: Sch¨olkopf, B., Platt, J.C., Hoffman, T. (eds.) Advances in Neural Information Processing Systems 19. pp. 1009–1016. MIT Press (2007)

Rangarajan, A., Chui, H., Mjolsness, E., Davachi, L., Goldman-Rakic, P.S., Duncan, J.S.: A robust point matching algorithm for autoradiograph alignment. Medical Image Analysis 1(4), 379–398 (1997), DOI: 10.1016/S1361-8415(97)85008-6

Rusinkiewicz, S., Levoy, M.: Efficient variants of the ICP algorithm. In: International Conference on 3D Digital Imaging and Modeling (3DIM). pp. 145–152 (2001), DOI: 10.1109/IM.2001.924423

Rusu, R.B., Cousins, S.: 3D is here: Point Cloud Library (PCL). In: IEEE International Conference on Robotics and Automation (ICRA). Shanghai, China (2011), DOI: 10.1109/ICRA.2011.5980567

Sohlberg, K., Rashkeev, S., Borisevich, A., Pennycook, S., Pantelides, S.: Origin of anomalous Pt-Pt distances in the Pt/Alumina catalytic system. ChemPhysChem 5(12), 1893–1897 (2004), DOI: 10.1002/cphc.200400212

Volkov, V., Demmel, J.W.: Benchmarking GPUs to Tune Dense Linear Algebra. In: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing. pp. 31:1–31:11. SC ’08, IEEE Press, Piscataway, NJ, USA (2008),, accessed: 2018-08-01, DOI: 10.1145/1413370.1413402

Zhang, Y., Zuo, T.T., Tang, Z., Gao, M.C., Dahmen, K.A., Liaw, P.K., Lu, Z.P.: Microstructures and properties of high-entropy alloys. Progress in Materials Science 61, 1–93 (2014), DOI: 10.1016/j.pmatsci.2013.10.001

Zhang, Z.: Iterative point matching for registration of free-form curves and surfaces. International Journal of Computer Vision 13(2), 119–152 (1994), DOI: 10.1007/BF01427149

Publishing Center of South Ural State University (454080, Lenin prospekt, 76, Chelyabinsk, Russia)