Comparative Analysis of Virtualization Methods in Big Data Processing

Gleb I. Radchenko, Ameer B. A. Alaasam, Andrei N. Tchernykh


Cloud computing systems have become widely used for Big Data processing, providing access to a wide variety of computing resources and greater distribution across multiple clouds. This trend has been strengthened by the rapid development of the Internet of Things (IoT) concept. Virtualization via virtual machines and containers is a traditional way to organize cloud computing infrastructure. Containerization technology provides a lightweight virtual runtime environment. In addition to their advantages over traditional virtual machines in terms of size and flexibility, containers are particularly important for integration tasks in PaaS solutions, such as application packaging and service orchestration. In this paper, we survey the current state of the art of virtualization and containerization approaches and technologies in the context of solving Big Data tasks. We present the results of studies that compare the efficiency of containerization and virtualization technologies for solving Big Data problems. We also analyze solutions for the collaboration of containerized and virtualized services that support automated deployment and execution of Big Data applications in cloud infrastructure.

Publishing Center of South Ural State University (454080, Lenin prospekt, 76, Chelyabinsk, Russia)