Aim and scope
Because of their cost efficiency and scalability, virtualization and cloud-based systems will be important architectures for performance-demanding (real-time) analytics of (streaming) big data. To achieve high utilization of the hardware resources, we want to provide automatic resource orchestration and optimization, e.g., automatic migration of virtual machines (VMs) between hardware servers, and (semi-)automatic expansion and reduction of the physical resource pool based on fluctuations in the load. Recent developments in distributed storage systems have made migrating VMs between physical servers less costly: modern distributed storage systems can provide a shared storage area even for a large number of physical servers, so the files associated with a VM do not need to be migrated explicitly.
Databases play an important role in most big data applications, and we will therefore evaluate the performance implications of using different database technologies and layouts, e.g., row-based versus column-based databases, and different platforms on which the databases are executed, e.g., clusters and SMPs.
We will also look at a number of optimization problems involving big data sets, e.g., location data of subscribers in the mobile network and data on the usage of mobile services. The latter kind of data can be used to estimate the risk of churn and the likelihood that subscribers are satisfied and will generate a lot of traffic.
We will develop a number of different migration algorithms and triggers. The algorithms will be implemented and evaluated in a private cloud running on the large cluster system provided by Compuverde. We will also develop a performance model that makes it possible to optimize the number of physical servers in a cloud. A number of optimization problems will be investigated using Gurobi and similar tools. We will also evaluate different database technologies on different application platforms.
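As a rough illustration of the server-count question, the sketch below estimates how many physical servers a set of VMs needs using first-fit decreasing bin packing. This is a standard heuristic chosen here for illustration, not the project's performance model or a Gurobi formulation. The function name `servers_needed_ffd`, the single load dimension, and the example numbers are all assumptions.

```python
from typing import List


def servers_needed_ffd(vm_loads: List[float], server_capacity: float) -> int:
    """Estimate a server count with first-fit decreasing (FFD) bin packing.

    Hypothetical helper: each VM has one load value (e.g., cores) and
    every server has the same capacity. FFD gives an upper bound on the
    optimal count, not necessarily the optimum itself.
    """
    if server_capacity <= 0:
        raise ValueError("server_capacity must be positive")

    free: List[float] = []  # remaining capacity of each opened server
    for load in sorted(vm_loads, reverse=True):  # place largest VMs first
        if load > server_capacity:
            raise ValueError("a VM does not fit on any single server")
        for i, remaining in enumerate(free):
            if load <= remaining:
                free[i] = remaining - load  # first server with room wins
                break
        else:
            free.append(server_capacity - load)  # open a new server
    return len(free)


# Six VMs with a total load of 16 on servers of capacity 8.
print(servers_needed_ffd([4, 3, 3, 2, 2, 2], 8))
```

For these example loads the heuristic opens three servers although two would suffice (4+2+2 and 3+3+2), which shows why an exact solver can be worth its cost for this kind of problem.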
The churn/"happy user" study will start with a small pilot project in which three master's students investigate three environments (Android, iPhone, and laptop). The results from this pilot study will be presented to the companies (primarily Telenor).
We will measure the application-level performance for a number of industrial challenges using different triggers, migration algorithms, hypervisors, and storage systems. We will also develop theoretical results and compare them with what we observe in the industrial systems and applications.
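To make the notion of a migration trigger concrete, here is a minimal sketch of one possible trigger: it fires when CPU utilization stays above a threshold for several consecutive samples. The rule, the function name `first_trigger_index`, and the default values (0.8 and 3 samples) are assumptions for illustration, not triggers taken from the project.

```python
from typing import Iterable, Optional


def first_trigger_index(
    cpu_samples: Iterable[float],
    threshold: float = 0.8,  # assumed utilization limit (fraction of 1.0)
    sustain: int = 3,        # assumed number of consecutive samples required
) -> Optional[int]:
    """Return the index of the sample at which the trigger fires, or None.

    Requiring `sustain` consecutive samples above `threshold` avoids
    reacting to short load spikes.
    """
    run = 0  # length of the current streak above the threshold
    for i, utilization in enumerate(cpu_samples):
        run = run + 1 if utilization > threshold else 0
        if run >= sustain:
            return i
    return None


samples = [0.5, 0.9, 0.85, 0.7, 0.9, 0.95, 0.81]
print(first_trigger_index(samples))
```

With these example samples the first streak is broken by the 0.7 reading, so the trigger fires only at the last sample. Varying `threshold` and `sustain` is one simple way to compare triggers against application-level measurements.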
Results so far
Understanding of the performance implications of:
- Using virtualization
- Having one big VM vs. many smaller VMs
- Over-allocating vCPUs relative to the physically available cores
- VM live migration
Understanding of the state of the art in automatic load balancing in virtual environments.
Performance bounds on real-time scheduling in virtualized environments.
Expected long-term results and impact
- Automatic controller for cloud-based big data platforms
- Understanding performance aspects of different database platforms
- Optimizing telecommunication infrastructure usage
- Understanding and exploiting how quality impacts business