Subproject 8 - Big data

Research questions

RQ1: Which aspects (e.g., CPU utilization, memory usage, and disk usage) should be included in the triggers that decide when a VM should be migrated and to which physical server?

RQ2: How can we define an automated migration algorithm that provides good utilization of the cloud resources, and how does the performance of the algorithm depend on the (distributed) storage system and the hypervisor used in the cloud?

RQ3: How can we optimize the number of physical servers in the cloud in order to meet the (real-time) performance requirements?

RQ4: How can we predict churn and “happy” users?

RQ5: How can we demonstrate the impact that being able to predict churn and “happy” users has on the corresponding company’s business?

Aim and scope

Because of their cost efficiency and scalability, virtualization and cloud-based systems will be important architectures for performance-demanding (real-time) analytics of (streaming) big data. In order to provide high utilization of the hardware resources, we want to provide automatic resource orchestration and optimization, e.g., automatic migration of virtual machines (VMs) between different hardware servers, and (semi-)automatic expansion and reduction of the physical resource pool based on fluctuations in the load. Recent developments in distributed storage systems have made migrating VMs between physical servers less costly: modern distributed storage systems can provide a shared storage area even for a large number of physical servers, so the files associated with a VM do not need to be migrated explicitly.
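To make the idea of a migration trigger concrete, the following is a minimal sketch of a threshold-based trigger with a simple target selection rule. It is an illustration only, not the algorithm this subproject will develop: the names HostLoad, should_migrate and pick_target, the threshold values, and the "least loaded after placement" rule are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HostLoad:
    """Hypothetical snapshot of one physical server; all values are fractions in [0, 1]."""
    name: str
    cpu: float
    mem: float
    disk: float


def should_migrate(host: HostLoad, cpu_max: float = 0.85,
                   mem_max: float = 0.90, disk_max: float = 0.90) -> bool:
    """Trigger fires when any monitored resource exceeds its (assumed) threshold."""
    return host.cpu > cpu_max or host.mem > mem_max or host.disk > disk_max


def pick_target(hosts: List[HostLoad], source: HostLoad,
                vm_cpu: float, vm_mem: float,
                cpu_max: float = 0.85, mem_max: float = 0.90) -> Optional[HostLoad]:
    """Return the host that stays least loaded after receiving the VM, or None."""
    candidates = [
        h for h in hosts
        if h.name != source.name
        and h.cpu + vm_cpu <= cpu_max
        and h.mem + vm_mem <= mem_max
    ]
    if not candidates:
        return None  # no server can take the VM without crossing a threshold
    # Rank by the most constrained resource after the hypothetical placement.
    return min(candidates, key=lambda h: max(h.cpu + vm_cpu, h.mem + vm_mem))


if __name__ == "__main__":
    cluster = [
        HostLoad("a", cpu=0.95, mem=0.50, disk=0.30),
        HostLoad("b", cpu=0.40, mem=0.40, disk=0.20),
        HostLoad("c", cpu=0.20, mem=0.80, disk=0.20),
    ]
    overloaded = cluster[0]
    if should_migrate(overloaded):
        target = pick_target(cluster, overloaded, vm_cpu=0.10, vm_mem=0.10)
        print(target.name if target else "no feasible target")
```

Because the storage is assumed to be shared, the sketch considers only CPU and memory on the target side; a trigger for RQ1 may well need further aspects, such as disk or network load.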

Databases play an important role in most big data applications, and we will therefore evaluate the performance implications of using different database technologies and layouts, e.g., row-based versus column-based databases, and different platforms on which the databases are executed, e.g., clusters and symmetric multiprocessors (SMPs).

We will also look at a number of optimization problems related to big data, e.g., problems involving location data of subscribers in the mobile network and data related to the usage of mobile services. The latter kind of data set can be used to detect the risk of churn and to estimate the likelihood of having content, “happy” subscribers who generate a lot of traffic.

Proposed work

We will develop a number of different migration algorithms and triggers. The algorithms will be implemented and evaluated in a private cloud, which will run on the large cluster system provided by Compuverde. We will also develop a performance model that makes it possible to optimize the number of physical servers in a cloud. A number of optimization problems will be investigated using Gurobi and similar tools. We will also evaluate different database technologies on different application platforms.
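As an illustration of the server-count question, the sketch below treats it as a one-dimensional bin-packing problem and applies the first-fit-decreasing heuristic. This is not the performance model or the Gurobi formulation referred to above. The function name, the single abstract "capacity unit" per VM, and the identical server capacity are assumptions made for the example, and the heuristic gives an upper bound on the optimal number of servers rather than a guaranteed optimum.

```python
from typing import List, Sequence


def first_fit_decreasing(demands: Sequence[int], capacity: int) -> int:
    """Estimate how many identical servers are needed to host all VMs.

    demands  -- resource demand of each VM, in abstract capacity units (assumed)
    capacity -- capacity of one physical server, in the same units (assumed)
    """
    if any(d > capacity for d in demands):
        raise ValueError("a VM demand exceeds the capacity of a single server")

    free: List[int] = []  # remaining capacity of each server opened so far
    for demand in sorted(demands, reverse=True):  # place the largest VMs first
        for i, room in enumerate(free):
            if demand <= room:
                free[i] = room - demand  # fits on an already open server
                break
        else:
            free.append(capacity - demand)  # no server had room: open a new one
    return len(free)


if __name__ == "__main__":
    vm_demands = [4, 8, 1, 4, 2, 1]
    print(first_fit_decreasing(vm_demands, capacity=10))
```

An exact solver such as Gurobi could be given the same inputs as an integer program; the heuristic is shown only because it runs without any solver installed.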

The churn/“happy user” study will start with a small pilot project in which three master's students investigate three environments (Android, iPhone, and laptop). The results from this pilot study will be presented to the companies (primarily Telenor).

Evaluation/methodology

We will measure the application-level performance for a number of industrial challenges using different triggers, migration algorithms, hypervisors, and storage systems. We will also develop theoretical results and compare them with what we observe in the industrial systems and applications.