Presenter Information

Minmin Zhang

Advisor Information

Kwangsung Oh

Presentation Type

Poster

Start Date

26-3-2021 12:00 AM

End Date

26-3-2021 12:00 AM

Abstract

Many popular cloud service providers deploy tens of data centers (DCs) around the world to reduce user-perceived latency and improve user experience. As a result, a large amount of data is generated and stored in a geo-distributed manner. Geo-distributed Data Analytics (GDA) has gained great popularity in meeting the growing demand to mine meaningful and timely knowledge from such highly dispersed data. Because GDA systems require large data migrations between DCs over a wide area network (WAN), many existing works have invested significant effort in optimizing data transfer strategies to use the limited WAN efficiently, considering network pricing policies under the assumption of infinite compute resources. However, most previous works ignored the compute capacities and pricing policies of the limited and heterogeneous resources at different data centers, even though some compute-intensive GDA workloads, such as machine learning, require more compute resources than WAN resources. Since cloud providers may offer different compute resource capacities and pricing policies at each DC, any cost-agnostic approach can inflate the overall cost and lead to a cost bottleneck, incurring more cost than the target budget. To avoid both performance and cost bottlenecks, heterogeneous resource capacities and their costs should be considered jointly. In this research, we propose Butler, a cost-aware GDA system for heterogeneous cloud resources that exploits heterogeneous resource costs to meet cost-performance goals. To this end, Butler determines optimal task placements for the given inputs to achieve the best performance within the target budget. Butler provides an easy way to explore a richer cost-performance tradeoff space for various GDA applications.


Heterogeneous resources cost-aware geo-distributed data analytics
