Advisor Information
Kwangsung Oh
Presentation Type
Poster
Start Date
26-3-2021 12:00 AM
End Date
26-3-2021 12:00 AM
Abstract
Many popular cloud service providers deploy tens of data centers (DCs) around the world to reduce user-perceived latency and improve the user experience. As a result, large amounts of data are generated and stored in a geo-distributed manner. Geo-distributed Data Analytics (GDA) has gained great popularity as a way to meet the growing demand to mine meaningful and timely knowledge from such highly dispersed data. Because GDA systems require large data migrations between DCs over a wide area network (WAN), many existing works have invested significant effort in optimizing data transfer strategies that use the limited WAN efficiently, taking network pricing policies into account while assuming infinite compute resources. However, most previous works ignored compute capacities and pricing policies, that is, the limited and heterogeneous resources at different data centers, even though some compute-intensive GDA workloads, such as machine learning, require more compute resources than WAN resources. Since cloud providers may offer different compute resource capacities and pricing policies at each DC, any cost-agnostic approach can inflate the overall cost and lead to a cost bottleneck, incurring more cost than the target budget allows. To avoid both performance and cost bottlenecks, heterogeneous resource capacities and their costs should be considered jointly. In this research, we propose Butler, a GDA system that is aware of heterogeneous cloud resource costs and exploits them to meet cost-performance goals. To this end, Butler determines optimal task placements for the given inputs to achieve the best performance within the target budget. Butler provides an easy way to explore a richer cost-performance tradeoff space for various GDA applications.
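The idea of picking the fastest task placement that still fits a budget can be illustrated with a small sketch. This is not Butler's algorithm, which the abstract does not describe; it is a brute-force search over a toy model. The DC names, prices, speeds, task sizes, the uniform WAN bandwidth, and the functions `evaluate` and `best_placement` are all hypothetical values and names chosen for illustration.

```python
from itertools import product

# Hypothetical per-DC parameters (illustrative numbers, not from the poster):
# compute_price in $/hour, speed relative to a baseline, egress_price in $/GB.
DCS = {
    "dc_a": {"compute_price": 0.10, "speed": 1.0, "egress_price": 0.02},
    "dc_b": {"compute_price": 0.04, "speed": 0.5, "egress_price": 0.05},
}

# Hypothetical tasks: where the input lives, its size in GB, and the
# compute work in hours at speed 1.0.
TASKS = [
    {"name": "t1", "data_dc": "dc_a", "input_gb": 10.0, "work": 4.0},
    {"name": "t2", "data_dc": "dc_b", "input_gb": 2.0, "work": 6.0},
]

WAN_GB_PER_HOUR = 20.0  # assumed uniform WAN bandwidth between DCs


def evaluate(placement, tasks=TASKS, dcs=DCS):
    """Return (makespan_hours, total_cost) for one DC name per task."""
    cost = 0.0
    busy = {dc: 0.0 for dc in dcs}
    for task, dc in zip(tasks, placement):
        run_hours = task["work"] / dcs[dc]["speed"]
        transfer_hours = 0.0
        if dc != task["data_dc"]:
            # Moving the input pays WAN time plus egress at the source DC.
            transfer_hours = task["input_gb"] / WAN_GB_PER_HOUR
            cost += task["input_gb"] * dcs[task["data_dc"]]["egress_price"]
        cost += run_hours * dcs[dc]["compute_price"]
        # Simplification: tasks placed on the same DC run one after another.
        busy[dc] += transfer_hours + run_hours
    return max(busy.values()), cost


def best_placement(budget, tasks=TASKS, dcs=DCS):
    """Exhaustively search placements; return (placement, makespan, cost)
    for the fastest one whose cost fits the budget, or None if none fits."""
    best = None
    for placement in product(dcs, repeat=len(tasks)):
        makespan, cost = evaluate(placement, tasks, dcs)
        if cost <= budget and (best is None or makespan < best[1]):
            best = (placement, makespan, cost)
    return best


if __name__ == "__main__":
    for budget in (0.5, 0.9, 1.15, 1.25):
        print(budget, best_placement(budget))
```

In this toy setup, raising the budget lets the search choose faster but more expensive placements, which is the kind of cost-performance tradeoff the abstract refers to. Exhaustive search is only workable for a handful of tasks and DCs; a real system would need a scalable optimizer and a far more detailed cost model.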
Heterogeneous resources cost-aware geo-distributed data analytics