Distributed Analytics and Smart Workload Prediction with Smartpick

Presenter Information

Anshuman Das Mohapatra

Presenter Type

UNO Graduate Student (Masters)

Major/Field of Study

Computer Science

Advisor Information

Dr. Kwangsung Oh

Location

MBSC306 - G (Masters)

Presentation Type

Oral Presentation

Start Date

24-3-2023 1:00 PM

End Date

24-3-2023 2:15 PM

Abstract

Fast query processing is key to the success of distributed data analytics systems. With several alternative compute resources available, determining the optimal resource requirements for ad-hoc workloads is challenging, especially over a Wide Area Network (WAN). In this paper, we present Smartpick, a machine learning-based smart workload prediction scheme that operates within a single data center. For resource determination over the WAN, Smartpick instances that make datacenter-specific predictions can be deployed at multiple data centers, with their results aggregated through a master-slave architecture. Smartpick's approach not only helps distributed applications meet their performance goals but is also adaptive and flexible with respect to resource needs. Additionally, the proposed approach includes serverless instances, one of the emerging compute resources, to mitigate the cold boot-up time of virtual machines and thus meet latency needs with the desired precision. Evaluations of Smartpick on datacenter-specific testbeds on Amazon Web Services and Google Cloud indicate prediction accuracies of 98.5% and 73.4%, respectively. Furthermore, a revamped version of Smartpick with relay instances achieves a nearly 50% reduction in compute costs compared with the baseline models.

Scheduling

1:00-2:15 p.m., 2:30-3:45 p.m.

This document is currently not available here.
