Distributed Analytics and Smart Workload Prediction with Smartpick
Presenter Type
UNO Graduate Student (Masters)
Major/Field of Study
Computer Science
Advisor Information
Dr. Kwangsung Oh
Location
MBSC306 - G (Masters)
Presentation Type
Oral Presentation
Start Date
24-3-2023 1:00 PM
End Date
24-3-2023 2:15 PM
Abstract
Fast query processing is key to the success of distributed data analytics systems. With several alternative compute resources available, it is challenging to determine the optimal resource requirements for ad-hoc workloads, especially over the Wide Area Network (WAN). In this paper, we present Smartpick, a machine learning-based smart workload prediction scheme that operates within a single data center. To determine resources over the WAN, Smartpick instances that make datacenter-specific predictions can be deployed at multiple data centers, with their results aggregated through a master-slave architecture. Smartpick's approach not only helps distributed applications meet their performance goals but is also adaptive and flexible with respect to resource needs. Additionally, the proposed approach includes serverless instances, one of the emerging compute resources, to mitigate the cold boot-up time of virtual machines and thus handle latency needs with the desired precision. Evaluations of Smartpick on datacenter-specific testbeds encompassing Amazon Web Services and Google Cloud indicate prediction accuracies of 98.5% and 73.4%, respectively. Furthermore, a revamped version of Smartpick with relay instances achieves a nearly 50% reduction in compute costs compared with the baseline models.
Scheduling
1-2:15 p.m., 2:30-3:45 p.m.