Semantics-aware prediction for analytic queries in MapReduce environment

Weikuan Yu, Zhuo Liu, Xiaoning Ding

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citations

Abstract

MapReduce has emerged as a powerful data processing engine that supports large-scale complex analytics applications, most of which are written in declarative query languages such as HiveQL and Pig Latin. Analytic queries are typically compiled into execution plans in the form of directed acyclic graphs (DAGs) of MapReduce jobs. Jobs in the DAGs are dispatched to the MapReduce processing engine as soon as their dependencies are satisfied. MapReduce adopts a job-level scheduling policy that strives for a balanced distribution of tasks and effective utilization of resources. However, purely task-based scheduling algorithms lack query-level semantics, which results in resource thrashing among queries and an overall degradation of performance. We therefore introduce a semantics-aware query prediction framework to address these problems systematically. Our framework includes three major techniques: cross-layer semantics percolation, selectivity estimation, and multivariate time prediction for analytic queries. Multivariate query prediction allows us not only to gauge the dynamic size of analytics datasets, but also to accurately predict the resource usage (e.g., numbers of map and reduce tasks) of individual MapReduce jobs and whole queries. In addition, the accurate prediction and queuing of queries can potentially be exploited by Hadoop scheduling to optimize overall query performance. Based on the query prediction, our case study scheduler demonstrates significant performance improvement compared to traditional Hadoop schedulers.
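The dispatch rule mentioned in the abstract (a job is submitted once all of its dependencies are satisfied) can be illustrated with a minimal Python sketch. This is not the authors' framework and does not use any Hadoop, Hive, or Pig API; the function `dispatch_order` and the job names in `query_plan` are hypothetical, chosen only for illustration.

```python
from collections import deque


def dispatch_order(deps):
    """Return job names in the order they become ready to dispatch.

    `deps` maps each job to the set of jobs it depends on. Every
    dependency is assumed to appear as a key of `deps` as well.
    """
    remaining = {job: set(parents) for job, parents in deps.items()}

    # Reverse index: for each job, which jobs are waiting on it.
    dependents = {job: [] for job in deps}
    for job, parents in deps.items():
        for parent in parents:
            dependents[parent].append(job)

    # Jobs without dependencies are ready immediately (sorted for determinism).
    ready = deque(sorted(job for job, parents in remaining.items() if not parents))
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)  # "dispatch" the job
        for child in sorted(dependents[job]):
            remaining[child].discard(job)
            if not remaining[child]:  # last dependency satisfied
                ready.append(child)

    if len(order) != len(deps):
        raise ValueError("dependency cycle: the plan is not a DAG")
    return order


# Hypothetical plan for a two-table join followed by an aggregation.
query_plan = {
    "scan_orders": set(),
    "scan_customers": set(),
    "join": {"scan_orders", "scan_customers"},
    "aggregate": {"join"},
}
print(dispatch_order(query_plan))
```

The sketch orders jobs purely by dependency, with no knowledge of the query each job belongs to or of its expected resource usage, which is the gap the abstract says query-level prediction is meant to fill.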

Original language: English (US)
Title of host publication: 47th International Conference on Parallel Processing, ICPP 2018
Subtitle of host publication: Workshop Proceedings
Publisher: Association for Computing Machinery
ISBN (Print): 9781450365239
State: Published - Aug 13 2018
Event: 47th International Conference on Parallel Processing, ICPP 2018 - Eugene, United States
Duration: Aug 13 2018 to Aug 16 2018

Publication series

Name: ACM International Conference Proceeding Series

Other

Other: 47th International Conference on Parallel Processing, ICPP 2018
Country/Territory: United States
City: Eugene
Period: 8/13/18 to 8/16/18

All Science Journal Classification (ASJC) codes

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

Keywords

  • Analytics query
  • MapReduce
  • Scheduling
  • Semantics-aware
