Scalable Query Optimization for Efficient Data Processing Using MapReduce

Yi Shan, Yi Chen

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

2 Scopus citations

Abstract

MapReduce is widely acknowledged by both industry and academia as an effective programming model for query processing on big data. It is crucial to design an optimizer that finds the most efficient way to execute an SQL query using MapReduce. However, existing work in parallel query processing either falls short of optimizing an SQL query for MapReduce or relies on an optimizer whose time complexity is exponential. In addition, industry solutions such as HIVE and YSmart do not optimize the join sequence of an SQL query and therefore cannot guarantee an optimal execution plan. In this paper, we propose a scalable optimizer for SQL queries using MapReduce, named SOSQL. Experiments performed on Google Cloud Platform confirm that SOSQL is more scalable and efficient than existing work.

Original language: English (US)
Title of host publication: Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015
Editors: Latifur Khan, Carminati Barbara
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 649-652
Number of pages: 4
ISBN (Electronic): 9781467372787
DOIs
State: Published - Aug 17 2015
Event: 4th IEEE International Congress on Big Data, BigData Congress 2015 - New York City, United States
Duration: Jun 27 2015 to Jul 2 2015

Publication series

Name: Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015

Other

Other: 4th IEEE International Congress on Big Data, BigData Congress 2015
Country/Territory: United States
City: New York City
Period: 6/27/15 to 7/2/15

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems

Keywords

  • Big data
  • Google
  • Industries
  • Optimization
  • Partitioning algorithms
  • Query processing
  • Time complexity
