Clause-Iteration with MapReduce to scalably query data graphs in the SHARD graph-store

Kurt Rohloff, Richard E. Schantz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

45 Scopus citations

Abstract

Graph data processing is an emerging application area for cloud computing because there are few other information infrastructures that cost-effectively permit scalable graph data processing. We present a scalable cloud-based approach to process queries on graph data utilizing the MapReduce model. We call this approach the Clause-Iteration approach. We present algorithms that, when used in conjunction with a MapReduce framework, respond to SPARQL queries over RDF data. Our innovation in the Clause-Iteration approach comes from (1) the iterative construction of query responses by incrementally growing the number of query clauses considered in a response, and (2) our use of flagged keys to join the results of these incremental responses. The Clause-Iteration algorithms form the basis of our scalable SHARD graph-store, built on the Hadoop implementation of MapReduce. SHARD performs favorably when compared to existing "industrial" graph-stores on a standard benchmark graph with 800 million edges. We discuss design considerations and alternatives associated with constructing scalable graph processing technologies.
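The two ideas named in the abstract, growing the response one query clause at a time and joining partial responses through flagged keys, can be illustrated with a minimal single-process sketch. This is not SHARD's Hadoop code: the function names (`match_clause`, `clause_iteration`), the tuple representation of triples and clauses, and the `"old"`/`"new"` flag values are all assumptions made for illustration, and the map and reduce phases are simulated with an in-memory dictionary.

```python
from collections import defaultdict


def match_clause(clause, triple):
    """Return variable bindings if the triple satisfies the clause, else None.

    Terms starting with "?" are variables; all other terms must match exactly.
    (Hypothetical helper; the representation is an assumption.)
    """
    binding = {}
    for term, value in zip(clause, triple):
        if term.startswith("?"):
            if binding.get(term, value) != value:
                return None  # same variable would need two different values
            binding[term] = value
        elif term != value:
            return None
    return binding


def clause_iteration(clauses, triples):
    """Answer a conjunctive query by adding one clause per iteration."""
    # First iteration: bindings that satisfy the first clause alone.
    partial = []
    for triple in triples:
        binding = match_clause(clauses[0], triple)
        if binding is not None:
            partial.append(binding)
    bound_vars = {t for t in clauses[0] if t.startswith("?")}

    for clause in clauses[1:]:
        clause_vars = {t for t in clause if t.startswith("?")}
        shared = sorted(bound_vars & clause_vars)

        # Simulated map phase: key every record by the values of the shared
        # variables, and flag it as coming from the earlier partial response
        # ("old") or from the clause being added ("new").
        groups = defaultdict(lambda: {"old": [], "new": []})
        for binding in partial:
            groups[tuple(binding[v] for v in shared)]["old"].append(binding)
        for triple in triples:
            binding = match_clause(clause, triple)
            if binding is not None:
                groups[tuple(binding[v] for v in shared)]["new"].append(binding)

        # Simulated reduce phase: within each key, join old with new records.
        partial = [
            {**old, **new}
            for group in groups.values()
            for old in group["old"]
            for new in group["new"]
        ]
        bound_vars |= clause_vars
    return partial


# Toy data (made up for this sketch): who knows someone living in Boston?
triples = [
    ("alice", "knows", "bob"),
    ("bob", "livesIn", "Boston"),
    ("carol", "knows", "dave"),
    ("dave", "livesIn", "Paris"),
]
clauses = [("?p", "knows", "?f"), ("?f", "livesIn", "Boston")]
print(clause_iteration(clauses, triples))
```

In this sketch the flag is what lets the reduce step tell the two sides of the join apart when they arrive under the same key; in a real MapReduce job each iteration would be a separate job reading the previous job's output.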

Original language: English (US)
Title of host publication: DIDC'11 - Proceedings of the 4th International Workshop on Data-Intensive Distributed Computing
Pages: 35-44
Number of pages: 10
State: Published - Jul 15 2011
Externally published: Yes
Event: 4th International Workshop on Data-Intensive Distributed Computing, DIDC 2011, held in conjunction with the ACM International Conference on High-Performance Parallel and Distributed Computing, HPDC 2011 - San Jose, CA, United States
Duration: Jun 8 2011 → Jun 8 2011

Publication series

Name: DIDC'11 - Proceedings of the 4th International Workshop on Data-Intensive Distributed Computing

Other

Other: 4th International Workshop on Data-Intensive Distributed Computing, DIDC 2011, held in conjunction with the ACM International Conference on High-Performance Parallel and Distributed Computing, HPDC 2011
Country: United States
City: San Jose, CA
Period: 6/8/11 → 6/8/11

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Applied Mathematics

Keywords

  • Algorithms
  • Distributed computing
  • Graph data
  • MapReduce
  • Performance evaluation
  • SPARQL
  • Semantic web
  • Systems
