High-performance, massively scalable distributed systems using the MapReduce software framework: The SHARD Triple-Store

Kurt Rohloff, Richard E. Schantz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

138 Scopus citations

Abstract

In this paper we discuss the use of the MapReduce software framework to address the challenge of constructing high-performance, massively scalable distributed systems. We discuss several design considerations associated with constructing complex distributed systems using the MapReduce software framework, including the difficulty of scalably building indexes. We focus on Hadoop, the most popular MapReduce implementation. Our discussion and analysis are motivated by our construction of SHARD, a massively scalable, high-performance and robust triple-store technology on top of Hadoop. We provide a general approach to constructing, on top of the MapReduce software framework, an information system that responds to data queries. We provide experimental results generated from an early version of SHARD. We close with a discussion of hypothetical MapReduce alternatives that can be used for the construction of more scalable distributed computing systems.

Original language: English (US)
Title of host publication: Workshop on Programming Support Innovations for Emerging Distributed Applications, PSI EtA - PsiH 2010
State: Published - 2010
Externally published: Yes
Event: SPLASH Workshop on Programming Support Innovations for Emerging Distributed Applications, PSI EtA - PsiH 2010 - Reno, NV, United States
Duration: Oct 17 2010 → Oct 21 2010

Publication series

Name: Workshop on Programming Support Innovations for Emerging Distributed Applications, PSI EtA - PsiH 2010

Other

Other: SPLASH Workshop on Programming Support Innovations for Emerging Distributed Applications, PSI EtA - PsiH 2010
Country/Territory: United States
City: Reno, NV
Period: 10/17/10 → 10/21/10

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Computer Science Applications

Keywords

  • Distributed computing
  • Graph data
  • MapReduce
  • Performance evaluation
  • Programming
  • SPARQL
  • Semantic Web
  • Systems
