Throughput optimization for Storm-based processing of stream data on clouds

Huiyan Cao, Chase Q. Wu, Liang Bao, Aiqin Hou, Wei Shen

Research output: Contribution to journal (Article, peer-reviewed)

8 Scopus citations

Abstract

There is a rapidly growing need for processing large volumes of streaming data in real time in various big data applications. As one of the most commonly used systems for streaming data processing, Apache Storm provides a workflow-based mechanism to execute directed acyclic graph (DAG)-structured topologies. With the expansion of cloud infrastructures around the globe and the economic benefits of cloud-based computing and storage services, many such Storm workflows have been shifted or are in active transition to clouds. However, modeling the behavior of streaming data processing and improving its performance in clouds still remain largely unexplored. We construct rigorous cost models to analyze the throughput dynamics of Storm workflows and formulate a budget-constrained topology mapping problem to maximize Storm workflow throughput in clouds. We show this problem to be NP-complete and design a heuristic solution that takes into consideration not only the selection of virtual machine type but also the degree of parallelism for each task (spout/bolt) in the topology. The performance superiority of the proposed mapping solution is illustrated through extensive simulations and further verified by real-life workflow experiments deployed in public clouds in comparison with the default Storm and other existing methods.
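To make the budget-constrained mapping problem concrete, the following is a minimal illustrative sketch, not the heuristic or cost model from the paper. It assumes a toy model in which a topology's throughput is bounded by its slowest task, and each task is assigned one VM type and a degree of parallelism. The names `VMType`, `greedy_mapping`, `throughput`, and `total_cost`, as well as all prices and rates, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMType:
    name: str
    price: float  # hypothetical cost per instance per hour (must be > 0)
    rate: float   # hypothetical tuples/s handled by one instance


def total_cost(plan):
    """Sum of price * parallelism over all tasks."""
    return sum(vm.price * n for vm, n in plan.values())


def throughput(plan):
    """Toy model (an assumption): the slowest task bounds the pipeline."""
    return min(vm.rate * n for vm, n in plan.values())


def greedy_mapping(tasks, vm_types, budget):
    """Greedily relieve the current bottleneck task while the budget allows."""
    # Start from the cheapest feasible plan: one cheapest VM per task.
    cheapest = min(vm_types, key=lambda v: v.price)
    plan = {t: (cheapest, 1) for t in tasks}
    if total_cost(plan) > budget:
        raise ValueError("budget cannot cover one cheapest VM per task")

    while True:
        # The task with the smallest capacity limits overall throughput.
        bottleneck = min(tasks, key=lambda t: plan[t][0].rate * plan[t][1])
        cur_vm, cur_n = plan[bottleneck]
        cur_cap = cur_vm.rate * cur_n
        cur_total = total_cost(plan)

        best, best_score = None, None
        # Candidate moves: change VM type and/or add one parallel instance.
        for vm in vm_types:
            for n in (cur_n, cur_n + 1):
                gain = vm.rate * n - cur_cap
                extra = vm.price * n - cur_vm.price * cur_n
                if gain <= 0 or cur_total + extra > budget:
                    continue
                # Prefer the largest capacity gain per unit of extra cost.
                score = float("inf") if extra <= 0 else gain / extra
                if best_score is None or score > best_score:
                    best, best_score = (vm, n), score

        if best is None:
            return plan  # the bottleneck cannot be improved within budget
        plan[bottleneck] = best


if __name__ == "__main__":
    types = [VMType("small", 1.0, 100.0), VMType("large", 3.0, 400.0)]
    plan = greedy_mapping(["spout", "bolt"], types, budget=8.0)
    for task, (vm, n) in plan.items():
        print(task, vm.name, n)
    print("throughput:", throughput(plan), "cost:", total_cost(plan))
```

The sketch stops as soon as the bottleneck task cannot be improved, since under the min-capacity assumption no other change can raise throughput. A realistic model would also account for data transfer between tasks and the topology's DAG structure, which this toy version ignores.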

Original language: English (US)
Pages (from-to): 567-579
Number of pages: 13
Journal: Future Generation Computer Systems
Volume: 112
State: Published - Nov 2020

All Science Journal Classification (ASJC) codes

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Keywords

  • Apache Storm
  • Scientific workflows
  • Throughput optimization
  • Workflow mapping
