Efficiently discovering critical workflows in scientific explorations

Qihong Shao, Peng Sun, Yi Chen

Research output: Contribution to journal › Article › peer-review

5 Scopus citations


Existing workflow management systems assume that scientists have a well-specified workflow design before execution. In reality, many scientific discoveries result from a dynamic process in which scientists repeatedly propose new hypotheses and test them through multiple attempts at various experiments before achieving successful results. Consequently, not every experiment in a workflow execution necessarily contributes to the final result. In this paper, we investigate the problem of effectively reproducing the results of previous scientific workflow executions by discovering the critical experiments that led to success and the logical constraints on their execution order. We design a relational schema and SQL queries for effectively recording the workflow execution log, efficiently identifying the critical experiments from the log, and recommending experiment reproduction strategies to users. Furthermore, we propose optimization techniques for evaluating these SQL queries that exploit the unique characteristics of the log data. Experimental evaluations demonstrate the performance speedup of our approach.
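As a rough illustration of the idea of finding critical experiments in a relational execution log, the sketch below uses Python's built-in sqlite3 module and a recursive SQL query. The abstract does not give the paper's actual schema, queries, or optimizations, so everything here is an assumption made for illustration: the table names `experiment` and `dependency`, their columns, the demo data, and the helper names `build_demo_log` and `critical_experiments` are all hypothetical. The sketch treats an experiment as critical if the final successful experiment transitively depends on its output.

```python
import sqlite3


def build_demo_log():
    """Create an in-memory log using a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE experiment (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        -- One row per data dependency: consumer used the output of producer.
        CREATE TABLE dependency (
            producer_id INTEGER NOT NULL REFERENCES experiment(id),
            consumer_id INTEGER NOT NULL REFERENCES experiment(id)
        );
        INSERT INTO experiment VALUES
            (1, 'prepare'), (2, 'failed try'), (3, 'analysis'),
            (4, 'dead end'), (5, 'final result');
        INSERT INTO dependency VALUES (1, 2), (1, 3), (3, 5), (2, 4);
        """
    )
    return conn


def critical_experiments(conn, final_id):
    """Return sorted ids of experiments that final_id transitively depends on
    (including final_id itself), found with a recursive common table expression."""
    rows = conn.execute(
        """
        WITH RECURSIVE critical(id) AS (
            SELECT ?
            UNION
            SELECT d.producer_id
            FROM dependency AS d
            JOIN critical AS c ON d.consumer_id = c.id
        )
        SELECT id FROM critical ORDER BY id
        """,
        (final_id,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    log = build_demo_log()
    print(critical_experiments(log, 5))
```

In this toy log, experiments 2 and 4 were attempted but never fed into the final result, so a reproduction would only need experiments 1, 3, and 5. The rows of `dependency` restricted to those experiments give one possible reading of the ordering constraints among them.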

Original language: English (US)
Pages (from-to): 577-585
Number of pages: 9
Journal: Future Generation Computer Systems
Issue number: 5
State: Published - May 2009
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications


Keywords

  • Database system
  • Join algorithm
  • Log
  • Scientific workflow

